AI Singapore’s AI Apprenticeship Programme (AIAP) is one of our popular programmes, drawing an average of 120 to 160 applicants every time we open an intake. It is also rumoured, according to friendly sources on the ground, to be very hard to get into.
This article shares what it takes to get into the AIAP and maps out a training roadmap for anyone who is keen to join. As mentioned in my earlier article “Excuse me, are you a Singaporean AI Engineer?”, the AIAP is a programme where you get to deep-skill and work on a real-world AI project over 9 months. It is not a re-skilling programme where you come in to train and learn Python.
Do note that our expectations of what an Artificial Intelligence (AI) Engineer or Machine Learning (ML) Engineer will do go beyond just ML modelling. The AI Engineer is expected to ingest data, do feature engineering, build, train and test the model, and lastly deploy it at scale in a way that allows the model to be retrained and refined whenever required.
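To make those stages concrete, here is a minimal sketch of the same flow using only the Python standard library. The housing data, the function names and the one-feature linear model are all illustrative assumptions, not something taken from the AIAP; a real project would more likely use pandas and scikit-learn and a proper model registry.

```python
import csv
import io
import json

# Hypothetical raw data: floor area (sqm) and price (thousands). Illustrative only.
RAW_CSV = """area_sqm,price_k
50,300
70,400
90,500
110,600
"""

def ingest(text):
    """Ingest: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def engineer_features(rows):
    """Feature engineering: cast to float and rescale area to units of 100 sqm."""
    xs = [float(r["area_sqm"]) / 100.0 for r in rows]
    ys = [float(r["price_k"]) for r in rows]
    return xs, ys

def train(xs, ys):
    """Build and train: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict(model, x):
    """Apply the fitted line to one feature value."""
    return model["slope"] * x + model["intercept"]

def evaluate(model, xs, ys):
    """Test: mean absolute error on held-out data."""
    return sum(abs(predict(model, x) - y) for x, y in zip(xs, ys)) / len(xs)

def deploy(model):
    """Deploy (stand-in): serialise the model so a serving process could load it."""
    return json.dumps(model)

rows = ingest(RAW_CSV)
xs, ys = engineer_features(rows)
model = train(xs[:3], ys[:3])           # train on the first three rows
mae = evaluate(model, xs[3:], ys[3:])   # test on the held-out last row
artifact = deploy(model)
retrained = train(xs, ys)               # retrain once more data is available
print(f"held-out MAE: {mae:.4f}")
```

Each stage is a separate function so that any one of them can be swapped out, for example replacing `train` with a different model, without touching the rest.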
What do we look for in an AI Apprentice?
One of the most valued traits in our AI Apprentices is that all of them are self-starters. We are looking for individuals who are self-directed learners, keen to learn data science (DS), AI, and ML from anyone, anywhere, and by any means.
They are curious, they search, hunt and dig to learn. They do not ask “tell me how?” or “where can I go to learn?”.
Oh, and if you are one of those who need to pay to sit in a classroom to learn Python programming, then you are also likely not someone we will be keen on.
Recommended Learning Journey
Here is an intensive 12-month roadmap to help you prepare for the AI Apprentice entrance test if you are keen to join the programme. Otherwise, you can also use it for your own AI/ML learning journey. We assume you are starting with at least the following:
- You have some programming background. If not, please pick up a few good Python books, or review the Python tutorial here: https://docs.python.org/3/tutorial/index.html
- You understand what a database is and can run some basic SQL queries. If not, you may wish to review the w3schools.com SQL tutorials here: https://www.w3schools.com/sql/sql_intro.asp
- You should be comfortable using public cloud services like AWS, Google Cloud and Azure.
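As a rough self-check for the SQL prerequisite, the sketch below runs a filter-and-aggregate query against an in-memory SQLite database using Python's built-in `sqlite3` module. The `orders` table and its columns are hypothetical, and this is only one reading of what "basic SQL" covers, not an official benchmark.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Aisha", "East", 82.5),
        ("Ben", "East", 67.0),
        ("Chen", "West", 91.0),
        ("Devi", "West", 74.5),
    ],
)

# Filter rows, group them, and aggregate: count and average per region.
rows = conn.execute(
    "SELECT region, COUNT(*) AS n, AVG(amount) AS avg_amount "
    "FROM orders WHERE amount >= 70 "
    "GROUP BY region ORDER BY region"
).fetchall()
conn.close()

for region, n, avg_amount in rows:
    print(region, n, avg_amount)
```

If you can read this query and predict what it returns before running it, you are probably fine on this prerequisite.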
12-month AI/ML Learning Roadmap Overview
|2||Statistical Learning, Machine Learning and Deep Learning|
|a1||Statistical Learning @ Stanford||X||X||X|
|a2||The Data Science Design Manual Course||X||X||X|
|b||Intel AI Academy||X||X||X||X||X||X|
|c||Azure Machine Learning Service||X|
|3||Spark Big Data Platform||X|
|4||Kaggle competition (for some near real world experience)||X||X||X||X|
Recommended Book List
I have not included links or book images, as some of these books get updated often.
- Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming by Eric Matthes
- Learning Python by Mark Lutz
- Fluent Python: Clear, Concise, and Effective Programming by Luciano Ramalho
- The Hundred-Page Machine Learning Book by Andriy Burkov
- The Data Science Design Manual by Steven Skiena
- Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
- Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron
- Deep Learning with Python by François Chollet
AI (Beyond Machine Learning):
- Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig
- On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines by Jeff Hawkins
- Weapons of Math Destruction by Cathy O’Neil
- Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford
AI and Ethics
A good friend reminded me we need to ensure AI and Ethics is part of this learning journey. Here are some recommended reading materials:
- Singapore’s IMDA AI Ethics and Governance framework (see https://www2.imda.gov.sg/infocomm-media-landscape/SGDigital/tech-pillars/Artificial-Intelligence).
- AI Singapore is leveraging this framework as part of the engineering best practices for our 100E projects.
- Over the next 5 years, AI Singapore’s 100E programme will have delivered around 100 AI projects, and we will use this opportunity to help refine the framework by providing useful real-world feedback to the team.
- Weapons of Math Destruction by Cathy O’Neil is a recommended read.
- Microsoft’s take on AI Ethics
- AI Principles: https://www.microsoft.com/en-us/AI/our-approach-to-ai
- FATE: https://www.microsoft.com/en-us/research/group/fate/
- Building responsible Bots: https://www.microsoft.com/en-us/research/publication/responsible-bots/
- The Future Computed by Microsoft (free eBook)
Detailed Training Curriculum
1. Software Engineering
As mentioned, an AI Engineer does a lot more than just modelling, so the reading materials and courses listed below will help prepare you for the software engineering side of things. We have chosen Azure for practical reasons; you can opt for the equivalent courses from your favourite public cloud provider.
|1||What is Git?||2||https://docs.microsoft.com/en-us/azure/devops/learn/git/what-is-git|
|2||Welcome to Agile||2||https://docs.microsoft.com/en-us/azure/devops/learn/learn-agile|
|3||Develop Windows 10 applications||5||https://docs.microsoft.com/en-us/learn/paths/develop-windows10-apps/|
|4||Azure Fundamentals – Cloud Computing||9||https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/|
|5||Azure for the Data Engineer||2||https://docs.microsoft.com/en-us/learn/paths/azure-for-the-data-engineer/|
|6||Work with relational data in Azure||4||https://docs.microsoft.com/en-us/learn/paths/work-with-relational-data-in-azure/|
|7||Work with NoSQL data in Azure Cosmos DB||6||https://docs.microsoft.com/en-us/learn/paths/work-with-nosql-data-in-azure-cosmos-db/|
|8||Administer containers in Azure||4||https://docs.microsoft.com/en-us/learn/paths/administer-containers-in-azure/|
|9||Run Docker containers with Azure Container Instances||2||https://docs.microsoft.com/en-us/learn/modules/run-docker-with-azure-container-instances/|
Expect to spend 3-4 hours a week on the above for the next 8-12 weeks (2-3 months).
2. Statistical Learning, Machine Learning and Deep Learning
Here you can choose either (a1) or (a2). Both are excellent, but you should pick only one. It boils down to whose style you prefer.
a1) Statistical Learning @ Stanford
A good introduction that will give you a solid foundation is Hastie and Tibshirani’s Statistical Learning course at Stanford. It is a self-paced course with videos and a free PDF book, An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013).
Course description from the website:
This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).
This is not a math-heavy class, so we try and describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter.
Note that the above course is in R. The AIAP is Python-heavy, but this should not stop you from learning the basics and picking up some R along the way.
The authors recommend you spend 3-5 hours per week on each chapter. There are 10 chapters in the course.
Expect to spend 8 - 12 weeks (2-3 months) to complete this section.
a2) The Data Science Design Manual
Another very good introductory and foundation course is by Professor Steven Skiena, Distinguished Teaching Professor of Computer Science at Stony Brook University. His course and book, The Data Science Design Manual, provide an excellent introduction with interesting war stories.
Additional resources including data sets for projects and assignments can be found at the website of the book.
From the book website:
The Data Science Design Manual serves as an introduction to data science, focusing on the skills and principles needed to build systems for collecting, analyzing, and interpreting data. As a discipline data science sits at the intersection of statistics, computer science, and machine learning, but it is building a distinct heft and character of its own.
The book covers enough material for an “Introduction to Data Science” course at the undergraduate or early graduate student levels.
The course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.
The video and lecture slides are from my Fall 2016 Data Science (CSE 519) course.
Expect to spend 8 - 12 weeks (2-3 months) to complete this section.
b) Intel AI Academy
Intel have developed a set of structured courses with hands-on exercises which everyone can follow. It is free. You just need a PC or laptop. No GPU is required to complete the hands-on exercises.
The courses are pitched at graduate level and are meant for software developers, data scientists and students. Each course is equivalent to a one-semester module at a typical university, lasts 8 to 12 weeks, and requires 3 hours of your time per week. It is self-paced, so you can finish faster if you are able to.
You can access the Intel AI Academy here: https://software.intel.com/en-us/ai/courses
|Intel AI Academy (3 hours per week)||Hours|
|Introduction to AI (8 weeks)||24|
|Machine Learning (12 weeks)||36|
|Deep Learning (12 weeks)||36|
|Natural Language Processing (8 weeks)||24|
|Computer Vision (8 weeks)||24|
|Time-Series Analysis (8 weeks)||24|
We recommend that you aim to finish the above within 4-8 months.
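It is worth doing the arithmetic on that target. The six courses add up to 168 hours, which at the listed 3 hours per week, taken one after another, is 56 weeks. Finishing in 4-8 months therefore means either overlapping courses or putting in more hours per week. The short calculation below shows the weekly commitment implied; the figure of 52/12 weeks per month is my own approximation.

```python
# Hours per course, copied from the Intel AI Academy table above.
course_hours = {
    "Introduction to AI": 24,
    "Machine Learning": 36,
    "Deep Learning": 36,
    "Natural Language Processing": 24,
    "Computer Vision": 24,
    "Time-Series Analysis": 24,
}

total_hours = sum(course_hours.values())   # 168
weeks_at_listed_pace = total_hours / 3     # 56.0 weeks at 3 hours per week

WEEKS_PER_MONTH = 52 / 12  # assumption: average weeks in a month

def hours_per_week(months):
    """Weekly hours needed to cover all courses within the given number of months."""
    return total_hours / (months * WEEKS_PER_MONTH)

print(f"Total: {total_hours} hours, {weeks_at_listed_pace:.0f} weeks at 3 h/week")
for months in (4, 6, 8):
    print(f"{months} months -> {hours_per_week(months):.1f} hours per week")
```

In other words, plan for roughly 5 to 10 hours a week on this block alone, depending on how quickly you want to get through it.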
c) Azure Machine Learning Service (optional)
We also recommend that you be familiar with an AI service from a cloud provider like Azure. From the Azure Machine Learning Service page:
Step by Step Tutorials: https://docs.microsoft.com/en-in/azure/machine-learning/service/
Azure Machine Learning service provides SDKs and services to quickly prep data, train, and deploy machine learning models. Improve productivity and costs with autoscaling compute & pipelines. Use these capabilities with open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. Get started with our quickstarts and tutorials.
Learn how to prep data, set up experiments, train and operationalize your machine learning models.
Use the Azure portal, an easy-to-use, no code interface for machine learning:
- Train your first automated machine learning experiment.
- Train your first experiment in visual interface’s drag and drop UI.
Use the Python SDK in scripts and notebooks for machine learning:
- Set up your environment & train your first ML experiment.
- Train & deploy image classification models.
- Prepare data and use automated ML to predict taxi fares.
- Run batch predictions on large data sets with ML pipelines.
You may also wish to complete the following modules to be familiar with Azure Cognitive API services: https://docs.microsoft.com/en-us/learn/browse/?roles=ai-engineer&products=azure
Plan to spend 2 - 3 weekends on the above.
3. Spark Big Data Platform (optional)
Spark, and Databricks in particular, is a popular platform for large-scale machine learning.
Databricks has 9 online courses, and each course takes 3 to 6 hours to complete. The cost is US$75 per course, which gives one user 12 months of access. You can access their online learning platform here: https://academy.databricks.com/category/self-paced
|1||SP800-Az: Getting Started with Apache Spark DataFrames (Azure Databricks)|
|2||SP805-Az: Getting Started with Apache Spark SQL (Azure Databricks)|
|3||SP820-Az: ETL Part 1: Data Extraction (Azure Databricks)|
|4||SP821-Az: ETL Part 2: Transformations and Loads (Azure Databricks)|
|5||SP822-Az: ETL Part 3: Production (Azure Databricks)|
|6||SP840-Az: Managed Delta Lake (Azure Databricks)|
|7||SP850-Az: Structured Streaming (Azure Databricks)|
|8||SP860-Az: Introduction to Data Science and Machine Learning (Azure Databricks)|
|9||SP863-Az: MLflow: Managing the Machine Learning Lifecycle on Azure Databricks|
Plan to spend 4 - 5 weekends on the above.
4. Putting Knowledge into Practice
Practice makes perfect, so it is also essential to work on some small-scale AI/ML projects instead of limiting yourself to textbooks and online exercises. A good place to start is Kaggle (https://www.kaggle.com). There are many competitions there for you to try, and some even have nicely written user-submitted solutions for reference.
Attempt one or more competitions (past competitions are fine) to gain some experience of working on an AI/ML project. Try different machine learning techniques on the same problem, as this will give you deeper insight into, and a better understanding of, each technique used. Additionally, always remember to structure your code well, as this will help you cultivate good coding habits!
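One way to compare techniques fairly is to score each of them with the same cross-validation splits. The sketch below does this with only the standard library, on a small synthetic dataset, comparing a mean-value baseline against a 1-nearest-neighbour regressor. The data, the two models and the helper names are all illustrative assumptions; on a real Kaggle problem you would more likely reach for scikit-learn's `cross_val_score` and the competition's own metric.

```python
import random

def make_data(n=60, seed=0):
    """Synthetic data (assumption): y is roughly 2 * x plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def fit_mean(train_x, train_y):
    """Baseline: always predict the mean of the training targets."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y

def fit_nearest_neighbour(train_x, train_y):
    """1-nearest neighbour: predict the target of the closest training point."""
    pairs = list(zip(train_x, train_y))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def cross_val_mae(fit, xs, ys, k=5):
    """k-fold cross-validation: average mean absolute error over k held-out folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))  # every k-th point goes into this fold
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        model = fit(train_x, train_y)
        errors = [abs(model(xs[i]) - ys[i]) for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

xs, ys = make_data()
scores = {
    "mean baseline": cross_val_mae(fit_mean, xs, ys),
    "1-nearest neighbour": cross_val_mae(fit_nearest_neighbour, xs, ys),
}
# Lower error is better, so list the best technique first.
for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name}: MAE = {mae:.2f}")
```

Keeping each technique behind the same `fit` interface is also a small example of the code structure mentioned above: adding a third technique means writing one more function, not rewriting the evaluation loop.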
You should be ready to tackle some Kaggle competitions by your 7th or 8th month.
Plan to spend 2-4 months on the above, even as you continue with the rest of your modules.
Except for the Databricks courses, all the courses listed above are FREE. The recommended books are an investment in your education.
Of course, the above is not the only way to learn. Some of you (like me) prefer learning only from books, and that is perfectly fine. Some prefer to watch online tutorials or university lectures on YouTube, or to attend MOOCs and skip the books. Please go ahead.
Some of you may be surprised at how much you need to know even before you become an AI Apprentice!
As mentioned right at the beginning, the AIAP is a deep-skilling programme. It gives the apprentice an opportunity to work on a real-world AI/ML problem statement, with all the common issues faced in the real world, from incomplete and dirty datasets to difficult requests from project sponsors. At the same time, there is constant pressure to deliver at every Sprint and to produce a working end-to-end model and solution within 9 months.
We cannot train you to be an AI Engineer in 9 months if you do not already have the required foundation. We need you to acquire that foundation on your own, and we will help polish you up over those 9 months so that you can land a role as an AI/ML Engineer or Scientist when you graduate from the programme.