So you want to become an AI Apprentice?

AI Singapore’s AI Apprenticeship Programme (AIAP) is one of AI Singapore’s popular programmes, with an average of 120-160 applicants every time we open an intake. According to friendly sources on the ground, it is also rumoured to be very hard to get into.

This article shares what it takes to get into the AIAP and maps out a training roadmap for anyone keen to join. As mentioned in my earlier article “Excuse me, are you a Singaporean AI Engineer?”, the AIAP is a programme where you deepen your skills and work on a real-world AI project over 9 months. It is not a re-skilling programme where you come in to train and learn Python.

Do note that our expectations of what an Artificial Intelligence (AI) Engineer or Machine Learning (ML) Engineer does go beyond just ML modeling. The AI Engineer is expected to ingest data, do feature engineering, build, train and test the model, and lastly deploy it at scale in a way that allows the model to be retrained and refined whenever required.
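To make those stages concrete, here is a minimal sketch of such a pipeline using only the Python standard library. Everything in it is an assumption made for illustration: the toy CSV data, the function names and the one-feature least-squares model are hypothetical and are not taken from any AIAP project. A real project would use proper ML libraries and a real deployment target.

```python
import csv
import io
import json
import statistics

# Hypothetical raw data; the row with a missing price mimics a dirty dataset.
RAW_CSV = """area_sqm,price
50,300
70,410
90,520
110,640
75,
65,380
85,500
"""


def ingest(text):
    """Ingest: read CSV text into a list of row dictionaries (values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def engineer_features(rows):
    """Feature engineering: drop incomplete rows, rescale area to units of 10 sqm."""
    clean = [r for r in rows if all(r.values())]
    xs = [float(r["area_sqm"]) / 10 for r in clean]
    ys = [float(r["price"]) for r in clean]
    return xs, ys


def train(xs, ys):
    """Build and train: ordinary least squares for y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, x):
    """Apply the fitted line to one feature value."""
    return model["slope"] * x + model["intercept"]


def evaluate(model, xs, ys):
    """Test: mean absolute error on held-out rows."""
    return statistics.fmean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


def export_model(model):
    """Stand-in for deployment: serialise the model so a service could load it."""
    return json.dumps(model)


def run_pipeline(text, n_test=2):
    """Run every stage end to end; calling it again on fresh data retrains the model."""
    xs, ys = engineer_features(ingest(text))
    model = train(xs[:-n_test], ys[:-n_test])
    error = evaluate(model, xs[-n_test:], ys[-n_test:])
    return export_model(model), error


artifact, mae = run_pipeline(RAW_CSV)
print("exported model:", artifact)
print("held-out mean absolute error:", round(mae, 2))
```

Keeping each stage in its own function is one possible design choice: it lets you rerun the whole pipeline whenever new data arrives, which is the retraining ability mentioned above.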

What do we look for in an AI Apprentice?

One of the most valued traits in our AI Apprentices is that all of them are self-starters. We are looking for individuals who are self-directed learners, keen to learn data science (DS), AI and ML from anyone, anywhere and by any means.

They are curious: they search, hunt and dig to learn. They do not ask “tell me how?” or “where can I go to learn?”.

Oh, and if you are someone who needs to pay to attend a class to learn Python programming, then you are also likely not someone we will be keen on.

Recommended Learning Journey

Here is an intensive 12-month roadmap to help you prepare for the AI Apprentice entrance test if you are keen to join the programme. Otherwise, you can use it for your own AI/ML learning journey. We assume you are starting with at least the following:

  1. You have some programming background. If not, please pick up a few good Python books or work through the Python tutorial here: https://docs.python.org/3/tutorial/index.html
  2. You understand what a database is and can run some basic SQL queries. If not, you may wish to review the w3schools.com SQL tutorial here: https://www.w3schools.com/sql/sql_intro.asp
  3. You are comfortable using public cloud services such as AWS, Google Cloud or Azure.

12-month AI/ML Learning Roadmap Overview

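| SN | Programme Modules | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
|----|-------------------|---|---|---|---|---|---|---|---|---|----|----|----|
| 1 | Software Engineering | X | X | | | | | | | | | | |
| 2 | Statistical Learning, Machine Learning and Deep Learning | | | | | | | | | | | | |
| a1 | Statistical Learning @ Stanford | | | X | X | X | | | | | | | |
| a2 | The Data Science Design Manual Course | | | X | X | X | | | | | | | |
| b | Intel AI Academy | | | | X | X | X | X | X | X | | | |
| c | Azure Machine Learning Service | | | | | | | | | | | X | |
| 3 | Spark Big Data Platform | | | | | | | | | | | | X |
| 4 | Kaggle competition (for some near real world experience) | | | | | | | | | X | X | X | X |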

Recommended Book List
I have not included links or book images as some of these books get updated often.

Learning Python:

  1. Python Crash Course, 2nd Edition: A Hands-On, Project-Based Introduction to Programming by Eric Matthes
  2. Learning Python by Mark Lutz
  3. Fluent Python: Clear, Concise, and Effective Programming by Luciano Ramalho

Machine Learning:

  1. The Hundred-Page Machine Learning Book by Andriy Burkov
  2. The Data Science Design Manual by Steven Skiena
  3. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython by Wes McKinney
  4. Python Data Science Handbook: Essential Tools for Working with Data by Jake VanderPlas
  5. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurelien Geron
  6. Deep Learning with Python by Francois Chollet

AI (Beyond Machine Learning):

  1. Artificial Intelligence: A Modern Approach by Stuart Russell and Peter Norvig

General AI:

  1. On Intelligence: How a New Understanding of the Brain Will Lead to the Creation of Truly Intelligent Machines by Jeff Hawkins
  2. Weapons of Math Destruction by Cathy O’Neil
  3. Rise of the Robots: Technology and the Threat of a Jobless Future by Martin Ford

AI and Ethics

A good friend reminded me we need to ensure AI and Ethics is part of this learning journey. Here are some recommended reading materials:

  1. Singapore’s IMDA AI Ethics and Governance framework (see https://www2.imda.gov.sg/infocomm-media-landscape/SGDigital/tech-pillars/Artificial-Intelligence).
    • AI Singapore is leveraging this framework as part of our 100E projects’ engineering best practice.
    • Over the next 5 years, AI Singapore’s 100E programme will have delivered around 100 AI projects, and we will use this opportunity to help refine the framework by providing useful real-world feedback to the team.
  2. Weapons of Math Destruction by Cathy O’Neil is a recommended read.
  3. Microsoft’s take on AI Ethics

Detailed Training Curriculum

1. Software Engineering

As mentioned, an AI Engineer does a lot more than just modeling, so the reading materials and courses listed below will help prepare you for the software engineering side of things. We have chosen Azure for practical reasons; you can opt for your favourite public cloud provider’s courses instead.

| SN | Course | Hours | Link |
|----|--------|-------|------|
| 1 | What is Git? | 2 | https://docs.microsoft.com/en-us/azure/devops/learn/git/what-is-git |
| 2 | Welcome to Agile | 2 | https://docs.microsoft.com/en-us/azure/devops/learn/learn-agile |
| 3 | Develop Windows 10 applications | 5 | https://docs.microsoft.com/en-us/learn/paths/develop-windows10-apps/ |
| 4 | Azure Fundamentals – Cloud Computing | 9 | https://docs.microsoft.com/en-us/learn/paths/azure-fundamentals/ |
| 5 | Azure for the Data Engineer | 2 | https://docs.microsoft.com/en-us/learn/paths/azure-for-the-data-engineer/ |
| 6 | Work with relational data in Azure | 4 | https://docs.microsoft.com/en-us/learn/paths/work-with-relational-data-in-azure/ |
| 7 | Work with NoSQL data in Azure Cosmos DB | 6 | https://docs.microsoft.com/en-us/learn/paths/work-with-nosql-data-in-azure-cosmos-db/ |
| 8 | Administer containers in Azure | 4 | https://docs.microsoft.com/en-us/learn/paths/administer-containers-in-azure/ |
| 9 | Run Docker containers with Azure Container Instances | 2 | https://docs.microsoft.com/en-us/learn/modules/run-docker-with-azure-container-instances/ |

Expect to spend 3-4 hours a week on the above for the next 8-12 weeks (2-3 months).

2. Statistical Learning, Machine Learning and Deep Learning

Here you can choose either (a1) or (a2). Both are excellent, but you should choose only one. It boils down to whose style you prefer.

a1) Statistical Learning @ Stanford

A good introduction that will give you a solid foundation is Hastie and Tibshirani’s Statistical Learning at Stanford. It is a self-paced course with videos and a free PDF book: An Introduction to Statistical Learning, with Applications in R by James, Witten, Hastie and Tibshirani (Springer, 2013).

Course description from the website:

This is an introductory-level course in supervised learning, with a focus on regression and classification methods. The syllabus includes: linear and polynomial regression, logistic regression and linear discriminant analysis; cross-validation and the bootstrap, model selection and regularization methods (ridge and lasso); nonlinear models, splines and generalized additive models; tree-based methods, random forests and boosting; support-vector machines. Some unsupervised learning methods are discussed: principal components and clustering (k-means and hierarchical).

This is not a math-heavy class, so we try and describe the methods without heavy reliance on formulas and complex mathematics. We focus on what we consider to be the important elements of modern data analysis. Computing is done in R. There are lectures devoted to R, giving tutorials from the ground up, and progressing with more detailed sessions that implement the techniques in each chapter.

Note that the above course is in R. The AIAP is Python-heavy, but this should not stop you from learning the basics and some R along the way.

The authors recommend you spend 3-5 hours per week on each chapter. There are 10 chapters in the course.

Expect to spend 8 - 12 weeks (2-3 months) to complete this section.

a2) The Data Science Design Manual

Another very good introduction and foundation course is by Professor Steven Skiena, Distinguished Teaching Professor of Computer Science at Stony Brook University. His course and book, The Data Science Design Manual, provide an excellent introduction with interesting war stories.

Additional resources including data sets for projects and assignments can be found at the website of the book.

From the book website:

The Data Science Design Manual serves as an introduction to data science, focusing on the skills and principles needed to build systems for collection, analyzing, and interpreting data. As a discipline data science sits at the intersection of statistics, computer science, and machine learning, but it is building a distinct heft and character of its own.

The book covers enough material for an “Introduction to Data Science” course at the undergraduate or early graduate student levels.

The course is designed to provide a hands-on introduction to Data Science by challenging student groups to build predictive models for upcoming events, and validating their models against the actual outcomes.

The video and lecture slides are from my Fall 2016 Data Science (CSE 519) course.

Expect to spend 8 - 12 weeks (2-3 months) to complete this section.

b) Intel AI Academy

Intel have developed a set of structured courses with hands-on exercises which everyone can follow. It is free. You just need a PC or laptop. No GPU is required to complete the hands-on exercises.

The courses are pitched at graduate level and are meant for software developers, data scientists and students. Each course is equivalent to a 1-semester module in a typical university, lasts between 8 and 12 weeks, and requires 3 hours per week of your time. The courses are self-paced, so you can finish faster if you are able to.

You can access the Intel AI Academy here: https://software.intel.com/en-us/ai/courses

| Intel AI Academy (3 hours per week) | Hours |
|-------------------------------------|-------|
| Introduction to AI (8 weeks) | 24 |
| Machine Learning (12 weeks) | 36 |
| Deep Learning (12 weeks) | 36 |
| Natural Language Processing (8 weeks) | 24 |
| Computer Vision (8 weeks) | 24 |
| Time-Series Analysis (8 weeks) | 24 |

We recommend that you aim to finish the above within 4-8 months.

c) Azure Machine Learning Service (optional)

We also recommend that you be familiar with an AI service from a cloud provider like Azure. From the Azure Machine Learning Service page:

Step by Step Tutorials: https://docs.microsoft.com/en-in/azure/machine-learning/service/

Azure Machine Learning service provides SDKs and services to quickly prep data, train, and deploy machine learning models. Improve productivity and costs with autoscaling compute & pipelines. Use these capabilities with open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. Get started with our quickstarts and tutorials.

Learn how to prep data, set up experiments, train and operationalize your machine learning models.

Use the Azure portal, an easy-to-use, no code interface for machine learning:

Use the Python SDK in scripts and notebooks for machine learning:

You may also wish to complete the following modules to be familiar with Azure Cognitive API services: https://docs.microsoft.com/en-us/learn/browse/?roles=ai-engineer&products=azure

Plan to spend 2 - 3 weekends on the above.

3. Spark Big Data Platform (optional)

Spark, and Databricks in particular, is a popular platform for large-scale machine learning.

Databricks have 9 online courses, and each course takes 3-6 hours to complete. Each course costs US$75, which gives 1 user 12 months of access. You can access their online learning platform here: https://academy.databricks.com/category/self-paced

| SN | Course |
|----|--------|
| 1 | SP800-Az: Getting Started with Apache Spark DataFrames (Azure Databricks) |
| 2 | SP805-Az: Getting Started with Apache Spark SQL (Azure Databricks) |
| 3 | SP820-Az: ETL Part 1: Data Extraction (Azure Databricks) |
| 4 | SP821-Az: ETL Part 2: Transformations and Loads (Azure Databricks) |
| 5 | SP822-Az: ETL Part 3: Production (Azure Databricks) |
| 6 | SP840-Az: Managed Delta Lake (Azure Databricks) |
| 7 | SP850-Az: Structured Streaming (Azure Databricks) |
| 8 | SP860-Az: Introduction to Data Science and Machine Learning (Azure Databricks) |
| 9 | SP863-Az: MLflow: Managing the Machine Learning Lifecycle on Azure Databricks |

Plan to spend 4 - 5 weekends on the above.

4. Putting Knowledge into Practice

Practice makes perfect, so it is essential to work on some small-scale AI/ML projects instead of limiting yourself to textbooks and online exercises. A good place to start is Kaggle (https://www.kaggle.com). There are many competitions for you to try, and some even have nicely written user-submitted solutions for reference.

Attempt one or more competitions (past competitions are fine) to gain some experience in working on an AI/ML project. Try different machine learning techniques to solve the problem, as this will give you deeper insight into and a better understanding of each technique used. Additionally, always remember to structure your code well, as this will help you cultivate good coding habits!
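As a rough illustration of trying several techniques on the same problem, the sketch below scores three simple regressors on one held-out validation set. The toy data and function names are hypothetical, and the models are deliberately tiny so that the example needs nothing beyond the Python standard library. In a real competition you would swap in library models and the competition's own metric.

```python
import statistics

# Hypothetical toy data: (feature, target) pairs, already split into two sets.
TRAIN = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.2)]
VALID = [(2.4, 4.8), (4.6, 9.2), (5.4, 10.8)]


def fit_mean(train):
    """Baseline: always predict the mean training target."""
    mean_y = statistics.fmean(y for _, y in train)
    return lambda x: mean_y


def fit_nearest_neighbour(train):
    """1-nearest neighbour: predict the target of the closest training point."""
    return lambda x: min(train, key=lambda point: abs(point[0] - x))[1]


def fit_linear(train):
    """Least-squares straight line through the training points."""
    mean_x = statistics.fmean(x for x, _ in train)
    mean_y = statistics.fmean(y for _, y in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_abs_error(predict, data):
    """Score a fitted predictor on data it has not seen during fitting."""
    return statistics.fmean(abs(predict(x) - y) for x, y in data)


TECHNIQUES = {
    "mean baseline": fit_mean,
    "1-nearest neighbour": fit_nearest_neighbour,
    "linear fit": fit_linear,
}

# Fit every technique on the same training set and score on the same validation set.
scores = {name: mean_abs_error(fit(TRAIN), VALID) for name, fit in TECHNIQUES.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: MAE = {score:.3f}")
```

Scoring every technique against the same held-out data, with a trivial baseline included, is what makes the comparison meaningful: it shows how much each added bit of complexity actually buys you.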

You should be ready to tackle some Kaggle competitions towards your 7th or 8th month.

Plan to spend 2-4 months on the above even as you continue the rest of your modules.

Conclusion

Except for the Databricks courses, all the courses listed above are FREE. The recommended books are an investment in your education.

Of course, the above is not the only way to learn. Some of you prefer learning only from books (like me), and that is perfectly fine. Some like to watch online tutorials or university lectures on YouTube, or attend MOOCs and use no books. Please go ahead.

Some of you may be surprised at how much you need to know even before you become an AI Apprentice!

As mentioned right at the beginning, the AIAP is a deep-skilling programme. It gives the apprentice an opportunity to work on a real-world AI/ML problem statement with all the common issues faced in the real world, from incomplete and dirty datasets to difficult requests from project sponsors, all under constant pressure to deliver at every Sprint and produce a working end-to-end model and solution within 9 months.

We cannot train you to be an AI Engineer in 9 months if you do not already have the required foundation. We need you to acquire that foundation on your own, and we will help polish you up in 9 months so that you can land a role as an AI/ML Engineer or Scientist when you graduate from the programme.

Happy learning!
