Data Engineering at AI Singapore

The newly formed Data Engineering team at AI Singapore plans to refine data management practices in tandem with the growth in the number of projects.

Toward a Common Data Platform

The AI Innovation team at AI Singapore has grown from a few staff to a strong collection of engineering teams in the last two years. We’ve already delivered several successful AI-based solutions to organisations of different types: government agencies, SMEs and multinational corporations. Building AI solutions depends on data, and a large amount of it. Data is the key project asset. Our engineering teams work with various data formats: image, video, text, CSV and others. As our organisation has grown, so has the number of projects running in parallel. While we have put processes in place, the management of project data has primarily been the responsibility of the project manager and the technical leader of each team.

Data is an asset that must be managed and engineered to deliver value to end users.

Recently, some of our senior engineers determined that we needed to move forward on a data platform programme that has been waiting in the wings for a while. Our platforms team has also been deploying additional hardware at our site over the last few months, which enables our senior technical team to think more broadly about evolving our systems architecture. The newly formed Data Engineering team will architect a common data platform that supports our AI engineers and facilitates efficient delivery of solutions.

Identifying Our Needs

At a high level, the tactical needs that our AI engineers have expressed include:

  • Data should be transferred between our external stakeholders and our project teams in a simple, secure fashion that provides tracking and notification.
  • Engineers should be able to view the holistic picture of the raw data sets and the data products and other artefacts that were created by downstream processes such as data cleaning, filtering and feature engineering.
  • The frameworks deployed and processes implemented should be oriented toward modern engineering practices and AI-oriented solutions.
  • Access to all types of data from interactive notebooks or processing pipelines should be simple and efficient.

Additionally, senior technical staff have expressed strategic needs that include:

  • Clear data governance practices for our current data inventory.
  • Data provenance and data versioning to enable reproducibility.
  • A common data model for metadata management.
  • A ‘right-sized’ data platform: the effort and timeline should be scoped to available resources while still addressing the challenges listed above.
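
To make the provenance, versioning and common metadata model items above more concrete, here is a minimal sketch in Python. It is not a description of the platform AI Singapore built or plans to build. The names DatasetRecord, register and lineage, and every field in the record, are hypothetical and chosen only for illustration. The assumption is that a data set's version can be derived from a hash of its content, and that each derived data product records the versions of the inputs it came from.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def content_version(data: bytes) -> str:
    """Derive a version identifier from the bytes themselves (SHA-256 hex digest)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata record; all field names are illustrative assumptions."""
    name: str
    version: str          # content hash of the data
    media_type: str       # e.g. "text/csv" or "image/png"
    produced_by: str      # step that created it, e.g. "ingest" or "cleaning"
    parents: tuple = ()   # versions of the inputs this data was derived from
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def register(name, data, media_type, produced_by, parents=()):
    """Build a record for `data`, linking it to the records it was derived from."""
    return DatasetRecord(
        name=name,
        version=content_version(data),
        media_type=media_type,
        produced_by=produced_by,
        parents=tuple(p.version for p in parents),
    )


def lineage(record, catalogue):
    """Return the names from `record` back to its raw sources (depth-first)."""
    chain = [record.name]
    for parent_version in record.parents:
        chain.extend(lineage(catalogue[parent_version], catalogue))
    return chain


# A raw CSV and a cleaned product derived from it (made-up sample bytes).
raw = register("survey_raw", b"id,age\n1,34\n2,\n", "text/csv", "ingest")
clean = register("survey_clean", b"id,age\n1,34\n", "text/csv", "cleaning",
                 parents=[raw])

# The catalogue maps a version (content hash) to its metadata record.
catalogue = {r.version: r for r in (raw, clean)}
print(lineage(clean, catalogue))  # ['survey_clean', 'survey_raw']
```

Because the version is computed from the content, registering identical bytes twice yields the same version, which is one simple way to support reproducibility. A real platform would also need storage, access control and a persistent catalogue, none of which this sketch attempts.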

There are many additional perspectives on data platforms on the internet. Many appear to come from vendors advocating for their products, or describe a platform for a specific industry or purpose, such as a customer data platform. As further reading, the descriptions provided here are more vendor agnostic:

The Next Steps

We have existing tools in place, both internally developed and open source, that currently support our engineers. Some were created opportunistically by a project team to address a specific need. A few open source frameworks were deployed to support certain types of projects. The technical leadership from the engineering groups is reviewing the current state. In some cases we may simply need to augment a tool, make more teams aware of how to employ it, or build processes that facilitate adoption. For gaps in our toolset, we will survey the open source and commercial solutions available for data storage, processing, governance and other tasks. The goal is an integrated data platform that supports each stage of the AI lifecycle: data collection, annotation, exploration, feature engineering, experimentation, evaluation and deployment.

As we begin to think about the scope of that effort and our near-term priorities, the Data Engineering team at AI Singapore will share our challenges and discoveries with the wider engineering community in Singapore and beyond. We hope other data engineers can benefit from our experience. We also invite you to leave a comment if you have anything to share.

Author

  • Maurice is the Head of AI Applications at AI Singapore. He leads the design of applications for the 100 Experiments programme in partnership with local enterprises and government organisations. Having led development teams in both North America and Asia, he has extensive experience in software development, systems design and data integration, and has delivered solutions across a wide range of industries including biotech and pharmaceutical, aerospace and finance. Prior to joining AI Singapore, he worked at the University of California, where he was a systems architect on the cyber-infrastructure team at the NSF Ocean Observatories Initiative and subsequently the systems architect on PATH, an intelligent transportation research program at the University of California, Berkeley in collaboration with the California Department of Transportation.

  • A 20-year veteran of tech startups and MNCs, Najib focuses on High-Performance Computing (HPC) as well as Cloud, Data and Artificial Intelligence (AI). He has led engineering teams in several organisations, some of which were startups that were acquired or exited successfully. He has helped build several of the first-generation HPC cluster systems and infrastructure in Singapore and the region. He was also a lecturer for the NUS School of Continuing and Lifelong Education (NUS SCALE), where he conducted workshops on Reproducible Data Science, Data Engineering and Conversational AI bots (Chatbots). He currently heads the AI Platforms Engineering team in the Industry Innovation Pillar at AI Singapore (AISG), where his team focuses on building the AI infrastructure and platforms for researchers, engineers and collaborators to solve challenging problems.

  • Software, machine learning, AI technologist. Enjoys exploring new uses of technology just to make life a little easier, for all.

  • building software, data pipeline, science and cutting edge technology enthusiast
