Speech Recognition

A sub-field of computational linguistics that develops methodologies and technologies enabling computers to recognise spoken language and translate it into text.

About Speech Recognition

Speech recognition helps you convert audio to text. This speech engine recognises and transcribes English, Mandarin and Singlish. Our code-switching technology is tailored to Singlish: it can recognise conversations that mix words from different languages (English and Mandarin). The engine can be further customised for use cases in different domains.

Tip: This speech engine performs best in a quiet environment. For optimal results, use a ‘close-talk’ microphone, and keep any speech to be transcribed as conversational as possible.

This was developed in collaboration with AI Singapore’s Speech Lab, led by Professor Li Haizhou (National University of Singapore) and Associate Professor Chng Eng Siong (Nanyang Technological University).

Key Features

Recognise unique Singlish conversations

Understands a mix of English and Mandarin within the same sentence. Also recognises local terms such as landmarks and road names.
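To make the idea of a code-switched sentence concrete, here is a minimal Python sketch that splits a transcript into English and Mandarin runs by character script. It only illustrates what mixed-language output looks like; it is not how the speech engine itself works. The function name `split_by_script` and the sample sentence are assumptions made for this illustration.

```python
import re
from typing import List, Tuple

# One alternative per script: a run of CJK ideographs, or a run of
# Latin-alphabet words separated by whitespace.
_RUN = re.compile(
    r"(?P<zh>[\u4e00-\u9fff]+)|(?P<en>[A-Za-z']+(?:\s+[A-Za-z']+)*)"
)


def split_by_script(text: str) -> List[Tuple[str, str]]:
    """Return (tag, run) pairs, where tag is 'en' or 'zh'.

    Hypothetical helper: digits, punctuation and spaces between runs
    are simply skipped.
    """
    return [(m.lastgroup, m.group()) for m in _RUN.finditer(text)]


# Made-up sample sentence mixing English and Mandarin.
runs = split_by_script("I want to go 牛车水 eat lunch")
for tag, run in runs:
    print(tag, run)
```

A script-based split like this is only a rough stand-in for language identification, since Singlish also contains romanised loanwords that a character test cannot separate from English.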

Easily customized for your domain

Can be retrained with domain-specific audio.

Ability to provide on-premise solution

Suited to users with sensitive data.

Examples of Use Cases

  1. Automatic transcription
    – Speech to text for Call Centres
    – Interviews
    – Medical consultations
  2. Chatbot and Digital Assistants
    – Transcribing voice commands

Common Tools and Libraries

AI Speech Lab

AI Singapore (AISG) has set up an AI Speech Lab to develop a speech recognition system that could interpret and process the unique vocabulary used by Singaporeans – including Singlish and dialects – in conversations.

SpeechLab technology is available as a service for both batch and near-real-time processing. Please contact AI Singapore for further information.


Kaldi

Kaldi is an open source toolkit for working with speech data. It is used in voice-related applications, mostly for speech recognition but also for other tasks such as speaker recognition and speaker diarisation.

Kaldi GStreamer: https://github.com/jcsilva/docker-kaldi-gstreamer-server


Porcupine

Porcupine is a self-service, highly accurate, and lightweight wake word (voice control) engine. It enables developers to build always-listening, voice-enabled applications and platforms.

Developer's Resource: https://github.com/Picovoice/Porcupine


Google Cloud Speech-to-Text

Speech-to-text conversion powered by machine learning and available for short-form or long-form audio.

Developer's Resource: https://cloud.google.com/speech-to-text/

Azure Cognitive Services

Create apps, websites and bots with intelligent algorithms to see, hear, speak, understand and interpret your user needs through natural methods of communication.

Developer's Resource: https://azure.microsoft.com/en-us/services/cognitive-services/


PaddlePaddle DeepSpeech

An open source implementation of an end-to-end Automatic Speech Recognition (ASR) engine, based on Baidu's Deep Speech 2 paper and built on the PaddlePaddle platform.

Developer's Resource: https://github.com/PaddlePaddle/DeepSpeech

Open Datasets

National Speech Corpus

Contains 2,000 hours of locally accented audio and text transcriptions.

Free Spoken Digit Dataset

A simple audio/speech dataset consisting of recordings of spoken digits in WAV files sampled at 8 kHz.
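As a quick sanity check on clips from this dataset, the sketch below reads a WAV header with Python's standard `wave` module and recovers the label from the file name. It assumes the `{digit}_{speaker}_{index}.wav` naming convention described in the dataset's README (for example `7_jackson_32.wav`); the `DigitClip` type and `describe_clip` function are hypothetical names introduced here.

```python
import wave
from pathlib import Path
from typing import NamedTuple


class DigitClip(NamedTuple):
    digit: int         # spoken digit, taken from the file name
    speaker: str       # speaker name, taken from the file name
    index: int         # recording index, taken from the file name
    sample_rate: int   # frames per second, read from the WAV header
    duration_s: float  # clip length in seconds


def describe_clip(path: Path) -> DigitClip:
    """Combine file-name labels with facts read from the WAV header.

    Assumes names of the form '<digit>_<speaker>_<index>.wav'.
    """
    digit, speaker, index = path.stem.split("_")
    with wave.open(str(path), "rb") as wav:
        sample_rate = wav.getframerate()
        duration_s = wav.getnframes() / sample_rate
    return DigitClip(int(digit), speaker, int(index), sample_rate, duration_s)
```

Calling `describe_clip(Path("recordings/7_jackson_32.wav"))` on a local copy (the `recordings/` path is an assumption) should report a sample rate of 8000 if the file matches the description above.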


LibriSpeech

A large-scale corpus of around 1,000 hours of English speech.

The Spoken Wikipedia Corpora

Corpus of aligned spoken Wikipedia articles from the English, German, and Dutch Wikipedia.


TIMIT

A collection of recordings of 630 speakers of American English.

Google Audioset

Large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos.
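The clips are distributed as lists of segments rather than as audio files. The sketch below parses rows in the layout used by the AudioSet segment CSV files, assuming comment lines that start with `#` and the columns `YTID, start_seconds, end_seconds, positive_labels`. The `Segment` type, the `read_segments` function and the sample row (including its video ID) are made up for illustration.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, Tuple


class Segment(NamedTuple):
    video_id: str
    start_s: float
    end_s: float
    labels: Tuple[str, ...]  # label identifiers attached to the clip


def read_segments(lines: Iterable[str]) -> Iterator[Segment]:
    """Yield one Segment per data row, skipping '#' comment lines."""
    data = (ln for ln in lines if not ln.startswith("#"))
    # skipinitialspace lets the quoted label list follow ", " cleanly.
    for row in csv.reader(data, skipinitialspace=True):
        if not row:
            continue
        video_id, start, end, labels = row
        yield Segment(video_id, float(start), float(end),
                      tuple(labels.split(",")))


# Made-up sample in the assumed layout; the video ID is not a real one.
SAMPLE = '''# YTID, start_seconds, end_seconds, positive_labels
abc123DEF45, 30.000, 40.000, "/m/09x0r,/t/dd00088"
'''

segments = list(read_segments(io.StringIO(SAMPLE)))
for seg in segments:
    print(seg.video_id, seg.end_s - seg.start_s, seg.labels)
```

The same function accepts an open file handle in place of the `io.StringIO` sample.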

Related Articles

  1. How to start with Kaldi and Speech Recognition
    Link to article: https://towardsdatascience.com/how-to-start-with-kaldi-and-speech-recognition-a9b7670ffff6
  2. Simple guide to Kaldi – an efficient open source speech recognition tool for extreme beginners
    Link to article: https://medium.com/@nikhilamunipalli/simple-guide-to-kaldi-an-efficient-open-source-speech-recognition-tool-for-extreme-beginners-98a48bb34756
  3. Creating voice assistant for games tutorial for Fifa
    Link to article: https://towardsdatascience.com/creating-voice-assistant-for-games-tutorial-for-fifa-71cfbe428bd1