Natural Language Processing

Enabling computers to understand and process human languages,
bringing them closer to a human-level understanding of language.

About Natural Language Processing

The ultimate goal of Natural Language Processing (NLP) is to enable computers to understand language as well as we do. It is the driving force behind virtual assistants, speech recognition, sentiment analysis, automatic text summarization, machine translation and much more. Everyday examples include voice assistants such as Alexa and Siri, online translation services and text filtering.

Common Tools and Libraries

NLTK

NLTK is a suite of open source Python modules, data sets and tutorials supporting research and development in Natural Language Processing. NLTK offers lexical corpus integration (WordNet, stopwords, etc.), tokenization and sentiment analysis capabilities.

Developer's Resource:
http://www.nltk.org/api/nltk.html#
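
As a rough illustration of the tokenization support mentioned above, the sketch below splits a sentence into tokens and stems the remaining words. It assumes NLTK is installed and deliberately uses only components that need no corpus download; the sample sentence and the tiny stopword set are made up for this example and stand in for NLTK's own stopwords corpus.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "NLTK offers tokenizers, stemmers and corpus readers."

# wordpunct_tokenize is regex-based, so it works without downloading any data.
tokens = wordpunct_tokenize(text)

# Hand-written stopword set (an assumption for this sketch). NLTK's real
# stopword lists live in a corpus fetched with nltk.download("stopwords").
stopwords = {"and", "the", "a", "of"}

# Keep alphabetic, non-stopword tokens and reduce each one to its stem.
stemmer = PorterStemmer()
stems = [
    stemmer.stem(tok)
    for tok in tokens
    if tok.isalpha() and tok.lower() not in stopwords
]

print(tokens)
print(stems)
```

Swapping in the stopwords corpus or a WordNet lookup follows the same pattern, but both require a one-time data download first.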

OpenNLP

OpenNLP supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, language detection and coreference resolution.

Developer's Resource:
http://opennlp.apache.org/source-code.html

spaCy

spaCy excels at large-scale information extraction tasks. It's written from the ground up in carefully memory-managed Cython. It is well suited to preparing text for deep learning and interoperates with TensorFlow, PyTorch, scikit-learn, Gensim and the rest of Python's AI ecosystem.

Developer's Resource:
https://spacy.io/usage/spacy-101
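
To give a feel for how text is prepared with spaCy, here is a minimal sketch that tokenizes a sentence. It assumes spaCy is installed and uses a blank English pipeline, which contains only the tokenizer and so needs no trained model; the sample sentence is invented for illustration.

```python
import spacy

# spacy.blank("en") creates a tokenizer-only English pipeline.
# Tagging, parsing and entity recognition would need a trained pipeline
# loaded with spacy.load(...) instead.
nlp = spacy.blank("en")

doc = nlp("spaCy splits raw text into tokens.")

# Every token keeps its original text plus simple lexical flags.
tokens = [token.text for token in doc]
words = [token.text for token in doc if not token.is_punct]

print(tokens)
print(words)
```

The resulting word list is the kind of input that would typically be converted to IDs or vectors before being passed to a deep learning framework.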

TextBlob

TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation and more.

Developer's Resource:
https://textblob.readthedocs.io/en/dev

Microsoft Azure Text Analytics API

Microsoft Azure Text Analytics API can be used to extract information from text such as the language, sentiment, key phrases and entities.

Developer's Resource:
https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/

PyTorch-NLP

PyTorch-NLP is a library for Natural Language Processing (NLP) in Python. PyTorch-NLP comes with pre-trained embeddings, samplers, dataset loaders, metrics, neural network modules and text encoders.

Developer's Resource:
https://pytorchnlp.readthedocs.io/en/latest/

100E Use Cases

  1. A dating agency uses NLP models to match couples and help them understand each other better
    Technologies Used: spaCy
  2. A management company builds new risk models for commodities forecasting using alternative data sources
    Technologies Used: spaCy
  3. An investment company uses NLP models to identify specific types of documents within a collection of documents
    Technologies Used: spaCy

Open Datasets

GDELT Project

World's broadcast, print, and web news in over 100 languages.

Yelp Reviews

Business, review, and user data for personal, educational, and academic use.

WordNet

Lexical database of nouns, verbs, adjectives and adverbs grouped into sets of cognitive synonyms.

Blogger Corpus

Collected posts of 19,320 bloggers gathered from blogger.com.

Wikipedia Links Data

Data from web pages that contain at least one hyperlink that points to English Wikipedia.

Conversational Dataset

Collection of large datasets for conversational response selection from Reddit, OpenSubtitles and Amazon QA.

Related Articles