Here is a collection of Python libraries for natural language processing (NLP) that can be invaluable for rapid prototyping.
Sentence Transformers – https://github.com/UKPLab/sentence-transformers
Out of the box, BERT and XLNet produce rather poor sentence embeddings. This library helps you produce sentence embeddings tuned for your specific task, which is useful for semantic textual similarity, clustering and semantic search.
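As a rough illustration of semantic search with this library, the sketch below encodes a few sentences and ranks them against a query by cosine similarity. The model name `all-MiniLM-L6-v2`, the example sentences, and the helper names `cosine_similarity`, `rank_by_similarity` and `demo` are assumptions made for illustration. It uses a pretrained model rather than one tuned for a specific task.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return corpus indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, vec) for vec in corpus_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def demo(model_name="all-MiniLM-L6-v2"):
    """Rank a toy corpus against a query (model name is an assumed default)."""
    # Imported lazily so the helpers above work without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    corpus = [
        "The cat sat on the mat.",
        "Stock markets fell sharply today.",
        "A kitten is resting on the rug.",
    ]
    query = "A cat is lying on a carpet."

    corpus_vecs = model.encode(corpus)     # one vector per sentence
    query_vec = model.encode([query])[0]   # single query vector

    for idx in rank_by_similarity(query_vec, corpus_vecs):
        print(corpus[idx])
```

Calling `demo()` downloads the pretrained model on first use and prints the corpus sentences ordered by similarity to the query.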
Rule-based Text Sentiment for Social Media
VaderSentiment – https://github.com/cjhutto/vaderSentiment
While deep learning models are cool, rule-based models still have their place under the sun, especially when you don’t have much data or time to tune a model. The library describes itself as specifically attuned to sentiments expressed in social media. This means that emoticons and sentiment intensity markers (e.g. “!!!”) are taken into account.
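A minimal sketch of scoring short social-media-style texts with VADER might look like this. The helper names `label_from_compound` and `score_texts` are hypothetical. The ±0.05 cut-offs on the compound score follow the thresholds suggested in the project's README and may need adjusting for your data.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_texts(texts):
    """Return (text, compound score, label) for each input text."""
    # Imported lazily so label_from_compound works without the library installed.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for text in texts:
        # polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound'.
        scores = analyzer.polarity_scores(text)
        compound = scores["compound"]
        results.append((text, compound, label_from_compound(compound)))
    return results
```

For example, `score_texts(["This is great!!!", "This is great", "meh :("])` lets you compare how punctuation and emoticons shift the compound score.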
Named Entity Recognition
spaCy is a popular open-source library that is suitable for production use. Apart from the default entity types, spaCy also lets us add arbitrary classes to the NER model by updating it with new training examples. This blog post shows how.
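To give a feel for adding a custom entity class, here is a small sketch assuming the spaCy v3 training API. The `GADGET` label, the toy sentences and the helper names are invented for illustration. For simplicity it trains a blank English pipeline rather than updating a pretrained model, and a real model would need far more examples.

```python
import random


def entity_span(text, phrase, label):
    """Return the (start, end, label) character span of phrase within text."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)


def build_training_data():
    """Toy training set for a hypothetical GADGET entity type."""
    samples = [
        ("I just bought a new smartwatch", "smartwatch"),
        ("My drone crashed into a tree", "drone"),
        ("She reads novels on her tablet", "tablet"),
    ]
    return [
        (text, {"entities": [entity_span(text, phrase, "GADGET")]})
        for text, phrase in samples
    ]


def train_custom_ner(train_data, n_iter=20):
    """Train a blank English pipeline with an NER component on train_data."""
    # Imported lazily so the helpers above work without spaCy installed.
    import spacy
    from spacy.training import Example

    nlp = spacy.blank("en")
    ner = nlp.add_pipe("ner")

    # Register every label that appears in the annotations.
    for _, annotations in train_data:
        for _, _, label in annotations["entities"]:
            ner.add_label(label)

    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]
    optimizer = nlp.initialize(lambda: examples)

    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        nlp.update(examples, sgd=optimizer, losses=losses)
    return nlp
```

After `nlp = train_custom_ner(build_training_data())`, the entities found in a new sentence are available as `nlp("I lost my drone").ents`.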
Production-Ready BERT Models
BERT-as-a-Service – https://github.com/hanxiao/bert-as-service
BERT-as-a-Service wraps the BERT code and serves it over ZeroMQ, so you can serve BERT embeddings with just a few lines of code, in a way that is designed to be fast (optimised), scalable and reliable.
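The client side can be sketched roughly as follows, assuming a server has already been started separately with something like `bert-serving-start -model_dir /path/to/bert_model -num_worker=1` (the model path is a placeholder). The `encode_in_batches` helper and its `batch_size` default are hypothetical conveniences, not part of the library. It only relies on the client exposing an `encode` method that takes a list of strings.

```python
def encode_in_batches(client, sentences, batch_size=64):
    """Encode sentences through client.encode, one bounded batch at a time."""
    vectors = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        # Each call returns one embedding per sentence in the batch.
        vectors.extend(client.encode(batch))
    return vectors


def demo():
    """Fetch embeddings from a locally running BERT-as-a-Service server."""
    # Imported lazily so encode_in_batches works without the client installed.
    from bert_serving.client import BertClient

    client = BertClient()  # connects to a server on localhost by default
    sentences = ["First do it", "then do it right", "then do it better"]
    vectors = encode_in_batches(client, sentences)
    print(len(vectors), "embeddings of size", len(vectors[0]))
```

Keeping the server and client separate means the model is loaded once and can be shared by several client processes.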
Analysis of Tweets on the Hong Kong Protest Movement 2019 with Python demonstrates the use of the Vader tool.