Roundup of Python NLP Libraries

This post gathers the most important libraries in the Python NLP ecosystem into a single list. Python is by far the most popular language for Natural Language Processing, so a list like this is a handy starting point. It is updated regularly as new libraries come into existence. If you are looking for a list of useful corpora, check out this NLP corpora list.

General Purpose

Name | Functionalities | Notes | URL
NLTK | tokenization, POS, NER, classification, sentiment analysis, access to corpora | Maybe the best known Python NLP library. Not entirely suited for production environments but really good for getting started | GitHub
spaCy | tokenization, POS, NER, classification, sentiment analysis, dependency parsing, word vectors | Efficient and performant NLP library built with Cython for speed | GitHub
Gensim | topic modelling, word vectors, access to corpora | Performant topic modelling library | GitHub
Stanford NLP | tokenization, POS, NER, classification, word vectors | The famous Stanford CoreNLP library | GitHub
Flair | tokenization, POS, NER, dependency parsing | A very simple framework for state-of-the-art NLP | GitHub
TextBlob | tokenization, POS, NER, classification, sentiment analysis, spellcheck, parsing | Pythonic library built upon NLTK and Pattern | GitHub
Polyglot | tokenization, POS, NER, classification, sentiment analysis, spellcheck, parsing | Library focusing on multilingual NLP. Models available for most languages. | GitHub
Pattern | tokenization, POS, NER, sentiment analysis, parsing | General purpose framework similar in purpose to NLTK | GitHub
scikit-learn | classification | General purpose machine learning framework with text classification features | GitHub
SkLearn CRF | sequence tagging | Sequence tagging classifiers following the scikit-learn API | GitHub
Ambiverse NLU | NER, concept extraction | A Natural Language Understanding suite by the Max Planck Institute for Informatics | GitHub
Textacy | tokenization, POS, NER, sentiment analysis, parsing, corpora access, topic modelling, statistics | High level library built on top of spaCy | GitHub
thinc | high-level deep learning models | spaCy’s deep learning infrastructure | GitHub
NLPNet | POS, parsing, SRL | Neural models for POS tagging, dependency parsing, semantic role labelling | GitHub
finetune | classification, entailment, sequence tagging | scikit-learn style model finetuning for NLP | GitHub
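
To give a feel for what "text classification features" means for scikit-learn in the table above, here is a minimal sketch. The training sentences and the "pos"/"neg" labels are made up purely for illustration, and the choice of TF-IDF features with logistic regression is just one reasonable assumption, not a recommendation from any of the libraries listed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data, invented for this example only.
train_texts = [
    "I loved this film, it was wonderful",
    "What a great and enjoyable movie",
    "Terrible plot and awful acting",
    "I hated it, a boring waste of time",
]
train_labels = ["pos", "pos", "neg", "neg"]

# Turn raw text into TF-IDF features, then fit a linear classifier on them.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Classify two unseen sentences; each prediction is one of the training labels.
predictions = model.predict(["a wonderful and enjoyable film", "boring and awful"])
print(list(predictions))
```

With only four training sentences the model is a toy, but the same pipeline shape carries over to real datasets.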

Spell Checking

Name | Functionalities | URL
JamSpell | spellcheck | GitHub
PySpellchecker | spellcheck | GitHub
PyEnchant | spellcheck | GitHub

Based on PyTorch

Name | Functionalities | Notes | URL
PyText | built-in neural models | NLP framework built on top of PyTorch from Facebook Research | GitHub
PyTorch-NLP | build neural models, corpora access | Simple high level framework built on top of PyTorch | GitHub
torchtext | corpora access | Load text data for processing with PyTorch | GitHub
AllenNLP | SRL, question answering, entailment | State-of-the-art deep learning models on a wide variety of linguistic tasks | GitHub

Visualizing Text

Name | Functionalities | Notes | URL
Scattertext | visualization | Perform visual exploratory text analysis | GitHub
word_cloud | visualization | Draw word clouds | GitHub

Chatbots

Name | Functionalities | Notes | URL
SnipsNLU | NLU engine | Pretrained NLU models available | GitHub
Rasa NLU | NLU engine | NLU engine that can use pretrained spaCy or MITIE models | GitHub
DeepPavlov | NLU engine, dialog system | Open source conversational AI library | GitHub

Miscellaneous

Name | Functionalities | Notes | URL
Splitta | sentence boundary detection | Statistical models for sentence boundary detection | GitHub
chardet | encoding detection | Universal Character Encoding Detector | GitHub
vocabulary | synonyms, dictionary | Dictionary as a module | GitHub
langdetect | language detection | well … it detects the language a text is written in 🙂 | GitHub
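
As a quick illustration of the encoding detection entry above, the sketch below feeds a byte string to chardet and prints its guess. The sample sentence is made up for the example, and the guess is a statistical estimate that can be unreliable on short inputs, so treat the reported encoding as a hint rather than a guarantee.

```python
import chardet

# Sample bytes, invented for this example: a German sentence encoded as UTF-8.
raw = "Natürliche Sprachverarbeitung macht Spaß, auch mit Umlauten wie ä, ö und ü.".encode("utf-8")

# detect() returns a dict with the guessed encoding and a confidence score.
result = chardet.detect(raw)
print(result["encoding"], result["confidence"])
```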