Roundup of Python NLP Libraries

This post gathers the most important libraries in the Python NLP ecosystem into a single list. The list matters because Python is by far the most popular language for Natural Language Processing, and it is updated as new libraries come into existence. If you are looking for a list of useful corpora, check out this NLP corpora list.

General Purpose

| Name | Functionalities | Notes | URL |
|---|---|---|---|
| NLTK | tokenization, POS, NER, classification, sentiment analysis, access to corpora | Maybe the best-known Python NLP library. Not entirely suited for production environments, but really good for getting started | GitHub |
| spaCy | tokenization, POS, NER, classification, sentiment analysis, dependency parsing, word vectors | Efficient NLP library built with Cython for speed | GitHub |
| Gensim | topic modelling, word vectors, access to corpora | Performant topic modelling library | GitHub |
| Stanford NLP | tokenization, POS, NER, classification, word vectors | The famous Stanford CoreNLP library | GitHub |
| Flair | tokenization, POS, NER, dependency parsing | A very simple framework for state-of-the-art NLP | GitHub |
| TextBlob | tokenization, POS, NER, classification, sentiment analysis, spellcheck, parsing | Pythonic library built upon NLTK and Pattern | GitHub |
| Polyglot | tokenization, POS, NER, classification, sentiment analysis, spellcheck, parsing | Library focusing on multilingual NLP. Models available for most languages | GitHub |
| Pattern | tokenization, POS, NER, sentiment analysis, parsing | General-purpose framework similar in purpose to NLTK | GitHub |
| scikit-learn | classification | General-purpose machine learning framework with text classification features | GitHub |
| SkLearn CRF | sequence tagging | Sequence tagging classifiers following the scikit-learn API | GitHub |
| Ambiverse NLU | NER, concept extraction | A Natural Language Understanding suite by the Max Planck Institute for Informatics | GitHub |
| Textacy | tokenization, POS, NER, sentiment analysis, parsing, corpora access, topic modelling, statistics | High-level library built on top of spaCy | GitHub |
| thinc | high-level deep learning models | spaCy's deep learning infrastructure | GitHub |
| NLPNet | POS, parsing, SRL | Neural models for POS tagging, dependency parsing, and semantic role labelling | GitHub |
| finetune | classification, entailment, sequence tagging | Scikit-learn style model finetuning for NLP | GitHub |
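Tokenization is the first functionality listed for most of these libraries, so here is a minimal sketch of what it looks like in practice. It uses NLTK's `wordpunct_tokenize`, which is regex-based and needs no extra data downloads. The sample sentence is made up, and the fallback branch is only an assumption for environments where NLTK is not installed; it applies the same word/punctuation split with the standard library.

```python
import re

try:
    # Regex-based tokenizer from NLTK; no corpus or model download required.
    from nltk.tokenize import wordpunct_tokenize
except ImportError:
    # Fallback when NLTK is missing: split into runs of word characters
    # and runs of punctuation, which is the same rule the NLTK function applies.
    def wordpunct_tokenize(text):
        return re.findall(r"\w+|[^\w\s]+", text)

tokens = wordpunct_tokenize("Python is popular for NLP.")
print(tokens)
```

The other libraries in the table expose their own tokenizers with different rules and output types, so treat this only as a first taste of the task.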


Spellcheck

| Name | Functionalities | URL |
|---|---|---|
| JamSpell | spellcheck | GitHub |
| PySpellchecker | spellcheck | GitHub |
| PyEnchant | spellcheck | GitHub |
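To give a feel for what a spellchecker does under the hood, here is a rough standard-library sketch of the classic edit-distance approach popularised by Peter Norvig's spelling corrector. It is not the API of any library listed here: the `WORD_COUNTS` vocabulary, its counts, and the function names `edits1` and `correct` are all made up for illustration, and real libraries ship much larger dictionaries and smarter ranking.

```python
import string
from collections import Counter

# Toy vocabulary with invented counts; real spellcheckers use large frequency lists.
WORD_COUNTS = Counter({"language": 50, "natural": 40, "processing": 30, "library": 20})

def edits1(word):
    """Return every string one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the most frequent known word within one edit, or the word unchanged."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    return max(candidates, key=WORD_COUNTS.__getitem__) if candidates else word

corrected = [correct(w) for w in ["langauge", "procesing", "python"]]
print(corrected)
```

Words that are not within one edit of the vocabulary, like "python" in this toy setup, are returned as they are.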

Based on PyTorch

| Name | Functionalities | Notes | URL |
|---|---|---|---|
| PyText | built-in neural models | NLP framework built on top of PyTorch from Facebook Research | GitHub |
| PyTorch-NLP | build neural models, corpora access | Simple high-level framework built on top of PyTorch | GitHub |
| torchtext | corpora access | Load text data for processing with PyTorch | GitHub |
| AllenNLP | SRL, question answering, entailment | State-of-the-art deep learning models on a wide variety of linguistic tasks | GitHub |

Visualizing Text

| Name | Functionalities | Notes | URL |
|---|---|---|---|
| Scattertext | visualization | Perform visual exploratory text analysis | GitHub |
| word_cloud | visualization | Draw word clouds | GitHub |
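A word cloud sizes each word by how often it occurs, so the data behind one is just a frequency table. The sketch below builds such a table with the standard library only. The sample text, the tiny `STOPWORDS` set, and the `word_frequencies` helper are assumptions made up for this example, not part of any library above.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; real projects use a much longer one.
STOPWORDS = {"the", "is", "a", "of", "for", "and", "in"}

def word_frequencies(text):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

text = "Python is the most popular language for NLP, and NLP libraries for Python keep growing."
freqs = word_frequencies(text)
print(freqs.most_common(3))
```

A mapping like `freqs` is the kind of input that word_cloud's `WordCloud.generate_from_frequencies` accepts, if you want to turn the counts into an image.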


NLU Engines

| Name | Functionalities | Notes | URL |
|---|---|---|---|
| SnipsNLU | NLU engine | Pretrained NLU models available | GitHub |
| Rasa NLU | NLU engine | NLU engine that can use pretrained spaCy or MITIE models | GitHub |
| DeepPavlov | NLU engine, dialog system | Open-source conversational AI library | GitHub |


Miscellaneous

| Name | Functionalities | Notes | URL |
|---|---|---|---|
| Splitta | sentence boundary detection | Statistical models for sentence boundary detection | GitHub |
| chardet | encoding detection | Universal Character Encoding Detector | GitHub |
| vocabulary | synonyms, dictionary | Dictionary as a module | GitHub |
| langdetect | language detection | Well… it detects the language a text is written in 🙂 | GitHub |