Introduction

We talked briefly about word embeddings (also known as word vectors) in the spaCy tutorial. spaCy ships word vectors with its models. This tutorial will go deep into the intricacies of how to compute them and their different applications.
Updates: 29-Apr-2018 – Fixed import in extension code (Thanks Ruben)

spaCy is a relatively new framework in the Python Natural Language Processing ecosystem, but it is quickly gaining ground and will most likely become the de facto library. There are some really good reasons for its popularity:
Summarization is a classic task that has been studied from different perspectives. It consists of picking a subset of a text so that the information conveyed by the subset is as close as possible to that of the original text. The subset, called the summary, should be human readable. The task is not […]
NLTK (Natural Language Toolkit) is the most popular Python framework for working with human language. There’s a bit of controversy around the question of whether NLTK is appropriate for production environments. Here’s my take on the matter:
WordNet is a lexical database created at Princeton University. Its size and several of its properties make WordNet one of the most useful tools you can have in your NLP arsenal. Here are a few properties that make WordNet so useful:
It’s common in the world of Natural Language Processing to need to compute sentence similarity. WordNet is an awesome tool and you should always keep it in mind when working with text. It’s of great help for the task we’re trying to tackle. Suppose we have these sentences: