The purpose of this post is to gather the most important libraries in the Python NLP ecosystem into a single list. This list matters because Python is by far the most popular language for Natural Language Processing. The list is constantly updated as new libraries come into existence. In case you are looking […]
Deep Learning is one of those hyper-hyped subjects that everybody is talking about and everybody claims they’re doing. In certain cases, startups just need to mention they use Deep Learning and they instantly get appreciation. Deep Learning is indeed a powerful technology, but it’s not an answer to every problem. It’s also not magic like […]
Introduction: We talked briefly about word embeddings (also known as word vectors) in the spaCy tutorial. spaCy has word vectors included in its models. This tutorial will go deep into the intricacies of how to compute them and their different applications.
A while back I wrote a Complete guide for training your own Part-Of-Speech Tagger. If you are new to Part-Of-Speech Tagging (POS Tagging), make sure you follow that tutorial first. This article is an enhancement of the work done there. What is a CRF? A Conditional Random Field (CRF for short) is a […]
Updates: 29-Apr-2018 – Fixed import in extension code (Thanks Ruben). spaCy is a relatively new framework in the Python Natural Language Processing environment, but it is quickly gaining ground and will most likely become the de facto library. There are some really good reasons for its popularity:
What is Topic Modeling? Topic modeling, in the context of Natural Language Processing, is described as a method of uncovering hidden structure in a collection of texts. Although that is indeed true, it is also a pretty useless definition. Let’s define topic modeling in more practical terms.
What are Word Clouds? Word Clouds are a popular way of displaying how important words are in a collection of texts. Basically, the more frequent a word is, the more space it occupies in the image. One of the uses of Word Clouds is to help us get an intuition about what the collection of […]
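As a rough illustration of the "more frequent means bigger" idea, here is a minimal sketch that counts words across a few texts and maps each count to a font size. The function name `word_sizes`, the `min_size` and `max_size` parameters, the simple regex tokenizer, and the linear scaling are all assumptions made for this example; actual word cloud libraries may tokenize and scale differently.

```python
import re
from collections import Counter


def word_sizes(texts, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency.

    Hypothetical helper: the size range and the linear scaling are
    illustrative choices, not what any particular library does.
    """
    words = []
    for text in texts:
        # Naive tokenizer: lowercase runs of letters and apostrophes.
        words.extend(re.findall(r"[a-z']+", text.lower()))

    counts = Counter(words)
    if not counts:
        return {}

    top = max(counts.values())
    # The most frequent word gets max_size; rarer words get proportionally less.
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }


sizes = word_sizes(["the cat sat", "the cat ran", "the dog"])
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.1f}")
```

In practice you would usually drop stop words such as "the" before counting, otherwise they dominate the image.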
The task of summarization is a classic one and has been studied from different perspectives. The task consists of picking a subset of a text so that the information conveyed by the subset is as close to that of the original text as possible. The subset, named the summary, should be human readable. The task is not […]
If you come from a statistical or machine learning background, you probably don’t need to be told why it’s useful to build language models. If not, here’s what language models are and why they are useful.
One of the reasons it’s so hard to learn, practice and experiment with Natural Language Processing is the lack of available corpora. Building a gold standard corpus is seriously hard work. That’s why resources are so scarce or cost a lot of money. In this post, I’m going to aggregate some cool […]