This is probably the first post I should have written on the blog. The thing is, I did machine learning and natural language processing for a long time before putting the concepts in order inside my own mind. I’ve learned techniques and hacks to boost precision of classifiers before fully understanding how a classifier computes […]
The most direct definition of the task is: “Does a text express a positive or negative sentiment?”. Usually, we assign a polarity value to a text. This value is usually in the [-1, 1] interval, 1 being very positive, -1 very negative.
In a previous article, we studied training a NER (Named-Entity-Recognition) system from the ground up, using the Groningen Meaning Bank Corpus. This article is a continuation of that tutorial. The main purpose of this extension to training a NER is to: Replace the classifier with a Scikit-Learn Classifier Train a NER on a larger subset […]
If you have been working with NLTK for some time now, you probably find the task of preprocessing the text a bit cumbersome. In this post, I will walk you through a simple and fun approach for performing repetitive tasks using coroutines. The coroutines concept is a pretty obscure one but very useful indeed. You […]
What is chunking Text chunking, also referred to as shallow parsing, is a task that follows Part-Of-Speech Tagging and that adds more structure to the sentence. The result is a grouping of the words in “chunks”. Here’s a quick example:
NER, short for Named Entity Recognition is probably the first step towards information extraction from unstructured text. It basically means extracting what is a real world entity from the text (Person, Organization, Event etc …).
Stackoverflow is full of questions about why stemmers and lemmatizers don’t work as expected. The root cause of the confusion is that their role is often misunderstood. Here’s a comparison:
Wordnet is a lexical database created at Princeton University. Its size and several properties it holds make Wordnet one of the most useful tools you can have in your NLP arsenal. Here are a few properties that make Wordnet so useful:
It’s common in the world on Natural Language Processing to need to compute sentence similarity. Wordnet is an awesome tool and you should always keep it in mind when working with text. It’s of great help for the task we’re trying to tackle. Suppose we have these sentences:
Part-Of-Speech tagging (or POS tagging, for short) is one of the main components of almost any NLP analysis. The task of POS-tagging simply implies labelling words with their appropriate Part-Of-Speech (Noun, Verb, Adjective, Adverb, Pronoun, …).