Menu Sidebar
Menu

Recipe: Text classification using NLTK and scikit-learn

Text classification is most probably, the most encountered Natural Language Processing task. It can be described as assigning texts to an appropriate bucket. A sports article should go in SPORT_NEWS, and a medical prescription should go in MEDICAL_PRESCRIPTIONS.

To train a text classifier, we need some annotated data. This training data can be obtained through several methods. Suppose you want to build a spam classifier. You would export the contents of your mailbox. You’d label the email in the inbox folder as NOT_SPAM and the contents of your spam folder as SPAM.
Read More

Building a simple inverted index using NLTK

In this example I want to show how to use some of the tools packed in NLTK to build something pretty awesome. Inverted indexes are a very powerful tool and is one of the building blocks of modern day search engines.

While building the inverted index, you’ll learn to:
1. Use a stemmer from NLTK
2. Filter words using a stopwords list
3. Tokenize text
Read More

How to convert between verb/noun/adjective/adverb forms using Wordnet

You might have stumbled in your NLP application development upon situations when you needed to get the “closest” adjective to a noun, or maybe you needed to “nounify” a verb. After poking around Wordnet I found a simple and pretty effective way to do this. Keep in mind that it is not error proof, but for most of my needs, I found it to perform pretty well. We’ll be using NLTK Wordnet wrapper for this. Let’s have a look at the code:
Read More

Newer Posts

NLP-FOR-HACKERS

I’m writing a book!

NLP-FOR-HACKERS Book

Pin It on Pinterest

Sign up for the Newsletter

Here's what to expect:

* Newly published content

* Curated articles from around the web about NLP and related

* Absolutely NO SPAM

You have Successfully Subscribed!