Building a simple inverted index using NLTK

In this example I want to show how to use some of the tools packed into NLTK to build something pretty awesome. Inverted indexes are a very powerful tool and one of the building blocks of modern search engines.

While building the inverted index, you’ll learn to:
1. Use a stemmer from NLTK
2. Filter words using a stopwords list
3. Tokenize text
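
To make the three steps above concrete, here is a minimal sketch of how they could fit together into an inverted index. It is not necessarily the code from the full post: the names `analyze`, `build_index` and `lookup` are made up for illustration, the tiny stopword set is a hand-written stand-in for NLTK's stopwords corpus (which needs a separate download), and a simple regular-expression tokenizer is assumed so that no extra NLTK data is required.

```python
from collections import defaultdict

from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer

# Hand-written stand-in stopword list (an assumption for this sketch).
# With the NLTK "stopwords" corpus downloaded, you could use
# nltk.corpus.stopwords.words("english") instead.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "is", "in"}

tokenizer = RegexpTokenizer(r"\w+")  # splits on anything that is not a word character
stemmer = PorterStemmer()


def analyze(text):
    """Tokenize the text, drop stopwords, and stem what is left."""
    tokens = tokenizer.tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok not in STOPWORDS]


def build_index(documents):
    """Map each stemmed term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def lookup(index, word):
    """Return the sorted ids of the documents containing the word's stem."""
    terms = analyze(word)
    if not terms:  # the query was a stopword or had no word characters
        return []
    return sorted(index.get(terms[0], set()))


documents = [
    "Search engines build an inverted index.",
    "The index maps words to documents.",
    "Engines rank the documents.",
]
index = build_index(documents)
print(lookup(index, "engines"))  # [0, 2]
print(lookup(index, "the"))      # []
```

Because both the documents and the query go through the same `analyze` function, a query for "engines" matches "Engines" as well, and stopwords never make it into the index.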

How to convert between verb/noun/adjective/adverb forms using WordNet

In NLP application development you may have run into situations where you needed the “closest” adjective to a noun, or needed to “nounify” a verb. After poking around WordNet I found a simple and pretty effective way to do this. Keep in mind that it is not error-proof, but for most of my needs it performs pretty well. We’ll be using the NLTK WordNet wrapper for this. Let’s have a look at the code:
