Stack Overflow is full of questions about why stemmers and lemmatizers don’t work as expected. The root cause of the confusion is that their role is often misunderstood. Here’s a comparison: both stemmers and lemmatizers try to bring inflected words to the same form. Stemmers use an algorithmic approach of removing prefixes and suffixes. The result […]
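To make the contrast concrete, here is a toy sketch in plain Python. It is not NLTK's implementation: the suffix list, the lookup table, and the names `toy_stem` and `toy_lemmatize` are all made up for illustration. It only shows the general idea that rule-based stripping can produce strings that are not dictionary words, while a dictionary-backed lookup returns real words but only knows what is in its table.

```python
# Toy illustration only: not Porter's algorithm and not a real lemmatizer.
SUFFIXES = ("ing", "ies", "ed", "es", "s")  # hypothetical rules, tried in order

# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
LEMMA_TABLE = {
    "ponies": "pony",
    "running": "run",
    "better": "good",
    "was": "be",
}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word):
    """Look the word up; fall back to the word itself if it is unknown."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ("ponies", "running", "better", "was"):
    print(w, toy_stem(w), toy_lemmatize(w))
```

With these made-up rules, "ponies" stems to "pon" and "running" to "runn", neither of which is a word, while the lookup maps "better" to "good", something no amount of suffix stripping would find.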
A simple recipe for text clustering. Clustering text sometimes causes issues in scikit-learn because text features are sparse.
Text classification is most probably the most frequently encountered Natural Language Processing task. It can be described as assigning texts to an appropriate bucket. A sports article should go in SPORT_NEWS, and a medical prescription should go in MEDICAL_PRESCRIPTIONS. To train a text classifier, we need some annotated data. This training data can be obtained through […]
In this example I want to show how to use some of the tools packed in NLTK to build something pretty awesome. Inverted indexes are a very powerful tool and are one of the building blocks of modern-day search engines. While building the inverted index, you’ll learn to: 1. Use a stemmer from NLTK […]
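For readers who have not met the data structure before, a minimal inverted index can be sketched in plain Python. This is an illustration under simplifying assumptions, not the code from the post: it skips the NLTK stemming step, uses a naive regex tokenizer, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical sample documents.
docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
    3: "A quick red fox",
}
index = build_index(docs)
print(sorted(search(index, "quick fox")))  # [1, 3]
```

The index is "inverted" because it maps words to documents rather than documents to words, so a query only has to intersect a few small sets instead of scanning every text.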