Stemmers vs. Lemmatizers
Stackoverflow is full of questions about why stemmers and lemmatizers don’t work as expected. The root cause of the confusion is that their role is often misunderstood. Here’s a comparison:
Stackoverflow is full of questions about why stemmers and lemmatizers don’t work as expected. The root cause of the confusion is that their role is often misunderstood. Here’s a comparison:
Simple recipe for text clustering. This sometimes creates issues in scikit-learn because text has sparse features.
Text classification is most probably, the most encountered Natural Language Processing task. It can be described as assigning texts to an appropriate bucket. A sports article should go in SPORT_NEWS, and a medical prescription should go in MEDICAL_PRESCRIPTIONS. To train a text classifier, we need some annotated data. This training data can be obtained through […]
In this example I want to show how to use some of the tools packed in NLTK to build something pretty awesome. Inverted indexes are a very powerful tool and is one of the building blocks of modern day search engines. While building the inverted index, you’ll learn to: 1. Use a stemmer from NLTK […]