## Classification Performance Metrics

Throughout this blog, we seek to obtain good performance on our classification tasks. Classification is one of the most popular tasks in Machine Learning. Be sure you understand what classification is before going through this tutorial. If you need a refresher, check out this Introduction to Machine Learning, written specifically for hackers.
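To make "performance" a bit more concrete, here is a minimal sketch of three common metrics for a binary task: precision, recall, and F1. The function name `precision_recall_f1` and the toy labels are made up for illustration and are not taken from the tutorial itself.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy gold labels and predictions, invented for this example
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

With these toy labels there are three true positives, one false positive and one false negative, so all three metrics come out at 0.75.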

## Is it a boy or a girl? An introduction to Machine Learning

Have you ever noticed what happens when you hear a name you haven’t heard before? You automatically put it in a bucket: the girl names bucket or the boy names bucket. In this tutorial, we’re getting started with machine learning. We’ll be building a classifier able to distinguish between boy and girl names. If this sounds interesting, read along. If you expect a tonne of intricate math, read along anyway. It’s easier and more fun than you think.
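A classifier like this needs each name turned into features first. The excerpt does not show which features the tutorial uses, so the sketch below is only an assumption: it uses the first letter and the last one or two letters, and the helper name `name_features` is hypothetical.

```python
def name_features(name):
    """Turn a non-empty name into a small feature dict (illustrative only)."""
    name = name.lower()
    return {
        "first_letter": name[0],
        "last_letter": name[-1],
        "last_two_letters": name[-2:],
    }


print(name_features("Maria"))
print(name_features("John"))
```

Dictionaries like these can then be fed to most off-the-shelf classifiers after vectorization.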

## Splitting text into sentences

Few people realise how tricky splitting text into sentences can be. Most of the NLP frameworks out there already have English models created for this task.
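To see why the task is tricky, consider a naive splitter that breaks the text after every `.`, `!` or `?` followed by whitespace. This is a deliberately simplistic sketch, not how any NLP framework does it; the function name `naive_split` and the sample text are invented for the example.

```python
import re


def naive_split(text):
    """Split after ., ! or ? followed by whitespace (deliberately naive)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


text = "Dr. Smith went to Washington. He arrived at 5 p.m. sharp."
for piece in naive_split(text):
    print(repr(piece))
```

The text contains two sentences, but the naive rule produces four pieces because it also splits after the abbreviations "Dr." and "p.m.".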

You might encounter issues with the pretrained models if:

## What is Natural Language Processing?

This is probably the first post I should have written on the blog. The thing is, I did machine learning and natural language processing for a long time before putting the concepts in order inside my own mind.

I learned techniques and hacks to boost the precision of classifiers before fully understanding how a classifier computes its weights. So I guess it makes sense to publish a general introductory post after some real hands-on posts.

Here’s a popular diagram used to describe what data science usually implies:

## Getting Started with Sentiment Analysis

The most direct definition of the task is: “Does a text express a positive or negative sentiment?”. Usually, we assign a polarity value to a text. This value is usually in the [-1, 1] interval, 1 being very positive, -1 very negative.
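One simple way to produce such a polarity value is to average per-word scores from a sentiment lexicon. The sketch below assumes a tiny hand-made lexicon (`TOY_LEXICON`, with invented scores) and a hypothetical `polarity` helper; it is an illustration of the idea, not the approach used in the tutorial.

```python
import re

# Invented word scores in [-1, 1]; a real lexicon would be far larger
TOY_LEXICON = {
    "great": 0.8, "good": 0.5, "love": 0.9,
    "bad": -0.5, "terrible": -0.9, "hate": -0.8,
}


def polarity(text):
    """Average the scores of known words; 0.0 when no known word occurs."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    if not scores:
        return 0.0
    # Clamp defensively so the result always stays inside [-1, 1]
    return max(-1.0, min(1.0, sum(scores) / len(scores)))


print(polarity("I love this, it is great"))
print(polarity("A terrible and bad idea"))
print(polarity("The table is made of wood"))
```

The first text scores positive, the second negative, and the third gets 0.0 because none of its words are in the toy lexicon.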

## Training a NER System Using a Large Dataset

In a previous article, we studied training a NER (Named Entity Recognition) system from the ground up, using the Groningen Meaning Bank Corpus. This article is a continuation of that tutorial. The main goals of this extension are to:

1. Replace the classifier with a Scikit-Learn Classifier
2. Train a NER on a larger subset of the training data
3. Increase accuracy
4. Understand Out Of Core Learning

What was wrong with the initial system, you might ask? There wasn’t anything fundamentally wrong with the process. In fact, it’s a great didactic example, and we can build upon it. This is where it was lacking:

## Building an NLP pipeline in NLTK

If you have been working with NLTK for some time now, you probably find the task of preprocessing the text a bit cumbersome. In this post, I will walk you through a simple and fun approach for performing repetitive tasks using coroutines. The coroutines concept is a pretty obscure one but very useful indeed. You can check out this awesome presentation by David Beazley to grasp all the stuff needed to get you through this (plus much, much more).
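As a rough illustration of the idea, here is a minimal coroutine pipeline built on Python generators and `send()`. The stage names (`tokenize`, `lowercase`, `collect`) are hypothetical, and plain `str.split` stands in for a real NLTK tokenizer so the sketch runs without any external dependency.

```python
from functools import wraps


def coroutine(func):
    """Decorator that advances a generator to its first yield."""
    @wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def tokenize(target):
    # Receives raw text, sends a list of tokens downstream
    while True:
        text = yield
        target.send(text.split())


@coroutine
def lowercase(target):
    # Receives tokens, sends lowercased tokens downstream
    while True:
        tokens = yield
        target.send([token.lower() for token in tokens])


@coroutine
def collect(results):
    # Final stage: stores whatever it receives
    while True:
        tokens = yield
        results.append(tokens)


results = []
pipeline = tokenize(lowercase(collect(results)))
pipeline.send("Hello NLTK World")
pipeline.send("Coroutines Are Fun")
print(results)
```

Each stage only knows about the next one, so steps can be added, removed or reordered without touching the others.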

## What is chunking

Text chunking, also referred to as shallow parsing, is a task that follows Part-Of-Speech Tagging and that adds more structure to the sentence. The result is a grouping of the words in “chunks”. Here’s a quick example:
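The original example is not included in this excerpt, so the following is a stand-in sketch. It assumes the sentence is already POS-tagged with Penn Treebank tags and groups an optional determiner, any adjectives and one or more nouns into a noun phrase (NP) chunk. The function `chunk_noun_phrases` is hypothetical and written in plain Python.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into NP chunks: optional DT, any JJ*, one or more NN*."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":                      # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1].startswith("JJ"):
            j += 1                                    # any number of adjectives
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):
            k += 1                                    # one or more nouns
        if k > j:
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun found: emit the current token on its own
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumps", "VBZ"), ("over", "IN"),
            ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
for label, words in chunk_noun_phrases(sentence):
    print(label, words)
```

Here "The quick brown fox" and "the lazy dog" each end up in a single NP chunk, while "jumps" and "over" stay as individual tokens.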

## Complete guide to build your own Named Entity Recognizer with Python

• 29-Apr-2018 – Added Gist for the entire code

NER, short for Named Entity Recognition, is probably the first step towards information extraction from unstructured text. It basically means extracting real-world entities from the text (Person, Organization, Event, etc.).

## Stemmers vs. Lemmatizers

Stack Overflow is full of questions about why stemmers and lemmatizers don’t work as expected. The root cause of the confusion is that their role is often misunderstood. Here’s a comparison:
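The original comparison is not part of this excerpt. As a stand-in, the toy sketch below contrasts the two ideas: a stemmer chops suffixes by rule and may return a non-word, while a lemmatizer looks up the dictionary form. Both `toy_stem` and `toy_lemmatize`, along with the tiny lookup table, are invented for illustration and are far cruder than real implementations.

```python
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical mini dictionary; a real lemmatizer relies on a full lexicon
TOY_LEMMAS = {"studies": "study", "was": "be", "better": "good", "running": "run"}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least two characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
    return word


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


for word in ["studies", "was", "better", "running"]:
    print(word, toy_stem(word), toy_lemmatize(word))
```

The stemmer yields fragments such as "studi" and "wa", whereas the lookup returns "study" and "be", which is the kind of difference that surprises people.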
