
Language models

If you come from a statistics or machine learning background, you probably don’t need convincing that language models are worth building. If not, here’s what language models are and why they are useful.

Natural Language Processing Corpora

One of the reasons it’s so hard to learn, practice and experiment with Natural Language Processing is the lack of available corpora. Building a gold standard corpus is seriously hard work, which is why such resources are scarce or cost a lot of money. In this post, I’m going to aggregate some cool resources, some very well known, some a bit under the radar.

Introduction to NLTK

NLTK (Natural Language ToolKit) is the most popular Python framework for working with human language. There’s some controversy over whether NLTK is appropriate for production environments. Here’s my take on the matter:

Weighting words using Tf-Idf

Updates

  • 29-Apr-2018 – Added a string instance check for Python 2.7 and Python 3.6 compatibility (thanks, Greg)

If I asked you “Do you remember the article about electrons in the NY Times?” there’s a better chance you would remember it than if I asked you “Do you remember the article about electrons in the physics books?”. Here’s why: an article about electrons is far less common in the NY Times than in a collection of physics books. You are less likely to stumble upon the “electron” concept in the NY Times than in a physics book, so when it does show up there, it stands out.
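To make that intuition concrete, here is a minimal sketch of one common Tf-Idf variant (term frequency times the log of the inverse document frequency). The function name `tf_idf` and the toy corpus are made up for illustration and are not taken from the full article, which may use a different weighting formula.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Illustrative variant: tf = count / document length,
    idf = log(number of documents / documents containing the term).
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy "newspaper" corpus, invented for this example.
news = [
    "the election results are in".split(),
    "the physicists observed the electron".split(),
    "the market fell again".split(),
]

weights = tf_idf(news)
# "the" appears in every document, so its weight is zero;
# "electron" appears in only one, so it gets a positive weight there.
print(weights[1]["the"], weights[1]["electron"])
```

In a corpus made only of physics texts, "electron" would appear in most documents and its weight would shrink toward zero, which is the effect the paragraph above describes.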

Is it a boy or a girl? An introduction to Machine Learning

Have you ever noticed what happens when you hear a name you haven’t heard before? You automatically put it in a bucket: the girl names bucket or the boy names bucket. In this tutorial, we’re getting started with machine learning. We’ll be building a classifier able to distinguish between boy and girl names. If this sounds interesting, read on. If you expect a tonne of intricate math, read on anyway. It’s easier and more fun than you think.

What is Natural Language Processing?

This is probably the first post I should have written on the blog. The thing is, I did machine learning and natural language processing for a long time before putting the concepts in order inside my own mind.

I learned techniques and hacks to boost the precision of classifiers before fully understanding how a classifier computes its weights. So I guess it makes sense to publish a general introductory post after some real hands-on posts.

Here’s a popular diagram used to describe what data science usually implies:

Training a NER System Using a Large Dataset

In a previous article, we studied training a NER (Named Entity Recognition) system from the ground up, using the Groningen Meaning Bank Corpus. This article is a continuation of that tutorial. The main goals of this extension are to:

  1. Replace the classifier with a Scikit-Learn Classifier
  2. Train a NER on a larger subset of the training data
  3. Increase accuracy
  4. Understand Out Of Core Learning
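The idea behind point 4 can be sketched in a few lines: instead of loading the whole corpus into memory, the model is updated one small batch at a time. The sketch below uses a tiny hand-written perceptron as a stand-in so that it runs without dependencies; it is not the classifier from the article, and the class `StreamingPerceptron`, the helper `batches`, the features and the labels are all hypothetical.

```python
from collections import defaultdict


def batches(stream, size):
    """Group any iterable of examples into lists of at most `size` items."""
    batch = []
    for example in stream:
        batch.append(example)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


class StreamingPerceptron:
    """Toy multiclass perceptron that learns incrementally (illustrative only)."""

    def __init__(self):
        # label -> feature -> weight
        self.weights = defaultdict(lambda: defaultdict(float))
        self.labels = set()

    def predict(self, features):
        # Sorting the labels makes tie-breaking deterministic.
        return max(
            sorted(self.labels),
            key=lambda label: sum(self.weights[label][f] for f in features),
        )

    def partial_fit(self, batch):
        """Update the weights from one batch; earlier batches can be discarded."""
        for features, label in batch:
            self.labels.add(label)
            guess = self.predict(features)
            if guess != label:
                for f in features:
                    self.weights[label][f] += 1.0
                    self.weights[guess][f] -= 1.0


# Made-up token features and tags; a real stream would be read lazily from disk.
toy_stream = iter([
    ({"cap", "word=london"}, "LOC"),
    ({"lower", "word=the"}, "O"),
    ({"cap", "word=paris"}, "LOC"),
    ({"lower", "word=in"}, "O"),
])

model = StreamingPerceptron()
for batch in batches(toy_stream, size=2):
    model.partial_fit(batch)  # only one batch is held in memory at a time

print(model.predict({"lower", "word=of"}))
```

Because each call to `partial_fit` only needs the current batch, the size of the training set is limited by disk space and time rather than by RAM.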

What was wrong with the initial system, you might ask? There wasn’t anything fundamentally wrong with the process. In fact, it’s a great didactic example, and we can build upon it. This is where it was lacking:
