
Getting started with Keras for NLP

In the previous tutorial on Deep Learning, we built a super simple network with numpy. I figured that the best next step is to jump right in and build some deep learning models for text. The best way to do this at the time of writing is by using Keras.

What is Keras?

Keras is a deep learning framework that, under the hood, uses other deep learning frameworks in order to expose a beautiful, simple-to-use and fun-to-work-with high-level API. Keras can use any of these backends:

Read More

Roundup of Python NLP Libraries

The purpose of this post is to gather the most important libraries of the Python NLP ecosystem into a single list. The list is important because Python is by far the most popular language for Natural Language Processing, and it is constantly updated as new libraries come into existence. In case you are looking for a list of useful corpora, check out this NLP corpora list.
Read More


Introduction to Deep Learning – Sentiment Analysis

Deep Learning is one of those hyper-hyped subjects that everybody is talking about and everybody claims they’re doing. In certain cases, startups just need to mention they use Deep Learning and they instantly get appreciation. Deep Learning is indeed a powerful technology, but it’s not an answer to every problem. It’s also not magic, despite how many people make it look.

In this post, we’ll do a gentle introduction to the subject. You’ll learn what a Neural Network is, how to train it and how to represent text features (in 2 ways). For this purpose, we’ll be using the IMDB dataset, which contains around 25,000 sentiment-annotated reviews. Deep Learning models usually require a lot of data to train properly; if you have little data, Deep Learning may not be the solution to your problem. In this case, the amount of data is a good compromise: it’s enough to train some toy models, and we don’t need to spend days waiting for the training to finish or use a GPU.

You can get the dataset from here: Kaggle IMDB Movie Reviews Dataset
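To make "what a Neural Network is and how to train it" concrete, here is a minimal sketch of the smallest possible network: a single sigmoid neuron (equivalent to logistic regression) trained with stochastic gradient descent on binary bag-of-words features. Assumptions: the four reviews are made-up toy data rather than IMDB samples, binary bag-of-words is only one common way to represent text (not necessarily one of the two ways the post uses), and the epoch count and learning rate are arbitrary.

```python
import math

def vectorize(text, vocabulary):
    """Binary bag-of-words: 1.0 if the vocabulary word occurs in the text, else 0.0."""
    words = set(text.lower().split())
    return [1.0 if term in words else 0.0 for term in vocabulary]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, features):
    """Forward pass of a single neuron: weighted sum plus bias, squashed by a sigmoid."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def train(samples, labels, epochs=200, learning_rate=0.5):
    """Stochastic gradient descent on the log loss, one example at a time."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            # For sigmoid + log loss, the gradient w.r.t. the pre-activation is (prediction - label).
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Hypothetical toy reviews (1 = positive, 0 = negative), not taken from IMDB.
reviews = ["great movie loved it", "awful movie hated it",
           "loved the acting great fun", "hated the plot awful ending"]
labels = [1, 0, 1, 0]
vocabulary = sorted({word for review in reviews for word in review.split()})
samples = [vectorize(review, vocabulary) for review in reviews]
weights, bias = train(samples, labels)

print(predict(weights, bias, vectorize("great fun", vocabulary)))   # > 0.5, i.e. positive
print(predict(weights, bias, vectorize("awful plot", vocabulary)))  # < 0.5, i.e. negative
```

A real network stacks many such neurons in layers, but the training loop (forward pass, compute the error, nudge every weight against its gradient) is the same idea.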

Let’s quickly explore the IMDB dataset:

Read More


Complete Guide to Word Embeddings

Introduction

We talked briefly about word embeddings (also known as word vectors) in the spaCy tutorial.
spaCy includes word vectors in its models. This tutorial goes deep into the intricacies of how to compute them and their different applications.

Bag Of Words Model
In most of our tutorials so far, we’ve been using a Bag-Of-Words model.
Take, for example, this article: Text Classification Recipe. Using the BOW model, we just keep counts of the words from the vocabulary. We don’t know anything about the words’ semantics.
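The limitation can be seen in a few lines of plain Python. This is a minimal sketch, assuming a hypothetical five-word vocabulary and whitespace tokenization: each vocabulary word gets its own dimension, so "good movie" and "great film" share no dimensions at all, while "good movie" and "bad movie" overlap on "movie".

```python
import math
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word appears in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocabulary = ["good", "great", "movie", "film", "bad"]  # hypothetical toy vocabulary
a = bag_of_words("good movie", vocabulary)  # [1, 0, 1, 0, 0]
b = bag_of_words("great film", vocabulary)  # [0, 1, 0, 1, 0]
c = bag_of_words("bad movie", vocabulary)   # [0, 0, 1, 0, 1]

print(cosine(a, b))                 # 0.0: no shared words, so BOW sees no similarity
print(cosine(a, c) > cosine(a, b))  # True: "bad movie" looks closer than "great film"
```

Word embeddings address exactly this: semantically similar words get nearby vectors, so near-synonyms are no longer orthogonal.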

Read More


Quick Recipe: Build a POS tagger using a Conditional Random Field

A while back I wrote a complete guide to training your own Part-Of-Speech Tagger. If you are new to Part-Of-Speech Tagging (POS Tagging), make sure you follow that tutorial first. This article builds on the work done there.

What is a CRF?

A Conditional Random Field (CRF for short) is a discriminative sequence labelling model. It’s a fairly easy model to explain (compared to Hidden Markov Models). Basically, given:

  1. some feature extractors (feature extractors need to output real numbers)
  2. weights associated with the features (which are learned)
  3. previous labels

predict the current label.

You probably just realized that CRFs seem totally appropriate for POS tagging. That’s true, and they’re also appropriate for other NLP tasks like named-entity extraction and chunking.
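The recipe above (feature extractors that output real numbers, weights on those features, and the previous label) can be sketched as a tiny linear-chain CRF decoder in plain Python. Assumptions: the tag set, the feature templates and the weights are all hypothetical and hand-set; a real CRF learns the weights from annotated data, and that training step is omitted here. Decoding uses the Viterbi algorithm to find the label sequence with the highest total weighted feature score.

```python
LABELS = ["DET", "NOUN", "VERB"]  # hypothetical toy tag set

# Hypothetical hand-set weights; a trained CRF would learn these from annotated data.
WEIGHTS = {
    "word=the|label=DET": 2.0,
    "word=dog|label=NOUN": 1.5,
    "word=barks|label=VERB": 1.0,
    "suffix=ks|label=VERB": 0.5,
    "prev=DET|label=NOUN": 1.0,
    "prev=NOUN|label=VERB": 1.0,
}

def extract_features(words, i, prev_label, label):
    """Hypothetical feature extractors; each feature outputs a real number (1.0 when it fires)."""
    word = words[i].lower()
    return {
        f"word={word}|label={label}": 1.0,
        f"suffix={word[-2:]}|label={label}": 1.0,
        f"prev={prev_label}|label={label}": 1.0,
    }

def local_score(words, i, prev_label, label):
    """Weighted sum of the features for assigning `label` at position i after `prev_label`."""
    features = extract_features(words, i, prev_label, label)
    return sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())

def viterbi(words):
    """Return the label sequence with the highest total score (linear-chain CRF decoding)."""
    if not words:
        return []
    # best[i][label] = (best score of any labelling of words[:i + 1] ending in label, previous label)
    best = [{label: (local_score(words, 0, "<START>", label), None) for label in LABELS}]
    for i in range(1, len(words)):
        best.append({
            label: max((best[i - 1][prev][0] + local_score(words, i, prev, label), prev)
                       for prev in LABELS)
            for label in LABELS
        })
    # Follow the back-pointers from the best final label.
    label = max(LABELS, key=lambda candidate: best[-1][candidate][0])
    path = [label]
    for i in range(len(words) - 1, 0, -1):
        label = best[i][label][1]
        path.append(label)
    return path[::-1]

print(viterbi(["The", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```

Because the score of a sequence is a sum of local terms that depend only on the current and previous label, Viterbi finds the best sequence without enumerating every possible labelling.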

Read More


Complete Guide to spaCy

Updates

  • 29-Apr-2018 – Fixed import in extension code (Thanks Ruben)

spaCy is a relatively new framework in the Python Natural Language Processing environment, but it is quickly gaining ground and will most likely become the de facto library. There are some really good reasons for its popularity:

It's really FAST
Written in Cython, it was specifically designed to be as fast as possible
It's really ACCURATE
spaCy's dependency parser is one of the best-performing in the world:
It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool

Batteries included
  • Index-preserving tokenization (details about this later)
  • Models for Part Of Speech tagging, Named Entity Recognition and Dependency Parsing
  • Supports 8 languages out of the box
  • Easy and beautiful visualizations
  • Pretrained word vectors
Extensible
It plays nicely with the existing tools you already know and love: scikit-learn, TensorFlow, gensim
Deep Learning Ready
It also has its own deep learning framework that’s especially designed for NLP tasks:
Thinc
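Index-preserving tokenization means every token remembers where it came from in the original string, so nothing is lost when you split text. The sketch below is not spaCy's tokenizer; it is a minimal stdlib illustration of the idea, assuming a hypothetical regular expression that treats runs of word characters and single punctuation marks as tokens.

```python
import re

# Hypothetical token pattern: runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenize_with_offsets(text):
    """Return (token, start) pairs so that text[start:start + len(token)] == token."""
    return [(match.group(), match.start()) for match in TOKEN_PATTERN.finditer(text)]

text = "Hello, world!"
tokens = tokenize_with_offsets(text)
print(tokens)  # [('Hello', 0), (',', 5), ('world', 7), ('!', 12)]
# Because offsets are kept, each token can be mapped back to the original string.
assert all(text[start:start + len(token)] == token for token, start in tokens)
```

In spaCy itself, each `Token` carries its character offset as `token.idx`.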

Read More


Complete Guide to Topic Modeling

What is Topic Modeling?

Topic modeling, in the context of Natural Language Processing, is described as a method of uncovering hidden structure in a collection of texts. Although that is indeed true, it is also a pretty useless definition. Let’s define topic modeling in more practical terms.
Read More


Quick Recipe: Building Word Clouds

What are Word Clouds?

Word Clouds are a popular way of displaying how important words are in a collection of texts. Basically, the more frequent a word is, the more space it occupies in the image. One of the uses of Word Clouds is to help us get an intuition about what the collection of texts is about. Here are some classic examples of when Word Clouds can be useful:

Read More
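The core mechanic of a Word Cloud (more frequent words take up more space) comes down to mapping word counts to font sizes. Here is a minimal sketch, assuming a simple linear scale between hypothetical `min_size` and `max_size` values and whitespace tokenization; it computes sizes only and leaves the actual layout and drawing to a plotting library.

```python
from collections import Counter

def font_sizes(texts, min_size=10, max_size=60, top_n=50):
    """Scale each of the top_n most frequent words linearly between min_size and max_size."""
    counts = Counter(word for text in texts for word in text.lower().split())
    most_common = counts.most_common(top_n)
    if not most_common:
        return {}
    highest = most_common[0][1]
    lowest = most_common[-1][1]
    span = highest - lowest or 1  # avoid dividing by zero when all counts are equal
    return {word: min_size + (count - lowest) * (max_size - min_size) / span
            for word, count in most_common}

print(font_sizes(["nlp is fun", "nlp is hard", "nlp rocks"]))
# {'nlp': 60.0, 'is': 35.0, 'fun': 10.0, 'hard': 10.0, 'rocks': 10.0}
```

In practice you would also drop stop words such as "is" before counting, so that the largest words actually say something about the texts.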


TextRank for Text Summarization

The task of summarization is a classic one and has been studied from different perspectives. It consists of picking a subset of a text so that the information conveyed by the subset is as close to the original text as possible. The subset, called the summary, should be human-readable. The task is not about picking the most common words or entities. Think of it as a quick digest of a news article.
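Extractive summarization with TextRank treats sentences as nodes in a graph, weights edges by sentence similarity, runs PageRank-style scoring and keeps the top-scoring sentences. The sketch below is a minimal stdlib version under some assumptions: sentences are split with a naive regular expression, similarity is word overlap normalized by the log of the sentence lengths (the measure proposed in the original TextRank paper), and the damping factor 0.85 and 50 iterations are conventional defaults rather than tuned values.

```python
import math
import re

def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())

def similarity(words_a, words_b):
    """Shared words divided by log(len(a)) + log(len(b))."""
    if not words_a or not words_b:
        return 0.0
    denominator = math.log(len(words_a)) + math.log(len(words_b))
    if denominator == 0:  # both sentences have exactly one word
        return 0.0
    return len(set(words_a) & set(words_b)) / denominator

def textrank(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    words = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(words[i], words[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank update: each sentence receives score from its similar neighbours.
        scores = [(1 - damping) + damping * sum(weights[j][i] / out_sums[j] * scores[j]
                                                for j in range(n) if out_sums[j] > 0)
                  for i in range(n)]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]  # keep original sentence order

text = ("The cat sat on the mat. The dog sat on the log. "
        "Cats and dogs are pets. The weather is nice today.")
print(textrank(text))  # ['The cat sat on the mat.', 'The dog sat on the log.']
```

Note that this picks whole sentences from the text rather than generating new ones, which keeps the summary human-readable.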
Read More

Language models

If you come from a statistical or machine learning background, you probably don’t need to be convinced that building language models is useful. If not, here’s what language models are and why they are useful.
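One of the simplest language models is a bigram model: it estimates the probability of each word given only the word before it, and scores a sentence by multiplying those probabilities (the chain rule under a first-order Markov assumption). This is a minimal sketch, assuming a hypothetical three-sentence corpus, whitespace tokenization, `<s>`/`</s>` boundary markers and unsmoothed maximum-likelihood estimates, so any unseen bigram gets probability 0.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows each previous word, with sentence boundary markers."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            counts[prev][word] += 1
    return counts

def bigram_probability(counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    following = counts.get(prev)
    if not following:
        return 0.0
    return following[word] / sum(following.values())

def sentence_probability(counts, sentence):
    """Chain rule with the bigram (first-order Markov) assumption."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(counts, prev, word)
    return probability

corpus = ["the cat sat", "the dog sat", "the cat ran"]  # hypothetical toy corpus
model = train_bigram_model(corpus)
print(bigram_probability(model, "the", "cat"))     # 2/3: two of the three words after "the"
print(sentence_probability(model, "the cat sat"))  # 1 * 2/3 * 1/2 * 1 = 1/3
```

Real language models add smoothing so that unseen word pairs do not zero out the whole sentence.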
Read More

NLP-FOR-HACKERS

The NLP-FOR-HACKERS Book