Text Chunking with NLTK

What is chunking

Text chunking, also referred to as shallow parsing, is a task that follows Part-Of-Speech Tagging and adds more structure to the sentence. The result is a grouping of the words into “chunks”. Here’s a quick example:
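The original snippet isn't reproduced here, but a minimal sketch with NLTK's RegexpParser (the toy sentence and grammar are mine) shows what a chunked sentence looks like:

```python
import nltk

# A hand-tagged toy sentence (hypothetical example)
sentence = [("The", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Group an optional determiner plus adjectives and nouns into NP chunks
chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>+}")
tree = chunker.parse(sentence)
print(tree)  # an S tree containing two flat NP chunks
```

The words end up grouped into two NP chunks; everything else stays directly under the root.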

In other words, in a shallow parse tree there’s at most one level between the root and the leaves. A deep parse tree looks like this:
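The deep tree from the original post isn't shown here; a hand-built stand-in illustrates the nesting:

```python
from nltk import Tree

# A hand-written deep parse: phrases nest inside phrases, so there can be
# many levels between the root and the leaves
deep = Tree.fromstring(
    "(S (NP (DT the) (NN cat)) "
    "(VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))")
deep.pretty_print()
```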

There are advantages and drawbacks to each. The most obvious advantage of shallow parsing is that it’s an easier task, so a shallow parser can be more accurate. Also, working with chunks is much easier than working with full-blown parse trees.

Chunking is a task very similar to Named-Entity Recognition. In fact, the same IOB-tagging format is used. You can read about it in the post about Named-Entity Recognition.

Corpus for Chunking

Good news: NLTK provides a handy corpus for training a chunker. Chunking was part of the CoNLL-2000 shared task. You can read a paper about the task here: Introduction to the CoNLL-2000 Shared Task: Chunking

Let’s have a look at the corpus:

Here’s the first annotated sentence in the corpus:

We already approached a very similar problem on the blog: Named Entity Recognition. The approach we’re going to take here is almost identical; what differs is the feature extraction and, of course, the corpus: this time we’re using the CoNLL-2000 corpus. Let’s remind ourselves how to transform between the nltk.Tree and the IOB format:

Let’s get an idea of how large the corpus is:

That’s a decent amount of data for producing a well-behaved chunker.

Training a chunker

We’re going to train two chunkers, just for the fun of it, and then compare them.

Preparing the training and test datasets

NLTK TrigramTagger as a chunker

We’re going to train a chunker using only the Part-Of-Speech as information.

Classifier based tagger

We’re now going to do something very similar to the code we implemented in the NER article.

We can see that the difference in performance between the trigram-model approach and the classifier-based approach is significant. I’ve picked only the features that worked best in this case, but be sure to play with them a little. You might get better performance with a different feature set.

Let’s take our new chunker for a spin:


  • Text chunking can be reduced to a tagging problem
  • Chunking and Named-Entity Recognition are very similar tasks
  • Chunking is also called shallow parsing
  • Deep parsing creates the full parse tree; shallow parsing adds a single extra level to the tree

That’s all. Happy chunking!