Natural Language Processing with Python: The Free eBook

This free eBook is an introduction to natural language processing, and to NLTK, one of the most prevalent Python NLP libraries.

The field of natural language processing turns out to cover a broad range of related concepts, techniques, and approaches. Word vectors, dependency parsing, text classification, regular expressions, language models, and speech translation can all be grouped under the banner of NLP, though they are very different tasks and techniques.

Given the wide-ranging nature of this rapidly developing field, gaining a strong foundation in the basic concepts, and doing so in a practical manner, is of high importance. This week's highlighted free eBook, Natural Language Processing with Python, is a great way to build that foundation.



The Natural Language Toolkit (NLTK) is a general-purpose NLP library that, while not generally viewed as a choice for production systems, is well suited to teaching and learning how to implement some of the fundamental concepts of NLP. The accompanying book is designed specifically to guide a reader through this learning process.

From the book's preface:

This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. The book is intensely practical, containing hundreds of fully-worked examples and graded exercises.
This book is intended for a diverse range of people who want to learn how to write programs that analyze written language, regardless of previous programming experience.

As the preface suggests, the book is decidedly practical. Concepts are explained along the way, but the book is clearly crafted as a launchpad for those looking to start implementing NLP solutions with Python right away.

It should be noted that this version of the free online book has had its code updated to Python 3, as the original version, now a decade old, was written in Python 2. Also note that the book is not available as a PDF download, but instead is freely available on its site in HTML format.

The chapters of the book are as follows:

  1. Language Processing and Python
  2. Accessing Text Corpora and Lexical Resources
  3. Processing Raw Text
  4. Writing Structured Programs
  5. Categorizing and Tagging Words
  6. Learning to Classify Text
  7. Extracting Information from Text
  8. Analyzing Sentence Structure
  9. Building Feature Based Grammars
  10. Analyzing the Meaning of Sentences
  11. Managing Linguistic Data
  12. Afterword: Facing the Language Challenge

The book's preface also states these specific expected learning outcomes from the book:

  • How simple programs can help you manipulate and analyze language data, and how to write these programs
  • How key concepts from NLP and linguistics are used to describe and analyse language
  • How data structures and algorithms are used in NLP
  • How language data is stored in standard formats, and how data can be used to evaluate the performance of NLP techniques
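As a rough illustration of the first outcome, here is a minimal sketch of a simple program that analyzes language data by counting word frequencies. It is not taken from the book: it uses only Python's standard library (`re` and `collections.Counter`) so that it runs without installing anything, whereas the book works with NLTK, which provides `nltk.FreqDist` for this kind of count. The function name `word_frequencies` and the sample sentence are made up for this example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    # A deliberately crude tokenizer: runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample text, invented for this sketch.
sample = "The quick brown fox jumps over the lazy dog. The dog sleeps."
freqs = word_frequencies(sample)

# The two most frequent tokens and their counts.
print(freqs.most_common(2))  # [('the', 3), ('dog', 2)]
```

Real text needs a more careful tokenizer than this regular expression, which is one of the topics the book's early chapters cover.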

Sample chunk parser output visualization. From Natural Language Processing with Python Chapter 7.
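
To give a sense of what a chunk parser does, the following is a minimal sketch that groups already-tagged words into noun-phrase chunks using a regular expression over part-of-speech tags. It is an illustration written with the standard library only, not the book's code: NLTK offers `nltk.RegexpParser` for this task and returns a tree rather than the flat list used here. The function name `np_chunk`, the chunk rule (optional determiner, any number of adjectives, then a noun), and the hand-tagged sentence are assumptions made for this example.

```python
import re


def np_chunk(tagged):
    """Group (word, tag) pairs into NP chunks: optional DT, any JJ, then NN.

    Returns a list whose items are either an untouched (word, tag) pair
    or a ("NP", [pairs...]) entry for each chunk found.
    """
    # Encode the tag sequence as one string, e.g. "<DT><JJ><NN>".
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    pattern = re.compile(r"(?:<DT>)?(?:<JJ>)*<NN>")

    result = []
    pos = 0  # index of the next token not yet emitted
    for match in pattern.finditer(tag_string):
        # Counting "<" before an offset converts it to a token index.
        start = tag_string.count("<", 0, match.start())
        end = tag_string.count("<", 0, match.end())
        result.extend(tagged[pos:start])          # tokens outside any chunk
        result.append(("NP", tagged[start:end]))  # the chunk itself
        pos = end
    result.extend(tagged[pos:])
    return result


# Hand-tagged sentence, so no tagger or downloaded model is needed.
sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]
chunks = np_chunk(sentence)
for item in chunks:
    print(item)
```

With this input, the sketch finds two noun phrases ("the little yellow dog" and "the cat") and leaves "barked" and "at" outside any chunk.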


The book starts off slowly, describing NLP, how Python can be used to perform some NLP programming tasks, and how to access natural language content to process. It then moves on to bigger ideas, both conceptual (NLP) and programmatic (Python). Soon it gets to categorization, text classification, information extraction, and other topics more often thought of as classic NLP. After getting the basics of NLP with this book, you can move on to more modern and cutting-edge techniques, perhaps through the use of materials such as some of Stanford's free courses.

Best of luck to everyone wading into the natural language processing waters. This is a great book to start out with, and one which can be absorbed relatively quickly given its short length, meaning you can move on to more advanced topics in short order.