Training a Computer to Recognize Your Handwriting

The remarkable system of neurons is the inspiration behind a widely used machine learning technique called Artificial Neural Networks (ANN), used for image recognition. Learn how you can use this to recognize handwriting.

By Kenneth Soo, Stanford.

Take a look at the picture below and try to identify what it is:


One should be able to tell that it is a giraffe, despite it being strangely fat. We recognize images and objects instantly, even when they are presented in a form very different from what we have seen before. We do this with the roughly 86 billion neurons in the human brain working together to transmit information. This remarkable system of neurons is also the inspiration behind a widely used machine learning technique called Artificial Neural Networks (ANN), commonly used for image recognition. In some cases, computers using this technique have even outperformed humans in recognizing images.

The Problem

Image recognition is important for many of the advanced technologies we use today. It is used in visual surveillance, in guiding autonomous vehicles, and even in identifying the presence of diseases from X-ray images. Most modern smartphones also come with pre-installed image recognition programs that recognize handwriting and convert it into typed words.

In this chapter we will look at how we can train an ANN algorithm to recognize images of handwritten digits. We will be using the images from the famous MNIST (Modified National Institute of Standards and Technology) database.


Some of the handwritten digits in the MNIST database

An Illustration

We first train our ANN model (further explained later in the chapter) by giving it 10,000 examples of handwritten digits, each paired with its correct answer. After the ANN model is trained, we can test how well it performs by giving it 1,000 new handwritten digits without the correct answers. The model is then required to identify each digit.
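The train-then-test setup only gives an honest score if the test digits are kept apart from the training digits. The following is a minimal sketch of such a split, not the procedure used for the results in this chapter: the helper name `holdout_split`, the placeholder image names, and the fixed random seed are all assumptions made for illustration.

```python
import random


def holdout_split(examples, n_train, n_test, seed=0):
    """Shuffle a copy of the examples, then cut off a training and a test set."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test


# Hypothetical stand-ins for (image, correct digit) pairs.
examples = [(f"image_{i}", i % 10) for i in range(11_000)]

train_set, test_set = holdout_split(examples, n_train=10_000, n_test=1_000)
print(len(train_set), len(test_set))
```

Because both sets are cut from one shuffled list, no image can appear in both, so the test score reflects digits the model has never seen.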

At the start, the ANN translates handwritten images into a language it understands. Black pixels are given the value “0” and white pixels the value “1”. Each pixel in an image is called a variable.
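The translation from pixels to variables can be sketched as follows. This is an illustration only: it assumes the image arrives as a grid of grayscale values from 0 (black) to 255 (white), and the function name `image_to_variables` and the cut-off of 128 are hypothetical choices, not details given in this chapter.

```python
def image_to_variables(image, threshold=128):
    """Flatten a 2-D grayscale image into a list of 0/1 variables.

    Pixels darker than the threshold become 0 (black),
    all other pixels become 1 (white).
    """
    return [1 if pixel >= threshold else 0 for row in image for pixel in row]


# A hypothetical 3x3 image of a vertical white stroke on a black background.
tiny_image = [
    [0, 255, 0],
    [0, 255, 0],
    [0, 255, 0],
]

variables = image_to_variables(tiny_image)
print(variables)  # [0, 1, 0, 0, 1, 0, 0, 1, 0]
```

Each entry in the resulting list is one variable. A 28-by-28 pixel MNIST digit would yield 784 of them.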

Out of the 1,000 pictures that the model was asked to recognize, it correctly identified 922, an accuracy of 92.2%. We can use a contingency table to view the results, as shown below.

ANN Handwriting Contingency Table

Contingency Table showing the performance of the ANN model. For example, the first row tells us that out of 85 actual digit “0”s given to the model, 84 were correctly identified and 1 was wrongly identified as “6”. The last column indicates prediction accuracy.
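A contingency table of this kind is built by counting (actual digit, predicted digit) pairs. The sketch below uses a handful of made-up labels rather than the model's real predictions, and the function names `contingency_table` and `row_accuracy` are hypothetical.

```python
from collections import Counter


def contingency_table(actual, predicted):
    """Count how often each actual digit received each predicted digit."""
    return Counter(zip(actual, predicted))


def row_accuracy(table, digit):
    """Share of images of `digit` that the model identified correctly."""
    row_total = sum(count for (a, _), count in table.items() if a == digit)
    return table[(digit, digit)] / row_total


# Made-up labels for ten test images (not the chapter's results).
actual = [0, 0, 0, 0, 2, 2, 2, 2, 5, 5]
predicted = [0, 0, 0, 6, 2, 2, 7, 8, 5, 3]

table = contingency_table(actual, predicted)
correct = sum(count for (a, p), count in table.items() if a == p)
overall_accuracy = correct / len(actual)

print(overall_accuracy)        # 0.6
print(row_accuracy(table, 0))  # 0.75
print(table[(0, 6)])           # 1 image of "0" was predicted as "6"
```

The same arithmetic applied to the figures quoted above gives 922 / 1,000 = 92.2% overall, and 84 / 85, or about 98.8%, for the first row.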

From the table, we can see that when given a handwritten image of either “0” or “1”, the model almost always identifies it correctly. On the other hand, the digit “5” is the trickiest to identify. An advantage of using a contingency table is that it tells us the frequency of each kind of misidentification. When given an image of the digit “2”, the model misidentifies it as “7” or “8” about 8% of the time. Let’s take an in-depth look at some of these misidentified digits.

ANN Mis-Identification Errors

While these images clearly look like the digit “2” to human eyes, the ANN is sometimes unable to recognize shapes and features of images, such as the tail of the digit “2” (see Limitations). Another interesting observation is that the model mixes up the digits “3” and “5” fairly often (about 10% of the time).

ANN Mis-Identification Errors
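Misidentification frequencies like the ones quoted above come from dividing each off-diagonal count in a row of the contingency table by that row's total. The sketch below shows the calculation for a single row. The counts are hypothetical, chosen only to make the arithmetic easy to follow, and `misidentification_rates` is an illustrative name.

```python
def misidentification_rates(row, actual_digit):
    """Return {wrong digit: share of images} for one contingency-table row."""
    total = sum(row.values())
    return {
        predicted: count / total
        for predicted, count in row.items()
        if predicted != actual_digit and count > 0
    }


# Hypothetical counts for 100 images whose actual digit is 2:
# keys are predicted digits, values are how often each was predicted.
row_for_2 = {2: 90, 3: 2, 7: 4, 8: 4}

rates = misidentification_rates(row_for_2, actual_digit=2)
as_7_or_8 = rates[7] + rates[8]

print(rates)                # {3: 0.02, 7: 0.04, 8: 0.04}
print(round(as_7_or_8, 2))  # 0.08
```

Sorting these rates from largest to smallest is a quick way to see which digits a model confuses most often.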