Training a Computer to Recognize Your Handwriting

The brain’s remarkable system of neurons is the inspiration behind a widely used machine learning technique called Artificial Neural Networks (ANN), which is used for tasks such as image recognition. Learn how you can use an ANN to recognize handwriting.

The Neurons that Inspired the Network

We first take a look at how neurons in our brains work. Our brain has a large network of interlinked neurons, which act as a highway for information to be transmitted from point A to point B. To send different kinds of information from A to B, the brain activates a different set of neurons, essentially using a different route to get from A to B. A typical neuron might look like this:

An illustration of a brain neuron, with its main components labeled.

At each neuron, its dendrites receive incoming signals sent by other neurons. If the neuron receives a high enough level of signals within a certain period of time, the neuron sends an electrical pulse into the terminals. These outgoing signals are then received by other neurons.

Technical Explanation I: How the Model Works

A simple Artificial Neural Network map, showing two scenarios with two different inputs but with the same output. Activated neurons along the path are shown in red.

Similarly, an ANN model has an input node, which is the image we give the program, and an output node, which is the digit that the program recognizes. The main characteristics of an ANN are as follows:

Step 1. When the input node is given an image, it activates a unique set of neurons in the first layer, starting a chain reaction that paves a unique path to the output node. In Scenario 1, neurons A, B, and D are activated in layer 1.

Step 2. The activated neurons send signals to every connected neuron in the next layer. This directly affects which neurons are activated in the next layer. In Scenario 1, neuron A sends a signal to E and G, neuron B sends a signal to E, and neuron D sends a signal to F and G.

Step 3. In the next layer, each neuron is governed by a rule on which combinations of received signals activate it (explained further below). In Scenario 1, neuron E is activated by the signals from A and B. However, the rules for neurons F and G tell them that they have not received the right signals to be activated, so they remain grey.

Step 4. Steps 2 and 3 are repeated for each remaining layer (a model can have more than 2 layers) until the signals reach the output node.

Step 5. The output node deduces the digit based on the signals received from neurons in the layer directly preceding it (layer 2). Each combination of activated neurons in layer 2 leads to one solution, though each solution can be represented by different combinations of activated neurons. In Scenarios 1 and 2, two different images are given as input. Because the images are different, the network activates a different set of neurons to get from the input to the output. However, the output node still recognizes both images as “6”.
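Steps 1 to 5 can be sketched as a toy forward pass through threshold neurons. This is a minimal illustration under stated assumptions, not the digit recognizer itself: layer 1 holds neurons A, B, C, D and layer 2 holds E, F, G, and every weight, threshold, the Scenario 2 activations, and the output lookup table are hypothetical values chosen so that Scenario 1 behaves as described. A real ANN learns such values during training.

```python
# Toy forward pass mirroring Steps 1-5. All weights, thresholds, the
# Scenario 2 input and the output table are hypothetical illustrations.

def step_layer(prev_active, weights, thresholds):
    """Steps 2-3: return the 0/1 activations of one layer.

    prev_active[i]: 1 if neuron i in the previous layer is activated
    weights[j][i]:  weight on the signal from neuron i to neuron j (0 = no connection)
    thresholds[j]:  minimum received signal strength for neuron j to activate
    """
    activations = []
    for row, m in zip(weights, thresholds):
        received = sum(w * a for w, a in zip(row, prev_active))
        activations.append(1 if received >= m else 0)
    return activations


def forward(layer1_active, layers):
    """Step 4: repeat Steps 2-3 for every remaining layer."""
    active = layer1_active
    for weights, thresholds in layers:
        active = step_layer(active, weights, thresholds)
    return active


# Layer 2 neurons E, F, G; columns are layer 1 neurons A, B, C, D.
LAYERS = [(
    [[1, 1, 0, 0],   # E listens to A and B
     [0, 0, 1, 1],   # F listens to C and D
     [1, 0, 0, 1]],  # G listens to A and D
    [2, 2, 3],       # hypothetical thresholds for E, F, G
)]

# Step 5: different layer-2 patterns can map to the same digit.
OUTPUT_RULES = {(1, 0, 0): 6, (0, 1, 0): 6}


def output_node(last_layer):
    """Return the recognized digit, or None for an unknown pattern."""
    return OUTPUT_RULES.get(tuple(last_layer))


scenario1 = [1, 1, 0, 1]  # Step 1: A, B and D activated
scenario2 = [1, 0, 1, 1]  # hypothetical activations for a second "6"
print(forward(scenario1, LAYERS), output_node(forward(scenario1, LAYERS)))  # [1, 0, 0] 6
print(forward(scenario2, LAYERS), output_node(forward(scenario2, LAYERS)))  # [0, 1, 0] 6
```

With these made-up values, Scenario 1 activates only E (F receives 1 and G receives 2, both below their thresholds), while the hypothetical Scenario 2 activates only F. Both patterns map to 6, which is the "different paths, same output" behavior described in Step 5.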

Technical Explanation II: Training the Model

We first need to decide the number of layers and the number of neurons in each layer of our ANN model. While there is no limit, a good starting point is 3 layers, with the number of neurons proportional to the number of input variables. For the digit recognizer ANN, we used 3 layers with 500 neurons each. Training a model involves two main ingredients: a metric to evaluate the model’s accuracy, and the neuron rules that determine whether each neuron is activated.
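To get a feel for how much training has to adjust, one can count the rule components (weights and m-values) in a fully connected network of the size described above. This sketch assumes 28x28-pixel input images (784 inputs) and 10 output classes, one per digit; the text states neither, so treat those numbers as hypothetical. It also assumes every neuron connects to every neuron in the next layer and has one adjustable m-value.

```python
def count_parameters(layer_sizes):
    """Count the weights and thresholds in a fully connected network.

    Every neuron in one layer connects to every neuron in the next, and each
    receiving neuron also has its own threshold (m-value, often called a bias).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per connection
        total += n_out         # one threshold per receiving neuron
    return total


# Hypothetical sizes: 784 pixel inputs, three layers of 500 neurons, 10 digit outputs.
sizes = [784, 500, 500, 500, 10]
print(count_parameters(sizes))  # 898510
```

Under these assumptions, training tunes almost 900,000 numbers, which is one reason ANNs are costly to train.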

A common metric is the sum of squared errors (SSE). A squared error is the square of the difference between a predicted value and the actual value, so it measures how close a predicted digit is to the actual digit, with larger misses penalized more heavily. The ANN model tries to minimize the SSE by changing the neuron rules, and the direction and size of each change are determined by a mathematical concept known as differentiation.
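The metric and the role of differentiation can be sketched in a few lines. This is a hedged illustration, not the model’s actual training code: it assumes the target digit is encoded as a one-hot vector of 10 values, uses a made-up prediction, and fits a single weight w in the toy model y = w * x by gradient descent, where the derivative of the SSE tells us which way to move w.

```python
def one_hot(digit, n_classes=10):
    """Encode a digit as a target vector: 1.0 at index `digit`, 0.0 elsewhere."""
    return [1.0 if i == digit else 0.0 for i in range(n_classes)]


def sse(predicted, actual):
    """Sum of squared errors: add up (predicted - actual)^2 over all values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))


def fit_single_weight(xs, targets, lr=0.01, steps=100):
    """Minimize SSE for the toy model y = w * x by gradient descent.

    The derivative d(SSE)/dw = sum(2 * (w * x - t) * x) says how the error
    changes as w grows, so stepping against it lowers the error.
    """
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - t) * x for x, t in zip(xs, targets))
        w -= lr * grad
    return w


# A made-up prediction that is fairly confident the image shows a 6.
prediction = [0.0] * 10
prediction[6] = 0.8
prediction[5] = 0.2
print(sse(prediction, one_hot(6)))  # about 0.08: (0.8 - 1)^2 + (0.2 - 0)^2

# Data generated by y = 2x, so gradient descent should move w close to 2.
print(fit_single_weight([1, 2, 3], [2, 4, 6]))
```

The learning rate lr is an assumed value: if it is too large, each step overshoots and w diverges instead of settling near 2.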

Each neuron’s rule has two components: the weight (i.e. strength) of each incoming signal [w], and the minimum received signal strength needed for activation [m]. In the following example, we illustrate the rule for neuron G. Zero weight is given to the signals from A and B (i.e. no connection), and weights of 1, 2, and -1 are given to the signals from C, D, and E respectively. The m-value for G is 2, so G is activated if:

  • D is activated and E is not, or
  • C and D are both activated (whether or not E is).

An example of neuron G’s rule. The braces below G indicate the received signal strength.
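The rule for neuron G can be checked by brute force. This sketch enumerates every on/off combination of C, D and E, adds up the weights of the activated inputs, and compares the total with m = 2. It assumes that reaching the minimum exactly (a total equal to m) is enough to activate G; A and B are left out because their weights are zero.

```python
from itertools import product

# Neuron G's rule from the example: weights on signals from C, D and E,
# and the minimum received signal strength m needed for activation.
WEIGHTS = {"C": 1, "D": 2, "E": -1}
M = 2


def g_is_activated(active):
    """Return True if the summed weights of the activated inputs reach m."""
    received = sum(WEIGHTS[name] for name in active)
    return received >= M


# Enumerate every on/off combination of C, D and E.
for states in product([False, True], repeat=3):
    active = [name for name, on in zip("CDE", states) if on]
    print(active, g_is_activated(active))
```

Only {D}, {C, D} and {C, D, E} activate G. The last case falls under the second condition: 1 + 2 - 1 = 2 still reaches m.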

Limitations

Computationally Expensive. Training an ANN model takes significantly more CPU power and time than training other types of models that can be used for a similar purpose (e.g. Random Forests), and the results are not necessarily better. Although ANNs have been known for decades, they were not widely used until advances in hardware made the computation feasible. However, ANNs are the basis for more advanced models such as Deep Neural Networks (DNN), which Google used in Oct 2015 and Mar 2016 to defeat human champions at Go, a game widely viewed as an unsolved “grand challenge” for Artificial Intelligence.

Lack of feature recognition. An ANN cannot recognize a feature of an image if the feature appears at a different size or location from the training examples. For example, if we want our ANN to recognize images of cats, and we give it only examples in which the cat appears at the bottom of the image, the ANN will not recognize the same cats when they appear at the top, or when they appear larger. An advanced version of ANNs called Convolutional Neural Networks (CNN) solves this problem by looking at various regions of the image. CNNs are also more efficient, and they are widely used in image and video recognition. For more information, check out a previous blog post, Introduction to Convolutional Neural Network.
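A toy one-dimensional example shows why location matters. In this sketch (all pixel values and weights are hypothetical), a fully connected neuron has one fixed weight per pixel position, so it fires only when the feature sits where it did in training. A convolution instead slides the same small kernel across every region and keeps the strongest response (max pooling), so the shifted feature is still detected. This is a simplified illustration of the CNN idea, not a full CNN.

```python
def dense_neuron(image, weights, m):
    """Fully connected neuron: one fixed weight per pixel position."""
    return sum(w * x for w, x in zip(weights, image)) >= m


def conv_max(image, kernel):
    """Slide the same kernel over every position and keep the strongest
    response (a 1-D convolution followed by max pooling)."""
    k = len(kernel)
    responses = [
        sum(w * x for w, x in zip(kernel, image[i:i + k]))
        for i in range(len(image) - k + 1)
    ]
    return max(responses)


# A 6-pixel "image" with a 2-pixel feature, then the same feature shifted.
left = [1, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 1, 0]

# Dense neuron whose weights only cover the positions seen in training.
weights = [1, 1, 0, 0, 0, 0]
print(dense_neuron(left, weights, 2), dense_neuron(shifted, weights, 2))  # True False

kernel = [1, 1]
print(conv_max(left, kernel) >= 2, conv_max(shifted, kernel) >= 2)  # True True
```

The dense neuron misses the shifted feature because its weights at positions 3 and 4 are zero, while the shared kernel finds the feature wherever it lies.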

Additional Notes for Advanced Readers

[This section is intended for readers who have a mathematics or computer science background and wish to implement their own ANN.]

The neuron’s rule described in the technical explanation is actually a mathematical function called an “activation function”. It gives zero (or near-zero) output when the input is low, and a positive output when the input is high enough. Commonly used activation functions include the sigmoid function and the rectifier function. The output node is also a function, usually the softmax function (a generalization of the logistic function). As such, the ANN can be viewed as “a grand function of functions”. Because it is built from differentiable functions (the rectifier is differentiable everywhere except at zero), we can use differentiation to find good weights through gradient descent. Lastly, note that it is essential to normalize the input data during implementation.
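The functions named above are short to write down. This is a plain-Python sketch (no libraries assumed) of the sigmoid, the rectifier (ReLU), softmax, a smooth version of the neuron rule, and one common way to normalize input: scaling 8-bit pixel values (0 to 255) into the range 0 to 1. The 8-bit assumption and all function names are illustrative, not taken from the original implementation.

```python
import math


def sigmoid(z):
    """Logistic function: close to 0 for very negative z, close to 1 for large z."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """Rectifier: exactly 0 for z <= 0, and z itself otherwise."""
    return max(0.0, z)


def softmax(scores):
    """Turn output-node scores into probabilities that sum to 1.

    Subtracting the largest score first avoids overflow in math.exp
    without changing the result.
    """
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def neuron(inputs, weights, bias, activation=sigmoid):
    """Smooth neuron rule: activation(weighted sum + bias).

    A bias of -m plays the role of the threshold m from the step rule.
    """
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def normalize_pixels(pixels, max_value=255):
    """Scale raw pixel intensities into [0, 1] (assumes 8-bit grayscale)."""
    return [p / max_value for p in pixels]


print(relu(-3.0), relu(2.5))                  # 0.0 2.5
print(sigmoid(0.0))                           # 0.5
print(softmax([1.3, 0.0])[0], sigmoid(1.3))   # the same value: softmax generalizes the logistic function
print(neuron([0, 1, 0], [1, 2, -1], bias=-2))  # 0.5: only D active, so 2 - 2 = 0 sits right at the threshold
print(normalize_pixels([0, 51, 255]))         # [0.0, 0.2, 1.0]
```

The neuron call reuses neuron G’s weights (1, 2, -1 for C, D, E) and m = 2: where the step rule gives a hard yes or no, the sigmoid gives a smooth value, which is what makes differentiation and gradient descent possible.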

Try out an ANN using this C++ code by Ben Graham: