A Guide to Train an Image Classification Model Using Tensorflow

Classify images at scale and with very high accuracy with the advent of machine learning and deep learning algorithms.

By Vidhi Chugh, KDnuggets AI Strategy Content Specialist on December 28, 2022 in Computer Vision

Humans learn to identify and label visuals at a very early age. Now, computers are able to classify images at scale and with very high accuracy with the advent of machine learning and deep learning algorithms. Such advanced algorithms have a multitude of applications — the common ones include distinguishing healthy lung scans, facial recognition by a mobile device, or classifying objects into different categories for a retailer.

A Guide to Train an Image Classification Model Using Tensorflow

Facial Recognition

The post explains one such application of computer vision i.e. image classification and illustrates how to train a model on a small dataset of images using Tensorflow.

Dataset and Objectives

For the purpose of this demo, we will use the MNIST dataset which contains images of digits from 0 to 9. Sample images are shown below:

Tensorflow-dataset

The objective of training this model is to classify the images to their respective label i.e. their respective digit equivalent. Deep neural network architecture with one input, one output, two hidden, and one dropout layer is used for training the model. CNN or Convolutional Neural Network is the preferred choice for larger images because of its ability to capture relevant information while reducing the size of the input.

Getting Started

Firstly, import all the relevant libraries starting with TensorFlow, to_categorical (for converting numeric class values to categories), Sequential, Flatten, Dense, and Dropout for building Neural Network architecture. If some of these libraries are new to you, then do not worry. They are explained in the upcoming section.

Hyperparameters

The following pointers help you choose the right set of hyperparameters:

Let's define some hyperparameters as a starting point, you can tune them to run different experiments. We have chosen a mini-batch size of 128. The batch size can take any value but selecting a batch size as a power of 2 is memory efficient and hence, is a preferred choice. Let us also understand one of the prime reasoning behind deciding appropriate batch size – a tiny batch size would make the convergence very noisy and a very large batch size might not fit in the memory of your computer.
Let's keep the number of epochs as 50 to quickly train the model. The dataset is small and simple and justifies the low epoch number.
Next, you need to add hidden layers. We have kept two hidden layers of 128 neurons each – you can experiment with 64 and 32 as well. Higher numbers are not recommended for a simple dataset like MINST.
You can try different learning rates like 0.01, 0.05, and 0.1. For the purpose of this demo, it is kept at 0.01.
Other hyperparameters like decay steps and decay rate are chosen as 2000, and 0.9 respectively. They are used to reduce the learning rate as the training progresses.
Adamax is picked as an optimizer, although you have options of other optimizers like Adam, RMSProp, SGD, etc to choose from. You can read more about the available list of optimizers along with their differences to choose the right one for your solution.

import tensorflow as tf
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout

params = {
    'dropout': 0.25,
    'batch-size': 128,
    'epochs': 50,
    'layer-1-size': 128,
    'layer-2-size': 128,
    'initial-lr': 0.01,
    'decay-steps': 2000,
    'decay-rate': 0.9,
    'optimizer': 'adamax'
}

mnist = tf.keras.datasets.mnist  
num_class = 10

# split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# reshape and normalize the data
x_train = x_train.reshape(60000, 784).astype("float32")/255
x_test = x_test.reshape(10000, 784).astype("float32")/255

# convert class vectors to binary class matrices
y_train = to_categorical(y_train, num_class)
y_test = to_categorical(y_test, num_class)

Creating Train and Test Set

Tensorflow library also includes the MNIST dataset which you can get by calling datasets.mnist and then load_data() on the object to get the train (60,000 samples) and test (10,000 samples) datasets separately.
Next, you need to reshape and normalize the training and test images, where normalization bounds image pixel intensity between 0 and 1.
Convert train and test labels to categorical by using the to_categorical method imported earlier. This is essential to communicate to the TensorFlow framework that the output labels i.e. 0 to 9 are classes and not numerical in nature.

Designing Neural Network Architecture

It is important to understand the nuanced details of how to design a neural network architecture.

Define DNN (Deep Neural Network) structure by adding Flatten to convert 2D image matrices to vectors. The input neurons correspond to the numbers in these vectors.
Next, Dense() method is used to add two hidden dense layers pulling in the hyperparameters from the “params” dictionary defined earlier. Let’s use the activation function for these layers as “relu” i.e. Rectified Linear Unit which is one of the most used activation functions in neural network hidden layers.
Next add the dropout layer using the Dropout method. It is used to avoid overfitting while training the neural network. An overfit model has a tendency to remember the training set exactly and is unable to generalize over unseen datasets.
The output layer is the last layer in our network, which is defined using the Dense() method. It is important to note that the output layer has 10 neurons corresponding to the number of classes (digits).

# Model Definition
# Get parameters from logged hyperparameters
model = Sequential([
  Flatten(input_shape=(784, )),
  Dense(params('layer-1-size'), activation='relu'),
  Dense(params('layer-2-size'), activation='relu'),
  Dropout(params('dropout')),
  Dense(10)
  ])

lr_schedule = 
    tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=experiment.get_parameter('initial-lr'),
    decay_steps=experiment.get_parameter('decay-steps'),
    decay_rate=experiment.get_parameter('decay-rate')
    )

loss_fn = tf.keras.losses.CategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adamax', 
              loss=loss_fn,
              metrics=['accuracy'])

model.fit(x_train, y_train,
    batch_size=experiment.get_parameter('batch-size'),
    epochs=experiment.get_parameter('epochs'),
    validation_data=(x_test, y_test),)

score = model.evaluate(x_test, y_test)

# Log Model
model.save('tf-mnist-comet.h5')

Time to Train

Now that we have defined the architecture, let’s compile and train the neural network with the given training data.

Define a learning rate schedule with ExponentialDecay (exponentially decaying learning rate) with initial learning rate, decay steps, and decay rate as arguments.
Define loss function as CategoricalCrossentropy (for multi-class classification).
Compile the model by passing the optimizer (adamax), loss function, and metrics (choosing accuracy because all classes are equally important and uniformly distributed) as arguments.
Fit the model by calling a fit method with x_train, y_train, batch_size, epochs, and validation_data.
Call the evaluate method over the model object to get a score for how well the model is performing on the unseen dataset.
You can save the model object to be deployed in production using the save method called over the model object.

The post explained the primer to train a Deep Neural Network for an Image Classification task and is a good starting point to get familiar with image classification tasks using neural networks. It elaborated on the choices and reasoning behind how we choose the right set of parameters and architecture, in general.

Vidhi Chugh is an AI strategist and a digital transformation leader working at the intersection of product, sciences, and engineering to build scalable machine learning systems. She is an award-winning innovation leader, an author, and an international speaker. She is on a mission to democratize machine learning and break the jargon for everyone to be a part of this transformation.