An Introduction to the MXNet Python API 

This post outlines an entire 6-part tutorial series on the MXNet deep learning library and its Python API. In-depth and descriptive, this is a great guide for anyone looking to start leveraging this powerful neural network library.

By Julien Simon, Principal Technical Evangelist, Amazon Web Services.

In this series, I will try to give you an overview of the MXNet Deep Learning library: we’ll look at its main features and its Python API (which I suspect will be the #1 choice). Later on, we’ll explore some of the MXNet tutorials and notebooks available online, and we’ll hopefully manage to understand every single line of code!

If you’d like to learn more about the rationale and the architecture of MXNet, you should read this paper, named “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems”. We’ll cover most of the concepts presented in the paper, but hopefully in a more accessible way.

MXNet Tutorial.

Part 1: Getting started

First things first: let’s install MXNet. You’ll find the official instructions here, but here are some additional tips.

One of the cool features of MXNet is that it can run identically on CPU and GPU (we’ll see later how to pick one or the other for our computations). This means that even if your computer doesn’t have an Nvidia GPU (just like my MacBook), you can still write and run MXNet code which you’ll use later on GPU-enabled systems.

Part 2: The Symbol API

In part 1, we covered some MXNet basics and then discussed the NDArray API (TL;DR: NDArrays are where we’re going to store data, parameters, etc.).

Now that we’ve got data covered, it’s time to look at how MXNet defines computation steps.
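To make the idea of "defining computation steps" a bit more concrete, here is a toy sketch in plain Python. It is not MXNet's Symbol API: the Sym class and the variable() helper are hypothetical names invented for this illustration (list_arguments only loosely echoes the MXNet method of the same name). The point it tries to show is the two-step style: first declare a graph of operations, then bind actual values and run it.

```python
class Sym:
    """A node in a tiny computation graph (toy stand-in, not mxnet.symbol)."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op          # "var", "add" or "mul"
        self.inputs = inputs  # child nodes this node depends on
        self.name = name      # only set for variables

    def __add__(self, other):
        return Sym("add", (self, other))

    def __mul__(self, other):
        return Sym("mul", (self, other))

    def list_arguments(self):
        """Return the variable names the graph needs, in first-use order."""
        if self.op == "var":
            return [self.name]
        names = []
        for node in self.inputs:
            for n in node.list_arguments():
                if n not in names:
                    names.append(n)
        return names

    def eval(self, **bindings):
        """Evaluate the graph once concrete values are bound to variables."""
        if self.op == "var":
            return bindings[self.name]
        left, right = (node.eval(**bindings) for node in self.inputs)
        return left + right if self.op == "add" else left * right


def variable(name):
    """Create a placeholder that gets its value only at evaluation time."""
    return Sym("var", name=name)


# Step 1: declare the computation. Nothing is computed here.
a, b, c = variable("a"), variable("b"), variable("c")
d = (a + b) * c

# Step 2: bind data to the placeholders and run the graph.
result = d.eval(a=2, b=3, c=4)
print(d.list_arguments(), result)
```

Because the whole graph is known before any data flows through it, a real library has room to optimize it (for example, by reusing memory or running independent branches in parallel), which this toy version makes no attempt to do.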

Part 3: The Module API

In this article, we’re going to use what we learned on Symbols and NDArrays to prepare some data and build a neural network. Then, we’ll use the Module API to train the network and predict results.
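As a rough picture of the "prepare data, train, predict" workflow, here is a minimal sketch in plain Python. It is a stand-in for illustration only, not the Module API: ToyModule is a hypothetical class wrapping a single perceptron, and its fit and predict methods merely mimic the shape of the workflow. The data set (the logical OR function) is also an assumption picked because it is tiny and linearly separable.

```python
class ToyModule:
    """Single perceptron with a fit/predict interface.

    Hypothetical class for illustration; it is not mxnet.mod.Module.
    """

    def __init__(self, num_inputs, learning_rate=1.0):
        self.weights = [0.0] * num_inputs
        self.bias = 0.0
        self.learning_rate = learning_rate

    def _forward(self, x):
        # Weighted sum followed by a step activation
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1 if z > 0 else 0

    def fit(self, data, labels, num_epoch=10):
        """Perceptron learning rule: nudge weights whenever a sample is misclassified."""
        for _ in range(num_epoch):
            for x, y in zip(data, labels):
                error = y - self._forward(x)
                self.weights = [
                    w + self.learning_rate * error * xi
                    for w, xi in zip(self.weights, x)
                ]
                self.bias += self.learning_rate * error

    def predict(self, data):
        return [self._forward(x) for x in data]


# Prepare some data: the four input combinations of logical OR
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 1]

# Train, then predict on the same samples
mod = ToyModule(num_inputs=2)
mod.fit(data, labels, num_epoch=10)
predictions = mod.predict(data)
print(predictions)
```

A real network has many layers and is trained with gradient descent on batches of data, but the overall rhythm (build the model, fit it on labeled data, then predict) is the same.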

Part 4: Using a pre-trained model for image classification (Inception v3)

In part 3, we built and trained our first neural network. We now know enough to take on more advanced examples.

State-of-the-art Deep Learning models are insanely complex. They have hundreds of layers and take days, if not weeks, to train on vast amounts of data. Building and tuning these models requires a lot of expertise.

Fortunately, using these models is much simpler and only requires a few lines of code. In this article, we’re going to work with a pre-trained model for image classification called Inception v3.

Architecture of a CNN (Source: Nvidia)

Part 5: More pre-trained models (VGG16 and ResNet-152)

In part 4, we saw how easy it was to use a pre-trained version of the Inception v3 model for image classification. In this article, we’re going to load two other famous Convolutional Neural Networks (VGG16 and ResNet-152) and we’ll compare them to Inception v3.

Part 6: Real-time object detection on a Raspberry Pi (and it speaks, too!)

In part 5, we used three different pre-trained models for object detection and compared them using a couple of images.

One of the things we learned is that models have very different memory requirements, the most frugal model being Inception v3 with “only” 43 MB. Obviously, this raises the question: “can we run this on something really small, say a Raspberry Pi?” Well, let’s find out!

Bio: Julien Simon (@julsimon) is a hands-on technology executive. Expert in web architecture & infrastructure. Scalability addict. Agile warrior.