First Steps of Learning Deep Learning: Image Classification in Keras

Whether you want to start learning deep learning for your career, to have a nice adventure (e.g. with detecting huggable objects) or to get insight into machines before they take over, this post is for you!



Frameworks

 
There are a handful of popular deep learning libraries, including TensorFlow, Theano, Torch and Caffe. Each of them has a Python interface (now also for Torch: PyTorch).

So, which to choose? First, as always, screw all subtle performance benchmarks, as premature optimization is the root of all evil. What is crucial is to start with one which is easy to write (and read!), one with many online resources, and one that you can actually install on your computer without too much pain.

Bear in mind that core frameworks are multidimensional array expression compilers with GPU support. Current neural networks can be expressed as such. However, if you just want to work with neural networks, by the rule of least power, I recommend starting with a framework just for neural networks. For example…
 

Keras

 
If you like the philosophy of Python (brevity, readability, one preferred way to do things), Keras is for you. It is a high-level library for neural networks, using TensorFlow or Theano as its backend. Also, if you want to have a propaganda picture, there is a possibly biased (or overfitted?) popularity ranking:

If you want to consult a different source, based on arXiv papers rather than GitHub activity, see A Peek at Trends in Machine Learning by Andrej Karpathy. Popularity is important - it means that if you want to search for a network architecture, googling for it (e.g. UNet Keras) is likely to return an example. Where to start learning it? Documentation on Keras is nice, and its blog is a valuable resource. For a complete, interactive introduction to deep learning with Keras in Jupyter Notebook, I really recommend:

For shorter ones, try one of these:

There are a few add-ons to Keras, which are especially useful for learning it. I created an ASCII summary for sequential models to show data flow inside networks (in a nicer way than model.summary()). It shows layers, dimensions of data (x, y, channels) and the number of free parameters (to be optimized). For example, for a network for digit recognition it might look like:

           OPERATION           DATA DIMENSIONS   WEIGHTS(N)   WEIGHTS(%)

               Input   #####     32   32    3
              Conv2D    \|/  -------------------       896     0.1%
                relu   #####     32   32   32
              Conv2D    \|/  -------------------      9248     0.7%
                relu   #####     30   30   32
        MaxPooling2D   Y max -------------------         0     0.0%
                       #####     15   15   32
             Dropout    | || -------------------         0     0.0%
                       #####     15   15   32
              Conv2D    \|/  -------------------     18496     1.5%
                relu   #####     15   15   64
              Conv2D    \|/  -------------------     36928     3.0%
                relu   #####     13   13   64
        MaxPooling2D   Y max -------------------         0     0.0%
                       #####      6    6   64
             Dropout    | || -------------------         0     0.0%
                       #####      6    6   64
             Flatten   ||||| -------------------         0     0.0%
                       #####        2304
               Dense   XXXXX -------------------   1180160    94.3%
                relu   #####         512
             Dropout    | || -------------------         0     0.0%
                       #####         512
               Dense   XXXXX -------------------      5130     0.4%
             softmax   #####          10
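
The WEIGHTS(N) column above can be reproduced by hand. Here is a small sketch in plain Python (no Keras needed), assuming that every Conv2D layer uses 3x3 kernels and that every layer has a bias term; the table does not state either, but the numbers only match under these assumptions. The helper names conv2d_params and dense_params are made up for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """One kernel x kernel patch per input channel, plus one bias, for each filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input, plus one bias, for each output unit."""
    return (inputs + 1) * units


# Channel counts and the flattened size (6 * 6 * 64 = 2304) are read off the table.
# The 3x3 kernel size is an assumption.
layers = [
    ("Conv2D   3 -> 32", conv2d_params(3, 3, 32)),
    ("Conv2D  32 -> 32", conv2d_params(3, 32, 32)),
    ("Conv2D  32 -> 64", conv2d_params(3, 32, 64)),
    ("Conv2D  64 -> 64", conv2d_params(3, 64, 64)),
    ("Dense 2304 -> 512", dense_params(6 * 6 * 64, 512)),
    ("Dense  512 -> 10", dense_params(512, 10)),
]

total = sum(count for _, count in layers)

for name, count in layers:
    print(f"{name:<18} {count:>9} {100 * count / total:5.1f}%")
print(f"{'total':<18} {total:>9}")
```

Pooling, dropout and flattening have nothing to optimize, which is why they show 0 weights. Note that the first dense layer alone accounts for about 94% of all parameters.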


You might also be interested in nicer progress bars with keras-tqdm, exploration of activations at each layer with quiver, checking attention maps with keras-vis or converting Keras models to JavaScript, runnable in a browser with Keras.js. Speaking of languages, there is also an R interface to Keras.
 

TensorFlow

 
If not Keras, then I recommend starting with bare TensorFlow. It is a bit more low-level and verbose, but makes it straightforward to optimize various multidimensional array (or, well, tensor) operations. A few good resources:

In any case, TensorBoard makes it easy to keep track of the training process. It can also be used with Keras, via callbacks.
 

Other

 
Theano is similar to TensorFlow, but a bit older and harder to start with. For example, you need to manually write updates of variables. Typical neural network layers are not included, so one often uses libraries such as Lasagne. If you’re looking for a place to start, I like this introduction:

At the same time, if you see some nice code in Torch or PyTorch, don’t be afraid to install and run it!

EDIT (July 2017): If you want a low-level framework, PyTorch may be the best way to start. It combines relatively brief and readable code (almost like Keras) with low-level access to all features (actually, more than TensorFlow). Start here:

 

Datasets

 
Every machine learning problem needs data. You cannot just tell a computer “detect if there is a cat in this picture” and expect it to tell you the answer. You need to show it many instances of cats, and pictures not containing cats, and (hopefully) it will learn to generalize to other cases. So, you need some data to start. And it is not a drawback of machine learning or just deep learning - it is a fundamental property of any learning!

Before you dive into uncharted waters, it is good to take a look at some popular datasets. The key part about them is that they are… popular. It means that you can find a lot of examples of what works, and have a guarantee that these problems can be solved with neural networks.
 

MNIST

 

Many good ideas will not work well on MNIST (e.g. batch norm). Inversely many bad ideas may work on MNIST and no[t] transfer to real [computer vision]. - François Chollet’s tweet

Still, I recommend starting with the MNIST digit recognition dataset (60k grayscale 28x28 images), included in keras.datasets. You don’t need to master it; the point is just to get a sense that it works at all (or to test the basics of Keras on your local machine).
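
To make this concrete, here is a minimal sketch of such a first test: a single softmax layer (i.e. multiclass logistic regression) trained on MNIST. It assumes a TensorFlow installation that bundles Keras (from tensorflow import keras); the function names, the optimizer and the batch size are arbitrary choices for illustration, not a tuned setup.

```python
def build_softmax_classifier(input_shape=(28, 28), num_classes=10):
    """Flatten each image and map its pixels directly to class probabilities."""
    # Imported lazily; TensorFlow (with bundled Keras) is assumed to be installed.
    from tensorflow import keras

    model = keras.Sequential([
        keras.Input(shape=input_shape),
        keras.layers.Flatten(),  # 28 x 28 image -> 784 numbers
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",  # labels are integers 0..9
        metrics=["accuracy"],
    )
    return model


def train_on_mnist(epochs=1, batch_size=128):
    """Download MNIST (on first use), train the classifier and return it."""
    from tensorflow import keras

    (x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
    # Pixel values come as integers 0..255; rescale them to the range 0..1.
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0

    model = build_softmax_classifier()
    model.fit(
        x_train, y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(x_test, y_test),
    )
    return model
```

Calling train_on_mnist() should run comfortably on a laptop CPU. If it trains at all, your installation works, and you can start replacing the single Dense layer with something more interesting.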
 

notMNIST

 

Indeed, I once even proposed that the toughest challenge facing AI workers is to answer the question: “What are the letters ‘A’ and ‘I’?” - Douglas R. Hofstadter (1995)

A more interesting dataset, and harder for classical machine learning algorithms, is notMNIST (letters A-J from strange fonts). If you want to start with it, here is my code for notMNIST loading and logistic regression in Keras.
 

CIFAR

 
If you want to play with image recognition, there is the CIFAR dataset, a collection of 32x32 photos (also in keras.datasets). It comes in two versions: 10 simple classes (including cats, dogs, frogs and airplanes) and 100 harder and more nuanced classes (including beaver, dolphin, otter, seal and whale). I strongly suggest starting with CIFAR-10, the simpler version. Beware, more complicated networks may take quite some time to train (~12h on the CPU of my 7-year-old MacBook Pro).
 

More

 
Deep learning requires a lot of data. If you want to train your network from scratch, it may require as many as ~10k images even if low-resolution (32x32). Especially if data is scarce, there is no guarantee that a network will learn anything. So, what are the ways to go?

  • use really low res (if your eye can see it, no need to use higher resolution)
  • get a lot of data (for images like 256x256 it may be: millions of instances)
  • re-train a network that already saw a lot
  • generate much more data (with rotations, shifts, distortions)

Often, it’s a combination of everything mentioned here.
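
As an illustration of the last point, here is a toy sketch of data augmentation in plain Python, with an image represented as a list of rows. The function names and parameters are made up for this example. In practice you would rather use the augmentation utilities of your framework, which also support small-angle rotations and distortions.

```python
import random


def shift(image, dx, dy, fill=0):
    """Move the image by dx columns and dy rows, padding the gap with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def rotate_90(image):
    """Rotate the image clockwise by a quarter turn."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image, n_variants, max_shift=2, rotate=False, seed=0):
    """Generate randomly shifted (and optionally rotated) copies of one image."""
    rng = random.Random(seed)  # fixed seed, so the result is reproducible
    variants = []
    for _ in range(n_variants):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        variant = shift(image, dx, dy)
        if rotate:
            for _turn in range(rng.randrange(4)):
                variant = rotate_90(variant)
        variants.append(variant)
    return variants
```

Only use transformations that preserve the label: a small shift of a digit is still the same digit, but a quarter turn may not be (hence rotate=False by default), while for satellite or microscopy images rotations are usually fine.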
 

Standing on the shoulders of giants

 
Creating a new neural network has a lot in common with cooking - there are typical ingredients (layers) and recipes (popular network architectures). The most important cooking contest is the ImageNet Large Scale Visual Recognition Challenge, with recognition of 1,000 classes from a dataset of over a million photos. Look at these Neural Network Architectures, typically using 224x224x3 input (chart by Eugenio Culurciello):

Deep Learning Architectures - a scatter plot of network sizes, performances and ops per run

Circle size represents the number of parameters (a lot!). It doesn’t mention SqueezeNet though, an architecture vastly reducing the number of parameters (e.g. 50x fewer).

A few key networks for image classification can be readily loaded from the keras.applications module: Xception, VGG16, VGG19, ResNet50, InceptionV3. Some others are not as plug & play, but still easy to find online - yes, there is SqueezeNet in Keras. These networks serve two purposes:

  • they give insight into useful building blocks and architectures
  • they are great candidates for retraining (so-called transfer learning), when using the architecture along with pre-trained weights
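
The second point can be sketched as follows: take a network from keras.applications with its pre-trained ImageNet weights, freeze it, and train only a new output layer for your own classes. The sketch assumes a TensorFlow installation that bundles Keras (from tensorflow import keras). The function name, the choice of VGG16, the pooling layer and the optimizer are illustrative assumptions, not the only (or the best) recipe.

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Return a classifier built on a frozen VGG16 with a new, trainable output layer."""
    # Imported lazily; TensorFlow (with bundled Keras) is assumed to be installed.
    from tensorflow import keras

    # Convolutional part of VGG16, without its original 1000-class classifier.
    # weights="imagenet" downloads the pre-trained weights on first use.
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained filters fixed

    model = keras.Sequential([
        base,
        # Average each of the 512 feature maps to a single number.
        keras.layers.GlobalAveragePooling2D(),
        keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # expects one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

With model = build_transfer_model(num_classes=2) only the final Dense layer is trained, so even a few hundred images can give a sensible result. Remember to prepare images the same way as for the original network, e.g. with keras.applications.vgg16.preprocess_input.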

Some other important network architectures for images:

Another set of insights:

 

Infrastructure

 
For very small problems (e.g. MNIST, notMNIST), you can use your personal computer - even if it is a laptop and computations are on CPU.

For small problems (e.g. CIFAR, the unreasonable RNN), you might still be able to use a PC, but it requires much more patience and trade-offs.

For medium and larger problems, essentially the only way to go is to use a machine with a strong graphics card (GPU). For example, it took us 2 days to train a model for satellite image processing for a Kaggle competition, see our:

On a strong CPU it would have taken weeks, see:

The easiest, and the cheapest, way to use a strong GPU is to rent a remote machine on a per-hour basis. You can use Amazon (it is not only a bookstore!), here are some guides:

 

Further learning

 
I encourage you to interact with code. For example, notMNIST or CIFAR-10 can be great starting points. Sometimes the best start is to take someone else’s code and run it, then see what happens when you modify parameters.

For learning how it works, this one is a masterpiece:

When it comes to books, there is a wonderful one, starting from an introduction to mathematics and the machine learning context (it even covers log-loss and entropy in a way I like!):

Alternatively, you can use (it may be good for an introduction with interactive materials, but I’ve found the style a bit long-winded):

 

Other materials

 
There are many applications of deep learning (it’s not only image recognition!). I collected some introductory materials to cover its various aspects (beware: they are of various difficulty). Don’t try to read them all - I list them for inspiration, not intimidation!

 

Thanks

 
I would like to thank Kasia Kulma, Martina Pugliese, Paweł Subko, Monika Pawłowska and Łukasz Kidziński for helpful feedback on the content and Sarah Martin for polishing my English.

If you can recommend a source that helped you in your adventure with deep learning, feel free to contact me! (@pmigdal for short links, an email for longer remarks.)

The deep learning meme is not mine - I’ve just rewritten it from Theano to Keras (with TensorFlow backend).

  1. NOAA Right Whale Recognition, Winners’ Interview (1st place, Jan 2016), and a fresh one: Deep learning for satellite imagery via image segmentation (4th place, Apr 2017).
  2. This January during a 5-day workshop 6 high-school students participated in a rather NSFL project - constructing a neural network for detecting trypophobia triggers, see e.g. grzegorz225/trypophobia-detector and cytadela8/trypophobia_detector.
  3. It made a few episodes of webcomics obsolete: xkcd: Tasks (totally, by Park or Bird?), xkcd: Game AI (partially, by AlphaGo), PHD Comics: If TV Science was more like REAL Science (not exactly, but still it’s cool, by LapSRN).
  4. The title alludes to The Unreasonable Effectiveness of Mathematics in the Natural Sciences by Eugene Wigner (1960), one of my favourite texts in philosophy of science, along with More is Different by P. W. Anderson (1972) and Genesis and development of a scientific fact (pdf here) by Ludwik Fleck (1935).
  5. If your background is in quantum information, the only thing you need to change is ℂ to ℝ. Just expect less tensor structure, but more convolutions.
  6. Is it only me, or does Theano tensor dimension order sound like some secret convent? Before you start searching how to join it: it is about the shape of multi-dimensional arrays: (samples, channels, x, y) rather than TensorFlow’s (samples, x, y, channels)

 
Bio: Piotr Migdał (@pmigdal) is a data science freelancer with a PhD in quantum physics, based in Warsaw, Poland. He is active in gifted education, develops a quantum game and works as a data science instructor at deepsense.io.

Original. Reposted with permission.
