Deep Learning can be easily fooled

It is almost impossible for human eyes to label the images below as anything but abstract art. However, researchers found that a Deep Neural Network (DNN) labels them as familiar objects with 99.99% confidence. The generality of DNNs is called into question once again.


In a post I wrote last year, I discussed how a Deep Neural Network can mislabel a perturbed image (e.g., calling a dog a flower) even though the change is imperceptible to humans (it still looks like a dog). Recently, a related result was published by researchers from the University of Wyoming and Cornell University. They produced images completely unrecognizable to human eyes (as shown in the picture on the right) that a DNN nevertheless labels as familiar objects (such as cheetah, peacock, or baseball) with 99.99% confidence.

The researchers used one of the best-performing Deep Neural Networks, "AlexNet", trained on the 1.3-million-image ILSVRC 2012 ImageNet dataset, and the "LeNet" model trained on the MNIST dataset to test whether the result holds across DNN architectures. Both "AlexNet" and "LeNet" are provided by the Caffe software package.

Surprisingly, these fooling images are not hard to produce using evolutionary algorithms. Each image is represented as a genome under one of two encoding methods, direct and indirect. The evolutionary algorithm starts from randomly generated images, mutates them gradually, and keeps the best individual found so far for each objective. Here, the best individual is determined by the confidence score the DNN assigns to that image for the target class.
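As a rough sketch of the idea (a minimal (1+1) hill climber, not the paper's actual multi-objective setup): a directly encoded genome is just a vector of pixel values, a mutation perturbs a few pixels, and a mutant is kept only when the classifier's confidence does not drop. The `toy_confidence` function below is a hypothetical stand-in for a DNN's softmax score, not a real network.

```python
import random

def mutate(genome, rng, rate=0.1, scale=0.3):
    """Perturb each gene (a pixel in [0, 1]) with probability `rate`."""
    return [min(1.0, max(0.0, g + rng.gauss(0, scale))) if rng.random() < rate else g
            for g in genome]

def evolve(confidence, genome_len=64, generations=200, seed=0):
    """(1+1) hill climber: keep the most 'confident' image found so far."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(genome_len)]
    best_conf = confidence(best)
    for _ in range(generations):
        child = mutate(best, rng)
        c = confidence(child)
        if c >= best_conf:          # keep the child if confidence did not drop
            best, best_conf = child, c
    return best, best_conf

# Hypothetical stand-in for "DNN confidence that this 8x8 image is a 1":
# brighter pixels in the two center columns score higher, everything else
# is ignored -- so evolution is free to fill the rest with noise.
def toy_confidence(genome):
    center = [genome[r * 8 + c] for r in range(8) for c in (3, 4)]
    return sum(center) / len(center)
```

Running `evolve(toy_confidence)` drives the "confidence" well above its random-start level even though the evolved pixels outside the scored columns remain arbitrary noise, which is the essence of how unrecognizable images can still earn high scores.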

After 200 generations, the images are completely unrecognizable to us. However, LeNet believes with 99.99% confidence that they are the digits 0 through 9.



Direct encoding (top) and indirect encoding (bottom). Each column is a digit class and each row is the result after 200 generations of a randomly selected, independent run of evolution. 

The images from direct encoding look like pure noise to us, while the images produced by the indirect encoding method repeatedly evolve certain patterns within a digit class; for example, images classified as a 1 tend to have vertical bars. The same regularity also appears in images evolved against the ImageNet classes.

Why would DNN be fooled?

This may be further evidence that Deep Neural Networks do not see the world the way human vision does. We use the whole image to identify things, while a DNN relies on discriminative features. As long as the DNN detects certain features (like the vertical bars mentioned above), it classifies the image as a familiar object it was trained on.
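A toy illustration of this feature-matching view (the template weights here are hypothetical, not anything learned by the paper's networks): a linear "digit 1" detector whose weights favor vertical center bars is just as confident on a noisy image that merely contains its favorite feature as on a clean digit.

```python
import math

# Hypothetical weights for a "digit 1" detector on a flattened 8x8 image:
# positive in the two center columns (the vertical-bar feature),
# slightly negative everywhere else.
W = [0.5 if (i % 8) in (3, 4) else -0.1 for i in range(64)]

def confidence(img):
    """Sigmoid of the template response, a stand-in for softmax confidence."""
    s = sum(w * p for w, p in zip(W, img))
    return 1 / (1 + math.exp(-s))

# A recognizable "1": bright center columns on a dark background.
digit_one = [1.0 if (i % 8) in (3, 4) else 0.0 for i in range(64)]

# Junk that happens to keep the center columns bright: meaningless to a
# human, but the detector sees its feature and stays highly confident.
noisy_bars = [1.0 if (i % 8) in (3, 4) else ((i * 37) % 10) / 20 for i in range(64)]
```

Both images score above 99% "confidence" here, because the detector never checks whether the rest of the image looks like a digit at all.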

How can we prevent such fooling when we train DNN?

The researchers proposed one way to prevent such fooling: add the fooling images to the dataset as a new class and retrain the DNN on the enlarged dataset. In their experiment, this significantly lowered the confidence scores for the ImageNet AlexNet, and the retrained network was much harder to fool. But when they applied the same method to the MNIST LeNet, evolution still produced many unrecognizable images with confidence scores of 99.99%.
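The dataset-preparation step of that retraining recipe can be sketched as follows (a hypothetical helper, not Caffe's API): fooling images receive a brand-new class index n, turning an n-class problem into an (n+1)-class one before the network is retrained.

```python
def add_fooling_class(images, labels, fooling_images, num_classes):
    """Append fooling images under a new class index for retraining.

    `images`/`labels` are the original dataset; the fooling images all get
    the label `num_classes`, i.e. one past the last existing class.
    Returns the enlarged dataset and the new class count.
    """
    augmented_images = list(images) + list(fooling_images)
    augmented_labels = list(labels) + [num_classes] * len(fooling_images)
    return augmented_images, augmented_labels, num_classes + 1
```

After retraining on the enlarged dataset, the whole loop can be repeated: evolve new fooling images against the retrained network, add them, and retrain again.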

See more details in the original research paper: Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images, by Anh Nguyen, Jason Yosinski, and Jeff Clune.