Peeking Inside Convolutional Neural Networks
This post discusses some tricks for peeking inside a neural network and visualizing what the individual units in a layer detect.
By Audun M. Øygard, Schibsted Media Group.
Convolutional neural networks are used extensively for a number of image-related tasks these days. Despite being very successful, they're mostly seen as "black box" models, since it's hard to understand what happens inside the network. There are, however, methods to "peek inside" convnets, and thus understand a bit more about how they work.
In a previous blog post I showed how you can use gradient ascent, with some special tricks, to make a convolutional network visualize the classes it has learnt to classify. In this post I'll show that the same technique can also be used to "peek inside the network" by visualizing what the individual units in a layer detect. To give you an idea of the results, here are some highlights of visualizations of individual units from convolutional layer 5 in the VGG-S network:
Visualization of units 10, 334, 425 & 435 in convolutional layer 5 in VGG-S
From top left we can pretty clearly see the head of a cocker spaniel-type dog, the head of some kind of bird, the ears of a canine, and a seaside coastline. Not all unit visualizations are as clearly defined as these, but most nevertheless give us some interesting insights into what the individual units detect.
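If you'd like to try this yourself, the core of the technique is straightforward: start from noise and follow the gradient of a single unit's activation with respect to the input image. Below is a minimal sketch in PyTorch; it uses torchvision's VGG16 as a stand-in for VGG-S (which torchvision doesn't ship), and the layer index, step count, learning rate and blur schedule are illustrative rather than the exact settings used for the figures here.

```python
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

# Load a pretrained network and truncate it at the last conv layer.
# NOTE: torchvision's VGG16 is a stand-in for VGG-S here, so the
# layer index below refers to VGG16's conv5_3.
model = models.vgg16(weights="IMAGENET1K_V1").eval()
features = model.features[:29]

unit = 5                                 # the channel to visualize

# Start from small random noise and ascend the unit's mean activation.
img = torch.zeros(1, 3, 224, 224).normal_(std=0.01).requires_grad_()
optimizer = torch.optim.Adam([img], lr=0.05)

for step in range(200):
    optimizer.zero_grad()
    acts = features(img)                 # (1, 512, H, W) feature maps
    loss = -acts[0, unit].mean()         # negated: the optimizer minimizes
    loss.backward()
    optimizer.step()
    # One of the "special tricks": blur the image every few steps to
    # suppress high-frequency noise in the visualization.
    if step % 10 == 0:
        with torch.no_grad():
            img.copy_(TF.gaussian_blur(img, kernel_size=5, sigma=0.5))
```

Ascending the mean activation of the whole feature map (rather than a single spatial position) tends to tile the detected feature across the image, which is what you see in the visualizations above.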
An earlier method for figuring out what the units detect (e.g. in Zeiler & Fergus) was to find the images which maximally activate the individual units. Here's an example of the images (sampled from numerous crops of 100,000 images in the ImageNet validation dataset) which give maximal activations for a specific unit in layer 5 of VGG-S:
While this gives us an idea of what the unit is detecting, by visualizing the same unit we can see explicitly which details it focuses on. Applying our technique to the same unit as above, we see that it seems to focus on the characteristic pattern on the muzzle of the dog, while ignoring most other details in the image.
Visualization of unit 5 in convolutional layer 5 in VGG-S
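For comparison, the max-activation search can be sketched in a few lines: run each image through the truncated network and keep the ones producing the strongest response in the chosen channel. This sketch reuses `features` and `unit` from the snippet above, and assumes `dataset` is an iterable yielding preprocessed 224x224 image tensors (e.g. crops from the ImageNet validation set).

```python
import heapq
import torch

top_k = []                               # min-heap of (score, index) pairs
with torch.no_grad():
    for i, x in enumerate(dataset):
        acts = features(x.unsqueeze(0))  # (1, 512, H, W) feature maps
        score = acts[0, unit].max().item()   # strongest spatial response
        heapq.heappush(top_k, (score, i))
        if len(top_k) > 9:               # keep only the nine best images
            heapq.heappop(top_k)

best = sorted(top_k, reverse=True)       # highest-scoring images first
```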
We can use our visualization technique to get an overview of what all the different units in a typical layer detect. Here we've focused on convolutional layer 5 in VGG-S, which is the final convolutional layer in that network. A large number of units appear to detect very specific features, such as (from top left below) forests/bushes in the background, buildings with pitched roofs, individual trees, clouds, collars, brass instruments, ship masts, bottle/jug tops, and seemingly the shoulders of people:
Visualization of (from top left) units 94, 159, 201, 432, 258, 7, 136, 88 & 449 in convolutional layer 5 in VGG-S
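Producing an overview like this is just a matter of repeating the ascent for every channel in the layer. A sketch, assuming the gradient-ascent loop above has been wrapped in a hypothetical `visualize(unit)` helper that returns the optimized image tensor:

```python
from torchvision.utils import save_image

# `visualize` is a hypothetical wrapper around the gradient-ascent
# loop from the first snippet, returning the optimized image.
for unit in range(512):                  # conv layer 5 has 512 channels
    img = visualize(unit)
    save_image(img, f"unit_{unit:03d}.png", normalize=True)
```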
What is interesting to notice is that the network doesn't seem to have learned detailed representations of faces. In the visualization featuring the collar, for instance, the face looks more like a spooky flesh-colored blob than a face. This might be an artifact of the visualization process, but it's not entirely unlikely that the network either hasn't found it necessary to learn the details, or hasn't had the capacity to learn them.
There are also a surprisingly large number of units that detect dog-related features. I counted somewhere around 50 of the 512 units in the layer, which means a surprising ~10% of the layer may be dedicated solely to dogs. Here's a small sample of these:
Visualization of (from top left) units 249, 468, 170 & 75 in convolutional layer 5 in VGG-S
On the other hand, I could only find a single unit that clearly detected cat features (!):
Visualization of unit 484 in convolutional layer 5 in VGG-S