Deep Learning in Neural Networks: An Overview
This post summarizes Schmidhuber's now-classic (and still relevant) 35-page summary of 900 deep learning papers, giving an overview of the state of deep learning as of 2014. A great introduction to a great paper!
Deep Learning in Neural Networks: An Overview – Schmidhuber 2014
What a wonderful treasure trove this paper is! Schmidhuber provides all the background you need to gain an overview of deep learning (as of 2014) and how we got there through the preceding decades.
Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond.
The main part of the paper runs to 35 pages, and then there are 53 pages of references. As a rough guess, that’s somewhere around 900 referenced works. Now, I know that many of you think I read a lot of papers – just over 200 a year on this blog – but if I did nothing but review these key works in the development of deep learning, it would take me about 4.5 years to get through them at that rate! And when I’d finished, I’d still be about 6 years behind the then-current state of the art! My guess is most of you would like a little more variety in the subject matter than that too. It’s a good reminder of just how vast a body of CS knowledge we’ve built up over the last half-century and more.
I shall now attempt to condense a 35-page summary of 900 papers into a single blog post! Needless to say, there’s a lot more detail in the full paper and references than I can cover here. We’ll look at the following topics: credit assignment paths and the question of how deep is deep; key themes of deep learning; highlights in the development of supervised learning, unsupervised learning, and reinforcement learning methods; and a short look at where things might be heading.
How deep is deep?
We don’t know, but ‘10’ is very deep…
Which modifiable components of a learning system are responsible for its success or failure? What changes to them help improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963)…. The present survey will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural Networks (NNs)… Learning or credit assignment is about finding weights that make the NN exhibit desired behaviour – such as driving a car. Depending on the problem and how the neurons are connected, such behaviour may require long causal chains of computational stages, where each stage transforms (often in a non-linear way) the aggregate activation of the network. Deep Learning is about accurately assigning credit across many such stages.
Feedforward neural networks (FNNs) are acyclic; recurrent neural networks (RNNs) are cyclic. “In a sense, RNNs are the deepest of all NNs”: in principle, they can create and process memories of arbitrary sequences of input patterns.
To measure whether credit assignment in a given NN application is of the deep or shallow type, I introduce the concept of Credit Assignment Paths or CAPs, which are chains of possibly causal links between events… e.g. from input through hidden to output layers in FNNs, or through transformations over time in RNNs.
If a credit assignment path (a path through the graph starting with an input) is of the form (…, k, t, …, q), where k and t are the first successive elements with modifiable weights (it’s possible that t = q), then the length of the suffix list t…q is the path’s depth.
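As a rough illustration of this definition, here is a minimal Python sketch. It assumes a CAP is given as a list of unit or event labels (input first) and that the modifiable weights are given as a set of (k, t) link pairs; the function name `cap_depth` and the choice to return `None` when no link on the path is modifiable are illustrative assumptions, not part of the paper.

```python
def cap_depth(path, modifiable):
    """Depth of a credit assignment path, following the definition above.

    path: list of units/events, starting with an input.
    modifiable: set of (k, t) pairs whose connecting weight can be changed.
    Returns the length of the suffix t..q, where (k, t) is the first
    successive pair on the path joined by a modifiable weight, or None if
    no such pair exists (an assumption of this sketch: no credit can be assigned).
    """
    for i in range(len(path) - 1):
        if (path[i], path[i + 1]) in modifiable:
            # t sits at index i + 1; the suffix t..q runs to the end of the path
            return len(path) - (i + 1)
    return None


path = ["x", "h1", "h2", "y"]
all_trainable = {("x", "h1"), ("h1", "h2"), ("h2", "y")}
print(cap_depth(path, all_trainable))    # 3
print(cap_depth(path, {("h2", "y")}))    # 1
```

For a path x → h1 → h2 → y with every weight trainable, the depth is 3 (the number of non-input layers); if only the last weight is trainable, credit can only move one step back and the depth drops to 1.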
This depth limits how far backwards credit assignment can move down the causal chain to find a modifiable weight… the depth of the deepest CAP within an event sequence is called the solution depth… Given some fixed NN topology, the smallest depth of any solution is called the problem depth. Sometimes we also speak of the depth of an architecture: supervised learning FNNs with fixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers… In general, RNNs may learn to solve problems of potentially unlimited depth.
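The contrast between a fixed-topology FNN and an RNN can be made concrete by enumerating CAPs in small graphs. The sketch below is a toy of my own, not Schmidhuber's formalism: it applies the depth rule from the definition above, treats every connection as modifiable, and reports the deepest CAP for a fully connected layered FNN and for an RNN unrolled over T time steps (one input and one hidden unit per step). The helpers `layered_fnn`, `unrolled_rnn` and `deepest_cap` are hypothetical names. Note that this measures the maximal depth the architecture permits, which upper-bounds, but is not the same as, the problem depth.

```python
def cap_depth(path, modifiable):
    """Length of the suffix t..q after the first modifiable link (k, t), or None."""
    for i in range(len(path) - 1):
        if (path[i], path[i + 1]) in modifiable:
            return len(path) - (i + 1)
    return None


def all_paths(succ, node):
    """Yield every maximal path starting at node in a DAG given successor lists."""
    nexts = succ.get(node, [])
    if not nexts:
        yield [node]
        return
    for nxt in nexts:
        for rest in all_paths(succ, nxt):
            yield [node] + rest


def all_edges(succ):
    """Every (k, t) link in the graph; here we treat all of them as modifiable."""
    return {(u, v) for u, vs in succ.items() for v in vs}


def deepest_cap(succ, inputs, modifiable):
    """Depth of the deepest CAP starting at any input unit (0 if none)."""
    depths = (cap_depth(p, modifiable) for s in inputs for p in all_paths(succ, s))
    return max((d for d in depths if d is not None), default=0)


def layered_fnn(widths):
    """Fully connected FNN; widths[0] is the input layer. Units are (layer, index)."""
    layers = [[(l, j) for j in range(w)] for l, w in enumerate(widths)]
    succ = {}
    for lower, upper in zip(layers, layers[1:]):
        for unit in lower:
            succ[unit] = list(upper)
    return succ, layers[0]


def unrolled_rnn(T):
    """RNN unrolled over T steps: x_t feeds h_t, and h_t feeds h_{t+1}.
    (A real RNN shares the recurrent weight across steps; that does not
    change which links are modifiable, so the toy ignores it.)"""
    succ = {}
    for t in range(T):
        succ[("x", t)] = [("h", t)]
        succ[("h", t)] = [("h", t + 1)] if t + 1 < T else []
    return succ, [("x", t) for t in range(T)]


fnn, fnn_inputs = layered_fnn([3, 4, 4, 2])
print(deepest_cap(fnn, fnn_inputs, all_edges(fnn)))  # 3

for T in (5, 20):
    rnn, rnn_inputs = unrolled_rnn(T)
    print(T, deepest_cap(rnn, rnn_inputs, all_edges(rnn)))  # depth equals T
```

For the FNN with layer widths [3, 4, 4, 2], the deepest CAP has depth 3 however wide the layers are, while the unrolled RNN's deepest CAP has depth T, growing with the length of the input sequence.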
So where does shallow learning end, and deep learning begin? “Discussions with DL experts have not yet yielded a conclusive response to this question!”
Instead of committing myself to a precise answer, let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.
Several themes recur across the different types of deep learning:
- Dynamic programming can facilitate credit assignment. In supervised learning, backpropagation itself can be viewed as a method derived from dynamic programming. Dynamic programming can also help to reduce problem depth in traditional reinforcement learning, and dynamic programming algorithms are essential for systems that combine concepts of NNs and graphical models, such as Hidden Markov Models (HMMs).
- Unsupervised learning can facilitate both supervised and reinforcement learning by first encoding the essential features of the inputs so that the original data is described in a less redundant or more compact way. These codes become the new inputs for supervised or reinforcement learning.
- Many methods learn hierarchies of more and more abstract data representations – continuously learning concepts by combining previously learnt concepts.
- “In the NN case, the Minimum Description Length principle suggests that a low NN weight complexity corresponds to high NN probability in the Bayesian view, and to high generalization performance, without overfitting the training data. Many methods have been proposed for regularizing NNs, that is, searching for solution-computing but simple, low-complexity supervised learning NNs.”
- GPUs! GPUs excel at the fast matrix and vector multiplications required for NN training, where they can speed up learning by a factor of 50 or more.
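To see why backpropagation counts as a dynamic-programming method, consider a chain of scalar tanh units, a deliberately tiny, hypothetical network chosen here only for illustration. The backward pass computes each intermediate quantity dE/da_l once from dE/da_{l+1} and reuses it, instead of re-deriving the gradient separately for every weight; a central finite-difference check confirms the result. The function names, the squared-error loss, and the example weights are assumptions of this sketch.

```python
import math


def forward(ws, x):
    """Chain of scalar tanh units: a_0 = x, a_{l+1} = tanh(ws[l] * a_l)."""
    acts = [x]
    for w in ws:
        acts.append(math.tanh(w * acts[-1]))
    return acts


def backprop(ws, x, target):
    """Gradient of E = 0.5 * (a_L - target)^2 with respect to each weight.

    delta holds dE/da_{l+1}; each stage derives dE/da_l from it and passes it
    on, so every subproblem is solved once and reused (the DP step).
    """
    acts = forward(ws, x)
    grads = [0.0] * len(ws)
    delta = acts[-1] - target  # dE/da_L
    for l in range(len(ws) - 1, -1, -1):
        # d tanh(z)/dz = 1 - tanh(z)^2, and tanh(z) is the stored activation
        dz = delta * (1.0 - acts[l + 1] ** 2)
        grads[l] = dz * acts[l]  # dE/dws[l]
        delta = dz * ws[l]       # dE/da_l, reused by the next (earlier) stage
    return grads


def numeric_grad(ws, x, target, eps=1e-6):
    """Central finite differences, for checking backprop only."""
    def loss(weights):
        return 0.5 * (forward(weights, x)[-1] - target) ** 2

    grads = []
    for i in range(len(ws)):
        up = ws[:i] + [ws[i] + eps] + ws[i + 1:]
        down = ws[:i] + [ws[i] - eps] + ws[i + 1:]
        grads.append((loss(up) - loss(down)) / (2 * eps))
    return grads


ws = [0.5, -1.2, 0.8, 1.5]
analytic = backprop(ws, 0.7, 0.2)
numeric = numeric_grad(ws, 0.7, 0.2)
print(analytic)
print(numeric)
```

The backward loop does a constant amount of work per stage, so the whole gradient costs roughly one extra pass through the chain, however deep it is; the finite-difference version, by contrast, needs two full forward passes per weight.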
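As one concrete instance of preferring low-complexity solutions, the sketch below uses L2 weight decay, one of the simplest and most common regularizers. The survey covers many others, and this is not presented as the MDL method itself. It fits a two-weight linear model, standing in for a tiny NN, by gradient descent with and without a penalty `lam` on the squared weights. The data, step size, and iteration count are arbitrary illustrative choices, and penalizing the bias term as well is a simplification of this sketch.

```python
import math


def fit(xs, ys, lam, lr=0.1, steps=2000):
    """Minimize 0.5*mean((w0 + w1*x - y)^2) + 0.5*lam*(w0^2 + w1^2) by gradient descent."""
    w0, w1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
        # Gradient of the data-fit term plus the weight-decay term lam * w
        g0 = sum(residuals) / n + lam * w0
        g1 = sum(r * x for r, x in zip(residuals, xs)) / n + lam * w1
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1


# Toy data: y is roughly 2x + 1, with a small deterministic wiggle standing in for noise
xs = [i / 10 for i in range(-10, 11)]
ys = [2 * x + 1 + 0.1 * math.sin(7 * i) for i, x in zip(range(-10, 11), xs)]

plain = fit(xs, ys, lam=0.0)
decayed = fit(xs, ys, lam=0.5)
print("no decay:  ", plain, math.hypot(*plain))
print("with decay:", decayed, math.hypot(*decayed))
```

The decayed weights end up with a noticeably smaller norm, trading a little training-set fit for a simpler solution. In the Bayesian reading mentioned in the quote, an L2 penalty corresponds to a Gaussian prior on the weights.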