# Deep Learning in Neural Networks: An Overview

This post summarizes Schmidhuber's now-classic (and still relevant) 35-page summary of 900 deep learning papers, giving an overview of the state of deep learning as of 2014. A great introduction to a great paper!

**Reinforcement Learning**

Without a teacher, solely from occasional real-valued pain and pleasure signals, RL agents must discover how to interact with a dynamic, initially unknown environment to maximize their expected cumulative reward signals. There may be arbitrary, a priori unknown delays between actions and perceivable consequences. The problem is as hard as any problem of computer science, since any task with a computable description can be formulated in the RL framework…

In the general case RL implies deep CAPs, but under the simplifying assumption of *Markov Decision Processes* (MDPs) the CAP depth can be greatly reduced. In an MDP, the current input of the RL agent conveys all information necessary to compute an optimal next output event or decision.
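To make the MDP simplification concrete, here is a minimal sketch of value iteration on a tiny, made-up chain MDP. The states, actions, rewards, and the discount factor `GAMMA` are all assumptions chosen for illustration; none of them come from the survey. The point is only that, because the current state carries all the information needed, an optimal action can be read off from a per-state value table without looking back through history.

```python
# Hypothetical 3-state chain MDP (illustrative only): states 0, 1, 2; state 2 is terminal.
# TRANSITIONS[state][action] = (next_state, reward); deterministic for simplicity.
TRANSITIONS = {
    0: {"stay": (0, 0.0), "go": (1, 0.0)},
    1: {"stay": (1, 0.0), "go": (2, 1.0)},
}
TERMINAL = {2}
GAMMA = 0.9  # discount factor: an assumption, not taken from the survey


def value_iteration(transitions, terminal, gamma, tol=1e-9):
    """Repeatedly apply the Bellman optimality backup until values stop changing."""
    values = {s: 0.0 for s in set(transitions) | terminal}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Best one-step lookahead: immediate reward plus discounted next-state value.
            best = max(r + gamma * values[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    return {
        s: max(actions, key=lambda a: actions[a][1] + gamma * values[actions[a][0]])
        for s, actions in transitions.items()
    }


values = value_iteration(TRANSITIONS, TERMINAL, GAMMA)
policy = greedy_policy(TRANSITIONS, values, GAMMA)
print(values, policy)
```

Note that the policy is a function of the current state alone, which is exactly what the Markov assumption buys.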

Perhaps the most well-known RL NN is the world-class RL backgammon player (Tesauro, 1994) which achieved the level of human world champions by playing against itself… More recently, a rather deep GPU-CNN was used in a traditional RL framework to play several Atari 2600 computer games directly from 84×84 pixel 60Hz video input… Even better results are achieved by using (slow) Monte Carlo tree planning to train comparatively fast deep NNs.

For many situations the MDP assumption is unrealistic. “However, memories of previous events can help to deal with *partially observable Markov decision problems* (POMDPs).”

See also the work on Neural Turing Machines and Memory Networks from Google and Facebook respectively that was published after the date of the survey paper we’re looking at today.

Not quite as universal… yet both more practical and more general than most traditional RL algorithms are methods for *Direct Policy Search* (DS). Without a need for value functions or Markovian assumptions, the weights of an FNN or RNN are directly evaluated on the given RL problem. The results of successive trials inform further search for better weights. Unlike with RL supported by BP, CAP depth is not a crucial issue. DS may solve the credit assignment problem without backtracking through deep causal chains of modifiable parameters – it neither cares for their existence, nor tries to exploit them.
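As a rough illustration of direct policy search, the sketch below uses simple random hill climbing, which is only one of many possible search strategies and is not claimed to be the method of any paper cited in the survey. The 1-D task, the one-neuron `tanh` policy, and the parameters `trials`, `sigma`, and `seed` are all hypothetical. Weights are scored purely by the return of whole trials; no value function is learned and no gradient is backpropagated.

```python
import math
import random


def episode_return(weights, steps=20):
    """Score a weight vector on a hypothetical 1-D task: drive position x toward 0."""
    w, b = weights
    total = 0.0
    for start in (-1.0, 0.5, 1.0):  # a few fixed start positions
        x = start
        for _ in range(steps):
            action = math.tanh(w * x + b)  # tiny one-neuron policy
            x += 0.5 * action
            total -= abs(x)  # reward is the negative distance from the origin
    return total


def direct_policy_search(trials=200, sigma=0.3, seed=0):
    """Random hill climbing: perturb the weights, keep the candidate only if it scores better."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_return = episode_return(best)
    for _ in range(trials):
        candidate = [p + rng.gauss(0.0, sigma) for p in best]
        candidate_return = episode_return(candidate)
        if candidate_return > best_return:
            best, best_return = candidate, candidate_return
    return best, best_return


initial_return = episode_return([0.0, 0.0])
best_weights, best_return = direct_policy_search()
print(initial_return, best_return, best_weights)
```

The search only ever sees the total return of each trial, so the length of the causal chain between a weight change and its eventual payoff never enters the algorithm.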

The most general type of RL is constrained only by the fundamental limitations of computability. “Remarkably, there exist blueprints of *universal problem solvers* or *universal RL machines* for unlimited problem depth that are time-optimal in various theoretical senses.” These can solve any well-defined problem as quickly as the unknown fastest way of solving it, save for an additive constant overhead that becomes negligible as the problem size grows…

Note that most problems are large; only few are small. AI and DL researchers are still in business because many are interested in problems so small that it is worth trying to reduce the overhead through less general methods, including heuristics…

**Where next? (c. 2014)**

… humans learn to *actively perceive* patterns by sequentially directing attention to relevant parts of the available data. Near future deep NNs will do so too, extending previous work since 1990 on NNs that learn selective attention through RL of (a) motor actions such as saccade control, and (b) internal actions controlling spotlights of attention within RNNs, thus closing the general sensorimotor loop through both external and internal feedback.

See the Google DeepMind Neural Turing Machine paper for an example of a system with a memory and trainable attention mechanism.

Many recent DL results profit from GPU-based traditional deep NNs. Current GPUs, however, are little ovens, much hungrier for energy than biological brains, whose neurons communicate by brief spikes and often remain quiet. Many computational models of such *spiking neurons* have been proposed and analyzed… Future energy-efficient hardware for DL in NNs may implement aspects of such models.

The last word is reserved for universal problem solvers:

The more distant future may belong to general purpose learning algorithms that improve themselves in provably optimal ways, but these are not yet practical or commercially relevant.

**Bio: Adrian Colyer** was CTO of SpringSource, then CTO for Apps at VMware and subsequently Pivotal. He is now a Venture Partner at Accel Partners in London, working with early stage and startup companies across Europe. *If you’re working on an interesting technology-related business he would love to hear from you: you can reach him at acolyer at accel dot com*.

Original. Reposted with permission.
