Talking Machines – 3 Deep Learning Gurus Talk about History and Future, Part 2

Key ideas from a podcast with Deep Learning gurus Geoff Hinton, Yoshua Bengio, and Yann LeCun, where they explain the power of distributed representation and also propose a new open paper review process.

This is the second part of this blog post; see also the first part.

Talking Machines is a podcast series about machine learning, hosted by Katherine Gorman (a journalist) and Ryan Adams (a machine learning expert and Assistant Professor of Computer Science at Harvard). A recent episode featured an interview recorded at NIPS 2014 with three pillars of the deep learning community: Geoffrey Hinton, Yoshua Bengio, and Yann LeCun. They talked about the history and future of deep learning.

Deep Learning

They first shared stories from the revival of neural networks (see part 1). When the conversation turned to unsupervised learning, all three expressed confidence in it.

“The importance of unsupervised learning is going to grow in the future as we try to apply our methods to much larger datasets, most of which humans won’t have time to label manually,” Yoshua said.

“We all have the sense that the future belongs to unsupervised learning. And we are seeing this right now in natural language processing, with embeddings of words and text. Now it is all the rage. Everybody is using word embeddings.”

(For a quick introduction to word embeddings, here is a very nice blog post.)

Distributed Representation

Geoffrey first explained distributed representation. “The idea is that you have a large number of neurons, each representing some tiny aspect. Between them, they represent the whole thing. It is very different from a symbol: a symbol is either identical to another symbol or not, whereas a distributed representation has properties that relate it to other distributed representations. You only need a whole bunch of connection strengths instead of explicit rules.”
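Geoffrey’s contrast between symbols and distributed representations can be illustrated with a toy sketch. Here one-hot vectors play the role of symbols: two different words always have similarity 0, so a symbol is either identical to another or not. The distributed vectors use four hypothetical, hand-picked attributes (roughly “animal”, “furry”, “has wheels”, “pet”); in a real network these values would be learned rather than set by hand, and the words and numbers are invented purely for illustration.

```python
import math

# Symbolic (one-hot) codes: each word owns its own dimension.
VOCAB = ["cat", "dog", "car"]


def one_hot(word: str) -> list[float]:
    """Return a one-hot vector: 1.0 at the word's index, 0.0 elsewhere."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Distributed codes: hypothetical hand-picked attribute values
# ("animal", "furry", "has wheels", "pet"). Real embeddings are learned.
DISTRIBUTED = {
    "cat": [0.9, 0.8, 0.0, 0.9],
    "dog": [0.9, 0.9, 0.0, 0.8],
    "car": [0.0, 0.0, 0.9, 0.1],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for a, b in [("cat", "dog"), ("cat", "car")]:
        sym = cosine(one_hot(a), one_hot(b))
        dist = cosine(DISTRIBUTED[a], DISTRIBUTED[b])
        print(f"{a}/{b}  symbolic: {sym:.2f}  distributed: {dist:.2f}")
```

The one-hot codes score 0.00 for both pairs, while the distributed codes rate “cat” and “dog” as far more similar than “cat” and “car”: the relatedness lives in the shared attributes, not in any explicit rule.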

Yann said, “The power of the concept can be seen in how we find vector representations for words, for text in various languages, for images and videos. Finding the embedding is a very interesting problem, and there are lots of different methods for doing it.” Geoffrey and Yann are also working, separately, on metric learning applied to images, which Facebook uses for face recognition.

Geoffrey elaborated: “My belief is that we can get recurrent neural networks to translate from one language to another, with nothing that looks like symbols or symbolic rules. There are just vectors inside. It works very well at Google and in Yoshua’s lab, and it is developing very fast.”

Why is distributed representation so powerful?

Yoshua said: “One way to think about the vectors is that they are attributes learned by the machine. Words, images, or concepts are associated with these attributes. Note that the attributes here are learned: the learning system discovers whichever attributes it needs to do a good job.

“The important notion here is the notion of composition. Some computer scientists thought neural nets could not do composition. In fact, composition is at the heart of why deep learning works. With attributes and distributed representations, there are so many configurations of attributes that can be composed, and that is what makes these representations so powerful. When you consider multiple levels of representation, which is what deep learning is about, the next level of composition comes in and lets you represent even more abstract things.”
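Yoshua’s point about the number of configurations can be made concrete with simple counting. Assuming binary attributes (a simplification, since learned attributes are continuous), a one-hot code with n units can name only n items, while a distributed code over the same n units has 2**n configurations. A combination never seen before can therefore still be represented by composing attributes that already exist. The attribute names are hypothetical, and this sketch covers a single level only; it does not model the multiple levels of representation Yoshua describes.

```python
from itertools import product

# Hypothetical binary attributes; a real network learns its own continuous ones.
ATTRIBUTES = ["furry", "large", "domestic", "aquatic"]


def one_hot_capacity(n_units: int) -> int:
    """A one-hot (symbolic) code with n units names exactly n distinct items."""
    return n_units


def distributed_capacity(n_units: int) -> int:
    """A binary distributed code with n units has 2**n distinct configurations."""
    return 2 ** n_units


def all_configurations(attributes: list[str]) -> list[frozenset[str]]:
    """Enumerate every on/off combination of the given attributes."""
    configs = []
    for bits in product([0, 1], repeat=len(attributes)):
        configs.append(frozenset(a for a, bit in zip(attributes, bits) if bit))
    return configs


if __name__ == "__main__":
    n = len(ATTRIBUTES)
    print("one-hot:", one_hot_capacity(n), "distributed:", distributed_capacity(n))
    # Composition: an unseen combination is still representable,
    # because it is built from attributes that already exist.
    print(frozenset({"large", "aquatic"}) in all_configurations(ATTRIBUTES))
```

With the four attributes above it prints `one-hot: 4 distributed: 16`, then `True`. The gap grows exponentially as attributes are added, which is the intuition behind the claim that composition is what makes these representations powerful.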


As Yoshua said in the interview, “From a historical perspective, it’s not easy to go against the fashion and stick to your ideas.” All three went through a hard time convincing the community to accept their ideas.

During the interview, Yann proposed a new publication model that avoids rejecting brand-new ideas. He and Yoshua co-founded the International Conference on Learning Representations (ICLR), which adopted an open review process: authors post their papers on arXiv, official reviewers review them, the reviews are published, and anyone can comment on the papers. Although this runs against the ingrained culture of point-counting reviews, Yann hopes other conferences will adopt the model.