The Gap Between Deep Learning and Human Cognitive Abilities
How do we bridge this gap between deep learning and human cognitive ability?
Photo by David Cassolato
Hi! I am Bohdan Ponomar, CEO of the AI HOUSE community. We are part of the ecosystem built by the technology company Roosh, which creates ML/AI projects and invests in innovative ideas in the industry. The ecosystem also includes the Pawa venture studio, the Roosh Ventures venture fund, SET University (a technology university), the Reface and ZibraAI startups, and the company Neurons Lab.
In August 2022, we launched a new educational project, 'AI for Ukraine': a series of workshops and lectures by international artificial intelligence experts to support the development of Ukraine’s tech community.
The first lecture of the series was delivered by Yoshua Bengio, professor at the University of Montreal, founder and scientific director of the Quebec Artificial Intelligence Institute, head of the CIFAR Learning in Machines & Brains program, and one of the leading experts in the AI industry. In 2022, he became the computer scientist with the highest h-index in the world.
During the lecture, the professor covered his research program, which aims to bridge the gap between modern AI based on Deep Learning and human intelligence, including its creativity. The full recording is available here for a donation, and in this article, we cover the main points of the lecture.
Current Machine Learning has reliability issues because models perform poorly on OOD (Out-Of-Distribution) samples. We are used to relying on the IID (Independent & Identically Distributed) assumption, under which the test distribution is the same as the training distribution. Without this assumption, we need some alternative hypothesis about how the distribution may change in order to generalize.
This raises a research question: how exactly can distributions change? Let's consider how humans usually cope with such changes, as this can inspire the development of new AI learning methods.
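A toy illustration of the OOD problem, not taken from the lecture: the sketch below fits a straight line to the hypothetical function y = x² using inputs drawn from [0, 1], then evaluates it on a fresh sample from the same range (the IID case) and on inputs drawn from [2, 3] (a shifted distribution). The function, the ranges, and the sample sizes are arbitrary assumptions chosen only to make the effect visible.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x

def mse(a, b, xs, ys):
    """Mean squared error of the line y = a*x + b on (xs, ys)."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def true_fn(x):
    return x ** 2   # hypothetical ground truth, unknown to the model

rng = random.Random(0)
train_x = [rng.uniform(0.0, 1.0) for _ in range(200)]   # training distribution
iid_x = [rng.uniform(0.0, 1.0) for _ in range(200)]     # test set, same distribution (IID)
ood_x = [rng.uniform(2.0, 3.0) for _ in range(200)]     # test set, shifted distribution (OOD)

a, b = fit_line(train_x, [true_fn(x) for x in train_x])
iid_err = mse(a, b, iid_x, [true_fn(x) for x in iid_x])
ood_err = mse(a, b, ood_x, [true_fn(x) for x in ood_x])
print(f"IID test MSE: {iid_err:.4f}")
print(f"OOD test MSE: {ood_err:.4f}")
```

The IID error stays small, while the error on the shifted inputs is orders of magnitude larger: nothing in the training data told the model how the function behaves outside [0, 1].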
For many years, linguists have studied systematic generalization, which is easy to observe in natural language. A human can take familiar concepts and combine them in a new way while the meaning of the statement remains perfectly clear.
We can even produce combinations that would have zero probability under the training distribution. For example, a driver generalizes his knowledge of the traffic laws of his home country to other countries, where the rules may be slightly different. Deep Learning has not yet achieved comparable results, and this shows the nature of the gap between state-of-the-art AI systems and human intelligence.
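Systematic generalization can be illustrated with a hypothetical mini-language (the commands, action names, and training set below are made up for this sketch). A lookup table that memorizes training pairs fails on "jump twice", a combination it never saw, whereas an interpreter that extracts reusable parts (verb meanings and repetition counts) and recombines them handles it. This is a hand-written illustration of compositionality, not a Deep Learning model.

```python
# Hypothetical training set: every verb is seen alone, but "jump" never appears with a modifier.
TRAIN = {
    "walk": ["WALK"],
    "run": ["RUN"],
    "jump": ["JUMP"],
    "walk twice": ["WALK", "WALK"],
    "run thrice": ["RUN", "RUN", "RUN"],
}

def memorize(command):
    """A pure lookup table: it can only answer commands it has seen verbatim."""
    return TRAIN.get(command)

def learn_parts(train):
    """Extract reusable parts: the meaning of each verb and the count of each modifier."""
    verbs, modifiers = {}, {}
    for command, actions in train.items():
        words = command.split()
        if len(words) == 1:
            verbs[words[0]] = actions[0]
        else:
            modifiers[words[1]] = len(actions)   # assumes each verb maps to a single action
    return verbs, modifiers

def interpret(command, verbs, modifiers):
    """Compose the learned parts to handle combinations never seen in training."""
    words = command.split()
    times = modifiers[words[1]] if len(words) == 2 else 1
    return [verbs[words[0]]] * times

verbs, modifiers = learn_parts(TRAIN)
for command in ("walk twice", "jump twice", "jump thrice"):
    print(command, memorize(command), interpret(command, verbs, modifiers))
```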
Compositional Knowledge Representation
Large language models exist, but they require a huge amount of training data, which makes this approach very inefficient. This is a problem of sample complexity: the number of samples needed for training.
Therefore, instead of quantitative scaling, we should focus on the qualitative development of Deep Learning. The key questions are:
- How can these systems generalize to new out-of-distribution settings?
- How quickly can they adapt to these settings (transfer learning)?
These questions are directly related to the human ability to establish and identify causal relationships. Humans can draw new conclusions by combining and recombining pieces of their previous knowledge. This compositional way of representing knowledge, which natural language makes visible, also suggests the direction in which future generations of AI could develop.
Conscious Processing of Information
Everything we covered above relates to one key human ability that is currently beyond AI’s reach: the conscious information processing performed by our brain.
For instance, what happens when a driver starts driving in a foreign country? Let's assume the driver is used to left-hand traffic but now has to drive on the right. He cannot simply apply his previous habits here, because that would take the car into the oncoming lane. But he can focus on the task, constantly reminding himself of the difference in road rules. And this is where his previous driving experience helps.
Thus, when humans face a new situation, they call on conscious attention to combine relevant pieces of knowledge on the fly, analyze them, and eventually complete the task. Such conscious information processing differs in nature from the processing that guides our routine behavior (see "Thinking, Fast and Slow" by D. Kahneman).
Current Deep Learning systems successfully reproduce fast thinking, which involves a simple sequence of actions and no need to solve a non-trivial problem. Reproducing slower, more complex, algorithmic thinking remains a challenge for the future development of the field.
To achieve this, we need to organize knowledge so that the relevant pieces can easily be selected and reused when solving a new problem outside the training distribution. A good analogy is program code made up of independent modules and functions.
A human is able to distinguish two kinds of knowledge about the world:
- those depending on immutable physical laws; and
- those associated with dynamically changing conditions.
This differs from the usual IID assumption: aspects tied to physical laws are preserved across distributions and in fact remain unchanged, while aspects related to variable conditions change.
Therefore, the goal of Deep Learning is to discover knowledge representations that reflect the cause-and-effect relationships between variable factors. In other words, the outcome depends on the actions taken.
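The distinction between stable mechanisms and changing conditions can be sketched with two binary variables, where A causes B. In this hypothetical setup, the mechanism P(B | A) is fixed, while the "condition" P(A) differs between two environments. All probabilities are arbitrary assumptions, and the computation is exact rather than learned.

```python
# Hypothetical invariant mechanism: P(B=1 | A=a), identical in every environment.
MECHANISM = {1: 0.9, 0: 0.2}

def joint(p_a1):
    """Joint table P(A=a, B=b) for an environment where P(A=1) = p_a1."""
    table = {}
    for a, p_a in ((1, p_a1), (0, 1 - p_a1)):
        p_b1 = MECHANISM[a]
        table[(a, 1)] = p_a * p_b1
        table[(a, 0)] = p_a * (1 - p_b1)
    return table

def p_b1_given_a1(table):
    """Causal-direction conditional P(B=1 | A=1)."""
    return table[(1, 1)] / (table[(1, 1)] + table[(1, 0)])

def p_a1_given_b1(table):
    """Anti-causal conditional P(A=1 | B=1)."""
    return table[(1, 1)] / (table[(1, 1)] + table[(0, 1)])

for name, p_a1 in (("environment 1", 0.3), ("environment 2", 0.8)):
    t = joint(p_a1)
    print(name, round(p_b1_given_a1(t), 3), round(p_a1_given_b1(t), 3))
```

P(B=1 | A=1) is 0.9 in both environments, whereas the reverse conditional P(A=1 | B=1) moves from about 0.66 to about 0.95. A representation built around the invariant causal mechanism transfers to the new environment; one built around the anti-causal conditional does not.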
Humans receive and share a lot of information through language. The knowledge best suited to verbalization relies on inductive biases such as the use of abstract named objects.
For instance, "I hold a ball in my hand." This sentence contains named objects: me, the hand, and the ball. Each of them has its own features, such as coordinates in space. If I suddenly drop the ball, these features can be used to predict the ball's coordinates at each subsequent moment while it falls. The prediction is accurate even though it relies on only a few variables.
But this approach does not work at the pixel level. It is impossible to accurately predict the state of an individual pixel on its own. However, you can predict the state of pixels that belong to an abstract named object such as a ball, because the statistical structure of such named objects differs from that of raw pixels.
Besides, causal relationships between abstract objects are reusable. No matter what is dropped (a ball, a phone, or anything else), the mechanism remains the same.
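The falling-ball example can be written down with a handful of named-object variables. The sketch assumes free fall near the Earth's surface with g = 9.81 m/s² and no air resistance, a release height of 1.5 m, and a hypothetical NamedObject type; it only shows how few variables such a prediction needs compared with predicting every pixel.

```python
from dataclasses import dataclass

G = 9.81   # m/s^2, free fall near the Earth's surface (air resistance ignored)

@dataclass
class NamedObject:
    """Hypothetical abstract object described by a few features instead of pixels."""
    name: str
    height: float           # metres above the floor at release
    velocity: float = 0.0   # initial vertical velocity in m/s (0 when simply dropped)

def height_at(obj, t):
    """Height t seconds after release, clamped at the floor."""
    h = obj.height + obj.velocity * t - 0.5 * G * t ** 2
    return max(h, 0.0)

ball = NamedObject("ball", height=1.5)
phone = NamedObject("phone", height=1.5)
for t in (0.0, 0.2, 0.4, 0.5):
    print(f"t={t:.1f}s  ball={height_at(ball, t):.3f} m  phone={height_at(phone, t):.3f} m")
```

Because the mechanism (height_at) is defined over abstract features rather than over a particular object, the same function predicts the fall of the phone without any change.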
In neuroscience, there is a random factor: in a given situation, a person may have one thought or another. So there is a discrete, stochastic aspect of thinking that cannot be predicted in advance. From a Machine Learning point of view, we therefore need a probabilistic neural network that can generate thoughts by sampling from a chosen distribution, such as a Bayesian posterior.
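A minimal sketch of "generating thoughts" by sampling from a Bayesian posterior over a small, discrete set of hypotheses. The hypotheses, prior, and likelihood values are invented for illustration; a real system would need to sample from posteriors over far larger, structured spaces, which exact enumeration like this cannot handle.

```python
import random
from collections import Counter

def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses: P(h | d) is proportional to P(h) * P(d | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnormalized.values())   # normalizing constant P(d)
    return {h: w / z for h, w in unnormalized.items()}

def sample_thought(dist, rng):
    """Draw one discrete hypothesis with probability equal to its posterior mass."""
    hypotheses = list(dist)
    return rng.choices(hypotheses, weights=[dist[h] for h in hypotheses], k=1)[0]

# Hypothetical example: candidate explanations for the observation "the grass is wet".
prior = {"rain": 0.2, "sprinkler": 0.3, "dew": 0.5}
likelihood = {"rain": 0.9, "sprinkler": 0.8, "dew": 0.3}   # P(wet grass | explanation)
post = posterior(prior, likelihood)

rng = random.Random(0)
counts = Counter(sample_thought(post, rng) for _ in range(10_000))
print(post)
print(counts)
```

Repeated draws do not always return the most probable explanation: each "thought" appears in proportion to its posterior probability, which mirrors the stochastic aspect of thinking described above.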
GFlowNets (Generative Flow Networks) are a versatile probabilistic modeling tool. They make it possible to model distributions over composite objects and to estimate quantities such as normalizing constants or conditional probabilities. The easiest way to picture this structure is as a directed acyclic graph whose nodes are partially built objects.
How do GFlowNets generate structured objects such as graphs? For several years now, we have had networks that take a graph, rather than a set of fixed-size vectors, as input. Now we consider producing a graph as the output. This process is similar to how the brain generates thoughts: to create a composite structure, we add one element at a time. That is, the network takes a partially constructed thought as input and outputs a distribution over all possible next actions. Step by step, we arrive at the desired result.
GFlowNets can be organized as different modules that specialize in different types of knowledge. The modules compete with each other, and their outputs are normalized estimates. Moreover, each module shares information with the others, which is how a short-term working memory is formed. Finally, one of the candidates is chosen stochastically, similar to the way conscious processing is initiated in the human brain.
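The step-by-step construction can be illustrated with a tiny, exactly solvable case. In the sketch below, objects are bit strings of length 3 built by appending one bit at a time, and each finished string gets a hypothetical positive reward. Because this state space is a tree, the flow through a partial object is simply the total reward of its completions, and choosing the next element in proportion to child flow samples each finished object with probability reward / Z, where Z is the flow out of the empty starting state. Real GFlowNets learn these flows (or the policy) with a neural network over much larger directed acyclic graphs; brute-force enumeration here stands in for that training.

```python
import itertools
import random

LENGTH = 3
ACTIONS = ("0", "1")
TERMINALS = ["".join(bits) for bits in itertools.product(ACTIONS, repeat=LENGTH)]

def reward(x):
    """Hypothetical positive reward for a finished object (a bit string)."""
    return 1.0 + x.count("1")

def flow(prefix):
    """Flow through a partial object in a tree: total reward of all its completions."""
    return sum(reward(x) for x in TERMINALS if x.startswith(prefix))

def forward_policy(prefix):
    """Distribution over the next element given the partial object: child flow / parent flow."""
    f = flow(prefix)
    return {a: flow(prefix + a) / f for a in ACTIONS}

def sample_object(rng):
    """Build an object one element at a time by following the forward policy."""
    s = ""
    while len(s) < LENGTH:
        policy = forward_policy(s)
        s += rng.choices(list(policy), weights=list(policy.values()), k=1)[0]
    return s

def trajectory_probability(x):
    """Probability that sample_object produces x (product of the step probabilities)."""
    p = 1.0
    for i in range(LENGTH):
        p *= forward_policy(x[:i])[x[i]]
    return p

Z = flow("")   # normalizing constant: total flow out of the empty starting state
for x in TERMINALS:
    print(x, round(trajectory_probability(x), 4), round(reward(x) / Z, 4))
print(sample_object(random.Random(0)))
```

Here Z = 20 (eight strings contribute 1 each, plus 12 ones in total), and the two printed columns agree: the sampler reaches each finished object with probability proportional to its reward.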
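A loose sketch of the competition-and-broadcast idea, assuming hypothetical keyword-matching "modules" in place of trained networks: each module scores its relevance to the current situation, the scores are normalized with a softmax, one proposal is sampled stochastically, and the winning content is broadcast back to every module as a shared short-term memory. This is not the architecture from the lecture, only an illustration of the control flow.

```python
import math
import random

def softmax(scores):
    """Normalize raw relevance scores into a probability distribution."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}   # subtract max for stability
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

class Module:
    """Hypothetical specialist: scores its relevance to a situation and proposes content."""

    def __init__(self, name, keywords):
        self.name = name
        self.keywords = set(keywords)
        self.received = []   # contents broadcast to every module (shared working memory)

    def propose(self, situation):
        relevance = sum(word in self.keywords for word in situation.split())
        return float(relevance), f"{self.name} handles: {situation}"

def workspace_step(modules, situation, rng):
    """Modules compete; one proposal is sampled and broadcast back to all modules."""
    proposals = {m.name: m.propose(situation) for m in modules}
    probs = softmax({name: score for name, (score, _) in proposals.items()})
    winner = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    for m in modules:
        m.received.append(proposals[winner][1])
    return winner, probs

modules = [
    Module("traffic_rules", ["drive", "lane", "left", "right"]),
    Module("navigation", ["route", "turn", "exit"]),
    Module("parking", ["park", "space"]),
]
winner, probs = workspace_step(modules, "drive in the right lane", random.Random(1))
print(winner, {k: round(v, 3) for k, v in probs.items()})
```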
Causal Relationships Model
When working with such a neural network, the main challenge is to correctly identify and model the causal structure. If you simply take two correlated variables, A and B, there is no way of knowing which of them causes the other. But we can use the fact that external factors change the system: for example, if we change the state of A and B changes as a result, then it is likely that A has an effect on B and not vice versa.
Causal relationships are asymmetric, and if you misjudge their direction, you can make big mistakes. For instance, it can be unclear whether a patient was cured by the new medicine or by some other factor.
Therefore, we need models that can cover the entire set of possible causal explanations. Humans are able to entertain several such hypotheses in their thinking. In AI, this task is addressed by Bayesian posterior models over causal structures.
So far, the gap between Deep Learning and human cognitive abilities remains significant. People can generalize their knowledge, apply slow thinking to solve non-trivial problems, consciously process information, and understand cause-and-effect relationships between phenomena.
However, world-leading AI experts are working to bridge this gap and improve AI capabilities, drawing inspiration from how the human brain works. Yoshua Bengio's team in Montreal researches probabilistic neural networks as a step toward the next generation of Deep Learning.
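The intervention argument can be checked with a small simulation. The sketch assumes a hypothetical structural causal model in which A causes B (B = 2A + Gaussian noise); setting a variable by hand, written do(A = 3) or do(B = 3), replaces its usual mechanism while leaving the rest of the model intact.

```python
import random
import statistics

def draw(rng, do_a=None, do_b=None):
    """One sample from a hypothetical SCM where A causes B: A ~ N(0, 1), B = 2A + N(0, 0.5^2).
    Passing do_a or do_b sets that variable directly, overriding its usual mechanism."""
    a = rng.gauss(0.0, 1.0) if do_a is None else do_a
    b = 2.0 * a + rng.gauss(0.0, 0.5) if do_b is None else do_b
    return a, b

rng = random.Random(42)
N = 5000
observational = [draw(rng) for _ in range(N)]
after_do_a = [draw(rng, do_a=3.0) for _ in range(N)]
after_do_b = [draw(rng, do_b=3.0) for _ in range(N)]

mean_b_obs = statistics.fmean([b for _, b in observational])
mean_b_do_a = statistics.fmean([b for _, b in after_do_a])
mean_a_do_b = statistics.fmean([a for a, _ in after_do_b])
print(f"mean of B (observational): {mean_b_obs:.2f}")
print(f"mean of B after do(A=3):   {mean_b_do_a:.2f}")
print(f"mean of A after do(B=3):   {mean_a_do_b:.2f}")
```

Forcing A to 3 shifts the mean of B to about 6, while forcing B to 3 leaves the mean of A near 0. The observational data alone show only a correlation; the asymmetric response to interventions reveals the direction.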
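The medicine example can be made concrete with a hypothetical confounder: illness severity. In the made-up numbers below, severely ill patients are more likely to receive the drug and less likely to recover, while the drug itself raises the recovery probability by 0.1 in both groups. Comparing treated and untreated patients directly gives the wrong sign; adjusting for severity (the standard backdoor adjustment) recovers the true effect. The computation is exact, with no sampling.

```python
P_SEVERE = 0.5
P_TREAT = {"mild": 0.2, "severe": 0.8}   # sicker patients receive the drug more often
P_RECOVER = {("mild", 0): 0.7, ("mild", 1): 0.8,
             ("severe", 0): 0.2, ("severe", 1): 0.3}   # the drug adds +0.1 in each group
SEVERITIES = ("mild", "severe")

def p_s(s):
    return P_SEVERE if s == "severe" else 1 - P_SEVERE

def p_t_given_s(t, s):
    return P_TREAT[s] if t == 1 else 1 - P_TREAT[s]

def observed_recovery(t):
    """P(R=1 | T=t): what a naive comparison of treated vs untreated patients shows."""
    num = sum(p_s(s) * p_t_given_s(t, s) * P_RECOVER[(s, t)] for s in SEVERITIES)
    den = sum(p_s(s) * p_t_given_s(t, s) for s in SEVERITIES)
    return num / den

def interventional_recovery(t):
    """P(R=1 | do(T=t)): backdoor adjustment, averaging over severity as in the population."""
    return sum(p_s(s) * P_RECOVER[(s, t)] for s in SEVERITIES)

for t in (1, 0):
    print(t, round(observed_recovery(t), 2), round(interventional_recovery(t), 2))
```

Naively, treated patients recover 40% of the time versus 60% for untreated ones, so the drug looks harmful. The adjusted estimates, 55% versus 45%, show that it actually helps.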
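A minimal sketch of a Bayesian posterior over causal explanations, restricted to the two possible directions between binary variables A and B. The parameters are hypothetical and chosen so that both graphs predict exactly the same observational distribution; only interventional data, where we set A ourselves and record B, can separate them. Exact enumeration works here, but the number of possible graphs grows super-exponentially with the number of variables, so realistic models must approximate this posterior.

```python
# Hypothetical parameters. Under "A->B": P(A=1) = 0.5 and P(B=1 | A=a) as below.
# Under "B->A": P(B=1) = 0.5 and P(A=1 | B=b) mirrors the table, giving the same joint.
P_B1_GIVEN_A = {1: 0.9, 0: 0.1}
P_B1_MARGINAL = 0.5   # under "B->A", forcing A leaves B at its marginal

def p_b_after_do_a(graph, a, b):
    """Likelihood of observing B=b after we set A=a, according to each graph."""
    p1 = P_B1_GIVEN_A[a] if graph == "A->B" else P_B1_MARGINAL
    return p1 if b == 1 else 1 - p1

def posterior_over_graphs(interventions, prior):
    """interventions: list of (value we set A to, observed B). Returns P(graph | data)."""
    unnormalized = {}
    for graph, p in prior.items():
        likelihood = 1.0
        for a, b in interventions:
            likelihood *= p_b_after_do_a(graph, a, b)
        unnormalized[graph] = p * likelihood
    z = sum(unnormalized.values())
    return {g: w / z for g, w in unnormalized.items()}

data = [(1, 1)] * 9 + [(1, 0)]   # ten experiments do(A=1), B=1 observed nine times
post = posterior_over_graphs(data, {"A->B": 0.5, "B->A": 0.5})
print(post)
```

After ten interventions with nine positive outcomes, the posterior puts about 0.975 on A -> B. Keeping the whole distribution instead of a single best graph lets the model represent the remaining uncertainty.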