Creating Curious Machines: Building Information-seeking Agents

Researchers at Maluuba are developing ways to teach artificial agents how to seek information actively, by asking questions. The work includes a suite of tasks and a deep neural agent that learns to accomplish them through efficient information-seeking behaviour, a vital step towards Artificial General Intelligence.

By Adam Trischler, Philip Bachman & Alessandro Sordoni, Maluuba.

Humans have an innate desire to know and understand. From a child learning to ride a bike to an adult gaining skills in an online course, we constantly absorb information from our environment through interaction. Motivated by this observation, we’ve developed a suite of tasks that teach artificial agents how to seek information actively, by asking questions. We’ve also designed a deep neural agent that learns to accomplish these tasks through efficient information-seeking behaviour. Such behaviour is a vital research step towards Artificial General Intelligence.

Asking the Right Questions

Let’s say you’re at a dinner party with friends and you decide to play 20 Questions. It’s your turn and you choose ‘cat’ as the thing for others to guess. They begin by asking broad questions: “Is it alive?”, “Is it a person?”, “Is it an animal?”, “Does it live underwater?”. The person who correctly identifies the item first is the winner, so your friends are not just trying to get the right answer; they’re trying to do so with as few questions as possible. Based on your simple yes-or-no responses, your friends can quickly narrow down the set of viable items until one correctly guesses ‘cat’.

This example demonstrates the iterative nature of information seeking: the information currently sought must be intelligently conditioned on the information already acquired. To be effective, an information-seeking agent must in some sense understand the state of its current knowledge. It must know what it knows, and how to bridge the gap between what it knows and what it needs to know.

The 20 Questions example also highlights how communication necessarily takes place over a restricted channel: each answer is a simple ‘yes’ or ‘no’ (conveying at most one bit of information), and the number of questions is limited. Real-world information seeking is typically restricted in a similar sense: we communicate via finite languages, over limited amounts of time. Consider searching online to choose a gift for a friend. Perhaps you start broadly (guided loosely by age, gender and budget), then home in based on specific interests and recommendations.
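The one-bit-per-answer limit can be made concrete with a small sketch. The toy item list, the attribute names, and the helper functions below are all hypothetical and chosen only for illustration; the sketch assumes every remaining candidate is equally likely. It computes how many bits a yes/no question is expected to yield, and plays greedily by always asking the question that splits the remaining candidates most evenly.

```python
import math

# Hypothetical toy world: each item is described by three yes/no attributes.
ITEMS = {
    "cat":       {"alive": True,  "animal": True,  "underwater": False},
    "shark":     {"alive": True,  "animal": True,  "underwater": True},
    "oak":       {"alive": True,  "animal": False, "underwater": False},
    "kelp":      {"alive": True,  "animal": False, "underwater": True},
    "rock":      {"alive": False, "animal": False, "underwater": False},
    "shipwreck": {"alive": False, "animal": False, "underwater": True},
}
ATTRIBUTES = ["alive", "animal", "underwater"]


def min_questions(num_candidates):
    """Lower bound on the yes/no questions needed to single out one item."""
    return math.ceil(math.log2(num_candidates))


def expected_bits(candidates, attribute):
    """Expected information (in bits) from asking about one attribute,
    assuming all remaining candidates are equally likely."""
    yes = sum(1 for name in candidates if ITEMS[name][attribute])
    p = yes / len(candidates)
    if p in (0.0, 1.0):
        return 0.0  # the answer is already known, so nothing is learned
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def play(secret):
    """Greedy questioner: condition each question on what is already known."""
    candidates = list(ITEMS)
    asked = []
    while len(candidates) > 1:
        best = max(ATTRIBUTES, key=lambda a: expected_bits(candidates, a))
        if expected_bits(candidates, best) == 0.0:
            break  # no remaining question can separate the candidates
        answer = ITEMS[secret][best]
        candidates = [c for c in candidates if ITEMS[c][best] == answer]
        asked.append(best)
    return candidates[0], asked


guess, questions = play("cat")
print(guess, questions)
```

A question that splits the candidates evenly is worth a full bit, while a lopsided one is worth less, which is why broad questions tend to come first.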

Because of its fundamental role in intelligent behaviour, information seeking has been studied from a variety of perspectives, including cognitive science, psychology, neuroscience, and machine learning. In neuroscience, for instance, information-seeking strategies are often explained by biases toward novel, surprising, or uncertain events (Ranganath & Rainer, 2003). Information seeking is a key component in formal notions of fun and creativity (Schmidhuber, 2010), and intrinsic motivation (Oudeyer et al., 2007). It is also closely related to the concept of attention, which improves efficiency by ignoring irrelevant features (Mnih et al., 2014) and may be considered a strategy for information seeking.

New Tasks for Exploring Information Seeking

Researchers have used different tools and systems to help train intelligent agents, from datasets through to bespoke learning environments. The use of games like chess, Go, and the Atari suite has been incredibly fruitful in training intelligent agents. Similarly, many of the games that humans enjoy seem expressly designed to train efficient information seeking, or at least to exploit our joy in exercising this skill.

Thus motivated, we designed a suite of tasks to train and evaluate information-seeking behaviour. Three of these tasks are demonstrated here (see our paper for more details).

The highlighted tasks are Hangman, Face Challenge and War Boat. Each has a distinct mode of play, with its own rules and objectives, and each requires the ability to seek information iteratively based on an agent’s current “picture” of the world. The tasks are illustrated in the gifs below.

Hangman: The classic game where an agent must identify a phrase within a set number of turns by guessing letters of the alphabet.


Face Challenge: An agent must determine the answer to questions like “Is this person wearing a hat?” or “Does this person have a moustache?” by peeking at small chunks of an occluded portrait.

War Boat: An agent aims to sink an opponent’s naval fleet, which is randomly positioned on a hidden grid. Correctly guessing a point where a boat is located counts as a ‘hit’, and with enough hits the boat sinks.

Training Models to Seek Information

The actions agents perform in our tasks can be interpreted as questions asked of the environment, e.g. “Does this phrase contain the letter ‘a’?” or “What does this block of pixels look like?”. To succeed, an agent must learn to ask useful questions and assimilate the information it obtains.
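To illustrate the "actions as questions" view, here is a minimal Hangman sketch. It is not the environment or agent from the paper: the `HangmanEnv` class, the `play` function, the turn limit, and the tiny vocabulary are all hypothetical, and the agent is a hand-written heuristic rather than a learned model. Each action asks whether the phrase contains a letter, and the answer (the matching positions) is used to discard candidate phrases that are no longer consistent.

```python
class HangmanEnv:
    """Toy Hangman: each action asks "does the phrase contain this letter?"."""

    def __init__(self, phrase, max_turns=8):
        self.phrase = phrase.lower()
        self.max_turns = max_turns
        self.guessed = set()
        self.turns = 0

    def observe(self):
        """Return the phrase with unguessed letters masked by underscores."""
        return "".join(
            ch if (ch in self.guessed or not ch.isalpha()) else "_"
            for ch in self.phrase
        )

    def ask(self, letter):
        """Ask about one letter; the answer is the list of matching positions."""
        letter = letter.lower()
        self.guessed.add(letter)
        self.turns += 1
        return [i for i, ch in enumerate(self.phrase) if ch == letter]

    def solved(self):
        return "_" not in self.observe()

    def done(self):
        return self.solved() or self.turns >= self.max_turns


def positions_of(phrase, letter):
    return [i for i, ch in enumerate(phrase) if ch == letter]


def play(env, vocabulary):
    """Heuristic agent: ask the letter that splits the candidates most evenly,
    then keep only the candidates consistent with the answer."""
    candidates = [p for p in vocabulary if len(p) == len(env.observe())]
    asked = set()
    while not env.done():
        letters = sorted({ch for p in candidates for ch in p if ch.isalpha()} - asked)
        if not letters:
            break  # nothing left to ask about
        half = len(candidates) / 2
        letter = min(
            letters,
            key=lambda ch: abs(sum(ch in p for p in candidates) - half),
        )
        positions = env.ask(letter)          # question and answer
        asked.add(letter)
        candidates = [                       # integrate the answer
            p for p in candidates if positions_of(p, letter) == positions
        ]
    return env.observe()


VOCABULARY = ["cat", "cot", "dog", "cow"]  # hypothetical phrase list
env = HangmanEnv("cat")
result = play(env, VOCABULARY)
print(result, env.turns)
```

The candidate list plays the role of the agent's current knowledge: every answer shrinks it, and the next question is chosen with respect to what remains.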

We developed a model that can be trained to do just that. At each step in completing a task, the model asks what it believes to be the most useful question, receives a response from the environment, and integrates that response with its existing knowledge. The model is a deep neural network that we trained through a combination of reinforcement learning techniques (specifically, Generalized Advantage Estimation; Schulman et al., 2016) and backpropagation. See the paper for full details.
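For readers unfamiliar with Generalized Advantage Estimation, the following sketch shows the standard recursion from Schulman et al. (2016) in plain Python. It is only an illustration of the estimator itself: the function name and the default values of `gamma` and `lam` are assumptions, not settings taken from this work, and the rewards and value estimates in the example are made up.

```python
def generalized_advantage_estimates(rewards, values, gamma=0.99, lam=0.95):
    """Compute GAE advantages for one episode.

    rewards: list of per-step rewards r_0 .. r_{T-1}
    values:  list of value estimates V(s_0) .. V(s_T); the last entry is the
             bootstrap value (0.0 if the episode ended at step T)
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD residual
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Made-up episode: three questions, each with reward 1, and a zero baseline.
print(generalized_advantage_estimates([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0],
                                      gamma=1.0, lam=1.0))
```

Setting `lam` to 0 reduces each advantage to the one-step TD residual, while `lam` equal to 1 recovers the full discounted return minus the baseline; values in between trade bias against variance.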


During training, the agent maximizes a reward which combines task-specific extrinsic rewards and a task-agnostic intrinsic reward. The extrinsic rewards encourage the agent to achieve its goal using as few questions as possible. The intrinsic reward encourages the model to ask questions which provide the most new information about the environment. Specifically, we reward each question according to how much its answer increases the similarity between the model’s belief about the world and the actual state of the world. Thus, the agent learns to efficiently form an accurate internal picture of its environment.
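A rough sketch of how such a combined reward could be computed is given below. The article does not specify the similarity measure, the weighting, or the form of the extrinsic reward, so everything here is an assumption: beliefs are per-cell probabilities over a hidden binary grid (as in War Boat), similarity is one minus the mean absolute error, and the extrinsic reward is a hypothetical fixed cost per question.

```python
def similarity(belief, truth):
    """Assumed similarity measure: 1 minus the mean absolute error between
    the believed per-cell probabilities and the true 0/1 cell values."""
    if len(belief) != len(truth):
        raise ValueError("belief and truth must have the same length")
    error = sum(abs(b - t) for b, t in zip(belief, truth)) / len(truth)
    return 1.0 - error


def intrinsic_reward(belief_before, belief_after, truth):
    """Reward a question by how much its answer moved the belief toward the truth."""
    return similarity(belief_after, truth) - similarity(belief_before, truth)


def total_reward(extrinsic, intrinsic, intrinsic_weight=1.0):
    """Combine the two signals; the weighting scheme is an assumption."""
    return extrinsic + intrinsic_weight * intrinsic


# Hypothetical four-cell grid: boats occupy the first and last cells.
truth = [1, 0, 0, 1]
before = [0.5, 0.5, 0.5, 0.5]   # nothing known yet
after = [1.0, 0.5, 0.5, 0.5]    # the agent asked about cell 0 and saw a hit

step_cost = -0.1                # hypothetical per-question extrinsic penalty
bonus = intrinsic_reward(before, after, truth)
print(bonus, total_reward(step_cost, bonus))
```

Under this sketch, a question whose answer is already known leaves the belief unchanged and earns no intrinsic reward, while still paying the per-question cost.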

Towards Artificial General Intelligence

As the demo shows, our methods produce agents that succeed across a broad range of tasks. The same approach can be applied to language, image, and strategy domains. In our tasks, the trained agents exhibit interpretable, intelligent information-seeking behaviour, often performing at super-human levels.

We believe that information seeking plays a fundamental role in General Intelligence. Our present work is a small step towards this grander goal.



  • Charan Ranganath and Gregor Rainer. Neural mechanisms for detecting and remembering novel events. Nature Reviews Neuroscience, 4(3):193–202, 2003.
  • Jürgen Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247, 2010.
  • Pierre-Yves Oudeyer, Frédéric Kaplan, and Verena V. Hafner. Intrinsic motivation systems for autonomous mental development, 2007.
  • Volodymyr Mnih, Nicolas Heess, Alex Graves, et al. Recurrent models of visual attention. In Advances in Neural Information Processing Systems (NIPS), pp. 2204–2212, 2014.
  • John Schulman, Philipp Moritz, Sergey Levine, Michael I Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations (ICLR), 2016.

Adam Trischler is a Research Scientist at Maluuba, and a Machine Learning researcher focused on Deep Learning methods for Natural Language Processing and Artificial Intelligence.

Philip Bachman is a Senior Research Scientist at Maluuba, where he plans and executes various research projects spread across Maluuba's areas of interest, and participates in the academic research community.

Alessandro Sordoni is a Research Scientist at Maluuba, interested in Natural Language Processing (NLP), Statistical Models, Machine Learning for NLP, Bayesian Probabilistic Models, Information Retrieval, Text Categorization, and Quantum Framework.

Maluuba is a Canadian Artificial Intelligence company focused on language understanding research. The company’s vision is to solve 'Artificial General Intelligence' by creating literate machines that can think, reason and communicate like humans.

Original. Reposted with permission.