How DeepMind Trains Agents to Play Any Game Without Intervention

A recent DeepMind paper proposes an architecture and training environment for generally capable agents.





Image Credit: DeepMind


 

Games have been at the center of some of the biggest deep learning breakthroughs in recent years. The Sputnik moment of deep learning and gaming came when DeepMind’s reinforcement learning agent AlphaGo beat Go world champion Lee Sedol. AlphaGo was later refined into AlphaZero, which was able to master games like chess, Go and shogi. Reinforcement learning agents have also achieved superhuman performance in games and multi-player environments such as Atari, Capture the Flag, StarCraft II, Dota 2, and Hide-and-Seek. However, in each case, the reinforcement learning agents were trained on a single game at a time. Building agents that can master multiple games at the same time without major human intervention has remained an elusive goal in the deep learning space.

Recently, DeepMind published “Open-Ended Learning Leads to Generally Capable Agents”, a research paper that details methods and processes to train reinforcement learning agents capable of mastering multiple games at the same time without human intervention. The paper represents a major step towards building more generally capable agents that can interact in real-world environments.

In essence, the DeepMind recipe for building generally capable agents is based on three intuitive building blocks:

  1. A rich universe of training tasks.
  2. A flexible architecture and training methods.
  3. A rigorous process of measuring progress.

 

A Rich Universe of Training Tasks

 
 
To learn skills that carry over across different games, DeepMind created an environment called XLand which is, essentially, a galaxy of games. In the XLand galaxy, games are positioned according to how close they are along characteristics such as cooperative or competitive dynamics. Each game can be played at different levels of complexity, which are adjusted dynamically to improve the agent’s learning.
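
The article does not describe how XLand adjusts complexity, so the following is only a minimal sketch of the general idea: raise the level when the agent succeeds too often and lower it when the agent rarely succeeds. The function name, the thresholds and the integer level scale are all assumptions made for illustration, not details from the paper.

```python
def next_complexity(level, success_rate, low=0.2, high=0.8, max_level=10):
    """Pick the complexity level for the next batch of episodes.

    All values here are hypothetical: `level` is an integer from 0 to
    `max_level`, and `success_rate` is the fraction of recent episodes
    the agent solved at the current level.
    """
    if success_rate > high:
        # The task has become too easy: make it harder, up to the cap.
        return min(level + 1, max_level)
    if success_rate < low:
        # The task is too hard to learn from: make it easier, down to 0.
        return max(level - 1, 0)
    # The agent is being challenged but still succeeding sometimes: keep it.
    return level
```

For example, an agent solving 90% of its episodes at level 3 would move to level 4, while one solving 50% would stay at level 3.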



Image Credit: DeepMind

 

A Flexible Architecture and Training Method

 
 
DeepMind’s agent architecture is based on a goal-attentive agent (GOAT) neural network that uses attention over its current state. This mechanism helps the agent focus on specific subgoals within a given game. The distribution of training tasks is selected using population based training (PBT), a DeepMind favorite that has been used in many of its reinforcement learning models. PBT adjusts the parameters of the task generation process in order to improve the agent’s learning. Training starts from scratch and gradually builds up complexity based on the agent’s progress.
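
To make the PBT idea more concrete, here is a minimal sketch of one generic exploit/explore step, in which the weakest members of a population copy the parameters of the strongest and then perturb them. It is not DeepMind’s implementation: the `Member` class, the `task_params` dictionary, the 25% cutoff and the 0.8/1.2 perturbation factors are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    """One agent in the population (hypothetical structure)."""
    name: str
    task_params: dict  # knobs of the task-generation process (illustrative)
    fitness: float     # any scalar measure of recent learning progress


def pbt_step(population, rng, fraction=0.25, factors=(0.8, 1.2)):
    """Run one exploit/explore step in place; return the names replaced."""
    ranked = sorted(population, key=lambda m: m.fitness, reverse=True)
    cutoff = max(1, int(len(ranked) * fraction))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    replaced = []
    for loser in bottom:
        if any(loser is winner for winner in top):
            continue  # population too small to separate winners from losers
        winner = rng.choice(top)
        # Exploit: start from the winner's parameters.
        # Explore: perturb each one by a randomly chosen factor.
        # (A full system would typically copy network weights as well.)
        loser.task_params = {
            key: value * rng.choice(factors)
            for key, value in winner.task_params.items()
        }
        replaced.append(loser.name)
    return replaced
```

Calling `pbt_step(population, random.Random(0))` periodically during training would keep nudging the population towards task settings that produce the most progress, under the assumptions above.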



Image Credit: DeepMind

 

Measuring Progress

 
 
Quantifying learning progress across heterogeneous tasks can be a major challenge. To address it, DeepMind normalizes the scores per task using the Nash equilibrium value computed over the current set of trained players. The evaluation then looks at different percentiles of the normalized scores, which can be compared across agents.
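
The measurement idea can be sketched in a few lines. The example below assumes the per-task reference values (the Nash equilibrium values in the paper) have already been computed and are simply passed in; it does not compute them. The function names, the linear-interpolation percentile and the chosen percentiles are illustrative assumptions.

```python
def normalize(raw_scores, reference_values):
    """Divide each task's raw score by that task's reference value.

    `reference_values` stands in for the per-task normalizer; here it is
    just a dictionary of made-up positive numbers supplied by the caller.
    """
    return {task: raw_scores[task] / reference_values[task] for task in raw_scores}


def percentile(values, p):
    """Percentile with linear interpolation, for p between 0 and 100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile() needs at least one value")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)                         # index just below the rank
    upper = min(lower + 1, len(ordered) - 1)  # index just above the rank
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def score_profile(normalized, percentiles=(10, 20, 50)):
    """Summarize an agent by several percentiles of its normalized scores."""
    values = list(normalized.values())
    return [percentile(values, p) for p in percentiles]
```

Low percentiles reflect how the agent does on its worst tasks, so two agents with the same median can still be told apart by how rarely they fail outright.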

 

The Results

 
 
DeepMind trained its generally capable agents on roughly 700,000 games across 4,000 worlds in XLand. That translated to approximately 200,000,000,000 training steps and 3,400,000 training tasks. The agents were able to master nearly every task with near zero human intervention. This demonstrates the viability of this type of approach for mastering multiple complex tasks using a single agent without human supervision. The ideas outlined in this paper might be the beginning of a new wave of reinforcement learning milestones.

 
Bio: Jesus Rodriguez is currently a CTO at Intotheblock. He is a technology expert, executive investor and startup advisor. Jesus founded Tellago, an award-winning software development firm focused on helping companies become great software organizations by leveraging new enterprise software trends.

Original. Reposted with permission.
