Gold BlogDeepMind’s MuZero is One of the Most Important Deep Learning Systems Ever Created

MuZero takes a unique approach to solve the problem of planning in deep learning models.

This year I am going to make an attempt to summarize important papers in the AI space, in very short and simple terms that can be easily understood by anyone.

I recently started a new newsletter focus on AI education and already has over 50,000 subscribers. TheSequence is a no-BS( meaning no hype, no news etc) AI-focused newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. Please give it a try by subscribing below:



“I’ve seen the future of artificial intelligence(AI) and it’s called MuZero”. Those were the words used by one of my mentors when he read the first preliminary research paper about MuZero published by DeepMind in 2019. A single deep learning model that can master games like Atari, Go, Chess or Shogi without even knowing the rules. That seems like something out of a sci-fi book. Well, that’s the essence of MuZero as described by DeepMind in a new research paper published in Nature a few weeks ago.

Conceptually, MuZero presents a solution to one of the toughest challenges in the deep learning space: planning. Since the early days of machine learning, researchers have looked at techniques that can both effectively learn a model given an environment and also plan the best course of action. Think about a self-driving car or a stock market scenario in which the rules of the environment are constantly changing. Typically, those environment has resulted incredibly challenging for planning in deep learning models. At a high level, most efforts related to planning in deep neural network fit into the following categories:

1) Lookahead Search Systems: This type of systems rely on knowledge of the environment for its planning. AlphaZero is a prominent example of models in this group. However, lookahead search techniques struggled when applied to messy environments.
2) Model-Based Systems: This type of systems try to learn a representation of the environment in order to plan. Systems such as Agent57 have been successful in this area but they can be incredibly expensive to implement.

MuZero combines ideas from both approaches but using an incredibly simple principle. Instead of trying to model the entire environment, MuZero solely focuses on its most important aspects that can drive the most useful planning decisions. Specifically, MuZero decomposes the problem in three elements critical to planning:

1) The value: how good is the current position?
2) The policy: which action is the best to take?
3) The reward: how good was the last action?

For instance, using the given position in a game, MuZero uses a representation function H to map the observations to an input embedding used by the model. Planned actions are described by a dynamic function G and a prediction function F.


The experience collected is used to train a neural network. It is important to notice that the experience includes both observations and rewards as well as the results of searches.


Using this simple idea DeepMind was able to evolve MuZero into a model able to achieve super-human performance in complex planning problems ranging from Chess to Atari. In all benchmarks, MuZero outperformed state-of-the-art reinforcement learning algorithms.

The impact of methods such as MuZero in deep learning planning is likely to be relevant for years to come. Certainly, we should keep an eye into what DeepMind is going to do next in this area.

Original. Reposted with permission.