AlphaGo Zero: The Most Significant Research Advance in AI

The previous version of AlphaGo beat the human world champion in 2016. The new AlphaGo Zero beat the previous version by 100 games to 0, and learned Go completely on its own. We examine what this means for AI.



Google DeepMind's AlphaGo Zero program recently reached a superhuman level of play without any human help, entirely through self-play! The technical details are in the Nature paper (also available as a PDF: Mastering the Game of Go without Human Knowledge).

One of the main reasons for its success was a novel form of reinforcement learning in which AlphaGo Zero learned by playing against itself.

The system starts with a neural network that knows nothing about Go beyond the rules. It plays millions of games against itself, and the network is tuned to predict the next move and the eventual winner of each game.

The updated neural network is then combined with the Monte Carlo Tree Search algorithm to create a new, stronger version of AlphaGo Zero, and the process repeats. Each iteration improves performance by only a small amount, but because the system can play millions of games a day, AlphaGo Zero surpassed thousands of years of human knowledge of Go in just 3 days.
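To make the loop concrete, here is a minimal sketch on a toy game rather than Go. It is not DeepMind's implementation: there is no neural network and no Monte Carlo Tree Search. As assumptions for illustration only, a small lookup table stands in for the network, a one-step lookahead stands in for the search, and the game is a stone-taking game (remove 1 to 3 stones, whoever takes the last stone wins). All names (`train`, `best_move`, `MAX_PILE`, and so on) are hypothetical.

```python
import random

MAX_PILE = 12   # toy game size (an assumption for this sketch)
MAX_TAKE = 3    # a move removes 1..MAX_TAKE stones; taking the last stone wins


def legal_moves(pile):
    """The rules: how many stones may be taken from this pile."""
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(value, pile):
    """One-step lookahead: leave the opponent in the worst position we know of."""
    return min(legal_moves(pile), key=lambda take: value[pile - take])


def self_play_game(value, start, rng, explore=0.2):
    """Play one game against ourselves; return the pile sizes that were visited."""
    visited, pile = [], start
    while pile > 0:
        visited.append(pile)
        if rng.random() < explore:          # occasional random move for variety
            take = rng.choice(list(legal_moves(pile)))
        else:
            take = best_move(value, pile)
        pile -= take
    return visited


def train(games=3000, step=0.5, seed=0):
    """Repeat: self-play with the current table, then nudge the table."""
    rng = random.Random(seed)
    # value[p] estimates the chance that the player to move wins with p stones.
    # Only the rules are built in: facing an empty pile means you have lost.
    value = [0.0] + [0.5] * MAX_PILE
    for g in range(games):
        start = 1 + g % MAX_PILE            # cycle through all starting piles
        for pile in self_play_game(value, start, rng):
            # Our best outcome is the opponent's worst reachable position.
            target = 1.0 - min(value[pile - take] for take in legal_moves(pile))
            value[pile] += step * (target - value[pile])
    return value
```

Calling `value = train()` and then `best_move(value, 10)` picks the move that leaves a multiple of four stones, which is the known winning strategy for this toy game. Nobody told the table that; it only ever saw the rules and its own games, which is the point of the approach at a vastly smaller scale.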

Fig. 1: The progression of AlphaGo Zero, from the DeepMind post.

This is a hugely significant advance for AI and Machine Learning research.

See also the AMA with two members of the DeepMind AlphaGo team.

Here is a slightly shortened Quora answer by Xavier Amatriain, which eloquently explains why this is so significant.

Xavier Amatriain: In my opinion, AlphaGo Zero represents the most significant research advance in AI in the last few years. It is more significant than the original AlphaGo.


Fig. 2: After 70 hours, AlphaGo Zero plays at a superhuman level. The game is disciplined and involves multiple challenges across the board. Source.


Why? The key is not that any of the components is extremely innovative (although there is definitely some smart new stuff going on), but rather the formulation of the problem itself. This is not about supervised vs. unsupervised learning. It is not even about the fact that the network learns without human intervention or examples. It is about the fact that AlphaGo Zero learned without any data!

This is a feat that cannot be overstated. We have all heard about the "Unreasonable effectiveness of data". We have all heard how data-hungry deep learning approaches are. Well, it turns out that (under some constraints) we don't need data at all! The only thing that was input into the model was the basic rules of the game, not even complex strategies or known "tricks".

Can you imagine if you could do the same thing in other domains? You specify the rules of the system, then let it generate data and learn from itself. You can stretch your mind to think about that in physical-world situations (e.g. biological systems) where you could describe the "rules of the game" and then allow the AI model to generate data and learn on its own. As a matter of fact, the whole network can also be seen as a synthetic data generation system. I would be really curious to see how AlphaGo Lee (the previous version) would perform if trained on the data generated by AlphaGo Zero.
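The "synthetic data generation" view can be illustrated with a small sketch. This is an assumption-laden toy, not anything from the paper: the game is tic-tac-toe, moves are chosen uniformly at random instead of by a learned policy, and the function names (`winner`, `play_random_game`, `generate_dataset`) are hypothetical. The only point is that the rules alone are enough to produce labeled (position, final result) pairs that a separate learner could train on.

```python
import random

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_random_game(rng):
    """Play one game with random legal moves; return (positions, result)."""
    board, player, positions = [' '] * 9, 'X', []
    while winner(board) is None and ' ' in board:
        empty = [i for i, cell in enumerate(board) if cell == ' ']
        board[rng.choice(empty)] = player
        positions.append(''.join(board))     # snapshot after each move
        player = 'O' if player == 'X' else 'X'
    return positions, winner(board) or 'draw'


def generate_dataset(games=1000, seed=0):
    """Build (position, final result) pairs using nothing but the rules above."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(games):
        positions, result = play_random_game(rng)
        dataset.extend((position, result) for position in positions)
    return dataset
```

Each record pairs a 9-character board string with 'X', 'O', or 'draw'. In a system like the one described above, the random player would be replaced by the current best model, so the quality of the generated data would improve along with the model.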

Just a few days after the paper was published, I am pretty sure there are many researchers thinking about how to apply this approach to many practical problems. I am definitely thinking about ways to apply it to medicine.

As a final note, it is funny to think that about 10 years ago some were claiming that we did not need smart algorithms and math anymore. "All you need is data", they said. While data is obviously valuable in many cases, this breakthrough clearly represents a complete change of direction. I am excited to see where it takes us.