AlphaGo is not the solution to AI

The field will be better off without an bust cycle it is important for people to keep and inform a balanced view of successes and their extent. AlphoGo might be a step forward for the AI community, but it is still no way close to the true AI.



By John Langford, Hunch.net.

Congratulations are in order for the folks at Google Deepmind who have mastered Go.

Google DeepMind AlphaGo vs Lee Sedol

However, some of the discussion around this seems like giddy overstatement. Wired says Machines have conquered the last games and Slashdot says We know now that we don’t need any big new breakthroughs to get to true AI. The truth is nowhere close.

For Go itself, it’s been well-known for a decade that Monte Carlo tree search (i.e. valuation by assuming randomized playout) is unusually effective in Go. Given this, it’s unclear that the AlphaGo algorithm extends to other board games where MCTS does not work so well. Maybe? It will be interesting to see.

Delving into existing computer games, the Atari results (see figure 3 below) are very fun but obviously unimpressive on about 1/4 of the games.

deepmind-2016-fig3

Figure 3 from DeepMind paper on Atari results. Comparison of the DQN agent with the best reinforcement learning methods in the literature. The performance of DQN is normalized with respect to a professional human games tester (that is, 100% level) and random play (that is, 0% level).

My hypothesis for why is that their solution does only local (epsilon-greedy style) exploration rather than global exploration so they can only learn policies addressing either very short credit assignment problems or with greedily accessible polices. Global exploration strategies are known to result in exponentially more efficient strategies in general for deterministic decision process(1993), Markov Decision Processes (1998), and for MDPs without modeling (2006).

The reason these strategies are not used is because they are based on tabular learning rather than function fitting. That’s why I shifted to Contextual Bandit research after the 2006 paper. We’ve learned quite a bit there, enough to start tackling a Contextual Deterministic Decision Process, but that solution is still far from practical. Addressing global exploration effectively is only one of the significant challenges between what is well known now and what needs to be addressed for what I would consider a real AI.

This is generally understood by people working on these techniques but seems to be getting lost in translation to public news reports. That’s dangerous because it leads to disappointment. The field will be better off without an overpromise/bust cycle so I would encourage people to keep and inform a balanced view of successes and their extent. Mastering Go is a great accomplishment, but it is quite far from everything.

(Editor: read Yann LeCun post about AlphaGo, and more on Yann LeCun reply here.)

Original.

Related: