An Introduction to Monte Carlo Tree Search

A great explanation of the concept behind Monte Carlo Tree Search algorithm and a brief example of how it was used at the European Space Agency for planning interplanetary flights.



Monte Carlo Tree Search
We are ready to learn how the Monte Carlo Tree Search algorithm works.

As long as we have enough information to treat child nodes as slot machines, we choose the next node (move) as we would have when solving Multi-Armed Bandit Problem. This can be done when we have some information about the pay-outs for each child node.

 

At some point we reach the node where we can no longer proceed in this manner because there is at least one node with no statistic present. It’s time to explore the tree to get new information. This can be done either completely randomly or by applying some heuristic methods when choosing child nodes (in practice this might be necessary for games with high branching factor - like chess or Go - if we want to achieve good results).

 

Finally, we arrive at a leaf. Here we can check whether we have won or lost.

 

It’s time to update the nodes we have visited on our way to the leaf. If the player making a move at the corresponding node turned out to be the winner we increase the number of wins by one. Otherwise we keep it the same. Whether we have won or not, we always increase the number of times the node was played (in the corresponding picture we can automatically deduce it from the number of loses and wins).

That’s it! We repeat this process until some condition is met: timeout is reached or the confidence intervals we mentioned earlier stabilize (convergence). Then we make our final decision based on the information we have gathered during the search. We can choose a node with the highest upper bound for pay-out (as we would have in each iteration of the Multi-Armed Bandit). Or, if you prefer, choose the one with the highest mean pay-out.

The decision is made, a move was chosen. Now it’s time for our opponent (or opponents). When they’ve finished we arrive at a new node, somewhere deeper in the tree, and the story repeats.

Not just for games

As you might have noticed, Monte Carlo Tree Search can be considered as a general technique for making decisions in perfect information scenarios. Therefore it’s use does not have to be restrained to games only. The most amazing use case I have heard of is to use it for planning interplanetary flights. You can read about it at this website but I will summarize it briefly.

Think of an interplanetary flight as of a trip during which you would like to visit more than one planet. For instance, you are flying from Earth to Jupiter via Mars.

An efficient way to do this is to make use of these planets gravitational field (like they did in ‘The Martian’ movie) so you can take less fuel with you. The question is when the best time to arrive and leave from each planets surface or orbit is (for the first and last planet it’s only leave and arrive, respectively).

You can treat this problem as a decision tree. If you divide time into intervals, at each planet you make a decision: in which time slot I should arrive and in which I should leave. Each such choice determines the next. First of all, you cannot leave before you arrive. Second - your previous choices determine how much fuel you have left and consequently - what places in the universe you can reach.

Each such set of consecutive choices determines where you arrive at the end. If you visited all required checkpoints - you’ve won. Otherwise, you’ve lost. It’s like a perfect information game. There is no opponent and you make a move by determining the timeslot for leave/arrival. This can be treated using the above Monte Carlo Tree Search. As you can read here, it fares quite well in comparison with other known approaches to this problem. And at the same time – it does not require any domain-specific knowledge to implement it.

Write your question and comments below. We’d love to hear what you think.

Original. Reposted with permission.

Related