Monte Carlo Tree Search (Tic-Tac-Toe)

Monte Carlo Tree Search (MCTS) is a decision-making algorithm that can be used to create strong AI for games. The algorithm works by building a decision tree every time it needs to make a move.

This post gives a basic outline of the algorithm and demonstrates its effectiveness by creating a Tic-Tac-Toe implementation and then creating an MCTS agent to play the game.

There are many other posts online about MCTS that do a good job of describing how MCTS works in theory. The focus of this post is on the practical implementation of an MCTS agent.

If you want to take a look at the Java code, you can download it here. You should be able to open it in NetBeans without a problem.

Advantages and Disadvantages

The advantages of MCTS are:
  • it is anytime, i.e. it can be stopped at any point and will return the best move found so far.
  • it doesn't require any domain knowledge, i.e. it doesn't need a good heuristic to work.
One disadvantage of MCTS:
  • it does not perform well in games that have trap states. For example, in chess, a single bad move can turn a winning player into a losing player.

4 Phases of MCTS

MCTS builds its decision tree by iterating over the following 4 phases (a code sketch of the loop follows the list):
  1. Selection: Starting from the root, repeatedly pick the most promising child node until we reach a node that still has unexplored children or the game is over.
  2. Expansion: If the currently selected node has unexplored child nodes, select one of them and add it to the decision tree.
  3. Rollout: Simulate the game to the end from the currently selected node, typically by playing random moves.
  4. Back Propagation: Take the rewards received by each player at the end of the game and pass those values back up the tree to update all node statistics.
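
In code, one iteration of those four phases looks roughly like the sketch below. The class and method names (MctsNode, TicTacToeGame and so on) anticipate the classes we build later in this post, but treat this as pseudocode rather than the exact downloadable code; game, root and numIterations are assumed to be in scope.

    // One hypothetical MCTS iteration, repeated numIterations times.
    for (int i = 0; i < numIterations; i++) {
        TicTacToeGame gameCopy = game.copy();   // think on a copy, never the real game
        MctsNode node = root;

        // 1. Selection: follow the selection policy down the tree,
        //    replaying each node's move on the copy as we go.
        while (node.isFullyExpanded() && !gameCopy.isOver()) {
            node = node.selectBestChild();
            gameCopy.play(node.getMoveUsedToGetToNode());
        }

        // 2. Expansion: add one unexplored child node to the tree.
        if (!gameCopy.isOver()) {
            node = node.expandRandomMove(gameCopy);
        }

        // 3. Rollout: play random moves until the game ends.
        while (!gameCopy.isOver()) {
            List<TicTacToeMove> moves = gameCopy.getAvailableMoves();
            gameCopy.play(moves.get(ThreadLocalRandom.current().nextInt(moves.size())));
        }

        // 4. Back propagation: pass the final rewards back up the tree.
        node.backpropagate(gameCopy);
    }
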

Tic-Tac-Toe Implementation

The first piece of this project is to code up the Tic-Tac-Toe game itself. I created several classes to implement it:
WinningLine: Stores a collection of points that represent a three-in-a-row.
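
The code isn't reproduced here, but a minimal version might be no more than this (the use of java.awt.Point is my assumption; in these sketches Point.x is the row and Point.y is the column):

    import java.awt.Point;
    import java.util.List;

    // Hypothetical sketch: one three-in-a-row line on the board.
    public class WinningLine {
        private final List<Point> points;

        public WinningLine(Point a, Point b, Point c) {
            points = List.of(a, b, c);
        }

        public List<Point> getPoints() { return points; }
    }
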
WinningLines: This class is responsible for creating and maintaining the list of all the winning lines on the board. It was implemented using the Singleton Pattern because our MCTS player will need to duplicate game states, but we always want to share the same set of winning lines to optimise memory and speed.
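
A sketch of how that singleton might look; the construction of the 8 lines is standard, but the exact shape of the class is my guess:

    import java.awt.Point;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the singleton that owns all 8 winning lines.
    public class WinningLines {
        private static final WinningLines INSTANCE = new WinningLines();
        private final List<WinningLine> lines = new ArrayList<>();

        private WinningLines() {
            // 3 rows and 3 columns.
            for (int i = 0; i < 3; i++) {
                lines.add(new WinningLine(new Point(i, 0), new Point(i, 1), new Point(i, 2)));
                lines.add(new WinningLine(new Point(0, i), new Point(1, i), new Point(2, i)));
            }
            // 2 diagonals.
            lines.add(new WinningLine(new Point(0, 0), new Point(1, 1), new Point(2, 2)));
            lines.add(new WinningLine(new Point(0, 2), new Point(1, 1), new Point(2, 0)));
        }

        public static WinningLines getInstance() { return INSTANCE; }
        public List<WinningLine> getLines() { return lines; }
    }
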
GameSymbols: A simple class that holds the characters used to represent the X's, O's, and empty spaces on the board.
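
Something as small as this would do (the actual characters chosen are my assumption):

    // Hypothetical sketch: the characters used on the board.
    public class GameSymbols {
        public static final char X = 'X';
        public static final char O = 'O';
        public static final char EMPTY = '-';
    }
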
TicTacToeMove: Represents a move in the game by a point and a symbol.
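
A plausible minimal version (again, the exact accessors are my guess):

    import java.awt.Point;

    // Hypothetical sketch: a move is just "this symbol at this point".
    public class TicTacToeMove {
        private final Point point;
        private final char symbol;

        public TicTacToeMove(Point point, char symbol) {
            this.point = point;
            this.symbol = symbol;
        }

        public Point getPoint() { return point; }
        public char getSymbol() { return symbol; }
    }
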
TicTacToeBoard: Contains the grid of the game and a bunch of methods to query the grid.
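
A sketch of the board as I imagine it; the copy constructor and toString() are additions the later sketches in this post rely on, not necessarily part of the original class:

    import java.awt.Point;

    // Hypothetical sketch of the board: a 3x3 grid plus query methods.
    public class TicTacToeBoard {
        private final char[][] grid;

        public TicTacToeBoard() {
            grid = new char[3][3];
            for (int row = 0; row < 3; row++)
                for (int col = 0; col < 3; col++)
                    grid[row][col] = GameSymbols.EMPTY;
        }

        // Copy constructor, used when MCTS duplicates the game state.
        public TicTacToeBoard(TicTacToeBoard other) {
            grid = new char[3][];
            for (int row = 0; row < 3; row++)
                grid[row] = other.grid[row].clone();
        }

        public char getSymbolAt(Point p) { return grid[p.x][p.y]; }

        public boolean isEmpty(Point p) { return getSymbolAt(p) == GameSymbols.EMPTY; }

        public void place(TicTacToeMove move) {
            grid[move.getPoint().x][move.getPoint().y] = move.getSymbol();
        }

        // A symbol has won if it occupies every point of some winning line.
        public boolean isWinFor(char symbol) {
            for (WinningLine line : WinningLines.getInstance().getLines()) {
                boolean allMatch = true;
                for (Point p : line.getPoints()) {
                    if (getSymbolAt(p) != symbol) { allMatch = false; break; }
                }
                if (allMatch) return true;
            }
            return false;
        }

        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder();
            for (char[] row : grid) {
                for (char symbol : row) sb.append(symbol).append(' ');
                sb.append('\n');
            }
            return sb.toString();
        }
    }
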
Reward: This class is required for the MCTS implementation. If a player wins a game, the player receives a reward of 1; if the player loses, the player receives a reward of -1. Draws give both players a reward of 0.
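
The post only describes the values, so the class might be as small as this (representing the rewards as named constants is my choice):

    // Hypothetical sketch: +1 for a win, -1 for a loss, 0 for a draw.
    public class Reward {
        public static final double WIN = 1;
        public static final double LOSS = -1;
        public static final double DRAW = 0;
    }
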
TicTacToeGame: Contains a TicTacToeBoard and also keeps track of which player should move next. It also has methods specifically required by the MCTS algorithm, such as getPlayerToMove(), getReward() and getAvailableMoves().
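
A sketch of what the MCTS-facing surface of the game might look like. The three named methods come from the description above; the copy constructor and copy() are my additions, anticipating the MCTS player's need to duplicate the game:

    import java.awt.Point;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of the game class.
    public class TicTacToeGame {
        private final TicTacToeBoard board;
        private char playerToMove;

        public TicTacToeGame() {
            board = new TicTacToeBoard();
            playerToMove = GameSymbols.X;
        }

        // Copy constructor: MCTS needs to duplicate the game to "think".
        public TicTacToeGame(TicTacToeGame other) {
            board = new TicTacToeBoard(other.board);
            playerToMove = other.playerToMove;
        }

        public TicTacToeGame copy() { return new TicTacToeGame(this); }

        public char getPlayerToMove() { return playerToMove; }

        public TicTacToeBoard getBoard() { return board; }

        public List<TicTacToeMove> getAvailableMoves() {
            List<TicTacToeMove> moves = new ArrayList<>();
            if (board.isWinFor(GameSymbols.X) || board.isWinFor(GameSymbols.O)) {
                return moves;   // no more moves once somebody has won
            }
            for (int row = 0; row < 3; row++)
                for (int col = 0; col < 3; col++)
                    if (board.isEmpty(new Point(row, col)))
                        moves.add(new TicTacToeMove(new Point(row, col), playerToMove));
            return moves;
        }

        public void play(TicTacToeMove move) {
            board.place(move);
            playerToMove = (playerToMove == GameSymbols.X) ? GameSymbols.O : GameSymbols.X;
        }

        public boolean isOver() { return getAvailableMoves().isEmpty(); }

        public double getReward(char player) {
            if (board.isWinFor(player)) return Reward.WIN;
            char opponent = (player == GameSymbols.X) ? GameSymbols.O : GameSymbols.X;
            if (board.isWinFor(opponent)) return Reward.LOSS;
            return Reward.DRAW;
        }
    }
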
TicTacToePlayer: An interface that will allow us to add different types of players into our game.
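
Since this interface is the seam that lets us plug different players into the game, it is worth sketching. Presumably it needs little more than one method (the name is my guess):

    // Hypothetical sketch: any player just produces a move for a game.
    public interface TicTacToePlayer {
        TicTacToeMove getMove(TicTacToeGame game);
    }
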

Adding a human player

We just created a ton of code, so before we continue we should have some way of making sure that our Tic-Tac-Toe code actually works. Unit tests are a good way of doing this, but unit testing all that code is beyond the scope of this post. Instead, we will allow a human to interact with the game.
BufferedReaderPlayer: This will allow human players to play the game using the command line.
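
A sketch of how such a player might work: print a numbered menu of the legal moves and let the human pick one. The exact prompt and error handling are my own choices:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.util.List;

    // Hypothetical sketch of the command-line player.
    public class BufferedReaderPlayer implements TicTacToePlayer {
        private final BufferedReader reader =
                new BufferedReader(new InputStreamReader(System.in));

        @Override
        public TicTacToeMove getMove(TicTacToeGame game) {
            List<TicTacToeMove> moves = game.getAvailableMoves();
            for (int i = 0; i < moves.size(); i++) {
                System.out.println(i + ": " + moves.get(i).getPoint());
            }
            System.out.print("Choose a move: ");
            try {
                int choice = Integer.parseInt(reader.readLine().trim());
                return moves.get(choice);
            } catch (IOException | NumberFormatException e) {
                throw new RuntimeException("Could not read a move from the console", e);
            }
        }
    }
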
Now that we have our player, we can create a new Main class and add the following lines to play a game:
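
The original listing isn't shown above, but a Main along these lines would work with the sketches in this post (printing the board from the game loop is my own choice, to match the behaviour described next):

    // Hypothetical Main class: two human players on the same console.
    public class Main {
        public static void main(String[] args) {
            TicTacToeGame game = new TicTacToeGame();
            TicTacToePlayer playerX = new BufferedReaderPlayer();
            TicTacToePlayer playerO = new BufferedReaderPlayer();
            while (!game.isOver()) {
                TicTacToePlayer current =
                        (game.getPlayerToMove() == GameSymbols.X) ? playerX : playerO;
                game.play(current.getMove(game));
                System.out.println(game.getBoard());   // show the result of the move
            }
        }
    }
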
When we run it, it should present us with a menu of options and then print the board once we have chosen our option. Our Tic-Tac-Toe game is now complete, but we still need to add an MCTS player.

Adding an MCTS player

Now the fun begins - we can finally start implementing our MCTS agent. Before we get to the MCTS player itself, we need one more thing: a node object that can be used to build our decision tree.
MctsNode: This class is the building block of our decision tree. The class contains many fields, but they are all necessary for the MCTS algorithm.
  • parent: keeps track of the parent node so that we can perform the backpropagation step of MCTS.
  • numSimulations, reward, children, player: are all necessary because they are used in the formula for the selection policy.
  • unexploredMoves: we keep track of unexplored moves because we need them for the expansion part of the algorithm.
  • moveUsedToGetToNode: this is an interesting one. MCTS builds its decision tree by performing moves on the game. Imagine if a chess player moved all the pieces around the board every time they were thinking about a move - it would be a mess, and they might not even remember where the pieces were before they started. Just as chess players must think through their moves in their heads, MCTS must make its moves on a copy of the game instead of the original. At each node, the algorithm could store a full copy of the game, or it could store just the move used to get to the node from its parent. Storing only the move is generally more memory efficient.
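
The post doesn't reproduce the class, so here is a minimal sketch of what MctsNode might look like. The field names come from the list above; the method bodies, and the use of the standard UCT formula as the selection policy, are my own assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Hypothetical sketch of the tree node.
    public class MctsNode {
        private static final double EXPLORATION = Math.sqrt(2);

        private final MctsNode parent;
        private final TicTacToeMove moveUsedToGetToNode;   // null for the root
        private final char player;                          // the player who made that move
        private final List<MctsNode> children = new ArrayList<>();
        private final List<TicTacToeMove> unexploredMoves;
        private double reward = 0;
        private int numSimulations = 0;

        public MctsNode(MctsNode parent, TicTacToeMove move, char player,
                        List<TicTacToeMove> unexploredMoves) {
            this.parent = parent;
            this.moveUsedToGetToNode = move;
            this.player = player;
            this.unexploredMoves = new ArrayList<>(unexploredMoves);
        }

        public boolean isFullyExpanded() { return unexploredMoves.isEmpty(); }
        public TicTacToeMove getMoveUsedToGetToNode() { return moveUsedToGetToNode; }

        // UCT: average reward (exploitation) plus a bonus for rarely
        // visited nodes (exploration).
        private double uctValue() {
            if (numSimulations == 0) return Double.POSITIVE_INFINITY;
            return reward / numSimulations
                    + EXPLORATION * Math.sqrt(Math.log(parent.numSimulations) / numSimulations);
        }

        public MctsNode selectBestChild() {
            MctsNode best = null;
            for (MctsNode child : children) {
                if (best == null || child.uctValue() > best.uctValue()) best = child;
            }
            return best;
        }

        // Expansion: pick an unexplored move, play it on the copy of the
        // game, and add the resulting child node to the tree.
        public MctsNode expandRandomMove(TicTacToeGame gameCopy) {
            TicTacToeMove move = unexploredMoves.remove(
                    ThreadLocalRandom.current().nextInt(unexploredMoves.size()));
            gameCopy.play(move);
            MctsNode child = new MctsNode(this, move, move.getSymbol(),
                    gameCopy.getAvailableMoves());
            children.add(child);
            return child;
        }

        // Back propagation: each node accumulates the reward of the player
        // who moved into it, so selection always maximises from the point
        // of view of the player about to move.
        public void backpropagate(TicTacToeGame finishedGame) {
            for (MctsNode node = this; node != null; node = node.parent) {
                node.numSimulations++;
                node.reward += finishedGame.getReward(node.player);
            }
        }

        public MctsNode getMostVisitedChild() {
            MctsNode best = null;
            for (MctsNode child : children) {
                if (best == null || child.numSimulations > best.numSimulations) best = child;
            }
            return best;
        }
    }
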
MctsTicTacToePlayer: I mentioned at the beginning of this post that MCTS is an iterative algorithm, so we need to know when to stop iterating. Our MctsTicTacToePlayer takes an int in its constructor, which is the number of iterations to perform before returning a move. As mentioned in the description of MctsNode, when the algorithm is "thinking" about its best move it needs to make moves on a copy of the game, so at the beginning of each iteration we start by creating a copy of the game. As we traverse the tree with the select method, we make sure to replay the same moves that were used to get to those nodes of the tree.
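
A sketch of how the player might look, reusing the node methods from the MctsNode sketch above (all names and bodies remain my assumptions rather than the downloadable code):

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    // Hypothetical sketch of the MCTS agent.
    public class MctsTicTacToePlayer implements TicTacToePlayer {
        private final int numIterations;

        public MctsTicTacToePlayer(int numIterations) {
            this.numIterations = numIterations;
        }

        @Override
        public TicTacToeMove getMove(TicTacToeGame game) {
            // The root represents the current, real position.
            MctsNode root = new MctsNode(null, null, game.getPlayerToMove(),
                    game.getAvailableMoves());
            for (int i = 0; i < numIterations; i++) {
                runIteration(root, game.copy());   // every iteration thinks on a fresh copy
            }
            // Commit to the most explored move at the root.
            return root.getMostVisitedChild().getMoveUsedToGetToNode();
        }

        // The four MCTS phases, as sketched earlier in this post.
        private void runIteration(MctsNode root, TicTacToeGame gameCopy) {
            MctsNode node = root;
            while (node.isFullyExpanded() && !gameCopy.isOver()) {   // selection
                node = node.selectBestChild();
                gameCopy.play(node.getMoveUsedToGetToNode());
            }
            if (!gameCopy.isOver()) {                                // expansion
                node = node.expandRandomMove(gameCopy);
            }
            while (!gameCopy.isOver()) {                             // rollout
                List<TicTacToeMove> moves = gameCopy.getAvailableMoves();
                gameCopy.play(moves.get(ThreadLocalRandom.current().nextInt(moves.size())));
            }
            node.backpropagate(gameCopy);                            // back propagation
        }
    }

Returning the most-visited root child, rather than the one with the highest average reward, is a common and robust choice for the final move.
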
That's it! We're done. It only took us 2 classes to implement MCTS. All the other classes created in this post were related to the Tic-Tac-Toe implementation.

We still have one problem though - how do we know how many simulations to use for the MctsTicTacToePlayer? Well, we know that in Tic-Tac-Toe, if both players play perfectly, then neither player should lose.

We don't want to use too many simulations, otherwise our MCTS player will take too long to make a move; we don't want to use too few, otherwise the MCTS player won't always play perfectly.

I have found that around 5000 simulations is a good amount - you can always increase this later if you are betting someone money on the game.

Fine tuning the MCTS player

How did I find that 5000 was a good amount? I wrote a bunch of unit tests using JUnit! The following unit tests all have for loops inside them to test the MCTS agent multiple times, because the MCTS agent is non-deterministic - i.e. it doesn't always do the same thing.
At the top of our test file, we create a field that holds the number of simulations our agent will use in each test:
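
A hypothetical version of that test class skeleton (the class and field names are my own; the imports cover the test sketches that follow):

    import java.awt.Point;
    import java.util.Set;
    import org.junit.Test;
    import static org.junit.Assert.*;

    public class MctsTicTacToePlayerTest {

        // The number of MCTS iterations each test gives the agent.
        private static final int NUM_SIMULATIONS = 5000;

        // ... the tests below go here ...
    }
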
The most obvious thing to check is that our agent should stop an immediate loss when presented with one. So we set up some board states where our player is put into this situation and then we get a move from our agent. We then check that this move was the move that stops the loss.
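
The original snippet isn't shown, but a hypothetical version of such a test, with a board position of my own choosing, could be:

    // O threatens to complete the middle row, and X has no immediate win,
    // so X's only correct move is to block at (1, 2).
    @Test
    public void blocksAnImmediateLoss() {
        for (int i = 0; i < 100; i++) {   // repeat: the agent is non-deterministic
            TicTacToeGame game = new TicTacToeGame();
            game.play(new TicTacToeMove(new Point(0, 0), GameSymbols.X));
            game.play(new TicTacToeMove(new Point(1, 0), GameSymbols.O));
            game.play(new TicTacToeMove(new Point(2, 2), GameSymbols.X));
            game.play(new TicTacToeMove(new Point(1, 1), GameSymbols.O));

            TicTacToeMove move = new MctsTicTacToePlayer(NUM_SIMULATIONS).getMove(game);
            assertEquals(new Point(1, 2), move.getPoint());
        }
    }
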
Something more interesting to check is that our player takes a trick win (a fork) when presented with one.
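
A hypothetical version, using a position where the forking move is also forced (the set-up is my own example):

    // X can fork at (0, 0), creating two threats at once (column 0 and the
    // main diagonal). (0, 0) also blocks O's row-0 threat, so it is the
    // only correct move here.
    @Test
    public void takesATrickWin() {
        for (int i = 0; i < 100; i++) {
            TicTacToeGame game = new TicTacToeGame();
            game.play(new TicTacToeMove(new Point(1, 1), GameSymbols.X));
            game.play(new TicTacToeMove(new Point(0, 1), GameSymbols.O));
            game.play(new TicTacToeMove(new Point(2, 0), GameSymbols.X));
            game.play(new TicTacToeMove(new Point(0, 2), GameSymbols.O));

            TicTacToeMove move = new MctsTicTacToePlayer(NUM_SIMULATIONS).getMove(game);
            assertEquals(new Point(0, 0), move.getPoint());
        }
    }
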
These initial tests can probably pass quite easily with a smaller number of simulations. This is because the board already has some moves made on it, so the decision tree that needs to be built is smaller and less exploring is needed.

In Tic-Tac-Toe, the following position and its reflections are a win for the X player.

     . | O | .
    ---+---+---
     . | X | .
    ---+---+---
     . | . | .

(X in the centre, O on the top edge; X to move.)

This is because X can play at row 2, col 0, forcing the O player to play at row 0, col 2.

     . | O | O
    ---+---+---
     . | X | .
    ---+---+---
     X | . | .

Now X has a trick win by playing at row 0, col 0: this blocks O's row-0 threat while creating two threats at once (column 0 and the main diagonal), and O can only block one of them.

This test should require more simulations than the previous tests in order to pass 100 times.
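
A hypothetical version of the test. From this position, by symmetry (and a bit of analysis), every corner starts a forced win for X, so the sketch accepts any of the four:

    // From X in the centre and O on the top edge, X should play a corner.
    @Test
    public void findsTheForcedWin() {
        Set<Point> corners = Set.of(new Point(0, 0), new Point(0, 2),
                                    new Point(2, 0), new Point(2, 2));
        for (int i = 0; i < 100; i++) {
            TicTacToeGame game = new TicTacToeGame();
            game.play(new TicTacToeMove(new Point(1, 1), GameSymbols.X));
            game.play(new TicTacToeMove(new Point(0, 1), GameSymbols.O));

            TicTacToeMove move = new MctsTicTacToePlayer(NUM_SIMULATIONS).getMove(game);
            assertTrue(corners.contains(move.getPoint()));
        }
    }
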
Now, given the previous tests, it makes sense that if our MCTS player is the O player, it should never move on the red squares (the four edge squares) when X moves in the center.

     . | * | .
    ---+---+---
     * | X | *
    ---+---+---
     . | * | .

(* marks the red squares: the four edge squares.)

This time, there is only one move on the board when we ask our agent to make a decision. This means the decision tree that has to be built is even larger, so this test needs even more simulations in order to pass 100 times. You'll notice that this test actually runs 100000 trials instead of 100. This is because I found that, very rarely, it would not pass: when run with 10000 trials it would fail about 9 times for a certain number of simulations. That means a run of only 100 trials could pass without revealing that the agent will still lose some games - and we always want to know when our agent will play incorrectly. This made me up the number of trials to 100000. This test takes about an hour to run.
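
A hypothetical version of that test:

    // X opens in the centre; O must not reply on an edge (a red square).
    // Run a huge number of trials to catch the rare mistake described above.
    @Test
    public void neverAnswersTheCentreWithAnEdge() {
        Set<Point> edges = Set.of(new Point(0, 1), new Point(1, 0),
                                  new Point(1, 2), new Point(2, 1));
        for (int i = 0; i < 100000; i++) {
            TicTacToeGame game = new TicTacToeGame();
            game.play(new TicTacToeMove(new Point(1, 1), GameSymbols.X));

            TicTacToeMove move = new MctsTicTacToePlayer(NUM_SIMULATIONS).getMove(game);
            assertFalse(edges.contains(move.getPoint()));
        }
    }
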
At that point, I couldn't think of any more of these targeted tests to write, so I decided to test that the MCTS player never loses. A game between two perfect players should always be a draw.
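
A hypothetical version of that self-play test:

    // Two MCTS players should always draw against each other.
    @Test
    public void mctsNeverLosesToItself() {
        for (int i = 0; i < 100; i++) {
            TicTacToeGame game = new TicTacToeGame();
            TicTacToePlayer x = new MctsTicTacToePlayer(NUM_SIMULATIONS);
            TicTacToePlayer o = new MctsTicTacToePlayer(NUM_SIMULATIONS);
            while (!game.isOver()) {
                TicTacToePlayer current =
                        (game.getPlayerToMove() == GameSymbols.X) ? x : o;
                game.play(current.getMove(game));
            }
            assertEquals(Reward.DRAW, game.getReward(GameSymbols.X), 0.0);
        }
    }
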

Playing MCTS against Humans

The whole point of creating this agent was so that we could show off how good it is! Our current TicTacToeGame doesn't print out the board after each player has made a move. This is probably for the best - otherwise our console would be cluttered with boards every time our MCTS agent called the play() method during its simulations.

LoggingTicTacToeGame: extends our TicTacToeGame so that it prints out the board every time a player makes a move.
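
A minimal sketch, assuming TicTacToeGame exposes its board via getBoard() and that play() is overridable:

    // Hypothetical sketch: print the board after every move.
    public class LoggingTicTacToeGame extends TicTacToeGame {
        @Override
        public void play(TicTacToeMove move) {
            super.play(move);
            System.out.println(getBoard());
        }
    }

Note that in the TicTacToeGame sketch above, copy() returns a plain TicTacToeGame, so the copies the MCTS agent thinks on stay silent even when the real game is a LoggingTicTacToeGame.
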
Now we can change our Main class to have this code to play against our MCTS player:
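
A hypothetical version, with the human as X and the agent as O:

    // Main class: a human plays against the MCTS agent.
    public class Main {
        public static void main(String[] args) {
            TicTacToeGame game = new LoggingTicTacToeGame();
            TicTacToePlayer human = new BufferedReaderPlayer();
            TicTacToePlayer mcts = new MctsTicTacToePlayer(5000);
            while (!game.isOver()) {
                TicTacToePlayer current =
                        (game.getPlayerToMove() == GameSymbols.X) ? human : mcts;
                game.play(current.getMove(game));
            }
        }
    }
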

Conclusion

The nice thing about this algorithm is that it comes with a built-in difficulty picker: if we want it to be easier to play against, we just decrease the number of simulations; to make it more difficult, we increase the number of simulations. This concludes our implementation of the Tic-Tac-Toe game and the MCTS player. In the future I may make a visualizer for the human player that will allow them to click on the screen instead of having to use the console to play. I may also adapt our MCTS player to play Connect 4 or some other game.