A survey of existing Monte Carlo Tree Searchmethods – with applications in game playing AI.Sabarinath MohandasTVE16CSCE11Department of Computer Science and EngineeringCollege of Engineering TrivandrumDr Abdul NizarProfessorCollege of Engineering TrivandrumAbstract–Monte Carlo tree search (MCTS) is a heuristicsearch method that is used to efficiently search decision trees.The method is particularly efficient in searching trees with ahigh branching factor. MCTS has a number of advantagesover traditional tree search algorithms like simplicity, adaptabilityetc. This paper is a survey of existing literature onMCTS. It intends to provide an overview about MCTS, itsadvantages and disadvantages, and the many enhancementsand variations done on it, especially with regard to gameplaying.Index Terms–Artificial intelligence (AI), game playing,Monte Carlo tree search (MCTS), genetic algorithms (GA),gomoku, Connect-5.1 IntroductionA decision tree is a commonly used decision supporttool that uses a tree-like graph, showing the current state,the decisions that can be taken, their possible outcomes, resourcecosts and utility. The number of decisions that haveto be evaluated to find the best one depends on the complexityof the problem at hand. For very complex problems thedecision tree has a large number of possible actions. Most ofthese paths are low reward actions. The intuition of humanmind is very efficient in such scenarios, wherein all possibilitiesare quickly evaluated, and low reward paths are promptlyeliminated. This allows a human to concentrate on the fewremaining high reward paths. Machine A.I, on the other handlacks this intuition ability, forcing it to evaluate each and everypath before deciding which one is the best.An example is shown below, which is a directed graphcontaining all possible moves from each position, whosenodes are positions in a game and whose edges are moves. Itis a type of Markov Decision Process (S, A, R, d, g) whereinS – State, A – Action, R – a Reward Function (S x A), d – aTransition function and g – a Discount Factor.Fig. 1. Sample game tree of tic-tac-toe2 MCTS – Monte Carlo Tree SearchMonte Carlo tree search (MCTS) combines the precisionof tree search with the generality of random sampling.It has received considerable interest lately due to its spectacularsuccess in the difficult problem of computer Go. It hasalso proved beneficial in a range of other domains. MCTSis ideal both deterministic as well as stochastic games whereheuristic search meets its limits. The basic idea of MCTS isto view the action selection in a state as a multiarmed banditproblem and sample the outcome in a Monte Carlo fashion.Fig. 2. Stages of Monte Carlo Tree SearchFigure above shows the general process of MCTS. Firsta selection strategy is applied until a leaf node is reached.The stored tree is then expanded by adding a new node tothis leaf. Its quality is estimated by performing a rolloutand propagating the achieved outcome as an estimate for allnodes among the selected path. Those nodes store the numberof updates and the average estimate, to trade off explorationof new nodes with exploitation of already promisingnodes. Practically, depth limits are used if it is too expensiveto compute rollouts till a terminal state. A benefit ofMCTS is that the values of intermediate states do not have tobe evaluated (as in minimax search). Only the value of theterminal node at the end of each simulation is required.Searching game-tree always brings up the question ofexploration vs exploitation. I.e upon finding a suitable node,should resources be spend to search for a better node; or isit better to repeatedly keep on selecting this node itself. Thisis analogous to the problem in which a gambler at a row ofslot machines has to decide how many times to play eachmachine. The Upper Confidence Bounds (UCB) value is anideal method to used in an AI agent to balance exploration vsexploitation. The UCB value is calculated as vi + C * sqrt(ln N / ni ), where vi is the estimate value, ni is the numberof times the node has been visited, N is the total numberof times its parent has been visited and C is a tunable biasparameter.3 Literature Review3.1 Searching for Solutions in Games and Artificial IntelligenceDr L.V Allis 1 had introduced the proof-number search(p-n search) method in his doctoral thesis in 1994. The gametree of a two-person, zero-sum, perfect information game ismapped to an andor tree. Maximizing nodes become ORnodes,minimizing nodes are mapped to AND-nodes. For allnodes proof and disproof numbers are stored, and updatedduring the search. Maximizing nodes become OR-nodes,minimizing nodes are mapped to AND-nodes. For all nodes,proof and disproof numbers are stored, and updated duringthe search. A proof number represents the minimum numberof leaf nodes which have to be proved in order to prove thenode. A disproof number represents the minimum numberof leaves which have to be disproved in order to disprove thenode. Because the goal of the tree is to prove a forced win,winning nodes are regarded as proved. Therefore, they haveproof number 0 and disproof number . Lost or drawn nodesare regarded as disproved. They have proof number and disproofnumber 0. The proof number of an internal AND nodeis equal to the sum of its children’s proof numbers, since toprove an AND node all the children have to be proved. Thedisproof number of an AND node is equal to the minimumof its children’s disproof numbers. The disproof number ofan internal OR node is equal to the sum of its children’s disproofnumbers, since to disprove an OR node all the childrenhave to be disproved.3.2 Two-Stage Monte Carlo Tree Search for Connect6Shi-Jim Yen et al2 introduced a new MCTS variantcalled Kavalan for playing Connect6. It uses a two-stageMCTS as shown below.Fig. 3. Two stage MCTS algorithmThe first stage focuses on threat space search (TSS),which is designed to solve the sudden-death problem (theproblem when an immediate move is the best one leading tovictory straightaway, but it is not found out by the heuristicsearch). Threat Space Search (TSS) – uses the concept ofthreat (defined in L.V Allis 1 paper.) and aims to generate alarger number of threats than the number of legal stones thedefender can play. This ensures the defender cannot blockvictory for the attacker. This method can significantly decreasethe searching states, and quickly identifies a possiblewinning move for the present position. A double-threat TSSis introduced wherein the player finds double-threat moves.This double move generation is done using connection pattern.A conservative defense, is a method Defender uses toblock double-threat moves. A double-threat TSS with thiskind of defense is called a conservative threat space search(CTSS). This approach is based on having Defender placestones on all the cells in normal defense moves. For thedouble-threat TSS in Connect6, this study proposes an algorithmcalled iterative threat space search (ITSS) whichcombines normal TSS with conservative threat space search(CTSS).Consider the attacker wishes to play an offensive moveA. The attacker wishes to evaluate if A would lead to a surewin. But he cannot search the game tree because its too largeand he does not know the exact defensive move the defenderwould play. To reduce the search space, the attacker considersthat the defender has played all the defensive moves possible.Making this assumption enables the attacker to searchsearching for an offensive move from the list of defensivepositions that the defending player is forced to play from, asshown in the below figure.Fig. 4. Eg of ITSS finding double-threat solutionAn important concept of using relevance-zone searchto accelerate identifying winning and losing moves is introduced.The relevance zone concept being, trying to combinedefensive and offensive moves together. Consider acase wherein the opponent player B has a CTSS solution thatis guaranteed to win. Player A now does a relevance zonesearch in an attempt to defend against B.3.3 Evolving Gomoku Solver by Genetic AlgorithmJunruWang et al3 have introduced a framework for applyinggenetic algorithm(GA) to games. GA performs searchin a way similar to the natural selection process. Each individualstate is coded as a set of genes. Individual state evolveby crossing-over with other individuals and by mutating theirgenome. The fitness function assesses the quality of a candidate.The possible layouts are listed into 12 types as shownbelow along with their values.Fig. 5. Layout Patterns and their valuesAnother aspect considered is to favor early victories.When five in a row occurs, subsequent steps become redundant.One percent of the population involves a mutated genotype.The best individual in each generation (i.e. iteration)is kept, and the genetic algorithm converges if the first genotypeof these best individuals stabilizes for five consecutivegenerations.The game considered in the paper, Gomoku, is a livestrategical game and short response time is crucial. It isfound that the GA based solver converges quickly. All testsrun converged quickly, within seven runs.3.4 MCTS-Minmax HybridsHendrik Baier and Mark H. M.Winands have introduceda hybrid approach for MCTS, that integrate shallow minimaxsearches into the MCTS framework. Normal MCTSruns a higher risk, than full-width minimax search, of missingindividual moves and falling into traps. This is observablein games such as Chess and Checkers in which the traditionalapproach to adversarial planning, minimax searchwith pruning, remains superior. In trap situations such asthose frequent in Chess, precise tactical play is required toavoid immediate loss. MCTS methods based on samplingcould easily miss a crucial move or underestimate the significanceof an encountered terminal state due to averaging valuebackups. Conversely, MCTS could be more effective in domainssuch as Go, where terminal states and potential trapsdo not occur until the latest stage of the game. This problemis not present in games like Go, because in regions ofa search space containing no or very few terminal positions,shallow traps should be rare and MCTS variants should makecomparatively better. The paper outlines three approaches ofusing minimax, in the selection/ expansion phase, the rolloutphase, and the backpropagation phase of MCTS.A. Minimax in the Rollout Phase. While uniformly randommove choices in the rollout are sufficient to guaranteethe convergence of MCTS to the optimal policy, more informedrollout strategies typically greatly improve performance.For this reason, it seems natural to use fixed-depthminimax searches for choosing rollout moves. The two notabledifference from MCTS are1) A minimax search is started to find the first rollout move.Consider there are two moves represented by child nodes Aand B. If the opponent has a winning answer to move A,move B is chosen instead in this example.2)After B is choosen, another minimax search is conductedfor this second rollout move. In this case, no terminal statesare found and a random move choice will be made.Fig. 6. Minimax in the Rollout PhaseB. Minimax in the Selection and Expansion Phases.This strategy can use a variety of possible criteria to choosewhether or not to trigger a minimax search at any state encounteredduring the traversal. It could be e.g a minimaxsearch as soon as a state has reached a given number of visits,or starting a minimax search for a loss as soon as a givennumber of moves from a state have already been proven to belosses. This strategy improves MCTS search by performingshallow-depth, full-width checks of the immediate descendantsof a subset of tree nodes.Fig. 7. Minimax in the Selection PhaseC. Minimax in the Backpropagation Phase. This methodemploys shallow minimax searches, actively searching forproven losses instead of hoping for the MCTSSolver to findthem in future simulations. If minimax succeeds at provingall moves from a given state to be losses, we can backpropagatea proven loss instead of just a loss to the next-highesttree level.Fig. 8. Minimax in the Backpropogation PhaseThe following results were obtained from their experimentation.The baseline MCTS-Solver scales better due tobeing faster and building larger trees. The larger trees couldhelp avoid deeper traps than the MCTS-MR rollouts can detect.The branching factor has a strong effect on MCTSMR.Deeper minimax searches in MCTS rollouts scale evenworse.The three knowledge-free approaches of integratingminimax into MCTS were examined: the application of minimaxin the rollout phase with MCTS-MR, the selectionand expansion phases with MCTS-MS, and the backpropagationphase with MCTS-MB. The newly proposed variantMCTS-MS significantly outperformed regular MCTS withthe MCTS-Solver extension in games like Connect-4.4 ConclusionsMCTS has been shown to have many advantages.MCTS offers significant advantages over other methods likealpha-beta pruning. MCTS does not require an explicit evaluationfunction. The flexibility of MCTS allows it to be hybridizedwith a range of other techniques. MCTS can beemployed to any problem just by having the state, transitionsand win conditions. I.e, using MCTS, effective gameplaycan be obtained with no knowledge of a game beyondits rules. MCTS can be further enhanced easily by incorporatinghuman knowledge, machine learning or other heuristicapproaches. MCTS tree search reduces branching factorand concentrates on more promising subtrees, thus results aremuch better than classical algorithms.A disadvantage is that, in cases where a single branchleads to a single loss node, MCTS may not always find it.Because this is not easily found at random, the search maynot take it into account. For video game and real-time controlapplications, a systematic way to incorporate knowledge isrequired in order to restrict the subtree to be searched.References1 Louis Victor Allis, “Searching for Solutions in Gamesand Artificial Intelligence” in Doctoral Thesis, Universityof Limburg, Maastricht, The Netherlands.2 Shi-Jim Yen and Jung-Kuei Yang, “Two-Stage MonteCarlo Tree Search for Connect6,” IEEE Transactionson Computational Intelligence and AI in Games, Vol.3, No. 2, June 20113 Junru Wang and Lan Huangb, “Evolving GomokuSolver by Genetic Algorithm,” 2014 IEEE Workshopon Advanced Research and Technology in Industry Applications(WARTIA)4 Cameron B. Browne and Edward Powley, “A Survey ofMonte Carlo Tree Search Methods,” IEEE Transactionson Computational Intelligence and AI in Games, Vol. 4,No. 1, March 2012