Four phases

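Each Monte Carlo Tree Search (MCTS) iteration: Selection (walk down the tree, choosing children by UCB), Expansion (add a new child node), Simulation (random rollout to a terminal state), Backpropagation (update visit and win statistics along the path back to the root).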
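
The four phases can be sketched on a toy game. This is a minimal illustration, not a reference implementation: the game (a Nim variant where players take 1 to 3 stones and whoever takes the last stone wins), the Node class, the mcts function, and the values c=1.4 and iterations=2000 are all assumptions chosen for the example.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones      # stones left for the player about to move
        self.parent = parent
        self.move = move          # move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0           # wins for the player who moved INTO this node

def ucb(node, c=1.4):
    # win_rate + c * sqrt(ln(parent_visits) / node_visits)
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=ucb)
        # 2. Expansion: add one child for a move not tried yet.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout. mover_wins tracks whether the last
        #    stone was taken by the player who moved into `node`.
        stones, mover_wins = node.stones, True
        while stones > 0:
            stones -= rng.randint(1, min(3, stones))
            mover_wins = not mover_wins
        # 4. Backpropagation: flip the result at each level, since the
        #    players alternate on the way back up to the root.
        result = 1.0 if mover_wins else 0.0
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # Recommend the most-visited move at the root.
    return max(root.children, key=lambda n: n.visits).move

if __name__ == "__main__":
    print(mcts(5))
```

In this game the winning strategy is to leave the opponent a multiple of 4 stones, so from a pile of 5 the search should come to favor taking 1 stone as the iteration count grows.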

UCB1 formula

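UCB(node) = win_rate + c · √(ln(parent_visits) / node_visits). The first term favors nodes that have won often (exploitation); the second grows for nodes visited rarely relative to their parent (exploration). The constant c sets the trade-off. Unvisited nodes (node_visits = 0) are tried first.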
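
A short worked example of the formula, as a sketch: the function name ucb1, its parameter names, and the default c = √2 (a commonly used value) are assumptions for illustration, as is the convention of scoring unvisited nodes as infinity so they are tried first.

```python
import math

def ucb1(wins, node_visits, parent_visits, c=math.sqrt(2)):
    """Score a child node: win rate plus an exploration bonus."""
    if node_visits == 0:
        return math.inf  # assumed convention: always try unvisited nodes first
    win_rate = wins / node_visits
    exploration = c * math.sqrt(math.log(parent_visits) / node_visits)
    return win_rate + exploration

if __name__ == "__main__":
    # Same 60% win rate, but the second node has half the visits,
    # so its exploration bonus (and total score) is larger.
    print(ucb1(wins=6, node_visits=10, parent_visits=100))
    print(ucb1(wins=3, node_visits=5, parent_visits=100))
```

Setting c=0 reduces the score to the plain win rate, which makes the search purely greedy; larger c spends more visits on rarely tried nodes.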

vs Minimax

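Minimax: optimal play when searched to terminal states; a depth-limited search is only as good as its evaluation function. MCTS: needs no evaluation function, estimates move values from simulated playouts. Anytime: can be stopped at any point and returns the best move found so far.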

AlphaGo

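MCTS + policy and value neural networks. The policy network suggests which moves to search, the value network evaluates positions, and MCTS refines both into a move choice. Beat Lee Sedol 4-1 in March 2016.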