Stochastic Games in Artificial Intelligence
Many unforeseeable external occurrences can place us in unforeseen circumstances in real life. Many games, such as dice tossing, have a random element to reflect this unpredictability. These are known as stochastic games. Backgammon is a classic game that mixes skill and luck. The legal moves are determined by rolling dice at the start of each player’s turn white, for example, has rolled a 6–5 and has four alternative moves in the backgammon scenario shown in the figure below.
This is a standard backgammon position. The object of the game is to get all of one’s pieces off the board as quickly as possible. White moves in a clockwise direction toward 25, while Black moves in a counterclockwise direction toward 0. Unless there are many opponent pieces, a piece can advance to any position; if there is only one opponent, it is caught and must start over. White has rolled a 6–5 and must pick between four valid moves: (5–10,5–11), (5–11,19–24), (5–10,10–16), and (5–11,11–16), where the notation (5–11,11–16) denotes moving one piece from position 5 to 11 and then another from 11 to 16.
Stochastic game tree for a backgammon position
White knows his or her own legal moves, but he or she has no idea how Black will roll, and thus has no idea what Black’s legal moves will be. That means White won’t be able to build a normal game tree-like in chess or tic-tac-toe. In backgammon, in addition to M A X and M I N nodes, a game tree must include chance nodes. The figure below depicts chance nodes as circles. The possible dice rolls are indicated by the branches leading from each chance node; each branch is labelled with the roll and its probability. There are 36 different ways to roll two dice, each equally likely, yet there are only 21 distinct rolls because a 6–5 is the same as a 5–6. P (1–1) = 1/36 because each of the six doubles (1–1 through 6–6) has a probability of 1/36. Each of the other 15 rolls has a 1/18 chance of happening.
The following phase is to learn how to make good decisions. Obviously, we want to choose the move that will put us in the best position. Positions, on the other hand, do not have specific minimum and maximum values. Instead, we can only compute a position’s anticipated value, which is the average of all potential outcomes of the chance nodes.
As a result, we can generalize the deterministic minimax value to an expected-minimax value for games with chance nodes. Terminal nodes, MAX and MIN nodes (for which the dice roll is known), and MAX and MIN nodes (for which the dice roll is unknown) all function as before. We compute the expected value for chance nodes, which is the sum of all outcomes, weighted by the probability of each chance action.
where r is a possible dice roll (or other random events) and RESULT(s,r) denotes the same state as s, but with the addition that the dice roll’s result is r.