Subgame Perfect Equilibrium

Machine Learning, a subset of artificial intelligence, has revolutionized various industries by enabling computers to learn and make decisions without explicit programming. One important concept in game theory and machine learning is the subgame perfect equilibrium.

Subgame perfect equilibrium is a solution concept for multi-stage games. It requires each player’s strategy to remain optimal at every stage of the game, including stages that are never actually reached. Machine learning algorithms can be applied to analyze and predict player behaviour in multi-stage games, which can support the search for such equilibria and inform decision-making in fields such as economics, politics, and even sports.

Subgame Perfect Nash Equilibrium

Subgame Perfect Nash Equilibrium, as the name suggests, is a refinement of the Nash equilibrium in which the players’ strategies are optimal not only in the game as a whole but also in every subgame that may arise during play. In simple words, at any point in the game, the players’ behaviour from that point onward forms a Nash equilibrium of the continuation game, i.e., the subgame, no matter what the earlier actions were.

Every finite extensive-form game with perfect recall has a subgame perfect Nash equilibrium, possibly in mixed strategies.

Perfect recall is an assertion that each player is allowed by the rules of the game to remember everything he knew at previous moves and all of his choices at those moves.

Key terms in game theory

Below are some of the important concepts in game theory that you need to be aware of before diving into subgame perfect equilibrium.

Nash Equilibrium: A Nash equilibrium is a strategy profile in which no player can improve their payoff by unilaterally changing strategy, given the strategies of the other players. Each player knows the others’ strategies, and nobody has a reason to deviate.
Utility: Utility is generally defined as the payoff of individual actions, and it can be transferable or non-transferable. Under transferable utility, the payoff to a coalition can be freely redistributed among its members, whereas under non-transferable utility, the payoff of each player in the coalition is pre-determined, so the value of the coalition cannot be described by a single function.
Coalitional Game theory: Coalitional game theory focuses on what groups of agents can achieve rather than on individual agents. Because the coalition is treated as a whole, its payoff may be freely redistributed among its members. This transferable-utility assumption is satisfied whenever there is a universal currency used for exchange in the system.

What is a Subgame?

A subgame is a smaller game contained in a bigger game: a portion of an extensive-form game that can be treated as a game in its own right. A node x initiates a subgame when x is the only node in its information set and every information set that contains a successor of x contains only successors of x.

Below is the representation of a game where all nodes initiate a subgame.

At the initial node a, Player 1 moves, and the chosen action leads to either node b or node c, where Player 2 moves.
At b, Player 2’s action leads to either d or e. At c, Player 2 can either move to f or end the game.
Finally, at any of d, e, or f, Player 1 moves again.

The payoffs in a game are determined by the outcome of the game for each player. In extensive-form games, payoffs are typically written as pairs of numbers, with the first number representing the payoff to Player 1 and the second number representing the payoff to Player 2. Under backward induction, the payoffs for each subgame are determined by working backward from the end of the game and considering the optimal strategy of the player who moves at each decision point. This involves comparing the payoffs associated with the available actions and selecting the action that maximizes the mover’s payoff at each stage of the game.

Using backward induction, the players will take the following actions in each subgame:

Subgame for actions p and q: Player 1 will take action p with payoff (3, -1) to maximize Player 1’s payoff, so the payoff for action W becomes (3, -1).
Subgame for actions r and s: Player 1 will take action s with payoff (4, 2) to maximize Player 1’s payoff, so the payoff for action X becomes (4, 2).
Subgame for actions t and u: Player 1 will take action u with payoff (7, 2) to maximize Player 1’s payoff, so the payoff for action Y becomes (7, 2).
Subgame for actions W and X: Player 2 compares the second entries of (3, -1) and (4, 2) and will take action X, since 2 > -1, so the payoff for action M becomes (4, 2).
Subgame for actions Y and Z: Player 2 will take action Y to maximize Player 2’s payoff, so the payoff for action N becomes (7, 2).
Subgame for actions M and N: Player 1 compares the first entries of (4, 2) and (7, 2) and will take action N, since 7 > 4.

The process involves identifying the subgames within the larger game and then determining the optimal strategy in each subgame using backward induction. This allows the players to make the best decision at each stage of the game while taking into account the future actions of their opponents.
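The backward induction procedure described above can be sketched as a short recursion. This is a minimal illustration rather than a reference implementation: the nested-dictionary tree format, the function name backward_induction, and the payoffs in toy_game are all assumptions made up for this example and are not taken from the game discussed above. Players are indexed from 0 in the code, so index 0 plays the role of Player 1.

```python
# Minimal backward-induction sketch for a finite perfect-information game.
# Tree format (an assumption for this example):
#   terminal node -> tuple of payoffs, one entry per player
#   decision node -> {"player": index, "actions": {label: subtree}}

def backward_induction(node, name="root", plan=None):
    """Return (payoffs, plan); plan maps each decision node to its chosen action."""
    if plan is None:
        plan = {}
    if isinstance(node, tuple):          # terminal node: payoffs are given
        return node, plan
    mover = node["player"]
    best_action, best_payoffs = None, None
    for action, child in node["actions"].items():
        payoffs, _ = backward_induction(child, f"{name}/{action}", plan)
        # Keep the action that maximizes the mover's own payoff
        # (ties are broken in favour of the first action listed).
        if best_payoffs is None or payoffs[mover] > best_payoffs[mover]:
            best_action, best_payoffs = action, payoffs
    plan[name] = best_action
    return best_payoffs, plan

# Hypothetical two-stage game: player 0 moves first, then player 1.
toy_game = {
    "player": 0,
    "actions": {
        "L": {"player": 1, "actions": {"l": (2, 1), "r": (0, 0)}},
        "R": {"player": 1, "actions": {"l": (1, 2), "r": (3, 1)}},
    },
}

payoffs, plan = backward_induction(toy_game)
print(payoffs)  # (2, 1)
print(plan)     # {'root/L': 'l', 'root/R': 'l', 'root': 'L'}
```

Note that the returned plan records a choice at every decision node, including the node after R that is never reached. Specifying behaviour off the path of play is what distinguishes a full strategy from a mere outcome.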

How to check for Subgame Perfect Equilibrium?

To check for a Subgame Perfect Equilibrium (referred to as SPE from now on), we have to validate locally, subgame by subgame, whether the action each player takes is consistent with a Nash equilibrium.

To do this, we have to identify the decisions that can be analyzed independently of what happens in the rest of the game.

The following points need to be kept in mind while checking for SPE.

A decision can be analyzed independently of the rest of the game if and only if the subset of the game it belongs to is itself a complete game.
Like any sequential game, a subgame starts with a single initial decision node.
If a node is part of the subgame, all of its successors must also be part of the subgame.
The payoff of a strategy can only be evaluated at the terminal nodes, so the subgame must extend all the way to them.
If a node in the subgame belongs to an information set that contains another node, that other node must also be part of the subgame, because the decisions at the two nodes cannot be taken separately.
Each game has at least one subgame, i.e., the game tree itself.

Therefore, a Nash equilibrium is a subgame perfect equilibrium if and only if it induces a Nash equilibrium in every subgame of the game.
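The subgame conditions listed above can be checked mechanically. The sketch below is one possible way to do it, assuming a hypothetical data layout in which children maps each node to its successors and info_set maps each decision node to the label of its information set. The node names and labels are invented for illustration.

```python
# Sketch: find the nodes that initiate a subgame in an extensive-form game.
# The data layout (children / info_set dictionaries) is hypothetical.

def descendants(children, x):
    """All nodes strictly below x."""
    found, stack = set(), list(children.get(x, []))
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(children.get(node, []))
    return found

def subgame_roots(children, info_set):
    """Decision nodes x such that x and its successors form a subgame."""
    members = {}
    for node, label in info_set.items():
        members.setdefault(label, set()).add(node)
    roots = []
    for x in info_set:                       # only decision nodes are candidates
        if len(members[info_set[x]]) != 1:   # x must be alone in its information set
            continue
        inside = descendants(children, x) | {x}
        # No information set may contain nodes both inside and outside the subtree.
        if all(members[info_set[y]] <= inside for y in inside if y in info_set):
            roots.append(x)
    return roots

children = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}

# Case 1: Player 2 observes Player 1's move, so a and b are separate information sets.
observed = {"root": "h0", "a": "h1", "b": "h2"}
# Case 2: Player 2 does not observe the move, so a and b share one information set.
unobserved = {"root": "h0", "a": "h1", "b": "h1"}

print(subgame_roots(children, observed))    # ['root', 'a', 'b']
print(subgame_roots(children, unobserved))  # ['root']
```

In the second case the only subgame is the whole game, so subgame perfection adds nothing beyond Nash equilibrium there.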

Let us consider the below example.

Player 1: A or B
    A -> Player 2: X or Y
        X -> (3, 2)
        Y -> (1, 1)
    B -> Player 2: Z or W
        Z -> (2, 3)
        W -> (0, 0)

The game starts at the initial node, and Player 1 has two possible actions: A or B. If Player 1 chooses action A, the game proceeds to a new decision node where Player 2 can choose between X or Y. If Player 1 chooses action B, the game proceeds to a new decision node where Player 2 can choose between Z or W. The payoffs for each player at the terminal nodes are as follows:

If Player 1 chooses A and Player 2 chooses X, the payoffs are (3, 2).
If Player 1 chooses A and Player 2 chooses Y, the payoffs are (1, 1).
If Player 1 chooses B and Player 2 chooses Z, the payoffs are (2, 3).
If Player 1 chooses B and Player 2 chooses W, the payoffs are (0, 0).

Now, a strategy profile is a subgame perfect equilibrium if the strategies it prescribes constitute a Nash equilibrium within every subgame, so we check each subgame in turn.

Subgame 1: The decision node reached after Player 1’s action A, where Player 2 chooses X or Y.

If Player 1 chooses A, Player 2’s best response is to choose X, as it leads to a higher payoff for Player 2 (2) compared to Y (1).
Only Player 2 moves in this subgame, so once Player 2 chooses X there is no profitable deviation left within it.

Hence, X is the Nash equilibrium of Subgame 1, and the path (A, X) yields the payoffs (3, 2).

Subgame 2: The decision node reached after Player 1’s action B, where Player 2 chooses Z or W.

If Player 1 chooses B, Player 2’s best response is to choose Z, as it leads to a higher payoff for Player 2 (3) compared to W (0).
Only Player 2 moves in this subgame, so once Player 2 chooses Z there is no profitable deviation left within it.

Therefore, Z is the Nash equilibrium of Subgame 2, and the path (B, Z) yields the payoffs (2, 3).

The subgame perfect equilibrium for this game is as follows:

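Player 1 chooses action A, because anticipating X after A gives Player 1 a payoff of 3, whereas anticipating Z after B gives only 2.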
In Subgame 1, Player 2 chooses action X.
In Subgame 2, Player 2 chooses action Z.
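To see why subgame perfection is a genuine refinement, the example can also be checked by brute force. The sketch below enumerates every pure strategy profile of the game above, keeps those that are Nash equilibria of the whole game, and then keeps only those in which Player 2’s reply is optimal in both subgames. The payoffs are the ones given in the example, while the helper names (outcome, is_nash, is_subgame_perfect) are hypothetical and chosen for this illustration.

```python
from itertools import product

# Terminal payoffs from the example: (Player 1's move, Player 2's reply) -> (u1, u2).
payoff = {("A", "X"): (3, 2), ("A", "Y"): (1, 1),
          ("B", "Z"): (2, 3), ("B", "W"): (0, 0)}
replies = {"A": ["X", "Y"], "B": ["Z", "W"]}

def outcome(p1_move, p2_plan):
    """p2_plan maps each move of Player 1 to Player 2's reply."""
    return payoff[(p1_move, p2_plan[p1_move])]

# A full strategy for Player 2 names a reply after A and a reply after B.
p2_plans = [dict(zip(replies, combo)) for combo in product(*replies.values())]

def is_nash(p1_move, p2_plan):
    """No player gains by deviating unilaterally in the whole game."""
    u1, u2 = outcome(p1_move, p2_plan)
    p1_ok = all(outcome(m, p2_plan)[0] <= u1 for m in replies)
    p2_ok = all(outcome(p1_move, alt)[1] <= u2 for alt in p2_plans)
    return p1_ok and p2_ok

def is_subgame_perfect(p1_move, p2_plan):
    """Nash in the whole game and optimal for Player 2 in both subgames."""
    p2_optimal = all(
        payoff[(m, p2_plan[m])][1] == max(payoff[(m, r)][1] for r in replies[m])
        for m in replies
    )
    return p2_optimal and is_nash(p1_move, p2_plan)

nash = [(m, p) for m in replies for p in p2_plans if is_nash(m, p)]
spe = [(m, p) for (m, p) in nash if is_subgame_perfect(m, p)]

for move, plan in nash:
    print(move, plan, "SPE" if (move, plan) in spe else "not subgame perfect")
```

Under these payoffs the enumeration finds three Nash equilibria, and only the profile in which Player 1 plays A and Player 2 replies X after A and Z after B is subgame perfect. The other two rely on a reply (W after B, or Y after A) that Player 2 would not actually choose if that node were reached.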

Applications of Subgame Perfect Equilibrium

SPE has multiple applications; a few of them are listed below.

Finitely repeated Prisoner’s dilemma

The Prisoner’s Dilemma involves two suspects who are arrested while the authorities lack sufficient evidence to convict them on a greater charge. During interrogation, each suspect can either cooperate with the other by remaining silent or betray the other by confessing. If both keep quiet, both face only milder charges because of the lack of evidence. The payoffs are structured in such a way that each suspect has an incentive to betray the other, even though both would be better off if they both remained silent.

If both suspects choose to cooperate, they receive a moderate sentence due to lack of evidence (let’s say 2 years each). If one suspect betrays the other while the other cooperates, the betrayer goes free (0 years) and the cooperator receives a severe sentence (let’s say 5 years). If both betray each other, they both receive a moderately severe sentence (let’s say 4 years each).

The payoffs in this game are represented by the number of years of imprisonment for each suspect based on their combined choices.

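Using backward induction, the last round is an ordinary one-shot Prisoner’s Dilemma, whose unique Nash equilibrium is for both players to betray (defect). Since play in the last round is therefore fixed regardless of what happened earlier, the second-to-last round cannot influence the future and is also played as a one-shot game. Repeating this argument, the players betray each other in every round to maximize their single-period payoffs.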
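The argument can be traced in code. The sketch below uses the sentence lengths given above, negated so that a larger payoff is better, and solves the repeated game from the last round backward. The function names pure_nash and solve_repeated are hypothetical, and the approach assumes the stage game has exactly one pure-strategy Nash equilibrium, which holds for these numbers.

```python
from itertools import product

ACTIONS = ["cooperate", "betray"]
# Years in prison from the text, negated so that larger payoffs are better.
YEARS = {("cooperate", "cooperate"): (2, 2), ("cooperate", "betray"): (5, 0),
         ("betray", "cooperate"): (0, 5), ("betray", "betray"): (4, 4)}
STAGE = {profile: (-a, -b) for profile, (a, b) in YEARS.items()}

def pure_nash(game):
    """All pure-strategy Nash equilibria of a two-player game given as a dict."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        u1, u2 = game[(a1, a2)]
        if (all(game[(d, a2)][0] <= u1 for d in ACTIONS)
                and all(game[(a1, d)][1] <= u2 for d in ACTIONS)):
            equilibria.append((a1, a2))
    return equilibria

def solve_repeated(rounds):
    """Backward induction over a finitely repeated game with a unique stage equilibrium."""
    continuation = (0, 0)   # payoff from the rounds after the current one
    path = []
    for _ in range(rounds):              # start from the last round and move backward
        # Later play is already pinned down, so it adds the same constant to every cell.
        game = {p: (u1 + continuation[0], u2 + continuation[1])
                for p, (u1, u2) in STAGE.items()}
        (profile,) = pure_nash(game)     # unpacking fails if the equilibrium is not unique
        continuation = game[profile]
        path.append(profile)
    return path, continuation

print(pure_nash(STAGE))      # [('betray', 'betray')]
path, total = solve_repeated(3)
print(path)                  # three rounds of ('betray', 'betray')
print(total)                 # (-12, -12), i.e. 12 years each
```

Adding the same continuation payoff to every cell never changes which cell is an equilibrium, which is why every round collapses to the one-shot game.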

Ultimatum game

In an ultimatum game, Player 1 proposes how to split 1000 units between the two players, and Player 2 can either accept or reject the proposal. A proposal has the form (x, 1000-x), where x is Player 1’s share and 1000-x is Player 2’s share, for example (100, 900), ..., (800, 200), (900, 100). If Player 2 rejects, both players get nothing. If Player 2 accepts, the payoffs are (x, 1000-x).

The payoffs associated with each outcome can be represented at the terminal nodes. For example, if the sum to be split is $100 and Player 1 proposes to keep 80% of the money and offers 20% to Player 2, and Player 2 accepts, the payoffs are $80 for Player 1 and $20 for Player 2. If Player 2 rejects the offer, both players receive $0.

In the Ultimatum Game, a subgame perfect equilibrium occurs when both players are playing a Nash equilibrium in every subgame of the original game. Each possible proposal starts a subgame in which Player 2 decides whether to accept. In this case, the subgame perfect equilibrium involves Player 2 accepting any positive offer to maximize their own payoff, and Player 1 making the offer that maximizes their payoff given that response.

For example, if the sum of money is $100 and offers must be made in whole dollars, a subgame perfect equilibrium has Player 1 offering $1 and Player 2 accepting. In this case, Player 1 maximizes their payoff by offering the smallest amount that Player 2 is willing to accept, and Player 2 maximizes their payoff by accepting any positive offer.
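The same reasoning can be sketched for the 1000-unit version of the game. The code below is an illustration under stated assumptions: proposals are restricted to the grid (100, 900), ..., (900, 100) mentioned earlier, Player 2 is assumed to accept exactly when the offered share is positive, and the function names are hypothetical.

```python
TOTAL = 1000
# Proposals (x, TOTAL - x) on the grid used in the text: Player 1 keeps x.
PROPOSALS = [(x, TOTAL - x) for x in range(100, 1000, 100)]

def responder_accepts(share):
    """Player 2's subgame: accepting yields `share`, rejecting yields 0."""
    return share > 0

def proposer_payoff(proposal):
    """Player 1's payoff once Player 2's reply to this proposal is anticipated."""
    keep, offer = proposal
    return keep if responder_accepts(offer) else 0

# Backward induction: solve Player 2's decision first, then Player 1's.
best = max(PROPOSALS, key=proposer_payoff)
print(best)  # (900, 100)
```

On this grid Player 1 keeps 900 and offers 100, the smallest share available, because every proposal on the grid leaves Player 2 with a positive amount and is therefore accepted.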

Conclusion

SPE is a refinement of Nash equilibrium. In games of perfect information, the Nash equilibrium obtained through backward induction is the Subgame Perfect Equilibrium. One first considers the last move of the game and identifies the action the final mover would take in each case to maximize their utility, i.e., their ordinal preferences over the choice set. Taking that final action as given, one then considers the second-to-last move and again picks the action that maximizes that mover’s utility, and the process is repeated until one reaches the first move of the game.

However, backward induction cannot be applied to games of imperfect or incomplete information, as this would entail cutting through non-singleton information sets.
