EXAMPLE 7 Prisoners’ Dilemma

The Prisoners’ Dilemma is a two-person variable-sum game. It provides a simple explanation of the forces at work behind arms races, price wars, and failures to agree on environmental regulations. In these and other similar situations, the players can do better by cooperating. But there may be no compelling reasons for them to do so unless there are credible threats of retaliation for not cooperating. The name Prisoners’ Dilemma was first given to this game by Princeton mathematician Albert W. Tucker (1905–1994) in 1950.

Before defining the formal game, we introduce it through a story.

Prisoners’ Dilemma STORY

The Prisoners’ Dilemma involves two people, each accused of the same crime, who are separated so that they cannot speak to one another (to corroborate their stories). Each prisoner has two choices: to maintain his or her innocence or to sign a confession accusing the partner of being the mastermind behind the crime. It is in each suspect’s interest to confess and implicate the partner, in an effort to receive a reduced sentence. Yet if both suspects confess, they ensure a bad outcome—namely, they are both found guilty. What is good for the prisoners as a pair—to deny having committed the crime, leaving the state with insufficient evidence to convict them—is frustrated by their pursuit of their own individual interests.

The game of Prisoners’ Dilemma, as we already noted, has many applications, but we will use it here to model a recurrent problem in international relations: arms races between antagonistic countries, which earlier included the superpowers of the United States and the Soviet Union but more recently have included such countries as India and Pakistan and Israel and some of its Arab neighbors. Other countries, such as Iran, may be antagonistic to more than one other country (e.g., Israel and the United States).

For simplicity, assume there are two nations, Red and Blue. Each can independently select one of two policies:

  • A: Arm in preparation for a possible war (noncooperation).
  • D: Disarm, or at least try to negotiate an arms-control agreement (cooperation).

There are four possible outcomes:

  • (D, D): Red and Blue disarm, which is next best for both because, while advantageous to each, it also entails certain risks.
  • (A, A): Red and Blue arm, which is next worst for both because they spend needlessly on arms and are comparatively no better off than at (D, D).
  • (A, D): Red arms and Blue disarms, which is best for Red and worst for Blue because Red gains a big edge over Blue.
  • (D, A): Red disarms and Blue arms, which is worst for Red and best for Blue because Blue gains a big edge over Red.


This situation can be modeled by means of the matrix in Table 15.8, which gives the possible outcomes. Here, Red’s choice involves picking one of the two rows, whereas Blue’s choice involves picking one of the two columns.

TABLE 15.8 The Outcomes in an Arms Race, as Modeled by the Prisoners’ Dilemma

                        Blue
                  A              D
Red    A     Arms race      Favors Red
       D     Favors Blue    Disarmament

We assume that the players can rank the four outcomes from best to worst, where 4 = best, 3 = next best, 2 = next worst, and 1 = worst; thus, the higher the number, the greater the payoff. The resulting game is an ordinal game: It indicates an ordering of outcomes from best to worst but says nothing about the degree to which a player prefers one outcome over another. To illustrate, if a player despises the outcome that he or she ranks 1 but sees little difference among the outcomes ranked 4, 3, and 2, the “payoff distance” between 4 and 2 will be less than that between 2 and 1, even though the numerical difference between 4 and 2 is greater.

Self Check 6

Return to the payoff matrix in Table 15.3 from the restricted location/schedule game between Mark and Lisa. Rewrite the payoff matrix using ordinal payoffs from Mark’s perspective. How does this compare to rewriting the payoff matrix using ordinal payoffs from Lisa’s perspective?

  • Table 15.3 is rewritten with ordinal payoffs from Mark’s perspective in the payoff matrix to the left and from Lisa’s perspective in the payoff matrix to the right:

    Mark’s perspective:      Lisa’s perspective:
        2   1                    5   6
        4   5                    3   2
        6   3                    1   4

    For a fixed row and a fixed column, the entries of the payoff matrices sum to 7.
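The sum-to-7 observation can be verified cell by cell; a minimal Python sketch, assuming the six rows of numbers above form Mark’s matrix followed by Lisa’s (the variable names are mine, not the text’s):

```python
# Mark's and Lisa's ordinal payoffs for the 3 x 2 game (assumed layout).
mark = [[2, 1],
        [4, 5],
        [6, 3]]
lisa = [[5, 6],
        [3, 2],
        [1, 4]]

# With six outcomes ranked 1-6 by each player, the two rankings here are
# exact reverses of each other, so each pair of entries sums to 6 + 1 = 7.
for i in range(3):
    for j in range(2):
        assert mark[i][j] + lisa[i][j] == 7
print("every cell sums to 7")
```

That the rankings are complementary reflects the total-conflict character of the location/schedule game: what Mark ranks higher, Lisa ranks correspondingly lower.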

The ordinal payoffs to the players for choosing their strategies of A and D are shown in Table 15.9, where the first number in the pair indicates the payoff to the row player (Red), and the second number the payoff to the column player (Blue). Thus, for example, the pair (1, 4) in the second row and first column signifies a payoff of 1 (worst outcome) to Red and a payoff of 4 (best outcome) to Blue. This outcome occurs when Red unilaterally disarms while Blue continues to arm, making Blue, in a sense, the winner and Red the loser.

TABLE 15.9 Ordinal Payoffs in an Arms Race, as Modeled by the Prisoners’ Dilemma

                      Blue
                 A           D
Red    A      (2, 2)      (4, 1)
       D      (1, 4)      (3, 3)


Let’s examine this strategic situation more closely. Should Red select Strategy A or Strategy D? There are two cases to consider, which depend on what Blue does:

  • If Blue selects A: Red will receive a payoff of 2 for A and 1 for D, so it will choose A.
  • If Blue selects D: Red will receive a payoff of 4 for A and 3 for D, so it will choose A.

In both cases, Red’s first strategy (A) gives it a more desirable outcome than its second strategy (D). Consequently, we say that A is Red’s dominant strategy because it is always advantageous for Red to choose A over D.

In the Prisoners’ Dilemma, A dominates D for Red, so we presume that a rational Red would choose A. A similar argument leads Blue to choose A as well—that is, to pursue a policy of arming. Thus, when each nation strives to maximize its own payoffs independently, the pair is driven to the outcome (A, A), with payoffs of (2, 2). The better outcome for both, (D, D), with payoffs of (3, 3), appears unobtainable when this game is played noncooperatively.
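The case analysis above can be carried out mechanically. A short Python sketch of the dominance check for the payoffs in Table 15.9 (the dictionary layout and function names are my own, not the text’s):

```python
# Map each (Red strategy, Blue strategy) pair to its ordinal payoff pair
# (Red's payoff, Blue's payoff), as in Table 15.9.
payoffs = {
    ('A', 'A'): (2, 2), ('A', 'D'): (4, 1),
    ('D', 'A'): (1, 4), ('D', 'D'): (3, 3),
}

def a_dominates_for_red():
    # A strictly dominates D for Red: A does better against every Blue choice.
    return all(payoffs[('A', b)][0] > payoffs[('D', b)][0] for b in 'AD')

def a_dominates_for_blue():
    # By symmetry, the same check on Blue's (second) payoff.
    return all(payoffs[(r, 'A')][1] > payoffs[(r, 'D')][1] for r in 'AD')

print(a_dominates_for_red(), a_dominates_for_blue())  # prints: True True
```

Both checks succeed, confirming that arming is dominant for each nation regardless of what the other does.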

The outcome (A, A), which is the product of dominant strategy choices by both players in the Prisoners’ Dilemma, is called a Nash equilibrium.

Nash Equilibrium DEFINITION

When no player can benefit by departing unilaterally (by itself) from its strategy associated with an outcome, the strategies of the players constitute a Nash equilibrium. Technically, while it is the set of strategies that defines the equilibrium, the choice of strategies leads to an outcome that we shall also refer to as the equilibrium.

Note that in the Prisoners’ Dilemma, if either player departs from (A, A), the payoff for the departing player who switches to D drops from 2 to 1 at (D, A) and (A, D). Not only is there no benefit from departing, but there is actually a loss, with the player selecting D being punished with its worst payoff of 1. These losses would presumably deter each nation from moving away from the Nash equilibrium of (A, A), assuming that the other nation sticks to A.

Even if both nations agreed in advance jointly to pursue the socially beneficial outcome, (D, D), in which both nations disarm and receive payoff 3, the result is unstable: If either nation alone reneges on the agreement and secretly arms (as North Korea did when it developed nuclear weapons), it will benefit, obtaining its best payoff of 4. Consequently, each nation would be tempted to go back on its word and select A. Nations with no great confidence in the trustworthiness of their opponents have good reason to try to protect themselves against the other side’s defection from an agreement by arming.

Prisoners’ Dilemma DEFINITION

The Prisoners’ Dilemma is a two-person variable-sum game in which each player has two strategies, cooperate or defect (not cooperate). Defect dominates cooperate for both players, even though the mutual-defection outcome, which is the unique Nash equilibrium in the game, is worse for both players than the mutual-cooperation outcome.


Note that if 4, 3, 2, and 1 in the Prisoners’ Dilemma were not just ranks but numerical payoffs, their sum would be 2 + 2 = 4 at the mutual-defection outcome and 3 + 3 = 6 at the mutual-cooperation outcome. At the other two outcomes, the sum, 4 + 1 = 5, is still different, illustrating why the Prisoners’ Dilemma is a variable-sum game.
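A quick check of that arithmetic, treating the ranks as numerical payoffs as the paragraph does (the names are mine):

```python
# Payoffs from Table 15.9; summing each pair shows the total varies by outcome.
payoffs = {
    ('A', 'A'): (2, 2), ('A', 'D'): (4, 1),
    ('D', 'A'): (1, 4), ('D', 'D'): (3, 3),
}
sums = {cell: p[0] + p[1] for cell, p in payoffs.items()}
print(sums)  # sums of 4, 5, 5, and 6 -- not constant, hence variable-sum
```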

In real life, of course, people often manage to escape the noncooperative Nash equilibrium in the Prisoners’ Dilemma. Either the game is played within a larger context, in which other incentives are at work, such as cultural norms that prescribe cooperation [though this is just another way of saying that defection from (D, D) is not rational, rendering the game not the Prisoners’ Dilemma], or the game is played on a repeated basis—it is not a one-shot affair—so players can induce cooperation by setting a pattern of rewards for cooperation and penalties for noncooperation.

In a repeated game, factors like reputation and trust may play a role. Realizing the mutual advantages of cooperation in costly arms races, players may inch toward the cooperative outcome by slowly phasing down their acquisition of weapons over time, or even destroying them. (The United States and Russia have been doing exactly this.) They may also initiate other productive measures, such as improving their communication channels, making inspection procedures more reliable, writing agreements that are truly enforceable, or imposing penalties for violators when their violations are detected (as has occurred through reconnaissance or spy satellites).

The Prisoners’ Dilemma illustrates the intractable nature of certain competitive situations that blend conflict and cooperation. The standoff that results at the Nash equilibrium of (2, 2) is obviously not as good for the players as that which they could achieve by cooperating—but they risk a good deal if the other player defects.

While saddlepoints are Nash equilibria in total-conflict games, they can never be worse for both players than some other outcome (as in partial-conflict games like the Prisoners’ Dilemma). The reason is that if one player does worse in a total-conflict or zero-sum game, the other player must do better.

The fact that the players must forsake their dominant strategies to achieve the (3, 3) cooperative outcome (see Table 15.9) makes this outcome a difficult one to sustain in one-shot play. On the other hand, assume that the players can threaten each other with a policy of tit-for-tat in repeated play: “I’ll cooperate on each round unless you defect, in which case I will defect until you start cooperating again.” If these threats are credible, the players may well shun their defect strategies and try to establish a pattern of cooperation in early rounds, thereby fostering the choice of (3, 3) in the future.
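Repeated play with tit-for-tat can be simulated in a few lines. The sketch below uses 'C' = cooperate (disarm) and 'D' = defect (arm) with the same ordinal payoffs as before; the function names are my own, not the text’s:

```python
# Payoffs per round: (player 1's payoff, player 2's payoff).
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (1, 4),
          ('D', 'C'): (4, 1), ('D', 'D'): (2, 2)}

def tit_for_tat(opponent_history):
    # Cooperate on the first round, then copy the opponent's last move.
    return 'C' if not opponent_history else opponent_history[-1]

def always_defect(opponent_history):
    return 'D'

def play(strategy_a, strategy_b, rounds=5):
    hist_a, hist_b = [], []          # moves made so far by each player
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_b)       # each strategy sees the other's history
        b = strategy_b(hist_a)
        pa, pb = PAYOFF[(a, b)]
        score_a, score_b = score_a + pa, score_b + pb
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # prints: (15, 15)
print(play(tit_for_tat, always_defect))  # prints: (9, 12)
```

Against itself, tit-for-tat sustains the (3, 3) outcome on every round; against an unconditional defector it is exploited only once, after which both players are locked into the (2, 2) standoff.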