Lecture no. 11
Degree program: Informatica
Course: Sistemi multi-agente
Silvia Rossi (silrossi@unina.it)
A.A. 2014-2015

Computing Equilibria !
(MAS: 4.1, 4.5)
Computing NE for zero sum games !
In the LP below, s_2 and U*_1 are the variables:

  minimize U*_1
  subject to  Σ_{k ∈ A_2} u_1(a_1^j, a_2^k) · s_2^k ≤ U*_1   for all j ∈ A_1
              Σ_{k ∈ A_2} s_2^k = 1
              s_2^k ≥ 0   for all k ∈ A_2

The pure strategies attaining the value U*_1 are in the best-response set.
Computing NE for zero sum games !
In the dual LP, s_1 and U*_1 are the variables:

  maximize U*_1
  subject to  Σ_{j ∈ A_1} u_1(a_1^j, a_2^k) · s_1^j ≥ U*_1   for all k ∈ A_2
              Σ_{j ∈ A_1} s_1^j = 1
              s_1^j ≥ 0   for all j ∈ A_1

The pure strategies attaining the value U*_1 are in the best-response set.
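The LPs above need a solver in general; as a self-contained illustration, here is the special case of a 2x2 zero-sum game, where the value has a closed form once pure-strategy saddle points are ruled out (the function name and matrix convention are ours, not from the text):

```python
def zero_sum_value_2x2(A):
    """Value of a 2x2 zero-sum game for the row (maximizing) player.

    A[i][j] is the row player's payoff when row plays i and column plays j.
    """
    # Pure-strategy saddle point: maximin equals minimax.
    maximin = max(min(row) for row in A)
    minimax = min(max(A[0][j], A[1][j]) for j in range(2))
    if maximin == minimax:
        return maximin
    a, b = A[0]
    c, d = A[1]
    # Fully mixed equilibrium: the mixture equalizes the opponent's responses.
    return (a * d - b * c) / (a + d - b - c)
```

For matching pennies, `zero_sum_value_2x2([[1, -1], [-1, 1]])` gives the game value 0.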
The problem of finding a Nash equilibrium of a two-player, general-sum game cannot be formulated as a linear program:
• The two players’ interests are no longer diametrically opposed.
• We cannot state the problem as an optimization problem: one player is not trying to minimize the other’s utility.
1. (Uniqueness) Given a game G, does there exist a unique equilibrium in G?
2. (Pareto optimality) Given a game G, does there exist a strictly Pareto-efficient equilibrium in G?
3. (Guaranteed payoff) Given a game G and a value v, does there exist an equilibrium in G in which some player i obtains an expected payoff of at least v?
4. (Guaranteed social welfare) Given a game G and a value k, does there exist an equilibrium in which the sum of agents’ utilities is at least k?
5. (Action inclusion) Given a game G and an action a_i ∈ A_i for some player i ∈ N, does there exist an equilibrium of G in which player i plays action a_i with strictly positive probability?
6. (Action exclusion) Given a game G and an action a_i ∈ A_i for some player i ∈ N, does there exist an equilibrium of G in which player i plays action a_i with zero probability?
The following problems are NP-hard when applied to Nash equilibria: uniqueness, Pareto optimality, guaranteed payoff, guaranteed social welfare, action inclusion, and action exclusion.
Computing all of the equilibria of a two-player, general-sum game requires worst-case time that is exponential in the number of actions for each player.
Computational problems in domination !
• Identifying strategies dominated by a pure strategy
• Identifying strategies dominated by a mixed strategy
• Identifying strategies that survive iterated elimination
• Asking whether a strategy survives iterated elimination under all elimination orderings
• We'll assume that i's utility function is strictly positive everywhere (why is this OK?)
Domination by a pure strategy !

Domination by a mixed strategy !
Constraints for determining whether s_i is strictly dominated by any mixed strategy
Domination by a mixed strategy !
Constraints for determining whether s_i is strictly dominated by any mixed strategy
What's wrong with this program?
This is clearly an LP. Why is it a solution to our problem?
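As a sketch of why this is a feasibility problem over mixture weights, here is the special case of mixing only two pure strategies x and y: each opponent action profile contributes one strict linear inequality in the single weight p, so the program collapses to intersecting intervals. The function and payoff vectors are illustrative assumptions; the full problem mixes over all of A_i and needs an LP solver.

```python
def dominated_by_mixture(us, ux, uy):
    """Is strategy s strictly dominated by some mixture p*x + (1-p)*y?

    us, ux, uy list the payoffs of pure strategies s, x, y against each
    opponent action profile.  Each profile k gives a strict constraint
        p*ux[k] + (1-p)*uy[k] > us[k],
    i.e. a half-open interval for p; we intersect them with [0, 1].
    """
    lo, hi = float("-inf"), float("inf")   # open interval (lo, hi) for p
    for s, x, y in zip(us, ux, uy):
        coef, rhs = x - y, s - y           # constraint: p * coef > rhs
        if coef > 0:
            lo = max(lo, rhs / coef)
        elif coef < 0:
            hi = min(hi, rhs / coef)
        elif rhs >= 0:                     # constraint 0 > rhs fails
            return False
    # need some p with lo < p < hi and 0 <= p <= 1
    return lo < hi and hi > 0 and lo < 1
```

For example, s with payoffs (1, 1) is strictly dominated by an equal mixture of x = (3, 0) and y = (0, 3), but s = (2, 2) is not.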
Further questions !
1. (Strategy elimination) Does there exist some elimination path under which the strategy s_i is eliminated?
2. (Uniqueness) Does every elimination path lead to the same reduced game?
3. (Reduction size) Given constants k_i for each player i, does there exist a maximally reduced game where each player i has exactly k_i actions?
For iterated strict dominance these problems are all in P.
For iterated weak or very weak dominance these problems are all NP-complete.
Computing equilibria in sequential games !
The concept of subgame-perfect equilibrium is related to the concept of backward induction:
identify the equilibria in the bottom-most subtrees and adopt these as one moves up the tree.
For zero-sum games, Backward Induction has another name: the minimax algorithm.
Here it's enough to store one number per node.
It's possible to speed things up by pruning nodes that will never be reached in play: "alpha-beta pruning".
Minimax value of a node n, Minimax(n):
• Utility(n) if n is terminal
• max_{s ∈ Successors(n)} Minimax(s) if n is a Max node
• min_{s ∈ Successors(n)} Minimax(s) if n is a Min node
Indeed, when it is Max's turn to play, Max chooses so as to maximize its own utility, while Min plays to minimize Max's utility.
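The recursive definition above translates directly into code; a minimal sketch, with the game tree represented as nested lists (an assumption of ours: leaves are Max's utilities, internal nodes are lists of children):

```python
def minimax(node, is_max=True):
    """Minimax value of a game tree given as nested lists."""
    if not isinstance(node, list):
        return node                  # Utility(n): n is terminal
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)
```

On the tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` (Max to move at the root, Min below), the Min nodes back up 3, 2 and 2, so the root value is 3.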
Intuition behind minimax !
The game tree can be seen as an "and/or" tree encoding Max's reasoning:
• in a state s where it is my turn, for each of my moves (children of an "or" node) I must take into account all possible replies (children of an "and" node).
• The best move for me is the one that leads my opponent to concede me the greatest advantage, given that he plays to make me lose (max of the mins).
The MiniMax algorithm !
It is an algorithm that serves more as a basis for other algorithms with more tractable complexity, and for studying games mathematically, than for actual use.
It computes the decision in the current state based on the minimax values.
The MiniMax algorithm !
It computes the minimax values of the successors recursively, using their recursive definition, until it reaches the leaves of the tree, where it finally finds the values of the utility function and begins backing the values up.
The MiniMax algorithm !
Minimax performs a complete depth-first exploration of the tree:
• for m = maximum depth,
• and b the branching factor,
the time complexity is O(b^m) and the space complexity O(bm).
Properties of minimax !
Complete? Yes (if tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)
But in general the search tree is too big to make it possible to reach the terminal states!
Examples:
• Checkers: ~10^40 nodes
• Chess: ~10^120 nodes for "reasonable" games
⇒ exact solution completely infeasible
Evaluation Function of a State !
e(s) = +∞ if s is a win for MAX
e(s) = -∞ if s is a win for MIN
e(s) = a measure of how "favorable" s is for MAX:
• > 0 if s is considered favorable to MAX
• < 0 otherwise
Construction of an Evaluation Function !
• Usually a weighted sum of "features"
• Features may include:
  • Number of pieces of each type
  • Number of possible moves
  • Number of squares controlled
Example: Tic-Tac-Toe
e(s) = number of rows, columns, and diagonals open for MAX - number of rows, columns, and diagonals open for MIN
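The Tic-Tac-Toe evaluation above can be written out directly; encoding the board as a 9-character string is our assumption, not from the slides:

```python
# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def e(board):
    """e(s) = open lines for MAX ('X') minus open lines for MIN ('O').

    board is a 9-character string of 'X', 'O' or ' '; a line is open for a
    player if the opponent occupies none of its three cells.
    """
    open_max = sum(1 for line in LINES if all(board[i] != 'O' for i in line))
    open_min = sum(1 for line in LINES if all(board[i] != 'X' for i in line))
    return open_max - open_min
```

The empty board scores 8 - 8 = 0; X alone in the centre scores 8 - 4 = 4.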
Example !
[Figures: Tic-Tac-Toe positions scored by e(s), e.g. 8-8 = 0, 6-4 = 2, 3-3 = 0, and a Tic-Tac-Toe search tree with horizon = 2 whose leaf scores, such as 6-5 = 1 and 5-6 = -1, are backed up to the root]
Why use backed-up values? !
• At each non-leaf node N, the backed-up value is the value of the best state that MAX can reach at depth h if MIN plays well (by the same criterion as MAX applies to itself)
• If e is to be trusted in the first place, then the backed-up value is a better estimate of how favorable STATE(N) is than e(STATE(N))
Minimax procedure !
1. Expand the game tree uniformly from the current state (where it is MAX’s turn to play) to depth h
2. Compute the evaluation function at every leaf of the tree
3. Back up the values from the leaves to the root of the tree as follows:
   1. A MAX node gets the maximum of the evaluations of its successors
   2. A MIN node gets the minimum of the evaluations of its successors
4. Select the move toward the MIN node that has the maximal backed-up value
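The depth-limited procedure above can be sketched as follows; the nested-list tree representation and the caller-supplied evaluation function e are illustrative assumptions:

```python
def h_minimax(node, h, e, is_max=True):
    """Depth-limited minimax: expand to depth h, then fall back on the
    evaluation function e.  Leaves (numbers) are true terminal utilities;
    internal nodes are lists of children."""
    if not isinstance(node, list):
        return node                   # true terminal state: exact utility
    if h == 0:
        return e(node)                # horizon reached: estimate with e
    values = [h_minimax(child, h - 1, e, not is_max) for child in node]
    return max(values) if is_max else min(values)
```

With h at least the tree depth, this coincides with plain minimax; with a smaller h, the answer depends on e.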
Issues !
• Choice of the horizon: needed to limit the size of the tree, or to return a decision within the allowed time
• Size of memory needed
• Number of nodes examined
Zero-Sum Games: Alpha/Beta !
(RN: 6.3, 6.4, 6.6)
Number of nodes examined !
Limiting the number of nodes examined (and above all expanded) is essential:
• the choice of the horizon also serves this purpose, but it is not sufficient.
• To reduce the number further, the α/β strategy is introduced.
Can we do better? ! Yes ! Much better !
[Figure: pruning example with backed-up values 3 and -1; a subtree is cut off]
This part of the tree can’t have any effect on the value that will be backed up to the root.
Alpha-Beta Procedure !
Generate the game tree to depth h in a depth-first manner.
Back up estimates (alpha and beta values) of the evaluation functions whenever possible.
Prune branches that cannot lead to changing the final decision.
Meaning of the α and β values !
α represents, for a Max node, the maximum of the evaluation values of the Min children developed so far:
• it can never decrease (things cannot get any worse than this)
β represents, for a Min node, the minimum of the evaluation values of the Max children developed so far:
• it can never increase (things cannot get any better than this)
If at some point the β of a node is ≤ the α of its parent, the search below that node can be cut off.
α-β pruning example !
[Figure: successive animation frames of an α-β pruning example]
Alpha-Beta Pruning !
• Explore the game tree to depth h in a depth-first manner
• Back up alpha and beta values whenever possible
• Prune branches that can’t lead to changing the final decision
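A minimal sketch of the procedure, on the same kind of nested-list trees used for minimax (the representation is our assumption); it returns the same value as minimax while skipping branches that cannot matter:

```python
import math

def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search: leaves are Max's utilities, internal nodes lists."""
    if not isinstance(node, list):
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # beta cut-off: Min won't allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:              # alpha cut-off: Max won't allow this
            break
    return value
```

On `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` the second Min node is abandoned after its first leaf (2 ≤ α = 3), yet the root value 3 is unchanged.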
Alpha-Beta Example !
[Figure: step-by-step α-β search over a tree whose leaf values, left to right, are
0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2;
backed-up values appear at the internal nodes as the search proceeds]
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors.
How much do we gain? !
• If the successors of every node are perfectly ordered (best move first), then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
• The effective branching factor drops from b to sqrt(b)
• Chess: 35 -> 6
• But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
• If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^(3h/4))
Centipede game !
At each turn, a player chooses between going "down" and ending the game, or going "across" and continuing it.
Criticisms to backward induction !
The payoffs are constructed in such a way that the only SPE is for each player to
always choose to go down.
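Backward induction on the centipede game can be sketched as follows; the payoff numbers in the usage note are illustrative, chosen (as the slide says) so that the mover always weakly prefers going down to the continuation:

```python
def centipede_spe(down_payoffs, end_payoff):
    """Backward induction on a centipede game.

    Players 0 and 1 alternate, player 0 moving first.  down_payoffs[t] is
    the (player 0, player 1) payoff pair if the mover at step t goes
    'down'; end_payoff is the pair reached if everyone goes 'across'.
    Returns the SPE outcome and the SPE move at each step.
    """
    value, moves = end_payoff, []
    for t in reversed(range(len(down_payoffs))):
        mover = t % 2
        if down_payoffs[t][mover] >= value[mover]:
            value = down_payoffs[t]     # down is (weakly) better here
            moves.append('down')
        else:
            moves.append('across')
    moves.reverse()
    return value, moves
```

With the illustrative payoffs `[(1, 0), (0, 2), (3, 1), (2, 4)]` and end payoff `(4, 3)`, every mover chooses down and the game ends immediately at `(1, 0)`.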
Criticisms to backward induction !
The SPE prediction in this case flies in the face of intuition (in laboratory experiments, subjects in fact continue to play "across" until close to the end of the game).
Criticisms to backward induction !
The second problem is theoretical.
Imagine that you are the second player in the game, and in the first step of the game the first player
actually goes across.
What should you do? The SPE suggests you should go down, but the same analysis suggests that you would not have gotten to this choice point in the first place.
This seemingly small inconvenience actually raises a
fundamental problem in game theory.
Repeated Games!
The Prisoner’s Dilemma !
Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
• if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
• if both confess, then each will be jailed for two years
Both prisoners know that if neither confesses, then they will each be jailed for one year.
The Prisoner’s Dilemma !
Payoff matrix for the prisoner’s dilemma (each cell lists i’s payoff, then j’s):

                 i defects    i cooperates
  j defects        2, 2          0, 5
  j cooperates     5, 0          3, 3

Top left: if both defect (confess), both get the punishment for mutual defection (2).
Top right: if i cooperates and j defects, i gets a payoff of 0, while j gets 5.
Bottom left: if j cooperates and i defects, j gets a payoff of 0, while i gets 5.
Bottom right: the reward for mutual cooperation (not confessing) is 3 each.
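To check that defecting is a dominant strategy, a tiny sketch assuming the standard payoff values 5, 3, 2, 0 described above (D = defect/confess, C = cooperate):

```python
# Payoffs to one player, keyed by (own move, opponent's move).
U = {('D', 'D'): 2, ('D', 'C'): 5, ('C', 'D'): 0, ('C', 'C'): 3}

def best_response(opponent_move):
    """The move maximizing a player's own payoff against opponent_move."""
    return max(['C', 'D'], key=lambda m: U[(m, opponent_move)])
```

Defection is the best response to both of the opponent's moves (5 > 3 against C, 2 > 0 against D), so it is dominant, even though mutual cooperation would leave both players better off.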