Lecture no. 11
Degree program: Informatica
Course: Sistemi multi-agente
Silvia Rossi (silrossi@unina.it)
A.A. 2014-2015

Computing Equilibria !
(MAS: 4.1, 4.5)
Computing NE for zero sum games !
In the LP below, s_2 and U*_1 are the variables:

  minimize U*_1
  subject to  Σ_{k ∈ A_2} u_1(a_1^j, a_2^k) · s_2^k ≤ U*_1   for all j ∈ A_1
              Σ_{k ∈ A_2} s_2^k = 1
              s_2^k ≥ 0   for all k ∈ A_2

The pure strategies attaining the value U*_1 are in the best-response set.
Computing NE for zero sum games !
In the dual LP, s_1 and U*_1 are the variables:

  maximize U*_1
  subject to  Σ_{j ∈ A_1} u_1(a_1^j, a_2^k) · s_1^j ≥ U*_1   for all k ∈ A_2
              Σ_{j ∈ A_1} s_1^j = 1
              s_1^j ≥ 0   for all j ∈ A_1

The pure strategies attaining the value U*_1 are in the best-response set.
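The LPs above need a solver in general; as a self-contained illustration, here is the special case of a 2x2 zero-sum game, where the value has a closed form once pure-strategy saddle points are ruled out (the function name and matrix convention are ours, not from the text):

```python
def zero_sum_value_2x2(A):
    """Value of a 2x2 zero-sum game for the row (maximizing) player.

    A[i][j] is the row player's payoff when row plays i and column plays j.
    """
    # Pure-strategy saddle point: maximin equals minimax.
    maximin = max(min(row) for row in A)
    minimax = min(max(A[0][j], A[1][j]) for j in range(2))
    if maximin == minimax:
        return maximin
    a, b = A[0]
    c, d = A[1]
    # Fully mixed equilibrium: the mixture equalizes the opponent's responses.
    return (a * d - b * c) / (a + d - b - c)
```

For matching pennies, `zero_sum_value_2x2([[1, -1], [-1, 1]])` gives the game value 0.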
The problem of finding a Nash equilibrium of a two-player, general-sum game cannot be formulated as a linear program:
• The two players’ interests are no longer diametrically opposed.
• We cannot state the problem as an optimization problem: one player is not trying to minimize the other’s utility.
1. (Uniqueness) Given a game G, does there exist a unique equilibrium in G?
2. (Pareto optimality) Given a game G, does there exist a strictly Pareto-efficient equilibrium in G?
3. (Guaranteed payoff) Given a game G and a value v, does there exist an equilibrium in G in which some player i obtains an expected payoff of at least v?
4. (Guaranteed social welfare) Given a game G and a value k, does there exist an equilibrium in which the sum of agents’ utilities is at least k?
5. (Action inclusion) Given a game G and an action a_i ∈ A_i for some player i ∈ N, does there exist an equilibrium of G in which player i plays action a_i with strictly positive probability?
6. (Action exclusion) Given a game G and an action a_i ∈ A_i for some player i ∈ N, does there exist an equilibrium of G in which player i plays action a_i with zero probability?
The following problems are NP-hard when applied to Nash equilibria: uniqueness, Pareto optimality, guaranteed payoff, guaranteed social welfare, action inclusion, and action exclusion.
Computing all of the equilibria of a two-player, general-sum game requires worst-case time that is exponential in the number of actions for each player.
Computational problems in domination !
• Identifying strategies dominated by a pure strategy
• Identifying strategies dominated by a mixed strategy
• Identifying strategies that survive iterated elimination
• Asking whether a strategy survives iterated elimination under all elimination orderings
• We'll assume that i's utility function is strictly positive everywhere (why is this OK?)
Domination by a pure strategy !

Domination by a mixed strategy !
Constraints for determining whether s_i is strictly dominated by any mixed strategy
Domination by a mixed strategy !
Constraints for determining whether s_i is strictly dominated by any mixed strategy
What's wrong with this program?
This is clearly an LP. Why is it a solution to our problem?
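As a sketch of why this is a feasibility problem over mixture weights, here is the special case of mixing only two pure strategies x and y: each opponent action profile contributes one strict linear inequality in the single weight p, so the program collapses to intersecting intervals. The function and payoff vectors are illustrative assumptions; the full problem mixes over all of A_i and needs an LP solver.

```python
def dominated_by_mixture(us, ux, uy):
    """Is strategy s strictly dominated by some mixture p*x + (1-p)*y?

    us, ux, uy list the payoffs of pure strategies s, x, y against each
    opponent action profile.  Each profile k gives a strict constraint
        p*ux[k] + (1-p)*uy[k] > us[k],
    i.e. a half-open interval for p; we intersect them with [0, 1].
    """
    lo, hi = float("-inf"), float("inf")   # open interval (lo, hi) for p
    for s, x, y in zip(us, ux, uy):
        coef, rhs = x - y, s - y           # constraint: p * coef > rhs
        if coef > 0:
            lo = max(lo, rhs / coef)
        elif coef < 0:
            hi = min(hi, rhs / coef)
        elif rhs >= 0:                     # constraint 0 > rhs fails
            return False
    # need some p with lo < p < hi and 0 <= p <= 1
    return lo < hi and hi > 0 and lo < 1
```

For example, s with payoffs (1, 1) is strictly dominated by an equal mixture of x = (3, 0) and y = (0, 3), but s = (2, 2) is not.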
Further questions !
1. (Strategy elimination) Does there exist some elimination path under which the strategy s_i is eliminated?
2. (Uniqueness) Does every elimination path lead to the same reduced game?
3. (Reduction size) Given constants k_i for each player i, does there exist a maximally reduced game where each player i has exactly k_i actions?
For iterated strict dominance these problems are all in P.
For iterated weak or very weak dominance these problems are all NP-complete.
Computing equilibria in sequential games !
The concept of subgame-perfect equilibrium is related to the concept of backward induction:
identify the equilibria in the bottom-most subtrees and adopt these as one moves up the tree.
For zero-sum games, Backward Induction has another name: the minimax algorithm.
Here it's enough to store one number per node.
It's possible to speed things up by pruning nodes that will never be reached in play: "alpha-beta pruning".
Minimax value of a node n, Minimax(n):
• Utility(n) if n is terminal
• max_{s ∈ Successors(n)} Minimax(s) if n is a Max node
• min_{s ∈ Successors(n)} Minimax(s) if n is a Min node
Indeed, when it is Max's turn to play, Max chooses so as to maximize its own utility, while Min plays to minimize Max's utility.
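The recursive definition above translates directly into code; a minimal sketch, with the game tree represented as nested lists (an assumption of ours: leaves are Max's utilities, internal nodes are lists of children):

```python
def minimax(node, is_max=True):
    """Minimax value of a game tree given as nested lists."""
    if not isinstance(node, list):
        return node                  # Utility(n): n is terminal
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)
```

On the tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` (Max to move at the root, Min below), the Min nodes back up 3, 2 and 2, so the root value is 3.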
Intuition behind minimax !
The game tree can be seen as an "and/or" tree encoding Max's reasoning:
• in a state s where it is my turn, for each of my moves (children of an "or" node) I must take into account all possible replies (children of an "and" node).
• The best move for me is the one that leads my opponent to concede me the greatest advantage, given that he plays to make me lose (max of the mins).
The MiniMax algorithm !
It is an algorithm that serves more as a basis for other algorithms with more tractable complexity, and for studying games mathematically, than for actual use.
It computes the decision in the current state based on the minimax values.
The MiniMax algorithm !
It computes the minimax values of the successors recursively, using their recursive definition, until it reaches the leaves of the tree, where it finally finds the values of the utility function and begins backing the values up.
The MiniMax algorithm !
Minimax performs a complete depth-first exploration of the tree:
• for m = maximum depth,
• and b the branching factor,
the time complexity is O(b^m) and the space complexity O(bm).
Properties of minimax !
Complete? Yes (if tree is finite)
Optimal? Yes (against an optimal opponent)
Time complexity? O(b^m)
Space complexity? O(bm) (depth-first exploration)
But in general the search tree is too big to make it possible to reach the terminal states!
Examples:
• Checkers: ~10^40 nodes
• Chess: ~10^120 nodes for "reasonable" games
⇒ exact solution completely infeasible
Evaluation Function of a State !
e(s) = +∞ if s is a win for MAX
e(s) = -∞ if s is a win for MIN
e(s) = a measure of how "favorable" s is for MAX:
• > 0 if s is considered favorable to MAX
• < 0 otherwise
Construction of an Evaluation Function !
• Usually a weighted sum of "features"
• Features may include:
  • Number of pieces of each type
  • Number of possible moves
  • Number of squares controlled
Example: Tic-Tac-Toe
e(s) = number of rows, columns, and diagonals open for MAX - number of rows, columns, and diagonals open for MIN
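The Tic-Tac-Toe evaluation above can be written out directly; encoding the board as a 9-character string is our assumption, not from the slides:

```python
# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def e(board):
    """e(s) = open lines for MAX ('X') minus open lines for MIN ('O').

    board is a 9-character string of 'X', 'O' or ' '; a line is open for a
    player if the opponent occupies none of its three cells.
    """
    open_max = sum(1 for line in LINES if all(board[i] != 'O' for i in line))
    open_min = sum(1 for line in LINES if all(board[i] != 'X' for i in line))
    return open_max - open_min
```

The empty board scores 8 - 8 = 0; X alone in the centre scores 8 - 4 = 4.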
Example !
[Figures: Tic-Tac-Toe positions scored by e(s), e.g. 8-8 = 0, 6-4 = 2, 3-3 = 0, and a Tic-Tac-Toe search tree with horizon = 2 whose leaf scores, such as 6-5 = 1 and 5-6 = -1, are backed up to the root]
Why use backed-up values? !
• At each non-leaf node N, the backed-up value is the value of the best state that MAX can reach at depth h if MIN plays well (by the same criterion as MAX applies to itself)
• If e is to be trusted in the first place, then the backed-up value is a better estimate of how favorable STATE(N) is than e(STATE(N))
Minimax procedure !
1. Expand the game tree uniformly from the current state (where it is MAX’s turn to play) to depth h
2. Compute the evaluation function at every leaf of the tree
3. Back up the values from the leaves to the root of the tree as follows:
   1. A MAX node gets the maximum of the evaluations of its successors
   2. A MIN node gets the minimum of the evaluations of its successors
4. Select the move toward the MIN node that has the maximal backed-up value
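The depth-limited procedure above can be sketched as follows; the nested-list tree representation and the caller-supplied evaluation function e are illustrative assumptions:

```python
def h_minimax(node, h, e, is_max=True):
    """Depth-limited minimax: expand to depth h, then fall back on the
    evaluation function e.  Leaves (numbers) are true terminal utilities;
    internal nodes are lists of children."""
    if not isinstance(node, list):
        return node                   # true terminal state: exact utility
    if h == 0:
        return e(node)                # horizon reached: estimate with e
    values = [h_minimax(child, h - 1, e, not is_max) for child in node]
    return max(values) if is_max else min(values)
```

With h at least the tree depth, this coincides with plain minimax; with a smaller h, the answer depends on e.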
Issues !
• Choice of the horizon: needed to limit the size of the tree, or to return a decision within the allowed time
• Size of memory needed
• Number of nodes examined
Zero-Sum Games: Alpha/Beta !
(RN: 6.3, 6.4, 6.6)
Number of nodes examined !
Limiting the number of nodes examined (and above all expanded) is essential:
• the choice of the horizon also serves this purpose, but it is not sufficient.
• To reduce the number further, the α/β strategy is introduced.
Can we do better? ! Yes ! Much better !
[Figure: pruning example with backed-up values 3 and -1; a subtree is cut off]
This part of the tree can’t have any effect on the value that will be backed up to the root.
Alpha-Beta Procedure !
Generate the game tree to depth h in a depth-first manner.
Back up estimates (alpha and beta values) of the evaluation functions whenever possible.
Prune branches that cannot lead to changing the final decision.
Meaning of the α and β values !
α represents, for a Max node, the maximum of the evaluation values of the Min children developed so far:
• it can never decrease (things cannot get any worse than this)
β represents, for a Min node, the minimum of the evaluation values of the Max children developed so far:
• it can never increase (things cannot get any better than this)
If at some point the β of a node is ≤ the α of its parent, the search below that node can be cut off.
α-β pruning example !
[Figure: successive animation frames of an α-β pruning example]
Alpha-Beta Pruning !
• Explore the game tree to depth h in a depth-first manner
• Back up alpha and beta values whenever possible
• Prune branches that can’t lead to changing the final decision
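A minimal sketch of the procedure, on the same kind of nested-list trees used for minimax (the representation is our assumption); it returns the same value as minimax while skipping branches that cannot matter:

```python
import math

def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf):
    """Alpha-beta search: leaves are Max's utilities, internal nodes lists."""
    if not isinstance(node, list):
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:          # beta cut-off: Min won't allow this
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:              # alpha cut-off: Max won't allow this
            break
    return value
```

On `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` the second Min node is abandoned after its first leaf (2 ≤ α = 3), yet the root value 3 is unchanged.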
Alpha-Beta Example !
[Figure: step-by-step α-β search over a tree whose leaf values, left to right, are
0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2;
backed-up values appear at the internal nodes as the search proceeds]
Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors.
How much do we gain? !
• If the successors of every node are perfectly ordered (best move first), then alpha-beta examines O(b^(h/2)) nodes [Knuth and Moore, 1975]
• The effective branching factor drops from b to sqrt(b)
• Chess: 35 -> 6
• But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree)
• If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b^(3h/4))
Centipede game !
At each turn, a player chooses between going "down" and ending the game, or going "across" and continuing it.
Criticisms to backward induction !
The payoffs are constructed in such a way that the only SPE is for each player to
always choose to go down.
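Backward induction on the centipede game can be sketched as follows; the payoff numbers in the usage note are illustrative, chosen (as the slide says) so that the mover always weakly prefers going down to the continuation:

```python
def centipede_spe(down_payoffs, end_payoff):
    """Backward induction on a centipede game.

    Players 0 and 1 alternate, player 0 moving first.  down_payoffs[t] is
    the (player 0, player 1) payoff pair if the mover at step t goes
    'down'; end_payoff is the pair reached if everyone goes 'across'.
    Returns the SPE outcome and the SPE move at each step.
    """
    value, moves = end_payoff, []
    for t in reversed(range(len(down_payoffs))):
        mover = t % 2
        if down_payoffs[t][mover] >= value[mover]:
            value = down_payoffs[t]     # down is (weakly) better here
            moves.append('down')
        else:
            moves.append('across')
    moves.reverse()
    return value, moves
```

With the illustrative payoffs `[(1, 0), (0, 2), (3, 1), (2, 4)]` and end payoff `(4, 3)`, every mover chooses down and the game ends immediately at `(1, 0)`.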
Criticisms to backward induction !
The SPE prediction in this case flies in the face of intuition (in laboratory experiments, subjects in fact continue to play "across" until close to the end of the game).
Criticisms to backward induction !
The second problem is theoretical.
Imagine that you are the second player in the game, and in the first step of the game the first player
actually goes across.
What should you do? The SPE suggests you should go down, but the same analysis suggests that you would not have gotten to this choice point in the first place.
This seemingly small inconvenience actually raises a
fundamental problem in game theory.
Repeated Games!
The Prisoner’s Dilemma !
Two men are collectively charged with a crime and held in separate cells, with no way of meeting or communicating. They are told that:
• if one confesses and the other does not, the confessor will be freed, and the other will be jailed for three years
• if both confess, then each will be jailed for two years
Both prisoners know that if neither confesses, then they will each be jailed for one year.
The Prisoner’s Dilemma !
Payoff matrix for the prisoner’s dilemma (each cell lists i’s payoff, then j’s):

                 i defects    i cooperates
  j defects        2, 2          0, 5
  j cooperates     5, 0          3, 3

Top left: if both defect (confess), both get the punishment for mutual defection (2).
Top right: if i cooperates and j defects, i gets a payoff of 0, while j gets 5.
Bottom left: if j cooperates and i defects, j gets a payoff of 0, while i gets 5.
Bottom right: the reward for mutual cooperation (not confessing) is 3 each.
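To check that defecting is a dominant strategy, a tiny sketch assuming the standard payoff values 5, 3, 2, 0 described above (D = defect/confess, C = cooperate):

```python
# Payoffs to one player, keyed by (own move, opponent's move).
U = {('D', 'D'): 2, ('D', 'C'): 5, ('C', 'D'): 0, ('C', 'C'): 3}

def best_response(opponent_move):
    """The move maximizing a player's own payoff against opponent_move."""
    return max(['C', 'D'], key=lambda m: U[(m, opponent_move)])
```

Defection is the best response to both of the opponent's moves (5 > 3 against C, 2 > 0 against D), so it is dominant, even though mutual cooperation would leave both players better off.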