
Politecnico di Milano

Scuola di Ingegneria Industriale e dell'Informazione

Corso di laurea in Computer Science and Engineering

Computational results for Normal Form Coarse Correlated Equilibria

Advisor:

Nicola GATTI

Co-advisor: Andrea CELLI

Candidate:

Matteo MARRONE, Matr. 883702


Abstract

This work addresses the computational complexity of the Coarse Correlated Equilibrium solution concept in the context of non-cooperative game theory; we focus on optimal equilibria, i.e., those guaranteeing the maximization of the cumulative utility of the players. Both the Coarse Correlated Equilibrium and the Correlated Equilibrium deserve attention since they capture stability in games modeling situations, frequent in practice, where some form of strategic coordination among the players is possible before they take their actions. We employ a hybrid representation of the players' strategies, combining the normal form and the sequence form, to show how, starting from the constraint defining the Coarse Correlated Equilibrium, it is possible to define an optimization problem solvable in polynomial time. In particular, we first show how this is possible for the base case of games with 2 players, and then observe how the linear programs and the algorithms defined for the base case can be extended to 2-player games including Nature and to games with 3 or more players. We also propose an implementation of the simplex algorithm suited to computing solutions in practice.


Contents

1 Introduction

2 Preliminaries
2.1 Basics of Game Theory
2.2 Extensive Form
2.3 Normal Form
2.4 Sequence Form
2.5 Agent Form
2.6 Solution Concepts
2.6.1 Nash Equilibrium
2.6.2 Correlated Equilibrium
2.6.3 Coarse Correlated Equilibrium
2.7 Known Complexity Results
2.8 Applications of Game Theoretical Models

3 NFCCE in 2 players games
3.1 The Formulation of the LP
3.2 The Formulation of the Constraint of the Dual Problem
3.3 The Algorithm of Constrained Plan Search
3.4 Application of the Simplex Algorithm

4 NFCCE in 2 players games with Nature

5 NFCCE in 3 or more players games

6 Conclusions

Bibliography


List of Figures

2.1 Example of extensive-form game.
2.2 Game 1: Prisoner's Dilemma (left) - Game 2: Bach-Stravinskij (right).
2.3 Some elements of the sequence-form representation of the game provided in Figure 2.1.
2.4 The only Nash equilibrium of the Prisoner's Dilemma game.
2.5 Correlated Equilibrium Example.


List of Tables


List of Algorithms

1 CONSTRAINED PLAN SEARCH
2 NF-MIN-SUPPORT-IDENTIFICATION


Chapter 1

Introduction

Part of the research on non-cooperative game theory has focused, throughout the years, on the design of algorithms for the computation of solution concepts on games modeling adversarial interactions. This problem is nowadays central in the Artificial Intelligence field. Many efforts have been spent to derive results on the computation of Nash Equilibria (NEs), but relatively few results are known in the literature on the computation of other solution concepts, such as Correlated Equilibria (CEs), which assume the possibility of communication among the players (a feature that is often present in real-world scenarios) and, as a consequence, are better suited as a prescriptive tool (since, when communication is impossible, players cannot synchronize their strategies to choose a particular NE when they are asked to find one and multiple equilibria coexist). Moreover, most of the existing works focus on 2-player games, whereas many practical applications involve multiple agents strategically interacting with each other. The study of multi-player solution concepts has recently received increasing attention (see, e.g., [BCDNG17, FCGS18]). CEs are an ideal tool to model this type of multi-player problem.

In this thesis, we will analyze how an optimal, that is, social-welfare-maximizing, Normal Form Coarse Correlated Equilibrium (NFCCE), a relaxation of the NFCE (Normal Form Correlated Equilibrium), can be derived in polynomial time for games with 2 players, and how the proposed linear programs and algorithms can be adjusted for games including chance and for games with a number of players greater than or equal to 3. In principle, the same approach could be exploited to obtain a polynomial-time algorithm approximating an optimal NFCE, but the approximation factor would not be better than exponential, as it was recently shown in [CG18] that the optimization problem is not in Poly-APX unless P = NP.

This thesis is structured as follows:

• in Chapter 2, we introduce the preliminary notions of Game Theory and Computational Complexity necessary to understand the content of the other chapters;

• in Chapter 3, we show the derivation of the linear programming formulation of the problem of finding an optimal NFCCE in the basic case (that is, in games without Nature with n = 2 players), and prove that it can be solved in polynomial time. We also discuss an implementation of the simplex algorithm suitable for computing solutions in practice;

• in Chapter 4, we describe the adjustments to be made to the linear program defined in Chapter 3 for the case of 2-player games with Nature, showing that the total number of constraints grows only linearly with the number of outcomes of the game, still allowing for a resolution in polynomial time;

• in Chapter 5, we describe the adjustments to be made to the linear program defined in Chapter 3 for the case of games with n > 2 players; in particular, we observe that the addition of players is the source of nonlinearity in the dual of the new linear programming formulation, and we derive an additional linear program meant to allow for the resolution of the main problem in polynomial time;

• finally, in Chapter 6, we draw the conclusions of this work and outline directions for future research.


Chapter 2

Preliminaries

2.1 Basics of Game Theory

A game models strategic interactions between rational players. Players' actions may or may not be observable by the other players: in the first case we have a perfect-information game, in the second an imperfect-information one. A game with Nature is a game in which there is a non-strategic player (i.e., one choosing her actions according to a fixed probability distribution) called Nature; Nature is meant to model uncertainty. Games where every player remembers all the moves she made and observed during the game are said to be perfect-recall games; in the following analysis we focus on perfect-recall games only. We briefly survey the main game models known in the literature; more details can be found in any textbook on game theory, e.g., [SLB08].

2.2 Extensive Form

The extensive form representation of a game is the one most closely associated with the idea of sequential play along the game tree. If the game has perfect information, it is defined as a tuple (N, A, V, L, ι, ρ, χ, U), where:

• N is a set of n players,
• A is a set of actions,
• V is the set of non-terminal decision nodes of the game tree,
• L is the set of terminal nodes or leaves,
• ι : V → N is a function returning the index of the player who is to play at the turn corresponding to a decision node,
• ρ : V → 2^A is the function returning the set of actions available to a player at a given decision node,
• χ : V × A → V ∪ L is the function specifying which node is reached by playing a certain action at a certain non-terminal node, and, finally,
• U = {U_1, U_2, ..., U_n} is the set of utility functions, each U_i : L → ℝ describing utilities over terminal nodes for player i.

Figure 2.1: Example of extensive-form game.

Games with imperfect information are defined similarly. More precisely, they are defined by a tuple (N, A, V, L, ι, ρ, χ, U, H), where (N, A, V, L, ι, ρ, χ, U) is a perfect-information game and H defines the information structure of the players. In particular, H = {H_1, H_2, ..., H_n}, where H_i is a partition of V_i such that, for each pair of nodes x_1, x_2, ρ(x_1) = ρ(x_2) whenever there exists h_i ∈ H_i s.t. x_1 ∈ h_i and x_2 ∈ h_i. Each h_i is called an information set. Basically, an information set of a player is a set of non-terminal nodes associated to the player such that, for any two nodes belonging to the set, the actions available to the player are the same. On the other hand, player i has perfect recall in an imperfect-information extensive-form game if, for any two decision nodes w_1, w_2 that are in the same information set h for player i, and for any pair of paths from the root w_0 of the game tree to w_1 and from the root to w_2, it is the case that: the paths have the same length; all the intermediate nodes along the paths with equal path index belong to the same information set; and, for any index, the actions played along both paths at that index are the same. In Figure 2.1, an example of a two-player imperfect-information game in extensive form is provided.


2.3 Normal Form

The normal form representation, on the other hand, is tabular. Given a game in extensive form, its equivalent normal form representation is a triple (N, P, U′), where:

• N is, as before, the set of n players,
• P is the set of plans, and
• U′ is the utility function.

A plan p_i ∈ P_i specifies, for a player i, the actions to be undertaken at each information set in H_i, whereas U′ can be seen as the set of functions U′_i : P → ℝ such that U′_i(p_1, p_2, ..., p_n) = U_i(l), l being the terminal node reached by playing the plan profile (p_1, p_2, ..., p_n). The normal form representation of an imperfect-information game is formally defined in the same way as that of a perfect-information one, but in the former each plan includes the actions to be played at every information set, even at mutually exclusive ones. The reduced normal form, see [VJ98], can be obtained by deleting the replicated strategies within the normal form (that is, there is no realization-equivalent pair of plans in a reduced normal-form representation). Since the reduced normal form is exponential in the size of the information sets, the sequence form, which is linear in the size of the information sets, is often used in its stead. We report two simple examples of two-player games in normal form in Figure 2.2, where each plan p_i is made up of a single action in the corresponding extensive-form representation.

Game 1 (Prisoner's Dilemma):
                        Player 2
                        A         B
Player 1   A         (3, 3)    (1, 4)
           B         (4, 1)    (2, 2)

Game 2 (Bach-Stravinskij):
                        Player 2
                        Bach      Stravinskij
Player 1   Bach        (2, 1)     (0, 0)
           Stravinskij (0, 0)     (1, 2)

Figure 2.2: Game 1: Prisoner's Dilemma (left) - Game 2: Bach-Stravinskij (right).

2.4 Sequence Form

The sequence form is an alternative representation proposed in [vS96] that plays a crucial role when one needs to compute equilibria in games with sequential moves. A sequence q for player i is an entity associated to a node x of the game tree, listing the actions to be played by player i from the root to reach x. A sequence is said to be terminal whenever, for some sequences of the other players, it leads to a terminal node. An additional sequence q_∅ is defined to denote the fictitious (empty) sequence leading to the root node. Formally, the sequence form representation of an extensive form game is given by a tuple (N, Q, U′, C), where:

• N is once again the set of players,

• Q is the set of sets of sequences of the players,

• U′ is a utility function which associates a set of utility values to each set of terminal sequences, and
• C is the set of constraints over the sequence form strategies of the players.

A sequence-form strategy, known as a realization plan, is a function r_i : Q_i → [0, 1] which associates to each sequence q_i its probability of being played by player i. For each player i,

\[
r_i(q_\emptyset) = 1, \qquad r_i(q) = \sum_{a \in \rho(h)} r_i(qa) \quad \text{for each information set } h \text{ and each sequence } q \text{ leading to } h,
\]

where qa represents the sequence obtained by extending sequence q with action a; basically, the value of the realization plan for the sequence associated to a node is given by the sum of the values of the realization plan for the sequences obtainable by extending it through one of the actions available at the information set of that node. These constraints are usually written in the form F_i r_i = f_i, where F_i is a matrix of size (|H_i| + 1) × |Q_i| and f_i is a column vector of size |H_i| + 1. In Figure 2.3, we provide some elements of the sequence-form representation of the game provided in Figure 2.1.

\[
r_1 = \begin{pmatrix} r_1(q_\emptyset) \\ r_1(A) \\ r_1(B) \\ r_1(AC) \\ r_1(AD) \\ r_1(BE) \\ r_1(BF) \end{pmatrix}, \qquad
F_1 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ -1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & -1 & 0 & 0 & 1 & 1 \end{pmatrix}, \qquad
f_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\]

Figure 2.3: Some elements of the sequence-form representation of the game provided in Figure 2.1.
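To make the realization-plan constraints concrete, the following is a minimal sketch (not part of the thesis) that checks F_1 r_1 = f_1 on a pure plan of player 1 for the game of Figure 2.1; the sign convention of F_1 is the one reconstructed above and should be treated as an assumption of this sketch.

```python
# Minimal sketch (not from the thesis): verify that a pure realization plan of
# player 1 for the game of Figure 2.1 satisfies F1 r1 = f1.
# Sequence order: q_empty, A, B, AC, AD, BE, BF (as in Figure 2.3).
import numpy as np

F1 = np.array([
    [ 1,  0,  0, 0, 0, 0, 0],   # r1(q_empty) = 1
    [-1,  1,  1, 0, 0, 0, 0],   # r1(A) + r1(B) = r1(q_empty)   (info set 1.1)
    [ 0, -1,  0, 1, 1, 0, 0],   # r1(AC) + r1(AD) = r1(A)       (info set 1.2)
    [ 0,  0, -1, 0, 0, 1, 1],   # r1(BE) + r1(BF) = r1(B)       (info set 1.3)
])
f1 = np.array([1, 0, 0, 0])

# Pure plan "A at 1.1, C at 1.2, E at 1.3": the sequences below the unplayed
# action B keep realization probability 0.
r1 = np.array([1, 1, 0, 1, 0, 0, 0])
print(np.array_equal(F1 @ r1, f1))   # True: the sequence-form constraints hold
```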

2.5 Agent Form

The agent form was introduced by Selten in [Sel74]. Given a game in extensive form, its equivalent agent form representation is a triple (N′, A′, U′), where:


• N′ is, as always, the set of players/agents (one for each information set of each player as defined in the extensive form),
• A′ is the set of the actions of the agents, and
• U′ is the utility function.

An agent-form strategy (called a behavioral strategy) π_{i.h} : A_{i.h} → [0, 1], with π_{i.h} ∈ Δ(A_{i.h}), is a function returning the probability with which each action a ∈ A_{i.h} is played by agent i.h. Basically, in the agent form representation a normal-form-like tabular representation is defined for each information set, and the strategy is defined over every information set as if it were independent from the strategies at the other information sets. It follows that, in games with a single information set per player, the agent form representation coincides with the normal form one.

2.6 Solution Concepts

2.6.1 Nash Equilibrium

Let us now consider a normal form game (N, P, U′), denoted Γ: we denote a probability distribution over the set of joint plans as σ ∈ Δ(P), and we let p_{-i} = (p_1, ..., p_{i-1}, p_{i+1}, ..., p_n) be the plan profile of the players other than i. A Nash equilibrium (NE), proposed in [Nas50], for Γ is a strategy profile such that each strategy is a best response to the strategies of all the other players, that is, no player can improve her payoff by unilaterally modifying her strategy once the strategies of all the other players have been fixed. More formally, in a normal form game a plan vector p = (p_i, p_{-i}) is said to be a Nash equilibrium if for all players i we have:

\[
U'_i(p_i, p_{-i}) \ge U'_i(p'_i, p_{-i}) \qquad \forall p'_i \in P_i.
\]

In Figure 2.4, we provide an example of Nash equilibrium for the Prisoner's Dilemma game. In this game, (B, B) is the only pair of plans of the game that the players do not unilaterally deviate from, since doing so would lead to a worse payoff.

                        Player 2
                        A         B
Player 1   A         (3, 3)    (1, 4)
           B         (4, 1)    (2, 2)

Figure 2.4: The only Nash equilibrium of the Prisoner's Dilemma game.


2.6.2 Correlated Equilibrium

A correlated equilibrium (CE), proposed in [Aum74], for Γ is a probability distribution σ ∈ Δ(P) such that, for every i ∈ N and p_i, p′_i ∈ P_i, the following holds:

\[
\sum_{p_{-i} \in P_{-i}} \sigma(p_i, p_{-i}) \left( U'_i(p_i, p_{-i}) - U'_i(p'_i, p_{-i}) \right) \ge 0.
\]

Basically, a correlated equilibrium arises whenever no player can increase her expected payoff by switching to a plan other than the suggested one p_i, which is part of a plan profile drawn from the aforementioned distribution, assuming the other players follow their recommendations. The drawing is modeled as performed by a coordinator/correlation device which communicates its recommendation privately to each player. The correlated equilibrium generalizes the Nash equilibrium, since a NE can be seen as a special case of CE where the distribution over P is the product of independent distributions, one per player (that is, a NE is a CE where the players choose their strategies independently and thus there is no actual correlation). In Figure 2.5, we provide a game in normal form (on the left) and an associated CE (on the right).

Game (utilities):
                        Player 2
                        A         B
Player 1   A         (6, 6)    (2, 7)
           B         (7, 2)    (0, 0)

Correlated equilibrium σ:
                        Player 2
                        A         B
Player 1   A         (1/3)     (1/3)
           B         (1/3)      (0)

Figure 2.5: Correlated Equilibrium Example.

In this example, the σ characterizing the CE assigns to the pairs of plans (A, A), (A, B), and (B, A) (and consequently to the outcomes they lead to) a probability of 1/3 each. The mediator, or correlation device, gives the players private recommendations on which action to play, drawn according to this distribution, and the players have no incentive to deviate because, on average, deviating would decrease their expected utility.
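As a concrete check of the incentive constraints above, the following minimal sketch (not part of the thesis) numerically verifies the CE conditions on the example of Figure 2.5; the payoff encoding and the helper name are illustrative choices.

```python
# Minimal sketch (not from the thesis): check the CE constraints for the
# distribution of Figure 2.5. Plan indices: 0 = A, 1 = B; U[i][p1][p2] is
# player i's utility for the plan profile (p1, p2).
U = [
    [[6, 2], [7, 0]],  # player 1
    [[6, 7], [2, 0]],  # player 2
]
sigma = [[1/3, 1/3], [1/3, 0.0]]

def ce_max_gain(U, sigma):
    """Largest expected gain a player can obtain by deviating from a recommendation;
    the CE constraints hold if and only if this value is <= 0."""
    worst = float("-inf")
    n = len(sigma)
    for i in (0, 1):
        for rec in range(n):          # recommended plan p_i
            for dev in range(n):      # alternative plan p'_i
                gain = 0.0
                for oth in range(n):  # opponent's plan p_{-i}
                    if i == 0:
                        gain += sigma[rec][oth] * (U[0][dev][oth] - U[0][rec][oth])
                    else:
                        gain += sigma[oth][rec] * (U[1][oth][dev] - U[1][oth][rec])
                worst = max(worst, gain)
    return worst

print(ce_max_gain(U, sigma))  # 0.0: no profitable deviation, so sigma is a CE
```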

2.6.3 Coarse Correlated Equilibrium

We focus on a solution concept which is a relaxation of the CE: the Coarse Correlated Equilibrium (CCE) [MV78]. A coarse correlated equilibrium for Γ is a probability distribution σ ∈ Δ(P) such that, for every i ∈ N and p′_i ∈ P_i, the following holds:

\[
\sum_{p_i \in P_i} \sum_{p_{-i} \in P_{-i}} \sigma(p_i, p_{-i}) \left( U'_i(p_i, p_{-i}) - U'_i(p'_i, p_{-i}) \right) \ge 0.
\]

A CCE occurs when the publicly known σ, i.e., the probability distribution over the set of joint plans, is such that each player commits to following the suggestion she is going to receive about the action to play only before the suggestion is known, since doing so is a best response in expectation, independently of the recommendation. We report in Figure 2.6 an example of a normal-form game and of a CCE of that game.

Game (utilities):
             A           B           C
A         (1, 1)     (-1, -1)     (0, 0)
B        (-1, -1)     (1, 1)      (0, 0)
C         (0, 0)      (0, 0)     (-2, -2)

CCE distribution σ:
             A           B           C
A         (1/3)        (0)         (0)
B          (0)        (1/3)        (0)
C          (0)         (0)        (1/3)

Figure 2.6: Coarse Correlated Equilibrium Example.

As in the example discussed in the previous section, this equilibrium assigns a probability of 1/3 to 3 outcomes of the game. The expected utility from following the recommendations drawn from this distribution would still be better than the one obtained by always playing A or B, but this time the players would not necessarily commit to the recommendation after receiving it, since if either of them were told to play C she would be sure to increase her own utility by deviating.
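To illustrate the difference from the CE, the following minimal sketch (not part of the thesis) checks on the game of Figure 2.6 that the distribution above satisfies the ex ante CCE constraints while violating the CE constraint once the recommendation C is observed; names and data layout are illustrative.

```python
# Minimal sketch (not from the thesis): on the game of Figure 2.6, the uniform
# distribution over (A,A), (B,B), (C,C) is a CCE but not a CE.
# Indices: 0 = A, 1 = B, 2 = C; U[i][p1][p2] is player i's utility.
U = [
    [[1, -1, 0], [-1, 1, 0], [0, 0, -2]],  # player 1
    [[1, -1, 0], [-1, 1, 0], [0, 0, -2]],  # player 2 (the game is symmetric)
]
sigma = [[1/3, 0, 0], [0, 1/3, 0], [0, 0, 1/3]]

def ex_ante_gain(i, dev):
    """Expected gain of player i from committing to plan `dev` instead of obeying."""
    g = 0.0
    for p1 in range(3):
        for p2 in range(3):
            d1, d2 = (dev, p2) if i == 0 else (p1, dev)
            g += sigma[p1][p2] * (U[i][d1][d2] - U[i][p1][p2])
    return g

# CCE: no ex ante deviation is profitable (the maximum gain is 0).
print(max(ex_ante_gain(i, d) for i in (0, 1) for d in range(3)))

# CE would additionally require obedience *after* seeing the recommendation:
# if player 1 is told to play C, switching to A gains 2/3, so sigma is not a CE.
gain_after_C = sum(sigma[2][p2] * (U[0][0][p2] - U[0][2][p2]) for p2 in range(3))
print(gain_after_C)
```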

2.7 Known Complexity Results

In computer science and, in particular, in the artificial intelligence field, many efforts have been made to assess the complexity of finding an equilibrium. For instance, computing a Nash equilibrium for a game in normal form with two players is known to be PPAD-complete, as discussed in [DGP09]. Algorithms to find a Nash equilibrium with two players are studied in, e.g., [CGPR10, GPRS12]. With more than two players, even finding an approximate Nash equilibrium is PPAD-complete. The problem of finding an optimal Nash equilibrium is studied in [CS08], showing that it is NP-hard. Recently, the re-optimization problem has also been studied [CMG17], showing that it remains NP-hard.

Finding an optimal NFCE is NP-hard even with two players [vSF08]. Instead, an optimal EFCE (Extensive Form Correlated Equilibrium) can be found efficiently in two-player games without chance, while it is NP-hard in games with three or more players, including Nature [vSF08]. However, finding a (potentially non-optimal) EFCE can be done in polynomial time for every fixed number of players. All kinds of correlated equilibria feature a coordinator that interacts with the players unidirectionally by giving them recommendations; it is only in NFCEs and NFCCEs, though, that the interaction occurs only once, before the beginning of the game, so that for such equilibria we speak of ex ante or preplay correlation. It was recently shown [CG18] that finding an optimal NFCCE can be done through a polynomial time algorithm when the number of players is two. Complexity results for NFCCE in the basic case, as well as in games involving Nature and in games with three or more players, will be analyzed in the following chapters.

Many computational results are known for other solution concepts that are not discussed in this work. Some recent results are: algorithms for the team-maxmin equilibrium [BCNG17], strong Nash equilibria [GRS13], and Stackelberg equilibria [MCG18, FMK+18, NMG18, CGM17].

2.8 Applications of Game Theoretical Models

In this section, we mention some modern applications of game theoretical models. Probably the best-known application of game theory is to poker games. In 2017, Libratus, see [BS17], won more than 1.8 million US dollars against the top 4 professional humans of the international Texas Hold'em Poker circuit. This kind of poker is a two-player game in extensive form.

Other well-known applications concern security problems. Since 2008, see [JPT+08], game theory has been used to produce randomized schedules in security problems. A recent survey of the successes achieved is provided in [SFA+18]. Models of security games, in which sensors and alarms are used, are described in, e.g., [dCSB+13, BNG17, BNG16]. Online learning techniques are used in [BNT+17] to estimate the behavior of the attacker when it is not rational.

Many applications of game theory models can be found in economic scenarios. For instance, automated negotiations [JFL+01, LC14, AGL16], combinatorial auctions [VV03], and auctions for advertising [CGG11].


Chapter 3

NFCCE in 2 players games

3.1 The Formulation of the LP

From now on, an optimal NFCCE will be considered as the one maximizing the cumulative utility of the players, and thus referred to as NFCCE-SW (the normal form coarse correlated equilibrium maximizing the social welfare). In the previous chapter it was outlined how an NFCCE can be described by means of the following constraint:

\[
\sum_{p_i \in P_i} \sum_{p_{-i} \in P_{-i}} \sigma(p_i, p_{-i}) \left( U'_i(p_i, p_{-i}) - U'_i(p'_i, p_{-i}) \right) \ge 0.
\]

Let's start by focusing on the 2-player case. For |N| = 2, the non-deviation constraints of player 1 can be rewritten as:

\[
\sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p_1, p_2) - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p'_1, p_2) \ge 0 \qquad \forall p'_1 \in P_1,
\]

where the first term is the expected utility of player 1 at the equilibrium. While these constraints hold, we want to maximize the expected cumulative utility. This leads to the following optimization problem:

\begin{align*}
\max \quad & \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2) \left( U'_1(p_1, p_2) + U'_2(p_1, p_2) \right) && (3.1)\\
\text{s.t.} \quad & \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p_1, p_2) - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p'_1, p_2) \ge 0 \quad \forall p'_1 \in P_1 && (3.2)\\
& \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2) = 1 && (3.3)\\
& \sigma(p_1, p_2) \ge 0 \quad \forall p_1 \in P_1,\ p_2 \in P_2. && (3.4)
\end{align*}
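As a point of reference, this LP can be solved directly (over the exponentially many σ variables) on small normal-form games. The following minimal sketch is not the thesis implementation (which relies on the hybrid normal/sequence-form representation derived next); it uses scipy, writes the non-deviation constraints for both players, and takes the payoffs of Figure 2.5 as example data.

```python
# Minimal sketch (illustrative): solve the NFCCE-SW LP directly on a small
# normal-form game. Variables are the entries of sigma(p1, p2); we maximize
# social welfare subject to the ex ante non-deviation constraints.
import numpy as np
from scipy.optimize import linprog

U1 = np.array([[6.0, 2.0], [7.0, 0.0]])   # U'_1(p1, p2), game of Figure 2.5
U2 = np.array([[6.0, 7.0], [2.0, 0.0]])   # U'_2(p1, p2)

def nfcce_sw(U1, U2):
    n1, n2 = U1.shape
    c = -(U1 + U2).reshape(-1)                 # minimize -(social welfare)
    A_ub, b_ub = [], []
    # Player 1: for each deviation d1, sum sigma * (U1(d1, p2) - U1(p1, p2)) <= 0
    for d1 in range(n1):
        row = np.zeros((n1, n2))
        for p1 in range(n1):
            for p2 in range(n2):
                row[p1, p2] = U1[d1, p2] - U1[p1, p2]
        A_ub.append(row.reshape(-1)); b_ub.append(0.0)
    # Player 2: symmetric constraints.
    for d2 in range(n2):
        row = np.zeros((n1, n2))
        for p1 in range(n1):
            for p2 in range(n2):
                row[p1, p2] = U2[p1, d2] - U2[p1, p2]
        A_ub.append(row.reshape(-1)); b_ub.append(0.0)
    A_eq = [np.ones(n1 * n2)]; b_eq = [1.0]    # sigma is a probability distribution
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n1 * n2), method="highs")
    return res.x.reshape(n1, n2), -res.fun

sigma, welfare = nfcce_sw(U1, U2)
print(np.round(sigma, 3), welfare)   # the social-welfare-maximizing NFCCE
```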

To derive a polynomial time algorithm for finding the equilibrium, the optimization problem must be rewritten by making use of a representation combining normal and sequence form. For every normal form plan p_i of player i ∈ N, there exists a single equivalent pure realization plan, which we denote r_{p_i}. We know that the dual of the best response problem for player 1 in sequence form, in the 2-player case, takes the following form:

\begin{align*}
\min_{v_1} \quad & \sum_{h_1 \in H_1 \cup \{h_\emptyset\}} f_1(h_1)\, v_1(h_1) && (3.1)\\
\text{s.t.} \quad & \sum_{h_1 \in H_1 \cup \{h_\emptyset\}} F_1(h_1, q_1)\, v_1(h_1) - \sum_{q_2 \in Q_2} U_1(q_1, q_2)\, r_2(q_2) \ge 0 \quad \forall q_1 \in Q_1, && (3.2)
\end{align*}

where v_1 is the (|H_1| + 1)-sized vector of dual variables, each one associated with an information set of player 1 and representing her utility from that information set to the terminal nodes of the game tree. The constraint can then be interpreted as ensuring that the value of player 1 at each of her information sets (i.e., her utility from that information set to the terminal nodes of the game tree) is not strictly smaller than the one obtainable by playing the best response available to player 1 at that information set, given the strategy r_2 of player 2. It follows that, f_1 being a vector, characteristic of the sequence form, made of zeroes except for the first element, which equals 1, f_1^T v_1 equals the first element of v_1, that is, the expected utility at the equilibrium. So, we are able to rewrite the optimization problem as:

\begin{align*}
\max \quad & \sum_{(p_1, p_2) \in P_1 \times P_2} \sigma(p_1, p_2)\, r_{p_1}^T (U_1 + U_2)\, r_{p_2} && (3.1)\\
\text{s.t.} \quad & \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, r_{p_1}^T U_1\, r_{p_2} = f_1^T v_1 && (3.2)\\
& \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, r_{p_1}^T U_2\, r_{p_2} = f_2^T v_2 && (3.3)\\
& f_1^T v_1 - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p'_1, p_2) \ge 0 \quad \forall p'_1 \in P_1 && (3.4)\\
& f_2^T v_2 - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_2(p_1, p'_2) \ge 0 \quad \forall p'_2 \in P_2. && (3.5)
\end{align*}

Equation (3) expresses the same concept as the constraint of the dual of the best response in sequence form, so that we obtain the following optimization problem:


\begin{align*}
\max \quad & \sum_{(p_1, p_2) \in P_1 \times P_2} \sigma(p_1, p_2)\, r_{p_1}^T (U_1 + U_2)\, r_{p_2} && (3.1)\\
\text{s.t.} \quad & \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, r_{p_1}^T U_1\, r_{p_2} = f_1^T v_1 && (3.2)\\
& \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, r_{p_1}^T U_2\, r_{p_2} = f_2^T v_2 && (3.3)\\
& f_1^T v_1 - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_1(p'_1, p_2) \ge 0 \quad \forall p'_1 \in P_1 && (3.4)\\
& f_2^T v_2 - \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2)\, U'_2(p_1, p'_2) \ge 0 \quad \forall p'_2 \in P_2 && (3.5)\\
& \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2) = 1 && (3.6)\\
& \sigma(p_1, p_2) \ge 0 \quad \forall p_1 \in P_1,\ p_2 \in P_2, && (3.7)
\end{align*}

where the last two constraints ensure that the normal form strategy is well defined. Thus, the problem can be formulated as an LP with a polynomial number of constraints (linear in the number of information sets of the players) and an exponential number of variables, enclosed in the following vector:

\[
x^T = \big( \sigma(p_1^1, p_2^1),\ \sigma(p_1^1, p_2^2),\ \ldots,\ \sigma(p_1^2, p_2^1),\ \ldots,\ v_1^T,\ v_2^T,\ s_1^T,\ s_2^T \big),
\]

v_1^T, v_2^T being the variables characteristic of the dual formulation and s_1^T, s_2^T the slack variables. The σ variables are the only ones that appear in the objective function and thus the only ones with an associated non-null reduced cost in the associated simplex formulation, corresponding to the social welfare coefficient U'_1(p_1^i, p_2^i) + U'_2(p_1^i, p_2^i) of each plan profile (with the sign flipped when the problem is cast as a minimization).

3.2 The Formulation of the Constraint of the Dual Problem

The cardinality of the constraints is linear in the number of sequences of the players; as a consequence, the variables of the dual are linear in the sequences of the players as well. More exactly, we call α_i the dual variables associated to the constraints of type (2) and β_i the dual variables corresponding to the constraints of type (3) and (4), numbering |Q_i| respectively. Each player i thus adds |Q_i| + 1 constraints, for a grand total of |Q_1| + |Q_2| + 3 once the dual variable γ, associated to the constraint (5), is taken into account. Similarly, the primal has an exponential number of variables and the dual an exponential number of constraints (|P_1| · |P_2| + |H_1| + |H_2|). If we consider the minimization of the negative of the welfare objective function, the only dual constraints relevant from the computational point of view (that is, the ones associated to the σ variables, since the others are linear in the information sets) are thus:

\[
r_{p_1}^T U_1 r_{p_2}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2}\, \alpha_2 + \beta_1^T U_1 r_{p_2} + r_{p_1}^T U_2\, \beta_2 - \gamma \le r_{p_1}^T (U_1 + U_2)\, r_{p_2} \qquad \forall (p_1, p_2) \in P_1 \times P_2.
\]

The separation problem of finding the pair of plans (p_1, p_2) maximizing the quantity

\[
r_{p_1}^T U_1 r_{p_2}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2}\, \alpha_2 - r_{p_1}^T (U_1 + U_2)\, r_{p_2} + \beta_1^T U_1 r_{p_2} + r_{p_1}^T U_2\, \beta_2,
\]

while checking that no pair gets above γ (which would violate the inequality), is actually the same as the problem of finding the column of the tableau associated to the dual LP, within a simplex context, with the maximum reduced cost. This means that being able to solve the separation problem in polynomial time corresponds to being able to apply a single step of the simplex algorithm to the dual in polynomial time, and we recall that, by strong duality, solving the dual means solving the primal. If we analyze the pairs of plans iterating over the outcomes they lead to, that is, if for each outcome l ∈ L we consider in turn all the plans (p_1^l, p_2^l) ∈ P_1 × P_2 leading to outcome l, the problem can be reduced to the maximization of terms of the form β_i^T U_i r_{p_k}, because

\[
r_{p_1}^T U_1 r_{p_2}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2}\, \alpha_2 - r_{p_1}^T (U_1 + U_2)\, r_{p_2}
\]

is constant within each such domain. It can be proven that this maximization can be carried out in polynomial time and that, thus, the separation problem can overall be solved in polynomial time.
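For illustration, the sketch below shows the brute-force version of this pricing step: it scans every pair of plans, evaluates the quantity above (using the dual constraint as reconstructed in this section, which should be treated as an assumption of the sketch), and reports whether the maximum exceeds γ. The separation oracle of the next section replaces this exponential scan with an outcome-by-outcome search.

```python
# Minimal sketch (illustrative, not the thesis code): brute-force pricing over
# all sigma columns. R1[k], R2[k] are pure realization plans; U1, U2 are the
# sequence-form utility matrices; alpha, beta1, beta2, gamma are current duals.
import numpy as np

def most_violating_pair(R1, R2, U1, U2, alpha, beta1, beta2, gamma):
    best, best_q = None, float("-inf")
    for i, r1 in enumerate(R1):
        for j, r2 in enumerate(R2):
            u1 = r1 @ U1 @ r2
            u2 = r1 @ U2 @ r2
            q = u1 * alpha[0] + u2 * alpha[1] - (u1 + u2) + beta1 @ U1 @ r2 + r1 @ U2 @ beta2
            if q > best_q:
                best, best_q = (i, j), q
    return best, best_q, best_q > gamma   # last flag: dual constraint violated?

# tiny smoke test on a 2x2 game whose "realization plans" are indicator vectors
R1 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
R2 = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
U1 = np.array([[6.0, 2.0], [7.0, 0.0]])
U2 = np.array([[6.0, 7.0], [2.0, 0.0]])
print(most_violating_pair(R1, R2, U1, U2, alpha=[0.0, 0.0],
                          beta1=np.zeros(2), beta2=np.zeros(2), gamma=0.0))
```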

3.3 The Algorithm of Constrained Plan Search

The algorithm to do so receives as input: a node of the game tree; the entire game tree, modified so that the utilities are expressed as the product between a vector ζ, representing the β_i, and the utilities themselves; the index i of the player we want to find a plan for; the sequence associated to the fixed outcome l for player i; the set of information sets of player i encountered along such a sequence; and an initially empty set of actions. It returns the maximal value of the reduced objective function of the separation problem (we take as an example the version where the unknown is p_2), i.e., β_1^T U_1 r_{p_2}, together with a complete set of actions for the sought plan, obtained by performing a depth-first traversal of the tree. Pseudocode is provided in Algorithm 1.

Algorithm 1 CONSTRAINED PLAN SEARCH

1: function CPLANSEARCH(x, Γ, i, q_i^l, H_i^l, A′_i)
2:   v ← ⊥
3:   a′ ← null
4:   if x is terminal then return (U_i(x), A′_i)
5:   else
6:     if x ∈ V_i then
7:       for y ∈ x.child do
8:         v ← v + CPLANSEARCH(y, Γ, i, q_i^l, H_i^l, A′_i).val
         return (v, A′_i)
9:     else
10:      if ∃ h ∈ H_i^l : x ∈ h then
11:        a′ ← action specified by q_i^l
12:        y_{a′} ← child of x reached through a′
13:        v ← CPLANSEARCH(y_{a′}, Γ, i, q_i^l, H_i^l, A′_i).val
14:      else
15:        for y ∈ x.child do
16:          temp ← CPLANSEARCH(y, Γ, i, q_i^l, H_i^l, A′_i)
17:          if temp.val > v then
18:            v ← temp.val
19:            a′ ← a ∈ ρ(x) : χ(x, a) = y
         return (v, A′_i ∪ {a′})


Here, v represents the current value of the objective function, and a′ is the variable storing the action to be added to the initially empty set; it is initially null. If the node is a leaf, we return the utility for player 1 as value, together with the action set built so far (Lines 4–5). Lines 8–10 are executed if we are on a node associated to player 1: the algorithm is simply called recursively on the children of the node, adding the value part of each return value to v. If the node is non-terminal and belongs to V_2, two cases can take place: if the node belongs to the set of information sets previously defined, we set the action a′ as the one specified by the sequence at this point and reinitialize v as the value part of the return value of the algorithm called recursively on the node reached through a′; otherwise, for every child of the node, we call the algorithm recursively and compare the current value to the value part of the return value stored in temp: if the latter is greater, v is reinitialized to it and a′ is set to the action played in order to reach the node the result was obtained for. In both cases v and the set enriched with a′ are returned. Basically, we go down the tree forming plans by adding actions that are compatible with the sequence defined by the outcome considered and that lead to the greatest possible objective function value.

3.4 Application of the Simplex Algorithm

As for the simplex algorithm, we make use of the two-phase version: in phase 1 a basic feasible solution is found through the solution of an auxiliary problem with artificial slack variables, and from there we move towards the optimal solution in phase 2; that is, the LP is overall characterized by a variable vector of the form

\[
x^T = \big( \sigma(p_1^1, p_2^1),\ \sigma(p_1^1, p_2^2),\ \ldots,\ \sigma(p_1^2, p_2^1),\ \ldots,\ v_1^T,\ v_2^T,\ s_1^T,\ s_2^T \big),
\]

the σ variables being the only ones which appear in the objective function and thus the only ones with an initial non-null reduced cost. In both phases, the identification of the nonbasic variable to be added to the basis within a single iteration of the simplex algorithm is carried out in polynomial time. As previously mentioned, the variables of the LP are exponential in number, so to further reduce execution time the tableau is updated lazily, except for the vector of costs, the basis columns, the vector of known terms, and the columns corresponding to the slack variables s_i and to the variables of type v_i (since the former are linear in the sequences of the players and the latter are linear in their information sets). Below, a depiction of the tableau in its initial state, for phase 1, is provided:


Table 3.1: Example of tableau (schematic depiction of the initial phase-1 tableau, with columns for the σ variables, v_1, v_2, and the slack variables s_1, s_2, and rows for the constraints of the LP).

Slack variables appear with coefficient −1 in the (sparse) rows corresponding to the inequality constraints, and with coefficient 0 elsewhere.


Chapter 4

NFCCE in 2 players games with Nature

In Chapter 3 we analyzed how it is possible to derive a solution to the separation problem associated to the dual of the LP formulation of the NFCCE-SW in polynomial time, by virtue of an algorithm based on the premise that each outcome of a 2-player game uniquely identifies a set of pairs of plans. This premise is no longer valid in games with Nature, since chance makes it so that each pair of plans is associated to a distribution over the outcomes rather than to a single one; in this chapter we explain how it is still possible to obtain a separation oracle working in polynomial time, exploiting the same algorithm, by treating Nature as a fictitious third player.

Denoting by p_3 ∈ P_3 a normal-form plan of Nature and by σ_N a Nature normal form "strategy", we can rewrite the NFCCE-SW LP in the following form:

\begin{align*}
\max \quad & \sum_{(p_1, p_2, p_3) \in P_1 \times P_2 \times P_3} \sigma(p_1, p_2, p_3)\, r_{p_1}^T (U_1 + U_2)\, r_{p_2} && (4.1)\\
\text{s.t.} \quad & \sum_{(p_1, p_2, p_3) \in P_1 \times P_2 \times P_3} \sigma(p_1, p_2, p_3)\, r_{p_1}^T U_i\, r_{p_2} = f_i^T v_i \quad \forall i \in N && (4.2)\\
& F_1^T v_1 - U_1 \sum_{p_2 \in P_2} \Big( \sum_{p_1 \in P_1} \sigma(p_1, p_2, p_3) \Big) r_{p_2} \ge 0 && (4.3)\\
& F_2^T v_2 - U_2^T \sum_{p_1 \in P_1} \Big( \sum_{p_2 \in P_2} \sigma(p_1, p_2, p_3) \Big) r_{p_1} \ge 0 && (4.4)\\
& \sum_{p_1 \in P_1} \sum_{p_2 \in P_2} \sigma(p_1, p_2, p_3) = \sigma_N(p_3) \quad \forall p_3 \in P_3 && (4.5)\\
& \sigma(p_1, p_2, p_3) \ge 0 \quad \forall (p_1, p_2, p_3) \in P_1 \times P_2 \times P_3 && (4.6)
\end{align*}

The system is the same as the one before, except that the NF strategies are now written as a function of p_3 and that Constraint (4.5) has been modified to take into account that the strategies of the first two players no longer sum up to 1, but to the fixed and known Nature "strategy". Nature plans being exponential in number would at first glance prompt the addition of an exponential number of constraints with respect to the previous case, but we can actually concern ourselves with only a reasonably small subset of them, which we will denote P_N. This is possible because we know the normal form strategy σ_N, which allows us to derive the distribution over the outcomes induced by Nature, which we call γ(z); this can be done, for example, by taking into account Nature's behavioral strategies π_i and then making use of the equation

\[
\gamma(z) = \prod_{ha \sqsubseteq z,\ \iota(h) = i} \pi_i(a, h) \qquad \forall z \in Z,
\]

which, for each outcome, computes its likelihood as the product, over the information sets encountered on the path from the root to the outcome, of the probabilities (as defined by the behavioral strategies) of playing the action necessary to reach the outcome. From γ(z), in turn, we can derive P_N by means of the greedy algorithm shown below, where γ_{p_i} is the portion of the distribution over the outcomes explained by p_i, and ω_{p_i}(z) takes value 1 when p_i defines a path from the root leading to z for some actions of the other players.

Algorithm 2 NF-MIN-SUPPORT-IDENTIFICATION

1: P_N ← ∅
2: γ_{p_i} ← 0
3: while γ ≠ 0 do
4:   γ_{p_i} ← max_{p_i ∉ P_N} min_{z ∈ supp(ω_{p_i})} γ(z)
5:   P_N ← P_N ∪ {p_i}
6:   γ(z) ← γ(z) − γ_{p_i} ω_{p_i}(z) for every z ∈ Z

Basically, at each iteration the loop adds to P_N the plan associated with the highest (over the plans) minimum (over the supported outcomes) value, until a support of σ_N is formed, that is, until γ drops to zero: we select, one after the other, the plans that explain most of the distribution, while preventing selections that would cause γ(z) to fall below zero. |P_N| is at most equal to |Z|, so the number of constraints induced by (4.5) is polynomial. At this point we have, as in the previous case, a program with a number of constraints linear in |H_1| + |H_2| + |Z| + 5 and an exponential number of variables, where H_1 and H_2 are the sets of information sets of the two players. This time, though, the explicit inclusion of Nature as a player allows us to group the columns of the simplex tableau by p_3: that is, the strategy variables will now be ordered according to p_3, and the tableau structure will be replicated accordingly; this way, rather than enumerating over the exponentially many outcome distributions, we enumerate over Nature plans at a first level and over the corresponding single outcomes at a second level, achieving once again a separation oracle working in polynomial time.
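A minimal sketch of the greedy support identification described above follows (not the thesis code; the data layout, with γ as a dictionary over outcomes and supports[p] as the set of outcomes that Nature plan p can lead to, is an assumption).

```python
# Minimal sketch (illustrative) of NF-MIN-SUPPORT-IDENTIFICATION.
def nf_min_support_identification(gamma, supports):
    gamma = dict(gamma)                  # residual outcome distribution
    weights = {}                         # selected Nature plans with weights sigma_N(p)
    while any(v > 1e-12 for v in gamma.values()):
        best_p, best_w = None, 0.0
        for p, outs in supports.items():
            if p in weights:
                continue
            w = min(gamma[z] for z in outs)   # least-covered outcome of plan p
            if w > best_w:
                best_p, best_w = p, w
        if best_p is None:               # nothing left that explains positive mass
            break
        weights[best_p] = best_w
        for z in supports[best_p]:       # subtract the mass explained by best_p
            gamma[z] -= best_w
    return weights                       # at most |Z| plans are selected

# toy data (not from a real game): gamma is Nature's induced outcome distribution
gamma = {"z1": 0.5, "z2": 0.25, "z3": 0.25}
supports = {"pa": {"z1", "z2"}, "pb": {"z1", "z3"}, "pc": {"z2", "z3"}}
print(nf_min_support_identification(gamma, supports))  # {'pa': 0.25, 'pb': 0.25}
```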


Chapter 5

NFCCE in 3 or more players games

In case the number of players exceeds 2, the quantity to be minimized in the context of the separation problem is no longer the sum of a constant term (with respect to each set of pairs of plans leading to a certain outcome) and two terms linear in a single realization plan, that is:

\[
r_{p_1}^T U_1 r_{p_2}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2}\, \alpha_2 - r_{p_1}^T (U_1 + U_2)\, r_{p_2} + \beta_1^T U_1 r_{p_2} + r_{p_1}^T U_2\, \beta_2,
\]

since each of the non-constant terms now depends on the product of a number of realization plans equal to the number of players. For example, focusing on the case n = 3, we have:

\[
r_{p_1}^T U_1 r_{p_2} r_{p_3}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2} r_{p_3}\, \alpha_2 + r_{p_1}^T U_3 r_{p_2} r_{p_3}\, \alpha_3 - r_{p_1}^T (U_1 + U_2 + U_3)\, r_{p_2} r_{p_3} + \beta_1^T U_1 r_{p_2} r_{p_3} + r_{p_1}^T U_2 r_{p_3}\, \beta_2 + r_{p_1}^T r_{p_2}^T U_3\, \beta_3,
\]

where the first four terms sum up, once again, to a quantity that is constant when enumerating over the outcomes and looking, for each outcome, for the plans leading to it. We can overcome the obstacle by introducing a new kind of variable χ_ij representing the product of the realization plans r_{p_i} and r_{p_j}, that is, χ_ij(q_i, q_j) = r_{p_i}(q_i) r_{p_j}(q_j), where q_i and q_j are sequences; since r_{p_i} and r_{p_j} have sizes |Q_i| and |Q_j| respectively, χ_ij will obviously have size |Q_i| · |Q_j|. Defining then a minimization subproblem for each χ, and taking into account that the newly defined variable can be characterized through the constraints below,

\begin{align*}
& \chi_{ij}(q_i, q_j) \in [0, 1] && (5.1)\\
& \chi_{ij}(q_i, q_j) \le r_{p_i}(q_i) && (5.2)\\
& \chi_{ij}(q_i, q_j) \le r_{p_j}(q_j) && (5.3)\\
& \chi_{ij}(q_i, q_j) \ge r_{p_i}(q_i) + r_{p_j}(q_j) - 1, && (5.4)
\end{align*}

we have found a way to reduce the problem to an ILP formulation. The first constraint comes from χ_ij(q_i, q_j) being the product of probability functions; the second and third are once again a consequence of this, 1 being the maximum value of each realization plan component. The fourth constraint should come across as straightforward as well, once it is remembered that the realization plans we are exploiting are pure and all the possible values of the equation are examined. For each possible pair i, j the constraints number 3|Q_i| · |Q_j|.

Taking into account only these constraints, we will have to solve the program once for each outcome; but we could also choose not to enumerate over the outcomes. In this case the quantity to be minimized, derived from the linear program associated to the dual of the best response problem shown in the previous pages, would actually be:

\[
r_{p_1}^T U_1 r_{p_2} r_{p_3}\, \alpha_1 + r_{p_1}^T U_2 r_{p_2} r_{p_3}\, \alpha_2 + r_{p_1}^T U_3 r_{p_2} r_{p_3}\, \alpha_3 - r_{p_1}^T (U_1 + U_2 + U_3)\, r_{p_2} r_{p_3} + \beta_1^T U_1 r_{p_2} r_{p_3} + r_{p_1}^T U_2 r_{p_3}\, \beta_2 + r_{p_1}^T r_{p_2}^T U_3\, \beta_3,
\]

which can be rewritten as:

\[
r_{p_1} r_{p_2} r_{p_3} \big( U_1(\alpha_1 - 1) + U_2(\alpha_2 - 1) + U_3(\alpha_3 - 1) \big) + \beta_1^T U_1 r_{p_2} r_{p_3} + r_{p_1}^T U_2 r_{p_3}\, \beta_2 + r_{p_1}^T r_{p_2}^T U_3\, \beta_3,
\]

since we no longer have the constancy of the first term within the subproblems. We can then manage the new source of nonlinearity similarly as before, by defining a variable

\[
\xi_{ijk}(q_i, q_j, q_k) = r_{p_i}(q_i)\, r_{p_j}(q_j)\, r_{p_k}(q_k).
\]

Again, the variable can be characterized through a linear system such as the one below:

\begin{align*}
& \xi_{ijk}(q_i, q_j, q_k) \in [0, 1] && (5.1)\\
& \xi_{ijk}(q_i, q_j, q_k) \le r_{p_i}(q_i) && (5.2)\\
& \xi_{ijk}(q_i, q_j, q_k) \le r_{p_j}(q_j) && (5.3)\\
& \xi_{ijk}(q_i, q_j, q_k) \le r_{p_k}(q_k) && (5.4)\\
& \xi_{ijk}(q_i, q_j, q_k) \ge r_{p_i}(q_i) + r_{p_j}(q_j) + r_{p_k}(q_k) - 2 && (5.5)
\end{align*}
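As a quick sanity check of these linearizations, the sketch below (not part of the thesis) verifies that, for pure 0/1 realization-plan entries, the constraints above leave a single feasible value for χ and ξ, namely the product they replace.

```python
# Minimal sketch (illustrative): the product linearization is exact on pure
# realization plans, i.e., the constraints force chi = r_i * r_j and
# xi = r_i * r_j * r_k whenever the inputs are 0/1.
from itertools import product

def chi_bounds(ri, rj):
    lo = max(0.0, ri + rj - 1.0)        # constraint (5.4), together with chi >= 0
    hi = min(1.0, ri, rj)               # constraints (5.1)-(5.3)
    return lo, hi

def xi_bounds(ri, rj, rk):
    lo = max(0.0, ri + rj + rk - 2.0)   # constraint (5.5), together with xi >= 0
    hi = min(1.0, ri, rj, rk)           # constraints (5.1)-(5.4)
    return lo, hi

for ri, rj in product((0.0, 1.0), repeat=2):
    lo, hi = chi_bounds(ri, rj)
    assert lo == hi == ri * rj          # chi is forced to equal the product

for ri, rj, rk in product((0.0, 1.0), repeat=3):
    lo, hi = xi_bounds(ri, rj, rk)
    assert lo == hi == ri * rj * rk     # xi is forced to equal the product

print("product linearization is exact on pure realization plans")
```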


Chapter 6

Conclusions

In this work we focus on coarse correlated equilibria for normal form games. In Chapter 3, we show how an optimal NFCCE can be computed in polynomial time for 2-player games without Nature; in Chapters 4 and 5 we see how the approach defined in Chapter 3 can be exploited to obtain polynomial time computation of an optimal NFCCE in 2-player games with Nature and in games featuring 3 or more players, once the LP initially proposed is appropriately revised, by means of a greedy support-identifying algorithm (for the case with Nature) and of auxiliary linear systems (for the case with n ≥ 3). Future work could study implementations that further reduce the computation time.


Bibliography

[AGL16] B. An, N. Gatti, and V. Lesser. Alternating-offers bargaining in one-to-many and many-to-many settings. Annals of Mathematics and Artificial Intelligence, 77(1-2):67–103, 2016.

[Aum74] R. Aumann. Subjectivity and correlation in randomized strategies. Journal of Mathematical Economics, 1974.

[BCDNG17] Nicola Basilico, Andrea Celli, Giuseppe De Nittis, and Nicola Gatti. Computing the team-maxmin equilibrium in single-team single-adversary team games. Intelligenza Artificiale, 11(1):67–79, 2017.

[BCNG17] N. Basilico, A. Celli, G. De Nittis, and N. Gatti. Team-maxmin equilibrium: Efficiency bounds and algorithms. In AAAI, pages 356– 362, 2017.

[BNG16] N. Basilico, G. De Nittis, and N. Gatti. A security game combining patrolling and alarm-triggered responses under spatial and detection uncertainties. In AAAI, pages 404–410, 2016.

[BNG17] N. Basilico, G. De Nittis, and N. Gatti. Adversarial patrolling with spatially uncertain alarm signals. Artificial Intelligence, 246:220–257, 2017.

[BNT+17] L. Bisi, G. De Nittis, F. Trovò, M. Restelli, and N. Gatti. Regret minimization algorithms for the followers behaviour identification in leadership games. In UAI, 2017.

[BS17] N. Brown and T. Sandholm. Libratus: The superhuman AI for no-limit poker. In IJCAI, pages 5226–5228, 2017.

[CG18] A. Celli and N. Gatti. Computational results for extensive-form adversarial team games. In AAAI, pages 965–972, 2018.

[CGG11] S. Ceppi, N. Gatti, and E. Gerding. Mechanism design for federated sponsored search auctions. In AAAI, 2011.


[CGM17] S. Coniglio, N. Gatti, and A. Marchesi. Pessimistic leader-follower equilibria with multiple followers. In IJCAI, pages 171–177, 2017.

[CGPR10] S. Ceppi, N. Gatti, G. Patrini, and M. Rocco. Local search techniques for computing equilibria in two-player general-sum strategic-form games. In AAMAS, pages 1469–1470, 2010.

[CMG17] A. Celli, A. Marchesi, and N. Gatti. On the complexity of Nash equilibrium reoptimization. In UAI, 2017.

[CS08] V. Conitzer and T. Sandholm. New complexity results about Nash equilibria. Games and Economic Behavior, 63(2):621–641, 2008.

[dCSB+13] E. Munoz de Cote, R. Stranders, N. Basilico, N. Gatti, and N. Jennings. Introducing alarms in adversarial patrolling games. In AAMAS, pages 1275–1276, 2013.

[DGP09] C. Daskalakis, P. Goldberg, and C. Papadimitriou. The complexity of computing a Nash equilibrium. SIAM Journal on Computing, 39(1):195–259, 2009.

[FCGS18] Gabriele Farina, Andrea Celli, Nicola Gatti, and Tuomas Sand-holm. Ex ante coordination and collusion in zero-sum multi-player extensive-form games. In Advances in Neural Information Processing Systems (NeurIPS), 2018.

[FMK+18] G. Farina, A. Marchesi, C. Kroer, N. Gatti, and T. Sandholm. Trembling-hand perfection in extensive-form games with commitment. In IJCAI, pages 233–239, 2018.

[GPRS12] N. Gatti, G. Patrini, M. Rocco, and T. Sandholm. Combining local search techniques and path following for bimatrix games. In UAI, pages 286–295, 2012.

[GRS13] N. Gatti, M. Rocco, and T. Sandholm. On the verification and computation of strong Nash equilibrium. In AAMAS, pages 723–730, 2013.

[JFL+01] N. Jennings, P. Faratin, A. Lomuscio, S. Parsons, M. Wooldridge, and C. Sierra. Automated negotiation: prospects, methods and challenges. Group Decision and Negotiation, 10(2):199–215, 2001.

[JPT+08] M. Jain, J. Pita, M. Tambe, F. Ordóñez, P. Paruchuri, and S. Kraus. Bayesian Stackelberg games and their application for security at Los Angeles International Airport. SIGecom Exchanges, 7(2), 2008.


[LC14] F. Lopes and H. Coelho. Negotiation and Argumentation in Multi-agent Systems: Fundamentals, Theories, Systems and Applications. Bentham Science Publishers, 2014.

[MCG18] A. Marchesi, S. Coniglio, and N. Gatti. Leadership in singleton congestion games. In IJCAI, pages 447–453, 2018.

[MV78] H. Moulin and J-P Vial. Strategically zero-sum games: the class of games whose completely mixed equilibria cannot be improved upon. International Journal of Game Theory, 1978.

[Nas50] J. Nash. Equilibrium points in n-person games. Proceedings of the National Academy of Sciences, 36(1):48–49, 1950.

[NMG18] G. De Nittis, A. Marchesi, and N. Gatti. Computing the strategy to commit to in polymatrix games. In AAAI, pages 989–996, 2018.

[Sel74] R. Selten. Reexamination of the perfectness concept for equilibrium points in extensive games. Center for Mathematical Economics Working Papers 023, Center for Mathematical Economics, Bielefeld University, 1974.

[SFA+18] A. Sinha, F. Fang, B. An, C. Kiekintveld, and M. Tambe. Stackelberg security games: Looking beyond a decade of success. In IJCAI, pages 5494–5501, 2018.

[SLB08] Y. Shoham and K. Leyton-Brown. Multiagent systems: Algorithmic, game-theoretic, and logical foundations. Cambridge University Press, 2008.

[VJ98] D. Vermeulen and M. Jansen. The reduced form of a game. European Journal of Operational Research, 106(1):204 – 211, 1998.

[vS96] B. von Stengel. Efficient computation of behavior strategies. Games and Economic Behavior, 14(2):220 – 246, 1996.

[vSF08] B. von Stengel and F. Forges. Extensive-form correlated equilibrium: definition and computational complexity. Mathematics of Operations Research, 2008.

[VV03] S. De Vries and R. Vohra. Combinatorial auctions: A survey. INFORMS Journal on Computing, 15(3):284–309, 2003.
