
University of Pisa

FACULTY OF MATHEMATICS

Master’s Thesis

Stochastic Control and Mean Field Games

with an Application to Optimal Trading

Candidate:

Luca Minutillo Menga

Supervisor:

Prof. Luciano Campi

Co-Supervisors:

Prof. Fabrizio Lillo

Prof. Marco Romito


Contents

Introduction

1 Stochastic Control
1.1 An informal introduction
1.2 Dynamic Programming Principle and HJB Equation
1.3 Markovian dynamics under weaker assumptions
1.4 Risk aversion and utility functions

2 Mean Field Games
2.1 Introduction and motivations
2.2 The classical approach
2.3 Extended Mean Field Games

3 Application to Optimal Trading
3.1 Description of the game
3.2 Assumptions
3.3 Evaluation of the best response
3.4 Existence and uniqueness of Nash equilibrium
3.5 Extension to a more general dynamics
3.6 Restriction to identical preferences
3.7 Numerical experiments
3.8 Exponential utility function


Introduction

Given an initial portfolio (that is, a number of shares of a certain stock) that should be bought or sold before a fixed terminal time, optimal trading (or optimal liquidation) means planning how many shares will be bought or sold at each time instant over a fixed finite time period, in order to maximize a gain. Like Cardaliaguet and Lehalle in [7], in this dissertation we formulate this classical problem within a mean field game: a set of investors (viewed as players) have to buy or sell in a given time interval some shares of the same stock, whose public price is influenced by their average trading speed.

In the first two chapters we briefly develop the theory of stochastic control and mean field games; in Chapter 3 we present an application of these theoretical tools to optimal trading.

After having informally introduced in Section 1.1 the meaning of stochastic control (and, in particular, of Markovian stochastic control), in Section 1.2 we discuss, for the sake of completeness, the "classical" approach for solving Markovian stochastic control problems. Unfortunately, this approach is ineffective in the application to optimal trading we are interested in; hence, in Section 1.3 and Section 1.4 we provide some results tailor-made for this application. In particular, in the latter we introduce the definitions of risk aversion and utility functions, which will be useful in the sequel.

Chapter 2 has the same structure as Chapter 1: in Section 2.1 we introduce the theory of stochastic differential games (which can be viewed as generalizations of stochastic control problems) and mean field games (a special class of stochastic differential games); in particular, we define the best response and the Nash equilibrium, two fundamental notions in game theory. Then, in Section 2.2 we discuss the classical approach to mean field games, whereas in Section 2.3 we present another class of mean field games (sometimes called "extended mean field games"), which includes the game described in Chapter 3.

Cardaliaguet and Lehalle consider a simple model for the "permanent market impact" and prove that, under suitable conditions, this game admits a unique Nash equilibrium. We present this result in Section 3.3 and Section 3.4, giving an inequality (eq. (3.4.3)) that must be satisfied in order for it to hold (whereas they simply require a certain parameter to be "small enough"). This equilibrium configuration is explicitly described in [7] under the additional assumption that all the investors have the same risk aversion.

Our main purpose is to extend these results by Cardaliaguet and Lehalle to a more general model for the market impact, from which many other models widely used in the literature can be derived as special cases (see the discussion in Section 3.1 and Section 3.2).

More precisely, in Section 3.5 we prove the existence and uniqueness of the Nash equilibrium according to this general model; in Section 3.6, instead, we address the case of the same risk aversion for (almost) all the investors, where our general framework leads to equations for the Nash equilibrium analogous to those formulated in [7]. Unfortunately, generally speaking, they do not admit explicit solutions; however, this is not a big issue, since they can be computed numerically.

In Section 3.7 we perform a few numerical experiments, using the results stated in Section 3.6 (thus, the risk aversion is still supposed to be the same for almost all the agents). For simplicity, we focus on two particular cases of our general model: the one used by Cardaliaguet and Lehalle and another one, by Obizhaeva and Wang (see [16]).

Finally, in Section 3.8 we reformulate the whole problem by expressing the risk aversion of the players through suitable utility functions, and we analyze how this formulation changes the results previously presented.

In the Conclusion we briefly explain how a Nash equilibrium can appear in actual trading, even though the investors typically don't have enough information to evaluate it.


Chapter 1

Stochastic Control

1.1 An informal introduction

Let us consider a dynamic system characterized by its state $X_t \in \mathbb{R}^d$ at any time $t \in [0,T]$ (where $T > 0$ is a fixed maturity) and evolving in an uncertain environment, formalized by a probability space $(\Omega, \mathcal{F}, P)$, which is equipped with a filtration $(\mathcal{F}_t)_{t\in[0,T]}$ (with $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \in [0,T]$) satisfying the usual conditions (completeness and right-continuity) and supporting an $m$-dimensional Brownian motion $(W_t)_{t\in[0,T]}$.

The dynamics of the system is supposed to be driven by a process $\alpha = (\alpha_t)_{t\in[0,T]}$ called control, which we can choose among a class of admissible processes. Each control takes values in a space of actions $A \subseteq \mathbb{R}^k$, meaning that at each instant $t$ we decide what "action" $\alpha_t \in A$ to undertake in order to properly modify the evolution of the system. This can be formalized by introducing a stochastic differential equation (SDE)

\[ dX_t = b(t, X_t, \alpha_t)\,dt + \sigma(t, X_t, \alpha_t)\,dW_t, \qquad t \in [0,T], \]

in which the drift $b(t, X_t, \alpha_t) \in \mathbb{R}^d$ and the volatility $\sigma(t, X_t, \alpha_t) \in \mathbb{R}^{d\times m}$ depend upon both the current state $X_t$ and the current action $\alpha_t$ (and, possibly, also upon the scenario $\omega \in \Omega$).

We aim at finding the most appropriate control in order to maximize a certain gain or minimize a certain cost. For the sake of convenience we shall always talk about gains, avoiding rephrasing each definition (or result) in terms of costs, which can always be done straightforwardly. This gain may include a terminal gain $g(X_T)$ depending only upon the final state $X_T$ (and, possibly, upon the scenario $\omega \in \Omega$), as well as a running gain

\[ \int_0^T f(t, X_t, \alpha_t)\,dt, \]

which takes into account all the past history of $X = (X_t)_{t\in[0,T]}$ and might be controlled by $\alpha$.

Therefore, we define the expected gain as

\[ J = E\left[ \int_0^T f(t, X_t, \alpha_t)\,dt + g(X_T) \right]. \]

If $\alpha$ maximizes $J$, then it is called an optimal control.
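For a concrete instance (a toy linear-quadratic illustration of ours, to which we shall return after the Verification Theorem in Section 1.2), take $d = m = 1$, $A = \mathbb{R}$, and

\[ dX_t = \alpha_t\,dt + \sigma\,dW_t, \qquad J = E\left[ -\frac{1}{2}\int_0^T \alpha_t^2\,dt - \frac{1}{2}X_T^2 \right]: \]

the controller steers the state with a quadratic effort cost and is rewarded for ending close to the origin.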



A stochastic control problem (i.e. the problem of finding an optimal control) is called Markovian if the functions $b, \sigma, f, g$ are deterministic (namely, not depending upon the scenario $\omega \in \Omega$). The rest of this chapter is devoted to the study of this class of problems.

1.2 Dynamic Programming Principle and HJB Equation

In this section we shall follow the classical approach to Markovian stochastic control theory. Let us consider four continuous functions

\[ b : [0,T]\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}^d, \qquad f : [0,T]\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}, \]
\[ \sigma : [0,T]\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}^{d\times m}, \qquad g : \mathbb{R}^d \longrightarrow \mathbb{R} \]

satisfying the following conditions:

\[ |b(t,x,a) - b(t,y,a)| + |\sigma(t,x,a) - \sigma(t,y,a)| \le c|x-y|, \tag{1.2.1} \]
\[ |b(t,x,a)| + |\sigma(t,x,a)| \le c(1 + |x| + |a|), \tag{1.2.2} \]
\[ |f(t,x,a)| \le c(1 + |x|^2 + |a|^2), \tag{1.2.3} \]
\[ |g(x)| \le c(1 + |x|^2), \tag{1.2.4} \]

for some constant $c > 0$ independent of $(t,x,y,a)$.

Let us choose as class of admissible controls the collection $\mathcal{A}$ of all the processes $\alpha : [0,T]\times\Omega \to A$ belonging to $H^2$, where $H^2$ denotes the class of the progressively measurable processes with finite $L^2([0,T]\times\Omega)$ norm.

Note that, as a consequence of eq. (1.2.2), for every $x \in \mathbb{R}^d$ and $\alpha \in \mathcal{A}$

\[ E\left[ \int_0^T \big( |b(t,x,\alpha_t)|^2 + |\sigma(t,x,\alpha_t)|^2 \big)\,dt \right] \le 3c^2\left( T + T|x|^2 + E\left[ \int_0^T |\alpha_t|^2\,dt \right] \right) < \infty. \]

Hence the following proposition holds true as a direct application of Theorem 1.2.2 below, whose proof can be found in [17, p. 12].

Proposition 1.2.1. For every $(t,x) \in [0,T)\times\mathbb{R}^d$ and every control $\alpha \in \mathcal{A}$, the equation

\[ dX_r = b(r, X_r, \alpha_r)\,dr + \sigma(r, X_r, \alpha_r)\,dW_r, \qquad r \in [t,T], \]

with initial condition $X_t = x$, admits a unique strong solution $X^{t,x,\alpha}$ in $H^2$. Furthermore,

\[ E\left[ \sup_{r\in[t,T]} |X^{t,x,\alpha}_r|^2 \right] < \infty. \]

Theorem 1.2.2 (Existence and uniqueness of strong solutions to SDEs). Given $t \in [0,T)$, let

\[ \hat b : [t,T]\times\Omega\times\mathbb{R}^d \longrightarrow \mathbb{R}^d, \qquad \hat\sigma : [t,T]\times\Omega\times\mathbb{R}^d \longrightarrow \mathbb{R}^{d\times m} \]

be progressively measurable processes such that $\hat b(\cdot,x), \hat\sigma(\cdot,x) \in H^2$ for all $x \in \mathbb{R}^d$ and

\[ |\hat b(r,x,\omega) - \hat b(r,y,\omega)| + |\hat\sigma(r,x,\omega) - \hat\sigma(r,y,\omega)| \le \hat c|x-y| \]

for some positive constant $\hat c$ independent of $(r,x,y,\omega)$. Then, for every $X \in L^2(\Omega)$ independent of $W$, the SDE

\[ dX_r = \hat b(r, X_r)\,dr + \hat\sigma(r, X_r)\,dW_r, \qquad r \in [t,T], \]

with initial condition $X_t = X$, admits a unique strong solution $X$ in $H^2$. Moreover,

\[ E\left[ \sup_{r\in[t,T]} |X_r|^2 \right] < \infty. \]

Proposition 1.2.1, together with eq. (1.2.3) and eq. (1.2.4), ensures that the gain function

\[ J(t,x,\alpha) = E\left[ \int_t^T f(r, X^{t,x,\alpha}_r, \alpha_r)\,dr + g(X^{t,x,\alpha}_T) \right] \]

is well defined for all $(t,x) \in [0,T)\times\mathbb{R}^d$, $\alpha \in \mathcal{A}$. Indeed,

\[ E\left[ \int_t^T |f(r, X^{t,x,\alpha}_r, \alpha_r)|\,dr + |g(X^{t,x,\alpha}_T)| \right] \le c(1+T-t)\left( 1 + E\left[ \sup_{r\in[t,T]} |X^{t,x,\alpha}_r|^2 \right] + E\left[ \int_t^T |\alpha_r|^2\,dr \right] \right) < \infty. \]

Thus we can define the value function

\[ V : [0,T]\times\mathbb{R}^d \longrightarrow \mathbb{R}, \qquad V(t,x) = \sup_{\alpha\in\mathcal{A}} J(t,x,\alpha) \ \text{for } t \in [0,T), \qquad V(T,x) = g(x), \]

and say that $\hat\alpha \in \mathcal{A}$ is an optimal control, with respect to $(t,x) \in [0,T)\times\mathbb{R}^d$, if

\[ J(t,x,\hat\alpha) = V(t,x). \]

Let us now present two central and enlightening theorems of stochastic control theory: the Dynamic Programming Principle and the Hamilton-Jacobi-Bellman (HJB) Equation. Their proofs are rather technical and we shall omit them; however, the interested reader can find them in [17]. We are not going to use these two results in the sequel; nevertheless, we state them for the sake of completeness.

Theorem 1.2.3 (Dynamic Programming Principle). If $V$ is continuous, the equality

\[ V(t,x) = \sup_{\alpha\in\mathcal{A}} E\left[ \int_t^\tau f(r, X^{t,x,\alpha}_r, \alpha_r)\,dr + V(\tau, X^{t,x,\alpha}_\tau) \right] \]

holds true for all $(t,x) \in [0,T)\times\mathbb{R}^d$ and any stopping time $\tau : \Omega \longrightarrow [t,T]$.

This means that we can split the optimization problem over the interval $[t,T]$ into two minor (and, possibly, simpler) problems, involving separately the intervals $[t,\tau]$ and $[\tau,T]$.

The HJB equation can be considered as the infinitesimal version of Theorem 1.2.3. We shall partially prove it (assuming Theorem 1.2.3), in order to understand why. In order to make the formulation more convenient, we shall use the following notation: given a function $w \in C^{0,2}([0,T)\times\mathbb{R}^d)$, we define

\[ \mathcal{L}w(t,x,a) = b(t,x,a)\cdot\nabla_x w(t,x) + \frac{1}{2}\mathrm{Tr}\big(\sigma(t,x,a)\sigma(t,x,a)' H_x w(t,x)\big) \]

for all $(t,x,a) \in [0,T)\times\mathbb{R}^d\times A$.¹

¹ Here $H_x w$ denotes the Hessian matrix of $w$ with respect to the variable $x$.


Theorem 1.2.4 (Hamilton-Jacobi-Bellman Equation). Let us assume that $V \in C^0([0,T]\times\mathbb{R}^d) \cap C^{1,2}([0,T)\times\mathbb{R}^d)$ and $f(\cdot,\cdot,a) \in C^0([0,T]\times\mathbb{R}^d)$ for all $a \in A$. If

\[ \sup_{a\in A}\{\mathcal{L}V(t,x,a) + f(t,x,a)\} < \infty \tag{1.2.5} \]

for all $(t,x) \in [0,T)\times\mathbb{R}^d$, then the following equality, known as the Hamilton-Jacobi-Bellman equation, holds true:

\[ \frac{\partial V}{\partial t}(t,x) + \sup_{a\in A}\{\mathcal{L}V(t,x,a) + f(t,x,a)\} = 0, \qquad (t,x) \in [0,T)\times\mathbb{R}^d. \]

Proof. We prove only the "$\le$" inequality, which holds also without the assumption (1.2.5). The Dynamic Programming Principle ensures that

\[ V(t,x) \ge E\left[ \int_t^{t+\tau} f(r, X^{t,x,a}_r, a)\,dr + V(t+\tau, X^{t,x,a}_{t+\tau}) \right] \tag{1.2.6} \]

for any constant control $a \in A$ and any stopping time $\tau : \Omega \longrightarrow [0,T-t]$.

The function $V$ being smooth, we can apply Itô's Formula, obtaining

\[ V(t+\tau, X^{t,x,a}_{t+\tau}) = V(t,x) + \int_t^{t+\tau}\left( \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr + M_\tau, \tag{1.2.7} \]

where $(M_h)_{h\in[t,T]}$ is defined by

\[ M_h = \int_t^{t+h} \sigma(r, X^{t,x,a}_r, a)\cdot\nabla_x V(r, X^{t,x,a}_r)\,dW_r. \]

Eq. (1.2.6) and eq. (1.2.7) lead to

\[ 0 \ge E\left[ \int_t^{t+\tau}\left( f(r, X^{t,x,a}_r, a) + \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr + M_\tau \right]. \tag{1.2.8} \]

Let $\tau_1, \tau_2 : \Omega \longrightarrow [0,\infty]$ be the stopping times defined as follows:

\[ \tau_1 = \inf\left\{ h > 0 : \int_t^{t+h} |\sigma(r, X^{t,x,a}_r, a)\cdot\nabla_x V(r, X^{t,x,a}_r)|^2\,dr \ge 1 \right\}, \qquad \tau_2 = \inf\{ h > 0 : |X^{t,x,a}_{t+h} - x| \ge 1 \}. \]

Note that $(M_{\tau_1\wedge h})_{h\ge 0}$ is a martingale (by very standard Itô Calculus results); so, by the Optional Stopping Theorem, $E[M_{\tau_1\wedge\tau_2\wedge h}] = 0$ for any $h > 0$. Thus, if we set $\tau = \tau_1\wedge\tau_2\wedge h$ in eq. (1.2.8) for some $h \in (0,T-t]$, we get

\[ 0 \ge E\left[ \int_t^{t+\tau_1\wedge\tau_2\wedge h}\left( f(r, X^{t,x,a}_r, a) + \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr \right]. \tag{1.2.9} \]


Using the a.s. continuity of the process $X^{t,x,a}$, we can easily infer that $\tau_1\wedge\tau_2$ is almost surely positive; so $\tau_1\wedge\tau_2\wedge h$ is almost surely equal to $h$ for $h$ small enough. Hence we deduce from the Mean Value Theorem that

\[ \lim_{h\to 0}\frac{1}{h}\int_t^{t+\tau_1\wedge\tau_2\wedge h}\left( f(r, X^{t,x,a}_r, a) + \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr = \lim_{h\to 0}\frac{1}{h}\int_t^{t+h}\left( f(r, X^{t,x,a}_r, a) + \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr = f(t,x,a) + \frac{\partial V}{\partial t}(t,x) + \mathcal{L}V(t,x,a) \]

almost surely. Moreover, we have

\[ \left| \frac{1}{h}\int_t^{t+\tau_1\wedge\tau_2\wedge h}\left( f(r, X^{t,x,a}_r, a) + \frac{\partial V}{\partial r}(r, X^{t,x,a}_r) + \mathcal{L}V(r, X^{t,x,a}_r, a) \right) dr \right| \le \frac{(\tau_1\wedge\tau_2\wedge h)\,c(t)}{h} \le c(t) \]

almost surely for all $h \in (0,(T-t)/2]$, where $c(t)$ is the positive constant given by

\[ \sup\left\{ \left| f(r,y,a) + \frac{\partial V}{\partial r}(r,y) + \mathcal{L}V(r,y,a) \right| : r \in \left[t, t+\frac{T-t}{2}\right],\ |y-x| \le 1 \right\} \]

and is finite by the Extreme Value Theorem, being $V \in C^{1,2}([0,T)\times\mathbb{R}^d)$ and $f(\cdot,\cdot,a) \in C^0([0,T]\times\mathbb{R}^d)$.

Therefore, dividing both sides of eq. (1.2.9) by $h$ and applying Lebesgue's Dominated Convergence Theorem, we obtain

\[ 0 \ge \lim_{h\to 0}\frac{1}{h}E\left[ \int_t^{t+\tau_1\wedge\tau_2\wedge h}\left( f + \frac{\partial V}{\partial r} + \mathcal{L}V \right)(r, X^{t,x,a}_r, a)\,dr \right] = E\left[ \lim_{h\to 0}\frac{1}{h}\int_t^{t+\tau_1\wedge\tau_2\wedge h}\left( f + \frac{\partial V}{\partial r} + \mathcal{L}V \right)(r, X^{t,x,a}_r, a)\,dr \right] = f(t,x,a) + \frac{\partial V}{\partial t}(t,x) + \mathcal{L}V(t,x,a). \]

The above inequality is true for every $a \in A$, thus the proof is complete.

More in general, we shall say that a generic function $w \in C^{1,2}([0,T)\times\mathbb{R}^d)$ satisfies the HJB equation if

\[ \frac{\partial w}{\partial t}(t,x) + \sup_{a\in A}\{\mathcal{L}w(t,x,a) + f(t,x,a)\} = 0, \qquad t \in [0,T),\ x \in \mathbb{R}^d. \tag{1.2.10} \]

The last theorem of this section requires the following definition. Given $\nu : [0,T]\times\mathbb{R}^d \longrightarrow A$ measurable, we call $\nu$ a Markovian control if for all $(t,x) \in [0,T)\times\mathbb{R}^d$ there exists a control $\alpha \in \mathcal{A}$ such that $\alpha_r = \nu(r, X^{t,x,\alpha}_r)$; moreover, we call $\nu$ optimal if each of these $\alpha$ can be chosen optimally.

Theorem 1.2.4 suggests that a function $\hat\nu : [0,T]\times\mathbb{R}^d \to A$ such that

\[ \hat\nu(t,x) \in \arg\max_{a\in A}\{\mathcal{L}V(t,x,a) + f(t,x,a)\}, \qquad (t,x) \in [0,T)\times\mathbb{R}^d, \]

plays a key role in the solution of our optimization problem. Indeed the following theorem, which will be proven in a more general fashion in the next section, ensures that if $\hat\nu$ is a Markovian control (and if $V$ satisfies a quadratic growth condition), then $\hat\nu$ is optimal. In fact, it states much more: in order for this to be true, the function satisfying the HJB equation does not need to be a priori the value function!


Theorem 1.2.5 (Verification Theorem). Let $w \in C^0([0,T]\times\mathbb{R}^d) \cap C^{1,2}([0,T)\times\mathbb{R}^d)$ satisfy the HJB equation (1.2.10) over $[0,T)\times\mathbb{R}^d$, with terminal condition $w(T,x) = g(x)$. Let us suppose, furthermore, that

\[ |w(t,x)| \le c(1 + |x|^2) \tag{1.2.11} \]

for a positive constant $c$ independent of $(t,x)$.

If $\nu : [0,T]\times\mathbb{R}^d \longrightarrow A$ is a Markovian control such that

\[ \nu(t,x) \in \arg\max_{a\in A}\{\mathcal{L}w(t,x,a) + f(t,x,a)\} \]

for all $(t,x) \in [0,T)\times\mathbb{R}^d$, then $w = V$ and $\nu$ is optimal.
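As an illustration of how the Verification Theorem is used in practice, let us solve the toy linear-quadratic example of Section 1.1 (ours, not part of the thesis's development). The HJB equation (1.2.10) reads

\[ \frac{\partial w}{\partial t} + \sup_{a\in\mathbb{R}}\left\{ a\frac{\partial w}{\partial x} - \frac{a^2}{2} \right\} + \frac{\sigma^2}{2}\frac{\partial^2 w}{\partial x^2} = \frac{\partial w}{\partial t} + \frac{1}{2}\left(\frac{\partial w}{\partial x}\right)^2 + \frac{\sigma^2}{2}\frac{\partial^2 w}{\partial x^2} = 0, \qquad w(T,x) = -\frac{x^2}{2}. \]

The quadratic ansatz $w(t,x) = -\frac{1}{2}p(t)x^2 + q(t)$ yields $p' = p^2$ with $p(T) = 1$, and $q' = \frac{\sigma^2}{2}p$ with $q(T) = 0$, whence $p(t) = (1+T-t)^{-1}$ and $q(t) = -\frac{\sigma^2}{2}\log(1+T-t)$. The maximizer $\nu(t,x) = \frac{\partial w}{\partial x}(t,x) = -x/(1+T-t)$ is a Markovian control and $w$ has quadratic growth, so Theorem 1.2.5 gives $w = V$ and $\nu$ optimal: the state is pulled linearly towards the origin, more strongly as maturity approaches.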

1.3 Markovian dynamics under weaker assumptions

In this section we address the following question, arising from the financial problems that we are going to discuss later on: how can we approach a Markovian optimal control problem described by continuous functions $b, \sigma, f, g$ not satisfying the conditions given by eq. (1.2.1), (1.2.2), (1.2.3), (1.2.4)? Note that in this case we cannot choose as admissible controls all those belonging to $H^2$, as we did in the previous section, since Proposition 1.2.1 may fail to be true, or the gain function might not be well defined. So, how can we choose the class of admissible controls?

Let $x_0 \in \mathbb{R}^d$ be fixed and let us consider a non-empty set $\mathcal{A}$ of progressively measurable processes $\alpha : [0,T]\times\Omega \longrightarrow A$. We state, by axiom, that $\mathcal{A}$ is eligible to be the class of the admissible controls for the system $X$ satisfying

\[ \begin{cases} dX_t = b(t, X_t, \alpha_t)\,dt + \sigma(t, X_t, \alpha_t)\,dW_t, & t \in [0,T] \\ X_0 = x_0 \end{cases} \tag{1.3.1} \]

if and only if for every $\alpha \in \mathcal{A}$ there exists at least a continuous strong solution $X^\alpha$ to (1.3.1) such that

\[ E\left[ \int_0^T |f(t, X^\alpha_t, \alpha_t)|\,dt + |g(X^\alpha_T)| \right] < \infty. \tag{1.3.2} \]

The condition given by eq. (1.3.2) ensures that the expected gain

\[ J(\alpha) = E\left[ \int_0^T f(t, X^\alpha_t, \alpha_t)\,dt + g(X^\alpha_T) \right] \tag{1.3.3} \]

is well defined. If $X^\alpha$ is not uniquely determined by this condition, we should provide a suitable criterion of choice to determine it without ambiguity. For instance, in the previous section $X^\alpha$ is uniquely determined by requiring that it belongs to the space $H^2$ (see Proposition 1.2.1). Therefore, the class of admissible controls presented in the previous section is consistent with this statement.

We should now deal with the stochastic control problem itself. The following generalization of Theorem 1.2.5 provides a strategy for finding an optimal control in $\mathcal{A}$, giving sufficient conditions for its uniqueness.

Theorem 1.3.1 (Generalized Verification Theorem). Let us assume the existence of $w \in C^0([0,T]\times\mathbb{R}^d) \cap C^{1,2}([0,T)\times\mathbb{R}^d)$ and $\nu : [0,T]\times\mathbb{R}^d \longrightarrow A$ measurable satisfying the following conditions:

(i) $w$ solves the HJB equation (1.2.10), with terminal condition $w(T,x) = g(x)$.

(ii) For all $\alpha \in \mathcal{A}$

\[ E\left[ \sup_{t\in[0,T]} |w(t, X^\alpha_t)| \right] < \infty. \tag{1.3.4} \]

(iii) For all $(t,x) \in [0,T)\times\mathbb{R}^d$

\[ \nu(t,x) \in \arg\max_{a\in A}\{\mathcal{L}w(t,x,a) + f(t,x,a)\}. \tag{1.3.5} \]

(iv) The SDE

\[ dX_t = b(t, X_t, \nu(t,X_t))\,dt + \sigma(t, X_t, \nu(t,X_t))\,dW_t, \qquad t \in [0,T], \tag{1.3.6} \]

with initial condition $X_0 = x_0$, admits a continuous strong solution $X^\nu$.

(v) The process $\hat\alpha : [0,T]\times\Omega \longrightarrow A$ defined by $\hat\alpha_t = \nu(t, X^\nu_t)$ belongs to $\mathcal{A}$, and $X^{\hat\alpha} = X^\nu$.

Then, $\hat\alpha$ is optimal in $\mathcal{A}$ and $w(0,x_0) = J(\hat\alpha)$.

Moreover, if the function $A \ni a \longmapsto \mathcal{L}w(t,x,a) + f(t,x,a)$ admits a unique maximum point for all $(t,x) \in [0,T)\times\mathbb{R}^d$ and $X^\nu$ is the only continuous strong solution of eq. (1.3.6), then $\hat\alpha$ is the only optimal control in $\mathcal{A}$.

Proof. Given a generic control $\alpha \in \mathcal{A}$, let us denote

\[ \varphi^\alpha(t) = \frac{\partial w}{\partial t}(t, X^\alpha_t) + \mathcal{L}w(t, X^\alpha_t, \alpha_t) + f(t, X^\alpha_t, \alpha_t), \qquad t \in [0,T). \]

By Hypothesis (i),

\[ \varphi^\alpha(t) \le 0, \qquad t \in [0,T). \tag{1.3.7} \]

Since $w \in C^{1,2}([0,T)\times\mathbb{R}^d)$, we have by Itô's Formula that

\[ w(\tau, X^\alpha_\tau) = w(0,x_0) + \int_0^\tau\left( \frac{\partial w}{\partial t}(t, X^\alpha_t) + \mathcal{L}w(t, X^\alpha_t, \alpha_t) \right) dt + M_\tau \tag{1.3.8} \]

for any stopping time $\tau : \Omega \longrightarrow [0,T]$, where

\[ M_h = \int_0^h \sigma(t, X^\alpha_t, \alpha_t)\cdot\nabla_x w(t, X^\alpha_t)\,dW_t, \qquad h \in [0,T]. \]

$M$ is a local martingale, so there exists an (a.s.) increasing and diverging sequence $(\tau_n)_{n\ge 0}$ of stopping times such that $(M_{\tau_n\wedge h})_{h\ge 0}$ is a martingale for all $n$. Hence, setting $\tau = \tau_n\wedge T$ in eq. (1.3.8) gives the following equality for all $n \in \mathbb{N}$:

\[ E\left[ \int_0^{\tau_n\wedge T} f(t, X^\alpha_t, \alpha_t)\,dt + w(\tau_n\wedge T, X^\alpha_{\tau_n\wedge T}) \right] = w(0,x_0) + E\left[ \int_0^{\tau_n\wedge T} \varphi^\alpha(t)\,dt \right]. \tag{1.3.9} \]

Observe that

\[ E\left[ \sup_{n\in\mathbb{N}} \left| \int_0^{\tau_n\wedge T} f(t, X^\alpha_t, \alpha_t)\,dt + w(\tau_n\wedge T, X^\alpha_{\tau_n\wedge T}) \right| \right] \le E\left[ \int_0^T |f(t, X^\alpha_t, \alpha_t)|\,dt + \sup_{t\in[0,T]} |w(t, X^\alpha_t)| \right] < \infty \]


by hypothesis (ii). Hence, by letting $n$ tend to infinity, we obtain

\[ J(\alpha) = E\left[ \int_0^T f(t, X^\alpha_t, \alpha_t)\,dt + w(T, X^\alpha_T) \right] = \lim_{n\to\infty} E\left[ \int_0^{\tau_n\wedge T} f(t, X^\alpha_t, \alpha_t)\,dt + w(\tau_n\wedge T, X^\alpha_{\tau_n\wedge T}) \right] = w(0,x_0) + \lim_{n\to\infty} E\left[ \int_0^{\tau_n\wedge T} \varphi^\alpha(t)\,dt \right] \le w(0,x_0), \tag{1.3.10} \]

having used (in this order): the fact that $w(T,\cdot) = g(\cdot)$, Lebesgue's Dominated Convergence Theorem, eq. (1.3.9), eq. (1.3.7).

Now note that if $\alpha = \hat\alpha$ the inequality (1.3.7) is actually an equality, by the hypotheses (iii), (iv), (v); thus, in this particular case, (1.3.10) leads to

\[ J(\hat\alpha) = w(0,x_0). \tag{1.3.11} \]

From eq. (1.3.10) (which is valid for any $\alpha \in \mathcal{A}$) and (1.3.11) we can infer that $\hat\alpha$ is optimal in $\mathcal{A}$.

Now let us prove the second part of the theorem. A generic control $\alpha \in \mathcal{A}$ is optimal if and only if $J(\alpha) = J(\hat\alpha)$; thus, in order to be optimal, $\alpha$ must satisfy the relation

\[ \lim_{n\to\infty} E\left[ \int_0^{\tau_n\wedge T} \varphi^\alpha(t)\,dt \right] = 0, \tag{1.3.12} \]

by eq. (1.3.10) and eq. (1.3.11).

However, since $\varphi^\alpha$ is a non-positive function (by eq. (1.3.7)), the expectations in eq. (1.3.12) must form a decreasing sequence of non-positive numbers. This means that each of them must be equal to zero, from which we deduce (using the continuity of $\varphi^\alpha$) that $\varphi^\alpha$ is identically zero over the interval $[0,T)$.

Therefore, by the uniqueness of the maximum point of $a \longmapsto \mathcal{L}w(t,x,a) + f(t,x,a)$ for all $(t,x) \in [0,T)\times\mathbb{R}^d$, we can infer that $\alpha_t = \nu(t, X^\alpha_t)$ identically. This implies that $X^\alpha$ is a continuous strong solution of eq. (1.3.6); since this equation admits a unique continuous strong solution, we conclude that $X^\alpha = X^\nu$ identically. Thus, $\alpha_t = \nu(t, X^\nu_t)$ identically, i.e. $\alpha = \hat\alpha$.

Note that this theorem is an actual generalization of Theorem 1.2.5. Indeed the quadratic growth condition given by eq. (1.2.11), together with Proposition 1.2.1, ensures that eq. (1.3.4) is satisfied.

1.4 Risk aversion and utility functions

In this section we maintain all the notations and assumptions of the previous section. In particular, we keep considering the system $X$ evolving as described in eq. (1.3.1), with expected gain given by eq. (1.3.3).

Clearly, even if we are able to evaluate an optimal control $\hat\alpha \in \mathcal{A}$, nothing ensures that our actual gain will be close to $J(\hat\alpha)$, given the uncertainty of the environment in which $X$ is evolving. In this section, we suppose to be particularly interested in reducing this uncertainty. Roughly speaking, this means that we are more interested in small gains likely to be achieved rather than in huge and highly risky ones.

This kind of behaviour is called risk aversion and can be formalized by properly modifying the expected gain that we aim at maximizing.

The new expected gain takes the form

\[ \tilde J(\alpha) = E\left[ U\left( \int_0^T f(t, X^\alpha_t, \alpha_t)\,dt + g(X^\alpha_T) \right) \right], \tag{1.4.1} \]

where $U \in C^0(\mathbb{R})$ is a suitable concave function called utility function. Note that the risk aversion is exactly expressed by the concavity of $U$.

In order for the new expected gain to be well defined, we may need to restrict the class $\mathcal{A}$ of admissible controls to a certain subset $\tilde{\mathcal{A}} \subseteq \mathcal{A}$, such that

\[ E\left[ \left| U\left( \int_0^T f(t, X^\alpha_t, \alpha_t)\,dt + g(X^\alpha_T) \right) \right| \right] < \infty \qquad \text{for all } \alpha \in \tilde{\mathcal{A}}. \]

Strictly speaking, the expected gain $\tilde J$ is not included in the general discussion made in Section 1.1, since the quantity inside the expectation in eq. (1.4.1) neither has the form of a running cost, nor depends uniquely on the terminal state of the system.

However, this issue can be solved by introducing a new one-dimensional component $Y^\alpha$ of the system $X^\alpha$, determined by

\[ \begin{cases} dY_t = f(t, X_t, \alpha_t)\,dt, & t \in [0,T] \\ Y_0 = 0 \end{cases} \]

for every $\alpha \in \tilde{\mathcal{A}}$.

Hence, eq. (1.4.1) can be rewritten as

\[ \tilde J(\alpha) = E[\tilde g(\tilde X^\alpha_T)], \]

where $\tilde X^\alpha = (X^\alpha, Y^\alpha)$ is the new system controlled by $\alpha \in \tilde{\mathcal{A}}$ and

\[ \tilde g : \mathbb{R}^{d+1} \longrightarrow \mathbb{R}, \qquad (x,y) \longmapsto U(g(x) + y) \]

is the new terminal function.

It is straightforward to check that $\tilde{\mathcal{A}}$ is a well defined class of admissible controls for the new stochastic control problem described above (to which we shall refer as the "modified problem"), according to the axiom given in the previous section.

One of the most common utility functions is the exponential utility function, namely

\[ U(x) = -\frac{1}{\lambda}e^{-\lambda x}, \qquad x \in \mathbb{R}, \tag{1.4.2} \]

where $\lambda$ is a positive parameter; other examples of widely used utility functions include the power utility function, the isoelastic utility function and the logarithmic utility function; see [14] for their definitions and further information.
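To make the role of $\lambda$ concrete (a standard computation, added here for illustration): if the total gain $G$ is Gaussian, $G \sim \mathcal{N}(m, s^2)$, then

\[ E[U(G)] = -\frac{1}{\lambda}e^{-\lambda m + \frac{\lambda^2 s^2}{2}} = U\!\left( m - \frac{\lambda s^2}{2} \right), \]

so maximizing the expected exponential utility amounts to maximizing the mean of the gain penalized by $\lambda/2$ times its variance: the larger $\lambda$, the stronger the aversion to uncertainty.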

The following proposition, which will be useful in the sequel, is a modification of Theorem 1.3.1, suited to the modified problem above. Here we shall consider the exponential utility function given by eq. (1.4.2).


Proposition 1.4.1. Let us assume the existence of $w \in C^0([0,T]\times\mathbb{R}^d) \cap C^{1,2}([0,T)\times\mathbb{R}^d)$ and $\nu : [0,T]\times\mathbb{R}^d \longrightarrow A$ measurable satisfying the following conditions:

(i) $w$ solves the equation²

\[ \frac{\partial w}{\partial t}(t,x) + \sup_{a\in A}\left\{ \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \right\} = 0, \qquad t \in [0,T),\ x \in \mathbb{R}^d, \tag{1.4.3} \]

with terminal condition $w(T,x) = g(x)$.

(ii) For all $\alpha \in \tilde{\mathcal{A}}$

\[ E\left[ \exp\left( \lambda\sup_{t\in[0,T]} |w(t, X^\alpha_t)| + \lambda\int_0^T |f(t, X^\alpha_t, \alpha_t)|\,dt \right) \right] < \infty. \]

(iii) For all $(t,x) \in [0,T)\times\mathbb{R}^d$

\[ \nu(t,x) \in \arg\max_{a\in A}\left\{ \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \right\}. \tag{1.4.4} \]

(iv) The SDE

\[ dX_t = b(t, X_t, \nu(t,X_t))\,dt + \sigma(t, X_t, \nu(t,X_t))\,dW_t, \qquad t \in [0,T], \tag{1.4.5} \]

with initial condition $X_0 = x_0$, admits a continuous strong solution $X^\nu$.

(v) The process $\hat\alpha : [0,T]\times\Omega \longrightarrow A$ defined by $\hat\alpha_t = \nu(t, X^\nu_t)$ belongs to $\tilde{\mathcal{A}}$, and $X^{\hat\alpha} = X^\nu$.

Then, $\hat\alpha$ is optimal in $\tilde{\mathcal{A}}$ with respect to the expected gain

\[ \tilde J(\alpha) = E\left[ -\frac{1}{\lambda}\exp\left( -\lambda\int_0^T f(t, X^\alpha_t, \alpha_t)\,dt - \lambda g(X^\alpha_T) \right) \right], \qquad \alpha \in \tilde{\mathcal{A}}, \]

and

\[ \tilde J(\hat\alpha) = -\frac{1}{\lambda}\exp[-\lambda w(0,x_0)]. \]

Moreover, if the function

\[ A \ni a \longmapsto \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \]

admits a unique maximum point for all $(t,x) \in [0,T)\times\mathbb{R}^d$ and $X^\nu$ is the only continuous strong solution of eq. (1.4.5), then $\hat\alpha$ is the only optimal control in $\tilde{\mathcal{A}}$.

Proof. Let us define

\[ \tilde w : [0,T]\times\mathbb{R}^{d+1} \longrightarrow \mathbb{R}, \qquad (t,x,y) \longmapsto -\frac{1}{\lambda}\exp[-\lambda w(t,x) - \lambda y]. \]

Our goal is proving that $\tilde w$ and $\nu$ satisfy all the hypotheses of Theorem 1.3.1 for the modified problem described above. Having done this, the proposition can be straightforwardly derived from Theorem 1.3.1.

² Here $\|\cdot\|_2$ is the Euclidean norm.


Let us start from hypothesis (i). Since the running function for the modified problem is null, the HJB equation for $\tilde w$ reads

\[ \frac{\partial\tilde w}{\partial t}(t,x,y) + \sup_{a\in A}\mathcal{L}\tilde w(t,x,y,a) = 0, \qquad (t,x,y) \in [0,T)\times\mathbb{R}^{d+1}. \tag{1.4.6} \]

Now note that

\[ \mathcal{L}\tilde w(t,x,y,a) = b(t,x,a)\cdot\nabla_x\tilde w(t,x,y) + f(t,x,a)\frac{\partial\tilde w}{\partial y}(t,x,y) + \frac{1}{2}\mathrm{Tr}\big(\sigma(t,x,a)\sigma(t,x,a)'H_x\tilde w(t,x,y)\big) = \left( \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \right)\exp[-\lambda w(t,x) - \lambda y]. \tag{1.4.7} \]

Hence,

\[ \left( \frac{\partial\tilde w}{\partial t}(t,x,y) + \sup_{a\in A}\mathcal{L}\tilde w(t,x,y,a) \right)\exp[\lambda w(t,x) + \lambda y] = \frac{\partial w}{\partial t}(t,x) + \sup_{a\in A}\left\{ \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \right\} = 0 \]

by eq. (1.4.3), which proves the validity of eq. (1.4.6). Moreover,

\[ \tilde w(T,x,y) = -\frac{1}{\lambda}\exp[-\lambda g(x) - \lambda y] = U(g(x)+y) = \tilde g(x,y) \]

by definition.

Hypothesis (ii) of Theorem 1.3.1 is directly implied by hypothesis (ii) above. By eq. (1.4.7), we have

\[ \arg\max_{a\in A}\mathcal{L}\tilde w(t,x,y,a) = \arg\max_{a\in A}\left\{ \mathcal{L}w(t,x,a) - \frac{\lambda}{2}\|\sigma'(t,x,a)\nabla_x w(t,x)\|_2^2 + f(t,x,a) \right\} \ni \nu(t,x); \]

therefore also hypothesis (iii) of Theorem 1.3.1 is verified, thanks to hypothesis (iii) above. Finally, hypotheses (iv) and (v) of Theorem 1.3.1 don't even need to be checked, since they are equal to the respective ones above.


Chapter 2

Mean Field Games

2.1 Introduction and motivations

In this chapter we introduce some mathematical models to address the case, arising from many practical situations, in which the dynamics of our private state is not under our own complete control, being influenced by the choices of other "players" like us, each of whom aims at maximizing his own expected gain.

Let $I$ be the set of players and suppose, in the most general case, that $(\Omega, \mathcal{F}, P)$ is equipped with a filtration $(\mathcal{F}^i_t)_{t\in[0,T]}$ for each $i \in I$ (meaning that different players may not have access to the same information),¹ each of which satisfies the usual conditions (completeness and right-continuity) and supports an $m_i$-dimensional Brownian motion $(W^i_t)_{t\in[0,T]}$. Generally speaking, we don't require the Brownian motions $(W^i)_{i\in I}$ to be independent.

At time $t$ each player has got his own private state $X^i_t \in \mathbb{R}^{d_i}$ and can undertake an action $\alpha^i_t \in A_i \subseteq \mathbb{R}^{k_i}$. The process $\alpha^i$ in this context is called strategy, and is subject to some constraints in a similar fashion as for stochastic control. Let us call $\boldsymbol X = (X^i)_{i\in I}$ the total system and $\boldsymbol\alpha = (\alpha^i)_{i\in I}$ the strategy profile, i.e. the collection of the strategies of all the players.

A stochastic differential game can be described by four continuous functions:²

\[ b : [0,T]\times\prod_i\mathbb{R}^{d_i}\times\prod_i A_i \longrightarrow \prod_i\mathbb{R}^{d_i}, \qquad f : [0,T]\times\prod_i\mathbb{R}^{d_i}\times\prod_i A_i \longrightarrow \prod_i\mathbb{R}, \]
\[ \sigma : [0,T]\times\prod_i\mathbb{R}^{d_i}\times\prod_i A_i \longrightarrow \prod_i\mathbb{R}^{d_i\times m_i}, \qquad g : \prod_i\mathbb{R}^{d_i} \longrightarrow \prod_i\mathbb{R}. \]

Player $i$ aims at maximizing the expected gain

\[ J^i(\boldsymbol\alpha) = E\left[ \int_0^T f^i(t, \boldsymbol X_t, \boldsymbol\alpha_t)\,dt + g^i(\boldsymbol X_T) \right], \tag{2.1.1} \]

where $\boldsymbol X$ evolves according to the equations

\[ dX^i_t = b^i(t, \boldsymbol X_t, \boldsymbol\alpha_t)\,dt + \sigma^i(t, \boldsymbol X_t, \boldsymbol\alpha_t)\,dW^i_t, \qquad t \in [0,T],\ i \in I, \tag{2.1.2} \]

with certain initial conditions.

Let us call $\mathcal{A}$ the set of admissible strategy profiles, i.e. the set of the strategy profiles that ensure the well-definedness of the game.

¹ We suppose $\mathcal{F}^i_t \subseteq \mathcal{F}$ for all $i \in I$ and $t \in [0,T]$.

² We are now considering the Markovian case, in order to simplify the notation; more in general we can let $b, \sigma, f, g$ depend also upon the scenario $\omega \in \Omega$.


Given $\boldsymbol\alpha \in \mathcal{A}$, the best response of player $i$ to $\boldsymbol\alpha$ is the set of all the strategies $\hat\alpha^i$ that maximize the expected gain of player $i$ given that all the others play according to $\boldsymbol\alpha$; namely,³

\[ J^i(\alpha^{-i}, \hat\alpha^i) = \sup\{ J^i(\alpha^{-i}, \alpha^i) : (\alpha^{-i}, \alpha^i) \in \mathcal{A} \}. \]

If player $i$ knew in advance the strategies of the other players, then he would perform the best response to them. In general, this is not the case.

This raises the question: is it possible to define an "optimal" strategy profile?

Actually, there are several notions of optimality. We are particularly interested in the following one: a strategy profile $\hat{\boldsymbol\alpha} \in \mathcal{A}$ is said to be a Nash equilibrium if $\hat\alpha^i$ belongs to the best response of player $i$ to $\hat{\boldsymbol\alpha}$ for each $i \in I$; namely, if

\[ J^i(\hat\alpha^{-i}, \hat\alpha^i) = \sup\{ J^i(\hat\alpha^{-i}, \alpha^i) : (\hat\alpha^{-i}, \alpha^i) \in \mathcal{A} \} \]

for each $i \in I$.

Solving a stochastic differential game often amounts to identifying one possible Nash equilibrium (or all of them). Usually, the way to do this is evaluating for each strategy profile its own best response, and then looking for a possible fixed point of the "best response map".

³ We use the notation (common in the literature) $x^{-i}$ to denote $(x^j)_{j\in I\setminus\{i\}}$.
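A toy static example (ours, for illustration only) may clarify these notions. Let two players choose $a^1, a^2 \in \mathbb{R}$ with gains

\[ J^i(a^1,a^2) = -(a^i)^2 + c\,a^i a^{-i}, \qquad i = 1,2, \]

for a constant $c$. Maximizing over $a^i$ gives the best response $\hat a^i = \frac{c}{2}a^{-i}$; a Nash equilibrium is a fixed point of the best response map $(a^1,a^2) \mapsto \big(\frac{c}{2}a^2, \frac{c}{2}a^1\big)$, and for $|c| < 2$ this map is a contraction, so $(0,0)$ is the unique equilibrium.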

Unfortunately, both these tasks are complicated, generally speaking. In particular, what makes evaluating the best response so difficult is that varying the strategy of player $i$ affects not only the dynamics of $X^i$ but also all the others; therefore, even keeping $\alpha^{-i}$ fixed, the evolution of $X^i$ is influenced both by the variations of $\alpha^i$ and by the subsequent variations of $X^{-i}$.

In order to avoid this issue, we can consider games with a high or infinite number of players, in which the behaviour of a single player has a negligible impact on the dynamics of the entire system. Such games are called Mean Field Games and will be the topic of the rest of this chapter.

2.2 The classical approach

In this section we make the following assumptions:

• $I$ has size $n < \infty$.

• $d_i = d$, $m_i = m$, $A_i = A \subseteq \mathbb{R}^k$ for all $i \in I$.

• The Brownian motions $(W^i_t)_{t\in[0,T]}$ are independent.

• $b, \sigma, f, g$ have the following structure: for all $i \in I$,⁴

\[ b^i(t, \boldsymbol x, \boldsymbol a) = \frac{1}{n}\sum_{j\in I}\tilde b(t, x^i, x^j, a^i), \qquad f^i(t, \boldsymbol x, \boldsymbol a) = \frac{1}{n}\sum_{j\in I}\tilde f(t, x^i, x^j, a^i), \]
\[ \sigma^i(t, \boldsymbol x, \boldsymbol a) = \frac{1}{n}\sum_{j\in I}\tilde\sigma(t, x^i, x^j, a^i), \qquad g^i(\boldsymbol x) = \frac{1}{n}\sum_{j\in I}\tilde g(x^i, x^j), \]

for some continuous functions

\[ \tilde b : [0,T]\times\mathbb{R}^d\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}^d, \qquad \tilde f : [0,T]\times\mathbb{R}^d\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}, \]
\[ \tilde\sigma : [0,T]\times\mathbb{R}^d\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}^{d\times m}, \qquad \tilde g : \mathbb{R}^d\times\mathbb{R}^d \longrightarrow \mathbb{R}. \]

⁴ We denote $\boldsymbol x = (x^j)_{j\in I}$, $\boldsymbol a = (a^j)_{j\in I}$.


Note that in this case the game is completely symmetric. Given a measure $\gamma$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$,⁵ let us define

\[ b(t,x,\gamma,a) = \int_{\mathbb{R}^d}\tilde b(t,x,x',a)\,d\gamma(x'), \]

and $\sigma(t,x,\gamma,a)$, $f(t,x,\gamma,a)$, $g(x,\gamma)$ in a similar fashion. Then, eq. (2.1.2) and (2.1.1) can be rewritten as

\[ dX^i_t = b(t, X^i_t, \gamma^n_t, \alpha^i_t)\,dt + \sigma(t, X^i_t, \gamma^n_t, \alpha^i_t)\,dW^i_t, \qquad t \in [0,T],\ i \in I, \]
\[ J^i(\boldsymbol\alpha) = E\left[ \int_0^T f(t, X^i_t, \gamma^n_t, \alpha^i_t)\,dt + g(X^i_T, \gamma^n_T) \right], \]

where⁶

\[ \gamma^n_t = \frac{1}{n}\sum_{j\in I}\delta_{X^j_t}. \tag{2.2.1} \]

The game being symmetric, we expect a Nash equilibrium to be symmetric as well (i.e. composed of identical strategies).

Thus, let us consider a symmetric strategy profile $\boldsymbol\alpha = (\alpha)_{i\in I} \in \mathcal{A}$. First, we would like to evaluate the best response of player $i$ to $\boldsymbol\alpha$. This amounts to finding the strategy $\hat\alpha^i \in \{\alpha^i : (\alpha^{-i},\alpha^i) \in \mathcal{A}\}$ that maximizes the expected gain of player $i$ for the system $\boldsymbol X$ satisfying the equations

\[ \begin{cases} dX^i_t = b(t, X^i_t, \gamma^n_t, \alpha^i_t)\,dt + \sigma(t, X^i_t, \gamma^n_t, \alpha^i_t)\,dW^i_t \\ dX^j_t = b(t, X^j_t, \gamma^n_t, \alpha_t)\,dt + \sigma(t, X^j_t, \gamma^n_t, \alpha_t)\,dW^j_t, \quad j \in I\setminus\{i\} \end{cases} \qquad t \in [0,T], \]

where $\gamma^n$ is given by (2.2.1).

As we said in the previous section, this optimization problem is quite difficult as it is. However, if $n$ is large enough, it seems legitimate to expect that the dynamics of $\boldsymbol X$ does not change significantly by replacing $\gamma^n_t$ with

\[ \frac{1}{n-1}\sum_{j\in I\setminus\{i\}}\delta_{X^j_t}. \tag{2.2.2} \]

Note that, after this substitution, $\alpha^i$ doesn't affect the dynamics of $X^{-i}$ anymore.

Actually, we can go even further, using again the assumption on the large number of players. Indeed, given the symmetry of the system $X^{-i}$, the private states $(X^j_t)_{j\in I\setminus\{i\}}$ at time $t$ are equally distributed and independent (the Brownian motions $(W^j_t)_{t\in[0,T]}$ being independent). Thus, the measure given by eq. (2.2.2) is the empirical measure of a sample of $n-1$ observations of the common law of the states $X^j_t$ ($j \in I\setminus\{i\}$). Suitable forms of the law of large numbers, like the Glivenko-Cantelli Theorem⁷ (see [10, p. 399-402]), suggest therefore that, for large values of $n$, we can actually replace $\gamma^n_t$ with the law $\gamma_t$ of each $X^j_t$ ($j \in I\setminus\{i\}$).

Hence, we may expect that the best response $\hat\alpha^i$ of player $i$ to $\boldsymbol\alpha$ is well approximated by the (optimal) control that maximizes the expected gain

\[ E\left[ \int_0^T f(t, X^i_t, \gamma_t, \alpha^i_t)\,dt + g(X^i_T, \gamma_T) \right], \]

the dynamics of $X^i$ being described by

\[ dX^i_t = b(t, X^i_t, \gamma_t, \alpha^i_t)\,dt + \sigma(t, X^i_t, \gamma_t, \alpha^i_t)\,dW^i_t, \qquad t \in [0,T]. \tag{2.2.3} \]

This is a classical stochastic control problem, which can be solved by applying the tools discussed in the previous chapter.

⁵ $\mathcal{B}(\mathbb{R}^d)$ is the Borel $\sigma$-algebra on $\mathbb{R}^d$.

⁶ $\delta$ denotes the Dirac measure.

⁷ The Glivenko-Cantelli Theorem states that, for all $t \in [0,T]$, the empirical distribution function of a sample of i.i.d. observations converges uniformly, almost surely, to the distribution function of their common law, as the size of the sample tends to infinity.

After having found the (approximated) best response $\hat\alpha^i$ to $\boldsymbol\alpha$, the remaining issue is determining $\boldsymbol\alpha$ such that $\hat\alpha^i$ is equal to $\alpha$ itself. As one can easily infer from what we have said, this is true if and only if $\gamma_t$ turns out to be equal to the law of $\hat X^i_t$ for every $t \in [0,T]$, $\hat X^i$ being the system controlled by $\hat\alpha^i$. Since for each $t \in [0,T]$ the law of $\hat X^i_t$ ultimately depends upon $\gamma_t$ (see eq. (2.2.3)), this last problem can typically be solved by a fixed point argument.
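Schematically (our compact restatement of the above), a mean field equilibrium is a fixed point of the map

\[ \gamma = (\gamma_t)_{t\in[0,T]} \;\longmapsto\; \Phi(\gamma) = \big(\mathrm{Law}(\hat X^{i,\gamma}_t)\big)_{t\in[0,T]}, \]

where $\hat X^{i,\gamma}$ denotes the state of a representative player performing the best response to the frozen flow of measures $\gamma$: one first solves the control problem for fixed $\gamma$, then imposes the consistency condition $\Phi(\gamma) = \gamma$.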

The heuristic argument we have given in this section can be formalized using the theory of Propagation of Chaos (see for example [8, p. 246-251]), proving that a strategy profile $\boldsymbol\alpha$ constructed as we have described actually provides an $\varepsilon$-Nash equilibrium (i.e. an approximation of a Nash equilibrium). We don't give here further details, since we won't deal with this type of mean field games in the sequel.

2.3 Extended Mean Field Games

In this section we shall consider a particular class of games not included in the general description given in Section 2.1, which we call extended mean field games.

Let us equip the set $I$, which may be either finite or not, with a $\sigma$-algebra $\mathcal{I}$ and a measure $\theta$ on $I$. For each $\mathcal{I}$-measurable subset of players $J \subseteq I$, $\theta(J)$ evaluates the "impact" of the players belonging to $J$ on the total system.

As for the classical mean field games, we suppose that every player has got the same state space $\mathbb{R}^d$ and the same space of actions $A \subseteq \mathbb{R}^k$, and that the Brownian motions $(W^i_t)_{t\in[0,T]}$ have the same dimension $m$.

An extended mean field game is described by five continuous functions:

\[ b : [0,T]\times\mathbb{R}^d\times\mathbb{R}\times A \longrightarrow \prod_{i\in I}\mathbb{R}^d, \qquad f : [0,T]\times\mathbb{R}^d\times\mathbb{R}\times A \longrightarrow \prod_{i\in I}\mathbb{R}, \]
\[ \sigma : [0,T]\times\mathbb{R}^d\times\mathbb{R}\times A \longrightarrow \prod_{i\in I}\mathbb{R}^{d\times m}, \qquad g : \mathbb{R}^d\times\mathbb{R} \longrightarrow \prod_{i\in I}\mathbb{R}, \qquad \varphi : [0,T]\times\mathbb{R}^d\times A \longrightarrow \mathbb{R}. \]

Player $i$ aims at maximizing the expected gain

\[ J^i(\boldsymbol\alpha) = E\left[ \int_0^T f^i(t, X^i_t, \mu_t, \alpha^i_t)\,dt + g^i(X^i_T, \mu_T) \right], \]

where

\[ \mu_t = \int_{I\times\Omega}\varphi(t, X^j_t(\omega), \alpha^j_t(\omega))\,d\theta(j)\,dP(\omega) \]

and $\boldsymbol X$ evolves according to the equations

\[ dX^i_t = b^i(t, X^i_t, \mu_t, \alpha^i_t)\,dt + \sigma^i(t, X^i_t, \mu_t, \alpha^i_t)\,dW^i_t, \qquad t \in [0,T],\ i \in I, \]

with certain initial conditions.

In order to find a Nash equilibrium for this kind of games, we can proceed as we did for the classical mean field games.


Given a strategy profile $\boldsymbol\alpha \in \mathcal{A}$, the best response of player $i$ to $\boldsymbol\alpha$ is the strategy $\hat\alpha^i \in \{\alpha^i : (\alpha^{-i},\alpha^i) \in \mathcal{A}\}$ that maximizes the expected gain of player $i$ for the system $\boldsymbol X$ satisfying the equations

\[ \begin{cases} dX^i_t = b^i(t, X^i_t, \mu_t, \alpha^i_t)\,dt + \sigma^i(t, X^i_t, \mu_t, \alpha^i_t)\,dW^i_t \\ dX^j_t = b^j(t, X^j_t, \mu_t, \alpha^j_t)\,dt + \sigma^j(t, X^j_t, \mu_t, \alpha^j_t)\,dW^j_t, \quad j \in I\setminus\{i\} \end{cases} \qquad t \in [0,T], \tag{2.3.1} \]

where

\[ \mu_t = \theta(\{i\})\,E[\varphi(t, X^i_t, \alpha^i_t)] + \int_{(I\setminus\{i\})\times\Omega}\varphi(t, X^j_t(\omega), \alpha^j_t(\omega))\,d\theta(j)\,dP(\omega). \tag{2.3.2} \]

Again, this optimization problem is made difficult by the dependence of $\mu_t$ upon $X^i$ and $\alpha^i$. However, if the number of players is high and $\theta(\{i\}) \ll 1$ for all $i$, our intuition suggests that the dynamics of $\boldsymbol X$ does not change significantly by replacing $\mu_t$ with

\[ \int_{(I\setminus\{i\})\times\Omega}\varphi(t, X^j_t(\omega), \alpha^j_t(\omega))\,d\theta(j)\,dP(\omega) \tag{2.3.3} \]

in eq. (2.3.1), as well as inside the expected gain. This leads to a stochastic control problem, whose solution is likely to approximate the best response $\hat\alpha^i$ to $\boldsymbol\alpha$.

However, we don't need to address the issue of evaluating the fairness of this approximation, as long as the measure $\theta$ on $I$ is non-atomic (i.e. $\theta(\{i\}) = 0$ for all $i \in I$). In this case the expressions given by eq. (2.3.2) and eq. (2.3.3) are exactly equal, therefore we are facing an actual stochastic control problem.

Once this problem is solved, in order to find a Nash equilibrium it is enough to choose $\boldsymbol\alpha \in \mathcal{A}$ such that $\hat\alpha^i = \alpha^i$ for every $i$. In the next chapter we shall see how to do this in a practical case.


Chapter 3

Application to Optimal Trading

3.1 Description of the game

In this chapter we deal with the following problem: a set $I$ of investors have to trade (i.e. buy or sell) shares of a given stock, within a terminal time $T > 0$.

At every time $t \in [0,T]$ each agent $i \in I$ owns a certain portfolio (i.e. number of shares) $Q^i_t \in \mathbb{R}$, which starts with a given $q^i_0 = Q^i_0 \in \mathbb{R}$ (positive if investor $i$ is a seller and negative if he is a buyer). For every $i \in I$, the evolution of the portfolio of trader $i$ can be viewed as a stochastic process $Q^i : [0,T]\times\Omega \longrightarrow \mathbb{R}$ on a probability space $(\Omega, \mathcal{F}, P)$. $Q^i$ is supposed to have almost surely differentiable trajectories, and its derivative $\alpha^i_t$ at time $t$, namely the "trading speed" of agent $i$, can be completely decided by him. It is positive if agent $i$ is buying at time $t$ and negative if he is selling. Bear in mind that an investor $i$ whose portfolio $q^i_0$ at time $t = 0$ is (let's say) positive is not forced to keep a negative trading speed at every time $t \in [0,T]$.

Let us suppose that the probability space $(\Omega, \mathcal{F}, P)$ is equipped with a filtration $(\mathcal{F}_t)_{t\in[0,T]}$ (with $\mathcal{F}_t \subseteq \mathcal{F}$ for all $t \in [0,T]$), satisfying the usual conditions (completeness and right-continuity) and supporting a one-dimensional Brownian motion $(W_t)_{t\in[0,T]}$.

The public price of the stock is described by a stochastic process $S : [0,T]\times\Omega \longrightarrow \mathbb{R}$, whose dynamics is influenced by an exogenous noise, formalized by the Brownian motion $W$, and by the "average" trading speed of all the investors, given by a continuous function $\mu \in C^0([0,T])$. Indeed, if $\mu(t)$ is positive and high, meaning that at time $t \in [0,T]$ most agents are buying huge amounts of the stock, its public price is supposed to increase; vice-versa, if $\mu(t)$ is negative and high in absolute value, meaning that at time $t \in [0,T]$ most agents are selling, $S$ is supposed to decrease.

This intuitive phenomenon is called "permanent market impact" (if we believe that the price is influenced by the recent values of the average trading speed as much as by the older ones) or "transient market impact" (if the effect of the average trading speed is supposed to decay over time), and its validity is widely confirmed by empirical studies, like [6], [4], [15] (the latter strongly suggesting the hypothesis of transience).

In [7] the permanent market impact is supposed to be linear with respect to the average trading speed:

\[ dS_t = \beta\mu(t)\,dt + \sigma\,dW_t, \qquad t \in [0,T]. \tag{3.1.1} \]


However, empirical studies (like for example [4]) suggest that it may be concave, like in the Almgren-Chriss model (named after the seminal papers [5], [2], [3] by Almgren and Chriss):

\[ dS_t = b_0(\mu(t))\,dt + \sigma\,dW_t, \qquad t \in [0,T], \]

where $b_0 \in C^0(\mathbb{R})$ is a suitable concave function.¹

Other authors have proposed models for a transient linear price impact. For example, in [11] (by Fruth et al.)

\[ S_t = s_0 + \int_0^t \beta(r)\mu(r)\exp\left( -\int_r^t \rho(r')\,dr' \right) dr + \sigma W_t, \qquad t \in [0,T], \tag{3.1.2} \]

where $s_0 = S_0$ is the initial price of the stock and $\beta, \rho \in C^0([0,T])$ are positive functions. In [1] Alfonsi et al. consider the same model as Fruth et al., except for the fact that $\beta$ is constant; furthermore, in [16] Obizhaeva and Wang assume that both $\rho$ and $\beta$ are constant.

The average trading speed $\mu$ can be formalized by introducing a $\sigma$-algebra $\mathcal{I}$ and a probability measure $\theta$ on the set $I$, and defining

\[ \mu(t) = E_{\theta\times P}[\boldsymbol\alpha_t], \qquad t \in [0,T], \tag{3.1.3} \]

where $\boldsymbol\alpha : [0,T]\times I\times\Omega \longrightarrow \mathbb{R}$ is given by $\boldsymbol\alpha_t(i,\omega) = \alpha^i_t(\omega)$ and is supposed to be progressively measurable² as a process on $(I\times\Omega, \mathcal{I}\otimes\mathcal{F})$.

Actually, investor $i$ trades at a price different from $S$, because of the presence of a certain trading cost, which is supposed to be linear in the trading speed $\alpha^i$. Thus, the price of the stock for agent $i$ is given by $S + \kappa\alpha^i$, $\kappa$ being a positive parameter independent of $i$, given by the market.³

Hence, the dynamics of the wealth $Y^i : [0,T]\times\Omega \longrightarrow \mathbb{R}$ of agent $i$ is described by the differential equation

\[ dY^i_t = -\alpha^i_t(S_t + \kappa\alpha^i_t)\,dt, \qquad t \in [0,T]. \tag{3.1.4} \]

Without loss of generality, we can suppose that the wealth of each investor starts at $Y^i_0 = 0$.

Agent $i$ has to choose the most appropriate trading speed in order to maximize the expected value of a certain quantity, given by the sum of:

• The wealth $Y^i_T$ at time $T$;

• The remaining portfolio $Q^i_T$ at time $T$, multiplied by its "economic value" at time $T$ according to agent $i$. This value is not simply given by the public price $S_T$, because this portfolio will have to be sold (or bought, if $Q^i_T$ is negative) in the future, and this will result in further costs and risks, which can be supposed to be linear in $Q^i_T$. Thus, the economic value of $Q^i_T$ is given by $S_T - \psi^i Q^i_T$, where $\psi^i$ is a positive parameter. Unlike $\kappa$, this parameter can depend upon $i$, meaning that each player may have a different "risk aversion" (i.e. a different attitude towards risk).

• A penalization of the form

\[ -\phi^i\int_0^T (Q^i_t)^2\,dt, \]

where $\phi^i$ is another positive parameter depending upon $i$. Indeed, holding a large (in absolute value) portfolio over the interval $[0,T]$ is risky. Thus, the more agent $i$ is averse to risk, the more he is interested in achieving a small value for the quadratic norm of $Q^i$ over $[0,T]$.

¹ However, as explained by Gatheral and Schied in [12], in this case $b_0$ should actually be linear in $\mu$, in order to avoid price manipulations (i.e. dynamic arbitrage).

² With respect to the filtration $(\mathcal{I}\otimes\mathcal{F}_t)_{t\in[0,T]}$.

³ This means that the effective price of the stock is greater than $S$ for a buyer and less than $S$ for a seller.

Hence, the expected gain that agent $i$ aims at maximizing is

\[ J^i(\boldsymbol\alpha) = E\left[ Y^{i,\boldsymbol\alpha}_T + Q^{i,\boldsymbol\alpha}_T\big(S^{\boldsymbol\alpha}_T - \psi^i Q^{i,\boldsymbol\alpha}_T\big) - \phi^i\int_0^T (Q^{i,\boldsymbol\alpha}_t)^2\,dt \right], \tag{3.1.5} \]

where we have added the superscripts "$\boldsymbol\alpha$" to stress the dependence of $S, Q^i, Y^i$ upon $\boldsymbol\alpha$.

Note that this problem can be viewed as an extended mean field game. Here, the players are the investors. The private state of player $i$ at time $t \in [0,T]$ is given by $(S_t, Q^i_t, Y^i_t) \in \mathbb{R}^3$, the strategy of player $i$ is the trading speed $\alpha^i$ (with $\mathbb{R}$ as space of actions), the strategy profile is $\boldsymbol\alpha$, and the expected gain is given by eq. (3.1.5). In this case, the Brownian motion $W^i$ is the same for all the players. We would like to approach the problem following the idea described in Section 2.3, so we require $I$ to have the cardinality of the continuum and $\theta$ to be non-atomic (i.e. $\theta(\{i\}) = 0$ for all $i \in I$).

3.2 Assumptions

In Section 3.3 and Section 3.4 we shall reformulate the same results discussed in [7] by Cardaliaguet and Lehalle; hence we shall deal with the same model for the permanent market impact as theirs, given by eq. (3.1.1).

Later on, we shall assume the following model for the evolution of the public price with market impact, which generalizes all those presented in the previous section:⁴

\[ dS_t = [b_0(t,\mu(t)) + S_t b_1(t)]\,dt + \sigma(t,\mu(t))(S_t)^\gamma\,dW_t, \qquad t \in [0,T], \tag{3.2.1} \]

where $b_0 \in C^0([0,T]\times\mathbb{R})$, $b_1 \in C^0([0,T])$, $\sigma \in C^0([0,T]\times\mathbb{R})$ are suitable continuous functions and $\gamma \in \{0, \tfrac12, 1\}$.

The dependence of the volatility upon $\mu(t)$ in eq. (3.2.1) is not justified by empirical results; nevertheless, it doesn't seem totally unreasonable to reckon that the average trading speed might have some kind of effect on the volatility of the public price, so we include this possibility in our discussion.

We assume $b_0$ to be uniformly Lipschitz continuous with respect to the second variable, i.e. there exists a positive constant $\beta$ such that

\[ |b_0(t,u_1) - b_0(t,u_2)| \le \beta|u_1 - u_2|, \qquad (t,u_1,u_2) \in [0,T]\times\mathbb{R}^2. \tag{3.2.2} \]

If $\gamma \in \{0,1\}$, Proposition 1.2.1 ensures that for any $\mu \in C^0([0,T])$ the equation (3.2.1), with initial condition $S_0 = s_0$, admits a unique strong solution $S$ satisfying

\[ E_P\left[ \sup_{t\in[0,T]} (S_t)^2 \right] < \infty. \tag{3.2.3} \]

Otherwise (if $\gamma = \tfrac12$), we need to assume that the above condition holds true.

Now let us define the class of admissible strategy profiles.

⁴ Note that the model by Fruth et al. (eq. (3.1.2)) can be obtained from ours by choosing $b_0(t,u) = \beta(t)u + \rho(t)s_0$ and $b_1(t) = -\rho(t)$.


Definition 3.2.1. We define the class of admissible strategy profiles as the set $\mathcal{A}$ of all the measurable processes $\boldsymbol\alpha : [0,T]\times I\times\Omega \longrightarrow \mathbb{R}$ such that:

• $\boldsymbol\alpha$ has continuous (almost surely with respect to the product measure $\theta\times P$) trajectories;

• We have

\[ E_{\theta\times P}\left[ \sup_{t\in[0,T]} |\boldsymbol\alpha_t| \right] < \infty; \tag{3.2.4} \]

• $\alpha^i$ belongs to $H^2$ for every $i \in I$, i.e. it is progressively measurable and has finite $L^2([0,T]\times\Omega)$ norm.

The first two conditions ensure, by Lebesgue's Dominated Convergence Theorem, that the average trading speed $\mu$, given by eq. (3.1.3), is well defined and continuous; therefore, the public price $S^{\boldsymbol\alpha}$ is well defined for every $\boldsymbol\alpha \in \mathcal{A}$ and satisfies eq. (3.2.3).

Besides, the third condition ensures that the state equations of each player $i$,

\[ \begin{cases} dQ^i_t = \alpha^i_t\,dt \\ dY^i_t = -\alpha^i_t(S^{\boldsymbol\alpha}_t + \kappa\alpha^i_t)\,dt \end{cases} \qquad t \in [0,T], \tag{3.2.5} \]

with initial condition $(Q^i_0, Y^i_0) = (q^i_0, 0)$, admit a unique strong solution $(Q^{i,\boldsymbol\alpha}, Y^{i,\boldsymbol\alpha})$, and that the expected gain $J^i(\boldsymbol\alpha)$ is well defined. Indeed, thanks to eq. (3.2.3) we have

\[ E_P\left[ \sup_{t\in[0,T]} (Q^{i,\boldsymbol\alpha}_t)^2 \right] \le 2\big( |q^i_0|^2 + \|\alpha^i\|_2^2 \big), \tag{3.2.6} \]
\[ E_P\left[ \sup_{t\in[0,T]} |Y^{i,\boldsymbol\alpha}_t| \right] \le \|\alpha^i\|_2\|S^{\boldsymbol\alpha}\|_2 + \kappa\|\alpha^i\|_2^2, \tag{3.2.7} \]

where $\|\cdot\|_2$ is the $L^2([0,T]\times\Omega)$ norm; thus

\[ E_P\left[ \left| Y^{i,\boldsymbol\alpha}_T + Q^{i,\boldsymbol\alpha}_T\big(S^{\boldsymbol\alpha}_T - \psi^i Q^{i,\boldsymbol\alpha}_T\big) - \phi^i\int_0^T (Q^{i,\boldsymbol\alpha}_t)^2\,dt \right| \right] \le E_P\left[ |Y^{i,\boldsymbol\alpha}_T| + \big|Q^{i,\boldsymbol\alpha}_T\big(S^{\boldsymbol\alpha}_T - \psi^i Q^{i,\boldsymbol\alpha}_T\big)\big| + \phi^i\int_0^T (Q^{i,\boldsymbol\alpha}_t)^2\,dt \right] < \infty. \]

The last assumption we make is the following: the random variables $\boldsymbol\psi, \boldsymbol\phi, \boldsymbol q_0 : I \longrightarrow \mathbb{R}$ are $\mathcal{I}$-measurable.⁵ Moreover,

\[ E_\theta\big[ (|\boldsymbol q_0| + 1)\big( \sqrt{\boldsymbol\phi} + \boldsymbol\psi \big) \big] < \infty. \tag{3.2.8} \]

This assumption will be useful for the search of Nash equilibria.

⁵ $\boldsymbol\psi = (\psi^i)_{i\in I}$, and $\boldsymbol\phi, \boldsymbol q_0$ are defined in a similar fashion.

3.3 Evaluation of the best response

In this section and in the next one, we consider eq. (3.1.1) as the dynamics for the public price $S$. Let $\boldsymbol\alpha \in \mathcal{A}$ be fixed, so that the public price $S = S^{\boldsymbol\alpha}$ is fixed as well. We simply call $\mu(t) = E_{\theta\times P}[\boldsymbol\alpha_t]$ the average trading speed at time $t \in [0,T]$.


In this section we would like to evaluate the best response $\hat\alpha^i$ of player $i$ to $\boldsymbol\alpha$. This amounts to solving the stochastic control problem described by the state equations

\[ \begin{cases} dQ^i_t = \alpha^i_t\,dt \\ dY^i_t = -\alpha^i_t(S_t + \kappa\alpha^i_t)\,dt \end{cases} \qquad t \in [0,T], \tag{3.3.1} \]

with initial condition $(Q^i_0, Y^i_0) = (q^i_0, 0)$. The expected gain to maximize is $J^i((\alpha^{-i},\alpha^i))$ and the class of admissible controls is $\{\alpha^i : (\alpha^{-i},\alpha^i) \in \mathcal{A}\}$, which satisfies the axiom given in Section 1.3.

This is not a Markovian problem as formulated above, due to the presence of the stochastic process $S$; however, it can easily be turned into a Markovian problem by adding eq. (3.1.1) to the state equations (3.3.1). This allows us to apply Theorem 1.3.1, as soon as we obtain all the hypotheses that it requires.

The first step is finding a function $w^i \in C^0([0,T]\times\mathbb{R}^3) \cap C^{1,2}([0,T)\times\mathbb{R}^3)$ satisfying the HJB equation

\[ \frac{\partial w^i}{\partial t} + \sup_{a\in\mathbb{R}}\left\{ \beta\mu(t)\frac{\partial w^i}{\partial s} + a\frac{\partial w^i}{\partial q} - a[s+\kappa a]\frac{\partial w^i}{\partial y} + \frac{1}{2}\sigma^2\frac{\partial^2 w^i}{\partial s^2} - \phi^i q^2 \right\} = 0 \tag{3.3.2} \]

over $[0,T)\times\mathbb{R}^3$, with terminal condition

\[ w^i(T,s,q,y) = y + q[s - \psi^i q], \qquad (s,q,y) \in \mathbb{R}^3. \tag{3.3.3} \]

Then, we should determine $\nu^i : [0,T]\times\mathbb{R}^3 \to \mathbb{R}$ measurable such that

\[ \nu^i(t,s,q,y) \in \arg\max_{a\in\mathbb{R}}\left\{ a\frac{\partial w^i}{\partial q}(t,s,q,y) - a(s+\kappa a)\frac{\partial w^i}{\partial y}(t,s,q,y) \right\} \tag{3.3.4} \]

over $[0,T)\times\mathbb{R}^3$. Finally, we should ensure that $w^i, \nu^i$ meet all the hypotheses of Theorem 1.3.1.

We shall follow the approach of Cardaliaguet and Lehalle (in [7]), looking for a function $w^i$ of the form

\[ w^i(t,s,q,y) = y + sq + 2\kappa\left( h^i_0(t) + q h^i_1(t) + \frac{q^2}{2}h^i_2(t) \right), \qquad (t,s,q,y) \in [0,T]\times\mathbb{R}^3, \tag{3.3.5} \]

where $h^i_0, h^i_1, h^i_2 \in C^1([0,T])$ are suitable functions to be determined later.

By plugging the above expression into eq. (3.3.4), we obtain

\[ \nu^i(t,s,q,y) \in \arg\max_{a\in\mathbb{R}}\left\{ 2\kappa a[h^i_1(t) + q h^i_2(t)] - \kappa a^2 \right\}, \tag{3.3.6} \]

which leads to

\[ \nu^i(t,q) = h^i_1(t) + q h^i_2(t), \qquad (t,q) \in [0,T]\times\mathbb{R}. \tag{3.3.7} \]

Moreover, plugging eq. (3.3.5) into eq. (3.3.2) and evaluating at $a = \nu^i(t,q)$ (given by eq. (3.3.7)), we get

\[ \frac{dh^i_0}{dt}(t) + q\frac{dh^i_1}{dt}(t) + \frac{q^2}{2}\frac{dh^i_2}{dt}(t) + \frac{\beta\mu(t)}{2\kappa}q + \frac{1}{2}[h^i_1(t) + q h^i_2(t)]^2 - \frac{\phi^i}{2\kappa}q^2 = 0, \qquad (t,q) \in [0,T]\times\mathbb{R}, \tag{3.3.8} \]

with terminal condition

\[ h^i_0(T) + q h^i_1(T) + \frac{q^2}{2}h^i_2(T) = -\frac{\psi^i}{2\kappa}q^2. \tag{3.3.9} \]


Clearly the above equations hold if and only if each monomial in $q$ is equal to zero. Thus, we obtain the following system of ordinary differential equations:

\[ \begin{cases} \dfrac{dh^i_2}{dt}(t) + h^i_2(t)^2 - \dfrac{\phi^i}{\kappa} = 0 \\[1ex] \dfrac{dh^i_1}{dt}(t) + \dfrac{\beta\mu(t)}{2\kappa} + h^i_1(t)h^i_2(t) = 0 \\[1ex] \dfrac{dh^i_0}{dt}(t) + \dfrac{1}{2}h^i_1(t)^2 = 0 \end{cases} \qquad t \in [0,T], \tag{3.3.10} \]

with terminal condition

\[ \begin{cases} h^i_2(T) = -\dfrac{\psi^i}{\kappa} \\[1ex] h^i_1(T) = 0 \\[1ex] h^i_0(T) = 0. \end{cases} \tag{3.3.11} \]

Lemma 3.3.1. There exists a unique $h^i_2 \in C^1([0,T])$ with $h^i_2(T) = -\frac{\psi^i}{\kappa}$ satisfying the first equation in (3.3.10). It is given by

\[ h^i_2(t) = \sqrt{\frac{\phi^i}{\kappa}}\;\frac{\sqrt{\phi^i\kappa} - \psi^i - (\sqrt{\phi^i\kappa} + \psi^i)\exp\!\big[2\sqrt{\phi^i/\kappa}\,(T-t)\big]}{\sqrt{\phi^i\kappa} - \psi^i + (\sqrt{\phi^i\kappa} + \psi^i)\exp\!\big[2\sqrt{\phi^i/\kappa}\,(T-t)\big]}, \tag{3.3.12} \]

for all $t \in [0,T]$. Moreover:

• If $\psi^i < \sqrt{\phi^i\kappa}$, then $h^i_2$ is increasing and contained inside the interval $\left( -\sqrt{\frac{\phi^i}{\kappa}}, -\frac{\psi^i}{\kappa} \right]$;

• If $\psi^i = \sqrt{\phi^i\kappa}$, then $h^i_2$ is identically equal to $-\sqrt{\frac{\phi^i}{\kappa}}$;

• If $\psi^i > \sqrt{\phi^i\kappa}$, then $h^i_2$ is decreasing and contained inside the interval $\left[ -\frac{\psi^i}{\kappa}, -\sqrt{\frac{\phi^i}{\kappa}} \right)$.

However, the existence, uniqueness and monotony of the solution, as well as the bounds given above, can be easily inferred from well known results (see for example [9]), without explicitly solve the equation itself. Indeed, it is enough to notice that every solution y(x) of

y0(x) + y2(x) −φ i

κ = 0 has positive derivative for |y| <

q

φi

κ, null derivative for y = ±

q

φi

κ and negative derivative for

|y| > q

φi

κ.

The last two equations in (3.3.10) are much simpler and lead to the solutions hi1(t) = β 2κ Z T t drexp  Z r t hi2(r0)dr0  µ(r) t ∈[0, T ], (3.3.13) hi0(t) = 1 2 Z T t hi1(r)2dr t ∈[0, T ].
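Since $h^i_1$ (and hence the best response below) is given by an explicit but non-elementary integral, a short numerical sketch may help. The following snippet is ours, with arbitrary illustrative parameter values and a hypothetical average trading speed $\mu$; it evaluates eq. (3.3.12) and eq. (3.3.13) on a uniform time grid.

```python
import numpy as np

# Numerical sketch (ours, illustrative parameters): evaluate h2 via the
# closed form (3.3.12) and h1 via the integral representation (3.3.13).
T, kappa, beta, phi, psi = 1.0, 0.1, 0.02, 0.5, 0.3
n = 1000
t, dt = np.linspace(0.0, T, n + 1), T / n
mu = np.cos(np.pi * t / T)        # hypothetical average trading speed

rho = np.sqrt(phi / kappa)
C, D = np.sqrt(phi * kappa) - psi, np.sqrt(phi * kappa) + psi
expo = np.exp(2.0 * rho * (T - t))
h2 = rho * (C - D * expo) / (C + D * expo)            # eq. (3.3.12)

# H2[k] approximates the antiderivative int_0^{t_k} h2(r) dr, so that
# exp(int_t^r h2) = exp(H2[r] - H2[t]).
H2 = np.concatenate(([0.0], np.cumsum((h2[1:] + h2[:-1]) / 2.0) * dt))
h1 = beta / (2.0 * kappa) * np.array([
    np.trapz(np.exp(H2[k:] - H2[k]) * mu[k:], t[k:])  # eq. (3.3.13)
    for k in range(n + 1)
])
print(h2[0], h1[0])   # values at t = 0
```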

Now we have all we need in order to apply Theorem 1.3.1.


Proposition 3.3.2 (By Cardaliaguet and Lehalle). Let us define $\hat\alpha^i = h^i_1 + h^i_2 Q^i$, where $h^i_1, h^i_2$ are given by eq. (3.3.13), (3.3.12) and

\[ Q^i(t) = q^i_0\exp\left[ \int_0^t h^i_2(r)\,dr \right] + \int_0^t dr\,\exp\left[ \int_r^t h^i_2(r')\,dr' \right] h^i_1(r), \qquad t \in [0,T]. \tag{3.3.14} \]

Then $\hat\alpha^i$ is the unique best response of player $i$ to $\boldsymbol\alpha$ in the class $\{\alpha^i : (\alpha^{-i},\alpha^i) \in \mathcal{A}\}$, and $Q^i = Q^{i,(\alpha^{-i},\hat\alpha^i)}$.

Proof. So far, we have ensured that $w^i$ and $\nu^i$ (as defined in eq. (3.3.5) and eq. (3.3.7)) satisfy the hypotheses (i) and (iii) of Theorem 1.3.1. Hypothesis (ii) holds as well, since for any strategy $\alpha^i$ such that $(\alpha^{-i},\alpha^i) \in \mathcal{A}$ we have

\[ E\left[ \sup_{t\in[0,T]} |w^i(t, S_t, Q^{i,(\alpha^{-i},\alpha^i)}_t, Y^{i,(\alpha^{-i},\alpha^i)}_t)| \right] \le E\left[ \sup_{t\in[0,T]} |Y^{i,(\alpha^{-i},\alpha^i)}_t| \right] + \sqrt{ E\left[ \sup_{t\in[0,T]} (S_t)^2 \right] E\left[ \sup_{t\in[0,T]} (Q^{i,(\alpha^{-i},\alpha^i)}_t)^2 \right] } + 2\kappa\|h^i_0\|_\infty + 2\kappa\|h^i_1\|_\infty E\left[ \sup_{t\in[0,T]} |Q^{i,(\alpha^{-i},\alpha^i)}_t| \right] + \kappa\|h^i_2\|_\infty E\left[ \sup_{t\in[0,T]} (Q^{i,(\alpha^{-i},\alpha^i)}_t)^2 \right] < \infty, \]

by eq. (3.2.3), (3.2.6), (3.2.7).

Furthermore, the system of equations

\[ \begin{cases} dQ^i_t = [h^i_1(t) + h^i_2(t)Q^i_t]\,dt \\ dY^i_t = -[h^i_1(t) + h^i_2(t)Q^i_t][S_t + \kappa h^i_1(t) + \kappa h^i_2(t)Q^i_t]\,dt \end{cases} \qquad t \in [0,T], \tag{3.3.15} \]

with initial condition $(Q^i_0, Y^i_0) = (q^i_0, 0)$, clearly admits a unique strong solution, given by $Q^i$ as defined in eq. (3.3.14) and

\[ Y^i_t = -\int_0^t dr\,[h^i_1(r) + h^i_2(r)Q^i(r)][S_r + \kappa h^i_1(r) + \kappa h^i_2(r)Q^i(r)], \qquad t \in [0,T]; \]

thus hypothesis (iv) is verified.

Hypothesis (v) (i.e. the fact that $(\alpha^{-i},\hat\alpha^i) \in \mathcal{A}$) follows by the definition of $\mathcal{A}$. Indeed, the first two requirements in Definition 3.2.1 are satisfied because $\boldsymbol\alpha \in \mathcal{A}$, and the third one is verified since $\hat\alpha^i$ is deterministic and continuous.

Finally, the uniqueness part of the proposition follows by the second part of Theorem 1.3.1 by noting that $\nu^i(t,q)$ is the only maximum point of

\[ \mathbb{R} \ni a \longmapsto 2\kappa a[h^i_1(t) + q h^i_2(t)] - \kappa a^2 \]

for all $(t,q) \in [0,T]\times\mathbb{R}$, since this function is strictly concave.

The following result states a simple and nice relation between the derivative of the trading speed $\hat\alpha^i$ (i.e. the second derivative of $Q^i$) and the current portfolio $Q^i$, when player $i$ performs the best response.


Proposition 3.3.3. $Q^i$ satisfies the equation

\[ \frac{d^2Q^i}{dt^2}(t) + \frac{\beta}{2\kappa}\mu(t) - \frac{\phi^i}{\kappa}Q^i(t) = 0, \qquad t \in [0,T], \tag{3.3.16} \]

with boundary conditions

\[ \begin{cases} Q^i(0) = q^i_0 \\[1ex] \dfrac{dQ^i}{dt}(T) = -\dfrac{\psi^i}{\kappa}Q^i(T). \end{cases} \tag{3.3.17} \]

Proof. By taking the time derivative on both sides of the first equation in (3.3.15) we get

\[ \frac{d^2Q^i}{dt^2}(t) = \frac{dh^i_1}{dt}(t) + \frac{dh^i_2}{dt}(t)Q^i(t) + h^i_2(t)\frac{dQ^i}{dt}(t), \qquad t \in [0,T]; \tag{3.3.18} \]

we obtain eq. (3.3.16) by plugging the first equation in (3.3.15) and the first two equations of (3.3.10) into eq. (3.3.18).

Finally, we can derive the second equation in (3.3.17) by evaluating the first equation in (3.3.15) at $t = T$ and using eq. (3.3.11).
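For instance (a special case we work out here for illustration), when the crowd plays no role, i.e. $\mu \equiv 0$, eq. (3.3.16)-(3.3.17) reduce to the classical single-agent optimal liquidation problem and can be solved in closed form: setting $\rho^i = \sqrt{\phi^i/\kappa}$,

\[ Q^i(t) = q^i_0\,\frac{\cosh\big(\rho^i(T-t)\big) + \dfrac{\psi^i}{\sqrt{\phi^i\kappa}}\sinh\big(\rho^i(T-t)\big)}{\cosh\big(\rho^i T\big) + \dfrac{\psi^i}{\sqrt{\phi^i\kappa}}\sinh\big(\rho^i T\big)}, \qquad t \in [0,T], \]

as one checks directly against $Q'' = (\phi^i/\kappa)Q$ and the boundary conditions (3.3.17). The portfolio decays monotonically towards $0$, the faster the larger $\phi^i$ and $\psi^i$ are relative to the trading cost $\kappa$.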

3.4 Existence and uniqueness of Nash equilibrium

Given an arbitrary function $\mu \in C^0([0,T])$, consider the strategy profile $\boldsymbol\alpha = \boldsymbol h_1 + \boldsymbol h_2\boldsymbol Q$,⁶ where for all $i \in I$ the functions $h^i_1, h^i_2, Q^i$ are given by eq. (3.3.13), (3.3.12), (3.3.14). We want to show that $\boldsymbol\alpha \in \mathcal{A}$.

Note that $\alpha^i$ is deterministic and continuous for all $i \in I$, so it is enough to show that $\boldsymbol\alpha : [0,T]\times I \longrightarrow \mathbb{R}$ is progressively measurable and that eq. (3.2.4) holds true. By eq. (3.3.13) and (3.3.14), we get

\[ \boldsymbol\alpha(t) = \boldsymbol q_0\boldsymbol h_2(t)\exp\left[ \int_0^t \boldsymbol h_2(r)\,dr \right] + \frac{\beta}{2\kappa}\int_t^T dr\,\exp\left[ \int_t^r \boldsymbol h_2(r')\,dr' \right]\mu(r) + \frac{\beta}{2\kappa}\boldsymbol h_2(t)\int_0^t dr\,\exp\left[ \int_r^t \boldsymbol h_2(r')\,dr' \right]\int_r^T dr'\,\exp\left[ \int_r^{r'} \boldsymbol h_2(r'')\,dr'' \right]\mu(r'), \qquad t \in [0,T]. \tag{3.4.1} \]

$\boldsymbol\psi$ and $\boldsymbol\phi$ being $\mathcal{I}$-measurable by our assumption, $\boldsymbol h_2 : [0,T]\times I \longrightarrow \mathbb{R}$ is progressively measurable. $\boldsymbol q_0$ being $\mathcal{I}$-measurable, we conclude from eq. (3.4.1) that $\boldsymbol\alpha$ is progressively measurable as well.

Now note that, $h^i_2$ being negative (by Lemma 3.3.1), all the exponential functions that appear in eq. (3.4.1) are less than 1 in absolute value. So we can infer from eq. (3.4.1) that

\[ \sup_{t\in[0,T]} |\boldsymbol\alpha_t| \le |\boldsymbol q_0|\sup_{t\in[0,T]} |\boldsymbol h_2(t)| + \frac{\beta T}{2\kappa}\|\mu\|_\infty\left( 1 + T\sup_{t\in[0,T]} |\boldsymbol h_2(t)| \right). \]

Lemma 3.3.1 ensures that

\[ \sup_{t\in[0,T]} |\boldsymbol h_2(t)| \le \sqrt{\frac{\boldsymbol\phi}{\kappa}} \vee \frac{\boldsymbol\psi}{\kappa}, \tag{3.4.2} \]

thus

\[ E_\theta\left[ \sup_{t\in[0,T]} |\boldsymbol\alpha_t| \right] \le E_\theta\left[ |\boldsymbol q_0|\left( \sqrt{\frac{\boldsymbol\phi}{\kappa}} \vee \frac{\boldsymbol\psi}{\kappa} \right) \right] + \frac{\beta T}{2\kappa}\|\mu\|_\infty\left( 1 + T\,E_\theta\left[ \sqrt{\frac{\boldsymbol\phi}{\kappa}} \vee \frac{\boldsymbol\psi}{\kappa} \right] \right) < \infty, \]

by eq. (3.2.8).

⁶ Generally speaking, given functions $h^i : [0,T] \longrightarrow \mathbb{R}$ for all $i \in I$, we use the bold notation $\boldsymbol h$ to denote the process defined on $[0,T]\times I$ such that $\boldsymbol h(t,i) = h^i(t)$ for all $(t,i) \in [0,T]\times I$.

Therefore, we have constructed a well defined map $\Gamma_1 : C^0([0,T]) \longrightarrow \mathcal{A}$, which maps an arbitrary $\mu \in C^0([0,T])$ into the strategy profile $\boldsymbol\alpha$ given above. Besides, we can define a map $\Gamma_2 : \mathcal{A} \longrightarrow C^0([0,T])$ as well, by simply mapping an arbitrary strategy profile $\boldsymbol\alpha$ into the $\mu \in C^0([0,T])$ defined by $\mu(t) = E_{\theta\times P}[\boldsymbol\alpha_t]$.

As we showed in the previous section, the composition $\Gamma_1\circ\Gamma_2$ gives us the (unique) best response to an admissible strategy profile.

Note that $\hat{\boldsymbol\alpha} \in \mathcal{A}$ is a Nash equilibrium if and only if $\hat{\boldsymbol\alpha} = \Gamma_1\circ\Gamma_2(\hat{\boldsymbol\alpha})$, and this is true if and only if $\hat{\boldsymbol\alpha} = \Gamma_1(\hat\mu)$ for some $\hat\mu \in C^0([0,T])$ such that $\hat\mu = \Gamma_2\circ\Gamma_1(\hat\mu)$. Thus, if $\Gamma_2\circ\Gamma_1$ admits a unique fixed point $\hat\mu$, then there exists a unique Nash equilibrium in $\mathcal{A}$, given by $\Gamma_1(\hat\mu)$.

Proposition 3.4.1 (By Cardaliaguet and Lehalle⁷). If
$$\frac{\beta T}{2\kappa}\left(1 + T\,E_\theta\left[\sqrt{\frac{\boldsymbol\phi}{\kappa}}\vee\frac{\boldsymbol\psi}{\kappa}\right]\right) < 1, \tag{3.4.3}$$
then there exists a unique Nash equilibrium $\hat{\boldsymbol\alpha}$, independent of $\sigma$.

Proof. As we have just said, it is enough to prove that $\Gamma_2\circ\Gamma_1$ admits a unique fixed point. This can be shown by applying the Banach Fixed-Point Theorem in $C^0([0,T])$, equipped with the metric induced by the infinity norm.

For every pair $\mu_1,\mu_2\in C^0([0,T])$, we have
$$\begin{aligned}\Gamma_2\circ\Gamma_1(\mu_1)(t) - \Gamma_2\circ\Gamma_1(\mu_2)(t) ={}& E_\theta\bigg[\frac{\beta}{2\kappa}\int_t^T dr\,\exp\left(\int_t^r \boldsymbol{h}_2(r')\,dr'\right)[\mu_1(r)-\mu_2(r)]\,+\\ &+\frac{\beta}{2\kappa}\boldsymbol{h}_2(t)\int_0^t dr\,\exp\left(\int_r^t \boldsymbol{h}_2(r')\,dr'\right)\int_r^T dr'\,\exp\left(\int_r^{r'}\boldsymbol{h}_2(r'')\,dr''\right)[\mu_1(r')-\mu_2(r')]\bigg].\end{aligned}$$
Thus, recalling that $\boldsymbol{h}_2$ is negative and using eq. (3.4.2), we get
$$\|\Gamma_2\circ\Gamma_1(\mu_1) - \Gamma_2\circ\Gamma_1(\mu_2)\|_\infty \le \frac{\beta T}{2\kappa}\|\mu_1-\mu_2\|_\infty\left(1 + T\,E_\theta\left[\sup_{t\in[0,T]}|\boldsymbol{h}_2(t)|\right]\right) \le \frac{\beta T}{2\kappa}\left(1 + T\,E_\theta\left[\sqrt{\frac{\boldsymbol\phi}{\kappa}}\vee\frac{\boldsymbol\psi}{\kappa}\right]\right)\|\mu_1-\mu_2\|_\infty.$$
Therefore, eq. (3.4.3) ensures that $\Gamma_2\circ\Gamma_1$ is a contraction.

Moreover, recall that the Nash equilibrium $\hat{\boldsymbol\alpha}$ is given by $\Gamma_1(\hat\mu)$, where $\hat\mu$ is the fixed point of $\Gamma_2\circ\Gamma_1$. Thus, since $\Gamma_2\circ\Gamma_1$ does not depend on $\sigma$, neither does $\hat{\boldsymbol\alpha}$.

3.5 Extension to a more general dynamics

In this section we consider a more general dynamics for the public price $S$, given by eq. (3.2.1). The following lemma describes the dynamics of the expectation $E[S_t]$ of the public price.

⁷ Cardaliaguet and Lehalle state and prove this result in [7] without explicitly formulating the inequality (3.4.3).


Lemma 3.5.1. $E[S_t]$ follows the dynamics
$$\frac{d}{dt}E[S_t] = b_0(t,\mu(t)) + b_1(t)E[S_t] \qquad t\in[0,T]; \tag{3.5.1}$$
thus
$$E[S_t] = s_0\exp\left(\int_0^t dr\,b_1(r)\right) + \int_0^t dr\,b_0(r,\mu(r))\exp\left(\int_r^t dr'\,b_1(r')\right) \qquad t\in[0,T]. \tag{3.5.2}$$

Proof. By eq. (3.2.1),
$$S_t = s_0 + \int_0^t [b_0(r,\mu(r)) + S_r b_1(r)]\,dr + \int_0^t \sigma(r,\mu(r))(S_r)^\gamma\,dW_r \qquad t\in[0,T]. \tag{3.5.3}$$
From eq. (3.2.3) and standard results of Itô calculus, we can infer that the process given by the second integral on the right-hand side of eq. (3.5.3) is a martingale. So, taking expectations on both sides, we get
$$E[S_t] = s_0 + \int_0^t [b_0(r,\mu(r)) + E[S_r]b_1(r)]\,dr \qquad t\in[0,T].$$
Differentiating in $t$ yields eq. (3.5.1), and eq. (3.5.2) is the unique solution of this linear ODE with initial condition $E[S_0] = s_0$.
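The closed form (3.5.2) is straightforward to evaluate on a grid with an integrating factor. A small sketch, where $b_0$, $b_1$ and $\mu$ are placeholder choices:

```python
# Sketch: evaluating eq. (3.5.2) for E[S_t] by cumulative quadrature,
# written as exp(B1(t)) * ( s0 + int_0^t b0(r, mu(r)) exp(-B1(r)) dr )
# with B1(t) = int_0^t b1(r) dr. All coefficients below are hypothetical.
import numpy as np

T, s0, n = 1.0, 100.0, 1000
t = np.linspace(0.0, T, n + 1)
dt = T / n
b1 = lambda r: -0.1 * np.ones_like(r)     # assumed mean-reversion rate
b0 = lambda r, m: 0.5 * m                 # assumed impact-driven drift
mu = lambda r: -0.3 * np.ones_like(r)     # assumed average trading speed

B1 = np.concatenate([[0.0], np.cumsum(b1(t)[:-1] * dt)])
integrand = b0(t, mu(t)) * np.exp(-B1)
ES = np.exp(B1) * (s0 + np.concatenate([[0.0], np.cumsum(integrand[:-1] * dt)]))
```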

Our final goal is to give a sufficient condition for the existence and uniqueness of the Nash equilibrium in this case as well. In order to achieve this, we shall follow the same procedure as in Sections 3.3 and 3.4.

Let $i\in I$ be fixed. As we did in Section 3.3, we would like to find suitable functions $w^i\in C^0([0,T]\times\mathbb{R}^3)\cap C^{1,2}([0,T)\times\mathbb{R}^3)$ and $\nu^i\colon[0,T]\times\mathbb{R}^3\to\mathbb{R}$ in order to apply Theorem 1.3.1. The HJB equation for $w^i$ now has the following form:
$$\frac{\partial w^i}{\partial t} + \sup_{a\in\mathbb{R}}\left\{[b_0(t,\mu(t)) + s\,b_1(t)]\frac{\partial w^i}{\partial s} + a\frac{\partial w^i}{\partial q} - a[s+\kappa a]\frac{\partial w^i}{\partial y} + \frac{1}{2}s^{2\gamma}\sigma(t,\mu(t))^2\frac{\partial^2 w^i}{\partial s^2} - \phi^i q^2\right\} = 0 \qquad (t,s,q,y)\in[0,T)\times\mathbb{R}^3, \tag{3.5.4}$$

with the same terminal condition as in eq. (3.3.3). Eq. (3.3.4) remains unmodified, too. We cannot choose $w^i$ as in eq. (3.3.5), because with that ansatz the above equation would admit no solution; something more sophisticated is needed. This time, we look for $w^i$ of the following form:
$$w^i(t,s,q,y) = y + sq + 2\kappa\left[h_{00}^i(t) + s\,h_{10}^i(t) + q\,h_{01}^i(t) + \frac{s^2}{2}h_{20}^i(t) + sq\,h_{11}^i(t) + \frac{q^2}{2}h_{02}^i(t)\right], \tag{3.5.5}$$
for suitable functions $h_{00}^i, h_{10}^i, h_{01}^i, h_{20}^i, h_{11}^i, h_{02}^i\in C^0([0,T])$ to be determined later.

With this substitution, we get from eq. (3.3.4)
$$\nu^i(t,s,q) = h_{01}^i(t) + s\,h_{11}^i(t) + q\,h_{02}^i(t). \tag{3.5.6}$$

By plugging eq. (3.5.5) into eq. (3.5.4) and evaluating at $a = \nu^i(t,s,q)$, we obtain
$$\begin{aligned}&\frac{dh_{00}^i}{dt}(t) + s\frac{dh_{10}^i}{dt}(t) + q\frac{dh_{01}^i}{dt}(t) + \frac{s^2}{2}\frac{dh_{20}^i}{dt}(t) + sq\frac{dh_{11}^i}{dt}(t) + \frac{q^2}{2}\frac{dh_{02}^i}{dt}(t)\,+\\ &+[b_0(t,\mu(t)) + s\,b_1(t)]\left[h_{10}^i(t) + s\,h_{20}^i(t) + \frac{q}{2\kappa} + q\,h_{11}^i(t)\right]+\\ &+\frac{1}{2}\left[h_{01}^i(t) + s\,h_{11}^i(t) + q\,h_{02}^i(t)\right]^2 + \frac{1}{2}s^{2\gamma}\sigma(t,\mu(t))^2 h_{20}^i(t) - \frac{\phi^i}{2\kappa}q^2 = 0,\end{aligned}$$
with terminal condition
$$h_{00}^i(T) + s\,h_{10}^i(T) + q\,h_{01}^i(T) + \frac{s^2}{2}h_{20}^i(T) + sq\,h_{11}^i(T) + \frac{q^2}{2}h_{02}^i(T) = -\frac{\psi^i}{2\kappa}q^2.$$

By equating to zero the coefficients of the monomials in $s$, $q$, we obtain
$$\begin{cases}
\dfrac{dh_{02}^i}{dt}(t) + h_{02}^i(t)^2 - \dfrac{\phi^i}{\kappa} = 0\\[4pt]
\dfrac{dh_{11}^i}{dt}(t) + \left[b_1(t) + h_{02}^i(t)\right]h_{11}^i(t) + \dfrac{1}{2\kappa}b_1(t) = 0\\[4pt]
\dfrac{dh_{01}^i}{dt}(t) + h_{02}^i(t)\,h_{01}^i(t) + b_0(t,\mu(t))\left[\dfrac{1}{2\kappa} + h_{11}^i(t)\right] = 0\\[4pt]
\dfrac{dh_{20}^i}{dt}(t) + \left[2b_1(t) + \delta(2,2\gamma)\sigma(t,\mu(t))^2\right]h_{20}^i(t) + h_{11}^i(t)^2 = 0\\[4pt]
\dfrac{dh_{10}^i}{dt}(t) + b_1(t)h_{10}^i(t) + h_{01}^i(t)h_{11}^i(t) + \left[b_0(t,\mu(t)) + \delta(1,2\gamma)\sigma(t,\mu(t))^2\right]h_{20}^i(t) = 0\\[4pt]
\dfrac{dh_{00}^i}{dt}(t) + b_0(t,\mu(t))h_{10}^i(t) + \dfrac{1}{2}h_{01}^i(t)^2 + \delta(0,2\gamma)\sigma(t,\mu(t))^2 h_{20}^i(t) = 0,
\end{cases} \qquad t\in[0,T] \tag{3.5.7}$$
where $\delta$ denotes the Kronecker delta, with terminal condition
$$h_{00}^i(T) = h_{10}^i(T) = h_{01}^i(T) = h_{20}^i(T) = h_{11}^i(T) = 0, \qquad h_{02}^i(T) = -\frac{\psi^i}{\kappa}. \tag{3.5.8}$$

All the above ordinary differential equations admit (unique) solutions, as one can infer from standard results, for which we refer to [9]. In particular, $h_{02}^i = h_2^i$, given by eq. (3.3.12). For our purposes, we are particularly interested in
$$h_{11}^i(t) = \frac{1}{2\kappa}\int_t^T dr\,b_1(r)\exp\left(\int_t^r dr'\,h_2^i(r') + \int_t^r dr'\,b_1(r')\right) \qquad t\in[0,T], \tag{3.5.9}$$
$$h_{01}^i(t) = \int_t^T dr\,b_0(r,\mu(r))\left[\frac{1}{2\kappa} + h_{11}^i(r)\right]\exp\left(\int_t^r dr'\,h_2^i(r')\right) \qquad t\in[0,T]. \tag{3.5.10}$$
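Since the subsystem for $(h_{02}^i, h_{11}^i, h_{01}^i)$ is triangular (each equation involves only previously determined coefficients), an alternative to the quadrature formulas (3.5.9)-(3.5.10) is to integrate those three equations of (3.5.7) backwards from the terminal conditions (3.5.8). A minimal sketch, with placeholder choices of $b_0$, $b_1$, $\mu$ and all parameters:

```python
# Sketch: backward integration of the (h02, h11, h01) subsystem of (3.5.7)
# from the terminal conditions (3.5.8). All coefficients are hypothetical.
import numpy as np
from scipy.integrate import solve_ivp

kappa, phi, psi, T = 0.1, 0.2, 0.5, 1.0
b1 = lambda r: -0.1
b0 = lambda r, m: 0.5 * m
mu = lambda r: -0.3

def rhs(s, h):                 # h = (h02, h11, h01), equations from (3.5.7)
    h02, h11, h01 = h
    return [phi / kappa - h02 ** 2,
            -(b1(s) + h02) * h11 - b1(s) / (2 * kappa),
            -h02 * h01 - b0(s, mu(s)) * (1 / (2 * kappa) + h11)]

sol = solve_ivp(rhs, [T, 0.0], [-psi / kappa, 0.0, 0.0], dense_output=True)
h02, h11, h01 = sol.sol(0.5)   # coefficient values at the interior time t = 0.5
```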

Now we are able to evaluate the best response of player $i$ to an admissible strategy profile.

Proposition 3.5.2. Given $\boldsymbol\alpha\in\mathcal{A}$, let us call $\mu(t) = E_{\theta\times P}[\boldsymbol\alpha_t]$ for all $t\in[0,T]$ and let us consider the strategy $\hat\alpha^i = h_{01}^i + h_{11}^i S + h_2^i Q^i$, where $h_{01}^i$, $h_{11}^i$, $h_2^i$ are given by eq. (3.5.10), (3.5.9), (3.3.12) and
$$Q_t^i = q_0^i\exp\left(\int_0^t h_2^i(r)\,dr\right) + \int_0^t dr\left[h_{01}^i(r) + h_{11}^i(r)S_r\right]\exp\left(\int_r^t h_2^i(r')\,dr'\right). \tag{3.5.11}$$
Then $\hat\alpha^i$ is the unique best response of player $i$ to $\boldsymbol\alpha$ in the class $\{\alpha^i : (\alpha^{-i},\alpha^i)\in\mathcal{A}\}$.

Moreover, $Q^i = Q^{i,(\alpha^{-i},\hat\alpha^i)}$ and
$$E\left[\sup_{t\in[0,T]}|\hat\alpha^i_t|\right] < \infty. \tag{3.5.12}$$
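To illustrate Proposition 3.5.2 numerically, one can simulate a path of the price dynamics (3.5.3) by Euler-Maruyama and evaluate the inventory (3.5.11) along it. In the sketch below the coefficient functions $h_{01}^i$, $h_{11}^i$, $h_2^i$ are frozen to placeholder constants (with $h_2^i < 0$, consistently with Lemma 3.3.1); in practice they would be obtained from the system (3.5.7)-(3.5.8).

```python
# Sketch: Euler-Maruyama simulation of (3.5.3) and evaluation of the
# best-response inventory (3.5.11) along the simulated path. All
# coefficients are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(1)
T, n, s0, q0_i, gamma = 1.0, 1000, 100.0, 1.0, 0.5
t = np.linspace(0.0, T, n + 1)
dt = T / n
b0v = 0.5 * (-0.3) * np.ones(n + 1)   # b0(t, mu(t)) for a constant mu
b1v = -0.1 * np.ones(n + 1)           # b1(t)
sigv = 0.2 * np.ones(n + 1)           # sigma(t, mu(t))

S = np.empty(n + 1)
S[0] = s0
for k in range(n):                    # Euler-Maruyama step for (3.5.3)
    dW = rng.normal(0.0, np.sqrt(dt))
    S[k + 1] = S[k] + (b0v[k] + S[k] * b1v[k]) * dt + sigv[k] * S[k] ** gamma * dW

# Frozen placeholder coefficients; see (3.5.7)-(3.5.8) for the real ones.
h2v = -0.5 * np.ones(n + 1)
h11v = 0.05 * np.ones(n + 1)
h01v = 0.1 * np.ones(n + 1)
H2 = np.concatenate([[0.0], np.cumsum(h2v[:-1] * dt)])   # int_0^t h2(r) dr
integrand = (h01v + h11v * S) * np.exp(-H2)
Q = np.exp(H2) * (q0_i + np.concatenate([[0.0], np.cumsum(integrand[:-1] * dt)]))
alpha_hat = h01v + h11v * S + h2v * Q                    # strategy of Prop. 3.5.2
```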
