be the corresponding strategy vector. Let $Q^N$ be the normalized occupation measure associated with $u^N$. More precisely, $Q^N$ is the $\mathcal{P}_2(\mathcal{Z})$-valued random variable determined by setting, for $B \in \mathcal{B}(\mathcal{X})$, $R \in \mathcal{B}(\mathcal{R}_2)$, $D \in \mathcal{B}(\mathcal{W})$,
$$Q^N_\omega(B \times R \times D) \doteq \frac{1}{N}\sum_{i=1}^{N} \delta_{X^N_i(\cdot,\omega)}(B)\cdot \delta_{\rho^{N,i}_\omega}(R)\cdot \delta_{W^N_i(\cdot,\omega)}(D), \qquad \omega \in \Omega_N, \tag{5.1}$$
where $(X^N_1, \ldots, X^N_N)$ is the solution of the system of equations (3.1) under strategy vector $u^N$, and $\rho^{N,i}$ is the relaxed control associated with individual strategy $u^N_i$, $i \in \{1, \ldots, N\}$.
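As a concrete, purely illustrative picture of (5.1): the occupation measure is the empirical measure that puts mass $1/N$ on each triple of one player's trajectory, relaxed control and driving noise. A minimal numerical sketch, where the function name and the discretized toy data are assumptions of this illustration, not objects from the paper:

```python
import numpy as np

def occupation_measure(paths, controls, noises):
    # Empirical-measure view of (5.1): one atom of mass 1/N per triple
    # (X_i, rho^{N,i}, W_i).  Atoms are returned with uniform weights.
    N = len(paths)
    atoms = list(zip(paths, controls, noises))
    weights = np.full(N, 1.0 / N)
    return atoms, weights

# Toy data: N = 3 "players", each with a time-discretized path,
# control path and noise path of 5 steps.
rng = np.random.default_rng(0)
N, steps = 3, 5
paths = [rng.standard_normal(steps) for _ in range(N)]
controls = [rng.standard_normal(steps) for _ in range(N)]
noises = [rng.standard_normal(steps) for _ in range(N)]

atoms, weights = occupation_measure(paths, controls, noises)
print(len(atoms), float(weights.sum()))  # 3 atoms, total mass 1.0
```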
Convergence results will be obtained under the hypothesis that
$$(\mathrm{T})\qquad \exists\,\delta_0 > 0:\quad \sup_{N\in\mathbb{N}} \mathbb{E}_N\!\left[\frac{1}{N}\sum_{i=1}^{N}\left(\big|\xi^N_i\big|^{2+\delta_0} + \int_0^T \big|u^N_i(t)\big|^{2+\delta_0}\,dt\right)\right] < \infty.$$
Whenever (T) holds, we will, as we may, suppose that $\delta_0 \in (0, 1 \wedge T]$.
REMARK 5.1. Condition (T) is automatically satisfied if the action space is compact and the initial states, that is, the random variables $\xi^N_i$, $N\in\mathbb{N}$, $i\in\{1,\ldots,N\}$, are uniformly bounded.
LEMMA 5.1. If condition (T) holds, then the family $(\mathbb{P}_N\circ(Q^N)^{-1})_{N\in\mathbb{N}}$ is pre-compact in $\mathcal{P}(\mathcal{P}_2(\mathcal{Z}))$.
PROOF. We verify that condition (T) implies the pre-compactness of the family $(\mathbb{P}_N \circ (Q^N)^{-1})_{N\in\mathbb{N}}$ by using a suitable tightness function on $\mathcal{P}_2(\mathcal{Z})$. For a function $\psi$ on $[0,T]$ with values in $\mathbb{R}^d$ or $\mathbb{R}^{d_1}$, let $w_\psi(\cdot, T)$ denote the modulus of continuity of $\psi$ on $[0,T]$, that is, the function
$$[0,\infty) \ni h \mapsto w_\psi(h, T) \doteq \sup_{t,s\in[0,T]:\,|t-s|\le h} \big|\psi(t) - \psi(s)\big| \in [0,\infty].$$
If $\psi$ is continuous, then the modulus of continuity of $\psi$ takes values in $[0,\infty)$.
Then $g$ is a tightness function on $\mathcal{P}_2(\mathcal{Z})$; see Appendix C.2. It is therefore enough to check that condition (T) entails $\sup_{N\in\mathbb{N}} \mathbb{E}_N[g(Q^N)] < \infty$. This follows from the definition of $Q^N$, Lemma 3.2 and condition (T),
combined with the monotonicity of $h \mapsto h^{-\alpha}$ and Markov's inequality (as well as Jensen's inequality). To obtain an upper bound for the above sums that does not depend on $N$, we employ estimates on the moments of the modulus of continuity of Itô processes; cf. Fischer and Nappo (2010) and the references therein. Since $W^N_1, \ldots, W^N_N$ are standard $d_1$-dimensional Wiener processes, we have by Lemma 3 of that paper and Hölder's inequality that there exists a finite constant $\bar{C}_{p,d_1}$ depending only on $p$ and $d_1$ such that, for every $i\in\{1,\ldots,N\}$, every $k\in\mathbb{N}$ with $k\ge 1/T$,
Thanks to assumption (A3), Lemma 3.2 and condition (T), we obtain a corresponding uniform bound. On the other hand, by Hölder's inequality and, thanks to assumption (A3), Lemma 3.1 and condition (T), the remaining terms are bounded uniformly in $N$ as well. Recall that $\alpha = \frac{\delta_0}{2(8+\delta_0)}$ and $p = 2 + \delta_0/2$. It follows that the expression above is bounded by a finite constant independent of $N$, where the infinite sum on the right-hand side has a finite limit since $p/2 - \alpha p = (8+2\delta_0)/(8+\delta_0) > 1$.
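Since the exponents enter only through this last inequality, it may help to record the arithmetic explicitly; with $\alpha = \delta_0/(2(8+\delta_0))$ and $p = 2 + \delta_0/2$, a direct computation gives:

```latex
\begin{align*}
\frac{p}{2} - \alpha p
  &= p\Big(\frac{1}{2} - \frac{\delta_0}{2(8+\delta_0)}\Big)
   = \Big(2 + \frac{\delta_0}{2}\Big)\cdot\frac{(8+\delta_0)-\delta_0}{2(8+\delta_0)} \\
  &= \frac{4+\delta_0}{2}\cdot\frac{8}{2(8+\delta_0)}
   = \frac{2(4+\delta_0)}{8+\delta_0}
   = \frac{8+2\delta_0}{8+\delta_0} \;>\; 1
  \qquad\text{since } \delta_0 > 0.
\end{align*}
```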
Below, we will use the symbol $\mathcal{I}$ to indicate the index set of a (convergent) subsequence; thus $\mathcal{I}$ is a subset of $\mathbb{N}$ with the natural ordering and $\#\mathcal{I} = \infty$.
LEMMA 5.2. Suppose that $(\mathbb{P}_n \circ (\xi^n_{i^n_*})^{-1})_{n\in\mathcal{I}}$ converges in $\mathcal{P}_2(\mathbb{R}^d)$ to some $\bar\nu \in \mathcal{P}_2(\mathbb{R}^d)$, where, for each $n\in\mathcal{I}$, $i^n_* \in \{1, \ldots, n\}$. Then there exists a sequence $(\bar\xi_n)_{n\in\mathcal{I}}$ of $\mathbb{R}^d$-valued random variables such that the following hold:

(i) for every $n\in\mathcal{I}$, $\bar\xi_n$ is defined on $(\Omega_n, \mathcal{F}_n)$, measurable with respect to $\sigma(\xi^n_{i^n_*}, \vartheta^n_{i^n_*})$;
(ii) $\mathbb{P}_n \circ (\bar\xi_n)^{-1} = \bar\nu$ for every $n\in\mathcal{I}$;
(iii) $\mathbb{E}_n[|\xi^n_{i^n_*} - \bar\xi_n|^2] \to 0$ as $n\to\infty$.
PROOF. Let $n\in\mathcal{I}$, and set $\nu_n \doteq \mathbb{P}_n \circ (\xi^n_{i^n_*})^{-1}$. By definition of the square Wasserstein metric,
$$d_2(\nu_n, \bar\nu)^2 = \inf_{\alpha\in\mathcal{P}(\mathbb{R}^d\times\mathbb{R}^d):\,[\alpha]_1=\nu_n,\,[\alpha]_2=\bar\nu}\ \int_{\mathbb{R}^d\times\mathbb{R}^d} |x - \tilde{x}|^2\,\alpha(dx, d\tilde{x}).$$
The infimum in the above equation is attained; see, for instance, Theorem 1.3 (Kantorovich's theorem) in Villani (2003), pages 19–20. Thus, there exists $\alpha^*_n \in \mathcal{P}(\mathbb{R}^d\times\mathbb{R}^d)$ such that $[\alpha^*_n]_1 = \nu_n$, $[\alpha^*_n]_2 = \bar\nu$ and
$$d_2(\nu_n, \bar\nu)^2 = \int_{\mathbb{R}^d\times\mathbb{R}^d} |x - \tilde{x}|^2\,\alpha^*_n(dx, d\tilde{x}).$$
Recall that $\vartheta^n_1, \ldots, \vartheta^n_n$ are independent $\mathcal{F}^n_0$-measurable random variables which are uniformly distributed on $[0,1]$ and independent of the $\sigma$-algebra generated by $\xi^n_1, \ldots, \xi^n_n, W^n_1, \ldots, W^n_n$. By Theorem 6.10 in Kallenberg (2001), page 112, on measurable transfers, there exists a measurable function $\varphi_n: \mathbb{R}^d\times[0,1]\to\mathbb{R}^d$ such that
$$\mathbb{P}_n \circ \big(\xi^n_{i^n_*}, \varphi_n\big(\xi^n_{i^n_*}, \vartheta^n_{i^n_*}\big)\big)^{-1} = \alpha^*_n.$$
Set $\bar\xi_n \doteq \varphi_n(\xi^n_{i^n_*}, \vartheta^n_{i^n_*})$. Then $\bar\xi_n$ is $\sigma(\xi^n_{i^n_*}, \vartheta^n_{i^n_*})$-measurable, $\mathbb{P}_n \circ (\bar\xi_n)^{-1} = \bar\nu$, and $\mathbb{E}_n[|\xi^n_{i^n_*} - \bar\xi_n|^2] = d_2(\nu_n, \bar\nu)^2$, which tends to zero as $n\to\infty$.
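In one dimension the optimal coupling appearing in this proof has an explicit form: the monotone (quantile) coupling attains the squared Wasserstein distance. A small numerical sketch for empirical measures with equally many atoms; the function name and the toy data are illustrative assumptions, not from the paper:

```python
import numpy as np

def w2_squared_1d(xs, ys):
    # For two empirical measures on R with the same number of atoms,
    # sorting and pairing realizes the infimum over couplings alpha
    # with the prescribed marginals (monotone coupling, optimal in 1D).
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    return float(np.mean((xs - ys) ** 2))

# Atoms of nu_n and nu_bar; the optimal pairing is 1<->1.5, 2<->2.5, 3<->3.5.
print(w2_squared_1d([3.0, 1.0, 2.0], [1.5, 2.5, 3.5]))  # 0.25
```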
LEMMA 5.3. Grant condition (T). Let $(Q^n)_{n\in\mathcal{I}}$ be a subsequence that converges in distribution to some $\mathcal{P}_2(\mathcal{Z})$-valued random variable $Q$ defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Set
$$\mu_\omega(t) \doteq Q_\omega \circ \hat{X}(t)^{-1}, \qquad t\in[0,T],\ \omega\in\Omega.$$
Then for $\mathbb{P}$-almost every $\omega\in\Omega$, $\mu_\omega \in \mathcal{M}_2$ and $Q_\omega$ is a solution of equation (4.2) with flow of measures $\mu_\omega$. Moreover,
$$\liminf_{\mathcal{I}\ni n\to\infty}\ \frac{1}{n}\sum_{i=1}^{n} J^n_i\big(u^n\big)\ \ge\ \int \hat{J}\big(\mu_\omega(0), Q_\omega, \mu_\omega\big)\,\mathbb{P}(d\omega).$$
PROOF. By Lemma 5.1, $(\mathbb{P}_N \circ (Q^N)^{-1})_{N\in\mathbb{N}}$ is pre-compact in $\mathcal{P}(\mathcal{P}_2(\mathcal{Z}))$. Let $(Q^n)_{n\in\mathcal{I}}$ be a subsequence that converges in distribution to some $\mathcal{P}_2(\mathcal{Z})$-valued random variable $Q$, defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Set $\mu_\omega(t) \doteq Q_\omega \circ \hat{X}(t)^{-1}$, $t\in[0,T]$, $\omega\in\Omega$. Since $Q_\omega \in \mathcal{P}_2(\mathcal{Z})$ for every $\omega\in\Omega$, we have $\mu_\omega\in\mathcal{M}_2$ for every $\omega\in\Omega$; cf. Remark 4.2 above. By construction, $\hat{W}(0) = 0$ $Q^n_\omega$-almost surely for $\mathbb{P}_n$-almost every $\omega\in\Omega_n$. Convergence in distribution implies $\hat{W}(0) = 0$ $Q_\omega$-almost surely for $\mathbb{P}$-almost every $\omega\in\Omega$.
In order to verify that $Q_\omega$ is a solution of equation (4.2) with flow of measures $\mu_\omega$ for $\mathbb{P}$-almost every $\omega\in\Omega$, it suffices to check that condition (iii) of Definition 4.1 holds. The proof of this fact is analogous to the proof of Lemma 5.2 in Budhiraja, Dupuis and Fischer (2012). Since the situation here is somewhat different, we give details in Appendix D below.
The asymptotic lower bound for the average costs is a consequence of a version of Fatou's lemma [cf. Theorem A.3.12 in Dupuis and Ellis (1997), page 307] since, for every $n\in\mathcal{I}$,
$$\frac{1}{n}\sum_{i=1}^{n} J^n_i\big(u^n\big) = \int_{\Omega_n}\int_{\mathcal{Z}}\bigg(\int_{\Gamma\times[0,T]} f\big(t, \varphi(t), Q^n_\omega\circ\hat{X}(t)^{-1}, \gamma\big)\,r(d\gamma, dt) + F\big(\varphi(T), Q^n_\omega\circ\hat{X}(T)^{-1}\big)\bigg)\,Q^n_\omega(d\varphi, dr, dw)\,\mathbb{P}_n(d\omega)$$
and $Q^n_\omega\circ\hat{X}(t)^{-1}\to\mu(t)$ in distribution as $n\to\infty$.
REMARK 5.2. Lemma 5.3 shows that, under condition (T), all limit points of the normalized occupation measures $(Q^N)_{N\in\mathbb{N}}$ are concentrated on those random variables that, with probability one, take values in the set of McKean–Vlasov solutions of equation (4.2). The mean field condition of Definition 4.3 is therefore always satisfied.
In addition to (T), we will need the following weak symmetry condition on the costs:
$$(\mathrm{S})\qquad \exists\ \text{a sequence of indices } \big(i^N_*\big)_{N\in\mathbb{N}} \text{ with } i^N_*\in\{1,\ldots,N\} \text{ such that}$$
$$\sup_{N\in\mathbb{N}}\ J^N_{i^N_*}\big(u^N\big) < \infty \quad\text{and}\quad \limsup_{N\to\infty}\ \frac{1}{N}\sum_{i=1}^{N} J^N_i\big(u^N\big)\ \le\ \limsup_{N\to\infty}\ J^N_{i^N_*}\big(u^N\big).$$
REMARK 5.3. Condition (S) is automatically satisfied if the cost coefficients $f$, $F$ are bounded functions. If $f$, $F$ are unbounded and the costs associated with $u^N$ are symmetric in the sense that, for every $N$, every $i\in\{2,\ldots,N\}$, $J^N_1(u^N) = J^N_i(u^N)$, then thanks to assumption (A5) and Lemma 3.1, condition (S) follows from condition (T).
THEOREM 5.1. Let $(\varepsilon_N)_{N\in\mathbb{N}}\subset[0,\infty)$ be a sequence converging to zero. Suppose that $(\xi^N)_{N\in\mathbb{N}}$ and $(u^N)_{N\in\mathbb{N}}$ are such that (T) and (S) hold and, for each $N\in\mathbb{N}$, $\xi^N = (\xi^N_1,\ldots,\xi^N_N)$ is exchangeable and $u^N$ is a local $\varepsilon_N$-Nash equilibrium for the $N$-player game. Let $(Q^n)_{n\in\mathcal{I}}$ be a subsequence that converges in distribution to some $\mathcal{P}_2(\mathcal{Z})$-valued random variable $Q$ defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$. If there is $m\in\mathcal{M}_2$ such that, for $\mathbb{P}$-almost every $\omega\in\Omega$,
$$Q_\omega\circ\hat{X}(t)^{-1} = m(t), \qquad t\in[0,T],$$
then $(Q_\omega, m)$ is a solution of the mean field game for $\mathbb{P}$-almost every $\omega\in\Omega$.
We postpone the proof of Theorem 5.1 to the end of this section. The crucial hypothesis in Theorem 5.1 is the almost sure nonrandomness of the flow of measures induced by a limit random variable $Q$. Thus, under the rather general conditions (T) and (S), we prove convergence to solutions of a mean field game for subsequences with limit random variable $Q$ such that $\mathbb{P}\circ(Q\circ(\hat{X}(t)^{-1})_{t\in[0,T]})^{-1} = \delta_m$ for some $m\in\mathcal{M}_2$. This condition is reminiscent of the characterization of propagation of chaos in the Tanaka–Sznitman theorem. The nonrandomness of the induced flow of measures is implied by the nonrandomness of the joint law of initial condition, relaxed control and noise process, that is, by the condition $\mathbb{P}\circ(Q\circ(\hat{X}(0), \hat\rho, \hat{W})^{-1})^{-1} = \delta_\nu$ for some $\nu\in\mathcal{P}(\mathbb{R}^d\times\mathcal{R}_2\times\mathcal{W})$. This condition, in turn, is satisfied if the initial states and individual strategies of each $N$-player game are independent and identically distributed, where the marginal distributions are allowed to vary with $N$.
COROLLARY 5.2. Let $(\varepsilon_N)_{N\in\mathbb{N}}\subset[0,\infty)$ be a sequence converging to zero. Suppose that $(\xi^N)_{N\in\mathbb{N}}$ and $(u^N)_{N\in\mathbb{N}}$ are such that (T) holds and, for each $N\in\mathbb{N}$, $u^N$ is a local $\varepsilon_N$-Nash equilibrium for the $N$-player game and the random variables $(\xi^N_1, u^N_1, W^N_1), \ldots, (\xi^N_N, u^N_N, W^N_N)$ are independent and identically distributed. Let $(Q^n)_{n\in\mathcal{I}}$ be a subsequence that converges in distribution to some $\mathcal{P}_2(\mathcal{Z})$-valued random variable $Q$ defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Then $Q_\omega$ is a solution of the mean field game for $\mathbb{P}$-almost every $\omega\in\Omega$.
PROOF. By distributional symmetry of the vectors of initial states and individual strategies, the costs are symmetric and condition (T) entails condition (S); cf. Remark 5.3 above.
Let $\mathcal{T}\subset C_b(\mathbb{R}^d\times\mathcal{R}_2\times\mathcal{W})$ be a countable and measure determining set of functions. Let $(Q^n)_{n\in\mathcal{I}}$ be a convergent subsequence with limit random variable $Q$ on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\Phi\in\mathcal{T}$, and set
$$m_\Phi \doteq \mathbb{E}_{\mathbb{P}}\Big[\mathbb{E}_{Q}\big[\Phi\big(\hat{X}(0), \hat\rho, \hat{W}\big)\big]\Big],\qquad v_\Phi \doteq \mathbb{E}_{\mathbb{P}}\Big[\big(\mathbb{E}_{Q}\big[\Phi\big(\hat{X}(0), \hat\rho, \hat{W}\big)\big] - m_\Phi\big)^2\Big],$$
$$m^n_\Phi \doteq \mathbb{E}_n\Big[\mathbb{E}_{Q^n}\big[\Phi\big(\hat{X}(0), \hat\rho, \hat{W}\big)\big]\Big],\qquad n\in\mathcal{I}.$$
The mapping $\mathcal{P}_2(\mathcal{Z})\ni\Theta\mapsto\mathbb{E}_\Theta[\Phi(\hat{X}(0), \hat\rho, \hat{W})]$ is continuous. By convergence of $(Q^n)$ to $Q$ and the continuous mapping theorem,
$$v_\Phi = \lim_{n\to\infty}\mathbb{E}_n\Big[\big(\mathbb{E}_{Q^n}\big[\Phi\big(\hat{X}(0), \hat\rho, \hat{W}\big)\big] - m^n_\Phi\big)^2\Big] = \lim_{n\to\infty}\mathbb{E}_n\bigg[\bigg(\frac{1}{n}\sum_{i=1}^{n}\Phi\big(\xi^n_i, \rho^{n,i}, W^n_i\big) - m^n_\Phi\bigg)^2\bigg],$$
where $\rho^{n,i}$ is the relaxed control random variable induced by $u^n_i$. As a consequence of the i.i.d. hypothesis, the random variables $\Phi(\xi^n_i, \rho^{n,i}, W^n_i)$, $i\in\{1,\ldots,n\}$, are independent and identically distributed with common mean equal to $m^n_\Phi$. Since $\Phi$ is bounded, it follows that $v_\Phi = 0$. This implies
$$\mathbb{E}_{Q}\big[\Phi\big(\hat{X}(0), \hat\rho, \hat{W}\big)\big] = m_\Phi\quad \mathbb{P}\text{-almost surely.}$$
Since $\mathcal{T}$ is countable, we have with $\mathbb{P}$-probability one $\mathbb{E}_{Q}[\Phi(\hat{X}(0), \hat\rho, \hat{W})] = m_\Phi$ for all $\Phi\in\mathcal{T}$. Since $\mathcal{T}$ is also measure determining, it follows that there exists a measure $\nu\in\mathcal{P}(\mathbb{R}^d\times\mathcal{R}_2\times\mathcal{W})$ such that, for $\mathbb{P}$-almost every $\omega\in\Omega$,
$$Q_\omega\circ\big(\hat{X}(0), \hat\rho, \hat{W}\big)^{-1} = \nu.$$
On the other hand, we know by Lemma 5.3 that $Q_\omega\in\mathcal{P}_2(\mathcal{Z})$ is a McKean–Vlasov solution of equation (4.2) for $\mathbb{P}$-almost every $\omega\in\Omega$. Uniqueness of such solutions according to Lemma 4.2 yields the existence of a measure $\Theta\in\mathcal{P}_2(\mathcal{Z})$ such that $Q_\omega = \Theta$ for $\mathbb{P}$-almost every $\omega\in\Omega$. Let $m\in\mathcal{M}_2$ be the flow of measures induced by $\Theta$. Then, for $\mathbb{P}$-almost every $\omega\in\Omega$,
$$Q_\omega\circ\hat{X}(t)^{-1} = m(t),\qquad t\in[0,T].$$
The assertion is now a consequence of Theorem 5.1.
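The heart of the argument above is elementary: for bounded i.i.d. random variables, the variance of the sample mean is $\mathrm{Var}/n$ and thus vanishes as $n\to\infty$, which is what forces $v_\Phi = 0$. A quick numerical sketch of this effect, where the choice of bounded test function and all names are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_mean_variance(n, reps=2000):
    # Variance of the sample mean of n i.i.d. bounded variables,
    # estimated over `reps` independent replications.
    samples = np.sin(rng.standard_normal((reps, n)))  # bounded test function
    return float(samples.mean(axis=1).var())

v10, v1000 = sample_mean_variance(10), sample_mean_variance(1000)
print(v1000 < v10)  # variance of the mean shrinks roughly like 1/n
```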
Existence of local approximate Nash equilibria as required in Corollary 5.2 is guaranteed, in particular, under the hypotheses of Proposition 3.1 above (compact action space, bounded coefficients). Suppose that $(\xi^N)$ is such that, for each $N\in\mathbb{N}$, $\xi^N$ is a vector of independent and identically distributed random variables with common marginal $m^N_0\in\mathcal{P}_2(\mathbb{R}^d)$ and that, for some $\delta_0>0$, $\sup_{N\in\mathbb{N}}\int|x|^{2+\delta_0}\,m^N_0(dx)<\infty$. Then, by Proposition 3.1, there exists a corresponding sequence $(u^N)$ of local approximate Nash equilibria such that the hypotheses of Corollary 5.2 are satisfied. In addition to the desired limit relation, we thus obtain a proof of existence of solutions for the mean field game. Note that existence of solutions is just a by-product of our analysis; analogous existence results can in fact be obtained by directly working with the mean field game; see Lacker (2015). The proof there is based, as in Proposition 3.1 here, on relaxed controls and a version of Fan's fixed-point theorem.
PROOF OF THEOREM 5.1. By hypothesis, $Q\circ\hat{X}(\cdot)^{-1} = m(\cdot)$ $\mathbb{P}$-almost surely for some deterministic $m\in\mathcal{M}_2$. In view of Lemma 5.3, it is enough to show that the pair $(Q_\omega, m)$ satisfies the optimality condition of Definition 4.3 with $\mathbb{P}$-probability one. This is equivalent to showing that $\hat{J}(m(0), Q_\omega; m) = \hat{V}(m(0); m)$ for $\mathbb{P}$-almost all $\omega\in\Omega$.
Let $\varepsilon > 0$. Choose a function $\psi^m_\varepsilon: [0,T]\times\mathbb{R}^d\times\mathcal{W}\to\Gamma$ and a probability measure $m_\varepsilon\in\mathcal{P}_2(\mathcal{Z})$ according to Lemma 4.3. Choose a sequence of indices $(i^n_*)_{n\in\mathcal{I}}$ according to condition (S). We will, as we may, assume that $i^n_* = 1$ for every $n\in\mathcal{I}$; otherwise, renumber the components of the $n$-player games.
The proof proceeds in five steps. First, we construct a coupling for the initial conditions. In the second step, based on that coupling and the feedback function $\psi^m_\varepsilon$, we define a competitor strategy $\tilde{u}^n$ that differs from $u^n$ only in component one ($= i^n_*$). As verified in step three, the associated normalized occupation measures have the same limit $Q$ as the sequence $(Q^n)$. This is used in the fourth step to show that $\limsup_{n\to\infty} J^n_1(\tilde{u}^n)\le \hat{V}(m(0); m) + \varepsilon$. Thanks to this upper limit, the local approximate Nash equilibrium property of $u^n$ together with condition (S), and the asymptotic lower bound on the average costs from Lemma 5.3, we establish optimality in the fifth and last step.
First step. By hypothesis, the sequence $(\mathbb{P}_n\circ(Q^n)^{-1})_{n\in\mathcal{I}}$ converges to $\mathbb{P}\circ Q^{-1}$ in $\mathcal{P}(\mathcal{P}_2(\mathcal{Z}))$. By the choice of the metric on $\mathcal{Z}$, the continuity of the map $\mathcal{Z}\ni(\varphi, r, w)\mapsto\varphi(0)\in\mathbb{R}^d$, and the mapping theorem [for instance, Theorem 5.1 in Billingsley (1968), page 30], we have that
$$\mathcal{P}_2(\mathcal{Z})\ni\Theta\ \mapsto\ \Theta\circ\hat{X}(0)^{-1}\in\mathcal{P}_2\big(\mathbb{R}^d\big)$$
is continuous. This implies, again by the continuous mapping theorem, that
$$\mathbb{P}_n\circ\big(Q^n\circ\hat{X}(0)^{-1}\big)^{-1}\ \xrightarrow{n\to\infty}\ \mathbb{P}\circ\big(Q\circ\hat{X}(0)^{-1}\big)^{-1}\quad\text{in } \mathcal{P}\big(\mathcal{P}_2\big(\mathbb{R}^d\big)\big).$$
By construction and hypothesis, respectively,
$$Q^n\circ\hat{X}(0)^{-1} = \frac{1}{n}\sum_{i=1}^{n}\delta_{\xi^n_i}\quad\text{while}\quad \mathbb{P}\circ\big(Q\circ\hat{X}(0)^{-1}\big)^{-1} = \delta_{m(0)}.$$
It follows that $(\frac{1}{n}\sum_{i=1}^{n}\delta_{\xi^n_i})_{n\in\mathcal{I}}$ converges to $m(0)$ in distribution as $\mathcal{P}_2(\mathbb{R}^d)$-valued random variables, where $m(0)$ is deterministic. This convergence implies, in particular, that
$$\mathbb{E}_n\bigg[\frac{1}{n}\sum_{i=1}^{n}\big|\xi^n_i\big|^2\bigg]\ \xrightarrow{n\to\infty}\ \int_{\mathbb{R}^d}|x|^2\,m(0)(dx).$$
By hypothesis, $\xi^n = (\xi^n_1, \ldots, \xi^n_n)$ is exchangeable for every $n\in\mathcal{I}$. Convergence of the associated empirical measures, by the Tanaka–Sznitman theorem [for instance, Theorem 3.2 in Gottlieb (1998), page 27], implies that
$$\mathbb{P}_n\circ\big(\xi^n_1\big)^{-1}\ \xrightarrow{n\to\infty}\ m(0)\quad\text{in } \mathcal{P}\big(\mathbb{R}^d\big).$$
Actually, we have convergence in $\mathcal{P}_2(\mathbb{R}^d)$ since, by exchangeability,
$$\mathbb{E}_n\big[\big|\xi^n_1\big|^2\big] = \mathbb{E}_n\bigg[\frac{1}{n}\sum_{i=1}^{n}\big|\xi^n_i\big|^2\bigg]\quad\text{for every } n\in\mathcal{I},$$
and the expectations on the right-hand side above converge to the second moment of $m(0)$. We are therefore in the situation of Lemma 5.2, and we apply that result with the choice $i^n_* = 1$ to obtain a sequence $(\bar\xi_n)_{n\in\mathcal{I}}$ of $\mathbb{R}^d$-valued random variables such that $\bar\xi_n$ is $\sigma(\xi^n_1, \vartheta^n_1)$-measurable, $\mathbb{P}_n\circ(\bar\xi_n)^{-1} = m(0)$ and $\mathbb{E}_n[|\xi^n_1 - \bar\xi_n|^2]\to 0$ as $n\to\infty$.
Second step. Define a strategy vector $\tilde{u}^n = (\tilde{u}^n_1, \ldots, \tilde{u}^n_n)$ by setting, for $(t,\omega)\in[0,T]\times\Omega_n$,
$$\tilde{u}^n_i(t,\omega)\doteq
\begin{cases}
\psi^m_\varepsilon\big(t, \bar\xi_n(\omega), W^n_1(\cdot,\omega)\big) & \text{if } i = 1,\\
u^n_i(t,\omega) & \text{if } i\in\{2,\ldots,n\}.
\end{cases}$$
Notice that $\tilde{u}^n$ is indeed a strategy vector for the game with $n$ players. Moreover, $\tilde{u}^n_i = u^n_i$ for $i\in\{2,\ldots,n\}$, while $\tilde{u}^n_1\in\mathcal{H}_2((\mathcal{F}^{n,1}_t), \mathbb{P}_n; \Gamma)$. Let $\tilde\rho^{n,i}$ be the relaxed control induced by $\tilde{u}^n_i$, $i\in\{1,\ldots,n\}$. Clearly, $\tilde\rho^{n,i} = \rho^{n,i}$ for $i\ge 2$. On the other hand, by construction and since $\bar\xi_n$ and $W^n_1$ are independent,
$$\mathbb{P}_n\circ\big(\bar\xi_n, \tilde\rho^{n,1}, W^n_1\big)^{-1} = m_\varepsilon\circ\big(\hat{X}(0), \hat\rho, \hat{W}\big)^{-1}\quad\text{for every } n\in\mathcal{I}.$$
The law of $\tilde{u}^n_1$, in particular, does not change with $n$. It follows that
$$\sup_{n\in\mathcal{I}}\ \mathbb{E}_n\bigg[\int_0^T\big|\tilde{u}^n_1(t)\big|^2\,dt\bigg] < \infty.$$
The coercivity assumption (A6) implies that there exists $C>0$ such that for every $n\in\mathcal{I}$,
$$\mathbb{E}_n\bigg[\int_0^T\big|u^n_1(t)\big|^2\,dt\bigg]\le C\big(1 + J^n_1\big(u^n\big)\big).$$
By choice of the index $i^n_* = 1$ according to (S), we have $\sup_{n\in\mathbb{N}} J^n_1(u^n) < \infty$. Since $\mathbb{E}_n[|\xi^n_1|^2] = \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_n[|\xi^n_i|^2]$ by exchangeability, it follows that
$$(5.3)\qquad \sup_{n\in\mathcal{I}}\ \mathbb{E}_n\bigg[\big|\xi^n_1\big|^2 + \int_0^T\big(\big|u^n_1(t)\big|^2 + \big|\tilde{u}^n_1(t)\big|^2\big)\,dt\bigg] < \infty.$$
Third step. Let $(\tilde{X}^n_1, \ldots, \tilde{X}^n_n)$ be the solution of the system of equations (3.1) under strategy vector $\tilde{u}^n$, and let $\tilde\mu^n$ denote the empirical measure process associated with $(\tilde{X}^n_1, \ldots, \tilde{X}^n_n)$. Let $\tilde{Q}^n$ be the normalized occupation measure associated with $\tilde{u}^n$, that is, the $\mathcal{P}_2(\mathcal{Z})$-valued random variable determined by
$$\tilde{Q}^n_\omega(B\times R\times D)\doteq \frac{1}{n}\sum_{i=1}^{n}\delta_{\tilde{X}^n_i(\cdot,\omega)}(B)\cdot\delta_{\tilde\rho^{n,i}_\omega}(R)\cdot\delta_{W^n_i(\cdot,\omega)}(D),\qquad\omega\in\Omega_n,$$
$B\in\mathcal{B}(\mathcal{X})$, $R\in\mathcal{B}(\mathcal{R}_2)$, $D\in\mathcal{B}(\mathcal{W})$. We are going to show that
$$(5.4)\qquad \tilde{Q}^n\ \xrightarrow{n\to\infty}\ Q\quad\text{in distribution as } \mathcal{P}_2(\mathcal{Z})\text{-valued random variables}.$$
Since $Q^n\to Q$ in distribution, it suffices to show that
$$d_{\mathcal{P}(\mathcal{P}_2(\mathcal{Z}))}\big(\mathbb{P}_n\circ\big(\tilde{Q}^n\big)^{-1}, \mathbb{P}_n\circ\big(Q^n\big)^{-1}\big)\ \xrightarrow{n\to\infty}\ 0.$$
Let $n\in\mathcal{I}$. By construction, the definition of the bounded Lipschitz metric, inequality (2.1) and Hölder's inequality,
$$\begin{aligned}
d_{\mathcal{P}(\mathcal{P}_2(\mathcal{Z}))}\big(\mathbb{P}_n\circ\big(\tilde{Q}^n\big)^{-1}, \mathbb{P}_n\circ\big(Q^n\big)^{-1}\big)
&= \sup_{G\in C(\mathcal{P}_2(\mathcal{Z})):\,\|G\|_{\mathrm{bLip}}\le 1}\Big|\mathbb{E}_n\big[G\big(Q^n\big) - G\big(\tilde{Q}^n\big)\big]\Big|\\
&\le \mathbb{E}_n\big[d_{\mathcal{P}_2(\mathcal{Z})}\big(Q^n, \tilde{Q}^n\big)\big]\\
&\le \mathbb{E}_n\bigg[\frac{1}{n}\sum_{i=1}^{n} d_{\mathcal{Z}}\big(\big(X^n_i, \rho^{n,i}, W^n_i\big), \big(\tilde{X}^n_i, \tilde\rho^{n,i}, W^n_i\big)\big)^2\bigg]^{1/2}\\
&\le \frac{1}{\sqrt{n}} + \mathbb{E}_n\bigg[\frac{1}{n}\sum_{i=1}^{n}\sup_{t\in[0,T]}\big|X^n_i(t) - \tilde{X}^n_i(t)\big|^2\bigg]^{1/2},
\end{aligned}$$
where the last inequality follows by definition of $d_{\mathcal{Z}}$ and from the fact that $\rho^{n,i} = \tilde\rho^{n,i}$ for $i\in\{2,\ldots,n\}$. Using assumption (A2), Hölder's inequality, Doob's maximal inequality, Itô's isometry, inequality (2.1) and Fubini's theorem, we find an estimate on the expectation on the right-hand side; similarly, but also using assumption (A3), this expectation is seen to vanish as $n\to\infty$ as a consequence of (5.3), condition (T), and Lemma 3.1.
Fourth step. We are going to show that
$$(5.5)\qquad \limsup_{n\to\infty}\ J^n_1\big(\tilde{u}^n\big)\ \le\ \hat{J}\big(m(0), m_\varepsilon; m\big).$$
Let $\bar{X}^n_1$ be the unique solution to
$$\bar{X}^n_1(t) = \bar\xi_n + \int_0^t b\big(s, \bar{X}^n_1(s), m(s), \tilde{u}^n_1(s)\big)\,ds + \int_0^t \sigma\big(s, \bar{X}^n_1(s), m(s)\big)\,dW^n_1(s),\qquad t\in[0,T].$$
Then, by uniqueness in law and construction, for every $n\in\mathcal{I}$,
$$\hat{J}\big(m(0), m_\varepsilon; m\big) = \mathbb{E}_n\bigg[\int_0^T f\big(t, \bar{X}^n_1(t), m(t), \tilde{u}^n_1(t)\big)\,dt + F\big(\bar{X}^n_1(T), m(T)\big)\bigg].$$
Using assumption (A2), Hölder's inequality, Itô's isometry and Fubini's theorem, we find that for every $t\in[0,T]$,
$$\mathbb{E}_n\big[\big|\tilde{X}^n_1(t) - \bar{X}^n_1(t)\big|^2\big]\ \le\ 3\,\mathbb{E}_n\big[\big|\xi^n_1 - \bar\xi_n\big|^2\big] + 6(T+1)L^2\,\mathbb{E}_n\bigg[\int_0^T d_2\big(\tilde\mu^n(s), m(s)\big)^2\,ds\bigg] + 6(T+1)L^2\int_0^t\mathbb{E}_n\big[\big|\tilde{X}^n_1(s) - \bar{X}^n_1(s)\big|^2\big]\,ds.$$
The limit relation (5.4) implies that $(\tilde\mu^n(0))_{n\in\mathcal{I}}$ converges to $m(0)$ in distribution as $\mathcal{P}_2(\mathbb{R}^d)$-valued random variables and that, by uniform integrability thanks to Lemma 3.2 and condition (T),
$$\sup_{t\in[0,T]}\ \mathbb{E}_n\big[d_2\big(\tilde\mu^n(t), m(t)\big)^2\big]\ \xrightarrow{n\to\infty}\ 0.$$
By choice of the random variables $\bar\xi_n$ according to Lemma 5.2,
$$\mathbb{E}_n\big[\big|\xi^n_1 - \bar\xi_n\big|^2\big]\ \xrightarrow{n\to\infty}\ 0.$$
Therefore, by Gronwall's lemma,
$$\sup_{t\in[0,T]}\ \mathbb{E}_n\big[\big|\tilde{X}^n_1(t) - \bar{X}^n_1(t)\big|^2\big]\ \xrightarrow{n\to\infty}\ 0.$$
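Gronwall's lemma is invoked here (and again below) in a form that, in discrete time, reads: if $e_{k+1}\le e_k(1+c\,\Delta t)+f_k\,\Delta t$, then $e_n\le (e_0+\sum_k f_k\,\Delta t)\,e^{c\,n\,\Delta t}$. A minimal numerical check of this bound; the function and parameter names are illustrative assumptions, not from the paper:

```python
import numpy as np

def gronwall_check(e0, c, f, dt):
    # Iterate the recursive inequality with equality (worst case) and
    # compare the result with the exponential Gronwall bound.
    e, total_f = e0, 0.0
    for fk in f:
        e = e * (1.0 + c * dt) + fk * dt
        total_f += fk * dt
    bound = (e0 + total_f) * np.exp(c * len(f) * dt)
    return e, bound

e, bound = gronwall_check(e0=0.1, c=2.0, f=[0.05] * 100, dt=0.01)
print(e <= bound)  # the iterated inequality stays below the bound
```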
Thanks to assumption (A4) and Hölder's inequality,
$$\begin{aligned}
\big|J^n_1\big(\tilde{u}^n\big) - \hat{J}\big(m(0), m_\varepsilon; m\big)\big|
&\le \mathbb{E}_n\bigg[\int_0^T\big|f\big(t, \tilde{X}^n_1(t), \tilde\mu^n(t), \tilde{u}^n_1(t)\big) - f\big(t, \bar{X}^n_1(t), m(t), \tilde{u}^n_1(t)\big)\big|\,dt\bigg]\\
&\quad + \mathbb{E}_n\big[\big|F\big(\tilde{X}^n_1(T), \tilde\mu^n(T)\big) - F\big(\bar{X}^n_1(T), m(T)\big)\big|\big]\\
&\le \sqrt{10}\,L(1+\sqrt{T})\,\sup_{t\in[0,T]}\mathbb{E}_n\big[\big|\tilde{X}^n_1(t) - \bar{X}^n_1(t)\big|^2 + d_2\big(\tilde\mu^n(t), m(t)\big)^2\big]^{1/2}\\
&\quad\times\sup_{t\in[0,T]}\mathbb{E}_n\big[1 + \big|\tilde{X}^n_1(t)\big|^2 + \big|\bar{X}^n_1(t)\big|^2 + d_2\big(\tilde\mu^n(t), \delta_0\big)^2 + d_2\big(m(t), \delta_0\big)^2\big]^{1/2}.
\end{aligned}$$
By (5.3) together with Lemma 3.1 and an analogous estimate applied to $\bar{X}^n_1$, and since $\sup_{t\in[0,T]} d_2(m(t), \delta_0)^2 < \infty$ by continuity, we have
$$\sup_{n\in\mathcal{I}}\ \sup_{t\in[0,T]}\ \mathbb{E}_n\big[\big|\tilde{X}^n_1(t)\big|^2 + \big|\bar{X}^n_1(t)\big|^2 + d_2\big(\tilde\mu^n(t), \delta_0\big)^2 + d_2\big(m(t), \delta_0\big)^2\big] < \infty.$$
It follows that $J^n_1(\tilde{u}^n)\to\hat{J}(m(0), m_\varepsilon; m)$ as $n\to\infty$, which establishes (5.5).
Fifth step. The limit relation (5.5) and the choice of $m_\varepsilon$ imply that
$$\limsup_{n\to\infty}\ J^n_1\big(\tilde{u}^n\big)\ \le\ \hat{V}\big(m(0); m\big) + \varepsilon.$$
By hypothesis, $u^n$ is a local $\varepsilon_n$-Nash equilibrium. By construction, $\tilde{u}^n$ differs from $u^n$ only in component number one ($= i^n_*$), and $\tilde{u}^n_1$ is $(\mathcal{F}^{n,1}_t)$-adapted. Therefore,
$$J^n_1\big(u^n\big)\le J^n_1\big(\tilde{u}^n\big) + \varepsilon_n.$$
By choice of the index $1 = i^n_*$ according to (S) and since $\varepsilon_n\to 0$ by hypothesis,
$$\limsup_{n\to\infty}\ \frac{1}{n}\sum_{i=1}^{n} J^n_i\big(u^n\big)\ \le\ \limsup_{n\to\infty}\ J^n_1\big(u^n\big)\ \le\ \limsup_{n\to\infty}\ J^n_1\big(\tilde{u}^n\big).$$
It follows that
$$\limsup_{n\to\infty}\ \frac{1}{n}\sum_{i=1}^{n} J^n_i\big(u^n\big)\ \le\ \hat{V}\big(m(0); m\big) + \varepsilon.$$
On the other hand, thanks to the second part of Lemma 5.3,
$$\liminf_{n\to\infty}\ \frac{1}{n}\sum_{i=1}^{n} J^n_i\big(u^n\big)\ \ge\ \int \hat{J}\big(m(0), Q_\omega, m\big)\,\mathbb{P}(d\omega).$$
It follows that
$$\int \hat{J}\big(m(0), Q_\omega, m\big)\,\mathbb{P}(d\omega)\ \le\ \hat{V}\big(m(0); m\big) + \varepsilon.$$
Since $\varepsilon > 0$ was arbitrary and $\hat{J}(m(0), Q_\omega, m)\ge\hat{V}(m(0); m)$ for every $\omega\in\Omega$ by definition of $\hat{V}$, we conclude that
$$\hat{J}\big(m(0), Q_\omega, m\big) = \hat{V}\big(m(0); m\big)\quad\text{for } \mathbb{P}\text{-almost all } \omega\in\Omega.$$

REMARK 5.4. The proof of Theorem 5.1 gives some insight into why the assumption that the limit flow of measures $m$ is deterministic cannot simply be dropped. In the second step of the proof, we define a competitor strategy $\tilde{u}^n_1$ for the deviating player (player one after relabeling) in terms of the noise feedback function $\psi^m_\varepsilon$. In general, for any $t\in[0,T]$, $\psi^m_\varepsilon(t,\cdot,\cdot)$ depends on $m$ through its values for all times, not only through its values up to time $t$. Therefore, if $m$ were random, even taking for granted the measurable dependence of $\psi^m_\varepsilon$ on $m$, we might end up with a nonadapted competitor strategy. Indeed, the natural choice for $\tilde{u}^n_1$, namely $\tilde{u}^n_1(t,\omega)\doteq \psi^{\mu^n_\omega(\cdot)}_\varepsilon(t, \bar\xi_n(\omega), W^n_1(\cdot,\omega))$, would in general yield a $\Gamma$-valued process that would not be an admissible strategy for player one in the $n$-player game.
APPENDIX A: PROOF OF LEMMA 4.1, SECOND PART
Let $\Theta\in\mathcal{P}(\mathcal{Z})$ be a solution of equation (4.2) with flow of measures $m$ in the sense of Definition 4.1. Using the local martingale property of $M^m_f$ for $f$ a monomial of first or second order as in the proof of Proposition 5.4.6 in Karatzas and Shreve (1991), pages 315–316, we find that, under $\Theta$ and with respect to the filtration $(\mathcal{G}_t)$:

• $\hat{W}$ is a $d_1$-dimensional vector of continuous local martingales with $\hat{W}(0) = 0$ and quadratic covariations
$$\langle\hat{W}_l, \hat{W}_{\tilde{l}}\rangle(t) = t\cdot\delta_{l,\tilde{l}},\qquad l, \tilde{l}\in\{1,\ldots,d_1\};$$
• $\bar{X}\doteq\hat{X} - \hat{X}(0) - \int_{\Gamma\times[0,\cdot]} b\big(s, \hat{X}(s), m(s), \gamma\big)\,\hat\rho(d\gamma, ds)$ is a $d$-dimensional vector of continuous local martingales with quadratic covariations
$$\langle\bar{X}_j, \bar{X}_k\rangle(t) = \int_0^t\big(\sigma\sigma^{\mathrm{T}}\big)_{jk}\big(s, \hat{X}(s), m(s)\big)\,ds,\qquad j, k\in\{1,\ldots,d\};$$
• $\hat{W}$, $\bar{X}$ have quadratic covariations
$$\langle\bar{X}_k, \hat{W}_l\rangle(t) = \int_0^t\sigma_{kl}\big(s, \hat{X}(s), m(s)\big)\,ds,$$
where $k\in\{1,\ldots,d\}$, $l\in\{1,\ldots,d_1\}$.
The local martingale property also holds with respect to the filtration $(\mathcal{G}_{t+})$; see the solution to Problem 5.4.13 in Karatzas and Shreve (1991), pages 318–319, 392, and Remark 4.2 in Budhiraja, Dupuis and Fischer (2012). By Lévy's characterization of Brownian motion [for instance, Theorem 3.3.16 in Karatzas and Shreve (1991), page 157], we see that $\hat{W}$ is a standard Wiener process with respect to $(\mathcal{G}_{t+})$. As a consequence, the process
$$Y(t)\doteq\int_0^t\sigma\big(s, \hat{X}(s), m(s)\big)\,d\hat{W}(s),\qquad t\in[0,T],$$
is well defined and a $d$-dimensional vector of continuous local martingales [under $\Theta$ with respect to $(\mathcal{G}_{t+})$] with quadratic covariations
$$\langle Y_j, Y_k\rangle(t) = \int_0^t\big(\sigma\sigma^{\mathrm{T}}\big)_{jk}\big(s, \hat{X}(s), m(s)\big)\,ds,\qquad j, k\in\{1,\ldots,d\},$$
$$\langle Y_j, \hat{W}_l\rangle(t) = \int_0^t\sigma_{jl}\big(s, \hat{X}(s), m(s)\big)\,ds,\qquad j\in\{1,\ldots,d\},\ l\in\{1,\ldots,d_1\}.$$
The quadratic covariations between the components of the vectors of continuous local martingales $\bar{X}$, $Y$ are given by [cf. Proposition 3.2.24 in Karatzas and Shreve (1991), page 147]
$$\langle Y_j, \bar{X}_k\rangle(t) = \sum_{l=1}^{d_1}\int_0^t\sigma_{jl}\big(s, \hat{X}(s), m(s)\big)\,d\langle\bar{X}_k, \hat{W}_l\rangle(s) = \int_0^t\big(\sigma\sigma^{\mathrm{T}}\big)_{jk}\big(s, \hat{X}(s), m(s)\big)\,ds,\qquad j, k\in\{1,\ldots,d\}.$$
It follows that $\bar{X} - Y$ is a $d$-dimensional vector of continuous local martingales with $\bar{X}(0) = 0 = Y(0)$ and quadratic covariations
$$\langle\bar{X}_j - Y_j, \bar{X}_k - Y_k\rangle = \langle\bar{X}_j, \bar{X}_k\rangle - \langle Y_j, \bar{X}_k\rangle - \langle\bar{X}_j, Y_k\rangle + \langle Y_j, Y_k\rangle \equiv 0.$$
This implies [cf. Problem 1.5.12 in Karatzas and Shreve (1991), page 35] that $\bar{X} = Y$ $\Theta$-almost surely, which establishes the solution property.
APPENDIX B: PROOF OF LEMMA 4.3

Fix $m\in\mathcal{M}_2$, and set, for $(t, x, \gamma)\in[0,T]\times\mathbb{R}^d\times\Gamma$,
$$b_m(t, x, \gamma)\doteq b\big(t, x, m(t), \gamma\big),\qquad \sigma_m(t, x)\doteq\sigma\big(t, x, m(t)\big),$$
$$f_m(t, x, \gamma)\doteq f\big(t, x, m(t), \gamma\big),\qquad F_m(x)\doteq F\big(x, m(T)\big).$$
Thanks to assumptions (A1), (A2), (A4) and the continuity of $m$, we have that $b_m$, $\sigma_m$, $f_m$ are continuous in the time and control variables, uniformly over compact subsets of $\mathbb{R}^d$; $b_m$, $\sigma_m$ are globally Lipschitz continuous in the state variable, uniformly in the other variables; and $f_m$, $F_m$ are locally Lipschitz continuous in the state variable, uniformly in the other variables, with local Lipschitz constants that grow sublinearly in the state variable.
The function $\psi^m_\varepsilon$ will be constructed based on the principle of dynamic programming applied in discrete time. To this end, we first introduce an original control problem corresponding to the minimal costs $\hat{V}(\cdot; m)$, then we build a sequence of approximating optimal control problems by successively restricting the set of admissible strategies. The proof proceeds in six steps.
First step. Let $\mathcal{U}$ be the set of all quadruples $((\Omega, \mathcal{F}, \mathbb{P}), (\mathcal{F}_t), \rho, W)$ such that the pair $((\Omega, \mathcal{F}, \mathbb{P}), (\mathcal{F}_t))$ forms a stochastic basis satisfying the usual hypotheses, $W$ is a $d_1$-dimensional $(\mathcal{F}_t)$-Wiener process, and $\rho$ is an $(\mathcal{F}_t)$-adapted $\mathcal{R}_2$-valued random variable such that $\mathbb{E}[\int_{\Gamma\times[0,T]}|\gamma|^2\,\rho(d\gamma, ds)]<\infty$. For simplicity, we may write $\rho\in\mathcal{U}$ instead of $((\Omega, \mathcal{F}, \mathbb{P}), (\mathcal{F}_t), \rho, W)\in\mathcal{U}$. Given any $\rho\in\mathcal{U}$, $(t_0, x)\in[0,T]\times\mathbb{R}^d$, the stochastic integral equation
$$(\mathrm{B.1})\qquad X(t) = x + \int_{\Gamma\times[0,t]} b_m\big(t_0+s, X(s), \gamma\big)\,\rho(d\gamma, ds) + \int_0^t\sigma_m\big(t_0+s, X(s)\big)\,dW(s),\qquad t\in[0, T-t_0],$$
has a unique solution $X = X^{t_0,x,\rho}$, that is, $X$ is the unique (up to indistinguishability with respect to $\mathbb{P}$) $\mathbb{R}^d$-valued $(\mathcal{F}_t)$-adapted continuous process that satisfies (B.1) with $\mathbb{P}$-probability one. Although the solution $X$ of equation (B.1) starts in $x$ at time zero, it corresponds to the solution of equation (4.2) starting in $x$ at time $t_0$. Define the costs associated with strategy $\rho$ and initial condition $(t_0, x)\in[0,T]\times\mathbb{R}^d$ by
$$J_m(t_0, x, \rho)\doteq\mathbb{E}\bigg[\int_{\Gamma\times[0,T-t_0]} f_m\big(t_0+s, X(s), \gamma\big)\,\rho(d\gamma, ds) + F_m\big(X(T-t_0)\big)\bigg],$$
where $X = X^{t_0,x,\rho}$. The corresponding value function $V_m$ is given by
$$V_m(t, x)\doteq\inf_{\rho\in\mathcal{U}} J_m(t, x, \rho),$$
which is well defined as a measurable function $[0,T]\times\mathbb{R}^d\to[0,\infty)$. Actually, $V_m$ is continuous. For $x\in\mathbb{R}^d$, $\rho\in\mathcal{U}$, set
$$\Theta_{x,\rho}\doteq\mathbb{P}\circ\big(X^{0,x,\rho}, \rho, W\big)^{-1}.$$
Then $\Theta_{x,\rho}$ is a solution of equation (4.2) with flow of measures $m$ and
$$J_m(0, x, \rho) = \hat{J}\big(\delta_x, \Theta_{x,\rho}; m\big).$$
Conversely, in view of Lemma 4.1 and thanks to assumption (A6), any $\Theta\in\mathcal{P}(\mathcal{Z})$ with $\hat{J}(\delta_x, \Theta; m) < \infty$ induces a strategy $\rho\in\mathcal{U}$ such that $\Theta_{x,\rho} = \Theta$. It follows that $V_m(0, x) = \hat{V}(\delta_x; m)$ for every $x\in\mathbb{R}^d$ and, by conditioning on the initial state at time zero,
$$\int_{\mathbb{R}^d} V_m(0, x)\,m(0)(dx) = \hat{V}\big(m(0); m\big).$$
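To make the role of the time shift in (B.1) concrete, here is a minimal Euler–Maruyama sketch for that equation with a strict (non-relaxed) control; the function names, the choice of coefficients and the sanity check are assumptions of this illustration, not objects from the paper:

```python
import numpy as np

def euler_b1(x0, t0, T, b_m, sigma_m, control, n_steps, rng):
    # Euler-Maruyama scheme for the time-shifted equation (B.1) on [0, T - t0]:
    # X(t) = x + int b_m(t0+s, X(s), u(s)) ds + int sigma_m(t0+s, X(s)) dW(s).
    dt = (T - t0) / n_steps
    x, t = float(x0), 0.0
    for _ in range(n_steps):
        dw = rng.standard_normal() * np.sqrt(dt)
        x += b_m(t0 + t, x, control(t)) * dt + sigma_m(t0 + t, x) * dw
        t += dt
    return x

# Sanity check with sigma_m = 0 and b_m(t, x, g) = -x: the scheme should
# approximate x0 * exp(-(T - t0)).
rng = np.random.default_rng(1)
x_T = euler_b1(1.0, t0=0.5, T=1.5, b_m=lambda t, x, g: -x,
               sigma_m=lambda t, x: 0.0, control=lambda t: 0.0,
               n_steps=20000, rng=rng)
print(abs(x_T - np.exp(-1.0)) < 1e-3)
```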
Second step. The function $V_m(0,\cdot)$ is locally Lipschitz continuous. To be more precise, choose $c_0>0$, $\Gamma_0\subset\Gamma$ according to (A6), and let $r_0>0$ be such that $\Gamma_0\subset\{\gamma\in\mathbb{R}^{d_2}: |\gamma|\le r_0\}$. We are going to show that there exists a constant $C_1\in(0,\infty)$ depending only on $K$, $L$, $T$, $m$, $r_0$ and $c_0$ such that
$$(\mathrm{B.2})\qquad \big|V_m(0, x) - V_m(0, \tilde{x})\big|\le C_1(1+R)|x-\tilde{x}|\quad\text{whenever } |x|\vee|\tilde{x}|\le R.$$
To establish (B.2), set, for $\varepsilon>0$, $R>0$,
$$\mathcal{U}_{\varepsilon,R}\doteq\big\{\rho\in\mathcal{U}: J_m(0, x; \rho)\le V_m(0, x)+\varepsilon\ \text{for some } x \text{ with } |x|\le R\big\}.$$
Then for all $x, \tilde{x}\in\mathbb{R}^d$ with $|x|\vee|\tilde{x}|\le R$,
$$\big|V_m(0, x) - V_m(0, \tilde{x})\big|\le\inf_{\varepsilon>0}\ \sup_{\rho\in\mathcal{U}_{\varepsilon,R}}\big|J_m(0, x; \rho) - J_m(0, \tilde{x}; \rho)\big|.$$
Let $x, \tilde{x}\in\mathbb{R}^d$, $\rho\in\mathcal{U}$ and let $X$, $\tilde{X}$ be the solutions of (B.1) under $\rho$ with initial state $x$ and $\tilde{x}$, respectively. Using Hölder's inequality, Jensen's inequality, Itô's isometry, Fubini's theorem, assumption (A2) and Gronwall's lemma, we find that there exists a constant $C_{L,T}$ depending only on $L$, $T$ such that
$$\sup_{t\in[0,T]}\ \mathbb{E}\big[\big|X(t)-\tilde{X}(t)\big|^2\big]\le C_{L,T}|x-\tilde{x}|^2.$$
Reusing the same tools but with assumption (A3) in place of (A2) (also cf. Lemma 3.1), we find that there exists a constant $C_{K,T,m}$ depending only on $K$, $T$, and on $m$ [through $\sup_{t\in[0,T]}\int|y|^2\,m(t)(dy)$, which is finite since $m$ is continuous] that bounds the corresponding second moments. Thanks to the above estimates and assumption (A4), there exist a constant $C_{L,T,m}$ depending only on $L$, $T$, and $m$, and a constant $C_{K,L,T,m}$ depending only on $K$, $L$, $T$, and $m$, that control the difference of the costs. By the same estimates as above, but using (A5) instead of (A4), we find that there exists a constant $\tilde{C}_{K,T,m}$ depending only on $K$, $T$, $m$ bounding the costs for all $x\in\mathbb{R}^d$. This implies that there exists a constant $C_{K,T,m,\Gamma}$ depending only on $K$, $T$, $m$, and on $\Gamma$ (through $\min_{\gamma\in\Gamma}|\gamma|^2$) such that, for all $x\in\mathbb{R}^d$,
$$V_m(0, x)\le C_{K,T,m,\Gamma}\big(1+|x|^2\big).$$
Let $\rho\in\mathcal{U}_{\varepsilon,R}$ for some $\varepsilon>0$. Choose $x\in\mathbb{R}^d$ with $|x|\le R$ such that $J_m(0, x; \rho)\le V_m(0, x)+\varepsilon$ (possible by definition of $\mathcal{U}_{\varepsilon,R}$). By the coercivity assumption (A6),
$$J_m(0, x; \rho)\ge c_0\,\mathbb{E}\bigg[\int_{(\Gamma\setminus\Gamma_0)\times[0,T]}|\gamma|^2\,\rho(d\gamma, dt)\bigg],$$
hence
$$c_0\,\mathbb{E}\bigg[\int_{(\Gamma\setminus\Gamma_0)\times[0,T]}|\gamma|^2\,\rho(d\gamma, dt)\bigg]\le C_{K,T,m,\Gamma}\big(1+R^2\big)+\varepsilon.$$
By construction,
$$\mathbb{E}\bigg[\int_{\Gamma\times[0,T]}|\gamma|^2\,\rho(d\gamma, dt)\bigg]\le T\cdot r_0^2 + \mathbb{E}\bigg[\int_{(\Gamma\setminus\Gamma_0)\times[0,T]}|\gamma|^2\,\rho(d\gamma, dt)\bigg].$$
It follows that there exists a constant $C_{K,T,m,c_0,r_0}$ depending only on $K$, $T$, $m$, $c_0$ and on $r_0$ (clearly, $\min_{\gamma\in\Gamma}|\gamma|^2\le r_0^2$) such that
$$\sup_{\rho\in\mathcal{U}_{\varepsilon,R}}\ \mathbb{E}\bigg[\int_{\Gamma\times[0,T]}|\gamma|^2\,\rho(d\gamma, dt)\bigg]\le C_{K,T,m,c_0,r_0}\big(1+R+\sqrt{\varepsilon}\big).$$
This establishes (B.2).
Third step. For $M\in\mathbb{N}$, set $\Gamma_M\doteq\{\gamma\in\Gamma: |\gamma|\le M\}$. For $M$ big enough, say $M\ge M_0$, $\Gamma_M$ is nonempty. Choose $\gamma_0\in\Gamma_{M_0}$, and set $\Gamma_M\doteq\{\gamma_0\}$ if $M<M_0$. Then, for every $M\in\mathbb{N}$, $\Gamma_M$ is compact (and nonempty) and $\Gamma_M\subset\Gamma_{M+1}$. Set
$$\mathcal{U}_M\doteq\big\{\rho\in\mathcal{U}: \rho\big(\Gamma_M\times[0,T]\big)=T\ \mathbb{P}\text{-almost surely}\big\},$$
and let $V_{m,M}$ be the value function defined with respect to $\mathcal{U}_M$ instead of $\mathcal{U}$. We claim that
$$(\mathrm{B.3})\qquad V_{m,M}(0,\cdot)\ \xrightarrow{M\to\infty}\ V_m(0,\cdot)\quad\text{uniformly over compact subsets of } \mathbb{R}^d.$$
Notice that, by construction, $V_{m,M}(0,\cdot)\ge V_{m,M+1}(0,\cdot)\ge V_m(0,\cdot)$ for every $M\in\mathbb{N}$. By Step 2, we know that $V_m(0,\cdot)$ is locally Lipschitz. Repeating the arguments of Step 2 (notice that $\mathcal{U}_M\subset\mathcal{U}$ by definition), we find that inequality (B.2) also holds for $V_{m,M}(0,\cdot)$ in place of $V_m(0,\cdot)$ and that the constant $C_1$ can be chosen independently of $M\in\mathbb{N}$. To establish (B.3), it is therefore enough to check that pointwise convergence holds. Fix $x\in\mathbb{R}^d$. It suffices to show that given $\rho\in\mathcal{U}$ there exists a sequence $(\rho^{(M)})\subset\mathcal{U}$ such that $\rho^{(M)}\in\mathcal{U}_M$ for every $M$ and $J_m(0, x; \rho^{(M)})\to J_m(0, x; \rho)$ as $M\to\infty$.

Let $\rho\in\mathcal{U}$. For $M\in\mathbb{N}$, let $\rho^{(M)}\in\mathcal{U}_M$ be such that for every $B\in\mathcal{B}(\Gamma)$, every $I\in\mathcal{B}([0,T])$,
$$\rho^{(M)}(B\times I) = \rho\big((B\cap\Gamma_M)\times I\big) + \rho\big((\Gamma\setminus\Gamma_M)\times I\big)\cdot\delta_{\gamma_0}(B).$$
This determines a unique strategy $\rho^{(M)}\in\mathcal{U}_M$. Clearly, $\rho^{(M)}$ comes with the same stochastic basis as $\rho$. If $(\dot\rho_t)$ is a version of the time derivative process associated with $\rho$ [thus, $\rho(d\gamma, dt) = \dot\rho_t(d\gamma)\,dt$], then a version of the time derivative process of $\rho^{(M)}$ is given by
$$\dot\rho^{(M)}_t(d\gamma) = \mathbf{1}_{\Gamma_M}(\gamma)\cdot\dot\rho_t(d\gamma) + \dot\rho_t(\Gamma\setminus\Gamma_M)\cdot\delta_{\gamma_0}(d\gamma).$$
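The reweighting that defines $\rho^{(M)}$ can be illustrated on a control with finitely many atoms: mass sitting on actions outside the truncated action set is moved onto the fixed action $\gamma_0$, so the total mass is preserved. A sketch, where the atoms, weights and the normalization to total mass 1 are assumptions of this illustration (in the text the total mass is $T$):

```python
import numpy as np

def truncate_control(atoms, weights, M, gamma0):
    # rho^(M): keep atoms with |gamma| <= M, move the remaining mass to gamma0.
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    inside = np.abs(atoms) <= M
    new_atoms = np.append(atoms[inside], gamma0)
    new_weights = np.append(weights[inside], weights[~inside].sum())
    return new_atoms, new_weights

new_atoms, new_weights = truncate_control(
    atoms=[0.5, 2.0, -3.0], weights=[0.2, 0.3, 0.5], M=1.0, gamma0=0.0)
print(float(new_weights.sum()), bool(np.all(np.abs(new_atoms) <= 1.0)))
```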
Let $X$, $X^{(M)}$ be the solutions of (B.1) under $\rho$ and $\rho^{(M)}$, respectively. Thanks to Hölder's inequality, Jensen's inequality, Itô's isometry, Fubini's theorem and assumption (A2), there exists a constant $C_{L,T}$ depending only on $L$, $T$ such that, for every $t\in[0,T]$,
$$\mathbb{E}\big[\big|X(t)-X^{(M)}(t)\big|^2\big]\le C_{L,T}\int_0^t\mathbb{E}\big[\big|X(s)-X^{(M)}(s)\big|^2\big]\,ds + C_{L,T}\,\mathbb{E}\bigg[\bigg|\int_{\Gamma\times[0,t]} b_m\big(s, X(s), \gamma\big)\,\big(\rho^{(M)}-\rho\big)(d\gamma, ds)\bigg|^2\bigg].$$
Using the definition of $\rho^{(M)}$, Hölder's inequality and assumption (A3), we find that, for some constant $C_{K,T,m}$ depending only on $K$, $T$ and $m$,
$$\begin{aligned}
\mathbb{E}\bigg[\bigg|\int_{\Gamma\times[0,t]} b_m\big(s, X(s), \gamma\big)\,\big(\rho^{(M)}-\rho\big)(d\gamma, ds)\bigg|^2\bigg]
&\le 2T\,\mathbb{E}\bigg[\int_0^T\int_{\Gamma\setminus\Gamma_M}\big|b_m\big(s, X(s), \gamma\big)\big|^2\,\dot\rho_s(d\gamma)\,ds\bigg]\\
&\quad + 2\,\mathbb{E}\bigg[\rho\big((\Gamma\setminus\Gamma_M)\times[0,T]\big)\cdot\int_0^T\big|b_m\big(s, X(s), \gamma_0\big)\big|^2\,ds\bigg]\\
&\le C_{K,T,m}\,\mathbb{E}\Big[\rho\big((\Gamma\setminus\Gamma_M)\times[0,T]\big)\cdot\Big(1+\sup_{r\in[0,T]}\big|X(r)\big|^2\Big)\Big]\\
&\quad + C_{K,T,m}\,\mathbb{E}\bigg[\int_{\Gamma\times[0,T]}\mathbf{1}_{\Gamma\setminus\Gamma_M}(\gamma)\cdot|\gamma|^2\,\rho(d\gamma, ds)\bigg].
\end{aligned}$$
By (A3) and the usual estimates, including Gronwall's lemma, we have $\mathbb{E}[\sup_{r\in[0,T]}|X(r)|^2]<\infty$. Since $\rho_\omega$ is a measure with total mass $T$ for every $\omega\in\Omega$, we have $\rho((\Gamma\setminus\Gamma_M)\times[0,T])\to 0$ as $M\to\infty$ $\mathbb{P}$-almost surely. This implies, by dominated convergence,
$$\mathbb{E}\Big[\rho\big((\Gamma\setminus\Gamma_M)\times[0,T]\big)\cdot\Big(1+\sup_{r\in[0,T]}\big|X(r)\big|^2\Big)\Big]\ \xrightarrow{M\to\infty}\ 0.$$
On the other hand, $\mathbb{E}[\int_{\Gamma\times[0,T]}|\gamma|^2\,\rho(d\gamma, ds)]<\infty$ by definition of $\mathcal{U}$. This means that
$$\mathbb{E}\bigg[\int_{\Gamma\times[0,T]}\mathbf{1}_{\Gamma\setminus\Gamma_M}(\gamma)\cdot|\gamma|^2\,\rho(d\gamma, ds)\bigg]\ \xrightarrow{M\to\infty}\ 0.$$
An application of Gronwall's lemma now yields
$$\mathbb{E}\big[\big|X(t)-X^{(M)}(t)\big|^2\big]\ \xrightarrow{M\to\infty}\ 0.$$
This convergence together with assumption (A5) (and an estimate completely analogous to the one above) implies that $J_m(0, x; \rho^{(M)})\to J_m(0, x; \rho)$ as $M\to\infty$.
Fourth step. Choose a family $(\Gamma_{M,k})_{M,k\in\mathbb{N}}$ of finite subsets of $\Gamma$ such that $\Gamma_{M,k}\subset\Gamma_{M,k+1}\subset\Gamma_M$, $\Gamma_{M,k}\subset\Gamma_{M+1,k}$, and $\min_{\tilde\gamma\in\Gamma_{M,k}}|\gamma-\tilde\gamma|\le 1/k$ for any $\gamma\in\Gamma_M$. Let $\mathcal{U}_{M,k}$ be the set of all $\rho\in\mathcal{U}$ such that $\rho$ is the $\mathcal{R}_2$-valued random variable induced by a $\Gamma_{M,k}$-valued adapted process that is piecewise constant in time with respect to the equidistant grid of step size $T\cdot 2^{-k}$. Thus,