
Asymptotics of Transportation Cost for Occupation Measures of Fractional Brownian Motion

Academic year: 2021



Department of Mathematics

Master's Degree Course in Mathematics

Master's Thesis

Asymptotics of Transportation Cost for

Occupation Measures of Fractional

Brownian Motion

25/09/2020

Candidate: Francesco Mattesini

Supervisors: Dario Trevisan, Martin Huesmann

External examiner: Marco Romito


Contents

Introduction
  0.1 The Matching Problem and Known Results
  0.2 Main Result
1 Optimal Transport
  1.1 Polish Spaces
  1.2 Monge and Kantorovich Formulations
  1.3 Duality Theory
  1.4 Wasserstein Distance
2 Fractional Brownian Motion
3 Upper Bound
  3.1 A PDE Approach
  3.2 Heat Equation and Heat Kernel
  3.3 Upper Bound Rates
  3.4 Variance of the Fourier Modes
4 Lower Bound
  4.1 A PDE Approach and Lusin–Lipschitz Approximation
  4.2 Lower Bound Rates
  4.3 Proof of Lemma 7
  4.4 Proof of Lemma 8
  4.5 Proof of Lemma 10
Appendices
Appendix A  A Deterministic Lower Bound


Introduction

The aim of this thesis is to give new insights into a continuous version of the matching problem. This kind of problem has recently grown in popularity, and its formulations require tools from both mathematical analysis and probability theory. This dissertation aims to be as self-contained as possible; to this end, a proof is provided for most of the theorems discussed.

In the next sections we describe the matching-type problem investigated and the main result achieved. Since the aim of the next sections is to give an overview of the whole work, a reader unfamiliar with probability and mathematical analysis may have some difficulty in understanding either the setting of the problem or its formulation. Therefore, before going into the next section, we suggest skimming through Chapter 1 and Chapter 2, where a review of optimal transportation theory and of fractional Brownian motion is given.

0.1 The Matching Problem and Known Results

The matching problem has many formulations. The main idea is to understand how well one or more random samples match together. It is rather straightforward to understand what we mean by matching if we consider two random samples given by two sequences of random variables $(X_i)_{i=1}^n$, $(Y_i)_{i=1}^n$. In this case, the bipartite matching problem consists in finding the optimal matching between the two samples by choosing $\sigma \in S_n$ in such a way that the sum of the distances between each $X_i$ and $Y_{\sigma(i)}$ is minimized.

Our aim instead is to consider a fractional Brownian motion $B^H$ taking values in the $d$-dimensional torus and to study the asymptotic rate of convergence of the quantity
$$\mathbb{E}\left[ W_1\left( \mu_T, T \mathcal{L}^d_{\mathbb{T}^d} \right) \right], \qquad (1)$$
where $\mathcal{L}^d_{\mathbb{T}^d}$ is the Lebesgue measure restricted to the torus and $\mu_T$ is the occupation measure of $B^H$, defined by
$$\mu_T = \int_0^T \delta_{B^H_t} \, dt,$$
where $\delta_x$ denotes the Dirac delta at $x$. In this case the word matching refers to how close the occupation measure of the fractional Brownian motion is to the Lebesgue measure with respect to the 1-Wasserstein metric. This may seem rather abstract, but if we think of the torus as the cube $[0,1]^d$ with boundaries identified, then (1) represents how much area (in dimension $d = 2$), or volume (in dimension $d$ greater than 2), of the cube is "visited" by the process.


The asymptotic rates of convergence of (1), as we will see in Chapters 3 and 4, depend on $H$ and on the dimension of the ambient space. Problem (1) has recently been studied by Feng-Yu Wang and Jie-Xiang Zhu in [WZ19]. They showed that if $X = (X_t)_{t \in [0,T]}$ is a Markovian Gaussian process taking values in the $d$-dimensional torus then, naming $h_T$ the occupation measure of $X$, it holds for $T \to \infty$ that
$$\mathbb{E}\left[ W_2\left( \tfrac{1}{T} h_T, \mathcal{L}^d_{\mathbb{T}^d} \right)^2 \right] \sim \begin{cases} \frac{1}{T} & \text{if } d \le 3 \\ \frac{\log T}{T} & \text{if } d = 4 \\ T^{-\frac{2}{d-2}} & \text{if } d \ge 5 \end{cases}, \qquad (2)$$
where by the symbol $\sim$ we mean that $f(T) \sim g(T)$ if and only if there exist two positive constants $a, b$ such that $a\, g(T) \le f(T) \le b\, g(T)$ as $T \to \infty$.

As we will see in Section 1.4, $W_1(\mu, \nu) \le W_p(\mu, \nu)$ for any $p \ge 1$ and $\mu, \nu \in \mathcal{P}(\mathbb{T}^d)$, and since the total mass of $\mu_T$ is $T$, in our setting equation (2) reads, for $T \to \infty$,
$$\mathbb{E}\left[ W_1\left( \mu_T, T \mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim \begin{cases} \sqrt{T} & \text{if } d \le 3 \\ \sqrt{T \log T} & \text{if } d = 4 \\ T \cdot T^{-\frac{1}{d-2}} & \text{if } d \ge 5 \end{cases}. \qquad (3)$$
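For the upper-bound direction, the passage from (2) to (3) can be sketched by combining the scaling of $W_1$ with the total mass, the inequality $W_1 \le W_2$ from Section 1.4, and Jensen's inequality:

```latex
\mathbb{E}\left[W_1\!\left(\mu_T, T\mathcal{L}^d_{\mathbb{T}^d}\right)\right]
  = T\,\mathbb{E}\left[W_1\!\left(\tfrac{1}{T}\mu_T, \mathcal{L}^d_{\mathbb{T}^d}\right)\right]
  \le T\,\mathbb{E}\left[W_2\!\left(\tfrac{1}{T}\mu_T, \mathcal{L}^d_{\mathbb{T}^d}\right)\right]
  \le T\,\mathbb{E}\left[W_2\!\left(\tfrac{1}{T}\mu_T, \mathcal{L}^d_{\mathbb{T}^d}\right)^{2}\right]^{1/2}.
```

Plugging in the rates of (2) gives, for instance, $T \cdot (T^{-\frac{2}{d-2}})^{1/2} = T \cdot T^{-\frac{1}{d-2}}$ in dimension $d \ge 5$, as in (3).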

Therefore, in the case $H = 1/2$, since $B^{1/2}$ is a Brownian motion, and thus a Markovian Gaussian process, we expect to get the same rates as in (3).

Remark 1. To be more precise, in [WZ19] the authors consider a Gaussian process $X$ taking values in a $d$-dimensional connected compact Riemannian manifold $(M, \rho)$ endowed with a measure $\mu$ admitting a density with respect to the Lebesgue measure of the form $d\mu(x) = e^{V(x)}\, dx$, where $V \in C^2(M)$ and $\mu$ is the invariant measure for the process. In particular, calling $h_T$ the occupation measure of $X$, the rates in (2) become
$$\mathbb{E}\left[ W_2\left( \tfrac{1}{T} h_T, \mu \right)^2 \right] \lesssim \begin{cases} \frac{1}{T} & \text{if } d \le 3 \\ \frac{\log T}{T} & \text{if } d = 4 \\ T^{-\frac{2}{d-2}} & \text{if } d \ge 5 \end{cases}.$$
Moreover, they showed that in dimensions $d \le 3$ and $d \ge 5$ the asymptotic upper bound is also an asymptotic lower bound. At the time of writing it is still an open problem whether the upper bound obtained in dimension 4 is also a lower bound in the general case of a connected compact Riemannian manifold. However, it has been shown that the asymptotic upper bound rate in dimension 4 is also an asymptotic lower bound rate in the case $M = \mathbb{T}^4$ and $V \equiv 0$, that is, the 4-torus endowed with the restricted Lebesgue measure $\mathcal{L}^4_{\mathbb{T}^4}$.

The techniques used in [WZ19] rely strongly on the existence of an infinitesimal generator and on additional hypotheses on it, such as ultracontractivity. However, whenever $H \neq 1/2$ the fractional Brownian motion $B^H$ is not a Markov process, so we need to tackle the problem in a different way. The idea behind the techniques we are going to use is to consider a modified version of the occupation measure that is close, in the 1-Wasserstein sense, to the original occupation measure, but that admits a density with respect to the Lebesgue measure. Since the occupation measure itself need not admit such a density, we smooth the starting measure, most naturally by convolution with a regularising kernel. Once a density is guaranteed, it will play the role that the infinitesimal generator played in the case of Markovian Gaussian processes, and we also make use of explicit Gaussian computations. These arguments will be developed in Chapter 3 and then refined to achieve a lower bound in Chapter 4.


0.2 Main Result

As in (3), we expect three different rates for (1) depending on $d$. In the Markovian Gaussian case the turning point was $d = 4$; now it is replaced by a threshold depending on $H$. The main result, giving the asymptotic rates of convergence of (1), can be summarized as follows.

Theorem 1. Let $B^H = (B^H_t)_{t \in [0,T]}$ be a fractional Brownian motion with Hurst index $H \in (0, 1)$. Then for $T \to \infty$:

1. If $d < \frac{1}{H} + 2$, then $\mathbb{E}\left[ W_1\left( \mu_T, T\mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim \sqrt{T}$.

2. If $d = \frac{1}{H} + 2$, then $\mathbb{E}\left[ W_1\left( \mu_T, T\mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim \sqrt{T \log T}$.

3. If $d > \frac{1}{H} + 2$, then $\mathbb{E}\left[ W_1\left( \mu_T, T\mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim T \cdot T^{-\frac{H}{dH-1}}$.

Figure 1: Plot of d = 1/H + 2 for different values of H.
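The case distinction in Theorem 1 can be encoded in a tiny helper (a hypothetical function of ours, purely for illustration, not part of the thesis):

```python
def rate_regime(d, H):
    """Return the asymptotic rate of E[W1(mu_T, T L^d)] predicted by Theorem 1."""
    critical = 1 / H + 2          # the critical dimension d = 1/H + 2
    if d < critical:
        return "sqrt(T)"
    if d == critical:
        return "sqrt(T log T)"
    return "T * T^(-H/(d*H - 1))"

# For H = 1/2 the critical dimension is 4, recovering the Markovian rates (3):
print(rate_regime(3, 0.5), rate_regime(4, 0.5), rate_regime(5, 0.5))
```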

As announced, Theorem 1 identifies three different behaviours of the transport cost for different values of $H$. Moreover, as we can see in Figure 1, we can divide the plane into three regions separated by the "critical" curve $d = 1/H + 2$. The upper region, where the last case of Theorem 1 holds, is the widest. This suggests that as the dimension grows, the rate of convergence becomes slower. Note that whenever $d < 1/H + 2$ the rate carries no contribution from $H$: the dependence on $H$ is only implicit, through the comparison with $d$. In this case, as we will see, the rate depends only on how close the smoothed measure we consider is to the occupation measure.

Turning back to (3), let us compare it with the result of Theorem 1. If $H = 1/2$ then $B^H$ is a standard Brownian motion, which we recall is a Markovian Gaussian process, and the critical dimension is $1/H + 2 = 4$. Then, from Theorem 1, if $d = 4$ we get
$$\mathbb{E}\left[ W_1\left( \mu_T, T\mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim \sqrt{T \log T},$$
and if $d > 4$ we get
$$\mathbb{E}\left[ W_1\left( \mu_T, T\mathcal{L}^d_{\mathbb{T}^d} \right) \right] \sim T \cdot T^{-\frac{1}{d-2}},$$
which are exactly the rates obtained in (3). Thus, our result is consistent with the results obtained in [WZ19].

Furthermore, note that if we fix the dimension $d$, then playing with the Hurst index $H$ we can get a faster rate as $H$ becomes smaller. Indeed, by choosing $H$ carefully we can move from the last asymptotic rate in Theorem 1 to the first. This property is influenced by the regularity of the


fractional Brownian motion, which depends on $H$. As can be seen in Figure 2, the less regular $B^H$ is, the more area it covers; and the more area $B^H$ covers, the closer its occupation measure is to the Lebesgue measure restricted to the $d$-torus.

(a) fBm on $\mathbb{R}^3$ with $H = 0.3$. (b) fBm on $\mathbb{R}^3$ with $H = 0.8$.

Figure 2: Simulation of fBm on R3.
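Simulations like those in Figure 2 can be reproduced with a short script. The sketch below (our illustration, not code from the thesis) samples one coordinate of fBm exactly via the Cholesky factorization of the covariance $\mathrm{Cov}(B^H_s, B^H_t) = \frac{1}{2}\left(s^{2H} + t^{2H} - |t - s|^{2H}\right)$; three independent copies give a path in $\mathbb{R}^3$.

```python
import numpy as np

def fbm_path(n_steps, H, T=1.0, rng=None):
    """Sample a fractional Brownian motion path on [0, T] with Hurst index H
    by the Cholesky method: exact in law, O(n^3), fine for small n."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.linspace(T / n_steps, T, n_steps)               # grid without t = 0
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s**(2 * H) + u**(2 * H) - np.abs(s - u)**(2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n_steps))  # tiny jitter for stability
    path = L @ rng.standard_normal(n_steps)
    # prepend B_0 = 0
    return np.concatenate(([0.0], t)), np.concatenate(([0.0], path))

rng = np.random.default_rng(0)
t, x = fbm_path(256, H=0.3, rng=rng)   # one coordinate of the rough path (Figure 2a)
_, y = fbm_path(256, H=0.8, rng=rng)   # one coordinate of the smooth path (Figure 2b)
```

Reducing each coordinate modulo 1 would give the corresponding path on the torus.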

The proof of Theorem 1 will be split into two parts: the upper bound asymptotic rates will be discussed and proved in Chapter 3, while the lower ones in Chapter 4. The techniques that will be discussed are based on a simple argument. Indeed, the main idea is to work with a smoothed version of the occupation measure, ensuring that this new measure is close enough to the original one. For instance, in the upper bound case, by a simple triangle inequality we can estimate (1) from above by studying separately the rates of the 1-Wasserstein distance between the occupation measure and its smoothed version, and between the smoothed measure and the Lebesgue measure on the $d$-torus. The first term can be estimated in a deterministic way, see Lemma 3. The second term, instead, is more delicate: it can be approached via PDE as done in [AST16], and explicit rates can be obtained by an analytical approach via Fourier analysis as in [BL19].


1 Optimal Transport

Historically, the optimal transport problem was first introduced by Monge in 1781 [Mon81]. He considered the problem of transporting a given quantity of material from a given configuration (the déblais) to another (the remblais), minimizing the energy needed to move the mass. Over the last two centuries the popularity of this problem has grown and its formulation has changed: the first step in this direction was made by Kantorovich in 1942 [Kan42], but it was Brenier in [Bre91] who paved the way towards a beautiful interplay between partial differential equations, fluid mechanics, probability theory and functional analysis.

Optimal transport also plays a key role in describing the bipartite matching problem. Indeed, the most natural way to measure the distance between a given measure and a sample measure is by means of the Wasserstein distance, hence the need to introduce this theory in the study of the matching problem.

This chapter is divided into four sections: the first one, mainly introductory, describes the family of general spaces in which we will work; the second and the third are devoted to the description of the optimal transport problem from a primal and a dual point of view; the fourth and last section introduces the Wasserstein distance, which will be our trustworthy tool to evaluate distances between measures.

The topics reviewed in this chapter are presented in a similar way to [San15] and [Vil03]. Moreover, another great reference on optimal transport theory is the monograph [Vil08].

1.1 Polish Spaces

Polish spaces are one of the most common settings in modern probability theory. Their usefulness derives from their simple, yet general, definition, which allows one to work with powerful results. The main one, which will allow us to obtain an existence result for the optimal transport problem, is Prokhorov's Theorem. This Theorem provides a handy characterization of sequential compactness.

Let us start with some well-known results. Let (X, d) be a metric space.

Definition 1. A sequence $(x_n)_{n \in \mathbb{N}}$ is called a Cauchy sequence if for every $\varepsilon > 0$ there exists $N \in \mathbb{N}$ such that
$$d(x_m, x_n) < \varepsilon \quad \text{for every } m, n \ge N.$$


We say that a metric space $(X, d)$ is complete if every Cauchy sequence admits a limit belonging to $X$. Recall that for every metric space $(X, d)$ there exists a complete metric space $(\hat{X}, \hat{d})$ such that $(X, d)$ is a subspace of $(\hat{X}, \hat{d})$ and $X$ is dense in $\hat{X}$. This space is unique, up to isometry, and is called the completion of $(X, d)$. Clearly, $\hat{X}$ is separable if and only if $X$ is separable.

Definition 2. A topological space $(X, \tau)$ is completely metrizable if it admits a compatible metric $d$ such that $(X, d)$ is complete. A separable, completely metrizable space is said to be a Polish space. In other words, a Polish space is a separable topological space that admits a metric with respect to which it is complete.

Example 1. The space $(0, 1)$ endowed with its usual topology is a Polish space, even though, when endowed with its usual metric, it is not complete. Indeed, consider on $(0, 1)$ the distance
$$d(x, y) = |x - y|,$$
and the sequence $(x_n)_{n \in \mathbb{N}}$ with $x_n = 1 - \frac{1}{n}$. Note that $x_n \in (0, 1)$ for every $n$. Fix $\varepsilon > 0$ and let $N$ be such that $1/N < \varepsilon/2$; then for every $m, n \ge N$,
$$d(x_n, x_m) = \left| \left(1 - \tfrac{1}{n}\right) - \left(1 - \tfrac{1}{m}\right) \right| \le \frac{1}{n} + \frac{1}{m} < \varepsilon,$$
hence $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence. On the other hand,
$$\lim_{n \to \infty} x_n = 1 \notin (0, 1),$$
so $(0, 1)$ endowed with the previous metric is not complete. Now consider the function
$$f \colon (0, 1) \to \left( -\tfrac{\pi}{2}, \tfrac{\pi}{2} \right), \qquad t \mapsto -\tfrac{\pi}{2} + t\pi,$$
and let $g(t) = \tan(f(t))$. Note that $g \colon (0, 1) \to \mathbb{R}$ is a homeomorphism between $(0, 1)$ and $\mathbb{R}$, and the latter is complete with its usual metric. Thus, $(0, 1)$ is complete when endowed with the distance
$$d_g(x, y) = |g(x) - g(y)|.$$
So $(0, 1)$ is a Polish space.
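The two metrics in Example 1 can also be compared numerically. The sketch below (our illustration, not part of the thesis) shows that the sequence $x_n = 1 - 1/n$ has small tails in the usual distance but not in $d_g$, since $g(x_n) \to \infty$:

```python
import math

def g(t):
    # the homeomorphism (0, 1) -> R from Example 1: g(t) = tan(-pi/2 + t*pi)
    return math.tan(-math.pi / 2 + t * math.pi)

def d_usual(x, y):
    return abs(x - y)

def d_g(x, y):
    return abs(g(x) - g(y))

def x_n(n):
    return 1 - 1 / n

small = d_usual(x_n(1000), x_n(2000))   # Cauchy tail in the usual metric
large = d_g(x_n(1000), x_n(2000))       # but far apart in the complete metric d_g
print(small, large)
```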

Even from this easy example, it should be clear that many well-behaved spaces are Polish. Moreover, their importance in measure theory is due to an explicit characterization of precompactness for a family of probability measures. Before stating the key compactness result in Polish spaces, the so-called Prokhorov's Theorem, let us recall some definitions.

Definition 3 (Weak Convergence). Let $(X, \tau)$ be a Polish space. We say that a sequence of probability measures $(\mu_n)_{n \in \mathbb{N}} \subset \mathcal{P}(X)$ weakly converges to a probability measure $\mu$ if
$$\mu_n(f) \to \mu(f)$$
for every bounded and continuous $f \colon X \to \mathbb{R}$. We will use the notation $\mu_n \rightharpoonup \mu$ to say that $\mu_n$ weakly converges to $\mu$.


In the definition above we used the notation
$$\mu(f) = \int_X f(x) \, d\mu(x).$$

We now introduce the concept of tightness for a family of probability measures. What Prokhorov's Theorem states is that, in a Polish space, this property is equivalent to precompactness.

Definition 4 (Tightness). Let $(\mu_i)_{i \in I}$ be a family of probability measures on the Polish space $(X, \tau)$. If for every $\varepsilon > 0$ there exists a compact $K \subset X$ such that the inequality $\mu_i(K) \ge 1 - \varepsilon$ holds for every $i \in I$, then we say that the family $(\mu_i)_{i \in I}$ is tight.

Example 2. Every finite family of probability measures is tight. Indeed, let $\mu_1, \ldots, \mu_n$ be the finite family of probability measures; since each single probability measure is tight, for every $i$ there exists a compact set $K_i$ such that $\mu_i(K_i) \ge 1 - \varepsilon$. Setting $K = K_1 \cup \cdots \cup K_n$ yields the thesis.

Theorem 2 (Prokhorov's Theorem). Let $(X, \tau)$ be a Polish space and let $(\mu_i)_{i \in I} \subset \mathcal{P}(X)$. Then the following are equivalent:

1. The family of probability measures $(\mu_i)_{i \in I}$ is precompact in the space $\mathcal{P}(X)$ equipped with the weak topology.

2. The family of probability measures $(\mu_i)_{i \in I}$ is tight.

This well-known result can be found in many books; for instance see [Shi96], where an easy proof in the case of $X$ replaced by the real line is provided. This proof can easily be tailored to the case of Polish spaces. A deeper study of Polish spaces and Prokhorov's Theorem can be found in Chapters 6 and 7 of the monograph [Bog07a], where a detailed survey of measure theory is given.

1.2 Monge and Kantorovich Formulations

As said at the very beginning of the chapter, the starting point of optimal transportation theory was Monge's problem, which can be stated as follows.

Problem 1 (Monge Formulation). Given two probability measures $\mu \in \mathcal{P}(X)$ and $\nu \in \mathcal{P}(Y)$, and a cost function $c \colon X \times Y \to [0, +\infty]$, solve
$$\inf \left\{ M(T) = \int_X c(x, T(x)) \, d\mu(x) : T \colon X \to Y, \ T_\# \mu = \nu \right\},$$
where we recall that $T_\# \mu$ is defined through $(T_\# \mu)(A) = \mu\left(T^{-1}(A)\right)$ for every Borel set $A$. We call $T_\# \mu$ the image measure of $\mu$ through $T$; sometimes $T_\# \mu$ is also called the pushforward of $\mu$ through $T$.

The formulation made by Monge has some issues due to the constraint on the map T . A generalization of the Monge problem was proposed by Kantorovich in [Kan42].

Problem 2 (Kantorovich Formulation). Given $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and $c \colon X \times Y \to \mathbb{R}$, consider the following minimization problem:
$$\inf \left\{ K(\gamma) = \int_{X \times Y} c \, d\gamma : \gamma \in \Pi(\mu, \nu) \right\}, \qquad (1.1)$$
where $\Pi(\mu, \nu)$ is the set of transport plans, namely
$$\Pi(\mu, \nu) = \left\{ \gamma \in \mathcal{P}(X \times Y) : (\pi_x)_\# \gamma = \mu, \ (\pi_y)_\# \gamma = \nu \right\},$$
where $\pi_x$ and $\pi_y$ are the projections of $X \times Y$ onto $X$ and $Y$ respectively. A minimizer of the Kantorovich problem is called an optimal transport plan.

Note that the infimum of the Kantorovich problem is at most that of Monge's. Indeed, if $T \colon X \to Y$ is a transport map, we can define a canonical map from transport maps to transport plans as
$$T \mapsto \gamma_T,$$
where $\gamma_T = (\mathrm{id} \times T)_\# \mu$. In particular $\gamma_T$ is a transport plan and
$$\int_X c(x, T(x)) \, d\mu(x) = \int_{X \times Y} c(x, y) \, d\gamma_T(x, y).$$
Therefore the infimum of the Kantorovich problem is always smaller than or equal to that of the Monge problem. The Kantorovich formulation is also handier than Monge's: the constraints coming from working with transport maps are replaced by the milder ones of working with transport plans. Here is an intuitive interpretation of this statement. As we said in the introduction, the Monge problem deals with transporting masses from one place to another, fixing the destination $T(x)$ of each particle starting from the point $x$. The Kantorovich formulation, instead, is not focused on the displacement of each particle: this time we specify, for every pair $(x, y)$, how much mass goes from $x$ to $y$. More precisely, the quantity $\gamma(A \times B)$ denotes the amount of mass that moves from $A$ to $B$.
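On finite spaces the Kantorovich problem (1.1) is a linear program over the polytope $\Pi(\mu, \nu)$, so it can be solved with any LP solver. A minimal sketch (our illustration with made-up data, assuming `scipy` is available):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny discrete Kantorovich problem (illustrative data, not from the thesis):
# three sources and three targets on the line, quadratic cost c(x, y) = (x - y)^2.
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([0.5, 1.5, 2.5])
mu = np.array([0.2, 0.5, 0.3])    # source marginal
nu = np.array([0.4, 0.4, 0.2])    # target marginal

C = (xs[:, None] - ys[None, :]) ** 2
n, m = C.shape

# Equality constraints encoding the marginals: row sums = mu, column sums = nu.
rows = []
for i in range(n):                 # (pi_x)_# gamma = mu
    r = np.zeros((n, m)); r[i, :] = 1.0; rows.append(r.ravel())
for j in range(m):                 # (pi_y)_# gamma = nu
    r = np.zeros((n, m)); r[:, j] = 1.0; rows.append(r.ravel())
A_eq = np.array(rows)
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
gamma = res.x.reshape(n, m)       # an optimal transport plan
# With this data every unit of mass moves by 0.5, so the optimal cost is 0.25.
print(res.fun)
```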

We now focus on the well-posedness of the Kantorovich problem, providing sufficient assumptions under which the problem
$$I_c(\mu, \nu) = \inf_{\gamma \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y) \, d\gamma(x, y)$$
admits a solution. A first existence theorem can be stated as follows.

Theorem 3. Let $X$ and $Y$ be compact metric spaces, $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and $c \colon X \times Y \to \mathbb{R}$ a continuous function. Then the infimum $I_c(\mu, \nu)$ is finite and attained.

The assumptions of Theorem 3 are quite strong, so we put this theorem aside and provide an existence theorem for the case of $X, Y$ being two Polish spaces and $c$ being lower semicontinuous. Essentially, the only hypotheses that we need to borrow from Theorem 3 are the compactness of $\Pi(\mu, \nu)$ and the lower semicontinuity of the functional $K(\gamma)$: by Weierstrass's Theorem, a lower semicontinuous functional attains its minimum on a compact set, so the infimum $I_c(\mu, \nu)$ is attained. Fortunately, the semicontinuity of $K(\gamma)$ is inherited from the semicontinuity of the cost function $c$.

Theorem 4. Let $X$ and $Y$ be two Polish spaces and $c \colon X \times Y \to [0, \infty]$ a lower semicontinuous cost function. Then $K(\gamma)$ is lower semicontinuous with respect to the weak topology on $\mathcal{P}(X \times Y)$.

Proof. Let $d_X$ and $d_Y$ be two distances compatible with the topologies of $X$ and $Y$ respectively. For every $k \in \mathbb{N}$ set
$$c_k(x, y) = \inf_{x' \in X, \, y' \in Y} \left\{ \min\left( c(x', y'), k \right) + k \left( d_X(x, x') + d_Y(y, y') \right) \right\}.$$


Then for each $k \in \mathbb{N}$, $c_k$ is a continuous function bounded by $k$, and the sequence $(c_k)_{k \in \mathbb{N}}$ converges monotonically from below to $c$. Now consider a sequence $(\gamma_n)_{n \in \mathbb{N}} \subset \mathcal{P}(X \times Y)$ and $\gamma \in \mathcal{P}(X \times Y)$ such that $\gamma_n \rightharpoonup \gamma$. Then
$$\liminf_{n \to \infty} K(\gamma_n) = \liminf_{n \to \infty} \int_{X \times Y} c \, d\gamma_n \ge \liminf_{n \to \infty} \int_{X \times Y} c_k \, d\gamma_n = \int_{X \times Y} c_k \, d\gamma.$$
By the monotone convergence Theorem, we have
$$\liminf_{n \to \infty} K(\gamma_n) \ge \lim_{k \to \infty} \int_{X \times Y} c_k \, d\gamma = \int_{X \times Y} c \, d\gamma = K(\gamma),$$
which is the semicontinuity of the functional $K(\gamma)$.

It remains to show that the set $\Pi(\mu, \nu)$ is compact with respect to the weak topology; this can easily be achieved thanks to Prokhorov's Theorem.

Theorem 5. Let $X$ and $Y$ be two Polish spaces and let $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$. Then $\Pi(\mu, \nu)$ is compact in $\mathcal{P}(X \times Y)$ with respect to the weak topology.

Proof. Thanks to Theorem 2, it is enough to show the tightness of the family. By the converse implication of Theorem 2, every single measure is tight, hence for every $\varepsilon > 0$ there exist two compact sets $K_X \subset X$ and $K_Y \subset Y$ such that
$$\mu(X \setminus K_X) < \frac{\varepsilon}{2}, \qquad \nu(Y \setminus K_Y) < \frac{\varepsilon}{2}.$$
Since the product of two compact sets is compact, the set $K_X \times K_Y$ is compact in $X \times Y$ and, for any $\gamma \in \Pi(\mu, \nu)$, we have
$$\gamma\left( (X \times Y) \setminus (K_X \times K_Y) \right) \le \gamma\left( (X \setminus K_X) \times Y \right) + \gamma\left( X \times (Y \setminus K_Y) \right) = \mu(X \setminus K_X) + \nu(Y \setminus K_Y) < \varepsilon.$$

Given these two results, we are now ready to state the existence result for the Kantorovich problem.

Theorem 6. Let $X$ and $Y$ be two Polish spaces, $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and $c \colon X \times Y \to [0, +\infty]$ a lower semicontinuous function. Then the infimum $I_c(\mu, \nu)$ is attained.

Proof. Thanks to Theorem 4 the functional $K(\gamma)$ is lower semicontinuous; furthermore, thanks to Theorem 5, $\Pi(\mu, \nu)$ is compact with respect to the weak topology. Thus, by Weierstrass's Theorem, the infimum $I_c(\mu, \nu)$ is attained.

1.3 Duality Theory

It is well known that a linear minimization problem with convex constraints, like (1.1), admits a dual formulation. Essentially, this provides a new point of view on the problem and a handier formulation to work with. This section is devoted to proving the following Theorem.

Theorem. If $X, Y$ are two Polish spaces and $c \colon X \times Y \to \mathbb{R} \cup \{+\infty\}$ is lower semicontinuous and bounded from below, then the Kantorovich duality formula holds:
$$\min \left\{ \int_{X \times Y} c \, d\gamma : \gamma \in \Pi(\mu, \nu) \right\} = \sup \left\{ \int_X \varphi \, d\mu + \int_Y \psi \, d\nu : \varphi \in C_b(X), \, \psi \in C_b(Y), \, \varphi \oplus \psi \le c \right\},$$
where $\varphi \oplus \psi$ denotes the function $(\varphi \oplus \psi)(x, y) := \varphi(x) + \psi(y)$.


To this end we will introduce the theory of $c$-duality. Before going on, we give an informal interpretation of the Theorem above, due to L. Caffarelli and quoted by C. Villani in [Vil03].

Problem (The shipper's problem). Suppose that an entrepreneur, who is both an industrialist and a mathematician, needs to transfer a huge amount of coal from his mines to his factories. He can hire trucks to carry out this transportation, but he has to pay $c(x, y)$ for each ton of coal transported from place $x$ to place $y$. The amount of coal that he can extract from each mine, and the amount each factory should receive, are fixed. As he is trying to solve the associated Kantorovich problem in order to minimize the price he has to pay, another mathematician pops up and tells him: "My friend, let me handle this for you: I will ship all your coal with my own trucks and you won't have to worry about what goes where. I will just set a price $\varphi(x)$ for loading one ton of coal at place $x$, and a price $\psi(y)$ for unloading it at destination $y$. I will set the prices in such a way that your financial interest will be to let me handle all your transportation! Indeed, you can check very easily that for any $x$ and $y$, the sum $\varphi(x) + \psi(y)$ will always be less than the cost $c(x, y)$ (in order to achieve this goal, I am even ready to give financial compensation for some places, in the form of negative prices!)." Clearly the entrepreneur accepts the deal. Now, what the Kantorovich duality formula tells us is that if the shipper is clever enough, then he can arrange the prices so that the entrepreneur will pay him (almost) as much as he would have been ready to spend by the other method.

Given these premises, we are now ready to go into the technical aspects of the duality formula above; we start by introducing the $c$-conjugate of a real-valued function.

Definition 5 ($c$-conjugate). Given a function $\chi \colon X \to \mathbb{R}$, we define its $c$-conjugate (sometimes called $c$-transform) as the function $\chi^c \colon Y \to \mathbb{R}$ defined by
$$\chi^c(y) = \inf_{x \in X} c(x, y) - \chi(x).$$
Given a function $\zeta \colon Y \to \mathbb{R}$, we define its $\bar{c}$-conjugate as the function $\zeta^{\bar{c}} \colon X \to \mathbb{R}$ defined by
$$\zeta^{\bar{c}}(x) = \inf_{y \in Y} c(x, y) - \zeta(y).$$
We will say that a function $\psi$ defined on $Y$ is $\bar{c}$-concave if there exists $\chi$ such that $\psi = \chi^c$; analogously, we will say that a function $\varphi$ defined on $X$ is $c$-concave if there exists a function $\zeta$ such that $\varphi = \zeta^{\bar{c}}$.

We denote by $c\text{-conc}(X)$ and $\bar{c}\text{-conc}(Y)$ respectively the sets of $c$-concave and $\bar{c}$-concave functions.

Note that whenever $X = Y$ and the cost function $c$ is symmetric, there is no distinction between $c$-concavity and $\bar{c}$-concavity.

Let us now introduce a dual problem for the Kantorovich problem and investigate the relationship between the two. Firstly, by means of an inf–sup exchange, let us state a dual problem formally. Let $\mathcal{M}_+(X \times Y)$ be the space of finite positive measures on $X \times Y$. We express the constraint $\gamma \in \Pi(\mu, \nu)$ as follows: if $\gamma \in \mathcal{M}_+(X \times Y)$, then
$$\sup_{\varphi, \psi} \int_X \varphi(x) \, d\mu(x) + \int_Y \psi(y) \, d\nu(y) - \int_{X \times Y} (\varphi(x) + \psi(y)) \, d\gamma(x, y) = \begin{cases} 0 & \text{if } \gamma \in \Pi(\mu, \nu) \\ +\infty & \text{otherwise} \end{cases},$$
where the sup is taken among continuous and bounded functions $\varphi, \psi$. Thus, in the formulation of the Kantorovich problem, the constraints on $\gamma$ can be removed if we add the previous sup.


Indeed, if $\gamma \in \Pi(\mu, \nu)$ nothing has been added; otherwise we get $+\infty$, which is avoided by the minimization. So we now look at the problem
$$\inf_\gamma \int_{X \times Y} c \, d\gamma + \sup_{\varphi, \psi} \left( \int_X \varphi \, d\mu + \int_Y \psi \, d\nu - \int_{X \times Y} (\varphi \oplus \psi) \, d\gamma \right),$$
and consider swapping inf and sup, thus getting
$$\sup_{\varphi, \psi} \int_X \varphi \, d\mu + \int_Y \psi \, d\nu + \inf_\gamma \int_{X \times Y} (c - \varphi \oplus \psi) \, d\gamma.$$
We would like the two optimization problems above to coincide. This is not always true, but the equality holds under suitable assumptions. If we now focus on the maximization over $(\varphi, \psi)$, we can replace the inf in $\gamma$ with a constraint on $\varphi$ and $\psi$:
$$\inf_{\gamma \ge 0} \int_{X \times Y} (c - \varphi \oplus \psi) \, d\gamma = \begin{cases} 0 & \text{if } \varphi \oplus \psi \le c \text{ on } X \times Y \\ -\infty & \text{otherwise} \end{cases},$$
where $\varphi \oplus \psi$ denotes the function $(\varphi \oplus \psi)(x, y) := \varphi(x) + \psi(y)$. The equality holds: indeed, if $\varphi \oplus \psi > c$ somewhere, consider measures $\gamma$ concentrated on the set where this strict inequality holds, with mass tending to $\infty$; then the integral tends to $-\infty$. These remarks allow us to introduce the following dual optimization problem.

Problem 3 (Dual Problem). Given $\mu \in \mathcal{P}(X)$, $\nu \in \mathcal{P}(Y)$, and a cost function $c \colon X \times Y \to [0, +\infty)$, consider the problem
$$\max \left\{ \int_X \varphi \, d\mu + \int_Y \psi \, d\nu : \varphi \in C_b(X), \, \psi \in C_b(Y), \, \varphi \oplus \psi \le c \right\}. \qquad (1.2)$$

Note that the sup of the dual problem (1.2) is less than or equal to the min of the Kantorovich problem. Indeed, integrating the condition $\varphi \oplus \psi \le c$ against $\gamma$, we obtain
$$\int_X \varphi \, d\mu + \int_Y \psi \, d\nu = \int_{X \times Y} (\varphi \oplus \psi) \, d\gamma \le \int_{X \times Y} c \, d\gamma.$$
The latter holds for every admissible pair $(\varphi, \psi)$ and for every admissible $\gamma$, hence the inequality $(1.2) \le (1.1)$.

We now focus on an equivalent formulation of the problem above. In particular, it can be shown that the solutions to (1.2) can be taken of the form $(\varphi, \psi)$, where $\varphi$ is $c$-concave and $\psi$ is $\bar{c}$-concave. This relation will give an equivalent formulation, which is particularly convenient whenever the cost $c$ is a distance function.

Before we state the result, let us outline some remarks. Firstly, the notion of $c$-concavity implies a bound on the modulus of continuity. Recall the following result.

Proposition 1. Let $(f_\alpha)_\alpha$ be a family of functions defined on a metric space $(X, d)$, satisfying the condition
$$|f_\alpha(x) - f_\alpha(y)| \le \omega(d(x, y)).$$
Let $f := \inf_\alpha f_\alpha$. Then $f$ satisfies the same estimate.

Proof. By hypothesis $f_\alpha(x) - f_\alpha(y) \le \omega(d(x, y))$, hence
$$f(x) \le f_\alpha(x) \le f_\alpha(y) + \omega(d(x, y)). \qquad (1.3)$$
Taking the inf over $\alpha$ on the right-hand side and reordering, we get
$$f(x) - f(y) \le \omega(d(x, y)),$$
and the thesis follows by exchanging the roles of $x$ and $y$ in (1.3).


In our setting, if $c$ is continuous and finite on a compact set, hence uniformly continuous, there exists an increasing function $\omega \colon \mathbb{R}_+ \to \mathbb{R}_+$ with $\omega(0) = 0$ such that
$$|c(x, y) - c(t, z)| \le \omega(d(x, t) + d(y, z)).$$
So if we write $\chi^c(y) = \inf_x g_x(y)$ with $g_x(y) := c(x, y) - \chi(x)$, the functions $g_x$ satisfy $|g_x(y) - g_x(z)| \le \omega(d(y, z))$. Thus, by Proposition 1, $\chi^c$ shares the same continuity modulus.

Given a pair $(\varphi, \psi)$ in the maximization problem (1.2), one can prove that the pair can always be replaced with $(\varphi, \varphi^c)$ and then with $(\varphi^{c\bar{c}}, \varphi^c)$: the constraints are preserved and the integrals do not decrease. Actually we could go on, but we will show that $\varphi^{c\bar{c}c} = \varphi^c$ for any $\varphi$. The goal of these transformations is to improve the maximizing sequence and to get a uniform bound on its continuity.

Before going on we recall the Ascoli-Arzelà Theorem.

Theorem 7 (Ascoli–Arzelà Theorem). Let $X$ be a compact metric space and $f_n \colon X \to \mathbb{R}$, $n \in \mathbb{N}$, a family of functions such that

• the family is equicontinuous, i.e., for every $\varepsilon > 0$ there exists a common $\delta > 0$ such that $|f_n(x) - f_n(y)| < \varepsilon$ for every $x, y \in X$ with $d(x, y) < \delta$ and for all $n \in \mathbb{N}$;

• the family is equibounded, i.e., there exists a common constant $K$ such that $|f_n(x)| \le K$ for all $x \in X$ and all $n \in \mathbb{N}$.

Then the sequence $(f_n)_{n \in \mathbb{N}}$ admits a subsequence $(f_{n_k})_{k \in \mathbb{N}}$ converging uniformly to a function $f \colon X \to \mathbb{R}$.

To keep the reading fluent we omit the proof of this well-known result; we just point to the evergreen [Rud86], where a proof is given.

Proposition 2. Let $X, Y$ be two compact Polish spaces and suppose that $c \colon X \times Y \to \mathbb{R}$ is a continuous function. Then there exists a solution $(\varphi, \psi)$ to the dual problem (1.2), and the solution pair is such that $\varphi$ is $c$-concave and $\psi$ is $\bar{c}$-concave, with $\psi = \varphi^c$. In particular, we can write
$$\max \left\{ \int_X \varphi \, d\mu + \int_Y \psi \, d\nu : \varphi \in C_b(X), \, \psi \in C_b(Y), \, \varphi \oplus \psi \le c \right\} = \max_{\varphi \in c\text{-conc}(X)} \int_X \varphi \, d\mu + \int_Y \varphi^c \, d\nu.$$

Proof. Given a maximizing sequence $(\varphi_n, \psi_n)$, we can improve it by means of $c$- and $\bar{c}$-transforms. By doing this, we obtain a uniform bound on the modulus of continuity, so that the modified sequence is equicontinuous. Let us keep the notation $(\varphi_n, \psi_n)$ for the sequence improved by means of the $c$- and $\bar{c}$-transforms. If the maximizing sequence $(\varphi_n, \psi_n)$ is equibounded, then by the Ascoli–Arzelà Theorem there exists a converging subsequence. Since the functions $\varphi_n$ are continuous on a compact space, they are bounded, and we may suppose without loss of generality that $\min \varphi_n = 0$: indeed, it is possible to add a constant to $\varphi_n$ and subtract the same constant from $\psi_n$; in this way the value of the functional does not change and the constraints are not affected. Hence, we get $\max \varphi_n \le \omega(\operatorname{diam} X)$. Thus, having chosen $\psi_n = \varphi_n^c$, we have
$$\psi_n(y) = \inf_x c(x, y) - \varphi_n(x) \in \left[ \min c - \omega(\operatorname{diam} X), \, \max c \right].$$
We have therefore obtained uniform bounds on $\varphi_n$ and $\psi_n$, and so we can apply the Ascoli–Arzelà Theorem. Passing to the uniformly converging subsequence, we can state that $\varphi_n \to \varphi$ and $\psi_n \to \psi$ as $n \to \infty$. Moreover, thanks to the uniform convergence,
$$\int_X \varphi_n \, d\mu + \int_Y \psi_n \, d\nu \to \int_X \varphi \, d\mu + \int_Y \psi \, d\nu$$


as $n \to \infty$. Furthermore, passing to the limit in the constraint,
$$\varphi_n(x) + \psi_n(y) \le c(x, y) \implies \varphi(x) + \psi(y) \le c(x, y).$$
Hence $(\varphi, \psi)$ is an admissible pair for the dual problem (1.2), and so it is optimal.

Proposition 3. Suppose that $c$ is a real-valued function. For every $\varphi \colon X \to \mathbb{R} \cup \{-\infty\}$ we have $\varphi^{c\bar{c}} \ge \varphi$. Moreover, the equality $\varphi^{c\bar{c}} = \varphi$ holds if and only if $\varphi$ is $c$-concave; in general, $\varphi^{c\bar{c}}$ is the least $c$-concave function greater than $\varphi$.

Proof. Let us start by verifying that ϕ^{c¯c} ≥ ϕ. By using the definition, we get
\[
\varphi^{c\bar c}(x) = \inf_y \, c(x, y) - \varphi^c(y) = \inf_y \Big( c(x, y) - \inf_{\tilde x} \big( c(\tilde x, y) - \varphi(\tilde x) \big) \Big).
\]
Now,
\[
\inf_{\tilde x} \, c(\tilde x, y) - \varphi(\tilde x) \le c(x, y) - \varphi(x),
\]
so
\[
\varphi^{c\bar c}(x) \ge \inf_y \, c(x, y) - c(x, y) + \varphi(x) = \varphi(x).
\]
Analogously, we have ζ^{¯cc} ≥ ζ for every ζ: Y → R ∪ {−∞}.

Finally, let us show that ϕ^{c¯c} = ϕ if and only if ϕ is c-concave. If ϕ is c-concave, we can write ϕ = ζ^{¯c} for some ζ, and then
\[
\varphi = \zeta^{\bar c} \;\Rightarrow\; \varphi^{c} = \zeta^{\bar c c} \ge \zeta \;\Rightarrow\; \varphi^{c\bar c} \le \zeta^{\bar c} = \varphi.
\]
The last inequality is obtained by noting that the c- and ¯c-transforms reverse inequalities between functions. This shows that in this case ϕ^{c¯c} ≤ ϕ, and so ϕ^{c¯c} = ϕ.

Now let us verify that for every ϕ, the function ϕ^{c¯c} is the least c-concave function greater than ϕ. Let ϕ̃ = χ^{¯c} be a c-concave function and let us suppose that ϕ̃ ≥ ϕ. Then
\[
\chi^{\bar c} \ge \varphi \;\Rightarrow\; \chi^{\bar c c} \le \varphi^{c} \;\Rightarrow\; \chi \le \varphi^{c} \;\Rightarrow\; \tilde\varphi = \chi^{\bar c} \ge \varphi^{c \bar c},
\]
which is the inequality stated by the proposition.

Definition 6. Let c: X × Y → R ∪ {+∞} be a given function. A Borel set Γ ⊂ X × Y is c-cyclically monotone if for every n ∈ N, every permutation σ ∈ Sn and every family of pairs (x1, y1), . . . , (xn, yn) ∈ Γ we have
\[
\sum_{i=1}^{n} c(x_i, y_i) \le \sum_{i=1}^{n} c\big(x_i, y_{\sigma(i)}\big).
\]

Note that since every σ ∈ Sn is a disjoint composition of cycles, the inequality above can

be checked only for cyclic permutations, i.e. for σ̃(i) = i + 1 with the convention yn+1 = y1. Definition 6 plays a key role in the characterisation of the support of the optimal transport plan: it can be shown that a transport plan γ is a minimizer for the Kantorovich problem if and only if its support is a c-cyclically monotone set. The next results aim to prove this equivalence.
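For the quadratic cost c(x, y) = |x − y|² on the line, the monotone coupling (sorted x's paired with sorted y's) is known to be optimal, so by the equivalence above its support must be c-cyclically monotone; for a handful of points this can be verified by brute force over all permutations. A small illustrative check (data chosen arbitrarily):

```python
import itertools

# support of the monotone coupling: i-th smallest x paired with i-th smallest y
xs = sorted([0.1, 0.7, 0.3, 0.9])
ys = sorted([0.2, 0.5, 0.8, 1.0])
c = lambda x, y: (x - y) ** 2  # quadratic cost

base = sum(c(x, y) for x, y in zip(xs, ys))
# c-cyclical monotonicity: no permutation sigma of the y's improves the total cost
assert all(
    base <= sum(c(x, ys[j]) for x, j in zip(xs, sigma)) + 1e-12
    for sigma in itertools.permutations(range(len(ys)))
)
```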

Theorem 8. If c: X × Y → R and Γ 6= ∅ is a c-cyclically monotone set in X × Y , then there

exists a c-concave function ϕ: X → R ∪ {−∞} such that

\[
\Gamma \subseteq \{ (x, y) \in X \times Y : \varphi(x) + \varphi^c(y) = c(x, y) \}.
\]

Proof. The proof of this Theorem is constructive. Indeed, we give an explicit formula for the function ϕ and we show that it satisfies the properties of the Theorem. Let us fix a point (x0, y0) ∈ Γ and for x ∈ X set

\[
\varphi(x) = \inf \big\{ c(x, y_n) - c(x_n, y_n) + c(x_n, y_{n-1}) - c(x_{n-1}, y_{n-1}) + \dots + c(x_1, y_0) - c(x_0, y_0) \;:\; n \in \mathbb{N},\ (x_i, y_i) \in \Gamma,\ i = 1, \dots, n \big\}.
\]
Since c is real valued and Γ is non-empty, ϕ never takes the value +∞. Now for y ∈ Y set
\[
-\psi(y) = \inf \big\{ -c(x_n, y) + c(x_n, y_{n-1}) - c(x_{n-1}, y_{n-1}) + \dots + c(x_1, y_0) - c(x_0, y_0) \;:\; n \in \mathbb{N},\ (x_i, y_i) \in \Gamma,\ i = 1, \dots, n \big\}.
\]

Note that by its definition ψ(y) > −∞ if and only if y ∈ (πy)(Γ). Then, by construction, we have ϕ = ψ^{¯c}, hence ϕ is c-concave.

Moreover, let x = x0; then for every chain of points (xi, yi) ∈ Γ we have
\[
\sum_{i=0}^{n} c(x_{i+1}, y_i) \ge \sum_{i=0}^{n} c(x_i, y_i),
\]
where xn+1 = x0, by c-cyclical monotonicity. This proves that the infimum in the definition of ϕ(x0) is non-negative, and hence ϕ(x0) ≥ 0. Thus, we have proved that ϕ is not constantly equal to −∞.

Now let us show that ϕ(x) + ϕc(y) = c(x, y) on Γ. Note that by the definition of the c-transform,
\[
\varphi^c(y) = \inf_{\tilde x} \, c(\tilde x, y) - \varphi(\tilde x) \le c(x, y) - \varphi(x),
\]
hence ϕ(x) + ϕc(y) ≤ c(x, y) everywhere.

We have to verify the opposite inequality on Γ. Since ϕc = ψ^{¯cc} and ψ^{¯cc} ≥ ψ, it suffices to show that ϕ(x) + ψ(y) ≥ c(x, y). Suppose that (x, y) ∈ Γ and let ε > 0. From ϕ = ψ^{¯c}, we can find ¯y ∈ πy(Γ) such that c(x, ¯y) − ψ(¯y) < ϕ(x) + ε; in particular ψ(¯y) ≠ ±∞. By the definition of ψ we obtain the inequality −ψ(y) ≤ −c(x, y) + c(x, ¯y) − ψ(¯y), since every chain that (approximately) realizes −ψ(¯y) can be completed by adding the point (x, y) ∈ Γ. Collecting this information we get
\[
-\psi(y) \le -c(x, y) + c(x, \bar y) - \psi(\bar y) < -c(x, y) + \varphi(x) + \varepsilon.
\]

Theorem 9. If γ is an optimal transport plan for the cost function c and c is continuous, then

supp (γ) is c-cyclically monotone.

Proof. Suppose that there exist k, a permutation σ ∈ Sk and points (x1, y1), . . . , (xk, yk) ∈ supp(γ) such that
\[
\sum_{i=1}^{k} c(x_i, y_i) > \sum_{i=1}^{k} c\big(x_i, y_{\sigma(i)}\big).
\]

Let ε > 0 be such that
\[
\varepsilon < \frac{1}{2k} \left( \sum_{i=1}^{k} c(x_i, y_i) - \sum_{i=1}^{k} c\big(x_i, y_{\sigma(i)}\big) \right).
\]

By continuity of c, there exists r such that for every i = 1, . . . , k and for every (x, y) ∈ B(xi, r) × B(yi, r) we have c(x, y) > c(xi, yi) − ε, and for every (x, y) ∈ B(xi, r) × B(yσ(i), r) we have c(x, y) < c(xi, yσ(i)) + ε. Now let us consider Vi := B(xi, r) × B(yi, r) and note that γ(Vi) > 0, since (xi, yi) ∈ supp(γ). Set
\[
\gamma_i := \frac{1}{\gamma(V_i)} \, \gamma|_{V_i},
\]
and µi = (πx)#γi, νi = (πy)#γi, where γ|Vi denotes the measure γ restricted to the subset Vi.

Let
\[
\varepsilon_0 < \frac{1}{k} \min_i \gamma(V_i).
\]
For every i, let us pick an arbitrary measure γ̃i ∈ Π(µi, νσ(i)), for instance γ̃i = µi ⊗ νσ(i). Now define
\[
\tilde\gamma = \gamma - \varepsilon_0 \sum_{i=1}^{k} \gamma_i + \varepsilon_0 \sum_{i=1}^{k} \tilde\gamma_i.
\]

Now we want to find a contradiction by showing that γ̃ is a better competitor than γ for the transport problem, i.e. γ̃ ∈ Π(µ, ν) and ∫ c dγ̃ < ∫ c dγ.

Firstly, let us verify that γ̃ is a positive measure; to this end it suffices to show that ε0 γi ≤ (1/k) γ for every i. This condition is satisfied since
\[
\varepsilon_0 \gamma_i = \big( \varepsilon_0 / \gamma(V_i) \big) \, \gamma|_{V_i},
\]
and ε0/γ(Vi) < 1/k.

Let us verify the marginal conditions for γ̃. We get
\[
(\pi_x)_\# \tilde\gamma = \mu - \varepsilon_0 \sum_{i=1}^{k} (\pi_x)_\# \gamma_i + \varepsilon_0 \sum_{i=1}^{k} (\pi_x)_\# \tilde\gamma_i = \mu - \varepsilon_0 \sum_{i=1}^{k} \mu_i + \varepsilon_0 \sum_{i=1}^{k} \mu_i = \mu,
\]
\[
(\pi_y)_\# \tilde\gamma = \nu - \varepsilon_0 \sum_{i=1}^{k} (\pi_y)_\# \gamma_i + \varepsilon_0 \sum_{i=1}^{k} (\pi_y)_\# \tilde\gamma_i = \nu - \varepsilon_0 \sum_{i=1}^{k} \nu_i + \varepsilon_0 \sum_{i=1}^{k} \nu_{\sigma(i)} = \nu.
\]
Finally, let us estimate the quantity ∫ c dγ − ∫ c dγ̃ and show that it is positive. We have

\[
\int c \, d\gamma - \int c \, d\tilde\gamma
= \varepsilon_0 \sum_{i=1}^{k} \int c \, d\gamma_i - \varepsilon_0 \sum_{i=1}^{k} \int c \, d\tilde\gamma_i
\ge \varepsilon_0 \sum_{i=1}^{k} \big( c(x_i, y_i) - \varepsilon \big) - \varepsilon_0 \sum_{i=1}^{k} \big( c(x_i, y_{\sigma(i)}) + \varepsilon \big)
= \varepsilon_0 \left( \sum_{i=1}^{k} c(x_i, y_i) - \sum_{i=1}^{k} c(x_i, y_{\sigma(i)}) - 2k\varepsilon \right) > 0,
\]

where we used the fact that γi is concentrated on B(xi, r) × B(yi, r) and γ̃i on B(xi, r) × B(yσ(i), r), and that they have unit mass thanks to the rescaling by γ(Vi).

Theorem 10. Suppose that X and Y are two Polish spaces and that c: X × Y → R is uniformly continuous and bounded. Then the dual problem (1.2) admits a solution (ϕ, ϕc), and the max of the dual problem equals the min of the Kantorovich problem, namely
\[
\min \Big\{ \int_{X \times Y} c \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\}
= \max \Big\{ \int_X \varphi \, d\mu + \int_Y \psi \, d\nu : \varphi \in C_b(X),\ \psi \in C_b(Y),\ \varphi \oplus \psi \le c \Big\}.
\]

(18)

Proof. Firstly, let us consider the minimization problem (1.1). Since c is continuous, the problem admits a solution γ, and by Theorem 9 its support Γ := supp(γ) is c-cyclically monotone. Since c is real valued, by Theorem 8 we know that there exists a c-concave function ϕ such that
\[
\Gamma \subseteq \{ (x, y) \in X \times Y : \varphi(x) + \varphi^c(y) = c(x, y) \}.
\]

By their c- and ¯c-concavity, ϕ and ϕc are continuous (they inherit the modulus of continuity of c). Moreover, from ϕc(y) = infx c(x, y) − ϕ(x) and the boundedness of c we get bounds for ϕc, and arguing analogously with ϕ = (ϕc)^{¯c} we get bounds for ϕ, so the functions ϕ and ϕc are both continuous and bounded.

So we can use (ϕ, ϕc) as an admissible pair in the dual problem (1.2). Now consider
\[
\int_X \varphi \, d\mu + \int_Y \varphi^c \, d\nu = \int_{X \times Y} \big( \varphi(x) + \varphi^c(y) \big) \, d\gamma = \int_{X \times Y} c(x, y) \, d\gamma,
\]

where the last equality is due to the fact that γ is concentrated on Γ and there we have ϕ(x) + ϕc(y) = c (x, y), while the previous equality follows from the fact that the marginals of γ are µ and ν. By the optimality of γ

\[
(1.2) \ge \int_X \varphi \, d\mu + \int_Y \varphi^c \, d\nu = \int_{X \times Y} c(x, y) \, d\gamma = (1.1),
\]
which implies (1.2) = (1.1), since the opposite inequality holds in general. As a consequence, the pair (ϕ, ϕc) is optimal for (1.2).

The next results aim to extend the equality of the duality formula also to the case of unbounded cost functions. This will be useful since we would like to have a nice representation when cost functionals are replaced with distances. The following result provides stability under suitable approximations.

Lemma 1. Suppose that ck, k ∈ N, and c are lower semicontinuous functions, bounded from below, such that ck increasingly converges to c. Then
\[
\lim_{k \to \infty} \min \Big\{ \int c_k \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\} = \min \Big\{ \int c \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\}.
\]

Proof. Since the convergence is increasing, ck ≤ c, and thus by monotonicity the limit on the left hand side is smaller than the quantity on the right hand side.

We now focus on the other inequality. Consider the sequence of optimal transport plans γk ∈ Π(µ, ν), obtained by taking an optimizer for every cost function ck. Up to subsequences, we can suppose that γk ⇀ γ̄ narrowly. Let us fix an index j. For k ≥ j we have ck ≥ cj, so
\[
\lim_{k \to \infty} \min \Big\{ \int c_k \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\} = \lim_{k \to \infty} \int c_k \, d\gamma_k \ge \liminf_{k \to \infty} \int c_j \, d\gamma_k.
\]

By the semicontinuity of the integral, we have
\[
\liminf_{k \to \infty} \int c_j \, d\gamma_k \ge \int c_j \, d\bar\gamma.
\]
Thus, we have obtained
\[
\lim_{k \to \infty} \min \Big\{ \int c_k \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\} \ge \int c_j \, d\bar\gamma.
\]
Since j is arbitrary and limj ∫ cj dγ̄ = ∫ c dγ̄ by monotone convergence, we get
\[
\lim_{k \to \infty} \min \Big\{ \int c_k \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\} \ge \int c \, d\bar\gamma \ge \min \Big\{ \int c \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\}.
\]

And this concludes the proof. Note that we have also proved the optimality of ¯γ for the limit cost c.


Theorem 11. If X, Y are two Polish spaces and c: X × Y → R ∪ {+∞} is lower semicontinuous

and bounded from below, then the Kantorovich duality formula holds and
\[
\inf \Big\{ \int_{X \times Y} c \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\}
= \sup \Big\{ \int_X \varphi \, d\mu + \int_Y \psi \, d\nu : \varphi \in C_b(X),\ \psi \in C_b(Y),\ \varphi \oplus \psi \le c \Big\}.
\]

Proof. Let us consider a sequence (ck)k∈N of increasing k-Lipschitz continuous functions that

increasingly converge to c. Then the duality formula holds for each ck, so we have
\[
\min \Big\{ \int c_k \, d\gamma : \gamma \in \Pi(\mu, \nu) \Big\}
= \max \Big\{ \int \varphi \, d\mu + \int \psi \, d\nu : \varphi \oplus \psi \le c_k \Big\}
\le \sup \Big\{ \int \varphi \, d\mu + \int \psi \, d\nu : \varphi \oplus \psi \le c \Big\},
\]

where the max and sup are taken among the functions ϕ, ψ ∈ Cb. The inequality is due to the

fact that ck≤ c, and hence every couple (ϕ, ψ) that satisfies ϕ ⊕ ψ ≤ ck, also satisfies ϕ ⊕ ψ ≤ c.

Taking the limit for k → ∞ we get the thesis by Lemma 1.

Remark 2. Note that in the previous Theorem the sup is needed since we are not able to guarantee the existence of a maximizing pair (ϕ, ψ).
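On finite spaces, Kantorovich duality is exactly linear-programming duality, and the absence of a duality gap can be observed numerically. A sketch using scipy's LP solver (sizes, seed and data are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m = 4, 5
mu = rng.random(n); mu /= mu.sum()
nu = rng.random(m); nu /= nu.sum()
C = rng.random((n, m))

# Primal (Kantorovich): min <C, gamma> over couplings gamma of (mu, nu)
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0   # row sums of gamma equal mu
for j in range(m):
    A_eq[n + j, j::m] = 1.0            # column sums of gamma equal nu
primal = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([mu, nu]),
                 bounds=(0, None), method="highs")

# Dual: max <mu, phi> + <nu, psi> subject to phi_i + psi_j <= C_ij
A_ub = np.zeros((n * m, n + m))
for i in range(n):
    for j in range(m):
        A_ub[i * m + j, i] = 1.0
        A_ub[i * m + j, n + j] = 1.0
dual = linprog(-np.concatenate([mu, nu]), A_ub=A_ub, b_ub=C.ravel(),
               bounds=(None, None), method="highs")

# no duality gap: min of the primal equals max of the dual
assert primal.success and dual.success
assert abs(primal.fun - (-dual.fun)) < 1e-6
```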

Among all the possible cost functions, a distinguished class is given by distances; we thus focus on transport problems whose cost is a distance function. In this case, under suitable assumptions, we are able to define a distance on spaces of probability measures; the next section is entirely devoted to this aim. Before going on, let us recall an easy result regarding families of Lipschitz functions.

Proposition 4. Let (fα)α∈Λ be a family of 1-Lipschitz continuous functions defined on a metric

space (X, d). Then f(x) := infα fα(x) is a 1-Lipschitz continuous function.

Proof. The family (fα)α∈Λ shares the same modulus of continuity, hence the thesis follows by Proposition 1.

Proposition 5. If c: X × X → R is a distance, then a function u: X → R is c-concave if and only if it is Lipschitz continuous, with respect to the distance c, with Lipschitz constant less than or equal to 1, namely
\[
|u(x) - u(y)| \le c(x, y) \qquad \forall x, y \in X.
\]
We denote by Lip1 the set of such functions. Moreover, for u ∈ Lip1 we have uc = −u.

Proof. Let u be a c-concave function; then
\[
u(x) = \chi^{\bar c}(x) = \inf_y \, c(x, y) - \chi(y)
\]
for some function χ: X → R ∪ {−∞}. Whenever χ(y) ≠ −∞, the function x ↦ c(x, y) − χ(y) is a Lip1 function, since by the triangle inequality
\[
|c(x, y) - c(z, y)| \le c(x, z),
\]

and thus x ↦ c(x, y) is Lip1. Moreover, u is Lip1 as it is an infimum of Lip1 functions. We now show that the following identity holds:
\[
u(x) = \inf_y \, c(x, y) + u(y).
\]

It is easy to show that the term on the right hand side is no greater than u(x), since we can choose y = x. Moreover, since u ∈ Lip1, u(x) − u(y) ≤ c(x, y), so u(x) ≤ u(y) + c(x, y), and thus the equality. This shows that u = (−u)^{¯c}, and so that u is c-concave. Finally, applying the same identity to −u ∈ Lip1 yields
\[
-u(y) = \inf_x \, c(x, y) - u(x) = u^c(y),
\]
that is uc = −u.

The characterisation given by Proposition 5 is very handy: it allows us to derive some useful properties of the Kantorovich problem when the cost is given by a distance. Recall that if µ ∈ P(X) is such that
\[
\int_X d(x, x_0) \, d\mu(x) < \infty
\]
for some x0 ∈ X, then we say that the first moment of µ is finite and we write µ ∈ P1(X).

Corollary 1. If c: X × X → R is a distance, then given µ, ν, ρ ∈ P1(X), we have

Ic(µ, ν) ≤ Ic(µ, ρ) + Ic(ρ, ν) .

Proof. Let u ∈ Lip1; then
\[
\int_X u \, d(\mu - \nu) = \int_X u \, d(\mu - \rho) + \int_X u \, d(\rho - \nu) \le I_c(\mu, \rho) + I_c(\rho, \nu).
\]
Taking the supremum among all u ∈ Lip1, one gets
\[
I_c(\mu, \nu) \le I_c(\mu, \rho) + I_c(\rho, \nu).
\]

Moreover, another typical property of distances also holds, stated in the following.

Corollary 2. If c: X × X → R is a distance and µ, ν ∈ P1(X) are such that Ic(µ, ν) = 0,

then µ = ν.

Proof. We give here two points of view to prove this easy fact.

Firstly, notice that Ic(µ, ν) = 0 implies the existence of a transport plan γ ∈ Π(µ, ν) such that
\[
\int_{X \times X} c(x, y) \, d\gamma(x, y) = 0,
\]
that is, c(x, y) = 0 γ-almost surely. Since c is a distance, this means that x = y γ-almost surely, which implies
\[
\int_X \varphi(x) \, d\mu(x) = \int_{X \times X} \varphi(x) \, d\gamma(x, y) = \int_{X \times X} \varphi(y) \, d\gamma(x, y) = \int_X \varphi(y) \, d\nu(y)
\]
for every test function ϕ, thus µ = ν.

Another possibility is to note that by the Kantorovich duality formula we have
\[
\int_X u(x) \, d(\mu - \nu)(x) = 0
\]
for every u ∈ Lip1, and that guarantees µ = ν.


1.4 Wasserstein Distance

As seen at the end of the last section, whenever the cost function is given by a distance, the associated minimization problem enjoys some of the properties that define distances. The minimization problem is defined for µ, ν ∈ P(X), which suggests that it is possible to define a distance on the space P(X). Thus, consider a Polish space (X, d) and, for p ≥ 1, the problem
\[
W_p(\mu, \nu) = \inf \left\{ \int_{X \times X} d(x, y)^p \, d\gamma(x, y) : \gamma \in \Pi(\mu, \nu) \right\}^{\frac{1}{p}},
\]
where the cost function is c(x, y) = d(x, y)^p and µ, ν ∈ P(X). Wp(µ, ν) is said to be the p-Wasserstein distance between µ and ν. However, for Wp to be a distance it is required that Wp(µ, ν) be finite for every pair of measures, and this is not true on the whole of P(X). To avoid this problem we consider the space of probability measures with finite moment of order p, which is
\[
\mathcal{P}_p(X) := \left\{ \mu \in \mathcal{P}(X) : \int_X d(x, x_0)^p \, d\mu(x) < \infty \text{ for some } x_0 \in X \right\}.
\]
It is easy to show, via Hölder's inequality, that p < q implies Pq(X) ⊂ Pp(X). Moreover, note that

the condition
\[
\int_X d(x, x_0)^p \, d\mu(x) < \infty
\]
holds independently of the choice of x0 ∈ X. Indeed, for every y0 ∈ X,
\[
\int_X d(x, y_0)^p \, d\mu(x) \le \int_X \big( d(x, x_0) + d(x_0, y_0) \big)^p \, d\mu(x)
\le c(p) \int_X \big( d(x, x_0)^p + d(x_0, y_0)^p \big) \, d\mu(x)
= c(p) \left( \int_X d(x, x_0)^p \, d\mu(x) + d(y_0, x_0)^p \right) < \infty,
\]
where c(p) ∈ R is a positive constant (one may take c(p) = 2^{p−1}). The quantity Wp(µ, ν) is then finite on Pp(X), and it can be shown that

it is a distance on Pp(X). From the previous section, thanks to Corollaries 1 and 2, it is clear that W1(µ, ν) is a distance on P1(X), but it is not yet clear whether Wp is still a distance when the cost function is given by the pth power of a distance.

The only thing that remains to be checked for Wp is the triangle inequality. In order to do that, we need to relate the three probability measures that appear in the triangle inequality. The natural way to argue is to consider the transport plan between the first and second measures and the transport plan between the second and third measures, and to glue them together. This is possible thanks to an important lemma, which enables us to do so when the transport plans have a common marginal. In order to prove the gluing lemma we need to introduce the concept of disintegration of measures.

Definition 7. Consider a space X endowed with a Borel measure µ and a map f : X → Y valued

in a topological space Y . We say that a family (µy)y∈Y is a disintegration of µ according to f if

every µy is a probability measure concentrated on f−1({y}), and for every function ϕ ∈ Cb(X) the map y ↦ ∫X ϕ dµy is Borel measurable and
\[
\int_X \varphi \, d\mu = \int_Y \left( \int_X \varphi \, d\mu_y \right) d\nu(y), \qquad \text{where } \nu = f_\# \mu.
\]

With an abuse of notation, we will write µ = µy ⊗ ν to denote the disintegration of µ.


We do not investigate in detail the existence and uniqueness results regarding disintegration of measures; however, we state the result when X, Y are two Polish spaces. In this setting, the disintegration of measures theorem allows us to write any probability measure on X × Y as an average of probability measures on {x} × Y, for x ∈ X. Moreover, if π is a probability measure on X × Y with marginal µ on X, then there exists a measurable application x ↦ πx from X into P(Y), uniquely determined dµ(x)-almost everywhere, such that
\[
\pi = \int_X (\delta_x \otimes \pi_x) \, d\mu(x),
\]
which means that for every test function u ∈ Cb(X × Y),
\[
\int_{X \times Y} u(x, y) \, d\pi(x, y) = \int_X \left( \int_Y u(x, y) \, d\pi_x(y) \right) d\mu(x).
\]
A detailed description of this statement, and much more, can be found in [GM89].

Given this result we are now ready to prove the gluing lemma that we spoke of previously.

Lemma 2 (Gluing Lemma). Let µ1, µ2, µ3 be three probability measures supported in Polish spaces X1, X2, X3 respectively, and let γ1,2 ∈ Π(µ1, µ2), γ2,3 ∈ Π(µ2, µ3) be two transport plans. Then there exists a probability measure γ ∈ P(X1 × X2 × X3) with (π1,2)#γ = γ1,2 and (π2,3)#γ = γ2,3, where π1,2: X1 × X2 × X3 → X1 × X2 and π2,3: X1 × X2 × X3 → X2 × X3 are the projections on the first two variables and on the last two variables, respectively.

Proof. Consider γ1,2 and γ2,3 as in the statement, and disintegrate both measures with respect to their common marginal µ2. Then there exist measurable maps x2 ↦ γ1,2;2 and x2 ↦ γ2,3;2, from X2 to P(X1) and P(X3) respectively, such that
\[
\gamma_{1,2} = \int_{X_2} \gamma_{1,2;2} \otimes \delta_{x_2} \, d\mu_2(x_2), \qquad
\gamma_{2,3} = \int_{X_2} \delta_{x_2} \otimes \gamma_{2,3;2} \, d\mu_2(x_2).
\]
We set
\[
\gamma = \int_{X_2} \big( \gamma_{1,2;2} \otimes \delta_{x_2} \otimes \gamma_{2,3;2} \big) \, d\mu_2(x_2),
\]
so that γ ∈ P(X1 × X2 × X3). By the construction of γ it is easy to show that it has the required marginals.

With all these results at our disposal we are finally able to state and prove that Wp is a

distance.

Theorem 12. For all p ≥ 1, Wp defines a metric on Pp(X).

Proof. By its definition Wp is symmetric, nonnegative and it is finite on Pp(X). Moreover,

Corollary 2 still holds when we consider the pth power of a distance as cost function, so Wp(µ, ν) = 0 if and only if µ = ν. It remains to show that, given µ, ν, ρ ∈ Pp(X), Wp(µ, ν) ≤ Wp(µ, ρ) + Wp(ρ, ν). Let γ1,2, γ2,3 be the optimal transport plans (which exist thanks to the

well-posedness of the Kantorovich formulation) between µ and ρ, and between ρ and ν, respectively. Thanks to Lemma 2 there exists γ ∈ P(X × X × X) such that, denoting by π1,2, π2,3, π1,3 the projections on the corresponding pairs of copies of X, we have (π1,2)#γ = γ1,2 and (π2,3)#γ = γ2,3. Set γ1,3 := (π1,3)#γ ∈ Π(µ, ν). By the triangle inequality for d and Minkowski's inequality for

Lp functions, we get
\[
\begin{aligned}
W_p(\mu, \nu) &\le \left( \int_{X \times X} d(x, z)^p \, d\gamma_{1,3}(x, z) \right)^{\frac{1}{p}}
= \left( \int_{X \times X \times X} d(x, z)^p \, d\gamma(x, y, z) \right)^{\frac{1}{p}} \\
&\le \left( \int_{X \times X \times X} \big( d(x, y) + d(y, z) \big)^p \, d\gamma(x, y, z) \right)^{\frac{1}{p}} \\
&\le \left( \int_{X \times X \times X} d(x, y)^p \, d\gamma(x, y, z) \right)^{\frac{1}{p}} + \left( \int_{X \times X \times X} d(y, z)^p \, d\gamma(x, y, z) \right)^{\frac{1}{p}} \\
&= \left( \int_{X \times X} d(x, y)^p \, d\gamma_{1,2}(x, y) \right)^{\frac{1}{p}} + \left( \int_{X \times X} d(y, z)^p \, d\gamma_{2,3}(y, z) \right)^{\frac{1}{p}}
= W_p(\mu, \rho) + W_p(\rho, \nu).
\end{aligned}
\]

And this concludes the proof.

Definition 8. Given a Polish space X, for every p ≥ 1 we say that Wp(X) = (Pp(X) , Wp) is

the Wasserstein space of order p of X.

So far we have shown that Wp(X) is a metric space; moreover, there is a containment relation between the Wasserstein spaces. Indeed, for 1 ≤ p ≤ q, denoting by γ the optimal transport plan between µ, ν ∈ Pq(X) for the cost d^q, Jensen's inequality gives
\[
\| d(x, y) \|_{L^p(\gamma)} \le \| d(x, y) \|_{L^q(\gamma)},
\]
which implies Wp(µ, ν) ≤ Wq(µ, ν); together with Pq(X) ⊂ Pp(X), this gives Wq(X) ⊂ Wp(X). As a consequence, for every p ≥ 1 it follows that W1(µ, ν) ≤ Wp(µ, ν) for every µ, ν ∈ Pp(X), that is Wp(X) ⊂ W1(X).
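For p = 1 and measures on the real line, scipy ships an implementation of the Wasserstein distance between empirical measures, which makes properties such as the triangle inequality of Theorem 12 easy to observe. A sketch (sample sizes and shifts are arbitrary choices):

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(3)
# three empirical measures on the line (uniform weights on the samples)
mu, rho, nu = (rng.standard_normal(200) + s for s in (0.0, 0.5, 1.0))

w_mu_nu = wasserstein_distance(mu, nu)   # W_1 between empirical measures
w_mu_rho = wasserstein_distance(mu, rho)
w_rho_nu = wasserstein_distance(rho, nu)

# triangle inequality for W_1
assert w_mu_nu <= w_mu_rho + w_rho_nu + 1e-12
```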

Remark 3. Throughout Chapter 1 we defined the p-Wasserstein distance for two probability measures µ, ν; however, if µ and ν are only positive measures with the same mass, that is µ, ν ∈ M(X) with µ(X) = ν(X) = M, then the p-Wasserstein distance still makes sense. Indeed, both µ and ν can be rescaled by a factor 1/M so that they become probability measures, and since every coupling γ ∈ Π(µ, ν) then has total mass M, it is easy to check that
\[
W_p(\mu, \nu) = M^{\frac{1}{p}} \cdot W_p\!\left( \frac{\mu}{M}, \frac{\nu}{M} \right).
\]
In particular, what was needed in defining the p-Wasserstein distance was not that the two measures were both probability measures, but that they had the same total mass.

2 Fractional Brownian Motion

Fractional Brownian motion, sometimes abbreviated as fBm, is one of the best known stochastic processes. It is a popular model for both short-range and long-range dependent phenomena arising in various fields. For instance, a class of phenomena with long interdependence is encountered in hydrology, where Hurst found the range of cumulated water flows to vary proportionally to t^H with 1/2 < H < 1. Recently, it has grown in popularity thanks to its usefulness in mathematical finance, statistical mechanics and many other fields. Indeed, not only is it a generalisation of standard Brownian motion, but it also differs markedly from it: the increments of standard Brownian motion are independent, while the increments of fBm are not.

Due to its wide use, there are many references on fBm, for instance see [She14] as an essential toolbox on the topic, or see the extensive lecture notes [Nou12].

In our analysis we will consider a fractional Brownian motion taking values on a d-dimensional torus. Before stepping directly into the description of an fBm on the torus, let us start by defining a one dimensional fractional Brownian motion taking values in R. As is well known, a Gaussian process can be defined by means of its mean and covariance functions.

Definition 9. A one dimensional fractional Brownian motion B^H = (B^H_t)_{t∈[0,T]} is an almost surely continuous centered Gaussian process with covariance function given by
\[
\mathbb{E}\big[ B^H_t B^H_s \big] = \frac{1}{2} \left( s^{2H} + t^{2H} - |t - s|^{2H} \right). \tag{2.1}
\]

Remark 4. To be more precise, Definition 9 does not by itself guarantee the existence of an fBm. For every value of the Hurst index H ∈ (0, 1) the distribution of the process B^H is uniquely determined by its mean and covariance function; however, we need to check that the covariance function that defines the process is non-negative definite. Many results of existence are known, and in the following we recall one which will also be useful in Chapter 4.
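The non-negative definiteness required in Remark 4 can be probed numerically on a finite time grid: the matrix (R(t_i, t_j))_{i,j} built from (2.1) should admit a Cholesky factorization. A sketch (grid and Hurst values chosen for illustration):

```python
import numpy as np

def fbm_cov(ts, H):
    """Covariance (2.1): R(s, t) = (s^(2H) + t^(2H) - |t - s|^(2H)) / 2."""
    s, t = np.meshgrid(ts, ts, indexing="ij")
    return 0.5 * (s ** (2 * H) + t ** (2 * H) - np.abs(t - s) ** (2 * H))

ts = np.linspace(1 / 32, 1.0, 32)
for H in (0.3, 0.5, 0.7):
    R = fbm_cov(ts, H)
    # Cholesky succeeds iff R is (numerically) positive definite, which is the
    # non-negative definiteness requirement discussed in Remark 4
    L = np.linalg.cholesky(R)
    assert np.allclose(L @ L.T, R)
```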

It has been shown not only that an fBm exists, but that it can be constructed in terms of a Wiener integral. Indeed, the following holds.

Theorem 13. The process B^H = (B^H_t)_{t∈[0,T]} defined by
\[
B^H_t = \frac{\sqrt{2\pi \sin(\pi H) \, \Gamma(2H)}}{\Gamma\!\left(H + \frac{1}{2}\right)} \int_{-\infty}^{t} \left( (t - s)_+^{H - \frac{1}{2}} - (-s)_+^{H - \frac{1}{2}} \right) dB_s,
\]


where B is a standard Brownian motion and Γ is the Gamma function, is a fractional Brownian motion.

The representation above is called the moving average representation of fractional Brownian motion and was first introduced by Mandelbrot and Van Ness in [MN68]. Note that B^H_0 = 0; a fractional Brownian motion that starts at the origin is sometimes called a standard fractional Brownian motion.

Proof. Set
\[
K_H(t, s) = (t - s)_+^{H - \frac{1}{2}} - (-s)_+^{H - \frac{1}{2}},
\]

and note that for any t ∈ [0, T ], KH(t, ·) belongs to L2(R), thus the Wiener integral is well

defined and E[B^H_t] = 0 for any t ∈ [0, T]. Furthermore, by Itô's isometry and the change of variables s = tx, we get
\[
\mathbb{E}\Big[ \big( B^H_t \big)^2 \Big]
= \frac{2\pi \sin(\pi H) \Gamma(2H)}{\Gamma\!\left(H + \frac{1}{2}\right)^2} \int_{-\infty}^{t} K_H(t, s)^2 \, ds
= t^{2H} \, \frac{2\pi \sin(\pi H) \Gamma(2H)}{\Gamma\!\left(H + \frac{1}{2}\right)^2} \int_{-\infty}^{1} K_H(1, x)^2 \, dx
= t^{2H}.
\]

Indeed, it holds that
\[
\int_{-\infty}^{1} K_H(1, x)^2 \, dx = \frac{\Gamma\!\left(H + \frac{1}{2}\right)^2}{2\pi \sin(\pi H) \Gamma(2H)},
\]
see for instance [Mis08], Appendix A. Moreover, for h > 0 we have

\[
B^H_{t+h} - B^H_t = \frac{\sqrt{2\pi \sin(\pi H) \Gamma(2H)}}{\Gamma\!\left(H + \frac{1}{2}\right)} \left( \int_{-\infty}^{t} \big( K_H(t + h, s) - K_H(t, s) \big) \, dB_s + \int_{t}^{t+h} K_H(t + h, s) \, dB_s \right) = \frac{\sqrt{2\pi \sin(\pi H) \Gamma(2H)}}{\Gamma\!\left(H + \frac{1}{2}\right)} \, (I_1 + I_2).
\]

Note that the two terms on the right hand side are independent and that the increments of a Brownian motion are stationary; thus, in distribution,

\[
I_1 \overset{d}{=} \int_{-\infty}^{0} \big( K_H(h, s) - K_H(0, s) \big) \, dB_s = \int_{-\infty}^{0} K_H(h, s) \, dB_s, \qquad
I_2 \overset{d}{=} \int_{0}^{h} K_H(h, s) \, dB_s,
\]
so that
\[
\mathbb{E}\Big[ \big( B^H_{t+h} - B^H_t \big)^2 \Big] = h^{2H}.
\]

Finally, for 0 ≤ s < t ≤ T we get
\[
\mathbb{E}\big[ B^H_t B^H_s \big] = \frac{1}{2} \left( \mathbb{E}\Big[ \big(B^H_t\big)^2 \Big] + \mathbb{E}\Big[ \big(B^H_s\big)^2 \Big] - \mathbb{E}\Big[ \big(B^H_t - B^H_s\big)^2 \Big] \right) = \frac{1}{2} \left( t^{2H} + s^{2H} - (t - s)^{2H} \right),
\]
and thus the thesis.


As we said before, fBm is a generalisation of standard Brownian motion: for H = 1/2 the covariance function is E[B^H_t B^H_s] = t ∧ s, which is the covariance function of a standard Brownian motion. The main difference between the standard and fractional Brownian motion is that the increments of the first are independent, while, in general, the increments of the second are not. In particular, there are different behaviours depending on the value of the Hurst index H.

• If H > 1/2, the increments of the fBm are positively correlated, that is,
\[
\mathbb{E}\Big[ \big( B^H_{t_1} - B^H_{s_1} \big) \big( B^H_{t_2} - B^H_{s_2} \big) \Big] > 0 \quad \text{for every } 0 \le s_1 < t_1 < s_2 < t_2 \le T;
\]
this also means that the covariance function of the fBm is convex. For such values of H, if the process was increasing in the past, it is more likely to keep the trend than to break it. Moreover, we say that for such H the fBm has the property of long memory, meaning that the increments are long-range dependent.

• If H < 1/2, the increments of the fBm are negatively correlated, that is,
\[
\mathbb{E}\Big[ \big( B^H_{t_1} - B^H_{s_1} \big) \big( B^H_{t_2} - B^H_{s_2} \big) \Big] < 0 \quad \text{for every } 0 \le s_1 < t_1 < s_2 < t_2 \le T;
\]
this also means that the covariance function of the fBm is concave. For such values of H, if the process was increasing in the past, it is more likely to decrease in the future. In this case we say that the fBm has the property of counterpersistence.

Another interesting property of the increments is that they are stationary, and this easily follows from the covariance function.

Proposition 6. The increments of a fractional Brownian motion BH are stationary.

Proof. Let s > 0 be fixed and set
\[
Y_t = B^H_{t+s} - B^H_s.
\]
Then Y is a centered Gaussian process. Computing its covariance function we get
\[
\mathbb{E}[Y_t Y_r] = \mathbb{E}\Big[ \big( B^H_{t+s} - B^H_s \big) \big( B^H_{r+s} - B^H_s \big) \Big] = \frac{1}{2} \left( t^{2H} + r^{2H} - |t - r|^{2H} \right) = R(t, r),
\]
which means that Y has the same distribution as B^H, and thus the thesis.

Figure 2.1: Simulation of 10 one dimensional fBms for different values of H: (a) H = 0.3, (b) H = 0.7.

Moreover, the fractional Brownian motion enjoys some nice scaling properties.

Proposition 7. A fractional Brownian motion is H-self-similar, meaning that for every a > 0 the processes (B^H_{at})_{t≥0} and (a^H B^H_t)_{t≥0} have the same distribution.
