A New Approach to the Random Matching Problem

Academic year: 2021

Università degli Studi di Pisa

Corso di Laurea Magistrale in Matematica

Master's thesis

A New Approach to the Random Matching Problem in 2 Dimensions

Candidate: Federico Glaudo (matriculation number 508684)
Advisor: Luigi Ambrosio
Co-examiner: Dario Trevisan


Contents

Introduction
  1 Random Matching Problem
  2 Lower Bound Heuristic
  3 Known Results in All Dimensions
    3.1 Dimension 1
    3.2 Dimension ≥ 3
    3.3 Dimension 2
  4 Structure of the Thesis

Notation

1 Optimal Transport Tools
  1.1 Polish Spaces and Prokhorov Theorem
  1.2 Monge Formulation
  1.3 Kantorovich Formulation and Wasserstein Distance
  1.4 Duality Theory and Characterization of Minimizers
  1.5 Brenier and McCann Theorems
  1.6 Benamou-Brenier Formula

2 Heat Equation
  2.1 Existence and Uniqueness of the Solution
  2.2 Self-Adjointness and Regularity
  2.3 Heat Kernel
  2.4 Generalization to Weighted/with Boundary Manifolds
    2.4.1 Manifold with Boundary
    2.4.2 Weighted Manifold

3 Optimality of Maps on a Manifold
  3.1 Introduction
  3.2 Notation of Riemannian Geometry
  3.3 Main Theorem
  3.4 Technical Propositions and Proofs

4 Random Matching Problem
  4.1 Notation
  4.2 Statement of the Main Theorem and Scheme of the Proof
  4.3 Upper Bound
  4.4 Description of the Strategy for the Lower Bound
  4.5 Fine Heat Kernel Estimates
  4.6 Technical Inequalities Involving f_{n,t}
  4.7 Crucial Results in the Event A_{n,t}^ξ
  4.8 Lower Bound

INTRODUCTION

1 Random Matching Problem

The random matching problem is a classical combinatorial optimization problem, seen from a probabilistic point of view. Other examples of such optimization problems are the travelling salesman problem and the minimum spanning tree. For all of these problems one can ask what the average value of the optimal result is when the problem data follow a given random distribution. A monograph on this kind of problem is [Yuk06].

This is exactly the framework where the statement of the random matching problem naturally takes place.

Problem 1.1 (Discrete matching). Let (M, d, m) be a metric measure space such that m is a probability measure and let n > 0 be a positive integer. If X_1, …, X_n and Y_1, …, Y_n are independent identically distributed random points on M with distribution m, study the quantity

\[ \mathbb{E}\left[ \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_{\sigma(i)})^2 \right]. \]

The statement of the problem is deliberately vague. Indeed, computing the said quantity explicitly is too optimistic a goal for almost any choice of the ambient space. Hence what we are looking for is a good bound on the average matching cost.

As appealing as the statement is, it is still a bit too hard to be attacked directly with the analytical tools¹ we would like to use. Instead of having two different families of random points, it is handier to have just one and match it to the reference measure m. Of course, technical issues arise when trying to generalize the definition of the matching cost, as we no longer have two sets of points but only one. It is exactly to address this issue that we will introduce in Chapter 1 (Optimal Transport Tools) the definition of the Wasserstein distance. The Wasserstein distance should be understood as a quantitative way of saying how far apart two measures are.

¹We call analytical tools the partial differential equation methods we will adopt, in contrast with combinatorial ones.


Problem 1.2 (Semi-discrete matching). Let (M, d, m) be a metric measure space such that m is a probability measure and let n > 0 be a positive integer. If X_1, …, X_n are independent identically distributed random points on M with distribution m, study the quantity

\[ \mathbb{E}\left[ W_2^2\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \right]. \tag{1.1} \]

This latter problem is the one we will call the random matching problem, whereas the former will be called the discrete random matching problem.

Both in the discrete and in the semi-discrete statement there is a choice that seems arbitrary: the exponent. In both cases we have chosen the exponent 2, even though the problem still makes sense for any exponent 1 ≤ p < ∞. Given that we will consider only the quadratic case in this thesis, we have decided to fix the exponent to 2 in the statement (even though in the following sections we will also briefly treat different exponents).

Let us emphasize immediately that the discrete and semi-discrete statements are tightly linked to each other. We will describe their relation in the following sections. The scaling factor 1/n in the discrete statement is there in analogy with the semi-discrete case.

As a general reference, we suggest the book [Tal14], which devotes multiple chapters to the treatment of the random matching problem.

2 Lower Bound Heuristic

We are going to describe a heuristic that works in any setting where we can talk of dimension² and gives a nontrivial estimate on the asymptotic behaviour of Eq. (1.1) when n goes to infinity. Considering the semi-discrete statement, we are able to estimate from below the quantity

\[ W_p^p\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \]

independently of the choice of the points X1, . . . , Xn.

Let T : M → {X_1, …, X_n} be an optimal (or almost optimal) matching and let us define the sets Ω_i ⊆ M by Ω_i = T^{-1}(X_i). Hence we have m(Ω_i) = 1/n and it holds that

\[ W_p^p\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \approx \sum_{i=1}^{n} \int_{\Omega_i} d(x, X_i)^p \,\mathrm{d}m(x). \tag{2.1} \]

²In order for this heuristic to make sense we have to be able to control the measure of balls.


If our space M has dimension d ∈ N, then we are able to carry out our estimate. Indeed, on a d-dimensional space, we expect that for any measurable set Ω ⊆ M and any point x_0 ∈ M it holds that

\[ \int_{\Omega} d(x, x_0)^p \,\mathrm{d}m(x) \ge \lambda \cdot m(\Omega)^{1 + \frac{p}{d}} \tag{2.2} \]

for some constant λ = λ(M). For example, this inequality can be proven easily on a d-dimensional Riemannian manifold.

Applying Eq. (2.2) to each Ω_i in Eq. (2.1) and recalling that m(Ω_i) = 1/n, we get

\[ W_p^p\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \ge \sum_{i=1}^{n} \lambda \cdot m(\Omega_i)^{1+\frac{p}{d}} = n \cdot \lambda \cdot \left( \frac{1}{n} \right)^{1+\frac{p}{d}} = \lambda \cdot \frac{1}{n^{p/d}}. \]

It is natural to ask whether this asymptotic behaviour of the cost is sharp or not. We will devote the next section to illustrating the known results in the literature.

As anticipated, the heuristic described can be formalized quite easily on a manifold. On the other hand, it is messier to formalize such a heuristic properly if the discrete version of the problem is considered. Nonetheless, it can still be done if, instead of the sets Ω_1, …, Ω_n, the Voronoi cells of the points X_1, …, X_n are considered (see [Aur91] for the definition of Voronoi cells). For completeness, let us recall that the Voronoi cells are the sets (Ω_i)_{1≤i≤n} defined as

\[ \Omega_i := \{\, x \in M : d(x, X_i) < d(x, X_j)\ \ \forall j \ne i \,\}. \]

3 Known Results in All Dimensions

The random matching problem has been studied extensively both in the mathematical and in the physical literature. The setting where the problem has attracted the largest interest is the d-dimensional cube [0, 1]^d, typically with p = 1 or p = 2; it should be clear why this is the most interesting setting for applications. Even though in this thesis we will consider only the semi-discrete problem, most of the literature focuses on the discrete statement. Let us remark that the majority of the results valid in the discrete setting keep holding in the semi-discrete setting, and vice versa.

Let us start with the easiest possible ambient space, that is the interval [0, 1].

3.1 Dimension 1

In dimension 1 the problem has a specific structure that makes the analysis much easier compared to any other dimension. Indeed, in dimension 1 (more specifically on the interval) an optimal matching is characterized by monotonicity. Once we know that a matching is optimal if and only if it is monotone, the problem becomes the study of the probability distribution of the k-th point in increasing order. Such a distribution can be computed explicitly if the original distribution of the n random points is uniform on the interval. In that case the result is

\[ \mathbb{E}\left[ W_2^2\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \right] = \frac{1}{6n}. \]

As one can notice, this growth does not agree with our heuristic reasoning: the expected growth was n^{-2}. A (not entirely satisfactory) explanation of this deviation is the presence of further obstructions to the matching, given by the rigidity of the one-dimensional interval. In other words, it is costly to match points not only because they do not coincide microscopically, but also because the macroscopic distributions of the points do not coincide³. More specifically, it might be impossible to match each point to the space around that point. We will see how this phenomenon disappears in higher dimensions.

A monograph on the 1-dimensional case, where of course distributions different from the uniform one are studied, is [BL14].
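The exact value 1/(6n) lends itself to a quick numerical sanity check. The following Python sketch (the helper names are ours, not part of the thesis) exploits the monotonicity of the optimal matching described above: the i-th smallest point is matched to the cell [(i−1)/n, i/n], and each cell integral is computed in closed form.

```python
import random

def semidiscrete_cost(points):
    """Exact W_2^2 between the empirical measure of `points` and the
    Lebesgue measure on [0, 1]: in dimension 1 the optimal matching is
    monotone, so the i-th smallest point is matched to [(i-1)/n, i/n]."""
    n = len(points)
    cost = 0.0
    for i, x in enumerate(sorted(points)):
        a, b = i / n, (i + 1) / n
        # closed form of the cell integral  int_a^b (t - x)^2 dt
        cost += ((b - x) ** 3 - (a - x) ** 3) / 3
    return cost

def average_cost(n, trials=5000, seed=0):
    """Monte Carlo estimate of E[W_2^2] for n uniform points on [0, 1]."""
    rng = random.Random(seed)
    return sum(semidiscrete_cost([rng.random() for _ in range(n)])
               for _ in range(trials)) / trials
```

For n = 10 the Monte Carlo average is close to 1/60, in agreement with the formula 1/(6n).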

3.2 Dimension ≥ 3

In dimension 3 and higher the behaviour forecast in Section 2 (Lower Bound Heuristic) is correct. Once again let us focus on the simple case of M = [0, 1]^d and m = \mathcal{L}^d|_M.

It is known that for any 1 ≤ p < ∞ and for any positive integer n > 0 it holds that

\[ \frac{C_1}{n^{p/d}} \le \mathbb{E}\left[ \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_{\sigma(i)})^p \right] \le \frac{C_2}{n^{p/d}} \]

for a suitable choice of the constants C_1 = C_1(d, p) and C_2 = C_2(d, p). When p = 1, this result is proven in [DY95; Tal92], whereas the paper [Led17] addresses all cases 1 ≤ p < ∞ with methods, inspired by [AST16], very similar to the ones we are going to use.

Even more, in [BB13] it is proven that the limit

\[ \lim_{n \to \infty}\ \frac{\mathbb{E}\left[ \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_{\sigma(i)})^p \right]}{n^{-p/d}} \]

exists under the assumption 1 ≤ p < d/2.


3.3 Dimension 2

This will be the main topic of investigation in this thesis. It is natural to ask whether in dimension 2 the behaviour is the expected one (i.e. n^{-p/2}) or not. The answer is negative: as proven in the fundamental paper [AKT84] when M = [0, 1]², the correct behaviour is instead

\[ \mathbb{E}\left[ \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_{\sigma(i)})^p \right] \approx \left( \frac{\log(n)}{n} \right)^{p/2}, \]

which is larger than the expected growth by a logarithmic factor. The proof is essentially combinatorial, and following such a strategy there is little hope of being able to compute the limit of the renormalized quantity, that is, the real number C_p such that

\[ \mathbb{E}\left[ \min_{\sigma \in S_n} \frac{1}{n} \sum_{i=1}^{n} d(X_i, Y_{\sigma(i)})^p \right] \sim C_p \left( \frac{\log(n)}{n} \right)^{p/2}. \]

Much more recently, in 2014, the authors of [Car+14] put forward a claim on the value of the constant C_2, supported by an ansatz. The claimed value is C_2 = 1/(2π) for the discrete version of the problem and C_2 = 1/(4π) for the semi-discrete version.

Two years later, in [AST16], the authors successfully proved that the claim is indeed true for p = 2. Their main theorem shows the identity

\[ \lim_{n \to \infty} \frac{\mathbb{E}\left[ W_2^2\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right) \right]}{\log(n) \cdot n^{-1}} = \frac{m(M)}{4\pi} \]

whenever M is a 2-dimensional compact closed Riemannian manifold and m is the associated volume measure.

4 Structure of the Thesis

The main goal of this thesis is to describe a new approach, inspired by the paper [AST16], to proving the asymptotic behaviour of the matching cost on a compact 2-dimensional Riemannian manifold without boundary. The proof we are going to present does not rely on the duality theory used in the mentioned paper; instead, a characterization of optimal maps on manifolds is exploited.

The first two chapters are entirely devoted to the introduction of the technical tools needed for the proof: the optimal transport theory and the heat flow.

In the first chapter we will outline, mostly without proofs, the central results of optimal transport theory, focusing specifically on those parts of the theory we will need. The chapter is structured as a quick introduction to the topic. The second chapter is instead devoted to the heat flow (on manifolds and on more general spaces). The majority of the propositions are proven with all the necessary details. We will pay special attention to the generalization of the heat flow to manifolds with boundary and to weighted manifolds. The reason is that we believe, as we will explain in the last chapter, that our proof of the asymptotic behaviour might work also in these more general spaces.

After these preliminaries, the core of the thesis starts. In the third chapter we describe a property of a map that implies its optimality as a transport map. Such a property is an improvement over a proposition stated in [Vil08]. In layman's terms, it can be seen as a generalization of the converse of the Brenier Theorem (namely that the gradient of a convex function is always optimal, for any initial measure) to the curved setting.

Such a result will be central in the last chapter where we present the new proof of the asymptotic behaviour. Let us state our main theorem:

Theorem 4.1. Let (M, g) be a 2-dimensional compact closed manifold whose volume measure m is a probability. Let (X_i)_{i∈N} be a family of independent random variables m-uniformly distributed on M and let C_n denote the cost of the matching between the empirical measure and the reference measure:

\[ C_n := W_2^2\left( \frac{1}{n} \sum_{i=1}^{n} \delta_{X_i},\ m \right). \]

Then the asymptotic relation

\[ \mathbb{E}[C_n] \sim \frac{\log(n)}{4\pi n} \]

holds.

In the following paragraphs we are going to describe very briefly the strategy adopted to prove the theorem. The idea is to fix a short time t > 0 and consider the heat flow of the measure (1/n) ∑ δ_{X_i} at time t, denoted by µ_{n,t}. With a suitable choice of the time t, the measure µ_{n,t} is both extremely near to the original measure and has a smooth density that is approximately 1. Hence, linearizing an a priori complicated partial differential equation, we are able to construct a transport map from µ_{n,t} to the reference measure m. The said transport map is induced by the flow of a time-dependent vector field (v_s)_{0≤s≤1} that is a small perturbation of ∇f_{n,t}, where f_{n,t} is the null-mean solution of the Poisson equation −∆f_{n,t} = µ_{n,t} − 1 (let us recall that µ_{n,t} is a smooth measure). Thanks to the optimality condition we will be able to show that the map induced by ∇f_{n,t} (which does not map m onto µ_{n,t} exactly) is almost optimal; comparing v_s and ∇f_{n,t}, we will deduce

\[ W_2^2(\mu_{n,t}, m) \approx \int_M |\nabla f_{n,t}|^2 \,\mathrm{d}m. \]

The last step of the proof consists in estimating the value of this integral, which happens to be exactly log(n)/(4πn).

Let us remark that, even if we are not going to do it, it is possible to adapt the proof also to the discrete version of the problem, as it is done in [AST16, Section 4.3].


NOTATION

In the thesis we will use the following notation and conventions, often without recalling their meaning.

• The dimension of the ambient space is denoted by the nonnegative integer d ∈ N. The more conventional letter n will be used to denote the number of points of the instance of the matching problem.

• L^d denotes the Lebesgue measure on R^d.

• (M, m) denotes a Riemannian manifold equipped with its standard volume measure m.

• B(X) denotes the σ-algebra of Borel sets on a topological space X.

• C_b(X) denotes the bounded and continuous functions from a topological space X into R.

• C^∞(X) and C^∞_c(X) denote respectively the smooth functions and the compactly supported smooth functions on a smooth manifold X.

• D(X) denotes the family of distributions on the space X (i.e. continuous linear functionals on C^∞_c(X)).

• P(X) denotes the family of probability measures on a measurable space (X, B(X)).

• M(X) denotes the family of finite Radon measures on a measurable space (X, B(X)).

• δx denotes the probability measure concentrated on the point x.

• exp : T M → M is the exponential map from the tangent of a Riemannian manifold into the manifold itself.

• B(x, r) denotes the open ball with center x and radius r.

• ∂B(x, r) denotes the boundary of the ball with center x and radius r.

• Given a Hilbert space (H, ⟨· , ·⟩) and a convex function F : H → [−∞, ∞]:

  – dom(F) denotes the domain of F, namely the points where F ∈ R;

  – ∂F denotes the subdifferential of F, namely

\[ \partial F(x) = \{\, x^* \in H : F(y) \ge F(x) + \langle x^*, y - x \rangle\ \ \forall y \in H \,\}. \]

• W^{k,p}(X) is the space of functions f : X → R whose derivatives up to order k are in L^p(X).

• C^{k,α}(X) is the space of functions f : X → R that are classically differentiable up to order k and whose k-th derivatives are α-Hölder.

• A ≲ B means that there exists a positive constant C = C(M) that depends only on the manifold⁴ M such that A ≤ CB. If we add a subscript, as in ≲_p, the implicit constant is allowed to depend also on p.

• A(t) ∼ B(t) means that lim_{t→0} A/B = 1. If A or B depends also on some variable other than t, we require the limit to be uniform in all other variables (apart from the ambient manifold M, which is considered fixed). For example it is false that n^{-1} + t ∼ n^{-1}, as the convergence is not uniform in n, whereas it holds that n(t^{-1} + 1) ∼ n t^{-1} and also diam(M) + t ∼ diam(M).


1 OPTIMAL TRANSPORT TOOLS

Optimal transport theory plays a role in the matching problem as it is the most natural way to define a distance on measures, and therefore to compute how much an empirical sample differs from the reference measure. In this chapter we will outline, mostly without proofs, the theory of optimal transport that we will need. In our presentation we will devote a substantial number of words to the treatment of the discrete case, mainly because of the importance it has in the study of the matching problem. On the other hand, we will not emphasize the minimal hypotheses needed for the various results and will often assume more than what is needed in the optimal statements known in the literature.

All of what we are summarizing here, and much more, can be found in any monograph on optimal transport (see for example [Vil08; San15]).

1.1 Polish Spaces and Prokhorov Theorem

As a preliminary piece of technical background, let us define what a Polish space is. Polish spaces are fundamental to the study of optimal transport and, more generally, of abstract measure theory. The theory of Polish spaces is studied in detail in [Bog07, Chapters 6 and 8].

Definition 1.1.1. A topological space (X, τ ) is a Polish space if it is separable

and admits a distance that makes it a complete metric space.

It goes without saying that more or less all well-behaved spaces are Polish. Indeed any manifold (smooth or nonsmooth, with or without boundary, complete or not) is a Polish space.

Their importance in measure theory is due to the Prokhorov theorem, which gives a characterization of the precompactness of a set of measures.

Theorem 1.1.2 (Prokhorov). Let (X, τ) be a Polish space and let (µ_i)_{i∈I} ⊆ P(X) be a family of probability measures. The following statements are equivalent:

1. The family of measures is precompact in the space P(X) equipped with the weak topology¹.

2. The family is tight, i.e. for any ε > 0 there exists a compact K ⊆ X such that µ_i(X \ K) < ε for any i ∈ I.

1.2 Monge Formulation

Let us start by stating the Monge formulation of the optimal transport problem. Historically it was the first statement of the problem.

Problem 1.2.1 (Monge formulation). Let (X, B(X), µ) and (Y, B(Y), ν) be two Polish probability spaces and let c : X × Y → [0, ∞] be a Borel cost function. Investigate the following infimum and study whether it is a minimum:

\[ \inf\left\{ \int_X c(x, T(x)) \,\mathrm{d}\mu(x) \;:\; T : X \to Y \text{ is Borel},\ T_{\#}\mu = \nu \right\}. \]

A Borel function T : X → Y such that T_{#}µ = ν is called an (admissible) transport map. An admissible transport map that is also a minimizer for the Monge problem is called an optimal transport map.

The most typical setting is X = Y = R^d with cost function c(x, y) = |x − y|^p, with 1 ≤ p < ∞. With such a choice of the ambient space, the optimal transport problem can easily be identified with the real-world problem of moving a certain amount of mass from one distribution to another while trying to minimize the effort (which is of course a function of the distance).

In this thesis there will be at least one other important setting, namely X = Y = M where M is a Riemannian manifold and c(x, y) = d(x, y)^p, denoting by d : M × M → [0, ∞) the Riemannian distance. As we will see later in this chapter, in the two said settings the problem is very well understood and, under mild additional assumptions, the infimum is a minimum.

It is important to appreciate from the very beginning the discrete version of the problem, as it is one of the main characters in the random matching problem. Let us fix two sets (x_i)_{1≤i≤n}, (y_i)_{1≤i≤n} of n points in the Euclidean space R^d. If we denote µ = (1/n) ∑ δ_{x_i} and ν = (1/n) ∑ δ_{y_i}, the optimal transport problem with respect to the cost c becomes the minimization, over the permutations σ ∈ S_n, of the quantity ∑_i c(x_i, y_{σ(i)}). From this viewpoint on the discrete transport problem, the reason behind the name "matching" should be clear: we have to match each point of the first set with a point in the second set.
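To make the discrete viewpoint concrete, here is a minimal brute-force sketch (the helper name is illustrative, not part of the thesis; it is only feasible for small n, since all n! permutations are enumerated):

```python
import itertools

def discrete_matching_cost(xs, ys, c):
    """Brute-force the discrete matching problem:
    min over permutations sigma of (1/n) * sum_i c(x_i, y_sigma(i))."""
    n = len(xs)
    assert len(ys) == n
    return min(
        sum(c(xs[i], ys[s[i]]) for i in range(n))
        for s in itertools.permutations(range(n))
    ) / n

# Example with the quadratic cost c(x, y) = |x - y|^2 on the line:
cost = discrete_matching_cost([0.0, 2.0], [3.0, 1.0], lambda x, y: (x - y) ** 2)
# the monotone pairing 0 -> 1, 2 -> 3 is optimal, with cost (1 + 1)/2 = 1
```

With the quadratic cost on the line, the monotone pairing is selected, as expected from the one-dimensional theory recalled in the Introduction.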

¹The weak topology is here intended as the topology induced by the duality with C_b(X).


1.3 Kantorovich Formulation and Wasserstein Distance

Even though the formulation given by Monge is the most intuitive one, it is not the most suitable to be attacked with mathematical tools. One of the issues of the formulation is the asymmetry in the statement: the measures µ and ν do not play the same role. Furthermore, it is not clear a priori whether the infimum is a minimum. Last but not least, in the Monge formulation it is not even obvious whether the class of transport maps is nonempty.

All of the said issues are addressed by the Kantorovich formulation, which relaxes the problem and, instead of looking for a function, looks for a measure.

Problem 1.3.1 (Kantorovich formulation). Let (X, B(X), µ) and (Y, B(Y), ν) be two Polish probability spaces and let c : X × Y → [0, ∞] be a Borel cost function. Investigate the following infimum and study whether it is a minimum:

\[ \inf\left\{ \int_{X \times Y} c(x, y) \,\mathrm{d}\pi(x, y) \;:\; \pi \in P(X \times Y) \text{ s.t. } (p_1)_{\#}\pi = \mu,\ (p_2)_{\#}\pi = \nu \right\}. \]

A Radon measure π ∈ P(X × Y) such that (p_1)_{#}π = µ and (p_2)_{#}π = ν is called an (admissible) transport plan. The set of admissible transport plans will be denoted by Γ(µ, ν). A transport plan that is also a minimizer is called an optimal transport plan.

There is a canonical morphism from transport maps to transport plans. Indeed, if f : X → Y is a transport map, then π_f = (id × f)_{#}µ is a transport plan and

\[ \int_X c(x, f(x)) \,\mathrm{d}\mu(x) = \int_{X \times Y} c(x, y) \,\mathrm{d}\pi_f(x, y). \]

Therefore the infimum in the Kantorovich formulation is always smaller than or equal to the infimum in the Monge formulation. We will see sufficient conditions for the equality of these two values.

The Kantorovich formulation enjoys properties that were missing in the Monge one: linearity and convexity. Indeed the functional to minimize is linear and the space of admissible plans is convex.

Let us stress that the existence of an admissible transport plan is obvious: the choice π = µ ⊗ ν satisfies all requirements. Moreover, the existence of a minimizer seems much more tractable in this new formulation; indeed the set of admissible transport plans is compact in the weak topology thanks to the Prokhorov Theorem. In fact, as soon as the cost is lower semicontinuous, the Kantorovich problem admits a minimizer:


Theorem 1.3.2. If c : X × Y → [0, ∞] is lower semicontinuous, then the infimum in the Kantorovich formulation is attained.

Sketch of proof. To show that the minimum is attained it is enough to follow the

typical approach of the direct method in the calculus of variations.

Indeed it is easy to see that the family of transport plans Γ(µ, ν) (for some fixed measures µ and ν) is tight and therefore it is compact in the weak topology as stated in Theorem 1.1.2 (Prokhorov).

Furthermore, the lower semicontinuity of c implies the lower semicontinuity of the functional π ↦ ∫_{X×Y} c(x, y) dπ(x, y) with respect to the weak topology, hence the result is proven.

It is now time to give some definitions that are very useful when the cost is a power of the distance function.

Definition 1.3.3. Given 1 ≤ p < ∞, let us assume that (X, d) is a metric space, X = Y, and the cost is c(x, y) = d(x, y)^p.

We will denote by P_p(X) ⊆ P(X) the space of probability measures µ such that ∫_X d(x, x_0)^p dµ(x) < ∞, where x_0 is an arbitrary point in X.

Given two measures µ, ν ∈ P_p(X), let us define the Wasserstein distance W_p(µ, ν) between them as the infimum in the Kantorovich formulation with cost c(x, y) = d(x, y)^p.

It is natural to ask which properties the Wasserstein distance has. Let us summarize in a clear statement the most crucial properties of this metric space.

Proposition 1.3.4. For any 1 ≤ p < ∞, the space (P_p(X), W_p) is a complete metric space. Furthermore, the induced topology is almost the weak topology; namely, given a sequence (µ_n)_{n∈N} ⊆ P_p(X) and a measure µ ∈ P_p(X), the following statements are equivalent:

• The sequence µ_n converges to µ with respect to the W_p distance.

• The sequence converges weakly, µ_n ⇀ µ, and furthermore, for an arbitrary x_0 ∈ X, it holds² that

\[ \int_X d(x, x_0)^p \,\mathrm{d}\mu_n(x) \to \int_X d(x, x_0)^p \,\mathrm{d}\mu(x). \]

The following proposition, albeit very easy, will play a crucial role in the random matching problem.

²Let us remark that if this holds for one x_0 ∈ X, then it holds for every x_0 ∈ X.


Proposition 1.3.5 (Joint Convexity of W_p^p). Let 1 ≤ p < ∞ be a fixed exponent and let µ_1, µ_2, ν_1, ν_2 ∈ P_p(X) be four probability measures. For any choice of α_1 + α_2 = 1 with α_1, α_2 ≥ 0 it holds that

\[ W_p^p(\alpha_1\mu_1 + \alpha_2\mu_2,\ \alpha_1\nu_1 + \alpha_2\nu_2) \le \alpha_1 W_p^p(\mu_1, \nu_1) + \alpha_2 W_p^p(\mu_2, \nu_2). \]

Proof. Let Σ_1, Σ_2 be two optimal transport plans between µ_1, ν_1 and µ_2, ν_2 respectively. It should be clear that Σ := α_1Σ_1 + α_2Σ_2 is a transport plan between α_1µ_1 + α_2µ_2 and α_1ν_1 + α_2ν_2. Therefore it holds that

\[ \begin{aligned} W_p^p(\alpha_1\mu_1 + \alpha_2\mu_2,\ \alpha_1\nu_1 + \alpha_2\nu_2) &\le \int_{X \times X} d(x, y)^p \,\mathrm{d}\Sigma(x, y) \\ &= \alpha_1 \int_{X \times X} d(x, y)^p \,\mathrm{d}\Sigma_1(x, y) + \alpha_2 \int_{X \times X} d(x, y)^p \,\mathrm{d}\Sigma_2(x, y) \\ &= \alpha_1 W_p^p(\mu_1, \nu_1) + \alpha_2 W_p^p(\mu_2, \nu_2), \end{aligned} \]

and that is the desired result.
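The proposition can be illustrated numerically in the special case α_1 = α_2 = 1/2 for one-dimensional empirical measures, where W_2^2 is computed by the monotone (sorted) matching and the mixture of two n-point empirical measures is the empirical measure on the union of the 2n points. The helper below is an illustrative sketch, not part of the thesis:

```python
def w22_empirical(xs, ys):
    """W_2^2 between the uniform empirical measures on two equal-size point
    sets on the real line: in dimension 1 the sorted matching is optimal."""
    assert len(xs) == len(ys)
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Mixtures with alpha_1 = alpha_2 = 1/2 are empirical measures on the unions.
mu1, nu1 = [0.0, 0.4], [0.1, 0.9]
mu2, nu2 = [0.2, 0.8], [0.3, 0.5]
lhs = w22_empirical(mu1 + mu2, nu1 + nu2)  # W_2^2 between the two mixtures
rhs = 0.5 * w22_empirical(mu1, nu1) + 0.5 * w22_empirical(mu2, nu2)
# joint convexity guarantees lhs <= rhs
```

In this instance the inequality is strict: mixing lets mass from one pair of measures compensate for the other.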

The last theorem (Theorem 1.3.2) assures us of the existence of minimizers, but it says nothing about them. In order to get a characterization of the optimal transport plans we have to investigate the problem much more in depth. Before doing that in the general case, let us anticipate the results in the discrete case, where the technicalities do not hide the ideas.

Remark 1.3.6 (Discrete Kantorovich problem). Let us assume that X = Y = {1, 2, …, n} and that µ = ν are equal to the counting measure (up to renormalization). Given an admissible transport plan π ∈ P({1, …, n}²), we can consider the associated doubly stochastic³ matrix Π ∈ M(n × n) such that for any 1 ≤ i, j ≤ n it holds Π_{ij} = nπ((i, j)). In this new matrix formulation, the quantity to be minimized is ∑_{i,j} Π_{ij} c(i, j).

First of all, recalling the Monge formulation, we would like to show that there exists a minimizer that is a permutation matrix. Such a result follows quite easily from the classical Birkhoff-von Neumann theorem (see the classic paper [Von53, Lemma 2] for a proof).

Theorem 1.3.7 (Birkhoff-von Neumann). The extreme points4 of the set of doubly stochastic matrices are the permutation matrices.

The set of doubly stochastic matrices is convex and the functional to be minimized is clearly linear; hence there exists a minimizer that is an extreme point of the set of doubly stochastic matrices. The cited theorem tells us that the extreme points are exactly the permutation matrices, and therefore we have shown that Kantorovich's minimum equals Monge's minimum in the discrete setting.

³A matrix is doubly stochastic if all the entries are nonnegative and the sum over each row and each column is 1.

Even though the approach we have followed to show that the Monge and Kantorovich problems have the same minimum is quite distant from the approach in the general case, the strategy we are going to outline for the characterization of the minimizer is strongly influenced by the strategy in the general case.

Let us assume that Π ∈ M(n × n) is a doubly stochastic matrix. We would like to find a characterization of the minimizers, hence we can try a variational approach. What is the simplest admissible variation of Π that does not change the fact that it is doubly stochastic? Let (i_h, j_h)_{1≤h≤k} ⊆ {1, …, n}² be a finite sequence of pairs such that Π_{i_h j_h} > 0 for any 1 ≤ h ≤ k (with the cyclic convention j_{k+1} = j_1). For any ε > 0 small enough, the matrix

\[ \Pi - \varepsilon \sum_{h=1}^{k} \left( \delta_{i_h j_h} - \delta_{i_h j_{h+1}} \right) \]

is still doubly stochastic and therefore is associated with a valid transport plan. The variation of the cost functional is

\[ \varepsilon \left( \sum_{h=1}^{k} c(i_h, j_{h+1}) - \sum_{h=1}^{k} c(i_h, j_h) \right); \]

therefore, if Π was a minimizer, it must hold that

\[ \sum_{h=1}^{k} c(i_h, j_{h+1}) \ge \sum_{h=1}^{k} c(i_h, j_h). \]

This is a necessary condition on the positive entries of a minimizing matrix. It is true, albeit far from obvious, that this condition is also sufficient. In the general case this condition corresponds to the c-monotonicity of the support of the plan, and it will be one of the main topics of the next section.
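For small instances the necessary condition can be checked mechanically. The following sketch (illustrative names; brute force, hence only for tiny n) computes a minimizing permutation and verifies the inequality over all chains of matched pairs of a given length:

```python
import itertools

def optimal_permutation(cost):
    """Brute-force minimizer of sum_i cost[i][sigma(i)] over permutations."""
    n = len(cost)
    return min(itertools.permutations(range(n)),
               key=lambda s: sum(cost[i][s[i]] for i in range(n)))

def satisfies_condition(cost, sigma, k):
    """Check the necessary condition on the matched pairs (i_h, sigma(i_h)):
    for every chain of k distinct rows, cyclically shifting the matched
    columns along the chain cannot decrease the total cost."""
    for chain in itertools.permutations(range(len(cost)), k):
        matched = sum(cost[i][sigma[i]] for i in chain)
        shifted = sum(cost[chain[h]][sigma[chain[(h + 1) % k]]]
                      for h in range(k))
        if shifted < matched - 1e-12:
            return False
    return True

# Example: quadratic cost between two small point sets on the line.
xs, ys = [0.1, 0.6, 0.9, 0.3], [0.8, 0.2, 0.05, 0.7]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
sigma = optimal_permutation(cost)
```

Since a cyclic shift of the matched columns produces another admissible permutation, the brute-force minimizer necessarily passes the check for every chain length.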

1.4 Duality Theory and Characterization of Minimizers

In this section we assume that both the spaces X, Y and a Lipschitz continuous and bounded cost c : X × Y → [0, ∞) are fixed. The strong assumptions on the cost will be relaxed at the end of this section.

We are going to give just an overview of the duality theory related to optimal transport. The main ideas are inherited from convex analysis and specialized to the problem at hand. An introduction to convex analysis where all these concepts are investigated in detail is the wonderful book [Roc70] by Rockafellar.

Let us start with the already mentioned definition of c-monotonicity.

Definition 1.4.1 (c-monotonicity). A Borel set S ⊆ X × Y is c-monotone if and only if for any finite sequence of pairs ((x_i, y_i))_{1≤i≤n} ⊆ S it holds that

\[ \sum_{i=1}^{n} c(x_i, y_i) \le \sum_{i=1}^{n} c(x_i, y_{i+1}), \]

where the indexes are intended cyclically (y_{n+1} = y_1).

As already shown in the discrete setting, the c-monotonicity of the support of the plan is equivalent to optimality.

Theorem 1.4.2 (c-monotone support ⟺ optimal plan). For an admissible transport plan π ∈ Γ(µ, ν) such that ∫ c dπ < ∞ the following statements are equivalent:

• The plan π is a minimizer for the Kantorovich problem.

• The plan π is supported on a c-monotone set.

The fact that the c-monotonicity of the support is necessary can be proven following the same strategy we outlined in the discrete case, whereas the other implication is much deeper and needs some more tools to be conquered.

Let us continue by giving the related definitions of c-concavity, c-conjugate and c-superdifferential. These three concepts are generalizations of classical notions in convex analysis: convexity, the Fenchel conjugate and the subdifferential.

Definition 1.4.3 (c-concavity). A function f : X → R ∪ {−∞} is c-concave if there exists a set of pairs {(y_i, α_i)}_{i∈I} ⊆ Y × R such that

\[ f(x) = \inf_{i \in I}\ c(x, y_i) - \alpha_i. \]

Let us remark that, differently from the standard definition of concavity, the definition directly implies that a c-concave function is upper semicontinuous.

Definition 1.4.4 (c-conjugate). Given a function f : X → R ∪ {−∞}, let us denote by f^c : Y → R ∪ {−∞} the c-conjugate of f, defined as

\[ f^c(y) := \inf_{x \in X}\ c(x, y) - f(x). \]

Definition 1.4.5 (c-superdifferential). Given a function f : X → R ∪ {−∞} and a point x ∈ X such that f(x) ∈ R, let the c-superdifferential at x be the set

\[ \partial_c f(x) = \{\, y \in Y : f(x_0) - c(x_0, y) \le f(x) - c(x, y)\ \ \forall x_0 \in X \,\}. \]

For the sake of brevity we will not spend a lot of words on the given definitions, but let us give a list of straightforward but useful results concerning them.

• Simply swapping the roles of X and Y in the definition allows us to define also the conjugate of a function with domain Y. As a consequence, it makes perfect sense to talk about the conjugate of the conjugate of a function, i.e. f^{cc}. Furthermore, in the same way, we also have a notion of c-superdifferential for functions with domain Y.

• The conjugate of a function is always c-concave.

• For any f : X → R ∪ {−∞} it holds f^{cc} ≥ f, and equality holds if and only if f is c-concave.

• Given f : X → R ∪ {−∞}, for any x ∈ X and y ∈ Y it holds f(x) + f^c(y) ≤ c(x, y). The equality f(x) + f^c(y) = c(x, y) holds if and only if y ∈ ∂_c f(x). Hence, by symmetry, we have y ∈ ∂_c f(x) ⟺ x ∈ ∂_c f^c(y).

Remark 1.4.6. Before going on with the duality theory, let us take some time to find out what a c-concave function is when the cost c is the squared distance in the Euclidean space. Let us fix X = Y = R^d and c(x, y) = |x − y|². With this choice of the cost, a function f : R^d → R ∪ {−∞} is c-concave if and only if there is a set of pairs {(y_i, α_i)}_{i∈I} ⊆ R^d × R such that

f(x) = inf_{i∈I} |x − y_i|² − α_i = |x|² + inf_{i∈I} −2⟨x, y_i⟩ + (|y_i|² − α_i) ,

and that is clearly equivalent to the concavity of the function f(x) − |x|², as the latter is an infimum of affine functions.
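The properties above can be probed numerically. The following sketch is an illustration of ours, not part of the thesis: the one-dimensional grid, the two test functions and the helper `c_conjugate` are all hypothetical choices. With cost c(x, y) = |x − y|², the double c-conjugate f^{cc} dominates f, with equality precisely when f(x) − x² is concave (so that f is c-concave).

```python
import numpy as np

# Hypothetical numerical illustration (not from the thesis): a discretized
# c-transform on a 1-D grid, with cost c(x, y) = |x - y|^2.

xs = np.linspace(-1.0, 1.0, 201)
cost = (xs[:, None] - xs[None, :]) ** 2      # matrix of c(x_i, y_j)

def c_conjugate(f):
    # f^c(y) = inf_x c(x, y) - f(x), one infimum per grid column
    return np.min(cost - f[:, None], axis=0)

# f1(x) - x^2 = -|x|/2 is concave, so f1 is c-concave
# (its c-superdifferential stays inside the grid).
f1 = xs**2 - 0.5 * np.abs(xs)
# f2(x) - x^2 = x^4 - x^2 is not concave, so f2 is not c-concave.
f2 = xs**4

for f in (f1, f2):
    fcc = c_conjugate(c_conjugate(f))
    assert np.all(fcc >= f - 1e-9)           # f^{cc} >= f always

assert np.allclose(c_conjugate(c_conjugate(f1)), f1, atol=1e-8)
assert np.max(c_conjugate(c_conjugate(f2)) - f2) > 0.1
```

Note that on a bounded grid a function is c-concave only relative to the truncated set of slopes, which is why f1 was chosen with small gradient.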

We are now ready to give the characterization of c-monotonicity.

Theorem 1.4.7. For a subset S ⊆ X × Y the following statements are equivalent:

1. The subset S is c-monotone.

2. There exists a c-concave function f : X → R ∪ {−∞} such that for any (x, y) ∈ S it holds y ∈ ∂_c f(x). Furthermore the function can be chosen such that […]


The previous theorem asserts that a subset is c-monotone if and only if it is contained in the graph of the c-superdifferential of a c-concave function. The implication 2) =⇒ 1) is straightforward, whereas the other implication is far from obvious and is a cornerstone of the theory.

Let us state the main duality theorem in the optimal transport theory. It relates the minimum of the Kantorovich problem with the supremum of a dual problem.

Theorem 1.4.8. For any pair of probability measures µ ∈ P(X) and ν ∈ P(Y), the following equality holds:

inf { ∫_{X×Y} c(x, y) dπ(x, y) : π ∈ Γ(µ, ν) }
= sup { ∫_X f dµ + ∫_Y g dν : f ∈ Lip_b(X), g ∈ Lip_b(Y), f(x) + g(y) ≤ c(x, y) for all (x, y) ∈ X × Y } .

Furthermore both the infimum (as already remarked) and the supremum are attained.

Proof of the easy inequality. Let us prove only the inequality sup ≤ inf. Let π ∈ Γ(µ, ν) be an admissible plan and let f ∈ Lip_b(X), g ∈ Lip_b(Y) be such that f(x) + g(y) ≤ c(x, y). Applying the definitions, the desired inequality follows at once:

∫_{X×Y} c(x, y) dπ(x, y) ≥ ∫_{X×Y} f(x) + g(y) dπ(x, y) = ∫_X f(x) dµ(x) + ∫_Y g(y) dν(y) .

Furthermore let us remark that equality holds if and only if f(x) + g(y) = c(x, y) π-almost everywhere. If that is the case, we do not only have equality: we also gain the optimality of the plan π.

We are now ready to tackle the hard implications of Theorem 1.4.2 (c-monotone support ⇐⇒ optimal plan) and Theorem 1.4.8. Indeed, using the easy implications of both in conjunction with Theorem 1.4.7, the results follow naturally.

Sketch of the hard implications of Theorems 1.4.2 and 1.4.8. Let π ∈ Γ(µ, ν) be a transport plan supported on a c-monotone set S ⊆ X × Y.

Thanks to Theorem 1.4.7 we can find a c-concave function f : X → R ∪ {−∞} such that S ⊆ graph(∂_c f). Hence we can deduce that f(x) + f^c(y) = c(x, y) for π-almost every (x, y) ∈ X × Y. That proves the hard inequality in Theorem 1.4.8 and, by the equality case remarked at the end of the previous proof, also the optimality of π.

It is now time to upgrade the results we stated for a Lipschitz cost to a more general lower semicontinuous cost. The generalizations are fairly technical and can be obtained mostly through careful approximations of the cost. We are going to state the results without proof.

Theorem 1.4.9. Let µ ∈ P(X) and ν ∈ P(Y) be two probability measures. If the cost c : X × Y → [0, ∞] is lower semicontinuous and there exist two functions a ∈ L¹(µ) and b ∈ L¹(ν) such that c(x, y) ≤ a(x) + b(y), then Theorem 1.4.8 holds, but the supremum is attained by functions f ∈ L¹(µ) and g ∈ L¹(ν) that are not necessarily Lipschitz.

Let us remark that the technical assumption c(x, y) ≤ a(x) + b(y) automatically gives the finiteness of the minimum in the Kantorovich formulation: indeed the transport plan π = µ ⊗ ν has cost

∫_{X×Y} c(x, y) dπ(x, y) ≤ ∫_X a(x) dµ(x) + ∫_Y b(y) dν(y) < ∞ .

Theorem 1.4.10. Let µ ∈ P(X) and ν ∈ P(Y) be two probability measures. If the cost c : X × Y → [0, ∞] is lower semicontinuous and there exist two functions a ∈ L¹(µ) and b ∈ L¹(ν) such that c(x, y) ≤ a(x) + b(y), then a transport plan π ∈ Γ(µ, ν) is optimal if and only if it is supported on a c-monotone set.

1.5 Brenier and McCann Theorems

After our long digression on the Kantorovich formulation, it is now time to exploit the theory we have developed to gain some understanding of the original Monge formulation.

We are going to consider only the Hilbertian case, namely when the cost is the square of the distance. This assumption is very convenient in view of the connection between c-concavity and ordinary concavity, a much more studied and understood concept.

Let us start with Brenier's theorem, which solves the Monge formulation of optimal transport when X = Y = R^d and the cost is the square of the distance.

Theorem 1.5.1 (Brenier theorem). Let us work in the setting X = Y = R^d with cost c(x, y) = |x − y|² and let us consider two probability measures µ, ν ∈ P₂(R^d) such that µ ≪ L^d. Then the Monge problem and the Kantorovich problem have the same minimum value. Furthermore there exists a unique optimal transport map T : R^d → R^d (up to negligible sets) in the Monge formulation and there exists a convex function f : R^d → R ∪ {∞} such that T = ∇f. Lastly, if the gradient ∇f of a convex function f : R^d → R ∪ {∞} is an admissible transport map between µ and ν, then it is the optimal transport map.


As anticipated, we are not going to give a proof of the theorem. It can be found in any of the aforementioned monographs or in the original papers [Bre87; Bre91] by Brenier.

Let us just give a sketchy idea of the approach.

Sketch of Theorem 1.5.1 (Brenier theorem). Let π ∈ Γ(µ, ν) be an optimal transport plan. Thanks to Theorems 1.4.2 and 1.4.7 we can find a c-concave function g : R^d → R ∪ {−∞} such that supp(π) ⊆ graph(∂_c g). As we have shown in Remark 1.4.6, we immediately deduce that f(x) = ½(|x|² − g(x)) is a convex function.

It is a simple computation to verify that ∂_c g = ∂f, where ∂f denotes the standard subdifferential. Hence, recalling that f is differentiable almost everywhere thanks to Alexandrov theorem (see [Vil08, Theorem 14.1]), we have proven that supp(π) ⊆ graph(∇f). It is now a matter of routine checks to verify that the plan π is induced by the transport map ∇f.

Brenier’s theorem states that if a map T is the gradient of a convex function f : R^d → R ∪ {∞} and maps µ into ν, then it is optimal. We would now like to compute (assuming smoothness) the condition on the function f ensuring that its gradient maps one measure into the other.

Let µ = ρ₁ L^d and ν = ρ₂ L^d. We have to impose

(∇f)_# (ρ₁ L^d) = ρ₂ L^d

and, applying the area formula, this is equivalent to the validity of

det ∇²f = ρ₁ / (ρ₂ ∘ ∇f)

at µ-almost every point.

This last equation is known in the literature as the Monge-Ampère equation and will be one of the main ingredients of the heuristic that leads to the PDE approach to the matching problem. Even though we will not dig into this field, we must mention that this equation gives rise to a variety of questions related to the regularity of its solutions. A complete survey on the topic, describing the main results known in the literature, is [DF14].
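As a quick sanity check of the relation above (an illustration of ours, not part of the thesis; the Gaussian densities and the affine map are hypothetical choices), in dimension one the Monge-Ampère equation reads f″(x) = ρ₁(x)/ρ₂(f′(x)), and the optimal map from N(0, 1) to N(m, s²) is the gradient of the convex function f(x) = mx + sx²/2:

```python
import math

# Hypothetical 1-D check (not from the thesis) of the Monge-Ampere equation
# f''(x) = rho1(x) / rho2(f'(x)) for the affine optimal map between
# Gaussians: T = f' with f(x) = m*x + s*x**2/2 pushes N(0,1) to N(m, s**2).

def phi(x):
    # standard Gaussian density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

m, s = 1.3, 0.7
rho1 = phi                                  # density of N(0, 1)
rho2 = lambda y: phi((y - m) / s) / s       # density of N(m, s**2)

T = lambda x: m + s * x                     # T = f', convex f since s > 0
f_second = s                                # f''(x) = s, constant

for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    assert abs(f_second - rho1(x) / rho2(T(x))) < 1e-12
```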

It is natural to ask ourselves whether Brenier's theorem can be extended to a curved ambient space. This question is positively answered by the following theorem of McCann (the original paper is [McC01]).

Theorem 1.5.2 (McCann theorem). Let (M, d) be a compact C² Riemannian manifold and let m be its volume measure. Let us assume X = Y = M and c(x, y) = d(x, y)². Given two measures µ, ν ∈ P₂(M) with µ ≪ m, the Monge and Kantorovich problems have the same minimum value and there exists an optimal transport map T : M → M such that T = exp(−∇f), where f : M → R is a c-concave function.


Let us underline that the property of c-concavity is not extremely transparent when we are working on a manifold. We will address this issue in one of the following chapters, managing to obtain a condition on f : M → R that implies both the c-concavity and the optimality of exp(−∇f ).

1.6 Benamou-Brenier Formula

In this last section we are going to state, with a glimpse of motivation, the Benamou-Brenier formula. It gives a new way to compute the Wasserstein distance between two measures as the minimal length of connecting curves. This is the first step toward a differential viewpoint on the space (P₂, W₂) that we are not going to treat. The original paper introducing the formula is [BB00], whereas the papers where the differential viewpoint is described and investigated for the first time are [Stu06; LV09].

In this whole section the two ambient spaces coincide, X = Y, and they are either the Euclidean space R^d or a smooth complete Riemannian manifold M. We need a smooth ambient space in order to be able to write a continuity equation.

Until now we have never really transported mass: we should rather say that we have teleported it. Indeed, both in the Kantorovich and in the Monge formulation, we simply prescribe where the mass should go.

In the Monge formulation a transport map induces a way to transport the mass: it is sufficient to follow the geodesic from x to T (x).

Instead, in the Kantorovich formulation it is not even clear whether it makes sense to describe the flow of the mass. Hence let us start by giving an informal description of a flow plan. Let us assume that the transport happens in the time range t ∈ [0, 1]. A flow plan should specify where the mass is at any time t ∈ [0, 1] and how the mass is flowing at that time. With this intuition in mind, let us give the precise definition.

Definition 1.6.1 (Flow plan). Given two measures µ₀, µ₁ ∈ P₂(X), a flow plan is the joint information of a weakly continuous curve of measures (µ_t) ∈ C⁰([0, 1], P₂(X)), with endpoints µ₀ and µ₁, and a time-dependent Borel vector field (v_t)_{t∈[0,1]} on X such that the continuity equation holds in the distributional sense:

d/dt µ_t + div(v_t µ_t) = 0 .

Remark 1.6.2. Before going on, it is worth spending some time to understand the precise meaning of the continuity equation. When we say that (µ_t, v_t)_{t∈[0,1]} satisfy the continuity equation in the distributional sense, we mean that for any ϕ ∈ C^∞_c(X × (0, 1)) it holds

∫_0^1 ∫_X ( ∂ϕ/∂t (x, t) + ∇ϕ(x, t) · v_t(x) ) dµ_t(x) dt = 0 .

Under mild assumptions on the vector field, we have existence and uniqueness of the curve µ_t once µ₀ is fixed. Let us give a formal statement of what we said. We want to underline that the assumptions are far from being minimal; for a sharp result, consider looking into [AGS08, Chapter 8].

Theorem 1.6.3. Let (v_t)_{0≤t≤1} be a vector field, smooth in time and space, that is compactly supported (uniformly in time). For any µ₀ ∈ P₁(X), there is a unique solution to the continuity equation and it is given by µ_t := (F_t)_# µ₀, where F_t is the flow induced by the vector field (v_t).

The Benamou-Brenier formula relates the Wasserstein distance between µ₀ and µ₁ to the minimal cost of a flow plan. This allows us to prove upper bounds for the Wasserstein distance between two measures by exhibiting a flow plan connecting them.

Theorem 1.6.4 (Benamou-Brenier formula). Given two measures µ₀, µ₁ ∈ P₂(X), it holds

W₂²(µ₀, µ₁) = inf { ∫_0^1 ∫_X |v_t|² dµ_t dt : (µ_t, v_t) is a flow plan between µ₀ and µ₁ } .

Proof. Let us give only a sketch of the upper bound for the Wasserstein distance, with the additional assumption that v_t is a smooth vector field. We give only this part of the proof as it is nontechnical and, furthermore, it is the only part of the statement that we are going to use in the next chapters.

Let us fix a flow plan (µ_t, v_t) and let us assume that the vector field (v_t)_{0≤t≤1} is smooth and compactly supported. Let F_t : X → X be the flow induced by (v_t)_{0≤t≤1}.

As we have already remarked, it holds (F_t)_# µ₀ = µ_t, thus F₁ is an admissible transport map. Using that the distance function from any point is 1-Lipschitz, we are able to estimate the cost of the transport map F₁:

∫_X d(x, F₁(x))² dµ₀(x) = ∫_X ( ∫_0^1 d/dt d(x, F_t(x)) dt )² dµ₀(x)
≤ ∫_X ( ∫_0^1 |v_t(F_t(x))| dt )² dµ₀(x)
≤ ∫_X ∫_0^1 |v_t(F_t(x))|² dt dµ₀(x) = ∫_0^1 ∫_X |v_t|² dµ_t(x) dt ,

where the second inequality is Jensen's inequality.
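The upper bound just sketched can be probed numerically. The following is an illustration of ours, not part of the thesis: translating a measure with the constant vector field v_t ≡ v is a flow plan whose Benamou-Brenier cost is |v|², and in one dimension W₂² between an empirical measure and its translate can be computed exactly by matching sorted samples.

```python
import numpy as np

# Illustrative check (not from the thesis): translating an empirical measure
# on R by a constant vector field v_t = v is a flow plan of Benamou-Brenier
# cost v^2, which matches W_2^2 computed by monotone matching.

rng = np.random.default_rng(0)
x0 = np.sort(rng.normal(size=1000))   # samples of mu_0
v = 0.8                               # constant velocity field
x1 = x0 + v                           # mu_1 = mu_0 translated by v

# W_2^2 between two 1-D empirical measures with equal weights:
# match the sorted samples.
w2_squared = np.mean((np.sort(x1) - x0) ** 2)

bb_cost = v ** 2                      # int_0^1 int |v|^2 dmu_t dt
assert abs(w2_squared - bb_cost) < 1e-12
```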

2 HEAT EQUATION

The heat equation is one of the oldest and most natural evolution equations. Let us start by giving a very brief introduction to the heat equation in the Euclidean space, as it can be seen as a toy problem for the more general case we will face. This is a very well-known topic and a classical introductory reference is [Eva10, Section 2.3].

Using the classical notion of derivative in the Euclidean setting, the heat equation is given by ∂_t u = ∆u, where u : [0, ∞) × R^d → R is a twice-differentiable function. The initial datum is usually the value of u at time 0. Without any further assumption, the solution is not unique and it does not enjoy any integrability property. As soon as we ask that u(t) ∈ L²(R^d) for every time t ≥ 0, we gain both uniqueness and existence. Furthermore, through a classical application of the Fourier transform, we are also able to give an explicit formula for the solution u:

u(t, x) := ∫_{R^d} u(0, y) · e^{−|x−y|²/(4t)} / (4πt)^{d/2} dy .

The function

p_t(x, y) := e^{−|x−y|²/(4t)} / (4πt)^{d/2}

is called the heat kernel and is the fundamental solution of the heat equation in the Euclidean space.

As a consequence of the formula, it is clear that u is smooth at any positive time t > 0 (as it is expressed as a convolution with a smooth kernel). From a more qualitative point of view, it can be shown that, if u(0, ·) ∈ L¹, the integral of u is constant in time and, furthermore, u converges to 0 in the L²-norm as t goes to infinity. These last two properties may seem to be in contradiction, but this is not the case: the space is not compact, and thus mass can escape to infinity.
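A small numerical check of the formulas above (our illustration, not part of the thesis): the heat kernel satisfies the semigroup identity ∫ p_s(x, z) p_t(z, y) dz = p_{s+t}(x, y), the kernel-level counterpart of the semigroup property of the solution operator. The sketch below verifies it in dimension d = 1 by quadrature on a large grid.

```python
import numpy as np

# Illustrative check (not from the thesis) of the semigroup identity of the
# 1-D Euclidean heat kernel, approximated by trapezoidal quadrature.

def p(t, x, y):
    return np.exp(-(x - y) ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)

s, t = 0.3, 0.5
x, y = -0.7, 1.2
z = np.linspace(-30.0, 30.0, 200001)
dz = z[1] - z[0]

integrand = p(s, x, z) * p(t, z, y)
lhs = np.sum((integrand[1:] + integrand[:-1]) * 0.5) * dz
rhs = p(s + t, x, y)
assert abs(lhs - rhs) < 1e-8
```

For a smooth, rapidly decaying integrand the trapezoidal rule is extremely accurate, so the tolerance above is comfortable.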

Let us remark that, for a large part of the observations described above, our main tool has been the explicit formula obtained via the Fourier transform. As strong as this approach is, it is also far from robust: as soon as nothing resembling a Fourier transform is available, it becomes unfeasible. In this chapter we will prove more or less all of the said observations, and something more, in the more general setting of a smooth compact Riemannian manifold. There are multiple ways to approach the problem; let us enumerate them:

• Applying the general theory of parabolic equations we can easily gain existence and smoothness of the solution. This way is very concise, but it would be arduous to obtain the global estimates we need (in fact the theory would give us mostly local estimates). Furthermore this approach is extremely technical.

• Exploiting the spectral properties of the Laplace operator we could obtain more or less all of the desired properties. We decided to follow a different approach, mainly because this one is harder to generalize to weighted manifolds and impossible to generalize to the general setting of metric measure spaces.

• Using the gradient flow structure of the heat equation we are able to recover the existence, uniqueness and smoothness of the solution in a synthetic and elegant way. This is the approach we will follow.

• The most common way in the literature (see [Cha84, Chapter 6] or [Ros97, Chapter 3]) to construct the heat kernel on a Riemannian manifold is through a delicate iterative construction that is at the same time very computationally demanding and hard to grasp. Nevertheless, as always, the extra work pays off, and indeed with this approach it is possible to obtain sharp estimates on the heat kernel that are harder to get with the other approaches. We postpone the statements of these estimates to the chapter where we will need them.

2.1 Existence and Uniqueness of the Solution

Let us start by stating the general existence theorem we are going to apply. It is the main result of the gradient flow theory and it states that, under minimal assumptions, the curve of steepest descent (with respect to an energy) exists and is unique. Such a result is classical and can be found in [Bré73]. The theorem, with adequate modifications, is valid in the much more general setting of metric measure spaces, as shown in the monograph [AGS08].

Before stating the result let us recall the definition of absolutely continuous curve and its main properties.

Definition 2.1.1 (Absolutely Continuous Curve). Let (H, ⟨·, ·⟩) be a Hilbert space. Given an open interval I ⊆ R, a curve γ : I → H is absolutely continuous if for any ε > 0 there exists δ > 0 such that, for any family of disjoint intervals ([x_i, y_i])_{1≤i≤k} ⊆ I with Σ |y_i − x_i| < δ, it holds Σ |γ(y_i) − γ(x_i)| < ε.

(31)

2.1. Existence and Uniqueness of the solution

The set of absolutely continuous curves from I into H is denoted by AC(I, H). Of course we can also talk about locally absolutely continuous curves; such curves will be denoted by AC_loc. It is important to remark that the same definition makes sense, unchanged, for absolutely continuous curves with values in a metric space. We have not stated it in that generality because we will never need it. The main result concerning absolutely continuous curves is that they satisfy the fundamental theorem of calculus; see [AGS08, Remark 1.1.3].

Theorem 2.1.2 (Fundamental Theorem of Calculus). Let (H, ⟨·, ·⟩) be a Hilbert space and I ⊆ R be an open interval. Any absolutely continuous curve γ ∈ AC(I, H) is differentiable¹ at almost every t ∈ I. Denoting by γ′ the derivative, we have that |γ′| ∈ L¹(I) and, furthermore, for any a, b ∈ I with a < b it holds

γ(b) − γ(a) = ∫_a^b γ′(t) dt .

Furthermore, in order to state an estimate on the solution of a gradient flow, it is handy to give the definition of gradient of a convex function.

Definition 2.1.3 (Gradient of Convex Function). Let H be a Hilbert space. Given a convex function F : H → (−∞, ∞] and a point x₀ ∈ dom(∂F), we define the gradient of F at x₀, denoted by ∇F(x₀), as the element of minimum norm in ∂F(x₀).

We are finally done with the preliminary results; hence we can state the main theorem about gradient flows. It can be found both in the original [Bré73] and in the book [Eva10, Chapter 9].

Theorem 2.1.4 (Komura-Brezis). Let (H, ⟨·, ·⟩) be a Hilbert space and F : H → [0, ∞] be a lower semicontinuous convex function. For any point x₀ in the closure of the domain of F, there exists a unique curve x ∈ AC_loc((0, ∞), H) such that:

• It holds lim_{t→0} x(t) = x₀.

• For almost every t > 0 it holds x′(t) ∈ −∂F(x(t)).

• For every t > 0, x(t) ∈ dom(∂F) and

|∇F(x(t))|² ≤ inf_{y ∈ dom(∂F)} ( |∇F(y)|² + 1/t² · |x₀ − y|² ) .

¹In this context differentiable is understood in the classical sense: the incremental ratios converge in the norm of H.

With the gradient flow theory in mind, we are ready to say what it means for us to solve the heat equation. We are going to work on a compact connected Riemannian manifold M without boundary. In the last section of this chapter we will outline how this theory works on weighted manifolds (possibly with boundary). From now on, for the whole chapter, (M, g) will denote a compact connected Riemannian manifold without boundary. The volume measure, which is assumed without loss of generality to be a probability measure, is denoted by m.

Definition 2.1.5 (Heat Equation). Given a function f ∈ L²(M), a solution of the heat equation with initial datum f is a curve of functions u ∈ AC_loc((0, ∞), L²(M)) such that:

• It holds lim_{t→0} u(t) = f, where the limit is understood in the L²-norm;

• For almost every t > 0, it holds² u′(t) = ∆u(t).

Applying Theorem 2.1.4 (Komura-Brezis) it is easy to prove existence and uniqueness for the heat equation.

Proposition 2.1.6. For any initial datum f ∈ L²(M), there exists a unique solution to the heat equation. Furthermore, denoting by u ∈ AC_loc((0, ∞), L²(M)) the solution, for any t > 0 it holds

∫_M |∆u(t)|² dm ≤ 1/t² · ∫_M |f|² dm .

Proof. Given the definition of solution and the stated theorem about gradient flows, Theorem 2.1.4 (Komura-Brezis), it is enough to find a convex energy whose subdifferential is the opposite of the Laplace operator (i.e. −∆) to obtain the existence and uniqueness part of the statement.

The energy we are looking for is the Dirichlet energy F : L²(M) → [0, ∞], that is

F(f) := ½ ∫_M |∇f|² dm  if f ∈ W^{1,2}(M) ,  and  F(f) := +∞  if f ∉ W^{1,2}(M) .

Of course F is convex and lower semicontinuous. Let us prove that its subdifferential coincides with the opposite of the Laplace operator.

When f ∉ W^{1,2}(M), the claimed result holds: indeed the subdifferential is empty by definition and the Laplace operator of f does not belong to L²(M). We are implicitly using that if ∆f ∈ L²(M) then f ∈ W^{2,2}(M). This simple elliptic estimate (that can be shown directly applying the Bochner identity) will be generalized to higher derivatives in Proposition 2.2.3.

²Here the Laplace operator has to be understood in the weak sense. Nonetheless, as we will prove later in this chapter, the solution is in fact smooth at any positive time.

Hence we can assume that f ∈ W^{1,2}(M). By an equivalent characterization of the subdifferential, we know that s ∈ ∂F(f) if and only if for any g ∈ L²(M) there exists λ_g ≤ 0 such that for any t ∈ R it holds

F(f + tg) − F(f) − t⟨s, g⟩ ≥ λ_g t² .

If g ∉ W^{1,2}(M) the desired inequality holds for any s ∈ L²(M); hence we can assume that g ∈ W^{1,2}(M). Through some easy computations the last inequality can be transformed into

t ∫_M ∇f · ∇g − sg dm ≥ t² (λ_g − F(g)) ,

thus

∫_M ∇f · ∇g − sg dm = 0 ,

which is equivalent to s = −∆f.

The estimate part of the statement follows from the estimate in Theorem 2.1.4 (Komura-Brezis) if we put y = 0.
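The regularization estimate of Proposition 2.1.6 has a transparent finite-dimensional analogue (our illustration, not part of the thesis; the discrete circle and its graph Laplacian are hypothetical stand-ins for (M, ∆)). For a negative semidefinite symmetric matrix L, the semigroup e^{tL} satisfies ‖L e^{tL} f‖ ≤ sup_{λ≥0} λe^{−tλ} ‖f‖ = ‖f‖/(et) ≤ ‖f‖/t:

```python
import numpy as np
from scipy.linalg import expm

# Finite-dimensional analogue (not from the thesis): on a discrete circle the
# heat semigroup is e^{tL} for the graph Laplacian L, and the estimate of
# Proposition 2.1.6, ||L u(t)||_2 <= ||f||_2 / t, can be checked directly.

n = 64
L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
L[0, -1] = L[-1, 0] = 1                    # periodic boundary (a circle)

rng = np.random.default_rng(1)
f = rng.normal(size=n)                     # rough initial datum

for t in (0.1, 1.0, 10.0):
    u = expm(t * L) @ f                    # solution of u' = Lu at time t
    assert np.linalg.norm(L @ u) <= np.linalg.norm(f) / t + 1e-12
```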

Once we have proven uniqueness and existence for an evolution equation like the heat equation, it is natural to consider the associated semigroup.

Definition 2.1.7. For any t > 0, let us define the operator P_t : L²(M) → L²(M) that maps a function f ∈ L²(M) to u(t), where u ∈ AC_loc((0, ∞), L²(M)) is the unique solution to the heat equation with initial datum f.

Remark 2.1.8. Because of the uniqueness of the solution, P_t has the semigroup property P_{s+t} = P_s ∘ P_t. Furthermore, thanks to the linearity of the heat equation, the operator P_t is linear. Lastly, the image of P_t is contained in W^{2,2}(M) for any t > 0, as ∆P_t(f) ∈ L²(M).

2.2 Self-Adjointness and Regularity

The heat equation on R^d shows a regularization property. It is natural to ask ourselves whether that is still true on a manifold. Of course, the answer is positive and we are going to prove it.

Somewhat unexpectedly, the property that allows us to start the bootstrap argument gaining the regularity is the fact that the operator P_t : L²(M) → L²(M) is self-adjoint.

Proposition 2.2.1. For any time t > 0, the operator P_t : L²(M) → L²(M) is self-adjoint.

Proof. Let us fix two functions f, g ∈ L²(M) and a time t > 0. Let us define the function ψ : [0, t] → R as

ψ(s) := ⟨P_s(f), P_{t−s}(g)⟩ .

If we prove that ψ is constant, the thesis follows from ψ(0) = ψ(t). The function ψ is locally absolutely continuous and its derivative is

ψ′(s) = ⟨∆P_s(f), P_{t−s}(g)⟩ − ⟨P_s(f), ∆P_{t−s}(g)⟩ = −⟨∇P_s(f), ∇P_{t−s}(g)⟩ + ⟨∇P_s(f), ∇P_{t−s}(g)⟩ = 0 ,

hence the result is proven.

We can now prove that the Laplace operator and P_t commute. This will be the main ingredient in the bootstrap argument. Let us remark that this is true in general for semigroups induced by self-adjoint linear operators (as in the Hille-Yosida theorem).

Proposition 2.2.2. If f ∈ W^{2,2}(M), then, for any t > 0, it holds

∆P_t(f) = P_t(∆f) .

Proof. Let us fix an additional function g ∈ W^{2,2}(M).

Thanks to Proposition 2.2.1 we know that the quantity ⟨P_t(f), g⟩ − ⟨f, P_t(g)⟩ is independent of t. Hence, differentiating it and applying once again Proposition 2.2.1, we get

0 = ⟨∆P_t(f), g⟩ − ⟨f, ∆P_t(g)⟩ = ⟨∆P_t(f), g⟩ − ⟨P_t(∆f), g⟩ ,

and this easily implies our result.
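Both Proposition 2.2.1 and Proposition 2.2.2 also have immediate finite-dimensional counterparts (our illustration, not part of the thesis; the discrete circle Laplacian is a hypothetical stand-in for ∆): for a symmetric matrix L, the matrix exponential e^{tL} is symmetric and commutes with L.

```python
import numpy as np
from scipy.linalg import expm

# Finite-dimensional illustration (not from the thesis): for the symmetric
# graph Laplacian L of a discrete circle, the semigroup e^{tL} is symmetric
# (cf. Proposition 2.2.1) and commutes with L (cf. Proposition 2.2.2).

n = 32
L = -2 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
L[0, -1] = L[-1, 0] = 1                # periodic boundary

Pt = expm(0.5 * L)
assert np.allclose(Pt, Pt.T)           # self-adjointness of P_t
assert np.allclose(L @ Pt, Pt @ L)     # commutation of L and P_t
```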

The main technical tool we will adopt to gain regularity is a very classical elliptic estimate that relates the Sobolev norm of the Laplacian to the Sobolev norm of the function. It can be proven first on R^n and then generalized to a Riemannian manifold using a partition of unity. This result, for general elliptic operators on open sets of R^n, can be found in [GT15, Thm. 8.13].

Proposition 2.2.3. For any nonnegative integer k ∈ N, there exists a constant C₁(k, M) such that for any function f ∈ W^{k+2,2}(M) it holds

‖f‖_{W^{k+2,2}(M)} ≤ C₁(k, M) ( ‖f‖_{L²(M)} + ‖∆f‖_{W^{k,2}(M)} ) .

The bootstrap argument that shows regularity for a fixed time t > 0 is now ready to be described. In order to gain classical smoothness of the solution, we do need a theorem transforming Sobolev regularity into classical regularity. Such a theorem is, of course, the Sobolev embedding theorem. We state only the part of the theorem we do need (mainly because the full statement is quite long). It can be found in full generality in [Eva10, Section 5.6.3].

Theorem 2.2.4 (Sobolev Embedding Theorem). Let k ≥ 0 be a nonnegative

integer and let r ≥ 0 be a nonnegative integer such that k − d2 > r. Any function f ∈ Wk,2(M ) admits a representative ˜f a.e.= f such that ˜f ∈ Cr(M ). Furthermore,

for any 0 < α ≤ min(k − d2 − r, 1), the immersion of Wk,2(M ) into Cr,α(M ) is

continuous.

Proposition 2.2.5. Let us fix a function f ∈ L²(M) and a nonnegative integer k ∈ N. For every t > 0 it holds P_t(f) ∈ W^{2k,2}(M). Furthermore there exists a constant C₂(k, M) such that

‖P_t(f)‖_{W^{2k,2}} ≤ C₂(k, M) (1 + 1/t^k) ‖f‖_{L²} .

Thus, for every t > 0, P_t(f) is smooth.

Proof. The smoothness part of the statement follows directly from Theorem 2.2.4 (Sobolev Embedding Theorem).

Given a nonnegative integer k ∈ N, let us prove that for almost every t > 0 it holds P_t(f) ∈ W^{2k,2}(M). The upgrade from “almost every t” to “every t” comes for free thanks to the continuity of the curve t ↦ P_t(f).

Let us define s := t/k. Applying k times the commutation stated in Proposition 2.2.2, we can prove

∆^k P_t(f) = (∆P_s)^k(f)

and hence, applying the estimate stated in Proposition 2.1.6, we obtain

‖∆^k P_t(f)‖_{L²} ≤ 1/s^k · ‖f‖_{L²} = k^k/t^k · ‖f‖_{L²} .

The result now follows applying Proposition 2.2.3.

The last step toward full regularity is to gain joint regularity in time and space for every time t > 0. In order to do this, let us start by showing that d/dt P_t(f) is a weak Sobolev derivative.

Lemma 2.2.6. Given a function f ∈ L²(M), let us denote by u(t, x) := P_t(f)(x) the solution to the heat equation with initial datum f. For any function ϕ ∈ C^∞_c((0, ∞) × M) it holds

∫_0^∞ ∫_M u(t, x) ∂ϕ(t, x)/∂t dm(x) dt = − ∫_0^∞ ∫_M u′(t, x) ϕ(t, x) dm(x) dt .

Proof. Let us fix ε > 0 such that supp(ϕ) ⊆ (ε, ∞) × M. Thanks to the fact that u ∈ AC_loc((0, ∞), L²(M)), we know that for any g ∈ L²(M) and for any t ≥ ε it holds

∫_M u(t, x) g(x) dm(x) = ∫_M u(ε, x) g(x) dm(x) + ∫_ε^t ∫_M u′(s, x) g(x) dm(x) ds .

Thence

∫_0^∞ ∫_M u(t, x) ∂ϕ(t, x)/∂t dm(x) dt
= ∫_ε^∞ ( ∫_M u(ε, x) ∂ϕ(t, x)/∂t dm(x) + ∫_ε^t ∫_M u′(s, x) ∂ϕ(t, x)/∂t dm(x) ds ) dt
= ∫_ε^∞ ∫_ε^t ∫_M u′(s, x) ∂ϕ(t, x)/∂t dm(x) ds dt
= ∫_ε^∞ ∫_M u′(s, x) ( ∫_s^∞ ∂ϕ(t, x)/∂t dt ) dm(x) ds
= − ∫_0^∞ ∫_M u′(s, x) ϕ(s, x) dm(x) ds ,

where we used that ∫_ε^∞ ∂ϕ(t, x)/∂t dt = −ϕ(ε, x) = 0 to drop the first term, Fubini's theorem to exchange the order of integration, and ∫_s^∞ ∂ϕ(t, x)/∂t dt = −ϕ(s, x) in the last step.

We can now state and prove the smoothness of P_t(f) as a joint function of time and space.

Theorem 2.2.7. For any f ∈ L²(M), the function u(t, x) := P_t(f)(x) is smooth on (0, ∞) × M.

Proof. Thanks to Lemma 2.2.6 we know that the weak time derivative of u, i.e. ∂u/∂t, coincides with d/dt u(t). Hence, as u is the solution to the heat equation, we know that ∂u/∂t = ∆u, where the derivative in time is weak. Furthermore, thanks to Proposition 2.2.2, we can iterate this reasoning and obtain

∂²u/∂t² = ∆²u .

Let us define the space-time Laplace operator ∆̃ := ∂²/∂t² + ∆. Joining what we said, we obtain that for any nonnegative integer k ∈ N it holds

∆̃^k u = (∆² + ∆)^k u ,

hence, thanks to Proposition 2.2.5, we have that ‖∆̃^k u‖_{L²(M)} is locally bounded in time and, applying Proposition 2.2.3, this implies u ∈ W^{2k,2}_loc((0, ∞) × M). The regularity of u follows from Theorem 2.2.4 (Sobolev Embedding Theorem), as what we said holds for an arbitrarily large k ∈ N.
