Elements of Mathematical Oncology

Franco Flandoli

Academic year: 2021

Contents

Part 1. Stochastic Differential Equations, linear Partial Differential Equations and their links

Chapter 1. Brownian motion and heat equation
1. Introduction
2. Simulations of Brownian motion
3. About the problem of finding a density from a sample
4. Macroscopic limit of Brownian motions
5. On the weak convergence of measures, in the random case
6. Heat equation as Fokker-Planck and Kolmogorov equation
7. Two-dimensional simulations

Chapter 2. SDEs and PDEs
1. Stochastic differential equations
2. Simulation of SDE in dimension one
3. The method of compactness for SDEs
4. Links between SDEs and linear PDEs
5. Simulations of the macroscopic limit

Part 2. Interacting systems of cells and nonlinear Partial Differential Equations

Chapter 3. Compactness results
1. Compact sets in the space of measure valued functions
2. Compactness criteria in infinite dimensional function spaces

Chapter 4. Mean field theory
1. Mean field model. Compactness of the empirical measure
2. Passage to the limit
3. Uniqueness for the PDE and global limit of the empirical measure
4. The mean field SDE and propagation of chaos

Chapter 5. Local and intermediate interactions
1. Simulation of interacting particles
2. Local interaction
3. The macroscopic limit of particles with a size
4. The PDE associated with local interaction
5. Intermediate interaction: preparation
6. Rigorous results on intermediate interaction

Part 3. Growth and change of species in populations of cells and nonlinear Partial Differential Equations

Chapter 6. Examples of macroscopic systems in Mathematical Oncology
1. An advanced model of invasive tumor with angiogenesis
2. The Fisher-Kolmogorov-Petrovskii-Piskunov model
3. A probabilistic representation
4. Existence, uniqueness, invariant regions
5. Remarks on the full invasive model with angiogenesis
6. Simulations about the full system
7. Fick or Fokker-Planck?
8. Modelling the crowding-driven diffusion
9. Conclusions

Chapter 7. The mathematics of proliferation at the microscopic level
1. Introduction
2. The model
3. Preliminaries on the Itô formula for processes defined on random time intervals
4. Back to our case
5. Estimates on the martingale terms
6. Estimates on $\langle N\rangle_t$

Bibliography

Part 1

Stochastic Differential Equations, linear Partial Differential Equations and their links

Chapter 1. Brownian motion and heat equation

1. Introduction

We recall that, given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $(\mathcal{F}_t)_{t\ge 0}$, namely what we call a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, a (continuous) Brownian motion is a stochastic process $(B_t)_{t\ge 0}$ with the following properties:

i) it is a continuous adapted process;
ii) $B_0 = 0$ a.s.;
iii) for every $t \ge s \ge 0$, the r.v. $B_t - B_s$ is Gaussian $N(0, t-s)$ and it is independent of $\mathcal{F}_s$.

If continuity is not prescribed, it can be proved that there is a continuous version. A Brownian motion in $\mathbb{R}^d$ is a stochastic process $(B_t)_{t\ge 0}$ with values in $\mathbb{R}^d$, $B_t = \big(B_t^{(1)}, \dots, B_t^{(d)}\big)$, such that its components $B_t^{(i)}$ are independent real-valued Brownian motions.

We also recall that the heat equation in $\mathbb{R}^d$, with diffusion constant $k > 0$, is the Partial Differential Equation (PDE)
$$\frac{\partial u_t}{\partial t} = k\, \Delta u_t, \qquad u|_{t=0} = u_0.$$
Here $u = u_t = u_t(x)$ denotes the solution, a function $u : [0,T] \times \mathbb{R}^d \to \mathbb{R}$ (we may replace $[0,T]$ by $[0,\infty)$), and $u_0$ denotes the initial condition, a function $u_0 : \mathbb{R}^d \to \mathbb{R}$.

This section is devoted to the description of a few properties of these objects and their links. We restrict attention to $k = \frac{1}{2}$ for simplicity of notation and show the modifications needed in the general case at the end of the section.

1.1. Heat kernel. Let $p_t(x)$ be the function
$$p_t(x) = (2\pi t)^{-d/2} \exp\big( -|x|^2 / 2t \big),$$
defined for $t > 0$. It is called the heat kernel. We have
$$\frac{\partial p_t(x)}{\partial t} = \frac{1}{2} \Delta p_t(x).$$
Indeed,
$$\frac{\partial p_t(x)}{\partial t} = -\frac{d}{2}\, (2\pi t)^{-d/2}\, \frac{1}{t} \exp\big( -|x|^2/2t \big) + (2\pi t)^{-d/2} \exp\big( -|x|^2/2t \big)\, \frac{|x|^2}{2t^2} = p_t(x) \left( -\frac{d}{2t} + \frac{|x|^2}{2t^2} \right)$$
and
$$\partial_i p_t(x) = p_t(x)\, (-x_i/t),$$
$$\partial_i^2 p_t(x) = p_t(x)\, (-x_i/t)^2 + p_t(x)\, (-1/t) = p_t(x) \left( \frac{x_i^2}{t^2} - \frac{1}{t} \right),$$
so that, summing over $i$, $\Delta p_t(x) = p_t(x) \left( \frac{|x|^2}{t^2} - \frac{d}{t} \right) = 2\, \frac{\partial p_t(x)}{\partial t}$.
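Before moving on, this identity can be checked numerically. The sketch below is in Python rather than the R used elsewhere in these notes; the test point and step size are arbitrary illustrative choices. It compares a finite-difference approximation of $\partial p_t/\partial t$ with $\frac{1}{2}\Delta p_t$ in dimension $d = 1$.

```python
import math

def p(t, x):
    # heat kernel in dimension d = 1: p_t(x) = (2*pi*t)^(-1/2) * exp(-x^2/(2t))
    return (2.0 * math.pi * t) ** -0.5 * math.exp(-x * x / (2.0 * t))

# central finite differences for dp/dt and (1/2) d^2 p / dx^2 at a sample point
t, x, h = 1.0, 0.7, 1e-3
dp_dt = (p(t + h, x) - p(t - h, x)) / (2.0 * h)
half_lap = 0.5 * (p(t, x + h) - 2.0 * p(t, x) + p(t, x - h)) / (h * h)

print(abs(dp_dt - half_lap))  # small: the two sides agree up to O(h^2)
```

The residual is of order $h^2$, consistent with the heat equation being satisfied exactly.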

Most of the results of this section depend on this simple computation. However, let us now restart progressively from Brownian motion and investigate it also at a numerical level.

2. Simulations of Brownian motion

2.1. Simulation of a trajectory of Brownian motion. Let $(B_t)_{t\ge 0}$ be a real-valued Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$. Let $X_0$ be an $\mathcal{F}_0$-measurable random variable.

Consider the stochastic process
$$X_t = X_0 + B_t.$$
It will be the solution of the stochastic differential equation
$$dX_t = dB_t, \qquad X|_{t=0} = X_0,$$
once this concept has been clarified. The following simple code, written in the free software R, simulates a trajectory, in the spirit of the explicit Euler scheme for the discretization of stochastic differential equations ($n$ is the number of time steps; just to introduce one more R command, we use a uniformly distributed initial condition):

n=10000; dt=0.01; h=sqrt(dt)
X=1:n
X[1]=runif(1,-1,1)
for (t in 1:(n-1)) { X[t+1]=X[t]+h*rnorm(1) }
plot(X,type="l",col=3)
lines((1:n)*0)

Let us stress the role of the quantity h = sqrt(dt): the random variable corresponding to the software command X[t+1]-X[t] is $X_{t+dt} - X_t$, which is equal to
$$X_{t+dt} - X_t = B_{t+dt} - B_t,$$
hence it is, by definition of Brownian motion, a Gaussian $N(0, dt)$. As such, it can be represented in the form
$$B_{t+dt} - B_t = \sqrt{dt}\, Z,$$
where $Z$ is $N(0,1)$. This is why, in the code, X[t+1]-X[t] is set equal to h*rnorm(1). Notice finally that the independence of the increments of Brownian motion is reflected, in the code, by the fact that rnorm(1) computes independent values at each iteration of the "for" cycle.
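As a minimal sanity check of this discretization, the same scheme can be sketched in Python (an illustrative translation, not part of the original R code): simulate a path and verify that the increments have sample variance close to $dt$.

```python
import math
import random

random.seed(0)
n, dt = 10000, 0.01
h = math.sqrt(dt)  # standard deviation of one increment

# explicit Euler scheme for dX = dB: each increment is sqrt(dt) * N(0,1)
X = [0.0]
for _ in range(n - 1):
    X.append(X[-1] + h * random.gauss(0.0, 1.0))

# the sample variance of the increments should be close to dt
incs = [X[i + 1] - X[i] for i in range(n - 1)]
m = sum(incs) / len(incs)
var = sum((z - m) ** 2 for z in incs) / len(incs)
print(var)  # close to dt = 0.01
```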

2.2. Simulation of several independent Brownian motions. What we are going to do can be equivalently described as the simulation of several Brownian trajectories, but in the spirit of systems of many particles we prefer to think of it as having several independent Brownian motions and simulating a single trajectory for each one of them.

Consider a sequence $(B_t^i)_{t\ge 0}$, $i = 1, 2, \dots$ of independent Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$. Consider a sequence $X_0^i$, $i = 1, 2, \dots$ of random initial conditions, $\mathcal{F}_0$-measurable, independent and identically distributed with law having density $\rho_0(x)$ with respect to Lebesgue measure on $\mathbb{R}^d$.

Consider the simple differential equations
$$dX_t^i = dB_t^i, \qquad X^i|_{t=0} = X_0^i,$$
having solutions
$$X_t^i = X_0^i + B_t^i.$$
Let us see a picture with a myriad of colors:

N=500; n=10000; dt=0.01; sd0=0.5; h=sqrt(dt)
X=matrix(nrow=N,ncol=n)
X[,1]=rnorm(N,0,sd0)
for (t in 1:(n-1)) { X[,t+1]=X[,t]+h*rnorm(N) }
plot(c(0,n),c(-20,20))
for (k in 1:N) { lines(X[k,],col=k) }

First, notice the shape of the envelope, roughly like $\sqrt{t}$, corresponding to the property $\mathrm{Var}(B_t^i) = t$.

We may look more closely at the distribution of points at some time $t$. To obtain a better result, we increase the number of points and avoid producing the previous picture, which is time consuming:

N=1000; n=10000; dt=0.01; sd0=0.5; h=sqrt(dt) X=matrix(nrow=N,ncol=n)

X[,1]=rnorm(N,0,sd0) for (t in 1:(n-1)) { X[,t+1]=X[,t]+h*rnorm(N) }

nn=10

hist(X[,nn],30,FALSE)

lines(density(X[,nn],bw=sd(X[,nn])/3))

Here are two pictures corresponding to the time steps 10 and 1000:

These pictures raise the following question: when the number N of "particles" tends to infinity, does this profile converge to a well-defined limit profile? And could we obtain the limit profile by a less time-consuming method, for instance by solving a suitable equation?

To get a feeling for the practical problem, take N = 10000. We see that now the software, on an ordinary laptop, requires a non-negligible amount of time; and the profile is more and more regular. In a gas or fluid, the number of molecules is of the order of $10^{20}$; in a living tissue affected by cancer, the number of tumor cells may be of the order of $10^9$; both are incredibly larger than N = 10000. The profile should be extremely regular, but not affordable by a direct simulation of the particle system.

3. About the problem of finding a density from a sample

In the previous section, given the values $X_t^1, \dots, X_t^N$ at some time $t$, we tried to represent these points graphically by means of a probability density, which reflects their degree of concentration.

The histogram is the first easy way: the space is partitioned into cells (here intervals of equal length) and the number of points in each cell is counted. One can plot the histogram giving the number of points per cell, or its normalization with area one, to be compared with a probability density.

In order to obtain a more regular probability density function, there are various methods. One of them is particularly connected with certain theoretical investigations that we are going to perform in these lectures: the "kernel smoothing" method. It starts from a "kernel", namely a probability density $K(x)$. The kernel is rescaled by the formula ($d$ is the spatial dimension)
$$K_\epsilon(x) := \epsilon^{-d} K\big( \epsilon^{-1} x \big)$$
and, given the points, we perform the average
$$\rho_\epsilon^N(x) := \frac{1}{N} \sum_{i=1}^N K_\epsilon\big( x - X_t^i \big).$$

Notice that
$$\int K_\epsilon(x)\, dx = 1,$$
hence
$$\int \rho_\epsilon^N(x)\, dx = \frac{1}{N} \sum_{i=1}^N \int K_\epsilon\big( x - X_t^i \big)\, dx = \frac{1}{N} \sum_{i=1}^N \int K_\epsilon(x)\, dx = 1.$$
In other words, the resulting function $\rho_\epsilon^N(x)$ is a probability density function.

To implement this method one has to choose a kernel $K$ (certain R commands choose a Gaussian kernel by default), but the result is only mildly affected by this choice. What really makes a difference is the choice of the rescaling factor $\epsilon$, also called the "bandwidth". The general idea is that for large $\epsilon$ we get a rather flat and very smooth profile $\rho_\epsilon^N(x)$; for small $\epsilon$ the profile oscillates. One has to choose an intermediate value of $\epsilon$, which is the most difficult problem in the implementation of the method (as is the choice of the cells in the histogram). Wrong choices give results that are not acceptable, too far from reality.

It is quite natural to expect that the value of the bandwidth is related to the standard deviation of the points. However, the simple choice $\epsilon$=sd(data) does not give the best result. Correction by a factor improves the result, but the choice of the factor is not easy. There are ad hoc rules, quite incredible, like the one at the help page ?bw.nrd of the software R. Below we propose to use sd(data)/5 as a trial, but it is not always the best.

Concerning different variants of implementation of the kernel smoothing idea, see the help of R under the names ?ksmooth, ?bkde, ?density and, for instance, the paper at http://vita.had.co.nz/papers/density-estimation.pdf.

The reader is invited to try the following exercise: a Gaussian sample is generated, the histogram is plotted and, optionally, the true density that generated the sample is plotted over the histogram. Then, one can over-plot the density given by a kernel smoothing method, for different values of the bandwidth. Here is an example of code:


Z=rnorm(10000,0,10)
hist(Z,50,FALSE)
Z0=sort(Z)
Y=dnorm(Z0,0,10)
lines(Z0,Y)
lines(density(Z,bw=sd(Z)/5),col="red")

and here are two examples with wrong choices of the bandwidth (we write only the last line):

lines(density(Z,bw=0.1),col="red"), lines(density(Z,bw=10),col="red")
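The same experiment can be sketched in pure Python (an illustrative translation of the R code above, assuming a Gaussian kernel): estimate the density of an $N(0, 10^2)$ sample at the origin with the bandwidth sd/5 proposed above, and compare with the true value.

```python
import math
import random
from statistics import pstdev

random.seed(1)
Z = [random.gauss(0.0, 10.0) for _ in range(10000)]

def kde(x, data, bw):
    # kernel smoothing with a Gaussian kernel K and bandwidth bw:
    # (1/N) * sum_i K_bw(x - X_i),  where  K_bw(y) = bw^-1 K(y / bw)
    c = 1.0 / (bw * math.sqrt(2.0 * math.pi))
    return sum(c * math.exp(-0.5 * ((x - xi) / bw) ** 2) for xi in data) / len(data)

bw = pstdev(Z) / 5                                  # the trial rule proposed in the text
est = kde(0.0, Z, bw)
true = 1.0 / (10.0 * math.sqrt(2.0 * math.pi))      # N(0, 10^2) density at 0
print(est, true)
```

With this intermediate bandwidth the estimate lands close to the true density; repeating with a much smaller or much larger bw reproduces the oscillating and over-flattened profiles discussed above.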

4. Macroscopic limit of Brownian motions

First of all, let us introduce the so-called empirical measure
$$S_t^N := \frac{1}{N} \sum_{i=1}^N \delta_{X_t^i}.$$
It is a random probability measure on Borel sets, a convex combination of random Dirac delta measures: at the position $X_t^i$ of each single particle we put a point mass of size $\frac{1}{N}$. By random probability measure we may loosely just mean a probability measure depending on $\omega \in \Omega$; more rigorously, we must explain in which sense we mean measurability in $\omega$, and the simplest way is to say that $\int \phi(x)\, S_t^N(dx)$ must be measurable, for each $\phi \in C_b(\mathbb{R}^d)$.

If we imagine this measure as a bunch of very small point masses, on one side we may get the feeling that the global mass is more concentrated here than there, but on the other side we do not see any profile, similar to a Gaussian or others. The only "altitude" we see is $\frac{1}{N}$, in a sense.

To extract a profile we may mollify the atomic measure $S_t^N$ by convolution with a kernel: given a probability density $\theta(x)$, setting
$$\theta_\epsilon(x) := \epsilon^{-d}\, \theta\big( \epsilon^{-1} x \big),$$
we perform the convolution
$$u_\epsilon^N(x) := \big( \theta_\epsilon * S_t^N \big)(x) := \int_{\mathbb{R}^d} \theta_\epsilon(x - y)\, S_t^N(dy).$$

The function $u_\epsilon^N(x)$ is a regularized version of $S_t^N$ and it is a probability density (the proof is the same as made above for $K_\epsilon$). We have
$$\big( \theta_\epsilon * S_t^N \big)(x) = \frac{1}{N} \sum_{i=1}^N \theta_\epsilon\big( x - X_t^i \big),$$
hence this operation coincides with the kernel smoothing method described above (with $K = \theta$).

After these preliminaries, we ask the following question: does $S_t^N$ converge, as $N \to \infty$, to a limit probability measure, maybe having a density $\rho_t(x)$? Under the previous assumptions, this is true.

Dealing with convergence of measures, let us recall that we call weak convergence the property
(4.1) $$\lim_{N\to\infty} \int_{\mathbb{R}^d} \phi(x)\, S_t^N(dx) = \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx,$$
when it holds for every continuous bounded function $\phi$.

Theorem 1. Under the assumptions specified at the beginning of Section 2.2, for every test function $\phi \in C_b(\mathbb{R}^d)$ we have property (4.1) in the sense of almost sure convergence. Moreover,
$$\rho_t(x) = \int_{\mathbb{R}^d} p_t(x - y)\, \rho_0(y)\, dy,$$
where $p_t(x)$ is the density of a Brownian motion in $\mathbb{R}^d$, namely $p_t(x) = (2\pi t)^{-d/2} \exp\big( -|x|^2/2t \big)$. Finally, the function $\rho_t(x)$ is smooth for $t > 0$ and satisfies the Cauchy problem for the heat equation
$$\frac{\partial \rho_t}{\partial t} = \frac{1}{2} \Delta \rho_t, \qquad \rho|_{t=0} = \rho_0$$
(the initial condition is attained as a limit in $L^1(\mathbb{R}^d)$ as $t \to 0$).

Proof. Given $t$, the r.v.'s $X_t^i$ are i.i.d. and thus the same is true for the r.v.'s $\phi(X_t^i)$, with $\phi \in C_b(\mathbb{R}^d)$; moreover, by the boundedness of $\phi$, the r.v.'s $\phi(X_t^i)$ have finite moments. Therefore, by the strong Law of Large Numbers,
$$\frac{1}{N} \sum_{i=1}^N \phi\big( X_t^i \big) \to E\big[ \phi\big( X_t^1 \big) \big]$$
in the sense of almost sure convergence.

Since $X_t^i = X_0^i + B_t^i$, and since the terms $X_0^i$ and $B_t^i$ are independent, with densities $\rho_0$ and $p_t(x)$ respectively, $X_t^i$ also has a density, given by the convolution of $\rho_0$ and $p_t(x)$, which we have denoted above by $\rho_t(x)$. Moreover,
$$E\big[ \phi\big( X_t^1 \big) \big] = \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx,$$
and thus property (4.1) is fully proved, along with the convolution formula for $\rho_t(x)$.

Finally, from the computation of Section 1.1 it easily follows that $\rho_t(x)$ satisfies the Cauchy problem for the heat equation. □
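The statement can be illustrated numerically. The Python sketch below works under the assumptions of Section 2.2, with a Gaussian $\rho_0$ (standard deviation $\mathrm{sd}_0 = 0.5$, matching the earlier R code) and the illustrative test function $\phi(x) = x^2$, for which $\int \phi\, \rho_t\, dx = \mathrm{sd}_0^2 + t$ is explicit.

```python
import math
import random

random.seed(2)
N, t, sd0 = 20000, 1.0, 0.5

# empirical average (1/N) * sum_i phi(X_t^i) for phi(x) = x^2,
# sampling X_t^i = X_0^i + B_t^i directly from its law
phi_avg = 0.0
for _ in range(N):
    x = random.gauss(0.0, sd0) + random.gauss(0.0, math.sqrt(t))
    phi_avg += x * x
phi_avg /= N

# the limit integral of phi against rho_t, explicit in this Gaussian case
exact = sd0 ** 2 + t
print(phi_avg, exact)
```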

At the beginning of this section we also introduced the regularizations $(\theta_\epsilon * S_t^N)(x)$. Do they converge too, to $\rho_t(x)$? This question corresponds more closely to the problem stated at the end of Section 2.2. If we keep $\epsilon$ fixed, the answer is a trivial consequence of the last theorem (under the assumption that $\theta$ is also bounded continuous):
$$\lim_{N\to\infty} \big( \theta_\epsilon * S_t^N \big)(x) = \big( \theta_\epsilon * \rho_t \big)(x)$$
for every $x \in \mathbb{R}^d$. We do not get $\rho_t(x)$ if we keep $\epsilon$ constant.

More interesting is to link $\epsilon$ and $N$, namely to examine the limit $\lim_{N\to\infty} \big( \theta_{\epsilon_N} * S_t^N \big)(x)$ for suitable sequences $\epsilon_N$. When is
$$\lim_{N\to\infty} \big( \theta_{\epsilon_N} * S_t^N \big)(x) = \rho_t(x)?$$

Let us see an example of a result. Assume $\theta \in C_c^\infty(\mathbb{R}^d)$, $\theta \ge 0$, $\int \theta(x)\, dx = 1$, in order to use the most common results on convergence of mollifiers (we use in particular the fact that $\theta_\epsilon * f \to f$ uniformly on compact sets if $f$ is continuous); the next result, however, is true in larger generality, using appropriate extensions.

Theorem 2. If
$$\lim_{N\to\infty} \frac{\epsilon_N^{-d}}{N} = 0,$$
then, for every $t > 0$ and $x \in \mathbb{R}^d$, we have
$$\lim_{N\to\infty} E\Big[ \big| \big( \theta_{\epsilon_N} * S_t^N \big)(x) - \rho_t(x) \big|^2 \Big] = 0.$$

Proof. First (notice that $\rho_t(x)$ is not the average of the $\theta_{\epsilon_N}(x - X_t^i)$), we have
$$E\Big[ \big| \big( \theta_{\epsilon_N} * S_t^N \big)(x) - \rho_t(x) \big|^2 \Big] = E\Big[ \Big| \frac{1}{N} \sum_{i=1}^N \big( \theta_{\epsilon_N}\big( x - X_t^i \big) - \rho_t(x) \big) \Big|^2 \Big]$$
$$\le 2\, E\Big[ \Big| \frac{1}{N} \sum_{i=1}^N \big( \theta_{\epsilon_N}\big( x - X_t^i \big) - E\big[ \theta_{\epsilon_N}\big( x - X_t^i \big) \big] \big) \Big|^2 \Big] + 2\, \Big| \frac{1}{N} \sum_{i=1}^N \big( E\big[ \theta_{\epsilon_N}\big( x - X_t^i \big) \big] - \rho_t(x) \big) \Big|^2.$$

About the first term, using the independence of the r.v.'s $\theta_{\epsilon_N}(x - X_t^i)$ and the property $E\big[ \theta_{\epsilon_N}(x - X_t^i) - E[\theta_{\epsilon_N}(x - X_t^i)] \big] = 0$, we may delete the mixed terms and write
$$E\Big[ \Big| \frac{1}{N} \sum_{i=1}^N \big( \theta_{\epsilon_N}\big( x - X_t^i \big) - E\big[ \theta_{\epsilon_N}\big( x - X_t^i \big) \big] \big) \Big|^2 \Big] = \frac{1}{N^2} \sum_{i=1}^N E\Big[ \big| \theta_{\epsilon_N}\big( x - X_t^i \big) - E\big[ \theta_{\epsilon_N}\big( x - X_t^i \big) \big] \big|^2 \Big],$$
which, the r.v.'s $\theta_{\epsilon_N}(x - X_t^i)$ being identically distributed, is equal to
$$\frac{1}{N}\, E\Big[ \big| \theta_{\epsilon_N}\big( x - X_t^1 \big) - E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] \big|^2 \Big].$$

For the same reason, the second term is controlled by $2\, \big| E[\theta_{\epsilon_N}(x - X_t^1)] - \rho_t(x) \big|^2$, hence
$$E\Big[ \big| \big( \theta_{\epsilon_N} * S_t^N \big)(x) - \rho_t(x) \big|^2 \Big] \le \frac{2}{N}\, E\Big[ \big| \theta_{\epsilon_N}\big( x - X_t^1 \big) - E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] \big|^2 \Big] + 2\, \big| E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] - \rho_t(x) \big|^2$$
$$\le \frac{4}{N}\, E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big)^2 \big] + \frac{4}{N}\, \big( E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] \big)^2 + 2\, \big| E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] - \rho_t(x) \big|^2.$$
Notice that
$$E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] = \int_{\mathbb{R}^d} \theta_{\epsilon_N}(x - y)\, \rho_t(y)\, dy$$
(we have proved above that $X_t^1$ has density $\rho_t$). Moreover, we have
$$\lim_{N\to\infty} \int_{\mathbb{R}^d} \theta_{\epsilon_N}(x - y)\, \rho_t(y)\, dy = \rho_t(x)$$
uniformly in $x$ on compact sets, for $t > 0$. Hence $2\, \big| E[\theta_{\epsilon_N}(x - X_t^1)] - \rho_t(x) \big|^2$ converges to zero. The sequence
$$\big( E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big) \big] \big)^2 = \Big( \int_{\mathbb{R}^d} \theta_{\epsilon_N}(x - y)\, \rho_t(y)\, dy \Big)^2$$
is, for the same reason, bounded, hence the term $\frac{4}{N} \big( E[\theta_{\epsilon_N}(x - X_t^1)] \big)^2$ converges to zero. It remains to deal with the term $\frac{4}{N} E\big[ \theta_{\epsilon_N}(x - X_t^1)^2 \big]$. We have
$$E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big)^2 \big] = \int_{\mathbb{R}^d} \theta_{\epsilon_N}^2(x - y)\, \rho_t(y)\, dy = \epsilon_N^{-d} \int_{\mathbb{R}^d} \epsilon_N^{-d}\, \theta^2\big( \epsilon_N^{-1}(x - y) \big)\, \rho_t(y)\, dy$$
$$\le \epsilon_N^{-d}\, \|\rho_t\|_\infty \int_{\mathbb{R}^d} \epsilon_N^{-d}\, \theta^2\big( \epsilon_N^{-1}(x - y) \big)\, dy = \epsilon_N^{-d}\, \|\rho_t\|_\infty \int_{\mathbb{R}^d} \theta^2(z)\, dz,$$
hence
$$\frac{4}{N}\, E\big[ \theta_{\epsilon_N}\big( x - X_t^1 \big)^2 \big] \le \frac{\epsilon_N^{-d}}{N}\, 4\, \|\rho_t\|_\infty \int_{\mathbb{R}^d} \theta^2(z)\, dz.$$
This term goes to zero if we assume $\frac{\epsilon_N^{-d}}{N} \to 0$. □

Remark 1. The condition $\lim_{N\to\infty} \frac{\epsilon_N^{-d}}{N} = 0$ has a very simple interpretation. First, the law from which the particles are extracted has almost compact support (as any probability law). To simplify the argument, assume it has support of linear size 1. If we have $N$ particles, and we think for simplicity that they are almost uniformly distributed in the support, the distance between closest neighbors is of the order $N^{-1/d}$. If the bandwidth $\epsilon_N$ is of this order or smaller, the average performed by the kernel is made only over a finite number of particles, sometimes maybe even zero particles, so it fluctuates randomly. If, on the contrary, $\epsilon_N$ is much bigger than $N^{-1/d}$, then we average over a large number of particles and a sort of LLN is active.
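This dichotomy is easy to observe numerically. In the Python sketch below ($d = 1$, uniform points on $[0,1]$, Gaussian kernel; all parameter choices are illustrative), the kernel estimate at a fixed point is recomputed over many independent samples: for a bandwidth below the typical spacing $N^{-1/d}$ it fluctuates wildly, while for a bandwidth well above it the fluctuation is small.

```python
import math
import random
from statistics import pstdev

random.seed(3)
N = 1000
spacing = 1.0 / N  # typical nearest-neighbour distance for d = 1

def kde_at_half(eps):
    # kernel density estimate at x = 1/2 from N fresh uniform points on [0,1]
    pts = [random.random() for _ in range(N)]
    c = 1.0 / (eps * math.sqrt(2.0 * math.pi))
    return sum(c * math.exp(-0.5 * ((0.5 - p) / eps) ** 2) for p in pts) / N

reps = 50
small = [kde_at_half(0.1 * spacing) for _ in range(reps)]  # bandwidth below the spacing
large = [kde_at_half(20 * spacing) for _ in range(reps)]   # bandwidth well above it
print(pstdev(small), pstdev(large))  # the first fluctuates much more
```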

5. On the weak convergence of measures, in the random case

Above we have proved that, for every $\phi \in C_b(\mathbb{R}^d)$, we have a.s.
$$\lim_{N\to\infty} \int_{\mathbb{R}^d} \phi(x)\, S_t^N(dx) = \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx.$$

The event of zero probability where convergence does not hold may depend on $\phi$. Hence, a priori, we cannot say that, for $P$-a.e. $\omega \in \Omega$, the sequence of measures $\big( S_t^N(\omega) \big)_{N\in\mathbb{N}}$ converges weakly, since this property precisely means: there exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that for all $\omega \in \Omega_0$ and all $\phi \in C_b(\mathbb{R}^d)$ we have
$$\lim_{N\to\infty} \int_{\mathbb{R}^d} \phi(x)\, S_t^N(\omega)(dx) = \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx.$$
Until now, we have a set $\Omega_0^\phi$ for each $\phi$ with a similar property, and their intersection is not under control. Nevertheless, the result is true.

Corollary 1. There exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that, for all $\omega \in \Omega_0$, $S_t^N(\omega)(dx)$ converges weakly to $\rho_t(x)\, dx$.

Proof. The idea is to prove the assertion first for a dense countable set of test functions and then extend to all test functions by an estimate. Let us see the details.

Let $\{\phi_\alpha\}_{\alpha\in\mathbb{N}}$ be a dense sequence in $C_c(\mathbb{R}^d)$ (the set of compactly supported continuous functions), density being measured with respect to uniform convergence (notice that $C_b(\mathbb{R}^d)$, on the contrary, would not be separable). Since, for each $\alpha \in \mathbb{N}$, we have (4.1) with $\phi = \phi_\alpha$, these being a countable number of properties we may say that there exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that, for all $\omega \in \Omega_0$ and every $\alpha \in \mathbb{N}$, we have
(5.1) $$\lim_{N\to\infty} \int_{\mathbb{R}^d} \phi_\alpha(x)\, S_t^N(\omega)(dx) = \int_{\mathbb{R}^d} \phi_\alpha(x)\, \rho_t(x)\, dx.$$

For every $\phi \in C_c(\mathbb{R}^d)$ and $\alpha \in \mathbb{N}$ we have
$$\Big| \int_{\mathbb{R}^d} \phi(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx \Big| \le \int_{\mathbb{R}^d} |\phi(x) - \phi_\alpha(x)|\, S_t^N(\omega)(dx)$$
$$+ \Big| \int_{\mathbb{R}^d} \phi_\alpha(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi_\alpha(x)\, \rho_t(x)\, dx \Big| + \int_{\mathbb{R}^d} |\phi(x) - \phi_\alpha(x)|\, \rho_t(x)\, dx$$
$$\le \|\phi - \phi_\alpha\|_\infty \Big( \int_{\mathbb{R}^d} S_t^N(\omega)(dx) + \int_{\mathbb{R}^d} \rho_t(x)\, dx \Big) + \Big| \int_{\mathbb{R}^d} \phi_\alpha(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi_\alpha(x)\, \rho_t(x)\, dx \Big|.$$

Recall that $\int_{\mathbb{R}^d} S_t^N(\omega)(dx) = 1$ and $\int_{\mathbb{R}^d} \rho_t(x)\, dx = 1$. Given $\phi \in C_c(\mathbb{R}^d)$ and $\varepsilon > 0$, let $\alpha \in \mathbb{N}$ be such that $\|\phi - \phi_\alpha\|_\infty \le \frac{\varepsilon}{2}$. Then
$$\Big| \int_{\mathbb{R}^d} \phi(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx \Big| \le \varepsilon + \Big| \int_{\mathbb{R}^d} \phi_\alpha(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi_\alpha(x)\, \rho_t(x)\, dx \Big|.$$
Therefore, recalling (5.1), for every $\omega \in \Omega_0$ we have

$$\limsup_{N\to\infty} \Big| \int_{\mathbb{R}^d} \phi(x)\, S_t^N(\omega)(dx) - \int_{\mathbb{R}^d} \phi(x)\, \rho_t(x)\, dx \Big| \le \varepsilon.$$

 0 is arbitrary">
Since $\varepsilon > 0$ is arbitrary and the limsup does not depend upon $\varepsilon$, we deduce that the limsup is equal to zero. Finally, recall that if a sequence of probability measures converges to a probability measure (this is essential, and here it is true) over all test functions of $C_c(\mathbb{R}^d)$, then it converges weakly (namely over all test functions of $C_b(\mathbb{R}^d)$). The proof is complete. □

Remark 2. A similar result is true if we replace a.s. convergence with convergence in probability.

6. Heat equation as Fokker-Planck and Kolmogorov equation

The heat equation is related to Brownian motion in three ways: as a macroscopic limit (the theorems above), as a Fokker-Planck equation, and as a Kolmogorov equation. Let us see here the last two interpretations.

6.1. Heat equation as Fokker-Planck equation. We shall see below in Section 4.1 the general meaning of the Fokker-Planck equation. The heat equation is a particular case of it. In Section 4.1 we deal with measure-valued solutions; here, due to the simplicity of the particular case, we may deal with regular solutions of the Fokker-Planck equation.

Consider the equation
$$dX_t = dB_t, \qquad X|_{t=0} = X_0,$$
where $X_0$ has density $\rho_0(x)$. The result is: the density $\rho_t(x)$ of $X_t$ is a solution of the Cauchy problem for the heat equation
$$\frac{\partial \rho_t}{\partial t} = \frac{1}{2} \Delta \rho_t, \qquad \rho|_{t=0} = \rho_0.$$

We have already proved this result in Theorem 1. This section does not present a new result but only insists on the link between SDEs and PDEs, which states (under appropriate assumptions) that the law of the solution to an SDE satisfies (in a suitable sense) a PDE.

6.2. Probabilistic representation formula. Finally, let us see that, for the heat equation
$$\frac{\partial u_t}{\partial t} = \frac{1}{2} \Delta u_t, \qquad u|_{t=0} = u_0,$$
one has the following probabilistic representation formula, in terms of a Brownian motion $B_t$:
$$u_t(x) = E\big[ u_0(x + B_t) \big].$$

Indeed,
$$E\big[ u_0(x + B_t) \big] = \int u_0(x + y)\, p_t(y)\, dy = \int p_t(x - y)\, u_0(y)\, dy,$$
and we know that this expression gives a solution of the heat equation. This link is a particular case of the link between SDEs and Kolmogorov equations, studied below in Section 4.2.
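The representation formula is easy to test by Monte Carlo. The Python sketch below uses the illustrative choice $u_0(x) = e^{-x^2/2}$ in dimension one, for which the convolution with $p_t$ is explicit: $u_t(x) = (1+t)^{-1/2} e^{-x^2/(2(1+t))}$ (a Gaussian convolved with the heat kernel stays Gaussian).

```python
import math
import random

random.seed(4)

def u0(x):
    # initial datum u0(x) = exp(-x^2/2), chosen so the exact solution is known
    return math.exp(-0.5 * x * x)

t, x, M = 0.5, 0.3, 200000

# Monte Carlo evaluation of u_t(x) = E[u0(x + B_t)], with B_t ~ N(0, t)
mc = sum(u0(x + random.gauss(0.0, math.sqrt(t))) for _ in range(M)) / M

# exact solution: convolution of u0 with the heat kernel p_t
exact = (1.0 + t) ** -0.5 * math.exp(-x * x / (2.0 * (1.0 + t)))
print(mc, exact)
```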

7. Two-dimensional simulations

We complete the first chapter with a few simulations of Brownian motion in dimension 2, which will come back in subsequent chapters.

7.1. Brownian motion in 2D. The next code is a way to simulate a trajectory of a 2D Brownian motion:

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
for (t in 1:(n-1)) { X[t+1]=X[t]+h*rnorm(1); Y[t+1]=Y[t]+h*rnorm(1) }
plot(X,Y,type="l", col=3); abline(h=0); abline(0,1000)

The following two variants show a movie, in two different forms:

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
plot(c(-10,10),c(-10,10))
abline(h=0)
abline(0,1000)
for (t in 1:(n-1)) {
X[t+1]=X[t]+h*rnorm(1)
Y[t+1]=Y[t]+h*rnorm(1)
lines(X[1:t],Y[1:t],type="l", col=3)
}

...

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
for (t in 1:(n-1)) {
X[t+1]=X[t]+h*rnorm(1)
Y[t+1]=Y[t]+h*rnorm(1)
plot(c(-10,10),c(-10,10))
abline(h=0)
abline(0,1000)
lines(X[1:t],Y[1:t],type="l", col=3)
}

7.2. Several Brownian motions in 2D. With several Brownian motions, we can no longer plot the full trajectories; the picture would be too full. We may, for instance, plot the final positions:

n=1000; N=1000; dt=0.01; h=sqrt(dt)
X=matrix(nrow=N,ncol=n); Y=X
X[,1]=rnorm(N,0,1); Y[,1]=rnorm(N,0,1)
for (t in 1:(n-1)) {
X[,t+1]=X[,t]+h*rnorm(N)
Y[,t+1]=Y[,t]+h*rnorm(N)
}
plot(c(-10,10),c(-10,10)); lines(X[,n],Y[,n],type="p", col=1)
abline(h=0); abline(0,1000)

This is a simple example of a particle system. With a great degree of abstraction, we could think of it as a set of cancer cells, embedded in a tissue or in vitro.

We would like to see the motion of the particles. We use a number of tricks:

i) we plot only a few times, by means of the commands T=500, if(t%%T==0 ), etc.;

ii) we clean the previous points with the command polygon(c(-10,10,10,-10),c(-10,-10,10,10),col="white", border=NA).

Moreover, the method sometimes works poorly; it depends on the value of T compared to the complexity of the specific code. The example below has been tuned to work.

n=100000; N=1000; dt=0.0001; h=sqrt(dt)
X=matrix(nrow=N,ncol=n); Y=X
X[,1]=rnorm(N,0,1); Y[,1]=rnorm(N,0,1)
T=500
plot(c(-10,10),c(-10,10),type="n")
for (t in 1:(n-1)) {
X[,t+1]=X[,t]+h*rnorm(N)
Y[,t+1]=Y[,t]+h*rnorm(N)
if(t%%T==0) {
polygon(c(-10,10,10,-10),c(-10,-10,10,10),col="white", border=NA)
lines(X[,t+1],Y[,t+1],type="p", col=1)
abline(h=0)
abline(0,1000)
}
}

Chapter 2. SDEs and PDEs

1. Stochastic differential equations

1.1. Definitions. We call stochastic differential equation (SDE) an equation of the form
(1.1) $$dX_t = b(t, X_t)\, dt + \sigma(t, X_t)\, dB_t, \qquad X|_{t=0} = X_0,$$
where $(B_t)_{t\ge 0}$ is a $d$-dimensional Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, $X_0$ is $\mathcal{F}_0$-measurable, $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : [0,T] \times \mathbb{R}^d \to \mathbb{R}^{d\times d}$ have some regularity specified case by case, and the solution $(X_t)_{t\ge 0}$ is a $d$-dimensional continuous adapted process. The meaning of the equation is the identity
(1.2) $$X_t = X_0 + \int_0^t b(s, X_s)\, ds + \int_0^t \sigma(s, X_s)\, dB_s,$$
where we have to assume conditions on $b$ and $\sigma$ which guarantee that $s \mapsto b(s, X_s)$ is integrable and $s \mapsto \sigma(s, X_s)$ is square integrable, with probability one; $\int_0^t \sigma(s, X_s)\, dB_s$ is an Itô integral, more precisely its continuous version in $t$; and the identity has to hold uniformly in $t$, with probability one. The generalization to different dimensions of $B$ and $X$ is obvious; we take the same dimension to lighten notation.

Even if less would be sufficient, with more arguments, let us assume that $b$ and $\sigma$ are at least continuous, so that the above-mentioned conditions of integrability of $s \mapsto b(s, X_s)$ and $s \mapsto \sigma(s, X_s)$ are fulfilled.

In most cases, if $X_0 = x_0$ is deterministic, when we prove that a solution exists we can also prove that it is adapted not only to the filtration $(\mathcal{F}_t)$ but also to $(\mathcal{F}_t^B)$, the filtration associated to the Brownian motion; more precisely, to its completion. This is just natural, because the input of the equation is only the Brownian motion. However, it is only so natural if we implicitly assume a suitable uniqueness. Otherwise, in principle, it is difficult to exclude that one can construct, maybe in some artificial way, a solution which is not $B$-adapted. Indeed, there are relevant examples of stochastic equations where solutions exist which are not $B$-adapted. This is the origin of the following definitions.

Definition 1 (strong solutions). We have strong existence for equation (1.1) if, given any filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$, and given any deterministic initial condition $X_0 = x_0$, there is a continuous $\mathcal{F}_t$-adapted process $(X_t)_{t\ge 0}$ satisfying (1.2) (in particular, we may choose $(\mathcal{F}_t) = (\mathcal{F}_t^B)$ and have a solution adapted to $B$). A strong solution is a solution adapted to $(\mathcal{F}_t^B)$.


Definition 2 (weak solutions). Given a deterministic initial condition $X_0 = x_0$, a weak solution is the family composed of a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$ and a continuous $\mathcal{F}_t$-adapted process $(X_t)_{t\ge 0}$ satisfying (1.2).

In the definition of weak solution, the filtered probability space and the Brownian motion are not specified a priori; they are part of the solution. Hence we are not allowed to choose $(\mathcal{F}_t) = (\mathcal{F}_t^B)$.

When $X_0$ is random, $\mathcal{F}_0$-measurable, the concept of weak solution is formally in trouble because the space where $X_0$ has to be defined is not prescribed a priori. The concept of strong solution can be adapted, for instance, by replacing $(\mathcal{F}_t^B)$ with $(\mathcal{F}_t^B \vee \mathcal{F}_0)$, or just by saying that, if $(X_t)_{t\ge 0}$ is a solution on a prescribed space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ where $X_0$ and $B$ are defined, then it is a strong solution. If we want to adapt the definition of weak solution to the case of random initial conditions, we have to prescribe only the law of $X_0$ and include in the solution the existence of an $X_0$ with the given law.

Let us come to uniqueness. Similarly to existence, there are two concepts.

Definition 3 (pathwise uniqueness). We say that pathwise uniqueness holds for equation (1.1) if, given any filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$ and any deterministic initial condition $X_0 = x_0$, whenever $\big( X_t^{(1)} \big)_{t\ge 0}$ and $\big( X_t^{(2)} \big)_{t\ge 0}$ are two continuous $\mathcal{F}_t$-adapted processes which fulfill (1.2), they are indistinguishable.

Definition 4 (uniqueness in law). We say that there is uniqueness in law for equation (1.1) if, given two weak solutions on any pair of spaces, their laws coincide.

1.2. Strong solutions. The most classical theorem about strong solutions and pathwise uniqueness holds, as in the deterministic case, under Lipschitz assumptions on the coefficients. Assume there are two constants $L$ and $C$ such that
$$\big| b(t,x) - b(t,x') \big| \le L\, |x - x'|, \qquad \big| \sigma(t,x) - \sigma(t,x') \big| \le L\, |x - x'|,$$
$$|b(t,x)| \le C\, (1 + |x|), \qquad |\sigma(t,x)| \le C\, (1 + |x|),$$
for all values of $t$ and $x$. The second condition on $b$ and $\sigma$ is written here for the sake of generality, but if we assume, as said above, that $b$ and $\sigma$ are continuous, it follows from the first condition (the uniform-in-time Lipschitz property).

Theorem 3. Under the previous assumptions on $b$ and $\sigma$, there is strong existence and pathwise uniqueness for equation (1.1). If, for some $p \ge 2$, $E[|X_0|^p] < \infty$, then
$$E\Big[ \sup_{t\in[0,T]} |X_t|^p \Big] < \infty.$$

Proof. ...

1.3. Weak solutions. Let us see only a particular example of a result about weak solutions. Assume that $\sigma$ is constant and non-degenerate; for simplicity, assume it equal to the identity, namely consider the SDE with additive noise
$$dX_t = b(t, X_t)\, dt + dB_t.$$
Moreover, assume $b$ only measurable and bounded (or continuous and bounded, if we prefer to maintain the general assumption of continuity). The key features of these assumptions are: the noise is non-degenerate (hence more restrictive than above for strong solutions), but $b$ is very weak, much weaker than in the usual Lipschitz case. Under such an assumption on $b$, if we do not have the noise $dB_t$ in the equation, it is easy to make examples without existence or without uniqueness.

Theorem 4. Under these assumptions, for every $x_0 \in \mathbb{R}^d$, there exists a weak solution and it is unique in law.

2. Simulation of SDE in dimension one

2.1. Linear example. Consider the equation, with $\lambda, \sigma > 0$,
$$dX_t = -\lambda X_t\, dt + \sigma\, dB_t, \qquad X_0 = x.$$

Its Euler discretization, on intervals of constant amplitude, has the theoretical form
$$X_{t_{n+1}} - X_{t_n} = -\lambda X_{t_n}\, dt + \sigma \sqrt{dt}\, \frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}}, \qquad dt := t_{n+1} - t_n,$$
where the r.v.'s
$$Z_n = \frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}}$$
are standard Gaussian and independent. The algorithmic form can be the following one; first we construct a function, the drift, then we write the main part of the code:

drift=function(x) -x

n=10000; dt=0.01; h=sqrt(dt); sig=1
X=1:n; X[1]=1
for (t in 1:(n-1)) {X[t+1]=X[t]+ dt*drift(X[t]) + h*sig*rnorm(1)}
plot(X,type="l", col=2); abline(h=0); abline(0,1000)
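A Python sketch of the same scheme (with the illustrative values $\lambda = \sigma = 1$, matching drift -x and sig=1 in the R code above) can also check a known property of this linear SDE, the Ornstein-Uhlenbeck process: its long-run variance is $\sigma^2/(2\lambda)$.

```python
import math
import random

random.seed(5)
lam, sig = 1.0, 1.0
n, dt = 200000, 0.01
h = math.sqrt(dt)

# explicit Euler scheme for dX = -lam * X dt + sig * dB
x, xs = 1.0, []
for _ in range(n):
    x += dt * (-lam * x) + h * sig * random.gauss(0.0, 1.0)
    xs.append(x)

# discard a burn-in, then compare the sample variance with sig^2 / (2 * lam)
burn = xs[n // 10:]
m = sum(burn) / len(burn)
var = sum((v - m) ** 2 for v in burn) / len(burn)
print(var)  # close to 0.5
```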


Exercise 1. Try with other initial conditions and other values of $\lambda$ and $\sigma$.

2.2. Nonlinear example. Consider the equation (called the "two-well potential" example)
$$dX_t = \big( X_t - X_t^3 \big)\, dt + \sigma\, dB_t, \qquad X_0 = x.$$
Its Euler discretization is
$$X_{t_{n+1}} - X_{t_n} = \big( X_{t_n} - X_{t_n}^3 \big)\, dt + \sigma \sqrt{dt}\, Z_n, \qquad dt := t_{n+1} - t_n, \qquad Z_n = \frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}}.$$

The code is:

drift=function(x) x-x^3

n=10000; dt=0.01; h=sqrt(dt); sig=0.5
X=1:n; X[1]=1
for (t in 1:(n-1)) {X[t+1]=X[t]+ dt*drift(X[t]) + h*sig*rnorm(1)}
plot(X,type="l", col=4); abline(h=0); abline(0,1000)

Exercise 2. Try with other initial conditions and other values of $\sigma$ and $n$.

2.3. Important exercise. In both cases of the two examples above, plot a histogram and a fitted (non-parametric) continuous density of the distribution at time $t$ (for instance for $t = 1$, $10$, $50$, $100$).
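A possible approach for the two-well example, sketched in Python (with the histogram replaced by two crude summary statistics; all parameter values are illustrative): for moderate noise, the distribution at large times is bimodal, concentrated near the wells $x = \pm 1$.

```python
import math
import random

random.seed(6)
N, dt, sig = 500, 0.02, 0.5
steps = 1000  # final time t = steps * dt = 20

# N independent copies of the two-well SDE dX = (X - X^3) dt + sig dB, X_0 = 0
xs = [0.0] * N
for _ in range(steps):
    xs = [x + dt * (x - x ** 3) + sig * math.sqrt(dt) * random.gauss(0.0, 1.0)
          for x in xs]

# by t = 20 most particles sit near one of the wells at -1 or +1, so the
# histogram is bimodal; crude checks: x^2 concentrates near 1, and the
# particles split roughly evenly between the two wells
m2 = sum(x * x for x in xs) / N
frac_right = sum(1 for x in xs if x > 0) / N
print(m2, frac_right)
```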

3. The method of compactness for SDEs

3.1. Compactness in $C([0,T]; \mathbb{R}^d)$. Recall that a set is called relatively compact if its closure is compact. Every subset of a compact set is relatively compact.

Recall the classical Ascoli-Arzelà theorem: a family of functions $F \subset C([0,T]; \mathbb{R}^d)$ is relatively compact (in the uniform topology) if

i) for every $t \in [0,T]$, the set $\{ f(t);\ f \in F \}$ is bounded;
ii) for every $\varepsilon > 0$ there is $\delta > 0$ such that
$$|f(t) - f(s)| \le \varepsilon$$
for every $f \in F$ and every $s, t \in [0,T]$ with $|t - s| \le \delta$.

Recall the de…nition of the Hölder seminorm, for f : [0; T ] ! Rd, [f ]C = sup

t6=s

jf (t) f (s)j jt sj

and obviously of the supremum norm kfk1= supt2[0;T ]jf (t)j. Simple su¢ cient conditions for (i) and (ii) are

i’) there is M > 0 such that kfk1 M for all f 2 F

ii’) for some 2 (0; 1), there is R > 0 such that [f]C R for all f 2 F .


Hence the sets

$$K_{M,R} = \left\{ f \in C([0,T];\mathbb{R}^d);\ \|f\|_\infty \le M,\ [f]_{C^\alpha} \le R \right\}$$

are relatively compact in $C([0,T];\mathbb{R}^d)$.
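To make the seminorm concrete, here is a small numerical illustration (a Python sketch; the grid-based quotient below is only a discrete stand-in for the true supremum). The function $f(t) = \sqrt{t}$ is exactly $1/2$-Hölder on $[0,1]$: the discrete quotient equals $1$ for $\alpha \le 1/2$, while for $\alpha > 1/2$ it blows up near $t = 0$ as the grid is refined.

```python
import numpy as np

def holder_seminorm(f_vals, t, alpha):
    """Discrete version of [f]_{C^alpha}: max over grid pairs of
    |f(t_i) - f(t_j)| / |t_i - t_j|^alpha."""
    df = np.abs(f_vals[:, None] - f_vals[None, :])
    dt = np.abs(t[:, None] - t[None, :])
    mask = dt > 0
    return (df[mask] / dt[mask] ** alpha).max()

t = np.linspace(0.0, 1.0, 1001)  # grid step 1e-3
f = np.sqrt(t)                   # exactly 1/2-Holder on [0, 1]

print(holder_seminorm(f, t, 0.5))  # = 1, attained at the pairs (0, h)
print(holder_seminorm(f, t, 0.3))  # = 1, f is smoother than C^{0.3} requires
print(holder_seminorm(f, t, 0.7))  # large: sqrt is NOT 0.7-Holder near 0
```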

The Sobolev space $W^{\alpha,p}(0,T;\mathbb{R}^d)$, with $\alpha \in (0,1)$ and $p > 1$, is defined as the set of all $f \in L^p(0,T;\mathbb{R}^d)$ such that

$$[f]_{W^{\alpha,p}} := \int_0^T \int_0^T \frac{|f(t) - f(s)|^p}{|t - s|^{1 + \alpha p}}\, dt\, ds < \infty.$$

We endow $W^{\alpha,p}(0,T;\mathbb{R}^d)$ with the norm $\|f\|_{W^{\alpha,p}} = \|f\|_{L^p} + [f]_{W^{\alpha,p}}$. It is known that

$$W^{\alpha,p}(0,T;\mathbb{R}^d) \subset C^\varepsilon([0,T];\mathbb{R}^d) \quad \text{if } (\alpha - \varepsilon)\, p > 1$$

and

$$[f]_{C^\varepsilon} \le C_{\varepsilon,\alpha,p}\, \|f\|_{W^{\alpha,p}}.$$

Using these new spaces, simple sufficient conditions for (i) and (ii) are (i') and

ii'') for some $\alpha \in (0,1)$ and $p > 1$ with $\alpha p > 1$, there is $R > 0$ such that $[f]_{W^{\alpha,p}} \le R$ for all $f \in F$.

Indeed, if (ii'') holds, there exists $\varepsilon > 0$ such that $(\alpha - \varepsilon)\, p > 1$, hence such that $[f]_{C^\varepsilon} \le C_{\varepsilon,\alpha,p} \|f\|_{W^{\alpha,p}}$; moreover, $\|f\|_{L^p} \le T^{1/p} \|f\|_\infty \le T^{1/p} M$ and therefore

$$[f]_{C^\varepsilon} \le C_{\varepsilon,\alpha,p}\, \|f\|_{W^{\alpha,p}} \le C_{\varepsilon,\alpha,p} \left( T^{1/p} M + R \right)$$

for all $f \in F$, which implies the validity of (ii'). Therefore the sets

$$K'_{M,R} = \left\{ f \in C([0,T];\mathbb{R}^d);\ \|f\|_\infty \le M,\ [f]_{W^{\alpha,p}} \le R \right\}$$

are relatively compact in $C([0,T];\mathbb{R}^d)$, if $\alpha p > 1$.
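The integral form of the $W^{\alpha,p}$ seminorm makes it directly computable on a grid (a Python sketch; the Riemann sum below is a rough discrete stand-in for the double integral, and the seed is arbitrary). For a discretized Brownian path, which is $\alpha$-Hölder exactly for $\alpha < 1/2$, the sum stabilizes when $\alpha < 1/2$ and grows without bound under grid refinement when $\alpha > 1/2$.

```python
import numpy as np

def sobolev_seminorm(f_vals, t, alpha, p):
    """Riemann-sum approximation of
    [f]_{W^{alpha,p}} = int int |f(t)-f(s)|^p / |t-s|^{1+alpha p} dt ds
    (the diagonal t = s is excluded from the sum)."""
    h = t[1] - t[0]
    df = np.abs(f_vals[:, None] - f_vals[None, :])
    dt = np.abs(t[:, None] - t[None, :])
    mask = dt > 0
    return ((df[mask] ** p) / dt[mask] ** (1 + alpha * p)).sum() * h * h

rng = np.random.default_rng(4)
n = 1000
h = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
# discretized Brownian path on [0, 1]
B = np.concatenate([[0.0], np.cumsum(np.sqrt(h) * rng.standard_normal(n))])

val_ok = sobolev_seminorm(B, t, alpha=0.4, p=4)   # alpha p = 1.6 > 1: admissible
val_bad = sobolev_seminorm(B, t, alpha=0.6, p=4)  # alpha > 1/2: diverges as h -> 0
print(val_ok, val_bad)  # val_bad much larger, and growing under refinement
```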

3.2. Application to SDEs. Consider the SDE

$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad X|_{t=0} = x$$

with bounded continuous coefficients. We may dream of a generalization of the Peano theorem, namely just existence of a solution.

Let $b_n, \sigma_n$ be a sequence of continuous functions, each one uniformly Lipschitz in $x$ (with a constant that may depend on $n$):

$$|b_n(t,x) - b_n(t,y)| \le L_n |x - y|, \qquad |\sigma_n(t,x) - \sigma_n(t,y)| \le L_n |x - y|,$$

equibounded:

$$\|b_n\|_\infty + \|\sigma_n\|_\infty \le C$$

(here $C > 0$ is independent of $n$) and such that $b_n \to b$, $\sigma_n \to \sigma$, uniformly on compact sets of $[0,T] \times \mathbb{R}^d$. Let $\{X^n_t\}$ be the solutions of the equations

$$dX^n_t = b_n(t, X^n_t)\,dt + \sigma_n(t, X^n_t)\,dB_t, \qquad X^n|_{t=0} = x.$$

Let $Q_n$ be their laws on $C([0,T];\mathbb{R}^d)$.
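Such an approximating sequence can be produced by mollification, i.e. convolution with a smooth kernel. A small Python sketch (the Gaussian kernel and the test function $b(x) = \sqrt{|x|}$, continuous but not Lipschitz at $0$, are illustrative choices, not taken from the text): the mollified functions converge uniformly on compact sets, while their Lipschitz constants $L_n$ blow up, which is exactly the situation the compactness argument is designed for.

```python
import numpy as np

def mollify(b_vals, delta, h):
    """Convolve sampled values of b with a Gaussian kernel of width delta
    (grid step h): a standard way to build smooth, hence locally
    Lipschitz, approximations of a merely continuous b."""
    k_x = np.arange(-5 * delta, 5 * delta + h, h)
    kernel = np.exp(-k_x**2 / (2 * delta**2))
    kernel /= kernel.sum()
    return np.convolve(b_vals, kernel, mode="same")

x = np.linspace(-1.0, 1.0, 2001)
h = x[1] - x[0]
b = np.sqrt(np.abs(x))  # continuous but not Lipschitz at x = 0

results = []
for delta in (0.1, 0.01):
    bn = mollify(b, delta, h)
    inner = np.abs(x) <= 0.5  # avoid the zero-padding distortion at the ends
    err = np.abs(bn - b)[inner].max()           # uniform error on [-1/2, 1/2]
    lip = np.abs(np.diff(bn[inner]) / h).max()  # discrete Lipschitz constant
    results.append((err, lip))
    print(delta, err, lip)  # err shrinks, lip grows as delta -> 0
```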


Lemma 1. The family $\{Q_n\}$ is tight in $C([0,T];\mathbb{R}^d)$.

Proof. Step 1. For pedagogical reasons, we start with a partially insufficient proof, to clarify the role of certain arguments.

Recalling the relatively compact sets $K_{M,R}$ above, it is sufficient to prove that, given $\varepsilon > 0$, there are $M, R > 0$ such that

$$(3.1) \qquad P\left( X^n \in K^c_{M,R} \right) < \varepsilon$$

for all $n \in \mathbb{N}$. Condition (3.1) means

$$P\left( \|X^n\|_\infty > M \ \text{or} \ [X^n]_{C^\alpha} > R \right) < \varepsilon.$$

A sufficient condition is

$$P\left( \|X^n\|_\infty > M \right) < \varepsilon/2 \quad \text{and} \quad P\left( [X^n]_{C^\alpha} > R \right) < \varepsilon/2.$$

Step 2. The first one is easy:

$$P\left( \|X^n\|_\infty > M \right) \le \frac{1}{M}\, E\left[ \sup_{t \in [0,T]} |X^n_t| \right]$$

and

$$\sup_{t \in [0,T]} |X^n_t| \le |x| + \int_0^T |b_n(s, X^n_s)|\, ds + \sup_{t \in [0,T]} \left| \int_0^t \sigma_n(s, X^n_s)\, dB_s \right| \le C + \sup_{t \in [0,T]} \left| \int_0^t \sigma_n(s, X^n_s)\, dB_s \right|$$

which implies

$$E\left[ \sup_{t \in [0,T]} |X^n_t| \right] \le C + E\left[ \sup_{t \in [0,T]} \left| \int_0^t \sigma_n(s, X^n_s)\, dB_s \right| \right] \le C + E\left[ \sup_{t \in [0,T]} \left| \int_0^t \sigma_n(s, X^n_s)\, dB_s \right|^2 \right]^{1/2}$$

$$\le C + C\, E\left[ \int_0^T |\sigma_n(s, X^n_s)|^2\, ds \right]^{1/2} \le C$$

(we used the Cauchy-Schwarz inequality and Doob's maximal inequality; the value of the constant $C$ may change from line to line). Hence, given $\varepsilon > 0$, we may choose $M > 0$ such that $P(\|X^n\|_\infty > M) < \varepsilon/2$.
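Step 2 rests on the chain $E[\sup_t |M_t|] \le E[\sup_t M_t^2]^{1/2} \le 2\, E[M_T^2]^{1/2}$, i.e. Cauchy-Schwarz followed by Doob's $L^2$ inequality. A minimal Monte Carlo sanity check of Doob's bound $E[\sup_{t \le T} M_t^2] \le 4\, E[M_T^2]$ for the martingale $M = B$ (a Python sketch; the sample sizes and the fixed seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# simulate many discretized Brownian paths on [0, 1]
n_paths, n_steps = 5000, 500
dt = 1.0 / n_steps
increments = np.sqrt(dt) * rng.standard_normal((n_paths, n_steps))
B = np.cumsum(increments, axis=1)      # B_{t_k}, k = 1..n_steps

sup_sq = np.abs(B).max(axis=1) ** 2    # sup_t B_t^2, path by path
lhs = sup_sq.mean()                    # estimate of E[sup_t B_t^2]
rhs = 4 * (B[:, -1] ** 2).mean()       # 4 E[B_1^2], about 4
print(lhs, rhs)                        # Doob: lhs <= rhs
```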

Step 3. The second one,

$$P\left( \sup_{t \ne s} \frac{|X^n_t - X^n_s|}{|t - s|^\alpha} > R \right) < \varepsilon/2,$$

however, is more difficult, since it involves a double supremum in time and martingale inequalities do not help. A way to prove this property is to use a quantitative Kolmogorov criterion; another is a variant of these tightness arguments based on stopping times, developed by Aldous.


However, we have seen above another class of compact sets, based on Sobolev spaces. The advantage of $W^{\alpha,p}(0,T;\mathbb{R}^d)$ with respect to $C^\varepsilon([0,T];\mathbb{R}^d)$ is that its topology is entirely defined by integrals, which combine with expectations better than a supremum does. Let us use them in the next step.

Step 4. Recalling now the relatively compact sets $K'_{M,R}$ above, using the same argument as in Step 1 and the result of Step 2, we are left to prove that there exist $\alpha \in (0,1)$ and $p > 1$ with $\alpha p > 1$, with the following property: given $\varepsilon > 0$, there is $R > 0$ such that

$$P\left( [X^n]_{W^{\alpha,p}} > R \right) < \varepsilon/2$$

for every $n \in \mathbb{N}$. We have

$$P\left( [X^n]_{W^{\alpha,p}} > R \right) \le \frac{1}{R}\, E\left[ \int_0^T \int_0^T \frac{|X^n_t - X^n_s|^p}{|t - s|^{1 + \alpha p}}\, dt\, ds \right] = \frac{1}{R} \int_0^T \int_0^T \frac{E\left[ |X^n_t - X^n_s|^p \right]}{|t - s|^{1 + \alpha p}}\, dt\, ds.$$

Now, for $t \ge s$,

$$X^n_t - X^n_s = \int_s^t b_n(r, X^n_r)\, dr + \int_s^t \sigma_n(r, X^n_r)\, dB_r$$

$$|X^n_t - X^n_s|^p \le C \left( \int_s^t |b_n(r, X^n_r)|\, dr \right)^p + C \left| \int_s^t \sigma_n(r, X^n_r)\, dB_r \right|^p \le C (t - s)^p + C \left| \int_s^t \sigma_n(r, X^n_r)\, dB_r \right|^p$$

and, by the so-called Burkholder-Davis-Gundy inequality,

$$E\left[ \left| \int_s^t \sigma_n(r, X^n_r)\, dB_r \right|^p \right] \le C\, E\left[ \left( \int_s^t |\sigma_n(r, X^n_r)|^2\, dr \right)^{p/2} \right] \le C (t - s)^{p/2}.$$

Therefore

$$P\left( [X^n]_{W^{\alpha,p}} > R \right) \le \frac{C}{R} \int_0^T \int_0^T \frac{|t - s|^{p/2}}{|t - s|^{1 + \alpha p}}\, dt\, ds = \frac{C}{R} \int_0^T \int_0^T \frac{1}{|t - s|^{1 + \left(\alpha - \frac{1}{2}\right) p}}\, dt\, ds.$$

The integral $\int_0^T \int_0^T |t - s|^{-1 - (\alpha - \frac{1}{2}) p}\, dt\, ds$ is finite if $\alpha < \frac{1}{2}$; we then need to use $p > 2$, because of the constraint $\alpha p > 1$. Thus we may find $R > 0$ such that the previous expression is smaller than $\varepsilon/2$, as required.

Remark 3. By the same computation, one can show that $B \in W^{\alpha,p}(0,T;\mathbb{R}^d)$ a.s., for every $\alpha < \frac{1}{2}$ and $p > 2$, hence $B \in C^\varepsilon([0,T];\mathbb{R}^d)$ a.s., for every $\varepsilon < \frac{1}{2}$, as we already know. It also provides a quantitative control on $P([B]_{C^\varepsilon} > R)$, usually not stated when the Kolmogorov regularity theorem is given.


Remark 4. The fractional Sobolev topology above provides one of the simplest proofs of the Kolmogorov regularity theorem; obviously one has to accept the Sobolev embedding theorem, which in a sense incorporates arguments similar to those of dyadic partitions in the classical proof of Kolmogorov's theorem.

Remark 5. From the previous proof we may deduce the following well-known criterion (see the books of Billingsley): if (i) holds and there are $p, \beta, C > 0$ such that

$$E\left[ |X^n_t - X^n_s|^p \right] \le C\, |t - s|^{1 + \beta}$$

for all $n \in \mathbb{N}$, then the sequence $\{Q_n\}$ is tight in $C([0,T];\mathbb{R}^d)$.
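Brownian motion itself satisfies this criterion with $p = 4$ and exponent $1 + \beta = 2$, since the Gaussian fourth moment gives $E[|B_t - B_s|^4] = 3|t - s|^2$. A quick Monte Carlo confirmation (a Python sketch; the points $s, t$ and the seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

# Brownian motion satisfies the criterion with p = 4, beta = 1:
#   E|B_t - B_s|^4 = 3 |t - s|^2
s, t, n = 0.3, 0.7, 200000
incr = np.sqrt(t - s) * rng.standard_normal(n)  # law of B_t - B_s
est = (incr ** 4).mean()
print(est, 3 * (t - s) ** 2)                    # both about 0.48
```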

We have proved that $\{Q_n\}$ is tight in $C([0,T];\mathbb{R}^d)$. Hence there are subsequences which converge weakly. Let us take one of them and, just for simplicity of notation, let us denote it by $\{Q_n\}$.

Thus we are assuming that $\{Q_n\}$ converges weakly to some probability measure $Q$ on the Borel sets of $C([0,T];\mathbb{R}^d)$. Our aim is to prove the existence of a solution of the SDE.

Assume for a second that the previous facts imply the existence of a continuous process $X$ such that $X^n$ converges to $X$ a.s., in the uniform topology. This assertion is obviously false: convergence in law does not imply a.s. convergence. However, it is true in a more involved way, using a Skorokhod representation theorem. The details are not trivial, for instance because the new processes $\widetilde{X}^n_t$, on a new probability space, given by such a theorem, do not satisfy the SDE a priori; one can prove that they satisfy it in a suitable weak sense. So, let us skip these details, also because they will not enter the discussion in the case of the macroscopic limit, our final interest. And assume (although not true) that the original sequence $X^n$ converges to $X$ a.s., in the uniform topology on compact sets. It follows

that

$$\int_s^t b_n(r, X^n_r)\, dr \to \int_s^t b(r, X_r)\, dr, \qquad \int_s^t \sigma_n(r, X^n_r)\, dB_r \to \int_s^t \sigma(r, X_r)\, dB_r$$

in probability, by the $P$-a.s. uniform convergence of $b_n(r, X^n_r)$ (resp. $\sigma_n(r, X^n_r)$) to $b(r, X_r)$ (resp. $\sigma(r, X_r)$) and the equiboundedness of $b_n(r, X^n_r)$ (resp. $\sigma_n(r, X^n_r)$). It follows that $X_t$ satisfies the SDE.

3.3. The zero-noise example. Concerning this last issue, the convergence, there is an interesting finite dimensional example which anticipates what happens in the case of macroscopic limits. It is the case of the sequence of equations

$$dX^\varepsilon_t = b(t, X^\varepsilon_t)\,dt + \varepsilon\, dB_t, \qquad X|_{t=0} = x$$

when $b$ is bounded continuous. Under this assumption there is existence and uniqueness in law, but also in the strong sense, for every $\varepsilon > 0$. We claim that the family $\{Q^\varepsilon\}$ of the laws of $\{X^\varepsilon_t\}$ is tight in $C([0,T];\mathbb{R}^d)$ and each limit measure $Q$ of the family has the following property:

$$Q(C_x) = 1$$


where $C_x \subset C([0,T];\mathbb{R}^d)$ is the set of all solutions of the deterministic equation

$$\frac{dX_t}{dt} = b(t, X_t), \qquad X|_{t=0} = x.$$

The proof of tightness of $\{Q^\varepsilon\}$ is identical to the one given above; we leave it as an exercise. Let $\{Q^{\varepsilon_n}\}$ be a weakly converging subsequence and $Q$ be its limit. Consider the functional $\Phi : C([0,T];\mathbb{R}^d) \to \mathbb{R}$ defined as

$$\Phi(f) = \sup_{t \in [0,T]} \left| f_t - x - \int_0^t b(s, f_s)\, ds \right|.$$

It has the property that $\Phi(f) = 0$ if and only if $f \in C_x$.

The functional $\Phi$ is continuous on $C := C([0,T];\mathbb{R}^d)$, with the uniform topology (here we use that $b$ is continuous). Recall that, by the Portmanteau theorem, one has $Q(A) \le \liminf Q^{\varepsilon_n}(A)$ for every open set $A \subset C$. Hence, for every $\delta > 0$,

$$Q(f \in C : \Phi(f) > \delta) \le \liminf Q^{\varepsilon_n}(f \in C : \Phi(f) > \delta).$$

If we prove that this $\liminf$ is zero, then $Q(f \in C : \Phi(f) > \delta) = 0$ for every $\delta > 0$, hence $Q(f \in C : \Phi(f) = 0) = 1$, which proves $Q(C_x) = 1$ (we leave as an exercise to prove that $C_x$ is a closed set, hence Borel).

We have

$$Q^{\varepsilon_n}(f \in C : \Phi(f) > \delta) = P\left( \Phi(X^{\varepsilon_n}) > \delta \right) = P\left( \sup_{t \in [0,T]} \left| X^{\varepsilon_n}_t - x - \int_0^t b(s, X^{\varepsilon_n}_s)\, ds \right| > \delta \right)$$

$$= P\left( \sup_{t \in [0,T]} |\varepsilon_n B_t| > \delta \right) = P\left( \sup_{t \in [0,T]} |B_t| > \delta/\varepsilon_n \right) \le \frac{\varepsilon_n}{\delta}\, E\left[ \sup_{t \in [0,T]} |B_t| \right] \to 0.$$
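The zero-noise collapse can also be watched numerically (a Python sketch; the choice $b(t,x) = -x$, for which $C_x$ reduces to the single solution $x e^{-t}$, is an illustrative one and is Lipschitz rather than merely bounded): the uniform distance between the Euler path of $dX^\varepsilon = b\,dt + \varepsilon\,dB$ and the corresponding deterministic path shrinks with $\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(3)

def max_deviation(eps, x0=1.0, T=1.0, dt=0.001):
    """Euler path of dX = -X dt + eps dB compared with the Euler path of
    the deterministic equation dX/dt = -X (same grid, same scheme)."""
    n = int(T / dt)
    noise = np.sqrt(dt) * rng.standard_normal(n)
    X, Y, dev = x0, x0, 0.0
    for k in range(n):
        X = X - X * dt + eps * noise[k]  # stochastic Euler step
        Y = Y - Y * dt                   # deterministic Euler step
        dev = max(dev, abs(X - Y))
    return dev

devs = [max_deviation(eps) for eps in (1.0, 0.1, 0.01)]
print(devs)  # decreasing: the laws concentrate on the deterministic path
```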

4. Links between SDEs and linear PDEs

4.1. Fokker-Planck equation. Along with the stochastic equation (1.1) defined by the coefficients $b$ and $\sigma$, we consider also the following parabolic PDE on $[0,T] \times \mathbb{R}^d$:

$$(4.1) \qquad \frac{\partial p}{\partial t} = \frac{1}{2} \sum_{i,j} \partial_i \partial_j (a_{ij}\, p) - \operatorname{div}(p\, b), \qquad p|_{t=0} = p_0,$$

called the Fokker-Planck equation. Here

$$a = \sigma \sigma^T.$$

Although in many cases it has regular solutions, in order to minimize the theory it is convenient to introduce the concept of measure-valued solution $\mu_t$; moreover, we restrict to the case of probability measures. To give a meaning to certain integrals below, we assume (beside other assumptions depending on the result)

$$b, \sigma \ \text{bounded continuous.}$$
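For a first concrete contact with (4.1): in dimension one, with $b(x) = -x$ and constant $\sigma$ (the linear example simulated in Section 2), the Gaussian density $p(x) = Z^{-1} e^{-x^2/\sigma^2}$ is a stationary solution, since $\frac{\sigma^2}{2} p'' + (xp)' = 0$. A finite-difference check of this identity (a Python sketch; the grid and tolerance are arbitrary choices):

```python
import numpy as np

sigma = 1.0
x = np.linspace(-4.0, 4.0, 4001)
h = x[1] - x[0]
p = np.exp(-x**2 / sigma**2)  # stationary density (normalization irrelevant:
                              # the equation below is linear in p)

# residual of the stationary Fokker-Planck equation for b(x) = -x:
#   (sigma^2 / 2) p'' + (x p)' = 0
p_xx = (p[2:] - 2 * p[1:-1] + p[:-2]) / h**2   # central second difference
xp = x * p
xp_x = (xp[2:] - xp[:-2]) / (2 * h)            # central first difference
residual = (sigma**2 / 2) * p_xx + xp_x
print(np.abs(residual).max())  # small: only finite-difference error remains
```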
