Elements of Mathematical Oncology

Franco Flandoli


Contents

Part 1. Stochastic Differential Equations, linear Partial Differential Equations and their links

Chapter 1. Brownian motion and heat equation
1. Introduction
2. Simulations of Brownian motion
3. About the problem of finding a density from a sample
4. Macroscopic limit of Brownian motions
5. On the weak convergence of measures, in the random case
6. Heat equation as Fokker-Planck and Kolmogorov equation
7. Two-dimensional simulations

Chapter 2. SDEs and PDEs
1. Stochastic differential equations
2. Simulation of SDEs in dimension one
3. Links between SDEs and linear PDEs
4. Simulations of the macroscopic limit

Part 2. Growth and change of species in populations of cells and nonlinear Partial Differential Equations

Chapter 3. Examples of macroscopic systems in Mathematical Oncology
1. An advanced model of invasive tumor with angiogenesis
2. The Fisher-Kolmogorov-Petrovskii-Piskunov model
3. A probabilistic representation
4. Existence, uniqueness, invariant regions
5. The problem of invariant regions

Chapter 4. The mathematics of proliferation at the microscopic level

Part 3. Interacting systems of cells and nonlinear Partial Differential Equations

Bibliography


Part 1

Stochastic Differential Equations, linear Partial Differential Equations and their links


CHAPTER 1

Brownian motion and heat equation

1. Introduction

We recall that, given a probability space $(\Omega, \mathcal{F}, P)$ and a filtration $(\mathcal{F}_t)_{t\ge 0}$, namely what we call a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, a (continuous) Brownian motion is a stochastic process $(B_t)_{t\ge 0}$ with the following properties:

i) it is a continuous adapted process;
ii) $B_0 = 0$ a.s.;
iii) for every $t \ge s \ge 0$, the r.v. $B_t - B_s$ is Gaussian $N(0, t-s)$ and it is independent of $\mathcal{F}_s$.

If continuity is not prescribed, it can be proved that there is a continuous version. A Brownian motion in $\mathbb{R}^d$ is a stochastic process $(B_t)_{t\ge 0}$ with values in $\mathbb{R}^d$, $B_t = \big(B_t^{(1)}, \ldots, B_t^{(d)}\big)$, such that its components $B_t^{(i)}$ are independent real valued Brownian motions.
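Before turning to the heat equation, here is a minimal numerical illustration (a sketch added here, not part of the original text) of how these defining properties translate into simulation: on a time grid, a Brownian path is built from independent Gaussian increments of variance equal to the mesh, and across many paths the empirical variance of $B_t$ is approximately $t$. The grid size and the number of paths are arbitrary choices.

M=5000; n=1000; dt=0.001                 # M paths, n time steps, final time n*dt = 1
inc=matrix(sqrt(dt)*rnorm(M*n),nrow=M)   # independent N(0,dt) increments (property iii)
B=t(apply(inc,1,cumsum))                 # each row is one discretized trajectory
var(B[,n])                               # empirical variance of B at time 1, close to 1
var(B[,n/2])                             # empirical variance of B at time 0.5, close to 0.5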

We also recall that the heat equation in $\mathbb{R}^d$, with diffusion constant $k > 0$, is the Partial Differential Equation (PDE)
$$ \frac{\partial u_t}{\partial t} = k\,\Delta u_t, \qquad u|_{t=0} = u_0. $$
Here $u = u_t = u_t(x)$ denotes the solution, a function $u : [0,T]\times\mathbb{R}^d \to \mathbb{R}$ (we may replace $[0,T]$ by $[0,\infty)$) and $u_0$ denotes the initial condition, a function $u_0 : \mathbb{R}^d \to \mathbb{R}$.

This section is devoted to the description of a few properties of these objects and their links. We restrict the attention to $k = \frac{1}{2}$ for simplicity of notation and show the modifications in the general case at the end of the section.

1.1. Heat kernel. Let $p_t(x)$ be the function
$$ p_t(x) = (2\pi t)^{-d/2}\exp\left(-|x|^2/2t\right) $$
defined for $t > 0$. It is called the heat kernel. We have
$$ \frac{\partial p_t(x)}{\partial t} = \frac{1}{2}\,\Delta p_t(x). $$
Indeed,
$$ \frac{\partial p_t(x)}{\partial t} = -\frac{d}{2}\,(2\pi t)^{-d/2}\,\frac{1}{t}\exp\left(-|x|^2/2t\right) + (2\pi t)^{-d/2}\exp\left(-|x|^2/2t\right)\frac{|x|^2}{2t^2} = p_t(x)\left(-\frac{d}{2t} + \frac{|x|^2}{2t^2}\right) $$
and
$$ \partial_i p_t(x) = p_t(x)\left(-\frac{x_i}{t}\right), \qquad \partial_i^2 p_t(x) = p_t(x)\left(-\frac{x_i}{t}\right)^2 + p_t(x)\left(-\frac{1}{t}\right) = p_t(x)\left(\frac{x_i^2}{t^2} - \frac{1}{t}\right), $$
so that, summing over $i$, $\Delta p_t(x) = p_t(x)\left(\frac{|x|^2}{t^2} - \frac{d}{t}\right) = 2\,\frac{\partial p_t(x)}{\partial t}$.
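As a side remark (a standard fact, stated here for convenience rather than taken from the source at this point), for a general diffusion constant $k > 0$ the corresponding kernel is the time-rescaled version
$$ p_t^{(k)}(x) = (4\pi k t)^{-d/2}\exp\left(-|x|^2/4kt\right) = p_{2kt}(x), \qquad \frac{\partial p_t^{(k)}(x)}{\partial t} = k\,\Delta p_t^{(k)}(x), $$
which follows from the same computation.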

Most of the results of this section depend on this simple computation. However, let us now restart progressively from Brownian motion and investigate it also at a numerical level.

2. Simulations of Brownian motion

2.1. Simulation of a trajectory of Brownian motion. Let (Bt)t 0, be a real valued Brownian motion, on a …ltered probability space ( ; F; Ft; P ). Let X0 be an F0-measurable random variable.

Consider the stochastic process

Xt= X0+ Bt: It will be the solution of the stochastic di¤erential equation

dXt= dBt; Xjt=0 = X0

when this concept will be clari…ed. The following simple code, written with the free software R, simulates a trajectory, in the spirit of explicit Euler scheme of discretization of stochastic di¤erential equations (n is the number of time steps; just to introduce one more R command, we use an uniform distributed initial condition):

n=10000; dt=0.01; h=sqrt(dt)
X=1:n
X[1]=runif(1,-1,1)
for (t in 1:(n-1)) { X[t+1]=X[t]+h*rnorm(1) }
plot(X,type="l", col=3); lines((1:n)*0)

Let us stress the role of the quantity h = sqrt(dt): the random variable corresponding to the software command X[t+1] - X[t] is $X_{t+dt} - X_t$, which is equal to
$$ X_{t+dt} - X_t = B_{t+dt} - B_t $$
hence it is, by definition of Brownian motion, a Gaussian $N(0, dt)$. As such, it can be represented in the form
$$ B_{t+dt} - B_t = \sqrt{dt}\,Z $$
where $Z$ is $N(0,1)$. This is why, in the code, X[t+1] - X[t] is set equal to h*rnorm(1). Notice finally that the independence of the increments of Brownian motion is reflected, in the code, by the fact that rnorm(1) generates independent values at each iteration of the "for" cycle.


2.2. Simulation of several independent Brownian motions. What we are going to do can be equivalently described as the simulation of several Brownian trajectories, but in the spirit of systems of many particles we prefer to think as if we had several independent Brownian motions and we simulate a single trajectory for each one of them.

Consider a sequence $(B_t^i)_{t\ge 0}$, $i = 1, 2, \ldots$ of independent Brownian motions on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$.

Consider a sequence $X_0^i$, $i = 1, 2, \ldots$ of random initial conditions, $\mathcal{F}_0$-measurable, independent and identically distributed with law having density $\rho_0(x)$ with respect to the Lebesgue measure on $\mathbb{R}^d$.

Consider the following simple differential equations
$$ dX_t^i = dB_t^i, \qquad X^i|_{t=0} = X_0^i $$
having solutions
$$ X_t^i = X_0^i + B_t^i. $$
Let us see a picture with a myriad of colors:

N=500; n=10000; dt=0.01; sd0=0.5; h=sqrt(dt)
X=matrix(nrow=N, ncol=n)
X[,1]=rnorm(N,0,sd0)
for (t in 1:(n-1)) { X[,t+1]=X[,t]+h*rnorm(N) }
plot(c(0,n), c(-20,20))
for (k in 1:N) { lines(X[k,], col=k) }

First, notice the shape of the envelope, roughly like $\sqrt{t}$, corresponding to the property $\mathrm{Var}\big(B_t^i\big) = t$.


We may look more closely at the distribution of points at some time $t$. To have a better result, we increase the number of points and avoid producing the previous picture, which is time consuming:

N=1000; n=10000; dt=0.01; sd0=0.5; h=sqrt(dt)
X=matrix(nrow=N,ncol=n)
X[,1]=rnorm(N,0,sd0)
for (t in 1:(n-1)) { X[,t+1]=X[,t]+h*rnorm(N) }
nn=10
hist(X[,nn],30,FALSE)
lines(density(X[,nn],bw=sd(X[,nn])/3))

Here are two pictures corresponding to the time steps 10 and 1000.

These pictures raise the following question: when the number $N$ of "particles" tends to infinity, does this profile converge to a well defined limit profile? And could we obtain the limit profile by a less time consuming method, for instance by solving a suitable equation?

To get a feeling of the practical problem, take $N = 10000$. We see that now the software, on an ordinary laptop, requires a non-negligible amount of time; and the profile is more and more regular. In a gas or fluid, the number of molecules is of the order of $10^{20}$; in a living tissue affected by a cancer, the number of tumor cells may be of the order of $10^9$; both are incredibly larger than $N = 10000$. The profile should be extremely regular, but not affordable by a direct simulation of the particle system.

3. About the problem of finding a density from a sample

In the previous section, given the values $X_t^1, \ldots, X_t^N$ at some time $t$, we tried to represent graphically these points by means of a probability density, which reflects their degree of concentration.

The histogram is the first easy way: the space is partitioned in cells (here intervals of equal length) and the number of points in each cell is counted. One can plot the histogram giving the number of points per cell or its normalization with area one, to be compared with a probability density.

In order to obtain a more regular probability density function, there are various methods. One of them is particularly connected with a certain theoretical investigation that we are going to perform in these lectures: it is the "kernel smoothing" method. It starts from a "kernel", namely a probability density $K(x)$. Then the kernel is rescaled by the formula ($d$ is the spatial dimension)
$$ K_\varepsilon(x) := \varepsilon^{-d}\,K\left(\varepsilon^{-1}x\right) $$
and, given the points, we have to perform the average
$$ \rho_N(x) := \frac{1}{N}\sum_{i=1}^N K_\varepsilon\left(x - X_t^i\right). $$
Notice that
$$ \int K_\varepsilon(x)\,dx = 1 $$
hence
$$ \int \rho_N(x)\,dx = \frac{1}{N}\sum_{i=1}^N \int K_\varepsilon\left(x - X_t^i\right)dx = \frac{1}{N}\sum_{i=1}^N \int K_\varepsilon(x)\,dx = 1. $$
In other words, the resulting function $\rho_N(x)$ is a probability density function.

To implement this method one has to choose a kernel $K$ (certain R commands choose a Gaussian kernel by default), but the result is only mildly affected by this choice. What really makes a difference is the choice of the rescaling factor $\varepsilon$, also called the "bandwidth". The general idea is that for large $\varepsilon$ we get a rather flat and very smooth profile $\rho_N(x)$; for small $\varepsilon$ the profile oscillates. One has to choose an intermediate value of $\varepsilon$, which is the most difficult problem in the implementation of the method (as is the choice of the cells in the histogram). Wrong choices really give results that are not acceptable, too far from reality.

It is quite natural to expect that the value of the bandwidth is related to the standard deviation of the points. However, the simple choice $\varepsilon$ = sd(data) does not give the best result. Correcting by a factor improves the result, but the choice of the factor is not easy. There are ad hoc rules, quite incredible, like the one at the help page ?bw.nrd of the software R. Below we propose to use sd(data)/5, as a trial, but it is not always the best.
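As an illustration (a sketch added here, not part of the original code), the average defining $\rho_N$ can be computed by hand with a Gaussian kernel $K$ in dimension $d = 1$ and compared with the built-in density command; the factor sd(data)/5 is the trial choice suggested above.

data=rnorm(1000,0,10)                                         # sample playing the role of X_t^1,...,X_t^N
eps=sd(data)/5                                                # bandwidth, the trial choice suggested above
grid=seq(min(data)-3*eps,max(data)+3*eps,length.out=400)
rhoN=sapply(grid,function(x) mean(dnorm((x-data)/eps))/eps)   # (1/N) sum_i K_eps(x - X_t^i)
hist(data,50,FALSE)
lines(grid,rhoN,col="red")                                    # hand-made kernel estimate
lines(density(data,bw=eps),col="blue")                        # built-in estimate, for comparison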

Concerning different variants of implementation of the kernel smoothing idea, see the help of R under the names ?ksmooth, ?bkde, ?density and, for instance, the paper at http://vita.had.co.nz/papers/density-estimation.pdf.

The reader is suggested to try the following exercise: a Gaussian sample is generated, the histogram is plotted and, optionally, also the true density that generated the sample is plotted over the histogram. Then, one can over-plot the density given by a method of kernel smoothing, for different values of the bandwidth. Here is an example of code:


Z=rnorm(10000,0,10)
hist(Z,50,FALSE)
Z0=sort(Z)
Y=dnorm(Z0,0,10)
lines(Z0,Y)
lines(density(Z,bw=sd(Z)/5),col="red")

and here are two examples with wrong choices of the bandwidth (we write only the last line):

lines(density(Z,bw=0.1),col="red"), lines(density(Z,bw=10),col="red")


4. Macroscopic limit of Brownian motions

First of all let us introduce the so called empirical measure
$$ S_t^N := \frac{1}{N}\sum_{i=1}^N \delta_{X_t^i}. $$
It is a random probability measure on Borel sets, a convex combination of random Dirac delta measures. At the position $X_t^i$ of each single particle we put a pointwise mass of size $\frac{1}{N}$. By random probability measure we may loosely just mean a probability measure depending on $\omega \in \Omega$; more rigorously we must explain in which sense we mean the measurability in $\omega$, and the simplest way is to say that $\int \phi(x)\,S_t^N(dx)$ must be measurable, for each $\phi \in C_b(\mathbb{R}^d)$.

If we imagine this measure as a bunch of very small point masses, on one side we may get the feeling that the global mass is more concentrated here than there, but on the other side we do not see any profile, similar to a Gaussian or others. The only "altitude" we see is $\frac{1}{N}$, in a sense.

To extract a profile we may mollify the atomic measure $S_t^N$ by a convolution with a kernel: given a probability density $\theta(x)$, setting
$$ \theta_\varepsilon(x) := \varepsilon^{-d}\,\theta\left(\varepsilon^{-1}x\right) $$
we perform the convolution
$$ u_N(x) := \left(\theta_\varepsilon * S_t^N\right)(x) := \int_{\mathbb{R}^d}\theta_\varepsilon(x - y)\,S_t^N(dy). $$


The function $u_N(x)$ is a regularized version of $S_t^N$ and it is a probability density (the proof is the same as the one made above for $K_\varepsilon$). We have
$$ \left(\theta_\varepsilon * S_t^N\right)(x) = \frac{1}{N}\sum_{i=1}^N \theta_\varepsilon\left(x - X_t^i\right) $$
hence this operation coincides with the kernel smoothing method described above (with $K = \theta$).

After these preliminaries, we ask the following question: does $S_t^N$ converge, as $N \to \infty$, to a limit probability measure, maybe having a density $\rho_t(x)$? Under the previous assumptions, this is true.

Dealing with convergence of measures, let us recall that we call weak convergence the property
$$ (4.1)\qquad \lim_{N\to\infty}\int_{\mathbb{R}^d}\phi(x)\,S_t^N(dx) = \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx $$
when it holds for every continuous bounded function $\phi$.

Theorem 1. Under the assumptions specified at the beginning of Section 2.2, for every test function $\phi \in C_b(\mathbb{R}^d)$ we have property (4.1) in the sense of almost sure convergence. Moreover,
$$ \rho_t(x) = \int_{\mathbb{R}^d} p_t(x - y)\,\rho_0(y)\,dy $$
where $p_t(x)$ is the density of a Brownian motion in $\mathbb{R}^d$, namely $p_t(x) = (2\pi t)^{-d/2}\exp\left(-|x|^2/2t\right)$. Finally, the function $\rho_t(x)$ is smooth for $t > 0$ and satisfies the Cauchy problem for the heat equation
$$ \frac{\partial\rho_t}{\partial t} = \frac{1}{2}\,\Delta\rho_t, \qquad \rho|_{t=0} = \rho_0 $$
(the initial condition is attained as a limit in $L^1(\mathbb{R}^d)$ as $t \to 0$).

Proof. Given $t$, the r.v. $X_t^i$ are i.i.d. and thus the same is true for the r.v. $\phi\left(X_t^i\right)$, with $\phi \in C_b(\mathbb{R}^d)$; moreover, by the boundedness of $\phi$, the r.v. $\phi\left(X_t^i\right)$ have finite moments. Therefore, by the strong Law of Large Numbers,
$$ \frac{1}{N}\sum_{i=1}^N \phi\left(X_t^i\right) \to E\left[\phi\left(X_t^1\right)\right] $$
in the sense of almost sure convergence.

Since $X_t^i = X_0^i + B_t^i$ and since the terms $X_0^i$ and $B_t^i$ are independent, with densities $\rho_0$ and $p_t(x)$ respectively, then also $X_t^i$ has a density, given by the convolution of $\rho_0$ and $p_t(x)$, the density we have denoted above by $\rho_t(x)$. Moreover
$$ E\left[\phi\left(X_t^1\right)\right] = \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx $$
and thus property (4.1) is fully proved, along with the convolution formula for $\rho_t(x)$.

Finally, by the computation of Section 1.1 it easily follows that $\rho_t(x)$ satisfies the Cauchy problem for the heat equation.
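As a numerical check of Theorem 1 (a sketch reusing the simulation of Section 2.2, not part of the original text): when $\rho_0$ is the $N(0, \mathrm{sd0}^2)$ density, the convolution with $p_t$ gives $\rho_t = N(0, \mathrm{sd0}^2 + t)$, which can be overlaid on the empirical profile of the particles.

N=10000; n=1000; dt=0.01; sd0=0.5; h=sqrt(dt)
X=matrix(nrow=N,ncol=n)
X[,1]=rnorm(N,0,sd0)
for (t in 1:(n-1)) { X[,t+1]=X[,t]+h*rnorm(N) }
tt=(n-1)*dt                                           # time reached by the last column
hist(X[,n],50,FALSE)
curve(dnorm(x,0,sqrt(sd0^2+tt)),add=TRUE,col="red")   # rho_t predicted by Theorem 1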


At the beginning of this section we have introduced also the regularizations $\left(\theta_\varepsilon * S_t^N\right)(x)$. Do they converge too, to $\rho_t(x)$? This question corresponds more closely to the problem stated at the end of Section 2.2. If we keep $\varepsilon$ fixed, the answer is a trivial consequence of the last theorem (under the assumption that $\theta$ is also bounded continuous):
$$ \lim_{N\to\infty}\left(\theta_\varepsilon * S_t^N\right)(x) = \left(\theta_\varepsilon * \rho_t\right)(x) $$
for every $x \in \mathbb{R}^d$. We do not get $\rho_t(x)$, if we keep $\varepsilon$ constant.

More interesting is to link $\varepsilon$ and $N$, namely to examine the limit $\lim_{N\to\infty}\left(\theta_{\varepsilon_N} * S_t^N\right)(x)$ for suitable sequences $\varepsilon_N$. When is
$$ \lim_{N\to\infty}\left(\theta_{\varepsilon_N} * S_t^N\right)(x) = \rho_t(x)? $$
Let us see an example of result. Assume $\theta \in C_c^\infty(\mathbb{R}^d)$, $\theta \ge 0$, $\int\theta(x)\,dx = 1$, to use the most common results on convergence of mollifiers (we use in particular the fact that $\theta_{\varepsilon_N} * f \to f$ uniformly on compact sets if $f$ is continuous), but the next result is true in larger generality using appropriate extensions.

Theorem 2. If
$$ \lim_{N\to\infty}\frac{\varepsilon_N^{-d}}{N} = 0 $$
then, for every $t > 0$ and $x \in \mathbb{R}^d$, we have
$$ \lim_{N\to\infty} E\left[\left|\left(\theta_{\varepsilon_N} * S_t^N\right)(x) - \rho_t(x)\right|^2\right] = 0. $$

Proof. First, we have ($\rho_t(x)$ is not the average of the $\theta_{\varepsilon_N}(x - X_t^i)$)
$$ E\left[\left|\left(\theta_{\varepsilon_N} * S_t^N\right)(x) - \rho_t(x)\right|^2\right] = E\left[\left|\frac{1}{N}\sum_{i=1}^N\left(\theta_{\varepsilon_N}(x - X_t^i) - \rho_t(x)\right)\right|^2\right] $$
$$ \le 2\,E\left[\left|\frac{1}{N}\sum_{i=1}^N\left(\theta_{\varepsilon_N}(x - X_t^i) - E\left[\theta_{\varepsilon_N}(x - X_t^i)\right]\right)\right|^2\right] + 2\left|\frac{1}{N}\sum_{i=1}^N\left(E\left[\theta_{\varepsilon_N}(x - X_t^i)\right] - \rho_t(x)\right)\right|^2. $$
About the first term, using the independence of the r.v. $\theta_{\varepsilon_N}(x - X_t^i)$ and the property $E\left[\theta_{\varepsilon_N}(x - X_t^i) - E\left[\theta_{\varepsilon_N}(x - X_t^i)\right]\right] = 0$, we may delete the mixed terms and write
$$ E\left[\left|\frac{1}{N}\sum_{i=1}^N\left(\theta_{\varepsilon_N}(x - X_t^i) - E\left[\theta_{\varepsilon_N}(x - X_t^i)\right]\right)\right|^2\right] = \frac{1}{N^2}\sum_{i=1}^N E\left[\left|\theta_{\varepsilon_N}(x - X_t^i) - E\left[\theta_{\varepsilon_N}(x - X_t^i)\right]\right|^2\right] $$
which, being the $\theta_{\varepsilon_N}(x - X_t^i)$ equally distributed, is equal to
$$ = \frac{1}{N}\,E\left[\left|\theta_{\varepsilon_N}(x - X_t^1) - E\left[\theta_{\varepsilon_N}(x - X_t^1)\right]\right|^2\right]. $$
For the same reason the second term is controlled by $2\left|E\left[\theta_{\varepsilon_N}(x - X_t^1)\right] - \rho_t(x)\right|^2$, hence
$$ E\left[\left|\left(\theta_{\varepsilon_N} * S_t^N\right)(x) - \rho_t(x)\right|^2\right] \le \frac{2}{N}\,E\left[\left|\theta_{\varepsilon_N}(x - X_t^1) - E\left[\theta_{\varepsilon_N}(x - X_t^1)\right]\right|^2\right] + 2\left|E\left[\theta_{\varepsilon_N}(x - X_t^1)\right] - \rho_t(x)\right|^2 $$
$$ \le \frac{4}{N}\,E\left[\theta_{\varepsilon_N}(x - X_t^1)^2\right] + \frac{4}{N}\left(E\left[\theta_{\varepsilon_N}(x - X_t^1)\right]\right)^2 + 2\left|E\left[\theta_{\varepsilon_N}(x - X_t^1)\right] - \rho_t(x)\right|^2. $$
Notice that
$$ E\left[\theta_{\varepsilon_N}(x - X_t^1)\right] = \int_{\mathbb{R}^d}\theta_{\varepsilon_N}(x - y)\,\rho_t(y)\,dy $$
(we have proved above that $X_t^1$ has density $\rho_t$). Moreover, we have
$$ \lim_{N\to\infty}\int_{\mathbb{R}^d}\theta_{\varepsilon_N}(x - y)\,\rho_t(y)\,dy = \rho_t(x) $$
uniformly in $x$ on compact sets, for $t > 0$. Hence $2\left|E\left[\theta_{\varepsilon_N}(x - X_t^1)\right] - \rho_t(x)\right|^2$ converges to zero. The sequence
$$ \left(E\left[\theta_{\varepsilon_N}(x - X_t^1)\right]\right)^2 = \left(\int_{\mathbb{R}^d}\theta_{\varepsilon_N}(x - y)\,\rho_t(y)\,dy\right)^2 $$
is, for the same reason, bounded, hence the term $\frac{4}{N}\left(E\left[\theta_{\varepsilon_N}(x - X_t^1)\right]\right)^2$ converges to zero. It remains to deal with the term $\frac{4}{N}E\left[\theta_{\varepsilon_N}(x - X_t^1)^2\right]$. We have
$$ E\left[\theta_{\varepsilon_N}(x - X_t^1)^2\right] = \int_{\mathbb{R}^d}\theta_{\varepsilon_N}^2(x - y)\,\rho_t(y)\,dy = \varepsilon_N^{-d}\int_{\mathbb{R}^d}\varepsilon_N^{-d}\,\theta^2\left(\varepsilon_N^{-1}(x - y)\right)\rho_t(y)\,dy $$
$$ \le \varepsilon_N^{-d}\,\|\rho_t\|_\infty\int_{\mathbb{R}^d}\varepsilon_N^{-d}\,\theta^2\left(\varepsilon_N^{-1}(x - y)\right)dy = \varepsilon_N^{-d}\,\|\rho_t\|_\infty\int_{\mathbb{R}^d}\theta^2(z)\,dz $$
hence
$$ \frac{4}{N}\,E\left[\theta_{\varepsilon_N}(x - X_t^1)^2\right] \le \frac{\varepsilon_N^{-d}}{N}\,4\,\|\rho_t\|_\infty\int_{\mathbb{R}^d}\theta^2(z)\,dz. $$
This term goes to zero if we assume $\frac{\varepsilon_N^{-d}}{N} \to 0$.

Remark 1. The condition $\lim_{N\to\infty}\frac{\varepsilon_N^{-d}}{N} = 0$ has a very simple interpretation. First, the law from which the particles are extracted has almost compact support (as any probability law). To simplify the argument, assume it has support of linear size 1. If we have $N$ particles, and we think for simplicity that they are almost uniformly distributed in the support, the distance between closest neighbors is of the order $N^{-1/d}$. If the bandwidth $\varepsilon_N$ is of this order or smaller, the average performed by the kernel is made only over a finite number of particles, sometimes maybe even zero particles, so it fluctuates randomly. If, on the contrary, $\varepsilon_N$ is much bigger than $N^{-1/d}$, then we average over a large number of particles and a sort of LLN is active.
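A small experiment in the spirit of Remark 1 ($d = 1$; a sketch, with arbitrary numerical choices): with $N$ sample points, a bandwidth of order $N^{-1}$ averages over a handful of points and fluctuates wildly, while a much larger bandwidth, still vanishing as $N$ grows, gives a stable profile.

N=2000
Xs=rnorm(N)                                  # sample from a fixed law
plot(density(Xs,bw=1/N),col="red",main="")   # bandwidth of order N^(-1/d): noisy estimate
lines(density(Xs,bw=N^(-1/5)),col="blue")    # much larger bandwidth: smooth estimate
curve(dnorm(x),add=TRUE,lty=2)               # true density, for comparison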


5. On the weak convergence of measures, in the random case

Above we have proved that, for every $\phi \in C_b(\mathbb{R}^d)$, we have a.s.
$$ \lim_{N\to\infty}\int_{\mathbb{R}^d}\phi(x)\,S_t^N(dx) = \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx. $$
The event of zero probability where convergence does not hold may depend on $\phi$. Hence, a priori, we cannot say that, for $P$-a.e. $\omega \in \Omega$, the sequence of measures $\left(S_t^N(\omega)\right)_{N\in\mathbb{N}}$ converges weakly, since this property precisely means: there exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that for all $\omega \in \Omega_0$ and all $\phi \in C_b(\mathbb{R}^d)$ we have
$$ \lim_{N\to\infty}\int_{\mathbb{R}^d}\phi(x)\,S_t^N(\omega)(dx) = \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx. $$
Until now, we have a set $\Omega_0^\phi$ for each $\phi$ with a similar property and their intersection is not under control. The result, however, is true.

Corollary 1. There exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that, for all $\omega \in \Omega_0$, $S_t^N(\omega)(dx)$ converges weakly to $\rho_t(x)\,dx$.

Proof. The idea is to prove the assertion first for a dense countable set of test functions and then extend it to all test functions by an estimate. Let us see the details.

Let $\{\phi_\alpha\}_{\alpha\in\mathbb{N}}$ be a dense sequence in $C_c(\mathbb{R}^d)$ (the set of compactly supported continuous functions), density being measured with respect to the uniform convergence (notice that $C_b(\mathbb{R}^d)$, on the contrary, would not be separable). Since, for each $\alpha\in\mathbb{N}$, we have (4.1) with $\phi = \phi_\alpha$, being a countable number of properties we may say that there exists an event $\Omega_0$ with $P(\Omega_0) = 1$ such that, for all $\omega\in\Omega_0$ and every $\alpha\in\mathbb{N}$, we have
$$ (5.1)\qquad \lim_{N\to\infty}\int_{\mathbb{R}^d}\phi_\alpha(x)\,S_t^N(\omega)(dx) = \int_{\mathbb{R}^d}\phi_\alpha(x)\,\rho_t(x)\,dx. $$
For every $\phi\in C_c(\mathbb{R}^d)$ and $\alpha\in\mathbb{N}$ we have
$$ \left|\int_{\mathbb{R}^d}\phi(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx\right| \le \int_{\mathbb{R}^d}\left|\phi(x) - \phi_\alpha(x)\right| S_t^N(\omega)(dx) $$
$$ \qquad + \left|\int_{\mathbb{R}^d}\phi_\alpha(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi_\alpha(x)\,\rho_t(x)\,dx\right| + \int_{\mathbb{R}^d}\left|\phi_\alpha(x) - \phi(x)\right|\rho_t(x)\,dx $$
$$ \le \left\|\phi - \phi_\alpha\right\|_\infty\left(\int_{\mathbb{R}^d}S_t^N(\omega)(dx) + \int_{\mathbb{R}^d}\rho_t(x)\,dx\right) + \left|\int_{\mathbb{R}^d}\phi_\alpha(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi_\alpha(x)\,\rho_t(x)\,dx\right|. $$
Recall that $\int_{\mathbb{R}^d}S_t^N(\omega)(dx) = 1$ and $\int_{\mathbb{R}^d}\rho_t(x)\,dx = 1$. Given $\phi\in C_c(\mathbb{R}^d)$ and given $\varepsilon > 0$, let $\alpha\in\mathbb{N}$ be such that $\left\|\phi - \phi_\alpha\right\|_\infty \le \frac{\varepsilon}{2}$. Then
$$ \left|\int_{\mathbb{R}^d}\phi(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx\right| \le \varepsilon + \left|\int_{\mathbb{R}^d}\phi_\alpha(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi_\alpha(x)\,\rho_t(x)\,dx\right|. $$
Therefore, recalling (5.1), for every $\omega\in\Omega_0$ we have
$$ \limsup_{N\to\infty}\left|\int_{\mathbb{R}^d}\phi(x)\,S_t^N(\omega)(dx) - \int_{\mathbb{R}^d}\phi(x)\,\rho_t(x)\,dx\right| \le \varepsilon. $$
Since $\varepsilon > 0$ is arbitrary and the limsup does not depend upon $\varepsilon$, we deduce that the limsup is equal to zero. Finally, recall that, if a sequence of probability measures converges to a probability measure (this is essential, and here it is true) over all test functions of $C_c(\mathbb{R}^d)$, then it converges weakly (namely over all test functions of $C_b(\mathbb{R}^d)$). The proof is complete.

Remark 2. A similar result is true if we replace a.s. convergence with convergence in probability.

6. Heat equation as Fokker-Planck and Kolmogorov equation

The heat equation is related to Brownian motion in three ways: as a macroscopic limit (the theorems above), as a Fokker-Planck equation, and as a Kolmogorov equation. Let us see here the last two interpretations.

6.1. Heat equation as Fokker-Planck equation. We shall see below in Section 3.1 the general meaning of the Fokker-Planck equation. The heat equation is a particular case of it. In Section 3.1 we deal with measure-valued solutions. Here, due to the simplicity of the particular case, we may deal with regular solutions of the Fokker-Planck equation.

Consider the equation
$$ dX_t = dB_t, \qquad X|_{t=0} = X_0 $$
where $X_0$ has density $\rho_0(x)$. The result is: the density $\rho_t(x)$ of $X_t$ is a solution of the Cauchy problem for the heat equation
$$ \frac{\partial\rho_t}{\partial t} = \frac{1}{2}\,\Delta\rho_t, \qquad \rho|_{t=0} = \rho_0. $$
We have already proved this result in Theorem 1. This section does not present a new result but only insists on the link between SDEs and PDEs which states (under appropriate assumptions) that the law of the solution to an SDE satisfies (in a suitable sense) a PDE.

6.2. Probabilistic representation formula. Finally, let us see that, for the heat equation
$$ \frac{\partial u_t}{\partial t} = \frac{1}{2}\,\Delta u_t, \qquad u|_{t=0} = u_0 $$
one has the following probabilistic representation formula, in terms of a Brownian motion $B_t$:
$$ u_t(x) = E\left[u_0(x + B_t)\right]. $$


Indeed,
$$ E\left[u_0(x + B_t)\right] = \int u_0(x + y)\,p_t(y)\,dy = \int p_t(x - y)\,u_0(y)\,dy $$
and we know that this expression gives a solution of the heat equation. This link is a particular case of the link between SDEs and Kolmogorov equations, studied below in Section 3.2.
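A quick Monte Carlo check of this representation formula (a sketch, with an arbitrary Gaussian initial condition, not taken from the source): with $u_0$ the $N(0,1)$ density, the convolution with $p_t$ gives the $N(0, 1+t)$ density, so the two quantities below should agree up to the Monte Carlo error.

u0=function(x) dnorm(x)              # initial condition (arbitrary choice)
t=2; x=1; M=100000
mean(u0(x+sqrt(t)*rnorm(M)))         # Monte Carlo estimate of E[u_0(x + B_t)]
dnorm(x,0,sqrt(1+t))                 # exact u_t(x) = (p_t * u_0)(x)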

7. Two-dimensional simulations

We complete the first chapter with a few simulations of Brownian motion in dimension 2, which will come back in subsequent chapters.

7.1. Brownian motion in 2D. The next code is a way to simulate a trajectory of a 2D Brownian motion:

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
for (t in 1:(n-1)) { X[t+1]=X[t]+h*rnorm(1); Y[t+1]=Y[t]+h*rnorm(1) }
plot(X,Y,type="l", col=3); abline(h=0); abline(0,1000)

The following two variants show a movie, in two different forms:

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
plot(c(-10,10),c(-10,10))
abline(h=0)
abline(0,1000)
for (t in 1:(n-1)) {
X[t+1]=X[t]+h*rnorm(1)
Y[t+1]=Y[t]+h*rnorm(1)
lines(X[1:t],Y[1:t],type="l", col=3)
}

...

n=10000; dt=0.01; h=sqrt(dt)
X=1:n; Y=1:n
X[1]=0; Y[1]=0
for (t in 1:(n-1)) {
X[t+1]=X[t]+h*rnorm(1)
Y[t+1]=Y[t]+h*rnorm(1)
plot(c(-10,10),c(-10,10))
abline(h=0)
abline(0,1000)
lines(X[1:t],Y[1:t],type="l", col=3)
}

7.2. Several Brownian motions in 2D. With several Brownian motions, we cannot plot the full trajectories anymore, the picture would be too full. We may, for instance, plot the final positions:

n=1000; N=1000; dt=0.01; h=sqrt(dt)
X=matrix(nrow=N,ncol=n); Y=X
X[,1]=rnorm(N,0,1); Y[,1]=rnorm(N,0,1)
for (t in 1:(n-1)) {
X[,t+1]=X[,t]+h*rnorm(N)
Y[,t+1]=Y[,t]+h*rnorm(N)
}
plot(c(-10,10),c(-10,10)); lines(X[,n],Y[,n],type="p", col=1)
abline(h=0); abline(0,1000)
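To also see a 2D profile, and not only the cloud of points, one possibility (a sketch, not part of the original code) is a two-dimensional kernel density estimate of the final positions, in the spirit of the 1D "density" command; kde2d from the MASS package is one option, reusing X, Y and n from the code above and keeping its default bandwidth.

library(MASS)
est=kde2d(X[,n],Y[,n],n=80)          # 2D kernel density estimate on an 80x80 grid
image(est)                           # heat map of the estimated density
contour(est,add=TRUE)                # contour lines on top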


This is a simple example of a particle system. With a great degree of abstraction, we could think of it as a set of cancer cells, embedded in a tissue or in vitro.

We would like to see the motion of the particles. We use a number of tricks:

i) we plot only a few times, by means of the commands T=500, if(t%%T==0), etc.;
ii) we clean the previous points with the command polygon(c(-10,10,10,-10),c(-10,-10,10,10),col="white", border=NA).

Moreover, the method sometimes works poorly; it depends on the value of T compared to the complexity of the specific code. The example below has been tuned to work.

n=100000; N=1000; dt=0.0001; h=sqrt(dt)
X=matrix(nrow=N,ncol=n); Y=X
X[,1]=rnorm(N,0,1); Y[,1]=rnorm(N,0,1)
T=500
plot(c(-10,10),c(-10,10),type="n")
for (t in 1:(n-1)) {
X[,t+1]=X[,t]+h*rnorm(N)
Y[,t+1]=Y[,t]+h*rnorm(N)
if(t%%T==0) {
polygon(c(-10,10,10,-10),c(-10,-10,10,10),col="white", border=NA)
lines(X[,t+1],Y[,t+1],type="p", col=1)
abline(h=0)
abline(0,1000)
}
}


CHAPTER 2

SDEs and PDEs

1. Stochastic differential equations

1.1. Definitions. We call stochastic differential equation (SDE) an equation of the form
$$ (1.1)\qquad dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t, \qquad X|_{t=0} = X_0 $$
where $(B_t)_{t\ge 0}$ is a $d$-dimensional Brownian motion on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$, $X_0$ is $\mathcal{F}_0$-measurable, $b : [0,T]\times\mathbb{R}^d\to\mathbb{R}^d$ and $\sigma : [0,T]\times\mathbb{R}^d\to\mathbb{R}^{d\times d}$ have some regularity specified case by case and the solution $(X_t)_{t\ge 0}$ is a $d$-dimensional continuous adapted process. The meaning of the equation is the identity
$$ (1.2)\qquad X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dB_s $$
where we have to assume conditions on $b$ and $\sigma$ which guarantee that $s\mapsto b(s, X_s)$ is integrable and $s\mapsto\sigma(s, X_s)$ is square integrable, with probability one; $\int_0^t\sigma(s, X_s)\,dB_s$ is an Itô integral, and more precisely it is its continuous version in $t$; and the identity has to hold uniformly in $t$, with probability one. The generalization to different dimensions of $B$ and $X$ is obvious; we take the same dimension to have less notation.

Even if less would be sufficient with more arguments, let us assume that $b$ and $\sigma$ are at least continuous, so that the above mentioned conditions of integrability of $s\mapsto b(s, X_s)$ and $s\mapsto\sigma(s, X_s)$ are fulfilled.

In most cases, if $X_0 = x_0$ is deterministic, when we prove that a solution exists, we can also prove that it is adapted not only to the filtration $(\mathcal{F}_t)$ but also to $(\mathcal{F}_t^B)$, the filtration associated to the Brownian motion; more precisely, to its completion. This is just natural, because the input of the equation is only the Brownian motion. However, it is natural only if implicitly we think we have a suitable uniqueness. Otherwise, in principle, it is difficult to exclude that one can construct, maybe in some artificial way, a solution which is not $B$-adapted. Indeed it happens that there are relevant examples of stochastic equations where solutions exist which are not $B$-adapted. This is the origin of the following definitions.

Definition 1 (strong solutions). We have strong existence for equation (1.1) if, given any filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$, given any deterministic initial condition $X_0 = x_0$, there is a continuous $\mathcal{F}_t$-adapted process $(X_t)_{t\ge 0}$ satisfying (1.2) (in particular, we may choose $(\mathcal{F}_t) = (\mathcal{F}_t^B)$ and have a solution adapted to $B$). A strong solution is a solution adapted to $(\mathcal{F}_t^B)$.


Definition 2 (weak solutions). Given a deterministic initial condition $X_0 = x_0$, a weak solution is the family composed of a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$ and a continuous $\mathcal{F}_t$-adapted process $(X_t)_{t\ge 0}$ satisfying (1.2).

In the definition of weak solution, the filtered probability space and the Brownian motion are not specified a priori, they are part of the solution; hence we are not allowed to choose $(\mathcal{F}_t) = (\mathcal{F}_t^B)$.

When $X_0$ is random, $\mathcal{F}_0$-measurable, the concept of weak solution is formally in trouble because the space where $X_0$ has to be defined is not prescribed a priori. The concept of strong solution can be adapted, for instance by replacing $(\mathcal{F}_t^B)$ with $(\mathcal{F}_t^B\vee\mathcal{F}_0)$, or just by saying that, if $(X_t)_{t\ge 0}$ is a solution on a prescribed space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ where $X_0$ and $B$ are defined, then it is a strong solution. If we want to adapt the definition of weak solution to the case of random initial conditions, we have to prescribe only the law of $X_0$ and include in the solution the existence of $X_0$ with the given law.

Let us come to uniqueness. Similarly to existence, there are two concepts.

Definition 3 (pathwise uniqueness). We say that pathwise uniqueness holds for equation (1.1) if, given any filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ with a Brownian motion $(B_t)_{t\ge 0}$, given any deterministic initial condition $X_0 = x_0$, if $\big(X_t^{(1)}\big)_{t\ge 0}$ and $\big(X_t^{(2)}\big)_{t\ge 0}$ are two continuous $\mathcal{F}_t$-adapted processes which fulfill (1.2), then they are indistinguishable.

Definition 4 (uniqueness in law). We say that there is uniqueness in law for equation (1.1) if, given two weak solutions on any pair of spaces, their laws coincide.

1.2. Strong solutions. The most classical theorem about strong solutions and pathwise uniqueness holds, as in the deterministic case, under Lipschitz assumptions on the coefficients. Assume there are two constants $L$ and $C$ such that
$$ \left|b(t,x) - b(t,x')\right| \le L\left|x - x'\right|, \qquad \left|\sigma(t,x) - \sigma(t,x')\right| \le L\left|x - x'\right| $$
$$ \left|b(t,x)\right| \le C\left(1 + |x|\right), \qquad \left|\sigma(t,x)\right| \le C\left(1 + |x|\right) $$
for all values of $t$ and $x$. The second condition on $b$ and $\sigma$ is written here for the sake of generality, but if we assume, as said above, that $b$ and $\sigma$ are continuous, it follows from the first condition (the uniform in time Lipschitz property).

Theorem 3. Under the previous assumptions on $b$ and $\sigma$, there is strong existence and pathwise uniqueness for equation (1.1). If, for some $p \ge 2$, $E\left[|X_0|^p\right] < \infty$, then
$$ E\left[\sup_{t\in[0,T]}|X_t|^p\right] < \infty. $$

Proof. ...


1.3. Weak solutions. Let us see only a particular example of a result about weak solutions. Assume that $\sigma$ is constant and non-degenerate; for simplicity, assume it equal to the identity, namely consider the SDE with additive noise
$$ dX_t = b(t, X_t)\,dt + dB_t. $$
Moreover, assume $b$ only measurable and bounded (or continuous and bounded if we prefer to maintain the general assumption of continuity). The key features of these assumptions are: the noise is non-degenerate (hence more restrictive than above for strong solutions) but $b$ is very weak, much weaker than in the usual Lipschitz case. Under such an assumption on $b$, if we do not have the noise $dB_t$ in the equation, it is easy to make examples without existence or without uniqueness.

Theorem 4. Under these assumptions, for every $x_0\in\mathbb{R}^d$, there exists a weak solution and it is unique in law.

2. Simulation of SDEs in dimension one

2.1. Linear example. Consider the equation, with $\lambda, \sigma > 0$,
$$ dX_t = -\lambda X_t\,dt + \sigma\,dB_t, \qquad X_0 = x. $$
Its Euler discretization, on intervals of constant amplitude, has the theoretical form
$$ X_{t_{n+1}} - X_{t_n} = -\lambda X_{t_n}\,dt + \sigma\sqrt{dt}\,\frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}}, \qquad dt := t_{n+1} - t_n $$
where the r.v.'s
$$ Z_n = \frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}} $$
are standard Gaussian and independent. The algorithmic form can be the following one; first we construct a function, the drift, then we write the main part of the code:

drift=function(x) -x
n=10000; dt=0.01; h=sqrt(dt); sig=1
X=1:n; X[1]=1
for (t in 1:(n-1)) {X[t+1]=X[t]+ dt*drift(X[t]) + h*sig*rnorm(1)}
plot(X,type="l", col=2); abline(h=0); abline(0,1000)


Exercise 1. Try with other initial conditions and other values of $\lambda$ and $\sigma$.

2.2. Nonlinear example. Consider the equation (called the "two-well potential" example)
$$ dX_t = \left(X_t - X_t^3\right)dt + \sigma\,dB_t, \qquad X_0 = x. $$
Its Euler discretization is
$$ X_{t_{n+1}} - X_{t_n} = \left(X_{t_n} - X_{t_n}^3\right)dt + \sigma\sqrt{dt}\,Z_n, \qquad dt := t_{n+1} - t_n, \qquad Z_n = \frac{B_{t_{n+1}} - B_{t_n}}{\sqrt{dt}}. $$
The code is:

drift=function(x) x-x^3
n=10000; dt=0.01; h=sqrt(dt); sig=0.5
X=1:n; X[1]=1
for (t in 1:(n-1)) {X[t+1]=X[t]+ dt*drift(X[t]) + h*sig*rnorm(1)}
plot(X,type="l", col=4); abline(h=0); abline(0,1000)


Exercise 2. Try with other initial conditions and other values of $\sigma$ and $n$.

2.3. Important exercise. In both cases of the two examples above, plot a histogram and a fitted (non-parametric) continuous density of the distribution at time $t$ (for instance for $t = 1, 10, 50, 100$).
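One possible way to carry out this exercise for the two-well example (a sketch; all numerical values are arbitrary choices) is to simulate many independent copies of the SDE in parallel and look at the distribution at the chosen time:

drift=function(x) x-x^3
N=5000; dt=0.01; tfin=50; n=tfin/dt; h=sqrt(dt); sig=0.5
X=rep(1,N)                                   # N independent copies, all starting from 1
for (t in 1:n) { X=X+dt*drift(X)+h*sig*rnorm(N) }
hist(X,50,FALSE)                             # distribution at time tfin
lines(density(X,bw=sd(X)/5),col="red")       # fitted continuous density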

3. Links between SDEs and linear PDEs

3.1. Fokker-Planck equation. Along with the stochastic equation (1.1) defined by the coefficients $b$ and $\sigma$, we consider also the following parabolic PDE on $[0,T]\times\mathbb{R}^d$:
$$ (3.1)\qquad \frac{\partial p}{\partial t} = \frac{1}{2}\sum_{i,j}\partial_i\partial_j\left(a_{ij}\,p\right) - \operatorname{div}(p\,b), \qquad p|_{t=0} = p_0 $$
called the Fokker-Planck equation. Here
$$ a = \sigma\sigma^T. $$
Although in many cases it has regular solutions, in order to minimize the theory it is convenient to introduce the concept of measure-valued solution $\mu_t$; moreover we restrict to the case of probability measures. To give a meaning to certain integrals below, we assume (besides other assumptions depending on the result)
$$ b, \sigma \ \text{bounded continuous} $$
but it will be clear that more cumbersome results can be obtained with little additional conceptual effort.

We loosely write
$$ (3.2)\qquad \frac{\partial\mu_t}{\partial t} = \frac{1}{2}\sum_{i,j=1}^d \partial_i\partial_j\left(a_{ij}\,\mu_t\right) - \operatorname{div}(\mu_t\,b), \qquad \mu|_{t=0} = \mu_0 $$


but we mean the following concept. By $\langle\mu_t, \phi\rangle$ we mean $\int_{\mathbb{R}^d}\phi(x)\,\mu_t(dx)$. By $C_b([0,T]\times\mathbb{R}^d)$ we denote the space of bounded continuous functions $\varphi : [0,T]\times\mathbb{R}^d\to\mathbb{R}$ and by $C_b^{1,2}([0,T]\times\mathbb{R}^d)$ the space of functions $\varphi$ such that $\varphi, \frac{\partial\varphi}{\partial t}, \partial_i\varphi, \partial_i\partial_j\varphi \in C_b([0,T]\times\mathbb{R}^d)$.

Definition 5. A measure-valued solution of the Fokker-Planck equation (3.2) is a family of Borel probability measures $(\mu_t)_{t\in[0,T]}$ on $\mathbb{R}^d$ such that $t\mapsto\langle\mu_t, \varphi(t,\cdot)\rangle$ is measurable for all $\varphi\in C_b([0,T]\times\mathbb{R}^d)$ and
$$ \langle\mu_t, \phi(t,\cdot)\rangle - \langle\mu_0, \phi(0,\cdot)\rangle = \int_0^t \left\langle \mu_s,\ \left(\frac{\partial\phi}{\partial s} + \frac{1}{2}\sum_{i,j=1}^d a_{ij}\,\partial_i\partial_j\phi + b\cdot\nabla\phi\right)(s,\cdot)\right\rangle ds $$
for every $\phi\in C_b^{1,2}([0,T]\times\mathbb{R}^d)$.

Theorem 5 (existence). The law $\mu_t$ of $X_t$ is a measure-valued solution of the Fokker-Planck equation (3.2).

Proof. Let $\phi$ be of class $C_b^{1,2}([0,T]\times\mathbb{R}^d)$. By the Itô formula applied to $\phi(t, X_t)$, we have
$$ d\phi(t, X_t) = \frac{\partial\phi}{\partial t}(t, X_t)\,dt + \nabla\phi(t, X_t)\cdot dX_t + \frac{1}{2}\sum_{i,j=1}^d \partial_i\partial_j\phi(t, X_t)\,a_{ij}(t, X_t)\,dt $$
$$ = \frac{\partial\phi}{\partial t}(t, X_t)\,dt + \nabla\phi(t, X_t)\cdot b(t, X_t)\,dt + \nabla\phi(t, X_t)\cdot\sigma(t, X_t)\,dB_t + \frac{1}{2}\sum_{i,j=1}^d \partial_i\partial_j\phi(t, X_t)\,a_{ij}(t, X_t)\,dt. $$
We have $E\int_0^T\left|\nabla\phi(t, X_t)\cdot\sigma(t, X_t)\right|^2 dt < \infty$ (we use here that $\sigma$ and $\nabla\phi$ are bounded), hence $E\int_0^t\nabla\phi(s, X_s)\cdot\sigma(s, X_s)\,dB_s = 0$ and thus (all terms are finite by the boundedness assumptions)
$$ E\left[\phi(t, X_t)\right] - E\left[\phi(0, X_0)\right] = E\int_0^t \frac{\partial\phi}{\partial s}(s, X_s)\,ds + E\int_0^t \nabla\phi(s, X_s)\cdot b(s, X_s)\,ds + \frac{1}{2}\sum_{i,j=1}^d E\int_0^t \partial_i\partial_j\phi(s, X_s)\,a_{ij}(s, X_s)\,ds. $$
Since $E\left[\phi(t, X_t)\right] = \int_{\mathbb{R}^d}\phi(t, x)\,\mu_t(dx)$ (and similarly for the other terms) we get the weak formulation of equation (3.2). The preliminary property that $t\mapsto\langle\mu_t, \varphi(t,\cdot)\rangle = E\left[\varphi(t, X_t)\right]$ is measurable for all $\varphi\in C_b([0,T]\times\mathbb{R}^d)$ is easy.

Remark 3. Under suitable assumptions, like the simple case when $a_{ij}$ is the identity matrix, if $\mu_0$ has a density $p_0$ then also $\mu_t$ has a density $p(t,\cdot)$, often with some regularity gained from the parabolic structure, and thus the Fokker-Planck equation in the differential form (3.1) holds.


Theorem 6 (uniqueness). Assume that the backward parabolic equation (called the backward Kolmogorov equation)
$$ \frac{\partial u}{\partial t} + \frac{1}{2}\sum_{i,j=1}^d a_{ij}\,\partial_i\partial_j u + b\cdot\nabla u = 0 \ \text{ on } [0,T_0]\times\mathbb{R}^d, \qquad u|_{t=T_0} = \phi $$
has, for every $T_0\in[0,T]$ and $\phi\in C_c^\infty(\mathbb{R}^d)$, at least one solution $u$ of class $C_b^{1,2}([0,T_0]\times\mathbb{R}^d)$. Then the Fokker-Planck equation (3.2) has at most one measure-valued solution.

Proof. If $\mu_t$ is a measure-valued solution of the Fokker-Planck equation and $u$ is a $C_b^{1,2}([0,T_0]\times\mathbb{R}^d)$ solution of the Kolmogorov equation, then (from the identity which defines measure-valued solutions and the identity of the Kolmogorov equation)
$$ \langle\mu_{T_0}, \phi\rangle = \langle\mu_0, u(0,\cdot)\rangle. $$
Then, if $\mu_t^{(i)}$, $i = 1, 2$, are two measure-valued solutions of the Fokker-Planck equation with the same initial condition $\mu_0$, we have
$$ \left\langle\mu_{T_0}^{(1)}, \phi\right\rangle = \left\langle\mu_{T_0}^{(2)}, \phi\right\rangle. $$
This identity holds for every $\phi\in C_c^\infty(\mathbb{R}^d)$, hence $\mu_{T_0}^{(1)} = \mu_{T_0}^{(2)}$. The time $T_0\in[0,T]$ is arbitrary, hence $\mu^{(1)} = \mu^{(2)}$.

Obviously the weak aspect of the previous uniqueness result is the assumption, which is not explicit in terms of the coefficients. The reason is that there are two main cases when such an implicit assumption is satisfied. One is the case when $b$ and $\sigma$ are very regular; the other is when $\sigma$ is non-degenerate. As an example of the second case, let us mention the fundamental case of the heat equation.

Example 1. Consider the case $b = 0$, $\sigma = \mathrm{Id}$. The forward Kolmogorov equation
$$ \frac{\partial v}{\partial t} = \frac{1}{2}\sum_{i=1}^d \partial_i^2 v \ \text{ on } [0,T_0]\times\mathbb{R}^d, \qquad v|_{t=0} = \phi $$
has the explicit solution
$$ v(t,x) = \int p_t(x - y)\,\phi(y)\,dy $$
which is infinitely differentiable with all bounded derivatives. The function $u(t,x) = v(T_0 - t, x)$ is then an explicit and regular solution of the backward Kolmogorov equation above. In this case, therefore, the Fokker-Planck equation has a unique measure-valued solution.


3.2. Backward Kolmogorov equation. Along with the stochastic equation (1.1) defined by the coefficients $b$ and $\sigma$, we consider also the following backward parabolic PDE on $[0,T]\times\mathbb{R}^d$, called the backward Kolmogorov equation:
$$ \frac{\partial u}{\partial t} + \frac{1}{2}\sum_{i,j=1}^d a_{ij}\,\partial_i\partial_j u + b\cdot\nabla u = 0, \qquad u|_{t=T} = \phi. $$
To express in full generality the relation with the SDE we have to introduce the SDE on the time interval $[t_0, T]$, with any $t_0\in[0,T]$:
$$ X_t = x + \int_{t_0}^t b(s, X_s)\,ds + \int_{t_0}^t \sigma(s, X_s)\,dB_s, \qquad t\in[t_0, T]. $$
Obviously, on $[t_0, T]$, we have the same results as on $[0,T]$, in particular strong existence and pathwise uniqueness under Lipschitz assumptions. Assume these conditions and denote the unique solution, defined on some filtered probability space, by $X_t^{t_0,x}$. The relation with the backward Kolmogorov equation is
$$ u(t, x) = E\left[\phi\left(X_T^{t,x}\right)\right]. $$
This relation holds under different assumptions and for solutions $u$ with different degrees of regularity. Let us start with the most elementary result.

Proposition 1. If $u$ is a solution of the backward Kolmogorov equation of class $C^{1,2}([0,T]\times\mathbb{R}^d)$, with bounded $\nabla u$ and $\sigma$, then $u(t, x) = E\left[\phi\left(X_T^{t,x}\right)\right]$.

Proof. Given $t_0\in[0,T]$, we apply the Itô formula to $u\left(t, X_t^{t_0,x}\right)$ on $[t_0, T]$. The computation is the same as the one done in the proof of Theorem 5. Since $u$ solves the backward Kolmogorov equation, we get
$$ \phi\left(X_T^{t_0,x}\right) = u(t_0, x) + \int_{t_0}^T \nabla u\left(t, X_t^{t_0,x}\right)\cdot\sigma\left(t, X_t^{t_0,x}\right)dB_t $$
where we have used the identity $X_{t_0}^{t_0,x} = x$ and the terminal condition $u(T,\cdot) = \phi$. Using the boundedness assumption, $\nabla u\left(t, X_t^{t_0,x}\right)\cdot\sigma\left(t, X_t^{t_0,x}\right)$ is of class $M^2$, hence
$$ E\int_{t_0}^T \nabla u\left(t, X_t^{t_0,x}\right)\cdot\sigma\left(t, X_t^{t_0,x}\right)dB_t = 0. $$
The relation $u(t_0, x) = E\left[\phi\left(X_T^{t_0,x}\right)\right]$ follows.
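A Monte Carlo illustration of this representation (a sketch): for the linear SDE of Section 2.1, $dX_t = -X_t\,dt + \sigma\,dB_t$, and the test function $\phi(y) = y$ (not compactly supported, but convenient because the exact value $E\left[X_T^{t_0,x}\right] = x\,e^{-(T-t_0)}$ is known), the Euler scheme gives:

t0=0; T=1; x=2; sig=1
M=20000; dt=0.001; n=(T-t0)/dt; h=sqrt(dt)
X=rep(x,M)                                   # M independent copies of X^{t0,x}
for (k in 1:n) { X=X-X*dt+h*sig*rnorm(M) }
mean(X)                                      # Monte Carlo estimate of u(t0,x) = E[phi(X_T)]
x*exp(-(T-t0))                               # exact value, about 0.736 for x = 2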

Remark 4. Let us stress the duality of the previous problems. The law $\mu_t$ of $X_t$ acts on test functions $\phi$ by $\int\phi\,d\mu_t$, which is equal to $E[\phi(X_t)]$; hence there is a form of duality between the law $\mu_t$ and the expression $E[\phi(X_t)]$. The PDE satisfied by the law $\mu_t$ and the PDE satisfied by $E\left[\phi\left(X_T^{t,x}\right)\right]$ are in duality, as rigorously claimed by Theorem 6. It is a general fact of linear problems that existence for a problem gives uniqueness for the dual one: think of the fact that full range of a matrix implies kernel zero for the transpose matrix; hence Theorem 6 is natural.


3.3. Macroscopic limit. We may reformulate Theorem 5 as a macroscopic limit of a system of non-interacting particles.

Let $B_t^i$, $i\in\mathbb{N}$, be a sequence of independent Brownian motions in $\mathbb{R}^d$, defined on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$. Let $X_0^i$ be independent $\mathbb{R}^d$-valued r.v.'s, $\mathcal{F}_0$-measurable, with the same law $\mu_0$. Let $b, \sigma$ be as in Theorem 5; more precisely, let us assume they are bounded continuous and satisfy the Lipschitz conditions of the classical theorem of existence and uniqueness (one may ask less, since here we only need uniqueness in law). Consider the sequence of SDEs in $\mathbb{R}^d$
$$ dX_t^i = b\left(t, X_t^i\right)dt + \sigma\left(t, X_t^i\right)dB_t^i, \qquad X^i|_{t=0} = X_0^i. $$
One can prove that the processes $X_t^i$ have the same law; denote the marginal at time $t$ of $X_t^i$ by $\mu_t$; we already know that $\mu_t$ is a measure-valued solution of the Fokker-Planck equation.

The processes $X_t^i$ are independent, since each $X_t^i$ is adapted to the corresponding Brownian motion $B_t^i$ and initial condition $X_0^i$, which are independent. One can provide rigorous proofs of these facts, which however are extremely intuitive.

Consider, for each $N\in\mathbb{N}$, the empirical measure $S_t^N := \frac{1}{N}\sum_{i=1}^N \delta_{X_t^i}$. Consider also mollifiers $\theta_\varepsilon(x) = \varepsilon^{-d}\theta\left(\varepsilon^{-1}x\right)$, with $\theta\in C_c^\infty(\mathbb{R}^d)$, $\theta\ge 0$, $\int\theta(x)\,dx = 1$.

Theorem 7. For every $t\in[0,T]$, a.s. we have weak convergence of $S_t^N$ to the solution $\mu_t$ of the PDE (3.2). Moreover, if $\mu_t$ has a density $\rho_t$, with $\rho_t\in C_b(\mathbb{R}^d)$ for some $t\in[0,T]$, then for every $x\in\mathbb{R}^d$ we have
$$ \lim_{N\to\infty}\left(\theta_{\varepsilon_N} * S_t^N\right)(x) = \rho_t(x) $$
in mean square ($L^2(\Omega)$-limit), as soon as $\lim_{N\to\infty}\frac{\varepsilon_N^{-d}}{N} = 0$.

Proof. The proof of the first claim is the same as done above for the Brownian motions: given $\phi\in C_b(\mathbb{R}^d)$, the r.v. $\phi\left(X_t^i\right)$ are bounded i.i.d., hence by the strong Law of Large Numbers we have, a.s.,
$$ \lim_{N\to\infty}\left\langle S_t^N, \phi\right\rangle = E\left[\phi\left(X_t^1\right)\right] = \langle\mu_t, \phi\rangle. $$
Then one can find a full probability event $\Omega_0$ where the same convergence holds for every $\phi\in C_b(\mathbb{R}^d)$, by the argument of Corollary 1.

The proof of the second claim is also the same as the one of Theorem 2, where the existence of the density $\rho_t$ is used, along with its continuity and the finiteness of $\|\rho_t\|_\infty$.

For the sequel of these lectures it is very important to extract the following message from this result. If a family of cells $X_t^i$ is subject to a drift $b\left(t, X_t^i\right)$ which moves them in a certain direction, at the PDE level the density $\rho_t(x)$ of those cells will be subject to the transport term
$$ -\operatorname{div}(\rho\,b). $$
And conversely, if we read such a term in the PDE for a density, the interpretation is a drift which acts on the particles of that density. The most common case of a vector field $b$ is a gradient field of a potential $U$,
$$ b = \nabla U, $$
in which case we say that particles move along the gradient of $U$. The transport term then has the form $-\operatorname{div}(\rho\,\nabla U)$, which we shall meet several times below in cancer models.
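As a concrete instance (a worked example added here, linking back to the linear SDE of Section 2.1 rather than taken from this point of the source): the drift $b(x) = -x$ is the gradient of the potential $U(x) = -|x|^2/2$, and for particles $dX_t^i = -X_t^i\,dt + dB_t^i$ the corresponding Fokker-Planck equation reads
$$ \frac{\partial\rho_t}{\partial t} = \frac{1}{2}\,\Delta\rho_t + \operatorname{div}(x\,\rho_t), $$
whose transport term pushes mass towards the origin, consistently with the trajectories simulated in Section 2.1.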


Similarly, but this is in a sense more obvious, if the cells $X_t^i$ are subject also to a random motion described by a Brownian motion $dB_t^i$, then the density is subject to the diffusion term $\frac{1}{2}\Delta\rho$. This relation is generalized to the case of a non-homogeneous, non-isotropic random motion of the form $\sigma\left(t, X_t^i\right)dB_t^i$ and the corresponding diffusion term $\frac{1}{2}\sum_{ij}\partial_i\partial_j\left(a_{ij}\,\rho\right)$ in the PDE; however, the homogeneous isotropic case $dB_t^i$ is the rule, in the absence of special phenomena.

Finally,

4. Simulations of the macroscopic limit

4.1. Simulation of simple PDEs. First of all, let us see a simple code to simulate the heat equation
$$ \frac{\partial\rho}{\partial t} = k\,\Delta\rho \ \text{ on } [0,T]\times\mathbb{R}^d, \qquad \rho|_{t=0} = \rho_0 $$
in dimension $d = 1$. The method is the finite difference one, with explicit Euler in time. There is a famous and essential rule to know: the time step dt and the space step dx must be related by the condition (sometimes called the von Neumann stability condition)
$$ \frac{k\,dt}{dx^2} \le \frac{1}{2}. $$
At the boundary, we have imposed no-flux conditions.

NT=10000; Nx=500; dx=0.1; K=1; T=100
L=dx*Nx; dt=(dx^2)/(4*K)
Nx.virt=Nx+1
u = matrix(nrow=Nx.virt, ncol=NT)
X=seq(0,L,dx)
u[,1]=dnorm(X,L/2,5)
M=max(dnorm(X,L/2,5))
plot(c(0,L),c(-0.01,0.1),type="n")
for (t in 1:(NT-1)) {
for (i in 2:Nx) {
u[i,t+1] = u[i,t] + K * dt * ((u[i+1,t]-2*u[i,t]+u[i-1,t])) / (dx^2)
}
u[1,t+1]=u[2,t+1]; u[Nx.virt,t+1]=u[Nx,t+1]
if(t%%T==0) {
polygon(c(0,L,L,0),c(-0.01,-0.01,0.1,0.1),col="white", border=NA)
lines(X,u[,t+1])
abline(h=0); abline(h=M)
}
}


Running the simulation for a short while, the effect of the boundary starts to appear and it is not so nice. However, if we set $\rho$ equal to zero at the boundary instead of the no-flux condition, slowly the mass disappears, which is even worse in a sense.

We suggest changing dt=(dx^2)/(4*K) into dt=(dx^2)/(2*K) and dt=(dx^2)/(1*K) to see the role of the stability condition.

Now we add a drift to the heat equation, in the Fokker-Planck form
$$ \frac{\partial\rho}{\partial t} = k\,\Delta\rho - \operatorname{div}(\rho\,b) \ \text{ on } [0,T]\times\mathbb{R}^d, \qquad \rho|_{t=0} = \rho_0 $$
and we choose a drift $b$ which concentrates mass around a point:
$$ b(x) = -C_1\,(x - x_0)\,e^{-C_2|x - x_0|^2}. $$

NT=10000; Nx=500; dx=0.1; K=1; T=100
L=dx*Nx; dt=(dx^2)/(4*K)
Nx.virt=Nx+1
u = matrix(nrow=Nx.virt, ncol=NT)
X=seq(0,L,dx)
b = -(X-L/2-20)*exp(-0.01*abs((X-L/2-20)^2))
u[,1]=dnorm(X,L/2,5)
M=max(dnorm(X,L/2,5))
plot(c(0,L),c(-0.01,0.1),type="n")
for (t in 1:(NT-1)) {
for (i in 2:Nx) {
u[i,t+1] = u[i,t] + K * dt * ((u[i+1,t]-2*u[i,t]+u[i-1,t])) / (dx^2) - dt*(b[i+1]*u[i+1,t]-b[i]*u[i,t])/dx
}
u[1,t+1]=u[2,t+1]; u[Nx.virt,t+1]=u[Nx,t+1]
if(t%%T==0) {
polygon(c(0,L,L,0),c(-0.01,-0.01,0.1,0.1),col="white", border=NA)
lines(X,u[,t+1])
abline(h=0); abline(h=M)
}
}

If we want an interpretation, it can be the case of cells which move from their original position and gather around a blood vessel.

4.2. Particle simulation and comparison with the PDE. We first run the previous simulation of the heat equation (black line) together with the simulation of 10000 Brownian particles starting with the same initial density as the heat equation; we represent the profile of the particles by means of the "density" R-command.

NT=10000; Nx=500; dx=0.1; K=1/2; T=100; N=10000
L=dx*Nx; dt=(dx^2)/(4*K); h=sqrt(dt)
Nx.virt=Nx+1
u = matrix(nrow=Nx.virt, ncol=NT); XMB=matrix(nrow=N,ncol=NT)
X=seq(0,L,dx)
u[,1]=dnorm(X,L/2,5)
XMB[,1]=rnorm(N,L/2,5)
M=max(dnorm(X,L/2,5))
plot(c(0,L),c(-0.01,0.1),type="n")
for (t in 1:(NT-1)) {
XMB[,t+1]=XMB[,t]+h*rnorm(N)
for (i in 2:Nx) {
u[i,t+1] = u[i,t] + K * dt * ((u[i+1,t]-2*u[i,t]+u[i-1,t])) / (dx^2)
}
## the source text is cut off at this point; the closing lines below are a
## plausible completion following the pattern of the previous code blocks
u[1,t+1]=u[2,t+1]; u[Nx.virt,t+1]=u[Nx,t+1]
if(t%%T==0) {
polygon(c(0,L,L,0),c(-0.01,-0.01,0.1,0.1),col="white", border=NA)
lines(X,u[,t+1])
lines(density(XMB[,t+1]),col="red")
abline(h=0); abline(h=M)
}
}
