Stochastic Processes - lecture notes -
Matteo Caiaffa
Academic year 2011-2012
Contents
1 Introduction to Stochastic processes
  1.1 Propaedeutic definitions and theorems
  1.2 Some notes on probability theory
  1.3 Conditional probability
2 Gaussian Processes
  2.1 Generality
  2.2 Markov process
    2.2.1 Chapman - Kolmogorov relation
    2.2.2 Reductions of transition probability
  2.3 Wiener process
    2.3.1 Distribution of the Wiener Process
    2.3.2 Wiener processes are Markov processes
    2.3.3 Brownian bridge
    2.3.4 Exercises
  2.4 Poisson process
3 Kolmogorov's theorems
  3.1 Cylinder set
  3.2 Kolmogorov's theorem I
  3.3 Kolmogorov's theorem II
4 Martingales
  4.1 Conditional expectation
  4.2 Filtrations and Martingales
  4.3 Examples
5 Monotone classes
  5.1 The Dynkin lemma
  5.2 Some applications of the Dynkin lemma
6 Some stochastic differential equation
  6.1 Brownian bridge and stochastic differential equation
  6.2 Ornstein - Uhlenbeck equation
7 Integration
  7.1 The Itô integral, a particular case
8 Semigroup of linear operators
Chapter 1
Introduction to Stochastic processes
1.1 Propaedeutic definitions and theorems
Definition 1.1.1. (of probability space). A probability space is a triple (Ω, E, P) where
(i) Ω is any non-empty set, called the sample space,
(ii) E is a σ-algebra of subsets of Ω (whose elements are called events). Clearly we have {∅, Ω} ⊂ E ⊂ P(Ω) (= 2^Ω),
(iii) P is a map, called probability measure on E, associating a number with each element of E. P has the following properties: 0 ≤ P ≤ 1, P(Ω) = 1, and P is countably additive, that is, for any sequence A_i of pairwise disjoint elements of E,
P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
Definition 1.1.2. (of random variable). A random variable is a map X : Ω → R such that X⁻¹(I) ∈ E for every Borel set I ⊆ R (that is, a random variable is an E-measurable map).
Let us consider the family F_X of subsets of Ω defined by F_X = {X⁻¹(I), I ∈ B}, where B is the σ-algebra of the Borel sets of R. It is easy to see that F_X is a σ-algebra, so we can say that X is a random variable if and only if F_X ⊆ E. We can also say that F_X is the least σ-algebra that makes X measurable.
Definition 1.1.2 gives us the relations X : Ω → R and X⁻¹ : B → E. We now introduce a measure (associated with the random variable X) on the Borel sets of R:
µ_X(I) := P(X⁻¹(I)).
Summing up, X carries the probability space (Ω, E, P) to (R, B, µ_X), while X⁻¹ carries Borel sets back into E.
Definition 1.1.3. The measure µ_X is called the probability distribution¹ of X and it is the image measure of P by means of the random variable X.
The measure µ_X is completely and uniquely determined by the values that it assumes on the Borel sets I.² In other words, one can say that µ_X turns out to be uniquely determined by its distribution function³ F_X:
F_X(t) = µ_X((−∞, t]) = P(X ≤ t).    (1.1.1)
Definition 1.1.4. (of random vector). A random vector is a random variable with values in R^d, where d > 1. Formally, a random vector is a map X : Ω → R^d such that X⁻¹(I) ∈ E for every I ∈ B, where B is the smallest σ-algebra containing the open sets of R^d.
The definition just stated provides the relations X : Ω → R^d and X⁻¹ : B → E. Moreover, P corresponds with an image measure µ over the Borel sets of R^d.
Definition 1.1.5. (of product σ-algebra). Consider rectangles I = I_1 × I_2 × . . . × I_d in R^d, where I_k ∈ B. The smallest σ-algebra containing the family of such rectangles is called the product σ-algebra, and it is denoted by B^d = B ⊗ B ⊗ . . . ⊗ B.
Theorem 1.1.6. The σ-algebra generated by the open sets of R^d coincides with the product σ-algebra B^d.
Theorem 1.1.7. X = (X_1, X_2, . . . , X_d) is a random vector if and only if each of its components X_k is a random variable.
Thanks to the previous theorem, given a random vector X = (X_1, X_2, . . . , X_d) and the product σ-algebra B^d, for each component we have the relations X_k : Ω → R and X_k⁻¹ : B → E. To complete this scheme of relations, we introduce a measure associated with P. Consider d one-dimensional measures µ_k and define
ν(I_1 × I_2 × . . . × I_d) := µ_1(I_1) µ_2(I_2) . . . µ_d(I_d).

¹ For the sake of simplicity, we will omit the index X where there is no possibility of misunderstanding.
² Moreover, µ_X is a σ-additive measure over R.
³ In Italian: funzione di ripartizione.
Schematically, X_k carries (Ω, E, P) to (R, B, µ_k), while X_k⁻¹ carries Borel sets back into E.
Theorem 1.1.8. The measure ν can be extended (uniquely) to a measure over B^d. ν is called the product measure and it is denoted by µ_1 ⊗ µ_2 ⊗ . . . ⊗ µ_d.⁴
Definition 1.1.9. If µ ≡ µ_1 ⊗ µ_2 ⊗ . . . ⊗ µ_d, the measures µ_k are said to be independent.
Example 1.1.10. Consider a 2-dim random vector X_{1,2} := (X_1, X_2) and the Borel set J = I × I with I ⊂ R. We obviously have X : Ω → R^d and X_{1,2}⁻¹ : B² → E. Moreover, the map (π_1, π_2) : R^d → R², where π_1, π_2 are the projections of R^d onto R², makes the diagram between Ω, R^d and R² commute: X_{1,2} = (π_1, π_2) ∘ X.
By definition of image measure, we are allowed to write:
µ_{1,2}(J) = P(X_{1,2}⁻¹(J)) = µ(J × R × . . . × R)
and, in particular,
µ_{1,2}(I × R) = µ_1(I),  µ_{2,1}(I × R) = µ_2(I).
In addition to what we have recalled, using the well-known definition of Gaussian measure, the reader can also verify an important proposition:
Proposition 1.1.11. µ is a Gaussian measure.
⁴ In general, µ ≠ µ_1 ⊗ µ_2 ⊗ . . . ⊗ µ_d.
1.2 Some notes on probability theory
We recall here some basic definitions and results which should be known by the reader from previous courses of probability theory.
We have already established (definition 1.1.3) how the probability distribution is defined.
However we can characterize it by means of the following complex-valued function:
ϕ(t) = ∫_R e^{itx} µ(dx) = ∫_R e^{itx} dF = E(e^{itX}),    (1.2.1)
where E(X) = ∫_Ω X dP is the expectation value of X.
Definition 1.2.1. (of characteristic function). ϕ(t) (in equation 1.2.1) is called the characteristic function of µ_X (or the characteristic function of the random variable X).
Theorem 1.2.2. (of Bochner). The characteristic function has the following properties:
(i) ϕ(0) = 1,
(ii) ϕ is continuous in 0,
(iii) ϕ is positive definite, that is, for any integer n, for any choice of the n real numbers t_1, t_2, . . . , t_n and the n complex numbers z_1, z_2, . . . , z_n,
Σ_{i,j=1}^n ϕ(t_i − t_j) z_i z̄_j ≥ 0.
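Property (iii) is easy to illustrate numerically. The sketch below (the specific characteristic function, the points t_i and the tolerance are our own illustrative choices, not from the notes) takes ϕ(t) = e^{−t²/2}, the characteristic function of a standard Gaussian, and checks that the matrix M_ij = ϕ(t_i − t_j) is positive semi-definite:

```python
import numpy as np

# Numerical illustration of Bochner's property (iii) for the
# characteristic function phi(t) = exp(-t^2 / 2) of a standard Gaussian.
def phi(t):
    return np.exp(-t ** 2 / 2)

# arbitrary real points t_1, ..., t_n (an illustrative choice)
t = np.array([-1.3, -0.2, 0.0, 0.7, 2.1])

# M_ij = phi(t_i - t_j); phi is real and even here, so M is symmetric
M = phi(t[:, None] - t[None, :])
eigenvalues = np.linalg.eigvalsh(M)
print(eigenvalues.min())  # non-negative up to round-off
```

Positive semi-definiteness of M is equivalent to the quadratic form Σ ϕ(t_i − t_j) z_i z̄_j being non-negative for every choice of the z_i.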
We are now interested in establishing a fundamental relation existing between a measure and its characteristic function. To achieve this goal let us show that, given a random variable X and a Borel set I ∈ B^d, one can define a measure µ using the following integral:
µ(I) = ∫_I g(x) dx,    (1.2.2)
with dx the ordinary Lebesgue measure in R^d, and
g_{m,A}(x) = (1/√(2π))^d (1/√(det A)) e^{−½⟨A⁻¹(x−m), (x−m)⟩},    (1.2.3)
where m ∈ R^d is the mean vector and A is a d × d covariance matrix, symmetric and positive definite.
Notice that the Fourier transform of the measure is just the characteristic function,
ϕ(t) = ∫_{R^d} e^{i⟨t,x⟩} µ(dx),    (1.2.4)
and in particular, for a Gaussian measure, the characteristic function is given by the following expression:
ϕ_{g_{m,A}}(t) = e^{i⟨m,t⟩ − ½⟨At,t⟩}.    (1.2.5)
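Formula 1.2.5 can be checked numerically in dimension d = 1 by approximating the integral in 1.2.4 with a Riemann sum. The mean, variance, grid and evaluation point below are illustrative choices of ours:

```python
import numpy as np

# Numerical check of phi(t) = exp(i m t - 0.5 A t^2) for a 1-dim
# Gaussian measure with mean m and variance A (illustrative values).
m, A = 0.5, 2.0
x = np.linspace(-30.0, 30.0, 200_001)   # integration grid
dx = x[1] - x[0]
density = np.exp(-(x - m) ** 2 / (2 * A)) / np.sqrt(2 * np.pi * A)

t = 1.2                                  # arbitrary evaluation point
numeric = np.sum(np.exp(1j * t * x) * density) * dx   # Riemann sum for (1.2.4)
closed_form = np.exp(1j * m * t - 0.5 * A * t ** 2)   # formula (1.2.5)

print(abs(numeric - closed_form))        # tiny discretisation error
```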
Some basic inequalities of probability theory.
(i) Chebyshev's inequality: if λ > 0,
P(|X| > λ) ≤ (1/λ^p) E(|X|^p).
(ii) Schwarz's inequality:
E(XY) ≤ √(E(X²) E(Y²)).
(iii) Hölder's inequality:
E(XY) ≤ [E(|X|^p)]^{1/p} [E(|Y|^q)]^{1/q}, where p, q > 1 and 1/p + 1/q = 1.
(iv) Jensen's inequality: if ϕ : R → R is a convex function and the random variables X and ϕ(X) have finite expectation values, then
ϕ(E(X)) ≤ E(ϕ(X)).
In particular, for ϕ(x) = |x|^p with p ≥ 1, we obtain
|E(X)|^p ≤ E(|X|^p).
Different types of convergence for a sequence of random variables X_n, n = 1, 2, 3, . . . .
(i) Almost sure (a.s.) convergence: X_n →^{a.s.} X if
lim_{n→∞} X_n(ω) = X(ω) for every ω ∉ N, where P(N) = 0.
(ii) Convergence in probability: X_n →^{p} X if
lim_{n→∞} P(|X_n − X| > ε) = 0 for every ε > 0.
(iii) Convergence in mean of order p ≥ 1: X_n →^{L^p} X if
lim_{n→∞} E(|X_n − X|^p) = 0.
(iv) Convergence in law: X_n →^{L} X if
lim_{n→∞} F_{X_n}(x) = F_X(x)
for any point x where the distribution function F_X is continuous.
1.3 Conditional probability
Theorem 1.3.1. (of Radon - Nikodym). Let A be a σ-algebra over X and α, ν : A → R two σ-additive real-valued measures. If α is positive and ν is absolutely continuous with respect to α, then there exists a map g ∈ L¹(α), called the density or derivative of ν with respect to α, such that
ν(A) = ∫_A g dα
for any A ∈ A. Moreover, any other density function is equal to g almost everywhere.
Theorem 1.3.2. (of monotone convergence). Let {f_k} be an increasing sequence of measurable non-negative functions, f_k : R^n → [0, +∞]. Then, if f(x) = lim_{k→∞} f_k(x), one has
∫_{R^n} f(x) dµ(x) = lim_{k→∞} ∫_{R^n} f_k(x) dµ(x).
Lemma 1.3.3. Consider the probability space (Ω, E, P). Let F ⊆ E be a σ-algebra and A ∈ E. Then there exists a unique F-measurable function Y : Ω → R such that for every F ∈ F,
P(A ∩ F) = ∫_F Y(ω) P(dω).
Proof F ↦ P(A ∩ F) is a measure absolutely continuous with respect to P; thus, by the Radon - Nikodym theorem, there exists Y such that
P(A ∩ F) = ∫_F Y dP.
Finally,
∫_F Y dP = ∫_F Y′ dP for every F ∈ F  ⇒  ∫_F (Y − Y′) dP = 0,
which implies Y = Y′ almost everywhere with respect to P.
Definition 1.3.4. (of conditional probability). We call conditional probability of A with respect to F, denoted by P(A|F), the only Y such that
P(A ∩ F) = ∫_F Y(ω) P(dω).
Chapter 2
Gaussian Processes
2.1 Generality
Definition 2.1.1. (of stochastic process). A stochastic process is a family of random variables {Xt}t∈T defined on the same probability space (Ω,E , P). The set T is called the parameter set of the process. A stochastic process is said to be a discrete parameter process or a continuous parameter process according to the discrete or continuous nature of its parameter set. If t represents time, one can think of Xt as the state of the process at a given time t.
Definition 2.1.2. (of trajectory). For each element ω ∈ Ω, the mapping t −→ Xt(ω)
is called the trajectory of the process.
Consider a continuous parameter process {X_t}_{t∈[0,T]}. We can take out of this family a number d > 0 of random variables (X_{t_1}, X_{t_2}, . . . , X_{t_d}). For example, for d = 1 we can take a random variable anywhere in the set [0, T ] (this set being continuous) and, from what we know about a random variable and a probability space, we get the maps X_t : Ω → R and X_t⁻¹ : B → E. Moreover, we can associate a measure µ_t with P. Proceeding in this way for all the (infinitely many) X_t, we obtain a family of 1-dim measures {µ_t}_{t∈[0,T]} (the 1-dim marginals).
Example 2.1.3. Let {X_t}_{t∈[0,T]} be a stochastic process defined on the probability space (Ω, E, P), and consider the random vector X_{1,2} = (X_{t_1}, X_{t_2}). We can associate with P a 2-dim measure µ_{t_1,t_2} and, for every choice of the components of the random vector, we construct an (infinite) family of 2-dim measures {µ_{t_1,t_2}}_{t_1,t_2∈[0,T]}.
In general, to each stochastic process corresponds a family M of finite dimensional distributions (marginals):
M = {{µ_t}, {µ_{t_1,t_2}}, {µ_{t_1,t_2,t_3}}, . . . , {µ_{t_1,...,t_n}}, . . .}.    (2.1.1)
A stochastic process is a model for something: it is a description of some random phenomena evolving in time. Such a model does not exist by itself, but has to be built. In general, one starts from M to construct it (we shall see later, with some examples, how to deal with this construction). An important characteristic to notice about the family M is the so-called compatibility property (or consistency).
Definition 2.1.4. (of compatibility property). If {t_{k_1} < . . . < t_{k_m}} ⊂ {t_1 < . . . < t_n}, then µ_{t_{k_1},...,t_{k_m}} is the marginal of µ_{t_1,...,t_n} corresponding to the indices k_1, k_2, . . . , k_m.
Definition 2.1.5. (of Gaussian process). A process characterized by the property that each finite-dimensional measure µ ∈ M has a normal distribution is said to be a Gaussian stochastic process. It is customary to denote the mean value of such a process by m(t) := E(X_t) and the covariance function by ϕ(s, t) := E[(X_t − m(t))(X_s − m(s))], where s, t ∈ [0, T ].
According to the previous definition, we can write the expression of the generic n-dimensional measure as:
µ_{t_1,...,t_n}(I) := ∫_I (1/√(2π))^n (1/√(det A)) e^{−½⟨A⁻¹(x − m_{t_1,...,t_n}), x − m_{t_1,...,t_n}⟩} dx,    (2.1.2)
where x ∈ R^n, I is a Borel set, m_{t_1,...,t_n} = (m(t_1), . . . , m(t_n)) and A = (a_{ij}) with a_{ij} = ϕ(t_i, t_j), ∀ i, j = 1, . . . , n. In equation 2.1.2 the covariance matrix A must be invertible. To avoid this constraint, the characteristic function (which does not depend on A⁻¹) can turn out to be helpful:
ϕ_{t_1,...,t_n}(z) = ∫_{R^n} e^{i⟨z,x⟩} µ_{t_1,...,t_n}(dx) = e^{i⟨m_{t_1,...,t_n}, z⟩ − ½⟨Az,z⟩}.
2.2 Markov process
Definition 2.2.1. (of Markov process). Given a process {X_t} and a family of transition probabilities p(s, x; t, I), {X_t} is said to be a Markov process if the following conditions are satisfied:
(i) (s, x, t) → p(s, x; t, I) is a measurable function,
(ii) I → p(s, x; t, I) is a probability measure,
(iii) p(s, x; t, I) = ∫_R p(s, x; r, dz) p(r, z; t, I) for s < r < t,
(iv) p(s, x; s, I) = δ_x(I),
(v) E(1_I(X_t) | F_s) = p(s, X_s; t, I) = E(1_I(X_t) | σ(X_s)), where F_t = σ{X_u : u ≤ t}.
Property (v) is equivalent to stating: P(X_t ∈ I | X_s = x, X_{s_1} = x_1, . . . , X_{s_n} = x_n) = P(X_t ∈ I | X_s = x). One can observe that the family of finite dimensional measures M does not appear in the definition.¹ Anyway M can be derived from the transition probability.
Setting I = I_1 × I_2 × . . . × I_n and G = I_1 × I_2 × . . . × I_{n−1}:
µ_{t_1,...,t_n}(I)
= P((X_{t_1}, . . . , X_{t_n}) ∈ I)
= P((X_{t_1}, . . . , X_{t_{n−1}}) ∈ G, X_{t_n} ∈ I_n)
= ∫_G P(X_{t_n} ∈ I_n | X_{t_{n−1}} = x_{n−1}, . . . , X_{t_1} = x_1) P(X_{t_1} ∈ dx_1, . . . , X_{t_{n−1}} ∈ dx_{n−1})
= ∫_G p(t_{n−1}, x_{n−1}; t_n, I_n) P(X_{t_1} ∈ dx_1, . . . , X_{t_{n−1}} ∈ dx_{n−1})
= ∫_G p(t_{n−1}, x_{n−1}; t_n, I_n) p(t_{n−2}, x_{n−2}; t_{n−1}, dx_{n−1}) P(X_{t_1} ∈ dx_1, . . . , X_{t_{n−2}} ∈ dx_{n−2}).
By reiterating the same procedure:
µ_{t_1,...,t_n}(I) = ∫_R ∫_G P(X_0 ∈ dx_0) p(0, x_0; t_1, dx_1) p(t_1, x_1; t_2, dx_2) . . . p(t_{n−1}, x_{n−1}; t_n, dx_n),
where P(X_0 ∈ dx_0) is the initial distribution.
2.2.1 Chapman - Kolmogorov relation
Let X_t be a Markov process; in particular, with s_1 < s_2 < · · · < s_n ≤ s < t,
P(X_t ∈ I | X_s = x, X_{s_n} = x_n, . . . , X_{s_1} = x_1) = P(X_t ∈ I | X_s = x).
Denote the probability P(X_t ∈ I | X_s = x) by p(s, x; t, I). Then the equation
p(s, x; t, I) = ∫_R p(s, x; r, dz) p(r, z; t, I),  s < r < t,    (2.2.1)
holds and is called the Chapman - Kolmogorov relation. Under some assumptions, one can write this relation in a different way, giving rise to various families of transition probabilities.
¹ We will see later (first theorem of Kolmogorov) how M can characterise a stochastic process.
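For the Wiener process studied later in this chapter, the transition probability has the Gaussian density p(s, x; t, y) = g_{t−s}(y − x), and relation 2.2.1 becomes a statement about convolving Gaussian kernels. The following numerical sketch (times, points and grid are illustrative choices of ours) checks it by quadrature:

```python
import numpy as np

# Chapman-Kolmogorov check for the Gaussian (Wiener) transition density
# p(s, x; t, y) = g_{t-s}(y - x): integrating over the intermediate
# point z, g_{t1} convolved with g_{t2} reproduces g_{t1+t2}.
def g(t, u):
    return np.exp(-u ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t1, t2, x, y = 0.5, 1.5, -0.3, 1.1       # illustrative times and points
z = np.linspace(-30.0, 30.0, 200_001)    # grid for the z-integral
dz = z[1] - z[0]

lhs = np.sum(g(t1, z - x) * g(t2, y - z)) * dz   # ∫ g_{t1}(z-x) g_{t2}(y-z) dz
rhs = g(t1 + t2, y - x)
print(abs(lhs - rhs))                            # tiny discretisation error
```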
2.2.2 Reductions of transition probability
In general there are two ways to study a Markov process:
(i) a stochastic method: studying the trajectories X_t (e.g. by stochastic differential equations),
(ii) an analytic method: studying directly the family {p(s, x; t, I)} (the transition probabilities).
Let us take the second way.
1st simplification: Homogeneity in time
Consider the particular case in which we can express the dependence on s, t by means of a single function ϕ(τ, x, I) which depends on the difference t − s =: τ. So we have²
{p(s, x; t, I)} → {p(t, x, I)}.    (2.2.2)
It is easy to understand how propositions (i), (ii) and (iv) in the definition of a Markov process change: the map (t, x) → p(t, x, I) is measurable, I → p(t, x, I) is a probability measure and p(0, x, I) = δ_x(I). As far as the Chapman - Kolmogorov relation is concerned, the previous change of variables produces
p(t_1 + t_2, x, I) = ∫_R p(t_1, x, dz) p(t_2, z, I)
or, exchanging the time intervals,
p(t_1 + t_2, x, I) = ∫_R p(t_2, x, dz) p(t_1, z, I).
2nd simplification
This is the particular case of a transition probability that admits a density: p(s, x; t, I) = ∫_I ϕ(y) dy, where ϕ ≥ 0 and ∫_R ϕ(y) dy = 1. Again, as is customary, we replace ϕ with p:
p(s, x; t, I) = ∫_I p(s, x; t, y) dy,
with p(s, x; t, y) ≥ 0 and ∫_R p(s, x; t, y) dy = 1. Under these new hypotheses, the Chapman - Kolmogorov relation becomes:
p(s, x; t, y) = ∫_R p(s, x; r, z) p(r, z; t, y) dz.
² From now on, according to the notation adopted by other authors, the reduced forms of the transition probability will be written with the same letters identifying the variables of the original expression. However here (in the right-hand side of 2.2.2) t refers to the difference t − s and p is ϕ.
3rd simplification
Starting from the assumption of homogeneity in time, there is another reduction:
p(t, x, I) = ∫_I p(t, x, y) dy.
A straightforward computation yields the new form of the Chapman - Kolmogorov relation,
p(t_1 + t_2, x, y) = ∫_R p(t_1, x, z) p(t_2, z, y) dz,
or the equivalent expression exchanging t_1 with t_2.
4th simplification: Homogeneity in space
This simplification allows us to write the transition probability as a function defined on R⁺ × R:
p(t, x, y) = p(t, y − x),
where the single space argument actually represents the difference y − x. Proposition (iii) becomes
p(t_1 + t_2, x) = ∫_R p(t_1, y) p(t_2, x − y) dy = ∫_R p(t_2, y) p(t_1, x − y) dy,
and p(t_1 + t_2, ·) = p(t_1, ·) ⋆ p(t_2, ·) is just a convolution. All these simplifications are summarised in the following diagram.
{p(s, x; t, I)} ──→ {p(t, x, I)}
       │                 │
       ↓                 ↓
{p(s, x; t, y)} ──→ {p(t, x, y)} ──→ {p(t, x)}
2.3 Wiener process
In 1827 Robert Brown (botanist, 1773 - 1858) observed the erratic and continuous motion of plant spores suspended in water. Later, in the 1920s, Norbert Wiener (mathematician, 1894 - 1964) proposed a mathematical model describing this motion, the Wiener process (also called Brownian motion).
Definition 2.3.1. (of Wiener process). A Gaussian, continuous parameter process characterized by mean value m(t) = 0 and covariance function ϕ(s, t) := min{s, t} = s ∧ t, for any s, t ∈ [0, T ], is called a Wiener process (denoted by {W_t}).
We will often talk about a Wiener process following the characterisation just stated, but it is worth noticing that the next definition provides a more intuitive description of the fundamental properties of a process of this kind.
Definition 2.3.2. A stochastic process {W_t}_{t≥0} is called a Wiener process if it satisfies the properties listed below:
(i) W_0 = 0,
(ii) {W_t} has, almost surely, continuous trajectories,³
(iii) ∀ s, t ∈ [0, T ] with 0 ≤ s < t, the increments W_t − W_s are stationary,⁴ independent random variables and W_t − W_s ∼ N(0, t − s).
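Definition 2.3.2 translates directly into a simulation scheme: sample independent N(0, Δt) increments and take cumulative sums. A minimal sketch (horizon, step count, number of paths and the RNG seed are our own illustrative choices):

```python
import numpy as np

# Minimal simulation of Wiener trajectories via Definition 2.3.2:
# W_0 = 0 and independent increments W_t - W_s ~ N(0, t - s).
rng = np.random.default_rng(0)

T, N, n_paths = 1.0, 1000, 5
dt = T / N
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, N))
W = np.concatenate([np.zeros((n_paths, 1)),
                    np.cumsum(increments, axis=1)], axis=1)

# W[:, k] approximates W at time k * dt; every trajectory starts at 0.
print(W.shape, W[:, 0])
```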
Theorem 2.3.3. Definitions 2.3.1 and 2.3.2 are equivalent.
Proof
(2.3.1) ⇒ (2.3.2)
The independence of increments is a plain consequence of the Gaussian structure of W_t. As a matter of fact we can express W_s and W_t − W_s as a linear transformation of Gaussian random variables,
( W_s       )   (  1  0 ) ( W_s )
( W_t − W_s ) = ( −1  1 ) ( W_t );
moreover, increments over disjoint intervals are uncorrelated,⁵ since for t_{i−1} < t_i ≤ t_{j−1} < t_j
a_{ij} = a_{ji} = E[(W_{t_i} − W_{t_{i−1}})(W_{t_j} − W_{t_{j−1}})] = t_i − t_i − t_{i−1} + t_{i−1} = 0,
thus independent. Now we can write a Wiener process as a telescoping sum,
W_t = (W_{t/N} − W_0) + (W_{2t/N} − W_{t/N}) + . . . + (W_{Nt/N} − W_{(N−1)t/N}),
where each increment satisfies
W_{kt/N} − W_{(k−1)t/N} ∼ N(0, t/N).
³ This statement is equivalent to saying that the mapping t → W_t is continuous in t with probability 1.
⁴ The term stationary increments means that the distribution of the increment W_t − W_s depends only on the difference t − s, i.e. it is the same as that of W_{t−s} − W_0, ∀ 0 ≤ s < t < ∞.
⁵ Observe that under the assumption s < t, the relations E(W_t) = m(t) = 0, Var(W_t) = min{t, t} = t, E(W_t − W_s) = 0 and E[(W_t − W_s)²] = t − 2s + s = t − s are valid.
(2.3.2) ⇒ (2.3.1)
Conversely, the 'shape' of the increment distribution implies that a Wiener process is a Gaussian process. Sure enough, for any 0 < t_1 < t_2 < . . . < t_n, the probability distribution of the random vector (W_{t_1}, W_{t_2}, . . . , W_{t_n}) is normal, because it is a linear combination of the vector (W_{t_1}, W_{t_2} − W_{t_1}, . . . , W_{t_n} − W_{t_{n−1}}), whose components are Gaussian by definition.
The mean value and the autocovariance function are:
E(W_t) = 0,
E(W_t W_s) = E[W_s(W_t − W_s + W_s)] = E[W_s(W_t − W_s)] + E(W_s²) = s = s ∧ t when s ≤ t.
Remark 2.3.4. Referring to definition 2.3.2, notice that the a priori existence of a Wiener process is not guaranteed, since the mutual consistency of properties (i)-(iii) has first to be shown. In particular, it must be proved that (iii) does not contradict (ii). However, there are several ways to prove the existence of a Wiener process.⁶ For example one can invoke the Kolmogorov theorem that will be stated later or, more simply, one can show the existence by construction.
Figure 2.3.1: Simulation of the Wiener process.
⁶ Wiener himself proved the existence of the process in 1923.
2.3.1 Distribution of the Wiener Process
Theorem 2.3.5. The finite dimensional distributions of the Wiener process are given by
µ_{t_1,...,t_n}(B) = ∫_B g_{t_1}(x_1) g_{t_2−t_1}(x_2 − x_1) . . . g_{t_n−t_{n−1}}(x_n − x_{n−1}) dx_1 . . . dx_n.
Proof Given a Wiener process W_t, so that m(t) = 0 and ϕ(s, t) = min(s, t), let us compute µ_{t_1,...,t_n}(B). We already know that
µ_{t_1,...,t_n}(B) := ∫_B (1/√(2π))^n (1/√(det A)) e^{−½⟨A⁻¹x, x⟩} dx,
where the covariance matrix is
A =
( t_1  t_1  t_1  · · ·  t_1 )
( t_1  t_2  t_2  · · ·  t_2 )
( t_1  t_2  t_3  · · ·  t_3 )
(  ⋮    ⋮    ⋮          ⋮  )
( t_1  t_2  t_3  · · ·  t_n ).
Performing some operations one gets:
det(A) = t_1(t_2 − t_1) · · · (t_n − t_{n−1}).
We can now compute the scalar product ⟨A⁻¹x, x⟩. Take y := A⁻¹x; then x = Ay and ⟨A⁻¹x, x⟩ = ⟨y, x⟩, that is
x_1 = t_1(y_1 + · · · + y_n),
x_2 = t_1 y_1 + t_2(y_2 + · · · + y_n),
⋮
x_n = t_1 y_1 + t_2 y_2 + · · · + t_n y_n.
Manipulating the system one gains the following expression for the scalar product:
⟨A⁻¹x, x⟩ = x_1²/t_1 + (x_2 − x_1)²/(t_2 − t_1) + · · · + (x_n − x_{n−1})²/(t_n − t_{n−1});
then, introducing the function g_t(α) = (1/√(2πt)) e^{−α²/(2t)}, we finally obtain
µ_{t_1,...,t_n}(B) = ∫_B g_{t_1}(x_1) g_{t_2−t_1}(x_2 − x_1) · · · g_{t_n−t_{n−1}}(x_n − x_{n−1}) dx_1 · · · dx_n.
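The determinant formula obtained in the proof is easy to check numerically for a concrete choice of times (the times below are an illustrative choice of ours):

```python
import numpy as np

# Check det(A) = t_1 (t_2 - t_1) ... (t_n - t_{n-1}) for the Wiener
# covariance matrix A_ij = min(t_i, t_j).
t = np.array([0.3, 0.7, 1.2, 2.0, 2.5])       # illustrative increasing times
A = np.minimum(t[:, None], t[None, :])        # A_ij = min(t_i, t_j)

product = t[0] * np.prod(np.diff(t))          # t_1 * (t_2-t_1) * ... * (t_n-t_{n-1})
print(np.linalg.det(A), product)              # both equal 0.024 here
```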
2.3.2 Wiener processes are Markov processes
Consider a Wiener process (m(t) = 0 and ϕ(s, t) = min(s, t)).
Theorem 2.3.6. A Wiener process satisfies the Markov condition
P(W_t ∈ I | W_s = x, W_{s_1} = x_1, . . . , W_{s_n} = x_n) = P(W_t ∈ I | W_s = x).    (2.3.1)
Proof. Compute first the right-hand side of 2.3.1. By definition
P(W_t ∈ I, W_s ∈ B) = ∫_B P(W_t ∈ I | W_s = x) P(W_s ∈ dx);    (2.3.2)
on the other hand,
P(W_t ∈ I, W_s ∈ B) = ∫_{B×I} g_s(x_1) g_{t−s}(x_2 − x_1) dx_1 dx_2 = ∫_B (∫_I g_{t−s}(x_2 − x_1) dx_2) g_s(x_1) dx_1.    (2.3.3)
So, comparing 2.3.2 and 2.3.3 and noticing that P(W_s ∈ dx) = g_s(x) dx, we get
P(W_t ∈ I | W_s = x) = ∫_I g_{t−s}(y − x) dy a.e.    (2.3.4)
Computing the left-hand side of 2.3.1, we actually obtain the same result as 2.3.4. Consider the (n+1)-dim random vector Y = (W_{s_1}, . . . , W_{s_n}, W_s) and perform the computation for a Borel set B = J_1 × . . . × J_n × J. As before,
∫_B P(W_t ∈ I | Y = y) P(Y ∈ dy) = P(W_t ∈ I, Y ∈ B) = P(W_t ∈ I, W_{s_1} ∈ J_1, . . . , W_{s_n} ∈ J_n, W_s ∈ J),
where P(W_t ∈ I, W_{s_1} ∈ J_1, . . . , W_{s_n} ∈ J_n, W_s ∈ J) can be expressed as
∫_{B×I} g_{s_1}(y_1) g_{s_2−s_1}(y_2 − y_1) · · · g_{s−s_n}(y_{n+1} − y_n) g_{t−s}(y − y_{n+1}) dy_1 . . . dy_{n+1} dy.
Since P(Y ∈ dy_1 · · · dy_{n+1}) = g_{s_1}(y_1) · · · g_{s−s_n}(y_{n+1} − y_n) dy_1 · · · dy_{n+1},
∫_B P(W_t ∈ I | Y = y) P(Y ∈ dy) = ∫_B (∫_I g_{t−s}(y − y_{n+1}) dy) P(Y ∈ dy),
which implies
P(W_t ∈ I | Y = y) = ∫_I g_{t−s}(y − y_{n+1}) dy a.e.
and, since in our case y_{n+1} = x, the theorem is proved.
2.3.3 Brownian bridge
As has already been said, one can think of a stochastic process as a model built from a collection of finite dimensional measures. For example, suppose we have a family M of Gaussian measures. We know that the 1-dim measure µ_t(I) is given by the following integral:
µ_t(I) = ∫_I (1/√(2πσ²(t))) e^{−[x−m(t)]²/(2σ²(t))} dx.    (2.3.5)
Now, for every choice of m(t) and σ²(t) we obtain an arbitrary 1-dim measure.⁷ We can also construct the 2-dim measure, but in that case, instead of σ²(t), we will have a 2 × 2 covariance matrix that must be symmetric and positive definite for the integral to be computed. In addition, the 2-dim measure obtained in this way must satisfy the compatibility property (definition 2.1.4).
Example 2.3.7. Consider t ∈ [0, 1] and a Borel set I. Try to construct a process under the hypotheses m ≡ 0 and a(s, t) = s ∧ t − st (a_{11} = t_1 − t_1², a_{12} = t_1 − t_1 t_2, . . .).
It can be proved that the covariance matrix A = (a_{ij}) is symmetric and positive definite, thus equation (2.3.5) makes sense.
In conclusion we now have an idea of how to build a process given a family of finite dimensional measures M. On the other hand, it is interesting and fruitful to turn the previous procedure around, asking ourselves whether a process corresponding with a given M does exist. We will introduce chapter 3 keeping this question in mind.
Example 2.3.7 leads us to the following definition.
Definition 2.3.8. (of Brownian bridge). A Gaussian, continuous parameter process {X_t}_{t∈[0,1]} with continuous trajectories, mean value m(t) = 0 and covariance function a(s, t) = s ∧ t − st, for all 0 ≤ s ≤ t ≤ 1 (so that X_1 = 0 almost surely), is called a Brownian bridge.
⁷ For example, for suitable m(t) and a(s, t) we obtain just the Wiener process.
There is an easy way to construct a Brownian bridge from a Wiener process:
X(t) = W(t) − tW(1), ∀ 0 ≤ t ≤ 1.
Observe also that, given a Wiener process, we do not have to worry about the existence of the Brownian bridge. It is interesting to check that the following process represents a Brownian bridge too:
Y(t) = (1 − t) W(t/(1 − t)) for 0 ≤ t < 1, with Y(1) = 0.
Figure 2.3.2: Simulation of the Brownian bridge.
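The covariance of the construction X(t) = W(t) − tW(1) can be checked by Monte Carlo. The sketch below (sample size, seed and the pair s = 0.3, t = 0.7 are illustrative choices of ours) builds W at the three needed times from independent increments and compares the empirical covariance with s ∧ t − st:

```python
import numpy as np

# Monte Carlo check that X(t) = W(t) - t W(1) has covariance s ∧ t - st.
# Only W at s = 0.3, t = 0.7 and 1 are needed, built from independent
# N(0, Δt) increments.
rng = np.random.default_rng(42)
n = 400_000
W03 = rng.normal(0.0, np.sqrt(0.3), n)            # W(0.3)
W07 = W03 + rng.normal(0.0, np.sqrt(0.4), n)      # W(0.7)
W1 = W07 + rng.normal(0.0, np.sqrt(0.3), n)       # W(1)

X03 = W03 - 0.3 * W1
X07 = W07 - 0.7 * W1
# both means are 0, so the raw mixed moment estimates the covariance
empirical = np.mean(X03 * X07)
print(empirical, 0.3 - 0.3 * 0.7)                 # theory: s ∧ t - st = 0.09
```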
2.3.4 Exercises
Example 2.3.9. Consider a Wiener process {W_t}, the vector space L²_loc(R⁺) and a function f : R⁺ → R locally of bounded variation (BV from now on) and locally in L². For any trajectory define
X = ∫_0^T f(t) dW_t,  X(ω) = ∫_0^T f(t) dW_t(ω).    (2.3.6)
Observe that we are integrating over time, hence we can formally treat the ω appearing in the integrand as a parameter. Integration by parts yields
X(ω) = ∫_0^T f(t) dW_t = f(T)W_T − f(0)W_0 − ∫_0^T W_t df(t).
You may now wonder how X is distributed. To this purpose fix T, choose any partition π = {0, t_1, t_2, . . . , t_n} of [0, T ] and search for the distribution of the sum
X_π := Σ_{i=0}^{n−1} f(t_i)(W_{t_{i+1}} − W_{t_i}),    (2.3.7)
whose limit is just the first equation in 2.3.6. Look now at the properties of X_π. First of all notice that it is a linear combination of Gaussian random variables. Then, because of the properties of a Wiener process,⁸ the variance of X_π (the sum of the variances of the terms) is Σ_{i=0}^{n−1} f(t_i)²(t_{i+1} − t_i), thus
X_π ∼ N(0, Σ_{i=0}^{n−1} f(t_i)²(t_{i+1} − t_i))    (2.3.8)
and, for n → ∞,
Σ_{i=0}^{n−1} f(t_i)²(t_{i+1} − t_i) → ∫_0^T f(t)² dt = ‖f‖²_{L²(0,T)}.
By construction we have, as the mesh |π| → 0, that X_π(ω) → X(ω). The same result can be achieved independently by calling into account the characteristic function of X_π:
ϕ_{X_π}(λ) = E(e^{iλX_π}) = e^{−½λ² Σ_{i=0}^{n−1} f(t_i)²(t_{i+1}−t_i)} → e^{−½λ² ∫_0^T f(t)² dt} as |π| → 0.
Indeed, e^{iλX_π(ω)} is a sequence of functions converging pointwise to e^{iλX(ω)}; observing that |e^{iλX_π}| = 1 and calling to action the Lebesgue dominated convergence theorem, as |π| → 0 one has
ϕ_{X_π}(λ) → ϕ_X(λ) = E(e^{iλX}) = e^{−½λ² ∫_0^T f(t)² dt}.
The integral X = ∫_0^T f(t) dW_t has been computed; for a more general expression notice that if the upper bound is variable, one gets the process
X_t := ∫_0^t f(r) dW_r.    (2.3.9)
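The distribution 2.3.8 can be illustrated by simulating X_π directly. In the sketch below, f(t) = t on [0, 1] is our own illustrative choice (with an arbitrary seed), so the limiting variance is ∫_0^1 t² dt = 1/3:

```python
import numpy as np

# Simulate X_pi = sum_i f(t_i)(W_{t_{i+1}} - W_{t_i}) for f(t) = t on [0, 1].
# X_pi ~ N(0, sum_i f(t_i)^2 dt), and that variance tends to 1/3 as the
# partition is refined.
rng = np.random.default_rng(1)
T, N, n_samples = 1.0, 200, 20_000
t = np.linspace(0.0, T, N + 1)
dt = T / N
f = t[:-1]                                   # f evaluated at the left endpoints

dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, N))   # Wiener increments
X_pi = dW @ f                                # one sample of X_pi per row

riemann = np.sum(f ** 2) * dt                # sum_i f(t_i)^2 (t_{i+1} - t_i)
print(np.mean(X_pi), np.var(X_pi), riemann)  # mean ~ 0, both variances ~ 1/3
```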
Example 2.3.10. Given a Wiener process, a Borel set I and the rectangle G = (x − δ, x + δ) × I, compute the marginal µ_{s,t}(G) ∈ M.
⁸ Independent increments distributed as W_t − W_s ∼ N(0, t − s).
Due to the features of the Wiener process one has
µ_{s,t}(G) = P((W_s, W_t) ∈ G) = ∫_G g(z) dz = ∫_G (1/(2π√(det A))) e^{−½⟨A⁻¹z, z⟩} dz,    (2.3.10)
where
A = ( s  s )       A⁻¹ = (1/det A) (  t  −s )
    ( s  t ),                      ( −s   s ).
Writing z = (z_1, z_2),
⟨A⁻¹z, z⟩ = (t z_1² − 2s z_1 z_2 + s z_2²)/(st − s²),
so
∫_G (1/(2π√(det A))) e^{−½⟨A⁻¹z, z⟩} dz = (1/(2π√(st − s²))) ∫_G e^{−(t z_1² − 2s z_1 z_2 + s z_2²)/(2(st − s²))} dz_1 dz_2
= (1/(2π√(st − s²))) ∫_G e^{−z_1²/(2s)} e^{−(z_2 − z_1)²/(2(t − s))} dz_1 dz_2.
At this point, bringing to mind the transition function g_t(ξ) = (1/√(2πt)) e^{−ξ²/(2t)}, we have in this case:
P((W_s, W_t) ∈ G) = ∫_{G = (x−δ, x+δ) × I} g_s(z_1) g_{t−s}(z_2 − z_1) dz_1 dz_2.    (2.3.11)
Example 2.3.11. Let {W_t} be a Wiener process, I a Borel set and s, t ∈ [0, T ] with s < t. Compute the probability P(W_t ∈ I | W_s = x).
By definition of conditional probability we know that
P(W_t ∈ I | W_s = x) = P(W_t ∈ I, W_s = x) / P(W_s = x).    (2.3.12)
When the denominator of the right-hand side is zero (as in this case), the numerator is zero too⁹ and formula 2.3.12 is not useful (0/0). Try then to compute the probability
P(W_t ∈ I | W_s ∈ (x − δ, x + δ)) = P(W_t ∈ I, W_s ∈ (x − δ, x + δ)) / P(W_s ∈ (x − δ, x + δ)),    (2.3.13)
where P(W_s ∈ (x − δ, x + δ)) > 0.
⁹ Consider two events A, B and the expression P(A ∩ B)/P(B). If P(B) = 0, then, since A ∩ B ⊂ B, the numerator is zero as well.
Using the result of exercise 2.3.10 we have
P(W_t ∈ I, W_s ∈ (x − δ, x + δ)) / P(W_s ∈ (x − δ, x + δ))
= [∫_I (∫_{x−δ}^{x+δ} g_s(z_1) g_{t−s}(z_2 − z_1) dz_1) dz_2] / [∫_R (∫_{x−δ}^{x+δ} g_s(z_1) g_{t−s}(z_2 − z_1) dz_1) dz_2]
= [∫_{x−δ}^{x+δ} g_s(z_1) (∫_I g_{t−s}(z_2 − z_1) dz_2) dz_1] / [∫_{x−δ}^{x+δ} g_s(z_1) (∫_R g_{t−s}(z_2 − z_1) dz_2) dz_1]
= [(1/2δ) ∫_{x−δ}^{x+δ} g_s(z_1) (∫_I g_{t−s}(z_2 − z_1) dz_2) dz_1] / [(1/2δ) ∫_{x−δ}^{x+δ} g_s(z_1) dz_1].
The last expression has been obtained by observing that ∫_R g_{t−s}(z_2 − z_1) dz_2 = 1 (g_{t−s} is a probability density) and then dividing numerator and denominator by 2δ. If we took the limit δ → 0 directly, we would obtain 0/0 again, but after applying De L'Hôpital's rule we have
. . . → [g_s(x) ∫_I g_{t−s}(z_2 − x) dz_2] / g_s(x) = ∫_I g_{t−s}(z_2 − x) dz_2
and, finally, we get the transition probability
P(W_t ∈ I | W_s = x) = P(W_t − W_s ∈ I − x).    (2.3.14)
2.4 Poisson process
A random variable T : Ω → (0, ∞) has exponential distribution of parameter λ > 0 if, for any t ≥ 0,
P(T > t) = e^{−λt}.
In that case T has a density function
f_T(t) = λ e^{−λt} 1_{(0,∞)}(t)
and the mean value and the variance of T are given by
E(T) = 1/λ,  Var(T) = 1/λ².
Proposition 2.4.1. A random variable T : Ω → (0, ∞) has an exponential distribution if and only if it has the following memoryless property:
P(T > s + t | T > s) = P(T > t) for any s, t ≥ 0.
Definition 2.4.2. (of Poisson process.) A stochastic process {N_t, t ≥ 0} defined on a probability space (Ω, E, P) is said to be a Poisson process of rate λ if it verifies the following properties:
(i) N_0 = 0,
(ii) for any n ≥ 1 and for any 0 ≤ t_1 ≤ . . . ≤ t_n, the increments N_{t_n} − N_{t_{n−1}}, . . . , N_{t_2} − N_{t_1} are independent random variables,
(iii) for any 0 ≤ s < t, the increment N_t − N_s has a Poisson distribution with parameter λ(t − s), that is,
P(N_t − N_s = k) = e^{−λ(t−s)} [λ(t − s)]^k / k!,
where k = 0, 1, 2, . . . and λ > 0 is a fixed constant.
Definition 2.4.3. (of compound Poisson process.) Let N = {N_t, t ≥ 0} be a Poisson process with rate λ. Consider a sequence {Y_n, n ≥ 1} of independent and identically distributed random variables, which are also independent of the process N. Set S_n = Y_1 + . . . + Y_n. Then the process
Z_t = Y_1 + . . . + Y_{N_t} = S_{N_t},
with Z_t = 0 if N_t = 0, is called a compound Poisson process.
Proposition 2.4.4. The compound Poisson process has independent increments and the law of an increment Z_t − Z_s has characteristic function
e^{(ϕ_{Y_1}(x) − 1) λ(t−s)},
where ϕ_{Y_1}(x) denotes the characteristic function of Y_1.
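For standard normal jumps, differentiating the characteristic function of Proposition 2.4.4 gives E(Z_t) = λt E(Y_1) = 0 and Var(Z_t) = λt E(Y_1²) = λt. A Monte Carlo sketch (rate, horizon, jump law and seed are illustrative choices of ours; rather than summing the jumps one by one, we use the fact that, given N_t = n, the sum of n standard normal jumps has law N(0, n)):

```python
import numpy as np

# Monte Carlo sketch of the compound Poisson process Z_t = Y_1 + ... + Y_{N_t}
# with standard normal jumps.  Known moments: E(Z_t) = 0, Var(Z_t) = lam * t.
rng = np.random.default_rng(7)
lam, t, n_paths = 3.0, 2.0, 200_000

N_t = rng.poisson(lam * t, size=n_paths)   # Poisson number of jumps per path
# given N_t = n, the sum of n standard normal jumps has law N(0, n)
Z_t = rng.normal(0.0, np.sqrt(N_t))

print(Z_t.mean(), Z_t.var())               # close to 0 and lam * t = 6
```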
Chapter 3
Kolmogorov’s theorems
3.1 Cylinder set
Consider a probability space (Ω, E, P) and a family of random variables {X_t}_{t∈[0,T]}. Every ω ∈ Ω corresponds with a trajectory t → X_t(ω), ∀ t ∈ [0, T ]. We want to characterize the mapping X defined from Ω into some space of functions, where X(ω) is a function of t: X(ω)(t) := X_t(ω), ∀ ω ∈ Ω. If we have no information about the trajectories, we can assume some hypotheses in order to define the codomain of the mapping X. Suppose, for example, we deal only with constant or continuous trajectories. This means immersing the set of trajectories into the larger set of constant or continuous functions, respectively.
In general, if the regularity of the trajectories is also unknown, we can simply consider the very wide space R^{[0,T]} of real functions, so that X : Ω → R^{[0,T]}. We now want to find the domain of the mapping X⁻¹, that is, a σ-algebra that makes X measurable with respect to the σ-algebra itself. To this end, let us introduce the cylinder set
C_{t_1,t_2,...,t_n,B} = {f ∈ R^{[0,T]} : (f(t_1), f(t_2), . . . , f(t_n)) ∈ B},
where B ∈ B^n (B is called the base of the cylinder) and C ⊆ R^{[0,T]}. Then consider the set C of all the cylinders C:
C = {C_{t_1,t_2,...,t_n,B}, ∀ n ≥ 1, ∀ (t_1, t_2, . . . , t_n), ∀ B ∈ B^n}.
Actually, C is an algebra but not a σ-algebra. Anyway we know that there exists the smallest σ-algebra containing C; we denote it by σ(C). Now we can show that X⁻¹ maps σ(C) into E.
Theorem 3.1.1. The preimage of σ(C) lies in E.
Proof Consider the cylinder set C_{t_1,t_2,...,t_n,B} ∈ C ⊆ σ(C). Then
X⁻¹(C_{t_1,t_2,...,t_n,B}) = {ω ∈ Ω : X(ω) ∈ C_{t_1,t_2,...,t_n,B}}
= {ω ∈ Ω : (X(ω)(t_1), X(ω)(t_2), . . . , X(ω)(t_n)) ∈ B}
= {ω ∈ Ω : (X_{t_1}(ω), X_{t_2}(ω), . . . , X_{t_n}(ω)) ∈ B} ∈ E.    (3.1.1)
Now consider the σ-algebra F = {E ⊆ R^{[0,T]} : X⁻¹(E) ∈ E} and, since C ⊂ F implies σ(C) ⊂ F, the theorem is proved.
To fulfil the relations between the abstract probability space and the objects which we actually deal with, notice that the map X is measurable as a function (Ω, E) → (R^{[0,T]}, σ(C)), so we can consider the image measure of P defined as usual:
µ(E) = P(X⁻¹(E)), ∀ E ∈ σ(C).
If we now consider two processes X, X′ (possibly defined on different probability spaces) with the same family M, that is,
P((X_{t_1}, X_{t_2}, . . . , X_{t_n}) ∈ E) = P((X′_{t_1}, X′_{t_2}, . . . , X′_{t_n}) ∈ E),
then also µ_X = µ_{X′}. In other words, µ_X is determined by M only.
3.2 Kolmogorov’s theorem I
Theorem 3.2.1. (of Kolmogorov, I). Given a family of probability measures M = {{µ_t}, {µ_{t_1,t_2}}, . . .} with the compatibility property 2.1.4, there exists a unique measure µ on (R^{[0,T]}, σ(C)) such that µ(C_{t_1,t_2,...,t_n,B}) = µ_{t_1,t_2,...,t_n}(B), and we can define, on the space (R^{[0,T]}, σ(C), µ), the stochastic process X_t(ω) = ω(t) (with ω ∈ R^{[0,T]}) which has the family M as its finite dimensional distributions.
Although this theorem assures the existence of a process, it is not a constructive theorem, so it does not provide enough information to find a process given a family of probability measures. For our purposes it is more useful to take into account another theorem, which we will call the second theorem of Kolmogorov.¹ From now on, if not otherwise specified, we will always refer to the second one.
¹ Note that some authors call it Kolmogorov's criterion or the Kolmogorov continuity criterion.
3.3 Kolmogorov's theorem II
Before stating the theorem, let us recall some notions.
Definition 3.3.1. (of equivalent processes). A stochastic process {X_t}_{t∈[0,T]} is equivalent to another stochastic process {Y_t}_{t∈[0,T]} if
P(X_t ≠ Y_t) = 0 ∀ t ∈ [0, T ].
It is also said that {X_t}_{t∈[0,T]} is a version of {Y_t}_{t∈[0,T]}. However, equivalent processes may have different trajectories.
Definition 3.3.2. (of continuity in probability). A real-valued stochastic process {X_t}_{t∈[0,T]}, where [0, T ] ⊂ R, is said to be continuous in probability if, for any ε > 0 and every t ∈ [0, T ],
lim_{s→t} P(|X_t − X_s| > ε) = 0.    (3.3.1)
Definition 3.3.3. Given a sequence of events A_1, A_2, . . ., one defines lim sup A_k as the event
lim sup A_k = ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k.
Lemma 3.3.4. (of Borel - Cantelli). Let A_1, A_2, . . . be a sequence of events such that
Σ_{k=1}^∞ P(A_k) < ∞;
then the event A = lim sup A_k has null probability, P(A) = 0. Of course P(A^c) = 1, where A^c = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k^c.
Proof We have A ⊂ ⋃_{k=n}^∞ A_k for every n. Then
P(A) ≤ Σ_{k=n}^∞ P(A_k),
which, by hypothesis, tends to zero when n tends to infinity.
Definition 3.3.5. (of Hölder continuity).² A function f : R^n → K (K = R or C) satisfies the Hölder condition, or is said to be Hölder continuous, if there exist non-negative real constants C and α such that
|f(x) − f(y)| ≤ C|x − y|^α.
² More generally, this condition can be formulated for functions between any two metric spaces.