Master’s Degree programme
in Economics and Finance
Final thesis
Stochastic filtering:
a functional and probabilistic approach
Supervisor
Ch. Prof. Antonella Basso
Graduand
Gianmarco Vidotto
Matriculation Number 842271
Contents
1 Probability and SDE theory
  1.1 Stochastic Process
  1.2 Martingale spaces: $\mathcal{M}^2$ and $\mathcal{M}^2_c$
  1.3 Usual hypotheses
  1.4 Stopping times
  1.5 Brownian motion
  1.6 Itô Integral
  1.7 Itô process
  1.8 SDE: Existence and Uniqueness

2 Stochastic filtering
  2.1 The problem
  2.2 The 1-d linear case
    2.2.1 Step 1
    2.2.2 Step 2
    2.2.3 Step 3
    2.2.4 Step 4
    2.2.5 Step 5
  2.3 The Kalman-Bucy Filter
  2.4 Some Applications
    2.4.1 Noisy observation of a constant
    2.4.2 Discretization problem: a solution

3 Non-linear case
  3.1 Some results about change of measure
  3.2 The main idea using a simple example
  3.3 Non-linear filtering
  3.4 Numerical method
Introduction
In this thesis we study the so-called "stochastic filtering" problem. We will work in the continuous-time setting and we will try to be mathematically rigorous.
In our context, filtering means using a set of "dirty" observations (dirty means "affected by noise") to gather information about an unobservable process. Basically, we have to filter these observations and try to separate the "real" component (which contains useful information on the generating process) from the noise.
Nowadays, filtering theory is extensively used in a broad range of applications, from the aerospace industry to GPS tracking.
The structure is as follows.
In the first chapter we will try to give a sound and rigorous background on stochastic calculus and on the theory of Stochastic Differential Equations (SDEs). We will state the classical theorems and we will provide the proofs of the most interesting (and maybe less known) results. We will try to collect here all the probability one needs to know to read, in a profitable way, the remaining chapters.
In the second chapter we will study the filtering problem in a linear setting using a functional approach. We will prove and comment on all the steps towards the derivation of the famous Kalman-Bucy filter. At the end, there will be two simple applications (with MATLAB) in order to show the main drawbacks of the continuous-time setting, and we will give a little taste of the discrete-time version of the Kalman filter.
Finally, in the last chapter, we will study the non-linear case of the filtering problem using a probabilistic approach.
Chapter 1
Probability and SDE theory
In this chapter I introduce some useful results on probability and measure theory that I will use later on, and I summarize the most important results about the theory of Stochastic Differential Equations (SDEs). This chapter is deliberately dense, since I want to prepare the background and introduce all the notation.
1.1 Stochastic Process
A probability space, denoted as $(\Omega, \mathcal{F}, P)$, is a triple where $\Omega$ is a non-empty set, $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ and $P$ is a probability measure. A set $A \subset \Omega$ is said to be measurable (w.r.t. $\mathcal{F}$) if $A \in \mathcal{F}$. A measurable set $B$ is said to be negligible if $P(B) = 0$. It is common, in probability theory, to call the elements of $\mathcal{F}$ events. Using this terminology, we say that an event $A$ happens almost surely (a.s.) if $P(A) = 1$. Finally, $P$ is said to be complete if every subset of a negligible event is measurable. The assumption of completeness is not restrictive, as we can always complete a measure space.
Let $I$ be a real interval of the form $[0,T]$ or $[0,+\infty[$. A stochastic process on $\mathbb{R}^N$ is a collection of random variables $(X_t)_{t \in I}$ with values in $\mathbb{R}^N$, given by the map
$$X : I \times \Omega \to \mathbb{R}^N, \qquad X(t,\omega) = X_t(\omega) \qquad (1.1)$$
with $X_t \in L^1(\Omega, P)$ for every $t \in I$.
It is useful to highlight that, for every fixed $\omega \in \Omega$, we obtain a so-called trajectory (a function of time $t$).
A stochastic process is said to be continuous (a.s.) if its trajectories are continuous functions for all $\omega \in \Omega$ (for almost all $\omega \in \Omega$).
A stochastic process is said to be right-continuous (a.s.) if its trajectories are right-continuous functions for all $\omega \in \Omega$ (for almost all $\omega \in \Omega$).
A filtration $(\mathcal{F}_t)_{t \ge 0}$ is an increasing family of sub-$\sigma$-algebras of $\mathcal{F}$. The natural filtration of a stochastic process $X$ is defined as
$$\tilde{\mathcal{F}}^X_t = \sigma\big(X_s^{-1}(H) \mid 0 \le s \le t,\ H \in \mathcal{B}\big) \qquad (1.2)$$
We say that a process is adapted to $(\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for every $t$. Every process is adapted to its own natural filtration.
Let $X, Y$ be two stochastic processes defined on the same probability space $(\Omega,\mathcal{F},P)$. We say that $X$ is a modification of $Y$ if $X_t = Y_t$ almost surely for every $t$.
This means that, if we set
$$N_t = \{\omega \in \Omega \mid X_t(\omega) \ne Y_t(\omega)\} \qquad (1.3)$$
we have $N_t \in \mathcal{N}$ for every $t \ge 0$, where $\mathcal{N}$ denotes the family of negligible events (those events whose probability is $0$).
Two stochastic processes are indistinguishable if almost all their trajectories coincide. This means that, if we set
$$N = \{\omega \in \Omega \mid X_t(\omega) \ne Y_t(\omega) \text{ for some } t\} \qquad (1.4)$$
we have $N \in \mathcal{N}$.
Clearly, if $X$ and $Y$ are two indistinguishable processes, they are also modifications of each other. The converse is not true. Indeed, we can write the set $N$ above as
$$N = \bigcup_{t \ge 0} N_t \qquad (1.5)$$
but this is an uncountable union (not necessarily measurable). However, it is easy to show that, in the case of a.s. right-continuous processes, the two notions coincide (see, for example, [10]).
We state now another useful result we will use when we define the class of processes for which the Itˆo integral makes sense.
We say that a stochastic process $X$ is progressively measurable w.r.t. the filtration $(\mathcal{F}_t)$ if, for every $t$, the map $X : [0,t] \times \Omega \to \mathbb{R}^N$ is $\mathcal{B}([0,t]) \otimes \mathcal{F}_t$-measurable. This means that
$$\{(s,\omega) \in [0,t] \times \Omega \mid X_s(\omega) \in H\} \in \mathcal{B}([0,t]) \otimes \mathcal{F}_t, \qquad H \in \mathcal{B} \qquad (1.6)$$
This definition will be used later. Here we only note that a progressively measurable process is measurable and adapted (by the Fubini-Tonelli theorem). The converse is not trivial.
However, every right-continuous and adapted process is progressively measurable. The idea of the proof is to construct a sequence of simple processes which converges pointwise to $X$ on $[0,t]$ and to show that each of them is measurable w.r.t. the product $\sigma$-algebra. Since $X$ is the pointwise limit of measurable functions, it is measurable as well.
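The approximating sequence can be written down explicitly; the following is a sketch of the standard construction, with the dyadic partition as an illustrative choice (not spelled out in the text).

```latex
% Sketch: dyadic piecewise-constant approximation of a right-continuous adapted X on [0,t]
\[
X^n_s(\omega) \;=\;
  \begin{cases}
    X_0(\omega) & s = 0,\\[2pt]
    X_{kt/2^n}(\omega) & s \in \big]\tfrac{(k-1)t}{2^n}, \tfrac{kt}{2^n}\big],
      \quad k = 1,\dots,2^n.
  \end{cases}
\]
% Each X^n is B([0,t]) (x) F_t-measurable: on every dyadic subinterval it is constant
% in s and equal to an F_t-measurable random variable (adaptedness). By right-continuity,
% X^n_s -> X_s pointwise as n -> infinity, since kt/2^n decreases to s.
```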
1.2 Martingale spaces: $\mathcal{M}^2$ and $\mathcal{M}^2_c$

Before presenting the main results concerning the martingale spaces, we introduce the so-called Doob inequality.
Let $M$ be a right-continuous martingale and $p > 1$. Then, for every $T$, we have
$$E\Big[\sup_{t \in [0,T]} |M_t|^p\Big] \le q^p\, E[|M_T|^p] \qquad (1.7)$$
where $q = \frac{p}{p-1}$ is the conjugate exponent of $p$.
The inequality can be proved easily in discrete time. Then, using the Beppo Levi theorem (see, for instance, [2]), we can pass to the continuous case.
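As an informal numerical sanity check (the horizon, grid and path count below are arbitrary choices), one can test (1.7) for $p = 2$ (so $q^p = 4$) on a simulated Brownian motion, the basic martingale introduced in Section 1.5:

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_steps, n_paths = 1.0, 500, 10_000
dt = T / n_steps

# Brownian paths as cumulative sums of independent Gaussian increments
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)

lhs = np.mean(np.max(np.abs(W), axis=1) ** 2)   # estimates E[sup_t |W_t|^2]
rhs = 4.0 * np.mean(W[:, -1] ** 2)              # estimates q^p E[|W_T|^2] with p = q = 2
print(lhs, rhs)
assert lhs <= rhs
```

The inequality holds here with a comfortable margin: for Brownian motion on $[0,1]$, $E[|W_T|^2] = T = 1$, so the right-hand side is about $4$.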
because we’ll se that the Itˆo Integral is defined as a limit in M2 c.
Thus, for a fixed T > 0 we denote by
1. M2 the linear space of right-continuous
Ft-martingales (Mt)t2[0,T ] such
that M0= 0 a.s. and such that
JMKT = r E⇥ sup t2[0,T ]|M t|2⇤ (1.8) is finite 2. M2
c the subspace of continuous martingales in M2
Here the analogy with the Lp spaces is evident. In particular we can
see that JMKT is a seminorm in M2 because it’s equal to zero for all those
stochastic process which are indistinguishable from the null process.
For our purposes, the most relevant fact is that the space (M2
c,J·KT)
is complete (every cauchy sequence converge). Moroever, M2
c is a closed
subspace of M2.
The proof is instructive, so I report it here.
Given a Cauchy sequence $(M^n)$ in $\mathcal{M}^2$, we want to prove that there exists a convergent subsequence, in order to conclude that the original sequence converges as well. We construct a subsequence $(M^{k_n})$ such that
$$[\![M^{k_n} - M^{k_{n+1}}]\!]_T \le \frac{1}{2^n}, \qquad n \ge 1 \qquad (1.9)$$
and we set $\nu^n = M^{k_n}$. Then we define
$$\sigma_N(\omega) = \sum_{n=1}^{N} \sup_{t\in[0,T]} |\nu^{n+1}(t,\omega) - \nu^n(t,\omega)|, \qquad N \ge 1 \qquad (1.10)$$
which is an increasing sequence of non-negative functions.
Moreover, we recall that if $f, g : A \subset \mathbb{R} \to \mathbb{R}$ are non-negative functions on their domain, then $\sup(fg) \le \sup(f)\sup(g)$. Thus, using this result together with the triangle inequality in $L^2(\Omega)$, we obtain
$$E[\sigma_N^2] \le \Big(\sum_{n=1}^{N} [\![\nu^{n+1} - \nu^n]\!]_T\Big)^2 \qquad (1.11)$$
which, by the way we constructed the subsequence, is less than $2$. Thus, we have $E[\sigma_N^2] \le 2$. Now, using the Beppo Levi theorem, we have that
$$\sigma(\omega) = \lim_{N\to\infty} \sigma_N(\omega), \qquad \omega \in \Omega \qquad (1.12)$$
satisfies $E[\sigma^2] \le 2$. Thus, even if $\sigma(\omega)$ can take infinite values, we know that the set on which $\sigma$ equals $+\infty$ is negligible.
Formally, we can say that $\sigma(\omega) < +\infty$ for every $\omega \in \Omega \setminus F$, where $P(F) = 0$.
Now, using the triangle inequality of the absolute value and the fact that $\sup(f+g) \le \sup(f) + \sup(g)$, we can write
$$\sup_{t\in[0,T]} |\nu^n(t,\omega) - \nu^m(t,\omega)| \le \sigma(\omega) - \sigma_{m-1}(\omega) \qquad (1.13)$$
for $n \ge m \ge 2$. This means that $(\nu^n(t,\omega))_n$ is a Cauchy sequence in $\mathbb{R}$ for every $t \in [0,T]$ and $\omega \in \Omega\setminus F$. Since, for every $\omega \in \Omega\setminus F$, $\nu^n(t,\omega)$ is a right-continuous function of $t$ and the convergence is uniform in $t$, there exists a process $M(t,\omega)$ which is indistinguishable from a right-continuous process (continuous in case $M^n \in \mathcal{M}^2_c$).
1.3 Usual hypotheses
We say that $(\mathcal{F}_t)$ satisfies the so-called "usual hypotheses" w.r.t. $P$ if:
1. $\mathcal{N} \subset \mathcal{F}_0$;
2. the filtration is right-continuous, i.e. for every $t \ge 0$
$$\mathcal{F}_t = \bigcap_{\varepsilon > 0} \mathcal{F}_{t+\varepsilon} \qquad (1.14)$$
These two assumptions may look purely technical, but dropping them can lead to counter-intuitive results. Intuitively, the first one means that we always know (even at time $t = 0$) what is possible and what is not.
Let $X$ and $Y$ be two random variables such that $X$ is equal to $Y$ almost surely ($P(X = Y) = 1$). If $X$ is $\mathcal{F}_t$-measurable, is $Y$ $\mathcal{F}_t$-measurable? Not necessarily, unless we introduce the first assumption (and the completeness of the measure). Indeed, if we set $A = \{\omega \in \Omega \mid X \ne Y\}$, we have
$$Y^{-1}(H) = \big(X^{-1}(H)\setminus A\big) \cup \big(Y^{-1}(H)\cap A\big) \qquad (1.15)$$
where the first set is measurable by assumption and the second has null measure (and so, by the first assumption, is contained in the filtration for every $t$).
The second assumption has even bigger implications for stochastic calculus, as we will see when we talk about stopping times. Here, we just note that if a random variable $X$ is $\mathcal{F}_s$-measurable for every $s > t$, then $X$ is also $\mathcal{F}_t$-measurable.
Given a stochastic process $X$ on the space $(\Omega,\mathcal{F},P)$, we set
$$\mathcal{F}^X_t = \bigcap_{\varepsilon>0} \hat{\mathcal{F}}^X_{t+\varepsilon} \qquad (1.16)$$
where $\hat{\mathcal{F}}^X_t = \sigma(\tilde{\mathcal{F}}^X_t \cup \mathcal{N})$. It can be verified that the filtration above satisfies the usual hypotheses.
1.4 Stopping times
We introduce now a concept we will encounter a number of times later on. A random variable
$$\tau : \Omega \to \mathbb{R}_{\ge 0} \cup \{\infty\} \qquad (1.17)$$
is called a stopping time w.r.t. the filtration $(\mathcal{F}_t)$ if $\{\tau \le t\} \in \mathcal{F}_t$ for every $t$.
It is easy to see that if $\tau$ is a stopping time then $\{\tau < t\} \in \mathcal{F}_t$ for every $t$.
To prove the converse, suppose $\{\tau < t\} \in \mathcal{F}_t$ for every $t$, and note that for every $\varepsilon > 0$ we have
$$\{\tau \le t\} = \bigcap_{\{n\in\mathbb{N} \,\mid\, 1/n < \varepsilon\}} \Big\{\tau < t + \frac{1}{n}\Big\} \qquad (1.18)$$
so that $\{\tau \le t\} \in \mathcal{F}_{t+\varepsilon}$. Consequently, in view of the usual hypotheses,
$$\{\tau \le t\} \in \mathcal{F}_t = \bigcap_{\varepsilon>0} \mathcal{F}_{t+\varepsilon} \qquad (1.19)$$
A particularly useful stopping time is the so-called hitting time of an open set in $\mathbb{R}^N$.
Let $X$ be a right-continuous, $\mathcal{F}_t$-adapted process in $\mathbb{R}^N$ and $H$ an open set in $\mathbb{R}^N$. Let us set
$$I(\omega) = \{t \ge 0 \mid X_t(\omega) \in H\} \qquad (1.20)$$
and
$$\tau(\omega) = \begin{cases} \inf I(\omega) & \text{if } I(\omega) \ne \emptyset \\ +\infty & \text{if } I(\omega) = \emptyset \end{cases} \qquad (1.21)$$
So, if $\tau = t$, we know that the process touched the boundary of the open set at time $t$ and then entered it, but we highlight that it may happen that $X_\tau \notin H$.
In order to show that $\tau$ is a stopping time, in light of the observations made above, we have to show that $\{\tau < t\} \in \mathcal{F}_t$.
We claim that
$$\{\tau < t\} = \bigcup_{s \in \mathbb{Q}\cap[0,t[} \{X_s \in H\} \qquad (1.22)$$
Indeed, suppose there exist $\bar\omega$ and $t_0 < t$, $t_0 \in \mathbb{R}\setminus\mathbb{Q}$, such that $X_{t_0}(\bar\omega) \in H$. Since $H$ is open, there exists $\bar r$ such that the ball $B(X_{t_0}(\bar\omega), \bar r) \subset H$. Moreover, since $X$ is right-continuous, there exists $\delta > 0$ such that for all $s$ with $0 \le s - t_0 < \delta$ we have $X_s(\bar\omega) \in B(X_{t_0}(\bar\omega), \bar r)$. Now, pick $t_1 \in\, ]t_0, t_0+\delta[$ such that $t_1 < t$ and $t_1 \in \mathbb{Q}$ (we can always do this because of the density of the rational numbers). Then we have $X_{t_1}(\bar\omega) \in H$. This concludes the proof.
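As an informal illustration (the set $H$ and all parameters below are arbitrary choices), one can approximate $\tau$ on a simulated Brownian path for the open set $H = \,]1,+\infty[$; scanning the time grid mirrors the countable union in (1.22):

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 10.0, 100_000
dt = T / n
t = dt * np.arange(1, n + 1)
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), n))   # one simulated Brownian path

# discrete approximation of tau = inf{ t >= 0 : W_t in H } for H = ]1, +inf[
hit = np.flatnonzero(W > 1.0)
tau = t[hit[0]] if hit.size else np.inf          # inf of the empty set is +infinity
print(tau)
```

If the path never enters $H$ on the simulated horizon, the code returns `inf`, matching the convention $\tau = +\infty$ when $I(\omega) = \emptyset$.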
We give now some useful notions we will use later on when we define the stochastic integral with a stopping time as upper integration limit.
Let $\tau$ be a stopping time which is finite on $\Omega\setminus N$, where $N$ is a negligible event, and let $X$ be a stochastic process. We set
$$X_\tau(\omega) = X_{\tau(\omega)}(\omega) \qquad (1.23)$$
Thus, for every $\omega \in \Omega\setminus N$ we have a random variable. When $\tau = \infty$ the expression above is not well defined (what is $X_\infty$?), but since $N$ is negligible we do not care too much.
Then, we define the $\sigma$-algebra associated to the stopping time $\tau$ as
$$\mathcal{F}_\tau = \{F \in \mathcal{F} \mid F \cap \{\tau \le t\} \in \mathcal{F}_t \ \forall t\} \qquad (1.24)$$
For every stopping time $\tau$ and $n \in \mathbb{N}$, the formula
$$\tau_n(\omega) = \begin{cases} \frac{k+1}{2^n} & \text{if } \frac{k}{2^n} \le \tau(\omega) < \frac{k+1}{2^n} \\ +\infty & \text{if } \tau(\omega) = +\infty \end{cases} \qquad (1.25)$$
defines a decreasing sequence of stopping times converging pointwise to $\tau$.
1.5 Brownian motion
We introduce now a fundamental process for stochastic calculus. Since Brownian motion and its properties are well known, I just state them here without any proof (for a more detailed analysis see, for instance, [7]).
Let $(\Omega,\mathcal{F},P,(\mathcal{F}_t))$ be a filtered probability space. A real Brownian motion is a stochastic process $W = (W_t)_{t\ge0}$ in $\mathbb{R}$ such that:
• $W_0 = 0$ a.s.;
• $W$ is $\mathcal{F}_t$-adapted and continuous;
• for $t > s \ge 0$, the random variable $W_t - W_s$ has normal distribution $N_{0,\,t-s}$ and is independent of $\mathcal{F}_s$.
It is worth noting that there exist many Brownian motions, defined on different probability spaces. The existence is not trivial, and several constructions can be found in the literature.
In order to understand the difference between ordinary calculus and stochastic calculus, we need to recall the concepts of bounded variation and quadratic variation.
Let us consider a function $g : [a,b] \to \mathbb{R}^n$ and a partition $\zeta = \{t_0, \dots, t_N\}$ of $[a,b]$. The variation of $g$ relative to $\zeta$ is defined by
$$V_{[a,b]}(g,\zeta) = \sum_{k=1}^{N} |g(t_k) - g(t_{k-1})| \qquad (1.26)$$
The first variation of $g$ over $[a,b]$ is the supremum taken over all the partitions of $[a,b]$, that is
$$V_{[a,b]}(g) = \sup_{\zeta} V_{[a,b]}(g,\zeta) \qquad (1.27)$$
If this quantity is finite, we say that $g$ is a bounded variation (BV) function ($g \in BV([a,b])$).
On the other hand, the quadratic variation of $g$ relative to $\zeta$ is defined by
$$V^{(2)}_t(g,\zeta) = \sum_{k=1}^{N} |g(t_k) - g(t_{k-1})|^2 \qquad (1.28)$$
Now, it is possible to show that if $g \in BV([0,t]) \cap C([0,t])$, then
$$\lim_{|\zeta|\to0} V^{(2)}_t(g,\zeta) = 0 \qquad (1.29)$$
Let’s come back to our brownian motion.
It can be shown that for any sequence of partitions (⇣n) with mesh converging
to zero (that is limn!1|⇣n| = 0) we have that
lim
n!1V (2)
t (W, ⇣n) = t (1.30)
in L2(⌦, P ).
Now, using a standard result of probability theory, we know that for any
such sequence of partitions (⇣n) there exists a subsequence (⇣kn) such that
lim
n!1V (2)
t (W, ⇣kn) = t, a.s. (1.31)
Thus, in light of the previous result, almost all the trajectories of the brow-nian motion cannot have bounded variations.
Finally, it can be proved that with probability one, the trajectories of a Brownian Motion are nowhere di↵erentiable.
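The limit (1.30) is easy to observe numerically. The following sketch (the horizon $t = 2$ and the uniform partition are arbitrary choices) computes the quadratic variation sum (1.28) along one simulated Brownian path, and also shows the first variation blowing up:

```python
import numpy as np

rng = np.random.default_rng(1)
t, n = 2.0, 1_000_000
dt = t / n

# increments W_{t_k} - W_{t_{k-1}} over a uniform partition of [0, t]
dW = rng.normal(0.0, np.sqrt(dt), n)

qv = np.sum(dW ** 2)        # V_t^(2)(W, zeta): concentrates around t as the mesh shrinks
fv = np.sum(np.abs(dW))     # V_t(W, zeta): grows like sqrt(n), so W is not of bounded variation
print(qv, fv)
assert abs(qv - t) < 0.05
```

Refining the partition leaves `qv` near $t$ while `fv` keeps growing, which is exactly the dichotomy between (1.29) and (1.30).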
1.6 Itô Integral
In this section we introduce the construction of the Itô integral and the main problems in its definition.
Basically, we want to study what happens to the object
$$I_1 = \sum_{k=1}^{N} u_{t_{k-1}} (W_{t_k} - W_{t_{k-1}}) \qquad (1.32)$$
when we shift from the discrete to the continuous case. Does the limit exist for every $\omega \in \Omega$?
It turns out that this is a difficult object to deal with. Indeed, even if we assume nice conditions on the integrand process $u_t$ (for example, continuity of the trajectories), we cannot define this object as a Riemann integral (we cannot use the Lagrange theorem, because $W$ is nowhere differentiable), and we cannot define it in the Riemann-Stieltjes sense either ($W$ does not have bounded variation).
Here is the most general way to construct the Itô integral. In what follows, $W$ is a real Brownian motion on the filtered probability space $(\Omega,\mathcal{F},P,(\mathcal{F}_t))$, where the usual hypotheses hold, and $T$ is a fixed positive number.
We say that a stochastic process $u$ belongs to the class $\mathbb{L}^2$ if:
• $u$ is progressively measurable w.r.t. the filtration $(\mathcal{F}_t)$;
• $u \in L^2([0,T]\times\Omega)$, that is (using the Fubini theorem)
$$\int_0^T E[u_t^2]\,dt < \infty \qquad (1.33)$$
A process $u \in \mathbb{L}^2$ is called simple if it can be written as
$$u_t = \sum_{k=1}^{N} e_k\, \mathbb{1}_{]t_{k-1},t_k]}(t), \qquad t \in [0,T] \qquad (1.34)$$
where $0 = t_0 < t_1 < \cdots < t_N = T$ and the $e_k$ are random variables on $(\Omega,\mathcal{F},P)$.
As a consequence of the assumption that $u \in \mathbb{L}^2$, we have that $e_k$ is $\mathcal{F}_{t_{k-1}}$-measurable for every $k = 1,\dots,N$. Further, $e_k \in L^2(\Omega,P)$.
For this kind of process, we define the Itô integral as
$$\int u_t\,dW_t = \sum_{k=1}^{N} e_k (W_{t_k} - W_{t_{k-1}}) \qquad (1.35)$$
and, for every $0 \le a < b \le T$,
$$\int_a^b u_t\,dW_t = \int u_t\,\mathbb{1}_{]a,b]}(t)\,dW_t \qquad (1.36)$$
We briefly explain the most important properties of the Itô integral of simple processes. Recalling the tower property of conditional expectation, all the properties listed below remain true in their "non-conditional" version.
Let $u, v \in \mathbb{L}^2$, $\alpha, \beta \in \mathbb{R}$ and $0 \le a < b < c \le T$. The following properties hold:
• Linearity:
$$\int (\alpha u_t + \beta v_t)\,dW_t = \alpha\int u_t\,dW_t + \beta\int v_t\,dW_t \qquad (1.37)$$
• Additivity:
$$\int_a^b u_t\,dW_t + \int_b^c u_t\,dW_t = \int_a^c u_t\,dW_t \qquad (1.38)$$
• Null expectation:
$$E\Big[\int_a^b u_t\,dW_t \,\Big|\, \mathcal{F}_a\Big] = 0 \qquad (1.39)$$
• The stochastic process
$$X_t = \int_0^t u_s\,dW_s, \qquad t \in [0,T] \qquad (1.40)$$
is a continuous $\mathcal{F}_t$-martingale.
• The Itô isometry:
$$E\Big[\Big(\int_a^b u_t\,dW_t\Big)^2 \,\Big|\, \mathcal{F}_a\Big] = E\Big[\int_a^b u_t^2\,dt \,\Big|\, \mathcal{F}_a\Big] \qquad (1.41)$$
The last property, the Itô isometry, can be written as an equality between norms:
$$\Big\|\int_a^b u_t\,dW_t\Big\|_{L^2(\Omega)} = \|u\|_{L^2([a,b]\times\Omega)} \qquad (1.42)$$
This property is fundamental for the construction of the Itô integral for the more general class of $\mathbb{L}^2$ processes.
Intuitively, if $(u^n)$ is a Cauchy sequence in $L^2([0,T]\times\Omega)$, then $(I_T(u^n))$ is a Cauchy sequence in $L^2(\Omega)$ too.
Luckily, it can be proved that, for every $u \in \mathbb{L}^2$, there exists a sequence of simple processes in $\mathbb{L}^2$ such that
$$\lim_{n\to\infty} \|u - u^n\|^2_{L^2([0,T]\times\Omega)} = 0 \qquad (1.43)$$
Thus, we select a convergent sequence of simple processes $(u^n)$ approximating $u \in \mathbb{L}^2$. Since it converges, $(u^n)$ is a Cauchy sequence. Thanks to the Itô isometry, $(I_T(u^n))$ is a Cauchy sequence in $L^2(\Omega)$ as well. Thus, we define the Itô integral as the limit
$$\int_0^T u_t\,dW_t = \lim_{n\to\infty} I_T(u^n) \quad \text{in } L^2(\Omega) \qquad (1.44)$$
With the procedure above, we have defined the stochastic integral only up to a negligible event $N_T \in \mathcal{N}$. Indeed, recall that every element of $L^2(\Omega)$ is unique up to a negligible event. This is a problem when defining the integral as a stochastic process, since $T$ ranges over an uncountable set.
Thus, we use the Doob’s inequality. We consider the usual sequence of
simple processes (un)
2 L2 as before. Then, we define
It(un) =
Z t
0
unsdWs, t2 [0, T ] (1.45)
Using Doob’s Inequality we have that
JI(un) I(um)
KT 2kun vnkL2([0,T ]⇥⌦) (1.46)
Thus It(un) is a Cauchy sequence in (M2c,J·K) that is a complete space.
Thus, finally, we can write
Z t 0 usdWs= lim n!1 Z t 0 unsdWs in M2c (1.47)
The limit is defined up to indistinguishability.
All the properties outlined before in the case of simple processes are still
true; in particular, using the Doob-Inequality, we have that if u, v 2 L2 are
modifications then their stochastic integrals coincide.
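As an informal check of the construction, take $u = W$ itself: the left-endpoint sums of the form (1.32) converge to $\int_0^T W_t\,dW_t = \frac{1}{2}(W_T^2 - T)$, a closed form that also follows from the Itô formula of Section 1.7. A sketch with arbitrary parameters:

```python
import numpy as np

rng = np.random.default_rng(7)
T, n = 1.0, 200_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)
W = np.concatenate(([0.0], np.cumsum(dW)))   # W_{t_0}, ..., W_{t_n}

# integrand evaluated at the LEFT endpoint of each subinterval, as in (1.32)
ito_sum = np.sum(W[:-1] * dW)
exact = 0.5 * (W[-1] ** 2 - T)               # closed form of int_0^T W dW
print(ito_sum, exact)
assert abs(ito_sum - exact) < 0.02
```

Evaluating the integrand at the left endpoint is essential: midpoint or right-endpoint evaluation converges to a different limit (one recovers, respectively, the Stratonovich and the backward integral).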
We give here also the definition in the case in which the upper integration limit is a stopping time.
If $u \in \mathbb{L}^2(\mathcal{F}_t)$ and $\tau$ is an $\mathcal{F}_t$-stopping time such that $0 \le \tau \le T$ a.s., then $u_t\,\mathbb{1}_{\{t\le\tau\}} \in \mathbb{L}^2$ and
$$X_\tau = \int_0^\tau u_s\,dW_s = \int_0^T u_s\,\mathbb{1}_{\{s\le\tau\}}\,dW_s \quad \text{a.s.} \qquad (1.48)$$
The proof is constructive, so I report it here.
The stochastic process $\mathbb{1}_{\{t\le\tau\}}$ is bounded and progressively measurable. Thus, the product $u_t\,\mathbb{1}_{\{t\le\tau\}} \in \mathbb{L}^2$. We define
$$Y = \int_0^T u_s\,\mathbb{1}_{\{s\le\tau\}}\,dW_s \qquad (1.49)$$
First of all, we consider the case of a simple stopping time
$$\tau = \sum_{k=1}^{n} t_k\,\mathbb{1}_{F_k} \qquad (1.50)$$
with $0 < t_1 < t_2 < \cdots < t_n = T$, where the $F_k \in \mathcal{F}_{t_k}$ are disjoint events such that
$$F = \bigcup_{k=1}^{n} F_k \in \mathcal{F}_0 \qquad (1.51)$$
Given the definition of $X_\tau$, we have $X_\tau = 0$ on $\Omega\setminus F$, and we can write
$$X_\tau = \mathbb{1}_F \int_0^T u_s\,dW_s - \sum_{k=1}^{n} \mathbb{1}_{F_k} \int_{t_k}^T u_s\,dW_s \qquad (1.52)$$
Now, to go on, we need a useful lemma, which we state without proof. Let $u \in \mathbb{L}^2$ and let $X$ be an $\mathcal{F}_{t_0}$-measurable random variable for some $t_0 > 0$. Then
$$X \int_{t_0}^T u_t\,dW_t = \int_{t_0}^T X u_t\,dW_t \qquad (1.53)$$
Now, it is easy to see that $\mathbb{1}_{\{s\le\tau\}} = 1 - \mathbb{1}_{\{\tau<s\}}$, so that we can write
$$Y = \int_0^T u_s\big(1 - \mathbb{1}_{\{\tau<s\}}\big)\,dW_s = \int_0^T u_s\,dW_s - \int_0^T u_s\Big(\mathbb{1}_{\Omega\setminus F} + \sum_{k=1}^{n} \mathbb{1}_{F_k}\mathbb{1}_{\{s>t_k\}}\Big)\,dW_s = \mathbb{1}_F \int_0^T u_s\,dW_s - \sum_{k=1}^{n} \int_{t_k}^T u_s\,\mathbb{1}_{F_k}\,dW_s$$
Now, $\mathbb{1}_{F_k}$ is an $\mathcal{F}_{t_k}$-measurable random variable for $k = 1,\dots,n$, so that, by the lemma above, $X_\tau = Y$.
Then, we consider a general stopping time $\tau$ with $0 \le \tau \le T$ a.s. We define the following decreasing sequence of stopping times:
$$\tau_n = \sum_{k=0}^{2^n-1} \frac{T(k+1)}{2^n}\,\mathbb{1}_{F_k} \qquad (1.54)$$
where $F_k = \big\{\frac{Tk}{2^n} < \tau \le \frac{T(k+1)}{2^n}\big\}$. Since $\tau$ is a stopping time, we have $F_k \in \mathcal{F}_{T(k+1)/2^n}$, and the $F_k$ are disjoint.
Finally, we can easily see that $\bigcup_{k=0}^{2^n-1} F_k$ is $\mathcal{F}_0$-measurable. This is because we know that $\{\tau = 0\}$ and $\{\tau > T\}$ are $\mathcal{F}_0$-measurable (the first because of the definition of stopping times, and the second because it is a negligible event). We can write
$$\Omega\setminus\big(\{\tau = 0\} \cup \{\tau > T\}\big) = \bigcup_{k=0}^{2^n-1} F_k \qquad (1.55)$$
and this set is measurable. So we are in the previous situation.
Recalling that the stochastic integral is a continuous martingale, we have that $X_{\tau_n}$ converges to $X_\tau$ a.s. If we define
$$Y_n = \int_0^T u_s\,\mathbb{1}_{\{s\le\tau_n\}}\,dW_s \qquad (1.56)$$
we have $X_{\tau_n} = Y_n$ for all $n \in \mathbb{N}$; letting $n \to \infty$, $Y_n \to Y$ in $L^2(\Omega)$, and we conclude that $X_\tau = Y$.
We introduce here the concept of quadratic variation of a stochastic integral
$$X_t = \int_0^t u_s\,dW_s \qquad (1.57)$$
where $u \in \mathbb{L}^2$.
Let $X$ be as above. Then, for any $t > 0$, there exists the limit
$$\lim_{|\zeta|\to0} \sum_{k=1}^{N} |X_{t_k} - X_{t_{k-1}}|^2 = \int_0^t u_s^2\,ds \quad \text{in } L^2(\Omega,P) \qquad (1.58)$$
and we call it the quadratic variation process of $X$.
We now proceed to generalize this notion to a larger class of stochastic processes, and we will meet the notion of local martingale.
We denote by $\mathbb{L}^2_{loc}$ the family of processes $(u_t)_{t\in[0,T]}$ that are progressively measurable w.r.t. $(\mathcal{F}_t)_{t\in[0,T]}$ and such that
$$\int_0^T u_t^2\,dt < \infty \quad \text{a.s.} \qquad (1.59)$$
The integral above is a random variable: for every $\omega \in \Omega$ we have a trajectory and we compute the value of the integral. Moreover, it is well defined: the assumption of progressive measurability, together with the Tonelli theorem, ensures that for every $\omega \in \Omega$ and every $t \in [0,T]$ the trajectory $s \mapsto u_s(\omega)$ is measurable, so the Lebesgue integral makes sense.
It’s worth noting that:
• Every progressively measurable and (a.s) right continuous process
be-longs to L2
loc.
• The space is invariant w.r.t. equivalent probability measures. This are the explained steps:
• Given u 2 L2
loc, we define the process
At =
Z t
0
u2sds t2 [0, T ] (1.60)
At is an increasing and continuous process (property of Lebesgue
inte-gral)
• For every n 2 N we put
⌧n= inf{t 2 [0, T ]|At n} ^ T (1.61)
⌧n is a stopping time and (since u2 L2loc)
Moreover, if we set Fn ={⌧n= T} = {AT n} (1.63) we have that [ n2N Fn= ⌦\ N, N 2 N (1.64) • Now, we set unt = ut {t⌧n}, t2 [0, T ] (1.65)
and we note that $u^n_t \in \mathbb{L}^2$, since
$$E\Big[\int_0^T (u^n_t)^2\,dt\Big] = E\Big[\int_0^{\tau_n} u_t^2\,dt\Big] \le n \qquad (1.66)$$
Thus, the process
$$X^n_t = \int_0^t u^n_s\,dW_s, \qquad t \in [0,T] \qquad (1.67)$$
is well defined.
Now, for every $n, h \in \mathbb{N}$ we have $u^n = u^{n+h} = u$ on $F_n$; since the $F_n$ cover $\Omega$ up to a negligible event, we can consistently define $\int_0^t u_s\,dW_s := X^n_t$ on $F_n$.
We give here the definition of local martingale.
A process $M = (M_t)_{t\in[0,T]}$ is an $\mathcal{F}_t$-local martingale if there exists an increasing sequence $(\tau_n)$ of $\mathcal{F}_t$-stopping times, called a localizing sequence for $M$, such that
$$\lim_{n\to\infty} \tau_n = T \quad \text{a.s.} \qquad (1.68)$$
and, for every $n \in \mathbb{N}$, the stochastic process $M_{t\wedge\tau_n}$ is an $\mathcal{F}_t$-martingale.
1.7 Itô process
An Itô process is a stochastic process $X$ of the form
$$X_t = X_0 + \int_0^t \mu_s\,ds + \int_0^t \sigma_s\,dW_s, \qquad t \in [0,T] \qquad (1.69)$$
where $X_0$ is an $\mathcal{F}_0$-measurable random variable, $\mu \in \mathbb{L}^1_{loc}$ and $\sigma \in \mathbb{L}^2_{loc}$. The first integral is an ordinary Lebesgue integral for every $\omega \in \Omega$.
In what follows, we usually write the above expression in "differential" form:
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t \qquad (1.70)$$
It is worth noting that every expression in the "integral form" has been justified, while the differential form is merely a shortcut.
If $X$ is the Itô process above, its quadratic variation process is given by
$$\langle X \rangle_t = \int_0^t \sigma_s^2\,ds \qquad (1.71)$$
or, in differential form,
$$d\langle X \rangle_t = \sigma_t^2\,dt \qquad (1.72)$$
We will now introduce two ”versions” of the so-called Itˆo-formula. The proof is not difficult. It is based on a Taylor series expansion. We’ll not report it here (see, for instance, [10]).
Let $f \in C^2(\mathbb{R})$ and let $W$ be a real Brownian motion. Then $f(W)$ is an Itô process and we have
$$df(W_t) = f'(W_t)\,dW_t + \frac{1}{2} f''(W_t)\,dt \qquad (1.73)$$
We observe the presence of the extra term $\frac{1}{2}f''(W_t)\,dt$, which makes this "differentiation" rule different from the ordinary one. The general formulation is as follows.
Let $X$ be the Itô process above and $f = f(t,x) \in C^{1,2}(\mathbb{R}^2)$. Then the stochastic process $Y_t = f(t,X_t)$ is an Itô process and we have
$$df(t,X_t) = \partial_t f(t,X_t)\,dt + \partial_x f(t,X_t)\,dX_t + \frac{1}{2}\partial_{xx} f(t,X_t)\,d\langle X \rangle_t \qquad (1.74)$$
A little remark about the notation. Let $X$ be an Itô process,
$$dX_t = \mu_t\,dt + \sigma_t\,dW_t \qquad (1.75)$$
If $h$ is a stochastic process such that $h\mu \in \mathbb{L}^1_{loc}$ and $h\sigma \in \mathbb{L}^2_{loc}$, we usually write
$$dY_t = h_t\,dX_t \qquad (1.76)$$
instead of
$$dY_t = h_t\mu_t\,dt + h_t\sigma_t\,dW_t \qquad (1.77)$$
Thus, the stochastic integral w.r.t. an Itô process can be written as
$$\int_0^t h_s\,dX_s = \int_0^t h_s\mu_s\,ds + \int_0^t h_s\sigma_s\,dW_s \qquad (1.78)$$
We remark that, if $\mu \in \mathbb{L}^1_{loc}$, $\sigma \in \mathbb{L}^2_{loc}$ and $h$ is a continuous adapted process, then $h\mu \in \mathbb{L}^1_{loc}$ and $h\sigma \in \mathbb{L}^2_{loc}$. More generally, it is enough that $h$ is progressively measurable and a.s. bounded.
The two formulas above are extremely useful in stochastic analysis. Indeed, using them, we can compute the differential of a large class of functions of time and of an Itô process. We will see later on that we can also solve some stochastic differential equations (SDEs) by choosing the function $f(t,x)$ wisely.
Particularly relevant is the case in which $\mu, \sigma$ are not stochastic processes but deterministic functions of time. Indeed, if $\mu \in L^1(0,+\infty)$ and $\sigma \in L^2(0,+\infty)$, the process defined by
$$dS_t = \mu(t)\,dt + \sigma(t)\,dW_t \qquad (1.79)$$
has a normal distribution with
$$E[S_t] = S_0 + \int_0^t \mu(s)\,ds, \qquad \mathrm{Var}(S_t) = \int_0^t \sigma^2(s)\,ds \qquad (1.80)$$
Computing the two moments above is not difficult: one writes the integral form of the Itô process and employs the null-expectation property of the Itô integral; to compute the variance, one uses the Itô isometry. On the other hand, to show that $S_t$ has a normal distribution for every $t$, one has to check that its characteristic function is the characteristic function of a normal variable.
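As an informal check of (1.80), with the hypothetical choices $S_0 = 1$, $\mu(t) = 1$, $\sigma(t) = t$ and horizon $T = 1$, a Monte Carlo simulation of the Euler sums reproduces the two moments $E[S_T] = S_0 + T = 2$ and $\mathrm{Var}(S_T) = \int_0^1 s^2\,ds = 1/3$:

```python
import numpy as np

rng = np.random.default_rng(3)
S0, T, n, n_paths = 1.0, 1.0, 400, 50_000
dt = T / n
t_left = dt * np.arange(n)              # left endpoints of the subintervals

# hypothetical deterministic coefficients: mu(t) = 1, sigma(t) = t
S_T = np.full(n_paths, S0 + T)          # S0 + int_0^T mu(s) ds = S0 + T (drift is exact)
for k in range(n):
    S_T += t_left[k] * rng.normal(0.0, np.sqrt(dt), n_paths)   # sigma(t_k) * dW_k

mean_th = S0 + T                        # first moment in (1.80)
var_th = T ** 3 / 3                     # int_0^T s^2 ds
print(S_T.mean(), S_T.var())
assert abs(S_T.mean() - mean_th) < 0.02
assert abs(S_T.var() - var_th) < 0.02
```

The small residual discrepancy in the variance comes from the left-endpoint discretization of $\int_0^T \sigma^2(s)\,ds$ and from Monte Carlo noise.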
We can also develop a multi-dimensional Itô calculus. Conceptually, there is nothing new, so I will not focus too much on it. I just want to highlight the fact that, starting from a Brownian motion with independent components, we can construct a correlated one.
Let $(\Omega,\mathcal{F},P,(\mathcal{F}_t))$ be a filtered probability space. A $d$-dimensional Brownian motion is a stochastic process $W$ in $\mathbb{R}^d$ such that:
• $W_0 = 0$ $P$-a.s.;
• $W$ is $\mathcal{F}_t$-adapted and continuous;
• for $t > s \ge 0$, the random variable $W_t - W_s$ has multi-normal distribution $N_{0,(t-s)I_d}$ and is independent of $\mathcal{F}_s$.
Intuitively, a d-dimensional Brownian Motion is a vector of 1-dimensional independent Brownian motions.
Sometimes, it can be useful to have a $d$-dimensional Brownian motion in which every component is correlated (to some degree) with the others. This is the so-called correlated Brownian motion.
Let $A$ be an $(N\times d)$ matrix with constant entries and let us denote by $A^T$ its transpose. Then let us define
$$\varrho = AA^T \qquad (1.81)$$
Given $\mu \in \mathbb{R}^N$ and a $d$-dimensional Brownian motion $W$, we put
$$B_t = \mu + AW_t \qquad (1.82)$$
Then $B_t$ is an $N$-dimensional correlated Brownian motion starting from $\mu$ at $t = 0$, with correlation matrix $\varrho$.
In finance, usually, $N$ is the number of assets on the market and $d$ is the number of "risk" factors.
All the Itô formulas can be easily generalized to the multi-dimensional case.
1.8 SDE: Existence and Uniqueness
We consider $Z \in \mathbb{R}^N$ and two measurable functions
$$b = b(t,x) : [0,T]\times\mathbb{R}^N \to \mathbb{R}^N, \qquad \sigma = \sigma(t,x) : [0,T]\times\mathbb{R}^N \to \mathbb{R}^{N\times d} \qquad (1.83)$$
We refer to $b$ as the drift and to $\sigma$ as the diffusion coefficient.
Let $W$ be a $d$-dimensional Brownian motion on the filtered probability space $(\Omega,\mathcal{F},P,(\mathcal{F}_t))$, on which the usual hypotheses hold. A solution relative to $W$ of the SDE with coefficients $(Z, b, \sigma)$ is an $\mathcal{F}_t$-adapted continuous process $(X_t)_{t\in[0,T]}$ such that:
• $b(t,X_t) \in \mathbb{L}^1_{loc}$ and $\sigma(t,X_t) \in \mathbb{L}^2_{loc}$;
• we have
$$X_t = Z + \int_0^t b(s,X_s)\,ds + \int_0^t \sigma(s,X_s)\,dW_s, \qquad t \in [0,T] \qquad (1.84)$$
or, in differential form,
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t, \qquad X_0 = Z \qquad (1.85)$$
There are various notions of solution to the previous SDE. Here, we only study strong solutions.
In particular, we say that the SDE with coefficients $(Z,b,\sigma)$ is solvable in the strong sense if, for every fixed standard Brownian motion $W$, there exists a solution.
The difference w.r.t. a solution in the weak sense is that, in the latter case, the Brownian motion is part of the solution.
The most important results concern the SDE
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t, \qquad X_0 = Z \qquad (1.86)$$
under the following standard conditions:
1) $Z \in L^2(\Omega,P)$ and it is $\mathcal{F}_0$-measurable;
2) $b, \sigma$ are locally Lipschitz continuous in $x$, uniformly w.r.t. $t$, i.e. for every $n \in \mathbb{N}$ there exists a constant $K_n$ such that
$$|b(t,x) - b(t,y)|^2 + |\sigma(t,x) - \sigma(t,y)|^2 \le K_n|x-y|^2 \qquad (1.87)$$
for $|x|, |y| \le n$, $t \in [0,T]$;
3) $b, \sigma$ have at most linear growth in $x$, i.e.
$$|b(t,x)|^2 + |\sigma(t,x)|^2 \le K(1 + |x|^2), \qquad x \in \mathbb{R}^N,\ t \in [0,T] \qquad (1.88)$$
for a positive constant $K$. We have the following result.
If the standard conditions 1) and 2) hold, then the solution of the SDE
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t, \qquad X_0 = Z \qquad (1.89)$$
is pathwise unique (two strong solutions are indistinguishable).
The proof is instructive, so I report it here. Before reporting the proof, I state here (without proof) Gronwall's lemma.
Let $\varphi \in C([0,T])$ be such that
$$\varphi(t) \le a + \int_0^t f(s)\varphi(s)\,ds, \qquad t \in [0,T] \qquad (1.90)$$
where $a \in \mathbb{R}$ and $f$ is a continuous, non-negative function. Then we have
$$\varphi(t) \le a\,e^{\int_0^t f(s)\,ds}, \qquad t \in [0,T] \qquad (1.91)$$
Let $X$ and $\tilde X$ be two strong solutions with initial data $Z$ and $\tilde Z$. For $n \in \mathbb{N}$ and $\omega \in \Omega$ we define
$$s_n(\omega) = \inf\{t \in [0,T] \mid |X_t(\omega)| \ge n\} \wedge T \qquad (1.92)$$
and $\tilde s_n(\omega)$ analogously. Remember that $\inf\emptyset = \infty$ and that the two solutions are, by definition, continuous.
We have that $s_n(\omega)$ and $\tilde s_n(\omega)$ are increasing sequences of stopping times converging to $T$ a.s.
We define $\tau_n = s_n \wedge \tilde s_n$; then $(\tau_n)$ is an increasing sequence converging to $T$ a.s. Now, we can write
$$X_{t\wedge\tau_n} - \tilde X_{t\wedge\tau_n} = Z - \tilde Z + \int_0^{t\wedge\tau_n}\big(b(s,X_s) - b(s,\tilde X_s)\big)\,ds + \int_0^{t\wedge\tau_n}\big(\sigma(s,X_s) - \sigma(s,\tilde X_s)\big)\,dW_s$$
Using the elementary inequality $(a+b+c)^2 \le 3(a^2+b^2+c^2)$ and taking the expected value, we can write
$$E\big[|X_{t\wedge\tau_n} - \tilde X_{t\wedge\tau_n}|^2\big] \le 3E[|Z - \tilde Z|^2] + 3E\Big[\Big|\int_0^{t\wedge\tau_n}\big(b(s,X_s) - b(s,\tilde X_s)\big)\,ds\Big|^2\Big] + 3E\Big[\Big|\int_0^{t\wedge\tau_n}\big(\sigma(s,X_s) - \sigma(s,\tilde X_s)\big)\,dW_s\Big|^2\Big]$$
Now, to deal with the second term we use the fact that $\big|\int\cdot\big| \le \int|\cdot|$ together with the Hölder inequality (the factor $t$ in the formula below appears because the other function in the Hölder formula is $1$). For the third term, we use the Itô isometry, together with the fact that $\big(\sigma(s,X_s) - \sigma(s,\tilde X_s)\big)\mathbb{1}_{\{s\le\tau_n\wedge t\}} \in \mathbb{L}^2$ (see the stochastic integral with a stopping time as upper integration limit). We obtain
$$E\big[|X_{t\wedge\tau_n} - \tilde X_{t\wedge\tau_n}|^2\big] \le 3E[|Z - \tilde Z|^2] + 3tE\Big[\int_0^{t\wedge\tau_n}|b(s,X_s) - b(s,\tilde X_s)|^2\,ds\Big] + 3E\Big[\int_0^{t\wedge\tau_n}|\sigma(s,X_s) - \sigma(s,\tilde X_s)|^2\,ds\Big]$$
and, using the assumptions on the coefficients,
$$\le 3\Big(E[|Z - \tilde Z|^2] + K_n(T+1)\int_0^t E\big[|X_{s\wedge\tau_n} - \tilde X_{s\wedge\tau_n}|^2\big]\,ds\Big)$$
Now, using Gronwall’s lemma, we know that E
|Xt^⌧n X˜t^⌧n|
2
3E[|Z Z˜|2]e3Kn(T +1)t (1.93)
If $Z = \tilde Z$ a.s., using Fatou's lemma, we have
$$E[|X_t - \tilde X_t|^2] = E\Big[\lim_{n\to\infty}|X_{t\wedge\tau_n} - \tilde X_{t\wedge\tau_n}|^2\Big] \le \liminf_{n\to\infty} E\big[|X_{t\wedge\tau_n} - \tilde X_{t\wedge\tau_n}|^2\big] = 0$$
This implies that $X_t = \tilde X_t$ a.s. for every $t \in [0,T]$. Thus, they are modifications of each other and, since they are continuous stochastic processes, also indistinguishable.
Let us now see an example of an SDE widely used in finance, the so-called geometric Brownian motion.
Let’s consider the following SDE
dSt = µStdt + StdWt (1.94)
We know that we have to find a process S such that,8t 2 [0, T ] we have St = S0+ µ Z t 0 Ssds + Z t 0 SsdWs (1.95)
To solve this SDE, we can use directly Itˆo formula with f (t, Xt) = ln(Xt).
We have that f0(x) = 1
x and f00(x) =
1 x2. Thus, applying Itˆo formula, we have
dln(St) = 1 St dSt 1 2 1 S2 t dhSit = µdt + dWt 1 2 2dt = (µ 1 2 2)dt + dW t
Integrating, from 0 to t, we get
ln(St) = ln(S0) + (µ 1 2 2)t + W t (1.96) and, finally, St = S0e Wt+(µ 2 2)t (1.97)
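As an informal check (all parameters are arbitrary choices), the explicit solution (1.97) can be compared with a naive Euler discretization of (1.94) driven by the same simulated Brownian path:

```python
import numpy as np

rng = np.random.default_rng(5)
S0, mu, sigma, T, n = 1.0, 0.05, 0.2, 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)

# Euler discretization of dS = mu*S*dt + sigma*S*dW
S = S0
for dw in dW:
    S += mu * S * dt + sigma * S * dw

# closed-form solution (1.97) evaluated on the same path (W_T = sum of increments)
S_exact = S0 * np.exp(sigma * np.sum(dW) + (mu - 0.5 * sigma ** 2) * T)
print(S, S_exact)
assert abs(S - S_exact) < 0.05
```

On a fine grid the two values are close; on a coarse grid the discretization error becomes visible, which anticipates the discretization issues discussed in Section 2.4.2.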
Chapter 2
Stochastic filtering
2.1 The problem
In this section we explain what we mean by the term "stochastic filtering" and our main steps toward its "solution". We will recall some basic facts about Hilbert spaces when we deal with projections.
Suppose we have a stochastic process $X_t$ with values in $\mathbb{R}^n$ which is a solution of the following SDE:
$$dX_t = b(t,X_t)\,dt + \sigma(t,X_t)\,dW_t \qquad (2.1)$$
where $b : [0,+\infty[\times\mathbb{R}^n \to \mathbb{R}^n$ and $\sigma : [0,+\infty[\times\mathbb{R}^n \to \mathbb{R}^{n\times p}$ satisfy the usual conditions on the coefficients of an SDE. Here $W_t$ is a $p$-dimensional Brownian motion on a given probability space $(\Omega,\mathcal{F},P)$.
Now, assume that we cannot observe this process directly, but only a noisy version of it. In particular, for every $t \ge 0$ we observe
$$H_t = c(t,X_t) + f(t,X_t)\tilde W_t \qquad (2.2)$$
where $c : [0,+\infty[\times\mathbb{R}^n \to \mathbb{R}^m$ and $f : [0,+\infty[\times\mathbb{R}^n \to \mathbb{R}^{m\times r}$ satisfy the usual conditions on the coefficients of an SDE. Here $\tilde W_t$ is an $r$-dimensional Brownian motion independent of $W_t$ and $X_0$.
Instead of $H_t$, it is mathematically convenient to work with the following process:
$$Z_t = \int_0^t H_s\,ds \qquad (2.3)$$
so that $Z_t$ obeys the following SDE:
$$dZ_t = c(t,X_t)\,dt + f(t,X_t)\,d\tilde W_t \qquad (2.4)$$
Our objective is to obtain the "best" possible estimate of the unobservable process $X_t$ using the information obtained by observing $Z_t$. In particular, for every $t \ge 0$ we would like to have an estimate $\hat X_t$ of $X_t$ using the information given by the observation of $Z_s$ on the interval $[0,t]$.
First of all, when we say that $\hat X_t$ should be based on the observations of $Z$, we mean that $\hat X_t$ should be $\mathcal{G}_t$-measurable, where
$$\mathcal{G}_t = \sigma(Z_s \mid s \le t) \qquad (2.5)$$
is the $\sigma$-algebra generated by the process $Z_s$ up to time $t$.
Then, when we say ”best possible” estimate, we should specify in which
sense. Since we are dealing with random variables we say that ˆXt is a good
estimate if
E[|Xt Xˆt|2] = min{E[|Xt Y|2], Y 2 ⇤t} (2.6)
where
⇤t ={Y : ⌦ ! Rn, Y 2 L2(⌦, P ), Y is Gt measurable} (2.7)
Now, we will use the theory of Hilbert spaces to characterize the solution X̂_t. Before going on, we recall the following general result.
Let K be a convex, closed and non-empty subset of a Hilbert space H. Then, for every y ∈ H there exists a unique x₀ ∈ K such that
‖y − x₀‖ = min_{x ∈ K} ‖y − x‖   (2.8)
Moreover, if K is a closed subspace, then the minimizer is the orthogonal projection P_K and is characterized by
(x − P_K(x), ζ) = 0  for all ζ ∈ K and all x ∈ H   (2.9)
where (·, ·) is the scalar product.
We are precisely in the second situation: L²(Ω, P) is a Hilbert space and Λ_t is a closed linear subspace of L²(Ω, P) (note that 0 ∈ Λ_t), hence a Hilbert space as well.
Thus, we know that
X̂_t = P_{Λ_t}(X_t)   (2.10)
Now, from the general theory on Hilbert spaces, we know that
∫_Ω Y (X_t − P_{Λ_t}(X_t)) dP = 0  for all Y ∈ Λ_t   (2.11)
Choosing Y = 1_A we can say that
∫_A (X_t − P_{Λ_t}(X_t)) dP = 0  for all A ∈ G_t   (2.12)
or, equivalently,
∫_A X_t dP = ∫_A P_{Λ_t}(X_t) dP  for all A ∈ G_t   (2.13)
The expression above is the exact definition of conditional expectation. Given that the conditional expectation is unique (up to P-null sets), we can write
X̂_t = P_{Λ_t}(X_t) = E[X_t | G_t]   (2.14)
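The characterization (2.14) of the best mean-square estimate as a conditional expectation can be checked numerically. In the jointly Gaussian case the conditional expectation is linear in the observation; the sketch below (a hypothetical example with an illustrative signal-plus-noise pair, not from the text) verifies that the projection beats other measurable candidates in mean squared error:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative jointly Gaussian pair: X is the signal, Z = X + noise the observation
n = 200_000
X = rng.normal(0.0, 1.0, n)
Z = X + rng.normal(0.0, 0.5, n)

# For jointly Gaussian (zero-mean) variables the conditional expectation is linear:
# E[X|Z] = (Cov(X,Z)/Var(Z)) * Z
beta = np.cov(X, Z)[0, 1] / np.var(Z)
X_hat = beta * Z

mse_proj = np.mean((X - X_hat) ** 2)

# Any other candidate estimator (here: other linear coefficients) does worse
for b in [0.0, 0.5, 1.0]:
    assert mse_proj < np.mean((X - b * Z) ** 2)
print(mse_proj)
```

Here the theoretical coefficient is Cov(X, Z)/Var(Z) = 1/1.25 = 0.8, and the minimal mean squared error is Var(X) − Cov(X, Z)²/Var(Z) = 0.2, which the simulation reproduces.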
2.2 The 1-d linear case
Let us assume for the moment that the two processes X_t and Z_t satisfy the two following linear SDEs:
dX_t = a(t) X_t dt + b(t) dW_t
dZ_t = c(t) X_t dt + d(t) dW̃_t
We assume that Z_0 = 0, X_0 is normally distributed and independent of W_t and W̃_t, and the coefficients a(t), b(t), c(t), d(t) are bounded on bounded intervals. In this way the usual conditions for existence and uniqueness are satisfied. We recall that, in the linear case above, we have an explicit solution for X_t.
Our ultimate objective is to write down an explicit expression for the stochastic differential of X̂_t in terms of Z_t. We proceed in five steps.
2.2.1 Step 1
Let us define L = L(Z, t) as the closure in L²(Ω, P) of the space of random variables that can be written as
c₀ + Σ_{j=1}^n c_j Z_{s_j}(ω),  c₀, c_j ∈ R,  s_j ≤ t   (2.15)
Thus, L is a closed (non-empty) subspace of L² and we can use the result about projections outlined before. P_L will denote the projection operator from L² onto L.
Now, we want to show that
X̂_t = P_{Λ_t}(X_t) = P_L(X_t)   (2.16)
This fact has an important implication: it shows that the best possible estimate of X_t is linear in Z. Remember that P_L(X_t) ∈ L, so that X̂_t can be written as a linear combination (or as a limit in L²(Ω, P) of linear combinations) of the Z_{s_j} with s_j ≤ t.
In order to prove this, we first prove the following lemma.
Let X and Z_s, s ≤ t, be L²(Ω) random variables and assume that the vector
(X, Z_{s_1}, Z_{s_2}, . . . , Z_{s_n}) ∈ R^{n+1}   (2.17)
is normal for all s_j ≤ t. Then
P_L(X) = E[X | G]   (2.18)
where G is the σ-algebra generated by Z_s, s ≤ t. There is no subscript in the expression above because we are considering t as fixed.
We define X̂ = P_L(X) and we put X̃ = X − X̂. We want to show that X̃ is independent of G.
We recall that a k-dimensional vector is (jointly) normal if and only if every linear combination of its elements is normal. Given that X is normal (by assumption) and X̂ is normal because it is (the limit in L²(Ω, P) of) a linear combination of normal random variables, we have that
(X̃, Z_{s_1}, Z_{s_2}, . . . , Z_{s_n})   (2.19)
is normal.
Now, using Hilbert space theory, we know that E[X̃ Z_{s_j}] = 0 for all j = 1, . . . , n. This implies non-correlation (remember that E[X̃] = 0) and, since we are dealing with normal variables, independence. Thus,
X̃ and (Z_{s_1}, Z_{s_2}, . . . , Z_{s_n})   (2.20)
are independent for all s_j ≤ t and, consequently, X̃ is independent of G.
Using the independence result we just found, we can write
E[1_A X̃] = E[X̃] E[1_A] = 0  for all A ∈ G   (2.21)
and, recalling the definition of conditional expectation,
E[X | G] = P_L(X)   (2.22)
Now, let us come back to our case. We need to show the joint normality of X_t and Z_t. In order to do this, we note that the vector
M_t = (X_t, Z_t)ᵀ   (2.23)
can be seen as the solution of the 2-dimensional linear SDE
dM_t = H(t) M_t dt + J(t) dB_t   (2.24)
where B_t = (W_t, W̃_t)ᵀ and
H(t) = [ a(t)  0 ; c(t)  0 ],   J(t) = [ b(t)  0 ; 0  d(t) ]   (2.25)
We know that the solution of a linear SDE is always a normal process. Knowing this, we can now apply the lemma we proved at the beginning of the section to conclude.
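The joint system (2.24) is also convenient for simulation. A minimal Euler–Maruyama sketch in Python, with illustrative constant coefficients (the choice a = −0.5, b = 0.3, c = 1, d = 0.2 and the law of X_0 are assumptions for this example):

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative constant coefficients (any bounded functions of t would work)
a, b, c, d = -0.5, 0.3, 1.0, 0.2
T, n, m = 1.0, 1000, 2000
dt = T / n

X = np.zeros((m, n + 1))
Z = np.zeros((m, n + 1))
X[:, 0] = rng.normal(1.0, 0.1, m)   # normal X_0; Z_0 = 0

for k in range(n):
    dW = np.sqrt(dt) * rng.standard_normal(m)    # signal noise
    dWt = np.sqrt(dt) * rng.standard_normal(m)   # independent observation noise
    X[:, k + 1] = X[:, k] + a * X[:, k] * dt + b * dW
    Z[:, k + 1] = Z[:, k] + c * X[:, k] * dt + d * dWt

# E[X_T] should be close to E[X_0] * exp(a*T)
print(X[:, -1].mean(), 1.0 * np.exp(a * T))
```

Histograms of (X_T, Z_T) across the m paths would show the joint Gaussian shape that the lemma relies on.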
2.2.2 Step 2
Now, we will show that the space L(Z, T) (defined as above) can be written as
L(Z, T) = {c₀ + ∫_0^T f(t) dZ_t ; f ∈ L²[0, T], c₀ ∈ R}   (2.26)
We call the right-hand side N(Z, T).
Now, from general topology, we know that the closure of a set A (usually denoted by Ā) with respect to a norm ‖·‖ is the set itself together with all the limit points of the convergent sequences (in the sense of the considered norm) contained in A.
Moreover, Ā is the smallest closed set that contains A. Thus, if we want to prove the equality above, we have to show that
1. N(Z, T) ⊂ L(Z, T)
2. N(Z, T) is closed w.r.t. the L²(Ω, P)-norm
3. N(Z, T) contains all the linear combinations of the form
c₀ + Σ_{j=1}^n c_j Z_{t_j},  c₀, c_j ∈ R,  t_j ≤ T for all j   (2.27)
For the third point, set t₀ = 0 (so that Z_{t₀} = 0) and rearrange the sum by increments (Abel summation), with suitable constants c*_j. We can then write
Σ_{j=1}^n c_j Z_{t_j} = Σ_{j=0}^{n−1} c*_j (Z_{t_{j+1}} − Z_{t_j}) = Σ_{j=0}^{n−1} ∫_{t_j}^{t_{j+1}} c*_j dZ_t = ∫_0^T ( Σ_{j=0}^{n−1} c*_j 1_{[t_j, t_{j+1})}(t) ) dZ_t
In order to show that N(Z, T) is closed, we have to show that every convergent sequence contained in N(Z, T) has a limit which is still in N(Z, T). More formally, let {u_n} ⊂ N(Z, T) be a convergent sequence in the L²(Ω, P) sense and let us denote its limit point by u. We have to show that u ∈ N(Z, T).
Before proving this, we state an important result without proof. Let f ∈ L²[0, T] and let Z_t be the Itô process presented in the introduction. Then there exist constants A₀, A₁ > 0 such that
A₀ ∫_0^T |f(t)|² dt ≤ E[( ∫_0^T f(t) dZ_t )²] ≤ A₁ ∫_0^T |f(t)|² dt   (2.28)
Now, for every n ∈ N we can write u_n as
u_n = c_n + ∫_0^T f_n(t) dZ_t   (2.29)
We know that {u_n}_{n ∈ N} is a convergent sequence and, consequently, a Cauchy sequence. This means that for every ε > 0 there exists ν ∈ N such that for all n, m > ν we have
(c_n − c_m)² + E[( ∫_0^T (f_n(t) − f_m(t)) dZ_t )²] < ε   (2.30)
Now, using the result stated before (2.28), we see that {f_n} is a Cauchy sequence in L²[0, T], which is a complete metric space. This means that its limit point f belongs to L²[0, T] and, passing to the limit, u = c₀ + ∫_0^T f(t) dZ_t ∈ N(Z, T).
To show that N(Z, T) ⊂ L(Z, T) we need to introduce an important notion of continuity which has a relevant impact on the way the Itô integral is constructed.
We say that a stochastic process u_t is L²-continuous at t₀ if
lim_{t → t₀} E[(u_t − u_{t₀})²] = 0   (2.31)
In this case, if we put
u^ζ = Σ_{k=1}^N u_{t_{k−1}} 1_{]t_{k−1}, t_k]}   (2.32)
where ζ = {t₀, t₁, · · · , t_N} is a partition of [0, T], then u^ζ is a simple process in L² and we have
lim_{|ζ| → 0} u^ζ = u  in L²([0, T] × Ω)   (2.33)
Now, recalling the way in which we have defined the stochastic integral, we can see the Itô integral (with fixed upper integration limit) simply as the limit (in L²) of a Riemann–Stieltjes sum.
This is particularly useful in our case. Indeed, it is clear from the definition above that a continuous function is L²-continuous. Remember that a continuous function f(t), t ∈ [0, T], can be seen as a stochastic process with no dependence on Ω.
Thus, we have that, if f ∈ C[0, T],
∫_0^T f(t) dZ_t = lim_{n → ∞} Σ_j f(j · 2^{−n}) (Z_{(j+1) 2^{−n}} − Z_{j 2^{−n}})   (2.34)
which is in L(Z, T) because it is the L²(Ω, P)-limit of a sequence of elements of L(Z, T).
Finally, we have proved that
L(Z, T) = N(Z, T)   (2.35)
This characterization of the space L(Z, T) will be useful later on.
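The Riemann–Stieltjes representation (2.34) and the bound (2.28) can be illustrated numerically. In the sketch below we take Z to be a Brownian motion, the special case in which (2.28) holds with A₀ = A₁ = 1 by the Itô isometry; the integrand and sample sizes are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

T, n, m = 1.0, 1024, 5000
dt = T / n
t = np.linspace(0.0, T, n + 1)
f = np.cos(t)  # any continuous (deterministic) integrand

dW = np.sqrt(dt) * rng.standard_normal((m, n))
# Riemann–Stieltjes sum with left endpoints, as in (2.34)
I = (f[:-1] * dW).sum(axis=1)

# Itô isometry check: E[I²] = ∫_0^T f(t)² dt
lhs = np.mean(I**2)
rhs = (f[:-1] ** 2).sum() * dt  # left Riemann sum of the deterministic integral
print(lhs, rhs)
```

For f(t) = cos(t) on [0, 1], the deterministic side equals 1/2 + sin(2)/4 ≈ 0.727, and the Monte Carlo estimate of E[I²] matches it up to sampling error.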
Now, we define the innovation process N_t as follows:
N_t = Z_t − ∫_0^t c(s) X̂_s ds   (2.36)
with dynamics given by the following SDE:
dN_t = c(t)(X_t − X̂_t) dt + d(t) dW̃_t   (2.37)
From now on we will often use the second specification.
Now, we will show the following facts about N_t.
• N_t has orthogonal increments.
This means that
E[(N_{t₂} − N_{t₁})(N_{s₂} − N_{s₁})] = 0   (2.38)
for every pair of non-overlapping intervals [t₁, t₂] and [s₁, s₂].
Now, if s < t and Y ∈ L(Z, s), we have that
E[(N_t − N_s) Y] = E[( ∫_s^t c(r)(X_r − X̂_r) dr + ∫_s^t d(r) dW̃_r ) Y]
= ∫_s^t c(r) E[(X_r − X̂_r) Y] dr + E[( ∫_s^t d(r) dW̃_r ) Y] = 0
where we have used the Fubini–Tonelli theorem to exchange the order of integration in the second line, the fact that X_r − X̂_r is orthogonal to L(Z, r), which contains L(Z, s) for every s ≤ r, and the properties of the increments of the Brownian motion (remember that d(t) is a deterministic function of time and, consequently, the stochastic integral can be seen as the limit in L²(Ω, P) of a Riemann sum).
• E[N_t²] = ∫_0^t d²(s) ds
Applying Itô's formula with g(t, x) = x² we have
d(N_t²) = 2 N_t dN_t + (dN_t)² = 2 N_t dN_t + d²(t) dt   (2.39)
Integrating from 0 to t and taking the expected value, we get
E[N_t²] = E[ ∫_0^t 2 N_s dN_s ] + ∫_0^t d²(s) ds   (2.40)
Now, using the orthogonality of the increments of N_t, we have that
E[ ∫_0^t N_s dN_s ] = lim_{|Δt_j| → 0} Σ_j E[ N_{t_j} (N_{t_{j+1}} − N_{t_j}) ] = 0   (2.41)
• L(N, t) = L(Z, t) for all t ≥ 0
The first inclusion (L(N, t) ⊂ L(Z, t)) is clear from the definition of N_t itself. To establish the reverse inclusion we use the characterization of the functions in L(Z, t) found earlier.
In order to do this, let f ∈ L²[0, t] and let us try to write ∫_0^t f(s) dN_s as the sum of an integral w.r.t. the process Z plus a constant. We have
∫_0^t f(s) dN_s = ∫_0^t f(s) dZ_s − ∫_0^t f(r) c(r) X̂_r dr
Now, we know that for every r we have c(r) X̂_r ∈ L(Z, r), so it can be written as
c(r) X̂_r = m(r) + ∫_0^r g(r, s) dZ_s   (2.42)
where g(r, ·) ∈ L²[0, r] and m(r) ∈ R.
Thus, substituting the previous expression and exchanging the order of integration, we can write
∫_0^t f(s) dN_s = ∫_0^t f(s) dZ_s − ∫_0^t f(r) [ ∫_0^r g(r, s) dZ_s ] dr − ∫_0^t f(r) m(r) dr
= ∫_0^t [ f(s) − ∫_s^t f(r) g(r, s) dr ] dZ_s − ∫_0^t f(r) m(r) dr
It is a well-known fact about Volterra integral equations that for every h ∈ L²[0, t] there exists an f ∈ L²[0, t] such that
f(s) − ∫_s^t f(r) g(r, s) dr = h(s)   (2.43)
By choosing h = 1_{[0, t₁]}, t₁ ≤ t, we obtain
∫_0^t f(r) m(r) dr + ∫_0^t f(s) dN_s = Z_{t₁}   (2.44)
and this concludes the proof.
2.2.3 Step 3
Let N_t be the innovation process defined in the previous step. We have assumed that d(t) is "bounded away from zero" on [0, T]. This essentially means that there is a constant c > 0 such that |d(t)| ≥ c for all t ∈ [0, T].
We define the process R_t by
dR_t = (1/d(t)) dN_t   (2.45)
with t ≥ 0 and R_0 = 0.
We claim that R_t is a 1-dimensional Brownian motion.
We observe that R_t inherits from N_t the continuity of paths, the orthogonality of the increments and the Gaussian property. We have to show that
E[R_t] = 0 and E[R_t R_s] = min(s, t)
In order to do this, we apply Itô's formula to get
d(R_t²) = 2 R_t dR_t + (dR_t)² = 2 R_t dR_t + dt   (2.46)
and (using again the orthogonality of the increments) we have
E[R_t²] = t   (2.47)
Thus, if s < t,
E[R_t R_s] = E[(R_t − R_s) R_s] + E[R_s²] = s   (2.48)
Since
L(R, t) = L(N, t) = L(Z, t)   (2.49)
we conclude that
X̂_t = P_{L(R, t)}(X_t)   (2.50)
The projection onto the space L(R, t) can be written as
X̂_t = E[X_t] + ∫_0^t (∂/∂s) E[X_t R_s] dR_s   (2.51)
Indeed, recall the characterization of the functions in the space L(R, t). We know that
X̂_t = c₀(t) + ∫_0^t g(s) dR_s,  g ∈ L²[0, t], c₀(t) ∈ R   (2.52)
Taking expectations (the stochastic integral has zero mean), we see that
c₀(t) = E[X̂_t] = E[X_t]   (2.53)
Recall that (X_t − X̂_t) is orthogonal to every element of L(R, t). So,
(X_t − X̂_t) ⊥ ∫_0^t f(s) dR_s  for all f ∈ L²[0, t]   (2.54)
Thus, using this orthogonality, the representation (2.52) and the Itô isometry (R is a Brownian motion), we can write
E[X_t ∫_0^t f(s) dR_s] = E[X̂_t ∫_0^t f(s) dR_s] = E[ ∫_0^t g(s) dR_s ∫_0^t f(s) dR_s ] = ∫_0^t g(s) f(s) ds  for all f ∈ L²[0, t]
If we choose f = 1_{[0, r]}, r ≤ t, we have
E[X_t R_r] = ∫_0^r g(s) ds   (2.55)
or, equivalently,
g(r) = (∂/∂r) E[X_t R_r]   (2.56)
2.2.4 Step 4
Now, we want to find an explicit expression for X_t. We know that X_t solves the linear SDE
dX_t = a(t) X_t dt + b(t) dW_t   (2.57)
It is easy to see (using Itô's formula) that the solution is
X_t = exp( ∫_0^t a(s) ds ) X_0 + ∫_0^t exp( ∫_s^t a(u) du ) b(s) dW_s   (2.58)
Moreover, we have that
E[X_t] = E[X_0] exp( ∫_0^t a(s) ds )   (2.59)
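Formula (2.58) allows sampling X_t directly, once the deterministic integrand of the stochastic integral is discretized. A Python sketch with illustrative constant coefficients, checking (2.59) by Monte Carlo (all numerical values are assumptions for the example):

```python
import numpy as np

rng = np.random.default_rng(4)

# Constant illustrative coefficients; then (2.58) simplifies nicely
a, b, T, n, m = -1.0, 0.5, 1.0, 400, 10000
dt = T / n
s = np.linspace(0.0, T, n + 1)

X0 = rng.normal(2.0, 0.3, m)                      # normal X_0
dW = np.sqrt(dt) * rng.standard_normal((m, n))
# X_T = e^{aT} X_0 + ∫_0^T e^{a(T-s)} b dW_s (left-point discretization)
stoch = (np.exp(a * (T - s[:-1])) * b * dW).sum(axis=1)
XT = np.exp(a * T) * X0 + stoch

# (2.59): E[X_T] = E[X_0] e^{aT}
print(XT.mean(), 2.0 * np.exp(a * T))
```

The stochastic integral term has zero mean, so the sample mean of X_T should match E[X_0]e^{aT} regardless of b.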
2.2.5 Step 5
We now combine the previous steps to obtain the solution of the filtering problem: a stochastic differential equation for X̂_t. We start from the formula
X̂_t = E[X_t] + ∫_0^t f(s, t) dR_s   (2.60)
where
f(s, t) = (∂/∂s) E[X_t R_s]   (2.61)
Using
R_s = ∫_0^s (c(r)/d(r)) (X_r − X̂_r) dr + W̃_s   (2.62)
we obtain
E[X_t R_s] = ∫_0^s (c(r)/d(r)) E[X_t X̃_r] dr   (2.63)
where
X̃_r = X_r − X̂_r   (2.64)
(the term E[X_t W̃_s] vanishes because X is independent of W̃). Using the explicit solution of X_t, we obtain
E[X_t X̃_r] = exp( ∫_r^t a(v) dv ) E[X_r X̃_r] = exp( ∫_r^t a(v) dv ) S(r)
where
S(r) = E[(X̃_r)²]   (2.65)
(note that E[X_r X̃_r] = E[X̃_r²], since X̂_r ⊥ X̃_r). Thus
E[X_t R_s] = ∫_0^s (c(r)/d(r)) exp( ∫_r^t a(v) dv ) S(r) dr   (2.66)
so that
f(s, t) = (c(s)/d(s)) exp( ∫_s^t a(v) dv ) S(s)   (2.67)
We claim that S(t) satisfies the Riccati equation
dS/dt = 2 a(t) S(t) − (c²(t)/d²(t)) S²(t) + b²(t)   (2.68)
In order to prove this, note that (by the Pythagorean theorem) we have
S(t) = E[(X_t − X̂_t)²] = E[X_t²] − 2 E[X_t X̂_t] + E[X̂_t²] = E[X_t²] − E[X̂_t²]
which, using (2.60) and the Itô isometry, can be written as
S(t) = T(t) − ∫_0^t f(s, t)² ds − E[X_t]²,  where T(t) = E[X_t²]
Now, using the explicit solution of X_t, we have
T(t) = exp( 2 ∫_0^t a(s) ds ) E[X_0²] + ∫_0^t exp( 2 ∫_s^t a(u) du ) b²(s) ds   (2.69)
Thus,
dT/dt = 2 a(t) exp( 2 ∫_0^t a(s) ds ) E[X_0²] + ∫_0^t 2 a(t) exp( 2 ∫_s^t a(u) du ) b²(s) ds + b²(t)   (2.70)
that is,
dT/dt = 2 a(t) T(t) + b²(t)   (2.71)
Taking the time derivative of S(t) (and using Step 4 for d/dt E[X_t]² = 2 a(t) E[X_t]²), we have
dS/dt = dT/dt − f²(t, t) − ∫_0^t 2 f(s, t) (∂/∂t) f(s, t) ds − 2 a(t) E[X_t]²
= 2 a(t) T(t) + b²(t) − (c²(t) S²(t))/d²(t) − 2 a(t) ∫_0^t f²(s, t) ds − 2 a(t) E[X_t]²
= 2 a(t) S(t) + b²(t) − (c²(t) S²(t))/d²(t)
where we used (∂/∂t) f(s, t) = a(t) f(s, t) and f(t, t) = c(t) S(t)/d(t), both immediate from (2.67).
From the formula
X̂_t = m₀(t) + ∫_0^t f(s, t) dR_s,  m₀(t) = E[X_t]   (2.72)
it follows that
dX̂_t = m₀′(t) dt + f(t, t) dR_t + ( ∫_0^t (∂/∂t) f(s, t) dR_s ) dt   (2.73)
since
∫_0^u ( ∫_0^t (∂/∂t) f(s, t) dR_s ) dt = ∫_0^u ( ∫_s^u (∂/∂t) f(s, t) dt ) dR_s = ∫_0^u ( f(s, u) − f(s, s) ) dR_s = X̂_u − m₀(u) − ∫_0^u f(s, s) dR_s
Thus, using again (∂/∂t) f(s, t) = a(t) f(s, t), f(t, t) = c(t) S(t)/d(t) and m₀′(t) = a(t) m₀(t),
dX̂_t = m₀′(t) dt + a(t) ( X̂_t − m₀(t) ) dt + (c(t) S(t)/d(t)) dR_t = a(t) X̂_t dt + (c(t) S(t)/d(t)) dR_t
Now, substituting
dR_t = (1/d(t)) [ dZ_t − c(t) X̂_t dt ]   (2.74)
we obtain
dX̂_t = ( a(t) − (c²(t) S(t))/d²(t) ) X̂_t dt + (c(t) S(t))/d²(t) dZ_t   (2.75)
2.3 The Kalman Bucy Filter
Finally, collecting all the steps above, we can say that the solution X̂_t = E[X_t | G_t] of the 1-dimensional linear filtering problem
dX_t = a(t) X_t dt + b(t) dW_t
dZ_t = c(t) X_t dt + d(t) dW̃_t
with all the conditions on the coefficients outlined above, satisfies the following SDE:
dX̂_t = ( a(t) − (c²(t) S(t))/d²(t) ) X̂_t dt + (c(t) S(t))/d²(t) dZ_t,  X̂_0 = E[X_0]   (2.76)
where S(t) = E[(X_t − X̂_t)²] satisfies the Riccati equation
dS/dt = 2 a(t) S(t) − (c²(t)/d²(t)) S²(t) + b²(t)   (2.77)
with S(0) = E[(X_0 − E[X_0])²].
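The pair (2.76)-(2.77) is already in a form suitable for numerical integration: the Riccati equation is a deterministic ODE, while the filter SDE is driven by the observation increments dZ_t. A minimal Euler sketch in Python with illustrative constant coefficients (the values a = −1, b = 0.5, c = 1, d = 0.3 and the law of X_0 are assumptions for this example):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative constants for dX = aX dt + b dW,  dZ = cX dt + d dW~
a, b, c, d = -1.0, 0.5, 1.0, 0.3
T, n = 5.0, 5000
dt = T / n

x = rng.normal(1.0, 0.5)      # X_0 ~ N(1, 0.25)
x_hat, S = 1.0, 0.25          # X̂_0 = E[X_0], S(0) = Var(X_0)

for _ in range(n):
    # simulate signal and observation increment (Euler–Maruyama)
    dW = np.sqrt(dt) * rng.standard_normal()
    dWt = np.sqrt(dt) * rng.standard_normal()
    dZ = c * x * dt + d * dWt
    x += a * x * dt + b * dW
    # filter update, eq. (2.76): dX̂ = (a - c²S/d²) X̂ dt + (cS/d²) dZ
    x_hat += (a - (c**2) * S / d**2) * x_hat * dt + (c * S / d**2) * dZ
    # Riccati equation (2.77), explicit Euler step
    S += (2 * a * S - (c**2 / d**2) * S**2 + b**2) * dt

print(S)  # should approach the stationary solution of (2.77)
```

With these constants the stationary variance solves −2S − S²/0.09 + 0.25 = 0, i.e. S* ≈ 0.0849, and S(t) converges to it regardless of S(0).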
2.4 Some Applications
We present here two examples to show how to implement the formulas found above.
2.4.1 Noisy observation of a constant
The problem we study here is deliberately simple, but we can learn a lot from it. The setting is as follows.
We have a normal random variable X_0 ~ N(μ, σ²) whose value we cannot observe. Instead, we can see a noisy version of it: for every time t ∈ [0, T] we observe
Y_t = X_0 · t + κ · W_t   (2.78)
In the notation of the previous section this is the linear filtering problem with a(t) = b(t) = 0, c(t) = 1 and d(t) = κ, and the Kalman–Bucy formulas give the explicit solution
X̂_t = E[X_0 | G_t] = (κ² μ + σ² Y_t) / (κ² + σ² t)
Even from this simple example, we can see the main difficulties in applying the formulas we found above:
• We have to discretize the two SDEs (observable and unobservable processes).
• Usually, we do not have an explicit formula for the conditional expectation but an SDE for it, so we have to discretize it too. This involves the numerical approximation of stochastic and ordinary integrals, with the consequent error.
In the script below we generate one value of X_0. Then we generate m trajectories of Y_t, so that we can see an approximation of the conditional expectation (remember that it is a random variable itself).
What we obtain is that the goodness of our filter depends critically on the value of κ, which we call the amplification factor. Indeed, a higher value of κ means higher noise (positive or negative) and, consequently, it is more difficult to filter it out. Here we report the Matlab code used in the simulation.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%% NOISY OBSERVATION OF A CONSTANT %%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
close all;
clear;

% Input:
mu    = 2;      % Expected value of X_0
sigma = 1;      % sqrt(Variance) of X_0
kappa = 0.3;    % Amplification factor
T     = 1;      % Time length
n     = 10000;  % Number of time intervals
m     = 1000;   % Number of replications
dt    = T/n;    % Length of a time interval

% Generation of the random variable X_0
X_0 = mu + sigma*randn(1);
time = 0:dt:T;

% Generation of the Brownian motion
W = zeros(m,n+1);
for k = 2:n+1
    W(:,k) = W(:,k-1) + sqrt(dt)*randn(m,1);
end

% Generation of the observable process
Y = X_0*repmat(time,m,1) + kappa*W;

% Compute the best estimate (explicit conditional expectation)
Y_hat = ((kappa^2)*mu + (sigma^2)*Y)./repmat(kappa^2+(sigma^2)*time,m,1);

% Plot sample trajectories
plot(time,Y(1,:)')
title('Sample trajectory observable process'); xlabel('time');
figure
plot(time,Y_hat(1,:))
title('Conditional expectation'); xlabel('time');
figure
hist(Y_hat(:,1001));   % time index 1001 corresponds to t = 0.1
title('Empirical distribution at time t=0.1');
figure
hist(Y_hat(:,10000));
Figure 2.1: Simulated trajectory of the observable process Y_t
Figure 2.2: Conditional expectation process E[X_0|F_t]
2.4.2 Discretization problem: a solution
As we have seen in the previous example (and we will see again in the non-linear setting), to implement the Kalman–Bucy filter we necessarily have to discretize the problem.
This raises a question: are we sure there is no better filter in the discrete case? In this section we briefly present the so-called Kalman filter (no Bucy) and an application with Matlab to show its potential.
From now on we essentially follow the book of Hamilton (1994). The key starting point is the so-called state-space representation of a dynamic model for y_t:
ε_{t+1} = F ε_t + v_{t+1}   (2.79)
y_t = A′ x_t + H′ ε_t + w_t   (2.80)
In the expressions above, y_t represents an (n × 1) vector of variables observed at time t. ε_t, instead, is an unobservable process: we know its law but not its values. A′, F, H′ are matrices of known parameters of dimension (n × k), (r × r), (n × r) respectively. x_t is a (k × 1) vector of exogenous variables. v_t and w_t are vector white noises with variance-covariance matrices given by Q and R respectively. They are independent of each other and intertemporally uncorrelated (note the similarity with the Brownian motion).
As can be seen, this is a really general setting, and a large number of continuous-time filtering problems can be approximated by this discrete version.
We will not study explicitly the derivation of the filter (in any case it is not difficult at all, and the mathematics required is well within that developed above and below), but we highlight here three problems that can be solved using the filter.
The first problem we want to tackle is the discrete analogue of the filtering problem in the continuous case. We want to compute
ε_{t+1|t} = E[ε_{t+1} | Ω_t]   (2.81)
where Ω_t = {y_t, · · · , y_1, x_t, · · · , x_1}.
The second problem is forecasting: we can compute the expected value of y_t given Ω_{t−1}. More formally, we can compute
y_{t|t−1} = E[y_t | x_t, Ω_{t−1}]   (2.82)
Forecasting y_t is really natural: we can observe it and, most of the time, it is the process we are interested in.
Sometimes, however, we may also be interested in the "latent" process ε_t. We remember here that we cannot know the values of the state process (even at time T, when we have collected all the observations of y_t). Thus, it is natural to estimate the process ε_t using all the information collected from time t = 0 to t = T. Formally, we want to compute
ε_{t|T} = E[ε_t | Ω_T]   (2.83)
This "procedure" is called smoothing.
Obtaining a solution of the smoothing problem in the continuous case (even in the linear setting) is really difficult.
We show the potential of the filter in a simple simulated example. Here are the parameters of the example and its state-space representation; the code is reported below.
The state (bivariate) process is given by
[ z¹_t ; z²_t ] = [ 0.1  0.4 ; 0.3  0.7 ] [ z¹_{t−1} ; z²_{t−1} ] + [ ε¹_t ; ε²_t ]   (2.84)
while the observation process is
y_t = 0.5 · sin(t) + [ 0.6  0.9 ] [ z¹_t ; z²_t ] + w_t   (2.85)
Moreover, we assumed that the shocks are independent and uncorrelated, with unit variance. We report here the code used. All the formulas and the algorithm are based on Hamilton, 1994 ([6]).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%% TESTING KALMAN FILTER %%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
n = 1000;                          % number of observations

% Generation of the exogenous variable x
X = sin(linspace(0,10,1000))';     % simulated exogenous variable

% Generation of the unobservable (state) process
Z = zeros(2,n);
Z(:,1) = 0;
F = [0.1, 0.4; 0.3, 0.7];
for t = 1:n-1
    Z(:,t+1) = F*Z(:,t) + randn(2,1);
end

% Generation of the observable process
A = 0.5;
H = [0.6 0.9];
y = zeros(1,n);
for t = 1:n
    y(t) = A*X(t) + H*Z(:,t) + randn(1);
end
Y = y';

R = 1;       % variance of the (scalar) Gaussian observation noise
Q = eye(2);  % two uncorrelated Gaussian state shocks with unit variance

% Kalman filter algorithm
[E,E_TT,Y_hat] = KalmanFilter_d(Y,F,X,A,H,Q,R);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%% KALMAN FILTER ALGORITHM %%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [E,E_TT,Y_hat] = KalmanFilter_d(Y,F,X,A,H,Q,R)
% Model:
%   e_{t+1} = F*e_{t} + v_{t+1}        ---> State equation
%   y_t     = A*x_t + H*e_{t} + w_t    ---> Observation equation
%
% Y   = vector of observations
% X   = vector of exogenous variables
% A,F,H = parameter matrices
% Q,R = variance-covariance matrices of the white noises (v_t and w_t)

% Dimensions:
T = size(Y,1);   % number of periods
n = size(Y,2);   % number of observations for every t
r = size(F,1);   % dimension of the unobservable vector

% Matrix initialization:
% E(:,t)    = conditional exp. of the state given y_1,...,y_{t-1}
% E_tt(:,t) = conditional exp. of the state given y_1,...,y_t
% E_TT(:,t) = conditional exp. of the state given the entire sample y_1,...,y_T
% Y_hat     = one-step-ahead forecast of y_t
E     = zeros(r,T+1);
P     = zeros(r^2,T+1);
P_tt  = zeros(r^2,T);
E_TT  = zeros(r,T);
E_tt  = zeros(r,T);
J     = zeros(r^2,T);
Y_hat = zeros(T,n);

% Starting values: unconditional mean and variance of the state
E(:,1) = 0;
P(:,1) = (eye(r^2)-kron(F,F))\reshape(Q,r^2,1);

% Filtering (forward) recursion for e_t
for k = 1:T
    Zk = reshape(P(:,k),r,r);
    K  = Zk*H'/(H*Zk*H'+R);                  % Kalman gain
    E_tt(:,k) = E(:,k) + K*(Y(k)-A*X(k)-H*E(:,k));
    E(:,k+1)  = F*E_tt(:,k);
    D = Zk - K*H*Zk;                         % P_{t|t}
    C = F*D*F' + Q;                          % P_{t+1|t}
    J(:,k)    = reshape(D*F'/C,r^2,1);       % smoother gain J_t = P_{t|t}*F'*inv(P_{t+1|t})
    P_tt(:,k) = reshape(D,r^2,1);
    P(:,k+1)  = reshape(C,r^2,1);
end

% One-step-ahead forecast of y_t
for k = 1:T
    Y_hat(k) = A*X(k) + H*E(:,k);
end

% Smoothing (backward) recursion
E_TT(:,end) = E_tt(:,end);
for k = 1:T-1
    t = T-k;
    K = reshape(J(:,t),r,r);
    E_TT(:,t) = E_tt(:,t) + K*(E_TT(:,t+1)-E(:,t+1));
end
end
Figure 2.3: Simulated trajectory and filter of z¹_t
Figure 2.4: Simulated trajectory and filter of z²_t
Figure 2.5: Simulated trajectory and forecast of y_t
Figure 2.6: Smoothing process of z¹_t
Chapter 3
Non-linear case
In this chapter we will study the non-linear case of the stochastic filtering problem presented above.
Since we will use some results about change of measure, we recall them in the first section. In this way there will not be any interruption or digression when we deal with the filtering problem.
3.1 Some results about change of measure
Let us consider a non-empty set X equipped with a σ-algebra F. Let P and Q be two measures defined on F.
We say that Q is absolutely continuous w.r.t. P on F if
for all E ∈ F,  P(E) = 0 ⟹ Q(E) = 0   (3.1)
and we write Q ≪ P.
Clearly, the definition depends on F. For this reason it is sometimes more convenient to write Q ≪_F P.
Indeed, if G is a sub-σ-algebra of F and we have that Q ≪_G P, this does not imply that Q ≪_F P.
If Q ≪ P and P ≪ Q on F, we say that Q and P are equivalent and we write Q ∼ P.
We know, for instance, that the normal distribution N(μ, σ²) is absolutely continuous w.r.t. the Lebesgue measure m (it follows from the properties of the Lebesgue integral).
Suppose now we have a probability space (Ω, F, P) and a random variable X : Ω → [0, +∞) such that E[X] = 1. If we define another measure Q as
Q(A) = ∫_A X dP,  A ∈ F   (3.2)
we have that Q is a probability measure (Q(Ω) = E[X] = 1) and Q ≪ P on F.
The requirement that E[X] = 1 has been made only to obtain another probability measure.
We may wonder if all the measures absolutely continuous w.r.t. P have the above form (the integral of a suitable function). The answer is yes, and it is contained in the fundamental theorem of Radon–Nikodym.
Let (Ω, F, P) be a finite measure space (not necessarily a probability space). If Q is a finite measure (Q(Ω) < ∞) on (Ω, F) with Q ≪ P, then there exists L : Ω → R, L ≥ 0, such that
• L is F-measurable;
• L is P-summable;
• Q(A) = ∫_A L dP for all A ∈ F.
We say that L is the density of Q with respect to P or, alternatively, the Radon–Nikodym derivative of Q w.r.t. P on F. We write, indifferently,
L = dQ/dP,  dQ = L dP   (3.3)
or, to emphasize the dependence on F,
L = dQ/dP |_F   (3.4)
Of course, if we change the value of L on a negligible set, the value of the integral does not change. This means that L is P-almost surely unique: if L′ is another density of Q w.r.t. P, then L = L′ P-a.s.
Moreover, let P and Q be probability measures on the space (Ω, F) with Q ≪ P and L = dQ/dP. It can be shown that X ∈ L¹(Ω, Q) if and only if XL ∈ L¹(Ω, P), and in this case we have
E_Q[X] = E_P[XL]   (3.5)
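Identity (3.5) is, for instance, the basis of importance sampling: an expectation under Q can be computed by sampling under P and weighting with L = dQ/dP. A sketch with the illustrative choice P = N(0, 1) and Q = N(θ, 1), for which L(x) = exp(θx − θ²/2) (this example is an assumption, not from the text):

```python
import numpy as np

rng = np.random.default_rng(6)

# P = N(0,1), Q = N(theta,1); density L(x) = dQ/dP = exp(theta*x - theta^2/2)
theta, n = 1.0, 400_000
x = rng.standard_normal(n)           # samples drawn under P
L = np.exp(theta * x - 0.5 * theta**2)

# (3.5): E_Q[X] = E_P[X L]; under Q the mean of X is theta
est = np.mean(x * L)
print(est, theta)

# The weights average to one: E_P[L] = Q(Omega) = 1
print(np.mean(L))
```

The second check reflects the requirement E[X] = 1 made above: a Radon–Nikodym derivative of one probability measure w.r.t. another integrates to one.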
We recall here the notion of conditional expectation. Let X be a real P-summable random variable on a probability space (Ω, F, P) and let G ⊂ F be a σ-algebra. Let Y be a random variable such that
• Y is P-integrable and G-measurable;
• ∫_A X dP = ∫_A Y dP for all A ∈ G.
We say that Y is the conditional expectation of X given G and we write Y = E[X | G].
Of course, Y is defined P-almost surely. The proof of the existence of a random variable with these properties makes use of the Radon–Nikodym theorem.
Suppose, for the moment, that X is a positive and P-integrable random variable on (Ω, F, P) and G is a sub-σ-algebra. If we define a measure Q as
Q(A) = ∫_A X dP,  A ∈ G   (3.6)
then Q ≪ P on G. From the Radon–Nikodym theorem, we know that there exists a G-measurable and positive random variable L such that
∫_A X dP = Q(A) = ∫_A L dP  for all A ∈ G
so that E[X | G] = L, P-a.s. In the case of a general random variable we use the previous result on X⁺ and X⁻.
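When G is generated by a finite partition of Ω, the construction above becomes concrete: E[X | G] is constant on every atom of the partition and equals the average of X over that atom. A minimal Monte Carlo sketch (the partition and the distribution of X are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

# Sample points of Omega; G generated by a finite partition into 4 atoms,
# encoded by a discrete label (an illustrative construction)
n = 120_000
X = rng.exponential(1.0, n)
labels = rng.integers(0, 4, n)   # which atom each sample point belongs to

# E[X|G] is constant on every atom, equal to the average of X on that atom
cond_exp = np.empty(n)
for a in range(4):
    mask = labels == a
    cond_exp[mask] = X[mask].mean()

# Defining property: the integral of E[X|G] over each atom A matches that of X
for a in range(4):
    mask = labels == a
    assert abs(cond_exp[mask].sum() - X[mask].sum()) < 1e-5
```

Taking A = Ω in the defining property recovers the tower rule E[E[X | G]] = E[X].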
Now, we have all the ingredients to state the main results about the change of measure.
Let (Ω, F, P) be a probability space, G ⊂ F a σ-algebra, Q a probability