Stochastic Processes

(Master degree in Engineering)

Franco Flandoli


Contents

Preface

Chapter 1. Preliminaries of Probability
  1. Transformation of densities
  2. About covariance matrices
  3. Gaussian vectors

Chapter 2. Stochastic processes. Generalities
  1. Discrete time stochastic process
  2. Stationary processes
  3. Time series and empirical quantities
  4. Gaussian processes
  5. Discrete time Fourier transform
  6. Power spectral density
  7. Fundamental theorem on PSD
  8. Signal to noise ratio
  9. An ergodic theorem

Chapter 3. ARIMA models
  1. Definitions
  2. Stationarity, ARMA and ARIMA processes
  3. Correlation function
  4. Power spectral density

Preface

These notes are planned to be the last part of a course of Probability and Stochastic Processes. The first part is devoted to the introduction to the following topics, taken for instance from the book of Baldi (in Italian) or of Billingsley (in English):

Probability space $(\Omega, \mathcal{F}, P)$
Conditional probability and independence of events
Factorization formula and Bayes formula
Concept of random variable X, random vector $X = (X_1, \dots, X_n)$
Law of a r.v., probability density (discrete and continuous)
Distribution function and quantiles
Joint law of a vector and marginal laws, relations
(Transformation of densities and moments) (see complements below)
Expectation, properties
Moments, variance, standard deviation, properties
Covariance and correlation coefficient, covariance matrix
Generating function and characteristic function
(Discrete r.v.: Bernoulli, binomial, Poisson, geometric)
Continuous r.v.: uniform, exponential, Gaussian, Weibull, Gamma
Notions of convergence of r.v.
(Limit theorems: LLN, CLT; Chebyshev inequality.)

Since we need some more specialized material, Chapter 1 is a complement to this list of items.


Chapter 1. Preliminaries of Probability

1. Transformation of densities

Exercise 1. If X has cdf $F_X(x)$ and g is increasing and continuous, then $Y = g(X)$ has cdf
$$F_Y(y) = F_X(g^{-1}(y))$$
for all y in the image of g. If g is decreasing and continuous, the formula is
$$F_Y(y) = 1 - F_X(g^{-1}(y)).$$

Exercise 2. If X has continuous pdf $f_X(x)$ and g is increasing and differentiable, then $Y = g(X)$ has pdf
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}$$
for all y in the image of g. If g is decreasing and differentiable, the formula is
$$f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.$$
Thus, in general, we have the following result.

Proposition 1. If g is monotone and differentiable, the transformation of densities is given by
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}.$$

Remark 1. Under proper assumptions, when g is not injective the formula generalizes to
$$f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}.$$
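As a quick numerical illustration (added here, not part of the original notes), one can check Proposition 1 in R for the increasing, differentiable map $g(x) = e^x$ applied to a standard Gaussian X; the sample size and plotting range are arbitrary choices.

set.seed(1)
x <- rnorm(100000)
y <- exp(x)                              # Y = g(X) with g(x) = exp(x), increasing and differentiable
f_Y <- function(y) dnorm(log(y)) / y     # f_X(g^{-1}(y)) / |g'(g^{-1}(y))|, as in Proposition 1
plot(density(y, from = 0.01, to = 6), main = "Y = exp(X): empirical vs. formula")
curve(f_Y(x), from = 0.01, to = 6, add = TRUE, col = "red", lty = 2)

The kernel density estimate of the simulated sample and the curve given by the formula should essentially overlap.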

Remark 2. A second proof of the previous formula comes from the following characterization of the density: f is the density of X if and only if
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx$$
for all continuous bounded functions h. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function h. We have, from the definition of Y and from the characterization applied to X,
$$E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x)) f(x)\, dx.$$

Let us change variable $y = g(x)$, under the assumption that g is monotone, bijective and differentiable. We have $x = g^{-1}(y)$, $dx = \frac{1}{|g'(g^{-1}(y))|}\, dy$ (we put the absolute value since we do not change the extremes of integration, but just rewrite $\int_{\mathbb{R}}$), so that
$$\int_{\mathbb{R}} h(g(x)) f(x)\, dx = \int_{\mathbb{R}} h(y)\, f(g^{-1}(y))\, \frac{1}{|g'(g^{-1}(y))|}\, dy.$$
If we set $f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$, we have proved that
$$E[h(Y)] = \int_{\mathbb{R}} h(y) f_Y(y)\, dy$$
for every continuous bounded function h. By the characterization, this implies that $f_Y(y)$ is the density of Y. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where $Dg$ is the Jacobian (the matrix of first derivatives) of the transformation $g : \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
$$dx = |\det Dg^{-1}(y)|\, dy = \frac{1}{|\det Dg(g^{-1}(y))|}\, dy.$$
With the same passages performed above, one gets the following result.

Proposition 2. If g is a differentiable bijection and $Y = g(X)$, then
$$f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y=g(x)}.$$

Exercise 3. If X (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where U is an orthogonal linear transformation of $\mathbb{R}^n$ (meaning that $U^{-1} = U^T$), then Y has density
$$f_Y(y) = f_X(U^T y).$$

1.1. Linear transformation of moments. The solution of the following exercises is based on the linearity of the expected value (and thus of the covariance in each argument).

Exercise 4. Let $X = (X_1, \dots, X_n)$ be a random vector, A a $d \times n$ matrix, $Y = AX$. Let $\mu_X = (\mu_{X_1}, \dots, \mu_{X_n})$ be the vector of mean values of X, namely $\mu_{X_i} = E[X_i]$. Then $\mu_Y := A \mu_X$ is the vector of mean values of Y, namely $\mu_{Y_i} = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of X and Y, then
$$Q_Y = A Q_X A^T.$$
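A possible Monte Carlo check of Exercises 4 and 5 in R (an added illustration; the $2 \times 3$ matrix A and the means below are arbitrary choices):

set.seed(2)
n <- 100000
X <- cbind(rnorm(n, mean = 1), rnorm(n, mean = -2), rnorm(n, mean = 0.5))  # rows are samples of X in R^3
A <- matrix(c(1, 0,  2,
              0, 3, -1), nrow = 2, byrow = TRUE)                           # a 2 x 3 matrix
Y <- X %*% t(A)                          # each row is A x for one sample x
colMeans(Y); A %*% colMeans(X)           # empirical mean of Y versus A mu_X
cov(Y);      A %*% cov(X) %*% t(A)       # empirical covariance of Y versus A Q_X A^T

The two pairs of outputs should agree up to Monte Carlo error.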


2. About covariance matrices

The covariance matrix Q of a vector $X = (X_1, \dots, X_n)$, defined as $Q_{ij} = \mathrm{Cov}(X_i, X_j)$, is symmetric:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = Q_{ji}$$
and non-negative definite:
$$x^T Q x = \sum_{i,j=1}^n Q_{ij} x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j)\, x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(x_i X_i, x_j X_j) = \mathrm{Cov}\Big(\sum_{i=1}^n x_i X_i, \sum_{j=1}^n x_j X_j\Big) = \mathrm{Var}[W] \ge 0$$
where $W = \sum_{i=1}^n x_i X_i$.

The spectral theorem states that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ in which Q takes the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors. Since the covariance matrix Q is also non-negative definite, we have
$$\lambda_i \ge 0, \qquad i = 1, \dots, n.$$

Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \dots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \dots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \dots, n$. A linear map L in $\mathbb{R}^n$, given the basis $u_1, \dots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j \rangle$.

We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \dots, u_n$, and given the matrix Q, a linear transformation L from $\mathbb{R}^n$ to $\mathbb{R}^n$ is defined. The spectral theorem states that there is a new orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that, if $Q_e$ represents the linear transformation L in this new basis, then $Q_e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \dots, u_n$, which we call the canonical or original basis. Let $e_1, \dots, e_n$ be another orthonormal basis. The vector $u_1$, in the canonical basis, has components
$$u_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on for the other vectors. Each vector $e_j$ has certain components. Denote by U the matrix whose first column has the same components as $e_1$ (those in the canonical basis), and so on for the other columns. We could write $U = (e_1, \dots, e_n)$. Also, $U_{ij} = e_j^T u_i$. Then
$$U \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1$$
and so on, namely U represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \dots, e_n$. This is an orthogonal transformation:
$$U^{-1} = U^T.$$
Indeed, $U^{-1}$ maps $e_1, \dots, e_n$ into the canonical basis (by the above property of U), and $U^T$ does the same:
$$U^T e_1 = \begin{pmatrix} e_1^T e_1 \\ e_2^T e_1 \\ \vdots \\ e_n^T e_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on.

Remark 7. Let us now go back to the covariance matrix Q and the matrix $Q_e$ given by the spectral theorem: $Q_e$ is a diagonal matrix which represents the same linear transformation L in a new basis $e_1, \dots, e_n$. Assume we do not know anything else, except that they describe the same map L and that $Q_e$ is diagonal, namely of the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Let us deduce a number of facts:

i) $Q_e = U Q U^T$;
ii) the diagonal elements $\lambda_j$ are eigenvalues of L, with eigenvectors $e_j$;
iii) $\lambda_j \ge 0$, $j = 1, \dots, n$.

To prove (i), recall from above that
$$(Q_e)_{ij} = e_j^T L e_i \quad \text{and} \quad Q_{ij} = u_j^T L u_i.$$

Moreover, $U_{ij} = e_j^T u_i$, hence $e_j = \sum_{k=1}^n U_{kj} u_k$, and thus
$$(Q_e)_{ij} = e_j^T L e_i = \sum_{k,k'=1}^n U_{ki} U_{k'j}\, u_{k'}^T L u_k = \sum_{k,k'=1}^n U_{ki}\, Q_{kk'}\, U_{k'j} = (U Q U^T)_{ij}.$$

To prove (ii), let us write the vector $L e_1$ in the basis $e_1, \dots, e_n$: $e_1$ is the vector $(1, 0, \dots, 0)^T$, the map L is represented by $Q_e$, hence $L e_1$ is equal to
$$Q_e \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
which is $\lambda_1 e_1$ in the basis $e_1, \dots, e_n$. We have checked that $L e_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ is a corresponding eigenvector. The proof for $\lambda_2$, etc., is the same. To prove (iii), just note that, in the basis $e_1, \dots, e_n$,
$$e_j^T Q_e e_j = \lambda_j.$$
But
$$e_j^T Q_e e_j = e_j^T U Q U^T e_j = v^T Q v \ge 0$$
where $v = U^T e_j$, having used the property that Q is non-negative definite. Hence $\lambda_j \ge 0$.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{|x-\mu|^2}{2\sigma^2}\right).$$
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If Z is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$.

We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q (namely $v^T Q v > 0$ for all $v \neq 0$), consider the function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right)$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x-\mu)^T Q^{-1} (x-\mu)$ is a positive quantity for $x \neq \mu$, and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \dots, X_n)$ is a random vector with such joint probability density, then $\mu$ is the vector of mean values, namely
$$\mu_i = E[X_i],$$
and Q is the covariance matrix:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j).$$

Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We have recalled above that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ in which Q takes the form
$$Q_e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let U be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q_e = U Q U^T$.

Since $v^T Q v > 0$ for all $v \neq 0$, we deduce
$$v^T Q_e v = v^T U Q U^T v > 0$$
for all $v \neq 0$ (since $U^T v \neq 0$). Taking $v = e_i$, we get $\lambda_i > 0$.

Therefore the matrix $Q_e$ is invertible, with inverse given by
$$Q_e^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.$$
It follows that also Q, being equal to $U^T Q_e U$ (the relation $Q = U^T Q_e U$ comes from $Q_e = U Q U^T$), is invertible, with inverse $Q^{-1} = U^T Q_e^{-1} U$. Easily one gets $(x-\mu)^T Q^{-1} (x-\mu) > 0$ for $x \neq \mu$. Moreover,
$$\det(Q) = \det(U^T)\, \det(Q_e)\, \det(U) = \lambda_1 \cdots \lambda_n$$
because
$$\det(Q_e) = \lambda_1 \cdots \lambda_n$$
and $\det(U)^2 = 1$. The latter property comes from
$$1 = \det I = \det(U^T U) = \det(U^T) \det(U) = \det(U)^2$$
(a fact also used in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.

Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables $x = U^T y$,
$$\int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f(U^T y)\, dy$$
because $|\det U^T| = 1$ (and the Jacobian of a linear transformation is the linear map itself). Now, since $U Q^{-1} U^T = Q_e^{-1}$, $f(U^T y)$ is equal to the following function:
$$f_e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q_e)}} \exp\left(-\frac{(y-\mu_e)^T Q_e^{-1} (y-\mu_e)}{2}\right)$$
where $\mu_e = U\mu$. Since
$$(y-\mu_e)^T Q_e^{-1} (y-\mu_e) = \sum_{i=1}^n \frac{(y_i - (\mu_e)_i)^2}{\lambda_i}$$
and $\det(Q_e) = \lambda_1 \cdots \lambda_n$, we get
$$f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).$$
Namely, $f_e(y)$ is the product of n Gaussian densities $N((\mu_e)_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f_e(y)$ is a probability density. Therefore $\int_{\mathbb{R}^n} f_e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that f is a probability density.

Step 3. Let $X = (X_1, \dots, X_n)$ be a random vector with joint probability density f, written in the original basis. Let $Y = UX$. Then (Exercise 3) Y has density $f_Y(y)$ given by $f_Y(y) = f(U^T y)$. Thus
$$f_Y(y) = f_e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - (\mu_e)_i)^2}{2\lambda_i}\right).$$
Thus $(Y_1, \dots, Y_n)$ are independent $N((\mu_e)_i, \lambda_i)$ r.v.'s and therefore
$$E[Y_i] = (\mu_e)_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij}\, \lambda_i.$$
From Exercises 4 and 5 we deduce that $X = U^T Y$ has mean
$$\mu_X = U^T \mu_Y$$
and covariance
$$Q_X = U^T Q_Y U.$$
Since $\mu_Y = \mu_e$ and $\mu_e = U\mu$, we readily deduce $\mu_X = U^T U \mu = \mu$. Since $Q_Y = Q_e$ and $Q = U^T Q_e U$, we get $Q_X = Q$. The proof is complete.

Definition 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q, we call Gaussian vector of mean $\mu$ and covariance Q a random vector $X = (X_1, \dots, X_n)$ having joint probability density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right)$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.


The only drawback of this definition is the restriction to strictly positive definite matrices Q. It is sometimes useful to have the notion of Gaussian vector also in the case when Q is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.

Definition 2. i) The standard d-dimensional Gaussian vector is the random vector $Z = (Z_1, \dots, Z_d)$ with joint probability density
$$f(z_1, \dots, z_d) = \prod_{i=1}^d p(z_i) \quad \text{where} \quad p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}.$$
ii) All other Gaussian vectors $X = (X_1, \dots, X_n)$ (in any dimension n) are obtained from standard ones by affine transformations:
$$X = AZ + b$$
where A is a matrix and b is a vector. If X has dimension n, we require A to be an $n \times d$ matrix and b to have dimension n (but n can be different from d).

The graph of the density of a standard 2-dimensional Gaussian vector is shown below.

[Figure: bell-shaped surface of the standard 2-dimensional Gaussian density over the (x, y) plane.]

The graph of the density of the other Gaussian vectors can be guessed by linear deformations of the base plane xy (deformations defined by A) and shifts (by b). For instance, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$
the matrix which enlarges the x axis by a factor 2, we get the graph below.

[Figure: the same bell-shaped surface stretched by a factor 2 along the x axis.]

First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with Z of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix Q of a vector X of the previous form are given by
$$\mu = b, \qquad Q = AA^T.$$

When two different definitions are given for the same object, one has to prove their equivalence. If Q is positive definite, the two definitions aim to describe the same object; for Q non-negative definite but not strictly positive definite, we have only the second definition, so there is no compatibility to check.

Proposition 4. If Q is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance Q in the sense of Definition 1, then there exist a standard Gaussian random vector $Z = (Z_1, \dots, Z_n)$ and an $n \times n$ matrix A such that
$$X = AZ + \mu.$$
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then X is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance Q given by the previous proposition.

Proof. Let us prove the first claim. Let us define $\sqrt{Q} = U^T \sqrt{Q_e}\, U$, where $\sqrt{Q_e}$ is simply defined as
$$\sqrt{Q_e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.$$
We have
$$\left(\sqrt{Q}\right)^T = U^T \left(\sqrt{Q_e}\right)^T U = U^T \sqrt{Q_e}\, U = \sqrt{Q}$$
and
$$\left(\sqrt{Q}\right)^2 = U^T \sqrt{Q_e}\, U\, U^T \sqrt{Q_e}\, U = U^T \sqrt{Q_e}\sqrt{Q_e}\, U = U^T Q_e U = Q$$
because $\sqrt{Q_e}\sqrt{Q_e} = Q_e$. Set

$$Z = \sqrt{Q}^{-1}(X - \mu),$$
where we notice that $\sqrt{Q}$ is invertible, from its definition and the strict positivity of the $\lambda_i$. Then Z is Gaussian. Indeed, from the formula for the transformation of densities,
$$f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)}$$
where $g(x) = \sqrt{Q}^{-1}(x - \mu)$; hence $\det Dg(x) = \det\left(\sqrt{Q}^{-1}\right) = \frac{1}{\sqrt{\lambda_1} \cdots \sqrt{\lambda_n}}$; therefore
$$f_Z(z) = \prod_{i=1}^n \sqrt{\lambda_i}\; \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(\sqrt{Q}z + \mu - \mu)^T Q^{-1} (\sqrt{Q}z + \mu - \mu)}{2}\right)$$
$$= \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{(\sqrt{Q}z)^T Q^{-1} (\sqrt{Q}z)}{2}\right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{z^T z}{2}\right)$$
which is the density of a standard Gaussian vector. From the definition of Z we get $X = \sqrt{Q}\, Z + \mu$, so the first claim is proved.

The proof of the second claim is a particular case of the next exercise, which we leave to the reader.

Exercise 6. Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector, B an $m \times n$ matrix, c a vector of $\mathbb{R}^m$. Then
$$Y = BX + c$$
is a Gaussian random vector of dimension m. The relation between the means is
$$\mu_Y = B\mu_X + c$$
and between the covariances
$$Q_Y = B Q_X B^T.$$

Remark 8. We see from the exercise that we may start with a non-degenerate vector X and get a degenerate one Y , if B is not a bijection. This always happens when m > n.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean $\mu$ and covariance Q, n-dimensional, and want to generate a random sample $(x_1, \dots, x_n)$ from such an $N(\mu, Q)$. Then we may generate n independent samples $z_1, \dots, z_n$ from the standard one-dimensional Gaussian law and compute
$$\sqrt{Q}\, z + \mu$$
where $z = (z_1, \dots, z_n)$. In order to obtain the entries of the matrix $\sqrt{Q}$, if the software does not provide them directly (certain packages do), we may use the formula $\sqrt{Q} = U^T \sqrt{Q_e}\, U$. The matrix $\sqrt{Q_e}$ is obvious. In order to get the matrix U, recall that its columns are the vectors $e_1, \dots, e_n$ written in the original basis, and such vectors are an orthonormal basis of eigenvectors of Q. Thus one has to use at least a software package that computes the spectral decomposition of a matrix, to get $e_1, \dots, e_n$.
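A minimal R sketch of this recipe (the values of $\mu$ and Q below are arbitrary; eigen() returns the orthonormal eigenvectors $e_1, \dots, e_n$ as the columns of a matrix, from which the square root of Q is assembled):

mu <- c(1, -2)
Q  <- matrix(c(4, 1,
               1, 2), nrow = 2)              # a symmetric positive definite covariance matrix
dec   <- eigen(Q, symmetric = TRUE)          # spectral decomposition of Q
sqrtQ <- dec$vectors %*% diag(sqrt(dec$values)) %*% t(dec$vectors)   # square root of Q
set.seed(3)
z <- matrix(rnorm(2 * 10000), nrow = 2)      # columns are independent standard Gaussian samples
x <- sqrtQ %*% z + mu                        # columns are samples from N(mu, Q)
rowMeans(x)                                  # should be close to mu
cov(t(x))                                    # should be close to Q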


Chapter 2. Stochastic processes. Generalities

1. Discrete time stochastic process

We call a discrete time stochastic process any sequence $X_0, X_1, X_2, \dots, X_n, \dots$ of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in $\mathbb{R}$. This definition is not so rigid with respect to small details: the same name is given to sequences $X_1, X_2, \dots, X_n, \dots$, or to the case when the r.v.'s $X_n$ take values in a space different from $\mathbb{R}$. We shall also describe below the case when the time index takes negative values.

The main objects attached to a r.v. are its law, its first and second moments (and possibly higher order moments, the characteristic or generating function, and the distribution function). We do the same for a process $(X_n)_{n \ge 0}$: the probability density of the r.v. $X_n$, when it exists, will be denoted by $f_n(x)$, the mean by $\mu_n$, the standard deviation by $\sigma_n$. Often we shall write t in place of n, but nevertheless here t will always be a non-negative integer. So, our first concepts are:

i) the mean function and variance function:
$$\mu_t = E[X_t], \qquad \sigma_t^2 = \mathrm{Var}[X_t], \qquad t = 0, 1, 2, \dots$$
In addition, the time-correlation is very important. We introduce three functions:

ii) the autocovariance function $C(t,s)$, $t, s = 0, 1, 2, \dots$:
$$C(t,s) = E[(X_t - \mu_t)(X_s - \mu_s)]$$
and the function
$$R(t,s) = E[X_t X_s]$$
(the name will be discussed below). They are symmetric ($R(t,s) = R(s,t)$, and the same for $C(t,s)$), so it is sufficient to know them for $t \ge s$. We have
$$C(t,s) = R(t,s) - \mu_t \mu_s, \qquad C(t,t) = \sigma_t^2.$$
In particular, when $\mu_t \equiv 0$ (which is often the case), $C(t,s) = R(t,s)$. Most of the importance will be given to $\mu_t$ and $R(t,s)$. In addition, let us introduce:

iii) the autocorrelation function
$$\rho(t,s) = \frac{C(t,s)}{\sigma_t \sigma_s}.$$
We have
$$\rho(t,t) = 1, \qquad |\rho(t,s)| \le 1.$$
The functions $C(t,s)$, $R(t,s)$, $\rho(t,s)$ are used to detect repetitions in the process, self-similarities under time shift. For instance, if $(X_n)_{n \ge 0}$ is roughly periodic of period P, $\rho(t+P, t)$ will be significantly higher than the other values of $\rho(t,s)$ (except $\rho(t,t)$, which is always equal to 1). Also a trend is a form of repetition, of self-similarity under time shift, and indeed when there is a trend all values of $\rho(t,s)$ are quite high compared to the cases without trend. See the numerical example below.

Other objects (when defined) related to the time structure are:

iv) the joint probability density
$$f_{t_1, \dots, t_n}(x_1, \dots, x_n), \qquad t_n \ge \dots \ge t_1,$$
of the vector $(X_{t_1}, \dots, X_{t_n})$, and

v) the conditional density
$$f_{t|s}(x|y) = \frac{f_{t,s}(x,y)}{f_s(y)}, \qquad t > s.$$

Now, a remark about the name of $R(t,s)$. In Statistics and Time Series Analysis, the name autocorrelation function is given to $\rho(t,s)$, as we said above. But in certain disciplines related to signal processing, $R(t,s)$ is called the autocorrelation function. There is no special reason except the fact that $R(t,s)$ is the fundamental quantity to be understood and investigated, the others ($C(t,s)$ and $\rho(t,s)$) being simple transformations of $R(t,s)$; thus $R(t,s)$ is given the name which most reminds one of the concept of self-relation between values of the process at different times. In the sequel we shall use both languages and sometimes we shall call $\rho(t,s)$ the autocorrelation coefficient.

The last object we introduce is concerned with two processes simultaneously, $(X_n)_{n \ge 0}$ and $(Y_n)_{n \ge 0}$. It is called:

vi) the cross-correlation function
$$C_{X,Y}(t,s) = E[(X_t - E[X_t])(Y_s - E[Y_s])].$$
This function is a measure of the similarity between two processes, shifted in time. For instance, it can be used for the following purpose: one of the two processes, say Y, is known and has a known shape of interest for us; the other process, X, is the process under investigation, and we would like to detect portions of X which have a shape similar to Y. Hence we shift X in all possible ways and compute the correlation with Y.

When more than one process is investigated, it may be better to write $R_X(t,s)$, $C_X(t,s)$ and so on for the quantities associated with the process X.

1.1. Example 1: white noise. The white noise with intensity $\sigma^2$ is the process $(X_n)_{n \ge 0}$ with the following properties:

i) $X_0, X_1, X_2, \dots, X_n, \dots$ are independent r.v.'s;
ii) $X_n \sim N(0, \sigma^2)$.

It is a very elementary process, with a trivial time-structure, but it will be used as a building block for other classes of processes, or as a comparison object to understand the features of more complex cases. The following picture has been obtained with the R software by the commands x<-rnorm(1000); ts.plot(x).

[Figure: sample trajectory of a white noise of length 1000.]

Let us compute all its relevant quantities (the check is left as an exercise):
$$\mu_t = 0, \qquad \sigma_t^2 = \sigma^2, \qquad R(t,s) = C(t,s) = \sigma^2 \delta(t-s),$$
where the symbol $\delta(t-s)$ denotes 0 for $t \neq s$ and 1 for $t = s$,
$$\rho(t,s) = \delta(t-s), \qquad f_{t_1,\dots,t_n}(x_1,\dots,x_n) = \prod_{i=1}^n p(x_i) \quad \text{where} \quad p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}},$$
$$f_{t|s}(x|y) = p(x).$$

1.2. Example 2: random walk. Let $(W_n)_{n \ge 0}$ be a white noise (or, more generally, a process with independent identically distributed $W_0, W_1, W_2, \dots$). Set
$$X_0 = 0, \qquad X_{n+1} = X_n + W_n, \quad n \ge 0.$$
This is a random walk. White noise has been used as a building block: $(X_n)_{n \ge 0}$ is the solution of a recursive linear equation, driven by white noise (we shall see more general examples later on). The following picture has been obtained with the R software by the commands x<-rnorm(1000); y<-cumsum(x); ts.plot(y).

[Figure: sample trajectory of a random walk of length 1000.]

The random variables $X_n$ are not independent ($X_{n+1}$ obviously depends on $X_n$). One has
$$X_{n+1} = \sum_{i=0}^n W_i.$$
We have the following facts. We prove them by means of the iterative relation (this generalizes better to more complex discrete linear equations). First,
$$\mu_0 = 0, \qquad \mu_{n+1} = \mu_n, \quad n \ge 0,$$
hence $\mu_n = 0$ for every $n \ge 0$.

By induction, $X_n$ and $W_n$ are independent for every n, hence:

Exercise 7. Denote by $\sigma^2$ the intensity of the white noise; find a relation between $\sigma_{n+1}^2$ and $\sigma_n^2$ and prove that
$$\sigma_n = \sqrt{n}\, \sigma, \quad n \ge 0.$$
An intuitive interpretation of the result of the exercise is that $X_n$ behaves like $\sqrt{n}$, in a very rough way.

As to the time-dependent structure, $C(t,s) = R(t,s)$, and:

Exercise 8. Prove that $R(m,n) = n\sigma^2$ for all $m \ge n$ (prove it for $m = n$, $m = n+1$, $m = n+2$ and extend). Then prove that
$$\rho(m,n) = \sqrt{\frac{n}{m}}.$$
The result of this exercise implies that
$$\rho(m,1) \to 0 \quad \text{as } m \to \infty.$$
We may interpret this result by saying that the random walk loses memory of the initial position.
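A Monte Carlo illustration of the two exercises (added here; the intensity $\sigma^2 = 1$ and the time horizon are arbitrary choices):

set.seed(4)
n_steps <- 400
n_paths <- 5000
W <- matrix(rnorm(n_paths * n_steps), nrow = n_paths)   # white noise increments, sigma = 1
X_n <- rowSums(W)                        # value of each random walk after n_steps steps
sd(X_n)                                  # close to sqrt(n_steps) = 20, as in Exercise 7
cor(rowSums(W[, 1:100]), X_n)            # close to sqrt(100/400) = 0.5, as in Exercise 8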

2. Stationary processes

A process is called wide-sense stationary if $\mu_t$ and $R(t+n, t)$ are independent of t. It follows that also $\sigma_t$, $C(t+n, t)$ and $\rho(t+n, t)$ are independent of t. Thus we speak of:

i) the mean $\mu$;
ii) the standard deviation $\sigma$;
iii) the covariance function $C(n) := C(n, 0)$;
iv) the autocorrelation function (in the improper sense described above) $R(n) := R(n, 0)$;
v) the autocorrelation coefficient (or also autocorrelation function, in the language of Statistics) $\rho(n) := \rho(n, 0)$.

A process is called strongly stationary if the law of the generic vector $(X_{n_1+t}, \dots, X_{n_k+t})$ is independent of t. This implies wide-sense stationarity. The converse is not true in general, but it is true for Gaussian processes (see below).

2.1. Example: white noise. We have
$$R(t,s) = \sigma^2 \delta(t-s),$$
hence
$$R(n) = \sigma^2 \delta(n).$$

2.2. Example: linear equation with damping. Consider the recurrence relation
$$X_{n+1} = \alpha X_n + W_n, \quad n \ge 0,$$
where $(W_n)_{n \ge 0}$ is a white noise with intensity $\sigma^2$ and
$$\alpha \in (-1, 1).$$
The following picture has been obtained with the R software by the following commands ($\alpha = 0.9$, $X_0 = 0$):

w <- rnorm(1000)
x <- rnorm(1000)
x[1] = 0
for (i in 1:999) {
  x[i+1] <- 0.9*x[i] + w[i]
}
ts.plot(x)

[Figure: sample trajectory of the damped linear equation with alpha = 0.9.]

It has some features similar to white noise, but it is less random, more persistent in the direction where it moves.

Let $X_0$ be a r.v. independent of the white noise, with zero average and variance $\tilde{\sigma}^2$. Let us show that $(X_n)_{n \ge 0}$ is stationary (in the wide sense) if $\tilde{\sigma}^2$ is properly chosen with respect to $\sigma^2$.

First, we have
$$\mu_0 = 0, \qquad \mu_{n+1} = \alpha \mu_n, \quad n \ge 0,$$
hence $\mu_n = 0$ for every $n \ge 0$. The mean function is constant.

As a preliminary computation, let us impose that the variance function is constant. By induction, $X_n$ and $W_n$ are independent for every n, hence
$$\sigma_{n+1}^2 = \alpha^2 \sigma_n^2 + \sigma^2, \quad n \ge 0.$$
If we want $\sigma_{n+1}^2 = \sigma_n^2$ for every $n \ge 0$, we need
$$\sigma_n^2 = \alpha^2 \sigma_n^2 + \sigma^2, \quad n \ge 0,$$
namely
$$\sigma_n^2 = \frac{\sigma^2}{1 - \alpha^2}, \quad n \ge 0.$$
In particular, this implies the relation
$$\tilde{\sigma}^2 = \frac{\sigma^2}{1 - \alpha^2}.$$
It is here that we first see the importance of the condition $|\alpha| < 1$.

If we assume this condition on the law of $X_0$, then we find
$$\sigma_1^2 = \alpha^2 \frac{\sigma^2}{1 - \alpha^2} + \sigma^2 = \frac{\sigma^2}{1 - \alpha^2} = \sigma_0^2$$
and so on, $\sigma_{n+1}^2 = \sigma_n^2$ for every $n \ge 0$. Thus the variance function is constant.

Finally, we have to show that $R(t+n, t)$ is independent of t. We have
$$R(t+1, t) = E[(\alpha X_t + W_t) X_t] = \alpha \sigma_t^2 = \frac{\alpha \sigma^2}{1 - \alpha^2},$$
which is independent of t;
$$R(t+2, t) = E[(\alpha X_{t+1} + W_{t+1}) X_t] = \alpha R(t+1, t) = \frac{\alpha^2 \sigma^2}{1 - \alpha^2},$$
and so on,
$$R(t+n, t) = E[(\alpha X_{t+n-1} + W_{t+n-1}) X_t] = \alpha R(t+n-1, t) = \dots = \alpha^n R(t, t) = \frac{\alpha^n \sigma^2}{1 - \alpha^2},$$
which is independent of t. The process is stationary. We have
$$R(n) = \frac{\alpha^n \sigma^2}{1 - \alpha^2}.$$
It also follows that
$$\rho(n) = \alpha^n.$$
The autocorrelation coefficient (as well as the autocovariance function) decays exponentially in time.
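A short R check (an added illustration with $\alpha = 0.9$ and $\sigma = 1$) that the empirical autocorrelation of a simulated trajectory matches $\rho(n) = \alpha^n$ when $X_0$ is drawn from the stationary law:

set.seed(5)
alpha <- 0.9
N <- 10000
w <- rnorm(N)
x <- numeric(N)
x[1] <- rnorm(1, sd = 1 / sqrt(1 - alpha^2))        # X_0 from the stationary law
for (i in 1:(N - 1)) x[i + 1] <- alpha * x[i] + w[i]
emp <- acf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]
round(cbind(empirical = emp, theoretical = alpha^(0:10)), 3)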

2.3. Processes defined also for negative times. We may extend the previous definitions a little and call a discrete time stochastic process also the two-sided sequences $(X_n)_{n \in \mathbb{Z}}$ of random variables. Such processes are thus defined also for negative times. The idea is that the physical process they represent started in the far past and continues in the future.

This notion is particularly natural in the case of stationary processes. The function $R(n)$ (similarly for $C(n)$ and $\rho(n)$) is thus defined also for negative n:
$$R(n) = E[X_n X_0], \quad n \in \mathbb{Z}.$$
By stationarity,
$$R(-n) = R(n)$$
because $R(-n) = E[X_{-n} X_0] = E[X_{-n+n} X_{0+n}] = E[X_0 X_n] = R(n)$. Therefore we see that this extension does not contain much new information; however, it is useful, or at least it simplifies some computations.

3. Time series and empirical quantities

A time series is a sequence of real numbers $x_1, \dots, x_n$. Empirical samples also have the same form. The name time series is appropriate when the index i of $x_i$ has the meaning of time.

A finite realization of a stochastic process is a time series. Ideally, when we have an experimental time series, we think that there is a stochastic process behind it. Thus we try to apply the theory of stochastic processes.

Recall from elementary statistics that empirical estimates of mean values of a single r.v. X are computed from an empirical sample $x_1, \dots, x_n$ of that r.v.; the higher n is, the better the estimate. A single sample $x_1$ is not sufficient to estimate moments of X.

Similarly, we may hope to compute empirical estimates of $R(t,s)$ etc. from time series. But here, when the stochastic process has special properties (stationary and ergodic; see below for the concept of ergodicity), one sample is sufficient! By "one sample" we mean one time series (which is one realization of the process, just as the single $x_1$ is one realization of the r.v. X). Again, the higher n is, the better the estimate, but here n refers to the length of the time series.

Consider a time series $x_1, \dots, x_n$. In the sequel, t and $n_t$ are such that
$$t + n_t = n.$$
Let us define
$$\bar{x}_t = \frac{1}{n_t} \sum_{i=1}^{n_t} x_{i+t}, \qquad \hat{\sigma}_t^2 = \frac{1}{n_t} \sum_{i=1}^{n_t} (x_{i+t} - \bar{x}_t)^2,$$
$$\hat{R}(t) = \frac{1}{n_t} \sum_{i=1}^{n_t} x_i x_{i+t}, \qquad \hat{C}(t) = \frac{1}{n_t} \sum_{i=1}^{n_t} (x_i - \bar{x}_0)(x_{i+t} - \bar{x}_t),$$
$$\hat{\rho}(t) = \frac{\hat{C}(t)}{\hat{\sigma}_0 \hat{\sigma}_t} = \frac{\sum_{i=1}^{n_t} (x_i - \bar{x}_0)(x_{i+t} - \bar{x}_t)}{\sqrt{\sum_{i=1}^{n_t} (x_i - \bar{x}_0)^2 \sum_{i=1}^{n_t} (x_{i+t} - \bar{x}_t)^2}}.$$
These quantities are taken as approximations of
$$\mu_t, \quad \sigma_t^2, \quad R(t, 0), \quad C(t, 0), \quad \rho(t, 0),$$
respectively. In the case of stationary processes, they are approximations of
$$\mu, \quad \sigma^2, \quad R(t), \quad C(t), \quad \rho(t).$$
In the section on ergodic theorems we shall see rigorous relations between these empirical and theoretical functions.
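As an illustration (a hypothetical helper, not part of the notes), the empirical quantities above can be coded directly in R; here $\bar{x}_0$ is taken as the mean of the full series, i.e. the case $t = 0$ of $\bar{x}_t$:

emp_stats <- function(x, t) {
  n  <- length(x)
  nt <- n - t
  x0 <- mean(x)                                  # xbar_0 (case t = 0, where n_0 = n)
  xt <- mean(x[(1 + t):n])                       # xbar_t
  Rhat   <- sum(x[1:nt] * x[(1 + t):n]) / nt
  Chat   <- sum((x[1:nt] - x0) * (x[(1 + t):n] - xt)) / nt
  rhohat <- sum((x[1:nt] - x0) * (x[(1 + t):n] - xt)) /
            sqrt(sum((x[1:nt] - x0)^2) * sum((x[(1 + t):n] - xt)^2))
  c(Rhat = Rhat, Chat = Chat, rhohat = rhohat)
}
set.seed(6)
emp_stats(rnorm(1000), t = 5)    # for a white noise series, rhohat is near 0 for t > 0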

The empirical correlation coefficient
$$\hat{\rho}_{X,Y} = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}}$$
between two sequences $x_1, \dots, x_n$ and $y_1, \dots, y_n$ is a measure of their linear similarity. If there are coefficients a and b such that the residuals
$$\varepsilon_i = y_i - (a x_i + b)$$
are small, then $|\hat{\rho}_{X,Y}|$ is close to 1; precisely, $\hat{\rho}_{X,Y}$ is close to 1 if $a > 0$ and close to $-1$ if $a < 0$. A value of $\hat{\rho}_{X,Y}$ close to 0 means that no such linear relation is really good (in the sense of small residuals). Precisely, the smallness of the residuals must be understood in comparison with the empirical variance $\hat{\sigma}_Y^2$ of $y_1, \dots, y_n$: one can prove that, for the least squares choice of a and b,
$$\hat{\rho}_{X,Y}^2 = 1 - \frac{\hat{\sigma}_\varepsilon^2}{\hat{\sigma}_Y^2}$$
(the so-called explained variance, the proportion of variance which has been explained by the linear model). After these remarks, the intuitive meaning of $\hat{R}(t)$, $\hat{C}(t)$ and $\hat{\rho}(t)$ should be clear: they measure the linear similarity between the time series and its t-translation. This is useful to detect repetitions, periodicity, and trend.
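The explained-variance identity can be verified numerically in R (an added illustration on simulated data; lm computes the least squares coefficients a and b):

set.seed(7)
x <- rnorm(200)
y <- 2 * x + 1 + rnorm(200, sd = 0.5)            # data with an approximate linear relation
fit <- lm(y ~ x)
res <- residuals(fit)
cor(x, y)^2                                      # empirical rho_{X,Y}^2
1 - mean(res^2) / mean((y - mean(y))^2)          # 1 - sigma_eps^2 / sigma_Y^2, the same number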

Example 1. Consider the following time series, taken from the EUROSTAT database. It collects export data concerning motor vehicle accessories, from January 1995 to December 2008.

[Figure: the EUROSTAT export time series, January 1995 - December 2008.]

Its empirical autocorrelation function $\hat{\rho}(t)$ is given by

[Figure: empirical autocorrelation function of the full series.]

We see high values (the values of $\hat{\rho}(t)$ are always smaller than 1 in absolute value) for all time lags t. The reason is the trend of the original time series (which is highly non-stationary).

Example 2. If we consider only the last few years of the same time series, precisely January 2005 - December 2008, the data are much more stationary and the trend is less strong. The autocorrelation function $\hat{\rho}(t)$ is now given by

[Figure: empirical autocorrelation function of the restricted series.]

where we notice a moderate annual periodicity.

4. Gaussian processes

If the generic vector $(X_{t_1}, \dots, X_{t_n})$ is jointly Gaussian, we say that the process is Gaussian. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. Hence the laws of the marginals of a Gaussian process are determined by the mean function $\mu_t$ and the autocorrelation function $R(t,s)$.

Proposition 5. For Gaussian processes, stationarity in the wide and in the strong sense are equivalent.

Proof. Given a Gaussian process $(X_n)_{n \in \mathbb{N}}$, the generic vector $(X_{t_1+s}, \dots, X_{t_n+s})$ is Gaussian, hence with law determined by the mean vector of components
$$E[X_{t_i+s}] = \mu_{t_i+s}$$
and the covariance matrix of components
$$\mathrm{Cov}(X_{t_i+s}, X_{t_j+s}) = R(t_i+s, t_j+s) - \mu_{t_i+s}\, \mu_{t_j+s}.$$
If the process is stationary in the wide sense, then $\mu_{t_i+s} = \mu$ and
$$R(t_i+s, t_j+s) - \mu_{t_i+s}\, \mu_{t_j+s} = R(t_i - t_j) - \mu^2$$
do not depend on s. Then the law of $(X_{t_1+s}, \dots, X_{t_n+s})$ does not depend on s. This means that the process is stationary in the strict sense. The converse is a general fact. The proof is complete.

Most of the models in these notes are obtained by linear transformations of white noise. White noise is a Gaussian process. Linear transformations preserve Gaussianity. Hence the resulting processes are Gaussian. Since we deal very often with processes which are stationary in the wide sense, being Gaussian they are also strictly stationary.

5. Discrete time Fourier transform

Given a sequence $(x_n)_{n \in \mathbb{Z}}$ of real or complex numbers such that $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, we denote by $\hat{x}(\omega)$ or by $\mathcal{F}[x](\omega)$ the discrete time Fourier transform (DTFT) defined as
$$\hat{x}(\omega) = \mathcal{F}[x](\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n, \qquad \omega \in [0, 2\pi].$$
The function can be considered for all $\omega \in \mathbb{R}$, but it is $2\pi$-periodic. Sometimes the factor $\frac{1}{\sqrt{2\pi}}$ is not included in the definition; sometimes it is preferable to use the variant
$$\hat{x}(f) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-2\pi i f n} x_n, \qquad f \in [0, 1].$$
We make the choice above, independently of the fact that in certain applications it is customary or convenient to make other choices. The factor $\frac{1}{\sqrt{2\pi}}$ is included for symmetry with the inverse transform and the Plancherel formula (without $\frac{1}{\sqrt{2\pi}}$, a factor $\frac{1}{2\pi}$ appears in one of them).

The $L^2$-theory of Fourier series guarantees that the series $\sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n$ converges in mean square with respect to $\omega$, namely, there exists a square integrable function $\hat{x}(\omega)$ such that
$$\lim_{N \to \infty} \int_0^{2\pi} \left| \frac{1}{\sqrt{2\pi}} \sum_{|n| \le N} e^{-i\omega n} x_n - \hat{x}(\omega) \right|^2 d\omega = 0.$$
The sequence $x_n$ can be reconstructed from its Fourier transform by means of the inverse Fourier transform
$$x_n = \frac{1}{\sqrt{2\pi}} \int_0^{2\pi} e^{i\omega n}\, \hat{x}(\omega)\, d\omega.$$
Among other properties, let us mention the Plancherel formula
$$\sum_{n \in \mathbb{Z}} |x_n|^2 = \int_0^{2\pi} |\hat{x}(\omega)|^2\, d\omega$$
and the fact that under the Fourier transform the convolution corresponds to the product:
$$\mathcal{F}\left[\sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(n)\right](\omega) = \hat{f}(\omega)\, \hat{g}(\omega).$$

When
$$\sum_{n \in \mathbb{Z}} |x_n| < \infty,$$
the series $\sum_{n \in \mathbb{Z}} e^{-i\omega n} x_n$ is absolutely convergent, uniformly in $\omega \in [0, 2\pi]$, simply because
$$\sum_{n \in \mathbb{Z}} \sup_{\omega \in [0, 2\pi]} \left| e^{-i\omega n} x_n \right| = \sum_{n \in \mathbb{Z}} \sup_{\omega \in [0, 2\pi]} \left| e^{-i\omega n} \right| |x_n| = \sum_{n \in \mathbb{Z}} |x_n| < \infty.$$
In this case, we may also say that $\hat{x}(\omega)$ is a bounded continuous function, not only square integrable. Notice that the assumption $\sum_{n \in \mathbb{Z}} |x_n| < \infty$ implies $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, because $\sum_{n \in \mathbb{Z}} |x_n|^2 \le \sup_{n \in \mathbb{Z}} |x_n| \sum_{n \in \mathbb{Z}} |x_n|$ and $\sup_{n \in \mathbb{Z}} |x_n|$ is bounded when $\sum_{n \in \mathbb{Z}} |x_n|$ converges.

One can define the DTFT also for sequences which do not satisfy the assumption $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$, in special cases. Consider for instance the sequence
$$x_n = a \sin(\omega_1 n).$$
Compute the truncation
$$\hat{x}_{2N}(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{|n| \le N} e^{-i\omega n}\, a \sin(\omega_1 n).$$
Recall that
$$\sin t = \frac{e^{it} - e^{-it}}{2i}.$$
Hence $\sin(\omega_1 n) = \frac{e^{i\omega_1 n} - e^{-i\omega_1 n}}{2i}$ and
$$\sum_{|n| \le N} e^{-i\omega n}\, a \sin(\omega_1 n) = \frac{a}{2i} \sum_{|n| \le N} e^{-i(\omega - \omega_1) n} - \frac{a}{2i} \sum_{|n| \le N} e^{-i(\omega + \omega_1) n}.$$

The next lemma makes use of the concept of generalized function, or distribution, which is outside the scope of these notes. We still give the result, to be understood in some intuitive sense. We use the generalized function $\delta(t)$ called the Dirac delta, which is characterized by the property
$$\int_{-\infty}^{\infty} \delta(t - t_0) f(t)\, dt = f(t_0) \tag{5.1}$$
for all continuous compactly supported functions f. No usual function has this property. A way to get intuition is the following one. Consider a function $\delta_n(t)$ which is equal to zero for t outside $\left[-\frac{1}{2n}, \frac{1}{2n}\right]$, an interval of length $\frac{1}{n}$ around the origin, and equal to n in $\left[-\frac{1}{2n}, \frac{1}{2n}\right]$. Hence $\delta_n(t - t_0)$ is equal to zero for t outside $\left[t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}\right]$ and equal to n in $\left[t_0 - \frac{1}{2n}, t_0 + \frac{1}{2n}\right]$. We have
$$\int_{-\infty}^{\infty} \delta_n(t)\, dt = 1.$$
Now,
$$\int_{-\infty}^{\infty} \delta_n(t - t_0) f(t)\, dt = n \int_{t_0 - \frac{1}{2n}}^{t_0 + \frac{1}{2n}} f(t)\, dt,$$
which is the average of f around $t_0$. As $n \to \infty$, this average converges to $f(t_0)$ when f is continuous. Namely, we have
$$\lim_{n \to \infty} \int_{-\infty}^{\infty} \delta_n(t - t_0) f(t)\, dt = f(t_0),$$
which is the analog of identity (5.1), but expressed by means of traditional concepts. In a sense, thus, the generalized function $\delta(t)$ is the limit of the traditional functions $\delta_n(t)$. But we see that $\delta_n(t)$ converges to zero for all $t \neq 0$, and to $\infty$ for $t = 0$. So, in a sense, $\delta(t)$ is equal to zero for $t \neq 0$ and to $\infty$ for $t = 0$; but this is very poor information, because it does not allow one to deduce identity (5.1) (the way $\delta_n(t)$ goes to infinity is essential, not only the fact that $\delta(t)$ is $\infty$ for $t = 0$).

Lemma 2. Denote by $\delta(t)$ the generalized function such that
$$\int_{-\infty}^{\infty} \delta(t - t_0) f(t)\, dt = f(t_0)$$
for all continuous compactly supported functions f (it is called the Dirac delta distribution). Then
$$\lim_{N \to \infty} \sum_{|n| \le N} e^{-itn} = 2\pi\, \delta(t).$$
From this lemma it follows that
$$\lim_{N \to \infty} \sum_{|n| \le N} e^{-i\omega n}\, a \sin(\omega_1 n) = \frac{a\pi}{i}\, \delta(\omega - \omega_1) - \frac{a\pi}{i}\, \delta(\omega + \omega_1).$$
In other words:

Corollary 1. The sequence
$$x_n = a \sin(\omega_1 n)$$
has a generalized DTFT
$$\hat{x}(\omega) = \lim_{N \to \infty} \hat{x}_{2N}(\omega) = \frac{a\sqrt{\pi}}{\sqrt{2}\, i}\, \big(\delta(\omega - \omega_1) - \delta(\omega + \omega_1)\big).$$

This is only one example of the possibility to extend the definition and meaning of the DTFT outside the assumption $\sum_{n \in \mathbb{Z}} |x_n|^2 < \infty$. It is also very interesting for the interpretation of the concept of DTFT. If the signal $x_n$ has a periodic component (notice that the DTFT is linear) with angular frequency $\omega_1$, then its DTFT has two symmetric peaks (Dirac delta components) at $\pm\omega_1$. This way, the DTFT reveals the periodic components of the signal.
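A numerical illustration (added here; the truncation level, amplitude and frequency are arbitrary choices): compute the truncated DTFT $\hat{x}_{2N}$ of $x_n = a\sin(\omega_1 n)$ on a grid of $\omega$ and plot its modulus. Since the transform is $2\pi$-periodic, the peak at $-\omega_1$ appears at $2\pi - \omega_1$ on $[0, 2\pi]$.

N <- 200; a <- 1; omega1 <- pi / 4
n <- -N:N
x <- a * sin(omega1 * n)
omega <- seq(0, 2 * pi, length.out = 1000)
dtft <- sapply(omega, function(w) sum(exp(-1i * w * n) * x) / sqrt(2 * pi))   # truncated DTFT
plot(omega, Mod(dtft), type = "l", xlab = "omega", ylab = "modulus of the truncated DTFT")
abline(v = c(omega1, 2 * pi - omega1), lty = 2)   # positions of the two peaks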

Exercise 9. Prove that the sequence
$$x_n = a \cos(\omega_1 n)$$
has a generalized DTFT
$$\hat{x}(\omega) = \lim_{N \to \infty} \hat{x}_{2N}(\omega) = \frac{a\sqrt{\pi}}{\sqrt{2}}\, \big(\delta(\omega - \omega_1) + \delta(\omega + \omega_1)\big).$$

6. Power spectral density

Given a stationary process $(X_n)_{n \in \mathbb{Z}}$ with correlation function $R(n) = E[X_n X_0]$, $n \in \mathbb{Z}$, we call power spectral density (PSD) the function
$$S(\omega) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} R(n), \qquad \omega \in [0, 2\pi].$$
Alternatively, one can use the expression
$$S(f) = \frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-2\pi i f n} R(n), \qquad f \in [0, 1],$$
which produces easier visualizations because we catch more easily the fractions of the interval $[0, 1]$.

Remark 11. In principle, to be defined, this series requires $\sum_{n \in \mathbb{Z}} |R(n)| < \infty$ or at least $\sum_{n \in \mathbb{Z}} |R(n)|^2 < \infty$. In practice, on one side the convergence may happen also in unexpected cases due to cancellations; on the other side, it may be acceptable to use a finite-time variant, something like $\sum_{|n| \le N} e^{-i\omega n} R(n)$, for practical purposes or from the computational viewpoint.

A priori, one may think that $S(f)$ may not be real valued. However, the function $R(n)$ is non-negative definite (this means $\sum_{i,j=1}^n R(t_i - t_j)\, a_i a_j \ge 0$ for all $t_1, \dots, t_n$ and $a_1, \dots, a_n$) and a theorem states that the Fourier transform of a non-negative definite function is a non-negative function. Thus, in the end, it turns out that $S(f)$ is real and also non-negative. We do not give the details of this fact here because it will be a consequence of the fundamental theorem below.

6.1. Example: white noise. We have
$$R(n) = \sigma^2 \delta(n),$$
hence
$$S(\omega) = \frac{\sigma^2}{\sqrt{2\pi}}, \qquad \omega \in \mathbb{R}.$$
The spectral density is constant. This is the origin of the name white noise.

6.2. Example: perturbed periodic time series. This example is numeric only. Produce with the R software the following time series:

t <- 1:100
y <- sin(t/3) + 0.3*rnorm(100)
ts.plot(y)

The empirical autocorrelation function, obtained by acf(y), is

[Figure: empirical autocorrelation function of the perturbed periodic series.]

and the power spectral density, suitably smoothed, obtained by spectrum(y, span=c(2,3)), is

[Figure: smoothed power spectral density estimate of the series.]

6.3. Pink, Brown, Blue, Violet noise. In certain applications one meets PSDs of special type which have been given names similar to white noise. Recall that white noise has a constant PSD. Pink noise has a PSD of the form
$$S(f) \sim \frac{1}{f}.$$
Brown noise:
$$S(f) \sim \frac{1}{f^2}.$$
Blue noise:
$$S(f) \sim f.$$
Violet noise:
$$S(f) \sim f^2.$$

7. Fundamental theorem on PSD

The following theorem is often stated without assumptions in the applied literature. One of the reasons is that it can be proved at various levels of generality, with different meanings of the limit operation (it is a limit of functions). We shall give a rigorous statement under a very precise assumption on the autocorrelation function $R(n)$; the convergence we prove is rather strong. The assumption is a little bit strange, but satisfied in all our examples. The assumption is that there exists a sequence $(\varepsilon_n)_{n \in \mathbb{N}}$ of positive numbers such that
$$\lim_{n \to \infty} \varepsilon_n = 0, \qquad \sum_{n \in \mathbb{N}} \frac{|R(n)|}{\varepsilon_n} < \infty. \tag{7.1}$$
This is just a little bit more restrictive than the condition $\sum_{n \in \mathbb{N}} |R(n)| < \infty$, which is natural to impose if we want uniform convergence of $\frac{1}{\sqrt{2\pi}} \sum_{n \in \mathbb{Z}} e^{-i\omega n} R(n)$ to $S(\omega)$. Any example of $R(n)$ satisfying $\sum_{n \in \mathbb{N}} |R(n)| < \infty$ that the reader may have in mind presumably satisfies assumption (7.1) easily.

Theorem 1 (Wiener-Khinchin). If $(X(n))_{n \in \mathbb{Z}}$ is a wide-sense stationary process satisfying assumption (7.1), then
$$S(\omega) = \lim_{N \to \infty} \frac{1}{2N+1}\, E\left[\left|\hat{X}_{2N}(\omega)\right|^2\right].$$
The limit is uniform in $\omega \in [0, 2\pi]$. Here $X_{2N}$ is the truncated process $X \cdot 1_{[-N,N]}$. In particular, it follows that $S(\omega)$ is real and non-negative.

Proof. Step 1. Let us prove the following main identity:
$$S(\omega) = \frac{1}{2N+1}\, E\left[\left|\hat{X}_{2N}(\omega)\right|^2\right] + r_N(\omega) \tag{7.2}$$
where the remainder $r_N$ is given by
$$r_N(\omega) = \frac{1}{2N+1}\, \mathcal{F}\left[\sum_{n \in \Lambda(N, \cdot)} E[X(\cdot + n)\, X(n)]\right](\omega)$$
with
$$\Lambda(N, t) = [-N, N_t^-) \cup (N_t^+, N],$$
$$N_t^+ = \begin{cases} N & \text{if } t \le 0 \\ N - t & \text{if } 0 < t \le N \\ 0 & \text{if } t > N \end{cases} \qquad\qquad N_t^- = \begin{cases} -N & \text{if } t \ge 0 \\ -N - t & \text{if } -N \le t < 0 \\ 0 & \text{if } t < -N. \end{cases}$$

Since $R(t) = E[X(t+n)\, X(n)]$ for all n, we obviously have, for every $N > 0$,
$$R(t) = \frac{1}{2N+1} \sum_{|n| \le N} E[X(t+n)\, X(n)].$$
Thus
$$S(\omega) = \hat{R}(\omega) = \frac{1}{2N+1}\, \mathcal{F}\left[\sum_{|n| \le N} E[X(\cdot + n)\, X(n)]\right](\omega). \tag{7.3}$$
Then recall that
$$\mathcal{F}\left[\sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(n)\right](\omega) = \hat{f}(\omega)\, \hat{g}(\omega),$$
hence
$$\mathcal{F}\left[\sum_{n \in \mathbb{Z}} f(\cdot + n)\, g(n)\right](\omega) = \mathcal{F}\left[\sum_{n \in \mathbb{Z}} f(\cdot - n)\, g(-n)\right](\omega) = \hat{f}(\omega)\, \hat{g}(-\omega)$$
because
$$\mathcal{F}[g(-\cdot)](\omega) = \hat{g}(-\omega).$$
