
Stochastic processes. Generalities

1. Discrete time stochastic process

We call discrete time stochastic process any sequence $X_0, X_1, X_2, \ldots, X_n, \ldots$ of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$, taking values in $\mathbb{R}$. This definition is not rigid with respect to small details: the same name is given to sequences $X_1, X_2, \ldots, X_n, \ldots$, or to the case when the r.v.'s $X_n$ take values in a space different from $\mathbb{R}$. We shall also describe below the case when the time index takes negative values.

The main objects attached to a r.v. are its law, its first and second moments (and possibly higher order moments, the characteristic or generating function, and the distribution function). We do the same for a process $(X_n)_{n \ge 0}$: the probability density of the r.v. $X_n$, when it exists, will be denoted by $f_n(x)$, the mean by $\mu_n$, the standard deviation by $\sigma_n$. Often we shall write $t$ in place of $n$, but here $t$ will always be a non-negative integer. So, our first concepts are:

i) mean function and variance function:

$\mu_t = E[X_t], \qquad \sigma_t^2 = \mathrm{Var}[X_t], \qquad t = 0, 1, 2, \ldots$

In addition, the time-correlation is very important. We introduce three functions:

ii) the autocovariance function $C(t, s)$, $t, s = 0, 1, 2, \ldots$:

$C(t, s) = \mathrm{Cov}(X_t, X_s) = E[(X_t - \mu_t)(X_s - \mu_s)]$

and the function

$R(t, s) = E[X_t X_s]$

(the name will be discussed below). They are symmetric ($R(t, s) = R(s, t)$, and the same for $C(t, s)$), so it is sufficient to know them for $t \ge s$. We have

$C(t, s) = R(t, s) - \mu_t \mu_s, \qquad C(t, t) = \sigma_t^2.$

In particular, when $\mu_t \equiv 0$ (which is often the case), $C(t, s) = R(t, s)$. Most of the importance will be given to $\mu_t$ and $R(t, s)$. In addition, let us introduce:

iii) the autocorrelation function

$\rho(t, s) = \mathrm{Corr}(X_t, X_s) = \frac{C(t, s)}{\sigma_t \sigma_s}.$

We have

$\rho(t, t) = 1, \qquad |\rho(t, s)| \le 1.$


The functions $C(t, s)$, $R(t, s)$, $\rho(t, s)$ are used to detect repetitions in the process, self-similarities under time shift. For instance, if $(X_n)_{n \ge 0}$ is roughly periodic of period $P$, then $\rho(t + P, t)$ will be significantly higher than the other values of $\rho(t, s)$ (except $\rho(t, t)$, which is always equal to 1). A trend is also a form of repetition, a self-similarity under time shift, and indeed when there is a trend all values of $\rho(t, s)$ are quite high compared to the cases without trend. See the numerical example below.
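To make this concrete, here is a minimal R sketch (not part of the original notes; the period $P = 12$, the noise level and the series length are arbitrary choices). It uses the sample autocorrelation estimator acf, the empirical counterpart of $\rho$ discussed in Section 3:

x <- sin(2 * pi * (1:600) / 12) + rnorm(600, sd = 0.5)   # roughly periodic series, period P = 12
acf(x, lag.max = 36)                                     # peaks near lags 12, 24, 36, i.e. near multiples of P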

Other objects (when defined) related to the time structure are:

iv) the joint probability density

$f_{t_1, \ldots, t_n}(x_1, \ldots, x_n), \qquad t_n \ge \cdots \ge t_1,$

of the vector $(X_{t_1}, \ldots, X_{t_n})$, and

v) the conditional density

$f_{t|s}(x|y) = \frac{f_{t,s}(x, y)}{f_s(y)}, \qquad t > s.$

Now, a remark about the name of $R(t, s)$. In Statistics and Time Series Analysis, the name autocorrelation function is given to $\rho(t, s)$, as we said above. But in certain disciplines related to signal processing, $R(t, s)$ is called the autocorrelation function. There is no special reason for this except the fact that $R(t, s)$ is the fundamental quantity to be understood and investigated, the others ($C(t, s)$ and $\rho(t, s)$) being simple transformations of $R(t, s)$. Thus $R(t, s)$ is given the name which best recalls the concept of self-relation between values of the process at different times. In the sequel we shall use both languages, and sometimes we shall call $\rho(t, s)$ the autocorrelation coefficient.

The last object we introduce concerns two processes simultaneously, $(X_n)_{n \ge 0}$ and $(Y_n)_{n \ge 0}$. It is called:

vi) cross-correlation function

$C_{X,Y}(t, s) = E[(X_t - E[X_t])(Y_s - E[Y_s])].$

This function is a measure of the similarity between two processes, shifted in time. For instance, it can be used for the following purpose: one of the two processes, say $Y$, is known and has a shape of interest for us, while the other process, $X$, is the process under investigation, and we would like to detect portions of $X$ whose shape is similar to that of $Y$. Hence we shift $X$ in all possible ways and compute the correlation with $Y$.

When more than one process is investigated, it may be better to write $R_X(t, s)$, $C_X(t, s)$ and so on for the quantities associated with the process $X$.

1.1. Example 1: white noise. The white noise with intensity $\sigma^2$ is the process $(X_n)_{n \ge 0}$ with the following properties:

i) $X_0, X_1, X_2, \ldots, X_n, \ldots$ are independent r.v.'s;

ii) $X_n \sim N(0, \sigma^2)$.

It is a very elementary process, with a trivial time-structure, but it will be used as a building block for other classes of processes, or as a comparison object to understand the features of more complex cases. The following picture has been obtained with the R software by the commands x<-rnorm(1000); ts.plot(x).


Let us compute all its relevant quantities (the check is left as an exercise):

$\mu_t = 0, \qquad \sigma_t^2 = \sigma^2,$

$R(t, s) = C(t, s) = \sigma^2 \delta(t - s),$ where the symbol $\delta(t - s)$ denotes 0 for $t \ne s$ and 1 for $t = s$,

$\rho(t, s) = \delta(t - s),$

$f_{t_1, \ldots, t_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} p(x_i), \qquad \text{where } p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{x^2}{2\sigma^2}},$

$f_{t|s}(x|y) = p(x).$
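As a quick numerical check (a sketch, assuming $\sigma = 1$; not part of the original text), the sample autocorrelation of a simulated white noise is approximately $\delta(t - s)$:

x <- rnorm(1000)        # white noise with sigma = 1
acf(x, lag.max = 20)    # value 1 at lag 0, values close to 0 at all other lags
var(x)                  # close to sigma^2 = 1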

1.2. Example 2: random walk. Let $(W_n)_{n \ge 0}$ be a white noise (or, more generally, a process with independent identically distributed $W_0, W_1, W_2, \ldots$). Set

$X_0 = 0,$

$X_{n+1} = X_n + W_n, \qquad n \ge 0.$

This is a random walk. White noise has been used as a building block: $(X_n)_{n \ge 0}$ is the solution of a recursive linear equation, driven by white noise (we shall see more general examples later on). The following picture has been obtained with the R software by the commands x<-rnorm(1000); y<-cumsum(x); ts.plot(y).

The random variables $X_n$ are not independent ($X_{n+1}$ obviously depends on $X_n$). One has

$X_{n+1} = \sum_{i=0}^{n} W_i.$

We have the following facts, which we prove by means of the iterative relation (this generalizes better to more complex discrete linear equations). First,

$\mu_0 = 0, \qquad \mu_{n+1} = \mu_n, \quad n \ge 0,$

hence $\mu_n = 0$ for every $n \ge 0$.

By induction, $X_n$ and $W_n$ are independent for every $n$, hence:

Exercise 1. Denote by $\sigma^2$ the intensity of the white noise; find a relation between $\sigma_{n+1}^2$ and $\sigma_n^2$ and prove that

$\sigma_n = \sigma\sqrt{n}, \qquad n \ge 0.$

An intuitive interpretation of the result of the exercise is that $X_n$ behaves as $\sigma\sqrt{n}$, in a very rough way.

As to the time-dependent structure, $C(t, s) = R(t, s)$, and:

Exercise 2. Prove that $R(m, n) = n\sigma^2$ for all $m \ge n$ (prove it for $m = n$, $m = n+1$, $m = n+2$ and extend). Then prove that

$\rho(m, n) = \sqrt{\frac{n}{m}}.$

The result of this exercise implies that

$\rho(m, 1) \to 0 \quad \text{as } m \to \infty.$

We may interpret this result by saying that the random walk loses memory of the initial position.
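The statements of the two exercises can be checked numerically. A minimal Monte Carlo sketch in R (assuming $\sigma = 1$; the number of steps and of repetitions are arbitrary):

set.seed(1)
W <- matrix(rnorm(5000 * 100), nrow = 5000)   # 5000 independent white noise paths of length 100
X <- t(apply(W, 1, cumsum))                   # 5000 random walk paths
sd(X[, 100]); sqrt(100)                       # both close to 10, i.e. sigma * sqrt(n)
cor(X[, 25], X[, 100]); sqrt(25 / 100)        # both close to 0.5, i.e. sqrt(n / m)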

2. Stationary processes

A process is called wide-sense stationary if $\mu_t$ and $R(t + n, t)$ are independent of $t$.

It follows that $\sigma_t$, $C(t + n, t)$ and $\rho(t + n, t)$ are also independent of $t$. Thus we speak of:

i) the mean $\mu$;

ii) the standard deviation $\sigma$;

iii) the covariance function $C(n) := C(n, 0)$;

iv) the autocorrelation function (in the improper sense described above) $R(n) := R(n, 0)$;

v) the autocorrelation coefficient (or also autocorrelation function, in the language of Statistics) $\rho(n) := \rho(n, 0)$.


A process is called strongly stationary if the law of the generic vector $(X_{n_1+t}, \ldots, X_{n_k+t})$ is independent of $t$. This implies wide-sense stationarity. The converse is not true in general, but it is true for Gaussian processes (see below).

2.1. Example: white noise. We have

$R(t, s) = \sigma^2 \delta(t - s),$

hence

$R(n) = \sigma^2 \delta(n).$

2.2. Example: linear equation with damping. Consider the recurrence relation

$X_{n+1} = \alpha X_n + W_n, \qquad n \ge 0,$

where $(W_n)_{n \ge 0}$ is a white noise with intensity $\sigma^2$ and $\alpha \in (-1, 1)$.

The following picture has been obtained with the R software by the commands ($\alpha = 0.9$, $X_0 = 0$):

w <- rnorm(1000)          # the white noise W_n
x <- rnorm(1000)          # preallocate x (these values will be overwritten)
x[1] = 0                  # X_0 = 0
for (i in 1:999) {
  x[i+1] <- 0.9*x[i] + w[i]
}
ts.plot(x)

It has some features similar to white noise, but it is less random and more persistent in the direction in which it moves.

Let $X_0$ be a r.v. independent of the white noise, with zero average and variance $\widetilde{\sigma}^2$. Let us show that $(X_n)_{n \ge 0}$ is stationary (in the wide sense) if $\widetilde{\sigma}^2$ is properly chosen with respect to $\sigma^2$.


First we have

$\mu_0 = 0, \qquad \mu_{n+1} = \alpha\mu_n, \quad n \ge 0,$

hence $\mu_n = 0$ for every $n \ge 0$. The mean function is constant.

As a preliminary computation, let us impose that the variance function is constant. By induction, $X_n$ and $W_n$ are independent for every $n$, hence

$\sigma_{n+1}^2 = \alpha^2\sigma_n^2 + \sigma^2, \qquad n \ge 0.$

If we want $\sigma_{n+1}^2 = \sigma_n^2$ for every $n \ge 0$, we need

$\sigma_n^2 = \alpha^2\sigma_n^2 + \sigma^2, \qquad n \ge 0,$

namely

$\sigma_n^2 = \frac{\sigma^2}{1 - \alpha^2}, \qquad n \ge 0.$

In particular, this implies the relation

$\widetilde{\sigma}^2 = \frac{\sigma^2}{1 - \alpha^2}.$

It is here that we first see the importance of the condition $|\alpha| < 1$.

If we assume this condition on the law of $X_0$, then we find

$\sigma_1^2 = \alpha^2 \frac{\sigma^2}{1 - \alpha^2} + \sigma^2 = \frac{\sigma^2}{1 - \alpha^2} = \sigma_0^2,$

and so on, $\sigma_{n+1}^2 = \sigma_n^2$ for every $n \ge 0$. Thus the variance function is constant.

Finally, we have to show that $R(t + n, t)$ is independent of $t$. We have

$R(t + 1, t) = E[(\alpha X_t + W_t) X_t] = \alpha\sigma_t^2 = \frac{\alpha\sigma^2}{1 - \alpha^2},$

which is independent of $t$;

$R(t + 2, t) = E[(\alpha X_{t+1} + W_{t+1}) X_t] = \alpha R(t + 1, t) = \frac{\alpha^2\sigma^2}{1 - \alpha^2},$

and so on,

$R(t + n, t) = E[(\alpha X_{t+n-1} + W_{t+n-1}) X_t] = \alpha R(t + n - 1, t) = \cdots = \alpha^n R(t, t) = \frac{\alpha^n\sigma^2}{1 - \alpha^2},$

which is independent of $t$. The process is stationary. We have

$R(n) = \frac{\alpha^n\sigma^2}{1 - \alpha^2}.$

It also follows that

$\rho(n) = \alpha^n.$

The autocorrelation coefficient (as well as the autocovariance function) decays exponentially in time.
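This exponential decay can be seen on a simulated path. A sketch (with $\alpha = 0.9$ and $\sigma = 1$ as in the picture above; the path is taken long so that the sample autocorrelation is reliable):

alpha <- 0.9; n <- 50000
w <- rnorm(n)
x <- numeric(n); x[1] <- 0
for (i in 1:(n - 1)) x[i + 1] <- alpha * x[i] + w[i]
rho.hat <- acf(x, lag.max = 10, plot = FALSE)$acf[, 1, 1]    # sample rho(0), ..., rho(10)
round(cbind(empirical = rho.hat, theoretical = alpha^(0:10)), 3)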


2.3. Processes defined also for negative times. We may extend the previous definitions a little and call a discrete time stochastic process also the two-sided sequences $(X_n)_{n \in \mathbb{Z}}$ of random variables. Such processes are thus defined also for negative times. The idea is that the physical process they represent started in the far past and continues into the future.

This notion is particularly natural in the case of stationary processes. The function $R(n)$ (and similarly $C(n)$ and $\rho(n)$) is thus defined also for negative $n$:

$R(n) = E[X_n X_0], \qquad n \in \mathbb{Z}.$

By stationarity,

$R(-n) = R(n),$

because $R(-n) = E[X_{-n} X_0] = E[X_{-n+n} X_{0+n}] = E[X_0 X_n] = R(n)$. Therefore this extension does not contain much new information; however, it is useful, or at least it simplifies some computations.

3. Time series and empirical quantities

A time series is a sequence of real numbers $x_1, \ldots, x_n$. Empirical samples also have the same form. The name time series is appropriate when the index $i$ of $x_i$ has the meaning of time.

A finite realization of a stochastic process is a time series. Ideally, when we have an experimental time series, we think that there is a stochastic process behind it. Thus we try to apply the theory of stochastic processes.

Recall from elementary statistics that empirical estimates of mean values of a single r.v. $X$ are computed from an empirical sample $x_1, \ldots, x_n$ of that r.v.; the higher $n$ is, the better the estimate. A single sample $x_1$ is not sufficient to estimate moments of $X$.

Similarly, we may hope to compute empirical estimates of $R(t, s)$ etc. from time series. But here, when the stochastic process has special properties (stationary and ergodic; see below for the concept of ergodicity), one sample is sufficient! By "one sample" we mean one time series (which is one realization of the process, just as the single $x_1$ is one realization of the r.v. $X$). Again, the higher $n$ is, the better the estimate, but here $n$ refers to the length of the time series.

Consider a time series $x_1, \ldots, x_n$. In the sequel, $t$ and $n_t$ are such that $t + n_t = n$.

Let us define

$\bar{x}_t = \frac{1}{n_t}\sum_{i=1}^{n_t} x_{i+t}, \qquad \widehat{\sigma}_t^2 = \frac{1}{n_t}\sum_{i=1}^{n_t} (x_{i+t} - \bar{x}_t)^2,$

$\widehat{R}(t) = \frac{1}{n_t}\sum_{i=1}^{n_t} x_i x_{i+t}, \qquad \widehat{C}(t) = \frac{1}{n_t}\sum_{i=1}^{n_t} (x_i - \bar{x}_0)(x_{i+t} - \bar{x}_t),$

$\widehat{\rho}(t) = \frac{\widehat{C}(t)}{\widehat{\sigma}_0 \widehat{\sigma}_t} = \frac{\sum_{i=1}^{n_t} (x_i - \bar{x}_0)(x_{i+t} - \bar{x}_t)}{\sqrt{\sum_{i=1}^{n_t} (x_i - \bar{x}_0)^2 \sum_{i=1}^{n_t} (x_{i+t} - \bar{x}_t)^2}}.$


These quantities are taken as approximations of

$\mu_t, \quad \sigma_t^2, \quad R(t, 0), \quad C(t, 0), \quad \rho(t, 0),$

respectively. In the case of stationary processes, they are approximations of

$\mu, \quad \sigma^2, \quad R(t), \quad C(t), \quad \rho(t).$

In the section on ergodic theorems we shall see rigorous relations between these empirical and theoretical functions.
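A direct transcription of these definitions into R may help (a sketch, not part of the original notes; the built-in function acf uses slightly different centering and normalization, so small discrepancies with it are expected; here $\bar{x}_0$ is taken over the first $n_t$ values, as in the last displayed ratio):

emp <- function(x, t) {                  # empirical quantities for lag t, with n_t = n - t
  n  <- length(x); nt <- n - t
  x0 <- mean(x[1:nt]); xt <- mean(x[(1 + t):n])
  R  <- mean(x[1:nt] * x[(1 + t):n])                        # R hat of t
  C  <- mean((x[1:nt] - x0) * (x[(1 + t):n] - xt))          # C hat of t
  rho <- C / sqrt(mean((x[1:nt] - x0)^2) * mean((x[(1 + t):n] - xt)^2))
  c(R.hat = R, C.hat = C, rho.hat = rho)
}
emp(rnorm(1000), 5)   # for a white noise all three values are close to 0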

The empirical correlation coefficient

$\widehat{\rho}_{X,Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

between two sequences $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ is a measure of their linear similarity. If there are coefficients $a$ and $b$ such that the residuals

$\varepsilon_i = y_i - (a x_i + b)$

are small, then $|\widehat{\rho}_{X,Y}|$ is close to 1; precisely, $\widehat{\rho}_{X,Y}$ is close to 1 if $a > 0$ and close to $-1$ if $a < 0$. A value of $\widehat{\rho}_{X,Y}$ close to 0 means that no such linear relation is really good (in the sense of small residuals). Precisely, smallness of the residuals must be understood relative to the empirical variance $\widehat{\sigma}_Y^2$ of $y_1, \ldots, y_n$: one can prove that

$\widehat{\rho}_{X,Y}^2 = 1 - \frac{\widehat{\sigma}_\varepsilon^2}{\widehat{\sigma}_Y^2}$

(the so-called explained variance, the proportion of variance which has been explained by the linear model). After these remarks, the intuitive meaning of $\widehat{R}(t)$, $\widehat{C}(t)$ and $\widehat{\rho}(t)$ should be clear: they measure the linear similarity between the time series and its $t$-translation. This is useful to detect repetitions, periodicity, and trend.
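A small numerical illustration of the explained-variance identity (a sketch; the slope, the intercept and the noise level are arbitrary):

set.seed(2)
x <- rnorm(200)
y <- 3 * x + 1 + rnorm(200, sd = 0.5)      # linear relation with small residuals
fit <- lm(y ~ x)
cor(x, y)^2                                # empirical rho squared
1 - var(residuals(fit)) / var(y)           # explained variance: the same value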

Example 1. Consider the following time series, taken from the EUROSTAT database. It collects export data concerning motor vehicle accessories, from January 1995 to December 2008.

Its autocorrelation function $\widehat{\rho}(t)$ is shown in the following picture.


We see high values (the values of $\widehat{\rho}(t)$ are always smaller than 1 in absolute value) for all time lags $t$. The reason is the trend of the original time series (highly non-stationary).

Example 2. If we consider only the last few years of the same time series, namely January 2005 - December 2008, the data are much more stationary and the trend is less strong. The autocorrelation function $\widehat{\rho}(t)$ is now given by the following picture, where we notice a moderate annual periodicity.

4. Gaussian processes

If the generic vector $(X_{t_1}, \ldots, X_{t_n})$ is jointly Gaussian, we say that the process is Gaussian. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. Hence the laws of the marginals of a Gaussian process are determined by the mean function $\mu_t$ and the autocorrelation function $R(t, s)$.

Proposition 1. For Gaussian processes, stationarity in the wide and strong sense are equivalent.

Proof. Given a Gaussian process $(X_n)_{n \in \mathbb{N}}$, the generic vector $(X_{t_1+s}, \ldots, X_{t_n+s})$ is Gaussian, hence its law is determined by the mean vector of components

$E[X_{t_i+s}] = \mu_{t_i+s}$

and the covariance matrix of components

$\mathrm{Cov}(X_{t_i+s}, X_{t_j+s}) = R(t_i + s, t_j + s) - \mu_{t_i+s}\mu_{t_j+s}.$

If the process is stationary in the wide sense, then $\mu_{t_i+s} = \mu$ and

$R(t_i + s, t_j + s) - \mu_{t_i+s}\mu_{t_j+s} = R(t_i - t_j) - \mu^2$

do not depend on $s$. Then the law of $(X_{t_1+s}, \ldots, X_{t_n+s})$ does not depend on $s$. This means that the process is stationary in the strict sense. The converse is a general fact. The proof is complete.

Most of the models in these notes are obtained by linear transformations of white noise. White noise is a Gaussian process. Linear transformations preserve Gaussianity. Hence the resulting processes are Gaussian. Since we very often deal with wide-sense stationary processes, and they are Gaussian, they are also strictly stationary.

5. An ergodic theorem

There exist several versions of ergodic theorems. The simplest one is the Law of Large Numbers.

Let us recall it in its simplest version, with convergence in mean square.

Proposition 2. If $(X_n)_{n \ge 1}$ is a sequence of uncorrelated r.v.'s ($\mathrm{Cov}(X_i, X_j) = 0$ for all $i \ne j$), with finite and equal mean $\mu$ and variance $\sigma^2$, then $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $\mu$ in mean square:

$\lim_{n\to\infty} E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right] = 0.$

It also converges in probability.

Proof. We have

$\frac{1}{n}\sum_{i=1}^{n} X_i - \mu = \frac{1}{n}\sum_{i=1}^{n} (X_i - \mu),$

hence

$\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2 = \frac{1}{n^2}\sum_{i,j=1}^{n} (X_i - \mu)(X_j - \mu),$

$E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right] = \frac{1}{n^2}\sum_{i,j=1}^{n} E[(X_i - \mu)(X_j - \mu)] = \frac{1}{n^2}\sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j) = \frac{1}{n^2}\sum_{i,j=1}^{n} \sigma^2 \delta_{ij} = \frac{\sigma^2}{n} \to 0.$

Recall that the Chebyshev inequality states (in this particular case)

$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| > \varepsilon\right) \le \frac{E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right]}{\varepsilon^2}$

for every $\varepsilon > 0$. Hence, from the computation in the previous proof we deduce

$P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| > \varepsilon\right) \le \frac{\sigma^2}{\varepsilon^2 n}.$

In itself, this is an interesting estimate of the probability that the sample average $\frac{1}{n}\sum_{i=1}^{n} X_i$ differs from $\mu$ by more than $\varepsilon$. It follows that

$\lim_{n\to\infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| > \varepsilon\right) = 0$

for every $\varepsilon > 0$. This is the convergence in probability of $\frac{1}{n}\sum_{i=1}^{n} X_i$ to $\mu$.
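A Monte Carlo sketch of this rate (assuming i.i.d. $N(0,1)$ variables, so $\mu = 0$ and $\sigma^2 = 1$; the number of repetitions is arbitrary):

set.seed(3)
n <- 100
means <- replicate(10000, mean(rnorm(n)))   # 10000 sample averages of n i.i.d. N(0,1) r.v.'s
mean(means^2)                               # close to sigma^2 / n = 0.01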

Remark 1. Often this theorem is stated only in the particular case when the r.v.'s $X_i$ are independent and identically distributed, with finite second moment. We see that the proof is very easy under much more general assumptions.

Remark 2. It can be generalized to the following set of assumptions: $(X_n)_{n \ge 1}$ is a sequence of uncorrelated r.v.'s whose moments $\mu_n = E[X_n]$ and $\sigma_n^2 = \mathrm{Var}[X_n]$ satisfy

$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} \mu_i = \mu, \qquad \sigma_n^2 \le \sigma^2 \text{ for every } n \in \mathbb{N},$

for some finite constants $\mu$ and $\sigma^2$. Under these assumptions, the same results hold.

We have written out this very classical proof so that the proof of the following lemma is obvious.

Lemma 1. Let $(X_n)_{n \ge 1}$ be a sequence of r.v.'s with finite second moments and equal mean $\mu$. Assume that

(5.1) $\qquad \lim_{n\to\infty} \frac{1}{n^2}\sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j) = 0.$

Then $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $\mu$ in mean square and in probability.

The lemma will be useful if we detect interesting sufficient conditions for (5.1). Here is our main ergodic theorem. Usually, by the name ergodic theorem one means a theorem stating that the time averages of a process converge to a deterministic value (the mean of the process, in the stationary case).

Theorem 1. Assume that $(X_n)_{n \ge 1}$ is a wide-sense stationary process (this ensures in particular that $(X_n)_{n \ge 1}$ is a sequence of r.v.'s with finite second moments and equal mean $\mu$). If

$\lim_{n\to\infty} C(n) = 0,$

then $\frac{1}{n}\sum_{i=1}^{n} X_i$ converges to $\mu$ in mean square and in probability.

Proof. Since $\mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i)$, we have

$\left|\sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j)\right| \le \sum_{i,j=1}^{n} |\mathrm{Cov}(X_i, X_j)| \le 2\sum_{i=1}^{n}\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)|,$

so it is sufficient to prove that

$\lim_{n\to\infty} \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)| = 0.$

Since the process is stationary, $\mathrm{Cov}(X_i, X_j) = C(i - j)$, so we have to prove $\lim_{n\to\infty} \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i} |C(i - j)| = 0$. But

$\sum_{i=1}^{n}\sum_{j=1}^{i} |C(i - j)| = \sum_{i=1}^{n}\sum_{k=0}^{i-1} |C(k)|$

$= |C(0)| + (|C(0)| + |C(1)|) + (|C(0)| + |C(1)| + |C(2)|) + \cdots + (|C(0)| + \cdots + |C(n-1)|)$

$= n|C(0)| + (n-1)|C(1)| + (n-2)|C(2)| + \cdots + |C(n-1)|$

$= \sum_{k=0}^{n-1} (n - k)|C(k)| \le n\sum_{k=0}^{n-1} |C(k)|.$

Therefore it is sufficient to prove $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |C(k)| = 0$. If $\lim_{n\to\infty} C(n) = 0$, for every $\varepsilon > 0$ there is $n_0$ such that for all $n \ge n_0$ we have $|C(n)| \le \varepsilon$. Hence, for $n \ge n_0$,

$\frac{1}{n}\sum_{k=0}^{n-1} |C(k)| \le \frac{1}{n}\sum_{k=0}^{n_0-1} |C(k)| + \frac{1}{n}\sum_{k=n_0}^{n-1} \varepsilon \le \frac{1}{n}\sum_{k=0}^{n_0-1} |C(k)| + \varepsilon.$

Since $\sum_{k=0}^{n_0-1} |C(k)|$ is independent of $n$, there is $n_1 \ge n_0$ such that for all $n \ge n_1$

$\frac{1}{n}\sum_{k=0}^{n_0-1} |C(k)| \le \varepsilon.$

Therefore, for all $n \ge n_1$,

$\frac{1}{n}\sum_{k=0}^{n-1} |C(k)| \le 2\varepsilon.$

This means that $\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} |C(k)| = 0$. The proof is complete.
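The ergodic theorem can be seen at work on the damped linear equation of Section 2.2 (a sketch with $\alpha = 0.9$ and $\sigma = 1$, so that $\mu = 0$ and $C(n) = \alpha^n\sigma^2/(1-\alpha^2) \to 0$): the time average of a single path approaches 0 as the path gets longer.

alpha <- 0.9; n <- 100000
w <- rnorm(n)
x <- numeric(n); x[1] <- 0
for (i in 1:(n - 1)) x[i + 1] <- alpha * x[i] + w[i]
m <- c(100, 1000, 10000, 100000)
cumsum(x)[m] / m            # time averages over longer and longer windows, approaching mu = 0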

5.1. Rate of convergence. Concerning the rate of convergence, recall from the proof of the Law of Large Numbers that

$E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right] \le \frac{\sigma^2}{n}.$

We can reach the same result in the case of the ergodic theorem, under a suitable assumption.

Proposition 3. If $(X_n)_{n \ge 1}$ is a wide-sense stationary process such that

$\kappa := \sum_{k=0}^{\infty} |C(k)| < \infty$

(this implies $\lim_{n\to\infty} C(n) = 0$), then

$E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right] \le \frac{2\kappa}{n}.$


Proof. It is sufficient to put together several steps of the previous proofs:

$E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right)^2\right] = \frac{1}{n^2}\sum_{i,j=1}^{n} \mathrm{Cov}(X_i, X_j) \le \frac{2}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{i} |\mathrm{Cov}(X_i, X_j)| \le \frac{2}{n}\sum_{k=0}^{n-1} |C(k)| \le \frac{2\kappa}{n}.$

The proof is complete.

Notice that the assumptions of these two ergodic results (especially the ergodic theorem) are very general and always satisfied in our examples.
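For instance (a worked computation, not in the original text), for the damped linear equation of Section 2.2 with $|\alpha| < 1$ the summability condition of Proposition 3 holds:

$\kappa = \sum_{k=0}^{\infty} |C(k)| = \sum_{k=0}^{\infty} \frac{|\alpha|^{k}\sigma^{2}}{1 - \alpha^{2}} = \frac{\sigma^{2}}{(1 - \alpha^{2})(1 - |\alpha|)} < \infty.$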

5.2. Empirical autocorrelation function. Very often we need the convergence of time averages of certain functions of the process: we would like to have

$\frac{1}{n}\sum_{i=1}^{n} g(X_i) \to \overline{g}$

in mean square, for certain functions $g$. We need to check the assumptions of the ergodic theorem for the sequence $(g(X_n))_{n \ge 1}$. Here is a simple example.

Proposition 4. Let $(X_n)_{n \ge 0}$ be a wide-sense stationary process, with finite fourth moments, such that $E[X_n^2 X_{n+k}^2]$ is independent of $n$ and

$\lim_{k\to\infty} E[X_0^2 X_k^2] = E[X_0^2]^2.$

In other words, we assume that

$\lim_{k\to\infty} \mathrm{Cov}(X_0^2, X_k^2) = 0.$

Then $\frac{1}{n}\sum_{i=1}^{n} X_i^2$ converges to $E[X_1^2]$ in mean square and in probability.

Proof. Consider the process $Y_n = X_n^2$. The mean function of $(Y_n)$ is $E[X_n^2]$, which is independent of $n$ by the wide-sense stationarity of $(X_n)$. For the autocovariance function

$C_Y(n, n+k) = E[Y_n Y_{n+k}] - E[X_n^2]^2 = E[X_n^2 X_{n+k}^2] - E[X_n^2]^2$

we need the new assumption of the proposition. Thus $(Y_n)$ is wide-sense stationary. Finally, from the assumption $\lim_{k\to\infty} E[X_0^2 X_k^2] = E[X_0^2]^2$, which means $\lim_{k\to\infty} C_Y(k) = 0$, where $C_Y(k)$ is the autocovariance function of $(Y_n)$, we can apply the ergodic theorem. The proof is complete.

More remarkable is the following result, related to the estimation of $R(n)$ by the sample path autocorrelation function. Given a process $(X_n)_{n \ge 1}$, call sample path (or empirical) autocorrelation function the process

$\frac{1}{n}\sum_{i=1}^{n} X_i X_{i+k}.$

Theorem 2. Let $(X_n)_{n \ge 0}$ be a wide-sense stationary process, with finite fourth moments, such that $E[X_n X_{n+k} X_{n+j} X_{n+j+k}]$ is independent of $n$ and

$\lim_{j\to\infty} E[X_0 X_k X_j X_{j+k}] = E[X_0 X_k]^2 \quad \text{for every } k = 0, 1, \ldots$

In other words, we assume that

$\lim_{j\to\infty} \mathrm{Cov}(X_0 X_k, X_j X_{j+k}) = 0.$

Then the sample path autocorrelation function $\frac{1}{n}\sum_{i=1}^{n} X_i X_{i+k}$ converges to $R(k)$ as $n\to\infty$, in mean square and in probability. Precisely, for every $k \in \mathbb{N}$ we have

$\lim_{n\to\infty} E\left[\left(\frac{1}{n}\sum_{i=1}^{n} X_i X_{i+k} - R(k)\right)^2\right] = 0,$

and similarly for the convergence in probability.

Proof. Given $k \in \mathbb{N}$, consider the new process $Y_n = X_n X_{n+k}$. Its mean function is constant in $n$ because of the wide-sense stationarity of $(X_n)$. For the autocorrelation function

$R_Y(n, n+j) = E[Y_n Y_{n+j}] = E[X_n X_{n+k} X_{n+j} X_{n+j+k}],$

it is independent of $n$ by assumption. Moreover, $C_Y(j)$ converges to zero. Thus it is sufficient to apply the ergodic theorem, with the remark that $E[Y_0] = R(k)$. The proof is complete.

With a similar proof one can obtain other results of the same type. Notice that the additional assumptions that $E[X_n^2 X_{n+k}^2]$ and $E[X_n X_{n+k} X_{n+j} X_{n+j+k}]$ are independent of $n$ are a consequence of the assumption that $(X_n)$ is stationary in the strict sense. Thus, stationarity in the strict sense can be taken as an assumption for several ergodic statements. Recall that wide-sense stationarity plus Gaussianity implies strict-sense stationarity.

Remark 3. There is no way to check ergodic assumptions in applications, so one should believe they hold true. The intuition behind this faith is that they are true when the random variables $X_n$, for very large $n$, become roughly independent of the random variables $X_i$ with small $i$. In the ideal (never true) case when $X_n$ were exactly independent of $X_i$, we would have $\mathrm{Cov}(X_0 X_k, X_n X_{n+k}) = 0$ for given $k$ and very large $n$, and so on. Therefore, when we think that the stochastic process of a certain application loses memory, that is, tends to assume values independent of those at the beginning as time goes to infinity, then we believe the assumptions of the ergodic theorems are satisfied.
