Preliminaries of Probability

1. Transformation of densities

Exercise 1. If X has cdf $F_X(x)$ and g is increasing and continuous, then $Y = g(X)$ has cdf
$$F_Y(y) = F_X(g^{-1}(y))$$
for all y in the image of g. If g is decreasing and continuous, the formula is $F_Y(y) = 1 - F_X(g^{-1}(y))$.
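As a worked illustration (added here; the choice of g is just an example): let X be uniform on $(0,1)$ and $g(x) = -\ln(1-x)$, which is increasing and continuous. Then, for $y > 0$,
$$F_Y(y) = F_X(g^{-1}(y)) = F_X(1 - e^{-y}) = 1 - e^{-y},$$
so Y is exponential with parameter 1; this is the idea behind inverse-transform sampling.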

Exercise 2. If X has continuous pdf $f_X(x)$ and g is increasing and differentiable, then $Y = g(X)$ has pdf
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}$$
for all y in the image of g. If g is decreasing and differentiable, the formula is
$$f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.$$
Thus, in general, we have the following result.

Proposition 1. If g is monotone and differentiable, the transformation of densities is given by
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}.$$
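As a quick illustration (the choice of X and g is just an example): take $X \sim N(0,1)$ and $g(x) = e^x$, which is increasing and differentiable. Then $g^{-1}(y) = \ln y$ and $g'(g^{-1}(y)) = y$, so for $y > 0$
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{x=\ln y} = \frac{1}{y\sqrt{2\pi}}\exp\left(-\frac{(\ln y)^2}{2}\right),$$
the lognormal density.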

Remark 1. Under proper assumptions, when g is not injective the formula generalizes to
$$f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}.$$
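For instance (an added example of the non-injective case): if $X \sim N(0,1)$ and $Y = X^2$, then for $y > 0$ the preimages are $x = \pm\sqrt{y}$ and $|g'(\pm\sqrt{y})| = 2\sqrt{y}$, so
$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2},$$
which is the $\chi^2(1)$ density.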

Remark 2. A second proof of the previous formula comes from the following characterization of the density: f is the density of X if and only if
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx$$
for all continuous bounded functions h. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function h. We have, from the definition of Y and from the characterization applied to X,
$$E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x)) f(x)\, dx.$$
Let us change variable $y = g(x)$, under the assumption that g is monotone, bijective and differentiable. We have $x = g^{-1}(y)$ and
$$dx = \frac{1}{|g'(g^{-1}(y))|}\, dy$$
(we put the absolute value since we do not change the extremes of integration, but just rewrite $\mathbb{R}$), so that
$$\int_{\mathbb{R}} h(g(x)) f(x)\, dx = \int_{\mathbb{R}} h(y)\, f(g^{-1}(y))\, \frac{1}{|g'(g^{-1}(y))|}\, dy.$$
If we set $f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ we have proved that
$$E[h(Y)] = \int_{\mathbb{R}} h(y) f_Y(y)\, dy$$
for every continuous bounded function h. By the characterization, this implies that $f_Y(y)$ is the density of Y. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where Dg is the Jacobian (the matrix of first derivatives) of the transformation $g: \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
$$dx = |\det Dg^{-1}(y)|\, dy = \frac{1}{|\det Dg(g^{-1}(y))|}\, dy.$$
With the same passages performed above, one gets the following result.

Proposition 2. If g is a differentiable bijection and $Y = g(X)$, then
$$f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y=g(x)}.$$
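For example (added for illustration): if g is the linear bijection $g(x) = Ax$ with A invertible, then $Dg(x) = A$ for every x, and the proposition gives
$$f_Y(y) = \frac{f_X(A^{-1}y)}{|\det A|},$$
a special case that reappears below for orthogonal and affine maps.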

Exercise 3. If X (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where U is an orthogonal linear transformation of $\mathbb{R}^n$ (it means that $U^{-1} = U^T$), then Y has density
$$f_Y(y) = f_X(U^T y).$$
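A minimal numerical sketch of this fact (assuming NumPy; the seed, dimension and matrices are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
n = 3

# Build a random orthogonal matrix U via the QR factorization of a Gaussian matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.allclose(U.T @ U, np.eye(n)))        # U^{-1} = U^T

# If X is a standard Gaussian vector, f_X(U^T y) = f_X(y) because |U^T y| = |y|,
# so Y = U X should again look standard Gaussian; check via the sample covariance.
X = rng.standard_normal((100_000, n))         # rows are samples of X
Y = X @ U.T                                   # y_k = U x_k for each sample
print(np.round(np.cov(Y, rowvar=False), 2))   # close to the identity matrix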

1.1. Linear transformation of moments. The solution of the following exercises is based on the linearity of the expected value (and thus of the covariance in each argument).

Exercise 4. Let $X = (X_1, \dots, X_n)$ be a random vector, A a $d \times n$ matrix, $Y = AX$. Let $\mu^X = (\mu^X_1, \dots, \mu^X_n)$ be the vector of mean values of X, namely $\mu^X_i = E[X_i]$. Then $\mu^Y := A\mu^X$ is the vector of mean values of Y, namely $\mu^Y_i = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of X and Y, then
$$Q_Y = A Q_X A^T.$$
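As an added numerical sanity check of Exercises 4 and 5 (assuming NumPy; the mean, covariance and seed below are illustrative choices):

import numpy as np

rng = np.random.default_rng(1)

mu_X = np.array([1.0, -2.0, 0.5])
Q_X = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])            # symmetric, positive definite
A = rng.standard_normal((2, 3))              # a d x n matrix with d = 2, n = 3

X = rng.multivariate_normal(mu_X, Q_X, size=200_000)
Y = X @ A.T                                  # Y = A X, sample by sample

print(np.round(Y.mean(axis=0) - A @ mu_X, 2))                # close to zero
print(np.round(np.cov(Y, rowvar=False) - A @ Q_X @ A.T, 2))  # close to the zero matrix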


2. About covariance matrices

The covariance matrix Q of a vector $X = (X_1, \dots, X_n)$, defined as $Q_{ij} = \mathrm{Cov}(X_i, X_j)$, is symmetric:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = Q_{ji}$$
and non-negative definite:
$$x^T Q x = \sum_{i,j=1}^n Q_{ij} x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j)\, x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(x_i X_i, x_j X_j)$$
$$= \mathrm{Cov}\Big(\sum_{i=1}^n x_i X_i,\ \sum_{j=1}^n x_j X_j\Big) = \mathrm{Var}[W] \ge 0,$$
where $W = \sum_{i=1}^n x_i X_i$.

The spectral theorem states that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ where Q takes the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors.

Since the covariance matrix Q is also non-negative definite, we have
$$\lambda_i \ge 0, \qquad i = 1, \dots, n.$$
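A short numerical illustration of these facts (assuming NumPy; the data below are arbitrary): np.linalg.eigh returns the eigenvalues $\lambda_i$ and an orthonormal matrix of eigenvectors of a symmetric matrix, and for a covariance matrix the eigenvalues come out non-negative.

import numpy as np

rng = np.random.default_rng(2)

# Sample covariance matrix of some 4-dimensional data.
data = rng.standard_normal((1_000, 4)) @ rng.standard_normal((4, 4))
Q = np.cov(data, rowvar=False)

lam, U = np.linalg.eigh(Q)                     # eigenvalues (ascending) and orthonormal eigenvectors
print(lam)                                     # all non-negative, as stated above
print(np.allclose(U @ np.diag(lam) @ U.T, Q))  # Q = U Q^e U^T
print(np.allclose(U.T @ U, np.eye(4)))         # the eigenvectors form an orthonormal basis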

Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers, with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \dots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \dots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \dots, n$. A linear map L in $\mathbb{R}^n$, given the basis $u_1, \dots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j \rangle$.

We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \dots, u_n$, the matrix Q defines a linear transformation L from $\mathbb{R}^n$ to $\mathbb{R}^n$. The spectral theorem states that there is a new orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that, if $Q^e$ represents the linear transformation L in this new basis, then $Q^e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \dots, u_n$, which we call the canonical or original basis. Let $e_1, \dots, e_n$ be another orthonormal basis. The vector $u_1$, in the canonical basis, has components
$$u_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on for the other vectors. Each vector $e_j$ has certain components. Denote by U the matrix whose first column has the same components as $e_1$ (those in the canonical basis), and so on for the other columns. We could write $U = (e_1, \dots, e_n)$. Also, $U_{ij} = e_j^T u_i$. Then
$$U \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1$$
and so on, namely U represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \dots, e_n$. This is an orthogonal transformation:
$$U^{-1} = U^T.$$
Indeed, $U^{-1}$ maps $e_1, \dots, e_n$ into the canonical basis (by the above property of U), and $U^T$ does the same:
$$U^T e_1 = \begin{pmatrix} e_1^T e_1 \\ e_2^T e_1 \\ \vdots \\ e_n^T e_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on.

Remark 7. Let us now go back to the covariance matrix Q and the matrix $Q^e$ given by the spectral theorem: $Q^e$ is a diagonal matrix which represents the same linear transformation L in a new basis $e_1, \dots, e_n$. Assume we know nothing else, except that they describe the same map L and that $Q^e$ is diagonal, namely of the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Let us deduce a number of facts:

i) from basic linear algebra we know the relation $Q^e = U^T Q U$;

ii) the diagonal elements $\lambda_j$ are eigenvalues of L, with eigenvectors $e_j$;

iii) $\lambda_j \ge 0$, $j = 1, \dots, n$.

To prove (ii), let us write the vector $Le_1$ in the basis $e_1, \dots, e_n$: there $e_1$ is the vector $(1, 0, \dots, 0)^T$, the map L is represented by $Q^e$, hence $Le_1$ is equal to
$$Q^e \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
which is $\lambda_1 e_1$ in the basis $e_1, \dots, e_n$. We have checked that $Le_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ is a corresponding eigenvector. The proof for $\lambda_2$, etc., is the same. To prove (iii), just observe that, in the basis $e_1, \dots, e_n$,
$$e_j^T Q^e e_j = \lambda_j.$$
But
$$e_j^T Q^e e_j = e_j^T U^T Q U e_j = v^T Q v \ge 0,$$
where $v = U e_j$, having used the property that Q is non-negative definite. Hence $\lambda_j \ge 0$.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If Z is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$.

We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q (namely $v^T Q v > 0$ for all $v \ne 0$), consider the function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right),$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x-\mu)^T Q^{-1}(x-\mu)$ is a positive quantity (for $x \ne \mu$), and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \dots, X_n)$ is a random vector with such a joint probability density, then $\mu$ is the vector of mean values, namely
$$\mu_i = E[X_i],$$
and Q is the covariance matrix:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j).$$
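As a numerical cross-check (a sketch assuming NumPy and SciPy are available; the values of $\mu$, Q and x are illustrative), the formula above can be evaluated directly and compared with SciPy's multivariate normal density:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric, positive definite

def gaussian_density(x, mu, Q):
    """Density of N(mu, Q), written exactly as in Lemma 1."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Q) @ diff
    return np.exp(-quad / 2.0) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Q))

x = np.array([0.3, 0.7])
print(gaussian_density(x, mu, Q))
print(multivariate_normal(mean=mu, cov=Q).pdf(x))   # the two values agree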


Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We recalled above that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ where Q takes the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let U be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q^e = U^T Q U$.

Since $v^T Q v > 0$ for all $v \ne 0$, we deduce
$$v^T Q^e v = (Uv)^T Q\, (Uv) > 0$$
for all $v \ne 0$ (since $Uv \ne 0$). Taking $v = e_i$, we get $\lambda_i > 0$. Therefore the matrix $Q^e$ is invertible, with inverse given by
$$(Q^e)^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.$$
It follows that also Q, being equal to $U Q^e U^T$ (the relation $Q = U Q^e U^T$ comes from $Q^e = U^T Q U$), is invertible, with inverse $Q^{-1} = U (Q^e)^{-1} U^T$. Easily one gets $(x-\mu)^T Q^{-1}(x-\mu) > 0$ for $x \ne \mu$. Moreover,
$$\det(Q) = \det(U) \det(Q^e) \det(U^T) = \lambda_1 \cdots \lambda_n$$
because
$$\det(Q^e) = \lambda_1 \cdots \lambda_n$$
and $\det(U)\det(U^T) = \det(U)^2 = 1$. The latter property comes from
$$1 = \det I = \det(U^T U) = \det(U^T)\det(U) = \det(U)^2$$
(to be used also in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.

Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables $x = Uy$,
$$\int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f(Uy)\, dy$$
because $|\det U| = 1$ (and the Jacobian of a linear transformation is the linear map itself). Now, since $U^T Q^{-1} U = (Q^e)^{-1}$, $f(Uy)$ is equal to the following function:
$$f^e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q^e)}} \exp\left(-\frac{(y-\mu^e)^T (Q^e)^{-1} (y-\mu^e)}{2}\right),$$
where
$$\mu^e = U^T \mu.$$
Since
$$(y-\mu^e)^T (Q^e)^{-1} (y-\mu^e) = \sum_{i=1}^n \frac{(y_i - \mu^e_i)^2}{\lambda_i}$$
and $\det(Q^e) = \lambda_1 \cdots \lambda_n$, we get
$$f^e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - \mu^e_i)^2}{2\lambda_i}\right).$$
Namely, $f^e(y)$ is the product of n Gaussian densities $N(\mu^e_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f^e(y)$ is a probability density. Therefore $\int_{\mathbb{R}^n} f^e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that f is a probability density.

Step 3. Let $X = (X_1, \dots, X_n)$ be a random vector with joint probability density f, when written in the original basis. Let $Y = U^T X$. Then (Exercise 3) Y has density $f_Y(y)$ given by $f_Y(y) = f(Uy)$. Thus
$$f_Y(y) = f^e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - \mu^e_i)^2}{2\lambda_i}\right).$$
Thus $(Y_1, \dots, Y_n)$ are independent $N(\mu^e_i, \lambda_i)$ r.v.'s and therefore
$$E[Y_i] = \mu^e_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij}\lambda_i.$$
From Exercises 4 and 5 we deduce that $X = UY$ has mean $\mu^X = U\mu^Y$ and covariance
$$Q_X = U Q_Y U^T.$$
Since $\mu^Y = \mu^e$ and $\mu^e = U^T\mu$ we readily deduce $\mu^X = U U^T \mu = \mu$. Since $Q_Y = Q^e$ and $Q = U Q^e U^T$ we get $Q_X = Q$. The proof is complete.

Definition 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q, we call Gaussian vector of mean $\mu$ and covariance Q a random vector $X = (X_1, \dots, X_n)$ having joint probability density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right),$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.

The only drawback of this definition is the restriction to strictly positive definite matrices Q. It is sometimes useful to have the notion of Gaussian vector also in the case when Q is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.

Definition 2. i) The standard d-dimensional Gaussian vector is the random vector $Z = (Z_1, \dots, Z_d)$ with joint probability density
$$f(z_1, \dots, z_d) = \prod_{i=1}^d p(z_i), \qquad \text{where } p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$

ii) All other Gaussian vectors $X = (X_1, \dots, X_n)$ (in any dimension n) are obtained from standard ones by affine transformations:
$$X = AZ + b,$$
where A is a matrix and b is a vector. If X has dimension n, we require A to be $n \times d$ and b to have dimension n (but n can be different from d).

The graph of the density of a standard 2-dimensional Gaussian vector is a bell-shaped surface centered at the origin. [Figure: surface plot of the standard 2-dimensional Gaussian density.] The graphs of the other Gaussian vectors can be guessed by linear deformations of the base plane xy (deformations defined by A) and a shift (by b). For instance, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$
a matrix which stretches the x axis by a factor 2, we get a surface elongated along the x direction. [Figure: surface plot of the corresponding stretched Gaussian density.]

First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with Z of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix Q of a vector X of the previous form are given by
$$\mu = b, \qquad Q = AA^T.$$
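A quick numerical check of this proposition (assuming NumPy; A, b and the seed are illustrative), using a rectangular A so that d differs from n:

import numpy as np

rng = np.random.default_rng(3)

A = rng.standard_normal((3, 2))              # n x d with n = 3, d = 2
b = np.array([1.0, 0.0, -2.0])

Z = rng.standard_normal((200_000, 2))        # standard 2-dimensional Gaussian samples
X = Z @ A.T + b                              # X = A Z + b, sample by sample

print(np.round(X.mean(axis=0), 2))                     # close to b
print(np.round(np.cov(X, rowvar=False) - A @ A.T, 2))  # close to the zero matrix

Note that here $Q = AA^T$ has rank at most $d = 2 < n = 3$, an example of the degenerate case mentioned after Definition 1.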

When two different definitions are given for the same object, one has to prove their equivalence. If Q is positive definite, the two definitions aim to describe the same object; but for Q non-negative definite and not strictly positive definite, we only have the second definition, so there is no compatibility to check.

Proposition 4. If Q is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance Q in the sense of Definition 1, then there exist a standard Gaussian random vector $Z = (Z_1, \dots, Z_n)$ and an $n \times n$ matrix A such that
$$X = AZ + \mu.$$
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then X is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance Q given by the previous proposition.

Proof. Let us prove the first claim. Let us define $\sqrt{Q} = U\sqrt{Q^e}\, U^T$, where $\sqrt{Q^e}$ is simply defined as
$$\sqrt{Q^e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.$$
We have
$$\left(\sqrt{Q}\right)^T = U \left(\sqrt{Q^e}\right)^T U^T = U\sqrt{Q^e}\, U^T = \sqrt{Q}$$
and
$$\left(\sqrt{Q}\right)^2 = U\sqrt{Q^e}\, U^T U\sqrt{Q^e}\, U^T = U\sqrt{Q^e}\sqrt{Q^e}\, U^T = U Q^e U^T = Q$$
because $\sqrt{Q^e}\sqrt{Q^e} = Q^e$. Set
$$Z = \left(\sqrt{Q}\right)^{-1}(X - \mu),$$
where we notice that $\sqrt{Q}$ is invertible, by its definition and the strict positivity of the $\lambda_i$. Then Z is Gaussian. Indeed, from the formula for the transformation of densities,
$$f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)},$$
where $g(x) = \left(\sqrt{Q}\right)^{-1}(x - \mu)$; hence $\det Dg(x) = \det\left(\sqrt{Q}\right)^{-1} = \frac{1}{\sqrt{\lambda_1}\cdots\sqrt{\lambda_n}}$; therefore
$$f_Z(z) = \prod_{i=1}^n \sqrt{\lambda_i} \cdot \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{\left(\sqrt{Q}z + \mu - \mu\right)^T Q^{-1}\left(\sqrt{Q}z + \mu - \mu\right)}{2}\right)$$
$$= \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{\left(\sqrt{Q}z\right)^T Q^{-1}\sqrt{Q}z}{2}\right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{z^T z}{2}\right),$$
which is the density of a standard Gaussian vector. From the definition of Z we get $X = \sqrt{Q}\,Z + \mu$, so the first claim is proved.

The proof of the second claim is a particular case of the next exercise, that we leave to the reader.

Exercise 6. Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector, B an $m \times n$ matrix, c a vector of $\mathbb{R}^m$. Then
$$Y = BX + c$$
is a Gaussian random vector of dimension m. The relations between means and covariances are
$$\mu^Y = B\mu^X + c, \qquad Q_Y = B Q_X B^T.$$

Remark 8. We see from the exercise that we may start with a non-degenerate vector X and get a degenerate one Y, if B is not a bijection. This always happens when m > n.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean $\mu$ and covariance Q, n-dimensional, and want to generate a random sample $(x_1, \dots, x_n)$ from such an $N(\mu, Q)$. Then we may generate n independent samples $z_1, \dots, z_n$ from the standard one-dimensional Gaussian law and compute
$$\sqrt{Q}\, z + \mu,$$
where $z = (z_1, \dots, z_n)$. In order to obtain the entries of the matrix $\sqrt{Q}$, if the software does not provide them directly (some software does), we may use the formula $\sqrt{Q} = U\sqrt{Q^e}\, U^T$. The matrix $\sqrt{Q^e}$ is obvious. In order to get the matrix U, recall that its columns are the vectors $e_1, \dots, e_n$ written in the original basis, and such vectors are an orthonormal basis of eigenvectors of Q. Thus one has to use at least a software package that computes the spectral decomposition of a matrix, to get $e_1, \dots, e_n$.
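A minimal sketch of this recipe (assuming NumPy; the values of $\mu$, Q and the seed are illustrative):

import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -1.0, 0.5])
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.8]])              # symmetric, positive definite

# Spectral decomposition Q = U Q^e U^T: the columns of U are the eigenvectors e_1, ..., e_n.
lam, U = np.linalg.eigh(Q)
sqrt_Q = U @ np.diag(np.sqrt(lam)) @ U.T     # sqrt(Q) = U sqrt(Q^e) U^T

# Generate samples x = sqrt(Q) z + mu with z standard Gaussian.
Z = rng.standard_normal((200_000, 3))
X = Z @ sqrt_Q.T + mu                        # sqrt_Q is symmetric; the transpose is kept for clarity

print(np.round(X.mean(axis=0), 2))           # close to mu
print(np.round(np.cov(X, rowvar=False), 2))  # close to Q

Library routines such as NumPy's own multivariate_normal perform an equivalent factorization internally; the point of the sketch is only to mirror the formula $\sqrt{Q} = U\sqrt{Q^e}\, U^T$ step by step.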
