Preliminaries of Probability

1. Transformation of densities

Exercise 1. If X has cdf $F_X(x)$ and g is increasing and continuous, then $Y = g(X)$ has cdf
$$F_Y(y) = F_X(g^{-1}(y))$$
for all y in the image of g. If g is decreasing and continuous, the formula is $F_Y(y) = 1 - F_X(g^{-1}(y))$.
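As a worked illustration (added here; the choice of g is just an example): let X be uniform on $(0,1)$ and $g(x) = -\ln(1-x)$, which is increasing and continuous. Then, for $y > 0$,
$$F_Y(y) = F_X(g^{-1}(y)) = F_X(1 - e^{-y}) = 1 - e^{-y},$$
so Y is exponential with parameter 1; this is the idea behind inverse-transform sampling.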

Exercise 2. If X has continuous pdf $f_X(x)$ and g is increasing and differentiable, then $Y = g(X)$ has pdf
$$f_Y(y) = \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))} = \left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}$$
for all y in the image of g. If g is decreasing and differentiable, the formula is
$$f_Y(y) = -\left.\frac{f_X(x)}{g'(x)}\right|_{y=g(x)}.$$
Thus, in general, we have the following result.

Proposition 1. If g is monotone and differentiable, the transformation of densities is given by
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}.$$
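As a quick illustration (the choice of X and g is just an example): take $X \sim N(0,1)$ and $g(x) = e^x$, which is increasing and differentiable. Then $g^{-1}(y) = \ln y$ and $g'(g^{-1}(y)) = y$, so for $y > 0$
$$f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{x=\ln y} = \frac{1}{y\sqrt{2\pi}}\exp\left(-\frac{(\ln y)^2}{2}\right),$$
the lognormal density.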

Remark 1. Under proper assumptions, when g is not injective the formula generalizes to
$$f_Y(y) = \sum_{x:\, y = g(x)} \frac{f_X(x)}{|g'(x)|}.$$
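For instance (an added example of the non-injective case): if $X \sim N(0,1)$ and $Y = X^2$, then for $y > 0$ the preimages are $x = \pm\sqrt{y}$ and $|g'(\pm\sqrt{y})| = 2\sqrt{y}$, so
$$f_Y(y) = \frac{f_X(\sqrt{y}) + f_X(-\sqrt{y})}{2\sqrt{y}} = \frac{1}{\sqrt{2\pi y}}\, e^{-y/2},$$
which is the $\chi^2(1)$ density.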

Remark 2. A second proof of the previous formula comes from the following characterization of the density: f is the density of X if and only if
$$E[h(X)] = \int_{\mathbb{R}} h(x) f(x)\, dx$$
for all continuous bounded functions h. Let us use this fact to prove that $f_Y(y) = \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ is the density of $Y = g(X)$. Let us compute $E[h(Y)]$ for a generic continuous bounded function h. We have, from the definition of Y and from the characterization applied to X,
$$E[h(Y)] = E[h(g(X))] = \int_{\mathbb{R}} h(g(x)) f(x)\, dx.$$
Let us change variable $y = g(x)$, under the assumption that g is monotone, bijective and differentiable. We have $x = g^{-1}(y)$ and
$$dx = \frac{1}{|g'(g^{-1}(y))|}\, dy$$
(we put the absolute value since we do not change the extremes of integration, but just rewrite $\mathbb{R}$), so that
$$\int_{\mathbb{R}} h(g(x)) f(x)\, dx = \int_{\mathbb{R}} h(y)\, f(g^{-1}(y))\, \frac{1}{|g'(g^{-1}(y))|}\, dy.$$
If we set $f_Y(y) := \left.\frac{f_X(x)}{|g'(x)|}\right|_{y=g(x)}$ we have proved that
$$E[h(Y)] = \int_{\mathbb{R}} h(y) f_Y(y)\, dy$$
for every continuous bounded function h. By the characterization, this implies that $f_Y(y)$ is the density of Y. This proof is thus based on the change of variable formula.

Remark 3. The same proof works in the multidimensional case, using the change of variable formula for multiple integrals. Recall that in place of $dy = g'(x)\, dx$ one has to use $dy = |\det Dg(x)|\, dx$, where Dg is the Jacobian (the matrix of first derivatives) of the transformation $g: \mathbb{R}^n \to \mathbb{R}^n$. In fact we need the inverse transformation, so we use the corresponding formula
$$dx = |\det Dg^{-1}(y)|\, dy = \frac{1}{|\det Dg(g^{-1}(y))|}\, dy.$$
With the same passages performed above, one gets the following result.

Proposition 2. If g is a differentiable bijection and $Y = g(X)$, then
$$f_Y(y) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{y=g(x)}.$$
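For example (added for illustration): if g is the linear bijection $g(x) = Ax$ with A invertible, then $Dg(x) = A$ for every x, and the proposition gives
$$f_Y(y) = \frac{f_X(A^{-1}y)}{|\det A|},$$
a special case that reappears below for orthogonal and affine maps.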

Exercise 3. If X (in $\mathbb{R}^n$) has density $f_X(x)$ and $Y = UX$, where U is an orthogonal linear transformation of $\mathbb{R}^n$ (it means that $U^{-1} = U^T$), then Y has density
$$f_Y(y) = f_X(U^T y).$$
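A minimal numerical sketch of this fact (assuming NumPy; the seed, dimension and matrices are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
n = 3

# Build a random orthogonal matrix U via the QR factorization of a Gaussian matrix.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
print(np.allclose(U.T @ U, np.eye(n)))        # U^{-1} = U^T

# If X is a standard Gaussian vector, f_X(U^T y) = f_X(y) because |U^T y| = |y|,
# so Y = U X should again look standard Gaussian; check via the sample covariance.
X = rng.standard_normal((100_000, n))         # rows are samples of X
Y = X @ U.T                                   # y_k = U x_k for each sample
print(np.round(np.cov(Y, rowvar=False), 2))   # close to the identity matrix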

1.1. Linear transformation of moments. The solution of the following exercises is based on the linearity of the expected value (and thus of the covariance in each argument).

Exercise 4. Let $X = (X_1, \dots, X_n)$ be a random vector, A a $d \times n$ matrix, $Y = AX$. Let $\mu^X = (\mu^X_1, \dots, \mu^X_n)$ be the vector of mean values of X, namely $\mu^X_i = E[X_i]$. Then $\mu^Y := A\mu^X$ is the vector of mean values of Y, namely $\mu^Y_i = E[Y_i]$.

Exercise 5. Under the same assumptions, if $Q_X$ and $Q_Y$ are the covariance matrices of X and Y, then
$$Q_Y = A Q_X A^T.$$
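As an added numerical sanity check of Exercises 4 and 5 (assuming NumPy; the mean, covariance and seed below are illustrative choices):

import numpy as np

rng = np.random.default_rng(1)

mu_X = np.array([1.0, -2.0, 0.5])
Q_X = np.array([[2.0, 0.3, 0.0],
                [0.3, 1.0, 0.2],
                [0.0, 0.2, 0.5]])            # symmetric, positive definite
A = rng.standard_normal((2, 3))              # a d x n matrix with d = 2, n = 3

X = rng.multivariate_normal(mu_X, Q_X, size=200_000)
Y = X @ A.T                                  # Y = A X, sample by sample

print(np.round(Y.mean(axis=0) - A @ mu_X, 2))                # close to zero
print(np.round(np.cov(Y, rowvar=False) - A @ Q_X @ A.T, 2))  # close to the zero matrix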


2. About covariance matrices

The covariance matrix Q of a vector $X = (X_1, \dots, X_n)$, defined as $Q_{ij} = \mathrm{Cov}(X_i, X_j)$, is symmetric:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = Q_{ji}$$
and non-negative definite:
$$x^T Q x = \sum_{i,j=1}^n Q_{ij} x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(X_i, X_j)\, x_i x_j = \sum_{i,j=1}^n \mathrm{Cov}(x_i X_i, x_j X_j)$$
$$= \mathrm{Cov}\Big(\sum_{i=1}^n x_i X_i,\ \sum_{j=1}^n x_j X_j\Big) = \mathrm{Var}[W] \ge 0,$$
where $W = \sum_{i=1}^n x_i X_i$.

The spectral theorem states that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ where Q takes the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors.

Since the covariance matrix Q is also non-negative definite, we have
$$\lambda_i \ge 0, \qquad i = 1, \dots, n.$$
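A short numerical illustration of these facts (assuming NumPy; the data below are arbitrary): np.linalg.eigh returns the eigenvalues $\lambda_i$ and an orthonormal matrix of eigenvectors of a symmetric matrix, and for a covariance matrix the eigenvalues come out non-negative.

import numpy as np

rng = np.random.default_rng(2)

# Sample covariance matrix of some 4-dimensional data.
data = rng.standard_normal((1_000, 4)) @ rng.standard_normal((4, 4))
Q = np.cov(data, rowvar=False)

lam, U = np.linalg.eigh(Q)                     # eigenvalues (ascending) and orthonormal eigenvectors
print(lam)                                     # all non-negative, as stated above
print(np.allclose(U @ np.diag(lam) @ U.T, Q))  # Q = U Q^e U^T
print(np.allclose(U.T @ U, np.eye(4)))         # the eigenvectors form an orthonormal basis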

Remark 4. To understand this theorem better, recall a few facts of linear algebra. $\mathbb{R}^n$ is a vector space with a scalar product $\langle \cdot, \cdot \rangle$, namely a set of elements (called vectors) with certain operations (sum of vectors, multiplication by real numbers, scalar product between vectors) and properties. We may call intrinsic the objects defined in these terms, as opposed to the objects defined by means of numbers, with respect to a given basis. A vector $x \in \mathbb{R}^n$ is an intrinsic object; but we can write it as a sequence of numbers $(x_1, \dots, x_n)$ in infinitely many ways, depending on the basis we choose. Given an orthonormal basis $u_1, \dots, u_n$, the components of a vector $x \in \mathbb{R}^n$ in this basis are the numbers $\langle x, u_j \rangle$, $j = 1, \dots, n$. A linear map L in $\mathbb{R}^n$, given the basis $u_1, \dots, u_n$, can be represented by the matrix of components $\langle L u_i, u_j \rangle$.

We shall write $y^T x$ for $\langle x, y \rangle$ (or $\langle y, x \rangle$).

Remark 5. After these general comments, we see that a matrix represents a linear transformation, given a basis. Thus, given the canonical basis of $\mathbb{R}^n$, which we shall denote by $u_1, \dots, u_n$, the matrix Q defines a linear transformation L from $\mathbb{R}^n$ to $\mathbb{R}^n$. The spectral theorem states that there is a new orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ such that, if $Q^e$ represents the linear transformation L in this new basis, then $Q^e$ is diagonal.

Remark 6. Let us recall more facts about linear algebra. Start with an orthonormal basis $u_1, \dots, u_n$, which we call the canonical or original basis. Let $e_1, \dots, e_n$ be another orthonormal basis. The vector $u_1$, in the canonical basis, has components
$$u_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on for the other vectors. Each vector $e_j$ has certain components. Denote by U the matrix whose first column has the same components as $e_1$ (those in the canonical basis), and so on for the other columns. We could write $U = (e_1, \dots, e_n)$. Also, $U_{ij} = e_j^T u_i$. Then
$$U \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = e_1$$
and so on, namely U represents the linear map which maps the canonical (original) basis of $\mathbb{R}^n$ into $e_1, \dots, e_n$. This is an orthogonal transformation:
$$U^{-1} = U^T.$$
Indeed, $U^{-1}$ maps $e_1, \dots, e_n$ into the canonical basis (by the above property of U), and $U^T$ does the same:
$$U^T e_1 = \begin{pmatrix} e_1^T e_1 \\ e_2^T e_1 \\ \vdots \\ e_n^T e_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}$$
and so on.

Remark 7. Let us now go back to the covariance matrix Q and the matrix $Q^e$ given by the spectral theorem: $Q^e$ is a diagonal matrix which represents the same linear transformation L in a new basis $e_1, \dots, e_n$. Assume we know nothing else, except that they describe the same map L and that $Q^e$ is diagonal, namely of the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Let us deduce a number of facts:

i) from basic linear algebra we know the relation $Q^e = U^T Q U$;

ii) the diagonal elements $\lambda_j$ are eigenvalues of L, with eigenvectors $e_j$;

iii) $\lambda_j \ge 0$, $j = 1, \dots, n$.

To prove (ii), let us write the vector $Le_1$ in the basis $e_1, \dots, e_n$: there $e_1$ is the vector $(1, 0, \dots, 0)^T$, the map L is represented by $Q^e$, hence $Le_1$ is equal to
$$Q^e \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} \lambda_1 \\ 0 \\ \vdots \\ 0 \end{pmatrix} = \lambda_1 \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
which is $\lambda_1 e_1$ in the basis $e_1, \dots, e_n$. We have checked that $Le_1 = \lambda_1 e_1$, namely that $\lambda_1$ is an eigenvalue and $e_1$ is a corresponding eigenvector. The proof for $\lambda_2$, etc., is the same. To prove (iii), just observe that, in the basis $e_1, \dots, e_n$,
$$e_j^T Q^e e_j = \lambda_j.$$
But
$$e_j^T Q^e e_j = e_j^T U^T Q U e_j = v^T Q v \ge 0,$$
where $v = U e_j$, having used the property that Q is non-negative definite. Hence $\lambda_j \ge 0$.

3. Gaussian vectors

Recall that a Gaussian, or Normal, r.v. $N(\mu, \sigma^2)$ is a r.v. with probability density
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$
We have shown that $\mu$ is the mean value and $\sigma^2$ the variance. The standard Normal is the case $\mu = 0$, $\sigma^2 = 1$. If Z is a standard normal r.v., then $\mu + \sigma Z$ is $N(\mu, \sigma^2)$.

We may give the definition of Gaussian vector in two ways, generalizing either the expression of the density or the property that $\mu + \sigma Z$ is $N(\mu, \sigma^2)$. Let us start with a lemma.

Lemma 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q (namely $v^T Q v > 0$ for all $v \ne 0$), consider the function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right),$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. Notice that the inverse $Q^{-1}$ is well defined for positive definite matrices, $(x-\mu)^T Q^{-1}(x-\mu)$ is a positive quantity (for $x \ne \mu$), and $\det(Q)$ is a positive number. Then:

i) $f(x)$ is a probability density;

ii) if $X = (X_1, \dots, X_n)$ is a random vector with such a joint probability density, then $\mu$ is the vector of mean values, namely
$$\mu_i = E[X_i],$$
and Q is the covariance matrix:
$$Q_{ij} = \mathrm{Cov}(X_i, X_j).$$
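As a numerical cross-check (a sketch assuming NumPy and SciPy are available; the values of $\mu$, Q and x are illustrative), the formula above can be evaluated directly and compared with SciPy's multivariate normal density:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])        # symmetric, positive definite

def gaussian_density(x, mu, Q):
    """Density of N(mu, Q), written exactly as in Lemma 1."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Q) @ diff
    return np.exp(-quad / 2.0) / np.sqrt((2 * np.pi) ** n * np.linalg.det(Q))

x = np.array([0.3, 0.7])
print(gaussian_density(x, mu, Q))
print(multivariate_normal(mean=mu, cov=Q).pdf(x))   # the two values agree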


Proof. Step 1. In this step we explain the meaning of the expression $f(x)$. We recalled above that any symmetric matrix Q can be diagonalized, namely there exists an orthonormal basis $e_1, \dots, e_n$ of $\mathbb{R}^n$ where Q takes the form
$$Q^e = \begin{pmatrix} \lambda_1 & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n \end{pmatrix}.$$
Moreover, the numbers $\lambda_i$ are eigenvalues of Q, and the vectors $e_i$ are corresponding eigenvectors. See above for more details. Let U be the matrix introduced there, such that $U^{-1} = U^T$. Recall the relation $Q^e = U^T Q U$.

Since $v^T Q v > 0$ for all $v \ne 0$, we deduce
$$v^T Q^e v = (Uv)^T Q\, (Uv) > 0$$
for all $v \ne 0$ (since $Uv \ne 0$). Taking $v = e_i$, we get $\lambda_i > 0$. Therefore the matrix $Q^e$ is invertible, with inverse given by
$$(Q^e)^{-1} = \begin{pmatrix} \lambda_1^{-1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \lambda_n^{-1} \end{pmatrix}.$$
It follows that also Q, being equal to $U Q^e U^T$ (the relation $Q = U Q^e U^T$ comes from $Q^e = U^T Q U$), is invertible, with inverse $Q^{-1} = U (Q^e)^{-1} U^T$. Easily one gets $(x-\mu)^T Q^{-1}(x-\mu) > 0$ for $x \ne \mu$. Moreover,
$$\det(Q) = \det(U) \det(Q^e) \det(U^T) = \lambda_1 \cdots \lambda_n$$
because
$$\det(Q^e) = \lambda_1 \cdots \lambda_n$$
and $\det(U)\det(U^T) = \det(U)^2 = 1$. The latter property comes from
$$1 = \det I = \det(U^T U) = \det(U^T)\det(U) = \det(U)^2$$
(to be used also in Exercise 3). Therefore $\det(Q) > 0$. The formula for $f(x)$ is meaningful and defines a positive function.

Step 2. Let us prove that $f(x)$ is a density. By the theorem of change of variables in multidimensional integrals, with the change of variables $x = Uy$,
$$\int_{\mathbb{R}^n} f(x)\, dx = \int_{\mathbb{R}^n} f(Uy)\, dy$$
because $|\det U| = 1$ (and the Jacobian of a linear transformation is the linear map itself). Now, since $U^T Q^{-1} U = (Q^e)^{-1}$, $f(Uy)$ is equal to the following function:
$$f^e(y) = \frac{1}{\sqrt{(2\pi)^n \det(Q^e)}} \exp\left(-\frac{(y-\mu^e)^T (Q^e)^{-1} (y-\mu^e)}{2}\right),$$
where
$$\mu^e = U^T \mu.$$
Since
$$(y-\mu^e)^T (Q^e)^{-1} (y-\mu^e) = \sum_{i=1}^n \frac{(y_i - \mu^e_i)^2}{\lambda_i}$$
and $\det(Q^e) = \lambda_1 \cdots \lambda_n$, we get
$$f^e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - \mu^e_i)^2}{2\lambda_i}\right).$$
Namely, $f^e(y)$ is the product of n Gaussian densities $N(\mu^e_i, \lambda_i)$. We know from the theory of joint probability densities that the product of densities is the joint density of a vector with independent components. Hence $f^e(y)$ is a probability density. Therefore $\int_{\mathbb{R}^n} f^e(y)\, dy = 1$. This proves $\int_{\mathbb{R}^n} f(x)\, dx = 1$, so that f is a probability density.

Step 3. Let $X = (X_1, \dots, X_n)$ be a random vector with joint probability density f, when written in the original basis. Let $Y = U^T X$. Then (Exercise 3) Y has density $f_Y(y)$ given by $f_Y(y) = f(Uy)$. Thus
$$f_Y(y) = f^e(y) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\lambda_i}} \exp\left(-\frac{(y_i - \mu^e_i)^2}{2\lambda_i}\right).$$
Thus $(Y_1, \dots, Y_n)$ are independent $N(\mu^e_i, \lambda_i)$ r.v.'s and therefore
$$E[Y_i] = \mu^e_i, \qquad \mathrm{Cov}(Y_i, Y_j) = \delta_{ij}\lambda_i.$$
From Exercises 4 and 5 we deduce that $X = UY$ has mean $\mu^X = U\mu^Y$ and covariance
$$Q_X = U Q_Y U^T.$$
Since $\mu^Y = \mu^e$ and $\mu^e = U^T\mu$ we readily deduce $\mu^X = U U^T \mu = \mu$. Since $Q_Y = Q^e$ and $Q = U Q^e U^T$ we get $Q_X = Q$. The proof is complete.

Definition 1. Given a vector $\mu = (\mu_1, \dots, \mu_n)$ and a symmetric positive definite $n \times n$ matrix Q, we call Gaussian vector of mean $\mu$ and covariance Q a random vector $X = (X_1, \dots, X_n)$ having joint probability density function
$$f(x) = \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{(x-\mu)^T Q^{-1} (x-\mu)}{2}\right),$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$. We write $X \sim N(\mu, Q)$.

The only drawback of this definition is the restriction to strictly positive definite matrices Q. It is sometimes useful to have the notion of Gaussian vector also in the case when Q is only non-negative definite (sometimes called the degenerate case). For instance, we shall see that any linear transformation of a Gaussian vector is a Gaussian vector, but in order to state this theorem in full generality we need to consider also the degenerate case. In order to give a more general definition, let us take the idea recalled above for the 1-dimensional case: affine transformations of Gaussian r.v.'s are Gaussian.

Definition 2. i) The standard d-dimensional Gaussian vector is the random vector $Z = (Z_1, \dots, Z_d)$ with joint probability density
$$f(z_1, \dots, z_d) = \prod_{i=1}^d p(z_i), \qquad \text{where } p(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}.$$

ii) All other Gaussian vectors $X = (X_1, \dots, X_n)$ (in any dimension n) are obtained from standard ones by affine transformations:
$$X = AZ + b,$$
where A is a matrix and b is a vector. If X has dimension n, we require A to be $n \times d$ and b to have dimension n (but n can be different from d).

The graph of the density of a standard 2-dimensional Gaussian vector is a bell-shaped surface centered at the origin. [Figure: surface plot of the standard 2-dimensional Gaussian density.] The graphs of the other Gaussian vectors can be guessed by linear deformations of the base plane xy (deformations defined by A) and a shift (by b). For instance, if
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$
a matrix which stretches the x axis by a factor 2, we get a surface elongated along the x direction. [Figure: surface plot of the corresponding stretched Gaussian density.]

First, let us compute the mean and covariance matrix of a vector of the form $X = AZ + b$, with Z of standard type. From Exercises 4 and 5 we readily have:

Proposition 3. The mean $\mu$ and covariance matrix Q of a vector X of the previous form are given by
$$\mu = b, \qquad Q = AA^T.$$
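A quick numerical check of this proposition (assuming NumPy; A, b and the seed are illustrative), using a rectangular A so that d differs from n:

import numpy as np

rng = np.random.default_rng(3)

A = rng.standard_normal((3, 2))              # n x d with n = 3, d = 2
b = np.array([1.0, 0.0, -2.0])

Z = rng.standard_normal((200_000, 2))        # standard 2-dimensional Gaussian samples
X = Z @ A.T + b                              # X = A Z + b, sample by sample

print(np.round(X.mean(axis=0), 2))                     # close to b
print(np.round(np.cov(X, rowvar=False) - A @ A.T, 2))  # close to the zero matrix

Note that here $Q = AA^T$ has rank at most $d = 2 < n = 3$, an example of the degenerate case mentioned after Definition 1.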

When two different definitions are given for the same object, one has to prove their equivalence. If Q is positive definite, the two definitions aim to describe the same object; but for Q non-negative definite and not strictly positive definite, we only have the second definition, so there is no compatibility to check.

Proposition 4. If Q is positive definite, then Definitions 1 and 2 are equivalent. More precisely, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector with mean $\mu$ and covariance Q in the sense of Definition 1, then there exist a standard Gaussian random vector $Z = (Z_1, \dots, Z_n)$ and an $n \times n$ matrix A such that
$$X = AZ + \mu.$$
One can take $A = \sqrt{Q}$, as described in the proof. Vice versa, if $X = (X_1, \dots, X_n)$ is a Gaussian random vector in the sense of Definition 2, of the form $X = AZ + b$, then X is Gaussian in the sense of Definition 1, with mean $\mu$ and covariance Q given by the previous proposition.

Proof. Let us prove the first claim. Let us define $\sqrt{Q} = U\sqrt{Q^e}\, U^T$, where $\sqrt{Q^e}$ is simply defined as
$$\sqrt{Q^e} = \begin{pmatrix} \sqrt{\lambda_1} & 0 & 0 \\ 0 & \ddots & 0 \\ 0 & 0 & \sqrt{\lambda_n} \end{pmatrix}.$$
We have
$$\left(\sqrt{Q}\right)^T = U \left(\sqrt{Q^e}\right)^T U^T = U\sqrt{Q^e}\, U^T = \sqrt{Q}$$
and
$$\left(\sqrt{Q}\right)^2 = U\sqrt{Q^e}\, U^T U\sqrt{Q^e}\, U^T = U\sqrt{Q^e}\sqrt{Q^e}\, U^T = U Q^e U^T = Q$$
because $\sqrt{Q^e}\sqrt{Q^e} = Q^e$. Set
$$Z = \left(\sqrt{Q}\right)^{-1}(X - \mu),$$
where we notice that $\sqrt{Q}$ is invertible, by its definition and the strict positivity of the $\lambda_i$. Then Z is Gaussian. Indeed, from the formula for the transformation of densities,
$$f_Z(z) = \left.\frac{f_X(x)}{|\det Dg(x)|}\right|_{z=g(x)},$$
where $g(x) = \left(\sqrt{Q}\right)^{-1}(x - \mu)$; hence $\det Dg(x) = \det\left(\sqrt{Q}\right)^{-1} = \frac{1}{\sqrt{\lambda_1}\cdots\sqrt{\lambda_n}}$; therefore
$$f_Z(z) = \prod_{i=1}^n \sqrt{\lambda_i} \cdot \frac{1}{\sqrt{(2\pi)^n \det(Q)}} \exp\left(-\frac{\left(\sqrt{Q}z + \mu - \mu\right)^T Q^{-1}\left(\sqrt{Q}z + \mu - \mu\right)}{2}\right)$$
$$= \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{\left(\sqrt{Q}z\right)^T Q^{-1}\sqrt{Q}z}{2}\right) = \frac{1}{\sqrt{(2\pi)^n}} \exp\left(-\frac{z^T z}{2}\right),$$
which is the density of a standard Gaussian vector. From the definition of Z we get $X = \sqrt{Q}\,Z + \mu$, so the first claim is proved.

The proof of the second claim is a particular case of the next exercise, that we leave to the reader.

Exercise 6. Let $X = (X_1, \dots, X_n)$ be a Gaussian random vector, B an $m \times n$ matrix, c a vector of $\mathbb{R}^m$. Then
$$Y = BX + c$$
is a Gaussian random vector of dimension m. The relations between means and covariances are
$$\mu^Y = B\mu^X + c, \qquad Q_Y = B Q_X B^T.$$

Remark 8. We see from the exercise that we may start with a non-degenerate vector X and get a degenerate one Y, if B is not a bijection. This always happens when m > n.

Remark 9. The law of a Gaussian vector is determined by the mean vector and the covariance matrix. This fundamental fact will be used below when we study stochastic processes.

Remark 10. Some of the previous results are very useful if we want to generate random vectors according to a prescribed Gaussian law. Assume we have a prescribed mean $\mu$ and covariance Q, n-dimensional, and want to generate a random sample $(x_1, \dots, x_n)$ from such an $N(\mu, Q)$. Then we may generate n independent samples $z_1, \dots, z_n$ from the standard one-dimensional Gaussian law and compute
$$\sqrt{Q}\, z + \mu,$$
where $z = (z_1, \dots, z_n)$. In order to obtain the entries of the matrix $\sqrt{Q}$, if the software does not provide them directly (some software does), we may use the formula $\sqrt{Q} = U\sqrt{Q^e}\, U^T$. The matrix $\sqrt{Q^e}$ is obvious. In order to get the matrix U, recall that its columns are the vectors $e_1, \dots, e_n$ written in the original basis, and such vectors are an orthonormal basis of eigenvectors of Q. Thus one has to use at least a software package that computes the spectral decomposition of a matrix, to get $e_1, \dots, e_n$.
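A minimal sketch of this recipe (assuming NumPy; the values of $\mu$, Q and the seed are illustrative):

import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -1.0, 0.5])
Q = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.8]])              # symmetric, positive definite

# Spectral decomposition Q = U Q^e U^T: the columns of U are the eigenvectors e_1, ..., e_n.
lam, U = np.linalg.eigh(Q)
sqrt_Q = U @ np.diag(np.sqrt(lam)) @ U.T     # sqrt(Q) = U sqrt(Q^e) U^T

# Generate samples x = sqrt(Q) z + mu with z standard Gaussian.
Z = rng.standard_normal((200_000, 3))
X = Z @ sqrt_Q.T + mu                        # sqrt_Q is symmetric; the transpose is kept for clarity

print(np.round(X.mean(axis=0), 2))           # close to mu
print(np.round(np.cov(X, rowvar=False), 2))  # close to Q

Library routines such as NumPy's own multivariate_normal perform an equivalent factorization internally; the point of the sketch is only to mirror the formula $\sqrt{Q} = U\sqrt{Q^e}\, U^T$ step by step.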
