
Gaussian vectors

Lecture 5

Gaussian random variables in ℝ^n

One-dimensional case

One-dimensional Gaussian density with mean μ and standard deviation σ (called N(μ, σ²)):

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). $$

Proposition If X ∼ Nμ,σ2, then aX + b is again Gaussian. Precisely aX + b ∼ Naμ + b,a2σ2.

Proof

EϕaX + b =

ϕax + b 1

2πσ2 exp −x− μ22 dx t = ax + b

dx = dt/a

=

ϕt 1

a 2πσ2 exp −t− aμ + b2 2a2σ2 dt since

x− μ = ax + b− aμ + b

a .

The function

1 a 2πσ2

exp − t− aμ + b2 2a2σ2

is the density of a Gaussian Naμ + b, a2σ2. The proof is complete.
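A minimal numerical check of the proposition in R (a sketch, not part of the original argument; the values μ = 2, σ = 3, a = 5, b = −1 are arbitrary choices): the sample mean and standard deviation of aX + b should approach aμ + b = 9 and aσ = 15.

set.seed(1)
x <- rnorm(1e5, mean = 2, sd = 3)   # X ~ N(2, 3^2)
y <- 5 * x - 1                      # aX + b with a = 5, b = -1
c(mean(y), sd(y))                   # approximately 9 and 15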

Corollary If X ∼ Nμ,σ2, then Z := Xσ−μ ∼ N0,1. In other words, any Gaussian r.v.

X ∼ Nμ,σ2 can be represented in the form X = σZ + μ where Z is canonical (standard).

Remark For any random variable X (not necessarily Gaussian), the transformation Z := (X − μ)/σ is called "standardization". The r.v. Z always has mean zero and standard deviation 1.

However, if X belongs to some class (e.g. Weibull), Z does not necessarily belong to the same class, unless X is Gaussian. This is one of the reasons why the Gaussian part of probability theory is called the "linear theory" (invariance under linear, or even affine, transformations).

Exercise (Theoretical) Show that the Weibull class is not invariant.
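A possible starting point in R (a sketch; the shape and scale below are arbitrary choices): a Weibull random variable is nonnegative, while its standardization has mean zero and therefore takes negative values, so it cannot be Weibull.

set.seed(1)
y <- rweibull(1e5, shape = 1.5, scale = 2)  # Weibull sample, nonnegative
z <- (y - mean(y)) / sd(y)                  # standardized sample
mean(z < 0)                                 # a sizable fraction of negative values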

The general property of the previous remark is based on the linearity of the expectation:

$$ E[aX + bY + c] = aE[X] + bE[Y] + c $$

and the quadratic property of the variance:

$$ \mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X), $$

which hold true for all random variables and constants. See "Appunti teorici terza parte", section 2.

Multidimensional case

We give the definition of a multidimensional Gaussian variable by reversing the previous procedure.

Definition Canonical (standard) Gaussian density in ℝ^n:

$$ f(x_1, \dots, x_n) = \frac{1}{\sqrt{2\pi}}\, e^{-x_1^2/2} \cdots \frac{1}{\sqrt{2\pi}}\, e^{-x_n^2/2} = \frac{1}{(2\pi)^{n/2}}\, e^{-(x_1^2 + \dots + x_n^2)/2} $$

or, in vector notation:

$$ f(x) = \frac{1}{(2\pi)^{n/2}} \exp\left(-\frac{\|x\|^2}{2}\right), $$

where ‖x‖ is the Euclidean norm of x = (x_1, ..., x_n).

A random vector Z = (Z_1, ..., Z_n) with density f(x) will be called a canonical Gaussian vector.

A picture of the canonical Gaussian density in dimension n = 2 was given in the first lecture. A sample of 100 points from a 2-D Gaussian is shown below.

[Scatter plot of the sample; axes z.1 and z.2, both ranging roughly from −2 to 3.]
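A sketch of how such a sample can be produced in R (the plotting details are incidental): draw the two coordinates as independent standard normals.

z.1 <- rnorm(100)          # first coordinate of the canonical vector
z.2 <- rnorm(100)          # second coordinate, independent of the first
plot(z.1, z.2, asp = 1)    # scatter plot of the 100 points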

Definition General Gaussian random vector X = (X_1, ..., X_n): any random vector of the form

$$ X = AZ + \mu, $$

where Z = (Z_1, ..., Z_k) is a canonical Gaussian vector in ℝ^k for some k, A is a matrix with k inputs (columns) and n outputs (rows), and μ is an n-vector.

In plain words, Gaussian vectors are linear (affine) transformations of canonical Gaussian vectors.

μ = translation.

A: several possibilities: rotation, stretching in some direction, and so on. It plays the role of σ ("large A" means large dispersion), but it is multidimensional. Let us see a few 2-D examples:

● translation by 1, 1

● multiplication by 2 0

(3)

● multiplication by 2 0

0 1 followed by 45° rotation, namely multiplication by

A = 1 2

1 −1

1 1

2 0

0 1 = 2 −1/ 2

2 1/ 2

[Scatter plot of the transformed sample; axes x[1, ] and x[2, ], both ranging roughly from −4 to 3.]
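A sketch in R combining the three ingredients above (translation, stretching, rotation); the sample size of 100 is an arbitrary choice.

z <- matrix(rnorm(200), nrow = 2)                            # canonical sample, one point per column
S <- matrix(c(2, 0, 0, 1), 2, 2, byrow = TRUE)               # stretching by 2 along the first axis
R45 <- matrix(c(1, -1, 1, 1), 2, 2, byrow = TRUE) / sqrt(2)  # 45-degree rotation
A <- R45 %*% S                                               # the matrix A of the third example
mu <- c(1, 1)                                                # translation by (1, 1)
x <- A %*% z + mu                                            # general Gaussian sample X = AZ + mu
plot(x[1, ], x[2, ], asp = 1)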

Proposition Let

$$ Q = AA^T $$

(an n × n square, symmetric matrix). If det Q ≠ 0, then the density of X is

$$ f(x) = \frac{1}{(2\pi)^{n/2}\sqrt{\det Q}}\, \exp\left(-\frac{(x-\mu)^T Q^{-1}(x-\mu)}{2}\right). $$

Level curves of the density: f(x) = C ⟺ (x − μ)ᵀ Q⁻¹ (x − μ) = R². They are ellipsoids.

Covariance matrix

More on independence

Recall that two events A and B are called independent if P(A ∩ B) = P(A)P(B) (more or less equivalently, if P(A|B) = P(A) and P(B|A) = P(B)). Two random variables X, Y are called independent if

$$ P(X \in I,\, Y \in J) = P(X \in I)\, P(Y \in J) $$

for every pair of intervals I, J. If they have densities f_X(x), f_Y(y) (called marginals) and joint density f(x, y), then the identity

$$ f(x, y) = f_X(x) \cdot f_Y(y) $$

is equivalent to independence of X, Y.

Remark Z = Z1, ..., Zn canonical Gaussian vector  Z1, ..., Zn independent 1-d Gaussian standard.

Proposition If X, Y are independent, then

$$ E[XY] = E[X]\,E[Y]. $$

This is not a characterization of independence: it may happen that E[XY] = E[X]E[Y] but X, Y are not independent (the average is only a summary of the density, so a property of products of averages does not imply a product of densities). A classical example: X standard normal and Y = X² satisfy E[XY] = E[X³] = 0 = E[X]E[Y], yet they are clearly dependent.

However, such examples must be "cooked" with intention; they do not happen "at random". Moreover:

Proposition If X, Y are jointly Gaussian and

$$ E[XY] = E[X]\,E[Y], $$

then they are independent.

(With the tools of this lecture we could prove this claim a posteriori.)

Definition Given two random variables X, Y, we call covariance the number

$$ \mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]\,E[Y]. $$

It is a generalization of the variance: Cov(X, X) = Var(X).

We see that:

$$ \mathrm{Cov}(X, Y) = 0 \iff E[XY] = E[X]\,E[Y]. $$

Definition We say that X and Y are uncorrelated if Cov(X, Y) = 0, or equivalently if E[XY] = E[X]E[Y].

Corollary Independent implies uncorrelated.

Uncorrelated and jointly Gaussian implies independent.

The number Cov(X, Y) gives a measure of the relation between two random variables. More precisely, it describes the degree of linear relation between them (regression theory). A large Cov(X, Y) corresponds to a high degree of linear correlation.

A drawback of CovX, Y is that it depends on the unit of measure of X and Y: “large”

CovX, Y is relative to the order of magnitude of the other quantities of the problem. The correlation coefficient

ρX, Y = CovX, YσXσY

is independent of the unit of measure (it is “absolute”), and

− 1 ≤ ρX,Y ≤ 1.

Again, ρX, Y = 0 means uncorrelated. High degree of correlation becomes ρX, Y close to +1 or−1 (positive or negative linear correlation).

Proposition In general,

$$ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y). $$

Hence, if X, Y are uncorrelated, then

$$ \mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y). $$

This is not linearity of the variance. The first identity comes simply from the property (a + b)² = a² + b² + 2ab.

Proposition Cov is linear in both arguments:

$$ \mathrm{Cov}(aX + bX' + c,\, Y) = a\,\mathrm{Cov}(X, Y) + b\,\mathrm{Cov}(X', Y) $$

and similarly in the second argument (it is symmetric). Notice that additive constants c disappear (as in the variance).

(Proof: elementary.)

What is Q = AA^T

Let us understand Q = AAᵀ better. Write X = AZ + μ in components:

$$ X_1 = A_{11} Z_1 + A_{12} Z_2 + \dots + \mu_1 $$
$$ X_2 = A_{21} Z_1 + A_{22} Z_2 + \dots + \mu_2 $$
$$ \dots $$

and compute

$$ \mathrm{Cov}(X_1, X_2) = \mathrm{Cov}\Big(\sum_i A_{1i} Z_i + \mu_1,\; \sum_j A_{2j} Z_j + \mu_2\Big) = \sum_{i,j} A_{1i} A_{2j}\, \mathrm{Cov}(Z_i, Z_j) = \sum_i A_{1i} A_{2i} = (AA^T)_{1,2} = Q_{1,2}, $$

since Cov(Z_i, Z_j) equals 1 for i = j and 0 otherwise. In general,

$$ \mathrm{Cov}(X_h, X_k) = (AA^T)_{h,k} = Q_{h,k}. $$

Proposition Q = AAᵀ is the "covariance matrix" (the matrix of covariances).

Covariance is a generalization of variance; Q is the generalization of σ² from one dimension to many.

Example For the example

$$ A = \begin{pmatrix} \sqrt{2} & -1/\sqrt{2} \\ \sqrt{2} & 1/\sqrt{2} \end{pmatrix} $$

we have

$$ Q = \begin{pmatrix} \sqrt{2} & -1/\sqrt{2} \\ \sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} \sqrt{2} & \sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 5/2 & 3/2 \\ 3/2 & 5/2 \end{pmatrix}. $$

The covariance between X_1 and X_2 is 3/2.
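A numerical check of this example in R (a sketch):

A <- matrix(c(sqrt(2), -1/sqrt(2),
              sqrt(2),  1/sqrt(2)), 2, 2, byrow = TRUE)
Q <- A %*% t(A)   # gives [[2.5, 1.5], [1.5, 2.5]], i.e. Cov(X1, X2) = 3/2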

Spectral theorem

Any symmetric matrix, hence Q in particular, can be diagonalized: there exists a new orthonormal basis of ℝ^n in which Q is diagonal. The elements of such a basis are eigenvectors of Q, and the elements of Q on the diagonal are the corresponding eigenvalues:

$$ Q v_i = \lambda_i v_i, \qquad Q = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} $$

in the basis v_1, ..., v_n. The convention is to order the eigenvalues in decreasing order.

Example For the example

$$ A = \begin{pmatrix} \sqrt{2} & -1/\sqrt{2} \\ \sqrt{2} & 1/\sqrt{2} \end{pmatrix}, \qquad Q = \begin{pmatrix} 5/2 & 3/2 \\ 3/2 & 5/2 \end{pmatrix}, $$

the eigenvectors are v_1 = (1, 1)ᵀ and v_2 = (−1, 1)ᵀ, with eigenvalues λ_1 = 4, λ_2 = 1.
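This can be checked in R with eigen (a sketch; eigen returns the eigenvalues in decreasing order and normalized eigenvectors, so the columns are proportional to (1, 1) and (−1, 1) up to sign):

Q <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)
eigen(Q)   # values 4 and 1; vectors proportional to (1, 1) and (-1, 1)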

The covariance matrix Q = AAᵀ is also positive semi-definite:

$$ x^T Q x \ge 0 \quad \text{for all vectors } x \in \mathbb{R}^n. $$

This is equivalent to λ_i ≥ 0 for i = 1, ..., n. Moreover, det Q ≥ 0. We have

$$ \det Q > 0 \iff \lambda_i > 0 \ \text{for all } i = 1, \dots, n. $$

In such a case, the level curves have the form

$$ \left(\frac{y_1}{\sqrt{\lambda_1}}\right)^2 + \dots + \left(\frac{y_n}{\sqrt{\lambda_n}}\right)^2 = R^2, $$

where y_1, ..., y_n are the coordinates in the new basis v_1, ..., v_n. They are ellipses with axes v_1, ..., v_n and amplitudes along these axes equal to √λ_1, ..., √λ_n. The method of Principal Component Analysis (PCA) will be based on these remarks.

Example For our usual example, since v_1 = (1, 1)ᵀ, v_2 = (−1, 1)ᵀ, λ_1 = 4, λ_2 = 1, the ellipses have the form shown below.

[Plot of the ellipse; axes x and y, both ranging roughly from −1.5 to 1.5.]

This can also be obtained from the equation xᵀQ⁻¹x = R², x = (x, y)ᵀ. Namely,

$$ Q^{-1} = \begin{pmatrix} 0.625 & -0.375 \\ -0.375 & 0.625 \end{pmatrix}, $$

so the curve is 0.625x² + 0.625y² − 0.375 · 2xy = 1 (for R² = 1).
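A sketch in R checking Q⁻¹ and drawing this level curve on a grid (grid range and resolution are arbitrary choices):

Q <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)
Qinv <- solve(Q)                     # [[0.625, -0.375], [-0.375, 0.625]]
g <- seq(-3, 3, length.out = 201)    # grid for the quadratic form
qf <- outer(g, g, function(x, y)
  Qinv[1, 1] * x^2 + Qinv[2, 2] * y^2 + 2 * Qinv[1, 2] * x * y)
contour(g, g, qf, levels = 1)        # the ellipse x^T Q^{-1} x = 1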

Generation of multivariate samples

How to generate Gaussian samples with given covariance?

In many applications Q is known, but A is not. We want to generate a sample under X = AZ + μ. Problem:

Q ↦ A?

We have to solve the equation (A is the unknown)

$$ AA^T = Q. $$

The software R gives us the following solution:

require(mgcv)
A <- mroot(Q)

We may choose the dimension k of Z. The simplest choice is k = n = dimension of X; thus A is a square matrix.

We may also choose A symmetric. Then the equation becomes

$$ A^2 = Q, $$

whose solution is

$$ A = \sqrt{Q}. $$

In practice?

Exercise Assume the spectral decomposition of Q is known, namely the eigenvectors v_i and the eigenvalues λ_i. Let U be the orthogonal matrix (Uᵀ = U⁻¹) defined as follows: the i-th column of U is v_i. Check that Q̃ := UᵀQU is diagonal, with diagonal elements λ_i. The matrix √Q̃ is simply the diagonal matrix with elements √λ_i. Then set

$$ A := U \sqrt{\tilde{Q}}\, U^T. $$

Check that A is symmetric and A² = Q.
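A sketch of this construction in R, checked against the matrix Q of our usual example (mgcv::mroot also solves AAᵀ = Q, but does not necessarily return a symmetric A):

Q <- matrix(c(2.5, 1.5, 1.5, 2.5), 2, 2)
e <- eigen(Q)                              # spectral decomposition of Q
U <- e$vectors                             # columns are the eigenvectors v_i
A <- U %*% diag(sqrt(e$values)) %*% t(U)   # A = U sqrt(Qtilde) U^T, symmetric
max(abs(A %*% A - Q))                      # ~ 0: indeed A^2 = Q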

Generation of non-Gaussian samples

Recall the theorem:

Theorem (i) If Y is a random variable with cdf F (continuous case), then the random variable

U := F(Y)

is uniformly distributed on (0, 1).

(ii) If U is a uniform random variable, then

F⁻¹(U)

is a random variable with cdf F.

Applying both (i) and (ii) gives us:

Corollary If Y is a random variable with cdf F and Φ denotes the cdf of a standard normal, then

Z := Φ⁻¹(F(Y))

is a standard normal variable. And vice versa, Y = F⁻¹(Φ(Z)).

Algorithm to generate a sample from Y:

● generate a sample from a standard normal variable Z

● compute F⁻¹(Φ(Z)).

Is this nothing more than the old algorithm based on the uniform distribution? No: this one extends to the multidimensional, correlated case!

Let Y_1, Y_2 be two r.v. with cdfs F_1, F_2 (continuous case). Compute X_1 := Φ⁻¹(F_1(Y_1)), X_2 := Φ⁻¹(F_2(Y_2)). They are standard normal, but not necessarily independent.

Theoretical gap: there is no reason why (X_1, X_2) should be jointly Gaussian (a Gaussian vector). Assume then that (X_1, X_2) is jointly Gaussian.

● Compute the covariance matrix Q of (X_1, X_2) and the mean μ = (μ_1, μ_2).

● Compute A = √Q as above (or any other solution of AAᵀ = Q).

● Simulate a standard Gaussian vector (Z_1, Z_2).

● Compute (X_1, X_2) from (Z_1, Z_2) by means of A and μ.

● Anti-transform: Y_i = F_i⁻¹(Φ(X_i)), i = 1, 2.

This is a way to generate samples from non-Gaussian correlated r.v. Y_1, Y_2.
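A sketch of the whole procedure in R, with Weibull marginals and a covariance matrix chosen for illustration (Q, the shapes and the scales are assumptions, not values from the notes; here μ = 0, so X_1, X_2 are standard normal because Q has unit diagonal):

require(mgcv)
Q <- matrix(c(1, 0.6, 0.6, 1), 2, 2)                   # assumed covariance of (X1, X2)
A <- mroot(Q)                                          # a solution of A A^T = Q
z <- matrix(rnorm(2 * 1000), nrow = 2)                 # standard Gaussian vectors Z
x <- A %*% z                                           # jointly Gaussian with covariance Q
y1 <- qweibull(pnorm(x[1, ]), shape = 1.5, scale = 2)  # Y1 = F1^{-1}(Phi(X1))
y2 <- qweibull(pnorm(x[2, ]), shape = 3, scale = 1)    # Y2 = F2^{-1}(Phi(X2))
cor(y1, y2)                                            # correlated, non-Gaussian marginals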

Multidimensional data fit

We describe only simple rules.

Assume a samplex1, y1, ..., xn, yn is given. We cannot plot a joint histogram or cdf.

Thus we cannot get a feeling about gaussianity or not. But we can plot 1-D marginals. A Gaussian vector has Gaussian marginals.

If we want to model our data by a 2-D Gaussian (either because we see a good

agreement with gaussinaity of the marginals, or because of simplicity), we estimate Q and μ simply bycovandmean, inR.

Otherwise, if we want to describe the marginals by non-Gaussian distributions:

● we fit the marginals and find F_1, F_2;

● we transform the data by

x̃_i = Φ⁻¹(F_1(x_i)), ỹ_i = Φ⁻¹(F_2(y_i)), i = 1, ..., n,

into a new sample (x̃_1, ỹ_1), ..., (x̃_n, ỹ_n);

● we assume it is jointly Gaussian (we only know that x̃_1, ..., x̃_n and ỹ_1, ..., ỹ_n are Gaussian);

● we compute cov and mean of (x̃_1, ỹ_1), ..., (x̃_n, ỹ_n).

The resulting Gaussian model of the transformed sample can then be used for simulation and other purposes.

Example

Consider the following 20 points in the plane.

[Scatter plot of the 20 points; axes x and y, both ranging roughly from −0.5 to 3.]

They have been produced artificially from two independent N(1, 1) components. Let us ignore this fact and, as an exercise, think of them as the values of two physical quantities measured in 20 experiments.

We want to solve the following problem: compute the probability that both components are positive.

A simple answer is: count the points with both components positive, 13 in this example, and answer 13/20 = 0.65. But we clearly see that a number of points are close to the boundary, so the result suffers very much from the peculiarities of the sample. We can be sure that, if we repeated the experiments, this number could change considerably.

Thus, let us extract a model, a 2-D density, from the data and compute the theoretical probability from it. We hope this gives a more stable result.

For simplicity, let us choose a Gaussian fit from the beginning. Compute cov and mean of the data, which in our case are:

$$ Q = \begin{pmatrix} 1.001 & -0.058 \\ -0.058 & 0.798 \end{pmatrix}, \qquad \mu = \begin{pmatrix} 1.146 \\ 0.746 \end{pmatrix}. $$

We see that in this example the first component is fitted quite well with respect to the true N(1, 1) which generated the sample. The second is not: the second sample is poor. The correlation between the two samples is very small, a good indication of independence.

The (Gaussian) model has been found. How do we compute the required probability? By Monte Carlo.

Using require(mgcv) and A <- mroot(Q), get A. Then produce N standard points z = (z_1, z_2) and transform them by Az + μ.

[Scatter plot of the transformed Monte Carlo sample; axes xx and yy, both ranging roughly from −2 to 4.]

Finally, compute the fraction with both components positive. This is a Monte Carlo approximation of the required probability. At the end we find

p = 0.69.

It is not very different from 13/20 = 0.65. But if we repeat the whole procedure a few times, we see that the second estimate is more stable than the first one (not by much, however: only roughly 20% better).
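A sketch of this Monte Carlo computation in R, using the estimated Q and μ above (N = 10⁵ is an arbitrary choice):

require(mgcv)
Q  <- matrix(c(1.001, -0.058, -0.058, 0.798), 2, 2)
mu <- c(1.146, 0.746)
A  <- mroot(Q)                        # solves A A^T = Q
N  <- 1e5
z  <- matrix(rnorm(2 * N), nrow = 2)  # standard Gaussian points
x  <- A %*% z + mu                    # sample from the fitted model
mean(x[1, ] > 0 & x[2, ] > 0)         # Monte Carlo estimate, roughly 0.69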
