
(1)

Maximum Likelihood (ML)

Estimation and Specification Tests

Econometrics I Part II

2016

(2)

Introduction

The ML estimation methodology is grounded on the assumption that the (conditional) distribution of an observed phenomenon (the endogenous variable) is known up to a finite number of unknown parameters.

These unknown parameters are estimated by choosing the values that give the observed sample the highest probability (likelihood) of being drawn under the assumed conditional distribution.

(3)

Example 1: discrete setting

Consider a large pool filled with black and white balls. We are interested in the fraction of white balls, $p$, in this pool.

(4)

To obtain information on $p$ we extract a random sample of $n$ balls.

Let us denote $y_i = 1$ if ball $i$ is white and $y_i = 0$ otherwise.

Then it holds by assumption that

$\Pr\{y_i = 1\} = p$  (1)

(5)

Suppose our sample contains $n_1 = \sum_i y_i$ white balls and $n - n_1$ black balls. The probability of obtaining such a sample (in a given order) is given by

$L(p) = p^{n_1}(1 - p)^{n - n_1}$  (2)

The expression in (2), $L(p)$, seen as a function of the unknown parameter $p$, is referred to as the likelihood function.

(6)

The maximum likelihood estimator of $p$ is obtained by choosing the value of $p$ that maximizes the expression in (2), that is, the probability of drawing the observed sample.

(7)

For computational reasons it is often more convenient to maximize the natural log of the expression in (2) (a monotonic transformation). The resulting function is referred to as the log-likelihood function

$\log L(p) = n_1 \log p + (n - n_1)\log(1 - p)$  (3)

(8)

Maximizing the expression in (3) with respect to $p$ gives the first-order condition

$\dfrac{d \log L(p)}{dp} = \dfrac{n_1}{p} - \dfrac{n - n_1}{1 - p} = 0$  (4)

which, solving for $p$, gives the ML estimator

$\hat{p}_{ML} = \dfrac{n_1}{n}$  (5)

(9)

To be sure that the solution we have found corresponds to a maximum we also need to check the second-order condition

$\dfrac{d^2 \log L(p)}{dp^2} = -\dfrac{n_1}{p^2} - \dfrac{n - n_1}{(1 - p)^2} < 0$  (6)

which indeed shows that $L(\hat{p}_{ML})$ is a maximum.

(10)

Example 2: continuous setting

Consider a bivariate classical linear regression model augmented with the normality assumption on the error terms

$y_i = \beta_1 + \beta_2 x_i + \varepsilon_i, \qquad \varepsilon_i \mid x \sim NID(0, \sigma^2)$  (7)

where the NID acronym stands for normally (N) and independently (I) distributed errors.

(11)

Given the assumptions in (7) the following holds

$y_i \mid x \sim NID(\beta_1 + \beta_2 x_i,\ \sigma^2)$  (8)

Therefore the contribution of observation $i$ to the likelihood function is the value of the density function at the observed point $y_i$. For the normal distribution this gives

$f(y_i \mid x_i; \beta, \sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\dfrac{1}{2} \dfrac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2} \right\}$  (9)

(12)

Because of the independence assumption, the joint density of $y_1, y_2, \ldots, y_n$ (conditional on $x$) is given by

$f(y_1, y_2, \ldots, y_n \mid x; \beta, \sigma^2) = \prod_i f(y_i \mid x_i; \beta, \sigma^2) = \left( \dfrac{1}{\sqrt{2\pi\sigma^2}} \right)^{n} \prod_i \exp\left\{ -\dfrac{1}{2} \dfrac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2} \right\}$  (10)

(13)

The likelihood function is identical to the joint density function in (10) but it is seen as a function of the unknown parameters $\beta$ and $\sigma^2$. We can therefore write

$L(\beta, \sigma^2) = \left( \dfrac{1}{\sqrt{2\pi\sigma^2}} \right)^{n} \prod_i \exp\left\{ -\dfrac{1}{2} \dfrac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2} \right\}$  (11)

and, by applying the log transformation,

$\log L(\beta, \sigma^2) = -\dfrac{n}{2}\log(2\pi\sigma^2) - \dfrac{1}{2} \sum_i \dfrac{(y_i - \beta_1 - \beta_2 x_i)^2}{\sigma^2}$  (12)

(14)

As the first term in (12) does not depend upon $\beta$, it is easily seen that maximizing the expression in (12) with respect to $\beta_1$ and $\beta_2$ corresponds to minimizing the residual sum of squares $S(\beta)$.

(15)

That is, the ML estimators for $\beta_1$ and $\beta_2$ are identical to the OLS estimators. In general terms the following holds

$b_{ML} = b_{OLS} = (X'X)^{-1}X'y$  (13)
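A minimal sketch (on simulated data of my own choosing, using scipy's general-purpose optimizer; nothing here is computed on the slides) that maximizes the Gaussian log-likelihood in (12) numerically and compares the maximizer with the OLS solution in (13):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical simulated sample from y_i = beta1 + beta2 * x_i + eps_i
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

def neg_log_lik(theta):
    # theta = (beta1, beta2, log sigma^2); the log keeps sigma^2 positive
    b1, b2, log_s2 = theta
    s2 = np.exp(log_s2)
    resid = y - b1 - b2 * x
    return 0.5 * n * np.log(2 * np.pi * s2) + 0.5 * np.sum(resid**2) / s2

res = minimize(neg_log_lik, x0=np.zeros(3), method="BFGS")
b_ols = np.linalg.solve(X.T @ X, X.T @ y)

print(res.x[:2])  # ML estimates of beta1 and beta2
print(b_ols)      # OLS estimates (X'X)^{-1} X'y, numerically the same
```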

(16)

Given the expression in (13) we can substitute $y_i - \beta_1 - \beta_2 x_i$ in expression (12) with the corresponding ML residuals (which are also the OLS residuals)

$\log L(\sigma^2) = -\dfrac{n}{2}\log(2\pi\sigma^2) - \dfrac{1}{2} \sum_i \dfrac{e_i^2}{\sigma^2}$  (14)

After differentiating the expression in (14) with respect to $\sigma^2$

(17)

we obtain the first-order condition

$\dfrac{d \log L(\sigma^2)}{d\sigma^2} = -\dfrac{n}{2}\dfrac{2\pi}{2\pi\sigma^2} + \dfrac{1}{2} \sum_i \dfrac{e_i^2}{\sigma^4} = 0$  (15)

Solving for $\sigma^2$ yields the ML estimator for $\sigma^2$

$\hat{\sigma}^2_{ML} = \dfrac{e'e}{n}$  (16)

(18)

This estimator is consistent, although it is biased. In fact it does not coincide with the unbiased estimator for $\sigma^2$ derived from the OLS estimator, given by

$s^2 = \dfrac{e'e}{n - K}$  (17)
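The difference between the two variance estimators can be seen directly in a short sketch (again on hypothetical simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear-model data with K = 2 regressors (intercept and slope)
n, K = 200, 2
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])

b = np.linalg.solve(X.T @ X, X.T @ y)   # ML = OLS coefficient estimates, eq. (13)
e = y - X @ b                           # residuals

sigma2_ml = (e @ e) / n                 # biased but consistent ML estimator, eq. (16)
s2 = (e @ e) / (n - K)                  # unbiased OLS-based estimator, eq. (17)
print(sigma2_ml, s2)                    # s2 is slightly larger; the gap vanishes as n grows
```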

(19)

General properties of the ML estimator

Suppose that we are interested in the conditional distribution of yi given xi.

The probability mass function (in a discrete setting) or the density function (in a continuous setting) can be written as

$f(y_i \mid x_i; \theta)$  (18)

where $\theta$ is the unknown parameter vector.

(20)

Assume that observations are mutually independent. In this situation the probability mass or joint density function of the sample $y_1, y_2, \ldots, y_n$ conditional on $x_1, x_2, \ldots, x_n$ can be written as

$f(y_1, y_2, \ldots, y_n \mid X; \theta) = \prod_i f(y_i \mid x_i; \theta)$  (19)

(21)

The likelihood function is therefore given by

$L(\theta) = \prod_i L_i(\theta) = \prod_i f(y_i \mid x_i; \theta)$  (20)

where $L_i(\theta)$ is the likelihood contribution of observation $i$, which represents how much observation $i$ contributes to the likelihood.

(22)

The ML estimator for $\theta$ is the solution to the maximization problem

$\max_\theta\ \log L(\theta) = \max_\theta\ \sum_i \log L_i(\theta)$  (21)

(23)

First-order conditions are given by

$\left. \dfrac{\partial \log L(\theta)}{\partial \theta} \right|_{\hat{\theta}_{ML}} = \sum_i \left. \dfrac{\partial \log L_i(\theta)}{\partial \theta} \right|_{\hat{\theta}_{ML}} = 0$  (22)

If the log-likelihood function is globally concave there is a unique global maximum and the ML estimator is uniquely determined by these first-order conditions.

Only in special cases, however, can the ML estimator be determined analytically. More often, numerical optimization methods are required.
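For illustration, a sketch of such a numerical maximization (the probit model and the simulated data are my own choices, used only because no analytical solution exists in that case; they do not appear on these slides):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

# Hypothetical probit data: P(y_i = 1 | x_i) = Phi(theta_1 + theta_2 * x_i)
n = 500
x = rng.normal(size=n)
y = (rng.uniform(size=n) < norm.cdf(0.5 + 1.0 * x)).astype(float)

def neg_log_lik(theta):
    # Minus the sum of log-likelihood contributions, cf. eq. (21)
    prob = np.clip(norm.cdf(theta[0] + theta[1] * x), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))

res = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
print(res.x)  # numerical ML estimates; no closed-form solution is available here
```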

(24)

For notational convenience, we denote the vector of first derivatives of the log-likelihood function as

$s(\theta) \equiv \dfrac{\partial \log L(\theta)}{\partial \theta} = \sum_i \dfrac{\partial \log L_i(\theta)}{\partial \theta} = \sum_i s_i(\theta)$  (23)

where $s(\theta)$ is referred to as the score vector and $s_i(\theta)$ as the score contribution of observation $i$.

(25)

The first-order conditions thus say that the $K$ sample averages of the score contributions, evaluated at the ML estimate $\hat{\theta}_{ML}$, should be zero

$s(\hat{\theta}_{ML}) = \sum_i s_i(\hat{\theta}_{ML}) = 0$  (24)

(26)

Provided that the likelihood function is correctly specified, it can be shown under weak regularity conditions that

(a) the ML estimator is consistent for $\theta$:

$\hat{\theta}_{ML} \xrightarrow{\ p\ } \theta$  (25)

(27)

(b) the ML estimator is asymptotically efficient, that is, asymptotically, the ML estimator has the smallest variance among all consistent (linear and non-linear) estimators;

(28)

(c) the ML estimator is asymptotically normally distributed, according to

$\sqrt{n}\,(\hat{\theta}_{ML} - \theta) \xrightarrow{\ d\ } N(0, V)$  (26)

where $V$ is the asymptotic covariance matrix, which corresponds to the inverse of the information matrix $I(\theta)$.

(29)

The covariance matrix V is determined by the shape of the log-likelihood function.

To describe it in general terms we define the information in observation $i$ as

$I_i(\theta) \equiv -E\left[ \dfrac{\partial^2 \log L_i(\theta)}{\partial \theta\, \partial \theta'} \right]$  (27)

Loosely speaking, this $K \times K$ matrix summarizes the expected amount of information about $\theta$ contained in observation $i$.

(30)

The average information matrix for a sample of size $n$ is given by

$I_n(\theta) \equiv \dfrac{1}{n} \sum_i I_i(\theta) = -E\left[ \dfrac{1}{n} \sum_i \dfrac{\partial^2 \log L_i(\theta)}{\partial \theta\, \partial \theta'} \right] = -E\left[ \dfrac{1}{n} \dfrac{\partial^2 \log L(\theta)}{\partial \theta\, \partial \theta'} \right]$  (28)

(31)

while the limiting information matrix is defined as

$I(\theta) \equiv \lim_{n \to \infty} I_n(\theta)$  (29)

In the special case where observations are identically and independently distributed the following holds

$I(\theta) = I_n(\theta) = I_i(\theta)$  (30)

(32)

Intuitive interpretation of the information matrix

The expression in (28) is (minus) the expected value of the matrix of second order derivatives, scaled by the number of observations.

If the log-likelihood function is highly curved around its maximum, the second derivative is large (in absolute value), the variance is small and the ML estimator is relatively accurate.

If, on the other hand, the function is less curved, the second derivative is small, the variance is larger and the ML estimator less accurate.

Given the asymptotic efficiency of the ML estimator, the inverse of the information matrix provides a lower bound on the asymptotic covariance matrix, often referred to as the Cramér-Rao lower bound.

(33)

An alternative expression for the information matrix can be obtained from the result that the matrix

$J_i(\theta) \equiv E\left[ s_i(\theta)\, s_i(\theta)' \right]$  (31)

is identical to $I_i(\theta)$, provided that the likelihood function is correctly specified.

(34)

(d) the covariance matrix can be consistently estimated by replacing the expectations operator with a sample average and by replacing the unknown coefficients with the corresponding maximum likelihood estimates. The estimator based on (28) is

$\hat{V}_H = \left( -\dfrac{1}{n} \sum_i \left. \dfrac{\partial^2 \log L_i(\theta)}{\partial \theta\, \partial \theta'} \right|_{\hat{\theta}_{ML}} \right)^{-1}$  (32)

whereas the estimator based on (31) is

$\hat{V}_G = \left( \dfrac{1}{n} \sum_i s_i(\hat{\theta}_{ML})\, s_i(\hat{\theta}_{ML})' \right)^{-1}$  (33)

(35)

Example 1: discrete setting

The second-order derivative is

$\dfrac{d^2 \log L(p)}{dp^2} = -\dfrac{n_1}{p^2} - \dfrac{n - n_1}{(1 - p)^2}$  (34)

We can therefore write that

(36)

I  −E 1n d2 log Lp

dp2  −E 1n − n1

p2 − n − n1

1 − p2

p

p2  1 − p

1 − p2

 1

p1 − p

(35)

(37)

and, finally,

$\sqrt{n}\,(\hat{p}_{ML} - p) \xrightarrow{\ d\ } N(0,\ p(1 - p))$  (36)

(38)

Example 2: continuous setting

The log-likelihood function is

$\log L(\beta, \sigma^2) = -\dfrac{n}{2}\log(2\pi\sigma^2) - \dfrac{1}{2} \sum_i \dfrac{(y_i - x_i'\beta)^2}{\sigma^2}$  (37)

(39)

The score contributions are therefore given by

$s_i(\beta, \sigma^2) = \begin{pmatrix} \dfrac{\partial \log L_i(\beta, \sigma^2)}{\partial \beta} \\[2mm] \dfrac{\partial \log L_i(\beta, \sigma^2)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} \dfrac{y_i - x_i'\beta}{\sigma^2}\, x_i \\[2mm] -\dfrac{1}{2\sigma^2} + \dfrac{1}{2} \dfrac{(y_i - x_i'\beta)^2}{\sigma^4} \end{pmatrix} = \begin{pmatrix} \dfrac{\varepsilon_i}{\sigma^2}\, x_i \\[2mm] -\dfrac{1}{2\sigma^2} + \dfrac{1}{2} \dfrac{\varepsilon_i^2}{\sigma^4} \end{pmatrix}$  (38)
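The analytic score contribution in (38) can be checked against a numerical gradient of a single likelihood contribution; the sketch below does this for one hypothetical observation (all numbers are my own assumptions):

```python
import numpy as np
from scipy.optimize import approx_fprime

rng = np.random.default_rng(6)

# One hypothetical observation; x_i includes an intercept
x_i = np.array([1.0, 0.7])
beta = np.array([1.0, 2.0])
sigma2 = 0.25
y_i = x_i @ beta + rng.normal(scale=np.sqrt(sigma2))

def log_lik_i(theta):
    # Contribution log L_i(beta, sigma^2) of one observation, cf. eq. (9)
    b, s2 = theta[:2], theta[2]
    r = y_i - x_i @ b
    return -0.5 * np.log(2 * np.pi * s2) - 0.5 * r**2 / s2

# Analytic score contribution from eq. (38)
eps = y_i - x_i @ beta
s_analytic = np.concatenate([eps / sigma2 * x_i,
                             [-0.5 / sigma2 + 0.5 * eps**2 / sigma2**2]])

s_numeric = approx_fprime(np.concatenate([beta, [sigma2]]), log_lik_i, 1e-7)
print(s_analytic, s_numeric)  # agree up to numerical error
```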

(40)

To obtain the asymptotic covariance matrix of the ML estimator we use the expression in (31).

After computing the outer product

$s_i(\beta, \sigma^2)\, s_i(\beta, \sigma^2)' = \begin{pmatrix} \dfrac{\varepsilon_i^2}{\sigma^4}\, x_i x_i' & \dfrac{\varepsilon_i}{\sigma^2} \left( -\dfrac{1}{2\sigma^2} + \dfrac{1}{2} \dfrac{\varepsilon_i^2}{\sigma^4} \right) x_i \\[2mm] \dfrac{\varepsilon_i}{\sigma^2} \left( -\dfrac{1}{2\sigma^2} + \dfrac{1}{2} \dfrac{\varepsilon_i^2}{\sigma^4} \right) x_i' & \left( -\dfrac{1}{2\sigma^2} + \dfrac{1}{2} \dfrac{\varepsilon_i^2}{\sigma^4} \right)^2 \end{pmatrix}$  (39)

(41)

we obtain

$J_i(\beta, \sigma^2) = \begin{pmatrix} \dfrac{1}{\sigma^2}\, x_i x_i' & 0 \\[2mm] 0 & \dfrac{1}{2\sigma^4} \end{pmatrix}$  (40)

Note that under normality

$E[\varepsilon_i] = 0, \quad E[\varepsilon_i^2] = \sigma^2, \quad E[\varepsilon_i^3] = 0, \quad E[\varepsilon_i^4] = 3\sigma^4$

(42)

Using the expression in (40), the asymptotic covariance matrix is given by

$I(\beta, \sigma^2)^{-1} = \begin{pmatrix} \sigma^2\, \Sigma_{xx}^{-1} & 0 \\[2mm] 0 & 2\sigma^4 \end{pmatrix}$  (41)

where $\Sigma_{xx} \equiv \lim_{n \to \infty} \dfrac{1}{n} \sum_i x_i x_i'$.

(43)

From all this we finally obtain

$\sqrt{n}\,(\hat{\beta}_{ML} - \beta) \xrightarrow{\ d\ } N(0,\ \sigma^2 \Sigma_{xx}^{-1})$

$\sqrt{n}\,(\hat{\sigma}^2_{ML} - \sigma^2) \xrightarrow{\ d\ } N(0,\ 2\sigma^4)$  (42)
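A closing sketch (again on simulated data of my own choosing) that turns (41)-(42) into estimated asymptotic standard errors:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical linear-model data: y_i = 1 + 2 x_i + eps_i, eps_i ~ N(0, 0.25)
n = 400
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)     # beta_hat_ML = beta_hat_OLS, eq. (13)
e = y - X @ b
s2_ml = (e @ e) / n                       # sigma^2_hat_ML, eq. (16)

cov_b = s2_ml * np.linalg.inv(X.T @ X)    # estimated covariance of beta_hat, from (41)
se_b = np.sqrt(np.diag(cov_b))            # standard errors for beta_1 and beta_2
se_s2 = np.sqrt(2 * s2_ml**2 / n)         # standard error for sigma^2_hat, from (42)
print(se_b, se_s2)
```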
