Statistical models A.Y. 2014/15

Written exam of March 27, 2015.

1. Consider the general linear model $Y = X\beta + E$, where $Y = (y_1, \dots, y_n)$ is a vector in $\mathbb{R}^n$,
$X$ is an $n \times p$ matrix and $E = (E_1, \dots, E_n)$ is a vector of random variables with mean 0.

(a) Show that $\hat\beta = (X^t X)^{-1} X^t Y$ is an unbiased estimator of $\beta$. Compute also $V(\hat\beta)$. This
choice is generally called the least squares method; what exactly does this mean?

(b) Two possible motivations for the choice of $\hat\beta$ are maximum likelihood estimation and the Gauss-Markov theorem. State the two results precisely (proofs are not necessary but are welcome if time allows), together with the assumptions on the errors $E$ required for each result.

(c) Let $\hat\varepsilon = Y - X\hat\beta$ be the observed residuals; prove that $V(\hat\varepsilon_i) = \sigma^2 (1 - H_{ii})$, where $H = X (X^t X)^{-1} X^t$ and $\sigma^2 = V(E_j)$ for each $j$. Explain why, if $H_{ii}$ is close to 1 for some
value of $i$, this result makes us expect that $\hat y_i$ will be close to $y_i$.

(d) Are $\hat\varepsilon_i$ and $\hat\varepsilon_j$ independent for $i \neq j$? [Hint: check the proof of the previous result.]
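
As a small numerical illustration of the formula $V(\hat\varepsilon_i) = \sigma^2 (1 - H_{ii})$ in (c), the sketch below computes the diagonal of $H$ for the simple design with an intercept and a single covariate, where $H_{ii} = 1/n + (x_i - \bar x)^2 / \sum_k (x_k - \bar x)^2$. The covariate values and the value $\sigma^2 = 1$ are assumptions made up for the example, chosen so that one point lies far from the others.

```python
def leverages(x):
    """Diagonal of H = X (X^t X)^{-1} X^t for the design X = [1, x]."""
    n = len(x)
    mean = sum(x) / n
    sxx = sum((xi - mean) ** 2 for xi in x)
    # Closed form valid for simple regression with an intercept.
    return [1.0 / n + (xi - mean) ** 2 / sxx for xi in x]


# Hypothetical covariate values: nine clustered points and one far away.
x = [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 20.0]
sigma2 = 1.0  # assumed error variance

h = leverages(x)
# Variance of each observed residual, sigma^2 * (1 - H_ii).
resid_var = [sigma2 * (1.0 - hi) for hi in h]

for xi, hi, vi in zip(x, h, resid_var):
    print(f"x = {xi:5.1f}   H_ii = {hi:.4f}   V(residual) = {vi:.4f}")
```

In this made-up data set the isolated point has $H_{ii}$ very close to 1, so its residual has almost no variance, while the clustered points have small $H_{ii}$ and residual variance close to $\sigma^2$.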

2. (a) Write down the matrix $X$ and the vector $\beta$ in order to set the linear regression $y_i = a + b x_i + \varepsilon_i$, $i = 1, \dots, n$, as a general linear model $Y = X\beta + E$.

(b) How can we obtain a confidence interval for the parameter $b$? Write down the elements of the formula, not necessarily the explicit computation.

(c) When is the matrix $X^t X$ not invertible?

(d) Assume that the number of points is $n = 10$ and that there exists a qualitative variable $Z$ such that $Z_i = A$ for $i = 1, \dots, 4$ and $Z_i = B$ for $i = 5, \dots, 10$. Write down the model

$$y_i = \begin{cases} \mu_A + b x_i + \varepsilon_i & \text{if } Z_i = A \\ \mu_B + b x_i + \varepsilon_i & \text{if } Z_i = B \end{cases}$$

as a general linear model, specifying the matrix $X$ and the vector $\beta$ (there are several
ways to do this, but there is one that is preferable if we then wish to test the hypothesis
$\mu_A = \mu_B$).

3. Write down, if possible, the following models as linear models, possibly after some transformation. In all cases the $\varepsilon_i$ represent independent and identically distributed error terms with $E(\varepsilon_i) = 0$, and $a$, $b$, $c$, $\dots$ are parameters to be estimated.

(a) $$y_i = \begin{cases} a + b(x_i - 30) + \varepsilon_i & \text{if } x_i < 30 \\ a + c(30 - x_i) + \varepsilon_i & \text{if } x_i > 30 \end{cases}$$

(b) $y_i = a x_i^{b} (1 + \varepsilon_i)$

(c) $y_i = \dfrac{a + b x_i}{x_i + c} + \varepsilon_i$

(d) $y_i = a \cos(\omega x_i + \varphi) + b \sin(\omega x_i + \varphi) + \varepsilon_i$
