Inferential Statistics
Hypothesis tests on a population mean
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part D
Hypothesis tests on the mean of a population (continued)
1. Test on the mean of a Normal variable – known variance (a)-(b) . . .
(c) The p-value
(d) Sample size, given α and β
2. Test on the mean of a Normal variable – unknown variance
1. (c) The p-value
Recall
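A statistical hypothesis test on a parameter θ is given by
H0 : θ ∈ Θ0 and H1 : θ ∈ Θ1, a test statistic T ∼ F, and a rejection region R0.
The power function is defined as
$$
P(\theta) = P(T \in R_0 \mid \theta) =
\begin{cases}
\alpha(\theta) & \text{if } \theta \in \Theta_0 \\
1 - \beta(\theta) & \text{if } \theta \in \Theta_1
\end{cases}
$$
where α(θ) is the probability of a type I error (rejecting H0 when it is true) and 1 − β(θ) is one minus the probability of a type II error, that is, the probability of rejecting H0 when it is false.
The size of the test is
$$
\alpha = \sup_{\theta \in \Theta_0} P(\theta)
$$
and it holds that α ≥ α(θ) for all θ ∈ Θ0.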
A test is said to have level α if its size is less than or equal to α.
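As an illustration of the power function and of the size as a supremum over Θ0, the following is a minimal R sketch for a one-sided test of H0 : µ ≥ 10000 against H1 : µ < 10000 on a Normal mean with known variance. The values σ = 2100, n = 10 and α = 0.05 are assumptions chosen for illustration, and the names `c_thr`, `power_fun` and `mu_grid` are hypothetical.

```r
# Assumed values for illustration: sigma = 2100, n = 10, alpha = 0.05
sigma <- 2100; n <- 10; alpha <- 0.05
stde  <- sigma / sqrt(n)                      # standard deviation of the sample mean

# Reject H0 when the sample mean is below c_thr (threshold computed at mu = 10000)
c_thr <- qnorm(alpha, mean = 10000, sd = stde)

# Power function: probability of rejecting H0 when the true mean is mu
power_fun <- function(mu) pnorm(c_thr, mean = mu, sd = stde)

mu_grid <- seq(8000, 12000, by = 500)
print(data.frame(mu = mu_grid, power = round(power_fun(mu_grid), 3)))

# Size: largest rejection probability over the null values mu >= 10000 on the grid
size <- max(power_fun(mu_grid[mu_grid >= 10000]))
size
```

In this sketch the power function decreases in µ, so the supremum over the null hypothesis is attained at the boundary µ = 10000, where it equals α.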
p-value
We have a family of statistical hypothesis tests
H0 : θ ∈ Θ0 and H1 : θ ∈ Θ1, with test statistic T ∼ F and observed value $t_{obs}$ computed from a sample X,
and for all α ∈ (0, 1) we have $R_0^{\alpha}$ (a rejection region for a size α test).
The p-value or observed significance level is the smallest level of significance at which H0 would be rejected, namely
$$
\text{p-value} = \inf\{\alpha \in (0, 1) : t_{obs} \in R_0^{\alpha}\}
$$
• From the book by Wasserman, p. 157
Informally, the p-value is a measure of the evidence against H0: the smaller the p-value, the stronger the evidence against H0
Typically, researchers use the following evidence scale:
p-value        evidence
< .01          very strong evidence against H0
.01 − .05      strong evidence against H0
.05 − .10      weak evidence against H0
> .1           little or no evidence against H0
Warning! A large p-value is not strong evidence in favor of H0. A large p-value can occur for two reasons: (i) H0 is true or (ii) H0 is false but the test has low power.
Warning! Do not confuse the p-value with P(H0 | Data). The p-value is not the probability that the null hypothesis is true.
• “evidence” ⇐⇒ “statistically significant”
• The p-value depends on the sample size. If the sample is large, even a small difference can count as “evidence”, that is, a difference that is hard to explain by chance variability alone.
The most common situation
Consider the family of tests with $R_0^{\alpha} = \{T \ge s_{\alpha}\}$, where $s_{\alpha}$ is computed from
$$
P(T \ge s_{\alpha} \mid H_0) < \alpha
$$
Note that $\alpha' < \alpha''$ if and only if $s_{\alpha'} > s_{\alpha''}$.
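[Figure: density of T under H0 on the t axis, with the thresholds $s_{\alpha'}$ and $s_{\alpha''}$, the corresponding tail areas α′ and α″, and the observed value $t_{obs}$ marked.]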
$$
p(t_{obs}) = \sup_{\theta \in \Theta_0} P(T > t_{obs} \mid \theta)
$$
The p-value is the level of the test when $t_{obs}$ is the threshold of the rejection region.
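The link between the definition of the p-value as an infimum over levels and the tail probability can be checked numerically. The following is a minimal R sketch under assumptions that are not in the original: T ∼ N(0, 1) under H0, rejection for large values of T, and an illustrative observed value `t_obs = 1.7`.

```r
# Hypothetical setting: T ~ N(0,1) under H0, reject when T >= s_alpha
t_obs  <- 1.7                                # illustrative observed value
alphas <- seq(0.0001, 0.9999, by = 0.0001)   # grid of candidate levels
s      <- qnorm(1 - alphas)                  # threshold s_alpha for each level

# Smallest level on the grid whose rejection region contains t_obs
rejected <- t_obs >= s
p_grid   <- min(alphas[rejected])

# Tail probability beyond the observed value
p_tail <- 1 - pnorm(t_obs)

c(p_grid = p_grid, p_tail = p_tail)
```

Up to the resolution of the grid, the two numbers should agree, which is the statement that the p-value is the level of the test whose threshold is the observed value.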
Running example. Concentration of toxic algal blooms
H0 : µ ≥ 10000 and H1 : µ < 10000
Day A: $\bar{x}_A = 8500$, p-value(8500) = 0.012
stde=2100/sqrt(10)
pnorm(8500,10000,stde)
[1] 0.01194886
Day B: $\bar{x}_B = 9500$, p-value(9500) = 0.226
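The Day B value can be obtained in the same way as the Day A computation above. A minimal R sketch, reusing σ = 2100 and n = 10 from that computation (the names `p_A` and `p_B` are hypothetical):

```r
sigma <- 2100; n <- 10          # values used in the Day A computation
stde  <- sigma / sqrt(n)        # standard deviation of the sample mean

# H1: mu < 10000, so the p-value is the lower tail area at the observed mean
p_A <- pnorm(8500, 10000, stde)
p_B <- pnorm(9500, 10000, stde)

round(c(day_A = p_A, day_B = p_B), 3)
```

The smaller sample mean of Day A gives the smaller p-value, that is, stronger evidence against H0.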
How to compute the p-value?
The p-value (like the rejection region) depends on the form of the alternative hypothesis.
In practice, the p-value is the level of the test if the threshold of R0 is the sample value
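Consider H0 : µ = µ0
Assume µ0 = 10000 and suppose $\bar{x} = 9000$. The p-value in three different contexts:

Alternative      p-value             R code
H1 : µ < µ0      p(9000) = 0.066     pnorm(9000,mu0,std)
H1 : µ > µ0      p(9000) = 0.934     1-pnorm(9000,mu0,std)
H1 : µ ≠ µ0      p(9000) = 0.132     2*pnorm(9000,mu0,std)   [notice: 2*pnorm]

[Figures: one density centered at 10000 for each alternative, with the tail area at 9000 marked (and also at 11000 in the two-sided case).]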
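The three p-values can be reproduced together. This is a minimal R sketch that assumes `std` is the standard deviation of the sample mean from the running example, 2100/sqrt(10), which is consistent with the values reported above. The two-sided line uses `-abs(z)`, so that it would also work when the sample mean lies above µ0.

```r
mu0  <- 10000
xbar <- 9000
std  <- 2100 / sqrt(10)   # assumption: same sigma and n as in the running example

z <- (xbar - mu0) / std   # standardized observed mean

p_less    <- pnorm(z)             # H1: mu <  mu0
p_greater <- 1 - pnorm(z)         # H1: mu >  mu0
p_two     <- 2 * pnorm(-abs(z))   # H1: mu != mu0

round(c(less = p_less, greater = p_greater, two_sided = p_two), 3)
```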
1. (d) Sample size n, given α and β, for a one-sided test
H0 : µ = µ0 and H1 : µ = µ1
α: probability of type I error; β: probability of type II error
To have at most the given probabilities of error, the sample size n should be
$$
n \ge \frac{(z_\alpha + z_\beta)^2 \, \sigma^2}{(\mu_0 - \mu_1)^2}
$$
where zα and zβ are the α-th and the β-th quantiles of a standard normal random variable Z, Z ∼ N (0, 1)
The sample size n increases when:
- the distance between µ0 and µ1 decreases
- the variance increases
- |zα| and |zβ| increase, equivalently α and β decrease
How to compute n?
Assume µ1 < µ0
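$$
\alpha = P(\bar{X} < s \mid \mu = \mu_0)
= P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} < \frac{s - \mu_0}{\sigma/\sqrt{n}} \right)
= P(Z < z_\alpha)
$$
$$
\beta = P(\bar{X} > s \mid \mu = \mu_1)
= P\left( \frac{\bar{X} - \mu_1}{\sigma/\sqrt{n}} > \frac{s - \mu_1}{\sigma/\sqrt{n}} \right)
= P(Z > z_{1-\beta})
$$
From $\frac{s - \mu_0}{\sigma/\sqrt{n}} = z_\alpha$ and $\frac{s - \mu_1}{\sigma/\sqrt{n}} = z_{1-\beta} = -z_\beta$ we have:
$$
n = \frac{(z_\alpha + z_\beta)^2 \, \sigma^2}{(\mu_0 - \mu_1)^2}
$$
The result is the same if µ1 > µ0.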
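The sample size formula can be evaluated directly in R. The sketch below uses illustrative values that are assumptions, not taken from the original: µ0 = 10000, µ1 = 9000, σ = 2100, α = 0.05 and β = 0.10. It then checks the achieved type II error probability for the resulting n.

```r
# Illustrative (assumed) inputs
mu0 <- 10000; mu1 <- 9000; sigma <- 2100
alpha <- 0.05; beta <- 0.10

z_alpha <- qnorm(alpha)   # alpha-th quantile of N(0,1)
z_beta  <- qnorm(beta)    # beta-th quantile of N(0,1)

# Smallest integer n satisfying the inequality for n
n_min <- ceiling((z_alpha + z_beta)^2 * sigma^2 / (mu0 - mu1)^2)
n_min

# Check: threshold s for a size-alpha test and achieved type II error at mu1
se <- sigma / sqrt(n_min)
s  <- mu0 + z_alpha * se
beta_achieved <- 1 - pnorm(s, mean = mu1, sd = se)
beta_achieved
```

Because n is rounded up to an integer, the achieved β is slightly below the requested value.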
2. Test on the mean of a Normal variable with unknown variance – t test
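Let X ∼ N(µ, σ²) with unknown µ and σ². The estimator of µ remains $\bar{X}$.
Remember that, if σ² is known,
$$
\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)
\quad \text{or, equivalently,} \quad
\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)
$$
The unbiased estimator of σ² is (see Part A)
$$
S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left( X_i - \bar{X} \right)^2
$$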
A short aside on probability
The random variable T has a Student's t distribution with n − 1 degrees of freedom:
$$
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{[n-1]}
$$
The density function and the cumulative distribution function of a t[n] r.v. are close to those of the standard normal r.v. N(0, 1).
[Figure, horizontal axis from −3 to 3. Dashed lines: t[2] and t[5] – solid line: N(0, 1)]
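How close the t distribution is to the standard normal can also be seen from the quantiles. A minimal R sketch comparing the 0.975 quantiles for a few degrees of freedom (the choice of degrees of freedom and of the 0.975 level is only illustrative):

```r
dfs <- c(2, 5, 30, 100)    # illustrative degrees of freedom
q_t <- qt(0.975, dfs)      # 0.975 quantiles of t[df]
q_z <- qnorm(0.975)        # 0.975 quantile of N(0,1)

data.frame(df = dfs, t_quantile = q_t, normal_quantile = q_z)
```

The t quantiles are larger than the normal one and approach it as the degrees of freedom grow.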
Running example. H0 : µ ≥ 10000 and H1 : µ < 10000
Suppose σ is unknown and estimated by s = 2000 (using $S^2$).
The form of the rejection region is the same as when σ is known: $R_0 = \{\bar{X} < c\}$
The threshold c is such that
$$
P(\bar{X} < c \mid \mu = 10000)
= P\left( \frac{\bar{X} - 10000}{2000/\sqrt{10}} < \frac{c - 10000}{2000/\sqrt{10}} \right)
= P(T < t_\alpha) = \alpha = 0.05
$$
where $t_\alpha$ is the α-th quantile of a random variable t[9]. Then
$$
\frac{c - 10000}{2000/\sqrt{10}} = t_\alpha
\quad \text{and} \quad
c = 10000 + t_\alpha \, \frac{2000}{\sqrt{10}}
$$
We obtain c = 8841 from R:
t_05 = qt(.05,9)
c=10000+t_05*2000/sqrt(10);c
Example
How accurate are radon detectors of a type sold to homeowners?
University researchers placed 12 detectors in a chamber that exposed them to 105 pico-curies per liter (pCi/l) of radon
The detector readings were as follows:
91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7
Is there convincing evidence that the mean of detector readings differs from the nominal value of 105?
Model and test
Let X be the random variable modeling the detector reading.
Assume X ∼ N(µ, σ²), with σ² unknown.
H0 : µ = 105 and H1 : µ ≠ 105
Test statistic under H0: $\frac{\bar{X} - 105}{S/\sqrt{n}}$
• p-value computation in R using t.test
Rn=c(91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7)
t.test(Rn-105) ### data should be centered at mu_0

	One Sample t-test

data:  Rn - 105
t = -0.31947, df = 11, p-value = 0.7554
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -6.837503  5.104170
sample estimates:
 mean of x
-0.8666667
The p-value is 0.76.
The observed test statistic depends on both $\bar{x}$ and s.
• Computation of the rejection region R0
t05=qt(0.025,11)   # 0.025 quantile of t[11] (two-sided test at level 0.05)
c1=105+t05*sd(Rn)/sqrt(12)
c2=105-t05*sd(Rn)/sqrt(12)
cbind(c1,c2)
           c1       c2
[1,] 99.02916 110.9708
mean(Rn)
[1] 104.1333
$R_0 = \{\bar{x} < 99.0\} \cup \{\bar{x} > 111.0\}$
Note that R0 depends on the data; in fact the variance of $\bar{X}$ is estimated by $S^2/n$.
The sample mean is 104.1, which does not belong to R0.
• Direct computation of the p-value
t_obs=(mean(Rn)-105)/(sd(Rn)/sqrt(length(Rn)));t_obs
[1] -0.3194729
pvalue=2*pt(-abs(t_obs),(length(Rn)-1));pvalue   # two-sided: twice the tail area beyond |t_obs|
[1] 0.7553532
There is no evidence to reject H0 (µ = 105).
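As a cross-check, the direct computation can be compared with the built-in test. The following is a minimal R sketch that passes the null value through the `mu` argument of `t.test` instead of centering the data. The names `mu0`, `p_manual` and `fit` are hypothetical.

```r
Rn  <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8, 101.7)
mu0 <- 105
n   <- length(Rn)

# Direct computation of the t statistic and of the two-sided p-value
t_obs    <- (mean(Rn) - mu0) / (sd(Rn) / sqrt(n))
p_manual <- 2 * pt(-abs(t_obs), df = n - 1)

# Built-in one-sample t test with the null value given explicitly
fit <- t.test(Rn, mu = mu0)

c(t_manual = t_obs, t_builtin = unname(fit$statistic),
  p_manual = p_manual, p_builtin = fit$p.value)
```

The two routes should return the same statistic and p-value. With `mu = mu0` the confidence interval reported by `t.test` refers to µ itself rather than to µ − 105.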