Inferential Statistics
Hypothesis tests on a population mean
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part C
Hypothesis tests
0. Review
1. Aside of probability. Normal random variable
2. Test on the mean of a Normal variable – known variance
   (running example: toxic algae)
   (a) Composite hypotheses
   (b) The power function
   (c) The p-value
   (d) Sample size, given α and β
3. Test on the mean of a Normal variable – unknown variance
4. Test on the mean of a variable with unknown distribution
5. Aside of probability. Central limit theorem
6. Test on the proportion of a dichotomous event
7. Test of the equality of two means (two-samples and paired samples)
0. Review

A hypothesis test consists of
- a null hypothesis and an alternative hypothesis
- a test statistic (a function of the sample variables)
- a rejection region/decision rule

Significance level, critical value and test statistic all contribute to the definition of the rejection rule.

Type I and II errors
Example: if R0 = {x such that T(x) > s} (slide 27) then:
α = P(T > s | H0)   reject H0 when it is true
β = P(T ≤ s | H1)   retain H0 when it is false
Formulation of the hypotheses and “form” of R0
Example. The quantity of tomatoes in a can is nominally set at 100 g.
Formulate a statistical hypothesis test as if you were:
1. the Federal Trade Commission
2. an unscrupulous shareholder
3. the worker in charge of quality control of the tomato-can filling machine.
Indicate to which hypotheses the following R0 correspond:
R0 = {x s.t. T (x) > s}
R0 = {x s.t. T(x) < s1} ∪ {x s.t. T(x) > s2}
R0 = {x s.t. T(x) < s}

In the case of a two-sided test, s1 and s2 are such that
α1 = P(T < s1 | H0)   α2 = P(T > s2 | H0)   with α1 + α2 = α
If the test statistic has a symmetric distribution, α1 and α2 are often set as α1 = α2 = α/2.
Consider Ex. 2 of Assignment 3. X ∼ B(20, p)
What happens when the hypotheses change?
1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05   one-sided right
2. H0 : p = 0.5   H1 : p = 0.3   α = 0.05   one-sided left

> p_0=0.3; p_1=0.5
> s05_right=qbinom(1-0.05,20,p_0); s05_right   ### same result as p_1=0.7
[1] 9
> s05_left=qbinom(0.05,20,p_1)-1; s05_left    ### note the -1
[1] 5

One-sided right R0 = {x > 9}; one-sided left R0 = {x < 5}.
Notice that for x = 5, 6, 7, 8, 9 the decisions of the two tests differ (small sample size and p0 “close” to p1).
What happens when α changes?
1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05
3. H0 : p = 0.3   H1 : p = 0.5   α = 0.01

> s01_right=qbinom(1-0.01,20,p_0); s01_right
[1] 11

R0 with α = 0.01 is smaller than R0 with α = 0.05.
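The trade-off between the two error types can be checked numerically. The sketch below assumes the same setup (H0: p = 0.3 vs H1: p = 0.5, n = 20) and computes the type II error β for both rejection regions:

```r
# Type II error beta = P(X <= s | p = 0.5) for the one-sided right
# rejection regions R0 = {x > s} obtained with alpha = 0.05 and alpha = 0.01
p_0 <- 0.3; p_1 <- 0.5
s05 <- qbinom(1 - 0.05, 20, p_0)   # threshold for alpha = 0.05: 9
s01 <- qbinom(1 - 0.01, 20, p_0)   # threshold for alpha = 0.01: 11
beta05 <- pbinom(s05, 20, p_1)     # P(X <= 9  | p = 0.5)
beta01 <- pbinom(s01, 20, p_1)     # P(X <= 11 | p = 0.5)
c(beta05, beta01)                  # beta grows as alpha shrinks
```

Shrinking α from 0.05 to 0.01 enlarges the acceptance region, so β necessarily grows.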
1. Aside of Probability. Normal random variable
Probability density functions and cumulative distribution functions
µ = −1, 0, 5 σ = 1
[Figures: probability density functions and cumulative distribution functions of N(µ, 1) for µ = −1, 0, 5]
µ = 0
σ = 0.5, 1, 3
[Figures: probability density functions and cumulative distribution functions of N(0, σ²) for σ = 0.5, 1, 3]
Some properties of normal random variable
• For X ∼ N(µ, σ²), let Y = aX + b.
  Then Y ∼ N(aµ + b, a²σ²). In particular
  Z = (X − µ)/σ ∼ N(0, 1)
  Z is called the standard normal variable.

• The sum of normal variables is a normal variable.
  In particular, for X1, . . . , Xn independent and identically distributed (i.i.d.) with Xi ∼ N(µ, σ²) for all i = 1, . . . , n,
  X̄ ∼ N(µ, σ²/n)

• The density function of X ∼ N(µ, σ²) is:
  f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
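The standardisation property can be verified numerically; a minimal sketch (the values µ = 5, σ = 2, x = 6.3 are arbitrary, chosen only for illustration):

```r
mu <- 5; sigma <- 2; x <- 6.3
lhs <- pnorm(x, mean = mu, sd = sigma)   # P(X <= x) for X ~ N(mu, sigma^2)
rhs <- pnorm((x - mu) / sigma)           # P(Z <= (x - mu)/sigma), Z ~ N(0, 1)
all.equal(lhs, rhs)                      # the two probabilities coincide
```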
2. Test on the mean of a normal random variable with known variance
2. (a) Composite hypotheses
Running example: X models the concentration of toxic algae blooms
The statistical model is: X ∼ N (µ, σ2) with σ2 known
and experts set a bathing alert if µ > 10000 cells/liter. Thus
H0 : µ ≥ 10000   H1 : µ < 10000
The significance level of the test is set at α = 5%. If we reject H0, we can swim with a 5% probability of side effects due to the algae.
Test statistic: X̄
Sample size: n = 10   σ = 2100 cells/liter   Set α = 5%.
The test hypotheses are both composite
First we consider the two cases
1. H0 : µ = 10000   H1 : µ = 8500
2. H0 : µ = 10000   H1 : µ < 10000
and next
3. H0 : µ ≥ 10000   H1 : µ < 10000
Case 1.
Assume H0: X̄ ∼ N(10000, 2100²/10)
R0 = {x̄ < s} where s is such that P(X̄ < s | µ = 10000) = α
Get s = 8908 with R:

mu0=10000; std=2100/sqrt(10)
c1=qnorm(.05,mu0,std); c1

If the value of the test statistic on the sample is less than 8908, then we reject H0. Spot the probability of type I error in the plot.
If x̄ is larger than 8908, then the probability of type II error is
β = P(s < X̄ | µ = 8500)
Get β = 27% with R:

mu1=8500; 1-pnorm(c1,mu1,std)

We retain H0 with a large probability of type II error.
[Figures: densities of X̄ under µ = 10000 and µ = 8500, with the “reject H0”/“retain H0” regions around the threshold]
Case 2.
H0 : µ = 10000   H1 : µ < 10000
R0 does not change, as it is computed under H0.
The probability β of type II error becomes a function of µ, as it is computed under H1, namely µ < 10000.
Case 3.
H0 : µ ≥ 10000   H1 : µ < 10000
Keeping the same R0, the probability of type I error, denoted by α(µ), satisfies
α(µ) ≤ α, with equality at µ = 10000.
The probability β of type II error is a function of µ under H1, as in case 2.
2. (b) The power function P(θ) of a test

Consider a generic test on a parameter θ:
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1
Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Then Θ0 = [10000, +∞), Θ1 = (−∞, 10000)

The power function of the test is the probability of rejecting H0, as a function of the parameter θ:

P(θ) = P(T ∈ R0 | θ) = { 1 − β(θ)  if θ ∈ Θ1   correct decision
                        { α(θ)      if θ ∈ Θ0   type I error

Note that α(θ) ≤ α.
1. the critical value is the smallest s s.t. P(T > s | H0) < α
2. the critical region is the largest R0 s.t. P(T ∈ R0 | H0) < α
Power function:
P(θ) = P(T ∈ R0 | θ) = { 1 − β(θ)  if θ ∈ Θ1   correct decision
                        { α(θ)      if θ ∈ Θ0   type I error

Running example:
R0 = {X̄ < 8908}
P(µ) = P(X̄ < 8908 | µ),  µ ∈ R
P(10000) = 0.05 = α
[Figure: power curve P(µ), with α marked at µ = 10000 and 1 − β(8500) at µ = 8500]

std=2100/sqrt(10); c1=qnorm(.05,10000,std)
mu=seq(7000,11000)
p=pnorm(c1,mu,std)
plot(mu,p,type="l",lwd=3,col="red")

“Tests with large power are preferable”
Power and sample size
The probability of rejecting H0, when it is false, grows as the sample size grows.

Running example
H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄n < s}, with s such that P(X̄n < s | µ = 10000) = 0.05 (s = 8908 for n = 10)
P(µ) = P(X̄n < s | µ),  µ ∈ R
[Figure: power curves for n = 10 (red) and n = 20 (blue), both equal to α at µ = 10000; the power at µ = 9500 is larger for n = 20]

If the values of the parameter under H0 and under H1 are “close” (e.g. 10000 and 9500 respectively), only with a large sample is the probability of a correct decision large.
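A quick numerical check of this effect, as a sketch using the running-example values µ0 = 10000, σ = 2100, α = 0.05 and the alternative µ1 = 9500 (the helper name power_at is ours):

```r
# Power of the one-sided test R0 = {Xbar_n < s} at mu = mu1, for sample size n
power_at <- function(n, mu1, mu0 = 10000, sigma = 2100, alpha = 0.05) {
  s <- qnorm(alpha, mu0, sigma / sqrt(n))  # threshold recomputed for each n
  pnorm(s, mu1, sigma / sqrt(n))           # P(reject H0 | mu = mu1)
}
sapply(c(10, 20, 100), power_at, mu1 = 9500)
```

The power at µ1 = 9500 is only about 0.19 for n = 10 and 0.28 for n = 20; roughly n = 100 is needed to push it above 0.75.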
The power for one-sided and two-sided tests

One-sided: H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄ < 8908}   P(µ) = P(X̄ < 8908 | µ)
Two-sided: H0 : µ = 10000 and H1 : µ ≠ 10000
R0 = {X̄ < 8700} ∪ {X̄ > 11300}
P(µ) = P(X̄ < 8700 | µ) + P(X̄ > 11300 | µ)

One-sided (red): H0 : µ ≥ 10000   H1 : µ < 10000
Two-sided (blue): H0 : µ = 10000   H1 : µ ≠ 10000
[Figure: one-sided (red) and two-sided (blue) power curves, both equal to α at µ = 10000]

mu=seq(7000,13000); mu0=10000; std=2100/sqrt(10)
c1_u=qnorm(.05,mu0,std); p=pnorm(c1_u,mu,std)
c1_b=qnorm(.025,mu0,std); c2_b=qnorm(.975,mu0,std)
p_b=pnorm(c1_b,mu,std)+1-pnorm(c2_b,mu,std)
limx=c(7000,13000); limy=c(0,1)
plot(mu,p,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="red",xlim=limx,ylim=limy)
par(new=T)
plot(mu,p_b,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="blue",xlim=limx,ylim=limy)
axis(2, at=0, labels=0, las=1); axis(2, at=1, labels=1, las=1)
axis(2, at=0.05, labels=expression(alpha), las=1)
axis(1, at=mu0, labels=mu0, las=1)
abline(h=c(0,1)); abline(h=0.05, v=mu0, lty=3, lwd=2)
2. (c) The p-value

Recall. A statistical hypothesis test on a parameter θ is given by
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1   T ∼ F   R0
The power function is defined as
P(θ) = P(T ∈ R0 | θ) = { α(θ)      if θ ∈ Θ0   type I error: reject H0 when true
                        { 1 − β(θ)  if θ ∈ Θ1   1 − type II error: reject H0 when false
The size of the test is α = sup_{θ∈Θ0} P(θ), and it holds α ≥ α(θ) for all θ ∈ Θ0.
A test is said to be of level α if its size is less than or equal to α.
p-value

We have a family of statistical hypothesis tests
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1   T ∼ F   tobs from a sample x
and for all α ∈ (0, 1) we have R0^α (a rejection region for a size-α test).

The p-value, or observed significance level, is the smallest level of significance at which H0 would be rejected, namely
p-value = inf{α ∈ (0, 1) : tobs ∈ R0^α}

The most common situation
Consider the family of tests with R0^α = {T ≥ sα}, where sα is computed from
P(T ≥ sα | H0) < α
Note that α′ < α″ if and only if sα′ > sα″.
[Figure: a density of T with thresholds sα″ > sα′ and the observed value tobs]
p(tobs) = sup_{θ∈Θ0} P(T ≥ tobs | θ)
From the book by Wasserman, p. 157:
Informally, the p-value is a measure of the evidence against H0: the smaller the p-value, the stronger the evidence against H0.
Typically, researchers use the following evidence scale:

p-value     evidence
< .01       very strong evidence against H0
.01 – .05   strong evidence against H0
.05 – .10   weak evidence against H0
> .1        little or no evidence against H0
Warning! A large p-value is not strong evidence in favor of H0. A large p-value can occur for two reasons: (i) H0 is true or (ii) H0 is false but the test has low power.
Warning! Do not confuse the p-value with P (H0|Data). The p-value is not the probability that the null hypothesis is true.
“evidence” ⇐⇒ “statistically significant”
The p-value depends on the sample size. If the sample is large, even a small difference can be “evidence”, that is, hard to explain by chance variability.
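The following sketch illustrates the point (assumptions: the running-example one-sided test H0: µ = 10000 vs H1: µ < 10000 with σ = 2100 known, and the same observed mean x̄ = 9800 regardless of n):

```r
# One-sided (left) p-value of the same observed mean, for growing n
pval <- function(n, xbar = 9800, mu0 = 10000, sigma = 2100)
  pnorm(xbar, mu0, sigma / sqrt(n))
sapply(c(10, 50, 500), pval)
# the same 200-cell difference is far from significant at n = 10,
# but significant at the 5% level at n = 500
```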
How to compute the p-value?
The p-value (like the rejection region) depends on the “form” of the alternative hypothesis.
In practice, the p-value is the level the test would have if the threshold of R0 were the observed sample value.
Consider H0 : µ = µ0
Running example: µ0 = 10000, and suppose x̄ = 9000.

H1 : µ < µ0   p(9000) = 0.066   pnorm(9000,mu0,std)
H1 : µ > µ0   p(9000) = 0.934   1-pnorm(9000,mu0,std)
H1 : µ ≠ µ0   p(9000) = 0.132   2*pnorm(9000,mu0,std)   [notice: 2*pnorm]
[Figures: the tail areas corresponding to x̄ = 9000 (and 11000 for the two-sided case) around µ0 = 10000]
Reference
Regina Nuzzo (2014). Statistical Errors – P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, vol. 506, pp. 150–152.
2. (d) Sample size n, given α and β, for a one-sided test

H0 : µ = µ0   H1 : µ = µ1 (with µ1 < µ0)   ⇒   R0 = (−∞, s)
Let Z ∼ N(0, 1) and zα be the α-th quantile of Z.

α = P(X̄ < s | µ = µ0) = P( (X̄ − µ0)/(σ/√n) < (s − µ0)/(σ/√n) ) = P(Z < zα)
β = P(X̄ > s | µ = µ1) = P( (X̄ − µ1)/(σ/√n) > (s − µ1)/(σ/√n) ) = P(Z > z1−β)

From (s − µ0)/(σ/√n) = zα and (s − µ1)/(σ/√n) = z1−β = −zβ we have:

n = (zα + zβ)² σ² / (µ0 − µ1)²

The result is the same if µ1 > µ0.
The sample size n increases when:
- the distance between µ0 and µ1 decreases
- the variance increases
- zα and zβ increase in absolute value, equivalently α and β decrease
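Plugging the running-example values into the formula, as a sketch (the choice β = 0.20 is ours, for illustration):

```r
alpha <- 0.05; beta <- 0.20
mu0 <- 10000; mu1 <- 8500; sigma <- 2100
z_a <- qnorm(alpha)   # lower alpha-quantile of N(0,1), negative
z_b <- qnorm(beta)    # lower beta-quantile of N(0,1), negative
n <- (z_a + z_b)^2 * sigma^2 / (mu0 - mu1)^2
ceiling(n)            # round up to the next integer sample size
```

With these values n ≈ 12.1, so 13 observations suffice.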
3. Test on the mean of a Normal variable with unknown variance – t test

Let X ∼ N(µ, σ²) with µ and σ² unknown.
The estimator of µ remains X̄. The unbiased estimator of σ² is
S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²

A small aside of probability
The random variable T has a Student’s t distribution with n − 1 degrees of freedom:
T = (X̄ − µ)/(S/√n) ∼ t[n−1]

The density function and the cumulative distribution function of a t[n] r.v. are close to those of the standard normal r.v. N(0, 1).
[Figure: dashed lines: densities of t[2] and t[5] – solid line: N(0, 1)]
Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Suppose σ unknown and estimated by 2000.
The form of the rejection region is the same as when σ is known: R0 = {X̄ < s}.
The threshold s is such that
P(X̄ < s | µ = 10000) = P( (X̄ − 10000)/(2000/√10) < (s − 10000)/(2000/√10) ) = P(T < tα) = α = 0.05
where tα is the α-th quantile of a t[9] random variable. Then
(s − 10000)/(2000/√10) = tα   and   s = 10000 + tα · 2000/√10
We obtain s = 8841 from R:
> c=qt(.05,9)
> s=10000+c*2000/sqrt(10); s
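For comparison, with the same estimated standard deviation a normal quantile would give a threshold closer to µ0; the t threshold is lower because it accounts for the extra uncertainty from estimating σ (a sketch with the values above):

```r
se <- 2000 / sqrt(10)
s_t <- 10000 + qt(0.05, df = 9) * se   # t-based threshold, about 8841
s_z <- qnorm(0.05, 10000, se)          # normal-based threshold, about 8960
s_t < s_z                              # TRUE: the t test rejects less easily
```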
4. The Wald test for the mean of a random variable with unknown distribution

Consider a test on the mean µ of a random variable X with unknown distribution, and assume a large sample size. Then an asymptotic test can be performed.
The pivot used for the test statistic is (X̄ − µ)/(S/√n), where S² is defined as in slide 22.
Its distribution can be approximated by a standard normal distribution:
(X̄ − µ)/(S/√n) ∼approx N(0, 1)
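A sketch of the resulting procedure (the data here are simulated for illustration; any large i.i.d. sample would do, and the null value µ0 = 1 is our choice):

```r
set.seed(1)
x <- rexp(200, rate = 1)                  # non-normal data, true mean 1
mu0 <- 1                                  # H0: mu = 1 vs H1: mu != 1
w <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))  # approx N(0,1) under H0
p_value <- 2 * (1 - pnorm(abs(w)))        # two-sided asymptotic p-value
```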
5. Aside of probability. Approximate distribution of the sum Sn and the mean X̄n of independent and identically distributed (i.i.d.) random variables

Let X1, . . . , Xn be i.i.d. random variables with mean µ and variance σ².
Let Sn be the random variable Sn = Σᵢ₌₁ⁿ Xᵢ.
Sn has mean nµ and variance nσ². Then (Sn − nµ)/(√n σ) has mean 0 and variance 1.

Central limit theorem (CLT)
lim_{n→∞} P( (Sn − nµ)/(√n σ) ≤ t ) = P(Z ≤ t), for all t ∈ R, with Z ∼ N(0, 1)
Approximate distribution of Sn and X̄n
For “large” n:   Sn ∼approx N(nµ, nσ²)
The sample mean random variable X̄n = Sn/n has mean µ and variance σ²/n.
For “large” n:   X̄n ∼approx N(µ, σ²/n)
How large should n be?
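A small simulation gives a feel for the approximation (a sketch, using X1 ∼ E(2) as in Case 1 below, so µ = σ = 0.5):

```r
set.seed(123)
n <- 30
xbar <- replicate(5000, mean(rexp(n, rate = 2)))  # 5000 simulated sample means
c(mean(xbar), sd(xbar))   # compare with mu = 0.5 and sigma/sqrt(n) ~ 0.091
mean(xbar <= 0.55)        # compare with pnorm(0.55, 0.5, 0.5/sqrt(30))
```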
Comparison between the exact distribution and the CLT approximation of X̄n for different n
Two cases, where the exact distribution of X̄n is known:
• exponential distribution
• Bernoulli distribution

Case 1
X1, . . . , Xn i.i.d. with exponential distribution with parameter λ: X1 ∼ E(λ)
Plot of the density function of X1, f(x) = λe^(−λx) for x > 0, for λ = 1, 5, 10 (very “asymmetrical”)
[Figure: exponential densities for λ = 1, 5, 10]
What about the distribution of X̄n?
Exact and approximate distributions of X̄n, X1 ∼ E(2)
[Figures: probability density functions and cumulative distribution functions of X̄n for n = 2, 5, 10, 20 – exact vs. normal approximation]
Case 2
X1, . . . , Xn i.i.d. with Bernoulli distribution with parameter p.
Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability 1 − p:
X1 ∼ B(1, p) with mean p and variance p(1 − p).
[Figure: probability function of X1 for p = 0.3]

Sn has exact binomial distribution B(n, p) (number of ones in n independent trials).
X̄n is sometimes denoted by P̂ (proportion of ones in n independent trials).

Exact and approximate cumulative distribution functions of X̄n, X1 ∼ B(1, 0.3)
[Figures: exact vs. approximate CDFs of X̄n for n = 2, 5, 10, 30]
Example
How accurate are radon detectors of a type sold to homeowners?
University researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter (pCi/l) of radon.
The detector readings were as follows:
91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8, 101.7
Is there convincing evidence that the mean reading of all detectors of this type differs from the nominal value of 105?

Computation in R
> p=c(91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7)-105
> t.test(p)

        One Sample t-test
data:  p
t = -0.31947, df = 11, p-value = 0.7554
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -6.837503  5.104170
sample estimates:
mean of x
-0.8666667

Note: for this call of t.test the data are centered at the value stated by H0.
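Alternatively, t.test accepts the null value directly through its mu argument, so centering by hand is not needed (a sketch with the same readings):

```r
x <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
       103.8, 99.6, 96.6, 119.3, 104.8, 101.7)
out <- t.test(x, mu = 105)   # H0: mu = 105, two-sided by default
out$p.value                  # same p-value as with the centered data
```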
6. Test on the proportion p of a binary event

Example.
It is known that 40% of mice with eczema are free from symptoms in 4 weeks.
We consider a new drug. It is effective if more than 40% of mice are free from symptoms in 4 weeks.
From the population of mice that have just shown eczema symptoms a sample is drawn; these mice are treated with the drug.

Parameter of interest:
p, the proportion of mice free from symptoms in 4 weeks
The random variable X modeling the experiment has distribution X ∼ B(1, p), with mean p and variance p(1 − p).

• Formulation of hypotheses:
  H0: the drug has no effect, p = 0.40
  To state that the drug is effective we should reject H0.
  After 4 weeks: H0: p = 0.40 and H1: p > 0.40
• One-sided right test
• Sample size: n = 25
• Sample variables: X1, . . . , X25 i.i.d., X1 ∼ B(1, p)
• Test statistic: P̂ = X̄, the proportion of ones in the sample
Result of the experiment and decision
In the sample of 25 treated mice, 12 were free of symptoms in 4 weeks: p̂ = 0.48.
Assume H0. Fix α = 5%. Use the CLT:
P̂ ∼approx N(0.40, 0.40 · 0.60/25)
• Rejection region of H0: R0 = (c, 1), where c is the 0.95-th quantile of N(0.40, 0.4 · 0.6/25)

> s=sqrt(0.4*0.6/25)
> qnorm(0.95,0.40,s)
[1] 0.5611621

R0 = (0.56, 1)
• p-value of 0.48: P(P̂ > 0.48 | p = 0.40)

> 1-pnorm(0.48,0.40,s)
[1] 0.2071081

p-value(0.48) = 0.21
There is no experimental evidence to reject H0.
Direct computation in R
> prop.test(12,25,0.40,alternative="greater",correct=F)

        1-sample proportions test without continuity correction
data:  12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
 0.3258181 1.0000000
sample estimates:
   p
0.48

The default alternative is "two.sided"; the default level is conf.level=0.95.
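The X-squared value reported by prop.test (without continuity correction) is exactly the square of the normal z statistic used on the previous slide, and the one-sided p-value agrees (a quick check):

```r
phat <- 12/25; p0 <- 0.40; n <- 25
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)  # normal test statistic
z^2            # equals the X-squared statistic 0.66667
1 - pnorm(z)   # the one-sided p-value 0.2071
```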
Exercise
Chicagoland technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?
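One possible approach, as a sketch (we take 23.43% of 750 to correspond to about 176 professionals, and test H0: p = 0.25 against the two-sided alternative):

```r
# Two-sided test of the claimed proportion, without continuity correction
out <- prop.test(round(0.2343 * 750), 750, p = 0.25, correct = FALSE)
out$p.value   # well above 0.05: no evidence against the company's claim
```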
7. Test for the equality of two means
A common application is to test whether a new process or treatment is superior to a current one.
The data may be either paired or unpaired.
a) Paired samples: there is a one-to-one correspondence between the values in the two samples. That is, if X1, X2, . . . , Xn and Y1, Y2, . . . , Yn are the two sample variables, then Xi corresponds to Yi.
b) Unpaired samples: the sample sizes of the two samples may or may not be equal.
7. a) Paired samples
Let X and Y be two random variables modeling some characteristic of the same population.
Example. Drinking Water
(from https://onlinecourses.science.psu.edu, Penn State University)
Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard.
Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.

> water
      bottom surface
 [1,]  0.430   0.415
 [2,]  0.266   0.238
 [3,]  0.567   0.390
 [4,]  0.531   0.410
 [5,]  0.707   0.605
 [6,]  0.716   0.609
 [7,]  0.651   0.632
 [8,]  0.589   0.523
 [9,]  0.469   0.411
[10,]  0.723   0.612
Assume X ∼ N(µX, σ²X) and Y ∼ N(µY, σ²Y).
Let (X1, Y1), . . . , (Xn, Yn) be the n paired sample variables.
Consider the sample random variables D1, . . . , Dn with Di = Xi − Yi.
The test statistic is the sample mean of D:
D̄ ∼ N(µD, σ²D/n)
with µD = µX − µY and σ²D = σ²X + σ²Y − 2 Cov(X, Y), usually unknown and estimated by the unbiased estimator S²D.
The test of the equality of the two means, e.g. H0 : µX = µY, becomes a Student’s t test on µD, e.g. H0 : µD = 0.
Example. Drinking Water (continued)
> D=water[,1]-water[,2]; D
 [1] 0.015 0.028 0.177 0.121 0.102 0.107 0.019 0.066 0.058 0.111
• Hypotheses: H0 : µD = 0 and H1 : µD ≠ 0
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample size: n = 10
• Sample variables: D1, . . . , D10 i.i.d., Di ∼ N(µD, σ²D) with σ²D estimated by S²D
• Test statistic: T = (D̄ − µD)/(S/√n); under H0: T = D̄/(S/√n) ∼ t9
• α = 0.05
The thresholds c1 and c2 of the rejection region are such that
0.025 = P(T < c1 | µD = 0)   0.025 = P(T > c2 | µD = 0)
Observe that, because of the symmetry w.r.t. 0 of the Student’s t density, c1 = −c2.
In the sample: d̄ = 0.0804, s = 0.052.
The sample value of the test statistic, under H0, is 4.86.

> d_m=mean(D); d_m; s=sd(D); s
[1] 0.0804
[1] 0.05227321
> t=d_m/(s/sqrt(10)); t
[1] 4.863813

The rejection region is R0 = (−∞, −2.262) ∪ (2.262, ∞). The p-value is 0.0009.

> c1=qt(0.025,9); c1
[1] -2.262157
> 2*(1-pt(t,9))
[1] 0.0008911155
The direct computation in R produces:
> t.test(water[,1],water[,2],paired=TRUE)

        Paired t-test
data:  water[, 1] and water[, 2]
t = 4.8638, df = 9, p-value = 0.0008911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.043006 0.117794
sample estimates:
mean of the differences
                 0.0804
There is experimental evidence to reject H0
7. b) Unpaired samples

Example. Prey of two species of spiders
(from https://onlinecourses.science.psu.edu, Penn State University)
The feeding habits of two species of net-casting spiders were studied. The species, deinopis and menneus, coexist in eastern Australia. The following data were obtained on the size, in millimeters, of the prey of random samples of the two species.
The spiders were selected randomly, so we assume the measurements are independent.

> d
 [1] 12.9 10.2  7.4  7.0 10.5 11.9  7.1  9.9 14.4 11.3
> m
 [1] 10.2  6.9 10.9 11.0 10.1  5.3  7.5 10.3  9.2  8.8
> mean(d); mean(m)
[1] 10.26
[1] 9.02

[Figure: side-by-side boxplots of d and m]
Normal distribution
Assume the sizes in the two populations (denoted by A and B) have normal distributions:
XA ∼ N(µA, σ²A)   XB ∼ N(µB, σ²B)
We want to test H0 : µA = µB and H1 : µA ≠ µB,
or equivalently: H0 : µA − µB = 0 and H1 : µA − µB ≠ 0.
Let nA and nB be the sizes of the two independent samples of XA and XB. In the example nA = nB = 10.
The two sample mean random variables are:
X̄A ∼ N(µA, σ²A/nA)   X̄B ∼ N(µB, σ²B/nB)
Consider the random variable difference of the two sample means. It has normal distribution:
X̄A − X̄B ∼ N(µA − µB, σ²A/nA + σ²B/nB)
The original test becomes a test on the mean of one normal random variable.

1. The variances σ²A and σ²B are known.
Fixed α, a usual normal test is carried out.

2. The variances σ²A and σ²B are unknown, assumed equal, and estimated by the unbiased estimators S²A and S²B.
An unbiased estimator of the variance of the random variable X̄A − X̄B is the pooled variance:
S² = ((nA − 1)S²A + (nB − 1)S²B)/(nA + nB − 2) · (nA + nB)/(nA nB)
In particular, if nA = nB = n, then S² = (S²A + S²B)/n.
The test statistic is:
T = (X̄A − X̄B − (µA − µB))/S   with T ∼ t_d,  d = nA + nB − 2
Fixed α, a usual Student’s t test is carried out.
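The pooled-variance formula can be wrapped in a small helper (a sketch; pooled_var is our name, and d, m are the spider data above):

```r
# Unbiased estimator of Var(xbar - ybar) under equal population variances
pooled_var <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)  # pooled s^2
  sp2 * (nx + ny) / (nx * ny)   # times 1/nx + 1/ny
}
d <- c(12.9, 10.2, 7.4, 7.0, 10.5, 11.9, 7.1, 9.9, 14.4, 11.3)
m <- c(10.2, 6.9, 10.9, 11.0, 10.1, 5.3, 7.5, 10.3, 9.2, 8.8)
pooled_var(d, m)   # matches (sd(d)^2 + sd(m)^2)/10 since nA = nB = 10
```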
3. The unknown variances σ²A and σ²B are not equal.
A hypothesis test based on the t distribution, known as Welch’s t-test, can be used.
Example. Prey of two species of spiders (continued)
• Hypotheses: H0 : µD = µM and H1 : µD ≠ µM
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample sizes: nD = nM = 10
• First, assume σ²M = σ²D. Pooled variance estimator:
  S² = (S²D + S²M)/n and d = 2n − 2
• Test statistic under H0: T = (X̄D − X̄M)/S ∼ t18
• α = 0.05
The thresholds c1 and c2 of the rejection region are such that
0.025 = P(T < c1 | µD = µM)   0.025 = P(T > c2 | µD = µM)

The sample means of the two groups are: x̄D = 10.26, x̄M = 9.02.
The sample difference of means is: x̄D − x̄M = 1.24.
The sample pooled variance is: s² = 0.99.
The sample value of the test statistic, under H0, is 1.25.

> diff_m=mean(d)-mean(m); diff_m
[1] 1.24
> s2=(sd(d)^2+sd(m)^2)/10; s2
[1] 0.9915556
> t=diff_m/sqrt(s2); t
[1] 1.245269

The rejection region is R0 = (−∞, −2.10) ∪ (2.10, ∞).
The p-value is 0.23.

> c1=qt(0.025,18); c1
[1] -2.100922
> 2*(1-pt(t,18))   ## note 2*( ) -- two-sided test
[1] 0.2290008
There is no experimental evidence to reject H0
Can we assume the two variances equal?
A specific test can be performed (based on the Fisher distribution). Here we do not give the details. Computed in R:

> var.test(m, d, ratio = 1)

        F test to compare two variances
data:  m and d
F = 0.56936, num df = 9, denom df = 9, p-value = 0.4142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1414206 2.2922339
sample estimates:
ratio of variances
         0.5693585

We can assume σ²D = σ²M, although the sample ratio of variances is 0.57. This apparent inconsistency is due to the small sample sizes.
Direct computation in R of the test
H0 : µD = µM and H1 : µD ≠ µM, assuming σ²D = σ²M:

> t.test(d,m,var.equal=T)

        Two Sample t-test
data:  d and m
t = 1.2453, df = 18, p-value = 0.229
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8520327  3.3320327
sample estimates:
mean of x mean of y
    10.26      9.02

If the equality of the variances is rejected, we use the Welch two-sample t-test. In that case the pooled variance s² and the degrees of freedom are computed in a different way. In R:

> t.test(d,m)

        Welch Two Sample t-test
data:  d and m
t = 1.2453, df = 16.74, p-value = 0.2302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8633815  3.3433815
sample estimates:
mean of x mean of y
    10.26      9.02
The problem of making inference on means when variances are unequal is, in general, quite a difficult one. It is known as the Behrens–Fisher problem.
(G. Casella, R.L. Berger, Statistical Inference, 2nd ed., Duxbury, Ex. 8.42)
Notes and generalisations
• The Wald test. If the two random variables do not have normal distributions and the sample size is “large”, a Wald test can be performed.
• Threshold different from zero. In some applications, you may want to adopt a new process or treatment only if it exceeds the current treatment by some threshold. In this case the difference between the two means is compared not with 0 but with the chosen threshold.
Open more graphical devices in R
boxplot(a)                 ## first plot, in the current device
dev.new()                  ## open a new device
boxplot(b)                 ## second plot, drawn in the new device
for (i in 1:2) dev.off()   ## close the two open devices