Inferential Statistics
Hypothesis tests on a population mean
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part E
Hypothesis tests on the mean of a population (continued)
3. Test on the mean of a variable with unknown distribution
4. Aside on probability. Central limit theorem
5. Test on the proportion of a dichotomous event
3. The Wald test for the mean of a random variable with unknown distribution
Consider a test on the mean $\mu$ of a random variable X with unknown distribution, and assume that the sample size is large. Then an asymptotic test can be performed.
The pivot used for the test statistic is
$$\frac{\bar X - \mu}{S/\sqrt{n}} \quad \text{where} \quad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2$$
Its distribution can be approximated with a standard Normal distribution:
$$\frac{\bar X - \mu}{S/\sqrt{n}} \overset{\text{approx}}{\sim} N(0, 1)$$
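The pivot above can be turned into a test in a few lines of R. The following is a minimal sketch, not a reference implementation: the function name `wald_mean_test`, its arguments, and the simulated exponential data are assumptions introduced here for illustration only.

```r
# Hypothetical helper: asymptotic (Wald) test of H0: mu = mu0 for a large sample.
wald_mean_test <- function(x, mu0,
                           alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  n <- length(x)
  # sd() uses the n - 1 denominator, so it matches S in the pivot
  z <- (mean(x) - mu0) / (sd(x) / sqrt(n))
  p_value <- switch(alternative,
                    two.sided = 2 * pnorm(-abs(z)),
                    greater   = 1 - pnorm(z),
                    less      = pnorm(z))
  list(statistic = z, p.value = p_value)
}

set.seed(1)                    # arbitrary seed, for reproducibility only
x <- rexp(200, rate = 2)       # simulated data (assumption), true mean 1/2
res <- wald_mean_test(x, mu0 = 0.5)
res$statistic
res$p.value
```

The p-value is computed from the standard Normal distribution, so the result is only trustworthy when n is large enough for the approximation to hold.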
4. Aside on probability. Approximate distribution of the sum $S_n$ and the mean $\bar X_n$ of independent and identically distributed (i.i.d.) random variables
Let $X_1, \dots, X_n$ be i.i.d. random variables with mean $\mu$ and variance $\sigma^2$
Let $S_n$ be the random variable $S_n = \sum_{i=1}^{n} X_i$
$S_n$ has mean $n\mu$ and variance $n\sigma^2$. Then $\frac{S_n - n\mu}{\sqrt{n}\,\sigma}$ has mean 0 and variance 1
Central limit theorem (CLT)
$$\lim_{n\to\infty} P\left(\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \le t\right) = P(Z \le t), \quad \text{for all } t \in \mathbb{R}, \text{ with } Z \sim N(0, 1)$$
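The statement of the CLT can be checked empirically by simulation. The sketch below assumes exponential variables with rate 2, n = 50 and 10000 replications; these values, the seed, and all variable names are illustrative choices, not part of the original slides.

```r
set.seed(123)                      # arbitrary seed, for reproducibility only
n      <- 50                       # number of variables in each sum (assumption)
lambda <- 2                        # rate of the exponential (assumption)
mu     <- 1 / lambda               # mean of one exponential variable
sigma  <- 1 / lambda               # standard deviation of one exponential variable
reps   <- 10000                    # number of simulated sums

# Simulate S_n many times and standardize it as in the theorem
s_n <- replicate(reps, sum(rexp(n, rate = lambda)))
z_n <- (s_n - n * mu) / (sqrt(n) * sigma)

# Compare the empirical P(Z_n <= t) with P(Z <= t) for a few values of t
t_grid    <- c(-1.5, 0, 1.5)
empirical <- sapply(t_grid, function(t) mean(z_n <= t))
cbind(t = t_grid, empirical = empirical, normal = pnorm(t_grid))
```

The two columns should be close but not identical: part of the gap is Monte Carlo noise, part is the approximation error at finite n.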
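Approximate distribution of $S_n$ and $\bar X_n$
For “large” n: $S_n \overset{\text{approx}}{\sim} N(n\mu, n\sigma^2)$
The sample mean $\bar X_n = S_n/n$ is a random variable with mean $\mu$ and variance $\sigma^2/n$
For “large” n: $\bar X_n \overset{\text{approx}}{\sim} N(\mu, \sigma^2/n)$
How large should n be?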
Comparison between the exact distribution of $\bar X_n$ and its approximate distribution via the CLT, for different n
Two cases where the exact distribution of $\bar X_n$ is known:
• exponential distribution
• Bernoulli distribution
Example 1
$X_1, \dots, X_n$ i.i.d. with exponential distribution with parameter $\lambda$, $\lambda > 0$, written as $X_1 \sim E(\lambda)$
The mean of $X_1$ is $1/\lambda$
Plot of the density function of $X_1$, $f(x) = \lambda e^{-\lambda x}$ for $x > 0$, for $\lambda = 1, 5, 10$ (very “asymmetrical”)
[Figure: three panels titled “Exponential”, one per value of $\lambda$; horizontal axis from 0 to 2, vertical axis from 0 to 10]
What about the distribution of $\bar X_n$?
Exact and approximate distributions of $\bar X_n$, $X_1 \sim E(2)$
Probability density functions of $\bar X_n$
[Figure: panels for n = 2, 5, 10, 20; horizontal axis from 0 to 1, vertical axis from 0 to 5]
Cumulative distribution functions of $\bar X_n$
[Figure: panels for n = 2, 5, 10, 20; horizontal axis from 0 to 1, vertical axis from 0 to 1]
Exact: $\bar X_n \sim \frac{1}{n}\,\Gamma(n, \lambda)$
Approximate: $\bar X_n \sim N\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)$
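The comparison between the exact and the approximate distribution can be reproduced numerically. This is a sketch under the assumption that $\Gamma(n, \lambda)$ denotes shape n and rate $\lambda$, so that $\bar X_n = S_n/n$ has a Gamma distribution with shape n and rate $n\lambda$; the grid of x values and the variable names are illustrative.

```r
lambda <- 2                          # same rate as in the plots, X1 ~ E(2)
x      <- seq(0.01, 1.5, by = 0.01)  # evaluation grid (assumption)
n_vals <- c(2, 5, 10, 20)

# Largest distance between the exact and the approximate cdf of the sample mean
max_cdf_gap <- sapply(n_vals, function(n) {
  exact  <- pgamma(x, shape = n, rate = n * lambda)   # S_n / n with S_n ~ Gamma(n, lambda)
  normal <- pnorm(x, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n)))
  max(abs(exact - normal))
})
names(max_cdf_gap) <- paste0("n=", n_vals)
max_cdf_gap
```

The gap shrinks as n grows, which is the numerical counterpart of the curves getting closer in the panels.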
Example 2
$X_1, \dots, X_n$ i.i.d. with Bernoulli distribution with parameter p
Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability $1 - p$
$X_1 \sim B(1, p)$ with mean p and variance $p(1 - p)$
Plot of the probability mass function of $X_1$ for p = 0.3
[Figure: horizontal axis x with values 0 and 1; vertical axis labelled “density function”, from 0 to 1]
$S_n$ has exact binomial distribution $B(n, p)$ (number of ones in n independent trials)
$\bar X_n$ is sometimes denoted by $\hat P$ (number of ones in n independent trials divided by n)
Exact and approximate cumulative distribution functions of $\bar X_n$, $X_1 \sim B(1, 0.3)$
[Figure: panels for n = 2, 5, 10, 30; horizontal axis from 0 to 1, vertical axis from 0 to 1]
Exact: $\bar X_n \sim \frac{1}{n}\,B(n, p)$
Approximate: $\bar X_n \sim N\left(p, \frac{p(1-p)}{n}\right)$
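The same comparison can be sketched for the Bernoulli case, where the exact cdf of $\bar X_n$ is a step function obtained from the binomial cdf of $S_n$. The grid, the small tolerance added before rounding down, and the variable names are assumptions made for illustration.

```r
p      <- 0.3                      # same parameter as in the plots
x      <- seq(0, 1, by = 0.01)     # evaluation grid (assumption)
n_vals <- c(2, 5, 10, 30)

max_cdf_gap <- sapply(n_vals, function(n) {
  # P(mean <= x) = P(S_n <= n x); the 1e-9 guards against floating-point
  # errors when n * x should be an exact integer
  exact  <- pbinom(floor(n * x + 1e-9), size = n, prob = p)
  normal <- pnorm(x, mean = p, sd = sqrt(p * (1 - p) / n))
  max(abs(exact - normal))
})
names(max_cdf_gap) <- paste0("n=", n_vals)
max_cdf_gap
```

Because the exact distribution is discrete, the gap decreases more slowly than in the exponential example, and it is largest near the jumps of the step function.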
5. Test on the proportion p of a binary event
Example
It is known that 40% of the mice with eczema are free from symptoms in 4 weeks
A new drug is considered effective if more than 40% of the mice are free from symptoms in 4 weeks
From the population of mice that show the first symptoms of eczema a sample is drawn; these mice are treated with the drug
Parameter of interest:
p proportion of mice free from symptoms in 4 weeks
The random variable X modeling the experiments has distribution X ∼ B(1, p)
with mean p and variance p(1 − p)
• Formulation of hypotheses:
H0: the drug has no effect, p = 0.40
To state that the drug is effective we should reject H0
After 4 weeks: H0: p = 0.40 and H1: p > 0.40
• One-sided right test
• Sample size: n = 25
• Sample variables: $X_1, \dots, X_{25}$ i.i.d., $X_1 \sim B(1, p)$
• Test statistic: $\hat P = \bar X$, the proportion of ones in the sample (mice free of symptoms after four weeks)
Result of the experiment and decision
In the sample of 25 treated mice, 12 were free of symptoms in 4 weeks: $\hat p = 12/25 = 0.48$
Assume H0. Fix $\alpha = 5\%$. Use the CLT:
$$\hat P \overset{\text{approx}}{\sim} N\left(0.40, \frac{0.40 \times 0.60}{25}\right)$$
• Rejection region of H0: R0 = (c, 1)
where c is the 0.95 quantile of $N(0.40, 0.40 \times 0.60/25)$
> s = sqrt(0.4*0.6/25)
> c = qnorm(0.95,0.40,s);c
[1] 0.5611621
R0 = (0.56, 1)
• p-value of 0.48: $P(\hat P > 0.48 \mid p = 0.40)$
> 1-pnorm(0.48,0.40,s)
[1] 0.2071081
p-value(0.48) = 0.21
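At the 5% level there is not enough experimental evidence to reject H0: the observed value 0.48 lies outside R0 and the p-value 0.21 is larger than $\alpha$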
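With n = 25 the Normal approximation is fairly rough, so it can be instructive to set the CLT p-value next to the one obtained from the exact binomial distribution of $S_n$ under H0. This is a side check added here, not the method of the slides; the variable names are illustrative.

```r
n         <- 25
successes <- 12
p0        <- 0.40

# CLT p-value, as computed above with pnorm
approx_p <- 1 - pnorm(successes / n, mean = p0, sd = sqrt(p0 * (1 - p0) / n))

# Exact p-value: P(S_n >= 12 | p = 0.40), with S_n ~ B(25, 0.40)
exact_p <- 1 - pbinom(successes - 1, size = n, prob = p0)

c(clt = approx_p, exact = exact_p)
```

Both values are well above 0.05, so the decision not to reject H0 does not depend on which of the two computations is used.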
Direct computation in R
> prop.test(12,25,0.40,alternative="greater",correct=F)

	1-sample proportions test without continuity correction

data:  12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
 0.3258181 1.0000000
sample estimates:
   p 
0.48 
The default alternative is "two.sided"
The default level is conf.level=0.95
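The `X-squared` value reported by `prop.test` is the square of the standardized statistic used in the manual computation. A short check, with illustrative variable names, could look like this:

```r
n         <- 25
successes <- 12
p0        <- 0.40

p_hat <- successes / n
z     <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standardized test statistic

res <- prop.test(successes, n, p = p0, alternative = "greater", correct = FALSE)

z^2                 # compare with res$statistic (X-squared)
1 - pnorm(z)        # compare with res$p.value
```

Working with z and the standard Normal gives the same p-value as working with $\hat P$ and $N(0.40, 0.40 \times 0.60/25)$, since one is just a rescaling of the other.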
Exercise
Chicagoland's technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of the IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?