Inferential Statistics

Hypothesis tests on a population mean

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

Part E

Hypothesis tests on the mean of a population (continued)

3. Test on the mean of a variable with unknown distribution

4. Aside on probability. Central limit theorem

5. Test on the proportion of a dichotomous event

3. The Wald test for the mean of a random variable with unknown distribution

Consider a test on the mean µ of a random variable X with unknown distribution and assume a large sample size. Then an asymptotic test can be performed

The pivot used for the test statistic is

\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \qquad \text{where} \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

Its distribution can be approximated with a standard Normal distribution

\[ \frac{\bar{X} - \mu}{S/\sqrt{n}} \overset{\text{approx}}{\sim} N(0, 1) \]
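As an illustration, the pivot can be computed by hand in R. This is only a minimal sketch: the data are simulated, and the null value mu0 and the two-sided alternative are assumptions made for the example, not part of the slides.

```r
# Hypothetical data: 200 draws from a skewed distribution (illustration only)
set.seed(123)
x <- rexp(200, rate = 2)

mu0 <- 0.5                 # value of the mean under H0 (assumed for the example)
n <- length(x)
s <- sd(x)                 # sample standard deviation S (divisor n - 1)

# Pivot evaluated under H0
z <- (mean(x) - mu0) / (s / sqrt(n))

# Two-sided p-value from the standard Normal approximation
p_value <- 2 * (1 - pnorm(abs(z)))
c(z = z, p_value = p_value)
```

For a one-sided right alternative the p-value would instead be 1 - pnorm(z).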

4. Aside on probability. Approximate distribution of the sum S_n and the mean X̄_n of independent and identically distributed (i.i.d.) random variables

Let X₁, . . . , X_n be i.i.d. random variables with mean µ and variance σ²

Let S_n be the random variable

\[ S_n = \sum_{i=1}^{n} X_i \]

S_n has mean nµ and variance nσ². Then

\[ \frac{S_n - n\mu}{\sqrt{n}\,\sigma} \]

has mean 0 and variance 1
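The moments above follow from linearity of the expectation and, for the variance, from independence. The short derivation below is a sketch of that reasoning; the last line rewrites the standardized sum in terms of the sample mean X̄_n = S_n/n, which is the form of the pivot used in the test.

```latex
\begin{align*}
E(S_n) &= \sum_{i=1}^{n} E(X_i) = n\mu, \\
\operatorname{Var}(S_n) &= \sum_{i=1}^{n} \operatorname{Var}(X_i) = n\sigma^2
  \quad \text{(by independence)}, \\
\frac{S_n - n\mu}{\sqrt{n}\,\sigma} &= \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}
  \quad \text{with } \bar{X}_n = S_n/n .
\end{align*}
```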

Central limit theorem (CLT)

\[ \lim_{n \to \infty} P\left( \frac{S_n - n\mu}{\sqrt{n}\,\sigma} \le t \right) = P(Z \le t), \quad \text{for all } t \in \mathbb{R}, \text{ with } Z \sim N(0, 1) \]
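The statement can be checked numerically with a small simulation. The sketch below is not from the slides: the exponential distribution, n = 50 and the number of replications are arbitrary choices made for illustration.

```r
set.seed(1)
lambda <- 2
mu <- 1 / lambda       # mean of an exponential with rate lambda
sigma <- 1 / lambda    # standard deviation of the same distribution
n <- 50                # number of summands (assumed)
reps <- 10000          # number of simulated sums (assumed)

# Each column holds one sample of size n; column sums give draws of S_n
s_n <- colSums(matrix(rexp(n * reps, rate = lambda), nrow = n))
z_n <- (s_n - n * mu) / (sqrt(n) * sigma)

# Empirical P(standardized sum <= t) against P(Z <= t)
t_grid <- c(-1, 0, 1, 2)
empirical <- sapply(t_grid, function(t) mean(z_n <= t))
rbind(t = t_grid, empirical = empirical, normal = pnorm(t_grid))
```

The two rows of probabilities should be close, and closer still for larger n.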

Approximate distribution of S_n and X̄_n

For “large” n, approximately, S_n ∼ N(nµ, nσ²)

The random variable sample mean X̄_n = S_n/n has mean µ and variance σ²/n

For “large” n, approximately, X̄_n ∼ N(µ, σ²/n)

How large should n be?

Comparison between the exact distribution of X̄_n and its approximation via the CLT, for different n

Two cases where the exact distribution of X̄_n is known

• exponential distribution

• Bernoulli distribution

Example 1

X₁, . . . , X_n i.i.d. with exponential distribution with parameter λ, λ > 0, written as

X₁ ∼ E(λ)

The mean of X₁ is 1/λ

Plot of the density function of X₁, f(x) = λe^(−λx) for x > 0, for λ = 1, 5, 10 (very “asymmetrical”)

[Figure: three panels titled “Exponential”; horizontal axis from 0 to 2, vertical axis from 0 to 10]
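A plot of this kind can be drawn in R with dexp. The sketch below overlays the three densities in a single panel rather than using three separate panels; the line types and the legend are choices made for the example.

```r
x <- seq(0, 2, by = 0.01)
rates <- c(1, 5, 10)

# Draw the steepest density first so that the vertical range fits all curves
plot(x, dexp(x, rate = rates[3]), type = "l", lty = 3,
     ylim = c(0, 10), xlab = "x", ylab = "f(x)")
lines(x, dexp(x, rate = rates[2]), lty = 2)
lines(x, dexp(x, rate = rates[1]), lty = 1)
legend("topright", legend = paste("lambda =", rates), lty = 1:3)
```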

What about the distribution of X̄_n?

Exact and approximate distributions of X̄_n, X₁ ∼ E(2)

Probability density functions of X̄_n

[Figure: panels for n = 2, 5, 10, 20; horizontal axis from 0 to 1, vertical axis from 0 to 5]

Cumulative distribution functions of X̄_n

[Figure: panels for n = 2, 5, 10, 20; horizontal axis from 0 to 1, vertical axis ticks at 0.0, 0.4, 0.8]

Exact: X̄_n ∼ (1/n) Γ(n, λ)

Approximate: X̄_n ∼ N(1/λ, 1/(nλ²))
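The quality of the approximation can also be summarized numerically. The sketch below assumes that Γ(n, λ) denotes the Gamma distribution with shape n and rate λ, so that the sample mean has shape n and rate nλ; the grid on which the two cumulative distribution functions are compared is an arbitrary choice.

```r
lambda <- 2
x <- seq(0, 3, by = 0.001)
ns <- c(2, 5, 10, 20)

# Largest distance between the exact and the approximate CDF of the sample mean
max_cdf_error <- sapply(ns, function(n) {
  exact  <- pgamma(x, shape = n, rate = n * lambda)   # (1/n) Gamma(n, lambda)
  normal <- pnorm(x, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n)))
  max(abs(exact - normal))
})
names(max_cdf_error) <- paste0("n=", ns)
max_cdf_error
```

The error for n = 20 should be visibly smaller than for n = 2.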

Example 2

X₁, . . . , X_n i.i.d. with Bernoulli distribution with parameter p

Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability 1 − p

X₁ ∼ B(1, p) with mean p and variance p(1 − p)

Plot of the probability mass function (discrete density) of X₁ for p = 0.3

[Figure: bars at x = 0 and x = 1; vertical axis “density function” from 0 to 1]

S_n has exact binomial distribution B(n, p) (sum of ones in n independent trials)

X̄_n is sometimes denoted by P̂ (number of ones in n independent trials divided by n)

Exact and approximate cumulative distribution functions of X̄_n, X₁ ∼ B(1, 0.3)

[Figure: panels for n = 2, 5, 10, 30; horizontal axis from 0 to 1, vertical axis ticks at 0.0, 0.4, 0.8]

Exact: X̄_n ∼ (1/n) B(n, p)

Approximate: X̄_n ∼ N(p, p(1 − p)/n)
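The same kind of numerical comparison can be sketched for the Bernoulli case. The exact cumulative distribution function of the sample mean is evaluated at its jump points k/n with pbinom; no continuity correction is applied, which is a simplification chosen for the example.

```r
p <- 0.3
ns <- c(2, 5, 10, 30)

# Largest distance between exact and approximate CDF at the points k/n
max_cdf_error <- sapply(ns, function(n) {
  k <- 0:n
  exact  <- pbinom(k, size = n, prob = p)             # P(S_n <= k)
  normal <- pnorm(k / n, mean = p, sd = sqrt(p * (1 - p) / n))
  max(abs(exact - normal))
})
names(max_cdf_error) <- paste0("n=", ns)
max_cdf_error
```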

5. Test on the proportion p of a binary event

Example

It is known that 40% of the mice with eczema are free from symptoms in 4 weeks

A new drug is considered effective if more than 40% of the treated mice are free from symptoms in 4 weeks

A sample is drawn from the population of mice that begin to show eczema symptoms; these mice are treated with the drug

Parameter of interest:

p proportion of mice free from symptoms in 4 weeks

The random variable X modeling the experiments has distribution X ∼ B(1, p)

with mean p and variance p(1 − p)

• Formulation of hypotheses:

H₀: the drug has no effect, p = 0.40

To state that the drug is effective we should reject H₀

After 4 weeks: H₀: p = 0.40 and H₁: p > 0.40

• One-sided right test

• Sample size: n = 25

• Sample variables: X₁, . . . , X₂₅ i.i.d. X₁ ∼ B(1, p)

• Test statistic:

P̂ = X̄

proportion of ones in the sample (mice free of symptoms after four weeks)

Result of the experiment and decision

In the sample of 25 treated mice, 12 were free of symptoms in 4 weeks: p̂ = 12/25 = 0.48

Assume H₀. Fix α = 5%. Use CLT:

\[ \hat{P} \overset{\text{approx}}{\sim} N\left( 0.40, \frac{0.40 \cdot 0.60}{25} \right) \]

• Rejection region of H₀: R₀ = (c, 1)

where c is the 0.95 quantile of a N(0.40, 0.4 · 0.6/25)

> s = sqrt(0.4*0.6/25)
> c = qnorm(0.95,0.40,s);c
[1] 0.5611621

R₀ = (0.56, 1)

• p-value of 0.48: P(P̂ > 0.48 | p = 0.40)

> 1-pnorm(0.48,0.40,s)
[1] 0.2071081

p-value(0.48) = 0.21

The observed value 0.48 is not in R₀ and the p-value 0.21 is larger than α = 0.05: there is no experimental evidence to reject H₀
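The steps of this slide can be collected in a small function. The sketch below is only illustrative: the function prop_z_test_right and its argument names are hypothetical and do not belong to base R or to the slides.

```r
# Hypothetical helper: one-sided right test on a proportion via the CLT
prop_z_test_right <- function(successes, n, p0, alpha = 0.05) {
  p_hat <- successes / n
  se0 <- sqrt(p0 * (1 - p0) / n)                     # sd of the sample proportion under H0
  critical <- qnorm(1 - alpha, mean = p0, sd = se0)  # lower end of the rejection region
  p_value <- 1 - pnorm(p_hat, mean = p0, sd = se0)
  list(p_hat = p_hat, critical = critical,
       p_value = p_value, reject = p_hat > critical)
}

# Mice example: 12 symptom-free mice out of 25, H0: p = 0.40
res <- prop_z_test_right(12, 25, 0.40)
res
```

With the data of the example it returns the same critical value and p-value as the commands above.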

Direct computation in R

> prop.test(12,25,0.40,alternative="greater",correct=FALSE)

        1-sample proportions test without continuity correction

data:  12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
 0.3258181 1.0000000
sample estimates:
   p
0.48

The default alternative is "two.sided"

The default level is conf.level=0.95
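The X-squared value in the output is the square of the standardized statistic built with the CLT, which is why the p-value agrees with the manual computation. A minimal check, using the numbers of the example:

```r
s <- sqrt(0.4 * 0.6 / 25)      # sd of the sample proportion under H0
z <- (0.48 - 0.40) / s         # standardized test statistic
z^2                            # compare with the X-squared value of prop.test

fit <- prop.test(12, 25, p = 0.40, alternative = "greater", correct = FALSE)
unname(fit$statistic)          # X-squared
fit$p.value                    # same as 1 - pnorm(z)
```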

Exercise

Chicagoland’s technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of the IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?