
Inferential Statistics: Hypothesis tests on a population mean


Academic year: 2021



Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it


Part E

Hypothesis tests on the mean of a population (continued)

3. Test on the mean of a variable with unknown distribution

4. Aside of probability. Central limit theorem

5. Test on the proportion of a dichotomous event


3. The Wald test for the mean of a random variable with unknown distribution

Consider a test on the mean µ of a random variable X with unknown distribution, and assume a large sample size. Then an asymptotic test can be performed

The pivot used for the test statistic is

$$\frac{\overline{X} - \mu}{S/\sqrt{n}} \qquad \text{where} \quad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \overline{X}\right)^2$$

Its distribution can be approximated by a standard Normal distribution:

$$\frac{\overline{X} - \mu}{S/\sqrt{n}} \;\overset{\text{approx}}{\sim}\; \mathcal{N}(0, 1)$$
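The following is a minimal R sketch of this asymptotic test, assuming the null hypothesis H0: µ = mu0. The function name `wald_test_mean` and its arguments are illustrative and do not come from the original slides. R's `sd()` uses the divisor n − 1, which matches S² above. The p-values are read from the standard Normal distribution, so the result is only reliable for large n.

```r
# Hypothetical helper: asymptotic (Wald) test for the mean of a variable
# with unknown distribution; the Normal approximation needs a large n.
wald_test_mean <- function(x, mu0, alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  n <- length(x)
  s <- sd(x)                            # S, computed with divisor n - 1
  z <- (mean(x) - mu0) / (s / sqrt(n))  # observed value of the pivot under H0
  p_value <- switch(alternative,
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),
    greater   = pnorm(z, lower.tail = FALSE),
    less      = pnorm(z))
  list(z = z, p_value = p_value)
}

# Illustrative usage on simulated skewed data (200 exponential values, true mean 0.5)
set.seed(123)
x <- rexp(200, rate = 2)
wald_test_mean(x, mu0 = 0.5)
```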


4. Aside of probability. Approximate distribution of the sum $S_n$ and the mean $\overline{X}_n$ of independent and identically distributed (i.i.d.) random variables

Let $X_1, \dots, X_n$ be i.i.d. random variables with mean µ and variance σ²

Let $S_n$ be the random variable $S_n = \sum_{i=1}^{n} X_i$

$S_n$ has mean $n\mu$ and variance $n\sigma^2$. Then $\dfrac{S_n - n\mu}{\sqrt{n}\,\sigma}$ has mean 0 and variance 1

Central limit theorem (CLT)

$$\lim_{n\to\infty} P\left(\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \le t\right) = P(Z \le t) \quad \text{for all } t \in \mathbb{R}, \text{ with } Z \sim \mathcal{N}(0, 1)$$
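The CLT can be checked by simulation. The R sketch below assumes i.i.d. exponential variables, chosen here only because they are strongly skewed. It estimates $P\left((S_n - n\mu)/(\sqrt{n}\,\sigma) \le t\right)$ by Monte Carlo and compares the estimate with $P(Z \le t)$. The helper `clt_check` is a hypothetical name.

```r
# Monte Carlo check of the CLT, assuming i.i.d. exponential variables.
# For E(rate): mu = 1 / rate and sigma = 1 / rate.
clt_check <- function(n, reps, t, rate = 1) {
  # each column is one sample X_1, ..., X_n; colSums gives reps draws of S_n
  x <- matrix(rexp(n * reps, rate = rate), nrow = n)
  s_n <- colSums(x)
  mu <- 1 / rate
  sigma <- 1 / rate
  z <- (s_n - n * mu) / (sqrt(n) * sigma)  # standardized sums
  mean(z <= t)  # empirical estimate of P((S_n - n mu) / (sqrt(n) sigma) <= t)
}

set.seed(1)
for (n in c(2, 10, 50)) {
  cat("n =", n, "  empirical:", clt_check(n, reps = 10000, t = 0),
      "  Normal:", pnorm(0), "\n")
}
```

At t = 0 the empirical probability should move toward 0.5 as n grows. The exact printed values depend on the random seed.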


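Approximate distribution of $S_n$ and $\overline{X}_n$

For “large” n: $S_n \overset{\text{approx}}{\sim} \mathcal{N}(n\mu, n\sigma^2)$

The sample mean $\overline{X}_n = S_n/n$ is a random variable with mean µ and variance σ²/n

For “large” n: $\overline{X}_n \overset{\text{approx}}{\sim} \mathcal{N}(\mu, \sigma^2/n)$

How large should n be?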

Comparison, for different n, between the exact distribution of $\overline{X}_n$ and its approximation via the CLT

Two cases where the exact distribution of $\overline{X}_n$ is known:

• exponential distribution

• Bernoulli distribution


Example 1

$X_1, \dots, X_n$ i.i.d. with exponential distribution with parameter λ > 0, written as $X_1 \sim \mathcal{E}(\lambda)$. The mean of $X_1$ is $1/\lambda$ and its variance is $1/\lambda^2$

Plot of the density function of $X_1$, $f(x) = \lambda e^{-\lambda x}$ for $x > 0$, for λ = 1, 5, 10 (very asymmetric)

[Figure: three panels titled "Exponential", showing the density of $X_1$ for λ = 1, 5, 10 on 0 ≤ x ≤ 2]

What about the distribution of $\overline{X}_n$?


Exact and approximate distributions of $\overline{X}_n$, $X_1 \sim \mathcal{E}(2)$

Probability density functions of $\overline{X}_n$

[Figure: exact and approximate densities of $\overline{X}_n$, panels n = 2, 5, 10, 20]

Cumulative distribution functions of $\overline{X}_n$

[Figure: exact and approximate cumulative distribution functions of $\overline{X}_n$, panels n = 2, 5, 10, 20]

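Exact: $\overline{X}_n \sim \frac{1}{n}\,\Gamma(n, \lambda)$ (Gamma with shape n and rate λ)

Approximate: $\overline{X}_n \overset{\text{approx}}{\sim} \mathcal{N}\left(\frac{1}{\lambda}, \frac{1}{n\lambda^2}\right)$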
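The two curves in the figure can also be compared numerically. This R sketch assumes λ = 2, as in the figure, and a finite evaluation grid. It computes the largest distance between two curves:

- the exact CDF of $\overline{X}_n$, which is a Gamma with shape n and rate nλ, since $\frac{1}{n}\Gamma(n,\lambda) = \Gamma(n, n\lambda)$;
- its Normal approximation.

The helper `max_cdf_gap_exp` is a hypothetical name.

```r
# Exact vs CLT-approximate CDF of the mean of n i.i.d. E(lambda) variables.
# Exact: Xbar_n ~ Gamma(shape = n, rate = n * lambda).
# Approximate: Normal with mean 1/lambda and variance 1/(n * lambda^2).
max_cdf_gap_exp <- function(n, lambda, grid = seq(0, 3 / lambda, length.out = 3001)) {
  exact  <- pgamma(grid, shape = n, rate = n * lambda)
  approx <- pnorm(grid, mean = 1 / lambda, sd = 1 / (lambda * sqrt(n)))
  max(abs(exact - approx))  # largest vertical distance on the grid
}

ns <- c(2, 5, 10, 20)
gaps <- sapply(ns, max_cdf_gap_exp, lambda = 2)
print(data.frame(n = ns, max_gap = round(gaps, 4)))
```

The gap should shrink as n grows. It decreases slowly because the exponential density is very asymmetric.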


Example 2

X1, . . . , Xn i.i.d. with Bernoulli distribution with parameter p

Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability 1 − p

X1 ∼ B(1, p) with mean p and variance p(1 − p)

Plot of the probability distribution of $X_1$ for p = 0.3

[Figure: probability mass function ("density function") of $X_1$, with mass at x = 0 and x = 1]

$S_n$ has the exact binomial distribution B(n, p) (number of ones in n independent trials)

$\overline{X}_n$ is sometimes denoted by $\hat{P}$ (number of ones in n independent trials divided by n)


Exact and approximate cumulative distribution functions of $\overline{X}_n$, $X_1 \sim B(1, 0.3)$

[Figure: exact (step) and approximate cumulative distribution functions of $\overline{X}_n$, panels n = 2, 5, 10, 30]

Exact: $\overline{X}_n \sim \frac{1}{n}\,B(n, p)$

Approximate: $\overline{X}_n \overset{\text{approx}}{\sim} \mathcal{N}\left(p, \frac{p(1-p)}{n}\right)$
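In the Bernoulli case the exact CDF of $\overline{X}_n$ is a step function with jumps at k/n. Between jumps it is constant while the Normal CDF increases, so the largest gap is reached at a jump point, either just before the jump or at it. This R sketch builds on that observation and uses p = 0.3, as in the figure. The helper `max_cdf_gap_bern` is a hypothetical name.

```r
# Exact vs CLT-approximate CDF of Xbar_n = S_n / n, with S_n ~ B(n, p).
# The sup distance is attained at a jump k/n (k = 0, ..., n), either at the
# jump (right value) or just before it (left limit).
max_cdf_gap_bern <- function(n, p) {
  k <- 0:n
  approx <- pnorm(k / n, mean = p, sd = sqrt(p * (1 - p) / n))
  right  <- abs(pbinom(k, n, p) - approx)      # exact CDF at k/n
  left   <- abs(pbinom(k - 1, n, p) - approx)  # left limit of exact CDF at k/n
  max(right, left)
}

ns <- c(2, 5, 10, 30)
gaps <- sapply(ns, max_cdf_gap_bern, p = 0.3)
print(data.frame(n = ns, max_gap = round(gaps, 4)))
```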

5. Test on the proportion p of a binary event

Example

It is known that 40% of mice with eczema are free from symptoms within 4 weeks

A new drug is effective if more than 40% of mice are free from symptoms within 4 weeks

A sample is drawn from the population of mice that develop eczema symptoms; these mice are treated with the drug

Parameter of interest:

p proportion of mice free from symptoms in 4 weeks


The random variable X modeling the experiment has distribution X ∼ B(1, p)

with mean p and variance p(1 − p)

• Formulation of hypotheses:

H0: the drug has no effect, p = 0.40

To conclude that the drug is effective we must reject H0

Hypotheses after 4 weeks: H0: p = 0.40 versus H1: p > 0.40

• One-sided right test

• Sample size: n = 25

• Sample variables: X1, . . . , X25 i.i.d. X1 ∼ B(1, p)

• Test statistic: $\hat{P} = \overline{X}$, the proportion of ones in the sample (mice free of symptoms after four weeks)


Result of the experiment and decision

In the sample of 25 treated mice, 12 were free of symptoms within 4 weeks: $\hat{p} = 12/25 = 0.48$

Assume H0. Fix α = 5%. Use the CLT:

$$\hat{P} \overset{\text{approx}}{\sim} \mathcal{N}\left(0.40, \frac{0.40 \cdot 0.60}{25}\right)$$

• Rejection region of H0: R0 = (c, 1)

where c is the 0.95 quantile of N(0.40, 0.4 · 0.6/25)

> s = sqrt(0.4*0.6/25)
> c = qnorm(0.95,0.40,s); c
[1] 0.5611621

R0 = (0.56, 1)

• p-value of 0.48: $P(\hat{P} > 0.48 \mid p = 0.40)$

> 1-pnorm(0.48,0.40,s)
[1] 0.2071081

p-value(0.48) = 0.21

There is no experimental evidence to reject H0 at the 5% level: 0.48 is not in R0, or equivalently the p-value 0.21 is larger than α = 0.05
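The steps above (critical value, rejection region and p-value) can be collected into a single R function. This sketch assumes the one-sided right alternative and the CLT approximation of $\hat{P}$ under H0. The name `prop_z_test_right` is hypothetical; it is not an R built-in.

```r
# One-sided (right) asymptotic test for a proportion, H0: p = p0 vs H1: p > p0,
# using the CLT approximation Phat ~ N(p0, p0 (1 - p0) / n) under H0.
prop_z_test_right <- function(successes, n, p0, alpha = 0.05) {
  p_hat <- successes / n
  s <- sqrt(p0 * (1 - p0) / n)                   # sd of Phat under H0
  c_crit <- qnorm(1 - alpha, mean = p0, sd = s)  # rejection region R0 = (c_crit, 1)
  p_value <- pnorm(p_hat, mean = p0, sd = s, lower.tail = FALSE)
  list(p_hat = p_hat, critical_value = c_crit, p_value = p_value,
       reject_H0 = p_hat > c_crit)
}

# Mice example: 12 symptom-free mice out of 25, H0: p = 0.40, alpha = 5%
prop_z_test_right(12, 25, 0.40)
```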


Direct computation in R

> prop.test(12,25,0.40,alternative="greater",correct=FALSE)

1-sample proportions test without continuity correction

data: 12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
0.3258181 1.0000000
sample estimates:
p
0.48

The default alternative is "two.sided"

The default level is conf.level=0.95
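As a consistency check, this R sketch uses only base R's `prop.test`. It verifies two facts about the test without continuity correction:

- the reported X-squared is the square of the standardized statistic z = (p̂ − p0)/√(p0(1 − p0)/n);
- the one-sided p-value equals P(Z > z), as in the manual computation with `pnorm` above.

```r
# Compare prop.test (no continuity correction) with the CLT-based z statistic.
n <- 25
p_hat <- 12 / n
p0 <- 0.40
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standardized statistic under H0

out <- prop.test(12, n, p = p0, alternative = "greater", correct = FALSE)
unname(out$statistic)         # X-squared reported by prop.test
z^2                           # should coincide with X-squared
out$p.value                   # one-sided p-value from prop.test
pnorm(z, lower.tail = FALSE)  # P(Z > z), should coincide with it
```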


Exercise

Chicago land’s technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of the IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicago land area, 23.43% of them prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?
