
(1)

Inferential Statistics

Hypothesis tests on a population mean

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

(2)

Part C

Hypothesis tests

0. Review

1. Aside of probability. Normal random variable

2. Test on the mean of a Normal variable – known variance
   (running example: toxic algae)
   (a) Composite hypotheses
   (b) The power function
   (c) The p-value
   (d) Sample size, given α and β

3. Test on the mean of a Normal variable – unknown variance

4. Test on the mean of a variable with unknown distribution

5. Aside of probability. Central limit theorem

6. Test on the proportion of a dichotomous event

7. Test of the equality of two means (two-samples and paired samples)

(3)

0. Review

A hypothesis test is formed by:

- a null and an alternative hypothesis
- a test statistic (a function of the sample variables)
- a rejection region/decision rule

Significance level / critical value / test statistic: all contribute to the definition of the rejection rule.

Type I and II errors

Example: if R0 = {x such that T (x) > s} (slide 27) then:

α = P(T > s | H0)   reject H0 when it is true
β = P(T ≤ s | H1)   retain H0 when it is false

(4)

Formulation of the hypotheses and “form” of R0

Example. The quantity of tomatoes in a can is nominally set at 100 g.

Formulate a statistical hypothesis test as if you were:

1. the Federal Trade Commission
2. an unscrupulous shareholder
3. the worker in charge of the quality control of the canned-tomato filling machine.

Indicate to which hypotheses the following R0 correspond:

R0 = {x s.t. T(x) > s}
R0 = {x s.t. T(x) < s1} ∪ {x s.t. T(x) > s2}
R0 = {x s.t. T(x) < s}

In the case of a two-sided test, s1 and s2 are such that

α1 = P(T < s1 | H0)   α2 = P(T > s2 | H0)   with α1 + α2 = α

If the test statistic has a symmetric distribution, α1 and α2 are often set as α1 = α2 = α/2.

(5)

Consider Ex. 2 of Assignment 3. X ∼ B(20, p)

What happens when the hypotheses change?

1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05   one-sided right
2. H0 : p = 0.5   H1 : p = 0.3   α = 0.05   one-sided left

> p_0=0.3; p_1=0.5
> s05_right=qbinom(1-0.05,20,p_0); s05_right
[1] 9    ### same result if p_1 were 0.7: the threshold is computed under H0
> s05_left=qbinom(0.05,20,p_1)-1; s05_left   ### note the -1
[1] 5

One-sided right R0 = {x > 9}; one-sided left R0 = {x < 5}.

Notice that for x = 5, 6, 7, 8, 9 the two tests retain different hypotheses (due to the small sample size and to p0 being “close” to p1).

What happens when α changes?

1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05
3. H0 : p = 0.3   H1 : p = 0.5   α = 0.01

> s01_right=qbinom(1-0.01,20,p_0); s01_right
[1] 11

R0 with α = 0.01 is smaller than R0 with α = 0.05
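Shrinking R0 lowers α but raises the probability β of a type II error. A minimal check in R at the alternative p1 = 0.5 of case 1, using the thresholds obtained above:

> pbinom(9,20,0.5)    ## beta when alpha = 0.05, R0 = {x > 9}
[1] 0.4119015
> pbinom(11,20,0.5)   ## beta when alpha = 0.01, R0 = {x > 11}
[1] 0.7482777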

(6)

1. Aside of Probability. Normal random variable

Probability density functions and cumulative distribution functions

[Plots: probability density functions and cumulative distribution functions of Normal variables, first with µ = −1, 0, 5 and σ = 1, then with µ = 0 and σ = 0.5, 1, 3.]
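A minimal R sketch reproducing these plots (parameter values as in the slide):

## densities, mu = -1, 0, 5 with sigma = 1
curve(dnorm(x,-1,1), from=-4, to=8, ylim=c(0,0.5), ylab="density", col="red")
curve(dnorm(x, 0,1), add=TRUE, col="blue")
curve(dnorm(x, 5,1), add=TRUE, col="darkgreen")
## cumulative distribution functions, mu = 0 with sigma = 0.5, 1, 3
curve(pnorm(x,0,0.5), from=-5, to=5, ylim=c(0,1), ylab="cdf", col="red")
curve(pnorm(x,0,1),   add=TRUE, col="blue")
curve(pnorm(x,0,3),   add=TRUE, col="darkgreen")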

(7)

Some properties of normal random variable

• For X ∼ N (µ, σ²), let Y = aX + b.
  Then Y ∼ N (aµ + b, a²σ²). In particular

  Z = (X − µ)/σ ∼ N (0, 1)

  Z is called the standard normal variable.

• The sum of normal variables is a normal variable.
  In particular, for X1, . . . , Xn independent and identically distributed (i.i.d.) with Xi ∼ N (µ, σ²) for all i = 1, . . . , n,

  X̄ ∼ N (µ, σ²/n)

• The density function of X ∼ N (µ, σ²) is:

  f(x) = (1/√(2πσ²)) exp( −(x − µ)² / (2σ²) )
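A quick sanity check of the formula in R (arbitrary illustrative values for µ, σ and x):

mu=2; sigma=3; x=1.5
1/sqrt(2*pi*sigma^2)*exp(-(x-mu)^2/(2*sigma^2)) - dnorm(x,mu,sigma)  ## 0: matches dnorm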



(8)

2. Test on the mean of a normal random variable with known variance

2. (a) Composite hypotheses

Running example: X models the concentration of toxic algae blooms

The statistical model is: X ∼ N (µ, σ²) with σ² known

and experts set a bathing alert if µ > 10000 cells/liter. Thus

H0 : µ ≥ 10000     H1 : µ < 10000

The significance level of the test is set at α = 5%: if we reject H0 and allow swimming, the probability of doing so when the water is in fact toxic (with possible side effects due to the algae) is at most 5%.

(9)

Test statistic: X̄

Sample size: n = 10     σ = 2100 cells/liter     Set α = 5%.

The test hypotheses are both composite.

First we consider the two cases
1. H0 : µ = 10000   H1 : µ = 8500
2. H0 : µ = 10000   H1 : µ < 10000
and next
3. H0 : µ ≥ 10000   H1 : µ < 10000

[Plot: density of X̄ under µ = 10000.]

(10)

Case 1.

Assume H0: X̄ ∼ N (10000, 2100²/10)

R0 = {x̄ < s}, where s is such that P(X̄ < s | µ = 10000) = α. Get s = 8908 with R:

mu0=10000; std=2100/sqrt(10)
c1=qnorm(.05,mu0,std); c1

If the test statistic value on the sample is less than 8908, then we reject H0. Spot the probability of type I error in the plot.

If x̄ is larger than 8908, then the probability of type II error is

β = P(X̄ > s | µ = 8500)

Get β = 27% with R:

mu1=8500; 1-pnorm(c1,mu1,std)

We retain H0 with a large probability of type II error.

[Plots: sampling densities of X̄ under µ = 10000 and µ = 8500, with the “reject H0” and “retain H0” regions marked.]

(11)

Case 2.

H0 : µ = 10000   H1 : µ < 10000

R0 does not change, as it is computed under H0.

The probability β of type II error becomes a function of µ, as it is computed under H1, namely for µ < 10000.

[Plot: density of X̄ under H0, with the “reject H0” and “retain H0” regions marked.]

Case 3.

H0 : µ ≥ 10000   H1 : µ < 10000

Keeping the same R0, the probability of type I error, denoted by α(µ), satisfies

α(µ) ≤ α   (with equality at µ = 10000)

The probability β of type II error is a function of µ under H1, as in case 2.

[Plot: as above, for µ ≥ 10000.]

(12)

2. (b) The power function P(θ) of a test

Consider a generic test on a parameter θ:

H0 : θ ∈ Θ0     H1 : θ ∈ Θ1

Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Then Θ0 = [10000, +∞), Θ1 = (−∞, 10000)

The power function of the test is the probability of rejecting H0, as a function of the parameter θ:

P(θ) = P(T ∈ R0 | θ) =
    1 − β(θ)   if θ ∈ Θ1   (correct decision)
    α(θ)       if θ ∈ Θ0   (type I error)

Note that α(θ) ≤ α.

For a one-sided right rejection region:
1. the critical value is the smallest s s.t. P(T > s | H0) < α
2. the critical region is the largest R0 s.t. P(T ∈ R0 | H0) < α

(13)

Power function:

P(θ) = P(T ∈ R0 | θ) =
    1 − β(θ)   if θ ∈ Θ1   (correct decision)
    α(θ)       if θ ∈ Θ0   (type I error)

Running example:

R0 = {X̄ < 8908}
P(µ) = P(X̄ < 8908 | µ),  µ ∈ R
P(10000) = 0.05 = α

[Plot: the power curve P(µ), decreasing from 1 to 0, with P(10000) = α and 1 − β(8500) marked.]

std=2100/sqrt(10); c1=qnorm(.05,10000,std)
mu=seq(7000,11000)
p=pnorm(c1,mu,std)
plot(mu,p,type="l",lwd=3,col="red")

“Tests with large power are preferable”

(14)

Power and sample size

The probability of rejecting H0, when it is false, grows as the sample size grows.

Running example

H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄n < 8908}
P(µ) = P(X̄n < 8908 | µ),  µ ∈ R

[Plot: power curves for n = 10 (red) and n = 20 (blue), with α and the values µ = 10000 and µ = 9500 marked.]

If the values of the parameter under H0 and under H1 are “close” (e.g. 10000 and 9500 respectively), only with a large sample is the probability of a correct decision large.
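A sketch in R of the two power curves; here the threshold is recomputed for each n, so that P(10000) = α for both curves:

mu0=10000; sigma=2100; mu=seq(7000,11000)
for (n in c(10,20)) {
  std=sigma/sqrt(n)
  c1=qnorm(.05,mu0,std)   ## level-0.05 threshold for this n
  p=pnorm(c1,mu,std)      ## power function P(mu)
  if (n==10) plot(mu,p,type="l",lwd=3,col="red",ylab="power")
  else lines(mu,p,lwd=3,col="blue")
}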

(15)

The power for one-sided and two-sided tests

One-sided: H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄ < 8908}     P(µ) = P(X̄ < 8908 | µ),  µ ∈ R

Two-sided: H0 : µ = 10000 and H1 : µ ≠ 10000
R0 = {X̄ < 8700} ∪ {X̄ > 11300}
P(µ) = P(X̄ < 8700 | µ) + P(X̄ > 11300 | µ),  µ ∈ R

(16)

One-sided (red):   H0 : µ ≥ 10000   H1 : µ < 10000
Two-sided (blue):  H0 : µ = 10000   H1 : µ ≠ 10000

[Plot: the two power curves, with α and µ = 10000 marked on the axes.]

mu=seq(7000,13000); mu0=10000; std=2100/sqrt(10)
c1_u=qnorm(.05,mu0,std); p=pnorm(c1_u,mu,std)
c1_b=qnorm(.025,mu0,std); c2_b=qnorm(.975,mu0,std)
p_b=pnorm(c1_b,mu,std)+1-pnorm(c2_b,mu,std)
limx=c(7000,13000); limy=c(0,1)
plot(mu,p,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="red",xlim=limx,ylim=limy)
par(new=T)
plot(mu,p_b,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="blue",xlim=limx,ylim=limy)
axis(2, at=0, 0, las=1, cex=1.2, col="blue"); axis(2, at=1, 1, las=1, cex=1.2, col="blue")
axis(2, at=0.05, expression(alpha), las=1, cex=1.2, col="blue")
axis(1, at=mu0, mu0, las=1, cex=1.2, col="blue")
abline(h=c(0,1)); abline(h=0.05, v=mu0, lty=3, lwd=2)

(17)

2. (c) The p-value

Recall

A statistical hypothesis test on a parameter θ is given by

H0 : θ ∈ Θ0     H1 : θ ∈ Θ1     T ∼ F     R0

The power function is defined as

P(θ) = P(T ∈ R0 | θ) =
    α(θ)       if θ ∈ Θ0   (type I error: reject H0 when it is true)
    1 − β(θ)   if θ ∈ Θ1   (1 − type II error: reject H0 when it is false)

The size of the test is α = sup_{θ∈Θ0} P(θ), and it holds that α ≥ α(θ) for all θ ∈ Θ0.

A test is said to be of level α if its size is less than or equal to α.

(18)

p-value

We have a family of statistical hypothesis tests

H0 : θ ∈ Θ0     H1 : θ ∈ Θ1     T ∼ F     tobs from a sample x

and for all α ∈ (0, 1) we have R0^α (a rejection region for a size-α test).

The p-value, or observed significance level, is the smallest level of significance at which H0 would be rejected, namely

p-value = inf{α ∈ (0, 1) : tobs ∈ R0^α}

(19)

The most common situation

Consider the family of tests with R0^α = {T ≥ s_α}, where s_α is computed from

P(T ≥ s_α | H0) < α

Note that α′ < α″ if and only if s_α′ > s_α″.

[Plot: a density of T with two thresholds s_α′ > s_α″ and the observed value tobs marked.]

p(tobs) = sup_{θ∈Θ0} P(T > tobs | θ)

(20)

From the book by Wasserman, p. 157

Informally, the p-value is a measure of the evidence against H0: the smaller the p-value, the stronger the evidence against H0

Typically, researchers use the following evidence scale:

p-value        evidence
< 0.01         very strong evidence against H0
0.01 – 0.05    strong evidence against H0
0.05 – 0.10    weak evidence against H0
> 0.1          little or no evidence against H0

Warning! A large p-value is not strong evidence in favor of H0. A large p-value can occur for two reasons: (i) H0 is true or (ii) H0 is false but the test has low power.

Warning! Do not confuse the p-value with P (H0|Data). The p-value is not the probability that the null hypothesis is true.

“evidence” ⇐⇒ “statistically significant”

The p-value depends on the sample size. If the sample is large, even a small difference can be “evidence”, that is, hard to explain by chance variability.

(21)

How to compute the p-value?

The p-value (as the rejection region) depends on the “form” of the alternative hypothesis

In practice, the p-value is the level the test would have if the threshold of R0 were set at the observed sample value.

Consider H0 : µ = µ0

Running example: µ0 = 10000, and suppose x̄ = 9000.

H1 : µ < µ0     p(9000) = 0.066     pnorm(9000,mu0,std)
H1 : µ > µ0     p(9000) = 0.934     1-pnorm(9000,mu0,std)
H1 : µ ≠ µ0     p(9000) = 0.132     2*pnorm(9000,mu0,std)   [notice: 2*pnorm]

[Plots: the three shaded tail areas under the density centered at 10000 — below 9000, above 9000, and outside (9000, 11000).]

(22)

Reference

Regina Nuzzo (2014). Statistical Errors – P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, vol. 506, pp. 150–152.

(23)

2. (d) Sample size n, given α and β, for a one-sided test

H0 : µ = µ0   H1 : µ = µ1 (with µ1 < µ0)   ⇒   R0 = (−∞, s)

Let Z ∼ N (0, 1) and let z_α be the α-th quantile of Z.

α = P(X̄ < s | µ = µ0) = P( (X̄ − µ0)/(σ/√n) < (s − µ0)/(σ/√n) ) = P(Z < z_α)

β = P(X̄ > s | µ = µ1) = P( (X̄ − µ1)/(σ/√n) > (s − µ1)/(σ/√n) ) = P(Z > z_{1−β})

From (s − µ0)/(σ/√n) = z_α and (s − µ1)/(σ/√n) = z_{1−β} = −z_β we have:

n = (z_α + z_β)² σ² / (µ0 − µ1)²

The result is the same if µ1 > µ0.

The sample size n increases when:

- the distance between µ0 and µ1 decreases
- the variance increases
- |z_α| and |z_β| increase, equivalently α and β decrease
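For instance, in R, with the running-example values of slide 9 and an assumed target β = 0.10 (not fixed in the slides):

mu0=10000; mu1=8500; sigma=2100
alpha=0.05; beta=0.10
n=(qnorm(alpha)+qnorm(beta))^2*sigma^2/(mu0-mu1)^2; n
## approximately 16.8, so take n = 17
ceiling(n)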

(24)

3. Test on the mean of a Normal variable with unknown variance – t test

Let X ∼ N (µ, σ²) with unknown µ and σ².

The estimator of µ remains X̄. The unbiased estimator of σ² is

S² = (1/(n − 1)) Σ_{i=1}^{n} (Xi − X̄)²

A small aside of probability

The random variable T has a Student’s t distribution with n − 1 degrees of freedom:

T = (X̄ − µ)/(S/√n) ∼ t[n−1]

The density function and the cumulative distribution function of a t[n] r.v. are close to those of the standard normal r.v. N (0, 1).

[Plot: dashed lines t[2] and t[5], solid line N (0, 1).]
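A minimal sketch of this comparison in R:

curve(dnorm(x), from=-3, to=3, lwd=2, ylab="density")  ## N(0,1), solid
curve(dt(x,2), add=TRUE, lty=2, col="red")             ## t[2], dashed
curve(dt(x,5), add=TRUE, lty=2, col="blue")            ## t[5], dashed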

(25)

Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Suppose σ is unknown and estimated by 2000.

The form of the rejection region is the same as when σ is known: R0 = {X̄ < s}.

The threshold s is such that

P(X̄ < s | µ = 10000) = P( (X̄ − 10000)/(2000/√10) < (s − 10000)/(2000/√10) ) = P(T < t_α) = α = 0.05

where t_α is the α-th quantile of a t[9] random variable. Then

(s − 10000)/(2000/√10) = t_α   and   s = 10000 + t_α · 2000/√10

We obtain s = 8841 from R:

> c=qt(.05,9)
> s=10000+c*2000/sqrt(10); s

(26)

4. The Wald test for the mean of a random variable with unknown distribution

Consider a test on the mean µ of a random variable X with unknown distribution and assume a large sample size. Then an asymptotic test can be performed.

The pivot used for the test statistic is

(X̄ − µ)/(S/√n)

where S² is defined as in slide 22. Its distribution can be approximated by a standard normal distribution:

(X̄ − µ)/(S/√n) ∼approx N (0, 1)
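A minimal sketch of the Wald test in R; the sample x and the null value mu0 = 50 are hypothetical stand-ins:

set.seed(1)
x=rexp(100,rate=1/50)   ## hypothetical large sample from a non-normal distribution
mu0=50                  ## H0: mu = mu0   H1: mu != mu0
w=(mean(x)-mu0)/(sd(x)/sqrt(length(x)))  ## Wald statistic, approx N(0,1) under H0
2*(1-pnorm(abs(w)))     ## two-sided p-value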

(27)

5. Aside of probability. Approximate distribution of the sum Sn and the mean X̄n of independent and identically distributed (i.i.d.) random variables

Let X1, . . . , Xn be i.i.d. random variables with mean µ and variance σ².

Let Sn be the random variable Sn = Σ_{i=1}^{n} Xi.

Sn has mean nµ and variance nσ². Then (Sn − nµ)/(√n σ) has mean 0 and variance 1.

Central limit theorem (CLT)

lim_{n→∞} P( (Sn − nµ)/(√n σ) ≤ t ) = P(Z ≤ t)   for all t ∈ R, with Z ∼ N (0, 1)

(28)

Approximate distribution of Sn and X̄n

For “large” n:   Sn ∼approx N (nµ, nσ²)

The sample-mean random variable X̄n = Sn/n has mean µ and variance σ²/n.

For “large” n:   X̄n ∼approx N (µ, σ²/n)

How large should n be?

Comparison between the exact distribution and the approximate distribution via CLT of X̄n, for different n, in two cases where the exact distribution of X̄n is known:

• exponential distribution
• Bernoulli distribution

(29)

Case 1

X1, . . . , Xn i.i.d. with exponential distribution with parameter λ: X1 ∼ E(λ)

Plot of the density function of X1, f(x) = λe^{−λx} for x > 0, for λ = 1, 5, 10 (very “asymmetrical”).

[Plot: exponential densities for λ = 1, 5, 10.]

What about the distribution of X̄n?

(30)

Exact and approximate distributions of X̄n, X1 ∼ E(2)

[Plots: probability density functions and cumulative distribution functions of X̄n for n = 2, 5, 10, 20, exact versus normal approximation.]
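A sketch of this comparison in R: for X1 ∼ E(λ) the sum Sn has a Gamma(n, rate = λ) distribution, hence X̄n ∼ Gamma(n, rate = nλ) exactly.

lambda=2
for (n in c(2,5,10,20)) {
  curve(dgamma(x, shape=n, rate=n*lambda), from=0, to=1, lwd=2,
        main=paste("n =",n), ylab="density")       ## exact density of the mean
  curve(dnorm(x, 1/lambda, 1/(lambda*sqrt(n))),
        add=TRUE, lty=2, col="red")                ## CLT approximation
}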

(31)

Case 2

X1, . . . , Xn i.i.d. with Bernoulli distribution with parameter p.

Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability 1 − p:

X1 ∼ B(1, p) with mean p and variance p(1 − p).

[Plot: probability mass function of X1 for p = 0.3.]

Sn has exact binomial distribution B(n, p) (the number of ones in n independent trials).

X̄n is sometimes denoted by P̂ (the proportion of ones in n independent trials).

(32)

Exact and approximate cumulative distribution functions of X̄n, X1 ∼ B(1, 0.3)

[Plots: exact and approximate cdfs of X̄n for n = 2, 5, 10, 30.]
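A sketch of this comparison in R; the exact cdf of X̄n follows from P(X̄n ≤ x) = P(Sn ≤ nx) with Sn ∼ B(n, p):

p=0.3
for (n in c(2,5,10,30)) {
  x=seq(0,1,length.out=500)
  plot(x, pbinom(floor(n*x),n,p), type="s", lwd=2,
       main=paste("n =",n), ylab="cdf")            ## exact cdf of the mean
  curve(pnorm(x, p, sqrt(p*(1-p)/n)),
        add=TRUE, lty=2, col="red")                ## CLT approximation
}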

(33)

Example

How accurate are radon detectors of a type sold to homeowners?

University researchers placed 12 detectors in a chamber that exposed them to 105 pico-curies per liter (pCi/l) of radon.

The detector readings were as follows:

91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8, 101.7

Is there convincing evidence that the mean reading of all detectors of this type differs from the nominal value of 105?

(34)

Computation in R

> p=c(91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7)-105
> t.test(p)

        One Sample t-test

data:  p
t = -0.31947, df = 11, p-value = 0.7554
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -6.837503  5.104170
sample estimates:
mean of x
-0.8666667

Note: for t.test the data should be centered at the value specified by H0 (here 105).
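Equivalently, the null value can be passed to t.test directly through its mu argument (same test, no centering needed):

> t.test(c(91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7), mu=105)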

(35)

6. Test on the proportion p of a binary event

Example.

It is known that 40% of the mice with eczema are free from symptoms in 4 weeks.

We consider a new drug. It is effective if more than 40% of the mice are free from symptoms in 4 weeks.

From the population of mice that have just shown eczema symptoms a sample is drawn; these mice are treated with the drug.

Parameter of interest:

p = proportion of mice free from symptoms in 4 weeks

(36)

The random variable X modeling the experiments has distribution X ∼ B(1, p), with mean p and variance p(1 − p).

• Formulation of hypotheses:
  H0: the drug has no effect, p = 0.40
  To state that the drug is effective we should reject H0.
  After 4 weeks: H0: p = 0.40 and H1: p > 0.40

• One-sided right test

• Sample size: n = 25

• Sample variables: X1, . . . , X25 i.i.d., X1 ∼ B(1, p)

• Test statistic: P̂ = X̄, the proportion of ones in the sample

(37)

Result of the experiment and decision

In the sample of 25 treated mice, 12 were free of symptoms in 4 weeks: p̂ = 0.48

Assume H0. Fix α = 5%. Use the CLT:

P̂ ∼approx N (0.40, 0.40 · 0.60/25)

• Rejection region of H0: R0 = (c, 1),
  where c is the 0.95-th quantile of a N (0.40, 0.4 · 0.6/25)

> s=sqrt(0.4*0.6/25)
> qnorm(0.95,0.40,s)
[1] 0.5611621

R0 = (0.56, 1)

• p-value of 0.48: P(P̂ > 0.48 | p = 0.40)

> 1-pnorm(0.48,0.40,s)
[1] 0.2071081

p-value(0.48) = 0.21

There is no experimental evidence to reject H0

(38)

Direct computation in R

> prop.test(12,25,0.40,alternative="greater",correct=F)

        1-sample proportions test without continuity correction

data:  12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
 0.3258181 1.0000000
sample estimates:
   p
0.48

The default alternative is "two.sided"

The default level is conf.level=0.95

(39)

Exercise

Chicagoland’s technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?
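A possible starting point in R (a sketch: the count 176 is recovered from the reported 23.43% of 750, and a two-sided alternative is one natural reading of “check this claim”):

> round(0.2343*750)   ## observed number of Tribune readers
[1] 176
> prop.test(176, 750, p=0.25, alternative="two.sided", correct=F)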

(40)

7. Test for the equality of two means

A common application is to test whether a new process or treatment is superior to a current process or treatment.

The data may be either paired or unpaired.

a) Paired samples. There is a one-to-one correspondence between the values in the two samples: if X1, X2, . . . , Xn and Y1, Y2, . . . , Yn are the two sample variables, then Xi corresponds to Yi.

b) Unpaired samples. The sample sizes of the two samples may or may not be equal.

(41)

7. a) Paired samples

Let X and Y be two random variables modeling some characteristic of the same population.

Example. Drinking Water
(from https://onlinecourses.science.psu.edu, Penn State University)

Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard.

Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.

> water
      bottom surface
 [1,]  0.430   0.415
 [2,]  0.266   0.238
 [3,]  0.567   0.390
 [4,]  0.531   0.410
 [5,]  0.707   0.605
 [6,]  0.716   0.609
 [7,]  0.651   0.632
 [8,]  0.589   0.523
 [9,]  0.469   0.411
[10,]  0.723   0.612

(42)

Assume X ∼ N (µX, σ²X) and Y ∼ N (µY, σ²Y).

Let (X1, Y1), . . . , (Xn, Yn) be the n paired sample variables.

Consider the sample random variables D1, . . . , Dn with Di = Xi − Yi.

The test statistic is the sample mean of D:

D̄ ∼ N (µD, σ²D/n)

with µD = µX − µY and σ²D = σ²X + σ²Y − 2 Cov(X, Y), usually unknown and estimated by the unbiased estimator S²D.

The test of the equality of the two means, e.g. H0 : µX = µY, becomes a Student’s t test on µD, e.g. H0 : µD = 0.

(43)

Example. Drinking Water (continued)

> D=water[,1]-water[,2]; D
 [1] 0.015 0.028 0.177 0.121 0.102 0.107 0.019 0.066 0.058 0.111

• Hypotheses: H0 : µD = 0 and H1 : µD ≠ 0
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample size: n = 10
• Sample variables: D1, . . . , D10 i.i.d., Di ∼ N (µD, σ²D) with σ²D estimated by S²D
• Test statistic: T = (D̄ − µD)/(S/√n); under H0: T = D̄/(S/√n) ∼ t9
• α = 0.05

The thresholds c1 and c2 of the rejection region are such that

0.025 = P(T < c1 | µD = 0)     0.025 = P(T > c2 | µD = 0)

Observe that, because of the symmetry w.r.t. 0 of the Student’s t density, c1 = −c2.

(44)

In the sample: d̄ = 0.0804, s = 0.052.

The sample value of the test statistic, under H0, is 4.86:

> d_m=mean(D); d_m; s=sd(D); s
[1] 0.0804
[1] 0.05227321
> t=d_m/(s/sqrt(10)); t
[1] 4.863813

The rejection region is R0 = (−∞, −2.262) ∪ (2.262, ∞). The p-value is 0.0009:

> c1=qt(0.025,9); c1
[1] -2.262157
> 2*(1-pt(t,9))
[1] 0.0008911155

The direct computation in R produces:

> t.test(water[,1],water[,2],paired=TRUE)

        Paired t-test

data:  water[, 1] and water[, 2]
t = 4.8638, df = 9, p-value = 0.0008911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.043006 0.117794
sample estimates:
mean of the differences
                 0.0804

There is experimental evidence to reject H0

(45)

7. b) Unpaired samples

Example. Prey of two species of spiders
(from https://onlinecourses.science.psu.edu, Penn State University)

The feeding habits of two species of net-casting spiders are studied. The species, the deinopis and menneus, coexist in eastern Australia. The following data were obtained on the size, in millimeters, of the prey of random samples of the two species.

The spiders were selected randomly, so we assume the measurements are independent.

> d
 [1] 12.9 10.2  7.4  7.0 10.5 11.9  7.1  9.9  4.4 11.3
> m
 [1] 10.2  6.9 10.9 11.0 10.1  5.3  7.5 10.3  9.2  8.8
> mean(d); mean(m)
[1] 10.26
[1] 9.02

[Plot: side-by-side boxplots of d and m.]

(46)

Normal distribution

Assume the sizes in the two populations (denoted by A and B) have normal distributions:

XA ∼ N (µA, σ²A)     XB ∼ N (µB, σ²B)

We want to test H0 : µA = µB and H1 : µA ≠ µB,

or equivalently: H0 : µA − µB = 0 and H1 : µA − µB ≠ 0.

Let nA and nB be the sizes of the two independent samples of XA and XB. In the example nA = nB = 10.

The two sample-mean random variables are:

X̄A ∼ N (µA, σ²A/nA)     X̄B ∼ N (µB, σ²B/nB)

Consider the random variable given by the difference of the two sample means. It has a normal distribution:

X̄A − X̄B ∼ N (µA − µB, σ²A/nA + σ²B/nB)

The original test becomes a test on the mean of one normal random variable.

(47)

1. The variances σ²A and σ²B are known

Fixed α, a usual normal test is carried out.

2. The variances σ²A and σ²B are unknown, assumed equal, and estimated by the unbiased estimators S²A and S²B

An unbiased estimator of the variance of the random variable X̄A − X̄B is:

S² = [ (nA − 1)S²A + (nB − 1)S²B ] / (nA + nB − 2) · (nA + nB)/(nA nB)     (pooled variance)

In particular, if nA = nB = n, then S² = (S²A + S²B)/n.

The test statistic is:

T = ( X̄A − X̄B − (µA − µB) ) / S   with T ∼ t_d,   d = nA + nB − 2

Fixed α, a usual Student’s t test is carried out.

3. The unknown variances σ²A and σ²B are not equal

A hypothesis test based on the t distribution, known as Welch’s t-test, can be used.

(48)

Example. Prey of two species of spiders (continued)

• Hypotheses: H0 : µD = µM and H1 : µD ≠ µM
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample sizes: nD = nM = 10
• First, assume σ²M = σ²D. Pooled variance estimator: S² = (S²D + S²M)/n, with d = 2n − 2
• Test statistic under H0: T = (X̄D − X̄M)/S ∼ t18
• α = 0.05

The thresholds c1 and c2 of the rejection region are such that

0.025 = P(T < c1 | µD = µM)     0.025 = P(T > c2 | µD = µM)

(49)

The sample means of the two groups are: x̄D = 10.26, x̄M = 9.02.

The sample difference of means is: x̄D − x̄M = 1.24.
The sample pooled variance is: s² = 0.99.

The sample value of the test statistic, under H0, is 1.25:

> diff_m=mean(d)-mean(m); diff_m
[1] 1.24
> s2=(sd(d)^2+sd(m)^2)/10; s2
[1] 0.9915556
> t=diff_m/sqrt(s2); t
[1] 1.245269

The rejection region is R0 = (−∞, −2.1) ∪ (2.1, ∞).
The p-value is 0.23:

> c1=qt(0.025,18); c1
[1] -2.100922
> 2*(1-pt(t,18))   ## note 2*( ): two-sided test
[1] 0.2290008

There is no experimental evidence to reject H0

(50)

Can we assume the two variances equal?

A specific test can be performed, based on the F (Fisher–Snedecor) distribution. Here we do not give the details. Computed in R:

> var.test(m, d, ratio = 1)

        F test to compare two variances

data:  m and d
F = 0.56936, num df = 9, denom df = 9, p-value = 0.4142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1414206 2.2922339
sample estimates:
ratio of variances
         0.5693585

We can assume σ²D = σ²M, although the ratio of the sample variances is 0.57. This apparent inconsistency is due to the small sample sizes.

Direct computation in R of the test

H0 : µD = µM and H1 : µD ≠ µM, assuming σ²D = σ²M:

> t.test(d,m,var.equal=T)

        Two Sample t-test

data:  d and m
t = 1.2453, df = 18, p-value = 0.229
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8520327  3.3320327
sample estimates:
mean of x mean of y
    10.26      9.02

(51)

If the equality of the variances is rejected, we use the Welch two-sample t-test. In this case the pooled variance s² and the degrees of freedom are computed in a different manner.

Computed in R:

> t.test(d,m)

        Welch Two Sample t-test

data:  d and m
t = 1.2453, df = 16.74, p-value = 0.2302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8633815  3.3433815
sample estimates:
mean of x mean of y
    10.26      9.02

The problem of making inference on means when variances are unequal is, in general, quite a difficult one. It is known as the Behrens–Fisher problem.

(G. Casella, R.L. Berger, Statistical Inference, 2nd ed., Duxbury, Ex. 8.42)

(52)

Notes and generalisations

• The Wald test. If the two random variables do not have normal distributions and the sample sizes are “large”, a Wald test can be performed.

• Threshold different from zero. In some applications, you may want to adopt a new process or treatment only if it exceeds the current one by some threshold. In this case, the difference between the two means is compared not with 0 but with the chosen threshold.

(53)

Open more graphical devices in R

boxplot(a)
dev.new()                 ## open a new graphical device
boxplot(b)
for (i in 1:2) dev.off()  ## close the two open devices
