Inferential Statistics
Hypothesis tests on a population mean
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part C
Hypothesis tests
0. Review
1. Aside of probability. Normal random variable
2. Test on the mean of a Normal variable – known variance
   (running example: toxic algae)
   (a) Composite hypotheses
   (b) The power function
   (c) The p-value
   (d) Sample size, given α and β
3. Test on the mean of a Normal variable – unknown variance
4. Test on the mean of a variable with unknown distribution
5. Aside of probability. Central limit theorem
6. Test on the proportion of a dichotomous event
7. Test of the equality of two means (two-samples and paired samples)
0. Review

A hypothesis test consists of
- a null hypothesis and an alternative hypothesis
- a test statistic (a function of the sample variables)
- a rejection region/decision rule

Significance level, critical value and test statistic all contribute to the definition of the rejection rule.

Type I and II errors
Example: if R0 = {x such that T(x) > s} (slide 27) then:
α = P(T > s | H0)   reject H0 when it is true
β = P(T ≤ s | H1)   retain H0 when it is false
Formulation of the hypotheses and “form” of R0
Example. The quantity of tomatoes in a can is nominally set at 100 g.
Formulate a statistical hypothesis test as if you were:
1. the Federal Trade Commission
2. an unscrupulous shareholder
3. the worker in charge of quality control of the tomato-can filling machine.
Indicate to which hypotheses the following R0 correspond:
R0 = {x s.t. T (x) > s}
R0 = {x s.t. T(x) < s1} ∪ {x s.t. T(x) > s2}
R0 = {x s.t. T(x) < s}

In the case of a two-sided test, s1 and s2 are such that
α1 = P(T < s1 | H0)   α2 = P(T > s2 | H0)   with α1 + α2 = α
If the test statistic has a symmetric distribution, α1 and α2 are often set as α1 = α2 = α/2.
Consider Ex. 2 of Assignment 3. X ∼ B(20, p)
What happens when the hypotheses change?
1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05   one-sided right
2. H0 : p = 0.5   H1 : p = 0.3   α = 0.05   one-sided left

> p_0=0.3; p_1=0.5
> s05_right=qbinom(1-0.05,20,p_0); s05_right   ### same result as p_1=0.7
[1] 9
> s05_left=qbinom(0.05,20,p_1)-1; s05_left    ### note the -1
[1] 5

One-sided right R0 = {x > 9}; one-sided left R0 = {x < 5}.
Notice that for x = 5, 6, 7, 8, 9 the decisions of the two tests differ (small sample size and p0 “close” to p1).
What happens when α changes?
1. H0 : p = 0.3   H1 : p = 0.5   α = 0.05
3. H0 : p = 0.3   H1 : p = 0.5   α = 0.01

> s01_right=qbinom(1-0.01,20,p_0); s01_right
[1] 11

R0 with α = 0.01 is smaller than R0 with α = 0.05.
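The trade-off between the two error types can be checked numerically. The sketch below assumes the same setup (H0: p = 0.3 vs H1: p = 0.5, n = 20) and computes the type II error β for both rejection regions:

```r
# Type II error beta = P(X <= s | p = 0.5) for the one-sided right
# rejection regions R0 = {x > s} obtained with alpha = 0.05 and alpha = 0.01
p_0 <- 0.3; p_1 <- 0.5
s05 <- qbinom(1 - 0.05, 20, p_0)   # threshold for alpha = 0.05: 9
s01 <- qbinom(1 - 0.01, 20, p_0)   # threshold for alpha = 0.01: 11
beta05 <- pbinom(s05, 20, p_1)     # P(X <= 9  | p = 0.5)
beta01 <- pbinom(s01, 20, p_1)     # P(X <= 11 | p = 0.5)
c(beta05, beta01)                  # beta grows as alpha shrinks
```

Shrinking α from 0.05 to 0.01 enlarges the acceptance region, so β necessarily grows.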
1. Aside of Probability. Normal random variable
Probability density functions and cumulative distribution functions
µ = −1, 0, 5 σ = 1
[Figures: probability density functions and cumulative distribution functions of N(µ, 1) for µ = −1, 0, 5]
µ = 0
σ = 0.5, 1, 3
[Figures: probability density functions and cumulative distribution functions of N(0, σ²) for σ = 0.5, 1, 3]
Some properties of normal random variable
• For X ∼ N(µ, σ²), let Y = aX + b.
  Then Y ∼ N(aµ + b, a²σ²). In particular
  Z = (X − µ)/σ ∼ N(0, 1)
  Z is called the standard normal variable.

• The sum of normal variables is a normal variable.
  In particular, for X1, . . . , Xn independent and identically distributed (i.i.d.) with Xi ∼ N(µ, σ²) for all i = 1, . . . , n,
  X̄ ∼ N(µ, σ²/n)

• The density function of X ∼ N(µ, σ²) is:
  f(x) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))
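The standardisation property can be verified numerically; a minimal sketch (the values µ = 5, σ = 2, x = 6.3 are arbitrary, chosen only for illustration):

```r
mu <- 5; sigma <- 2; x <- 6.3
lhs <- pnorm(x, mean = mu, sd = sigma)   # P(X <= x) for X ~ N(mu, sigma^2)
rhs <- pnorm((x - mu) / sigma)           # P(Z <= (x - mu)/sigma), Z ~ N(0, 1)
all.equal(lhs, rhs)                      # the two probabilities coincide
```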
2. Test on the mean of a normal random variable with known variance
2. (a) Composite hypotheses
Running example: X models the concentration of toxic algae blooms
The statistical model is: X ∼ N (µ, σ2) with σ2 known
and experts set a bathing alert if µ > 10000 cells/liter. Thus
H0 : µ ≥ 10000   H1 : µ < 10000
The significance level of the test is set at α = 5%. If we reject H0, we can swim with a 5% probability of side effects due to the algae.
Test statistic: X̄
Sample size: n = 10   σ = 2100 cells/liter   Set α = 5%.
The test hypotheses are both composite
First we consider the two cases
1. H0 : µ = 10000   H1 : µ = 8500
2. H0 : µ = 10000   H1 : µ < 10000
and next
3. H0 : µ ≥ 10000   H1 : µ < 10000
Case 1.
Assume H0: X̄ ∼ N(10000, 2100²/10)
R0 = {x̄ < s} where s is such that P(X̄ < s | µ = 10000) = α
Get s = 8908 with R:

mu0=10000; std=2100/sqrt(10)
c1=qnorm(.05,mu0,std); c1

If the value of the test statistic on the sample is less than 8908, then we reject H0. Spot the probability of type I error in the plot.
If x̄ is larger than 8908, then the probability of type II error is
β = P(s < X̄ | µ = 8500)
Get β = 27% with R:

mu1=8500; 1-pnorm(c1,mu1,std)

We retain H0 with a large probability of type II error.
[Figures: densities of X̄ under µ = 10000 and µ = 8500, with the “reject H0”/“retain H0” regions around the threshold]
Case 2.
H0 : µ = 10000   H1 : µ < 10000
R0 does not change, as it is computed under H0.
The probability β of type II error becomes a function of µ, as it is computed under H1, namely µ < 10000.
Case 3.
H0 : µ ≥ 10000   H1 : µ < 10000
Keeping the same R0, the probability of type I error, denoted by α(µ), satisfies
α(µ) ≤ α, with equality at µ = 10000.
The probability β of type II error is a function of µ under H1, as in case 2.
2. (b) The power function P(θ) of a test

Consider a generic test on a parameter θ:
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1
Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Then Θ0 = [10000, +∞), Θ1 = (−∞, 10000)

The power function of the test is the probability of rejecting H0, as a function of the parameter θ:

P(θ) = P(T ∈ R0 | θ) = { 1 − β(θ)  if θ ∈ Θ1   correct decision
                        { α(θ)      if θ ∈ Θ0   type I error

Note that α(θ) ≤ α.
1. the critical value is the smallest s s.t. P(T > s | H0) < α
2. the critical region is the largest R0 s.t. P(T ∈ R0 | H0) < α
Power function:
P(θ) = P(T ∈ R0 | θ) = { 1 − β(θ)  if θ ∈ Θ1   correct decision
                        { α(θ)      if θ ∈ Θ0   type I error

Running example:
R0 = {X̄ < 8908}
P(µ) = P(X̄ < 8908 | µ),  µ ∈ R
P(10000) = 0.05 = α
[Figure: power curve P(µ), with α marked at µ = 10000 and 1 − β(8500) at µ = 8500]

std=2100/sqrt(10); c1=qnorm(.05,10000,std)
mu=seq(7000,11000)
p=pnorm(c1,mu,std)
plot(mu,p,type="l",lwd=3,col="red")

“Tests with large power are preferable”
Power and sample size
The probability of rejecting H0, when it is false, grows as the sample size grows.

Running example
H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄n < s}, with s such that P(X̄n < s | µ = 10000) = 0.05 (s = 8908 for n = 10)
P(µ) = P(X̄n < s | µ),  µ ∈ R
[Figure: power curves for n = 10 (red) and n = 20 (blue), both equal to α at µ = 10000; the power at µ = 9500 is larger for n = 20]

If the values of the parameter under H0 and under H1 are “close” (e.g. 10000 and 9500 respectively), only with a large sample is the probability of a correct decision large.
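A quick numerical check of this effect, as a sketch using the running-example values µ0 = 10000, σ = 2100, α = 0.05 and the alternative µ1 = 9500 (the helper name power_at is ours):

```r
# Power of the one-sided test R0 = {Xbar_n < s} at mu = mu1, for sample size n
power_at <- function(n, mu1, mu0 = 10000, sigma = 2100, alpha = 0.05) {
  s <- qnorm(alpha, mu0, sigma / sqrt(n))  # threshold recomputed for each n
  pnorm(s, mu1, sigma / sqrt(n))           # P(reject H0 | mu = mu1)
}
sapply(c(10, 20, 100), power_at, mu1 = 9500)
```

The power at µ1 = 9500 is only about 0.19 for n = 10 and 0.28 for n = 20; roughly n = 100 is needed to push it above 0.75.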
The power for one-sided and two-sided tests

One-sided: H0 : µ ≥ 10000 and H1 : µ < 10000
R0 = {X̄ < 8908}   P(µ) = P(X̄ < 8908 | µ)
Two-sided: H0 : µ = 10000 and H1 : µ ≠ 10000
R0 = {X̄ < 8700} ∪ {X̄ > 11300}
P(µ) = P(X̄ < 8700 | µ) + P(X̄ > 11300 | µ)

One-sided (red): H0 : µ ≥ 10000   H1 : µ < 10000
Two-sided (blue): H0 : µ = 10000   H1 : µ ≠ 10000
[Figure: one-sided (red) and two-sided (blue) power curves, both equal to α at µ = 10000]

mu=seq(7000,13000); mu0=10000; std=2100/sqrt(10)
c1_u=qnorm(.05,mu0,std); p=pnorm(c1_u,mu,std)
c1_b=qnorm(.025,mu0,std); c2_b=qnorm(.975,mu0,std)
p_b=pnorm(c1_b,mu,std)+1-pnorm(c2_b,mu,std)
limx=c(7000,13000); limy=c(0,1)
plot(mu,p,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="red",xlim=limx,ylim=limy)
par(new=T)
plot(mu,p_b,type="l",lwd=3,xaxt="n",yaxt="n",xlab=" ",ylab=" ",col="blue",xlim=limx,ylim=limy)
axis(2, at=0, labels=0, las=1); axis(2, at=1, labels=1, las=1)
axis(2, at=0.05, labels=expression(alpha), las=1)
axis(1, at=mu0, labels=mu0, las=1)
abline(h=c(0,1)); abline(h=0.05, v=mu0, lty=3, lwd=2)
2. (c) The p-value

Recall. A statistical hypothesis test on a parameter θ is given by
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1   T ∼ F   R0
The power function is defined as
P(θ) = P(T ∈ R0 | θ) = { α(θ)      if θ ∈ Θ0   type I error: reject H0 when true
                        { 1 − β(θ)  if θ ∈ Θ1   1 − type II error: reject H0 when false
The size of the test is α = sup_{θ∈Θ0} P(θ), and it holds α ≥ α(θ) for all θ ∈ Θ0.
A test is said to be of level α if its size is less than or equal to α.
p-value

We have a family of statistical hypothesis tests
H0 : θ ∈ Θ0   H1 : θ ∈ Θ1   T ∼ F   tobs from a sample x
and for all α ∈ (0, 1) we have R0^α (a rejection region for a size-α test).

The p-value, or observed significance level, is the smallest level of significance at which H0 would be rejected, namely
p-value = inf{α ∈ (0, 1) : tobs ∈ R0^α}

The most common situation
Consider the family of tests with R0^α = {T ≥ sα}, where sα is computed from
P(T ≥ sα | H0) < α
Note that α′ < α″ if and only if sα′ > sα″.
[Figure: a density of T with thresholds sα″ > sα′ and the observed value tobs]
p(tobs) = sup_{θ∈Θ0} P(T ≥ tobs | θ)
From the book by Wasserman, p. 157:
Informally, the p-value is a measure of the evidence against H0: the smaller the p-value, the stronger the evidence against H0.
Typically, researchers use the following evidence scale:

p-value     evidence
< .01       very strong evidence against H0
.01 – .05   strong evidence against H0
.05 – .10   weak evidence against H0
> .1        little or no evidence against H0
Warning! A large p-value is not strong evidence in favor of H0. A large p-value can occur for two reasons: (i) H0 is true or (ii) H0 is false but the test has low power.
Warning! Do not confuse the p-value with P (H0|Data). The p-value is not the probability that the null hypothesis is true.
“evidence” ⇐⇒ “statistically significant”
The p-value depends on the sample size. If the sample is large, even a small difference can be “evidence”, that is, hard to explain by chance variability.
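The following sketch illustrates the point (assumptions: the running-example one-sided test H0: µ = 10000 vs H1: µ < 10000 with σ = 2100 known, and the same observed mean x̄ = 9800 regardless of n):

```r
# One-sided (left) p-value of the same observed mean, for growing n
pval <- function(n, xbar = 9800, mu0 = 10000, sigma = 2100)
  pnorm(xbar, mu0, sigma / sqrt(n))
sapply(c(10, 50, 500), pval)
# the same 200-cell difference is far from significant at n = 10,
# but significant at the 5% level at n = 500
```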
How to compute the p-value?
The p-value (like the rejection region) depends on the “form” of the alternative hypothesis.
In practice, the p-value is the level the test would have if the threshold of R0 were the observed sample value.
Consider H0 : µ = µ0
Running example: µ0 = 10000, and suppose x̄ = 9000.

H1 : µ < µ0   p(9000) = 0.066   pnorm(9000,mu0,std)
H1 : µ > µ0   p(9000) = 0.934   1-pnorm(9000,mu0,std)
H1 : µ ≠ µ0   p(9000) = 0.132   2*pnorm(9000,mu0,std)   [notice: 2*pnorm]
[Figures: the tail areas corresponding to x̄ = 9000 (and 11000 for the two-sided case) around µ0 = 10000]
Reference
Regina Nuzzo (2014). Statistical Errors – P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, vol. 506, pp. 150–152.
2. (d) Sample size n, given α and β, for a one-sided test

H0 : µ = µ0   H1 : µ = µ1 (with µ1 < µ0)   ⇒   R0 = (−∞, s)
Let Z ∼ N(0, 1) and zα be the α-th quantile of Z.

α = P(X̄ < s | µ = µ0) = P( (X̄ − µ0)/(σ/√n) < (s − µ0)/(σ/√n) ) = P(Z < zα)
β = P(X̄ > s | µ = µ1) = P( (X̄ − µ1)/(σ/√n) > (s − µ1)/(σ/√n) ) = P(Z > z1−β)

From (s − µ0)/(σ/√n) = zα and (s − µ1)/(σ/√n) = z1−β = −zβ we have:

n = (zα + zβ)² σ² / (µ0 − µ1)²

The result is the same if µ1 > µ0.
The sample size n increases when:
- the distance between µ0 and µ1 decreases
- the variance increases
- zα and zβ increase in absolute value, equivalently α and β decrease
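Plugging the running-example values into the formula, as a sketch (the choice β = 0.20 is ours, for illustration):

```r
alpha <- 0.05; beta <- 0.20
mu0 <- 10000; mu1 <- 8500; sigma <- 2100
z_a <- qnorm(alpha)   # lower alpha-quantile of N(0,1), negative
z_b <- qnorm(beta)    # lower beta-quantile of N(0,1), negative
n <- (z_a + z_b)^2 * sigma^2 / (mu0 - mu1)^2
ceiling(n)            # round up to the next integer sample size
```

With these values n ≈ 12.1, so 13 observations suffice.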
3. Test on the mean of a Normal variable with unknown variance – t test

Let X ∼ N(µ, σ²) with µ and σ² unknown.
The estimator of µ remains X̄. The unbiased estimator of σ² is
S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)²

A small aside of probability
The random variable T has a Student’s t distribution with n − 1 degrees of freedom:
T = (X̄ − µ)/(S/√n) ∼ t[n−1]

The density function and the cumulative distribution function of a t[n] r.v. are close to those of the standard normal r.v. N(0, 1).
[Figure: dashed lines: densities of t[2] and t[5] – solid line: N(0, 1)]
Running example. H0 : µ ≥ 10000   H1 : µ < 10000
Suppose σ unknown and estimated by 2000.
The form of the rejection region is the same as when σ is known: R0 = {X̄ < s}.
The threshold s is such that
P(X̄ < s | µ = 10000) = P( (X̄ − 10000)/(2000/√10) < (s − 10000)/(2000/√10) ) = P(T < tα) = α = 0.05
where tα is the α-th quantile of a t[9] random variable. Then
(s − 10000)/(2000/√10) = tα   and   s = 10000 + tα · 2000/√10
We obtain s = 8841 from R:
> c=qt(.05,9)
> s=10000+c*2000/sqrt(10); s
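For comparison, with the same estimated standard deviation a normal quantile would give a threshold closer to µ0; the t threshold is lower because it accounts for the extra uncertainty from estimating σ (a sketch with the values above):

```r
se <- 2000 / sqrt(10)
s_t <- 10000 + qt(0.05, df = 9) * se   # t-based threshold, about 8841
s_z <- qnorm(0.05, 10000, se)          # normal-based threshold, about 8960
s_t < s_z                              # TRUE: the t test rejects less easily
```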
4. The Wald test for the mean of a random variable with unknown distribution

Consider a test on the mean µ of a random variable X with unknown distribution, and assume a large sample size. Then an asymptotic test can be performed.
The pivot used for the test statistic is (X̄ − µ)/(S/√n), where S² is defined as in slide 22.
Its distribution can be approximated by a standard normal distribution:
(X̄ − µ)/(S/√n) ∼approx N(0, 1)
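A sketch of the resulting procedure (the data here are simulated for illustration; any large i.i.d. sample would do, and the null value µ0 = 1 is our choice):

```r
set.seed(1)
x <- rexp(200, rate = 1)                  # non-normal data, true mean 1
mu0 <- 1                                  # H0: mu = 1 vs H1: mu != 1
w <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))  # approx N(0,1) under H0
p_value <- 2 * (1 - pnorm(abs(w)))        # two-sided asymptotic p-value
```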
5. Aside of probability. Approximate distribution of the sum Sn and the mean X̄n of independent and identically distributed (i.i.d.) random variables

Let X1, . . . , Xn be i.i.d. random variables with mean µ and variance σ².
Let Sn be the random variable Sn = Σᵢ₌₁ⁿ Xᵢ.
Sn has mean nµ and variance nσ². Then (Sn − nµ)/(√n σ) has mean 0 and variance 1.

Central limit theorem (CLT)
lim_{n→∞} P( (Sn − nµ)/(√n σ) ≤ t ) = P(Z ≤ t), for all t ∈ R, with Z ∼ N(0, 1)
Approximate distribution of Sn and X̄n
For “large” n:   Sn ∼approx N(nµ, nσ²)
The sample mean random variable X̄n = Sn/n has mean µ and variance σ²/n.
For “large” n:   X̄n ∼approx N(µ, σ²/n)
How large should n be?
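A small simulation gives a feel for the approximation (a sketch, using X1 ∼ E(2) as in Case 1 below, so µ = σ = 0.5):

```r
set.seed(123)
n <- 30
xbar <- replicate(5000, mean(rexp(n, rate = 2)))  # 5000 simulated sample means
c(mean(xbar), sd(xbar))   # compare with mu = 0.5 and sigma/sqrt(n) ~ 0.091
mean(xbar <= 0.55)        # compare with pnorm(0.55, 0.5, 0.5/sqrt(30))
```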
Comparison between the exact distribution and the CLT approximation of X̄n for different n
Two cases, where the exact distribution of X̄n is known:
• exponential distribution
• Bernoulli distribution

Case 1
X1, . . . , Xn i.i.d. with exponential distribution with parameter λ: X1 ∼ E(λ)
Plot of the density function of X1, f(x) = λe^(−λx) for x > 0, for λ = 1, 5, 10 (very “asymmetrical”)
[Figure: exponential densities for λ = 1, 5, 10]
What about the distribution of X̄n?
Exact and approximate distributions of X̄n, X1 ∼ E(2)
[Figures: probability density functions and cumulative distribution functions of X̄n for n = 2, 5, 10, 20 – exact vs. normal approximation]
Case 2
X1, . . . , Xn i.i.d. with Bernoulli distribution with parameter p.
Model for an experiment with two outcomes: 1 (success) with probability p and 0 with probability 1 − p:
X1 ∼ B(1, p) with mean p and variance p(1 − p).
[Figure: probability function of X1 for p = 0.3]

Sn has exact binomial distribution B(n, p) (number of ones in n independent trials).
X̄n is sometimes denoted by P̂ (proportion of ones in n independent trials).

Exact and approximate cumulative distribution functions of X̄n, X1 ∼ B(1, 0.3)
[Figures: exact vs. approximate CDFs of X̄n for n = 2, 5, 10, 30]
Example
How accurate are radon detectors of a type sold to homeowners?
University researchers placed 12 detectors in a chamber that exposed them to 105 picocuries per liter (pCi/l) of radon.
The detector readings were as follows:
91.9, 97.8, 111.4, 122.3, 105.4, 95.0, 103.8, 99.6, 96.6, 119.3, 104.8, 101.7
Is there convincing evidence that the mean reading of all detectors of this type differs from the nominal value of 105?

Computation in R
> p=c(91.9,97.8,111.4,122.3,105.4,95.0,103.8,99.6,96.6,119.3,104.8,101.7)-105
> t.test(p)

        One Sample t-test
data:  p
t = -0.31947, df = 11, p-value = 0.7554
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -6.837503  5.104170
sample estimates:
mean of x
-0.8666667

Note: for this call of t.test the data are centered at the value stated by H0.
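Alternatively, t.test accepts the null value directly through its mu argument, so centering by hand is not needed (a sketch with the same readings):

```r
x <- c(91.9, 97.8, 111.4, 122.3, 105.4, 95.0,
       103.8, 99.6, 96.6, 119.3, 104.8, 101.7)
out <- t.test(x, mu = 105)   # H0: mu = 105, two-sided by default
out$p.value                  # same p-value as with the centered data
```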
6. Test on the proportion p of a binary event

Example.
It is known that 40% of mice with eczema are free from symptoms in 4 weeks.
We consider a new drug. It is effective if more than 40% of mice are free from symptoms in 4 weeks.
From the population of mice that have just shown eczema symptoms a sample is drawn; these mice are treated with the drug.

Parameter of interest:
p, the proportion of mice free from symptoms in 4 weeks
The random variable X modeling the experiment has distribution X ∼ B(1, p), with mean p and variance p(1 − p).

• Formulation of hypotheses:
  H0: the drug has no effect, p = 0.40
  To state that the drug is effective we should reject H0.
  After 4 weeks: H0: p = 0.40 and H1: p > 0.40
• One-sided right test
• Sample size: n = 25
• Sample variables: X1, . . . , X25 i.i.d., X1 ∼ B(1, p)
• Test statistic: P̂ = X̄, the proportion of ones in the sample
Result of the experiment and decision
In the sample of 25 treated mice, 12 were free of symptoms in 4 weeks: p̂ = 0.48.
Assume H0. Fix α = 5%. Use the CLT:
P̂ ∼approx N(0.40, 0.40 · 0.60/25)
• Rejection region of H0: R0 = (c, 1), where c is the 0.95-th quantile of N(0.40, 0.4 · 0.6/25)

> s=sqrt(0.4*0.6/25)
> qnorm(0.95,0.40,s)
[1] 0.5611621

R0 = (0.56, 1)
• p-value of 0.48: P(P̂ > 0.48 | p = 0.40)

> 1-pnorm(0.48,0.40,s)
[1] 0.2071081

p-value(0.48) = 0.21
There is no experimental evidence to reject H0.
Direct computation in R
> prop.test(12,25,0.40,alternative="greater",correct=F)

        1-sample proportions test without continuity correction
data:  12 out of 25, null probability 0.4
X-squared = 0.66667, df = 1, p-value = 0.2071
alternative hypothesis: true p is greater than 0.4
95 percent confidence interval:
 0.3258181 1.0000000
sample estimates:
   p
0.48

The default alternative is "two.sided"; the default level is conf.level=0.95.
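The X-squared value reported by prop.test (without continuity correction) is exactly the square of the normal z statistic used on the previous slide, and the one-sided p-value agrees (a quick check):

```r
phat <- 12/25; p0 <- 0.40; n <- 25
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)  # normal test statistic
z^2            # equals the X-squared statistic 0.66667
1 - pnorm(z)   # the one-sided p-value 0.2071
```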
Exercise
Chicagoland technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?
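One possible approach, as a sketch (we take 23.43% of 750 to correspond to about 176 professionals, and test H0: p = 0.25 against the two-sided alternative):

```r
# Two-sided test of the claimed proportion, without continuity correction
out <- prop.test(round(0.2343 * 750), 750, p = 0.25, correct = FALSE)
out$p.value   # well above 0.05: no evidence against the company's claim
```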
7. Test for the equality of two means
A common application is to test whether a new process or treatment is superior to a current one.
The data may be either paired or unpaired.
a) Paired samples: there is a one-to-one correspondence between the values in the two samples. That is, if X1, X2, . . . , Xn and Y1, Y2, . . . , Yn are the two sample variables, then Xi corresponds to Yi.
b) Unpaired samples: the sample sizes of the two samples may or may not be equal.
7. a) Paired samples
Let X and Y be two random variables modeling some characteristic of the same population.
Example. Drinking Water
(from https://onlinecourses.science.psu.edu, Penn State University)
Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard.
Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.

> water
      bottom surface
 [1,]  0.430   0.415
 [2,]  0.266   0.238
 [3,]  0.567   0.390
 [4,]  0.531   0.410
 [5,]  0.707   0.605
 [6,]  0.716   0.609
 [7,]  0.651   0.632
 [8,]  0.589   0.523
 [9,]  0.469   0.411
[10,]  0.723   0.612
Assume X ∼ N(µX, σ²X) and Y ∼ N(µY, σ²Y).
Let (X1, Y1), . . . , (Xn, Yn) be the n paired sample variables.
Consider the sample random variables D1, . . . , Dn with Di = Xi − Yi.
The test statistic is the sample mean of D:
D̄ ∼ N(µD, σ²D/n)
with µD = µX − µY and σ²D = σ²X + σ²Y − 2 Cov(X, Y), usually unknown and estimated by the unbiased estimator S²D.
The test of the equality of the two means, e.g. H0 : µX = µY, becomes a Student’s t test on µD, e.g. H0 : µD = 0.
Example. Drinking Water (continued)
> D=water[,1]-water[,2]; D
 [1] 0.015 0.028 0.177 0.121 0.102 0.107 0.019 0.066 0.058 0.111
• Hypotheses: H0 : µD = 0 and H1 : µD ≠ 0
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample size: n = 10
• Sample variables: D1, . . . , D10 i.i.d., Di ∼ N(µD, σ²D) with σ²D estimated by S²D
• Test statistic: T = (D̄ − µD)/(S/√n); under H0: T = D̄/(S/√n) ∼ t9
• α = 0.05
The thresholds c1 and c2 of the rejection region are such that
0.025 = P(T < c1 | µD = 0)   0.025 = P(T > c2 | µD = 0)
Observe that, because of the symmetry w.r.t. 0 of the Student’s t density, c1 = −c2.
In the sample: d̄ = 0.0804, s = 0.052.
The sample value of the test statistic, under H0, is 4.86.

> d_m=mean(D); d_m; s=sd(D); s
[1] 0.0804
[1] 0.05227321
> t=d_m/(s/sqrt(10)); t
[1] 4.863813

The rejection region is R0 = (−∞, −2.262) ∪ (2.262, ∞). The p-value is 0.0009.

> c1=qt(0.025,9); c1
[1] -2.262157
> 2*(1-pt(t,9))
[1] 0.0008911155
The direct computation in R produces:
> t.test(water[,1],water[,2],paired=TRUE)

        Paired t-test
data:  water[, 1] and water[, 2]
t = 4.8638, df = 9, p-value = 0.0008911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.043006 0.117794
sample estimates:
mean of the differences
                 0.0804
There is experimental evidence to reject H0
7. b) Unpaired samples

Example. Prey of two species of spiders
(from https://onlinecourses.science.psu.edu, Penn State University)
The feeding habits of two species of net-casting spiders were studied. The species, deinopis and menneus, coexist in eastern Australia. The following data were obtained on the size, in millimeters, of the prey of random samples of the two species.
The spiders were selected randomly, so we assume the measurements are independent.

> d
 [1] 12.9 10.2  7.4  7.0 10.5 11.9  7.1  9.9 14.4 11.3
> m
 [1] 10.2  6.9 10.9 11.0 10.1  5.3  7.5 10.3  9.2  8.8
> mean(d); mean(m)
[1] 10.26
[1] 9.02

[Figure: side-by-side boxplots of d and m]
Normal distribution
Assume the sizes in the two populations (denoted by A and B) have normal distributions:
XA ∼ N(µA, σ²A)   XB ∼ N(µB, σ²B)
We want to test H0 : µA = µB and H1 : µA ≠ µB,
or equivalently: H0 : µA − µB = 0 and H1 : µA − µB ≠ 0.
Let nA and nB be the sizes of the two independent samples of XA and XB. In the example nA = nB = 10.
The two sample mean random variables are:
X̄A ∼ N(µA, σ²A/nA)   X̄B ∼ N(µB, σ²B/nB)
Consider the random variable difference of the two sample means. It has normal distribution:
X̄A − X̄B ∼ N(µA − µB, σ²A/nA + σ²B/nB)
The original test becomes a test on the mean of one normal random variable.

1. The variances σ²A and σ²B are known.
Fixed α, a usual normal test is carried out.

2. The variances σ²A and σ²B are unknown, assumed equal, and estimated by the unbiased estimators S²A and S²B.
An unbiased estimator of the variance of the random variable X̄A − X̄B is the pooled variance:
S² = ((nA − 1)S²A + (nB − 1)S²B)/(nA + nB − 2) · (nA + nB)/(nA nB)
In particular, if nA = nB = n, then S² = (S²A + S²B)/n.
The test statistic is:
T = (X̄A − X̄B − (µA − µB))/S   with T ∼ t_d,  d = nA + nB − 2
Fixed α, a usual Student’s t test is carried out.
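The pooled-variance formula can be wrapped in a small helper (a sketch; pooled_var is our name, and d, m are the spider data above):

```r
# Unbiased estimator of Var(xbar - ybar) under equal population variances
pooled_var <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp2 <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)  # pooled s^2
  sp2 * (nx + ny) / (nx * ny)   # times 1/nx + 1/ny
}
d <- c(12.9, 10.2, 7.4, 7.0, 10.5, 11.9, 7.1, 9.9, 14.4, 11.3)
m <- c(10.2, 6.9, 10.9, 11.0, 10.1, 5.3, 7.5, 10.3, 9.2, 8.8)
pooled_var(d, m)   # matches (sd(d)^2 + sd(m)^2)/10 since nA = nB = 10
```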
3. The unknown variances σ²A and σ²B are not equal.
A hypothesis test based on the t distribution, known as Welch’s t-test, can be used.
Example. Prey of two species of spiders (continued)
• Hypotheses: H0 : µD = µM and H1 : µD ≠ µM
• Two-sided test: R0 = (−∞, c1) ∪ (c2, +∞)
• Sample sizes: nD = nM = 10
• First, assume σ²M = σ²D. Pooled variance estimator:
  S² = (S²D + S²M)/n and d = 2n − 2
• Test statistic under H0: T = (X̄D − X̄M)/S ∼ t18
• α = 0.05
The thresholds c1 and c2 of the rejection region are such that
0.025 = P(T < c1 | µD = µM)   0.025 = P(T > c2 | µD = µM)

The sample means of the two groups are: x̄D = 10.26, x̄M = 9.02.
The sample difference of means is: x̄D − x̄M = 1.24.
The sample pooled variance is: s² = 0.99.
The sample value of the test statistic, under H0, is 1.25.

> diff_m=mean(d)-mean(m); diff_m
[1] 1.24
> s2=(sd(d)^2+sd(m)^2)/10; s2
[1] 0.9915556
> t=diff_m/sqrt(s2); t
[1] 1.245269

The rejection region is R0 = (−∞, −2.10) ∪ (2.10, ∞).
The p-value is 0.23.

> c1=qt(0.025,18); c1
[1] -2.100922
> 2*(1-pt(t,18))   ## note 2*( ) -- two-sided test
[1] 0.2290008
There is no experimental evidence to reject H0
Can we assume the two variances equal?
A specific test can be performed (based on the Fisher distribution). Here we do not give the details. Computed in R:

> var.test(m, d, ratio = 1)

        F test to compare two variances
data:  m and d
F = 0.56936, num df = 9, denom df = 9, p-value = 0.4142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1414206 2.2922339
sample estimates:
ratio of variances
         0.5693585

We can assume σ²D = σ²M, although the sample ratio of variances is 0.57. This apparent inconsistency is due to the small sample sizes.
Direct computation in R of the test
H0 : µD = µM and H1 : µD ≠ µM, assuming σ²D = σ²M:

> t.test(d,m,var.equal=T)

        Two Sample t-test
data:  d and m
t = 1.2453, df = 18, p-value = 0.229
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8520327  3.3320327
sample estimates:
mean of x mean of y
    10.26      9.02

If the equality of the variances is rejected, we use the Welch two-sample t-test. In that case the pooled variance s² and the degrees of freedom are computed in a different way. In R:

> t.test(d,m)

        Welch Two Sample t-test
data:  d and m
t = 1.2453, df = 16.74, p-value = 0.2302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8633815  3.3433815
sample estimates:
mean of x mean of y
    10.26      9.02
The problem of making inference on means when variances are unequal is, in general, quite a difficult one. It is known as the Behrens–Fisher problem.
(G. Casella, R.L. Berger, Statistical Inference, 2nd ed., Duxbury, Ex. 8.42)
Notes and generalisations
• The Wald test. If the two random variables do not have normal distributions and the sample size is “large”, a Wald test can be performed.
• Threshold different from zero. In some applications, you may want to adopt a new process or treatment only if it exceeds the current treatment by some threshold. In this case the difference between the two means is compared not with 0 but with the chosen threshold.
Open more graphical devices in R
boxplot(a)                 ## first plot, in the current device
dev.new()                  ## open a new device
boxplot(b)                 ## second plot, drawn in the new device
for (i in 1:2) dev.off()   ## close the two open devices