(1)

Inferential Statistics

Multiple tests and other useful amenities

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

(2)

Review

Part D Test of the equality of two means (paired and unpaired samples)

Part E Multiple tests

Part F Confidence intervals

(3)

Review

Exercise. Chicago Tribune. Chicagoland's technology professionals get local technology news from various newspapers and magazines. A marketing company claims that 25% of IT professionals choose the Chicago Tribune as their primary source for local IT news. A survey was conducted last month to check this claim. Among a sample of 750 IT professionals in the Chicagoland area, 23.43% prefer the Chicago Tribune. Can we conclude that the claim of the marketing company is true?

The random variable modeling the preference for the Chicago Tribune is X ∼ B(1, p).

Test statistic: P̂ = X̄; sample value: p̂ = 0.2343.

Large sample size (n = 750): using the CLT, P̂ ∼ N(p, p(1−p)/n).

H0 : p = 0.25. Which H1? H1 : p ≠ 0.25 or H1 : p < 0.25?

(4)

p-value computation in R using prop.test.

> prop.test(np,750,0.25)

        1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.904, df = 1, p-value = 0.3417
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2047542 0.2666131
sample estimates:
     p
0.2343

> prop.test(np,750,0.25,"less")

        1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.904, df = 1, p-value = 0.1709
alternative hypothesis: true p is less than 0.25
95 percent confidence interval:
 0.0000000 0.2613561
sample estimates:
     p
0.2343

In both cases there is no evidence to reject H0 (p = 0.25).
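For reference, a minimal sketch of how the input to prop.test may have been prepared; the variable np is used but never defined in these slides, so the rounding below is an assumption:

np <- round(0.2343 * 750)   # observed number of successes (assumed: 23.43% of 750, rounded to 176)
prop.test(np, 750, p = 0.25)                        # two-sided test of H0: p = 0.25
prop.test(np, 750, p = 0.25, alternative = "less")  # one-sided test, H1: p < 0.25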

(5)

Part D. Test for the equality of two means

A common application is to test if a new process or treatment is superior to a current process or treatment.

The data may either be paired or unpaired.

a) Paired samples. There is a one-to-one correspondence between the values in the two samples: if X1, X2, . . . , Xn and Y1, Y2, . . . , Yn are the two sample variables, then Xi corresponds to Yi.

b) Unpaired samples. The sample sizes of the two samples may or may not be equal.

(6)

a) Paired samples

Let X and Y be two random variables modeling some characteristic of the same population.

Example. Drinking Water

(from https://onlinecourses.science.psu.edu, Penn State University)

Trace metals in drinking water affect the flavor, and an unusually high concentration can pose a health hazard.

Ten pairs of data were taken measuring zinc concentration in bottom water and surface water.

> water
       bottom surface
 [1,]   0.430   0.415
 [2,]   0.266   0.238
 [3,]   0.567   0.390
 [4,]   0.531   0.410
 [5,]   0.707   0.605
 [6,]   0.716   0.609
 [7,]   0.651   0.632
 [8,]   0.589   0.523
 [9,]   0.469   0.411
[10,]   0.723   0.612

(7)

Assume X ∼ N(µX, σX²) and Y ∼ N(µY, σY²).

The test hypotheses are:

H0 : µX = µY and H1 : µX ≠ µY

or equivalently: H0 : µX − µY = 0 and H1 : µX − µY ≠ 0. H1 could also be µX < µY or µX > µY, and H0 could possibly be composite.

Let (X1, Y1), . . . , (Xn, Yn) be the n paired sample variables.

Consider the sample random variables D1, . . . , Dn with Di = Xi − Yi.

Consider the sample mean of D:

D̄ ∼ N(µD, σD²/n)

with µD = µX − µY and σD² = σX² + σY² − 2 Cov(X, Y), usually unknown and estimated by the unbiased estimator SD².

The test of the equality of the two means becomes a Student’s t test on µD, with H0 : µD = 0.
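As a check in R, the paired two-sample call and the one-sample call on the differences are equivalent; using the water data above, the two lines below give identical t, df and p-value:

# Paired two-sample form and one-sample form on D = X - Y are the same test
t.test(water[, 1], water[, 2], paired = TRUE)
t.test(water[, 1] - water[, 2])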

(8)

Example. Drinking Water (continued)

> D=water[,1]-water[,2];D

[1] 0.015 0.028 0.177 0.121 0.102 0.107 0.019 0.066 0.058 0.111

• Hypotheses: H0 : µD = 0 and H1 : µD ≠ 0

• Two-sided: R0 = (−∞, c1) ∪ (c2, +∞)

• Sample size: n = 10

• Sample variables: D1, . . . , D10 i.i.d., Di ∼ N(0, σD²) under H0, with σD² estimated by SD²

• Test statistic under H0: T = D̄/(SD/√n) ∼ t9

• α = 0.05

The thresholds c1 and c2 of the rejection region are such that

0.025 = P(T < c1 | µD = 0) and 0.025 = P(T > c2 | µD = 0)

Observe that, because of the symmetry w.r.t. 0 of the Student's t density, c1 = −c2.

(9)

In the sample: d̄ = 0.0804, s = 0.052.

The sample value of the test statistic, under H0, is 4.86

> d_m=mean(D); d_m; s=sd(D); s
[1] 0.0804
[1] 0.05227321
> t=d_m/(s/sqrt(10)); t
[1] 4.863813

The rejection region is R0 = (−∞, −2.262) ∪ (2.262, +∞). The p-value is 0.0009.

> c1=qt(0.025,9); c1
[1] -2.262157
> 2*(1-pt(t,9))
[1] 0.0008911155

The direct computation in R produces:

> t.test(water[,1],water[,2],paired=TRUE)

        Paired t-test

data:  water[, 1] and water[, 2]
t = 4.8638, df = 9, p-value = 0.0008911
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.043006 0.117794
sample estimates:
mean of the differences
                 0.0804

There is experimental evidence to reject H0

(10)

b) Unpaired samples

Example. Prey of two species of spiders

(from https://onlinecourses.science.psu.edu, Penn State University)

The feeding habits of two species of net-casting spiders are studied. The two species, deinopis and menneus, coexist in eastern Australia. The following data were obtained on the size, in millimeters, of the prey of random samples of the two species.

The spiders were selected randomly, so we assume the measurements are independent.

> d=c(12.9,10.2,7.4,7.0,10.5,11.9,7.1,9.9,4.4,11.3)

> m=c(10.2,6.9,10.9,11.0,10.1,5.3,7.5,10.3,9.2,8.8)

> mean(d); mean(m)
[1] 10.26
[1] 9.02

[Boxplots of the prey sizes (mm) for the two samples d and m; vertical axis from 6 to 14.]

(11)

Normal distribution

Assume the sizes of the two populations (denoted by A and B) have normal distributions:

XA ∼ N(µA, σA²)  XB ∼ N(µB, σB²)

We want to test H0 : µA = µB and H1 : µA ≠ µB

or equivalently: H0 : µA − µB = 0 and H1 : µA − µB ≠ 0

Let nA and nB be the sizes of the two independent samples of XA and XB. In the example nA = nB = 10.

The two sample mean random variables are:

X̄A ∼ N(µA, σA²/nA)  X̄B ∼ N(µB, σB²/nB)

Consider the difference of the two sample mean random variables. It has a normal distribution:

X̄A − X̄B ∼ N(µA − µB, σA²/nA + σB²/nB)

The original test becomes a test on the mean of one normal random variable.

(12)

1. The variances σA² and σB² are known

For fixed α, the usual normal test is carried out.

2. The variances σA² and σB² are unknown, assumed equal, and estimated by the unbiased estimators SA² and SB²

An unbiased estimator of the variance of the random variable X̄A − X̄B is the pooled variance:

S² = [(nA − 1)SA² + (nB − 1)SB²]/(nA + nB − 2) · (nA + nB)/(nA nB)

In particular, if nA = nB = n, then S² = (SA² + SB²)/n.

The test statistic is:

T = (X̄A − X̄B − (µA − µB))/S  with T ∼ td, d = nA + nB − 2

For fixed α, the usual Student's t test is carried out.

3. The unknown variances σA² and σB² are not equal

A hypothesis test based on the t distribution, known as Welch's t-test, can be used.
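A minimal sketch of case 2 computed step by step, assuming two numeric vectors xA and xB of equal length (the vector names are hypothetical):

n  <- length(xA)                        # common sample size
s2 <- (var(xA) + var(xB)) / n           # pooled estimate of Var(mean(xA) - mean(xB))
t  <- (mean(xA) - mean(xB)) / sqrt(s2)  # test statistic, ~ t with 2n - 2 df under H0
2 * (1 - pt(abs(t), df = 2 * n - 2))    # two-sided p-value

The same result is obtained with t.test(xA, xB, var.equal = TRUE), as in the example that follows.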

(13)

Example. Prey of two species of spiders (continued)

• Hypotheses: H0 : µD = µM and H1 : µD ≠ µM

• Two-sided: R0 = (−∞, c1) ∪ (c2, +∞)

• Sample size: nD = nM = 10

• First, assume σM² = σD². Pooled variance estimator: S² = (SD² + SM²)/n and d = 2n − 2

• Test statistic under H0: T = (X̄D − X̄M)/S ∼ t18

• α = 0.05

The thresholds c1 and c2 of the rejection region are such that 0.025 = P(T < c1 | µD = µM) and 0.025 = P(T > c2 | µD = µM).

(14)

The sample means of the two groups are:

x̄D = 10.26  x̄M = 9.02

The sample difference of means is: x̄D − x̄M = 1.24. The sample pooled variance is: s² = 0.99.

The sample value of the test statistic, under H0, is 1.25.

> diff_m=mean(d)-mean(m); diff_m
[1] 1.24
> s2=(sd(d)^2+sd(m)^2)/10; s2
[1] 0.9915556
> t=diff_m/sqrt(s2); t
[1] 1.245269

The rejection region is R0 = (−∞, −2.1) ∪ (2.1, +∞).

The p-value is 0.23.

> c1=qt(0.025,18); c1
[1] -2.100922
> 2*(1-pt(t,18))  ## note 2*( ) -- two-sided test
[1] 0.2290008

There is no experimental evidence to reject H0

(15)

Can we assume that the two variances are equal?

A specific test can be performed (based on the Fisher distribution). Here we do not give the details. Computed in R:

> var.test(m, d, ratio = 1)

        F test to compare two variances

data:  m and d
F = 0.56936, num df = 9, denom df = 9, p-value = 0.4142
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1414206 2.2922339
sample estimates:
ratio of variances
         0.5693585

We can assume σD² = σM², although the sample ratio of variances is 0.57. This apparent inconsistency is due to the small sample sizes.

Direct computation in R of the test

H0 : µD = µM and H1 : µD ≠ µM, assuming σD² = σM²

> t.test(d,m,var.equal=T)

        Two Sample t-test

data:  d and m
t = 1.2453, df = 18, p-value = 0.229
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8520327  3.3320327
sample estimates:
mean of x mean of y
    10.26      9.02

(16)

If the equality of the variances is rejected, we use the Welch two-sample t-test.

In this case the variance estimate and the degrees of freedom are computed in a different manner (Welch–Satterthwaite approximation).

Computed in R:

> t.test(d,m)

        Welch Two Sample t-test

data:  d and m
t = 1.2453, df = 16.74, p-value = 0.2302
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.8633815  3.3433815
sample estimates:
mean of x mean of y
    10.26      9.02

The problem of making inference on means when variances are unequal is, in general, quite a difficult one. It is known as the Behrens-Fisher problem.

(G. Casella, R.L. Berger, Statistical Inference, 2nd ed., Duxbury, Ex. 8.42)

(17)

Notes and generalisations

• The Wald test. If the two random variables do not have a normal distribution and the sample size is “large”, a Wald test can be performed.

• Threshold different from zero. In some applications, you may want to adopt a new process or treatment only if it exceeds the current one by some threshold. In this case, the difference between the two means is compared not with 0 but with the chosen threshold, as sketched below.
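In R this is done with the mu argument of t.test; a minimal sketch, where x, y and the threshold delta0 are hypothetical names:

# Adopt the new treatment only if its mean exceeds the current one by at least delta0:
# H0: mu_X - mu_Y = delta0 against H1: mu_X - mu_Y > delta0
t.test(x, y, mu = delta0, alternative = "greater")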

(18)

Part E. Multiple tests

We may need to conduct many hypothesis tests with the same data.

Suppose each test is conducted at level α.

For any one test, the chance of a false rejection of the null is α.

But the chance of at least one false rejection is much higher.

Examples:

• Measuring the state of anxiety by questionnaire in two groups of subjects. Various questions help define the level of anxiety.

As more topics are compared, it becomes more likely that the two groups will appear to differ on at least one topic by random chance alone.

• Efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. It becomes more likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.

(19)

Consider m independent hypothesis tests:

H0i and H1i for i = 1, . . . , m

Example.

For α = 0.05 and m = 2

Probability of retaining both H01 and H02 when true: (1 − α)² = 0.95² = 0.90. Probability of rejecting at least one true hypothesis: 1 − (1 − α)² = 1 − 0.95² = 0.10.

For α = 0.05 and m = 20

Probability of rejecting at least one true hypothesis:

1 − (1 − α)²⁰ = 1 − 0.95²⁰ = 0.64 ≫ α

There are many ways to deal with this problem. Here we discuss two methods.

(20)

Bonferroni (B) Method

Given p-values p1, . . . , pm, reject the null hypothesis H0i if pi ≤ α/m.

The probability of falsely rejecting any null hypothesis is then less than or equal to α.

Example (continued)

m = 2: 1 − (1 − 0.05/2)² = 0.0493; m = 20: 1 − (1 − 0.05/20)²⁰ = 0.0488

Benjamini-Hochberg (BH) Method

1. Let p(1) < · · · < p(m) denote the ordered p-values.

2. Reject all null hypotheses H0(i) for which p(i) < αi/m.

If the tests are not independent, the value to which p(i) is compared is appropriately adjusted.

(21)

Example

Consider the following 10 (sorted) p-values. Fix α = 0.05.

p=c(0.00017,0.00448,0.00671,0.00907,0.01220,0.33626,0.39341,
    0.53882,0.58125,0.98617)
alpha=0.05; m=length(p); i=seq(1,m)
b=i/m*alpha; BH=(p<b)
B=(p<alpha/m); cbind(p,b,BH,B)

            p     b BH B
 [1,] 0.00017 0.005  1 1
 [2,] 0.00448 0.010  1 1
 [3,] 0.00671 0.015  1 0
 [4,] 0.00907 0.020  1 0
 [5,] 0.01220 0.025  1 0
 [6,] 0.33626 0.030  0 0
 [7,] 0.39341 0.035  0 0
 [8,] 0.53882 0.040  0 0
 [9,] 0.58125 0.045  0 0
[10,] 0.98617 0.050  0 0

Reject H0i for

- i = 1, 2 with Bonferroni method

- i = 1, 2, 3, 4, 5 with Benjamini-Hochberg method.
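The same decisions can be cross-checked with R's built-in p.adjust, which returns adjusted p-values to be compared directly with α:

p.adjust(p, method = "bonferroni")  # reject where <= 0.05: i = 1, 2
p.adjust(p, method = "BH")          # reject where <= 0.05: i = 1, ..., 5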

(22)

Abuse of tests

Warning! There is a tendency to use hypothesis testing methods even when they are not appropriate. Often, estimation and confidence intervals are better tools. Use hypothesis testing only when you want to test a well-defined hypothesis.

(from Wasserman)

(23)

A summary of the paper by Regina Nuzzo (2014). Statistical Errors – P values, the “gold standard” of statistical validity, are not as reliable as many scientists assume. Nature, vol. 506, pp. 150-152.

• Ronald Fisher, 1920s

– intended as an informal way to judge whether evidence was significant enough to deserve a second look

– one part of a fluid, non-numerical process that blended data and background knowledge to lead to scientific conclusions

• Interpretation

– the P value summarizes the data assuming a specific null hypothesis

(24)

• Caveats

– tendency to deflect attention from the actual size of an effect

– P-hacking or significance-chasing, including making assumptions, monitoring data while it is being collected, excluding data points, ...

• Measures that can help

– look for replicability

– do not ignore your exploratory studies nor prior knowledge

– report effect sizes and confidence intervals

– take advantage of Bayes’ rule (not part of this course unfortunately)

– try multiple methods on the same data set

– adopt a two-stage analysis, or “preregistered replication”

(25)

Part F. Confidence intervals

1. Introduction

2. Confidence interval for the mean

(a) of a Normal variable – known variance

(b) of a Normal variable – unknown variance

(c) of a variable with unknown distribution (approximate)

3. Different levels 1 − α

4. Asymmetric and one-sided intervals

5. Confidence intervals and tests

(26)

1. Introduction

A 1 − α confidence interval for a parameter θ is an interval (L, U) where L and U are functions of the random sample X1, . . . , Xn (i.e. they are random variables) such that

P(θ ∈ (L, U)) ≥ 1 − α for all θ ∈ Θ

The interval (L, U) is also called an interval estimator.

1 − α is called the coverage of the confidence interval. Usually 1 − α = 0.95.

We have at least a 1 − α chance of covering the unknown parameter with the interval estimator (from Casella & Berger).

(27)

From Wasserman.

Warning! (L, U ) is random and θ is fixed.

Warning! There is much confusion about how to interpret a confidence interval. A confidence interval is not a probability statement about θ since θ is a fixed quantity [. . . ].

Warning! Some texts interpret confidence intervals as follows: if I repeat the experiment over and over, the interval will contain the parameter 95 percent of the time. This is correct but useless, since we rarely repeat the same experiment over and over. [. . . ] Rather:

- day 1: θ1 ⇒ collect data ⇒ construct a 95% CI for θ1
- day 2: θ2 ⇒ collect data ⇒ construct a 95% CI for θ2
- day 3: θ3 ⇒ collect data ⇒ construct a 95% CI for θ3
- . . .

Then 95 percent of your intervals will trap the true parameter value. There is no need to introduce the idea of repeating the same experiment over and over.

(28)

2. Confidence interval for the mean of a random variable

Let X1, . . . , Xn be an i.i.d. random sample. Parameter of interest: µ.

• Point estimator: X̄; point estimate: x̄ (sample value of X̄ at the observed data points)

• Confidence interval, or interval estimator, with coverage 1 − α:

(X̄ − δ, X̄ + δ)  with δ such that P(X̄ − δ < µ < X̄ + δ) = 1 − α

The limits of the interval, X̄ − δ and X̄ + δ, are random variables. The sample confidence interval is:

(x̄ − δ, x̄ + δ)

How to compute δ? Using the (exact or approximate) distribution of the point estimator X̄.

(29)

2. (a) Confidence interval for the mean of a Normal variable – known variance

For X1, . . . , Xn i.i.d. sample random variables with X1 ∼ N(µ, σ²):

X̄ ∼ N(µ, σ²/n)  or  Z = (X̄ − µ)/(σ/√n) ∼ N(0, 1)

1 − α = P(X̄ − δ < µ < X̄ + δ) = P(µ − δ < X̄ < µ + δ)

Example. X1 ∼ N(µ, 4), n = 9, 1 − α = 0.95, so X̄ ∼ N(µ, 4/9).

[Plot: density of X̄ with the points µ − δ, µ and µ + δ marked on the horizontal axis.]

(30)

Computation of δ

1 − α = P(µ − δ < X̄ < µ + δ)
      = P((µ − δ − µ)/(σ/√n) < (X̄ − µ)/(σ/√n) < (µ + δ − µ)/(σ/√n))
      = P(−δ/(σ/√n) < Z < δ/(σ/√n))

so that δ/(σ/√n) = z1−α/2, that is δ = z1−α/2 σ/√n.

In the example: z1−0.05/2 = 1.96 and δ = 1.31.

[Plots: density functions of Z ∼ N(0, 1), with −1.96 and 1.96 marked, and of X̄ ∼ N(µ, 4/9), with µ − δ, µ and µ + δ marked.]

Confidence interval for µ:

(X̄ − z1−α/2 σ/√n, X̄ + z1−α/2 σ/√n)
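The numerical values quoted in the example can be reproduced in R (σ = 2, n = 9):

qnorm(0.975)           # 1.959964, the quantile z_0.975
qnorm(0.975) * 2 / 3   # 1.306643, i.e. delta = 1.31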

(31)

Sample confidence interval for µ:

(x̄ − z1−α/2 σ/√n, x̄ + z1−α/2 σ/√n)

- we do not know whether µ belongs to this sample interval, whose limits are computed using the sample value x̄

- with another x̄ the interval would be different

Among all possible confidence intervals constructed in this way, 95% contain µ and 5% do not.

Simulation for 100 samples: n = 80, σ² = 4, 1 − α = 95%

(x̄ − 1.96 · 2/√80, x̄ + 1.96 · 2/√80)

6 intervals do not contain µ.
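A minimal sketch of such a simulation; the true mean mu = 0 and the seed are arbitrary choices for illustration:

set.seed(1)                                  # for reproducibility (arbitrary)
mu <- 0; n <- 80; sigma <- 2
delta <- qnorm(0.975) * sigma / sqrt(n)      # half-width of the 95% interval
xbar  <- replicate(100, mean(rnorm(n, mu, sigma)))
sum(mu < xbar - delta | mu > xbar + delta)   # intervals missing mu; about 5 expected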

(32)

2. (b) Confidence interval for the mean of a Normal variable – unknown variance

For X1, . . . , Xn a random sample with X1 ∼ N(µ, σ²), take X̄ and S² as point estimators of µ and σ², respectively.

Consider the random variable

T = (X̄ − µ)/(S/√n) ∼ tn−1

The computation of the confidence interval for µ is similar to the normal case:

(X̄ − t1−α/2 S/√n, X̄ + t1−α/2 S/√n)

2. (c) Confidence interval for the mean of a random variable with unknown distribution

If the sample size is “large” we can use the approximate distribution of X̄ given by the CLT:

(X̄ − z1−α/2 S/√n, X̄ + z1−α/2 S/√n)
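For instance, the interval for µD in the drinking-water example has the t form above and can be read off directly:

# 95% confidence interval for the mean difference, matching the paired t-test output
t.test(water[, 1] - water[, 2])$conf.int   # 0.043006 0.117794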

(33)

Examples

Consider an i.i.d. random sample and fix α = 0.05.

• n = 36

– X1 ∼ N(µ, 4); confidence interval: (x̄ − 1.96 · 2/6, x̄ + 1.96 · 2/6)

– X1 ∼ N(µ, σ²), s = 2; confidence interval: (x̄ − 2.03 · 2/6, x̄ + 2.03 · 2/6)

• n = 100

– X1 ∼ N(µ, 4); confidence interval: (x̄ − 1.96 · 2/10, x̄ + 1.96 · 2/10)

– X1 ∼ N(µ, σ²), s = 2; confidence interval: (x̄ − 1.98 · 2/10, x̄ + 1.98 · 2/10)

where x̄ and s are obtained from the same data.

(34)

3. Different coverage coefficients 1 − α

A 95% confidence interval or a 99% confidence interval?

Values of the 1 − α/2 quantile of a standard normal random variable N(0, 1):

z0.950 = 1.64  z0.975 = 1.96  z0.995 = 2.58

[Plot: standard normal density with central areas 0.90, 0.95 and 0.99 marked.]

What is gained in precision is lost in range.

Example. X̄ ∼ N(µ, 4/80); assume x̄ = 2.5:

- at 90%: δ = 0.37, sample confidence interval (2.13, 2.87)
- at 95%: δ = 0.44, sample confidence interval (2.06, 2.94)
- at 99%: δ = 0.58, sample confidence interval (1.92, 3.08)

(35)

4. Asymmetric and one-sided intervals

When the distribution of the point estimator used to compute the limits of the confidence interval is symmetric w.r.t. the parameter of interest, it is natural to consider “symmetric” intervals, e.g. (X̄ − δ, X̄ + δ).

This is not the case, for instance, when the parameter of interest is σ². The confidence interval is (we do not give the details):

(S²(n − 1)/qr, S²(n − 1)/ql)

where ql is the αl quantile and qr is the 1 − αr quantile of a Chi-square random variable with n − 1 degrees of freedom, with

αl + αr = α, not necessarily αl = αr = α/2
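A minimal sketch of the computation in R, assuming a numeric sample x (hypothetical name) and the symmetric choice αl = αr = α/2:

n <- length(x); s2 <- var(x); alpha <- 0.05
q_r <- qchisq(1 - alpha / 2, df = n - 1)   # right quantile, level 1 - alpha_r
q_l <- qchisq(alpha / 2, df = n - 1)       # left quantile, level alpha_l
c(s2 * (n - 1) / q_r, s2 * (n - 1) / q_l)  # (lower, upper) interval for sigma^2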

(36)

Examples, n = 5. Confidence interval: (S²(n − 1)/qr, S²(n − 1)/ql)

• αl = 0.03, αr = 0.02: (n − 1)/qr = 0.3 and (n − 1)/ql = 4.4

Asymmetric confidence interval for σ²: (0.3 S², 4.4 S²)

• αl = 0.05: (n − 1)/ql = 3.5

One-sided confidence interval for σ²: (0, 3.5 S²)

[Plots: Chi-square density with n − 1 = 4 degrees of freedom; quantiles q_l and q_r marked in the two-sided case, q_l only in the one-sided case.]

(37)

5. Confidence intervals and tests

Parameter of interest: µ of a N(µ, σ²) with σ known.

Two-sided 1 − α confidence interval for µ and two-sided test at level α (H0 : µ = µ0 and H1 : µ ≠ µ0).

H0 is retained for x̄ ∈ (µ0 − δ, µ0 + δ). The sample confidence interval is (x̄ − δ, x̄ + δ).

[Diagram: number line with the retention interval centered at µ0 and two sample confidence intervals, one centered at x̄A containing µ0 and one centered at x̄B not containing µ0.]

where δ = z1−α/2 σ/√n for both the confidence interval and the test.

The interval where H0 is retained is centered at µ0, while the confidence interval is centered at x̄.

If the sample confidence interval contains µ0, then H0 is retained, and vice versa.

(38)

One-sided right confidence interval and one-sided left test

H0 is retained for x̄ ∈ (µ0 − δ1, +∞). The sample left confidence interval is (−∞, x̄ + δ1).

[Diagram: number line with the retention region (µ0 − δ1, +∞) and a one-sided sample confidence interval (−∞, x̄A + δ1).]

where δ1 = z1−α σ/√n for both the confidence interval and the test.

Remark.

If the parameter of interest is a proportion p, then δ is different for confidence intervals and tests, because it depends on the standard deviation: in the first case it is calculated using the sample value p̂, in the second using p0.

(39)

Compare tests and confidence intervals in the R output

Example. Chicago Tribune (continued)

> prop.test(np,750,0.25)

        1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.904, df = 1, p-value = 0.3417
alternative hypothesis: true p is not equal to 0.25
95 percent confidence interval:
 0.2047542 0.2666131
sample estimates:
     p
0.2343

> prop.test(np,750,0.25,"less")

        1-sample proportions test with continuity correction

data:  np out of 750, null probability 0.25
X-squared = 0.904, df = 1, p-value = 0.1709
alternative hypothesis: true p is less than 0.25
95 percent confidence interval:
 0.0000000 0.2613561
sample estimates:
     p
0.2343
