
(1)

Inferential Statistics

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

(2)

Part G

Distribution free hypothesis tests

1. Classical and distribution-free tests
2. Distribution-free statistics and tests
3. Aside of Probability. Two distribution-free statistics
4. The sign test
5. The Wilcoxon-Mann-Whitney test
6. The goodness-of-fit tests
   a) Chi-square test
   b) Kolmogorov-Smirnov tests (one and two samples)
7. Final remarks

(3)

1. Classical and distribution-free tests

• Differences between independent groups

– Classical: t-test (or Welch test) to compare the means of two groups; ANOVA for more groups

– Distribution-free: Mann-Whitney U test and Kolmogorov-Smirnov two-sample test; Kruskal-Wallis and Median test for more groups.

• Differences between variables

– Classical: t-test for paired samples; repeated measures ANOVA for more than two variables

– Distribution-free: Sign test and Wilcoxon’s matched pairs test.

• Relationships between variables

– Classical: correlation coefficient.

– Distribution-free: Spearman R, ... For binary variables: Chi-square test, Phi coefficient, and Fisher exact test.

(4)

2. Distribution-free statistics and tests

Let X1, . . . , Xn ∼ F be i.i.d. sample variables.

A statistic T = T(X1, . . . , Xn) is distribution-free if its distribution is the same for every distribution F of the sample variables.

An example: the Wald test (using the CLT approximation for the distribution of X̄n). Under H0 : µ = µ0, for large n,

(X̄n − µ0) / (S/√n) ∼approx N(0, 1)

This is a particular case of a general fact: statistics whose asymptotic (limit) distribution does not depend on the sample distribution are distribution-free.

A test is distribution-free if the test statistic is distribution-free.
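A quick simulation sketch (illustrative, with arbitrary settings: exponential data and n = 200) of why the Wald statistic is distribution-free in this asymptotic sense:

# Sketch: the studentized mean is approximately N(0,1) under H0
# whatever the sample distribution (here Exp(1), so mu0 = 1)
set.seed(1)
n <- 200
z <- replicate(5000, {
  x <- rexp(n)                      # sample with true mean mu0 = 1
  (mean(x) - 1)/(sd(x)/sqrt(n))     # Wald statistic under H0
})
mean(abs(z) > qnorm(0.975))         # empirical level, close to 0.05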

(5)

3. Aside of Probability. Two distribution-free statistics

Sign statistic

Consider any i.i.d. random sample X1, . . . , Xn with median equal to 0.

Assume P(Xi = 0) = 0, for i = 1, . . . , n (e.g. Xi continuous).

Define Zi = 1 if Xi > 0 and Zi = 0 if Xi < 0, and note that Zi ∼ B(1, 1/2).

The statistic B = Σ_{i=1}^n Zi ∼ B(n, 1/2) is distribution-free.

Furthermore, for large n,

(B − n/2) / ((1/2)√n) ∼approx N(0, 1)
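A minimal simulation check (illustrative settings): the distribution of B does not change with the sample distribution, as long as the median is 0.

# Sketch: B ~ B(n, 1/2) for any continuous distribution with median 0
set.seed(1)
n <- 50
b_norm <- replicate(5000, sum(rnorm(n) > 0))       # N(0,1) has median 0
b_exp  <- replicate(5000, sum(rexp(n) > log(2)))   # Exp(1) has median log(2)
c(mean(b_norm), mean(b_exp), n/2)   # all close to E(B) = n/2
c(var(b_norm),  var(b_exp),  n/4)   # all close to V(B) = n/4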

(6)

Rank statistics

Consider the sample variables X1, . . . , Xn and the corresponding rank variables R1, . . . , Rn where Ri represents the position of Xi in the sample

Note. In Lecture 2 we saw that an observed sample can be ordered by, e.g., the R command sort. Random variables can also be sorted, returning the ordered random vector (X(1), ..., X(n)). The rank variables are themselves random variables; e.g. R1 is the (random) position of X1 in the ordered sample.

The joint distribution of (R1, . . . , Rn) does not depend on the distribution of the sample variables.

We do not give here the details (proof based on combinatorial computation).

If the data contains ties, to the tied values assign the average of the ranks they would have received had they not been tied.

E.g. the values 13 14 14 16 17 are assigned the ranks 1 2.5 2.5 4 5.
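This averaging rule is the default of the R function rank (ties.method = "average"):

> rank(c(13, 14, 14, 16, 17))
[1] 1.0 2.5 2.5 4.0 5.0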

(7)

4. The simplest distribution-free test: the sign test

a) Test for the median of a random variable
b) Test for the equality of two medians (paired samples)

a) Test for the median of a random variable.

Consider an i.i.d. random sample X1, . . . , Xn and the test with hypotheses:

H0 : Q2 = λ0 against H1 : Q2 ≠ λ0

(H1 could also be Q2 < λ0 or Q2 > λ0.)

Consider Zi = 1 if Xi ≥ λ0 and Zi = 0 if Xi < λ0; then Zi ∼ B(1, 1/2) and the test statistic is

B = Σ_{i=1}^n Zi

Under H0, B ∼ B(n, 1/2). The test is carried out as usual.
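A minimal wrapper sketch (the function name is ours), since binom.test performs the exact binomial computation:

# Sketch: sign test for H0: Q2 = lambda0
sign_test_median <- function(x, lambda0, alternative = "two.sided") {
  b <- sum(x >= lambda0)                 # sample value of B
  binom.test(b, length(x), p = 0.5, alternative = alternative)
}
# e.g. sign_test_median(x, 0, alternative = "greater") for H1: Q2 > 0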

(8)

b) Test for the equality of two medians - paired samples

Let X and Y be two continuous random variables modeling some characteristic of the same population, with median Q2X and Q2Y respectively. Consider a test with hypotheses:

H0 : Q2X = Q2Y against H1 : Q2X > Q2Y

(H1 could also be Q2X ≠ Q2Y or Q2X < Q2Y.)

Let (X1, Y1), . . . , (Xn, Yn) be a paired random sample of size n and define (D1, . . . , Dn) with Di = Xi − Yi.

The test hypotheses become

H0 : Q2D = 0 against H1 : Q2D > 0

and we fall in the set-up of case a).

Remark. A more powerful alternative for both a) and b) is the Wilcoxon signed-rank test. We do not give the details here; a minimal usage sketch follows.
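The sketch below anticipates the deer data of the next example; note that ties in the |Di| force R to use a normal approximation for the signed-rank test.

# Sketch: paired sign test and Wilcoxon signed-rank test in R
hind <- c(142,140,144,144,142,146,149,150,142,148)
fore <- c(138,136,147,139,143,141,143,145,136,146)
d <- hind - fore
binom.test(sum(d > 0), sum(d != 0), alternative = "greater")  # sign test
wilcox.test(hind, fore, paired = TRUE, alternative = "greater")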

(9)

Example. Deer legs

Zar, Jerrold H. (1999), "Chapter 24: More on Dichotomous Variables", Biostatistical Analysis (Fourth ed.), Prentice-Hall.

The null hypothesis is that there is no difference between the hind leg and foreleg length in deer. The alternative hypothesis is that the hind leg length is longer than the foreleg length.

Thus:

Deer   Hind leg   Foreleg   Diff. sign
1      142        138       +
2      140        136       +
3      144        147       -
4      144        139       +
5      142        143       -
6      146        141       +
7      149        143       +
8      150        145       +
9      142        136       +
10     148        146       +

H0 : Q2D = 0 against H1 : Q2D > 0

Under H0 the test statistic is B = Σ_{i=1}^{10} Zi ∼ B(10, 1/2). Its sample value is b = 8.

The test is one-sided right.

The p-value of b is 0.055 (in R: 1-pbinom(7,10,0.5)).

(10)

Direct computation in R

> binom.test(8,10,alternative="greater")

Exact binomial test

data: 8 and 10
number of successes = 8, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4930987 1.0000000
sample estimates:
probability of success
                   0.8

There is “weak evidence” against H0. When the sample size is small, the tails of the test statistic distribution carry large probability, so p-values tend to be large and H0 is rarely rejected. In practice, to overcome this, a higher α can be chosen. In our example there is evidence to retain H0.

(11)

5. The Mann-Whitney U test or Wilcoxon rank-sum test (equality of two distributions - unpaired samples)

The null hypothesis can be expressed as follows: the probability of an observation from population X exceeding an observation from population Y equals the probability of an observation from Y exceeding an observation from X:

H0 : P(X > Y ) = P(X < Y ) = 0.5

The alternative hypothesis can be stated in terms of one-sided (left or right) or two-sided test.

Here X and Y are two continuous independent random variables, and to test H0 we consider X1, . . . , Xn1 and Y1, . . . , Yn2, two independent random samples of possibly different sizes.

The variables could also be discrete or ordinal, provided P(X = Y ) = 0.

(12)

Put together the two samples, so that there are n = n1 + n2 observations in total.

Let R1, . . . , Rn1 be the rank variables assigned to X1, . . . , Xn1 and Rn1+1, . . . , Rn the rank variables assigned to Y1, . . . , Yn2.

The statistics

W1 = Σ_{i=1}^{n1} Ri   and   U1 = W1 − n1(n1 + 1)/2

are distribution-free and are used as test statistics.

U1 takes integer values between 0 and n1n2.

The statistics W2 and U2 (based on the ranks of the Y's) are defined analogously. Moreover W1 + W2 = n(n + 1)/2.

Which of W1 and W2 (or U1 and U2) should be considered? Usually the statistic with the lower sample value is used.

(13)

A small example

Does the treatment A produce lower values of a variable than the treatment B?

Denote by X and Y the variables modeling the results of treatments A and B respectively.

H0 : P(X < Y ) = P(X > Y )   against   H1 : P(X < Y ) > P(X > Y )

Seven elements are drawn at random from the population. Three, randomly chosen, are assigned to treatment A; the other four to treatment B: n1 = 3 and n2 = 4.

The sample values and the corresponding sample ranks are:

xi: 12 16 13        r(xi): 1 4 2
yi: 17 15 18 20     r(yi): 5 3 6 7

The sample value of W1 is w = 1 + 4 + 2 = 7.

(14)

Computation of the distribution of W1 under H0 (n1 = 3, n2 = 4)

W1 is the sum of 3 distinct numbers chosen among {1, . . . , 7}.

It takes values between 6 and 18. It is symmetrical w.r.t. 12.

How many ways are there to form w?

- 6: one way, 1 + 2 + 3;

- 7: one way, 1 + 2 + 4;

- 8: two ways, 1 + 2 + 5 and 1 + 3 + 4; . . .

Under H0, the three ranks of the X's are randomly chosen among {1, . . . , 7}: (7 choose 3) = 35 equally likely cases.

Then the distribution of W1 for n1 = 3 and n2 = 4 is

w    associated ranks                                 fW1(w)
6    (1,2,3)                                          1/35
7    (1,2,4)                                          1/35
8    (1,2,5); (1,3,4)                                 2/35
9    (1,2,6); (1,3,5); (2,3,4)                        3/35
10   (1,2,7); (1,3,6); (1,4,5); (2,3,5)               4/35
11   (1,3,7); (1,4,6); (2,4,5); (2,3,6)               4/35
12   (1,4,7); (1,5,6); (2,3,7); (2,4,6); (3,4,5)      5/35

The distribution of W1 depends only on n1 and n2: W1 is a distribution-free statistic.
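The table can be reproduced in R by enumerating all 35 rank triples:

# All C(7,3) = 35 equally likely rank triples for the X's under H0
w <- combn(7, 3, FUN = sum)     # the 35 possible sample values of W1
table(w)/choose(7, 3)           # matches the table above: 1/35, 1/35, 2/35, ...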

(15)

Some properties of W1 and U1 under H0

Minimum value: all the ranks of the Xi's are smaller than the ranks of the Yi's:

min(W1) = Σ_{i=1}^{n1} i = n1(n1 + 1)/2        min(U1) = 0

Maximum value: all the ranks of the Yi's are smaller than the ranks of the Xi's:

max(W1) = Σ_{i=n2+1}^{n} i = n1(n + n2 + 1)/2   max(U1) = n1n2

Mean value:

E(W1) = n1(n + 1)/2        E(U1) = n1n2/2

Variance:

V(W1) = V(U1) = n1 n2 (n + 1)/12

• W1 and U1 are symmetrical w.r.t. their mean values.

Moreover, for n1 and n2 greater than 10,

(U1 − E(U1)) / std(U1) ∼approx N(0, 1)
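These formulas can be checked by simulation (an illustrative sketch), again with n1 = 3 and n2 = 4:

# Sketch: under H0 the X-ranks are a uniform random 3-subset of {1,...,7}
set.seed(1)
w <- replicate(10000, sum(sample(7, 3)))   # simulated values of W1
c(mean(w), 3*(7 + 1)/2)                    # E(W1) = n1(n+1)/2 = 12
c(var(w),  3*4*(7 + 1)/12)                 # V(W1) = n1 n2 (n+1)/12 = 8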

(16)

Back to the test

The test is one-sided left; the sample value is w = 7 and its p-value is P(W1 ≤ 7) = 2/35 ≈ 0.057.

With such a small sample size, we can say that the evidence is against H0.

Direct computation in R

> x=c(12,16,13); y=c(17,15,18,20)
> wilcox.test(x,y,"less")

Wilcoxon rank sum test

data: x and y
W = 1, p-value = 0.05714

alternative hypothesis: true location shift is less than 0

The normal approximation for W1 is not appropriate for small sample sizes. But, in this case, the exact computation and the normal approximation give similar results (using E(W1) = 12 and V(W1) = 8 from the formulas above):

z = (7 − 12)/√8 = −1.77        p-value(−1.77) = 0.039
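The same normal approximation in R:

# Normal approximation for the one-sided left test above
n1 <- 3; n2 <- 4; n <- n1 + n2; w <- 7
z <- (w - n1*(n + 1)/2)/sqrt(n1*n2*(n + 1)/12)
c(z, pnorm(z))    # about -1.77 and 0.039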

(17)

6. Goodness-of-fit tests

Measures of goodness-of-fit typically summarize the discrepancy between observed values and the values expected under a known probability model.

Such measures can also be used to test whether two samples are drawn from identical distributions.

We consider here two goodness-of-fit tests:

a) Chi-square test (discrete variables)
b) Kolmogorov-Smirnov test

(18)

6. a) Chi-square goodness-of-fit tests

Let X be a discrete random variable with finite support and

P(X = xi) = πi,   i = 1, . . . , r

The test hypotheses are:

H0 : πi = πi0 for all i   against   H1 : πi ≠ πi0 for at least one i

Let

- X1, ..., Xn be a random sample

- F1, . . . , Fr be the sample variables denoting the sample frequencies of the values x1, . . . , xr

- N1, . . . , Nr be the corresponding count variables, Ni = nFi, i = 1, . . . , r.

Often the Ni's are called observed counts while the nπi0's are called expected counts, denoted by Oi and Ei respectively.

(19)

The test statistic is

Q = n Σ_{i=1}^{r} (Fi − πi0)²/πi0 = Σ_{i=1}^{r} (Ni − nπi0)²/(nπi0) = (simply) Σ_{i=1}^{r} (Oi − Ei)²/Ei

Its asymptotic distribution is a chi-square with r − 1 degrees of freedom:

Q ∼approx χ²[r−1]


The test is one-sided right because large sample values of Q indicate a large difference between observed and expected frequencies.
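In R the whole computation is performed by chisq.test; a minimal sketch with hypothetical counts:

# Sketch: chi-square goodness-of-fit test for given probabilities pi0
o  <- c(18, 25, 57)       # hypothetical observed counts (n = 100)
p0 <- c(0.2, 0.3, 0.5)    # probabilities under H0
chisq.test(o, p = p0)     # Q and its p-value on r - 1 = 2 df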

(20)

Dependence on a parameter

Often the πi's depend on an unknown parameter θ. Examples:

• X ∼ B(n, θ) (binomial),

• X ∼ U {0, θ} (discrete uniform between 0 and θ),

• X ∼ P(θ) (truncated Poisson, setting to zero the probability of “large” integers)

We can write πi = πi(θ) and the test hypotheses become:

H0 : πi = πi0(θ) for all i   and   H1 : πi ≠ πi0(θ) for at least one i

If Θn is a consistent estimator of θ with asymptotic normal distribution N(θ, V(Θn)) (e.g. the maximum likelihood estimator), then the test can be conducted with the statistic:

Q = n Σ_{i=1}^{r} (Fi − πi0(Θn))²/πi0(Θn)

(21)

Example. Sons among the first 7 children

(Edwards and Fraccaro 1960)

Consider the number of sons among the first seven children of 1334 Swedish ministers:

n. sons   0   1    2    3    4    5    6   7
counts    6   57   206  362  365  256  69  13

We want to test whether these are sample values of a random variable X ∼ B(7, θ).

The point estimator of θ is X̄/7, the maximum likelihood estimator. The estimate of θ is 0.514.

> x=c(0,1,2,3,4,5,6,7); o=c(6,57,206,362,365,256,69,13)
> t=sum(x*o)/sum(o)/7; t
[1] 0.5140287

The expected counts under H0 are:

> e=sum(o)*dbinom(0:7,7,t); round(e,1)
[1]   8.5  63.2 200.6 353.7 374.1 237.4  83.7  12.6

The sample value of Q is 5.98 with p-value 0.54 (in R: q=sum((o-e)^2/e); 1-pchisq(q,7)). Then there is no evidence to reject H0.

(22)

Effects of small sample size.

Recall that

Q = n Σ_{i=1}^{r} (Fi − πi0)²/πi0 = Σ_{i=1}^{r} (Ni − nπi0)²/(nπi0) ∼approx χ²[r−1]

The chi-square approximation is valid when the sample size is large and the expected counts nπi0 are not too small (at least 5 for all i = 1, . . . , r).

In fact:

(1) small n ⇒ small q ⇒ risk of type II error
(2) small nπi0 ⇒ large q ⇒ risk of type I error.

(23)

Examples.

Case (1): small n ⇒ small q ⇒ risk of type II error

Consider the expected and observed frequencies below, where the differences between them are greater than 40%.

           1      2
expected   0.40   0.60
observed   0.15   0.85

In such a case:

(f1 − π10)²/π10 + (f2 − π20)²/π20 = 0.2604

If n = 10, then q = 10 × 0.2604 = 2.604 with p-value 0.107 ⇒ retain H0.

If n = 30, then q = 30 × 0.2604 = 7.812 with p-value 0.005 ⇒ reject H0.

> e=c(0.4,0.6); o=c(0.15,0.85); cf=sum((o-e)^2/e); cf
[1] 0.2604167
> n=10; cbind(cf*n, 1-pchisq(cf*n,1))
[1,] 2.604167 0.1065832
> n=30; cbind(cf*n, 1-pchisq(cf*n,1))
[1,] 7.8125 0.005188608

(24)

Case (2): small nπi0 ⇒ large q ⇒ risk of type I error

Consider the expected and observed counts below. In (A) two of the expected counts are small.

(A)  values    0    1    2
     expected  10   2    2
     observed  12   3    6

(B)  values    0    1    2
     expected  10   12   12
     observed  12   13   16

In (A): q = 8.900 with p-value 0.0117 ⇒ reject H0.
In (B): q = 1.817 with p-value 0.4032 ⇒ retain H0.

> e=c(10,2,2); o=c(12,3,6); cf=sum((o-e)^2/e)
> cbind(cf, 1-pchisq(cf,2))
[1,] 8.9 0.01167857
> e=c(10,12,12); o=c(12,13,16); cf=sum((o-e)^2/e)
> cbind(cf, 1-pchisq(cf,2))
[1,] 1.816667 0.4031957

(25)

6 b1) Kolmogorov-Smirnov goodness-of-fit tests

Let X1, . . . , Xn be i.i.d. sample variables from a continuous random variable X with cumulative distribution function F.

Consider the test hypotheses:

H0 : F(x) = F0(x) for all x ∈ R

H1 : F(x) ≠ F0(x) for at least one x ∈ R

Let F̂ be the empirical cumulative distribution function:

F̂(x) = Σ_i (i/n) 1(X(i) ≤ x < X(i+1))

where (X(1), . . . , X(n)) is the sorted random sample and 1(·) denotes the indicator function (equal to 1 if the condition is satisfied and equal to 0 otherwise). F̂ is a step function.

The sample values of F̂(x) are discussed in the slides “Exploratory Data Analysis”.
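In R, F̂ is returned as a step function by ecdf; a small sketch on the data of the next example:

# ecdf() returns the empirical cdf as a step function
s    <- c(0.03,0.12,0.25,0.41,0.49,1.18,1.21,1.56,1.57,1.69)
Fhat <- ecdf(s)
Fhat(0.49)    # 0.5: five of the ten observations are <= 0.49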

(26)

The Kolmogorov test statistic is

D = sup_{x∈R} |F̂(x) − F0(x)| = max_{1≤i≤n} max{ i/n − F0(X(i)), F0(X(i)) − (i−1)/n }

D is a distribution-free statistic.

The test is one-sided right because a large sample value of D corresponds to a large difference between the empirical and the tested cumulative distribution function.
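A direct implementation of the max formula (a sketch, anticipating the uniform example below):

# Sketch: D for H0: X ~ U(0,2), computed from the formula above
s  <- sort(c(0.03,0.12,0.25,0.41,0.49,1.18,1.21,1.56,1.57,1.69))
n  <- length(s)
F0 <- punif(s, 0, 2)                              # F0 at the sorted data
D  <- max(pmax((1:n)/n - F0, F0 - (0:(n - 1))/n))
D                                                 # 0.255, as ks.test confirms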

(27)

Example.

Goodness-of-fit of a uniform random variable X ∼ U (0, 2)

We want to test if a uniform random variable X ∼ U (0, 2) fits the following (sorted) data:

0.03 0.12 0.25 0.41 0.49 1.18 1.21 1.56 1.57 1.69

A random variable X ∼ U(0, 2) has cumulative distribution function

F0(x) = 0 if x < 0;  x/2 if 0 ≤ x < 2;  1 if x ≥ 2

[Figure: the empirical cumulative distribution function (red) and F0 (black).]

The maximum distance between the two plots is achieved at x = 0.49 (the fifth sorted value): d = 5/10 − 0.49/2 = 0.255

(28)

Direct computation in R

> s=c(0.03,0.12,0.25,0.41,0.49,1.18,1.21,1.56,1.57,1.69)

> ks.test(s,"punif",0,2)

One-sample Kolmogorov-Smirnov test

data: s

D = 0.255, p-value = 0.4593

alternative hypothesis: two-sided

There is no evidence to reject H0.

(29)

Example. Approximate distribution of X̄n

see slides on Central limit theorem

Consider a simulation of 1000 samples, of size n each, from an exponential random variable X ∼ E(λ) with λ = 2.

The simulated distribution is compared with:

- a Normal variable with sample mean and standard deviation;
- a Normal variable with theoretical mean and standard deviation, which are known: 1/λ and 1/(λ√n) respectively.

• n = 10

> lambda=2;x=c(1:1000);n=10

> for (i in 1:1000) x[i]=mean(rexp(n,lambda))

> ######### empirical mean and standard deviation

> ks.test(x,"pnorm",mean(x),sd(x))

One-sample Kolmogorov-Smirnov test

data: x
D = 0.050283, p-value = 0.01273
alternative hypothesis: two-sided

(30)

> ######### theoretical mean and standard deviation

> ks.test(x,"pnorm",(1/lambda),(1/lambda/sqrt(n)))

One-sample Kolmogorov-Smirnov test

data: x
D = 0.042575, p-value = 0.05329
alternative hypothesis: two-sided

In the first case there is evidence against the hypothesis that the simulated distribution of X̄10 is Normal. In the second case the evidence is weak.

(31)

• n = 30

> lambda=2;x=c(1:1000);n=30

> for (i in 1:1000) x[i]=mean(rexp(n,lambda))

> ######### empirical mean and standard deviation

> ks.test(x,"pnorm",mean(x),sd(x))

One-sample Kolmogorov-Smirnov test

data: x
D = 0.035285, p-value = 0.1657
alternative hypothesis: two-sided

> ######### theoretical mean and standard deviation

> ks.test(x,"pnorm",(1/lambda),(1/lambda/sqrt(n)))

One-sample Kolmogorov-Smirnov test

data: x

D = 0.032839, p-value = 0.231

alternative hypothesis: two-sided

In both cases there is evidence to retain that the simulated distribution of X̄30 is Normal.

(32)

6 b2) Two-sample Kolmogorov-Smirnov goodness-of-fit tests

Let X and Y be two continuous independent random variables with cumulative distribution functions FX and FY respectively.

The test hypotheses are:

H0 : FX(t) = FY(t) for all t ∈ R

H1 : FX(t) ≠ FY(t) for at least one t ∈ R

Let X1, . . . , Xn1 and Y1, . . . , Yn2 be two independent random samples with empirical cumulative distribution functions F̂X and F̂Y respectively.

The Kolmogorov-Smirnov test statistic is

Dn1,n2 = sup_{x∈R} |F̂X(x) − F̂Y(x)|

Dn1,n2 is a distribution-free statistic.

(33)

Example. Juniper trees

We want to test whether the biomass of male and female Juniper trees has the same distribution.

The two samples have size 6 each.

> m=c(71,72,74,76,77,78); f=c(73,79,80,82,83,84)

>

> Fm_Ff=rbind(cumsum(table(factor(m, levels=71:84)))/6,
+             cumsum(table(factor(f, levels=71:84)))/6)
> round(Fm_Ff,2)
   71   72   73   74   75   76   77   78   79  80  81   82   83 84
 0.17 0.33 0.33 0.50 0.50 0.67 0.83 1.00 1.00 1.0 1.0 1.00 1.00  1
 0.00 0.00 0.17 0.17 0.17 0.17 0.17 0.17 0.33 0.5 0.5 0.67 0.83  1

plot(ecdf(m),col="blue",xlim=c(70,85),xlab="",ylab="",main="")
plot(ecdf(f),add=T,col="red",xlim=c(70,85),xlab="",ylab="",main="")

[Figure: the two empirical cumulative distribution functions, m in blue and f in red.]

(34)

The absolute values of the differences between F̂M and F̂F are listed below; their maximum is 0.83, reached at a biomass of 78.

> D=abs(Fm_Ff[1,]-Fm_Ff[2,])

> round(rbind(Fm_Ff,D),2)

   71   72   73   74   75   76   77   78   79  80  81   82   83 84
 0.17 0.33 0.33 0.50 0.50 0.67 0.83 1.00 1.00 1.0 1.0 1.00 1.00  1
 0.00 0.00 0.17 0.17 0.17 0.17 0.17 0.17 0.33 0.5 0.5 0.67 0.83  1
D 0.17 0.33 0.17 0.33 0.33 0.50 0.67 0.83 0.67 0.5 0.5 0.33 0.17  0

> max(D)

[1] 0.8333333

Direct computation in R

> ks.test(m, f)

Two-sample Kolmogorov-Smirnov test

data: m and f
D = 0.83333, p-value = 0.02597
alternative hypothesis: two-sided

There is evidence to reject H0.

(35)

7. Final remarks

From the book by T. Hill and P. Lewicki (2006), Statistics: Methods and Applications, StatSoft, p. 385:

It is not easy to give simple advice concerning the use of nonparametric procedures.

Each nonparametric procedure has its peculiar sensitivities and blind spots.

For example, the Kolmogorov-Smirnov two-sample test is not only sensitive to differences in the location of distributions (for example, differences in means) but is also greatly affected by differences in their shapes.

The Wilcoxon matched pairs test assumes that one can rank order the magnitude of differences in matched observations in a meaningful manner. If this is not the case, one should rather use the Sign test.

(36)

In general, if the result of a study is important (e.g., does a very expensive and painful drug therapy help people get better?), then it is always advisable to run different nonparametric tests; should discrepancies in the results occur contingent upon which test is used, one should try to understand why some tests give different results.

On the other hand, nonparametric statistics are less statistically powerful (sensitive) than their parametric counterparts, and if it is important to detect even small effects (e.g., is this food additive harmful to people?) one should be very careful in the choice of a test statistic.

Nonparametric methods are most appropriate when the sample sizes are small.
