Inferential Statistics Hypothesis tests Normal Probability Plot

Academic year: 2021

(1)

Inferential Statistics Hypothesis tests

Normal Probability Plot

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

(2)

Part I. Test on the equality of the means in sub-groups

1. Descriptive approach: within group and between group variances

2. Inferential approach: test on the equality of the group means (one-way ANOVA)

Part J. Normal Probability Plot

(3)

I. Test on the equality of the means in sub-groups (one-way ANOVA)

I.1 Descriptive approach: decomposition of the variance

Recall

K groups of sizes $n_1, \dots, n_K$, with total sample size $n = \sum_{k=1}^{K} n_k$,

group means $\bar{x}_1, \dots, \bar{x}_K$ and variances $\sigma_1^2, \dots, \sigma_K^2$

within group variance:

weighted sum of the variances of the groups

$$\sum_{k=1}^{K} \frac{n_k}{n}\, \sigma_k^2$$

between group variance:

weighted variance of the group means

$$\sum_{k=1}^{K} \frac{n_k}{n}\, (\bar{x}_k - \bar{x}_{tot})^2$$

total variance = within group var. + between group var.

$$\sigma_{tot}^2 = \sum_{k=1}^{K} \frac{n_k}{n}\, \sigma_k^2 + \sum_{k=1}^{K} \frac{n_k}{n}\, (\bar{x}_k - \bar{x}_{tot})^2$$
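The decomposition can be checked numerically. The following is a minimal sketch on simulated data; the seed, the Normal parameters and the helper name `pvar` are assumptions made here for illustration. Note that the formulas above divide by the group size, so R's `var()` (which divides by the size minus one) is not used.

```r
# Hypothetical data: three groups with sizes 50, 60 and 30
set.seed(1)
x <- c(rnorm(50, 10, 2), rnorm(60, 25, 2), rnorm(30, 40, 2))
g <- factor(rep(c("A", "B", "C"), times = c(50, 60, 30)))

# Variance with divisor n (not n - 1), as in the decomposition formula
pvar <- function(v) mean((v - mean(v))^2)

n    <- length(x)
n_k  <- tapply(x, g, length)   # group sizes
m_k  <- tapply(x, g, mean)     # group means
s2_k <- tapply(x, g, pvar)     # group variances

within  <- sum(n_k / n * s2_k)
between <- sum(n_k / n * (m_k - mean(x))^2)
total   <- pvar(x)

# The last two entries should coincide up to rounding error
c(within = within, between = between,
  within_plus_between = within + between, total = total)
```

The identity holds for any data set, not only for Normal samples, because it is purely algebraic.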

(4)

Three examples. Sizes: $n_A = 50$, $n_B = 60$, $n_C = 30$, $n = 140$

Example 1 and Example 2

Group variances: $\sigma_A^2 = 4.89$, $\sigma_B^2 = 4.94$, $\sigma_C^2 = 4.17$

Then, in both cases, the within group variance is

$$\frac{1}{140}\,(50 \times 4.89 + 60 \times 4.94 + 30 \times 4.17) = \frac{666}{140} \simeq 4.76$$

Example 3

Group variances: $\sigma_A^2 = 1190$, $\sigma_B^2 = 1195$, $\sigma_C^2 = 1785$

Then the within group variance is

$$\frac{1}{140}\,(50 \times 1190 + 60 \times 1195 + 30 \times 1785) = \frac{184750}{140} \simeq 1320$$

Means on the groups

Example 1: $\bar{x}_A = 10.1$, $\bar{x}_B = 24.8$, $\bar{x}_C = 39.8$

Example 2: $\bar{x}_A = 21.1$, $\bar{x}_B = 21.8$, $\bar{x}_C = 22.3$

Example 3: $\bar{x}_A = 11.8$, $\bar{x}_B = 18.9$, $\bar{x}_C = 31.7$

The group means of Example 1 and Example 3 are similar, but the within group variances are very different.

(5)

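Example 1: $\bar{x}_A = 10.1$, $\bar{x}_B = 24.8$, $\bar{x}_C = 39.8$

[Figure: overlaid histograms of the three groups (x-axis from 0 to 50, counts from 0 to 10), with vertical lines at $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$ and $\bar{x}_{tot}$. Labels: $\sigma_A^2 = 4.97$, $\sigma_B^2 = 5.05$, $\sigma_C^2 = 4.32$; $n_A = 50$, $n_B = 60$, $n_C = 30$]

$$\bar{x}_{tot} = \frac{1}{140}\,(50 \times 10.1 + 60 \times 24.8 + 30 \times 39.8) = \frac{3187}{140} \simeq 22.8$$

between group variance $\simeq 121.2$

$$\frac{1}{140}\left[50\,(10.1 - 22.8)^2 + 60\,(24.8 - 22.8)^2 + 30\,(39.8 - 22.8)^2\right] \simeq \frac{16974}{140} \simeq 121.2$$

within group variance $\simeq 4.76$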

(6)

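Example 2: $\bar{x}_A = 21.1$, $\bar{x}_B = 21.8$, $\bar{x}_C = 22.3$

[Figure: overlaid histograms of the three groups (x-axis from 0 to 50, counts from 0 to 10), with vertical lines at $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$ and $\bar{x}_{tot}$. Labels: $\sigma_A^2 = 4.97$, $\sigma_B^2 = 5.05$, $\sigma_C^2 = 4.32$; $n_A = 50$, $n_B = 60$, $n_C = 30$]

$$\bar{x}_{tot} = \frac{1}{140}\,(50 \times 21.1 + 60 \times 21.8 + 30 \times 22.3) \simeq 21.7$$

between group variance $\simeq 0.21$

$$\frac{1}{140}\left[50\,(21.1 - 21.7)^2 + 60\,(21.8 - 21.7)^2 + 30\,(22.3 - 21.7)^2\right] = \frac{29.4}{140} \simeq 0.21$$

within group variance $\simeq 4.76$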

(7)

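Example 3: $\bar{x}_A = 11.8$, $\bar{x}_B = 18.9$, $\bar{x}_C = 31.7$

[Figure: overlaid histograms of the three groups (x-axis from −100 to 150, counts from 0 to 5), with vertical lines at $\bar{x}_A$, $\bar{x}_B$, $\bar{x}_C$ and $\bar{x}_{tot}$. Labels: $\sigma_A^2 = 1090$, $\sigma_B^2 = 1195$, $\sigma_C^2 = 1785$; $n_A = 50$, $n_B = 60$, $n_C = 30$]

Pay attention to the different scales w.r.t. the previous examples.

$$\bar{x}_{tot} = \frac{1}{140}\,(50 \times 11.8 + 60 \times 18.9 + 30 \times 31.7) \simeq 19.1$$

between group variance $\simeq 53.1$

$$\frac{1}{140}\left[50\,(11.8 - 19.1)^2 + 60\,(18.9 - 19.1)^2 + 30\,(31.7 - 19.1)^2\right] \simeq \frac{7430}{140} \simeq 53.1$$

within group variance $\simeq 1320$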

(8)

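A measure of the “similarity” of the group means is the ratio

$$\frac{\text{between group variance}}{\text{within group variance}}$$

Example 1: $121.2 / 4.76 \simeq 25.5$

Example 2: $0.21 / 4.76 \simeq 0.04$

Example 3: $53.1 / 1320 \simeq 0.04$

In Example 1 the means are very different and the ratio is much larger than one. In Example 2 and Example 3 the ratio is much smaller than one: in Example 2 because the group means are close to each other, in Example 3 because the within group variance is large.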
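The arithmetic of the three examples can be reproduced from the summary statistics alone. The sketch below uses only the group sizes, means and variances quoted in the examples; the helper name `decomp` is an assumption introduced here for illustration.

```r
n_k <- c(A = 50, B = 60, C = 30)   # group sizes of the three examples

# Within/between decomposition from group sizes, means and variances
decomp <- function(means, vars, n_k) {
  n       <- sum(n_k)
  x_tot   <- sum(n_k * means) / n                  # overall mean
  within  <- sum(n_k * vars) / n                   # within group variance
  between <- sum(n_k * (means - x_tot)^2) / n      # between group variance
  c(x_tot = x_tot, within = within, between = between,
    ratio = between / within)
}

ex1 <- decomp(c(10.1, 24.8, 39.8), c(4.89, 4.94, 4.17), n_k)
ex2 <- decomp(c(21.1, 21.8, 22.3), c(4.89, 4.94, 4.17), n_k)
ex3 <- decomp(c(11.8, 18.9, 31.7), c(1190, 1195, 1785), n_k)

round(rbind(ex1, ex2, ex3), 2)
```

Working from unrounded intermediate values, as the function does, can change the last printed digit with respect to a hand computation that rounds the overall mean first.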

(9)

R code to generate the data of Example 1 and to construct the histograms

# group sizes and means used in the simulation
na=50; nb=60; nc=30
ma=10; mb=25; mc=40
# Normal samples with standard deviations 2.4, 2 and 2.3
a=rnorm(na,ma,2.4); b=rnorm(nb,mb,2); c=rnorm(nc,mc,2.3)
x=c(a,b,c)
gruppi=c(rep("A",na),rep("B",nb),rep("C",nc))
# common breaks and axis limits, so that the histograms can be overlaid
br=seq(1,50,.5)
x_l=c(0,50); y_l=c(0,10)
hist(a, breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l,col="blue")
par(new=TRUE)
hist(b, breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l)
par(new=TRUE)
hist(c, breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l,col="red")
par(new=FALSE)
# vertical lines at the group means and at the overall mean
abline(v=mean(a),col="blue",lwd=2)
abline(v=mean(b),lwd=2)
abline(v=mean(c),col="red",lwd=2)
abline(v=mean(x),col="darkgreen",lwd=2)

(10)

I.2 Inferential approach:

Test on the equality of the means in sub-groups

For each group assume a Normal random variable $X_k$

$$X_k \sim N(\mu_k, \sigma_k^2), \qquad k = 1, \dots, K$$

Test hypotheses

$$H_0: \mu_1 = \mu_2 = \dots = \mu_K \qquad \text{and} \qquad H_1: \text{at least two are different}$$

Let $n_1, \dots, n_K$ be the sizes of the K independent samples from $X_1, \dots, X_K$ and let $n$ be the total sample size

The K sample mean random variables are

$$\bar{X}_1 \sim N\left(\mu_1, \frac{\sigma_1^2}{n_1}\right), \quad \dots, \quad \bar{X}_K \sim N\left(\mu_K, \frac{\sigma_K^2}{n_K}\right)$$

The K sample variance random variables are

$$S_1^2, \dots, S_K^2$$

(11)

The estimators of the between and the within variances multiplied by n (also called between/within variations) are

between group variation: weighted variation of the group means

$$V_B = \sum_{k=1}^{K} n_k \left(\bar{X}_k - \bar{X}_{tot}\right)^2$$

where $\bar{X}_{tot} = \sum_{k=1}^{K} \frac{n_k}{n}\, \bar{X}_k$ is the weighted mean of the sample mean random variables

within group variation: weighted sum of the group variances

$$V_W = \sum_{k=1}^{K} (n_k - 1)\, S_k^2$$

Test statistic

$$F = \frac{V_B / (K - 1)}{V_W / (n - K)}$$

Under $H_0$ (and assuming equal group variances) it follows a Fisher distribution, $F \sim F_{[K-1,\, n-K]}$, whose mean is $(n - K)/(n - K - 2)$

High values of the between/within ratio lead to rejecting $H_0$
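The test statistic can be computed directly from its definition and compared with the output of `anova(lm(...))`. This is a minimal sketch on simulated data; the seed and the simulation parameters are assumptions chosen to resemble Example 1, not the authors' actual data.

```r
# Hypothetical data resembling Example 1
set.seed(42)
na <- 50; nb <- 60; nc <- 30
x <- c(rnorm(na, 10, 2.4), rnorm(nb, 25, 2), rnorm(nc, 40, 2.3))
gruppi <- factor(rep(c("A", "B", "C"), times = c(na, nb, nc)))

n <- length(x)
K <- nlevels(gruppi)
n_k  <- tapply(x, gruppi, length)
m_k  <- tapply(x, gruppi, mean)
s2_k <- tapply(x, gruppi, var)      # sample variances S_k^2 (divisor n_k - 1)

v_B <- sum(n_k * (m_k - mean(x))^2)          # between group variation
v_W <- sum((n_k - 1) * s2_k)                 # within group variation
f   <- (v_B / (K - 1)) / (v_W / (n - K))     # observed F statistic
p   <- pf(f, K - 1, n - K, lower.tail = FALSE)

# Reference computation with the built-in ANOVA table
tab <- anova(lm(x ~ gruppi))
c(by_hand = f, from_anova = tab[["F value"]][1])
```

Here `v_B` and `v_W` correspond to the two entries of the "Sum Sq" column of the table returned by `anova`.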

(12)

Analysis of Variance Table

            Degrees       Sum of      Mean of        F value   p-value
            of freedom    Squares     Squares
factor      K − 1         vB          vB/(K − 1)     f         P(F > f)
residuals   n − K         vW          vW/(n − K)
total       n − 1         vB + vW

where small letters indicate the sample values of the estimators

Some software, such as R, does not display the row “total”

(13)

Example 1

> anova(lm(x~gruppi))
Analysis of Variance Table

Response: x
           Df  Sum Sq Mean Sq F value    Pr(>F)
gruppi      2 18301.3  9150.6  1883.3 < 2.2e-16 ***
Residuals 137   665.6     4.9
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 2

> anova(lm(x1~gruppi))
Analysis of Variance Table

Response: x1
           Df Sum Sq Mean Sq F value  Pr(>F)
gruppi      2  28.76 14.3810  2.9598 0.05515 .
Residuals 137 665.65  4.8587
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(14)

Example 3

Analysis of Variance Table

Response: x2
           Df Sum Sq Mean Sq F value  Pr(>F)
gruppi      2   7928  3963.9    3.11 0.04777 *
Residuals 137 174615  1274.6
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In Example 1 there is strong evidence that the three groups are distinct

In Example 2 and Example 3 the evidence against $H_0$ is weak
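The p-value column is the upper tail probability P(F > f) of the Fisher distribution with K − 1 and n − K degrees of freedom. As a small illustration, the sketch below recomputes it for the F value that R reports for Example 2; the 5% level used for the critical value is an assumption made here, not a choice stated in the slides.

```r
# Values read from the ANOVA table of Example 2
f <- 2.9598
K <- 3
n <- 140

# Upper tail probability P(F > f) under H0
p_value <- pf(f, df1 = K - 1, df2 = n - K, lower.tail = FALSE)

# Critical value at the (assumed) 5% level: reject H0 when f > f_crit
f_crit <- qf(0.95, df1 = K - 1, df2 = n - K)

c(p_value = p_value, f_crit = f_crit, reject = f > f_crit)
```

The observed value lies just below the 5% critical value, which is why the evidence against the null hypothesis is described as weak.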

(15)

Example. Nitrogen in red clover plants

Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.

[Figure: strip charts of the nitrogen content (x-axis from 10 to 35), one for each strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS]

> clover=read.table("C:/DATA/anova_redclover.txt",header=T); attach(clover)
> anova(lm(Nitrogen~Strain))
Analysis of Variance Table

Response: Nitrogen
          Df Sum Sq Mean Sq F value    Pr(>F)
Strain     5 847.05 169.409   14.37 1.485e-06 ***
Residuals 24 282.93  11.789

(16)

Code for stripchart

Strain=factor(Strain)   # read.table() no longer creates factors by default in R >= 4.0
for (i in 1:6){
  h=.15*i*2   # vertical position of the i-th strain
  stripchart(Nitrogen[Strain==levels(Strain)[i]],
             method="stack", offset=.5, at=h, pch=19, xlim=c(8,38), cex=2, col="red")
  axis(2, at=h, labels=FALSE)
  text(y=h, par("usr")[1]-0.2, labels=levels(Strain)[i], pos=2, xpd=TRUE)
  abline(h=h); par(new=TRUE)
}

(17)

Part J. Normal Probability Plot

Graphical technique for assessing whether or not the data can be considered a sample from a Normal distribution

The quantiles of the data are plotted against the corresponding quantiles of a standard Normal distribution

If the points form a nearly linear pattern, the Normal distribution is a good model for the data. Departures from the straight line indicate departures from normality
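The construction described above can be sketched by hand: sort the data and pair them with standard Normal quantiles evaluated at suitable plotting positions. The sample below is simulated (its size, mean and standard deviation are assumptions for illustration); `ppoints()` gives the plotting positions that `qqnorm()` itself uses.

```r
# Hypothetical sample
set.seed(7)
y <- rnorm(40, mean = 70, sd = 10)
n <- length(y)

p    <- ppoints(n)      # plotting positions in (0, 1)
theo <- qnorm(p)        # theoretical quantiles of the standard Normal
samp <- sort(y)         # sample quantiles (ordered data)

# Coordinates computed by qqnorm(), without drawing the plot
qq <- qqnorm(y, plot.it = FALSE)

# The hand-made coordinates coincide with those of qqnorm()
all.equal(sort(qq$x), theo)
all.equal(sort(qq$y), samp)
```

Calling `plot(theo, samp)` would then draw the same points as `qqnorm(y)`.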

(18)

Example.

Normal probability plot of PULSE1

qqnorm(PULSE1,pch=16,main="Normal Q-Q plot of PULSE1")
qqline(PULSE1,col="red",lwd=2)

fivenum(PULSE1)
[1] 48 64 71 80 100

round(qnorm(c(0.001,0.25,0.5,0.75,0.999)),3)
[1] -3.090 -0.674 0.000 0.674 3.090

[Figure: Normal Q-Q plot of PULSE1. x-axis: Theoretical Quantiles (from −2 to 2); y-axis: Sample Quantiles (from 50 to 100)]
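By default `qqline()` draws the straight line through the first and third quartiles of the sample and of the standard Normal. The sketch below computes its slope and intercept by hand; since the PULSE1 data are not available here, the five-number summary printed above is used as a stand-in data set, which is an assumption made only for illustration.

```r
# Stand-in data: the five-number summary of PULSE1, not the full data set
y <- c(48, 64, 71, 80, 100)

probs    <- c(0.25, 0.75)
q_sample <- unname(quantile(y, probs))   # sample quartiles (default type 7)
q_theory <- qnorm(probs)                 # quartiles of the standard Normal

# Line through the two points (q_theory, q_sample)
slope     <- diff(q_sample) / diff(q_theory)
intercept <- q_sample[1] - slope * q_theory[1]

c(slope = slope, intercept = intercept)
```

For Normal data the slope estimates the standard deviation and the intercept estimates the mean, so the line gives a quick robust summary of the fitted Normal model.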

Example.

Normal probability plot of BILIRUBINA

qqnorm(BILIRUBINA,pch=16,cex=0.5,main="Normal Q-Q plot of BILIRUBINA")
qqline(BILIRUBINA,col="red",lwd=2)

fivenum(BILIRUBINA)
[1] 0.30 0.80 1.35 3.45 28.00

[Figure: Normal Q-Q plot of BILIRUBINA. x-axis: Theoretical Quantiles (from −3 to 3); y-axis: Sample Quantiles (from 0 to 25)]
