Inferential Statistics
Hypothesis tests
Normal Probability Plot
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part I. Test on the equality of the means in sub-groups
1. Descriptive approach: within group and between group variances
2. Inferential approach: test on the equality of the group means (one-way ANOVA)
Part J. Normal Probability Plot
I. Test on the equality of the means in sub-groups (one-way ANOVA)
I.1 Descriptive approach: decomposition of the variance
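Recall
K groups of sizes $n_1, \dots, n_K$; total sample size $n = \sum_{k=1}^{K} n_k$
group means $\bar x_1, \dots, \bar x_K$ and group variances $\sigma_1^2, \dots, \sigma_K^2$ (computed with denominator $n_k$); overall mean $\bar x_{tot} = \sum_{k=1}^{K} \frac{n_k}{n}\, \bar x_k$
within group variance: weighted sum of the variances of the groups
$$\sum_{k=1}^{K} \frac{n_k}{n}\, \sigma_k^2$$
between group variance: weighted variance of the group means
$$\sum_{k=1}^{K} \frac{n_k}{n}\, (\bar x_k - \bar x_{tot})^2$$
total variance = within group variance + between group variance
$$\sigma_{tot}^2 = \sum_{k=1}^{K} \frac{n_k}{n}\, \sigma_k^2 + \sum_{k=1}^{K} \frac{n_k}{n}\, (\bar x_k - \bar x_{tot})^2$$
Three examples. Group sizes: $n_A = 50$, $n_B = 60$, $n_C = 30$, $n = 140$
Example 1 and Example 2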
Group variances: $\sigma_A^2 = 4.89$, $\sigma_B^2 = 4.94$, $\sigma_C^2 = 4.17$. In both examples the within group variance is
$$\frac{1}{140}\,(50 \times 4.89 + 60 \times 4.94 + 30 \times 4.17) = \frac{666}{140} \simeq 4.76$$
Example 3
Group variances: $\sigma_A^2 = 1190$, $\sigma_B^2 = 1195$, $\sigma_C^2 = 1785$. The within group variance is
$$\frac{1}{140}\,(50 \times 1190 + 60 \times 1195 + 30 \times 1785) = \frac{184750}{140} \simeq 1320$$
Means of the groups
Example 1: $\bar x_A = 10.1$, $\bar x_B = 24.8$, $\bar x_C = 39.8$
Example 2: $\bar x_A = 21.1$, $\bar x_B = 21.8$, $\bar x_C = 22.3$
Example 3: $\bar x_A = 11.8$, $\bar x_B = 18.9$, $\bar x_C = 31.7$
The group means of Example 1 and Example 3 are similar, but their within group variances are very different.
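The decomposition total variance = within + between can be checked numerically on raw data. The following is a minimal Python sketch (standard library only); the three small groups are hypothetical data chosen for illustration, and all variances use denominator n_k (or n), as in the Recall, which is what makes the identity exact.

```python
from statistics import fmean, pvariance

def variance_decomposition(groups):
    """Return (total, within, between) variances of grouped data.

    All variances use denominator n_k or n (pvariance), so that
    total == within + between holds up to floating point rounding.
    """
    data = [x for g in groups for x in g]
    n = len(data)
    x_tot = fmean(data)                                   # overall mean
    total = pvariance(data)                               # sigma_tot^2
    within = sum(len(g) / n * pvariance(g) for g in groups)
    between = sum(len(g) / n * (fmean(g) - x_tot) ** 2 for g in groups)
    return total, within, between

# Hypothetical data: three groups of different sizes
groups = [[1.0, 2.0, 3.0], [4.0, 6.0], [9.0, 10.0, 11.0, 12.0]]
total, within, between = variance_decomposition(groups)
print(f"total={total:.4f}  within={within:.4f}  between={between:.4f}")
```

The weights len(g)/n make groups with more observations count more, exactly as the factors n_k/n in the formulas.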
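Example 1: $\bar x_A = 10.1$, $\bar x_B = 24.8$, $\bar x_C = 39.8$
[Figure: overlaid histograms of groups A, B, C (horizontal axis 0 to 50, frequencies 0 to 10) with vertical lines at $\bar x_A$, $\bar x_B$, $\bar x_C$ and $\bar x_{tot}$; annotations $\sigma_A^2 = 4.97$, $\sigma_B^2 = 5.05$, $\sigma_C^2 = 4.32$, $n_A = 50$, $n_B = 60$, $n_C = 30$]
$$\bar x_{tot} = \frac{1}{140}\,(50 \times 10.1 + 60 \times 24.8 + 30 \times 39.8) = \frac{3187}{140} \simeq 22.8$$
between group variance:
$$\frac{1}{140}\left[50\,(10.1 - 22.8)^2 + 60\,(24.8 - 22.8)^2 + 30\,(39.8 - 22.8)^2\right] \simeq \frac{16974}{140} \simeq 121.2$$
within group variance ≃ 4.76
Example 2: $\bar x_A = 21.1$, $\bar x_B = 21.8$, $\bar x_C = 22.3$
[Figure: overlaid histograms of groups A, B, C (horizontal axis 0 to 50, frequencies 0 to 10) with vertical lines at $\bar x_A$, $\bar x_B$, $\bar x_C$ and $\bar x_{tot}$; annotations $\sigma_A^2 = 4.97$, $\sigma_B^2 = 5.05$, $\sigma_C^2 = 4.32$, $n_A = 50$, $n_B = 60$, $n_C = 30$]
$$\bar x_{tot} = \frac{1}{140}\,(50 \times 21.1 + 60 \times 21.8 + 30 \times 22.3) = \frac{3032}{140} \simeq 21.7$$
between group variance:
$$\frac{1}{140}\left[50\,(21.1 - 21.7)^2 + 60\,(21.8 - 21.7)^2 + 30\,(22.3 - 21.7)^2\right] = \frac{29.4}{140} \simeq 0.21$$
within group variance ≃ 4.76
Example 3: $\bar x_A = 11.8$, $\bar x_B = 18.9$, $\bar x_C = 31.7$
[Figure: overlaid histograms of groups A, B, C (horizontal axis −100 to 150, frequencies 0 to 5) with vertical lines at $\bar x_A$, $\bar x_B$, $\bar x_C$ and $\bar x_{tot}$; annotations $\sigma_A^2 = 1090$, $\sigma_B^2 = 1195$, $\sigma_C^2 = 1785$, $n_A = 50$, $n_B = 60$, $n_C = 30$]
Note the different scale with respect to the previous examples.
$$\bar x_{tot} = \frac{1}{140}\,(50 \times 11.8 + 60 \times 18.9 + 30 \times 31.7) = \frac{2675}{140} \simeq 19.1$$
between group variance:
$$\frac{1}{140}\left[50\,(11.8 - 19.1)^2 + 60\,(18.9 - 19.1)^2 + 30\,(31.7 - 19.1)^2\right] \simeq \frac{7430}{140} \simeq 53.1$$
within group variance ≃ 1320
A measure of how far apart the group means are, relative to the spread within the groups, is the ratio
$$\frac{\text{between group variance}}{\text{within group variance}}$$
Example 1: 121.2/4.76 ≃ 25.5    Example 2: 0.21/4.76 ≃ 0.04    Example 3: 53.1/1320 ≃ 0.04
In Example 1 the means are very different and the ratio is much larger than one. In Examples 2 and 3 the ratio is close to zero: in Example 2 because the means are close to each other, in Example 3 because the within group variance is very large compared with the spread of the means.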
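The ratio can be recomputed directly from the group sizes, means and variances listed for the three examples. This Python sketch assumes, as in the Recall, that the listed variances use denominator n_k; the function name is illustrative.

```python
def ratio_from_summaries(sizes, means, variances):
    """Between/within group variance ratio from per-group summaries.

    `variances` are group variances with denominator n_k.
    Returns (x_tot, between, within, ratio).
    """
    n = sum(sizes)
    x_tot = sum(nk * m for nk, m in zip(sizes, means)) / n          # weighted mean of group means
    between = sum(nk * (m - x_tot) ** 2 for nk, m in zip(sizes, means)) / n
    within = sum(nk * v for nk, v in zip(sizes, variances)) / n
    return x_tot, between, within, between / within

sizes = [50, 60, 30]
examples = {
    "Example 1": ([10.1, 24.8, 39.8], [4.89, 4.94, 4.17]),
    "Example 2": ([21.1, 21.8, 22.3], [4.89, 4.94, 4.17]),
    "Example 3": ([11.8, 18.9, 31.7], [1190, 1195, 1785]),
}
results = {}
for name, (means, variances) in examples.items():
    results[name] = ratio_from_summaries(sizes, means, variances)
    x_tot, between, within, ratio = results[name]
    print(f"{name}: x_tot={x_tot:.1f} between={between:.2f} within={within:.2f} ratio={ratio:.3f}")
```

Using the exact weighted mean instead of its rounded value avoids rounding errors in the squared deviations.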
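R code to generate the data of Example 1 and to draw the histograms
set.seed(1)                                    # optional: makes the random numbers reproducible
na=50; nb=60; nc=30                            # group sizes
ma=10; mb=25; mc=40                            # group means used to simulate the data
a=rnorm(na,ma,2.4); b=rnorm(nb,mb,2); c=rnorm(nc,mc,2.3)
x=c(a,b,c)                                     # pooled data
gruppi=c(rep("A",na),rep("B",nb),rep("C",nc))  # group labels
br=seq(floor(min(x)),ceiling(max(x)),.5)       # breaks covering all the data (hist() fails otherwise)
x_l=c(0,50); y_l=c(0,10)
# overlay the three histograms on the same axes
hist(a,breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l,col="blue"); par(new=TRUE)
hist(b,breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l); par(new=TRUE)
hist(c,breaks=br,main="",xlab="",ylab="",xlim=x_l,ylim=y_l,col="red"); par(new=FALSE)
# vertical lines at the group means and at the overall mean
abline(v=mean(a),col="blue",lwd=2); abline(v=mean(b),lwd=2)
abline(v=mean(c),col="red",lwd=2)
abline(v=mean(x),col="darkgreen",lwd=2)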
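A Python counterpart of the simulation above, as a sketch: it draws the same three Normal samples (sizes, means and standard deviations copied from the R code; the seed is an arbitrary choice) and checks that the overall mean, drawn as the dark green line, equals the weighted mean of the group means.

```python
import random
from statistics import fmean

random.seed(1)  # arbitrary seed, for reproducibility only
# group: (size, mean, standard deviation), as in the R code above
spec = {"A": (50, 10, 2.4), "B": (60, 25, 2.0), "C": (30, 40, 2.3)}
samples = {g: [random.gauss(mu, sd) for _ in range(n)] for g, (n, mu, sd) in spec.items()}

x = [v for s in samples.values() for v in s]             # pooled data, like x=c(a,b,c)
group_means = {g: fmean(s) for g, s in samples.items()}
x_tot = fmean(x)                                         # overall mean
# weighted mean of the group means, with weights n_k / n
weighted = sum(len(samples[g]) * m for g, m in group_means.items()) / len(x)

print({g: round(m, 2) for g, m in group_means.items()}, round(x_tot, 2), round(weighted, 2))
```

The random numbers differ from R's, so the printed means will not match the histograms on the slides; only the structure of the example is reproduced.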
I.2 Inferential approach:
Test on the equality of the means in sub-groups
For each group assume a Normal random variable, with a variance $\sigma^2$ common to all groups (the F test below requires equal group variances):
$$X_k \sim N(\mu_k, \sigma^2), \qquad k = 1, \dots, K$$
Test hypotheses
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K \qquad \text{and} \qquad H_1: \text{at least two means are different}$$
Let $n_1, \dots, n_K$ be the sizes of the K independent samples from $X_1, \dots, X_K$ and let $n = \sum_{k=1}^{K} n_k$ be the total sample size.
The K sample mean random variables are
$$\bar X_1 \sim N\left(\mu_1, \frac{\sigma^2}{n_1}\right), \quad \dots, \quad \bar X_K \sim N\left(\mu_K, \frac{\sigma^2}{n_K}\right)$$
The K sample variance random variables are $S_1^2, \dots, S_K^2$.
The estimators of the between and the within group variances multiplied by n (also called between and within variations) are:
between group variation: weighted variation of the group means
$$V_B = \sum_{k=1}^{K} n_k \left(\bar X_k - \bar X_{tot}\right)^2$$
where $\bar X_{tot} = \sum_{k=1}^{K} \frac{n_k}{n}\, \bar X_k$ is the weighted mean of the sample mean random variables
within group variation: weighted sum of the group variances
$$V_W = \sum_{k=1}^{K} (n_k - 1)\, S_k^2$$
Test statistic
$$F = \frac{V_B/(K-1)}{V_W/(n-K)}$$
Under $H_0$ it follows a Fisher distribution, $F \sim \mathcal{F}_{[K-1,\; n-K]}$, whose mean is $(n-K)/(n-K-2)$ (for $n - K > 2$).
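A Monte Carlo sketch (not part of the original notes) makes the null distribution concrete: with K = 3 Normal groups sharing the same mean and variance, so that H0 holds, the simulated values of F should average close to (n − K)/(n − K − 2). The group sizes, the N(5, 2²) distribution, the number of replications and the seed are arbitrary assumptions.

```python
import random
from statistics import fmean

def f_statistic(groups):
    """One-way ANOVA statistic F = [V_B / (K - 1)] / [V_W / (n - K)]."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [fmean(g) for g in groups]
    x_tot = sum(len(g) * m for g, m in zip(groups, means)) / n            # weighted mean of group means
    v_b = sum(len(g) * (m - x_tot) ** 2 for g, m in zip(groups, means))   # between variation
    v_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)      # within variation, sum of (n_k-1) S_k^2
    return (v_b / (k - 1)) / (v_w / (n - k))

random.seed(2024)          # arbitrary seed
sizes = [10, 12, 8]        # K = 3 groups, n = 30 (arbitrary choice)
reps = 10000
# Under H0 every group is drawn from the same N(5, 2^2) distribution
f_values = [
    f_statistic([[random.gauss(5.0, 2.0) for _ in range(nk)] for nk in sizes])
    for _ in range(reps)
]
n, k = sum(sizes), len(sizes)
print(f"mean of simulated F: {fmean(f_values):.3f}")
print(f"theoretical mean (n-K)/(n-K-2): {(n - k) / (n - k - 2):.3f}")
```

With n − K = 27 the theoretical mean is 27/25 = 1.08; values of F far above this are unlikely under H0, which is why large values lead to rejection.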
High values of the ratio between/within lead to the rejection of $H_0$.
Analysis of Variance Table
             Degrees of   Sum of      Mean          F value   p-value
             freedom      Squares     Squares
factor       K − 1        v_B         v_B/(K − 1)   f         P(F > f)
residuals    n − K        v_W         v_W/(n − K)
total        n − 1        v_B + v_W
where lowercase letters denote the observed (sample) values of the estimators, and f = [v_B/(K − 1)] / [v_W/(n − K)].
Some software, such as R, does not display the "total" row.
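The columns of the table can be computed from raw data as in the following Python sketch; the function name and the nine observations are hypothetical, chosen so that every entry is easy to check by hand.

```python
from statistics import fmean

def anova_table(groups):
    """Rows of a one-way ANOVA table: (source, df, sum_sq, mean_sq, f_value)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    x_tot = fmean([x for g in groups for x in g])                          # overall mean
    means = [fmean(g) for g in groups]
    v_b = sum(len(g) * (m - x_tot) ** 2 for g, m in zip(groups, means))   # between variation
    v_w = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)      # within variation
    ms_b, ms_w = v_b / (k - 1), v_w / (n - k)                              # mean squares
    return [
        ("factor", k - 1, v_b, ms_b, ms_b / ms_w),
        ("residuals", n - k, v_w, ms_w, None),
        ("total", n - 1, v_b + v_w, None, None),
    ]

# Hypothetical data: three groups of three observations
for row in anova_table([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    print(row)
```

Here the group means are 2, 5, 8 and the overall mean is 5, so v_B = 3·9 + 0 + 3·9 = 54 and v_W = 2 + 2 + 2 = 6, giving f = (54/2)/(6/6) = 27.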
Example 1
> anova(lm(x~gruppi))
Analysis of Variance Table
Response: x
           Df  Sum Sq Mean Sq F value    Pr(>F)    
gruppi      2 18301.3  9150.6  1883.3 < 2.2e-16 ***
Residuals 137   665.6     4.9                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Example 2
> anova(lm(x1~gruppi))
Analysis of Variance Table
Response: x1
           Df Sum Sq Mean Sq F value  Pr(>F)  
gruppi      2  28.76 14.3810  2.9598 0.05515 .
Residuals 137 665.65  4.8587                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Example 3
> anova(lm(x2~gruppi))
Analysis of Variance Table
Response: x2
           Df Sum Sq Mean Sq F value  Pr(>F)  
gruppi      2   7928  3963.9    3.11 0.04777 *
Residuals 137 174615  1274.6                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
In Example 1 there is strong evidence that the three group means are different (p-value < 2.2e-16).
In Example 2 and Example 3 the evidence against $H_0$ is weak (p-values ≈ 0.055 and ≈ 0.048).
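When K = 3 the numerator has 2 degrees of freedom and the Fisher survival function has a closed form, P(F > f) = (1 + 2f/(n − K))^(−(n − K)/2). The Python sketch below uses it to recompute the p-values of Examples 2 and 3 from the F values and residual degrees of freedom printed by R. It is valid only for K − 1 = 2; for other K one would use, for instance, R's pf(f, K - 1, n - K, lower.tail = FALSE).

```python
def p_value_k3(f, df_resid):
    """P(F > f) for F ~ Fisher(2, df_resid); valid only when K - 1 = 2."""
    return (1 + 2 * f / df_resid) ** (-df_resid / 2)

# F values and residual degrees of freedom (n - K = 137) from the R output above
for name, f in [("Example 2", 2.9598), ("Example 3", 3.11)]:
    print(name, round(p_value_k3(f, 137), 5))
```

The closed form follows from the Fisher density with 2 numerator degrees of freedom, whose tail integral is elementary.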
Example. Nitrogen in red clover plants
Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.
[Figure: stacked stripcharts of the nitrogen content (horizontal axis 10 to 35), one strip per strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS]
> clover=read.table("C:/DATA/anova_redclover.txt",header=TRUE,stringsAsFactors=TRUE)  # keeps Strain a factor in R >= 4.0, as levels(Strain) below requires
> attach(clover)
> anova(lm(Nitrogen~Strain))
Analysis of Variance Table
Response: Nitrogen
          Df Sum Sq Mean Sq F value    Pr(>F)    
Strain     5 847.05 169.409   14.37 1.485e-06 ***
Residuals 24 282.93  11.789                      
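The remaining columns of the red clover table follow from Df and Sum Sq alone. This short Python check copies those two columns from the R output and recomputes the mean squares, the F value, the total sample size n (total degrees of freedom n − 1 = 5 + 24) and the number of groups K.

```python
# Df and Sum Sq copied from the R output above
df_strain, ss_strain = 5, 847.05
df_resid, ss_resid = 24, 282.93

ms_strain = ss_strain / df_strain      # Mean Sq for Strain
ms_resid = ss_resid / df_resid         # Mean Sq for Residuals
f_value = ms_strain / ms_resid         # F value
n = df_strain + df_resid + 1           # total degrees of freedom are n - 1
k = df_strain + 1                      # factor degrees of freedom are K - 1
print(f"MS_strain={ms_strain:.3f}  MS_resid={ms_resid:.3f}  F={f_value:.2f}  n={n}  K={k}")
```

Small differences in the last digit of Mean Sq come from R printing Sum Sq already rounded.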
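R code for the stripchart
for (i in 1:6){ h=.15*i*2                                   # vertical position of strain i
  stripchart(Nitrogen[Strain==levels(Strain)[i]], method="stack", offset=.5, at=h,
             pch=19, xlim=c(8,38), cex=2, col="red")
  axis(2, at=h, labels=FALSE)                               # tick mark at the strip
  text(x=par("usr")[1]-0.2, y=h, labels=levels(Strain)[i], pos=2, xpd=TRUE)  # strain name in the left margin
  abline(h=h); par(new=TRUE) }                              # overlay the next strip on the same plot
par(new=FALSE)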
Part J. Normal Probability Plot
Graphical technique for assessing whether the data can be considered a sample from a Normal distribution.
The sorted data (sample quantiles) are plotted against the corresponding quantiles of a standard Normal distribution (theoretical quantiles).
If the points form a nearly linear pattern, a Normal distribution is a plausible model for the data; the intercept and the slope of the line then roughly estimate the mean and the standard deviation. Departures from the straight line indicate departures from normality.
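The coordinates of the points and of the reference line can be computed directly. The Python sketch below mimics R's defaults as far as known: qqnorm() takes the standard Normal quantiles at the plotting positions of ppoints(), and qqline() draws the line through the first and third quartiles (sample quantiles of type 7). The function names and the five data values are hypothetical.

```python
from statistics import NormalDist

def qq_points(data):
    """Pairs (theoretical quantile, sample quantile) for a Normal Q-Q plot.

    Plotting positions as in R's ppoints(): p_i = (i - a) / (n + 1 - 2a),
    with a = 3/8 if n <= 10 and a = 1/2 otherwise.
    """
    x = sorted(data)
    n = len(x)
    a = 3 / 8 if n <= 10 else 1 / 2
    z = NormalDist()                                    # standard Normal
    return [(z.inv_cdf((i - a) / (n + 1 - 2 * a)), xi) for i, xi in enumerate(x, start=1)]

def quantile7(sorted_x, p):
    """Sample quantile with linear interpolation (R's default type 7)."""
    h = (len(sorted_x) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])

def qq_line(data):
    """(intercept, slope) of the line through the first and third quartiles."""
    x = sorted(data)
    q1, q3 = quantile7(x, 0.25), quantile7(x, 0.75)
    z1, z3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (q3 - q1) / (z3 - z1)
    return q1 - slope * z1, slope

sample = [3.1, 2.0, 5.5, 4.2, 3.9]        # hypothetical data
for zq, xq in qq_points(sample):
    print(f"{zq:6.3f} {xq:5.1f}")
print(qq_line(sample))
```

Basing the line on the quartiles rather than on a least squares fit keeps it from being pulled by a few extreme points in the tails.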
Example.
Normal probability plot of PULSE1
qqnorm(PULSE1,pch=16,main="Normal Q-Q plot of PULSE1")
qqline(PULSE1,col="red",lwd=2)
fivenum(PULSE1)
[1]  48  64  71  80 100
round(qnorm(c(0.001,0.25,0.5,0.75,0.999)),3)
[1] -3.090 -0.674  0.000  0.674  3.090
[Figure: Normal Q-Q plot of PULSE1. Horizontal axis: Theoretical Quantiles (−2 to 2); vertical axis: Sample Quantiles (50 to 100)]
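The two outputs above can be reproduced in Python. This sketch implements Tukey's five-number summary following the definition used by R's fivenum() (minimum, lower hinge, median, upper hinge, maximum) and recomputes the standard Normal quantiles with statistics.NormalDist; the PULSE1 data are not available here, so fivenum is shown only on hypothetical inputs.

```python
import math
from statistics import NormalDist

def fivenum(data):
    """Tukey's five-number summary, as defined in R's fivenum()."""
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    d = [1, n4, (n + 1) / 2, n + 1 - n4, n]            # 1-based positions
    return [0.5 * (x[math.floor(di) - 1] + x[math.ceil(di) - 1]) for di in d]

# Standard Normal quantiles compared with the five-number summary on the slide
probs = [0.001, 0.25, 0.5, 0.75, 0.999]
z = [round(NormalDist().inv_cdf(p), 3) for p in probs]
print(z)                       # → [-3.09, -0.674, 0.0, 0.674, 3.09]
print(fivenum(range(1, 10)))   # hypothetical data 1..9
```

The hinges are medians of the lower and upper halves of the data, so for large samples they are close to the first and third quartiles.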
Example.
Normal probability plot of BILIRUBINA
qqnorm(BILIRUBINA,pch=16,cex=0.5,main="Normal Q-Q plot of BILIRUBINA")
qqline(BILIRUBINA,col="red",lwd=2)
fivenum(BILIRUBINA)
[1]  0.30  0.80  1.35  3.45 28.00
[Figure: Normal Q-Q plot of BILIRUBINA. Horizontal axis: Theoretical Quantiles (−3 to 3); vertical axis: Sample Quantiles (0 to 25)]
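For a Normal sample the upper and lower hinges lie at about the same distance from the median. As an informal numeric reading of the two five-number summaries printed above (not a formal test of normality, and not part of the original notes), this Python sketch compares the two distances; a value near 1 is compatible with symmetry, a value well above 1 suggests a long right tail.

```python
def hinge_asymmetry(five):
    """(upper hinge - median) / (median - lower hinge); about 1 for symmetric data."""
    _, lower, median, upper, _ = five
    return (upper - median) / (median - lower)

pulse1 = [48, 64, 71, 80, 100]                  # fivenum(PULSE1)
bilirubina = [0.30, 0.80, 1.35, 3.45, 28.00]    # fivenum(BILIRUBINA)
for name, five in [("PULSE1", pulse1), ("BILIRUBINA", bilirubina)]:
    print(name, round(hinge_asymmetry(five), 2))
```

For PULSE1 the ratio is 9/7 ≈ 1.3, while for BILIRUBINA it is about 3.8, and the maximum (28) is far from the upper hinge (3.45): the BILIRUBINA data are strongly right skewed.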