Linear models
including analysis of variance
Eva Riccomagno, Maria Piera Rogantin
DIMA – Università di Genova
riccomagno@dima.unige.it rogantin@dima.unige.it
Part B. ANalysis Of VAriance - ANOVA
1. Introduction
2. Univariate ANOVA
a) One-way ANOVA
b) Two-way ANOVA crossed factors
c) Two-way ANOVA nested factors
3. The Kruskal-Wallis test
4. Multivariate ANOVA (MANOVA)
5. MANOVA with repeated measures
1. Introduction
The general aim is to determine whether a quantitative variable has equal or different means in subgroups defined by one or more qualitative variables (or factors).
The ANOVA can be formulated in at least two equivalent ways:
1. especially when the subgroups are defined by one factor, the strength of the relationship is measured by the ratio
(between-group variance) / (within-group variance)
2. as a special case of the linear model. This gives detailed information on which factor levels have the greatest or smallest impact on the equality or difference of the means. Moreover, it generalizes easily to any number of factors. This is our approach.
Example. Nitrogen in red clover plants
Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.
[Figure: stacked strip charts of Nitrogen (horizontal axis, roughly 10 to 35), one row per strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]
# Read the data; stringsAsFactors=TRUE keeps Strain a factor in R >= 4.0
clover=read.table("C:/DATA/anova_redclover.txt",header=T,stringsAsFactors=TRUE);attach(clover)
# One stacked strip chart per strain, drawn at height h on the same axes
for (i in 1:6){h=.15*i*2
stripchart(Nitrogen[Strain==levels(Strain)[i]],
method="stack", offset=.5,at=h,pch=19,xlim=c(8,38),cex=2,col="red")
axis(2, at=h, labels=FALSE)
text(y=h, par("usr")[1]-0.2, labels=levels(Strain)[i],pos=2,xpd=TRUE)
abline(h=h);par(new=T)}
2. a) Univariate one-way ANOVA
Y quantitative response – A categorical covariate (with s levels) observed on n units
Aim: to determine whether the Y values depend or not on the levels of A.
Model: $Y_{ik} = \mu + \alpha_i + \varepsilon_{ik}$ where:
- $i$ indexes the levels of A, $i = 1, \ldots, s$
- $k$ indexes the replicates within a level, $k = 1, \ldots, n_i$, with $\sum_{i=1}^{s} n_i = n$
and where:
- $\mu$ is the grand mean,
- $\alpha_i$ depends on the $i$-th level of A,
- $\varepsilon_{ik}$ is the residual.
The assumptions of the linear model are:
- sample variables: $(Y_{11}, \ldots, Y_{1n_1}, Y_{21}, \ldots, Y_{2n_2}, \ldots, Y_{s1}, \ldots, Y_{sn_s})$
- $Y_{ik} \sim N(\mu + \alpha_i, \sigma^2)$, $\mathrm{cov}(Y_{ik}, Y_{jh}) = 0$ for all $(i,k) \neq (j,h)$
The model is over-parameterised:
- number of parameters: s + 1
- number of different values of the covariate: s
The mean values of the sample variables are:
$E(Y_{ik}) = \mu + \alpha_i$ for all the units in level $i$, $k = 1, \ldots, n_i$
The s means can all be estimated, but the s + 1 parameters cannot all be estimated individually.
Re-parameterisation of the model
Pay close attention to the parametrisation the software uses!
In R the first level (in lexicographic order) is chosen as the reference level and the estimated parameters are:
(Intercept)   µ + α1
A2            α2 − α1
...
AS            αs − α1
The estimate of the mean value of Yik (µ + αi) is the estimate of Intercept for the reference level and, for i ≥ 2, the sum of the estimates of Intercept and Ai.
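The relation between the coefficients reported by R and the group means can be checked on a small simulated data set. This is only a sketch: the data frame `d`, the factor `A` with levels a, b, c and the simulated values are invented for illustration and do not come from the lecture data.

```r
# Simulated data: a hypothetical factor A with 3 levels, 4 replicates each
set.seed(1)
d <- data.frame(A = factor(rep(c("a", "b", "c"), each = 4)),
                Y = rnorm(12, mean = rep(c(10, 12, 15), each = 4)))

fit <- lm(Y ~ A, data = d)
b <- coef(fit)   # (Intercept), Ab, Ac: level "a" is the reference

# Group means rebuilt from the coefficients
means_from_coef <- c(a = b[["(Intercept)"]],
                     b = b[["(Intercept)"]] + b[["Ab"]],
                     c = b[["(Intercept)"]] + b[["Ac"]])

# Sample mean of Y in each level of A
group_means <- tapply(d$Y, d$A, mean)

print(rbind(means_from_coef, group_means))
```

The two rows printed at the end should coincide: in one-way ANOVA the fitted mean of each level is the sample mean of that level.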
Test of nullity of all parameters except the constant
The hypotheses are:
H0 : α1 = · · · = αs = 0     H1 : at least one different from 0
H0 can also be expressed as equality of the means in the different groups:
H0 : µ + α1 = · · · = µ + αs
The F-statistic comparing the sums of squares of the residuals of the complete model (SSC) and of the reduced model (SS0) is
$$F = \frac{(SS_0 - SS_C)/(s-1)}{SS_C/(n-s)} \sim F_{[s-1,\,n-s]}$$
Remember that the test is one-sided (right tail).
The test statistic F is, up to a constant:
(between-group variance) / (within-group variance)
High values of this ratio indicate high strength of relationship.
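The F-statistic above can be computed directly from the residual sums of squares of the two models. The sketch below uses simulated data (the data frame `d` and its values are assumptions made for illustration) and compares the hand computation with the value reported by `summary`.

```r
# Simulated one-way layout: hypothetical factor A with s = 3 levels, 5 replicates each
set.seed(2)
d <- data.frame(A = factor(rep(c("a", "b", "c"), each = 5)),
                Y = rnorm(15, mean = rep(c(0, 1, 3), each = 5)))
n <- nrow(d)
s <- nlevels(d$A)

complete <- lm(Y ~ A, data = d)   # Y_ik = mu + alpha_i + eps_ik
reduced  <- lm(Y ~ 1, data = d)   # model under H0: constant only

SSC <- sum(resid(complete)^2)     # residual sum of squares, complete model
SS0 <- sum(resid(reduced)^2)      # residual sum of squares, reduced model

F_stat  <- ((SS0 - SSC) / (s - 1)) / (SSC / (n - s))
p_value <- pf(F_stat, s - 1, n - s, lower.tail = FALSE)   # right-tail probability

F_from_R <- summary(complete)$fstatistic[["value"]]
print(c(by_hand = F_stat, from_summary = F_from_R, p_value = p_value))
```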
Example. Nitrogen in red clover plants
Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.
[Figure: stacked strip charts of Nitrogen (horizontal axis, roughly 10 to 35), one row per strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]
The role of level 1 (reference level) is played by the strain 3DOK1, the first in lexicographic order; in the output the strains are denoted by Strain3DOK1, Strain3DOK13, ...
> anova_clover=lm(Nitrogen~Strain)
> summary(anova_clover)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    28.820      1.535  18.769 7.53e-16 ***
Strain3DOK13  -15.560      2.172  -7.166 2.09e-07 ***
Strain3DOK4   -14.180      2.172  -6.530 9.40e-07 ***
Strain3DOK5    -4.840      2.172  -2.229 0.035446 *
Strain3DOK7    -8.900      2.172  -4.099 0.000411 ***
StrainCOMPOS  -10.120      2.172  -4.660 9.85e-05 ***
Residual standard error: 3.433 on 24 degrees of freedom
Multiple R-squared: 0.7496, Adjusted R-squared: 0.6975
F-statistic: 14.37 on 5 and 24 DF, p-value: 1.485e-06
F-statistic: test of nullity of all parameters except the constant or, equivalently, of equality of the nitrogen means in the different strains.
There is strong evidence to reject H0.
Estimated nitrogen means in the different strains:
Strain3DOK1: $\hat\mu + \hat\alpha_1 = 28.82$
Strain3DOK13: $\hat\mu + \hat\alpha_2 = 13.26$ (28.820 − 15.560)
Strain3DOK4: $\hat\mu + \hat\alpha_3 = 14.64$ (28.820 − 14.180)
. . .
How to change the reference level in R (one method)
> clover6= within(clover, Strain <- relevel(factor(Strain), ref = 6))
> anova_clover6=lm(Nitrogen~Strain, clover6)
> summary(anova_clover6) ....
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    18.700      1.535  12.179 9.20e-12 ***
Strain3DOK1    10.120      2.172   4.660 9.85e-05 ***
Strain3DOK13   -5.440      2.172  -2.505   0.0194 *
Strain3DOK4    -4.060      2.172  -1.870   0.0738 .
Strain3DOK5     5.280      2.172   2.431   0.0229 *
Strain3DOK7     1.220      2.172   0.562   0.5794
The tests of nullity of the individual coefficients are:
- for Intercept: H0 : µ + αR = 0, H1 : µ + αR ≠ 0 (R is the reference level)
- for the i-th strain, i ≠ R: H0 : αi − αR = 0, H1 : αi − αR ≠ 0
There is no evidence to reject the hypothesis that the nitrogen mean in Strain3DOK7 is equal to the nitrogen mean in StrainCOMPOS (reference level):
p-value: 0.5794.
Mean estimates: 19.92 (18.70 + 1.22) and 18.70 respectively.
Standardized residuals vs Fitted values
[Figure: scatter plot of the standardized residuals (vertical axis, roughly −3 to 1) against the fitted values (horizontal axis, roughly 15 to 25).]
Notice that there are six distinct fitted values, one for each level of strain.
The graph shows a homogeneous cloud around the horizontal axis, which suggests that the model is adequate.
plot(predict(anova_clover6),rstandard(anova_clover6),pch=16,cex.axis=1.5,
xlab="fitted values",ylab="standardized residuals",cex.lab=1.5); abline(h=0)
2. b) and c) Two-way ANOVA crossed and nested factors
Y quantitative response; A and B factors
Two factors are crossed when every category of one factor occurs in the design with every category of the other factor. In other words, there is at least one observation in every combination of categories for the two factors.
A      1           2           3
B   1  2  3     1  2  3     1  2  3
    x  x  x     x  x  x     x  x  x
    x  x  x     x  x  x     x  x  x
A factor is nested within another factor when each category of the first factor occurs with only one category of the other. In the scheme below B is nested within A: an observation in a given category of B can belong to only one category of A. Not all combinations of categories are represented.
A      1           2           3
B   1  2  3     4  5  6     7  8  9
    x  x  x     x  x  x     x  x  x
    x  x  x     x  x  x     x  x  x
If two factors are crossed, their interaction can be calculated. If they are nested, it cannot.
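Whether two factors are crossed or nested can be read off their contingency table. The sketch below rebuilds the two layouts shown above (two observations per cell); the helper functions `is_crossed` and `is_nested` are hypothetical names introduced here for illustration, not functions of R or of any package.

```r
# Crossed layout: every level of A occurs with every level of B
crossed <- expand.grid(rep = 1:2, B = factor(1:3), A = factor(1:3))

# Nested layout: levels 1-3 of B only in A = 1, levels 4-6 in A = 2, levels 7-9 in A = 3
nested <- data.frame(A = factor(rep(1:3, each = 6)),
                     B = factor(rep(1:9, each = 2)))

# Crossed: no empty cell in the two-way table
is_crossed <- function(f, g) all(table(f, g) > 0)

# Nested: each level of the inner factor appears in exactly one level of the outer factor
is_nested <- function(inner, outer) all(rowSums(table(inner, outer) > 0) == 1)

print(table(A = nested$A, B = nested$B))
print(c(crossed_layout_is_crossed = is_crossed(crossed$A, crossed$B),
        nested_layout_is_nested   = is_nested(nested$B, nested$A)))
```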
2. b) Univariate two-way ANOVA – crossed factors
Two-way ANOVA without interaction
Y quantitative response
A and B categorical covariates with s1 and s2 levels respectively
The model is, for i = 1, . . . , s1 and j = 1, . . . , s2,
$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$
The model is re-parametrised similarly to the one-way case.
In R with the default reference level:
(Intercept)   µ + α1 + β1
A2            α2 − α1
...
AS1           αs1 − α1
B2            β2 − β1
...
BS2           βs2 − β1
The number of estimable parameters is
1 + (s1 − 1) + (s2 − 1) = s1 + s2 − 1
The tests on the nullity of the parameters αi − α1 and βj − β1 are similar to those of one-way ANOVA.
The test on the influence of factor A on the response has hypotheses
H0 : α1 = · · · = αs1 = 0     H1 : at least one different from 0
The test statistic is:
$$F_A = \frac{(SS_R - SS_C)/(s_1 - 1)}{SS_C/(n - (s_1 + s_2 - 1))} \sim F_{[s_1 - 1,\; n - s_1 - s_2 + 1]}$$
where SSC and SSR are the sums of squares of the residuals of the complete and of the reduced model.
Analogously for the factor B where the null hypothesis is H0 : β1 = · · · = βs2 = 0.
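The problem is: what is the reduced model?
Different choices can be made. Here the reduced model is the model without the factor submitted to the test ("type II" or "marginal" tests). This is the default of the function Anova in the package car, whereas the base function anova tests the terms sequentially, in the order in which they enter the formula:
- for A the reduced model is: $Y_{ijk} = \mu + \beta_j + \varepsilon^A_{ijk}$
- for B the reduced model is: $Y_{ijk} = \mu + \alpha_i + \varepsilon^B_{ijk}$.
We do not discuss other choices here.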
If the experiment is balanced some choices are equivalent.
An experiment is balanced if the number of observations of the response in each of the s1 × s2 combinations of levels is equal.
Example. Zooplankton from two lakes and with three nutrients
Six tanks for water from each of two lakes.
One of three nutrient supplements is added to a tank.
After 30 days the zooplankton in a unit volume of water is measured.
Zooplankton Supplement Lake
1 34 1 Rose
2 43 1 Rose
3 57 1 Dennison
4 40 1 Dennison
5 85 2 Rose
6 68 2 Rose
7 67 2 Dennison
8 53 2 Dennison
9 41 3 Rose
10 24 3 Rose
11 42 3 Dennison
12 52 3 Dennison
The experiment is balanced: two replicates for each of the six combinations of levels.
> Supplement=factor(Supplement)
> two_anova=lm(Zooplankton~Supplement+Lake)
Supplement has to be transformed from numeric to factor for ANOVA.
• Tests on the influence of each of the two factors
> anova(two_anova)
Analysis of Variance Table
Response: Zooplankton
           Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement  2 1918.50  959.25  6.4860 0.02117 *
Lake        1   21.33   21.33  0.1442 0.71398
Residuals   8 1183.17  147.90
There is evidence that Supplement influences the Zooplankton quantity (the hypothesis of no influence is rejected), while there is no evidence of an influence of the lake.
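The F value for Supplement in the table above can be reproduced as an explicit comparison between the complete model and the reduced model without Supplement. The sketch below types in the twelve observations of the example; the object names (`zoo`, `complete`, `reduced_A`) are choices made here for illustration.

```r
# Zooplankton data as listed in the example
zoo <- data.frame(
  Zooplankton = c(34, 43, 57, 40, 85, 68, 67, 53, 41, 24, 42, 52),
  Supplement  = factor(rep(1:3, each = 4)),
  Lake        = factor(rep(rep(c("Rose", "Dennison"), each = 2), times = 3)))

# Balance check: number of observations in each combination of levels
cells <- table(zoo$Supplement, zoo$Lake)
print(cells)

complete  <- lm(Zooplankton ~ Supplement + Lake, data = zoo)
reduced_A <- lm(Zooplankton ~ Lake, data = zoo)   # model without Supplement

# F-test comparing the residual sums of squares of the two models
cmp <- anova(reduced_A, complete)
print(cmp)
F_A <- cmp$F[2]
```

Because the experiment is balanced, this marginal comparison gives the same F value as the sequential table printed by `anova(two_anova)`.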
• Tests on equality of the effects of each of the two factors
These tests make precise which levels of Supplement give the greatest contribution to the previous results.
> summary(two_anova) ...
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   44.833      7.021   6.385 0.000212 ***
Supplement2   24.750      8.599   2.878 0.020570 *
Supplement3   -3.750      8.599  -0.436 0.674306
LakeRose      -2.667      7.021  -0.380 0.713980
Residual standard error: 12.16 on 8 degrees of freedom
Multiple R-squared: 0.6211, Adjusted R-squared: 0.4791
F-statistic: 4.372 on 3 and 8 DF, p-value: 0.04228
There is evidence to reject:
H0 : α2 = α1 (effect of Supplement2 = effect of Supplement1) or equivalently
H0 : µ + α2 + βj = µ + α1 + βj for j = 1, 2
i.e. the mean of Zooplankton with Supplement2 is different from the mean with Supplement1, for each lake of origin.
Two-way ANOVA with interaction
The model is, for i = 1, . . . , s1 and j = 1, . . . , s2,
$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$
where the coefficients γij model the interaction between the two factors.
The re-parameterization of the model is even more complicated than the previous case. We do not give here the details.
Tests on the nullity of each of the three groups of parameters
> two_int_anova=lm(Zooplankton~Supplement+Lake+Supplement:Lake)
> anova(two_int_anova)
Analysis of Variance Table
Response: Zooplankton
                Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement       2 1918.50  959.25  9.2532 0.01468 *
Lake             1   21.33   21.33  0.2058 0.66603
Supplement:Lake  2  561.17  280.58  2.7066 0.14529
Residuals        6  622.00  103.67
There is evidence only of the influence of Supplement on the Zooplankton quantity.
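The test on the interaction can also be read as a comparison between the additive model and the model with interaction. The sketch below types in the zooplankton observations again so that it is self-contained; the object names `additive` and `with_int` are illustrative choices.

```r
# Zooplankton data as listed in the example
zoo <- data.frame(
  Zooplankton = c(34, 43, 57, 40, 85, 68, 67, 53, 41, 24, 42, 52),
  Supplement  = factor(rep(1:3, each = 4)),
  Lake        = factor(rep(rep(c("Rose", "Dennison"), each = 2), times = 3)))

additive <- lm(Zooplankton ~ Supplement + Lake, data = zoo)   # no gamma_ij
with_int <- lm(Zooplankton ~ Supplement * Lake, data = zoo)   # adds Supplement:Lake

# Reduced model = additive, complete model = with interaction
cmp <- anova(additive, with_int)
print(cmp)
F_int <- cmp$F[2]   # F statistic for the interaction term
```

The residual sums of squares of the two models are the Residuals entries of the two ANOVA tables above (1183.17 and 622.00), and the F value is the one reported on the Supplement:Lake row.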
2. c) Univariate two-way ANOVA – nested factors
Example. Insecticides to kill mosquitoes
Four chemical companies produce insecticides. The composition of the insecticides differs from company to company.
The factors are nested. See the dataset.
Response variable: number of live mosquitoes 4 hours after an insecticide is sprayed on 400 mosquitoes inside a glass.
Three replications are performed for each product.
Product
Company 1 2 3 4 Total
A 3 3 3 0 9
B 3 3 0 0 6
C 3 3 0 0 6
D 3 3 3 3 12
Total 12 12 6 3 33
> mosquitos=read.table("C:/DATA/mosquitos.txt",header=TRUE);attach(mosquitos)
> nested_anova=lm(NMosquito~Company+Product/Company)
> anova(nested_anova)
Analysis of Variance Table
Response: NMosquito
          Df  Sum Sq Mean Sq F value    Pr(>F)
Company    3 22813.3  7604.4 132.776 3.048e-14 ***
Product    7  1500.6   214.4   3.743  0.008098 **
Residuals 22  1260.0    57.3
There is very strong evidence to reject the equality of means of the number of live mosquitos for the different companies and products.
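The degrees of freedom of a nested design can be explored on simulated data with the same layout as the table of replications (3, 2, 2 and 4 products for companies A to D, three replicates each). This is a sketch under assumptions: the response `y` is simulated noise, products are numbered within company, and the formula `Company / Product` (which R expands to `Company + Company:Product`) is one common way to write nesting; it is not necessarily how the lecture data set is coded.

```r
set.seed(3)
# Number of products per company, as in the table of replications
n_prod <- c(A = 3, B = 2, C = 2, D = 4)

# One row per replicate: products are numbered 1, 2, ... within each company
design <- do.call(rbind, lapply(names(n_prod), function(co)
  expand.grid(Company = co, Product = seq_len(n_prod[[co]]), rep = 1:3,
              stringsAsFactors = FALSE)))
design$Company <- factor(design$Company)
design$Product <- factor(design$Product)
design$y <- rnorm(nrow(design), mean = 100)   # invented response

# Product nested within Company
fit <- lm(y ~ Company / Product, data = design)
tab <- anova(fit)
print(tab)
```

With 11 distinct products in 4 companies, the nested term has 11 − 4 = 7 degrees of freedom and the residuals have 33 − 11 = 22, the same counts as in the output above.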
3. The distribution-free counterpart of one-way ANOVA: the Kruskal-Wallis test
It is used for comparing two or more independent samples. It extends the Mann-Whitney U test.
Consider s groups of size n1, . . . , ns respectively, with $n = \sum_{i=1}^{s} n_i$. Thus for group i there are ni i.i.d. sample variables:
Xi = (Xi,1, . . . , Xi,ni) i = 1, . . . , s
The null hypothesis is that the s vectors of sample variables have the same distribution:
H0 : Xi ∼ Xj for all i, j = 1, . . . , s
The alternative hypothesis is that some of the Xi’s tend to yield larger values than other Xj’s do.
As in the Mann-Whitney test, the ranks of the sample variables of all groups pooled together are considered, ignoring group membership.
If the data contain no ties the test statistic is:
$$H = \frac{12}{n(n+1)} \sum_{i=1}^{s} n_i \bar{R}_i^2 - 3(n+1)$$
where $\bar{R}_i$ is the sample mean of the ranks of group $i$.
Under the null hypothesis, $H \overset{\text{approx}}{\sim} \chi^2_{[s-1]}$.
As $\bar{R}_i$ is the sum of the ranks of group $i$ divided by $n_i$, each sample size should not be too small (at least 5) for the approximation to be valid.
In the presence of ties, each tied value is assigned the average of the ranks the tied values would have received had they not been tied, and the formula above is slightly modified.
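The statistic H can be computed by hand from the pooled ranks and compared with the value returned by `kruskal.test`. The sketch below uses a small invented sample without ties (the vector `x` and the grouping `g` are assumptions made for illustration).

```r
# Hypothetical sample: 3 groups of 5 observations, no tied values
x <- c(2.1, 3.4, 1.9, 4.0, 2.8,
       5.2, 6.1, 4.7, 5.9, 6.6,
       3.9, 4.4, 5.0, 3.1, 4.9)
g <- factor(rep(c("g1", "g2", "g3"), each = 5))
n <- length(x)
s <- nlevels(g)

r      <- rank(x)                  # ranks in the pooled sample
n_i    <- tapply(r, g, length)     # group sizes
Rbar_i <- tapply(r, g, mean)       # mean rank of each group

H <- 12 / (n * (n + 1)) * sum(n_i * Rbar_i^2) - 3 * (n + 1)
p_value <- pchisq(H, df = s - 1, lower.tail = FALSE)   # chi-squared approximation

kt <- kruskal.test(x ~ g)
print(c(H_by_hand = H, H_from_R = unname(kt$statistic), p_value = p_value))
```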
Example. Nitrogen in red clover plants
> kruskal.test(Nitrogen~Strain, clover6)
Kruskal-Wallis rank sum test
data: Nitrogen by Strain
Kruskal-Wallis chi-squared = 21.659, df = 5, p-value = 0.0006077
4. Multivariate ANOVA (MANOVA)
Multivariate linear model
Extend the regression model to the situation with m responses $Y^1, Y^2, \ldots, Y^m$ and the same set of covariates $X_1, \ldots, X_{p-1}$ on each sample unit.
Each response follows its own regression model:
$$\begin{aligned}
Y^1 &= \beta_0^1 + \beta_1^1 X_1 + \beta_2^1 X_2 + \cdots + \beta_{p-1}^1 X_{p-1} + \varepsilon^1 \\
Y^2 &= \beta_0^2 + \beta_1^2 X_1 + \beta_2^2 X_2 + \cdots + \beta_{p-1}^2 X_{p-1} + \varepsilon^2 \\
&\;\;\vdots \\
Y^m &= \beta_0^m + \beta_1^m X_1 + \beta_2^m X_2 + \cdots + \beta_{p-1}^m X_{p-1} + \varepsilon^m
\end{aligned}$$
with $Y_i^j \sim N(x_i^t \beta^j, \sigma_j^2)$ where $\beta^j$ is the vector $(\beta_0^j, \ldots, \beta_{p-1}^j)$.
Or equivalently $\varepsilon_i^j \sim N(0, \sigma_j^2)$.
Point estimator of the coefficients
For each response $Y^j$ the coefficients $\beta^j$ are estimated by $B^j$, as in the univariate linear model.
Inference on the coefficients (tests and confidence intervals): different from the univariate case.
Indeed, the sample variables are:
$$\begin{array}{ccc}
Y^1 & \cdots & Y^m \\
\hline
Y_1^1 & \cdots & Y_1^m \\
\vdots & \ddots & \vdots \\
Y_i^1 & \cdots & Y_i^m \\
\vdots & \ddots & \vdots \\
Y_n^1 & \cdots & Y_n^m
\end{array}$$
While in each column the sample variables can be assumed independent (different units), in each row they cannot (same unit).
The variance/covariance matrix of the sample variables $Y_i^1, Y_i^2, \ldots, Y_i^m$, and of the residuals, is assumed equal for each unit: Σ
In the multivariate models Σ plays the role of the common variance σ2 in the univariate case.
Test statistics for subset of coefficients
• univariate case: based on the sum of squares of residuals (of the complete and reduced models); recall that the point estimator of σ2 is
SSC/(n − p)
• multivariate case: based on the point estimator of the variance/covariance matrix Σ (of the complete and reduced models)
The estimate of Σ (multiplied by n − p, where p is the number of estimable coefficients of the model) is indicated by R as Sum of squares and products for error
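The matrix of sums of squares and products for error, and the estimate of Σ obtained from it, can be computed from the residuals of a multivariate `lm` fit. The sketch below uses simulated data with two responses and one covariate (the names `X1`, `Y1`, `Y2` and the simulated values are assumptions made for illustration).

```r
set.seed(4)
n  <- 20
X1 <- rnorm(n)
# Two hypothetical responses observed on the same units
Y <- cbind(Y1 =  1 + 2.0 * X1 + rnorm(n),
           Y2 = -1 + 0.5 * X1 + rnorm(n))

fit <- lm(Y ~ X1)              # one regression per column of Y
p   <- nrow(coef(fit))         # coefficients per response (here: intercept and slope)

E <- crossprod(resid(fit))     # sum of squares and products for error (m x m)
Sigma_hat <- E / (n - p)       # estimate of Sigma

print(E)
print(Sigma_hat)
```

The diagonal entries of `Sigma_hat` coincide with the univariate estimates SSC/(n − p) of each response; the off-diagonal entries estimate the covariance between responses on the same unit.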
Multivariate ANOVA
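Consider a two-way MANOVA model for the response $j$ ($i$ level of A, $h$ level of B, $k$ replicate):
$$Y_{ihk}^j = \mu^j + \alpha_i^j + \beta_h^j + \varepsilon_{ihk}^j$$
The coefficients are (here the coefficients of $Y^j$ are denoted by $\theta^j$):
$$\begin{array}{ccc}
\theta^1 & \cdots & \theta^m \\
\hline
\mu^1 & \cdots & \mu^m \\
\alpha_1^1 & \cdots & \alpha_1^m \\
\vdots & & \vdots \\
\alpha_{s_1}^1 & \cdots & \alpha_{s_1}^m \\
\beta_1^1 & \cdots & \beta_1^m \\
\vdots & & \vdots \\
\beta_{s_2}^1 & \cdots & \beta_{s_2}^m
\end{array}$$
The multivariate test on the influence of the factor A on the responses has null hypothesis:
$$H_0 : \alpha_1^1 = \cdots = \alpha_{s_1}^1 = \cdots = \alpha_1^m = \cdots = \alpha_{s_1}^m$$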
Analogously for the factor B.
R, in the package car, reports the following test statistics:
1. Wilks' Lambda
2. Pillai's trace
3. Hotelling's trace
4. Roy's Maximum Root
Generally the decision of the test is the same with the different statistics.
Example. Plastic film
Effects of rate of extrusion and amount of additive on extruding plastic film.
Response variables: tear resistance, gloss and opacity.
Covariates: extrusion rate and additive amount (both binary).
[Figure: box plots of Resistance, Gloss and Opacity by Extrusion_rate (0, 1) and by Additive (0, 1).]
plastic=read.table("C:/DATA/plastic_film.txt",header=T,sep="\t")
attach(plastic)
par(cex.axis=1.5, cex.lab=1.8, cex.main=2,lwd=2); par(mfrow=c(1,3))
for (var in colnames(plastic)[3:5])
boxplot(plastic[, var]~Extrusion_rate,main=var,xlab="Extrusion_rate")
Univariate ANOVA for plastic film
> m_an=lm(cbind(Resistance,Gloss,Opacity)~ Extrusion_rate*Additive)
> summary(m_an)
• Response Resistance
Residuals:
   Min     1Q Median     3Q    Max
-0.580 -0.205  0.060  0.220  0.520
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               6.3000     0.1485  42.426   <2e-16 ***
Extrusion_rate            0.5800     0.2100   2.762   0.0139 *
Additive                  0.3800     0.2100   1.810   0.0892 .
Extrusion_rate:Additive   0.0200     0.2970   0.067   0.9471
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.332 on 16 degrees of freedom
Multiple R-squared: 0.5864, Adjusted R-squared: 0.5089
F-statistic: 7.563 on 3 and 16 DF, p-value: 0.00227
At the 5% level there is no evidence of an influence of Additive on Resistance.
• Response Gloss
Residuals:
   Min     1Q Median     3Q    Max
-0.600 -0.245 -0.070  0.325  0.700
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               9.5600     0.1812  52.746  < 2e-16 ***
Extrusion_rate           -0.8400     0.2563  -3.277  0.00474 **
Additive                  0.0200     0.2563   0.078  0.93877
Extrusion_rate:Additive   0.6600     0.3625   1.821  0.08740 .
Residual standard error: 0.4053 on 16 degrees of freedom
Multiple R-squared: 0.4832, Adjusted R-squared: 0.3863
F-statistic: 4.987 on 3 and 16 DF, p-value: 0.01247
There is no evidence of an influence of Additive on Gloss.
• Response Opacity
Residuals:
   Min     1Q Median     3Q    Max
-3.120 -1.615  0.220  1.185  3.380
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               3.7400     0.9009   4.152 0.000751 ***
Extrusion_rate           -0.6000     1.2740  -0.471 0.644030
Additive                  0.1000     1.2740   0.078 0.938410
Extrusion_rate:Additive   1.7800     1.8017   0.988 0.337886
Residual standard error: 2.014 on 16 degrees of freedom
Multiple R-squared: 0.1251, Adjusted R-squared: -0.03897
F-statistic: 0.7625 on 3 and 16 DF, p-value: 0.5315
There is no evidence of an influence of either covariate on Opacity.
Multivariate ANOVA for plastic film
> library(car)
> manova_pl = Anova(m_an)
> summary(manova_pl)
• Estimate of Σ in the complete model (multiplied by n − p)
Type II MANOVA Tests:
Sum of squares and products for error:
           Resistance  Gloss Opacity
Resistance      1.764  0.020  -3.070
Gloss           0.020  2.628  -0.552
Opacity        -3.070 -0.552  64.924
• Test for Extrusion rate.
Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Extrusion rate.
Term: Extrusion_rate
Sum of squares and products for the hypothesis:
           Resistance   Gloss Opacity
Resistance     1.7405 -1.5045  0.8555
Gloss         -1.5045  1.3005 -0.7395
Opacity        0.8555 -0.7395  0.4205
Multivariate Tests: Extrusion_rate
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.6181416 7.554269      3     14 0.003034 **
Wilks             1 0.3818584 7.554269      3     14 0.003034 **
Hotelling-Lawley  1 1.6187719 7.554269      3     14 0.003034 **
Roy               1 1.6187719 7.554269      3     14 0.003034 **
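The four test statistics for Extrusion_rate can be recomputed from the two matrices printed above: the error matrix E and the hypothesis matrix H. The sketch below types the matrices in as printed (so to the displayed precision) and uses the usual matrix definitions of the four statistics; the variable names are illustrative choices.

```r
resp <- c("Resistance", "Gloss", "Opacity")

# Sum of squares and products for error, as printed by summary(manova_pl)
E <- matrix(c( 1.764,  0.020, -3.070,
               0.020,  2.628, -0.552,
              -3.070, -0.552, 64.924), 3, 3, dimnames = list(resp, resp))

# Sum of squares and products for the hypothesis (term Extrusion_rate)
H <- matrix(c( 1.7405, -1.5045,  0.8555,
              -1.5045,  1.3005, -0.7395,
               0.8555, -0.7395,  0.4205), 3, 3, dimnames = list(resp, resp))

wilks     <- det(E) / det(E + H)                   # Wilks' Lambda
pillai    <- sum(diag(H %*% solve(E + H)))         # Pillai's trace
hotelling <- sum(diag(H %*% solve(E)))             # Hotelling-Lawley trace
# Roy: largest eigenvalue of E^{-1} H (Re() guards against rounding noise)
roy       <- max(Re(eigen(solve(E) %*% H, only.values = TRUE)$values))

print(round(c(Pillai = pillai, Wilks = wilks,
              "Hotelling-Lawley" = hotelling, Roy = roy), 4))
```

Since the factor has one degree of freedom, H has rank one: Pillai's trace equals 1 − Wilks' Lambda and the Hotelling-Lawley trace equals Roy's root, which is why the four approximate F values in the table coincide.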
• Test for Additive.
Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Additive.
Term: Additive
Sum of squares and products for the hypothesis:
           Resistance  Gloss Opacity
Resistance     0.7605 0.6825  1.9305
Gloss          0.6825 0.6125  1.7325
Opacity        1.9305 1.7325  4.9005
Multivariate Tests: Additive
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.4769651 4.255619      3     14 0.024745 *
Wilks             1 0.5230349 4.255619      3     14 0.024745 *
Hotelling-Lawley  1 0.9119183 4.255619      3     14 0.024745 *
Roy               1 0.9119183 4.255619      3     14 0.024745 *
• Test for the interaction Extrusion rate:Additive.
Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of the interaction.
Term: Extrusion_rate:Additive
Sum of squares and products for the hypothesis:
           Resistance  Gloss Opacity
Resistance     0.0005 0.0165  0.0445
Gloss          0.0165 0.5445  1.4685
Opacity        0.0445 1.4685  3.9605
Multivariate Tests: Extrusion_rate:Additive
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.2228942 1.338522      3     14 0.30178
Wilks             1 0.7771058 1.338522      3     14 0.30178
Hotelling-Lawley  1 0.2868261 1.338522      3     14 0.30178
Roy               1 0.2868261 1.338522      3     14 0.30178
The multivariate tests give no evidence of an influence of the interaction on the responses.
5. MANOVA with repeated measure
The repeated measures models are special cases of the multivariate models, where the responses are the same variable considered under different conditions, usually at consecutive times.
In addition to the usual univariate and multivariate tests, the influence of the factors can be tested on appropriate linear transformations of the responses. For instance, one can consider univariate models with response:
- the mean of the responses: $(Y^1 + \cdots + Y^m)/m$
- and/or the m − 1 consecutive differences: $Y^2 - Y^1, Y^3 - Y^2, \ldots, Y^m - Y^{m-1}$
- and/or the m − 1 differences from a special condition, e.g. the first or the last one: $Y^2 - Y^1, Y^3 - Y^1, \ldots, Y^m - Y^1$
- . . .
The consecutive differences and the differences from a special condition allow us, for example, to determine if and when changes occurred in the time evolution of the phenomenon.
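The transformed responses listed above are simple to build from a matrix with one column per condition. The sketch below uses a small invented matrix `Y` (4 subjects, m = 3 conditions); all names and values are assumptions made for illustration.

```r
# Hypothetical responses: 4 subjects (rows) measured under m = 3 conditions (columns)
Y <- cbind(Y1 = c(5, 7, 6, 4),
           Y2 = c(6, 9, 6, 7),
           Y3 = c(9, 12, 8, 11))
m <- ncol(Y)

# Mean of the responses: (Y1 + ... + Ym) / m
mean_resp <- rowMeans(Y)

# m - 1 consecutive differences: Y2 - Y1, Y3 - Y2
consec <- Y[, -1, drop = FALSE] - Y[, -m, drop = FALSE]
colnames(consec) <- paste0(colnames(Y)[-1], "_minus_", colnames(Y)[-m])

# m - 1 differences from the first condition: Y2 - Y1, Y3 - Y1
from_first <- Y[, -1, drop = FALSE] - Y[, 1]
colnames(from_first) <- paste0(colnames(Y)[-1], "_minus_", colnames(Y)[1])

print(cbind(mean = mean_resp, consec, from_first))
```

Each column obtained in this way can then be used as the response of a univariate model, for example `lm(consec[, 1] ~ treatment)` for a hypothetical factor `treatment`.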
In general, it is a mistake to consider the condition under which the response variable is measured as a further factor.
Indeed the response variables of the same subject cannot be considered independent.
It is legitimate to do so only if certain technical restrictions hold (e.g. sphericity of the estimated correlation matrix), which we do not discuss here.
Example. Effect of treatments at three times
A response is measured three times for each subject (pre-treatment, post-treatment, and in a later follow-up). Each subject randomly receives one of three treatments: A, B, or the control.
The Control is chosen as reference level.
data=read.table(col.names=
c("treat","PreY","PostY","FollowY"), stringsAsFactors=TRUE, text="
A 0 0 9
A 6 6 3
A 8 2 6
A 7 6 4
A 6 12 6
A 13 3 8
B 8 11 27
B 9 3 26
B 12 0 18
B 3 0 14
B 3 0 25
B 4 2 9
Control 4 3 7
Control 8 7 20
Control 2 0 10
Control 5 8 14
Control 1 0 11
Control 8 9 10")
data2= within(data, treat <-
relevel(treat, ref ="Control"))
attach(data2)
Univariate and Multivariate ANOVA
rep_m=lm(cbind(PreY,PostY,FollowY)~treat)
a) Univariate ANOVA
summary(rep_m)
• response PreY
Residuals:
    Min      1Q  Median      3Q     Max
-6.6667 -2.6250 -0.1667  2.2500  6.3333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.667      1.491   3.129  0.00689 **
treatA         2.000      2.109   0.948  0.35801
treatB         1.833      2.109   0.869  0.39840
Residual standard error: 3.653 on 15 degrees of freedom
Multiple R-squared: 0.06875, Adjusted R-squared: -0.05541
F-statistic: 0.5537 on 2 and 15 DF, p-value: 0.5861
• response PostY
Residuals:
   Min     1Q Median     3Q    Max
-4.833 -2.667 -1.083  2.167  8.333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5000     1.7051   2.639   0.0186 *
treatA        0.3333     2.4114   0.138   0.8919
treatB       -1.8333     2.4114  -0.760   0.4589
Residual standard error: 4.177 on 15 degrees of freedom
Multiple R-squared: 0.05875, Adjusted R-squared: -0.06675
F-statistic: 0.4682 on 2 and 15 DF, p-value: 0.635
• response FollowY
Residuals:
   Min     1Q Median     3Q    Max
-10.83  -2.00  -0.50   2.75   8.00
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   12.000      2.105   5.700  4.2e-05 ***
treatA        -6.000      2.977  -2.015   0.0621 .
treatB         7.833      2.977   2.631   0.0189 *
Residual standard error: 5.156 on 15 degrees of freedom
Multiple R-squared: 0.5915, Adjusted R-squared: 0.537
F-statistic: 10.86 on 2 and 15 DF, p-value: 0.001214
- From the F-statistics: only at the follow-up do the three treatments have different effects on the response.
Indeed, the F-statistic compares the current model with the model containing the constant only; here, with a single factor as covariate, it is a test statistic for the nullity of the coefficients of the factor (treatment).
- From the signs of the coefficients αA − αControl and αB − αControl: at the follow-up treatment A has a negative effect with respect to the Control, while treatment B has a positive effect.
b) Multivariate ANOVA
> library(car)
> mult_rep_m=Anova(rep_m)
> summary(mult_rep_m)
- Estimate of the error matrix in the complete and in the reduced model
Type II MANOVA Tests:
Sum of squares and products for error:
             PreY     PostY   FollowY
PreY    200.16667  84.66667  72.50000
PostY    84.66667 261.66667  90.66667
FollowY  72.50000  90.66667 398.83333
---
Term: treat
Sum of squares and products for the hypothesis:
              PreY      PostY     FollowY
PreY    14.7777778  -4.666667   0.1111111
PostY   -4.6666667  16.333333 -92.6666667
FollowY  0.1111111 -92.666667 577.4444444
- Multivariate tests
Multivariate Tests: treat
                 Df test stat approx F num Df den Df    Pr(>F)
Pillai            2 0.7456838 2.774307      6     28 0.0303162 *
Wilks             2 0.3186403 3.343317      6     26 0.0141337 *
Hotelling-Lawley  2 1.9364644 3.872929      6     24 0.0076269 **
Roy               2 1.8259052 8.520891      3     14 0.0018134 **
From a multivariate point of view we can say that the three treatments have different effects on the response.
To better understand when the differences among the treatments occurred, we can study the following transformations of the response variables:
PostY-PreY   FollowY-PostY   FollowY-PreY
Post_Pre=PostY-PreY
FU_Post=FollowY-PostY
FU_Pre=FollowY-PreY
• PostY-PreY
> reg1=lm(Post_Pre~treat); summary(reg1)
Residuals:
    Min      1Q  Median      3Q     Max
-8.1667 -1.5833  0.8333  1.8333  7.8333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1667     1.8028  -0.092    0.928
treatA       -1.6667     2.5495  -0.654    0.523
treatB       -3.6667     2.5495  -1.438    0.171
Residual standard error: 4.416 on 15 degrees of freedom
Multiple R-squared: 0.1215, Adjusted R-squared: 0.004338
F-statistic: 1.037 on 2 and 15 DF, p-value: 0.3786
• FollowY-PostY
> reg2=lm(FU_Post~treat); summary(reg2)
Residuals:
     Min       1Q   Median       3Q      Max
-10.1667  -3.4167  -0.1667   3.7500   7.8333
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.500      2.307   3.250  0.00538 **
treatA        -6.333      3.263  -1.941  0.07131 .
treatB         9.667      3.263   2.962  0.00969 **
Residual standard error: 5.652 on 15 degrees of freedom
Multiple R-squared: 0.6192, Adjusted R-squared: 0.5684
F-statistic: 12.19 on 2 and 15 DF, p-value: 0.0007167
• FollowY-PreY
> reg3=lm(FU_Pre~treat); summary(reg3)
Residuals:
    Min      1Q  Median      3Q     Max
-8.3333 -3.8333 -0.3333  3.4167  9.6667
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.333      2.246   3.265  0.00522 **
treatA        -8.000      3.176  -2.519  0.02362 *
treatB         6.000      3.176   1.889  0.07838 .
Residual standard error: 5.502 on 15 degrees of freedom
Multiple R-squared: 0.566, Adjusted R-squared: 0.5081
F-statistic: 9.78 on 2 and 15 DF, p-value: 0.001912
We can conclude that only at the follow-up the effect of the three treatments becomes statistically significant.