Linear models including analysis of variance

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

Part B. ANalysis Of VAriance - ANOVA

1. Introduction

2. Univariate ANOVA

a) One-way ANOVA

b) Two-way ANOVA crossed factors

c) Two-way ANOVA nested factors

3. The Kruskal Wallis test

4. Multivariate ANOVA (MANOVA)

5. MANOVA with Repeated measure


1. Introduction

The general aim is to determine whether a quantitative variable has equal or different means in subgroups defined by one or more qualitative variables (or factors).

The ANOVA can be formulated in at least two equivalent ways:

1. especially when the subgroups are defined by one factor, the strength of the relationship is measured by the ratio

between-group variance / within-group variance

2. as a special case of the linear models. This returns detailed information on which factor levels have the greatest/smallest impact on equality or difference of means. Moreover, it generalizes easily to any number of factors. This is our approach.


Example. Nitrogen in red clover plants

Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.

[Figure: strip charts of the nitrogen content (horizontal axis from 10 to 35), one strip for each strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]

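# stringsAsFactors=TRUE makes Strain a factor (it is not the default from R 4.0.0 on)
clover=read.table("C:/DATA/anova_redclover.txt",header=TRUE,stringsAsFactors=TRUE)
attach(clover)
for (i in 1:6){
  h=.15*i*2   # vertical position of the strip of the i-th strain
  stripchart(Nitrogen[Strain==levels(Strain)[i]],
    method="stack",offset=.5,at=h,pch=19,xlim=c(8,38),cex=2,col="red")
  axis(2, at=h, labels=FALSE)
  text(y=h, par("usr")[1]-0.2, labels=levels(Strain)[i],pos=2,xpd=TRUE)
  abline(h=h)
  par(new=TRUE)   # draw the next strip on the same plot
}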


2. a) Univariate one-way ANOVA

Y quantitative response – A categorical covariate (with s levels) observed on n units

Aim: to determine whether the Y values depend or not on the levels of A.

Model: Y_ik = µ + α_i + ε_ik

where:

- i indexes the levels of A, i = 1, . . . , s

- k indexes the replicates within each level, k = 1, . . . , n_i, with Σ_{i=1}^{s} n_i = n

and where:

- µ is the grand mean

- α_i depends on the i-th level of A

- ε_ik is the residual.

The assumptions of the linear model are:

- sample variables: (Y_11, . . . , Y_1n₁, Y_21, . . . , Y_2n₂, . . . , Y_s1, . . . , Y_sn_s)

- Y_ik ∼ N (µ + α_i, σ²), cov(Y_ik, Y_jh) = 0 for all (i, k) ≠ (j, h)


The model is over-parameterised:

- number of parameters: s + 1

- number of different values of the covariate: s

The mean values of the sample variables are:

E(Y_ik) = µ + α_i for all the units in level i, k = 1, . . . , n_i

The s means can all be estimated, but the s + 1 parameters cannot all be estimated individually.

Re-parameterisation of the model

Pay close attention to the parametrization the software uses!

In R the first level (in lexicographic order) is chosen as reference level and the estimated parameters are:

(Intercept)  µ + α₁
A2           α₂ − α₁
...
As           α_s − α₁

The estimate of the mean value of Y_ik (µ + α_i) for i ≥ 2 is obtained by summing the estimates of Intercept and Ai; for the reference level it is the estimate of Intercept alone.
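The re-parameterisation can be checked numerically. The following is a minimal sketch on simulated data; the factor A, its level names a1, a2, a3 and the response y are hypothetical and are not part of the clover example. It shows that the intercept estimates the mean of the reference level and that adding each coefficient recovers the mean of the other levels.

```r
# Simulated data (hypothetical): 3 levels, 4 replicates each
set.seed(1)
A <- factor(rep(c("a1", "a2", "a3"), each = 4))
y <- c(10, 20, 15)[as.integer(A)] + rnorm(12)

fit <- lm(y ~ A)   # default treatment contrasts, reference level "a1"
b <- coef(fit)     # named (Intercept), Aa2, Aa3

# Sample means of the groups and the same means rebuilt from the coefficients
group_means <- tapply(y, A, mean)
from_coefs <- c(a1 = b[["(Intercept)"]],
                a2 = b[["(Intercept)"]] + b[["Aa2"]],
                a3 = b[["(Intercept)"]] + b[["Aa3"]])
print(rbind(group_means, from_coefs))
```

The two printed rows should agree up to rounding error, whatever reference level is chosen.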


Test of nullity of all parameters except the constant

The hypotheses are:

H₀ : α₁ = · · · = α_s = 0    H₁ : at least one different from 0

H₀ can also be expressed as equality of the means in the different groups:

H₀ : µ + α₁ = · · · = µ + α_s

The F-statistic comparing the sums of squares of the residuals of the complete model (SS_C) and of the reduced model (SS₀) is, under H₀,

F = [(SS₀ − SS_C) / (s − 1)] / [SS_C / (n − s)] ∼ F_{[s−1,n−s]}

Remember that the test is one-sided (right tail).

The test statistic F, except for a constant, is the ratio:

between-group variance / within-group variance

High values of this ratio indicate high strength of relationship.
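As an illustration of the formula, here is a small sketch that computes F directly from the two residual sums of squares and compares it with the value reported by summary(). The data are simulated and the names A, y, complete and reduced are assumptions made for this example only.

```r
# Simulated data (hypothetical): s = 3 groups, n = 15 units
set.seed(2)
A <- factor(rep(c("a1", "a2", "a3"), each = 5))
y <- c(0, 1, 3)[as.integer(A)] + rnorm(15)

complete <- lm(y ~ A)   # Y_ik = mu + alpha_i + eps_ik
reduced  <- lm(y ~ 1)   # constant only, i.e. the model under H0

SS_C <- sum(resid(complete)^2)
SS_0 <- sum(resid(reduced)^2)
s <- nlevels(A)
n <- length(y)

F_manual <- ((SS_0 - SS_C) / (s - 1)) / (SS_C / (n - s))
p_manual <- pf(F_manual, s - 1, n - s, lower.tail = FALSE)  # right-tail p-value

F_R <- summary(complete)$fstatistic[["value"]]
print(c(F_manual = F_manual, F_R = F_R, p_value = p_manual))
```

The p-value uses only the right tail of the F distribution, in line with the one-sided nature of the test.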


Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.

[Figure: strip charts of the nitrogen content (horizontal axis from 10 to 35), one strip for each strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]

The role of α₁ is played by the coefficient of the strain 3DOK1, denoted by Strain3DOK1

> anova_clover=lm(Nitrogen~Strain)

> summary(anova_clover)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    28.820      1.535  18.769 7.53e-16 ***
Strain3DOK13  -15.560      2.172  -7.166 2.09e-07 ***
Strain3DOK4   -14.180      2.172  -6.530 9.40e-07 ***
Strain3DOK5    -4.840      2.172  -2.229 0.035446 *
Strain3DOK7    -8.900      2.172  -4.099 0.000411 ***
StrainCOMPOS  -10.120      2.172  -4.660 9.85e-05 ***

Residual standard error: 3.433 on 24 degrees of freedom
Multiple R-squared: 0.7496, Adjusted R-squared: 0.6975
F-statistic: 14.37 on 5 and 24 DF, p-value: 1.485e-06

F-statistic

Test of nullity of all parameters except the constant or, equivalently, equality of the nitrogen means in the different strains.

There is strong evidence to reject H₀.

Estimated nitrogen means in the different strains:

Strain3DOK1: µ̂ + α̂₁ = 28.82

Strain3DOK13: µ̂ + α̂₂ = 13.26 (28.820 − 15.560)

Strain3DOK4: µ̂ + α̂₃ = 14.64 (28.820 − 14.180)

. . .


How to change reference level in R (a method)

> clover6= within(clover, Strain <- relevel(Strain, ref = 6))

> anova_clover6=lm(Nitrogen~Strain, clover6)

> summary(anova_clover6)
....
Coefficients:
(Intercept)    18.700      1.535  12.179 9.20e-12 ***
Strain3DOK1    10.120      2.172   4.660 9.85e-05 ***
Strain3DOK13   -5.440      2.172  -2.505   0.0194 *
Strain3DOK4    -4.060      2.172  -1.870   0.0738 .
Strain3DOK5     5.280      2.172   2.431   0.0229 *
Strain3DOK7     1.220      2.172   0.562   0.5794

The tests of nullity of the individual coefficients are:

- for Intercept: H₀ : µ + α_R = 0, H₁ : µ + α_R ≠ 0 (R is the reference level)

- for the i-th strain, i ≠ R: H₀ : α_i − α_R = 0, H₁ : α_i − α_R ≠ 0

There is no evidence to reject that the nitrogen mean in Strain3DOK7 is equal to the nitrogen mean in StrainCOMPOS (reference level): p-value 0.5794.

Mean estimates: 19.92 (18.70 + 1.22) and 18.70 respectively.


Standardized residuals vs Fitted values

[Figure: standardized residuals (vertical axis, from −3 to 1) against fitted values (horizontal axis, from 15 to 25).]

Notice that there are six fitted values, one for each level of Strain.

The graph shows a homogeneous cloud around the horizontal axis, which supports the assumptions of the model.

plot(predict(anova_clover6),rstandard(anova_clover6),pch=16,cex.axis=1.5,
  xlab="fitted values",ylab="standardized residuals",cex.lab=1.5)
abline(h=0)

2. b) and c) Two-way ANOVA crossed and nested factors

Y quantitative response; A and B factors

Two factors are crossed when every category of one factor occurs in the design with every category of the other factor. In other words, there is at least one observation in every combination of categories for the two factors.

A   1       2       3
B   1 2 3   1 2 3   1 2 3
    x x x   x x x   x x x


A factor is nested within another factor when each category of the first factor occurs with only one category of the other. In the design below B is nested within A: an observation has to be within one specific category of Factor A in order to have a given category of Factor B. Not all combinations of categories are represented.

A   1       2       3
B   1 2 3   4 5 6   7 8 9
    x x x   x x x   x x x

If two factors are crossed, their interaction can be calculated. If they are nested, it cannot.
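The two layouts can be told apart from the contingency table of the two factors. The sketch below builds two hypothetical designs that mirror the schemes above; the data frames crossed and nested and the helper functions is_crossed and is_nested are illustrative names, not part of any package.

```r
# Hypothetical designs mirroring the two layouts above
crossed <- expand.grid(A = factor(1:3), B = factor(1:3))  # every A with every B
nested  <- data.frame(A = factor(rep(1:3, each = 3)),     # B has levels 1..9,
                      B = factor(1:9))                    # each inside a single A

# Crossed: every cell of the A x B table is occupied
is_crossed <- function(d) all(table(d$A, d$B) > 0)

# B nested within A: each level of B occurs with exactly one level of A
is_nested <- function(d) all(colSums(table(d$A, d$B) > 0) == 1)

print(c(crossed = is_crossed(crossed), nested = is_nested(nested)))
```

Inspecting table(A, B) in this way before fitting is a simple guard against requesting an interaction term that the design cannot support.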

2. b) Univariate two-way ANOVA – crossed factors

Two-way ANOVA without interaction

Y quantitative response

A and B categorical covariates with s₁ and s₂ levels respectively

The model is, for i = 1, . . . , s₁ and j = 1, . . . , s₂,

Y_ijk = µ + α_i + β_j + ε_ijk

The model is re-parametrised similarly to the one-way case.

In R with the default reference level:

(Intercept)  µ + α₁ + β₁
A2           α₂ − α₁
...
As₁          α_s₁ − α₁
B2           β₂ − β₁
...
Bs₂          β_s₂ − β₁

The number of estimable parameters is

1 + (s₁ − 1) + (s₂ − 1) = s₁ + s₂ − 1

The tests of nullity of the parameters α_i − α₁ and β_j − β₁ are similar to those of the one-way ANOVA.

The test of the influence of factor A on the response has hypotheses

H₀ : α₁ = · · · = α_s₁ = 0    H₁ : at least one different from 0

The test statistic is:

F_A = [(SS_R − SS_C) / (s₁ − 1)] / [SS_C / (n − (s₁ + s₂ − 1))] ∼ F_{[s₁−1, n−s₁−s₂+1]}

where SS_C and SS_R are the residual sums of squares of the complete and of the reduced model.

Analogously for the factor B where the null hypothesis is H₀ : β₁ = · · · = β_s₂ = 0.

The problem is: what is the reduced model?

Different choices can be made. Here the reduced model is the model without the factor submitted to the test (“type II” or “marginal” tests). This is the default of the function Anova in the package car, whereas the function anova in base R performs sequential (“type I”) tests:

- for A the reduced model is: Y_jk = µ + β_j + ε^A_jk

- for B the reduced model is: Y_ik = µ + α_i + ε^B_ik.

We do not discuss here other choices.

If the experiment is balanced some choices are equivalent.

An experiment is balanced if the number of observations of the response in each of the s₁ × s₂ combinations of levels is equal.


Example. Zooplankton from two lakes and with three nutrients

Six tanks are filled with water from each of two lakes.

One of three nutrient supplements is added to each tank.

After 30 days the zooplankton in a unit volume of water is measured.

Zooplankton Supplement Lake

1 34 1 Rose

2 43 1 Rose

3 57 1 Dennison

4 40 1 Dennison

5 85 2 Rose

6 68 2 Rose

7 67 2 Dennison

8 53 2 Dennison

9 41 3 Rose

10 24 3 Rose

11 42 3 Dennison

12 52 3 Dennison

The experiment is balanced: two replicates for each of the six combinations of levels.

> Supplement=factor(Supplement)

> two_anova=lm(Zooplankton~Supplement+Lake)

Supplement has to be transformed from numeric to factor for ANOVA.


• Tests on the influence of each of the two factors

> anova(two_anova)

Analysis of Variance Table

Response: Zooplankton
           Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement  2 1918.50  959.25  6.4860 0.02117 *
Lake        1   21.33   21.33  0.1442 0.71398
Residuals   8 1183.17  147.90

There is evidence of an influence of Supplement on the Zooplankton quantity (the hypothesis of no influence is rejected, p-value 0.02117), while there is no evidence of an influence of the lake (p-value 0.71398).
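The F values of this table can be reproduced by fitting the complete and the reduced models explicitly. This is a sketch that re-enters the twelve observations of the table by hand; the names zoo, complete, reduced_A and reduced_B are chosen for this illustration only.

```r
# Zooplankton data copied from the table in the text
zoo <- data.frame(
  Zooplankton = c(34, 43, 57, 40, 85, 68, 67, 53, 41, 24, 42, 52),
  Supplement  = factor(rep(1:3, each = 4)),
  Lake        = factor(rep(rep(c("Rose", "Dennison"), each = 2), times = 3))
)

complete  <- lm(Zooplankton ~ Supplement + Lake, data = zoo)
reduced_A <- lm(Zooplankton ~ Lake, data = zoo)        # without Supplement
reduced_B <- lm(Zooplankton ~ Supplement, data = zoo)  # without Lake

# anova(reduced, complete) computes F from the two residual sums of squares
F_A <- anova(reduced_A, complete)$F[2]
F_B <- anova(reduced_B, complete)$F[2]
print(round(c(F_Supplement = F_A, F_Lake = F_B), 4))
```

Because the experiment is balanced, these marginal tests give the same F values as the sequential table printed by anova(two_anova).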

• Tests on equality of the effects of each of the two factors

These tests make precise which levels of Supplement give the greatest contribution to the previous results.

> summary(two_anova)
...
Coefficients:
(Intercept)   44.833      7.021   6.385 0.000212 ***
Supplement2   24.750      8.599   2.878 0.020570 *
Supplement3   -3.750      8.599  -0.436 0.674306
LakeRose      -2.667      7.021  -0.380 0.713980

Multiple R-squared: 0.6211, Adjusted R-squared: 0.4791
F-statistic: 4.372 on 3 and 8 DF, p-value: 0.04228

There is evidence to reject:

H₀ : α₂ = α₁ (effect of Supplement2 = effect of Supplement1)

or equivalently

H₀ : µ + α₂ + β_j = µ + α₁ + β_j for j = 1, 2

i.e. the mean of Zooplankton with Supplement2 is different from the mean with Supplement1, for each lake of origin.


Two-way ANOVA with interaction

The model is, for i = 1, . . . , s₁ and j = 1, . . . , s₂,

Y_ijk = µ + α_i + β_j + γ_ij + ε_ijk

where the coefficients γ_ij model the interaction between the two factors.

The re-parameterization of the model is even more complicated than the previous case. We do not give here the details.

Tests on the nullity of each of the three groups of parameters

> two_int_anova=lm(Zooplankton~Supplement+Lake+Supplement:Lake)

> anova(two_int_anova)

Analysis of Variance Table

Response: Zooplankton
                Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement       2 1918.50  959.25  9.2532 0.01468 *
Lake             1   21.33   21.33  0.2058 0.66603
Supplement:Lake  2  561.17  280.58  2.7066 0.14529
Residuals        6  622.00  103.67

There is evidence only of an influence of Supplement on the Zooplankton quantity.

2. c) Univariate two-way ANOVA – nested factors

Example. Insecticides to kill mosquitoes

Four chemical companies produce insecticides. The composition of the insecticides differs from company to company.

The factors are nested. See the dataset.

Response variable: number of live mosquitoes 4 hours after an insecticide is sprayed on 400 mosquitoes inside a glass.

Three replications are performed for each product.

Product

Company 1 2 3 4 Total

A 3 3 3 0 9

B 3 3 0 0 6

C 3 3 0 0 6

D 3 3 3 3 12

Total 12 12 6 3 33

> mosquitos=read.table("C:/DATA/mosquitos.txt",header=TRUE);attach(mosquitos)

> nested_anova=lm(NMosquito~Company+Product/Company)

> anova(nested_anova)

Analysis of Variance Table

Response: NMosquito
          Df  Sum Sq Mean Sq F value    Pr(>F)
Company    3 22813.3  7604.4 132.776 3.048e-14 ***
Product    7  1500.6   214.4   3.743  0.008098 **
Residuals 22  1260.0    57.3

There is very strong evidence to reject the equality of means of the number of live mosquitos for the different companies and products.


3. The corresponding distribution free test of the one-way ANOVA: the Kruskal Wallis test

It is used for comparing more than two independent samples. It extends the Mann-Whitney U test.

Consider s groups of sizes n₁, . . . , n_s respectively, with n = Σ_{i=1}^{s} n_i. Thus for group i there are n_i i.i.d. sample variables:

X_i = (X_i,1, . . . , X_i,n_i) i = 1, . . . , s

The null hypothesis is that the s vectors of sample variables have the same distribution:

H₀ : X_i ∼ X_j for all i, j = 1, . . . , s

The alternative hypothesis is that some of the X_i’s tend to yield larger values than other X_j’s do.

As in the Mann-Whitney test, the ranks of the sample variables of all groups are considered, ignoring group membership.

If the data contain no ties the test statistic is:

H = 12 / (n(n + 1)) · Σ_{i=1}^{s} n_i R̄_i² − 3(n + 1)

where R̄_i is the sample mean of the ranks of group i.

Under the null hypothesis, H is approximately distributed as χ²_{[s−1]}.

As R̄_i is the sum of the ranks of group i divided by n_i, each sample size should not be too small (at least 5) for the approximation to be valid.

In the presence of ties, each tied value is assigned the average of the ranks that the tied values would have received had they not been tied, and the formula above is slightly modified.
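The statistic H can be computed by hand from the ranks and compared with the output of kruskal.test. The sketch below uses simulated continuous data, so that ties do not occur; the variables g and x are hypothetical.

```r
# Simulated, tie-free data (hypothetical): s = 3 groups of 6 units
set.seed(3)
g <- factor(rep(c("g1", "g2", "g3"), each = 6))
x <- rnorm(18) + c(0, 0.5, 1.5)[as.integer(g)]

r    <- rank(x)                # ranks over all groups together
n    <- length(x)
n_i  <- tapply(r, g, length)   # group sizes
Rbar <- tapply(r, g, mean)     # mean rank of each group

H <- 12 / (n * (n + 1)) * sum(n_i * Rbar^2) - 3 * (n + 1)
p <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)

kw <- kruskal.test(x ~ g)
print(c(H_manual = H, H_R = unname(kw$statistic),
        p_manual = p, p_R = kw$p.value))
```

With tied data the two values of H would differ, because kruskal.test applies the correction for ties mentioned above.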

> kruskal.test(Nitrogen~Strain, clover6)

Kruskal-Wallis rank sum test

data: Nitrogen by Strain

Kruskal-Wallis chi-squared = 21.659, df = 5, p-value = 0.0006077

4. Multivariate ANOVA (MANOVA)

Multivariate linear model

Extend the regression model to the situation with m responses Y¹, Y², . . . , Y^m and the same set of covariates X₁, . . . , X_{p−1} on each sample unit.

Each response follows its own regression model:

Y¹ = β₀¹ + β₁¹ X₁ + β₂¹ X₂ + · · · + β_{p−1}¹ X_{p−1} + ε¹
Y² = β₀² + β₁² X₁ + β₂² X₂ + · · · + β_{p−1}² X_{p−1} + ε²
...
Y^m = β₀^m + β₁^m X₁ + β₂^m X₂ + · · · + β_{p−1}^m X_{p−1} + ε^m

with Y_i^j ∼ N (x_i^t β^j, σ_j²), where β^j is the vector (β₀^j, . . . , β_{p−1}^j).

Or equivalently ε_i^j ∼ N (0, σ_j²)


Point estimator of the coefficients

For each response Y ^j the coefficients β^j are estimated by B^j, as in the univariate linear model.

Inference on the coefficients (tests and confidence intervals): different from the univariate case.

Indeed, the sample variables are:

Y¹    · · ·  Y^m
Y₁¹   · · ·  Y₁^m
...          ...
Y_i¹  · · ·  Y_i^m
...          ...
Y_n¹  · · ·  Y_n^m

While in each column the sample variables can be assumed independent (different units), in each row they cannot (same unit).

The variance/covariance matrix of the sample variables Y_i¹, Y_i², . . . , Y_i^m and of the residuals is assumed equal for each unit: Σ

In the multivariate models Σ plays the role of the common variance σ² in the univariate case.


Test statistics for subset of coefficients

• univariate case: based on the sum of squares of residuals (of the complete and reduced models); recall that the point estimator of σ² is SS_C/(n − p)

• multivariate case: based on the point estimator of the variance/covariance matrix Σ (of the complete and reduced models)

The estimate of Σ (multiplied by n − p, where p is the number of estimable coefficients of the model) is indicated by R as Sum of squares and products for error


Multivariate ANOVA

Consider a two-way MANOVA model: Y_ihk^j = µ^j + α_i^j + β_k^j + ε_ihk^j

The coefficients are (here the coefficients of Y^j are denoted by θ^j):

θ¹       · · ·  θ^m
µ¹       · · ·  µ^m
α₁¹      · · ·  α₁^m
...             ...
α_{s₁}¹  · · ·  α_{s₁}^m
β₁¹      · · ·  β₁^m
...             ...
β_{s₂}¹  · · ·  β_{s₂}^m

The multivariate test of the influence of factor A on the responses has null hypothesis:

H₀ : α₁¹ = · · · = α_{s₁}¹ = · · · = α₁^m = · · · = α_{s₁}^m = 0

Analogously for the factor B.

The package car in R reports the following test statistics:

1. Wilks' Lambda

2. Pillai's trace

3. Hotelling's trace

4. Roy's Maximum Root

Generally the decision of the test is the same with the different statistics.


Example. Plastic film

Effects of rate of extrusion and amount of additive on extruding plastic film.

Response variables: tear resistance, gloss and opacity.

Covariates: extrusion rate and additive amount (both binary).

[Figure: boxplots of Resistance, Gloss and Opacity against Extrusion_rate (levels 0 and 1) and against Additive (levels 0 and 1), three panels for each covariate.]

plastic=read.table("C:/DATA/plastic_film.txt",header=T,sep="\t")
attach(plastic)
par(cex.axis=1.5, cex.lab=1.8, cex.main=2,lwd=2); par(mfrow=c(1,3))
for (var in colnames(plastic)[3:5])
  boxplot(plastic[, var]~Extrusion_rate,main=var,xlab="Extrusion_rate")


Univariate ANOVA for plastic film

> m_an=lm(cbind(Resistance,Gloss,Opacity)~ Extrusion_rate*Additive)

> summary(m_an)

• Response Resistance

Residuals:
   Min     1Q Median     3Q    Max
-0.580 -0.205  0.060  0.220  0.520

Coefficients:
(Intercept)              6.3000     0.1485  42.426   <2e-16 ***
Extrusion_rate           0.5800     0.2100   2.762   0.0139 *
Additive                 0.3800     0.2100   1.810   0.0892 .
Extrusion_rate:Additive  0.0200     0.2970   0.067   0.9471
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.332 on 16 degrees of freedom

There is no evidence, at the 5% level, of an influence of Additive on Resistance.

• Response Gloss

Residuals:
-0.600 -0.245 -0.070  0.325  0.700

Coefficients:
(Intercept)              9.5600     0.1812  52.746  < 2e-16 ***
Extrusion_rate          -0.8400     0.2563  -3.277  0.00474 **
Additive                 0.0200     0.2563   0.078  0.93877
Extrusion_rate:Additive  0.6600     0.3625   1.821  0.08740 .

Residual standard error: 0.4053 on 16 degrees of freedom

There is no evidence, at the 5% level, of an influence of Additive on Gloss.

• Response Opacity

Residuals:
-3.120 -1.615  0.220  1.185  3.380

Coefficients:
(Intercept)              3.7400     0.9009   4.152 0.000751 ***
Extrusion_rate          -0.6000     1.2740  -0.471 0.644030
Additive                 0.1000     1.2740   0.078 0.938410
Extrusion_rate:Additive  1.7800     1.8017   0.988 0.337886

Residual standard error: 2.014 on 16 degrees of freedom
Multiple R-squared: 0.1251, Adjusted R-squared: -0.03897
F-statistic: 0.7625 on 3 and 16 DF, p-value: 0.5315

There is no evidence of an influence of either covariate on Opacity.


Multivariate ANOVA for plastic film

> library(car)

> manova_pl = Anova(m_an)

> summary(manova_pl)

• Estimate of Σ in the complete model (multiplied by n − p)

Type II MANOVA Tests:

Sum of squares and products for error:
           Resistance  Gloss Opacity
Resistance      1.764  0.020  -3.070
Gloss           0.020  2.628  -0.552
Opacity        -3.070 -0.552  64.924

• Test for Extrusion rate.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Extrusion rate.

Term: Extrusion_rate

Sum of squares and products for the hypothesis:
           Resistance   Gloss Opacity
Resistance     1.7405 -1.5045  0.8555
Gloss         -1.5045  1.3005 -0.7395
Opacity        0.8555 -0.7395  0.4205

Multivariate Tests: Extrusion_rate
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.6181416 7.554269      3     14 0.003034 **
Wilks             1 0.3818584 7.554269      3     14 0.003034 **
Hotelling-Lawley  1 1.6187719 7.554269      3     14 0.003034 **
Roy               1 1.6187719 7.554269      3     14 0.003034 **
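The four statistics can be recomputed from the two matrices printed above. The sketch below copies the rounded entries of the error matrix E and of the hypothesis matrix H for Extrusion_rate and applies the standard definitions of the statistics; since the entries are rounded, the results agree with the R output only to about three decimals. The names E, H and stats are chosen for this illustration.

```r
# SSP matrices copied (rounded) from the output above
vars <- c("Resistance", "Gloss", "Opacity")
E <- matrix(c( 1.764,  0.020, -3.070,
               0.020,  2.628, -0.552,
              -3.070, -0.552, 64.924), 3, 3, byrow = TRUE,
            dimnames = list(vars, vars))   # error SSP
H <- matrix(c( 1.7405, -1.5045,  0.8555,
              -1.5045,  1.3005, -0.7395,
               0.8555, -0.7395,  0.4205), 3, 3, byrow = TRUE,
            dimnames = list(vars, vars))   # hypothesis SSP, Extrusion_rate

HEinv <- H %*% solve(E)
stats <- c(
  Pillai    = sum(diag(H %*% solve(H + E))),   # trace of H (H + E)^-1
  Wilks     = det(E) / det(H + E),
  Hotelling = sum(diag(HEinv)),                # trace of H E^-1
  Roy       = max(Re(eigen(HEinv)$values))     # largest eigenvalue of H E^-1
)
print(round(stats, 4))

# With 1 hypothesis Df the F statistic is exact: F = lambda * den Df / num Df
F_stat <- stats[["Hotelling"]] * 14 / 3
print(round(F_stat, 3))
```

Here the factor has a single degree of freedom, so H E⁻¹ has only one non-zero eigenvalue λ and all four statistics are functions of it (Wilks = 1/(1 + λ), Pillai = λ/(1 + λ)); this is why the four rows of the output share the same F and p-value.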

• Test for Additive.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Additive.

Term: Additive

           Resistance  Gloss Opacity
Resistance     0.7605 0.6825  1.9305
Gloss          0.6825 0.6125  1.7325
Opacity        1.9305 1.7325  4.9005

Multivariate Tests: Additive
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.4769651 4.255619      3     14 0.024745 *
Wilks             1 0.5230349 4.255619      3     14 0.024745 *
Hotelling-Lawley  1 0.9119183 4.255619      3     14 0.024745 *
Roy               1 0.9119183 4.255619      3     14 0.024745 *

• Test for the interaction Extrusion rate:Additive.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of the interaction.

Term: Extrusion_rate:Additive

           Resistance  Gloss Opacity
Resistance     0.0005 0.0165  0.0445
Gloss          0.0165 0.5445  1.4685
Opacity        0.0445 1.4685  3.9605

Multivariate Tests: Extrusion_rate:Additive
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.2228942 1.338522      3     14 0.30178
Wilks             1 0.7771058 1.338522      3     14 0.30178
Hotelling-Lawley  1 0.2868261 1.338522      3     14 0.30178
Roy               1 0.2868261 1.338522      3     14 0.30178

The multivariate tests give no evidence of an influence of the interaction on the responses.


5. MANOVA with repeated measure

The repeated measure models are special cases of the multivariate models, where the responses are the same variable considered in different conditions, usually at consecutive times.

In addition to the usual uni- and multi-variate tests, the influence of the factors could be tested on appropriate linear transformations of the responses. For instance, one could consider univariate models with response:

- the mean of the responses: (Y¹ + · · · + Y^m)/m

- or/and m − 1 consecutive differences: Y² − Y¹, Y³ − Y², . . . , Y^m − Y^{m−1}

- or/and m − 1 differences from a special condition, e.g. the first or the last one: Y² − Y¹, Y³ − Y¹, . . . , Y^m − Y¹

- . . .

The consecutive differences and the differences from a special condition allow one, for example, to determine if and when changes occurred in the time evolution of the phenomenon.

In general, it is a mistake to consider the condition under which the response variable is measured as a further factor.

Indeed the response variables of the same subject cannot be considered independent.

It is legitimate to do so only if certain technical restrictions hold (e.g. sphericity of the estimated correlation matrix), which we do not discuss here.

Example. Effect of treatments in three times

A response is measured three times for each subject (pre-treatment, post-treatment, and in a later follow-up). Each subject receives randomly one of three treatments: A, B, or the control.

The Control is chosen as reference level.

data=read.table(col.names=c("treat","PreY","PostY","FollowY"),
  stringsAsFactors=TRUE,  # treat must be a factor for relevel() (not the default from R 4.0.0 on)
  text="
A 0 0 9
A 6 6 3
A 8 2 6
A 7 6 4
A 6 12 6
A 13 3 8
B 8 11 27
B 9 3 26
B 12 0 18
B 3 0 14
B 3 0 25
B 4 2 9
Control 4 3 7
Control 8 7 20
Control 2 0 10
Control 5 8 14
Control 1 0 11
Control 8 9 10")
data2= within(data, treat <- relevel(treat, ref ="Control"))
attach(data2)

Univariate and Multivariate ANOVA

rep_m=lm(cbind(PreY,PostY,FollowY)~treat)

a) Univariate ANOVA

summary(rep_m)

• response PreY

Residuals:
-6.6667 -2.6250 -0.1667 2.2500 6.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.667      1.491   3.129  0.00689 **
treatA         2.000      2.109   0.948  0.35801
treatB         1.833      2.109   0.869  0.39840

• response PostY

Residuals:
-4.833 -2.667 -1.083 2.167 8.333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5000     1.7051   2.639   0.0186 *
treatA        0.3333     2.4114   0.138   0.8919
treatB       -1.8333     2.4114  -0.760   0.4589

• response FollowY

Residuals:
-10.83 -2.00 -0.50 2.75 8.00

Coefficients:
(Intercept)   12.000      2.105   5.700  4.2e-05 ***
treatA        -6.000      2.977  -2.015   0.0621 .
treatB         7.833      2.977   2.631   0.0189 *

Residual standard error: 5.156 on 15 degrees of freedom
Multiple R-squared: 0.5915, Adjusted R-squared: 0.537
F-statistic: 10.86 on 2 and 15 DF, p-value: 0.001214

- From the F-statistic: only at the follow-up do the three treatments have different effects on the response.

Indeed the F-statistic compares the current model with a model with the constant only; in this case, with a single factor as covariate, it is a test statistic for the nullity of the coefficients of the factor (treatment).

- From the sign of the coefficients

α_A − α_Control and α_B − α_Control

at the follow-up the treatment A has a negative effect w.r.t. the Control, while the treatment B has a positive effect.


b) Multivariate ANOVA

> library(car)

> mult_rep_m=Anova(rep_m)

> summary(mult_rep_m)

- Estimate of the error matrix in the complete and in the reduced model

Type II MANOVA Tests:

Sum of squares and products for error:
             PreY     PostY   FollowY
PreY    200.16667  84.66667  72.50000
PostY    84.66667 261.66667  90.66667
FollowY  72.50000  90.66667 398.83333

---
Term: treat

              PreY      PostY     FollowY
PreY    14.7777778  -4.666667   0.1111111
PostY   -4.6666667  16.333333 -92.6666667
FollowY  0.1111111 -92.666667 577.4444444

- Multivariate tests

Multivariate Tests: treat
                 Df test stat approx F num Df den Df    Pr(>F)
Pillai            2 0.7456838 2.774307      6     28 0.0303162 *
Wilks             2 0.3186403 3.343317      6     26 0.0141337 *
Hotelling-Lawley  2 1.9364644 3.872929      6     24 0.0076269 **
Roy               2 1.8259052 8.520891      3     14 0.0018134 **

From a multivariate point of view we could say that the effect of the three treatments on the response is different.

To better understand when the differences among the treatments occurred, we can study the following transformations of the response variables:

PostY-PreY FollowY-PostY FollowY-PreY

Post_Pre=PostY-PreY
FU_Post=FollowY-PostY
FU_Pre=FollowY-PreY

• PostY-PreY

> reg1=lm(Post_Pre~treat); summary(reg1)

Residuals:
-8.1667 -1.5833 0.8333 1.8333 7.8333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1667     1.8028  -0.092    0.928
treatA       -1.6667     2.5495  -0.654    0.523
treatB       -3.6667     2.5495  -1.438    0.171

• FollowY-PostY

> reg2=lm(FU_Post~treat); summary(reg2)

Residuals:
-10.1667 -3.4167 -0.1667 3.7500 7.8333

Coefficients:
treatA   -6.333      3.263  -1.941  0.07131 .
treatB    9.667      3.263   2.962  0.00969 **

• FollowY-PreY

> reg3=lm(FU_Pre~treat); summary(reg3)

Residuals:
-8.3333 -3.8333 -0.3333 3.4167 9.6667

Coefficients:
treatA   -8.000      3.176  -2.519  0.02362 *
treatB    6.000      3.176   1.889  0.07838 .

We can conclude that only at the follow-up the effect of the three treatments becomes statistically significant.
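The estimates of the models on the differences are linked to those of the separate univariate models: since the least squares estimator is linear in the response, the coefficients for a difference of responses are the differences of the coefficients. The sketch below checks this on the data of the example for FollowY − PreY; the object names d, coef_each and coef_diff are chosen for this illustration.

```r
# Data copied from the example in the text
d <- read.table(col.names = c("treat", "PreY", "PostY", "FollowY"),
                stringsAsFactors = TRUE, text = "
A 0 0 9
A 6 6 3
A 8 2 6
A 7 6 4
A 6 12 6
A 13 3 8
B 8 11 27
B 9 3 26
B 12 0 18
B 3 0 14
B 3 0 25
B 4 2 9
Control 4 3 7
Control 8 7 20
Control 2 0 10
Control 5 8 14
Control 1 0 11
Control 8 9 10")
d$treat <- relevel(d$treat, ref = "Control")

# One column of coefficients per response
coef_each <- coef(lm(cbind(PreY, PostY, FollowY) ~ treat, data = d))
# Coefficients of the model on the difference FollowY - PreY
coef_diff <- coef(lm(I(FollowY - PreY) ~ treat, data = d))

print(coef_each[, "FollowY"] - coef_each[, "PreY"])
print(coef_diff)
```

The standard errors and p-values, by contrast, cannot be obtained from the separate models, because they depend on the covariance between the responses measured on the same subject.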