Linear models including analysis of variance

Academic year: 2021

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

riccomagno@dima.unige.it rogantin@dima.unige.it

Part B. ANalysis Of VAriance - ANOVA

1. Introduction
2. Univariate ANOVA
   a) One-way ANOVA
   b) Two-way ANOVA crossed factors
   c) Two-way ANOVA nested factors
3. The Kruskal Wallis test
4. Multivariate ANOVA (MANOVA)
5. MANOVA with Repeated measure


1. Introduction

The general aim is to determine whether a quantitative variable has equal or different means in subgroups defined by one or more qualitative variables (or factors).

The ANOVA can be formulated in at least two equivalent ways:

1. Especially when the subgroups are defined by a single factor, the strength of the relationship is measured by the ratio

$$\frac{\text{between-group variance}}{\text{within-group variance}}$$

2. As a special case of the linear model. This returns detailed information on which factor levels have the greatest or smallest impact on the equality or difference of means. Moreover, it generalizes easily to any number of factors. This is our approach.

Example. Nitrogen in red clover plants

Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.

[Figure: stacked strip charts of the nitrogen content (horizontal axis from 10 to 35), one row per strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]

# Read the data; stringsAsFactors=TRUE keeps Strain a factor (R >= 4.0 reads strings as character)
clover=read.table("C:/DATA/anova_redclover.txt",header=TRUE,stringsAsFactors=TRUE);attach(clover)
# One stacked strip chart per strain, drawn at height h
for (i in 1:6){h=.15*i*2
  stripchart(Nitrogen[Strain==levels(Strain)[i]],
    method="stack",offset=.5,at=h,pch=19,xlim=c(8,38),cex=2,col="red")
  axis(2,at=h,labels=FALSE)
  text(y=h,par("usr")[1]-0.2,labels=levels(Strain)[i],pos=2,xpd=TRUE)
  abline(h=h);par(new=TRUE)}

2. a) Univariate one-way ANOVA

Y quantitative response – A categorical covariate (with s levels) observed on n units

Aim: to determine whether the Y values depend or not on the levels of A.

Model: $Y_{ik} = \mu + \alpha_i + \varepsilon_{ik}$, where:

- i indexes the levels of A, $i = 1, \dots, s$
- k indexes the replicates within a level, $k = 1, \dots, n_i$, with $\sum_{i=1}^{s} n_i = n$

and where:

- $\mu$ is the grand mean,
- $\alpha_i$ depends on the i-th level of A,
- $\varepsilon_{ik}$ is the residual.

The assumptions of the linear model are:

- sample variables: $(Y_{11}, \dots, Y_{1n_1}, Y_{21}, \dots, Y_{2n_2}, \dots, Y_{s1}, \dots, Y_{sn_s})$
- $Y_{ik} \sim N(\mu + \alpha_i, \sigma^2)$, $\mathrm{cov}(Y_{ik}, Y_{jh}) = 0$ for all $(i,k) \neq (j,h)$

The model is over-parameterised:

- number of parameters: s + 1
- number of different values of the covariate: s

The mean values of the sample variables are:

$E(Y_{ik}) = \mu + \alpha_i$ for all the units in level i, $k = 1, \dots, n_i$

The s means can all be estimated, but the s + 1 parameters cannot all be estimated individually.

Re-parameterisation of the model

Pay close attention to the parameterisation the software uses!

In R the first level (in lexicographic order) is chosen as reference level and the estimated parameters are:

(Intercept)   $\mu + \alpha_1$
A2            $\alpha_2 - \alpha_1$
...
As            $\alpha_s - \alpha_1$

The estimate of the mean value of $Y_{ik}$, that is $\mu + \alpha_i$, is obtained by summing the estimates of Intercept and Ai (for the reference level, i = 1, it is the estimate of Intercept alone).
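The re-parameterisation can be checked numerically. The sketch below is only an illustration and is not part of the original slides: it assumes R's built-in `PlantGrowth` data (response `weight`, three-level factor `group`) in place of the generic Y and A, and checks that Intercept plus the coefficient of a level reproduces the sample mean of that level.

```r
# Illustration on R's built-in PlantGrowth data (an assumption: not the slides' data).
fit <- lm(weight ~ group, data = PlantGrowth)

# Default treatment contrasts: (Intercept) is the mean of the reference level (ctrl),
# the other coefficients are differences from the reference level.
b <- coef(fit)
means_from_coef <- c(ctrl = b[["(Intercept)"]],
                     trt1 = b[["(Intercept)"]] + b[["grouptrt1"]],
                     trt2 = b[["(Intercept)"]] + b[["grouptrt2"]])

# Sample means of the response in each level of the factor
group_means <- with(PlantGrowth, tapply(weight, group, mean))

print(rbind(means_from_coef, group_means))
```

The two printed rows should coincide, which is the sense in which the s means are estimable even though the s + 1 original parameters are not.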

Test of nullity of all parameters except the constant

The hypotheses are:

$H_0 : \alpha_1 = \dots = \alpha_s = 0$    $H_1$ : at least one different from 0

H0 can also be expressed as equality of the means in the different groups:

$H_0 : \mu + \alpha_1 = \dots = \mu + \alpha_s$

The F-statistic comparing the sums of squares of the residuals of the complete model ($SS_C$) and of the reduced model ($SS_0$) is

$$F = \frac{(SS_0 - SS_C)/(s-1)}{SS_C/(n-s)} \sim F_{[s-1,\,n-s]}$$

Remember that the test is one-sided (right tail).

The test statistic F, except for a constant, is:

$$\frac{\text{between-group variance}}{\text{within-group variance}}$$

High values of this ratio indicate high strength of relationship.
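As a hedged illustration of the F-statistic, the following sketch computes it by hand from the residual sums of squares of the complete and reduced models and compares it with the value reported by R. It assumes the built-in `PlantGrowth` data rather than the data of the slides.

```r
# Illustration on R's built-in PlantGrowth data (an assumption: not the slides' data).
complete <- lm(weight ~ group, data = PlantGrowth)  # mu + alpha_i
reduced  <- lm(weight ~ 1,     data = PlantGrowth)  # constant only

SSC <- sum(residuals(complete)^2)  # residual sum of squares, complete model
SS0 <- sum(residuals(reduced)^2)   # residual sum of squares, reduced model
n <- nrow(PlantGrowth)
s <- nlevels(PlantGrowth$group)

F_stat  <- ((SS0 - SSC) / (s - 1)) / (SSC / (n - s))
p_value <- pf(F_stat, df1 = s - 1, df2 = n - s, lower.tail = FALSE)  # right tail

# The same statistic as reported by summary() and anova()
F_summary <- summary(complete)$fstatistic[["value"]]
F_anova   <- anova(complete)[["F value"]][1]
print(c(by_hand = F_stat, summary = F_summary, anova = F_anova, p = p_value))
```

The p-value uses only the right tail of the F distribution, in line with the one-sided nature of the test.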


Example. Nitrogen in red clover plants

Effect of bacteria (5 strains and a composite) on the nitrogen content of red clover plants.

[Figure: stacked strip charts of the nitrogen content (horizontal axis from 10 to 35), one row per strain: 3DOK1, 3DOK13, 3DOK4, 3DOK5, 3DOK7, COMPOS.]

The role of α1 is played by the coefficient of the strain 3DOK1, denoted by Strain3DOK1

> anova_clover=lm(Nitrogen~Strain)

> summary(anova_clover)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    28.820      1.535  18.769 7.53e-16 ***
Strain3DOK13  -15.560      2.172  -7.166 2.09e-07 ***
Strain3DOK4   -14.180      2.172  -6.530 9.40e-07 ***
Strain3DOK5    -4.840      2.172  -2.229 0.035446 *
Strain3DOK7    -8.900      2.172  -4.099 0.000411 ***
StrainCOMPOS  -10.120      2.172  -4.660 9.85e-05 ***

Residual standard error: 3.433 on 24 degrees of freedom
Multiple R-squared: 0.7496, Adjusted R-squared: 0.6975
F-statistic: 14.37 on 5 and 24 DF, p-value: 1.485e-06

F-statistic

Test of nullity of all parameters except the constant or, equivalently, of equality of the nitrogen means in the different strains.

There is strong evidence to reject H0.

Estimated nitrogen means in the different strains:

Strain3DOK1: $\hat\mu + \hat\alpha_1 = 28.82$
Strain3DOK13: $\hat\mu + \hat\alpha_2 = 13.26$ (28.820 − 15.560)
Strain3DOK4: $\hat\mu + \hat\alpha_3 = 14.64$ (28.820 − 14.180)
. . .

How to change reference level in R (a method)

> clover6= within(clover, Strain <- relevel(Strain, ref = 6))

> anova_clover6=lm(Nitrogen~Strain, clover6)

> summary(anova_clover6) ....

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    18.700      1.535  12.179 9.20e-12 ***
Strain3DOK1    10.120      2.172   4.660 9.85e-05 ***
Strain3DOK13   -5.440      2.172  -2.505   0.0194 *
Strain3DOK4    -4.060      2.172  -1.870   0.0738 .
Strain3DOK5     5.280      2.172   2.431   0.0229 *
Strain3DOK7     1.220      2.172   0.562   0.5794

The individual tests on the nullity of each coefficient are:

- for Intercept: $H_0 : \mu + \alpha_R = 0$, $H_1 : \mu + \alpha_R \neq 0$ (R is the reference level)
- for the i-th strain, $i \neq R$: $H_0 : \alpha_i - \alpha_R = 0$, $H_1 : \alpha_i - \alpha_R \neq 0$

There is no evidence to reject that the nitrogen mean in Strain3DOK7 is equal to the nitrogen mean in StrainCOMPOS (reference level): p-value 0.5794.

Mean estimates: 19.92 (18.70 + 1.22) and 18.70 respectively.
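The effect of changing the reference level can be sketched on a small built-in data set. The example below is an illustration under the assumption that `PlantGrowth` stands in for the clover data: the coefficients change with the reference level, while the fitted means and the residual sum of squares stay the same.

```r
# Illustration on R's built-in PlantGrowth data (an assumption: not the slides' data).
fit_ctrl <- lm(weight ~ group, data = PlantGrowth)  # reference level: ctrl

pg2 <- within(PlantGrowth, group <- relevel(group, ref = "trt2"))
fit_trt2 <- lm(weight ~ group, data = pg2)          # reference level: trt2

# The coefficients depend on the reference level ...
print(coef(fit_ctrl))
print(coef(fit_trt2))

# ... but the fitted group means and the residual sum of squares do not.
same_fit <- isTRUE(all.equal(fitted(fit_ctrl), fitted(fit_trt2)))
same_rss <- isTRUE(all.equal(deviance(fit_ctrl), deviance(fit_trt2)))

# A coefficient is a difference from the new reference level:
# grouptrt1 in fit_trt2 equals mean(trt1) - mean(trt2)
m <- sapply(split(PlantGrowth$weight, PlantGrowth$group), mean)
print(c(coef = coef(fit_trt2)[["grouptrt1"]], diff = m[["trt1"]] - m[["trt2"]]))
```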


Standardized residuals vs Fitted values

[Figure: standardized residuals (vertical axis, from −3 to 1) against fitted values (horizontal axis, from 15 to 25).]

Notice that there are six distinct fitted values, one for each level of Strain.

The graph shows a homogeneous cloud around the horizontal axis, indicating a good model.

plot(predict(anova_clover6),rstandard(anova_clover6),pch=16,cex.axis=1.5,

xlab="fitted values",ylab="standardized residuals",cex.lab=1.5); abline(h=0)


2. b) and c) Two-way ANOVA crossed and nested factors

Y quantitative response; A and B factors

Two factors are crossed when every category of one factor occurs in the design with every category of the other factor. In other words, there is at least one observation in every combination of categories for the two factors.

A |    1    |    2    |    3
B | 1  2  3 | 1  2  3 | 1  2  3
  | x  x  x | x  x  x | x  x  x
  | x  x  x | x  x  x | x  x  x

A factor is nested within another factor when each category of the first factor occurs with only one category of the other. In the layout below, B is nested within A: an observation must belong to one specific category of A in order to have a given category of B. Not all combinations of categories are represented.

A |    1    |    2    |    3
B | 1  2  3 | 4  5  6 | 7  8  9
  | x  x  x | x  x  x | x  x  x
  | x  x  x | x  x  x | x  x  x

If two factors are crossed, their interaction can be calculated. If they are nested, it cannot.
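The two definitions can be turned into simple checks on the table of counts. The helper functions `is_crossed` and `is_nested` below are hypothetical names written for this illustration only; the two data frames reproduce the layouts shown above with two observations per cell.

```r
# Hypothetical helper functions, written for this illustration only.
# Crossed: every combination of levels of a and b contains at least one observation.
is_crossed <- function(a, b) all(table(a, b) > 0)
# Nested: every level of `inner` occurs together with exactly one level of `outer`.
is_nested <- function(inner, outer) all(rowSums(table(inner, outer) > 0) == 1)

# The two layouts of the slides, with two observations per cell
crossed <- data.frame(A = factor(rep(1:3, each = 6)),
                      B = factor(rep(rep(1:3, each = 2), times = 3)))
nested  <- data.frame(A = factor(rep(1:3, each = 6)),
                      B = factor(rep(1:9, each = 2)))

print(table(crossed$A, crossed$B))  # all cells non-empty
print(table(nested$A, nested$B))    # each level of B appears under one level of A only
print(c(crossed = is_crossed(crossed$A, crossed$B),
        nested  = is_nested(nested$B, nested$A)))
```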

2. b) Univariate two-way ANOVA – crossed factors

Two-way ANOVA without interaction

Y quantitative response

A and B categorical covariates with $s_1$ and $s_2$ levels respectively

The model is, for $i = 1, \dots, s_1$ and $j = 1, \dots, s_2$,

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$$

The model is re-parameterised similarly to the one-way case.

In R with the default reference level:

(Intercept)   $\mu + \alpha_1 + \beta_1$
A2            $\alpha_2 - \alpha_1$
...
AS1           $\alpha_{s_1} - \alpha_1$
B2            $\beta_2 - \beta_1$
...
BS2           $\beta_{s_2} - \beta_1$

The number of estimable parameters is

$1 + (s_1 - 1) + (s_2 - 1) = s_1 + s_2 - 1$

The tests on the nullity of the parameters $\alpha_i - \alpha_1$ and $\beta_j - \beta_1$ are similar to those of the one-way ANOVA.

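The test on the influence of factor A on the response has hypotheses

$H_0 : \alpha_1 = \dots = \alpha_{s_1} = 0$    $H_1$ : at least one different from 0

The test statistic is:

$$F_A = \frac{(SS_R - SS_C)/(s_1 - 1)}{SS_C/(n - (s_1 + s_2 - 1))} \sim F_{[s_1-1,\; n-s_1-s_2+1]}$$

where $SS_C$ and $SS_R$ are the residual sums of squares of the complete and of the reduced model.

Analogously for the factor B, where the null hypothesis is $H_0 : \beta_1 = \dots = \beta_{s_2} = 0$.

The problem is: what is the reduced model?

Different choices can be made. A common choice, which is the default of Anova() in the package car, is the model without the factor submitted to the test ("type II" or "marginal" tests):

- for A the reduced model is: $Y_{ijk} = \mu + \beta_j + \varepsilon^A_{ijk}$
- for B the reduced model is: $Y_{ijk} = \mu + \alpha_i + \varepsilon^B_{ijk}$

The base R function anova() uses instead sequential ("type I") tests, in which each factor is added to the model containing the factors listed before it. We do not discuss other choices here.

If the experiment is balanced some choices are equivalent.

An experiment is balanced if the number of observations of the response in each of the $s_1 \times s_2$ combinations of levels is equal.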
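The statistic for factor A can be computed by hand from the two residual sums of squares. The sketch below is an illustration that assumes R's built-in `warpbreaks` data (not the data of the slides), where `wool` plays the role of A and `tension` the role of B. Because that design is balanced, the marginal test agrees with the sequential table printed by `anova()`.

```r
# Illustration on R's built-in warpbreaks data (an assumption: not the slides' data):
# response breaks, factors wool (A, 2 levels) and tension (B, 3 levels), balanced.
complete  <- lm(breaks ~ wool + tension, data = warpbreaks)
reduced_A <- lm(breaks ~ tension,        data = warpbreaks)  # model without A

SSC <- sum(residuals(complete)^2)
SSR <- sum(residuals(reduced_A)^2)
n  <- nrow(warpbreaks)
s1 <- nlevels(warpbreaks$wool)
s2 <- nlevels(warpbreaks$tension)

F_A <- ((SSR - SSC) / (s1 - 1)) / (SSC / (n - (s1 + s2 - 1)))
p_A <- pf(F_A, s1 - 1, n - s1 - s2 + 1, lower.tail = FALSE)

# The same comparison done by R, and the sequential table from anova()
F_compare    <- anova(reduced_A, complete)[["F"]][2]
F_sequential <- anova(complete)["wool", "F value"]
print(c(by_hand = F_A, model_comparison = F_compare,
        sequential = F_sequential, p = p_A))
```

With an unbalanced design the first and last values would in general differ for the factor listed first in the formula.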

Example. Zooplankton from two lakes and with three nutrients

Six tanks for water from each of two lakes.

One of three nutrient supplements is added to each tank.

After 30 days the zooplankton in a unit volume of water is measured.

Zooplankton Supplement Lake

1 34 1 Rose

2 43 1 Rose

3 57 1 Dennison

4 40 1 Dennison

5 85 2 Rose

6 68 2 Rose

7 67 2 Dennison

8 53 2 Dennison

9 41 3 Rose

10 24 3 Rose

11 42 3 Dennison

12 52 3 Dennison

The experiment is balanced: two replicates for each of the six combinations of levels.

> Supplement=factor(Supplement)

> two_anova=lm(Zooplankton~Supplement+Lake)

Supplement has to be transformed from numeric to factor for ANOVA.
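Since the slides do not show how the data are read, the following sketch types the twelve observations of the table above into a data frame and fits the additive model. The name `zoo` is chosen here for illustration; everything else follows the slides.

```r
# Data typed in from the table of the slides; the data frame name `zoo` is an assumption.
zoo <- data.frame(
  Zooplankton = c(34, 43, 57, 40, 85, 68, 67, 53, 41, 24, 42, 52),
  Supplement  = factor(rep(1:3, each = 4)),  # factor, not numeric
  Lake        = factor(rep(rep(c("Rose", "Dennison"), each = 2), times = 3))
)

# Balance check: number of replicates in each Supplement x Lake cell
print(table(zoo$Supplement, zoo$Lake))

# Additive two-way model, as in the slides
two_anova <- lm(Zooplankton ~ Supplement + Lake, data = zoo)
tab <- anova(two_anova)
print(tab)
```

Passing `data = zoo` avoids the need for `attach()`, which is a design choice of this sketch rather than of the slides.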


• Tests on the influence of each of the two factors

> anova(two_anova)

Analysis of Variance Table
Response: Zooplankton
           Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement  2 1918.50  959.25  6.4860 0.02117 *
Lake        1   21.33   21.33  0.1442 0.71398
Residuals   8 1183.17  147.90

There is evidence to reject the non-influence of Supplement on the Zooplankton quantity (p-value 0.021), while there is no evidence of an influence of Lake (p-value 0.714).

• Tests on equality of the effects of each of the two factors

These tests specify which levels of Supplement contribute most to the previous results.

> summary(two_anova) ...

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   44.833      7.021   6.385 0.000212 ***
Supplement2   24.750      8.599   2.878 0.020570 *
Supplement3   -3.750      8.599  -0.436 0.674306
LakeRose      -2.667      7.021  -0.380 0.713980

Residual standard error: 12.16 on 8 degrees of freedom
Multiple R-squared: 0.6211, Adjusted R-squared: 0.4791
F-statistic: 4.372 on 3 and 8 DF, p-value: 0.04228

There is evidence to reject:

$H_0 : \alpha_2 = \alpha_1$ (effect of Supplement2 = effect of Supplement1)

or equivalently

$H_0 : \mu + \alpha_2 + \beta_j = \mu + \alpha_1 + \beta_j$ for j = 1, 2

i.e. the mean of Zooplankton with Supplement2 differs from the mean with Supplement1, for each lake of origin.

Two-way ANOVA with interaction

The model is, for $i = 1, \dots, s_1$ and $j = 1, \dots, s_2$,

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijk}$$

where the coefficients $\gamma_{ij}$ model the interaction between the two factors.

The re-parameterisation of the model is even more complicated than in the previous case. We do not give the details here.

Tests on the nullity of each of the three groups of parameters

> two_int_anova=lm(Zooplankton~Supplement+Lake+Supplement:Lake)

> anova(two_int_anova)

Analysis of Variance Table
Response: Zooplankton
                Df  Sum Sq Mean Sq F value  Pr(>F)
Supplement       2 1918.50  959.25  9.2532 0.01468 *
Lake             1   21.33   21.33  0.2058 0.66603
Supplement:Lake  2  561.17  280.58  2.7066 0.14529
Residuals        6  622.00  103.67

There is evidence only of an influence of Supplement on the Zooplankton quantity.
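The test on the interaction is a comparison between the additive model and the model with interaction. The sketch below reproduces it from the twelve observations listed in the slides; the data frame name `zoo` and the object names are assumptions made for this illustration.

```r
# Data typed in from the table of the slides; the data frame name `zoo` is an assumption.
zoo <- data.frame(
  Zooplankton = c(34, 43, 57, 40, 85, 68, 67, 53, 41, 24, 42, 52),
  Supplement  = factor(rep(1:3, each = 4)),
  Lake        = factor(rep(rep(c("Rose", "Dennison"), each = 2), times = 3))
)

additive <- lm(Zooplankton ~ Supplement + Lake, data = zoo)
with_int <- lm(Zooplankton ~ Supplement + Lake + Supplement:Lake, data = zoo)

# Reduced (additive) versus complete (with interaction) model
cmp <- anova(additive, with_int)
print(cmp)

F_int <- cmp[["F"]][2]        # F statistic for the interaction
p_int <- cmp[["Pr(>F)"]][2]   # its p-value
```

The F value and p-value of this comparison should match the Supplement:Lake row of the table above.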


2. c) Univariate two-way ANOVA – nested factors

Example. Insecticides to kill mosquitoes

Four chemical companies produce insecticides. The composition of the insecticides differs from company to company.

The factors are nested. See the dataset.

Response variable: number of live mosquitoes 4 hours after an insecticide is sprayed on 400 mosquitoes inside a glass.

Three replications are performed for each product.

                Product
Company     1    2    3    4   Total
A           3    3    3    0     9
B           3    3    0    0     6
C           3    3    0    0     6
D           3    3    3    3    12
Total      12   12    6    3    33

> mosquitos=read.table("C:/DATA/mosquitos.txt",header=TRUE);attach(mosquitos)

> nested_anova=lm(NMosquito~Company+Product/Company)

> anova(nested_anova)

Analysis of Variance Table
Response: NMosquito
          Df  Sum Sq Mean Sq F value    Pr(>F)
Company    3 22813.3  7604.4 132.776 3.048e-14 ***
Product    7  1500.6   214.4   3.743  0.008098 **
Residuals 22  1260.0    57.3

There is very strong evidence to reject the equality of means of the number of live mosquitos for the different companies and products.
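The degrees of freedom of a nested design can be illustrated on simulated data with the same layout as the table above. This is only a sketch: the response `y` is randomly generated and hypothetical, and the nesting is written with R's `Company / Product` operator, which may differ from the coding used in the slides' data file.

```r
set.seed(1)  # simulated data, for illustration only

# Layout of the slides: companies A-D with 3, 2, 2 and 4 products, 3 replicates each
n_products <- c(A = 3, B = 2, C = 2, D = 4)
design <- do.call(rbind, lapply(names(n_products), function(co)
  expand.grid(rep = 1:3, Product = seq_len(n_products[[co]]), Company = co)))
design$Company <- factor(design$Company)
design$Product <- factor(design$Product)
design$y <- rnorm(nrow(design), mean = 100, sd = 10)  # hypothetical response

# Company / Product expands to Company + Company:Product (Product within Company)
fit <- lm(y ~ Company / Product, data = design)
tab <- anova(fit)
print(tab)
```

The 11 company-product cells give 3 degrees of freedom for Company, 7 for Product within Company and 22 for the residuals, as in the table of the slides.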


3. The distribution-free test corresponding to the one-way ANOVA: the Kruskal Wallis test

It is used for comparing two or more independent samples. It extends the Mann-Whitney U test.

Consider s groups of size $n_1, \dots, n_s$ respectively, with $n = \sum_{i=1}^{s} n_i$. Thus for group i there are $n_i$ i.i.d. sample variables:

$X_i = (X_{i,1}, \dots, X_{i,n_i})$, $i = 1, \dots, s$

The null hypothesis is that the s vectors of sample variables have the same distribution:

$H_0 : X_i \sim X_j$ for all $i, j = 1, \dots, s$

The alternative hypothesis is that some of the $X_i$'s tend to yield larger values than other $X_j$'s do.

As in the Mann-Whitney test, the ranks of the sample variables of all groups are considered, ignoring group membership.

If the data contain no ties the test statistic is:

$$H = \frac{12}{n(n+1)} \sum_{i=1}^{s} n_i \bar{R}_i^2 - 3(n+1)$$

where $\bar{R}_i$ is the sample mean of the ranks of group i.

Under the null hypothesis, $H \overset{\text{approx}}{\sim} \chi^2_{[s-1]}$.

As $\bar{R}_i$ is the sum of the ranks of group i divided by $n_i$, each sample size should not be too small (at least 5) for the approximation to be valid.

In the presence of ties, each tied value is assigned the average of the ranks the tied values would have received had they not been tied, and the formula above is slightly modified.
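The statistic H can be computed directly from the ranks. The sketch below uses hypothetical tie-free observations, invented for this illustration, and compares the hand computation with `kruskal.test()`.

```r
# Hypothetical tie-free observations for s = 3 groups of size 5 (illustration only)
x <- c(2.1, 3.4, 1.9, 5.6, 4.4,
       6.2, 7.8, 5.1, 8.3, 6.9,
       3.3, 2.8, 4.0, 1.2, 3.9)
g <- factor(rep(c("g1", "g2", "g3"), each = 5))

n    <- length(x)
r    <- rank(x)                 # ranks of all observations, ignoring group membership
n_i  <- tapply(r, g, length)    # group sizes
Rbar <- tapply(r, g, mean)      # mean rank of each group

H <- 12 / (n * (n + 1)) * sum(n_i * Rbar^2) - 3 * (n + 1)
p_value <- pchisq(H, df = nlevels(g) - 1, lower.tail = FALSE)

kt <- kruskal.test(x, g)
print(c(H_by_hand = H, H_R = unname(kt$statistic),
        p_by_hand = p_value, p_R = kt$p.value))
```

With ties, `kruskal.test()` applies a correction factor, so the hand computation above would no longer match exactly.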

Example. Nitrogen in red clover plants

> kruskal.test(Nitrogen~Strain, clover6)

Kruskal-Wallis rank sum test
data: Nitrogen by Strain
Kruskal-Wallis chi-squared = 21.659, df = 5, p-value = 0.0006077

4. Multivariate ANOVA (MANOVA)

Multivariate linear model

Extend the regression model to the situation with m responses $Y^1, Y^2, \dots, Y^m$ and the same set of covariates $X_1, \dots, X_{p-1}$ on each sample unit.

Each response follows its own regression model:

$$\begin{aligned}
Y^1 &= \beta_0^1 + \beta_1^1 X_1 + \beta_2^1 X_2 + \dots + \beta_{p-1}^1 X_{p-1} + \varepsilon^1 \\
Y^2 &= \beta_0^2 + \beta_1^2 X_1 + \beta_2^2 X_2 + \dots + \beta_{p-1}^2 X_{p-1} + \varepsilon^2 \\
&\;\;\vdots \\
Y^m &= \beta_0^m + \beta_1^m X_1 + \beta_2^m X_2 + \dots + \beta_{p-1}^m X_{p-1} + \varepsilon^m
\end{aligned}$$

with $Y_i^j \sim N(x_i^t \beta^j, \sigma_j^2)$, where $\beta^j$ is the vector $(\beta_0^j, \dots, \beta_{p-1}^j)$.

Or equivalently $\varepsilon_i^j \sim N(0, \sigma_j^2)$

Point estimator of the coefficients

For each response $Y^j$ the coefficients $\beta^j$ are estimated by $B^j$, as in the univariate linear model.

Inference on the coefficients (tests and confidence intervals): different from the univariate case.

Indeed, the sample variables are:

$$\begin{array}{ccc}
Y^1 & \cdots & Y^m \\ \hline
Y_1^1 & \cdots & Y_1^m \\
\vdots & \ddots & \vdots \\
Y_i^1 & \cdots & Y_i^m \\
\vdots & \ddots & \vdots \\
Y_n^1 & \cdots & Y_n^m
\end{array}$$

While in each column the sample variables can be assumed independent (different units), in each row they cannot (same unit).

The variance/covariance matrix of the sample variables $Y_i^1, Y_i^2, \dots, Y_i^m$, and of the residuals, is assumed equal for each unit: $\Sigma$

In the multivariate models $\Sigma$ plays the role of the common variance $\sigma^2$ in the univariate case.
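A short sketch may help to see both points: the coefficient estimates coincide with those of the separate univariate models, while the residuals of the same unit are correlated across responses. It assumes R's built-in `iris` data, which is not the data of the slides, with two responses and the factor `Species` as covariate.

```r
# Illustration on R's built-in iris data (an assumption: not the slides' data):
# m = 2 responses, one factor with 3 levels, so p = 3 estimable coefficients.
mfit <- lm(cbind(Sepal.Length, Sepal.Width) ~ Species, data = iris)
ufit <- lm(Sepal.Length ~ Species, data = iris)

# Point estimates: column j of the coefficient matrix is the univariate fit of response j
B <- coef(mfit)
print(B)

# Residual sums of squares and products, and the resulting estimate of Sigma
E <- crossprod(residuals(mfit))
n <- nrow(iris)
p <- nrow(B)
Sigma_hat <- E / (n - p)
print(Sigma_hat)
```

The diagonal of `Sigma_hat` contains the usual univariate variance estimates; the off-diagonal entry is the part that the univariate analyses ignore.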

Test statistics for subset of coefficients

• univariate case: based on the sum of squares of residuals (of the complete and reduced models); recall that the point estimator of $\sigma^2$ is $SS_C/(n - p)$

• multivariate case: based on the point estimator of the variance/covariance matrix Σ (of the complete and reduced models)

The estimate of Σ (multiplied by n − p, where p is the number of estimable coefficients of the model) is indicated by R as Sum of squares and products for error


Multivariate ANOVA

Consider a two-way MANOVA model: $Y^j_{ihk} = \mu^j + \alpha^j_i + \beta^j_k + \varepsilon^j_{ihk}$

The coefficients are (here the coefficients of $Y^j$ are denoted by $\theta^j$):

$$\begin{array}{ccc}
\theta^1 & \cdots & \theta^m \\ \hline
\mu^1 & \cdots & \mu^m \\
\alpha_1^1 & \cdots & \alpha_1^m \\
\vdots & & \vdots \\
\alpha_{s_1}^1 & \cdots & \alpha_{s_1}^m \\
\beta_1^1 & \cdots & \beta_1^m \\
\vdots & \ddots & \vdots \\
\beta_{s_2}^1 & \cdots & \beta_{s_2}^m
\end{array}$$

The multivariate test on the influence of factor A on the responses has null hypothesis:

$$H_0 : \alpha_1^1 = \dots = \alpha_{s_1}^1 = \dots = \alpha_1^m = \dots = \alpha_{s_1}^m = 0$$

Analogously for the factor B.

R in the package car reports the following test statistics:

1. Wilks' Lambda
2. Pillai's trace
3. Hotelling's trace
4. Roy's Maximum Root

Generally the decision of the test is the same with the different statistics.

Example. Plastic film

Effects of rate of extrusion and amount of additive on extruding plastic film.

Response variables: tear resistance, gloss and opacity.

Covariates: extrusion rate and additive amount (both binary).

[Figure: box plots of Resistance, Gloss and Opacity, first by Extrusion_rate (0, 1) and then by Additive (0, 1).]

plastic=read.table("C:/DATA/plastic_film.txt",header=T,sep="\t")
attach(plastic)
par(cex.axis=1.5, cex.lab=1.8, cex.main=2,lwd=2); par(mfrow=c(1,3))
for (var in colnames(plastic)[3:5])
  boxplot(plastic[, var]~Extrusion_rate,main=var,xlab="Extrusion_rate")

Univariate ANOVA for plastic film

> m_an=lm(cbind(Resistance,Gloss,Opacity)~ Extrusion_rate*Additive)

> summary(m_an)

• Response Resistance

Residuals:
   Min     1Q Median     3Q    Max
-0.580 -0.205  0.060  0.220  0.520

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               6.3000     0.1485  42.426   <2e-16 ***
Extrusion_rate            0.5800     0.2100   2.762   0.0139 *
Additive                  0.3800     0.2100   1.810   0.0892 .
Extrusion_rate:Additive   0.0200     0.2970   0.067   0.9471
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.332 on 16 degrees of freedom
Multiple R-squared: 0.5864, Adjusted R-squared: 0.5089
F-statistic: 7.563 on 3 and 16 DF, p-value: 0.00227

At the 5% level there is no evidence to reject the non-influence of Additive on Resistance (p-value 0.0892).

• Response Gloss

Residuals:
   Min     1Q Median     3Q    Max
-0.600 -0.245 -0.070  0.325  0.700

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               9.5600     0.1812  52.746  < 2e-16 ***
Extrusion_rate           -0.8400     0.2563  -3.277  0.00474 **
Additive                  0.0200     0.2563   0.078  0.93877
Extrusion_rate:Additive   0.6600     0.3625   1.821  0.08740 .

Residual standard error: 0.4053 on 16 degrees of freedom
Multiple R-squared: 0.4832, Adjusted R-squared: 0.3863
F-statistic: 4.987 on 3 and 16 DF, p-value: 0.01247

There is no evidence to reject the non-influence of Additive on Gloss (p-value 0.939).

• Response Opacity

Residuals:
   Min     1Q Median     3Q    Max
-3.120 -1.615  0.220  1.185  3.380

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               3.7400     0.9009   4.152 0.000751 ***
Extrusion_rate           -0.6000     1.2740  -0.471 0.644030
Additive                  0.1000     1.2740   0.078 0.938410
Extrusion_rate:Additive   1.7800     1.8017   0.988 0.337886

Residual standard error: 2.014 on 16 degrees of freedom
Multiple R-squared: 0.1251, Adjusted R-squared: -0.03897
F-statistic: 0.7625 on 3 and 16 DF, p-value: 0.5315

There is no evidence to reject the non-influence of both covariates on Opacity.

Multivariate ANOVA for plastic film

> library(car)

> manova_pl = Anova(m_an)

> summary(manova_pl)

• Estimate of Σ in the complete model (multiplied by n − p)

Type II MANOVA Tests:

Sum of squares and products for error:

           Resistance  Gloss Opacity
Resistance      1.764  0.020  -3.070
Gloss           0.020  2.628  -0.552
Opacity        -3.070 -0.552  64.924

• Test for Extrusion rate.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Extrusion rate.

Term: Extrusion_rate

Sum of squares and products for the hypothesis:

           Resistance   Gloss Opacity
Resistance     1.7405 -1.5045  0.8555
Gloss         -1.5045  1.3005 -0.7395
Opacity        0.8555 -0.7395  0.4205

Multivariate Tests: Extrusion_rate
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.6181416 7.554269      3     14 0.003034 **
Wilks             1 0.3818584 7.554269      3     14 0.003034 **
Hotelling-Lawley  1 1.6187719 7.554269      3     14 0.003034 **
Roy               1 1.6187719 7.554269      3     14 0.003034 **
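The four test statistics can be recomputed from the two matrices printed by R. The sketch below uses the standard definitions of the statistics in terms of the eigenvalues of $H E^{-1}$, with E the matrix of sums of squares and products for error and H the one for the hypothesis on Extrusion_rate; the numbers are copied from the output above, and the object names are chosen for this illustration.

```r
vars <- c("Resistance", "Gloss", "Opacity")

# Error matrix E and hypothesis matrix H (Extrusion_rate), copied from the R output
E <- matrix(c( 1.764,  0.020, -3.070,
               0.020,  2.628, -0.552,
              -3.070, -0.552, 64.924),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))
H <- matrix(c( 1.7405, -1.5045,  0.8555,
              -1.5045,  1.3005, -0.7395,
               0.8555, -0.7395,  0.4205),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Eigenvalues of H E^{-1}; Re() discards numerical noise in the imaginary part
lambda <- Re(eigen(H %*% solve(E))$values)

stats <- c(Pillai           = sum(lambda / (1 + lambda)),
           Wilks            = prod(1 / (1 + lambda)),
           Hotelling_Lawley = sum(lambda),
           Roy              = max(lambda))
print(round(stats, 7))
```

Here the hypothesis has one degree of freedom, so H has rank one and only one eigenvalue is non-zero; this is why the four statistics lead to the same approximate F and p-value.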


• Test for Additive.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of Additive.

Term: Additive

Sum of squares and products for the hypothesis:

           Resistance  Gloss Opacity
Resistance     0.7605 0.6825  1.9305
Gloss          0.6825 0.6125  1.7325
Opacity        1.9305 1.7325  4.9005

Multivariate Tests: Additive
                 Df test stat approx F num Df den Df   Pr(>F)
Pillai            1 0.4769651 4.255619      3     14 0.024745 *
Wilks             1 0.5230349 4.255619      3     14 0.024745 *
Hotelling-Lawley  1 0.9119183 4.255619      3     14 0.024745 *
Roy               1 0.9119183 4.255619      3     14 0.024745 *

• Test for the interaction Extrusion rate:Additive.

Estimate of Σ in the reduced model (multiplied by n − p) and test statistics. The reduced model does not contain the coefficients of the interaction.

Term: Extrusion_rate:Additive

Sum of squares and products for the hypothesis:

           Resistance  Gloss Opacity
Resistance     0.0005 0.0165  0.0445
Gloss          0.0165 0.5445  1.4685
Opacity        0.0445 1.4685  3.9605

Multivariate Tests: Extrusion_rate:Additive
                 Df test stat approx F num Df den Df  Pr(>F)
Pillai            1 0.2228942 1.338522      3     14 0.30178
Wilks             1 0.7771058 1.338522      3     14 0.30178
Hotelling-Lawley  1 0.2868261 1.338522      3     14 0.30178
Roy               1 0.2868261 1.338522      3     14 0.30178

According to the multivariate tests, the interaction can be considered to have no influence on the responses.

5. MANOVA with repeated measure

The repeated measure models are special cases of the multivariate models, where the responses are the same variable considered in different conditions, usually at consecutive times.

In addition to the usual univariate and multivariate tests, the influence of the factors can be tested on appropriate linear transformations of the responses. For instance, univariate models could be considered with response:

- the mean of the responses: $(Y^1 + \dots + Y^m)/m$

- and/or m − 1 consecutive differences: $Y^2 - Y^1, Y^3 - Y^2, \dots, Y^m - Y^{m-1}$

- and/or m − 1 differences from a special condition, e.g. the first or the last one: $Y^2 - Y^1, Y^3 - Y^1, \dots, Y^m - Y^1$

- . . .

The consecutive differences and the differences from a special condition make it possible, for example, to determine if and when changes occurred in the time evolution of the phenomenon.

In general, it is a mistake to consider the condition under which the response variable is measured as a further factor.

Indeed the response variables of the same subject cannot be considered independent.

It is legitimate to do so only if certain technical restrictions hold (e.g. sphericity of the estimated correlation matrix), which we do not discuss here.

Example. Effect of treatments at three times

A response is measured three times for each subject (pre-treatment, post-treatment, and at a later follow-up). Each subject randomly receives one of three treatments: A, B, or the control.

The Control is chosen as reference level.

# stringsAsFactors=TRUE keeps treat a factor, as relevel() requires (R >= 4.0 reads strings as character)
data=read.table(col.names=c("treat","PreY","PostY","FollowY"),
stringsAsFactors=TRUE, text="
A 0 0 9
A 6 6 3
A 8 2 6
A 7 6 4
A 6 12 6
A 13 3 8
B 8 11 27
B 9 3 26
B 12 0 18
B 3 0 14
B 3 0 25
B 4 2 9
Control 4 3 7
Control 8 7 20
Control 2 0 10
Control 5 8 14
Control 1 0 11
Control 8 9 10")
# Make Control the reference level
data2=within(data, treat <- relevel(treat, ref="Control"))
attach(data2)

Univariate and Multivariate ANOVA

rep_m=lm(cbind(PreY,PostY,FollowY)~treat)

a) Univariate ANOVA

summary(rep_m)

• response PreY

Residuals:
    Min      1Q  Median      3Q     Max
-6.6667 -2.6250 -0.1667  2.2500  6.3333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    4.667      1.491   3.129  0.00689 **
treatA         2.000      2.109   0.948  0.35801
treatB         1.833      2.109   0.869  0.39840

Residual standard error: 3.653 on 15 degrees of freedom
Multiple R-squared: 0.06875, Adjusted R-squared: -0.05541
F-statistic: 0.5537 on 2 and 15 DF, p-value: 0.5861

• response PostY

Residuals:
   Min     1Q Median     3Q    Max
-4.833 -2.667 -1.083  2.167  8.333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   4.5000     1.7051   2.639   0.0186 *
treatA        0.3333     2.4114   0.138   0.8919
treatB       -1.8333     2.4114  -0.760   0.4589

Residual standard error: 4.177 on 15 degrees of freedom
Multiple R-squared: 0.05875, Adjusted R-squared: -0.06675
F-statistic: 0.4682 on 2 and 15 DF, p-value: 0.635

• response FollowY

Residuals:
   Min     1Q Median     3Q    Max
-10.83  -2.00  -0.50   2.75   8.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   12.000      2.105   5.700  4.2e-05 ***
treatA        -6.000      2.977  -2.015   0.0621 .
treatB         7.833      2.977   2.631   0.0189 *

Residual standard error: 5.156 on 15 degrees of freedom
Multiple R-squared: 0.5915, Adjusted R-squared: 0.537
F-statistic: 10.86 on 2 and 15 DF, p-value: 0.001214

- From the F-statistic: only at the follow-up do the three treatments have different effects on the response.

Indeed the F-statistic compares the current model with a model with the constant only; in this case, with a single factor as covariate, it is a test statistic for the nullity of the coefficients of the factor (treatment).

- From the sign of the coefficients $\alpha_A - \alpha_{Control}$ and $\alpha_B - \alpha_{Control}$: at the follow-up treatment A has a negative effect with respect to the Control, while treatment B has a positive effect.

b) Multivariate ANOVA

> library(car)

> mult_rep_m=Anova(rep_m)

> summary(mult_rep_m)

- Estimate of the error matrix in the complete and in the reduced model

Type II MANOVA Tests:

Sum of squares and products for error:
             PreY     PostY   FollowY
PreY    200.16667  84.66667  72.50000
PostY    84.66667 261.66667  90.66667
FollowY  72.50000  90.66667 398.83333

---

Term: treat

Sum of squares and products for the hypothesis:
              PreY      PostY     FollowY
PreY    14.7777778  -4.666667   0.1111111
PostY   -4.6666667  16.333333 -92.6666667
FollowY  0.1111111 -92.666667 577.4444444

- Multivariate tests

Multivariate Tests: treat

                 Df test stat approx F num Df den Df    Pr(>F)
Pillai            2 0.7456838 2.774307      6     28 0.0303162 *
Wilks             2 0.3186403 3.343317      6     26 0.0141337 *
Hotelling-Lawley  2 1.9364644 3.872929      6     24 0.0076269 **
Roy               2 1.8259052 8.520891      3     14 0.0018134 **

From a multivariate point of view we could say that the effect of the three treatments is different on the response.

To better understand when the differences among the treatments occurred, we can study the following transformations of the response variables:

PostY-PreY    FollowY-PostY    FollowY-PreY

Post_Pre=PostY-PreY
FU_Post=FollowY-PostY
FU_Pre=FollowY-PreY
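Because the transformations are linear, the coefficients of a model fitted to a difference are the differences of the coefficients of the corresponding univariate models. The sketch below checks this for FollowY − PreY on the data listed in the slides; the data frame name `d` and the model names are assumptions made for this illustration.

```r
# Data typed in from the slides; Control is set as reference level
d <- data.frame(
  treat   = relevel(factor(rep(c("A", "B", "Control"), each = 6)), ref = "Control"),
  PreY    = c(0, 6, 8, 7, 6, 13,   8, 9, 12, 3, 3, 4,     4, 8, 2, 5, 1, 8),
  PostY   = c(0, 6, 2, 6, 12, 3,   11, 3, 0, 0, 0, 2,     3, 7, 0, 8, 0, 9),
  FollowY = c(9, 3, 6, 4, 6, 8,    27, 26, 18, 14, 25, 9, 7, 20, 10, 14, 11, 10)
)

fit_pre    <- lm(PreY ~ treat, data = d)
fit_follow <- lm(FollowY ~ treat, data = d)
fit_diff   <- lm(I(FollowY - PreY) ~ treat, data = d)

# Coefficients of the difference = difference of the coefficients
print(rbind(difference_model = coef(fit_diff),
            follow_minus_pre = coef(fit_follow) - coef(fit_pre)))
```

The standard errors and tests, in contrast, cannot be obtained from the two separate fits, since they depend on the covariance between the responses of the same subject.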

• PostY-PreY

> reg1=lm(Post_Pre~treat); summary(reg1)

Residuals:
    Min      1Q  Median      3Q     Max
-8.1667 -1.5833  0.8333  1.8333  7.8333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.1667     1.8028  -0.092    0.928
treatA       -1.6667     2.5495  -0.654    0.523
treatB       -3.6667     2.5495  -1.438    0.171

Residual standard error: 4.416 on 15 degrees of freedom
Multiple R-squared: 0.1215, Adjusted R-squared: 0.004338
F-statistic: 1.037 on 2 and 15 DF, p-value: 0.3786

• FollowY-PostY

> reg2=lm(FU_Post~treat); summary(reg2)

Residuals:
     Min       1Q   Median       3Q      Max
-10.1667  -3.4167  -0.1667   3.7500   7.8333

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.500      2.307   3.250  0.00538 **
treatA        -6.333      3.263  -1.941  0.07131 .
treatB         9.667      3.263   2.962  0.00969 **

Residual standard error: 5.652 on 15 degrees of freedom
Multiple R-squared: 0.6192, Adjusted R-squared: 0.5684
F-statistic: 12.19 on 2 and 15 DF, p-value: 0.0007167

• FollowY-PreY

> reg3=lm(FU_Pre~treat); summary(reg3)

Residuals:
    Min      1Q  Median      3Q     Max
-8.3333 -3.8333 -0.3333  3.4167  9.6667

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    7.333      2.246   3.265  0.00522 **
treatA        -8.000      3.176  -2.519  0.02362 *
treatB         6.000      3.176   1.889  0.07838 .

Residual standard error: 5.502 on 15 degrees of freedom
Multiple R-squared: 0.566, Adjusted R-squared: 0.5081
F-statistic: 9.78 on 2 and 15 DF, p-value: 0.001912

We can conclude that only at the follow-up the effect of the three treatments becomes statistically significant.
