
Example

We want to determine whether the flow of a stream (that is, the quantity of water passing in one minute) depends on the depth of the stream. The data (zona = site, profond = depth, flusso = flow) are as follows:

    zona   profond   flusso
       1      0.34    0.636
       2      0.29    0.319
       3      0.28    0.734
       4      0.42    1.327
       5      0.29    0.487
       6      0.41    0.924
       7      0.76    7.350
       8      0.73    5.890
       9      0.46    1.979
      10      0.40    1.124

The SAS program that reads the data and runs a first regression analysis is the following:

    /* Read the ten observations; @@ allows several observations per data line */
    data flusso;
      input zona profond flusso @@;
      datalines;
    1 0.34 0.636  2 0.29 0.319  3 0.28 0.734  4 0.42 1.327  5 0.29 0.487
    6 0.41 0.924  7 0.76 7.350  8 0.73 5.890  9 0.46 1.979  10 0.40 1.124
    ;
    run;

    /* Simple linear regression of flow on depth */
    proc reg data=flusso;
      model flusso = profond;
      plot flusso*profond;
      output out=regout p=flussopred r=flussores;
    run;

SAS output:

    The REG Procedure
    Dependent Variable: flusso

    Number of Observations Read    10

    Root MSE           0.60347    R-Square    0.9467
    Dependent Mean     2.07700    Adj R-Sq    0.9400
    Coeff Var         29.05490

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     -3.98213     0.54298      -7.33      <.0001
    profond      1     13.83363     1.16061      11.92      <.0001
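As a cross-check that does not depend on SAS, the estimates above can be recomputed from the closed-form least-squares formulas. The following is a minimal Python sketch under that assumption; the function name `simple_ols` is introduced here for illustration only and is not part of the original material.

```python
# Data from the table above: depth (profond) and flow (flusso) for the 10 sites.
profond = [0.34, 0.29, 0.28, 0.42, 0.29, 0.41, 0.76, 0.73, 0.46, 0.40]
flusso = [0.636, 0.319, 0.734, 1.327, 0.487, 0.924, 7.350, 5.890, 1.979, 1.124]


def simple_ols(x, y):
    """Return (intercept, slope, r_square) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centered sums of squares and cross-products.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_square = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_square


b0, b1, r2 = simple_ols(profond, flusso)
print(f"Intercept = {b0:.5f}, profond = {b1:.5f}, R-Square = {r2:.4f}")
```

The printed values should agree with the Parameter Estimates and R-Square reported by PROC REG, up to rounding.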

[Figure: plot of flusso against profond produced by PROC REG, annotated with the fitted equation flusso = -3.9821 + 13.834 profond and N = 10, Rsq = 0.9467, Adj Rsq = 0.9400, RMSE = 0.6035. Vertical axis: flusso, from 0 to 8. Horizontal axis: profond, from 0.25 to 0.80.]

To obtain the residual plot:

    symbol v=dot;
    proc gplot data=regout;
      /* residuals against predicted values, as named in the OUTPUT statement */
      plot flussores*flussopred / vref=0;
    run;

[Figure: residual plot for the linear model. Vertical axis: Residual, from -0.8 to 0.9. Horizontal axis: Predicted Value of flusso, from -1 to 7.]

The plot of the data already suggests that the dependence is not strictly linear; this shows even more clearly in the residual plot of the model that assumes a linear dependence.
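The departure from linearity can also be checked numerically. The following is a minimal Python sketch, not part of the original SAS session: it plugs the estimates from the SAS output into the fitted line and lists the residuals in order of increasing depth.

```python
# Data from the table above: depth (profond) and flow (flusso) for the 10 sites.
profond = [0.34, 0.29, 0.28, 0.42, 0.29, 0.41, 0.76, 0.73, 0.46, 0.40]
flusso = [0.636, 0.319, 0.734, 1.327, 0.487, 0.924, 7.350, 5.890, 1.979, 1.124]

# Estimates copied from the PROC REG output for the linear model.
b0, b1 = -3.98213, 13.83363

# Sort the observations by depth so that any curvature shows up as a sign pattern.
rows = sorted(zip(profond, flusso))
residuals = []
for x, y in rows:
    predicted = b0 + b1 * x
    residuals.append(y - predicted)
    print(f"profond = {x:.2f}  predicted = {predicted:7.3f}  residual = {residuals[-1]:+.3f}")
```

With these data the residuals are positive at the three shallowest sites, negative at the intermediate depths, and positive again at the deepest site: a curved pattern that a straight line cannot capture.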

The data and the preceding residual plot suggest a quadratic dependence; one can therefore build a second-order polynomial model of the form

$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$

in which there are two explanatory variables, $x$ and $x^2$.

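The SAS program is the following:

    /* Add the squared depth as a second regressor */
    data flusso;
      set flusso;
      sqprof = profond**2;
    run;

    proc reg data=flusso;
      model flusso = profond sqprof;
      output out=sqregout p=predetti r=residui;
    run;

    /* Residuals against predicted values for the quadratic model */
    proc gplot data=sqregout;
      plot residui*predetti / vref=0;
    run;
    quit;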

    The REG Procedure
    Model: MODEL1
    Dependent Variable: flusso

    Number of Observations Read    10

    Analysis of Variance

                                 Sum of        Mean
    Source             DF       Squares      Square    F Value    Pr > F
    Model               2      54.10549    27.05275     346.50    <.0001
    Error               7       0.54652     0.07807
    Corrected Total     9      54.65201

    Root MSE           0.27942    R-Square    0.9900
    Dependent Mean     2.07700    Adj R-Sq    0.9871
    Coeff Var         13.45294

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1      1.68269     1.05912       1.59      0.1561
    profond      1    -10.86091     4.51711      -2.40      0.0472
    sqprof       1     23.53522     4.27447       5.51      0.0009
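The quadratic fit is still an ordinary least-squares problem, now with two regressors. As a sketch only, the estimates can be recomputed in plain Python by solving the normal equations $(X'X)\beta = X'y$; the helper `solve` is a hypothetical name introduced here, and PROC REG itself may use a different numerical method.

```python
# Data from the table above: depth (profond) and flow (flusso) for the 10 sites.
profond = [0.34, 0.29, 0.28, 0.42, 0.29, 0.41, 0.76, 0.73, 0.46, 0.40]
flusso = [0.636, 0.319, 0.734, 1.327, 0.487, 0.924, 7.350, 5.890, 1.979, 1.124]


def solve(a, b):
    """Solve the square linear system a * beta = b by Gaussian elimination
    with partial pivoting (illustrative helper, not a library routine)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Bring the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    beta = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - tail) / m[i][i]
    return beta


# Design matrix with columns 1, x, x^2 (x^2 plays the role of sqprof).
design = [[1.0, x, x ** 2] for x in profond]
xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
xty = [sum(row[i] * y for row, y in zip(design, flusso)) for i in range(3)]

beta0, beta1, beta2 = solve(xtx, xty)
print(f"Intercept = {beta0:.5f}, profond = {beta1:.5f}, sqprof = {beta2:.5f}")
```

The three printed coefficients should agree with the Parameter Estimates table above, up to rounding.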

RESIDUAL PLOT

[Figure: residual plot for the second-order polynomial model. Vertical axis: Residual, from -0.5 to 0.4. Horizontal axis: Predicted Value of flusso, from 0 to 8.]

The residual plot of the second-order polynomial regression already shows a better pattern, but other models can be tried, for example:

• $\sqrt{y} = \beta_0 + \beta_1 x + \varepsilon$

• $\log(y) = \beta_0 + \beta_1 \log(x) + \varepsilon$

The first of these two models is very similar to the second model (the second-order polynomial), while the second is motivated by the fact that the two observations with the highest flow and depth are the ones that deviate most from linearity compared with the other data, and the logarithm "compresses" the largest values.
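To make the comparison explicit, both transformed models can be rewritten on the original scale of $y$. The following is a sketch that ignores the error term and assumes that $\log$ denotes the natural logarithm, as computed by the SAS LOG function:

```latex
\begin{align*}
\sqrt{y} = \beta_0 + \beta_1 x
  &\;\Longrightarrow\;
  y = (\beta_0 + \beta_1 x)^2
    = \beta_0^2 + 2\beta_0\beta_1\, x + \beta_1^2\, x^2, \\
\log(y) = \beta_0 + \beta_1 \log(x)
  &\;\Longrightarrow\;
  y = e^{\beta_0}\, x^{\beta_1}.
\end{align*}
```

The square-root model is therefore a quadratic in $x$ whose three coefficients are tied together by only two parameters, whereas the log-log model describes a power law in $x$.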

    /* Transformed variables for the two alternative models */
    data flusso;
      set flusso;
      logprof   = log(profond);
      logflusso = log(flusso);
      sqrflusso = sqrt(flusso);
    run;

    /* Square-root model */
    proc reg data=flusso;
      model sqrflusso = profond;
      output out=sqrtregout p=predetti r=residui;
    run;

    proc gplot data=sqrtregout;
      plot residui*predetti / vref=0;
    run;

    /* Log-log model */
    proc reg data=flusso;
      model logflusso = logprof;
      output out=logregout p=predetti r=residui;
    run;

    proc gplot data=logregout;
      plot residui*predetti / vref=0;
    run;
    quit;

    The REG Procedure
    Model: MODEL1
    Dependent Variable: sqrflusso

    Number of Observations Read    10

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1     -0.55785     0.11489      -4.86      0.0013
    profond      1      4.15836     0.24558      16.93      <.0001

[Figure: residual plot for the square-root model. Vertical axis: Residual, from -0.19 to 0.26. Horizontal axis: Predicted Value of sqrflusso, from 0.6 to 2.7.]

    The REG Procedure
    Model: MODEL1
    Dependent Variable: logflusso

    Number of Observations Read    10
    Number of Observations Used    10

    Parameter Estimates

                      Parameter    Standard
    Variable    DF     Estimate       Error    t Value    Pr > |t|
    Intercept    1      2.66614     0.23833      11.19      <.0001
    logprof      1      2.76413     0.25103      11.01      <.0001

[Figure: residual plot for the log-log model. Vertical axis: Residual, from -0.4 to 0.6. Horizontal axis: Predicted Value of logflusso, from -1 to 2.]
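The parameter estimates of the two transformed models can be cross-checked outside SAS in the same way as the linear fit. The following is a minimal Python sketch, assuming natural logarithms (as in the SAS LOG function); `simple_ols` is an illustrative helper name, not something taken from the original material.

```python
import math

# Data from the table above: depth (profond) and flow (flusso) for the 10 sites.
profond = [0.34, 0.29, 0.28, 0.42, 0.29, 0.41, 0.76, 0.73, 0.46, 0.40]
flusso = [0.636, 0.319, 0.734, 1.327, 0.487, 0.924, 7.350, 5.890, 1.979, 1.124]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Square-root model: sqrt(flusso) regressed on profond.
sqrt_b0, sqrt_b1 = simple_ols(profond, [math.sqrt(y) for y in flusso])

# Log-log model: log(flusso) regressed on log(profond).
log_b0, log_b1 = simple_ols([math.log(x) for x in profond],
                            [math.log(y) for y in flusso])

print(f"sqrt model: Intercept = {sqrt_b0:.5f}, profond = {sqrt_b1:.5f}")
print(f"log model:  Intercept = {log_b0:.5f}, logprof = {log_b1:.5f}")
```

Both pairs of values should agree with the corresponding Parameter Estimates tables, up to rounding.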