Exploratory Data Analysis part 4

Academic year: 2021

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

http://www.dima.unige.it/~rogantin/UnigeStat/


Distribution and graphical representation of two quantitative variables

X and Y are measured on the same population.

The set of points $(x_i, y_i)$, together with the corresponding frequencies, is the joint distribution of X and Y.

The scatter plot

Example: height and weight of the students in a graduate class

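# read the data: height (altezza), weight (peso), gender (genere)
peso_alt <- read.table(file = "C:/DATA/stid98.txt", header = FALSE,
                       col.names = c("altezza", "peso", "genere"))
attach(peso_alt)   # make the columns available by name

# scatter plot of weight against height
plot(altezza, peso, pch = 16, col = "red")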

[Figure: scatter plot of peso (weight) against altezza (height)]

Indices for two quantitative variables

Covariance

$X^c$ and $Y^c$ are the centred variables:

$$X^c = X - \bar{x}, \qquad Y^c = Y - \bar{y}$$

$$\Gamma(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i^c \, y_i^c = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
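The definition above can be checked numerically. The following is a minimal sketch on invented toy data (the vectors `x` and `y` are hypothetical, not the stid98 data set). It also shows that R's `cov()` divides by n - 1 rather than n, so it has to be rescaled to match the formula used here.

```r
# Hypothetical toy data, invented for illustration
x <- c(160, 165, 170, 175, 180)
y <- c(55, 60, 62, 70, 78)
n <- length(x)

# Centred variables
xc <- x - mean(x)
yc <- y - mean(y)

# Covariance with divisor n, as in the formula above
gamma_xy <- sum(xc * yc) / n

# cov() uses the divisor n - 1, so rescale it to compare
gamma_from_cov <- cov(x, y) * (n - 1) / n

print(c(gamma_xy, gamma_from_cov))
```

The two values agree; on large samples the difference between the two divisors is negligible.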

Interpretation

The origin is placed at the centroid $(\bar{x}, \bar{y})$.

The points in quadrants I and III give a positive contribution to $\sum_{i=1}^{n} x_i^c \, y_i^c$, while the points in quadrants II and IV give a negative contribution.

[Figure: scatter plot of y against x, with the quadrants that give a positive contribution marked +]

The covariance is therefore

- positive if, on average, high values of X correspond to high values of Y (and low values of X correspond to low values of Y)

- negative if, on average, high values of X correspond to low values of Y (and low values of X correspond to high values of Y)

- about zero otherwise

[Figure: three scatter plots: y1 against x1 (covariance +), y2 against x2 (covariance ~0), y3 against x3]

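Correlation index (or linear correlation index)

$X^s$ and $Y^s$ are the standardized variables:

$$X^s = \frac{X - \bar{x}}{\sigma_X} \quad \text{and} \quad Y^s = \frac{Y - \bar{y}}{\sigma_Y}$$

$$\rho(X, Y) = \frac{1}{n} \sum_{i=1}^{n} x_i^s \, y_i^s = \sum_{i=1}^{n} \frac{(x_i - \bar{x})}{\sqrt{\sum_{j} (x_j - \bar{x})^2}} \, \frac{(y_i - \bar{y})}{\sqrt{\sum_{j} (y_j - \bar{y})^2}} = \frac{\Gamma(X, Y)}{\sigma_X \, \sigma_Y}$$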

Some properties:

- $\rho(X, Y)$ does not depend on the variability of the variables

- $\rho(X, Y)$ is a pure number

- $\rho(X, Y) \in [-1, 1]$
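These properties can be illustrated with a short sketch on invented toy data (the vectors and the helper `sd_n` are hypothetical names introduced here). It computes the index from the standardized variables, compares it with R's `cor()`, and checks that a change of measurement units leaves it unchanged.

```r
# Hypothetical toy data, invented for illustration
x <- c(160, 165, 170, 175, 180)
y <- c(55, 60, 62, 70, 78)
n <- length(x)

# Standard deviation with divisor n (R's sd() uses n - 1)
sd_n <- function(v) sqrt(mean((v - mean(v))^2))

# Standardized variables
xs <- (x - mean(x)) / sd_n(x)
ys <- (y - mean(y)) / sd_n(y)

# Correlation as the mean product of the standardized values
rho <- sum(xs * ys) / n

# cor() gives the same value: the divisors cancel in the ratio
rho_r <- cor(x, y)

# Changing units (cm to m, kg to g) does not change the index
rho_units <- cor(x / 100, y * 1000)

print(c(rho, rho_r, rho_units))
```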


Remark 1: positive correlation does not imply causality

Remark 2: the correlation index detects only linear dependence. In the example below, $\rho(X, Y) = 0.02$

[Figure: scatter plot illustrating Remark 2]
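Remark 2 can be reproduced with a small sketch. The data here are invented and are not those of the slide: y is an exact function of x, yet the dependence is not linear and the correlation index is essentially zero.

```r
# Hypothetical data: x symmetric around 0, y an exact (quadratic) function of x
x <- seq(-3, 3, by = 0.5)
y <- x^2

# The dependence is perfect but not linear, so the index is (numerically) zero
rho <- cor(x, y)
print(rho)

# The scatter plot reveals the relationship that the index misses
plot(x, y, pch = 16, col = "red")
```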

Covariance and correlation indices in a population and its sub-groups

$$\Gamma_{tot}(X, Y) = \sum_{k=1}^{K} f_k \, \Gamma_k(X, Y) + \sum_{k=1}^{K} f_k \, (\bar{x}_k - \bar{x})(\bar{y}_k - \bar{y})$$

(cf. the formula for the variance)
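The decomposition can be verified numerically. The sketch below uses simulated data and assumes that $f_k$ is the relative frequency of sub-group k and that $\bar{x}_k$, $\bar{y}_k$ are the sub-group means; the group sizes, means and helper `cov_n` are invented for illustration.

```r
set.seed(1)

# Hypothetical population split into K = 3 sub-groups
g  <- rep(1:3, times = c(20, 30, 50))
mx <- c(10, 14, 18)[g]
x  <- rnorm(100, mean = mx)
y  <- rnorm(100, mean = c(50, 40, 30)[g]) + 2 * (x - mx)

# Covariance with divisor n
cov_n <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

# Relative frequencies and sub-group means
f  <- as.numeric(table(g)) / length(g)
xk <- as.numeric(tapply(x, g, mean))
yk <- as.numeric(tapply(y, g, mean))

# First term: weighted mean of the covariances inside the sub-groups
within <- sum(f * sapply(1:3, function(k) cov_n(x[g == k], y[g == k])))

# Second term: covariance of the sub-group means
between <- sum(f * (xk - mean(x)) * (yk - mean(y)))

total <- cov_n(x, y)
print(c(total = total, within = within, between = between))
```

In this simulation the first term is positive and the second is negative, which is the mechanism behind sign reversals between a population and its sub-groups.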

Some paradoxes

left: $\rho_{tot} < 0$, $\rho_1 > 0$, $\rho_2 > 0$, $\rho_3 > 0$

right: $\rho_{tot} \simeq 0$, $\rho_1 > 0$, $\rho_2 < 0$

[Figure: two scatter plots: left, b against a; right, y against x]
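A situation like the left-hand case can be simulated. This is only a sketch under invented assumptions (three sub-groups of 50 points, with centres chosen so that x increases while y decreases from one group to the next); it is not the data behind the slide.

```r
set.seed(2)

# Hypothetical sub-groups: centres move up in x and down in y
g  <- rep(1:3, each = 50)
cx <- c(10, 25, 40)[g]
cy <- c(90, 70, 50)[g]

# Inside each group x and y share a common component, so they move together
u <- rnorm(150, sd = 2)
x <- cx + u
y <- cy + u + rnorm(150, sd = 1)

# Correlation in the whole population and in each sub-group
rho_tot <- cor(x, y)
rho_k   <- sapply(1:3, function(k) cor(x[g == k], y[g == k]))

print(rho_tot)
print(rho_k)

plot(x, y, pch = c(16, 17, 15)[g], col = c("red", "blue", "darkgreen")[g])
```

The overall index is negative while every sub-group index is positive.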

Example:

height and weight of the students in a graduate class (continued)

$\rho_{tot} = 0.61$, $\rho_M = 0.26$, $\rho_F = 0.78$

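# group index 1, 2: works whether genere is read as character or numeric codes
g <- as.integer(factor(genere))
# a different symbol and colour for each gender
plot(altezza, peso, pch = c(16, 17)[g], col = c("red", "blue")[g])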

[Figure: scatter plot of peso against altezza, with points marked by genere; tot: 0.61, M: 0.26, F: 0.78]

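Linear transformations of variables: how the indices change

Univariate case

X is a variable with mean $\bar{x}$, variance $\sigma_X^2$ and quantiles $q_\alpha^X$

• change of position: $R = X + a$

$\bar{r} = \bar{x} + a \qquad \sigma_R^2 = \sigma_X^2 \qquad q_\alpha^R = q_\alpha^X + a$

• change of dispersion/scale: $T = b X$

$\bar{t} = b \, \bar{x} \qquad \sigma_T^2 = b^2 \sigma_X^2 \qquad q_\alpha^T = b \, q_\alpha^X$

• change of both position and scale: $Z = b X + a$

$\bar{z} = b \, \bar{x} + a \qquad \sigma_Z^2 = b^2 \sigma_X^2 \qquad q_\alpha^Z = b \, q_\alpha^X + a$

(the quantile relations hold for $b > 0$)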
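The rules for $Z = bX + a$ can be checked on a small example. The data, the constants `a` and `b`, and the helper `var_n` below are invented for illustration; the variance is computed with divisor n, and `b` is taken positive so that the quantile rule applies.

```r
# Hypothetical data and constants (b > 0)
x <- c(2, 4, 4, 5, 7, 9, 11)
a <- 3
b <- 2

# Variance with divisor n (R's var() uses n - 1)
var_n <- function(v) mean((v - mean(v))^2)

z <- b * x + a

# Mean, variance and first quartile of Z, next to the values given by the rules
mean_check <- c(mean(z), b * mean(x) + a)
var_check  <- c(var_n(z), b^2 * var_n(x))
q_check    <- c(unname(quantile(z, 0.25)), b * unname(quantile(x, 0.25)) + a)

print(mean_check)
print(var_check)
print(q_check)
```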

[Figure: panels titled R=X+a and Z=2X+a, followed by panels labelled X, R=X+a and Z=2X+a]

Pay attention to the ticks on the axes

Two special transformations:

• centred variable:

$X^c = X - \bar{x} \qquad \bar{x}^c = 0 \qquad \sigma_{X^c}^2 = \sigma_X^2$

• standardized variable:

$X^s = \dfrac{X - \bar{x}}{\sigma_X} \qquad \bar{x}^s = 0 \qquad \sigma_{X^s}^2 = 1$
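A brief sketch of the two transformations on invented data follows. One point worth noting, as far as R is concerned: `scale()` standardizes with the standard deviation computed with divisor n - 1, so its output has unit variance according to `var()`, not according to the divisor-n variance used in these formulas. The helper `var_n` is a hypothetical name.

```r
# Hypothetical data
x <- c(2, 4, 4, 5, 7, 9, 11)

# Variance with divisor n
var_n <- function(v) mean((v - mean(v))^2)

# Centred and standardized variables, following the formulas above
xc <- x - mean(x)
xs <- xc / sqrt(var_n(x))

print(c(mean(xc), var_n(xc), var_n(x)))   # mean 0, variance unchanged
print(c(mean(xs), var_n(xs)))             # mean 0, variance 1

# scale() divides by sd(x), which uses n - 1
x_sc <- as.numeric(scale(x))
print(c(var(x_sc), var_n(x_sc)))          # 1 with var(), slightly below 1 with var_n()
```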

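Bivariate case

X with $\bar{x}$, $\sigma_X^2$ and $q_\alpha^X$; Y with $\bar{y}$, $\sigma_Y^2$ and $q_\alpha^Y$

• change of position: $R = X + a$, $M = Y + c$

$\Gamma(R, M) = \Gamma(X, Y) \qquad \rho(R, M) = \rho(X, Y)$

• change of dispersion: $T = b X$, $N = d Y$

$\Gamma(T, N) = b \, d \, \Gamma(X, Y) \qquad \rho(T, N) = \operatorname{sign}(b \, d) \, \rho(X, Y)$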
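The bivariate rules can be checked in the same way. The sketch below uses invented data and constants; `d` is chosen negative so that the sign change of the correlation index is visible.

```r
# Hypothetical data and constants
x <- c(160, 165, 170, 175, 180)
y <- c(55, 60, 62, 70, 78)
a <- 3;  c0 <- -5    # shifts (c0 plays the role of c in the text)
b <- 2;  d  <- -4    # scale factors, with b * d < 0

# Covariance with divisor n
cov_n <- function(u, v) mean((u - mean(u)) * (v - mean(v)))

# Change of position: covariance and correlation are unchanged
shift_cov <- c(cov_n(x + a, y + c0), cov_n(x, y))
shift_cor <- c(cor(x + a, y + c0), cor(x, y))

# Change of dispersion: covariance is multiplied by b * d,
# correlation only by the sign of b * d
scale_cov <- c(cov_n(b * x, d * y), b * d * cov_n(x, y))
scale_cor <- c(cor(b * x, d * y), sign(b * d) * cor(x, y))

print(rbind(shift_cov, shift_cor, scale_cov, scale_cor))
```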


In the bivariate and multivariate cases it is often useful to use standardized variables

[Figure: two scatter plots: "original variables" (y against x) and "scaled variables" (y_sc against x_sc)]

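x <- rnorm(100, 3, 1); y <- rnorm(100, 3, 3)   # two simulated samples with different spread
x_sc <- scale(x); y_sc <- scale(y)             # standardized versions
par(mfrow = c(1, 2))                           # two plots side by side

plot(x, y, pch = 19, asp = 1, main = "original variables")

plot(x_sc, y_sc, pch = 19, asp = 1, main = "scaled variables")

## asp = 1 gives the same scale on the horizontal and on the vertical axis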
