Exploratory Data Analysis part 3

Academic year: 2021

(1)

Eva Riccomagno, Maria Piera Rogantin

DIMA – Università di Genova

http://www.dima.unige.it/~rogantin/UnigeStat/

(2)

Indices for quantitative variables

Position indices – central tendencies

- median: $Q_2 = \min\{x \mid F(x) \ge 0.50\}$

- mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

- trimmed mean: mean of the central 90% of the data

- mode: value with maximal frequency
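
The four indices above can be computed in R as in the following minimal sketch. The sample `x` is hypothetical (it is not from the slides) and was chosen to contain a repeated value and one outlier. The choice of `type = 1` in `quantile()` is an assumption made to match the definition $Q_2 = \min\{x \mid F(x) \ge 0.50\}$; R's `median()` instead averages the two middle values when $n$ is even.

```r
# Hypothetical sample: 20 values, one repeated value (5) and one outlier (120)
x <- c(3, 5, 5, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 120)

# Median as defined above: the smallest x with F(x) >= 0.50.
# quantile(type = 1) inverts the empirical distribution function.
q2 <- quantile(x, probs = 0.50, type = 1, names = FALSE)

# Arithmetic mean
xbar <- mean(x)

# Trimmed mean on the central 90%: drop 5% of the observations from each tail
xbar_trim <- mean(x, trim = 0.05)

# Mode: value with maximal frequency (the first one in case of ties)
freq <- table(x)
x_mode <- as.numeric(names(freq)[which.max(freq)])

print(c(median = q2, mean = xbar, trimmed_mean = xbar_trim, mode = x_mode))
```

In this sample the outlier pulls the mean well above the median, while the trimmed mean, which discards the smallest and the largest value, stays close to the bulk of the data.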

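1. $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

2. $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - a)^2$ for all $a \in \mathbb{R}$ (the mean minimizes the sum of squared deviations)

- the mean is the centroid of the distribution (equilibrium point)

- the mean is affected by outliers, the median is not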

[Figure: dot plot of a sample on an axis from 60 to 140]

3. $\sum_{i=1}^{n} |x_i - Q_2| \le \sum_{i=1}^{n} |x_i - a|$ for all $a \in \mathbb{R}$ (the median minimizes the sum of absolute deviations)
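
Properties 1 to 3 can be checked numerically. The sketch below uses a hypothetical sample (not the one in the figure) and compares the mean and the median against a grid of candidate centres `a`; the grid and the tolerance `1e-9` are illustrative choices.

```r
# Hypothetical sample with an odd number of values and one outlier (140)
x <- c(62, 70, 75, 81, 88, 95, 140)
xbar <- mean(x)
q2 <- quantile(x, probs = 0.50, type = 1, names = FALSE)

# Property 1: deviations from the mean sum to zero (up to rounding error)
sum_dev <- sum(x - xbar)

# Candidate centres a on a grid covering the data
a_grid <- seq(min(x), max(x), by = 0.5)
sq_loss <- sapply(a_grid, function(a) sum((x - a)^2))
abs_loss <- sapply(a_grid, function(a) sum(abs(x - a)))

# Property 2: no grid point gives a smaller sum of squares than the mean
prop2 <- all(sum((x - xbar)^2) <= sq_loss + 1e-9)

# Property 3: no grid point gives a smaller sum of absolute deviations than the median
prop3 <- all(sum(abs(x - q2)) <= abs_loss + 1e-9)

# Outlier effect: replacing 140 by 100 changes the mean but not the median
x_clean <- replace(x, x == 140, 100)
q2_clean <- quantile(x_clean, probs = 0.50, type = 1, names = FALSE)

print(c(sum_dev = sum_dev, mean = xbar, mean_clean = mean(x_clean),
        median = q2, median_clean = q2_clean))
print(c(prop2 = prop2, prop3 = prop3))
```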

(3)

Variability indices

• ranges

- range (R): max − min

- interquartile range (IQR): $Q_3 - Q_1$

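• indices based on the deviations from a central value

- variance and standard deviation ($V(X)$ or $\sigma_X^2$ or $\sigma^2$; $\mathrm{std}(X)$ or $\sigma_X$ or $\sigma$)

$V(X) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad \mathrm{std}(X) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$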

(Variance and standard deviation can also be defined with divisor $n - 1$ instead of $n$)

- mean absolute deviations (from mean and median)

$\frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}| \qquad \frac{1}{n}\sum_{i=1}^{n} |x_i - Q_2|$

• variability w.r.t. central value

- coefficient of variation: $CV(X) = \frac{\mathrm{std}(X)}{\bar{x}}$ (if $\bar{x} \ne 0$).

Properties: $\sigma \le R/2$ and $|\bar{x} - Q_2| \le \sigma$
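
A minimal R sketch of the variability indices on this slide, on a hypothetical sample chosen so that the results are round numbers. Note that R's built-in `var()` and `sd()` use the divisor $n - 1$, so the divisor-$n$ versions are computed by hand; R's `mad()` is a scaled median absolute deviation, so it is not used here. Quartiles are taken with `type = 1` to stay consistent with the definition of $Q_2$ given earlier (an assumption about the intended convention).

```r
# Hypothetical sample
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
xbar <- mean(x)

# Quartiles Q1, Q2, Q3 as inverses of the empirical distribution function
q <- quantile(x, probs = c(0.25, 0.50, 0.75), type = 1, names = FALSE)

# Ranges
r <- max(x) - min(x)      # range R
iqr <- q[3] - q[1]        # interquartile range

# Variance and standard deviation with divisor n
v_n <- mean((x - xbar)^2)
s_n <- sqrt(v_n)

# Variance with divisor n - 1 (this is what var() returns)
v_n1 <- var(x)

# Mean absolute deviations from the mean and from the median
mad_mean <- mean(abs(x - xbar))
mad_median <- mean(abs(x - q[2]))

# Coefficient of variation (meaningful only if the mean is not 0)
cv <- s_n / xbar

# The two properties stated above, checked on this sample
stopifnot(s_n <= r / 2, abs(xbar - q[2]) <= s_n)

print(c(R = r, IQR = iqr, var_n = v_n, sd_n = s_n, var_n1 = v_n1,
        mad_mean = mad_mean, mad_median = mad_median, CV = cv))
```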

(4)

Mean and variance in a population and its sub-groups

Two subgroups: A and B

$n_A$ and $n_B$ units, $f_A = n_A/n$ and $f_B = n_B/n$ relative frequencies (with $n = n_A + n_B$)

$\bar{x}_A$ and $\bar{x}_B$ means, $\sigma_A^2$ and $\sigma_B^2$ variances

[Figure: histograms of the two subgroups; x-axis from 10 to 50, y-axis "Frequency" from 0 to 40]

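$n_A = 100$, $\bar{x}_A = 9.9$, $\sigma_A^2 = 6.5$; $n_B = 300$, $\bar{x}_B = 30.0$, $\sigma_B^2 = 4.4$

Mean

$\bar{x}_{tot} = f_A\,\bar{x}_A + f_B\,\bar{x}_B$ (weighted mean)

$\bar{x}_{tot} = \frac{100}{400}\cdot 9.9 + \frac{300}{400}\cdot 30.0 = 24.98$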

(5)

Variance

[Figure: two pairs of histograms, x-axis from 10 to 50, y-axis "Frequency" from 0 to 40; above, subgroup B with $\bar{x}_B = 30.0$; below, subgroup B with $\bar{x}_B = 40.0$]

$\sigma_{tot}^2 = f_A\,\sigma_A^2 + f_B\,\sigma_B^2 + f_A\,(\bar{x}_A - \bar{x}_{tot})^2 + f_B\,(\bar{x}_B - \bar{x}_{tot})^2$

weighted variance plus weighted “variance of the means”
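
A sketch of why the two pieces appear. The index sets $G_A$ and $G_B$ (the units belonging to each subgroup) are notation introduced here for the derivation; variances use the divisor $n$, with $f_k = n_k/n$ and $\sigma_k^2 = \frac{1}{n_k}\sum_{i \in G_k}(x_i - \bar{x}_k)^2$. The cross term vanishes because deviations from a group mean sum to zero within that group.

```latex
\begin{align*}
\sigma_{tot}^2
 &= \frac{1}{n}\sum_{k \in \{A,B\}} \sum_{i \in G_k} \left(x_i - \bar{x}_{tot}\right)^2 \\
 &= \frac{1}{n}\sum_{k} \sum_{i \in G_k}
    \left[(x_i - \bar{x}_k) + (\bar{x}_k - \bar{x}_{tot})\right]^2 \\
 &= \frac{1}{n}\sum_{k} \sum_{i \in G_k} (x_i - \bar{x}_k)^2
  + \frac{2}{n}\sum_{k} (\bar{x}_k - \bar{x}_{tot})
    \underbrace{\sum_{i \in G_k} (x_i - \bar{x}_k)}_{=\,0}
  + \frac{1}{n}\sum_{k} n_k\,(\bar{x}_k - \bar{x}_{tot})^2 \\
 &= \sum_{k} f_k\,\sigma_k^2 + \sum_{k} f_k\,(\bar{x}_k - \bar{x}_{tot})^2
\end{align*}
```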

(6)

In the example:

above: $\sigma_{tot}^2 = 80.68$

below: $\sigma_{tot}^2 = 179.53$
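
The two cases can be recomputed from the rounded subgroup summaries quoted on the slides. The helper `pooled()` below is a hypothetical name introduced for this sketch. With these rounded inputs the first case reproduces 80.68; the second comes out near 174.80 rather than 179.53, so the slide's figure was presumably obtained from different (for example unrounded or re-simulated) values, which the source does not state.

```r
# Subgroup sizes and variances as quoted on the slides
n  <- c(A = 100, B = 300)
s2 <- c(A = 6.5, B = 4.4)

# Hypothetical helper: total mean and variance from subgroup summaries
pooled <- function(n, m, s2) {
  f <- n / sum(n)                      # relative frequencies f_k
  m_tot <- sum(f * m)                  # weighted mean
  within <- sum(f * s2)                # weighted variance
  between <- sum(f * (m - m_tot)^2)    # weighted "variance of the means"
  c(mean = m_tot, within = within, between = between,
    total = within + between)
}

above <- pooled(n, m = c(A = 9.9, B = 30.0), s2)  # mean of B equal to 30.0
below <- pooled(n, m = c(A = 9.9, B = 40.0), s2)  # mean of B equal to 40.0

print(round(rbind(above, below), 2))
```

Only the between component changes when subgroup B is shifted; the within component is the same in both cases.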

In general

$\bar{x}_{tot} = \sum_{k=1}^{K} f_k\,\bar{x}_k$

$\sigma_{tot}^2 = \sum_{k=1}^{K} f_k\,\sigma_k^2 + \sum_{k=1}^{K} f_k\,(\bar{x}_k - \bar{x}_{tot})^2$

total variance = within-class variance + between-class variance
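
The general decomposition can be checked on simulated data. In the sketch below, groups A and B follow the simulation settings used in these slides, while group C, the seed, and the helper `pvar()` (variance with divisor $n$) are assumptions added for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Three groups; C is a hypothetical extra group to illustrate K > 2
x <- c(rnorm(100, 10, 2.4), rnorm(300, 30, 2), rnorm(200, 40, 3))
g <- factor(rep(c("A", "B", "C"), times = c(100, 300, 200)))

# Variance with divisor n (var() would use n - 1)
pvar <- function(v) mean((v - mean(v))^2)

f_k  <- as.numeric(table(g)) / length(x)  # relative frequencies
m_k  <- tapply(x, g, mean)                # group means
s2_k <- tapply(x, g, pvar)                # group variances

m_tot   <- sum(f_k * m_k)                 # total mean
within  <- sum(f_k * s2_k)                # within-class variance
between <- sum(f_k * (m_k - m_tot)^2)     # between-class variance

# Both identities hold up to rounding error
print(all.equal(m_tot, mean(x)))
print(all.equal(within + between, pvar(x)))
```

The identity is exact only when every variance uses the divisor $n$; with `var()` and its $n - 1$ divisor the two sides would differ slightly.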

(7)

R code for white, red and blue histograms

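# Group A: 100 values from N(10, sd 2.4); group B: 300 values from N(30, sd 2)
a = rnorm(100, 10, 2.4); mean(a); var(a)
b = rnorm(300, 30, 2); mean(b); var(b)
# Group B shifted by 10 (named b_shift so that it does not reuse the name of c())
b_shift = b + 10

# Common bins; hist() assumes all simulated values fall between 1 and 50
br = seq(1, 50, .5)

# First figure: A (white) with B (red) drawn on the same axes
hist(a, breaks=br, main=" ", xlab=" ", xlim=c(5,50), ylim=c(0,40))
par(new=TRUE)
hist(b, breaks=br, main=" ", xlab=" ", xlim=c(5,50), ylim=c(0,40), col="red")
par(new=FALSE)

# Second figure: A (white) with shifted B (blue) drawn on the same axes
hist(a, breaks=br, main=" ", xlab=" ", xlim=c(5,50), ylim=c(0,40))
par(new=TRUE)
hist(b_shift, breaks=br, main=" ", xlab=" ", xlim=c(5,50), ylim=c(0,40), col="blue")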
