Statistical tests:
Comparing two distributions: the Kolmogorov-Smirnov (KS) test
Why? To establish whether 2 distributions are significantly different or whether they are drawn from the same
original distribution.
No assumption is made on the kind of distribution (no models): technically, this is a "non-parametric test".
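[Figure: cumulative distributions (N/Ntot) of the two samples; Dmax marks the largest vertical distance between them.]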
The KS test can also be used to compare an observed distribution with an expected one.
Let's assume that we got the following 4 values, [1.5, 3.3, 4.7, 6.8], which are a fair representation of the observed data set. This means that if we acquire more data, 25% are expected to have a value between 0 and 1.5, 50% between 0 and 3.3, etc.
Let's compare the observed distribution with an exponential distribution of mean = 5.
An exponential distribution of mean $\mu$ has this form:
$y = \frac{1}{\mu} e^{-x/\mu}$
On the right, the corresponding cumulative functions; for the exponential, $F(x) = 1 - e^{-x/\mu}$.
What is the fraction of expected values below 1.5, 3.3, etc. in an exponential distribution of mean = 5 (i.e. $1/\mu = 0.2$)?
$1 - e^{-0.2 \cdot 1.5} = 0.26$
$1 - e^{-0.2 \cdot 3.3} = 0.48$
$1 - e^{-0.2 \cdot 4.7} = 0.61$
$1 - e^{-0.2 \cdot 6.8} = 0.74$

Data                                   1.5    3.3    4.7    6.8
Empirical fraction, lowest estimate    0      0.25   0.50   0.75
Empirical fraction, largest estimate   0.25   0.50   0.75   1.0
Theoretical fraction                   0.26   0.48   0.61   0.74
Largest deviation                      0.26   0.23   0.14   0.26

The largest of these deviations is Dmax = 0.26.
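One must check (in tables of KS critical values) whether this distance is significant for a sample
size = 4. It is not: the tabulated 5% critical value for a sample of 4 is about 0.62, well above the measured distance. The observed and expected distributions are not significantly different.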
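The comparison with the exponential distribution can be reproduced with a few IDL statements. This is only a minimal sketch: the variable names are illustrative, and the final check against the table of critical values is still done by hand.

```idl
; Observed values and assumed exponential mean (both taken from the text)
x  = [1.5, 3.3, 4.7, 6.8]
mu = 5.0
n  = n_elements(x)

; Theoretical cumulative fraction F(x) = 1 - exp(-x/mu)
f_theo = 1.0 - exp(-x / mu)

; Empirical cumulative fraction just below and just at each data point
f_low  = findgen(n) / n
f_high = (findgen(n) + 1.0) / n

; Largest deviation at each point (">" is the IDL maximum operator)
dev  = abs(f_theo - f_low) > abs(f_theo - f_high)
dmax = max(dev)

print, 'Theoretical fraction:', f_theo, format='(A, 4F6.2)'
print, 'Largest deviation:   ', dev, format='(A, 4F6.2)'
print, 'Dmax = ', dmax, format='(A, F5.2)'
```

Each data point is compared with both the lower and the upper step of the empirical cumulative function, because the largest distance can occur on either side of the step.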
The $\chi^2$ test also allows one to compare two
distributions to verify whether they are compatible, provided that one can divide each distribution into classes.
Let's suppose that a typical environment (e.g. galaxy groups) is characterized by the following
morphological content:
10% Ellipticals, 20% S0s, 30% Early Spirals (i.e. Sa to
Sb) and 40% Late Spirals (i.e. Sbc and later).
Let's suppose that I have 3 distinct galaxy samples, and that for each of them I want to check whether its morphological content is compatible with that of the typical environment.
Sample 1: 18 E, 35 S0, 83 S early, 104 S late
Total size 240 (E 8%, S0 15%, S early 34%, S late 43%)
Sample 2: 27 E, 50 S0, 80 S early, 120 S late
Total size 277 (E 10%, S0 18%, S early 29%, S late 43%)
Sample 3: 12 E, 20 S0, 100 S early, 130 S late
Total size 262 (E 4%, S0 8%, S early 38%, S late 50%)
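$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ and $E_i$ are the observed and expected counts in class $i$.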
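Sample 1:
Observed   18   35   83   104
Expected   24   48   72    96
$\chi^2 = \frac{(18-24)^2}{24} + \frac{(35-48)^2}{48} + \frac{(83-72)^2}{72} + \frac{(104-96)^2}{96}$
$\chi^2 = \frac{36}{24} + \frac{13^2}{48} + \frac{11^2}{72} + \frac{64}{96}$
$\chi^2 = 7.37$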
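Sample 2:
Observed   27   50   80   120
Expected   28   55   83   111
$\chi^2 = \frac{(27-28)^2}{28} + \frac{(50-55)^2}{55} + \frac{(80-83)^2}{83} + \frac{(120-111)^2}{111}$
$\chi^2 = \frac{1}{28} + \frac{25}{55} + \frac{9}{83} + \frac{81}{111}$
$\chi^2 = 1.33$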
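Sample 3:
Observed   12   20   100   130
Expected   26   52    79   105
$\chi^2 = \frac{(12-26)^2}{26} + \frac{(20-52)^2}{52} + \frac{(100-79)^2}{79} + \frac{(130-105)^2}{105}$
$\chi^2 = \frac{14^2}{26} + \frac{32^2}{52} + \frac{21^2}{79} + \frac{25^2}{105}$
$\chi^2 = 38.77$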
The degrees of freedom are $\nu = k - m - 1 = 3$, with $k = 4$ (the
classes) and $m = 0$ (as no parameter has been derived from the data).
$\chi^2_{0.05}(\nu = 3) = 7.82$
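Looking at the 3rd row of the table ($\nu = 3$), we see that Sample 1 can be considered different from the typical environment at the 90% c.l. (but not at the 95% c.l.),
Sample 2 only at a very low c.l., i.e. it is compatible with the typical environment, and Sample 3 at more than the 99.5% c.l.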
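The same test can be sketched as a short IDL main-level program (save it in a file, e.g. the hypothetical chi2_samples.pro, and execute it with .run). It assumes the core IDL routines CHISQR_PDF and CHISQR_CVF are available. Note that the expected counts are not rounded to integers here, so the resulting values can differ slightly from a hand calculation with rounded counts.

```idl
; Typical morphological fractions: E, S0, early S, late S (from the text)
frac = [0.10, 0.20, 0.30, 0.40]

; Observed counts, one row per sample (from the text)
obs = [[18., 35.,  83., 104.], $
       [27., 50.,  80., 120.], $
       [12., 20., 100., 130.]]

k  = n_elements(frac)   ; number of classes
m  = 0                  ; no parameter derived from the data
nu = k - m - 1          ; degrees of freedom

for s = 0, 2 do begin
  o    = obs[*, s]                 ; observed counts of sample s+1
  e    = total(o) * frac           ; expected counts (not rounded)
  chi2 = total((o - e)^2 / e)
  cl   = chisqr_pdf(chi2, nu)      ; P(X <= chi2) for nu degrees of freedom
  print, s + 1, chi2, 100.0 * cl, $
    format='("Sample ", I1, ": chi2 = ", F6.2, ", c.l. = ", F5.1, "%")'
endfor

; Threshold exceeded with probability 0.05 for nu degrees of freedom
print, 'chi2 threshold (0.05): ', chisqr_cvf(0.05, nu)
end
```

Here the confidence level at which a sample can be called different is taken as the probability that a $\chi^2$ variable with $\nu$ degrees of freedom is smaller than the measured value, which replaces the look-up in the printed table.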
IDL
If we want a graphics window whose contents are not erased when another window is placed on top of it, we must type
IDL> window, 0, retain=2
We can also change the size of the window by adding the following keywords to the previous command:
IDL> window, 0, retain=2, xsize=1000, ysize=1000
Exercise 3
Make a plot of the functions sin x and cos x (with x ranging between −… and …) and of their sum.
All graphs must be in the same plot and have different colors.
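One possible way to overplot several curves in different colors is sketched below. The range of x is an ASSUMPTION (from -pi to pi), since the limits are not legible in the text. The color values also assume a 24-bit display with decomposed color switched on.

```idl
; ASSUMPTION: x runs from -pi to pi; adjust to the range required by the exercise
x = -!pi + 2.0 * !pi * findgen(201) / 200.0

device, decomposed=1      ; interpret colors as 24-bit 'BBGGRR'x values
window, 0, retain=2

; The first PLOT sets up the axes; the y range must contain the sum,
; which reaches +/- sqrt(2)
plot,  x, sin(x), yrange=[-1.5, 1.5], xtitle='x', ytitle='y'
oplot, x, sin(x),          color='0000FF'x   ; red
oplot, x, cos(x),          color='00FF00'x   ; green
oplot, x, sin(x) + cos(x), color='FF0000'x   ; blue
```

PLOT erases the window and draws new axes, while OPLOT adds a curve to the existing axes, which is why only the first call uses PLOT.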
Exercise 4
Make on the same page 3 distinct plots representing the following functions:
$y = e^{x}$, $y = \log x$, $y = e^{-x} \cos x$
Save the result in a PostScript file.
How to make a multiple plot?
IDL> !p.multi=[0,2,3]   ---> 2 columns, 3 rows
IDL> !p.multi=[0,1,2]   ---> 1 column, 2 rows
How to save the result in a PostScript file?
IDL> set_plot, 'PS'
IDL> device, filename='namechosen.ps'
IDL> plot, x, y
IDL> device, /close
IDL> set_plot, 'X'
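The two recipes above can be combined, for instance to put three plots in one column of a PostScript page. This is a minimal sketch under a few assumptions: the file name three_plots.ps and the x grid are hypothetical, and "log" is taken to be the natural logarithm (ALOG; use ALOG10 for base 10).

```idl
old_dev = !d.name                   ; remember the current device (e.g. 'X')
x = 0.1 + 0.05 * findgen(200)       ; x > 0, so that log x is defined

set_plot, 'PS'
device, filename='three_plots.ps'   ; hypothetical file name
!p.multi = [0, 1, 3]                ; 1 column, 3 rows

plot, x, exp(x),           title='y = exp(x)'
plot, x, alog(x),          title='y = log x'
plot, x, exp(-x) * cos(x), title='y = exp(-x) cos x'

device, /close                      ; close the PostScript file
set_plot, old_dev                   ; go back to the previous device
!p.multi = 0                        ; back to one plot per page
```

Saving the device name before switching to 'PS' avoids hard-coding 'X', which is not the screen device on every platform.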