Some thoughts after the first lecture
Eva Riccomagno, Maria Piera Rogantin
DIMA – Universit`a di Genova
http://www.dima.unige.it/~rogantin/IIT/
Participant interests
The most popular
ANOVA (1/2-way, ANOVA on rank, non-parametric) 10
t-test (parametric and non-parametric) 7
Multivariate (multiple) analysis (multifactorial, PCA) 4
Sample size 2
Other
Outliers 1
Kolmogorov-Smirnov 1
Repeated measure 1
Chi-square 1
and ... the whole world 4
From
https://en.wikipedia.org/wiki/Exploratory_data_analysis
John W. Tukey wrote the book Exploratory Data Analysis in 1977. Tukey held that too much emphasis in statistics was placed on statistical hypothesis testing (confirmatory data analy- sis); more emphasis needed to be placed on using data to suggest hypotheses to test. In particular, he held that confusing the two types of analyses and employing them on the same set of data can lead to systematic bias owing to the issues inherent in testing hypotheses suggested by the data.
The objectives of EDA are to:
• Suggest hypotheses about the causes of observed phenomena
• Assess assumptions on which statistical inference will be based
• Support the selection of appropriate statistical tools and tech- nique
• Provide a basis for further data collection through surveys or experiments
Many EDA techniques have been adopted into data mining, as well as into big data analytics. They are also being taught to young students as a way to introduce them to statistical thinking.
Explorative data analysis
analysis of the available data
Inferential statistics
analysis of the available sample
:
probability
:
information about population or pro- cess