Course title: Basic statistics for research

Instructors: Eva Riccomagno and Maria Piera Rogantin

Description: The course consists of eight four-hour meetings and covers two main topics, exploratory data analysis and statistical inference. Each meeting includes a theoretical session and a practical session based on the software R. The practical sessions are based on datasets from case studies used to exemplify the theory. Participants are encouraged to provide datasets related to their research interests. Weekly assignments will be set to practice the techniques illustrated. Assignments will include bookwork exercises and data analyses with the software.

Essential probability concepts are introduced as they are needed throughout the second part.

At the end of the course students will be able to apply the techniques learnt to their own datasets and to judge whether those techniques are suitable for analysing them.

Course structure

First part: univariate and multivariate exploratory data analysis

Lecture 1 [October 28th]: Introduction to the course and to the software R: quantitative and qualitative variables, basic data representation including graphical representations, the data matrix. Categorical and qualitative data. Row and column profiles. Barplot. Data structures in R, reading datasets into R, operations with data, for loops.

Lecture 2 [November 4th]: Analysis of univariate data and related computations in R: frequency distributions (percentage distribution, cumulative distribution); centrality measures (mean, median and mode); dispersion measures (range, percentiles and quantiles, variance, standard deviation), also in subgroups. Histograms, dot-plots, box-plots. Correlation and Pearson's r for bivariate data.

Lecture 3 [November 11th]: Cluster analysis: distance measures, hierarchical aggregation, aggregation index and dendrogram, decomposition of inertia. R code and output interpretation.

Second part: statistical inference

Lecture 4 [November 18th]: Fundamental concepts in parametric inference (point estimation, confidence sets and hypothesis testing). Point estimate of mean and proportions. Law of large numbers. The binomial random variable. Introduction to statistical hypothesis testing (formulation of null and alternative hypotheses, choice of the test statistic, significance level and rejection region). Type I and II errors.

Lecture 5 [November 25th]: Hypothesis tests on a population mean. The normal random variable and the central limit theorem. Composite hypotheses, power function and p-value. Sample size. The z-test, t-test and Wald test. Test on a proportion.

Lecture 6 [December 12th]: Test for the equality of means (paired and non-paired samples). More on the normal random variable. Multiple tests. Abuse and misuse of statistical hypothesis testing in scientific research. Confidence intervals (one- and two-sided, confidence levels, relationship with tests). Distribution-free hypothesis tests I: examples of distribution-free statistics, the sign test, the Wilcoxon-Mann-Whitney test.

Lecture 7 [December 16th]: Distribution-free hypothesis tests II: goodness-of-fit tests (chi-square goodness-of-fit tests, one- and two-sample Kolmogorov-Smirnov goodness-of-fit tests, warnings on the use of non-parametric testing procedures). Linear models and ANOVA I: introduction, inference on the coefficients, inference on the mean responses, analysis of the residuals.

Lecture 8 [December 20th]: Linear models and ANOVA II: tests of subsets of coefficients, prediction of the response and related error. ANOVA: univariate (one-way, two-way for crossed factors and two-way for nested factors), the Kruskal-Wallis test. MANOVA: multivariate ANOVA with and without repeated measures.

Notes: Slides, datasets, bibliographical references and some of the assignments are available at http://www.dima.unige.it/~rogantin/IIT/

December 19, 2016
