BIVARIATE BOXPLOTS, MULTIPLE OUTLIERS, MULTIVARIATE TRANSFORMATIONS
AND DISCRIMINANT ANALYSIS:
THE 1997 HUNTER LECTURE
ANTHONY C. ATKINSON
1{ AND MARCO RIANI
21
The London School of Economics, London WC2A 2AE, UK
2
Istituto di Statistica, UniversitaÁ di Parma, Italy
SUMMARY
Outliers can have a large in¯uence on the model ®tted to data. The models we consider are the trans- formation of data to approximate normality and also discriminant analysis, perhaps on transformed observations. If there are only one or a few outliers, they may often be detected by the deletion methods associated with regression diagnostics. These can be thought of as `backwards' methods, as they start from a model ®tted to all the data. However such methods become cumbersome, and may fail, in the presence of multiple outliers. We instead consider a `forward' procedure in which very robust methods, such as least median of squares, are used to select a small, outlier free, subset of the data. This subset is increased in size using a search which avoids the inclusion of outliers. During the forward search we monitor quantities of interest, such as score statistics for transformation or, in discriminant analysis, misclassi®cation prob- abilities. Examples demonstrate how the method very clearly reveals structure in the data and ®nds in¯uential observations, which appear towards the end of the search. In our examples these in¯uen- tial observations can readily be related to patterns in the original data, perhaps after transformation.
# 1997 John Wiley & Sons, Ltd.
Environmetrics, 8, 583±602 (1997)
No. of Figures: 12. No. of Tables: 1. No of References: 23.
KEY WORDS