• Non ci sono risultati.

Computational Statistics: Editorial

N/A
N/A
Protected

Academic year: 2021

Condividi "Computational Statistics: Editorial"

Copied!
3
0
0

Testo completo

(1)

Computational Statistics (2006) 21:183-185 D O I 10.1007/s00180-006-0258-7

9 P h y s i c a - V e r l a g 2006

E d i t o r i a l

W h e n I have been invited as guest e d i t o r for this special issue, I re'Mized t h a t this i n v i t a t i o n arrived ten years after my first s t e p s in t h e s t a t i s t i c a l analysis for i n t e r v a l - v a l u e d d a t a .

I c a m e in touch with Edwin D i d a y and o t h e r people working in Symbolic D a t a in 1995; immediately, I was e x t r e m e l y fascinated by the c o m p l e x - d a t a p o t e n t i a l for c a r r y i n g information. Before long, I s t a r t e d focusing m y a t t e n - tion on continuous variables valued as intervals. However, ten years later I feel strongly a n d even more t h a t in real world d a t a are as t h e y are: we do have neither single-valued nor interval-valued variables. N a t u r e of variables d e p e n d s on our c a p a b i l i t i e s to measure them. T h a n k s to the ideas of E d w i n D i d a y and to his r e v o l u t i o n a r y Symbolic D a t a Analysis a p p r o a c h , I saw t h a t in parallel to the t a n g i b l e world there exists a n o t h e r world t h a t c a n n o t be di- rectly measured. I refer to d a t a t h a t are related to the definition of concepts. T h e s e d a t a can exist only in t e r m s of interval d a t a ; in fact, these definitions are g e n e r a t e d to identify a set of s t a t i s t i c a l units with their n a t u r a l variabil- ity. Working on i n t e r v a l - d a t a s t a t i s t i c a l analysis, I ran across o t h e r cases in which interval d a t a coding can be profitable.

On the above premise, we can d i s t i n g u i s h different sources of interval d a t a . A p p r o p r i a t e identification of interval d a t a n a t u r e is ticklish p r o b l e m and it is necessary to a d o p t consistent analysis t r e a t m e n t s . To simplify the descrip- tion, we can group i n t e r v a l - d a t a sources into t h r e e m a i n categories.

Interval d a t a g e n e r a t e d by imprecise m e a s u r e m e n t s or r e p e a t e d measures. T h i s condition refers to all cases in which we are not able to o b t a i n precise measures of the investigated variables. In o r d e r to avoid t h e v a r i a b i l i t y in- d u c e d by the m e a s u r e m e n t errors inflates the a c t u a l p h e n o m e n a variability, variables can be coded in t e r m s of interval d a t a , where the lower a n d up- p e r limit indicate the lowest a n d t h e highest registered measure, respectively. U n d e r this condition, interval s p r e a d s are e x t r e m e l y n a r r o w with respect the m e a s u r e d values.

A second s i t u a t i o n occurs when d a t a refer to concepts having not p r a c t i - cally m e a s u r a b l e dimensions. A classical e x a m p l e is given by the technical specifications or by the d e s c r i p t i o n of species and different p r o d u c t varieties, more generally. Let us t h i n k to an a n i m a l species or to a kind of wine: re- lated descriptions are necessarily "imprecise" in order to value t h e v a r i a b i l i t y a c c o u n t a b l e to. In t h e same c o n t e x t we can also include interval d a t a result- ing from queries to large or huge d a t a b a s e s , where t h e p a i r m i n i m u m and m a x i m u m value replaces the mean value.

L a s t condition refers to intervals associated to couples of variables t h a t arc c o m p l e m e n t a r y w i t h respect to a given concept. T r a d e flow d a t a for a given

(2)

184

good are defined by the couple of variables: import, export. Analogously, data collected in the customer satisfaction surveys, where expected and per- ceived satisfaction scores are indicated with respect to a set of given items. It turns out clear that the different interval data sources share the properties of being described by a pair of ordered values. It is also evident that these different data sources require different treatments according to their specific nature.

This CS special issue intends to present the different aspects in the statistical treatment of interval valued variables as much as possible. Selected papers are characterized with respect their methodological contribution and their application field. There are three domains of interest: factorial analysis, classification and clustering, linear regression.

I received twenty papers; among these, on the basis of the referees reports and on my personal evaluation ten were selected. Unfortunately, for sake of space some interesting papers have been discarded. Reader will notice that papers are not presented according to the first author alphabetic order; they have been classed according to their methodological approach through statistical analysis of interval data.

The paper from Diday and Billiard opens the issue presenting some basic definition and the most basically concept for interval data. Authors present the descriptive statistics for interval data.

Four papers are focused on clustering approaches dealing with interval data. The paper from Chavent et al. and the paper from De Carvalho et al. present some dynamic clustering algorithms. In their paper, D'Urso and Giordani face the problem of the outliers effects in classification. Their attention is focused on a robust classification algorithm. In the clustering framework, the paper i'rom Irpino and Tontodonato proposes the classical two step analysis (factorial analysis + clustering) generalized to interval data. They discuss many aspects related to the distance measures and their consistency in the interval data analysis approach.

Three papers concern to various issues in classification. Duarte-Silva and Brito present an interesting comparison among linear discriminant analysis methods. In this paper are compared the three most important approaches in the interval data coding. Authors highlight very well advantages and drawbacks of the three approaches. Most of their results can be general- ized to other analysis fields. In their paper, Ichino and Ishicawa introduce the Cartesian System Model for inter class analysis in pattern classification problems. Empirical proof of the interval-data statistical analysis usefulness is underlined by the paper of Cariou. The author presents a classification tree approach in the electrical consumption profiling.

(3)

185

data analysis problem using approaches derived from the interval arithmetic theory. More specifically, Gioia and Lauro focus their attention on the Prin- cipal Component Analysis and propose a new and original approach treating intervals as they arc without any coding. Corsaro and Marino propose a complete and critique overview of the interval arithmetic methods for solv- ing equations systems. Their attention is mainly focused on those problems that are more frequent in the statistical analysis. In particular they present a comparison among three interval arithmetic based methods for computing linear regression parameters.

However, it is worth to say that manifested interests through interval-data statistical treatments still do not have equal in real world problems. In actual fact, despite their consistency with the real world phenomena, there is a small number of applications of interval data analysis in facing real world problems. Among twenty papers, I received only two papers focused on practical aspects and only one is in this issue.

Although many topics have bccn touched, there are many others of intcr- est or potentially interesting for the interval-data statistical treatment. It appears quite clear that classical approaches based on the analytical prob- lem solutions cannot face with the complexity linked to the interval-data. Future challenges will point the attention on the operational research and lincar and non-linear programming techniques. Interesting developments in this direction have already manifested in the interval data treatment and in many other statistical fields.

My special thanks are due to more thirty referees I have involved in the revision process. Their comments were accurate and rigorous; they made the job easier form me. I am also indebted with Rosaria Romano for her help in keeping contacts with authors and referees.

Mace~uta, April 2006

Francesco Palurnbo

Dipartirnento di Istituzioni Econorniche e Finanziarie University of Macerata - I T A L Y

Riferimenti

Documenti correlati

The Minicorsi of Mathematical Analysis have been held at the University of Padova since 1998, and the subject of the Lectures ranges in various areas of Mathematical Analysis

GUHA-method in Data Mining, Pavelka Style Fuzzy Logic, Many-valued Similarity and Their Applications in Real

Of those surveyed, only 12% said that communication with the general public is truly relevant, whereas a limited 10% said that the relation with general journalists is

I pubblici con cui gli scienziati e i tecnici del Regno Unito ritengono più facile comunicare sono i giornalisti scientifici e gli uomini dell’industria (29%

Infine, gli imprenditori o i lavoratori autonomi (spesso coinvolti in interventi di promozione della creazione di impresa o di rafforzamento dell’imprenditorialità) sono presenti

Use of all other works requires consent of the right holder (author or publisher) if not exempted from copyright protection by the

I riferimenti a cui ispirarsi per la loro azienda agricola di quasi 500 ha, i Salvagnin li hanno trovati verso la fine degli anni 90 negli Sta- ti Uniti, dove in agricoltura si

(ii) Ad- ditive Manufacturing allows the connection between digital 3d printers and software development, (iii) Augmented Reality applied to support manufactur- ing processes,