THE MEANING OF KURTOSIS, THE INFLUENCE FUNCTION AND AN EARLY INTUITION BY L. FALESCHINI*

A.M. Fiori, M. Zenga

1. INTRODUCTION

The oldest and most common measure of kurtosis is the standardized fourth moment coefficient:

$$\beta_2 = E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right] = \frac{\mu_4}{\mu_2^2} \tag{1}$$

which was introduced by K. Pearson (Pearson, 1905).
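As a minimal sketch of how the coefficient in (1) can be evaluated for a frequency distribution, the following Python fragment may help. The function names `central_moment` and `kurtosis_coefficient` and the two toy distributions are illustrative assumptions, not part of the original paper.

```python
def central_moment(xs, fs, s):
    """Central moment of order s for values xs observed with frequencies fs."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    return sum((x - mean) ** s * f for x, f in zip(xs, fs)) / total


def kurtosis_coefficient(xs, fs):
    """Pearson's coefficient: fourth central moment over the squared variance."""
    return central_moment(xs, fs, 4) / central_moment(xs, fs, 2) ** 2


# A symmetric two-point distribution (hypothetical example data).
two_point = kurtosis_coefficient([-1.0, 1.0], [1, 1])

# Five equally spaced, equally frequent values (hypothetical example data).
uniform5 = kurtosis_coefficient([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])

print(two_point, uniform5)
```

The two-point case sits at the lower bound of the coefficient, while the flat five-point distribution gives a value well below that of the normal curve.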

There has been much discussion over whether $\beta_2$ measures peakedness, tail weight or the “shoulders” of a distribution. A convenient way of grasping the meaning of $\beta_2$ is to ask how $\beta_2$ changes as observations are added to an existing distribution. To our knowledge, the author who first answered this question was Luigi Faleschini in 1948.

In this work we rediscover Faleschini’s methodology and show how it can be used to assess the relative importance of tail heaviness and peakedness in determining $\beta_2$ (section 2). The implications of Faleschini’s approach for a normal distribution are detailed in section 3. In section 4 we establish an unexpected connection between a partial derivative computed by Faleschini and the influence function of $\beta_2$, which appeared much later in the statistical literature (Hampel, 1968 and 1974; Ruppert, 1987). Our conclusions are outlined in section 5, where a short parallel is drawn between Faleschini’s methodology and Zenga’s new approach to kurtosis (Zenga, 1996 and 2006).

* This paper reflects the common thinking of the authors, although, more specifically, sections 1, 3 and 5 are due to M. Zenga, sections 2 and 4 to A. M. Fiori.

2. FALESCHINI’S DERIVATIVE AND $\beta_2$

Faleschini (1948) proposed the following methodology. Consider a statistical variable X consisting of n real values, $x_1, x_2, \ldots, x_n$, sorted in increasing order ($x_1 < x_2 < \ldots < x_n$). Denote the corresponding frequencies by $f_1, f_2, \ldots, f_n$ and the total size of the distribution by $P = \sum_{r=1}^{n} f_r$. To observe how $\beta_2$ is changed by observations added at some particular point, say $x_r$, compute the partial derivative of $\beta_2$ with respect to the altered frequency $f_r$. According to the sign of $\partial\beta_2/\partial f_r$ it is easy to identify the ranges where adding new observations raises (lowers) $\beta_2$. As a consequence, the centre, flanks and tails of a distribution may be quantitatively specified and their relative importance in determining $\beta_2$ may be gauged from the absolute value of $\partial\beta_2/\partial f_r$.
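The idea of adding observations at a point and watching the kurtosis coefficient move can be illustrated numerically. The sketch below approximates the partial derivative by a one-sided finite difference; the helper names, the step `delta` and the binomial-shaped frequencies are assumptions made for illustration only.

```python
def kurtosis_coefficient(xs, fs):
    """Fourth central moment over squared variance for a frequency distribution."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    mu2 = sum((x - mean) ** 2 * f for x, f in zip(xs, fs)) / total
    mu4 = sum((x - mean) ** 4 * f for x, f in zip(xs, fs)) / total
    return mu4 / mu2 ** 2


def finite_difference(xs, fs, r, delta=1e-6):
    """Approximate change in the coefficient per unit of frequency added at xs[r]."""
    bumped = list(fs)
    bumped[r] += delta
    return (kurtosis_coefficient(xs, bumped) - kurtosis_coefficient(xs, fs)) / delta


# Hypothetical symmetric frequency distribution (binomial weights on 7 points).
xs = [-3, -2, -1, 0, 1, 2, 3]
fs = [1, 6, 15, 20, 15, 6, 1]

# Sign of the approximate derivative at each value of the variable.
signs = ["+" if finite_difference(xs, fs, r) > 0 else "-" for r in range(len(xs))]
print(list(zip(xs, signs)))
```

For this example the sign is positive at the extreme values and at the central value, and negative in between, which is the centre, flanks and tails pattern described above.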

We now turn to the computation of Faleschini’s derivative. Referring to the binomial theorem:

$$(a-b)^k = \sum_{i=0}^{k} (-1)^i \binom{k}{i} a^{k-i} b^i \tag{2}$$

Faleschini used the following representation of a central moment of order s (s = 1, 2, 3, 4, ...):

$$\mu_s = \frac{1}{P}\sum_{r=1}^{n} (x_r-\mu)^s f_r = \sum_{i=0}^{s} (-1)^i \binom{s}{i} m_{s-i}\,\mu^i \tag{3}$$

where:

$$m_{s-i} = \frac{1}{P}\sum_{r=1}^{n} x_r^{s-i} f_r$$

is the moment of order (s − i) from the origin and $\mu$ stands for the arithmetic mean ($\mu \equiv m_1$). Elementary rules of differentiation lead to the partial derivatives of $m_{s-i}$ and $\mu^i$ with respect to $f_r$:

$$\frac{\partial m_{s-i}}{\partial f_r} = \frac{x_r^{s-i}}{P} - \frac{m_{s-i}}{P} = \frac{1}{P}\left(x_r^{s-i} - m_{s-i}\right) \tag{4}$$

$$\frac{\partial \mu^i}{\partial f_r} = i\,\mu^{i-1}\frac{\partial \mu}{\partial f_r} = i\,\mu^{i-1}\left(\frac{x_r}{P} - \frac{\mu}{P}\right) = \frac{i}{P}\,\mu^{i-1}(x_r-\mu). \tag{5}$$

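Combining (4) and (5) according to (3), we obtain the partial derivative of $\mu_s$ with respect to $f_r$:

$$\begin{aligned}
\frac{\partial \mu_s}{\partial f_r} &= \sum_{i=0}^{s} (-1)^i \binom{s}{i}\left[\mu^i\,\frac{\partial m_{s-i}}{\partial f_r} + m_{s-i}\,\frac{\partial \mu^i}{\partial f_r}\right] \\
&= \frac{1}{P}\sum_{i=0}^{s} (-1)^i \binom{s}{i}\left[(x_r^{s-i} - m_{s-i})\,\mu^i + i\,m_{s-i}\,\mu^{i-1}(x_r-\mu)\right] \\
&= \frac{1}{P}\left\{\sum_{i=0}^{s} (-1)^i \binom{s}{i} x_r^{s-i}\mu^i - \sum_{i=0}^{s} (-1)^i \binom{s}{i} m_{s-i}\,\mu^i - s\,(x_r-\mu)\sum_{i=1}^{s} (-1)^{i-1}\frac{(s-1)!}{(i-1)!\,(s-i)!}\, m_{s-i}\,\mu^{i-1}\right\}.
\end{aligned} \tag{6}$$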

With the substitution j = (i − 1) the last summation is equal to:

$$\sum_{j=0}^{s-1} (-1)^j \binom{s-1}{j} m_{(s-1)-j}\,\mu^j$$

and (6) reduces, using (2) and (3), to the simple expression:

$$\frac{\partial \mu_s}{\partial f_r} = \frac{1}{P}\left\{(x_r-\mu)^s - \mu_s - s\,\mu_{s-1}(x_r-\mu)\right\} \tag{7}$$

the leading term being the s-th power of the distance $(x_r-\mu)$.
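Formula (7) can be checked numerically. The following sketch compares it with a central finite difference of the s-th central moment for s = 2, 3, 4; the function names, the step `h` and the skewed example data are hypothetical choices, not taken from the paper.

```python
def central_moment(xs, fs, s):
    """Central moment of order s for values xs with frequencies fs."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    return sum((x - mean) ** s * f for x, f in zip(xs, fs)) / total


def derivative_by_formula(xs, fs, r, s):
    """Right-hand side of (7) for the frequency attached to xs[r]."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    d = xs[r] - mean
    return (d ** s - central_moment(xs, fs, s)
            - s * central_moment(xs, fs, s - 1) * d) / total


def derivative_by_difference(xs, fs, r, s, h=1e-5):
    """Central finite difference of the s-th central moment with respect to f_r."""
    up, down = list(fs), list(fs)
    up[r] += h
    down[r] -= h
    return (central_moment(xs, up, s) - central_moment(xs, down, s)) / (2 * h)


# Hypothetical skewed frequency distribution.
xs = [0.0, 1.0, 2.0, 5.0]
fs = [4.0, 3.0, 2.0, 1.0]

# Largest discrepancy between the closed form and the numerical derivative.
max_gap = max(
    abs(derivative_by_formula(xs, fs, r, s) - derivative_by_difference(xs, fs, r, s))
    for r in range(len(xs))
    for s in (2, 3, 4)
)
print(max_gap)
```

The discrepancy should be at the level of numerical noise, which supports the closed form without proving it.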

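Since $\beta_2$ is a ratio of central moments, a standard differentiation argument implies that:

$$\begin{aligned}
\frac{\partial \beta_2}{\partial f_r} &= \frac{\partial}{\partial f_r}\left(\frac{\mu_4}{\mu_2^2}\right) = \frac{1}{\mu_2^4}\left(\mu_2^2\,\frac{\partial \mu_4}{\partial f_r} - 2\mu_2\mu_4\,\frac{\partial \mu_2}{\partial f_r}\right) \\
&= \frac{1}{P\,\mu_2^3}\left\{\mu_2\left[(x_r-\mu)^4 - \mu_4 - 4\mu_3(x_r-\mu)\right] - 2\mu_4\left[(x_r-\mu)^2 - \mu_2\right]\right\}.
\end{aligned}$$

If the standardized quantity $(x_r-\mu)/\sqrt{\mu_2}$ is denoted by $z_r$ and the skewness coefficient $\mu_3/\mu_2^{3/2}$ by $\gamma_1$, the partial derivative of $\beta_2$ with respect to $f_r$ may be written as:

$$\frac{\partial \beta_2}{\partial f_r} = \frac{1}{P}\left(z_r^4 - \beta_2 - 4\gamma_1 z_r - 2\beta_2 z_r^2 + 2\beta_2\right) = \frac{1}{P}\left[(z_r^2-\beta_2)^2 - \beta_2(\beta_2-1) - 4\gamma_1 z_r\right]. \tag{8}$$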
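A short sketch, under the same illustrative assumptions as before (hypothetical function names and data), evaluates the closed form (8) and compares it with a numerical derivative of the kurtosis coefficient with respect to each frequency.

```python
import math


def moments(xs, fs):
    """Return (P, mean, mu2, mu3, mu4) of a frequency distribution."""
    total = sum(fs)
    mean = sum(x * f for x, f in zip(xs, fs)) / total
    mu = [sum((x - mean) ** s * f for x, f in zip(xs, fs)) / total for s in (2, 3, 4)]
    return total, mean, mu[0], mu[1], mu[2]


def kurtosis_coefficient(xs, fs):
    _, _, mu2, _, mu4 = moments(xs, fs)
    return mu4 / mu2 ** 2


def faleschini_derivative(xs, fs, r):
    """Closed form (8): derivative of the coefficient with respect to f_r."""
    total, mean, mu2, mu3, mu4 = moments(xs, fs)
    beta2 = mu4 / mu2 ** 2
    gamma1 = mu3 / mu2 ** 1.5
    z = (xs[r] - mean) / math.sqrt(mu2)
    return ((z * z - beta2) ** 2 - beta2 * (beta2 - 1) - 4 * gamma1 * z) / total


def numeric_derivative(xs, fs, r, h=1e-5):
    """Central finite difference of the coefficient with respect to f_r."""
    up, down = list(fs), list(fs)
    up[r] += h
    down[r] -= h
    return (kurtosis_coefficient(xs, up) - kurtosis_coefficient(xs, down)) / (2 * h)


# Hypothetical skewed frequency distribution.
xs = [0.0, 1.0, 2.0, 5.0]
fs = [4.0, 3.0, 2.0, 1.0]

gaps = [abs(faleschini_derivative(xs, fs, r) - numeric_derivative(xs, fs, r))
        for r in range(len(xs))]
print(max(gaps))
```

A skewed example is used on purpose, so that the term in the skewness coefficient is exercised as well.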

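In the symmetric case ($\gamma_1 = 0$) closed-form solutions are available for the equation $\partial\beta_2/\partial f_r = 0$. Writing:

$$(z_r^2 - \beta_2)^2 = \beta_2(\beta_2 - 1)$$

we have:

$$z_r^2 = \beta_2 \pm \sqrt{\beta_2(\beta_2-1)}.$$

Hence:

$$z_r = \pm\sqrt{\beta_2 \pm \sqrt{\beta_2(\beta_2-1)}}$$

which is equivalent, for $\sigma = \sqrt{\mu_2}$, to the four solutions:

$$x_r^{(1,2,3,4)} = \mu \pm \sigma\sqrt{\beta_2 \pm \sqrt{\beta_2(\beta_2-1)}}.$$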

Owing to the inequality $\beta_2 \ge 1$ (see Stuart and Ord, 1994) the above roots are always real and break up the range of the statistical variable X into five regions:

Region I: $x_1 \le X \le x_r^{(1)}$, where $\partial\beta_2/\partial f_r$ is positive
Region II: $x_r^{(1)} \le X \le x_r^{(2)}$, where $\partial\beta_2/\partial f_r$ is negative
Region III: $x_r^{(2)} \le X \le x_r^{(3)}$, where $\partial\beta_2/\partial f_r$ is positive
Region IV: $x_r^{(3)} \le X \le x_r^{(4)}$, where $\partial\beta_2/\partial f_r$ is negative
Region V: $x_r^{(4)} \le X \le x_n$, where $\partial\beta_2/\partial f_r$ is positive.

New observations lower $\beta_2$ if added within regions II and IV, raise it if added within regions I, III and V. Regions II and IV can therefore be identified with the flanks, region III with the centre, regions I and V with the tails. Since $\partial\beta_2/\partial f_r$ is of order $z_r^4$, it grows very rapidly as $x_r$ decreases below $x_r^{(1)}$ or increases beyond $x_r^{(4)}$. Hence it is clear that $\beta_2$ is primarily a measure of tail behavior and only to a lesser extent of peakedness.
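The partition into centre, flanks and tails for a symmetric distribution can be sketched in a few lines. The function names and the treatment of the boundary points (where the derivative vanishes) are assumptions of this sketch; the example value 1.8 is the kurtosis coefficient of a continuous uniform distribution, a standard fact not stated in the paper.

```python
import math


def standardized_roots(beta2):
    """Positive roots (inner, outer) of the derivative in z units, symmetric case."""
    if beta2 < 1:
        raise ValueError("the kurtosis coefficient cannot be smaller than 1")
    disc = math.sqrt(beta2 * (beta2 - 1))
    return math.sqrt(beta2 - disc), math.sqrt(beta2 + disc)


def region(z, beta2):
    """Classify a standardized value as centre, flank or tail.

    Boundary points are assigned to the flanks here, an arbitrary convention.
    """
    inner, outer = standardized_roots(beta2)
    a = abs(z)
    if a < inner:
        return "centre"
    if a <= outer:
        return "flank"
    return "tail"


# Example with the coefficient of a uniform distribution (assumed value 1.8).
inner, outer = standardized_roots(1.8)
labels = [region(z, 1.8) for z in (0.0, 1.0, 2.0)]
print(inner, outer, labels)
```

For this value the outer root equals the square root of 3, which is also the standardized endpoint of a uniform distribution, so such a distribution has no mass in the tail regions.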

3. THE CASE OF THE NORMAL DISTRIBUTION

For any symmetric distribution with $\beta_2 = 3$ (including the normal as a special case), (8) reduces to:

$$\frac{\partial \beta_2}{\partial f_r} = \frac{1}{P}\left[(z_r^2-3)^2 - 6\right] \tag{9}$$

with roots:

$$x_r^{(1,4)} = \mu \pm 2.334\,\sigma, \qquad x_r^{(2,3)} = \mu \pm 0.742\,\sigma.$$

In the Gaussian model (Figure 1, bottom) the centre ($|x_r-\mu| \le 0.742\,\sigma$) accounts for nearly 55% of the data, the tails ($|x_r-\mu| \ge 2.334\,\sigma$) for less than 2%. The steepness of the function $\partial\beta_2/\partial f_r$ (Figure 1, top) indicates however that tail outliers drastically affect the value of $\beta_2$.

Figure 1. The behavior of Faleschini’s derivative for $\beta_2 = 3$ (top), with indication of the ranges corresponding to the centre, flanks and tails (bottom). Both charts are drawn for standard parameter values ($\mu = 0$, $\sigma = 1$); the lower chart represents a normal density function.
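The shares of probability falling in the centre, flanks and tails of a normal distribution can be recomputed with the standard library alone. This is only a sketch: the helper `std_normal_cdf` and the variable names are assumptions, and the normal distribution function is obtained from `math.erf`.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Standardized roots of (9): sqrt(3 -/+ sqrt(6)).
inner = math.sqrt(3.0 - math.sqrt(6.0))
outer = math.sqrt(3.0 + math.sqrt(6.0))

centre = 2.0 * std_normal_cdf(inner) - 1.0      # mass with |z| <= inner
tails = 2.0 * (1.0 - std_normal_cdf(outer))     # mass with |z| >= outer
flanks = 1.0 - centre - tails                   # remaining mass

print(round(inner, 3), round(outer, 3))
print(centre, flanks, tails)
```

The computed shares are consistent with the orders of magnitude quoted in the text: somewhat more than half of the mass in the centre and just under 2% in the tails.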

Relating (9) to a normal distribution, Faleschini argued that “a symmetric curve with $\beta_2 > 3$ will be iperbinomial (i.e. leptokurtic) because some values at the centre and in the tails will have higher frequency than in the normal curve with the same mean and variance, and some values in the flanks will be underweighted compared to the corresponding normal curve. The situation is reversed for distributions with $\beta_2 < 3$” (Faleschini, 1948). Since $\beta_2 = 3$ is not sufficient to ensure the normal-like behavior of a symmetric distribution (see Balanda and MacGillivray, 1988, or Kale and Sebastian, 1996) the above argument should be used with caution.

4. $\beta_2$ AND THE INFLUENCE FUNCTION APPROACH

Faleschini’s derivative went largely unnoticed in the statistical literature and a number of erroneous interpretations of $\beta_2$ appeared in the following decades (see Balanda and MacGillivray, 1988 or Zenga, 2006 for a comprehensive review). It was only in 1987 that Ruppert succeeded in “quantitatively analyzing the meaning of kurtosis” by means of Hampel’s influence function (IF).

The influence function approach (Hampel, 1968 and 1974) focuses on statistical measures which may be expressed as (real-valued) functionals, i.e. mappings from a set of probability distributions to the real numbers. For a distribution F with finite fourth moment, the kurtosis coefficient (1) is the functional defined by:

$$\beta_2(F) = \frac{\mu_4(F)}{[\mu_2(F)]^2}$$

where:

$$\mu_s(F) = \int_{-\infty}^{\infty} [t-\mu(F)]^s\,dF(t)$$

for s = 2, 4 and:

$$\mu(F) = \int_{-\infty}^{\infty} t\,dF(t).$$

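The IF of $\beta_2$ is computed as follows. Consider the cumulative distribution function:

$$G_x(t) = \begin{cases} 0 & \text{for } t < x \\ 1 & \text{for } t \ge x \end{cases} \tag{10}$$

which places unit mass at the point x, and the contaminated distribution:

$$F_\varepsilon(t) = (1-\varepsilon)\,F(t) + \varepsilon\,G_x(t), \qquad 0 \le \varepsilon \le 1. \tag{11}$$

“$F_\varepsilon$ is nothing but F with contaminants at x and $\varepsilon$ is the proportion of contamination” (Ruppert, 1987). Then,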

$$\lim_{\varepsilon \downarrow 0} \frac{\beta_2(F_\varepsilon) - \beta_2(F)}{\varepsilon} \tag{12}$$

is the (one-sided) derivative of $\beta_2$ at F in the direction toward $G_x$ and is called the influence function of $\beta_2$ at F and x [$IF(x; F, \beta_2)$]. Formula (12) has a heuristic interpretation as the change of $\beta_2$ caused by an infinitesimal contamination at the point x, standardized by the size of the contaminating mass. This is logically related to Faleschini’s derivative and it is shown below that computing the IF of $\beta_2$ from its own definition (12) does indeed result in the same expression as $\partial\beta_2/\partial f_r$, up to the constant factor P.

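If $\mu(F_\varepsilon)$ denotes the mean of the contaminated distribution, it follows from (11) that:

$$\mu(F_\varepsilon) = (1-\varepsilon)\,\mu(F) + \varepsilon\,x \tag{13}$$

and the same argument generalizes to the central moments:

$$\mu_s(F_\varepsilon) = (1-\varepsilon)\int_{-\infty}^{\infty} (t-\mu(F_\varepsilon))^s\,dF(t) + \varepsilon\,(x-\mu(F_\varepsilon))^s$$

for integer s. Replacing $\mu(F_\varepsilon)$ by (13) we have:

$$\mu_s(F_\varepsilon) = (1-\varepsilon)\int_{-\infty}^{\infty} \left\{[t-\mu(F)] - \varepsilon\,[x-\mu(F)]\right\}^s dF(t) + \varepsilon\left[(1-\varepsilon)(x-\mu(F))\right]^s \tag{14}$$

where the integrand may be expanded binomially by (2).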

Application of (14) for s = 2 yields the variance of the contaminated distribution:

$$\begin{aligned}
\mu_2(F_\varepsilon) &= (1-\varepsilon)\left\{\mu_2(F) + \varepsilon^2 (x-\mu(F))^2\right\} + \varepsilon(1-\varepsilon)^2 (x-\mu(F))^2 \\
&= \mu_2(F) + \varepsilon\left\{(x-\mu(F))^2 - \mu_2(F)\right\} - \varepsilon^2 (x-\mu(F))^2
\end{aligned} \tag{15}$$

and for s = 4, after some computation and rearrangement, we obtain the fourth central moment:

$$\begin{aligned}
\mu_4(F_\varepsilon) = {} & \mu_4(F) + \varepsilon\left\{(x-\mu(F))^4 - \mu_4(F) - 4\mu_3(F)(x-\mu(F))\right\} \\
& - \varepsilon^2 (x-\mu(F))\left\{4(x-\mu(F))^3 - 4\mu_3(F) - 6\mu_2(F)(x-\mu(F))\right\} \\
& + 6\varepsilon^3 (x-\mu(F))^2\left\{(x-\mu(F))^2 - \mu_2(F)\right\} - 3\varepsilon^4 (x-\mu(F))^4.
\end{aligned} \tag{16}$$
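The expansions (15) and (16) are exact polynomial identities in the contamination proportion, so they can be checked directly on a small discrete distribution. In the sketch below the distribution, the contamination point and the proportion are arbitrary illustrative values, and the helper names are assumptions.

```python
def central_moment(points, probs, s):
    """Central moment of order s of a discrete probability distribution."""
    mean = sum(p * t for t, p in zip(points, probs))
    return sum(p * (t - mean) ** s for t, p in zip(points, probs))


def contaminate(points, probs, x, eps):
    """Mixture (1 - eps) * F + eps * (unit mass at x), as in (11)."""
    return points + [x], [(1 - eps) * p for p in probs] + [eps]


# Hypothetical discrete F, contamination point and proportion.
points = [0.0, 1.0, 3.0]
probs = [0.5, 0.3, 0.2]
x, eps = 4.0, 0.1

mean = sum(p * t for t, p in zip(points, probs))
d = x - mean
mu2, mu3, mu4 = (central_moment(points, probs, s) for s in (2, 3, 4))

# Moments of the contaminated distribution computed from scratch.
c_points, c_probs = contaminate(points, probs, x, eps)
mu2_direct = central_moment(c_points, c_probs, 2)
mu4_direct = central_moment(c_points, c_probs, 4)

# The same moments from the expansions (15) and (16).
mu2_expansion = mu2 + eps * (d ** 2 - mu2) - eps ** 2 * d ** 2
mu4_expansion = (mu4
                 + eps * (d ** 4 - mu4 - 4 * mu3 * d)
                 - eps ** 2 * d * (4 * d ** 3 - 4 * mu3 - 6 * mu2 * d)
                 + 6 * eps ** 3 * d ** 2 * (d ** 2 - mu2)
                 - 3 * eps ** 4 * d ** 4)

print(mu2_direct - mu2_expansion, mu4_direct - mu4_expansion)
```

A deliberately large proportion (0.1) is used so that the higher-order terms of (16) contribute visibly to the comparison.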

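We are now in a position to compute the influence function of $\beta_2$:

$$\begin{aligned}
IF(x; F, \beta_2) &= \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon}\left\{\frac{\mu_4(F_\varepsilon)}{[\mu_2(F_\varepsilon)]^2} - \frac{\mu_4(F)}{[\mu_2(F)]^2}\right\} \\
&= \lim_{\varepsilon \downarrow 0} \frac{1}{\varepsilon}\,\frac{[\mu_2(F)]^2\,\mu_4(F_\varepsilon) - [\mu_2(F_\varepsilon)]^2\,\mu_4(F)}{[\mu_2(F_\varepsilon)]^2\,[\mu_2(F)]^2}.
\end{aligned}$$

Using (15) and (16) we obtain, as $\varepsilon \downarrow 0$,

$$IF(x; F, \beta_2) = \frac{1}{[\mu_2(F)]^4}\left\{[\mu_2(F)]^2\left[(x-\mu(F))^4 - \mu_4(F) - 4\mu_3(F)(x-\mu(F))\right] - 2\mu_2(F)\,\mu_4(F)\left[(x-\mu(F))^2 - \mu_2(F)\right]\right\}$$

and replacing $(x-\mu(F))/\sqrt{\mu_2(F)}$ by z and $\mu_3(F)/[\mu_2(F)]^{3/2}$ by $\gamma_1$ it reduces to:

$$IF(x; F, \beta_2) = (z^2-\beta_2)^2 - \beta_2(\beta_2-1) - 4\gamma_1 z \tag{17}$$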

which is nothing but Faleschini’s derivative (8) multiplied by P (the size of the distribution) and rewritten in a continuous random variable notation (z replaces $z_r$). Although Faleschini considered $\beta_2$ as a moment ratio and not as a functional of a probability distribution F, he had already answered the basic question addressed by Ruppert: how does $\beta_2$ change if we throw in an additional observation at some point on the real line? The partial derivative $\partial\beta_2/\partial f_r$ is indeed the first intuitive notion of the influence function of $\beta_2$ and provides the “quantitative understanding of kurtosis” invoked by Ruppert (1987).
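The limit in (12) can be approximated by contaminating a discrete distribution with a very small mass and compared with the closed form (17). The sketch below does this under illustrative assumptions: the function names, the distribution, the contamination point and the value of `eps` are all hypothetical.

```python
import math


def summary(points, probs):
    """Return (mean, standard deviation, gamma1, beta2) of a discrete distribution."""
    mean = sum(p * t for t, p in zip(points, probs))
    mu = {s: sum(p * (t - mean) ** s for t, p in zip(points, probs)) for s in (2, 3, 4)}
    return mean, math.sqrt(mu[2]), mu[3] / mu[2] ** 1.5, mu[4] / mu[2] ** 2


def influence_closed_form(points, probs, x):
    """Right-hand side of (17) evaluated at the contamination point x."""
    mean, sd, gamma1, beta2 = summary(points, probs)
    z = (x - mean) / sd
    return (z * z - beta2) ** 2 - beta2 * (beta2 - 1) - 4 * gamma1 * z


def influence_by_contamination(points, probs, x, eps=1e-7):
    """Difference quotient of (12) for a small but finite contamination eps."""
    beta2 = summary(points, probs)[3]
    mixed_points = points + [x]
    mixed_probs = [(1 - eps) * p for p in probs] + [eps]
    beta2_eps = summary(mixed_points, mixed_probs)[3]
    return (beta2_eps - beta2) / eps


# Hypothetical skewed discrete distribution and contamination point.
points = [0.0, 1.0, 3.0]
probs = [0.5, 0.3, 0.2]
x = 4.0

closed = influence_closed_form(points, probs, x)
approx = influence_by_contamination(points, probs, x)
print(closed, approx)
```

Because the difference quotient uses a finite `eps`, the two numbers agree only up to an error of the order of `eps`, which is why the comparison is made with a tolerance rather than exactly.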

It is worth mentioning that Ruppert did not compute (17) but a slight variant called the symmetric influence function (SIF)¹. He started with a symmetric distribution F with $\mu(F) = 0$ and contaminated it at $\pm x$ with equal probability, so as to leave the position of $\mu(F)$ unchanged. The resulting SIF of $\beta_2$ differs from (17) only in the last term (which obviously disappears), hence it provides the same information as the IF of $\beta_2$ when the underlying distribution is itself symmetric (we refer the reader to Fiori (2005) for further details).

¹ It is interesting to note that Ruppert himself attributed the SIF of $\beta_2$ to an earlier intuition by Darlington (1970).
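The symmetric contamination scheme can be sketched in the same way. The code below is only an illustration of the description given above, not Ruppert’s own computation: half of the contaminating mass is placed at x and half at the mirror point, and the resulting difference quotient is compared with (17) without its last term. All names and numerical values are assumptions.

```python
def kurtosis(points, probs):
    """Kurtosis coefficient of a discrete probability distribution."""
    mean = sum(p * t for t, p in zip(points, probs))
    mu2 = sum(p * (t - mean) ** 2 for t, p in zip(points, probs))
    mu4 = sum(p * (t - mean) ** 4 for t, p in zip(points, probs))
    return mu4 / mu2 ** 2


def sif_closed_form(points, probs, x):
    """Formula (17) without its last term; assumes F is symmetric about 0."""
    mu2 = sum(p * t * t for t, p in zip(points, probs))
    beta2 = kurtosis(points, probs)
    z2 = x * x / mu2
    return (z2 - beta2) ** 2 - beta2 * (beta2 - 1)


def sif_by_contamination(points, probs, x, eps=1e-7):
    """Difference quotient with mass eps/2 placed at -x and eps/2 at +x."""
    mixed_points = points + [-x, x]
    mixed_probs = [(1 - eps) * p for p in probs] + [eps / 2, eps / 2]
    return (kurtosis(mixed_points, mixed_probs) - kurtosis(points, probs)) / eps


# Hypothetical distribution, symmetric about 0, and contamination point.
points = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
x = 3.0

closed = sif_closed_form(points, probs, x)
approx = sif_by_contamination(points, probs, x)
print(closed, approx)
```

For this example the variance is 1.2 and the kurtosis coefficient 2.5, so the closed form can also be verified by hand.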

5. CONCLUSIONS

We have shown that Luigi Faleschini devised an influence function approach to kurtosis in 1948, i.e. twenty years before the IF was available in Hampel’s thesis (1968) and nearly forty years before the publication of Ruppert’s SIF for $\beta_2$ (1987).

By means of Faleschini’s derivative $\partial\beta_2/\partial f_r$ we have illustrated that peakedness and tailedness are both components of kurtosis, and that $\beta_2$ is clearly dominated by tail weight.

An alternative to Faleschini’s idea of adding observations at a point in a frequency distribution is to consider transfers between different units for a fixed distribution size. This technique is central to income inequality studies and appears to have first been used in kurtosis measurement by Zenga (1996, 2006). By transformations that increase (decrease) kurtosis in an observed population Zenga obtained a “two-faced” variant of the Lorenz curve which he termed the “kurtosis diagram”.

Besides allowing for a partial ordering of distributions by kurtosis, the kurtosis diagram suggests how to construct new kurtosis measures with desirable properties. Interested readers are referred to the appropriate references.

ANNA MARIA FIORI, MICHELE ZENGA

Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali, Università degli Studi di Milano-Bicocca

REFERENCES

K.P. BALANDA, H.L. MacGILLIVRAY (1988), Kurtosis: a critical review, “The American Statistician”, 42, n. 2, pp. 111-119.

R.B. DARLINGTON (1970), Is kurtosis really peakedness?, “The American Statistician”, 24, n. 2, pp. 19-22.

L. FALESCHINI (1948), Su alcune proprietà dei momenti impiegati nello studio della variabilità, asimmetria e curtosi, “Statistica”, 8, pp. 503-513.

A.M. FIORI (2005), La curtosi: nuovi aspetti metodologici e prospettive inferenziali, Tesi di Dottorato in Statistica Metodologica ed Applicata, Università degli Studi di Milano-Bicocca, Dipartimento di Metodi Quantitativi per le Scienze Economiche ed Aziendali.

F.R. HAMPEL (1968), Contributions to the theory of robust estimation, Ph.D. Dissertation, University of California, Berkeley, Department of Statistics.

F.R. HAMPEL (1974), The influence curve and its role in robust estimation, “Journal of the American Statistical Association”, 69, pp. 383-393.

B.K. KALE, G. SEBASTIAN (1996), On a class of nonnormal symmetric distributions with a kurtosis of three, in H.N. NAGARAJA, P.K. SEN, D.F. MORRISON (eds.), Statistical theory and applications: papers in honor of Herbert J. David, Springer-Verlag, New York.

K. PEARSON (1905), Skew variation, a rejoinder, “Biometrika”, 4, n. 1/2, pp. 169-212.

M. POLISICCHIO, M. ZENGA (1997), Kurtosis diagram for continuous variables, “Metron”, 55, pp. 21-41.

A. POLLASTRI (1999), Analysis of the kurtosis in a skew distribution, “Metron”, 57, pp. 131-146.

D. RUPPERT (1987), What is kurtosis? An influence function approach, “The American Statistician”.

A. STUART, J.K. ORD (1994), Kendall’s advanced theory of statistics (Sixth Edition), vol. I: Distribution Theory, Edward Arnold, London.

M. ZENGA (1996), La curtosi, “Statistica”, 56, pp. 87-102.

M. ZENGA (2006), Kurtosis, in S. KOTZ, C.B. READ, N. BALAKRISHNAN, B. VIDAKOVIC (eds.), Encyclopedia of Statistical Sciences (second edition), John Wiley and Sons, New York.

RIASSUNTO

The interpretation of kurtosis and its influence curve in an intuition by L. Faleschini

This work shows that Luigi Faleschini, in an article published in Statistica in 1948, was the first to anticipate the derivation of the influence curve of the kurtosis coefficient.

SUMMARY

The meaning of kurtosis, the influence function and an early intuition by L. Faleschini

In this work we have shown that an overlooked paper by Luigi Faleschini (1948) contains the first intuition of the influence function approach to kurtosis.