View of Edgeworth and Cornish Fisher expansions and confidence intervals for the distribution, density and

(1)

EDGEWORTH AND CORNISH FISHER EXPANSIONS AND CONFIDENCE INTERVALS FOR THE DISTRIBUTION, DENSITY AND QUANTILES OF KERNEL DENSITY ESTIMATES C.S. Withers, S. Nadarajah

1. INTRODUCTION

Our aim here is to present nonparametric confidence intervals (CIs) for densi-ties on ℜ based on kernel estimates. Suppose we have a random sample

1,..., n

X X of size n on ℜ with empirical distribution ˆ ,( )F x from a distribution X∼F(x) with density f x( 0) and derivatives f x⋅r( 0),r =1, 2,... finite at some

point x0. Let K(z) be a function on ℜ with the finite values

( ) r ( )i ri ri

B =B K =

∫

z K z dz for r=0,1,..., i=1,2,... and B = . The kernel estimate 01 1

for f x of kernel K and bandwidth (or smoothing) parameter h is ( 0)

1 0 0 1 1 ˆ₍ ₎ ₍ _{) ( )}ˆ n _, h i i f x K x y dF y n− U = =

_∫

− =

∑

(1) where _{( )} 1 _{( / )} h

K x =h K x h− and U1i =K xh( 0−Xi). The bandwidth h h n= ( ) is assumed to satisfy h → as n → ∞ . This estimate has mean 0

0 0 0 1 0 ( , ) ( ) ( ) ( ) ( ) ( )( )r / !. h h r r r T F K K x y dF y K z f x hz dz ∞ f x⋅ h B r = =

_∫

− =

_∫

− =

∑

− (2) Note that the existence of the full Taylor series in (2) is not needed for the results in this paper. The boundedness of the required number of derivatives of f is suf-ficient to validate the results presented.

Now suppose that the kernel K is of order p≥1, that is, B = for 1≤r<p and r1 0

0 rp

B ≠ . So, Ef xˆ( )₀ = f x( ₀)+O h( p). Also var f x{ (ˆ 0)} { (= f x B0) 02+O h( )}/( )nh

so its minimum asymptotic mean square error (AMSE) is

2

0 0

ˆ

| ( ) ( )| ( )

(2)

h cn= −α (3) for c >0 and α =1/(2p+ . This rate y holds of course for any 1) c >0, not just

the AMSE-optimal c c f x= ( , ₀) say. Compare Parzen (1962). (Allowing c to de-pend on x means that ₀ ˆ( )f x no longer integrates to one.) In fact, this choice ₀ of α achieves minimum asymptotic integrated MSE (AIMSE)

2

ˆ

| ( ) ( )| ( )

E f x − f x dx O n= −γ

∫

(4)

under suitable regularity conditions for y=2 /(2p p+ and 1) c >0, not just the AIMSE-optimal c c f= ( ) say. See Theorem 2.1.7 of Prakasa Rao (1983), and (3.4) of Terrell and Scott (1992). Using either AMSE or AIMSE, optimal c can be es-timated using an initial estimate for f. See, for example, Scott et al. (1977). How-ever, the use of such an estimate may destroy its optimality property.

Kernel estimates were introduced by Rosenblatt (1956). Since then there has been a huge literature on the subject. We mention only a few here. Silverman (1986) and Terrell and Scott (1992) show that the gain in optimizing c is generally not great compared with fixed c estimators. For example, the latter have asymp-totic efficiency about 0.9 or 0.8 for f normal or Cauchy. (Terrell and Scott (1992) scale K so that B = ± .) Prakasa Rao (1983) has many related results. For ex-p1 1

ample, some give convergence rates for supx| ( )f xˆ − f x( )|. See Theorems 2.1.10, 12, 15 and 20 and Theorems 2.2.3 and 2.2.4 for similar results. Silverman (1986) and Terrell and Scott (1992) also consider estimates of the form

0 ( ) 0 ˆ

( ) h y ( ) ( )

f x =

∫

K x − y dF y for K of (1). When h p =2 one can choose h(y) to achieve (4) as ifp =4, that is with y =8/9. Optimizing c is one way of decid-ing how to choose h or equivalently how to scale K. Another way is to replace h

by 1/2

2

( )_n ( )_n

h F ₌cn−α_µ F _{, where}

2( )F

µ is the variance and F the empirical dis-_n tribution. One can show that this does not affect the optimal rate γ. Silverman (1986, page 45) suggests this with α =1/(2p+ for 1) p = and 2 c =1.06.

Turning now to CIs for f x , the AMSE and AIMSE criteria are no longer ( ₀) relevant. The obvious criterion is to minimise the coverage error - the difference between the actual coverage probability and the nominal level of the CI - or rather the asymptotic coverage error (ACE). Theorem 2.1.18 of Prakasa Rao (1983) gives a consistent CI for f x but the ACE is not given. The first order CIs are ( 0)

based on Studentizing. Their optimal bandwidth h again has the form (3), giving

( )

ACE O n= −β . We shall refer to β as the ACE rate. Its possible values are 0

β > .

The contents of this paper are organized as follows. Section 2 provides Edge-worth and Cornish Fisher expansions for Studentized versions of the kernel

(3)

den-sity estimate, ˆ( )f x , and its extensions. These expansions are used to derive ₀ various CIs for f x . Section 3 derives first order one- and two-sided CIs using ( ₀) asymptotic Studentization (ε= and empirical Studentization (1) ε = p), where ε is an indicator variable. Section 4 derives second order one- and two-sided CIs. Section 5 considers choosing the ACE-optimal constant c in (3) for first order CIs based on empirical Studentization. Finally, some conclusions are noted in Section 6.

2. CORNISH FISHER EXPANSIONS

Here, we lay the groundwork by showing that the kernel density estimate has Cornish-Fisher expansions with parameter m=nh. Moments and cumulants of the kernel estimate ˆ( )f x in (1) are derived in Section 2.1. These quantities are used ₀ in Sections 2.2–2.5 to provide Cornish Fisher expansions for Studentized ver-sions of f x . Section 2.2 derives Cornish Fisher expansions for ˆ( )0

1/2 1/2 1 1 2 ˆ ( ) , n h Y =m w −w k− (5) where w1= f x( 0), wˆ1= ˆf x( 0), U1=K xh( 0−X), κrh =κr(U1) and 1 r rh rh

k ₌h −κ _{. Section 2.3 derives Cornish Fisher expansions for}

1/2₍ ˆ _), n Y =m W W− (6) where W =( ,...,w1 w ′q) , Wˆ =( , ,wˆ1 w ′ˆq) , q 1≥ , wih =T F Kh( , i), ( ) ( ) i i K z =K z

and wˆih =T F Kh( ,ˆ i). Note that we have suppressed the dependency of w on h ih by simply writing w for i w . Section 2.4 derives Cornish Fisher expansions for ih

1/2 1/2

0 ( ( )ˆ ( )) 21 ,

n

Y =m t W −t W a− (7)

where ( )t W is a real function with finite derivatives and satisfies the asymptotic expansion (see (19) and Withers (1982)):

1 ˆ ( ( )) i r ri i r t W a m κ ∞ − = − ≈

∑

(8)

for r 1≥ and for certain bounded functions ari =a wrih( ) given by Withers (1982). Finally, Section 2.5 derives Cornish Fisher expansions for

1/2 1 ( )ˆ n Y =m t W (9) for 1/2 1 0 ˆ2 ( ) ( ( )) _h t W = w − f x k− and k an estimate of ˆ₂_h k . _2h

(4)

The expansions given are “formal”. We have given the expansions in the full-est possible forms - rather than say to the first or the second order - because they could lead to better approximations and have wider applicability. Often expan-sions give better approximations even if they diverge. We have not attempted to check the existence or the validity of the expansions. Some Cramer-type condi-tions and condicondi-tions on differentiability may be required to ensure existence and validity, see, for example, Hall (1992) and Garcia-Soidan (1998). This task is be-yond the purpose of the present paper.

2.1 Moments and cumulants

The estimate f x of (1) can be viewed as the mean of a random sample of ˆ( )₀ size n, where each observation of the sample is distributed as U . Its ith moment ₁ is 1 1i ( 0 )i ( ) i . ih h ih m =EU =

_∫

K x − y dF y =h w− By (2), 0 0 ( ) ( ) / !r ih r ri r w ∞ f x B_⋅ h r = =

∑

− (10) 0 ( ) i w O h = + (11) for ₀ ( ₀) ( )i i

w = f x

_∫

K z dz. So, w is of magnitude 1 and _ih ₍ 1 i₎ ih

m =O h− . Typi-cally K is symmetric so that (10) is a power series in _{h . Both parametric and}2

nonparametric approaches, using ŵ and ˆ ,F have their merits. We can view ˆwi as the mean of a random sample of size n, where each observation of the sample is distributed as 1 0 0 ( ) (( )/ ) . i i i h U =K x −X =h K x− −X h (12)

The rth cumulant of ˆw is given by 1

1 1 1 ˆ ( ) r r . r w n rh m krh κ ₌ −κ ₌ − ₍₁₃₎

The moments and cumulants of ŵ are polynomials in h and w. An expression for the general krh =k wrh( ) as a polynomial in h and w is given in Appendix B. One can now use (10) to expand k as a formal power series in h with coeffi-rh cients functions of the derivatives of f at x . Note that 0 k is bounded as rh h → 0 and

(5)

1 0 lim₀ r ( 0) 0 . r _h rh r k h m− f x B → = = (14)

2.2 Cornish Fisher expansions for (5)

By (13), ˆ( )f x behaves like the mean of a random sample of size m nh₀ = with rth cumulant k . (Hall (1992, page 211) gave a heuristic justification for this.) By _rh (13), the cumulants of wˆ₁= f xˆ( ₀) satisfy the Cornish-Fisher assumption with re-spect tom nh= . That is 1

1

ˆ

( ) ( r)

r w O m

κ ₌ − _{as m → ∞ . So, the distribution of}

n Y in (5) satisfies the Cornish-Fisher expansions

( ) ( ) n n P x =P Y ≤x ≈ /2 1 ( ) ( ) r ( , ), r rh r x φ x ∞ m− h x λ = Φ −

∑

(15) ( )/ n dP x dx ≈ /2 1 ( ) r r( , _rh), r x m h x φ ∞ − λ =

∑

(16) 1_{( ( ))} n P x − Φ ≈ /2 1 ( , ), r r rh r x ∞ m− f x λ = −

∑

(17) 1_{( ( ))} n P− Φ x ≈ /2 1 ( , ), r r rh r x ∞ m− g x λ = +

∑

(18)

where Φ and φ are the unit normal distribution and density. Note that h , r h , r f , r r

g are functions given in Cornish and Fisher (1960) for r ≤ 4 or in Withers (1984) and that λrh are the coefficients of the expansions. For example, h x1( ,λrh)=

1( , rh)

f x λ =g x1( ,λrh)=λ3h(x2−1)/6 and h x1( ,λrh)=λ3h(x3−3 )/6x . (An al-ternative derivation is to apply Example 1 of Withers (1983) with

0 ( ) h( ) h( ) ( ) T F =T F =

∫

K x − y dF y , a = if _ri 0 i r≠ − , and 1 ari =κrh if i r= − . 1 So, 1 1 0 ˆ ( ( )) r r r f x rhn k mrh κ ₌κ − ₌ − _{, and} 1/2 1/2 1/2 1/2 0 1 2 ˆ1 1 2 ( ( ) ) ( ) n h h h h Y ₌n f x ₋κ κ− ₌m w ₋k k−

has an Edgeworth-Cornish-Fisher expansion with respect to m in terms of the coefficients Ar r, 1− = κ κrh 2−hr/2=λrhh1 /2−r for λrh of (17). By (18), one can replace n and {A_{r r}_{, 1}₋} in the Cornish-Fisher expansions by m and { }λ_rh .) The expansion (15) was noted in (4.83) of Hall (1992). (There is a slip there: µ40 in p y in 2( )

second equation of page 212 should be 2

40 3h 20

µ − µ . This slip is also on pages 269 and 271.)

(6)

2.3 Cornish Fisher expansions for (6)

The joint rth order cumulants of ŵ have magnitude _m1 r− _{for the same reason}

that k is bounded: rh 1 1 1 ˆ ˆ ( ,..., _r) r ( ... ) i i h r w w m k i i κ ₌ − ₍₁₉₎ for 1 ₁ 1 ( ... ) r ( ,..., _r) (1) h r i i k i i ₌h −κ U U ₌O _and i U of (12). This expresses k i i h( ... )1 r in the form ki₁...i hr ( )F . We can also write k i ih( ... )1 r =ki1...i hr ( )w as a polynomial in

h and w: see Appendix A. For example, ( )k ijh =wi j+ −hw wi j. Now fix q and set ( ( ) : 1h , )

V = k ij ≤i j q≤ . So, V is q q× . For x in _ℜq_{, let} _{( )} V x

Φ and φV( )x be the distribution and density of a normal q-vector with mean 0 and covariance V. Then for Y in (6) the Edgeworth expansion (15) has an extension of the form _n

/2 1 ( ) ( ) r ( / , ) ( )/ !, n V r h V r P Y x x ∞ m− b x k x r = Φ ∂ ∂ Φ ≤ ≈ +

∑

− (20) where, for s in _ℜq_, _{( , )} _{( ( ))} r h r b s k =B c s , 0 ( ) r ( ) r ri i B c B c = =

∑

, B c and r( ) B c are ri( ) the complete and partial exponential Bell polynomials tabled on page 307 of Com-tet (1974), c sr( )=kr+2, 1r+( )/{(s r+2)(r+1)} for kr r, 1− ( )s =κ(Ui1,...,U sir) ...i1 sir ,

and we use the tensor sum convention of implicitly summing repeated pairs of in-dices i₁,...,i over their range 1,...,q . (See Appendix B for a definition of these Bell _r polynomials.) Equation (5.67) of Hall (1992) gives a double expansion version of this for q = . 2

2.4 Cornish Fisher expansions for (7)

By Withers (1982), the Cornish-Fisher expansions (15)–(18) also hold with Y n replaced by Y in (7) and n0 λrh replaced by

/2 21

{ r }

h ri ri

A = A =a a− (21)

provided that a is bounded away from 0 and that some regularity conditions ₂₁ ensuring 1 | | | | 1 1 ( ) ( ) 1 ( ) q q j t t j j sup ς exp t K u dF x hu C ς h ∞ +…+ > = −∞ ⎧ ⎫ ⎪ ₋ ⎪ ₋ _{< −} ⎨ ⎬ ⎪ ⎪ ⎩

∑

⎭

∫

(22)

(7)

hold, where C(⋅) is some bounded real valued function. Also by Withers (1982), the first few coefficients in (8) are:

21 i j h( ), a =t t k ij⋅ ⋅ 11 ij h( )/2, a =t k ij⋅ 32 21 323 a =c + c (23) for ₁ _r r ( )/ ₁... _r i i i i t⋅ =∂ t W ∂w ∂w , c21=t t t k ijl⋅ ⋅ ⋅i j l h( ), and c23=t k ij t k lq t⋅i h( )⋅jl h( )⋅q, where we use the tensor summation convention. For example,

/2 0 1 ( ) ( ) ( ) r ( ) n r r P Y x x φ x ∞ m− h x = Φ ≤ ≈ −

∑

(24) for 3 1 0 ( , ) ( ) r ( ) r h r rih r i h x A h x − P He x = = =

∑

, where _{( )} _{( ) (}1 _/ _{) ( )}i r He x ₌φ x − ₋d dx φ x _is the rth Hermite polynomial, P is a polynomial in rih A , and the last summation h

is restricted to r−i odd. For example, 1 1

0,2 ( , h) ih i( ) i h x A P He x = =

∑

and 2 2 1,3,5 ( , _h) _ih _i( ) i h x A P He x = =

∑

for P10h =A11h, P12h =A32h/6, 2 21h ( 11h 22h)/2 P = A +A , 23h (4 11h 32h 43h)/24 P = A A +A , and P25h =A322h/72. By Withers (2000), ( ) ( 1 )r i i He x =E x+ − N =ψ say, where N N∼ (0,1), ψ0 = , 1 ψ1= , x 2 2 x 1 ψ = − , 3 3 x 3x ψ = − , ψ4 =x4 −6x2+ , 3 ψ5=x5−10x3+15x, ....

2.5 Cornish Fisher expansions for (9)

Now consider the Studentized estimate of f x given by (9). We shall con-( 0)

sider two types of estimates with a₂₁= +1 O h( )ε for ε =1 or p. We can apply (24) by noting that Yn1≤ if and only if z Yn0 x a211/2(z m t W1/2 ( ))

− ≤ = − . So, x z= + for δ δ δ= 0+δ1z, δ0= −a21−1/2m t W1/2( )=O m hp( 1/2 p), and 1/2 1 a21 1 O h( ).ε δ ₌ − _{− =} ₍₂₅₎ Also at x z= + , δ 2 1 ( )x He x_i( ) ( )z He z_i( ) ( )z He_i ( )z O( ), φ =φ −δφ ₊ + δ 2 1 1 1 1 0,2 ( ) ( ) ( ) ( ) ( ) _ih _i ( ) ( ), i x h x z h z z P He z O φ φ δφ ₊ δ = = −

∑

+

(8)

1 0 ( n ) ( n ) P Y ≤z =P Y ≤x = 3 /2 2 1 ( ) ( ) r ( ) ( ) r r x φ x m− h x O m− = Φ −

∑

+ = 2 3 /2 1 ( ) ( ){ /2 r r( ) r z φ z δ zδ m− h z = Φ − − + +

∑

1/2 3 1/2 1 2 1 1 0,2 ( ) ( ), ih i i m P He z O m m m δ − δ − δ −δ − + = = −

∑

+ + + + (26) giving 1 (| n | ) P Y ≤ = z 2 ( ) 1 2 ( ){ (Φ z − − φ z z − +δ1 δ02/2)+m h z−1 2( ) 1/2 3 1/2 2 1 2 0 1 1 0 0 0 0,2 ( ) ( ), ih i i m P He z O m m m δ − δ − δ −δ − + = = −

∑

+ + + + (27) 3. FIRST ORDER CIs

The first order CIs for f x are obtained as usual by Studentizing. This can ( 0)

be done using the empirical estimate of mvarf xˆ( )0 =k w2h( ) or by estimating its asymptotic approximation of (14), k w2h( )≈ f x( 0)

∫

K z dz( )2 .

3.1 First order one-sided CIs based on asymptotic studentization Choose 1/2 1 1 0 1 02 ( ) ( ( ))( ) . t w = w − f x w B − (28) Then 2 21 1 2h 1 ( ) a =t k⋅ = +O h . So, ε=1 in (25) and, by (26), 1 1 ( n ) ( ) ( )n P Y ≤z =Φz +O e (29) for 1/2 1/2 1 p n e =m− + +h m h . Taking h as in (3), e_n₁₌O n( −β)_with min((1 )/2, , (p 1/2) 1/2) β = −α α + α − .

The case p ≥ : 2 β has its maximum 1/3 at =1/3α . So, a one-sided lower CI for

0

( )

f x of level _Φ_{( )}_z ₊_{O n}₍ −1/3₎_{and “half-width”}_{O m}₍ −1/2₎₌_{O n}₍ −1/3₎_{is given}

by Y_n₁≤ , that is z

1/2

0 ˆ1 02 1ˆ

( ) ( / ) .

(9)

Replacing z by −z, a one-sided upper CI for f x of level ( 0) Φ( )z +O n( −1/3) and “half-width” _{O n}₍ −1/3₎_{is given by} 1 n Y ≥ , that is z 1/2 0 ˆ1 02 1ˆ ( ) ( / ) . f x ≤ =U w + B w m z (31)

The case p = : 1 β has its maximum 1/4 at =1/4α . So, both the error and width of these one-sided CIs of nominal level ( )Φ z are O n( −1/4).

Note 1. Note t w is strictly increasing. So, ( )1 Yn1≤ if and only if z wˆ1≤t−1(m−1/2z) if

and only if Yn ≤ = z0 m1/2{ (t−1 m−1/2z)−k k1h} 2−h1/2. So, an asymptotic expansion for the left hand side of (29) is given by the right hand side of (15) at x z= . Alternatively, we can 0

“stabilize the variance” by choosing s w with variance ( )ˆ1 ≈s w⋅1( )1 2w B1 02/n=1/n, that is 1/2

1 1 02

( ) 2( / )

s w = w B . That is, choosing t w( )1 =s w( )1 −s f x( ( 0)), a CI with the same

properties as (30) is given by Yn1=m t w1/2 ( )ˆ1 ≤ , that is z

1/2 1/2 2

0 ˆ1 02

( ) { ( / ) /2} .

f x ≥ =L w − B m z (32)

Replacing z by –z, a one-sided upper CI for f x of level ( 0) Φ( )z +O n( −1/3) and

“half-width” O n( −1/3) is given by

1/2 1/2 2

0 ˆ1 02

( ) { ( / ) /2} .

f x ≤ =U w + B m z (33)

A similar idea was used in S3, see page 210 of Hall (1992).

Note (11) suggests estimating θ = by w1 1 ˆ ˆ i i i θ ∞ τ θ = =

∑

for θˆi =w Bˆ /i 0i and { }τi constants adding to 1. Its bias is ( )O h or _{O h if K is symmetric (about 0).}_{( )}2

Also , 1 ˆ ( ) i j h( ) i j mvar θ ∞ τ τ k ij = =

∑

so that _E₍_{θ θ}_ˆ₋ ₎2 ₌_m−1_{θτ τ}_′_B ₊_{O h}₍ 2₊_{h m}_{/ )}_for 0, ( i j)

B= B + . To this degree of accuracy this MSE is minimized by

1_1/1 1₁

B B

τ₌ − _′ − _{, where 1 is an infinite vector of 1’s, giving}

2 1 1 2

ˆ

( ) /1 1 ( / )

Eθ θ₋ ₌m−θ _′B− ₊O h ₊h m _{. Apart from the question of if and how} these infinite sums need to be truncated, clearly this estimate can be used to pro-vide an alternative CI to that of (30). However, we shall see that its asymptotic efficiency relative to the following method is 0 for p≥3.

3.2 First order one-sided CIs based on empirical studentization Choose

(10)

1/2 1 0 2 ( ) ( ( )) _h( ) , t W = w − f x k w − (34) where 2 2h( ) 2h 2 1 k w =k =w −hw . Then 2 2 21 1 h(11) 2 1 2 h(12) 2 h(22) 1 ( p) a =t k⋅ + t t k⋅ ⋅ +t k⋅ = +O h since t⋅1=k2−h1/2+O h( p+1) and t⋅2 =O h( )p . So, ε = in (25). So, by (26), p

1 1

( n ) ( ) ( n ) ( )

P Y ≤z =Φ z +O e ′ =O n−β

for en1′=m−1/2+m h1/2 p and min((1 )/2,(= p+1/2) 1/2) with the maximum

/(2 2)

p p

β = + (35)

at 1/(α= p+ . So, a one-sided lower CI for 1) f x of level ( 0) /( 2 2 )

( )_z _{O n}( −p p+ )

Φ + and “half-width” _{O m}₍ −1/2₎₌_{O n}₍ −p/( 2p+2)₎_{is given by} 1 n Y ≤ , that is z 1/2 0 ˆ1 2 ˆ ( ) ( h( )/ ) . f x ≥L′=w − k w m z (36)

Replacing z by −z gives the corresponding one-sided upper CI

1/2

0 ˆ1 2 ˆ

( ) ( _h( )/ ) .

f x ≤U′=w + k w m z (37)

So, for p = or 2, empirical and asymptotic Studentization achieve the same 1 β

but for p ≥3 empirical Studentization is superior.

3.3 First order two-sided CIs

By (27) with h of (3), P Y(| n1| ) 2 ( ) 1≤ =z Φz − +O e( n2) for

2 1 2 1

2 1 0 p

n

e _{= +}δ δ ₊m− _∼hε ₊mh ₊m− _∼n−β_{and =min( ,1}_β _εα ₋_α_{,(2 +1)}_p _α_{− .}₁₎

If 1ε= the corresponding two-sided CI of level 2 ( ) 1Φ z − +O n( −β) is given by

0

( )

L≤ f x ≤ for L and U of (30) and (31). An alternative with the same as-U ymptotic properties is L≤ f x( 0)≤U, for L and Ũ of (32) and (33). If ε= p

(i.e. using empirical Studentization), the corresponding two-sided CI of level 2 ( ) 1Φ z − +O n( −β) is given by L′≤ f x( 0)≤U′ for L' and U' of (36) and (37).

4. SECOND ORDER CIs

(11)

4.1 Second order one-sided CIs using asymptotic studentization Taking t of (28), by (26), 1/2 1 1 1 0 ( _n ) ( ) ( ) ( ) P Y _≤z ₌_Φ z ₊m− h z ₊O δ ₊m− _so 1/2 1 1 1 0 ( _n ( ) ) ( ) ( )

P Y ₋m− h z _≤z ₌_Φ z ₊O δ ₊m− _{. So, one would expect that}

1/2 1 1 ˆ1 0 ( n ( ) ) ( ) ( ) P Y ₋m− h z _≤z ₌_Φ z ₊O δ ₊m− ₍₃₈₎ if 1/2 1 1 ˆ ( ) ( ) p( )

h z =h z +O m− . Let us take h zˆ1( )=h z w1h( , )ˆ . We now confirm (38).

Given z, set 1 1 1 ( ) ( ) ( , ) m h t W =t w −m h z w− for t w of (28). By Lemma 5.1 of ( )₁ Withers (1983), 1 ˆ ( ( )) i r m ri i r t W a m κ ∞ − = − ′′

≈

∑

for certain functions ari′′ =ari′′( )w .

Also a₁₀′′ =a₁₀, a₂₁′′ =a₂₁, a₁₁′′ =a₁₁−h z₁( ), and a₃₂′′ =a₃₂. Set

1/2 1/2 0 ( ( )ˆ ( )) 21 n m Y ′′ =m t W −t W a− and 1/2 1 ( )ˆ n m Y ′′ =m t W . Then Y_n₀′′ ≤ if x and only if 1/2 1/2 21 ˆ ( ( )_m ( )) m t w −t w ≤ =y a x if and only if 1/2 1/2 1 ( )ˆ ( ) n m

Y ′′ =m t w ≤ = +z y m t w and this occurs with probability

1/2 1 1 ( )x m− h x( ) ( )φ x O m( − ) Φ − ′′ + = 1/ 2 1 2 ( )z m− h z''( ) ( )φ z O e( _n ) Φ − + since 1/2 ( p)

x z O h m h= + + . Here, h ′′ is ₁ h for the coefficients 1 a ′′ . So, ri h z1′′( ) = 2 11 32( 1)/6 A ′′ +A z − = 1/2 1( ) 21 1( ) ( ) h z −a− h z =O h since 1/2 11 11 21 1( ) A ′′ =A −a− h z . So, P Y( _n₁′′ ≤ =z) Φ( )z +O e( _n₂). This confirms (38).

So, a second order lower one-sided CI is

1/2 1/2

0 2 ˆ1 02 1ˆ ˆ1

( ) ( / ) ( ( ))

f x ≥L =w − B w m z m+ − h z . For h of (3), its error behaves as _{∼ +}_δ _m−1_∼_hε₊_{m h}1/2 p ₊_m−1_∼_n−β_for_β ₌_min((1₋_{α εα}_), _,(_p₊_1/2)_α₋_1/2).

(For p ≥2 this rate improves on β using the method of Section 3.2.) Replacing z by −z, a second order upper one-sided CI of level ( )Φ z +O n( −β) is

1/2 1/2 0 2 ˆ1 02 1ˆ ˆ1 ( ) ( / ) ( ( )) f x ≤U =w + B w m z m− − h z . Recall that 2 1( ) 11 32( 1)/6 h z =A +A z − is given by (21)-(23) and (20). Now suppose we take instead 10 1

0 ˆ _{( ) lim} _{( , )}_ˆ h h h z h z w →

= . This introduces another error of magnitude O h since _p( ) h z w has a power series expansion in h. The ₁_h( , ) effect is to add a term of magnitude m−1/2h into e in (38). But n2 m−1/2h O e= ( n2)

so there is no change to the above α and β. Using λ₃₀= ₃

0 lim _h h→ λ = 3/2 20 30 w− w = 1/2 3/2 0 02 03 ( ) f x − B− B , we obtain

(12)

2

210 1, 110 30/2, 320 2 30, 10( ) 30(2 1)/6.

a = a = −λ a = − λ h z = −λ z + (39)

4.2 Second order one-sided CIs using empirical studentization

Now consider the second type of Studentizing. By (26),

1/2 1 1 2 ( _n ( ) ) ( ) ( _n ) P Y −m− h z ≤z =Φ z +O e ′ for 1 1/2 2 p n e _{′ =}m− ₊m h _∼n−β_and =min(1 , ,( +1/2)p p 1/2)

β −α α α− . As before one can show that

1/2

1 ˆ1 2

( n ( ) ) ( ) ( n )

P Y −m− h z ≤z =Φ z +O e ′ . So, a second order lower one-sided CI of level ( )Φ z +O n( −β) for β =2 /(2p p+ is given by =3/(2p+3)3) α and

0 2

( )

f x ≥L ′= 1/2 1/2 1

1 2 1

ˆ h( ) {ˆ h( , )}ˆ

w −k w m− z m h z w− − . The corresponding upper

one-sided CI is f x( 0)≤U ′2 = wˆ1+k w2h( ) {ˆ 1/2 m−1/2z m h z w− −1 1h( , )}ˆ . So, a two-sided CI of level 2 ( ) 1Φ z − +O n( −β) is given by L2′≤ f x( 0)≤U2′.

4.3 An alternative approach

For j ≥1, Example 1 of Withers (1983) gives a one-sided CI of level

/2

( )_x _{O n}( −j )

Φ + of the form V_jm( , )F xˆ ≤µ( )F for a mean ( )µ F =E Y for ( )

Y h X= and h(x) a given function, where, for example,

/2 1 ˆ ˆ ˆ ( , ) ( ) ( , ) j r jn r r V F x µ F n− q F x = = +

∑

, 1/2 1( , ) 2 q F x = −µ x, 1 2 2( , ) 2 3(1 2 )/6 q F x ₌µ µ− ₊ x _{, and} _{( )} r r Y µ =µ . Taking Y =K xh( 0−X), we have n r/2q F zr( , ) m r/2q w zr( , ) − ₌ − _{, where} 1/2 1( , ) 2h q w z = −ν z, 1 2 2( , ) 2h 3h(1 2 )/6 q w z ₌ν ν− ₊ z _and r 1 _{( )} rh h r Y ν ₌ −µ _{. See}

Appen-dix B for these as polynomials in h and w. Replacing x by z, this suggests evaluat-ing the coverage probability of the followevaluat-ing jth order CI for f x : ( 0)

0 ˆ ( , ) ( ), j jm L′′ =V w z ≤ f x (40) where 1 /2 1 ˆ ˆ ˆ ( , ) ( , ) j r jn r r V w z w m− q w z =

= +

∑

. We now evaluate the probability of (40) for the case j=2. Note (40) holds if and only if m t W1/2 m( )ˆ ≤ for z

1 1 ( ) ( ) ( ) m t W =t W +m t W− , t W1( )=w2−3/2w3(1 2 )/6+ z2 , and t of (34). By Lemma 5.1 of Withers (1983), 1 ˆ ( ( )) i, r m ri i r t W a m κ ∞ − = − ′ ≈

∑

(41)

(13)

where the leading a′ are given in terms of ri a for ri t t= by 0 ar r, 1−′ =ar r, 1−

and a11′ =a11+t W1( ). So, the Cornish-Fisher expansions hold for

1/2 1/2

0 ( ( )ˆ ( )) 21

n m

Y =m t W −t W a− with polynomials ( )h xi′ say. By an argument similar to that proving (38) one obtains the probability of (40) as

1/2 1 2 ( )z m− h z( ) ( )φ z O e( _n ) Φ − ′ + ′ for h z₁′( ) = 1/2 1( ) 21 1( ) ( ) h z +a− t W =O h so that the ACE has magnitude 1/2

2

n

m− h e+ ′ . If p ≥ this gives ACE rate 3 β =2/3 for 1/3

α = . If p ≤ this gives ACE rate 3 β =2 /(2p p+ for 3) α =3/(2p+ , the 3) same as achieved by a one-sided CI by empirical Studentization.

4.4 An improvement

One can show that we can improve the rate given above for p ≥3 to that given for p ≥3 if we choose t w so that 1( ) h z1′( )=O h( )ε with 3ε≥ . (This fol-p

lows since the ACE then has magnitude m−1/2hε +en′2.) We now show how to

achieve this with ε= . p

Write t of (34) as _ND−1/2_{. So,}_{N O h}₌ _{( )}p _{is not a function of w, but}_{D and} the derivatives of N and D are. For example, D⋅₁= −2hw₁. Also

1 t⋅ =t1−ND−1/2D⋅1/2= +t1 O h( p+1), 2 t⋅ =−ND−3/2/2=O h( ),p 11 t⋅ =t11+3ND−5/2D⋅1/4−ND−3/2D⋅11/2=t11+O h( p+2), 12 t⋅ =t12+3ND−5/2D⋅21/4=t12+O h( p+1), 22 t⋅ =3ND−5/2/4=O h( )p for 1/2 1 t =D− , 3/2 11 1 t = −D− D and 3/2 12 /2

t = −D− . In this way we can ap-proximate the derivatives of t to ( )_{O h , and so using (21)-(23), approximate the}p coefficients a and ( )_ri h z to ( )_r _{O h . Call these approximations}p _{( )}

ri ri a =a w and ( ) ( , ) r rh h z =h z w . For example, 11 3h/2, a = −λ 32 21 3 ,23 a =c + c 3/2 21 3h, c =D− k

(14)

3/2 2 5/2 1/2 3/2 23 1 2h 2h h(12) 2 1 2h ( 3 1 2) 2h , c = −D− D k⋅ −D− k k = hw k − w −hw w k− 2 1( ) 1h( , ) 11 32( 1)/6. h z =h z w =a +a z − Now choose t w_m( )=t w( )−h z w₁_h( , ). By (41), h z₁′( )= 1/2 1( ) 21 1( ) ( )p h z +a− t w =O h so that the ACE of the CI 1/2 _{( )}_ˆ

m

m t w ≤ has magnitude z e′ giving ACE rate n2

2 /(2p p 3)

β = + for α =3/(2p+ , the same as achieved for a one-sided CI by 3) empirical Studentization. This gives the second order lower one-sided CI

0 2 ( ) f x ≥L = 1/2 1/2 1 1 2 1 ˆ _h( ) (ˆ _h( , ))ˆ w −k w m− z m h z w+ − of level _Φ( )z ₊O n( −β)_{. The} corresponding upper CI is f x( ₀)≤U₂= 1/2 1/2 1 1 2 1 ˆ _h( ) (ˆ _h( , ))ˆ w +k w m− z m h z w− − .

4.5 Second order two-sided CIs using empirical studentization By (27), 1 (| n | ) P Y ≤ =z 1 2 2 0 2 ( ) 1 2 ( )_Φ z _{− −} φ z m h z− ( )₊O(δ ₊m− )₌ 1 2 2 ( ) 1 2 ( )Φ z − − φ z m h z w− h( , )+O e( n3) for e =n3 hp +mh2p +m−2 so that 1 1 2 (| n | ( , )) P Y ≤ +z m h z w− =2 ( ) 1Φ z − +O e( n3)

and this also holds with w replaced by ˆw . Also e_n₃_∼n−β_for

= min( , 1+(2 +1) ,2 2 )=2 /( +2) at =2/( +2)p p p p p

β α − α − α α .

This gives the two-sided CI | (f x0)−wˆ1| ≤ (k w m2h( )/ ) (ˆ 1/2 z m h z w+ −1 2( , ))ˆ of

asymptotic level 2 ( ) 1Φ z − and ACE rate =2 /( +2)β p p . 5. OPTIMIZING THE CONSTANT C IN THE SMOOTHING PARAMETER (3)

5.1 One-sided first order CI based on empirical studentization

Consider the one-sided first order CI (36) with =1/( +1)α p . We noted that, for any 0c > , its coverage error is (O n−β), where =p/(2 +2)β p . We now show how to choose c c f x= ( , 0) to minimize the asymptotic coverage error (ACE). Set

0 1

( ) ( 1) / !r

r f x Br r r

ρ = _⋅ − . Since K is of order p, 0ρp ≠ . We now prove the fol-lowing in terms of h z of (39). For 10( ) ρp > , 0 c f x = ( , 0)

1/( 1) 10 [ ( )/{(2 1) }] p p h z p₊ ρ + _{. For} ₀ p ρ < ,

(15)

1/( 1)

0 10

( , ) [ ( )/ ] p

p

c f x ₌ h z ρ + ₍₄₂₎

and the ACE is improved to O n( −β1), where

1 (p 2)/(2p 2)

β = + + > . This β

surprising result is analogous to the result of Silverman (1986) quoted in the in-troduction. (As for that result, in practice one needs to study the effect of replac-ing ρp by an estimate. We shall not consider that here.)

We now prove these results. By (25),

1/2 1/2 2 ( 2 ) p p p h x z m h_{= −} ρ k− ₊O δ ₊h for δ2=m h1/2 p+1, and 1 1 3 2 ( ) ( ) ( ) ( p) n n P Y _≤z ₌_Φ z ₋φ z e ₊Oδ ₊m− ₊h for 1/2 1/2 1/2 1/2 3 p 2 1( ) 30 ( ) n p h n e ₌m h ρ k− ₊m− h z ₌e ₊O m− h _, 1/2 1/2 30 p 10( ) ( ) n p e ₌m h ρ ₊m− h z ₌n a c−β _, 1/2 1/2 1/2 2 10 ( ) _p _h p ( ) a c ₌ρ k− c + ₊c− h z _,

and α, β as in (35). We want to minimize the asymptotic value of the error |en30|,

that is, minimize |a(c)|. We assume that

_∫

_{K >}3 ₀_{so that}

30 0

w > . The minimiz-ing c is as given. For ρ_p <0, |a(c)|=0 at c of (42) giving ACE rate behaves as

1 1/2 m h n β ε₊ − _∼ − _since_m−1₊_hp _∼_n−p/(p+1)_and 1 /( 1) p p+ >β for p > . For 2 0 p

ρ < , one can show that we can improve this ACE rate β₁ to

2 ( /2 1)/(p p 1) 1

β = + + > by replacing h cnβ ₌ −α_by

3

(1 )

h cn₌ −α ₊c n−α _for_{c of}

(42) and c3=N D/ for N c= p+3/2ρp+1+ch z11( ), and

1/2 1/2

10

( 1/2) p _p ( )/ 2

D_{= − +}p c + ρ ₊c− h z _{, where}

1r( )

h z is defined by the

expan-sion ₁ ₁ ₁ 0 ( ) ( ) ( ) r h r r h z h z ∞ h z h =

= =

∑

. In fact, this process may be repeated using a third term in h to further increase the ACE rate above β₂.

5.2 Two-sided first order CI based on empirical studentization

First note that 2 1

21 1 2 20 p p ( p ) a _{= −} w− ρ h ₊O h + _and 2 1 1 w20 php O h( p ) δ ₌ − ρ ₊ + _{. So,} by (27), 1 2 1 (| n | ) 2 ( ) 1 2 ( ) ( ) ( ) P Y _≤z ₌ _Φ z _{− −} φ z n a c−β ₊O n−β _for 1 1/(p 1) α = + , 1 p/(p 1) β = + , β2 =β1+α1= , 1 a c( )=γ0c−1−γ1cp +γ2c2p+1, γ0=h z20( ),

(16)

2 1/2 1 20 20 1 0 1 0,2 / _p _i _i ( ) i zw w P He z γ ρ − − + = = +

∑

, and 1 2 2 zw20ρp

γ ₌ − _{. The optimal}_c

mini-mizes|a(c)|. Again there are two cases. If the minimum is positive the CI has ACE rate β1= p/(p+ . But if there exists c such that ( ) 01) a c = then the CI has

ACE rate β2 = . Since 1 γ2> there one such c exists if 0 γ0< while if 0 γ1> 0

there will be two such c if γ1 is sufficiently large but otherwise no such c; that is, if

1 10

γ ≥γ , where γ10 is given by eliminating c from 0 a c( )0 =a c( ) 00 = for ( )a c the

derivative of ( )a c .

6. CONCLUSIONS

In this paper, we have given first order and second order one- and two-sided CIs for f x . These CIs are based on Edgeworth and Cornish Fisher expan-( ₀) sions for Studentized versions of the kernel density estimate, ˆ( )f x . Tables 1 ₀ and 2 summarize the main CIs we give in terms of their ACE rate β achieved at their optimal choice of α and their subsection.

TABLE 1

ACE rate β and α for (3) for one-sided CIs (ε= for asymptotic Studentization, 1 ε= for empirical p Studentization, *p = best ACE rate achievable using the c in (3), p = order of the kernel)

ε first order CIs § second order CIs §

1 1/3 at 1/3 if p≥2 3.1 1/2 at 1/2 if p≥2 4.1.1, Hall

1 1/4 at 1/4 if p=1 3.1 2/5 at 3/5 if p=1 4.1.1, Hall

P p/(2p+2) at 1/(p+1) 3.2 2p/(2p+3) at 3/(2p+3) 4.1.2, 4.1.4, Hall, page 222

p* (p+2)/(2p+2) at 1/(p+1) 5.1

TABLE 2

ACE rate β and α for (3) for twosided CIs (ε= for asymptotic Studentization, 1 ε= for empirical p Studentization, *p = best ACE rate achievable using the c in (3), p = order of the kernel)

ε first order CIs § second order CIs §

1 1/2 at 1/2 3.3 2/3 at 2/3 4.5

P p/(p+1) at 1/(p+1) 3.3 2p/(p+2) at 2/(p+2) 4.5

p* 1 at 1/(p+1) 5.2

Hall (1992, Section 4.4.3, page 222) gives one- and two-sided CIs based on boot-strapping with β =2 /(2p p+ and 3) β = p/(p+ using 1) α=3/(2p+ and 3)

1/(p 1)

α = + , respectively. (This is for the case where bias is not estimated. Oth-erwise the formulas for β become too complicated.) We have used two types of Studentizations for the first order CIs given in Section 3. Using asymptotic Studen-tization, we obtain β=1/3 and 1/2 for one- and two-sided CIs using 1/3α= and 1/2 for p ≥ . Using the usual empirical Studentization, these 2 values of β are improved to β = p/(2p+ for one-sided CIs, and 2)

(17)

/(p p 1)

β = + for two-sided CIs both using α =1/(p+ . The second order 1) one- and two-sided CIs given (Section 4) have β =2 /(2p p+ for one-sided CIs, 3) and 2 /(β= p p+ ) for two-sided CIs using 2) α=3/(2p+ and 2/(3) p +2).

We have also considered two cases for choosing the ACE-optimal constant c in (3) for first order CIs based on empirical Studentization. In one case, using this c increases the ACE rate to (p+2)/(2p+ for one-sided CIs and to 1 for two-2) sided CIs. This result is analogous to that of Silverman (1986).

ACKNOWLEDGMENTS

The authors would like to thank the Editor-in-Chief and the referee for carefully read-ing the paper and for their comments which greatly improved the paper.

Applied Mathematics Group CHRISTOPHER S. WITHERS

Industrial Research Limited

School of Mathematics SARALEES NADARAJAH

University of Manchester

REFERENCES

L. BIGGERI, (1999), Diritto alla ‘privacy’ e diritto all’informazione statistica, in Sistan-Istat, “Atti della Quarta Conferenza Nazionale di Statistica”, Roma, 11-13 novembre 1998, Roma, Istat, Tomo 1, pp. 259-279.

L. COMTET, (1974), Advanced Combinatorics, Reidel, Dordrecht, Holland.

R. A. FISHER, E. A. CORNISH, (1960), The percentile points of distributions having known cumulants, “Technometrics”, 2, pp. 209-225.

P. GARCIA-SOIDAN,(1998), Edgeworth expansions for triangular arrays, “Communications in Sta-tistics-Theory and Methods”, 27, pp. 705-722.

P. HALL, (1992), The Bootstrap and Edgeworth Expansion, Springer-Verlag, New York.

E. PARZEN,(1962), On the estimation of a probability density and mode, “Annals of Mathematical Statistics”, 33, pp. 1065-1076.

B. L. S. PRAKASA RAO,(1983),Nonparametric Functional Estimation, Academic Press, Orlando. M. ROSENBLATT, (1956), Remarks on some nonparametric estimates of a density function, “Annals of

Mathematical Statistics”, 27, pp. 883-835.

D. W. SCOTT, R. A. TAPIA, J. R. THOMPSON,(1977), Kernel density estimation by discrete maximum

pe-nalized-likelihood criteria, “Annals of Statistics”, 8, pp. 820-832.

B. W. SILVERMAN,(1986), Density Estimation for Statistics and Data Analysis, Chapman and Hall, New York.

G. R. TERRELL, D. W. SCOTT,(1992), Variable kernel density estimation, “Annals of Statistics”, 20, pp. 1236-1265.

C. S. WITHERS,(1982), The distribution and quantiles of a function of parameter estimates, “Annals of the Institute of Statistical Mathematics, Series A”, 34, pp. 55-68.

C. S. WITHERS,(1983), Expansions for the distribution and quantiles of a regular functional of the empirical

distribution with applications to nonparametric CIs, “Annals of Statistics”, 11, pp. 577-587.

C. S. WITHERS,(1984), Asymptotic expansions for distributions and quantiles with power series

cumu-lants, “Journal of the Royal Statistical Society, Series B”, 46, pp. 389-396.

C. S. WITHERS,(2000), A simple expression for the multivariate Hermite polynomial, “Statistics and Probability Letters”, 47, pp. 165-169.

(18)

APPENDIX A: WAYS TO CONSTRUCT KERNELS OF HIGHER ORDER

We saw in Section 1 that f x , the kernel estimate with kernel K of order ˆ( )0

p has bias ( )_{O h . One can reduce the bias to}p _{O h}₍ p+1₎_{by using}

0 ˆ 0 ˆ 0 1 ( ) ( ) ( ) ( ) / !p p p f x = f x − f x B_⋅ −h p , where ˆ ( ) ( / )p ˆ( ) p f x_⋅ = d dx f x . But this amounts to replacing the kernel K by a kernel of order p+1, giving

0( ) ( ) p( ) p1( 1) / ! (1p p p1( 1) / !) ( )p

K z =K z −K z B⋅ − p = −D B − p K z for = /D ∂ ∂z. If

K is symmetric then p is even so K is of order 0 p + . 2

Example 1. Take ( )= ( )K z φz so p = . Set = /2 D ∂ ∂z. Then

2 2 0( ) (1 /2) ( ) (1 2( )/2) ( ) (3 ) ( )/2 K z = −D φ z = −He z φ z = −z φ z has order 4, 4 2 1 2 4 6 2 4 6 ( ) (1 /8)(1 / 2) ( ) (1 ( )/ 2 ( )/8 ( )/16) ( ) (15 45 15 )/16 ( )/16 K z D D z He z He z He z z z z z z φ φ φ = + − = − + + = = − + − has order 8, 6 4 2 2 2 4 6 8 10 12 ( ) (1 /96)(1 /8)(1 /2) ( ) (1 ( )/2 ( )/8 ( )(1/16 1/96) ( )/(96 2) ( )/(96 8) ( )/(96 8 2) K z D D D z He z He z He z He z He z He z φ = − + − = = − + − + + × = = − × + × ×

(19)

APPENDIX B: CUMULANTS IN TERMS OF MOMENTS

Here, we show how to express 1

1 ˆ ( ) r rh r k ₌m −κ w _and 1 1 1 ˆ ˆ ( ... ) r ( ,..., _r ) h r i i

k i i ₌m −κ w w _{of (13) and (19) as polynomials in}_{h and}

1 2

( , ,...)

w= w w of (10). Let U be a real random variable with finite moments r

r

m =EU and cumulants κr defined as usual by

1 / ! ln(1 ( )) r r r t r S t κ ∞ = ≡ +

∑

for all

t in C for which the moment generating function 1₊_{S t}( )₌_EeUt_{exists. By} equa-tion (2), page 160 of Comtet (1974),

1 1 ( 1) ( 1)! ( ), r i r ri i i B m κ − = =

∑

− − (43)

where for m=( ,m m1 2,...), ( )B m is the partial exponential Bell polynomial defined by ri

1 / ! / ! ( ) / ! i r r r ri r r i m t r i B m t r ∞ ∞ = = ⎛ ⎞ = ⎜ ⎟ ⎝

∑

⎠

∑

for i=0,1,... . These polynomials B m are tabled by Comtet on page 307. For ri( ) example, B mr1( )=mr, B mrr( )=m1r, B m32( ) 3= m m1 2, B m42( ) 4= m m1 3+3m22, and

2 43( ) 6 1 2

B m = m m . An explicit formula for B m is given by ri( )

1 1 1 1 2 ( ) { ( ) ... r : ... , 2 ... } r n n ri r r r n N B m π n m m n n i n n rn r ∈ =

∑

+ + = + + + =

for π(n) the partition function defined by

1 ( ) !/ ( !i !) r n i i n r i n π = =

∏

for 1 2 2 ... r

r n= + n + +rn . Now take U U= 1=K xh( 0−X), where X has distribution

F on ℜ. In the notation of Section 2, 1 r

r rh r m =m =h w− , and 1 r r rh h krh κ ₌κ ₌ − _{. It} follows that 1 1 1 ( 1) ( 1)! ( ), r i i rh ri i k − i h B w− = =

∑

− − (44)

a polynomial in h of degree r − . For example, 1

1h 1, k =w 2 2h 2 1, k =w −hw 2 3 3h 3 3 2 1 2! 1, k =w − hw w + h w

(20)

2 2 2 3 4 4h 4 (4 1 3 3 ) 2! (62 1 2) 3! 1, k =w −h w w + w + h w w − h w 2 2 2 3 3 4 5 5h 5 (5 1 4 10 2 3) 2! (10 1 3 15 1 2) 3! (10 1 2) 4 ! 1. k =w −h w w + w w + h w w + w w − h w w + h w

Analogous to wr =wrh =h EYr−1 r and krh =hr−1κr( )Y , define νrh =hr−1µr( )Y . Expanding in terms of non-central moments gives

2 1 1 1 1 0 0 ( ) ( ) ( 1)( ) r r j j r r rh r j r j j j r r hw w hw w r h w j j ν ₋ − ₋ − = = ⎛ ⎞ ⎛ ⎞ = _{⎜ ⎟} − = _{⎜ ⎟} − + − − ⎝ ⎠ ⎝ ⎠

∑

for w0=h−1. For example, ν2h =k2h =w2−hw12 and ν4h =k4h +3hk22h. The mul-tivariate forms of (43) and (44) can now be written as

1 1 1 ... 1 ( 1) ( 1)! r( ) r r i i i i i i i i B w κ − = =

∑

− − and 1 1 1 1 1 ( ... ) ( 1) ( 1)! r( ), r i i i i h r i i k i i − i h B− w = =

∑

− − where i1...ir( ) i

B m is the multivariate form of B m . These may be written down ri( )

on sight. For example, 2

42( ) 4 1 3 3 2 B m = m m + m is replaced by 1...4 2i i ( ) B m = 1 2 4 1 2 3 4 4 3 i i i i i i i m m + m m

∑

, where ( ... )1 N r f i i

∑

is f j( ... )1 j summed over r all N permutations j1...j of r i i giving different terms. So, 1...r B2i1...i4( )m is

the sum of

∑

4 m mi₁ i₂ i₄ =m mi₁ i₂...i₄ +m mi₂ i i i_{3 4 1} +m mi₃ i i i_{4 1 2} +m mi₄ i i i_{1 2 4} and

1 2 3 4 1 2 3 4 1 3 2 4 1 4 2 3 3 i i i i i i i i i i i i i i i i m m =m m +m m +m m

∑

. So, 1 1 ( ) , h i k i =w 1 2 1 2 1 2 ( ) , h i i i i k i i =w + −hw w 1 2 3 1 2 3 1 2 3 3 2 1 2 3 ( ) 2! , h i i i i i i i i i k i i i =w + + −h

∑

w + w + h w w w 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 4 3 6 2 3 ( ) { } 2! 3! . h i i i i i i i i i i i i i i i i i i i i k i i i i w + + + h w + + w w + w + h w + w w h w w w w = = −

∑

+

∑

+

∑

− For example, 2 2 2 4 1 3 2 1 2 (112) (2 ) 2 h k =w −h w w +w + h w w .

(21)

SUMMARY

Edgeworth and Cornish Fisher expansions and confidence intervals for the distribution, density and quantiles of Kernel density estimates

We show that kernel density estimates ˆ( )f x0 of bandwidth h h n= ( )→0 satisfy the

Cornish-Fisher assumption with parameter m=nh. This allows Cornish-Fisher expansions about the normal for standardized and Studentized kernel density estimates in powers of

1/ 2

m− for smooth functions t. The expansions given are formal and the conditions for existence/validity are not explored. The expansions lead to first order confidence inter-vals (CIs) for f x( 0) of level 1− +ω O n( −β), where β = p/(2p+2) for one-sided CIs and β = p/(p+1) for two-sided CIs, where p is the order of the kernel used. The second order one- and two-sided CIs are given with β =2 /(2p p+3) and β =2 /(p p+2). We show how to choose the bandwidth for asymptotic optimality.