A characterization of monotone and regular divergences

Universitat de Barcelona, Mathematics Preprint Series No. 213

J.M. Corcuera
Departament d'Estadística
Universitat de Barcelona
Gran Via de les Corts Catalanes, 585
08007 Barcelona, Spain
[email protected]

F. Giummolé
Dipartimento di Scienze Statistiche
Università degli Studi di Padova
Via S. Francesco, 33
35121 Padova, Italy
[email protected]

Abstract

In this paper we characterize monotone and regular divergences, which include f-divergences as a particular case, by giving their Taylor expansion up to fourth order. We extend a previous result obtained by Cencov, using the invariance properties of Amari's α-connections.

AMS 1980 subject classifications: 62F10, 62B10, 62A99.

Keywords and phrases: differential geometry, divergence, embedding

1 Introduction

The need to measure how different two populations are arises in many statistical problems. A wide class of indices, or divergences, has been used for this purpose (for a comprehensive exposition see Burbea (1983)). We are not able to give a universal rule for the choice of a divergence in each practical case. We can, however, investigate the general properties that an index of discrepancy should possess in order to describe a meaningful dissimilarity between populations. For instance, suppose that the individuals of two finite populations are grouped into classes $A_1,\dots,A_m$. Let $D(P_1,P_2)$ be a convenient function of the proportions $P_i=(P_i(A_1),\dots,P_i(A_m))$, $i=1,2$, of individuals belonging to the different groups in the two populations. We can now decide to join several classes, obtaining $B_1,\dots,B_l$, $l<m$. If $\bar P_i=(P_i(B_1),\dots,P_i(B_l))$, it is natural to demand that $D(\bar P_1,\bar P_2)\le D(P_1,P_2)$, since the new classification brings less information than the previous one. It is also necessary to ask that the introduction of new artificial classes does not cause any change in the divergence. Formally, if

$$Q_i=\left(q_{11}P_i(A_1),\dots,q_{1r_1}P_i(A_1),\dots,q_{m1}P_i(A_m),\dots,q_{mr_m}P_i(A_m)\right),\qquad i=1,2,$$

where $\sum_{k=1}^{r_j}q_{jk}=1$, $j=1,\dots,m$, then $D(P_1,P_2)=D(Q_1,Q_2)$.
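The two requirements above can be illustrated numerically. The following is a minimal sketch, assuming the Kullback-Leibler divergence as one concrete choice of $D$ (the text does not single out any particular divergence); the helper names `kl`, `merge` and `split` and the numbers are hypothetical.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence sum_i p_i log(p_i / q_i), used here as an example of D."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def merge(p, groups):
    """Join classes: every group of indices becomes a single new class B_k."""
    return [sum(p[i] for i in g) for g in groups]

def split(p, weights):
    """Split class A_j into artificial classes with fixed proportions q_j1, ..., q_jr_j."""
    return [pj * w for pj, ws in zip(p, weights) for w in ws]

p1 = [0.1, 0.2, 0.3, 0.4]
p2 = [0.25, 0.25, 0.25, 0.25]

d_full = kl(p1, p2)

# Joining classes {A_1, A_2} and {A_3, A_4} cannot increase the divergence.
groups = [[0, 1], [2, 3]]
d_merged = kl(merge(p1, groups), merge(p2, groups))

# Splitting classes with the same proportions in both populations leaves it unchanged.
weights = [[0.5, 0.5], [1.0], [0.2, 0.3, 0.5], [1.0]]
d_split = kl(split(p1, weights), split(p2, weights))

print(d_merged <= d_full, abs(d_split - d_full) < 1e-12)
```

Any other f-divergence could be substituted for `kl` without changing the qualitative behaviour.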

Divergences satisfying these two properties have already been studied by Cencov (1972). He gives their Taylor expansion up to second order, by means of the invariance of the Fisher metric. In this paper we extend Cencov's result to fourth order, using the invariance properties of the α-connections. Following Campbell (1986), we do not use the language of categories.

An additional property allows us to extend a divergence to the case in which the individuals are classified in an infinite number of groups. This property expresses a sort of continuity of the divergence, when we let the number of classes tend to infinity.

2 Some basic definitions

In this section, we introduce operators representing an index of discrepancy between two probability measures.

In the sequel, we indicate with $(X,a)$ a measurable space. $a_\alpha\subset a$ is a finite sub-σ-field of $a$ and $P_\alpha$ is the restriction of $P$, defined on $(X,a)$, to $a_\alpha$.

Definition 2.1 A divergence $D(P,Q)$ is a real-valued function whose arguments are two probability measures defined on the same measurable space.

Definition 2.2 Let $(X_1,a_1)$ and $(X_2,a_2)$ be two measurable spaces. We say that $K:X_1\times a_2\to[0,1]$ is a Markov kernel, $K\in\mathrm{Stoch}\{(X_1,a_1),(X_2,a_2)\}$, if it satisfies the following properties:

1. $\forall A_2\in a_2$, $K(\cdot,A_2)$ is a measurable map;

2. $\forall x_1\in X_1$, $K(x_1,\cdot)$ is a probability on $(X_2,a_2)$.

If $P$ is a probability measure on $(X_1,a_1)$, then $K$ induces a probability measure on $(X_2,a_2)$, $KP$, defined by

$$KP(A_2)=\int_{X_1}K(x_1,A_2)\,P(dx_1),\qquad A_2\in a_2.$$

Let $D(\cdot,\cdot)$ be a divergence and $(X_1,a_1)$ and $(X_2,a_2)$ be two measurable spaces.

Definition 2.3 $D(P,Q)$ is said to be monotone with respect to Markov kernels if

$$-\infty<D(KP,KQ)\le D(P,Q)\le+\infty, \tag{2.1}$$

for every $P$, $Q$ probability measures on $(X_1,a_1)$, and for every Markov kernel $K\in\mathrm{Stoch}\{(X_1,a_1),(X_2,a_2)\}$.

As observed in the introduction, (2.1) is a natural property to require, since a transformation through a Markov kernel will, in general, cause a loss of information, which is well reflected by a decrease of the divergence.
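On finite spaces a Markov kernel is a row-stochastic matrix, and property (2.1) can be probed by simulation. The sketch below assumes the chi-square divergence $\sum_i (q_i-p_i)^2/p_i$ as the example of $D$ and draws random kernels; the function names, the sizes and the seed are illustrative choices, not part of the paper.

```python
import random

def apply_kernel(K, p):
    """(KP)_j = sum_i p_i K[i][j]: the finite-space form of KP(A) = integral of K(x, A) P(dx)."""
    return [sum(p[i] * K[i][j] for i in range(len(p))) for j in range(len(K[0]))]

def chi2(p, q):
    """Chi-square divergence, an f-divergence with f(u) = (u - 1)^2."""
    return sum((qi - pi) ** 2 / pi for pi, qi in zip(p, q))

def random_distribution(rng, size):
    """A random point of the open simplex (all components strictly positive)."""
    w = [rng.random() + 1e-3 for _ in range(size)]
    s = sum(w)
    return [x / s for x in w]

rng = random.Random(0)
violations = 0
for _ in range(1000):
    p = random_distribution(rng, 4)
    q = random_distribution(rng, 4)
    # Each row K[i] is a probability on the 3 classes of the second space.
    K = [random_distribution(rng, 3) for _ in range(4)]
    if chi2(apply_kernel(K, p), apply_kernel(K, q)) > chi2(p, q) + 1e-12:
        violations += 1

print(violations)
```

No violation of $D(KP,KQ)\le D(P,Q)$ is expected, since f-divergences are monotone.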

Monotonicity of a divergence function implies its invariance under a particular class of Markov kernels. Let $\mathcal P$ be a family of probability measures on $(X_1,a_1)$.

Definition 2.4 $K\in\mathrm{Stoch}\{(X_1,a_1),(X_2,a_2)\}$ is said to be Blackwell sufficient (B-sufficient) with respect to $\mathcal P$ if there exists $N\in\mathrm{Stoch}\{(X_2,a_2),(X_1,a_1)\}$ such that $N(KP)=P$, $\forall P\in\mathcal P$. We say that $K$ is B-sufficient if $\mathcal P$ is the family of all probability measures on $(X_1,a_1)$.

Proposition 2.1 If $D$ is a monotone function with respect to Markov kernels, then, for every B-sufficient $K$,

$$D(P,Q)=D(KP,KQ),\qquad\forall P,Q. \tag{2.2}$$

Proof:

$$D(P,Q)=D(N(KP),N(KQ))\le D(KP,KQ),$$

that, together with the monotonicity, gives (2.2). ■

This is also natural since a B-sufficient Markov kernel does not cause any loss of information.

Corollary 2.1 The value of $D(P,P)$ is independent of $P$ and it is a minimum value of the function $D$:

$$D(P,P)=D_0\le D(Q,R),\qquad\forall P,Q,R.$$

Proof: Given a probability measure $P$, there always exists a Markov kernel $K$ taking every probability measure to $P$: $K(x,\cdot)=P(\cdot)$, $\forall x\in X$. Then,

$$D(Q,R)\ge D(KQ,KR)=D(P,P),\qquad\forall Q,R,$$

proving that $D(P,P)$ is a minimum value for $D$. Now, for every probability measure $P'$, $K$ is B-sufficient with respect to the family $\mathcal P=\{P'\}$, since $N(x,\cdot)=P'(\cdot)$, $\forall x\in X$, transforms $P$ into $P'$. By (2.2), $D(P,P)=D(P',P')=D_0$, $\forall P,P'$. ■

Definition 2.5 $D(P,Q)$ is said to be regular if

$$D(P,Q)=\lim_\alpha D(P_\alpha,Q_\alpha), \tag{2.3}$$

for every $P$ and $Q$ probability measures on $(X,a)$, where the limit is taken over the filter of all finite sub-σ-fields $a_\alpha$ of $a$, that is, over any increasing sequence $\{a_n\}$ such that $\sigma(\bigcup_n a_n)=a$.

Remark. Since the restriction of a probability measure to a sub-σ-field is a particular case of Markov kernel, for monotone divergences the limit in (2.3) is a supremum.

The regularity condition enables us to extend to the general case a divergence originally defined on probability measures over finite σ-fields.
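The remark on the supremum can be seen on a simple example. The sketch below is an illustration under stated assumptions: $P$ is taken uniform on $(0,1)$, $Q$ has density $2x$, the divergence is the chi-square divergence, and the finite σ-fields are generated by $2^k$ equal cells. The limit value $1/3=\int_0^1(2x-1)^2dx$ is computed by hand for this specific pair.

```python
def chi2_on_partition(n):
    """Chi-square divergence between P = Uniform(0,1) and Q (density 2x),
    both restricted to the sigma-field generated by n equal cells."""
    total = 0.0
    for i in range(1, n + 1):
        p_cell = 1.0 / n
        q_cell = (i / n) ** 2 - ((i - 1) / n) ** 2  # Q of the i-th cell
        total += (q_cell - p_cell) ** 2 / p_cell
    return total

# Nested dyadic partitions: each one refines the previous one.
values = [chi2_on_partition(2 ** k) for k in range(1, 11)]
limit = 1.0 / 3.0  # integral of (2x - 1)^2 over (0, 1)

for n, v in zip((2 ** k for k in range(1, 11)), values):
    print(n, v)
```

The printed values increase with the refinement and approach the integral from below, as expected for a monotone and regular divergence.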

3 The multinomial case

In the present section we consider equivalent probability measures defined on the measurable space $(X,a_m)$, where $a_m$ is a finite sub-σ-field of $a$ generated by the $m$ atoms $A_1,\dots,A_m$. Every probability measure $P$ on $(X,a)$ induces a probability measure on $(X,a_m)$, defined by $m$ values, $x_1,\dots,x_m$, with $x_i=P(A_i)>0$ and $\sum_{i=1}^m x_i=1$. Thus, every $P$ corresponds to a point of the simplex

$$S_{m-1}=\left\{x\in\mathbb R^m:\ x_i>0,\ \sum_{i=1}^m x_i=1\right\}.$$

$S_{m-1}$ can be regarded as a surface in the differentiable manifold $\mathbb R^m$. There is a tangent space $M_x$, with base $\left\{X_i=\frac{\partial}{\partial x_i},\ i=1,\dots,m\right\}$, associated to every point $x\in\mathbb R^m$. If $x\in S_{m-1}$, the derivative of a function $h(x_1,\dots,x_m)$ along a curve $x_i=\psi_i(t)$, $i=1,\dots,m$, tangent to $S_{m-1}$, takes the form $\sum_{i=1}^m\psi_i'(t)\frac{\partial h}{\partial x_i}$. Since $\sum_{i=1}^m\psi_i(t)=1$, then $\sum_{i=1}^m\psi_i'(t)=0$ and every vector $X$ tangent to the simplex $S_{m-1}$ can be represented as $X=\sum_{i=1}^m a_iX_i$, with $\sum_{i=1}^m a_i=0$.

For $n\ge m$, let $B_1,\dots,B_n$ be a partition of $X$ such that $A_i=\bigcup_{j\in I_i}B_j$, $i=1,\dots,m$, where $I_1,\dots,I_m$ is a partition of $\{1,\dots,n\}$. For any probability measure $P$ on $(X,a)$, let $x_i=P(A_i)$, $i=1,\dots,m$, and $y_j=P(B_j)$, $j=1,\dots,n$. Thus

$$x_i=\sum_{j\in I_i}y_j,\qquad i=1,\dots,m. \tag{3.1}$$

Conversely, define

$$q_{ij}=P(B_j\mid A_i)=\begin{cases}\dfrac{y_j}{x_i} & \text{if } j\in I_i\\[1ex] 0 & \text{if } j\notin I_i.\end{cases}$$

Then

$$y_j=\sum_{i=1}^m q_{ij}x_i,\qquad j=1,\dots,n. \tag{3.2}$$

Note that $(q_{ij})$ is a stochastic matrix, that is: $q_{ij}\ge0$, $\forall i,j$; $\sum_{j=1}^n q_{ij}=1$, $\forall i$; and $q_{rj}q_{sj}=0$ if $r\ne s$, $\forall j$. (3.2) defines a Markov kernel $f:S_{m-1}\to S_{n-1}$ and (3.1) an inverse of $f$, so that $f$ is B-sufficient with respect to the family of all probability measures on $(X,a_m)$. In fact, it is easy to prove that any B-sufficient Markov kernel with respect to the family of all probability measures on $(X,a_m)$ can be written in the form (3.2). We call $f$ a Markov embedding.
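The construction (3.1)-(3.2) is easy to reproduce on a small example. The following sketch assumes $m=3$, $n=6$ and an arbitrary choice of the blocks $I_i$ and of the matrix $(q_{ij})$; the function names `embed` and `collapse` are hypothetical.

```python
def embed(x, q):
    """Markov embedding (3.2): y_j = sum_i q[i][j] * x_i."""
    return [sum(q[i][j] * x[i] for i in range(len(x))) for j in range(len(q[0]))]

def collapse(y, blocks):
    """Inverse map (3.1): x_i = sum of y_j over j in I_i."""
    return [sum(y[j] for j in block) for block in blocks]

# I_1 = {0, 1}, I_2 = {2}, I_3 = {3, 4, 5} (0-based indices).
blocks = [[0, 1], [2], [3, 4, 5]]
q = [
    [0.3, 0.7, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.3, 0.5],
]

x = [0.2, 0.5, 0.3]
y = embed(x, q)
x_back = collapse(y, blocks)

# Rows sum to one, and every column has exactly one non-zero entry (q_rj q_sj = 0 for r != s).
rows_ok = all(abs(sum(row) - 1.0) < 1e-12 for row in q)
cols_ok = all(sum(1 for i in range(3) if q[i][j] > 0) == 1 for j in range(6))
print(y, x_back, rows_ok, cols_ok)
```

Since `collapse` recovers `x` from `embed(x, q)` for every `x`, the embedding loses no information, which is the content of B-sufficiency in this finite setting.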

The jacobian map associated to $f$, $f_*:M_x\to M_y$, is defined by

$$f_*X_i=\sum_{j=1}^n q_{ij}Y_j,\qquad i=1,\dots,m, \tag{3.3}$$

with $Y_j=\frac{\partial}{\partial y_j}$.

3.1 Embedding invariant structures

In the present section we consider geometrical structures defined on the simplex, that are invariant with respect to Markov embeddings. We characterize invariant Riemannian metrics and affine connections, showing that, up to constant factors, they coincide respectively with the Fisher metric and the α-connections.

3.1.1 Invariant Riemannian metrics

Definition 3.1 If $\langle\cdot,\cdot\rangle$ is a Riemannian metric and

$$\langle X,X'\rangle_m(x)=\langle f_*X,f_*X'\rangle_n(y),\qquad\forall X,X'\in M_x, \tag{3.4}$$

where $x\in S_{m-1}$ and $y=f(x)\in S_{n-1}$, we say that $f$ is an isometry and that $\langle\cdot,\cdot\rangle$ is invariant under $f$. $\langle\cdot,\cdot\rangle$ is said to be embedding invariant if it is invariant under every Markov embedding.

The following result was first given by Cencov (1972). We refer the reader to Campbell (1986) for an easier proof.

Theorem 3.1 The only embedding invariant Riemannian metrics are of the form

$$\langle X_i,X_j\rangle_m(x)=A\,\frac{\delta_{ij}}{x_i}, \tag{3.5}$$

where $A>0$ and $\delta_{ij}$ is the Kronecker delta.

Remark. (The Fisher metric) Theorem 3.1 states the uniqueness, up to a multiplicative constant, of embedding invariant metrics. Let $u_i=X_i-X_m$, $i=1,\dots,m-1$. Then

$$\langle u_i,u_j\rangle_m(x)=\langle X_i-X_m,X_j-X_m\rangle_m(x)=A\left(\frac{\delta_{ij}}{x_i}+\frac1{x_m}\right), \tag{3.6}$$

that is the same, up to a constant factor, as the Fisher metric in the multinomial case, see Amari (1985, p. 31).
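The invariance of the metric (3.5) under a Markov embedding can be checked directly. The sketch below assumes $A=1$, a hypothetical embedding matrix `q` with $m=3$, $n=6$, and uses the fact that, by (3.3), the components of $f_*X_i$ in the basis $Y_1,\dots,Y_n$ are the entries of row $i$ of `q`.

```python
def fisher(u, v, point):
    """Bilinear extension of (3.5) with A = 1: <u, v> = sum_j u_j v_j / point_j."""
    return sum(a * b / c for a, b, c in zip(u, v, point))

# Hypothetical Markov embedding: rows sum to one, one non-zero entry per column.
q = [
    [0.3, 0.7, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.2, 0.3, 0.5],
]
x = [0.2, 0.5, 0.3]
m, n = len(q), len(q[0])
y = [sum(q[i][j] * x[i] for i in range(m)) for j in range(n)]

# Gram matrix of X_1, ..., X_m at x.
basis = [[1.0 if k == i else 0.0 for k in range(m)] for i in range(m)]
gram_x = [[fisher(basis[i], basis[k], x) for k in range(m)] for i in range(m)]

# Gram matrix of f_* X_1, ..., f_* X_m at y = f(x).
gram_y = [[fisher(q[i], q[k], y) for k in range(m)] for i in range(m)]

max_gap = max(abs(gram_x[i][k] - gram_y[i][k]) for i in range(m) for k in range(m))
print(max_gap)
```

The two Gram matrices agree up to rounding, which is the isometry condition (3.4) evaluated on the basis vectors.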

3.1.2 Invariant affine connections

A similar characterization can be given for embedding invariant affine connections. For this purpose, even though the tangent space $M_x$ has dimension $m-1$, it will be better for us to work with an overdefined system of $m$ vectors, $v_1,\dots,v_m$, such that:

1. $\{v_{i_1},\dots,v_{i_{m-1}}\}$ is a base of $M_x$, $\forall\{i_1,\dots,i_{m-1}\}\subset\{1,\dots,m\}$;

2. $\sum_{i=1}^m v_i=0$.

We choose $v_i=X_i-\frac1m\sum_{j=1}^m X_j$, $i=1,\dots,m$, where $X_i=\frac{\partial}{\partial x_i}$, $i=1,\dots,m$, is the usual base of $M_x$ in $\mathbb R^m$. Notice that $v_i\in M_x$, $x\in S_{m-1}$, $i=1,\dots,m$, since $v_i=\sum_{j=1}^m a_jX_j$, with $a_j=\delta_{ij}-\frac1m$, and $\sum_{j=1}^m a_j=0$. We suppose the tangent space is equipped with an embedding invariant Riemannian metric.

We will describe an affine connection on $S_{m-1}$ through the coefficients $\gamma_{ij}^k$ obtained by evaluating it on the vectors $v_1,\dots,v_m$:

$$\nabla_{v_i}v_j=\sum_{k=1}^m\gamma_{ij}^k v_k,\qquad\forall i,j=1,\dots,m. \tag{3.7}$$

Since $\{v_1,\dots,v_m\}$ is not a base of $M_x$, the $\gamma_{ij}^k$'s are not Christoffel symbols for the connection $\nabla$ and, moreover, they are not uniquely determined. We choose the $\gamma_{ij}^k$ satisfying the following condition:

$$\sum_{k=1}^m\gamma_{ij}^k=0,\qquad\forall i,j=1,\dots,m-1. \tag{3.8}$$

Condition (3.8) guarantees that expression (3.7) is a good definition for $\nabla$. To see this, let

$$\sum_{k=1}^m\gamma_{ij}^k v_k=\sum_{k=1}^m\tilde\gamma_{ij}^k v_k,$$

that is

$$\sum_{k=1}^m\left(\gamma_{ij}^k-\tilde\gamma_{ij}^k\right)v_k=0.$$

Since

$$0=\sum_{k=1}^m a_kv_k=\sum_{k=1}^{m-1}(a_k-a_m)v_k\ \Rightarrow\ a_k=a_m,\qquad\forall k=1,\dots,m-1,$$

then

$$\gamma_{ij}^k-\tilde\gamma_{ij}^k=\gamma_{ij}^m-\tilde\gamma_{ij}^m,\qquad\forall k=1,\dots,m-1.$$

By (3.8),

$$\sum_{k=1}^m\left(\gamma_{ij}^k-\tilde\gamma_{ij}^k\right)=m\left(\gamma_{ij}^m-\tilde\gamma_{ij}^m\right)=0,$$

that is $\gamma_{ij}^m=\tilde\gamma_{ij}^m$ and $\gamma_{ij}^k=\tilde\gamma_{ij}^k$, $\forall k=1,\dots,m$.

From the $\gamma_{ij}^k$'s, $i,j,k=1,\dots,m-1$, we can easily obtain Christoffel symbols $\Gamma_{ij}^k$ for $\nabla$, with respect to the base $v_1,\dots,v_{m-1}$:

$$\nabla_{v_i}v_j=\sum_{k=1}^m\gamma_{ij}^kv_k=\sum_{k=1}^{m-1}\left(\gamma_{ij}^k-\gamma_{ij}^m\right)v_k=\sum_{k=1}^{m-1}\left(\gamma_{ij}^k+\sum_{l=1}^{m-1}\gamma_{ij}^l\right)v_k.$$

Thus

$$\Gamma_{ij}^k=\gamma_{ij}^k+\sum_{l=1}^{m-1}\gamma_{ij}^l. \tag{3.9}$$

Let $\overset{m}{\nabla}$, $\overset{m}{\gamma}{}_{ij}^k$ and $\overset{m}{\Gamma}{}_{ij}^k$ denote respectively the affine connection $\nabla$ on $S_{m-1}$, its coefficients in (3.7) and the corresponding Christoffel symbols in (3.9). We will omit the superscript $m$ when not necessary.

If $f$ is a Markov embedding between $S_{m-1}$ and $S_{n-1}$, $m\le n$, then $\overset{m}{\nabla}$ induces through $f$ an affine connection on $f(S_{m-1})$, an $(m-1)$-dimensional submanifold of $S_{n-1}$, defined by

$$f_*\left(\overset{m}{\nabla}_uv(x)\right),\qquad\forall u,v\in M_x,$$

where $f_*:M_x\to M_y$ is the jacobian function corresponding to $f$. Moreover, as a submanifold of the Riemannian manifold $S_{n-1}$, $f(S_{m-1})$ naturally inherits the affine connection of $S_{n-1}$:

$$\overset{n}{\tilde\nabla}_{f_*u}f_*v(y),\qquad u,v\in M_x,$$

where $\overset{n}{\tilde\nabla}$ is the orthogonal projection of $\overset{n}{\nabla}$ on the tangent space to $f(S_{m-1})$. We can thus give the following definition of invariance:

Definition 3.2 An affine connection $\nabla$ is said to be embedding invariant if the affine connection induced by $\overset{m}{\nabla}$ through $f$ coincides with the one induced on $f(S_{m-1})$ by $\overset{n}{\nabla}$, that is

$$f_*\left(\overset{m}{\nabla}_uv(x)\right)=\overset{n}{\tilde\nabla}_{f_*u}f_*v(y), \tag{3.10}$$

for every Markov embedding $f$.

The next theorem gives a characterization of the affine connections defined on the probability simplex that are invariant under Markov embeddings. Following Campbell's characterization of invariant metrics, we give a proof that does not use Cencov's language of categories.

Theorem 3.2 The only affine connections that are embedding invariant are of the form

$$\gamma_{ij}^k(x)=I\left(x_k-\frac{\delta_{ik}+\delta_{jk}}2\right)+G\,\frac{\delta_{ij}}{x_i}\left(x_k-\delta_{ik}\right),$$

where $I$ and $G$ are constants and $\delta_{ij}$ is the Kronecker delta.

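Proof: Condition (3.10) of invariance for $\nabla$ can be written in terms of the vectors of $M_x$ as

$$f_*\left(\overset{m}{\nabla}_{v_i}v_j(x)\right)=\overset{n}{\tilde\nabla}_{f_*v_i}f_*v_j(y),\qquad i,j=1,\dots,m. \tag{3.11}$$

Consider first the case of $m=n$. In this situation, $f(S_{m-1})=S_{n-1}$ and $\overset{n}{\tilde\nabla}$ coincides with $\overset{n}{\nabla}$. If $f$ interchanges $x_r$ and $x_s$, then $f_*$ interchanges $X_r$ and $X_s$, and thus $v_r$ and $v_s$. Let $x_1=\dots=x_m$, that is $x=\left(\frac1m,\dots,\frac1m\right)$; then $f(x)=x$.

If $i=j$, by (3.11) we obtain:

$$f_*\left(\nabla_{v_i}v_i(x)\right)=f_*\left(\gamma_{ii}^k(x)v_k\right)=\gamma_{ii}^k(x)f_*v_k=\nabla_{f_*v_i}f_*v_i(x).$$

It follows that:

$$\gamma_{rr}^r(x)=\gamma_{ss}^s(x),\qquad\gamma_{ii}^r(x)=\gamma_{ii}^s(x),\quad i\notin\{r,s\},$$

and

$$\gamma_{rr}^i(x)=\gamma_{ss}^i(x),\qquad i\notin\{r,s\}.$$

Since $r$ and $s$ may be chosen arbitrarily, we may write

$$\gamma_{ii}^i(x)=F_m\qquad\text{and}\qquad\gamma_{ii}^k(x)=G_m,\quad i\ne k.$$

If $i\ne j$,

$$f_*\left(\nabla_{v_i}v_j(x)\right)=f_*\left(\gamma_{ij}^k(x)v_k\right)=\gamma_{ij}^k(x)f_*v_k=\nabla_{f_*v_i}f_*v_j(x).$$

It follows:

$$\gamma_{rs}^k(x)=\gamma_{sr}^k(x),\qquad\gamma_{kr}^j(x)=\gamma_{ks}^j(x),\qquad\gamma_{sk}^r(x)=\gamma_{rk}^s(x),\qquad\gamma_{ij}^r(x)=\gamma_{ij}^s(x),$$

where $k,i,j\notin\{r,s\}$, which imply

$$\gamma_{ij}^k(x)=I_m,\qquad i\ne j,\ k\notin\{i,j\}.$$

Similarly,

$$\gamma_{ir}^i(x)=\gamma_{is}^i(x),\qquad\gamma_{ri}^i(x)=\gamma_{si}^i(x),\qquad\gamma_{ir}^r(x)=\gamma_{is}^s(x),\qquad\gamma_{ri}^r(x)=\gamma_{si}^s(x),$$

where $i\notin\{r,s\}$, together with the symmetry condition $\gamma_{ij}^k=\gamma_{ji}^k$, imply

$$\gamma_{ij}^k(x)=H_m,\qquad i\ne j,\ k\in\{i,j\}.$$

Thus, for $x=\left(\frac1m,\dots,\frac1m\right)$,

$$\gamma_{ij}^k(x)=\begin{cases}F_m & i=j=k\\ G_m & i=j\ne k\\ H_m & i\ne j,\ k\in\{i,j\}\\ I_m & i\ne j,\ k\notin\{i,j\}.\end{cases}$$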

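Next, let $n=hm$, where $h$ is an integer bigger than one. If $f_h$ is the Markov embedding defined by

$$y=f_h(x)=\left(\frac{x_1}h,\dots,\frac{x_1}h,\dots,\frac{x_m}h,\dots,\frac{x_m}h\right),$$

each component being repeated $h$ times, then $f_*X_i=\frac1h\left(Y_{(i-1)h+1}+\dots+Y_{ih}\right)$ and

$$f_*v_i=\frac1h\left(u_{(i-1)h+1}+\dots+u_{ih}\right)=\frac1h\sum_{r\in R_i}u_r,$$

where $u_r=Y_r-\frac1n\sum_{j=1}^nY_j$ and $R_i=\{(i-1)h+1,\dots,ih\}$, $i=1,\dots,m$.

For $i=j$, we have

$$f_*\left(\nabla_{v_i}v_i(x)\right)=f_*\left(\gamma_{ii}^k(x)v_k\right)=\gamma_{ii}^k(x)f_*v_k=\frac1h\sum_{k}\gamma_{ii}^k(x)\sum_{r\in R_k}u_r$$

$$=\frac1h\left(\gamma_{ii}^i(x)\sum_{r\in R_i}u_r+\sum_{k\ne i}\gamma_{ii}^k(x)\sum_{r\in R_k}u_r\right)=\frac1h\left(F_m\sum_{r\in R_i}u_r+G_m\sum_{r\notin R_i}u_r\right),$$

and

$$\nabla_{f_*v_i}f_*v_i(y)=\frac1{h^2}\sum_{r,s\in R_i}\nabla_{u_r}u_s(y)=\frac1{h^2}\sum_{r,s\in R_i}\gamma_{rs}^k(y)u_k=\frac1{h^2}\left(\sum_{r\in R_i}\gamma_{rr}^k(y)u_k+\sum_{r\in R_i}\sum_{\substack{s\in R_i\\ s\ne r}}\gamma_{rs}^k(y)u_k\right)$$

$$=\frac1{h^2}\Bigg(F_n\sum_{r\in R_i}u_r+(h-1)G_n\sum_{r\in R_i}u_r+hG_n\sum_{r\notin R_i}u_r+h(h-1)I_n\sum_{r\notin R_i}u_r+(h-1)(h-2)I_n\sum_{r\in R_i}u_r+2(h-1)H_n\sum_{r\in R_i}u_r\Bigg).$$

Since $\overset{n}{\nabla}_{f_*v_i}f_*v_i(y)$ belongs to the tangent space to $f(S_{m-1})$ in $y$, we do not need to project it to find $\overset{n}{\tilde\nabla}_{f_*v_i}f_*v_i(y)$, that is, $\overset{n}{\tilde\nabla}=\overset{n}{\nabla}$. Condition (3.11) implies:

$$G_m=G_n+(h-1)I_n \tag{3.12}$$

and

$$F_m=\frac{F_n}h+\frac{h-1}hG_n+\frac{(h-1)(h-2)}hI_n+2\,\frac{h-1}hH_n. \tag{3.13}$$

For $i\ne j$, we have:

$$f_*\left(\nabla_{v_i}v_j(x)\right)=f_*\left(\gamma_{ij}^k(x)v_k\right)=\gamma_{ij}^k(x)f_*v_k=\frac1h\sum_k\gamma_{ij}^k(x)\sum_{r\in R_k}u_r$$

$$=\frac1h\left(\sum_{k\notin\{i,j\}}\gamma_{ij}^k(x)\sum_{r\in R_k}u_r+\gamma_{ij}^i(x)\sum_{r\in R_i}u_r+\gamma_{ij}^j(x)\sum_{r\in R_j}u_r\right)=\frac1h\left(I_m\sum_{r\notin R_i\cup R_j}u_r+H_m\sum_{r\in R_i\cup R_j}u_r\right),$$

and

$$\nabla_{f_*v_i}f_*v_j(y)=\frac1{h^2}\sum_{r\in R_i}\sum_{s\in R_j}\nabla_{u_r}u_s(y)=\frac1{h^2}\sum_{r\in R_i}\sum_{s\in R_j}\gamma_{rs}^k(y)u_k$$

$$=\frac1{h^2}\sum_{r\in R_i}\sum_{s\in R_j}\left(\sum_{k\notin\{r,s\}}\gamma_{rs}^k(y)u_k+\gamma_{rs}^r(y)u_r+\gamma_{rs}^s(y)u_s\right)$$

$$=\frac1{h^2}\left(h^2I_n\sum_{r\notin R_i\cup R_j}u_r+h(h-1)I_n\sum_{r\in R_i\cup R_j}u_r+hH_n\sum_{r\in R_i\cup R_j}u_r\right).$$

$\overset{n}{\nabla}_{f_*v_i}f_*v_j(y)$ still belongs to the tangent space to $f(S_{m-1})$ in $y$ and $\overset{n}{\tilde\nabla}$ coincides with $\overset{n}{\nabla}$. By (3.11),

$$I_m=hI_n \tag{3.14}$$

and

$$H_m=H_n+(h-1)I_n. \tag{3.15}$$

Now, by (3.14),

$$mI_m=nI_n=I,$$

that is

$$I_m=\frac Im.$$

By (3.15),

$$H_m-I_m=H_n-I_n=H,$$

so

$$H_m=H+\frac Im.$$

A similar expression holds, by (3.12), for $G$:

$$G_m=G+\frac Im.$$

Finally, (3.13) implies that

$$\frac1m\left(F_m-G-2H-I_m\right)=\frac1n\left(F_n-G-2H-I_n\right)=F,$$

that is

$$F_m=mF+\frac Im+G+2H.$$

We can then write, for $x=\left(\frac1m,\dots,\frac1m\right)$,

$$\gamma_{ij}^k(x)=\begin{cases}mF+I/m+G+2H & i=j=k\\ G+I/m & i=j\ne k\\ H+I/m & i\ne j,\ k\in\{i,j\}\\ I/m & i\ne j,\ k\notin\{i,j\}.\end{cases}$$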

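Consider now a point of the form

$$x=\left(\frac{r_1}n,\dots,\frac{r_m}n\right),$$

where $\sum_{i=1}^m r_i=n$, and all $r_i$ are positive integers. Define

$$q_{ij}=\begin{cases}\dfrac1{r_i} & \text{if } j\in R_i\\[1ex] 0 & \text{otherwise,}\end{cases}$$

where $R_i=\{r_1+\dots+r_{i-1}+1,\dots,r_1+\dots+r_i\}$, $i=1,\dots,m$. The corresponding Markov embedding $f$ maps $x$ to $y=\left(\frac1n,\dots,\frac1n\right)$. Moreover

$$f_*X_i=\frac1{r_i}\left(Y_{r_1+\dots+r_{i-1}+1}+\dots+Y_{r_1+\dots+r_i}\right)$$

and thus

$$f_*v_i=\frac1{r_i}\left(u_{r_1+\dots+r_{i-1}+1}+\dots+u_{r_1+\dots+r_i}\right)=\frac1{r_i}\sum_{r\in R_i}u_r.$$

For $i=j$, we have

$$f_*\left(\nabla_{v_i}v_i(x)\right)=\gamma_{ii}^k(x)f_*v_k=\sum_k\frac1{r_k}\gamma_{ii}^k(x)\sum_{r\in R_k}u_r$$

and

$$\nabla_{f_*v_i}f_*v_i(y)=\frac1{r_i^2}\sum_{r,s\in R_i}\nabla_{u_r}u_s(y)=\frac1{r_i^2}\sum_{r,s\in R_i}\gamma_{rs}^k(y)u_k=\frac1{r_i^2}\left(\sum_{r\in R_i}\gamma_{rr}^k(y)u_k+\sum_{r\in R_i}\sum_{\substack{s\in R_i\\ s\ne r}}\gamma_{rs}^k(y)u_k\right)$$

$$=\frac1{r_i^2}\Bigg[\left(nF+\frac In+G+2H\right)\sum_{r\in R_i}u_r+r_i\left(G+\frac In\right)\sum_{r\notin R_i}u_r+(r_i-1)\left(G+\frac In\right)\sum_{r\in R_i}u_r$$

$$\qquad+r_i(r_i-1)\frac In\sum_{r\notin R_i}u_r+(r_i-1)(r_i-2)\frac In\sum_{r\in R_i}u_r+2(r_i-1)\left(H+\frac In\right)\sum_{r\in R_i}u_r\Bigg].$$

Again $\overset{n}{\tilde\nabla}=\overset{n}{\nabla}$. Invariance condition (3.11) gives:

$$\gamma_{ii}^i(x)=\frac1{r_i}\left(nF+r_iG+2r_iH+\frac{r_i^2}nI\right)=\frac F{x_i}+G+2H+Ix_i$$

and

$$\gamma_{ii}^k(x)=\frac{r_k}{r_i}G+\frac{r_k}nI=G\,\frac{x_k}{x_i}+Ix_k,\qquad k\ne i.$$

Proceeding in the same way for $i\ne j$, for $x=\left(\frac{r_1}n,\dots,\frac{r_m}n\right)$ we have:

$$\gamma_{ij}^k(x)=\begin{cases}\dfrac F{x_i}+G+2H+Ix_i & i=j=k\\[1ex] G\,\dfrac{x_k}{x_i}+Ix_k & i=j\ne k\\[1ex] H+Ix_k & i\ne j,\ k\in\{i,j\}\\ Ix_k & i\ne j,\ k\notin\{i,j\}.\end{cases}$$

Finally, any $x\in S_{m-1}$ can be approximated arbitrarily well by an $x$ of the form $\left(\frac{r_1}n,\dots,\frac{r_m}n\right)$. Since the $\gamma_{ij}^k$'s are $C^\infty$ functions, then

$$\gamma_{ij}^k(x)=Ix_k+H(\delta_{ik}+\delta_{jk})+\delta_{ij}\left(\frac F{x_i}\,\delta_{ik}+G\,\frac{x_k}{x_i}\right),\qquad\forall x\in S_{m-1}. \tag{3.16}$$

We can now impose condition (3.8) on the coefficients $\gamma_{ij}^k$. If $i\ne j$, we obtain $H=-\frac I2$; for $i=j$, we have $F=-G$. Substituting these conditions in (3.16) gives the result. ■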

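Remark. (α-connections) By (3.9), we can obtain the Christoffel symbols of any embedding invariant connection on the simplex $S_{m-1}$, with respect to the base $v_1,\dots,v_{m-1}$:

$$\Gamma_{ij}^k(x)=I\left(x_k-x_m-\frac{\delta_{ik}+\delta_{jk}}2\right)+G\,\frac{\delta_{ij}}{x_i}\left(x_k-x_m-\delta_{ik}\right). \tag{3.17}$$

Let us now show that the affine connections characterized by (3.17) are, in fact, Amari's α-connections.

Let $U_i=v_i-v_m=X_i-X_m$, $i=1,\dots,m-1$. Then $v_m=-\frac1m\sum_{i=1}^{m-1}U_i$. Using the repeated index convention and avoiding the superindex $m$,

$$\nabla_{U_i}U_j=\nabla_{v_i-v_m}(v_j-v_m)=\nabla_{v_i}v_j-\nabla_{v_i}v_m-\nabla_{v_m}v_j+\nabla_{v_m}v_m=\left(\gamma_{ij}^k-\gamma_{im}^k-\gamma_{jm}^k+\gamma_{mm}^k\right)v_k$$

$$=\sum_{k=1}^{m-1}\left(\gamma_{ij}^k-\gamma_{im}^k-\gamma_{jm}^k+\gamma_{mm}^k\right)U_k-\frac1m\left[\sum_{k=1}^m\left(\gamma_{ij}^k-\gamma_{im}^k-\gamma_{jm}^k+\gamma_{mm}^k\right)\right]\sum_{l=1}^{m-1}U_l,$$

for $i,j=1,2,\dots,m-1$. Now, by (3.8) and since

$$\sum_{k=1}^m\gamma_{im}^k=\sum_{k=1}^m\left[I\left(x_k-\frac{\delta_{ik}+\delta_{mk}}2\right)+G\,\frac{\delta_{im}}{x_i}\left(x_k-\delta_{ik}\right)\right]=0$$

for $i=1,\dots,m$, we have

$$\Gamma_{ij}^k=\gamma_{ij}^k-\gamma_{im}^k-\gamma_{jm}^k+\gamma_{mm}^k$$

for $i,j,k=1,2,\dots,m-1$. That is,

$$\Gamma_{ij}^k=I\left(x_k-\frac{\delta_{ik}+\delta_{jk}}2-x_k+\frac{\delta_{ik}+\delta_{mk}}2-x_k+\frac{\delta_{jk}+\delta_{mk}}2+x_k-\delta_{mk}\right)$$

$$\qquad+G\left(\frac{\delta_{ij}}{x_i}(x_k-\delta_{ik})-\frac{\delta_{im}}{x_i}(x_k-\delta_{ik})-\frac{\delta_{jm}}{x_j}(x_k-\delta_{jk})+\frac{1}{x_m}(x_k-\delta_{mk})\right)$$

$$=Gx_k\left(\frac{\delta_{ij}}{x_i}+\frac1{x_m}\right)-G\,\frac{\delta_{ijk}}{x_i}.$$

Finally, lowering the index with the metric (3.6), $g_{rk}=\frac{\delta_{rk}}{x_r}+\frac1{x_m}$ (taking $A=1$),

$$\Gamma_{ijk}=\Gamma_{ij}^rg_{rk}=G\sum_{r=1}^{m-1}\left[x_r\left(\frac{\delta_{ij}}{x_i}+\frac1{x_m}\right)-\frac{\delta_{ijr}}{x_i}\right]\left(\frac{\delta_{rk}}{x_r}+\frac1{x_m}\right)=G\left(\frac1{x_m^2}-\frac{\delta_{ijk}}{x_i^2}\right), \tag{3.18}$$

that coincides, up to a constant factor, with the coefficient of the α-connections in the multinomial case, see Amari (1985, p. 43).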
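The contraction leading to (3.18) can be verified numerically. The sketch below assumes $G=1$, $A=1$ and an arbitrary point of the simplex with $m=4$; it contracts $\Gamma_{ij}^r=G\left[x_r\left(\delta_{ij}/x_i+1/x_m\right)-\delta_{ijr}/x_i\right]$ with the metric $\delta_{rk}/x_r+1/x_m$ of (3.6) and compares the result with the closed form $G\left(1/x_m^2-\delta_{ijk}/x_i^2\right)$. All function names are illustrative.

```python
def delta(*idx):
    """Generalised Kronecker delta: 1 if all indices coincide, 0 otherwise."""
    return 1.0 if all(i == idx[0] for i in idx) else 0.0

x = [0.1, 0.2, 0.3, 0.4]   # a point of S_{m-1}, m = 4
m = len(x)
xm = x[-1]
G = 1.0                    # the overall constant is arbitrary

def gamma_up(i, j, r):
    """Gamma_ij^r in the base U_1, ..., U_{m-1}."""
    return G * (x[r] * (delta(i, j) / x[i] + 1.0 / xm) - delta(i, j, r) / x[i])

def metric(r, k):
    """Metric (3.6) with A = 1."""
    return delta(r, k) / x[r] + 1.0 / xm

def gamma_down(i, j, k):
    """Gamma_ijk = sum_r Gamma_ij^r g_rk, r = 1, ..., m-1."""
    return sum(gamma_up(i, j, r) * metric(r, k) for r in range(m - 1))

def closed_form(i, j, k):
    """Right-hand side of (3.18)."""
    return G * (1.0 / xm ** 2 - delta(i, j, k) / x[i] ** 2)

idx = range(m - 1)
max_gap = max(abs(gamma_down(i, j, k) - closed_form(i, j, k))
              for i in idx for j in idx for k in idx)
print(max_gap)
```

The printed gap is at the level of rounding error, in agreement with the algebra above.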

3.2 An expansion for D

To study the local behaviour of a divergence $D$, suppose it is weakly smooth, that is, at any point of $S_{m-1}$ it admits an expression in local coordinates that is differentiable up to the necessary order.

Theorem 3.3 At each point $P$ of $S_{m-1}$, any monotone divergence $D(P,Q)$ admits the expansion

$$D(P,Q)=D_0+D_1\sum_{i=1}^m\frac{[Q(A_i)-P(A_i)]^2}{P(A_i)}+D_2\sum_{i=1}^m\frac{[Q(A_i)-P(A_i)]^3}{P(A_i)^2}+D_3\sum_{i=1}^m\frac{[Q(A_i)-P(A_i)]^4}{P(A_i)^3}$$

$$\qquad+D_4\left(\sum_{i=1}^m\frac{[Q(A_i)-P(A_i)]^2}{P(A_i)}\right)^2+o\left(\|Q-P\|^4\right),$$

where $D_0$, $D_1$, $D_2$, $D_3$, $D_4$ are constants, $D_1>0$.
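As a sanity check of the form of the expansion, one can take a specific f-divergence. The sketch below assumes $D(P,Q)=\sum_i Q(A_i)\log\frac{Q(A_i)}{P(A_i)}$, that is $\sum_i P(A_i)f\left(Q(A_i)/P(A_i)\right)$ with $f(u)=u\log u$, for which a Taylor expansion of $f$ at $u=1$ gives $D_0=0$, $D_1=1/2$, $D_2=-1/6$, $D_3=1/12$, $D_4=0$ (these values are computed here for this example and are not stated in the paper). The remainder should then decrease faster than the fourth power of the perturbation.

```python
import math

def divergence(p, q):
    """sum_i q_i log(q_i / p_i) = sum_i p_i f(q_i / p_i) with f(u) = u log u."""
    return sum(qi * math.log(qi / pi) for pi, qi in zip(p, q))

def expansion(p, q, coeffs):
    """Fourth-order expansion of Theorem 3.3 with constants (D0, D1, D2, D3, D4)."""
    d0, d1, d2, d3, d4 = coeffs
    s2 = sum((qi - pi) ** 2 / pi for pi, qi in zip(p, q))
    s3 = sum((qi - pi) ** 3 / pi ** 2 for pi, qi in zip(p, q))
    s4 = sum((qi - pi) ** 4 / pi ** 3 for pi, qi in zip(p, q))
    return d0 + d1 * s2 + d2 * s3 + d3 * s4 + d4 * s2 ** 2

p = [0.2, 0.3, 0.5]
direction = [1.0, -2.0, 1.0]            # sums to zero, so q stays on the simplex
coeffs = (0.0, 0.5, -1.0 / 6.0, 1.0 / 12.0, 0.0)

def remainder(eps):
    q = [pi + eps * di for pi, di in zip(p, direction)]
    return abs(divergence(p, q) - expansion(p, q, coeffs))

err_coarse = remainder(0.01)
err_fine = remainder(0.005)
print(err_coarse, err_fine, err_fine / err_coarse)
```

Halving the perturbation reduces the remainder by a factor close to $2^5$, consistent with a remainder of order $o(\|Q-P\|^4)$.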

Proof: Consider in $S_{m-1}$ the system of local coordinates that to each $P\in S_{m-1}$ associates a vector $p=(p_1,\dots,p_{m-1})$, where $p_i=P(A_i)$, $i=1,\dots,m-1$. Then $D(P,Q)=d(p,q)$ is the expression of $D$ in local coordinates. In a neighborhood of $p$, we have:

$$d(p,q)=d(p,p)+d_{;i}(p,p)(q_i-p_i)+\frac12d_{;ij}(p,p)(q_i-p_i)(q_j-p_j)+\frac16d_{;ijk}(p,p)(q_i-p_i)(q_j-p_j)(q_k-p_k)$$

$$\qquad+\frac1{24}d_{;ijkh}(p,p)(q_i-p_i)(q_j-p_j)(q_k-p_k)(q_h-p_h)+o\left(\|q-p\|^4\right),$$

where $d_{\dots;\dots}$ are the partial derivatives of $d$ with respect to the two arguments. For any monotone divergence we have that

$$d(p,p)=d_0,\qquad d_{;i}(p,p)=0$$

and the matrix $d_{;ij}(p,p)$ is non-negative definite, since $d$ takes its minimum value on the diagonal. Moreover, by Proposition 2.1, monotonicity of $D$ implies its invariance; then, if it does not vanish, which is the only interesting case, $d_{;ij}(p,p)$ defines an inner product on $\mathbb R^{m-1}$ which is embedding invariant. We take $d_{;ij}(p,p)$ as the metric tensor and denote by $d^{;ij}$ the inverse matrix of $d_{;ij}$. By (3.6), we obtain

$$d_{;ij}(p,p)=A\left(\frac{\delta_{ij}}{p_i}+\frac1{p_m}\right),\qquad i,j=1,\dots,m-1,\quad A>0. \tag{3.19}$$

Thus,

$$\sum_{i,j=1}^{m-1}d_{;ij}(p,p)(q_i-p_i)(q_j-p_j)=A\sum_{i,j=1}^{m-1}\left(\frac{\delta_{ij}}{p_i}+\frac1{p_m}\right)(q_i-p_i)(q_j-p_j) \tag{3.20}$$

$$=A\left(\sum_{i=1}^{m-1}\frac{(q_i-p_i)^2}{p_i}+\sum_{i,j=1}^{m-1}\frac{(q_i-p_i)(q_j-p_j)}{p_m}\right)=A\left(\sum_{i=1}^{m-1}\frac{(q_i-p_i)^2}{p_i}+\frac{(q_m-p_m)^2}{p_m}\right)=A\sum_{i=1}^m\frac{(q_i-p_i)^2}{p_i},$$

since $q_m-p_m=-\sum_{i=1}^{m-1}(q_i-p_i)$.

As regards the third order term, by differentiating (3.19), we obtain that

$$d_{;ijk}(p,p)+d_{k;ij}(p,p)=A\left(\frac1{p_m^2}-\frac{\delta_{ijk}}{p_i^2}\right). \tag{3.21}$$

On the other hand, $d_{k;ij}(p,p)$ behaves as the Christoffel symbols of an affine connection, see Amari (1985, p. 98). Since $D$ is an invariant divergence, the affine connection is embedding invariant and by (3.18) we can write

$$d_{k;ij}(p,p)=H\left(\frac1{p_m^2}-\frac{\delta_{ijk}}{p_i^2}\right),\qquad i,j,k=1,\dots,m-1.$$

By (3.21),

$$d_{;ijk}(p,p)=B\left(\frac1{p_m^2}-\frac{\delta_{ijk}}{p_i^2}\right), \tag{3.22}$$

with $B=A-H$. Thus,

$$\sum_{i,j,k=1}^{m-1}d_{;ijk}(p,p)(q_i-p_i)(q_j-p_j)(q_k-p_k)=B\sum_{i,j,k=1}^{m-1}\left(\frac1{p_m^2}-\frac{\delta_{ijk}}{p_i^2}\right)(q_i-p_i)(q_j-p_j)(q_k-p_k) \tag{3.23}$$

$$=-B\sum_{i=1}^m\frac{(q_i-p_i)^3}{p_i^2}.$$

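Let us now study the fourth order term. By differentiating (3.22), we obtain

$$d_{;ijkh}(p,p)+d_{h;ijk}(p,p)=2B\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right). \tag{3.24}$$

Since $d^{t}{}_{;ijk}(p,p)=d^{;ht}d_{h;ijk}(p,p)$ behaves as the components of a connection string (see Blaesild (1988)), we can write

$$d^{t}{}_{;ijk}\,e_t=\nabla_i\nabla_je_k,$$

for some covariant derivative $\nabla$ and any vector $e$. Moreover, by the invariance of $D$, $\nabla$ must be embedding invariant, that is, the corresponding Christoffel symbols are of the form

$$\Gamma_{ijk}=G\left(\frac1{p_m^2}-\frac{\delta_{ijk}}{p_i^2}\right).$$

In order to calculate $d_{h;ijk}(p,p)$ we will need the following expressions:

$$d^{;rs}=A^{-1}\left(\delta_{rs}p_r-p_rp_s\right),$$

$$\sum_{r,s=1}^{m-1}d^{;rs}=A^{-1}p_m(1-p_m),$$

$$\sum_{r,s=1}^{m-1}d^{;rs}\delta_{jkr}=A^{-1}\delta_{jk}p_jp_m,$$

$$\sum_{r,s=1}^{m-1}d^{;rs}\delta_{jkr}\delta_{ist}=A^{-1}\left(\delta_{ijkt}p_i-\delta_{jk}\delta_{it}p_ip_j\right).$$

We thus have

$$d^{t}{}_{;ijk}\,e_t=\nabla_i\nabla_je_k=\nabla_i\left(\Gamma_{jk}^se_s\right)=\partial_i\left(\Gamma_{jk}^s\right)e_s+\Gamma_{jk}^s\nabla_ie_s=\left(\partial_i\Gamma_{jk}^t+\Gamma_{jk}^s\Gamma_{is}^t\right)e_t.$$

Now,

$$\partial_i\Gamma_{jk}^t+\Gamma_{jk}^s\Gamma_{is}^t=-d^{;rs}d^{;ht}\,\partial_id_{;sh}\,\Gamma_{jkr}+d^{;rs}d^{;ht}\,\Gamma_{jkr}\Gamma_{ish}+d^{;ht}\,\partial_i\Gamma_{jkh}$$

$$=d^{;ht}\left[d^{;rs}\,\Gamma_{jkr}\left(-\partial_id_{;sh}+\Gamma_{ish}\right)+\partial_i\Gamma_{jkh}\right]$$

$$=d^{;ht}\left[G(G-A)\sum_{r,s=1}^{m-1}d^{;rs}\left(\frac1{p_m^2}-\frac{\delta_{jkr}}{p_j^2}\right)\left(\frac1{p_m^2}-\frac{\delta_{ish}}{p_i^2}\right)+2G\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right)\right]$$

$$=d^{;ht}\left[c\left(\frac{\delta_{jk}}{p_j}+\frac1{p_m}\right)\left(\frac{\delta_{ih}}{p_i}+\frac1{p_m}\right)+f\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right)\right],$$

where $c$ and $f$ are constants depending only on $A$ and $G$. Then

$$d_{h;ijk}(p,p)=c\left(\frac{\delta_{jk}}{p_j}+\frac1{p_m}\right)\left(\frac{\delta_{ih}}{p_i}+\frac1{p_m}\right)+f\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right).$$

Finally, we substitute the preceding expression in (3.24), obtaining

$$d_{;ijkh}(p,p)=C\left(\frac{\delta_{jk}}{p_j}+\frac1{p_m}\right)\left(\frac{\delta_{ih}}{p_i}+\frac1{p_m}\right)+D\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right). \tag{3.25}$$

Thus,

$$\sum_{i,j,k,h=1}^{m-1}d_{;ijkh}(p,p)(q_i-p_i)(q_j-p_j)(q_k-p_k)(q_h-p_h) \tag{3.26}$$

$$=C\sum_{j,k=1}^{m-1}\left(\frac{\delta_{jk}}{p_j}+\frac1{p_m}\right)(q_j-p_j)(q_k-p_k)\sum_{i,h=1}^{m-1}\left(\frac{\delta_{ih}}{p_i}+\frac1{p_m}\right)(q_i-p_i)(q_h-p_h)$$

$$\qquad+D\sum_{i,j,k,h=1}^{m-1}\left(\frac{\delta_{ijkh}}{p_i^3}+\frac1{p_m^3}\right)(q_i-p_i)(q_j-p_j)(q_k-p_k)(q_h-p_h)$$

$$=C\left(\sum_{i=1}^m\frac{(q_i-p_i)^2}{p_i}\right)^2+D\sum_{i=1}^m\frac{(q_i-p_i)^4}{p_i^3}.$$

By (3.20), (3.23) and (3.26), we can finally write

$$d(p,q)=d_0+d_1\sum_{i=1}^m\frac{(q_i-p_i)^2}{p_i}+d_2\sum_{i=1}^m\frac{(q_i-p_i)^3}{p_i^2}+d_3\sum_{i=1}^m\frac{(q_i-p_i)^4}{p_i^3}+d_4\left(\sum_{i=1}^m\frac{(q_i-p_i)^2}{p_i}\right)^2+o\left(\|q-p\|^4\right),$$

$d_1>0$, that, written in terms of $D$, gives the result. ■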

4 The general case

In the preceding section, we obtained a local expression for monotone divergences defined on multinomial distributions. Using the regularity condition, we are able to extend this expansion to the general case. We need the following result:

Theorem 4.1 Let $f:\mathbb R\to\mathbb R^+$ be a convex function. For any $P$ and $Q$, equivalent probability measures on $(X,a)$, there exists a non negative integral

$$\int_Xf\left(\frac{dQ}{dP}(x)\right)P(dx)=\lim_\alpha\sum_if\left(\frac{Q(A_i)}{P(A_i)}\right)P(A_i),$$

where the $A_i$ are the atoms of $a_\alpha$ and the limit is taken over the filter of all finite sub-σ-fields $a_\alpha$ of $a$.

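Proof: Since

$$\sum_if\left(\frac{Q(A_i)}{P(A_i)}\right)P(A_i)=\int_Xf\left(\frac{dQ_\alpha}{dP_\alpha}(x)\right)P_\alpha(dx),$$

the thesis can be written in the form

$$\lim_\alpha\int_Xf\left(\frac{dQ_\alpha}{dP_\alpha}(x)\right)P_\alpha(dx)=\int_Xf\left(\frac{dQ}{dP}(x)\right)P(dx).$$

It is sufficient to prove it for any increasing sequence $\{a_n\}$ of finite sub-σ-fields of $a$, such that $\sigma(\bigcup_na_n)=a$. Since f-divergences are monotone, see Heyer (1982, Theorem 22.9, p. 169), and by the remark following Definition 2.5, we obtain:

$$\lim_n\int_Xf\left(\frac{dQ_n}{dP_n}(x)\right)P_n(dx)\le\int_Xf\left(\frac{dQ}{dP}(x)\right)P(dx).$$

We show now that the reverse inequality also holds. Since $\frac{dQ_n}{dP_n}$ is the conditional expectation of $\frac{dQ}{dP}$ given $a_n$, we can apply a well known theorem of convergence of martingales, thus obtaining

$$\frac{dQ_n}{dP_n}=E\left(\frac{dQ}{dP}\,\Big|\,a_n\right)\longrightarrow\frac{dQ}{dP},$$

$P$-almost surely. Since $f$ is a continuous function, the convergence still holds:

$$f\left(\frac{dQ_n}{dP_n}\right)\longrightarrow f\left(\frac{dQ}{dP}\right).$$

By the Fatou lemma,

$$\int_Xf\left(\frac{dQ}{dP}(x)\right)P(dx)\le\lim_n\int_Xf\left(\frac{dQ_n}{dP_n}(x)\right)P(dx)=\lim_n\int_Xf\left(\frac{dQ_n}{dP_n}(x)\right)P_n(dx),$$

and the thesis is proved. ■

Remark. The preceding theorem can be easily extended to the case of $f$ being any linear combination of non negative convex functions.

We can now prove the main result of the present section: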

Theorem 4.2 If

$$\int_X\left(\frac{Q(dx)-P(dx)}{P(dx)}\right)^4P(dx)<\infty,$$

then, at each point $P$, any monotone and regular divergence $D(P,Q)$ admits the expansion

$$D(P,Q)=D_0+D_1\int_X\frac{[Q(dx)-P(dx)]^2}{P(dx)}+D_2\int_X\frac{[Q(dx)-P(dx)]^3}{P(dx)^2}+D_3\int_X\frac{[Q(dx)-P(dx)]^4}{P(dx)^3} \tag{4.1}$$

$$\qquad+D_4\left(\int_X\frac{[Q(dx)-P(dx)]^2}{P(dx)}\right)^2+o\left(\|Q-P\|^4\right),$$

where $D_0$, $D_1$, $D_2$, $D_3$, $D_4$ are constants, $D_1>0$.

Proof: By Theorem 3.3, it holds:

$$D(P_\alpha,Q_\alpha)=D_0+D_1\sum_i\frac{[Q(A_i)-P(A_i)]^2}{P(A_i)}+D_2\sum_i\frac{[Q(A_i)-P(A_i)]^3}{P(A_i)^2}+D_3\sum_i\frac{[Q(A_i)-P(A_i)]^4}{P(A_i)^3}$$

$$\qquad+D_4\left(\sum_i\frac{[Q(A_i)-P(A_i)]^2}{P(A_i)}\right)^2+o\left(\|Q_\alpha-P_\alpha\|^4\right),$$

for every $P_\alpha$ and $Q_\alpha$, restrictions of $P$ and $Q$ to the finite sub-σ-field $a_\alpha$ of $a$. We can now pass to the limit. The terms with coefficients $D_1$ and $D_4$ can be obtained by applying Theorem 4.1 with $f(x)=(x-1)^2$. The same holds for the term with coefficient $D_3$, with $f(x)=(x-1)^4$. For the third order term we can use the remark following Theorem 4.1, since $f(x)=(x-1)^3$ can be written as the difference of two non negative convex functions:

$$f(x)=f_1(x)-f_2(x),$$

where

$$f_1(x)=\begin{cases}0 & x<1\\ (x-1)^3 & x\ge1\end{cases}\qquad\text{and}\qquad f_2(x)=\begin{cases}-(x-1)^3 & x<1\\ 0 & x\ge1.\end{cases}$$

The regularity of $D$ guarantees the result. ■
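The decomposition of $(x-1)^3$ used in the proof can be checked on a grid. The sketch below assumes the grid $x\in\{0,0.1,\dots,3\}$ (likelihood ratios are non negative) and tests non negativity, the identity $f=f_1-f_2$ and midpoint convexity of both pieces; it is a numerical illustration, not a proof.

```python
def f1(x):
    """(x - 1)^3 on x >= 1 and 0 elsewhere: non-negative and convex."""
    return (x - 1.0) ** 3 if x >= 1.0 else 0.0

def f2(x):
    """-(x - 1)^3 on x < 1 and 0 elsewhere: non-negative and convex."""
    return -(x - 1.0) ** 3 if x < 1.0 else 0.0

grid = [i / 10.0 for i in range(0, 31)]

identity_ok = all(abs(f1(x) - f2(x) - (x - 1.0) ** 3) < 1e-12 for x in grid)
nonneg_ok = all(f1(x) >= 0.0 and f2(x) >= 0.0 for x in grid)

def midpoint_convex(g):
    """g((a + b) / 2) <= (g(a) + g(b)) / 2 for all pairs of grid points."""
    return all(g((a + b) / 2.0) <= (g(a) + g(b)) / 2.0 + 1e-12
               for a in grid for b in grid)

convex_ok = midpoint_convex(f1) and midpoint_convex(f2)
print(identity_ok, nonneg_ok, convex_ok)
```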

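4.1 The parametric case

Suppose now that $P$ and $Q$ belong to some regular parametric model, that is, $P$ and $Q$ are equivalent probability measures with densities $p(x;\theta)$ and $p(x;\theta')$, $\theta,\theta'\in\Theta\subset\mathbb R^k$, with respect to some common dominating measure $\mu$. By Theorem 4.2, we have that any monotone and regular divergence between $P$ and $Q$ can be expanded as:

$$D(P,Q)=D(\theta,\theta')=D_0+D_1\int\left(\frac{p(x;\theta')-p(x;\theta)}{p(x;\theta)}\right)^2p(x;\theta)\,\mu(dx)+D_2\int\left(\frac{p(x;\theta')-p(x;\theta)}{p(x;\theta)}\right)^3p(x;\theta)\,\mu(dx)+o\left(|\theta'-\theta|^3\right). \tag{4.2}$$

Moreover,

$$p(x;\theta')=p(x;\theta)+\partial_ip(x;\theta)(\theta_i'-\theta_i)+\frac12\partial_{ij}p(x;\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)+o\left(|\theta'-\theta|^2\right),$$

so that, writing $l(x;\theta)=\log p(x;\theta)$,

$$\left(\frac{p(x;\theta')-p(x;\theta)}{p(x;\theta)}\right)^2=\left(\frac{\partial_ip(x;\theta)}{p(x;\theta)}(\theta_i'-\theta_i)+\frac12\frac{\partial_{ij}p(x;\theta)}{p(x;\theta)}(\theta_i'-\theta_i)(\theta_j'-\theta_j)\right)^2+o\left(|\theta'-\theta|^3\right)$$

$$=\partial_il(x;\theta)\,\partial_jl(x;\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)+\partial_kl(x;\theta)\left[\partial_{ij}l(x;\theta)+\partial_il(x;\theta)\,\partial_jl(x;\theta)\right](\theta_i'-\theta_i)(\theta_j'-\theta_j)(\theta_k'-\theta_k)+o\left(|\theta'-\theta|^3\right),$$

and

$$\left(\frac{p(x;\theta')-p(x;\theta)}{p(x;\theta)}\right)^3=\left(\frac{\partial_ip(x;\theta)}{p(x;\theta)}(\theta_i'-\theta_i)\right)^3+o\left(|\theta'-\theta|^3\right)$$

$$=\partial_il(x;\theta)\,\partial_jl(x;\theta)\,\partial_kl(x;\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)(\theta_k'-\theta_k)+o\left(|\theta'-\theta|^3\right).$$

By substituting in (4.2), we obtain

$$D(\theta,\theta')=D_0+D_1g_{ij}(\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)+D_1\Gamma^{(-1)}_{ijk}(\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)(\theta_k'-\theta_k)$$

$$\qquad+D_2T_{ijk}(\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)(\theta_k'-\theta_k)+o\left(|\theta'-\theta|^3\right)$$

$$=D_0+D_1\left[g_{ij}(\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)+\Gamma^{(\alpha)}_{ijk}(\theta)(\theta_i'-\theta_i)(\theta_j'-\theta_j)(\theta_k'-\theta_k)\right]+o\left(|\theta'-\theta|^3\right),$$

where

$$\alpha=-\frac{D_1+2D_2}{D_1},$$

$T_{ijk}(\theta)=E\left[\partial_il\,\partial_jl\,\partial_kl\right]$, and $g_{ij}(\theta)$ and $\Gamma^{(\alpha)}_{ijk}(\theta)$ are respectively the Fisher metric and Amari's α-connections of the parametric model, see Amari (1985, pp. 26 and …).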

References

Amari, S. (1985), Differential-Geometrical Methods in Statistics, Vol. 28 of Lecture Notes in Statistics, Springer-Verlag, New York.

Blaesild, P. (1988), Yokes and tensors derived from yokes, Research Reports 173, University of Aarhus.

Burbea, J. (1983), J-divergences and related concepts, Vol. 4 of Encyclopedia of Statistical Sciences, J. Wiley, pp. 290-296.

Campbell, L. (1986), 'An extended Cencov characterization of the information metric', Proc. Amer. Math. Soc. 98, 135-141.

Heyer, H. (1982), Theory of Statistical Experiments, Springer-Verlag, New York.

Cencov, N. (1972), Statistical Decision Rules and Optimal Inference, Nauka, Moscow (in Russian; English version, 1982, Math. Monographs, 53, Amer. …
