A DISTANCE BASED APPROACH TO DISCRIMINANT

ANALYSIS AND ITS PROPERTIES

by

C. M. Cuadras

AMS Subject Classification: 62H30

ABSTRACT

A general distance-based method for allocating an observation to one of several known populations, on the basis of both continuous and discrete explanatory variables, is proposed and studied. This method depends on a given statistical distance between observations, and leads to some classic discriminant rules by taking suitable distances. The error rates can be easily computed and, unlike other distance classification rules, the prior probabilities can be taken into account.

Key words and phrases: Linear and quadratic discriminant functions; Location model; Statistical distances; Given marginals.


1. INTRODUCTION

The problem of allocating an individual to one of two populations was proposed by Fisher (1936), who obtained the linear discriminant function (LDF). Optimality of the LDF under the assumption of multivariate normality, with a common covariance matrix, is a well-known property. The LDF performs well even when the above assumptions are violated, and several studies have been carried out on the LDF as well as alternative methods (Krzanowski, 1977; Seber, 1984; Raveh, 1989). The discrimination problem leads to a quadratic discriminant function (QDF) when the covariance matrices are not equal.

LDF is essentially based on continuous variables (Wald, 1944; Anderson, 1958). However, most problems in applied statistics fall in the mixed case, i.e., the variables are both continuous and discrete. An interesting approach is based on the location model (LM). LM is applicable when the data contain both continuous and binary variables. This method computes an LDF for each pattern of the binary variables, is optimal under the assumption of conditional normality and can be extended to the multistate case for discrete variables (Lachenbruch and Goldstein, 1979). However, the LM approach needs a considerable computational effort and has not been implemented in the standard packages (Knoke, 1982). Moreover, LM cannot be used when there are not enough data for many locations (Vlachonikolis and Marriott, 1982).

Logistic regression (LR) is another interesting approach, which includes quite a wide class of models and also allows discrimination with mixed variables (Anderson, 1972). LR also requires a large amount of computing.

LDF, QDF, LM and LR are all based on the ratio of probability density functions, the maximum likelihood rule (ML). The ML rule is a particular case of the Bayes discriminant rule (BR), the rule obtained after supposing that the populations have prior probabilities. BR provides a general approach to discriminant analysis which has certain optimal properties from a theoretical point of view.

Another general approach, first studied by Matusita (1956), is based on the concept of distance. Let $d(\omega, \pi_i)$ be a distance between $\omega$ and $\pi_i$, $i = 1,2$, where $\omega$ is an individual to be allocated. The allocation rule is: allocate $\omega$ to the nearest population $\pi_i$.

LDF is closely related to the Mahalanobis distance and both approaches, ML and distance, are equivalent for multivariate normal data (Mardia et al., 1979). If we substitute the covariance matrix by the identity matrix, we obtain the Euclidean distance classifier (EDF), which is an alternative to LDF (Marco et al., 1987), and has advantages when the number of variables is large relative to the training sample size. EDF is a distance rule based on the Euclidean distance. The distance rule is also equivalent to BR (including the ML rule) for the LM model, as is proved by Krzanowski (1986, 1987).

A distance-based approach (DB) for regression and classification with mixed variables was proposed by Cuadras (1989) and Cuadras and Arenas (1990).

Although considerable research has been done on discrimination, and the state of the art of allocation has reached such a high level that it seems difficult to furnish original ideas, this paper aims to examine further aspects and applications of discrimination by the above-mentioned DB approach.

Given every pair of individuals $\omega, \omega'$, the DB method uses a distance $d(\omega, \omega')$ to compute discriminant functions. The assumption underlying this method is the following: in some circumstances (mixed data, for example), it is easier to obtain a distance $d(., .)$ than a probability density function $p(.)$.

2. A DISTANCE-BASED CLASSIFICATION RULE IN TWO

POPULATIONS OBTAINED FROM SAMPLE DATA

Let $\pi_1, \pi_2$ be two populations and suppose that $\omega$ is an individual to be allocated. Suppose that a distance function $\delta(., .)$ is defined on the basis of several (mixed) variables. The distance-based rule (1), or DB rule, is introduced and some of its properties are studied by Cuadras (1989).

2.1 Training sets

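Suppose that samples $C_1$ and $C_2$ of sizes $n_1$ and $n_2$ are available from $\pi_1$ and $\pi_2$, respectively. Suppose that an $n_1 \times n_1$ intradistance matrix $D_1 = (\delta_{ij}(1))$ can be computed on $C_1$ by using the distance $\delta(., .)$, as well as $D_2 = (\delta_{ij}(2))$ on $C_2$, together with the distances $\delta_i(1)$ and $\delta_i(2)$ from $\omega$ to the individuals of $C_1$ and $C_2$. The discriminant functions are

$$ f_k(\omega) = \frac{1}{n_k} \sum_{i=1}^{n_k} \delta_i^2(k) - \frac{1}{2 n_k^2} \sum_{i,j=1}^{n_k} \delta_{ij}^2(k), \qquad k = 1,2. \qquad (1) $$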

A decision rule for the allocation of $\omega$ is:

$$ \text{allocate } \omega \text{ to } \pi_i \text{ if } f_i(\omega) = \min\{f_1(\omega), f_2(\omega)\}. \qquad (2) $$

This rule leads to a minimum-distance classification rule, as a consequence of theorem 1.
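As a minimal sketch of discriminant function (1) and rule (2), the Python code below computes, for each population, the mean squared distance from the new individual to the sample minus half the mean squared intradistance, and allocates to the smallest score. The function names `db_discriminant` and `allocate`, the toy samples and the choice of the squared Euclidean distance are illustrative assumptions, not part of the original text.

```python
def db_discriminant(d2_new, d2_intra):
    """Mean squared distance to the sample minus half the mean squared intradistance."""
    n = len(d2_new)
    mean_to_sample = sum(d2_new) / n
    mean_intra = sum(sum(row) for row in d2_intra) / (n * n)
    return mean_to_sample - 0.5 * mean_intra


def allocate(d2_new_by_pop, d2_intra_by_pop):
    """Return the 1-based label of the population with the smallest score, plus all scores."""
    scores = [db_discriminant(d, D) for d, D in zip(d2_new_by_pop, d2_intra_by_pop)]
    return scores.index(min(scores)) + 1, scores


def sq_euclid(u, v):
    """Squared Euclidean distance (an assumed choice; any distance could be plugged in)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical toy samples from two populations and a new individual w.
C1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
C2 = [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0)]
w = (0.5, 0.5)

intra = [[[sq_euclid(a, b) for b in C] for a in C] for C in (C1, C2)]
to_new = [[sq_euclid(w, a) for a in C] for C in (C1, C2)]
label, scores = allocate(to_new, intra)
print(label, scores)
```

With a Euclidean distance each score coincides with the squared distance from `w` to the corresponding sample mean, which is what theorem 1 states for this special case.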

Let us denote $C = \{1, 2, \ldots, n\}$. $D = (\delta_{ij})$ is an $n \times n$ distance matrix, $(n+1)$ is a new individual, $\delta_1, \ldots, \delta_n$ are the distances from $(n+1)$ to each element of $C$ and $f(n+1)$ is defined according to (1). Then $(n+1)$ and $C$ can be represented by the points $P, P_1, \ldots, P_n \in R^r \times gR^s$, where $g = \sqrt{-1}$ and $r + s = n - 1$. The coordinates are given by

$$ P = (x', g y', z), \qquad P_i = (x_i', g y_i', 0), \quad i = 1, \ldots, n, $$

and it is satisfied that

$$ \delta_{ij}^2 = d^2(P_i, P_j) = \|x_i - x_j\|^2 - \|y_i - y_j\|^2, $$

where $\|\cdot\|$ means the Euclidean norm. An explicit construction of $P, P_1, \ldots, P_n$ is given in the following:

Theorem 1

Set $\bar{P} = (\bar{x}', g\bar{y}', 0)$, where $\bar{x} = (\sum x_i)/n$, $\bar{y} = (\sum y_i)/n$. Then

$$ f(n+1) = d^2(P, \bar{P}). $$

In particular, if $D$ is a Euclidean distance matrix and $n+1$, $C$ are represented by $x, x_1, \ldots, x_n \in R^p$, then

$$ f(\omega) = d^2(x, \bar{x}) = \|x - \bar{x}\|^2. $$

Proof. Let $B = (b_{ij}) = HAH$, where $H$ is the centring matrix and $A = (a_{ij}) = (-\frac{1}{2}\delta_{ij}^2)$. If $B \geq 0$, i.e. $D$ is a Euclidean distance matrix, let $\Lambda_x$ be the diagonal matrix of positive eigenvalues and $X$ the matrix of corresponding eigenvectors. The rows of $X$, $x_1', \ldots, x_n'$, verify $\delta_{ij}^2 = \|x_i - x_j\|^2$ and constitute the classical solution of MDS (Mardia et al., 1979).

In general ($D$ need not be Euclidean), we must also consider a diagonal matrix $\Lambda_y$ of negative eigenvalues of $B$, whose corresponding eigenvectors are the columns of the matrix $Y$, and $y_1', \ldots, y_n'$ are the rows of $Y$ (Lingoes, 1971). Each element $i$ (say) of $C$ can be represented by $P_i = (x_i', g y_i', 0)$ and $(n+1)$ by $P = (x', g y', z)$.

Note that, as $1$ is an eigenvector of $B$ of eigenvalue 0, then $X'1 = 0$ and $Y'1 = 0$, so $\bar{x} = (\sum x_i)/n = 0$, $\bar{y} = (\sum y_i)/n = 0$. Gower (1968) gives the coordinates $x$ and $z^2$ of $(n+1)$ when $B \geq 0$ [display formulas not recoverable from the source]. If $D$ is a non-Euclidean distance matrix, it is easily proved that

$$ \frac{1}{n} \sum_{i=1}^{n} \delta_i^2 = z^2 + (x'x - y'y) + \frac{1}{n}\, \mathrm{tr}\, B. $$

On the other hand

$$ \sum_{i,j=1}^{n} \delta_{ij}^2 = 2n\, \mathrm{tr}\, B, $$

hence

$$ f(n+1) = z^2 + (x'x - y'y). $$

Since $\bar{x} = 0$ and $\bar{y} = 0$, all coordinates of $\bar{P}$ are null. Hence the squared distance between $\bar{P}$ and $P = (x', g y', z)$ is $f(n+1)$. □

Now it is clear that decision rule (2) is equivalent to

$$ \text{allocate } \omega \text{ to } \pi_1 \text{ if } d^2(P, \bar{P}) < d^2(Q, \bar{Q}), $$

otherwise to $\pi_2$, where $P, \bar{P}, Q$ and $\bar{Q}$ are suitably constructed. Note that this construction is not needed to obtain $f_1(\omega)$ and $f_2(\omega)$.

Decision rule (2) leads to standard discriminant functions for continuous data and usual distances. Let $x_{k1}, \ldots, x_{kn_k}$, $k = 1,2$, be the training samples and $x$ the observation to be allocated. Then (2) is based on the LDF

$$ L(x) = \left[ x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2) \right]' S^{-1} (\bar{x}_1 - \bar{x}_2), $$

where $S$ is the pooled sample covariance matrix, provided that the square distance is the Mahalanobis distance

$$ (x_{ki} - x_{kj})' S^{-1} (x_{ki} - x_{kj}). \qquad (3) $$

If the covariance matrices need not be equal and we replace $S$ by $S_k$ in (3), then rule (2) is based on a QDF.

Finally, if we choose the Euclidean distance, this decision rule is based on the EDF

$$ E(x) = \left[ x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2) \right]' (\bar{x}_1 - \bar{x}_2). $$
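The link between the distance-based scores and the Euclidean discriminant function can be checked numerically. The sketch below assumes the squared Euclidean distance, writes the EDF as $E(x) = [x - \frac{1}{2}(\bar{x}_1 + \bar{x}_2)]'(\bar{x}_1 - \bar{x}_2)$, and verifies that the difference of the two DB scores equals $2E(x)$; the helper names and the toy data are hypothetical.

```python
def mean_vec(sample):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(sample)
    return [sum(col) / n for col in zip(*sample)]


def sq_euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def db_score(x, sample):
    """DB discriminant function of x with respect to one training sample."""
    n = len(sample)
    to_sample = sum(sq_euclid(x, s) for s in sample) / n
    intra = sum(sq_euclid(s, t) for s in sample for t in sample) / (n * n)
    return to_sample - 0.5 * intra


def edf(x, sample1, sample2):
    """Euclidean discriminant function E(x); positive values favour population 1."""
    m1, m2 = mean_vec(sample1), mean_vec(sample2)
    return sum((xi - 0.5 * (a + b)) * (a - b) for xi, a, b in zip(x, m1, m2))


# Hypothetical training samples and observation.
S1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.0)]
S2 = [(6.0, 5.0), (7.0, 7.0), (8.0, 6.0), (7.0, 6.0)]
x = (4.0, 3.5)

gap = db_score(x, S2) - db_score(x, S1)
print(gap, 2 * edf(x, S1, S2))
```

Since the two printed numbers agree, allocating to the smaller DB score and allocating by the sign of $E(x)$ give the same decision in this setting.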

2.2 Error rates

The leaving-one-out method to estimate the probability of misclassification (Lachenbruch, 1975) can be applied with simple additional computation. Let us denote $A = (a_{ij})$, $B = (b_{ij})$ and $C = (c_{ij})$, where

$$ a_{ij} = \delta_{ij}^2(1), \qquad b_{ij} = \delta_{ij}^2(2) \qquad \text{and} \qquad c_{ij} = \delta^2(i,j), $$

where $i \in C_1$ and $j \in C_2$. $A$, $B$ and $C$ are $n_1 \times n_1$, $n_2 \times n_2$ and $n_1 \times n_2$ matrices respectively. If

$$ a = \sum_{i<j} a_{ij}, \qquad a_i = \sum_{r=1}^{n_1} a_{ir}, \qquad c_{i.} = \sum_{r=1}^{n_2} c_{ir}, $$

and $b$, $b_j$ and $c_{.j}$ are similarly defined, to allocate $i \in C_1$ we use the discriminant functions

$$ f_1(i) = (n_1 - 1)^{-1} a_i - (n_1 - 1)^{-2} (a - a_i) $$

and

$$ f_2(i) = n_2^{-1} c_{i.} - n_2^{-2} b. $$

Hence

$$ m_1 = \mathrm{freq}_{i \in C_1} [f_1(i) - f_2(i) > 0] \qquad (4) $$

is the frequency of individuals misclassified in $C_1$. The frequency $m_2$ of misclassification in $C_2$ is analogously obtained as a function of $b$, $b_j$ and $c_{.j}$. If the $n = n_1 + n_2$ observations are random samples from $\pi_1 \cup \pi_2$, the error rate is estimated by

$$ \hat{e} = \frac{m_1 + m_2}{n_1 + n_2}. \qquad (5) $$

Thus $\hat{e}$ is directly computed from $A$, $B$ and $C$.
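The leaving-one-out computation can be sketched directly from the three squared-distance matrices. The code below is an illustrative implementation under the assumption that $A$ and $B$ hold squared intradistances (zero diagonal) and $C$ the squared distances between the two samples; the function name `loo_error` and the one-dimensional toy data are hypothetical.

```python
def loo_error(A, B, C):
    """Leave-one-out misclassification counts (m1, m2) and the estimated error rate."""
    n1, n2 = len(A), len(B)
    # a and b sum each pair once (i < j), as in the definitions above.
    a = sum(A[i][j] for i in range(n1) for j in range(i + 1, n1))
    b = sum(B[i][j] for i in range(n2) for j in range(i + 1, n2))
    m1 = 0
    for i in range(n1):
        a_i, c_i = sum(A[i]), sum(C[i])
        f1 = a_i / (n1 - 1) - (a - a_i) / (n1 - 1) ** 2  # i left out of C1
        f2 = c_i / n2 - b / n2 ** 2
        if f1 - f2 > 0:
            m1 += 1
    m2 = 0
    for j in range(n2):
        b_j = sum(B[j])
        c_j = sum(C[i][j] for i in range(n1))
        f2 = b_j / (n2 - 1) - (b - b_j) / (n2 - 1) ** 2  # j left out of C2
        f1 = c_j / n1 - a / n1 ** 2
        if f2 - f1 > 0:
            m2 += 1
    return m1, m2, (m1 + m2) / (n1 + n2)


# Hypothetical one-dimensional samples; 11.0 lies inside the range of the second sample.
x1 = [0.0, 1.0, 11.0]
x2 = [10.0, 12.0, 13.0]
A = [[(u - v) ** 2 for v in x1] for u in x1]
B = [[(u - v) ** 2 for v in x2] for u in x2]
C = [[(u - v) ** 2 for v in x2] for u in x1]
m1, m2, rate = loo_error(A, B, C)
print(m1, m2, rate)
```

Only row and total sums of the matrices are needed, so no distance has to be recomputed when an individual is left out.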

Now suppose that $C_1$ and $C_2$ are nonoverlapping classes, that is,

$$ d^2(x_i, \bar{x}_{-i}) < d^2(x_i^*, \bar{y}) \qquad \forall i \in C_1, \qquad (6) $$

$$ d^2(y_j, \bar{y}_{-j}) < d^2(y_j^*, \bar{x}) \qquad \forall j \in C_2, \qquad (7) $$

where $x_i, x_i^*$ are the Euclidean representations of $i$ as a member of $C_1, C_2$, respectively, and $\bar{x}_{-i}$ is the mean on $C_1$ after leaving out individual $i$. Similarly we have $y_j$, $y_j^*$ and $\bar{y}_{-j}$. It is easily proved that

$$ f_1(i) = d^2(x_i, \bar{x}_{-i}), \qquad f_2(i) = d^2(x_i^*, \bar{y}). $$

Thus, if both (6) and (7) apply, $m_1 = m_2 = 0$ and a perfect classification for nonoverlapping classes is obtained.

2.3 Distance between samples

Suppose that the observable variables are both quantitative and qualitative (nominal, ordinal, dichotomous) and that the distribution of the variables is unknown. Then we may use any distance function chosen among the wide repertory available. One candidate is the square distance $d_{ij}^2 = 1 - s_{ij}$, where $s_{ij}$ is the all-purpose measure of similarity proposed by Gower (1971) and well described by Seber (1984). This distance provides a Euclidean distance matrix and gives good results in regression with mixed data, see Cuadras and Arenas (1990). When some values are missing, $s_{ij}$ can also be obtained but it could yield a non-Euclidean distance. However, rule (2) also applies. Therefore, this DB method can be used for handling missing values in discriminant analysis.

For continuous variables, letting $(x_{i1}, \ldots, x_{ip})$, $(x_{j1}, \ldots, x_{jp})$ be two observations for individuals $i$ and $j$, we also use the square distance

$$ d_{ij}^2 = |x_{i1} - x_{j1}| + \cdots + |x_{ip} - x_{jp}| \qquad (8) $$

which will be compared with the Mahalanobis and Euclidean distances in the examples given in section 6.
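A squared distance of the form $1 - s_{ij}$ for mixed data can be sketched as follows. This is a simplified reading of Gower's general similarity coefficient, with equal weights: range-scaled agreement for quantitative variables, simple matching for nominal variables, joint absences ignored for dichotomous variables, and variables with a missing value skipped. The function name, the variable-kind labels and the example values are assumptions made for illustration.

```python
def gower_sq_distance(u, v, kinds, ranges):
    """Return 1 - s, where s averages per-variable similarities over comparable variables."""
    total = 0.0
    count = 0
    for a, b, kind, rng in zip(u, v, kinds, ranges):
        if a is None or b is None:
            continue  # missing value: this variable is not comparable
        if kind == "quant":
            s = 1.0 - abs(a - b) / rng  # rng is the range of the variable in the data
        elif kind == "binary":
            if a == 0 and b == 0:
                continue  # joint absence carries no information
            s = 1.0 if a == b else 0.0
        else:  # nominal
            s = 1.0 if a == b else 0.0
        total += s
        count += 1
    if count == 0:
        raise ValueError("no comparable variables")
    return 1.0 - total / count


# Hypothetical individuals: one quantitative, one dichotomous, one nominal variable.
kinds = ("quant", "binary", "nominal")
ranges = (4.0, None, None)
u = (1.0, 1, "a")
v = (3.0, 1, "b")
w = (None, 0, "a")  # first value missing

print(gower_sq_distance(u, v, kinds, ranges))
print(gower_sq_distance(u, w, kinds, ranges))
```

The second call still returns a value although one coordinate is missing, which is the situation in which the resulting distance matrix may fail to be Euclidean.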

3. CLASSIFICATION WHEN THE DISTRIBUTIONS ARE

KNOWN

Suppose that the observable variables are related to a random vector with a probability density $p_i(x)$ with respect to a suitable measure $\lambda$, if $x$ comes from $\pi_i$, $i = 1,2$.

Let $x_0$ be an observation to be allocated and $\delta(., .)$ a distance function. The discriminant function which generalizes (1) is given by

$$ f_i(x_0) = \int \delta^2(x_0, x)\, p_i(x)\, d\lambda(x) - \frac{1}{2} \int\!\!\int \delta^2(x, y)\, p_i(x) p_i(y)\, d\lambda(x)\, d\lambda(y) = H_{i0} - \frac{1}{2} H_i. \qquad (9) $$

$H_{i0}$ is the expectation of the random variable $\delta^2(x_0, x)$ on $\pi_i$. We assume $x, y$ to be independent to obtain the expectation $H_i$ of $\delta^2(x, y)$ on $\pi_i \times \pi_i$. The decision rule for allocating $x_0$ is

$$ \text{allocate } x_0 \text{ to } \pi_i \text{ if } f_i(x_0) = \min\{f_1(x_0), f_2(x_0)\}. \qquad (10) $$

Before presenting some properties of the DB rule (10), we introduce a distance between individuals based on the so-called Rao distance (Rao, 1945) and studied later by Burbea and Rao (1982), Oller and Cuadras (1985) and others.

3.1 A distance between individuals

Suppose that the random vector $X$ is related to a statistical model $S = \{p(x; \theta)\}$, where $\theta$ is an $n$-dimensional vector parameter and $p(x; \theta)$ is a probability density. Assume the usual regularity conditions and consider the efficient score

$$ Z_\theta = \frac{\partial}{\partial \theta} \log p(X; \theta). \qquad (11) $$

Then $E(Z_\theta) = 0$ and $G = E(Z_\theta Z_\theta')$ is the Fisher information matrix, where $Z_\theta$ is interpreted as a column vector.

Transform two observations $x_1, x_2$ to $z_1, z_2$ by using (11). The square distance between $x_1, x_2$ with respect to the statistical model $S = \{p(x; \theta)\}$ is given by

$$ \delta^2(x_1, x_2) = (z_1 - z_2)' G^{-1} (z_1 - z_2). \qquad (12) $$

This distance was introduced by Cuadras (1988, 1989) and especially Oller (1989), who provides further theoretical justification.

If $p(x; \theta)$ is the $N(\mu, \Sigma)$ distribution ($\Sigma$ being fixed), then $Z_\mu = \Sigma^{-1}(X - \mu)$, $G = \Sigma^{-1}$ and (12) reduces to the familiar Mahalanobis distance.

If $X = (x_1, \ldots, x_m)$ has the multinomial distribution $\prod_k p_k^{x_k}$, where $x_k \in \{0,1\}$, $k = 1, \ldots, m$, the observation $X_1$ falls in the cell $r$ and $X_2$ falls in the cell $s$, we obtain the square distance

$$ \delta^2(x_1, x_2) = (1 - \delta_{rs})(p_r^{-1} + p_s^{-1}) \qquad (13) $$

where $\delta_{rs}$ is the Kronecker delta.
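A short numerical check may help to see how the multinomial distance behaves inside discriminant function (9), i.e. the expected squared distance from $x_0$ to the population minus half the expected squared distance within the population. The sketch assumes the squared distance $(1 - \delta_{rs})(p_r^{-1} + p_s^{-1})$ between cells $r$ and $s$; the function names and the probability vector are hypothetical.

```python
def sq_distance(r, s, p):
    """Squared distance between cells r and s: zero if equal, else 1/p_r + 1/p_s."""
    return 0.0 if r == s else 1.0 / p[r] + 1.0 / p[s]


def discriminant(k, p):
    """Expected squared distance from cell k minus half the expected within-population one."""
    cells = range(len(p))
    h_k0 = sum(p[s] * sq_distance(k, s, p) for s in cells)
    h = sum(p[r] * p[s] * sq_distance(r, s, p) for r in cells for s in cells)
    return h_k0 - 0.5 * h


p = [0.5, 0.3, 0.2]  # hypothetical cell probabilities
values = [discriminant(k, p) for k in range(len(p))]
print(values)
print([(1 - pk) / pk for pk in p])
```

Both printed lists agree, so for an observation falling in cell $k$ the discriminant function reduces to $(1 - p_k)/p_k$, which is small when the cell is likely under the population.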

In general, distance (12) is related to the quadratic form $Z_\theta' G^{-1} Z_\theta$, where $Z_\theta$ is given by (11), which satisfies (Oller, 1989)

$$ E(Z_\theta' G^{-1} Z_\theta) = E(\mathrm{tr}\, G^{-1} Z_\theta Z_\theta') = \mathrm{tr}(G^{-1} G) = n. \qquad (14) $$

Suppose next that the random variables are related to a random vector $W = (X_1, X_2)$, where $X_i$ is a random vector with density $p_i(x_i, \theta_i)$ with respect to a measure $\lambda_i$, $i = 1,2$. Suppose that there is a density $p(w, \theta_1, \theta_2)$ with respect to a suitable measure $\mu$, with marginals $p_i(x_i, \theta_i)$, $i = 1,2$. Letting

$$ Z_{\theta_i} = \frac{\partial}{\partial \theta_i} \log p_i(x_i, \theta_i), $$

let us define $G_{ij} = E(Z_{\theta_i} Z_{\theta_j}')$ and the symmetric matrix

$$ G = \begin{pmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{pmatrix}, $$

the expectation taken with respect to $p(w, \theta_1, \theta_2)$. Since

$$ G_{ii} = \int\!\!\int Z_{\theta_i} Z_{\theta_i}'\, p(x_1, x_2, \theta_1, \theta_2)\, d\lambda_1(x_1)\, d\lambda_2(x_2) = \int Z_{\theta_i} Z_{\theta_i}'\, p_i(x_i, \theta_i)\, d\lambda_i(x_i), $$

it is clear that $G_{ii}$ is the Fisher information matrix for $p_i(x_i, \theta_i)$, $i = 1,2$. However, $G$ is not a Fisher information matrix for $p(w, \theta_1, \theta_2)$.

Assume that $G_{12}$ is a suitable fixed matrix and $w_1 = (x_{11}, x_{12})$, $w_2 = (x_{21}, x_{22})$ are two observations transformed to $z_1 = (z_{11}, z_{12})$, $z_2 = (z_{21}, z_{22})$ by means of (11), i.e.,

$$ z_{ij} = \frac{\partial}{\partial \theta_j} \log p_j(x_{ij}, \theta_j). $$

We define the square distance between $w_1$ and $w_2$

$$ \delta^2(w_1, w_2) = (z_1 - z_2)' G^{-1} (z_1 - z_2). \qquad (15) $$

This distance can be used when $w$ is a mixed random vector, for example, $x_1$ is of the continuous type and $x_2$ is of the discrete type.

When both $X_1$ and $X_2$ have absolutely continuous distributions with respect to the Lebesgue measure, the pdf $p(w, \theta_1, \theta_2)$ exists.

Theorem 2

Suppose that (11) gives a transformation from $x_i$ to $z_{\theta_i}$ which is one-to-one and satisfies the condition of the change of variables for multiple integrals. Then (under some restrictive conditions) there exists a pdf $p(w, \theta_1, \theta_2)$ with marginals $p_i(x_i, \theta_i)$, $i = 1,2$, such that $G_{12} = E(Z_{\theta_1} Z_{\theta_2}')$.

This result can be obtained as a consequence of the problem of constructing probability distributions with given multivariate marginals and a given intercorrelation matrix. However, this construction is quite complicated. A solution is given by Cuadras (1990).

Although the density $p(w, \theta_1, \theta_2)$ exists, the true function is unknown in practice. Nevertheless, distance (15) can be computed. Consequently, a DB classification rule is available for mixed variables if only the marginal distributions are known. This, again, is the main idea of this paper.

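3.2 Parameter estimation with given marginals

Parameters $\theta_1$ and $\theta_2$ are unknown in applications. Suppose that $w_t = (x_{t1}, x_{t2})$, $t = 1, \ldots, N$, is a random sample from a population with pdf $p(w, \theta_1, \theta_2)$. However, only the marginal pdfs $p_i(x_i, \theta_i)$, $i = 1,2$, are known.

Let $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2)$ be the ML estimate obtained by considering the product pdf $p_1(x_1, \theta_1) \cdot p_2(x_2, \theta_2)$, i.e.,

$$ \sum_{t=1}^{N} \sum_{i=1}^{2} U_i(x_{ti}, \hat{\theta}) = 0, $$

where

$$ U_i(x_i, \theta) = \frac{\partial}{\partial \theta} \log p_i(x_i, \theta_i). $$

The true value $\theta_0$ is characterized by (Kent, 1982; Royall, 1986)

$$ E\left[ \sum_{i=1}^{2} U_i(x_i, \theta_0) \right] = 0, $$

where the expectation is taken with respect to $p(w, \theta_1, \theta_2)$. Since

$$ E[U_1(x_1, \theta_1)] = \int U_1(x_1, \theta_1)\, p_1(x_1, \theta_1)\, dx_1 = 0 $$

and similarly for $U_2$, $\hat{\theta}$ is a consistent estimate of $\theta_0$. (See also Huster et al., 1989.) Finally, a consistent estimate of $G_{kj}$ is given by

$$ \hat{G}_{kj} = \frac{1}{N} \sum_{t=1}^{N} z_{tk} z_{tj}', \qquad z_{tk} = \frac{\partial}{\partial \theta_k} \log p_k(x_{tk}, \hat{\theta}_k). $$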

3.3 Some properties of the classification rule

The discriminant function (9) and the decision rule (10) satisfy certain properties.

1) $H_i$ is the diversity coefficient of $\pi_i$ and $H_{i0}$ is the average difference between $\pi_0$ and $\pi_i$, where $\pi_0 = \{x_0\}$. Since $H_0 = 0$ we find that

$$ f_i(x_0) = H_{i0} - \frac{1}{2}(H_i + H_0) $$

is the Jensen difference between $\pi_i$ and $\pi_0$.

2) Suppose that $x_1$ and $x_2$ are independent random vectors and the distances $\delta_1(., .)$, $\delta_2(., .)$ are defined by using $x_1$, $x_2$ respectively. According to Oller (1989), let us define the square distance between $(x_{11}, x_{12})$ and $(x_{21}, x_{22})$ by

$$ \delta^2(x_{11}, x_{12}; x_{21}, x_{22}) = \delta_1^2(x_{11}, x_{21}) + \delta_2^2(x_{12}, x_{22}), $$

and let $(x_{01}, x_{02})$ be an observation to be allocated. Then, using an obvious notation, it is verified that

$$ f_i(x_{01}, x_{02}) = f_{i1}(x_{01}) + f_{i2}(x_{02}). \qquad (16) $$

3) If $X$ is $N(\mu_i, \Sigma)$ and the Mahalanobis distance is adopted, then $f_i(x_0) = (x_0 - \mu_i)' \Sigma^{-1} (x_0 - \mu_i)$ and rule (10) leads to a minimum-distance rule. This result is even valid for nonnormal data, see theorem 3.

4) If $X$ has the multinomial distribution $\prod_k p_{ik}^{x_k}$ when $X$ comes from $\pi_i$, and if we use distance (13), then

$$ f_i(x_0) = (1 - p_{ik})/p_{ik} \qquad (17) $$

if $x_0$ falls in the cell $k$. Thus, the classification rule is: allocate $x_0$ to $\pi_i$ if $p_{ik} = \max\{p_{1k}, p_{2k}\}$, since $f_i(x_0)$ is smallest when $p_{ik}$ is largest.

Let us now suppose that given $x_1, x_2 \in \pi_i$, there exists $\varphi: \pi_i \to R^q$, $t_1 = \varphi(x_1)$, $t_2 = \varphi(x_2)$, such that

$$ \delta^2(x_1, x_2) = (t_1 - t_2)'(t_1 - t_2), \qquad t_1, t_2 \in R^q, $$

i.e., $\delta(., .)$ is a $q$-dimensional Euclidean distance. One way to define $t_i = \varphi(x_i)$ is

$$ t_i = G^{-1/2} Z_i, $$

where $Z_i$ is obtained from (11) and $G$ is the Fisher information matrix. Generally, if $x_1, \ldots, x_n$ are given and $\delta(., .)$ is a Euclidean distance, $t_1, \ldots, t_n$ can be obtained by metric scaling. Assume further that $\mu_i = E_{\pi_i}[\varphi(x)]$ exists and $E_{\pi_i}(\|t - \mu_i\|^2) < \infty$, where $E_{\pi_i}$ means expectation with respect to $p_i(x)$.

Theorem 3

Let $x_0$ be an observation to be allocated, $t_0 = \varphi(x_0)$ and $f_i(x_0)$ given by (9). Then

$$ f_i(x_0) = \|t_0 - \mu_i\|^2. \qquad (18) $$

Proof. Let $x_1, \ldots, x_N$ be iid random vectors from $\pi_i$ and let $t_r = \varphi(x_r)$, $r = 1, \ldots, N$. Then

$$ \delta^2(x_r, x_s) = (t_r - t_s)'(t_r - t_s) $$

and, after some algebra, we find

$$ \frac{1}{2N^2} \sum_{r,s} \delta^2(x_r, x_s) = \frac{1}{N} \sum_{r} \|t_r - \bar{t}\|^2, $$

where $\bar{t} = (\sum_r t_r)/N$. From the law of large numbers it follows that as $N \to \infty$ the above expression tends to

$$ \frac{1}{2} E_{\pi_i}\{\delta^2(x, y)\} = E_{\pi_i}\{\|t - \mu_i\|^2\}. $$

Similarly

$$ \frac{1}{N} \sum_{r} \delta^2(x_0, x_r) = \|t_0 - \bar{t}\|^2 + \frac{1}{N} \sum_{r} (t_r - \bar{t})'(t_r - \bar{t}) $$

and, as $N \to \infty$, this expression tends to

$$ E_{\pi_i}\{\delta^2(x_0, x)\} = \|t_0 - \mu_i\|^2 + E_{\pi_i}\{\|t - \mu_i\|^2\} $$

and (18) follows. □

For distance (15) an alternative proof of Theorem 3 is given. Using a suitable notation:

$$ \delta^2(x_0, x) = (z_0 - z)' G^{-1} (z_0 - z), \qquad E(z' G^{-1} z) = E(\mathrm{tr}\, G^{-1} z z') = \mathrm{tr}(G^{-1} G) = n. $$

Hence, because $E(z) = 0$,

$$ E[\delta^2(x_0, x)] = z_0' G^{-1} z_0 + n. $$

Likewise,

$$ \delta^2(x, y) = (z_x - z_y)' G^{-1} (z_x - z_y), $$

$$ E[\delta^2(x, y)] = E(z_x' G^{-1} z_x) + E(z_y' G^{-1} z_y) - 2 E(z_x' G^{-1} z_y) = 2n. $$

We conclude that the discriminant function is given by

$$ f(x_0) = z_0' G^{-1} z_0, $$

the square distance between $z_0 = \frac{\partial}{\partial \theta} \log p(x_0, \theta)$ and $E(z) = 0$.

4. THE BAYESIAN APPROACH

The Bayes discriminant rule (BR) allocates $x_0$ to the population for which $q_i p_i(x)$ is greatest, where $q_1$ and $q_2$ are the prior probabilities of drawing an observation from $\pi_1$ and $\pi_2$ respectively. BR leads to ML when $q_1 = q_2 = 1/2$.

If $\pi_i$ is $N(\mu_i, \Sigma)$, $i = 1,2$, ML is equivalent to the minimum distance rule provided that the Mahalanobis distance is adopted. For mixed data and using the location model LM, Krzanowski (1986), using a distance based on Matusita affinity, proves that LM is also equivalent to a minimum distance rule. However, neither the LM approach nor the Matusita approach takes the prior probabilities into account. In contrast, this is possible in the DB method.

4.1 Incorporating prior probabilities

Suppose that $\omega_0$ is known to belong to $\pi_i$ with probability $q_i$, $i = 1,2$. Let us consider discriminant function (9) with distance (13). Given $\omega \in \pi_1$ we find

$$ \delta^2(\omega_0, \omega) = 0 \ \text{ if } \omega_0 \in \pi_1, \qquad \delta^2(\omega_0, \omega) = q_1^{-1} + q_2^{-1} \ \text{ if } \omega_0 \in \pi_2. $$

Hence $H_{10} = q_1 \cdot 0 + q_2 (q_1^{-1} + q_2^{-1}) = q_1^{-1}$. Moreover $H_1 = 0$, so we obtain the function $f_1(\omega_0) = q_1^{-1}$. Similarly $f_2(\omega_0) = q_2^{-1}$. However, rule (10) remains invariant if we add the same constant to $f_1$ and $f_2$. Therefore, in order to construct a function that is a square distance, let us introduce the prior discriminant function

$$ f_i(\omega_0) = q_i^{-1} - 1 = (1 - q_i)/q_i, \qquad i = 1,2. $$

Note that $f_i(\omega_0)$ is consistent with the discriminant function (17). Thus rule (10) allocates $\omega_0$ to the population for which the prior probability is greater.

Suppose now that a vector observation $X$ is known with density $p_i(x)$. Taking into account property (16), let us introduce the posterior discriminant functions

$$ f_i^B(x_0) = H_{i0} - \frac{1}{2} H_i + q_i^{-1} - 1, \qquad i = 1,2, \qquad (19) $$

where $H_{i0}, H_i$ are defined in (9). A decision rule for allocating $x_0$ is:

$$ \text{allocate } x_0 \text{ to } \pi_i \text{ if } f_i^B(x_0) = \min\{f_1^B(x_0), f_2^B(x_0)\}. \qquad (20) $$

Discriminant function (19) is constructed by using distance (15), but it also applies for any other distance $\delta(., .)$.

However, note that rule (10) is invariant after multiplying $\delta(., .)$ by a positive constant, but the decision taken from rule (20) could be affected. In order to avoid this arbitrariness, it is necessary to impose a standardizing condition similar to (14) on the quadratic form related to $\delta(., .)$.
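The sensitivity to the scale of the distance can be illustrated with a few numbers. The sketch below assumes posterior scores of the form $f_i + q_i^{-1} - 1$, where $f_i$ is the distance-based discriminant function and $q_i$ the prior probability; the function names and all numerical values are hypothetical.

```python
def posterior_scores(f, q):
    """Add the prior term 1/q_i - 1 to each distance-based score f_i."""
    return [fi + 1.0 / qi - 1.0 for fi, qi in zip(f, q)]


def argmin_label(scores):
    """1-based label of the smallest score."""
    return scores.index(min(scores)) + 1


f = [1.0, 1.5]   # hypothetical distance-based scores for two populations
q = [0.3, 0.7]   # hypothetical prior probabilities

plain = argmin_label(f)
with_priors = argmin_label(posterior_scores(f, q))

# Multiplying the squared distance by 10 multiplies each f_i by 10.
scaled = [10.0 * fi for fi in f]
plain_scaled = argmin_label(scaled)
with_priors_scaled = argmin_label(posterior_scores(scaled, q))

print(plain, with_priors, plain_scaled, with_priors_scaled)
```

Here the rule without priors is unaffected by the rescaling, while the rule with priors switches population, which is why a standardizing condition on the distance is needed.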

4.2 Multivariate normal

The DB rule based on (18) and the BR are closely related. Suppose that $\pi_i$ is $N(\mu_i, \Sigma)$, $i = 1,2$. Then ML is based on the LDF

$$ V(x) = (\mu_1 - \mu_2)' \Sigma^{-1} x - \frac{1}{2} (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 + \mu_2), $$

BR is based on $B(x) = V(x) + \log(\theta/(1-\theta))$, and DB is based on $D(x) = V(x) + (\theta - \frac{1}{2})/[\theta(1-\theta)]$, where $\theta$ and $1 - \theta$ are the prior probabilities of drawing an observation from $\pi_1$ and $\pi_2$ respectively.

For $\theta = \frac{1}{2}$ we obtain $V(x) = B(x) = D(x)$, otherwise functions $B(x)$ and $D(x)$ are different. However, the difference between $\log(\theta/(1-\theta))$ and $(\theta - \frac{1}{2})/[\theta(1-\theta)]$ is not too marked in the interval (0.2, 0.8). See Fig. 1.

Figure 1. Graphical comparison between the distance-based rule (continuous curve), $(\theta - 1/2)/[\theta(1-\theta)]$, and the Bayes rule (discrete curve), $\ln(\theta/(1-\theta))$, for $0 < \theta < 1$, in the multivariate normal case. $\theta$ and $1 - \theta$ are the prior probabilities.
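The two prior terms compared in the figure, $\log(\theta/(1-\theta))$ for the Bayes rule and $(\theta - \frac{1}{2})/[\theta(1-\theta)]$ for the distance-based rule, can also be tabulated. The following sketch evaluates both on a grid over the interval [0.2, 0.8]; the grid step and the function names are arbitrary choices made for illustration.

```python
import math


def bayes_term(theta):
    """Prior term of the Bayes rule: the log-odds of theta."""
    return math.log(theta / (1.0 - theta))


def db_term(theta):
    """Prior term of the distance-based rule."""
    return (theta - 0.5) / (theta * (1.0 - theta))


grid = [i / 100 for i in range(20, 81)]  # 0.20, 0.21, ..., 0.80
gaps = [abs(bayes_term(t) - db_term(t)) for t in grid]
max_gap = max(gaps)

for t in (0.2, 0.3, 0.4, 0.5):
    print(t, round(bayes_term(t), 4), round(db_term(t), 4))
print(round(max_gap, 4))
```

Both terms vanish at $\theta = 0.5$ and have the same sign elsewhere; on this grid the largest gap occurs at the endpoints of the interval.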

4.3 Multinomial distribution

Suppose that a random event $A$ has the conditional probabilities $P(A/\pi_i) = p_i$, $i = 1,2$. If $\omega_0 \in A$, the Bayes decision rule is: allocate $\omega_0$ to $\pi_1$ if $p_1 \theta > p_2 (1 - \theta)$, otherwise to $\pi_2$. Therefore BR is based on

$$ b = \log p_1 - \log p_2 + \log(\theta/(1-\theta)) $$

and DB is based on

$$ d = p_2^{-1} - p_1^{-1} + (1-\theta)^{-1} - \theta^{-1}, $$

the difference $f_2^B - f_1^B$ obtained from (17) and (19).

For $\theta = \frac{1}{2}$, $b$ and $d$ are equivalent, otherwise the decision could be different. To study this difference, let us consider the Borel set $K \subset (0,1)^3$

$$ K = \{(p_1, p_2, \theta) \mid b \cdot d > 0\}. $$

We take the same decision as long as $(p_1, p_2, \theta) \in K$. After some tedious computations, the Lebesgue measure of $K$ is

$$ \mu(K) = 0.971, $$

which is very close to 1, the Lebesgue measure of $(0,1)^3$. Thereby, the decision taken after using either BR or DB is almost the same. This is quite interesting, because although the two decision rules are based on different criteria, their results coincide in practice.
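A Monte Carlo estimate gives a rough feeling for how often the two rules agree. The sketch below rests on two assumptions that are not spelled out in the surviving text: the DB criterion is taken as $d = p_2^{-1} - p_1^{-1} + (1-\theta)^{-1} - \theta^{-1}$, i.e. the difference of the posterior discriminant functions built from $(1 - p_i)/p_i$ and the prior term $q_i^{-1} - 1$, and agreement is taken to mean that $b$ and $d$ have the same sign. Function names, the seed and the number of draws are arbitrary.

```python
import math
import random


def bayes_score(p1, p2, theta):
    """b > 0 means the Bayes rule allocates to population 1."""
    return math.log(p1) - math.log(p2) + math.log(theta / (1.0 - theta))


def db_score(p1, p2, theta):
    """Assumed DB criterion; d > 0 means the DB rule allocates to population 1."""
    return 1.0 / p2 - 1.0 / p1 + 1.0 / (1.0 - theta) - 1.0 / theta


def open_unit(rng):
    """Uniform draw on the open interval (0, 1)."""
    x = rng.random()
    while x == 0.0:
        x = rng.random()
    return x


def agreement_fraction(n_draws, seed=0):
    """Fraction of uniform draws (p1, p2, theta) for which b and d share the same sign."""
    rng = random.Random(seed)
    agree = 0
    for _ in range(n_draws):
        p1, p2, theta = open_unit(rng), open_unit(rng), open_unit(rng)
        if bayes_score(p1, p2, theta) * db_score(p1, p2, theta) > 0:
            agree += 1
    return agree / n_draws


estimate = agreement_fraction(100_000)
print(round(estimate, 3))
```

The printed fraction estimates the volume of the agreement region and can be set against the value 0.971 quoted in the text; it is a simulation under the stated assumptions, not the exact computation.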

5. CLASSIFICATION INTO SEVERAL POPULATIONS

Suppose that we have $k$ mutually exclusive populations $\pi_1, \ldots, \pi_k$. The theory developed is easily extended to $k > 2$. Let $C_i$ be a sample from $\pi_i$ and $D_i$ a distance matrix. If $\omega$ is an individual to be allocated and $\delta_j(i)$ is the distance from $\omega$ to $j \in C_i$, the discriminant function $f_i(\omega)$ is defined as in (1), and the decision rule for allocating $\omega$ is

$$ \text{allocate } \omega \text{ to } \pi_i \text{ if } f_i(\omega) = \min\{f_1(\omega), \ldots, f_k(\omega)\}. \qquad (21) $$

It is obvious that theorem 1 also applies here, so (21) is a minimum-distance discriminant rule. In addition, the error rate computations are easily obtained.

When the distributions are known (up to parameters), extensions of (9) and (10), as well as theorem 3, do not present further difficulties.

Finally, if prior probabilities $q_1, \ldots, q_k$ are known, we may use the posterior discriminant functions

$$ f_i^B(x_0) = f_i(x_0) + q_i^{-1} - 1, \qquad i = 1, \ldots, k. $$

6. SOME EXAMPLES

Three real data examples are used to compare the DB approach with the LDF, QDF, EDF and LM approaches.

Data set 1 is the “advanced breast cancer data” used by Krzanowski (1975, 1986, 1987) to illustrate the LM. In this data set, of 186 cases of ablative surgery for advanced breast cancer, 99 were classified as “successful or intermediate” ($\pi_1$) and 87 as “failure” ($\pi_2$). Gower’s distance (section 2.3) is used on the basis of 6 continuous variables and 3 binary variables for the DB method, but we take ranks on the continuous variables instead of numerical values, as the range of most variables was too large.

Data set 2 is the well-known Fisher’s Iris data (Fisher, 1936), which consists of $n_1 = n_2 = n_3 = 50$ observations on three species of iris, I. setosa ($\pi_1$), I. versicolor ($\pi_2$) and I. virginica ($\pi_3$), and the discrimination problem on the basis of 4 continuous variables. The DB method uses the Euclidean distance (which yields the EDF) and distance (8).

Data set 3, taken from Mardia et al. (1979, p. 328), is concerned with the problem of discriminating between the species Chaetocnema concinna ($\pi_1$) and Ch. heikertingeri ($\pi_2$) on the basis of 2 continuous variables and samples of sizes $n_1 = n_2 = 18$. Gower’s distance (version for continuous variables) is also used.

The leaving-one-out method was used, as described in section 2.2, for computing the error rates. Table 1 summarizes the results obtained.

Table 1

Numbers of misclassifications obtained with the leaving-one-out method.

1. Cancer data
          π1    π2    Total
   LM     34    27     61
   LDF    41    31     72
   EDF    45    43     88
   DB     33    32     65

2. Iris data
          π1    π2    π3    Total
   LDF     0     2     1      3
   EDF     1     5     7     13
   DB      0     2     3      5

3. Chaetocnema data
          π1    π2    Total
   LDF     0     1      1
   QDF     1     1      2
   EDF     2     3      5
   DB      0     0      0

7. SUMMARY AND CONCLUSIONS

A distance-based approach (DB), which uses a statistical distance between observations, is described and theoretical properties are presented. This method reduces to linear discrimination (LDF), or even quadratic discrimination (QDF), when the sample Mahalanobis distance is used. Moreover, the DB approach yields a general discriminant function which reduces to the Euclidean discriminant function (EDF) when the Euclidean sample distance is used.

This method offers a rather simple algebraic way to consider the discrimination problem with mixed variables as well as the missing values case, provided that a suitable distance is adopted. No restriction seems to be necessary on the number of binary variables, so the DB method may be an alternative to discrimination based on the location model (LM).

The leaving-one-out method for computing the probability of misclassification can be applied following an elementary matrix computation in this approach.

Finally, unlike other methods based on distances, the prior probabilities can be taken into account in the DB method, with results quite similar to those given by the Bayes decision rule.

