MINIMAX ESTIMATION OF THE MEAN MATRIX OF THE MATRIX VARIATE NORMAL DISTRIBUTION UNDER THE DIVERGENCE LOSS FUNCTION

Shokofeh Zinodiny
Amirkabir University of Technology, Tehran, Iran

Sadegh Rezaei
Amirkabir University of Technology, Tehran, Iran

Saralees Nadarajah¹
University of Manchester, Manchester M13 9PL, UK

¹ Corresponding author. E-mail: mbbsssn2@manchester.ac.uk

1. INTRODUCTION

Let $X = (x_{i,j})$ be a $p \times m$ random matrix having a matrix variate normal distribution with mean matrix $\Theta = (\theta_{i,j})$ and covariance matrix $V \otimes I_m$, where $V$ is known or unknown, $I_m$ is the $m \times m$ identity matrix and $\otimes$ denotes the Kronecker product. This note considers estimation of $\Theta$ relative to the divergence loss function

$$L(a;\Theta) = \frac{1}{\beta(1-\beta)} \left[ 1 - \int_{\mathbb{R}^{p \times m}} f^{\beta}(X \mid a)\, f^{1-\beta}(X \mid \Theta)\, dX \right], \qquad (1)$$

where $0 < \beta < 1$, $f(X \mid \Theta)$ is the conditional density function of $X$ given $\Theta$ and $a$ is an estimator of $\Theta$.

Estimators for the mean matrix have been proposed under different loss functions. Efron and Morris (1972) proposed an empirical Bayes estimator outperforming the maximum likelihood estimator, $X$, for the case $m > p + 1$. Stein (1973), Zhang (1986a), Zhang (1986b), Bilodeau and Kariya (1989), Ghosh and Shieh (1991), Konno (1991), Tsukuma (2008), Tsukuma (2010) and others found minimax estimators better than the maximum likelihood estimator for quadratic loss and general quadratic loss functions.

The most recent minimax estimator for $\Theta$ is that due to Zinodiny et al. (2017), who estimated $\Theta$ under the balanced loss function. However, Zinodiny et al. (2017) considered only the case $V = \sigma^2 I_p$, where $\sigma^2$ is unknown. In this note, we consider three cases, as outlined in the abstract. Furthermore, the divergence loss function in (1) has attractive properties not shared by other loss functions, including the balanced loss function. Firstly, it is invariant to one-to-one coordinate transformations of $a$. Secondly, the Fisher information function, commonly used to measure the sensitivity of estimates, is proportional to (1). Thirdly, many loss functions, including the balanced loss function, are symmetric, but (1) is not symmetric: if $\beta$ is closer to 1, more weight is given to $a$; if $\beta$ is closer to 0, more weight is given to $\Theta$. Other attractive properties of (1) can be found in Kashyap (1974).

The divergence loss function directly compares the densities $f(X \mid \Theta)$ and $f(X \mid a)$; Robert (2001) refers to it as an intrinsic loss function, see also Amari (1982) and Cressie and Read (1984). The divergence loss function has been applied in many areas. Some examples include: discrimination between stationary Gaussian processes (Corduas, 1985); the Vocal Joystick engine, a real-time software library which can be used to map non-linguistic vocalizations into realizable continuous control signals (Malkin et al., 2011); a risk-aware modeling framework for speech summarization (Chen and Lin, 2012); models for web image retrieval (Yang et al., 2012); models for coclustering the mouse brain atlas (Ji et al., 2013); properties of record values (Paul and Thomas, 2016).

The aim of this note is to estimate the mean matrix of the matrix variate normal distribution under the divergence loss function. The contents are organized as follows. Section 2 shows that the empirical Bayes estimators dominate the maximum likelihood estimator under (1) for $m > p + 1$, and hence that the latter is inadmissible when $V$ is known; we also find a general class of minimax estimators using a technique due to Stein (1973) when $V = I_p$. Section 3 shows that $X$ is inadmissible for $m > p + 1$ when $V = \sigma^2 I_p$, where $\sigma^2$ is unknown. Section 4 shows that $X$ is inadmissible when $V$ is unknown. These sections in fact extend the results of Ghosh et al. (2008) and Ghosh and Mergel (2009) to the multivariate case. Section 5 performs a simulation study to compare one of the derived estimators with that in Zinodiny et al. (2017). Section 6 concludes the note.

2. ESTIMATION OF THE MEAN MATRIX WHEN V IS KNOWN

Here, we suppose

$$X \sim N_{p \times m}(\Theta, V \otimes I_m),$$

where $N_{p \times m}(\Theta, V \otimes I_m)$ denotes the matrix variate normal distribution with mean matrix $\Theta$ and covariance matrix $V \otimes I_m$, and $V$ is assumed known. We consider the problem of estimating the mean matrix $\Theta$ under the loss function (1), namely

$$L(a;\Theta) = \frac{1}{\beta(1-\beta)} \left[ 1 - \int_{\mathbb{R}^{p \times m}} f^{\beta}(X \mid a)\, f^{1-\beta}(X \mid \Theta)\, dX \right] = \frac{1}{\beta(1-\beta)} \left[ 1 - e^{-\frac{\beta(1-\beta)}{2} \operatorname{tr}\left((a-\Theta)' V^{-1} (a-\Theta)\right)} \right], \qquad (2)$$

where $\operatorname{tr}(A)$ and $A'$ denote, respectively, the trace and the transpose of a matrix $A$, and $f(X \mid \Theta)$ denotes the matrix variate normal density function. The usual estimator for $\Theta$ is the maximum likelihood estimator, $X$.
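For completeness, here is a brief sketch of how the closed form in (2) follows from (1) for the matrix variate normal density. Writing $\mu = \beta a + (1-\beta)\Theta$ and completing the square under the trace,

$$\beta \operatorname{tr}\left((X-a)'V^{-1}(X-a)\right) + (1-\beta)\operatorname{tr}\left((X-\Theta)'V^{-1}(X-\Theta)\right) = \operatorname{tr}\left((X-\mu)'V^{-1}(X-\mu)\right) + \beta(1-\beta)\operatorname{tr}\left((a-\Theta)'V^{-1}(a-\Theta)\right),$$

so that

$$\int_{\mathbb{R}^{p \times m}} f^{\beta}(X \mid a)\, f^{1-\beta}(X \mid \Theta)\, dX = e^{-\frac{\beta(1-\beta)}{2}\operatorname{tr}\left((a-\Theta)'V^{-1}(a-\Theta)\right)} \int_{\mathbb{R}^{p \times m}} \frac{e^{-\frac{1}{2}\operatorname{tr}\left((X-\mu)'V^{-1}(X-\mu)\right)}}{(2\pi)^{\frac{pm}{2}} |V|^{\frac{m}{2}}}\, dX,$$

and the remaining integral equals one.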

Lemma 1 shows that the maximum likelihood estimator is minimax under the loss function (2).

LEMMA 1. The estimator $X$ is minimax under the loss function (2).

PROOF. Suppose the proper prior sequence $\pi_n(\Theta)$ is distributed according to a matrix variate normal distribution with mean matrix zero and covariance matrix $nC \otimes I_m$, i.e. $\Theta \sim N_{p \times m}\left(0_{p \times m}, nC \otimes I_m\right)$, where $C$ is a $p \times p$ known positive definite matrix. Then, the posterior distribution is

$$\Theta \mid X \sim N_{p \times m}\left( \left(I_p - V(nC + V)^{-1}\right) X, \; \left(V^{-1} + (nC)^{-1}\right)^{-1} \otimes I_m \right).$$

Thus, the Bayes estimator is

$$\delta^{\pi_n}(X) = E[\Theta \mid X] = \left[I_p - V(nC + V)^{-1}\right] X.$$

Moreover, the risk function of $X$ and the Bayes risk of $\delta^{\pi_n}(X)$ are

$$R(X;\Theta) = \frac{1 - [1 + \beta(1-\beta)]^{-\frac{pm}{2}}}{\beta(1-\beta)}$$

and

$$r_n = r\left(\pi_n, \delta^{\pi_n}(X)\right) = \frac{1 - \left| V^{-1} + (nC)^{-1} \right|^{\frac{m}{2}} \left| [1 + \beta(1-\beta)] V^{-1} + (nC)^{-1} \right|^{-\frac{m}{2}}}{\beta(1-\beta)},$$

respectively, where $|A|$ denotes the determinant of $A$. Since the determinant of a matrix is a continuous function, we have

$$\lim_{n \to \infty} r_n = \frac{1 - [1 + \beta(1-\beta)]^{-\frac{pm}{2}}}{\beta(1-\beta)} = \sup_{\Theta} R(X;\Theta).$$

Hence, using Theorem 1.12 on page 613 of Lehmann and Casella (1998) and results of Blyth (1951), $X$ is minimax under the loss function (2). □

Now we construct a class of empirical Bayes estimators better than $X$. Assume we have some additional information about $\Theta$ that can be written as

$$\Theta \sim N_{p \times m}\left(0_{p \times m}, A \otimes I_m\right).$$

The conditional distribution of $\Theta$ given $X$ is

$$N_{p \times m}\left( \left(I_p - V(V + A)^{-1}\right) X, \; \left(V^{-1} + A^{-1}\right)^{-1} \otimes I_m \right).$$

So, the Bayes estimator of $\Theta$ under (2) is

$$\widehat{\Theta}_B = E[\Theta \mid X] = \left(I_p - V \Sigma^{-1}\right) X,$$

where $\Sigma = A + V$. In an empirical Bayes scenario, $\Sigma$ is unknown and is estimated from the marginal distribution of $X$. The marginal distribution of $X$ is $N_{p \times m}\left(0_{p \times m}, \Sigma \otimes I_m\right)$. So, $S = XX'$ is completely sufficient for $\Sigma$ and an empirical Bayes estimator is

$$\widehat{\Theta}_{EB} = E[\Theta \mid X] = \left[I_p - V \widehat{\Sigma}^{-1}(S)\right] X,$$

where $\widehat{\Sigma}^{-1}(S)$ is an estimator of $\Sigma^{-1}$ that depends on $X$ only through $S$. According to Ghosh and Shieh (1991), a natural candidate for $\widehat{\Sigma}^{-1}(S)$ is $(m - p - 1) S^{-1}$. Ghosh and Shieh (1991) obtained this empirical Bayes estimator when the loss function was

$$L_1(\delta;\Theta) = \operatorname{tr}\left( (\delta - \Theta)' Q (\delta - \Theta) \right), \qquad (3)$$

where $Q$ is a $p \times p$ known positive definite matrix.
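As a concrete illustration, here is a minimal Python sketch of the empirical Bayes estimator with the natural candidate $\widehat{\Sigma}^{-1}(S) = (m - p - 1) S^{-1}$; the function name and interface are illustrative, not part of the original development.

```python
import numpy as np

def empirical_bayes_estimator(X, V):
    """Sketch of Theta_EB = [I_p - (m - p - 1) V S^{-1}] X with S = X X',
    using the natural candidate Sigma^{-1}(S) = (m - p - 1) S^{-1}
    discussed above; the construction requires m > p + 1."""
    p, m = X.shape
    if m <= p + 1:
        raise ValueError("the construction assumes m > p + 1")
    S = X @ X.T                                    # p x p, completely sufficient for Sigma
    Sigma_inv_hat = (m - p - 1) * np.linalg.inv(S)
    return (np.eye(p) - V @ Sigma_inv_hat) @ X
```

With $V = I_p$ this reduces to the Efron and Morris (1972) type estimator mentioned in the Introduction.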

To continue, we use the following notation borrowed from Ghosh and Shieh (1991): $E_\Theta$ denotes the expectation conditional on $\Theta$; $\widetilde{E}$ denotes the expectation over the marginal distribution of $\Theta$; $E$ denotes the expectation over the joint distribution of $X$ and $\Theta$. Let $R_i(\delta;\Theta) = E_\Theta[L_i(\delta;\Theta)]$, $i = 1, 2, 3$, and $R(\delta;\Theta) = E_\Theta[L(\delta;\Theta)]$. We use the notation $D = (d_{i,j})$, where $d_{i,j} = \frac{1 + \delta_{i,j}}{2} \frac{\partial}{\partial s_{i,j}}$ (see the last line of page 308 in Ghosh and Shieh (1991)), $\delta_{i,j}$ being the Kronecker deltas and $s_{i,j}$ the $(i,j)$th element of $S$. Note that $D$ is a differential operator in the form of a matrix. If $f(A)$ is a real-valued function of a $p \times p$ matrix $A$, then $D f(A)$ is a $p \times p$ matrix with $(i,j)$th element $\frac{1 + \delta_{i,j}}{2} \frac{\partial f(A)}{\partial s_{i,j}}$. For example, if $p = 2$ and $f(A) = s_{1,1}^2 + s_{2,2}^2$, then

$$D f(A) = \begin{pmatrix} 2 s_{1,1} & 0 \\ 0 & 2 s_{2,2} \end{pmatrix}.$$

Also, for any $p \times p$ matrix $T$, $D(T)$ is a $p \times p$ matrix with $(i,j)$th element $\sum_{l=1}^{p} d_{i,l} t_{l,j}$. For example, if $p = 2$ then

$$D(S) = \begin{pmatrix} \tfrac{3}{2} & 0 \\ 0 & \tfrac{3}{2} \end{pmatrix}.$$
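As a quick symbolic check (ours, not in the original), the following short Python/SymPy snippet reproduces the two worked examples of $D f(A)$ and $D(S)$ above; all names are illustrative.

```python
import sympy as sp

# Symbolic check of the matrix differential operator D for p = 2.
s11, s12, s22 = sp.symbols('s11 s12 s22')
S = sp.Matrix([[s11, s12], [s12, s22]])          # symmetric S = X X'

def D_scalar(f, S):
    """(i,j)th element of D f(A): (1 + delta_ij)/2 * d f / d s_ij."""
    p = S.shape[0]
    return sp.Matrix(p, p, lambda i, j:
                     sp.Rational(1 + int(i == j), 2) * sp.diff(f, S[i, j]))

def D_matrix(T, S):
    """(i,j)th element of D(T): sum over l of d_il applied to t_lj."""
    p = S.shape[0]
    return sp.Matrix(p, p, lambda i, j:
                     sum(sp.Rational(1 + int(i == l), 2) * sp.diff(T[l, j], S[i, l])
                         for l in range(p)))

print(D_scalar(s11**2 + s22**2, S))   # Matrix([[2*s11, 0], [0, 2*s22]])
print(D_matrix(S, S))                 # Matrix([[3/2, 0], [0, 3/2]])
```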

THEOREM 2. The empirical Bayes estimator $\widehat{\Theta}_{EB}$ is minimax under (2) if it is minimax under the quadratic loss function (3) when $Q = V^{-1}$.

PROOF. The difference of the risks of $\widehat{\Theta}_{EB}$ and $X$ is

$$R\left(\widehat{\Theta}_{EB};\Theta\right) - R(X;\Theta) = \frac{1}{\beta(1-\beta)} \left[ g(X;\Theta) - g\left(\widehat{\Theta}_{EB};\Theta\right) \right],$$

where

$$g(\delta;\Theta) = E_\Theta\left[ e^{-\frac{\beta(1-\beta)}{2} \operatorname{tr}\left((\delta-\Theta)' V^{-1} (\delta-\Theta)\right)} \right].$$

Using the fact that $e^{-x} \geq 1 - x$, we see that

$$g\left(\widehat{\Theta}_{EB};\Theta\right) \geq g(X;\Theta) - \frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} E_\Theta\left[ \operatorname{tr}\left(X' \widehat{\Sigma}^{-1} V' V^{-1} V \widehat{\Sigma}^{-1} X\right) - 2 \operatorname{tr}\left(X' \widehat{\Sigma}^{-1} V' V^{-1} (X - \Theta)\right) \right],$$

where $E_\Theta$ is taken with respect to $N_{p \times m}\left(\Theta, \frac{V}{1+\beta(1-\beta)} \otimes I_m\right)$. Using Theorem 2 in Ghosh and Shieh (1991) and the notation $R_1(\delta;\Theta) = E_\Theta[L_1(\delta;\Theta)]$, we can write

$$g\left(\widehat{\Theta}_{EB};\Theta\right) - g(X;\Theta) \geq -\frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} \left[ R_1\left(\widehat{\Theta}_{EB};\Theta\right) - R_1(X;\Theta) \right]. \qquad (4)$$

Hence, if $R_1\left(\widehat{\Theta}_{EB};\Theta\right) \leq R_1(X;\Theta)$, then $R\left(\widehat{\Theta}_{EB};\Theta\right) \leq R(X;\Theta)$. □

Similar to Ghosh and Shieh (1991), we can write (4) as

$$g\left(\widehat{\Theta}_{EB};\Theta\right) - g(X;\Theta) \geq -\frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} E\left[ \operatorname{tr}\left( V^{\frac{1}{2}} U_1 V^{\frac{1}{2}} \right) \right],$$

where

$$U_1 = \widehat{\Sigma}^{-1}(S)\, S\, \widehat{\Sigma}^{-1}(S) - 4 D\left(S \widehat{\Sigma}^{-1}(S)\right) - 2(m - p - 1) \widehat{\Sigma}^{-1}(S),$$

and $R_1\left(\widehat{\Theta}_{EB};\Theta\right) \leq R_1(X;\Theta)$ if $U_1 \leq 0$ has a positive probability for some $\Theta$. If $\widehat{\Sigma}^{-1}(S)$ is chosen in such a way that $\widehat{\Theta}_{EB}$ improves on $X$ under the quadratic loss function (3), then it improves on $X$ also under (2).

Let $\mathcal{O}_p$ be the set of orthogonal matrices of order $p$ and let $\mathcal{V}_{m,p}$ be the Stiefel manifold, namely, $\mathcal{V}_{m,p} = \left\{ V \in \mathbb{R}^{m \times p} : V'V = I_p \right\}$. Write the singular value decomposition of $X$ as $U L V'$, where $U \in \mathcal{O}_p$, $V \in \mathcal{V}_{m,p}$ and $L = \operatorname{diag}\left(l_1, l_2, \ldots, l_p\right)$ with $l_1 > l_2 > \cdots > l_p > 0$. Ghosh and Shieh (1991) showed that if $\widehat{\Sigma}^{-1}(S) = U L^{-1} \Psi^*(L) U'$, where $\Psi^*(L)$ is a diagonal matrix with diagonal elements $\psi^*_i(L)$, $i = 1, \ldots, p$, then $\widehat{\Theta}_{EB}$ is minimax under the following conditions when the loss function is (3) with $Q = V^{-1}$.

THEOREM 3. Suppose that $\psi^*_i(L)$, $i = 1, \ldots, p$, satisfy

I. $0 < \psi^*_i(L) \leq \ldots$,

II. for every $i = 1, \ldots, p$, $\frac{\partial \psi^*_i(L)}{\partial l_i} \geq 0$,

III. the $\psi^*_i(L)$ are similarly ordered to the $l_i$, that is, $\left(\psi^*_i(L) - \psi^*_t(L)\right)\left(l_i - l_t\right) \geq 0$ for every $1 \leq i, t \leq p$.

Then, $\widehat{\Theta}_{EB} = \left[I_p - V U L^{-1} \Psi^*(L) U'\right] X$ improves on $X$ for $m > p + 1$ under the loss function (3).

PROOF. See Ghosh and Shieh (1991). □

Using Theorem 2, we can see that $\widehat{\Theta}_{EB} = \left[I_p - V U L^{-1} \Psi^*(L) U'\right] X$ is minimax for $m > p + 1$ under the conditions of Theorem 3 when the loss function is (2). This means that $X$ is inadmissible for $m > p + 1$ under the loss function (2). Examples (1) and (2) in Ghosh and Shieh (1991) illustrate the above result. Of course, one may obtain estimators of the form of $\widehat{\Theta}_{EB}$ which dominate $X$ when the conditions of Theorem 3 are not met. Some examples are given in Ghosh and Shieh (1991).

In the rest of this section, we consider the problem of estimating $\Theta$ under the loss function (2) for the case $V = I_p$. Let $\delta = X + G$, where $G = G(X)$ is a $p \times m$ matrix valued function of $X$. Also let $\nabla$ be a $p \times m$ matrix with $(i,j)$th element equal to the differential operator $\frac{\partial}{\partial x_{i,j}}$. We obtain a general condition for minimaxity.

THEOREM 4. Suppose $\delta = X + G$ is minimax under (3). Then it is also minimax under (2).

PROOF. The difference of the risks of $\delta$ and $X$ is

$$R(\delta;\Theta) - R(X;\Theta) = \frac{1}{\beta(1-\beta)} \left[ g(X;\Theta) - g(\delta;\Theta) \right],$$

where

$$g(\delta;\Theta) = E_\Theta\left[ e^{-\frac{\beta(1-\beta)}{2} \operatorname{tr}\left((\delta-\Theta)'(\delta-\Theta)\right)} \right].$$

Using $e^{-x} \geq 1 - x$, we see that

$$g(\delta;\Theta) \geq g(X;\Theta) - \frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} E_\Theta\left[ \operatorname{tr}\left(G'G\right) + 2 \operatorname{tr}\left(G'(X - \Theta)\right) \right],$$

where $E_\Theta$ is taken with respect to $N_{p \times m}\left(\Theta, \frac{I_p}{1+\beta(1-\beta)} \otimes I_m\right)$. Using Theorem 2 in Ghosh and Shieh (1991), we can write

$$g(\delta;\Theta) - g(X;\Theta) \geq -\frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} \left[ R_1(\delta;\Theta) - R_1(X;\Theta) \right].$$

Hence, if $R_1(\delta;\Theta) \leq R_1(X;\Theta)$, then $R(\delta;\Theta) \leq R(X;\Theta)$. □

By Theorem 4, if $E_\Theta\left[\operatorname{tr}(G'G) + 2\operatorname{tr}\left(G'(X - \Theta)\right)\right] \leq 0$ then $R(\delta;\Theta) \leq R(X;\Theta)$. Using Stein (1973)'s identities,

$$E_\Theta\left[ \frac{\operatorname{tr}\left(G'(X-\Theta)\right)}{1+\beta(1-\beta)} \right] = E_\Theta\left[ \operatorname{tr}\left(\nabla G'\right) \right],$$

so we can write

$$E_\Theta\left[ \operatorname{tr}\left(G'G\right) + 2\operatorname{tr}\left(G'(X - \Theta)\right) \right] = E_\Theta\left[ \operatorname{tr}\left(G'G\right) + 2[1 + \beta(1-\beta)] \operatorname{tr}\left(\nabla G'\right) \right].$$

So, if $\operatorname{tr}(G'G) + 2[1 + \beta(1-\beta)] \operatorname{tr}(\nabla G') \leq 0$ with a positive probability for some $\Theta$, then $\delta$ is minimax under the loss function (2).

Now, consider the class of shrinkage estimators

$$\delta = \left[I_p - U F^{-1} \Psi(F) U'\right] X, \qquad (5)$$

where $\Psi(F) = \operatorname{diag}\left(\psi_1, \ldots, \psi_p\right)$ is a diagonal matrix with the $\psi_i$ being functions of $F = L^2$. We obtain the following by applying Theorem 4.

COROLLARY 5. Suppose the $\psi_i$ satisfy

I. for fixed $f_1, \ldots, f_{i-1}, f_{i+1}, \ldots, f_p$, $\frac{\partial \psi_i}{\partial f_i} \geq 0$, where $i = 1, \ldots, p$,

II. $0 \leq \psi_p \leq \cdots \leq \psi_1 \leq 2(m - p - 1)$.

Then, $\delta$ in (5) is minimax under the loss function (2).

PROOF. With the notation $\Phi(F) = F^{-1} \Psi(F)$, Stein (1973) showed that $\delta = \left[I_p - U \Phi(F) U'\right] X$ is minimax under conditions I and II when the loss function is (3). Using Theorem 4, we can see that $\delta = X + G$ is minimax under (2), where $G = -U \Phi(F) U' X$. □
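To illustrate Corollary 5 numerically, the following is a minimal Python sketch of an estimator in the class (5); the constant choice $\psi_i \equiv m - p - 1$ used as a default is just one admissible example satisfying conditions I and II when $m > p + 1$, and the function name is ours.

```python
import numpy as np

def shrinkage_estimator(X, psi=None):
    """Sketch of delta = [I_p - U F^{-1} Psi(F) U'] X from (5), where
    X = U L V' is the singular value decomposition and F = L^2.
    The default psi_i = m - p - 1 is constant, hence nondecreasing in each f_i,
    and satisfies 0 <= psi_p <= ... <= psi_1 <= 2(m - p - 1) when m > p + 1."""
    p, m = X.shape
    U, l, Vt = np.linalg.svd(X, full_matrices=False)   # l holds l_1 >= ... >= l_p > 0
    f = l ** 2                                          # diagonal of F = L^2
    if psi is None:
        psi = np.full(p, m - p - 1.0)
    return (np.eye(p) - U @ np.diag(psi / f) @ U.T) @ X
```

Other functions $\psi_i(F)$ satisfying conditions I and II can be substituted for the default to obtain the rest of the class.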

Tsukuma (2008) showed that the proper Bayes estimators with respect to the following prior

$$\Theta \mid \Lambda \sim N_{p \times m}\left(0_{p \times m}, \Lambda^{-1}\left(I_p - \Lambda\right) \otimes I_m\right)$$

are of the form $\delta = \left[I_p - U F^{-1} \Psi(F) U'\right] X$, where $\Lambda$ is a $p \times p$ random matrix with the density function

$$\pi(\Lambda) \propto |\Lambda|^{\frac{a}{2} - 1}\, I\left(0_{p \times p} < \Lambda < I_p\right).$$

For certain values of $a$, these estimators are minimax under the loss function (3), so we can see from Corollary 5 that $\delta$ is minimax under (2).

3. ESTIMATION OF THE MEAN MATRIX WHEN $V = \sigma^2 I_p$ AND $\sigma^2$ IS UNKNOWN

Here, we assume that

$$X \sim N_{p \times m}\left(\Theta, \sigma^2 I_p \otimes I_m\right),$$

where $\sigma^2$ is unknown. Also, $S$ is independent of $X$ and

$$S \sim \sigma^2 \chi^2_n,$$

where $\chi^2_n$ denotes a chi-square random variable with $n$ degrees of freedom. We consider estimation of $\Theta$ under

$$L(a;\Theta) = \frac{1}{\beta(1-\beta)} \left[ 1 - \int_{\mathbb{R}^{p \times m}} f^{\beta}(X \mid a)\, f^{1-\beta}(X \mid \Theta)\, dX \right] = \frac{1}{\beta(1-\beta)} \left[ 1 - e^{-\frac{\beta(1-\beta)}{2\sigma^2} \operatorname{tr}\left((a-\Theta)'(a-\Theta)\right)} \right]. \qquad (6)$$

Similar to Lemma 1, one can show that the maximum likelihood estimator, $X$, is minimax under the loss function (6). Tsukuma (2010) obtained general conditions for minimaxity of estimators having the form $\delta_1 = X + G(X, S)$, where $G(X, S)$ is a $p \times m$ matrix valued function of $X$ and $S$, under the quadratic loss function

$$L_2(a;\Theta) = \frac{\operatorname{tr}\left((a - \Theta)'(a - \Theta)\right)}{\sigma^2}. \qquad (7)$$

THEOREM 6. The estimator $\delta_1 = X + G(X, S)$ is minimax under the loss function (6) if it is minimax under the quadratic loss function (7) for $m > p + 1$.

PROOF. We have

$$R(\delta_1;\Theta) - R(X;\Theta) = \frac{1}{\beta(1-\beta)} \left[ g(X;\Theta) - g(\delta_1;\Theta) \right].$$

Using $e^{-x} \geq 1 - x$,

$$g(\delta_1;\Theta) \geq g(X;\Theta) - \frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} E_\Theta\left[ \frac{\operatorname{tr}(G'G) + 2\operatorname{tr}\left(G'(X - \Theta)\right)}{\sigma^2} \right]$$

and

$$g(\delta_1;\Theta) - g(X;\Theta) \geq -\frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} \left[ R_2(\delta_1;\Theta) - R_2(X;\Theta) \right],$$

where $R_2(\delta;\Theta) = E_\Theta[L_2(\delta;\Theta)]$. So, if $R_2(\delta_1;\Theta) \leq R_2(X;\Theta)$, then $R(\delta_1;\Theta) \leq R(X;\Theta)$. The proof is complete. □

Tsukuma (2010) showed that if

$$\frac{(n-2)\operatorname{tr}(G'G)}{s} + 2\operatorname{tr}\left(\nabla G'\right) + \frac{\partial \operatorname{tr}(G'G)}{\partial s} \leq 0,$$

then $\delta_1 = X + G(X, S)$ is minimax under the loss function (7). So, using Theorem 6, we see that $\delta_1$ is minimax under the loss function (6).

THEOREM 7. Consider the estimator

$$\delta_2(X, S) = \left[I_p - U F^{-1} \Psi(F, s) U'\right] X,$$

where $F = Ls = \operatorname{diag}\left(f_1, \ldots, f_p\right)$ and $\Psi(F, s) = \operatorname{diag}\left(\psi_1, \ldots, \psi_p\right)$ are diagonal matrices with the $\psi_i$ being functions of $F$ and $s$. If the following conditions hold, then $\delta_2$ is minimax under the loss function (6):

I. for fixed $F$, $\frac{\partial \psi_i}{\partial s} \leq 0$,

II. for fixed $f_1, \ldots, f_{j-1}, f_{j+1}, \ldots, f_p$, $\frac{\partial \psi_i}{\partial f_j} \geq 0$, where $i, j = 1, \ldots, p$,

III. $0 \leq \psi_p \leq \cdots \leq \psi_1 \leq \frac{2(m-p-1)}{n+2}$.

PROOF. Tsukuma (2010) showed that $\delta_2$ is minimax under conditions I-III when the loss function is (7). So, using Theorem 6, we see that $\delta_2$ is minimax under the loss function (6) if conditions I-III hold. □

4. ESTIMATION OF THE MEAN MATRIX WHEN V IS UNKNOWN

Here, we assume

$$X \sim N_{p \times m}(\Theta, V \otimes I_m),$$

where $V$ is unknown. Also, $\widehat{V}$ is independent of $X$ and

$$\widehat{V} \sim W_p(V, n),$$

where $W_p(V, n)$ denotes a Wishart random matrix with $n$ degrees of freedom and mean $nV$. We consider estimating $\Theta$ under the loss function

$$L(a;\Theta) = \frac{1}{\beta(1-\beta)} \left[ 1 - \int_{\mathbb{R}^{p \times m}} f^{\beta}(X \mid a)\, f^{1-\beta}(X \mid \Theta)\, dX \right] = \frac{1}{\beta(1-\beta)} \left[ 1 - e^{-\frac{\beta(1-\beta)}{2} \operatorname{tr}\left((a-\Theta)' V^{-1} (a-\Theta)\right)} \right]. \qquad (8)$$

Similar to Lemma 1, one can show that the maximum likelihood estimator, $X$, is minimax under the loss function (8).

We now consider the relationship between this loss function and the following quadratic loss function

$$L_3(a;\Theta,V) = \operatorname{tr}\left( (a - \Theta)' Q^* (a - \Theta) \right), \qquad (9)$$

where $Q^* = V^{-\frac{1}{2}} Q V^{-\frac{1}{2}}$. Assume $\delta_3 = X + G\left(X, \widehat{V}\right)$, where $G\left(X, \widehat{V}\right)$ is a $p \times m$ matrix valued function of $X$ and $\widehat{V}$.

THEOREM 8. The estimator $\delta_3 = X + G\left(X, \widehat{V}\right)$ is minimax under the loss function (8) if it is minimax under (9).

PROOF. We have

$$R(\delta_3;\Theta) - R(X;\Theta) = \frac{1}{\beta(1-\beta)} \left[ g(X;\Theta) - g(\delta_3;\Theta) \right],$$

where

$$g(\delta_3;\Theta) = E_\Theta\left[ e^{-\frac{\beta(1-\beta)}{2} \operatorname{tr}\left((\delta_3-\Theta)' V^{-1} (\delta_3-\Theta)\right)} \right].$$

So,

$$g(\delta_3;\Theta) \geq g(X;\Theta) - \frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} E_\Theta\left[ \operatorname{tr}\left(G' Q^* G\right) + 2 \operatorname{tr}\left(G' Q^* (X - \Theta)\right) \right],$$

where $E_\Theta$ is taken with respect to $N_{p \times m}\left(\Theta, \frac{V}{1+\beta(1-\beta)} \otimes I_m\right)$. Using $R_3(\delta;\Theta) = E_\Theta[L_3(\delta;\Theta)]$, we can write

$$g(\delta_3;\Theta) - g(X;\Theta) \geq -\frac{\beta(1-\beta)}{2} [1 + \beta(1-\beta)]^{-\frac{pm}{2}} \left[ R_3(\delta_3;\Theta) - R_3(X;\Theta) \right].$$

Hence, if $R_3(\delta_3;\Theta) \leq R_3(X;\Theta)$, then $R(\delta_3;\Theta) \leq R(X;\Theta)$. □

Shieh (1993) considered empirical Bayes estimators of the form

$$\widehat{\Theta}_{1EB} = \left[I_p - \widehat{V} S^{-1} \tau\left(\widehat{V}, S\right)\right] X,$$

where $S = XX'$ and $\tau\left(\widehat{V}, S\right)$ is a symmetric matrix. Shieh (1993) showed that $\widehat{\Theta}_{1EB}$ is better than the maximum likelihood estimator, $X$, under (9). For example, he showed that if $\tau\left(\widehat{V}, S\right) = (m - p - 1) I_p$, then $\widehat{\Theta}_{1EB}$ improves on $X$ for $m > p + 1$.
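A minimal Python sketch (our naming) of Shieh (1993)'s estimator with the particular choice $\tau\left(\widehat{V}, S\right) = (m - p - 1) I_p$ mentioned above:

```python
import numpy as np

def shieh_estimator(X, V_hat):
    """Sketch of Theta_1EB = [I_p - V_hat S^{-1} tau(V_hat, S)] X with
    tau = (m - p - 1) I_p and S = X X'; the improvement over X needs m > p + 1."""
    p, m = X.shape
    S = X @ X.T
    return (np.eye(p) - (m - p - 1) * V_hat @ np.linalg.inv(S)) @ X
```

It parallels the known-$V$ empirical Bayes estimator of Section 2 with $\widehat{V}$ substituted for $V$.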

For the case $Q = V = I_p$, Konno (1990), Konno (1991), Konno (1992) considered the estimators

$$\delta_4 = \begin{cases} \left[I_p - R F^{-1} \Phi(F) R'\right] X, & \text{if } m < p, \\ \left[I_p - Q F^{-1} \Phi(F) R Q^{-1}\right] X, & \text{if } m \geq p, \end{cases}$$

where, for $p < m$, $X \widehat{V}^{-1} X' = R F R'$ with $R \in \mathcal{O}_p$ and, for $p \geq m$, $Q \widehat{V}^{-1} Q' = I_p$, $Q' X X' Q = F = \operatorname{diag}\left(f_1, \ldots, f_{m \wedge p}\right)$, $f_1 \geq \cdots \geq f_{m \wedge p} > 0$. Under the following conditions, Konno (1990), Konno (1991), Konno (1992) showed that $\delta_4$ improves on $X$ when the loss function is (9):

i) for $i = 1, \ldots, p$, $\phi_i(F)$ is nondecreasing in $f_i$;

ii) $0 \leq \phi_{m \wedge p}(F) \leq \phi_{m \wedge p - 1}(F) \leq \cdots \leq \phi_1(F) \leq \frac{2(m \vee p - m \wedge p - 1)}{n + (2m - p) \wedge p + 1}$.

So, using Theorem 8, we see that these estimators are minimax under (8).

5. SIMULATION STUDY

As mentioned in Section 1, Zinodiny et al. (2017) estimated $\Theta$ under the balanced loss function when $V = \sigma^2 I_p$, where $\sigma^2$ is unknown. Here, we perform a simulation study to compare this estimator with the estimator under the divergence loss function, see Theorem 8. The simulation study was performed as follows:

1. simulate 10000 random samples each of size $n$ from a matrix variate normal distribution with zero means and $V = \sigma^2 I_p$;

2. compute Zinodiny et al. (2017)'s estimator and the estimator in Theorem 8 for each of the samples, say $\left\{\widehat{\Theta}_1, \widehat{\Theta}_2, \ldots, \widehat{\Theta}_{10000}\right\}$ and $\left\{\widetilde{\Theta}_1, \widetilde{\Theta}_2, \ldots, \widetilde{\Theta}_{10000}\right\}$;

3. compute the biases of Zinodiny et al. (2017)'s estimator and the estimator in Theorem 8 as
$$\frac{1}{10000\, m\, p} \sum_{i=1}^{10000} \sum_{j=1}^{p} \sum_{k=1}^{m} \widehat{\Theta}_{i,j,k} \quad \text{and} \quad \frac{1}{10000\, m\, p} \sum_{i=1}^{10000} \sum_{j=1}^{p} \sum_{k=1}^{m} \widetilde{\Theta}_{i,j,k};$$

4. compute the mean squared errors of Zinodiny et al. (2017)'s estimator and the estimator in Theorem 8 as
$$\frac{1}{10000\, m\, p} \sum_{i=1}^{10000} \sum_{j=1}^{p} \sum_{k=1}^{m} \widehat{\Theta}^2_{i,j,k} \quad \text{and} \quad \frac{1}{10000\, m\, p} \sum_{i=1}^{10000} \sum_{j=1}^{p} \sum_{k=1}^{m} \widetilde{\Theta}^2_{i,j,k}.$$

We repeated this procedure for $n = 10, 12, \ldots, 109$. We chose $\sigma = 1$, $m = 10$ and $p = 5$.
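For reference, a simplified Python sketch of this simulation loop; the function name and the generic `estimator(X, S)` interface (standing in for either of the two compared estimators, which are not restated here) are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_bias_mse(estimator, n, n_rep=10_000, p=5, m=10, sigma=1.0):
    """Draw X with Theta = 0 and V = sigma^2 I_p together with an independent
    S ~ sigma^2 chi^2_n, apply `estimator`, and average entries and squared
    entries as in steps 3 and 4; since the true mean is zero, these averages
    estimate per-entry bias and mean squared error."""
    bias_sum = mse_sum = 0.0
    for _ in range(n_rep):
        X = sigma * rng.standard_normal((p, m))      # Theta = 0, V = sigma^2 I_p
        S = sigma ** 2 * rng.chisquare(df=n)         # independent scale statistic
        theta_hat = estimator(X, S)
        bias_sum += theta_hat.sum()
        mse_sum += (theta_hat ** 2).sum()
    denom = n_rep * m * p
    return bias_sum / denom, mse_sum / denom
```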

Figure 1 – Biases of Zinodiny et al. (2017)'s estimator (solid curve) and the estimator in Theorem 8 (broken curve) versus $n$.

Figure 2 – Mean squared errors of Zinodiny et al. (2017)'s estimator (solid curve) and the estimator in Theorem 8 (broken curve) versus $n$.

The biases versus $n$ are drawn in Figure 1. The mean squared errors versus $n$ are drawn in Figure 2. We see that the estimator in Theorem 8 has consistently smaller bias and consistently smaller mean squared error for every $n$. We took $\sigma = 1$, $m = 10$ and $p = 5$ in the simulations. But the same observation was noted for a wide range of other values of $\sigma$, $m$ and $p$. Hence, the estimator in Theorem 8 is a better estimator with respect to both bias and mean squared error.

6. CONCLUSIONS

We have considered the problem of estimating the mean of a matrix variate normal distribution. We have provided minimax estimators under the divergence loss function. The divergence loss function has several attractive features compared to other loss functions considered in the literature.

We have given minimax estimators for the mean considering different forms for the covariance matrix. The forms considered include the cases where the covariance matrix is known and where the covariance matrix is completely unknown.

We have performed a simulation study to compare one of our estimators with the most recently proposed estimator. The simulation shows that our estimator has consistently smaller bias and consistently smaller mean squared error.


ACKNOWLEDGEMENTS

The authors would like to thank the referee and the Editor for careful reading and comments which greatly improved the paper.

REFERENCES

S. AMARI (1982). Differential geometry of curved exponential families - curvatures and information loss. Annals of Statistics, 10, pp. 357–387.

M. BILODEAU, T. KARIYA (1989). Minimax estimators in the normal MANOVA model. Journal of Multivariate Analysis, 28, pp. 260–270.

C. R. BLYTH (1951). On minimax statistical decision procedures and their admissibility. Annals of Mathematical Statistics, 22, pp. 22–42.

B. CHEN, S. H. LIN (2012). A risk-aware modeling framework for speech summarization. IEEE Transactions on Audio, Speech, and Language Processing, 20, pp. 211–222.

M. CORDUAS (1985). On the divergence between linear processes. Statistica, 45, pp. 393–401.

N. CRESSIE, T. R. C. READ (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society Series B, 46, pp. 440–464.

B. EFRON, C. MORRIS (1972). Empirical Bayes on vector observations: An extension of Stein's method. Biometrika, 59, pp. 335–347.

M. GHOSH, V. MERGEL (2009). On the Stein phenomenon under divergence loss and an unknown variance-covariance matrix. Journal of Multivariate Analysis, 100, pp. 2331–2336.

M. GHOSH, V. MERGEL, G. S. DATTA (2008). Estimation, prediction and the Stein phenomenon under divergence loss. Journal of Multivariate Analysis, 99, pp. 1941–1961.

M. GHOSH, G. SHIEH (1991). Empirical Bayes minimax estimators of matrix normal means. Journal of Multivariate Analysis, 38, pp. 306–318.

S. JI, W. ZHANG, R. LI (2013). A probabilistic latent semantic analysis model for coclustering the mouse brain atlas. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 10, pp. 1460–1468.

R. L. KASHYAP (1974). Minimax estimation with divergence loss function. Information Sciences, 7, pp. 341–364.

Y. KONNO (1990). Families of minimax estimators of matrix of normal means with

Y. KONNO (1991). On estimation of a matrix of normal means with unknown covariance matrix. Journal of Multivariate Analysis, 36, pp. 44–55.

Y. KONNO (1992). Improved estimation of matrix of normal mean and eigenvalues in the multivariate F-distribution. Ph.D. thesis, Institute of Mathematics, University of Tsukuba.

E. L. LEHMANN, G. CASELLA (1998). Theory of Point Estimation. Springer Verlag, New York, 2nd ed.

J. MALKIN, X. LI, S. HARADA, J. LANDAY, J. BILMES (2011). The Vocal Joystick engine v1.0. Computer Speech and Language, 25, pp. 535–555.

J. PAUL, P. Y. THOMAS (2016). Sharma-Mittal entropy properties on record values. Statistica, 76, pp. 273–287.

C. P. ROBERT (2001). The Bayesian Choice, second edition. Springer Verlag, New York.

G. SHIEH (1993). Empirical Bayes minimax estimators of matrix normal means for arbitrary quadratic loss and unknown covariance matrix. Statistics and Decisions, 11, pp. 317–341.

C. STEIN (1973). Estimation of the mean of a multivariate normal distribution. In J. HÁJEK (ed.), Proceedings of the Prague Symposium on Asymptotic Statistics. Universita Karlova, Prague, pp. 345–381.

H. TSUKUMA (2008). Admissibility and minimaxity of Bayes estimators for a normal mean matrix. Journal of Multivariate Analysis, 99, pp. 2251–2264.

H. TSUKUMA (2010). Proper Bayes minimax estimators of the normal mean matrix with common unknown variances. Journal of Statistical Planning and Inference, 140, pp. 2596–2606.

L. YANG, B. GENG, A. HANJALIC, X. S. HUA (2012). A unified context model for web image retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications, 8, Article No. 28.

Z. ZHANG (1986a). On estimation of matrix of normal mean. Journal of Multivariate Analysis, 18, pp. 70–82.

Z. ZHANG (1986b). Selecting a minimax estimator doing well at a point. Journal of Multivariate Analysis, 19, pp. 14–23.

S. ZINODINY, S. REZAEI, S. NADARAJAH (2017). Bayes minimax estimation of the mean matrix of matrix-variate normal distribution under balanced loss function. Statistics and Probability Letters, 125, pp. 110–120.


SUMMARY

The problem of estimating the mean matrix of a matrix-variate normal distribution with a covariance matrix is considered under two loss functions. We construct a class of empirical Bayes estimators which are better than the maximum likelihood estimator under the first loss function and hence show that the maximum likelihood estimator is inadmissible. We find a general class of minimax estimators. Also we give a class of estimators that improve on the maximum likelihood estimator under the second loss function and hence show that the maximum likelihood estimator is inadmissible.

Keywords: Empirical Bayes estimation; Matrix variate normal distribution; Mean matrix; Minimax estimation.
