Fisher Information Geometry for Shape Analysis

(1)

Angela De Sanctis and Stefano Antonio Gattone *

University "G. d’Annunzio" of Chieti-Pescara, Pescara, Italy; [email protected] * Correspondence: [email protected]

Abstract: The aim of this study is to model shapes from complex systems using Information Geometry tools. It is well-known that the Fisher information endows the statistical manifold, defined by a family of probability distributions, with a Riemannian metric, called the Fisher–Rao metric. With respect to this, geodesic paths are determined, minimizing information in Fisher sense. Under the hypothesis that it is possible to extract from the shape a finite number of representing points, called landmarks, we propose to model each of them with a probability distribution, as for example a multivariate Gaussian distribution. Then using the geodesic distance, induced by the Fisher–Rao metric, we can define a shape metric which enables us to quantify differences between shapes. The discriminative power of the proposed shape metric is tested performing a cluster analysis on the shapes of three different groups of specimens corresponding to three species of flatfish. Results show a better ability in recovering the true cluster structure with respect to other existing shape distances.

Keywords:landmarks; shape analysis; Fisher–Rao metric; information geometry; geodesics

1. Introduction

Shape analysis is a timely and interesting research field. Applications of shape analysis have been involved in various areas such as morphometry, image analysis, biology, database retrieval, and so on. In all these fields the aim could be to get an unsupervised classification of the objects in different clusters such that objects within a group are more similar in terms of shapes than they are in other groups. Then, the clustering of shapes is a longstanding challenge in the framework of geometric morphometrics [1], since the recognition of groups of similar morphologies, and then of the differences among these groups, is a key step of the analysis when geometric morphometrics protocols are applied. Shapes must be invariant to rotation, scale and translation so that a straightforward way to proceed is first to align the objects by using Procrustes analysis and then to apply standard clustering algorithms minimizing a given distance or dissimilarity measure evaluated within each cluster [2,3]. Similarly, [4] proposed to use a dissimilarity measure based on the inter-landmark distances and then apply standard statistical procedures such as hierarchical clustering or k-means clustering.

Under the hypothesis that it is possible to extract from the shape a finite number of representing points, called landmarks, we propose new statistical modeling of 2-dimensional shapes by representing each landmark by a bivariate Gaussian random variable. The means and the variances parameters becomes the coordinates of the statistical manifold. In particular, the variances reflect the uncertainty in the landmark’s placement and the variability across a family of shapes. Within this framework , we derive a distance between two shapes using tools from Information Geometry [5,6], which considers statistical models as Riemannian manifolds with the Fisher–Rao metric. However, the associated geodesic distance has not yet been derived for the family of general multivariate normal distributions. Closed form expressions have been obtained only for isotropic and diagonal Gaussian distributions [7].

(2)

discriminative power is evaluated through a real application.

2. Information Geometry

Let I= [0, 1]and P a k-dimensional family of positive probability density functions p : I×Rk →R+

(x, θ) 7→ p(x, θ), parametrized by θ∈ Rk. In classical Information Geometry, the Fisher information matrix g, with generic(i, j)- entry

gij(θ) = Z 1 0 p(x/θ) ∂ ∂θi log p(x/θ) ∂ ∂θj log p (x/θ)dx. (1) is regarded as the most natural Riemannian structure on the parameter space [5,6].

From differential geometry we know that a metric matrix g defines an inner product on the tangent space of the manifold as follows:hu, vi =uTgijv with associated normkuk =p

hu, ui. Then the distance between two points P, Q of the manifold is given by the minimum of the lengths of all the piecewise smooth paths γ joining these two points. Precisely the length of a path is calculated by using the inner product:

Length of γ=

Z

γ

kγ0(t)kdt and so

d(P, Q) =minγ{Length ofγ}.

A curve that encompasses this shortest path is called a geodesic. In particular we consider the family of p-variate normal distributions:

p(x, µ,Σ) = (2π)−2p(detΣ) −1 2 exp(−1 2 (x−µ) T_Σ−1₍_x₋ µ))

where x = (x1, x2, ..., xp)T, µ = (µ1, µ2, µp)T is the mean vector andΣ the covariance matrix. Note that the parameter space has dimension k=p+ p(p+1)₂ . We have three sub-cases [7]:

(i) Round Gaussian distributions:Σ=σ2I

In this case the family can identified with the(p+1)-dimensional half space parameterized by

(µ1, µ2, ..., µp, σ), σ>0, and the Fisher information matrix is:

g(µ1, µ2, ..., µp, σ) =        1 σ2 0 . . . 0 0 0 1 σ2 . . . 0 0 0 0 . . . 0 0 0 0 . . . 1 σ2 0 0 0 . . . 0 2p σ2       

Using a similarity transformation with the following matrix of the hyperbolic metric in the

(p+1)-dimensional half space

g(µ1, µ2, ..., µp, σ) =        1 σ2 0 . . . 0 0 0 1 σ2 . . . 0 0 0 0 . . . 0 0 0 0 . . . 1 σ2 0 0 0 . . . 0 1 σ2        the closed form for the distance is given by

(3)

where(µ¯1, σ1) = ((µ11, ..., µ1p, σ1)and(µ¯2, σ2) = ((µ21, ..., µ2p, σ2)besides|.|is the usual Euclidean norm. The geodesics in the parameter space are contained in planes orthogonal to the hyperplane σ =0 and are either lines or half ellipses centered at this hyperplane. The curvature of the family is equal to _p(p+1)−1 .

(ii) Diagonal Gaussian distributions:Σ=diag(σ₁2, σ₂2, ..., σp2)

The family of all independent multivariate normal distributions is the intersection of half-spaces parameterized by(µ1, σ1, µ2, σ2, ..., µp, σp), σi >0, so the Fisher information matrix is:

g((µ1, σ1, µ2, σ2, ..., µp, σp) =          1 σ₁2 0 . . . 0 0 0 2 σ₁2 . . . 0 0 0 0 . . . 0 0 0 0 . . . 1 σ2p 0 0 0 . . . 0 2 σp2          .

In this case the metric is a product metric, the curvature of the family is_2(2p−1)−1 , and, using again the similarity with the hyperbolic metric, we have the following closed form for the distance

d((µ11, σ11, ..., µ1p, σ1p),(µ21, σ21, ..., µ2p, σ2p)) = v u u u t2 p

∑

i=1 ln |(_√µ1i 2, σ1i) − ( µ2i √ 2,−σ2i)| + |( µ1i √ 2, σ1i) − ( µ2i √ 2, σ2i)| |(_√µ1i 2, σ1i) − ( µ2i √ 2,−σ2i)| − |( µ1i √ 2, σ1i) − ( µ2i √ 2, σ2i)| !2 (3)

(iii) General Gaussian distributions:Σ any symmetric positive definite covariance matrix The analysis is much more difficult and it is not known a closed form for the associated distance.

3. Modeling of 2-Dimensional Shapes

We will consider only planar objects, as for example a section of the skull. The shape of the object consists of all information invariant under similarity transformations, that is translations, rotations and scaling [8]. Data from a shape are often realized as a set of points. Many methods allow us to extract a finite number of points, which are representative of the shape and are called landmarks. One way to compare shapes of different objects is to first register them on some common coordinate system for removing the similarity transformations [9,10]. Alternatively, Procrustes methods [11] may be used in which objects are scaled, rotated and translated so that their landmarks lie as close as possible to each other. Suppose we have a sample of n planar shapes. Let us denote the shape coordinates of the j-th configuration Sj, j = 1, ..., n, via its K landmarks µj =

µj1, µj2, . . . , µjK

with generic element µjk =

µjk1, µjk2, for k = 1, . . . , K. For the k-th landmark an estimate of the

coordinates covariance matrixΣkis given by Σk= 1 n n

∑

j=1 vec(µjk−µ¯k)(vec(µjk−µ¯k))T (4) 1_∑n

(4)

bivariate Gaussian density. Assuming a round Gaussian distribution, case i) of section (2), we have Σk =Σ=σ2I2obtaining the following representation for the k-th landmark, for k=1, . . . , K:

1

2πσ2exp{−

kx−µkk2

2σ2 } (5)

where x is a generic 2-dimensional vector. In the landmark representation (5), σ2is a free parameter isotropic across all the K landmarks. Therefore, only the means are used as coordinates of the statistical manifold. We can relax the isotropic hypothesis assuming diagonal Gaussian distribution, case ii) of section (2). To this end, the new model for the representation of the k-th-landmark, for k=1, . . . , K, is given by 1 2π|Σk|− 1 2_exp{−1 2(x−µk)TΣ−1_k (x−µk)} (6) where Σk = σ_k2I2 and σk2 = {σk12, σk22}is the vector containing the variances of the k-th landmark coordinates. Representation (6) allows us to express the k-th landmark coordinates as θk= (µk, σk)on a 4-dimensional manifold which is the Cartesian product of two half-planes.

4. Shape Metrics Based on Geodesic Distance

The landmarks representation as probability distributions enables to perform various type of analysis as for example quantifying the difference between shapes. Let S1 and S2 be two planar shapes. Denote the length of the geodesic path connecting the k-th landmarks of the two configurations by d(θ_kS1, θ_kS2). A shape metric for measuring the difference between S1and S2can be obtained evaluating the geodesic distances between the corresponding landmarks of the two configurations as follows: d(S1, S2) = K

∑

k=1 d(θS_k1, θ_kS2). (7)

In Section 2 we provided closed form for the geodesic distance d(θS_k1, θS_k2) according to the type of landmarks representation adopted. Under the isotropic variance assumption - landmarks representation (5) - the geodesic distance is computed with (2) while with the varying variance model - landmarks representation (6) - the geodesic distance is computed with (3). In order to apply (3) we need the covariance matrix in (6) to be diagonal. To this purpose we can apply an orthogonal transformation to the original landmarks coordinates which does not affect the analysis because it induces a rotation on the plane which leaves invariant the shape of a configuration.

The discriminative power of the proposed shape metrics is tested performing a cluster analysis on the shapes of three different groups of specimens corresponding to three species of flatfish. Namely, 60 individuals of plaice (Pleuronectes platessa L.) and 63 of flounder (Platichthys flesus L.) were collected in North Bull Island [12]. In addition, a group of 14 individuals of the species S. solea were used. These individuals were collected during the activities for the SOLEMON survey carried out the in the Adriatic Sea (Mediterranean Sea) during the autumn 2014. This last group of individuals would represent a kind of out-group with respect to the other two ones, since it belongs to a different and phylogenetically distant family (Soleidae) [13]. Each fish was photographed in lateral aspect and digital images were extensively described in [12]. In summary, the scheme of 21 landmarks described in Figure1was digitized for each individual of the three species.

(5)

Figure 1. Landmark configuration collected on (a) Platichthys flesus, (b) Pleuronectes platessa, and (c) Solea solea. 1, snout tip; 2, 3 and 4, points of maximum curvature of the peduncle; 5, insertion of the operculum on the lateral profile; 6, posterior extremity of premaxillar; 7 and 8, centres of the eyes; 9, beginning of the lateral line; 10 and 11, superior and inferior insertion of the pectoral fin; 12-16, semi-landmarks collected on dorsal fin; 17-20, semi-semi-landmarks collected on anal fin; 21, insertion of first ray of the pelvic fin.

Three different shape metrics are computed:

• shape metric under the varying variance model (6) (dG) • shape metric under the fixed variance model (5) ( rG)

(6)

by minimizing the Euclidean sum of squares between the landmark configurations using translation, rotation and scale. The computation of the geodesic distance within the model (5) requires the choice of the free parameter σ2. In order to test the sensitiveness of the final clustering with respect to changes in the value of σ2, we adopted three different values of σ2 given by the first (rGQ₁), the second (rGQ2) and the third quartile (rGQ3) of the values of the variances of each landmark and for

each dimension. For each metric we computed the pairwise distances of all shapes. The obtained distance matrices are then used in a hierarchical clustering algorithm. The behaviour of the clustering has been evaluated by means of the a-Rand index [14]. Results for the solutions with three clusters are reported in Table1below.

Table 1.a-Rand index and number of miss-classified fishes.

Shape distance dG rGQ1 rGQ2 rGQ3 PD

aRand-index 0.8957 0.6669 0.7752 0.8708 0.3495

Number of missclassified fishes 4 14 9 5 35

Results show that the shape metric with the varying variance representation leads to the best performance in recovering the true cluster structure. The fixed variance model results in a worse cluster recovery with different behaviours depending on the value of the free parameter σ2. For this particular data set, the Procrustes distance turns out to have a very poor performance in terms of cluster recovery.

As a conclusion, we only remark that the proposed shape representation allows also to study and predict the evolution in time of a shape [15].

Acknowledgments:We thank Tommaso Russo and Domitilla Pulcini for preparing the fish data and the support provided in commenting the results of the cluster analysis.

References

1. Zelditch, M.L.; Swiderski, D.L.; Sheets, H.D.; Fink, W.L. Geometric Morphometrics for Biologists: A Primer; Elsevier Academic Press: London, UK, 2004.

2. Stoyan, D.; Stoyan, H. A further application of D.G. Kendall’s Procrustes Analysis. Biom. J. 1990, 32, 293–301.

3. Srivastava, A.; Joshi, S.H.; Mio, W.; Liu, X. Statistical Shape analysis: Clustering, learning, and testing. IEEE Trans. PAMI 2005, 27, 590–602.

4. Lele, S.; Richtsmeier, J. An Invariant Approach to Statistical Analysis of Shapes; Chapman & Hall/CRC: New York, NY, USA, 2001.

5. Amari, S.; Nagaoka, H. Methods of information geometry, volume 191 of Translations of Mathematical Monographs; American Mathematical Society: Providence, RI, USA, 2000.

6. Murray, M.K.; Rice, J.W. Differential Geometry and Statistics; Chapman & Hall, New York, NY, USA, 1984. 7. Costa, S.; Santos, S.; Strapasson, J. Fisher information distance: A geometrical reading. Discret. Appl. Math.

2015, 197, 59–69.

8. Dryden, I.L.; Mardia, K.V. Statistical Shape Analysis; John Wiley & Sons: Hoboken, NJ, USA, 1998. 9. Bookstein, F.L. Size and shape spaces for landmark data in two dimensions. Stat. Sci. 1986, 1, 181–242. 10. Kendall, D.G. Shape manifolds, Procrustean metrics and complex projective spaces. Bull. Lond. Math. Soc.

1984, 16, 81–121.

11. Goodall, C.R. Procrustes methods in the statistical analysis of shape. J. R. Stat. Soc. 1991, 53, 285–339. 12. Russo, T.; Pulcini, D.; O’Leary, A.; Cataudella, S.; Mariani, S. Relationship between body shape and trophic

niche segregation in two closely related sympatric fishes. J. Fish Biol. 2008, 73, 809–828.

13. Berendzen, P.B.; Dimmick, W.W. Phylogenetic relationships of Pleuronectiformes based on molecular evidence. Copeia 2002, 3, 642–652.

(7)

14. Hubert, L.; Arabie, P. Comparing Partitions. J. Classif. 1985, 2, 193–218.

15. Sanctis, A.D.; Gattone, S. Methods of Information Geometry to model complex shapes. Eur. Phys. J. Spec. Top. 2016, 225, 1271–1279.

c

2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).