
Università di Pisa

Facoltà di Scienze Matematiche, Fisiche e Naturali

Probability Measures on Infinite Dimensional Embedded Manifolds
with applications to Computer Vision

Master's thesis

Candidate: Eleonora Bardelli
Advisor: Andrea Mennucci

Master's Degree in Mathematics


Contents

1 Introduction
  1.1 Notation

2 Motivation
  2.1 Shape spaces
  2.2 The Stiefel manifold
  2.3 The filtering problem
  2.4 Filtering and tracking

3 Gaussian measures
  3.1 Finite dimensional Gaussian measures
  3.2 Gaussian measures in Hilbert spaces
  3.3 The Cameron-Martin space
  3.4 Miscellaneous facts

4 Probability measures on manifolds
  4.1 Hausdorff measures
  4.2 Projections
  4.3 The exponential map


Chapter 1

Introduction

This thesis aims to provide examples of probability measures on infinite dimensional embedded manifolds. We try to generalize some approaches commonly used to define measures on finite dimensional manifolds. However, when studying probability on infinite dimensional spaces many difficulties arise, and hence some finite dimensional methods can fail.

The text is divided into three chapters, and only in the last one is probability on manifolds addressed. In the first chapter, we discuss what motivated our study of probability measures on infinite dimensional embedded manifolds and explain the connection with computer vision. The second chapter provides an introduction to Gaussian measures in Hilbert spaces, which are widely used in the third chapter.

Computer vision is a branch of computer science that studies methods for processing and analyzing digital images in order to extract interesting features. For example, the tracking problem consists of recognizing and following a moving object in a movie.

Objects in images are usually identified by their shape, and different shape spaces have been proposed in the literature. Some of them are infinite dimensional, for example the space of closed plane curves. In the thesis we concentrate on a particular space of closed plane curves, previously proposed in [35, 31]. Our space is endowed with a differential structure and a Riemannian metric; it is closely related to a Stiefel manifold and can be embedded in an infinite dimensional Hilbert space.

The problem of tracking is often formulated as a Bayesian filtering problem. The motion of the object is modeled by a stochastic equation and the probability distribution of its shape is estimated, conditioning on the current frame. The problem of Bayesian filtering has many applications in engineering; it is well studied in R^n, and several methods to solve it exactly or approximately are known. However, it is difficult to rigorously generalize these methods (and the problem itself) to infinite dimensional spaces.

It would be desirable to know something more about Bayesian filtering on some infinite dimensional shape spaces. Before that, it is at least necessary to study some examples of probability measures on them.

In the second chapter we provide an introduction to Gaussian measures in Hilbert spaces, following [3]. Such measures have many properties in common with normal distributions in R^n and are characterized by their Fourier transform.

Unlike the finite dimensional case, absolute continuity of different Gaussian measures with respect to each other is not guaranteed in general, and fails in some common situations, for example when translating or scaling a measure. To see this fact, a deeper analysis is needed, involving the Cameron-Martin space.

The third chapter deals with probability measures on manifolds. A few different approaches to endow a manifold with a measure are presented, all borrowed from the study of probability on finite dimensional manifolds, and their generalization to the infinite dimensional case is analysed.

A natural way to put a measure on an embedded manifold is provided by the restriction of the Hausdorff measure. A “Gaussian” generalization of it to finite codimension objects in a Hilbert space can be found in [13], but it is far less natural.

A simple way to sidestep the problem of defining a measure on a manifold is to use the push-forward of a measure defined on the ambient space under some function.

A reasonable function to choose in finite dimension is the projection onto the nearest point of the manifold, which is unique for almost every point with respect to the Lebesgue measure. Unfortunately, this kind of projection may be ill-defined in infinite dimensional spaces. In particular, we show that, for any given Gaussian measure, there is a manifold such that the set of points for which the projection does not exist is non-negligible.

On the other hand, this approach can be suitable for Stiefel manifolds. Indeed, we prove that the projection is well-defined for almost every point, with respect to any Gaussian measure.

The push-forward of measures under the exponential map is examined last. In finite dimension, provided measures on tangent spaces are equivalent (mutually absolutely continuous) to the Lebesgue measure, the push-forward measures are equivalent, even when using exponential maps from different points. This fails in infinite dimension, and we show that the same Gaussian measure, projected from the north or the south pole of a sphere, gives two measures singular with respect to each other.

1.1 Notation

In this section we define some common notation that is used throughout the text.

We denote by N the set of natural numbers and, for convenience in numbering sequences, we let it start from 1. As usual, the sets of real and complex numbers are denoted by R and C, and the Euclidean norm in R^n by |·|.


The symbol S^1 denotes the unit circle in R^2,

S^1 = { x ∈ R^2 : |x| = 1 } .

It is a submanifold of R^2 and has a Riemannian structure induced by the Euclidean product of the ambient space R^2. When integrating on S^1, we consider on it the Hausdorff measure, which coincides with the arc length measure.

Given a manifold M, we denote by C^k(M, R^n) the set of functions from M to R^n which are continuously differentiable k times; functions in C^0(M, R^n) are just continuous. For example, C^1(S^1, R^2) is the set of continuously differentiable functions from the unit circle to R^2.

The remainder of this section is devoted to fixing the terminology and notation for probability.

Let Ω be a set and F ⊆ P(Ω) a σ-algebra. We call the pair (Ω, F) a measurable space. By the term measure we mean a countably additive nonnegative finite function µ : F → [0, +∞). When we need to talk about other kinds of measures, we explicitly say signed measure or non-finite measure. We call such a triple (Ω, F, µ) a measure space. When µ(Ω) = 1 the triple is called a probability space.

Given a set Ω_1, a measurable space (Ω_2, F_2) and a family G of functions Ω_1 → Ω_2, we call the σ-algebra generated by G the smallest σ-algebra with respect to which all functions in G are measurable.

Given two measures µ and ν on F, we denote by

µ ≪ ν

the absolute continuity of µ with respect to ν. If the two measures are such that µ ≪ ν and ν ≪ µ, we say that they are equivalent and denote this fact by

µ ∼ ν .

On the opposite side, we say that two measures are orthogonal or singular to each other if they are concentrated on disjoint sets.

If Ω is a topological space, the Borel σ-algebra B(Ω) is the smallest σ-algebra that contains all open sets. In the following, when not otherwise specified, measures on a topological space Ω are assumed to be defined on B(Ω).

A measure µ on a topological space Ω is called a Radon measure if for all B ∈ B(Ω) and for all ε > 0 there exists a compact set C such that µ(B \ C) < ε.

Let (Ω_1, F_1, µ) be a measure space and (Ω_2, F_2) a measurable space. A random variable f is a measurable function Ω_1 → Ω_2. If Ω_2 = R, f is called a real random variable.

The measure f♯µ on F_2 defined by

f♯µ(A) = µ(f⁻¹(A))

is called the image of µ under the function f. Note that the measure f♯µ is unchanged if f is modified on a null set. A change of variables formula relates the integrals with respect to a measure and an image of it; see Theorem 4.1.11 of [11] for more details.

Proposition 1.1.1. Let ϕ : Ω_2 → R be a measurable function. Then ϕ is integrable with respect to f♯µ if and only if ϕ ∘ f is integrable with respect to µ, and in that case one has

∫_{Ω_2} ϕ d(f♯µ) = ∫_{Ω_1} (ϕ ∘ f) dµ .
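As a quick sanity check of the change of variables formula, consider the following minimal numerical sketch (our own illustration, not part of the thesis): take µ = N(0, 1) on R, f(x) = x^2 and ϕ(t) = e^{−t}; then f♯µ is the chi-squared law with one degree of freedom and both integrals equal E[e^{−X^2}] = 1/√3.

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)   # samples distributed as mu = N(0, 1)
pf = x**2                            # samples distributed as f#mu, with f(x) = x^2

lhs = np.exp(-pf).mean()             # Monte Carlo estimate of the integral of phi d(f#mu)
rhs = np.exp(-(x**2)).mean()         # Monte Carlo estimate of the integral of (phi o f) dmu
print(lhs, rhs, 1 / np.sqrt(3))      # all three are approximately 0.5774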

Chapter 2

Motivation

One topic in computer vision is the analysis of a digital image to recognize some objects in it. The image can be static or be a dynamic video, in which an object should be detected and then followed as it moves.

The problem of recognizing the contours of objects in a fixed image is often referred to as image segmentation, while the problem of following a moving object is called tracking. Of course, the first problem must also be addressed when dealing with the second.

In the last decades of the XX century, the most used technique to segment an image was to first identify edges, and then try to connect them to form a whole contour.

The current approach works in the opposite direction. The algorithm is initialized with a closed curve, which is then moved to match a contour in the image. This is done by defining an energy function on all curves, which is supposed to be small on object contours, and looking for its minimum.

This approach was introduced in [18] and is now widely used. It is referred to as active contours.

The energy function usually contains an edge-based term, which attracts the minimum towards the edges, and also a smoothness term, which prevents the minimum from being too irregular.

The active contours method raises some interesting mathematical problems. The main one is how to minimize a function whose domain is a space of curves. In turn, we have to define more properly what a curve is and the geometric structure of the space of curves.

Defining properly a space of curves, or a shape space as it is called, is an important issue because a good choice can make the minimization easier. This problem has been studied also in recent years, with the proposal of some new shape spaces, for example in [25, 35, 31, 26, 7].

The most common minimization algorithms are based on gradient flow techniques. To implement such a technique, it is necessary to have a differential structure and a metric on the shape space. This is the reason why some recent developments focus on shape spaces with a linear or differential structure and study metrics on them.

Figure 2.1: An example of tracking. Two overlapped objects move in opposite directions.

2.1 Shape spaces

There are many ways to define a shape space. Many have been used in the literature and each has its own advantages. As explained above, it is useful to also define on the shape space a differential structure and a metric. In doing that, we shall also keep in mind computational issues, since in the end we aim to implement the theory numerically.

The intuitive idea of a “contour” is that of a subset of R^2. This is a structureless dataset, not apt to calculus. We will need to give the space of contours a form of differential structure.

One simple way is to see contours as the image of a closed curve, i.e. a function c : S^1 → R^2 with some regularity. A contour can obviously be represented by a curve in many ways; even if we consider only non intersecting curves, a curve can always be reparametrized to get a different curve with the same image. This requires a quotient: a proper definition of shapes as a quotient of curves under reparametrization is given in [24] and [25].

The main limitation when dealing with curves is that they are always connected. For example, this can be a difficulty when tracking two overlapped figures that move in opposite directions. In this situation, the shape evolves creating two connected components. Figure 2.1 provides an example of this phenomenon.

A different approach, which addresses this limitation, is the so called level set method, introduced in [27]. In this case, one represents a contour as the zero set of a regular function. Again there are many ways to represent the same contour, and a quotient should be performed.

In both cases, the shape space is infinite dimensional.

We now provide proper definitions about curves and show some examples of Riemannian-like metrics on spaces of curves.

By the word curve we mean a function c : S^1 → R^2. The set of all continuously differentiable curves is C^1(S^1, R^2). Given a differentiable curve c(t), we denote by ċ or (d/dt)c its derivative. A curve is called immersed if its derivative never vanishes,

ċ(t) ≠ 0 for all t ∈ S^1 ,

and we denote by M_i the set of C^1 immersed curves.

The set C^1(S^1, R^2) has a linear structure and is a Banach space with the norm

‖c‖_{C^1} = sup_{t∈S^1} |c(t)| + sup_{t∈S^1} |ċ(t)| .

The immersed curves are an open subset of this Banach space, and so they have a differential structure as well. A tangent vector at a point c can be canonically identified with an element of the Banach space C^1(S^1, R^2), and so we write tangent vectors as C^1 functions h : S^1 → R^2.

Let Diff(S^1) be the group of diffeomorphisms of the circle, i.e. the set of C^1 functions ϕ : S^1 → S^1 such that ϕ⁻¹ is continuously differentiable as well. This group acts by reparametrization on immersed curves,

Diff(S^1) × M_i → M_i ,  (ϕ, c) ↦ c ∘ ϕ .

A geometric curve is an element of the quotient

M_i / Diff(S^1) .

This quotient turns out to be almost a manifold modeled on C^1(S^1, R^2), but has some singular points; see [6] for more details.

Quantities that do not depend on the parametrization and can be defined on the quotient are often referred to as geometric quantities. Geometric energy functions and metrics are preferred in computer vision, since they better represent the actual contour and are less influenced by the representation.

A common way to define geometric quantities is to evaluate non geometric quantities on the parametrization by arc length. If c is an immersed curve, the parametrization by arc length of c is the curve c ∘ ϕ such that ϕ ∈ Diff(S^1) preserves orientation and the norm of the derivative |(d/dt)(c ∘ ϕ)| is constant on S^1. Some common geometric quantities of this kind are the following.

Let c be an immersed curve and h ∈ C^1(S^1, R^2). The derivative by arc length at c is

∂_{s,c} h(t) = ḣ(t) / |ċ(t)| .

If f : R^2 → R^n is a measurable function with values in R^n for some n ∈ N, the integral by arc length of f on c is

∫_c f ds = ∫_{S^1} f(c(t)) |ċ(t)| dt ;

clearly this definition makes sense also when f is defined only on the image of the curve c.

The centroid of an immersed curve c is the arc length average

avg(c) = (1 / len(c)) ∫_c c ds

and its length is

len(c) = ∫_c 1 ds .
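These quantities discretize directly. The following minimal numerical sketch (our own illustration, not from the thesis) samples a curve at N points of S^1 and approximates ∂_{s,c}h, ∫_c f ds, len(c) and avg(c) with finite differences and Riemann sums.

import numpy as np

N = 400
dt = 2 * np.pi / N
t = np.arange(N) * dt
c = np.stack([np.cos(t), np.sin(t)], axis=1)   # the unit circle as a test curve

def t_derivative(curve):
    # derivative in t by centered differences (periodic in t)
    return (np.roll(curve, -1, axis=0) - np.roll(curve, 1, axis=0)) / (2 * dt)

cdot = t_derivative(c)
speed = np.linalg.norm(cdot, axis=1)           # |c'(t_i)|

def arc_integral(values):
    # integral by arc length: sum of f(t_i) |c'(t_i)| dt, f sampled along c
    w = speed if values.ndim == 1 else speed[:, None]
    return (values * w).sum(axis=0) * dt

length = arc_integral(np.ones(N))              # len(c), approximately 2*pi
centroid = arc_integral(c) / length            # avg(c), approximately (0, 0)

h = np.stack([np.sin(3 * t), np.cos(2 * t)], axis=1)   # a tangent vector at c
ds_h = t_derivative(h) / speed[:, None]        # derivative by arc length of h
print(length, centroid)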

In applications it is also useful to perform a quotient by translation and scaling. Curves up to translation and scaling are often identified with curves with centroid the origin and length 1. We denote the set of immersed curves with centroid the origin and length 1 by

M_d = { c ∈ M_i | avg(c) = 0 and len(c) = 1 } .

By the inverse function theorem, M_d is a manifold modeled on C^1(S^1, R^2); see [19, Theorem 5.9] for a reference about this theorem in Banach spaces.

Before talking about metrics, note that the differential structure we have defined on M_i is not modeled on a Hilbert space, which is the natural place where a metric can be defined. What is usually done to overcome this problem is to define a pointwise metric and then prove, ad hoc, that some energies have a gradient which is sufficiently regular to admit a gradient flow.

The simplest pointwise metric that we can consider on M_i is the L^2 metric. Given a curve c and two tangent vectors h, k ∈ C^1(S^1, R^2), the L^2 metric is defined as

⟨h, k⟩_{L^2,c} = ∫_{S^1} ⟨h(t), k(t)⟩ dt ,

where ⟨·, ·⟩ is the standard scalar product on R^2. With this metric, the shape space M_i is a subspace of L^2(S^1, R^2).

The first improvement that can be made to the above metric is to make it geometric. This leads to the definition of the H^0 metric,

⟨h, k⟩_{H^0,c} = ∫_c ⟨h, k⟩ ds .

This metric is widely used in computer vision, and is often implicitly assumed when gradient flows are performed without mentioning a metric.

The H^0 metric is sometimes defined as

⟨h, k⟩_c = (1 / len(c)) ∫_c ⟨h, k⟩ ds ,

adding a conformal factor 1/len(c) to make it scale invariant, i.e.

⟨h, k⟩_c = ⟨h, k⟩_{λc} for all λ > 0 .

The conformal factor only changes the velocity of geodesics and gradient flows but not their trajectories.
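To verify the scale invariance, note that under c ↦ λc the arc element scales as ds ↦ λ ds and len(λc) = λ len(c), while the tangent vectors h, k are unchanged, so

⟨h, k⟩_{λc} = (1 / len(λc)) ∫_{λc} ⟨h, k⟩ ds = (1 / (λ len(c))) ∫_c ⟨h, k⟩ λ ds = ⟨h, k⟩_c .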


The H^0 metric also induces a distance on M_i, which we call the H^0 distance. As usual, the distance between two curves is the infimum of the lengths of the paths in M_i joining them.

However, the H^0 metric has some undesirable features. It was shown in [23] that the H^0 distance induces a pathological distance on the quotient of M_i by parametrization: any two curves can be made arbitrarily close by reparametrizing them.

Moreover, gradient flows of some common energies are very irregular and quickly evolve towards non-smooth curves, while others are even ill defined. In general, H^0 gradient flows are very sensitive to noise and numerically unstable. Regularization terms can be added to the energies, but this changes the minimization problem to be solved. More details about these phenomena can be found in [32].

Other metrics, presented for example in [32, 8, 25], use first or higher derivatives of the tangent vectors. These are usually referred to as Sobolev-type metrics. An example of such a metric is the H^1 metric, defined as

⟨h, k⟩_{H^1,c} = (1 / len(c)) ∫_c ⟨h, k⟩ ds + len(c) ∫_c ⟨∂_{s,c}h, ∂_{s,c}k⟩ ds ,

where the factors len(c) make it scale invariant.

This kind of metric addresses some of the problems of the H^0 metric. The distance induced on the quotient of M_i by reparametrization is not identically null, as proven in [21], and gradient flows are more regular compared to the H^0 gradient flows (see [32]). Moreover, as noted in [33], some gradient flows that are ill defined with respect to the H^0 metric are well defined with respect to Sobolev-type metrics, and thus more energies can be minimized with this kind of metric.

Some metrics can also be designed to emphasize some geometric features of the curves' motion, or to induce easily computable geodesics and gradients.

Ease of computation is the main feature of a metric introduced in [34, 35] and recently proposed again in [31]. This metric is closely related to a Stiefel manifold, and it is for us the main motivation for studying that kind of manifold.

The original metric of [35] was defined only on the submanifold M_d of curves with centroid the origin and length 1. Given a curve c ∈ M_d and two tangent vectors h, k ∈ T_c M_d, the metric is defined as

⟨h, k⟩_{St,c} = ∫_c ⟨∂_{s,c}h, ∂_{s,c}k⟩ ds .

In the next section we explain the connection with Stiefel manifolds, which is not at all evident from this definition.

In the remainder of this section, we present some geometric considerations that allow one to extend a metric defined on M_d to the whole of M_i, as proposed in [31].

Roughly, to recover an immersed curve from an element of M_d, we should specify its centroid and its scale. This is done by the function

Φ : R^2 × R × M_d → M_i ,  Φ(v, ℓ, c_0) = v + e^ℓ c_0 .

It is easy to show that Φ is a diffeomorphism with inverse

Φ⁻¹ : M_i → R^2 × R × M_d ,  c ↦ ( avg(c), log len(c), (c − avg(c)) / len(c) ) .
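The decomposition is immediate to compute numerically; a minimal sketch (our own illustration; speed and dt are the discretization arrays from the earlier curve snippet):

import numpy as np

def decompose(c, speed, dt):
    # Phi^{-1}: split a sampled curve into (centroid, log-length, normalized curve)
    ds = speed * dt
    length = ds.sum()                                  # len(c)
    centroid = (c * ds[:, None]).sum(axis=0) / length  # avg(c)
    return centroid, np.log(length), (c - centroid) / length

def compose(v, ell, c0):
    # Phi: rebuild the curve from centroid v, log-length ell and c0 in M_d
    return v + np.exp(ell) * c0

Applying compose(*decompose(c, speed, dt)) returns the original samples up to round-off.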

On R^2 and R we can consider the standard scalar product as a metric. Given a metric ⟨·, ·⟩_{M_d} on M_d, we can consider the product metric on R^2 × R × M_d and define a metric on M_i as the pull back under Φ⁻¹ of this product metric.

For example, given a curve c ∈ M_i and tangent vectors h, k ∈ T_c M_i, the extension of the metric ⟨·, ·⟩_{St} defined above can be computed as follows. First decompose the curve and the tangent vectors using Φ,

c_d = (c − avg(c)) / len(c) ,
(h_a, h_l, h_d) = DΦ⁻¹ h ,  (k_a, k_l, k_d) = DΦ⁻¹ k ,

and then compute the product metric

⟨h, k⟩_{St,c} = ⟨h_a, k_a⟩ + h_l k_l + ∫_{c_d} ⟨∂_{s,c_d} h_d, ∂_{s,c_d} k_d⟩ ds .

The differential of Φ−1 can be written in a closed form, see [31] for details, and this allows the gradient of some commonly used energies to be explicitly written in a nice form.

With respect to the metric ⟨·, ·⟩_{St} on M_i, centroid translations, scale changes and deformations of the curve are orthogonal. Moreover, the relative weights of these components can be tuned by adding coefficients as follows:

⟨h, k⟩_c = λ_a ⟨h_a, k_a⟩ + λ_l h_l k_l + λ_d ∫_{c_d} ⟨∂_{s,c_d} h_d, ∂_{s,c_d} k_d⟩ ds .

The ability to separate these components is important in computer vision applications; in fact, an object is usually identified by the M_d component of a curve, while position and scale can depend on the location of the camera and other minor factors.

2.2 The Stiefel manifold

Definition 2.2.1. Let p ∈ N and H a Hilbert space. The Stiefel manifold St(p, H) is the subset of H^p consisting of orthonormal p-tuples of vectors,

St(p, H) = { (v_1, . . . , v_p) ∈ H^p | ⟨v_i, v_j⟩ = 0 for all i ≠ j and |v_i| = 1 for all i } .

It is easy to check that the Stiefel manifold is actually a manifold, modeled on H^p, even when H is infinite dimensional. This can be done using the inverse function theorem.
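In finite dimensions, the nearest-point projection onto St(p, R^n) mentioned in the Introduction is classical: for a full-rank matrix A ∈ R^{n×p} whose columns are the p vectors, the closest orthonormal p-tuple is the orthogonal factor of the polar decomposition of A. A minimal sketch (our own illustration, not from the thesis):

import numpy as np

def project_to_stiefel(A):
    # nearest point of St(p, R^n) to the n-by-p matrix A (columns = the p vectors),
    # i.e. the orthogonal factor of the polar decomposition, computed via the SVD
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # a generic pair of vectors in R^5
Q = project_to_stiefel(A)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: the columns are orthonormal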


The space H^p is naturally a Hilbert space, with the scalar product

⟨(v_1, . . . , v_p), (w_1, . . . , w_p)⟩_{H^p} = Σ_{i=1}^p ⟨v_i, w_i⟩_H ,

and this scalar product also induces a Riemannian metric on the Stiefel manifold.

With a little abuse of notation, we write St(2, C^0) to indicate the set

St(2, C^0) = { (e, f) ∈ St(2, L^2(S^1, R)) | e and f ∈ C^0(S^1, R) } .

The set St(2, C^0) can also be seen as a submanifold of C^0 × C^0, and it is then a manifold modeled on C^0(S^1, R). Let also St_0 be the open subset of St(2, C^0) of pairs (e, f) that never vanish simultaneously,

St_0 = { (e, f) ∈ St(2, C^0) | e(t)^2 + f(t)^2 ≠ 0 for all t ∈ S^1 } .

A two fold covering of M_d can be defined on St_0, and the pull-back of the metric ⟨·, ·⟩_{St} defined in the previous section is the metric induced by the inclusion of St_0 in St(2, L^2(S^1, R)).

The two fold covering Ψ : St_0 → M_d is defined by the conditions

(d/dt) Ψ(e, f)(t) = (1/2) (e^2 − f^2, 2ef)(t) for all t ∈ S^1 ,
avg(Ψ(e, f)) = 0 for every (e, f) ∈ St_0 .

We now check that Ψ is well defined and give an overview of how to define a local inverse, referring the reader to [35] for a complete proof of the fact that Ψ is a two fold covering and an isometry.

All the following (and the definition) is much clearer if we identify the plane R2 with the complex plane C.

Let (e, f) = e + if be a couple of functions in St_0 and consider its square (e + if)^2. The fact that e and f are orthonormal implies that

∫_{S^1} (e^2 − f^2) = 1 − 1 = 0 ,  2 ∫_{S^1} ef = 0 .

This means that (e + if)^2 is the derivative of some closed curve c ∈ C^1(S^1, R^2). The fact that e and f never vanish simultaneously implies also that c ∈ M_i. Moreover, the length of c is

len(c) = ∫_{S^1} |(e + if)^2| = ∫_{S^1} (e^2 + f^2) = 2 .

The integral curve c is determined up to translation, so we can choose it in a unique way so that avg(c) = 0. Now it is sufficient to scale the curve c by a factor 1/2 to obtain the curve Ψ(e, f) ∈ M_d.

Conversely, if c ∈ M_d, its derivative never vanishes, and so it is possible to extract a continuous square root of 2ċ. Let e, f be such that 2ċ = (e + if)^2. The property that c is closed implies

∫_{S^1} e^2 = ∫_{S^1} f^2 ,  ∫_{S^1} ef = 0 .

The length of c can be computed as

1 = len(c) = ∫_{S^1} (1/2) |(e + if)^2| = (1/2) ∫_{S^1} e^2 + (1/2) ∫_{S^1} f^2 ,

and so we get that e and f are orthonormal.
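A minimal numerical sketch of Ψ and of the square-root lifting just described (our own illustration; curves are discretized on a uniform grid of S^1 as in the previous snippets):

import numpy as np

N = 512
dt = 2 * np.pi / N
t = np.arange(N) * dt

def psi(e, f):
    # Psi(e, f): integrate c' = (1/2)(e^2 - f^2, 2ef) and recenter so that avg(c) = 0
    cdot = 0.5 * np.stack([e**2 - f**2, 2 * e * f], axis=1)
    c = np.cumsum(cdot, axis=0) * dt               # a primitive of c'
    ds = np.linalg.norm(cdot, axis=1) * dt
    return c - (c * ds[:, None]).sum(axis=0) / ds.sum()

def lift(c):
    # the inverse direction: a continuous square root e + if of 2c'
    cdot = (np.roll(c, -1, axis=0) - np.roll(c, 1, axis=0)) / (2 * dt)
    z = 2 * (cdot[:, 0] + 1j * cdot[:, 1])
    theta = np.unwrap(np.angle(z))                 # continuous branch of the argument
    w = np.sqrt(np.abs(z)) * np.exp(0.5j * theta)
    return w.real, w.imag

e, f = np.cos(t) / np.sqrt(np.pi), np.sin(t) / np.sqrt(np.pi)  # orthonormal in L^2
c = psi(e, f)

With this choice of e and f, the resulting curve is a circle of length 1 traversed twice; the two preimages of a given curve under Ψ are (e, f) and (−e, −f).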

2.3 The filtering problem

In this section we introduce the so called filtering problem. This problem often arises in engineering, when an estimate of some parameters, based on noisy measurements, is needed.

After that, we explain how this problem relates to the tracking problem of computer vision.

As a naming convention, in this section we use capital Latin letters to indicate random variables and the corresponding lower-case letters to indicate elements of their range. For example, we denote by X an R^n-valued random variable and by x an element of R^n. Greek letters are also used to indicate some random variables.

In engineering, a common problem is to give an estimate of the physical state of a system, based on some measurements. The system evolves in time according to known, but possibly probabilistic, laws, and the measurement does not necessarily give complete information and may be affected by errors.

Suppose given a probability space (Ω, F , µ) on which all the subsequent random variables and stochastic processes are defined.

The state of the system is represented by a discrete time random variable X_t with values in R^n, called the state vector. The evolution in time is modeled as

X_{t+1} = f(X_t, ξ_t)

where f : R^n × R^m → R^n is a given transition function and the system noise ξ_t is a sequence of independent R^m-valued random variables of known distribution. The ξ_t are also independent of the past state vectors X_0 . . . X_{t−1}. The starting state X_0 is supposed to be known.

The measurement is another random variable Y_t, defined as

Y_t = g(X_t, η_t)

where the measurement function g is given and the measurement noise η_t is a sequence of independent random variables of known distribution. The η_t are also independent of the past state vectors and of the system noise.

The measurement Y_t becomes available at some moment in time. The filtering problem consists of giving the “best” estimate of X_t knowing all the measurements up to time t, that is, the values y_1 . . . y_t of the variables Y_1, . . . , Y_t. Following a common notation, we denote a tuple y_1 . . . y_t by y_{1:t}.

The estimate can be obtained by computing the conditional probability distribution of X_t given the measurements Y_{1:t} = y_{1:t},

p(X_t | Y_{1:t} = y_{1:t}) .

This probability is often called the posterior.

Before going on, we spend a few words about the posterior probability, saying what we mean by that word and symbol.

Given two events A and B ⊆ Ω such that P(B) ≠ 0, there is no doubt about what the conditional probability is. The probability of A given B is

µ(A ∩ B) / µ(B) ,

and similarly one can define the conditional probability distribution of a random variable X given a non negligible event B. When conditioning on negligible events, as can easily happen in the above case, this notion should be refined.

Consider two random variables X and Y with values in R^n and R^m respectively. Suppose the distribution of the couple (X, Y) has density with respect to the Lebesgue measure on R^n × R^m and call f(x, y) the density. Let also f_Y be the density of Y♯µ with respect to L^m (the ♯ notation is defined in Section 1.1),

f_Y(y) = ∫_{R^n} f(x, y) dL^n(x) .

Then the probability of X conditioned on Y = y is an absolutely continuous measure with respect to L^n, defined by the density

p(X | Y = y)(·) = f(·, y) / f_Y(y)

if f_Y(y) ≠ 0, and identically 0 otherwise.
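On a discretized joint density, this definition is just a slice-wise normalization; a minimal numerical sketch (our own illustration, with a correlated Gaussian as the joint density):

import numpy as np

xs = np.linspace(-4, 4, 201)
ys = np.linspace(-4, 4, 201)
dx = xs[1] - xs[0]
X, Y = np.meshgrid(xs, ys, indexing="ij")
joint = np.exp(-(X**2 - X * Y + Y**2) / 1.5)   # Gaussian joint with correlation 0.5
joint /= joint.sum() * dx * dx                 # normalized: f(x, y) on the grid

f_Y = joint.sum(axis=0) * dx                   # f_Y(y), integrating out x
j = 120                                        # condition on the value Y = ys[j]
posterior = joint[:, j] / f_Y[j]               # p(X | Y = y) as a density in x
print(posterior.sum() * dx)                    # approximately 1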

We wish to remark that the symbol p(X | Y = y) denotes a function, which is a density with respect to the Lebesgue measure. We rarely refer to the measure itself, and when needed we indicate it by p(X | Y = y)L^n, although this notation makes sense only when X and Y are absolutely continuous with respect to the Lebesgue measure.

An integration by parts formula holds, which means

∫_{R^n×R^m} ϕ(x, y) d((X, Y)♯µ)(x, y) = ∫_{R^m} ∫_{R^n} ϕ(x, y) p(X | Y = y)(x) dL^n(x) d(Y♯µ)(y)
  = ∫_{R^m} ∫_{R^n} ϕ(x, y) (f(x, y) / f_Y(y)) dL^n(x) f_Y(y) dL^m(y)

for all integrable ϕ : R^n × R^m → R.

Conditional probability can be defined in a very general setting, by asking that the integration by parts formula still hold. However, not much of what we say in the following is meaningful without additional hypotheses, and in the engineering literature all the measures are often implicitly assumed to be absolutely continuous with respect to the Lebesgue measure. For these reasons we restrict our presentation of the filtering problem to random variables with density with respect to the Lebesgue measure.

Back to the filtering problem, we have a model for the evolution of a stochastic process X_t. At some time we get a measurement y_t, which we would like to process in real time to obtain the posterior probability p(X_t | Y_{1:t} = y_{1:t}). This information is of course contained in the model, and an inductive formula for the posterior can be written.

The starting point for writing the inductive formula is the relation

p(X_t | Y_{1:t} = y_{1:t})(x) = [ p(Y_t | X_t = x)(y_t) / p(Y_t | Y_{1:t−1} = y_{1:t−1})(y_t) ] p(X_t | Y_{1:t−1} = y_{1:t−1})(x) ,

which makes sense when p(Y_t | Y_{1:t−1})(y_t) ≠ 0. Note that, by the integration by parts formula, this is true for (Y_t, Y_{1:t−1})♯µ-almost every (y_t, y_{1:t−1}).

A closed form formula is supposed to be available for p(Y_t | X_t = x) and p(X_t | X_{t−1} = x_{t−1}). In real world models, these can usually be deduced easily from the model. From these and the posterior at time t − 1, the term p(X_t | Y_{1:t−1} = y_{1:t−1}), usually referred to as the prior, can be written in integral form:

p(X_t | Y_{1:t−1} = y_{1:t−1})(x) = ∫_{R^n} p(X_t | X_{t−1} = x′)(x) p(X_{t−1} | Y_{1:t−1} = y_{1:t−1})(x′) dL^n(x′) .

The term p(Y_t | Y_{1:t−1} = y_{1:t−1}) need not be computed, because it just normalizes the density to have integral equal to 1.

Past data is supposed to have been processed already, leading to the posterior at time t − 1 (or to an approximation of it), so what is needed to compute the posterior at time t is to evaluate an integral.
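Once densities are discretized on a grid, the recursion above is straightforward to implement; a minimal sketch (our own illustration, for the hypothetical scalar model X_{t+1} = X_t + ξ_t, Y_t = X_t + η_t with Gaussian noises):

import numpy as np

xs = np.linspace(-10, 10, 401)                 # grid on the state space R
dx = xs[1] - xs[0]

def gauss(z, s):
    return np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2 * np.pi))

K = gauss(xs[:, None] - xs[None, :], 1.0)      # transition density p(x | x')

def filter_step(posterior_prev, y_t, meas_sigma=0.5):
    prior = K @ posterior_prev * dx                # the integral giving the prior
    unnorm = gauss(y_t - xs, meas_sigma) * prior   # multiply by the likelihood
    return unnorm / (unnorm.sum() * dx)            # normalization replaces p(y_t | y_{1:t-1})

posterior = gauss(xs, 0.1)                     # starting state concentrated near 0
for y in (0.3, 0.7, 1.2):                      # synthetic measurements
    posterior = filter_step(posterior, y)
print((xs * posterior).sum() * dx)             # posterior mean estimate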

In some cases, that integral can be solved explicitly. For example, this is the case when the transition and measurement functions are linear, and the noise and starting state random variables have Gaussian laws. The algorithm that computes the posterior in this case is called the Kalman filter; see [2, 17] for a more detailed description.

Otherwise, the posterior should be approximated. This is done by algorithms like the extended Kalman filter, see [2] for details, and sequential importance sampling or particle filtering methods, see [2, 15].
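Among the approximate methods just cited, the bootstrap particle filter is the simplest; a minimal sketch (our own illustration, for the same hypothetical scalar model as above):

import numpy as np

rng = np.random.default_rng(0)
M = 5000                                       # number of particles

def particle_step(particles, y_t, sys_sigma=1.0, meas_sigma=0.5):
    # one predict-weight-resample step for X_{t+1} = X_t + xi_t, Y_t = X_t + eta_t
    particles = particles + sys_sigma * rng.standard_normal(M)   # predict
    w = np.exp(-0.5 * ((y_t - particles) / meas_sigma) ** 2)     # likelihood weights
    w /= w.sum()
    return particles[rng.choice(M, size=M, p=w)]                 # resample

particles = np.zeros(M)                        # known starting state X_0 = 0
for y in (0.3, 0.7, 1.2):
    particles = particle_step(particles, y)
print(particles.mean())                        # approximate posterior mean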

Algorithms that compute the exact or approximate posterior using the inductive formula above are in general called Bayesian filters.

A limitation of the above description of the filtering problem is the fact that the random variables X_t and Y_t are asked to take values in R^n and R^m for some n and m. Actually, this is not really needed.

Suppose that X_t takes values in Ω_{X_t} and Y_t in Ω_{Y_t} for each t, and that we have some “reference” measures µ_t on Ω_{X_t} and ν_t on Ω_{Y_t}. Replace the hypothesis that all random variables have density with respect to the Lebesgue measure with the hypothesis that, for every n, m ∈ N and every t_1 . . . t_n, s_1 . . . s_m ∈ N, the law of

(X_{t_1}, . . . , X_{t_n}, Y_{s_1}, . . . , Y_{s_m})

has density with respect to the product measure µ_{t_1} ⊗ · · · ⊗ µ_{t_n} ⊗ ν_{s_1} ⊗ · · · ⊗ ν_{s_m}.

Under this hypothesis, the conditional probability can be defined as in the case of random variables with density with respect to the Lebesgue measure, and it is easy to see that what we have said about the calculation of the posterior still makes sense.

The problem with this generalization is how to choose the reference measures so that the hypothesis of absolute continuity is satisfied by the dynamic and measurement models, which are often given and should model the real world.

In the case where the reference measures are Lebesgue measures, the hypothesis of absolute continuity is very reasonable and is satisfied by nearly all models used in engineering.

Another situation in which the absolute continuity hypothesis is easily met is when some variables take values in an embedded manifold M and we choose as reference measure a Hausdorff measure (for the definition of Hausdorff measure and further references see Section 4.1). Again, a lot of commonly used models satisfy the hypothesis.

Things get more complicated when trying to formulate the filtering problem in an infinite dimensional Hilbert space or manifold. We discuss probability measures on Hilbert spaces, with special attention to Gaussian measures, from Section 3.2 onwards, and there we see that absolute continuity surprisingly fails in some common situations, for example when considering translated or scaled measures. For this reason it can be hard, if not impossible, to choose good reference measures for a given model. We know of no rigorous formulation of the filtering problem that works well in infinite dimensional spaces, and in the literature the approach is more heuristic than rigorous.

2.4 Filtering and tracking

Recall that the tracking problem consists of following the motion and deformation of an object in a sequence of digital images. This problem can be given a formulation very similar to that of the filtering problem.

The state vector X_t is the contour of a real object. Real world objects are usually not considered in three dimensions, but identified with their projection on a plane, and X_t is a random variable with values in a shape space S, for example (a finite dimensional subspace of) regular closed curves with values in R^2.

The t-th image I_t is instead thought of as a measurement. To get a filtering problem, one should also specify a dynamic model for the state vector and a model for the measurement, i.e. for the process that leads from the projection of the real world on a plane to the image. Before giving some examples of these models, we spend a few words on the spaces in which the random variables take values.

The image is usually regarded as a matrix of pixels, i.e. an element of R^{n_1×n_2} with n_1, n_2 ∈ N the sides' sizes, and so I_t takes values in a finite dimensional vector space.

Conversely, there are various shape spaces. Some of them are finite dimensional vector spaces or manifolds, and for these the Bayesian filtering techniques make sense and have been successfully used in the literature, see [16, 28] for example. Other useful shape spaces are not finite dimensional, for example the Stiefel manifold presented in Section 2.2, which is an infinite dimensional manifold embedded in a Hilbert space. Heuristic algorithms, which mimic Bayesian filtering algorithms, have been proposed also for these spaces, see for example [29]. To our knowledge there is no rigorous formulation of these techniques, and few examples of probability measures on these shape spaces have been studied.

We present examples of dynamic and measurement models both for R^n and for infinite dimensional manifold shape spaces.

The dynamic model obviously depends on what kind of object one needs to follow. Accurate models can be built when one knows in advance the kind of object being tracked. In the case of a general purpose tracker, there is no prior knowledge of the object's motion. The simplest model is a kind of “Brownian motion”: the object evolves according to

X_{t+1} = X_t + ξ_t

where ξ_t is a noise random variable and X_t takes values in R^n. Usually ξ_t has a distribution clustered around 0, and the equation just models the fact that at time t the object is likely to be close to where it was at time t − 1.

If the shape space is a manifold M, we can consider a noise random variable ξ_t with values in the tangent space T_{X_t}M and write the model as

X_{t+1} = exp_{X_t} ξ_t

where exp_{X_t} is the exponential map based at X_t; see Section 4.3 for the definition.


Another general purpose model also asks for some coherence in the velocity of the object. The state vector is a couple (X_t, v_t), where v_t represents the velocity of the object at time t, and when X_t takes values in R^n the model looks like

X_{t+1} = X_t + v_t + ξ_t ,
v_{t+1} = X_{t+1} − X_t ,

where ξ_t is a noise random vector, usually clustered around 0.

If the shape space is a manifold, v_t and ξ_t belong to T_{X_t}M and the model is defined by the equations

X_{t+1} = exp_{X_t}(v_t + ξ_t) ,
v_{t+1} = − exp⁻¹_{X_{t+1}}(X_t) .
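On the unit sphere S^2 ⊂ R^3, for instance, the exponential map has the closed form exp_x(v) = cos(|v|) x + sin(|v|) v/|v|, and the “Brownian motion” model can be simulated directly; a minimal sketch (our own illustration, not from the thesis):

import numpy as np

rng = np.random.default_rng(0)

def exp_map(x, v):
    # exponential map of the unit sphere S^2 at x, applied to v in T_x S^2
    nv = np.linalg.norm(v)
    return x if nv < 1e-12 else np.cos(nv) * x + np.sin(nv) * v / nv

def tangent_noise(x, sigma=0.1):
    # Gaussian noise in R^3 projected onto the tangent plane at x
    xi = sigma * rng.standard_normal(3)
    return xi - np.dot(xi, x) * x

X = np.array([0.0, 0.0, 1.0])                  # start at the north pole
for _ in range(100):                           # X_{t+1} = exp_{X_t}(xi_t)
    X = exp_map(X, tangent_noise(X))
print(np.linalg.norm(X))                       # approximately 1: X stays on the sphere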

Regarding the measurement model, it is often defined by imposing the value of the conditional probability p(I_t | X_t = x_t). Choose an energy E(i_t, x_t), defined on R^{n_1×n_2} × S, which attains a local minimum when x_t is on the contour of an object, and define

p(I_t | X_t = x_t) = (1/z) e^{−E(i_t, x_t)} ,

as a density with respect to L^{n_1×n_2}, where z ∈ R is a normalization factor. In this way we are not really modelling the measurement, but just saying that the chosen energy is likely to be minimized on the contours of objects.

An example of an energy that can be used in the definition above is the Chan-Vese energy. Before defining it, we should introduce some notation. Let x be a shape in the shape space S and i ∈ R^{n_1×n_2} an image. Denote by D the set

D = { (a, b) | 1 ≤ a ≤ n_1, 1 ≤ b ≤ n_2 }

and by i_{ab}, with (a, b) ∈ D, the values of the pixels of i.

Given a planar curve x, we denote by x̊ a suitable discrete approximation of the topological interior of x. The topological interior is considered here because it is the region occupied by the object whose contour is x (see Figure 2.2).

We use integral notation to indicate summations over the image pixels, i.e. if A ⊆ D and f ∈ R^{n_1×n_2},

∫_A f = Σ_{(a,b)∈A} f_{ab} ,

and the mean of f on A is

⨍_A f = (1/|A|) ∫_A f ,

where |A| is the cardinality of A.

The Chan-Vese energy can now be defined as

E(i, x) = ∫_{x̊} (i − avg(x̊))^2 + ∫_{D∖x̊} (i − avg(D∖x̊))^2

where

avg(x̊) = ⨍_{x̊} i and avg(D∖x̊) = ⨍_{D∖x̊} i .

Roughly, this energy measures how uniform the regions inside and outside the shape are.
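A minimal numerical sketch of this energy (our own illustration; the boolean mask inside plays the role of the discrete interior x̊):

import numpy as np

def chan_vese_energy(image, inside):
    # sum of squared deviations from the mean, inside and outside the shape
    e_in = ((image[inside] - image[inside].mean()) ** 2).sum()
    e_out = ((image[~inside] - image[~inside].mean()) ** 2).sum()
    return e_in + e_out

rng = np.random.default_rng(0)
img = 0.1 * rng.standard_normal((64, 64))      # dark, noisy background
img[20:40, 20:40] += 1.0                       # a bright square object

good = np.zeros((64, 64), dtype=bool); good[20:40, 20:40] = True
bad = np.zeros((64, 64), dtype=bool); bad[5:25, 5:25] = True
print(chan_vese_energy(img, good) < chan_vese_energy(img, bad))   # True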


Chapter 3

Gaussian measures

In this chapter we give an introduction to probability measures on Hilbert spaces. We are mostly interested in Gaussian measures. The main reference for this kind of result is the book [3], which treats the subject in great generality, considering the case of locally convex spaces.

We restrict our presentation to Hilbert spaces. Gaussian measures in Hilbert spaces can be treated by writing everything down in coordinates, and some books follow this approach, for example [9]. We think the abstract setting of [3] is clearer, and in this introduction we follow that book.

3.1 Finite dimensional Gaussian measures

Definition 3.1.1. A measure on R is called Gaussian if it is the Dirac measure δ_m at a point m, or if it has density

x ↦ (1 / (σ√(2π))) exp( −(x − m)^2 / (2σ^2) )

with respect to the Lebesgue measure, for some m ∈ R and σ > 0.

If we set σ = 0 for Dirac measures, the parameters m and σ^2 are the mean and variance of µ, namely

m = ∫_R x dµ(x) ,  σ^2 = ∫_R (x − m)^2 dµ(x) .

A Gaussian measure on R is called non degenerate if it is not a Dirac measure. A real random variable X on a probability space (Ω, F, µ) is called Gaussian if X♯µ is a Gaussian measure on R.

Definition 3.1.2. A measure µ on R^n is called Gaussian if for every linear functional f, the induced measure f♯µ is Gaussian.

The measure µ is called non degenerate if for every linear functional f, the measure f♯µ is non degenerate.

We now recall some properties of Gaussian random variables and finite dimensional Gaussian measures. We do not provide many proofs, referring the reader to Sections 9.4 and 9.5 of [11] for a more detailed discussion.

Proposition 3.1.3. The Fourier transform of a Gaussian measure γ on R with mean m and variance σ^2 is

γ̂(ξ) = exp( imξ − (1/2) σ^2 ξ^2 ) .

Corollary 3.1.4. Let {X_n}_{n∈N} be a sequence of centered Gaussian random variables on a measure space (Ω, F, µ). Suppose they converge almost surely to a random variable X. Then X is Gaussian and centered.

Proof. Let σ_n^2 be the variance of X_n. By the dominated convergence theorem,

(X♯µ)ˆ(ξ) = ∫_Ω e^{iξX} dµ = lim_{n→∞} ∫_Ω e^{iξX_n} dµ = lim_{n→∞} e^{−(1/2)σ_n^2 ξ^2} ;

in particular the rightmost limit exists. This implies that the limit σ^2 = lim_{n→∞} σ_n^2 exists. The Fourier transform of X is then

(X♯µ)ˆ(ξ) = lim_{n→∞} e^{−(1/2)σ_n^2 ξ^2} = e^{−(1/2)σ^2 ξ^2}

and, by Proposition 3.1.3 and the injectivity of the Fourier transform, X is Gaussian and centered.

Another result about the convergence of Gaussian random variables is the following, see [3, Theorem 1.1.4].

Proposition 3.1.5. Let {X_n}_{n∈N} be a sequence of independent centered Gaussian random variables with variances σ_n^2 on a probability space (Ω, F, µ). Then the following conditions are equivalent:

1. the series Σ_{n=1}^∞ X_n converges almost everywhere;
2. there exists a subsequence of partial sums Σ_{i=1}^{n_k} X_i that converges almost everywhere as k → ∞;
3. the series Σ_{n=1}^∞ X_n converges in probability;
4. the series Σ_{n=1}^∞ X_n converges in L^2(µ);
5. the series Σ_{n=1}^∞ σ_n^2 is finite.


Proposition 3.1.6. A measure µ on R^n is Gaussian if and only if its Fourier transform is equal to

µ̂(ξ) = exp( i⟨m, ξ⟩ − (1/2)⟨Kξ, ξ⟩ )

for some m ∈ R^n and some symmetric nonnegative matrix K ∈ R^{n×n}.

If µ is a Gaussian measure, the vector m and the matrix K given by the Proposition above are called the mean and the covariance matrix of µ. They are related to the means and variances of the image measures of µ under linear maps. Indeed, if x ∈ R^n and x* : R^n → R is the linear random variable

x*(y) = ⟨x, y⟩

given by the standard scalar product in R^n, the measure x*♯µ has mean ⟨m, x⟩ and variance ⟨Kx, x⟩. Moreover, the covariance of two linear random variables x*_1 and x*_2 is ⟨Kx_1, x_2⟩.

By choosing an orthonormal basis of R^n with respect to which K is diagonal, µ can be decomposed as a product of one-dimensional Gaussian measures. This proves the following Corollary.

Corollary 3.1.7. Let µ be a Gaussian measure on R^n and m, K as in Proposition 3.1.6. Then

1. the support of µ is the orthogonal complement of Ker(K); in particular it is a subspace, and it coincides with R^n if and only if K is invertible;

2. µ has density with respect to the Lebesgue measure if and only if K is invertible;

3. µ is non degenerate if and only if K is invertible.

Proposition 3.1.8. Let X_1 and X_2 be independent real Gaussian random variables. Then any linear combination αX_1 + βX_2, with α, β ∈ R, is Gaussian as well.

Given a probability space (Ω, F, µ), we often regard a real Gaussian random variable X as an element of L^2(µ). This makes sense; indeed

∫_Ω X^2 dµ = ∫_R x^2 d(X♯µ)(x) < +∞

because Gaussian measures on R have second moments. In the case where Ω = R^n and µ is a Gaussian measure, all linear functionals can be regarded as elements of L^2(µ).

Proposition 3.1.9. Let X_1, . . . , X_n be centered Gaussian real random variables on a probability space (Ω, F, γ). Suppose X_1 is orthogonal to X_2, . . . , X_n in L^2(γ). Then X_1 is independent of (X_2, . . . , X_n).

3.2 Gaussian measures in Hilbert spaces

In this section we define Gaussian measures in Hilbert spaces and outline some of their properties, also giving some proofs. The reference for the remaining proofs is [3].

First of all, we should say why we generalize Gaussian measures to Hilbert spaces and not, for example, the Lebesgue measure. The reason is that there is no analogue of the Lebesgue measure on infinite dimensional spaces: a known lemma says that translation invariant measures are not so interesting.

Lemma 3.2.1. Let H be a separable infinite dimensional Hilbert space and µ a translation invariant, possibly not finite, measure on H. Then either µ is identically 0 or it is +∞ on all open sets.

Proof. Let µ be a measure as in the statement and suppose that there exists an open set of finite measure. Then there exists also an open ball B_0 of finite measure. Call 3r its radius. Since H is infinite dimensional, there exists a sequence of disjoint balls {B_n}_{n∈N} of radius r contained in B_0. By the σ-additivity of µ,

Σ_{n=1}^∞ µ(B_n) ≤ µ(B_0) < +∞ ,

but by translation invariance all the B_n have the same measure, and so it must hold that

µ(B) = 0 for all balls B of radius r .

By separability, H is covered by a countable union of balls of radius r, and so µ is identically 0.

Gaussian measures, instead, can be generalized to Hilbert spaces and retain some nice properties.

In the following, let H be a separable Hilbert space. The definition of Gaussian measure and some of its properties are also valid in more general spaces, not necessarily separable, but we restrict ourselves to this setting.

When talking about Gaussian measures, it can be confusing to identify H with its dual. For this reason, we keep them distinct, denoting by H* the dual space and by x* the linear functional associated to x ∈ H, i.e. for every x ∈ H, x* is the function defined by

x*(y) = ⟨x, y⟩ for all y ∈ H .

Definition 3.2.2. Let H be a separable Hilbert space. A measure γ on H is said to be Gaussian if for all x* ∈ H* the image measure x*♯γ on R is Gaussian.

Note that all Gaussian measures are probability measures. Indeed, given x* ∈ H*, it holds that γ(H) = x*♯γ(R), and all Gaussian measures on R are probability measures.

As in the finite dimensional case, all continuous linear functionals in H* can be regarded as random variables, and a measure on H is Gaussian if and only if all continuous linear functionals are Gaussian random variables.

To construct an example of a Gaussian measure on a Hilbert space, we need a lemma about σ-algebras. This Lemma can be stated in a more general form, see for example Theorem A.3.7 in [3].

Lemma 3.2.3. Let H be a separable Hilbert space, {e_n} an orthonormal basis of H and e*_n the coordinate functions relative to that basis. Then the Borel σ-algebra B(H) is generated by the family {e*_n}_{n∈N} of functions H → R.

Proof. Let E be the σ-algebra generated by {e*_n}_{n∈N}. Finite linear combinations of the e*_n are measurable with respect to E, and it can be verified that translations by elements of Span(e_n) are measurable too. The inclusion E ⊆ B(H) is clear, since the e*_n are continuous.

Let {x_m}_{m∈N} ⊆ Span(e_n) be a countable set dense in H. Since the x_m are dense, B(H) is generated by the balls centered at the x_m, and also by the closed balls with those centers. Translations by x_m are E-measurable, and so to prove the inclusion B(H) ⊆ E it is sufficient to see that all closed balls centered at the origin belong to E.

Let B be a closed ball of radius r centered at the origin. For every element x_m ∉ B, consider the half-space L_m = { x ∈ H | ⟨x_m, x⟩ ≤ r|x_m| }. Then L_m ⊇ B and x_m ∉ L_m. The intersection

L = ∩_{m : x_m ∉ B} L_m

is equal to B. Indeed, if x ∉ B, by density there exists x_m such that

|x − x_m| < (|x| − r)/2 .

With elementary calculations, it can be seen that x ∉ L_m ⊇ B. Then B is a countable intersection of the L_m, which are E-measurable, and so it is E-measurable too.

Example 3.2.4. Let R^∞ be a countable product of real lines and F the product σ-algebra of infinitely many copies of B(R). We denote by (x_n)_{n∈N} the elements of R^∞. Let also {σ_n}_{n∈N} be a sequence such that

Σ_{n∈N} σ_n^2 < +∞

and let γ_n be centered Gaussian measures on R with variances σ_n^2.

Define the probability measure γ̃ on (R^∞, F) as the product of the γ_n. Since

∫_{R^∞} Σ_{n∈N} x_n^2 dγ̃ = Σ_{n∈N} σ_n^2 < +∞ ,

the function Σ x_n^2 is finite almost everywhere, and then the measure γ̃ is concentrated on the set ℓ^2 = { (x_n)_{n∈N} | Σ x_n^2 < +∞ }, which is measurable. The set ℓ^2 is a Hilbert space with the usual scalar product

⟨(x_n), (y_n)⟩_{ℓ^2} = Σ_{n∈N} x_n y_n .

Subsets of ℓ^2 can be seen as subsets of R^∞, and so the measure γ̃ can be evaluated on the restriction of F to ℓ^2, namely on the subsets of ℓ^2 that belong to F. By Lemma 3.2.3, the Borel σ-algebra B(ℓ^2) is generated by the coordinate functions e*_n : (x_m)_{m∈N} ↦ x_n, which are measurable with respect to F, and so B(ℓ^2) is contained in the restriction of F to ℓ^2.

We can then consider γ̃ as a probability measure on ℓ^2. This is a Gaussian measure. Indeed, it follows from the definition that the e*_n are Gaussian random variables, and by Proposition 3.1.8 linear combinations of the e*_n are Gaussian as well. Every linear functional on ℓ^2 is a pointwise limit of such linear combinations, and so by Corollary 3.1.4 it is Gaussian too.
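The construction of Example 3.2.4 is easy to simulate by truncating the product; a minimal sketch (our own illustration, with the hypothetical choice σ_n = 1/n):

import numpy as np

rng = np.random.default_rng(0)
N, samples = 1000, 2000                  # truncation level and number of samples
sigma = 1.0 / np.arange(1, N + 1)        # sigma_n = 1/n, so the sum of sigma_n^2 is pi^2/6

x = sigma * rng.standard_normal((samples, N))    # each row approximates a sample in l^2
print((x**2).sum(axis=1).mean(), np.pi**2 / 6)   # E|x|^2 equals the sum of the variances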

Gaussian measures are characterized by their Fourier transform. We recall that the Fourier transform of a measure µ is the function µ̂ : H* → C,

µ̂(x*) = ∫ e^{ix*(y)} dµ(y) .

Proposition 3.2.5. A measure µ on H is Gaussian if and only if its Fourier transform is

µ̂(x*) = exp( iL(x*) − (1/2) B(x*, x*) )

for some linear function L on H* and some symmetric nonnegative bilinear function B on H*. Moreover,

L(x*) = ∫ x* dµ   (3.1)

and

B(x*_1, x*_2) = ∫ (x*_1 − L(x*_1)) (x*_2 − L(x*_2)) dµ .   (3.2)

Proof. Let µ be a measure with such a Fourier transform. For every element x* ∈ H*, the Fourier transform of the measure x*♯µ is

(x*♯µ)ˆ(ξ) = ∫_R e^{iξt} d(x*♯µ)(t) = ∫_H e^{iξx*(y)} dµ(y) = µ̂(ξx*)
  = exp( iL(ξx*) − (1/2) B(ξx*, ξx*) ) = exp( iξL(x*) − (1/2) ξ^2 B(x*, x*) ) .

By Proposition 3.1.3, this is also the Fourier transform of the Gaussian measure with mean L(x*) and variance B(x*, x*). Since the Fourier transform is injective, x*♯µ is that Gaussian measure, and so µ is Gaussian.


Conversely, if µ is a Gaussian measure, its Fourier transform can be computed using Proposition 3.1.3:

µ̂(x*) = ∫_H e^{ix*(y)} dµ(y) = ∫_R e^{it} d(x*♯µ)(t) = (x*♯µ)ˆ(1) = exp( iL(x*) − (1/2) B(x*, x*) ) ,

where L and B are defined by Equations 3.1 and 3.2. Clearly L is linear and B is symmetric, nonnegative and bilinear, so we have proven the first part of the proposition, Equation 3.1, and Equation 3.2 in the special case x*_1 = x*_2. To prove Equation 3.2 in general, it is sufficient to note that the left and right hand sides are symmetric bilinear forms that induce the same quadratic form, and this is sufficient to say that they are equal.

The operators L and B have an important continuity property, as stated in the next theorem.

We recall that if K is a symmetric nonnegative compact operator on H, by a well known theorem on the diagonalization of symmetric compact operators (see e.g. [5, Teorema VI.11]), there exists an orthonormal basis {e_n}_{n∈N} of H made of eigenvectors of K. Let σ_n^2 be the corresponding eigenvalues. We say that K is “trace-class” if the sum of the eigenvalues converges,

Σ_{n=1}^∞ σ_n^2 < ∞ .

The term trace-class operator usually refers to a much more general class of operators, see e.g. [14], but here we just need to say in short that the sum of the eigenvalues of a symmetric nonnegative compact operator converges.

Theorem 3.2.6. Let γ be a Gaussian measure on a separable Hilbert space H, and L and B defined as in Proposition 3.2.5. Then there exist a vector m_γ ∈ H and a symmetric nonnegative compact “trace-class” operator K such that

L(x*) = x*(m_γ) = ⟨m_γ, x⟩   (3.3)
B(x*_1, x*_2) = ⟨Kx_1, x_2⟩ .   (3.4)

Conversely, for any such m and K there exists a Gaussian measure γ on H with Fourier transform

γ̂(x*) = exp( i⟨m, x⟩ − (1/2) ⟨Kx, x⟩ ) .

Proof. We first show that L and B are continuous. By the dominated convergence theorem, the Fourier transform of γ is continuous on H* with respect to the weak topology, and then also with respect to the strong topology. The Fourier transform is

γ̂(x*) = exp( iL(x*) − (1/2) B(x*, x*) )

and so x* ↦ B(x*, x*) and x* ↦ L(x*) are continuous as well.

The function L is a continuous linear functional on H*, and so there exists m_γ satisfying Equation (3.3).

Since B is a bilinear operator, from the continuity of x* ↦ B(x*, x*) there follows also the continuity of B : H* × H* → R, and so there exists a continuous linear operator K that satisfies Equation (3.4). Symmetry and nonnegativity of K follow from the symmetry and nonnegativity of B.

To see that K is compact, consider a bounded sequence {x_n}_{n∈N} ⊆ H. By weak compactness, there exists a weakly convergent subsequence {x_{n_k}}, and translating we can suppose x_{n_k} ⇀ 0. By the continuity of the Fourier transform outlined before,

⟨Kx_{n_k}, x_{n_k}⟩ → 0 .

Since K is continuous and nonnegative, its square root √K exists by a well known functional analysis theorem (see Theorem 12.33 of [30]), and √K is a continuous symmetric nonnegative operator. The equation above can then be written as

⟨√K x_{n_k}, √K x_{n_k}⟩ → 0 ,

which means √K x_{n_k} → 0 in the strong topology of H, and as a consequence Kx_{n_k} → 0 as well.

It remains to show that K is trace class. Translating the measure by −m_γ, we can reduce to the case where γ is centered. We prove that centered Gaussian measures have second moment, i.e.

∫_H |x|^2 dγ < ∞ .

Let {e_n}_{n∈N} be an orthonormal basis of H made of eigenvectors of K. The functionals e*_n are orthogonal Gaussian random variables and so, by Proposition 3.1.9, they are independent.

Since the norm can be written as

|x|^2 = Σ_{n=1}^∞ e*_n(x)^2

and since the e*_n are independent, the series Σ_{n=1}^∞ e*_n restricted to a bounded set converges in L^2(γ). With a diagonal argument, it is possible to extract a subsequence n_k such that the partial sums Σ_{i=1}^{n_k} e*_i converge almost everywhere as k → ∞. By Proposition 3.1.5 this implies

Σ_{n=1}^∞ ∫_H (e*_n)^2 dγ < ∞

and then

∫_H |x|^2 dγ(x) = ∫_H Σ_{n=1}^∞ e*_n(x)^2 dγ(x) < ∞ .


Now the fact that K is trace class is straightforward: the eigenvalue σ_n^2 relative to e_n is

σ_n^2 = ⟨Ke_n, e_n⟩ = B(e*_n, e*_n) = ∫_H (e*_n)^2 dγ

by the definition of B, and we have just proven that the sum of these terms converges.

To see the converse, we have to show that there exists a Gaussian measure whose L and B functions satisfy Equations (3.3) and (3.4). Since K is symmetric and compact, there exists an orthonormal basis {e_n}_{n∈N} of H made of eigenvectors of K. Let σ_n^2 be the corresponding eigenvalues. The construction of Example 3.2.4 leads to a centered Gaussian measure γ such that the coordinate functions are independent and have variances σ_n^2. For such a measure, the function B, which is continuous as we have proven above, is

B(x*, y*) = B( Σ_{n=1}^∞ ⟨x, e_n⟩ e*_n , Σ_{m=1}^∞ ⟨y, e_m⟩ e*_m ) = Σ_{n=1}^∞ ⟨x, e_n⟩ ⟨y, e_n⟩ B(e*_n, e*_n)
  = Σ_{n=1}^∞ ⟨x, e_n⟩ ⟨y, e_n⟩ σ_n^2 = Σ_{n=1}^∞ ⟨x, e_n⟩ ⟨y, e_n⟩ ⟨Ke_n, e_n⟩ = ⟨Kx, y⟩ ,

and so Equation (3.4) is satisfied. To get a measure that satisfies also Equation (3.3), it is sufficient to translate γ by m_γ.

The vector m_γ is called the mean of γ, and γ is said to be centered if m_γ is the origin. The translate of γ by −m_γ is still a Gaussian measure and is centered, so up to a translation we can always suppose a Gaussian measure to be centered.

Theorem 3.2.6 has many interesting corollaries. The first corollary below is an intermediate step of the proof.

Corollary 3.2.7. Every Gaussian measure γ on a Hilbert space H has second moment, namely

∫_H |x − m_γ|^2 dγ(x) < +∞ .

In R^n, once a system of coordinates is chosen, there is a “standard” Gaussian measure, the one with center the origin and covariance matrix the identity, which is rotationally symmetric. Theorem 3.2.6 says that in infinite dimensional spaces Gaussian measures are never symmetric in this way: the variances of the coordinate functions must go to zero quite fast.

Corollary 3.2.8. The function exp( −(1/2)|x|^2 ) is not the Fourier transform of any measure on an infinite dimensional separable Hilbert space.

Proof. Suppose by contradiction that such a measure µ exists on the Hilbert space H. By Proposition 3.2.5, µ is a Gaussian measure, but then by Theorem 3.2.6 the identity should be a trace class operator, and this can be true only if dim H < ∞.

Corollary 3.2.9. Let γ be a Gaussian measure on a Hilbert space H. Then there exists an orthonormal basis of H such that the coordinate functions are independent.

In some sense, γ can be seen as a product of one dimensional Gaussian measures.

Proof. The operator K is compact and symmetric, so there is an orthonormal basis {e_n}_{n∈N} of eigenvectors of K. The coordinate functions e*_n are orthogonal in L^2(γ); indeed, if n ≠ m,

⟨e*_n, e*_m⟩_{L^2(γ)} = ⟨Ke_n, e_m⟩ = 0 .

But since the e*_n are Gaussian random variables, by Proposition 3.1.9 this implies that they are also independent.

In general, finite measures on Hilbert spaces have another couple of useful properties, which of course are true also for Gaussian measures.

Lemma 3.2.10. Let H be a separable Hilbert space, {e_n}_{n∈N} an orthonormal basis and P_n : H → H the projection on the subspace generated by e_1, . . . , e_n. Let also µ be a finite measure and consider the image measures P_n♯µ. Then P_n♯µ ⇀ µ in the C_b sense, i.e. for every continuous and bounded ϕ : H → R,

∫ ϕ d(P_n♯µ) → ∫ ϕ dµ .

Proof. Let ϕ ∈ C_b be a continuous bounded function. Since {e_n}_{n∈N} is a basis of H, for every x ∈ H

P_n(x) → x ,

and then also ϕ ∘ P_n → ϕ pointwise. It follows that

∫ ϕ d(P_n♯µ) = ∫ ϕ ∘ P_n dµ → ∫ ϕ dµ

by dominated convergence, because ϕ is bounded and µ is finite.

The second property says that also in separable Hilbert spaces measures are characterized by their Fourier transform.

Lemma 3.2.11. Let µ and ν be two measures on a separable Hilbert space and suppose their Fourier transforms coincide, µ̂ = ν̂. Then µ = ν.

Proof. By Lemma 3.2.3, H* generates the Borel σ-algebra B(H). This means that it is sufficient to prove that

µ(B) = ν(B)

for all B in the family

F = { (x*_1, . . . , x*_n)⁻¹(B′) | n ∈ N, x*_1 . . . x*_n ∈ H*, B′ ∈ B(R^n) } .

Indeed, F is closed under finite intersections, the family of sets on which µ and ν coincide is a Dynkin system, and the smallest σ-algebra that contains F coincides with the one generated by H*.

Equivalently, we can prove
\[ (x_1^*, \dots, x_n^*)_\sharp \mu = (x_1^*, \dots, x_n^*)_\sharp \nu \]
for all \(x_1^*, \dots, x_n^* \in H^*\) and \(n \in \mathbb{N}\). This follows from the injectivity of the Fourier transform for measures on \(\mathbb{R}^n\). Indeed, letting \(f = (x_1^*, \dots, x_n^*)\),
\[ \widehat{f_\sharp \mu}(\xi) = \hat\mu(\xi_1 x_1^* + \dots + \xi_n x_n^*) = \hat\nu(\xi_1 x_1^* + \dots + \xi_n x_n^*) = \widehat{f_\sharp \nu}(\xi) \]
for all \(\xi = (\xi_1, \dots, \xi_n) \in \mathbb{R}^n\).
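The identity \(\widehat{f_\sharp\mu}(\xi) = \hat\mu(\xi_1 x_1^* + \dots + \xi_n x_n^*)\) is just the change of variables for image measures; the sketch below (an assumed three dimensional Gaussian example, not from the thesis) verifies it against the closed form of the Gaussian Fourier transform.

    import numpy as np

    # For a centered Gaussian with covariance K, hat(mu)(v*) = exp(-<Kv, v>/2).
    rng = np.random.default_rng(3)
    K = np.diag([1.0, 0.5, 0.25])
    samples = rng.multivariate_normal(np.zeros(3), K, size=200_000)

    x1, x2 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, -1.0])
    f = samples @ np.stack([x1, x2], axis=1)   # samples of f#mu, a measure on R^2

    xi = np.array([0.7, -0.3])
    v = xi[0] * x1 + xi[1] * x2                # the functional xi_1 x_1* + xi_2 x_2*
    print(np.mean(np.exp(1j * f @ xi)))        # hat(f#mu)(xi), empirical
    print(np.mean(np.exp(1j * samples @ v)))   # hat(mu)(v), empirical: same value
    print(np.exp(-0.5 * v @ K @ v))            # hat(mu)(v), closed form

The first two prints agree exactly, since \(f \cdot \xi\) and \(v\) define the same random variable sample by sample; the third agrees up to Monte Carlo noise.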

The last remark is that all finite measures on a separable Hilbert space are Radon. This follows from Ulam's lemma, which holds in general for finite measures on complete separable metric spaces, see also Theorem 7.1.4 in [11].

Proposition 3.2.12 (Ulam’s lemma). Let (Ω, d) be a complete separable metric space and µ a finite measure on Ω. Then µ is Radon.

Proof. We first prove that for every \(\varepsilon\) there exists a compact set \(C_\varepsilon\) such that \(\mu(\Omega \setminus C_\varepsilon) < \varepsilon\). Let \(\{x_n\}_{n\in\mathbb{N}}\) be a dense set in \(\Omega\) and denote by \(B_m(x_n)\) the ball of radius \(\frac{1}{m}\) and center \(x_n\). For fixed \(m\), the union of these balls covers \(\Omega\) by density,
\[ \bigcup_{n=1}^{+\infty} B_m(x_n) = \Omega , \]
and so for every \(m\) there exists \(N_m\) such that, calling \(U_m\) the set
\[ U_m = \bigcup_{n=1}^{N_m} B_m(x_n) , \]
it holds that
\[ \mu(\Omega \setminus U_m) < 2^{-m} \varepsilon . \]
Now let
\[ U = \bigcap_{m=1}^{+\infty} U_m \]


and let \(C_\varepsilon\) be the closure of \(U\). The set \(U\) is totally bounded by construction and so, since \(\Omega\) is complete, its closure \(C_\varepsilon\) is compact. We can also estimate the measure of its complement as
\[ \mu(\Omega \setminus C_\varepsilon) \le \mu(\Omega \setminus U) = \mu\Big( \bigcup_{m=1}^{\infty} \Omega \setminus U_m \Big) < \sum_{m=1}^{\infty} 2^{-m} \varepsilon = \varepsilon \]
and so \(C_\varepsilon\) is the set we were looking for.

It is a well known result, see for example Theorem 7.1.3 in [11], that if \(\mu\) is a finite measure on a metric space, then for every Borel set \(B\) and every \(\varepsilon\) there exists a closed set \(C \subseteq B\) such that \(\mu(B \setminus C) < \varepsilon\). Intersecting \(C\) and \(C_\varepsilon\) we get a compact set contained in \(B\) such that
\[ \mu(B \setminus (C \cap C_\varepsilon)) \le \mu(B \setminus C) + \mu(\Omega \setminus C_\varepsilon) < 2\varepsilon \]
and we are done.

The fact that Gaussian measures are Radon implies that they are concentrated on a countable union of compact sets. This could be a bit surprising, because compact sets are quite "small" in Hilbert spaces. As we will see in the following, it is possible to exhibit other "small" sets on which a Gaussian measure is concentrated, and this phenomenon is a major difference between the finite and the infinite dimensional case.
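To quantify this in a concrete case (an assumed diagonal model with \(\sigma_n = 1/n\), not from the thesis): the "brick" \(C = \{x \in \ell^2 : |x_n| \le a_n\}\) with \(a_n = 5 n^{-3/4}\) is compact, because \(\sum_n a_n^2 < \infty\) gives equi-small tails, and by independence of the coordinates \(\gamma(C) = \prod_n \mathbb{P}(|N(0,1)| \le a_n/\sigma_n)\), which turns out to be close to 1.

    import math

    # gamma(C) = prod_n P(|N(0,1)| <= a_n / sigma_n), with a_n / sigma_n = 5 n^{1/4};
    # P(|Z| > t) = erfc(t / sqrt(2)) decays fast enough for the product to converge.
    log_mass = 0.0
    for n in range(1, 100_000):
        t = 5.0 * n ** 0.25
        log_mass += math.log(1.0 - math.erfc(t / math.sqrt(2)))
    print(math.exp(log_mass))   # close to 1: one compact set of almost full mass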

3.3 The Cameron-Martin space

Let \(H\) be a separable Hilbert space and denote its scalar product by \(\langle \cdot, \cdot \rangle_H\) and its norm by \(|\cdot|_H\). Let also \(\gamma\) be a Gaussian measure on \(H\). As in the previous section, we denote by \(H^*\) the dual of \(H\) and, for each \(x \in H\), \(x^* \colon H \to \mathbb{R}\) is the continuous linear functional defined by
\[ x^*(y) = \langle x, y \rangle . \]
We denote the dual norm on \(H^*\) by \(|\cdot|_H\) as well.

By Theorem 3.2.6 the mean of \(\gamma\) exists and is denoted by \(m_\gamma\), namely for each \(x^* \in H^*\)
\[ \int_H x^* \, d\gamma = x^*(m_\gamma) . \]
We now define also the covariance of \(\gamma\), which is a scalar product on \(H^*\) defined by
\[ \langle x_1^*, x_2^* \rangle_\gamma = \int_H \big( x_1^* - x_1^*(m_\gamma) \big) \big( x_2^* - x_2^*(m_\gamma) \big) \, d\gamma . \]

Note that if \(\gamma\) is centered the covariance \(\langle \cdot, \cdot \rangle_\gamma\) is the scalar product in \(L^2(\gamma)\).

The covariance of \(\gamma\) induces on \(H^*\) a norm
\[ |x^*|_\gamma^2 = \int_H \big( x^* - x^*(m_\gamma) \big)^2 \, d\gamma . \]


Definition 3.3.1. The covariance operator \(R'_\gamma \colon H^* \to H\) is defined by the equation
\[ x_1^*(R'_\gamma x_2^*) = \langle x_1^*, x_2^* \rangle_\gamma . \]

This definition is well posed thanks to Theorem 3.2.6, which states that the covariance is continuous with respect to the dual norm.

Definition 3.3.2. We denote by \(H^*_\gamma\) the closure in \(L^2(\gamma)\) of the set
\[ \{ x^* - x^*(m_\gamma) \mid x^* \in H^* \} . \]

Note that, unless \(\gamma\) is centered, \(H^* \not\subseteq H^*_\gamma\), but \(H^*_\gamma\) contains translates of the elements of \(H^*\).

Lemma 3.3.3. Every \(f \in H^*_\gamma\) is a centered Gaussian random variable.

Proof. Let \(f \in H^*_\gamma\) and let \(x_n^*\) be a sequence in \(H^*\) such that \(f_n = x_n^* - x_n^*(m_\gamma)\) converges to \(f\) in \(L^2(\gamma)\). Up to extracting a subsequence, we can suppose that the convergence is almost sure. By Corollary 3.1.4, the limit random variable \(f\) is Gaussian and centered.

Lemma 3.3.4. For every \(f \in H^*_\gamma\) the linear functional on \(H^*\)
\[ x^* \mapsto \int_H \big[ x^* - x^*(m_\gamma) \big] f \, d\gamma = \langle f, x^* - x^*(m_\gamma) \rangle_{L^2(\gamma)} \tag{3.5} \]
is continuous with respect to the \(|\cdot|_H\) norm and so the covariance operator admits an "extension" to \(H^*_\gamma\) defined by
\[ x^*(R_\gamma f) = \int_H \big[ x^* - x^*(m_\gamma) \big] f \, d\gamma . \]

Beware that, unless \(\gamma\) is centered, this is not an extension in the strict sense, because in general \(H^* \not\subseteq H^*_\gamma\). It holds instead that
\[ R'_\gamma(x^*) = R_\gamma(x^* - x^*(m_\gamma)) , \]
where \(R'_\gamma\) is defined on \(H^*\) and \(R_\gamma\) is defined on \(H^*_\gamma\). Sometimes, see e.g. [3], these two operators are both denoted by \(R_\gamma\).

Proof. The functional defined in Equation (3.5) is clearly continuous with respect to the \(|\cdot|_\gamma\) norm on \(H^*\). But this norm is bounded by the \(|\cdot|_H\) norm: indeed,
\[ |x^*|_\gamma^2 = \int_H \big[ x^*(y - m_\gamma) \big]^2 \, d\gamma(y) \le |x^*|_H^2 \int_H |y - m_\gamma|_H^2 \, d\gamma(y) \]
and
\[ \int_H |y - m_\gamma|_H^2 \, d\gamma < \infty \]
by Corollary 3.2.7.


By duality, \(H\) can be seen as a set of linear functionals on \(H^*\). If we consider on \(H^*\) the norm \(|\cdot|_\gamma\), not all the elements of \(H\) are continuous with respect to this norm. The continuous ones form the Cameron-Martin space.

Definition 3.3.5. The Cameron-Martin space, denoted by \(H_\gamma\), is a subspace of \(H\) defined as
\[ H_\gamma = \Big\{ x \in H \;\Big|\; \sup_{y^* \in H^*, \, |y^*|_\gamma \le 1} y^*(x) < +\infty \Big\} . \]
On the Cameron-Martin space the norm
\[ |x|_{H_\gamma} = \sup_{y^* \in H^*, \, |y^*|_\gamma \le 1} y^*(x) \]
is defined. With a little abuse of notation, we define \(|\cdot|_{H_\gamma}\) on the whole of \(H\), letting \(|x|_{H_\gamma} = +\infty\) for every \(x \notin H_\gamma\).

The Cameron-Martin space is closely related to the structure of the Gaussian measure γ and we now outline some of its properties.

Proposition 3.3.6. The Cameron-Martin space is the image of \(H^*_\gamma\) through the operator \(R_\gamma\),
\[ H_\gamma = R_\gamma(H^*_\gamma) . \]
Moreover, for every \(f \in H^*_\gamma\),
\[ |R_\gamma f|_{H_\gamma} = |f|_{L^2(\gamma)} . \]

Proof. Let \(f\) be an element of \(H^*_\gamma\). Then, for every \(y^* \in H^*\) such that \(|y^*|_\gamma \le 1\), by definition of \(R_\gamma\),
\[ y^*(R_\gamma f) = \langle y^* - y^*(m_\gamma), f \rangle_{L^2(\gamma)} \le |y^*|_\gamma |f|_{L^2(\gamma)} \le |f|_{L^2(\gamma)} < +\infty \]
and so \(R_\gamma f \in H_\gamma\).

Conversely, if \(x \in H_\gamma\), the linear functional \(x^{**} \colon H^* \to \mathbb{R}\),
\[ x^{**} \colon y^* \mapsto y^*(x) , \]
is continuous with respect to the \(|\cdot|_\gamma\) norm on \(H^*\). This means that a corresponding functional \(x^{**}_\gamma \colon H^*_\gamma \to \mathbb{R}\) can be defined, letting
\[ x^{**}_\gamma(y^* - y^*(m_\gamma)) = y^*(x) \]
for every \(y^* \in H^*\) and extending this function to \(H^*_\gamma\) by continuity with respect to the \(L^2(\gamma)\) norm. Since \(H^*_\gamma\) is a Hilbert space with the \(L^2(\gamma)\) norm, by the Riesz theorem the functional \(x^{**}_\gamma\) is represented by some \(f \in H^*_\gamma\). This \(f\) is such that, for every \(y^* \in H^*\),
\[ y^*(x) = x^{**}_\gamma(y^* - y^*(m_\gamma)) = \langle f, y^* - y^*(m_\gamma) \rangle_{L^2(\gamma)} \]


and this means that \(x = R_\gamma(f)\) by definition of \(R_\gamma\).

The Riesz theorem gives also the equality of the norms: indeed,
\[ |f|_{L^2(\gamma)} = \sup_{g \in H^*_\gamma} \frac{x^{**}_\gamma(g)}{|g|_{L^2(\gamma)}} = \sup_{y^* \in H^*} \frac{x^{**}_\gamma(y^* - y^*(m_\gamma))}{|y^* - y^*(m_\gamma)|_{L^2(\gamma)}} = \sup_{y^* \in H^*} \frac{y^*(x)}{|y^*|_\gamma} = |x|_{H_\gamma} , \]
where the second equality holds by density of \(\{ y^* - y^*(m_\gamma) \mid y^* \in H^* \}\) in \(H^*_\gamma\),

and we are done.

The following corollary is a direct consequence of the above proposition; we state it here for future reference.

Corollary 3.3.7. The Cameron-Martin space \(H_\gamma\) with the scalar product
\[ \langle x, y \rangle_{H_\gamma} = \big\langle R_\gamma^{-1}(x), R_\gamma^{-1}(y) \big\rangle_{L^2(\gamma)} \]
is a Hilbert space, where \(R_\gamma\) is the operator defined in Lemma 3.3.4.

The norm induced by the scalar product \(\langle \cdot, \cdot \rangle_{H_\gamma}\) is the \(|\cdot|_{H_\gamma}\) norm defined in Definition 3.3.5, and the function \(R_\gamma\) is an isometry between \((H_\gamma, \langle \cdot, \cdot \rangle_{H_\gamma})\) and \((H^*_\gamma, \langle \cdot, \cdot \rangle_{L^2(\gamma)})\).
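For a centered Gaussian measure with diagonal covariance \(K = \operatorname{diag}(\sigma_n^2)\) one can check that \(H_\gamma = \{x : \sum_n x_n^2/\sigma_n^2 < \infty\}\) and \(|x|_{H_\gamma}^2 = \sum_n x_n^2/\sigma_n^2\). The sketch below (an assumed example with \(\sigma_n = 1/n\), not from the thesis) illustrates a striking consequence: a typical sample of \(\gamma\) has infinite Cameron-Martin norm, so \(\gamma(H_\gamma) = 0\), while vectors decaying faster than \(\sigma_n\) do belong to \(H_\gamma\).

    import numpy as np

    # Diagonal case: |x|_{H_gamma}^2 = sum_n x_n^2 / sigma_n^2 (truncated to N terms).
    rng = np.random.default_rng(4)
    N = 100_000
    sigma = 1.0 / np.arange(1, N + 1)

    sample = sigma * rng.standard_normal(N)   # a draw from gamma
    smooth = sigma**2                         # x_n = 1/n^2: decays faster than sigma_n

    print(np.sum((sample / sigma) ** 2))      # ~ N: diverges, sample not in H_gamma
    print(np.sum((smooth / sigma) ** 2))      # converges to pi^2/6: smooth in H_gamma

This is one instance of the "small sets of full (or zero) measure" phenomenon mentioned at the end of the previous section.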

In \(\mathbb{R}^n\) we are used to the fact that all nondegenerate Gaussian measures are absolutely continuous with respect to each other. If the space is infinite dimensional, this is no longer true: a Gaussian measure and a translate of it can be singular with respect to each other. The next lemmas and propositions aim to show this fact.

We denote by \(\tau_x \colon H \to H\) the translation by \(x\) in \(H\), \(\tau_x(y) = y + x\), so that the translate of a measure \(\gamma\) by \(x\) is \(\tau_{x\sharp}\gamma\).

Lemma 3.3.8. Let \(\gamma\) and \(\nu\) be two finite measures on a measure space \((\Omega, \mathcal{F})\). Then they are singular if and only if the total variation of their difference is
\[ \| \gamma - \nu \| = \gamma(\Omega) + \nu(\Omega) . \]

Proof. It is a well known fact that the total variation of the difference of two positive measures is given by the formula
\[ \| \gamma - \nu \| = \sup \left\{ \gamma(E) - \nu(E) + \nu(F) - \gamma(F) \mid E, F \text{ measurable},\ E \cap F = \emptyset \right\} . \]
From the above formula it is clear that \(\| \gamma - \nu \| \le \gamma(\Omega) + \nu(\Omega)\). If \(\gamma\) and \(\nu\) are singular, then there exist two disjoint sets \(E\) and \(F\) such that \(\gamma\) is concentrated on \(E\) and \(\nu\) is concentrated on \(F\). It follows that
\[ \| \gamma - \nu \| \ge \gamma(E) - \nu(E) + \nu(F) - \gamma(F) = \gamma(\Omega) + \nu(\Omega) \]
and one implication is proven.

