### UNIVERSITY OF PISA

### DEPARTMENT OF INFORMATION ENGINEERING

### Master Degree in Robotics and Automation Engineering

### Object localization using vision and touch: experiments on the iCub humanoid robot

Candidate:

Nicola Agostino Piga

Supervisors:

Prof. Lucia Pallottino

Dott. Lorenzo Natale

## Contents

Introduction

1 Sequential Bayesian estimation algorithms
1.1 Bayes’ theorem
1.2 The chain rule of conditional densities
1.3 Recursive Bayesian filtering
1.3.1 Classical assumptions in Bayesian filtering
1.3.2 The filtering distribution
1.3.3 The predictive distribution
1.3.4 The prior distribution
1.3.5 Exact solution of the equations
1.4 Particle filtering
1.4.1 Monte Carlo sampling
1.4.2 Importance sampling
1.4.3 Sequential importance sampling
1.4.4 Sequential importance resampling particle filtering
1.5 Unscented particle filtering
1.5.1 Unscented transform
1.5.2 Unscented Kalman filter
1.5.3 Unscented particle filter
1.5.4 Maximum a posteriori estimation in particle filtering

2 Object localization using vision and touch
2.1 Memory unscented particle filter
2.1.1 Motion model
2.1.2 Measurement model
2.1.3 The role of the memory
2.2.1 Measurement energy potentials
2.2.2 Measurement errors
2.2.3 Reformulation of the measurements likelihood
2.2.4 Reformulation of the measurement equation
2.2.5 Measurements from the visual domain
2.2.6 Measurements from the tactile domain
2.2.7 Review of the memory mechanism
2.3 Accounting for moving objects
2.3.1 Assumptions on the motion of the object
2.3.2 Reformulation of the motion model
2.3.3 Reformulation of the Markov transition density
2.4 Final algorithm

3 The iCub humanoid robot
3.1 Description of the robot
3.2 Perception capabilities of the robot
3.2.1 The iCub stereo vision pipeline
3.2.2 The tactile sensors on iCub fingertips
3.2.3 The iCub springy fingers model
3.3 Motion capabilities of the robot
3.3.1 The iCub Cartesian interface
3.3.2 The iCub gaze interface

4 Software implementation
4.1 Software libraries
4.1.1 The YARP library
4.1.2 The iCub libraries
4.1.3 The CGAL library
4.1.4 The VCG library
4.1.5 The ArUco library
4.1.6 The SuperimposeMesh library
4.1.7 The VTK library
4.2 Description of the developed software modules
4.2.1 Localization module
4.2.2 Control module
4.2.3 Ground-truth estimation module
4.2.4 Visualization modules
4.3.1 The Gazebo environment
4.3.2 Ground truth plugin
4.3.3 Point cloud plugin
4.3.4 Fingertips contact detection plugin

5 Simulation results
5.1 Description of the experiments
5.2 Evaluation metrics
5.3 Analysis of the results
5.3.1 Experiment 1: translation with standard UPF
5.3.2 Experiment 2: translation with prediction-only UPF
5.3.3 Experiment 3: rotation with standard UPF
5.3.4 Experiment 4: rotation with prediction-only UPF

6 Experimental results
6.1 Description of the experiments
6.2 Differences between the simulated and experimental setup
6.2.1 Ground-truth estimation
6.2.2 Contacts detection mechanism
6.2.3 Mismatch between visual and tactile measurements
6.3 The UPF with correction of the measurements
6.4 Analysis of the results
6.4.1 Visual localization
6.4.2 Comparison between standard UPF and UPF with correction
6.4.3 Prediction-only UPF

## Introduction

Precise knowledge of the pose, the shape and the properties of objects is a fundamental requirement for an autonomous robot that has to manipulate and interact with objects in order to grasp and/or move them in a robust and efficient way. Such knowledge is available to the robot through noisy sensors, each of which provides a different piece of information. Hence, as is the case for humans, robots also need to employ multiple sensing modalities in order to extract as much information as possible from the surrounding environment.

The specific problem of estimating the pose of an object is called object localization. This problem has been addressed in several ways and using several sensing modalities, the most common being vision, touch and force sensing.

In robotics, the importance of combining visual and tactile information for object pose estimation has been recognized since the 1980s. In 1984, Allen [1] used vision to characterize and localize the surface of an object using bicubic surface patches and developed a strategy to sample tactile traces, such as 3D coordinates and surface normals, to better approximate the surface. In 1998, Honda et al. [8] devised a least-squares-based algorithm to estimate the pose of an object grasped and manipulated by a multi-fingered hand by finding correspondences between the contact locations of the fingertips and visual features projected over the surface of the object and tracked with a 3D vision system.

In recent years, Kalman filtering and, more generally, Bayesian methods have played a central role in object pose estimation because they allow measurements from different sensing modalities to be combined, taking the associated uncertainties into account in a natural way.

In 2012, Herbert et al. [7] combined visual appearance features and the silhouettes of both the manipulator and the grasped object, as well as tactile measurements, using an Unscented Kalman Filter. The proposed algorithm estimates the relative pose between the palm of the manipulator and the grasped object and, interestingly, the mismatch between the pose of the palm according to the forward kinematics and the same pose according to the visual estimate.

Unscented Kalman filtering allows the nonlinearities that arise in the measurement models of the sensors to be incorporated. Other works also considered the multimodal nature of the object pose estimation problem using, e.g., particle filters. In 2015, Bimbo et al. [2] used particles to represent the search region of an object pose estimation algorithm. The particles are initialized according to a prior coming from vision, and particles showing high fitness with the tactile data are then replicated recursively. In 2017, Vezzani et al. [25] proposed the Memory Unscented Particle Filter, which combines an Unscented Particle Filter with a memory strategy to estimate the 6D pose of a stationary object using 3D tactile contact points and, notably, without using contact normals.

Although particle filters are good at accounting for multimodality, they suffer from the particle depletion problem, especially when tactile measurements are involved. In fact, during manipulation, the real pose of the object belongs to a lower-dimensional manifold determined by non-penetration constraints between the manipulator and the manipulated object. This kind of constraint is not easily incorporated in standard particle filters, where sampled particles correspond with high probability to configurations in which the manipulator and the object are overlapping or separated. In [12], Koval et al. proposed the Manifold Particle Filter, which samples particles directly from a contact manifold, ensuring satisfaction of the non-penetration constraint.

Other works focused on the role of friction forces and of the contacts arising between the manipulator and the manipulated object and between the object and the external environment. In 2017, Rodriguez et al. [29] modelled the planar motion of a pushed object using the concept of the limit surface, which maps the forces acting on the object to its velocity under a quasi-static motion regime. This motion model was used within the iSAM real-time state estimation framework, which combines visual and tactile measurements. Very recently, in [30], the same authors exploited the same filtering framework to support the task of inserting a suction-held object. In this case, the contact points between the object and the external environment are not observed directly but inferred using a data-driven method based on Support Vector Machines. The concept of the limit surface is also used in [28] to recover the pose and the shape of an object during tactile planar exploration.

Some works also considered the problem of designing controllers that exploit information coming from vision and touch to localize and explore unknown objects. In 2015, Li et al. [13] combined tactile-servoing and visual-servoing controllers in order to perform actions such as object approach, grasp stability improvement by sliding the fingers along the surface of the object, in-hand manipulation and object exploration. In 2017, Yang et al. [27] combined vision and touch to perform contour following around the edges of an unknown object, using Bayesian evidence accumulation to fuse knowledge coming from the two sensing modalities and estimate the position of the tactile sensor with respect to the edges of the object. This information is used to plan corrective actions that allow the manipulator to track the edges of the object. In 2017, Fantacci et al. [5] proposed a framework for markerless visual-servoing on unknown objects. The proposed framework combines a recursive Bayesian filtering technique, which estimates the 6D pose of the robot's end-effector without the use of markers, with an image-based visual-servoing controller that commands the robot towards the desired pose.

From a technological standpoint, the sensors most commonly used for vision in the literature are stereo cameras and RGB-D sensors. These sensors are generally used to obtain point clouds belonging to the surface of the object to be localized. As regards contact detection and sensing, a force/torque sensor mounted on the wrist of an industrial gripper is a common choice. However, in some works human-like robotic hands equipped with tactile sensors placed on the fingertips, see e.g. [23], are used. Very recently, a new kind of tactile sensor, called GelSight, was proposed. It consists of a thin elastomer observed by a conventional color camera that captures the deformations of the elastomer when the sensor is pressed against an object. This sensor allows dense local geometric information to be captured and was successfully exploited in many applications such as pose estimation [9].

Although new sensors may offer interesting sensing capabilities, they may not always be available on robotic platforms. Sometimes even more traditional sensors, e.g. force/torque sensors, are not available, thereby precluding the use of information such as contact normals and contact forces within object localization algorithms. For this reason, the development of localization algorithms relying on more common sensors, such as encoders, stereo cameras and simple contact sensors, is of great interest.

The aim of the present thesis is to study and implement a filtering algorithm for object localization that uses visual and tactile information solely in the form of Cartesian points belonging to the surface of the object. To this end, the Memory Unscented Particle Filter algorithm [25] for tactile localization of a stationary object will be presented and extended in order to localize an object using visual measurements, in the form of point clouds, and track its pose using tactile measurements while the object is manipulated by an external end-effector. The considered tactile measurements will be in the form of contact points computed using binary contact sensors, encoder readings and the forward kinematics of the robot. The performance and the limitations of the proposed algorithm will be discussed through the analysis of the results of localization experiments performed in a simulated environment and on the iCub humanoid robot platform using its stereo vision and tactile sensing systems.

The present dissertation is divided into six chapters:

1. Sequential Bayesian estimation algorithms:

the first chapter presents the theory of sequential Bayesian filtering that is required to understand the Memory Unscented Particle Filter algorithm. In particular, the Particle Filter and the Unscented Particle Filter algorithms are presented.

2. Object localization using vision and touch:

the second chapter introduces the problem of object localization and extends the Memory Unscented Particle Filter algorithm for tactile localization to the case of localization using visual and tactile measurements.

3. The iCub humanoid robot:

the third chapter introduces the humanoid robot platform used in the experimental phase of the thesis, the iCub robot. In particular, the visual and tactile perception capabilities of the robot are presented.

4. Software implementation:

the fourth chapter describes the software modules developed during the thesis and required to execute localization experiments in a simulated environment and in an experimental setup.

5. Simulation results:

the fifth chapter presents and analyzes the simulation results relative to localization experiments.

6. Experimental results:

the sixth and last chapter presents and analyzes the experimental results relative to localization experiments.

## 1. Sequential Bayesian estimation algorithms

The goal of the object localization problem is to estimate in real time the pose $x_t$ of an object $O$ using the measurements $y^t = \{y_1, \dots, y_t\}$ collected up to the current time instant $t$. The object localization problem can be cast into the sequential Bayesian estimation framework and addressed as a nonlinear filtering problem. The sequential Bayesian estimation framework is a general probabilistic approach for estimating an unknown probability distribution recursively over time using incoming measurements and a mathematical model of the process under observation. According to this framework, the goal of the filtering problem is to recursively estimate the conditional distribution of the state $x_t$ of a dynamical system given the measurements $y^t$. Once this distribution is estimated, it is possible to extract an estimate $\hat{x}_t$ of the state.

In the following, the theory of sequential Bayesian estimation is first presented. Then, two Bayesian sequential estimation algorithms, the Particle Filter and the Un-scented Particle Filter, are presented. These algorithms are used in the next chapter to develop an object localization algorithm that uses visual and tactile measurements.

### 1.1 Bayes’ theorem

The theory behind Bayesian estimation is developed starting from the well-known Bayes' theorem. If continuous Random Variables, from now on RVs, are used to represent the space of events, the theorem takes the form

$$p_{X|Y=y}(x|y) = \frac{p_{Y|X=x}(y|x)\, p_X(x)}{p_Y(y)} \tag{1.1}$$

with $X, Y$ RVs, $p_X(\cdot)$, $p_Y(\cdot)$ probability density functions (PDFs) and $p_{X|Y}(\cdot|\cdot)$, $p_{Y|X}(\cdot|\cdot)$ conditional probability density functions.

In the Bayesian interpretation, the theorem links the probability distribution of $X$ before and after accounting for the evidence $Y = y$. In detail:

• the term $p_X(x)$ is called the prior, i.e. the distribution of $X$ before taking into account $Y = y$;

• the term $p_{Y|X=x}(y|x)/p_Y(y)$ is called the support that $Y$ provides for $X$;

• the term $p_{X|Y=y}(x|y)$ is called the posterior, i.e. the distribution of $X$ after taking into account $Y = y$.

The support is obtained as the quotient between the likelihood $p_{Y|X=x}(y|x)$ and the evidence $p_Y(y)$. It should be noted that the evidence is just a normalization factor, while the likelihood is the term that effectively explains the posterior in terms of the prior.
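As a concrete illustration, the theorem can be evaluated numerically on a discretized space of events. The sketch below is a hypothetical example (the five-cell state space and the detection likelihood are invented for illustration, not taken from the experiments of this thesis):

```python
import numpy as np

# Hypothetical example: an object lies in one of 5 discrete cells.
# Prior p_X(x): uniform over the cells.
prior = np.full(5, 0.2)

# Likelihood p_{Y|X=x}(y|x) of a "detection" y: an assumed sensor that
# responds strongly when the object is in cell 2.
likelihood = np.array([0.05, 0.10, 0.80, 0.10, 0.05])

# Evidence p_Y(y): a pure normalization factor, sum_x p(y|x) p(x).
evidence = np.sum(likelihood * prior)

# Posterior p_{X|Y=y}(x|y) via Bayes' theorem (1.1).
posterior = likelihood * prior / evidence

print(posterior.sum())      # sums to 1 (up to floating point)
print(np.argmax(posterior)) # 2: mass concentrates where the likelihood is high
```

Note how the evidence plays no role in the shape of the posterior: it only rescales the product of likelihood and prior so that the result integrates (here, sums) to one.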

### 1.2 The chain rule of conditional densities

Another tool from probability theory that is required to develop Bayesian filtering algorithms is the chain rule of conditional densities. The rule allows the joint distribution of a set of RVs to be expressed using only conditional densities. To show this, a random process

$$X^t := X_{0:t} = \{X_0, \dots, X_t\} \tag{1.2}$$

is considered, where each $X_i$ is a RV. For example, the joint distribution

$$p_{X_t, X_{t-1}, X_{t-2}}(x_t, x_{t-1}, x_{t-2})$$

can be expressed, using the definition of the conditional probability density function

$$p_{X,Y}(x, y) = p_{X|Y=y}(x|y)\, p_Y(y),$$

as

$$
\begin{aligned}
p_{X_t, X_{t-1}, X_{t-2}}(x_t, x_{t-1}, x_{t-2}) &= p(x_t, x_{t-1}, x_{t-2}) \\
&= p(x_t, x_{t-1}|x_{t-2})\, p(x_{t-2}) \\
&= p(x_t|x_{t-1}, x_{t-2})\, p(x_{t-1}|x_{t-2})\, p(x_{t-2}),
\end{aligned}
$$

where the subscripts have been removed for notational convenience. Extending the example to a generic random process $X^t$, the following result is obtained:

$$p_{X^t}(x^t) := p_{X_t, \dots, X_0}(x_t, \dots, x_0) = p(x_t, \dots, x_0) = \prod_{k=0}^{t} p(x_k|x^{k-1}), \tag{1.3}$$

with the convention $p(x_0|x^{-1}) := p(x_0)$.
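The factorization above can be checked numerically. The sketch below is a hypothetical two-state Markov process (so each factor reduces to $p(x_k|x_{k-1})$); it builds the joint distribution over three time steps from its chain-rule factors and verifies that the result is a valid joint density:

```python
import numpy as np

# Numerical check of the chain rule (1.3) on a hypothetical 2-state
# Markov process with initial distribution p0 and transition matrix T.
p0 = np.array([0.6, 0.4])                 # p(x_0)
T = np.array([[0.9, 0.1],                 # T[i, j] = p(x_k = j | x_{k-1} = i)
              [0.3, 0.7]])

# Chain-rule factorization: joint[i, j, k] = p0[i] * T[i, j] * T[j, k],
# i.e. p(x_0, x_1, x_2) = p(x_0) p(x_1|x_0) p(x_2|x_1).
joint = p0[:, None, None] * T[:, :, None] * T[None, :, :]

print(joint.sum())             # 1 (up to floating point): a proper joint
print(joint.sum(axis=(1, 2)))  # marginalizing x_1, x_2 recovers p(x_0)
```

Marginalizing the joint built from the factors recovers the original marginals, which is exactly the consistency that the chain rule guarantees.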

### 1.3 Recursive Bayesian filtering

In this section the fundamental equations of Bayesian filtering theory are presented. The exact solution of these equations provides an answer to the problem of recursively computing the marginal posterior distribution, or filtering distribution, of a RV $X_t$, extracted from a random process $X^t = \{X_0, \dots, X_t\}$, at each time step $t$, given the history of the evidence up to time $t$, $Y^t$:

$$p_{t|t}(x_t) := p_{X_t|Y^t=y^t}(x_t|y^t). \tag{1.4}$$

It should be noted that typically the RV $X_t$ represents the hidden (or unobserved) state of a system at time $t$, while the evidence $Y_t$ represents a measurement of the state of the system at time $t$.

In the following, the subscripts that indicate to which RVs a PDF refers will be removed, whenever possible, to increase readability. Also, the symbols for the RVs and for the values they can assume will be used interchangeably, and lower case symbols, i.e. $x$ instead of $X$, will be preferred, with some exceptions.

### 1.3.1 Classical assumptions in Bayesian filtering

Before delving into the derivation of the equations, it is necessary to state some assumptions that are typically employed in the framework of Bayesian filtering.

Assumption 1. Markov property of states

The state of the system at time $t$, $x_t$, only depends on the state at time $t-1$ and not on past states and measurements. More formally, the state $x_t$ given $x_{t-1}$ is independent of any state $x_j$ with $j < t-1$ and of any measurement $y_k$ with $k \le t-1$:

$$p(x_t, x_j, y_k|x_{t-1}) = p(x_t|x_{t-1})\, p(x_j, y_k|x_{t-1}), \quad j < t-1,\; k \le t-1. \tag{1.5}$$

The assumption can also be restated in the following way by repeatedly using the definition of conditional density function and (1.5):

$$
\begin{aligned}
p(x_t|x_{t-1}, x_j, y_k) &= \frac{p(x_t, x_{t-1}, x_j, y_k)}{p(x_{t-1}, x_j, y_k)} \\
&= \frac{p(x_t, x_j, y_k|x_{t-1})\, p(x_{t-1})}{p(x_{t-1}, x_j, y_k)} \\
&= p(x_t|x_{t-1})\, \frac{p(x_j, y_k|x_{t-1})}{p(x_{t-1}, x_j, y_k)}\, p(x_{t-1}) \\
&= p(x_t|x_{t-1})\, \frac{p(x_{t-1})}{p(x_{t-1})} \\
&= p(x_t|x_{t-1})
\end{aligned} \tag{1.6}
$$

where $j < t-1$ and $k \le t-1$. The last term $p(x_t|x_{t-1})$ is the conditional density

$$p_{X_t|X_{t-1}=x_{t-1}}(x_t|x_{t-1})$$

that describes the stochastic dynamics of the system. This density corresponds to the Markov transition density $\phi_{t|t-1}(x_t|x_{t-1})$:

$$p_{X_t|X_{t-1}=x_{t-1}}(x_t|x_{t-1}) = \phi_{t|t-1}(x_t|x_{t-1}). \tag{1.7}$$

Assumption 2. Conditional independence of measurements

The measurement at time $t$, $y_t$, only depends on the state at time $t$, $x_t$, and not on past and future states and measurements. More formally, the measurement $y_t$ given $x_t$ is independent of any state $x_j$ and of any measurement $y_k$ with $j, k \neq t$:

$$p(y_t, x_j, x_{j':T}, y_k, y_{k':T}|x_t) = p(y_t|x_t)\, p(y_k, y_{k':T}, x_j, x_{j':T}|x_t), \quad j, k \le t-1,\; j', k' \ge t+1. \tag{1.8}$$

The assumption can also be restated in the following way by repeatedly using the definition of conditional probability and (1.8):

$$
\begin{aligned}
p(y_t|x_j, x_t, x_{j':T}, y_k, y_{k':T}) &= \frac{p(y_t, x_j, x_t, x_{j':T}, y_k, y_{k':T})}{p(x_j, x_t, x_{j':T}, y_k, y_{k':T})} \\
&= \frac{p(y_t, x_j, x_{j':T}, y_k, y_{k':T}|x_t)\, p(x_t)}{p(x_j, x_t, x_{j':T}, y_k, y_{k':T})} \\
&= p(y_t|x_t)\, \frac{p(y_k, y_{k':T}, x_j, x_{j':T}|x_t)}{p(x_j, x_t, x_{j':T}, y_k, y_{k':T})}\, p(x_t) \\
&= p(y_t|x_t)\, \frac{p(x_t)}{p(x_t)} \\
&= p(y_t|x_t)
\end{aligned} \tag{1.9}
$$

where $j, k \le t-1$ and $j', k' \ge t+1$.

The last term $p(y_t|x_t)$ is the conditional density

$$p_{Y_t|X_t=x_t}(y_t|x_t)$$

that describes the measurement model. This density can be written in terms of a measurement likelihood function $\ell_t(y_t|x_t)$:

$$p_{Y_t|X_t=x_t}(y_t|x_t) = \ell_t(y_t|x_t). \tag{1.10}$$

### 1.3.2 The filtering distribution

The filtering distribution (1.4) can be computed using Bayes' theorem (1.1) as

$$
\begin{aligned}
p_{t|t}(x_t) &= p(x_t|y^t) \\
&= p(x_t|y_t, y^{t-1}) \\
&= \frac{p(y_t|x_t, y^{t-1})\, p(x_t|y^{t-1})}{p(y_t|y^{t-1})}.
\end{aligned} \tag{1.11}
$$

The prior term in (1.11),

$$p(x_t|y^{t-1}),$$

is the predictive distribution

$$p_{t|t-1}(x_t) := p_{X_t|Y^{t-1}=y^{t-1}}(x_t|y^{t-1}). \tag{1.12}$$

The likelihood term in (1.11),

$$p(y_t|x_t, y^{t-1}),$$

can be written, using (1.9), as

$$p(y_t|x_t, y^{t-1}) = p(y_t|x_t) = \ell_t(y_t|x_t). \tag{1.13}$$

As regards the evidence term in (1.11),

$$p(y_t|y^{t-1}),$$

it can be expressed using the law of total probability as

$$p(y_t|y^{t-1}) = \int p(y_t|y^{t-1}, \xi)\, p(\xi|y^{t-1})\, d\xi = \int p(y_t|y^{t-1}, \xi)\, p_{t|t-1}(\xi)\, d\xi$$

where $\xi$ represents the state at time $t$, $x_t$. Taking into account (1.9), the expression reduces to

$$p(y_t|y^{t-1}) = \int p(y_t|\xi)\, p_{t|t-1}(\xi)\, d\xi = \int \ell_t(y_t|\xi)\, p_{t|t-1}(\xi)\, d\xi. \tag{1.14}$$

By substituting (1.13) and (1.14) into (1.11), the final form of the filtering distribution is obtained:

$$p_{t|t}(x_t) = \frac{\ell_t(y_t|x_t)\, p_{t|t-1}(x_t)}{\int \ell_t(y_t|\xi)\, p_{t|t-1}(\xi)\, d\xi}. \tag{1.15}$$

### 1.3.3 The predictive distribution

In order to evaluate the predictive distribution (1.12), the first step is to compute the joint conditional density of the states $X_t$ and $X_{t-1}$,

$$p(x_t, x_{t-1}|y^{t-1}),$$

using Bayes' theorem and the Markov property of the state, see (1.6):

$$
\begin{aligned}
p(x_t, x_{t-1}|y^{t-1}) &= p(x_t|x_{t-1}, y^{t-1})\, p(x_{t-1}|y^{t-1}) \\
&= p(x_t|x_{t-1})\, p(x_{t-1}|y^{t-1}).
\end{aligned}
$$

Then, the marginal density of $X_t$ given the measurements $y^{t-1}$ can be obtained by marginalization over the state $X_{t-1}$, leading to the predictive distribution:

$$p_{t|t-1}(x_t) = p(x_t|y^{t-1}) = \int p(x_t, \xi|y^{t-1})\, d\xi = \int \phi_{t|t-1}(x_t|\xi)\, p_{t-1|t-1}(\xi)\, d\xi. \tag{1.16}$$

Equation (1.16) is called the Chapman-Kolmogorov equation.
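The prediction step (1.16) and the update step (1.15) can be made concrete by discretizing the state space, which turns the integrals into sums. Below is a minimal sketch of the resulting histogram ("grid") Bayes filter; the 1-D drifting-state model, the Gaussian likelihood and the synthetic measurements are hypothetical choices made for illustration:

```python
import numpy as np

# Histogram Bayes filter: the recursion (1.16)-(1.15) on a discretized
# 1-D state space. Hypothetical model: a state drifting one cell per
# step, observed by a noisy position sensor.
n = 50
grid = np.arange(n, dtype=float)

# Markov transition density phi_{t|t-1}(x_t | x_{t-1}) as a matrix whose
# column j is the distribution of x_t given x_{t-1} = j (mean j + 1).
phi = np.exp(-0.5 * (grid[:, None] - grid[None, :] - 1.0) ** 2)
phi /= phi.sum(axis=0, keepdims=True)

def likelihood(y, x):
    # Measurement likelihood ell_t(y_t | x_t): Gaussian, std of 2 cells.
    return np.exp(-0.5 * ((y - x) / 2.0) ** 2)

belief = np.full(n, 1.0 / n)     # prior p(x_0): uniform over the grid

for y in [5.0, 6.2, 7.1]:        # synthetic measurements
    predicted = phi @ belief                  # prediction step (1.16)
    belief = likelihood(y, grid) * predicted  # update step (1.15)
    belief /= belief.sum()       # denominator of (1.15): normalization

print(grid[np.argmax(belief)])   # grid cell with highest posterior probability
```

The matrix-vector product implements the Chapman-Kolmogorov integral exactly because, on a grid, the integral over $\xi$ becomes a sum over the previous belief.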

### 1.3.4 The prior distribution

Equations (1.15) and (1.16) allow the filtering distribution of the state to be computed recursively by means of a prediction step (1.16) and an update step (1.15). When the recursion starts, the result of the update step of the previous time step is not available. Hence, the result of the prediction step reduces to the prior distribution $p(x_0)$ of the state at time $t = 0$, $X_0$. This distribution is chosen by the user according to the available information on the state $X_0$.

### 1.3.5 Exact solution of the equations

The exact solution of the filtering equations presented in the previous sections is possible only for a small class of prior and likelihood distributions. For example, if they are approximated using Gaussian densities, a closed form of the filtering distribution exists and leads to the classical Bayesian filters such as the Kalman Filter and its variants, e.g. the Extended Kalman Filter and the Unscented Kalman Filter, which are suitable for nonlinear filtering problems.

Sometimes, the prior and the likelihood distributions, and hence the resulting filtering distribution, are non-Gaussian and multimodal. Using the classical Bayesian filters to tackle multimodal filtering problems may produce severely inaccurate results. In this respect, Monte Carlo methods, such as the particle filter algorithm described in the next section, allow this kind of distribution to be approximated.

### 1.4 Particle filtering

In this section a generic particle filter algorithm, also known as the Sequential Importance Resampling algorithm, is presented. First, the so-called Importance Sampling algorithm is derived using ideas taken from Monte Carlo methods. Then, a sequential version of importance sampling is derived, which is required to approximate the filtering distribution in a recursive fashion. Finally, the Sequential Importance Resampling algorithm is derived.

### 1.4.1 Monte Carlo sampling

The idea behind the importance sampling algorithm stems from the observation that knowing the value of the filtering distribution itself is not always required. Sometimes, it suffices to evaluate some kind of integral over the filtering distribution. An example is the conditional mean estimate of the state $X_t$,

$$E(g(X_t)|Y^t = y^t) = \int g(\xi)\, p_{t|t}(\xi)\, d\xi,$$

where $g(\cdot)$ is an arbitrary function. If the function $g(\cdot)$ takes the form

$$g(\cdot) = \phi_{t+1|t}(x_{t+1}|\cdot),$$

the predictive distribution is obtained,

$$p_{t+1|t}(x_{t+1}) = \int \phi_{t+1|t}(x_{t+1}|\xi)\, p_{t|t}(\xi)\, d\xi,$$

which will prove useful, in the following sections, to derive a MAP (Maximum A Posteriori) estimate of the state of the system.

Monte Carlo methods provide a numerical way to evaluate integrals of that form. The approximation requires drawing $N$ i.i.d. samples

$$x_t^{(i)} \sim p_{t|t}(x_t), \quad i = 1, \dots, N,$$

and then estimating the integral as

$$E(g(X_t)|Y^t = y^t) \approx \frac{1}{N} \sum_{i=1}^{N} g(x_t^{(i)}). \tag{1.17}$$
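A quick numerical illustration of (1.17), under the assumption that samples from the target density are directly available (here a standard Gaussian, chosen because $E(X^2) = 1$ exactly, so the error of the estimate is easy to judge):

```python
import numpy as np

# Monte Carlo approximation (1.17): estimate E(g(X)) with g(x) = x^2
# and X ~ N(0, 1), for which the exact value is 1.
rng = np.random.default_rng(42)
N = 100_000

samples = rng.standard_normal(N)   # N i.i.d. draws x^(i) from p(x)
estimate = np.mean(samples ** 2)   # (1/N) * sum_i g(x^(i))

print(estimate)                    # close to the exact value 1
```

The error of such an estimate shrinks as $1/\sqrt{N}$, independently of the dimension of the state, which is what makes Monte Carlo methods attractive for filtering.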

Unfortunately, drawing samples directly from $p_{t|t}(x_t)$ is very difficult for many reasons, the main one being that the filtering distribution is only available in a complicated form, see (1.16) and (1.15), involving the integral of the filtering distribution itself.

The importance sampling (IS) algorithm, presented in the next section, deals with this problem by sampling from an approximate distribution called the importance distribution. Usually, IS is presented as an algorithm that approximates the full posterior distribution

$$p_t(x^t) := p_{X^t|Y^t=y^t}(x^t|y^t).$$

Although this is different from the filtering distribution $p_{t|t}(x_t)$, it is an easy matter to obtain an approximation to the filtering distribution starting from the output of the IS algorithm, as will be shown later on.

### 1.4.2 Importance sampling

The IS algorithm (see, e.g., Liu, 2001 [14]) is a Monte Carlo method that approximates the full posterior distribution $p_t(x^t)$ by drawing samples from an approximate distribution called the importance distribution,

$$\pi(x^t|y^t).$$

Very few assumptions are made on the importance distribution:

1. it is easy to draw samples from it;

2. its support covers that of the posterior distribution, i.e.

$$p_t(\xi) > 0 \Rightarrow \pi(\xi|y^t) > 0 \quad \forall \xi. \tag{1.18}$$

The first step in the derivation requires expanding the posterior distribution using Bayes' theorem (1.1) as

$$
\begin{aligned}
p_t(x^t) &= p(x^t|y^t) \\
&= \frac{p(y^t|x^t)\, p(x^t)}{p(y^t)} \\
&= \frac{p(y^t|x^t)\, p(x^t)}{\int p(y^t|\xi)\, p(\xi)\, d\xi}
\end{aligned}
$$

where the term $p(y^t|x^t)$ is the measurements likelihood function and the term $p(x^t)$ is the prior distribution of the process. The next step is to rewrite the integral whose evaluation is of interest as

$$
\begin{aligned}
E(g(X^t)|Y^t = y^t) &= \int g(\xi)\, p_t(\xi)\, d\xi \\
&= \int g(\xi)\, \frac{p(y^t|\xi)\, p(\xi)}{\int p(y^t|\xi')\, p(\xi')\, d\xi'}\, d\xi \\
&= \frac{\int g(\xi)\, p(y^t|\xi)\, p(\xi)\, d\xi}{\int p(y^t|\xi)\, p(\xi)\, d\xi}.
\end{aligned}
$$

Using assumption (1.18), it is possible to manipulate the obtained expression by multiplying and dividing by the importance distribution, obtaining

$$\frac{\int g(\xi)\, p(y^t|\xi)\, p(\xi)\, d\xi}{\int p(y^t|\xi)\, p(\xi)\, d\xi} = \frac{\displaystyle\int \frac{p(y^t|\xi)\, p(\xi)}{\pi(\xi|y^t)}\, g(\xi)\; \pi(\xi|y^t)\, d\xi}{\displaystyle\int \frac{p(y^t|\xi)\, p(\xi)}{\pi(\xi|y^t)}\; \pi(\xi|y^t)\, d\xi}.$$

Both integrals represent the expectation of the term in front of $\pi(\xi|y^t)$ over the distribution $\pi(\xi|y^t)$ itself. Thus, they can be approximated using the Monte Carlo approximation by drawing $N$ i.i.d. samples from the importance distribution,

$$x^{t,(i)} \sim \pi(x^t|y^t), \quad i = 1, \dots, N,$$

leading to

$$
\begin{aligned}
E(g(X^t)|Y^t = y^t) &\approx \frac{\frac{1}{N}\sum_{i=1}^{N} \frac{p(y^t|x^{t,(i)})\, p(x^{t,(i)})}{\pi(x^{t,(i)}|y^t)}\, g(x^{t,(i)})}{\frac{1}{N}\sum_{j=1}^{N} \frac{p(y^t|x^{t,(j)})\, p(x^{t,(j)})}{\pi(x^{t,(j)}|y^t)}} \\
&= \sum_{i=1}^{N} \left( \frac{\frac{p(y^t|x^{t,(i)})\, p(x^{t,(i)})}{\pi(x^{t,(i)}|y^t)}}{\sum_{j=1}^{N} \frac{p(y^t|x^{t,(j)})\, p(x^{t,(j)})}{\pi(x^{t,(j)}|y^t)}} \right) g(x^{t,(i)}) \\
&= \sum_{i=1}^{N} \left( \frac{\tilde{w}^{(i)}}{\sum_{j=1}^{N} \tilde{w}^{(j)}} \right) g(x^{t,(i)}) \\
&= \sum_{i=1}^{N} w^{(i)}\, g(x^{t,(i)}).
\end{aligned} \tag{1.19}
$$
The weights

$$\tilde{w}^{(i)} = \frac{p(y^t|x^{t,(i)})\, p(x^{t,(i)})}{\pi(x^{t,(i)}|y^t)}$$

are called the unnormalized weights, while the weights

$$w^{(i)} = \frac{\tilde{w}^{(i)}}{\sum_{j=1}^{N} \tilde{w}^{(j)}}$$

are called the normalized weights. By recalling the sifting property of the Dirac delta function $\delta(\cdot)$, it is possible to rewrite (1.19) as

$$
\begin{aligned}
E(g(X^t)|Y^t = y^t) &= \int g(\xi)\, p_t(\xi)\, d\xi \\
&\approx \sum_{i=1}^{N} w^{(i)}\, g(x^{t,(i)}) \\
&= \sum_{i=1}^{N} \int w^{(i)}\, \delta(\xi - x^{t,(i)})\, g(\xi)\, d\xi \\
&= \int g(\xi) \left( \sum_{i=1}^{N} w^{(i)}\, \delta(\xi - x^{t,(i)}) \right) d\xi \\
&= \int g(\xi)\, \hat{p}_t(\xi)\, d\xi
\end{aligned}
$$

where

$$\hat{p}_t(x^t) = \sum_{i=1}^{N} w^{(i)}\, \delta(x^t - x^{t,(i)})$$

is the Monte Carlo approximation to the posterior distribution. The resulting algorithm is the following.

Algorithm 1. Importance sampling

Given a measurements likelihood function $p(y^t|x^t)$ and a prior distribution $p(x^t)$, the posterior distribution $p_t(x^t)$ can be approximated as follows.

1. Draw $N$ samples from the importance distribution:

$$x^{t,(i)} \sim \pi(x^t|y^t), \quad i = 1, \dots, N.$$

2. Compute the unnormalized and normalized weights as:

$$\tilde{w}^{(i)} = \frac{p(y^t|x^{t,(i)})\, p(x^{t,(i)})}{\pi(x^{t,(i)}|y^t)} \tag{1.20}$$

$$w^{(i)} = \frac{\tilde{w}^{(i)}}{\sum_{j=1}^{N} \tilde{w}^{(j)}}.$$

3. The approximation to the posterior $p_t(x^t)$ is then given as:

$$\hat{p}_t(x^t) = \sum_{i=1}^{N} w^{(i)}\, \delta(x^t - x^{t,(i)}).$$

The performance of the IS algorithm depends strongly on the quality of the importance distribution, as will be discussed later on. Performance also depends on the choice of the likelihood function and of the prior distribution, and on how well they describe, respectively, the sensors used to collect the measurements and the distribution of the process $X^t$. All these choices will clearly affect the ability of the algorithm to tackle the filtering problem, especially when severe nonlinearities are involved.

As regards multimodality, the IS algorithm has an advantage with respect to classical Bayesian filters, in that the posterior distribution is represented using multiple samples.
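Algorithm 1 can be sketched in a few lines for a scalar Gaussian model, where the exact posterior is known in closed form and can serve as a sanity check. The prior $N(0,1)$, the measurement noise $N(0, 0.5^2)$ and the importance distribution $N(0, 2^2)$ below are all hypothetical choices made for illustration:

```python
import numpy as np

# Importance sampling (Algorithm 1) on a scalar Gaussian model:
# prior x ~ N(0, 1), measurement y = x + noise with noise ~ N(0, 0.5^2),
# broad importance distribution pi(x|y) = N(0, 2^2).
rng = np.random.default_rng(1)
y = 1.0
N = 50_000

def gauss(v, mu, sigma):
    # Gaussian PDF evaluated at v.
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# 1. Draw N samples from the importance distribution.
x = rng.normal(0.0, 2.0, N)

# 2. Unnormalized weights (1.20): likelihood * prior / importance.
w_tilde = gauss(y, x, 0.5) * gauss(x, 0.0, 1.0) / gauss(x, 0.0, 2.0)
w = w_tilde / w_tilde.sum()              # normalized weights

# 3. The weighted samples approximate the posterior; e.g. its mean.
post_mean = np.sum(w * x)
print(post_mean)   # close to the exact posterior mean y / (1 + 0.5**2) = 0.8
```

Because the model is linear-Gaussian, the exact posterior mean is available analytically, which makes the quality of the weighted approximation easy to verify.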

### 1.4.3 Sequential importance sampling

In order to obtain a sequential version of the IS algorithm, called sequential importance sampling (SIS) (see, e.g., Doucet et al., 2001 [3]), the proposal distribution is expanded using the definition of the conditional probability density function as

$$
\begin{aligned}
\pi(x^t|y^t) &= \pi(x_t, x^{t-1}|y^t) \\
&= \pi(x_t|x^{t-1}, y^t)\, \pi(x^{t-1}|y^t).
\end{aligned}
$$

By making the hypothesis that the distribution of $X^{t-1}$ given $Y^{t-1} = y^{t-1}$ does not depend on the value assumed by the future measurement $Y_t$, the term $\pi(x^{t-1}|y^t)$ may be written as

$$
\begin{aligned}
\pi(x^{t-1}|y^t) &= \pi(x^{t-1}|y_t, y^{t-1}) \\
&= \pi(x^{t-1}|y^{t-1}),
\end{aligned}
$$

resulting in the following factorization of the importance distribution:

$$\pi(x^t|y^t) = \pi(x_t|x^{t-1}, y^t)\, \pi(x^{t-1}|y^{t-1}). \tag{1.21}$$

Then, (1.21) is substituted into the definition of the unnormalized weight (1.20), obtaining

$$\tilde{w} = \frac{p(y^t|x^t)\, p(x^t)}{\pi(x_t|x^{t-1}, y^t)\, \pi(x^{t-1}|y^{t-1})}.$$

If this expression is multiplied and divided by the factor $p(y^{t-1}|x^{t-1})\, p(x^{t-1})$, a recursive version of the unnormalized weight is obtained as

$$
\begin{aligned}
\tilde{w} &= \frac{p(y^t|x^t)\, p(x^t)}{\pi(x_t|x^{t-1}, y^t)\, \pi(x^{t-1}|y^{t-1})}\, \frac{p(y^{t-1}|x^{t-1})\, p(x^{t-1})}{p(y^{t-1}|x^{t-1})\, p(x^{t-1})} \\
&= \frac{p(y^t|x^t)\, p(x^t)}{\pi(x_t|x^{t-1}, y^t)\, p(y^{t-1}|x^{t-1})\, p(x^{t-1})}\, \frac{p(y^{t-1}|x^{t-1})\, p(x^{t-1})}{\pi(x^{t-1}|y^{t-1})} \\
&= \frac{p(y^t|x^t)\, p(x^t)}{\pi(x_t|x^{t-1}, y^t)\, p(y^{t-1}|x^{t-1})\, p(x^{t-1})}\, \tilde{w}_{t-1} := \tilde{w}_t
\end{aligned}
$$

where the unnormalized weight at time $t$, $\tilde{w}_t$, has been defined as

$$\tilde{w}_t = \frac{p(y^t|x^t)\, p(x^t)}{\pi(x_t|x^{t-1}, y^t)\, p(y^{t-1}|x^{t-1})\, p(x^{t-1})}\, \tilde{w}_{t-1}. \tag{1.22}$$

Using the chain rule (1.3), the prior $p(x^t)$ can be expanded as

$$p(x^t) = \prod_{k=0}^{t} p(x_k|x^{k-1}).$$

By application of the Markov property of states (1.6) to each factor of the product, the expression reduces to

$$p(x^t) = \prod_{k=0}^{t} p(x_k|x_{k-1}) = p(x_t|x_{t-1}) \prod_{k=0}^{t-1} p(x_k|x_{k-1}). \tag{1.23}$$

Using again the chain rule (1.3), the measurements likelihood function can be expanded as

$$p(y^t|x^t) = \prod_{k=0}^{t} p(y_k|y^{k-1}, x^t).$$

By application of the conditional independence of measurements (1.9) to each factor of the product, the expression reduces to

$$p(y^t|x^t) = \prod_{k=0}^{t} p(y_k|x_k) = p(y_t|x_t) \prod_{k=0}^{t-1} p(y_k|x_k). \tag{1.24}$$

Taking into account equations (1.23) and (1.24), the expression of the unnormalized weight (1.22) can be simplified as

$$\tilde{w}_t = \tilde{w}_{t-1}\; \frac{p(y_t|x_t) \prod_{k=0}^{t-1} p(y_k|x_k)}{\prod_{k=0}^{t-1} p(y_k|x_k)}\; \frac{p(x_t|x_{t-1}) \prod_{k=0}^{t-1} p(x_k|x_{k-1})}{\prod_{k=0}^{t-1} p(x_k|x_{k-1})}\; \frac{1}{\pi(x_t|x^{t-1}, y^t)}$$

which, after cancellation between the numerators and the denominators, gives the final recursion

$$\tilde{w}_t = \tilde{w}_{t-1}\, \frac{p(y_t|x_t)\, p(x_t|x_{t-1})}{\pi(x_t|x^{t-1}, y^t)}.$$

By recalling the definitions of the Markov transition density (1.7) and of the measurement likelihood function (1.10), the recursion is written as

$$\tilde{w}_t = \tilde{w}_{t-1}\, \frac{\ell_t(y_t|x_t)\, \phi_{t|t-1}(x_t|x_{t-1})}{\pi(x_t|x^{t-1}, y^t)}. \tag{1.25}$$

The recursion of the weights allows a new weight $\tilde{w}_t$ to be evaluated, at each time step $t$, using the weight of the previous step, $\tilde{w}_{t-1}$. The same principle can be used to draw samples $x^{t,(i)} \sim \pi(x^t|y^t)$ using the output of the previous steps. Indeed, since the proposal distribution has been factored as

$$\pi(x^t|y^t) = \pi(x_t|x^{t-1}, y^t)\, \pi(x^{t-1}|y^{t-1}),$$

it is possible to draw the samples $x^{t,(i)}$ by drawing the new state samples for step $t$ as

$$x_t^{(i)} \sim \pi(x_t|x^{t-1}, y^t)$$

and augmenting each existing sample as

$$x^{t,(i)} = \{x_t^{(i)}, x^{t-1,(i)}\}.$$

The generic sequential importance sampling algorithm can then be described as follows.

Algorithm 2. Sequential importance sampling

1. Draw N i.i.d. samples x^{0,(i)} = {x_0^{(i)}} from the prior:

   x_0^{(i)} ∼ p(x_0), i = 1, …, N.

2. For each t = 1, …, T:

   (a) Draw new state samples x_t^{(i)} from the importance distributions:

       x_t^{(i)} ∼ π(x_t|x_{t−1}, y_t), i = 1, …, N.

   (b) Augment the existing samples using the new state samples:

       x^{t,(i)} = {x_t^{(i)}, x^{t−1,(i)}}.

   (c) Calculate the new unnormalized weights as:

       w̃_t^{(i)} = w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}) φ_{t|t−1}(x_t^{(i)}|x_{t−1}^{(i)}) / π(x_t^{(i)}|x^{t−1,(i)}, y_t).

   (d) Normalize the weights using:

       w_t^{(i)} = w̃_t^{(i)} / Σ_{j=1}^{N} w̃_t^{(j)}.

   (e) The approximation to the posterior p_t(x^t) is then given as:

       p̂_t(x^t) = Σ_{i=1}^{N} w_t^{(i)} ^{(t+1)}δ(x^t − x^{t,(i)}).  (1.26)

In order to obtain an approximation to the filtering distribution, it suffices to perform a marginalization over the sequence of states x^{t−1}:

p̂_{t|t}(x_t) = ∫ p̂_t(x^t) dx^{t−1}
  = ∫ ( Σ_{i=1}^{N} w_t^{(i)} ^{(t+1)}δ(x^t − x^{t,(i)}) ) dx^{t−1}
  = Σ_{i=1}^{N} w_t^{(i)} ∫ ^{(t+1)}δ(x^t − x^{t,(i)}) dx^{t−1}

where ^{(t+1)}δ denotes the (t+1)-fold multidimensional Dirac delta. By recalling the following property of the multidimensional Dirac delta function

^{(t+1)}δ(x^t) = δ(x_t) δ(x_{t−1}) … δ(x_0) = δ(x_t) ^{(t)}δ(x^{t−1})

the inner integral can be expanded as

∫ ^{(t+1)}δ(x^t − x^{t,(i)}) dx^{t−1} = ∫ δ(x_t − x_t^{(i)}) ^{(t)}δ(x^{t−1} − x^{t−1,(i)}) dx^{t−1}
  = δ(x_t − x_t^{(i)}) ∫ ^{(t)}δ(x^{t−1} − x^{t−1,(i)}) dx^{t−1}
  = δ(x_t − x_t^{(i)})

leading to

p̂_{t|t}(x_t) = Σ_{i=1}^{N} w_t^{(i)} δ(x_t − x_t^{(i)}).

In summary, the weights that approximate the filtering distribution are the same weights that approximate the full posterior distribution.

A common assumption used in the approximation to the filtering distribution is that the importance distribution is Markovian, in the sense that

π(x_t|x^{t−1}, y^t) = π(x_t|x_{t−1}, y_t).

With this choice, at the end of each recursion, the algorithm does not need to store the entire histories of the states

x^{t,(i)}, i = 1, …, N  (1.27)

but only the current states

x_t^{(i)}, i = 1, …, N.  (1.28)
The resulting algorithm is the following.

Algorithm 3. Filtering distribution approximation with SIS

1. Draw N i.i.d. samples {x_0^{(i)}} from the prior:

   x_0^{(i)} ∼ p(x_0), i = 1, …, N.

2. For each t = 1, …, T:

   (a) Draw new state samples x_t^{(i)} from the importance distributions:

       x_t^{(i)} ∼ π(x_t^{(i)}|x_{t−1}^{(i)}, y_t), i = 1, …, N.

   (b) Calculate the new unnormalized weights as:

       w̃_t^{(i)} = w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}) φ_{t|t−1}(x_t^{(i)}|x_{t−1}^{(i)}) / π(x_t^{(i)}|x_{t−1}^{(i)}, y_t).

   (c) Normalize the weights using:

       w_t^{(i)} = w̃_t^{(i)} / Σ_{j=1}^{N} w̃_t^{(j)}.

   (d) The approximation to the posterior p_{t|t}(x_t) is then given as:

       p̂_{t|t}(x_t) = Σ_{i=1}^{N} w_t^{(i)} δ(x_t − x_t^{(i)}).  (1.29)
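As a concrete illustration, algorithm 3 can be sketched in a few lines of Python for a scalar Gaussian random walk observed in Gaussian noise. This is only a sketch under a simplifying assumption: the bootstrap choice π(x_t|x_{t−1}, y_t) = φ_{t|t−1}(x_t|x_{t−1}), under which the transition density cancels in step (b) and only the likelihood factor survives. The model, noise levels and particle count below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500               # number of particles
q, r = 0.1, 0.5       # process and measurement noise standard deviations

def sis_step(x, w, y_t):
    """One recursion of algorithm 3 with the bootstrap proposal
    pi(x_t | x_{t-1}, y_t) = phi(x_t | x_{t-1}): the transition density
    cancels in the weight update, leaving only the likelihood factor."""
    x_new = x + q * rng.standard_normal(x.shape)    # step (a): draw from proposal
    lik = np.exp(-0.5 * ((y_t - x_new) / r) ** 2)   # Gaussian likelihood l_t(y_t|x_t)
    w_new = w * lik                                 # step (b): unnormalized update
    return x_new, w_new / w_new.sum()               # step (c): normalize

# Track a slowly drifting scalar state from noisy measurements.
x = rng.standard_normal(N)            # x_0^(i) ~ p(x_0) = N(0, 1)
w = np.full(N, 1.0 / N)
state = 0.0
for t in range(50):
    state += q * rng.standard_normal()         # true (hidden) state
    y_t = state + r * rng.standard_normal()    # measurement
    x, w = sis_step(x, w, y_t)

estimate = np.sum(w * x)              # weighted mean of the particle set
```

Running this for many steps also exhibits the degeneracy problem discussed next: the normalized weights concentrate on ever fewer particles.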

### 1.4.4

### Sequential importance resampling particle filtering

The SIS algorithm is affected by a problem known as the degeneracy problem: after a certain number of steps, almost all the samples have a very small weight. This fact has two main consequences:

1. a high computational load is devoted to processing samples that do not contribute to the approximation to the target distribution;

2. the performance of the algorithm degrades in the long run, because the target distribution is represented with very few samples.

The degeneracy problem can be solved by using resampling strategies. The main idea is to use the information given by the weights to drop useless samples, i.e. those having a small weight, and replicate samples with large weights. This procedure may be performed at every time step or only when a certain condition on the current number of effective samples is satisfied. The latter solution is called adaptive sampling.

In the adaptive sampling method, the value of the weights is used to estimate the effective number of samples as

N_eff(t) ≈ 1 / Σ_{i=1}^{N} (w_t^{(i)})².

Then, resampling is performed when N_eff is lower than a given threshold N_thr.
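The effective-sample-size estimate is a one-line computation on the normalized weights; a small sketch with illustrative weight vectors:

```python
import numpy as np

def effective_sample_size(w):
    """N_eff(t) = 1 / sum_i (w_t^(i))^2 for normalized weights w."""
    w = np.asarray(w, dtype=float)
    return 1.0 / np.sum(w ** 2)

# Uniform weights give the maximum N_eff = N; a fully degenerate
# weight vector gives the minimum N_eff = 1.
n_uniform = effective_sample_size([0.25, 0.25, 0.25, 0.25])   # 4.0
n_degenerate = effective_sample_size([1.0, 0.0, 0.0, 0.0])    # 1.0
```

The two extreme cases show why this quantity is a natural degeneracy indicator: it ranges from N (all particles useful) down to 1 (a single particle carries all the weight).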

Adding a resampling step to the SIS algorithm gives the so-called sequential importance resampling (SIR) algorithm (see, e.g., Doucet et al., 2001 [3]) or particle filter (PF). The name is due to the fact that the approximation to the filtering distribution is completely described by the set of particles

{(w_t^{(i)}, x_t^{(i)}), i = 1, …, N}.

It should be noted that in some texts, see, e.g., Ristic et al., 2004 [19], the SIR algorithm performs resampling at each step, while in other texts, see, e.g., Sarkka, 2013 [22], it performs resampling only when the number of effective particles is below the given threshold.

The particle filter algorithm is the following.

Algorithm 4. Particle filter

1. Draw N i.i.d. samples {x_0^{(i)}} from the prior

   x_0^{(i)} ∼ p(x_0), i = 1, …, N

   and set w_0^{(i)} = 1/N.

2. For each t = 1, …, T:

   (a) Draw new state samples x_t^{(i)} from the importance distributions:

       x_t^{(i)} ∼ π(x_t^{(i)}|x_{t−1}^{(i)}, y_t), i = 1, …, N.

   (b) Calculate the new unnormalized weights as:

       w̃_t^{(i)} = w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}) φ_{t|t−1}(x_t^{(i)}|x_{t−1}^{(i)}) / π(x_t^{(i)}|x_{t−1}^{(i)}, y_t).

   (c) Normalize the weights using:

       w_t^{(i)} = w̃_t^{(i)} / Σ_{j=1}^{N} w̃_t^{(j)}.

   (d) If N_eff(t) < N_thr, perform resampling.

   (e) The approximation to the posterior p_{t|t}(x_t) is then given as:

       p̂_{t|t}(x_t) = Σ_{i=1}^{N} w_t^{(i)} δ(x_t − x_t^{(i)}).  (1.30)

The adaptive sampling method provides a criterion to decide when to perform resampling but does not provide a method to perform resampling. Among the proposed solutions available in the literature, systematic resampling is one of the most used due to its reduced computational load, O(N), and its simplicity.

The systematic resampling algorithm considers the normalized weights w_t^{(i)} as if they formed the probability mass function (pmf) of a discrete random variable. It should be noted that the normalized weights can be used as a pmf since their sum is equal to 1 due to the normalization procedure. Then, it forms the cumulative distribution function (cdf) associated to the pmf by evaluating the cumulative sum of weights (csw) c_i as

c_1 = w_t^{(1)}
c_i = c_{i−1} + w_t^{(i)}, i ∈ [2, N].

Given the definition of cdf, the csw c_i are such that c_i ∈ [0, 1] and c_N = 1.

In order to perform resampling, N indexes u_i are extracted as follows

u_1 ∼ U(0, N^{−1})
u_i = u_{i−1} + N^{−1}, i ∈ [2, N].

According to their definition, the indexes are such that u_i ∈ [0, 1], hence they can be compared to the csw c_i. In order to extract N new particles, each index u_j is used to move along the csw until a weight w_t^{(i(j))} is found such that

u_j ∈ [c, c + w_t^{(i(j))}]

where c is the csw value immediately preceding w_t^{(i(j))}. The index i(j) is the index of the newly extracted particle. It is clear that the larger the weight, the higher the number of indexes that will belong to the interval occupied by that weight.

Since the interval [0, 1] is sampled uniformly by the choice of the indexes u_i, after each extraction the weight associated to each new particle is set to 1/N. The resulting algorithm is the following.

Algorithm 5. Systematic resampling

1. Initialize the csw as:

   c_1 = w_t^{(1)}
   c_i = c_{i−1} + w_t^{(i)}, i ∈ [2, N].

2. Start at the bottom of the csw:

   i = 1.  (1.31)

3. Draw an initial condition for the indexes u_j:

   u_1 ∼ U(0, N^{−1}).

4. For each j = 1, …, N:

   (a) Move along the csw:

       u_j = u_1 + N^{−1}(j − 1).

   (b) While u_j > c_i:

       i = i + 1.

   (c) Assign the newly extracted particle:

       x_t^{(j)∗} = x_t^{(i)}.

   (d) Assign the new weight:

       w_t^{(j)} = N^{−1}.
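A vectorized Python sketch of algorithm 5, where a binary search over the csw replaces the explicit while loop; the comparison points u_j and the csw play exactly the roles described above, and the example weights are illustrative:

```python
import numpy as np

def systematic_resample(weights, rng):
    """Systematic resampling (alg. 5): select N particle indexes with
    probability proportional to the normalized weights, in O(N)."""
    N = len(weights)
    c = np.cumsum(weights)                   # cumulative sum of weights (csw)
    c[-1] = 1.0                              # guard against rounding errors
    u = rng.uniform(0.0, 1.0 / N) + np.arange(N) / N   # u_j = u_1 + (j-1)/N
    return np.searchsorted(c, u)             # index i(j) for each u_j

rng = np.random.default_rng(0)
idx = systematic_resample(np.array([0.1, 0.1, 0.7, 0.1]), rng)
# after resampling, every new particle carries the weight 1/N
```

With the weights above, the particle carrying weight 0.7 occupies 70% of the interval [0, 1] and is therefore replicated two or three times out of four draws, regardless of u_1.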

### 1.5

### Unscented particle filtering

The Unscented Particle Filter (UPF) is a particle filter that uses the Unscented Kalman Filter (UKF) as the importance distribution. The algorithm is particularly suited to filtering problems involving stochastic state space models of non-linear systems and shows better performance with respect to other variants of the particle filter using, e.g., the Extended Kalman Filter (EKF) as the importance distribution. In this section, first a numerical method known as Unscented Transform, that is used within the UKF, is presented. Then, the equations of the UKF and UPF algorithms are reported.

### 1.5.1

### Unscented transform

The unscented transform (UT) (Julier et al., 2000 [11]) is a numerical method that estimates the statistics of a random variable that undergoes a non-linear transformation. More formally, given a random variable X ∈ R^n having mean X̄ = E(X) and covariance P_X = Cov(X), the UT estimates the statistics of the random variable

Y = g(X)  (1.32)

with g(·) a generic function.

Differently from other methods based on linearization, the UT directly approximates the mean and covariance of the target distribution of Y instead of trying to approximate the non-linear function g(·). In more detail, the UT is a second order method in the sense that the estimate of the mean of Y is exact for polynomials up to order two (three if X is distributed according to a Gaussian distribution).

The core idea behind the UT is to use a fixed number of points, called sigma points, that are chosen deterministically in such a way that the mean and the covariance of the distribution of X are captured exactly. Then, the sigma points are propagated through the non-linearity and the mean and covariance of Y are estimated from them.

The complete procedure follows.

1. Form a set of 2n + 1 sigma points:

   X^{(0)} = X̄,
   X^{(i)} = X̄ + √(n + λ) [√P_X]_i,
   X^{(i+n)} = X̄ − √(n + λ) [√P_X]_i, i = 1, …, n

   where the operator [·]_i indicates the i-th column of its argument and λ is a scaling parameter that is defined in terms of the UT parameters α and κ as

   λ = α²(n + κ) − n.  (1.33)

   The parameters α and κ can be used to decide the spread of the sigma points around the mean (Wan and Van der Merwe, 2000 [26]). The matrix square root can be evaluated using the singular value decomposition (SVD) as

   P_X = U S V^T,  √P_X = U √S.

2. Propagate the sigma points through the non-linearity:

   Y^{(i)} = g(X^{(i)}), i = 0, …, 2n.

3. Estimate the mean and covariance of g(X) as

   E(Y) ≈ Ȳ^U = Σ_{i=0}^{2n} W_i^{(m)} Y^{(i)}
   Cov(Y) ≈ P_Y^U = Σ_{i=0}^{2n} W_i^{(c)} (Y^{(i)} − Ȳ^U)(Y^{(i)} − Ȳ^U)^T  (1.34)

   where the weights W_i^{(m)} and W_i^{(c)} are constant and have the expression

   W_0^{(m)} = λ / (n + λ),
   W_0^{(c)} = λ / (n + λ) + (1 − α² + β),
   W_i^{(m)} = W_i^{(c)} = 1 / (2(n + λ)), i = 1, …, 2n.  (1.35)

The UT parameter β ≥ 0 can be used to incorporate prior knowledge of the distribution of X. For example, for Gaussian distributions, β = 2 is a good choice.

As regards the other UT parameters, κ ≥ 0 guarantees that the estimated covariance is positive semi-definite while α should ideally be a small value, typically α ∈ [0, 1], to avoid sampling non-local effects.
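The whole procedure fits in a short Python function; the SVD-based matrix square root and the weights (1.35) follow the text, while the default parameter values are only illustrative. Since the UT is exact for linear functions, a linear g gives a simple sanity check.

```python
import numpy as np

def unscented_transform(x_mean, P, g, alpha=0.1, beta=2.0, kappa=0.0):
    """Unscented transform: propagate 2n+1 sigma points through g and
    estimate the mean and covariance of Y = g(X)."""
    n = x_mean.size
    lam = alpha ** 2 * (n + kappa) - n          # scaling parameter (1.33)
    # Matrix square root via SVD: P = U S V^T, sqrt(P) = U sqrt(S).
    U, S, _ = np.linalg.svd(P)
    sqrt_P = U @ np.diag(np.sqrt(S))
    # Sigma points: the mean and the mean +/- scaled columns of sqrt(P).
    X = np.vstack([x_mean,
                   x_mean + np.sqrt(n + lam) * sqrt_P.T,
                   x_mean - np.sqrt(n + lam) * sqrt_P.T])
    # Constant weights (1.35).
    Wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    Wc = Wm.copy()
    Wm[0] = lam / (n + lam)
    Wc[0] = lam / (n + lam) + (1.0 - alpha ** 2 + beta)
    # Propagate the sigma points and estimate the statistics of Y.
    Y = np.array([g(x) for x in X])
    y_mean = Wm @ Y
    d = Y - y_mean
    return y_mean, (Wc[:, None] * d).T @ d

# Sanity check with a linear map: mean -> A x_mean, covariance -> A P A^T.
A = np.array([[2.0, 0.0], [0.0, 3.0]])
y_mean, y_cov = unscented_transform(np.array([1.0, 2.0]), np.eye(2),
                                    lambda x: A @ x)
```

For this linear map the estimates coincide with the exact values A X̄ = [2, 6]^T and A P_X A^T up to rounding, as expected from the second-order exactness of the method.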

Often (see, e.g., Sarkka, 2013 [22]), a Gaussian interpretation is given to the UT: the variable X is considered to be normal,

X ∼ N(X̄, P_X),

and the distribution of Y is approximated using a normal distribution,

Y ∼ N(Ȳ^U, P_Y^U),

where Ȳ^U and P_Y^U are those defined in (1.34).

A Gaussian noise term Q can also be taken into account in the transformed variable in two forms, namely additive and non-additive. In the additive form, the transformed variable assumes the expression

Ỹ = g(X) + Q = Y + Q

with Y defined in (1.32), while in the non-additive form

Ỹ = g(X, Q).

Also, it is assumed that X and Q are independent. The UT can then be used to estimate the statistics of the extended random variable

[X^T Ỹ^T]^T.

These statistics are required for the development of the Unscented Kalman Filter algorithm.

In the following, only the additive case is considered and the stacked vector [X^T Ỹ^T]^T is denoted compactly as (X, Ỹ). The mean of the extended variable is computed as

E(X, Ỹ) = E(X, Y + Q) = (E(X), E(Y)) ≈ (X̄, Ȳ^U)

since Q has zero mean. The covariance matrix of the extended variable is computed as

E[ ((X, Ỹ) − E(X, Ỹ)) ((X, Ỹ) − E(X, Ỹ))^T ]
  = E[(X, Ỹ)(X, Ỹ)^T] − E(X, Ỹ) E(X, Ỹ)^T
  = [ E(XX^T) − E(X)E(X)^T    E(XỸ^T) − E(X)E(Ỹ)^T
      E(ỸX^T) − E(Ỹ)E(X)^T    E(ỸỸ^T) − E(Ỹ)E(Ỹ)^T ]
  = [ P_X            Cov(X, Ỹ)
      Cov(X, Ỹ)^T    Cov(Ỹ, Ỹ) ]
  = [ P_X                          Cov(X, Y) + Cov(X, Q)
      (Cov(X, Y) + Cov(X, Q))^T    Cov(Y, Y) + Cov(Y, Q) + Cov(Q, Y) + Cov(Q, Q) ]
  = [ P_X            Cov(X, Y)
      Cov(X, Y)^T    Cov(Y, Y) + P_Q ]
  ≈ [ P_X             P_{XY}^U
      (P_{XY}^U)^T    P_Y^U + P_Q ]

where the UT estimate P_Y^U in (1.34), the UT estimate of the cross-covariance

P_{XY}^U = Σ_{i=0}^{2n} W_i^{(c)} (X^{(i)} − X̄)(Y^{(i)} − Ȳ^U)^T

and the fact that X and Q are independent (hence Y = g(X) and Q are independent as well) have been used.

The resulting algorithm follows.

Algorithm 6. Unscented approximation of an additive transform

The Gaussian approximation to the distribution of the stacked vector [X^T Ỹ^T]^T, where X ∼ N(X̄, P_X), Ỹ = g(X) + Q, Q ∼ N(0, P_Q) and X and Q are independent variables, is given as

[X^T Ỹ^T]^T ∼ N( [X̄^T (Ȳ^U)^T]^T, [ P_X, P_{XY}^U; (P_{XY}^U)^T, P_Y^U + P_Q ] )

where the subvectors and submatrices are computed as follows.

1. Form the set of 2n + 1 sigma points:

   X^{(0)} = X̄,
   X^{(i)} = X̄ + √(n + λ) [√P_X]_i,
   X^{(i+n)} = X̄ − √(n + λ) [√P_X]_i, i = 1, …, n

   where the parameter λ is defined in (1.33).

2. Propagate the sigma points through the non-linearity:

   Y^{(i)} = g(X^{(i)}), i = 0, …, 2n.

3. The subvectors and submatrices are then computed as:

   Ȳ^U = Σ_{i=0}^{2n} W_i^{(m)} Y^{(i)}
   P_Y^U = Σ_{i=0}^{2n} W_i^{(c)} (Y^{(i)} − Ȳ^U)(Y^{(i)} − Ȳ^U)^T
   P_{XY}^U = Σ_{i=0}^{2n} W_i^{(c)} (X^{(i)} − X̄)(Y^{(i)} − Ȳ^U)^T

   where the weights W_i^{(m)} and W_i^{(c)} are defined in (1.35).

### 1.5.2

### Unscented Kalman filter

The Unscented Kalman Filter (UKF) (Wan and Van der Merwe, 2000 [26]) is an approximate minimum mean square error estimation algorithm that utilizes the unscented transform. Similarly to the Extended Kalman Filter, it can be used for approximating the filtering distribution of stochastic state space models of the form

x_t = f(x_{t−1}, q_{t−1})
y_t = h(x_t, r_t)  (1.36)

where f(·) and h(·) are generic functions, q_{t−1} ∼ N(0, Q_{t−1}) is the Gaussian process noise and r_t ∼ N(0, R_t) is the Gaussian measurement noise. If the process and measurement noises are assumed to be additive, the model takes the form

x_t = f(x_{t−1}) + q_{t−1}
y_t = h(x_t) + r_t.  (1.37)

In the following, the model with additive noise will be considered.

The UKF algorithm is obtained by solving the recursive Bayes filtering equations (1.16) and (1.15) under the hypothesis of a Gaussian prior

p_{0|0}(x_0) = N(x_0; x̄_0, P_0)

and using the unscented transform to obtain Gaussian approximations to the distribution of the predicted state

x_t = f(x_{t−1}) + q_{t−1},

to the distribution of the predicted measurement

y_t = h(x_t) + r_t

and to the joint distribution of the predicted state x_t and the predicted measurement y_t. The output of the algorithm is then a Gaussian approximation to the filtering distribution

p_{t|t}(x_t) ≈ N(x_t; x̄_t, P_t).  (1.38)

The point estimate is extracted as the conditional expectation

x̂_t = E(X_t|Y^t = y^t) ≈ x̄_t.

Since the UKF algorithm is obtained using the UT approximation, the conditional expectation obtained by integration of the approximate filtering distribution, x̄_t, does not correspond to the real minimum mean square error estimate, i.e. the UKF is suboptimal. However, the properties of the UT allow obtaining better results when compared to the Extended Kalman Filter, which uses first-order Taylor approximations to the distributions.

The UKF algorithm is presented in the following.

Algorithm 7. Unscented Kalman Filter (additive noise)

Given a state space model of the form

x_t = f(x_{t−1}) + q_{t−1}
y_t = h(x_t) + r_t

where q_{t−1} ∼ N(0, Q_{t−1}) and r_t ∼ N(0, R_t), the approximate minimum mean square error estimate is obtained recursively as follows.

1. Prediction:

   (a) Form the sigma points:

       X_{t−1}^{(0)} = x̄_{t−1},
       X_{t−1}^{(i)} = x̄_{t−1} + √(n + λ) [√P_{t−1}]_i,
       X_{t−1}^{(i+n)} = x̄_{t−1} − √(n + λ) [√P_{t−1}]_i, i = 1, …, n

       where the parameter λ is defined in (1.33).

   (b) Propagate the sigma points through the dynamic model:

       X̂_t^{(i)} = f(X_{t−1}^{(i)}), i = 0, …, 2n.

   (c) Compute the predicted mean x̄_t^− and the predicted covariance P_t^−:

       x̄_t^− = Σ_{i=0}^{2n} W_i^{(m)} X̂_t^{(i)}
       P_t^− = Σ_{i=0}^{2n} W_i^{(c)} (X̂_t^{(i)} − x̄_t^−)(X̂_t^{(i)} − x̄_t^−)^T + Q_{t−1}

       where the weights W_i^{(m)} and W_i^{(c)} are defined in (1.35).

2. Correction:

   (a) Propagate the predicted sigma points through the measurement model:

       Ŷ_t^{(i)} = h(X̂_t^{(i)}), i = 0, …, 2n.

   (b) Compute the predicted mean of the measurement ȳ_t, the predicted covariance of the measurement P_{ȳ_t} and the cross-covariance between the predicted state and the predicted measurement P_{x̄_t^− ȳ_t}:

       ȳ_t = Σ_{i=0}^{2n} W_i^{(m)} Ŷ_t^{(i)}
       P_{ȳ_t} = Σ_{i=0}^{2n} W_i^{(c)} (Ŷ_t^{(i)} − ȳ_t)(Ŷ_t^{(i)} − ȳ_t)^T + R_t
       P_{x̄_t^− ȳ_t} = Σ_{i=0}^{2n} W_i^{(c)} (X̂_t^{(i)} − x̄_t^−)(Ŷ_t^{(i)} − ȳ_t)^T.

   (c) Compute the filter gain K_t, the estimate x̄_t and the covariance P_t:

       K_t = P_{x̄_t^− ȳ_t} (P_{ȳ_t})^{−1}
       x̄_t = x̄_t^− + K_t (y_t − ȳ_t)
       P_t = P_t^− − K_t P_{ȳ_t} K_t^T.  (1.39)
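For a scalar state (n = 1) the whole recursion fits in a few lines; the following Python sketch reuses the predicted sigma points in the correction, as prescribed by step 2(a), and the parameter values are only illustrative. With linear f and h it reduces to the standard Kalman filter update, which gives a simple sanity check.

```python
import numpy as np

def ukf_step(x_mean, P, y_t, f, h, Q, R, alpha=1.0, beta=2.0, kappa=2.0):
    """One prediction/correction cycle of algorithm 7 for a scalar state
    with additive noise; the sigma points reduce to {x, x + s, x - s}."""
    n = 1
    lam = alpha ** 2 * (n + kappa) - n
    Wm = np.array([lam / (n + lam), 0.5 / (n + lam), 0.5 / (n + lam)])
    Wc = Wm.copy()
    Wc[0] += 1.0 - alpha ** 2 + beta
    # Prediction.
    s = np.sqrt((n + lam) * P)
    X = np.array([x_mean, x_mean + s, x_mean - s])   # sigma points
    Xp = f(X)                                        # propagate through f
    x_pred = Wm @ Xp
    P_pred = Wc @ (Xp - x_pred) ** 2 + Q
    # Correction: propagate the predicted sigma points through h.
    Yp = h(Xp)
    y_pred = Wm @ Yp
    Py = Wc @ (Yp - y_pred) ** 2 + R                 # predicted meas. covariance
    Pxy = Wc @ ((Xp - x_pred) * (Yp - y_pred))       # cross-covariance
    K = Pxy / Py                                     # filter gain
    return x_pred + K * (y_t - y_pred), P_pred - K * Py * K

# Linear sanity check: f = h = identity, prior N(0, 1), R = 1, y = 2
# should reproduce the Kalman update: mean 1.0, covariance 0.5.
m, P = ukf_step(0.0, 1.0, 2.0, lambda x: x, lambda x: x, Q=0.0, R=1.0)
```

The linear check works because, for affine f and h, the sigma-point statistics are exact and the gain equation above coincides with the Kalman gain.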

### 1.5.3

### Unscented particle filter

The Unscented Particle Filter (Van der Merwe et al., 2000 [24]) is a particle filter that uses the UKF Gaussian approximation (1.38) as the importance distribution

π(x_t|x_{t−1}, y_t) = N(x_t; x̄_t, P_t)

with x̄_t and P_t defined in (1.39). Each particle (w_t^{(i)}, x_t^{(i)}) is then represented using a weight w_t^{(i)}, a mean x̄_t^{(i)} and a covariance P_t^{(i)}:

(w_t^{(i)}, x_t^{(i)}) = (w_t^{(i)}, x̄_t^{(i)}, P_t^{(i)}).

At each step, the mean and the covariance are updated using the UKF prediction and correction steps, with the result of moving the particles toward regions of higher measurement likelihood.

In order to apply the UKF to each particle, it should be noted that the UKF algorithm describes the dynamic and measurement models of the system using a state space formalism

x_t = f(x_{t−1}) + q_{t−1}
y_t = h(x_t) + r_t

while the PF algorithm uses a Markov transition density φ(x_t|x_{t−1}) and a measurement likelihood function ℓ(y_t|x_t), as is clear from the particle weights recursion

w̃_t^{(i)} = w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}) φ_{t|t−1}(x_t^{(i)}|x_{t−1}^{(i)}) / π(x_t^{(i)}|x_{t−1}^{(i)}, y_t).

Consequently, it is required to assume that the Markov transition density and the measurement likelihood function are generated according to the dynamic and measurement model, respectively.

The final algorithm can be stated as follows.

Algorithm 8. Unscented Particle Filter

1. Draw N i.i.d. samples {x_0^{(i)}} from the prior

   x_0^{(i)} ∼ p(x_0), i = 1, …, N

   and set w_0^{(i)} = 1/N, P_0^{(i)} = P_0.

2. For each t = 1, …, T:

   (a) Evaluation of the importance distribution:

       i. Given the particle (x̄_{t−1}^{(i)}, P_{t−1}^{(i)}), move the particle using the UKF algorithm (alg. 7):

          (x̄_t^{(i)}, P_t^{(i)}) = UKF(f(·), h(·); x̄_{t−1}^{(i)}, P_{t−1}^{(i)}, y_t).

       ii. Form the importance distribution as:

          π(x_t^{(i)}|x_{t−1}^{(i)}, y_t) = N(x_t^{(i)}; x̄_t^{(i)}, P_t^{(i)}), i = 1, …, N.

   (b) Update of the weights:

       i. Draw new state samples x_t^{(i)} from the importance distributions:

          x_t^{(i)} ∼ N(x_t^{(i)}; x̄_t^{(i)}, P_t^{(i)}), i = 1, …, N.

       ii. Calculate the new unnormalized weights as:

          w̃_t^{(i)} = w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}) φ_{t|t−1}(x_t^{(i)}|x_{t−1}^{(i)}) / π(x_t^{(i)}|x_{t−1}^{(i)}, y_t).

       iii. Normalize the weights using:

          w_t^{(i)} = w̃_t^{(i)} / Σ_{j=1}^{N} w̃_t^{(j)}.

   (c) Resampling:

       i. Evaluate the effective number of particles:

          N_eff(t) ≈ 1 / Σ_{i=1}^{N} (w_t^{(i)})².

       ii. If N_eff(t) < N_thr, perform resampling.

   (d) The approximation to the posterior p_{t|t}(x_t) is then given as:

       p̂_{t|t}(x_t) = Σ_{i=1}^{N} w_t^{(i)} δ(x_t − x_t^{(i)}).  (1.40)

### 1.5.4

### Maximum a posteriori estimation in particle filtering

The algorithms presented in the previous sections allow approximating the filtering distribution p_{t|t}(x_t). Once the approximation is available, an estimate of the state x_t can be extracted in several ways. One of the most used estimates is the minimum mean square error (MMSE) estimate, which happens to be equal to the expectation of the filtering distribution:

x̂_t^{MMSE} = E(X_t|Y^t = y^t)
  = ∫ ξ p_{t|t}(ξ) dξ
  ≈ ∫ ξ ( Σ_{i=1}^{N} w_t^{(i)} δ(ξ − x_t^{(i)}) ) dξ
  = Σ_{i=1}^{N} w_t^{(i)} ∫ ξ δ(ξ − x_t^{(i)}) dξ
  = Σ_{i=1}^{N} w_t^{(i)} x_t^{(i)}.
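By the sifting property of the Dirac delta, the MMSE estimate thus reduces to a weighted mean of the particle states; a tiny numerical illustration (with made-up particles and weights):

```python
import numpy as np

# MMSE estimate from a weighted particle set: the sifting property of
# the Dirac delta reduces the integral to a weighted mean.
particles = np.array([0.0, 1.0, 2.0])
weights = np.array([0.2, 0.5, 0.3])   # normalized weights
x_mmse = np.sum(weights * particles)  # 0.2*0 + 0.5*1 + 0.3*2 = 1.1
```
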

One of the disadvantages of the MMSE estimate is that it is not suitable for multimodal distributions. Indeed, this kind of weighted estimate may be located in a region of the state space between two modes where the distribution has a small value. In case of multimodality, another kind of estimate, namely the maximum a posteriori (MAP) estimate, which takes the state that maximizes the distribution, may be more pertinent. More formally, the MAP estimate can be expressed as

x̂_t^{MAP} = arg max_{x_t} p_{t|t}(x_t).

Performing optimization over the filtering distribution, or its particle filter approximation, is a complex operation. For this reason, in the literature, the MAP estimate is sometimes taken as the state of the particle with the maximum weight. However, it has been shown that this choice may be far from the true MAP.

Recently, Saha et al. [21] have proposed a new method for the computation of the MAP estimate starting from the approximation of the filtering distribution provided by a particle filter. The method is derived in the following.

The first step is to recall the expressions of the predictive distribution (1.16)

p_{t|t−1}(x_t) = ∫ φ_{t|t−1}(x_t|ξ) p_{t−1|t−1}(ξ) dξ

and of the filtering distribution (1.15)

p_{t|t}(x_t) = ℓ_t(y_t|x_t) p_{t|t−1}(x_t) / ∫ ℓ_t(y_t|ξ) p_{t|t−1}(ξ) dξ.

The predictive distribution can be approximated using the approximation of the filtering distribution at time t − 1, provided by the particle filter, as

p_{t|t−1}(x_t) ≈ ∫ φ_{t|t−1}(x_t|ξ) ( Σ_{i=1}^{N} w_{t−1}^{(i)} δ(ξ − x_{t−1}^{(i)}) ) dξ
  = Σ_{i=1}^{N} w_{t−1}^{(i)} ∫ φ_{t|t−1}(x_t|ξ) δ(ξ − x_{t−1}^{(i)}) dξ
  = Σ_{i=1}^{N} w_{t−1}^{(i)} φ_{t|t−1}(x_t|x_{t−1}^{(i)}).

Then, the filtering distribution can be approximated as

p_{t|t}(x_t) ∝ ℓ_t(y_t|x_t) p_{t|t−1}(x_t)
  = ℓ_t(y_t|x_t) Σ_{i=1}^{N} w_{t−1}^{(i)} φ_{t|t−1}(x_t|x_{t−1}^{(i)})

where the denominator has been removed since it does not depend on x_t, hence it is irrelevant for the evaluation of the maximum with respect to x_t. The MAP estimate can be evaluated as

x̂_t^{MAP} = arg max_{x_t} ℓ_t(y_t|x_t) Σ_{i=1}^{N} w_{t−1}^{(i)} φ_{t|t−1}(x_t|x_{t−1}^{(i)}).

In principle, an optimization technique should be employed to find the maximum over the continuous space to which x_t belongs. Alternatively, an approximation can be made by considering a discretized search space and taking as possible values for the state x_t the states of the particles x_t^{(j)}, resulting in the estimate

x̂_t^{MAP} = arg max_{x_t^{(j)}} ℓ_t(y_t|x_t^{(j)}) Σ_{i=1}^{N} w_{t−1}^{(i)} φ_{t|t−1}(x_t^{(j)}|x_{t−1}^{(i)}).  (1.41)
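The discretized estimate only requires evaluating the likelihood and the transition density at particle locations. A Python sketch of this idea, where scalar Gaussian densities are hypothetical stand-ins for ℓ_t and φ_{t|t−1} and the particle values are made up:

```python
import numpy as np

def particle_map(x_t, x_prev, w_prev, lik, trans):
    """Particle-based MAP estimate: score the approximated posterior at
    each particle state and return the maximizer."""
    scores = np.array([lik(xj) * np.sum(w_prev * trans(xj, x_prev))
                       for xj in x_t])
    return x_t[np.argmax(scores)]

def gauss(d, s):
    """Gaussian density of a deviation d with standard deviation s."""
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

# Two particles near 0 and one outlier at 2; the measurement is 0.05.
x_prev = np.array([0.0, 0.1, 2.0])
w_prev = np.array([0.4, 0.4, 0.2])
x_t = np.array([0.0, 0.1, 2.0])
y_t = 0.05
x_map = particle_map(x_t, x_prev, w_prev,
                     lik=lambda x: gauss(y_t - x, 0.2),
                     trans=lambda x, xp: gauss(x - xp, 0.5))
# the outlier particle receives a negligible score
```
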

## 2. Object localization using vision and touch

The object localization problem is the problem of estimating in real time the pose of an object O whose shape is known, using noisy measurements provided by external sensors. No hypotheses are made on the state of the object, which may be stationary or in motion due to external stimuli.

Several kinds of sensors may be used to serve this purpose, e.g. RGB(-D) cameras, tactile sensors and force/torque (F/T) sensors.

Cameras, and in general vision sensors, provide information on the pose of the object on a global basis. This information is available through RGB or RGB-D images, the latter containing information on the depth of the scene captured by the sensor. One common way to exploit these sensors for object localization is to extract point clouds, i.e. clouds of Cartesian points belonging to the surface of the object of interest, from stereo RGB images or from RGB-D images.

Tactile sensors and F/T sensors provide local information. F/T sensors are used to measure forces and torques exchanged between the end-effector and the object. Hence, they allow inferring the dynamics of an object when it is manipulated by the end-effector. Tactile sensors, instead, generally provide binarized information, i.e. contact/no contact, that can be combined with the forward kinematics of a robot to obtain the Cartesian position of the contacts occurring between the end-effector and the object.

Information from vision and tactile/F/T sensors may not always be available at the same time, depending on the relative configuration of the sensors with respect to the object. In a scenario involving a humanoid robot or an industrial manipulator, it often happens that, when the end-effector manipulates an object, the object is totally or partially occluded, thereby precluding the use of vision. This situation should be taken into account in the development of a localization algorithm.

It should also be noted that F/T sensors may not always be available in robotic platforms. Especially in humanoid robots with complex end-effectors, e.g. multi-fingered hands, these sensors are not easy to integrate in each fingertip and are often not available. Consequently, localization algorithms that do not require force/torque measurements are of great interest, and the attention will be restricted to this kind of algorithm in the following.

Since visual information can be represented as clouds of Cartesian points belonging to the visible part of the object, and tactile information as sets of Cartesian points corresponding to contact points between the object and the end-effector, the localization problem may be specialized in the following sense.

The goal of the object localization problem is to estimate in real time the pose of an object O of known shape, using measurements y_t ∈ R^{3 n_t}

y_t = [y_{t,1}^T … y_{t,n_t}^T]^T

where each y_{t,i} ∈ R^3, with i ∈ {1, …, n_t}, is a Cartesian point in space and n_t is the number of points available at time t. A visual measurement may consist of several hundred points belonging to a point cloud. Conversely, for a tactile measurement, n_t will not be greater than the maximum possible number of contacts between the object and the end-effector. For example, n_t ≤ 5 for a human-like end-effector with tactile sensors on the fingertips. Finally, the measurements are assumed to be corrupted by Gaussian noise.

Among the algorithms available in the literature that share the presented framework there is the Memory Unscented Particle Filter (MUPF) [25], which solves the object localization problem using tactile measurements only, in the case of a stationary object. In the following, the MUPF algorithm will be presented and then extended to the more general case of moving objects and measurements coming from both the visual and tactile domains.

### 2.1

### Memory unscented particle filter

The Memory Unscented Particle Filter (MUPF) is a UPF-based filtering algorithm that solves the problem of estimating in real time the pose x of a stationary object O of known shape, using tactile measurements y_t.

The pose x ∈ R^6 of the object is represented using three Cartesian coordinates x, y and z and three Euler angles φ, θ and ψ:

x = [x y z φ θ ψ]^T.  (2.1)

In order to clearly explain the meaning of each component, the following assumptions are made. Suppose that a world reference frame

{R} = {O_r; x_r, y_r, z_r}

is attached to the robot root frame and that another reference frame

{B} = {O_b; x_b, y_b, z_b}

is attached to the object O, with O_b a point belonging to the object, e.g. the object centroid. Then, the configuration of the object can be generically expressed using a homogeneous transformation

T_b^r = [ R_b^r  ^r p
          0^T    1   ]

where

^r p = [x y z]^T

is the vector from O_r to O_b expressed in {R} and R_b^r is the rotation matrix between the frames {R} and {B}. This rotation matrix is parametrized using a Euler ZYX parametrization

Φ = [φ θ ψ]^T.

Each measurement y_t ∈ R^3 corresponds to the Cartesian position of a contact point between the object and the end-effector. Within the MUPF, n_t ≡ 1, i.e. a single contact point is given to the algorithm at each time step.

Since the MUPF is based on a UPF algorithm, in order to completely define the algorithm it is required to specify a stochastic state space model of the form (1.36) or (1.37), the Markov transition density φ_{t|t−1}(x_t|x_{t−1}), defined in (1.7), and a measurement likelihood function ℓ_t(y_t|x_t), defined in (1.10).

### 2.1.1

### Motion model

The MUPF algorithm assumes that the object is static, i.e.

x_t = f(x_{t−1}) = x_{t−1},

which corresponds to the Markov transition density

φ_{t|t−1}(x_t|x_{t−1}) = δ(x_t − x_{t−1}).

However, since particle filtering techniques are not particularly suitable for static localization, an artificial Gaussian noise is added, obtaining

x_t = x_{t−1} + w,  w ∼ N(0, Q)  (2.2)

which corresponds to the transition density

φ_{t|t−1}(x_t|x_{t−1}) = N(x_t; x_{t−1}, Q).

### 2.1.2

### Measurement model

The measurement likelihood function used within the MUPF is based on the proximity model (Petrovskaya and Khatib, 2008 [18]). According to this model, the measurements are assumed to be conditionally independent of each other given the state x_t (see assumption 2 in 1.3.1) and corrupted by Gaussian additive noise

y_t = ỹ_t + ν,  ν ∼ N(0, σ_m² I_{3×3})

with ỹ_t the uncorrupted contact point. As can be seen from the definition of the noise covariance matrix, each Cartesian component of the contact point is independent from the others.

The likelihood function has the expression

ℓ_t(y_t|x_t) ∝ exp( −(1/(2σ_m²)) ‖y_t − p*(x_t)‖² )  (2.3)

where p*(x_t) is the Cartesian point belonging to the surface of the object, whose pose is x_t, that is closest to the measurement y_t. More formally,

p*(x_t) = arg min_{p ∈ ∂O_{x_t}} ‖y_t − p‖

where ∂O_{x_t} is the boundary of the object in the pose x_t.

The measurement model can be interpreted as follows. The measurement y_t corresponds to the actual contact point corrupted by the measurement noise, and its likelihood, given the state x_t, increases as the distance between the measurement and p* decreases, according to a Gaussian distribution with variance σ_m². In other terms, each measurement is distributed according to a Gaussian distribution centered on p* which, in the end, is considered to be the actual contact point by the proximity model. Clearly, p* is not the real contact point but a sort of prediction obtained from the knowledge of the pose of the object x_t. This interpretation leads to the following definition of the state-space measurement equation

y_t = h_t(x_t) + ν,  ν ∼ N(0, σ_m² I_{3×3})  (2.4)

where h_t(x_t) = p*(x_t). An example of the output of the measurement equation is shown in Figure 2.1.

Figure 2.1: Measurement prediction according to the proximity model.

It should be noted that the definition (2.4) is consistent with the likelihood function (2.3). In fact, the likelihood can be evaluated as

p_{Y_t|X_t=x_t}(y_t|x_t) = (2πσ_m²)^{−3/2} exp( −(1/2) (y_t − p*(x_t))^T (σ_m² I_{3×3})^{−1} (y_t − p*(x_t)) )
  ∝ exp( −(1/(2σ_m²)) (y_t − p*(x_t))^T (y_t − p*(x_t)) )
  = exp( −(1/(2σ_m²)) ‖y_t − p*(x_t)‖² ).
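For simple shapes the closest surface point p* has a closed form, which makes the proximity likelihood easy to sketch. In the following Python fragment a sphere is a hypothetical stand-in for the object O, and the radii, measurement and σ_m values are illustrative:

```python
import numpy as np

def closest_point_on_sphere(y, center, radius):
    """Closest point p* on the surface of a sphere to a measurement y
    (a sphere stands in for the generic object boundary)."""
    d = y - center
    return center + radius * d / np.linalg.norm(d)

def proximity_likelihood(y, center, radius, sigma_m):
    """Proximity-model likelihood (2.3): Gaussian in the distance between
    the measurement and the closest surface point p*."""
    p_star = closest_point_on_sphere(y, center, radius)
    return np.exp(-0.5 * np.sum((y - p_star) ** 2) / sigma_m ** 2)

y = np.array([1.2, 0.0, 0.0])          # a contact measurement
# A pose whose surface passes near y is far more likely than one whose
# surface is distant (radii 1.0 vs 0.5, both centered at the origin).
lik_near = proximity_likelihood(y, np.zeros(3), 1.0, sigma_m=0.05)
lik_far = proximity_likelihood(y, np.zeros(3), 0.5, sigma_m=0.05)
```

This is the mechanism by which particles whose hypothesized surface passes close to the contact point receive larger weights.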

### 2.1.3

### The role of the memory

The MUPF estimates the pose x ∈ R^6 using a single measurement y_t ∈ R^3 per time step. However, a single contact point is not informative enough to update the particle weights in a meaningful way using the standard UPF weight recursion (1.25)

w̃_t^{(i)} ∝ w̃_{t−1}^{(i)} ℓ_t(y_t|x_t^{(i)}).

In fact, there may be particles with wrong poses that are more compatible with that single measurement than other particles with right poses. Noise in the measurements is also responsible for this phenomenon. Since smaller weights are erroneously assigned to good particles, when the resampling step is performed those particles may be pruned, leading to a poor representation of the filtering distribution.

The MUPF algorithm solves this problem by using a memory mechanism that uses up to m past measurements to update the weights. The expression of the weights recursion is the following

w̃_t^{(i)} ∝ w̃_{t−1}^{(i)} ∏_{k=k̄(t)}^{t} ℓ_k(y_k|x_t^{(i)})  (2.5)

where the index

k̄(t) = t − m + 1 if t − m ≥ 1, and k̄(t) = 1 otherwise,

allows selecting the m most recent measurements available.
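The window index k̄(t) is a one-liner; a small sketch showing the clipping behavior at early time steps (the values of t and m are illustrative):

```python
def window_start(t, m):
    """Index k_bar(t) of the oldest measurement in the memory window of
    size at most m, clipped at the first time step."""
    return t - m + 1 if t - m >= 1 else 1

# With m = 4: at t = 10 the window is {7, 8, 9, 10}; at t = 2 it is {1, 2}.
```
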

It should be noted that this operation is sound since the object is assumed to be stationary; hence, past measurements are comparable with the pose of the object at each time t. However, special care is required in order to extract a point estimate from the particle filter approximation. To see this, the estimation problem is recast into a batch estimation problem in which all the measurements y^t are provided to the algorithm in one step. Using Bayes' theorem, the conditional distribution of the state given all the measurements can be written as

p(x|y^t) ∝ p(y^t|x) p(x)

where x is the pose of the stationary object and p(x) is the prior on the object pose. By using the chain rule (1.3) and the assumption of conditional independence of the measurements (1.9), the same distribution can be expanded as

p(x|y^t) ∝ ∏_{k=1}^{t} ℓ(y_k|x) p(x).

In an importance sampling (1.4.2) interpretation, this would correspond to having an approximation to the filtering distribution as

p̂_{t|t}(x_t) = Σ_{i=1}^{N} w_t^{(i)} δ(x_t − x_t^{(i)})

with

w̃_t^{(i)} ∝ ∏_{k=1}^{t} ℓ_k(y_k|x_t^{(i)}),
w_t^{(i)} = w̃_t^{(i)} / Σ_{j=1}^{N} w̃_t^{(j)}.