
Politecnico di Milano

Dipartimento di Elettronica, Informazione e Bioingegneria
Dottorato di Ricerca in Ingegneria dell'Informazione

Calibrating a Varying Camera

from Silhouettes and Background

Doctoral Dissertation of:

Dong Han

Supervisor:

Prof. Vincenzo Caglioti

Tutor:

Prof. Francesco Amigoni

Supervisor of the Doctoral Program:

Prof. Carlo Fiorini


"I have seen flowers come in stony places
And kind things done by men with ugly faces
And the gold cup won by the worst horse at the races
So I trust, too."


Abstract

In this thesis, we consider a specific 3D modeling scenario where a smooth 3D object of unknown shape is observed, against a background plane with unknown patterns, by an uncalibrated, moving camera with varying intrinsic parameters. Starting from the acquired images, we address the problem of recovering i) the values of the camera intrinsic parameters for each image, ii) the camera motion, and iii) the 3D reconstruction of some points on the object surface.

The problem is particularly interesting in prosthetics, or in custom clothing and shoe manufacturing, where the 3D model of a smooth, textureless object is to be obtained. The smoothness of the object makes it difficult to identify image correspondences, due to the continuously varying contour generator as the camera viewpoint moves. In addition, when using, e.g., consumer cameras (such as those provided with tablets or smart phones), the auto-focus feature of the device is often active, letting the intrinsic camera parameters vary between acquisitions. Therefore, calibration methods such as Zhang [2000] cannot be applied.

We propose a new framework that can calibrate a fully varying camera from the background and the silhouettes of smooth objects, requiring no more than two frontier points per view pair. The epipole positions in each image pair are first estimated by finding two epipolar lines that are tangent to the object silhouette, with the help of the plane induced homography. The projective reconstruction of the image sequence is then robustly computed from the estimated epipolar geometry. A flexible self-calibration algorithm using the absolute dual quadric is employed to determine the camera intrinsic parameters and to upgrade the projective reconstruction to metric.

The proposed algorithm does not depend on the assumption of orthographic or affine cameras to find the epipoles, and is thus applicable to generic projective cameras with both varying focal length and varying principal point. Experiments with both synthetic and real images are carried out and good calibration accuracy is achieved.


Acknowledgements

I would like to thank my advisor, Professor Vincenzo Caglioti, for giving me the precious opportunity to pursue my Ph.D. under his guidance. He has led me into the field of multiple view geometry in computer vision, which I enjoy working on. He has been extremely helpful, providing me with his insight into various problems and great encouragement during my years of study at Politecnico di Milano. His passion and vision continuously motivate me to pursue better research and have fun.

I would also like to thank Professor Giuseppe Menga of Politecnico di Torino for his support in my Ph.D. application and the great research opportunities he provided me. His gentlemanly manner and kindness have been, and always will be, a model for me.

I want to thank my friends Yongtao Sun, Zhe Yin, Xiang Chen and Jie Xu for their companionship and the great joy they have brought me. They have made my life off campus colorful and full of good memories. I would like to thank my parents and my girlfriend for their constant encouragement and support, without which I could not have achieved what I have today.

Last but not least, I would like to thank the China Scholarship Council for the financial support, which enabled me to have this truly precious opportunity of pursuing my Ph.D. degree in Italy.


Contents

Abstract
Acknowledgements
List of Figures
List of Tables
Symbols

1 Introduction
  1.1 3D visual modeling
  1.2 Camera calibration
  1.3 Problem formulation
  1.4 Thesis outline

2 Overview of the framework
  2.1 Background
    2.1.1 Camera model
    2.1.2 Projective space
    2.1.3 Self-calibration
  2.2 The state of the art
    2.2.1 Plane based camera self-calibration
    2.2.2 Silhouette based calibration
  2.3 Our contribution
  2.4 Articulation of the framework

3 Epipolar geometry estimation
  3.1 Introduction
  3.2 Background
    3.2.1 The apparent contour and frontier points
    3.2.2 Convex optimization basics
      3.2.2.1 Convex set and convex hull
      3.2.2.2 Backtracking line search
  3.3 Epipole estimation
    3.3.1 The geometric solution
    3.3.2 The algebraic solution
      3.3.2.1 Optimization formulation
      3.3.2.2 The proposed algorithm
        3.3.2.2.1 Convex hull representation
        3.3.2.2.2 The objective function
        3.3.2.2.3 The global minima search strategy
        3.3.2.2.4 The optimal epipolar tangent pair selection
  3.4 Experimental results
    3.4.1 Experimental method
    3.4.2 Evaluation
  3.5 Summary

4 Multi-view Projective Reconstruction with Maximum Consistency
  4.1 Introduction
  4.2 Background
  4.3 Projective reconstruction strategy
    4.3.1 Solving view triangles
      4.3.1.1 Camera matrices from homographies
      4.3.1.2 View triangle consistency
      4.3.1.3 Rank-2 fundamental matrices
    4.3.2 Incremental construction
      4.3.2.1 View triangle alignment
  4.4 Summary

5 Self-Calibration
  5.1 Introduction
  5.2 The literature
  5.3 Background
    5.3.1 Self-calibration using Q*
  5.4 Direct self-calibration given camera priors
    5.4.1 Discussion
  5.5 Summary

6 Apparent Contour Detection and Representation
  6.1 Sub-pixel edge detection
    6.1.1 The blurred edge model
    6.1.2 Sub-pixel accuracy based on Gaussian fitting
  6.2 Apparent contour extraction from color edges
    6.2.1 Gray background assumption
    6.2.3 Sub-pixel edge detection in feature image
    6.2.4 Apparent contour B-spline representation
  6.3 Experiments
    6.3.1 Sub-pixel edge detection
      6.3.1.1 The data
      6.3.1.2 Experimental method
      6.3.1.3 Result
    6.3.2 Apparent contour extraction
  6.4 Summary

7 Final Experimental Results
  7.1 Synthetic images
    7.1.1 The data
    7.1.2 Experimental method
    7.1.3 Evaluation
      7.1.3.1 Accuracy of intrinsic parameters
      7.1.3.2 Reconstruction evaluation
        7.1.3.2.1 Average relative distance error
        7.1.3.2.2 The reprojection error
  7.2 Real images
    7.2.1 Data
    7.2.2 Experimental method
    7.2.3 Evaluation
  7.3 Summary

8 Conclusions
  8.1 Summary
  8.2 Future work

A Miscellaneous
  A.1 Bundle Adjustment
    A.1.1 Problem formulation
    A.1.2 Parameterization
    A.1.3 Solving nonlinear least squares
    A.1.4 Partitioning Levenberg-Marquardt
    A.1.5 Sparse structure
  A.2 Levenberg-Marquardt
  A.3 B-spline curve fitting
  A.4 Similar and conjugate matrices


List of Figures

1.1 Traditional calibration methods. (a) chessboard pattern used by Zhang [2000]; (b) circle control point pattern used by Heikkila and Silven [1997]; (c-d) the laser pointer and its projection used by Svoboda et al. [2005]

1.2 Problem formulation. A smooth object is observed against a background plane with unknown texture by an uncalibrated camera with varying intrinsic parameters.

2.1 Main stages of the proposed framework

3.1 Contour generator. The viewing ray through a point X of the contour generator is on the tangent plane to the object surface.

3.2 Frontier point

3.3 Two frontier points P1 and P2 always exist.

3.4 The geometric solution for epipole estimation by locating frontier points with the assistance of the background plane.

3.5 Overview of the geometric solution

3.6 Distance between a line and a closed curve

3.7 Algebraic solution of finding the epipole

3.8 Multiple frontier points for a non-convex object; red dots are outermost frontiers and blue dots are 'inner' frontiers

3.9 Convex hull and valid segments of a non-convex object. (a) valid segments are represented by thick colored curves; (b) tangent envelope

3.10 Line-to-curve distance with outer tangents

3.11 A typical example of the objective function.

3.12 The synthetic scene

3.13 The detected epipolar tangents for the synthetic scene. Each row is a pair of corresponding views. Lines of the same color are the detected corresponding epipolar tangents.

3.14 The epipole error vs. the homography error

4.1 The view triangle. (a) 3 views in general position where fundamental matrices are independently computable; (b) graph representation of a view triplet. The arrows indicate the dependency of camera matrices in a reconstruction, e.g., P2 depends only on F12 while P3 is computed from both F13 and F23.

4.2 Incremental construction. Each node represents a camera. A new node l is added to an existing network Ci by selecting the two best nodes j, k to form a view triangle.

6.1 Gaussian edge model

6.2 Sub-pixel edge position by Gaussian fitting

6.3 Color feature image for segmentation and edge location. (a) original image; (b) color feature image; (c) zoomed-in view around an edge of (b); (d) detected sub-pixel edge points.

6.4 The synthetic images. (a) edge with zero noise; (b) zoomed-in view of (a); (c) image corrupted by Gaussian noise

6.5 Edge position error for the sub-pixel detection algorithm.

6.6 Apparent contour with sub-pixel accuracy. (a) apparent contour; (b-d) zoomed views

7.1 Detected epipolar tangents for the bear model

7.2 The average relative error of the intrinsic parameters. (a) the average relative error of the focal length; (b) the average relative error of the principal point.

7.3 Frontier point reconstruction for a sphere from 10 views.

7.4 The average relative distance error ΔR for spheres.

7.5 The reprojection error of the metric reconstruction

7.6 The detected epipolar tangents for real images. (a) and (b), (c) and (d) are two corresponding view pairs. The corresponding tangent lines in a view pair are shown in the same color.

List of Tables

6.1 Edge detection error


Symbols

X    a matrix
x    a column vector
x^T  a row vector
x    a scalar
≃    equal up to a scale


Dedicated to my parents


Chapter 1

Introduction

1.1 3D visual modeling

The visual modeling of 3D real-world objects is an important research area in computer vision. Its applications range from robot navigation and visual inspection to the graphics industry (films, games and augmented reality). Currently, models in real applications (especially in the film and video game industry) are built manually, which takes a huge amount of human effort. The task in computer vision is to make such a modeling process automatic and easy, while still able to generate realistic results.

For textured objects and scenes, many 3D reconstruction systems have been implemented (Tomasi and Kanade [1992], Pollefeys et al. [2004], Rothganger et al. [2006], Kowdle et al. [2014]), which can model the scene from arbitrary image collections or video sequences. Recently, very large-scale 3D reconstruction systems for online photo collections, scaling to millions of photos, have been developed (Snavely et al. [2006], Pollefeys et al. [2008], Furukawa and Ponce [2010]). Although those systems have different structures and emphases, they all rely on detecting distinctive features (such as corners or local regions) on the objects to establish correspondences between multiple views and subsequently recover the 3D geometry. However, this technique cannot be used to model smooth, textureless objects, because no distinctive local features are directly detectable. For example, in prosthetics and custom clothing or shoe manufacturing, 3D models of textureless or little-textured limbs, cloth or leather are widely used. These applications play an important role in improving people's health and quality of life, and they call for a different kind of visual modeling system.


In this thesis, we consider a specific 3D modeling scenario where a smooth 3D object of unknown shape is observed against a background plane with unknown patterns. We study the fundamental problem of building a reconstruction system in such situations, i.e., the calibration of a camera (with possibly varying intrinsic parameters) and the estimation of the shape of the object. A plane is often visible in man-made environments, especially in indoor scenes, and its existence facilitates the establishment of geometric constraints on the cameras and helps bring a desirable improvement in system performance.

We start by detecting the epipolar geometry of each view pair. Since no distinctive features are directly detectable, the shape, or silhouette, becomes the dominant feature for representing the object. A significant problem with silhouettes, however, is that they constantly change as the viewpoint varies, which makes silhouette matching or tracking across multiple views difficult. Luckily, the existence of some 'fixed' points on the boundary of the silhouette provides enough information to constrain the camera motion. Reasonably, the scheme of finding 'fixed' points is adopted by almost all existing smooth object reconstruction systems (Furukawa et al. [2006], Sinha and Pollefeys [2010]) for obtaining constraints on the camera motion, although they differ in applicability and algorithmic complexity. We follow this idea but further take advantage of the constraints provided by the background plane to develop a new solution framework for searching for those points and computing the epipolar geometry. The framework formulates the search as an optimization problem where the epipolar tangency constraints are imposed and the plane induced homography acts as a bridge linking the epipolar lines of a view pair. Unlike previous methods, which are restricted to orthographic or affine cameras, the proposed algorithm is applicable to general projective cameras and yet does not require initial guesses for the camera parameters.

To recover the shape of the object, we study techniques for building a projective reconstruction from multiple view geometries. Due to the presence of noise, emphasis is laid on obtaining a robust projective reconstruction with maximum consistency. Generally, there are two ways to extend binocular or trinocular reconstruction to multiple views. The first is incremental reconstruction, where the reconstruction is built by adding one new view at a time to an initial triple-view reconstruction and solving progressively larger bundle adjustment problems to resist error accumulation. The second is to hierarchically merge sub-sequences, where the whole sequence is partitioned into several sub-sequences and the overlapping views between sub-sequences are used to link them together. The literature has seen good results with both the first (Beardsley et al. [1997], Pollefeys et al. [2004]) and the second method (Fitzgibbon and Zisserman [1998], Nistér [2000]). In our framework, we adopt the incremental construction scheme and directly use the plane induced homography and the estimated epipoles to construct the projection matrices. The direct construction of projection matrices results in a lower-dimensional search space for achieving a consistent triple-view reconstruction than the general fundamental-matrix-based approach. We then use this consistent triple-view solution as the building block for incremental construction.

To recover the calibration parameters and the shape of the object, self-calibration techniques are studied to upgrade the projective reconstruction to metric. Since 3D reconstruction systems are becoming more and more common among non-expert users equipped with consumer cameras, we want our system to be flexible enough to deal with cameras with varying focal length. The varying focal length is caused either by manual zooming by the user or by the use of the auto-focus cameras provided with smart phones or tablets. Most self-calibration systems that are capable of dealing with zooming cameras assume that the principal point is fixed at the image center throughout the sequence (Malis and Cipolla [2002], Ponce et al. [2005]), and the error caused by this not-fully-satisfied constraint does not affect the calibration accuracy much. We instead try to use constraints that are more consistent with zooming cameras (i.e., we let the principal point vary in calibration) and expect better calibration accuracy. Prior information common to practical modern cameras can also be used to ease the computation, without introducing any special limitation.

1.2 Camera calibration

A very important step in 3D reconstruction is camera calibration, i.e., obtaining the intrinsic and extrinsic camera parameters. Traditional camera calibration methods often start by designing a calibration target whose patterns are known geometrically. The camera moves around the target, or vice versa the target moves in front of a fixed camera, resulting in a number of acquired images. The pattern on the calibration target is designed to be easily detectable in order to facilitate the matching between the 3D control points on the target and their images. A camera model is then fit to the established 3D-to-2D correspondences by minimizing the overall reprojection error.
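As a concrete illustration of this pipeline, here is a minimal sketch of Zhang-style target calibration with OpenCV; the image file names and board geometry are illustrative assumptions, not taken from the thesis.

```python
import cv2
import numpy as np

pattern = (9, 6)            # inner corners of a hypothetical chessboard
square = 25.0               # square size in mm (assumed)
obj = np.zeros((pattern[0] * pattern[1], 3), np.float32)
obj[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in ["view0.png", "view1.png", "view2.png"]:   # hypothetical images
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:               # keep only views where the target is detected
        obj_pts.append(obj)
        img_pts.append(corners)

# Fit the pinhole model by minimizing the overall reprojection error.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```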

Fig. 1.1 Traditional calibration methods. (a) chessboard pattern used by Zhang [2000]; (b) circle control point pattern used by Heikkila and Silven [1997]; (c-d) the laser pointer and its projection used by Svoboda et al. [2005]

Tsai [1987] proposed a popular camera calibration technique which uses coplanar squares to generate correspondences. It is suitable for a wide range of applications because it can deal with both coplanar and non-coplanar points (non-coplanar points can be generated by moving the planar pattern), and with both single and multiple views. It also offers the possibility of calibrating the extrinsic and intrinsic parameters separately. A similar method was simultaneously proposed by Zhang [2000] and Sturm and Maybank [1999], who use various poses of a planar chessboard pattern to collect correspondences and the planar homography to simplify the calculation. Heikkila and Silven [1997] proposed a four-step calibration system based on detecting circular control points. It is believed that detecting a circular control point in an image is more accurate than detecting corners; therefore Heikkila and Silven [1997] is expected to give better results in practice.


The above-mentioned calibration algorithms are classified as off-line calibration methods because they are usually conducted before the calibrated camera is used to perform other tasks. One significant drawback of these methods is that they are slow and time-consuming, and often require physical interaction with the scene, which is sometimes unrealistic, dangerous, or even impossible. Another weakness is that they are designed for cameras with fixed internal parameters. If the camera has auto-focus, or the zoom settings change during image acquisition, these methods cannot work at all. When used to calibrate multiple-camera systems, visibility of the planar target to all the cameras becomes a problem. Frequently, the cameras are placed around the scene and the calibration target is visible to only a small subset of the camera group at one time. Therefore only a small number of cameras are calibrated by one image acquisition process. Though it is possible to perform this small-group calibration multiple times and merge the results to obtain the calibration of the whole network, this is very error prone and requires more effort (Sinha [2009]).

A multiple-camera system calibration algorithm was also developed in Svoboda et al. [2005], which tracks a virtual point projected by a common laser pointer to establish point correspondences. Although this method does not have the visibility problem, it requires manual control of the laser pointer.

1.3 Problem formulation

In this thesis, we consider a specific camera calibration scenario (shown in Fig. 1.2): an uncalibrated camera takes some images of a static scene, constituted by a background plane π with some unknown planar pattern and a smooth, textureless 3D object S of unknown shape, from different viewing positions. The camera undergoes a general motion, and while it moves, both the focal length and the principal point may change, due either to manual zooming or to the use of an auto-focus camera on a smart phone or tablet. Although the variation of the principal point will not be large, it cannot be assumed to be fixed at the image center.

For each pair of views, a sufficient number of correspondences for the computation of the fundamental matrix is not directly available, due to the smoothness of the 3D object. In fact, the planar pattern provides only 4 independent correspondences. The addressed problem is to i) determine the camera internal and external parameters for each of the acquired images; ii) estimate the shape of the object by sparse reconstruction (recovering the frontier points).

Fig. 1.2 Problem formulation. A smooth object is observed against a background plane with unknown texture by an uncalibrated camera with varying intrinsic parameters.

In principle, the introduced framework allows one to recover the object's shape, e.g., by visual hull reconstruction (Laurentini [1994]); however, in this work we mainly concentrate on camera calibration, and we reconstruct just some points on the object surface in order to provide indirect evidence of the calibration accuracy.

1.4 Thesis outline

The thesis is organized into 8 chapters. In Chapter 2, an overview of the system structure is presented and the state of the art is examined. Chapter 3 discusses the proposed geometric and algebraic solutions for locating epipoles from silhouettes and plane induced homographies. In Chapter 4, we show how to recover, with enhanced consistency, the multiple view projective reconstruction from pair-wise epipolar geometries. We then show in Chapter 5 the self-calibration algorithm that enables us to recover the structure of the scene and the motion of the camera even though both the focal length and the principal point vary. In Chapter 6, we briefly explain how to detect the apparent contour based on sub-pixel edge estimation. Chapter 7 shows the final experimental results and Chapter 8 concludes the thesis.


Chapter 2

Overview of the framework

In this chapter we review the state-of-the-art algorithms most closely related to ours and give an overview of the main stages of the proposed calibration framework.

2.1 Background

2.1.1 Camera model

Throughout the thesis, we consider a general projective camera, to which the pinhole camera model applies. The intrinsic parameters of the camera are arranged in a calibration matrix

$$K = \begin{bmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{bmatrix} \tag{2.1}$$

where $f_x$ and $f_y$ are the focal lengths in the x and y directions respectively; the aspect ratio is $\alpha = f_y / f_x$; $(p_x, p_y)$ is the principal point, which is the point where the principal axis (the line from the camera center perpendicular to the image plane) intersects the image plane; and $s$ is the camera skew, which is 0 for common cameras.
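For concreteness, a small numeric sketch (with made-up intrinsic values) of assembling K and projecting a scene point with a pinhole camera P = K[R | t]:

```python
import numpy as np

# Assemble the calibration matrix K of Eq. (2.1); the values are made up.
fx, fy, s, px, py = 800.0, 810.0, 0.0, 320.0, 240.0
K = np.array([[fx,  s, px],
              [ 0., fy, py],
              [ 0., 0., 1.]])

# A pinhole camera P = K [R | t]; an identity pose is used for illustration.
R, t = np.eye(3), np.zeros((3, 1))
P = K @ np.hstack([R, t])

X = np.array([0.1, -0.2, 2.0, 1.0])   # homogeneous scene point
x = P @ X
x = x / x[2]                          # normalize to pixel coordinates
print(x[:2])
```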

2.1.2 Projective space

A homogeneous vector (x, y, z, 1) is equivalent to (kx, ky, kz, k) for any non-zero scalar k. The set of equivalence classes of vectors in $\mathbb{R}^{n+1}$ forms the projective space $\mathbb{P}^n$. A homogeneous vector $X = (X_1, X_2, X_3, X_4)^T$ with $X_4 \neq 0$ represents the point $(X_1/X_4, X_2/X_4, X_3/X_4)^T$ of $\mathbb{R}^3$ in inhomogeneous coordinates.

In $\mathbb{P}^2$, a point is represented by a 3-vector $(x_1, x_2, x_3)^T$. Points with $x_3 = 0$ are called ideal points, or points at infinity. All the ideal points lie on the line $l_\infty$, called the line at infinity. Any circle intersects $l_\infty$ at the two circular points $I = (1, i, 0)^T$ and $J = (1, -i, 0)^T$.

In $\mathbb{P}^3$, a point is represented by a 4-vector. The ideal points $(X_1, X_2, X_3, 0)^T$ are contained in a plane called the plane at infinity, which has the canonical form $\pi_\infty = (0, 0, 0, 1)^T$.

The absolute conic $\Omega_\infty$ is a point conic on $\pi_\infty$. Points on $\Omega_\infty$ satisfy

$$X_1^2 + X_2^2 + X_3^2 = 0, \qquad X_4 = 0.$$

The image of the absolute conic (IAC) is the conic $\omega = (KK^T)^{-1}$. The dual image of the absolute conic (DIAC) is defined as $\omega^* = \omega^{-1} = KK^T$. The fact that the IAC (or DIAC) depends only on the internal parameters of the camera is often used for camera calibration.

The absolute dual quadric $Q^*$ is the dual of the absolute conic. Geometrically it consists of the planes tangent to $\Omega_\infty$. Algebraically it is represented by a $4 \times 4$ homogeneous matrix of rank 3. Its canonical form in metric space is

$$Q^* = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} \tag{2.2}$$
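As a small illustration of these conventions, a hedged sketch of the homogeneous-to-inhomogeneous conversion (the helper name is ours, not the thesis'):

```python
import numpy as np

def to_inhomogeneous(X):
    """Map a homogeneous vector of P^n to inhomogeneous coordinates of R^n."""
    X = np.asarray(X, dtype=float)
    if np.isclose(X[-1], 0.0):
        raise ValueError("ideal point (point at infinity) has no inhomogeneous form")
    return X[:-1] / X[-1]

# (2, 4, 6, 2) and (1, 2, 3, 1) are the same point of P^3.
print(to_inhomogeneous([2, 4, 6, 2]))   # -> [1. 2. 3.]
print(to_inhomogeneous([1, 2, 3, 1]))   # -> [1. 2. 3.]
```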

2.1.3 Self-calibration

Auto-calibration is the process of determining the internal camera parameters directly from multiple uncalibrated images. It differs from traditional camera calibration methods in the way it establishes constraints on the parameters of the camera matrix. Traditional methods rely on calibration targets with known geometry to establish point correspondences. Self-calibration methods instead often resort to internal constraints on the intrinsic parameters (e.g., the camera skew is equal to 0 for almost all modern cameras, or perhaps a fixed/known aspect ratio, or even a known principal point) to calibrate the camera. Most traditional self-calibration methods assume that the principal point is fixed at the image center, so that linear equations in the intrinsic parameters can be obtained. However, although the principal point variation may not be large, the principal point may in fact vary along with the focal length. In this case, a higher number of unknown parameters must be solved for, and the equations in those parameters are non-linear. This adds further difficulty to the problem of camera calibration from silhouettes of smooth objects. A more thorough introduction to self-calibration is presented in Chapter 5.
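For instance, under the common zero-skew, square-pixel assumptions ($s = 0$, $f_x = f_y = f$), the DIAC of Section 2.1.2 expands, from Eq. (2.1), to

$$\omega^* = KK^T = \begin{bmatrix} f^2 + p_x^2 & p_x p_y & p_x \\ p_x p_y & f^2 + p_y^2 & p_y \\ p_x & p_y & 1 \end{bmatrix}$$

so each view contributes three unknowns $(f, p_x, p_y)$, which enter non-linearly once the principal point is allowed to vary.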

2.2 The state of the art

The literature is roughly divided into two categories based on the techniques they use: plane based camera self-calibration and silhouette based camera calibration.

2.2.1 Plane based camera self-calibration

A few methods in the current literature are capable of calibrating a camera from only the unknown texture on a plane, without resorting to any off-plane information of the scene. These methods are appealing for solving the problem addressed in this thesis. However, due to the limited information used, they as a category do not work if the principal point varies, and their calibration accuracy for off-plane points is doubtful.

Triggs [1998] was the first to put forward such a method, demonstrating that it is possible to use the circular points of a plane to obtain constraints on the calibration parameters. The method is based on two fundamental ideas: 1) the two circular points of the plane are also on the absolute conic, because they are the intersection between the plane and the absolute conic; 2) circular points are mapped from one image to the other by the plane induced homography. In each image, the circular points provide two constraints on the unknown parameters of the camera, so the camera calibration can be obtained if enough images are acquired. The case of constant internal parameters is shown in the paper. However, a significant problem of the method is that it cannot calibrate cameras with both varying focal length and varying principal point. A simple counting argument suffices to explain this. Assume there are m views; the total number of constraints provided by the circular points is then 2m. Since both the focal length and the principal point vary (a general assumption of square pixels and zero skew is often appropriate for modern cameras), the total number of unknowns is 3m + 4 (2 for the principal point and 1 for the focal length of each of the m views; 4 in total for the circular points). Consequently the number of unknowns is always larger than the number of available constraints, and the parameters of the camera cannot be obtained. Another concern about the method, as shown by Cross et al. [1999], is that even for cameras with fixed intrinsics, the calibration accuracy for points off the plane drops sharply, though the accuracy for points on the plane is very good.


Malis and Cipolla [2002] propose to enforce the multiview constraints between collineations for calibrating cameras from planar structures. The super-E-collineation matrix is decomposed to extract the plane normal, and a cost function based on the equality of the singular values of the essential matrix is minimized to self-calibrate the camera. However, the equality of singular values provides only two constraints for each independent E-collineation matrix (n images have n − 1 independent collineation matrices), therefore the method of Malis and Cipolla [2002] suffers from the same problem as Triggs [1998]: an insufficient number of constraints for calibration under a variable principal point. Moreover, this method requires an initial guess of the camera parameters to derive the super-E-collineation matrix, which makes it not fully automatic.

A more recent algorithm for plane-based structure and motion is shown in Zhou et al. [2012], where feature points on the plane are tracked to obtain a set of plane induced inter-image homographies for self-calibration. For cameras with varying focal length, the self-calibration constraints are established from the condition that the Euclidean homography preserves the length of any vector inside the subspace perpendicular to the normal of the plane (Ma et al. [2003]). This condition provides only 1 constraint on the internal parameters per independent homography, therefore the method also fails in the case of a varying principal point.

Other works such as Gurdjos and Sturm [2003], Menudet et al. [2008] not only ignore the influence of a varying principal point but also require additional assumptions on the input data (e.g., a fronto-parallel image for initialization).

2.2.2 Silhouette based calibration

Cipolla et al. [1995] proposed the name of frontier points for representing the frontier of the visible region swept out by the contour generator, and used this idea to derive the camera motion from object silhouettes. Following this line, several algorithms have been proposed that take advantage of the constraints associated with frontier points for structure and motion. A common strategy is to find epipolar tangents, i.e., epipolar lines that are tangent to the silhouette. In Wong et al. [2002], the authors minimize the geometric distances between the epipolar tangent points and the epipolar lines to find the frontier points and hence compute the fundamental matrix, but their derivation is limited to circular motion. Based on their previous work, Wong and Cipolla [2004] later proposed to use calibration results from a circular motion as the initialization for deriving the calibration of cameras with an arbitrary general motion. Later on, Hernandez et al. [2007] put forward the idea of silhouette coherence, which can be regarded as a generalization of the epipolar tangency constraint under circular motion. In general, the application of these methods is limited by the requirement of circular motion (either as a camera motion constraint or as an initialization).

The method presented in Åström and Kahl [1999] only applies to cameras with restricted motion, i.e., cameras with smooth motion and small displacements between camera positions; however, a possible extension of the method is of great importance. For scenes with a high number of detectable frontier points (e.g., greater than 7), a fully varying camera can be calibrated from silhouettes alone, without resorting to any additional information. In fact, 7 pairs of corresponding epipolar tangent lines are sufficient to determine both the 3 degrees of freedom (dof) of the 1-D projective mapping between corresponding epipolar lines and the 4 dof of the two epipoles. A drawback of this method, however, is that it needs a high number of frontier points in each view pair, which are present only on very rare surfaces.

A calibration framework that is very similar to ours is described by Cross et al. [1999], who use the plane induced parallax to find the epipoles and use the method proposed by Triggs [1998] to get an initial camera calibration. This initial calibration is then refined with frontier points to improve the accuracy for points off the plane. Briefly, the plane induced homography is used to map the object contour in the first view to the second. According to parallax geometry, a line that is tangent to both the original and the mapped contours must pass through the epipole. Accordingly, they detect two 'bitangents' to determine the epipole and hence reconstruct the scene. However, since this framework adopts the method of Triggs [1998] for calibration, it shares its problem of not being able to calibrate cameras with both varying focal length and varying principal point.

A very important work on structure and motion from silhouettes is Furukawa et al. [2006]. The property that epipolar tangent lines are parallel for orthographic, affine, or weak perspective cameras is used to search for frontier points. An algorithm based on RANSAC (Fischler and Bolles [1981]) is devised to simultaneously estimate the camera parameters and a consistent set of frontier points in view pairs. Although it is the first non-iterative method for recovering epipolar geometry using silhouettes, the framework is limited by the usable camera types (i.e., limited to orthographic, affine or weak perspective). For a general projective camera the method will not work.

Sinha and Pollefeys [2010] and Sinha et al. [2004] developed a RANSAC based algorithm to recover epipolar geometries from video sequences. They represent the convex hull of the silhouette using a tangent envelope. At each iteration, the epipole positions and the epipolar line homography are hypothesized by randomly choosing a pair of tangents from a randomly selected view pair; then the epipolar tangency constraint is verified by computing the reprojected epipolar transfer error over all frames of the sequences. This framework works with general projective cameras; however, it requires dynamic silhouettes from video sequences as input and is also very computationally demanding due to the huge number of hypotheses to validate.

Inspired by the silhouette representation in Sinha et al. [2004], McIlroy and Drummond [2009] use a smoothed tangent envelope to represent the silhouette and develop a continuous optimization algorithm to find the epipolar tangents. Apart from restricting the camera type to affine, the method also requires initialising the camera pose by hand, which makes the framework not fully automatic and therefore less desirable.

2.3 Our contribution

In this thesis, we present an approach to calibrate a varying camera from the background and the silhouettes of smooth objects, requiring no more than two frontier points per view pair. The significance of our framework is understood by considering the following points:

A. Plane-only calibration cannot cope with more than one varying parameter.

B. Silhouette-only calibration can cope with a fully varying camera, provided that very involved surfaces are used (with at least 7 frontier points for each pair of views).

C. Our method is based on both the plane and the silhouettes, requiring no more than two frontier points and being applicable to ordinary surfaces.

Though a fully varying camera can be calibrated from silhouettes alone, we show an alternative method that requires a smaller number of frontier points together with a background plane. The proposed algorithm has the following advantages:

• both the focal length and the principal point can vary
• only two frontier points are required
• the camera motion is not restricted
• no initial guess of the camera parameters is required

The detailed contributions are:

1. the geometric solution of computing the epipolar geometry assuming the existence of a background plane. This solution framework is applicable to generic projective cameras and does not require a known camera motion.

2. the corresponding algebraic solution that locates epipoles by finding frontier points. A robust optimization algorithm for finding epipolar tangents is proposed, which only requires object silhouettes and plane induced homographies as input and does not resort to manual initialization or initial guesses.

3. a consistency enhanced method for incremental projective reconstruction based on a minimal formulation of three-view partial reconstruction. The projection matrices are constructed directly from inter-image homographies and epipoles without explicitly computing the fundamental matrix; thus the search space for building consistent projective camera triplets has a lower dimension.

4. a minor contribution: a sub-pixel color edge detection algorithm based on Gaussian fitting, which is used to derive an accurate apparent contour of the object.

Moreover, it is worth mentioning that our calibration framework also applies to multiple-camera network calibration with a background plane constantly visible in the scene.

2.4 Articulation of the framework

In view of the limitations of the current silhouette based calibration methods, we propose a new framework that features more relaxed and reasonable constraints on the camera intrinsics and applicability to general projective cameras.

The proposed calibration framework is coarsely divided into three stages: epipole estimation by detecting the two outermost frontiers, projective reconstruction with maximum consistency, and metric reconstruction. The main stages of the framework are shown in Fig. 2.1.


Fig. 2.1 Main stages of the proposed framework

The first stage recovers the epipolar geometry. The epipoles are estimated by searching for a pair of epipolar tangents (which is equivalent to finding frontier points) with the help of the inter-image homography and the object silhouette. The unknown pattern on the background plane is assumed to be visible in all views, which facilitates the computation of the inter-image homography induced by the plane. We then use this homography to link the two corresponding epipolar tangent lines of a view pair, so that the epipolar tangency constraint can be enforced in the search for frontier points. The epipole is finally determined as the intersection of the two aforementioned epipolar tangents in each image of a view pair.

The second stage of our framework recovers a projective reconstruction from the pair-wise epipolar geometries. The building block of the projective reconstruction is the view triangle, whose structure is constructed directly from the estimated epipoles and the inter-image homography. The view triangle is repeatedly constructed and solved, adding one view at a time to the initial reconstruction, until all views are reconstructed. Projective bundle adjustment is also carried out each time a new view is added, to enhance the overall consistency.

Self-calibration based on the absolute dual quadric is employed in the third stage to obtain the internal parameters and to upgrade the projective reconstruction to metric. Common assumptions on modern CCD cameras are used, i.e., absence of camera skew, square pixels, and a principal point close to the image center (but varying). A flexible self-calibration algorithm based on the absolute dual quadric is adopted in order to calibrate a camera with both varying focal length and varying principal point. Finally, metric bundle adjustment is performed to achieve maximum accuracy.
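The key relation behind this stage, standard in the self-calibration literature (Hartley and Zisserman [2004]), links the DIAC of each view i to the absolute dual quadric through the projective cameras $P_i$:

$$\omega^*_i \simeq P_i\, Q^*\, P_i^T, \qquad i = 1, \cdots, m$$

Constraints on each $\omega^*_i$ (zero skew, square pixels, principal point near the image center) thus become constraints on the 8 degrees of freedom of $Q^*$; once $Q^*$ is estimated, the projective reconstruction is upgraded to metric by the homography that brings $Q^*$ to its canonical form (2.2).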


Chapter 3

Epipolar geometry estimation

3.1 Introduction

The epipolar geometry is the geometry of the intersection between the image planes and a pencil of planes having the baseline as axis (Hartley and Zisserman [2004]). It is frequently used at an early stage of the problem of reconstructing a scene from image matches. Algorithms for computing the epipolar geometry from corresponding 2D points in multiple images have been extensively studied. However, such correspondence-based methods cannot be used when an insufficient number of image matches can be found, e.g., when the image matches are all coplanar, or when no direct correspondences can be found at all, e.g., if the scene objects are textureless. In these situations, the silhouette of the object in the scene becomes the dominant feature.

The rim of a rigid object is the 3D curve consisting of the outermost visible points on the object's surface. The boundary of the object's silhouette consists of the points where the image plane intersects the viewing rays passing through the object's rim. Corresponding silhouettes of the same object in two views provide rich information about the epipolar geometry. Although the silhouette boundary keeps changing with the camera viewpoint, there exist some 'fixed points' on the object surface that are visible in both images. Once enough 'fixed points' are found, intersecting the tangent lines passing through their image projections reveals the location of the epipole. Luckily, we are always able to find at least two of those 'fixed points' in each view pair. However, finding the fixed points is no easy task without any additional information. As pointed out by Sinha and Pollefeys [2010], it is a 'chicken-and-egg' problem, in the sense that if we knew the epipole the fixed points could easily be obtained, while if the fixed points were known the epipoles would be readily computed.

The difficulty seems to reside in the fact that the link between two views is not easily obtained if only two silhouettes are given as input. In our framework, the background plane is by design used to establish this link and consequently to facilitate the search for the 'fixed points'. The inter-image homography induced by the background plane helps map the so-called epipolar tangent in one image to the other. By enforcing the tangency constraint on the epipolar lines, we can formulate finding the 'fixed points' as an optimization process. In this chapter, we study the techniques for finding the geometric relations between two views using the object silhouette information, with the help of the unknown but textured planar background pattern.

3.2 Background

3.2.1 The apparent contour and frontier points

As the camera moves around a rigid object, the silhouette of the object is view dependent. The apparent contour is the projection of the contour generator, which is the 3D curve composed of points whose tangent plane passes through the camera center (Furukawa et al. [2006]). Since the contour generator consists of the outermost (last visible) points of the object, the lines connecting the camera center and a point on the contour generator lie on the tangent plane to the object surface, as shown in Fig. 3.1.

When an object is observed from two views, the contour generators project to two apparent contours in the two images respectively. On the contour generators, the 'fixed points' that are visible in both views are called frontier points. They are the intersections of the two contour generators, where the tangent plane of the object surface contains both camera centers. The idea is shown in Fig. 3.2.

The tangent plane $\pi_P$ at a frontier point P intersects the two image planes at two epipolar tangent lines, $l_p$ and $l'_p$ respectively. The name can be understood from the following two facts: first, $l_p$ and $l'_p$ are epipolar lines, because $\pi_P$ contains the baseline; second, they are tangent to the apparent contours Γ and Γ′ respectively, because they are the traces of the tangent plane $\pi_P$. The projections of the frontier points (e.g., p and p′) are the only true correspondences between the two images.


Fig. 3.1 Contour Generator. The viewing ray through a point X of the contour generator is on the tangent plane to the object surface.

Fig. 3.2 Frontier point

In general, a frontier point corresponding to two views is not visible in a third view. For a non-convex object, there might be more than two frontier points, while for a convex object there are exactly two. Therefore, if the object surface is closed, at least two such points always exist, which is the basis for our algorithm to be feasible. The outermost frontier points can be located at the top and bottom of the object respectively, as shown in Fig. 3.3.


Fig. 3.3 Two frontier points P1 and P2 always exist.

3.2.2 Convex optimization basics

3.2.2.1 Convex set and convex hull

A set is convex if the line segment connecting any two points in the set is also in the set.

A convex combination of $x_1, x_2, \cdots, x_k$ is defined as $\theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_k x_k$, where $\theta_i \geq 0$, $i = 1, \cdots, k$, and $\sum_{i=1}^{k} \theta_i = 1$.

The convex hull conv C of a set C is the set of all the convex combinations of points in C (Boyd and Vandenberghe [2004]).

3.2.2.2 Backtracking line search

A common strategy for solving nonlinear optimization problems, which are frequently encountered in this thesis, is iterative search. Such methods are generally called descent methods. The idea is to find, given a starting point $t^{(1)}$, a descent direction $\Delta t$ and a step size $s$ such that

$$f(t^{(i)} + s^{(i)} \Delta t^{(i)}) < f(t^{(i)}), \quad i = 1, \cdots.$$

With $t^{(1)}$ given, $\Delta t$ and $s$ become the two important parameters. Searching along the negative gradient direction proves to be a natural choice, i.e., $\Delta t = -\nabla f$. For the step size, various methods have been proposed; a fundamental yet useful one is backtracking line search, also known as the Armijo rule (Armijo [1966]). The algorithm is summarized as follows,

1. set the control parameters α ∈ (0, 0.5), β ∈ (0, 1);
2. set s = 1 and search along the descent direction $\Delta t$ of f at $t \in \operatorname{dom} f$;
3. while $f(t + s\Delta t) > f(t) + \alpha s \nabla f^T \Delta t$, set $s := \beta s$.

The Armijo rule says that each step of the iteration should result in a sufficient descent of the function value, where this sufficient descent is normally a fraction of the decrease predicted by linear extrapolation.
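A minimal runnable sketch of gradient descent with Armijo backtracking; the test function and the parameter values are illustrative, not from the thesis.

```python
import numpy as np

def backtracking_descent(f, grad, t, alpha=0.3, beta=0.8, tol=1e-8, max_iter=200):
    """Gradient descent where each step size is chosen by the Armijo rule."""
    for _ in range(max_iter):
        g = grad(t)
        if np.linalg.norm(g) < tol:
            break
        dt = -g                      # descend along the negative gradient
        s = 1.0                      # start from a unit step
        # Shrink s until the Armijo sufficient-descent condition holds.
        while f(t + s * dt) > f(t) + alpha * s * (g @ dt):
            s *= beta
        t = t + s * dt
    return t

# Example: minimize a simple quadratic bowl (made-up test function).
f = lambda t: t @ t
grad = lambda t: 2 * t
print(backtracking_descent(f, grad, np.array([3.0, -2.0])))  # ~ [0, 0]
```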

3.3 Epipole estimation

Our aim in this chapter is to determine the fundamental matrix by locating the epipoles of each view pair, by finding frontier points with the help of the inter-image homography $H_\pi$ induced by the background plane π. For each pair of views of the scene, a sufficient number of correspondences for the computation of the fundamental matrix is not directly available, due to the smoothness of the 3D object. In fact, the planar pattern provides only 4 independent correspondences. Additional constraints for computing the fundamental matrix have to be obtained with extra effort.

Fig. 3.4 The geometric solution for epipole estimation by locating frontier points with the assistance of the background plane


3.3.1 The geometric solution

The proposed geometric solution for epipole estimation is illustrated in Fig. 3.4 and schematically summarized in Fig. 3.5.

The apparent contours Γ and Γ′ are the projections of the two contour generators Σ and Σ′ respectively. Consider the first of the two images, $I_1$. The tangent line $l_p$ to Γ at an image point p in the first image is the intersection between the image plane and the tangent plane $\pi_P$ to the object S at the object point P (the frontier point). The backprojection l onto the plane π of the image line $l_p$ is the intersection between the above-mentioned tangent plane and the plane π.

Notice that, among all the possible planes containing the line l, only a small number of them are tangent to the object S (e.g., if the object is convex, there are only two planes through l that are tangent to S).

Fig. 3.5 Overview of the geometric solution

Consider now the second image $I_2$, and suppose that there exists a point p′ on the apparent contour Γ′ of the object S in the second image such that the tangent line $l'_p$ to Γ′ at p′ backprojects to the same line l on the plane π: then the tangent plane $\pi'_P$ to S through l, whose trace on the image plane of the second camera is $l'_p$, coincides with the tangent plane $\pi_P$ to S. Thus the point p′ is the second image of P, while the point p is the first image of P. Hence, points p and p′ are corresponding points, and $l_p$ and $l'_p$ are corresponding epipolar tangent lines. Notice that the common tangent plane $\pi_P$ contains both viewpoints O and O′.

In order to avoid correspondences on the plane π (or too close to it), it is sufficient to check that the backprojection points of p and p′ onto the plane π are spatially separated. We recall that, in addition to correspondences related to points on the plane π, other correspondences related to points off that plane are needed.

Suppose that a second tangent plane $\pi_Q$ is found similarly to $\pi_P$, and let $l_q$ and $l'_q$ be its traces on the first and the second image plane. Also this second tangent plane contains both viewpoints O and O′. The intersection line between $\pi_P$ and $\pi_Q$ is the line joining O and O′: therefore the trace of this line on the first (second) image is the intersection between the image lines $l_p$ and $l_q$ ($l'_p$ and $l'_q$). Therefore the intersection point of $l_p$ and $l_q$ is the epipole e of the first image plane, and similarly the intersection of $l'_p$ and $l'_q$ is the second epipole e′.

From the epipole e′ and the inter-image homography $H_\pi$, the fundamental matrix can be derived:

$$F = [e']_\times H_\pi$$

where $[a]_\times$ is the skew-symmetric matrix such that $[a]_\times x = a \times x$ is the cross product of the two vectors a and x (Room [1952]).
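A small NumPy sketch of this last step, assuming the four epipolar tangents are already available as homogeneous 2D lines (the numeric values and the placeholder homography are ours):

```python
import numpy as np

def skew(a):
    """[a]x such that skew(a) @ x == np.cross(a, x)."""
    return np.array([[    0, -a[2],  a[1]],
                     [ a[2],     0, -a[0]],
                     [-a[1],  a[0],     0]])

# Hypothetical epipolar tangents (homogeneous lines) in the two images.
lp, lq   = np.array([0.1, 1.0, -300.0]), np.array([-0.2, 1.0, -80.0])
lp_, lq_ = np.array([0.15, 1.0, -310.0]), np.array([-0.1, 1.0, -95.0])

e  = np.cross(lp, lq)       # epipole in image 1: intersection of the tangents
e_ = np.cross(lp_, lq_)     # epipole in image 2

H = np.eye(3)               # plane-induced homography (placeholder)
F = skew(e_) @ H            # fundamental matrix F = [e']x H_pi
print(F / np.linalg.norm(F))
```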

3.3.2 The algebraic solution

Algebraically, we implement the above geometric solution as an optimization process where the frontier points are localized by finding two pairs of epipolar tangents. Specifically, two constraints are enforced: 1) a pair of true corresponding epipolar tangent lines are tangent respectively to their apparent contours in the two views; 2) the plane induced homography transfers one epipolar tangent to the other, since they back-project to the same line on the background plane.

3.3.2.1 Optimization formulation

To enforce the tangency constraint, an important quantity to minimize in the optimization is the distance, in an image, from a line to an apparent contour.

Definition 3.1. The distance from a 2D line $l_n$ to a 2D closed smooth curve C, denoted $D_{l2C}(l_n, C)$, is the smallest distance from $l_n$ to all curve tangents that are parallel to $l_n$.

An example of the line-to-curve distance is shown in Fig. 3.6. Suppose we want to determine the distance from the line $l_n$ to the curve Γ, and let $l_1$ and $l_2$ be the two curve tangents that are parallel to $l_n$. $D_{l2C}(l_n, \Gamma)$ is the distance between the two parallel lines $l_n$ and $l_1$, rather than between $l_n$ and $l_2$, because $l_1$ is the closest curve tangent that is parallel to $l_n$.
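A discrete sketch of Definition 3.1 for a curve given as a sampled closed polyline (a simplification: the thesis represents contours as B-spline curves, and the tolerance below is an assumed parameter):

```python
import numpy as np

def dist_line_to_curve(line, pts, angle_tol=1e-2):
    """D_l2C: smallest distance from a homogeneous 2D line to the curve
    tangents that are (approximately) parallel to it; pts is a closed
    polyline of shape (N, 2)."""
    a, b, c = line
    n = np.array([a, b]) / np.hypot(a, b)        # unit normal of the line
    tang = np.roll(pts, -1, axis=0) - pts        # discrete tangent vectors
    tang /= np.linalg.norm(tang, axis=1, keepdims=True)
    # A tangent is parallel to the line when it is orthogonal to the normal.
    parallel = np.abs(tang @ n) < angle_tol
    if not parallel.any():
        return np.inf
    # Distance of each parallel tangency point from the line.
    d = np.abs(pts[parallel] @ np.array([a, b]) + c) / np.hypot(a, b)
    return d.min()

# Example: unit circle sampled densely, and the horizontal line y = 2.
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
print(dist_line_to_curve((0.0, 1.0, -2.0), circle))   # ~1.0 (tangent y = 1)
```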

Fig. 3.7 Algebraic solution of finding epipole

The frontier point search is then formulated as an optimization problem:

$$\begin{aligned} \underset{x}{\text{minimize}} \quad & D_{l2C}(l'_p(x),\, \Gamma') \\ \text{subject to} \quad & l'_p(x) = H_\pi^{-T}\, l_p(x) \\ & x \in \operatorname{dom} D_{l2C} \end{aligned} \tag{3.1}$$

As shown in Fig. 3.7, Γ and Γ′ are two corresponding apparent contours (the apparent contours are used here temporarily; in practice the algorithm uses the convex hulls of the apparent contours instead). A tangent line $l_p$ to Γ is mapped to $l'_p$ in the second image via the inter-image homography $H_\pi$, because $l_p$ and $l'_p$ backproject to the same line on π. $D_{l2C}(l'_p(x), \Gamma')$ geometrically represents the distance from the mapped line $l'_p$ to the second apparent contour Γ′ and is designed as the objective function of the formulated optimization problem. Essentially it measures the degree of violation of the tangency constraint for $l'_p$; therefore, minimizing $D_{l2C}(l'_p(x), \Gamma')$ over the valid parameters x ($\operatorname{dom} f$ in (3.1) denotes the domain of a function f) amounts to finding a pair of corresponding epipolar lines that are tangent to Γ and Γ′ respectively. The image projections of the frontier points are the global minimizers (or, ideally, the roots) of the objective function, where the tangency constraint violation is close to 0.
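Continuing the previous sketch (and reusing its dist_line_to_curve helper, so this fragment is not self-contained on its own), the objective of (3.1) can be evaluated by transferring a tangent of Γ through the homography and measuring its distance to Γ′:

```python
import numpy as np

def tangent_line_at(pts, i):
    """Homogeneous tangent line of the closed polyline pts at vertex i."""
    d = pts[(i + 1) % len(pts)] - pts[i - 1]     # central-difference tangent
    a, b = d[1], -d[0]                            # normal = tangent rotated 90 deg
    c = -(a * pts[i, 0] + b * pts[i, 1])
    return np.array([a, b, c])

def objective(i, contour1, contour2, H):
    """D_l2C(l'_p(x), Gamma') of Eq. (3.1) for a discrete parameter i."""
    lp = tangent_line_at(contour1, i)
    lp_ = np.linalg.inv(H).T @ lp                 # l'_p = H^{-T} l_p
    return dist_line_to_curve(lp_, contour2)      # from the previous sketch
```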

3.3.2.2 The proposed algorithm

Centering on solving this optimization problem, the proposed epipole finding algorithm is summarized in Algorithm 1. The algorithm is mainly divided into four parts: 1) convex hull computation and segment division, where the convex hull of each apparent contour is computed and qualified segments of the convex hull are selected for the search; 2) discrete initialization, where a number of integer initial points are generated for the subsequent iterative search by exploring the integer parameter space; 3) Levenberg-Marquardt (LM) (Levenberg [1944]) refinement, where the initial points are refined and candidate epipolar tangent lines are generated; 4) robust epipolar tangent line pair selection and epipole determination.

Algorithm 1. Given two corresponding apparent contours Γ(t) and Γ′ in a view pair and the plane induced homography $H_\pi$ (the first apparent contour is represented as a function of a nondecreasing parameter t, $\Gamma(t): \mathbb{R} \to \mathbb{R}^2$), our aim is to determine the epipole positions by finding the epipolar tangents $l^*_p(t)$ and $l^*_q(t)$, i.e.,

$$\begin{aligned} \underset{t}{\text{minimize}} \quad & D_{l2C}(l'_p(t),\, CH') \\ \text{subject to} \quad & l'_p(t) = H_\pi^{-T}\, l_p(t) \\ & t \in \operatorname{dom} D_{l2C} \end{aligned} \tag{3.2}$$

1. Convex hull computation and segment division. Compute the convex hulls CH and CH′ of the apparent contours Γ(t) and Γ′ respectively, using the quickhull algorithm (Barber et al. [1996]). CH is divided into several curve segments according to the sign of the curvature. Only outward-curving segments $S_i$, $i = 1, \cdots, N_s$, are selected, and their corresponding parameter intervals $t \in [t_{i1}, t_{i2}]$ compose the search space Ω, i.e., $\operatorname{dom} D_{l2C} = \Omega$.

2. Discrete initialization. Use the proposed discretized backtracking line search algorithm to locate the local minima $\bar d^*$ of $\bar D_{l2C}$, where $\bar D_{l2C}$ is the discretized version of the objective function. The obtained local minimum points are used to initialize the optimization (3.2). Specifically, $\bar D_{l2C}$ is generated by restricting the domain of $D_{l2C}$ to $\Omega_d$, where $\Omega_d$ is the set consisting of all integers in Ω. The starting points $\bar t_j$ for the discretized backtracking line search algorithm are generated by (approximately) uniformly sampling N points in each segment of $\Omega_d$. A threshold $d_{th}$ is set for ruling out bad initializations. Denote the set of all the good starting points as $\bar T$ and the local minimum points as $\bar t^*_j$.

Repeat

a. find a local minimum $\bar d^*_j$ via discretized backtracking line search (search steps are integers) with $\bar t_j$ as the starting point; the corresponding local minimum point is $\bar t^*_j$;

b. keep $\bar t^*_j$ if $\bar d^*_j < d_{th}$;

until each $\bar t_j \in \bar T$ is visited.

3. Levenberg-Marquardt refinement. Perform LM in the continuous feasible set Ω using each seed t̄*_j as initialization. Derive the candidate set T* consisting of a number N_c of refined optimal points t*, where each t* corresponds to an estimated projection of a frontier point.

4. Robust epipolar tangent line pair selection and epipole determination. For each group of two candidates from T*, derive the epipolar tangents, the epipoles and subsequently the fundamental matrix. Compute the Sampson error ε using the two frontier point correspondences. The final pair of epipolar tangent lines is selected as the one with the smallest ε among all N_p = C(N_c, 2) = N_c(N_c − 1)/2 pairs (a code sketch of this step follows the list).

   For k ← 1 to N_p do:
      a. derive the projections of frontier points p, q and the curve tangents l_p, l_q in the first image, and respectively their counterparts in the second image p′, q′, l′_p = H_π^{-T} l_p, l′_q = H_π^{-T} l_q. p′, q′ are computed as points of tangency on Γ′ regarding l′_p and l′_q respectively (shown in Section 3.3.2.2.4);
      b. derive the epipoles e = l_p × l_q, e′ = l′_p × l′_q and the fundamental matrix F = [e′]_× H_π;
      c. compute the Sampson error ε_k with respect to F using the frontier point correspondences p ↔ p′, q ↔ q′.

   The optimal epipolar tangent pair l*_p(t) and l*_q(t) is selected as the one with the smallest ε_k.
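Step 4 is mechanical enough to sketch directly. The following Python fragment (numpy assumed; function names are ours, and all points and lines are homogeneous 3-vectors) shows how the epipoles, the fundamental matrix and the Sampson score of one candidate pair would be computed; the pair minimizing the summed error over both frontier correspondences is kept:

    import numpy as np

    def skew(v):
        """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
        return np.array([[0.0, -v[2], v[1]],
                         [v[2], 0.0, -v[0]],
                         [-v[1], v[0], 0.0]])

    def epipoles_and_F(lp, lq, lp2, lq2, H_pi):
        """Epipole = intersection of the two epipolar tangents in each image;
        fundamental matrix from the plane-induced homography: F = [e']_x H_pi."""
        e = np.cross(lp, lq)        # e  = l_p x l_q
        e2 = np.cross(lp2, lq2)     # e' = l'_p x l'_q
        return e, e2, skew(e2) @ H_pi

    def sampson_error(F, x, x2):
        """First-order geometric (Sampson) error of a correspondence x <-> x2."""
        Fx, Ftx2 = F @ x, F.T @ x2
        return (x2 @ Fx) ** 2 / (Fx[0]**2 + Fx[1]**2 + Ftx2[0]**2 + Ftx2[1]**2)

    # candidate score as in step 4c, summed over both frontier correspondences:
    # eps_k = sampson_error(F, p, p2) + sampson_error(F, q, q2)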

The key ingredients of the algorithm, namely the convex hull representation, the objective function, the global minima search strategy and the robust optimal epipolar tangent pair selection, are explained in detail in Sections 3.3.2.2.1 to 3.3.2.2.4.

3.3.2.2.1 Convex hull representation

Our algorithm starts by computing the convex hull of the object and searching only for the two outermost frontier points. A non-convex object could have multiple frontier points (see Fig.3.8) due to its complex shape. Theoretically, it is possible to locate all the frontier points of the contour and take the common intersection as the epipole (or use RANSAC to determine the best choice). However, the computational load of searching multiple frontiers might be several times as much as searching only the 2 outermost frontiers when the epipolar geometry is unknown. As shown in Section 3.3.1, knowing the location of two frontier points suffices to determine the epipole; therefore we decide to locate only the outermost frontier points to achieve high efficiency. If the apparent contour and the inter-image homography are accurate enough, using the two outermost frontier points also produces accurate epipole locations.

Fig. 3.8 Multiple frontier points for a non-convex object; red dots are outermost frontiers and blue dots are 'inner' frontiers

Image projections of such outermost frontier points must lie on outward-curving segments of the apparent contour, where the tangent plane would only intersect the object surface at one point rather than cutting through it (see Fig.3.9(b)). These valid segments are where the apparent contour and its convex hull overlap (see Fig.3.9(a)).

Specifically, the detected sub-pixel silhouette points are arranged in clockwise order and the convex hull CH is computed via the quickhull algorithm Barber et al. [1996]. CH and the apparent contour Γ are represented by B-spline curves. Both CH and Γ are tracked to find the outward-curving segments S_i to which we restrict the search for frontiers. Let Ω represent the search space consisting of all the valid segments S_i. One would notice that Ω ⊂ R is actually a set of line segments and points. Restricting the search space to Ω, rather than the whole apparent contour, tremendously reduces the computation load (a sketch of the segment selection follows).

Fig. 3.9 Convex hull and valid segments of a non-convex object. (a) valid segments are represented by thick colored curves. (b) tangent envelope
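The segment selection of step 1 can be prototyped with off-the-shelf tools. Below is a minimal sketch assuming numpy/scipy (scipy.spatial.ConvexHull wraps the same qhull code as Barber et al. [1996]); the function name and the simple run-grouping, which ignores wrap-around at the contour's starting index, are ours:

    import numpy as np
    from scipy.spatial import ConvexHull

    def valid_segments(contour):
        """Find the outward-curving runs where the contour coincides with
        its convex hull. `contour` is an (N, 2) array of sub-pixel silhouette
        points in clockwise order; returns (start, end) index ranges S_i."""
        hull = ConvexHull(contour)
        on_hull = np.zeros(len(contour), dtype=bool)
        on_hull[hull.vertices] = True
        # group consecutive hull-vertex indices into segments S_i
        segments, start = [], None
        for i, flag in enumerate(on_hull):
            if flag and start is None:
                start = i
            elif not flag and start is not None:
                segments.append((start, i - 1))
                start = None
        if start is not None:
            segments.append((start, len(contour) - 1))
        return segments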

3.3.2.2.2 The objective function

The objective function is essentially the line-to-curve distance D_l2C(l′_p(t), CH′). In this section, the structure and the evaluation procedure of the objective function are discussed in detail.

The line to curve distance. Given a line l′_p(t) and the convex hull CH′ in an image, computing the line to curve distance D_l2C(l′_p(t), CH′) is equivalent to finding a curve point whose tangent line is parallel to l′_p. The procedure for evaluating the line to curve distance is described in Algorithm 2, which has 2 important elements:

i). Get two starting points. For a convex closed curve, there exist two tangents that are parallel to a given line l. To find the two points, we sample the integers from Ω and select the two candidates as the ones whose angle to l′_p is the smallest. This angle θ indicates the 'parallelism' between two lines, which is numerically measured (or represented) by the norm of the cross product of the two direction vectors. The computation is as follows. Let l_1 and l_2 be the two lines whose parallelism we are interested in. The third coordinates of the lines l_1 and l_2 are first set to 0 and the resultant vectors, l′_1 and l′_2, are normalized to have unit norm. The parallelism measure, c_p, is eventually computed as the norm of the cross product of l′_1 and l′_2 (a minimal sketch follows this list):

    c_p = ‖l′_1 × l′_2‖_2                                              (3.3)

ii). Nonlinear refinement. The two starting points are provided to the Levenberg-Marquardt nonlinear iterative search algorithm, which moves the initial points to the optimum. The final point is chosen to be the one with the smaller distance to the input line l, and this distance is output as D_l2C(l′_p(t), CH′). c_p is a suitable objective function for this optimization problem because it is a continuous representation of the angle between two lines, i.e., the smaller c_p is, the 'more parallel' l_1 and l_2 are (if the lines are parallel, c_p = 0; if they are orthogonal, c_p = 1). Therefore, as the optimization is carried out, the best parallel lines are readily found as the ones with the smallest c_p.
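For concreteness, the parallelism measure of (3.3) takes only a few lines of code (a sketch assuming numpy; the function name is ours):

    import numpy as np

    def parallelism(l1, l2):
        """Parallelism measure c_p between two image lines [a, b, c]: drop
        the third (offset) coordinate, normalize to unit norm, and take the
        norm of the cross product (0 = parallel, 1 = orthogonal)."""
        d1 = np.array([l1[0], l1[1], 0.0])
        d1 /= np.linalg.norm(d1)
        d2 = np.array([l2[0], l2[1], 0.0])
        d2 /= np.linalg.norm(d2)
        return np.linalg.norm(np.cross(d1, d2))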

Algorithm 2. Compute the line to curve distance. Given a line l = [a, b, c]^T and a curve CH(t) = [x(t), y(t)]^T with dom CH = Ω, determine the line to curve distance D_l2C(l, CH);

begin
    l′ = [a, b, 0]^T / √(a² + b²) ;                              /* unit-norm direction */
    sample integers in Ω to form its discretization set Ω_d ;
    for each t_i in Ω_d do
        p_i = [x(t_i), y(t_i)]^T                                          /* the point */
        l_{p_i} = [∇y|_{t_i}, −∇x|_{t_i}, y(t_i)∇x|_{t_i} − x(t_i)∇y|_{t_i}]^T   /* tangent line */
        l′_{p_i} = [∇y|_{t_i}, −∇x|_{t_i}, 0]^T / √((∇y|_{t_i})² + (∇x|_{t_i})²)  /* normalization */
        c^i_p = ‖l′ × l′_{p_i}‖_2                                        /* parallelism */
    end
    get the two points p_{s1} and p_{s2} corresponding to the smallest two c_p in {c^i_p} ;
    d^j_q = |l^T p_{s_j}| / √(a² + b²), for j = 1, 2 ;
    d*_q = min{d¹_q, d²_q} ;
    return D_l2C(l, CH) = d*_q, the corresponding spline parameter t_k, point p_k and tangent line l_{p_k} ;
end
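A discrete prototype of Algorithm 2 might look as follows (numpy assumed; for brevity this sketch evaluates the two best integer samples directly and omits the Levenberg-Marquardt refinement of the starting points on the B-spline):

    import numpy as np

    def line_to_curve_distance(l, curve_pts):
        """Discrete sketch of Algorithm 2: find the curve points whose
        tangents are most parallel to l and return the smaller distance.
        `l` is [a, b, c]; `curve_pts` is an (N, 2) sampling of the valid
        convex-hull segments (the search space Omega)."""
        a, b, c = l
        x, y = curve_pts[:, 0], curve_pts[:, 1]
        dx, dy = np.gradient(x), np.gradient(y)            # curve tangents
        # parallelism c_p between l and each curve tangent [dy, -dx, *]
        n_l = np.array([a, b, 0.0])
        n_l /= np.linalg.norm(n_l)
        n_t = np.stack([dy, -dx, np.zeros_like(dx)], axis=1)
        n_t /= np.linalg.norm(n_t[:, :2], axis=1, keepdims=True)
        cp = np.linalg.norm(np.cross(n_l, n_t), axis=1)
        # two best candidates (the two parallel tangents of a convex curve)
        s1, s2 = np.argsort(cp)[:2]
        d = [abs(a * x[i] + b * y[i] + c) / np.hypot(a, b) for i in (s1, s2)]
        return min(d)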

Note that all the search is performed in the search space Ω, meaning that only outward-curving segments of the convex hull are searched. By restricting the parameter to Ω, the algorithm will only output the outer tangents, which are exactly what is needed to compute the correct violation measure of the tangency constraint. On the contrary, if the tangent lines that 'cut through' the apparent contour were the output, the violation measure could be wrong. This is shown in Fig.3.10, where the solid contour lines represent the convex hull and the dotted curve is the apparent contour. Only the outward-curving segment (in red) is searched for epipolar tangent lines. The distance between the outer tangent l_2 and the line l′_p is the needed line to curve distance, while d_1, the distance between l_1 and l′_p, is not.

Fig. 3.10 Line to curve distance with outer tangents

The function value. Given Algorithm 2, evaluating the objective function is an easy matter, which is summarized in Algorithm 3.

Algorithm 3. Evaluate the objective function. Given a convex hull CH of an object and its search space Ω in the first image I_1, and their counterparts CH′, Ω′ in the second image I_2, evaluate the objective function f(t) with t ∈ Ω;

1. obtain the curve tangent l_p at the point p indexed by parameter t;
2. get the mapped line l′_p = H_π^{-T} l_p in the second image;
3. compute the line to curve distance f(t) = D_l2C(l′_p(t), CH′) with Algorithm 2.
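Chaining the three steps of Algorithm 3 gives a one-screen evaluation routine. The sketch below assumes scipy.interpolate.CubicSpline as a stand-in for the B-spline representation and reuses the line_to_curve_distance sketch above; the actual spline type and names differ in the implementation:

    import numpy as np

    def objective(t, spline_CH, H_pi, CH2_pts):
        """Sketch of Algorithm 3: tangent at CH(t), mapped through H_pi^{-T},
        then the line-to-curve distance to the second convex hull.
        `spline_CH` evaluates the first hull and its derivative, e.g. a
        scipy.interpolate.CubicSpline over the chord-length parameter."""
        x, y = spline_CH(t)
        dx, dy = spline_CH(t, 1)                       # first derivative
        lp = np.array([dy, -dx, y * dx - x * dy])      # tangent line at t
        lp2 = np.linalg.inv(H_pi).T @ lp               # l'_p = H_pi^{-T} l_p
        return line_to_curve_distance(lp2, CH2_pts)    # Algorithm 2 (above)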

CH′ does not depend on the function parameter t and thus can be treated as a fixed parameter. The only free variable t of D_l2C(l′_p(t), CH′) is the (chord-length) parameter of the B-spline representation of the first convex hull CH (see Chapter 6). The function domain Ω consists of some unconnected line segments and/or singular points in R, due to the fact that only a few outward-curving segments of the first convex hull are searched. Therefore D_l2C(l′_p(t), CH′) is a real-valued, non-continuous function. Written in the extended-valued extension fashion (Boyd and Vandenberghe [2004]), the function is

    f(t) = { D_l2C(l′_p(t), CH′)   for t ∈ Ω
           { +∞                    otherwise                            (3.4)

where f(t) represents the extended-valued objective function.

A typical shape of the objective function is shown in Fig.3.11. The colored segments correspond to outward-curving convex hull segments that will be searched by our algorithm. Dotted line segments correspond to inward-curving convex hull segments which will not be searched. One observes that the function is a real (positive) valued, non-continuous, multi-modal function whose two global minimum points (or two roots) are to be found.

Fig. 3.11 A typical example of the objective function.

Although the size of the iterative search involved is small, evaluating the objective function is relatively expensive. Therefore it would be desirable to carry out a minimum number of evaluations of the objective function in the search for the global minima. Section 3.3.2.2.3 seeks to achieve the global optimum with minimum cost.

3.3.2.2.3 The global minima search strategy

Well-established global minimization methods, e.g. particle swarm and ant colony optimization, are too complex and too slow to be used here. Our strategy is to find a number of starting points that are close enough to the global minima and then use a high-performance local optimization method to reach them.

The discrete initialization. We choose to take advantage of Levenberg-Marquardt for the iterative search. However, good initialization must be provided. The method of initialization is problem-dependent and can be very computationally intensive if no additional information is available. We propose to explore the discretized version of the objective function to ease the initialization search. A discretized version D̄_l2C of the objective function is generated by sampling integer values of Ω, and D̄_l2C is explored to produce a set of initialization points where the value of the objective function is small.

To explore D̄_l2C, a discretized backtracking line search algorithm is devised, which is essentially the gradient descent backtracking line search algorithm of Boyd and Vandenberghe [2004] operating on integers. The algorithm is summarized in Algorithm 4. There are 3 important elements of the algorithm, discussed after the pseudocode.

Algorithm 4. The discretized backtracking line search. Given a starting point t = t_0, we want to find the integer optimal point for the function f. The following parameters are used: α = 1 × 10⁻⁴, β = 0.5, n_a = 0, G_tol = 1 × 10⁻⁴, F_tol = 1 × 10⁻⁴;

while ∇f(t) > G_tol and f(t) > F_tol do
    δ = −f(t)/∇f(t) ;
    ∆t = sign(δ)⌈|δ|⌉ ;
    t_p = t + ∆t ;
    while f(t_p + ∆t) ≥ f(t_p) + α∇f(t_p)∆t do
        δ = βδ ;
        ∆t = sign(δ)⌈|δ|⌉ ;
        t_p = t + ∆t ;
        n_a = n_a + 1 ;
        if n_a > 10 then
            n_a = 0 ;
            break ;
        end
    end
    t = t_p ;
end

where sign(x) is the sign of x and ⌈x⌉ is the ceiling of x, i.e., the smallest integer not less than x.
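A direct transcription of Algorithm 4 into Python might read as follows (numpy assumed; f and grad are callables on integers, with grad the discrete gradient defined in item 2 below; names and the absolute value in the stopping test are ours):

    import numpy as np

    def discretized_backtracking(f, grad, t0, alpha=1e-4, beta=0.5,
                                 g_tol=1e-4, f_tol=1e-4, max_backtracks=10):
        """Integer-step backtracking line search (sketch of Algorithm 4)."""
        t = int(t0)
        while abs(grad(t)) > g_tol and f(t) > f_tol:
            delta = -f(t) / grad(t)                  # Newton-Raphson style step
            step = int(np.sign(delta) * np.ceil(abs(delta)))
            tp = t + step
            n_backtracks = 0
            # Armijo-type sufficient-decrease test on integer steps
            while f(tp + step) >= f(tp) + alpha * grad(tp) * step:
                delta *= beta
                step = int(np.sign(delta) * np.ceil(abs(delta)))
                tp = t + step
                n_backtracks += 1
                if n_backtracks > max_backtracks:
                    break
            t = tp
        return t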

1. The search step. The step for the search algorithm is a Newton-Raphson one, which is frequently used to find the roots of a polynomial (Press et al. [2007]). This step is reasonable, if not ideal, for our case because the objective function is always nonnegative and the global minima are the roots of the function. This appropriate step enables a fast search for the initialization.

2. The search direction. The search direction is the negative gradient direction, as most gradient-based iterative search methods would adopt. However, in our discretized algorithm, the gradient is estimated by central differences based on neighbouring integer points. The discrete gradient at a point x is set to 0 if the function values at its two neighbouring points are both greater than f(x), i.e., if f(x + 1) > f(x) and f(x − 1) > f(x) then ∇f|_x = 0 (a sketch of this rule follows).
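The zero-at-discrete-minimum rule of item 2 can be stated as a small helper (a sketch; the function name is ours):

    def discrete_gradient(f, t):
        """Central-difference gradient on integers, forced to 0 at a discrete
        local minimum so that the search stops there (see item 2 above)."""
        if f(t + 1) > f(t) and f(t - 1) > f(t):
            return 0.0
        return (f(t + 1) - f(t - 1)) / 2.0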

