

Master's Degree Programme (ex D.M. 270/2004) in Computer Science

Master's Thesis

Identification and contextual segmentation of objects via correspondence regularization

Supervisor: Ch. Prof. Andrea Torsello

Candidate: Sebastiano Vascon
Student ID 788442

Academic Year 2011/2012


Contents

1 Introduction
  1.1 Objectives
  1.2 Segmentation techniques, an overview
  1.3 Graph matching techniques, an overview
  1.4 Structure of the thesis

2 Background knowledge
  2.1 Image gradient
    2.1.1 Differential filter
    2.1.2 Sobel filter
  2.2 Watershed segmentation
    2.2.1 Flooding from minima or markers
    2.2.2 The Ordered Queue
    2.2.3 Algorithm
  2.3 Scale space
    2.3.1 Gaussian scale space
  2.4 Orientation histograms
    2.4.1 Bin circular shift ⇒ Gradient rotation
    2.4.2 Histogram distance functions
    2.4.3 Bin-to-Bin dissimilarity
    2.4.4 Cross-Bin dissimilarity
  2.5 Optimization
    2.5.1 Sinkhorn projection
    2.5.2 Von Neumann scheme
    2.5.3 Projected gradient

3 The solution
  3.1 The main pipeline
    3.1.1 Macropixel descriptor
    3.1.2 Gaussian scale space
    3.1.3 Feasible Assignments
  3.2 Weighting matrix
    3.2.1 Photometric Similarity
    3.2.2 Geometrical Similarity
    3.2.3 Boundary Penalty
    3.2.4 Parameter reduction and estimation
  3.3 Optimization
    3.3.1 Initialization
    3.3.2 Maximization
    3.3.3 Sparsity

4 Tests and results
  4.1 Datasets
  4.2 Experiments
    4.2.1 Evaluation
    4.2.2 Precision and Recall
  4.3 Image matching
    4.3.1 Image matching with itself
    4.3.2 Image matching under rigid transformation
    4.3.3 Object in clutter
  4.4 Object recognition
    4.4.1 Objective
    4.4.2 Result

5 Conclusion
  5.1 Future work


List of Figures

1.1 a) The original image b) a non-adaptive threshold applied c) using adaptive thresholding. Image taken from [15].
1.2 The k-means algorithm and the generated clusters.
1.3 a) A network flow b) a cut on a network flow c) an image d) the max flow on the network built over the image e) the min cut of the network. The blue and red markers are seeds used by the algorithm to discern the two classes. Image taken from [3].
2.1 Arrows indicating the maximum change in intensity. The dark area has a higher value than the brighter one, thus the arrows go from the white to the black zone.
2.2 An example of the first derivative applied to a horizontal slice of an image. The derivative is positive or negative on the edges and equal to zero in the flat zones. Image taken from [15].
2.3 a) Original image b) result of Sobel filtering.
2.4 a) Image of an electrophoresis b) the effect of oversegmentation on an image. Image taken from [15].
2.5 An ordered queue in action: a) the ordered queue b) extraction of an element with highest priority c) reduction of the number of simple FIFO queues in the set d) treatment of a highest-priority element arriving in the queue. Image taken from [2].
2.6 A scale-space example of an MRI image. By increasing the scale parameter t, the details become fewer.
2.7 The effect of image rotation on the histogram of gradient orientations.
2.8 The two cumulative distribution functions for x and y. The work needed to transform x into y is exactly the area between the two functions; the arrows represent the flow from x to y matching their weights. In this picture the total area is 10 + 8 + 5 + 6 + 1 + 1 = 31 and the total weight of the two distributions is 13, thus EMD(x, y) = 31/13 = 2.38. Image taken from [37].
2.9 The iterative Von Neumann scheme. In blue the non-negative constraint, in green the constraint with sum equal to 1; x0 is the starting point and x* the point at the intersection.
2.10 The GrowthDirection phase for a point in a system with two variables (x1 and x2) and one constraint (x1 + x2 = 1). The gradient is projected onto the tangent subspace of the constraint.
2.11 The step is calculated from x(t) and then projected orthogonally onto the constraint space to make it feasible.
3.1 The pipeline of the proposed solution.
3.2 a) Initial image b) regular 5 × 5 grid (points enlarged for clarity; a seed is a single pixel with a unique id) c) watershed segmentation result d) the center of each region.
3.3 In the center there is a watershed point (id = −1) and the ids in its neighborhood are checked.
3.4 The Gaussian scale space and the gradient orientation.
3.5 The similarity function.
3.6 Boundary penalty at different values of λB.
4.1 a) IMA: the object to be matched b) IMB: the object to be matched from a different perspective c) CL1: the object in front of a clutter d) CL2: the object in clutter, partially hidden e) CL3: the object in clutter, partially hidden, with altered color f) CL4: the object in clutter, rotated and partially hidden.
4.2 a) An original ALOI image b) our cropped version.
4.3 a) The image to be matched b) the image rotated by 90°, c) the resulting match with 9 regions, d) 16 regions e) 25 regions f) 36 regions.
4.4 An example of matching while increasing the number of regions: a) 3×3 b) 4×4 c) 5×5.
4.5 The behavior of the algorithm under variations of the number of regions, kP and kG. The object to be found in the clutter was IMA (a-d) and IMB (e-h).
4.6 a) The averaged precision and recall for each number of regions b) the average over all queries and regions.
4.7 The precision and recall obtained by modifying the number of regions in the dataset.

List of Tables

4.1 Parameters involved in the solution.
4.2 The combinations that work best are those with kP >= kG, since there are no mismatches or missed matches.


Dedicated to my whole family: ever-present and an incomparable support.


Chapter 1

Introduction

1.1 Objectives

The main goal of this thesis is the implementation of an algorithm for the segmentation and matching of images by correspondence regularization. The research aims at finding a method that, given two images as input, produces a sort of probabilistic intersection as output. In this way the problems of matching (the wider the intersection, the higher the probability that the match is consistent) and of segmenting the image (once we have the intersection, the segmentation is straightforward) are solved together; a method achieving such an effect cannot be found in the literature. Segmentation and matching are two problems that arise many times in computer vision applications, and frequently they come up together. To attain this goal, every image is first segmented into regions (macropixels) with a watershed algorithm. After this, the problem is cast into a nonlinear optimization problem that maximizes a quadratic function over the possible assignments. This technique is commonly applied in graph matching for finding the corresponding vertices of two graphs. An interesting contribution is the possibility that certain parts of one image may not be matched in the second image; in terms of graph matching, this approach falls into the category of inexact graph matching. By doing so, the proposed approach leads to a more relaxed assignment space and, at the same time, to a stronger matching technique.

Matching and contextual segmentation are performed by correspondence regularization. This means that, by iteratively regularizing the associations between the images' regions, using the projected gradient method to project them into a doubly-stochastic space, the best-fitting assignments emerge from the whole set of possible associations. The notion of best fitting is given by a weighting function that combines photometric similarity, geometric similarity and a boundary-length penalty across all the regions of the two images. Since the goal is to achieve matching with contextual segmentation, a gentle introduction to the aims and methods of both problems is in order.


Figure 1.1: a) the original image b) a non-adaptive threshold applied c) using adaptive thresholding. Image taken from [15].

1.2 Segmentation techniques, an overview

Segmentation is a low-level computer vision process that partitions a given image into multiple regions which collectively cover the entire image. Each pixel in the image is thus assigned to a particular class (labeling), and pixels of the same class share certain visual characteristics (e.g. color, texture, edges). Its main objective is to simplify the representation of an image into something more meaningful and easier to analyze in further steps. Image segmentation is involved in applications like object recognition, medical image analysis, face recognition, fingerprint recognition, image compression, image editing and image retrieval systems. Image segmentation methods fall into four major categories:

• Based on pixel similarity (histogram thresholding, adaptive thresholding)
• Based on region growing (region growing and watershed)
• Based on clustering techniques (k-means)
• Based on graph-theoretic algorithms (graph-cut and normalized graph-cut methods)

The first category contains the simpler methods, which assume that pixels showing similar intensity values must belong to the same region. The underlying idea is to analyze the histogram of the given image in order to find the value that best divides it into two parts. This segments the image into two classes, the first composed of pixels whose intensity is greater than the threshold and the second of pixels below the threshold. The more threshold values are chosen, the more classes are generated. Otsu's method [31] belongs to this class: using statistics over the image, it adapts the threshold in order to maximize the interclass variance, thus obtaining a more definite separation between the regions. Adaptive thresholding is based on dividing the image into smaller regions: a threshold is applied to each region, dividing the region's pixels into the corresponding classes, and the final segmentation is obtained by combining the results from each region (see fig. 1.1). In the case of dividing an image into foreground and background (one of the most common cases), if previous knowledge of the background composition is available (e.g. a security camera), such composition can be subtracted from the image in order to estimate whether a pixel belongs to the background or the foreground. These approaches, however, do not evaluate the relationships between neighboring pixels; because of this, the region-growing methods and the following ones are more helpful in solving the problem.
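Before moving on, a concrete illustration of the two thresholding strategies just described may help. The sketch below applies Otsu's global threshold and a mean-based adaptive threshold with OpenCV (which this thesis already uses elsewhere); the file name and the neighborhood parameters are illustrative assumptions, not values from the thesis.

```python
import cv2

# Gray-scale input image (the file name is a placeholder).
img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)

# Global thresholding: Otsu's method picks the threshold that
# maximizes the interclass variance of the two pixel classes.
otsu_t, binary_global = cv2.threshold(
    img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Adaptive thresholding: each pixel is compared against the mean of
# its 31x31 neighborhood minus a small constant, so the threshold
# adapts to local illumination (cf. fig. 1.1c).
binary_adaptive = cv2.adaptiveThreshold(
    img, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 31, 10)
```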

The idea underlying region growing is that, starting from a set of points representing the segmentation classes, neighboring pixels (e.g. from an 8-connected neighborhood) are iteratively added to each class when they respect a certain distance function, which is problem-dependent. The watershed algorithm belongs to this class of segmentation methods: the image is considered as a topographic relief in which the pixel intensity is interpreted as altitude. The algorithm floods the image from the seeds (which can be estimated in different ways); when floods coming from distinct seeds touch each other, a ridge is created at the intersection point in order to keep the two "kinds of water" separated. At convergence the image is segmented into a number of regions equal to the number of seeds. This approach will be used in this thesis (see sec. 2.2).

To the third class belong the classic clustering methods for data analysis. The k-means algorithm [25] is a famous clustering technique which starts from a certain number (k) of clusters, each with its centroid. Each point is assigned to the closest centroid, where closeness is measured by a distance function, most often the Euclidean distance. Then the mean of the points belonging to each cluster is computed, the center of that cluster is moved to the mean, the points are reassigned to the new centers, and the mean is calculated once more. The procedure is repeated until nothing changes. The main drawback of this method is the difficulty of finding a suitable number of clusters (k), which makes the approach problem-dependent (see fig. 1.2 for an example).
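A minimal numpy sketch of the k-means loop just described, assuming Euclidean distance; in image segmentation the points would typically be per-pixel feature vectors (e.g. RGB values), and k remains problem-dependent.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Assign each point to its nearest centroid, move every centroid
    to the mean of its cluster, and repeat until nothing changes."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Squared Euclidean distance of every point to every centroid.
        dist = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        new_centroids = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c)
            else centroids[c]                 # keep empty clusters in place
            for c in range(k)])
        if np.allclose(new_centroids, centroids):
            break                             # converged: nothing changed
        centroids = new_centroids
    return labels, centroids
```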

Figure 1.2: The k-means algorithm and the generated clusters

The fourth class of methods comes from graph theory. It exploits the idea of max-flow/min-cut over a network to find the segmentation boundary across an image. A weighted graph is built over the image, in which every pixel is a node and each edge weight represents the similarity of a pair of pixels. Once this network is built, a max-flow/min-cut algorithm is executed over it, finding the cut with minimal cost and dividing the image into two disjoint subsets [3]. The algorithm iterates over each subset until the desired number of partitions is reached (see fig. 1.3). Because the weight of a cut is directly proportional to the number of edges in the cut, the minimum-cut algorithm favors small partitions. The normalized cut was introduced to avoid this problem: the cost of a cut becomes the sum of the cut normalized by the sum of the edges in the two disjoint sets. Adding this normalization makes the problem NP-complete, because all possible disjoint sets must be examined in order to spot the best cut; therefore the solution can only be approximated, although some heuristics are available.

1.3 Graph matching techniques, an overview

Since the approach used in this thesis will be cast into a more general graph matching problem, the methods used in the literature to solve or approximate this problem must be reviewed.

In many applications a crucial operation is the ability to compare two objects, or a model and an object, in order to find similarities and correspondences. When structured information is represented by graphs¹, this comparison is performed using some form of graph matching. "Graph matching is the process of finding a correspondence between the nodes and the edges of two graphs that satisfies some (more or less stringent) constraints ensuring that similar substructures in one graph are mapped to similar substructures in the other" [7].

Given two graphs GA = (VA, EA) and GB = (VB, EB) with |VA| = |VB|, we can state the graph-matching problem as follows: find a one-to-one mapping f : VB → VA such that (u, v) ∈ EB iff (f(u), f(v)) ∈ EA. If this mapping f exists it is called an isomorphism, and GB is said to be isomorphic to GA. This type of matching is called exact, because every vertex of GB is mapped to exactly one vertex of GA.

¹A graph is a data structure G composed of a set of vertices V and a set of edges E connecting pairs of vertices. If each edge carries a measure representing the relationship between the two connected vertices, the graph is called weighted; otherwise it is unweighted.


Figure 1.3: a) A network flow b) a cut on a network flow c) an image d) the max flow on the network built over the image e) the min cut of the network. The blue and red markers are seeds used by the algorithm to discern the two classes. Image taken from [3]


If this requirement is relaxed, the matching is inexact and what may be found is not an isomorphism. However, finding the best mapping between the two graphs is still possible; the best mapping under certain assumptions is problem-dependent and often measured using a fitness function that quantifies the goodness of a match. The matching is inexact, for example, when the numbers of nodes in the two graphs differ. In the case of different graph sizes a one-to-one mapping may still be found, but onto a reduced set of vertices of the larger graph; the mapping found is then the solution of the sub-graph isomorphism problem. In some inexact matching problems the task is still to find a one-to-one mapping, with the exception of certain vertices that cannot be matched and are assigned to a "null" set of vertices. This case is the basis of the approach followed here. When considering a graph as a data structure used to represent pieces of information and their relationships, finding a perfect matching with another graph is equivalent to saying that the two represented objects are identical. Graph matching is still an open problem: no polynomial-time algorithm for it is known, and its subgraph variant is NP-complete. It plays a central role in pattern recognition [53] and is one of the topics on which the scientific community is spending a lot of effort. Another important aspect of this problem concerns how the graph represents the information. Although a lot of work has been done in the past years, graph-based representation is indeed an open problem, since no stable and generally working solution has been found; the problem is currently solved case by case. Many graph-based representations exist in the literature for different purposes, like segmentation graphs [10, 4], shock graphs [39], skeleton trees [18] and contour graphs [50].

Since the problem is well separated into two main classes (exact and inexact), the algorithms for solving it may also be divided into two major families:

• Algorithms for exact graph matching:
Exact graph matching algorithms often use the concept of state-space search, iteratively adding partial matches to an initial solution that respects certain constraints. At some point the algorithm can no longer add pairs of nodes to expand the current matching; in this case there are two possibilities: accept the solution as a partial matching, or use backtracking to find alternative expansions. If all possible mappings that satisfy the constraints have already been tried, the algorithm halts. An interesting property of these algorithms is their flexibility in taking the attributes of both edges and vertices into account in the constraint rules. Different approaches have been proposed, but probably the simplest is depth-first search, which requires less memory than the others and lends itself very well to a recursive formulation; it is also known as branch and bound (a minimal sketch of this approach is given at the end of this section). The first algorithm of this family is due to Ullmann [46] in 1976; Ghahraman [13] in 1980 introduced the netgraph, proposing a method for pruning the states; and a recent work using random walk theory has been introduced in [16]. Nevertheless, exact graph matching techniques are not widely used in real applications: requiring an exact match is too restrictive, since graphs are often generated over real-world data and thus contain errors. Another reason is that the state-space search can take exponential time in the worst case, which can be a severe drawback.

• Algorithms for inexact graph matching: depending on the desired solution, two types of inexact graph-matching algorithms exist:

- Optimal inexact: in this category the resulting match is a globally optimal solution, and the matching is performed with minimum cost. This implies that if an exact solution exists, it will be found by such algorithms; hence they can be seen as a generalization of exact matching algorithms. Optimal inexact algorithms allow the compared graphs to be different, but without improving the computation time: they are usually considerably more expensive than their exact counterparts.

- Approximate or suboptimal: matching algorithms in this class only guarantee finding a local minimum of the matching cost. Usually this minimum is not very far from the global one, but there are no guarantees. The major advantage is that approximate algorithms often run in polynomial time. These methods arise from a different perspective: the underlying idea is to cast graph matching into a continuous, nonlinear optimization problem, where the extensive toolbox of mathematical optimization theory can be used to find a solution. To this class belong the relaxation labeling² method introduced by [11], the Graduated Assignment graph matching algorithm introduced by Gold and Rangarajan in 1996 [14], and the continuous formulation based on the Motzkin-Straus theorem [29], which relates the maximum clique problem to the maximization of a standard quadratic function. This last method needs a brief explanation: given two graphs G1 and G2, the association graph GA is constructed. Since a clique in the association graph GA represents an isomorphism between G1 and G2, finding a maximizer of the SQAP over the association graph means finding the best match between G1 and G2. Pelillo [33] used this approach to match hierarchical structures, like trees, using an evolutionary dynamics approach to solve the SQAP.

²Each node in the two matching graphs has a label; two nodes with the same label represent a possible assignment. During the algorithm's iterations a probability vector over the labels is kept for each node and modified according to the neighboring nodes. At convergence, the labeling with maximum probability is chosen for each node.


1.4 Structure of the thesis

The thesis is organized into five chapters:

• Chapter 1: An introduction to this thesis and to the problems of image segmentation and graph matching

• Chapter 2: The background knowledge useful to understand the proposed approach to the problem

• Chapter 3: Our solution
• Chapter 4: Results
• Chapter 5: Conclusion and future work

Readers skilled in computer vision, statistics and nonlinear optimization techniques may skip the second chapter and start from the third, since all the required theoretical and practical references will be pointed out when needed.


Chapter 2

Background knowledge

In this chapter we recap the background knowledge needed to properly understand this work.

2.1 Image gradient

Figure 2.1: Arrows indicating the maximum change in intensity. The dark area has a higher value than the brighter one, thus the arrows go from the white to the black zone.

The gradient of an image is a two-dimensional vector field in which every vector points in the direction of the maximum rate of change of the image intensity. The components of each vector are the horizontal and vertical derivatives of the image at that point. The first derivative of a generic function tells us its local behavior at each point (ascending, descending); a maximum or minimum of the derivative indicates a point with a strong variation in intensity, thus a probable edge. An example of a function's derivative is shown in figure 2.2. An image is a two-dimensional function, and thus it is necessary to compute the partial derivative along each dimension. The partial derivatives of an image (i.e. formula 2.1) are calculated by convolving (see section 2.3.1) the original image with a differential operator (see sec. 2.1.1).

The image gradient is the first step of almost all algorithms that involve image processing, in particular those that use edge detection methods like Canny's [6].


Figure 2.2: An example of the first derivative applied to a horizontal slice of an image. The derivative is positive or negative on the edges and equal to zero in the flat zones. Image taken from [15].

The gradient of an image is defined as follows:

\[ I_x = \frac{\partial f}{\partial x} \quad \text{partial derivative (gradient) in the } x \text{ direction} \tag{2.1} \]
\[ I_y = \frac{\partial f}{\partial y} \quad \text{partial derivative (gradient) in the } y \text{ direction} \]
\[ \nabla f(x, y) = \frac{\partial f}{\partial x}\,\hat{x} + \frac{\partial f}{\partial y}\,\hat{y} \tag{2.2} \]
\[ \|\nabla f(x, y)\| = \sqrt{I_x^2 + I_y^2} \tag{2.3} \]
\[ \theta(I_x, I_y) = \operatorname{atan2}(I_y, I_x) \tag{2.4} \]

where ||∇f(x, y)|| is the magnitude of the gradient and represents the amount of variation in intensity, while θ corresponds to the angle formed by the vector (Ix, Iy) with the x-axis. The atan2(Iy, Ix) operator is defined on the interval (−π, π] (or (0, 2π] if we add π) and is used instead of atan(Iy/Ix) because its definition allows determining the direction of rotation, and thus it is more suitable for graphics applications. In terms of the atan function it can be expressed as follows:

\[
\operatorname{atan2}(y, x) =
\begin{cases}
\arctan\frac{y}{x} & x > 0 \\
\arctan\frac{y}{x} + \pi & y \geq 0,\ x < 0 \\
\arctan\frac{y}{x} - \pi & y < 0,\ x < 0 \\
+\frac{\pi}{2} & y > 0,\ x = 0 \\
-\frac{\pi}{2} & y < 0,\ x = 0 \\
\text{undefined} & y = 0,\ x = 0
\end{cases}
\]

The angle θ is positive if the rotation is counterclockwise (the y coordinate is in the upper half-plane) and negative if it is clockwise (the y coordinate is in the lower half-plane). This information is not provided by the usual atan, which is not able to discern between two opposite angles.
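Formulas 2.1-2.4 translate directly into a few lines of numpy, shown below as a minimal sketch; np.arctan2 is exactly the atan2 operator of eq. 2.4.

```python
import numpy as np

def image_gradient(img):
    """Gradient magnitude (eq. 2.3) and orientation (eq. 2.4) of a
    2D gray-scale image, using central differences (cf. eq. 2.5)."""
    img = img.astype(float)
    Iy, Ix = np.gradient(img)       # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(Ix, Iy)    # sqrt(Ix^2 + Iy^2)
    theta = np.arctan2(Iy, Ix)      # angle of (Ix, Iy), in [-pi, pi]
    return magnitude, theta
```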

2.1.1 Differential filter

The differential filter comes from the classical mathematical formulation of the derivative. Either the symmetric or the asymmetric difference can be used to construct a differential filter:

\[ I_x(x, y) = \frac{\partial I}{\partial x}(x, y) = \lim_{h \to 0} \frac{I(x+h, y) - I(x-h, y)}{2h} \quad \text{(symmetric)} \]
\[ I_x(x, y) = \frac{\partial I}{\partial x}(x, y) = \lim_{h \to 0} \frac{I(x+h, y) - I(x, y)}{h} \quad \text{(asymmetric)} \]

Because we are dealing with discrete points, the minimum value that h can assume is 1, so the formulation becomes

\[ I_x[x, y] = \frac{1}{2}\left(I[x+1, y] - I[x-1, y]\right) \tag{2.5} \]
\[ I_x[x, y] = I[x+1, y] - I[x, y] \tag{2.6} \]

The same idea is used to construct the y differential filter. Let us now see how to construct the kernel of the filter, taking into account the definition of the convolution operator.

We know that the convolution is nothing more than a weighted sum of point values. In order to calculate this sum we need a weighting mask to be moved around the image (see eqn. 2.10). Consider eq. 2.5: the window will be of size 1 × 3, because the filter computes the derivative from the preceding (x − 1), the succeeding (x + 1) and the current (x) point, [x − 1, x, x + 1]. Now consider the multiplicative constants defined in 2.5: the preceding point is subtracted (−I[x − 1, y]), the succeeding one is added (+I[x + 1, y]), the current one is absent (0), and every term is multiplied by 1/2. Thus the final differential kernel is:

\[ W_{\partial x} = \frac{1}{2} \begin{bmatrix} -1 & 0 & +1 \end{bmatrix} \tag{2.7} \]

The same procedure is used for the asymmetric version (eqn. 2.6). To generate the y differential filter, the only difference is the window size, which will be 3 × 1 because the filter acts in the y direction.
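A quick check of eqs. 2.5 and 2.7, assuming SciPy is available; correlate applies the mask exactly as written (a true convolution would flip the kernel, which for this antisymmetric kernel only changes the sign).

```python
import numpy as np
from scipy.ndimage import correlate

W_dx = 0.5 * np.array([[-1.0, 0.0, +1.0]])    # eq. 2.7, a 1x3 mask

def diff_x(img):
    """I_x[x, y] = (I[x+1, y] - I[x-1, y]) / 2, as in eq. 2.5."""
    return correlate(img.astype(float), W_dx, mode="nearest")
```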


Figure 2.3: a) Original image b) result of Sobel filtering

2.1.2 Sobel filter

The Sobel filter is a differential filter obtained by combining a symmetric differential filter (eqn. 2.7) with a Gaussian filter¹. It produces the derivative in the x or y direction and, at the same time, thanks to the Gaussian filter it does not emphasize noise, since it smooths the image; noise amplification is a common problem when dealing with derivatives. It is defined as follows:

\[ W_G = \frac{1}{4} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \ \text{the Gaussian filter} \qquad W_{\partial x} = \frac{1}{2} \begin{bmatrix} -1 & 0 & +1 \end{bmatrix} \ \text{the differential filter} \]
\[ W_{Sobel\,x} = W_G * W_{\partial x} = \frac{1}{8} \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} \tag{2.8} \]

The same idea is applied to the y differential filter:

\[ W_G = \frac{1}{4} \begin{bmatrix} 1 & 2 & 1 \end{bmatrix} \ \text{the Gaussian filter} \qquad W_{\partial y} = \frac{1}{2} \begin{bmatrix} -1 \\ 0 \\ +1 \end{bmatrix} \ \text{the differential filter} \]
\[ W_{Sobel\,y} = W_G * W_{\partial y} = \frac{1}{8} \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} \tag{2.9} \]

The result of the convolution is shown in fig. 2.3: as we can see, the edges of the image are strongly emphasized.

¹A Gaussian filter is a discrete approximation of a Gaussian function; in computer vision its 2D version is used to remove noise by smoothing the image. For this reason it falls into the family of smoothing filters.
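The separable construction of eqs. 2.8-2.9 is a one-line outer product in numpy; in practice one would call a library routine (e.g. OpenCV's cv2.Sobel), but building the mask explicitly shows where the 1/8 factor comes from.

```python
import numpy as np

w_gauss = np.array([1.0, 2.0, 1.0]) / 4.0    # smoothing (Gaussian) filter
w_diff  = np.array([-1.0, 0.0, +1.0]) / 2.0  # symmetric differential filter

# Smooth along y, differentiate along x (eq. 2.8):
W_sobel_x = np.outer(w_gauss, w_diff)
# -> 1/8 * [[-1, 0, +1],
#           [-2, 0, +2],
#           [-1, 0, +1]]

# Smooth along x, differentiate along y (eq. 2.9):
W_sobel_y = W_sobel_x.T
```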


Figure 2.4: a) Image of an electrophoresis b) the effect of oversegmentation on an image. Image taken from [15]

2.2 Watershed segmentation

The watershed transform is a traditional segmentation technique that falls into the category of region-growing algorithms. It comes from gray-scale mathematical morphology and was introduced by Meyer and Beucher in 1990 [27] and extended in 1992 [26]. In the last twenty years their work has been expanded and has become a fundamental tool in image analysis. In this thesis we use the original formulation [26], because it best suits our needs and is already implemented in the OpenCV library.

The intuition underlying how the watershed algorithm works is provided by the classical topographic analogy for gray-scale images: the image is considered as a topographic relief in which the numerical value of each pixel determines the elevation of the corresponding point. The idea is to bore a set of holes, one at each minimum of the relief (or at user-defined positions called markers), and to constantly pump water through these holes into the topographic representation. The water, entering through the holes, fills up the various catchment basins. In order to avoid the confluence of floods coming from different minima (markers), dams are built along the lines where the floods would merge. At convergence only the dams emerge, separating the various catchment basins, i.e. segmenting the image.

2.2.1 Flooding from minima or markers

The choice of flooding from minima or from markers leads to different segmentation results, each with pros and cons:


Flood from Minima

The approach using the regional minima of the image is quite simple: the starting set of flooding points is the set of points whose neighbors are all higher than the point itself, i.e. the local minima. Each point of the set is marked with a different label. As one can imagine, in a real picture there are a lot of local minima due to noise, and this leads to a common drawback called oversegmentation of the picture (see fig. 2.4). A proposed solution involves using the gradient of the image instead of the image itself, the main reason being that the gradient has a flat response in homogeneous zones and a stronger response at the edges.

A positive aspect of this approach is that it does not require user interaction to segment an image.

Algorithm 2.1 Extraction of local minima
M ← ∅, the list of local-minimum points
I ← the gray-scale image or its gradient
for all p ∈ I do
    if ∀n ∈ neighbourhood[p], p ≤ n then
        M.Add(p)    ▷ p is a local minimum
    end if
end for

Flood from Marker

The introduction of markers into the pipeline of the algorithm overcomes the oversegmentation problem stated in the previous section: the user decides the positions of the flooding sources by placing markers. A drawback of this approach is that user interaction is needed to set the markers, but being able to select appropriate marker positions leads to a more precise and definite segmentation.

2.2.2 The Ordered Queue

Order relations of flooding

If we observe what happens during a flooding, we notice that a dual order relation exists between the points involved:

• Vertical order: a point z is flooded before a point y if and only if y is situated higher than z on the relief. From this it follows that, before a plateau begins to be flooded, all its neighboring points with lower altitude have already been flooded.

• Horizontal order: the flood progresses inward into a plateau at uniform speed, as the water enters at constant speed. This means that the first set of neighbors of an already flooded point (or of a water source point) is flooded first, the second neighbors on the second iteration, and so on.

Figure 2.5: An ordered queue in action: a) the ordered queue b) extraction of an element with highest priority c) reduction of the number of simple FIFO queues in the set d) treatment of a highest-priority element arriving in the queue. Image taken from [2]

To be consistent, an algorithm which simulates the flooding has to reproduce this dual order relation. This is obtained quite naturally by using an ordered queue.

The ordered queue structure is nothing more than a series of simple FIFO (First In, First Out) queues. Each queue is assigned a priority level (in terms of gray levels of an image: darkest → highest, brightest → lowest). At any time it is possible to add a new element to any of the queues, but elements can be retrieved only from the highest-priority queue, extracting from the bottom of that queue. If the extraction fails, i.e. the highest-priority queue is empty, the queue is suppressed and the queue with the priority immediately below is used for extraction; in this way the current queue is at any time the one with highest priority. If an element arrives that should be inserted into a suppressed queue, it is put at the end of the current queue, so it will be among the next elements to be retrieved. By doing this we respect the so-called order relation of flooding. An ordered queue implemented in this manner allows storing points in any order and retrieving them in flooding order. For this reason, this structure is at the base of an elegant, optimal implementation of the watershed line. For an example see figure 2.5.
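A compact way to obtain this behavior with Python's standard library: a single heap keyed on (priority, arrival counter) emulates the set of FIFO queues, and clamping the priority of late arrivals to the current level reproduces the suppressed-queue rule. This is a hedged sketch, not the implementation used by OpenCV.

```python
import heapq
from itertools import count

class OrderedQueue:
    """A set of FIFO queues indexed by priority (lower value = served
    first), emulated with one heap keyed on (priority, arrival order)."""

    def __init__(self):
        self._heap = []
        self._arrivals = count()   # global counter keeps FIFO order per level
        self._current = None       # priority level currently being served

    def push(self, priority, item):
        # An item addressed to an already-suppressed (lower) level is
        # re-queued at the end of the current level, as the flooding
        # order relation requires.
        if self._current is not None and priority < self._current:
            priority = self._current
        heapq.heappush(self._heap, (priority, next(self._arrivals), item))

    def pop(self):
        priority, _, item = heapq.heappop(self._heap)
        self._current = priority   # levels below this one are suppressed
        return item

    def __len__(self):
        return len(self._heap)
```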


Algorithm 2.2 Watershed algorithm [26]
M ← the set of marked points, labeled [1...n]
L ← the number of gray levels in the image
∀l ∈ L, Q[l] ← ∅, an L-level empty ordered queue
-- Initialization --
for all m ∈ M do
    for all p ∈ Neighbour[m] do
        l ← the gray level of the point p
        Q[l] ← p
    end for
end for
-- Growing phase --
for all p ∈ Q do
    if ∀k ∈ Neighborhood[p], Label[k] = i OR k has no label assigned then
        Label[p] = i
        put all the unlabeled neighbors of p into Q
    else    ▷ point p lies between two or more different regions
        Label[p] = −1    ▷ p is part of the frontier
    end if
    remove p from Q
end for
W = {p ∈ Image s.t. Label[p] = −1}    ▷ W is composed only of the frontier points
return W


2.2.3 Algorithm

Once the seeding technique is decided, we produce the set of markers. Each marker is identified by a unique label, and each region keeps the label of the marker that was the source of its flood. A marker may consist of several connected components; the important thing is that all these connected components share the same label if we want them to belong to the same final region.

Algorithm 2.2 describes the watershed procedure in pseudocode.
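Since the thesis relies on the OpenCV implementation, a minimal usage sketch may be helpful. Seeding from a regular grid mirrors fig. 3.2; the file name and the grid size are assumptions for illustration.

```python
import cv2
import numpy as np

img = cv2.imread("input.png")          # 8-bit BGR image (placeholder name)
h, w = img.shape[:2]

# Markers: one single-pixel seed per region, each with a unique
# positive id; 0 means "unknown, to be flooded".
markers = np.zeros((h, w), dtype=np.int32)
seed_id = 1
for r in np.linspace(h / 10, h - h / 10, 5, dtype=int):
    for c in np.linspace(w / 10, w - w / 10, 5, dtype=int):
        markers[r, c] = seed_id
        seed_id += 1

cv2.watershed(img, markers)            # floods in place
# markers[y, x] now holds a region id, or -1 on the watershed lines.
```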

2.3 Scale space

Scale-space theory, mainly developed by the signal processing and image processing communities, is a widely used framework for multi-scale signal representation; its main objective is to understand how image structures change across scales. Its background motivations come from physics and from studies of biological vision, and it is sustained by a formal mathematical theory developed and consolidated over the last twenty years. The theory was introduced by Koenderink's first seminal paper [19], even though the initial spark was Witkin's article [48] on one-dimensional signal processing, together with some earlier Japanese works. Koenderink's theory, however, opened the way to a prolific research activity and a very active production of image processing methods.

Informally, the underlying idea is that the finer details in an image disappear once we move away from it and appear again once we get closer. A classical example are the leaves of a tree: if we stand close in front of a tree we can see the leaves clearly, with all their veins and structure; if we move away we cannot see the leaves anymore, but we see the whole tree; if we take a further step back, the forest where the tree grows appears, and so on. It is meaningless to discuss the concept of a tree at the centimeter or kilometer level: at those scales it is much more relevant to talk about the veins that form the leaves, or about the forest made of trees, respectively. Another important observation is that the same object can appear bigger or smaller depending on the point of view. The fact that objects in the world appear in different ways depending on the scale at which we observe them has important and obvious implications if our aim is to describe them [21]. This is to say that the notion of scale is of utmost importance when designing methods that analyze measurements coming from the real world. The first problem to face in this context is knowing at which scale (e.g. centimeters or kilometers) we have to work to achieve a sensible result. In some specific applications this information is provided but often, especially for applications that treat generic data, no additional information of this type is given. From this point of view the only reasonable approach is to represent the input data at multiple scales. This means that the original signal (image) should be embedded into a one-parameter family of derived images, in which at the finest scale we find the original image with as many details as possible and, by increasing the scale (moving away from the object), the resulting images become coarser.

The following properties are known as the scale-space axioms and form the foundation of scale-space theory. A construction method for a scale space should:

• not introduce artifacts at coarser levels
• be scale invariant
• be shift invariant
• be rotation invariant
• not enhance local extrema
• not create new local extrema

The problem of finding a way to generate a scale space that respects all those properties has been solved by many authors [21]; the interesting point is that almost everyone converged to the same solution starting from different premises: the best method to generate a scale space is to convolve the original signal (or image) with a Gaussian kernel and its derivatives, increasing the aperture of the Gaussian with the scale level. This formulation became famous as the canonical one and is still widely used.

2.3.1 Gaussian scale space

For an image f(x, y), f : R² → R, its Gaussian scale-space representation is a family of derived signals L(x, y; t) defined by the convolution of f(x, y) with a Gaussian kernel g(x, y; t):

\[ t = \sigma^2 \quad \text{the variance of the Gaussian} \]
\[ g(x, y; t) = \frac{1}{2\pi t}\, e^{-\frac{x^2 + y^2}{2t}} \]
\[ L(\cdot, \cdot; t) = g(\cdot, \cdot; t) * f(\cdot, \cdot) \]

When t = 0 the filter g becomes an impulse function, so the convolution does not affect the values of the image (L(x, y; 0) = f(x, y)): the scale-space representation at scale level t = 0 is the image f(x, y) itself. As t increases, the Gaussian filter becomes wider and L results in a heavier smoothing of f, causing the loss of details; see figure 2.6.
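A minimal construction of the family L(·, ·; t), assuming OpenCV's Gaussian blur as the kernel g(·, ·; t) with σ = √t; the set of scale levels is an arbitrary illustrative choice.

```python
import cv2
import numpy as np

def gaussian_scale_space(img, t_levels=(1, 4, 16, 64)):
    """Return {t: L(.,.;t)} with L(.,.;t) = g(.,.;t) * f and sigma = sqrt(t).
    Larger t gives heavier smoothing, i.e. a coarser scale."""
    f = img.astype(np.float64)
    levels = {0: f}                          # t = 0: the image itself
    for t in t_levels:
        sigma = float(np.sqrt(t))
        # ksize = (0, 0) lets OpenCV derive the kernel size from sigma.
        levels[t] = cv2.GaussianBlur(f, (0, 0), sigma)
    return levels
```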

Convolution

The convolution is a mathematical operator that takes two integrable functions f, g : R → R and produces a third function by combining them. The operator is defined as follows:

\[ (f * g)(t) := \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau = \int_{-\infty}^{\infty} f(t - \tau)\, g(\tau)\, d\tau \]


Figure 2.6: A scale-space example of an MRI image. By increasing the scale parameter t, the details become fewer.

For discrete functions, like images, we need the discrete version (2.10), which computes a weighted sum of the points of the image by centering a window (kernel) on each point and moving it along the image. Each point is evaluated as a weighted sum of the point values around it, where the weight of each point is specified by the window (kernel) W:

\[ (W * I)(x, y) = \sum_{s=-a}^{a} \sum_{t=-b}^{b} W[s, t]\, I[x - s, y - t] \tag{2.10} \]

Convolution is an important tool: it is one of the basic operators of image processing, underlies all linear filters, and is widely applied in every field that involves signal processing.
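Eq. 2.10 written out as plain loops, with zero padding outside the image; this direct transcription is slow, so real code would use a library (or FFT-based) routine instead.

```python
import numpy as np

def convolve2d_naive(W, I):
    """(W * I)(x, y) = sum_{s,t} W[s, t] * I[x - s, y - t]  (eq. 2.10),
    for a kernel W of odd size (2a+1) x (2b+1)."""
    a, b = W.shape[0] // 2, W.shape[1] // 2
    rows, cols = I.shape
    out = np.zeros((rows, cols), dtype=float)
    for x in range(rows):
        for y in range(cols):
            acc = 0.0
            for s in range(-a, a + 1):
                for t in range(-b, b + 1):
                    xs, yt = x - s, y - t
                    if 0 <= xs < rows and 0 <= yt < cols:  # zero padding
                        acc += W[s + a, t + b] * I[xs, yt]
            out[x, y] = acc
    return out
```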

2.4 Orientation histograms

2.4.1 Bin circular shift ⇒ Gradient rotation

A bin circular shift is a procedure that takes a histogram as input and produces as output the same histogram with the values of the bins shifted by a certain amount. This operation becomes interesting if a transformation that can occur in our data is reflected in the histogram structure; for the histogram of gradient orientations it makes sense.

Assume H is our histogram; the gradient orientation takes values in [−π, π] (as we have seen in section 2.1) and we quantize this interval into nBins bins. This means that each bin represents an angle range of

\[ binSize = \frac{|[-\pi, \pi]|}{nBins} = \frac{2\pi}{nBins} \]

Figure 2.7: The effect of image rotation on the histogram of gradient orientations

What happens if we move the value of the i-th bin to the (i + 1)-th? With this operation we are saying that all the gradient orientations that used to fall into the i-th bin will, from now on, fall into the (i + 1)-th bin; in other words, it is the same as rotating the gradient orientations (counter)clockwise by binSize.

The histogram divides the orientation space from −π to π into bins [0, nBins − 1]. If we shift the bins from left to right we apply a counterclockwise rotation, because we move from a negative angle to a positive one; if we move from right to left we rotate the orientations clockwise. The shift operation is circular because we are moving on a circle, so it is correct that the values at the 0-th and (nBins − 1)-th positions, when rotated (shifted), re-enter the histogram at the opposite end. In figure 2.7 we can see what happens to the gradient orientation histogram if we rotate an image by 90 degrees. This operation is important in our framework because it is a key point in making the algorithm rotation invariant.
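In code the circular shift is a one-liner; shifting by k bins rotates every gradient orientation by k · binSize, which is exactly what the rotation-invariant matching relies on.

```python
import numpy as np

def shift_histogram(hist, k):
    """Circular shift of an orientation histogram: the value of bin i
    moves to bin (i + k) mod nBins, i.e. every orientation is rotated
    by k * binSize = k * 2*pi / len(hist)."""
    return np.roll(hist, k)

# Example: with 8 bins over 2*pi, a 90-degree rotation (as in fig. 2.7)
# corresponds to shift_histogram(h, 2).
```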

2.4.2 Histogram distance functions

A histogram is a data structure used to represent the distribution of certain properties. It is commonly represented as a multidimensional array where each cell is a bin and each bin represents a particular feature or property, e.g. color, intensity, shape, lines, circles etc. The value of each bin is the number of times (frequency) that the represented property appears in the image or, more generally, in our space.

A distance function between histograms is a useful measure to quantify how close two distributions are, and thus how similar or dissimilar is what they represent. These kinds of measures are widely used in many applications, like shape matching [1, 28, 45, 22], audio retrieval [52], image retrieval [51], face recognition [49], texture analysis [54, 9], color analysis and 3D object recognition. Feature descriptors like SIFT and SURF, widely used in computer vision applications, often take the form of histograms that convey the descriptive information. In this section we analyze different types of distance functions with reference to their applications.

2.4.3 Bin-to-Bin dissimilarity

In this category the dissimilarity is computed by taking into account only pairs of bins that have the same index in both histograms, and the complete dissimilarity value is a combination of all the pairwise dissimilarities. Given two histograms H and K with the same number of bins, and with h_i and k_i the values of the i-th bin, we define:

Minkowski distance

\[ d_{L_r}(H, K) = \left( \sum_{i=1}^{n} |h_i - k_i|^r \right)^{\frac{1}{r}} \]

The L1, L2 and L∞ distances were commonly used to measure dissimilarity between color images. The L1 distance was shown [41] to lead to many false negatives because neighboring bins are not considered; the L2 form (Euclidean distance), however, is commonly used as a metric for comparing feature descriptors.

Kullback-Leibler divergence

This measure was introduced in [20]; its first application was in the field of information theory, to measure the difference between two probability distributions H and K:

\[ d_{KL}(H, K) = \sum_i h_i \log \frac{h_i}{k_i} \]

It is a non-symmetric measure (d_KL(H, K) ≠ d_KL(K, H)) and thus not a true metric. In our setting its meaning becomes how inefficient, on average, it would be to code one histogram using the other as a codebook.

χ² statistics

The χ² histogram distance comes from the χ² test statistic, where it is used to test the fit between a distribution and observed frequencies. In computer vision it has been applied to object and texture classification, local descriptor matching, boundary detection and shape classification:

\[ \chi^2(H, K) = \frac{1}{2} \sum_i \frac{(h_i - k_i)^2}{h_i + k_i} \]

The main advantage of χ² is that it reduces the effect of the differences caused by bins with large values against bins with small values, leading to a smoother distance measure.

Histogram intersection distance

This measure was proposed for color image retrieval in [42]. The intersection of histograms H and K is given by:

\[ d_{\cap}(H, K) = \frac{\sum_i \min\{h_i, k_i\}}{\min\{|H|, |K|\}} \]

where |H| and |K| are the numbers of samples in the two histograms. Colors which are not present in the user's query image do not contribute to the intersection distance; furthermore, this measure reduces the contribution of background colors. The sum is normalized by the histogram with the fewest samples.

Generally speaking, bin-to-bin techniques suffer from the so-called quantization effect, because the quality of the measure depends on the size of the bins: a coarse binning leads to poor discriminative power while, on the other side, a too fine bin size assigns similar features to different bins, leading to probable mismatches during the computation of the dissimilarity value. Another drawback is that they do not take into account the relationships between bins, and thus perceptual similarity.
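The four bin-to-bin measures above in a few lines of numpy; the small epsilon guarding logarithms and divisions is an implementation detail, not part of the definitions.

```python
import numpy as np

EPS = 1e-12          # avoids log(0) and division by zero

def minkowski(h, k, r=2):
    return float((np.abs(h - k) ** r).sum() ** (1.0 / r))

def kullback_leibler(h, k):
    # Non-symmetric: d(h, k) != d(k, h) in general.
    return float(np.sum(h * np.log((h + EPS) / (k + EPS))))

def chi2(h, k):
    return float(0.5 * np.sum((h - k) ** 2 / (h + k + EPS)))

def intersection(h, k):
    return float(np.minimum(h, k).sum() / min(h.sum(), k.sum()))
```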

2.4.4 Cross-Bin dissimilarity

Cross-bin dissimilarity measures are an important family of dissimilarity functions that do not suffer from the problems mentioned in the bin-to-bin case. They produce a final quantity based also on the relationships between bins, so they are able to compute dissimilarities much closer to human perception, and are thus highly suitable for computer vision applications. To compute a measure of this type, a perceptual dissimilarity function or a dis/similarity matrix over the set of bins must be provided, and its role is of crucial importance for the final result. In case the quantization side effect occurs, the cross term contributes to the fitness of the perceptual distance between bins.

Quadratic-form distance

This distance was first introduced in [30] and was successfully applied to color-based image retrieval applications:

\[ d_A(H, K) = \sqrt{(h - k)^T A\, (h - k)} \]

where A is the similarity matrix that represents the cross-bin similarity: given two bins (i, j), the entry a_ij ∈ A represents the similarity between the i-th and the j-th bins. For histograms representing RGB images, [30] suggests the following weighting rule, where i and j index histogram bins representing RGB colors:

\[ a_{ij} = 1 - \frac{d_{ij}}{\max_{ij} d_{ij}}, \qquad d_{ij} = \sqrt{(R_i - R_j)^2 + (G_i - G_j)^2 + (B_i - B_j)^2} \]

Quadratic-Chi family

Presented in [32], its main idea is to combine the quadratic form and the χ² statistics. The result is a measure that takes into account the cross-bin relationships (from the quadratic form) and reduces the side effect produced by large bin values (from the χ² statistics). The formulation is as follows:

\[ QC^m_A(H, K) = \sqrt{ \sum_{ij} \left( \frac{H_i - K_i}{\left( \sum_c (H_c + K_c) A_{ci} \right)^m} \right) \left( \frac{H_j - K_j}{\left( \sum_c (H_c + K_c) A_{cj} \right)^m} \right) A_{ij} } \]

where A is the bin-similarity matrix, which is non-negative and symmetric with each diagonal element bigger than or equal to every other element in its row (∀i, j : A_ii ≥ A_ij), and m ∈ [0, 1) is the normalization factor.

This measure enjoys some interesting properties: Similarity-Matrix-Quantization-Invariance, which ensures that an erroneous quantization of the histogram does not affect the distance, and Sparseness-Invariance, which ensures that the distance between sparse histograms equals the distance between full histograms.
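The quadratic-form distance and the RGB weighting rule above, sketched in numpy; the (n, 3) array of bin colors is an assumed input.

```python
import numpy as np

def rgb_similarity_matrix(colors):
    """colors: (n, 3) array with the RGB value represented by each bin.
    Returns A with a_ij = 1 - d_ij / max d_ij (the rule from [30])."""
    d = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=-1)
    return 1.0 - d / d.max()

def quadratic_form_distance(h, k, A):
    """d_A(H, K) = sqrt((h - k)^T A (h - k))."""
    diff = h - k
    return float(np.sqrt(diff @ A @ diff))
```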

Earth Mover Distance

The Earth Mover Distance (EMD) is a distance measure between two discrete probability distributions. It is commonly used due to its capability of emphasizing perceptual similarity better than the previously proposed methods; since its introduction in 1998 [35] it has become a standard distance measure between probability distributions (histograms). Many applications in the image retrieval field have been built on this method [36, 34, 38, 23, 51], but not only: we also find applications in audio retrieval [52], pattern discovery [43], face recognition [49] and mineralogy [17]. We start by defining the concept of signature.

Given two probability distributions, or histograms, a pair of signatures is

\[ x = \{(x_1, w_1), (x_2, w_2), \ldots, (x_m, w_m)\} \]
\[ y = \{(y_1, u_1), (y_2, u_2), \ldots, (y_n, u_n)\} \]

where w_i, with i = 1...m, corresponds to the amount of mass (weight) at position x_i ∈ R^k in the distribution x, and u_j, with j = 1...n, is the weight at position y_j in the distribution y. Roughly speaking, w_i represents the value of the histogram at the i-th bin and x_i represents the position of that bin in space.

If we think of the two distributions as two piles of material, we can say that the EMD is proportional to the work needed to turn the distribution x into the distribution y. This transformation is done by transporting material (mass) from locations of x to locations of y until the pile x is rearranged to look exactly like y. Computing the EMD is thus based on the solution of the well-known transportation problem: several suppliers (the first distribution x), each with an amount of goods, are required to supply several customers (the second distribution y), each with a limited storage capacity; for each supplier-consumer pair the cost of moving one unit of goods (the ground distance) is given, and we want to find the least expensive flow of goods from suppliers to consumers that satisfies the consumers' demand.

It can be cast into a linear programming problem and thus solved by applying the simplex algorithm [10]. Let x, y be two signatures as defined above and D = [d_ij] a ground distance matrix, where d_ij is the ground distance between x_i and y_j. We want to find a flow F = [f_ij] that minimizes the overall cost

\[ WORK(x, y, F) = \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij} \]

subject to:

\[ f_{ij} \geq 0, \quad i = 1 \ldots m,\ j = 1 \ldots n \tag{2.11} \]
\[ \sum_{j=1}^{n} f_{ij} \leq w_i, \quad i = 1 \ldots m \tag{2.12} \]
\[ \sum_{i=1}^{m} f_{ij} \leq u_j, \quad j = 1 \ldots n \tag{2.13} \]
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\left( \sum_{i=1}^{m} w_i,\ \sum_{j=1}^{n} u_j \right) \tag{2.14} \]

Constraint 2.11 forces supplies to move from x to y and not vice versa; constraint 2.12 limits the flow leaving each element of x to its weight; constraint 2.13 limits the flow received by each element of y; and constraint 2.14 requires moving the maximum possible amount of supplies.

Once the transportation problem is solved, we have found the optimal flow F, and the Earth Mover Distance is defined as the work normalized by the total flow:

\[ EMD(x, y) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}} \tag{2.15} \]

This measure [36] offers the following advantages:

• It can be applied to the more general variable-size signatures. Signatures are more compact, and the cost of moving "earth" reflects the notion of closeness, avoiding the quantization problem typical of the non-quadratic measures.
• It allows partial matches.
• It is a true metric if the ground distance is a metric and the total weights of the two signatures are equal. This is very important because it allows us to create image spaces with a metric structure.
• It matches perceptual similarity better if the ground distance is perceptually meaningful.

The algorithm for solving the EMD is in the class of O(n³) [36], but in the literature we can find many approximations of it [38, 23] that make it computationally interesting. Fortunately, in the case of one-dimensional signatures (like normalized histograms) there exists an efficient linear-time algorithm based on the cumulative distribution functions (CDFs) of the distributions [37]. Let x = (X; w) ∈ D_{1,m} and y = (Y; u) ∈ D_{1,n} be distributions on the real line (signatures), where the points in X and Y are sorted by position:

\[ x_1 < x_2 < \cdots < x_m \quad \wedge \quad y_1 < y_2 < \cdots < y_n \]

The cumulative distribution functions (CDFs) of x and y are defined as

\[ W_t = \begin{cases} 0 & t \in (-\infty, x_1) \\ \sum_{i=1}^{k} w_i & t \in [x_k, x_{k+1}),\ 1 \leq k \leq m - 1 \\ w_{\Sigma} = \sum_{i=1}^{m} w_i & t \in [x_m, \infty) \end{cases} \tag{2.16} \]

\[ U_t = \begin{cases} 0 & t \in (-\infty, y_1) \\ \sum_{i=1}^{k} u_i & t \in [y_k, y_{k+1}),\ 1 \leq k \leq n - 1 \\ u_{\Sigma} = \sum_{i=1}^{n} u_i & t \in [y_n, \infty) \end{cases} \tag{2.17} \]

If x and y have equal total weight (as for probability distributions or normalized histograms), then the work needed to transform one distribution into the other is the area between the graphs of the cumulative distribution functions of x and y. As said for eq. 2.15, the EMD is the work normalized by the minimum total weight of the two distributions; for two probability distributions the minimum total weight is 1, thus the EMD is exactly the work.
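For equal-weight 1D histograms defined over the same, unit-spaced bin positions, the CDF rule above reduces the EMD to a few lines:

```python
import numpy as np

def emd_1d(h, k):
    """EMD between two 1D histograms with the same, unit-spaced bin
    positions and equal total weight: the area between the two
    cumulative distribution functions, normalized by the total weight."""
    h = np.asarray(h, dtype=float)
    k = np.asarray(k, dtype=float)
    assert np.isclose(h.sum(), k.sum()), "signatures must have equal weight"
    work = np.abs(np.cumsum(h) - np.cumsum(k)).sum()  # area between CDFs
    return work / h.sum()
```

On the example of fig. 2.8 this computes 31/13 ≈ 2.38.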


Figure 2.8: The two cumulative distribution functions for x and y. The work needed to transform x into y is exactly the area between the two functions; the arrows represent the flow from x to y matching their weights. In this picture the total area is 10 + 8 + 5 + 6 + 1 + 1 = 31 and the total weight of the two distributions is 13, thus EMD(x, y) = 31/13 = 2.38. Image taken from [37]


2.5 Optimization

Optimization is a field of mathematics whose aim is maximizing or minimizing a certain function, i.e. finding its local or global extrema. Often the objective function is subject to a set of constraints; in that case we fall into the constrained-problem category. If the function or the constraints contain nonlinear terms, we say that the optimization is nonlinear. The famous Graduated Assignment method [14] uses nonlinear optimization techniques to approximate the solution of a graph matching problem. In our work we face a problem similar to [14]: our aim is to find a mapping between two sets of macropixels, which is the same as finding a mapping between two sets of vertices in a graph matching problem. We will see that our problem is the maximization of a nonlinear function subject to a linear set of constraints. A famous method to achieve this objective is the projected gradient, a classical method for nonlinear constrained optimization.

2.5.1 Sinkhorn projection

The Sinkhorn projection [40] is a well-known method to construct a doubly-stochastic matrix from a positive matrix. A doubly-stochastic matrix A is an n × n matrix that respects the following constraints:

\[ \sum_i a_{ij} = 1 \quad \forall j \in \{1 \ldots n\}, \qquad \sum_j a_{ij} = 1 \quad \forall i \in \{1 \ldots n\} \]

The Sinkhorn algorithm iteratively normalizes the rows and the columns of the matrix until the sums of all rows and columns equal 1.

2.5.2 Von Neumann scheme

The Von Neumann scheme [47] is an iterative process used to geometrically find a point in the intersection of two sets. To clarify the idea and port it into our optimization setting, we can think of a set of constraints as a set of geometric entities. Starting from a generic point x0, we project it orthogonally onto the first constraint, obtaining a new point x1; then we project x1 orthogonally onto the second constraint, obtaining x2; then we project x2 orthogonally onto the first constraint again, in an alternating scheme. Indeed, the scheme became famous as the Von Neumann alternating projection scheme. It is proved to converge to a point in the intersection of the two constraint sets if both are linear [47], or if one is linear and the other is conic. Under these conditions an interesting property emerges: the Euclidean distance between the starting point x0 and the arrival point x* is minimal.


Algorithm 2.3 Sinkhorn normalization method
A is the positive starting n × n matrix
B ← A    ▷ working copy of the matrix
repeat
    for i = 1 to n do
        for j = 1 to n do
            b_ij ← b_ij / Σ_j b_ij    ▷ normalize each row
        end for
    end for
    for j = 1 to n do
        for i = 1 to n do
            b_ij ← b_ij / Σ_i b_ij    ▷ normalize each column
        end for
    end for
    ▷ compute the sum of each row and each column
    Rs ← 0_{1×n}; Cs ← 0_{1×n}
    for i = 1 to n do
        for j = 1 to n do
            Rs[i] ← Rs[i] + b_ij
            Cs[j] ← Cs[j] + b_ij
        end for
    end for
until ∀i = 1...n, Rs[i] = 1 ∧ ∀j = 1...n, Cs[j] = 1 (up to a tolerance)
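The same procedure in numpy, iterating on the working copy itself; the tolerance and the iteration cap are implementation choices.

```python
import numpy as np

def sinkhorn(A, tol=1e-9, max_iter=1000):
    """Alternately normalize rows and columns of a positive square
    matrix until it is (numerically) doubly stochastic."""
    B = np.array(A, dtype=float)
    for _ in range(max_iter):
        B /= B.sum(axis=1, keepdims=True)    # every row sums to 1
        B /= B.sum(axis=0, keepdims=True)    # every column sums to 1
        row_err = np.abs(B.sum(axis=1) - 1.0).max()
        col_err = np.abs(B.sum(axis=0) - 1.0).max()
        if max(row_err, col_err) < tol:
            break
    return B
```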

Figure 2.9: The iterative Von Neumann scheme. In blue we have the non-negative constraint, in green the constraint with sum equal to 1; x0 is the starting point of the iterations.


2.5.3 Projected gradient

The projected gradient is an iterative method in the class of gradient ascent/descent methods. Since the gradient descent method handles unconstrained problems, the projected gradient extends its capability in order to satisfy the required constraints.

The idea is to project the gradient of the function at a point onto the constraint subspace in order to define a direction of movement that remains inside the feasibility zone3. Once the direction of increase/decrease is found (maximizing/minimizing the function), we use this direction to make a step in the iterative process.

In this section we give some hints on the method; for a more formal definition see [8] (ch. 12.4, p. 367). Consider a problem of the form:

max f(x)
subject to:  a_i^T x ≤ b_i,  i ∈ I1
             a_i^T x = b_i,  i ∈ I2        (2.18)

where f is a generic function, the constraints are a set of linear systems, and I1, I2 are the index sets of the inequality and equality constraints respectively. An

iteration of the projected gradient scheme is as follows. Given x(0) an initial feasible solution,

x(t+1) = ConstraintProjection( x(t) + α · GrowthDirection(∇f) )        (2.19)

where GrowthDirection(…) is a method used to find the direction of increase (or decrease, if we deal with a minimization) that respects the constraints, and ConstraintProjection(…) is used to project the step into the space of the constraints, ensuring that the found solution remains feasible.

GrowthDirection

The method takes as input the gradient of the function4. Since we want to achieve an increase of the objective function we are interested in finding a direction d along the gradient ∇f such that ∇f^T d > 0 and, at the same time, we would like the point x(t) to remain feasible. This means that if we find a growing direction with zero sum5 we are on the right track. Thus we consider directions such that a_i^T d = 0, i ∈ W(x).

3 The feasibility zone is the region in which every point respects the problem's constraints.
4 The gradient of a function is the vector of all its partial derivatives, ∇f = (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn).
5 Being at zero sum means that we add and subtract values from the variables of the system, but the overall change is zero; this allows us not to break the constraints, remaining feasible in any case.


Figure 2.10: The picture shows the GrowthDirection phase for a point in a system with two variables (x1 and x2) and one constraint (x1 + x2 = 1). The gradient of the point is projected onto the tangent subspace of the constraint. The tangent space (in red) is orthogonal to the gradient of the constraint (in green, the arrow pointing to (0, 0)). The fact that we want to achieve a zero sum leads to the tangent subspace being the subspace with x1 + x2 = 0.

The condition a_i^T d = 0, i ∈ W(x), means that the vector d must lie in the tangent subspace M formed by the active constraints (the working set W(x)). In order to satisfy the zero equality a_i^T d = 0 we have to project the gradient orthogonally onto the tangent subspace, since the dot product then becomes equal to zero. The tangent space is the orthogonal space of the constraint's gradient (see fig. 2.10). Algorithmically, the projection onto the tangent subspace of fig. 2.10 is performed as in Algorithm 2.4.

ConstraintProjection

Once we have the feasible direction d we make the increasing (decreasing) step in order to increase (decrease) the function's value. The step is simply:

x(t+1) = x(t) + α · d

where α is a positive constant. Since this operation could lead to a non-feasible point, we orthogonally project the new point onto the space of the constraints (see fig. 2.11) by using the common orthogonal projection, as in Algorithm 2.5.
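On the two-variable example of fig. 2.10 both phases reduce to elementary operations; the following C# sketch (a toy illustration under the single constraint Σ x_i = 1, not the general algorithm of [44]) performs one iteration.

using System;

static class ProjectedGradientStep
{
    // One projected gradient ascent iteration under sum(x) = 1.
    public static double[] Step(double[] x, double[] grad, double alpha)
    {
        int n = x.Length;

        // GrowthDirection: project grad onto the tangent subspace
        // {d : sum(d) = 0} by subtracting its mean (zero-sum direction).
        double mean = 0;
        foreach (double g in grad) mean += g;
        mean /= n;

        // take the step along the feasible direction
        double[] xNew = new double[n];
        for (int i = 0; i < n; i++) xNew[i] = x[i] + alpha * (grad[i] - mean);

        // ConstraintProjection: re-project onto sum(x) = 1 to correct drift
        double sum = 0;
        foreach (double v in xNew) sum += v;
        double shift = (sum - 1.0) / n;
        for (int i = 0; i < n; i++) xNew[i] -= shift;

        return xNew;
    }
}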


Algorithm 2.4 Growing Direction in [44]
b ← Σ_{i=1..m} ∇f[n, i]    ▷ the sum of the slack values in the last column
r ← Σ_{i=1..n} ∇f[i, m]    ▷ the sum of the slack values in the last row
(IX)_i ← Σ_{j=1..n−1} ∇f[j, i], i = 0 … m−1    ▷ the sum of the elements by column, excluding the slacks
(XI)_i ← Σ_{j=1..m−1} ∇f[i, j], i = 0 … n−1    ▷ the sum of the elements by row, excluding the slacks
IXI ← Σ_{i=1..n−1} Σ_{j=1..m−1} ∇f[i, j]    ▷ the sum of the internal values of the matrix
▷ α and β are the quantities of the matrix to be projected along the gradients of the row and column constraints respectively
α_i ← [(XI)_i + ∇f[i, m] − (IXI + (m+1)·b − m·r)/(m+n+1)] · 1/(m+1),    α_{n+1} = 0
β_i ← [(IX)_i + ∇f[n, i] − (IXI + (n+1)·r − n·b)/(m+n+1)] · 1/(n+1),    β_{m+1} = 0
∇f′[i, j] ← ∇f[i, j] − β[j] − α[i]

Figure 2.11: The step x(t) + α · d is calculated and after that it is projected orthogonally onto the constraint space.


Algorithm 2.5 Constraints Projection for fig. 2.11 [44]
X ← X + α · ∇f′    ▷ the step after calculating the feasible gradient direction
b ← Σ_{i=1..m} (X[n, i] − 1)    ▷ the sum of the slack values in the last column
r ← Σ_{i=1..n} (X[i, m] − 1)    ▷ the sum of the slack values in the last row
(IX)_i ← Σ_{j=1..n−1} X[j, i], i = 0 … m−1    ▷ the sum of the elements by column, excluding the slacks
(XI)_i ← Σ_{j=1..m−1} X[i, j], i = 0 … n−1    ▷ the sum of the elements by row, excluding the slacks
IXI ← Σ_{i=1..n−1} Σ_{j=1..m−1} X[i, j]    ▷ the sum of the internal values of the matrix
▷ α and β are the quantities of the matrix to be projected along the gradients of the row and column constraints respectively
α_i ← [(XI)_i + (X[i, m] − 1) − (IXI + (m+1)·b − m·r)/(m+n+1)] · 1/(m+1),    α_{n+1} = 0
β_i ← [(IX)_i + (X[n, i] − 1) − (IXI + (n+1)·r − n·b)/(m+n+1)] · 1/(n+1),    β_{m+1} = 0
X[i, j] ← X[i, j] − β[j] − α[i]    ▷ the projected, feasible point


Chapter 3

The solution

This chapter focuses on the solution, giving pointers to the previous sections in order to both clarify and demonstrate the correctness of the approach followed here.

The corresponding application has been developed using C# and the OpenCV [5] library.

3.1 The main pipeline

The pipeline of the project can be seen in fig. 3.1.

3.1.1 Macropixel descriptor

The first step is extracting the Macropixel descriptors. This part is divided into:

• Construction of a regular grid of N points
• Execution of the watershed algorithm using the points of the grid as the seeds of the algorithm
• Each region of the watershed output becomes a Macropixel
• Calculation of the center of each region as the average of the points that belong to it
• Calculation of the shared border length between two neighboring regions
• Construction of the L-level Gaussian scale-space from the original image
• Computation of the gradient orientation at each level of the scale-space
• Superimposition of the watershed result on each level
• Generation of the histogram of the gradient orientations for each region at each level of the Gaussian scale-space


Figure 3.1: The pipeline of the proposed solution

At the end of this phase, for each image a list of regions described by their histograms of the gradient orientations at different scales is obtained.

Construction of a regular grid

Let I be the image, w, h its width and height, n the number of grid points in the horizontal direction and m the number of grid points in the vertical direction. The grid of n × m equally spaced points is built as in Algorithm 3.1. Once the regular grid has been generated and the watershed algorithm has been applied, the image segmentation (see fig. 3.2) is obtained. Let WS be the watershed result and r a region in WS (a Macropixel); the center of each region is calculated as:

center[r] = ( (Σ_{pt∈r} pt.x) / |r| , (Σ_{pt∈r} pt.y) / |r| )

where |r| is the number of points in the region r.

The shared boundary between two regions is calculated by checking which region ids are separated by each point on the watershed lines.


Algorithm 3.1 Grid construction and watershed segmentation
I ▷ the image
w ▷ image width
h ▷ image height
n ▷ number of horizontal points
m ▷ number of vertical points
id ← 1 ▷ the id of the current region
step_x ← w / n
step_y ← h / m
seed ← 0_{w×h} ▷ the final seeds
for i = step_x / 2 to w do
    for j = step_y / 2 to h do
        seed[i, j] ← id ▷ create the id-th seed
        id ← id + 1
        j ← j + step_y
    end for
    i ← i + step_x
end for
wshed ← Algorithm 2.2(seed)
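A C# sketch of the grid construction above (the watershed call is omitted: in the actual application it is provided by OpenCV):

static class SeedGrid
{
    // Build a w x h seed image with n x m equally spaced seeds,
    // each carrying a unique positive id (0 = no seed).
    public static int[,] Build(int w, int h, int n, int m)
    {
        int[,] seed = new int[w, h];
        int stepX = w / n;
        int stepY = h / m;
        int id = 1;
        for (int i = stepX / 2; i < w; i += stepX)
            for (int j = stepY / 2; j < h; j += stepY)
                seed[i, j] = id++;        // create the id-th seed
        return seed;
    }
}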

Algorithm 3.2 Shared boundary
WS ▷ the watershed
pts ← {pt ∈ WS s.t. pt = −1} ▷ the set of points on the watershed lines
SB ← 0_{|WS|×|WS|} ▷ |WS| = number of regions; SB is a symmetric sparse matrix
for each pt ∈ pts do ▷ for each point on the watershed lines
    for each n ∈ NB[pt] do ▷ NB[pt] is a queue containing the neighboring region ids, without duplicates
        for each m ∈ NB[pt], m ≠ n do
            SB[n, m] ← SB[n, m] + 1
        end for
        NB ← NB − n ▷ remove the id, avoiding multiple counting for that watershed point
    end for
    pts ← pts − pt ▷ remove the point from the watershed
end for




Figure 3.2: a) Initial image. b) Regular 5 × 5 grid; the points are enlarged for clarity. A seed is a single pixel with a unique id. c) Watershed segmentation result. d) The center of each region.


Figure 3.3: In the center there is a watershed point (id = −1) and the region ids in its neighborhood are checked.



Figure 3.4: The Gaussian scale space and the gradient orientation

3.1.2 Gaussian scale space

As explained in sec. 2.3.1, the usage of a scale-space allows the algorithm to be scale invariant. The scale space is built by convolving the original image with a Gaussian kernel. In our approach we used a 10-level scale space, progressively increasing the value of σ in a SIFT [24] flavor, although our approach does not contemplate a down-sampling phase. Given a starting image I, the images in the scale-space are built as follows:

l ∈ {0, …, 9}    the levels
σ_l = √(2^l) / 2
W = 3 · σ_l    the size of the Gaussian window
I_0 = the initial image
I_l = I_{l−1} ∗ G(σ_l, W)

The result is shown in fig. 3.4.
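The schedule of σ_l and of the window size can be sketched in C# as follows; the Gaussian convolution itself is delegated to the caller (blur stands for the library routine, which we leave abstract here).

using System;

static class ScaleSpace
{
    // Build the 10-level Gaussian scale space: level l blurs level l-1
    // with sigma_l = sqrt(2^l) / 2 and window W = 3 * sigma_l.
    public static T[] Build<T>(T image, Func<T, double, int, T> blur, int levels)
    {
        T[] space = new T[levels];
        space[0] = image;                               // I_0 is the initial image
        for (int l = 1; l < levels; l++)
        {
            double sigma = Math.Sqrt(Math.Pow(2, l)) / 2.0;
            int window = (int)Math.Ceiling(3 * sigma); // size of the Gaussian window
            space[l] = blur(space[l - 1], sigma, window);
        }
        return space;
    }
}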

Computing the gradient orientations histogram

Once the regions have been extracted and the scale space has been built, the orientations of the gradient at each level must be extracted. As said, the orientation of the gradient reveals the direction of the maximum change in image intensity. This information will be used to compare different regions.


To extract the x and y derivatives of the images a Sobel filter S_x, S_y (see sec. 2.1.2) is used. Once the two derivatives are computed, the gradient orientation (see sec. 2.1) is a straightforward operation:

for l = 0, …, 9:
∂x = I_l ∗ S_x
∂y = I_l ∗ S_y
θ_l = atan2(∂y, ∂x)
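Given the two Sobel responses, the per-pixel orientation is a direct application of atan2; a sketch, assuming the derivatives are available as plain arrays:

using System;

static class GradientOrientation
{
    // Per-pixel gradient orientation in [-pi, pi] from the Sobel
    // derivatives dx = I * Sx and dy = I * Sy.
    public static double[,] Compute(double[,] dx, double[,] dy)
    {
        int w = dx.GetLength(0), h = dx.GetLength(1);
        double[,] theta = new double[w, h];
        for (int i = 0; i < w; i++)
            for (int j = 0; j < h; j++)
                theta[i, j] = Math.Atan2(dy[i, j], dx[i, j]);
        return theta;
    }
}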

Generating the descriptor

Now all the necessary operations for computing the Macropixel descriptors have been performed.

The Macropixel descriptor for a region r ∈ WS is the union of all the information in one object: I_l, the set of images in the scale space; the center of each region; θ_l, the matrix of the orientations of the gradients; and WS, the segmentation of the image obtained with the watershed algorithm.

Algorithm 3.3 Macropixel Descriptor
for r ∈ WS do
    mp_r ← null
    mp_r.center ← r.center
    for l ∈ {0, …, level} do
        mp_r.H_l ← hist(r.θ_l) ▷ fill the histogram at level l
    end for
end for

3.1.3 Feasible Assignments

From now on we will start thinking in terms of assignments between macropixels in two images. The first problem is the dimensionality of the approach. If we have two images with n macropixels each, the number of possible assignments is in the order of O(n²), and this quantity will grow to O(n⁴) relationships in the next section.

In order to reduce the dimensionality of the problem we prune the set of possible assignments by thresholding, so that only the most similar ones are kept. Given two images A, B and their sets of macropixels, the photometric similarity between two regions is calculated as follows:

nBins = 72    the number of bins in the histogram of orientations    (3.1)
sim_p(u, v) = e^{−λ_FA · d(u, v)},    with u ∈ A, v ∈ B macropixels    (3.2)
d(u, v) = min_{l ∈ levels} min_{s ∈ nBins} EMD(shift_s(H_u^l), H_v^0)    (3.3)

where shift_s denotes a circular shift of the histogram by s bins (see sec. 2.4.1).
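Eqs. 3.1–3.3 translate almost literally into code. The sketch below reuses the Emd class from sec. 2.4; CircularShift is a hypothetical helper that rotates the histogram bins, mimicking a rotation of the gradient orientations (sec. 2.4.1), and hu[l] denotes the histogram of u at scale level l.

using System;

static class FeasibleAssignments
{
    const int NBins = 72;                 // bins in the orientation histogram (eq. 3.1)

    // d(u, v) = min over levels l and shifts s of EMD(shift_s(Hu[l]), Hv[0])  (eq. 3.3)
    public static double Distance(double[][] hu, double[][] hv)
    {
        double best = double.MaxValue;
        for (int l = 0; l < hu.Length; l++)
            for (int s = 0; s < NBins; s++)
                best = Math.Min(best, Emd.Compute(CircularShift(hu[l], s), hv[0]));
        return best;
    }

    // sim_p(u, v) = exp(-lambdaFA * d(u, v))  (eq. 3.2)
    public static double Similarity(double[][] hu, double[][] hv, double lambdaFA)
    {
        return Math.Exp(-lambdaFA * Distance(hu, hv));
    }

    // hypothetical helper: rotate the histogram by s bins
    static double[] CircularShift(double[] h, int s)
    {
        double[] r = new double[h.Length];
        for (int i = 0; i < h.Length; i++) r[(i + s) % h.Length] = h[i];
        return r;
    }
}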
