
Master’s Degree

in Computer Science

Final Thesis

Speed-up for AMAT implementation

using pyramid reduction

Supervisor

Ch. Prof. Andrea Torsello

Graduand
Alessandro Bicciato
Matriculation number 845487

Academic Year 2018 / 2019


Contents

1 Introduction

2 History
   2.1 Medial Axis Transform
      2.1.1 Definition
      2.1.2 Shock graphs
   2.2 Appearance Medial Axis Transform
      2.2.1 Definition
      2.2.2 Set cover problem
         Disk cost computation
      2.2.3 Complexity

3 Pyramid reduction
   3.1 Theory
   3.2 Construction of the pyramid
      3.2.1 Number of levels and their scales
      3.2.2 Pyramid
   3.3 Implementation

4 Neighborhood search
   4.1 Implementation
   4.2 Structural similarity index

5 GPU structures
   5.1 CUDA
   5.2 Implementation
      5.2.1 GPU
      5.2.2 Parallel execution

6 Tools of use
   6.1 Matlab
      6.1.1 Parallel Computing Toolbox

7 Results
   7.1 Original execution
   7.2 Pyramid reduction
   7.3 Neighborhood search
   7.4 GPU structures

8 Conclusion

Bibliography


Chapter 1

Introduction

For thousands of years after the birth of geometry, nobody approached the problem of recognizing shapes, and consequently objects, in a mathematical way. In 1826, Nicéphore Niépce, with his famous photograph View from the Window at Le Gras, started a new era, the era of photography: humans could now imprint the three-dimensional space of the natural environment onto a two-dimensional object. Yet we must wait more than a century to find the first articles by researchers trying to theorize the concept of shape recognition.

In 1935 Koffka started to approach the problem from psychology with his Gestalt principles [16]. Simplifying, those principles analyzed how the brain reasons. For instance, some of the principles were the law of closure, by which the brain, preferring big and complete shapes, fills in the incomplete parts of an object; the law of common region, by which a person groups the elements that lie in the same closed region; and the law of symmetry, by which the brain seeks balance and order in designs. We had to wait another three decades to get a functional mathematical model in this area, precisely until 1967, when Blum theorized the concept of the medial axis transform (MAT) [4] in biology. Before defining the medial axis transform we must define the medial axis of an object: it is the set of points having more than one closest point on the object's boundary. Blum initially referred to it as the topological skeleton. If we associate with each of those points the maximal disk centered there and contained within the object's boundary, we have the medial axis transform.

However, the theory behind Blum's work involved the recognition of shapes in a neutral environment, without considering the chromatic spectrum. Those limitations prevented the application of the theory to natural images. Natural images, as the name suggests, are all the representations of a subject in its natural environment through photos or drawings; for our purposes we also include chromatic objects on a blank background, such as logos or isolated natural objects. The two natural continuations of this work were to give a meaning to the MAT objects that could now be found, i.e. to recognize which objects the extracted skeletons belonged to, and to extend the validity of the shape recognition theory to natural images.

On this second aspect, half a century later, in August 2017, Stavros Tsogkas and Sven Dickinson from the University of Toronto published an article and an algorithm presenting a new approach to the medial axis transform that also works for natural images: the appearance medial axis transform [23]. The work was later presented at the International Conference on Computer Vision (ICCV 2017) in Venice, Italy. The key of this project was not to detect a shape in the picture and then build its MAT, but to compute the MAT for everything, foreground and background alike, and only then hypothesize a recognition of the various patterns. In this way they could formulate an algorithm that works under all conditions.

Despite the brilliance of the solution, the algorithm is quite slow, but it lends itself to the development of new extensions. This is where our work starts: we wanted to find new methods that could improve the speed, along with other components of the code worthy of improvement. From February 2018 to September 2019 we worked on improving it, touching speed, simplicity of computation, compression and accuracy of the results. In this thesis we will follow these months of work, make some considerations about the results, and discuss whether it is worth continuing along the different branches.

We followed three branches:

• include a pyramid reduction algorithm to simplify the calculations and consequently speed up the code;

• add a neighborhood search on the solutions found, to obtain a more complete and lighter cover;

• accelerate the whole execution using the graphics card and parallel computing.

The thesis will deepen the history presented earlier with a summary of the theory of the medial axis transform (Section 2.1) and the appearance medial axis transform (Section 2.2). We will analyze the exact points where the code is slow, making a big impact on the execution, and inspect which methods are the most important. Then we will discuss the first branch, the pyramid reduction (Chapter 3), with a brief account of the theory and of its impact on the algorithm. This branch was also the most developed and took most of the time. Our choice of the pyramid reduction aimed to gain speed: we fragment the input image into smaller images and consequently simplify the calculations, because the code works with smaller disks, resulting in a sort of divide and conquer scheme.

Since the medial axis transform of a shape should be, in practical terms, a connected line with forks, a natural development path is to search, near a point already in the skeleton set, for neighboring points that themselves belong to the skeleton. We implemented this idea and will discuss it in Chapter 4 on our neighborhood search branch, with the various implementations tried and the expectations. In this branch speed was a marginal aspect: we focused on the accuracy and the compression of the skeleton.

In the last branch of development we wanted to take advantage of the hardware, moving the heaviest computations to the graphics card. In Chapter 5 we will analyze the pros and cons and how we integrated GPU computation into the code using the tools provided by Matlab.

Summarizing, with the three branches we essentially aimed for more speed and accuracy from the AMAT algorithm. From the pyramid reduction we expected the speed-up to grow exponentially as the set of disk radii gets larger, and from the GPU a proportional speed-up, since the complexity stays the same and only the computing power changes. From the neighborhood search we expected more accuracy and compression of the skeletons, even at the cost of some speed.

We will inspect the results for all the branches (Chapter 7), but before that we will also explain the initial approach to the source code and the heavy modifications we had to make in order to adapt the code to our purposes (Chapter 6). The job is not completed yet: in the conclusion (Chapter 8) we will discuss, with a look to the future, new potential ideas to expand the algorithm, as there is still plenty of room for improvement.


Chapter 2

History

The first works on shape recognition approached it from a psychological perspective. Researchers wanted to understand what happens in the visual system and in the brain when we recognize a shape.

A very influential current of thought was Gestalt psychology, begun at the end of the 19th century, which found its major exponent in Kurt Koffka, who in 1935 formalized the Gestalt principles [16]. There are fifteen principles; the most important for our path are these five:

• Symmetry: the brain seeks balance and order in design;

• Common region: we group the elements in the same closed region;

• Continuation: we follow lines and connect them if there is a gap;

• Closure: we prefer big and complete shapes, so we automatically fill in the gaps;

• Convexity: we perceive convex shapes ahead of concave ones.

Later, in the mid fifties, other psychologists studied visual identification. Bitterman et al. conducted a study in which they searched for the correct foveal form threshold for luminous figures briefly exposed in a dark room to be correctly identified [3]. They found that, with a model derived from the Köhler-Wallach figural after-effects theory¹, the threshold varies directly with the ratio of perimeter to area of a shape and inversely with the magnitude of critical detail.

Deutsch et al. studied the process of classifying environmental stimulation in an attempt to elaborate a mechanism of shape perception [8]. They also studied the perception of animals, finding that a shape is recognized independently of its location in the visual field, of its inclination angle and of its size. Moreover, they found that humans and rats perceive mirror images alike, and that primitive organisms find it hard to distinguish between squares and circles. These abilities are located in the striate cortex and survive when most of the cortex is removed. Furthermore, there is an excitation in the retina when contours are recognized.

¹The Köhler-Wallach figural after-effects theory is based on the spread of currents which build

In 1958 Kirsch et al. tried to devise a pattern recognition algorithm on the SEAC² [15]. The main focus of their work, however, was the introduction of the very first scanner that could convert an image into a digital representation. Kirsch stated that in pattern recognition the aim is to reduce the amount of information to the minimum necessary for distinguishing one pattern from a group of patterns. The algorithm relied on scanning the gray-scale image sequentially until a black point was found (assuming it belonged to the pattern of a shape) and then following the trace to obtain the profile. They were, in a sense, very close to what Blum would enunciate almost a decade later as the medial axis transform.

2.1 Medial Axis Transform

The Medial Axis Transform was first introduced in 1967 by Harry Blum [4]. Blum approached the problem from observations in biology, stating that in two thousand years no such formulation had been attempted in geometry. However, there are differences between the biological and the physical problem, and to complicate things, the function of shape is hard to explore in isolation, so he explored together the geometry and the visual function that results from it. At the time, the visual field properties such as intensity, color, edge, angle and motion, and average global properties such as total luminous flux, had been widely experimented on, but other global properties, in particular distributions, were largely unknown. The shape of an object had been shown to be a stable organization, but the difficulty of recognizing its form resides in the enormous number and variety of shapes. If we cannot assume that a shape can be isolated and normalized a priori, nor enumerate the perturbations that a shape may undergo and still be identified, we have to think of new properties that do not explode into too many combinations of simple congruence views. Before 1967 some researchers had tried to formulate theories starting from propagation or diffusion models of interaction [3][8][14][15][16] but failed to arrive at a set of incisive shape attributes. In his work, Blum developed those earlier notions and gave a precise set of shape attributes useful in two ways: developing a psychology of shape for everyday life and focusing attention on some of its biases.

In other words, Blum introduced a method for biological shape recognition using the medial axis of an object, defined as the set of all points having more than one closest point on the shape's boundary.

²Standards Eastern Automatic Computer: an electronic computer built from 1950 with an 11


2.1.1 Definition

We can define the medial axis transform through time: consider a continuous isotropic plane in which each point has the following properties:

• excitation - represented by a binary status 0/1;

• propagation - each excited point excites an adjacent point with a delay proportional to the distance;

• refraction - once excited, a point cannot be re-excited for a certain interval of time.

A visual stimulus, from which the edges have been extracted, impinges on that plane at some fixed time and excites it at those points. The excitation then spreads in all directions. We can see in Figure 2.1 that the waves generated from two points expand until, when they intersect, they merge into a single contour. When a shape is convex and we propagate it, it will not generate points outside itself.

Figure 2.1: Propagation waves from two points

Let us take the shape formed by two arcs and the tangents connecting them; the result is a convex shape. Then propagate the shape inwards, inscribing it. A corner appears at time t1 near the center of the smaller arc and disappears at time t2 at the center of the bigger arc. All the corner points have a velocity associated with them, determined by the angle they form: an accelerating point results from curvature toward this locus and a decelerating point from curvature away from it. The locus of the corners is called the medial axis. If we include time at those points, we refer to them as the medial axis function.

Let us analyze Figure 2.2. We see a rectangle and its medial axis function under different perspective distortions. The frontal one has many heavy dots representing parallel lines, and they occur at the same time. As we rotate the perspective, the velocity of the central element becomes the cue to the perspective angle, until in the third shape we can no longer recognize the rectangle.


Figure 2.2: Rectangular perspectives with medial axis
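To make the grassfire definition concrete, here is a minimal Matlab sketch (assuming the Image Processing Toolbox) that approximates the medial axis transform of a binary shape: the skeleton is extracted morphologically and each skeleton point is paired with the radius of its maximal inscribed disk, taken from the distance transform. The shape, the variable names and the reconstruction check are illustrative only; this is not the procedure used later by AMAT.

% Approximate MAT of a binary shape: skeleton points plus maximal-disk radii.
shape = false(256);
shape(64:192, 96:160) = true;                 % a simple rectangle as the object
dist = bwdist(~shape);                        % distance of each pixel to the background
skeleton = bwmorph(shape, 'skel', Inf);       % morphological skeleton (medial axis approximation)
[rows, cols] = find(skeleton);                % medial axis points
radii = dist(skeleton);                       % radius of the maximal inscribed disk at each point

% Painting the disks back should recover (approximately) the original shape.
[xx, yy] = meshgrid(1:size(shape, 2), 1:size(shape, 1));
reconstruction = false(size(shape));
for k = 1:numel(radii)
    reconstruction = reconstruction | ...
        ((xx - cols(k)).^2 + (yy - rows(k)).^2 <= radii(k)^2);
end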

2.1.2 Shock graphs

Years later, in 1999, Siddiqi et al. [19] introduced the concept of the shock graph, a theory for the generic representation of 2D shapes in which structural descriptions are derived from the shocks, or singularities, of a curve evolution process acting on bounding contours. These shocks are organized into acyclic graphs. Consider the static picture obtained in the limit of Blum's grassfire transformation: the locus of shock positions gives the medial axis. In this case the shocks provide the additional information of the radius function along the medial axis, a much richer foundation for recognition than is obtained from an unlabeled skeleton.

We can observe four types of shocks (Figure 2.3):

• 1-shock: the radius varies monotonically

• 2-shock: the radius function achieves a strict local minimum

• 3-shock: the radius is constant

• 4-shock: the radius function achieves a strict local maximum


2.2 Appearance Medial Axis Transform

In 2017 Tsogkas and Dickinson introduced a new generalization of the medial axis transform to natural images, the Appearance-MAT (AMAT) [23]. The method casts medial point detection as a weighted geometric set cover problem, associating each medial point with a local scale, and provides an algorithm for inverting the result, i.e. reconstructing the original image.

Their algorithm revolves around reconstruction from the medial axis transform and the association of each medial point with its scale and its local appearance information. They also used a bottom-up grouping scheme that exploits the additional scale and appearance information to connect points into medial branches. This requires no assumption of knowledge about the objects in the picture: the medial axes are computed for both the foreground and the background.

Computing the whole image introduces a factor of compression, ending up using only about 10% of its pixels. However, this compression is lossy.

For binary shapes they used the concepts of shock graphs, introduced in Section 2.1.2, and of bone graphs [17]. Bone graphs offer improved stability and a more intuitive representation of an object's parts by analyzing and identifying connecting structures.

Yet, medial axis detection on natural images is not nearly as developed as on binary shapes. The work on AMAT picks elements from both worlds, having more similarities with the binary detection. A key aspect of the algorithm is that it does not involve learning or the use of datasets for detecting the medial axes.

2.2.1 Definition

Figure 2.4: Reconstruction process, disk representation and density differences

In the left image of Figure 2.4 we can see the medial axis of a jaguar O and its boundary ∂O. This is a visual representation of the medial axis transform, and we should be able to reconstruct the image from the skeleton. In natural images this process is more complicated, because we have complex scenes with numerous objects, not to mention color properties and texture distributions. We can also note that the image is composed of many small regions of uniform appearance. This leads to the same assumption made by most superpixel algorithms, which cluster the image while respecting perceptually meaningful region boundaries.

In the middle image of Figure 2.4 we can see the definition of the disk in an RGB setting. We denote the disk of radius $r$ centered at point $p$ as $D_{p,r}$. Now, consider an RGB image $I \subset \mathbb{R}^3$ and a disk-shaped region $D^I_{p,r} \subset I$. Let $f : D^I \to \mathbb{R}^K$ be a function that maps $D^I_{p,r}$ to a vector $f_{p,r} = f \circ D^I_{p,r}$, which will be the encoding of $D^I_{p,r}$.

We then define the decoding function $g : \mathbb{R}^K \to D^I$ as the function that maps $f_{p,r}$ to a disk patch $\tilde{D}^I_{p,r} = g_{p,r} = g \circ f_{p,r}$.

Generally, $f$ and $g$ are lossy mappings, which means that the reconstruction error $e_{p,r} = \|\tilde{D}^I_{p,r} - D^I_{p,r}\| \geq 0$.

We are now able to define the AMAT mathematically as the set of tuples $M = \{(p_1, r_{p_1}, f_{p_1,r_{p_1}}), \ldots, (p_m, r_{p_m}, f_{p_m,r_{p_m}})\}$ such that

$$M = \arg\min_{p,r} \sum_{i=1}^{m} e_{p_i,r_i} \qquad \text{subject to} \qquad I = \bigcup_{i=1}^{m} D^I_{p_i,r_i} \tag{2.1}$$
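As an illustration of the encoding/decoding pair, the sketch below uses the simplest possible choice, a mean-color encoding $f$ with a constant-patch decoding $g$, and measures the resulting error $e_{p,r}$; the real AMAT encodings are richer, so every concrete choice here (image, center, radius, $K = 3$) is only an assumption for the example.

% Toy encoding f and decoding g for a single disk D_{p,r} (mean-color version).
img = double(imread('peppers.png')) / 255;    % any RGB image; peppers.png ships with Matlab
[H, W, ~] = size(img);
p = [120, 200];                               % disk center (row, col), arbitrary example
r = 15;                                       % disk radius

[xx, yy] = meshgrid(1:W, 1:H);
diskMask = (xx - p(2)).^2 + (yy - p(1)).^2 <= r^2;

% Encoding f: the mean color inside the disk, a K = 3 dimensional vector.
f_pr = zeros(1, 3);
for c = 1:3
    channel = img(:, :, c);
    f_pr(c) = mean(channel(diskMask));
end

% Decoding g: a constant patch filled with the encoded color.
decoded = repmat(reshape(f_pr, 1, 1, 3), H, W, 1);

% Reconstruction error e_{p,r}, restricted to the support of the disk.
squaredDiff = (img - decoded).^2;
e_pr = sqrt(sum(squaredDiff(repmat(diskMask, 1, 1, 3))));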

2.2.2 Set cover problem

Considering Equation 2.1 of the AMAT definition, our goal is to find the subset of disks that covers the image while maintaining a low reconstruction cost. Following MAT principles, we seek a sparse solution with a low number of medial points that still reconstructs the input picture. This means we have a geometric set cover problem.

We assign to each disk $D^I_{p_i,r_j}$ a cost $c_{ij}$. We then regularize the minimization criterion in the equation by adding a scale-dependent term $s_j = \omega_s / r_j \propto 1/r_j$ to the cost $c_{ij}$. In this way we avoid the useless solution of selecting every pixel with radius 1, and we prefer larger disks as long as $s_j$ is not too large with respect to the error incurred by picking $D_{p,r_{j+1}}$ instead of $D_{p,r_j}$. Selecting a high value for $\omega_s$ leads to a sparser solution with a higher total reconstruction error. The right-hand element of Figure 2.4 shows an example of the two cases: when we select the bigger red disk, we lose a lot of detail compared to the smaller green ones. This was a main problem in the realization of the neighborhood search (Chapter 4).

The resolution method chosen for the set cover problem was a greedy approximation algorithm adapted to support weights.


Algorithm 1 AMAT greedy algorithm
Input: $X_I = \{p_1, \ldots, p_N\}$, $R = \{r_1, \ldots, r_R\}$, $f$, $g$
Output: $M$

1: Initialization: $M \leftarrow \emptyset$, $X_c \leftarrow \emptyset$   ($X_c$: covered pixels)
2: Compute $f_{p,r}$, $g_{p,r} = g \circ f_{p,r}$, $c_{p,r}$, $\forall p \in I, \forall r \in R$
3: while $X_c \subset X_I$ do
4:   $c^e_{p,r} \leftarrow \dfrac{c_{p,r}}{|D_{p,r} \setminus X_c|} + \dfrac{\omega_s}{r}$, $\forall p \in I, \forall r \in R$
5:   $(p^*, r^*) \leftarrow \arg\min_{p,r} c^e_{p,r}$
6:   $c_{p,r} \leftarrow c_{p,r} - \dfrac{c_{p,r}}{|D_{p,r} \setminus X_c|}\,|D_{p,r} \cap D_{p^*,r^*} \setminus X_c|$, $\forall p, r : D_{p^*,r^*} \cap D_{p,r} \neq \emptyset$
7:   $M \leftarrow M \cup (p^*, r^*, f_{p^*,r^*})$
8:   $X_c \leftarrow X_c \cup D_{p^*,r^*}$
9: end while

Algorithm 1 lists the steps of the greedy algorithm. At line 2 we compute all the possible costs $c_{ij}$ for all the disks $D_{ij}$. We keep another cost variable, the effective cost, which takes the already covered pixels into account: $c^e_{ij} = \frac{c_{ij}}{A_{ij}} + s_{ij}$, where $A_{ij}$ is the number of new (not yet covered) pixels inside $D_{ij}$.

Inside the loop (line 3) we find the minimum effective cost $c^e_{ij}$ (line 5) and add the corresponding disk $D_{ij}$ to the solution set $M$ (line 7). We remove the area of $D_{ij}$ from the remaining pixels (line 8). On each iteration we must recalculate the costs of the disks that intersect $D_{ij}$ (line 6), so that each disk is penalized only for the new pixels it is covering.
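To connect Algorithm 1 with the listings that follow, here is a self-contained Matlab sketch of the greedy selection loop on a toy cost volume. The structure (effective cost, arg-min selection, recount of the uncovered pixels per disk) follows the algorithm, but the random costs, the helper handles and the brute-force recount are assumptions made only to keep the example short; the actual code performs the update incrementally through the update/updateCosts methods.

% Schematic greedy cover loop on a toy cost volume (illustrative only).
numRows = 32; numCols = 32;
scales = 2:5;                                   % toy radius set
ws = 1e-4;                                      % scale regularization weight omega_s
numScales = numel(scales);
diskCost = rand(numRows, numCols, numScales);   % stand-in for the precomputed disk costs c_ij
[xx, yy] = meshgrid(1:numCols, 1:numRows);
diskMask = @(yc, xc, r) (xx - xc).^2 + (yy - yc).^2 <= r^2;   % full-image disk mask
diskKernel = @(r) double((-r:r)'.^2 + (-r:r).^2 <= r^2);      % small (2r+1)x(2r+1) filter

% Initially every disk "covers" its full (border-clipped) area.
numNewPixelsCovered = zeros(numRows, numCols, numScales);
for k = 1:numScales
    numNewPixelsCovered(:, :, k) = conv2(ones(numRows, numCols), diskKernel(scales(k)), 'same');
end

covered = false(numRows, numCols);
selected = zeros(0, 3);                         % rows of (yc, xc, radius)
while ~all(covered(:))
    % Effective cost: cost per newly covered pixel plus the scale term ws / r (line 4).
    costEffective = diskCost ./ max(numNewPixelsCovered, eps) + reshape(ws ./ scales, 1, 1, []);
    [~, idx] = min(costEffective(:));           % cheapest disk (line 5)
    [yc, xc, rc] = ind2sub(size(costEffective), idx);
    newPixels = diskMask(yc, xc, scales(rc)) & ~covered;
    covered(newPixels) = true;                  % line 8
    selected(end + 1, :) = [yc, xc, scales(rc)];            %#ok<AGROW>
    diskCost(yc, xc, :) = Inf;                  % never select the same center twice
    % Recount the still-uncovered pixels inside every disk (what updateCosts does incrementally).
    for k = 1:numScales
        numNewPixelsCovered(:, :, k) = conv2(double(~covered), diskKernel(scales(k)), 'same');
    end
end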

Disk cost computation

Instead of using a simple error metric to compute $c_{ij}$, which could score a low error without respecting image boundaries, Tsogkas and Dickinson proposed an alternative: convert the RGB image into the CIELAB color space³, which is more suitable for measuring perceptual distances.

We can now define the cost of $D_{ij}$ as

$$c_{ij} = \sum_{k} \sum_{l} \|f_{ij} - f_{kl}\|_2 \qquad \forall k, l : D_{kl} \subset D_{ij} \tag{2.2}$$

which means that a low cost $c_{ij}$ implies that the encoding $f_{ij}$ represents well all the disks that are fully contained in $D_{ij}$, so $D_{ij}$ does not cross any region boundaries.
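A hedged, deliberately naive evaluation of Equation 2.2 for a single disk is sketched below, reusing the toy mean-color encoding of the previous section; the real implementation computes all encodings and costs at once (Disk.computeCosts in the profiler of Chapter 7), and the radii of the inner disks would come from the chosen scale set rather than from the plain range used here.

% Naive evaluation of Equation 2.2 for one disk D_ij (toy mean-color encoding).
img = double(imread('peppers.png')) / 255;
[H, W, ~] = size(img);
[xx, yy] = meshgrid(1:W, 1:H);
encodeDisk = @(yc, xc, r) mean(reshape( ...
    img(repmat((xx - xc).^2 + (yy - yc).^2 <= r^2, 1, 1, 3)), [], 3), 1);

pc = [120, 200];                   % center of D_ij (row, col), example values
rj = 12;                           % radius of D_ij
f_ij = encodeDisk(pc(1), pc(2), rj);

c_ij = 0;                          % sum of encoding distances over all D_kl contained in D_ij
for rl = 1:rj - 1
    maxOffset = rj - rl;           % D_kl lies inside D_ij iff its center is within rj - rl of pc
    for dy = -maxOffset:maxOffset
        for dx = -maxOffset:maxOffset
            if dy^2 + dx^2 <= maxOffset^2
                f_kl = encodeDisk(pc(1) + dy, pc(2) + dx, rl);
                c_ij = c_ij + norm(f_ij - f_kl, 2);
            end
        end
    end
end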

2.2.3 Complexity

The first step of the algorithm is the computation of the costs of all disks $D_{ij}$. If we have a large $r_j$ the run is heavier, resulting in an overall complexity of $O(NR^4)$. However, this is not the most time consuming step, because the critical point is the greedy approximation algorithm. At each iteration we cover at most $O(R^2)$ pixels, but we also have to recompute the costs of all the overlapping disks, resulting in a complexity of $O(NR^2 \sum_{r=1}^{R} r^2) = O(NR^5)$.

³CIE L*a*b* space is characterized by three variables: L*, the luminosity, going from 0 to 100,

We want to place particular emphasis on the update of the costs, analyzing its Matlab code in Listing 2.1 and Listing 2.2, a critical point of the whole research.

The update method is a simple computation: its inputs are the AMAT class instance, the disk center's coordinates and the binary matrix of the newly covered pixels. Most of its lines are assignments recording the covered space. Inside, it calls the updateCosts method, which is where the computational difficulty arises, as explained earlier. In this method we have a loop over the whole radius set. There are five heavy computations (lines 15, 17, 20, 22, 25) that recalculate the costs. The most expensive operation among them is the 2-D convolution between the area covered at a certain radius and the corresponding filter, located at row 15. The function is defined as follows: given two-dimensional variables $A$ and $B$, the convolution at point $C(j,k)$ is

$$C(j,k) = \sum_{p} \sum_{q} A(p,q)\, B(j-p+1,\, k-q+1) \tag{2.3}$$

where $p$ and $q$ run over all values that lead to legal subscripts of $A(p,q)$ and $B(j-p+1, k-q+1)$ [6].
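As a concrete illustration of what the conv2 call at row 15 computes, the short sketch below (with made-up sizes) convolves a binary matrix of newly covered pixels with a disk-shaped filter: the value at each position is the number of those pixels that fall inside the disk of radius r centered there, which is exactly the quantity subtracted from numNewPixelsCovered.

% What the conv2 at row 15 computes, on a small made-up example.
r = 3;                                               % radius of the current scale
diskFilter = double((-r:r)'.^2 + (-r:r).^2 <= r^2);  % plays the role of mat.filters{r}
newPixelsCovered = false(20, 20);
newPixelsCovered(8:12, 8:12) = true;                 % pretend these pixels were just covered

% Entry (i, j): how many newly covered pixels lie inside the radius-r disk centered at (i, j).
numPixelsSubtracted = conv2(double(newPixelsCovered), diskFilter, 'same');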

During the research we tried, also with the help of Tsogkas and Dickinson, to find faster methods to replace or simplify the convolution operation, arriving at a simplification of its input parameters in the pyramid reduction system (Chapter 3), but we did not find a real alternative.

1 function update(mat, minCost, xc, yc, rc, newPixelsCovered)

2 mat.covered(newPixelsCovered) = true;

3 mat.price(newPixelsCovered) = minCost / mat.numNewPixelsCovered(yc, xc, rc);

4 mat.axis(yc, xc, :) = mat.encoding(yc, xc, :, rc);

5 mat.radius(yc, xc) = mat.scales(rc);

6 mat.updateCosts(xc, yc, newPixelsCovered);

7 end

Listing 2.1: Update function

1 function updateCosts(mat, xc, yc, newPixelsCovered)

2 [yy, xx] = find(newPixelsCovered);

3 xminCovered = min(xx);

4 xmaxCovered = max(xx);

5 yminCovered = min(yy);

6 ymaxCovered = max(yy);

7 newPixelsCovered = double(newPixelsCovered);

8 for r = 1:mat.numScales

9 scale = mat.scales(r);

10 x1 = max(xminCovered - scale, 1);

11 y1 = max(yminCovered - scale, 1);

12 x2 = min(xmaxCovered + scale, mat.numCols);

13 y2 = min(ymaxCovered + scale, mat.numRows);

14 % Find how many of the newPixelsCovered are covered by other disks.

15 numPixelsSubtracted = conv2(newPixelsCovered(y1:y2, x1:x2), mat.filters{r}, 'same');

16 % and subtract the respective counts from those disks.

17 mat.numNewPixelsCovered(y1:y2, x1:x2, r) = mat.numNewPixelsCovered(y1:y2, x1:x2, r) - numPixelsSubtracted;


18 % update diskCost, diskCostPerPixel, and diskCostEfficiency *only* for
19 % the locations that have been affected, for efficiency.

20 mat.diskCost(y1:y2, x1:x2, r) = mat.diskCost(y1:y2, x1:x2, r) - ...

21 numPixelsSubtracted .* mat.diskCostPerPixel(y1:y2, x1:x2, r);

22 mat.diskCostPerPixel(y1:y2, x1:x2, r) = mat.diskCost(y1:y2, x1:x2, r) ./ ...

23 max(eps, mat.numNewPixelsCovered(y1:y2, x1:x2, r)) + ... % avoid 0/0
24 mat.BIG * (mat.numNewPixelsCovered(y1:y2, x1:x2, r) == 0); % x/0 = inf
25 mat.diskCostEffective(y1:y2, x1:x2, r) = ...

26 mat.diskCostPerPixel(y1:y2, x1:x2, r) + mat.ws / mat.scales(r);

27 end

28 % Make sure disk with the same center is not selected again
29 mat.diskCost(yc, xc, :) = mat.BIG;

30 mat.diskCostEffective(yc, xc, :) = mat.BIG;

31 end

Listing 2.2: UpdateCosts function


Chapter 3

Pyramid reduction

The divide and conquer design paradigm recursively breaks a problem down into sub-problems until they become simple enough to be solved efficiently, and then combines the solutions to obtain the solution of the initial problem. Inspired by this design, we thought about a similar approach for AMAT.

The idea was to use the concept of image pyramids to break the set of solution disks into many subsets. An image pyramid is a set of images derived from the original by scaling. If the scale is a fraction of the original scale, that image is a downsample, otherwise it is an upsample. Exact upsamples/downsamples are always mapped with a scale that is a power of 2; images with other scales are defined as wavelets.

The concept of fragmenting the disk set led us to study only exact downsampling of the input image. We obtain a set of images (we will refer to them as levels) whose disk radius set is limited to a few, very small sizes. This should induce simpler computations in all the calculations that involve the disks; moreover, for each level there is less area to cover, as the image is smaller than the original.

3.1 Theory

The pyramid method is a type of image transformation [21], often used in cases where we have to scale an image and we do not know a priori the final resolution we want. For instance, in face recognition we have to standardize the dimensions of a face by enlarging and reducing the whole photo in order to be able to detect it.

In our research we used the pyramid method on the input image to simplify the cost update process (Listing 2.2). In this case we had to reduce the input image into various smaller images, stopping at the maximum covering radius wanted.

As we only wanted to reduce the input images, we did not have to face the upsampling problem; what remained was the downsampling, or decimation, problem. To solve it we compute a convolution of the image with a low-pass filter and keep every r-th sample, resulting in

$$g(i,j) = \frac{1}{r} \sum_{k,l} f(k,l)\, h(i - k/r,\ j - l/r) \tag{3.1}$$
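A minimal sketch of one decimation step of Equation 3.1 with r = 2 and the binomial kernel of Table 3.1; the separable low-pass filtering and the subsampling are written out explicitly, while the real pyramid code wraps a step like this in a loop over the levels and supports the other filters of Table 3.2 as well.

% One 2x decimation step: binomial low-pass filtering followed by subsampling.
img = double(imread('cameraman.tif'));     % grayscale test image shipped with the toolbox
k = [1 4 6 4 1] / 16;                      % 1-D binomial kernel (Table 3.1)
blurred = conv2(k, k', img, 'same');       % separable 2-D low-pass filtering
smaller = blurred(1:2:end, 1:2:end);       % keep every 2nd row and column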

In Figure 3.1 we can see a representation of an image from the original (a) to the convolved version (b).

Figure 3.1: Decimation before and after the convolution

There are various filters associated with r; the most commonly used is the binomial filter with r = 2, introduced by Burt and Adelson in 1983 [5]. In Table 3.1, which lists some commonly occurring filters and signals, we can see how the binomial kernel does a good job of separating high and low frequencies, while still leaving a fair amount of high-frequency detail that leads to aliasing.

For high downsampling rates, on the other hand, the windowed sinc is preferred. In our proposed solution we support all the filters shown in Table 3.2, setting the binomial (aka Gaussian) as the default filter.

Looking at Figure 3.2 we can analyze the different behavior of the various filters:

• the linear filter is the poorest at controlling aliasing

• the binomial cuts more than half of the frequencies

• the cubic filters cut less than the binomial but have a sharper fall-off

• the windowed sinc has almost the smoothest cut

• QMF-9 has the smoothest fall-off

• the JPEG 2000 is very similar to the linear

We said earlier that our algorithm sets the binomial filter as the default; the reasons are that it is the most widely used, it is among the simplest to calculate (introducing minor errors), and it carries the most manageable aliasing across the levels.

Managing the cover on the aliased regions when converting the cover from level to level was the main problem of this extension of AMAT.


Name       Kernel                Transform
box-3      1/3 [1 1 1]           1/3 (1 + 2 cos ω)
box-5      1/5 [1 1 1 1 1]       1/5 (1 + 2 cos ω + 2 cos 2ω)
linear     1/4 [1 2 1]           1/2 (1 + cos ω)
binomial   1/16 [1 4 6 4 1]      1/4 (1 + cos ω)²
Sobel      1/2 [−1 0 1]          sin ω
corner     1/2 [−1 2 −1]         1/2 (1 − cos ω)

Table 3.1: Filter Fourier transforms

|n|   Linear   Binomial   Cubic a = −1   Cubic a = −0.5   Windowed sinc   QMF-9     JPEG 2000
0     0.50     0.3750     0.5000         0.50000          0.4939          0.5638    0.6029
1     0.25     0.2500     0.3125         0.28125          0.2684          0.2932    0.2669
2              0.0625     0.0000         0.00000          0.0000          −0.0519   −0.0782
3                         −0.0625        −0.03125         −0.0153         −0.0431   −0.0169
4                                                         0.0000          0.0198    0.0267

Table 3.2: Coefficients of the supported decimation filters


Figure 3.2: Frequency response for the filters of Table 3.2

3.2 Construction of the pyramid

After the basis given in the previous section, we need to explain how we built the pyramid and, moreover, how we used it.

Keeping in mind that we want something like Figure 3.3(a), where l = 0 corresponds to the input image, we need to decide how many levels we want and the reduction factor. For the latter we chose a reduction of one half (0.5×) per level, without introducing wavelets.

For the number of levels we came up with a formulation that involves the input scales of the set cover problem. Remembering that pyramids can be used to accelerate coarse-to-fine search algorithms and to look for patterns at different scales, we use them in AMAT to compute only a few scaled (and therefore automatically smaller) disks for each level, starting from the bigger disks at the coarser levels and refining the cover search at the finer levels until we arrive at the original image scale.


(a) Visual representation of an image pyramid

(b) Relation of the pixels of an octave pyramid from the finest (top) to the

coarsest (bottom) level

Figure 3.3: Examples of image pyramids

3.2.1 Number of levels and their scales

The first thing to calculate is the number of levels we need. The execution of AMAT asks as input for the range of radii of the disks we are going to use. With the minimum radius we calculate the pyramid's bottom level as

$$\text{bottom level} = \left\lfloor \log_2 \frac{\text{minimum}}{2} \right\rfloor + 1, \qquad \text{bottom level} \in \mathbb{N} \tag{3.2}$$

and with the maximum radius we calculate the top level as

$$\text{top level} = \left\lfloor \log_2 \frac{\text{maximum}}{4} \right\rfloor + 1, \qquad \text{top level} \in \mathbb{N} \tag{3.3}$$

Using these formulas, every level covers with radii only between 2 and 4. However, as we can see in Algorithm 2, the method is designed not to search for the same disks at different levels, i.e. radius 2 at a level is radius 4 at its lower level and we avoid including it twice; moreover, when we calculate the coarsest level we avoid producing a radius set composed of only one element, which would remove variability when we do the cover.


Algorithm 2 Calculation of the scales for every level
Input: scaleRange
Output: scalesPerLevel

1: topLevel = ⌊log₂(max(scaleRange)/4)⌋ + 1
2: bottomLevel = ⌊log₂(min(scaleRange)/2)⌋ + 1
3: for i = bottomLevel to topLevel do
4:   if i == bottomLevel then
5:     smallestRadius = ⌊min(scaleRange)/2^(i−1)⌋
6:   else
7:     smallestRadius = 3
8:   if i == topLevel then
9:     biggestRadius = ⌈max(scaleRange)/2^(i−1)⌉
10:  else
11:    biggestRadius = 4
12:  Set scalesPerLevel(i) ← {smallestRadius, ..., biggestRadius}
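A direct Matlab transcription of Algorithm 2 could look like the sketch below; the function name and the cell-array container are assumptions, while scaleRange and scalesPerLevel are the names used in the algorithm. With scaleRange = 2:70 (the demo of Chapter 7) it yields {2, 3, 4} at the bottom (full-resolution) level, {3, 4} at the intermediate levels and {3, 4, 5} at the top (coarsest) level, matching the scales reported in Figure 7.6.

function scalesPerLevel = computeScalesPerLevel(scaleRange)
% Sketch of Algorithm 2: assign a small radius set to every pyramid level.
    topLevel = floor(log2(max(scaleRange) / 4)) + 1;
    bottomLevel = floor(log2(min(scaleRange) / 2)) + 1;
    scalesPerLevel = cell(topLevel, 1);
    for i = bottomLevel:topLevel
        if i == bottomLevel
            smallestRadius = floor(min(scaleRange) / 2^(i - 1));
        else
            smallestRadius = 3;
        end
        if i == topLevel
            biggestRadius = ceil(max(scaleRange) / 2^(i - 1));
        else
            biggestRadius = 4;
        end
        scalesPerLevel{i} = smallestRadius:biggestRadius;
    end
end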

3.2.2 Pyramid

As said in Section 3.1, in our actual construction of the image pyramid we used the filters of Table 3.2, adding a parametric cubic filter and setting as default the binomial filter, which produced the most suitable images for the cover. We decided to write a custom function instead of using the native image-reduction function (impyramid [12]), because the latter supports only one filter, the Gaussian, and it does not generate the whole pyramid but only the next upper or lower level.

3.3 Implementation

Recalling Section 3.2.1, we need to cover the various levels with almost the same range of disk radii. In Figure 3.4 we can see how the biggest and the smallest disks of each level are portrayed on the upper level.

To approach the set cover problem, our first intuition was to start from the coarsest level and go down to the input resolution, spreading the cover to the levels below so as to form a sort of three-dimensional tree of cover. However, the initial greedy algorithm did not fit this kind of solution.

The next solution we tried was to set, for each level, a threshold filter on the disk costs, stopping when we had encountered too many solution disks that had a lower disk cost at the levels below. This time the original algorithm fit but, after the first level, the algorithm started to stop too early, leaving a large part of the image not covered at all.

These failures led to the last and final approach:

1. keep the greedy method to cover the whole first level;

2. fit the solution onto the next level;

3. cover the remaining parts;

4. repeat from step 2 until we have passed every level of the pyramid down to the base.

Figure 3.4: Example of disk representation through different levels

Since we have a 2× decimation, we had another problem to solve: the scale of the selected disks. In Figure 3.5 we have a representation of the problem: when we enlarge a disk we also enlarge its center, which has no atomic representation in the image space and becomes four pixels. Also, the enlarged radius can take the value 2r or 2r − 1 (with image coordinates starting at (1, 1)).

With these elements we have to find which of the eight possible solutions for every previously chosen disk fit into the new level's cover. Our final attempt was to set a threshold function on every center, so we may pick all four of the points or none of them. Consider a point of the image that belongs to the cover axis of the upper level. For the threshold formula we retrieve the minimum cost at that point from the set of effective costs of the origin level and then find its percentile rank. Once we have that ratio, we must keep in consideration the radius of the disk we found initially, as a big disk is more prone to be affected by noise than a small one. Formally, we get the formula

$$\frac{\mathrm{rank}(costsEffective(x, y, r))}{|costsEffective|} \cdot \frac{\max(scales) - r + 1}{\max(scales)} < 0.05 \tag{3.4}$$

where costsEffective is the sorted set of unique effective costs of the origin level and scales is the union of the radius sets of the origin level and of the level being computed. The function trims the bigger radii to fewer admissible ranks than the smaller ones, to avoid selecting big disks that were picked only for being the minimum radius available and that include image noise. If a hypothetical disk passes the filter, we include it in the current level's solution and update the disk costs as in the greedy cover method (Section 2.2.3). However, this time we do not update over the whole radius set: we can exclude the bigger radii, which were computed at upper levels, and the smaller ones, which are too small to be used. Recalling Listing 2.2, the conv2 call (line 15) will then involve only very small matrices, which considerably speeds up the whole execution.

Figure 3.5: Visual representation of the four different output circles when enlarging a covered disk from the upper level
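A hedged sketch of the check of Equation 3.4 for one candidate disk follows. originLevelCosts stands for the effective-cost volume of the origin level, r is the radius of the originating disk, rIdx its index along the third dimension, and scales the union of the two radius sets; the function name and the way the rank is retrieved are assumptions, while the 0.05 bound is the one of Equation 3.4.

function keep = passesThresholdFilter(originLevelCosts, x, y, r, rIdx, scales)
% Sketch of the filter of Equation 3.4: accept the enlarged disk only if the cost of the
% originating disk ranks low among the effective costs of the origin level, with large
% radii trimmed to fewer admissible ranks than small ones.
    sortedUniqueCosts = unique(originLevelCosts(:));          % sorted set of unique costs
    rankOfCost = find(sortedUniqueCosts >= originLevelCosts(y, x, rIdx), 1);
    rankRatio = rankOfCost / numel(sortedUniqueCosts);
    radiusWeight = (max(scales) - r + 1) / max(scales);
    keep = rankRatio * radiusWeight < 0.05;
end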

Our concern throughout the whole project was to keep every addition to the code as simple as possible, considering that we were trying to accelerate an algorithm that already performs the simplest possible calculations. We are therefore sure that more sophisticated formulas exist for the threshold filter, but they would likely involve more computing time.


Chapter 4

Neighborhood search

Let us forget for a moment the implementation behind the AMAT algorithm and consider only the topology of the medial axis transform: the skeleton of a shape should be a continuous line, so why shouldn't the AMAT algorithm follow the same affinity? From this question we started our second expansion, the neighborhood search. We designed it as a method that merges with the greedy set cover method, so that both can be used when needed.

We needed a method that could reuse the disk costs from the greedy algorithm, avoiding too many new calculations, a new method for comparing nearby disks, and a traversal strategy. For the disk comparison we needed something fast, and we focused on what Matlab offers, as the native methods are faster than anything we could write ourselves. The traversal strategy is very important because in a large image we could have a very long skeleton, which would induce a long chain of neighbor checks; that means we could really have a memory problem storing the whole chain. We explored both recursive and iterative methods and chose the better one.

4.1 Implementation

The neighborhood search relies on a simple assumption: if we find a suitable disk, in this case a disk with minimum cost, we are composing a skeleton, so we can assume that its center is part of that skeleton and that at least one of the eight nearest pixels contains another center that also belongs to the skeleton of the same object.

Following this path, we can make two hypotheses:

• we are no longer covering in a haphazard way but in a more consistent one;

• by following the skeleton we should be able to create a slimmer skeleton, which will lead to an even more compressed AMAT.

Our approach is very similar to the greedy cover method, with some small but important differences. First, we must find some possible solutions for the nearest pixels: for each neighboring center we select the disk with minimum cost, so we know it is the best solution for that center. If the selected disk's radius is bigger than the original disk's radius, we reject it, for a simple reason: if it were a usable solution it would have had a lower cost than the origin disk and it would have been picked in its place from the start. Otherwise, if it is smaller, we put it into a candidate set after analyzing its similarity.

To check the similarity of the selected disks to the origin disk, we discarded a histogram comparison of the disks' contents because of the potential chessboard problem, where two histograms can be equal even though the contents are very different due to their arrangement. Matlab offers a convenient solution for comparing two images, a function called ssim [20], standing for structural similarity index.

So, we have this set of suitable disks that may be part of the origin's skeleton. Instead of pursuing a recursive method, which would be the most natural, we were forced to use an iterative approach: the stack formed by all the recursive calls can become very long and fill too much memory with repeated variables. Instead, we treated the neighbor set as a priority queue, ordering its elements by descending radius and by their similarity to the disk picked by the greedy method.
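The sketch below captures the iterative expansion just described: starting from a selected center, the eight neighbors are examined, a candidate is accepted only if its minimum-cost radius does not exceed the current one and its patch is similar enough to the current one according to ssim, and accepted candidates are kept in a queue ordered by descending radius. The random bestRadius map, the square patches, the resizing to a common size and the 0.7 similarity threshold are all assumptions made to keep the example self-contained; they are not the thesis implementation.

% Iterative neighborhood expansion from one seed disk (illustrative sketch).
img = double(imread('cameraman.tif')) / 255;
[H, W] = size(img);
bestRadius = randi([2, 8], H, W);              % stand-in for argmin over r of the disk costs
simThreshold = 0.7;                            % assumed similarity threshold

% Square patch around a disk, resized to a common size so that ssim can compare patches.
cropPatch = @(yc, xc, r) imresize( ...
    img(max(yc - r, 1):min(yc + r, H), max(xc - r, 1):min(xc + r, W)), [32, 32]);

queue = [128, 128, bestRadius(128, 128)];      % seed center as (y, x, r); example values
skeleton = false(H, W);
while ~isempty(queue)
    [~, order] = sortrows(queue, -3);          % most promising first: largest radius
    current = queue(order(1), :);
    queue(order(1), :) = [];
    yc = current(1); xc = current(2); rc = current(3);
    if skeleton(yc, xc), continue; end
    skeleton(yc, xc) = true;
    refPatch = cropPatch(yc, xc, rc);
    for dy = -1:1
        for dx = -1:1
            yn = yc + dy; xn = xc + dx;
            if (dy == 0 && dx == 0) || yn < 1 || yn > H || xn < 1 || xn > W, continue; end
            rn = bestRadius(yn, xn);
            if rn > rc || skeleton(yn, xn), continue; end   % larger disks would have won earlier
            if ssim(cropPatch(yn, xn, rn), refPatch) > simThreshold
                queue(end + 1, :) = [yn, xn, rn];           %#ok<AGROW>
            end
        end
    end
end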

4.2 Structural similarity index

The structural similarity index is a concept introduced in 2004 by Wang et al. [24], focusing on full-reference image quality assessment. Before it, the simplest and most widely used metric was the mean squared error (MSE), computed by averaging the squared intensity differences of distorted and reference image pixels, along with the related quantity of peak signal-to-noise ratio (PSNR). Aside from being simple to calculate and having clear physical meanings, their downside is that they are not well matched to perceived visual quality.

In fact, natural image signals are highly structured: there are strong dependencies among their pixels, especially when they are spatially close, and these dependencies carry important information about the structure of the objects in the scene. The previous philosophy was based on the Minkowski error metric, i.e. on pointwise signal differences that are independent of the underlying signal structure. The weak point of these error-sensitivity quality measures is that they decompose the image using linear transformations without removing its strong dependencies.

Wang et al. aimed to find a more direct way to compare the structures of the reference and the distorted signals. For example, Figure 4.1 shows the same photo with different types of distortion, all with an MSE of 210:

a) original image;

b) contrast-stretched image;

c) mean-shifted image, MSSIM = 0.9900;

d) JPEG compressed image, MSSIM = 0.6949;

e) blurred image, MSSIM = 0.7052;

f) image contaminated with salt-pepper impulsive noise, MSSIM = 0.7748.

Figure 4.1: Comparison of the same image with different types of distortion

Beyond the technical differences, we can see that all the images are drastically different from each other. Moreover, if we take the contrast-stretched one, b), it is difficult with the error-sensitivity philosophy to explain why it has a very high quality, considering that its visual difference from the original is easily discerned.

So, the structural similarity index takes three aspects into consideration and combines them: luminance, contrast and structure. The result is the formula

$$S(x, y) = f(l(x, y), c(x, y), s(x, y)) \tag{4.1}$$

An object's surface luminance is the product of illumination and reflection, which are independent of the shape of the object, so we need to separate their influence. Since luminance and contrast may vary across the scene, we use their local values in the definition. We estimate the mean intensity of a discrete signal as $\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$, so the luminance comparison function $l(x, y)$ is then a function of $\mu_x$ and $\mu_y$.

Then we remove the mean intensity from the signal. In the discrete form we obtain the signal $x - \mu_x$, corresponding to the projection of the vector $x$ onto the hyperplane $\sum_{i=1}^{N} x_i = 0$.

We also define the standard deviation, as an estimate of the signal contrast, in the discrete form as $\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right)^{1/2}$. Similarly to the luminance, the contrast comparison $c(x, y)$ is a comparison between $\sigma_x$ and $\sigma_y$.

Now that we have normalized the signal, removing the luminance and the contrast, we can compare the structures $s(x, y)$, corresponding to $(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$.

On Figure 4.2 there is a diagram of the quality assessment system, which is built to follow these conditions:

• Symmetry: S(x, y) = S(y, x);

• Boundedness: S(x, y) ≤ 1;

• Unique maximum: S(x, y) = 1 iff x = y.

Figure 4.2: Diagram of the structural similarity system

Now that we have all the elements, we can define the components. Given a constant $C_1 = (K_1 L)^2$, introduced to avoid instability when $\mu_x^2$ and $\mu_y^2$ are close to zero, where $L$ is the dynamic range of the pixel values and $K_1 \ll 1$ is a small constant, we define the luminance comparison function as

$$l(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \tag{4.2}$$

Similarly, given a constant $C_2 = (K_2 L)^2$ that avoids instability when $\sigma_x^2$ and $\sigma_y^2$ are close to zero, where $L$ is again the dynamic range of the pixel values and $K_2 \ll 1$ is a small constant, we define the contrast comparison function as

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2} \tag{4.3}$$

Consider now the two unit vectors defined before, $(x - \mu_x)/\sigma_x$ and $(y - \mu_y)/\sigma_y$. The correlation between them is a simple and effective measure of structural similarity. Thus, we define the structure comparison as

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3} \tag{4.4}$$

where $C_3$, as before, is the non-zero bound, and the discrete form of $\sigma_{xy}$ can be estimated as $\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$.

Finally, we combine the three comparison functions to obtain the structural similarity index between the signals $x$ and $y$:

$$SSIM(x, y) = [l(x, y)]^{\alpha} \cdot [c(x, y)]^{\beta} \cdot [s(x, y)]^{\gamma} \tag{4.5}$$

where $\alpha > 0$, $\beta > 0$ and $\gamma > 0$ are parameters that adjust the weight of the components. A simplification of Equation 4.5, which is also the one used by the Matlab engine, is to set $\alpha = \beta = \gamma = 1$ and $C_3 = C_2/2$, leading to

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
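As a numerical check of the simplified formula, the sketch below computes the global SSIM of an image and a blurred copy directly from the means, variances and covariance, using the commonly adopted constants K1 = 0.01 and K2 = 0.03 with L = 1 for values in [0, 1]; Matlab's ssim applies the same formula locally over a Gaussian-weighted window and averages the local values, so the two numbers will generally differ.

% Global SSIM of two images via the simplified formula above (single-window version).
x = double(imread('cameraman.tif')) / 255;   % reference signal, values in [0, 1]
y = imgaussfilt(x, 2);                       % distorted version: Gaussian blur

L  = 1;                                      % dynamic range of the pixel values
C1 = (0.01 * L)^2;                           % K1 = 0.01
C2 = (0.03 * L)^2;                           % K2 = 0.03

mux = mean(x(:));          muy = mean(y(:));
sigx2 = var(x(:));         sigy2 = var(y(:));
covMat = cov(x(:), y(:));  sigxy = covMat(1, 2);

ssimGlobal = ((2 * mux * muy + C1) * (2 * sigxy + C2)) / ...
             ((mux^2 + muy^2 + C1) * (sigx2 + sigy2 + C2));

ssimMatlab = ssim(y, x);                     % windowed version: mean of the local SSIM map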


Chapter 5

GPU structures

The last improvement we wanted to explore was the use of the GPU structures provided by Matlab and the application of some parallelism to the algorithm.

Matlab has an add-on named Parallel Computing Toolbox that helps to write algorithms that solve problems using multicore processors, GPUs and computer clusters. Its strong point is that Matlab provides a large number of basic and non-basic methods that are already usable at a high level to interact with the toolbox without any configuration.

However, using the GPU requires an NVidia graphics card, because the toolbox relies on CUDA [7] as the architecture for interacting with the graphics card.

5.1 CUDA

The Compute Unified Device Architecture, now officially known by its acronym CUDA, was initially released by NVidia on June 23, 2007. Its purpose was to let programmers adopt the GPGPU approach (General-Purpose computing on Graphics Processing Units).

GPGPU is a programming approach that aims to perform on the GPU operations usually computed by the CPU. A GPGPU pipeline can be described as parallel processing between one or more GPUs that analyze data. Around 2001 this kind of computing became more and more popular through the most famous graphics libraries, OpenGL and DirectX, followed later by CUDA, which allowed programmers to ignore the underlying graphical concepts in favor of more common high-performance computing concepts. Alongside CUDA, OpenCL was developed (Dec. 2008), which is also compatible with other manufacturers' graphics cards. Unfortunately, several studies found that OpenCL is from 13% to 67% slower than CUDA [9][11][13].

In Figure 5.1 we can see a simplified example of execution using CUDA:

1. the data is copied from the main memory to the GPU memory;

2. the CPU instructs the GPU to launch the compute kernel;

3. the GPU's CUDA cores execute the kernel in parallel;

4. the results are stored in the GPU memory and then copied back to the main memory.

Figure 5.1: Processing flow on CUDA

However, using the GPU is not always a good idea: we introduce parallelism, which is good, but we also introduce latency when moving data from the CPU cache/RAM to the GPU shared memory. The specifications say that threads should run in groups of at least 32 for best performance, with the total number of threads in the thousands. Branches in the program code do not affect performance significantly, provided that each group of 32 threads takes the same execution path. The SIMD¹ execution model becomes a significant limitation for any inherently divergent task. On this basis, small computations may result in a performance backfire.

5.2 Implementation

5.2.1 GPU

As said in the introduction of this chapter, Matlab provides high-level functionalities for making an algorithm use the graphics card cores. In particular, we used the object introduced in Matlab R2010b, gpuArray [2], which is the structure of an array stored inside the graphics card's shared memory. At the moment (v. R2019b) there are 424 operations supported by gpuArray, which means that they are computed by the CUDA cores.

¹SIMD stands for single instruction, multiple data; it is a processor designed to execute the same

In our code we could accelerate the updateCosts function (Listing 2.2); to do that we need to convert several variables from normal arrays to gpuArrays. We only have to declare three variables to propagate the others onto the GPU: the array of the disk costs, the matrix with the number of disks that cover each pixel, and the matrix of the new pixels covered in the greedy cover method. Matlab is then smart enough to automatically convert the other operands into gpuArrays, compute the operations on the GPU and store the results in new gpuArrays.
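A hedged, self-contained sketch of the idea follows: stand-ins for arrays like the ones named above are created directly as gpuArrays, the conv2-based update of Listing 2.2 is then evaluated on the GPU because those operations are among the ones supported for gpuArray inputs, and gather copies a result back to host memory. The toy sizes and random contents are assumptions; the real code converts the existing class attributes instead.

% Toy demonstration of moving the cost update onto the GPU with gpuArray
% (requires the Parallel Computing Toolbox and an NVidia graphics card).
numRows = 256; numCols = 256; r = 5;
diskFilter = double((-r:r)'.^2 + (-r:r).^2 <= r^2);

diskCost = gpuArray(rand(numRows, numCols));                 % stand-ins for the class attributes
numNewPixelsCovered = gpuArray(conv2(ones(numRows, numCols), diskFilter, 'same'));
newPixelsCovered = gpuArray(double(rand(numRows, numCols) > 0.99));

% These operations run on the CUDA cores and their results stay on the GPU.
numPixelsSubtracted = conv2(newPixelsCovered, diskFilter, 'same');
numNewPixelsCovered = numNewPixelsCovered - numPixelsSubtracted;
diskCostPerPixel = diskCost ./ max(eps, numNewPixelsCovered);

diskCostOnCpu = gather(diskCost);                            % copy a result back to host memory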

5.2.2 Parallel execution

In the wake of converting the execution to the GPU, we also tried to use the other prefabricated methods of the parallel toolbox to parallelize the operations of some loops that allow it; one of them was our disk cost update loop. The automatic tool for this is parfor [10]. However, it turned out to be useless due to the constraints imposed by the object-oriented solution.

Parfor is a delicate function: if there is a variable that is not uniquely classified or does not meet the category requirements, it returns an error. The classifications are:

Loop Variables Loop indices;

Sliced Variables Arrays whose segments are operated on by different iterations of the loop;

Broadcast Variables Variables defined before the loop whose value is required inside the loop, but never assigned inside the loop;

Reduction Variables Variables that accumulate a value across iterations of the loop, regardless of iteration order;

Temporary Variables Variables created inside the loop, and not accessed outside the loop.

So, reading or writing a class attribute, which is what nearly all the variables used inside the cost update loop are, results in a violation of the classification and produces an error. The solution may be to define a pool of workers and manually construct a function that runs different cycles in parallel. Unfortunately, this hypothesis was not tested due to lack of time at the end of the project.
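The untested hypothesis could look roughly like the following sketch, which uses the parallel pool and parfeval to run the per-radius convolutions of the cost update as independent tasks; passing plain arrays instead of class attributes sidesteps the parfor classification problem, but whether the extra data copies would pay off was not verified in this work, and the variable names are only illustrative.

% Hypothetical manual parallelization of the per-radius loop of updateCosts (untested sketch).
pool = gcp();                                   % start, or reuse, the default parallel pool
newPixelsCovered = double(rand(256, 256) > 0.99);
scales = 2:8;
filters = arrayfun(@(r) double((-r:r)'.^2 + (-r:r).^2 <= r^2), scales, ...
    'UniformOutput', false);

% One task per radius; only plain arrays are passed, so no class attribute is touched.
futures = cell(1, numel(scales));
for k = 1:numel(scales)
    futures{k} = parfeval(pool, @conv2, 1, newPixelsCovered, filters{k}, 'same');
end

% Collect the per-radius results (fetchOutputs blocks until each task finishes).
numPixelsSubtracted = cell(1, numel(scales));
for k = 1:numel(scales)
    numPixelsSubtracted{k} = fetchOutputs(futures{k});
end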


Chapter 6

Tools of use

Before starting the expansion with the new improvements, we refactored the code. The project initially revolved around a single file containing the AMAT class with all its methods inside it. Besides being hard to understand due to a lack of documentation, the file was too long to retrieve methods from when needed.

We rethought the whole organization using a relatively new feature of Matlab: we moved the class inside a class folder and moved the class methods out into independent files. Also, since AMAT supports covering with disks or squares, we made a module system, creating one class for disks and another for squares. In this way, should a new shape be needed in the future, it will be relatively easy to implement it and link it to the AMAT class.

We adopted a general rule for writing methods, the one most programmers use nowadays: we fragmented the methods into few lines, following the philosophy of "one method does one thing". The methods are also modules of a sort: we wanted to avoid ending up with too many files, so we put the methods needed by the class in independent files, and those methods call smaller methods inside their own file. The scope of those auxiliary methods is to be visible only to the methods inside their file.

All this work really helped to improve the organization and the speed of realization of the whole expansion.

Alongside the development of the new functionalities, we developed some debug tools. The first allows exporting the cover to an animated GIF file: for every new disk selected, it adds to the GIF a new frame with the new status of the selected disks, the cover, the CIELAB of the axes and the radii. We also added the generation of some log files in CSV format. For the neighborhood search we logged the disk selected by the greedy loop and its neighbors with the minimum costs and their respective radii and coordinates. For the pyramid reduction we added a log when converting the cover from one level to another, saving each point's information: coordinates, radius, cost and threshold filter value. In addition to this last log, we also generate a graph, shown in the results (Section 7.2), in which we plot the centers of the previous level's cover, coloring them with a gradient of the percentiles of the filter value, with a scaled radius, over the original image in the background to localize the points.

6.1 Matlab

The original AMAT algorithm was written, as said in the previous chapters, in Matlab. This drove the choice to follow the same path and continue using that programming language.

Matlab's strength is being a mathematical language: it has many instructions already written, for example the 2-D convolution, which can be called with a single instruction. This leads to more efficiency on heavy operations because they are native and written at a lower level, which means lighter and faster code. The downside appears when we try to export the source code as standalone software, which becomes heavy because Matlab includes all these functions.

Alongside the original development, Tsogkas tried to implement the algorithm in C++, but the results were slower than the Matlab solution [22].

6.1.1 Parallel Computing Toolbox

The Parallel Computing Toolbox is a Matlab add-on that provides some high-level parallel functions and integration with the graphics card. We used this add-on in Chapter 5, where we used GPU structures such as gpuArray. The GPU integration, however, works only on NVidia graphics cards, because Matlab needs the NVidia CUDA development kit installed (Section 5.1) as the basis for communicating with the graphics card.

6.2 Tests

Luckily, we had access to computers with NVidia graphics cards. For development and testing we used a Sony laptop with an Intel i7-3520M processor, 8 GB of DDR3 RAM and an overclocked 1 GB NVidia GeForce GT 640M LE. As it was not very fast and its components were somewhat outdated, for the performance tests we used a far more powerful and recent configuration: a computer with an Intel i7-8700 processor, 16 GB of DDR4 RAM and a 6 GB NVidia GTX 1060.

For the final runs, both computers were equipped with Windows 10 Pro 1903, Matlab R2019a (Mar. 20, 2019) and NVidia CUDA 10.1.243 (Aug. 16, 2019).


Chapter 7

Results

After discussing the techniques we applied to AMAT, the moment has come to evaluate their practical usefulness. First we must establish a basis of comparison using the plain algorithm as it is, and then we can start using the three methods. The expectations were quite high; however, we will see that the results do not do justice to all the work done.

As the basis of our tests we used the image of the Google logo. Figure 7.1 was used for demonstrations and we report here the various executions. The algorithm was originally tested on the BSDS500 dataset, and we used a subset of 28 out of 200 photos to test our extensions. For simplicity, we analyze in detail and report the data of the Google logo executions, and then comment on the runs on the dataset, reporting the results of three samples and focusing on time, accuracy and differences noticed in the behavior of the algorithm.

Figure 7.1: Image used as basis for our tests

7.1 Original execution

In order to test the quality of our improvements we need to set a reference, so we executed the unmodified algorithm, without any extensions active. We can see the result in Figure 7.2. The original image has dimensions 512x512 pixels and we applied a cover with radii from 2 to 70. We can make some visual considerations: the skeleton is quite thick, so it leaves some margin for compression, and the reconstruction is almost identical, with some smudging on the very acute angles. Table 7.1 shows the time of one average execution, used as reference. The execution with the original algorithm takes almost 12 minutes with the configuration we used (Section 6.2), and the most expensive function is updateCosts, which is called about 30,000 times. Inside updateCosts, the convolution operation discussed in the earlier chapters is called more than two million times. We can see how simplifying that piece of code would greatly influence the whole execution, since it occupies more than half of the time.

Figure 7.2: Final results of the execution with the original algorithm

Function Name Calls Total Time Self Time

AMAT.compute 2 707.418s 0.010s AMAT.setCover 1 632.441s 236.594s AMAT.update 29398 385.569s 21.035s AMAT.updateCosts 29398 364.534s 364.534s AMAT.compute >setLevelParams 1 48.285s 0.002s AMAT.setCoverParams 1 48.282s 0.004s AMAT.calculateDiskCosts 1 47.145s 0.095s AMAT.calculateDiskCosts >computeCosts 1 47.050s 0.001s Disk.computeCosts 1 47.048s 47.046s AMAT.compute >setMainAttributes 1 24.451s 0.001s AMAT.compute >calculateDepth 1 24.331s 20.696s AMAT.getPointsCovered 58796 13.175s 2.342s Disk.Disk >Disk.getArea 58796 10.833s 10.833s AMAT.computeReconstruction 1 2.231s 2.174s AMAT.computeEncodings 1 1.124s 0.002s Disk.computeEncodings 1 1.002s 0.999s ... ... ... ...

Table 7.1: Profiler for the original execution

Unlike the Google logo execution (Figure 7.2), which contains very well-defined shapes, Figure 7.3, Figure 7.4 and Figure 7.5 show real photos from the BSDS500 dataset. In these cases the trace is fragmented, because there are many tones, but the reconstructions are very good. The input images are all 321x481 pixels (481x321 if horizontal) and are covered with a set of radii from 2 to 40 pixels.

Figure 7.3: Sample execution of the original algorithm

Figure 7.4: Sample execution of the original algorithm

Figure 7.5: Sample execution of the original algorithm

7.2 Pyramid reduction

We had many problems with the pyramid reduction and in the end we could not reach a satisfactory result. Despite the incorrectness of the skeleton, the algorithm does speed up the original, cutting more than half of the time in our demo: five and a half minutes against the original twelve. As we can see in the profiler in Table 7.2, the algorithm performed only about half of the disk cost updates, and we must remember that these calculations were much simpler than the original ones: in the original run we could have convolutions of 70x70 matrices, while in this run they are at most 5x5 at the first level.

However, in the progression of Figure 7.6 we see that from the third level onwards the skeleton starts to become noisy, ending up as a large, noisy skeleton. Our hypothesis is that this problem is due to the blurring of the edges by the Gaussian filter used to build the pyramid. So far there is no real solution on the side of the pyramid construction, because adopting another filter would lower the quality of the level images; the search for a solution therefore shifts to the conversion of the cover from level to level.

We adapted the filter function to be more permissive for small radii (see the equation in Section 3.3); however, the problem is caused by noise that disappears from one level to the next, and covering it correctly would require a disk with a radius larger than those permitted at that level.

In Figure 7.7 we give a visual representation of the percentiles of the filter values, overlaid on the original image (gray-scaled for clarity), when converting the cover to the next level. Blue dots represent values close to zero and the lowest percentile, going up through green, yellow and red, with brown representing the highest values. We can see that the filter works correctly, but we are forced each time to select small disks in order to complete the cover of the image.
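As a hedged sketch of the conversion idea just described (the variable names are illustrative and do not correspond to the actual convertSmallerCover implementation), a disk selected at a coarser level can be mapped to the next, twice-as-large level by doubling its center coordinates and radius; whatever remains uncovered must then be filled with the small disks visible in Figure 7.7.

% Hedged sketch of the level-to-level conversion idea (illustrative names,
% not the actual convertSmallerCover code).
coarseCenters = [10 12; 20 25];     % [row col] of disks selected at level k
coarseRadii   = [3; 4];
fineCenters   = 2 * coarseCenters;  % level k+1 has twice the resolution
fineRadii     = 2 * coarseRadii;    % radii scale with the image
% Pixels of level k+1 not covered by the up-scaled disks (e.g. detail that the
% Gaussian blur removed at level k) still need covering, and only the small
% radii allowed at that level are available for them.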

Function Name                              Calls     Total Time   Self Time
AMAT.compute                               6         330.574s     0.088s
AMAT.convertSmallerCover                   4         203.208s     72.697s
AMAT.compute>setMainAttributes             1         101.337s     0.012s
AMAT.compute>calculateDepth                1         101.175s     85.701s
AMAT.updateCosts                           158144    70.328s      70.328s
AMAT.convertSmallerCover>getBiggerCosts    4         47.838s      0.037s
AMAT.calculateDiskCosts                    9         46.640s      0.110s
AMAT.calculateDiskCosts>computeCosts       9         46.530s      0.005s
Disk.computeCosts                          9         46.525s      46.519s
AMAT.getPointsCovered                      280364    34.498s      6.390s
Disk.Disk>Disk.getArea                     280364    28.109s      28.109s
AMAT.setCover                              5         19.080s      3.864s
AMAT.update                                14571     13.179s      7.492s
AMAT.computeReconstruction                 1         6.456s       6.150s
AMAT.computeEncodings                      9         1.497s       0.008s
Disk.computeEncodings                      9         1.075s       1.069s
...                                        ...       ...          ...

Table 7.2: Profiler for the pyramid reduction execution


(a) 1st level: 32x32px, scales: {3, 4, 5}

(b) 2nd level: 64x64px, scales: {3, 4}

(c) 3rd level: 128x128px, scales: {3, 4}

(d) 4th level: 256x256px, scales: {3, 4}


(f) Final result with reconstruction, original dimension: 512x512px, scales: {2, ..., 70}

Figure 7.6: Execution of the levels in pyramid reduction

Figure 7.7: Percentiles of the filter values at covered disk centers when converting from a smaller level to a bigger one


Similarly to the first execution, we observe the same behavior in Figure 7.8, Figure 7.9 and Figure 7.10. The final reconstructions recall the original images, but they are too blurred, and the skeletons are on average twice as large as in the original execution. In addition to these three samples we ran the same execution on 25 more photos from the BSDS500 dataset. The execution times were between 60% and 65% faster than the plain executions, and the skeleton sizes mirrored the three samples, being roughly twice their plain counterparts.


7.3 Neighborhood search

In Chapter 4 we explained why we chose the structural similarity index over other solutions; however, as we can see in Table 7.3, building a cover by searching in the neighborhood turned out to be 70-80% slower than the classic cover. In our demo we obtained a total of 24396 accepted neighbors (getSelectedNeighbors calls) out of roughly 60000 candidates (ssim calls minus getSelectedNeighbors calls), which means that, as a lower bound, about 40% of the candidates are compatible with one of their adjacent centers.
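As a hedged illustration of the kind of test behind those numbers (the patch coordinates and the acceptance threshold below are assumptions for the example, not the exact coverNeighbors code), a candidate neighbor is compared with the current disk through Matlab's ssim, which internally applies the Gaussian filtering visible in the profiler:

% Hedged sketch of the neighbor compatibility test (patch coordinates and the
% threshold are illustrative assumptions, not the actual coverNeighbors code).
img = im2double(imread('input.png'));      % hypothetical input image (RGB)
refPatch  = img(100:120, 100:120, :);      % region covered by the current disk
candPatch = img(102:122, 101:121, :);      % region covered by a candidate neighbor
score = ssim(candPatch, refPatch);         % structural similarity of the two regions
isCompatible = score > 0.4;                % keep the neighbor only if similar enough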

We can also see that in the demo, at the end of the greedy algorithm, there were only 37 axes. This means that if we were able to perform one large disk cost update only at the end of the search, we would have only 37 calls to that method instead of 24 thousand; each call would be heavier, but with less overlap, and our hypothesis is that this should significantly accelerate the process.

The reconstruction, as we can see in Figure 7.11, is also a little more smudged at the edges than with the original approach. However, in the medial axes and radii sections of the results we noticed a slimmer skeleton, which means that we gained compression while losing some quality. In terms of MAT correctness we theoretically lose the definition of the smaller ramifications, but in practice this has little influence on the overall result. Between the original medial axes demo and this demo there is a compression ratio of 47.5%, going from 29398 pixels composing the skeleton to 13976.


Function Name                                                                      Calls      Total Time   Self Time
AMAT.compute                                                                       2          1210.011s    0.013s
AMAT.setCover                                                                      1          1152.890s    0.323s
AMAT.coverNeighbors                                                                37         1150.810s    4.439s
AMAT.coverNeighbors>getSelectedNeighbors                                           24396      916.232s     18.236s
ssim                                                                               84912      550.465s     12.697s
ssim>@(X)imgaussfilt3(X, radius, 'FilterSize', filtSize, 'Padding', 'replicate')   424560     524.078s     1.847s
imgaussfilt3                                                                       424560     522.232s     7.322s
imgaussfilt3>spatialGaussianFilter                                                 424560     401.230s     10.178s
imfilter                                                                           1273680    385.473s     120.692s
AMAT.coverNeighbors>getCoveredArea                                                 86646      317.230s     294.051s
AMAT.updateCosts                                                                   24396      221.577s     221.577s
AMAT.update                                                                        13976      202.951s     4.236s
imfilter>parse_inputs                                                              1273680    175.989s     64.638s
validatestring                                                                     4245632    126.750s     36.349s
stringToChar                                                                       1783156    97.545s      62.466s
imgaussfilt3>parseInputs                                                           424560     79.331s      46.920s
AMAT.getPointsCovered                                                              211379     59.736s      8.301s
validatestring>checkString                                                         4245632    57.754s      57.754s
Disk.Disk>Disk.getArea                                                             211379     51.435s      51.435s
AMAT.compute>setLevelParams                                                        1          49.779s      0.003s
AMAT.setCoverParams                                                                1          49.775s      0.011s
AMAT.calculateDiskCosts                                                            1          48.263s      0.096s
AMAT.calculateDiskCosts>computeCosts                                               1          48.167s      0.006s
Disk.computeCosts                                                                  1          48.162s      48.160s
...                                                                                ...        ...          ...

Table 7.3: Profiler for the neighborhood search execution


As for the pyramid reduction, we ran the neighborhood search on a subset of 28 photos of the BSDS500. Figure 7.12, Figure 7.13 and Figure 7.14 show a sample of that subset. Even though the mean trace is 24% larger than in the plain execution, we can see in the three samples that the skeletons are more clustered with respect to the original run. However, that clustering is not worth adopting as the main method, because the running time increased to more than eight times the original.

Figure 7.12: Sample execution of neighborhood search method

Figure 7.13: Sample execution of neighborhood search method

Figure 7.14: Sample execution of neighborhood search method

After seeing the behavior of the neighborhood search on different types of images we can draw some conclusions. It is very useful when covering an image with very well-defined boundaries, like a logo, because the resulting skeletons are more compressed. On the other hand, when the image is not well defined, the skeletons come out thicker, but they are still useful: they could feed a clustering system, opening up many possibilities for building a well-defined MAT for ambiguous shapes. Finally, this method has good potential, but it has a fundamental limit in its execution time, which is very slow compared to a plain execution.

7.4 GPU structures

The execution using the GPU was quite surprising, because the timing went in the opposite direction of what we were expecting. The results themselves (Figure 7.15) are almost the same as in the plain execution, as expected.

Things went really badly when we looked at the profiler in Table 7.4: the single cover of that image took more than 8.5 hours to complete. The most expensive method is still the one that updates the costs. Two rows can explain why it was so slow: gpuArray.subsindex and gather. These two functions are involved in reading variables stored in the graphics card memory. Even though they have little direct impact on the execution, accounting for only about 31 seconds together, they are called 293980 times, exactly ten times more than the updates. The impression is that Matlab needs to copy the GPU variables back to host memory, perform some checks, and only then run the operation on the GPU. During the execution we monitored the graphics card resources: the device memory was in constant use without filling up, growing at the beginning when all the allocations happen and then remaining constant, while the GPU cores were active only a few times and for very short durations. In conclusion, the algorithm used the graphics card but did not exploit its full potential, being held back by something internal to Matlab.
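A hedged sketch of the pattern we believe the profiler is exposing follows (generic code, not the AMAT implementation): reading gpuArray elements one at a time from CPU code forces a host/device round trip for each access, while a single vectorized operation keeps the work on the device and transfers only the final result.

% Generic sketch of the overhead pattern (not the AMAT code).
costs = rand(1e6, 1, 'gpuArray');

% Slow pattern: many tiny host<->device round trips, as with subsindex/gather.
total = 0;
for i = 1:1000
    total = total + gather(costs(i));   % each iteration pulls a single scalar off the GPU
end

% Vectorized alternative: one device-side reduction, one transfer at the end.
total = gather(sum(costs(1:1000)));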

We didn’t test the execution on the BSDS500 dataset because the cover on the logo is simpler than any of the photos provided and it spent almost half day for a run. That means that on a photo it could spend more than 12 hours and even with a execution we confuted our hypothesis of the benefits of execute the algorithm using the graphic card with Matlab.


Function Name                           Calls     Total Time    Self Time
AMAT.compute                            2         31217.156s    0.009s
AMAT.setCover                           1         31153.206s    67.420s
AMAT.update                             29398     31027.489s    21.533s
AMAT.updateCosts                        29398     30981.746s    30981.746s
AMAT.getPointsCovered                   58796     57.352s       1.849s
Disk.Disk>Disk.getArea                  58796     55.503s       55.503s
AMAT.compute>setLevelParams             1         48.842s       0.001s
AMAT.setCoverParams                     1         48.841s       0.001s
AMAT.calculateDiskCosts                 1         47.711s       0.157s
AMAT.calculateDiskCosts>computeCosts    1         47.554s       0.000s
Disk.computeCosts                       1         47.554s       47.550s
gpuArray.subsindex                      293980    28.620s       26.070s
AMAT.compute>setMainAttributes          1         12.967s       0.001s
AMAT.compute>calculateDepth             1         12.843s       9.319s
gather                                  293980    2.550s        2.550s
AMAT.computeReconstruction              1         2.132s        2.076s
AMAT.computeEncodings                   1         1.122s        0.001s
Disk.computeEncodings                   1         0.997s        0.994s
...                                     ...       ...           ...

Table 7.4: Profiler for the GPU structures execution


Chapter 8

Conclusion

From this project we can appreciate the vastness of the computer vision world. The original AMAT algorithm works well, but we saw that there are so many possible evolutions that there is plenty of choice, like opening a Pandora's box: fragmenting the computation, searching for solutions differently, computing on different hardware, changing the cover object, running the computation concurrently, and so on.

Failure is part of the process, as we saw in the results. We can certainly look at the pyramid reduction results in different ways: one that looks at the reconstruction, which is satisfactory; one that looks at the skeleton, which is completely wrong; and finally one that looks at a new opportunity, namely using the pyramid reduction to build approximate skeletons with a limited range of large radii, without performing conversions at the beginning, before starting the covering process.

This argument has its validity: the quality of a result is given by the degree to which it satisfies its purpose. Nonetheless, the objective truth about the pyramid reduction algorithm is that it needs more development, because it does not produce a nearly optimal skeleton, which is the main objective of the Appearance Medial Axis Transform. We think the solution for a satisfactory result is very close to the one proposed in this thesis, and therefore our algorithm should be used as a basis for future work until someone arrives at an optimal result.

In the neighborhood search we saw its potential: by sacrificing some reconstruction detail we obtain slimmer skeletons, which is good; however, we did not accelerate the algorithm, so it is only half a victory. Throughout the project we struggled to find simple methods that could improve the execution time, realizing how close to the bone the original algorithm already was.

We saw from the results and from the introduction to AMAT how everything revolves around the update of the costs when a new covering disk is found, and in particular around the convolution operation. Future work should try to find a generalization, a way to speed it up, or a simpler method that could replace it partially or even totally. With the pyramid reduction the method is faster because it computes only small radii; on the other hand, with the neighborhood search it should be possible to make only one update at the end of the whole path built with the structural similarity index. It would be a larger operation, but a more compact one, without updating the costs after every selected disk.
