
Autonomous obstacles map creation for UAVs in unknown environments using Visual-SLAM



DIPARTIMENTO DI INGEGNERIA DELL’INFORMAZIONE
CORSO DI LAUREA MAGISTRALE IN INGEGNERIA ROBOTICA E DELL’AUTOMAZIONE

Master's thesis

Autonomous obstacles map creation for UAVs in unknown environments using Visual-SLAM

Supervisors: Prof. Mario Innocenti, Prof. Lorenzo Pollini, Ing. Andrea Bracci

Candidate: Giulia Carmignoli


Autonomous unmanned vehicles play an important role in the development of robotic devices that assist humans in everyday life and beyond. In particular, UAVs (Unmanned Aerial Vehicles) are taking hold in different fields. One of these is the exploration of unknown environments, where the vehicle must localize itself and make path-planning decisions. This problem is widely dealt with in the literature, but many solutions give overly unpolished final results, or provide an obstacles map that cannot be used to plan trajectories on it.

We therefore present an algorithm that is able to generate a suitable discretized 2D levels map starting from camera image acquisitions, exploiting SLAM algorithms.


Contents

Summary
1 Introduction
  1.1 Work context
  1.2 Problem definition
  1.3 Document structure
2 State of the art
3 Global obstacles map
  3.1 SLAM algorithm: input/output definition
  3.2 Map generation: input/output definition
  3.3 Developed algorithm
    3.3.1 Disparity map creation
    3.3.2 Point Cloud generation
    3.3.3 Point Cloud filtering
    3.3.4 3D clustering
    3.3.5 Polyhedral mesh
    3.3.6 Environment discretization in altitude levels
    3.3.7 2D map level creation
    3.3.8 Loop detection
4 Results
  4.1 From outdoor data-set
  4.2 From first indoor data-set
  4.3 From second indoor data-set
5 Conclusions
  5.1 Future Works


Chapter 1

Introduction

1.1 Work context

Unmanned Aerial Vehicles (UAVs) are part of a research area which is growing steadily. They are used in military and civilian domains for surveillance, reconnaissance, mapping, cartography, inspection, homeland security, search and rescue, fire detection, agricultural imaging, and traffic monitoring, to name just a few application domains. The term Unmanned Aerial Vehicle (also known as drone) refers to a pilotless aircraft, a flying machine without an on-board human pilot. As such, 'unmanned' refers to the total absence of a human who directs and actively pilots the aircraft. The control function may be either on-board or off-board (remote control).

The earliest recorded use of a UAV dates back to 1849, when the Austrians attacked the Italian city of Venice using unmanned balloons loaded with explosives, although balloons would not be considered UAVs today. In 1915, the British military used aerial photography to their advantage in the Battle of Neuve Chapelle: they were able to capture more than 1500 sky-view maps of the German trench fortifications in the region.

The first pilotless aircraft were created during the First World War, in 1916. Those early UAVs were very unreliable and inaccurate, and only a handful of people recognized at that initial stage their potential and future impact on society. During the Second World War, Reginald Denny created the first remote-controlled aircraft, called the Radioplane OQ-2. This was the first mass-produced UAV product in the U.S. and was a breakthrough in manufacturing and supplying drones for the military. More recently, in 1990 miniature and micro UAVs were introduced, and in 2000 the U.S. deployed the Predator drone in Afghanistan while searching for Osama Bin Laden.

Although many of the most notable drone flights have been for military purposes, the technology continues to advance and receive more attention. Nowadays the demand is still considerably high for defense and military applications, but commercial applications are poised to gain popularity. For example, the retailing and e-commerce industries are working on making deliveries in the shortest possible time using GPS-programmed drones. In 2015, Amazon Inc. recommended to the concerned authorities the allocation of a dedicated airspace for the operation of these advanced delivery UAVs to deliver goods to customers.

As quoted in the article "Commercial UAV Market Analysis...", the global commercial UAV market size was valued at USD 500 million in 2014 and is expected to grow at a CAGR of approximately 17% over the forecast period (until 2022).

The exploration of unknown environments by an autonomous mobile robot is helpful to perform search and rescue missions in places too remote or dangerous, unreachable or potentially harmful to humans. Some examples of such locations are a calamity-hit area, inhospitable mountainous regions, or battlefields. In all those scenarios it would be of great value to send an autonomous mobile robot instead of people.


1.2 Problem definition

The aim of our work is to provide the UAV with the autonomous capability of generating an obstacles map while moving within an unknown environment, while at the same time it localizes itself and navigates, making decisions based on the newly received environment data.

The working scenario is an unknown environment, where the UAV has to navigate from a starting GPS coordinate to a target one. Taking into account the possibility of navigating in an indoor environment, where the GPS signal is lost, the UAV is equipped with a stereo camera and the localization of the vehicle uses only image acquisitions as data input.

This algorithm does not consist of a simple obstacle avoidance problem. Indeed, the vehicle has to recognize its own position with respect to a starting point, assumed as the origin of the global coordinate system. Along the path, the algorithm generates the obstacles map and is able to detect whether the UAV comes back to a point where it has already been, so it can perform a sort of loop detection.

This work merges two important aspects of research, that is, localization and planning. In particular we focus on the generation of a suitable 3D map discretized in altitude, starting from stereo camera images. The map can be used in real time as input for a 2D path planning algorithm.


1.3 Document structure

After this brief introduction, Chapter 2 describes the state of the art about localization and mapping problems, comparing different existing methods. The main contribution of this work is detailed in Chapter 3, where, after a short introduction on the SLAM technique used as localization method, it is described how camera acquisitions are processed and transformed into a global obstacles map, which is updated in real time. Chapter 4 consists of a report of some results obtained from indoor and outdoor data-sets. Finally, Chapter 5 reports the conclusions of the work and some hints for possible future works.


Chapter 2

State of the art

Much research has been conducted to make UAVs able to explore without any prior knowledge of the environment they probe, increasing the estimation accuracy and reducing the error that occurs during exploration. Several techniques have been introduced in search of a suitable one to solve the challenges faced by autonomous mobile robots. Normally, the techniques used focus on only one of the robot's functionalities, that is, either the mapping or the localization process.

There are several methods for localization, but a key factor in the problem's complexity depends on the target environment of operation, which in our specific case is unknown. The most prevalent navigation method used for unmanned tasks today is GPS (Global Positioning System). GPS requires an unobstructed line of sight from the position of the object/robot to four or more GPS satellites. Hence, it is available and usable only when there is a clear sky. It fails to operate where it is impossible or difficult to receive satellite signals, and even when it is present, its accuracy is insufficient for precise unmanned navigation and mapping. An alternative technique, especially useful in such cases, is LIght Detection And Ranging (LIDAR): it is used to efficiently create digital terrain models with high accuracy.


Other alternatives consist of localization using odometry [2], which has inherent flaws such as errors caused by wheel slippage on ground-based vehicles. Platforms like Unmanned Aerial Vehicles (UAVs) have built-in odometry capabilities that can be affected by drift. Landmark-based localization is an ideal supplement to odometry and GPS: recognition of landmarks such as tags or terrain using cameras can provide localization data [3]. Laser scanners, sonar and other sensors can be mounted on board the vehicle, as explained in [4]. But our aim is to develop a passive solution available for a potentially wide range of different situations, including indoor and outdoor environments. Simultaneous Localization And Mapping (SLAM) is a technique applied in artificial-intelligence mobile robots for self-exploration in numerous geographical environments. SLAM is concerned with the problem of building a map of an unknown environment with a mobile robot while at the same time navigating the environment using the map. The use of SLAM can be motivated in two different ways: one might be interested in detailed environment models, or one might seek to maintain an accurate sense of a mobile robot's location. It was originally developed by Hugh Durrant-Whyte and John J. Leonard [5] based on earlier work by Smith, Self and Cheeseman [6] that introduced the concept of implementing the estimation of spatial uncertainty. Durrant-Whyte and Leonard originally termed it SMAL, but it was later changed to give a better impact. Their algorithm implements an Extended Kalman Filter to solve the SLAM problem, using a probabilistic method to limit the impact of inaccurate sensor readings on the accuracy of the resulting map [7]. In 2001, [8] proposed a substantial implementation of the SLAM algorithm on a vehicle operating in an outdoor environment using millimeter-wave (MMW) radar to provide relative map observations.

In 2002, M. Montemerlo introduced the FastSLAM algorithm [9]. This is a hybrid approach which integrates the Particle Filter and Extended Kalman Filter approaches; such an implementation makes FastSLAM popular for its higher data accuracy. A year later, FastSLAM evolved into a second version called FastSLAM 2.0 [10]. The modification made in FastSLAM 2.0 is that the proposal distribution relies on both the previous pose estimate and the actual measurement of the mobile robot, while the first version of FastSLAM relied only on the previous pose estimate. Most of the introduced SLAM algorithms address single-robot mapping and deal with static environments [11].

SLAM consists of multiple parts: landmark extraction, data association, state estimation, and state and landmark update. There are many ways to solve each of these smaller parts. Figure 2.1 shows the basic concept of how an autonomous mobile robot works.

Figure 2.1: General steps performed by an autonomous mobile robot.

In order to have a clearer framework of the existing solutions, a possible classification of SLAM algorithms can be made based on the method used to extract landmarks and on the sensors used to acquire images. Our input sensor is a camera, so we focus on methods with this type of data; they are substantially divided into monocular or stereo acquisitions. About landmark extraction and place recognition there are essentially two different approaches. The first one consists of direct methods, which optimize directly over pixel intensities instead of feature reprojection errors; the second one is feature-based methods. We can cite a monocular direct method [12], and other stereo direct ones [23], [13], [14]. Recent real-time monocular SLAM algorithms such as DTAM [24] and LSD-SLAM [12] are able to perform dense or semi-dense reconstruction of the environment, while the camera is localized by optimizing directly over image pixel intensities. These direct approaches do not need feature extraction and thus avoid the corresponding artifacts. They are also more robust to blur, low-textured environments and high-frequency texture like asphalt [25]. Their denser reconstructions, as compared with the sparse point maps of other systems or PTAM, could be more useful for other tasks than just camera localization. However, direct methods have their own limitations. First, these methods assume a surface reflectance model that in real scenes produces its own artifacts. The photometric consistency limits the baseline of the matches, typically narrower than those that features allow. This has a great impact on reconstruction accuracy, which requires wide-baseline observations to reduce depth uncertainty. Direct methods, if not correctly modeled, are quite affected by rolling-shutter, auto-gain, and auto-exposure artifacts (as in the TUM RGB-D Benchmark). Finally, because direct methods are, in general, very computationally demanding, the map is just incrementally expanded as in DTAM, or map optimization is reduced to a pose graph, discarding all sensor measurements as in LSD-SLAM.

In contrast, feature-based methods are able to match features with a wide baseline, thanks to their good invariance to viewpoint and illumination changes. In the context of structure and motion estimation, Torr and Zisserman [26] already pointed out the benefits of feature extraction. Among the methods based on monocular acquisitions and feature-based approaches we can cite [15], which demonstrates real-time performance on a large-scale outdoor environment, or [16], [17]; [18] presents a system which automatically detects and recovers from tracking failure while preserving map integrity. Many others are based on stereo acquisitions and feature extraction, such as [19] and [20], where by means of a dense scene flow representation moving objects can be detected; [21] allows navigating in large environments without restrictions using a cheap wide-angle stereo camera.

Another design choice concerns which features to extract from the images. Popular descriptors are SIFT, Scale-Invariant Feature Transform [27], SURF, Speeded Up Robust Features [28], or the recent A-KAZE, Accelerated-KAZE (from the Japanese word meaning "wind") [29]. Our goal is to find features that need the shortest possible time to be extracted. To obtain general place recognition capabilities, we require rotation invariance, which excludes BRIEF, Binary Robust Independent Elementary Features [30], and LDB, Local Difference Binary [31]. Another descriptor is ORB [32], Oriented FAST and Rotated BRIEF: it is basically a fusion of the FAST keypoint detector and the BRIEF descriptor. ORB features are extremely fast to compute and match, while they have good invariance to viewpoint. The good performance of ORB for place recognition is shown in [34].

In order to improve and optimize the mobile robot's performance in terms of landmark and robot location estimation, it is appropriate to introduce the Bundle Adjustment (BA) algorithm [35]. BA is known to provide accurate estimates of camera localizations as well as a sparse geometrical reconstruction [36], [37], given that a strong network of matches and good initial guesses are provided. It is a process which iteratively adjusts the poses of the cameras as well as the positions of the points in order to obtain the optimal least-squares solution. For a long time, this approach was considered unaffordable for real-time applications such as visual SLAM. Now we know that, to achieve accurate results at a non-prohibitive computational cost, a real-time SLAM algorithm must employ BA. The first real-time application of BA was the visual odometry work of Mouragnon et al. [38], followed by the ground-breaking SLAM work of Klein and Murray [39], known as Parallel Tracking and Mapping (PTAM). This algorithm, while limited to small-scale operation, provides simple but effective methods for keyframe selection, feature matching, point triangulation, camera localization for every frame, and relocalization after tracking failure. Unfortunately, several factors severely limit its application: lack of loop closing and adequate handling of occlusions, low invariance to viewpoint of the relocalization, and the need for human intervention for map bootstrapping.

As explained in [38], the problem of reconstruction consists of finding the 3D model of the environment by using only the recorded data. The problem often takes the SFM denomination, for Structure From Motion, which has been the subject of many works in vision. There are several approaches to SFM algorithms. The methods without global optimization of the full geometry are fast, but their accuracy is questionable, since errors accumulate over time. Among those works on vision-based SLAM, Nister [40] presents a method called "Visual Odometry". This method estimates the movement of a stereo head or a single camera in real time from visual data only. The SFM approach builds 3D models from large image collections or videos [41]. The SFM approach is unsuitable for our problem, considering the short available computing time and the aim of real-time mapping. The best solution consists of employing a SLAM algorithm, where the process of mapping and localization is done concurrently and recursively. Among the different methods found in the literature, the one which offers the best performance, tested on recorded benchmark data, is ORB-SLAM. We used the SLAM presented in [33], which is a feature-based method that uses the ORB descriptor and stereo camera acquisitions. We preferred to use a stereo set instead of a simple monocular camera because it prevents the onset of problems connected with depth estimation. Indeed, depth is not observable from just one camera, so the scale of the map and of the estimated trajectory is unknown. In addition, system bootstrapping requires multi-view or filtering techniques to produce an initial map, as it cannot be triangulated from the very first frame. Last but not least, monocular SLAM suffers from scale drift and may fail if performing pure rotations during exploration. By using a stereo or an RGB-D camera all these issues are solved, allowing for the most reliable visual SLAM solutions.

Our target is to navigate in an unknown environment performing collision avoidance, so an overly detailed reproduction of the scene is not required. This means that we are not interested in a painstaking reproduction of the objects seen by the UAV. The developed algorithm is designed to be precautionary, while ensuring a faithful representation of the shapes of the obstacles.

Methods for 3D reconstruction abound in the literature. Reconstruction from single images requires geometrical constraints on the 3D structure of the observed objects. For example, the approach described in [42] is based on three types of constraints: coplanar points, perpendicularity of directions or planes, and parallelism of directions or planes. In other approaches, moving obstacles are detected through ray-casting on a spherical voxelization of point clouds [43], but this is too approximate for our purposes. This work proposes a solution to generate an obstacles map: while the UAV moves in the unknown environment, the global altitude-levels quantized map is updated in real time, adding to it the respective 2D contours of the 3D obstacles seen by the stereo camera.


Chapter 3

Global obstacles map

3.1 SLAM algorithm: input/output definition

Our vehicle navigates in an unknown environment and needs to localize itself. Among the existing techniques in the literature that deal with the localization problem, we have chosen to employ a SLAM algorithm, in particular the solution developed by Raúl Mur-Artal and Juan D. Tardós, ORB-SLAM2 [33]. We made this choice relying on the good performance obtained on some benchmarks. The input of the algorithm is a pair of stereo images, and its output is the estimation of the camera pose and a sparse point cloud of the environment.

ORB-SLAM2, as a feature-based method, pre-processes the input to extract features at salient keypoint locations, as shown in Figure 3.1.

Figure 3.1: Input pre-processing: ORB features are extracted from the rectified left and right images and matched to produce stereo keypoints.


The input images are then discarded and all system operations are based on these features, so that the system is independent of the sensor being stereo or RGB-D. The system handles stereo keypoints, which are further classified as close or far. Stereo keypoints are defined by three coordinates x_s = (u_L, v_L, u_R), where (u_L, v_L) are the coordinates on the left image and u_R is the horizontal coordinate in the right image. The algorithm extracts ORB features in both images and, for every left ORB, searches for a match in the right image. This can be done very efficiently assuming stereo-rectified images, so that epipolar lines are horizontal. Then the algorithm generates the stereo keypoint with the coordinates of the left ORB and the horizontal coordinate of the right match, which is sub-pixel refined by patch correlation. A stereo keypoint is classified as close if its associated depth is less than 40 times the stereo baseline, as suggested in [44]; otherwise it is classified as far. Close keypoints can be safely triangulated from one frame, as their depth is accurately estimated, and they provide scale, translation and rotation information. On the other hand, far points provide accurate rotation information but weaker scale and translation information. The algorithm triangulates far points when they are supported by multiple views.
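As a concrete illustration of the close/far distinction, the following minimal sketch (not part of ORB-SLAM2 itself) back-projects a stereo keypoint using assumed calibration parameters fx, fy, cx, cy and the stereo baseline:

```python
import numpy as np

def classify_stereo_keypoint(uL, vL, uR, fx, fy, cx, cy, baseline):
    """Back-project a stereo keypoint (uL, vL, uR) and label it close or far.

    Rectified images are assumed; the keypoint is 'close' when its depth is
    below 40 times the stereo baseline, as stated above.
    """
    disparity = uL - uR                  # horizontal disparity in pixels
    z = fx * baseline / disparity        # depth from the rectified stereo model
    x = (uL - cx) * z / fx               # back-projection into the camera frame
    y = (vL - cy) * z / fy
    return np.array([x, y, z]), z < 40.0 * baseline
```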

After this pre-processing phase the SLAM starts its localization work, following the flowchart steps shown in Figure 3.2. We try to briefly explain them in the next paragraph.


Figure 3.2: Flowchart of the main ORB-SLAM2 steps: stereo image acquisition and pre-processing, map initialization, pose prediction and motion-only BA (or relocalization through the place recognition database), local map tracking, keyframe insertion with new point creation and local BA, and, on loop detection, similarity transform computation, loop fusion, pose graph optimization and full BA.


One of the main benefits of using stereo or RGB-D cameras is that, by having depth information from just one frame, we do not need a specific structure from motion initialization as in the monocular case. At system start-up we create a keyframe with the first frame, set its pose to the origin, and create an initial map from all stereo keypoints.

When a new pair of stereo images is acquired, the algorithm first performs an initial feature matching with the previous frame and optimizes the pose using motion-only BA (explained in the Appendix of [45]). If the tracking is lost (e.g., due to occlusions or abrupt movements), the Place Recognition module, based on DBoW2 [46], is used to perform a global relocalization. Once there is an initial estimation of the camera pose and of the feature matches, a local visible map is retrieved using the covisibility graph [47] of keyframes that is maintained by the system. Then, matches with the local map points are searched by reprojection, and the camera pose is optimized again with all matches. Finally, it is decided whether a new keyframe is inserted.

ORB-SLAM2 follows the policy introduced in monocular ORB-SLAM [45] of inserting keyframes very often and culling redundant ones afterwards. To insert a new keyframe, all the following conditions must be met:

1. More than 20 frames have passed since the last global relocalization.

2. More than 20 frames have passed since the last keyframe insertion.

3. The current frame tracks at least 50 points.

4. The current frame tracks less than 90% of the points tracked by the reference keyframe K_ref.

The distinction between close and far stereo points allows us to introduce a new condition for keyframe insertion, which can be critical in challenging environments where a big part of the scene is far from the stereo sensor. In such environments we need a sufficient amount of close points to accurately estimate translation; therefore, if the number of tracked close points drops below τ_t and the frame could create at least τ_c new close stereo points, the system inserts a new keyframe. The developers empirically found that τ_t = 100 and τ_c = 70 work well in all their experiments.
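A minimal sketch of this decision rule, reading the listed conditions literally and treating the close-point test as an additional trigger (the function and parameter names are ours, not ORB-SLAM2's), could look as follows:

```python
def should_insert_keyframe(frames_since_reloc, frames_since_kf,
                           tracked_points, ref_tracked_points,
                           tracked_close, new_close_candidates,
                           tau_t=100, tau_c=70):
    """Keyframe insertion decision, as described in the text above."""
    base_conditions = (
        frames_since_reloc > 20                        # 1. far enough from the last relocalization
        and frames_since_kf > 20                       # 2. far enough from the last keyframe
        and tracked_points >= 50                       # 3. minimum number of tracked points
        and tracked_points < 0.9 * ref_tracked_points  # 4. enough visual change w.r.t. K_ref
    )
    # Stereo-specific trigger: too few close points tracked, but enough
    # close candidates available to create new ones.
    close_trigger = tracked_close < tau_t and new_close_candidates >= tau_c
    return base_conditions or close_trigger
```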

The new keyframes are processed and a local BA is performed to achieve an optimal reconstruction in the surroundings of the camera pose. New correspondences for unmatched ORB in the new keyframe are searched in connected keyframes in the covisibility graph to triangulate new points.

Every time a new keyframe is added, the algorithm searches for loops to verify whether a loop closure has occurred. If a loop is detected, it computes a similarity transformation that informs about the drift accumulated in the loop. Then, both sides of the loop are aligned and duplicated points are fused. Finally, a pose graph optimization over similarity constraints [48] is performed to achieve global consistency, and a full BA is executed to compute the optimal structure.

For completeness, we report that the implementation strategy used by the authors consists of working with four different threads: Tracking, Local Mapping, Loop Closing and Full BA.


3.2 Map generation: input/output definition

From an input/output point of view, the map generation algorithm can be defined as follows. The inputs are a pair of rectified stereo images and the camera pose at the instant of their acquisition; the pose is extracted from the keyframe's information, previously computed by the SLAM. The output is the 3D quantized-altitude map, which consists of the reproduction of the obstacles' contours on multiple levels.


Figure 3.3: Scheme of Map generation algorithm.

Figure 3.3 shows the main algorithm blocks, explained in detail in the following sections. It should be stressed that not all the acquired images are processed to extract information on the environment, but only those associated with a keyframe. This means that the obstacles map is updated at a frame rate lower than the image acquisition rate. From the stereo pair the disparity map is extracted and used to find the associated point cloud. The point cloud needs to be filtered, to obtain one with a lower number of outliers. The next steps consist of the segmentation of the point cloud and the generation of the 3D meshes from the resulting single clusters. Finally, the discretization of the environment's altitude is performed, with the aim of generating multiple 2D level maps with the contours of the mapped obstacles.


3.3 Developed algorithm

3.3.1 Disparity map creation

In order to reconstruct the environment surrounding the UAV, we need to extract depth information about the obstacles starting from the images, Figure 3.4. In our case we can do this simply by computing the disparity map from the stereo image pair, Figure 3.5.

Figure 3.4: Left side: Left acquired image. Right side: Right acquired image.


Working with rectified images, the disparity search consists of opening a window of pixels on the right image along the epipolar line of the left image pixel, and searching inside it for the pixel that best matches the left one. In particular, the candidate is chosen only if the parameter called Uniqueness Ratio is higher than a threshold: the Uniqueness Ratio is the margin, in percentage, by which the best (minimum) computed cost function value should "win" over the second best value for the found match to be considered correct. Depending on the environment that the UAV has to explore, for example an indoor or an outdoor one, we may be interested in mapping obstacles at different depths from the camera, so the disparity map will also have different features depending on the explored scenario. For this reason the user can set three parameters to better configure the framework: the minimum and the maximum depth of interest, and a scale factor used to reduce the resolution of the input images from which the algorithm computes the disparity map. As is known, there is a correspondence between the disparity (D) and the depth (Z) of a pixel: setting the minimum and maximum desired depth consequently fixes the corresponding disparity bounds, since they are related by the baseline (B) and the focal length (F) according to the following relationship:

Z = BF / D

It is obvious that the baseline and the focal length affect the maximum estimable depth, so the user-settable parameter should be congruent with them. Indeed, in the extreme case where the disparity between two pixels is one, the estimated depth is equal to their product BF, from the previous expression.

We can observe that with decreasing disparity, i.e. when estimating the pose of far obstacles, the disparity uncertainty (or estimation error) has a higher weight on the depth measurement. From the previous relation we can extract how the depth changes as a function of a disparity change, which is simply the derivative:

dZ/dD = -BF / D²

When we work with small values of disparity, i.e. when estimating the depth of far objects, a little variation (dD) of disparity produces a big variation of depth. In other words, the depth of an object near the UAV is affected by the same disparity estimation error as that of a far one, but the weight of that error on the two measurements is different. So, to limit the uncertainty effect as much as possible, we fix a lower threshold on the minimum computed disparity.

The effect of different settings of the maximum depth parameter, and thus of the minimum acceptable disparity, can be better appreciated on the point cloud, whose generation is explained in the next section.
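As a hedged illustration of how such a configuration could be set up with OpenCV's semi-global block matcher (a sketch with assumed parameter values, not the thesis's exact configuration), the disparity bounds can be derived from the chosen depth range through Z = BF/D:

```python
import cv2

# Assumed camera and user parameters, for illustration only.
B, F = 0.54, 718.0           # baseline [m] and focal length [px]
z_min, z_max = 2.0, 30.0     # user-set depth range of interest [m]

# From Z = B*F/D: far objects give small disparities, near objects large ones.
d_min = int(B * F / z_max)   # minimum acceptable disparity
d_max = int(B * F / z_min)   # maximum expected disparity
num_disp = ((d_max - d_min + 15) // 16) * 16   # SGBM wants a multiple of 16

stereo = cv2.StereoSGBM_create(
    minDisparity=d_min,
    numDisparities=num_disp,
    blockSize=7,
    uniquenessRatio=10,      # margin by which the best match must win
)

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
disparity = stereo.compute(left, right).astype("float32") / 16.0  # SGBM output is fixed-point x16
```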


3.3.2 Point Cloud generation

The SLAM algorithm works by means of feature extraction from the environment's keypoints. These keypoints are represented in a cloud, but since they belong to the environment, their number is limited and related to the acquired scene. Indeed, if we set the number of features to be extracted at too high a threshold, we risk extracting environment noise instead of real keypoints. So we cannot use the sparse point cloud to generate the desired obstacles map, but we need to create a dense one. Figure 3.6 shows a comparison between the sparse and the dense point cloud of a detected object.

Figure 3.6: Left side: sparse point cloud from SLAM. Right side: dense point cloud from our algorithm. Yellow dashed lines: drawn lines to better clarify a correspondence between the two images. Green: Camera tracking. Blue: previously assumed camera pose.


The point cloud generation works according to the following algorithmic scheme, Figure 3.7.


Figure 3.7: Flowchart of Point Cloud generation algorithm.

The information contained in the disparity map must be reprojected into 3D space. To do that we need the perspective transformation matrix, named Q, obtained from the camera calibration parameters used to rectify the images; to better understand how it is composed we refer to [51]. The pixel's information, i.e. its position x, y in the image and its disparity d, is used to generate a 3D point in space, having X, Y, Z coordinates and a scale factor W, using the following transformation:

(X, Y, Z, W)ᵀ = Q · (x, y, d, 1)ᵀ

To know the position of the point in 3D space we divide its coordinates X, Y, Z by the scale factor W.

For every pixel of the disparity map we check whether its disparity value belongs to the fixed admissible interval, whose bounds were previously computed according to the maximum and minimum desired map depth. To better understand the necessity of a fixed minimum disparity threshold, as explained in the previous subsection, we show two point clouds, Figures 3.8 and 3.9. The first one is computed with a higher maximum depth than the second one, so the disparity threshold is lower in the first case. Indeed, we can note that in Figure 3.8 there are more deep points, but moving away from the camera center the density of the cloud becomes increasingly sparse, losing information about the environment. This highlights that it is ill-judged to acquire at too great depths.


Figure 3.9: Example of a point cloud, acquired at low depth.

Figure 3.10 shows the two point clouds in the same picture, to illustrate the different acquisition depths.


If the pixel passes the first check, a second one is performed on it: we verify whether its "altitude" in the 3D local space, i.e. its Y coordinate, is included in a chosen interval around the image center. This allows us to generate a point cloud relative to a stripe of the whole acquired image, Figure 3.11, Figure 3.12.

Figure 3.11: Point cloud stripe.

Figure 3.12: Two point clouds. Green: entire point cloud. Pink: selected stripe of the entire cloud.

This is done for substantially two reasons. The first one is that the quality of the acquired images is usually better in the middle and degrades moving toward the borders, so by selecting a central stripe of the image it is more probable that we work with higher-quality information. The second one is related to the size of the data package we have to store and manage: in this way we collect a lower number of pixels, without losing the information necessary for the environment reconstruction. The choice of the stripe size is connected with the dimensions of the UAV. In fact, the discarded pixels do not harm the navigation as long as the vehicle keeps enough information around its own size; decreasing the dimension of the stripe too much would lead to blindness of the UAV.
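A minimal sketch of this point cloud generation step (the function and parameter names are ours; it assumes OpenCV's reprojection with the rectification matrix Q and a stripe limit on the local Y axis) is shown below:

```python
import cv2
import numpy as np

def build_point_cloud(disparity, Q, d_min, d_max, y_half_stripe):
    """Reproject a disparity map and keep only the points that pass the
    disparity check and the altitude (stripe) check described above.

    disparity: float32 disparity map, Q: 4x4 reprojection matrix from the
    stereo rectification, d_min/d_max: admissible disparity interval,
    y_half_stripe: half height [m] of the stripe around the image center.
    """
    points = cv2.reprojectImageTo3D(disparity, Q)         # HxWx3, local camera frame
    disp_ok = (disparity > d_min) & (disparity < d_max)   # disparity check
    y_ok = np.abs(points[..., 1]) < y_half_stripe         # altitude check (local Y)
    mask = disp_ok & y_ok & np.isfinite(points).all(axis=2)
    return points[mask]                                    # Nx3 local point cloud
```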

A different solution could be adopted for the choice of which points to discard: we could check whether the 3D global point belongs to a stripe of plus or minus half a meter with respect to the global UAV pose, instead of the local check we perform. But this approach brings two disadvantages. The first one is that the point can be generated by a border image pixel, so it carries less precise information than one taken in the central region of the image. The second one is connected with the motion of the UAV: it can pitch while standing still, acquiring images of the environment at an altitude different from the UAV's altitude. In this case, if the entire acquired scene is situated at a global altitude outside the desired stripe, we completely lose the images' information.

We recall that the generated point cloud is local. This means that the 3D points' poses are relative to the camera center, and not to the origin of the global coordinate system, which corresponds to the camera pose at the beginning of the navigation. Indeed, if we overlap some local point clouds relative to different keyframes, Figure 3.13, we see that the scene is not reconstructed, because they all need to be converted into the same reference system in order to show the same element in the same position even if it was acquired from different camera positions.


Figure 3.13: Two different local point clouds (green, blue).

The choice of keeping local information of the acquired environment instead of global information is due to the fact that, while the vehicle navigates, the previously estimated poses are iteratively adjusted. So, by storing local information, we can convert the cloud from local to global whenever we want.


3.3.3 Point Cloud filtering

In order to obtain a point cloud as "clean" as possible, trying to delete the acquired noise and outliers, we filter it. The filtering methodology is substantially an outlier removal algorithm. It is based on the analysis of the number of neighbors contained inside a sphere centered on each point of the cloud. The algorithm iterates through the entire input point cloud and, for each point, retrieves the number of neighbors within a certain radius. The point is considered an outlier if it has too few neighbors, as determined by a user-set threshold. We chose the radius of the sphere depending on the point cloud's resolution; in this way the filtering is strictly connected with the cloud's features and provides the best results. A tuning phase on the settable parameters is fundamental to obtain good results: the aim is to find a trade-off between keeping too many outliers and discarding useful points as well.

The cloud resolution is the mean distance between a point and its nearest neighbor inside the cloud. To extract this measure, for each point the distance between it and its nearest neighbor is taken; summing all the distances and dividing the result by the number of summed elements, we obtain the cloud resolution. Figures 3.14 and 3.15 show two examples of filtered point clouds, placing the original cloud and the filtered one in the same image.
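A compact sketch of this radius-based outlier removal, with the search radius tied to the cloud resolution as described (the function name and the default thresholds are our own assumptions), could be:

```python
import numpy as np
from scipy.spatial import cKDTree

def radius_outlier_removal(cloud, min_neighbors=5, radius_factor=3.0):
    """Remove points that have too few neighbors inside a sphere whose
    radius is proportional to the cloud resolution (mean nearest-neighbor
    distance), as described in the text. cloud: Nx3 array."""
    tree = cKDTree(cloud)
    # distance to the nearest neighbor (k=2: the first hit is the point itself)
    nn_dist, _ = tree.query(cloud, k=2)
    resolution = nn_dist[:, 1].mean()
    radius = radius_factor * resolution
    # count the neighbors inside the sphere, excluding the point itself
    counts = np.array([len(idx) - 1 for idx in tree.query_ball_point(cloud, radius)])
    return cloud[counts >= min_neighbors]
```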


Figure 3.14: Blue: Filtered point cloud. Red: Original point cloud.

Figure 3.15: Violet: Filtered point cloud. Yellow: Original point cloud.

We can note that, moving toward regions where the cloud is more sparse, the filtering algorithm removes a lot of points, because the required number of neighbors is not found in the sphere centered on them.


3.3.4 3D clustering

Now we have a dense point cloud that shows a faithful reproduction of the environment, but it is still unfit to be given to a path-planning algorithm. Indeed, such an algorithm needs solid obstacles to work, because the vehicle has to find impenetrable surfaces, not simply dense points.

The direct reconstruction of the environment from the point cloud can produce a few big solid obstacles, instead of the many little ones present in the real world. So, before computing a 3D mesh of the dense point cloud, we need to cluster the points, in order to obtain many separate single clouds that represent the obstacles. In this way we try to faithfully reproduce the real environment, giving a segmented representation of it. The literature suggests different approaches for segmentation; they differ mainly in the method or criterion used to measure the similarity between a given set of points and in how the grouping decisions are made. Once such a similarity measure has been defined, segments can be obtained by grouping together the points whose similarity measure is within given thresholds and which are spatially connected. Many approaches are tailored only to planar surfaces, which is too limiting for unknown environment exploration; others require knowing the number of clusters in advance, which is unfit for our problem.

So we decided to use an algorithm based on the region growing method [49]. In particular, this method uses a smoothness constraint, which finds smoothly connected areas in point clouds. It uses only local surface normals and point connectivity, which can be enforced using either k-nearest or fixed-distance neighbours.



Figure 3.16: Flowchart of the segmentation algorithm.

The method requires a small number of intuitive parameters, which provide a trade-off between under- and over-segmentation. In particular, it is appropriate to focus on two of them: the smoothness threshold and the curvature threshold. The points in a segment should locally make a smooth surface, whose normals do not vary "too much" from each other. This constraint is enforced by a threshold (θ_th, the smoothness threshold) on the angle between the normal of the current seed point and those of the points added to the region. The second parameter is employed to evaluate cases which may lead to the start of a new segment during the region growing process. For example, an intersection edge is defined by the intersection of two surfaces whose surface normals at the intersection line make an angle greater than the given threshold (cos⁻¹(n₁ · n₂) > θ_curv). An example of such an edge would be the one coming from the intersection of two planar sides of a box.
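The following simplified sketch illustrates the smoothness-based region growing idea (the normals are assumed to be precomputed, e.g. by local PCA, and the per-point curvature test used by the full method is omitted for brevity):

```python
import numpy as np
from scipy.spatial import cKDTree

def region_growing(points, normals, radius, theta_th):
    """Group points into smoothly connected regions.

    points: Nx3 array, normals: Nx3 unit normals (assumed precomputed),
    radius: neighborhood radius, theta_th: smoothness threshold [rad].
    Returns an array of cluster labels, one per point.
    """
    tree = cKDTree(points)
    labels = np.full(len(points), -1, dtype=int)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        seeds = [start]
        labels[start] = current
        while seeds:
            seed = seeds.pop()
            for nb in tree.query_ball_point(points[seed], radius):
                if labels[nb] != -1:
                    continue
                # smoothness check: the neighbor joins the region only if its
                # normal does not deviate too much from the seed's normal
                angle = np.arccos(np.clip(abs(np.dot(normals[seed], normals[nb])), 0.0, 1.0))
                if angle < theta_th:
                    labels[nb] = current
                    seeds.append(nb)
        current += 1
    return labels
```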

As a final result we obtain many segmented local point clouds relative to the same keyframe. To better understand the concept of segmentation, we show an example of it: starting from the point cloud in Figure 3.17, we perform the clustering algorithm, obtaining the differently coloured clusters in Figure 3.18.


Figure 3.17: An example of point cloud. At the center of the image there is a set of black points, which can be appreciated in the pink cluster of the following figure.

Figure 3.18: Differently coloured clusters, produced by the segmentation of the point cloud above.


3.3.5 Polyhedral mesh

Starting from the segmented point clouds, we aim to obtain solid representations of the obstacles surrounding the UAV. This can be done by running a 3D meshing algorithm on the available clusters. In the perspective of a 2D planning solution as collision avoidance algorithm, a map with 2D polygons is needed. We can create the 2D meshes with two different strategies, shown in Figure 3.19, which reports the algorithm steps required by each one. Our choice is the first solution, for reasons of shorter computing time; comparing the two, we highlight below why it is the better one.

Figure 3.19: Two strategies for the meshing algorithm. Strategy 1: local cluster → 3D local mesh → local-to-global transformation → min/max altitude extraction → flattening of the global polyhedron → 2D convex hull. Strategy 2: local cluster → local-to-global transformation → min/max altitude extraction → flattening of the global cluster → 2D convex hull.

The reasons why we need to perform the steps illustrated in Figure 3.19 will be explained in detail in the next subsection. Assuming they are known, we can compare the two approaches to understand which is the suitable one.

The transformation of a point cloud from a local to a global reference system requires a longer time than the same transformation of the respective 3D mesh, due to the smaller number of points that compose the mesh compared with the cloud itself. For the same reason, finding the maximum and the minimum altitude of a polyhedron is faster on a mesh than on a cloud: in the 3D mesh the algorithm has to explore only the polyhedron's vertexes, in the other case all the cloud's points. These are two reasons that prove the advantage of the first solution with respect to the second one.

Going on with the comparison, we can note that projecting a 3D polyhedron onto a 2D plane is less expensive, in terms of time, than projecting its 3D cluster of points. Consequently, the input of the 2D mesh generation algorithm (the last step in the flowchart) is lighter, because in the first case it contains only the polyhedron's vertexes. This is an important aspect, because the 2D meshes must be recomputed every time there is a correction of the keyframe positions, which may lead to a different global position of the obstacles; for example, when a loop closure is detected, new keyframe positions may be estimated. So, although the first solution has one more algorithm step, it is preferable because its steps are faster than those of the second solution.

Looking at the first flowchart, it starts with the transformation from local clusters to 3D local meshes. In the literature there are many solutions to reconstruct a surface from a point cloud. Greedy Projection Triangulation, by Z. C. Marton, is an implementation of a greedy triangulation algorithm for 3D points based on local 2D projections. It assumes locally smooth surfaces and relatively smooth transitions between areas with different point densities; under that hypothesis we lose generality. The Poisson surface reconstruction algorithm, proposed in [50], shows that surface reconstruction from oriented points can be cast as a spatial Poisson problem; we did not choose this solution because it can sometimes produce open solids.

To create a surface out of points we use a convex hull algorithm, because it generates a boundary of the cluster such that the cluster is fully inside the mesh, as shown in Figure 3.21. It is suitable for our purpose: indeed, we want solid 3D objects that contain the real obstacles, reproducing their original shapes. The convex hull does not need any setting parameters to work. Starting from a cluster, such as the one on the left side of Figure 3.20, and performing a convex hull on it, we obtain the solid on the right side of Figure 3.20. Showing them in the same picture, it is obvious that the mesh contains the cloud, Figure 3.21.

Figure 3.20: Left side: Obstacle Cluster. Right side: Obstacle Mesh.
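As a minimal illustration (using SciPy's convex hull, which may differ from the library actually used in the thesis), the polyhedral mesh of a cluster can be obtained as:

```python
import numpy as np
from scipy.spatial import ConvexHull

cluster = np.random.rand(200, 3)        # stand-in for a segmented obstacle cluster
hull = ConvexHull(cluster)              # 3D convex hull of the cluster
mesh_vertices = cluster[hull.vertices]  # hull vertexes define the solid obstacle
mesh_faces = hull.simplices             # triangular faces of the polyhedral mesh
```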


To better clarify the concept of clustering, Figure 3.18 shows a segmentation of the entire environment, but we should remember that, to generate the obstacles map, we process only a stripe of the entire cloud. So in Figure 3.22 we report the stripe of cloud from which the segmentation and, subsequently, the 3D meshing are performed.


3.3.6 Environment discretization in altitude levels

Now the keyframe has a stereo image pair and a set of local polyhedrons that represent the obstacles recognized in them. The term "local" polyhedrons means that their stored pose is relative to a local reference system whose origin is in the camera center.

The goal is to generate a global map to be updated along the navigation. We propose a solution where the map is structured in 2D altitude levels, as shown in Figure 3.23, where β1 and β2 are two different levels.

Figure 3.23: Levels structure of the environment

The environment's altitude is discretized in regular intervals whose resolution is a settable user parameter (d). It can be changed depending on the scenario and on the desired accuracy: choosing a higher resolution yields a more detailed map at the expense of a longer computing time.

In the previous subsection we explained the reasons for choosing the first algorithmic solution between the two schemes illustrated in Figure 3.19. Now we explain how it leads to the desired result. The first step consists of a coordinate transformation: we need to transform the 3D solids from a local reference system to the global one, in order to know their assumed position in the real environment. In particular, we are interested in saving the minimum and the maximum altitude of each polyhedron, Figure 3.24, so that we do not lose their altitude information, storing it before projecting the polyhedrons onto a 2D surface. Indeed, the next step consists of projecting the solid obstacles onto a plane surface, flattening all their points onto the same altitude level.

Figure 3.24: Red: Bounding box of the polyhedron. White: polyhedron.

After that we perform a 2D convex hull on the set of points flattened onto the plane, to obtain polygonal obstacles, Figure 3.26.


Figure 3.26: Left side: Example of a polyhedron’s 3D mesh, view from above. Right side: 2D flattened polygon.

They do not look exactly like the same object, but this is only due to the different points of view at the moment of the screenshot.

Once again the convex hull algorithm is the best solution to obtain the contours of the projected polyhedron. Then every 2D flattened polyhedron, which has become a polygon, is plotted on each altitude level map from the minimum to the maximum previously saved. We will discuss this in the next subsection.
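A small sketch of this flattening step (the function name and the assumption that the altitude is on the third axis of the global frame are ours) could be:

```python
import numpy as np
from scipy.spatial import ConvexHull

def flatten_polyhedron(global_vertices, d):
    """Flatten a global-frame polyhedron into a 2D polygon and its level span.

    global_vertices: Mx3 hull vertexes in the global frame (altitude assumed
    on the third axis), d: altitude discretization resolution [m].
    """
    z_min, z_max = global_vertices[:, 2].min(), global_vertices[:, 2].max()
    flat = global_vertices[:, :2]              # project onto the horizontal plane
    hull2d = ConvexHull(flat)
    polygon = flat[hull2d.vertices]            # 2D contour of the obstacle
    # altitude levels on which the polygon must be drawn, from min to max
    levels = list(range(int(np.floor(z_min / d)), int(np.floor(z_max / d)) + 1))
    return polygon, levels
```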

We can note that flattening the solids onto a plane puts us in cautionary conditions. The polyhedron is represented in each map level as a polygon that depicts its maximum contour size. In this way the obstacle details are lost; for example, if the solid has a cavity, it is completely unmapped. Remembering that our aim is to map the unknown environment in order to navigate it, the proposed solution is an acceptable compromise. In fact, the mission of our UAV is not to reproduce the environment details for a recognition process, but simply to identify the obstacles and avoid them along its navigation.


3.3.7 2D map level creation

This subsection explains how the last block of the algorithmic chain in Figure 3.3 works. It has the task of analyzing each flattened polygon of the processed keyframe and adding it to the respective levels to which the polygon belongs.

Different solutions were studied for the generation of a single map level.

The early idea was to compute the intersection between polygons and, if they had common points, to join them, taking into account only their union in order to generate the obstacles map. But this solution was affected by two elementary problems. The first one was that joining two convex polygons can originate a polygon with holes, and we are not interested in this type of information: we only need the obstacles' contour information.

The second one was related to the computational cost of the intersection-union algorithm. Taking into account that our obstacles map grows along the navigation, we must avoid the computational cost of generating it growing without bound as well. As cited in the article [52], the polygon intersection-union algorithm has a time complexity of O((n + m + k) log(n + m + k)) in the worst case, where n and m are the numbers of edges of the two polygons and k is the number of intersections between them. So it is obvious that, as the dimension of the global map increases, the time required to update it increases too: indeed, every time a new keyframe was added, we needed to check the intersection between the polygons already included in the map and the new ones belonging to the keyframe.

This first solution reduced the working rate too much, so we discarded it and chose a computer vision technique to create the map. It works in terms of a pixel map, avoiding the problems related to intersections and unions of polygons.


At the beginning of the vehicle's navigation, the level map is made of a completely black plane, because there are no mapped obstacles yet. Along the path of the UAV, the detected objects are added to it, creating the different needed levels. The flattened polygons, standing for the obstacles' contours, are filled in white and handled as stickers to be pasted on the black planes that represent the discretized environment levels. Figure 3.27 helps to understand the concept just explained.

Figure 3.27: Left up: Initial level black map. Right up: Level Map with two obstacles. Left down: Level Map after adding two more obstacles. Right down: Level Map after adding two more obstacles.

The idea is to generate a level map that grows while the UAV explores areas not yet visited. In other words, in order not to limit the size of the exploration environment, if the vehicle moves into a region outside the current map we add a new tile to the global map. The union of adjacent tiles makes up the level map. Every tile is identified by a map-key, which is a pair of integers, and it has a local reference system centered in the middle of the tile. Tile zero, for example, has map-key (0, 0), the adjacent right tile (1, 0), and so on; Figure 3.28 shows them (they should have been coloured black, but to better show the structure they are white).

Figure 3.28: Level Map tiles structure.

Figure 3.29 shows the flowchart of the algorithmic steps, to better clarify the sequence of operations necessary to obtain an obstacles map. First of all, a transformation is applied to each flattened polygon belonging to the keyframe under examination; then one polygon at a time is processed and added to the level maps, as reported in the flowchart.


Figure 3.29: Flowchart of the level map update: transform all polygons from global coordinates to pixel map coordinates; load one polygon at a time; generate the needed level maps if they do not exist yet; find the bounding rectangle and, for each of its vertexes, the tile it belongs to, adding newly found tiles to the list of tiles to draw on; draw the polygon in all the found tiles, for each level from the minimum to the maximum; finally, update the contours map.


Now we deal in detail with the transformations applied to each flattened polygon. Supposing to have a unique global pixel map (tile (0,0)) with ideally unlimited size and the origin of the reference system placed in its middle, each 2D point of the polygon must be converted from global position coordinates into global pixel coordinates. The map has a settable resolution (R) and dimension (D); the resolution is a sort of scale parameter, used to convert meters into pixels. To each polygon vertex the following transformation is applied:

(x, y, 1)ᵀ_pixel = [[R, 0, (D−1)/2], [0, R, (D−1)/2], [0, 0, 1]] · (X, Y, 1)ᵀ_meters
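A sketch of this conversion applied to all the vertexes of a polygon (the function name is ours) is:

```python
import numpy as np

def meters_to_global_pixels(polygon_xy, R, D):
    """Convert polygon vertexes from global meters to global pixel coordinates,
    applying the transformation given above.

    polygon_xy: Nx2 vertexes in meters, R: map resolution [px/m],
    D: map dimension in pixels."""
    T = np.array([[R, 0.0, (D - 1) / 2.0],
                  [0.0, R, (D - 1) / 2.0],
                  [0.0, 0.0, 1.0]])
    homogeneous = np.hstack([polygon_xy, np.ones((len(polygon_xy), 1))])
    return (T @ homogeneous.T).T[:, :2]
```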

After having transformed all the polygons, the map is created by examining one polygon at a time, finding the bounding rectangle containing it in order to extract its size and its bounding vertexes, as shown in Figure 3.30.

Figure 3.30: Black: Convex pixel polygons. Red: Bounding rectangles, and their respective vertexes.


For each of the four vertexes, the tile of the map to which it belongs is found, in order to know where the polygon should be plotted, saving only the distinct found tiles in a sort of tile list. This means that the algorithm starts by finding the tile of the first vertex and storing this information. Then the second vertex is examined: if it belongs to a different tile, this information is saved as well, otherwise it is discarded. In the latter case the check's result is discarded because it does not give any new information; the tile on which the polygon should be drawn is the same as that of a previous vertex, so it is enough to plot it only once. A possible case is a situation where the four vertexes belong to four different tiles, and the same polygon will be drawn four times, as shown in Figure 3.31.

Figure 3.31: Example of a possible case: the polygon belongs to 4 different tiles. Black: 4 different adjacent tiles. Red: Bounding rectangle and its 4 vertexes.


After that, we check whether all the levels on which the polygon must be drawn already exist. If not, the algorithm generates them as black maps, and they constitute the tile zero of the level. Then we draw the polygon in the tiles of the previously described tile list, following the strategy shown in Figure 3.32.

Figure 3.32: Flowchart of drawing polygon algorithm

We find the transformation associated with the tile, which maps the pixel polygon from the global reference system centered in tile (0,0) to the local system of the tile. Then we fill the converted convex polygon with white and stick it on the tile, updating the global level map. We use the word "stick" because it gives a simple idea of the basic concept of the solution: the level map is treated as a page of tiles on which we place the stickers, i.e. the polygons. In this way we solve the problem related to the overlap of polygons, because the algorithm simply sticks them one over the other.
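A hedged sketch of this drawing step (the tile size, data structures and function names are our own assumptions; OpenCV clips the parts of the polygon that fall outside a tile) could be:

```python
import cv2
import numpy as np

TILE = 512                    # tile dimension in pixels (assumed value)
tiles = {}                    # (level, map-key) -> black tile image

def tile_key(px, py):
    """Map-key of the tile containing a global pixel coordinate, with the
    local reference system of tile (0, 0) centered in its middle."""
    return (int(np.floor((px + TILE / 2) / TILE)),
            int(np.floor((py + TILE / 2) / TILE)))

def stick_polygon(level, polygon_px, key):
    """Fill a convex polygon with white on one tile of a level map."""
    tile = tiles.setdefault((level, key), np.zeros((TILE, TILE), np.uint8))
    # shift from the global pixel frame to the local tile frame
    offset = np.array([key[0] * TILE - TILE / 2, key[1] * TILE - TILE / 2])
    local = np.round(polygon_px - offset).astype(np.int32)
    cv2.fillConvexPoly(tile, local, 255)   # the "sticker" is pasted on the tile
```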


The same steps are executed for each level map from the minimum to the maximum in which the polygon is situated. After that, to conclude the update of the map, we also need to update the contours map, which is simply a map highlighting the contours of the 2D obstacles (the stickers), Figure 3.33.

Figure 3.33: Level map contours

The choice of finding the bounding rectangle of the flattened polygon only once is due to the fact that the object to draw on each level is the same: indeed, as previously explained, we consider each obstacle at the biggest size it can fill.


3.3.8 Loop detection

During navigation the UAV may return to places already visited and mapped. As reported in the first section of this chapter, 3.1, the SLAM algorithm is able to detect these situations and to update the stored vehicle positions in order to correct them. Consequently, the obstacles map also needs to be updated. The old map is completely deleted and a new one is created. The new obstacles map contains only the obstacles local to the current position of the vehicle. To reconstruct this new local map we use only the information of the local keyframes, which are strictly connected with the path travelled by the vehicle. It is therefore not guaranteed that the entire map will be regenerated, but only the areas near the current vehicle position. This choice is made because, assuming that the UAV has navigated along a long path, the algorithm would need too much time to reconstruct the entire map from scratch, and the vehicle will probably not travel through previously mapped areas again. This process can be done in the background, while the vehicle keeps moving. An example of loop closing is shown in Figures 3.34 to 3.36: the first picture reports the travelled path and the entire obstacles map in light blue (difficult to appreciate, due to the size of the image) before the loop detection; the second is a zoomed version of the previous one, to better identify the presence of the obstacles; the third shows the scenario after the loop detection, underlining the regeneration of the local map only.
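The structure of this update can be sketched as follows; every type and helper in the snippet is a hypothetical placeholder for the stages described above (deleting the map, selecting the local keyframes, regenerating the obstacles), not the actual implementation.

```cpp
#include <cmath>
#include <vector>

// Placeholders standing for the data and the stages described in the text.
struct Pose { double x, y, z; };
struct Keyframe { Pose pose; /* stereo pair, keypoints, local cloud, ... */ };

static std::vector<Keyframe> storedKeyframes;   // filled during navigation (placeholder)

static void clearObstaclesMap() { /* delete every tile of every level map */ }
static void rebuildObstaclesFrom(const Keyframe&) { /* cloud -> polygons -> tiles */ }

// Keyframes whose (corrected) pose lies within a given radius of the current pose.
static std::vector<Keyframe> localKeyframes(const Pose& c, double radius)
{
    std::vector<Keyframe> local;
    for (const Keyframe& kf : storedKeyframes) {
        const double dx = kf.pose.x - c.x, dy = kf.pose.y - c.y, dz = kf.pose.z - c.z;
        if (std::sqrt(dx * dx + dy * dy + dz * dz) < radius)
            local.push_back(kf);
    }
    return local;
}

// On loop closure the old obstacles map is discarded and only the regions
// covered by the local keyframes are regenerated (possibly in the background).
void onLoopClosure(const Pose& correctedPose)
{
    clearObstaclesMap();
    for (const Keyframe& kf : localKeyframes(correctedPose, 10.0))  // 10 m: illustrative radius
        rebuildObstaclesFrom(kf);
}
```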


Figure 3.34: View from above of the travelled path before loop detection.

Figure 3.35: Zoom Region of previous image. Green: Direction of travelled path. Light blue: Obstacles contours.


Figure 3.36: Example of local map re-generation after a loop detection. Red lines: mapped regions before the loop closing. Green line: path travelled after the loop detection.


Chapter 4

Results

4.1 From outdoor data-set

To verify the correct operation of our algorithm we need a sequence of images to give as input. At the beginning of the software development we used some recorded data-sets. The first data-set we used is the well-known KITTI data-set. Figure 4.1 shows a left camera image, to give an idea of the explored environment.

Figure 4.1: KITTI data-set, an example of left camera image

Even if it does not exactly reproduce the conditions of UAV navigation, we employed it because it is a benchmark for works based on camera acquisition for Visual-SLAM localization. The sequence of images is acquired by a stereo camera mounted on a car, so the altitude of the camera is roughly constant. For this reason the motion of an aerial vehicle is not faithfully reproduced by this data-set; nevertheless, it can be used as a useful test.

The main camera parameters are: focal length (718.856 pixels), image resolution (1241x376), baseline (53.7 cm), and camera FOV (81.6°). The first results that we show concern the localization problem, solved by the SLAM algorithm. Figure 4.2 shows the result of the camera localization after some time from the beginning of the path.

The pose of the camera in the current keyframe is represented in green and the previous poses in blue. We can also note a cloud of red and black points; compared with the point clouds reported in Chapter 3, it is clear that these are sparse clouds. The red points are the keypoints of the current keyframe, whereas the black ones are associated with the previous keyframes.

From the stereo-pair images and the computed camera poses our algorithm produces the dense point clouds of the explored environment. Figure 4.3 shows a reconstruction of the surroundings, starting from the camera image acquisitions. The green shape stands for the current camera position, the blue ones for the previous positions.
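For reference, the standard stereo relation behind this reconstruction is Z = f·b/d, followed by the back-projection of each pixel; the sketch below uses the KITTI parameters listed above, while the principal point at the image centre is only an assumption (the real calibration differs slightly).

```cpp
#include <cstdio>

struct Point3D { double X, Y, Z; };

// Depth from disparity and back-projection of a pixel into the camera frame.
Point3D backProject(double u, double v, double disparity,
                    double f, double baseline, double cx, double cy)
{
    const double Z = f * baseline / disparity;          // depth [m]
    return { (u - cx) * Z / f, (v - cy) * Z / f, Z };
}

int main()
{
    const double f = 718.856;                           // focal length [pixels]
    const double baseline = 0.537;                      // baseline [m]
    const double cx = 1241 / 2.0, cy = 376 / 2.0;       // assumed principal point

    const Point3D p = backProject(800.0, 200.0, 25.0, f, baseline, cx, cy);
    std::printf("X=%.2f Y=%.2f Z=%.2f [m]\n", p.X, p.Y, p.Z);
    return 0;
}
```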


Figure 4.2: KITTI data-set, camera localization. Blue: previous camera poses. Red points: matched keypoints. Black points: keypoints associated with previous keyframes.

Figure 4.3: Point cloud reconstruction from KITTI data-set acquisitions. Blue: previous camera poses. Green: current camera pose.


To compare a camera acquisition with the cloud reconstruction, we report approximately the same scene in Figure 4.4.

Figure 4.4: Up: Camera left acquisition. Down: Point Cloud reconstruction from KITTI acquisitions.

Starting from the dense point cloud of the environment, our algorithm generates the obstacles map of the detected objects. Figure 4.5 shows a view from above of the environment on which the obstacles contours are mapped. The same images report the obstacles map in light blue, the camera localization in blue, and the point cloud of the environment. In the first picture there is also a green line that stands for the camera tracking.


Figure 4.5: Up: View from above of the KITTI point cloud. Light blue: obstacles contours map. Blue: assumed camera positions. Green: camera tracking. Down: zoom of some obstacles, without camera tracking.

From the images it seems that some obstacles have double contours, but this depends on the point of view from which we observe the scene. Indeed, the algorithm generates a levels map of the environment, so when it is observed from above the double contours are due to the presence of several overlapping levels.

The pictures in Figure 4.6 better illustrate what has just been described.

Figure 4.6: Up: View from the side of a stripe of the KITTI point cloud. Light blue: obstacles contours map. Blue: assumed camera positions. Green: camera tracking. Down: zoom of an area of the previous scene.


The previous figures show only a stripe of the entire point cloud of the environment. Indeed, as reported in subsection 3.3.2, the generation of the obstacles map involves only the subset of points belonging to a stripe selected from the entire cloud. The height of the chosen stripe is a fixed constant parameter for the entire navigation, while the stripe's centre is the camera centre. Since the camera altitude is roughly the same for the entire path, we view only a band of the environment and only 3 or 4 level maps are generated.
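The stripe selection can be sketched as follows; the Point3D type, the use of Z as the vertical axis and the function name are assumptions made for the example.

```cpp
#include <vector>

struct Point3D { double X, Y, Z; };   // Z assumed to be the vertical (altitude) axis

// Keep only the points whose altitude lies within a fixed-height band centred
// on the camera altitude: these are the points used for obstacle extraction.
std::vector<Point3D> selectStripe(const std::vector<Point3D>& cloud,
                                  double cameraAltitude, double stripeHeight)
{
    std::vector<Point3D> stripe;
    const double half = stripeHeight / 2.0;
    for (const Point3D& p : cloud)
        if (p.Z >= cameraAltitude - half && p.Z <= cameraAltitude + half)
            stripe.push_back(p);
    return stripe;
}
```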


4.2 From first indoor data-set

The second data-set is an indoor industrial scenario. The sequence of stereo-pair images is acquired by a UAV that navigates to capture different points of view of the environment. Since the acquisition is done by an aerial vehicle, this represents a suitable test condition.

Figure 4.7 shows a left camera image, to give an idea of the explored environment.

Figure 4.7: Indoor data-set, an example of left camera image

The main camera parameters are: focal length (458.645 pixels), image resolution (752x480), baseline (11 cm), and camera FOV (78.7°).


We show in the same picture, Figure 4.8, the results of the SLAM algorithm and the dense point cloud of the explored environment. The algorithm generates the point cloud starting from the stereo-pair acquisitions and the camera pose information derived from SLAM. The green rectangle in Figure 4.8 stands for the current camera position, while the green line represents the camera tracking and the blue shapes the previously assumed positions.

Figure 4.8: Indoor data-set, camera localization and point cloud of the environment.

Comparing Figures 4.8 and 4.3, we can appreciate obvious differences in the camera movements. In the KITTI data-set the camera keeps more or less the same altitude and orientation while performing a long path, whereas in this case the vehicle changes both altitude and orientation, and the global map is less expanded because the travelled track is shorter than the KITTI one. The detected objects are also completely different, but in both cases their reconstruction by means of point clouds is very satisfactory.


To compare a camera acquisition with the respective cloud reconstruction, we report approximately the same scene in Figure 4.9.

Figure 4.9: Up: Camera left acquisition. Down: Point Cloud reconstruction from Indoor acquisitions.


Starting from the dense point cloud of the environment, our algorithm generates the obstacles map of the detected objects. To highlight the concept of a growing map, explained in subsection 3.3.7, we report four pictures in Figure 4.10, with the purpose of appreciating the addition of virtual tiles while the UAV navigates unexplored regions. The images represent a view from above of the environment.

Figure 4.10: Sequence of indoor environment reconstruction and obstacles map generation along the UAV navigation. Light blue: obstacles contours map. Blue: assumed camera positions. Green: camera tracking.

In the third and fourth pictures there are some sets of points (on the left side of the images) that are not mapped into any obstacle. This happens because in those regions the density of the cloud is low. Indeed, the unmapped points belong to areas far from the camera, so with high probability they form a sparse set of points and were discarded during the filtering procedure from the set of points processed to extract the obstacles. The reported point cloud is the original one, not the filtered one, so we visualize the entire cloud and not the one processed to generate the obstacles' contours.

Observing the obstacles map from a different point of view, we can appreciate the generation of the different altitude levels. As opposed to the KITTI data-set, here the UAV changes its altitude and the camera tilts along the navigation, so the obstacles map has more levels than the KITTI map, as shown in Figure 4.11.


Figure 4.11: Up: Point cloud reconstruction and levels obstacles map. Down: Point cloud reconstruction and levels obstacles map after a time from the previous capture. Light blue: obstacles contours map. Blue: assumed camera positions. Green: camera tracking and current camera pose.


4.3 From second indoor data-set

We tried our algorithm on a third data-set, an indoor sequence acquired by a UAV in a room with many different objects and textures. Figure 4.12 shows an example image of the recorded sequence, to give an idea of the explored environment.

Figure 4.12: Indoor data-set, an example of left camera image

The main camera parameters are: focal length (458.645 pixels), image resolution (752x480), baseline (11 cm), and camera FOV (78.7°).


We show in the same picture, Figure 4.13, the results of the localization problem solved by the SLAM algorithm and the dense point cloud of the explored environment. The algorithm generates the point cloud starting from the stereo-pair acquisitions and the camera pose information derived from SLAM. The green rectangle in Figure 4.13 stands for the current camera position, while the green line represents the camera tracking and the blue shapes the previously assumed positions.

Figure 4.13: Indoor data-set, point cloud of the environment and camera localization. Blue: previous camera localization. Green: current camera localization.


To compare a camera acquisition with the respective cloud reconstruction, we report approximately the same scene in Figure 4.14.

Figure 4.14: Up: Camera left acquisition.
