Politecnico di Milano

School of Industrial and Information Engineering

Master of Science in Aeronautical Engineering

A computer vision line-tracking algorithm for UAV GNSS-aided guidance

Advisor: Prof. Marco LOVERA
Co-Advisor: Prof. Giorgio GUGLIERI
Dr. Davide INVERNIZZI
Dr. Simone PANZA
Eng. Mattia GIURATO

Thesis by:

Gabriele ROGGI Matr. 898391


Acknowledgments

The first person I wish to thank, for the opportunity to carry out this thesis and for his expertise and availability, is Professor Marco Lovera. A heartfelt thanks to Simone for his selflessness, his endless patience and the support he offered me in carrying out the experimental tests. A big thank you also goes to Mattia and Davide for all their advice and suggestions.

Next, I wish to thank Salvatore, companion of countless days of study and work, without whom none of this would have been possible. I thank all my course mates and, in particular, Alessandro, Giulia, Andrea and Simone. Thanks to Francesco for putting up with me during these five years of living together and to Checco for all the evenings spent together.

I also wanted to thank Emanuel and Davide, points of reference since childhood who, despite the distance, have always remained close to me. I also thank my friends Raffaele, Stefano and Giordano.

A special thanks goes to Cristina for encouraging me in the difficult moments and for putting up with my mood swings over these last two years. Finally, my most heartfelt thanks go to my whole family, to my grandparents Mirca, Dante and Marisa, to my brother Matteo and my sister Lara, but above all to my parents Luca and Veruska, for putting me in the best conditions to complete this long and precious work and for supporting me in every choice I made. Simply and deeply, thank you.


Abstract

The interest in and the possible applications of Unmanned Aerial Vehicles (UAVs) have been increasing at a fast rate in the last few years. Despite the recent technological evolution, mainly constituted by the miniaturization of sensors, flight control units and payloads, UAVs have shown limits in terms of accuracy and reliability. In particular, whenever inspection and monitoring applications are involved, such as, e.g., in the case of photovoltaic (PV) plants, correct tracking of the desired trajectory and accurate positioning above the region of interest assume a major importance. However, UAV outdoor navigation is currently based on the integration of Inertial Measurement Unit (IMU) and Global Navigation Satellite System (GNSS) information, while the mission is usually planned in the form of waypoints using Ground Control Station (GCS) software exploiting georeferenced maps. The error in the representation of the object of interest in the georeferenced map, combined with the inherent GNSS inaccuracy, can significantly degrade the quality of the inspection.

The purpose of this thesis is to combine vision with control of the waypoint position to achieve precise vision-based line tracking for a multirotor UAV. The project, in particular as far as the vision algorithm is concerned, is specifically designed for the application to PV plant inspections. However, the same concept can be applied to any application in which line tracking is involved, e.g., power line inspection.

The vision-based line-tracking guidance law is implemented and validated in simulation environments. Eventually, the algorithm is successfully tested and validated through experimental activity in the Flying Arena for Rotorcraft Technologies (FlyART) of the Aerospace Systems and Control Laboratory (ASCL) of Politecnico di Milano.


Sommario

The interest in and the possible applications of Unmanned Aerial Vehicles (UAVs) have been growing very rapidly in recent years. Despite the recent technological evolution, mainly consisting in the miniaturization of sensors, control units and payloads, UAVs have shown limits in terms of accuracy and reliability. In particular, in inspection and monitoring applications, as in the case of photovoltaic plants, tracking the desired trajectory and accurately positioning over the regions of interest assume great importance. Currently, UAV outdoor navigation is based on the integration of information coming from inertial measurement units and from the global navigation satellite system, while the flight is planned by inserting waypoints in dedicated control stations that exploit georeferenced maps. The error in the representation of the object of interest in the georeferenced map, combined with the intrinsic inaccuracy of the global navigation system, can cause a significant degradation of the inspection quality.

The purpose of the thesis is to combine computer vision techniques with the control of the waypoint position to obtain a precise alignment of a multirotor drone with the desired trajectory. From the vision standpoint, the algorithm has been conceived to be used in photovoltaic plant inspection applications. However, the same concept can be generalized to any application involving alignment with a desired object that presents linear features.

The guidance law is validated in simulation and, finally, successfully tested and validated through the experimental activity carried out in the FlyART arena of the Aerospace Systems and Control Laboratory (ASCL) of Politecnico di Milano.


Contents

Acknowledgments
Abstract
Sommario
List of Figures
List of Tables
Introduction

1 Problem statement
1.1 Multirotor kinematics and dynamics
1.1.1 Reference frames
1.1.2 Rotation formalism
1.1.3 Flight dynamics
1.2 Mathematical formulation

2 Feature extraction techniques
2.1 Overview
2.1.1 Image acquisition
2.1.2 Image segmentation
2.1.3 Feature extraction
2.2 Edge detection
2.2.1 Roberts cross operator
2.2.2 Prewitt operator
2.2.3 Sobel operator
2.2.4 Canny edge-detection operator
2.2.5 Implementation
2.2.6 Thresholding

3 Guidance law
3.1 Overview
3.2 Vision information extraction
3.2.1 Application of computer vision techniques
3.2.2 Guidance information extraction
3.2.3 Pixel to meter conversion
3.2.4 Change of reference frame
3.3 Compensator design
3.3.1 Compensator structure choice
3.3.2 Compensator tuning
3.3.3 Gains sensitivity analysis

4 Control system architecture
4.1 Control baseline
4.2 Hierarchical attitude and position control
4.3 Tilt angle limitations
4.4 Anti-windup
4.5 Simulation
4.5.1 Attitude saturation
4.5.2 Attitude saturation and anti-windup

5 Simulation environment
5.1 UAS architecture
5.1.1 Flight Control Unit (FCU)
5.1.2 Companion computer
5.1.3 Ground Control Station (GCS)
5.2 UAS simulation
5.2.1 PX4 Software In The Loop (SITL)
5.2.2 Gazebo simulator
5.2.3 QGroundControl
5.3 Implementation
5.3.1 World setup
5.3.2 Camera integration
5.3.3 GNSS noise model
5.3.4 Offboard API

6 Results
6.1 Simulation results
6.2 Experimental setup
6.2.1 ANT-1 drone
6.2.2 Motion Capture system (Mo-Cap)
6.2.3 Ground station
6.3.1 Identification
6.3.2 Sensitivity to parameters
6.3.3 Computer vision validation
6.3.4 Robustness verification


List of Figures

1 Mission planning example on PV plants
1.1 The WGS 84 coordinate system definition
1.2 Position control block diagram
1.3 Position control with set-point error block diagram
1.4 Simplified position control with set-point error
2.1 Gaussian function with zero mean and standard deviation σ = 1
2.2 Pascal's triangle
2.3 Pascal's triangle for subtraction
2.4 Image used as test example (a) and after Gaussian smoothing (b)
2.5 Result after Sobel operator application (a) and after non-maximum suppression (b)
2.6 Final result (after hysteresis thresholding)
2.7 Gray-level histogram with Otsu thresholding (b) of image (a)
2.8 Conversion of three points from the image space (a) to the accumulator space (b)
2.9 Conversion of three points, belonging to a vertical line, from the image space (a) to the accumulator space (b)
2.10 Foot-of-normal parameterization of a line
2.11 Accumulator space in polar coordinates
2.12 Application of the Hough transform to a chessboard
3.1 Aerial image of a row of photovoltaic panels
3.2 Result of application of the Canny edge-detector (a) and enlargement to include only a single panel (b)
3.3 Result of application of the Hough transform
3.4 Result of application of the filtering process
3.5 Extraction of the parameters resembling the mid-line
3.6 Computation of the X parameter
3.7 Geometric description of IFOV and GIFOV of a single detector element
3.9 Simplified position control with camera measurement and compensator
3.10 Results in terms of transfer functions Snx and Snδ and corresponding weights
3.11 Step responses of Snx and Snδ
3.12 Open-loop transfer function with and without taking into account position dynamics
4.1 Attitude block diagram
4.2 Complete system comprehensive of the saturation on the reference attitude
4.3 Back calculation architecture
4.4 Quadrotor Simulink model
4.5 Guidance law on the set-point position
4.6 Heading update
4.7 Position in NED reference frame (tilt angles saturation)
4.8 Euler angles (tilt angles saturation)
4.9 Euler angles with anti-windup
4.10 North position with anti-windup
4.11 North position considering "ideal" camera measurements
5.1 Comparison between conventional (a) and Offboard (b) FCU-companion computer (CC) interaction
5.2 Gazebo in a default simulation with PX4
5.3 Gazebo simulator and QGroundControl interaction
5.4 SITL structure
5.5 Customized Gazebo world
5.6 Simulated Yuneec Typhoon H480
5.7 PV panel image acquired by the on-board camera in Gazebo (10 meters altitude)
5.8 Algorithm scheme
6.1 Ground truth and estimated trajectory compared to the desired one
6.2 Position in NED reference frame
6.3 Error between UAV and desired position versus computer vision estimate
6.4 Computer vision algorithm error
6.5 Simulator setup for the generalized case
6.6 Trajectory and set-point in the generalized case
6.7 Error between UAV and desired position versus computer vision estimate in the generalized case
6.8 UAV position in North and East direction compared to the corrected set-points
6.10 Heading angle after correction (enlargement)
6.11 Local reference frame of the arena
6.12 The Flying Arena for Rotorcraft Technologies (FlyART)
6.13 ANT-1
6.14 The Pixfalcon FCU
6.15 NanoPi NEO Air companion
6.16 CAM500B camera
6.17 Infrared camera (a) and markers (b) of the Mo-Cap system
6.18 PRBS and response of the system
6.19 Comparison between identified closed-loop position dynamics and model used for H∞ synthesis
6.20 Position in NED reference frame (α = 1)
6.21 North position (α = 2)
6.22 North position (α = 3)
6.23 Frequency response of the loop transfer function including the position dynamics and with its static approximation (α = 3)
6.24 Vision estimate compared to UAV North position and corresponding error
6.25 Performance in terms of set-point and position for Mo-Cap altitude


List of Tables

5.1 Gazebo camera plugin parameters
5.2 Gazebo GNSS plugin parameters
6.1 NanoPi NEO Air features
6.2 CAM500B features
6.3 Comparison between identified second order model and the one used in H∞ tuning
6.4 Comparison between error in the set-point position and compensator parameters


Introduction

An Unmanned Aerial Vehicle (UAV) is an aircraft able to fly autonomously or piloted from the ground, in any case without a pilot on board. Usually called drones, these vehicles have received increasing attention in recent years both in civil and military fields thanks to their wide range of applications, e.g., aerial monitoring and inspection, surveillance, search and rescue, precision agriculture, photography, product delivery and entertainment. While there is a great variety of types of UAVs [1], in this thesis we focus on the category of multirotor Vertical Take-Off and Landing (VTOL) vehicles provided with at least four motors, of small/medium size and remotely controlled. Thanks to their versatility in technical operations, they are widespread in commercial and research fields. However, when dealing with inspections for Operations & Maintenance (O&M) activities, these vehicles still lack the required level of accuracy.

Consider, as a case study, the monitoring of a photovoltaic (PV) plant, in which the correct acquisition of aerial images in the visible and/or thermal spectrum is of major importance for appropriate maintenance scheduling and intervention, and focus on the way in which the mission is usually planned. Typically, on a control station, the drone operator manually inserts a series of waypoints, as shown in Figure 1.

Figure 1: Mission planning example on PV plants


Once the mission is uploaded, the flight control system takes care of tracking them. However, there is no guarantee that the drone will actually fly over the PV panels.

First of all, the waypoint can be misplaced by the operator and/or may not exactly correspond to the desired location due to possible incorrect georeferencing of the map in which the waypoint itself is positioned. It must be noted that georeferenced maps obtained from satellite imagery, e.g., Google Earth [2], widely used in this framework, are not a reliable source of information. Studies in the literature have shown that the error in planimetric coordinates (North, East), between objects as captured in Google Earth and as obtained from a base map produced by ground surveying techniques, ranges from 5 [3] to 10 meters [4] and even more [5]-[6], depending, among many factors, on the geographic location. The accuracy of measured distances in Google Earth is considerably better than the accuracy of measured individual coordinates, but, in any case, an error of 2 or more meters is expected [7].

Secondly, the intrinsic inaccuracy of the received GNSS signal can lead to poor performance in the estimation of the UAV pose, leading to tracking errors and consequent degradation of the inspection performance. The GNSS error sources can be many, ranging from ionospheric and tropospheric delays, clock-related and satellite-orbit-related errors, multipath errors and the receiver noise itself [8]. In particular, the GNSS receivers embedded in a professional UAV have been tested in [9], highlighting a 1 m to 2 m accuracy on the horizontal position and a 4 m accuracy on the vertical position. It is worth noting that this result provides some indication of the magnitude of the error, but cannot be generalized since it depends on the hardware used, the satellite coverage and the geographic position.

Objective

In this thesis the objective is to realize a guidance system which relies not only on the GNSS signal, but also on information coming from additional sensors. In this framework, a particularly promising solution is provided by vision systems. Through the use of appropriate techniques, images allow useful information, in terms of features, to be extracted for the navigation, guidance and control of a vehicle in a partially or totally unknown environment. In the particular case of PV plants, not many features can be used for localization purposes, due to the repetitiveness of the texture of the PV panels. On the other hand, through the use of appropriate techniques, information about the correctness of the trajectory followed by the UAV, with respect to the desired trajectory, can be extracted. Exploiting this information, the idea is to realize a system capable of keeping the UAV above the PV panels throughout the whole mission, adjusting the UAV trajectory to compensate for the sources of error mentioned above.


State of the art

In the literature, the problem of controlling a vehicle, or in general a robot, using feedback information coming from a vision sensor is referred to as visual servoing [10]. It is a commonly addressed problem in robotics, one in which results from many elemental areas, including high-speed image processing, kinematics, dynamics, control theory and real-time computing, are fused.

In visual servoing, referring to robotics terminology and depending on the camera configuration, a first distinction can be made between eye-in-hand and eye-to-hand. In the former, the camera is mounted directly on the robot's end-effector, while in the latter the camera is fixed in the workspace. This terminology can be adapted to UAV applications, referring to on-board visual systems (eye-in-hand) as opposed to ground visual systems (eye-to-hand) [11]. Going into more detail, there are two ways to implement vision-feedback robot control: the dynamic look-and-move system and the direct visual servo system.

• The dynamic look-and-move system performs vision-feedback control of robot manipulation in a hierarchical manner. The higher-level vision system provides a desired pose reference input to a lower-level robot controller which then uses feedback from the joints to internally stabilize the robot [12].

• In direct visual servo control systems, instead, a visual controller replaces the robot controller and directly computes inputs to the robot joints. The direct visual servo system usually requires high-speed vision feedback information, which places a high burden on both system hardware and software.

In [13] some considerations and experimental results about these two different techniques are presented. It is shown that the frequency of the visual servo control is very much dependent on the image processing time, as opposed to the look-and-move technique. On the other hand, visual servo control presents smaller steady-state positioning errors, because the images captured when the manipulator is close to its desired position are used in the system feedback to improve in real time the robot accuracy.

A further classification for visual servo control exists and regards the space in which the error, or in general the vision feedback, is computed.

• In Position Based Visual Servoing (PBVS), the error is computed in the 3D task space. Features are extracted from the image and used in conjunction with a geometric model of the target and the known camera model to estimate the pose of the target with respect to the camera. As a result, considering an on-board visual system, the pose of the vehicle is computed by reducing estimation errors. The weaknesses of PBVS are its sensitivity to both camera and robot calibration errors and, usually, its requirement for knowledge of the 3D target model [14]. Furthermore, there is no way to


ensure the object will remain in the camera field of view during the servoing, especially in the presence of large calibration errors.

• In Image Based Visual Servoing (IBVS), control values are computed directly on the basis of image features. The control goal is specified as a certain configuration of the features in the image to be reached when the task is well performed. This approach is particularly meaningful when the robotic task can be specified, in a natural way, as a relative positioning of the camera frame handled by the robot with respect to a particular part of the environment [15]. The image-based approach may reduce computational delay, eliminate the necessity for image interpretation and eliminate errors due to sensor modelling and camera calibration [16]. However, the sampling rate of the closed-loop control is directly given by the image processing time. Thus, to ensure the performance of the control system, this sampling rate has to be sufficiently high. For the current operations in direct visual servoing, a rate of 10 images per second can be considered as a lower limit. The low frequency imposed by vision might cause problems for the controller's stability [17]. In addition to this, the convergence of IBVS is theoretically ensured only in a region (quite difficult to determine analytically) around the desired position.

In conclusion, it can be stated that PBVS is simpler as far as the implementation is concerned and safer compared to IBVS. This is due to the fact that, in PBVS, visual information is used to provide an enhanced pose estimation of the vehicle, involving the reconstruction of the target pose with respect to the camera, while the control strategy is practically unaltered. In IBVS, it must be verified that the controller, designed to operate in the feature space, does not excite the unstable or marginally stable dynamics of the system. On the other hand, IBVS is most promising for what concerns accuracy and robustness against modelling and camera calibration errors. In addition to this, since the control action is computed in the feature space, the presence of the object in the camera field of view during operations is guaranteed.

In the considered problem, due to the repetitiveness of the PV panels, the longitudinal and lateral lines of the panels are the most suitable features to be exploited to realize the visual servoing task. As far as the tracking of linear features is concerned, some works regarding visual servoing for UAVs are available in the literature. Among these, in [18] a visual servoing problem concerning the tracking of power lines has been addressed. In that framework, the authors have used a direct visual servoing approach both in the form of PBVS and IBVS. A similar IBVS solution for tracking linear features has also been proposed in [19]. In [20] and [21] similar works have been conducted on fixed-wing UAVs for alignment with ground runway lines for landing. To the best of the author's knowledge, this kind of approach has never been used on photovoltaic plants.


Approach

It is worth noting that a direct visual servoing approach, as proposed in the literature studies presented above, requires an ad-hoc design of the control law and, as a consequence, it is not compatible with pre-existing flight control software. In contrast, this work aims to develop a general law which could be implemented on different platforms. In this context, indirect visual servoing (the dynamic look-and-move system) could represent a suitable solution. The idea is to use the high-level vision information to provide a desired modification of the position and heading references to the flight control software available on the platform. In contrast with a direct visual servoing system, in which the controller rate is bounded by the vision system frequency, possibly leading to stabilization problems, the proposed solution is designed to rely on the UAV autopilot. In order to keep the object of interest for the vision system in the camera field of view, a solution consisting of a saturation level on the attitude angles is suggested. Finally, for the development and validation of the system, a software-in-the-loop simulation environment that allows the integration of all the UAV components, including the output of the camera, to be tested is proposed and adopted.

Thesis structure

The thesis is organized as follows:

• In Chapter 1 modelling and simulation of multirotor UAV flight is presented; in particular, reference frames, rotational formalism, flight dynamics and the notation used in the thesis are described. Finally, the mathematical formulation of the problem is presented.

• In Chapter 2 an overview about image processing and feature extraction techniques is provided; specifically the techniques used for the development of the computer vision algorithm are presented and discussed.

• In Chapter 3 the vision algorithm is applied to the problem of interest. The extracted information is then used to formulate the guidance law.

• In Chapter 4 the UAV control structure is analyzed and exploited to insert saturation on the roll and pitch angles. An anti-windup technique is developed and applied to a Simulink implementation of the proposed control strategy.

• In Chapter 5 the software-in-the-loop simulation environment used to validate the overall system is presented. The attention is focused on the analysis and simulation of each UAV architecture component.


• In Chapter 6 the results of the simulation are presented and discussed. Then, after a brief description of the experimental system setup (hardware and software), the flight tests’ results in different conditions are shown.


Chapter 1

Problem statement

In this chapter, all the adopted conventions related to reference frames and three-dimensional rotations are presented. Then, the dynamic equations of a multirotor UAV are derived. Finally, the mathematical formulation of the problem is provided.

1.1 Multirotor kinematics and dynamics

In this section a brief introduction about modelling and simulation of aircraft dynamics is presented.

1.1.1 Reference frames

In order to describe the motion of a flying vehicle, at least two reference frames are needed, namely an inertial and a body frame. In this framework, a description is provided of the geodetic reference system, in which the waypoints corresponding to the mission plan are expressed, of two local frames, i.e., NED and ENU, and of the body frame.

Geodetic reference frame

The model used for the geodetic reference frame is WGS 84. The WGS 84 coordinate system is a right-handed, Earth-fixed orthogonal frame, based on an ellipsoid formulated in 1984, which approximates very well the surface of the Earth. According to the definition given by the U.S. Defense Mapping Agency [22], it is possible to state what is described in Figure 1.1, namely:

• The origin is placed on the Earth’s center of mass;

• the Z-axis is the direction of the Conventional Terrestrial Pole (CTP) for polar motion, as defined by the Bureau International de l’Heure (BIH);


Figure 1.1: The WGS 84 coordinate system definition

• the X-axis is the intersection of the WGS-84 reference meridian plane and the plane of the CTP's equator, the reference meridian being the zero meridian defined by the BIH;

• the Y -axis completes a right-handed, Earth-centred, earth-fixed (ECEF) orthogonal coordinate system, measured in the plane of the CTP equator, 90° east of the X-axis.

The identification of the position of a vehicle in this reference frame is done through three quantities, namely:

1. latitude. The angular distance with respect to the Equatorial plane;

2. longitude. The angular distance with respect to the Greenwich Meridian, that is the meridian of reference;

3. altitude. The distance from the origin of the system to the vehicle.

Local NED frame

The local North-East-Down frame $F_N = (O_N, \{N, E, D\})$ is a right-handed orthogonal frame widely used in the aerospace field. Its origin is placed on a generic point on the Earth's surface, e.g., the place of take-off. The first and second axes N and E define the horizontal plane, tangent to the surface of the Earth for altitude equal to zero and parallel to it for altitude higher than zero. The N axis points to the North and the E axis points to the East; the third axis, D, has the direction of gravity, pointing downwards.
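As an illustration of how a waypoint expressed in the geodetic frame can be mapped into this local NED frame, the following minimal Python sketch uses a flat-earth approximation around the take-off point; the function name, the example coordinates and the use of the WGS 84 radii of curvature are illustrative assumptions, not material from the thesis.

```python
import math

# WGS 84 ellipsoid constants: semi-major axis [m] and flattening
WGS84_A = 6378137.0
WGS84_F = 1.0 / 298.257223563

def geodetic_to_ned(lat_deg, lon_deg, alt_m, lat0_deg, lon0_deg, alt0_m):
    """Convert geodetic coordinates to local NED coordinates relative to a
    reference point (e.g., the take-off location), using a flat-earth
    approximation valid over small areas such as a PV plant."""
    lat0 = math.radians(lat0_deg)
    e2 = WGS84_F * (2.0 - WGS84_F)  # eccentricity squared
    # Meridian and prime-vertical radii of curvature at the reference latitude
    r_m = WGS84_A * (1.0 - e2) / (1.0 - e2 * math.sin(lat0) ** 2) ** 1.5
    r_n = WGS84_A / math.sqrt(1.0 - e2 * math.sin(lat0) ** 2)
    d_lat = math.radians(lat_deg - lat0_deg)
    d_lon = math.radians(lon_deg - lon0_deg)
    north = d_lat * r_m
    east = d_lon * r_n * math.cos(lat0)
    down = -(alt_m - alt0_m)        # the D axis points downwards
    return north, east, down

# Example: a waypoint a few metres away from the take-off point
print(geodetic_to_ned(45.47805, 9.22755, 125.0, 45.47800, 9.22750, 122.0))
```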


Local ENU frame

The local East-North-Up frame $F_E = (O_P, \{E, N, U\})$ is widely used in geography. Its coordinates are formed from a plane tangent to the Earth's surface fixed to a specific location, and hence it is sometimes known as a "local geodetic plane". By convention, the first axis E points to the East, the third axis U points in the opposite direction with respect to gravity, while, to complete a right-handed frame, the second axis N points North.

Body frame

The body reference frame $F_B = (O_B, \{b_1, b_2, b_3\})$ refers to a right-handed system of coordinates, rigidly attached to the center of gravity of the UAV, i.e., $O_B \equiv CG$, and changing orientation with it. The unit vector $b_1$ lies in the plane of symmetry of the vehicle and points forward, $b_2$ points to the right wing, normal to the plane of symmetry, and $b_3$ points downwards.

1.1.2 Rotation formalism

A rotation matrix is a matrix belonging to SO(3), the group of orthogonal matrices under matrix multiplication with determinant equal to 1. A rotation matrix can express three different meanings:

• the orientation of a frame with respect to another frame;

• the transformation that relates the coordinates of a point in two different frames;

• the rotation of a vector in a coordinate frame.

A frame or a vector can be rotated about different axes multiple times. In order to perform this operation, it is useful to define the elementary rotations around the coordinate axes, in particular:

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{bmatrix} \quad (1.1)$$

$$R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & -\sin\beta \\ 0 & 1 & 0 \\ \sin\beta & 0 & \cos\beta \end{bmatrix} \quad (1.2)$$

$$R_z(\gamma) = \begin{bmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}. \quad (1.3)$$


The rotation direction is positive counterclockwise, following the right-hand rule. Since rotation matrices are orthonormal, the inverse rotation is obtained by transposing them. In fact,

$$R_x^{-1} = R_x^T \quad (1.4)$$
$$R_y^{-1} = R_y^T \quad (1.5)$$
$$R_z^{-1} = R_z^T. \quad (1.6)$$

Many consecutive rotations can be performed by multiplying the rotation matrices, noting that rotations performed in a different order produce different results: $R_x(\alpha)R_y(\beta) \neq R_y(\beta)R_x(\alpha)$.

Euler angles

The most common minimal representation, i.e., described by three real parameters, is the Euler parameterization, in which the angles of the three planar rotations play the role of the rotation parameters. It is standard practice in aeronautics to use the 3-2-1 right-handed rotation, that is a first rotation around the $b_3$ axis, followed by $b_2$ and finally the $b_1$ axis, defining the attitude matrix:

$$^B R_W(\phi, \theta, \psi) = R_x(\phi)\, R_y(\theta)\, R_z(\psi), \quad (1.7)$$

where B and W stand for "body" and "world" respectively, while φ is the roll angle, θ is the pitch angle and ψ is the yaw angle. For the sake of completeness,

$$^B R_W(\phi, \theta, \psi) = \begin{bmatrix} c_\theta c_\psi & c_\theta s_\psi & -s_\theta \\ s_\phi s_\theta c_\psi - c_\phi s_\psi & s_\phi s_\theta s_\psi + c_\phi c_\psi & s_\phi c_\theta \\ c_\phi s_\theta c_\psi + s_\phi s_\psi & c_\phi s_\theta s_\psi - s_\phi c_\psi & c_\phi c_\theta \end{bmatrix}, \quad (1.8)$$

where, to simplify the notation, $s_* = \sin(*)$ and $c_* = \cos(*)$. To obtain $^W R_B(\phi, \theta, \psi)$, the rotation matrix from the body frame to the world frame, it is sufficient to transpose $^B R_W(\phi, \theta, \psi)$:

$$^W R_B = {}^B R_W^T. \quad (1.9)$$

Thanks to rotation matrices and Euler angles, it is possible to express the vectors of kinematic quantities with respect to the body frame or the inertial frame:

$$u_W = {}^W R_B\, u_B \quad (1.10)$$
$$u_B = {}^B R_W\, u_W, \quad (1.11)$$

where u is a generic vector expressed in the inertial frame ($u_W$) and in the body frame ($u_B$).
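The following minimal NumPy sketch reproduces the elementary rotations (1.1)-(1.3), the 3-2-1 attitude matrix (1.7)-(1.9) and the frame changes (1.10)-(1.11); the numerical angles are arbitrary and the function names are illustrative.

```python
import numpy as np

def Rx(a):
    """Elementary rotation about the first axis, eq. (1.1)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])

def Ry(b):
    """Elementary rotation about the second axis, eq. (1.2)."""
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

def Rz(g):
    """Elementary rotation about the third axis, eq. (1.3)."""
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

def body_from_world(phi, theta, psi):
    """3-2-1 attitude matrix B_R_W of eq. (1.7)."""
    return Rx(phi) @ Ry(theta) @ Rz(psi)

phi, theta, psi = np.deg2rad([10.0, -5.0, 30.0])   # arbitrary test angles
B_R_W = body_from_world(phi, theta, psi)
W_R_B = B_R_W.T                                    # eq. (1.9)

# Orthonormality check and frame change of a generic vector, eqs. (1.10)-(1.11)
assert np.allclose(B_R_W @ B_R_W.T, np.eye(3))
assert np.isclose(np.linalg.det(B_R_W), 1.0)
u_W = np.array([1.0, 2.0, 3.0])
u_B = B_R_W @ u_W
assert np.allclose(W_R_B @ u_B, u_W)
```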


Quaternions

A quaternion is a four-dimensional representation of a sphere introduced by W. R. Hamilton that can be used to represent the orientation of a rigid body or a coordinate frame in three-dimensional space (see [23]).

$$q = \begin{bmatrix} q_1 \\ q_2 \\ q_3 \\ q_4 \end{bmatrix} = \begin{bmatrix} \rho \\ q_4 \end{bmatrix}, \quad (1.12)$$

and it is constrained to have unitary norm

$$\|q\|^2 = \|\rho\|^2 + q_4^2 = 1. \quad (1.13)$$

The attitude matrix is then related to the quaternion by a homogeneous quadratic function of the components of a quaternion (see [24])

$$R(q) = \left(q_4^2 - \|\rho\|^2\right) I_3 + 2\,\rho\rho^T - 2\, q_4\, \hat{\rho}, \quad (1.14)$$

where $I_3$ is the 3 × 3 identity matrix. Note that the hat map $\hat{\cdot}$ is defined by the condition that $\hat{x}y = x \times y$ for all $x, y \in \mathbb{R}^3$. In particular,

$$\hat{\rho} = \begin{bmatrix} 0 & -q_3 & q_2 \\ q_3 & 0 & -q_1 \\ -q_2 & q_1 & 0 \end{bmatrix}. \quad (1.15)$$

It is of course also possible to extract from an attitude matrix R the corresponding quaternion using a modification of Shepperd's algorithm (see [25]).

With quaternions, successive rotations can be accomplished using quaternion multiplication

$$R(q')\,R(q) = R(q' \otimes q), \quad (1.16)$$

where ⊗ is the Hamiltonian product, which can be computed as follows

$$q' \otimes q = \begin{bmatrix} q'_4\,\rho + q_4\,\rho' - \rho' \times \rho \\ q'_4\, q_4 - \rho' \cdot \rho \end{bmatrix}. \quad (1.17)$$

Moreover, the conjugate of a quaternion is defined by

$$q^* = \begin{bmatrix} -\rho \\ q_4 \end{bmatrix}. \quad (1.18)$$
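A short numerical check of the quaternion relations above is sketched below; it verifies (1.14), (1.16), (1.17) and (1.18) on random unit quaternions. The function names are illustrative and the convention follows the equations as written here.

```python
import numpy as np

def hat(rho):
    """Hat map of eq. (1.15): hat(x) @ y equals np.cross(x, y)."""
    return np.array([[0.0, -rho[2], rho[1]],
                     [rho[2], 0.0, -rho[0]],
                     [-rho[1], rho[0], 0.0]])

def rot_from_quat(q):
    """Attitude matrix from a unit quaternion q = [rho, q4], eq. (1.14)."""
    rho, q4 = q[:3], q[3]
    return (q4**2 - rho @ rho) * np.eye(3) + 2.0 * np.outer(rho, rho) - 2.0 * q4 * hat(rho)

def quat_mult(qp, q):
    """Quaternion product q' ⊗ q of eq. (1.17)."""
    rp, q4p = qp[:3], qp[3]
    r, q4 = q[:3], q[3]
    return np.concatenate((q4p * r + q4 * rp - np.cross(rp, r),
                           [q4p * q4 - rp @ r]))

rng = np.random.default_rng(0)
q1 = rng.normal(size=4); q1 /= np.linalg.norm(q1)   # two random unit quaternions
q2 = rng.normal(size=4); q2 /= np.linalg.norm(q2)

# Successive rotations, eq. (1.16): R(q1) R(q2) == R(q1 ⊗ q2)
assert np.allclose(rot_from_quat(q1) @ rot_from_quat(q2),
                   rot_from_quat(quat_mult(q1, q2)))
# The conjugate of eq. (1.18) yields the inverse rotation
q1_conj = np.concatenate((-q1[:3], [q1[3]]))
assert np.allclose(rot_from_quat(q1_conj), rot_from_quat(q1).T)
```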


1.1.3 Flight dynamics

The UAV flight dynamics is modelled by the Newton-Euler equations, which describe the combined translational and rotational dynamics of a rigid body. Such equations can be written considering as a reference frame both the inertial frame $F_I = (O_I, \{e_1, e_2, e_3\})$ and the body frame $F_B = (O_B, \{b_1, b_2, b_3\})$.

Linear motion

The linear motion equation in the body frame is written through Newton's second law:

$$f_B = \frac{d(m\, v_B)}{dt}, \quad (1.19)$$

where $f_B = \begin{bmatrix} f_x & f_y & f_z \end{bmatrix}^T$ is the external force applied to the UAV and $v_B = \begin{bmatrix} u & v & w \end{bmatrix}^T$ is the linear velocity. Equation (1.19) can be expanded as

$$f_B = \left(\frac{dm}{dt}\right) v_B + m \left(\frac{\partial v_B}{\partial t} + \omega \times v_B\right) \quad (1.20)$$
$$\;\; = \dot{m}\, v_B + m\, \dot{v}_B + \omega \times m\, v_B, \quad (1.21)$$

where $\omega \in \mathbb{R}^3$ is the body angular velocity. Since the considered UAVs are propelled by electric motors and the energy is stored in batteries, it is possible to neglect the term related to the variation of mass over time, so that

$$f_B = m\, \dot{v}_B + \omega \times m\, v_B. \quad (1.22)$$

The obtained result represents the rate of change of momentum as a result of the applied force.

Angular motion

Newton's second law for the angular motion in the body frame states

$$\tau_B = \frac{d(J\omega)}{dt}, \quad (1.23)$$

where $\tau_B = \begin{bmatrix} l & m & n \end{bmatrix}^T$ is the external torque applied to the UAV and $J \in \mathbb{R}^{3\times 3}$ is the body inertia tensor, defined as

$$J = \begin{bmatrix} J_{xx} & -J_{xy} & -J_{xz} \\ -J_{xy} & J_{yy} & -J_{yz} \\ -J_{xz} & -J_{yz} & J_{zz} \end{bmatrix}, \quad (1.24)$$

with

$$J_{xx} = \int_m (y^2 + z^2)\,dm, \qquad J_{yy} = \int_m (x^2 + z^2)\,dm, \qquad J_{zz} = \int_m (x^2 + y^2)\,dm,$$
$$J_{xy} = \int_m xy\,dm, \qquad J_{xz} = \int_m xz\,dm, \qquad J_{yz} = \int_m yz\,dm.$$

As for the linear motion, it is possible to expand equation (1.23) while neglecting the mass (and hence inertia) variation over time:

$$\tau_B = J\dot{\omega} + \omega \times J\omega. \quad (1.25)$$

External forces and moments

The external forces and moments, i.e., $f_B$ and $\tau_B$, applied to the rigid body of the multirotor are what affect its motion while flying. Clearly, the gravity force is applied to the multirotor UAV, and it can be described as

$$f_g = m g\, e_3, \quad (1.26)$$

where such force is expressed in the inertial frame $F_I$ and must be rotated into the body frame $F_B$ by the attitude matrix R. It is then common to consider as an external contribution the aerodynamic damping, which is proportional to the body velocity:

$$f_d = -k_{fd}\, v_B, \quad (1.27)$$
$$\tau_d = -k_{\tau d}\, \omega. \quad (1.28)$$

Finally, the control force $f_c$ and the control torque $\tau_c$, expressed in the body frame, must be taken into account. The total external forces and moments are, respectively,

$$f_B = R^T f_g + f_d + f_c, \quad (1.29)$$
$$\tau_B = \tau_d + \tau_c. \quad (1.30)$$
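As a rough illustration of how equations (1.22) and (1.25)-(1.30) can be integrated numerically, the sketch below performs one forward-Euler step of the translational and rotational dynamics; the mass, inertia and damping values are arbitrary placeholders (not the thesis' parameters) and the attitude kinematics is not propagated.

```python
import numpy as np

# Arbitrary placeholder parameters (not taken from the thesis)
m = 1.2                                  # mass [kg]
J = np.diag([0.015, 0.015, 0.03])        # inertia tensor [kg m^2]
k_fd, k_td = 0.05, 0.01                  # aerodynamic damping coefficients
g = 9.81
dt = 0.002

def dynamics_step(v_B, omega, R, f_c, tau_c):
    """One forward-Euler step of the Newton-Euler equations (1.22) and (1.25),
    with external forces and moments built as in (1.26)-(1.30)."""
    f_g = m * g * np.array([0.0, 0.0, 1.0])     # gravity in the inertial frame, eq. (1.26)
    f_d = -k_fd * v_B                           # aerodynamic damping, eq. (1.27)
    tau_d = -k_td * omega                       # eq. (1.28)
    f_B = R.T @ f_g + f_d + f_c                 # eq. (1.29)
    tau_B = tau_d + tau_c                       # eq. (1.30)
    v_dot = f_B / m - np.cross(omega, v_B)      # from eq. (1.22)
    omega_dot = np.linalg.solve(J, tau_B - np.cross(omega, J @ omega))  # from eq. (1.25)
    return v_B + dt * v_dot, omega + dt * omega_dot

# Hovering thrust along -b3 balances gravity when R = I (level attitude)
v_B, omega, R = np.zeros(3), np.zeros(3), np.eye(3)
f_hover = np.array([0.0, 0.0, -m * g])
v_B, omega = dynamics_step(v_B, omega, R, f_hover, np.zeros(3))
print(v_B, omega)   # remains (numerically) at rest
```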

1.2 Mathematical formulation

As already stated in the Introduction, the aim of the thesis is to exploit vision information to provide a desired modification in the position set-point to the flight controller available on the UAV.

Focusing for the sake of simplicity on a single axis, a conventional position control loop for a multirotor UAV can be described using the block diagram in Figure 1.2, where x is the position variable to be controlled, $x_o$ is the corresponding set-point, and $G_x(s)$ and $R_x(s)$ denote, respectively, the position dynamics and the position controller.


Figure 1.2: Position control block diagram

Commonly, the position control is implemented using GNSS, so the feedback system is affected by the corresponding measurement noise $n_{GNSS}$.

In turn, the set-point $x_o$ can be either generated directly by a pilot or may correspond to a sequence of waypoints defined on the GCS at the beginning of the mission. In the latter case, however, an additional error term has to be taken into account, namely the error in the positioning of the waypoints, which may be due, e.g., to limitations of the GCS interface or to inaccurate georeferencing of the map of the plant to be inspected. In such a case, a more realistic representation of the control system, see Figure 1.3, should take into account that the actual set-point received by the control system (denoted as $\tilde{x}_o$) is equal to the intended set-point $x_o$ plus an error term $n_{MAP}$.

Figure 1.3: Position control with set-point error block diagram

This block diagram is interesting as it clearly shows that the effect of the GNSS measurement error and the effect of inaccurate set-point computation enter the control system at the same point, so that, letting

$$F(s) = \frac{R_x(s)\, G_x(s)}{1 + R_x(s)\, G_x(s)}, \quad (1.31)$$

be the complementary sensitivity of the position control system, the block diagram can be equivalently represented as in Figure 1.4, where $n_{x_o}$ represents the simultaneous effect of all the error terms.


Assuming now that a vision-based system can provide additional measurements, specifically that the camera can return a (moderately) noisy measurement of $x - x_o$, the problem under study becomes the one of exploiting camera measurements to compensate for the effect of $n_{x_o}$ on the performance of the control system.

In the following chapters, the image processing algorithm used to extract the vision-based measurement $x - x_o$ will first be discussed. Then, the attention will turn to the design of the guidance law and of the compensator that exploit this measurement.
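To make the role of the complementary sensitivity concrete, the sketch below simulates how a constant set-point error $n_{x_o}$ propagates through $F(s)$ of eq. (1.31) for an illustrative single-axis loop; the double-integrator position model and the PD gains are placeholder assumptions, not the thesis' actual dynamics or tuning.

```python
import numpy as np
from scipy import signal

# Illustrative single-axis model: position dynamics G_x(s) = 1/s^2 and a
# PD position controller R_x(s) = kp + kd*s, so L(s) = R_x(s) G_x(s).
kp, kd = 4.0, 3.0
L_num, L_den = [kd, kp], [1.0, 0.0, 0.0]

# Complementary sensitivity F(s) = L / (1 + L), eq. (1.31)
F = signal.TransferFunction(L_num, np.polyadd(L_den, L_num))

# A constant set-point error n_xo (e.g., a map georeferencing offset) passes
# through F(s): at steady state the UAV settles on the wrong position.
t = np.linspace(0.0, 10.0, 1000)
n_xo = 2.0 * np.ones_like(t)          # 2 m offset on the set-point
t, x, _ = signal.lsim(F, U=n_xo, T=t)
print("steady-state position error [m]:", x[-1])   # ~2.0: the offset is tracked
```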


Chapter 2

Feature extraction techniques

In this chapter, an overview of image processing and feature extraction techniques is provided. The overall objective is to describe the techniques used for the development of the vision algorithm, namely the Canny edge detector and the Hough transform, as well as to provide some background and examples.

2.1 Overview

The general structure of a feature extraction system can be thought of as composed of three building blocks: image acquisition, image preprocessing and segmentation, and feature extraction itself. Image acquisition converts a scene into an array of numbers that can be manipulated by a computer. The second building block removes noise, enhances the picture and segments the regions of interest from the image background; both the input and the output of this building block are images. The third block reduces the data dimension by removing redundancy and unreliable components from the data, and hence represents the object of interest by means of a reduced set of numerical features.

2.1.1 Image acquisition

Camera sensors are devices able to capture a 2D projection of the observed scene, by processing the light field emanating from it. The fundamental components of a camera are a lens and a set of photosensitive sensors, each one corresponding to an image pixel. The incoming light intensity is measured by these sensors, and stored in the corresponding pixel. The final result, i.e., the computer image, is nothing but a matrix of pixels, the value of which is proportional to the brightness of the corresponding point in the scene.


2.1.2 Image segmentation

Image segmentation is the process of partitioning an image into components, regions or objects, so as to change the representation of the image into something easier to analyze in order to extract information from the region of interest. Several algorithms and methods have been developed for image segmentation; however, not every method is equally appropriate for a given class of images. Many factors, e.g., color, intensity and noise level, influence the outcome of the image segmentation procedure. A complete overview of these techniques is presented in [26]. Particularly important for the application object of this work are edge detection methods. The concept behind edge detection is straightforward: given a set of pixels described by gray-level values, an image edge is represented by an abrupt (step) change in intensity. Once the edge of an object has been detected, the object itself can be segmented from the image. The output obtained by applying edge detection algorithms is a binary image, i.e., an image that has only two possible values for each pixel, which usually correspond to the colors black and white.

2.1.3 Feature extraction

Usually this phase is referred to as high-level feature extraction to distinguish it from low-level feature extraction, to which the previously mentioned edge detection belongs [27]-[28]. While low-level features are those basic features that can be extracted from an image without any shape information, high-level ones concern finding shapes and recognizing objects. It is worth noting that extraction is far more complex than detection, since extraction implies that we have a description of a shape, such as its position and size, whereas detection of a shape merely implies knowledge of its existence within an image.

2.2 Edge detection

Many approaches to image interpretation are based on edges, since edge analysis is insensitive to changes in the overall illumination level. Edge detection highlights image contrast. Detecting contrast, which is a difference in intensity, can emphasize the boundaries of features within an image. Essentially, as already mentioned, the boundary of an object is a step change in the intensity levels and the edge is located at the position of this step change. Three fundamental steps are commonly used in these techniques:

1. Filtering: in order to facilitate the detection of edges, it is essential to suppress as much noise as possible and to determine changes in intensity in the neighbourhood of a point, without destroying the true edges;


2. Detection of edge points: determine which edge pixels should be discarded as noise and which should be retained (usually, thresholding provides the criterion used for detection);

3. Edge localization: not all of the points in an image are edges for a particular application. Edge localization determines the exact location of an edge.

Generally speaking, edge-detection operators are based on differentiation, of first or higher order, of the pixel values of an image. As an example, consider a horizontal edge-detection operator. When applied to an image P, it detects vertical edges, Ex, by differencing horizontally adjacent points:

$$Ex_{x,y} = |P_{x,y} - P_{x+1,y}| \qquad \forall x \in 1, N-1;\; y \in 1, N. \quad (2.1)$$

However, this operator will not show up horizontal changes in intensity. To detect horizontal edges, a vertical edge operator is needed. This will determine horizontal intensity changes but not vertical ones, so the vertical edge detector detects horizontal edges, Ey, according to

$$Ey_{x,y} = |P_{x,y} - P_{x,y+1}| \qquad \forall x \in 1, N;\; y \in 1, N-1. \quad (2.2)$$

Combining the two gives an operator E that can detect vertical and horizontal edges together, that is

$$E_{x,y} = |2P_{x,y} - P_{x+1,y} - P_{x,y+1}| \qquad \forall x, y \in 1, N-1. \quad (2.3)$$

Equation (2.3) gives the coefficients of a differencing template which can be convolved with an image to detect all the edge points, usually written as

$$M = \begin{bmatrix} 2 & -1 \\ -1 & 0 \end{bmatrix}. \quad (2.4)$$

Clearly, differencing adjacent points provides an estimate of the first-order derivative at a point. If the point is taken between points separated by ∆x then, exploiting the Taylor series expansion of a function f, we obtain

$$f(x+\Delta x) = f(x) + f'(x)\,\Delta x + f''(x)\,\frac{\Delta x^2}{2!} + O(\Delta x^3). \quad (2.5)$$

By rearrangement, the first-order derivative f'(x) is

$$f'(x) = \frac{f(x+\Delta x) - f(x)}{\Delta x} - O(\Delta x). \quad (2.6)$$

This shows that the difference between adjacent points is an estimate of the first-order derivative, with error O(∆x). This error depends on the size of the interval ∆x and on the complexity of the curve. In practice, the small sampling of image pixels and the reduced high-frequency content make this approximation adequate. However, the error can be reduced by spacing the differenced points by one pixel. This is equivalent to computing the first-order difference delivered by equation (2.1) at two adjacent points, as a new horizontal difference Exx where

$$Exx_{x,y} = Ex_{x+1,y} + Ex_{x,y} = P_{x+1,y} - P_{x,y} + P_{x,y} - P_{x-1,y} = P_{x+1,y} - P_{x-1,y}. \quad (2.7)$$

This is equivalent to incorporating spacing to detect the edges Exx by

$$Exx_{x,y} = |P_{x+1,y} - P_{x-1,y}| \qquad \forall x \in 2, N-1;\; y \in 1, N. \quad (2.8)$$

To analyze this result, expanding also f(x − ∆x) in Taylor series and differencing it from (2.5), we obtain

$$f'(x) = \frac{f(x+\Delta x) - f(x-\Delta x)}{2\,\Delta x} - O(\Delta x^2). \quad (2.9)$$

This result suggests that the estimate of the first-order difference is now the difference between points separated by one pixel, with error O(∆x²). If ∆x < 1, this error is clearly smaller than the one associated with differencing adjacent pixels. The template associated with this operation is given by

$$M_x = \begin{bmatrix} 1 & 0 & -1 \end{bmatrix}. \quad (2.10)$$

It gives the vertical edges detected at its central pixel. A transposed version of the template gives a horizontal edge-detection operator. Based on these concepts, other first-order edge-detection operators exist.
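A minimal NumPy sketch of these differencing templates applied to a synthetic step image is shown below; the test image and the use of scipy.ndimage.convolve are illustrative choices, not part of the thesis implementation.

```python
import numpy as np
from scipy.ndimage import convolve

# Synthetic 8x8 test image: dark on the left, bright on the right
# (a vertical step edge in the middle).
P = np.zeros((8, 8))
P[:, 4:] = 100.0

# Template of eq. (2.10): horizontal differencing of points spaced by one pixel,
# which responds to vertical edges; its transpose responds to horizontal edges.
Mx = np.array([[1, 0, -1]])
My = Mx.T

Ex = np.abs(convolve(P, Mx, mode="nearest"))
Ey = np.abs(convolve(P, My, mode="nearest"))

print(Ex[4])        # strong response around the vertical edge (column 4)
print(Ey[:, 4])     # no response: there are no horizontal edges
```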

2.2.1 Roberts cross operator

The Roberts cross operator (Roberts, 1965) was one of the earliest edge-detection operators. It implements a version of basic first-order edge detection and uses two templates that differentiate pixel values in a diagonal manner, as opposed to along the axes' directions. The two templates are called M− and M+ and are given by

$$M^- = \begin{bmatrix} +1 & 0 \\ 0 & -1 \end{bmatrix}, \qquad M^+ = \begin{bmatrix} 0 & +1 \\ -1 & 0 \end{bmatrix}. \quad (2.11)$$

In implementation, the maximum value delivered by the application of these templates is stored as the value of the edge at that point. The edge point $E_{x,y}$ is then the maximum of the two values derived by convolving the two templates at an image point:

$$E_{x,y} = \max\left\{\,|M^+ * P_{x,y}|,\; |M^- * P_{x,y}|\,\right\}, \quad (2.12)$$


where by ∗ we denote the convolution product. Alternatively, the results coming from the application of the two templates can be considered as providing the components of an edge vector: the strength along the horizontal and vertical axes. The edge magnitude is the length of the vector and the edge direction is the vector's orientation.

2.2.2 Prewitt operator

The edge-detection operation, being strictly related to differentiation, is strongly affected by noise. It is therefore prudent to incorporate some sort of averaging within the process itself. Therefore, usually the differentiation templates are extended along a higher number of rows and columns, typically three. This idea is at the basis of the Prewitt edge-detection operator (Prewitt and Mendelsohn, 1966) that consists of the two templates:

$$M_x = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 0 & -1 \\ 1 & 0 & -1 \end{bmatrix}, \qquad M_y = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ -1 & -1 & -1 \end{bmatrix}. \quad (2.13)$$

The image, as already mentioned, is then convolved with these two kernels:

$$Mx = M_x * P_{x,y}, \qquad My = M_y * P_{x,y} \qquad \forall x, y \in 1, N-1. \quad (2.14)$$

Finally, the edge magnitude M and the edge direction θ are simply determined as

$$M(x,y) = \sqrt{Mx(x,y)^2 + My(x,y)^2}, \quad (2.15)$$

$$\theta(x,y) = \arctan\!\left(\frac{My(x,y)}{Mx(x,y)}\right). \quad (2.16)$$
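The sketch below applies the Prewitt templates of eq. (2.13) and computes the magnitude and direction of eqs. (2.15)-(2.16) on a synthetic image; it is an illustration only, with arbitrary test data.

```python
import numpy as np
from scipy.ndimage import convolve

# Prewitt templates of eq. (2.13); doubling the central row/column weights
# would give the Sobel templates of Section 2.2.3.
Mx = np.array([[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]], dtype=float)
My = np.array([[ 1,  1,  1],
               [ 0,  0,  0],
               [-1, -1, -1]], dtype=float)

def gradient_magnitude_direction(P):
    """Edge magnitude and direction of eqs. (2.14)-(2.16)."""
    gx = convolve(P, Mx, mode="nearest")
    gy = convolve(P, My, mode="nearest")
    M = np.hypot(gx, gy)                 # eq. (2.15)
    theta = np.arctan2(gy, gx)           # eq. (2.16), four-quadrant arctangent
    return M, theta

# Tiny synthetic example: a bright square on a dark background
P = np.zeros((16, 16))
P[4:12, 4:12] = 255.0
M, theta = gradient_magnitude_direction(P)
print(M.max(), np.rad2deg(theta[4, 8]))  # strong response along the square's border
```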

2.2.3 Sobel operator

When the weight of the central pixels, for both Prewitt templates, is doubled, this gives the well-known Sobel edge-detection operator (Sobel, 1970), which, again, consists of two masks to determine the edge in vector form. It was the most popular edge-detection operator until the development of edge-detection techniques with a theoretical basis. This was due to the fact that it provided better performance than other contemporaneous edge-detection operators. However, some theoretical considerations can be made considering the combination of averaging and differencing. Let us firstly consider image averaging, which, as previously mentioned, has the aim of reducing noise. In images, noise arises in sampling, in quantization, in transmission and in processing. Thus, according to the central limit theorem, the result of adding these independent noise sources together is


a Gaussian-distributed noise source. As a matter of fact, this is not necessarily true, but it can be considered a good approximation. Among the averaging operators, Gaussian averaging retains more features while attenuating noise and it is particularly used in edge detection operations. The Gaussian function g at coordinates (x, y) is controlled by its variance σ² according to

$$g(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}}. \quad (2.17)$$

A surface plot of the 2D Gaussian function is shown in Figure 2.1.

Figure 2.1: Gaussian function with zero mean and standard deviation σ = 1

Let us now consider that the binomial expansion gives the integer coefficients of a series that, in the limit, approximates the normal distribution. As a result, Pascal's triangle gives sets of coefficients for a smoothing operator which, in the limit, approaches the coefficients of a Gaussian smoothing operator. Pascal's triangle is presented in Figure 2.2 for different window sizes.

Window size 2: 1 1
Window size 3: 1 2 1
Window size 4: 1 3 3 1
Window size 5: 1 4 6 4 1

Figure 2.2: Pascal's triangle

As can be seen, the smoothing coefficients for a window size of 3 are the ones used in the Sobel operator. As far as differencing is concerned, Pascal's triangle, but this time for subtraction, can be obtained as well. It is represented in Figure 2.3.

Window size 2: 1 -1
Window size 3: 1 0 -1
Window size 4: 1 1 -1 -1
Window size 5: 1 2 0 -2 -1

Figure 2.3: Pascal's triangle for subtraction

It is simply obtained by subtracting the templates derived from two adjacent expansions for a smaller window size. Again the row for the window size of 3 corresponds to the Sobel’s coefficients in the other direction with respect to the averaging ones. They are also the same coefficients which have been found through Taylor series expansion in (2.10).

In conclusion, the general form of the Sobel operator combines Gaussian smoothing along one axis and differencing along the other. This behaviour can be generalized, allowing the determination of larger templates.
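A small sketch of this construction is given below: the smoothing and differencing rows obtained from the two Pascal triangles are combined by an outer product, recovering the 3 × 3 Sobel template and producing a 5 × 5 generalization. The helper names are illustrative.

```python
import numpy as np
from math import comb

def pascal_smooth(n):
    """Row of Pascal's triangle for window size n (Gaussian-like smoothing)."""
    return np.array([comb(n - 1, k) for k in range(n)], dtype=float)

def pascal_diff(n):
    """Pascal's triangle for subtraction: difference of two adjacent smaller
    expansions, as described in the text (e.g., [1, 0, -1] for n = 3)."""
    s = pascal_smooth(n - 1)
    return np.concatenate((s, [0.0])) - np.concatenate(([0.0], s))

def sobel_like(n):
    """n x n first-derivative template: smoothing along one axis (rows)
    and differencing along the other (columns)."""
    return np.outer(pascal_smooth(n), pascal_diff(n))

print(sobel_like(3))   # [[1,0,-1],[2,0,-2],[1,0,-1]]: the classic Sobel Mx
print(sobel_like(5))   # a 5x5 generalization of the same construction
```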

Note that similar procedures are applied to find second order edge-detection operators, e.g., Marr-Hildreth operator [29].

2.2.4 Canny edge-detection operator

Among the edge detection techniques, the Canny edge-detection operator [30] is perhaps the most popular at present. It was formulated for the detection of step-type edges affected by white noise, with three main objectives:

1. Good detection. There should be a low probability of failing to mark real edge points, and a low probability of falsely marking non-edge points. Since both these probabilities are monotonically decreasing functions of the output signal-to-noise ratio, this criterion corresponds to maximizing the signal-to-noise ratio.

2. Good localization. The points marked as edges by the operator should be as close as possible to the centre of the true edge.

3. Only one response to a single edge. This is implicitly captured in (1) since when two nearby operators respond to the same edge, one of them must be considered a false edge. However, the mathematical form of the first criterion did not capture the multiple responses requirement and it had to be made explicit.


Let the step amplitude be A and the noise be n(x); then the considered input signal I(x) is

$$I(x) = A\, u(x) + n(x), \quad (2.18)$$

where u(x) is the unit step function. Let the impulse response of the sought operator be represented by f(x). Then the output $O(x_0)$ of the application of the operator to the input I(x) is given by the convolution integral

$$O(x_0) = \int_{-\infty}^{\infty} I(x)\, f(x_0 - x)\,dx. \quad (2.19)$$

Using the linearity of the convolution, the integral can be split into contributions due to the step and to the noise only. The output due to the step only is (at the centre of the step, i.e., $x_0 = 0$)

$$\int_{-\infty}^{\infty} f(x)\, A\, u(-x)\,dx = A \int_{-\infty}^{0} f(x)\,dx, \quad (2.20)$$

while the mean squared response to the noise component only will be

$$E\!\left[\left(\int_{-\infty}^{\infty} f(x)\, n(-x)\,dx\right)^{\!2}\right], \quad (2.21)$$

where E[y] is the expectation value of y. If the noise is white, the above simplifies to

$$E\!\left[\int_{-\infty}^{\infty} f^2(x)\, n^2(-x)\,dx\right] = n_0^2 \int_{-\infty}^{\infty} f^2(x)\,dx, \quad (2.22)$$

where $n_0^2 = E[n^2(x)]$ for all x, i.e., $n_0^2$ is the variance of the input noise. Thus, the output signal-to-noise ratio is defined as the quotient of the response to the step only and the square root of the mean squared noise response:

$$\mathrm{S.N.R.} = \frac{A \int_{-\infty}^{0} f(x)\,dx}{n_0 \sqrt{\int_{-\infty}^{\infty} f^2(x)\,dx}}. \quad (2.23)$$

From this expression we can define a measure Σ of the signal-to-noise performance of the operator which is independent of the input signal:

$$\mathrm{S.N.R.} = \frac{A}{n_0}\,\Sigma, \qquad \Sigma = \frac{\int_{-\infty}^{0} f(x)\,dx}{\sqrt{\int_{-\infty}^{\infty} f^2(x)\,dx}}. \quad (2.24)$$

The first criterion corresponds to finding the impulse response f which maximizes this quantity.

As far as the second criterion is concerned, consider that, while for an ideal step a single maximum at the centre of the edge is expected, since the signal I(x)


contains noise, this maximum is expected to be displaced from the true position of the edge. Canny, in order to obtain a performance measure for this criterion, used the reciprocal of the standard deviation of the distance of the actual maximum from the centre of the true edge. This choice was driven by the objective of having a scale independent criterion.

A maximum in the output $O(x_0)$ of the operator corresponds to a zero-crossing in the spatial derivative of this output. We wish to find the position $x_0$ where

$$O'(x_0) = \frac{d}{dx_0}\int_{-\infty}^{\infty} f(x)\, I(x_0 - x)\,dx = 0, \quad (2.25)$$

which, by the differentiation theorem for convolution, can be simplified to

$$\int_{-\infty}^{\infty} f'(x)\, I(x_0 - x)\,dx = 0. \quad (2.26)$$

To find $x_0$ it is possible again to split the derivative of the output into components due to the step and due to the noise only (call these $O'_s$ and $O'_n$ respectively):

$$O'_s(x_0) = \int_{-\infty}^{\infty} f'(x)\, A\, u(x_0 - x)\,dx = \int_{-\infty}^{x_0} A\, f'(x)\,dx = A\, f(x_0). \quad (2.27)$$

The response of the derivative filter to the noise only (at any output point) will be a Gaussian random variable with zero mean and variance equal to the mean-squared output amplitude:

$$E\!\left[O'^{2}_n(x)\right] = n_0^2 \int_{-\infty}^{\infty} f'^2(x)\,dx. \quad (2.28)$$

We now add the constraint that the function f should be antisymmetric. An arbitrary function can always be split into symmetric and antisymmetric components, but it should be clear that the symmetric component adds nothing to the detection or localizing ability of the operator, while it will contribute to the noise components that affect both. The Taylor expansion of $O'_s(x_0)$ about the origin gives

$$O'_s(x_0) = A\, f(x_0) \approx x_0\, A\, f'(0). \quad (2.29)$$

For a zero-crossing in the output O' we require

$$O'(x_0) = O'_s(x_0) + O'_n(x_0) = 0, \quad (2.30)$$

i.e., $O'_s(x_0) = -O'_n(x_0)$ and $E[O'^{2}_s(x_0)] = E[O'^{2}_n(x_0)]$. Substituting for the two outputs from (2.28) and (2.29), we obtain

$$E[x_0^2] \approx \frac{\int_{-\infty}^{\infty} n_0^2\, f'^2(x)\,dx}{A^2\, f'^2(0)} = \delta x_0^2, \quad (2.31)$$


where $\delta x_0$ is an approximation of the standard deviation of the distance of the actual maximum from the true edge. The localization is defined as the reciprocal of $\delta x_0$:

$$\mathrm{Localization} = \frac{A}{n_0}\, \frac{|f'(0)|}{\sqrt{\int_{-\infty}^{\infty} f'^2(x)\,dx}}. \quad (2.32)$$

Again, we define a performance measure Λ which is a property of the operator only:

$$\mathrm{Localization} = \frac{A}{n_0}\,\Lambda, \qquad \Lambda = \frac{|f'(0)|}{\sqrt{\int_{-\infty}^{\infty} f'^2(x)\,dx}}. \quad (2.33)$$

The two criteria just derived must now be combined in a meaningful way. It turns out that, if the product is used, a measure that is both amplitude and scale independent is obtained. This measure is a property of the shape of the impulse response f only:

$$\Sigma(f)\,\Lambda(f) = \frac{\int_{-\infty}^{0} f(x)\,dx}{\sqrt{\int_{-\infty}^{\infty} f^2(x)\,dx}}\; \frac{|f'(0)|}{\sqrt{\int_{-\infty}^{\infty} f'^2(x)\,dx}}. \quad (2.34)$$

All that remains is to find a function which maximizes this expression. Numerical methods were used by Canny to find a finite number of parameter values once the solution had been reduced to a parametric form. However, it was noted that the final result was approximately equal to a first derivative of a Gaussian:

$$f(x) = -\frac{x}{\sigma^2}\, e^{-\frac{x^2}{2\sigma^2}}. \quad (2.35)$$

The performance of this approximation turned out to be about 20% worse than that of the optimal solution. However, since it would probably be difficult to detect this difference by looking at the performance of the two operators on real images, and because the first derivative of a Gaussian operator can be computed with much less effort in two dimensions, it has been used in all practical applications.
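As a numerical illustration of the criteria above, the following sketch evaluates Σ (eq. 2.24), Λ (eq. 2.33) and their product (eq. 2.34) for the first derivative of a Gaussian (eq. 2.35): the product is independent of σ and evaluates to about 0.92, consistent with the roughly 20% loss with respect to the numerically optimized operator. The helper functions are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def f(x, sigma):
    """First derivative of a Gaussian, eq. (2.35)."""
    return -x / sigma**2 * np.exp(-x**2 / (2.0 * sigma**2))

def f_prime(x, sigma):
    """Analytical derivative of f, needed for the localization criterion."""
    return (x**2 / sigma**2 - 1.0) / sigma**2 * np.exp(-x**2 / (2.0 * sigma**2))

def sigma_lambda(sigma):
    """Detection measure Sigma (eq. 2.24) and localization measure Lambda (eq. 2.33)."""
    num_S, _ = quad(f, -np.inf, 0.0, args=(sigma,))
    den_S, _ = quad(lambda x: f(x, sigma)**2, -np.inf, np.inf)
    Sigma = num_S / np.sqrt(den_S)
    den_L, _ = quad(lambda x: f_prime(x, sigma)**2, -np.inf, np.inf)
    Lambda = abs(f_prime(0.0, sigma)) / np.sqrt(den_L)
    return Sigma, Lambda

for s in (0.5, 1.0, 2.0):
    S, L = sigma_lambda(s)
    print(f"sigma={s}: Sigma={S:.3f}, Lambda={L:.3f}, product={S*L:.3f}")
# Sigma grows and Lambda shrinks with sigma, but their product stays ~0.92.
```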

Until now, in one dimension, the step edge has been characterized by one position coordinate. In two dimensions, the edge direction must also be taken into account. In order to detect an edge of a particular orientation, we can convolve the image with an operator which gives the first derivative in the direction normal to that of the edge. According to the analysis done in one dimension, we will use an operator g_n, which is a first derivative of a Gaussian function g, but now computed in the direction of the normal n⊥:

\[
g_n = \frac{\partial g}{\partial n_{\perp}}, \tag{2.36}
\]


where n⊥ can be estimated from the first-order derivative of the Gaussian function g convolved with the image P, and scaled appropriately as

\[
n_{\perp} = \frac{\nabla (P * g)}{|\nabla (P * g)|}. \tag{2.37}
\]

The location of the true edge point is at the maximum point of g_n convolved with the image. This maximum occurs when the differential (along n⊥) is zero:

\[
\frac{\partial (g_n * P)}{\partial n_{\perp}} = 0. \tag{2.38}
\]

By substituting equation (2.36) in (2.38), we obtain

\[
\frac{\partial^{2} (g * P)}{\partial n_{\perp}^{2}} = 0. \tag{2.39}
\]

Equation (2.39) provides the basis for a two-dimensional operator which meets the desired criteria. However, it is virtually impossible to achieve this exact implementation of Canny given the requirement to estimate the normal direction.

2.2.5 Implementation

A common approximation of the Canny edge-detector, readily implementable, can be summarized in the following steps:

1. Gaussian smoothing;
2. Use of first-derivative operator;
3. Use of non-maximum suppression;
4. Hysteresis thresholding.

It should be noted that the first two stages can be combined by directly using the derivative of the Gaussian function as the detector. However, in practical applications, these two steps are usually kept separate.

For what concerns the Gaussian smoothing, since the image is stored as a matrix of values corresponding to the intensities of its pixels, a discrete approximation of the Gaussian function is needed. Starting from equation (2.17), it is possible to calculate the coefficients of a Gaussian template that is then convolved with the image. The Gaussian function essentially removes the influence of points farther than 3σ in (radial) distance from the center of the template. The effect of choosing larger templates is to remove more detail and noise at the expense of losing features. The size of the template, then, essentially dictates the appropriate choice of the variance, which is chosen to ensure that the template coefficients drop to near zero at the template's edge. In this work, a 5 × 5 kernel of a Gaussian function with standard deviation σ = 1.4 has been used. Its expression is the following:

\[
G = \frac{1}{159}
\begin{bmatrix}
2 & 4 & 5 & 4 & 2 \\
4 & 9 & 12 & 9 & 4 \\
5 & 12 & 15 & 12 & 5 \\
4 & 9 & 12 & 9 & 4 \\
2 & 4 & 5 & 4 & 2
\end{bmatrix}. \tag{2.40}
\]

The result of its application is shown in Figure 2.4.
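As a minimal sketch of this step, the kernel (2.40) can be applied to a grayscale image as follows; the file names are placeholders, and cv2.filter2D actually performs correlation, which coincides with convolution here because G is symmetric.

```python
import cv2
import numpy as np

# 5x5 Gaussian template of equation (2.40), sigma = 1.4
G = np.array([[2,  4,  5,  4, 2],
              [4,  9, 12,  9, 4],
              [5, 12, 15, 12, 5],
              [4,  9, 12,  9, 4],
              [2,  4,  5,  4, 2]], dtype=np.float32) / 159.0

img = cv2.imread("test_image.png", cv2.IMREAD_GRAYSCALE)   # placeholder file name
smoothed = cv2.filter2D(img, -1, G)                         # same depth as the input image
# Roughly equivalent built-in alternative: cv2.GaussianBlur(img, (5, 5), 1.4)
cv2.imwrite("smoothed.png", smoothed)
```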

Figure 2.4: Image used as test example (a) and after Gaussian smoothing (b)

As next step, a 2D derivative operator is applied. In particular, the previously mentioned Sobel operator has been implemented. As highlighted in Section 2.2.2, the two convolution kernels are applied to the image and then the edge magnitude M and edge direction θ are computed. This information is then exploited to suppress the non-local maxima in the considered pixel's neighbourhood and in the direction of the gradient. For what concerns the direction of the gradient, it should be related to a direction that can be traced in an image. Considering a pixel in an image, indicated as ∗ in (2.41), there are only four possible directions when describing the surrounding pixels, namely 0° (horizontal direction, 1), 45° (positive diagonal, 2), 90° (vertical direction, 3) and 135° (negative diagonal, 4):

\[
\begin{matrix}
4 & 3 & 2 \\
1 & * & 1 \\
2 & 3 & 4
\end{matrix}. \tag{2.41}
\]

Consequently, the direction of the gradient must be approximated to one of these four possible values. As next step, the non-maximum suppression is applied along the gradient direction, normal to the edge. If the gradient magnitude of the considered pixel exceeds both the corresponding values of the neighbouring pixels along the gradient direction, then that pixel is marked as a maximum with value equal to its gradient magnitude, otherwise it is set to zero. In Figure 2.5, the results obtained after the application of the Sobel operator and non-maximum suppression are shown.


Figure 2.5: Result after Sobel operator application (a) and after non-maximum suppression (b)
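A minimal sketch of these two stages (Sobel gradient followed by non-maximum suppression along the quantized gradient direction), assuming the smoothed grayscale image produced above, is given below; it is a plain illustrative implementation rather than the code used in this work.

```python
import cv2
import numpy as np

def sobel_nms(smoothed):
    """Sobel gradient followed by non-maximum suppression along the gradient direction."""
    gx = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.hypot(gx, gy)                              # edge magnitude M
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0        # gradient direction in [0, 180) degrees

    suppressed = np.zeros_like(mag)
    rows, cols = mag.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            a = ang[i, j]
            # quantize the gradient direction to 0, 45, 90 or 135 degrees, as in (2.41)
            if a < 22.5 or a >= 157.5:                  # horizontal gradient
                n1, n2 = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                              # 45-degree diagonal
                n1, n2 = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                             # vertical gradient
                n1, n2 = mag[i - 1, j], mag[i + 1, j]
            else:                                       # 135-degree diagonal
                n1, n2 = mag[i - 1, j - 1], mag[i + 1, j + 1]
            if mag[i, j] >= n1 and mag[i, j] >= n2:
                suppressed[i, j] = mag[i, j]            # keep local maxima only
    return suppressed
```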

Finally, the last stage of the Canny edge-detector, which has the aim of satisfying its third and last criterion, consists of hysteresis thresholding. It requires two values, an upper and a lower threshold. Pixels with intensity gradient higher than the upper threshold are identified as edges, while those with values lower than the lower threshold are discarded. Pixels that lie in the range between the two thresholds are evaluated based on their connectivity with other pixels: if they are connected to pixels whose intensity gradient is higher than the upper threshold, they are considered to be part of that edge; otherwise, they are also discarded. The final result of the algorithm is shown in Figure 2.6.
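The hysteresis stage can be sketched as a breadth-first propagation from strong pixels through connected weak pixels; here suppressed is the magnitude image returned by the previous sketch, and t_low, t_high are user-chosen thresholds (this is again an illustrative implementation).

```python
from collections import deque
import numpy as np

def hysteresis(suppressed, t_low, t_high):
    """Keep strong pixels and any weak pixel 8-connected to a strong one."""
    strong = suppressed >= t_high
    weak = (suppressed >= t_low) & ~strong
    edges = strong.copy()

    queue = deque(zip(*np.nonzero(strong)))              # start from all strong pixels
    while queue:
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < edges.shape[0] and 0 <= nj < edges.shape[1]
                        and weak[ni, nj] and not edges[ni, nj]):
                    edges[ni, nj] = True                  # weak pixel attached to an edge
                    queue.append((ni, nj))
    return (edges * 255).astype(np.uint8)
```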

Figure 2.6: Final result (after hysteresis thresholding)

2.2.6 Thresholding

Compared with the other edge detection algorithms, Canny edge-detection can provide much better and more reliable results, which has made it the benchmark against which other methods are evaluated. However, it must be noted that this operator is very sensitive to the two thresholds used for hysteresis thresholding. In fact, images are easily affected by factors such as the environment and illumination, which can cause substantial gray-level changes between the image background and a target, degrading the edge detection performance of the operator because the parameters must be adjusted manually. Such weak-contrast conditions reduce the adaptability of the algorithm, causing it to easily fail to detect edges [31].

Various researchers have improved the original algorithm and proposed Canny edge-detection variants based on adaptive thresholds. One of the most popular solutions is the one exploiting Otsu thresholding (Otsu, 1979) [32]. This method is based on the gray-level histogram, which shows how the individual brightness levels are occupied in an image [33]: it plots the number of pixels with a particular brightness level against the brightness level itself. For 8-bit pixels, the brightness ranges from 0 (black) to 255 (white). This kind of histogram makes it possible to distinguish an object from the background based on the different ranges of intensities present. Otsu's technique maximizes the likelihood that the threshold is chosen so as to split the image between the object and its background. This is achieved by selecting the threshold that gives the best separation of the classes, over all pixels in the image. Otsu's technique is formulated as follows. Denoting with L the number of gray levels in a picture, with n_i the number of pixels at level i and with N = n_1 + n_2 + ... + n_L the total number of pixels, the gray-level histogram is normalized and regarded as a probability distribution:

\[
p_i = \frac{n_i}{N}, \qquad p_i \ge 0, \qquad \sum_{i=1}^{L} p_i = 1. \tag{2.42}
\]

Now suppose to split the pixels into two classes C_0 and C_1, background and objects or vice versa, by a threshold at level k; C_0 denotes pixels with levels [1, ..., k] and C_1 pixels with levels [k + 1, ..., L]. The probabilities of class occurrence and the class mean levels, respectively, are given by:

\[
\omega_0 = \Pr(C_0) = \sum_{i=1}^{k} p_i = \omega(k), \tag{2.43}
\]
\[
\omega_1 = \Pr(C_1) = \sum_{i=k+1}^{L} p_i = 1 - \omega(k), \tag{2.44}
\]
and
\[
\mu_0 = \sum_{i=1}^{k} \frac{i\,p_i}{\omega_0} = \frac{\mu(k)}{\omega(k)}, \tag{2.45}
\]
\[
\mu_1 = \sum_{i=k+1}^{L} \frac{i\,p_i}{\omega_1} = \frac{\mu_T - \mu(k)}{1 - \omega(k)}, \tag{2.46}
\]
where
\[
\omega(k) = \sum_{i=1}^{k} p_i, \tag{2.47}
\]
and
\[
\mu(k) = \sum_{i=1}^{k} i\,p_i, \tag{2.48}
\]

are the zeroth- and the first-order cumulative moments of the histogram up to the kth level respectively, and

\[
\mu_T = \mu(L) = \sum_{i=1}^{L} i\,p_i, \tag{2.49}
\]

is the total mean level of the original image. In order to evaluate the performance of the threshold at level k, the between-class variance is computed:

\[
\sigma_B^2 = \omega_0(\mu_0 - \mu_T)^2 + \omega_1(\mu_1 - \mu_T)^2 = \omega_0\,\omega_1\,(\mu_1 - \mu_0)^2. \tag{2.50}
\]

Substituting the expressions for ω0 (2.43), ω1 (2.44), µ0 (2.45) and µ1 (2.46), the following expression is obtained:

\[
\sigma_B^2(k) = \frac{\left[\mu_T\,\omega(k) - \mu(k)\right]^2}{\omega(k)\left[1 - \omega(k)\right]}. \tag{2.51}
\]

The optimal threshold is the level for which the variance of class separability is at its maximum, namely the optimal threshold k* is the one for which

\[
\sigma_B^2(k^*) = \max_{1 \le k < L} \sigma_B^2(k). \tag{2.52}
\]
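Equations (2.42)-(2.52) translate almost line by line into a NumPy sketch that scans all candidate levels k, as shown below; this is an illustrative implementation (OpenCV's cv2.threshold with the cv2.THRESH_OTSU flag provides an equivalent, optimized version).

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level k* maximizing the between-class variance, equation (2.52)."""
    hist, _ = np.histogram(gray.ravel(), bins=256, range=(0, 256))
    p = hist.astype(np.float64) / hist.sum()            # p_i, equation (2.42)
    levels = np.arange(256)

    omega = np.cumsum(p)                                 # omega(k), equation (2.47)
    mu = np.cumsum(levels * p)                           # mu(k),    equation (2.48)
    mu_T = mu[-1]                                        # total mean level, equation (2.49)

    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_T * omega - mu) ** 2 / denom      # equation (2.51)
    sigma_b2[denom == 0] = 0.0                           # both classes must be non-empty
    return int(np.argmax(sigma_b2))                      # k*, equation (2.52)
```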


Figure 2.7: Gray-level histogram with Otsu thresholding (b) of image (a)

This method is capable of providing the best threshold value in the statistical sense and it is regarded as the most stable method in image threshold segmentation at present. However, the Otsu algorithm provides a unique global threshold for the image. An extension of this technique to the Canny edge-detector, where two threshold values are needed, has been proposed in [34]. Specifically, the value returned by the Otsu algorithm is taken as the higher threshold T_h, while the lower one is obtained by multiplying it by a coefficient lower than one. In particular, it was empirically found that a value ranging from 0.4 to 0.5 is appropriate in many applications [35]. It is worth noting, however, that the global nature of the two values obtained in this way may cause some loss of local characteristics in images with uneven background.
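Putting the pieces together, a possible realization of the adaptive-threshold Canny detector described above, using OpenCV and a ratio of 0.45 between the two thresholds (any value in the 0.4-0.5 range suggested by [35] would do), is sketched below; the file names are placeholders.

```python
import cv2

img = cv2.imread("pv_panel.png", cv2.IMREAD_GRAYSCALE)      # placeholder file name
smoothed = cv2.GaussianBlur(img, (5, 5), 1.4)                # 5x5 Gaussian, sigma = 1.4

# Otsu's method gives the higher threshold; the lower one is a fixed fraction of it
t_high, _ = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
t_low = 0.45 * t_high

edges = cv2.Canny(smoothed, t_low, t_high)                   # gradient + NMS + hysteresis
cv2.imwrite("edges.png", edges)
```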

2.3 High-level feature extraction

As already mentioned, high-level feature extraction concerns finding shapes and objects in computer images. Among the existing techniques, the simplest is thresholding. However, this is only likely to provide a solution when illumination and lighting can be controlled. As a consequence, two main approaches can be considered: one is to extract constituent parts and the other is to extract constituent shapes. In the former, the objects are represented as a collection of interest points or as a distribution of low-level features, e.g., the edges analyzed in the previous section. Conversely, the use of shape can be investigated. To this purpose, template matching, a model-based approach in which the shape is extracted by searching for the best correlation between a known model and the pixels in an image, can be used. As far as this thesis work is concerned, since the object of interest is fixed and repetitive in shape, the latter approach is preferable. In particular, a technique which can be thought of as belonging to the template matching approaches, called the Hough transform, has proved to be particularly successful in the detection and description of lines [36].

The Hough transform [37] is a technique that locates shapes in images. Its prime advantage, with respect to other techniques of the same class, is that it can deliver the same result as template matching, but faster [38]. This is achieved by a reformulation of the template matching process, based on an evidence-gathering approach where the evidence is the votes cast in an accumulator array. The Hough transform implementation defines a mapping from the image points into an accumulator space, called Hough space, which requires far fewer computational resources than template matching. Although this technique has been used to extract various shapes like lines, circles and ellipses, in this work only line extraction will be considered. Let us consider a line in R² defined as

\[
y = m x + b, \tag{2.53}
\]

where m is the slope and b is the intercept of the line. Using this representation of the line, the accumulator space is defined by the parameters m and b. Let us consider, for example, the three points represented in Figure 2.8a. Each of these points, of coordinates (x_i, y_i), is converted into a line in the accumulator space simply by considering

\[
b = y_i - m\,x_i, \qquad i = 1:3, \quad \forall m \in \mathbb{R}. \tag{2.54}
\]

The result of this operation is shown in Figure 2.8b.


Figure 2.8: Conversion of three points from the image space (a) to the accumulator space (b)
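To make the evidence-gathering idea concrete, the following sketch builds a small (m, b) accumulator for three example points and returns the cell with the most votes, i.e., the line on which the points are most likely to lie; the point coordinates, grid bounds and resolution are arbitrary choices for the illustration.

```python
import numpy as np

# Example image points (coordinates chosen only for illustration)
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

# Discretized (m, b) accumulator space
m_values = np.linspace(-3.0, 3.0, 121)
b_values = np.linspace(-5.0, 5.0, 201)
accumulator = np.zeros((len(m_values), len(b_values)), dtype=int)
b_step = b_values[1] - b_values[0]

for x, y in points:
    for i, m in enumerate(m_values):
        b = y - m * x                               # equation (2.54)
        j = np.argmin(np.abs(b_values - b))         # nearest accumulator cell
        if abs(b_values[j] - b) <= b_step / 2:      # ignore votes outside the grid
            accumulator[i, j] += 1                  # cast a vote

i_best, j_best = np.unravel_index(np.argmax(accumulator), accumulator.shape)
print(f"most voted line: y = {m_values[i_best]:.2f} x + {b_values[j_best]:.2f}")
# For collinear points the peak collects one vote per point (3 here).
```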

It is worth noting that one of the resulting lines in the accumulator space represents the set of all the lines in the image space that can be traced passing through the corresponding image point.
