Cost-effective visual odometry system for vehicle motion control in
agricultural environments
Shahzad Zaman a, Lorenzo Comba b,⁎, Alessandro Biglia a, Davide Ricauda Aimonino a, Paolo Barge a, Paolo Gay a
a DiSAFA – Università degli Studi di Torino, Largo Paolo Braccini 2, 10095 Grugliasco (TO), Italy
b DENERG – Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129 Torino, Italy
⁎ Corresponding author. E-mail address: [email protected] (L. Comba).
Keywords: Precision agriculture; Visual odometry; Unmanned ground vehicle (UGV); Real-time image processing; Agricultural field robots

Abstract
In precision agriculture, innovative, cost-effective technologies and improved solutions, aimed at making operations and processes more reliable, robust and economically viable, are still needed. In this context, robotics and automation play a crucial role, with particular reference to unmanned vehicles for crop monitoring and site-specific operations. However, unstructured and irregular working environments, such as agricultural scenarios, require specific solutions regarding positioning and motion control of autonomous vehicles.
In this paper, a reliable and cost-effective monocular visual odometry system, properly calibrated for the localisation and navigation of tracked vehicles on agricultural terrains, is presented. The main contribution of this work is the design and implementation of an enhanced image processing algorithm, based on the cross-correlation approach. It was specifically developed to use simplified hardware and a low-complexity mechanical system, without compromising performance. By providing sub-pixel results, the presented algorithm allows low-resolution images to be exploited, thus obtaining high accuracy in motion estimation with short computing times. The results, in terms of odometry accuracy and processing time, achieved during the in-field experimentation campaign on several terrains proved the effectiveness of the proposed method and its fitness for automatic control solutions in precision agriculture applications.
1. Introduction
Precision agriculture (PA) has been recognised as an essential approach to optimise crop-management practices and to improve the quality of field products while ensuring, at the same time, environmental safety (Ding et al., 2018; Grella et al., 2017; Lindblom et al., 2017). In very large fields and/or in fields located in hilly areas, cropland monitoring and maintenance may become a laborious task, requiring automatic machines and procedures (Comba et al., 2018; Grimstad and From, 2017). In this regard, unmanned ground vehicles (UGVs) are playing a crucial role in increasing efficiency in cultivation, e.g. in optimising the use of fertilisers or in precision weed control (Utstumo et al., 2018; Vakilian and Massah, 2017; De Baerdemaeker, 2013).
To perform agricultural in-field tasks with the least amount of human interaction, UGVs should be characterised by a high level of automation (van Henten et al., 2013; Kassler, 2001). Nowadays, developed autonomous navigation systems, which use GPS technologies (Bonadies and Gadsden, 2019) and/or machine vision approaches (García-Santillán et al., 2017), allow UGVs, for example, to follow crop rows autonomously, even in complex agricultural scenarios. A common requirement for these applications is a robust, up-to-date assessment of position and orientation during movements (Ghaleb et al., 2017). Despite the wide diffusion of GPS systems, they show limitations and drawbacks when high-precision navigation is required or where the satellite signal is poor, e.g. in covered areas, greenhouses or particular hilly regions (Ericson and Åstrand, 2018; Aboelmagd et al., 2013). In agricultural environments, UGV motion estimation by wheel odometry also encounters critical limitations due to wheel slippage on sloped terrains, which is very typical of some crops such as vineyards (Bechar and Vigneault, 2016; Aboelmagd et al., 2013; Nourani-Vatani et al., 2009). Visual odometry (VO), the measurement of the position and orientation of a system by exploiting the information provided by a set of successive images (Moravec, 1980), can provide reliable movement feedback in UGV motion control (Aqel et al., 2016; Scaramuzza and Fraundorfer, 2011). The hardware required to implement a VO system consists of one or more digital cameras, an image processing unit and an optional lighting system. Not requiring external signals or references, visual odometry has been proven to be very significant in
particular contexts where the GPS signal is weak or absent (even where the magnetic field cannot be exploited by a compass), by overcoming the limitations of other methodologies (Scaramuzza and Fraundorfer, 2011).
Two main typologies of VO systems can be defined on the basis of the number of cameras adopted: (1) stereo systems use data provided by multiple cameras, while (2) monocular systems, characterised by a simple and cost-effective setup, exploit a single digital camera. The image processing of stereo systems is typically complex and time consuming and requires accurate calibration procedures; indeed, an unsynchronised shutter speed between the stereo cameras can lead to errors in motion estimation (Aqel et al., 2016; Jiang et al., 2014). Moreover, the stereo system degrades to the monocular case when the stereo baseline (the distance between the two cameras) is small compared to the distance of the scene acquired by the cameras (Aqel et al., 2016).
The available image processing algorithms for VO applications follow two main approaches: (1) feature-based algorithms and (2) appearance-based algorithms. In feature-based VO, specific features/details detected and tracked in the sequence of successive images are exploited (Fraundorfer and Scaramuzza, 2012). Depending on the application, the performance to be achieved and the different approaches to feature selection, several algorithms can be found in the literature, such as Libviso (Geiger et al., 2012), Gantry (Jiang et al., 2014) or the Newton-Raphson search methods (Shi and Tomasi, 1994). A different approach is adopted in appearance-based algorithms, where successive image frames are searched for changes in appearance by extracting information regarding pixel displacement. The template matching process, which is a widely recognised approach among appearance-based VO solutions, consists in selecting a small portion within a frame (called template) and comparing it with a temporally subsequent image, then scoring the quality of the matching (Gonzalez et al., 2012; Goshtasby et al., 1984). This task has mainly been performed by using the sum of squared differences (SSD) and the normalised cross-correlation (NCC) as similarity measures (Aqel et al., 2016; Yoo et al., 2014; Nourani-Vatani et al., 2009). The latter matching measure, even if computationally heavier than SSD, is invariant to linear changes of image contrast and brightness (Mahmood and Khan, 2012; Lewis, 1995).
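To make the difference between the two similarity measures concrete, the following minimal NumPy sketch (not taken from the cited works; function names are illustrative) scores the same ground patch before and after a linear brightness/contrast change: the SSD score changes drastically, while the NCC score stays close to 1.

```python
import numpy as np

def ssd(template, patch):
    """Sum of squared differences: lower is a better match."""
    return float(np.sum((template.astype(float) - patch.astype(float)) ** 2))

def ncc(template, patch):
    """Normalised cross-correlation: 1.0 is a perfect match."""
    t = template.astype(float) - template.mean()
    p = patch.astype(float) - patch.mean()
    return float(np.sum(t * p) / np.sqrt(np.sum(t ** 2) * np.sum(p ** 2)))

# Toy example: the same ground patch re-imaged with a linear gain/offset change.
rng = np.random.default_rng(0)
template = rng.integers(0, 256, (41, 41)).astype(np.uint8)
brighter = np.clip(1.2 * template.astype(float) + 20.0, 0, 255)

print(ssd(template, brighter))   # large, although the underlying texture is identical
print(ncc(template, brighter))   # ~1.0, unaffected by the gain/offset change
```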
Motion assessment by VO systems has been proven to be particularly effective when integrated with other sensors, such as an inertial measurement unit (IMU), a compass sensor, a visual compass (Gonzalez et al., 2012), GPS technology or encoders (e.g. on wheels and tracks), to avoid error accumulation on long missions (Zaidner and Shapiro, 2016). Indeed, with particular attention to agricultural applications, innovative and reliable solutions should be developed to reduce system complexity and costs by implementing smart algorithms and by exploiting data fusion (Comba et al., 2016; Zaidner and Shapiro, 2016).
In this paper, a reliable and cost-effective monocular visual odometry system, properly calibrated for the localisation and navigation of tracked vehicles on agricultural terrains, is presented.
Nomenclature

CEPs  circular error probable of translation assessment errors [mm]
d_{i,j}  digital number of the pixel located at the i-th row and j-th column of image I
d̄_{u,v}  average value of the digital numbers within a portion of image I
[f_x, f_y]  x and y components of the image focal length [pixel]
g_x  image pixel spatial resolution along x [mm/pixel]
g_y  image pixel spatial resolution along y [mm/pixel]
h_c  camera height from the ground [mm]
I_k  grey scale image acquired at time instant t_k
l_{i,j}  digital number of the pixel located at the i-th row and j-th column of image L_k(θ)
l̄  average value of the digital numbers within template T_k(θ)
L_k(θ)  image obtained by rotating image I_k by angle θ
n  distance threshold from γ_M
m  coefficient setting the threshold value for γ
N_i × N_j  image size (height × width) [pixel]
O_k^{{UGV}_k}  origin of the {UGV}_k reference frame at time t_k
p_T  template size
p_{i,j}^{{UGV}_k}  position of pixel d_{i,j} in the reference frame {UGV}_k at time t_k [mm]
p_{u,v}^{{UGV}_{k+1}}  position of the template T_k(θ) centre in image I_{k+1} [mm]
[p_{c,x}, p_{c,y}]^T  position coordinates of the camera centre in the {UGV}_k reference frame [mm]
q(u, v, θ)  binary function selecting a neighbourhood of the maximum of γ(u, v, θ)
R(·)  rotation matrix
s(·) (or s_k^{k+1}(·))  evaluated vehicle translation (between time instants t_k and t_{k+1}) [mm]
s_r  reference vehicle translation [mm]
t_k  generic image acquisition time instant [s]
T_k(θ)  pixel subset, called template, of image L_k(θ)
U  ordered set of u indices
[u_e, v_e, θ_e]  weighted centroid of Ω
{UGV}_k  reference frame of the UGV at time t_k
V  ordered set of v indices
w_T  semi-width of the template T_k [pixels]

Greek letters

γ(u, v, θ)  normalised cross-correlation function
γ_M  maximum value of γ(u, v, θ)
Δθ  angular resolution of the VO process [deg]
Ω  specific subset of γ
ε_s  error in translation assessment between two successive images [mm]
ε_θ  error in orientation assessment between two successive images [deg]
θ  rotation angle of image L_k(θ) [deg]
θ̂  evaluated vehicle rotation [deg]
θ_r  reference vehicle rotation [deg]
θ_min  minimum value of θ [deg]
θ_max  maximum value of θ [deg]
Θ  ordered set of all considered rotation angles (Θ = {θ_min, θ_min + Δθ, …, θ_max}) [deg]
μ_θ  average of rotation assessment errors [deg]
σ_s  standard deviation of translation assessment errors [mm]
σ_θ  standard deviation of rotation assessment errors [deg]

Acronyms

CCD  charge-coupled device
CEP  circular error probable
GPS  global positioning system
GSD  ground sample distance
IMU  inertial measurement unit
NCC  normalised cross-correlation
PA  precision agriculture
SSD  sum of squared differences
UGV  unmanned ground vehicle
VO  visual odometry
The main contribution of this work is the design and implementation of an enhanced image processing algorithm, based on the cross-correlation approach, with sub-pixel capabilities. It was specifically developed to use simplified hardware and a low-complexity mechanical system, without compromising performance. In the implemented VO system, installed on a fully electric tracked UGV, ground image acquisition was performed by an off-the-shelf camera. The performance of the system, in terms of computing time and of movement evaluation accuracy, was investigated with in-field tests on several kinds of terrains typical of agricultural scenarios. In addition, the optimal set of algorithm parameters was investigated for the specific UGV navigation/motion control for precision agriculture applications.
The paper is structured as follows: Section 2 reports the description of the implemented tracked UGV and of the vision system. The proposed algorithm for visual odometry is presented in Section 3, while the results from the in-field tests are discussed in Section 4. Section 5 reports the conclusions and future developments.
2. System setup
The implemented VO system was developed to perform the motion and positioning control of a fully electric UGV specifically designed for precision spraying in tunnel crop management, where GPS technology is hampered by metal enclosures. Image acquisition is performed by a Logitech C922 webcam, properly positioned in the front part of the vehicle with a downward-looking setup at a height h_c of 245 mm from the ground. To improve the quality of the acquired images, the camera was shielded with a properly sized rigid cover to protect the portion of ground within the camera field of view from direct lighting, thus avoiding irregular lighting and the presence of marked shadows. The illumination of the observed ground surface is provided by a lighting system made of 48 SMD LED 5050 modules (surface-mount device light-emitting diodes) with an overall lighting power of more than 1000 lm and a power consumption of 8.6 W. Fig. 1 reports the diagram of the VO system setup together with an image of the implemented UGV system.
The image acquisition campaign was conducted on five different terrains (soil, grass, concrete, asphalt and gravel), typical of agricultural environments, in order to assess and quantify the performance of the proposed algorithm. Two datasets of more than 16,000 pairs of grey scale images (8-bit colour representation), at two image resolutions,
were processed. High-resolution images have a size of 1280 × 720 pixels (width and height), while low-resolution ones, obtained by down-sampling the high-resolution ones, are 320 × 240 pixels (width and height). Sample images at high and low resolution, acquired on the five different terrains, are shown in Fig. 2. A grey scale image I_k, acquired at time instant t_k, can be defined as an ordered set of digital numbers d_{i,j} as

I_k = { d_{i,j} ∈ [0, 1, …, 255] | 1 ≤ i ≤ N_i, 1 ≤ j ≤ N_j }   (1)

where i and j are the row and column indices, while N_i and N_j are the numbers of pixels per row and column, respectively.
The intrinsic camera parameters and acquisition settings were evaluated by performing a calibration procedure (Matlab© calibration toolbox). The focal length in pixels was (f_x, f_y) = (299.4122, 299.4303) and (f_x, f_y) = (888.5340, 888.8749) for the low-resolution and high-resolution images, respectively. The position [mm] of pixel d_{i,j} in the UGV reference frame {UGV}_k at time t_k, defined with origin O_k in the barycentre of the tracked system and with the x-axis aligned to the vehicle's forward motion direction (Fig. 4), can thus be easily computed as

p_{i,j}^{{UGV}_k} = [ (N_j/2 − j)·(h_c/f_x), (N_i/2 − i)·(h_c/f_y) ]^T + [p_{c,x}, p_{c,y}]^T   (2)

where h_c/f_x and h_c/f_y are the pixel spatial resolutions g_x and g_y [mm/pixel], respectively, and [p_{c,x}, p_{c,y}]^T are the position coordinates of the camera centre [mm] in {UGV}_k. In the implemented UGV, the position coordinates of the camera with respect to the barycentre of the tracked system are [950, 0]^T mm. The relevant camera and image intrinsic parameters adopted in this work are summarised in Table 1.
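As a concrete reading of Eq. (2), the following Python sketch maps a pixel index to ground coordinates in the {UGV}_k frame. It assumes the low-resolution calibration values of Table 1, takes N_i as the number of rows and N_j as the number of columns (the convention of Eq. (1)), and the function name and defaults are illustrative rather than taken from the authors' implementation.

```python
import numpy as np

def pixel_to_ugv(i, j, Ni=240, Nj=320, hc=245.0,
                 fx=299.4303, fy=299.4122, p_c=(950.0, 0.0)):
    """Map pixel (i-th row, j-th column) to ground coordinates [mm] in the
    {UGV}_k frame, following Eq. (2): the offset from the image centre scaled
    by the ground sample distance hc/f, plus the camera-centre position p_c."""
    gx = hc / fx                      # ~0.818 mm/pixel for the low-resolution setup
    gy = hc / fy
    x = (Nj / 2.0 - j) * gx + p_c[0]
    y = (Ni / 2.0 - i) * gy + p_c[1]
    return np.array([x, y])

# The image centre maps onto the camera footprint centre, [950, 0] mm.
print(pixel_to_ugv(120, 160))
```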
3. Visual odometry algorithms
In visual odometry, the objective of measuring the position and orientation of an object at time t_{k+1}, knowing its position and orientation at time t_k, is pursued by evaluating the relative movement of a rigidly mounted camera during the time interval t_{k+1} − t_k. This task is performed by comparing the image pair I_k and I_{k+1}, acquired at the ordered time instants t_k and t_{k+1}, respectively.
Fig. 1. Scheme (a) and picture (b) of the implemented UGV prototype. In the final version of the visual odometry system, the lower part of the shielding rigid cover […]

In the normalised cross-correlation (NCC) approach, a pixel subset T_k(θ) (also named template) is selected from the centre of the image L_k(θ), which is obtained by rotating image I_k by an angle θ, as

T_k(θ) = { l_{i,j} ∈ L_k(θ) | |i − N_i/2| ≤ w_T, |j − N_j/2| ≤ w_T }   (3)

where l_{i,j} is a digital number of image L_k(θ) and w_T is the semi-width [pixels] of the template T_k. The adopted template size p_T can be defined as a fraction of the shortest image dimension as p_T = 2·w_T·N_i⁻¹; with this definition p_T ∈ [0, 1]. With no assumption on the performed movement, the angle θ is usually selected from an ordered set of values Θ = {θ_min, θ_min + Δθ, …, θ_max}, with θ_min and θ_max chosen to cover the whole circle angle. The parameter Δθ can be defined as the angular resolution of the process.
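A minimal sketch of this template extraction step (Eq. (3)) is given below, assuming N_i counts image rows and using SciPy for the image rotation; the function name and parameter defaults are illustrative, not the authors' code.

```python
import numpy as np
from scipy.ndimage import rotate

def rotated_central_template(I_k, theta_deg, p_T=0.2):
    """Rotate image I_k by theta_deg about its centre (giving L_k(theta)) and
    cut out the central square template T_k(theta) of semi-width
    w_T = p_T * N_i / 2, mirroring Eq. (3)."""
    Ni, Nj = I_k.shape                                     # rows, columns
    L_k = rotate(I_k, theta_deg, reshape=False, order=1)   # L_k(theta)
    w_T = int(round(p_T * Ni / 2.0))
    ci, cj = Ni // 2, Nj // 2
    return L_k[ci - w_T: ci + w_T + 1, cj - w_T: cj + w_T + 1]

# Example: 2-degree candidate rotation of an arbitrary 240 x 320 frame.
frame = np.random.randint(0, 256, (240, 320), dtype=np.uint8)
print(rotated_central_template(frame, 2.0).shape)          # (2*w_T + 1, 2*w_T + 1)
```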
The relative movement of I_{k+1} with respect to image I_k, in terms of translation [u, v]^T [pixels] and rotation θ [deg], is thus assessed by finding the position of the ground portion represented in template T_k(θ) within the subsequent image I_{k+1}, by solving the problem

γ_M = max_{u, v, θ} γ(u, v, θ)   (4)

with u ∈ U = {w_T, w_T + 1, …, N_i − w_T}, v ∈ V = {w_T, w_T + 1, …, N_j − w_T} and θ ∈ Θ, where γ(u, v, θ) is the normalised cross-correlation function (Aqel et al., 2016; Lewis, 1995) defined as

γ(u, v, θ) = Σ_{i,j} (d_{u+i, v+j} − d̄_{u,v})·(l_{i,j} − l̄) / √( Σ_{i,j} (d_{u+i, v+j} − d̄_{u,v})² · Σ_{i,j} (l_{i,j} − l̄)² )   (5)

where all sums run over i, j = −w_T, …, w_T, with d_{u+i, v+j} ∈ I_{k+1} and l_{i,j} ∈ T_k(θ), and where

d̄_{u,v} = (4·w_T²)⁻¹ · Σ_{i=−w_T}^{w_T} Σ_{j=−w_T}^{w_T} d_{u+i, v+j}   (6)

and

l̄(θ) = (4·w_T²)⁻¹ · Σ_{i=−w_T}^{w_T} Σ_{j=−w_T}^{w_T} l_{i,j}(θ)   (7)

are the average values of the digital numbers within a portion of image I_{k+1} and within template T_k(θ), respectively. A scheme of the implemented NCC algorithm is reported in Fig. 3.
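The following brute-force sketch of the search in Eq. (4) is written for clarity rather than speed (a real-time system clearly needs a far more efficient implementation, e.g. FFT-based correlation); all function names are illustrative and not taken from the paper.

```python
import numpy as np
from scipy.ndimage import rotate

def ncc_score(patch, template):
    """Normalised cross-correlation between two equally sized arrays (Eq. (5))."""
    d = patch.astype(float) - patch.mean()          # zero-mean patch, Eq. (6)
    t = template.astype(float) - template.mean()    # zero-mean template, Eq. (7)
    denom = np.sqrt(np.sum(d ** 2) * np.sum(t ** 2))
    return np.sum(d * t) / denom if denom > 0 else 0.0

def ncc_search(I_prev, I_next, thetas_deg, p_T=0.2):
    """Exhaustive solution of Eq. (4): for each candidate rotation theta, the
    central template of the rotated previous frame is compared against every
    admissible centre (u, v) of the next frame; the full score volume
    gamma(u, v, theta) and the best triple are returned."""
    Ni, Nj = I_prev.shape                            # rows, columns
    w_T = int(round(p_T * Ni / 2.0))
    ci, cj = Ni // 2, Nj // 2
    u_vals = np.arange(w_T, Ni - w_T)
    v_vals = np.arange(w_T, Nj - w_T)
    gamma = np.full((u_vals.size, v_vals.size, len(thetas_deg)), -1.0)
    for zi, theta in enumerate(thetas_deg):
        L_k = rotate(I_prev, theta, reshape=False, order=1)       # L_k(theta)
        T = L_k[ci - w_T: ci + w_T + 1, cj - w_T: cj + w_T + 1]   # Eq. (3)
        for ui, u in enumerate(u_vals):
            for vi, v in enumerate(v_vals):
                patch = I_next[u - w_T: u + w_T + 1, v - w_T: v + w_T + 1]
                gamma[ui, vi, zi] = ncc_score(patch, T)
    iM, jM, zM = np.unravel_index(np.argmax(gamma), gamma.shape)
    return (u_vals[iM], v_vals[jM], thetas_deg[zM]), gamma
```

For the per-angle 2-D part of this search, an off-the-shelf routine such as OpenCV's cv2.matchTemplate with the TM_CCOEFF_NORMED method computes the same normalised score far more efficiently.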
The relative movement s_k^{k+1} performed by the UGV in the time interval t_{k+1} − t_k (Fig. 4) can thus be easily computed as

s_k^{k+1}(u, v, θ) = p_{N_i/2, N_j/2}^{{UGV}_k} − R(θ) · p_{u,v}^{{UGV}_{k+1}}   (8)

where R(θ) is the rotation matrix of angle θ, p_{u,v}^{{UGV}_{k+1}} is the assessed position [mm] of template T_k(θ) in I_{k+1} (represented in {UGV}_{k+1}, Eq. (2)), and p_{N_i/2, N_j/2}^{{UGV}_k} is the known position [mm] of template T_k in I_k (represented in {UGV}_k, Eq. (2)). For the sake of clarity, it should be noted that p_{N_i/2, N_j/2}^{{UGV}_k} is equal to [p_{c,x}, p_{c,y}]^T, which is [950, 0]^T millimetres, and that s_k^{k+1}(u, v, θ) coincides with O_{k+1}^{{UGV}_k}, which is the origin of the reference frame {UGV}_{k+1} represented in the {UGV}_k reference frame (Fig. 4).
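The sketch below composes the relative motion from a matched template position, following the geometric reading of Eq. (8) given above (the ground patch that lay at the camera-centre position in {UGV}_k is re-detected at (u, v) in I_{k+1}); the sign conventions, the function name and the numeric defaults are my assumptions, not the authors' code.

```python
import numpy as np

def ugv_motion(u, v, theta_deg, Ni=240, Nj=320, hc=245.0,
               fx=299.4303, fy=299.4122, p_c=(950.0, 0.0)):
    """Relative UGV motion from the matched template centre (u, v) and rotation
    theta: p_uv is the patch position in {UGV}_{k+1} via Eq. (2); s is then the
    origin of {UGV}_{k+1} expressed in {UGV}_k, as stated for Eq. (8)."""
    p_c = np.asarray(p_c)
    p_uv = np.array([(Nj / 2.0 - v) * hc / fx + p_c[0],   # Eq. (2) applied to (u, v)
                     (Ni / 2.0 - u) * hc / fy + p_c[1]])
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    s = p_c - R @ p_uv                                    # translation [mm]
    return s, theta_deg

# Illustrative pure translation: the template centre re-detected 61 columns from
# the image centre corresponds, through Eq. (2), to a ~50 mm shift of the ground
# patch along the x axis, hence s is approximately [50, 0] mm.
print(ugv_motion(120, 160 + 61, 0.0))
```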
3.1. Enhanced cross-correlation algorithm
The quality of the UGV's movement measurement, using normalised cross-correlation-based visual odometry algorithms, is strictly related to the solution of the problem defined in Eq. (4). The approach of considering only the maximum value γ_M of γ(u, v, θ), with u ∈ {w_T, w_T + 1, …, N_i − w_T}, v ∈ {w_T, w_T + 1, …, N_j − w_T} and θ ∈ Θ, has intrinsic limitations regarding the maximum achievable accuracy. Indeed, the digital discretisation of the field of view performed by the digital camera and the discrete set of investigated orientations affect both the translation and the rotation assessments. The accuracy of the VO system is thus related to the adopted image resolution, being directly related to the pixel ground sample distances (GSD) g_x and g_y and to the angle step Δθ adopted in the image processing. Regarding this aspect, an accuracy improvement can be pursued by adopting high-resolution cameras, which provide images with smaller pixel GSDs g_x and g_y: favourable effects are obtained, at the same time, on the accuracy of [u, v]^T and on the achievable angular resolution. Indeed, concerning the rotation procedure of image L_k(θ), if the rotation angle θ is small, no modifications are obtained in the pixels' digital numbers in the central part of the image, where the template is selected. For the sake of clarity, the smallest Δθ values which lead to modifications of template T_k(θ), in relation to image resolution and template size p_T, are reported in Table 2.
However, increasing the image resolution leads to a considerable increment in the required computing load, which does not fit the real-time requirements of the VO algorithm application or requires technologies which are too expensive.
The proposed approach is aimed at increasing the VO assessment accuracy by using very low-resolution images, which drastically reduces the computing load while achieving results comparable to those obtained by processing high-resolution data. This translates into more cost-effective systems, requiring economical acquisition and processing hardware.
For this purpose, a function q(u, v, θ) was defined as

q(u, v, θ) = 0  if γ(u, v, θ) < m·γ_M  or  ‖([u, v, θ] − [u_M, v_M, θ_M]) ∘ [1, 1, Δθ⁻¹]‖₂ > n
q(u, v, θ) = 1  if γ(u, v, θ) ≥ m·γ_M  and  ‖([u, v, θ] − [u_M, v_M, θ_M]) ∘ [1, 1, Δθ⁻¹]‖₂ ≤ n   (9)

in order to consider a neighbourhood Ω of the maximum γ_M (Eq. (4)) of the discrete cross-correlation function γ(u, v, θ) in the space (u, v, θ), with values higher than m·γ_M. In particular, [u_M, v_M, θ_M] is the location of the maximum γ_M, n is the distance threshold from γ_M and m is the coefficient setting the value threshold. In this work, the adopted values are n = 5 and m = 0.95, on the basis of empirical evaluations. The Hadamard product (∘) with [1, 1, Δθ⁻¹] was adopted to normalise the weight of the three coordinates (u, v, θ).
The enhanced movement assessment is thus performed by computing the weighted centroid [u_e, v_e, θ_e] of Ω (Fig. 5), as

u_e = [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} u · q(u, v, θ_z) · γ(u, v, θ_z) ] / [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} q(u, v, θ_z) · γ(u, v, θ_z) ]   (10)

v_e = [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} v · q(u, v, θ_z) · γ(u, v, θ_z) ] / [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} q(u, v, θ_z) · γ(u, v, θ_z) ]   (11)

and

θ_e = [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} θ_z · q(u, v, θ_z) · γ(u, v, θ_z) ] / [ Σ_{u=w_T}^{N_i−w_T} Σ_{v=w_T}^{N_j−w_T} Σ_{z=1}^{card(Θ)} q(u, v, θ_z) · γ(u, v, θ_z) ]   (12)

With the proposed approach, the UGV's movement evaluation is no longer restricted to discrete values, since [u_e, v_e, θ_e] ∈ ℝ³.
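A compact sketch of this sub-pixel refinement, mirroring Eqs. (9)-(12) under the assumption that the θ axis of the score volume is already sampled in Δθ steps (so that index distance plays the role of the [1, 1, Δθ⁻¹] normalisation); function and variable names are illustrative.

```python
import numpy as np

def subpixel_refinement(gamma, u_vals, v_vals, thetas, n=5, m=0.95):
    """Enhanced estimate of Eqs. (9)-(12): keep the cells of the correlation
    volume gamma that lie within distance n of the maximiser and score at
    least m * gamma_M, then return their gamma-weighted centroid."""
    gamma_M = gamma.max()
    iM, jM, zM = np.unravel_index(np.argmax(gamma), gamma.shape)
    I, J, Z = np.indices(gamma.shape)
    dist = np.sqrt((I - iM) ** 2 + (J - jM) ** 2 + (Z - zM) ** 2)
    q = (gamma >= m * gamma_M) & (dist <= n)          # Eq. (9), binary selector
    w = q * gamma                                     # weights q * gamma
    u_e = np.sum(w * np.asarray(u_vals)[:, None, None]) / w.sum()   # Eq. (10)
    v_e = np.sum(w * np.asarray(v_vals)[None, :, None]) / w.sum()   # Eq. (11)
    t_e = np.sum(w * np.asarray(thetas)[None, None, :]) / w.sum()   # Eq. (12)
    return u_e, v_e, t_e
```

Used after the exhaustive search sketched above, the returned triple is real-valued, so the motion estimate is no longer tied to the pixel grid or to the Δθ step.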
4. Results and discussion
The performance of the proposed visual odometry system, developed for UGV motion estimation, was assessed by processing more than 16,000 images. The in-field tests were performed on different agricultural terrains by acquiring images on soil, grass, asphalt,
Table 1
Intrinsic parameters of the camera and of the processed images.

Image type        N_i    N_j    f_x (pixels)   f_y (pixels)   g_x (mm/pixel)   g_y (mm/pixel)
Low-resolution    320    240    299.4303       299.4122       0.8182           0.8183
High-resolution   1280   720    888.8749       888.5340       0.2756           0.2757
concrete and gravel. In particular, both rectilinear and curvilinear paths were planned. Considering the whole dataset, the travelled distance between two subsequent images ranges between 0 mm (static vehicle) and 70 mm, which guarantees a minimum overlapping area of 72%.
The relative rotation does not exceed the range [−9, +9] degrees, due to the short movement between two acquired frames. The image resolutions were 1280 × 720 pixels (high-resolution images) and 320 × 240 pixels (low-resolution images). To evaluate the performance improvements of the proposed algorithm, with sub-pixel capabilities, the set of acquired images was also processed by means of a standard VO algorithm (Computer Vision System Toolbox, MathWorks, 2018).
The performance analysis of the proposed VO system was carried out: (1) by assessing motion evaluation accuracy on pairs of successive images, using the high-resolution datasets as a reference, and (2) by computing the cumulative error with respect to in-field position
Fig. 4. Visual odometry variables layout: position p_{N_i/2, N_j/2}^{{UGV}_k} of template T_k in the UGV reference frame {UGV}_k (x_k and y_k axes with origin O_k); position p_{u,v}^{{UGV}_{k+1}} of template T_k(θ) in the updated UGV reference frame {UGV}_{k+1} (x_{k+1} and y_{k+1} axes with origin O_{k+1}); rotation angle θ of image I_{k+1} with respect to I_k and evaluated UGV movement s_k^{k+1}.
Table 2
Angular resolution Δθ_min as a function of the template size p_T.

Image resolution    p_T = 0.1    p_T = 0.2    p_T = 0.3
320 × 240           2.24 [deg]   1.15 [deg]   0.77 [deg]
1280 × 720          0.77 [deg]   0.39 [deg]   0.26 [deg]
Fig. 5. 3D cross-correlation matrix γ(u, v, θ) and the position coordinates [u_e, v_e, θ_e] of the weighted centroid obtained by the enhanced VO algorithm.
Table 3
Accuracy in translation evaluation provided by the standard and the enhanced algorithms, detailed for different terrains and considering the overall acquired data. Adopted template size p_T = 0.2. The percentage improvement achieved by the enhanced algorithm is also reported for every evaluation.

Terrain     Standard algorithm accuracy    Enhanced algorithm accuracy    Accuracy improvement
            CEPs [mm]    σ_s [mm]          CEPs [mm]    σ_s [mm]          CEPs [%]    σ_s [%]
Soil        0.45         0.19              0.19         0.10              58.48       50.11
Grass       0.35         0.19              0.19         0.14              44.10       24.43
Concrete    0.28         0.14              0.14         0.08              50.64       44.55
Asphalt     0.37         0.11              0.13         0.07              63.31       36.93
Gravel      0.39         0.14              0.16         0.07              57.40       52.29
Overall     0.37         0.16              0.16         0.09              54.79       41.66
references, travelling paths about 10 m long.
Concerning a pair of successive images, the errors in measuring the relative movement, ε_s, and the rotation, ε_θ, between two subsequent images were defined as

ε_s = ‖s(·) − s_r‖₂   (13)

and

ε_θ = θ̂ − θ_r   (14)

respectively, where s(·) (Eq. (8)) and θ̂ are the vehicle's movement and rotation evaluated by using the enhanced and the standard algorithms and by processing low-resolution images, while s_r and θ_r represent the reference measurements obtained from the high-resolution images. Concerning the translation assessment, accuracy was expressed by the circular error probable (CEPs) and standard deviation (σ_s) indices (Winkler et al., 2012) (Table 3), while the accuracy in measuring the changes in vehicle orientation was described by computing the average (μ_θ) and standard deviation (σ_θ) of the computed errors (Table 4). The results were detailed for each in-field test performed on a specific kind of terrain and, finally, computed by considering the whole image dataset. The overall accuracy in the translation assessment of the proposed algorithm across different terrains resulted in CEPs = 0.16 mm, with an improvement of around 54% with respect to the values obtained by processing the images with the standard algorithm, which shows a CEPs of 0.37 mm. The average error in the vehicle's orientation assessment was μ_θ = 0.26 degrees, with an improvement of around 67.6% with respect to the values obtained by processing the images with the standard algorithm. The typology of terrain slightly affects the achieved performance: on the grass surface, a lower performance improvement was found compared to the other terrains. Indeed, the greater variability in object height within the camera field of view can lead to additional perspective errors. Nevertheless, even in this complex scenario, improvements of 44% in the CEPs and of 34% in the orientation assessment were observed (CEPs = 0.19 mm and μ_θ = 0.42 degrees) compared to the values obtained by the standard algorithm. Boxplots of the errors ε_s and ε_θ, computed by considering the whole image dataset, are reported in Fig. 6 for the standard and enhanced algorithms. The x and y components of ε_s and the CEPs circles are detailed in Fig. 7, with errors represented
Table 4
Accuracy in orientation evaluation provided by the standard and the enhanced algorithms, detailed for different terrains and considering the overall acquired data. Adopted template size p_T = 0.2. The percentage improvement achieved by the enhanced algorithm is also reported for every evaluation.

Terrain     Standard algorithm accuracy    Enhanced algorithm accuracy    Accuracy improvement
            μ_θ [deg]    σ_θ [deg]         μ_θ [deg]    σ_θ [deg]         μ_θ [%]    σ_θ [%]
Soil        0.75         0.38              0.29         0.24              61.12      37.19
Grass       0.65         0.36              0.42         0.26              34.45      29.23
Concrete    0.96         1.19              0.18         0.14              81.59      88.48
Asphalt     1.08         1.50              0.15         0.15              86.46      90.09
Gravel      0.97         0.87              0.25         0.20              74.30      76.99
Overall     0.88         0.86              0.26         0.20              67.58      64.39
by using a colour bar.
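For reference, a minimal sketch of how such accuracy indices can be computed from the per-pair errors is given below. The circular error probable is taken here as the median radial error (the radius of the circle containing 50% of the errors); the paper cites Winkler and Bickert (2012) for this index, whose estimator may differ, so this is only one common reading, and the function names are illustrative.

```python
import numpy as np

def translation_stats(err_xy):
    """err_xy: (N, 2) array of per-pair translation error components [mm].
    Returns (CEPs, sigma_s): median radial error and standard deviation of
    the radial errors."""
    radial = np.linalg.norm(np.asarray(err_xy, dtype=float), axis=1)
    return np.median(radial), radial.std()

def rotation_stats(err_theta):
    """Average mu_theta and standard deviation sigma_theta of the per-pair
    orientation errors [deg]."""
    err_theta = np.asarray(err_theta, dtype=float)
    return err_theta.mean(), err_theta.std()
```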
The cumulative error was computed over 20 sample paths of the tracked vehicle with a length of 9.6 m, defined as a curvilinear path generated by a sinusoidal trajectory of 0.15 m amplitude and 3.2 m period. The number of acquired images per path repetition ranges between 156 and 166, with an average travelled distance between two consecutive frames of 61 mm. Defining a normalised cumulative error with respect to the travelled distance, the obtained values are 0.08 and 0.84 [deg·m⁻¹] for translation and orientation, respectively. The improvement compared to the standard algorithm is of about 60% for both the translation and the orientation assessments. The boxplots of all the obtained cumulative errors, expressed in normalised values, are reported in Fig. 8. Considering a constant travelled distance, the cumulative error is strictly related to the number of processed images, as every processing step contributes to the overall error. Under this assumption, to minimise the cumulative error, pairs of frames acquired at the largest distance that still guarantees the proper overlapping surface should be used. For this purpose, a multi-frame approach can further improve system performance (Jiang et al., 2014).
The optimal configuration of a VO system setup requires a thorough analysis of the parameters related to image processing and their tuning according to the application requirements. With particular attention to the overall VO system performance, the size p_T of the template T_k(θ) is a relevant algorithm parameter, since it is strictly related to (1) the motion measurement accuracy, (2) the allowed maximum length of the relative movement between two subsequent images, which should still assure the required overlapping surface of the template, (3) the computing time and, thus, (4) the maximum allowed velocity with a specific VO setup.
The template size p_T has a non-linear and non-monotonic effect on the overall VO system accuracy. Considering the translation assessment, by varying p_T within the range 0.05–0.35, an optimal value can be found that provides the best accuracy. Indeed, the proposed algorithm achieves CEPs = 0.16 mm for p_T = 0.20, while accuracy degrades to CEPs = 0.21 mm and CEPs = 0.22 mm for p_T = 0.05 and p_T = 0.35, respectively. The boxplots of the errors ε_s and ε_θ, obtained by setting p_T within the range 0.05–0.35, are reported in Figs. 9 and 10, respectively. The observed accuracy trend in determining the vehicle's orientation is similar to the one described for the translation, with the exception of the effect of p_T values greater than 0.20 on the accuracy decrement: it is less marked until p_T exceeds 0.6, values that lead to insufficient overlapping surfaces between two successive images. Indeed, regarding
Fig. 7. Representation of the x and y components of the errors ε_s obtained by the standard (a) and the enhanced (b) algorithm. Errors are represented with a colormap from black to red. The circle area bounded by the CEPs is represented with a blue solid line. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Boxplots of the normalised cumulative errors of translation (a) and rotation (b) assessment, measured on 20 repetitions of the 9.6 m long sample path on several terrains, obtained by the standard and the enhanced algorithm.
proper overlapping surfaces between successive images, the template size should not exceed a certain value. Larger template sizes p_T require a shorter relative movement of the vehicle between image acquisition time instants to avoid a complete mismatch between a pair of successive images. In the implemented VO system performance evaluation, increasing p_T from 0.1 to 0.6 limits the maximum allowed movement
Fig. 9. Boxplots of the accuracy in translation measurement ε_s, detailed for algorithm template sizes p_T from 0.05 to 0.4 and for the typology of travelled terrain (soil (a-b), grass (c-d), concrete (e-f), asphalt (g-h) and gravel (i-j)).
from 93.1 to 39.2 mm, requiring a higher framerate to keep proper image acquisition when considering a constant vehicle velocity.
Concerning the computing time, smaller p_T values drastically reduce the time required to process an image pair: considering the low-resolution dataset, the average computing time (0.02 s) using p_T = 0.05 is 88% shorter than the one required by p_T = 0.35
Fig. 10. Boxplots of the accuracy in rotation measurement ε_θ, detailed for algorithm template sizes p_T from 0.05 to 0.4 and for the typology of travelled terrain (soil (a-b), grass (c-d), concrete (e-f), asphalt (g-h) and gravel (i-j)).
(0.19 s). Fig. 11a reports the average computing time obtained for processing low- and high-resolution images with a template size p_T
ranging from 0.05 to 0.8.
Consequently, the allowed maximum velocity of the vehicle is strictly related to the template size: considering a constant computing power, smaller template sizes lead to higher vehicle maximum speeds, due to the concurrent effects on the processing time required for an image pair and on the length of the maximum allowed movement between two subsequent images. In the implemented VO system, processing low-resolution images with p_T = 0.05, the upper limit velocity (about 4.1 m·s⁻¹) is 91% greater than the one allowed by p_T = 0.35 (about 0.3 m·s⁻¹). The maximum allowed velocities for low- and high-resolution images, with respect to template sizes p_T ranging from 0.05 to 0.8, are represented in Fig. 11b.
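One way to read this trade-off, assuming the per-pair processing time is what limits the frame rate, is sketched below; the numbers are only illustrative combinations of values reported in the text for different p_T settings, and the relation itself is my simplification rather than the paper's stated formula.

```python
def max_velocity(max_move_mm, processing_time_s):
    """Speed bound when the per-pair processing time limits the frame rate:
    the vehicle must not travel more than the maximum allowed inter-frame
    movement while one image pair is being processed."""
    return (max_move_mm / 1000.0) / processing_time_s   # m/s

# Illustrative only: the order of magnitude matches the reported ~4.1 and ~0.3 m/s.
print(max_velocity(93.1, 0.02))    # ~4.7 m/s: small template, fast processing
print(max_velocity(39.2, 0.19))    # ~0.2 m/s: large template, slow processing
```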
5. Conclusions
In this paper, an enhanced image processing algorithm for a cost-effective monocular visual odometry system, aimed at obtaining highly reliable results at low computational cost for tracked UGV navigation in agricultural applications, is presented. The implemented VO system consists of a downward-looking, low-cost webcam sheltered by a rigid cover to acquire images under uniform LED lighting. Based on the normalised cross-correlation methodology, the proposed VO algorithm was developed to exploit low-resolution images (320 × 240 pixels), achieving sub-pixel accuracy in motion estimation. The reduced computational load allows the VO system to be applied in real-time applications using cost-effective hardware.
The robustness of the proposed VO algorithm was evaluated by performing an extensive in-field test campaign on several terrains typical of agricultural scenarios: soil, grass, concrete, asphalt and gravel. The relationship between system performance and the most relevant algorithm parameters was investigated in order to determine a proper final system setup.
The obtained overall accuracy, in terms of circular error probable and normalised cumulative error, which are 0.16 mm and 0.08 respectively, is compatible with UGV requirements for precision agriculture applications. The short computing time allows the vehicle to reach a maximum velocity limit higher than 4 m·s⁻¹.
Being based on relative motion assessment, the performance of VO systems degrades as the path length increases. Therefore, integration of the system with absolute references is required to maintain the needed accuracy during long mission paths.
Acknowledgements
The presented research was partially funded by the Italian Ministry of University and Research PRIN 2015 “Ottimizzazione di macchine operatrici attraverso l’analisi del profilo di missione per un’agricoltura più efficiente” (Prot. 2015KTY5NW) and the “ElectroAgri” Project, Manunet ERA-NET, Call 2016, P.O.R. FESR 2014-2020 European Program.
References
Aboelmagd, N., Karmat, T.B., Georgy, J., 2013. Fundamentals of Inertial Navigation, Satellite-based Positioning and their Integration. Springer. https://doi.org/10.1007/978-3-642-30466-8.
Aqel, M.O.A., Marhaban, M.H., Saripan, M.I., Ismail, N.Bt., 2016. Review of visual odometry: types, approaches, challenges, and applications. SpringerPlus 5. https://doi.org/10.1186/s40064-016-3573-7.
De Baerdemaeker, J., 2013. Precision agriculture technology and robotics for good agricultural practices. IFAC Proc. Volumes 46, 1–4. https://doi.org/10.3182/20130327-3-JP-3017.00003.
Bechar, A., Vigneault, C., 2016. Agricultural robots for field operations: concepts and components. Biosyst. Eng. 149, 94–111. https://doi.org/10.1016/j.biosystemseng.2016.06.014.
Bonadies, S., Gadsden, S.A., 2019. An overview of autonomous crop row navigation strategies for unmanned ground vehicles. Eng. Agric. Environ. Food 12, 24–31. https://doi.org/10.1016/j.eaef.2018.09.001.
Comba, L., Biglia, A., Ricauda Aimonino, D., Gay, P., 2018. Unsupervised detection of vineyards by 3D point-cloud UAV photogrammetry for precision agriculture. Comput. Electron. Agric. 155, 84–95. https://doi.org/10.1016/j.compag.2018.10.005.
Comba, L., Gay, P., Ricauda Aimonino, D., 2016. Robot ensembles for grafting herbaceous crops. Biosyst. Eng. 146, 227–239. https://doi.org/10.1016/j.biosystemseng.2016.02.012.
Ding, Y., Wang, L., Li, Y., Li, D., 2018. Model predictive control and its application in agriculture: a review. Comput. Electron. Agric. 151, 104–117. https://doi.org/10.1016/j.compag.2018.06.004.
Ericson, S.K., Åstrand, B.S., 2018. Analysis of two visual odometry systems for use in an agricultural field environment. Biosyst. Eng. 166, 116–125. https://doi.org/10.1016/j.biosystemseng.2017.11.009.
Fraundorfer, F., Scaramuzza, D., 2012. Visual odometry: Part II: matching, robustness, optimization, and applications. IEEE Rob. Autom. Mag. 19, 78–90. https://doi.org/10.1109/MRA.2012.2182810.
García-Santillán, I.D., Montalvo, M., Guerrero, J.M., Pajares, G., 2017. Automatic detection of curved and straight crop rows from images in maize fields. Biosyst. Eng. 156, 61–79. https://doi.org/10.1016/j.biosystemseng.2017.01.013.
Geiger, A., Lenz, P., Urtasun, R., 2012. Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR).
Ghaleb, F.A., Zainala, A., Rassam, M.A., Abraham, A., 2017. Improved vehicle positioning algorithm using enhanced innovation-based adaptive Kalman filter. Pervasive Mob. Comput. 40, 139–155. https://doi.org/10.1016/j.pmcj.2017.06.008.
Gonzalez, R., Rodriguez, F., Guzman, J.L., Pradalier, C., Siegwart, R., 2012. Combined visual odometry and visual compass for off-road mobile robots localization. Robotica 30, 865–878. https://doi.org/10.1017/S026357471100110X.
Goshtasby, A., Gage, S.H., Bartholic, J.F., 1984. A two-stage cross correlation approach to template matching. IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 374–378. https://doi.org/10.1109/TPAMI.1984.4767532.
Fig. 11. Template size p_T influence on the processing time (a) and on the maximum allowed UGV velocity (b). Results obtained by processing low-resolution and high-resolution images are represented by blue and red lines, respectively.
Grella, M., Gil, E., Balsari, P., Marucco, P., Gallart, M., 2017. Advances in developing a new test method to assess spray drift potential from air blast sprayers. Span. J. Agric. Res. 15. https://doi.org/10.5424/sjar/2017153-10580.
Grimstad, L., From, P.J., 2017. Thorvald II – a modular and re-configurable agricultural robot. IFAC-PapersOnLine 50, 4588–4593. https://doi.org/10.1016/j.ifacol.2017.08.1005.
Jiang, D., Yang, L., Li, D., Gao, F., Tian, L., Li, L., 2014. Development of a 3D ego-motion estimation system for an autonomous agricultural vehicle. Biosyst. Eng. 121, 150–159. https://doi.org/10.1016/j.biosystemseng.2014.02.016.
Kassler, M., 2001. Agricultural automation in the new millennium. Comput. Electron. Agric. 30, 237–240. https://doi.org/10.1016/S0168-1699(00)00167-8.
Lewis, J.P., 1995. Fast template matching. Vis. Interface 95, 120–123.
Lindblom, J., Lundström, C., Ljung, M., Jonsson, A., 2017. Promoting sustainable intensification in precision agriculture: review of decision support systems development and strategies. Precis. Agric. 18, 309–331. https://doi.org/10.1007/s11119-016-9491-4.
Mahmood, A., Khan, S., 2012. Correlation-coefficient-based fast template matching through partial elimination. IEEE Trans. Image Process. 21, 2099–2108. https://doi.org/10.1109/TIP.2011.2171696.
MathWorks, 2018. Computer Vision System Toolbox.
Moravec, H., 1980. Obstacle Avoidance and Navigation in the Real World by a Seeing Robot Rover. PhD thesis. Stanford University.
Nourani-Vatani, N., Roberts, J., Srinivasan, M.V., 2009. Practical visual odometry for car-like vehicles. In: IEEE International Conference on Robotics and Automation, pp. 3551–3557. https://doi.org/10.1109/ROBOT.2009.5152403.
Scaramuzza, D., Fraundorfer, F., 2011. Visual odometry Part I: the first 30 years and fundamentals. IEEE Rob. Autom. Mag. 18, 80–92. https://doi.org/10.1109/MRA.2011.943233.
Shi, J., Tomasi, C., 1994. Good features to track. In: IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/CVPR.1994.323794.
Utstumo, T., Urdal, F., Brevik, A., Dørum, J., Netland, J., Overskeid, Ø., et al., 2018. Robotic in-row weed control in vegetables. Comput. Electron. Agric. 154, 36–45. https://doi.org/10.1016/j.compag.2018.08.043.
van Henten, E.J., Bac, C.W., Hemming, J., Edan, Y., 2013. Robotics in protected cultivation. IFAC Proc. Volumes 46, 170–177. https://doi.org/10.3182/20130828-2-SF-3019.00070.
Vakilian, K.A., Massah, J., 2017. A farmer-assistant robot for nitrogen fertilizing management of greenhouse crops. Comput. Electron. Agric. 139, 153–163. https://doi.org/10.1016/j.compag.2017.05.012.
Winkler, V., Bickert, B., 2012. Estimation of the circular error probability for a Doppler-Beam-Sharpening-Radar-Mode. In: 9th European Conference on Synthetic Aperture Radar, pp. 368–371.
Yoo, J., Hwang, S.S., Kim, S.D., Ki, M.S., Cha, J., 2014. Scale-invariant template matching using histogram of dominant gradients. Pattern Recognit. 47, 3006–3018. https://doi.org/10.1016/j.patcog.2014.02.016.
Zaidner, G., Shapiro, A., 2016. A novel data fusion algorithm for low-cost localisation and navigation of autonomous vineyard sprayer robots. Biosyst. Eng. 146, 133–148. https://doi.org/10.1016/j.biosystemseng.2016.05.002.