4 Large-scale Image Registration using a Tiling-based Strategy

4.1 Introduction

Chapter 2 presented an innovative methodological framework for planetary image registration based on a two-step strategy composed of a coarser feature-based step and a finer area-feature-based step. Conversely, Chapter 3 proposed an area-based solution for the multisensor image registration problem, based on the domain adaptation capability of conditional generative adversarial networks. This chapter focuses again on a multisensor registration scenario, but this time in a large-scale setup. The proposed method addresses the challenging problem of multisensor image registration while keeping the computational cost and the time needed for convergence as low as possible, thus allowing the registration of large datasets. Moreover, the registration of images defined on a large pixel grid is also feasible, thanks to a tiling strategy with an ad-hoc scheduling mechanism.

It is worth mentioning that the work presented in this chapter has been experimentally validated in the context of the project "CCI+ HRLC: Climate Change Initiative Extension (CCI+) Phase 1, New Essential Climate Variables (NEW ECVS) High Resolution Land Cover ECV", funded by the European Space Agency (ESA). The goal of the project is to study how climate change correlates with high resolution land cover (LC) and land-cover changes (LCC). Such products are generated on the basis of optical and radar imagery collected from 1990 to 2019. Data fusion methodologies are therefore central to the project, since they make it possible to benefit jointly from the optical and SAR images available across this time window. Multisensor image registration (or multisensor geolocation) plays a major role in this framework.

Besides being deployed and validated in the context of the aforementioned project, the proposed multisensor registration method can be applied in many different scenarios. Its validity is not restricted to the special case of the CCI+ HRLC processing chain: as also stressed in the experimental analysis, it can be applied to all cases in which image registration is mandatory for the extraction of information from joint collections of data coming from different sensors (e.g., optical and SAR). Indeed, as will be shown later, the proposed method has been tested with a large variety of heterogeneous datasets, ranging from Sentinel-1 and Sentinel-2 to Landsat-7, Landsat-8, and ENVISAT ASAR, thus proving its applicability to a large spectrum of applications.

This chapter is organized as follows. Section 4.2 introduces the proposed methodology for large-scale multisensor image registration, with details on the state-of-the-art solutions, the adopted registration framework, the tiling strategy developed to cope with large datasets, and the possibility of applying non-stationary transformation models while keeping the computational requirements as low as possible. The achieved results are then reported in Section 4.3. The conclusions, together with possible future developments, are drawn in Section 4.6. Finally, the appendix in Section 4.A briefly introduces the CCI+ HRLC project, in the context of which the proposed methodology has been developed and experimentally validated, giving insight into the problem at hand and into the large-scale scenario that the proposed methodology has been designed for.

4.2 The Multisensor Geolocation Module

As anticipated, besides having been deployed and validated in the context of the CCI+ HRLC project (for further details please refer to Appendix 4.A), the proposed solution has a broader range of applicability, being able to cope with the registration of multisensor datasets in the more general framework of large-scale remote sensing. In particular, the architecture of the proposed multisensor registration method is strongly influenced by the large-scale nature of the problem it has been designed to solve. In order to process a large amount of data, it has been specifically designed to keep the computational requirements as low as possible, while preserving its effectiveness and supporting the generation of further products that make the best use of complementary data sources. Moreover, the proposed tiling-based strategy and the associated ad-hoc scheduling mechanism allow the application of the proposed multisensor registration method to images that are defined on large pixel grids. The combination of large-scale processing and applicability to large images is in line with the increasing interest in the field of big data within the remote sensing community.

As for the experimental validation, the proposed solution has proven flexible enough to process data collected by multiple satellites, and is therefore not tied to a specific type of data. Indeed, both in the context of the CCI+ HRLC project and in the experimental analysis reported in this thesis (see Section 4.3), many different sensors and satellites have been used as input sources of information, spanning a time frame that goes back to the year 1990. Additionally, the proposed method has been demonstrated to be flexible in the types of spatial features used for the registration, as they vary with the geographical area at hand.

Therefore, the rationale of this preliminary and general analysis is that, on the one hand, many factors constrain the design of the proposed module in view of its applicability to large-scale scenarios but, on the other hand, the resulting solution has proven effective and robust in the application to many different problems. As in Chapter 3, multisensor image registration is already a challenging and open problem per se, and the application to large-scale scenarios, as is the case for the CCI+ HRLC project, further increases the difficulty of the problem. Many design choices are required, and the rest of this section goes deeper into such details.

4.2.1 Previous Work

The registration problem addressed in this chapter belongs to a large-scale scenario, where the trade-off between the computational complexity and the robustness of the method is critical. In the literature, when the images to register cover a large area on the ground, or when the resolution is particularly high (i.e., the images are defined over a large pixel grid), non-homogeneous transformations are usually the primary choice [163, 164]. Non-homogeneous transformations, differently from the homogeneous case [16], are able to model the cases where the best transformation fitting the pair of images may change across the pixel grid. An example of such transformations is given by piecewise-linear functions [165]. In this case, the registration process divides the images into triangular elements (for example, by Delaunay triangulation) that are then individually mapped through a linear transformation.

In the context of feature-based image registration, non-homogeneous transformations have been studied in [163] and [164]. These works compare the characteristics of thin plate spline (TPS), multiquadric (MQ), piecewise linear (PL), and weighted mean (WM) transformations, as well as the performance of such transformation models in non-rigid and feature-based image registration problems. In particular, TPS and MQ are found to be most suitable when the set of control-point correspondences is not large (fewer than a thousand) and the variation in spacing between the control points is not large. Conversely, when the spacing between the control points varies greatly, PL is found to produce a more accurate registration than TPS and MQ. When a very large set of control points is given and the control points contain positional inaccuracies, WM is preferred over TPS, MQ, and PL because it uses an averaging process that smooths the noise and does not require the solution of a very large system of equations.

Due to the high computational complexity of such non-homogeneous transformations, a considerable amount of work in the literature has been devoted to reducing the complexity of the registration process. It is worth recalling that, during the optimization phase (please refer to Chapter 2, Chapter 3, or [16]), the input image is transformed many times in order to compute the similarity metric for different settings of the transformation parameters. Since applying non-homogeneous transformations is resource demanding, the resulting computational effort is not negligible. Moreover, non-homogeneous transformations are described by a large number of parameters, thus increasing the dimension of the search space during optimization.

To reduce the computational burden, [166] proposed a multi-scale version of a registration method using piecewise linear transformations. Different strategies have been followed by [167] and [168], which proposed a 2-step and a 4-step registration strategy, respectively. By reformulating the registration problem as a sequence of simpler sub-problems, each one granting an improved accuracy with respect to the previous stage, they were able to reduce the overall computational burden while also granting effectiveness and robustness.

The work proposed in this chapter is aimed at applying such non-homogeneous transformations in the context of multisensor image registration. To maximize the effectiveness and the robustness of the method, an area-based registration process is preferred to the feature-based case. The applicability to large-scale scenarios implies working in an unsupervised manner, and using the whole image area allows the whole set of spatial features to be exploited.

The idea implemented in this case is related to the tilewise processing applied for semantic segmentation in the work by Derksen, Inglada, and Michel [169]. Briefly, the essence of their work is that, in order to produce a segmentation map for a very large image, the image can be divided into a set of partially overlapping patches, and the segmentation can be run on each one of the patches. The final segmentation map can then be built from the results of the single patches, while also granting consistency across the patches by taking into account the overlapping areas. Moreover, in [169] the tilewise method allowed several pieces of the image to be processed simultaneously, thus enabling the deployment of the set of segmentation methods in a parallel processing environment.

Similarly to [169], the work proposed here makes use of tilewise processing to approximate a piecewise RST transformation. Moreover, provided that a specific scheduling is adopted, the tilewise processing allows the registration result of a given patch to be used as the initialization point for the subsequent one, thus strongly reducing the computational burden. Please refer to the next section for additional technical details.

Figure 4.1: Flowchart of the proposed tiling-based registration strategy.

4.2.2 The Proposed Tiling-based Strategy

Within the image registration literature, as also reported in Chapter 2, Chapter 3, and in the previous Section 4.2.1, there exist many different strategies for matching two remotely sensed images. As a brief recap, an image registration method is broadly composed of the following elements: (i) the geometric transformation used to warp the input image; (ii) the similarity measure used to compare the reference and input images during the registration process; and (iii) the optimization strategy used to minimize or maximize the similarity measure, depending on the semantics of the metric [16]. In the case at hand in this chapter, due to the multisensor nature of the problem and to the constraints deriving from the large-scale scenario, the possible choices are restricted to a subset of all the possible similarity metrics, optimization methods, and transformation models.

Figure 4.2: Details on the division of the images into patches.

As for the similarity metric, the choice fell on the mutual information between the reference optical image and the input SAR image. In order to keep the computational burden as low as possible, the Mattes mutual information [170] has been chosen as the final implementation. The marginal and joint probability density functions are evaluated from the histograms, which are computed from a predefined set of pixels and whose number of bins has been set equal to 50. It is worth mentioning that the accuracy of the proposed method is not sensitive to the specific value of this hyperparameter. Indeed, setting the number of bins anywhere in the range [40, 100] has been experimentally shown not to affect the final result (details of this preliminary experimental analysis are omitted for the sake of brevity). However, values below or above this range were observed to cause the method to converge to different solutions. Since a lower number of bins allows the method to converge faster (due to the lower number of operations required to compute the similarity measure), setting the number of bins to 50 keeps the computational requirements low while still ensuring that the method converges to the expected solution (choosing a lower value would have been better in terms of computation, but would not have been as conservative in terms of stability of the result).
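To make the histogram-based estimate concrete, the following minimal sketch computes the mutual information of two sets of co-located pixel values from a joint histogram with 50 bins. It is an illustration under simplifying assumptions and not the Mattes implementation [170]: plain equal-width binning is used instead of the smoothed density estimate of that method, and the function name `mutual_information` and its arguments are hypothetical.

```python
import math

def mutual_information(ref, inp, bins=50):
    """Histogram-based mutual information (in nats) of two equally long
    sequences of pixel values sampled at the same locations."""
    if len(ref) != len(inp) or not ref:
        raise ValueError("ref and inp must be non-empty and equally long")

    def bin_index(value, lo, hi):
        # Map a value to one of `bins` equal-width bins over [lo, hi].
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * bins), bins - 1)

    r_lo, r_hi = min(ref), max(ref)
    i_lo, i_hi = min(inp), max(inp)

    # Joint histogram stored sparsely as {(ref_bin, inp_bin): count}.
    joint = {}
    for r, i in zip(ref, inp):
        key = (bin_index(r, r_lo, r_hi), bin_index(i, i_lo, i_hi))
        joint[key] = joint.get(key, 0) + 1

    # Marginal probabilities obtained by summing the joint histogram.
    n = len(ref)
    p_ref, p_inp = {}, {}
    for (br, bi), count in joint.items():
        p_ref[br] = p_ref.get(br, 0.0) + count / n
        p_inp[bi] = p_inp.get(bi, 0.0) + count / n

    # MI = sum over occupied cells of p(r, i) * log(p(r, i) / (p(r) * p(i))).
    return sum((count / n) * math.log((count / n) / (p_ref[br] * p_inp[bi]))
               for (br, bi), count in joint.items())
```

The cost of the final sum grows with the number of occupied histogram cells, which is consistent with the observation that fewer bins make each evaluation of the similarity measure cheaper.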

Due to the non-differentiability of such a similarity metric, the optimization method has been chosen to be a modified version of Powell's minimization algorithm [171], whose application does not involve gradients or Hessian matrices and is feasible in the case of non-differentiable objective functions. Powell's conjugate direction method iteratively minimizes the objective function by a multi-directional search along a set of search vectors in the parameter space, which aims at emulating the behavior of conjugate gradient directions without the use of derivatives. The line search along each search vector is performed with Brent's minimization method [172], which combines golden-section search with successive parabolic interpolation. After each iteration, the set of search vectors is updated: the direction that contributed the most to the minimization of the objective is substituted with a linear combination of the search vectors, whose weights correspond to the scalar values determined during the line search along each vector.
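The direction-set update described above can be sketched in a few lines. This is a simplified illustration and not the modified algorithm used in this work: a plain golden-section search on a fixed bracket is swapped in for Brent's method to keep the code short, the objective is assumed to be unimodal along each line within that bracket, and the names `golden_section`, `powell_minimise`, and `span` are hypothetical.

```python
import math

def golden_section(f, a, b, tol=1e-7):
    """1-D minimiser on [a, b]; a simple stand-in for Brent's method."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:            # the minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                  # the minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def powell_minimise(f, x0, span=10.0, max_iter=50, tol=1e-10):
    """Derivative-free minimisation by successive line searches."""
    n = len(x0)
    x, fx = list(x0), f(x0)
    # Start from the coordinate axes as search vectors.
    directions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        x_start, f_start = list(x), fx
        best_drop, best_index = 0.0, None
        for index, d in enumerate(directions):
            def along(t, d=d, base=list(x)):
                # Objective restricted to the line base + t * d.
                return f([b + t * di for b, di in zip(base, d)])
            t = golden_section(along, -span, span)
            f_new = along(t)
            if f_new < fx:     # accept only improving steps
                if fx - f_new > best_drop:
                    best_drop, best_index = fx - f_new, index
                x = [xi + t * di for xi, di in zip(x, d)]
                fx = f_new
        if best_index is None or f_start - fx < tol:
            break
        # Replace the most productive direction with the net displacement,
        # i.e. the combination of search vectors weighted by the line steps.
        step = [xi - si for xi, si in zip(x, x_start)]
        norm = math.sqrt(sum(s * s for s in step))
        directions[best_index] = [s / norm for s in step]
    return x, fx
```

For instance, `powell_minimise(lambda p: (p[0] - 1) ** 2 + 2 * (p[1] + 2) ** 2 + p[0] * p[1], [0.0, 0.0])` approaches the minimiser of that convex quadratic without evaluating any derivative.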

The modification to the standard Powell’s minimization method simply integrates a set of barrier functions to cope with the unconstrained nature of the original optimization algorithm. This way it is possible to restrict the search space to the set of feasible transformations (i.e., based on the size and spatial resolution of the input images). Concerning the transformation model, as previously anticipated, the goal is to apply a non-homogeneous transformation. In particular, following the idea in [169], a tilewise processing is proposed, and the final piecewise transformation is approximated using a set of RST transformations applied to each one of the patches the images are divided into.
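The exact form of the barrier functions is not detailed here, so the following sketch only illustrates the general idea with the simplest possible choice, an extreme barrier that returns infinity whenever a parameter leaves its feasible interval. The wrapper name `with_barrier` and the bound values in the usage example are assumptions made for illustration.

```python
import math

def with_barrier(cost, bounds):
    """Wrap an unconstrained cost so that infeasible parameters are rejected.

    `bounds` holds one (low, high) pair per transformation parameter,
    e.g. for (t_x, t_y, theta, k). Values are hypothetical.
    """
    def wrapped(params):
        for value, (low, high) in zip(params, bounds):
            if not low <= value <= high:
                # Outside the feasible set: any feasible point is better.
                return math.inf
        return cost(params)
    return wrapped

# Usage: restrict a toy cost to translations within [-100, 100] pixels.
bounded_cost = with_barrier(lambda p: sum(v * v for v in p),
                            [(-100.0, 100.0), (-100.0, 100.0)])
```

Because a derivative-free optimizer only compares cost values, a wrapper of this kind is enough to keep the search inside the feasible region without changing the optimizer itself.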

The proposed tiling-based processing works as described in the flowchart of Figure 4.1. First, the input full-size images are divided into overlapping patches. It is worth noting that a registration process, while warping the input image to match the reference, usually produces a transformed input image where some parts of the original image end up outside the pixel grid, while other parts of the grid have no information assigned to their pixels. By registering overlapping patches and discarding the results on the borders, it is possible to reconstruct a final full-size image that is not affected by such border artifacts (see Figure 4.2).
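The division into overlapping patches can be sketched as follows. Each tile is described by a read window, which includes a margin shared with its neighbours, and by a core region, which is the only part kept when the full-size image is reassembled. The patch size and overlap actually used are not stated here, so `core` and `margin` are hypothetical parameters, as is the function name `make_tiles`.

```python
def make_tiles(height, width, core, margin):
    """Split a height x width pixel grid into overlapping tiles.

    Returns a list of dicts with two boxes (row0, row1, col0, col1):
      - "window": the area that is read and registered (core plus margin),
      - "core":   the area that is kept in the final mosaic.
    """
    tiles = []
    for row in range(0, height, core):
        for col in range(0, width, core):
            core_box = (row, min(row + core, height),
                        col, min(col + core, width))
            window = (max(row - margin, 0), min(row + core + margin, height),
                      max(col - margin, 0), min(col + core + margin, width))
            tiles.append({"window": window, "core": core_box})
    return tiles

# Usage: a 100 x 100 grid with 40-pixel cores and a 10-pixel margin.
tiles = make_tiles(100, 100, core=40, margin=10)
```

Since the core regions partition the grid exactly, every output pixel comes from the interior of one registered window, which is what keeps the border artifacts out of the reconstructed image.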

Figure 4.4: Tiling-based Registration (2). Setup of the tile-based registration

Figure 4.6: Tiling-based Registration (4). Initialization of the geolocation step

Figure 4.8: Tiling-based Registration (6). Initialization of the geolocation step

With this overlapping-patch strategy in mind, the proposed algorithm can be followed through the flowchart and Figures 4.3 to 4.9. After the overlapping patches are extracted (Figure 4.3), a specific registration schedule is defined (Figure 4.4). Then, the first pair of patches is selected and the registration algorithm is run. At this stage, the algorithm is initialized with the identity transformation, consisting of: (i) no translation along either of the two axes; (ii) no rotation; and (iii) unitary scaling factor (Figure 4.5). The result of this registration step is forwarded to the second registration step (i.e., the one taking into account the second pair of patches) and used as initialization (Figure 4.6). Once the second registration is complete, the result is forwarded again to the next step (Figure 4.7 and Figure 4.8). Finally, after the entire set of patches has been registered, the final transformed image is reconstructed (Figure 4.9).
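The forwarding of each result to the next registration step can be sketched as a simple loop. The sketch assumes that the patch pairs are already listed in the order given by the registration schedule (the schedule of Figure 4.4 is not reproduced here) and that a callable `register_patch(reference_patch, input_patch, initial_params)` performs a single area-based registration; both the callable and the parameter ordering `(t_x, t_y, theta, k)` are hypothetical.

```python
# Identity RST transformation: no translation, no rotation, unit scale.
IDENTITY = (0.0, 0.0, 0.0, 1.0)  # (t_x, t_y, theta, k)

def register_in_sequence(patch_pairs, register_patch, initial=IDENTITY):
    """Register the patch pairs one after the other, warm-starting each
    registration from the transformation found for the previous pair."""
    results = []
    current = initial
    for reference_patch, input_patch in patch_pairs:
        # The previous result is the starting point of the optimizer.
        current = register_patch(reference_patch, input_patch, current)
        results.append(current)
    return results
```

When neighbouring patches require similar transformations, each optimization starts close to its solution, which is where the reduction in computational burden comes from.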

In addition to the flowchart of the proposed algorithm, it is worth noting a couple of further details. In particular, the evolution of the transformations found for each patch is used to detect possible anomalies in the registration of a particular pair of patches. In case an anomaly is found, the previous transformation is applied for warping that area of the image. Moreover, the same strategy is also adopted for all the cases where the optimization method failed to converge (e.g., in all the cases where the two patches do not contain any spatial feature, as for example an area completely covered by water). With respect to the anomaly, it is worth mentioning that the evolution of the transformations is analysed by using the sequence of distances between subsequent transformations.

In particular, adopting the usual notation, let $T_{p^{n-1}}$ and $T_{p^{n}}$ be the transformations found for the $(n-1)$-th and $n$-th pair of patches. Moreover, let $p^{n-1}$ and $p^{n}$ be the vectors containing the transformation parameters, such that $p^{n-1} = \{t_x^{n-1}, t_y^{n-1}, \theta^{n-1}, k^{n-1}\}$ and $p^{n} = \{t_x^{n}, t_y^{n}, \theta^{n}, k^{n}\}$. We denote as $\delta^{n}$ the element-wise absolute difference between the consecutive transformation vectors $p^{n-1}$ and $p^{n}$ (i.e., $\delta^{n} = |p^{n} - p^{n-1}|$, where $|\cdot|$ indicates the element-wise absolute value and not the Euclidean norm). If one of the components of the resulting vector $\delta^{n}$ exceeds a predefined threshold, an anomaly is detected. Then, the transformation is updated according to $T_{p^{n}} = T_{p^{n-1}}$ and the process continues. Similarly, in the non-convergence scenario, for example when no spatial features are present in the imaged scene (a situation that may arise in unsupervised large-scale scenarios), the previous result is used and, again, the transformation is set equal to $T_{p^{n}} = T_{p^{n-1}}$. The value of the anomaly threshold depends on the transformation parameter under analysis, and it has been experimentally set equal to 80% of the width of the search space for that particular parameter. As an example, if the translation along the x axis is allowed to take on values in the range $[-100, 100]$, then the anomaly threshold for that parameter is equal to 160.
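The anomaly check and the fallback to the previous transformation can be sketched as follows. The 80% fraction and the [-100, 100] translation range come from the text above; the function name `accept_or_fallback`, the `converged` flag, and the bounds used for the rotation and scale parameters in the usage example are assumptions made for illustration.

```python
def accept_or_fallback(p_prev, p_new, bounds, converged=True, fraction=0.8):
    """Return p_new unless it is anomalous or the optimizer did not converge,
    in which case the previous parameters p_prev are reused.

    `bounds` holds one (low, high) search interval per parameter; the
    threshold for each parameter is `fraction` times the interval width.
    """
    if not converged:
        return p_prev
    for prev, new, (low, high) in zip(p_prev, p_new, bounds):
        delta = abs(new - prev)            # one component of delta^n
        if delta > fraction * (high - low):
            return p_prev                  # anomaly: keep T_{p^{n-1}}
    return p_new

# Usage: with t_x in [-100, 100] the threshold on t_x is 160 pixels.
bounds = [(-100.0, 100.0), (-100.0, 100.0), (-0.5, 0.5), (0.5, 1.5)]
previous = (-90.0, 0.0, 0.0, 1.0)
kept = accept_or_fallback(previous, (80.0, 0.0, 0.0, 1.0), bounds)   # jump of 170
taken = accept_or_fallback(previous, (60.0, 0.0, 0.0, 1.0), bounds)  # jump of 150
```

In the usage example the first candidate is rejected because its jump of 170 pixels exceeds the threshold of 160, while the second one is accepted.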

4.3 Experimental Results

4.4 Dataset and Experimental Setup

The experimental analysis reported in this section is aimed at highlighting the effectiveness and the robustness of the proposed multisensor registration method when challenged with data collected by multiple sensors, at multiple moments in time, and in different geographical areas. Besides its points of strength, this section also analyses the weaknesses of the proposed method by reporting a practical example where some minor problems arose. As anticipated, the dataset chosen for the experiments comprises data collected by different sensors, at different moments in time, and over four different geographical areas. The following paragraphs provide additional details.

The optical images are collected by Sentinel-2 (experiments with 2019 data), Landsat-8 (experiments with 2015 data), and Landsat-7 (experiments with 2005 data). As for the SAR images, they are collected by Sentinel-1 (experiments with 2019 and 2015 data) and ENVISAT ASAR (experiments with 2005 data). The four locations correspond to as many geographical areas identified by the Sentinel-2 tiling system. In particular, two of them are located in the Amazon (i.e., S-2 granules 21KUQ and 21KXT), one in the African Sahel (i.e., S-2 granule 37PCP), and the last one in Siberia (i.e., S-2 granule 42WXS). Since the proposed method has been experimentally validated in the context of the CCI+ HRLC project, the four areas are inside the regions of interest for the final products of the project (i.e., referring to the appendix in Section 4.A, the regions of interest are the blue areas in Figure 4.20).

Before moving to the experimental analysis, it is worth describing how the following paragraphs are organized. The datasets used for the experiments reported in this thesis have been collected in the context of the CCI+ HRLC project. Therefore, all the available pairs of optical and SAR images correspond to real-case scenarios with no ground truth available. As a consequence, the most significant part of the experimental analysis relies on the visual interpretation of the results, thus providing a qualitative evaluation of the proposed method. However, in order to also provide a quantitative evaluation of the performance, two pairs of optical and SAR images have been manually registered (i.e., the Sentinel-1 and Sentinel-2 images corresponding to the granule 42WXS in Siberia, and the Sentinel-1 and Sentinel-2 images corresponding to the granule 21KUQ in the Amazon). From a pair of registered images it has been possible to generate a set of semi-synthetic datasets, with known ground truth transformations.

Therefore, in the experimental analysis, the proposed method has also been quantitatively evaluated in terms of root mean square error (RMSE) (see Section 3.4). Since the method works tile by tile, the final RMSE has been computed by averaging the RMSEs obtained from the registration of each patch. Moreover, on such semi-synthetic datasets, the proposed method has been experimentally compared with the more traditional application of an area-based method to the entire granule. The results have been analysed in terms of the trade-off between the achieved RMSE and the computational requirements of the two solutions.
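The averaging of the per-patch errors can be illustrated with the sketch below. The RMSE definition of Section 3.4 is not reproduced here, so the sketch assumes a common choice: the root mean square distance between the positions of a set of test points mapped by the estimated and by the ground-truth RST transformation. The rotation is taken about the origin of the patch coordinates, and all function names are hypothetical.

```python
import math

def apply_rst(params, x, y):
    """Apply a rotation-scale-translation transform (t_x, t_y, theta, k)."""
    t_x, t_y, theta, k = params
    c, s = math.cos(theta), math.sin(theta)
    return k * (c * x - s * y) + t_x, k * (s * x + c * y) + t_y

def patch_rmse(p_est, p_true, points):
    """RMSE between the point positions given by the two transformations."""
    total = 0.0
    for x, y in points:
        xe, ye = apply_rst(p_est, x, y)
        xt, yt = apply_rst(p_true, x, y)
        total += (xe - xt) ** 2 + (ye - yt) ** 2
    return math.sqrt(total / len(points))

def mean_rmse(estimates, truths, points):
    """Average of the per-patch RMSEs, one (estimate, truth) pair per patch."""
    scores = [patch_rmse(e, t, points) for e, t in zip(estimates, truths)]
    return sum(scores) / len(scores)
```

With this definition, a patch whose estimate is off by a pure translation of (3, 4) pixels scores an RMSE of 5 pixels regardless of the test points chosen.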

It is worth mentioning that the synthetic transformations applied to the registered dataset belong to the family of global RST transformations. Therefore, the application of the area-based registration method to the whole area is not ill-posed, as it is possible for the method to retrieve the exact solution. Additionally, in the synthetic context at hand, since the synthetic transformation belongs to the set of global transformation models, the problem addressed by the traditional application of the area-based method to the entire scene is intrinsically less complex than the problem faced by the proposed method. Indeed, the RST model is specified by four parameters, which determine the search space in the first case. Conversely, the search space in the second case (i.e., the proposed method) is defined by four RST parameters for each of the patches into which the input image is divided. Therefore, in the experimental analysis, the focus is on the reduced computational complexity obtained by the proposed solution and on its capability of achieving an accuracy that is comparable with the one achieved by the method operating on the entire scene as a whole.

Moreover, it is worth mentioning that the images being registered in these experiments do not correspond to the raw sensor data. Indeed, as can be seen from the flowcharts of the CCI+ HRLC processing chains in Appendix 4.A (Figure 4.21 and Figure 4.22), the multisensor geolocation module receives input data from the two pre-processing modules of the optical and SAR chains. The input data to register consist of seasonal composites generated from time series of raw images. This is done for two main reasons. First, the seasonal composites generated from the optical images are meant to remove clouds and other artifacts due to the atmospheric conditions possibly affecting the acquisitions. In the case of the Landsat-7 data of 2005, this is also important to compensate for the relevant data gaps due to the scan line corrector (SLC) failure that affected the sensor in 2003. Second, the seasonal composites generated from the SAR images allow for multi-temporal filtering, which helps to reduce the speckle affecting the original radar data. Overall, the goal of such pre-processing chains, with respect to the multisensor geolocation module, is to ensure that the spatial features present in the images are not corrupted by atmospheric conditions or speckle, so that they can be used for matching the pairs of images.

4.5 Analysis of the Results

The first example analysed is shown in Figure 4.10. The figure represents the Sentinel-2 and Sentinel-1 seasonal composites for the summer season of 2019 in the area identified by the Sentinel-2 granule 42WXS. Figure 4.10(a) shows the images before registration, while Figure 4.10(b) shows the registration result. The white part of the registered Sentinel-1 image corresponds to the area where no data is present (i.e., the input S-1 image does not contain the information corresponding to the whole S-2 42WXS granule). Indeed, the output images must share the same pixel lattice for the further processing steps aiming at jointly benefiting from the input optical and SAR data. This is especially important in the context of the CCI+ HRLC processing chain (i.e., the two classification steps and the fusion step), but the very same considerations are valid for any other application aiming at jointly extracting information from optical and SAR data.

(a) Before registration

(b) After registration

Figure 4.10: Registration results for the Siberian S-2 granule 42WXS.

Figure 4.11: Registration results for the Siberian S-2 granule 42WXS. Checkerboard visualization of one of the patches the input images are divided into. It highlights the effectiveness of the registration process. The images on top show the situation before and after the patch is registered, while the images below depict a zoomed area helpful to appreciate the matching of the spatial features present in the scene. It is worth mentioning that only the band number 8 of the optical image (i.e., the one corresponding to the NIR spectrum) is shown.

Figure 4.12: Registration results for the Siberian S-2 granule 42WXS. Checkerboard visualization of one of the patches the input images are divided into. It highlights the effectiveness of the registration process. The images on top show the situation before and after the patch is registered, while the images below depict a zoomed area helpful to appreciate the matching of the spatial features present in the scene. It is worth mentioning that only the band number 8 of the optical image (i.e., the one corresponding to the NIR spectrum) is shown.

Figure 4.13: Registration results for the Siberian S-2 granule 42WXS. Checkerboard visualizations of two of the patches the input images are divided into. It highlights the effectiveness of the registration process. Also in this case it is possible to appreciate the matching between the spatial features present in the scene. It is worth mentioning that only the band number 8 of the optical image (i.e., the one corresponding to the NIR spectrum) is shown.

Moreover, Figure 4.11 and Figure 4.12 allow a more detailed analysis of the registration results. The two figures show the registration results for two of the patches that the input images are divided into. In particular, in both cases, the images at the top of the page show a checkerboard visualization of the situation before and after the registration is performed. Neighbouring squares show either the optical or the SAR image, allowing the reader to appreciate the matching and mismatching of the spatial features in the area. Similarly, the images at the bottom of the page show the same checkerboard visualization, but considering only a zoomed detail of the patch. In this case, red and green circles have been added to the picture in order to highlight the most significant spatial features. Finally, Figure 4.13 displays the registration results for two other patches. For the sake of brevity, only the registration results are shown. Also in this case, it is possible to appreciate the matching in the rivers and in the bodies of water.
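A checkerboard composite of the kind used in these figures can be built with a few lines of code. The sketch below assumes two single-band images of equal size stored as nested lists and a hypothetical square size in pixels; it only illustrates the visualization principle and is not the tool used to produce the figures.

```python
def checkerboard(optical, sar, square):
    """Interleave two equally sized single-band images in a checkerboard.

    Squares whose (row block + column block) index is even show the optical
    image, the remaining squares show the SAR image.
    """
    rows, cols = len(optical), len(optical[0])
    return [[optical[r][c] if ((r // square) + (c // square)) % 2 == 0
             else sar[r][c]
             for c in range(cols)]
            for r in range(rows)]
```

When the two images are well registered, linear features such as rivers continue without breaks across the borders of neighbouring squares, which is the visual cue exploited in the analysis above.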

Moving to other areas, Figure 4.14 shows the registration results obtained in Africa and in the Amazon. Also in this case, the input data consist of seasonal composites generated from Sentinel-1 and Sentinel-2 raw images of 2019. For the sake of precision, the Sentinel-2 granules corresponding to these areas are 37PCP for Africa and 21KUQ and 21KXT for the Amazon. For the experiment in Africa, it is possible to appreciate the situations before and after registration. Conversely, for the experiments in the Amazon, for the sake of brevity, only the registration results are reported. Similarly to the Siberian case, the images shown represent some of the patches the input full-size images are divided into.

The importance of Figure 4.14(a) and Figure 4.14(b) is twofold. First, they allow the reader to appreciate the effectiveness of the registration method when run on different geographical areas. The spatial features present in the scenes are different, and the registration method has proven robust enough to adapt accordingly. Second, and this is visible in the African case, the result shows the robustness of the proposed method also in those cases where the spatial features are very few (e.g., the lake in the African example consists of a nearly flat surface that is almost uninformative for the registration).

Other examples are reported in Figure 4.15 and Figure 4.16. In this case, the goal of the analysis is to prove the robustness of the method not only with respect to the geographical area, but also with respect to the sensors and the acquisition time. Indeed, the two figures show the results of: (i) an experiment conducted with 2005 data acquired by Landsat-7 (optical image) and ENVISAT ASAR (radar image); and (ii) an experiment considering 2015 data acquired by Landsat-8 (optical image) and Sentinel-1 (radar image). Again, the geographical area that is taken into consideration is the same used for the experiments in Figure 4.10 (i.e., the S-2 granule 42WXS in Siberia).


(a) Registration result in Africa.

(b) After registration

Figure 4.14: Registration results for the S-2 granule 37PCP (African Sahel) and for the two S-2 granules 21KUQ and 21KXT (Amazon). For the first case, the image shows the situation before and after the registration is run. Conversely, for the sake of brevity, only the checkerboard visualizations of the registration results are shown in the Amazon cases. It is worth mentioning that only the band number 8 of the optical image (i.e., the one corresponding to the


Figure 4.15: Registration results for the experiments with: (i) Landsat-7 and ENVISAT data collected in 2005; (ii) Landsat-8 and Sentinel-1 data collected in 2015. The geographical area is the same already considered before, i.e., the S-2


Figure 4.16: Registration results for the experiments with: (i) Landsat-7 and ENVISAT data collected in 2005; (ii) Landsat-8 and Sentinel-1 data collected in 2015. In both the 2005 and 2015 cases, the figure shows the result of the registration on one of the patches the input images are divided into. It is worth mentioning that only the band corresponding to the NIR spectrum is shown for the optical data.


Figure 4.17: Example of a possible registration mismatch due to the tiling-based processing. The registered image presents a discontinuity at the interface between two patches. The magnifying boxes show that the image has been registered, yet an artifact is visible due to the slightly different transformations applied to neighbouring patches. However, this design choice allows the registration method to approximate a non-homogeneous transformation across the whole image.


To investigate the registration accuracy, Figure 4.16 shows the result on one of the patches the images are divided into. Green circles have been added to the images to highlight the good matching between the spatial features in the multisensor dataset.

Finally, an additional experiment is shown in Figure 4.17. This example highlights a minor artifact at the border between neighbouring patches. Indeed, in order to approximate a non-homogeneous transformation, the registration pipeline applies different affine transformations to the different patches the input image is divided into. This may generate artifacts like the one shown in the figure. Nevertheless, such misalignments are small, and they are the price paid for approximating a non-homogeneous transformation in a piecewise-affine fashion with reduced computational requirements. Moreover, the artifact shown in Figure 4.17 corresponds to a misalignment of around 5 pixels and, in the whole 10890 × 10890 pixel African image, it is the largest error generated by the tiling-based processing.
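To make the origin of such a border artifact concrete, the following minimal sketch evaluates two slightly different affine transformations at the same pixel on the interface between two neighbouring patches. The transformation parameters and the border location are made-up illustrative values (assumptions), not the ones estimated in the experiment of Figure 4.17.

```python
import numpy as np

def rst(scale, theta_deg, tx, ty):
    """Return (A, t) for a rotation-scale-translation map x' = A x + t."""
    th = np.deg2rad(theta_deg)
    A = scale * np.array([[np.cos(th), -np.sin(th)],
                          [np.sin(th),  np.cos(th)]])
    return A, np.array([tx, ty], dtype=float)

def apply_affine(A, t, xy):
    """Map an (N, 2) array of pixel coordinates through x' = A x + t."""
    return xy @ A.T + t

# Hypothetical transformations estimated independently on two adjacent patches.
A_left, t_left = rst(1.0, 0.0, 3.0, 0.0)
A_right, t_right = rst(1.0, 0.0, 0.0, 4.0)

# A pixel lying on the shared border (illustrative coordinates).
border = np.array([[1000.0, 500.0]])

# The same pixel lands in two different places depending on which patch
# transformation is used; the distance between them is the visible seam.
gap = np.linalg.norm(
    apply_affine(A_left, t_left, border) - apply_affine(A_right, t_right, border),
    axis=1,
)
print(gap[0])
```

With these illustrative parameters the two mappings differ by (3, -4) pixels, i.e., a seam of 5 pixels.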

Concerning the quantitative analysis, as anticipated above, the proposed tiling-based strategy has been evaluated in terms of root mean squared error by generating a semi-synthetic dataset from pairs of manually registered images. Two registered Sentinel-1 and Sentinel-2 images (corresponding to the Siberian S-2 granule 42WXS and to the Amazonian S-2 granule 21KUQ) have been warped with a set of known RST transformations. Then, the proposed method has been applied to the resulting unregistered pairs. Due to the tiling-based processing, for each pair, the root mean squared errors obtained in the individual patches are averaged to get the final score.
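The scoring procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the RMSE of a patch is taken here as the root mean square of the Euclidean distances between the positions predicted by the ground-truth and by the estimated transformation on a grid of points inside the patch, and the function names (`rst_matrix`, `patch_rmse`, `tiled_score`) are hypothetical.

```python
import numpy as np

def rst_matrix(scale, theta_deg, tx, ty):
    """3x3 homogeneous matrix of a rotation-scale-translation (RST) transform."""
    th = np.deg2rad(theta_deg)
    c, s = scale * np.cos(th), scale * np.sin(th)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def patch_rmse(T_true, T_est, x0, y0, size, step=50):
    """RMSE (in pixels) between T_true and T_est on a grid inside one patch."""
    xs = np.arange(x0, x0 + size, step, dtype=float)
    ys = np.arange(y0, y0 + size, step, dtype=float)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.stack([gx.ravel(), gy.ravel(), np.ones(gx.size)])
    diff = (T_true @ pts - T_est @ pts)[:2]
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=0))))

def tiled_score(T_true, estimates, size):
    """Average the per-patch RMSEs; estimates maps (x0, y0) -> estimated matrix."""
    return float(np.mean([patch_rmse(T_true, T, x0, y0, size)
                          for (x0, y0), T in estimates.items()]))

# Known synthetic transformation used to generate the unregistered pair.
T_true = rst_matrix(1.0, 0.0, 10.0, -5.0)

# Hypothetical per-patch estimates: one exact, one off by (3, 4) pixels.
estimates = {
    (0, 0): rst_matrix(1.0, 0.0, 10.0, -5.0),
    (1000, 0): rst_matrix(1.0, 0.0, 13.0, -1.0),
}
score = tiled_score(T_true, estimates, size=1000)
print(score)
```

In this toy case the first patch has zero error and the second an error of 5 pixels, so the averaged score is 2.5 pixels.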

In order to also quantitatively assess the gain that the proposed solution allows in terms of computation time, the same semi-synthetic datasets have been used for experiments with the straightforward application of the area-based registration method, without the tiling strategy. In this case, the input image is not divided into small patches, but the registration method is applied to the entire image lattice. It is worth noting that, as it will be highlighted later, applying the registration method directly to the entire image prevents the modelling of non-homogeneous transformations. The time needed for convergence, together with the obtained root mean squared error, has been compared to those achieved by the proposed solution. It is worth recalling once more that the registration problem faced by the area-based method (without tiling) is intrinsically less complex than the one faced by the proposed method. Indeed, the synthetic transformations applied to construct the synthetic dataset belong to the family of global RST transformation models. On the one hand, there existed a feasible solution that the area-based method could converge to, thus allowing a fair comparison. On the other hand, the dimensionality of the problem faced by the proposed solution is much higher, due to the larger search space. Nevertheless, comparing the results allows us to appreciate the level of accuracy that the proposed method is able to reach.

Area              Siberia              Amazon
Method            SoA    Proposed      SoA    Proposed
Time [min]        68     31            51     30
RMSE [pixels]     2.1    2.6           2.0    2.2

Table 4.1: Quantitative evaluation of the proposed method and comparison with a state-of-the-art solution. The RMSE (in pixels) and computation time (in minutes) are averaged across a synthetic dataset made of 10 images.

Table 4.1 reports the quantitative analysis. The computation times are measured in minutes, while the root mean squared errors are measured in pixels. The numbers are averages over a synthetic dataset generated from a set of 5 transformations applied to the Siberian and the Amazonian images. The table shows a significant reduction in the computation time: the proposed method achieves a speedup of around 2.2 on the Siberian data (68 vs. 31 minutes) and of around 1.7 on the Amazonian data (51 vs. 30 minutes), while reaching an accuracy comparable to that obtained by the area-based method in the favourable setup (RMSE of 2.6 vs. 2.1 pixels and of 2.2 vs. 2.0 pixels, respectively). We also note that, as mentioned above, the traditional approach used here for comparison could obtain low RMSE values also because a unique well-defined parametric transformation, valid across the whole scene, existed by construction. In a registration problem on a large-scale data set, this is normally not the case, because the geometric relation between the two input images may generally be space-variant and a unique parametric transformation usually does not exist.
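For reference, the speedup figures can be recomputed directly from the times reported in Table 4.1. The short snippet below only performs this arithmetic; the dictionary layout is an illustrative choice.

```python
# Computation times in minutes from Table 4.1: (state of the art, proposed).
times = {"Siberia": (68, 31), "Amazon": (51, 30)}

# Speedup = time without tiling / time with the proposed tiling strategy.
speedup = {area: soa / proposed for area, (soa, proposed) in times.items()}

for area, value in speedup.items():
    print(f"{area}: {value:.1f}x")
```

This gives roughly 2.2 for the Siberian dataset and 1.7 for the Amazonian one.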

As a final remark, it is worth spending some time analysing the sensitivity of the proposed tiling-based strategy to its main parameters. Indeed, Tables 4.2 and 4.3 report the quantitative analysis of the performance (again in terms of RMSE and time) obtained by varying the window size and the anomaly detection mechanism. The window size is set to 1000 pixels (i.e., each patch is 1000 × 1000 pixels), which is the baseline choice, and to 500 and 2000 pixels for comparison purposes. Conversely, the effect of the anomaly detection mechanism is studied by comparing the cases where it is switched on or switched off. The images used for the experiments reported in these tables are the Sentinel-1 and Sentinel-2 pairs collected in Amazon (21KUQ), Siberia (42WXS), and Africa (37PCP).

Concerning the first parameter, the window size determines the size of the patches the input images are divided into. A larger value implies that the images are divided into a smaller set of patches, each one of larger size. Conversely, picking a smaller window size implies a larger number of smaller patches. The rationale is that smaller patches are faster to register, thus choosing a smaller window size can grant a significant boost in terms of computation time even though these smaller patches are more numerous. Nevertheless, smaller images are also more difficult to register. This is especially the case when no significant spatial feature is present in a particular area. Conversely, dividing the images into larger patches makes it more likely that such spatial features are present in each one of the patches. In this case, the accuracy usually increases, at the expense of a lower computational speed-up, as the registration mechanism takes more time. Nevertheless, another consideration is that larger patches also imply that the piecewise approximation of the possible non-homogeneous transformation is coarser. Therefore, the higher accuracy achievable by choosing a larger window size is not always granted if the geometrical distortion between the input images is strongly non-stationary.
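The relation between window size and number of patches can be illustrated with a small sketch. It assumes a regular, non-overlapping grid in which the patches on the right and bottom borders are simply cropped; how the actual implementation handles the image borders is not specified here, and `tile_grid` is a hypothetical helper name. The 10890 × 10890 size is that of the African image mentioned earlier.

```python
def tile_grid(height, width, window):
    """List (y0, x0, h, w) for a non-overlapping grid; border tiles are cropped."""
    tiles = []
    for y0 in range(0, height, window):
        for x0 in range(0, width, window):
            tiles.append((y0, x0,
                          min(window, height - y0),
                          min(window, width - x0)))
    return tiles

# Number of patches for the three window sizes considered in the analysis.
counts = {w: len(tile_grid(10890, 10890, w)) for w in (500, 1000, 2000)}
print(counts)  # {500: 484, 1000: 121, 2000: 36}
```

Under this assumption, halving the window size roughly quadruples the number of registration problems to solve, each defined on a patch with a quarter of the pixels.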

Concerning the anomaly detection mechanism, its usage grants better performance at the expense of a slightly longer computation time. Indeed, the activation of this mechanism requires the algorithm to perform some additional computations for every pair of patches. Conversely, by deactivating the mechanism, the accuracy may be reduced, especially in those cases where spatial features are not present in the whole image. Indeed, if one of the patches is completely flat (e.g., in the case of a lake, the sea, grasslands, etc.), the registration may fail to converge (i.e., it may output any transformation, as all the possible transformations provide similar values of the similarity measure). The anomaly detection would catch such an exception due to the possible inconsistency of the new transformation with respect to the previous ones. Conversely, in case the mechanism is not running, the resulting wrong transformation would be applied to the input patch. Additionally, the wrong result would be forwarded to the next iteration as an initialization, thus increasing the complexity of the subsequent registration problem.

Window Size       500    1000   2000
Time [min]        27     30     35
RMSE [pixels]     3.5    2.2    1.6

Table 4.2: Sensitivity analysis of the proposed method to the window size. The RMSE (in pixels) and computation time (in minutes) are averaged across a synthetic dataset made of 5 images (Amazon 21KUQ).

                     Amazon 21KUQ       Africa 37PCP
Anomaly Detection    On     Off         On     Off
Time [min]           30     24          29     25
RMSE [pixels]        2.2    2.4         15     140

Table 4.3: Analysis of the performances of the integration of the anomaly detection mechanism into the proposed tiling-based multisensor registration method. The RMSE (in pixels) and computation time (in minutes) are averaged across a synthetic dataset made of 10 images.
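The interplay between the scheduling and the anomaly detection can be sketched as follows. This is a simplified illustration, not the actual implementation: the consistency test used here (per-parameter deviation from the median of the previously accepted transformations, compared with a tolerance `tol`) is an assumption, transformations are reduced to a translation vector for brevity, and `register_fn` is a hypothetical stand-in for the registration of a single pair of patches.

```python
import numpy as np

def is_anomalous(params, history, tol):
    """Flag params that deviate too much from previously accepted ones."""
    if not history:
        return False  # nothing to compare with on the first patch
    ref = np.median(np.asarray(history), axis=0)
    return bool(np.any(np.abs(np.asarray(params) - ref) > tol))

def register_tiles(tiles, register_fn, init, tol, anomaly_detection=True):
    """Register patches in sequence, initializing each from the previous result."""
    history, results = [], []
    current = np.asarray(init, dtype=float)
    for tile in tiles:
        est = np.asarray(register_fn(tile, current), dtype=float)
        if anomaly_detection and is_anomalous(est, history, tol):
            est = current.copy()  # discard the result, reuse the previous one
        else:
            history.append(est)
        results.append(est)
        current = est  # forwarded as initialization to the next patch
    return results

# Toy stand-in: informative patches return the right shift, a flat patch
# (e.g., a lake) returns an arbitrary, wrong one.
def fake_register(tile, init):
    return (200.0, -150.0) if tile == "lake" else (5.0, 5.0)

tiles = ["land", "land", "lake", "land"]
tol = np.array([10.0, 10.0])
on = register_tiles(tiles, fake_register, (0.0, 0.0), tol)
off = register_tiles(tiles, fake_register, (0.0, 0.0), tol, anomaly_detection=False)
print(on[2], off[2])
```

With the mechanism enabled, the flat patch inherits the transformation of the previous patch; with the mechanism disabled, the wrong transformation is both applied and forwarded to the next patch as initialization.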

Table 4.2 reports the quantitative analysis (in terms of computation time and RMSE) of different window sizes in the considered scenario. As expected, the time needed for the method to converge increases with the window size (27, 30, and 35 minutes for window sizes of 500, 1000, and 2000 pixels, respectively). Moreover, the RMSE in the case of small 500 × 500 patches (3.5 pixels) is larger than in the baseline scenario (2.2 pixels), and it decreases further to 1.6 pixels with 2000 × 2000 patches. This confirms the above assumption of having better performance when more spatial features are present in the image. As for the comparison with the state-of-the-art solution (see Table 4.1), it is worth noting that the synthetic dataset has been constructed using global ground truth transformations. Hence, in this case, increasing the window size does not penalize the accuracy. If the ground truth transformations were spatially non-homogeneous, the fidelity of their approximation would have worsened with the increase of the window size.


Figure 4.18: RMSEs obtained by the proposed tiling-based registration method in each one of the patches the input image is divided into. Comparison of the performances achieved in case the anomaly detection mechanism is enabled or disabled. The red plot shows the RMSEs in case the anomaly detection mechanism is disabled, while the green plot shows the RMSEs in case it is enabled. The blue area denotes the iterations where the anomaly detection


Concerning the analysis of the anomaly detection mechanism, Table 4.3 reports a quantitative analysis of its effectiveness in two different scenarios. The images composing the Amazonian synthetic dataset are full of spatial features, thus allowing a smoother registration with few anomalous patches. Conversely, the African dataset is much more challenging, with vast areas presenting few or no spatial features (e.g., a vast water area covering around 20 patches). In this second case the anomaly detection mechanism is paramount. Figure 4.18 shows the RMSEs of each one of the patches composing one of the images of the African dataset. It is easy to appreciate the importance of the anomaly detection mechanism in this case. When the mechanism is disabled, as soon as some subsequent patches are anomalous, the RMSE starts diverging. The registration method applied to the single pair of patches is not able to recover from the anomaly. Conversely, using the anomaly detection mechanism, once the anomaly is detected, the previous transformation is used and forwarded to the following registration problem. In this case the RMSE is kept at a lower value even in difficult and challenging scenarios such as the African area.

4.6 Conclusions and Future Developments

This chapter presented a novel solution for the multisensor registration problem in a large-scale scenario. Although the method has been developed and experimentally validated in the context of the CCI+ HRLC project, its validity goes beyond that specific framework, thus allowing its application to all such cases where the joint use of optical and SAR data requires a registration step before further processing. Moreover, in addition to the challenging problem of multisensor registration, the proposed method has been specifically designed to cope with large-scale datasets. Non-homogeneous transformations are approximated using a tiling-based processing. Moreover, an ad-hoc scheduling mechanism has been designed so that the registration of a given pair of patches is initialized based on the previous result. As a result, the computational burden that multisensor image registration usually brings along is substantially reduced, thus allowing the application of the proposed method in a large-scale scenario.

The proposed solution proved robust and effective in a wide variety of experiments. Indeed, in the experimental analysis, the tiling-based multisensor geolocation module was tested with optical data collected by Sentinel-2, Landsat-7, and Landsat-8, and with SAR images collected by Sentinel-1 and ENVISAT ASAR. Moreover, the experiments were carried out with either recent data (i.e., Sentinel-2 and Sentinel-1 2019 data) or historical data collected in 2015 and 2005 by Sentinel-1, Landsat-7, Landsat-8, and ENVISAT ASAR. In addition, the experimental analysis was carried out both from a qualitative and a quantitative perspective. The effectiveness of the proposed solution has been visually demonstrated by analysing the results obtained from real pairs of unregistered images. Moreover, a semi-synthetic dataset has been used to quantitatively evaluate the performance of the method in terms of root mean squared error. Finally, the same semi-synthetic dataset has been used to compare the proposed tiling-based solution with the application of standard state-of-the-art solutions in terms of accuracy and computational requirements. Indeed, the trade-off between the two quantities is especially important when working in large-scale scenarios.

Future extensions of the proposed method will consider the possibility of replacing the modified Powell’s optimization algorithm with a global method, such as a genetic algorithm or simulated annealing. In doing so, the focus will be put on the trade-off between the possible gain in accuracy and the computational overhead that global optimizers bring along. In this respect, it is worth recalling that the proposed tiling-based method can be combined with the application of an arbitrary registration algorithm to each individual tile.

With respect to the computational burden, another solution that will be taken into consideration is the integration of conditional generative adversarial networks in the processing chain, as described in the previous chapter. Being able to effectively translate one of the two images onto the domain of the other allows adopting less computationally demanding correlation-type metrics for the registration. The drawback is the need for training the cGAN. On the one hand, the computationally demanding training can be done offline and, in production, it is possible to just apply the network for image-to-image translation purposes. On the other hand, it is necessary to investigate the robustness of such a network and determine how many models need to be trained for the application to the CCI+ HRLC processing chains. As an example, it might be the case that just one model for each one of the three geographical regions and each one of the sensors is enough. Alternatively, it may be necessary to train one model for each of the subregions that can be identified based on geographical and physical characteristics. Also, it might be the case that the acquisition time affects the domain adaptation capabilities of the cGAN as well, thus requiring the training of different models for the different years of production. Finally, experimenting with such choices is also important because, depending on the number of models that are necessary, the requirements in terms of training data change too, thus possibly preventing the adoption of such a solution.

Another future goal is the development of a more sophisticated scheduling, with the possibility of applying a filtering, or a smoothing, between the different local transformations. This way, it would be possible to reduce the artifacts that may appear at the border between subsequent patches. Moreover, it would also be interesting to design a scheduling that allows for parallel processing, thus taking advantage of high performance computing capabilities to further reduce the computation time necessary for registering the whole images.