
Dim target detection via background subtraction in naval InfraRed Search and Track systems.


Academic year: 2021



University of Pisa
Naval Academy

Department of Information Engineering
Informatics, Electronics, Telecommunications

Master Degree in Telecommunication Engineering

Dim target detection via background subtraction in naval InfraRed Search and Track systems

Supervisors:
Prof. Ing. Marco DIANI
Prof. Ing. Nicola ACITO
Ing. Andrea ZINGONI

Candidate:
CF (AN) Rocco SOLETI

ABSTRACT

Target detection is a crucial task in several fields. It supports managing and protecting what is being supervised, since a target could be dangerous, intentionally or not.

In maritime scenarios, target detection can be useful for managing ship routes near the coast, both from land and on board vessels. For example, it can be used by port captaincies in order to avoid ship accidents, or in the open sea, in order to detect hostile behaviour from other boats.

The particular role of military ships emphasizes the need for security. Nevertheless, such a need also applies to merchant ships. In fact, in recent years, the incidence of pirate attacks on merchant ships has increased, and the cost of piracy to world shipping in the global economy is estimated at about £18 billion a year.

Surveillance technologies in maritime scenarios are mainly oriented towards active sensors such as radar. Nevertheless, these are very expensive and not always the optimal choice, since they make the user more visible at a distance, whereas, especially for military ships, it is desirable to remain stealthy. Furthermore, they produce electromagnetic fields that are harmful to human health and can suffer from several electromagnetic compatibility problems with other onboard instruments.

InfraRed cameras are, on the contrary, passive instruments, generally cheaper than radars. Since they do not produce electromagnetic fields, they avoid increasing the visibility of the ship and can be placed almost anywhere, without interfering with other onboard instruments.

In order to detect targets in InfraRed images, several background estimation and rejection algorithms are used. Most of them are sequence-based and take advantage of the time stationarity of the background to highlight the target by subtracting consecutive images. These methods are time consuming, because they have to store a buffer of images before producing a result. Furthermore, they provide insufficient performance in the maritime environment, where the sea clutter changes rapidly. For this reason, this thesis focuses on the frame-based approach. The methods belonging to this category exploit the spatial flatness of the clutter and estimate it by means of frequency-domain or spatial-domain filters in order to remove it.

For this thesis, some linear and nonlinear spatial-domain filters have been developed. They require knowing the dimension of the target to be detected. This is not a strong limitation for those algorithms, since the goal is the first detection of the target, after which it can be followed by a tracker. The first detection desirably happens when the target is far from the camera, near the horizon line, where its shape inside the image occupies a window of a few pixels.

The validation of the proposed algorithms has been carried out on a dataset of real InfraRed images captured in a maritime environment. In this dataset, all the images of each sequence have been analysed as if it were necessary to detect the target in each frame and not only the first time. Furthermore, the algorithms have been tested even on nearer targets, which occupy a larger number of pixels.

The large amount of real images entailed hard work of selection and classification of the sequences, but made it possible to avoid simulated images, providing, at the same time, more reliable results.

In order to evaluate performance throughout all the images, it was necessary to create a small tracking algorithm that automatically recognises the movement of the target.

The parameters chosen to evaluate the performance of the filters have shown that some of the analysed filters cope better with certain types of background than the others. The strength of this result is that it makes it possible to decide in advance which filter is more appropriate, depending on the background in which we need to find a target.


INDEX

Abstract
Index
Introduction
1 Theoretical framework
1.1 Radiometry
1.1.1 Planck's law
1.1.2 Wien's displacement law
1.1.3 Discussion on emitted and reflected radiance
1.2 Sensor
1.2.1 Camera model
1.2.2 Pinhole model
1.3 Signal model
1.3.1 Target
1.3.2 Background
1.3.3 Noise
2 Background removal
2.1 Introduction to background removal
2.2 Proposed filters
2.2.1 2D Window Average with Guard
2.2.2 Bi-Dimensional Median
2.2.3 Dimensional reduction
2.2.4 Mono-dimensional filters
2.2.5 Bilateral filter
3 Results
3.1 Dataset
3.1.1 Camera characteristics
3.1.2 Dataset organization
3.2 Evaluation methods
3.2.1 Signal to Clutter Ratio Gain (SCRG)
3.2.2 False Alarm at First Sight Rate (FAFSR)
3.2.3 Discussion on the use of two parameters
3.3 Results per sequence
3.3.1 Target embedded in sky background
3.3.2 Target embedded in sea background
3.3.3 Target embedded in structured background
3.4 General analysis
Conclusions


INTRODUCTION

In this work, the problem of automatic target detection in maritime scenarios has been analysed. This problem can be tackled by using different types of sensors. We focused on thermal cameras. These are passive sensors which produce video flows whose frames represent the electromagnetic radiation emitted by bodies at temperatures typically found on Earth. Such electromagnetic radiation is characterized by a particular frequency range which is distinguished, within the frequency spectrum, by the name of InfraRed. In particular, in this thesis, images captured by a camera working in the part of the InfraRed band called Medium Wavelength InfraRed have been analysed.

The images produced by the thermal camera depict the whole scenario in the field of view of the sensor. Therefore, in the image we can find the background and, if present, one or more targets. Furthermore, images are affected by noise, which disturbs the clearness of the scene. The goal is to automatically decide on the target presence, avoiding being deceived by the background and by the noise. The influence of the noise can be reduced by means of low-pass filters, which reduce the contribution of the high frequencies, the part of the spectrum where the noise is predominant. Then, in order to increase the detection probability, which is the probability of correctly deciding for the presence of a target, and, at the same time, to reduce the number of false alarms, which are cases in which a target is declared present when it is not there, a preliminary operation of background removal is necessary. This operation is accomplished by subtracting the estimate of the background from the image.

In the literature, several filters exist whose purpose is background estimation. For maritime scenarios, the most promising filters are the frame-based ones. Such filters, differently from the sequence-based ones, estimate the background signal of a pixel by means of the surrounding area, instead of using the same pixel in the previous and following frames. The better performance of the frame-based methods in maritime scenarios stems from the fact that both the background and the camera, which is likely mounted on a ship, move.

In this work, bi-dimensional and mono-dimensional filters have been investigated. Furthermore, a type of filter which selects the maximum value among the estimates produced by mono-dimensional filters along some preferential directions has been tested. For each of the previously mentioned families, both the averaging and the median versions have been illustrated. As the background estimate produced by averaging filters is biased by the target presence, such filters are provided with a guard window which preserves the results. Furthermore, a different type of bi-dimensional filter, called the Bilateral filter, has been tested. The latter, by means of a weighting window which can assign different weights to each pixel, promises to handle both flat and non-flat background. Such a filter is provided with a guard as well.

For the validation of the filters, several sequences have been used. Such sequences cover the main classes of background (sky, sea and land), captured in different climatic conditions. Both marine and flying targets have been observed and detected.

The target detection problem in maritime scenarios is mainly concentrated on few-pixel targets situated far from the camera, near the horizon line. In fact, after the detection of a target, tracking algorithms can be used to follow its movement, so that the detection process does not have to be performed again for the same target. Nevertheless, in order to evaluate the performance of the filters in each background situation, nearer targets, which cover a relatively high number of pixels and are situated far from the horizon line, have also been used.

The recordings available for the validation were captured by means of the 'ERICA Plus' camera, produced by Leonardo S.p.A.

The performance of each filter has been evaluated by means of two parameters: the Signal to Clutter Ratio Gain (SCRG) after the filtering process and the False Alarm at First Sight Rate (FAFSR). The first is a local measure indicating how much the target is visible with respect to the background, whereas the second indicates the number of pixels whose intensity remains higher than the target intensity after the filtering process, with respect to the total number of pixels in the frame.

In Chapter 1, a theoretical introduction to the problem has been given. In the first part of the chapter, details on the relationship between the detection problem and the use of the InfraRed spectrum have been presented. Then, a general idea of imaging sensor principles has been given and, finally, the InfraRed signal components have been separately depicted and analysed.

In Chapter 2, the background estimation problem has been investigated. Considerations about the signal model which have guided the choice of the filters have been illustrated and, then, the chosen filters have been presented.

In Chapter 3, the results have been presented. In the first part of the chapter, the organization of the sequence database has been elucidated, as the sequences have been ordered on the basis of the background class in which the target is embedded. In the second part of the chapter, the evaluation parameters have been illustrated. Finally, in the last part of the chapter, each sequence has been depicted, also showing a frame, in order to give an idea of the reference situation. The results given by the filters on each sequence have been presented and analysed in order to find, hopefully, a general criterion on the basis of which, starting from the observation of the background, the best filter can be chosen.


Finally, in the conclusions chapter, considerations about the results produced by the presented filters have been given. Ideas for future work have also been proposed.


1 THEORETICAL FRAMEWORK

In this chapter, the theoretical elements on which the entire work rests will be illustrated. We start by explaining some elements of the branch of optics called radiometry.

1.1 Radiometry

In physics, we call optics the branch that studies light, its behaviour, its properties and its interactions with matter [1]. Optics is also in charge of the construction of instruments able to use or detect light. As light is an electromagnetic radiation, all the related phenomena could be perfectly explained by electromagnetism. Nevertheless, it is very useful, for certain applications, to employ simplified models. On the one hand, the most classical are geometric optics [2], which treats light as a collection of straight rays, and physical optics [3], better suited to studying phenomena, such as interference, diffraction and polarization, with which the ray treatment would fail. On the other hand, because of the dual nature of light, several modern methods were born in order to face both the wave-like and the particle-like properties of light. Among these methods, radiometry and quantum optics [4], a branch of quantum mechanics that treats light as a collection of particles called photons, stand out.

Radiometry is the branch of optics that studies the measurement of electromagnetic radiation [5]. The measurement of electromagnetic radiation can provide important information about the chemical composition and the temperature of objects and gases. For this reason, it is largely used in subjects such as astronomy or Earth remote sensing. The latter concerns the acquisition of information regarding objects with which it is impossible to make physical contact [6]. Therefore, the target detection purpose belongs to this field.

In order to better understand how radiometry can help the target detection purpose, it is necessary to know how the electromagnetic spectrum is composed. A representation of it is shown in Figure 1.1.


Figure 1.1) Electromagnetic Spectrum

Similarly to every type of wave, electromagnetic waves are characterized by an amplitude¹, a wavelength² and a frequency³. The last two properties are linked by the propagation speed, which, for the purposes of this thesis, we can consider constant and approximate with the speed of light in free space, called c and equal to 299792458 m/s (approximately 3·10⁸ m/s). Knowing the speed of light, we can link the frequency to the wavelength by means of Equation 2.1, where f indicates the frequency and λ indicates the wavelength.

$$f = \frac{c}{\lambda} \qquad (2.1)$$

When the wavelength is in the range from 390 nm to 700 nm (430–770 THz), the light can be seen by the human eye; we therefore talk about the 'visible spectrum' [7]. This is the part of the spectrum in which all the cameras people use every day to take pictures work.

The part of the spectrum that goes from 700 nm to 1 mm (0.3–430 THz), instead, is called 'InfraRed' [8] and is the part we are interested in. The InfraRed spectrum, in turn, is divided into five subparts [9]:

1. Near-InfraRed (NIR), 0.75–1.4 µm (214–400 THz)
2. Short-wavelength InfraRed (SWIR), 1.4–3 µm (100–214 THz)
3. Mid-wavelength InfraRed (MWIR), 3–8 µm (37–100 THz)
4. Long-wavelength InfraRed (LWIR), 8–15 µm (20–37 THz)
5. Far-InfraRed (FIR), 15 µm–1 mm (0.3–20 THz)

Actually, in order to decide the band boundaries, we have to pay attention to the variations in atmospheric transmittance⁴ throughout the frequency spectrum. Such a feature is shown in Figure 1.2.

¹ The amplitude indicates the intensity of the wave.

² The wavelength indicates the spatial distance between two consecutive peaks.

³ The frequency indicates the number of peaks in a second.


Figure 1.2) Atmospheric transmittance

The transmittance changes throughout the frequency spectrum because of the capability of the molecules present in the atmosphere to absorb radiation at certain frequencies. The parts of the spectrum where the transmittance reaches a near-zero level are not suitable for remote sensing purposes, because no energy could reach the sensor. Therefore, the bands utilized for InfraRed sensing are MWIR, especially from 3 µm to 5 µm, and LWIR, especially from 8 µm to 14 µm [10].

The reason why we are interested in the InfraRed spectrum is given by Planck's law.

1.1.1 Planck's law

Planck's law gives the spectral density of electromagnetic radiation⁵ that a black body⁶ at a given temperature, in thermal equilibrium, can emit [11].

The relation between the spectral radiance and the temperature is given by Equation 2.2, where Lbb(λ, T) is the spectral radiance, ħ is the Planck constant and K is the Boltzmann constant.

⁵ The spectral density of electromagnetic radiation, briefly called spectral radiance (measured in W/(sr·m³)), indicates the energy of the electromagnetic radiation per unit of time emitted by a unitary surface towards a unit of solid angle in a specific frequency of the electromagnetic spectrum [5].

⁶ A black body is an ideal object that can absorb all the incoming electromagnetic radiation [11]. This means that it does not reflect or transmit any light component. It is so called because 'black' does not indicate a colour, but the absence of colours, in other words, the absence of light. In this case, 'black' indicates the absence of reflected light. By the principle of conservation of energy, the incoming radiation cannot remain bound inside the body; in fact, the latter takes advantage of all this radiation to warm up and, eventually, to emit. This means that the energy emitted by a black body at a given temperature is higher than or equal to the energy emitted by any other body at the same temperature.


$$L_{bb}(\lambda, T) = \frac{2 \hbar c^2}{\pi \lambda^5} \cdot \frac{1}{e^{\frac{\hbar c}{\lambda K T}} - 1} \qquad (2.2)$$

In order to better understand such a law, Figure 1.3 and Figure 1.4 are shown:

Figure 1.3) Planck's law for objects at a range of temperatures observable on the Earth

Figure 1.4) Planck's law for the Sun surface

As Figures 1.3 and 1.4 show, the peak in radiance is not always found at the same wavelength, but moves towards increasing wavelengths as temperature decreases. For example, the peak of the Sun's light lies in the visible spectrum. This is the reason why, by a sort of anthropic principle [12], human eyes can recognise those frequencies and not others.


It is worth noting that the radiance expressed by Planck's law, when we talk about grey bodies⁷, must be weighted by the emissivity⁸.

The wavelength at which the peak of radiance is situated is given by Wien's displacement law.

1.1.2 Wien's displacement law

Wien's displacement law is a direct consequence of Planck's law. It states that black-body radiation has its peak at a wavelength that is inversely proportional to the temperature of the body [13]. The reference formula is Equation 2.3, where λmax is the wavelength at which the peak of radiation is found and b is a constant called Wien's displacement constant (measured in m·K).

$$\lambda_{max} = \frac{b}{T} \qquad (2.3)$$

For bodies at temperatures from 100 K to 10000 K, the value of λmax is shown in Figure 1.5.

Figure 1.5) Wien's displacement law

In particular, the peak of the Sun's radiation, i.e. the radiation of a body at 5778 K, is found at a wavelength of about 0.5 µm.

From Figure 1.5 it is possible to deduce the peak wavelength of bodies at temperatures between 200 K and 500 K. Such peaks are situated between 5.7 µm and 14.7 µm, that is, in the portions of the spectrum called MWIR and LWIR.

⁷ A grey body is a real object that can absorb, reflect and transmit the incoming electromagnetic radiation.

⁸ The emissivity indicates the ratio between the radiation emitted by the body under test and the radiation that a black body at the same temperature would emit.


1.1.3 Discussion on emitted and reflected radiance

Although we found that most of the radiation emitted by bodies of interest is situated in the InfraRed band, we have to consider whether their own InfraRed emission or their reflection of the InfraRed radiation coming from the Sun is the stronger. In fact, if the latter were true, even analysing the InfraRed band it would be very difficult to distinguish the information related to the temperature of the objects, hidden in the reflections. In order to have a fair comparison, we have to calculate the amount of radiation that reaches a unitary surface on Earth and is re-irradiated towards 1 sr. To do this, the spectral radiance emitted by the Sun needs to be multiplied by the area of the solar disk and divided by the surface that a hypothetical observer situated on the Sun's surface would see under a unitary solid angle at the Sun-Earth distance. Furthermore, this quantity has to be normalized by the solid angle towards which the energy would be re-irradiated. Such a solid angle would approximately be the solid angle subtended by a hemisphere, which means 2π sr.

The Sun's spectral radiance reflected by the Earth is given by Equation 2.4, where LBBR(λ, TS) is the Sun's spectral radiance reflected by the Earth, LBB(λ, TS) is the Sun's spectral radiance, ASD is the area of the solar disk and DES is the distance from the Sun to the Earth.

$$L_{BBR}(\lambda, T_S) = \frac{L_{BB}(\lambda, T_S) \cdot A_{SD}}{D_{ES}^2 \cdot 2\pi} \qquad (2.4)$$

This is another approximation, because it considers the Sun and Earth surfaces flat and parallel to each other. The result is shown in Figure 1.6.


On the one hand, the Sun radiance at 6 µm that would be re-irradiated by a square metre on the Earth towards 1 sr would be of the order of magnitude of 10⁵ W/(sr·m³)⁹.

On the other hand, in Figure 1.3 we can see that the radiance emitted by a black body at a temperature of 350 K would be of the order of 10⁷ W/(sr·m³), and so about a hundred times higher than the Sun radiation on Earth.
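As a quick numerical cross-check of this comparison, the following Python sketch evaluates Equations 2.2, 2.3 and 2.4 at 6 µm. The physical constants, the solar radius and the Sun-Earth distance are standard reference values assumed here; they are not parameters taken from this work.

```python
import numpy as np

# Assumed reference values (the thesis denotes the Planck constant as ħ)
h = 6.62607015e-34   # Planck constant [J*s]
c = 299792458.0      # speed of light in free space [m/s]
K = 1.380649e-23     # Boltzmann constant [J/K]

def planck_radiance(lam, T):
    """Black-body spectral radiance, Equation 2.2 [W / (sr * m^3)]."""
    return (2.0 * h * c**2) / (np.pi * lam**5) / (np.exp(h * c / (lam * K * T)) - 1.0)

lam = 6e-6                       # wavelength of interest: 6 um (MWIR)
T_obj, T_sun = 350.0, 5778.0     # warm object vs Sun surface temperature [K]

# Reflected solar radiance on Earth, Equation 2.4 (flat-surface approximation)
A_SD = np.pi * 6.96e8**2         # area of the solar disk [m^2] (assumed solar radius)
D_ES = 1.496e11                  # Sun-Earth distance [m]
L_reflected = planck_radiance(lam, T_sun) * A_SD / (D_ES**2 * 2.0 * np.pi)
L_emitted = planck_radiance(lam, T_obj)

print(f"Reflected solar radiance at 6 um: {L_reflected:.2e} W/(sr*m^3)")
print(f"Emitted radiance of a 350 K body: {L_emitted:.2e} W/(sr*m^3)")

# Wien's displacement law (Equation 2.3): peak wavelength of the 350 K body
b = 2.898e-3                     # Wien's displacement constant [m*K]
print(f"Peak wavelength at 350 K: {b / T_obj * 1e6:.1f} um")
```

With these assumed values, the reflected term comes out at roughly 10⁵ W/(sr·m³), with the emitted term several tens of times higher, consistent in order of magnitude with the figures quoted above.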

For all those reasons, the most useful part of the electromagnetic spectrum, in order to emphasize objects in a range of temperature that can be reasonably found on the Earth, is the InfraRed one.

1.2 Sensor

Once we have decided the band we want to explore, we need to know how the instrument we are going to use works. Such an instrument is a thermal camera, which is a camera that observes the InfraRed spectrum instead of the visible one.

1.2.1 Camera model

All image acquisition systems are composed of three main components [14]:

1. A body provided with an aperture on one side;
2. A photosensitive surface (the sensor);
3. A system of lenses and mirrors situated between the two previous parts.

Figure 1.7) Acquisition system components

⁹ Actually, such a quantity is overestimated because of the atmospheric transmission, which, as shown in Figure 1.2, never reaches 100%, and because of the reflection coefficient (the ratio between the reflected and the incoming radiant energy), which is included in the interval [0, 1] for any kind of surface.


The light coming from the scene passes through the aperture. Thanks to the lens system, it is both directed and focused onto the sensor, where the image is created. Then, the image has to be stored in a memory support whose type depends on the specific system.

The camera used in this work is provided with a digital sensor, which is a matrix of square photosensitive elements called 'pixels'. Each pixel converts the radiant energy incoming on it into a number. Such a number is the value that will be assigned to the portion of the image whose position corresponds to that of the pixel in the sensing matrix. This portion is also called 'pixel'. Therefore, a pixel is the smallest part of the image within which it is no longer possible to distinguish more than one object. As the number of pixels determines the resolution¹⁰ of the camera, it is considered the unit of measure of the image.

To be more precise, a single pixel cannot capture all the radiant energy, but only reacts to a specific wavelength range. For example, common cameras need three banks of pixels, which separately react to red, green and blue wavelengths. Thermal cameras are provided with one or more banks of pixels that react to InfraRed wavelengths.

Each pixel measures the radiant energy of the proper colour and, in order to store such a measure, the latter needs to be quantized into a number of levels that is a power of 2.

1.2.2 Pinhole model

As the purpose of the collected images is the study of the scene, we need to know how to extract information about the scene starting from its acquisition. The most used model is the pinhole model [14].

It is based on the following definitions:

• Optical centre, O, which corresponds to the aperture;
• Imaging plane, I, upon which the photosensitive surface is placed;
• Focal distance, f, which is the distance between O and I;
• Optical axis, c0, which corresponds to the straight line passing through O and perpendicular to I;
• Principal point, C, indicated by the intersection between I and c0;
• Focal plane, F, which is the plane parallel to I and containing O.

¹⁰ The resolution indicates the ability of an optical system to see as distinct the points of an object that are located at a small angular distance.


Figure 1.8) Geometric constructions on the Pinhole model

As Figure 1.8 shows, any point P in the scene is projected onto I as P′. It is easy to notice the similitude between the triangles in Equation 2.5:

$$\widehat{ZOP_x} \approx \widehat{COP'_u} \quad \text{and} \quad \widehat{ZOP_y} \approx \widehat{COP'_v} \qquad (2.5)$$

Starting from Equation 2.5, we can write the relations in Equation 2.6, where (x, y, z) are the spatial coordinates of P, whereas (u, v) are the surface coordinates of P′ on I.

$$\begin{cases} u = -\dfrac{f}{z} \cdot x \\ v = -\dfrac{f}{z} \cdot y \end{cases} \qquad (2.6)$$

By inverting Equation 2.6, we can deduce the dimension of the acquired scene, called field of view (FOV), knowing f and the dimensions of the sensor bench. In the same way, we can deduce the dimensions of the part of the scene acquired by a single pixel, called instantaneous field of view (IFOV), knowing f and the physical dimensions of a single pixel in the bench. As such dimensions depend on the distance of the single part of the scene from O, we often talk about angular field of view (angularFOV) and angular instantaneous field of view (angularIFOV). If we consider only the (x, c0) plane, the presented relations are expressed by Equations 2.7 and 2.8, where s and dI represent the dimension of the whole sensor bench and the dimension of the single pixel along the x axis, respectively.

$$\begin{cases} FOV = \dfrac{z}{f} \cdot s \\ IFOV = \dfrac{z}{f} \cdot d_I \end{cases} \qquad (2.7) \qquad \begin{cases} angularFOV = \dfrac{1}{f} \cdot s \\ angularIFOV = \dfrac{1}{f} \cdot d_I \end{cases} \qquad (2.8)$$


Figure 1.9) Geometric construction of the Pinhole model in 2D

We will indicate angularFOV and angularIFOV simply as FOV and IFOV in the following as it is easy to distinguish them by the unit of measurement.
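As an illustration of Equations 2.7 and 2.8, the sketch below computes the angular FOV and IFOV and the linear footprint of a single pixel at a given range. The focal length, pixel pitch and bench size are hypothetical values chosen for the example, not the parameters of the camera used in this work.

```python
import numpy as np

def angular_geometry(f, bench_size, pixel_size):
    """Angular FOV and IFOV of Equation 2.8 (small-angle approximation) [rad]."""
    return bench_size / f, pixel_size / f

# Hypothetical MWIR camera parameters (illustrative values only)
f = 0.3                 # focal length: 300 mm
s = 640 * 15e-6         # bench of 640 pixels with 15 um pitch
d_i = 15e-6             # single-pixel size

fov, ifov = angular_geometry(f, s, d_i)
print(f"FOV  = {np.degrees(fov):.2f} deg, IFOV = {ifov * 1e6:.0f} urad")

# Linear footprint of one pixel at range z (Equation 2.7): IFOV * z
z = 10e3
print(f"Pixel footprint at {z / 1e3:.0f} km: {ifov * z:.2f} m")
```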

1.3 Signal model

Once we understood how the image is formed, we can analyse the model of the signal that generated it.

The position of each pixel in the image is indicated by means of a pair of coordinates (i, j), whereas, as we will work with video sequences, the whole image is identified by the time coordinate (t) inside the image flow. The signal s(i, j, t) captured by the pixel at those coordinates of an InfraRed image can be modelled as a superposition of different components. Such a model is expressed by Equation 2.9, where tgt(i, j, t) is the target intensity, b(i, j, t) is the background intensity and n(i, j, t) is the noise intensity [15].

$$s(i, j, t) = tgt(i, j, t) + b(i, j, t) + n(i, j, t) \qquad (2.9)$$

To be precise, what the model names tgt(i, j, t) is the difference between the target intensity and the background intensity; in fact, a pixel in whose IFOV there is a target would only see the target and not the background behind it. Nevertheless, we will continue to refer to tgt(i, j, t) as the target intensity hereinafter.
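A minimal synthetic realization of the model of Equation 2.9 can be sketched as follows; the Gaussian target shape, the background gradient and the noise level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_frame(rows=128, cols=128, i0=64, j0=64, A=50.0):
    """Toy realization of Equation 2.9, s = tgt + b + n (illustrative values)."""
    i, j = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    # Few-pixel target: amplitude A times a normalized Gaussian blob at (i0, j0);
    # as noted above, tgt already denotes the excess over the background.
    tgt = A * np.exp(-((i - i0) ** 2 + (j - j0) ** 2) / (2 * 1.5 ** 2))
    b = 300.0 + 0.5 * i                       # slowly varying background gradient
    n = rng.normal(0.0, 2.0, (rows, cols))    # additive white Gaussian noise (1.3.3)
    return tgt + b + n

s = synthetic_frame()
print(s.shape, f"{s.mean():.1f}")
```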


1.3.1 Target

The contribution of Equation 2.9 we are interested in is only the target signal. It can be represented as shown by Equation 2.10, where A is the amplitude, representing the maximum peak, and sT(i − i0, j − j0, t − t0) is the amplitude-normalized target signal positioned at the coordinates (i0, j0, t0).

$$tgt(i, j, t) = A \cdot s_T(i - i_0, j - j_0, t - t_0) \qquad (2.10)$$

What helps to distinguish the target from the surrounding background is its contrast. The contrast is the intensity difference that makes an object stand out with respect to the background. Several definitions of contrast exist. One of the most used is the Weber definition [16]:

$$C_W(i, j, t) = \frac{tgt(i, j, t) - b(i, j, t)}{b(i, j, t)} \qquad (2.11)$$

In Equation 2.11, CW(i, j, t) is the name given to the contrast. As we only know the value of s(i, j, t) and not the single values of the target, background and noise components, in order to calculate the target contrast we have to estimate the values of tgt(i, j, t) and b(i, j, t).

1.3.2 Background

The background term, sometimes referred to as 'clutter', is the unwanted signal that reaches the camera sensor. In this work, we particularly refer to maritime clutter, which has been studied and modelled for years, since several organizations, especially the NATO community, have stressed the importance of accurate information about this complicated matter [17].

In surveillance systems working with a fully structured background, the clutter is quite easily identified as the part of the scene that does not move during a sequence of images (just think of the surveillance system of a bank), whereas every moving object is a possible target. In the maritime scenario, on the contrary, the background also moves. For this reason, in order to better analyse the behaviour of this challenging clutter, it is important to classify it into different types of background. First, we can distinguish three main clutter types in the maritime scenario, which can be found together in a single image.

• Sky background, which can be

o Empty, when no structure is present, apart from the possible presence of a target. Its variations are very slow in time, determined only by the luminance conditions, which basically depend on the hour of the day. In this situation, the intensity fluctuations in near pixels (where near means both spatially and temporally), where the target is absent, are due to the noise factor. It is the ideal situation for an InfraRed system working in the maritime scenario, since its performance is limited only by the noise.


o Cloudy, otherwise. We can consider clouds as water volumes in the sky at approximately the same temperature as the surrounding air. Because of their optical thickness in the InfraRed bands, clouds appear as warm objects with respect to the cold sky. The problems due to clouds are the higher background level, which, in the case of warm targets, causes a loss in target contrast, and the formation of sharp edges, which can be enhanced by the very filters that should remove the background, as will be explained in Chapter 2.

• Sea background. Its variation intervals depend on the sea conditions; in any case, they change rapidly both in space and in time. It contains two types of information, the first concerning the emissions produced by the water mass at a specific temperature, the second due to the reflection of ambient radiation. The mutual percentage of those elements depends on the emission and reflection coefficients. The latter depends, in turn, on the angle formed between the radiation source and the sea surface, and becomes significant at small angles. The sea background can be characterized by the presence of the horizon line and by the presence of Sun glint.

o The horizon line is the line between the sky and the sea clutter. Because of the difference in temperature and emittance of the elements, the signals received from the water and from the atmosphere are very different. This difference in radiation produces an increase in contrast, which hinders small-target detection near the horizon, which is, unfortunately, the most interesting zone in which to perform detection. It is worth noting that some techniques in the literature [18] [19] take advantage of the presence of the horizon line to limit the target search field to a narrower strip around the horizon, which, as said, is the most interesting zone. This is possible thanks to the relative simplicity of recognizing the horizon line [20] [21]. Limiting the search area is likely to result in a decrease in false alarms and missed detections. Anyway, in order to test the background filtering algorithms presented in this work also in images where the targets were not so near to the horizon, this possibility has not been taken into account. Therefore, we have to consider that the results would be enhanced by the usage of these techniques.

o The Sun glint is the effect of solar reflection on the sea surface. These reflections show up over a wide range of Sun azimuths, starting from the horizon, where this component is higher. Sun glint, when present, is the most powerful part of the signal and, therefore, has a bad influence on target detection: it can cloak small targets or, when it does not, it makes detection harder by increasing the false alarms.

• Structures are sometimes present, placed between sky and sea backgrounds, such as pieces of land near the horizon. Such a part of the image would not move in time in the image flow if it were not for the movement of the camera, which is most likely secured to a moving platform like a ship. Therefore, the best choice would be to compensate for the movement of the camera by means of inertial sensors and then to treat those images like normal structured background. Anyway, in this work we also tested the proposed filtering algorithms with targets placed in structured background.

It is worth noting that the maritime background is a realization of a non-stationary stochastic process, whose statistics vary in space (within the frame) and time (within the sequence). In particular, the mean value of the background changes more rapidly than the variance, both in space and in time. Nevertheless, it is possible to find a space window and a time interval within which the background can be considered stationary.

1.3.3 Noise

The noise term is an undesired random variation in brightness, mainly due to electronic components. Several phenomena can give rise to different types of noise, listed below.

• The Thermal noise (nth), also known as Johnson-Nyquist noise, is generated by the thermal agitation of charge carriers, such as electrons, moving in an electrical conductor [22] [23]. All conductors at nonzero temperature generate this type of noise. Its frequency distribution is approximately white, which means that the power spectral density of this process is constant throughout the frequency spectrum. Its amplitude behaviour is modelled with a Gaussian distribution [24] with mean value equal to zero and standard deviation σth.

$$n_{th} \in N(0, \sigma_{th}^2) \qquad (2.12)$$

• The Shot noise (nshut) is caused by the particle nature of light. In the time interval Tshut in which the camera shutter is open, the number of photons (np) that reach the sensor is not exactly determined but suffers a slight random variation. Such a variation represents the noise. The number of photons hitting the sensor in Tshut is a discrete random variable with a Poisson distribution [25] of mean value μp. It is worth noting that, if the number of photons is sufficiently high, the Poisson distribution can be approximated by a Gaussian distribution whose mean value and variance are equal to μp. Therefore, the intensity of an InfraRed image being always higher than zero, since the electromagnetic emission of a body depends on its temperature and this is always higher than 0 K, we can model the shot noise as a Gaussian random variable with mean value equal to zero and variance γshut²·μp, linearly dependent on the mean number of photons¹¹. The power spectral density of the shot noise can also be considered white for our purposes.

$$n_{shut} \in N(0, \gamma_{shut}^2 \cdot \mu_p) \qquad (2.13)$$

• The Quantization noise (nquant) is generated during the analog-to-digital conversion. This conversion is necessary because each pixel must store the energy level coming from its IFOV in a finite number of bits and, therefore, in a finite number of levels¹². The quantization noise is linearly dependent on the input signal and has a uniform distribution over an interval as large as the difference between two consecutive levels (∆) [26].

$$n_{quant} \in U\!\left(-\frac{\Delta}{2}, \frac{\Delta}{2}\right) \qquad (2.14)$$

In this work, a camera with Q = 16 bits of precision has been used. By means of Equation 2.15, we know that the signal to quantization noise ratio (SQNR), in 16-bit precision images, is equal to 96.32 dB. Such an SQNR is high enough to allow us to neglect this noise type.

$$SQNR = 20 \cdot \log_{10}(2^Q) = 6.02 \cdot Q \;\; \text{dB} \qquad (2.15)$$

The resulting noise intensity can be considered as the sum of all the previous types.

$$n = n_{th} + n_{shut} + n_{quant} \cong n_{th} + n_{shut} \qquad (2.16)$$

¹¹ Indicating the incoming radiance with the letter L, we can say that it depends on the number of photons np by means of a constant called γshut:

$$L = \gamma_{shut} \cdot n_p, \quad n_p \in P(\mu_p, \mu_p) \;\Rightarrow\; L = \gamma_{shut} \cdot \mu_p + \gamma_{shut} \cdot \Delta_p, \quad \Delta_p = n_p - \mu_p \;\Rightarrow\; L = L_{signal} + n_{shut}, \quad L_{signal} = \gamma_{shut} \cdot \mu_p, \quad n_{shut} = \gamma_{shut} \cdot \Delta_p \in N(0, \gamma_{shut}^2 \cdot \mu_p)$$

¹² The energy values are already quantized in nature, as each photon carries a precise quantity of energy. Nevertheless, this quantity is infinitely small with respect to the whole quantity of energy that can be captured even by a single pixel. Therefore, energy can be considered a continuous quantity.


Therefore, both remaining components being independent and Gaussian distributed, and the sum of independent Gaussian variables being still Gaussian, we can affirm that the total noise is additive¹³, white¹⁴ and Gaussian.

$$n \in N(0, \sigma_{th}^2 + \gamma_{shut}^2 \cdot \mu_p) \qquad (2.17)$$

¹³ It means it is added to the signal.
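The resulting model can be simulated directly. The following sketch, with arbitrary values for σth, γshut and μp, checks that the empirical standard deviation of the simulated noise matches the model of Equation 2.17.

```python
import numpy as np

rng = np.random.default_rng(1)

def sensor_noise(mu_p, sigma_th=1.5, gamma_shut=0.8, shape=(256, 256)):
    """Total noise of Equation 2.17 (illustrative parameter values).

    Thermal and shot contributions are independent zero-mean Gaussians, so the
    total is Gaussian with variance sigma_th^2 + gamma_shut^2 * mu_p.
    """
    n_th = rng.normal(0.0, sigma_th, shape)                       # thermal term
    n_shot = rng.normal(0.0, gamma_shut * np.sqrt(mu_p), shape)   # shot term
    return n_th + n_shot

n = sensor_noise(mu_p=100.0)
print(f"Empirical std: {n.std():.2f}")
print(f"Model std:     {np.sqrt(1.5**2 + 0.8**2 * 100.0):.2f}")
```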


2 BACKGROUND REMOVAL

The purpose of this work is the analysis of background subtraction techniques aimed at emphasizing the target intensity in the InfraRed image with respect to the background in order to correctly detect it. In Figure 2.1 a general block diagram describing the entire target detection problem is depicted.

Figure 2.1) Target detection block diagram

It consists of an initial operation of noise removal, a subsequent operation of background estimation and a stage of background rejection, which leads to the enhancement of the target. Finally, a decision is taken over the pixels, in order to indicate, by means of the output map(i, j, t), which of them belong to the target and which do not. The decision block may include a preliminary normalization of the pixels with respect to the standard deviation of the residual clutter. This is motivated by the fact that the background estimation block is supposed to annihilate the local mean value and, although this operation weakens the standard deviation of the clutter, it does not completely nullify it.

Such a diagram should be intended as a general example of a target detection framework using background subtraction; in practice, such operations could be performed jointly or in a different order. In this chapter, the properties of the InfraRed signal will be discussed and, then, the filters used to remove the background will be presented.
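A minimal sketch of such a chain is given below, with the background estimator left as a pluggable function, the noise-removal stage omitted, and an assumed global k-sigma threshold as the decision rule (the normalization by the residual-clutter standard deviation is done here globally for simplicity, whereas it would normally be local).

```python
import numpy as np

def detect_targets(frame, estimate_background, k=5.0):
    """Skeleton of the chain in Figure 2.1 (a minimal sketch, not the exact
    implementation used in this work)."""
    residual = frame - estimate_background(frame)   # background rejection
    sigma = residual.std()                          # residual clutter spread
    return (residual / sigma) > k                   # binary output map(i, j)
```

Any of the estimators presented in Paragraph 2.2 can be plugged in as estimate_background.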

2.1 Introduction to background removal

The techniques used to estimate and remove the contribution of the background in an image can be classified into two approaches, the sequence-based and the frame-based. Such approaches are analysed hereinafter.

The sequence-based approach includes all the algorithms that take advantage of the stationarity of the background over time to predict its value in the successive image, by means of the elaboration of a data buffer previously collected.

The simplest algorithms belonging to this category are the ones that consider a stationary background and try to delete it by subtraction of consecutive frames [27]. In such techniques, only the pixels that changed their intensity, such as objects that appeared, disappeared, or moved to another position, are not rejected. These techniques can take into account the possibly low speed of some targets, which might not change their position within the frame interval Tframe¹⁵, by subtracting images that are more distant in time than Tframe.

Other algorithms use a statistical approach and estimate the next frame by means of predictors like the Kalman filter [28].

The sequence-based approach does not suit maritime background estimation because it is weak against camera motion. In fact, the camera is likely to be mounted on a ship and is thus prone to the sea movements. If the same pixel in two consecutive images focuses on different spatial locations, the algorithm is not capable of understanding that the variation is due to the camera motion and not to a real change in the scene, and thus it does not reject the mentioned pixel. For the same reason, this approach is weak against the rapid changes in clutter pixel values due to the sea waves, which, instead of being recognised as clutter and attenuated, would be enhanced by the mentioned techniques, generating several false alarms [27] [29].

For those reasons, we will consider the frame-based approach, which includes all the algorithms that take advantage of the local spatial stationarity¹⁶ (hereinafter called flatness, in order not to create confusion with time stationarity) of the background to predict its value within the same image. Differently from the sequence-based techniques, the frame-based ones are not affected by the movement of the sensor, as they only consider one single frame at a time. For the same reason, they are not affected by the movement of the background either. Thus, those techniques are the most common in maritime clutter estimation problems. Frame-based techniques can be characterized by the exploited domain (frequency or space).

Frequency-domain-based techniques exploit the frequency analysis in order to separate the background and noise components from the target component.

The frequency analysis of a mono-dimensional signal, whose behaviour follows the one expected for the considered (bi-dimensional) issue, is shown as an example in Figure 2.2. As shown, the background signal, having the highest spatial correlation level, is concentrated at the lowest frequencies, whereas the target signal, which is an intensity of a few pixels, follows a sinc-like¹⁷ function and is predominant in the middle frequencies. At the high frequencies, the noise contribution predominates. This is a pixel-by-pixel uncorrelated process, thus white¹⁸, as better explained in Chapter 1. Therefore, a specific band-pass filter should successfully enhance the target with respect to the other signal components [30]. Some of the algorithms proposed in this work can be classified in this category, as will be explained later.

¹⁵ The frame interval Tframe is the interval between two consecutive image acquisitions.

¹⁶ As said in Paragraph 1.3.2, the background signal can only be considered locally stationary. Indeed, it is a stochastic process whose mean value and variance vary within the frame.

Figure 2.2) Frequency behaviour example of the received signal

Space-domain-based techniques, conversely, use spatial features to detect the targets inside the frame.

Some of those algorithms exploit databases, called 'training sets', containing details of several targets, such as their image, called a 'template'. As the target rotation with respect to the point of view of the camera would strongly influence the capability of recognising it, those algorithms need the database to be equipped with the images of the target captured from all directions. Actually, good performance is achieved by using shapes, instead of detailed images, of the targets. This permits generalizing the matching problem and using a smaller database, as the shapes do not change much for ships belonging to the same class [18]. Similarly, rotations by angles of 30° with respect to the direction of the viewer still perform sufficiently well. Such algorithms calculate a matching parameter between each part of the image to be checked and the elements of the training set, in order to decide whether the target is present or not [18]. Obviously, the performance strongly depends on the number of available templates. For military purposes, it is not possible to collect the shapes of all the classes of ships of other countries, especially if they are enemies, which are the most important ships to be found. Furthermore, the purpose of this work is to detect targets which are so far from the camera that they occupy a number of pixels too small to distinguish even the shape of a target. Thus, we will not focus on this type of algorithm.

¹⁷ A rect function, which is a fair approximation of the target signal in the spatial domain, is Fourier transformed into a sinc function in the frequency domain.

¹⁸ Actually, the noise process is only white in the considered band, as a completely white process is a pure mathematical abstraction and is not physically realizable. In fact, a white process would have infinite power, and that is impossible.

Some other techniques use the information coming from each part of the image to design a very specific filter which can estimate the background signal [31]. Some of the algorithms analysed in this work can be classified in this category.

As one could expect, algorithms that combine ideas from all the mentioned categories have also been proposed [27].

2.2 Proposed filters

In this work, we focused on frame-based filters, as they are the most suitable for maritime scenarios. The analysed techniques exploit both the space and the frequency domain. Such techniques are listed below.

• 2D Window Average with Guard (2DWAG)
• 2D Median (2DMED)
• (1D) Window Average with Guard (WAG)
• (1D) Median (MED)
• MaxMean with Guard (MaxMeanG)
• MaxMedian
• Bilateral

Further details are given in the following.

2.2.1 2D Window Average with Guard

This filter derives from the simple 2D moving average filter. In order to show how the 2D moving average filter works, the mono-dimensional moving average filter is explained first. The 2D version replicates the same behaviour in the second dimension.

The moving average filter is a linear filter and works as follows:

1. A weighting window is initially centred on the first sample of the signal.
2. The samples within the window are weighted and summed.
3. The weighting window moves to the next sample of the signal.
4. Steps 2 and 3 are repeated for all the samples of the signal.

It is worth noting that such steps describe the discrete convolution operation, mathematically expressed by Equation 3.1, between the input signal and the weighting window.

$$f[m] \otimes g[m] \triangleq \sum_{n=-\infty}^{\infty} f[m - n] \cdot g[n] \qquad (3.1)$$

It is also worth noting that, both the signal and the weighting window being finite, the transient phase at the first and last samples, where the weighting window exceeds the input limits, needs to be managed. In this work, this problem has been resolved by truncating the weighting window where it exceeds the input signal.
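A minimal mono-dimensional implementation of this scheme, with the truncation-and-renormalization border handling just described, could look as follows.

```python
import numpy as np

def moving_average(x, N):
    """(2N+1)-sample rectangular moving average (a minimal sketch).

    At the borders the window is truncated where it exceeds the signal, and the
    weights are renormalized over the valid samples only.
    """
    x = np.asarray(x, dtype=float)
    ones = np.ones(2 * N + 1)
    num = np.convolve(x, ones, mode="same")                 # windowed sums
    den = np.convolve(np.ones_like(x), ones, mode="same")   # valid samples count
    return num / den
```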

The weighting window can have several shapes, depending on the specific application. Some examples are given in Figure 2.3, where a rectangular, a triangular and a sinc-like window are shown. The differences in sample weighting are reflected in a different effect in the frequency domain: the smoother the variations of the window, the more the side lobes of the filtering function are rejected [32].

Figure 2.3) Examples of weighting windows

Averaging a certain number of samples mitigates rapid variations such as those produced by the noise. An example is given in Figure 2.4, where the red line represents the effect of the rectangular filter, plotted in the upper part of the figure, on the input signal, depicted in blue.


Figure 2.4) Moving average filter effect

The representation of the same signal in the frequency domain is shown in Figure 2.5, jointly with the frequency response of the filter.

Figure 2.5) Moving average filter effect in the frequency domain

As shown, the filter preserves only the central part of the frequency spectrum, which corresponds to the low frequency components, whereas it attenuates the high frequency components. The former contains information related to the input signal, whereas the latter contains information related to the noise.

The bi-dimensional moving average filter works similarly. It is also linear and is based on a bi-dimensional weighting window that averages and moves along the abscissa and ordinate axes through the input image. Three examples of bi-dimensional weighting windows are given in Figure 2.6. The difference from the previously presented mono-dimensional ones is that the pixels selected by the window are not on a single line, but along both rows and columns.


Figure 2.6) Examples of bi-dimensional weighting windows (rectangular, pyramidal and sinc)

As the background component of the signal is, as said, the dominant contribution at low frequencies, we need to use a low-pass filter in order to highlight it. The parts that we want to filter away from the signal are the noise and the target, if present. Then, by subtracting the result from the original image, the target should be enhanced.

For a well-built camera, the noise energy is very low with respect to the clutter and target signals, and thus the deviations of the pixels from the true value of background or target are very small. For this reason, a simple low-pass filter with a relatively large band can correctly mitigate its effect and no other considerations are needed.

Differently from the noise, the target signal is high with respect to the background signal. For this reason, when the moving window is positioned on the target, the mean value is strongly biased by the value of the target and, therefore, the background estimate could be wrong.

To avoid the target bias, the so-called Bi-Dimensional Window Average with Guard (2DWAG) has been proposed [15]. In this case, the window is similar to the bi-dimensional rectangular one shown in Figure 2.6, but presents a 'guard' zone in the centre of the window, that is, a region where the contribution given by the pixels is null. Both a three-dimensional and a bi-dimensional representation of the window are provided in Figure 2.7.


Figure 2.7) Bi-Dimensional Window Average with Guard

The value of the background estimate at position (i, j) is given by Equation 3.2, where b̂2DWAG(i, j) is the background estimate and wn(l, k) is the weighting window, normalized so as to sum to one, having size (2N + 1) × (2N + 1), in which a guard of size (2G + 1) × (2G + 1) is introduced.

$$\hat{b}_{2DWAG}(i, j) = \sum_{l=-N}^{N} \sum_{k=-N}^{N} w_n(l, k) \cdot s(i + l, j + k) \qquad (3.2)$$

Where:

$$w_n(l, k) = \frac{w(l, k)}{\sum_{m=-N}^{N} \sum_{n=-N}^{N} w(m, n)}, \qquad w(l, k) = \begin{cases} 0, & (l, k) \in [-G, G] \times [-G, G] \\ 1, & \text{otherwise} \end{cases}, \quad G < N$$

As the purpose of the guard is to prevent the contribution of a target from biasing the estimate of the background, its dimension must be higher than or equal to the target one, so that the whole target can be ignored. However, the bigger the guard, the more distant the pixels with non-null weight are from the central one, which is the pixel whose background contribution we mean to estimate, and, therefore, the less trustworthy the estimate is. Thus, the parameter G must be carefully chosen according to the target dimensions.
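A direct, unoptimized sketch of Equation 3.2 is given below; the edge-replication padding at the image borders is a simplification made for brevity (the border transient can also be handled by truncating the window, as in the mono-dimensional case).

```python
import numpy as np

def wag2d(img, N, G):
    """2DWAG background estimate (Equation 3.2): a (2N+1)x(2N+1) averaging
    window with a (2G+1)x(2G+1) central guard of null weights."""
    assert G < N
    w = np.ones((2 * N + 1, 2 * N + 1))
    w[N - G:N + G + 1, N - G:N + G + 1] = 0.0    # null weights inside the guard
    w /= w.sum()                                  # normalize to sum to one
    img = np.asarray(img, dtype=float)
    pad = np.pad(img, N, mode="edge")             # border handling (assumption)
    out = np.empty_like(img)
    rows, cols = img.shape
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(w * pad[i:i + 2 * N + 1, j:j + 2 * N + 1])
    return out
```

For example, residual = frame - wag2d(frame, N=7, G=2) performs the background rejection step with a 15×15 window and a 5×5 guard.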

It is worth noting that this type of filter is not appropriate for non-flat background, as it is not capable of preserving important features such as edges, which, being rapid variations in the image intensity, belong to the high frequencies.


2.2.2 Bi-Dimensional Median

There is a different kind of filter, called 'bi-dimensional ranked order filters' [15]. As before, the effect of those filters on mono-dimensional signals will be explained first. Then, it will be extended to bi-dimensional signals.

Mono-dimensional ranked order filters work as follows:

1. A window is initially centred on the first sample of the signal.
2. The samples within the window are sorted in ascending order and the rth element is selected, where r depends on the chosen ranked order filter.
3. The window moves to the next sample of the signal.
4. Steps 2 and 3 are repeated for all the samples of the signal.

The process is similar to the one followed by the moving average filter, but the ranked order filters are nonlinear.

The ranked order filter studied in this work is the median filter, which takes the central value of the set of sorted elements as output value.

Differently from the previous filters, ranked order filters preserve most of the edges. A comparison between the effects of the moving average and of the median filter on a mono-dimensional signal containing a sharp edge embedded in noise is shown in Figure 2.8. In the top part, the denoised input is plotted in red. The result of the filtering algorithm should be as similar as possible to this red plot. As can be seen, the moving average filter, by calculating the mean value at each step, smooths the edge, which becomes a ramp with a base length equal to the window dimension. On the contrary, the median filter takes, at each step, the median value of the samples under the window. Therefore, as long as more than half of the window is found before the edge, the median value will be nearer to the zero level than the level assumed by the result of the moving average. After the centre of the window passes the edge, the median will be nearer to the one level. The result is, therefore, more similar to an edge than to a ramp. Such a characteristic makes the ranked order filters more suitable for estimating nonstationary background.


Figure 2.8) Comparison about edges preservation

It is worth noting that if the window is larger than twice the target dimension, there is a high probability that the median value will belong to the background. Thus, the median background estimate will be less influenced by the target presence than the averaging one. Figure 2.9 shows the improvement of the median filter, with respect to the averaging one, on a target smaller than half of the window size.

Figure 2.9) Comparison about target influence on the background estimation

This means that, if the window is conveniently dimensioned with respect to the expected target, the median filter does not require any guard.
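The following toy experiment illustrates both properties on a noisy step edge carrying a 3-sample target; all signal values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy step edge with a 3-sample target on the upper level
x = np.concatenate([np.zeros(100), np.ones(100)]) + rng.normal(0, 0.05, 200)
x[150:153] += 2.0            # few-sample target; window (2N+1)=21 > twice its size

N = 10
win = np.lib.stride_tricks.sliding_window_view(np.pad(x, N, mode="edge"), 2 * N + 1)
mean_est = win.mean(axis=1)            # smooths the edge, biased by the target
median_est = np.median(win, axis=1)    # keeps the edge sharp, ignores the target

print(f"Target bias, mean:   {mean_est[151] - 1.0:+.3f}")
print(f"Target bias, median: {median_est[151] - 1.0:+.3f}")
```

At the target location the mean estimate is biased upwards by roughly 3·A/(2N+1), whereas the median stays on the background level.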

The bi-dimensional ranked order filter exploits a bi-dimensional window, and, therefore, the pixels are selected along rows and columns.

The used bi-dimensional ranked order filter takes the name of Bi-Dimensional Median (2DMED).


2.2.3 Dimensional reduction

The main disadvantage of multi-dimensional ranked order filters is that they hardly preserve signal features whose dimensionality is lower than that of the representation space [33]. For example, for such filters it is hard to correctly estimate thin lines or sharp corners. For this reason, another family of filters, called generalized ranked order filters, has been designed. Such filters are multi-dimensional filters based on ranked order operations which exploit subspaces having a lower dimensionality than the representation space.

In the studied case, the bi-dimensional space is considered; thus, bi-dimensional generalized ranked order filters should operate in one dimension. They work as follows:

1. A window is initially centred on the top left corner pixel of the image.
2. The elements within the window that are aligned along some previously selected directions passing through the central pixel are separately elaborated by a mono-dimensional ranked order filter.
3. The results obtained in step 2 are elaborated by another mono-dimensional ranked order filter.
4. The window moves to the next pixel of the image.
5. Steps 2, 3 and 4 are repeated for all the pixels of the image.

In this work, the ‘Max-Median’ will be analysed. It chooses the median value from the samples belonging to four directions separated by 45° (step 2) and, then, takes the one with the maximum value (step 3).

In formula:

$$\text{Max-Median}(i, j) = \max(z_1, z_2, z_3, z_4) \qquad (3.3)$$

Where:

• z1 = median[s(i, j − N), …, s(i, j), …, s(i, j + N)]
• z2 = median[s(i − N, j), …, s(i, j), …, s(i + N, j)]
• z3 = median[s(i − N, j − N), …, s(i, j), …, s(i + N, j + N)]
• z4 = median[s(i + N, j − N), …, s(i, j), …, s(i − N, j + N)]
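A straightforward sketch of Equation 3.3 follows; edge-replication padding at the borders is again an assumption made for brevity.

```python
import numpy as np

def max_median(img, N):
    """Max-Median estimate (Equation 3.3): median along four directions through
    each pixel, then the maximum of the four values is kept."""
    img = np.asarray(img, dtype=float)
    p = np.pad(img, N, mode="edge")              # border handling (assumption)
    rows, cols = img.shape
    out = np.empty_like(img)
    offs = np.arange(-N, N + 1)
    for i in range(rows):
        for j in range(cols):
            ic, jc = i + N, j + N
            z1 = np.median(p[ic, jc + offs])             # horizontal
            z2 = np.median(p[ic + offs, jc])             # vertical
            z3 = np.median(p[ic + offs, jc + offs])      # main diagonal
            z4 = np.median(p[ic - offs, jc + offs])      # anti-diagonal
            out[i, j] = max(z1, z2, z3, z4)
    return out
```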

A filter inspired by the Max-Median, the 'Max-Mean' filter, has been analysed too. Such a filter chooses the mean value, instead of the median, from the samples belonging to four directions separated by 45° and, among these, takes the one with the maximum value.

In formula:

Max-Mean(i, j) = max(z1, z2, z3, z4)    (3.4)

Where:

• z1 = mean[s(i, j − N), …, s(i, j), …, s(i, j + N)]
• z2 = mean[s(i − N, j), …, s(i, j), …, s(i + N, j)]
• z3 = mean[s(i − N, j − N), …, s(i, j), …, s(i + N, j + N)]
• z4 = mean[s(i + N, j − N), …, s(i, j), …, s(i − N, j + N)]

The Max-Mean filter, similarly to the moving average filter, suffers from the peaks that the targets introduce in the background estimation; therefore, a guard window of the same dimension as the expected target is used here as well. Such a filter takes the name of ‘Max-Mean with Guard’ (Max-MeanG).
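A sketch of the Max-MeanG is given below. How exactly the guard is laid out along each direction is not specified above, so excluding the samples within ±G of the centre (with G < N) is an assumption of this sketch:

```python
import numpy as np

def max_mean_guard(frame: np.ndarray, N: int, G: int) -> np.ndarray:
    """Max-Mean with Guard: directional means skipping the guarded samples."""
    H, W = frame.shape
    out = frame.astype(float)
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    taps = [t for t in range(-N, N + 1) if abs(t) > G]  # guard: drop |t| <= G
    for i in range(N, H - N):
        for j in range(N, W - N):
            means = [np.mean([frame[i + t * di, j + t * dj] for t in taps])
                     for di, dj in directions]
            out[i, j] = max(means)
    return out
```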

2.2.4 Mono-dimensional filters

The previously presented filters have been designed on the basis of the need to preserve the background edges. In maritime scenarios, a pivotal edge to be considered is the horizon line. Its importance is due to the fact that it is present in the majority of the sea images and, in addition, it is generally the part of the image where one is most interested in finding targets. Therefore, estimating the background near the horizon line is a crucial task.

The horizon line within an image generally follows one of the dimensions of the camera sensor (the horizontal one if the camera is horizontally placed and the vertical one if it is vertically placed, although only the first case will be considered here), unless the camera is rotated by a significant movement of the platform; in this case, however, the inverse rotation, calculated from the inertial sensors' information, can be applied in order to make the horizon line parallel to the abscissa axis. Therefore, in order to preserve the horizon line, it is reasonable to use a filter which does not modify the horizontal edges at all or, in other words, a filter that works orthogonally to the direction in which the variation occurs. This kind of filter is a mono-dimensional filter oriented along the abscissa axis.

In order to better understand the meaning of the last sentence, let us consider the horizon as parallel to the abscissa and situated between the hl-th and the (hl + 1)-th lines. In such a case, considering a reference system in which the top left pixel is (1, 1), and proceeding to the generic pixel (i, j) from left to right by increasing i and from top to bottom by increasing j, all the j-th lines with j ≤ hl should contain sky background, whereas all the j-th lines with j > hl should contain sea background.

If we consider bi-dimensional filters like the ones seen previously, working on a window situated near the horizon line, it emerges that they are likely to produce a joint elaboration of both sea and sky pixels. Thus, the output will suffer from this duality, and the estimated edge will appear smoothed, instead of sharp as the horizon likely is.

On the contrary, a mono-dimensional abscissa-oriented filter working on the hl-th line would only elaborate sky pixels, whereas, when working on the (hl + 1)-th line, it would only elaborate sea pixels. Therefore, by avoiding the mixing of sea and sky clutter, it preserves the edge between them without smoothing it.

The mono-dimensional filters used in this work are the mono-dimensional Window Average with Guard (WAG) and the mono-dimensional Median (MED).
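Since SciPy's filters accept per-axis window sizes, the mono-dimensional MED can be sketched as a row-wise median (a minimal sketch, under the assumption that the horizon is horizontal in the frame; the mono-dimensional WAG would analogously average each row while skipping a central guard):

```python
import numpy as np
from scipy.ndimage import median_filter

def med_rows(frame: np.ndarray, N: int) -> np.ndarray:
    """Mono-dimensional median along the abscissa: each row is filtered
    independently, so horizontal edges such as the horizon are untouched."""
    return median_filter(frame, size=(1, 2 * N + 1), mode='nearest')
```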

2.2.5 Bilateral filter

As described in Section 2.2.1, the filtering window can assign a different weight to each sample; some examples have been given in Figure 2.6. The idea behind the next analysed filter, called ‘Bilateral filter’ [34], is to assign the weights on the basis of the pixels upon which the window is positioned, instead of always assigning the same a priori chosen weights. In other words, the weights change depending on the position of the window within the image. They are designed so as to strengthen the weight given to the pixels whose value is more similar to the central one, penalizing the most different pixels. In this way, when a strong edge is encountered, the pixels lying on the other side of the edge with respect to the centre of the window are less influential on the output. The strength of this filter is, therefore, that it automatically manages both flat and non-flat backgrounds: it preserves the edge between two different zones (like sea and sky, sea and land, or cloudy and clear sky) and, at the same time, smooths the small differences within the flat background due to the noise and to the little variations of the local mean of the clutter.

It is worth noting that, when the weighting window is centred on a pixel belonging to the target, the highest weights would be given to the other pixels belonging to the target. Thus, the target too would be included in the background estimation, spoiling it. In order to avoid the preservation of the target in the background estimation, a guard window, bigger than the target dimensions, is used in the bilateral filter as well. With the help of the guard, when the weighting window is centred on the target, all the target pixels should fall inside the guard (and the filter thus assigns them a null weight). Therefore, in such a condition, even if all the pixels outside the guard, which belong to the background, are very different from the central one, which belongs to the target, the estimation does not suffer from this difference. In fact, the estimation would only be compromised if there were pixels more similar to the central one than the background ones; but the only more similar pixels are the ones belonging to the target itself, and they are ignored thanks to the guard.


These concepts are expressed in formula by Equation 3.5, where wn(i,j)(l, k) represents the weighting window w(i,j)(l, k) centred on the (i, j) pixel and normalized to unitary sum, whereas the mask(l, k) matrix represents the guard window.

Bilateral(i, j) = ∑_{l=−N}^{N} ∑_{k=−N}^{N} wn(i,j)(l, k) ∙ s(i + l, j + k)    (3.5)

Where:

• wn(i,j)(l, k) = w(i,j)(l, k) / ∑_{m=−N}^{N} ∑_{n=−N}^{N} w(i,j)(m, n)
• w(i,j)(l, k) = exp{−K ∙ [(s(i, j) − s(i − l, j − k)) / max_(l,k)[s(i, j) − s(i − l, j − k)]]²} ∙ mask(l, k)
• mask(l, k) = 0 if l, k ∈ [−G, G], and 1 otherwise, with G < N

The K coefficient controls the weighting difference between the pixels most different from and most similar to the central one; therefore, the more structured the background is, the better the results obtained by bilateral filtering with a high K. In this work, integer values of K from 1 to 9 have been used. The bilateral filter with coefficient K will be indicated as ‘BilateralK’ hereinafter.
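A direct Python rendering of Equation 3.5 might look as follows (a slow, illustrative sketch: taking the normalizing maximum over the current window, skipping the borders, and guarding against division by zero on perfectly flat patches are all assumptions of this sketch, not details taken from this work):

```python
import numpy as np

def bilateral_guard(frame: np.ndarray, N: int, G: int, K: float) -> np.ndarray:
    """Bilateral background estimate with guard window (Equation 3.5)."""
    H, W = frame.shape
    out = frame.astype(float)
    # mask(l, k): 0 inside the central (2G+1)x(2G+1) guard, 1 elsewhere.
    mask = np.ones((2 * N + 1, 2 * N + 1))
    mask[N - G:N + G + 1, N - G:N + G + 1] = 0.0
    for i in range(N, H - N):
        for j in range(N, W - N):
            patch = frame[i - N:i + N + 1, j - N:j + N + 1].astype(float)
            diff = frame[i, j] - patch
            norm = max(np.abs(diff).max(), 1e-12)  # normalizing maximum difference
            w = np.exp(-K * (diff / norm) ** 2) * mask  # guarded, similarity-driven weights
            out[i, j] = (w * patch).sum() / w.sum()     # unitary-sum weighting
    return out
```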


3 RESULTS

In this chapter, the results given by the proposed algorithms will be analysed. The filters have been applied to several sequences of real data which cover almost all the types of background listed in Paragraph 1.3.2. The capability of each filter to achieve the goal of the background subtraction has been evaluated by means of two parameters that will be illustrated as well.

3.1 Dataset

The characteristics of the used sensor and the organization of the data will be illustrated below.

3.1.1 Camera characteristics

The sequences have been captured using the ‘ERICA Plus’ camera, produced by Leonardo S.p.a. [35]. Its characteristics are listed in Table 1.

Table 1) ERICA Plus technical specifications

Spectral Range          MWIR 3.7–5 µm
Resolution              512 × 640
Pitch                   16 µm
Sensitivity             < 15 mK
Operative Temperature   −32 °C to +55 °C
Field of View           Narrow (NFOV): (1.2 × 0.94)°; Medium (MFOV): (4 × 3.3)°; Wide (WFOV): (24 × 19)°
Frame rate              50 Hz
Data depth              16 bit

The ERICA Plus camera uses the MWIR Mercury Cadmium Telluride HAWK detector by the Land and Naval Defence Electronics Division. The camera can work at 3 different zoom levels, called ‘narrow FOV’ (NFOV), ‘medium FOV’ (MFOV) and ‘wide FOV’ (WFOV). For the collection of the used database, the WFOV mode has been used. Such a mode provides a horizontal field of view of 24° and a vertical field of view of 19°; the IFOV of each pixel is equal to 0.65 mrad. Leonardo S.p.a. also provides the performances of the ERICA Plus camera in terms of ‘Detection’, ‘Recognition’ and ‘Identification’ of different kinds of targets [35]. Such performances are shown in Figure 3.1.


Figure 3.1) Range performances of ERICA Plus

3.1.2 Dataset organization

The dataset of collected images contains several scenario types, belonging to the categories exposed in Paragraph 1.3.2. In particular, boats of different sizes have been used to evaluate the performances for targets embedded in different sea clutter types, whereas a drone and several emergency flares have been used to evaluate the performances in structured and sky backgrounds.

The acquired images have been organized in an order which makes it easier to evaluate the potential of each filter with respect to the typology of background surrounding the target. The organization has been based on the local background rather than on the whole background: since different background classes are present in each image, the one that most affects the target detection, i.e. the local one, has been chosen as the most representative. The order is listed below.

• Target embedded in the sky background.
  o Empty sky.
  o Cloudy sky.
• Target embedded in the sea background.
  o Far from the horizon.
  o Near to the horizon.
  o Near to the coast.
  o Near to the horizon, with the presence of glint.
• Target embedded in the structured background.

For each category, if more than one sequence has been used, the sequences are presented in order of decreasing dimension of the target, in terms of pixels, inside the image. In Section 3.3, a representative frame of each sequence will be shown.
