
PH.D. IN MATHEMATICS
XVIII cycle

Riccardo Aramini

On some open problems in the implementation of the linear sampling method

Supervisor: Prof. Michele Piana


Contents

Preface

1 Inverse problems and regularization
1.1 Direct and inverse problems
1.2 Well-posed and ill-posed problems
1.3 Formulation of a linear inverse problem
1.4 Facing ill-posedness
1.5 Generalized inverse operators
1.5.1 The case of compact operators
1.6 Regularization theory: a general formulation
1.7 Regularization algorithms
1.7.1 TSVD (case of exact operator)
1.7.2 Tikhonov’s method
1.8 The generalized discrepancy principle
1.8.1 Preliminary considerations
1.8.2 The incompatible case
1.8.3 The compatible case
1.8.4 A mixed approach (in the compatible case)

2 The direct and the inverse scattering problem
2.1 Formulation of the direct scattering problem
2.2 The far-field pattern of the scattered field
2.3 Formulation of the inverse scattering problem
2.4 The general theorem
2.5 The linear sampling method

3 The linear sampling method without sampling
3.1 A new implementation of the linear sampling method
3.2 Band-limitedness of the indicator function
3.3 Spatial resolution
3.4 Using a new family of indicator functions
3.5 Facing the cut-off problem: deformable models
3.6 Conclusions

A Mathematical miscellany
A.1 Direct sum of vector spaces
A.2 Multi-index notation
A.3 Spaces of continuous functions
A.4 Real-analytic functions
A.5 Distributions
A.6 Spaces of Lebesgue integrable functions
A.7 Fourier transform
A.8 Sobolev spaces (first family)
A.9 Sobolev spaces (second family)
A.10 Links between the two families of Sobolev spaces (1)
A.11 Partition of unity
A.12 Lipschitz domains and C^k domains
A.13 Links between the two families of Sobolev spaces (2)
A.14 Sobolev spaces on the boundary
A.15 Trace operators (1)
A.16 Green identities
A.17 Trace operators (2)
A.18 Generalized Green identities and their consequences
A.19 Transpose operators

B Figures
B.1 [2.5] The linear sampling method
B.2 [3.1] A new implementation of the linear sampling method
B.3 [3.2] Band-limitedness of the indicator function
B.4 [3.3] Spatial resolution
B.5 [3.5] Deformable models

Preface

Electromagnetic scattering is a physical phenomenon in which an electromagnetic incident wave is scattered by an obstacle or an inhomogeneity and the total field at any point in space can be expressed as the sum of the original incident field and the scattered one. The direct electromagnetic scattering problem consists in determining the scattered field, once the geometrical and physical properties of the scatterer, as well as the incident field, are known. Among the various corresponding inverse electromagnetic scattering problems we can conceive, we are particularly interested in the following one: to get information on the support of the scatterer, once the incident wave and the far-field pattern, i.e. the scattered wave considered at large distances from the scatterer, are known.

Owing to the recent development of several new techniques in remote sensing and non-invasive investigation, in recent years inverse electromagnetic scattering problems have increasingly drawn the attention of the scientific community, in particular with regard to the following applications¹:

1. medical diagnostics and therapy: for example, in using microwaves to detect bone marrow cancer (leukaemia) or breast cancer, as well as to perform hyperthermia treatment;

2. non-destructive testing: for example, in looking for small cracks inside metallic or plastic structures;

3. mine removal, in the case in which one wants to recover the location of mines in a minefield from aerial measurements of the wave scattered by such mines when reached by a known electromagnetic field sent by a scout plane flying over them;

4. radar, when not only the presence and the number of some moving objects are to be detected, but also some information about their dimensions and shapes is needed.

¹For an interesting review of some applications and methods in inverse electromagnetic (and acoustic) scattering.

In general, there are two main difficulties making inverse scattering problems hard to solve:


(a) they are ill-posed (in the sense of Hadamard [40]);

(b) they are non-linear.

In order to briefly discuss these two points, we observe the following.

(a’) Any reliable approach to the solution of an ill-posed problem has to face, at some stage, questions of uniqueness and stability; in particular, it is well known that any numerical implementation of a method for solving an ill-posed (or ill-conditioned) problem needs to involve, at some step, a regularization procedure, in order to damp the wild oscillations that, without regularization, would completely blur the solution, owing to the presence of noise in the measured data and to an essentially uncontrolled error propagation from the data to the solution. However, such pathologies can be satisfactorily cured, at least for linear problems, by regularization theory: some basic concepts and techniques of the latter are, in fact, introduced in chapter 1.

(b’) Unfortunately, the genuine non-linearity of inverse scattering problems in general prevents one from using the powerful tools of regularization theory that hold in the linear case. Traditional approaches for solving such problems are essentially of two kinds:

(i’) non-linear optimization schemes, which provide an iterative (and, often, very accurate) reconstruction of the scatterer starting from an initial guess about its geometrical properties;

(ii’) weak scattering approximation methods, which allow one to linearize the problem by means of suitable approximations, such as physical optics (holding when the wavelength is much smaller than the linear dimensions of the scatterer and the latter is a conductor) or Born approximation (holding when the wavelength is larger than the linear dimensions of the scatterer and the latter is a penetrable object having a low contrast with respect to the background medium).

Both kinds of approach suffer from some heavy drawbacks. As regards (i’), we point out that the difficulty in implementing iterative optimization algorithms is twofold: first, they require long (and, sometimes, extremely long) reconstruction times; second, they need to be quite accurately initialized, while, on the other hand, there are several applications, e.g. in medical imaging, in which the a priori available information about the geometrical properties of the scatterer does not allow for an initial guess that is accurate enough (if any at all). As regards (ii’), we point out that weak scattering approximation methods typically require a priori knowledge of the physical conditions under which the scattering phenomenon has generated the measured far-field pattern: more precisely, one should know whether the object has been penetrated by the incident wave or not, and, in the latter case, what kind of boundary condition the total field satisfies on the boundary of the object itself. Moreover, situations occur in which no weak scattering approximation is possible: a typical example is microwave tomography in medical imaging applications.

In order to overcome the above-mentioned drawbacks, in 1996 a new approach to inverse scattering was proposed by Colton and Kirsch [26] (with two important additional contributions by Colton, Piana and Potthast in 1997 [29] and by Cakoni, Colton and Haddar in 2002 [17]): it is the by now quite famous linear sampling method, which is essentially a computational procedure providing a visualization of the support of the scatterer. Roughly speaking, it works as follows: the N × N measurements of the far-field pattern of the scattered wave at N observation angles and for N incident fixed-frequency fields are arranged in an N × N matrix, called the far-field matrix, and a finite grid of sampling points in a spatial region containing the scatterer is chosen; then, for each grid point z_l, a linear algebraic system is written, whose left-hand side coefficients are the elements of the far-field matrix, while the right-hand side is a known vector with N components depending on z_l; moreover, for each z_l, a regularized solution of the above linear system is determined and its Euclidean norm is computed. Finally, the shape of the scatterer can be detected from the set of grid points in which such a norm (playing the role of indicator function) is largest. Of course, several different indicator functions are possible, since a visualization of the scatterer profile can also be obtained by mapping the values of a suitable monotonically increasing or decreasing function I of the norm itself. For the mathematical and technical aspects concerning the linear sampling method in its traditional formulation, we refer to sections 2.4 and 2.5 of chapter 2 (which is entirely devoted to introducing the direct and, even more, the inverse scattering problem we are interested in).
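To fix ideas, the sampled procedure just described can be condensed into a short numerical sketch. The snippet below is purely illustrative: it assumes that the far-field matrix F and a routine rhs_of building the known right-hand side b(z) are given, and it uses a single fixed Tikhonov parameter alpha, whereas a practical implementation would select the parameter for each system, e.g. by the generalized discrepancy principle of section 1.8.

```python
import numpy as np

def lsm_indicator(F, rhs_of, grid, alpha=1e-3):
    """Sampled LSM sketch: for each grid point z, solve F g = b(z) in the
    Tikhonov sense and record the Euclidean norm of the solution.

    F      : (N, N) far-field matrix (measured data, assumed given)
    rhs_of : callable z -> right-hand side vector b(z) with N components
    grid   : iterable of sampling points z
    alpha  : Tikhonov regularization parameter (fixed here for simplicity)
    """
    U, s, Vh = np.linalg.svd(F)               # SVD of the far-field matrix
    norms = []
    for z in grid:
        b = rhs_of(z)
        # Tikhonov-regularized solution expressed in the SVD basis:
        # g_alpha = sum_k s_k / (s_k**2 + alpha) * (u_k, b) v_k
        g = Vh.conj().T @ (s / (s**2 + alpha) * (U.conj().T @ b))
        norms.append(np.linalg.norm(g))
    # the norm over the grid plays the role of indicator function:
    # thresholding its values yields a visualization of the scatterer
    return np.array(norms)
```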

Here we would rather point out the main features of the linear sampling method and explain the reasons for its usefulness:

- as indicated by its very name, the linear sampling method is actually linear: more precisely, the indicator function is ultimately obtained by solving a finite number of ill-conditioned linear systems (one for each grid point). This means that the implementation is computationally simple and that the numerical instability typical of noisy inverse problems can be easily handled by using the classical tools of regularization theory for ill-conditioned linear systems;

- the above linearity does not derive from any sort of approximation based on particular physical conditions; in other terms, no approximation concerning the wavelength or the physical properties of the scatterer is needed;

- very little a priori information on the scatterer is required: more precisely, it is not necessary to know a priori the number of connected pieces forming the scatterer, nor whether they are penetrable by the wave or not, and, in the latter case, which kind of (possibly mixed) boundary conditions the total field satisfies (piecewise) on the boundary of the scatterer. It suffices to know that the latter is actually inside the chosen grid of sampling points;

- the implementation of the linear sampling method is computationally fast: more precisely, the reconstruction of a two-dimensional scatterer from real data requires a few minutes, while for complex three-dimensional objects not more than a couple of hours is typically necessary.

On the other hand, the linear sampling method also has some drawbacks. The main one is that, in the case of scattering from a penetrable and inhomogeneous object, it can only provide a visualization of the support of the scatterer, but no information at all about the point values of the index of refraction; however, one should remember that, even in a purely theoretical context, it is possible to prove that in several situations (e.g. anisotropic objects) only the support of the scatterer is uniquely determined, and not the point values of the index of refraction. Another drawback is that scatterers are, in general, not accurately recovered as regards their possible concavities, which tend to be “convexified”, as pointed out in [22].

Finally, the linear sampling method still presents some open problems from three points of view:

(a) its mathematical foundation;

(b) its numerical implementation;

(c) the quantitative assessment of its performances.

The original results obtained by working at this PhD thesis mainly² concern points (b) and (c) above and are illustrated in chapter 3, following the approach proposed in [3] and adding some further details and applications. More precisely, we try to discuss and face the following four problems:

(i) is there a criterion suggesting how to choose the parameters of an “optimal” grid containing the scatterer (i.e. number of points and sampling distance)?

(ii) is it possible to give a characterization of the indicator function in terms of its physical meaning or analytical properties?


(iii) what is the spatial resolution power achievable by means of the linear sampling method?

(iv) once the visualization map, i.e. the indicator function, is available, which general criterion can suggest the thresholding level for its values? Or, in other terms, what can be considered “large” or “small” for the indicator function (respectively depending on the increasing or decreasing monotonicity of the function I)?

²As far as we know, also the two theorems 1.7.6 and 1.7.7, as well as the blended regularization presented in subsection 1.8.4, are original.

To this end, in section 3.1 we present a new (no-sampling) implementation of the linear sampling method, in which the set of the angle-discretized far-field equations for all sampling points is replaced by a single functional equation formulated in a Hilbert space defined as a direct sum of L² spaces; this removes the problem of choosing a grid of sampling points and allows one to determine, by means of a single regularization process, a regularized solution of the above functional equation, in such a way that the regularization parameter no longer depends on the sampling point and an analytical representation of any indicator function is therefore possible without any sampling in space. Then, for the sake of simplicity, in section 3.2 we choose a particular indicator function whose analytical expression allows one to show that it is band-limited and, consequently, to obtain (in section 3.3) some theoretical information about the spatial resolution achievable by the method. Moreover, in the same framework of our no-sampling implementation, in section 3.4 we discuss the possibility of using a different family of indicator functions (with no apparent gain in visualization accuracy), while in section 3.5 we outline the technique of deformable contour models in order to face the problem stated in point (iv) above.

Finally, this PhD thesis ends with two appendices: appendix A, which collects in a few pages a good number of definitions, notations, theorems and properties that we often need to use and recall (mainly in chapter 2), and appendix B, which contains all the figures illustrating chapters 2 and 3 (chapter 1 has no figures): this should avoid an excess of fragmentation in the written text and make it more easily readable.

Throughout the text, the black square ∎ indicates the end of the proof of a theorem, while the empty square □ indicates the end of a remark or of an example.

Acknowledgements

During the last three years, working under the guidance of my advisor, professor Michele Piana, and with my colleague, doctor Massimo Brignone, has been a pleasure and an enriching experience for me, not only from a strictly professional or scientific viewpoint, but also by virtue of their valued friendship. I deeply hope that it will be possible to continue such a collaboration during the next years or decades.


I also thank several professors and PhD students of the University of Trento for their helpfulness, kindness and friendship: all of them have contributed to making my time in Trento an enjoyable one.

Finally, I wish to express my gratitude to my friends and, above all, to my parents, who, although not directly involved in my scientific activity, have often helped me with their (not only psychological) support.


1 Inverse problems and regularization

In this chapter we introduce the concept of inverse problem and we explain that, in general, an inverse problem is ill-posed (or ill-conditioned): this implies, in particular, that its solution may be non-existent, non-unique, or completely blurred by noise and, consequently, devoid of any physical meaning.

Then we show that such pathologies can be cured by regularization theory, which allows one to define a new concept of solution of an inverse problem: the so-called regularized solution, which represents a generally satisfactory compromise between accuracy in reproducing the noisy data and stability with respect to noisy perturbations of the data themselves. Although a certain number of different regularization methods is known in the literature, we mainly focus on Tikhonov’s method, since it is the one we shall use in chapters 2 and 3 to implement the linear sampling method.

1.1. Direct and inverse problems

From a strictly mathematical point of view, the concept of inverse problem is quite ambiguous, in the sense that one can properly speak only of a pair of reciprocally inverse problems, as suggested by a well-known statement of J. B. Keller [43]: “We call two problems inverses of one another if the formulation of each involves all or part of the solution of the other”. However, the same author goes on: “Often, for historical reasons, one of the two problems has been studied extensively for some time, while the other has never been studied and is not so well understood. In such cases, the former is called the direct problem, while the latter is the inverse problem”.

Mathematical physics is rich in such problems: their peculiarity is a sort of duality, whereby the data of one problem are all or part of the unknowns of the other and conversely, so that one may ask by virtue of which criterion a direct problem can be distinguished from an inverse one. The fact is that from a physical point of view the situation is quite different, since the two problems are not on the same level: the direct one starts from known causes to compute unknown effects, i.e. it is oriented along a cause-effect sequence, while the corresponding inverse one works backwards, since it consists in computing the unknown causes of given effects. Generally speaking, direct problems have always been considered by physicists more fundamental than inverse ones, and consequently they have also been studied more.

Thus, the historical reasons mentioned by Keller are basically physical reasons, since only physical laws can establish what the causes are and what the effects are, and provide the equations relating the effects to the causes. Let us see some examples of this.

First of all, if we consider Newtonian mechanics, we know that its second law relates force (cause) to acceleration (effect) and, consequently, to trajectory. So a direct problem is, for instance, the computation of the trajectories of particles from the forces acting upon them (and their initial conditions), while the corresponding inverse problem is the determination of the forces from the knowledge of the trajectories. From this point of view, Newton succeeded in solving the first inverse problem when he derived the explicit form of the gravitational force from Kepler’s laws describing the trajectories of the planets.

However, with regard to the application of modern methods in inverse problem solving, other examples are more suitable. In scattering and diffraction theory, the aim of the direct problem is to calculate the scattered (or diffracted) waves starting from the knowledge of the sources and the obstacles, while the inverse problem consists in determining the obstacles when the data are the sources and the scattered (or diffracted) waves. This kind of inverse problems is very important in non-destructive evaluation (e.g. medical imaging), which consists in sounding an object by means of an appropriate radiation source.

Another typical example of direct problem can be found in wave-propagation theory, when, starting from the knowledge of a given source, one has to compute the field radiated by it (for instance, the radiation pattern of a given antenna); obviously, the corresponding inverse problem consists in determining the source from the knowledge of the radiated field (in the previous example, the aim is to compute the current distribution in the antenna, given the radiation pattern).

We can also consider potential theory: a direct problem is computing the potential generated by a known distribution of masses (or charges), while the corresponding inverse problem consists in determining the mass (or charge) distribution, given the potential generated by it.

Another field rich in this kind of problems is instrumental physics, i.e. the physics of instruments such as electronic devices, imaging systems and so on. In these cases, the direct problem consists in determining the output of the instrument (e.g. the image) from the knowledge of the input (e.g. the object) and the characteristics of the instrument (impulse response function, etc.), while the inverse problem is the computation of the input from the knowledge of the instrument and of its output.

We have already said that a direct problem is oriented along a cause-effect sequence; now we want to point out that it is also directed towards a loss of information, in the sense that its solution is the result of a transition from a physical quantity with a certain information content to another physical quantity with a smaller information content. This is a common feature of most direct problems and we shall investigate it more precisely in the next section. Here we only observe that in general the solution of a direct problem is much smoother than the data: for instance, the image yielded by a band-limited system is smoother than the corresponding object (if the object involves out-of-band frequencies), the wave scattered by a rough obstacle can be smooth, and so on. An interesting example of this property can be found in [9], pp. 3-4. As a consequence, a conceptual difficulty common to most inverse problems arises, since their solution requires a transformation involving a gain of information. This difficulty is referred to as ill-posedness and we shall consider it in the next section.
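As a toy illustration of this loss of information, consider a direct operator given by convolution with a smooth kernel: the high-frequency components of the object are damped far below any realistic noise level, so very different objects become practically indistinguishable from their images. All choices below (grid, kernel width, test objects) are arbitrary.

```python
import numpy as np

# A direct problem as convolution with a smooth kernel: high-frequency
# detail of the object is damped far below any realistic noise level.
n = 256
x = np.linspace(0.0, 1.0, n, endpoint=False)
kernel = np.exp(-0.5 * ((x - 0.5) / 0.02) ** 2)
kernel /= kernel.sum()

def blur(f):
    # circular convolution via FFT: the discretized direct operator A
    return np.real(np.fft.ifft(np.fft.fft(kernel) * np.fft.fft(f)))

smooth = np.sin(2 * np.pi * x)                       # low-frequency object
wiggly = smooth + 0.5 * np.sin(2 * np.pi * 40 * x)   # adds fine detail

# Two very different objects, practically indistinguishable images:
print(np.max(np.abs(smooth - wiggly)))               # 0.5
print(np.max(np.abs(blur(smooth) - blur(wiggly))))   # ~1e-6
```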

Finally, although there exists a certain number of mathematically interesting nonlinear inverse problems, in the following we shall consider only linear inverse problems. The reason is threefold:

1) linear problems, possibly deriving from the linearization of nonlinear ones, are currently the most important for the applications;

2) well-known mathematical methods and efficient numerical algorithms for the computation of their solutions are already available;

3) we are mainly interested in implementing the linear sampling method (abbr. LSM, to which section 2.5 is dedicated) for the solution of inverse scattering problems.

1.2. Well-posed and ill-posed problems

In the previous section we have mentioned ill-posedness as a typical feature of inverse problems: our aim is now to give some definitions and comments in order to be more precise in handling this important property.

First of all, let us recall the basic concept of a well-posed problem, introduced for the first time in 1902 by the French mathematician Jacques Hadamard, who gave a definition of this concept in a paper on boundary-value problems for partial differential equations and their physical interpretation [39]. In this first formulation, a problem was called well-posed if its solution was unique and existed for arbitrary data. However, in a subsequent treatise [40], written in 1923, Hadamard pointed out the further requirement of continuous dependence of the solution on the data, since a solution that varies very much for small changes of the data cannot be considered a solution from a physical point of view: in fact, physical data are always affected by errors, and an uncontrolled propagation of these errors into the solution makes the latter physically meaningless.


Definition 1.2.1. A problem is well-posed (in the sense of Hadamard) if it satisfies the following three properties:

1. the solution is unique;

2. the solution exists for arbitrary data;

3. the solution depends continuously on the data.

Definition 1.2.2. A problem is ill-posed if at least one of the previous three properties is not verified.

Therefore a problem is ill-posed if its solution is not unique, or¹ does not exist for arbitrary data, or² does not depend continuously on the data.

¹Here or has the same meaning as the Latin vel, i.e. it is inclusive. ²Idem.

Hadamard was convinced that problems deriving from physics had always to be well-posed; this point of view was heavily conditioned by the physics of the nineteenth century. The mathematical requirements of existence, uniqueness and continuity of the solution correspond to the “philosophical” ideal of a unique, complete and stable determination of the physical events. Consequently, ill-posed problems were considered for a long time (up to the late 1960s) as mathematical pathologies, devoid of real interest in the context of applied mathematics, and were not seriously studied.

Anyway, the subsequent discovery of the ill-posedness of most inverse problems has completely changed this point of view: with the development of inverse problems theory and its application to many areas of applied science, ill-posedness has become a crucial point in their solution, both in functional and in numerical analysis.

For instance, the following impressive example of an ill-posed problem, due to Hadamard himself [40], was considered for many years to be of merely mathematical interest; but in 1977 it turned out that the basic inverse problem of electrocardiography [25], i.e. the reconstruction of the epicardial potential from body surface maps, can be formulated precisely as a Cauchy problem for an elliptic equation, i.e. a generalization of the Laplace equation.

Example 1.2.1. Let us consider the Laplace equation in two dimensions

∂²u/∂x² + ∂²u/∂y² = 0   (1.1)

and the associated Cauchy problem characterized by the data

u(x, 0) = (1/n) cos(nx),   ∂u/∂y (x, 0) = 0.   (1.2)

Then the unique solution of this Cauchy problem is

u(x, y) = (1/n) cos(nx) cosh(ny).   (1.3)

If the same Laplace equation (1.1) is considered with the Cauchy data

u(x, 0) = 0,   ∂u/∂y (x, 0) = 0,   (1.4)

the unique solution of the new Cauchy problem is u(x, y) = 0. Now, for sufficiently large n the distance (measured, e.g., by means of the supremum norm) between the Cauchy data (1.2) and (1.4) can be made arbitrarily small for any x, while at any given finite distance from the x-axis the solution (1.3) grows to infinity as n → ∞. This is a classical example showing the effects produced by a non-continuous dependence of the solution on the data. It is interesting to observe that if the oscillating function (1.2) represents the experimental errors on the data, then, by linearity, the error propagation from the data to the solution is described by the function (1.3), and its effect is so heavy that the solution corresponding to these real data is physically meaningless. Besides, it can be shown that the solution doesn't exist for arbitrary data, but only for data endowed with some specific analyticity properties. □
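The instability in this example is easy to quantify: the data in (1.2) have sup-norm 1/n, while the solution (1.3) at height y = 1 has sup-norm cosh(n)/n. A quick numerical check of these two quantities (illustrative only):

```python
import numpy as np

# Data size vs. solution size in Hadamard's example, at height y = 1.
for n in (10, 50, 100):
    data_size = 1.0 / n              # sup-norm of u(x, 0) in (1.2)
    sol_size = np.cosh(n) / n        # sup-norm of u(x, 1) from (1.3)
    print(n, data_size, sol_size)
# For n = 100 the data have size 1e-2 while the solution has size ~1e41:
# no continuous dependence on the data.
```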

1.3. Formulation of a linear inverse problem

The observations and comments in the previous sections suggest the following general statement: a direct problem, i.e. a problem oriented along a cause-effect sequence, is well-posed, while the corresponding inverse problem, which implies a reversal of the cause-effect sequence, is ill-posed. However, this statement is not completely meaningful unless we provide an appropriate mathematical setting for the description of direct and inverse problems. For the sake of clarity, we shall often use terms and expressions that are frequently used in imaging problems: obviously, this is not necessary, but it can help in some cases by virtue of the intuitive meaning of this terminology.

The first point is to define the direct problem: its solution provides a linear operator A, whose domain is the linear space X of the objects to be imaged, which correspond to suitable functions with certain properties, and whose range is in the linear space Y of the images, which correspond to appropriate functions describing, in the inverse problem, the measurable data. Naturally, X is called the object space and Y the image space; in this context, they are functional spaces that are typically taken to be Hilbert spaces. There are good reasons for this choice: first of all, we need a distance, in order to know whether two objects (or images) are close or not, so that our spaces have to be metric. Secondly, a scalar product turns out to be particularly appropriate in the case of discrete data: indeed, in such a case the operator A maps a function into a vector and, if A is continuous³, each component of this vector can be represented, by virtue of the Riesz representation theorem⁴, in terms of a suitable scalar product. Finally, the space L² of the measurable functions f such that ∫ |f(x)|² dx < ∞ contains the finite-energy signals, which are the only ones to be physically realizable.

³As we shall always assume, insofar as the direct problem is well-posed. ⁴See theorem A.1.3 in appendix A.

The second point is to observe that the space Y cannot coincide with the range R(A) of the operator A, but can only strictly contain it. In fact, recalling the obvious decomposition

Y = \overline{R(A)} ⊕ R(A)^⊥,   (1.5)

it is easy to realize that R(A) is the set of all the noise-free images: the direct problem being well-posed, the operator A associates a unique image to each object. As already observed in section 1.1, this image may be rather smooth, since its information content is smaller than that of the corresponding object. However, a measured image is certainly a noisy image, since it corresponds to a noise-free image corrupted by the noise affecting the measurement process: as a consequence, the smoothness property mentioned above may not be satisfied and, in any case, the measured image may not belong to R(A).

So the third point is to properly define the Hilbert space Y, in such a way that it contains both the noise-free and the noisy images.

Summing up, the solution of the direct problem defines a linear continuous operator A : X → Y between two Hilbert spaces: X is the space of the objects, Y the space of all the possible images, noise-free and noisy ones.

Definition 1.3.1. Let X and Y be two normed vector spaces (not necessarily complete); we shall denote with B(X, Y ) the space of the linear continuous (i.e. bounded) operators from X to Y .

Remark 1.3.1. We recall that the operator norm, defined in any one of the following equivalent ways:

1. ‖A‖ := sup_{‖x‖_X ≠ 0} ‖Ax‖_Y / ‖x‖_X ;

2. ‖A‖ := sup_{‖x‖_X = 1} ‖Ax‖_Y ;

3. ‖A‖ := sup_{‖x‖_X ≤ 1} ‖Ax‖_Y ;

4. ‖A‖ := inf{C ∈ ℝ | ‖Ax‖_Y ≤ C‖x‖_X ∀x ∈ X},

makes B(X, Y) a normed vector space, which is also complete if Y is complete in turn. In the following, unless otherwise specified, we shall always consider the particular case in which X and Y are Hilbert spaces. □


By virtue of the previous mathematical setting, we can now explain more precisely the loss of information that characterizes the solution of a direct problem. First of all, it may obviously happen that two, or even more, different objects have exactly the same image. Since the operator is linear, this fact corresponds to the existence of objects (called invisible) whose image is zero. In other terms, given any object in the space X, if an invisible object is added to it, we obtain a different object which has exactly the same image as the previous one. Secondly, it may happen that two very distant objects have very close images; in other words, there may exist very large sets of different objects that are mapped by the operator A into very small sets of images.
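In finite dimensions, the phenomenon of invisible objects is simply the presence of a nontrivial null space. A minimal sketch, with an arbitrarily chosen rank-deficient matrix standing in for the imaging operator A:

```python
import numpy as np

# A rank-deficient "imaging" operator with a nontrivial null space:
# adding an invisible object changes the object but not its image.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])              # R^3 -> R^2, rank 2
invisible = np.array([1.0, -1.0, 0.0])       # A @ invisible == 0
f = np.array([2.0, 3.0, 5.0])

print(A @ f)                                 # [5. 5.]
print(A @ (f + invisible))                   # [5. 5.]: same image
```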

Thus, if we now consider the inverse problem

g = Af,   (1.6)

i.e. the problem of determining the object f corresponding to a given image g, it is easy to realize that its ill-posedness is strictly related to the loss of information affecting the solution of the direct one. Indeed, if the image g corresponds to two (or more) different objects, the solution of the inverse problem is not unique (in this case, N(A) ≠ {0}, N(A) being the kernel of A). If g is a noisy image, which doesn't belong to R(A), then the solution of the inverse problem doesn't exist (in this case, D(A⁻¹) ≠ Y, D(A⁻¹) being the domain of the inverse operator A⁻¹). Finally, if we have two neighbouring images g_1, g_2 such that the corresponding objects f_1, f_2 are very distant, then the solution of the inverse problem doesn't depend continuously on the data (in this case, the operator A⁻¹ is not continuous). Obviously, all three cases may occur in the same inverse problem. So we can understand why the requirements of the following definition, which is simply the reformulation of definition 1.2.1 for a linear inverse problem, are not, in general, fulfilled:

Definition 1.3.2. A linear inverse problem is well-posed if the three following properties hold:

1. N(A) = {0};

2. D(A⁻¹) = Y;

3. A⁻¹ : Y → X is continuous.

However, it is worthwhile noticing, also for future purposes, that if the first two requirements are satisfied, the third one is as well: this is a consequence of the two following theorems.

Theorem 1.3.1. [Open mapping theorem]. Let X and Y be Banach spaces; if A ∈ B(X, Y) is surjective, then the image by A of an open set in X is an open set in Y, i.e. the map A is open.

Proof. This is a fundamental theorem in functional analysis. For a proof, see, for instance, [57]. ∎

Theorem 1.3.2. Let X and Y be Banach spaces; if A ∈ B(X, Y) is bijective, then A⁻¹ is continuous.

Proof. In particular, A is continuous and surjective: by theorem 1.3.1, it maps open sets into open sets. Furthermore, A is injective, so that A⁻¹ exists, D(A⁻¹) = Y and the inverse image by A⁻¹ of an open set in X is an open set in Y, i.e. A⁻¹ is continuous. ∎

It is also interesting to observe that not even the mere continuity of the inverse operator A⁻¹, which is trivially verified in a finite-dimensional context⁵, would be sufficient to ensure the stability of the solution: in other terms, well-posedness is not a sufficient condition for the stability of the solution of a linear inverse problem. Indeed, let us assume that A⁻¹ is well-defined and continuous. Then, with reference to equation (1.6), if δg is a small variation of the datum and δf is the corresponding variation of the solution, the continuity of A⁻¹ implies

‖δf‖_X ≤ ‖A⁻¹‖ ‖δg‖_Y,   (1.7)

where ‖·‖_X, ‖·‖_Y denote respectively the norms in the Hilbert spaces X and Y induced by the scalar products in X and in Y themselves. On the other hand, the continuity of A implies

‖f‖_X ≥ ‖g‖_Y / ‖A‖,   (1.8)

so that

‖δf‖_X / ‖f‖_X ≤ ‖A‖ ‖A⁻¹‖ ‖δg‖_Y / ‖g‖_Y.   (1.9)

⁵This typically happens when the original continuous problem is discretized.

The real positive number

C(A) := ‖A‖ ‖A⁻¹‖   (1.10)

is called the condition number and provides an estimate of the instability of the problem. Since we have

‖g‖_Y = ‖Af‖_Y ≤ ‖A‖ ‖f‖_X = ‖A‖ ‖A⁻¹g‖_X ≤ ‖A‖ ‖A⁻¹‖ ‖g‖_Y,   (1.11)

it is always

C(A) ≥ 1.   (1.12)

Hence, if C(A) ≫ 1 (in some LSM applications⁶, for instance, it is up to the order of 10¹⁰), a small variation δg of the datum can produce an enormous variation δf of the solution: in such a case, the inverse problem is said to be ill-conditioned. In other terms, relation (1.9) shows that the presence of an error, however small, on the datum of an ill-conditioned problem can make its solution extremely unstable⁷.

⁶See chapter 2. ⁷We point out that inequality (1.9) cannot be improved since, in some cases, equality holds true: see [9].
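A concrete finite-dimensional illustration of relation (1.9): the snippet below perturbs the datum of a notoriously ill-conditioned system (a Hilbert matrix, chosen here purely as an example) and compares the relative errors; the condition number, of the order of 10¹⁰, bounds the amplification factor.

```python
import numpy as np

# Perturb the datum of an ill-conditioned system and compare relative
# errors, as in relation (1.9). The 8 x 8 Hilbert matrix is a classic
# ill-conditioned example; its condition number is of the order 1e10.
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
f = np.ones(n)
g = A @ f                                       # exact datum

rng = np.random.default_rng(0)
dg = 1e-10 * rng.standard_normal(n)             # tiny data perturbation
df = np.linalg.solve(A, g + dg) - f             # induced solution error

print(np.linalg.cond(A))                        # C(A) ~ 1.5e10
print(np.linalg.norm(dg) / np.linalg.norm(g))   # relative data error
print(np.linalg.norm(df) / np.linalg.norm(f))   # amplified by up to C(A)
```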

So the next point is: how to cure ill-posedness (or ill-conditioning)?

1.4. Facing ill-posedness

First of all, we could observe that the property of non-continuous dependence of the solution on the data is strictly verified only for ill-posed problems formulated in infinite-dimensional spaces; in practice, one has to deal with discrete data and with discrete, finite-dimensional problems. Now, the discrete version of a continuous linear inverse problem is a linear algebraic system, apparently an easy mathematical problem: there exist a lot of methods that yield a numerical solution to it. However, this problem is obtained by discretizing a problem with very bad mathematical properties: indeed, its mathematical solution simply doesn’t work, in the sense that it is physically meaningless. If we now remember the last part of the previous section, we can easily imagine what happens.

We already know that in the continuous case small oscillating data can produce large oscillating solutions. In any inverse problem, data are always affected by noise, which can be considered as a small randomly oscillating function. Thus, the solution method amplifies the noise generating a large and wildly oscillating function which fully hides the physically meaningful solution corresponding to the exact, i.e. noise-free data. This property is still true for the discrete version of the continuous ill-posed problem, since the corresponding linear algebraic system is ill-conditioned: even if the solution exists and is unique, it is completely corrupted by a minimum error on the data.

Summing up, whereas on the one hand we can compute a unique solution of our algebraic system, on the other hand this solution is meaningless; the physically meaningful solution we are seeking is not a solution of the problem but only an approximate solution, in the sense that, when mapped by the matrix representing the discretized version of the operator A, it reproduces the data not exactly, but only within the experimental errors. Anyway, if we search for approximate solutions, they turn out to form a very large set, which contains completely different functions, as a consequence of the loss of information in the direct problem. Thus our problem is: how can we choose the good ones?

We can now state the so-called golden rule for solving ill-posed inverse problems: look for approximate solutions satisfying additional constraints coming from the physics of the problem.

Let us clarify this statement by means of the mathematical model introduced just above. The set of the approximate solutions that reproduce (within a certain amount of error) the same data function is the set of the objects whose images are close to the measured one. The set of such objects is too large, due to the loss of information in the direct problem. Thus we need some additional information, also called a priori or prior information, in order to compensate for this loss. This information is additional in the sense that it cannot be retrieved from the image or from the properties of the mapping A that describes the direct problem, but represents some expected physical properties of the object. Its role is to reduce the set of the objects compatible with the measured image, or also to distinguish meaningful objects from spurious ones, generated by overwhelming propagation of the noise affecting the image.

Let us see some simple examples of additional information.

1. The object cannot be too large: this implies a constraint in the form of an upper bound on the object itself, or its intensity, or its energy, etc.

2. The object is smooth, so that, for example, its derivatives must be smaller than a certain quantity.

3. The object is known to be non-negative.

4. The object must be different from zero only inside a given bounded region.

Furthermore, a quite different kind of additional information may be represented by statistical properties of the objects. In this case, the objects to be restored are assumed to be realizations of a random process with known probability distribution (this can be a way of expressing our previous experience in object restoration). Although a complete knowledge of the probability distribution is not always available, even a partial knowledge of the statistical properties of the object (for instance, expectation values or covariance matrices) may be useful.

Thus, the principle of the regularization methods is to use the additional information explicitly, from the very beginning, to construct families of approximate solutions, that is, of objects compatible with the measured image. These methods are now one of the most effective tools for the solution of inverse problems.
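As a minimal sketch of this principle, the snippet below selects a stable approximate solution of the ill-conditioned system of the previous example by adding the constraint that the solution norm be small, i.e. Tikhonov regularization, to which section 1.7.2 is devoted; the value of alpha is illustrative, not a principled choice.

```python
import numpy as np

# Select a stable approximate solution of an ill-conditioned system by
# penalizing the solution norm (Tikhonov regularization, section 1.7.2).
n = 8
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
f_true = np.ones(n)
rng = np.random.default_rng(1)
g = A @ f_true + 1e-8 * rng.standard_normal(n)      # noisy image

naive = np.linalg.solve(A, g)                       # noise-dominated
alpha = 1e-8
tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ g)

print(np.linalg.norm(naive - f_true))   # large: wild oscillations
print(np.linalg.norm(tik - f_true))     # much smaller: stable estimate
```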

1.5. Generalized inverse operators

Given an ill-posed linear inverse problem, a first step towards the objective determination of an approximate solution consists of looking for functions minimizing, in some sense, the distance between their image by the operator A and the datum. More precisely, we introduce the least-squares problem associated to the linear inverse one (1.6), defined as the problem of determining f ∈ X such that

‖Af − g‖_Y = minimum,   (1.13)

where X and Y are Hilbert spaces.

Definition 1.5.1. A solution of the least-squares problem (1.13) is said to be a normal solution, or pseudosolution.

The characterization of pseudosolutions is given by the following theorem.

Theorem 1.5.1. Given A ∈ B(X, Y), let P denote the linear projection onto \overline{R(A)}, the closure of the range of A, and let g be a generic element of Y. Then the following properties of u ∈ X are equivalent:

(i) Au = Pg;

(ii) ‖Au − g‖_Y ≤ ‖Af − g‖_Y ∀f ∈ X;

(iii) A*Au = A*g,

where A* denotes the adjoint operator of A, defined, as usual, by the condition⁸ (Af, g)_Y = (f, A*g)_X ∀f ∈ X, ∀g ∈ Y.

⁸In the following, we shall denote with (·, ·)_X the scalar product in a Hilbert space X. We choose “right component conjugation”, i.e. (x_1, a x_2)_X = ā (x_1, x_2)_X ∀a ∈ ℂ and ∀x_1, x_2 ∈ X, where we obviously denote with ā the complex conjugate of a.

Proof. (i) ⇒ (ii): by virtue of the decomposition Y = \overline{R(A)} ⊕ R(A)^⊥, one has that Pg − g ∈ R(A)^⊥; besides, Af − Pg ∈ \overline{R(A)}, so that

‖Af − g‖²_Y = ‖Af − Pg‖²_Y + ‖Pg − g‖²_Y.   (1.14)

Then, by using hypothesis (i), we have

‖Af − g‖²_Y = ‖Af − Pg‖²_Y + ‖Au − g‖²_Y ≥ ‖Au − g‖²_Y   ∀f ∈ X,   (1.15)

i.e. (ii).

(ii) ⇒ (iii): since Pg ∈ \overline{R(A)}, there exists a sequence {f_n}_{n=1}^∞ ⊂ X such that Pg = lim_{n→∞} Af_n, i.e. lim_{n→∞} ‖Af_n − Pg‖_Y = 0. If we now assume that u ∈ X is such that (ii) holds, by virtue of the continuity of the norm we have

‖Au − g‖²_Y = ‖Au − Pg‖²_Y + lim_{n→∞} ‖Af_n − g‖²_Y ≥ ‖Au − Pg‖²_Y + ‖Au − g‖²_Y,   (1.16)

where the inequality follows from (ii). It follows that Au − Pg = 0; then Au − g = Pg − g and A*Au − A*g = A*(Pg − g). But Pg − g ∈ R(A)^⊥ = N(A*), so that (iii) is true.

(iii) ⇒ (i): if (iii) holds, then Au − g ∈ N(A*) = R(A)^⊥ and (i) follows. ∎

If g ∈ Y, the set of the pseudosolutions associated to g will be denoted with S_g. Clearly, by virtue of condition (i) of theorem 1.5.1, it follows that S_g is empty if and only if Pg ∈ \overline{R(A)} \ R(A), which can occur only when R(A) is not closed. In other terms, it holds

S_g ≠ ∅ ⇔ g ∈ R(A) ⊕ R(A)^⊥.   (1.17)

Let us now suppose that S_g is not empty and let u_0 be a generic pseudosolution: it is trivial to show that

S_g = {u = u_0 + φ, Aφ = 0},   (1.18)

or, in a shorter form,

S_g = u_0 + N(A).   (1.19)

Then, it is easy to realize that the set S_g is convex and closed in the Hilbert space X; therefore, by a well-known theorem, there exists a unique pseudosolution with minimal norm. Hence, we can introduce the following definition.

Definition 1.5.2. Given the inverse problem g = Af, if the set S_g of its pseudosolutions is not empty, the unique element of S_g having minimal norm will be called the generalized solution of the problem and will be denoted with f†. Moreover, the operator A† : D(A†) → X, defined by the condition

A†g = f†   ∀g ∈ D(A†) ⊂ Y,   (1.20)

will be called the generalized inverse operator.

Recalling the double implication (1.17), we can immediately realize that

D(A†) = R(A) ⊕ R(A)^⊥.   (1.21)

It is worthwhile observing that the condition of minimal norm in the definition of generalized solution may have a quite natural physical meaning. Indeed, for example, the L²-norm of a signal is a measure of its energy, and minimizing the energy of a signal is a typical way to reduce its instability⁹.

⁹However, it may happen that minimum conditions other than the L² minimization more realistically fulfil the problem requirements. A possible generalization of the definition of generalized solution is given by replacing the minimum norm condition with the more general constraint ‖Cf‖_Z = minimum, where C is a closed linear operator with range in the Hilbert space Z.
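In finite dimensions, the generalized solution and the generalized inverse are exactly what the Moore-Penrose pseudoinverse computes. A small sketch with an arbitrarily chosen rank-deficient matrix, checking the two characterizing properties (the Euler equation (iii) of theorem 1.5.1 and orthogonality to N(A)):

```python
import numpy as np

# The generalized solution f† = A†g for a rank-deficient matrix, via the
# Moore-Penrose pseudoinverse. Example system chosen arbitrarily; g is
# taken outside R(A), so no exact solution exists.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 1.0, 1.0]])                # rank 2, N(A) = span{(1,-1,0)}
g = np.array([1.0, 2.0, 4.0])

f_dag = np.linalg.pinv(A) @ g
print(f_dag)                                   # [2/3, 2/3, 7/3]
print(A.T @ (A @ f_dag - g))                   # ~0: Euler equation holds
print(f_dag @ np.array([1.0, -1.0, 0.0]))      # ~0: no component in N(A)
```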

Of course, the concepts of generalized solution and of generalized inverse operator are fundamental in studying linear inverse problems. In order to describe them better, we give the two following theorems.

Theorem 1.5.2. The generalized solution f† is the unique pseudosolution belonging to N(A)^⊥.

Proof. Since f† ∈ X, we have the obvious decomposition

f† = u_1 + u_2,   (1.22)

with u_1 ∈ N(A)^⊥ and u_2 ∈ N(A). We immediately have

u_1 = f† − u_2,   (1.23)

and, by remembering representation (1.18) and recalling that f† is, in particular, a pseudosolution, we get that u_1 is a pseudosolution too. Furthermore, we can write

‖f†‖²_X = ‖u_1 + u_2‖²_X = ‖u_1‖²_X + ‖u_2‖²_X ≥ ‖u_1‖²_X.   (1.24)

Since by definition f† is the unique pseudosolution of minimal norm, relation (1.24) can hold only if u_2 = 0. This implies that f† = u_1 and so f† ∈ N(A)^⊥. Finally, if we substitute f† for u_0 in the decomposition (1.19), we find that f† is the unique pseudosolution in N(A)^⊥. ∎

Theorem 1.5.3. The generalized inverse operator A†, defined by relation (1.20), is linear.

Proof. Let g_1 and g_2 be two elements of the domain D(A†) of the generalized inverse operator A†. Then, remembering condition (i) of theorem 1.5.1 and relation (1.20), we immediately have

AA†g_1 = Pg_1,   AA†g_2 = Pg_2.   (1.25)

Since Pg_1, Pg_2 ∈ R(A), by linearity of A and P we have that also P(g_1 + g_2) = Pg_1 + Pg_2 ∈ R(A). Thus, not only g_1 + g_2 ∈ D(A†) and AA†(g_1 + g_2) = P(g_1 + g_2), but, recalling the two relations (1.25), we also get

AA†g_1 + AA†g_2 = AA†(g_1 + g_2),   (1.26)

and then, by linearity of A, we have that A†g_1 + A†g_2 − A†(g_1 + g_2) ∈ N(A). But A†g_1, A†g_2 and A†(g_1 + g_2) are the generalized solutions respectively corresponding to the data g_1, g_2 and g_1 + g_2, so that A†g_1, A†g_2, A†(g_1 + g_2) ∈ N(A)^⊥. Since the latter is a linear subspace of X, it follows that A†g_1 + A†g_2 − A†(g_1 + g_2) ∈ N(A)^⊥. Summing up, we have found that

A†g_1 + A†g_2 − A†(g_1 + g_2) ∈ N(A) ∩ N(A)^⊥.   (1.27)

But obviously N(A) ∩ N(A)^⊥ = {0}, so that

A†g_1 + A†g_2 = A†(g_1 + g_2).   (1.28)

In a fully analogous way it can be shown that A†(ag) = aA†g ∀a ∈ ℂ. ∎

We can also establish a relation between the range of the generalized inverse operator and the range of the adjoint one: this is the aim of the following theorem.

Theorem 1.5.4. Given A ∈ B(X, Y), it holds R(A*) ⊂ R(A†). Furthermore, if R(A) is closed, then R(A*) = R(A†).

Proof. If u ∈ R(A*), then u ∈ \overline{R(A*)} = N(A)^⊥. If we define the element g := Au, then u is the generalized solution corresponding to g, since it is trivially a pseudosolution and has no components in N(A). Hence u = A†g and so u ∈ R(A†); summing up, R(A*) ⊂ R(A†).

Furthermore, let us now suppose that R(A) is closed; then, it suffices to prove the other inclusion R(A†) ⊂ R(A*). If u ∈ R(A†), it is also u ∈ N(A)^⊥ by virtue of theorem 1.5.2. Now, if R(A) is closed, also R(A*) is closed (by the closed range theorem) and then N(A)^⊥ = R(A*). It follows that u ∈ R(A*) and, finally, R(A†) ⊂ R(A*). ∎

Since we have introduced the generalized solution and the generalized inverse operator, we are now in a position to formulate a new inverse problem, which consists in determining the solution of two subsequent minimum problems, described by the two equations

‖Af − g‖_Y = minimum   (1.29)

and

‖f‖_X = minimum.   (1.30)

Such a problem is well-posed if and only if, ∀g ∈ Y, the generalized solution exists and is unique and the generalized inverse operator is continuous. There is an entire class of operators in B(X, Y) for which the well-posedness of problem (1.29), (1.30) is ensured. Indeed, if R(A) is closed, the space of the data can be decomposed in the form Y = R(A) ⊕ R(A)^⊥. It follows that D(A†) = Y and then, ∀g ∈ Y, the set S_g of pseudosolutions is not empty. Hence, the existence and uniqueness of the generalized solution is a direct consequence of its definition, while the continuity of the generalized inverse operator is guaranteed by the following lemma and the subsequent theorem.

Lemma 1.5.5. If A ∈ B(X, Y) and R(A) is closed, then ∃ m > 0 such that

‖Af‖_Y ≥ m‖f‖_X   ∀f ∈ N(A)^⊥.   (1.31)

Proof. If A is the null operator A = 0, then N(A)^⊥ = {0} and, as a consequence, inequality (1.31) is trivially verified by choosing an arbitrary m > 0.

If A ≠ 0, i.e. N(A)^⊥ ≠ {0}, then the restriction A_0 : N(A)^⊥ → R(A) of A is a bijective and continuous operator between two Hilbert spaces and then, by theorem 1.3.2, (A_0)⁻¹ is continuous. Hence, ∀f ∈ N(A)^⊥, we have

‖f‖_X = ‖(A_0)⁻¹Af‖_X ≤ ‖(A_0)⁻¹‖ ‖Af‖_Y,   (1.32)

so that the thesis holds with m := ‖(A_0)⁻¹‖⁻¹. ∎

Theorem 1.5.6. Let A ∈ B(X, Y): then A† is continuous ⇔ R(A) is closed.

Proof. “⇒”: by contradiction, if R(A) were not closed, then D(A†) would be dense in Y, since D(A†) = R(A) ⊕ R(A)^⊥. Clearly, ∀g ∈ D(A†) it holds:

AA†g = Pg.   (1.33)

Since A† is linear, continuous and densely defined in Y, it can be extended to a linear continuous operator Â† defined on all of Y. Hence, if g ∈ D(A†), equation (1.33) holds; if g ∈ Y \ D(A†), let {g_n}_{n=0}^∞ be a sequence in D(A†) such that lim_{n→∞} g_n = g, i.e. lim_{n→∞} ‖g_n − g‖_Y = 0. Since {g_n}_{n=0}^∞ ⊂ D(A†), by (1.33) we have AA†g_n = Pg_n ∀n ∈ ℕ or, equivalently,

AÂ†g_n = Pg_n   ∀n ∈ ℕ;   (1.34)

then, taking the limit as n → ∞ of both members of the previous equation and using the continuity of A, Â† and P, we finally get

AÂ†g = Pg.   (1.35)

Summing up, we have found that

AÂ†g = Pg   ∀g ∈ Y.   (1.36)

But this means that the equation Af = g has a pseudosolution, precisely Â†g, for each datum g ∈ Y; on the other hand, it is always possible to choose g ∈ \overline{R(A)} \ R(A), so that, for such a g, no pseudosolution of Af = g exists (see the double implication (1.17)). Hence, we have reached a contradiction.

“⇐”: since in this case we have

AA†g = Pg   ∀g ∈ Y,   (1.37)

we can write

‖g‖_Y ≥ ‖Pg‖_Y = ‖AA†g‖_Y ≥ m‖A†g‖_X   ∀g ∈ Y,   (1.38)

where the last inequality holds by virtue of lemma 1.5.5, since A†g ∈ N(A)^⊥. Finally, we get

‖A†g‖_X ≤ (1/m)‖g‖_Y,   (1.39)

i.e. A† is continuous. ∎

Theorem 1.5.6 states that if R(A) is closed, the problem of determining the generalized solution is well-posed, while if R(A) is not closed, the determination of f† is surely ill-posed.


1.5.1. The case of compact operators

Among the operators whose range, in general, is not closed, there are some extremely meaningful classes that are frequently met in the applications: an important example is that of compact operators (see, e.g., [57]). First, we recall some basic definitions and theorems about them.

Definition 1.5.3. Let K be a subset of a topological space; K is said to be compact if and only if for any (not necessarily countable) family of open sets {U_i}_{i∈I} such that ∪_{i∈I} U_i ⊃ K, there exists a finite subset J ⊂ I such that ∪_{i∈J} U_i ⊃ K.

The previous definition can be summarized by saying that K is compact if and only if from any one of its open coverings it is always possible to extract a finite subcovering.

Definition 1.5.4. A subset of a topological space is said to be relatively compact if its closure is compact.

Remark 1.5.1. If K is a compact subset of a normed vector space X (not necessarily complete), then it is bounded. Indeed, in a normed vector space we can consider a generic open ball of centre x_0 and radius r, defined as:

B(x_0, r) := {x ∈ X | ‖x − x_0‖_X < r}.   (1.40)

Then, having fixed r > 0, for each point x_i ∈ K we can consider the ball B_i := B(x_i, r). Clearly {B_i}_{i∈I} is an open covering of K; since K is compact, we can extract a finite subcovering, i.e. there exists a finite subset J ⊂ I such that K ⊂ ∪_{i∈J} B_i. If we now consider the maximum R of the distances between the centres of the balls, i.e.

R := max_{i,j∈J} ‖x_i − x_j‖_X,   (1.41)

and arbitrarily choose an x_{i_0} among the centres x_i, i ∈ J, we easily realize that ∪_{i∈J} B_i ⊂ B(x_{i_0}, R + r); hence K ⊂ B(x_{i_0}, R + r), and this means that K is bounded. □

Definition 1.5.5. Let A : X → Y be a linear operator between the normed vector spaces (not necessarily complete) X and Y; A is said to be compact if it maps bounded sets into relatively compact sets.

Remark 1.5.2. If a linear operator is compact, then it is bounded. Indeed, if, according to notation (1.40), we denote with \overline{B(0, 1)} the closure of the open ball in X with centre 0 ∈ X and radius 1, we have that, by definition of compact operator, the set \overline{A(B(0, 1))} is compact in Y: hence, by virtue of remark 1.5.1, such a set is also bounded. In particular, this implies that there exists C > 0 such that

‖Ax‖_Y ≤ C   ∀x ∈ X with ‖x‖_X ≤ 1;   (1.42)

it follows that

sup_{‖x‖_X = 1} ‖Ax‖_Y ≤ C,   (1.43)

i.e. A is bounded. □

We now state two theorems: the first one, in view of theorem 1.5.6, implies that if the inverse problem we are interested in is modelled by a linear compact operator, then, in general, it is ill-posed; the second one is very useful in computational implementations, in which continuous problems also need to be discretized in finite-dimensional spaces.

Theorem 1.5.7. Let X and Y be Banach spaces; if A ∈ B(X, Y) is compact and its range is closed, then the range has finite dimension (and the operator is said to be of finite rank).

Proof. The operator A : X → R(A) is linear, continuous and surjective between two Banach spaces. Then, by virtue of the open mapping theorem 1.3.1, given any g ∈ R(A), g = Af, the image by A of the open unit ball of centre f is an open set in R(A) containing g. Since A is compact, the closure of such an open set (which is still contained in R(A)) is compact; thus, each element of R(A) has a compact neighbourhood in R(A). This means that R(A) is locally compact and then, being also a normed vector space, it is finite-dimensional by virtue of a theorem of Riesz [49]. ∎

Theorem 1.5.8. Let X and Y be normed vector spaces (not necessarily complete); if A ∈ B(X, Y) is of finite rank, then it is compact.

Proof. If V ⊂ X is bounded, i.e. there exists C < ∞ such that ‖x‖_X ≤ C ∀x ∈ V, then ‖Ax‖_Y ≤ ‖A‖ ‖x‖_X ≤ C‖A‖ ∀x ∈ V, i.e. A(V) is bounded. Moreover, since dim R(A) = n < ∞, R(A) is closed; hence \overline{A(V)} is a closed and bounded subset of the finite-dimensional normed vector space R(A), which, as such, is always homeomorphic to an ℝⁿ (in our case, we obviously have n = dim R(A)). By virtue of the Heine-Borel theorem, \overline{A(V)} is compact, i.e. A(V) is relatively compact; hence A is compact. ∎

Also for future purposes, we are now going to show that if X, Y are Hilbert spaces and A : X → Y is compact, the generalized solution, if it exists, can be written explicitly in terms of the datum g and the singular system of the operator A.

Let us briefly recall that if A is compact, then the operators A*A and AA* are compact, self-adjoint and positive¹¹. They also have the same positive eigenvalues with the same multiplicity. Let σ_k² be these eigenvalues, ordered in a decreasing sequence (σ_0² ≥ σ_1² ≥ σ_2² ≥ …): except in degenerate cases where they are finite in number, the sequence {σ_k²}_{k=0}^∞ tends to zero as k tends to infinity. It can be shown that it is always possible to find vector sets {u_k}_{k=0}^∞ ⊂ X and {v_k}_{k=0}^∞ ⊂ Y such that:

1. denoting with σ_k the positive square root of σ_k², it holds:

Au_k = σ_k v_k,   A*v_k = σ_k u_k;   (1.44)

2. the set {u_k}_{k=0}^∞ ⊂ X forms an orthonormal basis in N(A)^⊥.

Furthermore, it trivially turns out that {u_k}_{k=0}^∞ ⊂ X are all the eigenvectors (in N(A*A)^⊥ = N(A)^⊥) of the operator A*A, while {v_k}_{k=0}^∞ ⊂ Y are all the eigenvectors (in N(AA*)^⊥ = N(A*)^⊥) of AA* and form an orthonormal basis in \overline{R(A)}.

¹¹We remind the reader that a linear operator T : X → X, with X a Hilbert space, is called positive if (x, Tx)_X ≥ 0 ∀x ∈ X; T is called strictly positive if (x, Tx)_X > 0 ∀x ∈ X with x ≠ 0. It is possible to prove that if a linear operator on a complex Hilbert space is positive, then it is also self-adjoint.

Let us now observe that any $f \in X$ can be uniquely decomposed as

$$f = f_1 + f_2, \qquad \text{with } f_1 \in N(A)^{\perp},\ f_2 \in N(A), \qquad (1.45)$$

so that $Af = Af_1$. Since it obviously holds

$$f_1 = \sum_{k=0}^{\infty} (f, u_k)_X\, u_k, \qquad (1.46)$$

by means of the continuity and linearity of $A$ and of the first of relations (1.44), we get

$$Af_1 = A\left(\lim_{n\to\infty} \sum_{k=0}^{n} (f, u_k)_X\, u_k\right) = \lim_{n\to\infty} \sum_{k=0}^{n} (f, u_k)_X\, A u_k = \sum_{k=0}^{\infty} (f, u_k)_X\, \sigma_k v_k. \qquad (1.47)$$

Summing up, we have found the following relation

$$Af = \sum_{k=0}^{\infty} \sigma_k (f, u_k)_X\, v_k \qquad \forall f \in X, \qquad (1.48)$$

and in an analogous way we can get one for $A^*$, i.e.

$$A^* g = \sum_{k=0}^{\infty} \sigma_k (g, v_k)_Y\, u_k \qquad \forall g \in Y. \qquad (1.49)$$

Definition 1.5.6. The positive numbers $\sigma_k$ and the vectors $u_k$, $v_k$, for $k \in \mathbb{N}$, are respectively called the singular values and the singular vectors (functions) of the compact operator $A$. The set of triples $\{\sigma_k, u_k, v_k\}_{k=0}^{\infty}$ is said to be the singular system of $A$; the representation (1.48) [(1.49)] is called the singular representation of $A$ [of $A^*$].

Remark 1.5.3. The singular representations (1.48) and (1.49) easily imply that $\|A\| = \sigma_0$ and $\|A^*\| = \sigma_0$ respectively. Let us prove this result for $A$ (the argument for $A^*$ is obviously the same). To this end, let $f \in X$ be such that $\|f\|_X = 1$: then, by virtue of (1.48), we have

$$\|Af\|_Y = \left\| \sum_{k=0}^{\infty} \sigma_k (f, u_k)_X\, v_k \right\|_Y = \sqrt{\sum_{k=0}^{\infty} \sigma_k^2 |(f, u_k)_X|^2} \leq \sigma_0 \sqrt{\sum_{k=0}^{\infty} |(f, u_k)_X|^2} \leq \sigma_0, \qquad (1.50)$$

where equality can hold (it suffices to take, e.g., $f = u_0$). Hence, remembering definition¹² $\|A\| := \sup_{\|f\|_X = 1} \|Af\|_Y$, from inequality (1.50) we immediately get $\|A\| = \sigma_0$. □
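As a quick numerical illustration of this identity (a sketch only, with an arbitrary random matrix standing in for the compact operator $A$), one can check that in finite dimensions the operator norm is exactly the largest singular value $\sigma_0$:

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.standard_normal((50, 20))                # arbitrary finite-rank operator
sigma = np.linalg.svd(A, compute_uv=False)       # singular values, decreasing
print(np.linalg.norm(A, 2), sigma[0])            # the two numbers coincide
```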

Let us now consider the so-called Euler equation, which is exactly the third condition characterizing pseudosolutions in theorem 1.5.1, i.e.

$$A^* A f = A^* g. \qquad (1.51)$$

By inserting in both sides of the previous equation the singular representations (1.48) and (1.49), we get

$$\sigma_j^2 (f, u_j)_X = \sigma_j (g, v_j)_Y \qquad \forall j \in \mathbb{N}, \qquad (1.52)$$

that is

$$(f, u_j)_X = \frac{1}{\sigma_j} (g, v_j)_Y \qquad \forall j \in \mathbb{N}. \qquad (1.53)$$

It easily follows that a necessary and sufficient condition for the existence of pseudosolutions, i.e. solutions of equation (1.51), is that

$$\sum_{k=0}^{\infty} \frac{1}{\sigma_k^2} |(g, v_k)_Y|^2 < \infty, \qquad (1.54)$$

which is called Picard's condition and is essentially a condition on the datum $g$. If Picard's condition holds, it is immediate to realize that the series

$$\sum_{k=0}^{\infty} \frac{1}{\sigma_k} (g, v_k)_Y\, u_k \qquad (1.55)$$

converges to a pseudosolution, and indeed to the generalized solution $f^{\dagger}$, since all the vectors $u_k$ are in $N(A)^{\perp}$. Summing up, we have found the explicit representation

$$A^{\dagger} g = f^{\dagger} = \sum_{k=0}^{\infty} \frac{1}{\sigma_k} (g, v_k)_Y\, u_k. \qquad (1.56)$$
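The representation (1.56) makes the instability caused by the decay of the $\sigma_k$ completely explicit: any component of the noise along $v_k$ is amplified by $1/\sigma_k$. As a hedged finite-dimensional sketch (operator, data and noise level are all invented for illustration), one can prescribe rapidly decaying singular values and compare the generalized solutions computed from exact and noisy data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
sigma = 2.0 ** -np.arange(n)                       # prescribed sigma_k = 2^{-k}
U = np.linalg.qr(rng.standard_normal((n, n)))[0]   # columns: u_k
V = np.linalg.qr(rng.standard_normal((n, n)))[0]   # columns: v_k
A = V @ np.diag(sigma) @ U.T

f_true = rng.standard_normal(n)
g = A @ f_true                                     # exact datum: g is in R(A)
g_delta = g + 1e-6 * rng.standard_normal(n)        # noisy datum

# Representation (1.56): f_dagger = sum_k (g, v_k)_Y / sigma_k * u_k.
f_dag       = U @ ((V.T @ g) / sigma)
f_dag_noisy = U @ ((V.T @ g_delta) / sigma)
print(np.linalg.norm(f_dag - f_true) / np.linalg.norm(f_true))        # tiny
print(np.linalg.norm(f_dag_noisy - f_true) / np.linalg.norm(f_true))  # huge
```

Here the exact datum satisfies Picard's condition by construction, while the noisy one, in practice, does not: the last coefficients $(g_\delta, v_k)_Y/\sigma_k$ are dominated by amplified noise.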


Coming back to the case of a generic operator $A \in B(X, Y)$ (not necessarily compact), we conclude this section 1.5 by noticing that, analogously to what was observed in section 1.3 about inequality (1.9), the fact that $R(A)$ is closed does not ensure the stability of the generalized solution. Indeed, it is possible to prove [10] the following inequality:

$$\frac{\|\delta f^{\dagger}\|_X}{\|f^{\dagger}\|_X} \leq C(A)\, \frac{\|\delta g\|_Y}{\|g\|_Y}, \qquad (1.57)$$

where, this time, the condition number is given by

$$C(A) = \|A\|\, \|A^{\dagger}\|. \qquad (1.58)$$

It can be shown [8] that, also in this case, it is always $C(A) \geq 1$. Obviously, if $C(A) \gg 1$ and the datum is affected by error, the generalized solution is numerically unstable. Hence, looking for the generalized solution of an inverse problem, instead of the solution itself, does not free us from the necessity of using regularization algorithms. Of course, this does not mean that the concept of generalized solution is useless; on the contrary, it is fundamental precisely in handling the regularization algorithms themselves.
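In finite dimensions $A^{\dagger}$ is the Moore-Penrose pseudoinverse, so that $C(A)$ in (1.58) reduces to the ratio between the largest and the smallest nonzero singular value. Continuing the previous (purely illustrative) sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
sigma = 2.0 ** -np.arange(n)                       # same prescribed singular values
U = np.linalg.qr(rng.standard_normal((n, n)))[0]
V = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = V @ np.diag(sigma) @ U.T

# C(A) = ||A|| * ||A_dagger|| = sigma_0 / sigma_min  (here ~5.4e8 >> 1)
C = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.pinv(A), 2)
print(C, sigma[0] / sigma[-1])
```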

1.6 Regularization theory: a general formulation

Given a linear inverse problem, if the generalized inverse operator is not continuous or the problem is characterized by a very large condition number, the knowledge of the generalized solution, if it exists, is nearly useless from the point of view of applications to real data. In such cases, indeed, as a consequence of any measurement operation, there is always an error on the datum; this error propagates to the generalized solution and makes it numerically unstable and hence physically meaningless. In such a situation, methods yielding stable estimates of the generalized solutions are needed. In the scientific literature there exist various algorithms of this kind, and their description is rigorously developed within regularization theory (see, e.g., [33], [37], [6], [67], [52]). Here we shall only give some basic definitions and some hints about the fundamental techniques we are going to employ in the following chapters.

In general terms, regularization is the approximation of an ill-posed problem by a one-parameter (usually denoted by $\alpha$) family of neighbouring well-posed problems. We now want to motivate the forthcoming definitions of a regularization operator and of a regularization method by making some considerations about the fact that our aim is to approximate the generalized solution $f^{\dagger} = A^{\dagger} g$ of the usual (and exact) linear inverse problem

$$Af = g \qquad (1.59)$$

in the most general case in which error or noise affects not only the exact datum $g$, but also the exact operator $A$, i.e. there may also be modelling errors, so that we know neither $g$ nor $A$, but only some approximations of them. In practice, we shall have to deal with a noisy version of the previous problem, i.e.

$$A_h f = g_\delta, \qquad (1.60)$$

where $g_\delta$ represents a perturbed version of the datum and $A_h$ an approximate version of the operator; we shall work in a deterministic framework, i.e. we shall assume to know a priori the following noise bounds:

$$\|g_\delta - g\|_Y \leq \delta \qquad (1.61)$$

and

$$\|A_h - A\| \leq h. \qquad (1.62)$$

In the following, for notational convenience, we shall sometimes denote with η := (δ, h) the two noise levels together.

Remark 1.6.1. Also for future purposes, it is useful to observe that, in any case, for each exact datum $g$, its noisy version $g_\delta$ can be represented as

$$g_\delta = g + w_\delta, \qquad (1.63)$$

where $w_\delta$ is the noise function and is such that $\|w_\delta\|_Y \leq \delta$. Analogously, for each exact operator $A$, its noisy version $A_h$ can be represented as

$$A_h = A + N_h, \qquad (1.64)$$

where $N_h$ is the noise operator and is such that $\|N_h\| \leq h$. Expressions (1.63) and (1.64) may not be explicitly known, but it is always possible to assume that they exist. In fact, they are quite general and do not mean that the noise is necessarily additive, since $w_\delta$ and $N_h$ may respectively depend on $g$ and $A$. □
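As a minimal sketch of this deterministic noise model (all numbers and names are arbitrary choices), one can fabricate a pair $(w_\delta, N_h)$ satisfying the bounds (1.61) and (1.62) with equality, simply by normalization:

```python
import numpy as np

rng = np.random.default_rng(7)
n, delta, h = 30, 1e-3, 1e-2
g = rng.standard_normal(n)                 # exact datum (arbitrary)
A = rng.standard_normal((n, n))            # exact operator (arbitrary)

w = rng.standard_normal(n)
w_delta = delta * w / np.linalg.norm(w)    # noise function: ||w_delta||_Y = delta
N = rng.standard_normal((n, n))
N_h = h * N / np.linalg.norm(N, 2)         # noise operator: ||N_h|| = h

g_delta, A_h = g + w_delta, A + N_h        # expressions (1.63) and (1.64)
print(np.linalg.norm(g_delta - g), np.linalg.norm(A_h - A, 2))   # delta, h
```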

Remark 1.6.2. If the exact equation (1.59) arises from the mathematical model of a physical phenomenon, then one has necessarily that $g \in R(A)$ (otherwise the model would be inconsistent, giving rise to an impossible equation). In such a case, since $R(A) \subset D(A^{\dagger})$, we have that $g = A f^{\dagger}$ and consequently expression (1.63) can be rewritten as

$$g_\delta = A f^{\dagger} + w_\delta. \qquad (1.65)$$

However, in some situations, the exact equation is or may be impossible, i.e. one may have $g \notin R(A)$: as we shall see, this is, in general, precisely the case of the linear sampling method, whose basic equation, although physically interpretable as a focusing condition [22], does not actually model any physical phenomenon. Obviously, in such a circumstance representation (1.65) may not hold. □
