
### An ℓ2-ℓq regularization method for large discrete ill-posed problems

Alessandro Buccini · Lothar Reichel

Abstract Ill-posed problems arise in many areas of science and engineering. Their
solutions, if they exist, are very sensitive to perturbations in the data. Regularization
aims to reduce this sensitivity. Typically, regularization methods replace the original
problem by a minimization problem with a fidelity term and a regularization term.
Recently, the use of a p-norm to measure the fidelity term, and a q-norm to measure
the regularization term, has received considerable attention. The relative importance
of these terms is determined by a regularization parameter. This paper discusses how
the latter parameter can be determined with the aid of the discrepancy principle. We
primarily focus on the situation when p = 2 and 0 < q ≤ 2, where we note that when
0 < q < 1, the minimization problem may be non-convex.

Keywords ℓ2-ℓq minimization · ill-posed problem · iterative method

Mathematics Subject Classification (2000) 65F10 · 65R32 · 90C26

1 Introduction

We consider the computation of an approximate solution of minimization problems of the form

min_{x ∈ R^n} ‖Ax − b^δ‖_2,   (1)

where A ∈ R^{m×n} is a large matrix whose singular values decrease to zero gradually with no significant gap, and the vector b^δ ∈ R^m represents measured error-contaminated data. We will sometimes refer to the error in b^δ as “noise.” The norm ‖·‖_2 in (1) denotes the Euclidean norm. We also will comment on the use of other norms.

A. Buccini
Department of Mathematical Sciences, Kent State University, Kent, OH 44242, USA. E-mail: [email protected]

L. Reichel
Department of Mathematical Sciences, Kent State University, Kent, OH 44242, USA. E-mail: [email protected]

Minimization problems of the kind (1) are commonly referred to as *discrete ill-posed problems*. They typically arise from the discretization of ill-posed problems, such as Fredholm integral equations of the first kind with a smooth kernel; see, e.g., [11, 15, 17] for discussions on ill-posed and discrete ill-posed problems.

Let b ∈ R^m denote the (unknown) error-free vector associated with b^δ. We will assume that b is in the range of A and that a fairly sharp bound δ for the error in b^δ is known, i.e.,

‖b − b^δ‖_2 ≤ δ.   (2)

These assumptions will allow us to determine a regularization parameter with the aid of the discrepancy principle.

Since A is ill-conditioned and b^δ is contaminated by error, the naïve solution A†b^δ of (1), where A† denotes the Moore–Penrose pseudoinverse of A, usually is not a meaningful approximation of the desired vector

x̂ := A†b   (3)

due to severe amplification and propagation of the error in b^δ. To achieve a more accurate approximation of x̂, the original discrete ill-posed problem (1) is replaced by a nearby well-posed problem, whose solution is less sensitive to the error in b^δ. This replacement is known as *regularization*.

A regularization technique that recently has received considerable attention is to replace (1) by an ℓp-ℓq minimization problem of the form

x* := arg min_x { (1/p)‖Ax − b^δ‖_p^p + (µ/q)‖Lx‖_q^q } =: arg min_x J(x),   (4)

where the regularization matrix L ∈ R^{ℓ×n} is such that

N(A) ∩ N(L) = {0}.   (5)

Here N(M) denotes the null space of the matrix M and

‖z‖_s := ( Σ_{j=1}^n |z_j|^s )^{1/s},   z = [z_1, ..., z_n]^T ∈ R^n.

We will refer to ‖·‖_s as the s-norm of z also for 0 < s < 1, even though the mapping z → ‖z‖_s does not satisfy the triangle inequality and therefore is not a norm for these s-values.

The regularization parameter µ > 0 in (4) balances the relative influence of the first term (the fidelity term) and the second term (the regularization term). The use of 0 < p, q ≤ 2 has received considerable attention; see, e.g., [3,4,7,12,20,22,24] and references therein. Note that if either 0 < p < 1 or 0 < q < 1, then the functional (4) generally is non-convex. When p = q = 2, the problem (4) reduces to Tikhonov regularization in general form [14, 17, 23].
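For the special case p = q = 2, the minimizer of (4) can be computed directly from the normal equations of the general-form Tikhonov problem. The following sketch illustrates this on a small synthetic problem; the model matrix, noise level, and µ-value are illustrative choices, not taken from the paper:

```python
import numpy as np

# General-form Tikhonov regularization: the p = q = 2 case of (4),
#   min_x (1/2)||Ax - b_delta||_2^2 + (mu/2)||Lx||_2^2,
# solved via the normal equations (A^T A + mu L^T L) x = A^T b_delta.
rng = np.random.default_rng(0)
n = 50
A = np.triu(np.ones((n, n))) / n                  # ill-conditioned model matrix
x_true = np.sin(np.linspace(0, np.pi, n))
b_delta = A @ x_true + 0.01 * rng.standard_normal(n)

L = np.eye(n)                                     # L = I (standard form)
mu = 1e-2
x_tik = np.linalg.solve(A.T @ A + mu * L.T @ L, A.T @ b_delta)

# The naive solution A^{-1} b_delta amplifies the noise severely.
x_naive = np.linalg.solve(A, b_delta)
err_tik = np.linalg.norm(x_tik - x_true) / np.linalg.norm(x_true)
err_naive = np.linalg.norm(x_naive - x_true) / np.linalg.norm(x_true)
print(err_tik < err_naive)   # regularization pays off on this example
```

A derivative operator in place of L = I is equally common; the only change is the matrix passed to the solve.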

We briefly comment on the choice of q. In many situations it is known that the desired vector (3) is sparse in some basis. To enhance sparsity, we may consider using a regularization term with ℓ0-norm. However, the minimization problem so obtained is very difficult to solve. Therefore, it is common to approximate the ℓ0-norm by the ℓ1 -norm. The main advantage of using this approximation is that the ℓ1-norm is convex.


Fig. 1: Comparison of different ℓq-norms. The solid black line represents the ℓ0-norm, the dotted black line shows the ℓ1-norm, the dark gray solid line displays the ℓ0.5-norm, and the light gray solid line depicts the ℓ0.1-norm.

This makes the computation of a solution easier. However, ℓq-norms with 0 < q < 1 are better approximations to the ℓ0-norm. In particular, the smaller q, the better the approximation, i.e., if q1 is smaller than q2, then the ℓq1-norm is a better approximation of the ℓ0-norm than the ℓq2-norm; see Figure 1 for an illustration and [24] for a discussion. The main drawback of using q < 1 is that the resulting minimization problem (4) may be non-convex.

We turn to the choice of p. The value of p should depend on the type of noise in the data bδ. For white Gaussian noise, the choice p = 2 is appropriate. We will primarily consider this choice in the present paper.

A possible approach for solving the minimization problem (4) with p = 2 is to approximate the ℓq-norm by a weighted ℓ2-norm. By iteratively refining this approximation, it is possible to effectively compute a solution of (4). This process is known as the iteratively reweighted norm (IRN) method and has been applied in several different situations with good results [10, 13, 24, 25, 27]. IRN methods proceed by solving a sequence of weighted least-squares problems until a solution of (4) has been determined to desired accuracy. Applications of IRN-type methods to minimization problems (4) with p or q smaller than unity are described in [20, 24]. It is shown in [20] that the solutions of the sequence of weighted least-squares problems converge to a stationary point of the functional (4).
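A minimal sketch of the IRN idea for p = 2: ‖Lx‖_q^q is repeatedly approximated by a weighted ℓ2-norm whose weights come from the current iterate, and each step solves a weighted least-squares problem. The data, parameter values, and the helper name `irn_l2_lq` are illustrative assumptions, not the FMM-GKS implementation of [20]:

```python
import numpy as np

# IRN sketch: approximate ||Lx||_q^q by sum_i w_i (Lx)_i^2 with weights
# w_i = ((Lx)_i^2 + eps^2)^(q/2 - 1), then solve the weighted normal
# equations (A^T A + mu L^T diag(w) L) x = A^T b and iterate.
def irn_l2_lq(A, b, L, mu, q, eps=1e-4, iters=30):
    x = np.linalg.lstsq(A, b, rcond=None)[0]      # least-squares starting guess
    for _ in range(iters):
        w = ((L @ x) ** 2 + eps**2) ** (q / 2 - 1)
        x = np.linalg.solve(A.T @ A + mu * L.T @ (w[:, None] * L), A.T @ b)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((60, 40))
x_true = np.zeros(40)
x_true[[5, 17, 30]] = [2.0, -1.5, 1.0]            # sparse desired solution
b = A @ x_true + 0.01 * rng.standard_normal(60)
x = irn_l2_lq(A, b, np.eye(40), mu=0.5, q=0.5)    # q < 1 promotes sparsity
```

In this toy run the small entries of x are driven toward zero while the three spikes survive nearly unchanged, illustrating the sparsity-promoting effect of q < 1.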

None of the works on solution methods for (4) mentioned above discuss the choice of the regularization parameter µ > 0. It is the purpose of the present paper to describe several algorithms, based on the discrepancy principle, for choosing a suitable µ-value, and to provide theoretical results that shed some light on their performance. Our algorithms for determining µ are based on the IRN-type method described in [20] and there denoted by FMM-GKS. We outline the idea behind this method and denote it simply by MM-GKS. Each iteration with this method can be divided into two steps: the first step majorizes the functional to be minimized in (4) by a quadratic function that is tangent to the functional at the current approximation. Then, in the second step, the unique minimizer of the majorant is computed and used as the new iterate.

In our first algorithm for determining a suitable value of the regularization parameter µ, we supplement the iterations described in [20] with an a priori chosen monotonically decreasing sequence of values µ^(k), k = 1, 2, ..., of the regularization parameter. The sequence is chosen such that µ^(k) → 0 as k → ∞. Inspired by results due to Hanke and Groetsch [16], we will use a geometrically decreasing sequence in the numerical examples of Section 4. However, any monotonically decreasing sequence that converges to zero can be used. We will show that, when no stopping criterion is employed, the iterates generated in this manner converge to a solution of the least-squares problem (1). This solution is not useful. Regularization is achieved by early termination of the iterations. The stopping rule is based on the discrepancy principle, which prescribes that the iterations be terminated as soon as an iterate x^(k) has been determined that satisfies

‖Ax^(k) − b^δ‖_2 ≤ τδ,   (6)
where δ is defined in (2) and τ > 1 is a user-supplied constant independent of δ.

Our second algorithm for determining a suitable value of the regularization parameter differs from the first one in that the sequence of regularization parameters µ^(k), k = 1, 2, ..., is not chosen a priori. Instead, we compute at each iteration a regularization parameter µ^(k) such that the iterate x^(k) satisfies the discrepancy principle, i.e., such that

‖Ax^(k) − b^δ‖_2 = τδ.
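For a fixed quadratic subproblem the discrepancy ‖Ax_µ − b^δ‖_2 grows monotonically with µ, so a µ-value matching the discrepancy principle can be found by a one-dimensional search. The sketch below uses bisection on log µ for a Tikhonov-type subproblem; the problem data and the helper name `discrepancy_mu` are illustrative assumptions:

```python
import numpy as np

# Find mu with ||A x_mu - b_delta||_2 = tau * delta by bisection, where
# x_mu solves the Tikhonov normal equations (A^T A + mu I) x = A^T b_delta.
def discrepancy_mu(A, b_delta, tau_delta, lo=1e-10, hi=1e4, iters=60):
    def residual(mu):
        x = np.linalg.solve(A.T @ A + mu * np.eye(A.shape[1]), A.T @ b_delta)
        return np.linalg.norm(A @ x - b_delta)
    for _ in range(iters):
        mid = np.sqrt(lo * hi)            # geometric midpoint of the bracket
        if residual(mid) < tau_delta:
            lo = mid                      # residual too small: mu must grow
        else:
            hi = mid
    return np.sqrt(lo * hi)

rng = np.random.default_rng(2)
n = 30
A = np.triu(np.ones((n, n))) / n
x_true = np.linspace(0.0, 1.0, n)
noise = rng.standard_normal(n)
delta = 0.01 * np.linalg.norm(A @ x_true)
b_delta = A @ x_true + delta * noise / np.linalg.norm(noise)   # ||error|| = delta
mu = discrepancy_mu(A, b_delta, 1.1 * delta)                   # tau = 1.1
x = np.linalg.solve(A.T @ A + mu * np.eye(n), A.T @ b_delta)
```

The bisection works because the residual of the Tikhonov solution increases continuously from nearly 0 (µ → 0, A invertible here) to ‖b^δ‖_2 (µ → ∞), so the bracket always contains the discrepancy equation's root.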

We show convergence of this algorithm (up to a subsequence) and regularization properties of the computed iterates x^(k). A third algorithm considered in Section 4 uses the regularization parameter value obtained in this manner in the algorithm described in [20].

This paper is organized as follows: Section 2 outlines the FMM-GKS method described in [20] for the solution of (4). Our algorithms for determining the regularization parameter µ are described in Section 3, and a few numerical examples are presented in Section 4. Finally, Section 5 contains concluding remarks and discusses some extensions.

2 A majorization-minimization method

We briefly describe the FMM-GKS method proposed in [20]. Introduce a smoothed version of the norm ‖x‖_q^q as follows. Consider the function Φ_q : R → R defined by

Φ_q(t) = |t|^q.

If 0 < q ≤ 1, then the function t → Φ_q(t) is not differentiable at t = 0. Introduce the smoothed version of Φ_q as

Φ_{q,ε}(t) = ( √(t² + ε²) )^q,   (7)

where ε > 0 is a small constant. Clearly, Φ_{q,ε}(t) is everywhere differentiable. A smoothed version of ‖x‖_q^q for x = [x_1, ..., x_n]^t ∈ R^n is given by the right-hand side of

‖x‖_q^q ≈ Σ_{i=1}^n Φ_{q,ε}(x_i).

Throughout this paper, the superscript t denotes transposition.
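The smoothing (7) is easy to check numerically: the derivative of Φ_{q,ε} vanishes at the origin, and away from the origin Φ_{q,ε}(t) is close to |t|^q. The parameter values below are illustrative:

```python
import numpy as np

# Smoothed penalty (7): Phi_{q,eps}(t) = (sqrt(t^2 + eps^2))^q.
def phi(t, q, eps):
    return np.sqrt(t**2 + eps**2) ** q

def phi_prime(t, q, eps):
    # d/dt (t^2 + eps^2)^(q/2) = q t (t^2 + eps^2)^(q/2 - 1)
    return q * t * (t**2 + eps**2) ** (q / 2 - 1)

q, eps = 0.5, 1e-2
print(phi_prime(0.0, q, eps))                  # 0.0: differentiable at t = 0
print(abs(phi(2.0, q, eps) - 2.0**q) < 1e-4)   # True: close to |t|^q for |t| >> eps
```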

Define the smoothed version of the functional with p = 2 that is minimized in (4),

J_ε(x) := (1/2)‖Ax − b^δ‖_2^2 + (µ/q) Σ_{i=1}^ℓ Φ_{q,ε}((Lx)_i).   (8)

Thus, the smoothed minimization problem associated with (4) reads

x* := arg min_x J_ε(x).

The FMM-GKS method described in [20] for computing a stationary point of J_ε is based on a majorization-minimization method. It constructs a sequence of iterates x^(k) that converge to a stationary point of J_ε. At each step the functional J_ε is majorized by a quadratic function x → Q(x, x^(k)) that is tangent to J_ε at x^(k). The next iterate x^(k+1) is the unique minimizer of x → Q(x, x^(k)). We outline this method in the remainder of this section.

Definition 1 Consider the differentiable function J(x) : R^n → R. We say that the function x → Q(x, y) : R^n → R is a *quadratic tangent majorant* for J(x) at y ∈ R^n if the following conditions hold:

– Q(x, y) is quadratic;
– Q(x, y) ≥ J(x) for all x ∈ R^n;
– Q(y, y) = J(y) and ∇Q(y, y) = ∇J(y).

2.1 Majorization step

We outline the construction of a quadratic tangent majorant at the point x^(k). Two approaches are described in [20]: one yields a majorant with fixed aperture and the other one a majorant with the largest aperture possible. The second kind of majorant approximates the function J_ε better than the first kind, but its computation is more demanding. In the following we will consider only the majorant with fixed aperture, although all the theoretical results hold true also for the majorant with adaptive aperture.

Let

u^(k) := Lx^(k)

and introduce the vector

ω^(k) := u^(k) ( 1 − ( ((u^(k))² + ε²)/ε² )^{q/2−1} ),   (9)

where all the operations, including squaring, are meant element-wise. It is shown in [20] that the function

Q(x, x^(k)) = (1/2)‖Ax − b^δ‖_2^2 + (µ ε^{q−2}/2) ( ‖Lx‖_2^2 − 2⟨ω^(k), Lx⟩ ) + c,   (10)

with c a suitable constant independent of x, is a quadratic tangent majorant for J_ε at x^(k).

2.2 Minimization step

Given x^(k), the next iterate x^(k+1) is the minimizer of x → Q(x, x^(k)). Since Q is quadratic, x^(k+1) can be computed by determining the zero of the gradient, i.e., by solving the linear system of equations

(A^t A + η L^t L) x^(k+1) = A^t b^δ + η L^t ω^(k),   η := µ ε^{q−2}.   (11)

The matrix on the left-hand side is nonsingular for µ > 0 due to the requirement (5). Therefore x^(k+1) is the unique minimizer of Q(x, x^(k)).
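One majorization-minimization step, i.e., the weight computation (9) followed by the solve (11), can be sketched in a few lines. The data below are illustrative; tracking the smoothed functional (8) verifies the monotone decrease that the majorization guarantees:

```python
import numpy as np

# One MM step: build omega^(k) from the current iterate via (9), then
# solve (A^T A + eta L^T L) x_new = A^T b_delta + eta L^T omega, cf. (11).
def mm_step(A, b_delta, L, x, mu, q, eps):
    u = L @ x
    omega = u * (1.0 - ((u**2 + eps**2) / eps**2) ** (q / 2 - 1))  # (9)
    eta = mu * eps ** (q - 2)
    return np.linalg.solve(A.T @ A + eta * L.T @ L,
                           A.T @ b_delta + eta * L.T @ omega)      # (11)

def J_eps(A, b_delta, L, x, mu, q, eps):
    u = L @ x
    return (0.5 * np.linalg.norm(A @ x - b_delta) ** 2
            + (mu / q) * np.sum((u**2 + eps**2) ** (q / 2)))       # cf. (8)

rng = np.random.default_rng(3)
A = rng.standard_normal((50, 20))
b_delta = rng.standard_normal(50)
L, x = np.eye(20), np.zeros(20)
Js = [J_eps(A, b_delta, L, x, mu=1.0, q=0.5, eps=1e-2)]
for _ in range(15):
    x = mm_step(A, b_delta, L, x, mu=1.0, q=0.5, eps=1e-2)
    Js.append(J_eps(A, b_delta, L, x, mu=1.0, q=0.5, eps=1e-2))
print(all(b <= a + 1e-9 for a, b in zip(Js, Js[1:])))   # True: monotone decrease
```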

An approximate solution of (11) can be computed efficiently by seeking a solution in a low-dimensional subspace. Let the columns of V_k ∈ R^{n×d}, with 1 ≤ d ≪ n, form an orthonormal basis for the subspace in which we determine an approximate solution x^(k+1) of (11). We compute x^(k+1) by solving the minimization problem

y^(k+1) := arg min_y ‖ [ AV_k ; η^{1/2} LV_k ] y − [ b^δ ; η^{1/2} ω^(k) ] ‖_2^2   (12)

and letting

x^(k+1) = V_k y^(k+1).   (13)

Introduce the QR factorizations

AV_k = Q_A R_A with Q_A ∈ R^{m×d}, R_A ∈ R^{d×d},
LV_k = Q_L R_L with Q_L ∈ R^{ℓ×d}, R_L ∈ R^{d×d}.   (14)

Inserting these factorizations into (12) yields

y^(k+1) := arg min_y ‖ [ R_A ; η^{1/2} R_L ] y − [ Q_A^t b^δ ; η^{1/2} Q_L^t ω^(k) ] ‖_2^2,

and substituting (13) into (11) gives the residual vector

r := A^t(AV_k y^(k+1) − b^δ) + η L^t(LV_k y^(k+1) − ω^(k)).   (15)

We expand the solution subspace by including the scaled residual vector v_new = r/‖r‖_2 in the solution subspace. This vector is orthogonal to the columns of the matrix V_k, and we define the new matrix V_{k+1} = [V_k, v_new] ∈ R^{n×(d+1)} with an orthonormal basis for the expanded solution subspace. The solution subspaces so determined are referred to as *generalized Krylov subspaces*. Another related application of this kind of solution subspaces is described in [23].

We store the matrices

AV_{k+1} = [AV_k, Av_new],   LV_{k+1} = [LV_k, Lv_new].

The QR factorizations of these matrices are computed by updating the QR factorizations (14) according to

AV_{k+1} = [AV_k, Av_new] = [Q_A, q̃_A] [ R_A  r_A ; 0^t  τ_A ],
LV_{k+1} = [LV_k, Lv_new] = [Q_L, q̃_L] [ R_L  r_L ; 0^t  τ_L ],

where

r_A = Q_A^t (Av_new),   q_A = Av_new − Q_A r_A,   τ_A = ‖q_A‖_2,   q̃_A = q_A/τ_A,
r_L = Q_L^t (Lv_new),   q_L = Lv_new − Q_L r_L,   τ_L = ‖q_L‖_2,   q̃_L = q_L/τ_L;

see Daniel et al. [8] for details.
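The updating formulas above can be sketched directly: given a reduced QR factorization M = QR, appending a column only requires one orthogonalization against Q. The example data are illustrative:

```python
import numpy as np

# Append a column to a reduced QR factorization (cf. Daniel et al. [8]):
# [M, c] = [Q, q_tilde] [[R, r], [0, tau]] with r = Q^t c, q = c - Q r,
# tau = ||q||_2, q_tilde = q / tau.
def qr_append(Q, R, c):
    r = Q.T @ c                               # coefficients in the existing basis
    q = c - Q @ r                             # component orthogonal to range(Q)
    tau = np.linalg.norm(q)
    Q_new = np.hstack([Q, (q / tau)[:, None]])
    R_new = np.vstack([np.hstack([R, r[:, None]]),
                       np.hstack([np.zeros((1, R.shape[1])), np.array([[tau]])])])
    return Q_new, R_new

rng = np.random.default_rng(4)
M = rng.standard_normal((30, 5))
Q, R = np.linalg.qr(M)                        # reduced QR: Q is 30x5, R is 5x5
c = rng.standard_normal(30)
Q2, R2 = qr_append(Q, R, c)
print(np.allclose(Q2 @ R2, np.hstack([M, c[:, None]])))   # True
```

In floating-point arithmetic a second orthogonalization pass of q against Q may be needed when c is nearly in range(Q); the algorithms below include such a reorthogonalization step.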

Algorithm 1 summarizes the computations. Following Huang et al. [20], we refer to the scheme described as a majorization-minimization generalized Krylov subspace (MM-GKS) method. To initiate the computations a user chooses a k_0-dimensional solution subspace of R^n. The columns of the matrix V_0 form an orthonormal basis for this subspace.

Algorithm 1 (The MM-GKS method) Let 0 < q ≤ 2 and µ > 0. Consider A ∈ R^{m×n} and L ∈ R^{ℓ×n} such that (5) holds. Fix ε > 0 and k_0 > 0.

η = µ ε^{q−2};
Generate the initial subspace basis: V_0 ∈ R^{n×k_0} such that V_0^t V_0 = I;
Compute and store AV_0 and LV_0;
Let x^(0) ∈ R^n be an initial approximate solution. Let y^(0) = V_0^t x^(0);
Compute the QR factorizations AV_0 = Q_A R_A and LV_0 = Q_L R_L;
for k = 0, 1, ... do
  u^(k) = LV_k y^(k);
  ω^(k) = u^(k) ( 1 − ( ((u^(k))² + ε²)/ε² )^{q/2−1} );
  y^(k+1) = (R_A^t R_A + η R_L^t R_L)^{−1} (R_A^t Q_A^t b^δ + η R_L^t Q_L^t ω^(k));
  r = A^t(AV_k y^(k+1) − b^δ) + η L^t(LV_k y^(k+1) − ω^(k));
  v_new = r/‖r‖_2;  V_{k+1} = [V_k, v_new];
  Update the QR factorizations: AV_{k+1} = Q_A R_A and LV_{k+1} = Q_L R_L;
  x^(k+1) = V_k y^(k+1);
end

To evaluate y^(k+1), we solve a least-squares problem instead of inverting the matrix R_A^t R_A + η R_L^t R_L; see [20] for details.

The main computational effort of the algorithm for large matrices A and L is the evaluation of matrix-vector products with these matrices and with their transposes. Since the matrices AV_k and LV_k are stored, each iteration requires the evaluation of the matrix-vector products Av_new and Lv_new, which are needed for updating AV_k and LV_k, and of their QR factorizations. The evaluation of a matrix-vector product with each one of the matrices A^t and L^t is required when computing the residual vector (15).

The following result for the approximate solutions x(k) computed by Algorithm 1 is shown in [20].

Theorem 1 Let (5) hold. Then for any initial approximate solution x^(0) ∈ R^n, the sequence {x^(k)}_k converges to a stationary point of J_ε(x). Thus,

(i) lim_{k→∞} ‖x^(k+1) − x^(k)‖_2 = 0,
(ii) lim_{k→∞} ∇J_ε(x^(k)) = 0.

Remark 1 When q < 1, the minimization problems (12) may be non-convex. The stationary point computed by MM-GKS therefore might not be a global minimum; in fact, it is not assured that it is a local minimum. The stationary point computed depends on the initial approximate solution and, hence, may vary if the starting vector is changed. It is outside the scope of this paper to analyze the convexity or local convexity properties of the functionals (4) and (8), and of the minimization problem (12). Thus, we are not able to provide theoretical insight on the closeness of the computed stationary point to a global minimum. However, numerical evidence indicates that changing the initial approximate solution does not significantly alter the quality of the restoration determined by the MM-GKS method.

3 Determining the regularization parameter

This section describes two methods for determining the regularization parameter µ. Our analysis requires the matrix A to be of full column rank. This requirement might not be satisfied for certain matrices that stem from the discretization of ill-posed problems. We may replace A by the full-rank matrix

Ã = [ A ; αI ],   (16)

where α > 0 is a fixed tiny constant and I denotes the identity matrix of proper dimension. For sufficiently small α > 0, this substitution will not adversely affect the quality of the computed solution. Henceforth, we assume that such a replacement has been carried out and refer to the matrix so obtained by A. Of course, the matrix (16) does not have to be explicitly stored.

3.1 The MM-GKS-MD method

We first describe a method in which the regularization parameter decreases monotonically during the iterations. This method is derived from Algorithm 1 and uses the discrepancy principle to determine when to terminate the computations. Let µ^(k) denote the regularization parameter at step k. We will require that

µ^(k−1) > µ^(k) > 0  ∀k,   lim_{k→∞} µ^(k) = 0.   (17)

The algorithm of this subsection is referred to as a majorization-minimization generalized Krylov subspace method with monotonically decreasing regularization parameter, in brief as the MM-GKS-MD method. We describe an algorithm that defines this method and subsequently discuss some of its properties.

Algorithm 2 (The MM-GKS-MD method) Let 0 < q < 2 be fixed and let µ^(k), k = 1, 2, ..., be a sequence that satisfies (17). Let the matrices A ∈ R^{m×n} and L ∈ R^{ℓ×n} satisfy (5). Let x^(0) ∈ R^n be an initial approximate solution, and let ε > 0 and k_0 > 0 be constants.

Generate the initial subspace basis: V_0 ∈ R^{n×k_0} such that V_0^t V_0 = I;
Compute and store AV_0 and LV_0;
Compute the QR factorizations AV_0 = Q_A R_A and LV_0 = Q_L R_L;
y^(0) = V_0^t x^(0);
for k = 0, 1, ... do
  u^(k) = LV_k y^(k);
  ω^(k) = u^(k) ( 1 − ( ((u^(k))² + ε²)/ε² )^{q/2−1} );
  η^(k) = µ^(k) ε^{q−2};
  y^(k+1) = (R_A^t R_A + η^(k) R_L^t R_L)^{−1} (R_A^t Q_A^t b^δ + η^(k) R_L^t Q_L^t ω^(k));
  if ‖AV_k y^(k+1) − b^δ‖_2 ≤ τδ then
    Exit;
  end
  r = A^t(AV_k y^(k+1) − b^δ) + η^(k) L^t(LV_k y^(k+1) − ω^(k));
  Reorthogonalize, if needed: r = r − V_k V_k^t r;
  v_new = r/‖r‖_2;  V_{k+1} = [V_k, v_new];
  Update the QR factorizations: AV_{k+1} = Q_A R_A and LV_{k+1} = Q_L R_L;
  x^(k+1) = V_k y^(k+1);
end

The residual vector r in Algorithm 2 is defined as in (15).
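The core of Algorithm 2 can be sketched in full space, i.e., without the generalized Krylov subspace acceleration: one MM step per decreasing µ^(k), stopped by the discrepancy principle (6). Everything below (the model matrix, noise model, parameter values, and the helper name `mm_md`) is an illustrative assumption:

```python
import numpy as np

# Full-space sketch of the MM-GKS-MD idea: MM steps (9), (11) with a
# geometrically decreasing mu^(k), terminated as soon as
# ||A x - b_delta||_2 <= tau * delta, cf. (6).
def mm_md(A, b_delta, L, q, eps, delta, tau=1.1, max_it=200):
    x = np.zeros(A.shape[1])
    for k in range(max_it):
        mu_k = 0.7 ** (k + 1)                              # decreasing sequence (17)
        eta = mu_k * eps ** (q - 2)
        u = L @ x
        omega = u * (1 - ((u**2 + eps**2) / eps**2) ** (q / 2 - 1))  # (9)
        x = np.linalg.solve(A.T @ A + eta * L.T @ L,
                            A.T @ b_delta + eta * L.T @ omega)       # (11)
        if np.linalg.norm(A @ x - b_delta) <= tau * delta:
            return x, k                                    # discrepancy (6) satisfied
    return x, max_it

rng = np.random.default_rng(5)
n = 40
A = np.triu(np.ones((n, n))) / n
x_true = np.zeros(n)
x_true[10:14] = 1.0
noise = rng.standard_normal(n)
delta = 0.02 * np.linalg.norm(A @ x_true)
b_delta = A @ x_true + delta * noise / np.linalg.norm(noise)
x, k = mm_md(A, b_delta, np.eye(n), q=0.5, eps=0.1, delta=delta)
```

As µ^(k) → 0 the iterates approach the least-squares solution, so the residual eventually drops below τδ and the loop stops after finitely many steps, in line with Corollary 2.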

Before showing some properties of Algorithm 2, we introduce some notation that will be used in the following. Define the functional

J_ε^(k)(x) := (1/2)‖Ax − b^δ‖_2^2 + (µ^(k)/q) Σ_{i=1}^ℓ Φ_{q,ε}((Lx)_i)

and let Q^(k)(x, x^(k)) denote the quadratic tangent majorant for J_ε^(k) at x^(k), i.e.,

Q^(k)(x, x^(k)) = (1/2)‖Ax − b^δ‖_2^2 + (µ^(k) ε^{q−2}/2) ( ‖Lx‖_2^2 − 2⟨ω^(k), Lx⟩ ) + c^(k),

where c^(k) is a suitable constant independent of x and ω^(k) is defined in (9). We use the notation Q^(k)(x, x^(k)) instead of Q(x, x^(k)) to stress the fact that the parameter µ^(k) in Q^(k) is non-stationary.

Proposition 1 Let x^(k), k = 1, 2, ..., denote the approximate solutions generated by Algorithm 2. Then the functional values decrease monotonically, i.e.,

J_ε^(k)(x^(k)) ≥ J_ε^(k+1)(x^(k+1)),   k = 1, 2, ... .

Proof By the properties of quadratic tangent majorants, we have

J_ε^(k)(x^(k)) = Q^(k)(x^(k), x^(k))
  ≥ Q^(k)(x^(k+1), x^(k))   (a)
  ≥ J_ε^(k)(x^(k+1))   (b)
  = (1/2)‖Ax^(k+1) − b^δ‖_2^2 + (µ^(k)/q) Σ_{i=1}^ℓ Φ_{q,ε}((Lx^(k+1))_i)
  ≥ (1/2)‖Ax^(k+1) − b^δ‖_2^2 + (µ^(k+1)/q) Σ_{i=1}^ℓ Φ_{q,ε}((Lx^(k+1))_i) = J_ε^(k+1)(x^(k+1)),   (c)

where (a) holds since x^(k+1) is a minimizer of Q^(k), (b) follows from the fact that Q^(k)(x, x^(k)) ≥ J_ε^(k)(x) for all x, and (c) holds since µ^(k) ≥ µ^(k+1).

The following result is a consequence of Proposition 1.

Corollary 1 Let the iterates x^(k), k = 0, 1, ..., be determined by Algorithm 2. Then

J_ε^(k)(x^(k)) → J*  as k → ∞,

for some J* ≥ 0.

Proof This is obvious since the sequence J_ε^(k)(x^(k)), k = 1, 2, ..., is decreasing and bounded from below by zero.

We are now in position to show that, when A is of full column rank, the iterates generated by Algorithm 2 converge to the solution of the least-squares problem (1) if no stopping criterion is employed.

Theorem 2 Assume that the matrix A is of full column rank. Let Algorithm 2 determine the iterates x^(k), k = 1, 2, ... . Then

lim_{k→∞} x^(k) = A†b^δ,

i.e., the limit is the solution of the least-squares problem (1).

Proof Let ρ^(k) denote the modulus of convexity of Q^(k); see, e.g., [19] for a definition, and let

ρ := inf_k ρ^(k).

Since A is of full column rank, we have ρ > 0. We will show that x^(k), k = 1, 2, ..., is a Cauchy sequence. To this aim consider

(ρ/2)‖x^(k+1) − x^(k)‖_2^2 ≤ Q^(k)(x^(k), x^(k)) − Q^(k)(x^(k+1), x^(k))
  = J_ε^(k)(x^(k)) − Q^(k)(x^(k+1), x^(k))
  ≤ J_ε^(k)(x^(k)) − J_ε^(k)(x^(k+1))   (a)
  ≤ J_ε^(k)(x^(k)) − J_ε^(k+1)(x^(k+1)),   (b)

where (a) follows from the fact that Q^(k) is a quadratic tangent majorant, and (b) holds because µ^(k+1) ≤ µ^(k).

Since the sequence J_ε^(k)(x^(k)), k = 1, 2, ..., is convergent, we have

J_ε^(k)(x^(k)) − J_ε^(k+1)(x^(k+1)) → 0  as k → ∞.

Thus,

‖x^(k+1) − x^(k)‖_2 → 0  as k → ∞,

and we have established that the sequence x^(k), k = 1, 2, ..., is a Cauchy sequence. Since R^n is complete, there is a limit point x*.

It remains to show that x* is the solution of Ax = b^δ. Since we assume the matrix A to be of full column rank, it suffices to prove that the residual converges to 0 as k → ∞. Assume that k is large enough so that the columns of V_k span R^n and consider the residual at step k,

r^(k) = Ax^(k) − b^δ = A(A^t A + η^(k) L^t L)^{−1}(A^t b^δ + η^(k) L^t ω^(k)) − b^δ.

Since µ^(k) → 0 as k → ∞, it holds that η^(k) → 0 as k → ∞. Thus,

r^(k) → A(A^t A)^{−1} A^t b^δ − b^δ = 0  as k → ∞.

This shows the theorem.

Corollary 2 Let δ be a bound for the norm of the error in b^δ, see (2), and let τ > 1 be a parameter independent of δ. The computations with Algorithm 2 equipped with the discrepancy principle (6) as stopping criterion terminate after a finite number, k_δ, of steps. The iterate x^(k_δ) generated at step k_δ satisfies

‖Ax^(k_δ) − b^δ‖_2 ≤ τδ.

We are now in a position to show that Algorithm 2 equipped with the discrepancy principle as a stopping rule is an iterative regularization method.

Theorem 3 Let the conditions of Corollary 2 hold and let x^δ = x^(k_δ) denote the last iterate determined by Algorithm 2 equipped with the discrepancy principle as stopping criterion. Then

lim sup_{δ↘0} ‖x^δ − x̂‖_2 = 0,

where x̂ denotes the desired solution vector (3).

Proof All norms are equivalent in R^n and A is of full column rank. Therefore, there is a constant c̃ > 0 such that

‖x^δ − x̂‖_2 ≤ c̃ ‖x^δ − x̂‖_A

for all vectors x^δ − x̂, where ‖y‖_A = ‖Ay‖_2 is the vector norm induced by A. Then we have

lim sup_{δ↘0} ‖x^δ − x̂‖_2 ≤ c̃ lim sup_{δ↘0} ‖x^δ − x̂‖_A
  = c̃ lim sup_{δ↘0} ‖Ax^δ − Ax̂‖_2
  = c̃ lim sup_{δ↘0} ‖Ax^δ − b^δ − (Ax̂ − b^δ)‖_2
  = c̃ lim sup_{δ↘0} ‖Ax^δ − b^δ − (b − b^δ)‖_2
  ≤ c̃ lim sup_{δ↘0} ( ‖Ax^δ − b^δ‖_2 + ‖b − b^δ‖_2 )
  ≤ c̃ lim sup_{δ↘0} (τδ + δ)   (a)
  ≤ c̃ (1 + τ) lim sup_{δ↘0} δ = 0,

where (a) follows from Corollary 2 and (2).

3.2 The MM-GKS-DP method

This subsection describes another method derived from Algorithm 1 for choosing a suitable value of the regularization parameter. The method determines a value of the regularization parameter that satisfies the discrepancy principle in each iteration. We refer to this scheme as the MM-GKS-DP method. The theoretical results for this method are less complete than those of the previous subsection. For instance, we will not prove convergence of the method, but only show the existence of a converging subsequence. Nevertheless, numerical experiments reported in Section 4 show the MM-GKS-DP method to achieve very good results. We first describe the algorithm and then discuss its properties.

Algorithm 3 (The MM-GKS-DP method) Let 0 < q < 2 be fixed. Let the matrices A ∈ R^{m×n} and L ∈ R^{ℓ×n} satisfy (5). Let x^(0) ∈ R^n be an initial approximate solution, and let ε > 0 and k_0 > 0 be constants.

Generate the initial subspace basis: V_0 ∈ R^{n×k_0} such that V_0^t V_0 = I;
Compute and store AV_0 and LV_0;
Compute the QR factorizations AV_0 = Q_A R_A and LV_0 = Q_L R_L;
y^(0) = V_0^t x^(0);
for k = 0, 1, ... do
  u^(k) = LV_k y^(k);
  ω^(k) = u^(k) ( 1 − ( ((u^(k))² + ε²)/ε² )^{q/2−1} );
  η^(k) = µ^(k) ε^{q−2}, where µ^(k) is such that ‖AV_k y^(k+1) − b^δ‖_2 = τδ;
  y^(k+1) = (R_A^t R_A + η^(k) R_L^t R_L)^{−1} (R_A^t Q_A^t b^δ + η^(k) R_L^t Q_L^t ω^(k));
  r = A^t(AV_k y^(k+1) − b^δ) + η^(k) L^t(LV_k y^(k+1) − ω^(k));
  Reorthogonalize, if needed: r = r − V_k V_k^t r;
  v_new = r/‖r‖_2;  V_{k+1} = [V_k, v_new];
  Update the QR factorizations: AV_{k+1} = Q_A R_A and LV_{k+1} = Q_L R_L;
  x^(k+1) = V_k y^(k+1);
end

Again the residual vector r is defined as in (15).

We now show that the sequence x^(k), k = 1, 2, ..., computed by Algorithm 3 has a converging subsequence with a limit that satisfies the discrepancy principle.

Proposition 2 Assume that A is of full column rank and let x^(k), k = 1, 2, ..., denote the iterates generated by Algorithm 3. Then there is a subsequence x^(k_j), j = 1, 2, ..., with a limit x*, such that

‖Ax* − b^δ‖_2 = τδ.

Proof We have for all k that ‖Ax^(k) − b^δ‖_2 = τδ. Since all norms are equivalent in R^n, there is a constant ĉ > 0 independent of x^(k) such that

τδ = ‖Ax^(k) − b^δ‖_2 ≥ ‖Ax^(k)‖_2 − ‖b^δ‖_2 = ‖x^(k)‖_A − ‖b^δ‖_2 ≥ ĉ ‖x^(k)‖_2 − ‖b^δ‖_2,

where ‖x^(k)‖_A = ‖Ax^(k)‖_2. Thus, for all k we have the bound

‖x^(k)‖_2 ≤ ( τδ + ‖b^δ‖_2 ) / ĉ.

This shows that the sequence x^(k), k = 1, 2, ..., is bounded. Therefore, there is a subsequence x^(k_j), j = 1, 2, ..., that converges to a limit x* as j increases. Since ‖Ax^(k_j) − b^δ‖_2 = τδ for all j, we have by continuity of the norm that

‖Ax* − b^δ‖_2 = τδ.

We now can show that the MM-GKS-DP method is an iterative regularization method (up to a subsequence).

Theorem 4 Let x^(k_j), j = 1, 2, ..., denote the subsequence defined in Proposition 2 and let x^δ = x^(k_j) denote the approximate solution determined by Algorithm 3 with noise level δ > 0, where we assume that A is of full column rank. Then

lim sup_{δ↘0} ‖x^δ − x̂‖_2 = 0,

where x̂ denotes the solution (3) of the error-free system.

Proof The result can be shown in the same manner as Theorem 3. We therefore omit the details.

Remark 2 In actual computations we have not experienced the need to take a subsequence of the iterates x^(k), k = 1, 2, ..., determined by Algorithm 3.

*Remark 3 This section shows results for the MM-GKS method of [20]. This reference*
also considers an approach to construct quadratic majorants, whose aperture depends
on x(k). The resulting scheme is referred to as the AMM-GKS in [20]. It requires more
work per iteration, but the number of iterations required may be smaller than for the
MM-GKS method applied in the present paper. The theory developed in this section
carries over to the AMM-GKS method without change.

*Remark 4 Recently, Huang et al. [21] used the approach of this subsection to determine*
the regularization parameter in an iterated Tikhonov regularization method in general
form with p = q = 2. Then the quadratic majorant Q, defined by (10), is independent
of k. Theorem 4 applies to the iterates generated by the method in [21].

Remark 5 This section focused on the use of the Euclidean norm for the fidelity term in (4). The results of this section carry over to fidelity terms with p ≠ 2, such as p = 1. Then Algorithms 1-3 have to be modified to implement the FMM-GKS or AMM-GKS methods of [20].

4 Numerical examples

We present some numerical examples. All computations are carried out in MATLAB 2016a with about 15 significant decimal digits running on a laptop computer with a quad core CPU Intel Core i7-6700HQ @2.60GHz and with 16 GB of RAM. We compare Algorithms 2 and 3 with the MM-GKS method described in [20], which requires a user to explicitly specify a value of the regularization parameter, and with the methods described in [2, 9]. We do not compare our approach with other methods from the literature for solving (4) since the main focus of this paper is to show that, using the discrepancy principle, we are able to determine a regularization parameter that provides accurate restorations, i.e., restorations that are of about the same quality

as restorations obtained with the optimal regularization parameter. For a thorough analysis of the performance of the ℓ2-ℓq approach and for a comparison with other methods from the literature we refer to [20, 24].

The error in the vector b^δ is in all examples white Gaussian noise. We fix ε = 1 in (7). This value is small compared to the elements of the vector b^δ, which in all examples represents a contaminated image. Each element is a pixel value in the range [0, 255]. Numerical experiments show that the choice of ε does not significantly affect the quality of the restorations, but can greatly affect the required number of iterations. This is illustrated below.

We would like to compute a sparse solution and therefore use a two-level framelet analysis operator as regularization operator L. Framelets are extensions of wavelets. They are defined as follows.

Definition 2 Let A ∈ R^{r×n} with n ≤ r. The set of the rows of A is a *tight frame* for R^n if for all x ∈ R^n it holds

‖x‖_2^2 = Σ_{j=1}^r (y_j^t x)²,

where y_j ∈ R^n is the j-th row of A (written as a column vector), i.e., A = [y_1, ..., y_r]^t. The matrix A is referred to as an *analysis operator* and A^t as a *synthesis operator*.
Tight frames have been used in many image restoration applications, including inpainting and deblurring [3, 5, 6]. The essential feature of tight frames is their redundancy. Due to the redundancy, the loss of some information can be tolerated. Usually images have a very sparse representation in the framelet domain.

We will use tight frames determined by linear B-splines. In one space-dimension these frames are made up of a low-pass filter W_0 and two high-pass filters W_1 and W_2. The corresponding masks are given by

w^(0) = (1/4)(1, 2, 1),   w^(1) = (√2/4)(1, 0, −1),   w^(2) = (1/4)(−1, 2, −1).

We now derive the analysis operator A from these masks. Imposing reflexive boundary conditions, so that A^t A = I, we obtain

W_0 = (1/4) ·
[ 3 1 0 ... 0
  1 2 1
    . . .
      1 2 1
  0 ... 0 1 3 ],

W_1 = (√2/4) ·
[ −1 1 0 ... 0
  −1 0 1
    . . .
      −1 0 1
  0 ... 0 −1 1 ],

W_2 = (1/4) ·
[ 1 −1 0 ... 0
  −1 2 −1
    . . .
      −1 2 −1
  0 ... 0 −1 1 ].

Stacking these matrices yields the one-dimensional analysis operator

A = [ W_0 ; W_1 ; W_2 ].

We define operators for two space-dimensions by using tensor products

W_{ij} = W_i ⊗ W_j,   i, j = 0, 1, 2.

This yields the analysis operator

A = [ W_{00} ; W_{01} ; ... ; W_{22} ].

The matrix W_{00} is a low-pass filter. All the other matrices W_{ij} contain at least one high-pass filter in some direction.
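The tight-frame property A^t A = I of the stacked operator can be verified numerically. The sketch below assembles W_0, W_1, W_2 with reflexive boundary rows, taking the last row of W_2 as (−1, 1)/4, which is what the tight-frame identity requires; the helper name `filter_matrix` and the size n = 8 are illustrative:

```python
import numpy as np

# Build the linear B-spline framelet filters with reflexive boundary
# conditions and verify that A = [W0; W1; W2] satisfies A^t A = I.
def filter_matrix(mask, first_row, last_row, n, scale):
    W = np.zeros((n, n))
    for i in range(1, n - 1):
        W[i, i - 1:i + 2] = mask          # interior rows carry the mask
    W[0, :2] = first_row                  # reflexive boundary rows
    W[-1, -2:] = last_row
    return scale * W

n = 8
W0 = filter_matrix([1, 2, 1], [3, 1], [1, 3], n, 1 / 4)
W1 = filter_matrix([-1, 0, 1], [-1, 1], [-1, 1], n, np.sqrt(2) / 4)
W2 = filter_matrix([-1, 2, -1], [1, -1], [-1, 1], n, 1 / 4)
Aop = np.vstack([W0, W1, W2])
print(np.allclose(Aop.T @ Aop, np.eye(n)))   # True: tight frame
```

The two-dimensional operator is obtained from these matrices with Kronecker products, e.g. `np.kron(W0, W1)` for W_{01}; the tight-frame identity is inherited by the tensor-product construction.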

While we have discussed the stopping criterion for the MM-GKS-MD method, we have not defined one for the MM-GKS-DP and MM-GKS methods. We will terminate the iterations with the latter two methods as soon as two consecutive iterates are sufficiently close, i.e., as soon as

‖x^(k) − x^(k+1)‖_2 / ‖x^(k)‖_2 ≤ 10^{−4}.

Inspired by the geometric sequence of regularization parameters used for iterated Tikhonov methods described in [16, 1], we use in the MM-GKS-MD method the following sequence of regularization parameters

µ^(k) = 0.7^k,   k = 1, 2, ... .

We also have to address the selection of the regularization parameter for the MM-GKS method. Our first computed example illustrates that the regularization parameter value determined by the MM-GKS-DP method is quite close to the optimal one, i.e., to the value that minimizes the restoration error defined below. We therefore will in our computed examples use the regularization parameter value determined by the MM-GKS-DP method for the MM-GKS method as well.

We measure the quality of the restored images with the relative restoration error
$$\mathrm{RRE}(x) = \frac{\|x - \widehat{x}\|_2}{\|\widehat{x}\|_2},$$
where $\widehat{x}$ denotes the desired solution (3).
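The error measure is straightforward to compute; a minimal sketch (function name ours):

```python
import numpy as np

def rre(x, x_true):
    """Relative restoration error: ||x - x_true||_2 / ||x_true||_2."""
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

x_true = np.array([1.0, 2.0, 2.0])        # ||x_true||_2 = 3
print(rre(x_true, x_true))                 # perfect restoration: 0.0
print(rre(1.1 * x_true, x_true))           # uniform 10% error: 0.1
```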

*Example 1 We consider the restoration of an image that has been contaminated by noise and spatially invariant blur; see, e.g., [18] for a discussion of this kind of restoration problems. The blurring phenomenon can be modeled with a Fredholm integral equation of the first kind
$$\mathcal{B}(k, f) = \iint k(s - x,\, t - y)\, f(s, t)\, ds\, dt = k * f,$$
which, due to the spatial invariance, reduces to a convolution. Discretization yields a linear system of equations with a severely ill-conditioned matrix, i.e., the singular values of the matrix decrease to zero with increasing index with no significant gap; see,


Fig. 2: Boat test image: (a) True image (124 × 124 pixels), (b) PSF (14 × 14 pixels), (c) blurred and noisy image ($\delta = 0.02\|b\|_2$).

e.g., [15, 17]. We impose anti-reflective boundary conditions to take into account that the available image is finite; see [26] for details.
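The blur model above can be sketched with an FFT-based convolution. For brevity the sketch uses periodic boundary conditions (the FFT's natural choice), whereas the experiments in the text impose anti-reflective ones; the function names are ours.

```python
import numpy as np

def blur(image, psf):
    """Blur `image` by circular convolution with `psf` via the FFT.
    Periodic boundary conditions only; the paper uses anti-reflective
    ones, which require a different (and larger) construction."""
    m, n = image.shape
    k, l = psf.shape
    p = np.zeros((m, n))
    p[:k, :l] = psf
    # shift the PSF center to (0, 0) so the blur does not translate the image
    p = np.roll(p, (-(k // 2), -(l // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(p)))

img = np.ones((16, 16))
psf = np.full((3, 3), 1.0 / 9.0)   # 3x3 average PSF; entries sum to 1
# a normalized PSF leaves a constant image unchanged
print(np.allclose(blur(img, psf), img))  # True
```

In the experiments, the error-contaminated data $b^\delta$ is then obtained by adding white Gaussian noise of prescribed norm $\delta$ to the blurred image.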

We first illustrate how the selection of ε and q affects the accuracy and convergence rate of the MM-GKS-DP method. Then we compare the proposed approaches with other methods from the literature. In particular, we compare with the MM-GKS method described in [20] and with methods proposed in [2, 9]. Specifically, we compare with the Approximated Iterated Tikhonov (AIT) method described in [9], the Approximated Iterated Tikhonov with General Penalty term (AIT-GP) method, the Approximated Projected Iterated Tikhonov (APIT) method, and the Approximated Projected Iterated Tikhonov with General Penalty term (APIT-GP) method described in [2].

We turn to the roles of q and ε in the MM-GKS-DP method. To gain some insight into how these parameters should be chosen, we apply the MM-GKS-DP method to a fairly small test problem. Specifically, we consider the Boat image blurred by an average PSF of size 4 × 4 pixels, and contaminated by white Gaussian noise such that $\delta = 0.02\|b\|_2$; see Figure 2. Figure 3 shows results obtained for different values of q and ε. We observe that for small values of ε, the method converges in fairly few iterations, but the computed restoration is not very accurate. The best restorations are achieved when ε = 1 and q is small (in this case $q = 10^{-1}$). We also observe that, fixing ε = 1, the error decreases as q decreases, but the number of iterations required by the method grows sharply as q decreases.

Figure 4 displays three restorations determined by the MM-GKS-DP method for three different choices of q and ε. Specifically, we show restorations obtained for $q = 10^{-1}$ and $\varepsilon = 10^{-3}$, for $q = 1$ and $\varepsilon = 1$, and for $q = 10^{-1}$ and $\varepsilon = 1$. Visual inspection of the images shows that the first parameter pair yields a restoration with some noise and ringing near the edges, the second parameter pair gives a restoration with less noise, but still some ringing, while the third parameter pair gives the best restoration; it appears noise-free and does not show significant ringing.

*Example 2 We would like to illustrate that the restorations obtained by using the smoothed functional $\mathcal{J}_\varepsilon$ defined by (8) are very close to the restoration determined by the functional $\mathcal{J}$ defined by (4). To this aim, we iteratively apply Algorithm 3 while progressively decreasing the value of ε. Let $\{\varepsilon_j\}_j$ denote a decreasing sequence such that $\varepsilon_j \to 0$ as $j \to \infty$.


Fig. 3: Boat test case: (a) RRE obtained with MM-GKS-DP for different values of q and ε, (b) Number of iterations required to reach convergence with MM-GKS-DP for different values of q and ε.


Fig. 4: Boat test image restorations obtained with MM-GKS-DP with different values of q and ε: (a) $q = 10^{-1}$ and $\varepsilon = 10^{-3}$, (b) $q = 1$ and $\varepsilon = 1$, (c) $q = 10^{-1}$ and $\varepsilon = 1$.

To speed up the computations, we use as initial vector the restoration provided by Algorithm 3 at the previous step, i.e., when $\varepsilon_{j-1}$ was used as smoothing parameter. The procedure is summarized in Algorithm 4.

Method                 RRE       Iterations
Restarted MM-GKS-DP    0.13274   256
MM-GKS-DP              0.13655   186

Table 1: Satellite test image: Relative restoration errors of the restorations obtained with the different methods and number of iterations required to reach convergence. The smallest error is in boldface.

*Algorithm 4 (The restarted MM-GKS-DP method) Let $0 < q \leq 2$. Consider $A \in \mathbb{R}^{m \times n}$ and $L \in \mathbb{R}^{\ell \times n}$ such that (5) holds. Let $\{\varepsilon_j\}_j$ denote a decreasing sequence such that $\varepsilon_j \to 0$ as $j \to \infty$ and fix $k_0 > 0$.*

Generate the initial subspace basis: $V_0 \in \mathbb{R}^{n \times k_0}$ such that $V_0^t V_0 = I$;
Compute and store $AV_0$ and $LV_0$;
Compute the QR factorizations $AV_0 = Q_A R_A$ and $LV_0 = Q_L R_L$;
$y^{(0)} = V_0^t x^{(0)}$;
for $j = 0, 1, \ldots$ do
  for $k = 0, 1, \ldots$ do
    $u^{(k)} = L V_k y^{(k)}$;
    $\omega^{(k)} = u^{(k)} \left(1 - \left(\dfrac{(u^{(k)})^2 + \varepsilon_j^2}{\varepsilon_j^2}\right)^{q/2-1}\right)$;
    $\eta^{(k)} = \mu^{(k)} \varepsilon_j^{q-2}$, where $\mu^{(k)}$ is such that $\|A V_k y^{(k+1)} - b^\delta\|_2 = \tau\delta$;
    $y^{(k+1)} = (R_A^t R_A + \eta^{(k)} R_L^t R_L)^{-1}(R_A^t Q_A^t b^\delta + \eta^{(k)} R_L^t Q_L^t \omega^{(k)})$;
    $r = A^t(A V_k y^{(k+1)} - b^\delta) + \eta^{(k)} L^t(L V_k y^{(k+1)} - \omega^{(k)})$;
    Reorthogonalize, if needed: $r = r - V_k V_k^t r$;
    $v_{\mathrm{new}} = r/\|r\|_2$; $V_{k+1} = [V_k, v_{\mathrm{new}}]$;
    Update the QR factorizations: $AV_{k+1} = Q_A R_A$ and $LV_{k+1} = Q_L R_L$;
    $x^{(k+1)} = V_{k+1} y^{(k+1)}$;
  end
end
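The weight update in the inner loop of Algorithm 4 is elementwise and easy to sketch. The snippet below (function name ours) implements the formula for $\omega^{(k)}$; note that for $q = 2$ the exponent vanishes and the weights are zero, which recovers the pure Tikhonov case, while for $|u| \gg \varepsilon$ and $q < 2$ the weight is close to $u$ itself.

```python
import numpy as np

def mm_weight(u, eps, q):
    """Elementwise majorization weight of the MM-GKS inner loop:
    omega = u * (1 - ((u^2 + eps^2) / eps^2)^(q/2 - 1))."""
    return u * (1.0 - ((u**2 + eps**2) / eps**2) ** (q / 2.0 - 1.0))

u = np.array([-2.0, 0.0, 0.5, 3.0])
print(mm_weight(u, eps=1e-1, q=2.0))   # q = 2: all weights are 0 (Tikhonov)
print(mm_weight(u, eps=1e-1, q=0.5))   # 0 < q < 1: omega close to u for |u| >> eps
```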

As in the previous algorithms, the residual vector r is defined by (15). The computations described by Algorithm 4 can be demanding both in terms of computing time and memory consumption. We therefore only apply the algorithm to the restoration of a fairly small image. In particular, we consider a 122 × 122-pixel satellite image, see Figure 5(a), blur it with a 13 × 13-pixel motion blur PSF, and add 3% white Gaussian noise. The contaminated image is shown in Figure 5(c).

We choose a logarithmically spaced sequence $\varepsilon_j$, $j = 1, 2, \ldots, 5$, between 1 and $10^{-6}$. Table 1 shows the restoration errors obtained and the number of iterations required to reach convergence by Algorithms 4 and 3. The table shows that the restorations determined by these algorithms are very close. It is not surprising that the restarted MM-GKS-DP method produces a more accurate restoration. However, the difference in quality of the restorations obtained by Algorithms 4 and 3 is insignificant. In particular, the small improvement in accuracy does not justify the additional computational effort required by Algorithm 4. In the following numerical examples we therefore do not consider the restarted MM-GKS-DP method.
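The restart loop over the $\varepsilon_j$ can be sketched as follows; `mm_gks_dp` is a hypothetical stand-in for a solver such as Algorithm 3, shown only to illustrate the warm-starting pattern.

```python
import numpy as np

# Five smoothing parameters, logarithmically spaced between 1 and 1e-6
# (np.logspace takes the endpoints as base-10 exponents).
eps_schedule = np.logspace(0, -6, num=5)   # approx. 1, 3.2e-2, 1e-3, 3.2e-5, 1e-6

def mm_gks_dp(x0, eps):
    """Hypothetical stand-in for Algorithm 3; a real implementation
    would return the MM-GKS-DP restoration for smoothing parameter eps."""
    return x0

x = np.zeros(10)                # initial guess x^(0)
for eps in eps_schedule:
    # warm-start each run with the restoration from the previous eps
    x = mm_gks_dp(x, eps)
```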


Fig. 5: Satellite test image: (a) True image (122 × 122 pixels), (b) PSF (13 × 13 pixels), (c) blurred and noisy image ($\delta = 0.03\|b\|_2$).


Fig. 6: Satellite test image restoration: (a) MM-GKS-DP, (b) Restarted MM-GKS-DP.


Fig. 7: Cameraman test image: (a) True image (238 × 238 pixels), (b) PSF (17 × 17 pixels), (c) blurred and noisy image ($\delta = 0.03\|b\|_2$).

*Example 3 We consider the cameraman image, blur it with a non-symmetric PSF, and add white Gaussian noise such that $\delta = 0.05\|b\|_2$; see Figure 7. Anti-reflective boundary conditions are imposed; see [26].

We first analyze how the choice of µ affects the quality of the restoration determined by the MM-GKS method, and compare it to restorations obtained by the MM-GKS-MD and MM-GKS-DP methods. We let q = 0.1 and run the MM-GKS method for different choices of µ. Figure 8(a) shows graphs of the RRE as a function of µ for the three methods. The graphs for the MM-GKS-MD and MM-GKS-DP methods are constant, since these methods are independent of the choice of µ. Visual inspection of the graph for the MM-GKS method shows, as one would expect, that the RRE obtained for the optimal µ-value is smaller than the RRE for the MM-GKS-MD and MM-GKS-DP methods. However, the optimal RRE-value is very close to the error obtained by the MM-GKS-DP method and not too far away from the error obtained by the MM-GKS-MD method. Moreover, if µ is chosen too large or too small, then the restoration error for the MM-GKS method can be very large. Since the optimal value of the regularization parameter µ for the MM-GKS method is close to the value determined by the MM-GKS-DP method, we will use the latter in the following computed examples with the MM-GKS method.

Starting from this observation, we define a new method, denoted by MM-GKS-R, which applies majorization-minimization in generalized Krylov subspaces repeatedly. This method combines the MM-GKS-DP and MM-GKS methods and is defined by Algorithm 5.

*Algorithm 5 (MM-GKS-R) Let $0 < q \leq 2$. Consider $A \in \mathbb{R}^{m \times n}$ and $L \in \mathbb{R}^{\ell \times n}$ such that (5) holds. Fix $\varepsilon > 0$ and $k_0 > 0$.*

Run MM-GKS-DP until convergence;
Denote by $\mu^*$ the last parameter used in MM-GKS-DP;
Run MM-GKS with $\mu = \mu^*$ until convergence;

This algorithm satisfies all the properties shown in [20], i.e., Theorem 1 holds. Moreover, it can be shown experimentally that the limit point of this algorithm approximately satisfies the discrepancy principle.

We next discuss the effect of varying q. Figure 8(b) shows the RRE obtained with the three methods when q is varied. We observe that the RRE obtained by both the MM-GKS-R and MM-GKS-DP methods increases with q, while the RRE obtained by the MM-GKS-MD method does not change much with q (and is larger than for the other two methods). While this comparison suggests that it is good to choose a very small q-value, it must be stressed that, if q is too small, then all three methods may become numerically unstable.

Let q = 0.1. Table 2 reports the RRE obtained with the different methods as well as the number of iterations required to reach convergence. Figure 9 shows some of the restorations obtained.

Table 2 shows the MM-GKS-R, MM-GKS-MD, and MM-GKS-DP methods to outperform the AIT, AIT-GP, APIT, and APIT-GP algorithms in terms of accuracy. However, the MM-GKS-type methods require more iterations to converge. Moreover, among the MM-GKS-type methods, the MM-GKS-MD method is the fastest, but the computed restorations are less accurate than those determined with the MM-GKS-R and MM-GKS-DP methods.

Visual inspection of the restorations in Figure 9 shows that the restorations obtained with AIT-type methods are more noisy than the ones determined with MM-GKS-type methods. MM-GKS-type methods are seen to be able to restore details of the image more accurately with less noise propagation.

*Example 4 We turn to the restoration of a Clock image, which has been contaminated by out-of-focus blur and white Gaussian noise such that $\delta = 0.01\|b\|_2$; see Figure 10. The PSF has a radius of 4 pixels. We impose anti-reflective boundary conditions and


Fig. 8: Cameraman test image: Behavior of the RRE as a function of µ and q. Figure (a) shows the RRE obtained for different µ-values and q = 0.1. The dashed curve displays the RRE obtained with MM-GKS-MD, the solid curve displays the RRE determined by MM-GKS-DP, and the dotted curve depicts the RRE obtained with MM-GKS. The first two curves are constant since these methods do not need an estimate of the parameter µ. Figure (b) displays the error obtained with the three methods when changing q. For the MM-GKS method, we used the µ-value determined in the last iteration of MM-GKS-DP, i.e., we applied the MM-GKS-R method. The dashed curve shows the RRE determined by MM-GKS-MD, the solid curve depicts the RRE obtained by MM-GKS-DP, and the dotted curve shows the RRE generated by MM-GKS.



Fig. 9: Cameraman test image restorations: (a) MM-GKS-R, (b) MM-GKS-MD, (c) MM-GKS-DP, (d) AIT-GP, (e) APIT, (f) APIT-GP.

Method       RRE       Iterations
MM-GKS-R     0.10287   97
MM-GKS-MD    0.12252   7
MM-GKS-DP    0.10228   100
AIT          0.13061   3
AIT-GP       0.12371   3
APIT         0.13025   3
APIT-GP      0.12346   3

Table 2: Cameraman test image: Relative restoration errors of the restorations obtained with the different methods and number of iterations required to reach convergence. The smallest error is in boldface.


Fig. 10: Clock test image: (a) True image (246 × 246 pixels), (b) PSF (9 × 9 pixels), (c) blurred and noisy image ($\delta = 0.01\|b\|_2$).

Method       RRE        Iterations
MM-GKS-R     0.032887   29
MM-GKS-MD    0.036663   5
MM-GKS-DP    0.032828   28
AIT          0.051979   5
AIT-GP       0.050436   5
APIT         0.051979   5
APIT-GP      0.050436   5

Table 3: Clock test image: Relative errors of the restorations determined by different methods and number of iterations required to reach convergence. The smallest error is in boldface.

set q = 0.7. Figure 11 displays some of the computed restorations and Table 3 reports the RRE and the number of iterations for the different methods.

As in the previous examples, the MM-GKS-type methods outperform the other methods, even though for this example the difference in performance is smaller than for the previous example. This is due to the fact that there is less noise-contamination than in the previous example. In terms of efficiency, we can see that the MM-GKS-MD method requires fewer iterations than the other two MM-GKS methods, but gives a less accurate restoration.



Fig. 11: Clock test image restorations: (a) MM-GKS-R, (b) MM-GKS-MD, (c) MM-GKS-DP, (d) AIT-GP, (e) APIT, (f) APIT-GP.

5 Conclusions

The image restoration methods described in [20, 24] require a user to provide a suitable value of the regularization parameter. This paper develops two approaches to determine such a value with the aid of the discrepancy principle. This enhances the usefulness of the methods. They can be applied without user-interaction to real-world applications for which an estimate of the norm of the error in bδ is available.

Acknowledgments

The authors would like to thank the referees for their insightful comments that improved the presentation. The first author is a member of the INdAM research group GNCS, and his work is partially funded by the group. The second author is supported in part by NSF grants DMS-1720259 and DMS-1729509.

References

1. M. Brill and E. Schock, Iterative solution of ill-posed problems – a survey, in Model Optimization in Exploration Geophysics, ed. A. Vogel, Vieweg, Braunschweig, 1987, pp. 17–37.

2. A. Buccini, Regularizing preconditioners by non-stationary iterated Tikhonov with general penalty term, Appl. Numer. Math., 116 (2017), pp. 64–81.

3. J.-F. Cai, R. H. Chan, L. Shen, and Z. Shen, Simultaneously inpainting in image and transformed domains, Numer. Math., 112 (2009), pp. 509–533.

4. J.-F. Cai, R. H. Chan, and Z. Shen, A framelet-based image inpainting algorithm, Appl. Comput. Harmonic Anal., 24 (2008), pp. 131–149.

5. J.-F. Cai, R. H. Chan, and Z. Shen, Linearized Bregman iterations for frame-based image deblurring, SIAM J. Imaging Sci., 2 (2009), pp. 226–252.

6. J.-F. Cai, S. Osher, and Z. Shen, Split Bregman methods and frame based image restora-tion, Multiscale Model. Simul., 8 (2009), pp. 337–369.

7. R. H. Chan and H. X. Liang, Half-quadratic algorithm for ℓp-ℓq problems with applications to TV-ℓ1 image restoration and compressive sensing, in Proceedings of Efficient Algorithms for Global Optimization Methods in Computer Vision, Lecture Notes in Comput. Sci. # 8293, Springer, Berlin, 2014, pp. 78–103.

8. J. W. Daniel, W. B. Gragg, L. Kaufman, and G. W. Stewart, Reorthogonalization and stable algorithms for updating the Gram-Schmidt QR factorization, Math. Comp., 30 (1976), pp. 772–795.

9. M. Donatelli and M. Hanke, Fast nonstationary preconditioned iterative methods for ill-posed problems with application to image deblurring, Inverse Problems, 29 (2013), 095008.

10. M. Donatelli, T. Huckle, M. Mazza, and D. Sesana, Image deblurring by sparsity constraint on the Fourier coefficients, Numer. Algorithms, 72 (2016), pp. 341–361.

11. H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

12. C. Estatico, S. Gratton, F. Lenti, and D. Titley-Peloquin, A conjugate gradient like method for p-norm minimization in functional spaces, Numer. Math., 137 (2017), pp. 895–922.

13. S. Gazzola and J. G. Nagy, Generalized Arnoldi–Tikhonov method for sparse reconstruction, SIAM J. Sci. Comput., 36 (2014), pp. B225–B247.

14. S. Gazzola, P. Novati, and M. R. Russo, On Krylov projection methods and Tikhonov regularization, Electron. Trans. Numer. Anal., 44 (2015), pp. 83–123.

15. M. Hanke and P. C. Hansen, Regularization methods for large-scale problems, Surv. Math. Ind., 3 (1993), pp. 253–315.

16. M. Hanke and C. W. Groetsch, Nonstationary iterated Tikhonov regularization, J. Optim. Theory Appl., 98 (1998), pp. 37–53.

17. P. C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems, SIAM, Philadelphia, 1998.

18. P. C. Hansen, J. G. Nagy, and D. P. O'Leary, Deblurring Images: Matrices, Spectra, and Filtering, SIAM, Philadelphia, 2006.

19. J.-P. Hiriart-Urruty and C. Lemaréchal, Fundamentals of Convex Analysis, Springer, New York, 2004.

20. G. Huang, A. Lanza, S. Morigi, L. Reichel, and F. Sgallari, Majorization-minimization generalized Krylov subspace methods for ℓp-ℓq optimization applied to image restoration, BIT Numer. Math., 57 (2017), pp. 351–378.

21. G. Huang, L. Reichel, and F. Yin, Projected nonstationary iterated Tikhonov regulariza-tion, BIT Numer. Math., 56 (2016), pp. 467–487.

22. J. Huang, M. Donatelli, and R. H. Chan, Nonstationary iterated thresholding algorithms for image deblurring, Inverse Problems and Imaging, 7 (2013), pp. 717–736.

23. J. Lampe, L. Reichel, and H. Voss, Large-scale Tikhonov regularization via reduction by orthogonal projection, Linear Algebra Appl., 436 (2012), pp. 2845–2865.

24. A. Lanza, S. Morigi, L. Reichel, and F. Sgallari, A generalized Krylov subspace method for ℓp-ℓq minimization, SIAM J. Sci. Comput., 37 (2015), pp. S30–S50.

25. P. Rodríguez and B. Wohlberg, Efficient minimization method for a generalized total variation functional, IEEE Trans. Image Process., 18 (2009), pp. 322–332.

26. S. Serra-Capizzano, A note on antireflective boundary conditions and fast deblurring mod-els, SIAM J. Sci. Comput., 25 (2004), pp. 1307–1325.

27. R. Wolke and H. Schwetlick, Iteratively reweighted least squares: algorithms, convergence analysis, and numerical comparisons, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 907–921.