
What is the clustering around 1 good for

Let A be a p.d. matrix and ε, 0 < ε < 1, be fixed.

Denote by $\lambda_j^\varepsilon$ the eigenvalues of A outside the interval $[1-\varepsilon, 1+\varepsilon]$ and by $r_\varepsilon$ the number of such eigenvalues. Set $S = [1-\varepsilon, 1+\varepsilon] \cup \{\lambda_j^\varepsilon\}$ and let $p_q$ be the polynomial

$$
p_q(\lambda) = \prod_{\lambda_j^\varepsilon} \left(1 - \frac{\lambda}{\lambda_j^\varepsilon}\right) \frac{T_{q-r_\varepsilon}\big((1-\lambda)/\varepsilon\big)}{T_{q-r_\varepsilon}(1/\varepsilon)}, \qquad q \ge r_\varepsilon,
$$

where $T_k(x)$ denotes the Chebyshev polynomial of degree k (with $a = 1-\varepsilon$, $b = 1+\varepsilon$ one has $(b+a-2\lambda)/(b-a) = (1-\lambda)/\varepsilon$ and $(b+a)/(b-a) = 1/\varepsilon$). Notice that S is a set containing all the eigenvalues of A, that $p_q$ has exactly degree q, and that $p_q(0) = 1$.
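As a sanity check, here is a minimal numerical sketch of $p_q$ (assuming NumPy; the outlier spectrum below is made up): it confirms that $p_q(0) = 1$ and that $p_q$ is uniformly small on the cluster $[1-\varepsilon, 1+\varepsilon]$.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

def T(k, x):
    # Chebyshev polynomial T_k evaluated at x (chebval also works for |x| > 1)
    c = np.zeros(k + 1); c[k] = 1.0
    return chebval(x, c)

def p_q(lam, outliers, eps, q):
    # the polynomial p_q defined above; 'outliers' are the lambda_j^eps
    r = len(outliers)
    factor = np.prod([1.0 - lam / lj for lj in outliers])
    return factor * T(q - r, (1.0 - lam) / eps) / T(q - r, 1.0 / eps)

eps = 0.1
outliers = [0.05, 0.5, 4.0]    # hypothetical eigenvalues outside [0.9, 1.1]
print(p_q(0.0, outliers, eps, 10))                        # 1.0 by construction
grid = np.linspace(1 - eps, 1 + eps, 2001)
print(max(abs(p_q(l, outliers, eps, 10)) for l in grid))  # small on the cluster
```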

Then one can say that if $x_q$ is the q-th vector generated by the CG method when solving $Ax = b$, then
$$
\|x - x_q\|_A \le \Big(\max_{\lambda\in S} |p_q(\lambda)|\Big)\, \|x - x_0\|_A. \tag{bound}
$$
This bound for $\|x - x_q\|_A$ allows a better evaluation of the CG rate of convergence with respect to the well known bound

$$
\|x - x_q\|_A \le 2 \left(\frac{\sqrt{\mu_2(A)} - 1}{\sqrt{\mu_2(A)} + 1}\right)^{\!q} \|x - x_0\|_A, \qquad \mu_2(A) = \frac{\max\lambda(A)}{\min\lambda(A)}, \tag{wkbound}
$$
in case it is known that most of (almost all) the eigenvalues of A are in some interval $[1-\varepsilon, 1+\varepsilon]$ where ε is small (almost zero).

If, moreover, the n×n linear system $Ax = b$ can be seen as one of a sequence of linear systems of increasing order, with the property that ∀ ε > 0 ∃ $k_\varepsilon$, $n_\varepsilon$ such that for all $n > n_\varepsilon$ no more than $k_\varepsilon$ eigenvalues of A fall outside $[1-\varepsilon, 1+\varepsilon]$, then (bound) allows one to prove the superlinear convergence of CG.

(Note that in general CG has a linear rate of convergence, as a consequence of (wkbound)).
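To see the superlinear effect, the following sketch (assuming NumPy; the clustered spectrum is artificial) runs CG on a matrix with all but four eigenvalues in $[0.9, 1.1]$ and compares the A-norm errors with the rate predicted by (wkbound):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# all eigenvalues in [0.9, 1.1] except four outliers
eigs = np.concatenate([rng.uniform(0.9, 1.1, n - 4), [0.01, 0.2, 5.0, 10.0]])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(eigs) @ Q.T                  # p.d. with prescribed spectrum
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)
mu2 = eigs.max() / eigs.min()
rate = (np.sqrt(mu2) - 1) / (np.sqrt(mu2) + 1)

xk = np.zeros(n); r = b.copy(); p = r.copy()
e0 = np.sqrt(x @ A @ x)                      # ||x - x_0||_A with x_0 = 0
for q in range(1, 16):
    Ap = A @ p
    a_cg = (r @ r) / (p @ Ap)
    xk += a_cg * p
    r_new = r - a_cg * Ap
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new
    e = x - xk
    print(q, np.sqrt(e @ A @ e), 2 * rate**q * e0)   # actual error vs (wkbound)
```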

Let us prove these assertions by evaluating $\max_{\lambda\in S} |p_q(\lambda)|$:
$$
\begin{aligned}
\max_{\lambda\in S} |p_q(\lambda)| &= \max_{\lambda\in[1-\varepsilon,1+\varepsilon]} |p_q(\lambda)| \\
&\le \Big(\max_{\lambda\in[1-\varepsilon,1+\varepsilon]} \prod_{\lambda_j^\varepsilon} \Big|1-\frac{\lambda}{\lambda_j^\varepsilon}\Big|\Big)\Big(\max_{\lambda\in[1-\varepsilon,1+\varepsilon]} \frac{|T_{q-r_\varepsilon}((1-\lambda)/\varepsilon)|}{T_{q-r_\varepsilon}(1/\varepsilon)}\Big) \\
&= \Big(\max_{\lambda\in[1-\varepsilon,1+\varepsilon]} \prod_{\lambda_j^\varepsilon} \Big|1-\frac{\lambda}{\lambda_j^\varepsilon}\Big|\Big)\, \frac{1}{T_{q-r_\varepsilon}(1/\varepsilon)},
\end{aligned}
$$
where the last equality holds because $(1-\lambda)/\varepsilon \in [-1,1]$ for $\lambda \in [1-\varepsilon,1+\varepsilon]$, and $|T_k(t)| \le 1$ on $[-1,1]$ with the value 1 attained. Now first notice that

$$
T_{q-r_\varepsilon}\!\left(\frac{1}{\varepsilon}\right) = T_{q-r_\varepsilon}\!\left(\frac{\frac{1+\varepsilon}{1-\varepsilon}+1}{\frac{1+\varepsilon}{1-\varepsilon}-1}\right) > \frac{1}{2}\left(\frac{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}+1}{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}-1}\right)^{\!q-r_\varepsilon}.
$$
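A quick numerical check of this inequality (a sketch assuming NumPy):

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval

eps, k = 0.1, 12                 # k plays the role of q - r_eps
c = np.zeros(k + 1); c[k] = 1.0
lhs = chebval(1.0 / eps, c)      # T_k(1/eps)
s = np.sqrt((1 + eps) / (1 - eps))
rhs = 0.5 * ((s + 1) / (s - 1)) ** k
print(lhs > rhs, lhs / rhs)      # True; the ratio tends to 1 from above as k grows
```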

Then denote by $\hat\lambda_j^\varepsilon$ those eigenvalues $\lambda_j^\varepsilon$ satisfying the inequalities $\lambda_j^\varepsilon < 1-\varepsilon$, $\lambda_j^\varepsilon < \frac{1}{2}(1+\varepsilon)$, and observe that
$$
\max_{\lambda\in[1-\varepsilon,1+\varepsilon]} \prod_{\lambda_j^\varepsilon} \Big|1-\frac{\lambda}{\lambda_j^\varepsilon}\Big| \;\le\; \max_{\lambda\in[1-\varepsilon,1+\varepsilon]} \prod_{\hat\lambda_j^\varepsilon} \Big|1-\frac{\lambda}{\hat\lambda_j^\varepsilon}\Big| \;=\; \prod_{\hat\lambda_j^\varepsilon} \left(\frac{1+\varepsilon}{\hat\lambda_j^\varepsilon}-1\right)
$$
(the factors corresponding to the remaining $\lambda_j^\varepsilon$ do not exceed 1 on $[1-\varepsilon,1+\varepsilon]$, so they can be dropped).


So, we have
$$
\begin{aligned}
\max_{\lambda\in S} |p_q(\lambda)| &\le \prod_{\hat\lambda_j^\varepsilon}\left(\frac{1+\varepsilon}{\hat\lambda_j^\varepsilon}-1\right) \cdot 2\left(\frac{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}-1}{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}+1}\right)^{\!q-r_\varepsilon} \\
&\le 2\left(\frac{1+\varepsilon}{\min\lambda(A)}-1\right)^{\#\hat\lambda_j^\varepsilon} \left(\frac{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}-1}{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}+1}\right)^{\!q-r_\varepsilon} \\
&\approx \left(\frac{1+\varepsilon}{\min\lambda(A)}-1\right)^{\#\hat\lambda_j^\varepsilon} \varepsilon\left(\frac{\varepsilon}{2}\right)^{\!q-r_\varepsilon-1},
\end{aligned}
$$

where in the latter approximation we have used the following Taylor expansion:
$$
f(\varepsilon) = \frac{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}-1}{\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}+1} = \frac{\varepsilon}{2} + \frac{\varepsilon^2}{2} f''(0) + \cdots.
$$
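In fact $f(\varepsilon) = \tanh\big(\tfrac12\,\operatorname{artanh}\varepsilon\big) = \varepsilon/2 + \varepsilon^3/8 + \cdots$, so the quadratic term vanishes. A quick numerical check (a sketch assuming NumPy):

```python
import numpy as np

for eps in [0.2, 0.1, 0.05, 0.01]:
    s = np.sqrt((1 + eps) / (1 - eps))
    f = (s - 1) / (s + 1)
    # ratio f/(eps/2) -> 1; cubic coefficient (f - eps/2)/eps^3 -> 1/8
    print(eps, f / (eps / 2), (f - eps / 2) / eps**3)
```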

A doubt about G. Strang

If $t^2 < 1$ then $\hat A x_i = \lambda_i A x_i$, $\lambda_i > 0$, for n linearly independent vectors $x_i$. Thus, if X denotes the matrix whose columns are the $x_i$, then $\hat A X = AXD$, $D = \operatorname{diag}(\lambda_i)$ ⇒ $\det(A) \ne 0$ ⇒ $A^{-1}\hat A x_i = \lambda_i x_i$ ⇒ the $\lambda_i$ are the eigenvalues of $A^{-1}\hat A$.

Since $(A^{-1}\hat A)^{-1} = \hat A^{-1}A = L^{-T}L^{-1}A = L^{-T}(L^{-1}AL^{-T})L^T$, with $\hat A = LL^T$, the eigenvalues of $L^{-1}AL^{-T}$ are $1/\lambda_i > 0$

⇒ $L^{-1}AL^{-T}$ is p.d. ⇒ A is p.d. ⇒ $A^{-1}\hat A$ and $E^{-1}\hat A E^{-T}$, $EE^T = A$, have the same eigenvalues.

But why are the $x_i$ independent?

On iterative methods $(H, u_k)$

Let $r_k = b - Ax_k$. With $H = A^TA$, the scheme
$$
x_{k+1} = x_k + \frac{r_k^T A u_k}{\|A u_k\|_2^2}\, u_k
$$
becomes Richardson for $u_k = r_k$, and a method which converges in one step if $u_k = A^{-1}r_k = x - x_k$. Choose $u_k = L^{-1}A\, r_k$?

For the method
$$
x_{k+1} = x_k + \frac{(x - x_k)^T H u}{u^T H u}\, u
$$
we know that $|(x - x_k)^T H u|/\sqrt{u^T H u} \to 0$ or, in case the u are vectors of the canonical basis, $|(x - x_k)^T H u| \to 0$. So, $x - x_k$ tends to be orthogonal to u.

Assume the $s_k \in \{1, 2, \ldots, n\}$ are chosen so that for each i we have $s_k = i$ for an infinite number of k. Set $u = e_{s_k}$. Then $x - x_k$ tends to be orthogonal to all vectors of the canonical basis, and thus tends to be the null vector. It follows that Gauss-Seidel and Southwell are not the only laws leading to the convergence of the scheme with $u = e_{s_k}$. Some of these remarks are not correct, why?
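For what it's worth, here is a small experiment with this scheme (a sketch assuming NumPy; the test matrix and the cyclic law for $s_k$ are made up), taking $H = A^TA$ and $u = e_{s_k}$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n)) + n * np.eye(n)  # nonsingular test matrix
x = rng.standard_normal(n)
b = A @ x

xk = np.zeros(n)
for k in range(300):
    u = np.zeros(n); u[k % n] = 1.0              # s_k cycles through 1..n
    Au = A @ u
    r = b - A @ xk                               # (x - x_k)^T H u = r_k^T A u for H = A^T A
    xk += (r @ Au) / (Au @ Au) * u
print(np.linalg.norm(x - xk))                    # ~ 0: x_k converges to x
```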

'$\Re(\lambda(A)) > 0$' vs 'p.d. of the hermitian part of A'

Let A be an n × n matrix with complex entries.

Set $A_h = \frac{1}{2}(A + A^*)$ (hermitian part of A) and $A_{ah} = \frac{1}{2}(A - A^*)$ (anti-hermitian part of A). Obviously $A = A_h + A_{ah}$.

If $z^* A_h z > 0$ for all non-null complex vectors z or, equivalently, if $A_h$ is p.d., then $\Re(\lambda(A)) > 0$.


Indeed, there exists x, $x^*x = 1$, such that $\lambda(A) = x^* A x$. Thus
$$
\Re(\lambda(A)) = \Re(x^* A_h x + x^* A_{ah} x) = x^* A_h x > 0
$$
($\overline{x^* A_{ah} x} = (x^* A_{ah} x)^* = x^* A_{ah}^* x = -x^* A_{ah} x$, so $x^* A_{ah} x$ is a complex number with null real part).

If $\Re(\lambda(A)) > 0$ and A is normal, then $z^* A_h z > 0$ for all $z \ne 0$.

Indeed, let $x_k$ be eigenvectors of A ($Ax_k = \lambda_k x_k$) such that $x_i^* x_j$ is 1 if $i = j$ and 0 otherwise. Then, for $z = \sum \alpha_i x_i \ne 0$,
$$
z^* A_h z = \Re(z^* A z) = \Re\Big(\big(\textstyle\sum \alpha_i x_i\big)^* A \big(\textstyle\sum \alpha_j x_j\big)\Big) = \Re\Big(\textstyle\sum |\alpha_k|^2 \lambda_k\Big) = \textstyle\sum |\alpha_k|^2\, \Re(\lambda_k) > 0.
$$

The hypothesis $\Re(\lambda(A)) > 0$ alone is not sufficient to assure that $z^* A_h z > 0$ for all $z \ne 0$. For instance, if
$$
A = \begin{pmatrix} 1 & a \\ 0 & 1 \end{pmatrix}, \qquad a \in \mathbb{R},\; a^2 - 4 \ge 0,
$$
then
$$
[x\;\, y]\, A \begin{pmatrix} x \\ y \end{pmatrix} = [x \;\; xa + y] \begin{pmatrix} x \\ y \end{pmatrix} = x^2 + axy + y^2
$$
is not positive for all $x, y \in \mathbb{R}$, $[x\; y] \ne [0\; 0]$. (We have used the fact that if $z \in \mathbb{R}^n$ and $A \in \mathbb{R}^{n\times n}$, then $z^T A_h z = z^T A z$.)
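Numerically, with $a = 3$ (a sketch assuming NumPy):

```python
import numpy as np

a = 3.0                                # a^2 - 4 >= 0
A = np.array([[1.0, a], [0.0, 1.0]])
Ah = 0.5 * (A + A.T)                   # hermitian (here: symmetric) part
print(np.linalg.eigvals(A))            # [1, 1]: Re(lambda(A)) > 0
print(np.linalg.eigvalsh(Ah))          # [-0.5, 2.5]: A_h is not p.d.
z = np.array([1.0, -1.0])
print(z @ A @ z)                       # x^2 + a*x*y + y^2 = -1 < 0
```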

On the rate of convergence of some iterative methods in cases where the coefficient matrix has a p.d. hermitian part

For some linear systems iterative solvers (f.i. GMRES), it is known that the residual at step k, $r_k = b - Ax_k$, satisfies the inequality
$$
\|r_k\|_2 \le \left(1 - \frac{\min\lambda(A_h)^2}{\max\lambda(A^*A)}\right)^{\!k/2} \|r_0\|_2 \tag{1}
$$
(I'm not sure if the norm is the 2-norm) provided that the hermitian part $A_h$ of A is p.d. (This assertion is surely true in case of systems $Ax = b$ with A and b real, see [Saad book].)

Remark. If A is also hermitian or, equivalently, if A is p.d., then the number in the parentheses becomes $1 - 1/\mu_2(A)^2$.

Let us observe that
$$
A_h \ \text{p.d.} \;\Rightarrow\; 0 < \frac{\min\lambda(A_h)^2}{\max\lambda(A^*A)} \le 1,
$$
where the equality (on the right) holds iff $A = \alpha I$, $\alpha > 0$.
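A quick check of this observation (a sketch assuming NumPy; the diagonal shift that makes $A_h$ p.d. is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
A = rng.standard_normal((n, n)) + n * np.eye(n)  # shifted so that A_h is p.d.
Ah = 0.5 * (A + A.T)
ratio = np.linalg.eigvalsh(Ah)[0] ** 2 / np.linalg.eigvalsh(A.T @ A)[-1]
print(0 < ratio <= 1)                            # True

alpha = 2.0
A = alpha * np.eye(n)                            # the equality case A = alpha*I
ratio = np.linalg.eigvalsh(0.5 * (A + A.T))[0] ** 2 / np.linalg.eigvalsh(A.T @ A)[-1]
print(ratio)                                     # 1.0
```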

Lemma. If $A_h$ is p.d. then $\Re(\lambda(A)) \ge \min\lambda(A_h)$.

Proof. There exists a vector x, $\|x\|_2 = 1$, such that $\Re(\lambda(A)) = x^* A_h x$. The thesis follows from the minmax eigenvalue representation theory for hermitian matrices.

By the Lemma, we have
$$
\min\lambda(A_h) \le \Re(\lambda(A)) \le \sqrt{\Re(\lambda(A))^2 + \Im(\lambda(A))^2} = |\lambda(A)| \le \rho(A) \le \|A\|_2 = \sqrt{\max\lambda(A^*A)}.
$$


Note, more precisely, that the equality $\Re(\lambda(A)) = \rho(A)$ holds for all λ(A) iff the eigenvalues λ(A) are all equal to some α, α > 0, iff there exists T unitary such that
$$
T^{-1}AT = \begin{pmatrix} \alpha & * & \cdots & * \\ & \alpha & \ddots & \vdots \\ & & \ddots & * \\ & & & \alpha \end{pmatrix}.
$$
If at least one of the entries ∗ in the upper triangular part of this matrix is nonzero, then $\|A\|_2 > \alpha = \rho(A)$. It follows that $\min\lambda(A_h) < \sqrt{\max\lambda(A^*A)}$ unless $A = \alpha I$, $\alpha > 0$.

Let A be a nonsingular real n × n matrix. We know that the condition $\Re(\lambda(A)) > 0$ (which is verified f.i. if the hermitian part of A is p.d.) is necessary and sufficient for the existence of $\hat\omega > 0$ such that for all $\omega \in (0, \hat\omega)$ the Richardson iterative scheme
$$
x_0 \in \mathbb{R}^n, \qquad x_{k+1} = x_k + \omega(b - Ax_k), \quad k = 0, 1, \ldots \tag{R(\omega)}
$$
converges to the solution $x = A^{-1}b$ of the linear system $Az = b$.

Let $\omega_{\rm opt}$ be a (possibly not unique) value of ω for which the rate of convergence is maximal, i.e. $\rho(I - \omega_{\rm opt} A) = \min_\omega \rho(I - \omega A)$.

Now we prove the following assertion:

If A is real, normal, and its eigenvalues are of the form $\alpha + i\beta_j$, $j = 1, \ldots, n$, $\alpha > 0$, with $\beta_j$ zero for at least one j (these hypotheses imply that the hermitian part of A is p.d.), then the error at step k of R($\omega_{\rm opt}$) satisfies the inequality

$$
\|x - x_k\|_2 \le \left(1 - \frac{1}{\mu_2(A)^2}\right)^{\!k/2} \|x - x_0\|_2 \tag{2}
$$

where $\mu_2(A) = \sqrt{\alpha^2 + \max\beta_j^2}\,/\,\alpha$ is the condition number of A (compare with (1), where A is p.d.).

So, a small $\mu_2(A)$ is a sufficient condition for fast convergence also in cases where A is not p.d.

Example. $A = B + \alpha I$, B real, $B^T = -B$, $\det(B) = 0$, $\alpha > 0$. For instance
$$
B = \begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 1 \\ 0 & -1 & 0 \end{pmatrix}.
$$

Proof. First recall that if M is a normal matrix then $\rho(M) = \|M\|_2$. Observe that $\rho(I - \omega A)^2 = \max_k g_k(\omega)$, where $g_k(\omega) = \omega^2(\alpha^2 + \beta_k^2) - 2\omega\alpha + 1$.

Since $g_k'(\omega) = 2\omega(\alpha^2 + \beta_k^2) - 2\alpha$, we have $g_k'\big(\frac{\alpha}{\alpha^2 + \beta_k^2}\big) = 0$ and $g_k'(0) = -2\alpha$, thus $g_k'(0)$ is constant with respect to k and negative. It follows that R(ω) converges iff $\omega \in \big(0,\, 2\alpha/(\alpha^2 + \max\beta_k^2)\big)$. Moreover,
$$
\omega_{\rm opt} = \frac{\alpha}{\alpha^2 + \max\beta_k^2}, \qquad \rho(I - \omega_{\rm opt} A)^2 = 1 - \frac{\alpha^2}{\alpha^2 + \max\beta_k^2} = 1 - \frac{1}{\mu_2(A)^2},
$$
$$
\|x - x_k\|_2 \le \|(I - \omega_{\rm opt} A)^k\|_2\, \|x - x_0\|_2 = \rho(I - \omega_{\rm opt} A)^k\, \|x - x_0\|_2,
$$
which imply the thesis.
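The following sketch (assuming NumPy) runs R($\omega_{\rm opt}$) on the example above with $\alpha = 1$: the eigenvalues of A are $1$ and $1 \pm i\sqrt{2}$, so $\omega_{\rm opt} = 1/3$, $\mu_2(A) = \sqrt{3}$, and the computed errors indeed stay below the bound (2).

```python
import numpy as np

alpha = 1.0
B = np.array([[0., 1., 0.], [-1., 0., 1.], [0., -1., 0.]])  # B^T = -B, det(B) = 0
A = B + alpha * np.eye(3)
beta2_max = np.max(np.linalg.eigvals(A).imag ** 2)          # = 2
w_opt = alpha / (alpha ** 2 + beta2_max)                    # = 1/3
mu2 = np.sqrt(alpha ** 2 + beta2_max) / alpha               # = sqrt(3)

x = np.array([1., 2., 3.])
b = A @ x
xk = np.zeros(3)
for k in range(1, 31):
    xk = xk + w_opt * (b - A @ xk)                          # Richardson R(w_opt)
    err = np.linalg.norm(x - xk)
    bound = (1 - 1 / mu2 ** 2) ** (k / 2) * np.linalg.norm(x)
    assert err <= bound * (1 + 1e-12)                       # inequality (2) holds
print(err, bound)
```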
