(d k conjugate to d k−1 ). Here below is the algorithm we obtain:

(1)

CG method

Assume A, the coefficient matrix of our system Ax = b, positive definite (recall that A and b are real). In the general scheme choose H = A, d 0 = r 0 = b − Ax 0 , d k = r k + β k−1 d _k−1 , k = 1, 2, . . ., (r k = b − Ax k ) where β k−1 is such that

(d k , d k−1 ) A = 0

(d k conjugate to d k−1 ). Here below is the algorithm we obtain:

x ₀ ∈ R ⁿ , r 0 = b − Ax 0 , d 0 = r 0 . For k = 0, 1, . . . , {

τ k = ^d

T k

r

_k

d

^T_k

Ad

k

x _k+1 = x k + τ k d _k

r _k+1 = b − Ax ^k+1 = r k − τ ^k Ad k

β k = − ^r

T k+1

Ad

k

d

^T

k

Ad

k

d _k+1 = r k+1 + β k d _k }

known as Conjugate Gradient (CG) algorithm.

Remarks. Note that

0 = (x − x k+1 ) ^T Hd k = (x − x k+1 ) ^T Ad k = r ^T _k+1 d _k , r ^T _k+1 d _k+1 = kr k+1 k ² 2 . As a consequence, if at step s we have r s = b − Ax ^s 6= 0, then d ^s 6= 0, τ ^s is well defined and not zero . . .: the algorithm works.

If r 0 , . . ., r m−1 are non null and r m = 0, then β m−1 = 0, d m = 0, τ m cannot be defined, but it doen’t matter since x m = A ⁻¹ b . This hypothesis is effectively verified, in fact there exists m ≤ n = the order of A, such that r ^m = 0 (see below).

Alternative expressions for τ k and β k hold:

τ k = d ^T

k r _k d ^T

k Ad k

= r ^T

k r _k d ^T

k Ad k

(d k = r k + β k−1 d _k−1 , r ^T _k d _k−1 = 0), β k = − ^r

T k+1

Ad

k

d

^T

k

Ad

k

= − ^r

T

k+1

τ

k⁻¹

(r

k

−r

k+1

) d

^T

k

τ

k⁻¹

(r

k

−r

k+1

)

= ^−r

T

k+1

r

_k

+r

^T_k+1

r

_k+1

d

^T

k

r

_k

= ^r

T k+1

r

_k+1

r

^T

k

r

_k

. The latter identity uses the result

r ^T _k+1 r _k = 0

(residual at step k + 1 is orthogonal to residual at step k, exactly as in the Gradient method) which is not obvious:

r ^T

k+1 r _k = r ^T _k+1 (d k − β k−1 d _k−1 ) = −β k−1 r ^T

k+1 d _k−1

= −β k−1 (r k − τ k Ad k ) ^T d _k−1 = β k−1 τ k d ^T

k Ad k−1 = 0.

First main result: If r 0 , r 1 , . . . , r p are non null, then

d ^T _l Ad j = 0, r ^T _l r _j = 0, 0 ≤ j < l ≤ p.

(2)

That is, each new residual (search direction) is orthogonal (conjugate) to all previous residuals (search directions). As a consequence, the residual r m must be null for some m ≤ n, or, equivalently, CG finds the solution of Ax = b in at most n steps.

Proof. . . .

A useful representation of the residuals. If r 0 , r 1 , . . . , r k−1 are non null, then there exist polynomials s k (λ), q k (λ) such that

r _k = s k (A)r 0 , d _k = q k (A)r 0 ,

s k (λ) = (−1) ^k τ 0 τ 1 · · · τ k−1 λ ^k + . . . + 1, τ 0 τ 1 · · · τ k−1 6= 0.

Proof (by induction). The equality r 0 = s 0 (A)r 0 holds if s 0 (λ) = 1; d 0 = q 0 (A)r 0 holds if q 0 (λ) = 1. Moreover,

r _k+1 = r k − τ k Ad k = s k (A)r 0 − τ k Aq k (A)r 0 = s k+1 (A)r 0

if s k+1 (λ) = s k (λ) − τ k λq k (λ), and

d _k+1 = r k+1 + β k d _k = s k+1 (A)r 0 + β k q k (A)r 0 = q k+1 (A)r 0

if q k+1 (λ) = s k+1 (λ) + β k q k (λ). Finally, since

s k+1 (λ) = s k (λ) − τ ^k λ(s k (λ) + β k−1 q k−1 (λ)),

the coefficient of λ ^k+1 in s k+1 (λ) is −τ ^k times the coefficient of λ ^k in s k (λ).

Thus, by the inductive assumption, it must be (−1) ^k+1 τ 0 τ 1 · · · τ k−1 τ k . Also, the coefficient of λ ⁰ in s k+1 (λ) is equal to the coefficient of λ ⁰ in s k (λ), which is 1 by the inductive assumption.

Second main result: r k = 0 for some k ≤ #{distinct eigenvalues of A}.

Proof. Let µ 1 , µ 2 , . . . , µ m be the distinct eigenvalues of A (m ≤ n = order of A). Assume that CG requires more than m steps to converge. So, the vectors r ₀ , r 1 , . . . , r m are non null, and, by the First main result, orthogonal (⇒ linearly independent). Let V be an orthonormal matrix whose columns are eigenvectors of A, thus V ^T = V ⁻¹ and AV = V D for D diagonal with the eigenvalues of A as diagonal entries. Observe that there is a degree-m polynomial which is null in A,

m

Y

j=1

(A−µ j I) =

m

Y

j=1

(V DV ^T −µ j I) =

m

Y

j=1

V (D−µ j I)V ^T == V

m

Y

j=1

(D−µ j I)V ^T = 0.

As a consequence the matrices A ⁰ = I, A, . . ., A ^m are linearly dependent. But this implies that the dimension of the space

S m+1 (r 0 ) = Span {r ⁰ , Ar 0 , . . . , A ^m r ₀ } = Span {r ⁰ , r 1 , . . . , r m }

is smaller than m + 1, which is absurd. It follows that one of the vectors r i , i = 0, . . . , m, must be null.

Let Π ¹ _k be the set of all polynomials of degree exactly k whose graphic

pass through (0, 1). We now see that the polynomial s k (λ) in the expression

r _k = s k (A)r 0 is a very particular polynomial in the class Π ¹ _k : it makes the norm

(3)

of the vector p k (A)r 0 , p k ∈ Π ¹ k , minimum (for a suitable choice of the norm).

This result let us give estimates of the rate of convergence of CG, as precise as good is the knowledge about the location of the eigenvalues of A. For example, if it is known that the eigenvalues of A cluster around 1, then CG must converge with a superlinear rate of convergence (see toe 1a).

Notice that r k = s k (A)r 0 = r 0 + ˆ h _k , for a particular vector ˆ h _k in the space M = Span {Ar ⁰ , A ² r ₀ , . . . , A ^k r ₀ }. Take a generic vector h ^k in this space. Then

kr 0 + h k k ² A

⁻¹

= kr 0 + ˆ h _k + h k − ˆ h _k k ² A

⁻¹

= kr ⁰ + ˆ h _k k ² A

⁻¹

+ kh ^k − ˆh ^k k ² A

⁻¹

+ 2(r 0 + ˆ h _k , h k − ˆ h _k ) A

⁻¹

. Now observe that the latter inner product is null, in fact, for j = 0, . . . , k − 1, 0 = r ^T _k r _j = r ^T _k A ⁻¹ Ar j = (r k , Ar j ) A

⁻¹

, that is, r k is A ⁻¹ -orthogonal to the space Span {Ar ⁰ , Ar 1 , . . . , Ar k−1 }, but this space is exactly M. The thesis follows since h k − ˆ h _k ∈ M. So we have:

kr 0 + h k k ² A

⁻¹

= kr 0 + ˆ h _k k ² A

⁻¹

+ kh k − ˆh k k ² A

⁻¹

≥ kr 0 + ˆ h _k k ² A

⁻¹

. In other words,

kr ^k k ² A

⁻¹

= kr ⁰ + ˆ h _k k ² A

⁻¹

= min{kr ⁰ + h k k ² A

⁻¹

: h k ∈ M}

= min{kp ^k (A)r 0 k ² A

⁻¹

: p k ∈ Π ¹ k }. (m) Comparison with GMRES. Notice that for any h k ∈ M we have

r ₀ + h k = b − A(x ⁰ + z), z = −A ⁻¹ h _k ∈ S ^k (r 0 ) = Span {r ⁰ , Ar 0 , . . . , A ^k−1 r ₀ }.

Thus, the vector x k generated by the CG method is of type x 0 + ˆ z where ˆ z solves the problem

kb − A(x ⁰ + ˆ z )k A

⁻¹

= min{kb − A(x ⁰ + z)k A

⁻¹

: z ∈ S ^k (r 0 )} (p) (S k (r 0 ) is known as Krilov space). GMRES is a method able to solve Ax = b in at most n steps under the only assumption det(A) 6= 0. (Like CG, GMRES in order to be competitive must be used as an iterative method, i.e. less than n steps must be sufficient to give a good approximation of x). In the k-th step of GMRES it is defined a vector x k of type x 0 + ˆ z where ˆ z solves exactly the problem (p) but the norm involved is the euclidean one. So, CG is a minimal residual algorithm different from GMRES| A pd .

It is easy to see that the condition (m) can be rewritten as follows:

kx − x k k ² A = min

p

k

∈Π

¹k

kp k (A)(x − x 0 )k ² A .

Now we give a bound for the quantity kp ^k (A)(x − x ⁰ )k ² A , p k ∈ Π ¹ k , which can be evaluated if (besides A, b) also some information about the location of the eigenvalues λ i of A is given. Let v i 6= 0 be such that Av ⁱ = λ i v _i , v ^T _i v _j = δ ij . Then

kp ^k (A)(x − x ⁰ )k ² A = (x − x ⁰ ) ^T Ap k (A) ² (x − x ⁰ ) = (P α i v _i ) ^T P α i Ap k (A) ² v _i

= (P α i v _i ) ^T P α i Ap k (λ i ) ² v _i = (P α i v _i ) ^T P α i λ i p k (λ i ) ² v _i

= P α ² _i λ i p k (λ i ) ² ≤ max ⁱ |p ^k (λ i )| ² kx − x ⁰ k ² A .

(4)

So, we obtain the following

Third main result: If x k is the k-th vector generated by CG when applied to solve the pd linear system Ax = b, then

kx − x ^k k ² A = min

p

k

∈Π

¹k

kp ^k (A)(x − x ⁰ )k ² A ≤ max _i |p ^k (λ i )| ² kx − x ⁰ k ² A , ∀ p ^k ∈ Π ¹ k .

So, if S ⊂ R, p ^k ∈ Π ¹ k , M k ∈ R are known such that λ ⁱ ∈ S ∀ i and |p ^k (λ)| ≤ M ^k

∀ λ ∈ S, then kx − x ^k k ^A ≤ M ^k kx − x ⁰ k ^A .

Let us see two applications of the latter result. As consequences of the first application we observe that CG (considered as an iterative method) has a linear rate of convergence, is in general faster than G, and is competitive (f.i. with direct methods) if λ max and λ min are comparable. However, as a consequence of the second application, the latter condition is not necessary: the rate of convergence of CG remains high (so, CG remains competitive) if most of the eigenvalues are in [λ min , ˆ λ] with λ min and ˆ λ comparable. Further useful applications of the Third main result hold. In particular, as a consequence of one of these (see toe 1a), it can be stated that CG has a superlinear rate of convergence if most of the eigenvalues of A are in the interval S = [1 − ε, 1 + ε]

(. . .).

(1)

S = [λ min , λ max ], p k (x) = T k

λ

max

+λ

min

−2x λ

max

−λ

min

T k

λ

max

+λ

min

λ

max

−λ

min

⇒

kx − x ^k k ^A < 2

p µ 2 (A) − 1 pµ 2 (A) + 1

! k

kx − x ⁰ k ^A , µ 2 (A) = λ max

λ min . (2)

S = [λ min , ˆ λ] ∪ {λ i : λ i > ˆ λ}, r ˆ λ = #{i : λ i > ˆ λ},

p k (x) = Π _{i: λ}

_i

_>ˆ _λ

1 − x

λ i

T k−r

λˆ

_ˆ

λ+λ

min

−2x λ−λ ˆ

min

T k−r

λˆ

_ˆ

λ+λ

min

λ−λ ˆ

min

⇒

kx − x k k A < 2





q λ/λ ˆ min − 1 q λ/λ ˆ min + 1





k−r

ˆλ

kx − x 0 k A , k ≥ r λ ˆ .

The applications (1) and (2) of the Third main result suggest an idea. When λ min and λ max are not comparable and the eigenvalues of A are uniformly dis- tributed in the interval [λ min , λ max ] (in this case all n steps of CG are required in order to give a good approximation of x), replace the given system Ax = b with an equivalent system ˜ A˜ x = ˜ b , ˜ A = E ⁻¹ AE ^−T , ˜ x = E ^T x , ˜ b = E ⁻¹ b , det(E) 6= 0, where the matrix E is such that µ 2 ( ˜ A) < µ 2 (A) and has one of the following properties

• µ 2 ( ˜ A) << µ 2 (A)

• ˜ A has much less distinct eigenvalues than A

(5)

• ˜ A has the eigenvalues much more clustered (around 1) than A Then apply CG to ˜ A˜ x = ˜ b .

If such matrix E can be found, then the pd matrix P = EE ^T is said preconditioner.

Note that E ^−T AE ˜ ^T = P ⁻¹ A, so one could look directly for a pd matrix P such that the (real positive) eigenvalues of P ⁻¹ A have the required properties.

For example, in order to obtain something of type P ⁻¹ A ≈ I (which would result in a very high increase of the CG rate of convergence) one could choose P as an approximation A of A. We shall see that applying CG to ˜ A˜ x = ˜ b requires, for each step, a surplus of computation: solve a system of type P z = h k . This computation must not make CG slow, in other words P must be a lower complexiy matrix than A. Also notice that E 1 and E 2 , E 1 6= E ² , E 1 E ₁ ^T = E 2 E ₂ ^T , define matrices ˜ A 1 = E ₁ ⁻¹ AE ₁ ^−T and ˜ A 2 = E ₂ ⁻¹ AE ₂ ^−T , ˜ A 1 6= ˜ A 2 , with the same spectrum. For this reason one prefers to call preconditioner P instead of E.

A final remark. The vector x = A ⁻¹ b we are looking for is also the minimum point of the function F (z) = ¹ ₂ z ^T Az − z ^T b . Analogously, ˜ x = ˜ A ⁻¹ b ˜ is the minimum point of the function ˜ F (z) = ¹ ₂ z ^T Az − z ˜ ^T b ˜ . The preconditioning technique replaces the (sections of the) contours of F with the more spherical (sections of the) contours of ˜ F , and this results in a more efficient minimization when using gradient-type methods.

Let us write the preconditioned version of the CG algorithm, well defined once that A, b and the preconditioner P are given.

Let us apply CG to the system ˜ A˜ x = ˜ b :

˜

x ₀ ∈ R ⁿ , ˜r 0 = ˜ b − ˜ A˜ x ₀ , ˜ d ₀ = ˜r 0 . For k = 0, 1, . . . , {

˜ τ k = ^˜ ^r

T k

˜ r

_k

d ˜

^T_k

A˜ ˜ d

k

˜

x _k+1 = ˜ x _k + ˜ τ k d ˜ _k

˜r k+1 = ˜ b − ˜ A˜ x _k+1 = ˜r k − ˜τ ^k A˜ ˜ d _k β ˜ k = ^˜ ^r

T k+1

˜ r

_k+1

˜ r

^T

k

˜ r

_k

d ˜ _k+1 = ˜r k+1 + ˜ β k d ˜ _k }

Note that the convergence rate of the sequence {˜x ^k } can be evaluated by using the following results

k˜x − ˜x k k A ^˜ < 2



 q

µ 2 ( ˜ A) − 1 q

µ 2 ( ˜ A) + 1





k

k˜x − ˜x 0 k A ^˜ , µ 2 ( ˜ A) = λ ˜ max

λ ˜ min

,

k˜x − ˜x k k A ^˜ < 2





q˜ˆλ/˜λ min − 1 q˜ˆλ/˜λ min + 1





k−r

λ˜ˆ

k˜x − ˜x 0 k A ^˜ , k ≥ r λ ^˜ ˆ :

if µ 2 ( ˜ A) << µ 2 (A) or ˜ A has most of the eigenvalues ˜ λ i in [˜ λ min , λ] and ˜ ˆ λ/˜ ˜ ˆ λ min <<

λ max /λ min , then ˜ x _k → ˜x = E ^T x with a greater rate than x k → x.

(6)

Now we obtain each row of the preconditioned CG method. Define x k = E ^−T x ˜ _k , r k = b − Ax k , and d k = E ^−T d ˜ _k . Then

˜r k = b ˜ − ˜ A˜ x _k = E ⁻¹ b − E ⁻¹ AE ^−T (E ^T x _k )

= E ⁻¹ r _k = E ^T E ^−T E ⁻¹ r _k = E ^T h _k , h k = P ⁻¹ r _k ,

˜r ^T _k ˜r k = r ^T _k E ^−T E ⁻¹ r _k = r ^T _k h _k , d ˜ ^T

k A˜ ˜ d _k = d ˜ ^T

k E ⁻¹ AE ^−T d ˜ _k = d ^T _k Ad k . Thus

˜

τ k = r ^T _k h _k d ^T

k Ad k

. (row1)

Moreover, we have

˜

x _k+1 = E ^T x _k+1 = E ^T x _k + ˜ τ k E ^T d _k ⇒

x _k+1 = x k + ˜ τ k d _k , (row2)

˜r k+1 = E ⁻¹ r _k+1 = E ⁻¹ r _k − ˜τ k E ⁻¹ AE ^−T E ^T d _k ⇒

r _k+1 = r k + ˜ τ k Ad k , (row3)

β ˜ k = r ^T

k+1 h _k+1 r ^T

k h _k (row4)

(row3.5: h k+1 = P ⁻¹ r _k+1 ),

˜

d _k+1 = E ^T d _k+1 = E ^T h _k+1 + ˜ β k E ^T d _k ⇒

d _k+1 = h k+1 + ˜ β k d _k . (row5)

Finally, in order to initialize the algorithm, set:

x ₀ = E ^−T x ˜ ₀ , r 0 = b − Ax ⁰ ,

d ₀ = E ^−T d ˜ ₀ = E ^−T ˜r 0 = E ^−T E ^T h ₀ = h 0 . (row0) Regarding the convergence rate of the sequence {x k }, generated by the algorithm row0 and, for k = 0, 1, . . ., rows1, 2, 3, 3.5, 4, 5, note that

k˜x k − ˜xk ² A ˜ = (˜ x _k − ˜x) ^T A(˜ ˜ x _k − ˜x) = (E ^T x _k − E ^T x ) ^T E ⁻¹ AE ^−T (E ^T x _k − E ^T x )

= (x k − x) ^T A(x k − x) = kx k − xk ² A .

Thus the bounds for k˜x − ˜x k k A ^˜ obtained above, can be rewritten as follows

kx

k

−xk

A

kx

0

−xk

A

≤ 2

√

µ

2

( ˜ A)−1

√ µ

2

( ˜ A)+1

k

, µ 2 ( ˜ A) = ^˜ ^λ _˜

^max

λ

min

,

kx

k

−xk

^A

kx

0

−xk

A

≤ 2 ^{q ˜ˆλ/˜λ}

^min

⁻¹

q ˜ˆλ/˜λ

min

+1

! k−r

˜ˆλ

, k ≥ r ˜ ˆ λ .

Circulant and τ matrices and best approximations

Find the nearest circulant matrix (in the Frobenius norm) to a symmetric

4 × 4 Toeplitz matrix A = [t |i−j| ] ⁴ _i,j=1 and call it A. Write A in the particular

case t 0 = 2, t 1 = −1, t 2 = t 3 = 0. Extend to the n × n case, noting that the

first row of A can be computed in O(n) arithmetic operations.

(7)

The matrix A is the circulant matrix whose first row is [t 0 (3t 1 + t 3 )/4 t 2 (3t 1 + t 3 )/4].

So, in the particular case,

A = 3 4







8/3 −1 0 −1

−1 8/3 −1 0

0 −1 8/3 −1

−1 0 −1 8/3





 .

Find the nearest τ matrix (in the Frobenius norm) to a symmetric 4 × 4 Toeplitz matrix A = [t |i−j| ] ⁴ _i,j=1 and call it A. Write A in the particular case t 0 = 2, t 1 = −1, t 2 = t 3 = 0. Extend to the n × n case, noting that the first row of A can be computed in O(n) arithmetic operations.

The algebra τ . We recall that τ is the set of all polynomials in the Toeplitz matrix X = [t |i−j| ] ⁿ _i,j=1 , t 1 = 1, t i = 0 i 6= 1. For n = 4 and n = 5 a basis for τ is:

J 1 = I, J 2 =







0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0





 , J 3 =







0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0







, J 4 = J,

J 1 = I, J 2 =







0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 0 0 1 0





 , J 3 =







0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 0





 ,

J 4 =







0 0 0 1 0 0 0 1 0 1 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0







, J 5 = J.

From these cases one easily deduces a basis for τ for n generic.

A n × n matrix A belongs to the space τ iff AX = XA iff its entries satisfy the following cross-sum conditions:

a i,j−1 + a i,j+1 = a i−1,j + a i+1,j , 1 ≤ i, j ≤ n where a 0,j = a n+1,j = a i,0 = a i,n+1 = 0, 1 ≤ i, j ≤ n.

Matrices from tau are diagonalized by the (unitary) sine transform, in fact

XS = S diag (2 cos jπ

n + 1 , j = 1, . . . , n), S =

r 2

n + 1 sin ijπ n + 1

n

i,j=1 .

It can be shown that the eigenvalues of A ⁻¹ A cluster around 1 if A is the best

τ approximation (in the Frobenius norm) of A and A is a symmetric Toeplitz

matrix satisfying the hypothesis . . . (the same of the circulant case)

(8)

Given a 5 × 5 symmetric Toeplitz matrix A = [t |i−j| ] ⁵ _i,j=1 , write it in the form:

A = B +







0 · · · 0 .. . C .. . 0 · · · 0







where B and C are τ matrices of order 5 and 3, respectively. Extend to the n × n case.

How Richardson method arise

Perron theorem. Consider the following Cauchy problem dx(t)

dt = h(x(t)), t > 0, x(0) = x 0

where h is a vectorial function, h : R ⁿ → R ⁿ , such that h(ˆ x ) = 0, ˆ x ∈ R ⁿ , and h ∈ C ¹ in a neighbourhood of ˆ x . Let J ^h be the Jacobian matrix of h,

J ^h =







∂h

1

∂x

1

· · · _∂x ^∂h

n¹

.. . .. .

∂h

n

∂x

1

· · · ^∂h ∂x

ⁿn





 .

If <(λ(J ^h (ˆ x ))) < 0, then

∃ δ > 0 | kx ⁰ − ˆxk < δ ⇒ x(t) → ˆx, t → +∞

(δ = +∞ if <(λ(J ^h (x))) < 0, ∀ x ∈ R ⁿ ).

Example 1. Let f : R ⁿ → R and ˆx be such that f(ˆx) = min f(x). Notice that in ˆ x the gradient of f is null and the Hessian of f is pd (∇f(ˆx) = 0, ∇ ² f (ˆ x ) pd). Set h = −∇f. Then the vector h(ˆx) = −∇f(ˆx) is null, and the matrix J ^h (ˆ x ) = −∇ ² f (ˆ x ) is negative definite. In particular, we have λ(J ^h (ˆ x )) < 0. So, if x 0 ≈ ˆx, then the solution x(t) of the problem

dx(t)

dt = −∇f(x(t)), t > 0, x(0) = x ⁰

tends to ˆ x (as t → +∞). It follows that any Cauchy differential problem solver can be used to minimize a function.

Example 2. Let A be a n × n real matrix and assume that <(λ(A)) > 0.

Set h(x) = b − Ax where b ∈ R ⁿ . Note that h(A ⁻¹ b ) = 0, h ∈ C ¹ (R ⁿ ), and the eigenvalues of J ^h (A ⁻¹ b ) = J ^h (x) = −A have negative real parts. It follows that x(t), the solution of

dx(t)

dt = b − Ax(t), t > 0, x(0) = x 0 , (p) tends to A ⁻¹ b (as t → +∞) for any choice of x 0 . Let us give a method to find an approximation of such solution.

Fix a step ∆t > 0 and approximate the exact vectors x(t i ), t i = i∆t, i = 1, 2, . . ., with the vectors η(t i ) defined by

1 ∆t [η(t i−1 + ∆t) − η(t i−1 )] = b − Aη(t i−1 ),

η(t i ) = η(t i−1 + ∆t), i = 1, 2, . . .

(9)

Note that we are applying Euler method to the Cauchy differential problem (p). We see that the method obtained, η(t i ) = η(t i−1 ) + ∆t(b − Aη(t i−1 )), is the Richardson iterative method for solving linear systems Ax = b, which we know to converge (for a sufficiently small positive ∆t) just under the assumption

<(λ(A)) > 0. Notice that such method is also called Richardson-Euler method, and now we know the reason.

Compare Example 2 with Example 1; more precisely, is the second example

a particular case of the first one ?

(d k conjugate to d k−1 ). Here below is the algorithm we obtain:

CG method

Assume A, the coefficient matrix of our system Ax = b, positive definite (recall that A and b are real). In the general scheme choose H = A, d 0 = r 0 = b − Ax 0 , d k = r k + β k−1 d k−1 , k = 1, 2, . . ., (r k = b − Ax k ) where β k−1 is such that

(d k , d k−1 ) A = 0