An example of preconditioning

Let $A$ and $E$ be the $n \times n$ matrices
\[
A = \begin{bmatrix} 2 & -1 & & \\ -1 & 2 & \ddots & \\ & \ddots & \ddots & -1 \\ & & -1 & 2 \end{bmatrix},
\qquad
E = \begin{bmatrix} 1 & & & \\ -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}.
\]
We have
\[
\tilde A = E^{-1} A E^{-T} = I + e e^T, \qquad e = [1\ 1\ \cdots\ 1]^T.
\]
The eigenvalues of the matrix $\tilde A$ are: $1$, with multiplicity $n-1$, and $1 + e^T e = n + 1$. So the condition number of $\tilde A$ (in the $2$-norm) is $n + 1$.

Let us compute the condition number of $A$. The eigenvalues of $A$ are known in explicit form: $2 - 2\cos\frac{j\pi}{n+1}$, $j = 1, \ldots, n$. Thus
\[
\mu_2(A) = \frac{2 - 2\cos\frac{n\pi}{n+1}}{2 - 2\cos\frac{\pi}{n+1}}
= \frac{1 + \cos\frac{\pi}{n+1}}{1 - \cos\frac{\pi}{n+1}}
= \frac{2\cos^2\frac{\pi}{2(n+1)}}{2\sin^2\frac{\pi}{2(n+1)}}
= \frac{1}{\tan^2\frac{\pi}{2(n+1)}}.
\]
Since $\lim_{n\to+\infty} \bigl(\frac{\pi}{2(n+1)}\bigr)^2 / \tan^2\frac{\pi}{2(n+1)} = 1$, we can conclude that $\mu_2(A) = O(n^2)$.

It follows that, in order to solve a system $Ax = b$ where the coefficient matrix $A$ is as above, it is convenient to apply the available linear system solver to the equivalent, much better conditioned system $E^{-1} A E^{-T} (E^T x) = E^{-1} b$, i.e. solve $\tilde A \tilde x = E^{-1} b$ with $\tilde x = E^T x$, and then recover $x = E^{-T} \tilde x$.
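The two condition numbers are easy to compare numerically. A minimal sketch (not part of the original notes; it assumes NumPy):

```python
import numpy as np

n = 50
# A = tridiag(-1, 2, -1); E = lower bidiagonal with 1 on the diagonal, -1 below
A = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
E = np.eye(n) - np.eye(n, k=-1)

Einv = np.linalg.inv(E)
Atilde = Einv @ A @ Einv.T              # should equal I + e e^T
e = np.ones((n, 1))

print(np.allclose(Atilde, np.eye(n) + e @ e.T))  # True
print(np.linalg.cond(Atilde))                    # n + 1 = 51
print(np.linalg.cond(A))                         # grows like (2(n+1)/pi)^2
```

For $n = 50$ the preconditioned matrix has condition number $51$, while $\mu_2(A) \approx 10^3$.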
Proof that a DFT of order n can be reduced to two DFTs of order n/2

Let
\[
y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{n-1} \end{bmatrix}
= \begin{bmatrix}
1 & 1 & \cdots & 1 \\
1 & \omega & \cdots & \omega^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & \omega^{n-1} & \cdots & \omega^{(n-1)(n-1)}
\end{bmatrix}
\begin{bmatrix} z_0 \\ z_1 \\ \vdots \\ z_{n-1} \end{bmatrix}
= W z
\]
($\omega = \omega_n$, $W = W_n$). Then, for $i = 0, \ldots, n-1$,
\[
y_i = \sum_{j=0}^{n-1} \omega^{ij} z_j
= \sum_{p=0}^{m-1} \omega^{i(2p)} z_{2p} + \sum_{p=0}^{m-1} \omega^{i(2p+1)} z_{2p+1}
= \sum_{p=0}^{m-1} (\omega^2)^{ip} z_{2p} + \omega^i \sum_{p=0}^{m-1} (\omega^2)^{ip} z_{2p+1}
= \sum_{p=0}^{m-1} \omega_m^{ip} z_{2p} + \omega_n^i \sum_{p=0}^{m-1} \omega_m^{ip} z_{2p+1} \tag{1}
\]
($\omega_n^2 = \omega_m$, $m = n/2$) and, for $i = 0, \ldots, m-1$,
\[
y_{m+i} = \sum_{p=0}^{m-1} \omega_m^{(m+i)p} z_{2p} + \omega_n^m \omega_n^i \sum_{p=0}^{m-1} \omega_m^{(m+i)p} z_{2p+1}
= \sum_{p=0}^{m-1} \omega_m^{ip} z_{2p} - \omega_n^i \sum_{p=0}^{m-1} \omega_m^{ip} z_{2p+1} \tag{2}
\]
(using $\omega_m^m = 1$ and $\omega_n^m = -1$). Formulas (1), for $i = 0, \ldots, m-1$, and (2) in matrix form become:
\[
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix}
= W_m \begin{bmatrix} z_0 \\ z_2 \\ \vdots \\ z_{n-2} \end{bmatrix}
+ \begin{bmatrix} 1 & & & \\ & \omega_n & & \\ & & \ddots & \\ & & & \omega_n^{m-1} \end{bmatrix}
W_m \begin{bmatrix} z_1 \\ z_3 \\ \vdots \\ z_{n-1} \end{bmatrix},
\]
\[
\begin{bmatrix} y_m \\ y_{m+1} \\ \vdots \\ y_{n-1} \end{bmatrix}
= W_m \begin{bmatrix} z_0 \\ z_2 \\ \vdots \\ z_{n-2} \end{bmatrix}
- \begin{bmatrix} 1 & & & \\ & \omega_n & & \\ & & \ddots & \\ & & & \omega_n^{m-1} \end{bmatrix}
W_m \begin{bmatrix} z_1 \\ z_3 \\ \vdots \\ z_{n-1} \end{bmatrix}.
\]
It follows that, setting $D = \mathrm{diag}(1, \omega_n, \ldots, \omega_n^{m-1})$,
\[
y = W_n z
= \begin{bmatrix} W_m & D W_m \\ W_m & -D W_m \end{bmatrix}
\begin{bmatrix} z_0 \\ z_2 \\ \vdots \\ z_{n-2} \\ z_1 \\ z_3 \\ \vdots \\ z_{n-1} \end{bmatrix}
= \begin{bmatrix} I & D \\ I & -D \end{bmatrix}
\begin{bmatrix} W_m & 0 \\ 0 & W_m \end{bmatrix} Q z
\]
where the permutation matrix $Q$ (the even-odd sort permutation) is defined in the obvious way.
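The splitting can be checked numerically. A small sketch (not in the original notes; it assumes NumPy and the convention $\omega_n = e^{-2\pi i/n}$, which the notes leave unspecified):

```python
import numpy as np

def dft_matrix(n):
    # W_n with entries omega^(i*j), taking omega = exp(-2*pi*i/n)
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    return np.exp(-2j * np.pi * i * j / n)

n = 8
m = n // 2
z = np.random.randn(n) + 1j * np.random.randn(n)

Wm = dft_matrix(m)
D = np.diag(np.exp(-2j * np.pi * np.arange(m) / n))  # diag(1, w_n, ..., w_n^(m-1))

top = Wm @ z[0::2] + D @ (Wm @ z[1::2])   # formula (1), i = 0..m-1
bot = Wm @ z[0::2] - D @ (Wm @ z[1::2])   # formula (2)
y = np.concatenate([top, bot])

print(np.allclose(y, dft_matrix(n) @ z))  # True
```

With this sign convention the result also agrees with `np.fft.fft(z)`.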
The real part of $\lambda(A) > 0$ vs $\lambda(A_h) > 0$, also for $A$ real

Let $A \in \mathbb{C}^{n \times n}$, $A_h = \frac{1}{2}(A + A^*)$, $A_{ah} = \frac{1}{2}(A - A^*)$.

Definition: $A_h$ is p.d. iff $z^* A_h z > 0$, $\forall\, z \in \mathbb{C}^n$, $z \neq 0$.

Then
\[
z^* A z = z^* A_h z + z^* A_{ah} z,
\]
and, writing $z = z_R + i z_I$, $A = A_R + i A_I$,
\[
z^* A z = (z_R - i z_I)^T (A_R + i A_I)(z_R + i z_I)
= z_R^T A_R z_R + z_I^T A_R z_I - z_R^T (A_I - A_I^T) z_I
+ i\bigl[ z_R^T (A_R - A_R^T) z_I + z_R^T A_I z_R + z_I^T A_I z_I \bigr].
\]
Note that $z^* A_h z$ is real and $z^* A_{ah} z$ is purely imaginary.

Moreover, denoting by $M_S = \frac{1}{2}(M + M^T)$ and $M_{AS} = \frac{1}{2}(M - M^T)$ the symmetric and antisymmetric parts of a real matrix $M$, we have $A_h = (A_R)_S + i (A_I)_{AS}$ and
\[
(z^* A z)_R = z^* A_h z = (z_R - i z_I)^T \bigl[ (A_R)_S + i (A_I)_{AS} \bigr](z_R + i z_I)
= z_R^T (A_R)_S z_R + z_I^T (A_R)_S z_I + 2 z_I^T (A_I)_{AS} z_R
\]
(the would-be imaginary term $i[z_R^T (A_I)_{AS} z_R + z_I^T (A_I)_{AS} z_I]$ vanishes, since $\xi^T M \xi = 0$ for antisymmetric $M$), which, if $A$ is real, reduces to $z_R^T A_S z_R + z_I^T A_S z_I$. Equivalently,
\[
(z^* A z)_R = z_R^T A_R z_R + z_I^T A_R z_I - z_R^T (A_I - A_I^T) z_I,
\]
which, if $A$ is real, reduces to $z_R^T A z_R + z_I^T A z_I$. Consequences:
1. $A_h$ is p.d. iff $(z^* A z)_R > 0$, $\forall\, z \in \mathbb{C}^n$, $z \neq 0$.

2. For any eigenvalue $\lambda(A)$ there exists $z$, $\|z\|_2 = 1$, such that $(\lambda(A))_R = z^* A_h z \ge \min \lambda(A_h)$ [it is the unit vector $z$ in $Az = \lambda(A) z$].

3. For any eigenvalue $\lambda(A_h)$ there exists $y$, $\|y\|_2 = 1$, such that $\lambda(A_h) = (y^* A y)_R$ [it is the unit vector $y$ in $A_h y = \lambda(A_h) y$].

4. Assume $A$ real. Then the following assertions are equivalent:
• $A_h = A_S$ is p.d. ($z^* A_h z > 0$, $\forall\, z \in \mathbb{C}^n$, $z \neq 0$);
• $\xi^T A \xi > 0$, $\forall\, \xi \in \mathbb{R}^n$, $\xi \neq 0$;
• $\xi^T A_S \xi > 0$, $\forall\, \xi \in \mathbb{R}^n$, $\xi \neq 0$ ($\xi^T A \xi = \xi^T A_S \xi$ if $\xi \in \mathbb{R}^n$ and $A$ is real).

Further results:
$A_h$ p.d. ($\lambda(A_h) > 0$) $\Rightarrow$ $(\lambda(A))_R > 0$ (consequence of 2.);
$(\lambda(A))_R > 0$ and $A$ normal $\Rightarrow$ $A_h$ p.d.;
$(\lambda(A))_R > 0$ does not imply $A_h$ p.d. (see the Example with $a^2 \ge 4$);
there exist non-normal matrices $A$ with $(\lambda(A))_R > 0$ for which $A_h$ is p.d. (see the Example with $0 < a^2 < 4$);
perhaps $(\lambda(A))_R$ "sufficiently positive" would imply $\lambda(A_h) > 0$ ($A_h$ p.d.).

EXAMPLE.
\[
A = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}, \quad a \in \mathbb{R},
\qquad
[x\ \ y] \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
= [x\ \ xa + y] \begin{bmatrix} x \\ y \end{bmatrix}
= x^2 + a x y + y^2,
\]
and $x^2 + a x y + y^2 > 0$, $\forall\, (x, y) \neq (0, 0)$, iff $a^2 < 4$, i.e. the hermitian part of $A$ is p.d. iff $a^2 < 4$. Also observe that $A$ is normal iff $a = 0$. So, for $a \in \mathbb{R}$ with $0 < a^2 < 4$, the matrix $A$ satisfies the conditions: $A$ real, $A_h = A_S$ p.d., $A$ not normal, $(\lambda(A))_R = \lambda(A) = 1 > 0$.
We know that $A_h$ p.d. implies $\Re(\lambda(A)) > 0$. Does $\Re(\lambda(A)) > 0$ imply $A_h$ p.d.? If $A$ is normal, yes; otherwise a stronger hypothesis of the kind $\Re(\lambda(A)) > q \ge 0$ is sufficient to obtain the positive definiteness of $A_h$. The aim is to find a $q$ as small as possible. A natural question is: can $q$ be zero for some class of non-normal matrices?
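The $2 \times 2$ example above is easy to check numerically. A minimal sketch (not in the original notes; it assumes NumPy):

```python
import numpy as np

# A = [[1, a], [0, 1]] has eigenvalues 1, 1 (so Re(lambda(A)) > 0 for every a),
# yet its hermitian part A_h is p.d. iff a^2 < 4.
def hermitian_part_is_pd(a):
    A = np.array([[1.0, a], [0.0, 1.0]])
    A_h = (A + A.T) / 2
    return np.linalg.eigvalsh(A_h).min() > 0

print(hermitian_part_is_pd(1.0))   # True  (a^2 < 4)
print(hermitian_part_is_pd(3.0))   # False (a^2 >= 4)
```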
Let $A$ be a generic $n \times n$ matrix. It is known that $A X = X T$, with $X = [x_1\ x_2\ \ldots\ x_n]$ unitary and $T$ upper triangular,
\[
T = \begin{bmatrix}
\lambda_1 & t_{12} & \cdots & t_{1n} \\
0 & \lambda_2 & \ddots & \vdots \\
\vdots & \ddots & \ddots & t_{n-1,n} \\
0 & \cdots & 0 & \lambda_n
\end{bmatrix},
\]
with the eigenvalues of $A$ as diagonal entries (Schur theorem). Equivalently, we have
\[
A x_j = \lambda_j x_j + t_{1j} x_1 + \ldots + t_{j-1,j} x_{j-1}, \qquad j = 1, \ldots, n.
\]
Now let $\lambda(A_h)$ be a generic eigenvalue of $A_h = \frac{1}{2}(A + A^*)$, the hermitian part of $A$. Then there exists $y \neq 0$ such that
\[
\lambda(A_h) = \frac{y^* A_h y}{y^* y} = \frac{y^* A y}{y^* y} - \frac{y^* A_{ah} y}{y^* y}
\]
and, since $\lambda(A_h)$ is real and $y^* A_{ah} y$ purely imaginary, we have the formula:
\[
\lambda(A_h) = \frac{\Re(y^* A y)}{y^* y}. \tag{1}
\]
Let us obtain, using (1), an expression of $\lambda(A_h)$ in terms of the eigenvalues $\lambda_i$ of $A$. There exist $\alpha_i \in \mathbb{C}$ for which $y = \sum_i \alpha_i x_i$ (recall that $y$ is an eigenvector of $A_h$ associated with the $\lambda(A_h)$ we are considering); thus $y^* y = \sum_i |\alpha_i|^2$ and
\[
\begin{aligned}
y^* A y &= \sum_i \bar\alpha_i x_i^* \sum_j \alpha_j A x_j
= \sum_i \bar\alpha_i x_i^* \sum_j \alpha_j \Bigl( \lambda_j x_j + \sum_{k=1}^{j-1} t_{kj} x_k \Bigr) \\
&= \sum_i |\alpha_i|^2 \lambda_i + \sum_i \bar\alpha_i x_i^* \sum_{j=2}^{n} \alpha_j \Bigl( \sum_{k=1}^{j-1} t_{kj} x_k \Bigr)
= \sum_i |\alpha_i|^2 \lambda_i + \sum_i \bar\alpha_i x_i^* \sum_{k=1}^{n-1} \Bigl( \sum_{j=k+1}^{n} \alpha_j t_{kj} \Bigr) x_k \\
&= \sum_i |\alpha_i|^2 \lambda_i + f(\{\alpha_i\}_{i=1}^n, \{t_{ij}\}_{i<j})
\end{aligned}
\]
where
\[
f(\{\alpha_i\}_{i=1}^n, \{t_{ij}\}_{i<j}) = \sum_{i=1}^{n} \bar\alpha_i \sum_{j=i+1}^{n} \alpha_j t_{ij} = \sum_{j=1}^{n} \alpha_j \sum_{i=1}^{j-1} \bar\alpha_i t_{ij}.
\]
It follows that
\[
\lambda(A_h) = \frac{\sum_i |\alpha_i|^2 \Re(\lambda_i)}{\sum_i |\alpha_i|^2}
+ \frac{\Re\bigl( f(\{\alpha_i\}_{i=1}^n, \{t_{ij}\}_{i<j}) \bigr)}{\sum_i |\alpha_i|^2}. \tag{2}
\]
Remark. Since $A X = X T$ implies $A_h X = X T_h$, $T_h = \frac{1}{2}(T + T^*)$, we have:
\[
\min_{\substack{\alpha^{(k)}:\ X \alpha^{(k)}\ \text{indep.}\\ \text{eigenvectors of } A_h}}
\frac{\Re\bigl( f(\{\alpha_i\}_{i=1}^n, \{t_{ij}\}_{i<j}) \bigr)}{\sum_i |\alpha_i|^2}
=
\min_{\substack{\alpha^{(k)}:\ \alpha^{(k)}\ \text{indep.}\\ \text{eigenvectors of } T_h}}
\frac{\Re\bigl( f(\{\alpha_i\}_{i=1}^n, \{t_{ij}\}_{i<j}) \bigr)}{\sum_i |\alpha_i|^2}.
\]
Let us see some consequences of (2):
1. If $t_{ij} = 0$ $\forall\, i < j$ (i.e. if $A$ is normal), then
\[
\min_i \Re(\lambda_i) \le \lambda(A_h) = \frac{\sum_i |\alpha_i|^2 \Re(\lambda_i)}{\sum_i |\alpha_i|^2} \le \max_i \Re(\lambda_i).
\]
So, if $A$ is normal and $\Re(\lambda(A)) > 0$, then $A_h$ is p.d. (all eigenvalues of $A_h$ are positive).

2. Since $|\Re(f)| \le |f|$, in order to obtain bounds for $\Re(f)/\sum_i |\alpha_i|^2$ in (2) we look for bounds for $|f|$. By the Cauchy-Schwarz inequality, applied twice,
\[
\begin{aligned}
|f| &= \Bigl| \sum_{i=1}^{n} \bar\alpha_i \sum_{j=i+1}^{n} \alpha_j t_{ij} \Bigr|
\le \sum_{i=1}^{n} |\alpha_i| \sum_{j=i+1}^{n} |\alpha_j|\, |t_{ij}| \\
&\le \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}\ \sqrt{\sum_{i=1}^{n} \Bigl( \sum_{j=i+1}^{n} |\alpha_j|\, |t_{ij}| \Bigr)^2}
\le \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}\ \sqrt{\sum_{i=1}^{n} \Bigl( \sum_{j=i+1}^{n} |\alpha_j|^2 \Bigr) \Bigl( \sum_{j=i+1}^{n} |t_{ij}|^2 \Bigr)} \\
&\le \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}\ \sqrt{\sum_{i=1}^{n} |\alpha_i|^2}\ \sqrt{\sum_{i=1}^{n} \sum_{j=i+1}^{n} |t_{ij}|^2}
= \sum_{i=1}^{n} |\alpha_i|^2\ \sqrt{\sum_{i=1}^{n} \sum_{j=i+1}^{n} |t_{ij}|^2},
\end{aligned}
\]
and, moreover,
\[
|f| \le \max_{i<j} |t_{ij}| \sum_{i=1}^{n} |\alpha_i| \sum_{j=i+1}^{n} |\alpha_j|
\le \max_{i<j} |t_{ij}|\ x \sum_{i=1}^{n} |\alpha_i|^2, \qquad x \le \frac{n-1}{2}.
\]
So we can say that
\[
\frac{|\Re(f)|}{\sum_i |\alpha_i|^2} \le \min\Bigl\{ \sqrt{\sum_{i=1}^{n} \sum_{j=i+1}^{n} |t_{ij}|^2},\ \ \frac{n-1}{2} \max_{i<j} |t_{ij}| \Bigr\}. \tag{3}
\]
Note how the min changes when passing from a matrix $T$ with $|t_{ij}|$ constant (for $i < j$) to a $T$ with $t_{ij} = 0$ for all but one $(i, j)$, $i < j$.
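Combining (2) with (3) gives a computable inclusion interval for the eigenvalues of $A_h$. A numerical sketch (not in the original notes; it assumes NumPy and SciPy's complex Schur factorization):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
T, X = schur(A, output='complex')      # A = X T X*, X unitary, T upper triangular

lam = np.diag(T)                       # eigenvalues of A
strict = np.triu(T, k=1)               # the t_ij, i < j
g = min(np.sqrt((np.abs(strict)**2).sum()),     # first bound in (3)
        (n - 1) / 2 * np.abs(strict).max())     # second bound in (3)

lam_h = np.linalg.eigvalsh((A + A.conj().T) / 2)
# every eigenvalue of A_h lies in [min Re(lam) - g, max Re(lam) + g]
print(lam.real.min() - g <= lam_h.min(), lam_h.max() <= lam.real.max() + g)
```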
The case $n = 2$:
\[
\frac{|\Re(\bar\alpha_1 \alpha_2 t_{12})|}{|\alpha_1|^2 + |\alpha_2|^2}
\le \frac{|\alpha_1|\,|\alpha_2|\,|t_{12}|}{|\alpha_1|^2 + |\alpha_2|^2}
\le \frac{1}{2} |t_{12}|.
\]
Theorem. We have
\[
\min_i \Re(\lambda_i) - g(\{t_{ij}\}_{i<j}) \le \lambda(A_h) \le \max_i \Re(\lambda_i) + g(\{t_{ij}\}_{i<j})
\]
whenever $|\Re(f)| / \sum_i |\alpha_i|^2 \le g(\{t_{ij}\}_{i<j})$. So, if $\Re(\lambda(A)) > g(\{t_{ij}\}_{i<j})$, then $A_h$ is p.d.

Examples of functions $g(\{t_{ij}\}_{i<j})$ are given in (3). Notice, however, that the functions $g(\{t_{ij}\}_{i<j})$ in the Theorem should be easily computable from the entries of $A$; in fact they should depend directly on the entries $a_{ij}$:
\[
\sum_{i<j} |t_{ij}|^2 = \|T\|_F^2 - \sum_i |\lambda_i|^2 = \|A\|_F^2 - \sum_i |\lambda_i|^2 \le \|A\|_F^2 - n \min \lambda(A^* A) \ldots
\]
($A v_i = \lambda_i v_i \ \Rightarrow\ v_i^* A^* A v_i = |\lambda_i|^2 v_i^* v_i \ldots$)

Eigenstructure of normal matrices
If $A$ is normal and $Ax = \lambda x$, then $A^* x$ (besides $x$) is also an eigenvector of $A$ corresponding to $\lambda$.

If $A$ is normal, $Ax = \lambda x$ and $\lambda$ is a simple eigenvalue, then there exists $\mu$ such that $A^* x = \mu x$. Note that $\mu$ must be an eigenvalue of $A^*$. (The eigenvalues of $A^*$ are the complex conjugates of the eigenvalues of $A$.)

If $A$ is normal, $A x_i = \lambda_i x_i$, $i = 1, \ldots, n$, and all the $\lambda_i$ are simple eigenvalues (so $A$ has $n$ distinct eigenvalues), then there exist $\mu_i$ such that $A^* x_i = \mu_i x_i$, $i = 1, \ldots, n$. Note that each $\mu_i$ must be an eigenvalue of $A^*$, i.e. the conjugate of an eigenvalue $\lambda_j$ of $A$.

If $A$ is normal with $A x_i = \lambda_i x_i$, $x_i^* x_j = \delta_{ij}$, $1 \le i, j \le n$, then $A A^* x_i = A^* A x_i = \lambda_i A^* x_i$ $\Rightarrow$ the $\{A^* x_i\}$ are eigenvectors of $A$ (as the $\{x_i\}$). Moreover,
\[
(A^* x_i)^* (A^* x_j) = x_i^* A A^* x_j = x_i^* A^* A x_j = (A x_i)^* (A x_j)
= (\lambda_i x_i)^* (\lambda_j x_j) = \bar\lambda_i \lambda_j x_i^* x_j = |\lambda_i|^2 \delta_{ij}.
\]
So, if $A$ is also non-singular, then $\{\frac{1}{\lambda_i} A^* x_i\}$ are orthonormal eigenvectors of $A$ (as the $\{x_i\}$).
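A numerical illustration (not in the original notes; it assumes NumPy and builds a normal matrix as $Q\,\mathrm{diag}(d)\,Q^*$ with $Q$ unitary):

```python
import numpy as np

# A normal matrix with distinct eigenvalues: A = Q diag(d) Q*, Q unitary.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
Q, _ = np.linalg.qr(M)
d = np.array([1 + 2j, -3 + 1j, 2 - 1j, 0.5 + 0.5j])
A = Q @ np.diag(d) @ Q.conj().T

ok = True
for i in range(4):
    x = Q[:, i]                          # eigenvector: A x = d[i] x
    w = (A.conj().T @ x) / d[i]          # (1/lambda_i) A* x_i
    ok &= np.allclose(A @ w, d[i] * w)   # still an eigenvector of A for lambda_i
    ok &= np.isclose(np.linalg.norm(w), 1.0)   # and of unit norm
print(ok)  # True
```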
Circulant-type matrix algebras

Let
\[
A = \begin{bmatrix} 0 & a & 0 \\ 0 & 0 & b \\ c & 0 & 0 \end{bmatrix}.
\]
We have:

• $A^3 = I$ iff $abc = 1$:
\[
A^2 = \begin{bmatrix} 0 & 0 & ab \\ bc & 0 & 0 \\ 0 & ca & 0 \end{bmatrix},
\qquad
A^3 = \begin{bmatrix} abc & 0 & 0 \\ 0 & bca & 0 \\ 0 & 0 & cab \end{bmatrix}.
\]
• The characteristic polynomial of $A$ is $\lambda^3 - abc$; so, if $abc = 1$, then the eigenvalues of $A$ are $1, \omega_3, \omega_3^2$, where $\omega_3 = e^{-i 2\pi/3}$.

• $A$ is normal iff $|a| = |b| = |c|$.

• $A$ is unitary iff $|a| = |b| = |c| = 1$.

By imposing the identity
\[
\begin{bmatrix} 0 & a & 0 \\ 0 & 0 & b \\ c & 0 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \omega^i \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]
for $i = 0$, $i = 1$, $i = 2$, and therefore by requiring, respectively, the conditions $abc = 1$; $abc = 1$ and $\omega^3 = 1$; $abc = 1$ and $\omega^6 = 1$, one obtains the equalities:
\[
A \begin{bmatrix} 1 \\ bc \\ c \end{bmatrix} = 1 \cdot \begin{bmatrix} 1 \\ bc \\ c \end{bmatrix},
\qquad
A \begin{bmatrix} 1 \\ bc\,\omega \\ c\,\omega^2 \end{bmatrix} = \omega \begin{bmatrix} 1 \\ bc\,\omega \\ c\,\omega^2 \end{bmatrix},
\qquad
A \begin{bmatrix} 1 \\ bc\,\omega^2 \\ c\,\omega^4 \end{bmatrix} = \omega^2 \begin{bmatrix} 1 \\ bc\,\omega^2 \\ c\,\omega^4 \end{bmatrix}.
\]
So, if $abc = 1$ and $Q = \mathrm{diag}(1, bc, c)\, F$, where $F$ is the $3 \times 3$ Fourier matrix, then $A Q = Q\, \mathrm{diag}(1, \omega_3, \omega_3^2)$.

Note that $Q$ is unitary iff $|a| = |b| = |c| = 1$ (iff $A$ is unitary).
Exercise. Consider the n × n case.
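The $3 \times 3$ case is easy to verify numerically. A sketch (not in the original notes; it assumes NumPy and takes $F$ as the unitary Fourier matrix built from $\omega_3 = e^{-i2\pi/3}$):

```python
import numpy as np

a, b = 2.0, 0.5
c = 1 / (a * b)                        # enforce abc = 1
A = np.array([[0, a, 0], [0, 0, b], [c, 0, 0]], dtype=complex)

w = np.exp(-2j * np.pi / 3)            # omega_3 = e^{-i 2 pi / 3}
F = np.array([[w**(j*k) for k in range(3)] for j in range(3)]) / np.sqrt(3)
Q = np.diag([1, b*c, c]) @ F           # Q = diag(1, bc, c) F

print(np.allclose(np.linalg.matrix_power(A, 3), np.eye(3)))   # A^3 = I
print(np.allclose(A @ Q, Q @ np.diag([1, w, w**2])))          # A Q = Q diag(1, w, w^2)
```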
Proof of $A x_i = \lambda_i \hat A x_i$, $A = [t^{|i-j|}]$, $\hat A$ the G. Strang circulant

Let $A$ be the real symmetric Toeplitz matrix $[t^{|i-j|}]_{i,j=1}^n$ and $\hat A$ be the G. Strang circulant matrix associated with $A$. Assume $n$ even, set $m = n/2$ and consider the $m \times m$ matrices
\[
S = \begin{bmatrix} 1 & t & \cdots & t^{m-1} \\ t & \ddots & & \vdots \\ \vdots & & \ddots & t \\ t^{m-1} & \cdots & t & 1 \end{bmatrix},
\qquad
R = \begin{bmatrix} t^m & t^{m+1} & \cdots & t^{n-1} \\ t^{m-1} & \ddots & & \vdots \\ \vdots & & \ddots & t^{m+1} \\ t & \cdots & t^{m-1} & t^m \end{bmatrix},
\qquad
Q = \begin{bmatrix} t^m & t^{m-1} & \cdots & t \\ t^{m-1} & \ddots & & \vdots \\ \vdots & & \ddots & t^{m-1} \\ t & \cdots & t^{m-1} & t^m \end{bmatrix},
\]
and let $J$ denote the $m \times m$ reversal matrix, $(J)_{ij} = \delta_{i,\,m+1-j}$ ($S$, $R$, $Q$ are Toeplitz). Observe that
\[
A = \begin{bmatrix} S & R \\ R^T & S \end{bmatrix},
\qquad
\hat A = \begin{bmatrix} S & Q \\ Q & S \end{bmatrix},
\qquad
SJ = JS, \quad RJ = JR^T, \quad QJ = JQ.
\]
Obviously we have the identities $A e_m = \hat A e_m$ and $A e_{m+1} = \hat A e_{m+1}$. Moreover, if $x$ is the $m \times 1$ vector $[t\ 0 \cdots 0\ {-t^m}]^T$, then
\[
S x = \begin{bmatrix} t - t^{n-1} \\ t^2 - t^{n-2} \\ \vdots \\ t^m - t^m \end{bmatrix},
\qquad
Q J x = \begin{bmatrix} -t^n + t^2 \\ -t^{n-1} + t^3 \\ \vdots \\ -t^{m+1} + t^{m+1} \end{bmatrix},
\qquad
R J x = \begin{bmatrix} -t^n + t^n \\ -t^{n-1} + t^{n-1} \\ \vdots \\ -t^{m+1} + t^{m+1} \end{bmatrix} = 0.
\]
\[
\Rightarrow\quad
S x \pm R J x = \begin{bmatrix} t - t^{n-1} \\ t^2 - t^{n-2} \\ \vdots \\ t^m - t^m \end{bmatrix},
\qquad
S x \pm Q J x = \begin{bmatrix} (t - t^{n-1})(1 \pm t) \\ (t^2 - t^{n-2})(1 \pm t) \\ \vdots \\ (t^m - t^m)(1 \pm t) \end{bmatrix}
\]
\[
\Rightarrow\quad \frac{1}{1 \pm t}\,(S x \pm Q J x) = S x \pm R J x. \tag{1}
\]
Hence, since $JJ = I$,
\[
S J J x \pm R J x = \frac{1}{1 \pm t}\,(S J J x \pm Q J x)
\ \Rightarrow\ J S J x \pm J R^T x = \frac{1}{1 \pm t}\,(J S J x \pm J Q x)
\]
(using $SJ = JS$, $RJ = JR^T$, $QJ = JQ$), and, multiplying by $J$,
\[
S J x \pm R^T x = \frac{1}{1 \pm t}\,(S J x \pm Q x)
\ \Rightarrow\ \pm S J x + R^T x = \frac{1}{1 \pm t}\,(\pm S J x + Q x). \tag{2}
\]
Equalities (1) and (2) imply
\[
\begin{bmatrix} S & R \\ R^T & S \end{bmatrix}
\begin{bmatrix} x \\ \pm J x \end{bmatrix}
= \frac{1}{1 \pm t}
\begin{bmatrix} S & Q \\ Q & S \end{bmatrix}
\begin{bmatrix} x \\ \pm J x \end{bmatrix},
\qquad\text{i.e.}\qquad
A \begin{bmatrix} t \\ 0 \\ \vdots \\ 0 \\ -t^m \\ \mp t^m \\ 0 \\ \vdots \\ 0 \\ \pm t \end{bmatrix}
= \frac{1}{1 \pm t}\, \hat A \begin{bmatrix} t \\ 0 \\ \vdots \\ 0 \\ -t^m \\ \mp t^m \\ 0 \\ \vdots \\ 0 \\ \pm t \end{bmatrix}.
\]
Finally, let $y$ be any $m \times 1$ vector $[y_0\ y_1 \cdots y_{m-1}]^T$ satisfying the following two linear equations:

(a) $y_0 + y_1 t + \ldots + y_j t^j + \ldots + y_{m-1} t^{m-1} = 0$,

(b) $y_{m-1} + y_{m-2} t + \ldots + y_{j-1} t^{m-j} + \ldots + y_0 t^{m-1} = 0$.

Multiplying (a) by $t^m, t^{m-1}, \ldots, t$ and (b) by $t, t^2, \ldots, t^m$, one obtains the identities:
\[
R y = \begin{bmatrix} t^m & t^{m+1} & \cdots & t^{n-1} \\ t^{m-1} & & & \\ \vdots & & & \\ t & & & \end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix} = 0,
\qquad
R^T y = \begin{bmatrix} t^m & t^{m-1} & \cdots & t \\ t^{m+1} & & & \\ \vdots & & & \\ t^{n-1} & & & \end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix} = 0
\]
\[
\Rightarrow\quad
A \begin{bmatrix} y \\ \pm y \end{bmatrix}
= \begin{bmatrix} S & R \\ R^T & S \end{bmatrix}
\begin{bmatrix} y \\ \pm y \end{bmatrix}
= \begin{bmatrix} S y \\ \pm S y \end{bmatrix}. \tag{3}
\]
On the other side we also have:
\[
t^m S y = t^m \begin{bmatrix} 1 & t & \cdots & t^{m-1} \\ t & & & \\ \vdots & & & \\ t^{m-1} & & & \end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix}
= - \begin{bmatrix} t^m & t^{m-1} & \cdots & t \\ t^{m-1} & & & \\ \vdots & & & \\ t & & & \end{bmatrix}
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix}
= -Q y.
\]
In fact, for $j = 0, 1, \ldots, m-1$ the $(j+1)$-th row on the left is equal to (use (b) and (a), respectively)
\[
\begin{aligned}
&[t^{m+j} \cdots t^{m+1}\ t^m\ t^{m+1} \cdots t^{2m-1-j}]
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix} \\
&= (t^{m+j} y_0 + t^{m+j-1} y_1 + \ldots + t^{m+1} y_{j-1}) + (t^m y_j + t^{m+1} y_{j+1} + \ldots + t^{2m-1-j} y_{m-1}) \\
&= (-y_{m-1} - y_{m-2} t - \ldots - y_j t^{m-j-1})\, t^{j+1} + (-y_0 - y_1 t - \ldots - y_{j-1} t^{j-1})\, t^{m-j} \\
&= -[t^{m-j} \cdots t^{m-1}\ t^m\ t^{m-1} \cdots t^{j+1}]
\begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{m-1} \end{bmatrix},
\end{aligned}
\]
which is the $(j+1)$-th row on the right. Thus
\[
\hat A \begin{bmatrix} y \\ \pm y \end{bmatrix}
= \begin{bmatrix} S y \pm Q y \\ Q y \pm S y \end{bmatrix}
= \begin{bmatrix} S y \mp t^m S y \\ -t^m S y \pm S y \end{bmatrix}
= (1 \mp t^m) \begin{bmatrix} S y \\ \pm S y \end{bmatrix}. \tag{4}
\]
From (3) and (4) it follows that
\[
A \begin{bmatrix} y \\ \pm y \end{bmatrix}
= \frac{1}{1 \mp t^m}\, \hat A \begin{bmatrix} y \\ \pm y \end{bmatrix},
\qquad
\forall\, y \ \Big|\ \begin{bmatrix} 1 & t & \cdots & t^{m-1} \\ t^{m-1} & \cdots & t & 1 \end{bmatrix} y = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
So we have:

- $m - 2$ eigenvectors of type $\begin{bmatrix} y \\ y \end{bmatrix}$ corresponding to the eigenvalue $\frac{1}{1 - t^m}$,
- $m - 2$ eigenvectors of type $\begin{bmatrix} y \\ -y \end{bmatrix}$ corresponding to the eigenvalue $\frac{1}{1 + t^m}$,
- two eigenvectors $e_m$ and $e_{m+1}$ corresponding to the eigenvalue $1$,
- one eigenvector $\begin{bmatrix} x \\ J x \end{bmatrix}$ corresponding to the eigenvalue $\frac{1}{1 + t}$,
- one eigenvector $\begin{bmatrix} x \\ -J x \end{bmatrix}$ corresponding to the eigenvalue $\frac{1}{1 - t}$,

where $x = [t\ 0 \cdots 0\ {-t^m}]^T$ and the vectors $y$ are $m - 2$ linearly independent solutions of the system:
\[
\begin{bmatrix} 1 & t & \cdots & t^{m-1} \\ t^{m-1} & \cdots & t & 1 \end{bmatrix} y = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]
We have proved the equality $A x_i = \lambda_i \hat A x_i$ for $n$ pairs (eigenvalue, eigenvector) $(\lambda_i, x_i)$. Why are the $x_i$ linearly independent?
Let $A$, $B$ be $n \times n$ (non-null) matrices with complex entries. Assume that $Ax = \lambda Bx$ and $Ay = \mu By$ for non-null vectors $x$ and $y$, where $\lambda, \mu \in \mathbb{C}$, $\lambda \neq \mu$. Then $x$ and $y$ are linearly independent.

If $B$ is non-singular, then we have the equations $B^{-1} A x = \lambda x$ and $B^{-1} A y = \mu y$, and the thesis follows as in the classic eigenvalue problem case (but $B^{-1} A$ takes the role of $A$).

If $A$ is non-singular and both $\lambda$ and $\mu$ are non-zero (the case of the G. Strang circulant above), then we have the equations $A^{-1} B x = \frac{1}{\lambda} x$ and $A^{-1} B y = \frac{1}{\mu} y$, and the proof is very similar to the classic eigenvalue problem case (but $A^{-1} B$ takes the role of $A$).

If $B$ is singular, $A$ is non-singular and $\lambda = 0$ (or $\mu = 0$), then we have the equation $Ax = 0$ ($Ay = 0$), which implies $x = 0$ ($y = 0$), which is against our hypothesis.

If both $B$ and $A$ are singular . . . is the thesis still true?
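The $n$ generalized eigenpairs derived above can be checked numerically. A sketch (not in the original notes; it assumes NumPy and the standard first-column definition of the Strang circulant):

```python
import numpy as np

# A = [t^|i-j|] (KMS Toeplitz) and its Strang circulant A_hat, whose first
# column is t^k for k <= m and t^(n-k) for k > m; n even, m = n/2.
t, n = 0.6, 8
m = n // 2
A = np.array([[t**abs(i - j) for j in range(n)] for i in range(n)])
col = np.array([t**k if k <= m else t**(n - k) for k in range(n)])
A_hat = np.array([[col[(i - j) % n] for j in range(n)] for i in range(n)])

# generalized eigenvalues lambda of A x = lambda A_hat x
lam = np.sort(np.linalg.eigvals(np.linalg.solve(A_hat, A)).real)
expected = np.sort([1/(1 + t), 1/(1 - t), 1.0, 1.0]
                   + [1/(1 - t**m)] * (m - 2) + [1/(1 + t**m)] * (m - 2))
print(np.allclose(lam, expected))  # True
```

Note that for $|t| < 1$ the eigenvalues $\frac{1}{1 \pm t^m}$ cluster at $1$ as $m$ grows, which is the point of the Strang preconditioner.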
Proof of the eigenvalue minmax representation for a hermitian matrix A (and of the interlace theorem)

(1) Let $A x_i = \lambda_i x_i$, $x_i^* x_j = \delta_{ij}$, $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$ (recall that normal matrices can be diagonalized via unitary transforms). Let $V_j \subset \mathbb{C}^n$ be a generic subspace of dimension $j$. Then for any $x \neq 0$, $x \in V_j \cap \mathrm{Span}\{x_j, \ldots, x_n\}$ (such an $x$ exists, since the two subspaces have dimensions $j$ and $n - j + 1$, summing to more than $n$), we have $x = \sum_{i=j}^{n} \alpha_i x_i$ with the $\alpha_i$ not all zero, and
\[
\frac{x^* A x}{x^* x}
= \frac{\bigl(\sum_i \alpha_i x_i\bigr)^* A \bigl(\sum_k \alpha_k x_k\bigr)}{\bigl(\sum_i \alpha_i x_i\bigr)^* \bigl(\sum_k \alpha_k x_k\bigr)}
= \frac{\bigl(\sum_i \alpha_i x_i\bigr)^* \bigl(\sum_k \alpha_k \lambda_k x_k\bigr)}{\bigl(\sum_i \alpha_i x_i\bigr)^* \bigl(\sum_k \alpha_k x_k\bigr)}
= \frac{\sum_{i=j}^{n} |\alpha_i|^2 \lambda_i}{\sum_{i=j}^{n} |\alpha_i|^2} \ge \lambda_j.
\]
Thus $\lambda_j \le \max_{x \in V_j} (x^* A x / x^* x)$.

Moreover, we have $x_j^* A x_j / x_j^* x_j = \lambda_j$, and for any $x = \sum_{i=1}^{j} \beta_i x_i$, with the $\beta_i$ not all zero,
\[
\frac{x^* A x}{x^* x} = \frac{\sum_{i=1}^{j} |\beta_i|^2 \lambda_i}{\sum_{i=1}^{j} |\beta_i|^2} \le \lambda_j.
\]
It follows that for $V_j = \mathrm{Span}\{x_1, \ldots, x_j\}$ it holds $\max_{x \in V_j} x^* A x / x^* x = \lambda_j$. Hence
\[
\lambda_j = \min_{V_j} \max_{x \in V_j,\ x \neq 0} \frac{x^* A x}{x^* x}.
\]
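The minmax characterization can be probed numerically: every $j$-dimensional subspace gives a value $\ge \lambda_j$, and $\mathrm{Span}\{x_1, \ldots, x_j\}$ attains it. A sketch (not in the original notes; it assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
n, j = 6, 3                               # test the j-th smallest eigenvalue
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2                  # hermitian
lam, X = np.linalg.eigh(A)                # ascending eigenvalues, unitary X

def max_rayleigh(V):
    # max of x*Ax/x*x over span(V) = largest eigenvalue of the compression Q*AQ
    Q, _ = np.linalg.qr(V)
    return np.linalg.eigvalsh(Q.conj().T @ A @ Q).max()

# every j-dimensional subspace V_j gives a max >= lambda_j ...
trials = [max_rayleigh(rng.standard_normal((n, j))) for _ in range(100)]
print(all(v >= lam[j - 1] - 1e-10 for v in trials))       # True
# ... and span{x_1, ..., x_j} attains the minimum, lambda_j
print(np.isclose(max_rayleigh(X[:, :j]), lam[j - 1]))     # True
```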
(2) Let $A$, $B$, $C$ be hermitian, with eigenvalues $\alpha_i$, $\beta_i$, $\gamma_i$ in non-decreasing order, and $C = A + B$: proof of the interlace theorem.
\[
\gamma_j = \min_{V_j} \max_{x \in V_j} \frac{x^* C x}{x^* x}
= \min_{V_j} \max_{x \in V_j} \Bigl( \frac{x^* A x}{x^* x} + \frac{x^* B x}{x^* x} \Bigr)
\le \min_{V_j} \max_{x \in V_j} \Bigl( \frac{x^* A x}{x^* x} + \beta_n \Bigr)
= \min_{V_j} \max_{x \in V_j} \frac{x^* A x}{x^* x} + \beta_n
= \alpha_j + \beta_n,
\]
\[
\alpha_j = \min_{V_j} \max_{x \in V_j} \frac{x^* A x}{x^* x}
= \min_{V_j} \max_{x \in V_j} \Bigl( \frac{x^* C x}{x^* x} - \frac{x^* B x}{x^* x} \Bigr)
\le \min_{V_j} \max_{x \in V_j} \Bigl( \frac{x^* C x}{x^* x} - \beta_1 \Bigr)
= \min_{V_j} \max_{x \in V_j} \frac{x^* C x}{x^* x} - \beta_1
= \gamma_j - \beta_1.
\]
Hence $\alpha_j + \beta_1 \le \gamma_j \le \alpha_j + \beta_n$.

Deflation
Let $A$ be an $n \times n$ matrix. Denote by $\lambda_i$, $i = 1, \ldots, n$, the eigenvalues of $A$ and by $y_i$ the corresponding eigenvectors. So we have $A y_i = \lambda_i y_i$, $i = 1, \ldots, n$.

Assume that $\lambda_1$, $y_1$ are given and that $\lambda_1 \neq 0$. Choose $w \in \mathbb{C}^n$ such that $w^* y_1 \neq 0$ (given $y_1$, choose $w$ not orthogonal to $y_1$) and set
\[
W = A - \frac{\lambda_1}{w^* y_1}\, y_1 w^*.
\]
It is known that the eigenvalues of $W$ are
\[
0, \lambda_2, \ldots, \lambda_j, \ldots, \lambda_n,
\]
i.e. they are the same as those of $A$ except $\lambda_1$, which is replaced with $0$. Let $w_1, w_2, \ldots, w_j, \ldots, w_n$ be the corresponding eigenvectors ($W w_1 = 0$, $W w_j = \lambda_j w_j$, $j = 2, \ldots, n$). Is it possible to obtain the $w_j$ from the $y_j$? First observe that
\[
A y_1 = \lambda_1 y_1 \ \Rightarrow\ W y_1 = 0 : \quad w_1 = y_1. \tag{a}
\]
Then, for $j = 2, \ldots, n$,
\[
W y_j = A y_j - \frac{\lambda_1}{w^* y_1}\, y_1 w^* y_j = \lambda_j y_j - \lambda_1 \frac{w^* y_j}{w^* y_1}\, y_1. \tag{1}
\]
If we impose $y_j = w_j + c\, y_1$, $j = 2, \ldots, n$, then (1) becomes
\[
W w_j + c\, W y_1 = \lambda_j w_j + c\, \lambda_j y_1 - \lambda_1 \frac{w^* w_j}{w^* y_1}\, y_1 - c\, \lambda_1 y_1
= \lambda_j w_j + y_1 \Bigl[ c\, \lambda_j - \lambda_1 \frac{w^* w_j}{w^* y_1} - \lambda_1 c \Bigr].
\]
So, if $\lambda_j \neq \lambda_1$ and
\[
w_j = y_j - \frac{\lambda_1}{\lambda_j - \lambda_1}\, \frac{w^* w_j}{w^* y_1}\, y_1, \tag{2}
\]
then $W w_j = \lambda_j w_j$. If, moreover, $\lambda_j \neq 0$, then
\[
w^* y_j = w^* w_j + \frac{\lambda_1}{\lambda_j - \lambda_1}\, w^* w_j
\ \Rightarrow\ w^* y_j = w^* w_j\, \frac{\lambda_j}{\lambda_j - \lambda_1}
\ \Rightarrow\ w^* w_j = \frac{\lambda_j - \lambda_1}{\lambda_j}\, w^* y_j.
\]
So, by (2), for all $j \in \{2, \ldots, n\}$ with $\lambda_j \neq \lambda_1, 0$:
\[
A y_j = \lambda_j y_j \ \Rightarrow\
W \Bigl( y_j - \frac{\lambda_1}{\lambda_j}\, \frac{w^* y_j}{w^* y_1}\, y_1 \Bigr)
= \lambda_j \Bigl( y_j - \frac{\lambda_1}{\lambda_j}\, \frac{w^* y_j}{w^* y_1}\, y_1 \Bigr) : \quad
w_j = y_j - \frac{\lambda_1}{\lambda_j}\, \frac{w^* y_j}{w^* y_1}\, y_1.
\]
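The deflation step is easy to verify numerically. A sketch (not in the original notes; it assumes NumPy and uses a symmetric $A$ so the spectrum is real):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                      # symmetric, so the eigenvalues are real
lam, Y = np.linalg.eigh(A)

k = int(np.argmax(np.abs(lam)))        # deflate the dominant eigenvalue
l1, y1 = lam[k], Y[:, k]
w = rng.standard_normal(n)             # any w with w* y1 != 0
W = A - (l1 / (w @ y1)) * np.outer(y1, w)   # W = A - (l1 / w* y1) y1 w*

mu = np.sort(np.linalg.eigvals(W).real)
expected = np.sort(np.where(np.arange(n) == k, 0.0, lam))
print(np.allclose(mu, expected))       # True: same spectrum, lambda_1 -> 0
```

Note that $W$ is in general no longer symmetric, even though $A$ is; only the spectrum is preserved (with $\lambda_1$ replaced by $0$).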