A new matrix algebra?
Let T_j(x) be the Chebyshev polynomials:

T_0(x) = 1, T_1(x) = x, T_k(x) = 2x T_{k-1}(x) − T_{k-2}(x), k = 2, 3, ...,

and recall their alternative representation in [−1, 1]:

T_k(x) = cos(k arccos x), x ∈ [−1, 1].

Some of them: T_2(x) = 2x^2 − 1, T_3(x) = 4x^3 − 3x, T_4(x) = 8x^4 − 8x^2 + 1, ... Let X be an n × n matrix with the property that the space L of all polynomials in X has dimension n (i.e. maximum dimension, since by the Cayley–Hamilton theorem, if p_X(λ) is the characteristic polynomial of X, then p_X(X) = 0). An X with this property is usually called non-derogatory. Consider the Chebyshev basis of L:

J_1 = T_0(X) = I, J_2 = T_1(X) = X,
J_{k+1} = T_k(X) = 2 T_{k-1}(X) X − T_{k-2}(X), k = 2, 3, ..., n − 1.
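As a quick sanity check (a small NumPy sketch, not part of the original notes), the three-term recurrence can be compared against the cosine representation on [−1, 1]:

```python
import numpy as np

def cheb(k, x):
    """T_k(x) via the three-term recurrence T_k = 2x T_{k-1} - T_{k-2}."""
    t_prev, t = np.ones_like(x), x
    if k == 0:
        return t_prev
    for _ in range(k - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

x = np.linspace(-1.0, 1.0, 1001)
for k in range(8):
    # on [-1, 1] the recurrence must agree with cos(k arccos x)
    assert np.allclose(cheb(k, x), np.cos(k * np.arccos(x)))
print("T_k(x) = cos(k arccos x) on [-1, 1] for k = 0..7")
```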
We are interested in cases where A = Σ_k a_k J_k means v^T A = [a_1 a_2 ··· a_n] for some vector v.
For instance, choose

X = [ 0  1  0 ]
    [ a  b  c ]
    [ 0  d  e ].

Then J_1 = I, J_2 = X, and

J_3 = T_2(X) = 2 T_1(X) X − T_0(X) = 2 X^2 − I

    = 2 [ a   b             c        ]
        [ ba  a + b^2 + cd  bc + ce  ]
        [ da  db + ed       dc + e^2 ] − I.
Note that the first row of J_3 is [2a − 1  2b  2c], and thus it is equal to [0 0 1] iff a = 1/2, b = 0, c = 1/2. So, for these particular choices of a, b, c we have e_1^T J_k = e_k^T, k = 1, 2, 3, i.e. A = Σ_{k=1}^{3} a_k J_k means e_1^T A = [a_1 a_2 a_3]. Moreover, since a = 1/2, b = 0, c = 1/2 imply
J_3 = [ 0  0    1            ]
      [ 0  d    e            ]
      [ d  2ed  d − 1 + 2e^2 ],

we can say that the counter-identity matrix J is in L if d = 1 and e = 0. We rewrite the J_k in this case:
J_1 = [ 1  0  0 ]    J_2 = [ 0    1  0   ]    J_3 = [ 0  0  1 ]
      [ 0  1  0 ],         [ 1/2  0  1/2 ],         [ 0  1  0 ]
      [ 0  0  1 ]          [ 0    1  0   ]          [ 1  0  0 ].
Note that the eigenvalues of J 2 = X must be real and distinct (see the theory
on eigenstructure of tridiagonal matrices). It is easy to obtain them: −1, 0, 1.
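The 3 × 3 example above can be verified directly (a NumPy sketch, not part of the original notes):

```python
import numpy as np

# X of the example with a = c = 1/2, b = 0 and d = 1, e = 0
X = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
J1, J2, J3 = np.eye(3), X, 2 * X @ X - np.eye(3)   # J3 = T_2(X) = 2 X^2 - I

for k, J in enumerate((J1, J2, J3)):
    assert np.allclose(J[0], np.eye(3)[k])          # e_1^T J_k = e_k^T
assert np.allclose(J3, np.fliplr(np.eye(3)))        # J3 is the counter-identity J
assert np.allclose(np.sort(np.linalg.eigvals(X).real), [-1.0, 0.0, 1.0])
print("example verified: eigenvalues of X are -1, 0, 1")
```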
Let us generalize this example. Let X be a generic non-derogatory tridiagonal matrix with e_2^T = [0 1 0 ··· 0] as first row. Then the following result holds:

If e_1^T J_s = e_s^T, s = 1, ..., k, then e_1^T J_{k+1} = e_{k+1}^T iff the k-th row of X is [··· 0 1/2 0 1/2 0 ···].

In fact,

e_1^T J_{k+1} = e_1^T T_k(X) = e_1^T (2 T_{k−1}(X) X − T_{k−2}(X)) = e_1^T (2 J_k X − J_{k−1})
             = 2 e_k^T X − e_{k−1}^T = [··· 0  2x_{k,k−1} − 1  2x_{k,k}  2x_{k,k+1}  0 ···],

which equals e_{k+1}^T iff x_{k,k−1} = 1/2, x_{k,k} = 0, x_{k,k+1} = 1/2.
So, for

X = [ 0    1    0    ···  0   ]
    [ 1/2  0    1/2           ]
    [      ⋱    ⋱    ⋱        ]
    [           1/2  0    1/2 ]
    [                a    b   ]

we have e_1^T J_k = e_k^T, k = 1, ..., n, i.e. A = Σ_{k=1}^{n} a_k J_k means e_1^T A = [a_1 a_2 ... a_n].
Moreover, J is a polynomial in X or, equivalently, J n = J iff a = 1, b = 0 (proof:
use the identity JX = XJ).
Now assume a = 1, b = 0. In this case the J_k have the following structure: row i of J_k carries the entry 1/2 in columns k + i − 1 and k − i + 1, where column indices falling outside {1, ..., n} are folded back by reflection about 1 and about n, and the two contributions add up to 1 when the columns coincide. For instance, for n = 4,

J_3 = [ 0    0    1    0   ]
      [ 0    1/2  0    1/2 ]
      [ 1/2  0    1/2  0   ]
      [ 0    1    0    0   ].
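The claimed structure, e_1^T J_k = e_k^T and J_n = J, can be checked for a generic n (a NumPy sketch, not part of the original notes):

```python
import numpy as np

n = 6
# tridiagonal X: first row e_2^T, interior rows [1/2 0 1/2], last row [0 ... 0 1 0] (a = 1, b = 0)
X = np.zeros((n, n))
X[0, 1] = 1.0
for i in range(1, n - 1):
    X[i, i - 1] = X[i, i + 1] = 0.5
X[n - 1, n - 2] = 1.0

J = [np.eye(n), X]
for k in range(2, n):
    J.append(2 * J[-1] @ X - J[-2])                # J_{k+1} = 2 J_k X - J_{k-1}
for k in range(n):
    assert np.allclose(J[k][0], np.eye(n)[k])      # e_1^T J_k = e_k^T
assert np.allclose(J[-1], np.fliplr(np.eye(n)))    # J_n = counter-identity J
print("structure verified for n =", n)
```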
Moreover, the eigenvalues of J 2 = X must be real and distinct (see the theory on eigenstructure of tridiagonal matrices). By using this fact one can obtain them in explicit form. They are cos(jπ/(n − 1)), j = 0, . . . , n − 1.
Let us see why. Since T_{n−1}(X) = J_n = J, we have the equality T_{n−1}(X)^2 = I. Now, if Xv = λv, then v = T_{n−1}(X)^2 v = T_{n−1}(λ)^2 v; in other words, the eigenvalues λ of X must be zeros of the algebraic equation T_{n−1}(x)^2 = 1. For x ∈ [−1, 1] this equation becomes cos^2((n − 1) arccos x) = 1, whose zeros are

x = cos(2kπ/(n − 1)) and x = cos((2k + 1)π/(n − 1)), k ∈ Z,

i.e. x = cos(jπ/(n − 1)), j = 0, ..., n − 1.
Eigenvectors of X? Look for x_j such that

x_2 = cos(jπ/(n − 1)) x_1,
(1/2)(x_1 + x_3) = cos(jπ/(n − 1)) x_2,
···
(1/2)(x_{n−2} + x_n) = cos(jπ/(n − 1)) x_{n−1},
x_{n−1} = cos(jπ/(n − 1)) x_n.

A solution is x_i = cos((i − 1)jπ/(n − 1)), i = 1, ..., n.
The eigenvectors corresponding to the eigenvalues 1 and −1 are [1 1 ··· 1]^T and [1 −1 1 ··· (−1)^{n−1}]^T, respectively. For n = 3 and n = 4:
[ 0    1  0   ] [ 1   1   1 ]   [ 1   1   1 ] [ 1        ]
[ 1/2  0  1/2 ] [ 1   0  −1 ] = [ 1   0  −1 ] [    0     ]
[ 0    1  0   ] [ 1  −1   1 ]   [ 1  −1   1 ] [       −1 ],

[ 0    1    0    0   ] [ 1   1     1    1 ]   [ 1   1     1    1 ] [ 1                 ]
[ 1/2  0    1/2  0   ] [ 1   1/2  −1/2 −1 ] = [ 1   1/2  −1/2 −1 ] [    1/2            ]
[ 0    1/2  0    1/2 ] [ 1  −1/2  −1/2  1 ]   [ 1  −1/2  −1/2  1 ] [        −1/2       ]
[ 0    0    1    0   ] [ 1  −1     1   −1 ]   [ 1  −1     1   −1 ] [              −1   ].
For a generic n?
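For a generic n the same eigen-decomposition X V = V diag(λ_j), with λ_j = cos(jπ/(n − 1)) and V_{ij} = cos((i − 1)jπ/(n − 1)), can be checked numerically (a NumPy sketch, not part of the original notes):

```python
import numpy as np

n = 8
# tridiagonal X with first row e_2^T, interior rows [1/2 0 1/2], last row [0 ... 0 1 0]
X = np.zeros((n, n))
X[0, 1] = 1.0
for i in range(1, n - 1):
    X[i, i - 1] = X[i, i + 1] = 0.5
X[n - 1, n - 2] = 1.0

j = np.arange(n)
lam = np.cos(j * np.pi / (n - 1))                        # eigenvalues cos(j pi/(n-1))
V = np.cos(np.outer(np.arange(n), j) * np.pi / (n - 1))  # V[i, j] = cos(i j pi/(n-1))
assert np.allclose(X @ V, V @ np.diag(lam))              # X V = V diag(lam)
print("eigenvalues:", lam)
```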
How Chebyshev polynomials arise
Set y(x) = x^n − p_{n−1}(x), where p_{n−1} is the unique polynomial of degree n − 1 solving the minimum problem min_{p ∈ P_{n−1}} max_{[−1,1]} |x^n − p(x)|. If μ = max_{[−1,1]} |y(x)|, then y(x) assumes the values μ and −μ alternately in n + 1 successive points {x_i}_{i=0}^{n} of [−1, 1], −1 ≤ x_0 < x_1 < x_2 < ... < x_{n−1} < x_n ≤ 1 (see min-max approximation theory). Obviously y′(x_i) = 0, i = 1, ..., n − 1, whereas y′(x_0) y′(x_n) ≠ 0, since y′(x) is a polynomial of degree n − 1 and thus can have no further zeros. Thus x_0 = −1, x_n = 1. Consider now the function y(x)^2 − μ^2. It is zero in all the x_i, and its derivative, 2y(x)y′(x), is zero in x_1, x_2, ..., x_{n−1}. It follows that y(x)^2 − μ^2 = c(x^2 − 1) y′(x)^2 for some real constant c. Noting that the coefficient of x^{2n} is 1 on the left and cn^2 on the right, we conclude that

n^2 / (1 − x^2) = y′(x)^2 / (μ^2 − y(x)^2),

n / √(1 − x^2) = ± y′(x) / √(μ^2 − y(x)^2),

and hence y(x) = μ cos(n arccos x + c). Finally, y(1) = ±μ ⇒ c = kπ ⇒

y(x) = x^n − p_{n−1}(x) = ±μ cos(n arccos x) =: ±μ T_n(x)

(and, since y is monic while T_n has leading coefficient 2^{n−1}, μ = 1/2^{n−1}).
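The minimality of the monic Chebyshev polynomial can be illustrated numerically (a NumPy sketch, not part of the original notes; the grid and the random perturbations are arbitrary choices):

```python
import numpy as np

n = 5
x = np.linspace(-1.0, 1.0, 20001)
# monic polynomial of minimal sup-norm on [-1, 1]: T_n(x)/2^(n-1), deviation mu = 2^(1-n)
y = np.cos(n * np.arccos(x)) / 2 ** (n - 1)
mu = np.max(np.abs(y))
assert np.isclose(mu, 2.0 ** (1 - n))

# any other monic competitor x^n - p(x) must deviate at least mu; try random perturbations
rng = np.random.default_rng(0)
for _ in range(100):
    q = np.polynomial.Polynomial(0.1 * rng.normal(size=n))  # random perturbation of degree <= n-1
    assert np.max(np.abs(y + q(x))) >= mu - 1e-6
print("no monic competitor beat mu =", mu)
```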
Properties of Chebyshev polynomials
• T_k(λ)|_{[−1,1]} = cos(k arccos λ)
• T_k(λ) = (1/2)[(λ − √(λ^2 − 1))^k + (λ + √(λ^2 − 1))^k]
• |T_k(λ)| ≤ 1, λ ∈ [−1, 1]
• T_k(cos(iπ/k)) = (−1)^i, i = 0, 1, ..., k
• T_k(cos((2j + 1)π/(2k))) = 0, j = 0, 1, ..., k − 1
• T_k(λ) ≥ λ^k, λ ≥ 1
• T_k((λ + 1)/(λ − 1)) = (1/2)[((√λ + 1)/(√λ − 1))^k + ((√λ − 1)/(√λ + 1))^k], λ > 1
• T_k((λ + 1)/(λ − 1)) > (1/2)((√λ + 1)/(√λ − 1))^k, λ > 1
• 0 < a < b and t_k(λ) = T_k((b + a − 2λ)/(b − a)) / T_k((b + a)/(b − a)) imply

  min max_{[a,b]} |p_k(λ)| = max_{[a,b]} |t_k(λ)| = 1 / T_k((b + a)/(b − a)),

  where the min is taken over all polynomials of type a_k λ^k + ... + a_1 λ + 1, a_k ≠ 0
• T_{k+j}(λ) + T_{|k−j|}(λ) = 2 T_k(λ) T_j(λ)
• ∫_{−1}^{1} (1/√(1 − λ^2)) T_k(λ) T_j(λ) dλ = π if j = k = 0, π/2 if j = k > 0, 0 if j ≠ k

Chebyshev polynomials as characteristic polynomials
Write a semi-infinite matrix X = [x_{ij}], i, j = 1, 2, ..., with the property: for all n the characteristic polynomial p_n(λ) of the upper-left n × n submatrix X_n of X is T_n^*(λ) = T_n(λ)/2^{n−1}, where T_n(λ) is the degree-n Chebyshev polynomial defined above.

For the following choices of X,

X = [ 0     1/√2  0    0  ··· ]        X = [ 0    1    0    0  ··· ]        X = [ 0    1    0    0  ··· ]
    [ 1/√2  0     1/2  0  ··· ]            [ 1/2  0    1/2  0  ··· ]            [ 1/2  0    1    0  ··· ]
    [ 0     1/2   0    ⋱      ]            [ 0    1/2  0    ⋱      ]            [ 0    1/4  0    ⋱      ]
    [ 0     0     ⋱    ⋱      ]            [ 0    0    ⋱    ⋱      ]            [ 0    0    ⋱    ⋱      ]
    [ ...             ...     ]            [ ...            ...    ]            [ ...            ...    ],

we have p_0(λ) = 1, p_1(λ) = λ = T_1(λ), p_2(λ) = λ^2 − 1/2 = (1/2)(2λ^2 − 1) = (1/2) T_2(λ), p_3(λ) = λ(λ^2 − 1/2) − (1/4)λ = λ^3 − (3/4)λ = (1/4)(4λ^3 − 3λ) = (1/4) T_3(λ), ..., p_n(λ) = λ p_{n−1}(λ) − (1/4) p_{n−2}(λ) = (1/2^{n−1}) T_n(λ), ...
Proof: By induction:

(1/2^{n−1}) T_n(λ) = (1/2^{n−1})(2 T_{n−1}(λ) λ − T_{n−2}(λ)) = (1/2^{n−1})(2 · 2^{n−2} p_{n−1}(λ) λ − 2^{n−3} p_{n−2}(λ))
                   = p_{n−1}(λ) λ − 2^{−2} p_{n−2}(λ) = p_n(λ).
Notice that the third choice of X implies e_1^T p_k(X_n) = e_{k+1}^T, k = 0, ..., n − 1. For all three choices of X_n we refer to L, the set of all polynomials in X_n, as a Chebyshev algebra. Each of the X above is equal to D X' D^{−1} for another X' in the list; since p(D X' D^{−1}) = D p(X') D^{−1}, from the eigenvectors v of one algebra we easily obtain the eigenvectors of the other algebras: they are Dv.
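Both claims of this section can be checked numerically for the third choice of X (a NumPy sketch, not part of the original notes; note that the first step of the recurrence uses the coefficient 1/2, since p_2(λ) = λ^2 − 1/2, and 1/4 thereafter):

```python
import numpy as np

n = 7
# third choice of X, truncated to n x n: superdiagonal all 1, subdiagonal 1/2, 1/4, 1/4, ...
Xn = np.zeros((n, n))
for i in range(n - 1):
    Xn[i, i + 1] = 1.0
    Xn[i + 1, i] = 0.5 if i == 0 else 0.25

# the eigenvalues of X_n are the zeros of T_n: cos((2j+1) pi/(2n))
roots = np.cos((2 * np.arange(n) + 1) * np.pi / (2 * n))
assert np.allclose(np.sort(np.linalg.eigvals(Xn).real), np.sort(roots))

# e_1^T p_k(X_n) = e_{k+1}^T, with p_1 = lambda, p_2 = lambda^2 - 1/2,
# and p_k = lambda p_{k-1} - (1/4) p_{k-2} for k >= 3
p_prev, p = Xn, Xn @ Xn - 0.5 * np.eye(n)
assert np.allclose(p_prev[0], np.eye(n)[1]) and np.allclose(p[0], np.eye(n)[2])
for k in range(3, n):
    p_prev, p = p, p @ Xn - 0.25 * p_prev
    assert np.allclose(p[0], np.eye(n)[k])
print("p_n(lambda) = T_n(lambda)/2^(n-1) checks passed for n =", n)
```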
SVD of A ∈ C^{n×n} and how to compute the singular values of A ∈ R^{n×n}

If A is an n × n normal matrix, then there exist matrices U, D, with U unitary and D = diag(λ_i), |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_n| ≥ 0, such that A = U D U^*. It follows that A admits the following singular value decomposition:

A = U diag(|λ_i|) diag(e^{i arg(λ_i)}) U^* = U σ V^*,   σ = diag(σ_i), σ_i = |λ_i|, V = U diag(e^{−i arg(λ_i)}).

However, any n × n matrix A admits a singular value decomposition, i.e. there exist unitary matrices U, V and σ = diag(σ_i) with σ_1 ≥ σ_2 ≥ ... ≥ σ_n ≥ 0 such that A = U σ V^*. The σ_i are the singular values of A.

Example. For n = 1 we have a_{11} = 1 · |a_{11}| · (e^{−i arg(a_{11})})^*.
Proof . . .
By knowing the SVD of A we can do many things. In particular we have:

(0) |det(A)| = Π_{i=1}^{n} σ_i
(1) σ_{r+1} = ‖A − Σ_{i=1}^{r} σ_i u_i v_i^*‖_2 = min{‖A − B‖_2 : rank(B) ≤ r}
(2) √(Σ_{j=r+1}^{n} σ_j^2) = ‖A − Σ_{i=1}^{r} σ_i u_i v_i^*‖_F = min{‖A − B‖_F : rank(B) ≤ r}
(3) σ_n = ‖A − Σ_{i=1}^{n−1} σ_i u_i v_i^*‖_2 = min{‖A − B‖_2 : rank(B) ≤ n − 1} = min{‖A − B‖_2 : det(B) = 0}
(4) ‖A‖_2 = σ_1, ‖A‖_F = √(Σ_{i=1}^{n} σ_i^2)
(5) σ_1^2, σ_2^2, ..., σ_n^2 are the eigenvalues of A^* A
(6) det(A) ≠ 0 ⇒ ‖A^{−1}‖_2 = 1/σ_n, μ_2(A) = σ_1/σ_n, μ_2(A^* A) = μ_2(A)^2
(7) If λ_i are the eigenvalues of A, then σ_n ≤ |λ_i| ≤ σ_1. If A is normal, then σ_i = |λ_i|.
(8) If σ_1 ≥ ... ≥ σ_k > 0 = σ_{k+1} = ... = σ_n, then the kernel and the image of A can be represented as follows: {x ∈ C^n : Ax = 0} = Span{v_{k+1}, ..., v_n}, {Ax : x ∈ C^n} = Span{u_1, ..., u_k}.
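Several of these properties can be verified on a random complex matrix (a NumPy sketch, not part of the original notes; the test matrix and the rank r are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 3
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, s, Vh = np.linalg.svd(A)
Ar = U[:, :r] @ np.diag(s[:r]) @ Vh[:r]                 # best rank-r approximant

assert np.isclose(abs(np.linalg.det(A)), np.prod(s))    # (0)
assert np.isclose(np.linalg.norm(A - Ar, 2), s[r])      # (1)
assert np.isclose(np.linalg.norm(A - Ar, 'fro'),
                  np.sqrt(np.sum(s[r:] ** 2)))          # (2)
assert np.isclose(np.linalg.norm(A, 2), s[0])           # (4)
assert np.allclose(np.sort(np.linalg.eigvalsh(A.conj().T @ A))[::-1], s ** 2)  # (5)
assert np.isclose(np.linalg.norm(np.linalg.inv(A), 2), 1 / s[-1])              # (6)
print("properties (0)-(2), (4)-(6) verified")
```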
How to compute U, σ, V such that A = U σ V^*? An algorithm that works for A real (note that in this case U, V can be chosen real orthogonal) consists in the following two steps (1) and (2):
Step (1). Transform A into a bidiagonal matrix

Q A Z = B = [ a_1  b_1                 ]
            [ 0    a_2  b_2            ]
            [          ⋱    ⋱          ]
            [               ⋱  b_{n−1} ]
            [                  a_n     ]

by using orthogonal transformations Q and Z.
For n = 1: 1 · a 11 · 1 = a 11 .
For n = 2, if α = a_{11}/√(a_{11}^2 + a_{21}^2), β = −a_{21}/√(a_{11}^2 + a_{21}^2), then

[ α  −β ] [ a_{11}  a_{12} ] [ 1  0 ]   [ α a_{11} − β a_{21}   α a_{12} − β a_{22} ]
[ β   α ] [ a_{21}  a_{22} ] [ 0  1 ] = [ β a_{11} + α a_{21}   β a_{12} + α a_{22} ]

  = [ √(a_{11}^2 + a_{21}^2)   (a_{11} a_{12} + a_{21} a_{22}) / √(a_{11}^2 + a_{21}^2) ]
    [ 0                        (a_{11} a_{22} − a_{21} a_{12}) / √(a_{11}^2 + a_{21}^2) ].

For n > 2 ... example, n = 4:
A = [∗ ∗ ∗ ∗; ∗ ∗ ∗ ∗; ∗ ∗ ∗ ∗; ∗ ∗ ∗ ∗],

S_12^T A = [∗ ∗ ∗ ∗; 0 ∗ ∗ ∗; ∗ ∗ ∗ ∗; ∗ ∗ ∗ ∗],

S_13^T (S_12^T A) = [∗ ∗ ∗ ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗; ∗ ∗ ∗ ∗],

S_14^T (S_13^T S_12^T A) = [∗ ∗ ∗ ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗],

(S_14^T S_13^T S_12^T A) S_23 = [∗ ∗ 0 ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗],

(S_14^T S_13^T S_12^T A S_23) S_24 = [∗ ∗ 0 0; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗; 0 ∗ ∗ ∗],

S'_23^T (S_14^T S_13^T S_12^T A S_23 S_24) = [∗ ∗ 0 0; 0 ∗ ∗ ∗; 0 0 ∗ ∗; 0 ∗ ∗ ∗],

S'_24^T (S'_23^T S_14^T S_13^T S_12^T A S_23 S_24) = [∗ ∗ 0 0; 0 ∗ ∗ ∗; 0 0 ∗ ∗; 0 0 ∗ ∗],

(S'_24^T S'_23^T S_14^T S_13^T S_12^T A S_23 S_24) S_34 = [∗ ∗ 0 0; 0 ∗ ∗ 0; 0 0 ∗ ∗; 0 0 ∗ ∗],

S'_34^T (S'_24^T S'_23^T S_14^T S_13^T S_12^T A S_23 S_24 S_34) = [∗ ∗ 0 0; 0 ∗ ∗ 0; 0 0 ∗ ∗; 0 0 0 ∗] = B.
Thus B = Q A Z, with Q = S'_34^T S'_24^T S'_23^T S_14^T S_13^T S_12^T and Z = S_23 S_24 S_34 (Q^T = Q^{−1}, Z^T = Z^{−1}). Note that the Givens transformations (plane rotations)

S_ij = S_ji = [ I               ]
              [     α       β   ]
              [        I        ]
              [    −β       α   ]
              [               I ],   α^2 + β^2 = 1

(the α, β entries lying in rows and columns i and j), used to annihilate the entries outside the bidiagonal part, are chosen so that they leave unchanged the previously created zeros. Moreover, S_ij is used to annihilate the entry (i, j), i > j, when multiplying on the left, and S_{i+1,j} is used to annihilate the entry (i, j), i < j − 1, when multiplying on the right.
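The rotation scheme above can be sketched in code (a NumPy implementation assuming the row/column ordering just described; the test matrix is an arbitrary choice, not part of the original notes):

```python
import numpy as np

def rot(a, b):
    """Return (c, s) with [c s; -s c] @ [a, b]^T = [r, 0]^T, c^2 + s^2 = 1."""
    r = np.hypot(a, b)
    return (1.0, 0.0) if r == 0 else (a / r, b / r)

def bidiagonalize(M):
    """Reduce a square real M to upper bidiagonal B = Q M Z by plane rotations."""
    B = np.array(M, dtype=float)
    n = B.shape[0]
    Q, Z = np.eye(n), np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):        # annihilate (i, j) rotating rows j and i
            c, s = rot(B[j, j], B[i, j])
            for T in (B, Q):
                T[[j, i]] = np.array([c * T[j] + s * T[i], -s * T[j] + c * T[i]])
        for k in range(j + 2, n):        # annihilate (j, k) rotating columns j+1 and k
            c, s = rot(B[j, j + 1], B[j, k])
            for T in (B, Z):
                T[:, [j + 1, k]] = np.array([c * T[:, j + 1] + s * T[:, k],
                                             -s * T[:, j + 1] + c * T[:, k]]).T
    return Q, B, Z

A = np.array([[1.0, 2, 3, 4], [2, 0, 1, -1], [3, 1, 0, 2], [0, 5, 1, 1]])
Q, B, Z = bidiagonalize(A)
assert np.allclose(Q @ A @ Z, B)                                      # B = Q A Z
assert np.allclose(Q @ Q.T, np.eye(4)) and np.allclose(Z @ Z.T, np.eye(4))
assert np.allclose(np.tril(B, -1), 0) and np.allclose(np.triu(B, 2), 0)
print("upper bidiagonal B obtained via plane rotations")
```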
Exercise (Elisa Sallicandro). Transform the matrix

A = [  1  0   2  0 ]
    [ −1  1   0  0 ]
    [  0  0  −1  3 ]
    [  4  0   0  1 ]

into a bidiagonal matrix B.
Step (2). Set A_1 := B and define a sequence of matrices {A_j}_{j=1}^{+∞} via the rule: for k = 2, 4, 6, ...,

A_{k−1} = [ a_1^{(k−1)}  b_1^{(k−1)}                  ]
          [              a_2^{(k−1)}  ⋱               ]
          [                     ⋱     b_{n−1}^{(k−1)} ]
          [                           a_n^{(k−1)}     ]

→ A_k = A_{k−1} Z_{k−1} = [ a_1^{(k)}                          ]
                          [ b_1^{(k)}  a_2^{(k)}               ]
                          [            ⋱          ⋱            ]
                          [              b_{n−1}^{(k)}  a_n^{(k)} ]

(Z_{k−1} is the product of the n − 1 plane rotations used to annihilate the entries (i, i + 1) of A_{k−1}),

A_k → A_{k+1} = Q_k A_k = [ a_1^{(k+1)}  b_1^{(k+1)}                  ]
                          [ 0            a_2^{(k+1)}  ⋱               ]
                          [                     ⋱     b_{n−1}^{(k+1)} ]
                          [                           a_n^{(k+1)}     ]

(Q_k is the product of the n − 1 plane rotations used to annihilate the entries (i + 1, i) of A_k).
Then b_i^{(j)} → 0 as j → +∞ (i = 1, ..., n − 1). (Question: one should also prove that the a_i^{(j)}, i = 1, ..., n, have non-negative limits.)
Proof: Let us show that b_{n−1}^{(j)} → 0 as j → +∞. The Euclidean norm of the i-th column of A_k is equal to the Euclidean norm of the i-th column of A_{k+1}. Thus

a_1(k)^2 + b_1(k)^2 = a_1(k+1)^2,
b_2(k)^2 + a_2(k)^2 = a_2(k+1)^2 + b_1(k+1)^2,
···
a_n(k)^2 = a_n(k+1)^2 + b_{n−1}(k+1)^2.

Moreover, the Euclidean norm of the n-th row of A_{k−1} is equal to the Euclidean norm of the n-th row of A_k, so a_n(k−1)^2 = b_{n−1}(k)^2 + a_n(k)^2. Thus

‖B‖_F^2 = ‖A_{k+1}‖_F^2 ≥ a_n(k+1)^2 = a_n(k−1)^2 − b_{n−1}(k+1)^2 − b_{n−1}(k)^2
        = ... = a_n(1)^2 − Σ_{j=2}^{k+1} b_{n−1}(j)^2 ≥ 0.

But this implies Σ_{j=1}^{+∞} b_{n−1}(j)^2 < +∞, and we have the thesis.
(b_{n−2}^{(j)} → 0: The Euclidean norm of the (n − 1)-th row of A_{k−1} is equal to the Euclidean norm of the (n − 1)-th row of A_k. Thus

a_{n−1}(k+1)^2 = a_{n−1}(1)^2 + Σ_{j=1}^{k} b_{n−1}(j)^2 − Σ_{j=2}^{k+1} b_{n−2}(j)^2

⇒ Σ_{j=1}^{+∞} b_{n−2}(j)^2 < +∞ ⇒ b_{n−2}(j) → 0 as j → ∞. ...)

Step (2) for n = 2 (convergence).
A_{k−1} = [ a_1^{(k−1)}  b_1^{(k−1)} ]   →  A_k = A_{k−1} Z_{k−1} = [ a_1^{(k)}  0         ]
          [ 0            a_2^{(k−1)} ]                              [ b_1^{(k)}  a_2^{(k)} ]

→  A_{k+1} = Q_k A_k = [ a_1^{(k+1)}  b_1^{(k+1)} ]
                       [ 0            a_2^{(k+1)} ].

‖A_k e_i‖_2 = ‖A_{k+1} e_i‖_2 ⇒ a_1(k)^2 + b_1(k)^2 = a_1(k+1)^2, a_2(k)^2 = b_1(k+1)^2 + a_2(k+1)^2;
‖A_{k−1}^T e_2‖_2 = ‖A_k^T e_2‖_2 ⇒ a_2(k−1)^2 = b_1(k)^2 + a_2(k)^2. Thus

‖B‖_F^2 = ‖A_{k+1}‖_F^2 ≥ a_2(k+1)^2 = a_2(k−1)^2 − b_1(k)^2 − b_1(k+1)^2
        = a_2(1)^2 − Σ_{j=2}^{k+1} b_1(j)^2 ⇒ b_1(j) → 0.
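The iteration of Step (2) can be demonstrated numerically for a generic n. In the sketch below (not part of the original notes) the products of plane rotations Z_{k−1} and Q_k are obtained from QR factorizations, which is equivalent up to signs; the bidiagonal test matrix is an arbitrary choice with well-separated singular values:

```python
import numpy as np

# a fixed upper bidiagonal B with well-separated singular values
B = np.diag([5.0, 4.0, 3.0, 2.0, 1.0]) + np.diag([0.5] * 4, 1)

A = B.copy()
for _ in range(300):
    A = np.linalg.qr(A.T)[1].T      # A_k = A_{k-1} Z_{k-1}: lower bidiagonal
    A = np.linalg.qr(A)[1]          # A_{k+1} = Q_k A_k: upper bidiagonal again

assert np.abs(np.diag(A, 1)).max() < 1e-8          # superdiagonal entries b_i -> 0
assert np.allclose(np.sort(np.abs(np.diag(A))),    # |diagonal| -> singular values of B
                   np.sort(np.linalg.svd(B, compute_uv=False)))
print("diagonal of the limit:", np.abs(np.diag(A)))
```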
Step (2) for n = 2 (details and example). Given an upper triangular (bidiagonal) 2 × 2 matrix

A = [ a_1  b_1 ]
    [ 0    a_2 ],

write an algorithm to compute its singular values σ_1, σ_2. (Notice, however, that σ_1, σ_2 are simply the square roots of the eigenvalues of

[ a_1^*  0     ] [ a_1  b_1 ]   [ |a_1|^2     a_1^* b_1         ]
[ b_1^*  a_2^* ] [ 0    a_2 ] = [ b_1^* a_1   |b_1|^2 + |a_2|^2 ],

i.e.

σ_{1,2} = √( (1/2)( |a_1|^2 + |a_2|^2 + |b_1|^2 ± √( (|a_1|^2 + |a_2|^2 + |b_1|^2)^2 − 4 |a_1|^2 |a_2|^2 ) ) ).)
Solution. Set A_1 = A and, for k = 2, 4, ..., set

Z_{k−1} = [ α   β ]        α = a_1/√(a_1^2 + b_1^2),  β = −b_1/√(a_1^2 + b_1^2).
          [ −β  α ],

Then

A_k = A_{k−1} Z_{k−1} = [ a_1 α − b_1 β   a_1 β + b_1 α ]   [ √(a_1^2 + b_1^2)           0                         ]
                        [ −a_2 β          a_2 α         ] = [ a_2 b_1/√(a_1^2 + b_1^2)   a_2 a_1/√(a_1^2 + b_1^2)  ].

Now set

Q_k = [ α  −β ]        α = √(a_1^2 + b_1^2) / √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) ),
      [ β   α ],       β = −( a_2 b_1/√(a_1^2 + b_1^2) ) / √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) ).

Then

A_{k+1} = Q_k A_k = Q_k A_{k−1} Z_{k−1}

 = [ √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) )    a_2^2 a_1 b_1 / ( (a_1^2 + b_1^2) √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) ) ) ]
   [ 0                                                   a_2 a_1 / √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) )                          ].
The algorithm:

10  a_1 = ?, a_2 = ?, b_1 = ?
20  a_1^{new} = √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) )
30  a_2^{new} = a_2 a_1 / √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) )
40  b_1^{new} = a_2^2 a_1 b_1 / ( (a_1^2 + b_1^2) √( a_1^2 + b_1^2 + a_2^2 b_1^2/(a_1^2 + b_1^2) ) )
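Steps 20–40 translate directly into code; iterating them drives b_1 to zero and makes (a_1, a_2) converge to the singular values, which can be compared with the closed form above (a Python sketch, not part of the original notes; the starting values are an arbitrary choice):

```python
import math

def two_by_two_singular_values(a1, b1, a2, iters=200):
    """Iterate steps 20-40: b1 -> 0 and (a1, a2) -> the singular values."""
    for _ in range(iters):
        t = a1 * a1 + b1 * b1
        r = math.sqrt(t + a2 * a2 * b1 * b1 / t)
        a1, a2, b1 = r, a2 * a1 / r, a2 * a2 * a1 * b1 / (t * r)
    return a1, a2

a1, b1, a2 = 3.0, 1.0, 2.0
s1, s2 = two_by_two_singular_values(a1, b1, a2)

# closed form: square roots of the eigenvalues of A^T A
tr = a1 ** 2 + a2 ** 2 + b1 ** 2
d = math.sqrt(tr ** 2 - 4 * (a1 * a2) ** 2)
assert abs(s1 - math.sqrt((tr + d) / 2)) < 1e-12
assert abs(s2 - math.sqrt((tr - d) / 2)) < 1e-12
print("singular values:", s1, s2)
```

Note that each step preserves the product a_1 a_2 (the determinant, up to sign), as it must, since σ_1 σ_2 = |det A|.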