
Rome-Moscow school of Matrix Methods and Applied Linear Algebra

Rome, Sept 19 - Oct 3, 2010; Moscow, Oct 10 - Oct 24, 2010

Carmine Di Fiore

In these notes the concepts of circulants, τ and Toeplitz matrices, Hessenberg algebras and displacement decompositions, spaces of class V and best least squares fits on such spaces, are introduced and investigated. As a consequence of the results presented, the choice of matrices involved in displacement decompositions, the choice of preconditioners in solving linear systems and the choice of Hessian approximations in quasi-Newton minimization methods become possible in wider classes of low complexity matrix algebras.

The Fourier matrix, circulants, and fast discrete transforms

Consider the following n × n matrix

$$P_1 = \begin{bmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 1 & & & 0 \end{bmatrix}.$$

Let ω ∈ C. Note that

$$P_1\begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix} = \begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix} = 1\cdot\begin{bmatrix}1\\1\\\vdots\\1\end{bmatrix},\qquad P_1\begin{bmatrix}1\\\omega\\\vdots\\\omega^{n-1}\end{bmatrix} = \begin{bmatrix}\omega\\\omega^2\\\vdots\\\omega^{n-1}\\1\end{bmatrix} = \omega\begin{bmatrix}1\\\omega\\\vdots\\\omega^{n-1}\end{bmatrix},$$

where the latter identity holds if ω^n = 1. More generally, if ω^n = 1, we have the following vector identities

$$P_1\begin{bmatrix}1\\\omega^j\\\vdots\\\omega^{(n-1)j}\end{bmatrix} = \begin{bmatrix}\omega^j\\\vdots\\\omega^{(n-1)j}\\1\end{bmatrix} = \omega^j\begin{bmatrix}1\\\omega^j\\\vdots\\\omega^{(n-1)j}\end{bmatrix},\qquad j = 0,1,\ldots,n-1,$$

or, equivalently, the following matrix identity

$$P_1W = WD_{1\,\omega\,\cdots\,\omega^{n-1}},\qquad D_{1\,\omega\,\cdots\,\omega^{n-1}} = \begin{bmatrix}1 & & & \\ & \omega & & \\ & & \ddots & \\ & & & \omega^{n-1}\end{bmatrix},\quad W = \begin{bmatrix}1 & 1 & \cdots & 1 & \cdots & 1\\ 1 & \omega & \cdots & \omega^{j} & \cdots & \omega^{n-1}\\ \vdots & & & \vdots & & \vdots\\ 1 & \omega^{n-1} & \cdots & \omega^{(n-1)j} & \cdots & \omega^{(n-1)(n-1)}\end{bmatrix}.$$

Proposition. If ω^n = 1 and if ω^j ≠ 1 for 0 < j < n, then $\overline W W = nI$.

proof: since |ω| = 1, $\bar\omega = \omega^{-1}$, we have

$$[\overline WW]_{ij} = \sum_{k=1}^n\overline{[W]_{ik}}\,[W]_{kj} = \sum_{k=1}^n\bar\omega^{(i-1)(k-1)}\omega^{(k-1)(j-1)} = \sum_{k=1}^n\omega^{(k-1)(j-i)} = \sum_{k=1}^n(\omega^{j-i})^{k-1}.$$

Thus $[\overline WW]_{ij} = n$ if i = j, and $[\overline WW]_{ij} = \frac{1-(\omega^{j-i})^n}{1-\omega^{j-i}} = 0$ if i ≠ j (note that the assumption ω^j ≠ 1 for 0 < j < n is essential in order to make 1 − ω^{j−i} ≠ 0).

By the result of the above Proposition, we can say that the following (symmetric) Fourier matrix

$$F = \frac{1}{\sqrt n}\,W$$

is unitary, i.e. $\overline FF = I$.

Exercise. Prove that F^2 = JP_1 where J is the permutation matrix Je_k = e_{n+1−k}, k = 1, ..., n (J is usually called the anti-identity).

The matrix identity satisfied by P_1 and W can of course be rewritten in terms of F, $P_1F = FD_{1\,\omega\,\cdots\,\omega^{n-1}}$; thus we obtain the equality

$$P_1 = FD_{1\,\omega\,\cdots\,\omega^{n-1}}\overline F,$$

which states that the Fourier matrix diagonalizes the matrix P_1, or, more precisely, that the columns of the Fourier matrix form a system of n orthonormal eigenvectors for the matrix P_1 with corresponding eigenvalues 1, ω, ..., ω^{n−1}.

But if F diagonalizes P_1, then it diagonalizes all polynomials in P_1:

$$P_1^{k-1} = FD_{1\,\omega\,\cdots\,\omega^{n-1}}^{\,k-1}\overline F,$$

$$\sum_{k=1}^na_kP_1^{k-1} = F\Big(\sum_{k=1}^na_kD_{1\,\omega\,\cdots\,\omega^{n-1}}^{\,k-1}\Big)\overline F = F\begin{bmatrix}\sum_{k=1}^na_k & & & \\ & \sum_{k=1}^na_k\omega^{k-1} & & \\ & & \ddots & \\ & & & \sum_{k=1}^na_k\omega^{(n-1)(k-1)}\end{bmatrix}\overline F = F\,d(Wa)\,\overline F = \sqrt n\,F\,d(Fa)\,\overline F,$$

where by d(z) we mean the diagonal matrix whose diagonal entries are z_1, z_2, ..., z_n. Let us investigate the matrices P_1^{k−1}, k = 1, ..., n, and the matrix $\sum_{k=1}^na_kP_1^{k-1}$ in the case n = 4:

$$P_1^0 = I,\quad P_1^1 = \begin{bmatrix}0&1&0&0\\0&0&1&0\\0&0&0&1\\1&0&0&0\end{bmatrix},\quad P_1^2 = \begin{bmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{bmatrix},\quad P_1^3 = \begin{bmatrix}0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\end{bmatrix},$$

$$P_1^4 = P_1^3P_1 = P_1^TP_1 = I = P_1^0,$$

$$\sum_{k=1}^4a_kP_1^{k-1} = \begin{bmatrix}a_1&a_2&a_3&a_4\\a_4&a_1&a_2&a_3\\a_3&a_4&a_1&a_2\\a_2&a_3&a_4&a_1\end{bmatrix} = \sqrt4\,F\,d(Fa)\,\overline F,\qquad F = \frac{1}{\sqrt4}\begin{bmatrix}1&1&1&1\\1&\omega&\omega^2&\omega^3\\1&\omega^2&\omega^4&\omega^6\\1&\omega^3&\omega^6&\omega^9\end{bmatrix},$$

with ω^4 = 1 and ω^j ≠ 1 for 0 < j < 4 (ω = e^{±i2π/4}).

Note that, for n generic, we have the identities e_1^TP_1^{k−1} = e_k^T, k = 1, ..., n, and P_1^n = I (prove them!). So, the set C = {p(P_1)} of all polynomials in P_1 is spanned by the matrices J_k = P_1^{k−1}; the particular polynomial $\sum_{k=1}^na_kJ_k$ is simply denoted by C(a). Note that C(a) is the matrix of C with first row a^T:

$$C(a) = \sum_{k=1}^na_kJ_k = \begin{bmatrix}a_1&a_2&\cdots&a_{n-1}&a_n\\a_n&a_1&\ddots& &a_{n-1}\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ a_3& &\ddots&\ddots&a_2\\ a_2&a_3&\cdots&a_n&a_1\end{bmatrix} = F\,d(F^Ta)\,d(F^Te_1)^{-1}F^{-1}.$$

C is known as the space of circulant matrices.

Exercise. (i) Repeat all, starting from the n × n matrix

$$P_{-1} = \begin{bmatrix}0&1&&\\&0&\ddots&\\&&\ddots&1\\-1&&&0\end{bmatrix}$$

and arriving at the (−1)-circulant matrix whose first row is a^T, a ∈ C^n:

$$C_{-1}(a) = \begin{bmatrix}a_1&a_2&\cdots&a_{n-1}&a_n\\-a_n&a_1&\ddots& &a_{n-1}\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ -a_3& &\ddots&\ddots&a_2\\ -a_2&-a_3&\cdots&-a_n&a_1\end{bmatrix}.$$

(ii) Let T be a Toeplitz n × n matrix, i.e. T = (t_{i−j})_{i,j=1}^n for some t_k ∈ C. Show that T can be written as the sum of a circulant and of a (−1)-circulant, that is, T = C(a) + C_{−1}(b), a, b ∈ C^n.

Why can circulant matrices be interesting in the applications of linear algebra? The main reason is that the matrix-vector product C(a)z can be computed in at most O(n log₂ n) arithmetic operations (whereas, usually, a matrix-vector product requires n² multiplications).
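The following numpy sketch (an illustration added here, not part of the original notes) realizes this fast product: with numpy's DFT conventions the diagonalization derived above becomes C(a)z = fft(fft(a) · ifft(z)), two transforms at product time plus the preprocessing transform of a. The helper name circulant_matvec is ours.

```python
import numpy as np

def circulant_matvec(a, z):
    """C(a) z, where C(a) is the circulant matrix with first row a^T.
    Uses the spectral factorization of C(a): three FFTs instead of O(n^2) work
    (fft(a) can be precomputed once if many products with the same a are needed)."""
    return np.fft.fft(np.fft.fft(a) * np.fft.ifft(z))

# cross-check against the explicit circulant, e.g. for n = 4
n = 4
a, z = np.random.rand(n), np.random.rand(n)
C = np.array([[a[(j - i) % n] for j in range(n)] for i in range(n)])
assert np.allclose(C @ z, circulant_matvec(a, z))
```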

Proposition FFT. Given z ∈ C^n, the complexity of the matrix-vector product Fz is at most O(n log₂ n). This operation is called the discrete Fourier transform (DFT) of z. As a consequence, the matrix-vector product C(a)z is computable by two DFTs (after the preprocessing DFT Fa).

proof: since ω^{(i−1)(k−1)} is the (i, k) entry of W and z_k is the k-th entry of z ∈ C^n, we have

$$(Wz)_i = \sum_{k=1}^n\omega^{(i-1)(k-1)}z_k = \sum_{j=1}^{n/2}\omega^{(i-1)(2j-2)}z_{2j-1} + \sum_{j=1}^{n/2}\omega^{(i-1)(2j-1)}z_{2j}$$
$$= \sum_{j=1}^{n/2}(\omega^2)^{(i-1)(j-1)}z_{2j-1} + \sum_{j=1}^{n/2}\omega^{(i-1)(2(j-1)+1)}z_{2j} = \sum_{j=1}^{n/2}(\omega^2)^{(i-1)(j-1)}z_{2j-1} + \omega^{i-1}\sum_{j=1}^{n/2}(\omega^2)^{(i-1)(j-1)}z_{2j}.$$

Note that ω is in fact a function of n, i.e. the right notation for ω should be ω_n. Then ω² = ω_n² is such that (ω_n²)^{n/2} = 1 and (ω_n²)^i ≠ 1 for 0 < i < n/2; in other words, ω_n² = ω_{n/2}. So, we have the identities

$$(W_nz)_i = \sum_{j=1}^{n/2}\omega_{n/2}^{(i-1)(j-1)}z_{2j-1} + \omega_n^{i-1}\sum_{j=1}^{n/2}\omega_{n/2}^{(i-1)(j-1)}z_{2j},\qquad i = 1,2,\ldots,n.\qquad(\ast)$$


It follows that, for i = 1, ..., n/2,

$$(W_nz)_i = \Big(W_{n/2}\begin{bmatrix}z_1\\z_3\\\vdots\\z_{n-1}\end{bmatrix}\Big)_i + \omega_n^{i-1}\Big(W_{n/2}\begin{bmatrix}z_2\\z_4\\\vdots\\z_n\end{bmatrix}\Big)_i.$$

Moreover, by setting i = n/2 + k, k = 1, ..., n/2, in (∗), we obtain

$$(W_nz)_{\frac n2+k} = \sum_{j=1}^{n/2}\omega_{n/2}^{\frac n2(j-1)}\omega_{n/2}^{(k-1)(j-1)}z_{2j-1} + \omega_n^{\frac n2}\,\omega_n^{k-1}\sum_{j=1}^{n/2}\omega_{n/2}^{\frac n2(j-1)}\omega_{n/2}^{(k-1)(j-1)}z_{2j}$$
$$= \sum_{j=1}^{n/2}\omega_{n/2}^{(k-1)(j-1)}z_{2j-1} - \omega_n^{k-1}\sum_{j=1}^{n/2}\omega_{n/2}^{(k-1)(j-1)}z_{2j} = \Big(W_{n/2}\begin{bmatrix}z_1\\z_3\\\vdots\\z_{n-1}\end{bmatrix}\Big)_k - \omega_n^{k-1}\Big(W_{n/2}\begin{bmatrix}z_2\\z_4\\\vdots\\z_n\end{bmatrix}\Big)_k,\qquad k = 1,\ldots,\tfrac n2$$

($\omega_n^{n/2} = -1$; think ω = e^{±i2π/n}). Thus

$$W_nz = \begin{bmatrix}I & D_{1\,\omega_n\,\cdots\,\omega_n^{n/2-1}}\\ I & -D_{1\,\omega_n\,\cdots\,\omega_n^{n/2-1}}\end{bmatrix}\begin{bmatrix}W_{n/2} & 0\\ 0 & W_{n/2}\end{bmatrix}Qz,\qquad(\ast\ast)$$

where

$$D_{1\,\omega_n\,\cdots\,\omega_n^{n/2-1}} = \begin{bmatrix}1 & & & \\ & \omega_n & & \\ & & \ddots & \\ & & & \omega_n^{n/2-1}\end{bmatrix}$$

and Q is the permutation matrix which sorts the entries of z into odd-indexed followed by even-indexed ones, Qz = [z_1 z_3 ⋯ z_{n−1} z_2 z_4 ⋯ z_n]^T.

If c_n denotes the complexity of the matrix-vector product F_nz then, by the previous formula,

$$c_n \le 2c_{n/2} + rn,\qquad r\ \text{constant}.$$

But this implies c_n = O(n log₂ n). The proof of the last assertion is left to the reader.
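For the reader who wants to see the recursion (∗∗) in executable form, here is a didactic Python sketch (ours, not from the notes), assuming n a power of 2 and choosing ω_n = e^{−i2π/n}; it follows the even-odd splitting literally and is not an optimized FFT.

```python
import numpy as np

def dft_radix2(z):
    """W_n z computed via the splitting (**): two half-size transforms
    on the odd- and even-positioned entries of z, glued by the twiddle
    factors 1, omega_n, ..., omega_n^{n/2 - 1}. Requires len(z) a power of 2."""
    n = len(z)
    if n == 1:
        return z.astype(complex)
    even = dft_radix2(z[0::2])    # W_{n/2} (z_1, z_3, ..., z_{n-1})^T
    odd = dft_radix2(z[1::2])     # W_{n/2} (z_2, z_4, ..., z_n)^T
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

z = np.random.rand(8)
assert np.allclose(dft_radix2(z), np.fft.fft(z))
```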

Of course, any time an n × n matrix U, well defined for all n, satisfies for n even an identity of the type

$$U_n = \big[\,\text{sparse matrix}\,\big]\begin{bmatrix}U_{n/2} & 0\\ 0 & U_{n/2}\end{bmatrix}\big[\,\text{permutation matrix}\,\big],$$

the matrix-vector product U_nz can be computed in at most O(n log₂ n) arithmetic operations. The above identity is verified by at least 10 matrices U: the Fourier transform and its (−1) version, and the eight Hartley-type transforms.

Note, however, that there are also 16 other discrete transforms of complexity O(n log₂ n), the sine-type and the cosine-type transforms. See [],[].

Exercise G. Prove that the n × n matrix G = G_n defined by

$$G_{ij} = \frac{1}{\sqrt n}\Big(\cos\frac{(2i+1)(2j+1)\pi}{2n} + \sin\frac{(2i+1)(2j+1)\pi}{2n}\Big),\qquad i,j = 0,\ldots,n-1,$$

is symmetric, persymmetric, real, unitary, and satisfies the identity

$$G_n = \frac{1}{\sqrt2}\begin{bmatrix}R_+ & R_-\\ -R_-J & R_+J\end{bmatrix}\begin{bmatrix}G_{n/2} & 0\\ 0 & G_{n/2}\end{bmatrix}Q,\qquad R_\pm = D_c \pm D_sJ$$

(J the n/2 × n/2 anti-identity) for some suitable n/2 × n/2 diagonal matrices D_c, D_s. Prove, moreover, that each row of G_n has at least a zero entry when n = 2 + 4s (this amounts to saying, as we will see, that the space {Gd(z)G : z ∈ C^n} is not an h-space for such values of n); and that, instead, for all other n, [G_n]_{1k} ≠ 0 ∀ k (i.e., for all other n, the space {Gd(z)G : z ∈ C^n} is a 1-space).
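A quick numerical check of the claims of Exercise G (a sketch, ours; it verifies rather than proves, and the function name hartley_G is hypothetical):

```python
import numpy as np

def hartley_G(n):
    """The matrix G = G_n of Exercise G, indices i, j = 0, ..., n-1."""
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    theta = (2 * i + 1) * (2 * j + 1) * np.pi / (2 * n)
    return (np.cos(theta) + np.sin(theta)) / np.sqrt(n)

for n in [2, 3, 4, 5, 6, 7, 8, 10]:
    G = hartley_G(n)
    assert np.allclose(G, G.T)                 # symmetric
    assert np.allclose(G @ G.T, np.eye(n))     # real and unitary (orthogonal)
    zeros_per_row = (np.abs(G) < 1e-12).sum(axis=1)
    if n % 4 == 2:                             # n = 2 + 4s: every row has a zero
        assert zeros_per_row.min() >= 1
    else:                                      # other n: the first row is zero-free
        assert zeros_per_row[0] == 0
```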

Exercise. Prove that the space C₋₁^S + JC₋₁^{SK}, with C₋₁^S the symmetric n × n (−1)-circulants and C₋₁^{SK} the skew-symmetric (−1)-circulants, is a commutative matrix algebra (a matrix A is skew-symmetric if A^T = −A).

The sine matrix and the (commutative) algebra of τ matrices

Consider the n × n matrix

$$P_0 + P_0^T = \begin{bmatrix}0&1&&&\\1&0&1&&\\&1&0&\ddots&\\&&\ddots&\ddots&1\\&&&1&0\end{bmatrix},$$

and set J_1 = I and J_2 = P_0 + P_0^T. Note that e_1^TJ_1 = e_1^T, e_1^TJ_2 = e_2^T. Moreover, since

$$(P_0+P_0^T)^2 = \begin{bmatrix}1&0&1&&&\\0&2&0&1&&\\1&0&2&0&\ddots&\\&1&0&\ddots&\ddots&1\\&&\ddots&\ddots&2&0\\&&&1&0&1\end{bmatrix} = \begin{bmatrix}0&0&1&&&\\0&1&0&1&&\\1&0&1&0&\ddots&\\&1&0&\ddots&\ddots&1\\&&\ddots&\ddots&1&0\\&&&1&0&0\end{bmatrix} + I,$$

we have e_1^T((P_0+P_0^T)² − I) = [0 0 1 0 ⋯ 0] = e_3^T. Set J_3 = (P_0+P_0^T)² − I = J_2(P_0+P_0^T) − J_1; then e_1^TJ_3 = e_3^T.

More generally, set J_{i+1} = J_i(P_0+P_0^T) − J_{i−1}, i = 2, 3, ..., n−1. The matrix J_{i+1} is a polynomial in P_0+P_0^T of degree i with the property e_1^TJ_{i+1} = e_{i+1}^T.

proof: assume e_1^TJ_j = e_j^T, j = 1, ..., i; then

$$e_1^TJ_{i+1} = e_1^T(J_i(P_0+P_0^T) - J_{i-1}) = (e_i^T(P_0+P_0^T)) - e_{i-1}^T = (e_{i-1}^T + e_{i+1}^T) - e_{i-1}^T = e_{i+1}^T.$$

Since J_1, J_2, ..., J_n are linearly independent, we can say that they span the set {p(P_0+P_0^T)} of all polynomials in the matrix P_0+P_0^T (use the Cayley-Hamilton theorem). We call such set τ. Note that the matrices of τ are determined once their first row is known; with the symbol τ(a) we denote the matrix of τ whose first row is a^T, i.e. the matrix $\sum_k a_kJ_k$.

Let us find a useful representation of τ and, in particular, of τ(a). First observe that the following vector equalities hold:

 1 1 0 1

1

1 1

sin n+1 sin n+1 2jπ sin n+1 njπ

= 2 cos jπ n + 1

sin n+1 sin n+1 2jπ sin n+1 njπ

, j = 1, . . . , n.

Such n equalities can be rewritten as a single matrix identity (P_0+P_0^T)S = SD, where S is the matrix

$$S_{ij} = \sqrt{\frac{2}{n+1}}\,\sin\frac{ij\pi}{n+1},\qquad i,j = 1,\ldots,n,$$

and D is the diagonal matrix with diagonal entries $D_{jj} = 2\cos\frac{j\pi}{n+1}$. Note that the matrix S, called the sine matrix, is real, symmetric and unitary (prove it!).

Remark. Let F_{2(n+1)} be the Fourier matrix of order 2(n+1). Then the sine matrix S satisfies the following relation:

$$i\,\big(I - F_{2(n+1)}^2\big)F_{2(n+1)} = \begin{bmatrix}0 & 0^T & 0 & 0^T\\ 0 & S & 0 & -SJ\\ 0 & 0^T & 0 & 0^T\\ 0 & -JS & 0 & JSJ\end{bmatrix}$$

(note that F_{2(n+1)}² is a permutation matrix). As a consequence, a sine transform can be computed by performing a discrete Fourier transform.
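For instance, the following sketch (ours, with numpy's FFT conventions) computes Sz through a single DFT of order 2(n+1), applied to the odd extension [0, z^T, 0, −(Jz)^T]^T of z; the imaginary part of the transform reproduces the sine sums.

```python
import numpy as np

def sine_transform(z):
    """S z for S_ij = sqrt(2/(n+1)) sin(ij pi/(n+1)), via one FFT of length 2(n+1)."""
    n = len(z)
    y = np.concatenate([[0.0], z, [0.0], -z[::-1]])   # odd extension, length 2(n+1)
    return -np.imag(np.fft.fft(y))[1:n + 1] / np.sqrt(2 * (n + 1))

n = 7
S = np.sqrt(2 / (n + 1)) * np.sin(
    np.outer(np.arange(1, n + 1), np.arange(1, n + 1)) * np.pi / (n + 1))
z = np.random.rand(n)
assert np.allclose(S @ z, sine_transform(z))
```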

So, the columns of the sine matrix S form a system of orthonormal eigenvectors for the matrix P_0+P_0^T. In other words, the unitary matrix S diagonalizes P_0+P_0^T and, of course, diagonalizes any polynomial in P_0+P_0^T, i.e. any τ matrix:

$$P_0+P_0^T = SDS,\qquad (P_0+P_0^T)^k = SD^kS,$$
$$\tau = \{p(P_0+P_0^T)\} = \Big\{\sum_{k=1}^na_k(P_0+P_0^T)^{k-1} : a_k \in \mathbb C\Big\} = \{Sd(z)S : z \in \mathbb C^n\}.$$

In particular, it is clear that the matrix of τ with first row a^T is

$$\tau(a) = \sum_{k=1}^na_kJ_k = S\,d(S^Ta)\,d(S^Te_1)^{-1}S^{-1}.$$

The latter formula states that matrix-vector products involving τ matrices have complexity at most O(n log 2 n).
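A minimal sketch of such a product (ours), assuming the representation above together with S = S^T = S^{−1}; here the dense S stands in for a fast sine transform such as the one sketched in the previous Remark:

```python
import numpy as np

def tau_matvec(a, z):
    """tau(a) z via three sine transforms:
    tau(a) = S d(S a) d(S e_1)^{-1} S, with S real, symmetric, S^{-1} = S."""
    n = len(a)
    S = np.sqrt(2 / (n + 1)) * np.sin(
        np.outer(np.arange(1, n + 1), np.arange(1, n + 1)) * np.pi / (n + 1))
    e1 = np.zeros(n); e1[0] = 1.0
    return S @ ((S @ a) / (S @ e1) * (S @ z))   # (S e_1)_i != 0 for all i

# cross-check on the explicit tau(a) displayed below for n = 4
a, z = np.random.rand(4), np.random.rand(4)
T4 = np.array([[a[0], a[1], a[2], a[3]],
               [a[1], a[0] + a[2], a[1] + a[3], a[2]],
               [a[2], a[1] + a[3], a[0] + a[2], a[1]],
               [a[3], a[2], a[1], a[0]]])
assert np.allclose(T4 @ z, tau_matvec(a, z))
```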

Proposition PC. Given X ∈ C^{n×n} generic, we have

$$\{p(X)\} \subset \{A \in \mathbb C^{n\times n} : AX = XA\},\qquad \dim\{p(X)\} \le n \le \dim\{A \in \mathbb C^{n\times n} : AX = XA\},$$

and if one equality holds then the other equality holds too. So, X is non-derogatory if and only if dim{p(X)} = n = dim{A ∈ C^{n×n} : AX = XA}.

The above Proposition suggests the following further representation of the space τ:

$$\tau = \{A \in \mathbb C^{n\times n} : A(P_0+P_0^T) = (P_0+P_0^T)A\}.$$

The fact that any matrix of τ must commute with P_0+P_0^T is equivalent to requiring that the following n² cross-sum conditions hold:

$$a_{i,j-1} + a_{i,j+1} = a_{i-1,j} + a_{i+1,j},\qquad i,j = 1,\ldots,n,$$

where we have set a_{0,j} = a_{n+1,j} = a_{i,0} = a_{i,n+1} = 0, i, j = 1, ..., n. We can use such conditions in order to write down the generic τ matrix whose first row is [a_1 a_2 ⋯ a_n]. For example, for n = 4, we can say that

$$\tau(a) = \begin{bmatrix}a_1&a_2&a_3&a_4\\a_2&a_1+a_3&a_2+a_4&a_3\\a_3&a_2+a_4&a_1+a_3&a_2\\a_4&a_3&a_2&a_1\end{bmatrix},\qquad J_1 = I,\qquad J_2 = \begin{bmatrix}0&1&0&0\\1&0&1&0\\0&1&0&1\\0&0&1&0\end{bmatrix},$$

$$J_3 = \begin{bmatrix}0&0&1&0\\0&1&0&1\\1&0&1&0\\0&1&0&0\end{bmatrix},\qquad J_4 = \begin{bmatrix}0&0&0&1\\0&0&1&0\\0&1&0&0\\1&0&0&0\end{bmatrix},\qquad J_2-J_4 = \begin{bmatrix}0&1&0&-1\\1&0&0&0\\0&0&0&1\\-1&0&1&0\end{bmatrix},$$

and so on.

Exercise. Prove that for n even the matrix J 2 is invertible, and, if possible, compute the inverse.

solution: We know that if J_2 is invertible, then J_2^{−1} ∈ τ (from the fact that J_2 commutes with P_0+P_0^T it follows that also J_2^{−1} commutes with P_0+P_0^T!). Thus J_2^{−1} = τ(z) for some z ∈ C^n. Note that the matrix identity τ(z)J_2 = I is equivalent to the vector identity z^TJ_2 = e_1^T. So, for example, for n = 4 we have the condition

$$[z_1\ z_2\ z_3\ z_4]\begin{bmatrix}0&1&0&0\\1&0&1&0\\0&1&0&1\\0&0&1&0\end{bmatrix} = [1\ 0\ 0\ 0],$$

which yields z_1 = 0, z_2 = 1, z_3 = 0, z_4 = −1, and thus

$$J_2^{-1} = \begin{bmatrix}0&1&0&-1\\1&0&0&0\\0&0&0&1\\-1&0&1&0\end{bmatrix} = J_2 - J_4.$$

The proof for n = 6, 8, ... is left to the reader.
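That check can also be carried out numerically for any even n (a sketch, ours; tau_of builds τ(z) through the spectral formula of this section):

```python
import numpy as np

def tau_of(z_row):
    """The tau matrix with first row z_row^T: tau(z) = S d(Sz) d(Se_1)^{-1} S."""
    n = len(z_row)
    S = np.sqrt(2 / (n + 1)) * np.sin(
        np.outer(np.arange(1, n + 1), np.arange(1, n + 1)) * np.pi / (n + 1))
    e1 = np.zeros(n); e1[0] = 1.0
    return S @ np.diag((S @ z_row) / (S @ e1)) @ S

for n in [4, 6, 8, 10]:
    J2 = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    z = np.linalg.solve(J2, np.eye(n)[0])   # z^T J2 = e_1^T (J2 is symmetric)
    assert np.allclose(tau_of(z) @ J2, np.eye(n))
```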

Exercise Ttau. Let T be a symmetric Toeplitz n × n matrix, i.e. T = (t_{|i−j|})_{i,j=1}^n for some t_k ∈ C. Show that T = A + B where A is a τ matrix of order n and

$$B = \begin{bmatrix}0 & 0^T & 0\\ 0 & R & 0\\ 0 & 0^T & 0\end{bmatrix},\qquad R \in \tau,\ R \in \mathbb C^{(n-2)\times(n-2)}.$$

Exercise. Write down rank one τ matrices (try first n = 2, n = 3, n = 4, n = 5, n = 6, . . .)

Hessenberg algebras

Let X be a lower Hessenberg 3 × 3 matrix

$$X = \begin{bmatrix}a_{11} & b_1 & 0\\ a_{21} & a_{22} & b_2\\ a_{31} & a_{32} & a_{33}\end{bmatrix}.$$

We now show that the space of all polynomials in X is spanned by three matrices J_1, J_2, J_3 such that e_1^TJ_k = e_k^T, k = 1, 2, 3, provided that X_{i,i+1} ≠ 0, i = 1, 2.

As J_1 we take the identity, J_1 = X^0 = I. Let us define J_2:

$$X - a_{11}I = \begin{bmatrix}0 & b_1 & 0\\ a_{21} & a_{22}-a_{11} & b_2\\ a_{31} & a_{32} & a_{33}-a_{11}\end{bmatrix},\qquad b_1 \ne 0\ \Rightarrow\ J_2 = \frac{1}{b_1}(X - a_{11}I) = \begin{bmatrix}0 & 1 & 0\\ a_{21}/b_1 & (a_{22}-a_{11})/b_1 & b_2/b_1\\ a_{31}/b_1 & a_{32}/b_1 & (a_{33}-a_{11})/b_1\end{bmatrix}.$$

Note that e_1^TJ_2 = e_2^T. Then, let us define J_3:

$$J_2^2 = \begin{bmatrix}a_{21}/b_1 & (a_{22}-a_{11})/b_1 & b_2/b_1\\ \ast & \ast & \ast\\ \ast & \ast & \ast\end{bmatrix},\qquad b_2 \ne 0\ \Rightarrow\ J_3 = \frac{b_1}{b_2}\Big(J_2^2 - \frac{a_{21}}{b_1}\,I - \frac{a_{22}-a_{11}}{b_1}\,J_2\Big).$$

Note that e_1^TJ_3 = e_3^T. Finally, the Cayley-Hamilton theorem yields the thesis, {p(X)} = Span{J_1, J_2, J_3}.
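The inductive construction just carried out for n = 3 extends verbatim to a generic lower Hessenberg X with nonzero superdiagonal; the following Python sketch (ours, with the hypothetical name hessenberg_basis) builds the whole basis and checks the property e_1^TJ_k = e_k^T:

```python
import numpy as np

def hessenberg_basis(X):
    """J_1, ..., J_n spanning {p(X)}, with e_1^T J_k = e_k^T, for X lower
    Hessenberg with X[i, i+1] != 0 (each J_k is a polynomial in X by construction)."""
    n = X.shape[0]
    J = [np.eye(n)]
    for k in range(n - 1):
        M = J[k] @ X                    # first row of M = row k+1 of X
        for i in range(k + 1):
            M -= X[k, i] * J[i]         # clear the first k+1 entries of the first row
        J.append(M / X[k, k + 1])       # normalize the remaining superdiagonal entry
    return J

n = 5
X = np.tril(np.random.rand(n, n)) + np.diag(np.random.rand(n - 1) + 0.5, 1)
for k, Jk in enumerate(hessenberg_basis(X)):
    assert np.allclose(Jk[0], np.eye(n)[k])
```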

The following Proposition generalizes the above remarks to a generic n. For a detailed proof see [].

Proposition. Let X be a lower Hessenberg n × n matrix. Then the space H_X of all polynomials in X,

$$H_X = \Big\{\sum_{k=1}^n\alpha_kX^{k-1} : \alpha_k \in \mathbb C\Big\},$$

is spanned by n matrices J_1, ..., J_n such that e_1^TJ_k = e_k^T, k = 1, ..., n, provided that X_{i,i+1} ≠ 0, i = 1, ..., n−1; in such case H_X is called a Hessenberg algebra, and for any a ∈ C^n there is a unique matrix in H_X with first row a^T, which is denoted by H_X(a), i.e. $H_X(a) = \sum_k a_kJ_k$.

Of course, by Proposition PC, any Hessenberg algebra H_X also admits the following representation:

$$H_X = \{A \in \mathbb C^{n\times n} : AX = XA\}.$$

Until now we have seen two examples of Hessenberg algebras: the ξ-circulants, ξ ≠ 0 (X = P_ξ), and the τ matrices (X = P_0+P_0^T). Both can be simultaneously diagonalized by a suitable matrix. An example of a Hessenberg algebra whose matrices cannot be simultaneously diagonalized is H_{P_0}, the space of all upper triangular Toeplitz matrices. The matrix H_{P_0}(a) is displayed here below:

$$H_{P_0}(a) = \begin{bmatrix}a_1 & a_2 & \cdots & a_n\\ & a_1 & \ddots & \vdots\\ & & \ddots & a_2\\ & & & a_1\end{bmatrix}.$$

Even the matrix P_0, i.e. the matrix generating the space, is not diagonalizable. (We shall see, however, that H_{P_0} can be embedded in the space of 2n × 2n circulants, which are diagonalizable.)

Note that when X = P_0 or, more generally, when X = P_ξ, the matrices J_k are simply the powers of X, i.e. J_k = X^{k−1}. For example, for n = 3,

$$J_1 = I,\qquad J_2 = X = \begin{bmatrix}0&1&0\\0&0&1\\\xi&0&0\end{bmatrix},\qquad J_3 = X^2 = \begin{bmatrix}0&0&1\\\xi&0&0\\0&\xi&0\end{bmatrix}.$$

Hessenberg algebras make up a subclass of the commutative matrix algebras in the class of 1-spaces defined here below.

Definition. L ⊂ C^{n×n} is said to be a 1-space if L = Span{J_1, ..., J_n} with J_k such that e_1^TJ_k = e_k^T, k = 1, ..., n.

An example of a 1-space L which is not a Hessenberg algebra is the following:

$$L = \Big\{\begin{bmatrix}A & JB\\ JB & A\end{bmatrix} : A, B\ \tfrac n2\times\tfrac n2\ \text{circulants}\Big\}.$$

One can easily prove that L is a non-commutative matrix algebra. An example of a 1-space which is not a matrix algebra is the set of all n × n symmetric Toeplitz matrices (see the next section).

Toeplitz linear systems and displacement decompositions

An n × n Toeplitz matrix is a matrix of the form T = (t_{i−j})_{i,j=1}^n. In applications one often has to solve Toeplitz linear systems Tx = b. For example, here below is a 3 × 3 Toeplitz matrix:

$$T = \begin{bmatrix}t_0 & t_{-1} & t_{-2}\\ t_1 & t_0 & t_{-1}\\ t_2 & t_1 & t_0\end{bmatrix}.$$

An example of a Toeplitz matrix T is the coefficient matrix of the linear system arising when solving, by finite differences or by finite elements, the boundary value differential problem −u″ = f, u(a) = α, u(b) = β. In such case T is symmetric and its first row is [2 −1 0 ⋯ 0]. Another example, important in applied probability, is T = (t^{|i−j|})_{i,j=1}^n with |t| (t ∈ C) less than 1.

Exercise. The vector space of all n × n symmetric Toeplitz matrices is a 1-space, being (t_{|i−j|})_{i,j=1}^n equal to the sum $\sum_{k=1}^n t_{k-1}J_k$ where the J_k are the symmetric Toeplitz matrices with first row e_k^T. Prove that such 1-space is not a matrix algebra.

In the framework of displacement theory it is possible to obtain decompositions of T^{−1} involving matrices from Hessenberg algebras (or from more general commutative h-spaces) of the type

$$T^{-1} = \sum_{k=1}^\alpha M_kN_k,\qquad 2 \le \alpha \le 4.$$

Usually, the matrices M_k and N_k appearing in such formulas can be multiplied by a vector in O(n log₂ n) arithmetic operations; thus fast direct solvers of Toeplitz linear systems naturally arise. Here below is one example of such formulas:

$$T^{-1} = L_1U_1 + L_2U_2.\qquad(\mathrm{GS})$$

The L_j and U_j are suitable lower and upper triangular Toeplitz matrices, i.e. elements, or transposes of elements, of the Hessenberg algebra H_{P_0}.

Remark. By the Gohberg-Semencul formula (GS), if the L j and U j are known (a way to obtain them is indicated in []), then the matrix-vector product T −1 b can be computed in at most O(n log 2 n) arithmetic operations. That is, assuming preprocessing on T , the complexity of the problem of solving any Toeplitz system T x = b is at most O(n log 2 n).

proof: it is enough to prove that any Toeplitz matrix (in particular the triangular ones) can be multiplied by a vector by means of a finite number of discrete Fourier transforms. The latter result is immediate if we observe that any n × n Toeplitz matrix can be embedded into a (2n − 1 + k) × (2n − 1 + k), k ≥ 0, circulant matrix; for example, if n = 3 we have

$$T = \begin{bmatrix}t_0 & t_{-1} & t_{-2}\\ t_1 & t_0 & t_{-1}\\ t_2 & t_1 & t_0\end{bmatrix},\qquad C = \begin{bmatrix}t_0 & t_{-1} & t_{-2} & t_2 & t_1\\ t_1 & t_0 & t_{-1} & t_{-2} & t_2\\ t_2 & t_1 & t_0 & t_{-1} & t_{-2}\\ t_{-2} & t_2 & t_1 & t_0 & t_{-1}\\ t_{-1} & t_{-2} & t_2 & t_1 & t_0\end{bmatrix}$$

(k = 0). It is clear that the vector Tz is the first part of the vector $C\begin{bmatrix}z\\0\end{bmatrix}$.
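In executable form (a sketch, ours, under numpy's FFT conventions; it uses the slightly larger 2n × 2n embedding, i.e. k = 1 above, which is more convenient to index):

```python
import numpy as np

def toeplitz_matvec(t_col, t_row, z):
    """T z for the Toeplitz matrix with first column t_col and first row t_row,
    via embedding into a 2n x 2n circulant and one circulant-FFT product."""
    n = len(z)
    # first row of the circulant: [t_0, t_{-1}, ..., t_{-(n-1)}, *, t_{n-1}, ..., t_1]
    c_row = np.concatenate([t_row, [0.0], t_col[:0:-1]])
    zz = np.concatenate([z, np.zeros(n)])
    return np.fft.fft(np.fft.fft(c_row) * np.fft.ifft(zz))[:n].real  # real data here

n = 5
t = np.random.rand(2 * n - 1)           # t_{-(n-1)}, ..., t_0, ..., t_{n-1}
T = np.array([[t[n - 1 + i - j] for j in range(n)] for i in range(n)])
z = np.random.rand(n)
assert np.allclose(T @ z, toeplitz_matvec(T[:, 0], T[0, :], z))
```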

The above representation (GS) for the inverse of a Toeplitz matrix, which can be very useful in order to solve Toeplitz linear systems efficiently [], follows from the displacement decomposition formula stated in the following theorem.

Theorem DD. Let X be a lower Hessenberg n × n matrix. Assume that X_{ij} = X_{i+1,j+1}, i, j = 1, ..., n−1 (X has Toeplitz structure), and that b = X_{i,i+1} ≠ 0, and consider the (commutative) Hessenberg algebra H_X generated by X (note that H_X is a 1-space). Assume that A ∈ C^{n×n} is such that $AX - XA = \sum_{m=1}^\alpha x_my_m^T$. Then

$$bA = -\sum_{m=1}^\alpha H_{P_0}(\tilde x_m)^TH_X(y_m) + bH_X(A^Te_1),\qquad(\mathrm{DD})$$

where, for z ∈ C^n, H_{P_0}(z) is the upper triangular Toeplitz matrix with first row z^T, H_X(z) is the matrix of H_X with first row z^T, and z̃ is the vector [0 z_1 ⋯ z_{n−1}]^T.

Note: Besides (DD), several other displacement decompositions hold, which can be general like (DD), i.e. representing generic matrices A, or specialized for centrosymmetric A (see []). Such decompositions yield formulas for the inverses of Toeplitz, Toeplitz plus Hankel, and Toeplitz plus Hankel-like matrices, useful in order to solve Toeplitz plus Hankel-like linear systems. Recall that a Hankel matrix is nothing else than a matrix of the form JT where T is Toeplitz (the well-known Hilbert matrix is an example of a Hankel matrix).

In order to prove Theorem DD, the following Lemma is fundamental.

Lemma []. Let L be a commutative 1-space of n × n matrices, i.e. $L = \{\sum_k\alpha_kJ_k : \alpha_k \in \mathbb C\}$, with J_k ∈ C^{n×n} such that e_1^TJ_k = e_k^T and J_kJ_s = J_sJ_k. Let X be an element of L, and assume that A ∈ C^{n×n} is such that AX − XA = xy^T. Then x^TL(y)^T = 0^T.

proof: note that the equality J_kJ_s = J_sJ_k implies e_1^TJ_kJ_s = e_1^TJ_sJ_k, i.e. e_k^TJ_s = e_s^TJ_k ∀ s, k; thus

$$x^TL(y)^Te_r = x^T\Big(\sum_ky_kJ_k\Big)^Te_r = x^T\sum_ky_kJ_k^Te_r = x^T\sum_ky_kJ_r^Te_k = x^TJ_r^T\sum_ky_ke_k = x^TJ_r^Ty$$
$$= \sum_{i,j}x_iy_j[J_r^T]_{ij} = \sum_{i,j}x_iy_j[J_r]_{ji} = \sum_{i,j}[AX-XA]_{ij}[J_r]_{ji} = \sum_i[(AX-XA)J_r]_{ii}$$
$$= \sum_i[(AJ_r)X - X(AJ_r)]_{ii} = \operatorname{tr}((AJ_r)X) - \operatorname{tr}(X(AJ_r)) = 0$$

(recall that the two matrices MN and NM, M, N ∈ C^{n×n}, have the same characteristic polynomial, even if (in case det(M) = det(N) = 0) MN and NM might not be similar to each other).

We now report a draft of the proof of Theorem DD (for a more detailed proof see []). In order to obtain the equality (DD), which is of the type bA = E + bH_X(A^Te_1), it is enough to prove that

$$EX - XE = (bA)X - X(bA),\qquad(\ast\ast\ast)$$

and to observe that the first row of E is null. In fact, the above equality implies (bA − E)X − X(bA − E) = 0, and thus bA − E ∈ H_X. The Lemma, applied for L = H_X, is fundamental in proving (∗∗∗).

A matrix algebra which is not a 1-space: the class of spaces in V

Remember that a matrix A is said to be symmetric if A^T = A (a_{ji} = a_{ij}), skew-symmetric if A^T = −A (a_{ji} = −a_{ij}), and persymmetric if A^T = JAJ (a_{ji} = a_{n+1−i,n+1−j}).

(12)

Consider a 6 × 6 symmetric (−1)-circulant matrix A ∈ C −1 S ,

A =

a 0 a 1 a 2 0 −a 2 −a 1

a 1 a 0 a 1 a 2 0 −a 2

a 2 a 1 a 0 a 1 a 2 0 0 a 2 a 1 a 0 a 1 a 2

−a 2 0 a 2 a 1 a 0 a 1

−a 1 −a 2 0 a 2 a 1 a 0

 ,

a 6 × 6 skew-symmetric (−1)-circulant matrix B ∈ C₋₁^{SK}, and the matrix JB:

$$B = \begin{bmatrix}0 & b_1 & b_2 & b_3 & b_2 & b_1\\ -b_1 & 0 & b_1 & b_2 & b_3 & b_2\\ -b_2 & -b_1 & 0 & b_1 & b_2 & b_3\\ -b_3 & -b_2 & -b_1 & 0 & b_1 & b_2\\ -b_2 & -b_3 & -b_2 & -b_1 & 0 & b_1\\ -b_1 & -b_2 & -b_3 & -b_2 & -b_1 & 0\end{bmatrix},\qquad JB = \begin{bmatrix}-b_1 & -b_2 & -b_3 & -b_2 & -b_1 & 0\\ -b_2 & -b_3 & -b_2 & -b_1 & 0 & b_1\\ -b_3 & -b_2 & -b_1 & 0 & b_1 & b_2\\ -b_2 & -b_1 & 0 & b_1 & b_2 & b_3\\ -b_1 & 0 & b_1 & b_2 & b_3 & b_2\\ 0 & b_1 & b_2 & b_3 & b_2 & b_1\end{bmatrix}.$$

The vector space γ of all matrices of the type A + JB has dimension equal to 6 (recall that dim(A + JB) = dim(A) + dim(JB) − dim(A ∩ JB)).

We now show that there is no basis {J_k} for γ such that e_1^TJ_k = e_k^T, k = 1, ..., 6, i.e. γ is not a 1-space. Note that

$$e_1^T(A + JB) = [(a_0-b_1)\ (a_1-b_2)\ (a_2-b_3)\ (-b_2)\ (-a_2-b_1)\ (-a_1)],$$

so the equality e_1^T(A + JB) = e_2^T is satisfied if and only if

$$a_0-b_1 = 0,\quad a_1-b_2 = 1,\quad a_2-b_3 = 0,\quad -b_2 = 0,\quad -a_2-b_1 = 0,\quad -a_1 = 0.$$

Since we have both the conditions b_2 = 0 and b_2 = −1, a matrix J_2 ∈ γ such that e_1^TJ_2 = e_2^T cannot exist.

However, there exists a basis {J_k} of γ satisfying the equalities (e_1+e_6)^TJ_k = e_k^T, k = 1, ..., 6. For example, a matrix J_2 ∈ γ with the property (e_1+e_6)^TJ_2 = e_2^T is obtained as follows. Note that

$$e_6^T(A + JB) = [(-a_1)\ (-a_2+b_1)\ (b_2)\ (a_2+b_3)\ (a_1+b_2)\ (a_0+b_1)],$$

so, for the sum of the first and of the sixth rows of A + JB, we obtain the formula

$$(e_1+e_6)^T(A+JB) = [(a_0-b_1-a_1)\ (a_1-b_2-a_2+b_1)\ (a_2-b_3+b_2)\ (-b_2+a_2+b_3)\ (-a_2-b_1+a_1+b_2)\ (-a_1+a_0+b_1)].$$

Thus the condition (e_1+e_6)^T(A+JB) = e_2^T is satisfied if and only if the following system of equations has a solution:

$$a_0-b_1-a_1 = 0,\quad a_1-b_2-a_2+b_1 = 1,\quad a_2-b_3+b_2 = 0,\quad -b_2+a_2+b_3 = 0,\quad -a_2-b_1+a_1+b_2 = 0,\quad -a_1+a_0+b_1 = 0,$$

and such system has the unique solution a_0 = a_1 = 1/2, a_2 = 0, b_2 = b_3 = −1/2, b_1 = 0. The matrix J_2 ∈ γ such that (e_1+e_6)^TJ_2 = e_2^T is displayed here below:

$$J_2 = \begin{bmatrix}\tfrac12 & 1 & \tfrac12 & \tfrac12 & 0 & -\tfrac12\\ 1 & 1 & 1 & 0 & 0 & 0\\ \tfrac12 & 1 & \tfrac12 & \tfrac12 & 0 & -\tfrac12\\ \tfrac12 & 0 & \tfrac12 & \tfrac12 & 0 & -\tfrac12\\ 0 & 0 & 0 & 0 & 0 & 0\\ -\tfrac12 & 0 & -\tfrac12 & -\tfrac12 & 0 & \tfrac12\end{bmatrix}.$$

Analogously, one obtains the other J_k such that (e_1+e_6)^TJ_k = e_k^T:

$$J_1 = \begin{bmatrix}1&\tfrac12&\tfrac12&\tfrac12&\tfrac12&0\\ \tfrac12&1&\tfrac12&\tfrac12&0&-\tfrac12\\ \tfrac12&\tfrac12&1&0&-\tfrac12&-\tfrac12\\ \tfrac12&\tfrac12&0&0&-\tfrac12&-\tfrac12\\ \tfrac12&0&-\tfrac12&-\tfrac12&0&-\tfrac12\\ 0&-\tfrac12&-\tfrac12&-\tfrac12&-\tfrac12&0\end{bmatrix}\quad(a_1 = a_2 = 0,\ a_0 = \tfrac12,\ b_1 = -\tfrac12,\ b_2 = b_3 = -\tfrac12),$$

$$J_3 = \begin{bmatrix}\tfrac12&\tfrac12&1&0&-\tfrac12&-\tfrac12\\ \tfrac12&1&\tfrac12&\tfrac12&0&-\tfrac12\\ 1&\tfrac12&\tfrac12&\tfrac12&\tfrac12&0\\ 0&\tfrac12&\tfrac12&\tfrac12&\tfrac12&0\\ -\tfrac12&0&\tfrac12&\tfrac12&0&\tfrac12\\ -\tfrac12&-\tfrac12&0&0&\tfrac12&\tfrac12\end{bmatrix}\quad(a_0 = a_1 = a_2 = \tfrac12,\ b_1 = b_2 = 0,\ b_3 = -\tfrac12),$$

$$J_4 = \begin{bmatrix}\tfrac12&\tfrac12&0&0&-\tfrac12&-\tfrac12\\ \tfrac12&0&\tfrac12&\tfrac12&0&-\tfrac12\\ 0&\tfrac12&\tfrac12&\tfrac12&\tfrac12&0\\ 0&\tfrac12&\tfrac12&\tfrac12&\tfrac12&1\\ -\tfrac12&0&\tfrac12&\tfrac12&1&\tfrac12\\ -\tfrac12&-\tfrac12&0&1&\tfrac12&\tfrac12\end{bmatrix}\quad(a_0 = a_1 = a_2 = \tfrac12,\ b_1 = b_2 = 0,\ b_3 = \tfrac12),$$

$$J_5 = \begin{bmatrix}\tfrac12&0&-\tfrac12&-\tfrac12&0&-\tfrac12\\ 0&0&0&0&0&0\\ -\tfrac12&0&\tfrac12&\tfrac12&0&\tfrac12\\ -\tfrac12&0&\tfrac12&\tfrac12&1&\tfrac12\\ 0&0&0&1&1&1\\ -\tfrac12&0&\tfrac12&\tfrac12&1&\tfrac12\end{bmatrix}\quad(a_0 = a_1 = \tfrac12,\ a_2 = 0,\ b_1 = 0,\ b_2 = b_3 = \tfrac12),$$

$$J_6 = \begin{bmatrix}0&-\tfrac12&-\tfrac12&-\tfrac12&-\tfrac12&0\\ -\tfrac12&0&-\tfrac12&-\tfrac12&0&\tfrac12\\ -\tfrac12&-\tfrac12&0&0&\tfrac12&\tfrac12\\ -\tfrac12&-\tfrac12&0&1&\tfrac12&\tfrac12\\ -\tfrac12&0&\tfrac12&\tfrac12&1&\tfrac12\\ 0&\tfrac12&\tfrac12&\tfrac12&\tfrac12&1\end{bmatrix}\quad(a_1 = a_2 = 0,\ a_0 = \tfrac12,\ b_1 = b_2 = b_3 = \tfrac12).$$

Of course γ = Span{J_1, J_2, J_3, J_4, J_5, J_6} (the J_k are linearly independent!). The matrix $\sum_k a_kJ_k$ is denoted by γ(a). Note that (e_1+e_6)^Tγ(a) = a^T, so γ(a) is called the matrix of γ whose (e_1+e_6)-row is a^T.

More generally, one can easily prove that the set γ = C₋₁^S + JC₋₁^{SK}, with C₋₁^S the n × n symmetric (−1)-circulants and C₋₁^{SK} the n × n skew-symmetric (−1)-circulants, is a vector space of dimension n, and is a commutative matrix algebra. Moreover, it is a 1-space if and only if n is one of the integers {3, 4, 5, 7, 8, 9, 11, 12, 13, 15, ...}; for the remaining values of n, i.e. for n = 2 + 4s, s ∈ Z, no row of a matrix A + JB of γ determines A + JB, that is, there is no index h for which there exists a basis {J_k} of γ with the property e_h^TJ_k = e_k^T. Instead, for all n the sum of the first and of the nth row of A + JB determines A + JB, i.e. there exists a basis {J_k} of γ with the property (e_1+e_n)^TJ_k = e_k^T.


The matrices of γ can be simultaneously diagonalized by a fast discrete transform. More precisely, for any value of n the following equality holds:

$$\gamma = \{Gd(z)G^{-1} : z \in \mathbb C^n\},\qquad G_{ij} = \frac{1}{\sqrt n}\Big(\cos\frac{(2i-1)(2j-1)\pi}{2n} + \sin\frac{(2i-1)(2j-1)\pi}{2n}\Big),\quad i,j = 1,\ldots,n$$

(see []). Note that the matrix G is real, symmetric, persymmetric and unitary. The fact that the matrix-vector product Gz can be computed in at most O(n log₂ n) arithmetic operations follows from the representation of G_n, G_n := G, stated in Exercise G.

Note that for n = 6 the matrix G has ten zeros among its entries, and these zeros are positioned as follows (∗ marks a nonzero entry):

$$G = \begin{bmatrix}\ast&\ast&\ast&\ast&0&\ast\\ \ast&0&\ast&0&\ast&0\\ \ast&\ast&\ast&\ast&0&\ast\\ \ast&0&\ast&\ast&\ast&\ast\\ 0&\ast&0&\ast&0&\ast\\ \ast&0&\ast&\ast&\ast&\ast\end{bmatrix}.$$

Thus each of the vectors G^Te_k, k = 1, ..., 6, has at least one zero entry, i.e. the matrix d(G^Te_k)^{−1} is never well defined. The latter assertion remains true whenever n = 2 + 4s, s ∈ Z. In other words, γ can be represented as

$$\gamma = \{Gd(G^Tz)d(G^Te_k)^{-1}G^{-1} : z \in \mathbb C^n\}$$

if and only if n ≠ 2 + 4s, s ∈ Z (for such n one can choose k = 1).

However, as one may guess from the above discussion (detailed for n = 6), it can be easily shown that [G^T(e_1+e_n)]_k ≠ 0 ∀ k and ∀ n. So, we have the following representation for γ:

$$\gamma = \{Gd(G^Tz)d(G^T(e_1+e_n))^{-1}G^{-1} : z \in \mathbb C^n\},$$

valid for all n (also for n = 2 + 4s, where the d(G^Te_k) are not invertible). The latter formula confirms the fact that the matrices of γ are uniquely defined by the sum of their 1st and nth rows; in particular, since the sum of the first and of the nth row of the matrix Gd(G^Ta)d(G^T(e_1+e_n))^{−1}G^{−1} is equal to a^T, we can say that such matrix is exactly the matrix γ(a) (already defined above for n = 6), i.e. the matrix of γ whose (e_1+e_n)-row is a^T.

It is now natural to introduce a class of spaces which includes (besides the 1-spaces, like the Hessenberg algebras) also spaces like the algebra γ.

Definition. A subset L of C^{n×n} is said to be a space in V if L = Span{J_1, ..., J_n} with v^TJ_k = e_k^T for some v ∈ C^n. Given z ∈ C^n, the matrix $\sum_k z_kJ_k \in L$ is denoted by L(z). Since v^TL(z) = z^T, L(z) is called the matrix of L whose v-row is z^T.

Example. L = sd M = {Md(z)M^{−1} : z ∈ C^n} is in V since L = Span{J_1, ..., J_n}, J_k = Md(M^Te_k)d(M^Tv)^{−1}M^{−1}, for any vector v such that [M^Tv]_i ≠ 0 ∀ i, and

$$v^TJ_k = v^TMd(M^Te_k)d(M^Tv)^{-1}M^{-1} = e_k^TMd(M^Tv)d(M^Tv)^{-1}M^{-1} = e_k^T.$$

Note that L(z) = Md(M^Tz)d(M^Tv)^{−1}M^{−1}.

A matrix X ∈ C^{n×n} is said to be non-derogatory if the condition p(X) = 0, p a polynomial, implies ∂p ≥ n. Note that by the Cayley-Hamilton theorem the characteristic polynomial of X is null in X. So, X is non-derogatory if and only if the set {p(X)} of all polynomials in X has dimension n. In [] the following result is stated, which proves that V is a wide class of spaces of matrices.

Theorem ND. Let X be an n × n matrix with complex entries. Then X is non-derogatory if and only if {p(X)} := {p(X) : p a polynomial} is in V.

The Proposition here below collects several properties of the spaces in V. They will be used (in the next section) in order to prove important properties of the best least squares fit in L of a matrix A, holding for all spaces L of a particular subclass of V.

Proposition V (properties of spaces in V). Let L be a space in V, i.e. L = Span{J_1, ..., J_n} with v^TJ_k = e_k^T for some v ∈ C^n.

(1) If X ∈ L and v^TX = 0^T, then X = 0; thus v^TX = v^TY, X, Y ∈ L, implies X = Y.

proof: $0^T = v^TX = v^T\sum_k\alpha_kJ_k = \sum_k\alpha_ke_k^T = [\alpha_1\cdots\alpha_n]\ \Rightarrow\ \alpha_k = 0\ \forall k$.

(2) If J_iX ∈ L, X ∈ C^{n×n}, then $J_iX = \sum_k[X]_{ik}J_k$.

proof: there exist α_k such that $J_iX = \sum_k\alpha_kJ_k$; multiplying the latter identity by v^T we have

$$e_i^TX = v^TJ_iX = \sum_k\alpha_ke_k^T = [\alpha_1\cdots\alpha_n],$$

which implies α_k = [X]_{ik}.

(3) Let P_k ∈ C^{n×n} be defined by e_s^TP_k = e_k^TJ_s (note that $e_k^T = v^TJ_k = \sum_iv_ie_i^TJ_k = \sum_iv_ie_k^TP_i = e_k^T\sum_iv_iP_i$, and thus $\sum_kv_kP_k = I$). Then the following assertions are equivalent:

(i) L is closed under matrix multiplication;
(ii) $J_iJ_j = \sum_k[J_j]_{ik}J_k$ ∀ i, j;
(iii) P_rJ_j = J_jP_r ∀ r, j;
(iv) $P_kP_r = \sum_i[P_r]_{ki}P_i$.

proof: The implication (i) ⇒ (ii) follows from (2) for X = J_j. The opposite implication is obvious. The fact that conditions (ii) and (iii) are equivalent follows by taking the (r, s) entry of the equality in (ii):

$$[P_rJ_j]_{is} = e_i^TP_rJ_je_s = e_r^TJ_iJ_je_s = \sum_k[J_j]_{ik}[J_k]_{rs} = \sum_k[J_j]_{ik}[P_r]_{ks} = [J_jP_r]_{is}.$$

The fact that conditions (iii) and (iv) are equivalent follows from the identities:

$$[P_kP_r]_{ms} = [J_mP_r]_{ks} = [P_rJ_m]_{ks} = [J_kJ_m]_{rs} = \sum_i[J_k]_{ri}[J_m]_{is} = \sum_i[P_r]_{ki}[P_i]_{ms}.$$

(3.5) If I ∈ L, then $\sum_iv_iJ_i = I$ and v^TP_k = e_k^T, i.e. also the space Span{P_1, ..., P_n} is in V (with the same v).


proof: both I and $\sum_iv_iJ_i$ have v^T as v-row, and both, by assumption, are in L, so they must be equal; moreover, we have

$$v^TP_k = \sum_iv_ie_i^TP_k = \sum_iv_ie_k^TJ_i = e_k^T\sum_iv_iJ_i = e_k^T.$$

(4) If L is closed under matrix multiplication, then L(L(z)^Tz′) = L(z′)L(z) ∀ z, z′ ∈ C^n.

proof: since L is closed, the matrix L(z′)L(z) is in L; moreover, its v-row is z′^TL(z); the thesis follows from the fact that also L(L(z)^Tz′) is the matrix of L whose v-row is z′^TL(z).

(5) Assume I ∈ L and L closed under matrix multiplication. Then X ∈ L is non-singular if and only if ∃ z ∈ C^n such that z^TX = v^T; in this case X^{−1} = L(z).

proof: by inspecting the kth row of the matrix L(z)X, and applying properties (3) and (3.5), we obtain the identities

$$e_k^TL(z)X = e_k^T\sum_sz_sJ_sX = \sum_sz_se_s^TP_kX = z^TP_kX = z^TXP_k = v^TP_k = e_k^T,$$

or, equivalently, the equality L(z)X = I, which implies that X is non-singular and X^{−1} = L(z).

The best least squares fit to A in L ⊂ C^{n×n}

Given a subspace L ⊂ C^{n×n} and an n × n matrix A, the projection L_A of A on L is well defined. In the following theorem we state some assumptions on L (in particular, we consider n-dimensional subspaces of C^{n×n}) which assure that L_A is hermitian whenever A is, and that the eigenvalues of L_A are bounded by those of A. Such assumptions imply that L ∈ V.

Theorem L_A. Assumptions:

$$L \subset \mathbb C^{n\times n},\quad I \in L,\quad L = \mathrm{Span}\{J_1,\ldots,J_n\}\ \text{with}\ J_i^HJ_j = \sum_{k=1}^n[J_k]_{ij}J_k,\quad i,j = 1,\ldots,n;\qquad(\ast)$$

A ∈ C^{n×n}; L_A ∈ L, ‖A − L_A‖_F ≤ ‖A − X‖_F ∀ X ∈ L (such a matrix L_A is well defined since C^{n×n} is a Hilbert space with respect to the norm ‖·‖_F induced by the inner product $(A, B)_F = \sum_{ij}\bar a_{ij}b_{ij}$, and L is a subspace of C^{n×n}).

Thesis: If A = A^H, then L_A = L_A^H and min λ(A) ≤ λ(L_A) ≤ max λ(A).

Note: if A is real symmetric, then L_A is in general hermitian; it is real symmetric under the further condition that L is spanned by real matrices (prove it!).

Note: we shall see that the hypotheses of Theorem L_A are satisfied by spaces of the type {Md(z)M^{−1} : z ∈ C^n} if M^HM is diagonal and its diagonal entries are positive; however, the same hypotheses can be satisfied also by non-commutative spaces (we shall see an example; for others see []), so also in the latter cases we can say that the conclusions of Theorem L_A hold.


Applications of Theorem L_A. If the conditions of Theorem L_A are satisfied, then L_A is positive definite (i.e. L_A = L_A^H and z ∈ C^n, z ≠ 0 ⇒ z^HL_Az positive) whenever A is positive definite. So, in order to solve the linear system

$$Ax = b,\qquad A\ \text{positive definite},$$

we can solve the equivalent system

$$L_A^{-1}Ax = L_A^{-1}b,$$

whose coefficient matrix has real and positive eigenvalues, often better distributed than those of A (this results, for example when solving Ax = b iteratively, in fewer iterations).
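A small numerical illustration of this effect (ours, not from the notes): for L the circulant algebra, the projection reduces to L_A = F d(z) F^H with z = diag(F^H A F), a specialization consistent with the formula L_A = L(B^{−1}c) proved below; the test matrix is just an example.

```python
import numpy as np

n = 64
t = 1.0 / (1.0 + np.arange(n))          # a positive definite sequence
A = np.array([[t[abs(i - j)] for j in range(n)] for i in range(n)])  # SPD Toeplitz

F = np.fft.fft(np.eye(n)) / np.sqrt(n)  # unitary (symmetric) Fourier matrix
z = np.diag(F.conj().T @ A @ F).real    # eigenvalues of the best circulant fit
L_A = (F @ np.diag(z) @ F.conj().T).real  # L_A inherits positive definiteness

# compare the conditioning of A with that of the preconditioned matrix L_A^{-1} A
print(np.linalg.cond(A), np.linalg.cond(np.linalg.solve(L_A, A)))
```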

Moreover, if the conditions of Theorem L_A are satisfied and L is spanned by real matrices, then L_A is real positive definite (i.e. L_A = L_A^T, L_A ∈ R^{n×n}, and z ∈ C^n, z ≠ 0 ⇒ z^HL_Az positive) whenever A is real positive definite. That is, the following implications hold:

B_k real positive definite ⇒ L_{B_k} real positive definite ⇒ ϕ(L_{B_k}, s_k, y_k) real positive definite ⇒ L_{ϕ(L_{B_k},s_k,y_k)} real positive definite

(provided s_k^Ty_k is positive), and thus both the S and NS LQN search directions,

$$d_{k+1} = -\varphi(L_{B_k}, s_k, y_k)^{-1}\nabla f(x_{k+1})\qquad\text{and}\qquad d_{k+1} = -L_{\varphi(L_{B_k},s_k,y_k)}^{-1}\nabla f(x_{k+1}),$$

are well defined descent directions in x_{k+1} for the function f : R^n → R (see [] for the definitions of B_k, s_k, y_k, ϕ, and of the S and NS LQN methods).

proof (of Theorem L_A): The matrix $L_A = \sum_s\alpha_sJ_s$ is uniquely defined by the following condition:

$$(X, A - L_A)_F = 0,\qquad X \in L,$$

or, equivalently, by the n conditions $(J_k, A - \sum_s\alpha_sJ_s)_F = 0$, k = 1, ..., n, which can be rewritten as follows:

$$\sum_{s=1}^n(J_k, J_s)_F\,\alpha_s = (J_k, A)_F,\qquad k = 1,\ldots,n.$$

In other words, we have the formula

$$L_A = \sum_s[B^{-1}c]_sJ_s,\qquad B_{ks} = (J_k, J_s)_F,\quad c_k = (J_k, A)_F,\quad k,s = 1,\ldots,n.$$
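The formula can be applied verbatim. The following sketch (ours) computes L_A for the circulant algebra J_k = P_1^{k−1} (real matrices, so no conjugations are needed in the inner products) and checks numerically the orthogonality conditions above and the eigenvalue bounds of the thesis:

```python
import numpy as np

n = 8
P1 = np.diag(np.ones(n - 1), 1); P1[-1, 0] = 1.0
J = [np.linalg.matrix_power(P1, k) for k in range(n)]      # J_k = P_1^{k-1}

A = np.random.rand(n, n); A = A + A.T                      # real symmetric test matrix
B = np.array([[np.sum(Jk * Js) for Js in J] for Jk in J])  # B_ks = (J_k, J_s)_F
c = np.array([np.sum(Jk * A) for Jk in J])                 # c_k = (J_k, A)_F
L_A = sum(a * Jk for a, Jk in zip(np.linalg.solve(B, c), J))

# the residual A - L_A is orthogonal to L, and lambda(L_A) lies within lambda(A)
assert all(abs(np.sum(Jk * (A - L_A))) < 1e-10 for Jk in J)
lam, lamA = np.linalg.eigvalsh((L_A + L_A.T) / 2), np.linalg.eigvalsh(A)
assert lamA.min() - 1e-10 <= lam.min() and lam.max() <= lamA.max() + 1e-10
```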

Remark. B is positive definite, i.e. B = B^H and z^HBz > 0 ∀ z ∈ C^n, z ≠ 0.

proof: $B_{ks} = (J_k, J_s)_F = \overline{(J_s, J_k)_F} = \overline{B_{sk}}$, that is, B is a hermitian matrix. Moreover, since $0 < (\sum_sz_sJ_s, \sum_sz_sJ_s)_F = \sum_{k,s}\bar z_kz_s(J_k, J_s)_F = z^HBz$ whenever z ≠ 0, the matrix B is also positive definite.

Remark. Let v_k ∈ C be such that $I = \sum_kv_kJ_k$ (such v_k exist because I ∈ L). Then the vector v whose entries are the v_k satisfies the equalities v^TJ_k = e_k^T; thus L ∈ V and all results stated for spaces in V hold for our space L.


proof: Multiply (∗) by $\bar v_i$ and sum over i: since $\sum_iv_iJ_i = I$ implies $\sum_i\bar v_iJ_i^H = I$, we obtain

$$J_j = \sum_i\bar v_iJ_i^HJ_j = \sum_k\Big(\sum_i\bar v_i[J_k]_{ij}\Big)J_k = \sum_k(v^HJ_ke_j)J_k.$$

This implies v^HJ_ke_j = 0 if k ≠ j and v^HJ_ke_j = 1 if k = j, i.e. v^HJ_k = e_k^T; moreover, $v^H = v^H\sum_kv_kJ_k = \sum_kv_ke_k^T = v^T$, so v is real and v^TJ_k = e_k^T. As an immediate consequence of the above two Remarks, we have that L_A is the matrix of L whose v-row is (B^{−1}c)^T,

$$L_A = \sum_s[B^{-1}c]_sJ_s = L(B^{-1}c),$$

and, moreover,

$$L_A = L(B^{-1}c) = L((B^H)^{-1}c) = L((B^{-1})^Tc).$$

Remark. L is closed under conjugate transposition.

proof: multiply (∗) by v_j and sum over j:

$$\sum_jv_jJ_i^HJ_j = J_i^H\sum_jv_jJ_j = J_i^H,\qquad J_i^H = \sum_k\Big(\sum_jv_j[J_k]_{ij}\Big)J_k\ \Rightarrow\ J_i^H \in L.$$

The latter Remark yields part of the thesis of Theorem L_A, because it implies that L_A^H ∈ L, and this fact, together with the equalities

$$\|A - L_A\|_F = \|A^H - L_A^H\|_F = \|A - L_A^H\|_F$$

(remember that our A is hermitian!) and the uniqueness of the best approximation of A, yields the identity L_A = L_A^H. In other words, under our conditions on L, the projection on L of a hermitian matrix is hermitian too.

Remark. L is closed under matrix multiplication (L is a matrix algebra).

proof: the set {J_i^H} forms an alternative basis for L (prove it!), thus there exist z_i^{(s)} ∈ C such that $J_s = \sum_iz_i^{(s)}J_i^H$. Multiply (∗) by z_i^{(s)} and sum over i,

$$\sum_iz_i^{(s)}J_i^HJ_j = J_sJ_j,\qquad J_sJ_j = \sum_k\Big(\sum_iz_i^{(s)}[J_k]_{ij}\Big)J_k,$$

to observe that J_sJ_j ∈ L.

Remark. $B = \sum_k\operatorname{tr}(J_k)J_k$, thus B ∈ L, and, since B is non-singular (it is positive definite!), by the result V(5) also the matrix B^{−1} is in L.

proof: by the equality (∗) we have:

$$B_{ij} = (J_i, J_j)_F = \sum_{r,t}\overline{[J_i]_{rt}}\,[J_j]_{rt} = \sum_{r,t}[J_i^H]_{tr}[J_j]_{rt} = \sum_t[J_i^HJ_j]_{tt} = \sum_t\sum_k[J_k]_{ij}[J_k]_{tt} = \sum_k\operatorname{tr}(J_k)[J_k]_{ij}.$$

The latter two Remarks, together with V(4), let us rewrite L_A as follows:

$$L_A = \cdots = L((B^{-1})^Tc) = L(c)B^{-1}.$$

Now note that there exists a hermitian matrix M such that M² = B^{−1}, and that the matrices L_A and ML(c)M have the same eigenvalues (by the last
