## The power method, Perron-Frobenius theory, Page-Rank computation Preliminary considerations, A ∈ C ^{n×n}

## Given a n × n matrix A with eigenvalues λ ^{1} , λ 2 , . . . , λ n , the following result holds.

## Theorem (inverse power). If λ ^{∗} _{i} is an approximation of the eigenvalue λ i of a n×n matrix A, i.e. |λ i − λ ^{∗} i | is smaller than |λ j − λ ^{∗} i |, ∀ λ j 6= λ i , if m a (λ i ) = m g (λ i ), and if v 0 ∈ C ^{n} is not orthogonal to the space spanned by the eigenvectors of A corresponding to λ i , then the sequence {v k } generated by the algorithm

## (A − λ ^{∗} i I)a k = v k−1 , v k = a _{k}

## ka k k , k = 1, 2, . . .

## converges to an eigenvector of A corresponding to the eigenvalue λ i . If A is diagonalizable, then the rate of convergence is O((max j: λ

j### 6=λ

i## | ^{λ} _{λ}

^{i}

_{j}

^{−λ} _{−λ}

^{∗}

^{i}

^{∗}

i

## |) ^{k} ).

### Remark . In the general case the rate of convergence is

j: λ

### max

_{j}6=λ

_{i}

### max

s_{λj}

### O(|p

s_{λj}−1

### (k)|| λ

i### − λ

^{∗}

_{i}

### λ

j### − λ

^{∗}

_{i}

### |

^{k}

### ),

### where, given the block diagonal Jordan form of A, J = X

^{−1}

### AX, for each λ

j### 6= λ

i### , the number s

λ_{j}

### indicates the dimension of the generic Jordan block associated with λ

j### , and p

s_{λj}−1

### (k) is a polynomial of degree s

λ_{j}

### − 1, whose coefficients depend on

_{(λ}

^{1}

j−λ^{∗}_{i})^{r}

### , r = 1, . . . , s

λ_{j}

### − 1, and on the coefficients of v

0### with respect to the column vectors of X corresponding to the Jordan block under consideration.

## proof: See the Appendix.

## Let λ 1 be such that |λ ^{1} | = ρ(A) and assume that all λ i such that |λ i | = ρ(A) are equal to λ 1 (in such case we say that λ 1 dominates the eigenvalues of A).

## Assume, moreover, that the algebraic and geometric multiplicity of λ 1 are equal.

## Then the power method (see the Theorem below) can be used to compute λ 1

## and an eigenvector corresponding to λ 1 .

## Theorem (power). If λ 1 dominates the eigenvalues of a n × n matrix A, if m a (λ 1 ) = m g (λ 1 ), and if v 0 ∈ C ^{n} is not orthogonal to the space spanned by the eigenvectors of A corresponding to λ 1 , then the sequence {v ^{k} } generated by the algorithm

## a _{k} = Av k−1 , v k = a _{k}

## ka k k , k = 1, 2, . . .

## converges to an eigenvector of A corresponding to the eigenvalue λ 1 . Moreover, u ^{H} v _{k+1}

## u ^{H} v _{k} → λ ^{1} , k → +∞

## for any u for which u ^{H} v _{k} 6= 0. If A is diagonalizable, then the rate of conver- gence is O((max j: λ

j### 6=λ

1## | ^{λ} _{λ}

^{j}

_{1}

## |) ^{k} ).

### Remark . In the general case the rate of convergence is

j: λ

### max

_{j}6=λ1

### max

s_{λj}

### O(|p

s_{λj}−1

### (k)|| λ

j### λ

1### |

^{k}

### ),

### where, given the block diagonal Jordan form of A, J = X

^{−1}

### AX, for each λ

j### 6= λ

1### , the number

### s

λ_{j}

### indicates the dimension of the generic Jordan block associated with λ

j### , and p

s_{λj}−1

### (k) is

### a polynomial of degree s

λ_{j}

### − 1, whose coefficients depend on

_{λ}

^{1}r

j

### , r = 1, . . . , s

λ_{j}

### − 1, and on the coefficients of v

0### with respect to the column vectors of X corresponding to the Jordan block under consideration.

## proof: See the Appendix.

## For our purposes it is useful to recall also the following classic result on matrix deflation: how to introduce a matrix whose eigenvalues are all equal to the eigenvalues of A except one, which, instead, is zero.

## Theorem. Let A be a n × n matrix. Let λ 1 be a nonzero eigenvalue of A and y 1 a corresponding eigenvector, i.e. Ay 1 = λ 1 y _{1} . Call λ 2 , λ 3 , . . . , λ n the remaining eigenvalues of A. Then the matrix W = A − w ^{λ}

^{∗}

^{1}

### y

1## y _{1} w ^{∗} , w ^{∗} y _{1} 6= 0, has eigenvalues 0, λ 2 , λ 3 , . . . , λ n .

## proof. Introduce S = [y 1 z _{2} · · · z n ] non singular, and observe that p A (λ) = p _{S}

^{−1}

_{AS} (λ) = (λ − λ 1 )q(λ), p W (λ) = p _{S}

^{−1}

_{W S} (λ) = λq(λ).

## Let G be the following matrix

## G =

##

##

### 1 4

### 1 4

### 1 3 2

### 4 1 8

### 1 11 8

### 16 1 4

### 1 16

##

## .

## Since ρ(G) ≤ kGk ^{∞} = 1, we can say that the spectrum of G lies in the circle {z ∈ C : |z| ≤ 1}.

## Note that Ge = 1 · e, i.e. one eigenvalue of G, λ 1 = 1, and its corresponding eigenvector, y 1 = e = [1 1 · · · 1] ^{T} , are known. If λ 2 , λ 3 denote the remaining eigenvalues of G, then we can define a matrix W , in terms of G, λ 1 , y 1 , whose eigenvalues are 0, λ 2 , λ 3 :

## W = G − λ 1

## w ^{∗} y _{1} y _{1} w ^{∗} = G − 1 w ^{∗} e

##

## w ^{∗} w ^{∗} w ^{∗}

##

## , ∀ w, w ^{∗} e 6= 0.

## Since (e ^{T} _{i} G)e = e ^{T} _{i} (1 · e) = 1 6= 0, we choose w ^{∗} = e ^{T} _{i} G:

## W = G −

##

## e ^{T} _{i} G e ^{T} _{i} G e ^{T} _{i} G

##

## . For i = 1 the matrix W becomes

## W =

##

##

### 1 4

### 1 4

### 1 3 2

### 4 1 8

### 1 11 8

### 16 1 4

### 1 16

##

## −

##

##

### 1 4

### 1 4

### 1 1 2 4

### 1 4

### 1 1 2 4

### 1 4

### 1 2

##

## =

##

##

## 0 0 0

### 1

### 2 − ^{1} _{8} − ^{3} _{8}

### 7

### 16 0 − _{16} ^{7}

##

## ,

## so, the eigenvalues of A different from 1 are the eigenvalues of W = ˜

## − ^{1} 8 − ^{3} 8

## 0 − 16 ^{7}

## , i.e. − ^{1} _{8} and − _{16} ^{7} .

## Thus, 1, − ^{1} _{8} and − _{16} ^{7} are the eigenvalues of G, and also, of course, of G ^{T} .

## In particular, 1 is eigenvalue of G ^{T} , but note that the eigenvector p of G ^{T}

## corresponding to 1 is not obvious; it must be computed. For example, it can be computed as the limit of the inverse power sequence {v k } defined as follows (ε small positive number):

## v _{0} ∈ R ^{3} , (G ^{T} − (1 + ε)I)a k = v k−1 , v k = a _{k}

## ka ^{k} k , k = 0, 1, 2, . . . (rate of convergence: O(

### 1−(1+ε)

### −

^{1}

_{8}

### −(1+ε)

### k )). [Here a reference for the inverse power iterations]. We shall see that the vector p can be also obtained as the limit of the sequence

## p _{0} ∈ R ^{3} , p 0 positive, kp ^{0} k ^{1} = 1, p k+1 = G ^{T} p _{k} , k = 0, 1, 2, . . . (rate of convergence: O(

### −

_{16}

^{7}

### 1

### k

## )) [this result is in fact a particular case of the Theorem at the beginning of this section (set k · k = k · k 1 , u = e)]. Even if the rate of convergence of the p k is not as good as the rate of convergence of the v _{k} , the computation of p k+1 from p k is much cheaper than the computation of v k+1 from v k . In fact, for analogous problems, but of high dimension, even one step of the inverse power iterations is prohibitive.

## The Perron-Frobenius theory: A ∈ R ^{n×n} , A ≥ 0 irreducible [Varga]

## Lemma. Let A be a n × n non negative matrix, i.e. a ij ≥ 0, ∀ i, j. Assume that A is not reducible. Then (I + A) ^{n−1} is a positive matrix, i.e. its entries are all positive.

## proof. We shall prove that the vector (I + A) ^{n−1} x is positive whenever x is a non negative non null vector (prove that this is equivalent to the thesis!).

## Let x be a non negative non null vector. Set x 0 = x, x 1 = (I + A)x 0 = x _{0} + Ax 0 , . . ., x k+1 = (I + A)x k = x k + Ax k , k = 1, . . . , n − 2. Note that x _{k} = (I + A) ^{k} x, in particular x n−1 = (I + A) ^{n−1} x. So, our aim is to prove that x _{n−1} is a positive vector. First observe by induction that all x k are non negative vectors (x k non negative and A non negative imply Ax k non negative and x k+1 = x k + Ax k non negative). Then the thesis (x n−1 positive) is now proved by showing that x k+1 must have less zeros than x k for each k ∈ {0, . . . , n − 2}.

## Note that x k+1 cannot have more zeros than x k , since Ax k , in the definition x _{k} + Ax k of x k+1 , is non negative. Assume x k+1 has the same number of zero entries as x k . Of course such zeros must be in the same places, i.e. there exists a permutation matrix P such that P x k =

## α 0

## , P x k+1 =

## β 0

## , with α, β positive vectors of the same dimension m, 1 ≤ m ≤ n − 1 (why such bounds for m ?). Thus, P x k+1 = P x k + P Ax k = P x k + P AP ^{T} P x k . Consider the following partition of the matrix P AP ^{T}

## P AP ^{T} =

## M 11 M 12

## M 21 M 22

## where M 11 is m × m. Then

## β 0

## =

## α 0

## +

## M 11 M 12

## M 21 M 22

## α 0

## ,

## and, in particular, M 21 α = 0. The latter condition implies M 21 = 0, being α a positive vector and M 21 a non negative matrix. But this is equivalent to say that A is reducible, against the hypothesis!

## Example. Set

## I + A =

##

##

##

##

## 1 0 0 1

## a 1 0 0

## 0 b 1 0

## 0 0 c 1

##

##

##

##

## , a, b, c positive.

## Note that A is a non-negative irreducible matrix, and in fact (I + A) ^{3} is a positive matrix, i.e. its entries are positive. Moreover, 3 is the minimum j for which (I + A) ^{j} is positive.

## Let A be a n × n non negative irreducible matrix. Let x be a non negative non null vector, and associate to x the number

## r x := min

### i:x

i### >0

## P

### j a ij x j

## x i

## = min

### i:x

i### >0

## (Ax) i

## x i

## .

## Proposition. r x is a non negative real number; r x = r αx if α > 0; Ax ≥ r x x;

## r x = sup{ρ ∈ R : Ax ≥ ρx}.

## proof: easy, left to the reader.

## Now associate to A the following number:

## r = sup

### x≥0, x6=0

## r x = sup

### x≥0, x6=0 i:x min

_{i}

### >0

## P

### j a ij x j

## x i

## .

## Proposition. r is a positive real number; if w ≥ 0, w 6= 0, is such that Aw ≥ rw, then Aw = rw and w > 0.

## proof. If e = [1 1 · · · 1] ^{T} , then r e = min i: (e)

i### >0 P

j

### a

ij### (e)

j### (e)

_{i}

## = min i P

### j a ij ≥ 0.

## Assume r e = 0. Then for some k we would have P

### j a kj = 0, so the kth row of A would be null, and thus A would be reducible (exchange the k and n rows), against the hypothesis. It follows that r ≥ r e > 0.

## proof. Set η = Aw − rw. We know that η ≥ 0. Assume η 6= 0. Then, by the Lemma,

## 0 < (I + A) ^{n−1} η = (I + A) ^{n−1} Aw − (I + A) ^{n−1} rw

## = A(I + A) ^{n−1} w − r(I + A) ^{n−1} w = Ay − ry, y > 0.

## i.e. r < (Ay) i /y i ∀ i. Thus r < r y , which is absurd. It follows that η = 0, that is, Aw = rw. Then, we also have w > 0 since 0 < (I + A) ^{n−1} w = (1 + r) ^{n−1} w.

## In the following, given v ∈ C ^{n} and M ∈ C ^{n×n} we denote by |v| and |M|, respectively, the column vector (|v k |) ^{n} k=1 and the matrix (|m ij |) ^{n} i,j=1 .

## Theorem. There exists a positive vector z for which Az = rz; r = ρ(A); if

## B ∈ C ^{n×n} , |B| ≤ A, then ρ(B) ≤ ρ(A); if B ∈ C ^{n×n} , |B| ≤ A, |B| 6= A then

## ρ(B) < ρ(A).

## proof: We now show that there exists z ≥ 0 such that r = r z . Once this is proved we will have the inequality Az ≥ r z z = rz which implies, by the Proposition, Az = rz and z > 0.

## r = sup

### x≥0, x6=0

## r x = sup

### x≥0, x6=0

## r

^{x}

kxk2

## = sup

### x≥0, kxk

2### =1

## r x = (∗)

## We have r x ≤ r (I+A)

^{n−1}

### x since r (I+A)

^{n−1}

### x = sup{ρ ∈ R : A(I + A) ^{n−1} x ≥ ρ(I +A) ^{n−1} x } and Ax ≥ r ^{x} x ⇒ A(I+A) ^{n−1} x = (I +A) ^{n−1} Ax ≥ r ^{x} (I +A) ^{n−1} x.

## Thus,

## (∗) ≤ sup

### x≥0, kxk

2### =1

## r (I+A)

^{n−1}

### x = sup

### y∈Q

## r y = max

### y∈Q r y = r z ≤ r

## for some z ∈ Q = {(I + A) ^{n−1} x : x ≥ 0, kxk ^{2} = 1} (r ^{y} is continuous in y (why?) and Q is a compact). Note that z > 0.

## proof: Let λ be an eigenvalue of A, i.e. ∃ y 6= 0 | Ay = λy. Then |λ||y| = |λy| =

## |Ay| ≤ A|y|, |y| ≥ 0, |y| 6= 0. Thus, by definition of r |y| , |λ| ≤ r |y| ≤ r, and we have the inequality ρ(A) ≤ r. But r is an eigenvalue of A, thus r = ρ(A).

## proof: Let λ be an eigenvalue of B, i.e. ∃ y 6= 0 | By = λy. Then |λ||y| =

## |λy| = |By| ≤ |B||y| ≤ A|y|, |y| ≥ 0, |y| 6= 0. Thus, by definition of r |y| ,

## |λ| ≤ r |y| ≤ r, and we have the inequality ρ(B) ≤ r = ρ(A).

## proof: Assume ρ(B) = ρ(A), i.e. there exists λ eigenvalue of B (By = λy, y 6= 0) such that |λ| = r. Then we can add an equality in the above arguments,

## r|y| = |λ||y| = |λy| = |By| ≤ |B||y| ≤ A|y|, |y| ≥ 0, |y| 6= 0,

## obtaining the inequality r|y| ≤ A|y|, |y| ≥ 0, |y| 6= 0. But by the above Proposition, such inequality implies A|y| = r|y| with |y| > 0. So we have

## r|y| = |λ||y| = |λy| = |By| = |B||y| = A|y| = r|y|, |y| > 0,

## from which it follows |B| = A, against the hypothesis! Thus ρ(B) < ρ(A). We conclude this section with the following result of the Perron-Frobenius theory, stated without proof (actually we shall give a proof of such result in the case A is positive):

## Result. If A is a non-negative irreducible n×n matrix, then r = ρ(A) is a simple eigenvalue of A, i.e. λ − r divides p A (λ) but (λ − r) ^{2} does not divide p A (λ).

## Computing the Perron-pair of A irreducible non-negative by the power method Let A be an irreducible non-negative n × n matrix. Then we know that ρ(A) is positive and is a simple eigenvalue of A, and the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its measure in the 1-norm is one. We also know that ρ(A) = r :=

## sup _{x≥0, x6=0} min i: x

i### >0 (Ax)

i### x

i## .

## So, it is uniquely defined the Perron-pair (ρ(A), z). such that Az = ρ(A)z, ρ(A) positive, z positive, kzk ^{1} = 1. The power method is a way to compute such Perron-pair.

## Theorem(power). Let A be an irreducible non-negative n × n matrix. Let a 0 be any positive vector, and set v 0 = a 0 /ka 0 k 1 . Then set

## a _{k+1} = Av k , v _{k+1} = a _{k+1}

## ka ^{k+1} k ^{1} , k = 0, 1, 2, . . . .

## Note that the sequences {a k }, {v k } are well defined sequences of positive vectors, and kv k k 1 = 1, ∀ k. Let X be the non singular matrix defining, by similarity, the Jordan block-diagonal form of A, i.e.

## X ^{−1} AX = J =

## r 0 ^{T}

## 0 B

## , X =

##

## z x 2 · · · x n

##

## ,

## and let r = ρ(A), λ j , j = 2, . . . , n, be the eigenvalues of A (λ j 6= r, ∀ j). If α in the expression a 0 = αz + P n

### j=2 α j x _{j} is nonzero, then, for k → +∞, we have v _{k} − z → 0, ka k k 1 − ρ(A) → 0,

## provided that |λ j | is smaller than r for all j = 2, . . . , n. In the particular case where A is diagonalizable, the rate of convergence is

### j=2...n max

## |λ j | r

## k

## [for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].

## proof: We prove the Theorem only in the case A diagonalizable, where Ax j = λ j x _{j} , j = 2, . . . , n. It is easy to observe that A ^{k} a _{0} = αr ^{k} z + P n

### j=2 α j λ ^{k} _{j} x _{j} is a positive vector, and that

## v _{k} = ^{αr}

k

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### x

j### kαr

^{k}

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### x

j### k

1## = ^{αr}

k

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### x

j### e

^{T}

### (αr

^{k}

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### x

j### )

## = ^{αr}

k

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### x

j### αr

^{k}

### e

^{T}

### z+ P

nj=2

### α

j### λ

^{k}

_{j}

### e

^{T}

### x

j## =

### z+ P

n j=2αj α

λj r k### x

j### 1+ P

n j=2αj α

λj r k### e

^{T}

### x

j## .

## Moreover,

## a k+1 = Av k =

## rz + P n j=2

### α

j### α

## _{λ}

j

### r

## k

## λ j x _{j} 1 + P n

### j=2 α

j### α

## _{λ}

j

### r

## k

## e ^{T} x _{j} and, since a k+1 is positive,

## ka k+1 k 1 = e ^{T} a _{k+1} =

### r+ P

n j=2αj α

λj r^{k}

### λ

j### e

^{T}

### x

j### 1+ P

n j=2αj α

λj r^{k}

### e

^{T}

### x

j## =

### r 1+ P

n j=2αj α

λj r^{k+1}

### e

^{T}

### x

j### 1+ P

n j=2αj α

λj r^{k}

### e

^{T}

### x

j## .

## Exercise. Discuss the convergence of the power method when applied to compute the Perron-pair (1,

## _{1}

### 2 1 2

## ) of the following two 2 × 2 matrices:

## A =

## 0 1 1 0

## , A =

## 1/4 3/4 3/4 1/4

## .

## In the second case, find a constant c such that kv k − zk ≤ c( ^{1} _{2} ) ^{k} .

## Exercise. Prove the Theorem in case A (non negative, irreducible) is 4 × 4, and there exists X non singular for which

## X ^{−1} AX =

##

##

##

## r

## λ 2 1 λ 2 1

## λ 2

##

##

##

## ,

## with |λ 2 | smaller than r. Prove that in such case the rate of convergence is

## |p ^{2} (k)|

## λ 2

## r

## k

## , where p 2 is a degree-two polynomial.

## Exercise. Set

## A =

##

##

## 0 0 1

### 1

### 4 0 0

### 3

### 4 1 0

##

## .

## Prove that the eigenvalues of A are {1, − ^{1} 2 , − ^{1} 2 } and that A is not diagonalizable.

## Find X = [z x 2 x 3 ] such that

## X ^{−1} AX =

##

##

## 1 0 0

## 0 − ^{1} 2 1 0 0 − ^{1} 2

##

## .

## The only hypothesis A irreducible, non-negative, does not assure that r = ρ(A) is the unique eigenvalue of A whose absolute value is equal to r. We only know that if |λ j | = r, j ∈ {2, . . . , n}, then λ j 6= r. Let us see examples:

## A =

## 0 1 1 0

## ; Eigenvalues: − 1, 1; Perron-pair: (1,

## _{1}

### 2 1 2

## ).

## A =

##

##

## 0 a 0

## 1 0 1

## 0 1 − a 0

##

## , a ∈ (0, 1); Eig: − 1, 0, 1; Perron-pair: (1,

##

##

### a 2 1 1−a 2

### 2

##

## ).

## A =

##

##

## 2 1 1 1 2 1 1 1 2

##

## ; Eigenvalues: 1, 1, 4; Perron-pair: (4,

##

## 1/3 1/3 1/3

##

## ).

## Note that if in the second example A is replaced by A ^{T} , then there is no need of computation in obtaining the perron-pair, since it is clear that A ^{T} e = e, e = [1 1 1] ^{T} . Moreover, in the third example r = ρ(A) (which is 4) dominates the remaining eigevalues of A (which are 1, 1). We note that this fact is true for any positive matrix A (see Theorem(positive) below), even if, as the following example shows, it is not a peculiarity of positive matrices:

## A =

##

##

## 2 1 0 1 2 1 0 1 2

##

## ; Eig: 2 ∓ √

## 2, 2; Perron-pair: (2 + √ 2, 1

## 2 + √ 2

##

##

## √ 1 2 1

##

## )

## (here computation is required to obtain the Perron-pair).

## Theorem(positive). If A is a n × n positive matrix and r = ρ(A), λ ^{2} , . . . , λ n are its eigenvalues, then |λ j | is smaller than r for all j = 2, . . . , n.

## proof: Set W = A − sze ^{T} . The eigenvalues of W are r − s, λ ^{2} , . . . , λ n (a sketch of the proof:

## Y :=

## z y

2## · · · y

n## non singular, Y

^{−1}

## AY =

## r · · ·

## 0 M

## , Y

^{−1}

## W Y =

## r − s · · ·

## 0 M

## ).

## If there is a value of s for which |W | ≤ A, |W | 6= A, then ρ(W ) is smaller than ρ(A), and thus |λ ^{2} |, . . . , |λ n | are smaller than r = ρ(A). If A is positive, then such s exists, s = min i,j a ij .

## Irreducible non-negative stochastic-by-columns A and power method

## Let A be an irreducible non-negative stochastic by columns n × n matrix. Then we know that 1 = ρ(A) (A ^{T} e = e, ρ(A) ≤ kAk 1 = 1), that r = 1 = ρ(A) is a simple eigenvalue of A, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its measure in the 1-norm is one.

## So, it is uniquely defined the Perron-pair (1 = ρ(A), z). such that Az = z, z positive, kzk ^{1} = 1. The computation of such Perron-pair, i.e. the computation of the vector z, can be performed via the power method. Actually, we now see that the power method in the particular case where A is stochastic-by-columns (besides non-negative and irreducible) can be rewritten in a simpler form and converges independently from the choice of a 0 (provided that 1 dominates the other eigenvalues). These results follow from some remarks, reported in the following Proposition.

## Proposition. Let A be an irreducible non-negative stochastic-by-columns n × n matrix. Then

## i)

## v ∈ C ^{n} ⇒ X

### i

## (Av) i = X

### i

## v i

## ( P

### i (Av) i = P

### i

## P

### j a ij v j = P

### j v j P

### i a ij = P

### j v j ), ii)

## v positive, kvk 1 = 1 ⇒ Av positive, kAvk 1 = 1 (use the irreducibility of A and assertion i)),

## iii)

## Av = λv, λ 6= 1, ⇒ X

### i

## v i = 0 ( P

### i v i = P

### i (Av) i = λ P

### i v i , thus (λ − 1) P

### i v i = 0).

## Exercise. Assume that X ^{−1} AX = J where J is the Jordan block diagonal form of A, where A is an irreducible non-negative stochastic-by-columns n×n matrix.

## Assume that [J] 11 = 1. Prove that P

### i [X] ij = 0, j = 2, . . . , n.

## Corollary(power). Let A be an irreducible non-negative stochastic-by-columns n × n matrix. Let a 0 be any positive vector such that ka 0 k 1 = 1. Then set

## a _{k+1} = Aa k , k = 0, 1, 2, . . . .

## The sequence {a k } is a well defined sequence of positive vectors, and ka k k 1 = 1 = ρ(A), ∀ k. If k → +∞, then

## a _{k} − z → 0,

## provided that |λ j | is smaller than 1 for all j = 2, . . . , n. The latter event is assured by Theorem(positive) when A is positive. In the particular case where A is diagonalizable, the rate of convergence is

## ( max

### j=2...n |λ j |) ^{k}

## [for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].

## proof: The sequences a k and v k defined in Theorem(power) coincide, because, by Proposition ii), we have kAv k k ^{1} = kv k k ^{1} . It remains to show that α in the expression a 0 = αz + P

### j α j x _{j} is nonzero. Note that e ^{T} a _{0} = αe ^{T} z + P

### j α j e ^{T} x _{j} = α + P

### j α j e ^{T} x _{j} . In the case that A is diagonalizable the thesis follows by Proposition iii) which asserts that e ^{T} x _{j} , j = 2, . . . , n, must be zero.

## In the generic case the proof is left to the reader . . ..

## Exercise. Discuss existence, unicity, and computation of the Perron-pair for

## A =

##

##

## 0 1 − a a

## a 0 1 − a

## 1 − a a 0

##

## , a ∈ [0, 1], A =

##

##

## 0 0 1

## a 0 0

## 1 − a 1 0

##

## , a ∈ [0, 1],

## A =

##

##

##

##

##

##

##

##

##

##

##

## 0 b 1

## 1 0 b 2

## 1 − b 1 0 . ..

## 1 − b ^{2} . .. b n−2

## . .. 0 1

## 1 − b ^{n−2} 0

##

##

##

##

##

##

##

##

##

##

##

## , b i ∈ (0, 1).

## Page-rank computation [Berkhin, et al]

## Consider an oriented graph with set of verteces V = {1, 2, . . . , n}, and set of edges E, i, j ∈ E if there is a link from i to j.

## Associate to such graph the adiacency matrix:

## L =

## 1 ij ∈ E 0 ij / ∈ E

## Call deg (i) the number of edges starting from i. Of course, deg (i) = 0 (no edge starts from i) iff L ij = 0, ∀ j. Moreover, deg (i) = P

### j L ij . Associate to the graph the transition matrix:

## P =

## ( _{1}

## deg (i) ij ∈ E

## 0 ij / ∈ E

## Note that P ij = L ij / deg (i) if i is such that deg (i) > 0, and P ij = L ij = 0 otherwise. Moreover, P

### j P ij = 1 if deg (i) > 0 and P

### j P ij = 0 otherwise. So, the matrix P is a non negative matrix quasi-stochastic by rows.

## Remark. Row i of P is null iff no edge starts from i; column j of P is null iff no edge points to j.

## Let p ∈ R ^{n} be the vector whose entry j, p j , is the importance (authority) of the vertex j. Then

## p j = X

### i: i→j

## p i

## deg (i) =

### n

## X

### i=1

## P ij p i =

### n

## X

### i=1

## P _{ji} ^{T} p i = (P ^{T} p) j , p = P ^{T} p

## (note that such fixed point p may not exist, or, if exists, may be not unique or with zero entries; see below).

## Let p ^{(k+1)} _{j} be the probability that at step k + 1 of my visit of the graph (navigation on the web) I am on the vertex (page) j. Then

## p ^{(k+1)} _{j} = P

### i: i→j p

^{(k)}

_{i}

## deg ^{(i)} = P n

### i=1 P ij p ^{(k)} _{i}

## = P n

### i=1 P _{ji} ^{T} p ^{(k)} _{i} = (P ^{T} p ^{(k)} ) j , p ^{(k+1)} = P ^{T} p ^{(k)}

## (note that such sequence of vector probabilities exists and is uniquely defined, once p ^{(0)} is given, but the p ^{(k)} may loss the possible ddp property of p ^{(0)} ; see below). [A vector w is said ddp (discrete distribution of probability) if w is positive and kwk ^{1} = 1].

## Note that it is natural to require that: p ^{(k)} _{i} > 0 (at step k there is a prob- ability that I am on vertex i), P

### j p ^{(k)} _{j} = 1 (at step k I am on some vertex);

## p i > 0 (any vertex has a portion of importance . . .), P

### i p i = 1 (. . . of the total 1).

## So, the following facts must be true:

## a) p such that p = P ^{T} p , p > 0, kpk ^{1} = 1, exists and is uniquely defined, b) the method p ^{(0)} = ddp, p ^{(k+1)} = P ^{T} p ^{(k)} converges to p.

## We now show that in order to have a) and b), the matrix P ^{T} must be both stochastic by columns and irreducible.

## Theorem(stoc). If the above facts a) and b) are true, then P ^{T} must be stochastic by columns.

## proof: We know that ∃ ! p such that p = P ^{T} p , p > 0, kpk ^{1} = 1. Assume that P ^{T} is quasi-stochastic but not stochastic by columns. Then

## kP ^{T} p k 1 = P

### i (P ^{T} p) i = P

### i

## P

### j (P ^{T} ) ij p j = P

### j

## P

### i P ji p j

## = P

### j: deg ^{(j)>0} 1 · p j + P

### j: deg ^{(j)=0} 0 · p j < P

### j p j = kpk 1 , analogously,

## kp ^{(k+1)} k 1 = kP ^{T} p ^{(k)} k 1 ≤ kp ^{(k)} k 1 ≤ kp ^{(1)} k 1 < kp ^{(0)} k 1 = 1.

## Thus p cannot be equal to P ^{T} p, and p ^{(k)} cannot converge to a ddp.

## Theorem(irred). If the above facts a) and b) are true, then P ^{T} must be irre-

## ducible.

## proof: assume P ^{T} reducible. Then there exists a permutation matrix Q such that

## Q ^{T} P ^{T} Q =

## A 11 A 12

## 0 A 22

## , (∗)

## with A 11 and A 22 at least 1×1 square matrices. Note that Q ^{T} P ^{T} Q is stochastic by columns, like P ^{T} . We know that ∃ ! p such that p = P ^{T} p , p > 0, kpk 1 = 1, but this is equivalent to say that ∃ ! Q ^{T} p such that

## Q ^{T} p =

## A 11 A 12

## 0 A 22

## Q ^{T} p , Q ^{T} p > 0, kQ ^{T} p k ^{1} = 1. (∗∗) Case 1: assume A 12 = 0. Then A 11 and A 22 are non negative stochastic by columns matrices. We can assume they are also irreducible (why?). Then, by the Perron-Frobenius theory,

## ∃ ! y 1 , y 2 > 0, ky 1 k 1 = ky 2 k 1 = 1, y 1 = A 11 y _{1} , y 2 = A 22 y _{2} . and, as a consequence, for all α ∈ (0, 1) we have

## αy 1

## (1 − α)y ^{2}

## =

## A 11 0 0 A 22

## αy 1

## (1 − α)y ^{2}

## ,

## αy 1

## (1 − α)y ^{2}

## > 0, k

## αy 1

## (1 − α)y ^{2}

## k 1 = 1.

## So, all vectors p such that Q ^{T} p =

## αy 1

## (1 − α)y 2

## satisfy the properties p > 0, kpk ^{1} = 1, p = P ^{T} p, which is against the hypothesis of unicity.

## Case 2: assume A 12 6= 0. Then A 22 in (*) is not stochastic by columns, thus A 22 may have no eigenvalue equal to 1, i.e. the equations in (**) involving A 22

## may be verified only if part of p is null, against the hypothesis of positiveness of p.

## Viceversa, we know that if the non negative matrix P ^{T} is irreducible and stochastic by columns, then 1 = ρ(P ^{T} ) is a simple eigenvalue of P ^{T} and there exists a unique vector p such that p = P ^{T} p , p > 0, kpk 1 = 1. We also know that such hypotheses are not sufficient to assure the convergence (to p) of the sequence p ^{(k+1)} = P ^{T} p ^{(k)} , p ^{(0)} > 0, kp ^{0} k ^{1} = 1, or, equivalently, to assure that the remaining eigenvalues of P ^{T} have absolute value smaller than 1. We can only say that the sequence {p ^{(k)} } is a well defined sequence of ddp.

## We have to modify P (the graph) so to make well posed (∃ !) the mathemat- ical problem and to make convergent the algorithm for solving it.

## Make P ^{T} stochastic:

## P ^{0} = P + dv ^{T} , d =

##

##

##

## δ deg (1),0

## .. . δ deg ^{(n),0}

##

##

## , v =

##

##

##

### 1 n .. .

### 1 n

##

##

##

## i.e, where P has null rows P ^{0} has the row vector v ^{T} . The vertex i with deg (i) =

## 0 now links to all verteces of the graph. [We discuss the uniform case, but what

## follows can be repeated for the more general case v =ddp].

## Observe that (P ^{0} ) ^{T} is stochastic by columns, (P ^{0} ) ^{T} ≥ 0, thus 1 is eigenvalue of (P ^{0} ) ^{T} and the other eigenvalues of (P ^{0} ) ^{T} , λ ^{0} _{2} , . . . , λ ^{0} _{n} , are such that |λ ^{0} j | ≤ 1.

## Make (P ^{0} ) ^{T} irreducible:

## P ^{00} = cP ^{0} + (1 − c)ev ^{T} , e =

##

##

## 1

## .. . 1

##

##

## , c ∈ (0, 1).

## Since

## e ^{T} _{i} P ^{00} =

## ( cv ^{T} + (1 − c)v ^{T} = v ^{T} deg (i) = 0 c[· · · 0 deg ^{1} ^{(i)} 0 · · ·] + (1 − c)v ^{T} deg (i) > 0

## we are assuming that a visitor of the graph can go from the vertex i to one of its neighborhoods with probability c/ deg (i) + (1 − c)/n , and with probability (1−c)/n to an arbitrary vertex of the graph. Of course the parameter c must be chosen near to 1, in order to maintain our model faithful to the way the graph (web) is visited.

## Observe that (P ^{00} ) ^{T} is stochastic by columns and positive, therefore, in par- ticular, it is non negative and irreducible. So, we have all we need.

## Theorem(page-rank). 1 = ρ((P ^{00} ) ^{T} ) is a simple eigenvalue of (P ^{00} ) ^{T} , there exists a unique vector p such that p = (P ^{00} ) ^{T} p , p > 0, kpk ^{1} = 1 (i.e. we have fact (a)), and the other eigenvalues of (P ^{00} ) ^{T} , λ ^{00} _{2} , . . . , λ ^{00} _{n} , are such that |λ ^{00} j | < 1 (by Theorem(positive)). Thus, p ^{(k+1)} = (P ^{00} ) ^{T} p ^{(k)} , p ^{(0)} > 0, kp ^{(0)} k 1 = 1, is a sequence of ddp convergent to p and, in case P ^{00} is diagonalizable,

## kp ^{(k)} − pk = O(( max

### j=2,...,n |λ ^{00} j |) ^{k} )

## [for the general case see the Remark in Theorem(power) of Section 1] (i.e. we have fact (b)). Moreover, for the particular choice of (P ^{00} ) ^{T} , the cost of each step of the power method is O(n) and is dominated by the cost of the matrix-vector multiplication P ^{T} z , and if A is diagonalizable, then the rate of convergence is

## kp ^{(k)} − pk = O(c ^{k} )

## [for the general case see the Remark in Theorem(power) of Section 1] (Google- search engine sets c = 0.85 [Berkhin]).

## proof: We have to prove only the final assertions.

## O(n) arithmetic operations are sufficient to perform each step of the power method. We have

## (P ^{00} ) ^{T} = c(P ^{T} + vd ^{T} ) + (1 − c)ve ^{T} , and, if x ≥ 0, then

## (P ^{00} ) ^{T} x = cP ^{T} x + γv,

## γ = cd ^{T} x + (1 − c)e ^{T} x = e ^{T} x − c[e ^{T} x − d ^{T} x ] = kxk ^{1} − ckP ^{T} x k ^{1} (why the latter equality holds?). Thus, in order to compute p ^{(k+1)} from p ^{(k)} one can use the following function

## y = cP ^{T} x,

## γ = kxk 1 − kyk 1 ,

## (P ^{00} ) ^{T} x = y + γv

## where the dominant operation is the matrix-vector multiplication P ^{T} x. Note that each row j of P ^{T} has (on the average) a very small number of nonzero entries, i.e. exactly the number of vertices pointing to j, so P ^{T} x can be com- puted with O(n) arithmetic operations. Note also that in order to implement the above function one needs only 2n memory allocations.

## Rate of convergence kp ^{(k)} − pk = O(c ^{k} ). We first prove that if e ^{T} v = 1 then p P

^{0}

## (λ) = (λ − 1)p ^{n−1} (λ) ⇒ p _{P}

0### +

^{1−c}

_{c}

### ev

^{T}

## (λ) = (λ − 1

## c )p n−1 (λ). (∗ ∗ ∗) In fact,

## S =

## e y _{2} · · · y n , det(S) 6= 0, S ^{−1} P ^{0} S =

## 1 u ^{T}

## 0 M

## , p P

^{0}

## (λ) = (λ − 1)p M (λ),

## S ^{−1} (P ^{0} + ^{1−c} _{c} ev ^{T} )S =

## 1 u ^{T}

## 0 M

## + ^{1−c} _{c} S ^{−1} ev ^{T} S

## =

## 1 u ^{T}

## 0 M

## + ^{1−c} _{c}

##

##

## 1 · · · ∗ 0 · · · 0

##

## =

## 1 c u ˜ ^{T}

## 0 M

## .

## As a consequence of (***), if 1, λ ^{0} _{2} , . . . , λ ^{0} _{n} are the eigenvalues of P ^{0} (re- call that |λ ^{0} j | ≤ 1, j = 2, . . . , n), then the eigenvalues of P ^{0} + ^{1−c} _{c} ev ^{T} are

### 1

### c , λ ^{0} _{2} , . . . , λ ^{0} _{n} , and thus the eigenvalues of cP ^{0} + (1 − c)ev ^{T} are 1, cλ ^{0} _{2} , . . . , cλ ^{0} _{n} . It follows that if A is diagonalizable, then kp ^{(k)} −pk = O((max ^{j=2,...,n} |cλ ^{0} j |) ^{k} ) = O(c ^{k} ).

## Why to compute p? [Berkhin].

## QUERY: Berkhin survey.

## Go in the inverted terms document file, which is a table containing a row for each term of a collection’s dictionary. In such file, for each term there is a list of all documents that contain such term

## .. .

## term → LISTA

term## = {i

1## , i

2## , . . . , i

k## } ⊂ {1, 2, . . . , n}

## .. .

## Berkhin → LISTA

Berkhin## = {1, 4, 6}

## .. .

## survey → LISTA

survey## = {1, 3}

## .. .

## Define the set of relevance of the query

## ∪ _{term} ∈ QUERY LISTA ^{term} = {1, 3, 4, 6}

## Reading p (p is updated once each month) considers and orders the correspond- ing set of authorities {p ^{1} , p 3 , p 4 , p 6 }, for example p ^{4} ≥ p ^{6} ≥ p ^{3} ≥ p ^{1}

## Finally, show the titles of the documents 1, 3, 4, 6 in the order 4, 6, 3, 1, from the

## one with greatest authority to the one with smallest authority.

## Main criticism: This procedure, being independent from the query allows a fast answer, but does not make distinction between pages with authority from pages with authority on a specific subject.

## Exercise. Draw the graph whose transition matrix is

## P =

##

##

##

##

##

##

##

##

## 0 1/2 1/2 0 0 0

## 0 0 0 0 0 0

## 1/3 1/3 0 0 1/3 0

## 0 0 0 0 1/2 1/2

## 0 0 0 1/2 0 1/2

## 0 0 0 1 0 0

##

##

##

##

##

##

##

## .

## Note that P is non negative, reducible and quasi-stochastic, but not stochastic, by rows. Prove that there is no positive vector p such that p = P ^{T} p. Starting from P and proceeding as indicated in the theory, introduce a non negative matrix P ^{0} stochastic by rows. Note that P ^{0} is reducible, like P . Prove that there is no positive vector p such that p = (P ^{0} ) ^{T} p . Starting from P ^{0} and proceeding as indicated in the theory, introduce a non negative matrix P ^{00} irreducible and stochastic by rows. Prove that there exists a unique positive vector p such that p = (P ^{00} ) ^{T} p , kpk 1 = 1, and describe an algorithm for the computation of p.

## Here below is an approximation of such vector p:

## p ^{T} = [0.03721 0.05396 0.04151 0.3751 0.206 0.2862].

## Assume, for example, that the set of relevance of a query is {1, 3, 4, 6}. Then the documents 1, 3, 4, 6 are listed in the order 4, 6, 3, 1, being p 4 ≥ p 6 ≥ p 3 ≥ p 1 .

## APPENDIX

## Proof (Theorem(inverse power)):

## Assume A diagonalizable, Ax j = λ j x _{j} , {x j } ^{n} j=1 linearly independent. Then Ax j − λ ^{∗} i x _{j} = (λ j − λ ^{∗} i )x j ⇒ (A − λ ^{∗} i I) ^{−m} x _{j} = 1

## (λ j − λ ^{∗} i ) ^{m} x _{j} . Call α j the numbers for which v 0 = P

### j: λ

_{j}

### =λ

_{i}

## α j x _{j} + P

### j: λ

_{j}

### 6=λ

_{i}

## α j x _{j} . Note that the first sum is a non null vector (by the assumption on v 0 ). Then the following equality holds

## (λ i − λ ^{∗} i ) ^{m} (A − λ ^{∗} i I) ^{−m} v _{0} = X

### j: λ

j### =λ

i## α j x _{j} + X

### j: λ

j### 6=λ

i## α j

## λ i − λ ^{∗} i

## λ j − λ ^{∗} i

## m

## x _{j}

## which, for m → +∞, yields the thesis.

## Assume that A is not diagonalizable. In this case call x j , j = 1, . . . , n, the

## columns of the non singular matrix X for which X ^{−1} AX = J where J is the

## block diagonal Jordan form of A. Assume that the upper-left diagonal block of

## J is diagonal and its diagonal contains all λ j such that λ j = λ := λ i . Assume

## that there are t of such eigenvalues. Then consider any other diagonal block

## of J; of course it corresponds to an eigenvalue λ j different from λ, call such

## eigenvalue µ. The block under consideration has an order, say s = s µ .

### J =

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

### λ

### λ . . .

### λ

### •

### µ 1

### µ . . . . . . 1

### µ

### •

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

###

### .

## Restrict the matrix equation X ^{−1} AX = J to this block, thus, for some r ≥ t we have

## A[x

r+1## x

r+2## x

r+3## · · · x

r+s## ] = [x

r+1## x

r+2## x

r+3## · · · x

r+s## ]

##

##

##

##

##

##

## µ 1

## µ . . . . . . 1

## µ

##

##

##

##

##

## .

## It follows that

## Ax r+1 = µx r+1 , (A − λ ^{∗} I)x r+1 = (µ − λ ^{∗} )x r+1 ,

## (A − λ ^{∗} I) ^{−1} x _{r+1} = _{µ−λ} ^{1}

∗## x _{r+1} , (A − λ ^{∗} I) ^{−m} x _{r+1} = _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+1} . Ax r+2 = µx r+2 + x r+1 , (A − λ ^{∗} I)x r+2 = (µ − λ ^{∗} )x r+2 + x r+1 , (A − λ ^{∗} I) ^{−1} x _{r+2} = _{µ−λ} ^{1}

∗## x _{r+2} − _{(µ−λ} ^{1}

^{∗}

_{)}

^{2}

## x _{r+1} ,

## (A − λ ^{∗} I) ^{−m} x _{r+2} = _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+2} − (µ−λ ^{m}

^{∗}

### )

^{m+1}

## x _{r+1} .

## Ax r+3 = µx r+3 + x r+2 , (A − λ ^{∗} I)x r+3 = (µ − λ ^{∗} )x r+3 + x r+2 , (A − λ ^{∗} I) ^{−1} x _{r+3} = _{µ−λ} ^{1}

∗## x _{r+3} − _{(µ−λ} ^{1}

^{∗}

_{)}

^{2}

## x _{r+2} + _{(µ−λ} ^{1}

∗### )

^{3}

## x _{r+1} , (A − λ ^{∗} I) ^{−m} x r+3 = _{(µ−λ} ^{1}

∗### )

^{m}

## x r+3 − (µ−λ ^{m}

^{∗}

### )

^{m+1}

## x r+2 +

^{1}

^{2}

^{(m}

2

### +m) (µ−λ

^{∗}

### )

^{m+2}

## x r+1 . Ax r+4 = µx r+4 + x r+3 , (A − λ ^{∗} I)x r+4 = (µ − λ ^{∗} )x r+4 + x r+3 , (A − λ ^{∗} I) ^{−1} x _{r+4} = _{µ−λ} ^{1}

∗## x _{r+4} − _{(µ−λ} ^{1}

^{∗}

_{)}

^{2}

## x _{r+3}

## + _{(µ−λ} ^{1}

∗### )

^{3}

## x r+2 − (µ−λ ^{1}

^{∗}

### )

^{4}

## x r+1 ,

## (A − λ ^{∗} I) ^{−m} x _{r+4} = _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+4} − _{(µ−λ} ^{m}

^{∗}

_{)}

^{m+1}

## x _{r+3} + _{(µ−λ}

^{1}

^{2}

^{(m}

^{2}∗

^{+m)} )

^{m+2}

## x _{r+2} −

^{1}

^{6}

^{m} (µ−λ

^{3}

^{+}

^{1}

^{2}

^{∗}

^{m} )

^{m+3}

^{2}

^{+}

^{1}

^{3}

^{m} x _{r+1} ,

## (A − λ ^{∗} I) ^{−m} x _{r+s} = 1

## (µ − λ ^{∗} ) ^{m} x _{r+s} − . . . + (−1) ^{s−1} q s−1 (m)

## (µ − λ ^{∗} ) ^{m+s−1} x _{r+1} , (A − λ ^{∗} I) ^{−m} [x r+1 x _{r+2} x _{r+3} · · ·]

## = [x r+1 x _{r+2} x _{r+3} · · ·]

##

##

##

##

##

##

### 1

### (µ−λ

^{∗}

### )

^{m}

## − _{(µ−λ} ^{m}

^{∗}

_{)}

^{m+1}

_{(µ−λ}

^{1}

^{2}

^{(m}

^{2}

^{∗}

^{+m)} _{)}

^{m+2}

## · · · 0 _{(µ−λ} ^{1}

∗### )

^{m}

## − _{(µ−λ} ^{m}

^{∗}

_{)}

^{m+1}

## 0 0 _{(µ−λ} ^{1}

∗### )

^{m}

## .. . . ..

##

##

##

##

##

##

## .

## Set also λ ^{∗} := λ ^{∗} _{i} (λ ^{∗} _{i} is the given approximation of λ = λ i ). Choose v 0 , and call α j the numbers for which v 0 = P

### j: λ

j### =λ α j x _{j} + { . . . + P r+s

### j=r+1 α j x _{j} + . . . }.

## Note that P

### j: λ

j### =λ α j x _{j} = P t

### j=1 α j x _{j} . Then (A − λ ^{∗} I) ^{−m} v _{0}

## = P

### j: λ

j### =λ α j (A − λ ^{∗} I) ^{−m} x _{j} + { · · · + P r+s

### j=r+1 α j (A − λ ^{∗} I) ^{−m} x _{j} + · · · }

## = P

### j: λ

j### =λ α j 1

### (λ−λ

^{∗}

### )

^{m}

## x _{j} + { · · · + [ α ^{r+1} ( _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+1} ) +α r+2 ( _{(µ−λ} ^{1}

∗### )

^{m}

## x r+2 − (µ−λ ^{m}

^{∗}

### )

^{m+1}

## x r+1 )

## +α r+3 ( _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+3} − _{(µ−λ} ^{m}

^{∗}

_{)}

^{m+1}

## x _{r+2} + _{(µ−λ}

^{1}

^{2}

^{(m}

^{2}∗

^{+m)} )

^{m+2}

## x _{r+1} ) +α r+4 ( _{(µ−λ} ^{1}

∗### )

^{m}

## x _{r+4} − (µ−λ ^{m}

^{∗}

### )

^{m+1}

## x _{r+3}

## + _{(µ−λ}

^{1}

^{2}

^{(m}

^{2}∗

^{+m)} )

^{m+2}

## x _{r+2} −

^{1}

^{6}

^{m} (µ−λ

^{3}

^{+}

^{1}

^{2}

^{∗}

^{m} )

^{m+3}

^{2}

^{+}

^{1}

^{3}

^{m} x _{r+1} )

## + . . . + α r+s ( _{(µ−λ} ^{1}

∗### )

^{m}

## x r+s − . . . + (−1) ^{s−1} (µ−λ ^{q}

^{s−1}

^{∗}

### ) ^{(m)}

^{m+s−1}

## x r+1 ) ] + · · · }.

## But this implies

## (λ − λ ^{∗} ) ^{m} (A − λ ^{∗} I) ^{−m} v _{0}

## = P

### j: λ

_{j}

### =λ α j x _{j} + { . . . +

### λ−λ

^{∗}

### µ−λ

^{∗}

## m

## h α r+1 − α ^{r+2} µ−λ ^{m}

^{∗}

## + α r+3

1 2

### (m

^{2}

### +m)

### (µ−λ

^{∗}

### )

^{2}

## . . . + (−1) ^{s−1} α r+s q

s−1### (m) (µ−λ

^{∗}

### )

^{s−1}

## x r+1

## + α r+2 − α r+3 m

### µ−λ

^{∗}

## + α r+4

1 2

### (m

^{2}

### +m)

### (µ−λ

^{∗}

### )

^{2}

## . . . + (−1) ^{s−2} α r+s q

s−2### (m) (µ−λ

^{∗}

### )

^{s−2}

## x r+2

## + . . . + α r+s x r+s

## i + · · · },

## from which it is clear that the sequence (λ−λ ^{∗} ) ^{m} (A−λ ^{∗} I) ^{−m} v _{0} converges to an eigenvector of A associated with λ. The assertions about the rate of convergence follow by setting

## p s−1 (λ) = α r+1 − α ^{r+2} µ−λ ^{m}

^{∗}

## + α r+3

1 2

### (m

^{2}

### +m)

### (µ−λ

^{∗}

### )

^{2}

## −α r+4

1

6

### m

^{3}

### +

^{1}

_{2}

### m

^{2}

### +

^{1}

_{3}

### m

### (µ−λ

^{∗}

### )

^{3}

## . . . + (−1) ^{s−1} α r+s q

s−1### (m) (µ−λ

^{∗}

### )

^{s−1}

## . Proof (Theorem(power)):

## Assume that A is diagonalizable, so Ax j = λ j x _{j} , {x j } ^{n} j=1 linearly independent.

## Choose v 0 , and call α j so that v 0 = P

### j: λ

j### =λ

1## α j x _{j} + P

### j: λ

j### 6=λ

1## α j x _{j} . Note that the first sum is non null by assumption. Then we have the equalities:

## A ^{m} v _{0} = λ ^{m} _{1} P

### j: λ

j### =λ

1## α j x _{j} + P

### j: λ

j### 6=λ

1## α j λ ^{m} _{j} x _{j} ,

### 1

### λ

^{m}

_{1}

## A ^{m} v _{0} = P

### j: λ

j### =λ

1## α j x _{j} + P

### j: λ

j### 6=λ

1## α j λ

j### λ

1## m

## x _{j} . The thesis follows letting m go to infinite.

## Assume that A is not diagonalizable. Set µ := λ j , where λ j 6= λ 1 , and restrict, as above, the Jordan matrix equation X ^{−1} AX = J to a diagonal Jordan block of order s associated with µ. Then

## Ax r+1 = µx r+1 , A ^{m} x _{r+1} = µ ^{m} x _{r+1} ,

## Ax r+2 = µx r+2 + x r+1 , A ^{m} x _{r+2} = µ ^{m} x _{r+2} + mµ ^{m−1} x _{r+1} , Ax r+3 = µx r+3 + x r+2 ,

## A ^{m} x _{r+3} = µ ^{m} x _{r+3} + mµ ^{m−1} x _{r+2} + ^{1} _{2} (m ^{2} − m)µ ^{m−2} x _{r+1} ,

## Ax r+4 = µx r+4 + x r+3 ,

## A ^{m} x _{r+4} = µ ^{m} x _{r+4} + mµ ^{m−1} x _{r+3} + ^{1} _{2} (m ^{2} − m)µ ^{m−2} x _{r+2} +( ^{1} _{6} m ^{3} − ^{1} _{2} m ^{2} + ^{1} _{3} m)µ ^{m−3} x _{r+1} ,

## and so on. Set λ := λ 1 and t = m a (λ 1 ) = m g (λ 1 ), so we can assume that the upper-left t × t submatrix of J is diagonal with diagonal entries all equal to λ.

## Then

## A ^{m} v _{0} = A ^{m} P t

### j=1 α j x _{j} + {. . . + P r+s

### j=r+1 α j x _{j} + . . .}

## = P t

### j=1 α j λ ^{m} x _{j} + {. . . + P r+s

### j=r+1 α j A ^{m} x _{j} + . . .}

## = λ ^{m} P t

### j=1 α j x _{j} + {. . . + [ α r+1 A ^{m} x _{r+1} + α r+2 A ^{m} x _{r+2} +α r+3 A ^{m} x _{r+3} + α r+4 A ^{m} x _{r+4} + . . . + α r+s A ^{m} x _{r+s} ] + . . .}

## = λ ^{m} P t j=1 α j x _{j}

## +{. . . + [ α ^{r+1} (µ ^{m} x r+1 ) + α r+2 (µ ^{m} x r+2 + mµ ^{m−1} x r+1 ) +α r+3 (µ ^{m} x r+3 + mµ ^{m−1} x r+2 + ^{1} _{2} (m ^{2} − m)µ ^{m−2} x r+1 ) +α r+4 (µ ^{m} x r+4 + mµ ^{m−1} x r+3 + ^{1} _{2} (m ^{2} − m)µ ^{m−2} x r+2

## +( ^{1} _{6} m ^{3} − ^{1} 2 m ^{2} + ^{1} _{3} m)µ ^{m−3} x r+1 )

## + . . . + α r+s (µ ^{m} x r+s + . . . + q s−1 (m)µ ^{m−s+1} x r+1 ) ] + . . .}, A ^{m} v _{0} = λ ^{m} P t

### j=1 α j x _{j}

## +{. . . + [ (α ^{r+1} µ ^{m} + α r+2 mµ ^{m−1} + α r+3 1

### 2 (m ^{2} − m)µ ^{m−2} +α r+4 ( ^{1} _{6} m ^{3} − ^{1} 2 m ^{2} + ^{1} _{3} m)µ ^{m−3}

## + . . . + α r+s q s−1 (m)µ ^{m−s+1} )x r+1

## +(α r+2 µ ^{m} + α r+3 mµ ^{m−1} + α r+4 1

### 2 (m ^{2} − m)µ ^{m−2} + . . . +α r+s q s−2 (m)µ ^{m−s+2} )x r+2 + . . . + (α r+s µ ^{m} )x r+s ] + . . .},

### 1

### λ

^{m}

## A ^{m} v _{0} = P t

### j=1 α j x _{j} + {. . . +

### µ λ

## m

## [ (α r+1 + α r+2 m µ + α r+3

1 2

### (m

^{2}

### −m)

### µ

^{2}

## +α r+4

1

6

### m

^{3}

### −

^{1}

_{2}

### m

^{2}

### +

^{1}

_{3}

### m

### µ

^{3}

## + . . . + α r+s q

s−1### (m) µ

^{s−1}

## )x r+1

## +(α r+2 + α r+3 m µ + α r+4

1 2

### (m

^{2}

### −m)

### µ

^{2}

## + . . . + α r+s q

s−2### (m) µ

^{s−2}

## )x r+2

## + . . . + (α r+s )x r+s ] + . . .}.

## It is clear that, as m goes to infinite, the sequence _{λ} ^{1}

m## A ^{m} v _{0} converges to an eigenvector of A associated with λ, the dominant eigenvalue of A. Finally, the assertions on the rate of convergence follow by setting

## p s−1 (m) = α r+1 + α r+2 m µ + α r+3

1 2

### (m

^{2}

### −m)

### µ

^{2}

## +α r+4

1

6

### m

^{3}

### −

^{1}

_{2}

### m

^{2}

### +

^{1}

_{3}

### m

### µ

^{3}

## + . . . + α r+s q

s−1### (m)

### µ

^{s−1}