The power method, Perron-Frobenius theory, Page-Rank computation

Preliminary considerations, $A \in \mathbb{C}^{n \times n}$

Given an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, the following result holds.
Theorem (inverse power). If $\lambda_i^*$ is an approximation of the eigenvalue $\lambda_i$ of an $n \times n$ matrix $A$, i.e. $|\lambda_i - \lambda_i^*|$ is smaller than $|\lambda_j - \lambda_i^*|$ for all $\lambda_j \ne \lambda_i$, if $m_a(\lambda_i) = m_g(\lambda_i)$, and if $v_0 \in \mathbb{C}^n$ is not orthogonal to the space spanned by the eigenvectors of $A$ corresponding to $\lambda_i$, then the sequence $\{v_k\}$ generated by the algorithm
$$(A - \lambda_i^* I)a_k = v_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
converges to an eigenvector of $A$ corresponding to the eigenvalue $\lambda_i$. If $A$ is diagonalizable, then the rate of convergence is
$$O\left(\left(\max_{j:\,\lambda_j \ne \lambda_i}\left|\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right|\right)^{k}\right).$$
Remark. In the general case the rate of convergence is
$$O\left(\max_{j:\,\lambda_j \ne \lambda_i}\ \max_{s_{\lambda_j}}\ |p_{s_{\lambda_j}-1}(k)|\left|\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right|^{k}\right),$$
where, given the block diagonal Jordan form of $A$, $J = X^{-1}AX$, for each $\lambda_j \ne \lambda_i$ the number $s_{\lambda_j}$ indicates the dimension of the generic Jordan block associated with $\lambda_j$, and $p_{s_{\lambda_j}-1}(k)$ is a polynomial of degree $s_{\lambda_j}-1$ whose coefficients depend on $\frac{1}{(\lambda_j - \lambda_i^*)^r}$, $r = 1, \dots, s_{\lambda_j}-1$, and on the coefficients of $v_0$ with respect to the column vectors of $X$ corresponding to the Jordan block under consideration.
proof: See the Appendix.
Let $\lambda_1$ be such that $|\lambda_1| = \rho(A)$, and assume that all $\lambda_i$ such that $|\lambda_i| = \rho(A)$ are equal to $\lambda_1$ (in such case we say that $\lambda_1$ dominates the eigenvalues of $A$). Assume, moreover, that the algebraic and geometric multiplicity of $\lambda_1$ are equal. Then the power method (see the Theorem below) can be used to compute $\lambda_1$ and an eigenvector corresponding to $\lambda_1$.
Theorem (power). If $\lambda_1$ dominates the eigenvalues of an $n \times n$ matrix $A$, if $m_a(\lambda_1) = m_g(\lambda_1)$, and if $v_0 \in \mathbb{C}^n$ is not orthogonal to the space spanned by the eigenvectors of $A$ corresponding to $\lambda_1$, then the sequence $\{v_k\}$ generated by the algorithm
$$a_k = Av_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
converges to an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$. Moreover,
$$\frac{u^H v_{k+1}}{u^H v_k} \to \lambda_1, \qquad k \to +\infty,$$
for any $u$ for which $u^H v_k \ne 0$. If $A$ is diagonalizable, then the rate of convergence is
$$O\left(\left(\max_{j:\,\lambda_j \ne \lambda_1}\left|\frac{\lambda_j}{\lambda_1}\right|\right)^{k}\right).$$
Remark. In the general case the rate of convergence is
$$O\left(\max_{j:\,\lambda_j \ne \lambda_1}\ \max_{s_{\lambda_j}}\ |p_{s_{\lambda_j}-1}(k)|\left|\frac{\lambda_j}{\lambda_1}\right|^{k}\right),$$
where, given the block diagonal Jordan form of $A$, $J = X^{-1}AX$, for each $\lambda_j \ne \lambda_1$ the number $s_{\lambda_j}$ indicates the dimension of the generic Jordan block associated with $\lambda_j$, and $p_{s_{\lambda_j}-1}(k)$ is a polynomial of degree $s_{\lambda_j}-1$ whose coefficients depend on $\frac{1}{\lambda_j^r}$, $r = 1, \dots, s_{\lambda_j}-1$, and on the coefficients of $v_0$ with respect to the column vectors of $X$ corresponding to the Jordan block under consideration.
proof: See the Appendix.
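Both iterations are straightforward to implement. The following sketch (not part of the original notes; Python with NumPy, under the theorems' hypotheses) realizes the two algorithms; the vector u used in the eigenvalue estimate is an arbitrary choice, here random, and the fixed iteration counts stand in for a proper stopping criterion.

\begin{verbatim}
import numpy as np

def power_method(A, v0, steps=200):
    # a_{k+1} = A v_k,  v_{k+1} = a_{k+1} / ||a_{k+1}||
    v = v0 / np.linalg.norm(v0)
    u = np.random.rand(A.shape[0])   # any u with u^H v_k != 0
    lam = None
    for _ in range(steps):
        a = A @ v
        lam = (u @ a) / (u @ v)      # u^H a_{k+1} / u^H v_k -> lambda_1
        v = a / np.linalg.norm(a)
    return lam, v

def inverse_power_method(A, shift, v0, steps=50):
    # (A - shift I) a_k = v_{k-1},  v_k = a_k / ||a_k||
    n = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    for _ in range(steps):
        a = np.linalg.solve(A - shift * np.eye(n), v)
        v = a / np.linalg.norm(a)
    return v
\end{verbatim}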
For our purposes it is useful to recall also the following classic result on matrix deflation: how to introduce a matrix whose eigenvalues are equal to those of $A$ except one, which, instead, is replaced by zero.
Theorem. Let $A$ be an $n \times n$ matrix. Let $\lambda_1$ be a nonzero eigenvalue of $A$ and $y_1$ a corresponding eigenvector, i.e. $Ay_1 = \lambda_1 y_1$. Call $\lambda_2, \lambda_3, \dots, \lambda_n$ the remaining eigenvalues of $A$. Then the matrix
$$W = A - \frac{\lambda_1}{w^* y_1}\,y_1 w^*, \qquad w^* y_1 \ne 0,$$
has eigenvalues $0, \lambda_2, \lambda_3, \dots, \lambda_n$.
proof. Introduce $S = [y_1\ z_2\ \cdots\ z_n]$ non singular, and observe that $p_A(\lambda) = p_{S^{-1}AS}(\lambda) = (\lambda - \lambda_1)q(\lambda)$, $p_W(\lambda) = p_{S^{-1}WS}(\lambda) = \lambda q(\lambda)$.
Let $G$ be the following matrix:
$$G = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 3/4 & 1/8 & 1/8 \\ 11/16 & 1/4 & 1/16 \end{pmatrix}.$$
Since $\rho(G) \le \|G\|_\infty = 1$, we can say that the spectrum of $G$ lies in the circle $\{z \in \mathbb{C} : |z| \le 1\}$. Note that $Ge = 1 \cdot e$, i.e. one eigenvalue of $G$, $\lambda_1 = 1$, and its corresponding eigenvector, $y_1 = e = [1\ 1\ \cdots\ 1]^T$, are known. If $\lambda_2, \lambda_3$ denote the remaining eigenvalues of $G$, then we can define a matrix $W$, in terms of $G$, $\lambda_1$, $y_1$, whose eigenvalues are $0, \lambda_2, \lambda_3$:
$$W = G - \frac{\lambda_1}{w^* y_1}\,y_1 w^* = G - \frac{1}{w^* e}\,e w^*, \qquad \forall\ w,\ w^* e \ne 0.$$
Since $(e_i^T G)e = e_i^T(1 \cdot e) = 1 \ne 0$, we can choose $w^* = e_i^T G$:
$$W = G - e\,(e_i^T G).$$
For $i = 1$ the matrix $W$ becomes
$$W = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 3/4 & 1/8 & 1/8 \\ 11/16 & 1/4 & 1/16 \end{pmatrix} - \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 1/2 & -1/8 & -3/8 \\ 7/16 & 0 & -7/16 \end{pmatrix},$$
so the eigenvalues of $G$ different from 1 are the eigenvalues of
$$\widetilde W = \begin{pmatrix} -1/8 & -3/8 \\ 0 & -7/16 \end{pmatrix},$$
i.e. $-1/8$ and $-7/16$. Thus $1$, $-1/8$ and $-7/16$ are the eigenvalues of $G$, and also, of course, of $G^T$.
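This computation is easy to check numerically; a minimal sketch (assuming NumPy) that builds $W = G - e(e_1^T G)$ and compares the two spectra:

\begin{verbatim}
import numpy as np

G = np.array([[1/4,   1/4, 1/2 ],
              [3/4,   1/8, 1/8 ],
              [11/16, 1/4, 1/16]])
e = np.ones(3)

w = G[0, :]                  # w^* = e_1^T G, so w^* e = 1
W = G - np.outer(e, w)       # deflation: eigenvalues 0, lambda_2, lambda_3
print(np.linalg.eigvals(G))  # approx {1, -1/8, -7/16}
print(np.linalg.eigvals(W))  # approx {0, -1/8, -7/16}
\end{verbatim}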
In particular, 1 is an eigenvalue of $G^T$, but note that the eigenvector $p$ of $G^T$ corresponding to 1 is not obvious; it must be computed. For example, it can be computed as the limit of the inverse power sequence $\{v_k\}$ defined as follows ($\varepsilon$ a small positive number):
$$v_0 \in \mathbb{R}^3, \qquad (G^T - (1+\varepsilon)I)a_k = v_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
(rate of convergence: $O\left(\left|\frac{1-(1+\varepsilon)}{-\frac18-(1+\varepsilon)}\right|^k\right)$; see Theorem (inverse power) above). We shall see that the vector $p$ can also be obtained as the limit of the sequence
$$p_0 \in \mathbb{R}^3, \quad p_0 \text{ positive}, \quad \|p_0\|_1 = 1, \qquad p_{k+1} = G^T p_k, \quad k = 0, 1, 2, \dots$$
(rate of convergence: $O\left(\left|\frac{-7/16}{1}\right|^k\right)$) [this result is in fact a particular case of the Theorem at the beginning of this section (set $\|\cdot\| = \|\cdot\|_1$, $u = e$)]. Even if the rate of convergence of the $p_k$ is not as good as the rate of convergence of the $v_k$, the computation of $p_{k+1}$ from $p_k$ is much cheaper than the computation of $v_{k+1}$ from $v_k$. In fact, for analogous problems of high dimension, even one step of the inverse power iterations is prohibitive.
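The two sequences can be compared with a short sketch (illustrative shift and iteration counts; since $1-(1+\varepsilon) < 0$, the inverse power iterates alternate in sign, so the sign is fixed at the end):

\begin{verbatim}
import numpy as np

G = np.array([[1/4,   1/4, 1/2 ],
              [3/4,   1/8, 1/8 ],
              [11/16, 1/4, 1/16]])
n, eps = 3, 1e-3

v = np.ones(n) / n                    # inverse power on G^T, shift 1 + eps
for _ in range(20):
    a = np.linalg.solve(G.T - (1 + eps) * np.eye(n), v)
    v = a / np.linalg.norm(a, 1)
p_inv = np.abs(v)                     # eigenvector defined up to a scalar

p = np.ones(n) / n                    # power sequence: p_{k+1} = G^T p_k
for _ in range(200):
    p = G.T @ p
print(p_inv, p)                       # both approximate the Perron vector
\end{verbatim}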
The Perron-Frobenius theory: $A \in \mathbb{R}^{n \times n}$, $A \ge 0$ irreducible [Varga]
Lemma. Let $A$ be an $n \times n$ non negative matrix, i.e. $a_{ij} \ge 0$ $\forall\, i, j$. Assume that $A$ is not reducible. Then $(I+A)^{n-1}$ is a positive matrix, i.e. its entries are all positive.
proof. We shall prove that the vector $(I+A)^{n-1}x$ is positive whenever $x$ is a non negative non null vector (prove that this is equivalent to the thesis!).
Let $x$ be a non negative non null vector. Set $x_0 = x$, $x_1 = (I+A)x_0 = x_0 + Ax_0$, $\dots$, $x_{k+1} = (I+A)x_k = x_k + Ax_k$, $k = 1, \dots, n-2$. Note that $x_k = (I+A)^k x$; in particular $x_{n-1} = (I+A)^{n-1}x$. So our aim is to prove that $x_{n-1}$ is a positive vector. First observe by induction that all $x_k$ are non negative vectors ($x_k$ non negative and $A$ non negative imply $Ax_k$ non negative and $x_{k+1} = x_k + Ax_k$ non negative). The thesis ($x_{n-1}$ positive) is then proved by showing that $x_{k+1}$ must have fewer zeros than $x_k$ for each $k \in \{0, \dots, n-2\}$.
Note that $x_{k+1}$ cannot have more zeros than $x_k$, since $Ax_k$, in the definition $x_k + Ax_k$ of $x_{k+1}$, is non negative. Assume $x_{k+1}$ has the same number of zero entries as $x_k$. Of course such zeros must be in the same places, i.e. there exists a permutation matrix $P$ such that
$$Px_k = \begin{pmatrix} \alpha \\ 0 \end{pmatrix}, \qquad Px_{k+1} = \begin{pmatrix} \beta \\ 0 \end{pmatrix},$$
with $\alpha, \beta$ positive vectors of the same dimension $m$, $1 \le m \le n-1$ (why such bounds for $m$?). Thus $Px_{k+1} = Px_k + PAx_k = Px_k + PAP^T Px_k$. Consider the following partition of the matrix $PAP^T$:
$$PAP^T = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
where $M_{11}$ is $m \times m$. Then
$$\begin{pmatrix} \beta \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} + \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}\begin{pmatrix} \alpha \\ 0 \end{pmatrix},$$
and, in particular, $M_{21}\alpha = 0$. The latter condition implies $M_{21} = 0$, since $\alpha$ is a positive vector and $M_{21}$ a non negative matrix. But this is equivalent to saying that $A$ is reducible, against the hypothesis!
Example. Set
$$I + A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ a & 1 & 0 & 0 \\ 0 & b & 1 & 0 \\ 0 & 0 & c & 1 \end{pmatrix}, \qquad a, b, c \text{ positive}.$$
Note that A is a non-negative irreducible matrix, and in fact (I + A) 3 is a positive matrix, i.e. its entries are positive. Moreover, 3 is the minimum j for which (I + A) j is positive.
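A quick numerical check of this example, with the illustrative choice $a = b = c = 1$:

\begin{verbatim}
import numpy as np

a = b = c = 1.0
IA = np.array([[1, 0, 0, 1],
               [a, 1, 0, 0],
               [0, b, 1, 0],
               [0, 0, c, 1]], dtype=float)
M3 = np.linalg.matrix_power(IA, 3)    # (I + A)^(n-1), n = 4
M2 = np.linalg.matrix_power(IA, 2)
print((M3 > 0).all())                 # True: all entries positive
print((M2 > 0).all())                 # False: the power 2 is not enough
\end{verbatim}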
Let $A$ be an $n \times n$ non negative irreducible matrix. Let $x$ be a non negative non null vector, and associate to $x$ the number
$$r_x := \min_{i:\,x_i > 0}\frac{\sum_j a_{ij}x_j}{x_i} = \min_{i:\,x_i > 0}\frac{(Ax)_i}{x_i}.$$
Proposition. $r_x$ is a non negative real number; $r_x = r_{\alpha x}$ if $\alpha > 0$; $Ax \ge r_x x$; $r_x = \sup\{\rho \in \mathbb{R} : Ax \ge \rho x\}$.
proof: easy, left to the reader.
Now associate to $A$ the following number:
$$r = \sup_{x \ge 0,\ x \ne 0} r_x = \sup_{x \ge 0,\ x \ne 0}\ \min_{i:\,x_i > 0}\frac{\sum_j a_{ij}x_j}{x_i}.$$
Proposition. $r$ is a positive real number; if $w \ge 0$, $w \ne 0$, is such that $Aw \ge rw$, then $Aw = rw$ and $w > 0$.
proof. If $e = [1\ 1\ \cdots\ 1]^T$, then $r_e = \min_{i:\,(e)_i > 0}\frac{\sum_j a_{ij}(e)_j}{(e)_i} = \min_i \sum_j a_{ij} \ge 0$. Assume $r_e = 0$. Then for some $k$ we would have $\sum_j a_{kj} = 0$, so the $k$-th row of $A$ would be null, and thus $A$ would be reducible (exchange the $k$-th and $n$-th rows and the corresponding columns), against the hypothesis. It follows that $r \ge r_e > 0$.
proof. Set $\eta = Aw - rw$. We know that $\eta \ge 0$. Assume $\eta \ne 0$. Then, by the Lemma,
$$0 < (I+A)^{n-1}\eta = (I+A)^{n-1}Aw - (I+A)^{n-1}rw = A(I+A)^{n-1}w - r(I+A)^{n-1}w = Ay - ry, \qquad y > 0,$$
i.e. $r < (Ay)_i/y_i$ $\forall\, i$. Thus $r < r_y$, which is absurd. It follows that $\eta = 0$, that is, $Aw = rw$. Then we also have $w > 0$, since $0 < (I+A)^{n-1}w = (1+r)^{n-1}w$.
In the following, given $v \in \mathbb{C}^n$ and $M \in \mathbb{C}^{n \times n}$, we denote by $|v|$ and $|M|$, respectively, the column vector $(|v_k|)_{k=1}^n$ and the matrix $(|m_{ij}|)_{i,j=1}^n$.
Theorem. There exists a positive vector $z$ for which $Az = rz$; $r = \rho(A)$; if $B \in \mathbb{C}^{n \times n}$, $|B| \le A$, then $\rho(B) \le \rho(A)$; if $B \in \mathbb{C}^{n \times n}$, $|B| \le A$, $|B| \ne A$, then $\rho(B) < \rho(A)$.
proof: We now show that there exists $z \ge 0$ such that $r = r_z$. Once this is proved we will have the inequality $Az \ge r_z z = rz$, which implies, by the Proposition, $Az = rz$ and $z > 0$.
$$r = \sup_{x \ge 0,\ x \ne 0} r_x = \sup_{x \ge 0,\ x \ne 0} r_{x/\|x\|_2} = \sup_{x \ge 0,\ \|x\|_2 = 1} r_x = (*)$$
We have $r_x \le r_{(I+A)^{n-1}x}$, since $r_{(I+A)^{n-1}x} = \sup\{\rho \in \mathbb{R} : A(I+A)^{n-1}x \ge \rho(I+A)^{n-1}x\}$ and $Ax \ge r_x x \Rightarrow A(I+A)^{n-1}x = (I+A)^{n-1}Ax \ge r_x(I+A)^{n-1}x$. Thus,
$$(*) \le \sup_{x \ge 0,\ \|x\|_2 = 1} r_{(I+A)^{n-1}x} = \sup_{y \in Q} r_y = \max_{y \in Q} r_y = r_z \le r$$
for some $z \in Q = \{(I+A)^{n-1}x : x \ge 0,\ \|x\|_2 = 1\}$ ($r_y$ is continuous in $y$ (why?) and $Q$ is compact). Note that $z > 0$.
proof: Let $\lambda$ be an eigenvalue of $A$, i.e. $\exists\, y \ne 0$ such that $Ay = \lambda y$. Then $|\lambda||y| = |\lambda y| = |Ay| \le A|y|$, $|y| \ge 0$, $|y| \ne 0$. Thus, by definition of $r_{|y|}$, $|\lambda| \le r_{|y|} \le r$, and we have the inequality $\rho(A) \le r$. But $r$ is an eigenvalue of $A$, thus $r = \rho(A)$.
proof: Let $\lambda$ be an eigenvalue of $B$, i.e. $\exists\, y \ne 0$ such that $By = \lambda y$. Then $|\lambda||y| = |\lambda y| = |By| \le |B||y| \le A|y|$, $|y| \ge 0$, $|y| \ne 0$. Thus, by definition of $r_{|y|}$, $|\lambda| \le r_{|y|} \le r$, and we have the inequality $\rho(B) \le r = \rho(A)$.
proof: Assume $\rho(B) = \rho(A)$, i.e. there exists an eigenvalue $\lambda$ of $B$ ($By = \lambda y$, $y \ne 0$) such that $|\lambda| = r$. Then we can add an equality in the above arguments,
$$r|y| = |\lambda||y| = |\lambda y| = |By| \le |B||y| \le A|y|, \qquad |y| \ge 0, \quad |y| \ne 0,$$
obtaining the inequality $r|y| \le A|y|$. But, by the above Proposition, such inequality implies $A|y| = r|y|$ with $|y| > 0$. So we have
$$r|y| = |\lambda||y| = |\lambda y| = |By| = |B||y| = A|y| = r|y|, \qquad |y| > 0,$$
from which it follows $|B| = A$, against the hypothesis! Thus $\rho(B) < \rho(A)$.
We conclude this section with the following result of the Perron-Frobenius theory, stated without proof (actually we shall give a proof of such result in the case $A$ is positive):
Result. If $A$ is a non negative irreducible $n \times n$ matrix, then $r = \rho(A)$ is a simple eigenvalue of $A$, i.e. $\lambda - r$ divides $p_A(\lambda)$ but $(\lambda - r)^2$ does not divide $p_A(\lambda)$.
Computing the Perron-pair of A irreducible non-negative by the power method

Let $A$ be an irreducible non negative $n \times n$ matrix. Then we know that $\rho(A)$ is positive and is a simple eigenvalue of $A$, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its 1-norm is one. We also know that
$$\rho(A) = r := \sup_{x \ge 0,\ x \ne 0}\ \min_{i:\,x_i > 0}\frac{(Ax)_i}{x_i}.$$
So the Perron-pair $(\rho(A), z)$, such that $Az = \rho(A)z$, $\rho(A)$ positive, $z$ positive, $\|z\|_1 = 1$, is uniquely defined. The power method is a way to compute such a Perron-pair.
Theorem(power). Let $A$ be an irreducible non negative $n \times n$ matrix. Let $a_0$ be any positive vector, and set $v_0 = a_0/\|a_0\|_1$. Then set
$$a_{k+1} = Av_k, \qquad v_{k+1} = \frac{a_{k+1}}{\|a_{k+1}\|_1}, \qquad k = 0, 1, 2, \dots.$$
Note that the sequences $\{a_k\}$, $\{v_k\}$ are well defined sequences of positive vectors, and $\|v_k\|_1 = 1$, $\forall\, k$. Let $X$ be the non singular matrix defining, by similarity, the Jordan block-diagonal form of $A$, i.e.
$$X^{-1}AX = J = \begin{pmatrix} r & 0^T \\ 0 & B \end{pmatrix}, \qquad X = \begin{pmatrix} z & x_2 & \cdots & x_n \end{pmatrix},$$
and let $r = \rho(A)$, $\lambda_j$, $j = 2, \dots, n$, be the eigenvalues of $A$ ($\lambda_j \ne r$, $\forall\, j$). If $\alpha$ in the expression $a_0 = \alpha z + \sum_{j=2}^n \alpha_j x_j$ is nonzero, then, for $k \to +\infty$, we have
$$v_k - z \to 0, \qquad \|a_k\|_1 - \rho(A) \to 0,$$
provided that $|\lambda_j|$ is smaller than $r$ for all $j = 2, \dots, n$. In the particular case where $A$ is diagonalizable, the rate of convergence is
$$\left(\max_{j=2,\dots,n}\frac{|\lambda_j|}{r}\right)^{k}$$
[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].
proof: We prove the Theorem only in the case $A$ diagonalizable, where $Ax_j = \lambda_j x_j$, $j = 2, \dots, n$. It is easy to observe that $A^k a_0 = \alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j$ is a positive vector, and that
$$v_k = \frac{\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j}{\left\|\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j\right\|_1} = \frac{\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j}{e^T\left(\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j\right)} = \frac{z + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}.$$
Moreover,
$$a_{k+1} = Av_k = \frac{rz + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k \lambda_j x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}$$
and, since $a_{k+1}$ is positive,
$$\|a_{k+1}\|_1 = e^T a_{k+1} = \frac{r + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k \lambda_j e^T x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j} = r\,\frac{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^{k+1} e^T x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}.$$
Since $|\lambda_j/r| < 1$ for $j = 2, \dots, n$, these expressions yield the thesis: $v_k \to z$ and $\|a_{k+1}\|_1 \to r$.
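A minimal sketch of this power method in code (assuming NumPy; as test matrix we take the tridiagonal example appearing later in this section, whose Perron-pair is $(2+\sqrt2,\ \frac{1}{2+\sqrt2}[1\ \sqrt2\ 1]^T)$):

\begin{verbatim}
import numpy as np

def perron_pair(A, steps=500):
    # A irreducible non negative: returns approximations of (rho(A), z)
    n = A.shape[0]
    v = np.ones(n) / n        # a_0 positive, v_0 = a_0 / ||a_0||_1
    r = None
    for _ in range(steps):
        a = A @ v             # a_{k+1} = A v_k
        r = np.sum(a)         # ||a_{k+1}||_1 = e^T a_{k+1}, a_{k+1} > 0
        v = a / r             # v_{k+1}
    return r, v

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
r, z = perron_pair(A)
print(r, 2 + np.sqrt(2))      # r -> rho(A) = 2 + sqrt(2)
print(z)                      # z -> (1, sqrt(2), 1) / (2 + sqrt(2))
\end{verbatim}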
Exercise. Discuss the convergence of the power method when applied to compute the Perron-pair $\left(1, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}\right)$ of the following two $2 \times 2$ matrices:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1/4 & 3/4 \\ 3/4 & 1/4 \end{pmatrix}.$$
In the second case, find a constant $c$ such that $\|v_k - z\| \le c\left(\frac12\right)^k$.
Exercise. Prove the Theorem in the case where $A$ (non negative, irreducible) is $4 \times 4$ and there exists $X$ non singular for which
$$X^{-1}AX = \begin{pmatrix} r & & & \\ & \lambda_2 & 1 & \\ & & \lambda_2 & 1 \\ & & & \lambda_2 \end{pmatrix},$$
with $|\lambda_2|$ smaller than $r$. Prove that in such case the rate of convergence is
$$|p_2(k)|\left(\frac{|\lambda_2|}{r}\right)^k,$$
where $p_2$ is a degree-two polynomial.
Exercise. Set
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 1/4 & 0 & 0 \\ 3/4 & 1 & 0 \end{pmatrix}.$$
Prove that the eigenvalues of $A$ are $\{1, -\frac12, -\frac12\}$ and that $A$ is not diagonalizable. Find $X = [z\ x_2\ x_3]$ such that
$$X^{-1}AX = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac12 & 1 \\ 0 & 0 & -\frac12 \end{pmatrix}.$$
The hypothesis that $A$ is irreducible and non negative alone does not assure that $r = \rho(A)$ is the unique eigenvalue of $A$ whose absolute value is equal to $r$. We only know that if $|\lambda_j| = r$, $j \in \{2, \dots, n\}$, then $\lambda_j \ne r$. Let us see some examples:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; \quad \text{Eigenvalues: } -1, 1; \quad \text{Perron-pair: } \left(1, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}\right).$$
$$A = \begin{pmatrix} 0 & a & 0 \\ 1 & 0 & 1 \\ 0 & 1-a & 0 \end{pmatrix},\ a \in (0,1); \quad \text{Eig: } -1, 0, 1; \quad \text{Perron-pair: } \left(1, \begin{pmatrix} a/2 \\ 1/2 \\ (1-a)/2 \end{pmatrix}\right).$$
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}; \quad \text{Eigenvalues: } 1, 1, 4; \quad \text{Perron-pair: } \left(4, \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}\right).$$
Note that if in the second example $A$ is replaced by $A^T$, then no computation is needed to obtain the Perron-pair, since it is clear that $A^T e = e$, $e = [1\ 1\ 1]^T$. Moreover, in the third example $r = \rho(A)$ (which is 4) dominates the remaining eigenvalues of $A$ (which are 1, 1). We note that this fact is true for any positive matrix $A$ (see Theorem(positive) below), even if, as the following example shows, it is not a peculiarity of positive matrices:
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}; \quad \text{Eig: } 2 \mp \sqrt2,\ 2; \quad \text{Perron-pair: } \left(2+\sqrt2,\ \frac{1}{2+\sqrt2}\begin{pmatrix} 1 \\ \sqrt2 \\ 1 \end{pmatrix}\right)$$
(here computation is required to obtain the Perron-pair).
Theorem(positive). If $A$ is an $n \times n$ positive matrix and $r = \rho(A), \lambda_2, \dots, \lambda_n$ are its eigenvalues, then $|\lambda_j|$ is smaller than $r$ for all $j = 2, \dots, n$.
proof: Set $W = A - sze^T$, where $z$ is the Perron vector of $A$ ($Az = rz$, $z > 0$, $\|z\|_1 = 1$). The eigenvalues of $W$ are $r - s, \lambda_2, \dots, \lambda_n$ (a sketch of the proof: take $Y := [z\ y_2\ \cdots\ y_n]$ non singular; then
$$Y^{-1}AY = \begin{pmatrix} r & \cdots \\ 0 & M \end{pmatrix}, \qquad Y^{-1}WY = \begin{pmatrix} r-s & \cdots \\ 0 & M \end{pmatrix}).$$
If there is a value of $s$ for which $|W| \le A$, $|W| \ne A$, then $\rho(W)$ is smaller than $\rho(A)$, and thus $|\lambda_2|, \dots, |\lambda_n|$ are smaller than $r = \rho(A)$. If $A$ is positive, then such $s$ exists: $s = \min_{i,j} a_{ij}$.
Irreducible non-negative stochastic-by-columns A and power method
Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Then we know that $1 = \rho(A)$ ($A^T e = e$, $\rho(A) \le \|A\|_1 = 1$), that $r = 1 = \rho(A)$ is a simple eigenvalue of $A$, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its 1-norm is one.
So the Perron-pair $(1 = \rho(A), z)$, such that $Az = z$, $z$ positive, $\|z\|_1 = 1$, is uniquely defined. The computation of such Perron-pair, i.e. the computation of the vector $z$, can be performed via the power method. Actually, we now see that the power method, in the particular case where $A$ is stochastic by columns (besides non negative and irreducible), can be rewritten in a simpler form and converges independently of the choice of $a_0$ (provided that 1 dominates the other eigenvalues). These results follow from some remarks, reported in the following Proposition.
Proposition. Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Then
i) $v \in \mathbb{C}^n \Rightarrow \sum_i (Av)_i = \sum_i v_i$ \quad (indeed $\sum_i (Av)_i = \sum_i \sum_j a_{ij}v_j = \sum_j v_j \sum_i a_{ij} = \sum_j v_j$),
ii) $v$ positive, $\|v\|_1 = 1$ $\Rightarrow$ $Av$ positive, $\|Av\|_1 = 1$ (use the irreducibility of $A$ and assertion i)),
iii) $Av = \lambda v$, $\lambda \ne 1$ $\Rightarrow$ $\sum_i v_i = 0$ \quad (indeed $\sum_i v_i = \sum_i (Av)_i = \lambda \sum_i v_i$, thus $(\lambda - 1)\sum_i v_i = 0$).
Exercise. Assume that $X^{-1}AX = J$ where $J$ is the Jordan block diagonal form of $A$, and $A$ is an irreducible non negative stochastic-by-columns $n \times n$ matrix. Assume that $[J]_{11} = 1$. Prove that $\sum_i [X]_{ij} = 0$, $j = 2, \dots, n$.
Corollary(power). Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Let $a_0$ be any positive vector such that $\|a_0\|_1 = 1$. Then set
$$a_{k+1} = Aa_k, \qquad k = 0, 1, 2, \dots.$$
The sequence $\{a_k\}$ is a well defined sequence of positive vectors, and $\|a_k\|_1 = 1 = \rho(A)$, $\forall\, k$. If $k \to +\infty$, then
$$a_k - z \to 0,$$
provided that $|\lambda_j|$ is smaller than 1 for all $j = 2, \dots, n$. The latter condition is assured by Theorem(positive) when $A$ is positive. In the particular case where $A$ is diagonalizable, the rate of convergence is
$$\left(\max_{j=2,\dots,n}|\lambda_j|\right)^{k}$$
[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].
proof: The sequences $a_k$ and $v_k$ defined in Theorem(power) coincide because, by Proposition ii), we have $\|Av_k\|_1 = \|v_k\|_1$. It remains to show that $\alpha$ in the expression $a_0 = \alpha z + \sum_j \alpha_j x_j$ is nonzero. Note that $e^T a_0 = \alpha e^T z + \sum_j \alpha_j e^T x_j = \alpha + \sum_j \alpha_j e^T x_j$. In the case that $A$ is diagonalizable the thesis follows from Proposition iii), which asserts that $e^T x_j$, $j = 2, \dots, n$, must be zero (hence $\alpha = e^T a_0 = 1 \ne 0$). In the generic case the proof is left to the reader.
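A sketch of the simplified iteration (assuming NumPy; the test matrix is the one of the exercise above, whose eigenvalues $1, -\frac12, -\frac12$ guarantee convergence even though it is not diagonalizable):

\begin{verbatim}
import numpy as np

def perron_vector_stochastic(A, steps=200):
    # A irreducible, non negative, stochastic by columns:
    # a_{k+1} = A a_k needs no normalization, ||a_k||_1 = 1 for all k
    a = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(steps):
        a = A @ a
    return a

A = np.array([[0.,   0., 1.],
              [0.25, 0., 0.],
              [0.75, 1., 0.]])   # columns sum to 1
z = perron_vector_stochastic(A)
print(z, A @ z)                  # A z ~ z, sum(z) = 1
\end{verbatim}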
Exercise. Discuss existence, unicity, and computation of the Perron-pair for
$$A = \begin{pmatrix} 0 & 1-a & a \\ a & 0 & 1-a \\ 1-a & a & 0 \end{pmatrix},\ a \in [0,1], \qquad A = \begin{pmatrix} 0 & 0 & 1 \\ a & 0 & 0 \\ 1-a & 1 & 0 \end{pmatrix},\ a \in [0,1],$$
$$A = \begin{pmatrix} 0 & b_1 & & & \\ 1 & 0 & b_2 & & \\ 1-b_1 & 0 & \ddots & & \\ & 1-b_2 & \ddots & b_{n-2} & \\ & & \ddots & 0 & 1 \\ & & & 1-b_{n-2} & 0 \end{pmatrix}, \qquad b_i \in (0,1).$$
Page-rank computation [Berkhin, et al]
Consider an oriented graph with set of vertices $V = \{1, 2, \dots, n\}$ and set of edges $E$: $(i,j) \in E$ if there is a link from $i$ to $j$. Associate to such graph the adjacency matrix
$$L_{ij} = \begin{cases} 1 & (i,j) \in E \\ 0 & (i,j) \notin E. \end{cases}$$
Call $\deg(i)$ the number of edges starting from $i$. Of course, $\deg(i) = 0$ (no edge starts from $i$) iff $L_{ij} = 0$, $\forall\, j$. Moreover, $\deg(i) = \sum_j L_{ij}$. Associate to the graph the transition matrix
$$P_{ij} = \begin{cases} \frac{1}{\deg(i)} & (i,j) \in E \\ 0 & (i,j) \notin E. \end{cases}$$
Note that $P_{ij} = L_{ij}/\deg(i)$ if $i$ is such that $\deg(i) > 0$, and $P_{ij} = L_{ij} = 0$ otherwise. Moreover, $\sum_j P_{ij} = 1$ if $\deg(i) > 0$ and $\sum_j P_{ij} = 0$ otherwise. So, the matrix $P$ is a non negative matrix, quasi-stochastic by rows.
Remark. Row i of P is null iff no edge starts from i; column j of P is null iff no edge points to j.
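In code, the construction of $L$, $\deg$ and $P$ from an edge list can be sketched as follows (the small graph is hypothetical, chosen only so that one vertex is dangling):

\begin{verbatim}
import numpy as np

n = 4
E = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]  # links i -> j, 0-based

L = np.zeros((n, n))
for i, j in E:
    L[i, j] = 1.0
deg = L.sum(axis=1)                  # deg(i) = sum_j L_ij

P = np.zeros((n, n))
rows = deg > 0
P[rows] = L[rows] / deg[rows, None]  # P_ij = L_ij / deg(i) if deg(i) > 0
print(P.sum(axis=1))                 # 1 where deg(i) > 0, 0 on null rows
\end{verbatim}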
Let $p \in \mathbb{R}^n$ be the vector whose entry $j$, $p_j$, is the importance (authority) of the vertex $j$. Then
$$p_j = \sum_{i:\,i \to j}\frac{p_i}{\deg(i)} = \sum_{i=1}^n P_{ij}p_i = \sum_{i=1}^n (P^T)_{ji}p_i = (P^T p)_j, \qquad p = P^T p$$
(note that such a fixed point $p$ may not exist or, if it exists, may fail to be unique or may have zero entries; see below).
Let $p_j^{(k+1)}$ be the probability that at step $k+1$ of my visit of the graph (navigation on the web) I am on the vertex (page) $j$. Then
$$p_j^{(k+1)} = \sum_{i:\,i \to j}\frac{p_i^{(k)}}{\deg(i)} = \sum_{i=1}^n P_{ij}p_i^{(k)} = \sum_{i=1}^n (P^T)_{ji}p_i^{(k)} = (P^T p^{(k)})_j, \qquad p^{(k+1)} = P^T p^{(k)}$$
(note that such a sequence of probability vectors exists and is uniquely defined once $p^{(0)}$ is given, but the $p^{(k)}$ may lose the possible ddp property of $p^{(0)}$; see below). [A vector $w$ is said to be a ddp (discrete distribution of probability) if $w$ is positive and $\|w\|_1 = 1$.]
Note that it is natural to require that: $p_i^{(k)} > 0$ (at step $k$ there is a probability that I am on vertex $i$), $\sum_j p_j^{(k)} = 1$ (at step $k$ I am on some vertex); $p_i > 0$ (any vertex has a portion of importance ...), $\sum_i p_i = 1$ (... of the total 1).
So, the following facts must be true:
a) $p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$, exists and is uniquely defined,
b) the method $p^{(0)} = $ ddp, $p^{(k+1)} = P^T p^{(k)}$, converges to $p$.
We now show that in order to have a) and b) the matrix $P^T$ must be both stochastic by columns and irreducible.
Theorem(stoc). If the above facts a) and b) are true, then $P^T$ must be stochastic by columns.
proof: We know that $\exists\,!\ p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$. Assume that $P^T$ is quasi-stochastic but not stochastic by columns. Then
$$\|P^T p\|_1 = \sum_i (P^T p)_i = \sum_i \sum_j (P^T)_{ij}p_j = \sum_j \sum_i P_{ji}p_j = \sum_{j:\,\deg(j)>0} 1 \cdot p_j + \sum_{j:\,\deg(j)=0} 0 \cdot p_j < \sum_j p_j = \|p\|_1,$$
and, analogously,
$$\|p^{(k+1)}\|_1 = \|P^T p^{(k)}\|_1 \le \|p^{(k)}\|_1 \le \|p^{(1)}\|_1 < \|p^{(0)}\|_1 = 1.$$
Thus $p$ cannot be equal to $P^T p$, and $p^{(k)}$ cannot converge to a ddp.
Theorem(irred). If the above facts a) and b) are true, then $P^T$ must be irreducible.
proof: Assume $P^T$ reducible. Then there exists a permutation matrix $Q$ such that
$$Q^T P^T Q = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad (*)$$
with $A_{11}$ and $A_{22}$ square matrices of order at least 1. Note that $Q^T P^T Q$ is stochastic by columns, like $P^T$. We know that $\exists\,!\ p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$, but this is equivalent to saying that $\exists\,!\ Q^T p$ such that
$$Q^T p = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} Q^T p, \qquad Q^T p > 0, \quad \|Q^T p\|_1 = 1. \qquad (**)$$
Case 1: assume $A_{12} = 0$. Then $A_{11}$ and $A_{22}$ are non negative stochastic-by-columns matrices. We can assume they are also irreducible (why?). Then, by the Perron-Frobenius theory,
$$\exists\,!\ y_1, y_2 > 0, \quad \|y_1\|_1 = \|y_2\|_1 = 1, \quad y_1 = A_{11}y_1, \quad y_2 = A_{22}y_2,$$
and, as a consequence, for all $\alpha \in (0,1)$ we have
$$\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}, \qquad \begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix} > 0, \qquad \left\|\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}\right\|_1 = 1.$$
So, all vectors $p$ such that $Q^T p = \begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}$ satisfy the properties $p > 0$, $\|p\|_1 = 1$, $p = P^T p$, which is against the hypothesis of unicity.
Case 2: assume $A_{12} \ne 0$. Then $A_{22}$ in (*) is not stochastic by columns, thus $A_{22}$ may have no eigenvalue equal to 1, i.e. the equations in (**) involving $A_{22}$ can be satisfied only if part of $p$ is null, against the hypothesis of positiveness of $p$.
Vice versa, we know that if the non negative matrix $P^T$ is irreducible and stochastic by columns, then $1 = \rho(P^T)$ is a simple eigenvalue of $P^T$ and there exists a unique vector $p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$. We also know that such hypotheses are not sufficient to assure the convergence (to $p$) of the sequence $p^{(k+1)} = P^T p^{(k)}$, $p^{(0)} > 0$, $\|p^{(0)}\|_1 = 1$, or, equivalently, to assure that the remaining eigenvalues of $P^T$ have absolute value smaller than 1. We can only say that the sequence $\{p^{(k)}\}$ is a well defined sequence of ddp.
We have to modify $P$ (the graph) so as to make the mathematical problem well posed ($\exists\,!$) and to make the algorithm for solving it convergent.
Make $P^T$ stochastic:
$$P' = P + dv^T, \qquad d = \begin{pmatrix} \delta_{\deg(1),0} \\ \vdots \\ \delta_{\deg(n),0} \end{pmatrix}, \qquad v = \begin{pmatrix} \frac1n \\ \vdots \\ \frac1n \end{pmatrix},$$
i.e., where $P$ has null rows, $P'$ has the row vector $v^T$. The vertex $i$ with $\deg(i) = 0$ now links to all vertices of the graph. [We discuss the uniform case, but what follows can be repeated for the more general case $v = $ ddp.]
Observe that $(P')^T$ is stochastic by columns, $(P')^T \ge 0$; thus 1 is an eigenvalue of $(P')^T$ and the other eigenvalues of $(P')^T$, $\lambda'_2, \dots, \lambda'_n$, are such that $|\lambda'_j| \le 1$.
Make $(P')^T$ irreducible:
$$P'' = cP' + (1-c)ev^T, \qquad e = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad c \in (0,1).$$
Since
$$e_i^T P'' = \begin{cases} cv^T + (1-c)v^T = v^T & \deg(i) = 0 \\ c\,[\cdots\ 0\ \frac{1}{\deg(i)}\ 0\ \cdots] + (1-c)v^T & \deg(i) > 0, \end{cases}$$
we are assuming that a visitor of the graph can go from the vertex $i$ to one of its neighbors with probability $c/\deg(i) + (1-c)/n$, and with probability $(1-c)/n$ to an arbitrary vertex of the graph. Of course the parameter $c$ must be chosen near 1, in order to keep our model faithful to the way the graph (web) is visited.
Observe that $(P'')^T$ is stochastic by columns and positive; therefore, in particular, it is non negative and irreducible. So, we have all we need.
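The two corrections can be sketched together in code (a minimal illustration of the uniform-v case; the function name and the default c = 0.85 are our choices):

\begin{verbatim}
import numpy as np

def google_matrix(P, c=0.85):
    # P'' = c P' + (1 - c) e v^T,  with  P' = P + d v^T,  v = e/n,
    # where d flags the null rows of P
    n = P.shape[0]
    v = np.ones(n) / n
    d = (P.sum(axis=1) == 0).astype(float)
    P1 = P + np.outer(d, v)                             # P': stochastic by rows
    return c * P1 + (1 - c) * np.outer(np.ones(n), v)   # P'': positive
\end{verbatim}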
Theorem(page-rank). $1 = \rho((P'')^T)$ is a simple eigenvalue of $(P'')^T$, there exists a unique vector $p$ such that $p = (P'')^T p$, $p > 0$, $\|p\|_1 = 1$ (i.e. we have fact a)), and the other eigenvalues of $(P'')^T$, $\lambda''_2, \dots, \lambda''_n$, are such that $|\lambda''_j| < 1$ (by Theorem(positive)). Thus $p^{(k+1)} = (P'')^T p^{(k)}$, $p^{(0)} > 0$, $\|p^{(0)}\|_1 = 1$, is a sequence of ddp convergent to $p$ and, in case $P''$ is diagonalizable,
$$\|p^{(k)} - p\| = O\left(\left(\max_{j=2,\dots,n}|\lambda''_j|\right)^{k}\right)$$
[for the general case see the Remark in Theorem(power) of Section 1] (i.e. we have fact b)). Moreover, for the particular choice of $(P'')^T$, the cost of each step of the power method is $O(n)$, dominated by the cost of the matrix-vector multiplication $P^T x$, and, if $P''$ is diagonalizable, then the rate of convergence is in fact
$$\|p^{(k)} - p\| = O(c^k)$$
[for the general case see the Remark in Theorem(power) of Section 1] (the Google search engine sets $c = 0.85$ [Berkhin]).
proof: We have to prove only the final assertions.
$O(n)$ arithmetic operations are sufficient to perform each step of the power method. We have
$$(P'')^T = c(P^T + vd^T) + (1-c)ve^T,$$
and, if $x \ge 0$, then
$$(P'')^T x = cP^T x + \gamma v, \qquad \gamma = cd^T x + (1-c)e^T x = e^T x - c[e^T x - d^T x] = \|x\|_1 - c\|P^T x\|_1$$
(why does the latter equality hold?). Thus, in order to compute $p^{(k+1)}$ from $p^{(k)}$, one can use the following function:
$$y = cP^T x, \qquad \gamma = \|x\|_1 - \|y\|_1, \qquad (P'')^T x = y + \gamma v,$$
where the dominant operation is the matrix-vector multiplication $P^T x$. Note that each row $j$ of $P^T$ has (on the average) a very small number of nonzero entries, i.e. exactly the number of vertices pointing to $j$, so $P^T x$ can be computed with $O(n)$ arithmetic operations. Note also that in order to implement the above function one needs only $2n$ memory allocations.
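The step described in the proof translates directly into code; below is a sketch assuming $P$ is stored in a SciPy sparse format, so that the product $P^T x$ costs $O(\text{number of edges})$ operations:

\begin{verbatim}
import numpy as np
from scipy.sparse import csr_matrix   # P_sparse below is a csr_matrix

def pagerank(P_sparse, c=0.85, steps=100):
    n = P_sparse.shape[0]
    v = np.ones(n) / n
    p = v.copy()                      # p^(0) = ddp
    for _ in range(steps):
        y = c * (P_sparse.T @ p)      # y = c P^T p^(k)
        gamma = p.sum() - y.sum()     # gamma = ||p^(k)||_1 - ||y||_1
        p = y + gamma * v             # p^(k+1) = (P'')^T p^(k)
    return p
\end{verbatim}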
Rate of convergence $\|p^{(k)} - p\| = O(c^k)$. We first prove that, if $e^T v = 1$, then
$$p_{P'}(\lambda) = (\lambda - 1)p_{n-1}(\lambda)\ \Rightarrow\ p_{P' + \frac{1-c}{c}ev^T}(\lambda) = \left(\lambda - \frac1c\right)p_{n-1}(\lambda). \qquad (***)$$
In fact, for $S = [e\ y_2\ \cdots\ y_n]$, $\det(S) \ne 0$, we have
$$S^{-1}P'S = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix}, \qquad p_{P'}(\lambda) = (\lambda - 1)p_M(\lambda),$$
$$S^{-1}\left(P' + \tfrac{1-c}{c}ev^T\right)S = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix} + \tfrac{1-c}{c}S^{-1}ev^TS = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix} + \tfrac{1-c}{c}\begin{pmatrix} 1 & \cdots & * \\ 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} \frac1c & \tilde u^T \\ 0 & M \end{pmatrix}.$$
As a consequence of (***), if $1, \lambda'_2, \dots, \lambda'_n$ are the eigenvalues of $P'$ (recall that $|\lambda'_j| \le 1$, $j = 2, \dots, n$), then the eigenvalues of $P' + \frac{1-c}{c}ev^T$ are $\frac1c, \lambda'_2, \dots, \lambda'_n$, and thus the eigenvalues of $cP' + (1-c)ev^T$ are $1, c\lambda'_2, \dots, c\lambda'_n$. It follows that, if $P''$ is diagonalizable, then $\|p^{(k)} - p\| = O((\max_{j=2,\dots,n}|c\lambda'_j|)^k) = O(c^k)$.
Why compute p? [Berkhin].
QUERY: Berkhin survey.
Go to the inverted term-document file, which is a table containing a row for each term of the collection's dictionary. In such file, for each term, there is the list of all documents that contain that term:
$$\vdots$$
$$\text{term} \to \mathrm{LISTA}_{\text{term}} = \{i_1, i_2, \dots, i_k\} \subset \{1, 2, \dots, n\}$$
$$\vdots$$
$$\text{Berkhin} \to \mathrm{LISTA}_{\text{Berkhin}} = \{1, 4, 6\}$$
$$\vdots$$
$$\text{survey} \to \mathrm{LISTA}_{\text{survey}} = \{1, 3\}$$
$$\vdots$$
Define the set of relevance of the query:
$$\bigcup_{\text{term} \in \text{QUERY}} \mathrm{LISTA}_{\text{term}} = \{1, 3, 4, 6\}.$$
Reading $p$ ($p$ is updated about once a month), consider and order the corresponding set of authorities $\{p_1, p_3, p_4, p_6\}$; for example, $p_4 \ge p_6 \ge p_3 \ge p_1$. Finally, show the titles of the documents 1, 3, 4, 6 in the order 4, 6, 3, 1, from the one with greatest authority to the one with smallest authority.
Main criticism: this procedure, being independent of the query, allows a fast answer, but does not distinguish pages with generic authority from pages with authority on the specific subject of the query.
Exercise. Draw the graph whose transition matrix is
$$P = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
Note that $P$ is non negative, reducible and quasi-stochastic, but not stochastic, by rows. Prove that there is no positive vector $p$ such that $p = P^T p$. Starting from $P$ and proceeding as indicated in the theory, introduce a non negative matrix $P'$ stochastic by rows. Note that $P'$ is reducible, like $P$. Prove that there is no positive vector $p$ such that $p = (P')^T p$. Starting from $P'$ and proceeding as indicated in the theory, introduce a non negative matrix $P''$ irreducible and stochastic by rows. Prove that there exists a unique positive vector $p$ such that $p = (P'')^T p$, $\|p\|_1 = 1$, and describe an algorithm for the computation of $p$. Here below is an approximation of such vector $p$:
$$p^T = [0.03721\ \ 0.05396\ \ 0.04151\ \ 0.3751\ \ 0.206\ \ 0.2862].$$
Assume, for example, that the set of relevance of a query is $\{1, 3, 4, 6\}$. Then the documents 1, 3, 4, 6 are listed in the order 4, 6, 3, 1, since $p_4 \ge p_6 \ge p_3 \ge p_1$.
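The reported approximation can be checked with a few lines of code; the damping value used below is an assumption (the text reports that the Google search engine sets c = 0.85), so the printed vector agrees with the one above only for the c actually used there:

\begin{verbatim}
import numpy as np

P = np.array([[0,   1/2, 1/2, 0,   0,   0  ],
              [0,   0,   0,   0,   0,   0  ],
              [1/3, 1/3, 0,   0,   1/3, 0  ],
              [0,   0,   0,   0,   1/2, 1/2],
              [0,   0,   0,   1/2, 0,   1/2],
              [0,   0,   0,   1,   0,   0  ]])
n = 6
v = np.ones(n) / n
d = (P.sum(axis=1) == 0).astype(float)
P2 = 0.85 * (P + np.outer(d, v)) + 0.15 * np.outer(np.ones(n), v)

p = v.copy()
for _ in range(200):
    p = P2.T @ p
print(p)    # compare with the approximation of p reported above
\end{verbatim}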
APPENDIX
Proof (Theorem(inverse power)):
Assume $A$ diagonalizable, $Ax_j = \lambda_j x_j$, $\{x_j\}_{j=1}^n$ linearly independent. Then $Ax_j - \lambda_i^* x_j = (\lambda_j - \lambda_i^*)x_j$, whence $(A - \lambda_i^* I)^{-m}x_j = \frac{1}{(\lambda_j - \lambda_i^*)^m}x_j$. Call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda_i}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_i}\alpha_j x_j$. Note that the first sum is a non null vector (by the assumption on $v_0$). Then the following equality holds:
$$(\lambda_i - \lambda_i^*)^m(A - \lambda_i^* I)^{-m}v_0 = \sum_{j:\,\lambda_j = \lambda_i}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_i}\alpha_j\left(\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right)^m x_j,$$
which, for $m \to +\infty$, yields the thesis.
Assume now that $A$ is not diagonalizable. In this case call $x_j$, $j = 1, \dots, n$, the columns of the non singular matrix $X$ for which $X^{-1}AX = J$, where $J$ is the block diagonal Jordan form of $A$. Assume that the upper-left diagonal block of $J$ is diagonal and its diagonal contains all $\lambda_j$ such that $\lambda_j = \lambda := \lambda_i$. Assume that there are $t$ of such eigenvalues. Then consider any other diagonal block of $J$; of course it corresponds to an eigenvalue $\lambda_j$ different from $\lambda$; call such eigenvalue $\mu$. The block under consideration has an order, say $s = s_\mu$:
$$J = \begin{pmatrix} \lambda & & & & \\ & \ddots & & & \\ & & \lambda & & \\ & & & \begin{smallmatrix} \mu & 1 & & \\ & \mu & \ddots & \\ & & \ddots & 1 \\ & & & \mu \end{smallmatrix} & \\ & & & & \ddots \end{pmatrix}.$$
Restrict the matrix equation $X^{-1}AX = J$ to this block: for some $r \ge t$ we have
$$A[x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots\ x_{r+s}] = [x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots\ x_{r+s}]\begin{pmatrix} \mu & 1 & & \\ & \mu & \ddots & \\ & & \ddots & 1 \\ & & & \mu \end{pmatrix}.$$
It follows that
$$Ax_{r+1} = \mu x_{r+1}, \quad (A - \lambda^* I)x_{r+1} = (\mu - \lambda^*)x_{r+1}, \quad (A - \lambda^* I)^{-1}x_{r+1} = \tfrac{1}{\mu - \lambda^*}x_{r+1}, \quad (A - \lambda^* I)^{-m}x_{r+1} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+1};$$
$$Ax_{r+2} = \mu x_{r+2} + x_{r+1}, \quad (A - \lambda^* I)x_{r+2} = (\mu - \lambda^*)x_{r+2} + x_{r+1}, \quad (A - \lambda^* I)^{-1}x_{r+2} = \tfrac{1}{\mu - \lambda^*}x_{r+2} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+2} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+2} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+1};$$
$$Ax_{r+3} = \mu x_{r+3} + x_{r+2}, \quad (A - \lambda^* I)^{-1}x_{r+3} = \tfrac{1}{\mu - \lambda^*}x_{r+3} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+2} + \tfrac{1}{(\mu - \lambda^*)^3}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+3} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+3} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+2} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+1};$$
$$Ax_{r+4} = \mu x_{r+4} + x_{r+3}, \quad (A - \lambda^* I)^{-1}x_{r+4} = \tfrac{1}{\mu - \lambda^*}x_{r+4} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+3} + \tfrac{1}{(\mu - \lambda^*)^3}x_{r+2} - \tfrac{1}{(\mu - \lambda^*)^4}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+4} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+4} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+3} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+2} - \tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^{m+3}}x_{r+1};$$
and, in general,
$$(A - \lambda^* I)^{-m}x_{r+s} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+s} - \dots + (-1)^{s-1}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{m+s-1}}x_{r+1}.$$
In matrix form,
$$(A - \lambda^* I)^{-m}[x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots] = [x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots]\begin{pmatrix} \frac{1}{(\mu - \lambda^*)^m} & -\frac{m}{(\mu - \lambda^*)^{m+1}} & \frac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}} & \cdots \\ 0 & \frac{1}{(\mu - \lambda^*)^m} & -\frac{m}{(\mu - \lambda^*)^{m+1}} & \\ 0 & 0 & \frac{1}{(\mu - \lambda^*)^m} & \\ \vdots & & & \ddots \end{pmatrix}.$$
Set also $\lambda^* := \lambda_i^*$ ($\lambda_i^*$ is the given approximation of $\lambda = \lambda_i$). Choose $v_0$, and call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda}\alpha_j x_j + \{\,\dots + \sum_{j=r+1}^{r+s}\alpha_j x_j + \dots\}$. Note that $\sum_{j:\,\lambda_j = \lambda}\alpha_j x_j = \sum_{j=1}^t \alpha_j x_j$. Then
$$\begin{aligned} (A - \lambda^* I)^{-m}v_0 &= \sum_{j:\,\lambda_j = \lambda}\alpha_j(A - \lambda^* I)^{-m}x_j + \{\,\cdots + \sum_{j=r+1}^{r+s}\alpha_j(A - \lambda^* I)^{-m}x_j + \cdots\}\\ &= \sum_{j:\,\lambda_j = \lambda}\alpha_j\tfrac{1}{(\lambda - \lambda^*)^m}x_j + \{\,\cdots + [\,\alpha_{r+1}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+1}) + \alpha_{r+2}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+2} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+1})\\ &\quad + \alpha_{r+3}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+3} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+2} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+1})\\ &\quad + \alpha_{r+4}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+4} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+3} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+2} - \tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^{m+3}}x_{r+1})\\ &\quad + \dots + \alpha_{r+s}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+s} - \dots + (-1)^{s-1}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{m+s-1}}x_{r+1})\,] + \cdots\}. \end{aligned}$$
But this implies
$$\begin{aligned} (\lambda - \lambda^*)^m(A - \lambda^* I)^{-m}v_0 &= \sum_{j:\,\lambda_j = \lambda}\alpha_j x_j + \{\,\dots + \left(\tfrac{\lambda - \lambda^*}{\mu - \lambda^*}\right)^m\Big[\,\big(\alpha_{r+1} - \alpha_{r+2}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+3}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \dots + (-1)^{s-1}\alpha_{r+s}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{s-1}}\big)x_{r+1}\\ &\quad + \big(\alpha_{r+2} - \alpha_{r+3}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+4}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \dots + (-1)^{s-2}\alpha_{r+s}\tfrac{q_{s-2}(m)}{(\mu - \lambda^*)^{s-2}}\big)x_{r+2}\\ &\quad + \dots + \alpha_{r+s}x_{r+s}\,\Big] + \cdots\}, \end{aligned}$$
from which it is clear that the sequence $(\lambda - \lambda^*)^m(A - \lambda^* I)^{-m}v_0$ converges to an eigenvector of $A$ associated with $\lambda$. The assertions about the rate of convergence follow by setting
$$p_{s-1}(m) = \alpha_{r+1} - \alpha_{r+2}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+3}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \alpha_{r+4}\tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^3} + \dots + (-1)^{s-1}\alpha_{r+s}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{s-1}}.$$

Proof (Theorem(power)):
Assume that $A$ is diagonalizable, so $Ax_j = \lambda_j x_j$ with $\{x_j\}_{j=1}^n$ linearly independent. Choose $v_0$, and call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j x_j$. Note that the first sum is non null by assumption. Then we have the equalities
$$A^m v_0 = \lambda_1^m\sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j\lambda_j^m x_j, \qquad \frac{1}{\lambda_1^m}A^m v_0 = \sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j\left(\frac{\lambda_j}{\lambda_1}\right)^m x_j.$$
The thesis follows letting $m$ go to infinity.
Assume now that $A$ is not diagonalizable. Set $\mu := \lambda_j$, where $\lambda_j \ne \lambda_1$, and restrict, as above, the Jordan matrix equation $X^{-1}AX = J$ to a Jordan block of order $s$ associated with $\mu$. Then
Assume that A is not diagonalizable. Set µ := λ j , where λ j 6= λ 1 , and restrict, as above, the Jordan matrix equation X −1 AX = J to a diagonal Jordan block of order s associated with µ. Then
Ax r+1 = µx r+1 , A m x r+1 = µ m x r+1 ,
Ax r+2 = µx r+2 + x r+1 , A m x r+2 = µ m x r+2 + mµ m−1 x r+1 , Ax r+3 = µx r+3 + x r+2 ,
A m x r+3 = µ m x r+3 + mµ m−1 x r+2 + 1 2 (m 2 − m)µ m−2 x r+1 ,
Ax r+4 = µx r+4 + x r+3 ,
A m x r+4 = µ m x r+4 + mµ m−1 x r+3 + 1 2 (m 2 − m)µ m−2 x r+2 +( 1 6 m 3 − 1 2 m 2 + 1 3 m)µ m−3 x r+1 ,
and so on. Set $\lambda := \lambda_1$ and $t = m_a(\lambda_1) = m_g(\lambda_1)$, so we can assume that the upper-left $t \times t$ submatrix of $J$ is diagonal with diagonal entries all equal to $\lambda$. Then
$$\begin{aligned} A^m v_0 &= A^m\sum_{j=1}^t \alpha_j x_j + \{\dots + \sum_{j=r+1}^{r+s}\alpha_j A^m x_j + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,\alpha_{r+1}A^m x_{r+1} + \alpha_{r+2}A^m x_{r+2} + \alpha_{r+3}A^m x_{r+3} + \alpha_{r+4}A^m x_{r+4} + \dots + \alpha_{r+s}A^m x_{r+s}\,] + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,\alpha_{r+1}(\mu^m x_{r+1}) + \alpha_{r+2}(\mu^m x_{r+2} + m\mu^{m-1}x_{r+1}) + \alpha_{r+3}(\mu^m x_{r+3} + m\mu^{m-1}x_{r+2} + \tfrac12(m^2 - m)\mu^{m-2}x_{r+1})\\ &\quad + \alpha_{r+4}(\mu^m x_{r+4} + m\mu^{m-1}x_{r+3} + \tfrac12(m^2 - m)\mu^{m-2}x_{r+2} + (\tfrac16 m^3 - \tfrac12 m^2 + \tfrac13 m)\mu^{m-3}x_{r+1})\\ &\quad + \dots + \alpha_{r+s}(\mu^m x_{r+s} + \dots + q_{s-1}(m)\mu^{m-s+1}x_{r+1})\,] + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,(\alpha_{r+1}\mu^m + \alpha_{r+2}m\mu^{m-1} + \alpha_{r+3}\tfrac12(m^2 - m)\mu^{m-2} + \alpha_{r+4}(\tfrac16 m^3 - \tfrac12 m^2 + \tfrac13 m)\mu^{m-3} + \dots + \alpha_{r+s}q_{s-1}(m)\mu^{m-s+1})x_{r+1}\\ &\quad + (\alpha_{r+2}\mu^m + \alpha_{r+3}m\mu^{m-1} + \alpha_{r+4}\tfrac12(m^2 - m)\mu^{m-2} + \dots + \alpha_{r+s}q_{s-2}(m)\mu^{m-s+2})x_{r+2} + \dots + (\alpha_{r+s}\mu^m)x_{r+s}\,] + \dots\}, \end{aligned}$$
$$\begin{aligned} \frac{1}{\lambda^m}A^m v_0 &= \sum_{j=1}^t \alpha_j x_j + \{\dots + \left(\frac{\mu}{\lambda}\right)^m[\,(\alpha_{r+1} + \alpha_{r+2}\tfrac{m}{\mu} + \alpha_{r+3}\tfrac{\frac12(m^2 - m)}{\mu^2} + \alpha_{r+4}\tfrac{\frac16 m^3 - \frac12 m^2 + \frac13 m}{\mu^3} + \dots + \alpha_{r+s}\tfrac{q_{s-1}(m)}{\mu^{s-1}})x_{r+1}\\ &\quad + (\alpha_{r+2} + \alpha_{r+3}\tfrac{m}{\mu} + \alpha_{r+4}\tfrac{\frac12(m^2 - m)}{\mu^2} + \dots + \alpha_{r+s}\tfrac{q_{s-2}(m)}{\mu^{s-2}})x_{r+2} + \dots + (\alpha_{r+s})x_{r+s}\,] + \dots\}. \end{aligned}$$
It is clear that, as $m$ goes to infinity, the sequence $\frac{1}{\lambda^m}A^m v_0$ converges to an eigenvector of $A$ associated with $\lambda$, the dominant eigenvalue of $A$. Finally, the assertions on the rate of convergence follow by setting
$$p_{s-1}(m) = \alpha_{r+1} + \alpha_{r+2}\tfrac{m}{\mu} + \alpha_{r+3}\tfrac{\frac12(m^2 - m)}{\mu^2} + \alpha_{r+4}\tfrac{\frac16 m^3 - \frac12 m^2 + \frac13 m}{\mu^3} + \dots + \alpha_{r+s}\tfrac{q_{s-1}(m)}{\mu^{s-1}}.$$