• Non ci sono risultati.

Given a n × n matrix A with eigenvalues λ 1 , λ 2 , . . . , λ n , the following result holds.

N/A
N/A
Protected

Academic year: 2021

Condividi "Given a n × n matrix A with eigenvalues λ 1 , λ 2 , . . . , λ n , the following result holds."

Copied!
17
0
0

Testo completo

(1)

The power method, Perron-Frobenius theory, Page-Rank computation Preliminary considerations, A ∈ C n×n

Given a n × n matrix A with eigenvalues λ 1 , λ 2 , . . . , λ n , the following result holds.

Theorem (inverse power). If λ i is an approximation of the eigenvalue λ i of a n×n matrix A, i.e. |λ i − λ i | is smaller than |λ j − λ i |, ∀ λ j 6= λ i , if m a (λ i ) = m g (λ i ), and if v 0 ∈ C n is not orthogonal to the space spanned by the eigenvectors of A corresponding to λ i , then the sequence {v k } generated by the algorithm

(A − λ i I)a k = v k−1 , v k = a k

ka k k , k = 1, 2, . . .

converges to an eigenvector of A corresponding to the eigenvalue λ i . If A is diagonalizable, then the rate of convergence is O((max j: λ

j

6=λ

i

| λ λ

ij

−λ −λ

i

i

|) k ).

Remark . In the general case the rate of convergence is

j: λ

max

j6=λi

max

sλj

O(|p

sλj−1

(k)|| λ

i

− λ

i

λ

j

− λ

i

|

k

),

where, given the block diagonal Jordan form of A, J = X

−1

AX, for each λ

j

6= λ

i

, the number s

λj

indicates the dimension of the generic Jordan block associated with λ

j

, and p

sλj−1

(k) is a polynomial of degree s

λj

− 1, whose coefficients depend on

1

j−λi)r

, r = 1, . . . , s

λj

− 1, and on the coefficients of v

0

with respect to the column vectors of X corresponding to the Jordan block under consideration.

proof: See the Appendix. 

Let λ 1 be such that |λ 1 | = ρ(A) and assume that all λ i such that |λ i | = ρ(A) are equal to λ 1 (in such case we say that λ 1 dominates the eigenvalues of A).

Assume, moreover, that the algebraic and geometric multiplicity of λ 1 are equal.

Then the power method (see the Theorem below) can be used to compute λ 1

and an eigenvector corresponding to λ 1 .

Theorem (power). If λ 1 dominates the eigenvalues of a n × n matrix A, if m a (λ 1 ) = m g (λ 1 ), and if v 0 ∈ C n is not orthogonal to the space spanned by the eigenvectors of A corresponding to λ 1 , then the sequence {v k } generated by the algorithm

a k = Av k−1 , v k = a k

ka k k , k = 1, 2, . . .

converges to an eigenvector of A corresponding to the eigenvalue λ 1 . Moreover, u H v k+1

u H v k → λ 1 , k → +∞

for any u for which u H v k 6= 0. If A is diagonalizable, then the rate of conver- gence is O((max j: λ

j

6=λ

1

| λ λ

j1

|) k ).

Remark . In the general case the rate of convergence is

j: λ

max

j6=λ1

max

sλj

O(|p

sλj−1

(k)|| λ

j

λ

1

|

k

),

where, given the block diagonal Jordan form of A, J = X

−1

AX, for each λ

j

6= λ

1

, the number

s

λj

indicates the dimension of the generic Jordan block associated with λ

j

, and p

sλj−1

(k) is

(2)

a polynomial of degree s

λj

− 1, whose coefficients depend on

λ1r

j

, r = 1, . . . , s

λj

− 1, and on the coefficients of v

0

with respect to the column vectors of X corresponding to the Jordan block under consideration.

proof: See the Appendix. 

For our purposes it is useful to recall also the following classic result on matrix deflation: how to introduce a matrix whose eigenvalues are all equal to the eigenvalues of A except one, which, instead, is zero.

Theorem. Let A be a n × n matrix. Let λ 1 be a nonzero eigenvalue of A and y 1 a corresponding eigenvector, i.e. Ay 1 = λ 1 y 1 . Call λ 2 , λ 3 , . . . , λ n the remaining eigenvalues of A. Then the matrix W = A − w λ

1

y

1

y 1 w , w y 1 6= 0, has eigenvalues 0, λ 2 , λ 3 , . . . , λ n .

proof. Introduce S = [y 1 z 2 · · · z n ] non singular, and observe that p A (λ) = p S

−1

AS (λ) = (λ − λ 1 )q(λ), p W (λ) = p S

−1

W S (λ) = λq(λ). 

Let G be the following matrix

G =

1 4

1 4

1 3 2

4 1 8

1 11 8

16 1 4

1 16

 .

Since ρ(G) ≤ kGk = 1, we can say that the spectrum of G lies in the circle {z ∈ C : |z| ≤ 1}.

Note that Ge = 1 · e, i.e. one eigenvalue of G, λ 1 = 1, and its corresponding eigenvector, y 1 = e = [1 1 · · · 1] T , are known. If λ 2 , λ 3 denote the remaining eigenvalues of G, then we can define a matrix W , in terms of G, λ 1 , y 1 , whose eigenvalues are 0, λ 2 , λ 3 :

W = G − λ 1

w y 1 y 1 w = G − 1 w e

 w w w

 , ∀ w, w e 6= 0.

Since (e T i G)e = e T i (1 · e) = 1 6= 0, we choose w = e T i G:

W = G −

 e T i G e T i G e T i G

 . For i = 1 the matrix W becomes

W =

1 4

1 4

1 3 2

4 1 8

1 11 8

16 1 4

1 16

 −

1 4

1 4

1 1 2 4

1 4

1 1 2 4

1 4

1 2

 =

0 0 0

1

2 − 1 83 8

7

16 0 − 16 7

 ,

so, the eigenvalues of A different from 1 are the eigenvalues of W = ˜

 − 1 8 − 3 8

0 − 16 7

 , i.e. − 1 8 and − 16 7 .

Thus, 1, − 1 8 and − 16 7 are the eigenvalues of G, and also, of course, of G T .

In particular, 1 is eigenvalue of G T , but note that the eigenvector p of G T

(3)

corresponding to 1 is not obvious; it must be computed. For example, it can be computed as the limit of the inverse power sequence {v k } defined as follows (ε small positive number):

v 0 ∈ R 3 , (G T − (1 + ε)I)a k = v k−1 , v k = a k

ka k k , k = 0, 1, 2, . . . (rate of convergence: O(

1−(1+ε)

18

−(1+ε)

k )). [Here a reference for the inverse power iterations]. We shall see that the vector p can be also obtained as the limit of the sequence

p 0 ∈ R 3 , p 0 positive, kp 0 k 1 = 1, p k+1 = G T p k , k = 0, 1, 2, . . . (rate of convergence: O(

167

1

k

)) [this result is in fact a particular case of the Theorem at the beginning of this section (set k · k = k · k 1 , u = e)]. Even if the rate of convergence of the p k is not as good as the rate of convergence of the v k , the computation of p k+1 from p k is much cheaper than the computation of v k+1 from v k . In fact, for analogous problems, but of high dimension, even one step of the inverse power iterations is prohibitive.

The Perron-Frobenius theory: A ∈ R n×n , A ≥ 0 irreducible [Varga]

Lemma. Let A be a n × n non negative matrix, i.e. a ij ≥ 0, ∀ i, j. Assume that A is not reducible. Then (I + A) n−1 is a positive matrix, i.e. its entries are all positive.

proof. We shall prove that the vector (I + A) n−1 x is positive whenever x is a non negative non null vector (prove that this is equivalent to the thesis!).

Let x be a non negative non null vector. Set x 0 = x, x 1 = (I + A)x 0 = x 0 + Ax 0 , . . ., x k+1 = (I + A)x k = x k + Ax k , k = 1, . . . , n − 2. Note that x k = (I + A) k x, in particular x n−1 = (I + A) n−1 x. So, our aim is to prove that x n−1 is a positive vector. First observe by induction that all x k are non negative vectors (x k non negative and A non negative imply Ax k non negative and x k+1 = x k + Ax k non negative). Then the thesis (x n−1 positive) is now proved by showing that x k+1 must have less zeros than x k for each k ∈ {0, . . . , n − 2}.

Note that x k+1 cannot have more zeros than x k , since Ax k , in the definition x k + Ax k of x k+1 , is non negative. Assume x k+1 has the same number of zero entries as x k . Of course such zeros must be in the same places, i.e. there exists a permutation matrix P such that P x k =

 α 0



, P x k+1 =

 β 0



, with α, β positive vectors of the same dimension m, 1 ≤ m ≤ n − 1 (why such bounds for m ?). Thus, P x k+1 = P x k + P Ax k = P x k + P AP T P x k . Consider the following partition of the matrix P AP T

P AP T =

 M 11 M 12

M 21 M 22



where M 11 is m × m. Then

 β 0



=

 α 0

 +

 M 11 M 12

M 21 M 22

  α 0



,

(4)

and, in particular, M 21 α = 0. The latter condition implies M 21 = 0, being α a positive vector and M 21 a non negative matrix. But this is equivalent to say that A is reducible, against the hypothesis! 

Example. Set

I + A =

1 0 0 1

a 1 0 0

0 b 1 0

0 0 c 1

, a, b, c positive.

Note that A is a non-negative irreducible matrix, and in fact (I + A) 3 is a positive matrix, i.e. its entries are positive. Moreover, 3 is the minimum j for which (I + A) j is positive.

Let A be a n × n non negative irreducible matrix. Let x be a non negative non null vector, and associate to x the number

r x := min

i:x

i

>0

P

j a ij x j

x i

= min

i:x

i

>0

(Ax) i

x i

.

Proposition. r x is a non negative real number; r x = r αx if α > 0; Ax ≥ r x x;

r x = sup{ρ ∈ R : Ax ≥ ρx}.

proof: easy, left to the reader.

Now associate to A the following number:

r = sup

x≥0, x6=0

r x = sup

x≥0, x6=0 i:x min

i

>0

P

j a ij x j

x i

.

Proposition. r is a positive real number; if w ≥ 0, w 6= 0, is such that Aw ≥ rw, then Aw = rw and w > 0.

proof. If e = [1 1 · · · 1] T , then r e = min i: (e)

i

>0 P

j

a

ij

(e)

j

(e)

i

= min i P

j a ij ≥ 0.

Assume r e = 0. Then for some k we would have P

j a kj = 0, so the kth row of A would be null, and thus A would be reducible (exchange the k and n rows), against the hypothesis. It follows that r ≥ r e > 0.

proof. Set η = Aw − rw. We know that η ≥ 0. Assume η 6= 0. Then, by the Lemma,

0 < (I + A) n−1 η = (I + A) n−1 Aw − (I + A) n−1 rw

= A(I + A) n−1 w − r(I + A) n−1 w = Ay − ry, y > 0.

i.e. r < (Ay) i /y i ∀ i. Thus r < r y , which is absurd. It follows that η = 0, that is, Aw = rw. Then, we also have w > 0 since 0 < (I + A) n−1 w = (1 + r) n−1 w.



In the following, given v ∈ C n and M ∈ C n×n we denote by |v| and |M|, respectively, the column vector (|v k |) n k=1 and the matrix (|m ij |) n i,j=1 .

Theorem. There exists a positive vector z for which Az = rz; r = ρ(A); if

B ∈ C n×n , |B| ≤ A, then ρ(B) ≤ ρ(A); if B ∈ C n×n , |B| ≤ A, |B| 6= A then

ρ(B) < ρ(A).

(5)

proof: We now show that there exists z ≥ 0 such that r = r z . Once this is proved we will have the inequality Az ≥ r z z = rz which implies, by the Proposition, Az = rz and z > 0.

r = sup

x≥0, x6=0

r x = sup

x≥0, x6=0

r

x

kxk2

= sup

x≥0, kxk

2

=1

r x = (∗)

We have r x ≤ r (I+A)

n−1

x since r (I+A)

n−1

x = sup{ρ ∈ R : A(I + A) n−1 x ≥ ρ(I +A) n−1 x } and Ax ≥ r x x ⇒ A(I+A) n−1 x = (I +A) n−1 Ax ≥ r x (I +A) n−1 x.

Thus,

(∗) ≤ sup

x≥0, kxk

2

=1

r (I+A)

n−1

x = sup

y∈Q

r y = max

y∈Q r y = r z ≤ r

for some z ∈ Q = {(I + A) n−1 x : x ≥ 0, kxk 2 = 1} (r y is continuous in y (why?) and Q is a compact). Note that z > 0.

proof: Let λ be an eigenvalue of A, i.e. ∃ y 6= 0 | Ay = λy. Then |λ||y| = |λy| =

|Ay| ≤ A|y|, |y| ≥ 0, |y| 6= 0. Thus, by definition of r |y| , |λ| ≤ r |y| ≤ r, and we have the inequality ρ(A) ≤ r. But r is an eigenvalue of A, thus r = ρ(A).

proof: Let λ be an eigenvalue of B, i.e. ∃ y 6= 0 | By = λy. Then |λ||y| =

|λy| = |By| ≤ |B||y| ≤ A|y|, |y| ≥ 0, |y| 6= 0. Thus, by definition of r |y| ,

|λ| ≤ r |y| ≤ r, and we have the inequality ρ(B) ≤ r = ρ(A).

proof: Assume ρ(B) = ρ(A), i.e. there exists λ eigenvalue of B (By = λy, y 6= 0) such that |λ| = r. Then we can add an equality in the above arguments,

r|y| = |λ||y| = |λy| = |By| ≤ |B||y| ≤ A|y|, |y| ≥ 0, |y| 6= 0,

obtaining the inequality r|y| ≤ A|y|, |y| ≥ 0, |y| 6= 0. But by the above Proposition, such inequality implies A|y| = r|y| with |y| > 0. So we have

r|y| = |λ||y| = |λy| = |By| = |B||y| = A|y| = r|y|, |y| > 0,

from which it follows |B| = A, against the hypothesis! Thus ρ(B) < ρ(A).  We conclude this section with the following result of the Perron-Frobenius theory, stated without proof (actually we shall give a proof of such result in the case A is positive):

Result. If A is a non-negative irreducible n×n matrix, then r = ρ(A) is a simple eigenvalue of A, i.e. λ − r divides p A (λ) but (λ − r) 2 does not divide p A (λ).

Computing the Perron-pair of A irreducible non-negative by the power method Let A be an irreducible non-negative n × n matrix. Then we know that ρ(A) is positive and is a simple eigenvalue of A, and the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its measure in the 1-norm is one. We also know that ρ(A) = r :=

sup x≥0, x6=0 min i: x

i

>0 (Ax)

i

x

i

.

So, it is uniquely defined the Perron-pair (ρ(A), z). such that Az = ρ(A)z, ρ(A) positive, z positive, kzk 1 = 1. The power method is a way to compute such Perron-pair.

Theorem(power). Let A be an irreducible non-negative n × n matrix. Let a 0 be any positive vector, and set v 0 = a 0 /ka 0 k 1 . Then set

a k+1 = Av k , v k+1 = a k+1

ka k+1 k 1 , k = 0, 1, 2, . . . .

(6)

Note that the sequences {a k }, {v k } are well defined sequences of positive vectors, and kv k k 1 = 1, ∀ k. Let X be the non singular matrix defining, by similarity, the Jordan block-diagonal form of A, i.e.

X −1 AX = J =

 r 0 T

0 B

 , X =

 z x 2 · · · x n

 ,

and let r = ρ(A), λ j , j = 2, . . . , n, be the eigenvalues of A (λ j 6= r, ∀ j). If α in the expression a 0 = αz + P n

j=2 α j x j is nonzero, then, for k → +∞, we have v k − z → 0, ka k k 1 − ρ(A) → 0,

provided that |λ j | is smaller than r for all j = 2, . . . , n. In the particular case where A is diagonalizable, the rate of convergence is

j=2...n max

|λ j | r

 k

[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].

proof: We prove the Theorem only in the case A diagonalizable, where Ax j = λ j x j , j = 2, . . . , n. It is easy to observe that A k a 0 = αr k z + P n

j=2 α j λ k j x j is a positive vector, and that

v k = αr

k

z+ P

n

j=2

α

j

λ

kj

x

j

kαr

k

z+ P

n

j=2

α

j

λ

kj

x

j

k

1

= αr

k

z+ P

n

j=2

α

j

λ

kj

x

j

e

T

(αr

k

z+ P

n

j=2

α

j

λ

kj

x

j

)

= αr

k

z+ P

n

j=2

α

j

λ

kj

x

j

αr

k

e

T

z+ P

n

j=2

α

j

λ

kj

e

T

x

j

=

z+ P

n j=2

αj α



λj r



k

x

j

1+ P

n j=2

αj α



λj r



k

e

T

x

j

.

Moreover,

a k+1 = Av k =

rz + P n j=2

α

j

α

 λ

j

r

 k

λ j x j 1 + P n

j=2 α

j

α

 λ

j

r

 k

e T x j and, since a k+1 is positive,

ka k+1 k 1 = e T a k+1 =

r+ P

n j=2

αj α



λj r



k

λ

j

e

T

x

j

1+ P

n j=2

αj α



λj r



k

e

T

x

j

=

r 1+ P

n j=2

αj α



λj r



k+1

e

T

x

j



1+ P

n j=2

αj α



λj r



k

e

T

x

j

.

Exercise. Discuss the convergence of the power method when applied to compute the Perron-pair (1,

 1

2 1 2



) of the following two 2 × 2 matrices:

A =

 0 1 1 0

 , A =

 1/4 3/4 3/4 1/4



.

(7)

In the second case, find a constant c such that kv k − zk ≤ c( 1 2 ) k .

Exercise. Prove the Theorem in case A (non negative, irreducible) is 4 × 4, and there exists X non singular for which

X −1 AX =

 r

λ 2 1 λ 2 1

λ 2

 ,

with |λ 2 | smaller than r. Prove that in such case the rate of convergence is

|p 2 (k)|

λ 2

r

 k

, where p 2 is a degree-two polynomial.

Exercise. Set

A =

0 0 1

1

4 0 0

3

4 1 0

 .

Prove that the eigenvalues of A are {1, − 1 2 , − 1 2 } and that A is not diagonalizable.

Find X = [z x 2 x 3 ] such that

X −1 AX =

1 0 0

0 − 1 2 1 0 0 − 1 2

 .

The only hypothesis A irreducible, non-negative, does not assure that r = ρ(A) is the unique eigenvalue of A whose absolute value is equal to r. We only know that if |λ j | = r, j ∈ {2, . . . , n}, then λ j 6= r. Let us see examples:

A =

 0 1 1 0



; Eigenvalues: − 1, 1; Perron-pair: (1,

 1

2 1 2

 ).

A =

0 a 0

1 0 1

0 1 − a 0

 , a ∈ (0, 1); Eig: − 1, 0, 1; Perron-pair: (1,

a 2 1 1−a 2

2

).

A =

2 1 1 1 2 1 1 1 2

 ; Eigenvalues: 1, 1, 4; Perron-pair: (4,

 1/3 1/3 1/3

).

Note that if in the second example A is replaced by A T , then there is no need of computation in obtaining the perron-pair, since it is clear that A T e = e, e = [1 1 1] T . Moreover, in the third example r = ρ(A) (which is 4) dominates the remaining eigevalues of A (which are 1, 1). We note that this fact is true for any positive matrix A (see Theorem(positive) below), even if, as the following example shows, it is not a peculiarity of positive matrices:

A =

2 1 0 1 2 1 0 1 2

 ; Eig: 2 ∓ √

2, 2; Perron-pair: (2 + √ 2, 1

2 + √ 2

√ 1 2 1

)

(8)

(here computation is required to obtain the Perron-pair).

Theorem(positive). If A is a n × n positive matrix and r = ρ(A), λ 2 , . . . , λ n are its eigenvalues, then |λ j | is smaller than r for all j = 2, . . . , n.

proof: Set W = A − sze T . The eigenvalues of W are r − s, λ 2 , . . . , λ n (a sketch of the proof:

Y := 

z y

2

· · · y

n

 non singular, Y

−1

AY =

 r · · ·

0 M



, Y

−1

W Y =

 r − s · · ·

0 M

 ).

If there is a value of s for which |W | ≤ A, |W | 6= A, then ρ(W ) is smaller than ρ(A), and thus |λ 2 |, . . . , |λ n | are smaller than r = ρ(A). If A is positive, then such s exists, s = min i,j a ij . 

Irreducible non-negative stochastic-by-columns A and power method

Let A be an irreducible non-negative stochastic by columns n × n matrix. Then we know that 1 = ρ(A) (A T e = e, ρ(A) ≤ kAk 1 = 1), that r = 1 = ρ(A) is a simple eigenvalue of A, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its measure in the 1-norm is one.

So, it is uniquely defined the Perron-pair (1 = ρ(A), z). such that Az = z, z positive, kzk 1 = 1. The computation of such Perron-pair, i.e. the computation of the vector z, can be performed via the power method. Actually, we now see that the power method in the particular case where A is stochastic-by-columns (besides non-negative and irreducible) can be rewritten in a simpler form and converges independently from the choice of a 0 (provided that 1 dominates the other eigenvalues). These results follow from some remarks, reported in the following Proposition.

Proposition. Let A be an irreducible non-negative stochastic-by-columns n × n matrix. Then

i)

v ∈ C n ⇒ X

i

(Av) i = X

i

v i

( P

i (Av) i = P

i

P

j a ij v j = P

j v j P

i a ij = P

j v j ), ii)

v positive, kvk 1 = 1 ⇒ Av positive, kAvk 1 = 1 (use the irreducibility of A and assertion i)),

iii)

Av = λv, λ 6= 1, ⇒ X

i

v i = 0 ( P

i v i = P

i (Av) i = λ P

i v i , thus (λ − 1) P

i v i = 0).

Exercise. Assume that X −1 AX = J where J is the Jordan block diagonal form of A, where A is an irreducible non-negative stochastic-by-columns n×n matrix.

Assume that [J] 11 = 1. Prove that P

i [X] ij = 0, j = 2, . . . , n.

Corollary(power). Let A be an irreducible non-negative stochastic-by-columns n × n matrix. Let a 0 be any positive vector such that ka 0 k 1 = 1. Then set

a k+1 = Aa k , k = 0, 1, 2, . . . .

(9)

The sequence {a k } is a well defined sequence of positive vectors, and ka k k 1 = 1 = ρ(A), ∀ k. If k → +∞, then

a k − z → 0,

provided that |λ j | is smaller than 1 for all j = 2, . . . , n. The latter event is assured by Theorem(positive) when A is positive. In the particular case where A is diagonalizable, the rate of convergence is

( max

j=2...n |λ j |) k

[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].

proof: The sequences a k and v k defined in Theorem(power) coincide, because, by Proposition ii), we have kAv k k 1 = kv k k 1 . It remains to show that α in the expression a 0 = αz + P

j α j x j is nonzero. Note that e T a 0 = αe T z + P

j α j e T x j = α + P

j α j e T x j . In the case that A is diagonalizable the thesis follows by Proposition iii) which asserts that e T x j , j = 2, . . . , n, must be zero.

In the generic case the proof is left to the reader . . .. 

Exercise. Discuss existence, unicity, and computation of the Perron-pair for

A =

0 1 − a a

a 0 1 − a

1 − a a 0

 , a ∈ [0, 1], A =

0 0 1

a 0 0

1 − a 1 0

 , a ∈ [0, 1],

A =

0 b 1

1 0 b 2

1 − b 1 0 . ..

1 − b 2 . .. b n−2

. .. 0 1

1 − b n−2 0

, b i ∈ (0, 1).

Page-rank computation [Berkhin, et al]

Consider an oriented graph with set of verteces V = {1, 2, . . . , n}, and set of edges E, i, j ∈ E if there is a link from i to j.

Associate to such graph the adiacency matrix:

L =

 1 ij ∈ E 0 ij / ∈ E

Call deg (i) the number of edges starting from i. Of course, deg (i) = 0 (no edge starts from i) iff L ij = 0, ∀ j. Moreover, deg (i) = P

j L ij . Associate to the graph the transition matrix:

P =

( 1

deg (i) ij ∈ E

0 ij / ∈ E

(10)

Note that P ij = L ij / deg (i) if i is such that deg (i) > 0, and P ij = L ij = 0 otherwise. Moreover, P

j P ij = 1 if deg (i) > 0 and P

j P ij = 0 otherwise. So, the matrix P is a non negative matrix quasi-stochastic by rows.

Remark. Row i of P is null iff no edge starts from i; column j of P is null iff no edge points to j.

Let p ∈ R n be the vector whose entry j, p j , is the importance (authority) of the vertex j. Then

p j = X

i: i→j

p i

deg (i) =

n

X

i=1

P ij p i =

n

X

i=1

P ji T p i = (P T p) j , p = P T p

(note that such fixed point p may not exist, or, if exists, may be not unique or with zero entries; see below).

Let p (k+1) j be the probability that at step k + 1 of my visit of the graph (navigation on the web) I am on the vertex (page) j. Then

p (k+1) j = P

i: i→j p

(k)i

deg (i) = P n

i=1 P ij p (k) i

= P n

i=1 P ji T p (k) i = (P T p (k) ) j , p (k+1) = P T p (k)

(note that such sequence of vector probabilities exists and is uniquely defined, once p (0) is given, but the p (k) may loss the possible ddp property of p (0) ; see below). [A vector w is said ddp (discrete distribution of probability) if w is positive and kwk 1 = 1].

Note that it is natural to require that: p (k) i > 0 (at step k there is a prob- ability that I am on vertex i), P

j p (k) j = 1 (at step k I am on some vertex);

p i > 0 (any vertex has a portion of importance . . .), P

i p i = 1 (. . . of the total 1).

So, the following facts must be true:

a) p such that p = P T p , p > 0, kpk 1 = 1, exists and is uniquely defined, b) the method p (0) = ddp, p (k+1) = P T p (k) converges to p.

We now show that in order to have a) and b), the matrix P T must be both stochastic by columns and irreducible.

Theorem(stoc). If the above facts a) and b) are true, then P T must be stochastic by columns.

proof: We know that ∃ ! p such that p = P T p , p > 0, kpk 1 = 1. Assume that P T is quasi-stochastic but not stochastic by columns. Then

kP T p k 1 = P

i (P T p) i = P

i

P

j (P T ) ij p j = P

j

P

i P ji p j

= P

j: deg (j)>0 1 · p j + P

j: deg (j)=0 0 · p j < P

j p j = kpk 1 , analogously,

kp (k+1) k 1 = kP T p (k) k 1 ≤ kp (k) k 1 ≤ kp (1) k 1 < kp (0) k 1 = 1.

Thus p cannot be equal to P T p, and p (k) cannot converge to a ddp. 

Theorem(irred). If the above facts a) and b) are true, then P T must be irre-

ducible.

(11)

proof: assume P T reducible. Then there exists a permutation matrix Q such that

Q T P T Q =

 A 11 A 12

0 A 22



, (∗)

with A 11 and A 22 at least 1×1 square matrices. Note that Q T P T Q is stochastic by columns, like P T . We know that ∃ ! p such that p = P T p , p > 0, kpk 1 = 1, but this is equivalent to say that ∃ ! Q T p such that

Q T p =

 A 11 A 12

0 A 22



Q T p , Q T p > 0, kQ T p k 1 = 1. (∗∗) Case 1: assume A 12 = 0. Then A 11 and A 22 are non negative stochastic by columns matrices. We can assume they are also irreducible (why?). Then, by the Perron-Frobenius theory,

∃ ! y 1 , y 2 > 0, ky 1 k 1 = ky 2 k 1 = 1, y 1 = A 11 y 1 , y 2 = A 22 y 2 . and, as a consequence, for all α ∈ (0, 1) we have

 αy 1

(1 − α)y 2



=

 A 11 0 0 A 22

  αy 1

(1 − α)y 2

 ,

 αy 1

(1 − α)y 2



> 0, k

 αy 1

(1 − α)y 2

 k 1 = 1.

So, all vectors p such that Q T p =

 αy 1

(1 − α)y 2



satisfy the properties p > 0, kpk 1 = 1, p = P T p, which is against the hypothesis of unicity.

Case 2: assume A 12 6= 0. Then A 22 in (*) is not stochastic by columns, thus A 22 may have no eigenvalue equal to 1, i.e. the equations in (**) involving A 22

may be verified only if part of p is null, against the hypothesis of positiveness of p. 

Viceversa, we know that if the non negative matrix P T is irreducible and stochastic by columns, then 1 = ρ(P T ) is a simple eigenvalue of P T and there exists a unique vector p such that p = P T p , p > 0, kpk 1 = 1. We also know that such hypotheses are not sufficient to assure the convergence (to p) of the sequence p (k+1) = P T p (k) , p (0) > 0, kp 0 k 1 = 1, or, equivalently, to assure that the remaining eigenvalues of P T have absolute value smaller than 1. We can only say that the sequence {p (k) } is a well defined sequence of ddp.

We have to modify P (the graph) so to make well posed (∃ !) the mathemat- ical problem and to make convergent the algorithm for solving it.

Make P T stochastic:

P 0 = P + dv T , d =

δ deg (1),0

.. . δ deg (n),0

 , v =

1 n .. .

1 n

i.e, where P has null rows P 0 has the row vector v T . The vertex i with deg (i) =

0 now links to all verteces of the graph. [We discuss the uniform case, but what

follows can be repeated for the more general case v =ddp].

(12)

Observe that (P 0 ) T is stochastic by columns, (P 0 ) T ≥ 0, thus 1 is eigenvalue of (P 0 ) T and the other eigenvalues of (P 0 ) T , λ 0 2 , . . . , λ 0 n , are such that |λ 0 j | ≤ 1.

Make (P 0 ) T irreducible:

P 00 = cP 0 + (1 − c)ev T , e =

 1

.. . 1

 , c ∈ (0, 1).

Since

e T i P 00 =

( cv T + (1 − c)v T = v T deg (i) = 0 c[· · · 0 deg 1 (i) 0 · · ·] + (1 − c)v T deg (i) > 0

we are assuming that a visitor of the graph can go from the vertex i to one of its neighborhoods with probability c/ deg (i) + (1 − c)/n , and with probability (1−c)/n to an arbitrary vertex of the graph. Of course the parameter c must be chosen near to 1, in order to maintain our model faithful to the way the graph (web) is visited.

Observe that (P 00 ) T is stochastic by columns and positive, therefore, in par- ticular, it is non negative and irreducible. So, we have all we need.

Theorem(page-rank). 1 = ρ((P 00 ) T ) is a simple eigenvalue of (P 00 ) T , there exists a unique vector p such that p = (P 00 ) T p , p > 0, kpk 1 = 1 (i.e. we have fact (a)), and the other eigenvalues of (P 00 ) T , λ 00 2 , . . . , λ 00 n , are such that |λ 00 j | < 1 (by Theorem(positive)). Thus, p (k+1) = (P 00 ) T p (k) , p (0) > 0, kp (0) k 1 = 1, is a sequence of ddp convergent to p and, in case P 00 is diagonalizable,

kp (k) − pk = O(( max

j=2,...,n |λ 00 j |) k )

[for the general case see the Remark in Theorem(power) of Section 1] (i.e. we have fact (b)). Moreover, for the particular choice of (P 00 ) T , the cost of each step of the power method is O(n) and is dominated by the cost of the matrix-vector multiplication P T z , and if A is diagonalizable, then the rate of convergence is

kp (k) − pk = O(c k )

[for the general case see the Remark in Theorem(power) of Section 1] (Google- search engine sets c = 0.85 [Berkhin]).

proof: We have to prove only the final assertions.

O(n) arithmetic operations are sufficient to perform each step of the power method. We have

(P 00 ) T = c(P T + vd T ) + (1 − c)ve T , and, if x ≥ 0, then

(P 00 ) T x = cP T x + γv,

γ = cd T x + (1 − c)e T x = e T x − c[e T x − d T x ] = kxk 1 − ckP T x k 1 (why the latter equality holds?). Thus, in order to compute p (k+1) from p (k) one can use the following function

y = cP T x,

γ = kxk 1 − kyk 1 ,

(P 00 ) T x = y + γv

(13)

where the dominant operation is the matrix-vector multiplication P T x. Note that each row j of P T has (on the average) a very small number of nonzero entries, i.e. exactly the number of vertices pointing to j, so P T x can be com- puted with O(n) arithmetic operations. Note also that in order to implement the above function one needs only 2n memory allocations.

Rate of convergence kp (k) − pk = O(c k ). We first prove that if e T v = 1 then p P

0

(λ) = (λ − 1)p n−1 (λ) ⇒ p P

0

+

1−cc

ev

T

(λ) = (λ − 1

c )p n−1 (λ). (∗ ∗ ∗) In fact,

S = 

e y 2 · · · y n  , det(S) 6= 0, S −1 P 0 S =

 1 u T

0 M



, p P

0

(λ) = (λ − 1)p M (λ),

S −1 (P 0 + 1−c c ev T )S =

 1 u T

0 M



+ 1−c c S −1 ev T S

=

 1 u T

0 M

 + 1−c c

1 · · · ∗ 0 · · · 0

 =

 1 c u ˜ T

0 M

 .

As a consequence of (***), if 1, λ 0 2 , . . . , λ 0 n are the eigenvalues of P 0 (re- call that |λ 0 j | ≤ 1, j = 2, . . . , n), then the eigenvalues of P 0 + 1−c c ev T are

1

c , λ 0 2 , . . . , λ 0 n , and thus the eigenvalues of cP 0 + (1 − c)ev T are 1, cλ 0 2 , . . . , cλ 0 n . It follows that if A is diagonalizable, then kp (k) −pk = O((max j=2,...,n |cλ 0 j |) k ) = O(c k ). 

Why to compute p? [Berkhin].

QUERY: Berkhin survey.

Go in the inverted terms document file, which is a table containing a row for each term of a collection’s dictionary. In such file, for each term there is a list of all documents that contain such term

.. .

term → LISTA

term

= {i

1

, i

2

, . . . , i

k

} ⊂ {1, 2, . . . , n}

.. .

Berkhin → LISTA

Berkhin

= {1, 4, 6}

.. .

survey → LISTA

survey

= {1, 3}

.. .

Define the set of relevance of the query

term ∈ QUERY LISTA term = {1, 3, 4, 6}

Reading p (p is updated once each month) considers and orders the correspond- ing set of authorities {p 1 , p 3 , p 4 , p 6 }, for example p 4 ≥ p 6 ≥ p 3 ≥ p 1

Finally, show the titles of the documents 1, 3, 4, 6 in the order 4, 6, 3, 1, from the

one with greatest authority to the one with smallest authority.

(14)

Main criticism: This procedure, being independent from the query allows a fast answer, but does not make distinction between pages with authority from pages with authority on a specific subject.

Exercise. Draw the graph whose transition matrix is

P =

0 1/2 1/2 0 0 0

0 0 0 0 0 0

1/3 1/3 0 0 1/3 0

0 0 0 0 1/2 1/2

0 0 0 1/2 0 1/2

0 0 0 1 0 0

 .

Note that P is non negative, reducible and quasi-stochastic, but not stochastic, by rows. Prove that there is no positive vector p such that p = P T p. Starting from P and proceeding as indicated in the theory, introduce a non negative matrix P 0 stochastic by rows. Note that P 0 is reducible, like P . Prove that there is no positive vector p such that p = (P 0 ) T p . Starting from P 0 and proceeding as indicated in the theory, introduce a non negative matrix P 00 irreducible and stochastic by rows. Prove that there exists a unique positive vector p such that p = (P 00 ) T p , kpk 1 = 1, and describe an algorithm for the computation of p.

Here below is an approximation of such vector p:

p T = [0.03721 0.05396 0.04151 0.3751 0.206 0.2862].

Assume, for example, that the set of relevance of a query is {1, 3, 4, 6}. Then the documents 1, 3, 4, 6 are listed in the order 4, 6, 3, 1, being p 4 ≥ p 6 ≥ p 3 ≥ p 1 .

APPENDIX

Proof (Theorem(inverse power)):

Assume A diagonalizable, Ax j = λ j x j , {x j } n j=1 linearly independent. Then Ax j − λ i x j = (λ j − λ i )x j ⇒ (A − λ i I) −m x j = 1

(λ j − λ i ) m x j . Call α j the numbers for which v 0 = P

j: λ

j

i

α j x j + P

j: λ

j

6=λ

i

α j x j . Note that the first sum is a non null vector (by the assumption on v 0 ). Then the following equality holds

(λ i − λ i ) m (A − λ i I) −m v 0 = X

j: λ

j

i

α j x j + X

j: λ

j

6=λ

i

α j

 λ i − λ i

λ j − λ i

 m

x j

which, for m → +∞, yields the thesis.

Assume that A is not diagonalizable. In this case call x j , j = 1, . . . , n, the

columns of the non singular matrix X for which X −1 AX = J where J is the

block diagonal Jordan form of A. Assume that the upper-left diagonal block of

J is diagonal and its diagonal contains all λ j such that λ j = λ := λ i . Assume

that there are t of such eigenvalues. Then consider any other diagonal block

of J; of course it corresponds to an eigenvalue λ j different from λ, call such

(15)

eigenvalue µ. The block under consideration has an order, say s = s µ .

J =

 λ

λ . . .

λ

µ 1

µ . . . . . . 1

µ

 .

Restrict the matrix equation X −1 AX = J to this block, thus, for some r ≥ t we have

A[x

r+1

x

r+2

x

r+3

· · · x

r+s

] = [x

r+1

x

r+2

x

r+3

· · · x

r+s

]

µ 1

µ . . . . . . 1

µ

 .

It follows that

Ax r+1 = µx r+1 , (A − λ I)x r+1 = (µ − λ )x r+1 ,

(A − λ I) −1 x r+1 = µ−λ 1

x r+1 , (A − λ I) −m x r+1 = (µ−λ 1

)

m

x r+1 . Ax r+2 = µx r+2 + x r+1 , (A − λ I)x r+2 = (µ − λ )x r+2 + x r+1 , (A − λ I) −1 x r+2 = µ−λ 1

x r+2(µ−λ 1

)

2

x r+1 ,

(A − λ I) −m x r+2 = (µ−λ 1

)

m

x r+2 − (µ−λ m

)

m+1

x r+1 .

Ax r+3 = µx r+3 + x r+2 , (A − λ I)x r+3 = (µ − λ )x r+3 + x r+2 , (A − λ I) −1 x r+3 = µ−λ 1

x r+3(µ−λ 1

)

2

x r+2 + (µ−λ 1

)

3

x r+1 , (A − λ I) −m x r+3 = (µ−λ 1

)

m

x r+3 − (µ−λ m

)

m+1

x r+2 +

12

(m

2

+m) (µ−λ

)

m+2

x r+1 . Ax r+4 = µx r+4 + x r+3 , (A − λ I)x r+4 = (µ − λ )x r+4 + x r+3 , (A − λ I) −1 x r+4 = µ−λ 1

x r+4(µ−λ 1

)

2

x r+3

+ (µ−λ 1

)

3

x r+2 − (µ−λ 1

)

4

x r+1 ,

(A − λ I) −m x r+4 = (µ−λ 1

)

m

x r+4(µ−λ m

)

m+1

x r+3 + (µ−λ

12

(m

2

+m) )

m+2

x r+2

16

m (µ−λ

3

+

12

m )

m+32

+

13

m x r+1 ,

(A − λ I) −m x r+s = 1

(µ − λ ) m x r+s − . . . + (−1) s−1 q s−1 (m)

(µ − λ ) m+s−1 x r+1 , (A − λ I) −m [x r+1 x r+2 x r+3 · · ·]

= [x r+1 x r+2 x r+3 · · ·]

1

(µ−λ

)

m

(µ−λ m

)

m+1

(µ−λ

12

(m

2

+m) )

m+2

· · · 0 (µ−λ 1

)

m

(µ−λ m

)

m+1

0 0 (µ−λ 1

)

m

.. . . ..

.

(16)

Set also λ := λ i i is the given approximation of λ = λ i ). Choose v 0 , and call α j the numbers for which v 0 = P

j: λ

j

=λ α j x j + { . . . + P r+s

j=r+1 α j x j + . . . }.

Note that P

j: λ

j

=λ α j x j = P t

j=1 α j x j . Then (A − λ I) −m v 0

= P

j: λ

j

=λ α j (A − λ I) −m x j + { · · · + P r+s

j=r+1 α j (A − λ I) −m x j + · · · }

= P

j: λ

j

=λ α j 1

(λ−λ

)

m

x j + { · · · + [ α r+1 ( (µ−λ 1

)

m

x r+1 ) +α r+2 ( (µ−λ 1

)

m

x r+2 − (µ−λ m

)

m+1

x r+1 )

+α r+3 ( (µ−λ 1

)

m

x r+3(µ−λ m

)

m+1

x r+2 + (µ−λ

12

(m

2

+m) )

m+2

x r+1 ) +α r+4 ( (µ−λ 1

)

m

x r+4 − (µ−λ m

)

m+1

x r+3

+ (µ−λ

12

(m

2

+m) )

m+2

x r+2

16

m (µ−λ

3

+

12

m )

m+32

+

13

m x r+1 )

+ . . . + α r+s ( (µ−λ 1

)

m

x r+s − . . . + (−1) s−1 (µ−λ q

s−1

) (m)

m+s−1

x r+1 ) ] + · · · }.

But this implies

(λ − λ ) m (A − λ I) −m v 0

= P

j: λ

j

=λ α j x j + { . . . + 

λ−λ

µ−λ

 m

h α r+1 − α r+2 µ−λ m

+ α r+3

1 2

(m

2

+m)

(µ−λ

)

2

. . . + (−1) s−1 α r+s q

s−1

(m) (µ−λ

)

s−1

x r+1

+ α r+2 − α r+3 m

µ−λ

+ α r+4

1 2

(m

2

+m)

(µ−λ

)

2

. . . + (−1) s−2 α r+s q

s−2

(m) (µ−λ

)

s−2

x r+2

+ . . . + α r+s x r+s

i + · · · },

from which it is clear that the sequence (λ−λ ) m (A−λ I) −m v 0 converges to an eigenvector of A associated with λ. The assertions about the rate of convergence follow by setting

p s−1 (λ) = α r+1 − α r+2 µ−λ m

+ α r+3

1 2

(m

2

+m)

(µ−λ

)

2

−α r+4

1

6

m

3

+

12

m

2

+

13

m

(µ−λ

)

3

. . . + (−1) s−1 α r+s q

s−1

(m) (µ−λ

)

s−1

. Proof (Theorem(power)):

Assume that A is diagonalizable, so Ax j = λ j x j , {x j } n j=1 linearly independent.

Choose v 0 , and call α j so that v 0 = P

j: λ

j

1

α j x j + P

j: λ

j

6=λ

1

α j x j . Note that the first sum is non null by assumption. Then we have the equalities:

A m v 0 = λ m 1 P

j: λ

j

1

α j x j + P

j: λ

j

6=λ

1

α j λ m j x j ,

1

λ

m1

A m v 0 = P

j: λ

j

1

α j x j + P

j: λ

j

6=λ

1

α j λ

j

λ

1

 m

x j . The thesis follows letting m go to infinite.

Assume that A is not diagonalizable. Set µ := λ j , where λ j 6= λ 1 , and restrict, as above, the Jordan matrix equation X −1 AX = J to a diagonal Jordan block of order s associated with µ. Then

Ax r+1 = µx r+1 , A m x r+1 = µ m x r+1 ,

Ax r+2 = µx r+2 + x r+1 , A m x r+2 = µ m x r+2 + mµ m−1 x r+1 , Ax r+3 = µx r+3 + x r+2 ,

A m x r+3 = µ m x r+3 + mµ m−1 x r+2 + 1 2 (m 2 − m)µ m−2 x r+1 ,

(17)

Ax r+4 = µx r+4 + x r+3 ,

A m x r+4 = µ m x r+4 + mµ m−1 x r+3 + 1 2 (m 2 − m)µ m−2 x r+2 +( 1 6 m 31 2 m 2 + 1 3 m)µ m−3 x r+1 ,

and so on. Set λ := λ 1 and t = m a (λ 1 ) = m g (λ 1 ), so we can assume that the upper-left t × t submatrix of J is diagonal with diagonal entries all equal to λ.

Then

A m v 0 = A m P t

j=1 α j x j + {. . . + P r+s

j=r+1 α j x j + . . .} 

= P t

j=1 α j λ m x j + {. . . + P r+s

j=r+1 α j A m x j + . . .}

= λ m P t

j=1 α j x j + {. . . + [ α r+1 A m x r+1 + α r+2 A m x r+2 +α r+3 A m x r+3 + α r+4 A m x r+4 + . . . + α r+s A m x r+s ] + . . .}

= λ m P t j=1 α j x j

+{. . . + [ α r+1m x r+1 ) + α r+2 (µ m x r+2 + mµ m−1 x r+1 ) +α r+3 (µ m x r+3 + mµ m−1 x r+2 + 1 2 (m 2 − m)µ m−2 x r+1 ) +α r+4 (µ m x r+4 + mµ m−1 x r+3 + 1 2 (m 2 − m)µ m−2 x r+2

+( 1 6 m 31 2 m 2 + 1 3 m)µ m−3 x r+1 )

+ . . . + α r+s (µ m x r+s + . . . + q s−1 (m)µ m−s+1 x r+1 ) ] + . . .}, A m v 0 = λ m P t

j=1 α j x j

+{. . . + [ (α r+1 µ m + α r+2 mµ m−1 + α r+3 1

2 (m 2 − m)µ m−2 +α r+4 ( 1 6 m 31 2 m 2 + 1 3 m)µ m−3

+ . . . + α r+s q s−1 (m)µ m−s+1 )x r+1

+(α r+2 µ m + α r+3 mµ m−1 + α r+4 1

2 (m 2 − m)µ m−2 + . . . +α r+s q s−2 (m)µ m−s+2 )x r+2 + . . . + (α r+s µ m )x r+s ] + . . .},

1

λ

m

A m v 0 = P t

j=1 α j x j + {. . . + 

µ λ

 m

[ (α r+1 + α r+2 m µ + α r+3

1 2

(m

2

−m)

µ

2

+α r+4

1

6

m

3

12

m

2

+

13

m

µ

3

+ . . . + α r+s q

s−1

(m) µ

s−1

)x r+1

+(α r+2 + α r+3 m µ + α r+4

1 2

(m

2

−m)

µ

2

+ . . . + α r+s q

s−2

(m) µ

s−2

)x r+2

+ . . . + (α r+s )x r+s ] + . . .}.

It is clear that, as m goes to infinite, the sequence λ 1

m

A m v 0 converges to an eigenvector of A associated with λ, the dominant eigenvalue of A. Finally, the assertions on the rate of convergence follow by setting

p s−1 (m) = α r+1 + α r+2 m µ + α r+3

1 2

(m

2

−m)

µ

2

+α r+4

1

6

m

3

12

m

2

+

13

m

µ

3

+ . . . + α r+s q

s−1

(m)

µ

s−1

.

Riferimenti

Documenti correlati

This article proves a case of the p-adic Birch and Swinnerton–Dyer conjecture for Garrett p-adic L-functions of (Bertolini et al. in On p-adic analogues of the Birch and

The decay amplitudes and the production polarisation are determined from the moments using a Bayesian analysis.. The marginalisation over unwanted parameters is performed using

Thus we can have solutions with fast decay, corresponding to trajectory converging to the origin, and with slow decay, which are the ones described in the thesis.. Note that this

As previously explained, the two most signi cant parameters for the target performances are the funnel diameter D f (which regulates the velocity of the coolant in the funnel, and

Furthermore an orthonormal basis will be constructed explicitly and some interesting properties of the elements of such a basis will be investigated1. The starting point is a

Usually, this fact is proved by non-constructive techniques, althought a constructive proof, depending on the Principle of Uniform Boundedness can be found in [8]... This argument

In this paper we are interested in possible extensions of this multiplicity result to the case of a system of differential equations: we have in mind in particular the physical model

In particular it linearizes the system in the following way: if the input signal at the gate increases, a larger drain current will flow through the inductor, its voltage drop