The power method, Perron-Frobenius theory, Page-Rank computation

Preliminary considerations, $A \in \mathbb{C}^{n \times n}$

Given an $n \times n$ matrix $A$ with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, the following result holds.
Theorem (inverse power). If $\lambda_i^*$ is an approximation of the eigenvalue $\lambda_i$ of an $n \times n$ matrix $A$, i.e. $|\lambda_i - \lambda_i^*|$ is smaller than $|\lambda_j - \lambda_i^*|$ for all $\lambda_j \ne \lambda_i$, if $m_a(\lambda_i) = m_g(\lambda_i)$, and if $v_0 \in \mathbb{C}^n$ is not orthogonal to the space spanned by the eigenvectors of $A$ corresponding to $\lambda_i$, then the sequence $\{v_k\}$ generated by the algorithm
$$(A - \lambda_i^* I)a_k = v_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
converges to an eigenvector of $A$ corresponding to the eigenvalue $\lambda_i$. If $A$ is diagonalizable, then the rate of convergence is
$$O\left(\left(\max_{j:\,\lambda_j \ne \lambda_i}\left|\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right|\right)^{k}\right).$$
Remark. In the general case the rate of convergence is
$$O\left(\max_{j:\,\lambda_j \ne \lambda_i}\ \max_{s_{\lambda_j}}\ |p_{s_{\lambda_j}-1}(k)|\left|\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right|^{k}\right),$$
where, given the block diagonal Jordan form of $A$, $J = X^{-1}AX$, for each $\lambda_j \ne \lambda_i$ the number $s_{\lambda_j}$ indicates the dimension of the generic Jordan block associated with $\lambda_j$, and $p_{s_{\lambda_j}-1}(k)$ is a polynomial of degree $s_{\lambda_j}-1$ whose coefficients depend on $\frac{1}{(\lambda_j - \lambda_i^*)^r}$, $r = 1, \dots, s_{\lambda_j}-1$, and on the coefficients of $v_0$ with respect to the column vectors of $X$ corresponding to the Jordan block under consideration.
proof: See the Appendix.
Let $\lambda_1$ be such that $|\lambda_1| = \rho(A)$, and assume that all $\lambda_i$ such that $|\lambda_i| = \rho(A)$ are equal to $\lambda_1$ (in such case we say that $\lambda_1$ dominates the eigenvalues of $A$). Assume, moreover, that the algebraic and geometric multiplicity of $\lambda_1$ are equal. Then the power method (see the Theorem below) can be used to compute $\lambda_1$ and an eigenvector corresponding to $\lambda_1$.
Theorem (power). If $\lambda_1$ dominates the eigenvalues of an $n \times n$ matrix $A$, if $m_a(\lambda_1) = m_g(\lambda_1)$, and if $v_0 \in \mathbb{C}^n$ is not orthogonal to the space spanned by the eigenvectors of $A$ corresponding to $\lambda_1$, then the sequence $\{v_k\}$ generated by the algorithm
$$a_k = Av_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
converges to an eigenvector of $A$ corresponding to the eigenvalue $\lambda_1$. Moreover,
$$\frac{u^H v_{k+1}}{u^H v_k} \to \lambda_1, \qquad k \to +\infty,$$
for any $u$ for which $u^H v_k \ne 0$. If $A$ is diagonalizable, then the rate of convergence is
$$O\left(\left(\max_{j:\,\lambda_j \ne \lambda_1}\left|\frac{\lambda_j}{\lambda_1}\right|\right)^{k}\right).$$
Remark. In the general case the rate of convergence is
$$O\left(\max_{j:\,\lambda_j \ne \lambda_1}\ \max_{s_{\lambda_j}}\ |p_{s_{\lambda_j}-1}(k)|\left|\frac{\lambda_j}{\lambda_1}\right|^{k}\right),$$
where, given the block diagonal Jordan form of $A$, $J = X^{-1}AX$, for each $\lambda_j \ne \lambda_1$ the number $s_{\lambda_j}$ indicates the dimension of the generic Jordan block associated with $\lambda_j$, and $p_{s_{\lambda_j}-1}(k)$ is a polynomial of degree $s_{\lambda_j}-1$ whose coefficients depend on $\frac{1}{\lambda_j^r}$, $r = 1, \dots, s_{\lambda_j}-1$, and on the coefficients of $v_0$ with respect to the column vectors of $X$ corresponding to the Jordan block under consideration.
proof: See the Appendix.
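Both iterations are straightforward to implement. The following sketch (not part of the original notes; Python with NumPy, under the theorems' hypotheses) realizes the two algorithms; the vector u used in the eigenvalue estimate is an arbitrary choice, here random, and the fixed iteration counts stand in for a proper stopping criterion.

\begin{verbatim}
import numpy as np

def power_method(A, v0, steps=200):
    # a_{k+1} = A v_k,  v_{k+1} = a_{k+1} / ||a_{k+1}||
    v = v0 / np.linalg.norm(v0)
    u = np.random.rand(A.shape[0])   # any u with u^H v_k != 0
    lam = None
    for _ in range(steps):
        a = A @ v
        lam = (u @ a) / (u @ v)      # u^H a_{k+1} / u^H v_k -> lambda_1
        v = a / np.linalg.norm(a)
    return lam, v

def inverse_power_method(A, shift, v0, steps=50):
    # (A - shift I) a_k = v_{k-1},  v_k = a_k / ||a_k||
    n = A.shape[0]
    v = v0 / np.linalg.norm(v0)
    for _ in range(steps):
        a = np.linalg.solve(A - shift * np.eye(n), v)
        v = a / np.linalg.norm(a)
    return v
\end{verbatim}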
For our purposes it is useful to recall also the following classic result on matrix deflation: how to introduce a matrix whose eigenvalues are equal to those of $A$ except one, which, instead, is replaced by zero.
Theorem. Let $A$ be an $n \times n$ matrix. Let $\lambda_1$ be a nonzero eigenvalue of $A$ and $y_1$ a corresponding eigenvector, i.e. $Ay_1 = \lambda_1 y_1$. Call $\lambda_2, \lambda_3, \dots, \lambda_n$ the remaining eigenvalues of $A$. Then the matrix
$$W = A - \frac{\lambda_1}{w^* y_1}\,y_1 w^*, \qquad w^* y_1 \ne 0,$$
has eigenvalues $0, \lambda_2, \lambda_3, \dots, \lambda_n$.
proof. Introduce $S = [y_1\ z_2\ \cdots\ z_n]$ non singular, and observe that $p_A(\lambda) = p_{S^{-1}AS}(\lambda) = (\lambda - \lambda_1)q(\lambda)$, $p_W(\lambda) = p_{S^{-1}WS}(\lambda) = \lambda q(\lambda)$.
Let $G$ be the following matrix:
$$G = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 3/4 & 1/8 & 1/8 \\ 11/16 & 1/4 & 1/16 \end{pmatrix}.$$
Since $\rho(G) \le \|G\|_\infty = 1$, we can say that the spectrum of $G$ lies in the circle $\{z \in \mathbb{C} : |z| \le 1\}$. Note that $Ge = 1 \cdot e$, i.e. one eigenvalue of $G$, $\lambda_1 = 1$, and its corresponding eigenvector, $y_1 = e = [1\ 1\ \cdots\ 1]^T$, are known. If $\lambda_2, \lambda_3$ denote the remaining eigenvalues of $G$, then we can define a matrix $W$, in terms of $G$, $\lambda_1$, $y_1$, whose eigenvalues are $0, \lambda_2, \lambda_3$:
$$W = G - \frac{\lambda_1}{w^* y_1}\,y_1 w^* = G - \frac{1}{w^* e}\,e w^*, \qquad \forall\ w,\ w^* e \ne 0.$$
Since $(e_i^T G)e = e_i^T(1 \cdot e) = 1 \ne 0$, we can choose $w^* = e_i^T G$:
$$W = G - e\,(e_i^T G).$$
For $i = 1$ the matrix $W$ becomes
$$W = \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 3/4 & 1/8 & 1/8 \\ 11/16 & 1/4 & 1/16 \end{pmatrix} - \begin{pmatrix} 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \\ 1/4 & 1/4 & 1/2 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 1/2 & -1/8 & -3/8 \\ 7/16 & 0 & -7/16 \end{pmatrix},$$
so the eigenvalues of $G$ different from 1 are the eigenvalues of
$$\widetilde W = \begin{pmatrix} -1/8 & -3/8 \\ 0 & -7/16 \end{pmatrix},$$
i.e. $-1/8$ and $-7/16$. Thus $1$, $-1/8$ and $-7/16$ are the eigenvalues of $G$, and also, of course, of $G^T$.
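This computation is easy to check numerically; a minimal sketch (assuming NumPy) that builds $W = G - e(e_1^T G)$ and compares the two spectra:

\begin{verbatim}
import numpy as np

G = np.array([[1/4,   1/4, 1/2 ],
              [3/4,   1/8, 1/8 ],
              [11/16, 1/4, 1/16]])
e = np.ones(3)

w = G[0, :]                  # w^* = e_1^T G, so w^* e = 1
W = G - np.outer(e, w)       # deflation: eigenvalues 0, lambda_2, lambda_3
print(np.linalg.eigvals(G))  # approx {1, -1/8, -7/16}
print(np.linalg.eigvals(W))  # approx {0, -1/8, -7/16}
\end{verbatim}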
In particular, 1 is an eigenvalue of $G^T$, but note that the eigenvector $p$ of $G^T$ corresponding to 1 is not obvious; it must be computed. For example, it can be computed as the limit of the inverse power sequence $\{v_k\}$ defined as follows ($\varepsilon$ a small positive number):
$$v_0 \in \mathbb{R}^3, \qquad (G^T - (1+\varepsilon)I)a_k = v_{k-1}, \qquad v_k = \frac{a_k}{\|a_k\|}, \qquad k = 1, 2, \dots$$
(rate of convergence: $O\left(\left|\frac{1-(1+\varepsilon)}{-\frac18-(1+\varepsilon)}\right|^k\right)$; see Theorem (inverse power) above). We shall see that the vector $p$ can also be obtained as the limit of the sequence
$$p_0 \in \mathbb{R}^3, \quad p_0 \text{ positive}, \quad \|p_0\|_1 = 1, \qquad p_{k+1} = G^T p_k, \quad k = 0, 1, 2, \dots$$
(rate of convergence: $O\left(\left|\frac{-7/16}{1}\right|^k\right)$) [this result is in fact a particular case of the Theorem at the beginning of this section (set $\|\cdot\| = \|\cdot\|_1$, $u = e$)]. Even if the rate of convergence of the $p_k$ is not as good as the rate of convergence of the $v_k$, the computation of $p_{k+1}$ from $p_k$ is much cheaper than the computation of $v_{k+1}$ from $v_k$. In fact, for analogous problems of high dimension, even one step of the inverse power iterations is prohibitive.
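The two sequences can be compared with a short sketch (illustrative shift and iteration counts; since $1-(1+\varepsilon) < 0$, the inverse power iterates alternate in sign, so the sign is fixed at the end):

\begin{verbatim}
import numpy as np

G = np.array([[1/4,   1/4, 1/2 ],
              [3/4,   1/8, 1/8 ],
              [11/16, 1/4, 1/16]])
n, eps = 3, 1e-3

v = np.ones(n) / n                    # inverse power on G^T, shift 1 + eps
for _ in range(20):
    a = np.linalg.solve(G.T - (1 + eps) * np.eye(n), v)
    v = a / np.linalg.norm(a, 1)
p_inv = np.abs(v)                     # eigenvector defined up to a scalar

p = np.ones(n) / n                    # power sequence: p_{k+1} = G^T p_k
for _ in range(200):
    p = G.T @ p
print(p_inv, p)                       # both approximate the Perron vector
\end{verbatim}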
The Perron-Frobenius theory: $A \in \mathbb{R}^{n \times n}$, $A \ge 0$ irreducible [Varga]
Lemma. Let $A$ be an $n \times n$ non negative matrix, i.e. $a_{ij} \ge 0$ $\forall\, i, j$. Assume that $A$ is not reducible. Then $(I+A)^{n-1}$ is a positive matrix, i.e. its entries are all positive.
proof. We shall prove that the vector $(I+A)^{n-1}x$ is positive whenever $x$ is a non negative non null vector (prove that this is equivalent to the thesis!).
Let $x$ be a non negative non null vector. Set $x_0 = x$, $x_1 = (I+A)x_0 = x_0 + Ax_0$, $\dots$, $x_{k+1} = (I+A)x_k = x_k + Ax_k$, $k = 1, \dots, n-2$. Note that $x_k = (I+A)^k x$; in particular $x_{n-1} = (I+A)^{n-1}x$. So our aim is to prove that $x_{n-1}$ is a positive vector. First observe by induction that all $x_k$ are non negative vectors ($x_k$ non negative and $A$ non negative imply $Ax_k$ non negative and $x_{k+1} = x_k + Ax_k$ non negative). The thesis ($x_{n-1}$ positive) is then proved by showing that $x_{k+1}$ must have fewer zeros than $x_k$ for each $k \in \{0, \dots, n-2\}$.
Note that $x_{k+1}$ cannot have more zeros than $x_k$, since $Ax_k$, in the definition $x_k + Ax_k$ of $x_{k+1}$, is non negative. Assume $x_{k+1}$ has the same number of zero entries as $x_k$. Of course such zeros must be in the same places, i.e. there exists a permutation matrix $P$ such that
$$Px_k = \begin{pmatrix} \alpha \\ 0 \end{pmatrix}, \qquad Px_{k+1} = \begin{pmatrix} \beta \\ 0 \end{pmatrix},$$
with $\alpha, \beta$ positive vectors of the same dimension $m$, $1 \le m \le n-1$ (why such bounds for $m$?). Thus $Px_{k+1} = Px_k + PAx_k = Px_k + PAP^T Px_k$. Consider the following partition of the matrix $PAP^T$:
$$PAP^T = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix},$$
where $M_{11}$ is $m \times m$. Then
$$\begin{pmatrix} \beta \\ 0 \end{pmatrix} = \begin{pmatrix} \alpha \\ 0 \end{pmatrix} + \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}\begin{pmatrix} \alpha \\ 0 \end{pmatrix},$$
and, in particular, $M_{21}\alpha = 0$. The latter condition implies $M_{21} = 0$, since $\alpha$ is a positive vector and $M_{21}$ a non negative matrix. But this is equivalent to saying that $A$ is reducible, against the hypothesis!
Example. Set
$$I + A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ a & 1 & 0 & 0 \\ 0 & b & 1 & 0 \\ 0 & 0 & c & 1 \end{pmatrix}, \qquad a, b, c \text{ positive}.$$
Note that A is a non-negative irreducible matrix, and in fact (I + A) 3 is a positive matrix, i.e. its entries are positive. Moreover, 3 is the minimum j for which (I + A) j is positive.
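A quick numerical check of this example, with the illustrative choice $a = b = c = 1$:

\begin{verbatim}
import numpy as np

a = b = c = 1.0
IA = np.array([[1, 0, 0, 1],
               [a, 1, 0, 0],
               [0, b, 1, 0],
               [0, 0, c, 1]], dtype=float)
M3 = np.linalg.matrix_power(IA, 3)    # (I + A)^(n-1), n = 4
M2 = np.linalg.matrix_power(IA, 2)
print((M3 > 0).all())                 # True: all entries positive
print((M2 > 0).all())                 # False: the power 2 is not enough
\end{verbatim}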
Let $A$ be an $n \times n$ non negative irreducible matrix. Let $x$ be a non negative non null vector, and associate to $x$ the number
$$r_x := \min_{i:\,x_i > 0}\frac{\sum_j a_{ij}x_j}{x_i} = \min_{i:\,x_i > 0}\frac{(Ax)_i}{x_i}.$$
Proposition. $r_x$ is a non negative real number; $r_x = r_{\alpha x}$ if $\alpha > 0$; $Ax \ge r_x x$; $r_x = \sup\{\rho \in \mathbb{R} : Ax \ge \rho x\}$.
proof: easy, left to the reader.
Now associate to $A$ the following number:
$$r = \sup_{x \ge 0,\ x \ne 0} r_x = \sup_{x \ge 0,\ x \ne 0}\ \min_{i:\,x_i > 0}\frac{\sum_j a_{ij}x_j}{x_i}.$$
Proposition. $r$ is a positive real number; if $w \ge 0$, $w \ne 0$, is such that $Aw \ge rw$, then $Aw = rw$ and $w > 0$.
proof. If $e = [1\ 1\ \cdots\ 1]^T$, then $r_e = \min_{i:\,(e)_i > 0}\frac{\sum_j a_{ij}(e)_j}{(e)_i} = \min_i \sum_j a_{ij} \ge 0$. Assume $r_e = 0$. Then for some $k$ we would have $\sum_j a_{kj} = 0$, so the $k$-th row of $A$ would be null, and thus $A$ would be reducible (exchange the $k$-th and $n$-th rows and the corresponding columns), against the hypothesis. It follows that $r \ge r_e > 0$.
proof. Set $\eta = Aw - rw$. We know that $\eta \ge 0$. Assume $\eta \ne 0$. Then, by the Lemma,
$$0 < (I+A)^{n-1}\eta = (I+A)^{n-1}Aw - (I+A)^{n-1}rw = A(I+A)^{n-1}w - r(I+A)^{n-1}w = Ay - ry, \qquad y > 0,$$
i.e. $r < (Ay)_i/y_i$ $\forall\, i$. Thus $r < r_y$, which is absurd. It follows that $\eta = 0$, that is, $Aw = rw$. Then we also have $w > 0$, since $0 < (I+A)^{n-1}w = (1+r)^{n-1}w$.
In the following, given $v \in \mathbb{C}^n$ and $M \in \mathbb{C}^{n \times n}$, we denote by $|v|$ and $|M|$, respectively, the column vector $(|v_k|)_{k=1}^n$ and the matrix $(|m_{ij}|)_{i,j=1}^n$.
Theorem. There exists a positive vector $z$ for which $Az = rz$; $r = \rho(A)$; if $B \in \mathbb{C}^{n \times n}$, $|B| \le A$, then $\rho(B) \le \rho(A)$; if $B \in \mathbb{C}^{n \times n}$, $|B| \le A$, $|B| \ne A$, then $\rho(B) < \rho(A)$.
proof: We now show that there exists $z \ge 0$ such that $r = r_z$. Once this is proved we will have the inequality $Az \ge r_z z = rz$, which implies, by the Proposition, $Az = rz$ and $z > 0$.
$$r = \sup_{x \ge 0,\ x \ne 0} r_x = \sup_{x \ge 0,\ x \ne 0} r_{x/\|x\|_2} = \sup_{x \ge 0,\ \|x\|_2 = 1} r_x = (*)$$
We have $r_x \le r_{(I+A)^{n-1}x}$, since $r_{(I+A)^{n-1}x} = \sup\{\rho \in \mathbb{R} : A(I+A)^{n-1}x \ge \rho(I+A)^{n-1}x\}$ and $Ax \ge r_x x \Rightarrow A(I+A)^{n-1}x = (I+A)^{n-1}Ax \ge r_x(I+A)^{n-1}x$. Thus,
$$(*) \le \sup_{x \ge 0,\ \|x\|_2 = 1} r_{(I+A)^{n-1}x} = \sup_{y \in Q} r_y = \max_{y \in Q} r_y = r_z \le r$$
for some $z \in Q = \{(I+A)^{n-1}x : x \ge 0,\ \|x\|_2 = 1\}$ ($r_y$ is continuous in $y$ (why?) and $Q$ is compact). Note that $z > 0$.
proof: Let $\lambda$ be an eigenvalue of $A$, i.e. $\exists\, y \ne 0$ such that $Ay = \lambda y$. Then $|\lambda||y| = |\lambda y| = |Ay| \le A|y|$, $|y| \ge 0$, $|y| \ne 0$. Thus, by definition of $r_{|y|}$, $|\lambda| \le r_{|y|} \le r$, and we have the inequality $\rho(A) \le r$. But $r$ is an eigenvalue of $A$, thus $r = \rho(A)$.
proof: Let $\lambda$ be an eigenvalue of $B$, i.e. $\exists\, y \ne 0$ such that $By = \lambda y$. Then $|\lambda||y| = |\lambda y| = |By| \le |B||y| \le A|y|$, $|y| \ge 0$, $|y| \ne 0$. Thus, by definition of $r_{|y|}$, $|\lambda| \le r_{|y|} \le r$, and we have the inequality $\rho(B) \le r = \rho(A)$.
proof: Assume $\rho(B) = \rho(A)$, i.e. there exists an eigenvalue $\lambda$ of $B$ ($By = \lambda y$, $y \ne 0$) such that $|\lambda| = r$. Then we can add an equality in the above arguments,
$$r|y| = |\lambda||y| = |\lambda y| = |By| \le |B||y| \le A|y|, \qquad |y| \ge 0, \quad |y| \ne 0,$$
obtaining the inequality $r|y| \le A|y|$. But, by the above Proposition, such inequality implies $A|y| = r|y|$ with $|y| > 0$. So we have
$$r|y| = |\lambda||y| = |\lambda y| = |By| = |B||y| = A|y| = r|y|, \qquad |y| > 0,$$
from which it follows $|B| = A$, against the hypothesis! Thus $\rho(B) < \rho(A)$.
We conclude this section with the following result of the Perron-Frobenius theory, stated without proof (actually we shall give a proof of such result in the case $A$ is positive):
Result. If $A$ is a non negative irreducible $n \times n$ matrix, then $r = \rho(A)$ is a simple eigenvalue of $A$, i.e. $\lambda - r$ divides $p_A(\lambda)$ but $(\lambda - r)^2$ does not divide $p_A(\lambda)$.
Computing the Perron-pair of A irreducible non-negative by the power method

Let $A$ be an irreducible non negative $n \times n$ matrix. Then we know that $\rho(A)$ is positive and is a simple eigenvalue of $A$, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its 1-norm is one. We also know that
$$\rho(A) = r := \sup_{x \ge 0,\ x \ne 0}\ \min_{i:\,x_i > 0}\frac{(Ax)_i}{x_i}.$$
So the Perron-pair $(\rho(A), z)$, such that $Az = \rho(A)z$, $\rho(A)$ positive, $z$ positive, $\|z\|_1 = 1$, is uniquely defined. The power method is a way to compute such a Perron-pair.
Theorem(power). Let $A$ be an irreducible non negative $n \times n$ matrix. Let $a_0$ be any positive vector, and set $v_0 = a_0/\|a_0\|_1$. Then set
$$a_{k+1} = Av_k, \qquad v_{k+1} = \frac{a_{k+1}}{\|a_{k+1}\|_1}, \qquad k = 0, 1, 2, \dots.$$
Note that the sequences $\{a_k\}$, $\{v_k\}$ are well defined sequences of positive vectors, and $\|v_k\|_1 = 1$, $\forall\, k$. Let $X$ be the non singular matrix defining, by similarity, the Jordan block-diagonal form of $A$, i.e.
$$X^{-1}AX = J = \begin{pmatrix} r & 0^T \\ 0 & B \end{pmatrix}, \qquad X = \begin{pmatrix} z & x_2 & \cdots & x_n \end{pmatrix},$$
and let $r = \rho(A)$, $\lambda_j$, $j = 2, \dots, n$, be the eigenvalues of $A$ ($\lambda_j \ne r$, $\forall\, j$). If $\alpha$ in the expression $a_0 = \alpha z + \sum_{j=2}^n \alpha_j x_j$ is nonzero, then, for $k \to +\infty$, we have
$$v_k - z \to 0, \qquad \|a_k\|_1 - \rho(A) \to 0,$$
provided that $|\lambda_j|$ is smaller than $r$ for all $j = 2, \dots, n$. In the particular case where $A$ is diagonalizable, the rate of convergence is
$$\left(\max_{j=2,\dots,n}\frac{|\lambda_j|}{r}\right)^{k}$$
[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].
proof: We prove the Theorem only in the case $A$ diagonalizable, where $Ax_j = \lambda_j x_j$, $j = 2, \dots, n$. It is easy to observe that $A^k a_0 = \alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j$ is a positive vector, and that
$$v_k = \frac{\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j}{\left\|\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j\right\|_1} = \frac{\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j}{e^T\left(\alpha r^k z + \sum_{j=2}^n \alpha_j \lambda_j^k x_j\right)} = \frac{z + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}.$$
Moreover,
$$a_{k+1} = Av_k = \frac{rz + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k \lambda_j x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}$$
and, since $a_{k+1}$ is positive,
$$\|a_{k+1}\|_1 = e^T a_{k+1} = \frac{r + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k \lambda_j e^T x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j} = r\,\frac{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^{k+1} e^T x_j}{1 + \sum_{j=2}^n \frac{\alpha_j}{\alpha}\left(\frac{\lambda_j}{r}\right)^k e^T x_j}.$$
Since $|\lambda_j/r| < 1$ for $j = 2, \dots, n$, these expressions yield the thesis: $v_k \to z$ and $\|a_{k+1}\|_1 \to r$.
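A minimal sketch of this power method in code (assuming NumPy; as test matrix we take the tridiagonal example appearing later in this section, whose Perron-pair is $(2+\sqrt2,\ \frac{1}{2+\sqrt2}[1\ \sqrt2\ 1]^T)$):

\begin{verbatim}
import numpy as np

def perron_pair(A, steps=500):
    # A irreducible non negative: returns approximations of (rho(A), z)
    n = A.shape[0]
    v = np.ones(n) / n        # a_0 positive, v_0 = a_0 / ||a_0||_1
    r = None
    for _ in range(steps):
        a = A @ v             # a_{k+1} = A v_k
        r = np.sum(a)         # ||a_{k+1}||_1 = e^T a_{k+1}, a_{k+1} > 0
        v = a / r             # v_{k+1}
    return r, v

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
r, z = perron_pair(A)
print(r, 2 + np.sqrt(2))      # r -> rho(A) = 2 + sqrt(2)
print(z)                      # z -> (1, sqrt(2), 1) / (2 + sqrt(2))
\end{verbatim}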
Exercise. Discuss the convergence of the power method when applied to compute the Perron-pair $\left(1, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}\right)$ of the following two $2 \times 2$ matrices:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad A = \begin{pmatrix} 1/4 & 3/4 \\ 3/4 & 1/4 \end{pmatrix}.$$
In the second case, find a constant $c$ such that $\|v_k - z\| \le c\left(\frac12\right)^k$.
Exercise. Prove the Theorem in the case where $A$ (non negative, irreducible) is $4 \times 4$ and there exists $X$ non singular for which
$$X^{-1}AX = \begin{pmatrix} r & & & \\ & \lambda_2 & 1 & \\ & & \lambda_2 & 1 \\ & & & \lambda_2 \end{pmatrix},$$
with $|\lambda_2|$ smaller than $r$. Prove that in such case the rate of convergence is
$$|p_2(k)|\left(\frac{|\lambda_2|}{r}\right)^k,$$
where $p_2$ is a degree-two polynomial.
Exercise. Set
$$A = \begin{pmatrix} 0 & 0 & 1 \\ 1/4 & 0 & 0 \\ 3/4 & 1 & 0 \end{pmatrix}.$$
Prove that the eigenvalues of $A$ are $\{1, -\frac12, -\frac12\}$ and that $A$ is not diagonalizable. Find $X = [z\ x_2\ x_3]$ such that
$$X^{-1}AX = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -\frac12 & 1 \\ 0 & 0 & -\frac12 \end{pmatrix}.$$
The hypothesis that $A$ is irreducible and non negative alone does not assure that $r = \rho(A)$ is the unique eigenvalue of $A$ whose absolute value is equal to $r$. We only know that if $|\lambda_j| = r$, $j \in \{2, \dots, n\}$, then $\lambda_j \ne r$. Let us see some examples:
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}; \quad \text{Eigenvalues: } -1, 1; \quad \text{Perron-pair: } \left(1, \begin{pmatrix} 1/2 \\ 1/2 \end{pmatrix}\right).$$
$$A = \begin{pmatrix} 0 & a & 0 \\ 1 & 0 & 1 \\ 0 & 1-a & 0 \end{pmatrix},\ a \in (0,1); \quad \text{Eig: } -1, 0, 1; \quad \text{Perron-pair: } \left(1, \begin{pmatrix} a/2 \\ 1/2 \\ (1-a)/2 \end{pmatrix}\right).$$
$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}; \quad \text{Eigenvalues: } 1, 1, 4; \quad \text{Perron-pair: } \left(4, \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}\right).$$
Note that if in the second example $A$ is replaced by $A^T$, then no computation is needed to obtain the Perron-pair, since it is clear that $A^T e = e$, $e = [1\ 1\ 1]^T$. Moreover, in the third example $r = \rho(A)$ (which is 4) dominates the remaining eigenvalues of $A$ (which are 1, 1). We note that this fact is true for any positive matrix $A$ (see Theorem(positive) below), even if, as the following example shows, it is not a peculiarity of positive matrices:
$$A = \begin{pmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}; \quad \text{Eig: } 2 \mp \sqrt2,\ 2; \quad \text{Perron-pair: } \left(2+\sqrt2,\ \frac{1}{2+\sqrt2}\begin{pmatrix} 1 \\ \sqrt2 \\ 1 \end{pmatrix}\right)$$
(here computation is required to obtain the Perron-pair).
Theorem(positive). If $A$ is an $n \times n$ positive matrix and $r = \rho(A), \lambda_2, \dots, \lambda_n$ are its eigenvalues, then $|\lambda_j|$ is smaller than $r$ for all $j = 2, \dots, n$.
proof: Set $W = A - sze^T$, where $z$ is the Perron vector of $A$ ($Az = rz$, $z > 0$, $\|z\|_1 = 1$). The eigenvalues of $W$ are $r - s, \lambda_2, \dots, \lambda_n$ (a sketch of the proof: take $Y := [z\ y_2\ \cdots\ y_n]$ non singular; then
$$Y^{-1}AY = \begin{pmatrix} r & \cdots \\ 0 & M \end{pmatrix}, \qquad Y^{-1}WY = \begin{pmatrix} r-s & \cdots \\ 0 & M \end{pmatrix}).$$
If there is a value of $s$ for which $|W| \le A$, $|W| \ne A$, then $\rho(W)$ is smaller than $\rho(A)$, and thus $|\lambda_2|, \dots, |\lambda_n|$ are smaller than $r = \rho(A)$. If $A$ is positive, then such $s$ exists: $s = \min_{i,j} a_{ij}$.
Irreducible non-negative stochastic-by-columns A and power method
Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Then we know that $1 = \rho(A)$ ($A^T e = e$, $\rho(A) \le \|A\|_1 = 1$), that $r = 1 = \rho(A)$ is a simple eigenvalue of $A$, and that the corresponding eigenvector can be chosen positive. Of course, such eigenvector is uniquely defined if we require that its 1-norm is one.
So the Perron-pair $(1 = \rho(A), z)$, such that $Az = z$, $z$ positive, $\|z\|_1 = 1$, is uniquely defined. The computation of such Perron-pair, i.e. the computation of the vector $z$, can be performed via the power method. Actually, we now see that the power method, in the particular case where $A$ is stochastic by columns (besides non negative and irreducible), can be rewritten in a simpler form and converges independently of the choice of $a_0$ (provided that 1 dominates the other eigenvalues). These results follow from some remarks, reported in the following Proposition.
Proposition. Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Then
i) $v \in \mathbb{C}^n \Rightarrow \sum_i (Av)_i = \sum_i v_i$ \quad (indeed $\sum_i (Av)_i = \sum_i \sum_j a_{ij}v_j = \sum_j v_j \sum_i a_{ij} = \sum_j v_j$),
ii) $v$ positive, $\|v\|_1 = 1$ $\Rightarrow$ $Av$ positive, $\|Av\|_1 = 1$ (use the irreducibility of $A$ and assertion i)),
iii) $Av = \lambda v$, $\lambda \ne 1$ $\Rightarrow$ $\sum_i v_i = 0$ \quad (indeed $\sum_i v_i = \sum_i (Av)_i = \lambda \sum_i v_i$, thus $(\lambda - 1)\sum_i v_i = 0$).
Exercise. Assume that $X^{-1}AX = J$ where $J$ is the Jordan block diagonal form of $A$, and $A$ is an irreducible non negative stochastic-by-columns $n \times n$ matrix. Assume that $[J]_{11} = 1$. Prove that $\sum_i [X]_{ij} = 0$, $j = 2, \dots, n$.
Corollary(power). Let $A$ be an irreducible non negative stochastic-by-columns $n \times n$ matrix. Let $a_0$ be any positive vector such that $\|a_0\|_1 = 1$. Then set
$$a_{k+1} = Aa_k, \qquad k = 0, 1, 2, \dots.$$
The sequence $\{a_k\}$ is a well defined sequence of positive vectors, and $\|a_k\|_1 = 1 = \rho(A)$, $\forall\, k$. If $k \to +\infty$, then
$$a_k - z \to 0,$$
provided that $|\lambda_j|$ is smaller than 1 for all $j = 2, \dots, n$. The latter condition is assured by Theorem(positive) when $A$ is positive. In the particular case where $A$ is diagonalizable, the rate of convergence is
$$\left(\max_{j=2,\dots,n}|\lambda_j|\right)^{k}$$
[for the general case, use the Remark in Theorem(power) of the section on preliminary considerations].
proof: The sequences $a_k$ and $v_k$ defined in Theorem(power) coincide because, by Proposition ii), we have $\|Av_k\|_1 = \|v_k\|_1$. It remains to show that $\alpha$ in the expression $a_0 = \alpha z + \sum_j \alpha_j x_j$ is nonzero. Note that $e^T a_0 = \alpha e^T z + \sum_j \alpha_j e^T x_j = \alpha + \sum_j \alpha_j e^T x_j$. In the case that $A$ is diagonalizable the thesis follows from Proposition iii), which asserts that $e^T x_j$, $j = 2, \dots, n$, must be zero (hence $\alpha = e^T a_0 = 1 \ne 0$). In the generic case the proof is left to the reader.
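A sketch of the simplified iteration (assuming NumPy; the test matrix is the one of the exercise above, whose eigenvalues $1, -\frac12, -\frac12$ guarantee convergence even though it is not diagonalizable):

\begin{verbatim}
import numpy as np

def perron_vector_stochastic(A, steps=200):
    # A irreducible, non negative, stochastic by columns:
    # a_{k+1} = A a_k needs no normalization, ||a_k||_1 = 1 for all k
    a = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(steps):
        a = A @ a
    return a

A = np.array([[0.,   0., 1.],
              [0.25, 0., 0.],
              [0.75, 1., 0.]])   # columns sum to 1
z = perron_vector_stochastic(A)
print(z, A @ z)                  # A z ~ z, sum(z) = 1
\end{verbatim}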
Exercise. Discuss existence, unicity, and computation of the Perron-pair for
$$A = \begin{pmatrix} 0 & 1-a & a \\ a & 0 & 1-a \\ 1-a & a & 0 \end{pmatrix},\ a \in [0,1], \qquad A = \begin{pmatrix} 0 & 0 & 1 \\ a & 0 & 0 \\ 1-a & 1 & 0 \end{pmatrix},\ a \in [0,1],$$
$$A = \begin{pmatrix} 0 & b_1 & & & \\ 1 & 0 & b_2 & & \\ 1-b_1 & 0 & \ddots & & \\ & 1-b_2 & \ddots & b_{n-2} & \\ & & \ddots & 0 & 1 \\ & & & 1-b_{n-2} & 0 \end{pmatrix}, \qquad b_i \in (0,1).$$
Page-rank computation [Berkhin, et al]
Consider an oriented graph with set of vertices $V = \{1, 2, \dots, n\}$ and set of edges $E$: $(i,j) \in E$ if there is a link from $i$ to $j$. Associate to such graph the adjacency matrix
$$L_{ij} = \begin{cases} 1 & (i,j) \in E \\ 0 & (i,j) \notin E. \end{cases}$$
Call $\deg(i)$ the number of edges starting from $i$. Of course, $\deg(i) = 0$ (no edge starts from $i$) iff $L_{ij} = 0$, $\forall\, j$. Moreover, $\deg(i) = \sum_j L_{ij}$. Associate to the graph the transition matrix
$$P_{ij} = \begin{cases} \frac{1}{\deg(i)} & (i,j) \in E \\ 0 & (i,j) \notin E. \end{cases}$$
Note that $P_{ij} = L_{ij}/\deg(i)$ if $i$ is such that $\deg(i) > 0$, and $P_{ij} = L_{ij} = 0$ otherwise. Moreover, $\sum_j P_{ij} = 1$ if $\deg(i) > 0$ and $\sum_j P_{ij} = 0$ otherwise. So, the matrix $P$ is a non negative matrix, quasi-stochastic by rows.
Remark. Row i of P is null iff no edge starts from i; column j of P is null iff no edge points to j.
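In code, the construction of $L$, $\deg$ and $P$ from an edge list can be sketched as follows (the small graph is hypothetical, chosen only so that one vertex is dangling):

\begin{verbatim}
import numpy as np

n = 4
E = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]  # links i -> j, 0-based

L = np.zeros((n, n))
for i, j in E:
    L[i, j] = 1.0
deg = L.sum(axis=1)                  # deg(i) = sum_j L_ij

P = np.zeros((n, n))
rows = deg > 0
P[rows] = L[rows] / deg[rows, None]  # P_ij = L_ij / deg(i) if deg(i) > 0
print(P.sum(axis=1))                 # 1 where deg(i) > 0, 0 on null rows
\end{verbatim}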
Let $p \in \mathbb{R}^n$ be the vector whose entry $j$, $p_j$, is the importance (authority) of the vertex $j$. Then
$$p_j = \sum_{i:\,i \to j}\frac{p_i}{\deg(i)} = \sum_{i=1}^n P_{ij}p_i = \sum_{i=1}^n (P^T)_{ji}p_i = (P^T p)_j, \qquad p = P^T p$$
(note that such a fixed point $p$ may not exist or, if it exists, may fail to be unique or may have zero entries; see below).
Let $p_j^{(k+1)}$ be the probability that at step $k+1$ of my visit of the graph (navigation on the web) I am on the vertex (page) $j$. Then
$$p_j^{(k+1)} = \sum_{i:\,i \to j}\frac{p_i^{(k)}}{\deg(i)} = \sum_{i=1}^n P_{ij}p_i^{(k)} = \sum_{i=1}^n (P^T)_{ji}p_i^{(k)} = (P^T p^{(k)})_j, \qquad p^{(k+1)} = P^T p^{(k)}$$
(note that such a sequence of probability vectors exists and is uniquely defined once $p^{(0)}$ is given, but the $p^{(k)}$ may lose the possible ddp property of $p^{(0)}$; see below). [A vector $w$ is said to be a ddp (discrete distribution of probability) if $w$ is positive and $\|w\|_1 = 1$.]
Note that it is natural to require that: $p_i^{(k)} > 0$ (at step $k$ there is a probability that I am on vertex $i$), $\sum_j p_j^{(k)} = 1$ (at step $k$ I am on some vertex); $p_i > 0$ (any vertex has a portion of importance ...), $\sum_i p_i = 1$ (... of the total 1).
So, the following facts must be true:
a) $p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$, exists and is uniquely defined,
b) the method $p^{(0)} = $ ddp, $p^{(k+1)} = P^T p^{(k)}$, converges to $p$.
We now show that in order to have a) and b) the matrix $P^T$ must be both stochastic by columns and irreducible.
Theorem(stoc). If the above facts a) and b) are true, then $P^T$ must be stochastic by columns.
proof: We know that $\exists\,!\ p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$. Assume that $P^T$ is quasi-stochastic but not stochastic by columns. Then
$$\|P^T p\|_1 = \sum_i (P^T p)_i = \sum_i \sum_j (P^T)_{ij}p_j = \sum_j \sum_i P_{ji}p_j = \sum_{j:\,\deg(j)>0} 1 \cdot p_j + \sum_{j:\,\deg(j)=0} 0 \cdot p_j < \sum_j p_j = \|p\|_1,$$
and, analogously,
$$\|p^{(k+1)}\|_1 = \|P^T p^{(k)}\|_1 \le \|p^{(k)}\|_1 \le \|p^{(1)}\|_1 < \|p^{(0)}\|_1 = 1.$$
Thus $p$ cannot be equal to $P^T p$, and $p^{(k)}$ cannot converge to a ddp.
Theorem(irred). If the above facts a) and b) are true, then $P^T$ must be irreducible.
proof: Assume $P^T$ reducible. Then there exists a permutation matrix $Q$ such that
$$Q^T P^T Q = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad (*)$$
with $A_{11}$ and $A_{22}$ square matrices of order at least 1. Note that $Q^T P^T Q$ is stochastic by columns, like $P^T$. We know that $\exists\,!\ p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$, but this is equivalent to saying that $\exists\,!\ Q^T p$ such that
$$Q^T p = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix} Q^T p, \qquad Q^T p > 0, \quad \|Q^T p\|_1 = 1. \qquad (**)$$
Case 1: assume $A_{12} = 0$. Then $A_{11}$ and $A_{22}$ are non negative stochastic-by-columns matrices. We can assume they are also irreducible (why?). Then, by the Perron-Frobenius theory,
$$\exists\,!\ y_1, y_2 > 0, \quad \|y_1\|_1 = \|y_2\|_1 = 1, \quad y_1 = A_{11}y_1, \quad y_2 = A_{22}y_2,$$
and, as a consequence, for all $\alpha \in (0,1)$ we have
$$\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix} = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}, \qquad \begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix} > 0, \qquad \left\|\begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}\right\|_1 = 1.$$
So, all vectors $p$ such that $Q^T p = \begin{pmatrix} \alpha y_1 \\ (1-\alpha)y_2 \end{pmatrix}$ satisfy the properties $p > 0$, $\|p\|_1 = 1$, $p = P^T p$, which is against the hypothesis of unicity.
Case 2: assume $A_{12} \ne 0$. Then $A_{22}$ in (*) is not stochastic by columns, thus $A_{22}$ may have no eigenvalue equal to 1, i.e. the equations in (**) involving $A_{22}$ can be satisfied only if part of $p$ is null, against the hypothesis of positiveness of $p$.
Vice versa, we know that if the non negative matrix $P^T$ is irreducible and stochastic by columns, then $1 = \rho(P^T)$ is a simple eigenvalue of $P^T$ and there exists a unique vector $p$ such that $p = P^T p$, $p > 0$, $\|p\|_1 = 1$. We also know that such hypotheses are not sufficient to assure the convergence (to $p$) of the sequence $p^{(k+1)} = P^T p^{(k)}$, $p^{(0)} > 0$, $\|p^{(0)}\|_1 = 1$, or, equivalently, to assure that the remaining eigenvalues of $P^T$ have absolute value smaller than 1. We can only say that the sequence $\{p^{(k)}\}$ is a well defined sequence of ddp.
We have to modify $P$ (the graph) so as to make the mathematical problem well posed ($\exists\,!$) and to make the algorithm for solving it convergent.
Make $P^T$ stochastic:
$$P' = P + dv^T, \qquad d = \begin{pmatrix} \delta_{\deg(1),0} \\ \vdots \\ \delta_{\deg(n),0} \end{pmatrix}, \qquad v = \begin{pmatrix} \frac1n \\ \vdots \\ \frac1n \end{pmatrix},$$
i.e., where $P$ has null rows, $P'$ has the row vector $v^T$. The vertex $i$ with $\deg(i) = 0$ now links to all vertices of the graph. [We discuss the uniform case, but what follows can be repeated for the more general case $v = $ ddp.]
Observe that $(P')^T$ is stochastic by columns, $(P')^T \ge 0$; thus 1 is an eigenvalue of $(P')^T$ and the other eigenvalues of $(P')^T$, $\lambda'_2, \dots, \lambda'_n$, are such that $|\lambda'_j| \le 1$.
Make $(P')^T$ irreducible:
$$P'' = cP' + (1-c)ev^T, \qquad e = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}, \qquad c \in (0,1).$$
Since
$$e_i^T P'' = \begin{cases} cv^T + (1-c)v^T = v^T & \deg(i) = 0 \\ c\,[\cdots\ 0\ \frac{1}{\deg(i)}\ 0\ \cdots] + (1-c)v^T & \deg(i) > 0, \end{cases}$$
we are assuming that a visitor of the graph can go from the vertex $i$ to one of its neighbors with probability $c/\deg(i) + (1-c)/n$, and with probability $(1-c)/n$ to an arbitrary vertex of the graph. Of course the parameter $c$ must be chosen near 1, in order to keep our model faithful to the way the graph (web) is visited.
Observe that $(P'')^T$ is stochastic by columns and positive; therefore, in particular, it is non negative and irreducible. So, we have all we need.
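The two corrections can be sketched together in code (a minimal illustration of the uniform-v case; the function name and the default c = 0.85 are our choices):

\begin{verbatim}
import numpy as np

def google_matrix(P, c=0.85):
    # P'' = c P' + (1 - c) e v^T,  with  P' = P + d v^T,  v = e/n,
    # where d flags the null rows of P
    n = P.shape[0]
    v = np.ones(n) / n
    d = (P.sum(axis=1) == 0).astype(float)
    P1 = P + np.outer(d, v)                             # P': stochastic by rows
    return c * P1 + (1 - c) * np.outer(np.ones(n), v)   # P'': positive
\end{verbatim}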
Theorem(page-rank). $1 = \rho((P'')^T)$ is a simple eigenvalue of $(P'')^T$, there exists a unique vector $p$ such that $p = (P'')^T p$, $p > 0$, $\|p\|_1 = 1$ (i.e. we have fact a)), and the other eigenvalues of $(P'')^T$, $\lambda''_2, \dots, \lambda''_n$, are such that $|\lambda''_j| < 1$ (by Theorem(positive)). Thus $p^{(k+1)} = (P'')^T p^{(k)}$, $p^{(0)} > 0$, $\|p^{(0)}\|_1 = 1$, is a sequence of ddp convergent to $p$ and, in case $P''$ is diagonalizable,
$$\|p^{(k)} - p\| = O\left(\left(\max_{j=2,\dots,n}|\lambda''_j|\right)^{k}\right)$$
[for the general case see the Remark in Theorem(power) of Section 1] (i.e. we have fact b)). Moreover, for the particular choice of $(P'')^T$, the cost of each step of the power method is $O(n)$, dominated by the cost of the matrix-vector multiplication $P^T x$, and, if $P''$ is diagonalizable, then the rate of convergence is in fact
$$\|p^{(k)} - p\| = O(c^k)$$
[for the general case see the Remark in Theorem(power) of Section 1] (the Google search engine sets $c = 0.85$ [Berkhin]).
proof: We have to prove only the final assertions.
$O(n)$ arithmetic operations are sufficient to perform each step of the power method. We have
$$(P'')^T = c(P^T + vd^T) + (1-c)ve^T,$$
and, if $x \ge 0$, then
$$(P'')^T x = cP^T x + \gamma v, \qquad \gamma = cd^T x + (1-c)e^T x = e^T x - c[e^T x - d^T x] = \|x\|_1 - c\|P^T x\|_1$$
(why does the latter equality hold?). Thus, in order to compute $p^{(k+1)}$ from $p^{(k)}$, one can use the following function:
$$y = cP^T x, \qquad \gamma = \|x\|_1 - \|y\|_1, \qquad (P'')^T x = y + \gamma v,$$
where the dominant operation is the matrix-vector multiplication $P^T x$. Note that each row $j$ of $P^T$ has (on the average) a very small number of nonzero entries, i.e. exactly the number of vertices pointing to $j$, so $P^T x$ can be computed with $O(n)$ arithmetic operations. Note also that in order to implement the above function one needs only $2n$ memory allocations.
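The step described in the proof translates directly into code; below is a sketch assuming $P$ is stored in a SciPy sparse format, so that the product $P^T x$ costs $O(\text{number of edges})$ operations:

\begin{verbatim}
import numpy as np
from scipy.sparse import csr_matrix   # P_sparse below is a csr_matrix

def pagerank(P_sparse, c=0.85, steps=100):
    n = P_sparse.shape[0]
    v = np.ones(n) / n
    p = v.copy()                      # p^(0) = ddp
    for _ in range(steps):
        y = c * (P_sparse.T @ p)      # y = c P^T p^(k)
        gamma = p.sum() - y.sum()     # gamma = ||p^(k)||_1 - ||y||_1
        p = y + gamma * v             # p^(k+1) = (P'')^T p^(k)
    return p
\end{verbatim}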
Rate of convergence $\|p^{(k)} - p\| = O(c^k)$. We first prove that, if $e^T v = 1$, then
$$p_{P'}(\lambda) = (\lambda - 1)p_{n-1}(\lambda)\ \Rightarrow\ p_{P' + \frac{1-c}{c}ev^T}(\lambda) = \left(\lambda - \frac1c\right)p_{n-1}(\lambda). \qquad (***)$$
In fact, for $S = [e\ y_2\ \cdots\ y_n]$, $\det(S) \ne 0$, we have
$$S^{-1}P'S = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix}, \qquad p_{P'}(\lambda) = (\lambda - 1)p_M(\lambda),$$
$$S^{-1}\left(P' + \tfrac{1-c}{c}ev^T\right)S = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix} + \tfrac{1-c}{c}S^{-1}ev^TS = \begin{pmatrix} 1 & u^T \\ 0 & M \end{pmatrix} + \tfrac{1-c}{c}\begin{pmatrix} 1 & \cdots & * \\ 0 & \cdots & 0 \end{pmatrix} = \begin{pmatrix} \frac1c & \tilde u^T \\ 0 & M \end{pmatrix}.$$
As a consequence of (***), if $1, \lambda'_2, \dots, \lambda'_n$ are the eigenvalues of $P'$ (recall that $|\lambda'_j| \le 1$, $j = 2, \dots, n$), then the eigenvalues of $P' + \frac{1-c}{c}ev^T$ are $\frac1c, \lambda'_2, \dots, \lambda'_n$, and thus the eigenvalues of $cP' + (1-c)ev^T$ are $1, c\lambda'_2, \dots, c\lambda'_n$. It follows that, if $P''$ is diagonalizable, then $\|p^{(k)} - p\| = O((\max_{j=2,\dots,n}|c\lambda'_j|)^k) = O(c^k)$.
Why compute p? [Berkhin].
QUERY: Berkhin survey.
Go to the inverted term-document file, which is a table containing a row for each term of the collection's dictionary. In such file, for each term, there is the list of all documents that contain that term:
$$\vdots$$
$$\text{term} \to \mathrm{LISTA}_{\text{term}} = \{i_1, i_2, \dots, i_k\} \subset \{1, 2, \dots, n\}$$
$$\vdots$$
$$\text{Berkhin} \to \mathrm{LISTA}_{\text{Berkhin}} = \{1, 4, 6\}$$
$$\vdots$$
$$\text{survey} \to \mathrm{LISTA}_{\text{survey}} = \{1, 3\}$$
$$\vdots$$
Define the set of relevance of the query:
$$\bigcup_{\text{term} \in \text{QUERY}} \mathrm{LISTA}_{\text{term}} = \{1, 3, 4, 6\}.$$
Reading $p$ ($p$ is updated about once a month), consider and order the corresponding set of authorities $\{p_1, p_3, p_4, p_6\}$; for example, $p_4 \ge p_6 \ge p_3 \ge p_1$. Finally, show the titles of the documents 1, 3, 4, 6 in the order 4, 6, 3, 1, from the one with greatest authority to the one with smallest authority.
Main criticism: this procedure, being independent of the query, allows a fast answer, but does not distinguish pages with generic authority from pages with authority on the specific subject of the query.
Exercise. Draw the graph whose transition matrix is
$$P = \begin{pmatrix} 0 & 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 1/3 & 1/3 & 0 & 0 & 1/3 & 0 \\ 0 & 0 & 0 & 0 & 1/2 & 1/2 \\ 0 & 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}.$$
Note that $P$ is non negative, reducible and quasi-stochastic, but not stochastic, by rows. Prove that there is no positive vector $p$ such that $p = P^T p$. Starting from $P$ and proceeding as indicated in the theory, introduce a non negative matrix $P'$ stochastic by rows. Note that $P'$ is reducible, like $P$. Prove that there is no positive vector $p$ such that $p = (P')^T p$. Starting from $P'$ and proceeding as indicated in the theory, introduce a non negative matrix $P''$ irreducible and stochastic by rows. Prove that there exists a unique positive vector $p$ such that $p = (P'')^T p$, $\|p\|_1 = 1$, and describe an algorithm for the computation of $p$. Here below is an approximation of such vector $p$:
$$p^T = [0.03721\ \ 0.05396\ \ 0.04151\ \ 0.3751\ \ 0.206\ \ 0.2862].$$
Assume, for example, that the set of relevance of a query is $\{1, 3, 4, 6\}$. Then the documents 1, 3, 4, 6 are listed in the order 4, 6, 3, 1, since $p_4 \ge p_6 \ge p_3 \ge p_1$.
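The reported approximation can be checked with a few lines of code; the damping value used below is an assumption (the text reports that the Google search engine sets c = 0.85), so the printed vector agrees with the one above only for the c actually used there:

\begin{verbatim}
import numpy as np

P = np.array([[0,   1/2, 1/2, 0,   0,   0  ],
              [0,   0,   0,   0,   0,   0  ],
              [1/3, 1/3, 0,   0,   1/3, 0  ],
              [0,   0,   0,   0,   1/2, 1/2],
              [0,   0,   0,   1/2, 0,   1/2],
              [0,   0,   0,   1,   0,   0  ]])
n = 6
v = np.ones(n) / n
d = (P.sum(axis=1) == 0).astype(float)
P2 = 0.85 * (P + np.outer(d, v)) + 0.15 * np.outer(np.ones(n), v)

p = v.copy()
for _ in range(200):
    p = P2.T @ p
print(p)    # compare with the approximation of p reported above
\end{verbatim}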
APPENDIX
Proof (Theorem(inverse power)):
Assume $A$ diagonalizable, $Ax_j = \lambda_j x_j$, $\{x_j\}_{j=1}^n$ linearly independent. Then $Ax_j - \lambda_i^* x_j = (\lambda_j - \lambda_i^*)x_j$, whence $(A - \lambda_i^* I)^{-m}x_j = \frac{1}{(\lambda_j - \lambda_i^*)^m}x_j$. Call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda_i}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_i}\alpha_j x_j$. Note that the first sum is a non null vector (by the assumption on $v_0$). Then the following equality holds:
$$(\lambda_i - \lambda_i^*)^m(A - \lambda_i^* I)^{-m}v_0 = \sum_{j:\,\lambda_j = \lambda_i}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_i}\alpha_j\left(\frac{\lambda_i - \lambda_i^*}{\lambda_j - \lambda_i^*}\right)^m x_j,$$
which, for $m \to +\infty$, yields the thesis.
Assume now that $A$ is not diagonalizable. In this case call $x_j$, $j = 1, \dots, n$, the columns of the non singular matrix $X$ for which $X^{-1}AX = J$, where $J$ is the block diagonal Jordan form of $A$. Assume that the upper-left diagonal block of $J$ is diagonal and its diagonal contains all $\lambda_j$ such that $\lambda_j = \lambda := \lambda_i$. Assume that there are $t$ of such eigenvalues. Then consider any other diagonal block of $J$; of course it corresponds to an eigenvalue $\lambda_j$ different from $\lambda$; call such eigenvalue $\mu$. The block under consideration has an order, say $s = s_\mu$:
$$J = \begin{pmatrix} \lambda & & & & \\ & \ddots & & & \\ & & \lambda & & \\ & & & \begin{smallmatrix} \mu & 1 & & \\ & \mu & \ddots & \\ & & \ddots & 1 \\ & & & \mu \end{smallmatrix} & \\ & & & & \ddots \end{pmatrix}.$$
Restrict the matrix equation $X^{-1}AX = J$ to this block: for some $r \ge t$ we have
$$A[x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots\ x_{r+s}] = [x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots\ x_{r+s}]\begin{pmatrix} \mu & 1 & & \\ & \mu & \ddots & \\ & & \ddots & 1 \\ & & & \mu \end{pmatrix}.$$
It follows that
$$Ax_{r+1} = \mu x_{r+1}, \quad (A - \lambda^* I)x_{r+1} = (\mu - \lambda^*)x_{r+1}, \quad (A - \lambda^* I)^{-1}x_{r+1} = \tfrac{1}{\mu - \lambda^*}x_{r+1}, \quad (A - \lambda^* I)^{-m}x_{r+1} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+1};$$
$$Ax_{r+2} = \mu x_{r+2} + x_{r+1}, \quad (A - \lambda^* I)x_{r+2} = (\mu - \lambda^*)x_{r+2} + x_{r+1}, \quad (A - \lambda^* I)^{-1}x_{r+2} = \tfrac{1}{\mu - \lambda^*}x_{r+2} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+2} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+2} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+1};$$
$$Ax_{r+3} = \mu x_{r+3} + x_{r+2}, \quad (A - \lambda^* I)^{-1}x_{r+3} = \tfrac{1}{\mu - \lambda^*}x_{r+3} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+2} + \tfrac{1}{(\mu - \lambda^*)^3}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+3} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+3} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+2} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+1};$$
$$Ax_{r+4} = \mu x_{r+4} + x_{r+3}, \quad (A - \lambda^* I)^{-1}x_{r+4} = \tfrac{1}{\mu - \lambda^*}x_{r+4} - \tfrac{1}{(\mu - \lambda^*)^2}x_{r+3} + \tfrac{1}{(\mu - \lambda^*)^3}x_{r+2} - \tfrac{1}{(\mu - \lambda^*)^4}x_{r+1},$$
$$(A - \lambda^* I)^{-m}x_{r+4} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+4} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+3} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+2} - \tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^{m+3}}x_{r+1};$$
and, in general,
$$(A - \lambda^* I)^{-m}x_{r+s} = \tfrac{1}{(\mu - \lambda^*)^m}x_{r+s} - \dots + (-1)^{s-1}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{m+s-1}}x_{r+1}.$$
In matrix form,
$$(A - \lambda^* I)^{-m}[x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots] = [x_{r+1}\ x_{r+2}\ x_{r+3}\ \cdots]\begin{pmatrix} \frac{1}{(\mu - \lambda^*)^m} & -\frac{m}{(\mu - \lambda^*)^{m+1}} & \frac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}} & \cdots \\ 0 & \frac{1}{(\mu - \lambda^*)^m} & -\frac{m}{(\mu - \lambda^*)^{m+1}} & \\ 0 & 0 & \frac{1}{(\mu - \lambda^*)^m} & \\ \vdots & & & \ddots \end{pmatrix}.$$
Set also $\lambda^* := \lambda_i^*$ ($\lambda_i^*$ is the given approximation of $\lambda = \lambda_i$). Choose $v_0$, and call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda}\alpha_j x_j + \{\,\dots + \sum_{j=r+1}^{r+s}\alpha_j x_j + \dots\}$. Note that $\sum_{j:\,\lambda_j = \lambda}\alpha_j x_j = \sum_{j=1}^t \alpha_j x_j$. Then
$$\begin{aligned} (A - \lambda^* I)^{-m}v_0 &= \sum_{j:\,\lambda_j = \lambda}\alpha_j(A - \lambda^* I)^{-m}x_j + \{\,\cdots + \sum_{j=r+1}^{r+s}\alpha_j(A - \lambda^* I)^{-m}x_j + \cdots\}\\ &= \sum_{j:\,\lambda_j = \lambda}\alpha_j\tfrac{1}{(\lambda - \lambda^*)^m}x_j + \{\,\cdots + [\,\alpha_{r+1}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+1}) + \alpha_{r+2}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+2} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+1})\\ &\quad + \alpha_{r+3}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+3} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+2} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+1})\\ &\quad + \alpha_{r+4}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+4} - \tfrac{m}{(\mu - \lambda^*)^{m+1}}x_{r+3} + \tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^{m+2}}x_{r+2} - \tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^{m+3}}x_{r+1})\\ &\quad + \dots + \alpha_{r+s}(\tfrac{1}{(\mu - \lambda^*)^m}x_{r+s} - \dots + (-1)^{s-1}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{m+s-1}}x_{r+1})\,] + \cdots\}. \end{aligned}$$
But this implies
$$\begin{aligned} (\lambda - \lambda^*)^m(A - \lambda^* I)^{-m}v_0 &= \sum_{j:\,\lambda_j = \lambda}\alpha_j x_j + \{\,\dots + \left(\tfrac{\lambda - \lambda^*}{\mu - \lambda^*}\right)^m\Big[\,\big(\alpha_{r+1} - \alpha_{r+2}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+3}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \dots + (-1)^{s-1}\alpha_{r+s}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{s-1}}\big)x_{r+1}\\ &\quad + \big(\alpha_{r+2} - \alpha_{r+3}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+4}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \dots + (-1)^{s-2}\alpha_{r+s}\tfrac{q_{s-2}(m)}{(\mu - \lambda^*)^{s-2}}\big)x_{r+2}\\ &\quad + \dots + \alpha_{r+s}x_{r+s}\,\Big] + \cdots\}, \end{aligned}$$
from which it is clear that the sequence $(\lambda - \lambda^*)^m(A - \lambda^* I)^{-m}v_0$ converges to an eigenvector of $A$ associated with $\lambda$. The assertions about the rate of convergence follow by setting
$$p_{s-1}(m) = \alpha_{r+1} - \alpha_{r+2}\tfrac{m}{\mu - \lambda^*} + \alpha_{r+3}\tfrac{\frac12(m^2+m)}{(\mu - \lambda^*)^2} - \alpha_{r+4}\tfrac{\frac16 m^3 + \frac12 m^2 + \frac13 m}{(\mu - \lambda^*)^3} + \dots + (-1)^{s-1}\alpha_{r+s}\tfrac{q_{s-1}(m)}{(\mu - \lambda^*)^{s-1}}.$$

Proof (Theorem(power)):
Assume that $A$ is diagonalizable, so $Ax_j = \lambda_j x_j$ with $\{x_j\}_{j=1}^n$ linearly independent. Choose $v_0$, and call $\alpha_j$ the numbers for which $v_0 = \sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j x_j$. Note that the first sum is non null by assumption. Then we have the equalities
$$A^m v_0 = \lambda_1^m\sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j\lambda_j^m x_j, \qquad \frac{1}{\lambda_1^m}A^m v_0 = \sum_{j:\,\lambda_j = \lambda_1}\alpha_j x_j + \sum_{j:\,\lambda_j \ne \lambda_1}\alpha_j\left(\frac{\lambda_j}{\lambda_1}\right)^m x_j.$$
The thesis follows letting $m$ go to infinity.
Assume now that $A$ is not diagonalizable. Set $\mu := \lambda_j$, where $\lambda_j \ne \lambda_1$, and restrict, as above, the Jordan matrix equation $X^{-1}AX = J$ to a Jordan block of order $s$ associated with $\mu$. Then
Assume that A is not diagonalizable. Set µ := λ j , where λ j 6= λ 1 , and restrict, as above, the Jordan matrix equation X −1 AX = J to a diagonal Jordan block of order s associated with µ. Then
Ax r+1 = µx r+1 , A m x r+1 = µ m x r+1 ,
Ax r+2 = µx r+2 + x r+1 , A m x r+2 = µ m x r+2 + mµ m−1 x r+1 , Ax r+3 = µx r+3 + x r+2 ,
A m x r+3 = µ m x r+3 + mµ m−1 x r+2 + 1 2 (m 2 − m)µ m−2 x r+1 ,
Ax r+4 = µx r+4 + x r+3 ,
A m x r+4 = µ m x r+4 + mµ m−1 x r+3 + 1 2 (m 2 − m)µ m−2 x r+2 +( 1 6 m 3 − 1 2 m 2 + 1 3 m)µ m−3 x r+1 ,
and so on. Set $\lambda := \lambda_1$ and $t = m_a(\lambda_1) = m_g(\lambda_1)$, so we can assume that the upper-left $t \times t$ submatrix of $J$ is diagonal with diagonal entries all equal to $\lambda$. Then
$$\begin{aligned} A^m v_0 &= A^m\sum_{j=1}^t \alpha_j x_j + \{\dots + \sum_{j=r+1}^{r+s}\alpha_j A^m x_j + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,\alpha_{r+1}A^m x_{r+1} + \alpha_{r+2}A^m x_{r+2} + \alpha_{r+3}A^m x_{r+3} + \alpha_{r+4}A^m x_{r+4} + \dots + \alpha_{r+s}A^m x_{r+s}\,] + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,\alpha_{r+1}(\mu^m x_{r+1}) + \alpha_{r+2}(\mu^m x_{r+2} + m\mu^{m-1}x_{r+1}) + \alpha_{r+3}(\mu^m x_{r+3} + m\mu^{m-1}x_{r+2} + \tfrac12(m^2 - m)\mu^{m-2}x_{r+1})\\ &\quad + \alpha_{r+4}(\mu^m x_{r+4} + m\mu^{m-1}x_{r+3} + \tfrac12(m^2 - m)\mu^{m-2}x_{r+2} + (\tfrac16 m^3 - \tfrac12 m^2 + \tfrac13 m)\mu^{m-3}x_{r+1})\\ &\quad + \dots + \alpha_{r+s}(\mu^m x_{r+s} + \dots + q_{s-1}(m)\mu^{m-s+1}x_{r+1})\,] + \dots\}\\ &= \lambda^m\sum_{j=1}^t \alpha_j x_j + \{\dots + [\,(\alpha_{r+1}\mu^m + \alpha_{r+2}m\mu^{m-1} + \alpha_{r+3}\tfrac12(m^2 - m)\mu^{m-2} + \alpha_{r+4}(\tfrac16 m^3 - \tfrac12 m^2 + \tfrac13 m)\mu^{m-3} + \dots + \alpha_{r+s}q_{s-1}(m)\mu^{m-s+1})x_{r+1}\\ &\quad + (\alpha_{r+2}\mu^m + \alpha_{r+3}m\mu^{m-1} + \alpha_{r+4}\tfrac12(m^2 - m)\mu^{m-2} + \dots + \alpha_{r+s}q_{s-2}(m)\mu^{m-s+2})x_{r+2} + \dots + (\alpha_{r+s}\mu^m)x_{r+s}\,] + \dots\}, \end{aligned}$$
$$\begin{aligned} \frac{1}{\lambda^m}A^m v_0 &= \sum_{j=1}^t \alpha_j x_j + \{\dots + \left(\frac{\mu}{\lambda}\right)^m[\,(\alpha_{r+1} + \alpha_{r+2}\tfrac{m}{\mu} + \alpha_{r+3}\tfrac{\frac12(m^2 - m)}{\mu^2} + \alpha_{r+4}\tfrac{\frac16 m^3 - \frac12 m^2 + \frac13 m}{\mu^3} + \dots + \alpha_{r+s}\tfrac{q_{s-1}(m)}{\mu^{s-1}})x_{r+1}\\ &\quad + (\alpha_{r+2} + \alpha_{r+3}\tfrac{m}{\mu} + \alpha_{r+4}\tfrac{\frac12(m^2 - m)}{\mu^2} + \dots + \alpha_{r+s}\tfrac{q_{s-2}(m)}{\mu^{s-2}})x_{r+2} + \dots + (\alpha_{r+s})x_{r+s}\,] + \dots\}. \end{aligned}$$
It is clear that, as $m$ goes to infinity, the sequence $\frac{1}{\lambda^m}A^m v_0$ converges to an eigenvector of $A$ associated with $\lambda$, the dominant eigenvalue of $A$. Finally, the assertions on the rate of convergence follow by setting
$$p_{s-1}(m) = \alpha_{r+1} + \alpha_{r+2}\tfrac{m}{\mu} + \alpha_{r+3}\tfrac{\frac12(m^2 - m)}{\mu^2} + \alpha_{r+4}\tfrac{\frac16 m^3 - \frac12 m^2 + \frac13 m}{\mu^3} + \dots + \alpha_{r+s}\tfrac{q_{s-1}(m)}{\mu^{s-1}}.$$