
Master's Degree in Mathematics

Master's Thesis

Newton-type methods for the Multilinear PageRank

Candidate: Alberto Bucci

Advisor: Prof. Federico Poloni

Co-examiner: Prof. Beatrice Meini


Contents

Introduction

Chapter 1. Multilinear PageRank
1. The PageRank problem
2. The higher-order PageRank problem
3. The Multilinear PageRank problem

Chapter 2. Iterative algorithms for the Multilinear PageRank
1. The fixed-point iteration
2. The shifted fixed-point iteration
3. The inner-outer iteration
4. The inverse iteration
5. Newton's method
6. The inexact inverse Newton's method

Chapter 3. Numerical homotopy continuation
1. Numerical continuation methods
2. Predictor-Corrector methods

Chapter 4. Continuation algorithms for the Multilinear PageRank
1. Natural defined homotopy
2. Newton P-C method for the Multilinear PageRank
3. Moore-Penrose P-C method for the Multilinear PageRank

Chapter 5. Numerical results

Conclusions and future works

Bibliography


Introduction

Page ranking, or site scoring, plays a crucial role in developing search engines.

The most well-known way to define the value of a page is, without a doubt, the one given by Larry Page and Sergey Brin in their paper "The anatomy of a large-scale hypertextual web search engine" [1].

In the article, they state that page ranking can be computed by counting the number and quality of links to a page to determine an estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other important websites.

There exists a rich literature on the so called PageRank problem; in particular, several efficient algorithms have been designed for its resolution. Nevertheless, having to deal with more and more data, researchers started studying new models capable of handling this additional information.

Here we focus our attention on one of these generalizations, the Multilinear PageRank model.

The aim of this thesis is to present new methods for solving the Multilinear PageRank problem.

To this end, after a brief introduction to the problem, we analyse several known iterative algorithms, such as the ones studied in a paper by Gleich, Lim and Yu [2]; we also generalize many theoretical results that, in the literature, were known only for the quadratic case of the problem.

In this brief investigation, we confirm the superiority of Newton's iteration over the other iterative algorithms, also providing an ad hoc modification of Newton's method, the so called "inexact inverse Newton's iteration". This modification, which scales to large problems, could help to reduce the computational effort required by Newton's iteration.

Though Newton's iteration turned out to be the best alternative, it still has some drawbacks. In particular, there are tensors for which convergence is too slow or not achieved at all. So, for reasons of reliability, we investigated the problem further.

Inspired by ideas of homotopy continuation, we designed two new algorithms. To introduce them, we give some notions of numerical continuation theory, a strategy used to approximate the solutions of a system of parametrized nonlinear equations. In particular, we focus our attention on predictor-corrector methods. These methods apply very well to our case: indeed, the equations involved in the Multilinear PageRank problem can be seen as homotopy maps, which are effectively parametrized equations.

In the conclusion of the thesis we report some numerical experiments conducted on a large number of tensors of several dimensions. The experiments show that one of the predictor-corrector algorithms still has some trouble with input coefficients that should be made adaptive, while with the other we have already achieved excellent results.


CHAPTER 1

Multilinear PageRank

The purpose of this chapter is to explain the reasons that led to the formulation of the Multilinear PageRank problem.

First we briefly describe the popular PageRank problem, then we introduce its higher-order modification, which is the natural extension from a first-order Markov chain to one of higher order.

Unfortunately the tensor product structure in the state space and the higher-order stationary distribution of this generalized version make it hard to scale to large problems.

The Multilinear PageRank formulation, on the other hand, enables us to overcome this inconvenience without losing valuable data.


1. The PageRank problem

When we come upon the word PageRank [3] we immediately think of web page rankings.

But to be precise, PageRank is an algorithm used to rank pages which, roughly speaking, works by counting the number and quality of links to a page to determine an estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other important websites.

The desired ranking can be mathematically defined either as the solution of an eigenvalue problem or, equivalently, as the stationary distribution of a Markov chain.

To understand this duality we state the PageRank problem:

Definition. Let $R \in M(n, \mathbb{R})$ be a column stochastic matrix, let $0 < \alpha < 1$ be a constant and let $v$ be a positive stochastic vector. The PageRank problem consists in finding a stochastic vector $x \in \mathbb{R}^n$ such that

(1) $\quad x = \alpha R x + (1 - \alpha) v.$

The vector $x$ is often referred to as the PageRank vector. Equation 1 is equivalent to

$M x = \bigl[\alpha R + (1 - \alpha) v e^T\bigr] x = x.$

It follows, as anticipated, that $x$ is an eigenvector of $\alpha R + (1 - \alpha) v e^T$. Furthermore, note that the matrix $M$ is stochastic: nonnegativity is trivial, while

$e^T M = \alpha e^T R + (1 - \alpha) e^T v e^T = (\alpha + (1 - \alpha)) e^T = e^T$

proves that each column of the matrix sums to 1. Since $M$ is stochastic it represents a Markov chain, and the vector $x$ is its stationary distribution.

From a practical point of view, the solutions that matter are the ones related to values of $\alpha$ close to 1. Unfortunately, they are also the most expensive to compute.
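To make this computation concrete, here is a minimal NumPy sketch of the fixed-point (power-type) iteration for equation 1. The function name, the dense matrix and the tolerance defaults are our own choices, not part of the thesis.

```python
import numpy as np

def pagerank(R, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Fixed-point iteration for x = alpha*R*x + (1-alpha)*v.

    R is assumed column stochastic (dense here only for readability),
    v a positive stochastic vector."""
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (R @ x) + (1 - alpha) * v
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x
```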

In this regard, an important result is the following:

Proposition 1. The eigenvalues of the matrix $M = \alpha R + (1 - \alpha)ve^T$ different from 1 are the same as those of $R$, multiplied by $\alpha$.

Proof. We have already proved that the matrix $M$ is stochastic; we now show that it is also irreducible.

Indeed, $M$ is the sum of the nonnegative matrix $\alpha R$ and the positive matrix $(1 - \alpha)ve^T$, hence it is positive.

The Perron-Frobenius theorem [4] ensures that the spectral radius of $M$ is a real and simple eigenvalue.

To show that $\rho(M) = 1$ we use the following fact, which applies to nonnegative square matrices [5]:

$1 = \min_j \sum_{i=1}^{n} M_{ij} \le \rho(M) \le \max_j \sum_{i=1}^{n} M_{ij} = 1.$

Let $\lambda$ be any eigenvalue of $R$ different from 1 and let $v_\lambda$ be one of its eigenvectors. Then

$e^T v_\lambda = (e^T R)\, v_\lambda = \langle R^T e, v_\lambda \rangle = \langle e, R v_\lambda \rangle = e^T R v_\lambda = \lambda\, e^T v_\lambda.$

This shows that $(1 - \lambda)\, e^T v_\lambda = 0$ and, since $\lambda \neq 1$, that $e^T v_\lambda = 0$. It now suffices to multiply $M$ by $v_\lambda$ to conclude:

$M v_\lambda = \alpha R v_\lambda + (1 - \alpha) v e^T v_\lambda = (\alpha \lambda)\, v_\lambda. \qquad \square$

Corollary. The eigenvalues of $M$ different from 1 have modulus at most $\alpha$.

The importance of this fact is due to the dependence of the speed of iterative methods on the modulus of the second largest eigenvalue. For this reason, the parameter $\alpha$ is often referred to as the damping factor.

There exists an extensive body of scientific literature on the PageRank problem, where the curious reader can find theoretical results on the existence and uniqueness of the solution and excellent algorithms to find it. Research goes on in two directions: improving existing algorithms and developing new models which take into account additional information. This second approach is relatively new, and we will proceed in accordance with it.


2. The higher-order PageRank problem

One of the main limitations of the PageRank problem is the amount of data it can deal with.

Indeed, when constructing the transition matrix R, we only make use of information given by links.

A possibility that has been suggested is to improve the model; among the various options, one of the most natural ways to do so is to raise the order of the underlying Markov chain from first order to a higher one.

Definition. An $m$-th order Markov chain $S$ is a stochastic process that satisfies the relation

$\Pr(S_t = i_1 \mid S_{t-1} = i_2, \ldots, S_1 = i_t) = \Pr(S_t = i_1 \mid S_{t-1} = i_2, \ldots, S_{t-m} = i_{m+1}).$

In words, this means that the future state only depends on the past m states. Although the probability structure of a higher-order Markov chain breaks the fundamental Markov assumption [6], any higher order Markov chain can be reduced to a first order Markov chain by taking a Cartesian product of its state space.

Definition. Let $\mathcal{P}$ be an order-$(m+1)$ transition tensor representing an $m$-th order Markov chain, let $0 \le \alpha < 1$ be a probability, and let $v$ be a positive stochastic vector. Then the higher-order PageRank tensor $X$ is the order-$m$, $n$-dimensional tensor that solves the linear system

(2) $\quad X(i, j, \ldots, l) = \alpha \sum_{k=1}^{n} \mathcal{P}(i, j, \ldots, l, k)\, X(j, \ldots, l, k) + (1 - \alpha)\, v_i \sum_{k=1}^{n} X(j, \ldots, l, k).$

For the second-order case, the latter system is equivalent to

$\mathrm{vec}(X) = \bigl[\alpha P + (1 - \alpha) V\bigr]\, \mathrm{vec}(X),$

where $P$ is the matrix of the reduced (first-order) form of $\mathcal{P}$ and $V = (e^T \otimes I) \otimes v$. The vectorized equation resembles the equation

$x = \bigl[\alpha P + (1 - \alpha) v e^T\bigr]\, x,$

which is a rewriting of 1 (recall that $x$ is stochastic).

This leads us to believe that there exists a relationship between higher-order PageRank and the PageRank described above.


Actually we have this result:

Theorem 1. Consider a higher-order PageRank problem $(\alpha, \mathcal{P}, v)$, where $\mathcal{P}$ is an order-$(m+1)$ tensor. Let $P$ be the matrix of the reduced form of $\mathcal{P}$ and let $M = \alpha P + (1 - \alpha)V$ be the transition matrix for the vector representation of the stationary distribution tensor $X$, where $V$ denotes the matrix $\bigl(e^T \otimes \underbrace{I \otimes \cdots \otimes I}_{m-1}\bigr) \otimes v$. This stationary distribution is equal to the PageRank vector of the PageRank problem

$\Bigl(\, 1 - (1 - \alpha)^m, \;\; \frac{M^m - (1 - \alpha)^m V^m}{1 - (1 - \alpha)^m}, \;\; \underbrace{v \otimes \cdots \otimes v}_{m} \,\Bigr).$

The theorem says that any higher-order PageRank problem can be formulated as an instance of a PageRank problem.

Corollary. The solution of a higher order PageRank problem exists and is

unique.

Moreover the equivalence tells us that all the theory of higher order PageRank can be derived from the theory of PageRank.

The major drawback of this higher-order formulation is its limited scalability: indeed, the memory required to represent just the stationary distribution is already $O(n^2)$ in the second-order case, and becomes infeasible as $n$ grows.

Is there any way to modify the problem to overcome the scalability limit without losing the higher order structure?


3. The Multilinear PageRank problem

Since troubles arise when solving the higher-order PageRank problem, due to the memory occupied by the stationary distribution tensor, Li and Ng proposed a valid alternative for the problem [2]. The idea is to reduce the memory required by replacing the stationary distribution with a rank-1 factorization. For example, in the second-order case the stationary distribution $X$ is replaced with one of the form $xx^T$, so we only need to store the vector $x$. In general, the memory needed to represent $X$ becomes linear in $n$.

Recall that, given a vector $x \in \mathbb{R}^n$, $x^m$ denotes the tensor with $[x^m]_{i_1 \ldots i_m} = x_{i_1} \cdots x_{i_m}$.

Definition. Let $\mathcal{P}$ be an order-$(m+1)$ tensor representing an $m$-th order Markov chain, let $\alpha$ be a probability less than 1 and let $v$ be a positive stochastic vector. Then a multilinear PageRank vector is a nonnegative stochastic solution of the following system of polynomial equations:

(3) $\quad x = \alpha \mathcal{P} x^m + (1 - \alpha) v,$

or equivalently

(4) $\quad x = \alpha R (\underbrace{x \otimes \cdots \otimes x}_{m}) + (1 - \alpha) v,$

where $R$ denotes the flattening of the tensor $\mathcal{P}$ along its first index.

We have just described the Multilinear PageRank problem; the aim of this thesis is to present algorithms for its resolution.
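For concreteness, the following small NumPy sketch (all names are ours) builds the Kronecker power $x \otimes \cdots \otimes x$ and evaluates the right-hand side of equation 4, given the flattening $R$, the vector $v$ and the damping factor $\alpha$.

```python
import numpy as np
from functools import reduce

def kron_power(x, m):
    """Kronecker power x ⊗ ... ⊗ x with m factors (a vector of length n^m)."""
    return reduce(np.kron, [x] * m)

def mlpr_map(R, v, alpha, x):
    """Evaluate alpha * R (x ⊗ ... ⊗ x) + (1 - alpha) * v,
    where R is the n-by-n^m flattening of the transition tensor."""
    n = v.size
    m = int(round(np.log(R.shape[1]) / np.log(n)))  # infer m from the shape of R
    return alpha * (R @ kron_power(x, m)) + (1 - alpha) * v
```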


Let's begin by introducing some results about the existence and uniqueness of the solutions:

Theorem 2. A stochastic solution of the multilinear equation 4 always exists.

Proof. Consider the subset $S = \{\, y \in \mathbb{R}^n \mid y_i \ge 0,\ \|y\|_1 = 1 \,\}$:

• $S$ is closed because it is the intersection of the two closed subsets $A = \{\, y \in \mathbb{R}^n \mid 0 \le y_i \le 1 \,\}$ and $B = \{\, y \in \mathbb{R}^n \mid \|y\|_1 = 1 \,\}$;
• $S$ is bounded because it is a subset of the $n$-ball of radius 1 with respect to the 1-norm;
• $S$ is convex because both defining relations are preserved by convex combinations.

Being a closed and bounded subset of $\mathbb{R}^n$, $S$ is a convex compact set. Let $f : \mathbb{R}^n \to \mathbb{R}^n$ be the continuous function $f(y) = \alpha R(\underbrace{y \otimes \cdots \otimes y}_{m}) + (1 - \alpha)v$. We observe that $f(S) \subset S$: indeed the entries of $f(y)$ are obtained by adding nonnegative terms, and the 1-norm is preserved, since

$e^T f(y) = e^T \bigl[\alpha R(y \otimes \cdots \otimes y) + (1 - \alpha)v\bigr] = \alpha\, e^T R(y \otimes \cdots \otimes y) + (1 - \alpha)\, e^T v = \alpha + 1 - \alpha = 1.$

We conclude by Brouwer's fixed-point theorem [7]. $\square$

Although a multilinear PageRank vector always exists, it may not be unique, as shown in the following example:

Example. Let $\alpha = 0.99$, $v = [0, 1, 0]^T$ and

$R = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1/3 & 0 & 1 \\ 1 & 1 & 1 & 1 & 0 & 1 & 1/3 & 0 & 0 \end{bmatrix}.$

Then both $x = [0, 1, 0]^T$ and $x = [0.1890, 0.3663, 0.4447]^T$ are solutions.
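As a quick numerical sanity check (the second vector is rounded to four digits, so its residual is only small, not zero), one can verify that both vectors satisfy equation 4:

```python
import numpy as np

R = np.array([[0, 0, 0, 0, 0, 0, 1/3, 1, 0],
              [0, 0, 0, 0, 1, 0, 1/3, 0, 1],
              [1, 1, 1, 1, 0, 1, 1/3, 0, 0]])
v = np.array([0.0, 1.0, 0.0])
alpha = 0.99

def residual(x):
    # f(x) = alpha R (x ⊗ x) + (1 - alpha) v - x  (second-order case)
    return alpha * (R @ np.kron(x, x)) + (1 - alpha) * v - x

print(np.linalg.norm(residual(np.array([0.0, 1.0, 0.0])), 1))           # exactly 0
print(np.linalg.norm(residual(np.array([0.1890, 0.3663, 0.4447])), 1))  # small, limited by rounding
```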

One may wonder how the solution of the Multilinear PageRank problem can be defined if there are multiple solutions. We will answer this question later. For now, any multilinear PageRank vector is a solution of the Multilinear PageRank problem.

Concerning uniqueness, the theory does not mirror the case of the standard and higher-order PageRank vectors, where uniqueness holds for every $\alpha < 1$; nevertheless, there exists a range of $\alpha$ in which the solution is unique:

Theorem 3. Let $R \in M(n, n^m, \mathbb{R})$ be a column stochastic matrix and let $v$ be a nonnegative vector. Then the multilinear PageRank equation

(5) $\quad x = \alpha R (\underbrace{x \otimes \cdots \otimes x}_{m}) + (1 - \alpha) v$

has a unique stochastic solution when $0 < \alpha < \frac{1}{m}$.

Proof. We make use of the following fact (a proof can be found in [2]).

Proposition 2. For stochastic vectors $x_1, \ldots, x_m$ and $y_1, \ldots, y_m$, where the size of $x_i$ is the same as the size of $y_i$, we have

$\|x_1 \otimes \cdots \otimes x_m - y_1 \otimes \cdots \otimes y_m\|_1 \le \sum_{i=1}^{m} \|x_i - y_i\|_1.$

Suppose now that there are two distinct stochastic solutions $x, y$ of the multilinear PageRank equation:

$x = \alpha R(x \otimes \cdots \otimes x) + (1 - \alpha)v, \qquad y = \alpha R(y \otimes \cdots \otimes y) + (1 - \alpha)v.$

By subtracting the two equations we get

$x - y = \alpha R(x \otimes \cdots \otimes x - y \otimes \cdots \otimes y).$

From Proposition 2 we obtain

$\|x - y\|_1 = \|\alpha R(x \otimes \cdots \otimes x - y \otimes \cdots \otimes y)\|_1 \le \alpha \|R\|_1 \sum_{i=1}^{m} \|x - y\|_1 = \alpha m \|x - y\|_1 < \|x - y\|_1,$

which is a contradiction. $\square$


CHAPTER 2

Iterative algorithms for the Multilinear PageRank

In this chapter we present six different iterative algorithms to solve the multilinear PageRank problem: five of them were first proposed in [2], and one appears here for the first time:

1) the fixed-point iteration;
2) the shifted fixed-point iteration;
3) the non-linear inner-outer iteration;
4) the inverse iteration, as in the inverse power method;
5) Newton's iteration;
6) the inexact inverse Newton's iteration.

Then we analyse both theoretical aspects and experimental results.

We will show the superiority of Newton's method over the others, especially in the second-order case.


1. The fixed-point iteration

Fixed-point iteration is a well-established method for computing fixed points of iterated functions. In this case the function we consider is $f(x) = \alpha R(x \otimes \cdots \otimes x) + (1 - \alpha)v$. Convergence in the uniqueness regime is guaranteed:

Theorem 4. Let $v$ and $x_0$ be stochastic vectors and let $\alpha < \frac{1}{m}$. The fixed-point iteration

$x_{k+1} = \alpha R(x_k \otimes \cdots \otimes x_k) + (1 - \alpha)v$

always converges to the unique stochastic solution $x$ of the multilinear PageRank problem; moreover, the following inequality holds:

$\|x_k - x\|_1 \le (\alpha m)^k \|x_0 - x\|_1.$

Proof. We have already proved in Theorem 2 that $x_k$ remains stochastic through all iterations. We now show that the iteration is a contraction:

$\|x - x_{k+1}\|_1 = \|\alpha R(x \otimes \cdots \otimes x - x_k \otimes \cdots \otimes x_k)\|_1 \le \alpha m \|x - x_k\|_1.$

In the last inequality we used Proposition 2 and the fact that $R$ is column stochastic. This proves the desired inequality. $\square$

As in the standard PageRank problem, we see that the convergence rate depends strongly on $\alpha$.

Concerning the computational cost, we need to compute just a Kronecker product and a sparse matrix-vector multiplication.
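A minimal dense implementation of this iteration could look as follows; this is a sketch (function name, starting point $x_0 = v$ and stopping rule are our own choices), and a large-scale implementation would exploit the sparsity of $R$ instead of forming the full Kronecker power.

```python
import numpy as np
from functools import reduce

def fixed_point_mlpr(R, v, alpha, m, tol=1e-10, max_iter=10000):
    """x_{k+1} = alpha R (x_k ⊗ ... ⊗ x_k) + (1 - alpha) v, starting from x_0 = v."""
    x = v.copy()
    for _ in range(max_iter):
        x_new = alpha * (R @ reduce(np.kron, [x] * m)) + (1 - alpha) * v
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    return x_new
```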


2. The shifted fixed-point iteration

The second algorithm is again a fixed-point iteration; it differs from the previous one in the iteration function. We first rewrite equation 4 as

$(1 + \gamma)x = \alpha R(x \otimes \cdots \otimes x) + (1 - \alpha)v + \gamma x,$

or alternatively

$x = \frac{\alpha}{1 + \gamma} R(x \otimes \cdots \otimes x) + \frac{1 - \alpha}{1 + \gamma} v + \frac{\gamma}{1 + \gamma} x.$

For obvious reasons $\gamma$ is also called the shift parameter. The resulting iteration is

$x_{k+1} = \frac{\alpha}{1 + \gamma} R(x_k \otimes \cdots \otimes x_k) + \frac{1 - \alpha}{1 + \gamma} v + \frac{\gamma}{1 + \gamma} x_k.$

Even in this case convergence is ensured for $\alpha < \frac{1}{m}$, independently of the shift parameter; indeed it holds [2] that

$\|x_k - x\|_1 \le \left(\frac{\alpha m + \gamma}{1 + \gamma}\right)^k \|x_0 - x\|_1.$

This estimate suggests that $\gamma = 0$ is probably the best choice for the shift parameter; however, some heuristics based on differential equations and some experiments say the opposite.

The computational cost is almost equal to that of the fixed-point iteration.
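For completeness, one step of the shifted iteration can be sketched as follows (the shift $\gamma$ is passed explicitly; names are ours).

```python
import numpy as np
from functools import reduce

def shifted_fixed_point_step(R, v, alpha, gamma, m, x):
    """One step of the shifted iteration given above."""
    xx = reduce(np.kron, [x] * m)
    return (alpha * (R @ xx) + (1 - alpha) * v + gamma * x) / (1 + gamma)
```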

3. The inner-outer iteration

The structure of this algorithm is slightly different from the previous ones. The idea is to create a non-linear iteration scheme where a single iteration involves solving a multilinear PageRank problem in the convergence regime. Here too we rewrite equation 4 in another way:

$\bigl[\alpha R + (1 - \alpha)v e^T\bigr](x \otimes \cdots \otimes x) = x,$

$\frac{\alpha}{m} \bigl[\alpha R + (1 - \alpha)v e^T\bigr](x \otimes \cdots \otimes x) + \Bigl(1 - \frac{\alpha}{m}\Bigr)x = x.$

From here the non-linear iteration emerges:

$x_{k+1} = \frac{\alpha}{m}\bigl[\alpha R + (1 - \alpha)ve^T\bigr](x_{k+1} \otimes \cdots \otimes x_{k+1}) + \Bigl(1 - \frac{\alpha}{m}\Bigr)x_k.$

The latter equation is a multilinear PageRank problem with new parameters: $\alpha R + (1 - \alpha)ve^T$ acts as $R$, $x_k$ as $v$, and $\frac{\alpha}{m}$ as the damping factor. In particular, we get a damping factor which turns out to be less than $\frac{1}{m}$: Theorem 3 then ensures uniqueness of the solution, and iterative methods are very fast when the solution is unique.

Not surprisingly, this method too converges when $\alpha < \frac{1}{m}$ [2]. This algorithm is far more expensive than the previous ones in terms of computational effort, since every iteration requires the solution of a multilinear PageRank problem.
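The scheme can be sketched as follows, reusing the fixed_point_mlpr routine from the previous section as the inner solver; this choice of inner solver and the stopping rule are our own assumptions, not prescriptions of the thesis.

```python
import numpy as np

def inner_outer_mlpr(R, v, alpha, m, tol=1e-10, max_outer=1000):
    """Each outer step solves a multilinear PageRank problem with damping
    alpha/m, stochastic matrix alpha*R + (1-alpha)*v*e^T and vector x_k."""
    Rt = alpha * R + (1 - alpha) * np.outer(v, np.ones(R.shape[1]))
    x = v.copy()
    for _ in range(max_outer):
        x_new = fixed_point_mlpr(Rt, x, alpha / m, m)  # inner solve
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    return x_new
```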


4. The inverse iteration

For this further iteration we make use of the following equivalence:

$R(\underbrace{x \otimes \cdots \otimes x}_{m}) = \frac{R}{m}\Bigl(\sum_{i=1}^{m} \underbrace{x \otimes \cdots \otimes x}_{i-1} \otimes\, I \otimes \underbrace{x \otimes \cdots \otimes x}_{m-i}\Bigr)x =: S(x)\,x,$

where the definition of $S$ is straightforward. We rewrite equation 4 as

$\alpha S(x)x + (1 - \alpha)v = x.$

A new iteration arises:

$x_{k+1} = \alpha S(x_k)x_{k+1} + (1 - \alpha)v.$

The latter equation is the so called inverse iteration, and it consists of a standard PageRank problem where $S(x_k)$ acts as the transition matrix. It turns out that the multilinear PageRank vector is the PageRank vector of a solution-dependent Markov process.

Theoretical convergence is ensured for $0 \le \alpha < \frac{1}{m}$, and in this case there is an estimate on the convergence rate similar to the previous ones [2]. As regards the computational cost, for each iteration we need to compute the matrix $S(x_k)$, which requires $m$ Kronecker products, and to solve a linear system of size $n$.
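A dense sketch of the inverse iteration follows; the helper S_matrix assembles $S(x)$ explicitly, which is only reasonable for small $n$ (names and defaults are ours).

```python
import numpy as np
from functools import reduce

def S_matrix(R, x, m):
    """S(x) = (1/m) R * sum_i  x ⊗ ... ⊗ I ⊗ ... ⊗ x  (identity in slot i)."""
    n = x.size
    col = x.reshape(n, 1)
    total = np.zeros((n ** m, n))
    for i in range(m):
        total += reduce(np.kron, [col] * i + [np.eye(n)] + [col] * (m - 1 - i))
    return (R @ total) / m

def inverse_iteration_mlpr(R, v, alpha, m, tol=1e-10, max_iter=1000):
    """Solve the linear system (I - alpha S(x_k)) x_{k+1} = (1 - alpha) v at each step."""
    n = v.size
    x = v.copy()
    for _ in range(max_iter):
        x_new = np.linalg.solve(np.eye(n) - alpha * S_matrix(R, x, m), (1 - alpha) * v)
        if np.linalg.norm(x_new - x, 1) < tol:
            break
        x = x_new
    return x_new
```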


5. Newton’s method

We finally describe Newton's method for solving the multilinear PageRank problem. In the literature, Newton's method is one of the most widely used algorithms for solving nonlinear equations and systems. As with many other iterative algorithms, it may not converge; but when it does, it converges faster than almost any other alternative, except in very specific cases. The broad use we will make of Newton's iteration makes an explanation necessary.

Let the system of nonlinear equations be described by a function $f : \mathbb{R}^n \to \mathbb{R}^n$ and let $J_f$ denote its Jacobian matrix.

Algorithm 1. Standard Newton's method.
Input: ε, M (desired tolerance, maximum number of iterations); x_0 ∈ R^n_+ (initial point).
repeat
    x_{k+1} = x_k − (J_f(x_k))^{-1} f(x_k)    [Newton's iteration]
    if iterations > M: failure
    else if ‖x_{k+1} − x_k‖ < ε: stop    [stopping criterion]
output x = x_k

We see that, in Newton's iteration, the term $(J_f(x_k))^{-1}$ appears. The computation of this matrix inverse would be too expensive for our purposes; to overcome this drawback we exploit the equality

(6) $\quad J_f(x_k)(x_{k+1} - x_k) = -f(x_k).$

The difference between the two formulations is that in equation 6, to obtain $x_{k+1}$, we only need to solve the system for $p_k = x_{k+1} - x_k$, without ever computing the inverse of the Jacobian matrix, and then add $p_k$ and $x_k$ together.

This idea is frequently used with Newton's iteration; nevertheless, we have been able to provide an efficient way to approximate the inverse of the Jacobian sufficiently well. We will discuss it later.


To implement the change in the algorithm we can modify only the iteration step, preserving the overall structure:

Algorithm 2. Newton's method without the inverse of the Jacobian.
Input: ε, M (desired tolerance, maximum number of iterations); x_0, x_1 ∈ R^n (initial points), p_0 = x_1 − x_0.
repeat
    solve J_f(x_k) p_k = −f(x_k)
    x_{k+1} = x_k + p_k    [Newton's iteration]
    if iterations > M: failure
    else if ‖x_{k+1} − x_k‖ < ε: stop    [stopping criterion]
output x = x_k

Getting back to the matter at hand, Newton's method is applied to the function $f(x) = \alpha R(x \otimes \cdots \otimes x) + (1 - \alpha)v - x$. In this case

$J_f(x) = \alpha R \Bigl(\sum_{i=1}^{m} \underbrace{x \otimes \cdots \otimes x}_{i-1} \otimes\, I \otimes \underbrace{x \otimes \cdots \otimes x}_{m-i}\Bigr) - I.$

For convenience, from now on, the sum above will be denoted by $\partial_x(x \otimes \cdots \otimes x)$.

Since the invertibility of the Jacobian is essential for the functioning of the algorithm, it is important to analyse the structure of $J_f(x)$.

Proposition 3. The matrix $-J_f(x) = I - \alpha R\, \partial_x(x \otimes \cdots \otimes x)$ is a non-singular M-matrix for all $\alpha \in (0, 1/m)$ and all nonnegative vectors $x$ with $e^T x \le 1$.

Proof. First recall that a matrix $A$ is an M-matrix if it can be expressed in the form $A = sI - B$, where $B = (b_{ij})$ with $b_{ij} \ge 0$ for all $1 \le i, j \le n$ and $\rho(B) \le s$; it is non-singular when $\rho(B) < s$. Then recall that every induced norm satisfies $\rho(B) \le \|B\|$. In particular,

$\rho\bigl(\alpha R\, \partial_x(x \otimes \cdots \otimes x)\bigr) \le \|\alpha R\, \partial_x(x \otimes \cdots \otimes x)\|_1 = \alpha m (e^T x)^{m-1} < 1.$

The desired splitting is given by $s = 1$ and $B = \alpha R\, \partial_x(x \otimes \cdots \otimes x)$. $\square$

The latter fact ensures that, if the hypotheses above apply, Newton's method can be run without worrying about the singularity of the Jacobian. Next we present and prove a theorem for all $m > 1$; the version with $m = 2$ has already been proved in [2].

Theorem 5. Let $v$ be a stochastic vector and let $0 < \alpha < \frac{1}{m}$. Then Newton's iteration with $x_0 = 0$,

$x_{k+1} = x_k - J_f(x_k)^{-1} f(x_k),$

always converges to the unique stochastic solution $x$ of the multilinear PageRank problem.

Proof. From now on let $z_k = e^T x_k$ and $p_k = x_{k+1} - x_k$. First we prove that the iteration is well defined. It is easy to do so if we establish that

$f(x_k) \ge 0, \qquad x_k \ge 0, \qquad z_k \le 1.$

We proceed by complete induction on the inequalities above. The base case trivially holds. For the inductive step, consider the equation $p_k = -J_f(x_k)^{-1} f(x_k)$; by Proposition 3 we know that $-J_f(x_k)$ is a non-singular M-matrix, which implies that its inverse is nonnegative and consequently that $p_k \ge 0$. From $p_k \ge 0$ we have $x_{k+1} = x_k + p_k \ge x_k \ge 0$. Then we have

$f(x_{k+1}) = f(x_k + p_k) = f(x_k) + \alpha R\bigl[(x_k + p_k)^{\otimes m} - x_k^{\otimes m}\bigr] - p_k \ge f(x_k) + \alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\, p_k - p_k = f(x_k) + J_f(x_k)p_k = 0,$

where the inequality holds entrywise because $(x_k + p_k)^{\otimes m} - x_k^{\otimes m} \ge \partial_x(x_k \otimes \cdots \otimes x_k)\, p_k$: the remaining terms of the expansion contain at least two factors $p_k$ and are all nonnegative.

It remains to show that $z_{k+1} \le 1$. Multiplying the Newton equation on the left by $e^T$ we get

$e^T J_f(x_k)(x_{k+1} - x_k) = -e^T f(x_k),$

that is,

$(1 - \alpha m z_k^{m-1})(z_{k+1} - z_k) = \alpha z_k^m + (1 - \alpha) - z_k,$

$z_{k+1} = \frac{\alpha z_k^m + (1 - \alpha) - z_k}{1 - \alpha m z_k^{m-1}} + z_k = \frac{(1 - m)\alpha z_k^m + (1 - \alpha)}{1 - \alpha m z_k^{m-1}}.$

We want to show that $z_{k+1} > 1$ implies $z_k > 1$. Indeed, suppose $z_{k+1} > 1$ and $z_k \le 1$; since the denominator $1 - \alpha m z_k^{m-1}$ is positive, the condition $z_{k+1} > 1$ becomes

$m z_k^{m-1} + (1 - m) z_k^m - 1 > 0,$

and a quick computation shows that this forces $z_k > 1$, a contradiction.

So the iteration is well defined and moreover $x_{k+1} \ge x_k$. The latter inequality, together with $z_k \le 1$, tells us that the sequence converges to a nonnegative solution (monotonicity and boundedness). To conclude, it now suffices to note that under our hypotheses the unique nonnegative solution $x$ of $f(x) = 0$ with $e^T x \le 1$ is the stochastic one. A proof of this last fact can be found in Chapter 4. $\square$

We have established that the algorithm converges; what about the convergence rate?

Theorem 6. Let $v$ be a stochastic vector and let $\alpha < \frac{1}{m}$. Then Newton's iteration converges with order 2 to the stochastic solution.

Proof. We have already proved that the sequence $x_k$ converges; to estimate the rate we derive two relations. First,

$f(x_{k+1}) = \alpha R(x_{k+1} \otimes \cdots \otimes x_{k+1}) + (1 - \alpha)v - x_{k+1} \le f(x_k) + J_f(x_{k+1})(x_{k+1} - x_k),$

where the inequality holds entrywise because $x_{k+1}^{\otimes m} - x_k^{\otimes m} \le \partial_x(x_{k+1} \otimes \cdots \otimes x_{k+1})(x_{k+1} - x_k)$ for $0 \le x_k \le x_{k+1}$. Multiplying on the left by $e^T$ and writing $f_k := e^T f(x_k)$, we get

(7) $\quad f_{k+1} \le f_k + (\alpha m z_{k+1}^{m-1} - 1)(z_{k+1} - z_k).$

Secondly, consider the equation $f(x_k) = -J_f(x_k)(x_{k+1} - x_k)$, which defines Newton's step. Multiplying again by $e^T$ we obtain

(8) $\quad f_k = (1 - \alpha m z_k^{m-1})(z_{k+1} - z_k).$

The same equation holds if we shift the index:

(9) $\quad f_{k+1} = (1 - \alpha m z_{k+1}^{m-1})(z_{k+2} - z_{k+1}).$

Putting relations 7, 8 and 9 together,

$(1 - \alpha m z_{k+1}^{m-1})(z_{k+2} - z_{k+1}) \le (1 - \alpha m z_k^{m-1})(z_{k+1} - z_k) + (\alpha m z_{k+1}^{m-1} - 1)(z_{k+1} - z_k) = \alpha m (z_{k+1} - z_k)^2 \sum_{i=0}^{m-2} z_k^i z_{k+1}^{m-2-i},$

hence

$z_{k+2} - z_{k+1} \le \frac{\alpha m (m-1)}{1 - \alpha m z_{k+1}^{m-1}} (z_{k+1} - z_k)^2 \le \frac{\alpha m (m-1)}{1 - \alpha m} (z_{k+1} - z_k)^2 = C_0 (z_{k+1} - z_k)^2.$

Since $z_k \le 1$, there is an index $k_0$ from which the convergence becomes quadratic; moreover, since $p_k \ge 0$, we have $\|x_{k+1} - x_k\|_1 = z_{k+1} - z_k$, and looking back at equation 8 we conclude. $\square$

Results on Newton's convergence rate, such as the Kantorovich theorem [8], suggest that when convergence occurs it is usually quadratic. Effectively, retracing the proof above, we note that if the sequence converges then the hypothesis $\alpha < \frac{1}{m}$ is superfluous. However, when $\alpha > \frac{1}{m}$ the algorithm may converge to a solution which is not stochastic, or not converge at all.

So, if on the one hand Newton's method has a better convergence behaviour, on the other hand solving a linear system at every step is quite expensive. For this reason we propose a modified Newton's iteration for solving the multilinear PageRank problem where there is no need to solve any system.


6. The inexact inverse Newton’s method

The idea behind this new algorithm is that, though computing the inverse of the Jacobian may be difficult for a generic nonlinear system, in our case the exact form of the inverse is known. Indeed, for a matrix $B$ of the form $I - A$ with $\rho(A) < 1$, the inverse is given by the series

$B^{-1} = \sum_{k=0}^{\infty} A^k.$

Truncating this series at the $d$-th term we obtain an approximation of the inverse; this is exactly what we are going to do:

Algorithm 3. Newton's method with inexact inverse Jacobian.
Input: ε, M, d (desired tolerance, maximum number of iterations, Taylor degree); x_0 ∈ R^n (initial point).
repeat
    x_{k+1} = x_k + [ Σ_{i=0}^{d} (α R ∂_x(x_k ⊗ ⋯ ⊗ x_k))^i ] f(x_k)    [Newton's iteration]
    if iterations > M: failure
    else if ‖f(x_{k+1})‖ < ε: stop    [stopping criterion]
output x = x_{k+1}
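One step of Algorithm 3 can be sketched as follows; it reuses the dkron helper from the Newton sketch above, and for readability the small matrix A is formed explicitly, whereas a large-scale version would only apply it to vectors.

```python
import numpy as np
from functools import reduce

def inexact_newton_step(R, v, alpha, m, x, d):
    """x_{k+1} = x_k + sum_{i=0}^{d} (alpha R ∂x(x^{⊗m}))^i f(x_k)."""
    A = alpha * (R @ dkron(x, m))                    # alpha R ∂x(x ⊗ ... ⊗ x)
    f = alpha * (R @ reduce(np.kron, [x] * m)) + (1 - alpha) * v - x
    s, term = f.copy(), f.copy()
    for _ in range(d):
        term = A @ term                              # accumulate A^i f
        s += term
    return x + s
```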

Theorem 7. Let $v$ be a stochastic vector and let $\alpha \le \frac{1}{m}$. Newton's method with inexact inverse Jacobian and Taylor degree $d$, starting from $x_0 = 0$, converges to the unique stochastic solution $x$ of the multilinear PageRank problem.

Proof. To prove this theorem we make use of Theorem 11; in particular, what is needed in this proof is that the minimal nonnegative solution of equation 4, for $\alpha$ in the interval $(0, \frac{1}{m})$, is the stochastic one.

The idea is to show that the sequence $\{x_k\}$ is increasing and bounded above by the stochastic solution $x$. If we show this we are done, because the unique nonnegative solution less than or equal to $x$ is $x$ itself.

We will prove it by induction. First notice that $f(x_0) = (1 - \alpha)v \ge 0$ and

$x_1 = x_0 + \sum_{i=0}^{d} \bigl(\alpha R\, \partial_x(x_0 \otimes \cdots \otimes x_0)\bigr)^i f(x_0) = x_0 + f(x_0) = (1 - \alpha)v \ge x_0.$

Then consider the term $x_{k+1}$; it is given by

$x_{k+1} = x_k + \sum_{i=0}^{d} \bigl(\alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr)^i f(x_k).$


The base case has already been proved.

We need to show that if $f(x_k) \ge 0$ then $f(x_{k+1}) \ge 0$. Let $p_k$ denote the difference $x_{k+1} - x_k$; we know that $p_k \ge 0$. From the Taylor expansion we obtain

(10) $\quad f(x_{k+1}) = f(x_k + p_k) = f(x_k) + J_f(x_k)p_k + g(x_k, p_k).$

Here $g(x_k, p_k)$ denotes the remainder of the Taylor expansion, and it can be shown to be nonnegative. Since it is tedious to prove this fact, we omit the proof here. From 10 it follows that

$f(x_{k+1}) \ge f(x_k) + J_f(x_k)p_k = f(x_k) - \bigl(I - \alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr) \sum_{i=0}^{d} \bigl(\alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr)^i f(x_k) = \bigl(\alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr)^{d+1} f(x_k) \ge 0.$

But we have

$x_{k+1} = x_k + \sum_{i=0}^{d} \bigl(\alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr)^i f(x_k) \le x_k + \sum_{i=0}^{\infty} \bigl(\alpha R\, \partial_x(x_k \otimes \cdots \otimes x_k)\bigr)^i f(x_k) = x_k - J_f(x_k)^{-1} f(x_k).$

In particular $x_{k+1} \le x$. $\square$

We want to emphasize that this algorithm is almost equivalent to Newton's method; nevertheless, its computational cost is lower, since in Newton's method we need to solve a linear system at every iteration, while in the inexact inverse Newton's method every iteration requires only the computation of a few matrix-vector products.

Since the convergence rate of this algorithm is almost the same as Newton's and since it requires a smaller effort, it should be the best iterative algorithm for solving the Multilinear PageRank problem, at least when convergence is ensured. Unfortunately, as we will see in the last chapter, when $\alpha \ge \frac{1}{m}$ this algorithm converges much less often than Newton's iteration.


We have already proved that a stochastic solution of equation 4 exists, although it may not be unique.

However, even when the stochastic solution is unique, Newton’s iteration may converge to a solution which is not stochastic.

In the next chapter we will see that there is at least one nonnegative solution other than the stochastic one. To prevent Newton's method from converging to a non-stochastic vector, an interesting expedient is to use an explicit stochastic normalization:

Algorithm 4. Normalization enforced Newton's method.
Input: ε, M (desired tolerance, maximum number of iterations); x_0 ∈ R^n_+ (initial point).
repeat
    x_{k+1} = x_k − (J_f(x_k))^{-1} f(x_k)    [Newton's iteration]
    x_{k+1} = max(x_{k+1}, 0) / (e^T max(x_{k+1}, 0))    [normalization step]
    if iterations > M: failure
    else if ‖x_{k+1} − x_k‖ < ε: stop    [stopping criterion]
output x = x_k

To simplify the notation, the analysis we are going to carry out will focus on the second-order case.


CHAPTER 3

Numerical homotopy continuation

This chapter serves as an introduction to numerical continuation methods; in particular, we focus our attention on numerical homotopy techniques. These tools are used to solve systems of nonlinear equations. In this approach the idea is to find the solution by tracking the solutions of "nearby" systems of equations.

Basically, we approximate the solution of the system by continuity: starting from the solution of a system which is easier to solve, we move towards the solution of the initial system.

After presenting the basic principles of continuation methods we will introduce predictor-corrector methods.


1. Numerical continuation methods

We want to solve a system of nonlinear equations, say $F(x) = 0$, where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a smooth function. To do so, we first construct a smooth function $H : \mathbb{R}^{n+1} \to \mathbb{R}^n$ such that $H(1, x) = F(x)$. In practice, $H(\lambda, x) = 0$ represents a parametrized system of equations which, at $\lambda = 1$, is equivalent to $F(x) = 0$.

Then we look for a value $\lambda_0$ for which solving the system $H(\lambda_0, x) = 0$ is easy. Of course, we construct the function $H$ so that such a value is known. Without loss of generality assume $\lambda_0 = 0$.

At this point we proceed iteratively: we solve $H(0, x) = 0$, getting the set of solutions $S_0$; we choose a step size $\delta_0$ and, exploiting the information on the set $S_0$, we solve $H(\delta_0, x) = 0$; in the same way we solve the sequence of systems $H(\delta_k, x) = 0$ with $\delta_0 < \delta_1 < \delta_2 < \cdots < \delta_n = 1$.

Usually the function $H$ is defined starting from a function $G(x)$ in the following way: $H(\lambda, x) = (1 - \lambda)G(x) + \lambda F(x)$, which is nothing more than a homotopy between the functions $G(x)$ and $F(x)$. Of course, choosing the right $G(x)$ is important: specific start systems that closely mirror the structure of $F(x)$ may be formed for particular problems. The choice of the starting system impacts the computational time it takes to solve $F(x)$: start systems that are easy to formulate tend to produce a higher number of solution paths to track, while formulations requiring more effort decrease the number of paths. There is currently no good way to predict which choice will lead to the quickest solve.

Continuation methods base their functioning on the presence of implicitly defined curves of solution points. This assumption is verified in many cases; indeed the following theorem, a corollary of the implicit function theorem, holds:

Theorem 8. Let $f : \mathbb{R}^{n+1} \to \mathbb{R}^n$ be a smooth function and let $y \in \mathbb{R}^n$ be a regular value of $f$. Then $f^{-1}(y)$ is a union of 1-manifolds (curves).

Recall that a regular value of a function $f : X \to Y$ is a $y \in Y$ such that for all $x \in f^{-1}(y)$ the Jacobian $J_f(x)$ is surjective. If 0 is a regular value, Theorem 8 applies and the solution set $S$ is a union of curves.

Figure 1. Some possible implicit solution curves are depicted.

We immediately notice that some branches starting from $\lambda = 0$ do not reach the line $\lambda = 1$; those paths are useless for us.

Of course, many other problems could arise:

• Does there always exist, for every zero of $F(x)$, a curve that reaches it?
• If such a curve exists, is it smooth?
• When is it assured that it will intersect the target homotopy level $\lambda = 1$ within a finite length?
• How can we numerically trace the curve?

The answers depend on the specific problem we are investigating. We answer the last question for generic problems by explaining predictor-corrector methods, while in our case we answer all of the questions.


2. Predictor-Corrector methods

The idea in Predictor-Corrector (PC) methods is to numerically trace the curves of solution points described above.

In what follows we assume that the starting system $G(x) = H(0, x) = 0$ has already been solved and that there exists a smooth curve linking $\lambda = 0$ and $\lambda = 1$. To describe the method we will choose a curve $c$ with these properties.

We want to find a sequence of points $u_i = (\lambda_i, x_i) \in \mathbb{R} \times \mathbb{R}^n$ satisfying a chosen tolerance criterion $\|H(\lambda_i, x_i)\| < \varepsilon$ for some $\varepsilon > 0$. For $\varepsilon$ sufficiently small there is a unique point $c_i$ and a unique parameter value $s_i$ such that $c_i = c(s_i)$ is the point of the curve nearest to $u_i$ in the Euclidean norm.

The pattern is now straightforward. We suppose that a regular starting point $u_0 = c(0) = c_0$ is given; then, using the fact that $u_k$ is known, we derive a point $u_{k+1}$, which will not necessarily be on the curve, in accordance with some predictor algorithm (predictor step); once $u_{k+1}$ has been found, we apply a corrector algorithm to find a point $c_{k+1}$ on the curve (corrector step). The method continues until we get an intersection with the line $\lambda = 1$.


Predictor step. The first of the two steps that compose each iteration of the method is the so called prediction step: function values and, optionally, derivative values at a preceding set of points are used to extrapolate (predict) the value at a subsequent point.

There are many methods that can be used to predict a new point, depending on the particular problem one is facing. We mention just $n$-th order Taylor expansion (the usual orders are one or two), the Euler method and Runge-Kutta methods, but the literature is full of different methods [9].

In our experimentation we used the simplest and computationally cheapest predictor algorithm: linear extrapolation from the previous two values. The choice is motivated by two reasons: first, we avoid complicating the understanding of the algorithm; secondly, even if we will not focus too much on large-scale problems in this discussion, the final aim is to apply the algorithm to this kind of problem.

Mathematically, suppose that we have already traced two consecutive points $c_{k-1}$ and $c_k$ of $c(s)$ and that we want to predict the point $u_{k+1}$; the formula for the extrapolation step is given by

(11) $\quad u_{k+1} = c_k + h_k (c_k - c_{k-1}).$

The number $h_k$ expresses the step length; it can be constant or not, and if it depends on $k$ we talk about an adaptive stepsize method.


Corrector step. The corrector step refines the approximate point $u_{k+1}$ extrapolated in the predictor step to obtain a new point $c_{k+1}$ on the curve $c$. The correction can be carried out in several ways: fixed-point iteration, Newton's iteration or specifically designed algorithms.

Newton's method, possibly with slight changes, is often the best choice for the corrector. This is because the solution set $H^{-1}(0)$ has powerful contractive properties with respect to iterative methods such as Newton's method.

Figure 4. Corrector step.

Concerning the possible choices for $c_k$, it is preferable to choose a vector that solves

(12) $\quad \|c_k - u_k\| = \min_{H(c) = 0} \|c - u_k\|.$

However, depending on the case, one may look for other solution points. For instance, we can use Newton's method as a corrector, fixing the $\lambda$-coordinate and solving $H_\lambda(x) = H(\lambda, x) = 0$ for $x$:

$x_{k+1} = x_k - J_{H_\lambda}(x_k)^{-1} H_\lambda(x_k).$

To conclude this brief introduction to PC methods, we give a pseudo-algorithm describing the structure of these algorithms:

Algorithm 5. Generic Predictor-Corrector method.
Input: ε, M (desired tolerance, maximum number of iterations); u_0 ∈ R^{n+1}_+ (initial point); h > 0 (initial steplength).
repeat
    predict a point u_{k+1} such that H(u_{k+1}) ≈ 0 and ‖u_{k+1} − c_k‖ ≈ h    [predictor step]
    approximate c_{k+1} that solves min_c { ‖u_{k+1} − c‖ : H(c) = 0 }    [corrector step]
    choose a new steplength h    [steplength adaptation]
    if iterations > M: failure
    else: x = c_{k+1}
until traversing is stopped (intersection with the line λ = 1 found)
output x


CHAPTER 4

Continuation algorithms for the Multilinear PageRank

In this chapter we show a new approach to the Multilinear PageRank problem inspired by ideas of homotopy continuation.

As we shall see, equation 4 itself gives rise to a homotopy.

We investigated the properties of this naturally defined homotopy, discovering many positive aspects, which we present below.

Then we propose a first class of algorithms which combine the algorithms seen in Chapter 2 and the predictor-corrector method.

Even though these algorithms performed well in the vast majority of cases, we have been able to construct specific examples where convergence is slow or even absent.

Finally, we present a completely new algorithm for solving the multilinear PageRank problem, which turned out to be the best in many respects, achieving a 100% success rate in our experiments.


1. Natural defined homotopy

Having a closer look at equation 4, the similarity between the homotopy parameter $\lambda$ and the damping factor $\alpha$ is evident. Moreover, we observed that for low values of $\alpha$ solving the system is easier. These two facts lead us to consider the homotopy defined by 4:

$H(x, \lambda) = \lambda R(x \otimes \cdots \otimes x) + (1 - \lambda)v - x = 0.$

To avoid confusion we will adopt the notation with $\alpha$ as the homotopy parameter instead of $\lambda$.

We have already said that we need algorithms capable of solving the multilinear PageRank problem for values $\alpha = \alpha^*$ of the damping factor close to 1. So the endpoint of the homotopy will be set to $\alpha^*$; how do we choose the starting value?

From Theorem 3 we know that the stochastic solution is unique in the interval $(0, \frac{1}{m})$ and that almost every previous algorithm converges to it. To tell the truth, in a recent paper [10] Fasino and Tudisco provided new explicit formulas for the interval of uniqueness, which turns out to be even larger than just $(0, \frac{1}{m})$; however, convergence in this larger interval is proven only for the fixed-point iteration. Moreover, the smaller the value of $\alpha$, the faster the algorithms converge. These theoretical results suggest that a valid choice for the starting value would be between 0 and $\frac{1}{m}$; however, starting with a small value of $\alpha$, the computational effort needed to reach $\alpha^*$ would be higher. Experimentally we showed that starting before $\frac{1}{m}$ is unnecessary.

We also proved two important facts:

Theorem 9. Let $v$ be a strictly positive vector and let $0 < \alpha < 1$. Any nonnegative solution of equation 4 is strictly positive.

Proof. We prove it by contradiction. Let $x$ be a solution of system 4 and suppose that $x_i = 0$. The $i$-th equation of the system would be

$\alpha R_i(x \otimes \cdots \otimes x) + (1 - \alpha)v_i = 0,$

where $R_i$ denotes the $i$-th row of $R$.

We deduce that

$\alpha R_i(x \otimes \cdots \otimes x) = -(1 - \alpha)v_i < 0.$

But the left-hand side is nonnegative by hypothesis, a contradiction. $\square$

Corollary. If $v$ is strictly positive and $0 < \alpha < 1$, every stochastic solution of equation 4 is strictly positive.

Theorem 10. Let $v$ be a strictly positive vector and let $0 < \alpha < 1$. For every stochastic solution $x$ of equation 4 we have $x_i < 1$.

Proof. We prove it by contradiction. Let $x$ be a stochastic solution of system 4 and suppose that $x_i = 1$; being $x$ stochastic, it follows that $x = e_i$. Substituting into equation 4 we get

$\alpha R(e_i \otimes \cdots \otimes e_i) + (1 - \alpha)v = e_i.$

But every entry of the left-hand side is positive, because it is the sum of a nonnegative vector and one with strictly positive entries, while $e_i$ has some zero entries, a contradiction. $\square$

This latter fact, together with the boundedness of the solutions with respect to the 1-norm, ensures that the solution curve we follow is included in the compact set

$K = \{\, (x, \alpha) \in \mathbb{R}^{n+1} \mid \|x\|_1 \le 1,\ 0 \le \alpha \le 1 \,\}.$

With regard to nonnegative solutions, we can say another important thing:

Theorem 11. Let $x \in \mathbb{R}^n$ be a nonnegative solution of equation 4 with $\alpha \in (0, 1)$. Then $e^T x = 1$ or $e^T x = c_\alpha$, where

$c_\alpha > 1$ for $\alpha < \tfrac{1}{m}$, $\qquad c_\alpha = 1$ for $\alpha = \tfrac{1}{m}$, $\qquad c_\alpha < 1$ for $\alpha > \tfrac{1}{m}$.

Proof. Let $x$ be a nonnegative solution of equation 4 and let $z = e^T x$. Multiplying the equation on the left by $e^T$, in accordance with the previous notation, we get

$g_\alpha(z) := \alpha z^m + (1 - \alpha) - z = 0.$

We study the maxima and minima of $g_\alpha(z)$ for $z > 0$:

$g_\alpha'(z) = \alpha m z^{m-1} - 1, \qquad g_\alpha'(z) = 0 \iff z_0 = \sqrt[m-1]{\frac{1}{\alpha m}}.$

A quick but omitted computation shows that $z_0$ is a local minimum and $g_\alpha(z_0) < 0$ (unless $\alpha = \frac{1}{m}$, in which case $z_0 = 1$ and $g_\alpha(z_0) = 0$). In addition, $g_\alpha(0) = 1 - \alpha > 0$, $\lim_{z \to +\infty} g_\alpha(z) = +\infty$ and $g_\alpha(1) = 0$. It follows that $g_\alpha$ has a zero on each side of $z_0$; one of them is 1 and we denote the other by $c_\alpha$, so that $c_\alpha > 1$ when $z_0 > 1$ (that is, $\alpha < \frac{1}{m}$) and $c_\alpha < 1$ when $z_0 < 1$ (that is, $\alpha > \frac{1}{m}$). Since the function has only one critical point, its zeros are exactly two, counted with multiplicity. $\square$

We showed that the function $g_\alpha(z)$ vanishes at $c_\alpha$ and at 1, and we know that a stochastic solution always exists, corresponding to the value 1. Does the same hold for the value $c_\alpha$?

Theorem 12. For all values of $\alpha \in (0, 1)$ there exists a vector $y \in \mathbb{R}^n$ that solves equation 4 and for which $e^T y = c_\alpha$.

Proof. The proof substantially retraces the proof of Theorem 2. Consider the set $S_\alpha = \{\, y \in \mathbb{R}^n \mid y_i \ge 0,\ \|y\|_1 = c_\alpha \,\}$. It is possible to show that the set $S_\alpha \subset \mathbb{R}^n$ is convex and compact, in the same way we did in Theorem 2. The key part is showing that $f(S_\alpha) \subset S_\alpha$, where $f(y) = \alpha R(y \otimes \cdots \otimes y) + (1 - \alpha)v$:

$e^T f(y) = e^T \bigl[\alpha R(y \otimes \cdots \otimes y) + (1 - \alpha)v\bigr] = \alpha\, e^T R(y \otimes \cdots \otimes y) + (1 - \alpha)\, e^T v = \alpha c_\alpha^m + 1 - \alpha = c_\alpha = e^T y,$

where in the last equality we used the definition of $c_\alpha$ (namely $g_\alpha(c_\alpha) = 0$). Brouwer's fixed-point theorem concludes the proof. $\square$


We have almost all the ingredients needed to describe a generic path of the homotopy we just analysed. Before doing so, we describe the generic nonnegative solutions of 4.

Theorem 13. There always exists a minimal nonnegative solution of equation 4, and this solution is stochastic for $\alpha \le \frac{1}{m}$.

Proof. Consider the iteration $x_{k+1} = \alpha R(x_k \otimes \cdots \otimes x_k) + (1 - \alpha)v$ starting from $x_0 = 0$. We find $x_1 = (1 - \alpha)v \ge 0 = x_0$. Now we show by induction that $x_{k+1} \ge x_k$:

$x_{k+1} - x_k = \alpha R(x_k \otimes \cdots \otimes x_k - x_{k-1} \otimes \cdots \otimes x_{k-1}) \ge 0.$

So the sequence $\{x_k\}_{k \in \mathbb{N}}$ is weakly increasing; if we show that it is also bounded, then it converges to a certain limit. To prove that the sequence is bounded, we show that it is always less than or equal to any nonnegative solution $y$ of equation 4, which, by Theorem 12, we know exists. We proceed again by induction; the induction hypothesis is now $x_k \le y$. Of course we have $x_0 \le y$, so it remains to show that if $x_k \le y$ the same holds for $x_{k+1}$:

$y - x_{k+1} = \alpha R(y \otimes \cdots \otimes y - x_k \otimes \cdots \otimes x_k) \ge 0.$

Now, given that the sequence is increasing, nonnegative and bounded above by any nonnegative solution of 4, it converges to a vector $z$ by the monotone convergence theorem; by continuity, $z$ is itself a nonnegative solution, and by construction it is smaller than or equal to any other nonnegative solution. By Theorem 11 we know that $e^T z$ must be either $c_\alpha$ or 1; but since the sequence is smaller than any other solution,

$e^T z = \min(c_\alpha, 1).$

In particular, for $\alpha \le \frac{1}{m}$ we have $\min(c_\alpha, 1) = 1$, hence the minimal nonnegative solution is stochastic. $\square$

Figure 1. Sum of the entries of the minimal solution (thick red line) and of the stochastic ones (dashed red line), as a function of $\alpha$.

So far we have described many interesting properties of the solutions, but for a proper functioning of the algorithm it is necessary that the solution curve starting from $\alpha = 0$ reaches the target value $\alpha^*$. We always found such a curve in our experiments, and we have not been able to find a counterexample yet.

Let us say that, if the Jacobian of equation 4 is surjective at every point of the curve, then, in accordance with the statement of Theorem 8, a regular curve starting from $\alpha = 0$ exists. It is also easy to prove that between 0 and $\frac{1}{m}$ we are under the hypotheses of Theorem 8, hence the presence of a solution curve from 0 to $\frac{1}{m}$ is ensured. Can we extend this curve until $\alpha = 1$?

Unfortunately, the Jacobian at $\alpha = \frac{1}{m}$ is never surjective. Indeed, recalling that at $\alpha = \frac{1}{m}$ every nonnegative solution satisfies $e^T x = 1$,

$e^T J_H\Bigl(x, \frac{1}{m}\Bigr) = e^T \Bigl[\; \tfrac{1}{m} R\, \partial_x(x \otimes \cdots \otimes x) - I \;\Big|\; R(x \otimes \cdots \otimes x) - v \;\Bigr] = 0.$

Usually, where the Jacobian is rank deficient the solution curve could present a singularity, but this is not the case. We have just shown that equation 4 has two types of nonnegative solutions: are these sets disjoint? Actually, at $\alpha = \frac{1}{m}$ the stochastic solution and the one corresponding to $c_\alpha$ coincide. The intersection explains the rank deficiency: the two curves cross each other and continue on their way, without necessarily losing regularity.


If the curve does not present singularities at any point, which seems reasonable in the general case by a dimensional argument, the solution curve must intuitively reach $\alpha = 1$. Indeed, it cannot stop in the interior of the hypercube

$C = \{\, (x, \alpha) \in \mathbb{R}^{n+1} \mid 0 \le x_i \le 1,\ 0 \le \alpha \le 1 \,\},$

and it cannot pass through the interior of its hyperfaces of the form $C \cap \{x_i = 0\}$ or $C \cap \{x_i = 1\}$, due to Theorems 9 and 10; it also cannot turn back to values of $\alpha < \frac{1}{m}$, due to the uniqueness of the solution curve there. This implies that the curve crosses the hyperface $C \cap \{\alpha = 1\}$, just as we wanted.


We have discussed the problem enough; now we wish to summarise and draw some conclusions. To do this, and to make the understanding easier, we represent a generic solution curve and analyse all its properties. The curve $c$ lies in $\mathbb{R}^{n+1}$, so it is not possible to represent it as a whole; to overcome this problem we plot a projection of the curve on its first coordinate $x_1$.

Figure 2. Projection of a generic solution curve.

The curve in Figure 2 represents a possible solution of a Multilinear PageRank problem: on the horizontal axis we have $\alpha$ and on the vertical one we have $x_1$. At $\alpha = 0$ the solution is $v$, so the starting value of the graph is $v_1$. As regards the first part of the curve, $\alpha < \frac{1}{m}$, we see that the stochastic solution is unique. From that point on the solution may not be unique; as a matter of fact, in our example uniqueness is violated. From the stochasticity hypothesis we know that $x_1$ remains under the line $x_1 = 1$, and by Theorem 9 it remains positive. Once the line $\alpha = \frac{1}{m}$ has been crossed, the curve cannot turn back, so at some point it will reach the line $\alpha = 1$. Of course the target value $\alpha^* < 1$ will be reached before 1; in particular it will be reached in finite time, and a well-designed Predictor-Corrector algorithm should work.


At first glance, the self-intersection in Figure 2 could lead someone to believe that the curve represented is only locally regular, and that following the solution would be really hard. But, as we said before, assuming the Jacobian surjective is sufficient to ensure the regularity of the curve. The confusion is caused by the projection along the first coordinate: if we project along two coordinates we see that the intersection disappears.

Figure 3. Projection of the curve in Figure 2 on a higher-dimensional space, where all the self-intersections disappear.


2. Newton P-C method for the Multilinear PageRank

As its name suggests, the algorithm uses Newton's iteration as the corrector method, while for the predictor step it uses linear extrapolation from the previous two values. We now present the structure of this algorithm:

Algorithm 6. Newton Predictor-Corrector method for the Multilinear PageRank.
Input: ε, M (desired tolerance, maximum number of iterations); α_0, α_1 ∈ R (starting values of α); h > 0 (initial steplength).
apply Newton's iteration with α_0 and α_1 to get the initial points x_0, x_1 ∈ R^n_+
repeat
    x_{k+1} = x_k + h (x_k − x_{k−1})
    α_{k+1} = α_k + h (α_k − α_{k−1})    [predictor step]
    repeat Newton's iteration with α_{k+1} to get x_{k+1}    [corrector step]
    choose a new steplength h    [steplength adaptation]
    if iterations > M: failure
    else: x = x_{k+1}
until traversing is stopped (intersection with the line α = α* found)
output x

Experiments revealed that, when the solution curve is $\alpha$-increasing, the algorithm converges, even without steplength adaptation. Problems arise when the solution curve turns back; this is rare, but we will see that it happens. In Figure 4 below, the inversion points are highlighted with red dashed lines.

An inversion point is a point $(\bar{x}, \bar{\alpha}) \in \mathbb{R}^{n+1}$ at which the matrix $H_x(\bar{x}, \bar{\alpha})$ is singular. Geometrically, we have an inversion point every time the curve has a vertical tangent.

Figure 4. A solution curve with the inversion points highlighted.


Recalling that Newton's formula is given by $x_{k+1} = x_k - (J_f(x_k))^{-1} f(x_k)$, and that in our new notation $J_f(x_k) = H_x(x_k, \alpha_k)$, at an inversion point Newton's iteration cannot be applied, and close to it the algorithm will probably not converge. Moreover, in a neighbourhood of an inversion point, even when theoretically there is convergence, the system we need to solve will be ill conditioned, causing many computational problems.

To avoid these problems we considered three different possibilities: bypassing the inversion points, improving the accuracy of the predictor step, or modifying the corrector step appropriately.

To bypass the problematic points we included in the algorithm some checks that allow us to foresee the presence of an inversion point, for example monitoring the slope of the curve with respect to $\alpha$, and some tools that, once the inversion point has been detected, allow us to step over it.

Figure 5. In red, the points with a high slope.

Using linear extrapolation as the predictor step is easy and cheap but, as regards accuracy, higher degrees of extrapolation or other techniques may give better results. Of course these methods are more expensive than linear extrapolation and, in view of large-scale problems, keeping the computational cost low is essential.

Fortunately, we have been able to find a way to improve the linear extrapolation technique while keeping the computational effort unchanged, alternating the linear extrapolation step with the resolution of a least squares problem. Recall that points along the curve solve equation 4. In particular, when we predict the new point $x_{n+1}$, we expect the corresponding value $\alpha_{n+1}$ to best satisfy


$\alpha_{n+1}\bigl[R(x_{n+1} \otimes \cdots \otimes x_{n+1}) - v\bigr] \approx x_{n+1} - v.$

By best approximation we mean that $\alpha_{n+1}$ solves the linear least squares problem below. So, first, one predicts the point $x_{n+1}$, then computes $\alpha_{n+1}$ by minimizing

$\bigl\|\alpha_{n+1}\bigl[R(x_{n+1} \otimes \cdots \otimes x_{n+1}) - v\bigr] - (x_{n+1} - v)\bigr\|_2.$

Differentiating, we get

$\alpha_{n+1} = \frac{\bigl\langle R(x_{n+1} \otimes \cdots \otimes x_{n+1}) - v,\; x_{n+1} - v \bigr\rangle}{\bigl\|R(x_{n+1} \otimes \cdots \otimes x_{n+1}) - v\bigr\|_2^2}.$

Notice that this computation requires only two inner products.

Figure 6. Predictor step in red and least squares approximation in light blue.

Finally, to avoid problems with the convergence of Newton's method, we can modify the iteration, allowing $\alpha$ to vary appropriately.


3. Moore-Penrose P-C method for the Multilinear PageRank

This is the main algorithm of this thesis. To better understand it we need to introduce the Moore-Penrose Newton's iteration.

Definition. Given a matrix $A \in M(n, n+1, \mathbb{R})$ with maximal rank, its Moore-Penrose inverse is defined by $A^+ = A^T (A A^T)^{-1}$.

Remark. The matrix $A A^T$ is invertible, so the definition is well posed.

Proof. $A A^T x = 0 \Rightarrow x^T A A^T x = 0 \iff \|A^T x\|^2 = 0 \Rightarrow A^T x = 0 \Rightarrow x = 0$, where the last implication uses the fact that $A$ has maximal rank. $\square$

We observed that Newton's iteration is the best corrector algorithm among the ones presented above; it just does not work well when there are inversion points. This is because the Jacobian $H_x(x, \alpha)$ is singular at those points.

To overcome this problem, in [11] E. L. Allgower and K. Georg present what we call the Moore-Penrose Newton's iteration, which is a generalization of Newton's iteration. Consider a general nonlinear map $F : \mathbb{R}^{n+1} \to \mathbb{R}^n$ and assume that $x$ is a regular point of $F$. Then the Moore-Penrose Newton's iteration is given by

$N(x) := x - J_F(x)^+ F(x).$

Note that this Newton step is analogous to the classical Newton's method, the only formal difference being that the Moore-Penrose inverse $J_F(x)^+$ replaces the classical inverse.
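A sketch of one Moore-Penrose Newton step; the minimum-norm least-squares solve via numpy.linalg.lstsq plays the role of applying the pseudoinverse, and H and J are placeholder callables returning the map and its n-by-(n+1) Jacobian.

```python
import numpy as np

def moore_penrose_newton_step(H, J, u):
    """N(u) = u - J_H(u)^+ H(u) for an underdetermined map H: R^{n+1} -> R^n."""
    Ju = J(u)                                          # n x (n+1) Jacobian at u
    step, *_ = np.linalg.lstsq(Ju, H(u), rcond=None)   # minimum-norm solution of Ju * step = H(u)
    return u - step
```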

By replacing the Newton step in the Newton Predictor-Corrector algorithm with the Moore-Penrose Newton's iteration, we obtain our final algorithm: the Moore-Penrose Newton Predictor-Corrector algorithm.


Algorithm 7. Moore-Penrose Newton P-C method for the Multilinear PageRank.
Input: ε, M (desired tolerance, maximum number of iterations); α_0, α_1 ∈ R (starting values of α); h > 0 (initial steplength).
apply Moore-Penrose Newton's iteration with α_0 and α_1 to get the initial points x_0, x_1 ∈ R^n_+
repeat
    x_{k+1} = x_k + h (x_k − x_{k−1})
    α_{k+1} = α_k + h (α_k − α_{k−1})    [predictor step]
    repeat Moore-Penrose Newton's iteration until convergence is achieved    [corrector step]
    choose a new steplength h    [steplength adaptation]
    if iterations > M: failure
    else: x = x_{k+1}
until traversing is stopped (intersection with the line α = α* found)
output x

Algorithm 7 differs from the previous one only in the corrector step. The computational cost is slightly higher; on the other hand, problems may arise only if 0 is not a regular value of $H$.

From a geometric point of view, Newton's iteration looks for a solution over a hyperplane where $\alpha$ is fixed, whereas the Moore-Penrose Newton iteration searches for the solution in a neighbourhood of the starting point, moving towards the nearest solution point.

Figure 7. Difference between the classic Newton iteration and the Moore-Penrose Newton iteration.


CHAPTER 5

Numerical results

In the previous chapters we analysed the Multilinear PageRank problem in depth. Here we describe the experiments conducted.

In the first part we investigate the different iterative algorithms proposed in Chapter 2, to understand which is the most valid alternative. In doing so, we confirmed the results in [2] about the superiority of Newton's iteration over the other algorithms.

After this, we test the predictor-corrector method which alternates linear extrapolation and the least squares approximation in the predictor step, with Newton's iteration as corrector. This algorithm achieves good results, but they are still not fully satisfactory.

Finally, we show the performance of the Moore-Penrose Newton predictor-corrector method, which turns out to be the best algorithm in terms of reliability, with a 100% success rate.


We start by analysing the six algorithms presented in Chapter 2. We divide the experimentation into two parts: in the first, we generate several random tensors and note that, in most cases, all the algorithms converge; in the second, we use both the database of problematic tensors created by Gleich, Lim and Yu and reported in [12], and some others created by us [13], to see the results.

The first aspect that we analyse is the dependence of the convergence rate and of the number of failures on the damping factor $\alpha$. We use the following legend to refer to the algorithms:

• F: fixed-point iteration;
• S: shifted fixed-point iteration;
• IO: inner-outer iteration;
• II: inverse iteration;
• N: Newton's iteration;
• IIN: inexact inverse Newton iteration.

So, for instance, IIN5 means that we use the inexact inverse Newton's iteration with 5 terms of the Taylor series.


We show the number of iterations and the execution time of the different algorithms. To do so, we considered three different dimensions: 3, 5 and 7. Then we generated 1000 tensors of rank 3 for each dimension. After that we ran the algorithms, recording the total number of iterations and the time spent. Of course, in the totals we considered only the data obtained from tensors for which all the algorithms converged.

Figure 1. Number of iterations.
