Multilinear PageRank and its application to graph clustering

Academic year: 2021

Contents

Introduction

1 Preliminaries
  1.1 Perron-Frobenius theorem and its applications
  1.2 Nonnegative tensors
  1.3 On quadratic vector equations
  1.4 Vertex-reinforced random walk

2 Multilinear PageRank problem
  2.1 PageRank for higher-order Markov chains
  2.2 Multilinear PageRank
  2.3 Spacey random walk
    2.3.1 Stationary distribution of spacey random walk
    2.3.2 Spacey random surfer
  2.4 Solution of a vector equation
    2.4.1 Minimal nonnegative solutions
    2.4.2 Existence of stochastic solutions
    2.4.3 Uniqueness of stochastic solutions
  2.5 Higher-order PageRank

3 Algorithms
  3.1 Fixed-point iteration
  3.2 Shifted fixed-point iteration
  3.3 Inner-outer iteration
  3.4 Inverse iteration
  3.5 Newton iteration
  3.6 Perron-based algorithm
    3.6.1 Perron-Newton method
    3.6.2 Continuation strategy
  3.7 Computational issues

4 Graph clustering
  4.1 Transition matrix and transition tensor
  4.2 Tensor spectral clustering
  4.3 Another model for spectral clustering
  4.4 Super-spacey random walk

5 Numerical experiments
  5.1 Solving multilinear PageRank
  5.2 Experiments about tensor spectral clustering
    5.2.1 Computing the conductance of a subset S
    5.2.2 Computing the left eigenvector of a stochastic matrix referred to the second largest eigenvalue
    5.2.3 Results about synthetic graphs
    5.2.4 Results and implementation details about email data from a large European research institution
  5.3 Conclusions

Introduction

Google devised PageRank to determine the importance of nodes in a graph representing web pages [1]. The key idea is the model of a random surfer, which represents the path of a user on the internet as a random walk on the web graph. At each step the surfer moves according to a Markov chain with probability α, and jumps according to a fixed distribution v with probability 1 − α. In particular, the next page visited depends only on the page the surfer currently occupies.

Moreover, the model described above has been widely used in many applications. More specifically, the random surfer has become a good model for dealing with graphs whenever they have to be partitioned. Recently, higher-order structures have been considered when partitioning graphs into clusters, since they have proven fundamental to understanding some real-world graphs. Indeed, many different systems can be viewed as complex networks made up of objects (nodes) and connections between them (links or edges). Over the past several decades the study of such networks has become important in many disciplines, such as social networks [2], financial networks [3], community detection [4] and so on. A recurrent research theme is finding the communities within these networks. These communities reveal important structures hidden within the network data. Often, a community is a group of nodes that are more closely connected to each other than to the rest of the network. Moreover, these connections are often better represented by information involving more than two nodes simultaneously.

The random surfer model, however, does not allow us to handle this higher-order information, which calls for a new model and different mathematical tools.

Recently, in [5], Gleich et al. have introduced a modification of the model above, considering a step that depends on more than one page visited in the past. They called it the spacey random surfer.

More specifically, at each step the spacey random surfer guesses the previous m states, drawing them according to the empirical distribution of the visited states; then it moves according to a higher-order Markov chain with probability α and according to a fixed distribution v with probability 1 − α. Analogously to the PageRank model, the problem becomes finding the stationary distribution of this random model. For the specific case of an n-state second-order Markov chain, the problem becomes finding the stochastic solutions of a quadratic vector equation. This problem has been faced in [6], and it has been better understood in [7], where more efficient numerical methods have been introduced exploiting the theory of quadratic vector equations for nonnegative matrices [8].

In [3], the authors have developed a tensor spectral clustering method, exploiting the model described above, in order to handle graphs by means of their higher-order structures, such as triangles involving three nodes. Moreover, in [9], a further modification, called the super-spacey random surfer, has been introduced, since it can be more suitable than the spacey random surfer for modelling large scale networks. It leads to another tensor spectral clustering method that calls for solving a new, nonlinear vector equation.

The main goal of this thesis is to collect all these mathematical tools in a single organic work and then, exploiting the numerical methods introduced in [7], to tackle the problems proposed in [3] and [9] and to carry out experiments on both synthetic and real-world data.

The thesis is organized as follows.

In Chapter 1 we recall some mathematical tools that will be used later. In Chapter 2, we define and study the theoretical properties of multilinear PageRank. Then, in Chapter 3, we list some algorithms to find stochastic solutions of the multilinear PageRank equation. In Chapter 4 we introduce the problem of graph clustering based on third-order structures and present a spectral clustering algorithm exploiting multilinear PageRank, as well as a further algorithm based on a different model that calls for solving a nonlinear vector equation; we report a pair of numerical methods to solve it. Finally, in Chapter 5, we report the results of our numerical experiments and draw conclusions.


Chapter 1

Preliminaries

The aim of this chapter is to introduce the reader to our work. Some basic definitions and theorems are recalled. First, we recall some results about nonnegative matrices (see [10] for a reference) and then some basic notions about nonnegative tensors. In Section 1.3 we summarize some properties of quadratic vector equations, studied in [8]. Finally, we define a stochastic process called the vertex-reinforced random walk, which will be a suitable mathematical tool for modelling our problem in the next chapter.

1.1 Perron-Frobenius theorem and its applications

Throughout the text, inequalities between vectors and matrices are intended in the componentwise sense.

Theorem 1.1. Let A ∈ R^{n×n} be a nonnegative and irreducible matrix. The following statements hold true:

• ρ(A), the spectral radius of A, is strictly positive;
• ρ(A) is an eigenvalue of A;
• there exists x ∈ R^n such that x > 0, Ax = ρ(A)x and ||x||_1 = 1; the vector x is called the Perron vector of A;
• for each matrix B such that B ≥ A and B ≠ A, we have ρ(B) > ρ(A);
• ρ(A) is a simple eigenvalue, i.e., it is a simple root of the characteristic equation.

Definition 1.1. Let A = (a_{ij})_{i,j=1,...,n} be a square nonnegative matrix. We say that A is column stochastic if and only if $\sum_{i=1}^n a_{ij} = 1$ for each j = 1, . . . , n. For simplicity we will omit "column" and simply say "stochastic matrix".

Definition 1.2. Let x = (x_i)_{i=1,...,n} be a nonnegative vector. We say that x is stochastic if and only if $\sum_{j=1}^n x_j = 1$.

In the next propositions, strong consequences of the Perron-Frobenius theorem for stochastic and nonnegative matrices are recalled.

Proposition 1.2. Let P be a stochastic matrix. Then:

• ρ(P) = 1;
• there exists a stochastic vector x such that Px = x.

Proposition 1.3. Let M ∈ R^{n×n} be a square matrix. We say that M is an M-matrix if there exist a nonnegative matrix B and β > 0 such that M = βI − B and ρ(B) ≤ β. The following statements hold:

• If M is a nonsingular M-matrix, then M^{-1} ≥ 0.
• If M is a nonsingular and irreducible M-matrix, then M^{-1} > 0.

1.2 Nonnegative tensors

A real-valued mth-order n-dimensional nonnegative tensor is an n × n × · · · × n array A with n^m entries in R_+, denoted by A = (A_{i_1 i_2 ... i_m}), where i_1, i_2, . . . , i_m range from 1 to n.

For each vector x ∈ R^n we can define the column vector

\[
(A x^{m-1})_i := \sum_{i_2,\dots,i_m=1}^{n} A_{i i_2 \dots i_m}\, x_{i_2} \cdots x_{i_m}, \qquad i = 1,\dots,n.
\]

Definition 1.3 (Eigenpair of tensors). Let A be a real-valued mth-order n-dimensional tensor. Assume that there exists a vector x such that A x^{m-1} is not 0. We say (λ, x) ∈ R × R^n is an eigenpair of A if

\[
(A x^{m-1})_i = \lambda x_i^{m-1}, \qquad i = 1,\dots,n.
\]

For our purpose the following notion of eigenvalue of tensors is suitable.

Definition 1.4. Let A be a real-valued mth-order n-dimensional tensor. Let ||.|| be a given vector norm on R^n. Assume that there exists a vector x such that A x^{m-1} is not 0. We say (λ, x) ∈ R × R^n is an E-eigenpair of A if

\[
\begin{cases}
(A x^{m-1})_i = \lambda x_i, & i = 1,\dots,n,\\
\|x\| = 1.
\end{cases}
\]

If an E-eigenvalue has a real E-eigenvector, then we call it a Z-eigenvalue and call the real E-eigenvector a Z-eigenvector.

Therefore, x ∈ R^n is a Z-eigenvector if and only if there exists λ such that A x^{m-1} = λx.

One of our main tools is the so-called stochastic tensor, a suitable generalization of the corresponding concept for matrices.

Definition 1.5. Let A = (A_{i_1,...,i_m}) be an mth-order n-dimensional nonnegative tensor. We say that A is stochastic if and only if $\sum_{i_1=1}^{n} A_{i_1,...,i_m} = 1$ for all i_2, . . . , i_m.

Let us introduce a useful notation for third-order tensors.

Let P be a real-valued third-order n-dimensional tensor. Define the flattening of P along the first index as the n × n^2 block matrix

\[
R = \begin{bmatrix} P_{\bullet,\bullet,1} & P_{\bullet,\bullet,2} & \dots & P_{\bullet,\bullet,n} \end{bmatrix}.
\]

For instance, if P is the following tensor:

\[
P_{\bullet,\bullet,1} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad
P_{\bullet,\bullet,2} = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, \quad
P_{\bullet,\bullet,3} = \begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix},
\]

then its flattening is

\[
R = \begin{bmatrix} 1 & 1 & 2 & 2 & 3 & 3 \\ 1 & 1 & 2 & 2 & 3 & 3 \end{bmatrix}.
\]
1.3 On quadratic vector equations

In this section we summarize some results introduced in [8] about quadratic vector equations (QVE).

The general problem has the form

\[
x = a + b(x, x) \tag{1.1}
\]

where a ∈ R^n, a ≥ 0, and b(x, x) is a positive bilinear form, i.e., it is linear in each argument and maps pairs of nonnegative vectors to a nonnegative vector. It is possible to prove that b(x, y) ≥ b(z, w) whenever x ≥ z ≥ 0 and y ≥ w ≥ 0 [8].

A useful result deals with the existence of a minimal nonnegative solution of (1.1). In order to obtain it, we first state a basic lemma; the proof can be found in [8] in a more general setting.

Let us introduce a useful definition and an important hypothesis.

Definition 1.6. We say that l : R^n → R^n is a weakly positive map if l(x) ≥ 0 and l(x) ≠ 0 whenever x ≥ 0, x ≠ 0.

Condition 1.4. There exist a weakly positive map l : R^n → R^n and a vector z ∈ R^n such that, for any x ≥ 0, l(x) ≤ z implies l(a + b(x, x)) ≤ z.

Lemma 1.5. Let l : R^n → R^n be weakly positive. For each z ∈ R^n with z ≥ 0, the set {x ∈ R^n : x ≥ 0, l(x) ≤ z} is bounded.

The following theorem proves the existence of the minimal nonnegative solution.

Theorem 1.6. Equation (1.1) has at least one solution if and only if Condition 1.4 holds true. In addition, there exists a (unique) minimal nonnegative solution.

Proof. Let us consider the iteration

\[
x_{k+1} = a + b(x_k, x_k)
\]

starting from x_0 = 0. We first prove by induction that x_k ≥ 0 for each k and x_k ≤ x_{k+1}:

\[
x_1 = a \ge 0, \qquad
x_{k+1} - x_k = b(x_k, x_k) - b(x_{k-1}, x_{k-1}) \ge 0,
\]

since b is a positive bilinear form and x_k ≥ x_{k-1} ≥ 0. Let z be the vector given by Condition 1.4. We now prove by induction that l(x_k) ≤ z:

\[
l(x_{k+1}) = l(a + b(x_k, x_k)) \le z,
\]

since Condition 1.4 holds true. The sequence x_k is nondecreasing and bounded by Lemma 1.5, so it converges. We call x̂ its limit.

On the other hand, if (1.1) has a solution s, we can choose l = I and z = s; thus Condition 1.4 is satisfied.

Finally, for each nonnegative solution s, suppose by induction that x_k ≤ s; then

\[
s - x_{k+1} = b(s, s) - b(x_k, x_k) \ge 0,
\]

and taking the limit, x̂ ≤ s. This proves that x̂ is the minimal nonnegative solution.

1.4 Vertex-reinforced random walk

We now introduce the notion of vertex-reinforced random walk, a stochastic process that will be an important tool for modelling the history of a surfer on a higher-order network. More details can be found in [5]. First, we recall some basic facts concerning probability spaces (Ω, F, Pr), where Ω is a set, F is a σ-field on Ω and Pr is a probability measure on Ω.

Definition 1.7. Let (Ω, F, Pr) be a probability space. A stochastic process is a sequence of random variables (X_t)_{t∈T} defined on the space Ω, where T is a set of indices. The σ-field F_t generated by (X_s)_{s≤t} is called the natural filtration of the process (X_t)_{t∈T}.

From now on, we use T = N.

Let us introduce a notation for conditional probability.

Definition 1.8. Let (Ω, F, Pr) be a probability space. Let A, B ∈ F with Pr(B) ≠ 0. Define the conditional probability with respect to an event as

\[
Pr(A \mid B) := \frac{Pr(A \cap B)}{Pr(B)}.
\]

Let G ⊆ F be a σ-field. Define the conditional probability with respect to the σ-field G as

\[
Pr(A \mid G) := E[\mathbf{1}_A \mid G],
\]

where E[· | ·] is the conditional expectation (see [11] for a formal definition).

Definition 1.9. The vertex-reinforced random walk is a stochastic process (X_n)_{n∈N} with states x_i, where i ranges from 1 to N, satisfying

\[
X_0 = x_0, \qquad
s^n_i = 1 + \sum_{k=1}^{n} \mathbf{1}_{\{X_k = i\}}, \qquad
w_n = \frac{s^n}{N + n}, \qquad
Pr(X_{n+1} = i \mid F_n) = [M(w_n)]_{i, X_n},
\]

where F_n is the σ-field generated by (X_i)_{i=0,1,...,n} and M(w_n) is an N × N column-stochastic matrix given by the map

\[
M : \Delta^{N-1} \to \{P \in \mathbb{R}^{N \times N} \mid P \text{ is stochastic}\}, \qquad x \mapsto M(x),
\]

where Δ^{N−1} is the simplex of length-N probability vectors that are nonnegative and sum to one.

In other words, transitions at any point of the walk may depend on the relative amount of time spent in each state up to that point. The vector w_n is called the occupation vector because it represents the empirical distribution of the time spent at each state.


Chapter 2

Multilinear PageRank problem

In this chapter we extend the PageRank problem to higher-order Markov chains. This leads to a linear system that is intractable for many interesting problems arising in applications, so we study a rank-1 approximation introduced by Li and Ng in [12]. This leads to the so-called multilinear PageRank problem, whose theoretical properties we study. A suitable stochastic model, called the spacey random surfer, has been developed in [5]. It yields the same multilinear PageRank problem studied in [6], but it provides a better justification from a probabilistic point of view and, at the same time, it models the history of a spacey random surfer on a higher-order network very well.

2.1 PageRank for higher-order Markov chains

First, we define the well known PageRank problem for a first-order Markov chain.

Definition 2.1 (PageRank). Let P be a stochastic matrix, let α be a probability smaller than 1, and let v be a stochastic vector. A PageRank vector x is the unique solution of the linear system

\[
x = \alpha P x + (1 - \alpha) v.
\]

We call the set (α, P, v) a PageRank problem.
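As a quick illustration (not part of the thesis), the PageRank vector of Definition 2.1 can be computed by solving the linear system (I − αP)x = (1 − α)v; the Python/NumPy sketch below assumes a small dense column-stochastic matrix P.

import numpy as np

def pagerank(P, v, alpha):
    """Solve x = alpha*P*x + (1-alpha)*v, i.e. (I - alpha*P) x = (1-alpha)*v."""
    n = v.size
    return np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)

# Tiny example: 3-page web graph, column-stochastic transition matrix
P = np.array([[0.0, 0.5, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 0.5, 0.0]])
v = np.full(3, 1/3)
x = pagerank(P, v, alpha=0.85)
print(x, x.sum())   # a stochastic vector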

In order to extend the PageRank problem to higher-order Markov chains we review their properties.

Definition 2.2. An mth-order Markov chain, where m is finite, is a stochastic process (X_n)_{n∈N} satisfying

\[
Pr(X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \dots, X_1 = x_1)
= Pr(X_n = x_n \mid X_{n-1} = x_{n-1}, X_{n-2} = x_{n-2}, \dots, X_{n-m} = x_{n-m})
\quad \text{for } n > m.
\]

In words, the future state only depends on the past m states.

In the latter definition the fundamental Markov property fails, but we can recover it by taking a Cartesian product of the state space.

Let X_n be a second-order Markov chain and consider the tensor P whose entries are

\[
P_{i,j,k} = Pr(X_{n+1} = i \mid X_n = j, X_{n-1} = k).
\]

The stationary distribution equation for the resulting first-order Markov chain on the product space reads

\[
\sum_{k=1}^{n} P_{i,j,k}\, X_{j,k} = X_{i,j},
\]

where X_{j,k} denotes the stationary probability of the pair of states (j, k) in the product space.

Now, we define a higher-order PageRank by modelling a random surfer on a higher-order Markov chain. With probability α the surfer moves according to the higher-order chain, and with probability 1 − α the surfer teleports according to the distribution v. That is, if P is the transition tensor of the higher-order Markov chain, then the higher-order PageRank chain has transition tensor M, where

\[
M(i_1, i_2, \dots, i_m) = \alpha P(i_1, i_2, \dots, i_m) + (1 - \alpha) v_{i_1}.
\]

This leads to the definition of the higher-order PageRank problem.

Definition 2.3 (Higher-order PageRank). Let P be an mth-order stochastic tensor, let α be a probability smaller than 1, and let v be a stochastic vector. A PageRank (m−1)th-order tensor X is a solution of the linear system

\[
X(i_1, i_2, \dots, i_{m-1}) = \alpha \sum_{i_m} P(i_1, i_2, \dots, i_m)\, X(i_2, \dots, i_m)
+ (1 - \alpha)\, v_{i_1} \sum_{i_m} X(i_2, \dots, i_m).
\]

We call the set (α, P, v) a higher-order PageRank problem.

The tensor product structure of the state space makes finding a stationary distribution a difficult problem, especially for the large scale problems arising in many interesting applications. The next section introduces an approximation of the higher-order stationary distribution that makes the problem less expensive.

2.2 A rank one approximation: multilinear PageRank

Throughout this section we consider a second-order PageRank problem (α, P, v).

For simplicity, suppose α = 1. Then the limiting distribution equation is

\[
\sum_{k=1}^{n} P_{i,j,k}\, X_{j,k} = X_{i,j},
\]

where X is a stationary tensor and X_{i,j} is a limiting joint probability. With the assumption

\[
X_{i,j} = x_i x_j \tag{2.1}
\]

we can rewrite the system as follows:

\[
\begin{cases}
\sum_{k=1}^{n} P_{i,j,k}\, x_j x_k = x_i x_j, \\
\sum_{i=1}^{n} x_i = 1, \\
x \ge 0.
\end{cases} \tag{2.2}
\]

Summing over j we obtain a system whose solution is a limiting distribution vector x:

\[
\begin{cases}
x_i = \sum_{j,k=1}^{n} P_{i,j,k}\, x_j x_k, \\
\sum_{i=1}^{n} x_i = 1, \\
x \ge 0.
\end{cases} \tag{2.3}
\]

We can also obtain a limiting distribution x from a probabilistic point of view:

\[
x_i = \lim_{n} Pr(X_n = i)
= \sum_{j,k} P_{i,j,k} \lim_{n} Pr(X_n = j, X_{n-1} = k)
= \sum_{j,k} P_{i,j,k}\, x_j x_k = (P x^2)_i. \tag{2.4}
\]

Equation (2.4) shows that the limiting distribution is a Z-eigenvector of the stochastic tensor P associated with the eigenvalue 1. In the first-order case, this recovers a well-known property of first-order Markov chains.

This transformation of the second-order PageRank problem yields a new expression for the limiting distribution in the general case α ∈ [0, 1].

Definition 2.4 (Multilinear PageRank problem). Let P be a third-order stochastic tensor, let α be a probability smaller than 1, and let v be a stochastic vector. We say that x is a multilinear PageRank vector if x ≥ 0, $\sum_i x_i = 1$ and

\[
x = \alpha P x^2 + (1 - \alpha) v. \tag{2.5}
\]

The assumption (2.1) imposes a strong hypothesis on the process X_n (a common hypothesis that implies (2.1) is the independence of the random variables X_n, which would be unnatural in our case). In order to avoid this constraint, we describe a different stochastic model for a random surfer in the next section.

2.3 Spacey random walk

Let P = (P_{i,j,k}) be an N-dimensional stochastic tensor. Let Pr denote a probability measure and 1_A the indicator function of the event A.

Define a pair of stochastic processes X_n, Y_n as follows:

\[
Pr(Y_n = k \mid F_n) = \frac{1 + \sum_{s=1}^{n} \mathbf{1}_{\{X_s = k\}}}{n + N}, \qquad
Pr(X_{n+1} = i \mid X_n = j, Y_n = k) = P_{i,j,k}, \tag{2.6}
\]

where F_n is the natural filtration of the process X_n and X_0 is an initial state. In words, the process Y_n randomly draws one of the past states X_1, . . . , X_n (smoothed by one extra count per state) and the next state visited by X_n follows the law (2.6).

The process X_n is called the "spacey random walk" and it is a particular version of a vertex-reinforced random walk. The next proposition makes this precise.

Let ⊗ denote the Kronecker product between matrices. Precisely, if A and B are respectively n × m and p × q matrices, then A ⊗ B is the np × mq block matrix

\[
A \otimes B =
\begin{bmatrix}
a_{1,1} B & \dots & a_{1,m} B \\
\vdots & \ddots & \vdots \\
a_{n,1} B & \dots & a_{n,m} B
\end{bmatrix}.
\]

Proposition 2.1. The spacey random walk is a vertex-reinforced random walk defined by the map

\[
x \mapsto \sum_{k=1}^{N} P_{\bullet,\bullet,k}\, x_k = R(x \otimes I),
\]

where R is the flattening of P along the first index.

Proof. Using the same notation introduced in Section 1.4, we have

\[
Pr(X_{n+1} = i \mid F_n) = Pr(X_{n+1} = i \mid X_n, w_n)
= \sum_{k=1}^{N} Pr(X_{n+1} = i \mid X_n, Y_n = k)\, Pr(Y_n = k \mid w_n)
= \sum_{k=1}^{N} P_{i, X_n, k}\, w_n(k)
= R(w_n \otimes I)_{i, X_n}.
\]

It remains to prove that $\sum_{k=1}^{N} P_{\bullet,\bullet,k}\, w_n(k)$ is a stochastic matrix. Indeed, since w_n ∈ Δ^{N−1}, the matrix $\sum_{k=1}^{N} P_{\bullet,\bullet,k}\, w_n(k)$ is a convex combination of the stochastic matrices P_{•,•,k}, so it is stochastic. Thus, this process satisfies the assumptions given in Definition 1.9.
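As an illustration, here is a small simulation sketch (Python/NumPy, not from the thesis) of the spacey random walk defined by (2.6); the function name spacey_random_walk and the uniform random tensor are illustrative choices.

import numpy as np

def spacey_random_walk(P, n_steps, x0=0, seed=0):
    """Simulate the spacey random walk driven by a stochastic tensor P
    (shape (N, N, N), columns P[:, j, k] summing to one) and return the
    final occupation vector w_n."""
    rng = np.random.default_rng(seed)
    N = P.shape[0]
    counts = np.ones(N)          # the "+1" smoothing in the definition of Y_n
    x = x0
    for n in range(n_steps):
        w = counts / counts.sum()            # occupation vector w_n
        k = rng.choice(N, p=w)               # Y_n: guess a past state
        x = rng.choice(N, p=P[:, x, k])      # X_{n+1} ~ P[:, X_n, Y_n]
        counts[x] += 1
    return counts / counts.sum()

# Example: random 3-state transition tensor
rng = np.random.default_rng(1)
P = rng.random((3, 3, 3))
P /= P.sum(axis=0, keepdims=True)
print(spacey_random_walk(P, 10_000))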

2.3.1 Stationary distribution of spacey random walk

We now develop a heuristic argument to characterize a stationary distribution for a spacey random walk. We will use the same notation introduced in Section 1.4.

The idea is to approximate what would happen if we ran the process for an extremely long time.

Let w_n be the occupation vector of a spacey random walk. Consider the behaviour of the process at times n ≫ 1 and n + l, where l ≪ n. We may assume w_{n+l} ≈ w_n, so the spacey random walker approximates a Markov chain with transition matrix M(w_n) = R(w_n ⊗ I).

Suppose that M(w_n) has a unique stationary distribution x_n. For large n it must satisfy x_n ≈ w_{n+l}; otherwise, the distribution x_n would cause w_{n+l} to change. Therefore, the limiting distribution x must satisfy the equation

\[
x = M(x)\, x = R(x \otimes I)\, x = R(x \otimes x).
\]

The latter is a multilinear PageRank equation with α = 1. In the next subsection, we describe a stochastic model leading to a multilinear PageRank problem as in Definition 2.4 for any value of α ∈ [0, 1].

2.3.2 Spacey random surfer

Let α ∈ [0, 1] and let v be a stochastic vector.

Consider the following modification of the spacey random walk process:

\[
Pr(Y_n = k \mid F_n) = \frac{1 + \sum_{s=1}^{n} \mathbf{1}_{\{X_s = k\}}}{n + N}, \qquad
Pr(X_{n+1} = i \mid X_n = j, Y_n = k) = \alpha P_{i,j,k} + (1 - \alpha) v_i.
\]

We call the process X_n a spacey random surfer. The transition of a spacey random surfer depends on the time previously spent in each state, encoded in the process Y_n, and on the fixed distribution v.

Applying the argument above, we can state that a limiting distribution of a spacey random surfer must satisfy the multilinear PageRank problem (α, P, v) of Definition 2.4.

The existence and uniqueness of a stochastic solution of a multilinear PageRank problem are not guaranteed. This issue is explored in the next section.

2.4 Solution of a vector equation

In this section, we prove that equation (2.5) has a minimal nonnegative solution, in the componentwise sense, which is not necessarily stochastic. We prove that a stochastic solution always exists and we explore sufficient conditions for its uniqueness (see [7] for a reference).


2.4.1 Minimal nonnegative solutions

Consider equation (2.5) rewritten as follows:

\[
x = \alpha R(x \otimes x) + (1 - \alpha) v, \tag{2.7}
\]

where the matrix R is the flattening introduced in Section 2.3. Let us introduce the map

\[
G(x) = \alpha R(x \otimes x) + (1 - \alpha) v.
\]

Lemma 2.2. Consider the fixed-point iteration

\[
x_{k+1} = G(x_k)
\]

started from x_0 = 0. Then the sequence {x_k} satisfies 0 ≤ x_k ≤ x_{k+1} ≤ 1, the limit m = lim_k x_k ≥ 0 exists, and m is the (unique) minimal nonnegative solution of (2.7).

Proof. The map G(x) is weakly positive, i.e., G(x) ≥ 0 and G(x) ≠ 0 whenever x ≥ 0, x ≠ 0. Moreover, if 0 ≤ x ≤ 1 then 0 ≤ G(x) ≤ 1. Therefore, taking l = G, b(x, x) = αR(x ⊗ x), a = (1 − α)v and z = [1, . . . , 1]^t, Condition 1.4 of Section 1.3 is satisfied. According to Theorem 1.6, the sequence of vectors {x_k} is bounded and converges monotonically to a vector m, which is the minimal nonnegative solution of (2.7).

The minimal solution m may fail to be stochastic. We now exploit the properties of nonnegative solutions in order to characterize the stochastic solutions.

2.4.2 Existence of stochastic solutions

Let α ∈ (0, 1]. The following result has been proved in [7].

Lemma 2.3. Let g(u) := αu^2 + 1 − α. Then e^t G(x) = g(e^t x) for any x ∈ R^N, where e^t is the row vector whose entries are all equal to 1.

Proof. We have

\[
e^t G(x) = \alpha e^t R(x \otimes x) + (1 - \alpha) e^t v
= \alpha e^t (x \otimes x) + 1 - \alpha
= \alpha (e^t x)^2 + 1 - \alpha.
\]

Corollary 2.4. For each solution x of (2.7), e^t x is one of the two solutions of the quadratic equation u = g(u), i.e., u = 1 or u = (1 − α)/α.

Proof. For each solution x of (2.7) we have x = G(x), so e^t x = e^t G(x) = α(e^t x)^2 + 1 − α, where we applied the previous lemma.

Let u be one of the solutions of u = g(u). Define the level set l_u = {x : e^t x = u, x ≥ 0}.

Proposition 2.5 (Existence of a stochastic solution). There exists at least one solution x ≥ 0 of (2.7) such that e^t x = 1, and at least one solution x ≥ 0 such that e^t x = (1 − α)/α.

Proof. The thesis is a consequence of the Brouwer fixed-point theorem applied to the compact and convex sets l_1 and l_{(1−α)/α}.

2.4.3 Uniqueness of stochastic solutions

As a consequence of Proposition 2.5 we can consider two cases:

• α ≤ 1/2: we have (1 − α)/α ≥ 1, so the minimal solution m satisfies e^t m = 1 and there is a unique stochastic solution x = m.

• α > 1/2: there exists at least one stochastic solution, but uniqueness may fail. We now exhibit an example, which can be found in [6]. Let α = 0.99, v = [0, 1, 0]^t and

\[
R = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 1/3 & 0 & 1 \\
1 & 1 & 1 & 1 & 0 & 1 & 1/3 & 0 & 0
\end{bmatrix}.
\]

Then both x = [0, 1, 0]^t and x = [0.1890, 0.3663, 0.4447]^t solve the multilinear PageRank problem.
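As a quick numerical sanity check (a sketch I am adding, not part of the thesis), one can verify with a few lines of Python/NumPy that both vectors above satisfy (2.7) up to the rounding of the printed digits.

import numpy as np

alpha, v = 0.99, np.array([0.0, 1.0, 0.0])
R = np.array([[0, 0, 0, 0, 0, 0, 1/3, 1, 0],
              [0, 0, 0, 0, 1, 0, 1/3, 0, 1],
              [1, 1, 1, 1, 0, 1, 1/3, 0, 0]], dtype=float)

for x in (np.array([0.0, 1.0, 0.0]),
          np.array([0.1890, 0.3663, 0.4447])):
    residual = alpha * (R @ np.kron(x, x)) + (1 - alpha) * v - x
    print(np.linalg.norm(residual, 1))   # ~0 and ~1e-4 (second vector is rounded)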

2.5 Multilinear PageRank of order greater than two

The notion of spacey random walk can be generalized beyond the second-order Markov chain.

Consider an mth-order tensor P representing an (m−1)st-order Markov chain. The spacey random walk corresponds to the following process:

1. The walker is at node j, spaces out, and forgets the last m − 2 states. It chooses the last m − 2 states j_1, . . . , j_{m−2} at random based on its history (each state is drawn from the occupation vector w).

2. The walker then moves to node i with probability P_{i,j,j_1,...,j_{m−2}}.

Analogously to the second-order case, we can see the spacey random walk as a version of a vertex-reinforced random walk defined by the map

\[
x \mapsto R(\underbrace{x \otimes \cdots \otimes x}_{m-2 \text{ terms}} \otimes I),
\]

where R is the N × N^{m−1} column-stochastic flattening of P along the first index. The following proposition makes this precise.

Proposition 2.6. Let X_n be a stochastic process satisfying

\[
Pr(X_{n+1} = i \mid X_n = j, Y_n^1 = j_1, \dots, Y_n^{m-2} = j_{m-2}) = P_{i,j,j_1,\dots,j_{m-2}},
\]

where Y^1, . . . , Y^{m−2} are i.i.d. (independent and identically distributed) stochastic processes that represent the guesses of the last m − 2 states based on the history of X_n, i.e.,

\[
Pr(Y_n^1 = j_1) = \frac{1 + \sum_{s=1}^{n} \mathbf{1}_{\{X_s = j_1\}}}{n + N}, \quad \dots, \quad
Pr(Y_n^{m-2} = j_{m-2}) = \frac{1 + \sum_{s=1}^{n} \mathbf{1}_{\{X_s = j_{m-2}\}}}{n + N}.
\]

Let w_n be the occupation vector of the spacey random walk X_n. Then X_n is a vertex-reinforced random walk defined by the map

\[
x \mapsto R(\underbrace{x \otimes \cdots \otimes x}_{m-2 \text{ terms}} \otimes I).
\]

Proof. Analogously to the second-order case, the proof is a consequence of the following chain of equalities:

\[
\begin{aligned}
Pr(X_{n+1} = i \mid F_n) &= Pr(X_{n+1} = i \mid X_n, w_n) \\
&= \sum_{j_1,\dots,j_{m-2}=1}^{N} Pr(X_{n+1} = i \mid X_n, Y_n^1 = j_1, \dots, Y_n^{m-2} = j_{m-2})\,
Pr(Y_n^1 = j_1, \dots, Y_n^{m-2} = j_{m-2} \mid w_n) \\
&= \sum_{j_1,\dots,j_{m-2}=1}^{N} P_{i,X_n,j_1,\dots,j_{m-2}}\, w_n(j_1) \cdots w_n(j_{m-2}) \\
&= R(\underbrace{w_n \otimes \cdots \otimes w_n}_{m-2 \text{ terms}} \otimes I)_{i, X_n}.
\end{aligned}
\]

Then we can generalize the notion of spacey random surfer. Consider a surfer following the spacey random walk model with probability α and a fixed distribution v with probability 1 − α. A stationary distribution of a spacey random surfer must satisfy the following equation:

\[
x = \alpha R(\underbrace{x \otimes \cdots \otimes x}_{m-1 \text{ terms}}) + (1 - \alpha) v, \tag{2.8}
\]

that is, the generalized multilinear PageRank problem of order m − 1. A uniqueness result for the general mth-order multilinear PageRank has been proved in [6].

Proposition 2.7. Let P be an mth-order tensor representing an (m−1)st-order Markov chain. Let α < 1/(m−1) and let v be a stochastic vector. Then there exists a unique stochastic solution of equation (2.8).

Proof. A multilinear PageRank vector x always exists because it is a special case of the stationary distribution vector considered by Li and Ng [12].

Suppose that there exist two distinct stochastic solutions x and y of (2.8); then we have

\[
x = \alpha R(\underbrace{x \otimes \cdots \otimes x}_{m-1 \text{ terms}}) + (1 - \alpha) v, \qquad
y = \alpha R(\underbrace{y \otimes \cdots \otimes y}_{m-1 \text{ terms}}) + (1 - \alpha) v.
\]

Subtracting the two equations, taking 1-norms, and using the following property of the Kronecker product (see [6])

\[
\| \underbrace{x \otimes \cdots \otimes x}_{m-1 \text{ terms}} - \underbrace{y \otimes \cdots \otimes y}_{m-1 \text{ terms}} \|_1 \le (m - 1)\, \|x - y\|_1,
\]

we obtain

\[
\|x - y\|_1 \le \alpha (m - 1)\, \|x - y\|_1 < \|x - y\|_1,
\]

which is a contradiction. Hence the stochastic solution is unique.

Chapter 3

Algorithms

We now present some algorithms to find stochastic solutions of the vector equation (2.7). Their convergence depends on the value of α, and we can distinguish two cases. For α ≤ 1/2, the subcritical case, all the algorithms converge to the unique stochastic solution (α = 1/2 is sometimes an exception). For α > 1/2, the supercritical case, which is the more challenging one, there may be several stochastic solutions and convergence fails for several algorithms in several numerical tests. Finally, we present a pair of Perron-based algorithms, developed in [7], able to compute stochastic solutions also in the supercritical case by exploiting the minimal nonnegative solution.

3.1 Fixed-point iteration

Consider the map G(x) = αR(x ⊗ x) + (1 − α)v defined in Section 2.4. The iteration defined by this map has been studied in Section 2.4, so we can state the following theorem.

Theorem 3.1. Let α ≤ 1/2. Let P be a third-order stochastic tensor and R its flattening along the first index, let v be a stochastic vector and x_0 = 0. Then the sequence generated by the fixed-point iteration

\[
x_{k+1} = G(x_k)
\]

converges to the unique stochastic solution of the multilinear PageRank problem (2.7).

Proof. Proposition 2.5 implies that there exists a unique stochastic solution of (2.7), and Lemma 2.2 implies the convergence of the fixed-point iteration.

Lemma 2.2 shows that the fixed-point iteration yields a monotonically convergent sequence of nonnegative vectors, using only sums and products of nonnegative numbers. Thus, computing the stochastic solution with this fixed-point iteration in the subcritical case is fast and free of numerical issues.

3.2 Shifted fixed-point iteration

Let us introduce a shift γ ≥ 0 and consider the following iteration:

\[
x_{k+1} = \frac{\alpha}{1 + \gamma} R(x_k \otimes x_k) + \frac{1 - \alpha}{1 + \gamma} v + \frac{\gamma}{1 + \gamma} x_k.
\]

This modification of the simple fixed-point iteration is called the shifted fixed-point iteration. The next theorem explores the convergence properties of this method. We will use the notation ||.||_1 to denote the 1-norm of vectors and matrices.

Theorem 3.2. Let α < 1/2. Let P be a third-order stochastic tensor and R its flattening along the first index, and let v and x_0 be stochastic vectors. The sequence generated by the shifted fixed-point iteration

\[
x_{k+1} = \frac{\alpha}{1 + \gamma} R(x_k \otimes x_k) + \frac{1 - \alpha}{1 + \gamma} v + \frac{\gamma}{1 + \gamma} x_k
\]

converges to the unique stochastic solution of the multilinear PageRank problem (2.7).

Proof. Let α < 1/2 and let y be the unique stochastic solution of (2.7). Then the following inequalities hold true:

\[
\|y - x_{k+1}\|_1
\le \frac{\alpha}{1 + \gamma} \|R(y \otimes y - x_k \otimes x_k)\|_1 + \frac{\gamma}{1 + \gamma} \|y - x_k\|_1
\le \frac{2\alpha}{1 + \gamma} \underbrace{\|R\|_1}_{1} \|y - x_k\|_1 + \frac{\gamma}{1 + \gamma} \|y - x_k\|_1
= \frac{2\alpha + \gamma}{1 + \gamma} \|y - x_k\|_1,
\]

where we have used the property of the Kronecker product ||a ⊗ b − c ⊗ d||_1 ≤ ||a − c||_1 + ||b − d||_1 for stochastic vectors a, b, c, d (see [6] for the proof).

Finally, by induction we can prove that

\[
\|y - x_k\|_1 \le \left( \frac{2\alpha + \gamma}{1 + \gamma} \right)^k \|x_0 - y\|_1 \le 2 \left( \frac{2\alpha + \gamma}{1 + \gamma} \right)^k.
\]

We have used the stochasticity of all iterates x_k, the fact that y is stochastic, and ||x_0 − y||_1 ≤ 2. Now, since the hypothesis α < 1/2 holds, the ratio (2α + γ)/(1 + γ) is less than 1 and the sequence converges.

The argument above identifies a rate of convergence in the case α < 1/2, but it does not settle the case α ≥ 1/2.

Nevertheless, we can prove a convergence theorem for every value of α, as done in Section 2.4 for the fixed-point iteration. The limiting vector is the minimal nonnegative solution, which is stochastic in the subcritical case but not in the supercritical case. Precisely, we can apply Theorem 1.6 taking

\[
l(x) = \frac{\alpha}{1+\gamma} R(x \otimes x) + \frac{\gamma}{1+\gamma} x + \frac{1-\alpha}{1+\gamma} v, \qquad
a = \frac{1-\alpha}{1+\gamma} v, \qquad
b(x, x) = \frac{\alpha}{1+\gamma} R(x \otimes x) + \frac{\gamma}{1+\gamma} x, \qquad
z = [1, \dots, 1]^t, \qquad x_0 = 0.
\]

Algorithm 1 below shows a straightforward implementation of the shifted fixed-point iteration. The choice γ = 0 recovers the plain fixed-point iteration. The maximum number of iterations depends on the convergence rate computed in Theorem 3.2.

Data: x_0, v, α, γ, R, a tolerance ε, a maximum number of iterations N
Result: An approximation to the solution m

k = 1;
x = x_0;
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε and k < N do
    y = R(x ⊗ x);
    x = (α/(1+γ)) y + ((1−α)/(1+γ)) v + (γ/(1+γ)) x;
    k = k + 1;
end
m = x;

Algorithm 1: Shifted fixed-point iteration
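For concreteness, a small Python/NumPy sketch of Algorithm 1 might look as follows; the function name shifted_fixed_point and the default parameters are illustrative choices, not part of the thesis. Starting from x_0 = 0 yields the minimal nonnegative solution, which in the subcritical case is the stochastic solution.

import numpy as np

def shifted_fixed_point(R, v, alpha, gamma=0.0, tol=1e-10, max_iter=10_000):
    """Shifted fixed-point iteration for x = alpha*R(x ⊗ x) + (1-alpha)*v.
    gamma = 0 gives the plain fixed-point iteration of Section 3.1."""
    x = np.zeros(v.size)                    # x_0 = 0
    for _ in range(max_iter):
        y = R @ np.kron(x, x)
        residual = alpha * y + (1 - alpha) * v - x
        if np.linalg.norm(residual, 1) <= tol:
            break
        x = (alpha * y + (1 - alpha) * v + gamma * x) / (1 + gamma)
    return x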


3.3 Inner-outer iteration

We now develop a nonlinear iteration scheme that uses multilinear PageRank, in the convergent regime, as a subroutine. To derive this method we introduce the matrix R̂ = αR + (1 − α)v e^t. Equation (2.7) becomes

\[
\hat{R}(x \otimes x) = x,
\]

or equivalently

\[
\frac{\alpha}{2} \hat{R}(x \otimes x) + \left(1 - \frac{\alpha}{2}\right) x = x.
\]

From here, a nonlinear iteration emerges:

\[
x_{k+1} = \frac{\alpha}{2} \hat{R}(x_{k+1} \otimes x_{k+1}) + \left(1 - \frac{\alpha}{2}\right) x_k.
\]

Each step involves solving a multilinear PageRank problem (α/2, R̂, x_k). In the subcritical case this method also converges.

Theorem 3.3. Let α < 1/2. Let P be a third-order stochastic tensor and R its flattening along the first index, and let v and x_0 be stochastic vectors. The sequence generated by the inner-outer iteration

\[
x_{k+1} = \frac{\alpha}{2} \hat{R}(x_{k+1} \otimes x_{k+1}) + \left(1 - \frac{\alpha}{2}\right) x_k
\]

converges to the unique stochastic solution of the multilinear PageRank problem (2.7).

Proof. Recall that this is the regime of α in which the solution is unique; call it x. We have

\[
x_{k+1} - x = \frac{\alpha}{2} \hat{R}(x_{k+1} \otimes x_{k+1} - x \otimes x) + \left(1 - \frac{\alpha}{2}\right)(x_k - x)
= \frac{\alpha^2}{2} R(x_{k+1} \otimes x_{k+1} - x \otimes x) + \left(1 - \frac{\alpha}{2}\right)(x_k - x).
\]

Using the 1-norm we get a bound for the gap x_{k+1} − x:

\[
\|x_{k+1} - x\|_1 \le \alpha^2 \|x_{k+1} - x\|_1 + \left(1 - \frac{\alpha}{2}\right) \|x_k - x\|_1,
\]

and the scheme converges linearly with rate (1 − α/2)/(1 − α^2) < 1 when α < 1/2.

Algorithm 2 shows a straightforward implementation of the inner-outer iteration. The maximum number of iterations depends on the convergence rate computed in Theorem 3.3.

Data: x_0, v, α, R, a tolerance ε, a maximum number of iterations N
Result: An approximation to the solution m

k = 1;
x = x_0;
R̂ = αR + (1 − α)v e^t;
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε and k < N do
    Solve the multilinear PageRank problem (α/2, R̂, x) with the fixed-point iteration;
    Update the value of x;
    k = k + 1;
end
m = x;

Algorithm 2: Inner-outer iteration

3.4 Inverse iteration

Observe that

\[
\alpha R(x \otimes x) = \frac{\alpha}{2} R[(x \otimes I) + (I \otimes x)]\, x.
\]

Both matrices R(x ⊗ I) and R(I ⊗ x) are stochastic when x is stochastic. Let us define S(x) = (1/2) R(x ⊗ I) + (1/2) R(I ⊗ x), a convex combination of these two matrices, which is still stochastic. Then the multilinear PageRank vector satisfies

\[
x = \alpha S(x)\, x + (1 - \alpha) v.
\]

This equation has the following interpretation: the multilinear PageRank vector is the steady-state vector of a solution-dependent Markov process. Thus, a nonlinear iteration emerges:

\[
x_{k+1} = \alpha S(x_k)\, x_{k+1} + (1 - \alpha) v.
\]

The next theorem concerns the convergence properties of the inverse iteration.

Theorem 3.4. Let α < 1/2. Let P be a third-order stochastic tensor and R its flattening along the first index, let v and x_0 be stochastic vectors, and let S(x) be defined as above. The sequence generated by the inverse iteration

\[
x_{k+1} = \alpha S(x_k)\, x_{k+1} + (1 - \alpha) v
\]

converges to the unique stochastic solution of the multilinear PageRank problem (2.7).

Proof. Consider the error at the (k + 1)st iteration:

\[
x_{k+1} - x = \frac{\alpha}{2} R[(x_k \otimes I + I \otimes x_k)\, x_{k+1} - (x \otimes I + I \otimes x)\, x]
= \frac{\alpha}{2} R[x_k \otimes x_{k+1} + x_{k+1} \otimes x_k - (x \otimes x + x \otimes x)].
\]

Taking the 1-norm, we obtain

\[
\|x_{k+1} - x\|_1 \le \frac{\alpha}{2} \left( 2\|x_{k+1} - x\|_1 + 2\|x_k - x\|_1 \right).
\]

Thus, the inverse iteration converges linearly with rate α/(1 − α) < 1 when α < 1/2.

Algorithm 3 shows a straightforward implementation of the inverse iteration. The maximum number of iterations depends on the convergence rate computed in Theorem 3.4.

Data: x_0, v, α, R, a tolerance ε, a maximum number of iterations N
Result: An approximation to the solution m

k = 1;
x = x_0;
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε and k < N do
    S = (1/2) R(x ⊗ I) + (1/2) R(I ⊗ x);
    Solve the system (I − αS)x = (1 − α)v with respect to x;
    k = k + 1;
end
m = x;

Algorithm 3: Inverse iteration
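A corresponding Python/NumPy sketch of the inverse iteration (again an illustration with a hypothetical function name, not part of the thesis) could read:

import numpy as np

def inverse_iteration(R, v, alpha, tol=1e-10, max_iter=1000):
    """Inverse iteration: at each step solve (I - alpha*S(x_k)) x_{k+1} = (1-alpha)*v,
    where S(x) = (R(x ⊗ I) + R(I ⊗ x)) / 2."""
    n = v.size
    I = np.eye(n)
    x = v.copy()
    for _ in range(max_iter):
        xc = x.reshape(-1, 1)                               # x as a column vector
        S = 0.5 * (R @ np.kron(xc, I) + R @ np.kron(I, xc)) # n-by-n stochastic matrix
        x_new = np.linalg.solve(I - alpha * S, (1 - alpha) * v)
        if np.linalg.norm(alpha * (R @ np.kron(x_new, x_new)) + (1 - alpha) * v - x_new, 1) <= tol:
            return x_new
        x = x_new
    return x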


3.5 Newton iteration

Consider Newton's method for solving the nonlinear equation

\[
f(x) := \alpha R(x \otimes x) + (1 - \alpha) v - x = 0.
\]

For a given initial approximation x_0, this method generates a sequence {x_k}_k defined by

\[
[I - \alpha R(x_k \otimes I + I \otimes x_k)]\, p_k = f(x_k), \qquad x_{k+1} = x_k + p_k. \tag{3.1}
\]

In fact, J(x) = −I + αR(x ⊗ I + I ⊗ x) is the Jacobian of f(x), so (3.1) exactly defines the Newton iteration.

The following result shows that Newton's method always converges quadratically when solving multilinear PageRank problems inside the uniqueness regime (except for α = 1/2).

Theorem 3.5. Let α < 1/2. Let P be a third-order stochastic tensor and R its flattening along the first index, and let v be a stochastic vector. The sequence {x_k}_k generated by the Newton iteration (3.1) with initial approximation x_0 = 0 converges quadratically to the unique stochastic solution of (2.7).

Proof. We divide the proof into three steps.

Step 1. Using (3.1), we compute f(x_{k+1}):

\[
\begin{aligned}
f(x_{k+1}) = f(x_k + p_k) &= \alpha R[(x_k + p_k) \otimes (x_k + p_k)] + (1 - \alpha) v - x_k - p_k \\
&= \alpha R(x_k \otimes x_k) + \alpha R(x_k \otimes p_k) + \alpha R(p_k \otimes x_k) + \alpha R(p_k \otimes p_k) + (1 - \alpha) v - x_k - p_k \\
&= \alpha R(p_k \otimes p_k) + \alpha R(x_k \otimes p_k) + \alpha R(p_k \otimes x_k) - p_k + f(x_k) \\
&= \alpha R(p_k \otimes p_k).
\end{aligned}
\]

Step 2. We now show that J(x_k) is nonsingular for each k, hence the Newton iteration is well-defined. To do so, we prove by induction that

\[
f(x_k) \ge 0, \qquad x_k \ge 0, \qquad z_k = e^t x_k \le 1.
\]

These relations clearly hold for k = 0. Suppose that f(x_k) ≥ 0, x_k ≥ 0 and z_k ≤ 1. Then J(x_k) is nonsingular because −J(x_k) = I − αR(x_k ⊗ I + I ⊗ x_k) is a strictly diagonally dominant M-matrix (in fact, R(x_k ⊗ I + I ⊗ x_k) is nonnegative, its 1-norm is 2z_k, and 2αz_k < 1 since α < 1/2 and z_k ≤ 1). Thus x_{k+1} is well-defined and

\[
x_{k+1} = x_k - J(x_k)^{-1} f(x_k).
\]

Since −J(x_k) is an invertible M-matrix, in view of Proposition 1.3 its inverse is nonnegative, and it follows that x_{k+1} ≥ 0 and p_k ≥ 0. The first step of the proof shows that also f(x_{k+1}) ≥ 0. It remains to show that z_{k+1} ≤ 1.

Multiplying equation (3.1) on the left by the row vector e^t = [1, . . . , 1]:

\[
\begin{aligned}
e^t [I - \alpha R(x_k \otimes I + I \otimes x_k)]\, p_k &= e^t f(x_k) \\
e^t p_k - \alpha z_k [e^t p_k + e^t p_k] &= \alpha e^t(x_k \otimes x_k) + 1 - \alpha - z_k \\
(1 - 2\alpha z_k)(z_{k+1} - z_k) &= \alpha z_k^2 + 1 - \alpha - z_k.
\end{aligned}
\]

Suppose z_{k+1} > 1; then

\[
1 < z_{k+1} = \frac{-\alpha z_k^2 + 1 - \alpha}{1 - 2\alpha z_k},
\]

which implies (z_k − 1)^2 < 0, a contradiction.

Step 3. We now show that the sequence f_k := e^t f(x_k) converges to 0. Since f(x_k) ≥ 0, this is sufficient to prove the convergence of the Newton iteration. The following equalities hold true:

\[
f_{k+1} = e^t f(x_k + p_k) = \alpha e^t R(p_k \otimes p_k) = \alpha (e^t p_k)^2
= \alpha \left( \frac{f_k}{1 - 2\alpha z_k} \right)^2
= \frac{\alpha f_k^2}{(1 - 2\alpha)^2 + 4\alpha f_k},
\]

where we have used the expression of z_k in terms of f_k as a root of the quadratic equation αz_k^2 + 1 − α − z_k − f_k = 0.

Ignoring the term (1 − 2α)^2 in the denominator, we get f_{k+1} ≤ f_k / 4, and by induction

\[
f_k \le \frac{1}{4^{k-1}} f_1 = \frac{1}{4^{k-1}} \alpha (1 - \alpha)^2,
\]

where we have computed f_1 by means of f_1 = α(e^t p_0)^2 and p_0 = −J(x_0)^{-1} f(x_0). This is sufficient to prove that f_k converges to 0.

Now, it is easy to check that

\[
\lim_{k \to \infty} \frac{f_{k+1}}{f_k^2} = \frac{\alpha}{(1 - 2\alpha)^2},
\]

so f_k converges quadratically to 0.

The proof is now complete.

Now, we explore the case α = 1/2. In the proof of the previous theorem, the hypothesis α < 1/2 is used to state that the Jacobian is nonsingular at each iteration. We can replace this hypothesis with the assumptions that R is an irreducible matrix and x > 0. Actually, this is not restrictive: in view of the probabilistic interpretation of multilinear PageRank, zero entries in the solution can appear only when the second-order Markov chain associated with R is not irreducible.

With these assumptions, Theorem 13 in [8] guarantees that J(x) is nonsingular for each vector x ≤ x̂, where x̂ is the minimal nonnegative solution (which is stochastic). Theorem 3.5 shows the monotonic convergence of Newton's iteration. Applying both these results, we can state that the sequence x_k is well-defined.

According to [8], we can apply this argument also in the supercritical regime, but then Newton's iteration converges to the minimal nonnegative solution, which is not stochastic. In the next section, we present an algorithm to compute a stochastic solution by deflating the minimal nonnegative solution.

Notice that in the case α = 1/2 linear convergence holds instead of quadratic convergence. In fact, by Step 3 of Theorem 3.5, we can only state that lim_k f_{k+1}/f_k = 1/4.

Data: v, α, R, a tolerance ε
Result: An approximation to the solution m

x = 0;
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε do
    Solve the system [I − αR(x ⊗ I + I ⊗ x)]p = f(x) with respect to p;
    x = x + p;
end
m = x;

Algorithm 4: Newton iteration
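A minimal Python/NumPy sketch of Algorithm 4 (an illustration, with a hypothetical function name) might look as follows.

import numpy as np

def newton_multilinear_pagerank(R, v, alpha, tol=1e-12, max_iter=100):
    """Newton iteration for f(x) = alpha*R(x ⊗ x) + (1-alpha)*v - x = 0,
    started from x = 0 so that the iterates stay nonnegative."""
    n = v.size
    I = np.eye(n)
    x = np.zeros(n)
    for _ in range(max_iter):
        f = alpha * (R @ np.kron(x, x)) + (1 - alpha) * v - x
        if np.linalg.norm(f, 1) <= tol:
            break
        xc = x.reshape(-1, 1)
        J = I - alpha * (R @ (np.kron(xc, I) + np.kron(I, xc)))  # = -Jacobian of f
        x = x + np.linalg.solve(J, f)
    return x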

3.6 Perron-based algorithm

From now on, we assume we are in the supercritical case, i.e., α > 1/2, and that the minimal nonnegative solution m is available. Let e denote the column vector with all entries equal to one.

Since all the solutions x must satisfy x ≥ m, it makes sense to change variable and obtain an equation in y := x − m ≥ 0.

Using the bilinearity of (m + y) ⊗ (m + y) and the fact that m is a solution of (2.7), we get

\[
y = \alpha R(y \otimes y) + \alpha R(y \otimes m) + \alpha R(m \otimes y)
= \alpha R(y \otimes y) + G'_m y
= (\alpha R(y \otimes I) + G'_m)\, y,
\]

where G'_m is the Fréchet derivative of the map G(x) introduced in Section 2.4, i.e., G'_m = αR(I ⊗ m) + αR(m ⊗ I).

Using Proposition 2.5, we have

\[
e^t y = e^t (x - m) = 1 - \frac{1 - \alpha}{\alpha} = \frac{2\alpha - 1}{\alpha}. \tag{3.2}
\]

From now on, we assume G'_m irreducible and m > 0. These assumptions have already been discussed in the previous section. Define the matrix P_y := αR(y ⊗ I) + G'_m. Since P_y ≥ G'_m, P_y is irreducible. Moreover, if y is chosen such that e^t y = (2α − 1)/α, one gets

\[
\begin{aligned}
e^t P_y &= \alpha e^t R(y \otimes I) + \alpha e^t R(I \otimes m) + \alpha e^t R(m \otimes I) \\
&= \alpha e^t (y \otimes I) + \alpha e^t (I \otimes m) + \alpha e^t (m \otimes I) \\
&= \alpha \frac{2\alpha - 1}{\alpha} e^t + \alpha \frac{1 - \alpha}{\alpha} e^t + \alpha \frac{1 - \alpha}{\alpha} e^t = e^t,
\end{aligned}
\]

and P_y is a stochastic matrix.

Let us introduce the map PV(A) that gives the Perron vector u of an irreducible matrix A ≥ 0, normalized such that e^t u = 1. Its existence and uniqueness are guaranteed by Theorem 1.1.

Since P_y is irreducible and stochastic, equation (3.2) shows that

\[
y = \frac{2\alpha - 1}{\alpha} PV(P_y). \tag{3.3}
\]

Equation (3.3) suggests the following iteration:

\[
y_{k+1} = \frac{2\alpha - 1}{\alpha} PV(P_{y_k}),
\]

starting from a nonnegative vector y_0 such that e^t y_0 = (2α − 1)/α.

The Perron-based iteration is described in Algorithm 5.

Data: v, α, R, a tolerance ε, an initial value x_0
Result: An approximation to the stochastic solution s

Normalize x_0;
Compute the minimal solution m;
y = x_0 − m;
Normalize y: y = max{y, 0}, y = ((2α − 1)/α) · y / (e^t y);
G' = αR(m ⊗ I) + αR(I ⊗ m);
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε do
    P_y = αR(y ⊗ I) + G';
    y = ((2α − 1)/α) PV(P_y);
    x = m + y;
end
s = x;

Algorithm 5: Perron-based iteration
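A Python/NumPy sketch of Algorithm 5 follows (an illustration, not the thesis's code); it assumes the minimal solution m has already been computed, e.g., by Newton's iteration started from zero as in Section 3.5, and uses a pragmatic dense eigensolver for the Perron vector.

import numpy as np

def perron_vector(A):
    """Perron vector of an irreducible nonnegative matrix A, normalized to sum 1
    (dense eigendecomposition; fine for small examples)."""
    vals, vecs = np.linalg.eig(A)
    u = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
    return u / u.sum()

def perron_iteration(R, v, alpha, m, x0, tol=1e-10, max_iter=1000):
    """Perron-based iteration for the supercritical case alpha > 1/2."""
    n = v.size
    I = np.eye(n)
    mc = m.reshape(-1, 1)
    Gp = alpha * (R @ (np.kron(mc, I) + np.kron(I, mc)))   # G'_m
    y = np.maximum(x0 / x0.sum() - m, 0.0)
    y *= (2 * alpha - 1) / alpha / y.sum()
    x = m + y
    for _ in range(max_iter):
        if np.linalg.norm(alpha * (R @ np.kron(x, x)) + (1 - alpha) * v - x, 1) <= tol:
            break
        Py = alpha * (R @ np.kron(y.reshape(-1, 1), I)) + Gp
        y = (2 * alpha - 1) / alpha * perron_vector(Py)
        x = m + y
    return x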

3.6.1 Perron-Newton method

We can also apply Newton's method to find a solution of (3.3). First, we compute the derivative of the function w(y) = PV(P_y) along a direction z, with y ≥ 0 such that e^t y = (2α − 1)/α.

Let z be a non-zero vector and set y(h) = y + hz; hence P'_{y(h)} = αR(z ⊗ I). Since P_y is irreducible, its Perron eigenvalue is simple, so for small h we can define λ(h) as the Perron eigenvalue of P_{y(h)} and v(h) = PV(P_{y(h)}). Notice that w(y) = v(0). We now compute v'(0) by differentiating the equality P_{y(h)} v(h) = λ(h) v(h) with respect to h. We divide the computation into three steps.

• Using the properties of y and m we obtain

\[
e^t P_{y(h)} = \alpha e^t R(y \otimes I) + \alpha h\, e^t R(z \otimes I) + e^t G'_m
= \alpha e^t (y \otimes I) + \alpha h\, e^t (z \otimes I) + e^t G'_m
= (2\alpha - 1) e^t + \alpha h (e^t z) e^t + 2(1 - \alpha) e^t
= [1 + \alpha h (e^t z)]\, e^t.
\]

Therefore the Perron eigenvalue of P_{y(h)} is λ(h) = 1 + hα e^t z and its derivative is λ'(h) = α e^t z.

• Differentiating P_{y(h)} v(h) = λ(h) v(h) and evaluating at h = 0, we get

\[
\alpha R(z \otimes I) v(0) + P_y v'(0) = \alpha (e^t z) v(0) + v'(0).
\]

Rearranging terms,

\[
(I - P_y)\, v'(0) = \alpha (R(I \otimes v(0)) - v(0) e^t)\, z.
\]

Since e^t v(h) = 1, we have e^t v'(0) = 0; hence

\[
(I - P_y + v(0) e^t)\, v'(0) = \alpha (R(I \otimes v(0)) - v(0) e^t)\, z.
\]

• We now prove that W = I − P_y + v(0)e^t is a nonsingular matrix. Since (I − P_y + v(0)e^t) v(0) = v(0), we find that v(0) is an eigenvector of W with eigenvalue 1. Suppose that 0 is an eigenvalue of W, so that there exists a vector d with e^t d = 1 and (I − P_y + v(0)e^t) d = 0 (d is not orthogonal to e, otherwise (I − P_y)d = 0 would hold and then d and v(0) would both be eigenvectors of P_y for the eigenvalue 1, a contradiction). Then (I − P_y)d + v(0) = 0 and also (I − P_y)^2 d = 0. This implies that 1 is not a simple eigenvalue of P_y, which is a contradiction. Now we can compute v'(0):

Since (I − P_y + v(0)e^t)^{-1} v(0) = v(0), we get

\[
v'(0) = \alpha \left( (I - P_y + v(0) e^t)^{-1} R(I \otimes v(0)) - v(0) e^t \right) z.
\]

Newton's method finds an approximation to the solution of F(y) = 0, where F(y) = y − ((2α − 1)/α) PV(P_y). By the previous computations, we can define the Jacobian

\[
F'(y) = I - (2\alpha - 1) \left( (I - P_y + w(y) e^t)^{-1} R(I \otimes w(y)) - w(y) e^t \right),
\]

and Newton's iteration is

\[
y_{k+1} = y_k - (F'(y_k))^{-1} F(y_k).
\]

Classical theorems about Newton's iteration ensure local convergence.

Theorem 3.6. Let m be the minimal nonnegative solution of (2.7) and let s be a stochastic solution. Suppose that the matrix F'(s − m) is nonsingular. Then Newton's iteration is locally convergent to the vector y = s − m.

The Perron-Newton method is described in Algorithm 6.

Data: v, α, R, a tolerance ε, an initial value x_0
Result: An approximation to the stochastic solution s

Normalize x_0;
Compute the minimal solution m;
y = x_0 − m;
Normalize y: y = max{y, 0}, y = ((2α − 1)/α) · y / (e^t y);
G' = αR(m ⊗ I) + αR(I ⊗ m);
while ||αR(x ⊗ x) + (1 − α)v − x||_1 > ε do
    P_y = αR(y ⊗ I) + G';
    w = PV(P_y);
    Solve the system (I − P_y + w e^t) z = R(I ⊗ w) with respect to z;
    F' = I + (2α − 1) w e^t − (2α − 1) z;
    y = y − F'^{-1} (y − ((2α − 1)/α) w);
    Normalize y: y = ((2α − 1)/α) · y / (e^t y);
    x = m + y;
end
s = x;

Algorithm 6: Perron-Newton iteration

3.6.2 Continuation strategy

The above algorithms are sufficient to solve several numerical tests, but when α ≈ 1 they converge very slowly or stagnate far away from the minimal solution. We now describe a continuation strategy based on the relationship between the stochastic solution s and the parameter α. The key idea is to solve a sequence of multilinear PageRank problems with increasing values of α. At each step we use the following result.

Proposition 3.7. Let s be a stochastic solution of (2.7) with α > 1/2, and suppose that I − G'_s is nonsingular. Then, for α̂ close to α, there exists a stochastic solution s_α̂ of (2.7) with α replaced by α̂ such that

\[
s_{\hat{\alpha}} = s - (I - G'_s)^{-1} (v - R(s \otimes s))(\hat{\alpha} - \alpha) + O((\hat{\alpha} - \alpha)^2).
\]

Proof. From now on, we consider the quantities s, f, G, G' as functions of the variable α. We apply the implicit function theorem to

\[
0 = f(s) = \alpha R(s \otimes s) + (1 - \alpha) v - s,
\]

obtaining

\[
\frac{\partial f}{\partial s} = \alpha R(s \otimes I) + \alpha R(I \otimes s) - I = G'_s - I \quad \text{(a nonsingular matrix)}, \qquad
\frac{\partial f}{\partial \alpha} = R(s \otimes s) - v,
\]

\[
\frac{ds}{d\alpha} = - \left( \frac{\partial f}{\partial s} \right)^{-1} \frac{\partial f}{\partial \alpha}
= - (I - G'_s)^{-1} (v - R(s \otimes s)).
\]

Note that s_α̂ is stochastic by Corollary 2.4; indeed, the continuous function e^t s cannot jump from the value (1 − α̂)/α̂ to 1 without taking intermediate values.

The thesis follows by Taylor expansion.

This result suggests a new strategy to solve the multilinear PageRank problem with α ≈ 1.

Start with α_0 = 0.6 and solve a multilinear PageRank problem, obtaining an approximation s_{α_0} to the solution. Then update α_0: call α_1 the new value, and go on as long as α_h is less than the target value α and the residual is greater than a fixed tolerance. In [7], the authors also suggest an effective method for increasing the value of α_h at each step. Let us discuss it.

Once s_{α_h} is computed, from the relation

\[
s_{\alpha_h} \approx s_{\alpha_{h-1}} + \frac{ds}{d\alpha}\Big|_{\alpha_{h-1}} (\alpha_h - \alpha_{h-1}) + \frac{1}{2} \frac{d^2 s}{d\alpha^2}\Big|_{\alpha_{h-1}} (\alpha_h - \alpha_{h-1})^2, \tag{3.4}
\]

we can derive an estimate for the second derivative d²s/dα²|_{α_{h−1}} and choose α_{h+1} such that the neglected second-order term in the approximation (3.4),

\[
\frac{1}{2} \left\| \frac{d^2 s}{d\alpha^2}\Big|_{\alpha_h} \right\|_1 (\alpha_{h+1} - \alpha_h)^2
\approx \frac{1}{2} \left\| \frac{d^2 s}{d\alpha^2}\Big|_{\alpha_{h-1}} \right\|_1 (\alpha_{h+1} - \alpha_h)^2,
\]

is less than a fixed threshold τ.

Data: v, α > 0.6, R, a tolerance ε, a threshold τ for the second derivative
Result: An approximation to the stochastic solution s

α_0 = 0.6;
h = 0;
x = a stochastic solution with parameters R, α_0, v;
while α_h < α do
    if it is the first step then
        β = 0.01;
    else
        β = 2||x − x_old||_1 / (α_h − α_{h−1})^2;
    end
    StepSize = sqrt(2τ/β);
    α_{h+1} = α_h + StepSize;
    if α_{h+1} > α then
        α_{h+1} = α;
    end
    x_old = x;
    x_0 = x − (I − G'_x)^{-1}(v − R(x ⊗ x))(α_{h+1} − α_h);
    x = a stochastic solution with parameters R, α_{h+1}, v and initial approximation x_0;
    h = h + 1;
end
s = x;

Algorithm 7: Continuation method

3.7 Computational issues

In this section we deal with the speed of convergence, the computational cost and the numerical stability of the algorithms above. We report our considerations in Table 3.1.

Recall that the fixed-point iteration, the inner-outer iteration, the inverse iteration and the Perron iteration converge linearly, while the Perron-Newton iteration converges quadratically (using Newton's iteration as a subroutine).

Notice that the fixed-point iteration converges faster than the inner-outer iteration, while the inverse iteration has the smallest contraction factor of the three.

Table 3.1: Computational cost and numerical stability of Algorithms 1, 2, 3, 5, 6. The parameter n is the number of rows of the matrix R, which is the same for all algorithms.

                                   1        2        3        5        6
Computational cost per iteration   O(n^2)   O(n^2)   O(n^3)   O(n^3)   O(n^3)
Stability                          YES      YES      YES      YES      -

Algorithms 1, 2 and 4 require only sums and products of nonnegative matrices, so they are free of numerical issues. The same is true for Algorithm 3, since at each iteration it calls for solving a linear system by inverting an M-matrix, which is a stable task. Instead, the Perron-Newton iteration calls for solving a linear system without good theoretical properties, so numerical stability is not guaranteed, but numerical experiments show that it may outperform the other methods (see Chapter 5).


Chapter 4

Graph clustering based on third-order structures

In this chapter we focus on an application of multilinear PageRank to graph clustering, exploiting the higher-order structure of a graph. In [3], a "tensor spectral clustering" algorithm has been developed in order to exploit third-order structures and others. Moreover, for this purpose, multilinear PageRank is used also with high values of α; hence the Perron-Newton method can be applied, since it may outperform the other methods in the supercritical case. We also discuss a different model, proposed in [9], that may allow us to store and manage large networks in a more efficient way.

4.1 Transition matrix and transition tensor

First, we recall notions about the transition matrix associated with a graph and how it may be used to partition a graph into clusters. Then, we generalize these notions to tensors in order to represent higher-order structures of interest.

[Figure 4.1: Triangle between nodes 1, 2, 3 in an undirected, unweighted graph. We fill the entries (1, 2, 3) of the adjacency tensor T and its permutations with the value 1.]

Consider an undirected graph G = (V, E), with n = |V| and m = |E|. Let A ∈ R_+^{n×n} be the adjacency matrix of G, i.e., a_{i,j} = 1 if (i, j) ∈ E and a_{i,j} = 0 otherwise. Let D be the diagonal matrix of the degrees of the vertices of G; in other words, D = diag(Ae), where e is the vector of all ones. Define the Laplacian matrix K = D − A.

The matrix P = A^t D^{-1} is a stochastic matrix, which we call the transition matrix. We may interpret this matrix as a Markov chain that represents a random walk on the graph G.

We now show how the second left eigenvector of the Markov chain described here is key to spectral clustering.

An important tool for measuring the "quality" of a spectral clustering is the so-called conductance of a subset S of V.

Definition 4.1. Let S be a subset of V. The conductance of S is

\[
\phi(S) = \frac{\mathrm{cut}(S)}{\min\left( \mathrm{vol}(S), \mathrm{vol}(S^c) \right)},
\]

where cut(S) = |{(u, v) ∈ E | u ∈ S, v ∈ S^c}|, vol(S) = |{(u, v) ∈ E | u ∈ S}| and S^c = V \ S.
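For illustration, a small Python/NumPy sketch of this definition (not part of the thesis; the helper name conductance is mine) follows.

import numpy as np

def conductance(A, S):
    """Conductance of a vertex subset S (boolean mask) for an undirected graph
    with symmetric 0/1 adjacency matrix A."""
    S = np.asarray(S, dtype=bool)
    cut = A[np.ix_(S, ~S)].sum()              # edges crossing the cut
    vol_S = A[S, :].sum()                     # edge endpoints inside S
    vol_Sc = A[~S, :].sum()
    return cut / min(vol_S, vol_Sc)

# Example: two triangles joined by a single edge
A = np.zeros((6, 6), dtype=int)
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[u, v] = A[v, u] = 1
print(conductance(A, [True, True, True, False, False, False]))  # 1/7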

Small conductance indicates a good partition of the graph based only on the edges. Later, we generalize this notion in order to capture a good partition based on a higher-order structure.

It is well known that minimizing the conductance is an NP-hard problem; an approximation of this task is the following optimization problem [3]:

\[
\min_{z \in \mathbb{R}^n} \frac{z^t K z}{z^t D z}, \qquad \text{s.t. } e^t D z = 0, \quad \|z\|_2 = 1.
\]

The latter is equivalent to the generalized eigenvalue problem Kz = λDz, where the matrices K and D are positive semi-definite. The solution is the eigenvector related to the second smallest eigenvalue; in fact, the smallest eigenvalue is 0 and corresponds to the trivial solution z = e, since Ke = 0.

We rewrite this problem in terms of P:

\[
Kz = \lambda D z \iff (I - D^{-1}A)z = \lambda z \iff z^t P = (1 - \lambda) z^t,
\]

where 1 − λ is the second largest eigenvalue of P, since 1 is the largest one and e^t P = e^t.

Now we have to describe how to convert the real-valued solution z into a partition of the underlying graph. We will do this later, when we describe the generalized problem in terms of tensors.

The key ingredients for spectral clustering discussed above were a transition matrix built from an undirected graph, a Markov chain interpretation of the transition matrix, and the second left eigenvector of the Markov chain. We now generalize these ideas to third-order network structures.

Definition 4.2. Let G = (V, E) be an undirected graph. Consider a symmetric tensor T representing a third-order structure. The transition tensor P is defined as

\[
P_{i,j,k} = \frac{T_{i,j,k}}{\sum_{i=1}^{n} T_{i,j,k}}, \qquad 1 \le i, j, k \le n.
\]

In the case that $\sum_{i=1}^{n} T_{i,j,k} = 0$, we fill in P_{:,j,k} with a stochastic vector u, called the dangling distribution vector.

Definition 4.3. Let G = (V, E) be an undirected graph and T its representing tensor as in Definition 4.2. We define the cut and volume measures for a non-empty subset S ⊆ V:

\[
\mathrm{cut}_3(S) = \sum_{i>j>k \in V} T_{i,j,k} - \sum_{i>j>k \in S} T_{i,j,k} - \sum_{i>j>k \in S^c} T_{i,j,k}, \qquad
\mathrm{vol}_3(S) = \sum_{i \in S,\; j>k \in V} T_{i,j,k}.
\]

Now the conductance is

\[
\phi_3(S) = \frac{\mathrm{cut}_3(S)}{\min\left( \mathrm{vol}_3(S), \mathrm{vol}_3(S^c) \right)}.
\]

Let us make a few comments about the tensor T.

Since T represents a third-order structure in an undirected graph, we expect that all entries with at least two equal indices are 0, and the other entries are 0 or 1, according to the underlying graph. Nevertheless, we may represent some third-order structures in directed networks following other rules. For instance, nodes i, j, k form two directed 3-cycles in a directed graph if every possible directed edge between them is present, and they form one directed 3-cycle if only one cycle between them is present. In order to capture this structure we can fill the tensor T as follows:

\[
T_{i,j,k} =
\begin{cases}
2 & \text{if } i, j, k \text{ form two directed 3-cycles}, \\
1 & \text{if } i, j, k \text{ form one directed 3-cycle}, \\
0 & \text{otherwise}.
\end{cases}
\]

Notice that also in this case the tensor T is symmetric, and we can build the transition tensor P as in Definition 4.2.

4.2 Tensor spectral clustering

Entries of the transition tensor P can be interpreted as the transition probabilities of a second-order Markov chain. Now, we can exploit the tools described in Chapter 2 in order to interpret the stochastic tensor P as a spacey random walk on the graph G. We know that the stationary distribu-tion of this process satisfies the equadistribu-tion

x = αR(x ⊗ x) + (1 − α)v,

where α ∈ (0, 1) and R is the same defined in Chapter 2. We use Rk= P•,•,k

to denote the kth n × n block of R.

Let us re-interpret x as a stationary distribution of a particular PageRank problem.

Specifically, define the matrix P (x) =

n

X

k=1

xkRk.

(45)

4.2. TENSOR SPECTRAL CLUSTERING 41 1. Note that R(x ⊗ x) = n X k=1 Rk(xkx) = Xn k=1 Rkxk  x = P (x)x.

Therefore, the multilinear PageRank vector satisfies

x = αP (x)x + (1 − α)v. (4.1)

Equation (4.1) is a first-order PageRank equation where the transition matrix depends on the solution itself.

Analogously to the first part of this chapter, we can interpret the left eigenvector of P(x) associated with its second largest eigenvalue as the key for partitioning the graph G. Specifically, we start from a transition tensor P representing a third order structure and we build a particular first-order Markov chain associated with the tensor P. Heuristically, P(x) is a weighted sum of n views of the graph (the matrices Rk), one from the perspective of each node, according to the three-dimensional structure. If node k has a large influence on the three-dimensional structure, then xk will be large and we will weight the data associated with node k more heavily.

Hence, minimizing the conductance φ3 can be approximated similarly to the first part of this chapter, and the eigenvector z associated with the second largest eigenvalue of P(x), i.e. its second left eigenvector, can be interpreted as the solution of our problem.

We now describe how to convert z into a partition of the graph G. First, we sort the vertices by means of the values zi and then consider vertex sets Sk that consist of the first k nodes in the sorted vertex list. In other words, if σi is the index of the i-th smallest element of z, then Sk = {σ1, σ2, . . . , σk}. We then choose S = arg min_{Sk} φ3(Sk).
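A direct transcription of this sweep could look as follows; here phi3 stands for any routine computing the conductance of a prefix set (e.g. the sketch after Definition 4.3), and a more efficient evaluation is discussed in Chapter 5.

import numpy as np

def sweep_cut(z, phi3):
    """Sort the vertices by the entries of z, form the prefix sets
    S_k = {sigma_1, ..., sigma_k} and return the one of minimum conductance."""
    sigma = np.argsort(z)              # sigma[i] = index of the (i+1)-th smallest entry
    best_S, best_phi = None, np.inf
    for k in range(1, len(z)):         # nontrivial prefixes only
        S = sigma[:k].tolist()
        phi = phi3(S)
        if phi < best_phi:
            best_S, best_phi = S, phi
    return best_S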

Algorithm 8 shows a straightforward implementation of this spectral clustering method.


Data: G = (V, E), |V| = n, a nonnegative tensor T, a dangling vector u, α ∈ (0, 1), a stochastic vector v
Result: Set of nodes S ⊆ V
for 1 ≤ j, k ≤ n such that Σ_i Ti,j,k ≠ 0 do
    P:,j,k = T:,j,k / Σ_i Ti,j,k;
end
for j, k such that Σ_i Ti,j,k = 0 do
    P:,j,k = u;
end
x = MultilinearPageRank(α, P, v);
Rk = P(:, :, k) for each k;
P(x) = Σ_k xk Rk;
Compute the second left eigenvector z of P(x);
σ = sort(z);
Sk = {σ1, . . . , σk} for each k;
S = arg min_{Sk} φ3(Sk);
Algorithm 8: Tensor spectral clustering
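Putting the previous sketches together, a minimal dense implementation of Algorithm 8 might read as follows; multilinear_pagerank stands for any solver from Chapter 3 passed in as a callable, and the other helpers (transition_tensor, P_of_x, conductance3, sweep_cut) are the hypothetical functions sketched earlier, not the thesis code.

import numpy as np

def tensor_spectral_clustering(T, u, v, alpha, multilinear_pagerank):
    """Dense sketch of Algorithm 8."""
    P = transition_tensor(T, u)                 # Definition 4.2
    x = multilinear_pagerank(alpha, P, v)       # multilinear PageRank vector
    Px = P_of_x(P, x)                           # P(x) = sum_k x_k R_k
    # left eigenvectors of P(x) are right eigenvectors of P(x)^T;
    # take the one associated with the second largest eigenvalue
    w, V = np.linalg.eig(Px.T)
    order = np.argsort(-w.real)
    z = V[:, order[1]].real
    return sweep_cut(z, lambda S: conductance3(T, S))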

4.3 Another model for spectral clustering

In Definition 4.2 we have built the transition tensor P by normalizing the tensor T, but if Σ_{i=1}^n Ti,j,k = 0 this strategy fails. Therefore, we have introduced a dangling vector u to fill P:,j,k in that case. Because the number of pairs (j, k) such that Σ_{i=1}^n Ti,j,k = 0 could be extremely large, this is not ideal.

In [9], a different stochastic model has been developed in order to avoid this issue. In this section, we provide a description of the results of [9].

Let us define Z = {(j, k) ∈ V × V | Σ_{i=1}^n Ti,j,k ≠ 0}.

Consider a stochastic process whose transition probabilities are given by

Pr(Xt+1 = i | Xt = j, Ft) = (1 − α)vi + α Σ_k Pr(Xt+1 = i | Xt = j, Yt = k, Ft) Pr(Yt = k | Ft),

Pr(Xt+1 = i | Xt = j, Yt = k, Ft) =
    Ti,j,k / Σ_{i=1}^n Ti,j,k                  if (j, k) ∈ Z,
    (1 / (n + t)) (1 + Σ_{s=1}^t 1{Xs = i})    if (j, k) ∉ Z,

where Yt is the same random variable defined in Section 2.3 for the spacey random walk.

This new process has been called the "super-spacey random surfer". It picks a state at random, with probability proportional to the number of past visits to that state, whenever the transition is not defined, i.e., when Σ_{i=1}^n Ti,j,k = 0.

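The process can be simulated directly from the transition probabilities above. The following Monte Carlo sketch (with simplified bookkeeping of the visit counts; the function name is ours) returns the empirical occupation frequencies, which approximate the stationary distribution discussed next.

import numpy as np

def simulate_super_spacey(T, v, alpha, steps, seed=None):
    """Monte Carlo simulation of the super-spacey random surfer."""
    rng = np.random.default_rng(seed)
    n = T.shape[0]
    counts = np.ones(n)                 # 1 + number of past visits to each state
    visits = np.zeros(n)
    X = rng.choice(n, p=v)
    for _ in range(steps):
        history = counts / counts.sum()
        if rng.random() > alpha:
            X = rng.choice(n, p=v)              # teleport with probability 1 - alpha
        else:
            Y = rng.choice(n, p=history)        # spacey choice of Y_t
            col = T[:, X, Y]
            s = col.sum()
            if s > 0:                           # (X, Y) in Z: follow the tensor
                X = rng.choice(n, p=col / s)
            else:                               # undefined column: draw from history
                X = rng.choice(n, p=history)
        counts[X] += 1
        visits[X] += 1
    return visits / steps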

Let P be the normalized tensor, with entries Ti,j,k / Σ_{i=1}^n Ti,j,k only for (j, k) ∈ Z.

Notice that P is not a transition tensor since P:,j,k = 0 for some (j, k).

Let x be a stationary distribution for a super-spacey random surfer and let R be the flattening of P along the first index. Then,

x = αR(x ⊗ x) + α(1 − ||R(x ⊗ x)||1)x + (1 − α)v. (4.2)

See [9] for a heuristic proof.

Let us rewrite this equation as the stationary distribution equation of a Markov chain. Define P̃(x) = P(x) + x(e^t − e^t P(x)), where P(x) = Σ_k xk Rk. The matrix P̃(x) is stochastic and a stationary distribution of a super-spacey random surfer satisfies

x = αP̃(x)x + (1 − α)v. (4.3)

Equation (4.3) is similar to equation (4.1). This similarity allows us to develop a spectral clustering algorithm based on the super-spacey random walk. Again, the key is the left eigenvector associated with the second largest eigenvalue of P̃(x). Thus, a new issue arises: computing a stationary distribution x. Later, we will deal with this issue, adapting some algorithms of Chapter 3 in order to solve (4.3).
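For reference, a small dense sketch (with hypothetical function names) of the construction of P̃(x) and of the residual of (4.3) could be the following.

import numpy as np

def P_tilde(P, x):
    """P~(x) = P(x) + x (e^t - e^t P(x)), where P(x) = sum_k x_k R_k and the
    tensor P may have zero columns for the pairs (j, k) outside Z."""
    n = P.shape[0]
    Px = sum(x[k] * P[:, :, k] for k in range(n))
    deficiency = 1.0 - Px.sum(axis=0)           # e^t - e^t P(x), column by column
    return Px + np.outer(x, deficiency)

def super_spacey_residual(P, x, v, alpha):
    """1-norm residual of x = alpha * P~(x) x + (1 - alpha) v, equation (4.3)."""
    return np.linalg.norm(x - alpha * P_tilde(P, x) @ x - (1 - alpha) * v, 1)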

The new spectral clustering algorithm is a slight modification of Algorithm 8, based on the tools introduced above. Algorithm 9 shows a suitable implementation.


Data: G = (V, E), |V| = n, a nonnegative tensor T, α ∈ (0, 1), a stochastic vector v
Result: Set of nodes S ⊆ V
for 1 ≤ j, k ≤ n such that Σ_i Ti,j,k ≠ 0 do
    P:,j,k = T:,j,k / Σ_i Ti,j,k;
end
for j, k such that Σ_i Ti,j,k = 0 do
    P:,j,k = vector of all zeros;
end
x = stationary distribution of the super-spacey random walk defined by P, α, v;
Compute P̃(x);
Compute the second left eigenvector z of P̃(x);
σ = sort(z);
Sk = {σ1, . . . , σk} for each k;
S = arg min_{Sk} φ3(Sk);
Algorithm 9: Tensor spectral clustering 2

4.4 Stationary distribution of a super-spacey random walk

In this section, we present some useful algorithms to compute an approximation of the stationary distribution of the super-spacey random walk. The first algorithm, outlined in Proposition 4.2, has been presented in [9]. The second algorithm, introduced and analysed in Proposition 4.3, is a modification of the former that we propose in order to improve the computational efficiency. We also discuss the uniqueness of the solutions to (4.3).

Proposition 4.1. Consider a super-spacey random walk as defined above. If α < 1/5, then there exists a unique stationary distribution x satisfying equation (4.3).

Proof. Assume x and y are two solutions of (4.3). Let rx = ||R(x ⊗ x)||1 and ry = ||R(y ⊗ y)||1. Subtracting the two copies of equation (4.2) gives

x − y = α[R(x ⊗ x) − R(y ⊗ y)] + α[(1 − rx)x − (1 − ry)y].

Exploiting the same property of the Kronecker product used in Chapter 3, we obtain

||x − y||1 ≤ 2α||x − y||1 + α||(1 − rx)(x − y)||1 + α|rx − ry| ≤ 5α||x − y||1.

Therefore x = y whenever α < 1/5.

Proposition 4.2. Consider a super-spacey random walk defined by a tensor P, α < 1/5 and a stochastic vector v. Let x0 be a stochastic vector. The sequence generated by the fixed-point iteration

xk+1 = αP̃(xk)xk + (1 − α)v

converges linearly to the unique stationary distribution x.

Proof. The proof is a simple modification of the proof of the next proposition. In particular, it turns out that the sequence (xk)k converges linearly with convergence rate 5α.
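A sketch of this fixed-point iteration, reusing the hypothetical helper P_tilde introduced earlier and starting from x0 = v:

import numpy as np

def super_spacey_fixed_point(P, v, alpha, tol=1e-10, maxit=10_000):
    """Fixed-point iteration x_{k+1} = alpha * P~(x_k) x_k + (1 - alpha) v
    of Proposition 4.2; guaranteed to converge for alpha < 1/5."""
    x = v.copy()
    for _ in range(maxit):
        x_new = alpha * (P_tilde(P, x) @ x) + (1 - alpha) * v
        if np.linalg.norm(x_new - x, 1) < tol:
            return x_new
        x = x_new
    return x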

Here, we propose a new, effective modification of the iteration above, which is explained in detail in the next proposition.

Proposition 4.3. Consider a super-spacey random walk defined by a tensor P, α < 1/5 and a stochastic vector v. Let x0 be a stochastic vector. The sequence generated by the inverse iteration

xk+1 = αP̃(xk)xk+1 + (1 − α)v

converges linearly to the unique stationary distribution x.

Proof. Consider the error at the (k + 1)-st step:

xk+1 − x = αP̃(xk)xk+1 − αP̃(x)x
         = α[R(xk ⊗ I)xk+1 + xk(e^t − e^t R(xk ⊗ I))xk+1 − R(x ⊗ I)x − x(e^t − e^t R(x ⊗ I))x]
         = α[R(xk ⊗ xk+1 − x ⊗ x) + x(1 − r) − xk(1 − rk)],

where r = e^t R(x ⊗ x) and rk = e^t R(xk ⊗ xk+1).

Taking the 1-norm and exploiting the same property of the Kronecker product used in Chapter 3, we obtain

||xk+1 − x||1 ≤ α(||xk − x||1 + ||xk+1 − x||1) + α||(1 − r)x − (1 − rk)xk||1.

The last term is

||(1 − r)x − (1 − rk)xk||1 = ||(1 − r)(x − xk) − rxk + rkxk||1
    ≤ ||(1 − r)(x − xk)||1 + ||xk||1 |r − rk|        (with ||xk||1 = 1)
    ≤ ||x − xk||1 + |e^t R(x ⊗ x − xk ⊗ xk+1)|
    ≤ ||x − xk||1 + ||R(x ⊗ x − xk ⊗ xk+1)||1
    ≤ 2||x − xk||1 + ||x − xk+1||1.

Finally,

||xk+1 − x||1 ≤ α||xk − x||1 + α||xk+1 − x||1 + 2α||xk − x||1 + α||xk+1 − x||1,
(1 − 2α)||xk+1 − x||1 ≤ 3α||xk − x||1.

Therefore, the scheme converges linearly with rate 3α/(1 − 2α) < 1 if α < 1/5.

Let us make a few comments.

Remark 1. In the convergent regime, i.e., if α < 1/5, the fixed-point iteration presented in Proposition 4.2 has a larger convergence factor (5α versus 3α/(1 − 2α)) than the inverse iteration of Proposition 4.3, but the latter requires solving a linear system at each step. However, the matrix of that system is a nonsingular M-matrix at each step. This means that the inverse iteration is stable and free of numerical issues. Figure 4.2 shows that the inverse iteration may outperform the fixed-point iteration.

Remark 2. The operator P̃ is nonlinear in the variable x, so the theory about the quadratic vector equation, used in Section 2.4, does not apply. This means that computing a stochastic solution to (4.3) could be hard in the case α > 1/5. For this reason, Algorithm 8 is a good alternative. Actually, both Algorithms 8 and 9 are based on some heuristic ideas, so numerical experiments will be an important tool to find out when one is better than the other.

Remark 3. Both Algorithms 8 and 9 call for computing the second left eigenvector of a stochastic matrix. We will discuss this issue, from the computational point of view, in the next chapter.

Remark 4. Both Algorithms 8 and 9 call for computing the conductance φ3 of a subset S ⊆ V. This issue, too, will be discussed in the next chapter.

Remark 5. The problem discussed in this chapter could be generalized beyond the third order structure. It is possible to build an analogous graph clustering procedure based on higher-order structures.
