• Non ci sono risultati.

Link-based Ranking

N/A
N/A
Protected

Academic year: 2021

Condividi "Link-based Ranking"

Copied!
12
0
0

Testo completo

(1)

Ranking

Link-based Ranking

(2° generation)

Reading 21

(2)

Query-independent ordering

First generation: using link counts as simple measures of popularity.

Undirected popularity:

Each page gets a score given by the number of in-links plus the number of out-links (es. 3+2=5).

Directed popularity:

Score of a page = number of its in-links (es. 3).

Easy to SPAM

(3)

Second generation:

PageRank

Each link has its own importance!!

PageRank is

independent of the query

many interpretations…

(4)

Basic Intuition…

What about nodes with no in/out links?

(5)

Google’ s Pagerank



else j i i

j out

i

0

) (

# 1

,

N j

out j i r

r

i B j

) 1 1

) ( (

#

) ) (

(

) (

 

B(i)B(i) : set of pages linking to i. : set of pages linking to i.

#out(j)

#out(j) : number of outgoing links from j. : number of outgoing links from j.

ee : vector of components 1/sqrt{N}. : vector of components 1/sqrt{N}.

Random jump

Principal eigenvector

r = [  

T

+ (1-) e e

T

] × r

(6)

Three different interpretations

Graph (intuitive interpretation)

Co-citation

Matrix (easy for computation)

Eigenvector computation or a linear system solution

Markov Chain (useful to prove convergence)

a sort of Usage Simulation

Any node

Neighbors

In the steady state” each



page has a long-term visit rate

- use this as the page’ s score.

(7)

Pagerank: use in Search Engines

Preprocessing:

Given graph, build matrix

Compute its principal eigenvector r

r[i] is the pagerank of page i

We are interested in the relative order

Query processing:

Retrieve pages containing query terms

Rank them by their Pagerank

The final order is query- independent

 

T

+ (1-) e e

T

(8)

HITS: Hypertext Induced Topic

Search

(9)

Calculating HITS

It is query- dependent

Produces two scores per page:

Authority score: a good authority page for a topic is pointed to by many good hubs for that topic.

Hub score: A good hub page for a topic points to many authoritative pages for that topic.

(10)

Authority and Hub scores

2

3

4

1 1

5

6

7

a(1) = h(2) + h(3) + h(4) h(1) = a(5) + a(6) + a(7)

(11)

HITS: Link Analysis Computation

Where

a: Vector of Authority’ s scores h: Vector of Hub’ s scores.

A: Adjacency matrix in which ai,j = 1 if ij

h AA

h

Aa A

a Aa

h

h A

a

T T T

 

 

Thus, h is an eigenvector of AA

t

a is an eigenvector of A

t

A

Symmetric matrices

(12)

Weighting links

Weight more if the query occurs in the

neighborhood of the link (e.g. anchor text).

y x

y a

x h

) (

) (

x y

y h x

a

) (

)

( ( ) ( , ) ( )

) ( )

, ( )

(

y h y

x w x

a

y a y

x w x

h

x y

y x

Riferimenti

Documenti correlati

Government Printing Office , Social Security Administration (US)..

This paper reports an experimental study of transient natural convection heat and mass transfer between two large cylindrical reservoirs connected through a short

[r]

[r]

To drill a cube is to partition it into 27 congruent subcubes and remove the closed central cube along with the 6 remaining subcubes sharing a face with that cube, resulting in a

Solution proposed by Roberto Tauraso, Dipartimento di Matematica, Universit`a di Firenze, viale Morgagni 67/A, 50134 Firenze, Italy.. We denote by σ(n) the sum of the divisors of

We look for small (sometimes “minimal”) set of generators for the fields K(E[m]) generated by the m-torsion points of an elliptic curve E (where K is any number field).. For m = 3

For the point-like source search using the direction information of upward-going muons, we evaluate the background due to atmospheric neutrino induced muons randomly mixing for