• Non ci sono risultati.

Information Retrieval 16 July 2019 Ex 1 [ranks 5]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 16 July 2019 Ex 1 [ranks 5]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

16 July 2019

Ex 1 [ranks 5] Let us given a graph G of directed edges {(1,3), (3,1), (1,4), (2,1)}.

Simulate the execution of one step of the PageRank algorithm, starting with all nodes having PR=1 (unnormalized), and assuming a teleportation step which favors node 3 (hence, it jumps only to it).

Ex 2 [points 5] Simulate the execution of the Consistent Hashing technique over the urls_ID = {3, 4, 9, 2, 5, 7, 12, 11} and the crawlers_ID = {1,2,3}, by defining proper hash functions in the co-domain of size m=13.

Ex 3 [ranks 4+3] Let us given a set of strings S = { pitom, dad, daddy, zoom }.

- Build a 2-gram index over S

- Given P = atom, show how your index executes the 1-edit error search in S.

Ex 4 [points 5] Show the Rocchio’s formula and comment on its application and its pros/cons.

Ex 5 [points 4+4] Given the two adjacency lists of a Web graph for the nodes 14 and 15, namely

14  3, 10, 11, 13, 14, 17, 19, 21, 25

15  5, 10, 11, 12, 14, 17, 19, 20, 21, 24, 33

show how the list of “15” can be compressed given the list of “14” by means of the algorithm adopted in WebGraph via (1) first copy-lists and (2) then copy-blocks.

Riferimenti

Documenti correlati

Assume that you use shingles of 3 words, hash functions generate random values in {0,…,10} (that you can choose as you want), and repeat min-hashing twice for estimating the

[r]

Ex 4 [points 2+4] Describe a dynamic index based on a small-big solution or a cascade of indexes, specifying the pros/cons of each of these solutions to the problem of adding

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute

Ex 5 [ranks 4] Given an annotator, such as TagMe, show how you could empower the TF-IDF vector representation in order to perform “semantic” comparisons between documents by

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]