
Information Retrieval

5 June 2015

Ex 1 [ranks 5] Given a graph G with directed edges {(1,2), (2,1), (1,4), (3,1)}.

Simulate the execution of two steps of PageRank, starting with all nodes having PR = 1 (unnormalized), and assuming a teleportation step that favors node 3.
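For reference, a minimal Python sketch of how the two unnormalized steps could be simulated follows. The damping factor alpha = 0.85, the teleportation vector concentrated entirely on node 3, and the choice of simply dropping the mass of the dangling node 4 are all assumptions, since the exercise leaves these details open.

    # Two unnormalized PageRank steps for the graph of Ex 1.
    # alpha, the teleportation vector and the treatment of the
    # dangling node 4 are assumptions: the exercise leaves them open.
    edges = [(1, 2), (2, 1), (1, 4), (3, 1)]
    nodes = [1, 2, 3, 4]
    out = {u: [v for (a, v) in edges if a == u] for u in nodes}

    alpha = 0.85                                  # assumed damping factor
    teleport = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0}   # teleportation favors node 3

    pr = {u: 1.0 for u in nodes}                  # start with PR = 1, unnormalized
    for step in (1, 2):
        pr = {u: alpha * sum(pr[v] / len(out[v]) for v in nodes if u in out[v])
                 + (1 - alpha) * teleport[u]
              for u in nodes}
        print(f"after step {step}: {pr}")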

Ex 2 [points 4+5] Given the sequence S = (1, 2, 1, 1, 4, 7, 7, 4, 1, 3, 9), show:

• the PForDelta encoding with b=2 and base=0;

• the Elias-Fano encoding (warning: remember that EF encodes monotonic sequences); a sketch of both encoders follows this list.
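The Python sketch below shows one common textbook variant of each encoder. The exact slot/exception layout of PForDelta and the choice of the low-bit width l in Elias-Fano vary between formulations, so treat them as assumptions; note also that EF is applied here to a monotone transform of S (prefix sums, used purely as an illustration).

    import math
    from itertools import accumulate

    def pfor_delta(xs, b=2, base=0):
        # One PForDelta variant: deltas from `base` that fit in b bits go into
        # the b-bit slots, the rest become (position, value) exceptions that
        # would be patched back in at decode time.
        slots, exceptions = [], []
        for i, x in enumerate(xs):
            d = x - base
            if 0 <= d < (1 << b):
                slots.append(d)
            else:
                slots.append(0)              # placeholder in the b-bit area
                exceptions.append((i, x))
        return slots, exceptions

    def elias_fano(xs):
        # Elias-Fano of a monotonically non-decreasing list xs:
        # l low bits per element plus the high parts in negated unary.
        n, u = len(xs), xs[-1] + 1
        l = max(0, int(math.log2(u / n))) if u > n else 0
        low = [x & ((1 << l) - 1) for x in xs]
        high = ''.join('1' * sum(1 for x in xs if x >> l == hb) + '0'
                       for hb in range((u >> l) + 1))
        return l, low, high

    S = [1, 2, 1, 1, 4, 7, 7, 4, 1, 3, 9]
    print(pfor_delta(S, b=2, base=0))
    print(elias_fano(list(accumulate(S))))   # EF needs a monotone input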

Ex 3 [points 4+4]

• Describe the implementation of the Rank data structure which adds o(m) bits to a binary array B of length m, and answers any Rank query in constant time.

• Then, show the data structure on B = [0011 0101 1110 1010 0000 1111 1010 0000] by setting z=4 and Z=8 (a worked sketch follows this list).
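As a concrete aid for the second item, the Python sketch below builds the usual two-level structure on the given B: absolute counters every Z = 8 bits, counters relative to the enclosing superblock every z = 4 bits, and a lookup table for the in-block part, so each query is a sum of three lookups. The inclusive convention rank1(i) = number of 1s in B[0..i] is an assumption; the course may use the exclusive one.

    # Two-level Rank structure on the bit array of Ex 3, with z = 4 and Z = 8.
    B = [int(c) for c in "00110101111010100000111110100000"]
    z, Z, m = 4, 8, len(B)

    superblock = [sum(B[:j]) for j in range(0, m, Z)]         # absolute counts
    block = [sum(B[(j // Z) * Z:j]) for j in range(0, m, z)]  # relative to superblock
    table = {}                                                # rank inside a z-bit block
    for bits in range(1 << z):
        chunk = tuple((bits >> (z - 1 - k)) & 1 for k in range(z))
        for off in range(z):
            table[(chunk, off)] = sum(chunk[:off + 1])

    def rank1(i):
        # Number of 1s in B[0..i] (inclusive), answered in O(1) time.
        chunk = tuple(B[(i // z) * z:(i // z) * z + z])
        return superblock[i // Z] + block[i // z] + table[(chunk, i % z)]

    print(rank1(10))   # count of 1s up to position 10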

Ex 4 [points 4] Describe how the cosine similarity between an LSI-projected document and an LSI-projected query is computed in the reduced space of k dimensions.
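A short numpy sketch of one standard recipe: documents live in the k-dimensional space as the rows of V_k Sigma_k, the query is folded in as q_k = Sigma_k^{-1} U_k^T q, and the score is the cosine between q_k and each document vector. The tiny term-document matrix and the query below are invented purely for illustration, and some formulations omit the Sigma_k^{-1} factor.

    import numpy as np

    # LSI cosine similarity in a k-dimensional space (k = 2 here).
    A = np.array([[1., 0., 1.],     # toy 4-term x 3-document matrix,
                  [1., 1., 0.],     # used only to make the recipe concrete
                  [0., 1., 1.],
                  [0., 0., 1.]])
    k = 2
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, Sk, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

    docs_k = (Sk @ Vtk).T                 # one row per LSI-projected document
    q = np.array([1., 0., 0., 1.])        # query in term space
    q_k = np.linalg.inv(Sk) @ Uk.T @ q    # query folded into the k-dim space

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    print([cosine(q_k, d) for d in docs_k])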

Ex 5 [points 4] Given the following matrix of pair-wise similarities between 5 items

       I1     I2     I3     I4     I5
I1   1.00
I2   0.80   1.00
I3   0.20   0.70   1.00
I4   0.70   0.60   0.40   1.00
I5   0.50   0.50   0.30   0.80   1.00

Show the next cluster formed by the agglomerative clustering algorithm based on the AVG similarity function, given that we have already formed the following clusters {(I1,I2), I3, (I4,I5)}.
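As a mechanical check, the average-linkage step can be computed directly: the similarity between two clusters is the mean of all pairwise item similarities, and the pair of current clusters with the largest average is merged next. The Python sketch below simply encodes the matrix above and the three given clusters.

    from itertools import combinations

    # Pairwise similarities from the matrix of Ex 5 (one entry per unordered pair).
    sim = {('I1', 'I2'): 0.80, ('I1', 'I3'): 0.20, ('I1', 'I4'): 0.70,
           ('I1', 'I5'): 0.50, ('I2', 'I3'): 0.70, ('I2', 'I4'): 0.60,
           ('I2', 'I5'): 0.50, ('I3', 'I4'): 0.40, ('I3', 'I5'): 0.30,
           ('I4', 'I5'): 0.80}

    def s(a, b):
        return sim.get((a, b), sim.get((b, a)))

    clusters = [('I1', 'I2'), ('I3',), ('I4', 'I5')]

    def avg_sim(c1, c2):
        # AVG linkage: mean similarity over all item pairs across the two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(s(a, b) for a, b in pairs) / len(pairs)

    best = max(combinations(clusters, 2), key=lambda p: avg_sim(*p))
    print(best, avg_sim(*best))   # the pair of clusters merged next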


Related documents

Assume that you use shingles of 3 words, hash functions generate random values in {0,…,10} (that you can choose as you want), and repeat min-hashing twice for estimating the

Ex 5 [LAB TEST] Let us assume that we have built a Lucene index, with a Whitespace Analyzer, over the following three documents: • d1 = "Con il nome Valle dei Re si

Ex 2 [points 4+3] Write and comment on the formula of PageRank and Personalized PageRank, and discuss how the Personalized PageRank can be used to estimate the “similarity” between two

Ex 4 [points 2+4] Describe a dynamic index based on a small-big solution or a cascade of indexes, specifying the pros/cons of each of these solutions to the problem of adding

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute

Ex 5 [ranks 4] Given an annotator, such as TagMe, show how you could empower the TF-IDF vector representation in order to perform “semantic” comparisons between documents by

• SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the