• Non ci sono risultati.

Information Retrieval 13 January 2020 Name: Surname: Matricola:

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 13 January 2020 Name: Surname: Matricola:"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

13 January 2020

Name: Surname: Matricola:

Exercise [points 3+4+4] Given a sequence of integers S = (1, 6, 11, 13, 15, 20, 21), encode them using (by motivating which sequence is compressed):

 Gamma-code (of Elias)

 PForDelta with base=1 and b=2

 Elias-Fano

Exercise [points 5] Consider the WAND algorithm by assuming that it is examining the head of the following three posting lists:

t1  (…, 5, 6, 7, 8, 11) t2  (…, 2, 3, 5, 7, 8, 11) t3  (…, 1, 4, 7, 13, 15)

At that time the current threshold equals 2, and the upper bounds of the scores in each posting list are: ub_1 = 0.4, ub_2 = 1, ub_3 = 0.7.

Which is the next docID whose full score is computed? (Motivate your answer)

Exercise [points 3+4+4+3]

Answer to the following questions:

1. Define Precision, Recall and F1.

2. Draw the graph G with edges (1,2), (2,3), (3,1) and (2,1), and show the formulas of the HITS algorithm for the hubness and authority scores of every node of your graph.

3. Show how to compute via Bloom Filter the intersection between two sets A and B, each one stored on a different machine, and estimate its communication cost (in bits) and its expected error (as the average number of extra items returned).

4. Show and prove the probability of a collision via Karp-Rabin’s rolling hash for two binary strings A and B of m bits each

Riferimenti

Documenti correlati

In 2008 we have witnessed the publication of two collective volumes dedicated to Public Communication of Science and Technology, an emergent field positioned in between the older

Certo, tra i più importanti vi è la trasformazione delle forme della comunicazione medica in ambito sociale, che non è soltanto, come viene citato in gran parte delle

 Define what is a Spectral Bloom filter, the operations supported, and show the formula of the error rate (as a function of the space and number of keys) and prove its correctness.

[r]

 Execute one step of the Personalized PageRank algorithm with respect to the node 1, assuming alpha = ½ and uniform

 Compute two steps of PageRank by assuming uniform starting probability and parameter ¼ for the teleportation step (hint: try to operate only on fractions)..  Assume now that

Exercise [rank 5] Given the string S=baobab, compute the Suffix Array, and then derive the LCP array by using the linear-time algorithm seen in class, showing the amount of

Exercise [rank 5] Given the string S=abaa, compute its Arithmetic encoding based on empirical frequencies by emitting the final binary compressed string. (hint: please work with