• Non ci sono risultati.

Information Retrieval 4 September 2018 Exercise 1 [rank 5+5]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 4 September 2018 Exercise 1 [rank 5+5]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

4 September 2018 Exercise 1 [rank 5+5] Answer to the following questions:

 Show the formula that identifies the best number of hash functions in a Bloom Filter of size m bits, storing n keys. Then show how it is derived.

 Given a dictionary D of strings of variable length, show an efficient data structure that solves the following wild-card query of the form s*t over D, where s and t are two strings of variable length.

Exercise 2 [points 5] Given a directed graph G with edges {(1,2), (2,1), (1,4), (3,1)}.

Simulate the execution of one step of Personalized PageRank for node 3, starting with nodes with probability ¼, and alpha = 1/2.

Exercise 3 [points 4+4] You are given the two files: F_old = “pane cotto basso”, F_new = “pane basso rotto”, and assume a block size B=3 chars.

 Show the execution of the algorithm rsync. (comment the various steps)

 Show the execution of the algorithm zsync. (comment the various steps) Exercise 4 [points 3+4] Describe the shingling technique and how it can be used, in combination with min-hashing, to compute the near-duplicate similarity of

documents, by commenting also the advantages.

Apply your description to the following documents d1 = “the box is red and large”, d2 = “the large box is red”, d3 = “the fox is red”. Assume that you use shingles of 3 words, hash functions generate random values in {0,…,10} (that you can choose as you want), and repeat min-hashing twice for estimating the Jaccard coefficient of two sets.

Riferimenti

Documenti correlati

Questa quk giera curiosa de saver cossa ghe giera drento in quela camara, che elo no voleva che Y an- dasse. La vk e la verze sta camara, e la vede che drento ghe xe tante anime

He expects to get money, because they look at him like they want to do something for him but, instead, as Rose described to us today, they say to him, “Look, we’re poor like you,

Le nostre linee personalizzate ad alta velocità sono in grado di movimentare elevate quantità di prodotti offrendo soluzioni di confezionamento rapide, in grado di soddisfare

Insert directly in the Boston shaker filled with ice cubes, in the following order: SQUEEZE LEMON LIME POWER SWEET&SOUR, SQUEEZE MIXED BERRIES LUXURY 2.. Finish the drink

Measles spreads quickly within and across communities unless herd immunity is sustained.

Several laboratories have successfully isolated stem cells from many tissues: the brain, the blood, the skin, the skeletal muscle, the cord blood, and the bone marrow and many have

Insert directly in the Boston shaker filled with ice cubes, in the following order: SQUEEZE LEMON LIME POWER SWEET&SOUR, SQUEEZE MIXED BERRIES LUXURY 2.. Finish the drink

Gestione elettronica Piattaforma inerziale a 6 assi, suite APRC che comprende ATC (controllo di trazione, AWC (controllo di impennata), AEB (freno motore) AEM (mappe motore),