• Non ci sono risultati.

Information Retrieval 26 November 2019 (First MidTerm) Name: Surname: Matricola: Ex 1 [rank 4]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 26 November 2019 (First MidTerm) Name: Surname: Matricola: Ex 1 [rank 4]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

26 November 2019 (First MidTerm)

Name: Surname: Matricola:

Ex 1 [rank 4] Given the three sets A = {1, 5, 8, 9}, B = {2, 3, 8}, C = {2, 5, 8} drawn from the Universe {0, 1, …, 10}. Show how you can use the Min-hash approach to evaluate the Jaccard similarity between every pair of those sets, based on a sketch of size 3 which is defined according to the permutations P1(x) = 4x+2 mod 11,

P2(x) = 2x mod 11, and P3(x) = 5x+1 mod 11.

Ex 2 [ranks 5] Given the following three adjacency lists of a Web graph, compress them according to the algorithm seen in class, by assuming that every list can copy from anyone before it:

4  7, 8, 13, 14, 18, 19 5  7, 9, 11, 13, 14, 18 6  7, 8, 11, 13, 14, 19

Ex 3 [ranks 4+4] Given the following files f_old = “aaabb” and f_new = “aabaaac”, show the execution of rsync and the execution of zsync on them by assuming a block of size 3.

Ex 4 [ranks 5] Given the dictionary of strings D = {aabb, aba, acac} construct a bigram index (hence k=2) and then search the string Q = “aabc” by assuming an edit-

distance error e=1. Precisely, first use the overlap distance to filter a set of

candidates, that are subsequently checked via dynamic programming, and return the string(s) at edit-distance 1.

Ex 5 [points 4+4] Answer to the following two questions:

1. Given two binary vectors of size d and having hamming distance h, state and prove the formula that defines the probability that LSH declares a match between them (the formula must use the parameters d and h).

2. Prove the error bound for the Spectral Bloom Filter.

Riferimenti

Documenti correlati

State and prove the lower bound for sorting n items in the two-level memory model with parameters M (internal memory) and B (disk-page size).. State and prove the average time bound

Definition (the preorder induced by a separable capitalization factor). Let ≥ fe the usual preorder of the financial events. We shall give two proofs: algebraic and geometrical. the

For a real number x whose fractional part is not 1/2, let hxi denote the nearest integer

[r]

(comment the various steps) Ex 3 [ranks 4] State and prove how the hamming distance between two binary vectors v and w of length n is related to the probability of declaring

 State and prove how the hamming distance between two vectors v and w is related to the probability of declaring a “match” between LSH-fingerprints of v and w, according to

Finally, is it sensible to assume that the elasticity of substitution across products of the same …rm does not di¤er from that across products of di¤erent …rms, thus limiting the

The cases with the higher electric fields has been shown because a more clear effect of the target field on the polarization of the ferroelectric is Data acquisitions