• Non ci sono risultati.

Information Retrieval 4 April 2019 Name: Surname: Matricola:

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 4 April 2019 Name: Surname: Matricola:"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

4 April 2019

Name: Surname: Matricola:

Ex 1 [points 5+5+5]

 Define what is a Bloom filter, the operations supported, and show the formula of the error rate (as a function of the space and number of keys) and prove its correctness.

 Show how the Jaccard similarity between two sets A and B can be estimated via min-hashing, as a function of the number of extracted minima.

 Define Link Probability and Commonness as they are used by entity annotators, like TagMe.

Ex 2 [points 4+3]

Given the three strings s1 = “asta”, s2 = “balla”, and s3 = “basta”, show the compression

- by gzip with window size w=5 chars when concatenated as s1 s2 s3,

- by Zdelta applied to the set of those strings deriving their concatenation order via the approach based on a weighted graph where the cost of edges is

estimated via gzip. (Hint: refer to “compression of a group of files” by Zdelta)

Ex 3 [points 4+4]

Given the graph

Execute one step of the PageRank algorithm, assuming uniform teleportation step, alpha = ½ and uniform starting probability.

Execute one step of the Personalized PageRank algorithm with respect to the node 1, assuming alpha = ½ and uniform starting probability.

Riferimenti

Documenti correlati

Find the most similar text to the query q = [one one toy] in the vector space model (no

Show how to compute via Bloom Filter the intersection between two sets A and B, each one stored on a different machine, and estimate its communication cost (in

 Compute two steps of PageRank by assuming uniform starting probability and teleportation step with parameter ¼, hence alpha = ¾ (hint: try to operate only on fractions). 

 Define what is a Spectral Bloom filter, the operations supported, and show the formula of the error rate (as a function of the space and number of keys) and prove its correctness.

[r]

 Simulate the execution of one step of the PageRank algorithm, starting with nodes 1 and 2 having score ¼ and node 3 having score ½, and assuming a teleportation step which

Exercise [rank 5] Given the string S=baobab, compute the Suffix Array, and then derive the LCP array by using the linear-time algorithm seen in class, showing the amount of

Exercise [rank 5] Given the string S=abaa, compute its Arithmetic encoding based on empirical frequencies by emitting the final binary compressed string. (hint: please work with