• Non ci sono risultati.

Information Retrieval 29 June 2015 Ex 1 [ranks 4+3+3]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 29 June 2015 Ex 1 [ranks 4+3+3]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

29 June 2015 Ex 1 [ranks 4+3+3]Given the string S = alalabarda,

 Construct the wavelet tree, by assuming that the symbols in the leaves are ordered alphabetically

 Show how it is executed the operation Rank(a,6) over that WaveletTree

 Show how it can be implemented the operation Select(a,4) over that WaveletTree [which returns the position of the 4th letter ‘a’ in S]

Ex 2 [points 4] Given a suffix array SA for a string S, detail the algorithm that

establishes whether does it exist in S a substring s whose length is larger than L and repeats more than c times in S.

Ex 3 [points 3+3] Consider the Consistent hashing technique:

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme

 Simulate its execution over the urls_ID = {1, 4, 8, 2, 6, 7, 13, 11} and the crawler_ID = {1,2,3}, by defining proper hash functions.

Ex 4 [points 3+4+3] Given a topic annotator, like TagME:

 Detail its goal

 Define the features: link probability of an anchor, commonness of a wiki-page wrt to an anchor

 Define the ratio behind the Milne-Witten similarity measure and where it is used in TagME

Riferimenti

Documenti correlati

c) Compute the document which is more similar to the query [cosa capo], by using the cosine similarity without dividing by the norms of the vectors. Ex 2 [rank 5] Let you be given

Ex 2 [points 4+3] Write and comment on the formula of PageRank and Personalized PageRank, and discuss how the Personalized PageRank can be used to estimate the “similarity” between two

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is not known at preprocessing time and possibly large (hint:. you need an additional

Ex 5 [points 3] Compute the probability of error for a Bloom Filter that uses a bit- array of m bits, k hash functions and indexes a dictionary of