• Non ci sono risultati.

Information Retrieval 08 January 2014 Ex 1 [ranks 3]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 08 January 2014 Ex 1 [ranks 3]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

08 January 2014

Ex 1 [ranks 3] Describe and comment the cost function used in RankNet.

Ex 2 [ranks 3]Given the string S=baba, compute its Arithmetic encoding based on empirical frequencies.

Ex 3 [points 4+3] Given the following matrix of pair-wise similarity between 5 items

Show the next cluster formed by the agglomerative clustering algorithm based on MIN and MAX similarity-functions, given that we have already formed the following clusters {(I1,I2), (I3,I4), (I5)}

Ex 4 [points 4+3] Describe the Permuterm-index of a set of n strings, each of length L, and comment how to index those strings in order to execute efficiently the query P*Q, where P and Q are any substrings. Also comment on the time and space

complexity of your proposed solution given the parameters n, L, and the lengths |P|

and |Q|.

Ex 5 [points 3] Compute the probability of error for a Bloom Filter that uses a bit- array of m bits, k hash functions and indexes a dictionary of n strings.

Ex 6 [points 3+4] Assume that you have 7 items (6,1,3,7,2,4,9) and three servers whose IDs are (1,2,3), and you want to distribute those items among the three servers via consistent hashing. How would you do? (hint: Instead of working on ring [0,1], work on the integers in {0,1,..,10} using a universal hash function h_a(x)= a*x mod m)

Riferimenti

Documenti correlati

[r]

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme.  Simulate its execution over

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is not known at preprocessing time and possibly large (hint:. you need an additional

Show the algorithm that uses a k-gram index for restricting the E-approximate search over a subset of candidate strings within D.. Show the 3-gram index for D={barca, bit, borda,

Ex 2 [ranks 3] Given the string S=aaba, compute the length of its Arithmetic encoding without running the entire algorithm, and comment