• Non ci sono risultati.

Information Retrieval 16 January 2015 Ex 1 [ranks 4+2]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 16 January 2015 Ex 1 [ranks 4+2]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

16 January 2015

Ex 1 [ranks 4+2]Given the string S=bbba, compute its Arithmetic encoding based on the empirical frequencies of symbols ‘a’ and ‘b’, and compare against the empirical entropy of S.

Ex 2 [points 3+5] Given the matrix of pair-wise similarities between five items

Show the next cluster formed by the agglomerative clustering algorithm, given that we have already formed the clusters {(I1,I2), (I3,I4), (I5)}, and based on the following similarity functions:

 MAX

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the pro and cons of using skip pointers in AND-queries implemented via inverted lists.

Ex 4 [points 3+3+2] Given the string S=ababca, construct its suffix array and then show how it is executed the search for a string P=ab.

Indicate also the space in bits used by a Suffix Array built on a string of n characters, and the time complexity needed to search there for a string of p characters.

Ex 5 [points 3+2] Given the sequence of keys S = (1, 7, 4, 3, 11, 13, 12) insert them via cuckoo hashing in two tables of size m=5, and hash functions h1(k) = k mod 5, h2(k) = 3*k mod 5. Comment also on the possibility to insert the key 6.

Riferimenti

Documenti correlati

- Comment on the difference in building and querying between the Term-based versus Document-based partitioning in Distributed indexing. - Given two nodes in a directed graph,

[r]

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme.  Simulate its execution over

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is not known at preprocessing time and possibly large (hint:. you need an additional

GIORNATE DI INCONTRO-COLLABORAZIONE TRA LE UNIVERSITA' DI PAVIA E SASSARI FACOLTA’ DI FARMACIA E SCIENZE MM.FF.NN.. Esperienze Didattiche e Scientifiche