• Non ci sono risultati.

Information Retrieval 01 February 2016 Ex 1 [rank 5]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 01 February 2016 Ex 1 [rank 5]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

01 February 2016

Ex 1 [rank 5] Show how to synchronize via rsync the new file “bacaddabbb” (on the server) using the old file “acabbbdab” (on the client) and blocks of size 3 chars.

Ex 2 [points 4+3] Given the array B=0100 0111 1011 0010,

 Construct the Rank data structure over B by setting Z=4 and z=2.

 Describe how it is executed Rank1(10) by deploying that data structure.

Ex 3 [ranks 5+3]Given the dictionary D={bingo, bino, bon, bull}

 Construct an efficient data structure for the 1-error match problem (i.e. three admitted kinds of errors: INS-DEL-SUBST).

Given the previous item, describe the algorithm that searches for the strings in D which are at 1-error from the pattern P=bin.

Ex 4 [ranks 3+3]Given two sets A and B,

 show a LSH-based solution to compute a representation (sketch) of the two sets having length L and such that the Jaccard distance between A and B can be estimated via the two sketches.

 let A={1, 4, 6, 7} and B={2, 3, 4, 6}, instantiate your solution above on this example by assuming that A and B are drawn from the Universe = {1, …, 7} and L=3.

Ex 5 [ranks 4] Given an annotator, such as TagMe, show how you could empower the TF-IDF vector representation in order to perform “semantic” comparisons between documents by using still the cosine similarity.

Ex 6 [lode] State and prove the Theorem for the 2-optimality of MTF.

Riferimenti

Documenti correlati

Assume that you use shingles of 3 words, hash functions generate random values in {0,…,10} (that you can choose as you want), and repeat min-hashing twice for estimating the

The search engine must support the following query: the author names must be matched by the corresponding keywords; and the publishing year must occur in the specified range

Ex 4 [points 2+4] Describe a dynamic index based on a small-big solution or a cascade of indexes, specifying the pros/cons of each of these solutions to the problem of adding

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme.  Simulate its execution over