• Non ci sono risultati.

Information Retrieval 11 January 2016 Ex 1 [rank 3]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 11 January 2016 Ex 1 [rank 3]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

11 January 2016

Ex 1 [rank 3] Given the two strings S1=baaa and S2=baacac use Z-Delta to compress S2 with respect to S1.

Ex 2 [points 2+3+4] Given a sequence of integers S=(1, 6, 15, 18, 21, 24, 30), encode them using:

 Gamma encoding

 Delta encoding

 Elias-Fano encoding

Ex 3 [ranks 2+4]Given the dictionary D={bingo, bins, box, bull, cat}

 construct a 3-gram index for D (by guaranteeing L 3-grams per L-long strings);

describe the algorithm that searches for the strings in D which are at 1-edit distance from P=binno.

Ex 4 [ranks 5+3]Let us given a graph G of directed edges {(1,3), (3,1), (3,2), (1,4), (2,1)}.

 Simulate the execution of two steps of the PageRank algorithm, starting with all nodes having score 1, and assuming a teleportation step which favors node 3 (hence, it jumps only to it).

 Simulate the HITS computation of a(3) and h(3) by assuming that all a- and h- scores are initially set to 1.

Ex 5 [ranks 2+2] In the context of a text annotator like TagMe, define

 the link probability of an anchor text a with respect to the Wikipedia knowledge graph.

 the commonness of the link between an achor a and a Wikipedia page p.

Ex 6 [lode] Show how it is possible to efficiently identify whether two bit-vectors of size d and hamming distance k are highly similar using the LSH approach, and compute the probability of detection as a function of d and k.

Riferimenti

Documenti correlati

 SECOND MINIMUM, which defines the similarity between a pair of clusters as the second minimum distance between all pairs of items of the two clusters Ex 3 [points 3] Discuss the

Ex 1 [ranks 3+3+3] Given the words {abaco, adagio, baco, alano} construct a 2-gram index, and describe the algorithm that solves 1-edit distance over it. Finally test the

[r]

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme.  Simulate its execution over

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is not known at preprocessing time and possibly large (hint:. you need an additional

Show the algorithm that uses a k-gram index for restricting the E-approximate search over a subset of candidate strings within D.. Show the 3-gram index for D={barca, bit, borda,

Ex 5 [points 3] Compute the probability of error for a Bloom Filter that uses a bit- array of m bits, k hash functions and indexes a dictionary of

Ex 2 [ranks 3] Given the string S=aaba, compute the length of its Arithmetic encoding without running the entire algorithm, and comment