• Non ci sono risultati.

Information Retrieval 20 July 2015 Ex 1 [ranks 3+3]

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 20 July 2015 Ex 1 [ranks 3+3]"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

20 July 2015

Ex 1 [ranks 3+3] Describe the iterative algorithm that binary encodes a fractional number in [0,1), and then comment on its correctness.

Ex 2 [points 3] State the Johnson-Linderstrauss lemma and comment on its usefulness in terms of document comparison.

Ex 3 [points 3+3]

 State the difference between hard and soft clustering.

 Show a graphical example of points for which a k-means clustering algorithm (with k=2) converges to two different partitionings, depending on the

positions of the two starting seeds.

Ex 4 [points 2+3+3]

 Define the TF-IDF(w,d) weight for a word w in a document d

 Given the four texts:

a. T1=”la bella casa”

b. T2=”le belle cose”

c. T3=”casa bella bella”

d. T4=”la casa le cose bella rosa”

Show the inverted list with its dictionary and postings, compute the vectors TF-IDF of the four texts. Indicate the text which is more similar to T3 under the dot-product distance.

 Show how the TF-IDF matrix is space-efficiently stored in the posting lists.

Ex 5 [points 3+4]

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is small and fixed at preprocessing time.

 Describe the algorithm that solves the top-k in auto-completion search, provided that k is not known at preprocessing time and possibly large (hint:

you need an additional data structure, which one?).

Riferimenti

Documenti correlati

Petriccione, “Sulla misurabilit`a delle famiglie dei sistemi di k iperpiani passanti per un punto fisso dello spazio affine A 7 ”, Atti Accad.. Petriccione, “Sulla misurabilit`a

Solution proposed by Roberto Tauraso, Dipartimento di Matematica, Universit`a di Roma “Tor Vergata”, via della Ricerca Scientifica, 00133 Roma,

[r]

(American Mathematical Monthly, Vol.118, February 2011) Proposed by Mihaly Bencze (Romania). For a positive integer k, let α(k) be the largest odd divisor

[r]

 Discuss its advantages in crawling with respect to the simple scheme that allocates URL to crawlers based on a static url_domain->crawler scheme.  Simulate its execution over

Ex 5 [points 3] Compute the probability of error for a Bloom Filter that uses a bit- array of m bits, k hash functions and indexes a dictionary of

Ex 2 [ranks 3] Given the string S=aaba, compute the length of its Arithmetic encoding without running the entire algorithm, and comment