• Non ci sono risultati.

Information Retrieval 15 February 2018

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 15 February 2018"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

15 February 2018

Name: Surname: Matricola:

Ex 1 [points 7] Given the four texts:

a. T1=”a beautiful cat”

b. T2=”cat after cat”

c. T3=”after a beautiful girl a cat”

d. T4=”girl after girl”

Show the inverted list built over them by using gamma-coding over the docID gaps.

Compute the TF-IDF vectors of the four texts above (logs are in base two).

Find the most similar text to T3 in the vector space model (do not apply any normalization to the cosine score).

Ex 2 [points 5]

Describe the LSI technique and comment on how a query is projected into a k- dimensional space. Comment also the value and significance of k.

Ex 3 [points 2+5]

Describe the algorithm for extracting bigrams (collocations) based on statistical information on the constituting words and PoS tagging (noun, verb, adjective, etc.).

Describe the RAKE algorithm for extracting long keywords from texts, and comment its pros/cons.

Ex 4 [points 6]

Show how to synchronize via zsync the the old file “acaddbdabb” (on the client) by means of new file “bacaddabb” (on the server), setting blocks of 3 chars each.

Ex 5 [points 5] Given the binary vectors A = 10111, B = 10010, C = 00010, D = 00000, E = 01100; apply LSH for approximating the hamming distance by setting k=2, L=2 and projecting sets I1 = {0,3} and I2 = {3,4}.

Riferimenti

Documenti correlati

Finally simulate its working over the sequence 3,10,1,6,2,5,3, by showing the sorted blocks it forms with a memory capable to store M=2 integers.. Exercise

Show how to synchronize via rsync the new file “bacaddabb” (on the server) using the old file “acabbbdabac” (on the client) and blocks of size 3 chars. Given the graph:.. a)

- by Zdelta applied to the group of those strings using gzip as basic compressor and deriving their concatenation order via the approach based on a weighted graph4. Describe the

[r]

 Compute two steps of PageRank by assuming uniform starting probability and parameter ¼ for the teleportation step (hint: try to operate only on fractions)..  Assume now that

c) Compute the document which is more similar to the query [cosa capo], by using the cosine similarity without dividing by the norms of the vectors. Ex 2 [rank 5] Let you be given

The search engine must support the following query: the author names must be matched by the corresponding keywords; and the publishing year must occur in the specified range

 Good query optimization techniques (lecture!!) mean you pay little at query time for including stop words..  You need them for phrase queries