• Non ci sono risultati.

Information Retrieval 29 October 2018

N/A
N/A
Protected

Academic year: 2021

Condividi "Information Retrieval 29 October 2018"

Copied!
1
0
0

Testo completo

(1)

Information Retrieval

29 October 2018

Name: Surname: Matricola:

Ex 1 [points 5+3]

Given a set S of strings {bio, bionic, bit, bitly, buzz, car, caso, cast, zoo},

- show the 2-level indexing of S with Front Coding, by assuming a page size on disk able to contain 3 strings.

- show how it is executed the search for the string P=bis.

Ex 2 [points 4+3]

Given the three strings s1 = “abaco”, s2 = “dada”, s3 = “coaba”, show the compression

- by gzip with window size w=3 chars when concatenated as s1 s2 s3,

- by Zdelta applied to the group of those strings using gzip as basic compressor and deriving their concatenation order via the approach based on a weighted graph. (Hint: refer to “compression of a group of files” by Zdelta)

Ex 3 [points 3+4+4+4]

Answer to the following questions:

1. Describe the role of the Priority Queue in Mercator, and how it is populated.

2. Describe the search-engine indexing based on term-partitioning and the one based on document-partitioning, and then discuss their pro/cons with respect to simplicity and query efficiency.

3. Describe how LSH is applied to find binary vectors which are close according to the hamming distance, and then compute the probability that two vectors with hamming distance H are declared “similar” by LSH.

4. Show how the Jaccard similarity between two sets A and B can be estimated via min-hashing, as a function of the number of extracted minima.

Riferimenti

Documenti correlati

Apply your algorithm over G for one step only and with the teleportation step occurring with

Ex 2 [points 4+3] Write and comment on the formula of PageRank and Personalized PageRank, and discuss how the Personalized PageRank can be used to estimate the “similarity” between two

Ex 1 [ranks 3+2] Describe the dynamic indexing technique, assuming blocks of power-of-two number of documents, and compute the time complexity of each document insertion,

Our study was aimed at determining the circulating levels of serotonin and oxytocin in patients who participated in an AAAs program with a dog during dialysis treatment.. Abstract:

L’intervento prevede la pulitura di tutta la superfi cie e successivo consolidamento dell’intonaco staccato tramite iniezioni puntuali di malta fl uida caricata con resina

In microalgae, the role of the xanthophyll cycle does not seem to be homogeneous: zeaxanthin accumulation in the model green alga Chlamydomonas reinhardtii has been reported to

It is argued that variation within one grammar is due to the expression of different information-structural categories and word order change in- volves a change in the

Keywords: finite presemifield, finite semifield, autotopism, autotopism group, Cordero- Figueroa semifield, Figueroa’s presemifields.. MSC 2000 classification: primary 12K10,