• Non ci sono risultati.

Auto-completion Search

N/A
N/A
Protected

Academic year: 2021

Condividi "Auto-completion Search"

Copied!
17
0
0

Testo completo

(1)

Auto-completion Search

(2)

How it works

What’s the dictionary ?

(3)

Trie for the Dictionary

1

2 2

0

4

5

6

7

2 3

y

s

1 z

stile zyg

5 etic

ial ygy

aibelyite

czecin omo

Pro: O(p) search time = path scan

Cons: edge + node labels + tree structure

(4)

Top-1

What’s the ranking/scoring of the answers ?

1

2 2

0

4

5

6

7

2 3

y

s

1 z

stile zyg

5 etic

ial ygy

aibelyite

czecin omo

P = sy

8

2 1 4

8,1

(5)

Top-1: How to speed-up

How to compute the top-1 in O(1) time ?

1

2 2

0

4

5

6

7

2 3

y

s

1 z

stile zyg

5 etic

ial ygy

aibelyite

czecin omo

P = sy

8

2 1 4

4

1 7

2

3

5

8,1

1

(6)

Top-2

How to compute the top-2 in O(1) time ?

1

2 2

0

4

5

6

7

2 3

y

s

1 z

stile zyg

5 etic

ial ygy

aibelyite

czecin omo

P = sy

8

2 1 4

4,2

1,4 7,6

2

3

5 1,7

Top-k in O(1) time, but k× space

(7)

Top-k: How to squeeze ?

1

2 2

0

4

5

6

7

2 3

y

s

1 z

stile zyg

5 etic

ial ygy

aibelyite

czecin omo

P = sy

8

2 1 4

2

3

5

8 1

2 2

1 3

4 4

2 5

3 6

5 7 Score

String

Prefixed by P, proceed D&C

(8)

Top-k: How to squeeze ?

8 1

2 2

1 3

4 4

2 5

3 6

5 7 Score

String

Prefixed by P, proceed D&C

Let H be a max-heap of size k, keep also

min[H] and max[H]

Initialize H with k pairs <-, NULL>

Given the range <L,R> (here <1,4>)

Compute max-score in Array[L,R] (pos. M, value m)

If m ≤ min[H], skip;

else:

Insert <m,string> in H;

If size(H)>k then remove min[H];

Recurse on <L,M-1> and <M+1,R>, if not empty.

Time: O(k) time, and space

L R RMQ-query

in O(1) time and O(n) space

(9)

Example for Top-2

4 1

2 2

1 3

8 4

2 5

3 6

5 7 Score

String

Consider this other array

Range : operations

[1,7]: H  <8,4>; recurse on [1,3] and [5,7]

[1,3]: H={<8,4>}  <4,1>; recurse on [1,0] and [2,3]

[5,7]: H={<8,4>,<4,1>}  <5,7>; delete <4,1> from H, recurse on [5,6] and [8,7]

[2,3]: H={<8,4>,<5,7>}  <2,2>; since min[H]=5, not insert in H [5,6]: H ={<8,4>,<5,7>}  <3,6>; since min[H]=5, not insert in H

H = {<8,4> e <5,7>}

L R

(10)

A smarter approach

8 1

2 2

1 3

4 4

2 5

3 6

5 7 Score

String

Prefixed by P, proceed D&C

Let H be a max-heap, including items <val, string, [low,high]>

Compute max-score in Array[L,R] (pos. M, value m) i=0; insert <m, string[M], L, R> in H

While (i<k) do

Extract <x, string[X], Lx, Rx> from H, where x is max-value in H

Return String[X] as one of the top-k strings

Compute max-score in Array[Lx,X-1] (pos. M’, value m’)

insert <m’, string[M’], Lx, X-1>

Compute max-score in Array[X+1,Rx] (pos. M’’, value m’’)

insert <m’’, string[M’’], X+1, Rx>

i++;

Time: still O(k) time, and space

L R

(11)

Random access to postings lists

and other data types

(12)

A basic problem !

AbacoBattleCarColdCod ....

D

1 12 15 20 22....

Dog 

Array of n skip pointers to an array of m integers

(log m) bits per pointer = (n log m) bits = 32 n bits.

it is effective for few pointers

100001000001001000100 ....

B

Array of n string pointers to strings of total length m

(n log m) bits = 32 n bits.

it is independent of string length

We aim at achieving ≈ n log(m/n) bits < n log m

(13)

Rank/Select

00101001010101011111110000011010101....

B

Rank

b

(i) = number of b in B[1,i]

Select

b

(i) = position of the i-th b in B

Rank1(6) = 2 Select1(3) = 8

m = |B|

n = #1

Do exist data structures that solve this problem in

O(1) query time and very small

extra

space (i.e. +o(m) bits)

Wish to index the bit vector B (possibly

compressed).

(14)

The Bit-Vector Index: |B| + o(|

B|)

m = |B|

n = #1s

Goal. B is read-only, and the additional index takes o(m) bits.

00101001010101011 1111100010110101 0101010111000....

B

Z

8 18

(absolute) Rank1

Setting Z = poly(log m) and z=(1/2) log m:

Extra space is + (m/Z) log m + (m/z) log Z + o(m)

+ O(m loglog m / log m) = o(m) bits

Rank time is O(1)

Term o(m) is crucial in practice, B is untouched (not compressed)

0000 1 0 .... ... ...

1011 2 1 ....

block pos #1

z

(bucket-relative) Rank1

4 5 8

Rank

There exists a Bit-Vector Index taking o(m) extra bits

and constant time for Rank/Select.

B is needed and read-only!

(15)

z = 3, w=2

Elias-Fano (B is not needed)

If w = log (m/n) and z = log n, where m = |B| and n = #1 then

- L takes n w = n log (m/n) bits

- H takes n 1s + n 0s = 2n bits

0 1 2 3 4 5 6 7

In unary

Rank1(i) on B  Needs binary search over B

Select1(i) on B  uses L and (Select1(H,i) – i) in +o(n) space

(

Select

1

on H)

(16)

If you wish to play with Rank and Select

m/10 + n log (m/n)

Rank in 0.4 sec, Select in < 1 sec vs 32n bits of explicit pointers

Riferimenti

Documenti correlati

alla vigilia della sua missione in Corsica e Sardegna, richiedeva al cap- pellano-cantore Pere Sabater da Tortosa di procacciarsi in ogni modo «qualsevol xantres et tenoristes»,

Physical access structures describe how data is stored on disk to provide efficient

Just to stress the general point: it is clearly possible – and actually useful in general – to rely on the resources of HOL Light to develop formal proofs both about and within

In questo studio abbiamo monitorato 34 pazienti che sono stati sottoposti ad intervento di vitrectomia via pars plana per distacco di retina con introduzione di

and returns the redshift of the different velocity components (z), line width (b), and column density (log N) for each species. The derived metallicities and depletion patterns

to put all elements that are less than k to the left to put all elements that are greater than k to the right If the tree is balanced (i.e. it has approximately the same number of

to put all elements that are less than k to the left to put all elements that are greater than k to the right If the tree is balanced (i.e. it has approximately the same number of

to put all elements that are less than k to the left to put all elements that are greater than k to the right If the tree is balanced (i.e. it has approximately the same number of