• Non ci sono risultati.

Compressed Rank & Select on general strings

N/A
N/A
Protected

Academic year: 2021

Condividi "Compressed Rank & Select on general strings"

Copied!
14
0
0

Testo completo

(1)

Paolo Ferragina, Università di Pisa

Compressed Rank & Select on general strings

Paolo Ferragina

Dipartimento di Informatica, Università di Pisa

(2)

Paolo Ferragina, Università di Pisa

Generalised Rank and Select

 Rank(c,i) = #c in L[1,i]

 Select(c,i) = position of the i-th c in L

L = a b a a a c b c d a b e c d ...

Rank( a , 7 ) = 4

Select( a , 2 ) = 3

(3)

Generalised Rank and Select

If  is small (i.e. constant)

 Build binary Rank data structure for each symbol of

 Rank takes O(1) time and small space

If  is large (words ? )

 Need a smarter solution:

Wavelet Tree

data structure

Algorithmic reduction:

>> Reduce Rank&Select over arbitrary strings

... to Rank&Select over binary strings

(4)

Paolo Ferragina, Università di Pisa

The Wavelet Tree

a c

b r

d

abracadabra

(Alphabetic ?) Tree

(5)

The Wavelet Tree

a c

b r

d

abracadabra

aacaaa brdbr

brbr

rr

?

aaaaa

?

bb

?

d

?

(6)

Paolo Ferragina, Università di Pisa

The Wavelet Tree

a c

b r

d

abracadabra

aacaaa brdbr

brbr

abracadabra 01100010110

aacaaa 001000

brdbr 00100 brbr

0101

01100010110

001000 00100

0101

Fact. Given the tree and the binary strings, we can recover the original string !!

In any case, O(|| log ||) bits.

Easier Alphabetic order + Heap structure

(7)

brdbr

00100 abracadabra

01100010110

brbr

0101

aacaaa

001000

The Wavelet Tree

a c

b r

d Rank(b,8)

Rank(b,3) Rank(b,2)

Reduce toright symbols

Reduce toleft symbols

It’s binary

Every step can be turned to binary

(8)

Paolo Ferragina, Università di Pisa

abracadabra 01100010110

Rank1(8)=3

Rank0(2) =

2 – Rank1(1)= 1 Rank0(3) =

3 – Rank1(3)= 2 brbr

0101

brdbr

00100

aacaaa

001000

The Wavelet Tree

a c

b r

d

Generalised R&S implemented with log || binary R&S Rank(b,8)

Right move

=Rank1

Left move

=Rank0

Left move

=Rank0

Select is similar

(9)

Paolo Ferragina, Università di Pisa

Representing Trees

Paolo Ferragina

Dipartimento di Informatica, Università di Pisa

(10)

Standard representation

Binary tree: each node has two

pointers to its left and right children An n-node tree takes

2n pointers or 2n lg n bits.

Supports finding left child or right child of a node (in constant time).

For each extra operation (eg. parent, subtree size) we have to pay additional n lg n bits each.

x

x x x

x

x x x x

(11)

Can we improve the space bound?

 There are less than 2

2n

distinct binary trees on n nodes.

 2n bits are enough to distinguish between any two different binary trees.

 Can we represent an n node binary tree using 2n bits?

(12)

Binary tree representation

 A binary tree on n nodes can be represented using 2n+o(n) bits to support:

parent

left child

right child

in constant time.

(13)

Heap-like notation for a binary tree

1 1

1 1

1 1

1

1

0

0 0 0

0 0

0 0

0 Add external nodes

Label internal nodes with a 1 and external nodes with a 0

Write the labels in level order

1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0

One can reconstruct the tree from this sequence

An n node binary tree can be represented in 2n+1 bits.

What about the operations?

(14)

Heap-like notation for a binary tree

1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17

8

5 6 7

4

3 2

1

9

17 16

15 14

12 13 11

10 1

8 7

6 5

4

2 3

1 2 3 4 5 6 7 8 parent(x) = On red (⌊x/2⌋)

left child(x) = On green(2x)

right child(x) = On green(2x+1) x  x: # 1’s up to x (Rank) x  x: position of x-th 1 (Select)

Riferimenti

Documenti correlati

With regard to the first aspect, Fulkerson says that Ovid will “undermine Vergil”, “as if his predecessors have missed a trick, either through incompetence or some more

In this paper, starting from any root system Φ with finite Coxeter group W and any W -invariant building set, we describe an explicit realization of the real spherical model as a

We have presented K ALMAN NB, a novel Na¨ıve Bayes model able to manage automatically the concept drift in data streams with categorical features by exploiting the Kalman Filter..

involved in any activity or occupation that, by its nature or the circumstances in which it is carried out, is likely to harm their health, safety, or morals. e) Light work is

Extending social protection by the amounts specified in Table 1 would reduce the number of children in child labour by 15.1 million by the end of 2022, more than

We also provide matching lower and upper bounds for partial sum and predecessor queries within attractor- bounded space, and extend our lower bounds to encompass navigation

States Parties shall take appropriate measures to ensure that a child who is seeking refugee status or who is considered a refugee in accordance with applicable international

++ Changes in the definition of birth registration were made from the second and third rounds of MICS (MICS2 and MICS3) to the fourth round (MICS4). In order to allow for