Paolo Ferragina, Università di Pisa
Compressed Rank & Select on general strings
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa
Paolo Ferragina, Università di Pisa
Generalised Rank and Select
Rank(c,i) = #c in L[1,i]
Select(c,i) = position of the i-th c in L
L = a b a a a c b c d a b e c d ...
Rank( a , 7 ) = 4
Select( a , 2 ) = 3
Generalised Rank and Select
If is small (i.e. constant)
Build binary Rank data structure for each symbol of
Rank takes O(1) time and small space
If is large (words ? )
Need a smarter solution:
Wavelet Tree
data structureAlgorithmic reduction:
>> Reduce Rank&Select over arbitrary strings
... to Rank&Select over binary strings
Paolo Ferragina, Università di Pisa
The Wavelet Tree
a c
b r
d
abracadabra
(Alphabetic ?) Tree
The Wavelet Tree
a c
b r
d
abracadabra
aacaaa brdbr
brbr
rr
?
aaaaa
?
bb
?
d
?
Paolo Ferragina, Università di Pisa
The Wavelet Tree
a c
b r
d
abracadabra
aacaaa brdbr
brbr
abracadabra 01100010110
aacaaa 001000
brdbr 00100 brbr
0101
01100010110
001000 00100
0101
Fact. Given the tree and the binary strings, we can recover the original string !!
In any case, O(|| log ||) bits.
Easier Alphabetic order + Heap structure
brdbr
00100 abracadabra
01100010110
brbr
0101
aacaaa001000
The Wavelet Tree
a c
b r
d Rank(b,8)
Rank(b,3) Rank(b,2)
Reduce toright symbols
Reduce toleft symbols
It’s binary
Every step can be turned to binary
Paolo Ferragina, Università di Pisa
abracadabra 01100010110
Rank1(8)=3
Rank0(2) =
2 – Rank1(1)= 1 Rank0(3) =
3 – Rank1(3)= 2 brbr
0101
brdbr
00100
aacaaa001000
The Wavelet Tree
a c
b r
d
Generalised R&S implemented with log || binary R&S Rank(b,8)
Right move
=Rank1
Left move
=Rank0
Left move
=Rank0
Select is similar
Paolo Ferragina, Università di Pisa
Representing Trees
Paolo Ferragina
Dipartimento di Informatica, Università di Pisa
Standard representation
Binary tree: each node has two
pointers to its left and right children An n-node tree takes
2n pointers or 2n lg n bits.
Supports finding left child or right child of a node (in constant time).
For each extra operation (eg. parent, subtree size) we have to pay additional n lg n bits each.
x
x x x
x
x x x x
Can we improve the space bound?
There are less than 2
2ndistinct binary trees on n nodes.
2n bits are enough to distinguish between any two different binary trees.
Can we represent an n node binary tree using 2n bits?
Binary tree representation
A binary tree on n nodes can be represented using 2n+o(n) bits to support:
parent
left child
right child
in constant time.
Heap-like notation for a binary tree
1 1
1 1
1 1
1
1
0
0 0 0
0 0
0 0
0 Add external nodes
Label internal nodes with a 1 and external nodes with a 0
Write the labels in level order
1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0
One can reconstruct the tree from this sequence
An n node binary tree can be represented in 2n+1 bits.
What about the operations?
Heap-like notation for a binary tree
1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
8
5 6 7
4
3 2
1
9
17 16
15 14
12 13 11
10 1
8 7
6 5
4
2 3
1 2 3 4 5 6 7 8 parent(x) = On red (⌊x/2⌋)
left child(x) = On green(2x)
right child(x) = On green(2x+1) x x: # 1’s up to x (Rank) x x: position of x-th 1 (Select)