Two equivalent problems
RMQ on integer arrays RMQ on ±1 integer arrays
LCA on a Cartesian Tree
Euler tour of the Cartesian tree
RMQ on the array of node-levels
LCA on general trees RMQ on ±1 integer arrays
Euler tour of the Cartesian tree
RMQ on the array of node-levels
Search for k-mismatches
Problem: Find longest match between P[i,…] and T[j,…]
CCGTACGATCAGTACAGTACAGTACTTTTTTAAACCGGAGACTACA
T
CCGAACTATC
P
Data Structure
1. Concatenate P and T into a string X = T$P
2. Construct a data structure on X that retrieves FAST the longest match between any pair of suffixes of X
LCA or LCP query
If O(1) O(k) time
Suffix Tree & LCA
LCA
T#P = mississippi#si$
1 2 4 6 8 10 13
12
11 8
5 2 1 10 9
7 4
6 3
0
4
# i
ppi#
ssi
mississippi#
1
p
i# pi#
2 1
s
i
ppi# ssip pi#
3
si
ssip ppi# pi#
1
# 14
$
ppi# ssippi#
13
$
Longest match(3,13)
SA
12 11 14 8 5 2 1 10 9 13 7 4 6 3
Lcp
0 1 1 1 4 0 0 1 0 2 2 1 3
Suffix Array & LCP RMQ
T#P = mississippi#si
1 2 4 6 8 10 13
RMQ
sisippi#sissippi#
ssippi#
ssissippi#
LCP
LCP
Longest match(3,13)
Surprisingly, also LCA RMQ
The RMQ problem
RMQA(i,j) – returns the index of the smallest element in the subarray A[i..j].
10 2 12 1 12 13 21 15 14 10
RMQ(2,7) = 3
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
Trivial solution: Precompute RMQ for every pair of indices.
This takes Q(n2) space, and O(1) query time
RMQ on a general array
10 25 22 34 7 19 9 12 26 16
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
4 0
2
1 3
5 7
6
9 8
Cartesian Tree
RMQ(i,j) = LCA(i,j) on Cartesian trees
10 25 22 34 7 19 9 12 26 16
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
4 0
2
1 3
5 7
6
9 8
Generic RMQ LCA RMQ
±1
LCA(u,v) = shallowest node between u and v during a depth first search traversal of T.
3
4 1 5
6 2 7
Euler tour3 2 3 1 4 1 7 1 3 5 6 5 3
1 2
3
4
6 9
10 8
5 7 11
12 0
LCA
T(4,6) = 3
Node at the lowest level
1 2 1 2 3 2 3 2 1 2 3 2 1
±1 array
Node Level
We are left with “ RMQ on ±1 array ”
RMQA(i,j) – returns the index of the smallest element in the subarray A[i..j].
10 11 12 11 12 13 14 15 14 13
RMQ(2,7) = 3
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
Recall the trivial solution: Precompute RMQ for every pair of indices. This takes Q(n2) space, and O(1) query time
Sparse Table
Preprocess sub arrays of len 2k, for every k=0,1,…, log n
M(i,j) = index of min value in A[i, i+ 2j -1]
10 11 12 11 12 13 14 15 14 13
A[0] A[1] A[2] A[3] A[4] A[5] A[6] A[7] A[8] A[9]
M(2,0)=2 M(2,1)=3
M(2,2)=3
M(2,3)=3
Total space is O(n log n) RMQ query ?
Querying the Sparse Table
log( ) k j i
a1 ... ...
i j
2k elements 2k elements
Total space is O(n log n) RMQ query takes O(1) time
Bucketing
A’[i] = min in the i-th block of A.
A
... ... ... ... ...
blocks n
log 2n
l o g 2
n
A’[0] A’[i] A’[2n/logn]
… ...
B[0] B[i] B[2n/logn]
B’[i] is the position (index) of that min.
A’
Use the Bucketing
... ... ... ... ...
A’[0] A’[i] A’[2n/logn]
lo g 2 n
Preprocess A’ for RMQ using SparseTable
Space is (2n/log n) * log (2n/log n) = O(n)
RMQ queries on A’ take O(1) time
Preprocess every block of A’ for border RMQ
Space is O(n), border RMQ take O(1) time.
RMQ(i,j) takes O(1) time, if i,j lie in distinct blocks
A’
In-block RMQ over ±1 arrays
There are normalized blocks Set
Table[BlockEnc, i, j]
= RMQ(i,j), X ( , ) Y ( , ) i j RMQ i j RMQ i j
3 4 5 6 5 4 5 6 5 4
0 1 2 3 2 1 2 3 2 1 +1 +1 +1 -1 -1 +1 +1 -1 -1
( )
O n
log 1
2 2 ( )
n
O n
DX = DY
log2
( )O n n O n
X Y
Table entries
LZ-parsing (gzip)
T = mississippi#
1 2 4 6 8 10
12
11 8
5 2 1 10 9
7 4
6 3
0
4
# i
ppi#
ssi
mississippi#
1
p
i# pi#
2 1
s
i
ppi# ssip pi#
3
si
ssip ppi# pi#
1
#
ppi# ssippi#
<m><i><s><si><ssip><pi>
LZ-parsing (gzip)
T = mississippi#
1 2 4 6 8 10
12
11 8
5 2 1 10 9
7 4
6 3
0
4
# i
ppi#
ssi
mississippi#
1
p
i# pi#
2 1
s
i
ppi# ssippi#
si 3
ssip ppi# pi#
1
#
ppi# ssippi#
<ssip>
1. Longest repeated prefix of T[6,...]
2. Repeat is on the left of 6
It is on the path to 6
Leftmost occ
= 3 < 6
Leftmost occ
= 3 < 6
By maximality check only nodes
LZ-parsing (gzip)
T = mississippi#
1 2 4 6 8 10
12
11 8
5 2 1 10 9
7 4
6 3
0
4
# i
ppi#
ssi
mississippi#
1
p
i# pi#
2 1
s
i
ppi# ssip pi#
3
si
ssip ppi# pi#
1
#
ppi# ssippi#
<m><i><s><si><ssip><pi>
2
2 9
3
4
3
min-leaf
Leftmost copy
Parsing:
1. Scan T
2. Visit ST and stop when min-leaf ≥ current pos
Precompute the min descending leaf at every node in O(n) time.