Nicol`o Felicioni1
Dipartimento di Elettronica e Informazione Politecnico di Milano
nicolo . felicioni @ polimi . it
March 23, 2021
1Mostly based on Nicholas Mainardi’s material.
A Non Deterministic FSAis formally defined as a quintuple (Q, I, δ, q0, F), where:
Q, is the set of states of the automata
I is the alphabet of the input string which will be checked δ : Q × I 7→ ℘(Q) the transition function
q0∈ Q the (unique) initial state from where the automaton starts
F ⊆ Q the set of final accepting states of the automaton The transitive closure δ∗ is defined as:
δ∗(q, ) = {q}, δ∗(q, y .i ) =S
q0∈δ∗(q,y )δ(q0, i ) The string x is accepted ⇐⇒ δ∗(q0, x ) ∩ F 6= ∅
It is always possible to make a ND-FSA deterministic! =⇒ FSA and ND-FSA are equivalent! Indeed, using a subscript D to mark the elements of the FSA and the subscript N to mark the elements of the ND-FSA, we can derive the former from the latter with the following construction:
QD = ℘(QN) δD(qD, i ) =S
qN∈qDδN(qN, i ) FD = {qd | FN∩ qd 6= ∅}
The idea is that since ND-FSA goes to a set of states for the same input character, we can consider this set as a new state and the automaton becomes deterministic. The cost is a potential exponential factor in the number of states, since ℘(Q) = 2|Q|
A legitimate question: if ND-FSA and FSA are equivalent, why do we care about ND-FSA?
⇓
Because they can be way easier to design for some languages!
Therefore, we can employ this additional strategy to build a FSA for a language:
1 Design a ND-FSA able to recognize it
2 Derive a deterministic FSA using the algorithm we have just seen
ND-FSA for L = (0 | 1)∗0(0|1)3n0(0 | 1)∗ | n ≥ 0 q0
start q1
q2 q3
qf
0, 1
0 0
0, 1
0, 1 0, 1
0, 1
Deterministic version? Difficult to be designed from scratch Let’s try the determinization algorithm . . .
Expanding q0
q0
start 0 q01
1
Expanding q01
q0
start q01
q02
q012f
0 1
0 1
Expanding q02 q0
start q01
q02
q012f
q03
q013 0
1
0 1
0 1
Expanding q03 q0
start q01
q02
q012f
q03
q013 0
1
0 1
0 1
0,1
Expanding q013
q0
start q01
q02
q012f
q03 q013
q012 0
1
0
1 0 1
0,1
0
1
Expanding q012 q0
start q01
q02
q012f
q03 q013
q012
q023 qall
0
1
0
1 0 1
0,1
0
1 0 1
Expanding q023 q0
start q01
q02
q012f
q03 q013
q012
q023 qall
0
1
0
1 0 1
0,1
0
1 0 1
0,1
Expanding q012f q0
start q01
q02
q012f
q03 q013
q012
q023 qall
q023f 0
1
0
1 0 1
0,1
0
1 0 1
0,1
0 1
Expanding q023f q0
start q01
q02
q012f
q03 q013
q012
q023 qall
q023f
q013f 0
1
0
1 0 1
0,1
0
1 0 1
0,1
0 1
0,1
Expanding q013f q0
start q01
q02
q012f
q03 q013
q012
q023 qall
q023f
q013f 0
1
0
1 0 1
0,1
0
1 0 1
0,1
0 1
0,1 0,1
Expanding qall q0
start q01
q02
q012f
q03 q013
q012
q023 qall
q023f
q013f 0
1
0
1 0 1
0,1
0
1 0 1
0,1
0 1
0,1 0,1
0,1
An Example
Design an automaton able to recognize this language L = {0|1}∗1{0|1}4
q0
start q1 q2
q3
q4
qf
0, 1
1 0, 1
0, 1
0, 1 0, 1
Instead, how would have we recognized this language with a deterministic FSA?...
⇓
.. We would have needed a state for each possible sequence of the last 5 digits read, with accepting states being only the ones corresponding to sequences where the fifth-to-the-last digit is 1
⇓
25 = 32 states, with 16 of them being accepting ones.
Deterministic FSA for the language L = {0|1}∗1{0|1}2
000
start 001 010 011
100 101 110 111
0
1 0
1
0 1
0 1
0 1
0 1
0
1 0
1
Definition
Recall the deterministic requirement of PushDown Automaton:
∃α ∈ Γ, q ∈ Q(δ(q, , α) 6=⊥ =⇒ @i ∈ I (δ(q, i, α) 6=⊥)) If we remove this requirement and we modify the transition function to:
δ : Q × {I ∪ } × Γ 7→ ℘F(Q × Γ∗)
We obtain aNon-Deterministic PushDown Automaton - NPDA NPDA are more powerful than PDA:
1 Union of two languages recognizable by a PDA can be trivially performed
2 They can recognize languages where ”guessing” about the structure of a string is needed
L = {anbn∪ anbn2 | n ≥ 1}
This language is a classical counterexample to show that PDA are not close w.r.t the union of languages. But with a NPDA...
q0 start
q1
q10 q2
qf
aZ0| AZ0 aA | AA
bA |
A | bA |
Z0 | Z0
bA |
Z0 | Z0
A |
Z0 | Z0
L = {wwR | w ∈ {a, b}∗}
The problem with a PDA is that there is no marker which tells when w is finished, and thus we do not not know when we can start popping the stack to recognize wR. No longer an issue with a NPDA:
q0
start
q1 q2 qf
aZ0| AZ0 bZ0 | BZ0
aA | AA bB | BB aB | AB bA | BA
aA | bB |
aA | bB |
Z0 | Z0
Consider again the enriched parrot language:
L = {wcw | w ∈ {a, b}∗} ∪ {wcwR | w ∈ {a, b}∗}. Its recognition with a one tape TM is straightforward with a non-deterministic TM:
q0
start
q1
q2
q3
q4
qf
a␢ | Z0, (S, R) b␢ | Z0, (S, R) c␢ | Z0, (R, S)
a␢ | A, (R, R)
b␢ | B, (R, R) c␢ | ␢, (S, L) c␢ | ␢, (R, L)
cA | A, (S, L) cB | B, (S, L)
cZ0| Z0, (R, R) aA | A, (R, L)
bB | B, (R, L)
␢Z0| Z0, (S, S)
aA | A, (R, R) bB | B, (R, R)
␢␢ | ␢, (S, S)
Recall
The enriched parrot language can be recognized with a 1 tape deterministic TM too! How?
1 The TM starts recognizing the sublanguage wcwR, using the tape as a stack
2 In case of a failure, the head of the tape is moved back to the first cell, while the head of the input tape is moved back to the first cell after the one containing the c character.
3 The tape can now be used as a queue to recognize the parrot language.
At step 2, the TM performs backtracking, which is exactly the same behavior of a Non-Deterministic TM!
Backtracking is the core mechanism to simulate a ND-TM with a deterministic one!
Suppose we want to design an automaton able to recognize the language L = {anbncn| n ≥ 1}C.
We can exploit a kind of ”divide et impera” strategy We split the language in sublanguages, and we design an automaton for each of these sublanguages
Then, we merge these sublanguages by allowing a non-deterministic choice among these 3 automata How to split the language?
L1 : (a+b+c∗)C L2 : {anbmc∗| n 6= m}
L3 : {a∗bncm| n 6= m}
Automaton for L1 A FSA is sufficient!
q0
start q1 q2
q3
qe
a b, c
a
b
c
b a c
a, b c a, b, c
Automaton for L2
A Deterministic PDA is sufficient!
q0
start q1
q2
q3 q4
aZ0| BZ0
bZ0| Z0 aB | AB
aA | AA
bA | bB |
cA | Z0
cB |
bA | bZ0| Z0
bB |
cA | Z0
cZ0| Z0
cB | bZ0| Z0
cZ0| Z0
Automaton for L3
A Deterministic PDA is sufficient!
q0
start
q1 q2 q3
aZ0| Z0
cZ0| Z0 bZ0| CZ0
bB | BB bC | BC
cB |
cC |
cB | cZ0| Z0
cC |
cZ0| Z0
We can now merge the 3 automata with a non-deterministic choice among the 3 initial states of the automata:
Recognizer for L = {anbncn| n ≥ 1}C
q0
start
FSA L1
PDA L2
PDA L3
Z0| Z0
Z0| Z0
Z0| Z0
The resulting automaton is a Non-Deterministic PDA
NBFor the sake of correctness, the FSA for L1 must be turned into a PDA which does not alter the stack
We want to design the automation with minimum computational power required to recognize the language
L = {wcw | w ∈ {a, b}∗}C
Again, we split in different sublanguages, each breaking one constraint imposed by the original language on its strings Which sublanguages?
1 L1={x |∃α, β, γ, δ∈{a, b}∗((x =α.b.β.c.γ.a.δ ∨
x =α.a.β.c.γ.b.δ) ∧ |α|=|γ|)}: break the constraint that each character in the first half of the string is equal to the
corresponding character in the second half of the string
2 L2= {(a | b)∗c(a | b)∗}C: break the structure of string
3 L3= {x | ∃α, β ∈ {a, b}∗(x = α.c.β ∧ |α| 6= |β|)}: break the constraint that the string is split by the single character c in two equally long portions
Then, we merge these sublanguages with a non deterministic choice among the 3 corresponding automata
Automaton for L1 A NPDA is sufficient:
q0
start
q1
q2
q3
q4
qf
aZ0| MZ0
bZ0| MZ0
aM | MM bM | MM
aZ0| Z0
aM | M
bZ0| Z0
bM | M aM | M bM | M aZ0| Z0bZ0| Z0
cZ0| Z0 cM | M
aM | M bM | M aZ0| Z0bZ0| Z0
cZ0| Z0 cM | M
aM | bM |
bZ0| Z0
aM | bM | aZ0| Z0
aZ0| Z0
bZ0| Z0
Automaton for L2 A FSA is sufficient:
q0
start q1 qe
a, b
c
a, b
c a, b, c
Automaton for L3 A DPDA is sufficient:
q0
start
q1 q2 q3
aZ0|NZ0bZ0|NZ0
aN|MN bN|MN aM|MM bM|MM
cM | M cN | N
cZ0| Z0 aM | bM |
aN | bN |
aZ0| Z0 bZ0| Z0
aZ0| Z0 bZ0| Z0
We want to recognize L = (ww | w ∈ {a, b}∗)C. Which automaton?
Main Idea: Decomposition Into Sub-Languages! How?
⇓ L = L1∪ L2∪ L3
L1 : {x | |x| = 2k + 1, k ≥ 0}
L2 : {x | ∃α, β, γ(x = α.a.β.b.γ ∧ |α.γ| = |β|)}
L3 : {x | ∃α, β, γ(x = α.b.β.a.γ ∧ |α.γ| = |β|)}
Can we recognize L2 and L3 with a NPDA?
Yes, we can split β in two parts: β = β1.β2, |β1| = |α| ∧ |β2| = |γ|
Recognizer for L = (ww | w ∈ {a, b}∗)ˆ
q0
start
q1 q2
q3
q4
q5
q6
q7
q8 qf
Z0| Z0
Z0| Z0
aZ0| Z0 bZ0| Z0
aZ0| Z0bZ0| Z0
aZ0| MZ0
bZ0| MZ0
bM | MM aM | MM
aZ0| Z0 aM | M
bZ0| Z0 bM | M
aM | bM |
Z0| Z0
aM | bM |
Z0| Z0
aZ0| MZ0bZ0| MZ0
bM | MM aM | MM bZ0| Z0 bM | M aZ0| MZ0bZ0| MZ0
bM | MM aM | MM
aZ0| Z0 aM | M aM |
bM |
Z0| Z0
Determine the weakest automaton able to recognize the following languages:
1 L1 = {anbn| n ≥ 1} ∪ {anb2n| n ≥ 1}
2 L2 = {1anbn| n ≥ 1} ∪ {2anb2n| n ≥ 1}
3 L3 = {an1bn| n ≥ 1} ∪ {an2b2n| n ≥ 1}
4 L4 = {anbn1 | n ≥ 1} ∪ {anb2n2 | n ≥ 1}
⇓
L1, L4 : NPDA! We cannot determine the number of b to be counted L2, L3 : PDA! 1 and 2 allows to determine the number of b to be
counted
Determine the weakest automaton able to recognize the following languages:
1 L1 = {anbpbp | n ≥ 1, p ≥ 1} ∪ {anbpan| n ≥ 1, p ≥ 1}
2 L2 = {anbn| n ≥ 1} ∪ {bncn| n ≥ 1}
3 L3 = {anbnc+| n ≥ 1} ∪ {a+bncn| n ≥ 1}
⇓
1 L1 = {anb2p| n ≥ 1, p ≥ 1} ∪ {anbpan| n ≥ 1, p ≥ 1} ⇒ PDA: if the number of b is odd, then the sequence an after bp is mandatory, otherwise it is optional
2 L2: Recognize anbn (respectively bncn) if the string starts with a (resp. b) ⇒ PDA
3 L3: The sub-languages can neither be recognized
simultaneously nor be distinguished at the beginning of the string ⇒ NPDA
What about complements of L1, L2 and L3?
1 LC1, LC2: PDA are closed with respect to complement. Both L1 and L2 are recognized by a PDA ⇒ their complements are recognized by a PDA too
May they be recognized by a FSA? No, as FSA are closed w.r.t. complement too!
2 L5 = LC3: The complement of a language recognized by a NPDA cannot be recognized by a PDA. Why?
If L5is recognized by a PDA, then its complement (i.e., L3) must be recognized by a PDA too
Thus L5 can be recognized by either a NPDA or a TM L5= {anbnc+| n ≥ 1}C ∩ {a+bncn| n ≥ 1}C
Necessary to simultaneously verify that the number of a is different from the number of b and that the number of b is different from c ⇒ Impossible with a stack ⇒ TM
What about LC1 ∪ LC2?
LC1 ∪ LC2 = (L1∩ L2)C = {a2pb2p| p ≥ 1}C ⇒ PDA