Non Deterministic Automata

(1)

Nicolò Felicioni¹

Dipartimento di Elettronica e Informazione Politecnico di Milano

nicolo.felicioni@polimi.it

March 23, 2021

¹ Mostly based on Nicholas Mainardi's material.

(2)

A Non-Deterministic FSA (ND-FSA) is formally defined as a quintuple (Q, I, δ, q₀, F), where:

Q is the set of states of the automaton

I is the alphabet of the input string to be checked

δ : Q × I → ℘(Q) is the transition function

q₀ ∈ Q is the (unique) initial state from which the automaton starts

F ⊆ Q is the set of final (accepting) states of the automaton

The transitive closure δ^∗ is defined as:

δ^∗(q, ε) = {q}, δ^∗(q, y.i) = ⋃_{q′ ∈ δ^∗(q, y)} δ(q′, i)

The string x is accepted ⇐⇒ δ^∗(q₀, x) ∩ F ≠ ∅
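The definition of δ^∗ can be read as an iterative procedure: keep the set of states reachable so far and replace it, symbol by symbol, with the union of their successors. The sketch below is one possible rendering in Python; the example automaton (states s, t, u, accepting the strings whose second-to-last symbol is 1) and all function names are illustrative assumptions, not taken from the slides.

```python
def delta_star(delta, q, x):
    """Return the set of states reachable from q after reading the string x."""
    current = {q}  # delta*(q, epsilon) = {q}
    for symbol in x:
        # delta*(q, y.i) = union of delta(q', i) over q' in delta*(q, y)
        current = {r for p in current for r in delta.get((p, symbol), set())}
    return current


def accepts(delta, q0, final, x):
    """x is accepted iff delta*(q0, x) shares at least one state with F."""
    return bool(delta_star(delta, q0, x) & final)


# Hypothetical ND-FSA: strings over {0, 1} whose second-to-last symbol is 1.
EXAMPLE_DELTA = {
    ("s", "0"): {"s"},
    ("s", "1"): {"s", "t"},  # non-deterministic choice on 1
    ("t", "0"): {"u"},
    ("t", "1"): {"u"},
}
EXAMPLE_FINAL = {"u"}
```

A missing entry in the dictionary stands for the empty set, so a run that gets stuck simply contributes nothing to the union.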

(3)

It is always possible to make an ND-FSA deterministic! ⇒ FSA and ND-FSA are equivalent! Indeed, using the subscript D to mark the elements of the FSA and the subscript N to mark the elements of the ND-FSA, we can derive the former from the latter with the following construction:

Q_D = ℘(Q_N)

δ_D(q_D, i) = ⋃_{q_N ∈ q_D} δ_N(q_N, i)

F_D = {q_D | F_N ∩ q_D ≠ ∅}

The idea is that, since the ND-FSA moves to a set of states on the same input character, we can treat this set as a single new state, and the automaton becomes deterministic. The cost is a potentially exponential blow-up in the number of states, since |℘(Q)| = 2^|Q|.
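A minimal sketch of this construction in Python follows. Two choices here are assumptions rather than part of the slide: the deterministic initial state is taken to be {q₀}, and only the subsets reachable from it are generated instead of the whole of ℘(Q_N). The function name `determinize` and the small example automaton are illustrative.

```python
from collections import deque


def determinize(alphabet, delta_n, q0_n, final_n):
    """Subset construction limited to the subsets reachable from {q0_n}."""
    start = frozenset({q0_n})  # assumption: the DFA starts in {q0_n}
    delta_d = {}
    seen = {start}
    todo = deque([start])
    while todo:
        q_d = todo.popleft()
        for i in alphabet:
            # delta_D(q_D, i) = union of delta_N(q_N, i) for q_N in q_D
            target = frozenset(
                r for q_n in q_d for r in delta_n.get((q_n, i), ())
            )
            delta_d[(q_d, i)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # F_D = subsets that contain at least one final state of the ND-FSA
    final_d = {q_d for q_d in seen if q_d & final_n}
    return seen, delta_d, start, final_d


# Hypothetical ND-FSA: strings over {0, 1} whose second-to-last symbol is 1.
NFA_DELTA = {
    ("s", "0"): {"s"},
    ("s", "1"): {"s", "t"},
    ("t", "0"): {"u"},
    ("t", "1"): {"u"},
}
STATES_D, DELTA_D, START_D, FINAL_D = determinize("01", NFA_DELTA, "s", {"u"})
```

Restricting the search to reachable subsets does not change the recognized language; it only avoids listing states that no input can ever reach.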

(4)

A legitimate question: if ND-FSA and FSA are equivalent, why do we care about ND-FSA?

⇓

Because they can be way easier to design for some languages!

Therefore, we can employ this additional strategy to build an FSA for a language:

1 Design an ND-FSA able to recognize it

2 Derive a deterministic FSA using the algorithm we have just seen

(5)

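ND-FSA for L = {(0 | 1)^∗ 0 (0 | 1)³ⁿ 0 (0 | 1)^∗ | n ≥ 0}

[Diagram: ND-FSA with states q0 (start), q1, q2, q3, q_f (final)]

δ(q0, 0) = {q0, q1}, δ(q0, 1) = {q0}

δ(q1, 0) = {q2, q_f}, δ(q1, 1) = {q2}

δ(q2, 0) = δ(q2, 1) = {q3}

δ(q3, 0) = δ(q3, 1) = {q1}

δ(q_f, 0) = δ(q_f, 1) = {q_f}

Deterministic version? Difficult to design from scratch.

Let's try the determinization algorithm . . .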

(6)

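Expanding q0

[Diagram: DFA under construction, start state q0; a state such as q01 stands for the set {q0, q1}]

δ_D(q0, 0) = q01, δ_D(q0, 1) = q0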

(7)

Expanding q01

[Diagram: DFA under construction; states so far: q0 (start), q01, q02, q012f]

δ_D(q01, 0) = q012f, δ_D(q01, 1) = q02

(8)

Expanding q02

[Diagram: DFA under construction; new states: q03, q013]

δ_D(q02, 0) = q013, δ_D(q02, 1) = q03

(9)

Expanding q03

[Diagram: DFA under construction; no new states]

δ_D(q03, 0) = δ_D(q03, 1) = q01

(10)

Expanding q013

[Diagram: DFA under construction; new state: q012]

δ_D(q013, 0) = q012f, δ_D(q013, 1) = q012

(11)

Expanding q012

[Diagram: DFA under construction; new states: q023, q_all (the set of all five ND-FSA states)]

δ_D(q012, 0) = q_all, δ_D(q012, 1) = q023

(12)

Expanding q023

[Diagram: DFA under construction; no new states]

δ_D(q023, 0) = δ_D(q023, 1) = q013

(13)

Expanding q012f

[Diagram: DFA under construction; new state: q023f]

δ_D(q012f, 0) = q_all, δ_D(q012f, 1) = q023f

(14)

Expanding q023f

[Diagram: DFA under construction; new state: q013f]

δ_D(q023f, 0) = δ_D(q023f, 1) = q013f

(15)

Expanding q013f

[Diagram: DFA under construction; no new states]

δ_D(q013f, 0) = δ_D(q013f, 1) = q012f

(16)

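Expanding q_all

[Diagram: complete DFA with 11 states: q0 (start), q01, q02, q03, q012, q013, q023, q012f, q013f, q023f, q_all; the final states are those containing q_f, i.e. q012f, q013f, q023f, q_all]

δ_D(q_all, 0) = δ_D(q_all, 1) = q_all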
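As a sanity check on the expansion steps, the sketch below encodes the transition table they produce and compares the resulting DFA with a regular expression for L on every short binary string. The regular expression is our own rendering of the language, and the names `DFA`, `ACCEPTING`, `dfa_accepts` and `agrees_up_to` are illustrative.

```python
import re
from itertools import product

# state -> (successor on 0, successor on 1), as obtained by the expansions
DFA = {
    "q0":    ("q01",   "q0"),
    "q01":   ("q012f", "q02"),
    "q02":   ("q013",  "q03"),
    "q03":   ("q01",   "q01"),
    "q013":  ("q012f", "q012"),
    "q012":  ("q_all", "q023"),
    "q023":  ("q013",  "q013"),
    "q012f": ("q_all", "q023f"),
    "q023f": ("q013f", "q013f"),
    "q013f": ("q012f", "q012f"),
    "q_all": ("q_all", "q_all"),
}
# Final states: the subsets that contain q_f.
ACCEPTING = {"q012f", "q023f", "q013f", "q_all"}

# Assumed regular expression for L: (0|1)* 0 (0|1)^{3n} 0 (0|1)*, n >= 0.
PATTERN = re.compile(r"[01]*0(?:[01]{3})*0[01]*")


def dfa_accepts(word):
    state = "q0"
    for ch in word:
        state = DFA[state][int(ch)]
    return state in ACCEPTING


def agrees_up_to(max_len):
    """True if the DFA and the regular expression agree on all short strings."""
    return all(
        dfa_accepts(w) == bool(PATTERN.fullmatch(w))
        for n in range(max_len + 1)
        for w in map("".join, product("01", repeat=n))
    )
```

Calling `agrees_up_to(9)` exercises every string of length at most 9; agreement on short strings is evidence for the table, not a proof of equivalence.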

(17)

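An Example

Design an automaton able to recognize the language L = {0|1}^∗1{0|1}⁴

[Diagram: ND-FSA with states q0 (start), q1, q2, q3, q4, q_f (final)]

δ(q0, 0) = {q0}, δ(q0, 1) = {q0, q1}

δ(q1, 0) = δ(q1, 1) = {q2}

δ(q2, 0) = δ(q2, 1) = {q3}

δ(q3, 0) = δ(q3, 1) = {q4}

δ(q4, 0) = δ(q4, 1) = {q_f}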

(18)

Instead, how would we have recognized this language with a deterministic FSA?

⇓

We would have needed a state for each possible sequence of the last 5 digits read, with the accepting states being only those corresponding to sequences whose fifth-to-last digit is 1

⇓

2⁵ = 32 states, 16 of which are accepting.
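The counting argument can be made concrete by building that deterministic automaton directly: each state is the window of the last 5 digits, and reading a digit shifts the window by one position. This is only a sketch; the all-zero start window (standing for "nothing relevant read yet") is an assumption, and the names `step`, `STATES`, `ACCEPTING` and `accepts` are illustrative.

```python
from itertools import product

K = 5  # the 1 must be the fifth-to-last digit, so we remember 5 digits


def step(state, digit):
    """Shift the new digit into the window of the last K digits."""
    return state[1:] + (digit,)


STATES = set(product((0, 1), repeat=K))
ACCEPTING = {s for s in STATES if s[0] == 1}  # oldest remembered digit is 1
START = (0,) * K  # assumption: an empty history behaves like all zeros


def accepts(word):
    state = START
    for ch in word:
        state = step(state, int(ch))
    return state in ACCEPTING
```

Here `len(STATES)` is 32 and `len(ACCEPTING)` is 16, matching the figures above, while the ND-FSA for the same language needs only 6 states.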

(19)

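Deterministic FSA for the language L = {0|1}^∗1{0|1}²

[Diagram: DFA with 8 states named after the last three digits read: 000 (start), 001, 010, 011, 100, 101, 110, 111. On input d, state xyz moves to state yzd. The final states are those whose first digit is 1: 100, 101, 110, 111.]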

(20)

Definition

Recall the determinism requirement of a PushDown Automaton (PDA):

∀α ∈ Γ, q ∈ Q (δ(q, ε, α) ≠ ⊥ ⇒ ∄i ∈ I (δ(q, i, α) ≠ ⊥))

If we remove this requirement and change the transition function to:

δ : Q × (I ∪ {ε}) × Γ → ℘_F(Q × Γ^∗)

we obtain a Non-Deterministic PushDown Automaton (NPDA).

NPDA are more powerful than PDA:

1 The union of two languages recognizable by a PDA can be recognized trivially

2 They can recognize languages where "guessing" about the structure of a string is needed

(21)

L = {aⁿbⁿ | n ≥ 1} ∪ {aⁿb²ⁿ | n ≥ 1}

This language is a classical counterexample showing that PDA are not closed w.r.t. the union of languages. But with an NPDA...

[Diagram: NPDA with states q0 (start), q1, q1′, q2, q_f. Legible arc labels: aZ₀ | AZ₀, aA | AA, bA | ε, εZ₀ | Z₀. The source and target of each arc are not recoverable from the text.]

(22)

L = {ww^R | w ∈ {a, b}^∗}

The problem with a PDA is that there is no marker telling when w is finished, so we do not know when to start popping the stack to recognize w^R. This is no longer an issue with an NPDA:

[Diagram: NPDA with states q0 (start), q1, q2, q_f. Legible arc labels: aZ₀ | AZ₀, bZ₀ | BZ₀, aA | AA, bB | BB, aB | AB, bA | BA, aA | ε, bB | ε, εZ₀ | Z₀.]
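The non-deterministic guess of the midpoint can be simulated by exploring every configuration the NPDA could be in. The sketch below follows that idea rather than the exact arcs of the diagram: a configuration is a triple (input position, stack, phase), and in the push phase the machine may always decide that w has just ended. The function name `accepts_ww_reversed` is illustrative.

```python
def accepts_ww_reversed(word):
    """Explore all runs of an NPDA-like machine for {w w^R | w in {a,b}*}."""
    if set(word) - {"a", "b"}:
        return False  # symbols outside the alphabet are rejected outright
    # Configuration: (input position, stack as a tuple, phase).
    todo = [(0, (), "push")]
    while todo:
        pos, stack, phase = todo.pop()
        if phase == "push":
            # Non-deterministic guess: the middle of the string is here.
            todo.append((pos, stack, "pop"))
            if pos < len(word):
                todo.append((pos + 1, stack + (word[pos],), "push"))
        else:
            if pos == len(word) and not stack:
                return True  # input consumed and stack back to its bottom
            if pos < len(word) and stack and stack[-1] == word[pos]:
                todo.append((pos + 1, stack[:-1], "pop"))
    return False
```

For example, `accepts_ww_reversed("abba")` succeeds through the run that guesses the midpoint after "ab", while every run on "abab" gets stuck.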

(23)

Consider again the enriched parrot language:

L = {wcw | w ∈ {a, b}^∗} ∪ {wcw^R | w ∈ {a, b}^∗}

Its recognition with a one-tape TM is straightforward if the TM is non-deterministic:

[Diagram: non-deterministic TM with states q0 (start), q1, q2, q3, q4, q_f. Arc labels: a␢ | Z0, (S, R); b␢ | Z0, (S, R); c␢ | Z0, (R, S); a␢ | A, (R, R); b␢ | B, (R, R); c␢ | ␢, (S, L); c␢ | ␢, (R, L); cA | A, (S, L); cB | B, (S, L); cZ0 | Z0, (R, R); aA | A, (R, L); bB | B, (R, L); ␢Z0 | Z0, (S, S); aA | A, (R, R); bB | B, (R, R); ␢␢ | ␢, (S, S). The source and target of each arc are not recoverable from the text.]

(24)

Recall

The enriched parrot language can be recognized with a one-tape deterministic TM too! How?

1 The TM starts by recognizing the sublanguage wcw^R, using the tape as a stack

2 In case of failure, the head of the tape is moved back to the first cell, while the head of the input tape is moved back to the first cell after the one containing the c character

3 The tape can now be used as a queue to recognize the parrot language

At step 2, the TM performs backtracking, which is exactly the behavior of a Non-Deterministic TM!

Backtracking is the core mechanism to simulate an ND-TM with a deterministic one!
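The three steps can be mimicked in ordinary code, with a Python list playing the tape used as a stack and a deque playing the tape used as a queue. This is a sketch of the backtracking idea only, not of the TM itself; all function names are illustrative.

```python
from collections import deque


def matches_mirror(w, rest):
    """Step 1: check rest == reversed w, reading the stored w as a stack."""
    stack = list(w)
    for ch in rest:
        if not stack or stack.pop() != ch:
            return False
    return not stack


def matches_copy(w, rest):
    """Step 3: check rest == w, reading the stored w as a queue."""
    queue = deque(w)
    for ch in rest:
        if not queue or queue.popleft() != ch:
            return False
    return not queue


def accepts_enriched_parrot(x):
    if x.count("c") != 1 or set(x) - set("abc"):
        return False
    w, rest = x.split("c")
    if matches_mirror(w, rest):
        return True
    # Step 2 (backtracking): both "heads" are rewound simply by starting
    # again from the stored w and from the first symbol after c.
    return matches_copy(w, rest)
```

The deterministic machine tries the two branches one after the other, whereas the non-deterministic one may be thought of as choosing the right branch at the character c.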

(25)

Suppose we want to design an automaton able to recognize the language L = {aⁿbⁿcⁿ | n ≥ 1}^C.

We can exploit a kind of "divide et impera" strategy: we split the language into sublanguages, and we design an automaton for each of them.

Then, we merge these sublanguages by allowing a non-deterministic choice among the 3 automata.

How to split the language?

L₁ : (a⁺b⁺c^∗)^C

L₂ : {aⁿb^mc^∗ | n ≠ m}

L₃ : {a^∗bⁿc^m | n ≠ m}
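That the three sublanguages together cover exactly the complement can be checked by brute force on short strings. The sketch below does so over the alphabet {a, b, c}; it assumes that the exponents n and m in L₂ and L₃ range over the non-negative integers, which the slide does not state, and all function names are illustrative.

```python
import re
from itertools import product


def in_original(x):  # {a^n b^n c^n | n >= 1}
    n = len(x) // 3
    return n >= 1 and x == "a" * n + "b" * n + "c" * n


def in_l1(x):  # complement of a+ b+ c*
    return re.fullmatch(r"a+b+c*", x) is None


def in_l2(x):  # a^n b^m c* with n != m (assumption: n, m >= 0)
    m = re.fullmatch(r"(a*)(b*)c*", x)
    return m is not None and len(m.group(1)) != len(m.group(2))


def in_l3(x):  # a* b^n c^m with n != m (assumption: n, m >= 0)
    m = re.fullmatch(r"a*(b*)(c*)", x)
    return m is not None and len(m.group(1)) != len(m.group(2))


def decomposition_holds(max_len):
    """True if L1 ∪ L2 ∪ L3 equals the complement on all short strings."""
    return all(
        (not in_original(x)) == (in_l1(x) or in_l2(x) or in_l3(x))
        for k in range(max_len + 1)
        for x in map("".join, product("abc", repeat=k))
    )
```

A bounded check like `decomposition_holds(8)` supports the decomposition but does not replace the case analysis: a string outside L either has the wrong shape (L₁) or has the right shape with mismatching counts (L₂ or L₃).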

(26)

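Automaton for L₁

An FSA is sufficient!

[Diagram: FSA with states q0 (start), q1, q2, q3, q_e]

δ(q0, a) = q1, δ(q0, b) = δ(q0, c) = q_e

δ(q1, a) = q1, δ(q1, b) = q2, δ(q1, c) = q_e

δ(q2, b) = q2, δ(q2, c) = q3, δ(q2, a) = q_e

δ(q3, c) = q3, δ(q3, a) = δ(q3, b) = q_e

δ(q_e, a) = δ(q_e, b) = δ(q_e, c) = q_e

Final states: q0, q1, q_e (the states in which the string read so far is not in a⁺b⁺c^∗)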

(27)

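Automaton for L₂

A Deterministic PDA is sufficient!

[Diagram: deterministic PDA with states q0 (start), q1, q2, q3, q4. Legible arc labels: aZ0 | BZ₀, bZ0 | Z₀, aB | AB, aA | AA, bA | ε, bB | ε, cA | Z0, cB | ε, cZ0 | Z0. The source and target of each arc are not recoverable from the text.]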

(28)

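Automaton for L₃

A Deterministic PDA is sufficient!

[Diagram: deterministic PDA with states q0 (start), q1, q2, q3. Legible arc labels: aZ0 | Z₀, cZ0 | Z₀, bZ0 | CZ₀, bB | BB, bC | BC, cB | ε, cC | ε. The source and target of each arc are not recoverable from the text.]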

(29)

We can now merge the 3 automata with a non-deterministic choice among their 3 initial states:

Recognizer for L = {aⁿbⁿcⁿ | n ≥ 1}^C

[Diagram: a new initial state q0 connected by ε-moves, labelled εZ0 | Z0, to the initial states of FSA L1, PDA L2 and PDA L3]

The resulting automaton is a Non-Deterministic PDA

NB: For the sake of correctness, the FSA for L₁ must be turned into a PDA which does not alter the stack

(30)

We want to design the automaton with the minimum computational power required to recognize the language

L = {wcw | w ∈ {a, b}^∗}^C

Again, we split it into different sublanguages, each breaking one constraint imposed by the original language on its strings.

Which sublanguages?

1 L₁ = {x | ∃α, β, γ, δ ∈ {a, b}^∗ ((x = α.b.β.c.γ.a.δ ∨ x = α.a.β.c.γ.b.δ) ∧ |α| = |γ|)}: breaks the constraint that each character in the first half of the string is equal to the corresponding character in the second half of the string

2 L₂ = {(a | b)^∗c(a | b)^∗}^C: breaks the structure of the string

3 L₃ = {x | ∃α, β ∈ {a, b}^∗ (x = α.c.β ∧ |α| ≠ |β|)}: breaks the constraint that the single character c splits the string into two equally long portions

Then, we merge these sublanguages with a non-deterministic choice among the 3 corresponding automata
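A brute-force check on short strings over {a, b, c} illustrates why these three sublanguages are enough. The sketch below translates each definition into a predicate; for L₁ it uses the observation that |α| = |γ| means the two differing characters sit at the same offset before and after the c. All function names are illustrative.

```python
from itertools import product


def in_original(x):  # {w c w | w in {a,b}*}
    if x.count("c") != 1:
        return False
    u, v = x.split("c")
    return u == v


def in_l1(x):  # one c, and some offset where the two sides differ
    if x.count("c") != 1:
        return False
    u, v = x.split("c")
    # zip stops at the shorter side, so only offsets present in both count
    return any(p != q for p, q in zip(u, v))


def in_l2(x):  # not of the form (a|b)* c (a|b)*
    return x.count("c") != 1


def in_l3(x):  # one c, sides of different length
    if x.count("c") != 1:
        return False
    u, v = x.split("c")
    return len(u) != len(v)


def decomposition_holds(max_len):
    """True if L1 ∪ L2 ∪ L3 equals the complement on all short strings."""
    return all(
        (not in_original(x)) == (in_l1(x) or in_l2(x) or in_l3(x))
        for k in range(max_len + 1)
        for x in map("".join, product("abc", repeat=k))
    )
```

The sublanguages overlap (for instance "abca" belongs to both L₁ and L₃), which is harmless because only their union matters.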

(31)

Automaton for L₁

An NPDA is sufficient:

[Diagram: NPDA with states q0 (start), q1, q2, q3, q4, q_f. Legible arc labels: aZ0 | MZ0, bZ0 | MZ0, aM | MM, bM | MM, aZ0 | Z0, bZ0 | Z0, aM | M, bM | M, cZ0 | Z0, cM | M, aM | ε, bM | ε. The source and target of each arc are not recoverable from the text.]

(32)

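Automaton for L₂

An FSA is sufficient:

[Diagram: FSA with states q0 (start), q1, q_e]

δ(q0, a) = δ(q0, b) = q0, δ(q0, c) = q1

δ(q1, a) = δ(q1, b) = q1, δ(q1, c) = q_e

δ(q_e, a) = δ(q_e, b) = δ(q_e, c) = q_e

Final states: q0, q_e (the states in which the string read so far does not contain exactly one c)

Automaton for L₃

A DPDA is sufficient:

[Diagram: DPDA with states q0 (start), q1, q2, q3. Legible arc labels: aZ0 | NZ0, bZ0 | NZ0, aN | MN, bN | MN, aM | MM, bM | MM, cM | M, cN | N, cZ0 | Z0, aM | ε, bM | ε, aN | ε, bN | ε, aZ0 | Z0, bZ0 | Z0. The source and target of each arc are not recoverable from the text.]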

(33)

We want to recognize L = {ww | w ∈ {a, b}^∗}^C. Which automaton?

Main Idea: Decomposition Into Sub-Languages! How?

⇓

L = L₁ ∪ L₂ ∪ L₃

L₁ : {x | |x| = 2k + 1, k ≥ 0}

L₂ : {x | ∃α, β, γ (x = α.a.β.b.γ ∧ |α.γ| = |β|)}

L₃ : {x | ∃α, β, γ (x = α.b.β.a.γ ∧ |α.γ| = |β|)}

Can we recognize L₂ and L₃ with an NPDA?

Yes, we can split β in two parts: β = β₁.β₂, |β₁| = |α| ∧ |β₂| = |γ|
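The decomposition can be checked by brute force on short strings over {a, b}. In the sketch below, L₂ and L₃ are tested literally from their definitions by trying every pair of positions for the two distinguished characters; the function names are illustrative.

```python
from itertools import product


def in_original(x):  # {w w | w in {a,b}*}
    half, odd = divmod(len(x), 2)
    return not odd and x[:half] == x[half:]


def in_l1(x):  # odd length
    return len(x) % 2 == 1


def in_split(x, first, second):
    """x = alpha.first.beta.second.gamma with |alpha| + |gamma| = |beta|."""
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == first and x[j] == second:
                alpha, beta, gamma = i, j - i - 1, n - j - 1
                if alpha + gamma == beta:
                    return True
    return False


def decomposition_holds(max_len):
    """True if L1 ∪ L2 ∪ L3 equals the complement on all short strings."""
    return all(
        (not in_original(x))
        == (in_l1(x) or in_split(x, "a", "b") or in_split(x, "b", "a"))
        for k in range(max_len + 1)
        for x in map("".join, product("ab", repeat=k))
    )
```

The condition |α.γ| = |β| forces the two distinguished characters to be exactly half the string length apart, so they are corresponding characters of the two halves; this is what makes an even-length string differ from every ww.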

(34)

Recognizer for L = {ww | w ∈ {a, b}^∗}^C

[Diagram: NPDA with states q0 (start), q1, q2, q3, q4, q5, q6, q7, q8, q_f. Legible arc labels: εZ0 | Z0, aZ0 | Z0, bZ0 | Z0, aZ0 | MZ0, bZ0 | MZ0, aM | MM, bM | MM, aM | M, bM | M, aM | ε, bM | ε. The source and target of each arc are not recoverable from the text.]

(35)

Determine the weakest automaton able to recognize the following languages:

1 L1 = {aⁿbⁿ| n ≥ 1} ∪ {aⁿb²ⁿ| n ≥ 1}

2 L2 = {1aⁿbⁿ| n ≥ 1} ∪ {2aⁿb²ⁿ| n ≥ 1}

3 L₃ = {aⁿ1bⁿ| n ≥ 1} ∪ {aⁿ2b²ⁿ| n ≥ 1}

4 L₄ = {aⁿbⁿ1 | n ≥ 1} ∪ {aⁿb²ⁿ2 | n ≥ 1}

⇓

L₁, L₄ : NPDA! We cannot determine in advance how many b's must be counted

L₂, L₃ : PDA! The symbols 1 and 2 allow us to determine how many b's must be counted
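To see why the marker makes L₃ deterministic, consider the single left-to-right pass sketched below. One counter stands in for a stack of identical symbols, and the marker fixes, before any b is read, how many b's cancel one stacked symbol. This is an illustrative sketch rather than a formal PDA; the name `accepts_l3` and the variable names are assumptions.

```python
def accepts_l3(x):
    """Recognize {a^n 1 b^n | n >= 1} ∪ {a^n 2 b^2n | n >= 1} in one pass."""
    count = 0      # symbols currently on the (simulated) stack
    weight = None  # b's needed to pop one symbol; decided by the marker
    pending = 0    # b's read toward the current pop
    phase = "a"
    for ch in x:
        if phase == "a" and ch == "a":
            count += 1
        elif phase == "a" and ch in ("1", "2") and count >= 1:
            weight, phase = int(ch), "b"  # deterministic choice of branch
        elif phase == "b" and ch == "b" and count > 0:
            pending += 1
            if pending == weight:
                count, pending = count - 1, 0
        else:
            return False
    return phase == "b" and count == 0 and pending == 0
```

In L₁ and L₄ no such information is available while the b's are being read, so the choice of `weight` would have to be guessed.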

(36)

Determine the weakest automaton able to recognize the following languages:

1 L₁ = {aⁿb^pb^p | n ≥ 1, p ≥ 1} ∪ {aⁿb^paⁿ| n ≥ 1, p ≥ 1}

2 L2 = {aⁿbⁿ| n ≥ 1} ∪ {bⁿcⁿ| n ≥ 1}

3 L₃ = {aⁿbⁿc⁺| n ≥ 1} ∪ {a⁺bⁿcⁿ| n ≥ 1}

⇓

1 L₁ = {aⁿb^2p| n ≥ 1, p ≥ 1} ∪ {aⁿb^paⁿ| n ≥ 1, p ≥ 1} ⇒ PDA: if the number of b is odd, then the sequence aⁿ after b^p is mandatory, otherwise it is optional

2 L₂: Recognize aⁿbⁿ (respectively bⁿcⁿ) if the string starts with a (resp. b) ⇒ PDA

3 L₃: The sub-languages can neither be recognized simultaneously nor be distinguished at the beginning of the string ⇒ NPDA

(37)

What about the complements of L₁, L₂ and L₃?

1 L^C₁, L^C₂: PDA are closed with respect to complement. Both L₁ and L₂ are recognized by a PDA ⇒ their complements are recognized by a PDA too

May they be recognized by an FSA? No, as FSA are closed w.r.t. complement too: if a complement were recognized by an FSA, the original language would be as well

2 L₅ = L^C₃: The complement of a language that requires an NPDA cannot be recognized by a PDA. Why?

If L₅ were recognized by a PDA, then its complement (i.e., L₃) would be recognized by a PDA too

Thus L₅ requires either an NPDA or a TM

L₅ = {aⁿbⁿc⁺ | n ≥ 1}^C ∩ {a⁺bⁿcⁿ | n ≥ 1}^C

It is necessary to simultaneously verify that the number of a's differs from the number of b's and that the number of b's differs from the number of c's ⇒ Impossible with a stack ⇒ TM

What about L^C₁ ∪ L^C₂?

L^C₁ ∪ L^C₂ = (L₁ ∩ L₂)^C = {a^2pb^2p | p ≥ 1}^C ⇒ PDA
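The identity L₁ ∩ L₂ = {a^2pb^2p | p ≥ 1} used in the last step can be checked by brute force on short strings over {a, b, c}. The sketch below encodes L₁ and L₂ as defined in the previous exercise (with L₁ in its simplified form aⁿb^2p ∪ aⁿb^paⁿ); the function names are illustrative.

```python
import re
from itertools import product


def in_l1(x):  # {a^n b^2p} ∪ {a^n b^p a^n}, n >= 1, p >= 1
    m = re.fullmatch(r"(a+)(b+)(a*)", x)
    if m is None:
        return False
    a1, b, a2 = (len(g) for g in m.groups())
    return (a2 == 0 and b % 2 == 0) or a2 == a1


def in_l2(x):  # {a^n b^n} ∪ {b^n c^n}, n >= 1
    for pattern in (r"(a+)(b+)", r"(b+)(c+)"):
        m = re.fullmatch(pattern, x)
        if m is not None and len(m.group(1)) == len(m.group(2)):
            return True
    return False


def in_target(x):  # {a^2p b^2p | p >= 1}
    m = re.fullmatch(r"(a+)(b+)", x)
    return (
        m is not None
        and len(m.group(1)) == len(m.group(2))
        and len(m.group(1)) % 2 == 0
    )


def identity_holds(max_len):
    """True if L1 ∩ L2 equals {a^2p b^2p} on all short strings."""
    return all(
        (in_l1(x) and in_l2(x)) == in_target(x)
        for k in range(max_len + 1)
        for x in map("".join, product("abc", repeat=k))
    )
```

The only strings of L₂ that can start with a are of the form aⁿbⁿ, and such a string lies in L₁ exactly when its block of b's has even length, hence the even exponent 2p.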