Turing Machines and Language Classification


(1)

Turing Machines and Language Classification

Nicolò Felicioni¹

Dipartimento di Elettronica e Informazione, Politecnico di Milano

nicolo.felicioni@polimi.it

March 16, 2021

¹ Mostly based on Nicholas Mainardi's material.

(2)

The ultimate computing model

Definition

To further extend the capabilities of the PDA, we substitute the stack with one (or more) tapes.

The resulting automaton is the Turing Machine (TM), which is able to do anything we define as computable.ᵃ

The TM is able to choose whether to advance, go back, or stand still with the cursor on the tape.

Multiple tapes can be added, but they do not increase the computing power.

ᵃ Or, at least, Alonzo Church believed so, and we pretty much agree.

(3)

The ultimate computing model

Definition

For convenience, we use a marker symbol Z0 to indicate the beginning of the tape (in case it is not there, we can add it).

We assume that the input is positioned on the input tape.

All the other tapes, if any, are filled with the blank symbol ␢.

All the actions on the tapes' contents are driven by a state machine.

Notation: ⟨input tape⟩⟨tape_1⟩ . . . ⟨tape_n⟩ | ⟨tape_1⟩, . . . , ⟨tape_n⟩, (R | S | L)^{n+1}

(4)

Formalization

Definition

A k-tape recognizer TM is formally defined as a 6-tuple (Q, I, Γ, δ, q0, F), where:

Q is the set of states of the automaton

I ∪ {␢} is the alphabet of the input string which will be checked

Γ ∪ {␢} is the alphabet of the symbols on the tapes

δ : Q × I × Γ^k → Q × Γ^k × {R, S, L}^{k+1} is the transition function

q0 ∈ Q is the (unique) initial state from where the automaton starts

F ⊆ Q is the set of final accepting states of the automaton
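To make the 6-tuple concrete, here is a minimal, hedged sketch (not taken from the slides) of a k-tape recognizer TM in Python. The dictionary encoding of δ, the acceptance convention (accept if the machine halts in a final state), and all identifiers are illustrative assumptions.

```python
# Hedged sketch of the k-tape recognizer TM defined above.
# delta maps (state, input_symbol, tuple_of_tape_symbols) to
# (new_state, tuple_of_symbols_to_write, tuple_of_k+1_moves).
BLANK = "␢"
R, S, L = 1, 0, -1            # head movements as offsets

def run_tm(delta, q0, finals, w, k=1, max_steps=100_000):
    inp = list(w) + [BLANK]   # read-only input tape
    tapes = [[BLANK] for _ in range(k)]
    heads = [0] * (k + 1)     # heads[0] is the input head
    q = q0
    for _ in range(max_steps):
        key = (q, inp[heads[0]], tuple(t[h] for t, h in zip(tapes, heads[1:])))
        if key not in delta:              # no move defined: halt
            return q in finals            # assumed acceptance convention
        q, writes, moves = delta[key]
        for t, h, sym in zip(tapes, heads[1:], writes):
            t[h] = sym                    # rewrite the memory tapes
        for i in range(k + 1):            # move the k+1 heads
            heads[i] = max(0, heads[i] + moves[i])
        for i, t in enumerate(tapes):     # grow memory tapes on demand
            if heads[i + 1] == len(t):
                t.append(BLANK)
        if heads[0] == len(inp):          # keep the input head on the tape
            inp.append(BLANK)
    return False                          # assume divergence means rejection
```

For instance, the machine for {a^n b^n c^n | n ≥ 1} on the next slide could be encoded by listing its transition labels as entries of delta.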

(5)

A first example

L = {a^n b^n c^n | n ≥ 1}

The trick is that we can sweep over the memory tape as many times as we want

[State diagram: a single-memory-tape TM with states q0–q4. It writes one A per a (a␢|A, (R, R)), sweeps left over the A's while reading the b's (bA|A, (R, L)), bounces on Z0 (cZ0|Z0, (S, R)), sweeps right again while reading the c's (cA|A, (R, R)), and accepts on ␢␢|␢, (S, S).]
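A plain-Python rendering of the same sweep idea may help. This is a hedged membership checker (a list plays the role of the memory tape), not a faithful simulation of the diagram above.

```python
def accepts_anbncn(w: str) -> bool:
    # Checks membership in {a^n b^n c^n | n >= 1} by sweeping over a memory list.
    i, n = 0, len(w)
    tape = []                      # plays the role of the memory tape
    while i < n and w[i] == "a":   # phase 1: push one A per a
        tape.append("A")
        i += 1
    if not tape:
        return False
    j = len(tape)                  # head at the right end of the tape
    while i < n and w[i] == "b":   # phase 2: move left over the A's, one per b
        j -= 1
        if j < 0:
            return False
        i += 1
    if j != 0:
        return False
    while i < n and w[i] == "c":   # phase 3: move right again, one A per c
        j += 1
        if j > len(tape):
            return False
        i += 1
    return i == n and j == len(tape)
```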

(6)

A bit more complex

L = {a^n b^{n÷2} c^{n÷2} | n ≥ 1}

A slight deviation from the common idea of counting.

More than a single solution is possible (it is the same as saying: "devise a program that writes such strings").

One method to perform the division: stop on the input tape, continue in memory.

Other methods involve writing twice as many symbols on the tape.
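For reference, a hedged Python membership checker for this language (assuming ÷ denotes integer division); it simply counts, rather than reproducing the tape manipulation of the TM on the next slide.

```python
def accepts_a_b_half_c_half(w: str) -> bool:
    # Checks a^n b^(n//2) c^(n//2), n >= 1, assuming ÷ is integer division.
    i = 0
    n = 0
    while i < len(w) and w[i] == "a":     # count the leading a's
        n += 1
        i += 1
    if n == 0:
        return False
    for sym in ("b", "c"):                # then expect n//2 b's and n//2 c's
        cnt = 0
        while i < len(w) and w[i] == sym:
            cnt += 1
            i += 1
        if cnt != n // 2:
            return False
    return i == len(w)                    # nothing may follow the c's
```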

(7)

A bit more complex

[State diagram: TM with states q0–q10 implementing the "stop on the input tape, continue in memory" division method; transition labels omitted.]

(8)

The Parrot Language

L = {wcw | w ∈ {a, b}*}

[State diagram: TM with states q0–q3 and qf. The first copy of w is stored on the memory tape (a␢|A, b␢|B, both (R, R)); after the c the memory head rewinds to Z0 and the second copy is checked symbol by symbol (aA|A, bB|B, (R, R)), accepting on ␢␢|␢, (S, S).]

(9)

An Enriched Parrot Language

What if we add the language L1 = {wcw^R | w ∈ {a, b}*} to the parrot language? We need to recognize

L = {wcw | w ∈ {a, b}*} ∪ {wcw^R | w ∈ {a, b}*}.

There is an issue: we cannot infer from the input string whether we have to match L1 or the parrot language.

This is no longer a problem with a TM!

Indeed, we can deal with both cases without guessing a priori which sublanguage should be recognized.

How can we do this? . . .

(10)

An Enriched Parrot Language

We can use 2 tapes instead of only one! The first one acts as a queue, the second one as a stack:

[State diagram: two-memory-tape TM with states q0–q5 and qf. Before the c, each input symbol is copied onto both memory tapes (e.g. a␢, ␢ | A, A, (R, R, R)); after the c, tape 1 is compared forward (as a queue, matching wcw) while tape 2 is compared backward (as a stack, matching wcw^R); transition labels otherwise omitted.]
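The queue-plus-stack idea translates almost directly into ordinary code. Below is a hedged sketch (names and structure are mine, not the slides'): the prefix before the c is stored once in a queue and once in a stack, and after the c both comparisons are carried out in parallel.

```python
from collections import deque

def accepts_enriched_parrot(s: str) -> bool:
    # L = {wcw} ∪ {wcw^R} with w over {a, b}; assumes exactly one c separator.
    if s.count("c") != 1:
        return False
    w, rest = s.split("c")
    if any(ch not in "ab" for ch in w + rest):
        return False
    queue = deque(w)          # tape 1: read in the same order  -> matches wcw
    stack = list(w)           # tape 2: read in reverse order   -> matches wcw^R
    same, mirrored = True, True
    for ch in rest:
        same = same and bool(queue) and queue.popleft() == ch
        mirrored = mirrored and bool(stack) and stack.pop() == ch
    same = same and not queue         # all of w must have been consumed
    mirrored = mirrored and not stack
    return same or mirrored
```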

(11)

An Enriched Parrot Language

However, we may wonder: can L be recognized with a 2-tape TM, but not with a 1-tape TM?

No! Recall that all types of TMs are equivalent. A recognizer for L can be built with only 1 tape; it is just more complex than using an additional tape. How?

The TM starts by recognizing the sublanguage L1, using the tape as a stack.

In case of failure, the head of the tape is moved back to the first cell, while the head of the input tape is moved back to the first cell after the one containing the c character.

The tape can now be used as a queue to recognize the parrot language.

(12)

An Enriched Parrot Language

L = {wcw | w ∈ {a, b}*} ∪ {wcw^R | w ∈ {a, b}*} with a 1-tape TM

[State diagram: single-memory-tape TM with states q0–q4 and qf. It first tries wcw^R using the tape as a stack (aA|A, (R, L), bB|B, (R, L)); on failure both heads are rewound (the (L, L) transitions) and the parrot language is tried, using the tape as a queue (aA|A, (R, R), bB|B, (R, R)), accepting on ␢␢|␢, (S, S).]

(13)

A Powerful Computational Model

L = {a^n b^{n²} | n ≥ 1} with a 1-tape TM

Basic Idea: At each iteration, we read n b's by employing the tape as a stack. We keep track of how many iterations have been performed through a marker symbol M, which replaces the A symbol used to count.

[State diagram: TM with states q0–q5 and qf. One A is written per a; each sweep over the tape reads n b's (bA|A transitions) and turns one A into M (bA|M, (R, L)); the machine accepts when every A has become M and the input is exhausted (␢M|M, (S, S)).]
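A hedged Python rendering of this marker strategy (a list stands in for the memory tape; the names are illustrative):

```python
def accepts_an_bn_squared(w: str) -> bool:
    # Sketch of the marker strategy for {a^n b^(n²) | n >= 1}:
    # 'tape' holds one A per a; each sweep consumes n b's and turns one A
    # into M; accept when every A has become M and the input is exhausted,
    # i.e. exactly n² b's have been read.
    i = 0
    tape = []
    while i < len(w) and w[i] == "a":
        tape.append("A")
        i += 1
    if not tape:
        return False
    n = len(tape)
    for sweep in range(n):
        for _ in range(n):                 # one sweep reads n b's
            if i < len(w) and w[i] == "b":
                i += 1
            else:
                return False
        tape[sweep] = "M"                  # one more iteration completed
    return i == len(w)
```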

(14)

Complement

To design the automaton for the complement of a language, we cannot follow the same procedure seen for FSAs and PDAs unless we are sure the TM always halts, but we will see later in the course that this is not always possible.

Therefore, it is better to design the automaton by hand, using the original one as a building block.

L = {a^n b^{n²} | n ≥ 1}^C. We can see L as

L1 ∪ L2, with L1 = {a^n b^m | m ≠ n², m ≥ 0, n ≥ 1} and L2 = (a^+ b^*)^C.

Then, we can build an automaton for L1 starting from the original one, and merge it with the FSA recognizing L2.
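The decomposition can be checked directly in Python; the following hedged sketch uses a regular expression for the a^+ b^* shape, and is only a membership test, not the TM construction itself.

```python
import re

def in_complement(w: str) -> bool:
    # Complement of {a^n b^(n²) | n >= 1} over the alphabet {a, b}.
    m = re.fullmatch(r"(a+)(b*)", w)
    if m is None:
        return True                     # w is in L2 = (a+ b*)^C
    n, k = len(m.group(1)), len(m.group(2))
    return k != n * n                   # w is in L1 iff the b-count is wrong
```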

(15)

Complement

Automaton for L1 = {a^n b^m | m ≠ n², m ≥ 0, n ≥ 1}

[State diagram: TM with states q0–q7 and qf, obtained from the automaton for {a^n b^{n²}} by turning the situations in which the count of b's does not work out (too few or too many) into accepting paths; transition labels omitted.]

(16)

Complement

Automaton to recognize L2 = (a^+ b^*)^C

[State diagram: FSA with states q0, q1, q2 and qf recognizing L2; q1 and q2 track the a-prefix and the b-suffix, and any out-of-order symbol leads to the accepting trap state qf.]

Strategy to combine the TM for L1 and the FSA for L2:

We can now replace the states q1 and q2 with the automaton recognizing L1, adding, from each state of the automaton for L1 in which a b has already been read, a transition that reads an a.

Beware of final states while combining the two automata . . .

(17)

Complement

TM for L = {a^n b^{n²} | n ≥ 1}^C?

[State diagram: first attempt at the combined TM, with states q0–q7 and qf; it extends the automaton for L1 with a-reading transitions (e.g. aM|␢, (R, S), aA|␢, (R, S), aZ0|␢, (R, S)) from the states where a b has already been read, and makes q0 accepting; transition labels otherwise omitted.]

q0 ∈ F ⇒ this TM always halts in q0 ⇒ the TM accepts L = {a, b}*, which is not what we want.


(19)

Complement

Correct TM for L = {a^n b^{n²} | n ≥ 1}^C:

[State diagram: the corrected TM with states q0–q7 and qf; it differs from the previous attempt mainly in the transitions on the blank input symbol. Transition labels omitted.]

(20)

Single Tape TM

We have only the input tape available, which in this model is also writable.

Single-tape TMs are equivalent to k-tape TMs.

The convention is ⟨input tape read⟩ | ⟨input tape write⟩, (R | S | L)

Example: L = {wcw | w ∈ {a, b}*}

[State diagram: single-tape TM with states q0–q6 and qf. The leftmost unmatched symbol before the c is overwritten with M, the corresponding symbol after the c with N, and the head then rewinds to repeat the match; the machine accepts on ␢|␢, S once everything is marked.]
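The single-tape marking strategy can be mimicked in Python on a list used as the tape. This is a hedged sketch: the marker names M and N follow the diagram, everything else is an illustrative assumption.

```python
def accepts_wcw_single_tape(s: str) -> bool:
    # Sketch of the single-tape strategy for {wcw | w in {a,b}*}:
    # mark the leftmost unmatched symbol before c with M, the matching
    # symbol after c with N, then repeat; assumes the input uses only a, b, c.
    tape = list(s)
    if tape.count("c") != 1:
        return False
    c = tape.index("c")
    left, right = 0, c + 1
    while left < c:
        if right >= len(tape) or tape[right] != tape[left]:
            return False
        tape[left] = "M"                   # marked on the left of c
        tape[right] = "N"                  # marked on the right of c
        left += 1
        right += 1
    return right == len(tape)              # the two halves must have equal length
```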

(21)

Transducer TM

Definition

A k-tape transducer TM is formally defined as an 8-tuple (Q, I, Γ, δ, q0, F, η, O), where:

Q is the set of states of the automaton

I ∪ {␢} is the alphabet of the input string which will be read

Γ ∪ {␢} is the alphabet of the symbols on the tapes

δ : Q × I × Γ^k → Q × Γ^k × {R, S, L}^{k+1} is the transition function

q0 ∈ Q is the (unique) initial state from where the automaton starts

F ⊆ Q is the set of final accepting states

O is the output alphabet (it may coincide with I)

η : Q × I × Γ^k → O × {R, S} is the translation function, which emits characters on the output tape of the automaton

Notation: ⟨input tape⟩⟨tape_1⟩ . . . ⟨tape_k⟩ | ⟨tape_1⟩, . . . , ⟨tape_k⟩, ⟨output tape⟩, (R | S | L)^{k+1}(R | S)

(22)

An Example

τ(a^n b^k c^m) = d^{min(n, m, k)} | n, m, k ≥ 1

[State diagram: single-memory-tape transducer with states q0–q4 and qf. One A is written per a; while reading the b's the memory head moves back over the A's, while reading the c's it moves forward again and a d is emitted on the output tape (cA|A, d, (R, R, R)) as long as A's are still available; transition labels otherwise omitted.]
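A hedged one-function sketch of the same transduction in Python (it validates the input shape with a regular expression and returns None for malformed input; that error behavior is my assumption, not the slides'):

```python
import re

def translate(w: str) -> str | None:
    # tau(a^n b^k c^m) = d^min(n, m, k) for n, m, k >= 1
    match = re.fullmatch(r"(a+)(b+)(c+)", w)
    if match is None:
        return None                       # input not of the expected shape
    n, k, m = (len(g) for g in match.groups())
    return "d" * min(n, m, k)
```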

(23)

A Surprising Development

Consider the language L = N_bin a^N, that is, an integer N in its binary representation followed by a sequence of a's whose length is N. Do we really need a TM for it?

We simply need to read N and then count a number of a's equal to N.

If N is bounded, we can do it with an FSA.

If N is unbounded, we need at least a stack to store it in its binary form, decrementing it by 1 each time an a is read.

The string is accepted if and only if the binary number 0 is found on the stack when the string has been entirely read.

Can we decrement using only a stack? . . .

(24)

A Surprising Development

. . . It turns out we cannot decrement with a PDA!

The issue lies, in general, with borrows and carries, needed respectively for decrementing and incrementing.

Indeed, subtracting 1 can be done directly on the least significant digit unless it is 0. In that case, subtracting 1 requires a borrow from the closest more significant digit.

If this digit is 0 too, the borrow propagates to the next more significant digit, until a digit different from 0 is found.

If we have to do it with a stack, we erase the memory in order to go back to the first non-zero digit.

For binary numbers, this is not an issue by itself, since all the erased cells simply become 1. However, how many cells have we just erased?

This information is lost with just a stack, thus a PDA is not sufficient!
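A hedged Python sketch of the borrow mechanics on a two-way tape (a list), plus the resulting recognizer for L = N_bin a^N; the decomposition into two helpers is mine, not the slides'.

```python
def decrement(tape: list[str]) -> bool:
    # Binary decrement in place, most significant bit first:
    # trailing 0's become 1's (the borrow), the first 1 from the right becomes 0.
    i = len(tape) - 1
    while i >= 0 and tape[i] == "0":
        tape[i] = "1"
        i -= 1
    if i < 0:
        return False        # the counter was already zero
    tape[i] = "0"
    return True             # on a tape we can simply walk back right afterwards

def accepts_nbin_aN(w: str) -> bool:
    # L = N_bin a^N: copy the binary prefix to a tape, decrement once per a,
    # accept when the counter is exactly zero as the input ends.
    i = 0
    tape = []
    while i < len(w) and w[i] in "01":
        tape.append(w[i])
        i += 1
    if not tape:
        return False
    while i < len(w) and w[i] == "a":
        if not decrement(tape):
            return False    # more a's than N
        i += 1
    return i == len(w) and all(d == "0" for d in tape)
```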

(25)

A Surprising Development

Let's design the TM to recognize the language L = N_bin a^N:

[State diagram: single-memory-tape TM with states q0–q5 and qf. The binary prefix is copied onto the memory tape (0␢|0, 1␢|1, (R, R)); for each a the counter on the tape is decremented, turning trailing 0's into 1's and the first 1 into 0 (the a0|1 and a1|0 transitions); the machine accepts when the counter reaches zero as the input ends (␢Z0|Z0, (S, S)).]

(26)

A Surprising Development

Languages like L are actually quite common in protocol formats!

For instance, IP messages usually contain a length field in their header.

However, this length field employs a fixed number of bytes (e.g. one octet), thus the number N is bounded.

This language can be recognized by an FSA!

Ok, but how many states?

(27)

A Surprising Development

L = N_bin a^N, |N_bin| = 2

[State diagram: FSA with states q0–q5 and qf for |N_bin| = 2: the two binary digits are read first, then the corresponding number of a's (up to 3) is counted out state by state.]

If the upper bound is N, then we need approximately 2N states → not feasible for common protocols, which use 4-octet length fields.

(28)

A Surprising Development

Can we do it with a PDA employing about |N_bin| states? Yes. L = N_bin a^N, |N_bin| = 3

[State diagram: PDA with roughly |N_bin| states for |N_bin| = 3: the binary digits are pushed on the stack while reading N_bin, and the a's are consumed by repeatedly decrementing the stacked counter, re-pushing the borrowed digits (transitions such as 1 | 01, 1 | 001, 0 | ε); transition labels otherwise omitted.]

(29)

Identifying Minimum Computational Power Required For a Language

Associate each language with the minimum power automaton needed to recognize it:

1 L1 = {a^n b^{n²} c^{n²} | n ≥ 1}

2 L2 = {a^n b^n | n ≥ 1} ∪ {a^n c^n | n ≥ 1}

3 L3 = {a^n b^n | 1 ≤ n ≤ 20}

4 L4 = {a^n b^n b^n | n ≥ 1}

5 L5 = {a^n b^{2n} a^n}

6 L6 = {a^i b^j c^k | i + k = j}

(30)

Identifying Minimum Computational Power Required For a Language

Associate each language with the minimum power automaton needed to recognize it:

1 L1 = {a^n b^{n²} c^{n²} | n ≥ 1} ⇒ TM: we have already done it

2 L2 = {a^n b^n | n ≥ 1} ∪ {a^n c^n | n ≥ 1} ⇒ PDA: the two sub-languages can both be recognized by a PDA, and we can decide between them when the first b or c is read

3 L3 = {a^n b^n | 1 ≤ n ≤ 20} ⇒ FSA: there is an upper bound on the number of a's and b's

4 L4 = {a^n b^n b^n | n ≥ 1} ⇒ PDA: it is equivalent to a^n b^{2n}

5 L5 = {a^n b^{2n} a^n} ⇒ TM: same reason as a^n b^n c^n, i.e. we cannot count the final a's after we have counted the b's

6 L6 = {a^i b^j c^k | i + k = j} ⇒ PDA: L6 = {a^i b^i b^k c^k}, i.e. it is equivalent to recognizing a^n b^n two times with unrelated n's
