• Non ci sono risultati.

Principles  of  Programming  Languages

N/A
N/A
Protected

Academic year: 2022

Condividi "Principles  of  Programming  Languages"

Copied!
34
0
0

Testo completo

(1)

Principles  of  Programming  Languages  

h"p://www.di.unipi.it/~andrea/Dida2ca/PLP-­‐14/

 

Prof.  Andrea  Corradini  

Department  of  Computer  Science,  Pisa

 

•  From  RE  to  DFA,  directly  

•  Minimiza@on  of  DFA’s  

•  Exercises  on  lexical  analysis    

Lesson 6!

(2)

2

From  Regular  Expression  to  DFA   Directly  

•  The “important states” of an NFA are those with a non-ε outgoing transition,

– if move({s}, a) ≠ ∅ for some a then s is an important state

•  The subset construction algorithm uses only the important states when it determines

ε-closure(move(T, a))

(3)

3

N(r2)   N(r1)  

What  are  the  “important  states”  in  the   NFA  built  from  Regular  Expression?  

f   i   ε  

f   i   a

f  

i   N(r1)  

N(r2)  

start

start

start ε  

ε   ε  

ε  

f   i  

start

N(r)   f  

i  

start

ε  

ε   ε  

a

r1⏐r2

r1r2

r* ε   ε  

(4)

4

From  Regular  Expression  to  DFA   Directly  (Algorithm)  

•  The only accepting state (via the Thompson algorithm) is not important

•  Augment the regular expression r with a

special end symbol # to make accepting states important: the new expression is r#

•  Construct a syntax tree for r#

•  Attach a unique integer to each node not

labeled by ε

(5)

5

From  Regular  Expression  to  DFA  

Directly:  Syntax  Tree  of       (a|b)*abb# !

*

|

a1

b2

a3

b4

b5

#6

concatenation

“cat-nodes”

closure

“star-node”

alternation

“or-node” position

number (for leafs ≠ε)

(6)

6

From  Regular  Expression  to  DFA   Directly:  Annota@ng  the  Tree  

•  Traverse the tree to construct functions nullable, firstpos, lastpos, and followpos

•  For a node n, let L(n) be the language generated by the subtree with root n

•  nullable(n): L(n) contains the empty string ε

•  firstpos(n): set of positions under n that can match the first symbol of a string in L(n)

•  lastpos(n): the set of positions under n that can match the last symbol of a string in L(n)

•  followpos(i): the set of positions that can follow position i in any generated string

(7)

7

From Regular Expression to DFA Directly: Annotating the Tree

Node n nullable(n) firstpos(n) lastpos(n)

Leaf ε true

Leaf i false {i} {i}

| / \ c1 c2

nullable(c1) or

nullable(c2)

firstpos(c1)

firstpos(c2)

lastpos(c1)

lastpos(c2)

/ \ c1 c2

nullable(c1) and

nullable(c2)

if nullable(c1) then firstpos(c1) ∪

firstpos(c2) else firstpos(c1)

if nullable(c2) then lastpos(c1) ∪

lastpos(c2) else lastpos(c2)

*

| c1

true firstpos(c1) lastpos(c1)

(8)

8

From  Regular  Expression  to  DFA  

Annota@ng  the  Syntax  Tree  of   ( a|b)*abb#

{6}

{1, 2, 3}

{5}

{1, 2, 3}

{4}

{1, 2, 3}

{3}

{1, 2, 3}

{1, 2}

{1, 2} *

{1, 2}

{1, 2} |

{1}

{1} a {2} b{2}

{3}

{3} a

{4}

{4} b

{5}

{5} b

{6}

{6} #

nullable

3

4

5

6

(9)

9

From Regular Expression to DFA

followpos on the Syntax Tree of

(a|b)*abb#

{6}

{1, 2, 3}

{5}

{1, 2, 3}

{4}

{1, 2, 3}

{3}

{1, 2, 3}

{1, 2}

{1, 2} *

{1, 2}

{1, 2} |

{1}

{1} a {2} b{2}

{3}

{3} a

{4}

{4} b

{5}

{5} b

{6}

{6} #

nullable

1 2

3

4

5

6

<ċ

$  $ $

2Ð

  $ $  $ $$$ ǝҿ   $

2Ð

$  $ $ $$ $

1Ð

$  $   $ $$ $

2Ð!"#$%&

$$$ ^ $    $  $

$ $ dҿ

 $

2 Ç Ð

  $   $ !$ !$

֎നஸᎯᅻ਺ᗮ(ùŷѢ؛Ĝ  ؛ࡐྲ়ྀᗮΞ  ؛૧အᅻᗮྲྀအ়਺በᗮനྲྀᗮዐ௿਺ᗮበᒹྲྀዐࡐᒏᗮዐᅻ਺਺ᗮ૧အᅿᗮ$5Ȇg)«5ggƽ̓

ߖ਺ᗮ༌ᎯበዐᗮࡐเበအᗮࡐႼႼเᒹᗮᅻᎯเ਺ᗮዐအᗮዐ௿਺ᗮበዐࡐᅻŸྲྀအ়਺ȯᗮݬ௿ࡐዐᗮᅻᎯเ਺ᗮዐ਺เเበᗮᎯበᗮႼအበനዕനအྲྀበᗮĽ؛ࡐྲ়ྀᗮ

ࡐᅻਾᗮനྲྀᗮ࣒အዐ௿ᗮ ࡐྲ়ྀᗮ በനྲྀह਺ᗮ࣒အዐ௿ᗮĜ  ؛ࡐྲ়ྀᗮ  ؛૧အᅻᗮ ዐ௿നበᗮྲྀအ়਺ᗮࡐᅻ਺ᗮWĽ ؛ù؛ ݬ௿਺ᗮ׏အ༌Ⴜเ਺ዐ਺ᗮበੁዐበᗮ ࡐᅻ਺ᗮበᎯ༌༌ࡐᅻനᔎ਺়ᗮനྲྀᗮ֎നஸɉᗮ(ù-"ù؛

Wҿ

45&'† ҿ

؛

Ϭ׋

W  (؛

W  (؛

W؛W؛

W-؛

ΠϏ

֎നஸᎯᅻ਺ᗮ(ù-"Ʉ؛ݬ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮ

ߖ਺ᗮहࡐྲྀᗮᅻ਺Ⴢᅻ਺በ਺ྲྀዐᗮዐ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮ ࣒ᒹᗮहᅻ਺ࡐዐനྲྀஸᗮࡐᗮ়നᅻ਺हዐ਺়ᗮஸሪࡐႼ௿ᗮᑌനዐ௿ᗮ ࡐᗮྲྀအ়਺ᗮ૧အᅻᗮ਺ᅶह௿ᗮႼအበനዐനအྲྀᗮࡐྲ়ྀᗮࡐྲྀᗮࡐᅻड़ᗮ૧ᅻအ༌ᗮႼအበനዐനအྲྀᗮĠ؛ዐအᗮႼအበനዐനအྲྀᗮ෼ᗮന૧ᗮࡐྲ়ྀᗮအྲྀเᒹᗮനଘᗮ

෼ᗮനበᗮനྲྀᗮ ֎നஸᎯᅻ਺ᗮ(ù- ؛በ௿အᑌበᗮዐ௿നበᗮஸᅻࡐ჎௿ᗮ૧အᅻᗮዐ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮအ૧ᗮ֎നஸȯᗮ(ù-"ù؛

צዐᗮበ௿အᎯเ়ᗮहအ༌਺ᗮࡐበᗮྲྀအᗮበᎯᅻႼᅻനበ਺ᗮዐ௿ࡐዐᗮዐ௿਺ᗮஸᅻࡐႼ௿ᗮ૧အᅻᗮ നበᗮࡐเ༌အበዐᗮࡐྲྀᗮڴ֎ӕᗮ ᑌനዐ௿အᎯዐᗮ ծƊዐᅻࡐྲྀበനዐനအྲྀበᗮ૧အᅻᗮዐ௿਺ᗮᎯྲ়ྀ਺ᅻเᒹനྲྀஸᗮራ਺ஸᎯเࡐᅻᗮ਺ᒏჂሬ਺በበനအྲྀýᗮࡐྲ়ྀᗮᑌအᎯเ়ᗮ࣒਺॑အ༌਺ᗮ အྲྀ਺ᗮന૧ᗮᑌ਺гᗮ

ù؛ڈࡐค਺ᗮࡐเเᗮႼအበനዐനအྲྀበᗮനྲྀᗮ֨ ה ؛အ૧ᗮዐ௿ਾᗮᅻအအዐᗮ࣒਺ᗮനྲྀനዐനࡐเᗮበዐࡐዐ਺በýᗮ

ù؛ ٟࡐ࣒਺เᗮ਺ࡐॕ௿ᗮࡐᅻहᗮ૧ᅻအ༌ᗮĠ؛ዐအᗮ෼ᗮ࣒ᒹᗮዕ௿਺ᗮበᒹ༌࣒အเᗮࡐዐᗮႼအበനዐനအྲྀᗮĠ ؛ࡐྲ়ྀᗮ

(10)

10

From Regular Expression to DFA Directly: followpos

for each node n in the tree do

if n is a cat-node with left child c1 and right child c2 then for each i in lastpos(c1) do

followpos(i) := followpos(i) ∪ firstpos(c2)

end do

else if n is a star-node

for each i in lastpos(n) do

followpos(i) := followpos(i) ∪ firstpos(n)

end do

end if end do

(11)

11

From Regular Expression to DFA Directly: Example

1,2,3  

start a 1,2,  

3,4   1,2,  

3,6   1,2,  

3,5  

b b

b b

a a

a

Node followpos

a 1 {1, 2, 3}

b 2 {1, 2, 3}

a 3 {4}

b 4 {5}

b 5 {6}

# 6 -

1  

2  

3   4   5   6  

(12)

12

From Regular Expression to DFA Directly: The Algorithm

s0 := firstpos(root) where root is the root of the syntax tree for (r)#

Dstates := {s0} and is unmarked

while there is an unmarked state T in Dstates do mark T

for each input symbol a

Σ

do

let U be the union of followpos(p) for all positions p in T

if U is not empty and not in Dstates then add U as an unmarked state to Dstates end if

Dtran[T, a] := U end do

end do

(13)

Minimizing  the  Number  of  States     of  a  DFA  

•  Given  a  DFA,  let  us  show  how  to  get  a  DFA  

which  accepts  the  same  regular  language  with   a  minimal  number  of  states  

13

A  

start

B   C  

D   E  

b b

b

b

b a a

a a

a

AC  

start

B   b D   b E  

b

a b

a

a a

(14)

14  

Equivalent  States:  Example  

Consider  the  accept  states  c  and  g.    They  are  both  sinks.  

Q:    Do  we  need  both  states?  

a  

b  

1  

d  

0,1  

e  

0,1  

1  

c  

0,1  

g   f  

0   0  

0  

0   1  

1  

(15)

15  

Equivalent  States:  Example  

A:    No,  they  can  be  merged!  

Q:    Can  any  other  states  be  merged  because  any  subsequent   string  suffixes  produce  iden@cal  results?  

a  

b  

1  

d   1   e   0,1  

0,1  

cg  

f   0  

0   0  

0   1  

1  

(16)

16  

Equivalent  States:  Example  

A:    Yes,  b  and  f.    No@ce  that  if  you're  in  b  or  f  then:  

1.  if  string  ends,  reject  in  both  cases  

2.  if  next  character  is  0,  forever  accept  in  both  cases   3.  if  next  character  is  1,  forever  reject  in  both  cases   So  merge  b  with  f.  

a  

b  

1  

d   1   e   0,1  

0,1  

cg  

f   0  

0   0  

0   1  

1  

(17)

17  

Equivalent  States:  Defini@on  

Intui@vely  two  states  are  equivalent    if  all  subsequent   behavior  from  those  states  is  the  same.  

 

a  

0,1  

d   1   e   0,1  

0,1  

cg  

bf   0  

0   1  

DEF:  Two  states  q  and  q' in  a  DFA    M  =  (Q,  Σ,  δ,  q0,  F  )  are  

equivalent  (or  indis-nguishable)  if  for  all  strings  u  ∈  Σ*,   the  states  on  which  u  ends  on  when  read  from  q  and  q'   are  both  accept,  or  both  non-­‐accept.  

(18)

18  

Finishing  the  Example  

Q:    Any  other  ways  to  simplify  the   automaton?  

a  

0,1  

d   1   e   0,1  

0,1  

cg  

B,f   0  

0   1  

(19)

19  

Useless  States  

A:    Get  rid  of  d.  

Gefng  rid  of  unreachable  useless  states   doesn't  affect  the  accepted  language.  

a  

0,1  

0,1   e  

0,1  

cg  

bf   0  

1  

(20)

20  

Minimiza@on  Algorithm:  Goals  

DEF:    An  automaton  is  irreducible  if    

–    it  contains  no  useless  states,  and  

–    no  two  dis@nct  states  are  equivalent.  

 The  goal  of  the  MinimizaAon  Algorithm  is  to  create  an   irreducible  automata  from  an  arbitrary  one,  accep@ng   the  same  language.  

 The  minimiza@on  algorithm  incrementally  builds  a   parAAon  of  the  states  of  the  given  DFA:  

•  It  starts  with  a  par@@on  separa@ng  just  accep@ng/non   accep@ng  states  

•  Next  it  splits  an  equivalence  class  if  it  contains  two  non   equivalent  states  

(21)

21  

Minimiza@on  Algorithm.  

(Par@@on  Refinement)  Code  

DFA  minimize(DFA  (Q,  S,  d,  q0,  F  )  )  

   remove  any  state  q  unreachable  from  q0        Par@@on  P  =  {F,  Q  -­‐  F  }    

   boolean  Consistent  =  false  

   while  (  Consistent  ==  false  )  Consistent  =  true          for(every  Set  S    P,  char  a    S,  Set  T    P  )  

       //  collect  states  of  T  that  reach  S  using  a                      Set    temp  =  {q    T  |  d(q,a)    S  }    

                   if  (temp  !=  Ø    &&  temp  !=  T  )                                Consistent  =  false  

                         P  =  (P  -­‐T  )∪{temp,T-­‐temp}  

   return  defineMinimizor(  (Q,  S,  d,  q0,  F  ),  P  )  

(22)

22  

Minimiza@on  Algorithm.  

(Par@@on  Refinement)  Code  

DFA  defineMinimizor  (DFA  (Q,  Σ,  δ,  q0,  F  ),  Par@@on  P  )      Set  Q'  =P    

   State  q'0  =  the  set  in  P    which  contains  q0      F'  =  {  S  ∈  P    |  S  ⊆  F  }  

   for  (each  S  ∈  P,  a  ∈  

Σ)

define  δ'  (S,a)  =  the  set  T  ∈  P    which  contains          the  states  δ'(S,a)  

   return  (Q',  

Σ

,  δ',  q'0,  F'  )  

(23)

23  

Minimiza@on  Algorithm:  Example    

Show  the  result  of  applying  the  minimiza@on  

algorithm  to  this  DFA  

(24)

24  

Proof  of  Minimal  Automaton  

Previous  algorithm  guaranteed  to  produce  an   irreducible  DFA.    Why  should  that  FA  be  the   smallest  possible  FA  for  its  accepted  

language?  

Analogous  ques@on  in  calculus:    Why  should  a  

local  minimum  be  a  global  minimum?    Usually  

not  the  case!  

(25)

25  

Proof  of  Minimal  Automaton  

THM  (Myhill-­‐Nerode):    The  minimizaCon  algorithm   produces  the  smallest  possible  automaton  for  its   accepted  language.  

Proof.    Show  that  any  irreducible  automaton  is  the   smallest  for  its  accepted  language  L:  

We  say  that  two  strings  u,v  ∈  Σ*  are  indis-nguishable   if  for  all  strings    x,          ux  ∈  L    ó    vx  ∈  L.  

No@ce  that  if  u    and  v    are  disAnguishable,  their  paths   from  the  start  state  must  have  different  endpoints.  

(26)

26  

Proof  of  Minimal  Automaton  

Consequently,  the  number  of  states  in  any  DFA  for  L   must  be  as  great  as  the  number  of  mutually  

dis@nguishable  strings  for  L.  

But  an  irreducible  DFA  has  the  property  that  every   state  gives  rise  to  another  mutually  dis@nguishable   string!  

Therefore,  any  other  DFA  must  have  at  least  as  many   states  as  the  irreducible  DFA                  

Let’s  see  how  the  proof  works  on  a  previous  example:  

(27)

27  

Proof  of  Minimal  Automaton:  Example  

The  “spanning  tree  of  strings”  {ε,0,01,00}  is  a   mutually  dis@nguishable  set  (otherwise  

redundancy  would  occur  and  hence  DFA   would  be  reducible).    Any  other  DFA  for  L   has  ≥4  states.  

a  

0,1  

0,1   e  

0,1  

cg  

bf   0  

1  

(28)

Exercises  on  Lexical  Analysis  

3.1.1  Divide  the  following  C++  program  into  appropriate   lexemes:  

 

float limitedSquare(x){float x;!

/* returns x-squared, but never more than 100 */!

return (x <= -10.0 || x >= 10.0) ? 100 : x*x;!

}!  

Which  lexemes  should  get  associated  lexical  values?    

What  should  those  values  be?  

 

28  

(29)

From  RE  to  Automata  and  backwards  

•  We  have  seen:  

–  RE  à  NFA  

–  NFA  à  DFA        [and  obviously    DFA  à  NFA]  

–  RE  à  DFA,  directly   –  DFA  à  minimal  DFA  

•  What  about        NFA,  DFA  à  RE?  More  difficult.  

Three  approaches  (not  presented):  

•  Dynamic  Programming  [Scov  Sec@on  2.4  on  CD][Hopcrow,   Motwani,  Ullman,  Sec@on  3.2.1]  

•  Incremental  state  elimina@on  [HMU,  Sec@on  3.2.2]  

•  RE  as  fixpoint  solu@on  of  system  of  language  equa@ons     [uses  right-­‐linear  grammars  for  Regular  Languages]    

29  

(30)

Exercises  on  Regular  Expressions  

3.3.2  Describe  the  languages  denoted  by  the  following  regular   expressions:  

b)  ((ε|a)b*)*  

c)  (a|b)*a(a|b)(a|b)    

3.3.5  Write  regular  defini@ons  for  the  following  languages:  

b)  All  strings  of  lowercase  levers  in  which  the  levers  are  in   ascending  lexicographic  order.  

c)  Comments,  consis@ng  of  a  string  surrounded  by  /*  and  */,   without  an  intervening  */,  unless  it  is  inside  double-­‐quotes  (”)   i)  All  strings  of  a's  and  b's  that  do  not  contain  the  subsequence   abb.  

 

30  

(31)

Exercises  with  Lex  or  Flex  

•  3.5.2  Write  a  Lex  program  that  copies  a  file,   replacing  each  non-­‐empty  sequence  of  white   spaces  by  a  single  blank.  

•  3.5.3  Write  a  Lex  program  that  copies  a  C   program,  replacing  each  instance  of  the   keyword  float  by  double.  

31  

(32)

Exercises  on  Finite  Automata  

•  3.6.2  Design  finite  automata  for  the  following   languages  (providing  both  the  transi@on  graph   and  the  transi@on  table):  

a)  All  strings  of  lowercase  levers  that  contain  the  five   vowels  in  order.  

d)  All  strings  of  digits  with  no  repeated  digits.  Hint:  Try   this  problem  first  with  a  few  digits,  such  as  {O,  1,  2}.  

f)  All  strings  of  a's  and  b's  with  an  even  number  of  a's   and  an  odd  number  of  b's.  

   

32  

(33)

Exercises:  from  RE  to  DFA  

3.7.3  Convert  the  following  regular  expressions  to  

determinis@c  finite  automata,  using  the  [McNaughton-­‐

Yamada-­‐]Thompson  algorithm  (3.23)  and  the  subset   construcCon  algorithm  (3.20):

 

a)  (a|b)*   b)  (a*|b*)  *   c)  ((ε|a)|b*)  *  

d)  (a|b)  *abb(a|b)  *  

33  

(34)

Exercises:  Minimizing  DFA  

•  3.9.3    Show  that  the  RE    

a)  (a|b)*   b)  (a*|b*)  *   c)  ((ε|a)|b*)  *  

are  equivalent  by  showing  that  their  minimum   state  DFA’s  are  isomorphic.  

34  

Riferimenti

Documenti correlati

•  Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens. •  When a nonterminal has multiple

•  An NFA accepts an input string x (over Σ ) if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some.. accepting state

–  Transla=ng short-circuit Boolean expressions and flow-of- control statements with backpatching lists.. How does the code

•  A backward analysis propagates informa@on in the opposite direc@on in which the program flows. •  A forward analysis propagates informa@on in the same direc@on in which

/* Returns a stream consisting of the results of replacing each element of this stream with the contents of a mapped stream produced by applying the provided mapping function to

•  LALR parsing (Look-Ahead LR) merges two or more LR(1) state into one state to reduce table size. •  Less powerful

§  The types of variables and parameters are not declared, and cannot be inferred by the Python compiler.. So run-time type checks are needed to detect

•  Each state is a mul=way branch based on the next input character.. Puxng the