• Non ci sono risultati.

Principles  of  Programming  Languages

N/A
N/A
Protected

Academic year: 2022

Condividi "Principles  of  Programming  Languages"

Copied!
39
0
0

Testo completo

(1)

Principles  of  Programming  Languages  

h"p://www.di.unipi.it/~andrea/Dida2ca/PLP-­‐14/

 

Prof.  Andrea  Corradini  

Department  of  Computer  Science,  Pisa

 

•  Genera;on  of  Lexical  Analyzers    

Lesson 5!

(2)

Creating a Lexical Analyzer with Lex and Flex

lex  (or  flex)  

lex source program

lex.l lex.yy.c

input stream

C   compiler  

a.out   sequence

of tokens lex.yy.c

a.out

(3)

Lex  Specifica;on  

•  A lex specification consists of three parts:

regular definitions, C declarations in %{ %}

%%

translation rules %%

•  The translation rules are of the form:

p1 { action1 } p2 { action2 } …

pn { actionn }

(4)

Regular  Expressions  in  Lex  

x match the character x

\. match the character .

“string” match contents of string of characters . match any character except newline

^ match beginning of a line

$ match the end of a line

[xyz] match one character x, y, or z (use \ to escape -) [^xyz]match any character except x, y, and z

[a-z] match one of a to z

r* closure (match zero or more occurrences)

r+ positive closure (match one or more occurrences) r? optional (match zero or one occurrence)

r1r2 match r1 then r2 (concatenation) r1|r2 match r1 or r2 (union)

( r ) grouping

(5)

Example  Lex  Specifica;on  1  

%{

#include <stdio.h>

%}

%%

[0-9]+ { printf(“%s\n”, yytext); } .|\n { }

%%

main()

{ yylex();

}

Contains the matching

lexeme

Invokes the lexical

analyzer

lex spec.l

gcc lex.yy.c -ll ./a.out < spec.l

Translation rules

(6)

Example  Lex  Specifica;on  2  

%{

#include <stdio.h>

int ch = 0, wd = 0, nl = 0;

%}

delim [ \t]+

%%

\n { ch++; wd++; nl++; }

^{delim} { ch+=yyleng; }

{delim} { ch+=yyleng; wd++; } . { ch++; }

%%

main()

{ yylex();

printf("%8d%8d%8d\n", nl, wd, ch);

}

Regular definition Translation

rules

(7)

Example  Lex  Specifica;on  3  

%{

#include <stdio.h>

%}

digit [0-9]

letter [A-Za-z]

id {letter}({letter}|{digit})*

%%

{digit}+ { printf(“number: %s\n”, yytext); } {id} { printf(“ident: %s\n”, yytext); } . { printf(“other: %s\n”, yytext); }

%%

main()

{ yylex();

}

Regular definitions Translation

rules

(8)

Example  Lex  Specifica;on  4  

%{ /* definitions of manifest constants */

#define LT (256)

%}

delim [ \t\n]

ws {delim}+

letter [A-Za-z]

digit [0-9]

id {letter}({letter}|{digit})*

number {digit}+(\.{digit}+)?(E[+\-]?{digit}+)?

%%

{ws} { }

if {return IF;}

then {return THEN;}

else {return ELSE;}

{id} {yylval = install_id(); return ID;}

{number} {yylval = install_num(); return NUMBER;}

“<“ {yylval = LT; return RELOP;}

“<=“ {yylval = LE; return RELOP;}

“=“ {yylval = EQ; return RELOP;}

“<>“ {yylval = NE; return RELOP;}

“>“ {yylval = GT; return RELOP;}

“>=“ {yylval = GE; return RELOP;}

Return token to

parser Token

attribute

Install yytext as

(9)

Design  of  a  Lexical  Analyzer  Generator  

•  Translate regular expressions to NFA

•  Translate NFA to an efficient DFA

 regular  

expressions   NFA   DFA  

Simulate NFA to recognize

tokens

Simulate DFA to recognize

tokens Optional

(10)

Nondeterminis;c  Finite  Automata  

•  An NFA is a 5-tuple (S, Σ, δ, s

0

, F) where

S is a finite set of states

Σ is a finite set of symbols, the alphabet

δ is a mapping from S × Σ to a set of states s

0

∈ S is the start state

F ⊆ S is the set of accepting (or final) states

(11)

Transi;on  Graph  

•  An NFA can be diagrammatically represented by a labeled directed graph called a transition graph

0  

start a 1   b 2   b 3  

a

b

S  =  {0,1,2,3}  

Σ  =  {a,b}  

s0  =  0   F  =  {3}  

(12)

Transi;on  Table  

•  The mapping δ of an NFA can be represented in a transition table

State Input a

Input b 0 {0, 1} {0}

1 {2}

2 {3}

δ(0,a)  =  {0,1}  

δ(0,b)  =  {0}  

δ(1,b)  =  {2}  

δ(2,b)  =  {3}  

(13)

The  Language  Defined  by  an  NFA  

•  An NFA accepts an input string x (over Σ) if and only if there is some path with edges labeled with symbols from x in sequence from the start state to some

accepting state in the transition graph

•  A state transition from one state to another on the path is called a move

•  The language defined by an NFA is the set of input strings it accepts

•  What is the language accepted by the example NFA?

–  (a⏐b)*abb

(14)

Design  of  a  Lexical  Analyzer  Generator:  

RE  to  NFA  to  DFA  

s0  

N(p1)   N(p2)  

start ε   ε  

N(pn)  

ε  

p1 { action1 }

p2 { action2 }

pn { actionn }

action1 action2 actionn Lex specification with

regular expressions

NFA

Subset construction

(15)

15

N(r2)   N(r1)  

From  Regular  Expression  to  NFA   (Thompson’s  Construc;on)  

f   i   ε  

f   i   a

f  

i   N(r1)  

N(r2)  

start

start

start ε  

ε   ε  

ε  

f   i  

start

N(r)   f  

i  

start

ε  

ε   ε  

a

r1⏐r2

r1r2

r* ε   ε  

(16)

16  

0  

start 1   a 10  

2  

b

b a

b

3  

4   5  

6   7   8   9  

ε   ε   ε  

ε   ε  

ε   ε  

ε   (a⏐b)*abb  

 /  &+(/ #,"#/ $#-)")/

 

 

 

 

  

 

 

 

:7PH4X  / $)HM4X OH44X 5CHX  #



%O)O4X ?P>+4HMX 9)R4X +44?X 09CM4?X 5CHX 0C?M:MO4?0VX S:O9X S9)OX 5C<<CSM X CHX X S4X 0C@MOIP0OX



'4X 0)?X ?CSX 0C>,:?4X J1X )?2X J   X PM:?7X O94X 0C?MOHP0O:C?XC5X:7 X  /OCX C+O):?X O94X !X 5CHXX  X O9:MX "X :MX M9CS?X :?X:7 X/

:7PK4X/ !X 5CHXX

&94X!X5CHXX X :MXO94X M)>4X O9)OX5CHX  X &94X!X5CHXX #:MX O94?X )MXM9DSAX:?X 7X/(4X 9)R4X PM42X O94X 0C?MOHP0O:CBX:?X ;7X /OCX+P:<2X

An  example:  

RE  -­‐>  Parse  Tree  -­‐>  NFA  

(17)

17

Combining  the  NFAs  of  a  Set  of   Regular  Expressions  

2   1   a

start

6   3   a

start

4   b 5   b

8   7   b

start

a b

a { action1 }

abb { action2 } a*b+ { action3 }

2   1   a

6  

3   a 4   b 5   b

8   7   b

a b

0  

start

ε   ε   ε  

(18)

Simula;ng  the  Combined  NFA     Example  1  

2   1   a

6  

3   a 4   b 5   b

8   7   b

a b

0  

start

ε   ε   ε  

0 1 3

2 4 7

7 8

Must find the longest match:

action1

action2

action3

a a b a none

action3

(19)

Simula;ng  the  Combined  NFA     Example  2  

2   1   a

6  

3   a 4   b 5   b

8   7   b

a b

0  

start

ε   ε   ε  

0 1 3 7

2 4 7

5 8

6 8

When two or more accepting states are reached, the first action given in the Lex specification is executed

action1

action2

action3

a b b a none

action2 action3

(20)

Determinis;c  Finite  Automata  

•  A deterministic finite automaton is a special case of an NFA

–  No state has an ε-transition

–  For each state s and input symbol a there is at most one edge labeled a leaving s

•  Each entry in the transition table is a single state

–  At most one path exists to accept a string –  Simulation algorithm is simple

(21)

Example  DFA  

0  

start a 1   b 2   b 3  

b b

a

a

a

A  DFA  that  accepts  (a⏐b)*abb

(22)

Conversion  of  an  NFA  into  a  DFA  

•  The subset construction algorithm converts an NFA into a DFA using:

ε-closure(s) = {s} ∪ {t ⏐ s →ε … →ε t}

ε-closure(T) = ∪s∈T ε-closure(s)

move(T, a) = {t ⏐ s →a t and s ∈ T}

•  The algorithm produces:

Dstates is the set of states of the new DFA consisting of sets of states of the NFA

Dtran is the transition table of the new DFA

(23)

23

ε-­‐closure  and  move  Examples  

2   1   a

6  

3   a 4   b 5   b

8   7   b

a b

0  

start

ε   ε   ε  

ε-­‐closure({0})  =  {0,1,3,7}  

move({0,1,3,7},a)  =  {2,4,7}  

ε-­‐closure({2,4,7})  =  {2,4,7}  

move({2,4,7},a)  =  {7}  

ε-­‐closure({7})  =  {7}  

move({7},b)  =  {8}  

ε-­‐closure({8})  =  {8}  

move({8},a)  =  ∅  

0 1 3 7

2 4 7

7 8

a a b a none

Also used to simulate NFAs (!)

(24)

Simula;ng  an  NFA  using   ε-­‐closure  and  move  

S := ε-closure({s0}) Sprev := ∅

a := nextchar() while S ≠ ∅ do

Sprev := S

S := ε-closure(move(S,a)) a := nextchar()

end do

if Sprev ∩ F ≠ ∅ then

execute action in Sprev return “yes”

(25)

The  Subset  Construc;on  Algorithm:  

from  a  NFA  to  an  equivalent  DFA  

•  Initially, ε-closure(s0) is the only state in Dstates and it is unmarked

while there is an unmarked state T in Dstates do mark T

for each input symbol a

Σ

do

U := ε-closure(move(T,a)) if U is not in Dstates then

add U as an unmarked state to Dstates end if

Dtran[T, a] := U end do

end do

(26)

Subset  Construc;on  Example  1  

0  

start 1   a 10  

2  

b

b a

b

3  

4   5  

6   7   8   9  

ε   ε   ε  

ε   ε  

ε   ε  

ε  

A  

start

B   C  

D   E  

b b

b

b

b a

Dstates

A = {0,1,2,4,7}

B = {1,2,3,4,6,7,8}

C = {1,2,4,5,6,7}

a

(27)

27

Subset  Construc;on  Example  2  

2   1   a

6  

3   a 4   b 5   b

8   7   b

a b

0  

start

ε   ε   ε  

a1

a2

a3

Dstates

A = {0,1,3,7}

B = {2,4,7}

C = {8}

D = {7}

E = {5,8}

F = {6,8}

A  

start

a

D  

b b

b

b a

B   b

C  

E   F  

a

b

a1

a3

a3 a2 a3

(28)

Minimizing  the  Number  of  States  of  a   DFA  

A  

start

B   C  

D   E  

b b

b

b

b a a

a a

a

AC  

start

B   b D   b E  

b

a b

a

a a

(29)

From  Regular  Expression  to  DFA   Directly  

•  The “important states” of an NFA are those without an ε-transition, that is if

move({s}, a) ≠ ∅ for some a then s is an important state

•  The subset construction algorithm uses only the important states when it determines

ε-closure(move(T, a))

(30)

N(r2)   N(r1)  

What  are  the  “important  states”  in  the   NFA  built  from  Regular  Expression?  

f   i   ε  

f   i   a

f  

i   N(r1)  

N(r2)  

start

start

start ε  

ε   ε  

ε  

f   i  

start

N(r)   f  

i  

start

ε   ε  

a

r1⏐r2

r1r2

r* ε   ε  

(31)

From  Regular  Expression  to  DFA   Directly  (Algorithm)  

•  The only accepting state (via the Thompson algorithm) is not important

•  Augment the regular expression r with a

special end symbol # to make accepting states important: the new expression is r#

•  Construct a syntax tree for r#

•  Attach a unique integer to each node not

labeled by ε

(32)

From  Regular  Expression  to  DFA  

Directly:  Syntax  Tree  of       (a|b)*abb# !

*

|

a3

b4

b5

#6

concatenation

“cat-nodes”

closure

“star-node”

alternation

“or-node” position

number

(33)

From  Regular  Expression  to  DFA   Directly:  Annota;ng  the  Tree  

•  Traverse the tree to construct functions nullable, firstpos, lastpos, and followpos

•  For a node n, let L(n) be the language generated by the subtree with root n

•  nullable(n): L(n) contains the empty string ε

•  firstpos(n): set of positions under n that can match the first symbol of a string in L(n)

•  lastpos(n): the set of positions under n that can match the last symbol of a string in L(n)

•  followpos(i): the set of positions that can follow position i in the tree

(34)

From  Regular  Expression  to  DFA  

Annota;ng  the  Syntax  Tree  of   ( a|b)*abb#

{6}

{1, 2, 3}

{5}

{1, 2, 3}

{4}

{1, 2, 3}

{3}

{1, 2, 3}

{1, 2}

{1, 2} *

{1, 2}

{1, 2} |

{3}

{3} a

{4}

{4} b

{5}

{5} b

{6}

{6} #

nullable

3

4

5

6

(35)

35

From Regular Expression to DFA Directly: Annotating the Tree

Node n nullable(n) firstpos(n) lastpos(n)

Leaf ε true

Leaf i false {i} {i}

| / \ c1 c2

nullable(c1) or

nullable(c2)

firstpos(c1)

firstpos(c2)

lastpos(c1)

lastpos(c2)

/ \ c1 c2

nullable(c1) and

nullable(c2)

if nullable(c1) then firstpos(c1) ∪

firstpos(c2) else firstpos(c1)

if nullable(c2) then lastpos(c1) ∪

lastpos(c2) else lastpos(c2)

*

| c

true firstpos(c1) lastpos(c1)

(36)

From Regular Expression to DFA Directly: followpos

for each node n in the tree do

if n is a cat-node with left child c1 and right child c2 then for each i in lastpos(c1) do

followpos(i) := followpos(i) ∪ firstpos(c2) end do

else if n is a star-node

for each i in lastpos(n) do

followpos(i) := followpos(i) ∪ firstpos(n) end do

end if

(37)

37

From Regular Expression to DFA

followpos on the Syntax Tree of

(a|b)*abb#

{6}

{1, 2, 3}

{5}

{1, 2, 3}

{4}

{1, 2, 3}

{3}

{1, 2, 3}

{1, 2}

{1, 2} *

{1, 2}

{1, 2} |

{1}

{1} a {2} b{2}

{3}

{3} a

{4}

{4} b

{5}

{5} b

{6}

{6} #

nullable

1 2

3

4

5

6

<ċ

$  $ $

2Ð

  $ $  $ $$$ ǝҿ   $

2Ð

$  $ $ $$ $

1Ð

$  $   $ $$ $

2Ð!"#$%&

$$$ ^ $    $  $

$ $ dҿ

 $

2 Ç Ð

  $   $ !$ !$

֎നஸᎯᅻ਺ᗮ(ùŷѢ؛Ĝ  ؛ࡐྲ়ྀᗮΞ  ؛૧အᅻᗮྲྀအ়਺በᗮനྲྀᗮዐ௿਺ᗮበᒹྲྀዐࡐᒏᗮዐᅻ਺਺ᗮ૧အᅿᗮ$5Ȇg)«5ggƽ̓

ߖ਺ᗮ༌ᎯበዐᗮࡐเበအᗮࡐႼႼเᒹᗮᅻᎯเ਺ᗮዐအᗮዐ௿਺ᗮበዐࡐᅻŸྲྀအ়਺ȯᗮݬ௿ࡐዐᗮᅻᎯเ਺ᗮዐ਺เเበᗮᎯበᗮႼအበനዕനအྲྀበᗮĽ؛ࡐྲ়ྀᗮ

ࡐᅻਾᗮനྲྀᗮ࣒အዐ௿ᗮ ࡐྲ়ྀᗮ በനྲྀह਺ᗮ࣒အዐ௿ᗮĜ  ؛ࡐྲ়ྀᗮ  ؛૧အᅻᗮ ዐ௿നበᗮྲྀအ়਺ᗮࡐᅻ਺ᗮWĽ ؛ù؛ ݬ௿਺ᗮ׏အ༌Ⴜเ਺ዐ਺ᗮበੁዐበᗮ ࡐᅻ਺ᗮበᎯ༌༌ࡐᅻനᔎ਺়ᗮനྲྀᗮ֎നஸɉᗮ(ù-"ù؛

Wҿ

45&'† ҿ

؛

Ϭ׋

W  (؛

W  (؛

W؛W؛

W-؛

ΠϏ

֎നஸᎯᅻ਺ᗮ(ù-"Ʉ؛ݬ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮ

ߖ਺ᗮहࡐྲྀᗮᅻ਺Ⴢᅻ਺በ਺ྲྀዐᗮዐ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮ ࣒ᒹᗮहᅻ਺ࡐዐനྲྀஸᗮࡐᗮ়നᅻ਺हዐ਺়ᗮஸሪࡐႼ௿ᗮᑌനዐ௿ᗮ ࡐᗮྲྀအ়਺ᗮ૧အᅻᗮ਺ᅶह௿ᗮႼအበനዐനအྲྀᗮࡐྲ়ྀᗮࡐྲྀᗮࡐᅻड़ᗮ૧ᅻအ༌ᗮႼအበനዐനအྲྀᗮĠ؛ዐအᗮႼအበനዐനအྲྀᗮ෼ᗮന૧ᗮࡐྲ়ྀᗮအྲྀเᒹᗮനଘᗮ

෼ᗮനበᗮനྲྀᗮ ֎നஸᎯᅻ਺ᗮ(ù- ؛በ௿အᑌበᗮዐ௿നበᗮஸᅻࡐ჎௿ᗮ૧အᅻᗮዐ௿਺ᗮ૧Ꭿྲྀहዐനအྲྀᗮအ૧ᗮ֎നஸȯᗮ(ù-"ù؛

צዐᗮበ௿အᎯเ়ᗮहအ༌਺ᗮࡐበᗮྲྀအᗮበᎯᅻႼᅻനበ਺ᗮዐ௿ࡐዐᗮዐ௿਺ᗮஸᅻࡐႼ௿ᗮ૧အᅻᗮ നበᗮࡐเ༌အበዐᗮࡐྲྀᗮڴ֎ӕᗮ ᑌനዐ௿အᎯዐᗮ ծƊዐᅻࡐྲྀበനዐനအྲྀበᗮ૧အᅻᗮዐ௿਺ᗮᎯྲ়ྀ਺ᅻเᒹനྲྀஸᗮራ਺ஸᎯเࡐᅻᗮ਺ᒏჂሬ਺በበനအྲྀýᗮࡐྲ়ྀᗮᑌအᎯเ়ᗮ࣒਺॑အ༌਺ᗮ အྲྀ਺ᗮന૧ᗮᑌ਺гᗮ

ù؛ڈࡐค਺ᗮࡐเเᗮႼအበനዐനအྲྀበᗮനྲྀᗮ֨ ה ؛အ૧ᗮዐ௿ਾᗮᅻအအዐᗮ࣒਺ᗮനྲྀനዐനࡐเᗮበዐࡐዐ਺በýᗮ

ù؛ ٟࡐ࣒਺เᗮ਺ࡐॕ௿ᗮࡐᅻहᗮ૧ᅻအ༌ᗮĠ؛ዐအᗮ෼ᗮ࣒ᒹᗮዕ௿਺ᗮበᒹ༌࣒အเᗮࡐዐᗮႼအበനዐനအྲྀᗮĠ ؛ࡐྲ়ྀᗮ

(38)

From Regular Expression to DFA Directly: Algorithm

s0 := firstpos(root) where root is the root of the syntax tree for (r)#

Dstates := {s0} and is unmarked

while there is an unmarked state T in Dstates do mark T

for each input symbol a

Σ

do

let U be the union of followpos(p) for all positions p in T

if U is not empty and not in Dstates then add U as an unmarked state to Dstates end if

Dtran[T, a] := U end do

(39)

39

From Regular Expression to DFA Directly: Example

1,2,3  

start a 1,2,  

3,4   1,2,  

3,6   1,2,  

3,5  

b b

b b

a a

a

Node followpos

1 a {1, 2, 3}

2 b {1, 2, 3}

3 a {4}

4 b {5}

5 b {6}

6 # -

1  

2  

3   4   5   6  

Riferimenti

Documenti correlati

•  Rearranging expressions may lead to arithmetic overflow or different floating point results.. –  Assume b, d, and c are very large positive integers, then if b-c+d is

more operations on characters .... more operations on

–  Compile time: the time of translation of high-level constructs to machine code and choice of memory layout for data objects. –  Link time: the time at which multiple object

[r]

•  Every nonterminal has one (recursive) procedure responsible for parsing the nonterminal’s syntactic category of input tokens. •  When a nonterminal has multiple

–  Transla=ng short-circuit Boolean expressions and flow-of- control statements with backpatching lists.. How does the code

special end symbol # to make accepting states important: the new expression is r#. •  Construct a syntax tree

•  A backward analysis propagates informa@on in the opposite direc@on in which the program flows. •  A forward analysis propagates informa@on in the same direc@on in which