• Non ci sono risultati.

Parsing Parsing

N/A
N/A
Protected

Academic year: 2021

Condividi "Parsing Parsing"

Copied!
92
0
0

Testo completo

(1)

Parsing Parsing

Giuseppe Attardi

Giuseppe Attardi

Università di Pisa

Università di Pisa

(2)

Parsing Parsing

To extract the grammatical structure of To extract the grammatical structure of

a sentence, where:

a sentence, where:

sentence = program sentence = program

words = tokens words = tokens

For further information:

Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a, the “Dragon Book”)

(3)

Outline of coverage Outline of coverage

Context-free grammars Context-free grammars

Parsing Parsing

– Tabular Parsing Methods – One pass

Top-down

Bottom-up

Yacc Yacc

(4)

Grammatical structure of program Grammatical structure of program

function-def

name arguments stmt-list

main stmt

expression operator

expression expression

variable string

cout

<<

“hello, world\n”

()

(5)

Context-free languages Context-free languages

Grammatical structure defined by context- Grammatical structure defined by context-

free grammar free grammar

statementstatement  assignmentassignment ; ; statement

statement  expressionexpression ; ; statement

statement  compound-statementcompound-statement assignment

assignment  ident =ident = expressionexpression compound-statement

compound-statement 



{{ declaration-list statement-list declaration-list statement-list }} Definition (Context-free). Grammar with only one

non-terminal symbol in left hand side of productions.

Definition (Context-free). Grammar with only one

non-terminal symbol in left hand side of productions.

Symbols terminal (token) non-terminal

(6)

Context Free Grammar Context Free Grammar

G G = ( = ( V V , ,   , , P P , , S S ) )

V is a finite set of non-terminal symbols V is a finite set of non-terminal symbols

  is a finite set of terminal symbols is a finite set of terminal symbols

P P is a finite set of productions is a finite set of productions   V V   ( ( V V  

 

S is the start symbol S is the start symbol

(7)

Parse trees Parse trees

Parse tree from Parse tree from A A ” = root labeled ” = root labeled with

with A A

Complete parse tree” = all leaves Complete parse tree” = all leaves labeled with tokens

labeled with tokens

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

11

... ... x x

nn

, then there , then there is a production

is a production

A A     x x

11

... ... x x

nn

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

11

... ... x x

nn

, then there , then there is a production

is a production

A A     x x

11

... ... x x

nn

(8)

Parse trees and sentences Parse trees and sentences

Frontier Frontier of tree = labels on leaves (in left- of tree = labels on leaves (in left- to-right order)

to-right order)

Frontier of tree from Frontier of tree from S S is a is a sentential form sentential form

L E a

L

; E

“Frontier”

Definition. A

Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from

tree from S.S.

Definition. A

Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from

tree from S.S.

(9)

Example Example

G G : L : L   L L ; ; E | E E | E E E   a a | | b b

Syntax trees from start symbol (L):

Syntax trees from start symbol (L):

a a;E a;b;b

L E a

L E a

L

; E L

E a

L

; E b

L

E b

;

Sentential forms:

Sentential forms:

(10)

Derivations Derivations

Alternate definition of

Alternate definition of sentential form sentential form : :

Given Given   , ,   in in V V *, say *, say   is a is a derivation derivation step

step if if     ’’ and ’’ and   = =     ’’ , where ’’ , where A A  

  is a production is a production

  is a is a sentential form sentential form iff there exists a iff there exists a derivation

derivation (sequence of derivation steps) (sequence of derivation steps) S S   (alternatively, we say that (alternatively, we say that S S  

  ) )

Two definitions are equivalent, but note that there Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree are many derivations corresponding to each parse tree Two definitions are equivalent, but note that there

Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree

are many derivations corresponding to each parse tree

(11)

Another example Another example

H H : L : L   E E ; L | E ; L | E E

E   a a | | b b

L E a

L E a L

E ; L

E b L

E ; a

L E b

;

(12)

Ambiguity Ambiguity

A sentence can have more than one parse A sentence can have more than one parse tree tree

A grammar is A grammar is ambiguous ambiguous if there is a if there is a sentence with more than one parse tree sentence with more than one parse tree

Example 1 Example 1

E  E+E | E*E | id

E

E

E E

E

id id

+

id

*

E

E

E E

id E

id id

+

*

(13)

Notes Notes

If e then if b then d else f If e then if b then d else f

{ int x; y = 0; } { int x; y = 0; }

a.b.c = d; a.b.c = d;

Id -> s | s.id Id -> s | s.id

E -> E + T -> E + T + T -> T + T + T -> id E -> E + T -> E + T + T -> T + T + T -> id + T + T -> id + T * id + T -> id + id * id + T + T -> id + T * id + T -> id + id * id

+ T ->

+ T ->

id + id * id + id

id + id * id + id

(14)

Ambiguity Ambiguity

Ambiguity is a feature of the Ambiguity is a feature of the

grammar rather than the language grammar rather than the language

Certain ambiguous grammars may Certain ambiguous grammars may

have equivalent unambiguous ones

have equivalent unambiguous ones

(15)

Grammar Grammar

Transformations Transformations

Grammars can be transformed Grammars can be transformed without affecting the language without affecting the language

generated generated

Three transformations are discussed Three transformations are discussed next:

next:

– Eliminating Ambiguity

– Eliminating Left Recursion (i.e.

productions of the form

AA

 )

– Left Factoring

(16)

Eliminating Ambiguity Eliminating Ambiguity

Sometimes an ambiguous grammar can Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity

be rewritten to eliminate ambiguity

For example, the grammar of Example 1 For example, the grammar of Example 1 can be written as follows:

can be written as follows:

E  E +T | T

T  E * id | id

The language generated by this grammar The language generated by this grammar is the same as that generated by the

is the same as that generated by the previous grammar. Both generate

previous grammar. Both generate id id (+ (+ id| id|

**

id id )* )*

However, this grammar is not ambiguous However, this grammar is not ambiguous

(17)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

One advantage of this grammar is One advantage of this grammar is that it represents the precedence that it represents the precedence

between operators. In the parsing between operators. In the parsing

tree, products appear nested within tree, products appear nested within

additions additions

E

T

T E

id

+

*

id T

id

(18)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

An example of ambiguity in a An example of ambiguity in a

programming language is the dangling programming language is the dangling else else

Consider Consider

S S



if if   then then S S else else S S | | if if   then then S S | |  

(19)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

When there are two nested ifs and When there are two nested ifs and only one else..

only one else..

S

ifif then S else S

if  then S

S

ifif then S

ifif then S else S

(20)

Eliminating Ambiguity Eliminating Ambiguity

(Cont.) (Cont.)

In most languages (including C++ and Java), In most languages (including C++ and Java), each

each else is assumed to belong to the else is assumed to belong to the nearest

nearest if if that is not already matched by an that is not already matched by an else else . .

This association is expressed in the following This association is expressed in the following (unambiguous) grammar:

(unambiguous) grammar:

S S MatchedMatched

| Unmatched| Unmatched

Matched Matched ifif thenthen Matched Matched else Matchedelse Matched

| |

Unmatched Unmatched ifif then Sthen S

||ifif then Matched then Matched else Unmatchedelse Unmatched

(21)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

Ambiguity is a property of the Ambiguity is a property of the grammar

grammar

It is undecidable whether a context It is undecidable whether a context free grammar is ambiguous

free grammar is ambiguous

The proof is done by reduction to The proof is done by reduction to Post’s correspondence problem Post’s correspondence problem

Although there is no general Although there is no general

algorithm, it is possible to isolate algorithm, it is possible to isolate certain constructs in productions certain constructs in productions

which lead to ambiguous grammars

which lead to ambiguous grammars

(22)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

For example, a grammar containing the production For example, a grammar containing the production

AAAA |AA | would be ambiguous, because the would be ambiguous, because the substring

substring  has two parses: has two parses:

A

A A

A

A

A A

A A

A

This ambiguity disappears if we use the productions This ambiguity disappears if we use the productions

AAAB |AB | BB and and BB or the productions or the productions

AABA |BA | BB and and BB..

(23)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

Examples of ambiguous productions: Examples of ambiguous productions:

AAA AA | A

AA | AA

A CF language is inherently ambiguous if A CF language is inherently ambiguous if it has no unambiguous CFG

it has no unambiguous CFG

– An example of such a language is

L = {aibjcm | i=j or j=m} which can be generated by the grammar:

SAB | DC

AaA | CcC | 

BbBc |  DaDb | 

(24)

Elimination of Left Recursion Elimination of Left Recursion

A grammar is left recursive if it has a A grammar is left recursive if it has a nonterminal

nonterminal A A and a derivation and a derivation A A  



A A   for for some string

some string  

Top-down parsing methods cannot handle left-

recursive grammars, so a transformation to eliminate left recursion is needed

Immediate left recursion (productions of the Immediate left recursion (productions of the form

form A A   A A   ) can be easily eliminated: ) can be easily eliminated:

1. Group the A-productions as

A A1 | A2 | … | Am | 1| 2 | … | n where no i begins with A

2. Replace the A-productions by

A 1A’ | 2A’ | … | nA’

A’ 1A’ |2A’| … | mA’ | 

(25)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)

The previous transformation, The previous transformation,

however, does not eliminate left however, does not eliminate left

recursion involving two or more steps recursion involving two or more steps

For example, consider the grammar For example, consider the grammar

S A

a |

b

A A

c |

Sd |

S is left-recursive because S is left-recursive because S S

  A A a a   S S da da   but it is not immediately left but it is not immediately left recursive

recursive

(26)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)

Algorithm. Left recursion elimination.

Algorithm. Left recursion elimination.

Arrange nonterminals in some order

Arrange nonterminals in some order AA11, , AA2 ,2 ,,…, ,…, AAnn for i = 1 to n {

for i = 1 to n {

for j = 1 to i - 1 { for j = 1 to i - 1 {

replace each production of the formreplace each production of the form Ai Aj

byby

Ai 1 | … | n

where where AAj j 1 1 |…| |…| n nare the current productions for are the current productions for AAjj }}

eliminate the immediate left recursion among the A eliminate the immediate left recursion among the Aii--

productions productions }}

(27)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)

Notice that iteration Notice that iteration ii only changes productions only changes productions with

with AAii on the left-hand side, and on the left-hand side, and AAjj with with jj > > ii in in the right-hand side

the right-hand side

Correctness induction proof: Correctness induction proof:

– Clearly true for i = 1

– If true for all i < k, then when the outer loop is executed for i = k, the inner loop will remove all productions Ai

Aj with j < i

– Finally, after the elimination of self recursion, m in any

AiAm productions will be > i

At the end of the algorithm, all derivations of the At the end of the algorithm, all derivations of the form

form AAi i AAmmwill have will have mm > > ii and therefore left and therefore left recursion will not be present

recursion will not be present

(28)

Left Factoring Left Factoring

Left factoring helps transform a grammar for Left factoring helps transform a grammar for predictive parsing

predictive parsing

For example, if we have the two productionsFor example, if we have the two productions S S ifif  thenthen SS elseelse SS

| | if if  thenthen SS

on seeing the input token

on seeing the input token if, we cannot if, we cannot

immediately tell which production to choose to immediately tell which production to choose to

expand expand SS

In general, if we have In general, if we have A A 1 1 ||22 and the input and the input begins with

begins with , we do not know, we do not know (without looking (without looking further) which production to use to expand

further) which production to use to expand AA

(29)

Left Factoring (Cont.) Left Factoring (Cont.)

However, we may defer the decision However, we may defer the decision by expanding A to

by expanding A to   A’ A’

Then after seeing the input derived Then after seeing the input derived from

from   , we may expand A’ to , we may expand A’ to  

1 1

or to or to

 

22

Left-factored, the original Left-factored, the original productions become

productions become

A A     A’ A’

A’ A’    

1 1

| |  

22

(30)

Non-Context-Free Language Non-Context-Free Language

Constructs Constructs

Examples of non-context-free languages are:Examples of non-context-free languages are:

– L1 = {wcw | w is of the form (a|b)*}

– L2 = {anbmcndm | n  1 and m  1 } – L3 = {anbncn | n 0 }

Languages similar to these that are context freeLanguages similar to these that are context free

– L'1 = {wcwR | w is of the form (a|b)*} (wR stands for w reversed)

This language is generated by the grammar

S aSa | bSb | c

– L'2 = {anbmcmdn | n 1 and m 1 }

This language is generated by the grammar

S aSd | aAd

A bAc | bc

(31)

Non-Context-Free Language Non-Context-Free Language

Constructs (Cont.) Constructs (Cont.)

L'' L''

2 2

= { = { a a

nn

b b

nn

c c

mm

d d

mm

| | n n   1 1 and and m m   1 1 } }

is generated by the grammar is generated by the grammar

S

AB

A

aAb | ab B

cBd | cd

L' L'

3 3

= { = { a a

nn

b b

nn

| | n n   1 1 } }

is generated by the grammar is generated by the grammar

S

aSb | ab

This language is not definable by any This language is not definable by any

regular expression

regular expression

(32)

CFG vs DFSA CFG vs DFSA

L‘ L‘

4 4

= { = { a a

nn

b b

mm

| | n n > 0, > 0, n n > 0} > 0}

0 1

b a

start b

(33)

Non-Context-Free Language Non-Context-Free Language

Constructs (Cont.) Constructs (Cont.)

Suppose we could construct a DFSM Suppose we could construct a DFSM DD accepting accepting L'L'3. 3.

DD must have a finite number of states, say must have a finite number of states, say kk. .

Consider the sequence of states Consider the sequence of states ss00, , ss11, , ss22, …, , …, sskk entered by

entered by DD having read having read , , aa, , aaaa, …, , …, aakk. .

Since Since DD only has only has kk states, two of the states in the states, two of the states in the sequence have to be equal. Say,

sequence have to be equal. Say, s sii  ssjj ( (ii  jj). ).

From From ssii, a sequence of , a sequence of ii bbs leads to an accepting s leads to an accepting (final) state. Therefore, the same sequence of

(final) state. Therefore, the same sequence of ii bs bs will also lead to an accepting state from

will also lead to an accepting state from ssjj. . Therefore

Therefore DD would accept would accept aajjbbii which means that which means that the language accepted by

the language accepted by DD is not identical to L’ is not identical to L’33. . A contradiction.

A contradiction.

(34)

Parsing Parsing

The parsing problem is: Given string of The parsing problem is: Given string of

tokens

tokens w w , find a parse tree whose frontier , find a parse tree whose frontier is is w w . (Equivalently, find a derivation from . (Equivalently, find a derivation from w w ) )

A A parser parser for a grammar for a grammar G G reads a list of reads a list of

tokens and finds a parse tree if they form tokens and finds a parse tree if they form a sentence (or reports an error otherwise) a sentence (or reports an error otherwise) Two classes of algorithms for parsing:

Two classes of algorithms for parsing:

– Top-down – Bottom-up

(35)

Parser generators Parser generators

A A parser generator parser generator is a program that reads is a program that reads a grammar and produces a parser

a grammar and produces a parser

The best known parser generator is The best known parser generator is yacc yacc It It produces bottom-up parsers

produces bottom-up parsers

Most parser generators - including yacc - Most parser generators - including yacc - do not work for every CFG; they accept a do not work for every CFG; they accept a

restricted class of CFG’s that can be restricted class of CFG’s that can be

parsed efficiently using the method parsed efficiently using the method

employed by that parser generator

employed by that parser generator

(36)

Top-down (predictive) Top-down (predictive)

parsing parsing

Starting from parse tree containing Starting from parse tree containing just

just S S , build tree down toward input. , build tree down toward input.

Expand left-most non-terminal.

Expand left-most non-terminal.

(37)

Top-down parsing algorithm Top-down parsing algorithm

Let input = a

Let input = a

11

a a

22

...a ...a

nn

current sentential form (csf) = current sentential form (csf) = S S

loop { loop {

suppose csf = a suppose csf = a

11

…a …a

kk

A A  

based on a based on a

kk+1+1

…, choose production …, choose production A A  

csf becomes a csf becomes a

11

…a …a

kk

  

} }

(38)

Top-down parsing example Top-down parsing example

Grammar:

Grammar: H H : L : L   E E ; ; L | E L | E E

E   a a | | b b Input:

Input: a;b a;b

Parse tree Sentential form

Parse tree Sentential form Input Input

L a;b

E;L a;b

L L E ;L

L E ;L a

a;L a;b

(39)

Top-down parsing example Top-down parsing example

(cont.) (cont.)

Parse tree Sentential form Input Parse tree Sentential form Input

a;E a;b

L E ;L a E

L E ;L a E b

a;b a;b

(40)

LL(1) parsing LL(1) parsing

Efficient form of top-down parsing Efficient form of top-down parsing

Use only first symbol of remaining Use only first symbol of remaining input (

input ( a a

kk+1+1

) to choose next ) to choose next

production. That is, employ a production. That is, employ a

function M:

function M:     N N   P in “choose P in “choose production” step of algorithm.

production” step of algorithm.

When this is possible, grammar is When this is possible, grammar is called LL(1)

called LL(1)

(41)

LL(1) examples LL(1) examples

Example 1: Example 1:

H: L E ; L | E E a | b

Given input a;b, so next symbol is a.

Which production to use? Can’t tell.

 H not LL(1)

(42)

LL(1) examples LL(1) examples

Example 2: Example 2:

Exp Term Exp’

Exp’  $ | + Exp Term id

(Use $ for “end-of-input” symbol.)

Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.

Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.

(43)

Nonrecursive predictive Nonrecursive predictive

parsing parsing

Maintain a stack explicitly, rather Maintain a stack explicitly, rather than implicitly via recursive calls than implicitly via recursive calls

Key problem during predictive Key problem during predictive

parsing: determining the production parsing: determining the production

to be applied for a non-terminal

to be applied for a non-terminal

(44)

Nonrecursive predictive parsing Nonrecursive predictive parsing

Algorithm.

Algorithm. Nonrecursive predictive parsingNonrecursive predictive parsing Set Set ipip to point to the first symbol of to point to the first symbol of ww$.$.

Push S onto the stack.

Push S onto the stack.

repeat repeat

Let XLet X be the top of the stack symbol and a the symbol pointed to by be the top of the stack symbol and a the symbol pointed to by ipip if if X is a terminal or $ X is a terminal or $ thenthen

ifif XX == == aa thenthen

pop Xpop X from the stack and advance from the stack and advance ipip else error()else error()

elseelse // // XX is a nonterminal is a nonterminal

if if M[M[X,a] == X,a] == XXYY11 Y Y22 … Y … Y kk thenthen pop Xpop X from the stack from the stack push Y

push YkkY Y k-1k-1, …, Y, …, Y11 onto the stack with Y onto the stack with Y11 on top on top (push nothing if

(push nothing if YY11 Y Y22 … Y … Y kk is is  ) ) output the production

output the production XXYY11 Y Y22 … Y … Y kk

else error()else error() until

until X == $ X == $

(45)

LL(1) grammars LL(1) grammars

No left recursion No left recursion

A  A : If this production is chosen, parse makes no progress.

No common prefixes No common prefixes

A  | 

Can fix by “left factoring”:

A  A’

’| 

(46)

LL(1) grammars (cont.) LL(1) grammars (cont.)

No ambiguity No ambiguity

Precise definition requires that

production to choose be unique

(“choose” function M very hard to

calculate otherwise)

(47)

LL(1) definition LL(1) definition

Define: Grammar G = (N, Define: Grammar G = (N, , P, S) is LL(1), P, S) is LL(1) iff whenever there iff whenever there are two left-most derivations

are two left-most derivations SS * * wAwA ww * * wtxwtx

SS * * wAwA ww * * wtywty it follows that

it follows that = =

Leftmost-derivation: where the leftmost non-terminal is Leftmost-derivation: where the leftmost non-terminal is

always expanded first always expanded first

In other words, given In other words, given 1. a string

1. a string wA wA in V* and in V* and

2. t, the first terminal symbol to be derived from 2. t, the first terminal symbol to be derived from AA there is at most one production that can be applied to

there is at most one production that can be applied to A toA to yield a derivation of any terminal string beginning with yield a derivation of any terminal string beginning with wtwt

(48)

Checking LL(1)-ness Checking LL(1)-ness

For any sequence of grammar symbols For any sequence of grammar symbols   , , define set FIRST(

define set FIRST(   ) )     to be to be FIRST(

FIRST(   ) = { ) = { a a | |     * * a a   for some for some   } }

FIRST sets can often be calculated by FIRST sets can often be calculated by inspection

inspection

(49)

FIRST Sets FIRST Sets

Exp  Term Exp’

Exp’  $ | + Exp Term id

(Use $ for “end-of-input” symbol)

FIRST($) = {$}

FIRST(

+

Exp) = {

+

}

FIRST($)  FIRST(

+

Exp) = {}

grammar is LL(1)

FIRST($) = {$}

FIRST(

+

Exp) = {

+

}

FIRST($)  FIRST(

+

Exp) = {}

grammar is LL(1)

(50)

FIRST Sets FIRST Sets

L E ; L | E E a | b

FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L)  FIRST(E)  {}

 grammar not LL(1).

FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L)  FIRST(E)  {}

 grammar not LL(1).

(51)

Computing FIRST Sets (no Computing FIRST Sets (no  

productions) productions)

Algorithm

Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX

forall

forall XX  V do FIRST( V do FIRST(XX) = {}) = {}

forall

forall XX    do FIRST( do FIRST(XX) = {) = {XX}} repeat

repeat

forall productions forall productions X X  YY11YY22 … Y … Ykk do do FIRST(

FIRST(XX) = FIRST() = FIRST(XX) )  FIRST( FIRST(YY11))

until no more elements are added to any FIRST set until no more elements are added to any FIRST set

(52)

Computing FIRST Sets (with Computing FIRST Sets (with  

productions) productions)

Algorithm

Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX

forall

forall XX  V do FIRST( V do FIRST(XX) = {}) = {}

forall

forall XX    do FIRST( do FIRST(XX) = {) = {XX}} forall productions

forall productions XX   do FIRST( do FIRST(XX) = FIRST() = FIRST(XX) )  { {}} repeat

repeat

forall productions forall productions X X  YY11YY22 … Y … Ykk do do for i = 1, k do

FIRST(

FIRST(XX) = FIRST() = FIRST(XX) )  (FIRST( (FIRST(YYii) - {) - {})}) if if   FIRST( FIRST(YYii) then break) then break

else if i == k then

FIRST(X) = FIRST(X)  {}

until no more elements are added to any FIRST set until no more elements are added to any FIRST set

(53)

FIRST Sets of Strings of Symbols FIRST Sets of Strings of Symbols

FIRST(X FIRST(X

11

X X

22

…X …X

nn

) is the union of ) is the union of FIRST(X

FIRST(X

11

) and all FIRST(X ) and all FIRST(X

ii

) such that ) such that  

  FIRST( FIRST( X X

kk

) for ) for k k = 1, 2, …, = 1, 2, …, i i -1 -1

FIRST(X FIRST(X

11

X X

22

…X …X

nn

) contains ) contains   iff iff     FIRST(

FIRST( X X

kk

) for k = 1, 2, …, ) for k = 1, 2, …, n n

(54)

FIRST Sets do not Suffice FIRST Sets do not Suffice

Given the productions Given the productions

A A   aT x aT x

A A   bT y bT y

T T   w w T T    

T T   w w should be applied when the next should be applied when the next input token is w.

input token is w.

T T     should be applied whenever the should be applied whenever the next terminal is either x or y

next terminal is either x or y

(55)

FOLLOW Sets FOLLOW Sets

For any nonterminal For any nonterminal X X , define the set , define the set FOLLOW(

FOLLOW( X X ) )     as as FOLLOW(

FOLLOW( X X ) = { ) = { a a | S | S   * *   X X a a   } }

(56)

Computing the FOLLOW Set Computing the FOLLOW Set

Algorithm. Compute FOLLOW(X) for all nonterminals Algorithm. Compute FOLLOW(X) for all nonterminals XX

FOLLOW(S) = {$}

FOLLOW(S) = {$}

forall productions A

forall productions A  BB do do

FOLLOW(B) = Follow(B) FOLLOW(B) = Follow(B)  (FIRST( (FIRST() - {) - {})}) repeat

repeat

forall productions A forall productions A  B or A B or A  BB with  with     FIRST(

FIRST() do) do

FOLLOW(B) = FOLLOW(B)

FOLLOW(B) = FOLLOW(B)  FOLLOW(A) FOLLOW(A) until all FOLLOW sets remain the same

until all FOLLOW sets remain the same

(57)

Another Definition of LL(1) Another Definition of LL(1)

Define: Grammar G is LL(1)

Define: Grammar G is LL(1) if for every if for every A A   N with productions A N with productions A  

11

 

nn

FIRST(

FIRST(  

i i

FOLLOW(A)) FOLLOW(A))   FIRST(

FIRST(  

j j

FOLLOW(A)) = {} for all FOLLOW(A)) = {} for all i i , , j j

(58)

Top-down Parsing Top-down Parsing

Input tokens: <t0,t1,…,ti,...>

L

E0 … En

Start symbol and root of parse tree Start symbol and root of parse tree

Input tokens: <ti,...> L E0 … En From left to right, ...

“grow” the parse tree downwards From left to right,

“grow” the parse tree downwards

(59)

Construction of a predictive parsing table Construction of a predictive parsing table

Algorithm

Algorithm. Construction of a predictive parsing . Construction of a predictive parsing table

table

M[,] = {}

M[,] = {}

forall productions A

forall productions A   do do forall a

forall a  FIRST( FIRST() do ) do M[A, a] = M[A, a]

M[A, a] = M[A, a]  {A  {A  }} if if   FIRST( FIRST() then ) then

forall b

forall b  FOLLOW(A) do FOLLOW(A) do M[A, b] = M[A, b]

M[A, b] = M[A, b]  {A  {A  }} Make all empty entries of M be error

Make all empty entries of M be error

(60)

Recursive Descent Recursive Descent

Parsing

Parsing

(61)

Recursive Descent Parser Recursive Descent Parser

stat

stat → var → var = = expr expr ; ; expr → term [

expr → term [ + + term] term]

term → factor [

term → factor [ * * factor] factor]

factor →

factor → ( ( expr expr ) ) | var | | var | constant | constant | call( fun, expr )

call( fun, expr ) var →

var → identifier identifier

(62)

Scanner Scanner

public class Scanner {

private StreamTokenizer input;

private Type lastToken;

public enum Type { INVALID_CHAR, NO_TOKEN , VARIABLE, PLUS, FALSE, TRUE,

// etc. for remaining tokens, then:

}; EOF

public Scanner (Reader r) {

input = new StreamTokenizer(r);

input.resetSyntax();

input.eolIsSignificant(false);

input.wordChars('a', 'z');

input.wordChars('A', 'Z');

input.ordinaryChar('+');

input.ordinaryChar('*');

input.ordinaryChar('=');

input.ordinaryChar('(');

input.ordinaryChar(')');

}

Riferimenti

Documenti correlati

Cfr. Fogazzaro, Piccolo mondo antico, cit., p.. Tiziana Piras, Reminiscenze e citazioni in “Piccolo mondo antico” 207. in Milano”) 48 che ricorda Jacopo “inferraiuolato sino

For the individual problem, input is erroneous haplotype data, from sequencing For the population problem, data is ambiguous genotype data, from screening.. OBJ is lead by

For the individual problem, input is erroneous haplotype data, from sequencing For the population problem, data is ambiguous genotype data, from screening.. OBJ is lead by

As soon as the main predicate or head is parsed, it makes available all lexical information in order to predict if possible the complement structure, or to guide the

In keeping with the indications laid down by the Recommendation, the Italian implementation plan determines that young people “should be introduced into the Guarantee

Quando sono usati come aggettivi possono essere seguiti sia da nomi sin go la ri che plurali, precedono il nome al quale sono riferiti e sono invariabili. Essi sono: some,

La forma negativa dei tempi semplici dei verbi inglesi non ausiliari si fa coniugando il ver- bo TO DO – DID – DONE, che in questo caso è ausiliare, al tempo previsto dalla frase,

Compare SLAC's large, high-power microwave generator (klystron - below) with this much smaller one (magnetron - right) from a typical microwave oven.. A klystron looks and