Parsing Parsing

(1)

Parsing Parsing

Giuseppe Attardi

Università di Pisa

(2)

Parsing Parsing

To extract the grammatical structure of To extract the grammatical structure of

a sentence, where:

sentence = program sentence = program

words = tokens words = tokens

For further information:

Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a, the “Dragon Book”)

(3)

Outline of coverage Outline of coverage



Context-free grammars Context-free grammars



Parsing Parsing

– Tabular Parsing Methods – One pass

• Top-down

• Bottom-up



Yacc Yacc

(4)

Grammatical structure of program Grammatical structure of program

function-def

name arguments stmt-list

main stmt

expression operator

expression expression

variable string

cout

<<

“hello, world\n”

()

(5)

Context-free languages Context-free languages

Grammatical structure defined by context- Grammatical structure defined by context-

free grammar free grammar

statementstatement  assignmentassignment ; ; statement

statement  expressionexpression ; ; statement

statement  compound-statementcompound-statement assignment

assignment  ident =ident = expressionexpression compound-statement

compound-statement 



{{ declaration-list statement-list declaration-list statement-list }} Definition (Context-free). Grammar with only one

non-terminal symbol in left hand side of productions.

Definition (Context-free). Grammar with only one

non-terminal symbol in left hand side of productions.

Symbols terminal (token) non-terminal

(6)

Context Free Grammar Context Free Grammar

G G = ( = ( V V , ,   , , P P , , S S ) )



V is a finite set of non-terminal symbols V is a finite set of non-terminal symbols



  is a finite set of terminal symbols is a finite set of terminal symbols



P P is a finite set of productions is a finite set of productions   V V   ( ( V V  

 

^^



S is the start symbol S is the start symbol

(7)

Parse trees Parse trees



“ “ Parse tree from Parse tree from A A ” = root labeled ” = root labeled with

with A A



“ “ Complete parse tree” = all leaves Complete parse tree” = all leaves labeled with tokens

labeled with tokens

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

₁₁

... ... x x

_n_n

, then there , then there is a production

is a production

A A     x x

₁₁

... ... x x

_n_n

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

₁₁

... ... x x

_n_n

, then there , then there is a production

is a production

A A     x x

₁₁

... ... x x

_n_n

(8)

Parse trees and sentences Parse trees and sentences



Frontier Frontier of tree = labels on leaves (in left- of tree = labels on leaves (in left- to-right order)

to-right order)



Frontier of tree from Frontier of tree from S S is a is a sentential form sentential form

L E a

L

; E

“Frontier”

Definition. A

Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from

tree from S.S.

Definition. A

Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from

tree from S.S.

(9)

Example Example

G G : L : L   L L ; ; E | E E | E E E   a a | | b b

Syntax trees from start symbol (L):

a a;E a;b;b

L E a

L

; E L

E a

L

; E b

L

E b

;

Sentential forms:

(10)

Derivations Derivations

Alternate definition of

Alternate definition of sentential form sentential form : :



Given Given   , ,   in in V V , say , say   is a is a derivation derivation step

step if if   ’ ’   ’’ and ’’ and   = =   ’ ’   ’’ , where ’’ , where A A  

  is a production is a production



  is a is a sentential form sentential form iff there exists a iff there exists a derivation

derivation (sequence of derivation steps) (sequence of derivation steps) S S   (alternatively, we say that (alternatively, we say that S S  

^^

  ) )

Two definitions are equivalent, but note that there Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree are many derivations corresponding to each parse tree Two definitions are equivalent, but note that there

Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree

(11)

Another example Another example

H H : L : L   E E ; L | E ; L | E E

E   a a | | b b

L E a

L E a L

E ; L

E b L

E ; a

L E b

;

(12)

Ambiguity Ambiguity



A sentence can have more than one parse A sentence can have more than one parse tree tree



A grammar is A grammar is ambiguous ambiguous if there is a if there is a sentence with more than one parse tree sentence with more than one parse tree



Example 1 Example 1

E ** E+E | E*E | id**

E

E E

E

id id

+

id

*

E

E E

id E

id id

+

*

(13)

Notes Notes



If e then if b then d else f If e then if b then d else f



{ int x; y = 0; } { int x; y = 0; }



a.b.c = d; a.b.c = d;



Id -> s | s.id Id -> s | s.id

E -> E + T -> E + T + T -> T + T + T -> id E -> E + T -> E + T + T -> T + T + T -> id **+ T + T -> id + T * id + T -> id + id * id + T + T -> id + T * id + T -> id + id * id**

+ T ->

**id + id * id + id**

(14)

Ambiguity Ambiguity



Ambiguity is a feature of the Ambiguity is a feature of the

grammar rather than the language grammar rather than the language



Certain ambiguous grammars may Certain ambiguous grammars may

have equivalent unambiguous ones

(15)

Grammar Grammar

Transformations Transformations



Grammars can be transformed Grammars can be transformed without affecting the language without affecting the language

generated generated



Three transformations are discussed Three transformations are discussed next:

– Eliminating Ambiguity

– Eliminating Left Recursion (i.e.

productions of the form

AA

 )

– Left Factoring

(16)

Eliminating Ambiguity Eliminating Ambiguity



Sometimes an ambiguous grammar can Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity

be rewritten to eliminate ambiguity



For example, the grammar of Example 1 For example, the grammar of Example 1 can be written as follows:

can be written as follows:

E  E +T | T

T  E * id | id



The language generated by this grammar The language generated by this grammar is the same as that generated by the

is the same as that generated by the previous grammar. Both generate

previous grammar. Both generate id id (+ (+ id| id|

**

id id )* )*



However, this grammar is not ambiguous However, this grammar is not ambiguous

(17)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)



One advantage of this grammar is One advantage of this grammar is that it represents the precedence that it represents the precedence

between operators. In the parsing between operators. In the parsing

tree, products appear nested within tree, products appear nested within

additions additions

E

T

T E

id

+

*

id T

id

(18)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)



An example of ambiguity in a An example of ambiguity in a

programming language is the dangling programming language is the dangling else else



Consider Consider

S S



if if   then then S S else else S S | | if if   then then S S | |  

(19)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)



When there are two nested ifs and When there are two nested ifs and only one else..

only one else..

S

ifif  then S else S

if  then S 

S 

ifif  then S

ifif  then S else S



(20)

Eliminating Ambiguity Eliminating Ambiguity

(Cont.) (Cont.)



In most languages (including C++ and Java), In most languages (including C++ and Java), each

each else is assumed to belong to the else is assumed to belong to the nearest

nearest if if that is not already matched by an that is not already matched by an else else . .



This association is expressed in the following This association is expressed in the following (unambiguous) grammar:

(unambiguous) grammar:

S S MatchedMatched

| Unmatched| Unmatched

Matched Matched _ ifif  thenthen Matched Matched else Matchedelse Matched

| | 

Unmatched Unmatched  _ ifif then Sthen S

||ifif   then Matched then Matched else Unmatchedelse Unmatched

(21)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)



Ambiguity is a property of the Ambiguity is a property of the grammar

grammar



It is undecidable whether a context It is undecidable whether a context free grammar is ambiguous

free grammar is ambiguous



The proof is done by reduction to The proof is done by reduction to Post’s correspondence problem Post’s correspondence problem



Although there is no general Although there is no general

algorithm, it is possible to isolate algorithm, it is possible to isolate certain constructs in productions certain constructs in productions

which lead to ambiguous grammars

(22)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)

 For example, a grammar containing the production For example, a grammar containing the production

AAAA |AA |  would be ambiguous, because the would be ambiguous, because the substring

substring  has two parses: has two parses:

A

A A

A

A A

A



 



 This ambiguity disappears if we use the productions This ambiguity disappears if we use the productions

AAAB |AB | BB and and BB or the productions or the productions

AABA |BA | BB and and BB..

(23)

Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)



Examples of ambiguous productions: Examples of ambiguous productions:

AAA AA | A

AA | AA



A CF language is inherently ambiguous if A CF language is inherently ambiguous if it has no unambiguous CFG

it has no unambiguous CFG

– An example of such a language is

L = {aⁱb^jc^m | i=j or j=m} which can be generated by the grammar:

SAB | DC

AaA | CcC | 

BbBc |  DaDb | 

(24)

Elimination of Left Recursion Elimination of Left Recursion



A grammar is left recursive if it has a A grammar is left recursive if it has a nonterminal

nonterminal A A and a derivation and a derivation A A  

^^

A A   for for some string

some string  

– Top-down parsing methods cannot handle left-

recursive grammars, so a transformation to eliminate left recursion is needed



Immediate left recursion (productions of the Immediate left recursion (productions of the form

form A A   A A   ) can be easily eliminated: ) can be easily eliminated:

1. Group the A-productions as

A A₁| ^A₂| … | ^A_m| ₁| ₂ | … | _n where no _i begins with A

2. Replace the A-productions by

A ₁Â’ | ₂Â’ | … | _nÂ’

A’  1A’ |2A’| … | mA’ | 

(25)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)



The previous transformation, The previous transformation,

however, does not eliminate left however, does not eliminate left

recursion involving two or more steps recursion involving two or more steps



For example, consider the grammar For example, consider the grammar

S A

a |

_b

A A

c |

Sd | 

S is left-recursive because S is left-recursive because S S

  A A a a   S S da da   but it is not immediately left but it is not immediately left recursive

recursive

(26)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)

Algorithm. Left recursion elimination.

Arrange nonterminals in some order

Arrange nonterminals in some order AA₁₁, , AA_{2 ,}_{2 ,},…, ,…, AA_n_n for i = 1 to n {

for i = 1 to n {

for j = 1 to i - 1 { for j = 1 to i - 1 {

replace each production of the formreplace each production of the form A_iAj 

byby

A_i1 | … | n 

where where AA_j_j1 ₁|…| |…| n _nare the current productions for are the current productions for AA_j_j }}

eliminate the immediate left recursion among the A eliminate the immediate left recursion among the A_i_i--

productions productions }}

(27)

Elimination of Left Recursion Elimination of Left Recursion

(Cont.) (Cont.)

 Notice that iteration Notice that iteration ii only changes productions only changes productions with

with AAi_i on the left-hand side, and on the left-hand side, and AA_j_j with with jj > > ii in in the right-hand side

the right-hand side

 Correctness induction proof: Correctness induction proof:

– Clearly true for i = 1

– If true for all i < k, then when the outer loop is executed for i = k, the inner loop will remove all productions Ai

Aj with j < i

– Finally, after the elimination of self recursion, m in any

AiAm productions will be > i

 At the end of the algorithm, all derivations of the At the end of the algorithm, all derivations of the form

form AA_i_i^^AA_m_mwill have will have mm > > ii and therefore left and therefore left recursion will not be present

recursion will not be present

(28)

Left Factoring Left Factoring

 Left factoring helps transform a grammar for Left factoring helps transform a grammar for predictive parsing

predictive parsing

 For example, if we have the two productionsFor example, if we have the two productions S S ifif  thenthen SS elseelse SS

| | if if  thenthen SS

on seeing the input token

on seeing the input token if, we cannot if, we cannot

immediately tell which production to choose to immediately tell which production to choose to

expand expand SS

 In general, if we have In general, if we have A A  ₁₁||₂₂ and the input and the input begins with

begins with , we do not know, we do not know(without looking (without looking further) which production to use to expand

further) which production to use to expand AA

(29)

Left Factoring (Cont.) Left Factoring (Cont.)



However, we may defer the decision However, we may defer the decision by expanding A to

by expanding A to   A’ A’



Then after seeing the input derived Then after seeing the input derived from

from   , we may expand A’ to , we may expand A’ to  

₁₁

or to or to

 

₂₂



Left-factored, the original Left-factored, the original productions become

productions become

A A     A’ A’

A’ A’    

₁₁

| |  

₂₂

(30)

Non-Context-Free Language Non-Context-Free Language

Constructs Constructs

 Examples of non-context-free languages are:Examples of non-context-free languages are:

– L₁= {wcw | w is of the form (a|b)*}

– L2 = {aⁿb^mcⁿd^m | n  1 and m  1 } – L₃= {aⁿbⁿcⁿ | n  0 }

 Languages similar to these that are context freeLanguages similar to these that are context free

– L'1 = {wcw^R | w is of the form (a|b)*} (w^Rstands for w reversed)

This language is generated by the grammar

S aSa | bSb | c

– L'₂= {aⁿb^mc^mdⁿ | n  1 and m 1 }

This language is generated by the grammar

S aSd | aAd

A bAc | bc

(31)

Non-Context-Free Language Non-Context-Free Language

Constructs (Cont.) Constructs (Cont.)



L'' L''

₂₂

= { = { a a

ⁿⁿ

b b

ⁿⁿ

c c

^m^m

d d

^m^m

| | n n   1 1 and and m m   1 1 } }

is generated by the grammar is generated by the grammar

S



AB

A



aAb | ab B



cBd | cd



L' L'

₃₃

= { = { a a

ⁿⁿ

b b

ⁿⁿ

| | n n   1 1 } }

is generated by the grammar is generated by the grammar

S



aSb | ab

This language is not definable by any This language is not definable by any

regular expression

(32)

CFG vs DFSA CFG vs DFSA



L‘ L‘

₄₄

= { = { a a

ⁿⁿ

b b

^m^m

| | n n > 0, > 0, n n > 0} > 0}

0 1

b a

start b

(33)

Non-Context-Free Language Non-Context-Free Language

Constructs (Cont.) Constructs (Cont.)

 Suppose we could construct a DFSM Suppose we could construct a DFSM DD accepting accepting L'L'_3._3.

 DD must have a finite number of states, say must have a finite number of states, say kk. .

 Consider the sequence of states Consider the sequence of states ss₀₀, , ss₁₁, , ss₂₂, …, , …, ss_k_k entered by

entered by DD having read having read , , aa, , aaaa, …, , …, aa^k^k. .

 Since Since DD only has only has kk states, two of the states in the states, two of the states in the sequence have to be equal. Say,

sequence have to be equal. Say, s s_i_i  ss_j_j ( (ii  jj). ).

 From From ss_i_i, a sequence of , a sequence of ii bbs leads to an accepting s leads to an accepting (final) state. Therefore, the same sequence of

(final) state. Therefore, the same sequence of ii bs bs will also lead to an accepting state from

will also lead to an accepting state from ss_j_j. . Therefore

Therefore DD would accept would accept aa^j^jbbⁱⁱ which means that which means that the language accepted by

the language accepted by DD is not identical to L’ is not identical to L’₃₃. . A contradiction.

A contradiction.

(34)

Parsing Parsing

The parsing problem is: Given string of The parsing problem is: Given string of

tokens

tokens w w , find a parse tree whose frontier , find a parse tree whose frontier is is w w . (Equivalently, find a derivation from . (Equivalently, find a derivation from w w ) )

A A parser parser for a grammar for a grammar G G reads a list of reads a list of

tokens and finds a parse tree if they form tokens and finds a parse tree if they form a sentence (or reports an error otherwise) a sentence (or reports an error otherwise) Two classes of algorithms for parsing:

Two classes of algorithms for parsing:

– Top-down – Bottom-up

(35)

Parser generators Parser generators



A A parser generator parser generator is a program that reads is a program that reads a grammar and produces a parser

a grammar and produces a parser



The best known parser generator is The best known parser generator is yacc yacc It It produces bottom-up parsers

produces bottom-up parsers



Most parser generators - including yacc - Most parser generators - including yacc - do not work for every CFG; they accept a do not work for every CFG; they accept a

restricted class of CFG’s that can be restricted class of CFG’s that can be

parsed efficiently using the method parsed efficiently using the method

employed by that parser generator

(36)

Top-down (predictive) Top-down (predictive)

parsing parsing



Starting from parse tree containing Starting from parse tree containing just

just S S , build tree down toward input. , build tree down toward input.

Expand left-most non-terminal.

(37)

Top-down parsing algorithm Top-down parsing algorithm

Let input = a

₁₁

a a

₂₂

...a ...a

_n_n

current sentential form (csf) = current sentential form (csf) = S S

loop { loop {

suppose csf = a suppose csf = a

₁₁

…a …a

_k_k

A A  

based on a based on a

_k_k₊₁₊₁

…, choose production …, choose production A A  

csf becomes a csf becomes a

₁₁

…a …a

_k_k

   _

} }

(38)

Top-down parsing example Top-down parsing example

Grammar:

Grammar: H H : L : L   E E ; ; L | E L | E E

E   a a | | b b Input:

Input: a;b a;b

Parse tree Sentential form

Parse tree Sentential form Input Input

L a;b

E;L a;b

L L E ;L

L E ;L a

a;L a;b

(39)

Top-down parsing example Top-down parsing example

(cont.) (cont.)

Parse tree Sentential form Input Parse tree Sentential form Input

a;E a;b

L E ;L a E

L E ;L a E b

a;b a;b

(40)

LL(1) parsing LL(1) parsing



Efficient form of top-down parsing Efficient form of top-down parsing



Use only first symbol of remaining Use only first symbol of remaining input (

input ( a a

_k_k₊₁₊₁

) to choose next ) to choose next

production. That is, employ a production. That is, employ a

function M:

function M:     N N   P in “choose P in “choose production” step of algorithm.

production” step of algorithm.



When this is possible, grammar is When this is possible, grammar is called LL(1)

called LL(1)

(41)

LL(1) examples LL(1) examples



Example 1: Example 1:

H: L E ; L | E E a | b

Given input a;b, so next symbol is a.

Which production to use? Can’t tell.

 H not LL(1)

(42)

LL(1) examples LL(1) examples



Example 2: Example 2:

Exp Term Exp’

Exp’  $ | + Exp Term id

(Use $ for “end-of-input” symbol.)

Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.

(43)

Nonrecursive predictive Nonrecursive predictive

parsing parsing



Maintain a stack explicitly, rather Maintain a stack explicitly, rather than implicitly via recursive calls than implicitly via recursive calls



Key problem during predictive Key problem during predictive

parsing: determining the production parsing: determining the production

to be applied for a non-terminal

(44)

Nonrecursive predictive parsing Nonrecursive predictive parsing

Algorithm.

Algorithm. Nonrecursive predictive parsingNonrecursive predictive parsing Set Set ipip to point to the first symbol of to point to the first symbol of ww$.$.

Push S onto the stack.

repeat repeat

Let XLet X be the top of the stack symbol and a the symbol pointed to by be the top of the stack symbol and a the symbol pointed to by ipip if if X is a terminal or $ X is a terminal or $ thenthen

ifif XX == == aa thenthen

pop Xpop X from the stack and advance from the stack and advance ipip else error()else error()

elseelse // // XX is a nonterminal is a nonterminal

if if M[M[X,a] == X,a] == XXYY11 Y Y22 … Y … Y kk thenthen pop Xpop X from the stack from the stack push Y

push Yk_kY Y k-1_k-1, …, Y, …, Y1₁ onto the stack with Y onto the stack with Y1₁ on top on top (push nothing if

(push nothing if YY1₁ Y Y2₂ … Y … Y _kk is is  ) ) output the production

output the production XXYY₁1 Y Y2₂ … Y … Y k_k

else error()else error() until

until X == $ X == $

(45)

LL(1) grammars LL(1) grammars



No left recursion No left recursion

A  A : If this production is chosen, parse makes no progress.



No common prefixes No common prefixes

A  | 

Can fix by “left factoring”:

A  A’

’| 

(46)

LL(1) grammars (cont.) LL(1) grammars (cont.)



No ambiguity No ambiguity

Precise definition requires that

production to choose be unique

(“choose” function M very hard to

calculate otherwise)

(47)

LL(1) definition LL(1) definition

 Define: Grammar G = (N, Define: Grammar G = (N, , P, S) is LL(1), P, S) is LL(1) iff whenever there iff whenever there are two left-most derivations

are two left-most derivations SS * * wAwA   ww * * wtxwtx

SS * * wAwA  _ ww * _* wtywty it follows that

it follows that  = =

Leftmost-derivation: where the leftmost non-terminal is Leftmost-derivation: where the leftmost non-terminal is

always expanded first always expanded first

 In other words, given In other words, given 1. a string

1. a string wA wA in V* and in V* and

2. t, the first terminal symbol to be derived from 2. t, the first terminal symbol to be derived from AA there is at most one production that can be applied to

there is at most one production that can be applied to A toA to yield a derivation of any terminal string beginning with yield a derivation of any terminal string beginning with wtwt

(48)

Checking LL(1)-ness Checking LL(1)-ness



For any sequence of grammar symbols For any sequence of grammar symbols   , , define set FIRST(

define set FIRST(   ) )     to be to be FIRST(

FIRST(   ) = { ) = { a a | |     * * a a   for some for some   } }



FIRST sets can often be calculated by FIRST sets can often be calculated by inspection

inspection

(49)

FIRST Sets FIRST Sets

Exp  Term Exp’

Exp’  $ | + Exp Term id

(Use $ for “end-of-input” symbol)

FIRST($) = {$}

FIRST(

+

^{Exp) = {}

+

^}

FIRST($)  FIRST(

+

^{Exp) = {}}

 grammar is LL(1)

FIRST($) = {$}

FIRST(

+

^{Exp) = {}

+

^}

FIRST($)  FIRST(

+

^{Exp) = {}}

 grammar is LL(1)

(50)

FIRST Sets FIRST Sets

L E ; L | E E a | b

FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L)  FIRST(E)  {}

 grammar not LL(1).

FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L)  FIRST(E)  {}

 grammar not LL(1).

(51)

Computing FIRST Sets (no Computing FIRST Sets (no  

productions) productions)

Algorithm

Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX

forall

forall XX  V do FIRST( V do FIRST(XX) = {}) = {}

forall

forall XX    do FIRST( do FIRST(XX) = {) = {XX}} repeat

repeat

forall productions forall productions X X  YY₁₁YY₂₂ … Y … Y_k_k do do FIRST(

FIRST(XX) = FIRST() = FIRST(XX) )  FIRST( FIRST(YY₁₁))

until no more elements are added to any FIRST set until no more elements are added to any FIRST set

(52)

Computing FIRST Sets (with Computing FIRST Sets (with  

productions) productions)

Algorithm

Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX

forall

forall XX  V do FIRST( V do FIRST(XX) = {}) = {}

forall

forall XX    do FIRST( do FIRST(XX) = {) = {XX}} forall productions

forall productions XX   do FIRST( do FIRST(XX) = FIRST() = FIRST(XX) )  { {}} repeat

repeat

forall productions forall productions X X  YY₁₁YY₂₂ … Y … Y_k_k do do for i = 1, k do

FIRST(

FIRST(XX) = FIRST() = FIRST(XX) )  (FIRST( (FIRST(YY_i_i) - {) - {})}) if if   FIRST( FIRST(YY_i_i) then break) then break

else if i == k then

FIRST(X) = FIRST(X)  {}

until no more elements are added to any FIRST set until no more elements are added to any FIRST set

(53)

FIRST Sets of Strings of Symbols FIRST Sets of Strings of Symbols



FIRST(X FIRST(X

₁₁

X X

₂₂

…X …X

_n_n

) is the union of ) is the union of FIRST(X

FIRST(X

₁₁

) and all FIRST(X ) and all FIRST(X

_i_i

) such that ) such that  

  FIRST( FIRST( X X

_k_k

) for ) for k k = 1, 2, …, = 1, 2, …, i i -1 -1



FIRST(X FIRST(X

₁₁

X X

₂₂

…X …X

_n_n

) contains ) contains   iff iff     FIRST(

FIRST( X X

_k_k

) for k = 1, 2, …, ) for k = 1, 2, …, n n

(54)

FIRST Sets do not Suffice FIRST Sets do not Suffice



Given the productions Given the productions

A A   aT x aT x

A A   bT y bT y

T T   w w T T    



T T   w w should be applied when the next should be applied when the next input token is w.

input token is w.



T T     should be applied whenever the should be applied whenever the next terminal is either x or y

next terminal is either x or y

(55)

FOLLOW Sets FOLLOW Sets



For any nonterminal For any nonterminal X X , define the set , define the set FOLLOW(

FOLLOW( X X ) )     as as FOLLOW(

FOLLOW( X X ) = { ) = { a a | S | S   * *   X X a a   } }

(56)

Computing the FOLLOW Set Computing the FOLLOW Set

Algorithm. Compute FOLLOW(X) for all nonterminals Algorithm. Compute FOLLOW(X) for all nonterminals XX

FOLLOW(S) = {$}

forall productions A

forall productions A  BB do do

FOLLOW(B) = Follow(B) FOLLOW(B) = Follow(B)  (FIRST( (FIRST() - {) - {})}) repeat

repeat

forall productions A forall productions A  B or A B or A  BB with  with     FIRST(

FIRST() do) do

FOLLOW(B) = FOLLOW(B)

FOLLOW(B) = FOLLOW(B)  FOLLOW(A) FOLLOW(A) until all FOLLOW sets remain the same

until all FOLLOW sets remain the same

(57)

Another Definition of LL(1) Another Definition of LL(1)

Define: Grammar G is LL(1)

Define: Grammar G is LL(1) if for every if for every A A   N with productions A N with productions A  

₁₁

 

_n_n

FIRST(

FIRST(  

_i_i

FOLLOW(A)) FOLLOW(A))   FIRST(

FIRST(  

_j_j

FOLLOW(A)) = {} for all FOLLOW(A)) = {} for all i i , , j j

(58)

Top-down Parsing Top-down Parsing

Input tokens: <t₀,t₁,…,t_i,...>

L

E₀ … E_n

Start symbol and root of parse tree Start symbol and root of parse tree

Input tokens: <t_i,...> L E₀ … E_n From left to right, ...

“grow” the parse tree downwards From left to right,

“grow” the parse tree downwards

(59)

Construction of a predictive parsing table Construction of a predictive parsing table

Algorithm

Algorithm. Construction of a predictive parsing . Construction of a predictive parsing table

table

M[,] = {}

forall productions A

forall productions A   do do forall a

forall a  FIRST( FIRST() do ) do M[A, a] = M[A, a]

M[A, a] = M[A, a]  {A  {A  }} if if   FIRST( FIRST() then ) then

forall b

forall b  FOLLOW(A) do FOLLOW(A) do M[A, b] = M[A, b]

M[A, b] = M[A, b]  {A  {A  }} Make all empty entries of M be error

Make all empty entries of M be error

(60)

Recursive Descent Recursive Descent

Parsing

(61)

Recursive Descent Parser Recursive Descent Parser

stat

stat → var → var = = expr expr ; ; expr → term [

expr → term [ + + term] term]

term → factor [

term → factor [ * * factor] factor]

factor →

factor → ( ( expr expr ) ) | var | | var | constant | constant | call( fun, expr )

call( fun, expr ) var →

var → identifier identifier

(62)

Scanner Scanner

public class Scanner {

private StreamTokenizer input;

private Type lastToken;

public enum Type { INVALID_CHAR, NO_TOKEN , VARIABLE, PLUS, FALSE, TRUE,

// etc. for remaining tokens, then:

}; EOF

public Scanner (Reader r) {

input = new StreamTokenizer(r);

input.resetSyntax();

input.eolIsSignificant(false);

input.wordChars('a', 'z');

input.wordChars('A', 'Z');

input.ordinaryChar('+');

input.ordinaryChar('*');

input.ordinaryChar('=');

input.ordinaryChar('(');

input.ordinaryChar(')');

}

Parsing Parsing

Parsing Parsing

Giuseppe Attardi

Giuseppe Attardi

Università di Pisa

Università di Pisa

Parsing Parsing

To extract the grammatical structure of To extract the grammatical structure of

a sentence, where:

a sentence, where:

sentence = program sentence = program

words = tokens words = tokens

Outline of coverage Outline of coverage

Context-free grammars Context-free grammars

Parsing Parsing

– Tabular Parsing Methods – One pass

Yacc Yacc

Grammatical structure of program Grammatical structure of program

Context-free languages Context-free languages

Grammatical structure defined by context- Grammatical structure defined by context-

free grammar free grammar

Context Free Grammar Context Free Grammar

G G = ( = ( V V , ,   , , P P , , S S ) )

V is a finite set of non-terminal symbols V is a finite set of non-terminal symbols

  is a finite set of terminal symbols is a finite set of terminal symbols

P P is a finite set of productions is a finite set of productions   V V   ( ( V V  

 

S is the start symbol S is the start symbol

Parse trees Parse trees

“ “ Parse tree from Parse tree from A A ” = root labeled ” = root labeled with

with A A

“ “ Complete parse tree” = all leaves Complete parse tree” = all leaves labeled with tokens

labeled with tokens

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

... ... x x

, then there , then there is a production

is a production

A A     x x

... ... x x

Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar

symbols, such that:

symbols, such that:

if node is labeled

if node is labeled A A and its and its children are labeled

children are labeled x x

... ... x x

, then there , then there is a production

is a production

A A     x x

... ... x x

Parse trees and sentences Parse trees and sentences

Frontier Frontier of tree = labels on leaves (in left- of tree = labels on leaves (in left- to-right order)

to-right order)

Frontier of tree from Frontier of tree from S S is a is a sentential form sentential form

Example Example

G G : L : L   L L ; ; E | E E | E E E   a a | | b b

Syntax trees from start symbol (L):

Syntax trees from start symbol (L):

Derivations Derivations

Alternate definition of

Alternate definition of sentential form sentential form : :

Given Given   , ,   in in V V *, say *, say   is a is a derivation derivation step

step if if   ’ ’   ’’ and ’’ and   = =   ’ ’   ’’ , where ’’ , where A A  

  is a production is a production

  is a is a sentential form sentential form iff there exists a iff there exists a derivation

derivation (sequence of derivation steps) (sequence of derivation steps) S S   (alternatively, we say that (alternatively, we say that S S  

  ) )

Two definitions are equivalent, but note that there Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree are many derivations corresponding to each parse tree Two definitions are equivalent, but note that there

Two definitions are equivalent, but note that there

are many derivations corresponding to each parse tree

are many derivations corresponding to each parse tree

Another example Another example

H H : L : L   E E ; L | E ; L | E E

E   a a | | b b

Given Given   , ,   in in V V , say , say   is a is a derivation derivation step

E ** E+E | E*E | id**

E -> E + T -> E + T + T -> T + T + T -> id E -> E + T -> E + T + T -> T + T + T -> id **+ T + T -> id + T * id + T -> id + id * id + T + T -> id + T * id + T -> id + id * id**

**id + id * id + id**

**id + id * id + id**