Parsing Parsing
Giuseppe Attardi
Giuseppe Attardi
Università di Pisa
Università di Pisa
Parsing Parsing
To extract the grammatical structure of To extract the grammatical structure of
a sentence, where:
a sentence, where:
sentence = program sentence = program
words = tokens words = tokens
For further information:
Aho, Sethi, Ullman, “Compilers: Principles, Techniques, and Tools” (a.k.a, the “Dragon Book”)
Outline of coverage Outline of coverage
Context-free grammars Context-free grammars
Parsing Parsing
– Tabular Parsing Methods – One pass
• Top-down
• Bottom-up
Yacc Yacc
Grammatical structure of program Grammatical structure of program
function-def
name arguments stmt-list
main stmt
expression operator
expression expression
variable string
cout
<<
“hello, world\n”
()
Context-free languages Context-free languages
Grammatical structure defined by context- Grammatical structure defined by context-
free grammar free grammar
statementstatement assignmentassignment ; ; statement
statement expressionexpression ; ; statement
statement compound-statementcompound-statement assignment
assignment ident =ident = expressionexpression compound-statement
compound-statement
{{ declaration-list statement-list declaration-list statement-list }} Definition (Context-free). Grammar with only one
non-terminal symbol in left hand side of productions.
Definition (Context-free). Grammar with only one
non-terminal symbol in left hand side of productions.
Symbols terminal (token) non-terminal
Context Free Grammar Context Free Grammar
G G = ( = ( V V , , , , P P , , S S ) )
V is a finite set of non-terminal symbols V is a finite set of non-terminal symbols
is a finite set of terminal symbols is a finite set of terminal symbols
P P is a finite set of productions is a finite set of productions V V ( ( V V
S is the start symbol S is the start symbol
Parse trees Parse trees
“ “ Parse tree from Parse tree from A A ” = root labeled ” = root labeled with
with A A
“ “ Complete parse tree” = all leaves Complete parse tree” = all leaves labeled with tokens
labeled with tokens
Parse tree: tree labeled with grammar Parse tree: tree labeled with grammar
symbols, such that:
symbols, such that:
if node is labeled
if node is labeled A A and its and its children are labeled
children are labeled x x
11... ... x x
nn, then there , then there is a production
is a production
A A x x
11... ... x x
nnParse tree: tree labeled with grammar Parse tree: tree labeled with grammar
symbols, such that:
symbols, such that:
if node is labeled
if node is labeled A A and its and its children are labeled
children are labeled x x
11... ... x x
nn, then there , then there is a production
is a production
A A x x
11... ... x x
nnParse trees and sentences Parse trees and sentences
Frontier Frontier of tree = labels on leaves (in left- of tree = labels on leaves (in left- to-right order)
to-right order)
Frontier of tree from Frontier of tree from S S is a is a sentential form sentential form
L E a
L
; E
“Frontier”
Definition. A
Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from
tree from S.S.
Definition. A
Definition. A sentence sentence is the frontier of a complete is the frontier of a complete tree from
tree from S.S.
Example Example
G G : L : L L L ; ; E | E E | E E E a a | | b b
Syntax trees from start symbol (L):
Syntax trees from start symbol (L):
a a;E a;b;b
L E a
L E a
L
; E L
E a
L
; E b
L
E b
;
Sentential forms:
Sentential forms:
Derivations Derivations
Alternate definition of
Alternate definition of sentential form sentential form : :
Given Given , , in in V V *, say *, say is a is a derivation derivation step
step if if ’ ’ ’’ and ’’ and = = ’ ’ ’’ , where ’’ , where A A
is a production is a production
is a is a sentential form sentential form iff there exists a iff there exists a derivation
derivation (sequence of derivation steps) (sequence of derivation steps) S S (alternatively, we say that (alternatively, we say that S S
) )
Two definitions are equivalent, but note that there Two definitions are equivalent, but note that there
are many derivations corresponding to each parse tree are many derivations corresponding to each parse tree Two definitions are equivalent, but note that there
Two definitions are equivalent, but note that there
are many derivations corresponding to each parse tree
are many derivations corresponding to each parse tree
Another example Another example
H H : L : L E E ; L | E ; L | E E
E a a | | b b
L E a
L E a L
E ; L
E b L
E ; a
L E b
;
Ambiguity Ambiguity
A sentence can have more than one parse A sentence can have more than one parse tree tree
A grammar is A grammar is ambiguous ambiguous if there is a if there is a sentence with more than one parse tree sentence with more than one parse tree
Example 1 Example 1
E E+E | E*E | id
E
E
E E
E
id id
+
id*
E
E
E E
id E
id id
+
*
Notes Notes
If e then if b then d else f If e then if b then d else f
{ int x; y = 0; } { int x; y = 0; }
a.b.c = d; a.b.c = d;
Id -> s | s.id Id -> s | s.id
E -> E + T -> E + T + T -> T + T + T -> id E -> E + T -> E + T + T -> T + T + T -> id + T + T -> id + T * id + T -> id + id * id + T + T -> id + T * id + T -> id + id * id
+ T ->
+ T ->
id + id * id + id
id + id * id + id
Ambiguity Ambiguity
Ambiguity is a feature of the Ambiguity is a feature of the
grammar rather than the language grammar rather than the language
Certain ambiguous grammars may Certain ambiguous grammars may
have equivalent unambiguous ones
have equivalent unambiguous ones
Grammar Grammar
Transformations Transformations
Grammars can be transformed Grammars can be transformed without affecting the language without affecting the language
generated generated
Three transformations are discussed Three transformations are discussed next:
next:
– Eliminating Ambiguity
– Eliminating Left Recursion (i.e.
productions of the form
AA )
– Left Factoring
Eliminating Ambiguity Eliminating Ambiguity
Sometimes an ambiguous grammar can Sometimes an ambiguous grammar can be rewritten to eliminate ambiguity
be rewritten to eliminate ambiguity
For example, the grammar of Example 1 For example, the grammar of Example 1 can be written as follows:
can be written as follows:
E E +T | T
T E * id | id
The language generated by this grammar The language generated by this grammar is the same as that generated by the
is the same as that generated by the previous grammar. Both generate
previous grammar. Both generate id id (+ (+ id| id|
**
id id )* )*
However, this grammar is not ambiguous However, this grammar is not ambiguous
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
One advantage of this grammar is One advantage of this grammar is that it represents the precedence that it represents the precedence
between operators. In the parsing between operators. In the parsing
tree, products appear nested within tree, products appear nested within
additions additions
E
T
T E
id
+
*
id T
id
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
An example of ambiguity in a An example of ambiguity in a
programming language is the dangling programming language is the dangling else else
Consider Consider
S S
if if then then S S else else S S | | if if then then S S | |
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
When there are two nested ifs and When there are two nested ifs and only one else..
only one else..
S
ifif then S else S
if then S
S
ifif then S
ifif then S else S
Eliminating Ambiguity Eliminating Ambiguity
(Cont.) (Cont.)
In most languages (including C++ and Java), In most languages (including C++ and Java), each
each else is assumed to belong to the else is assumed to belong to the nearest
nearest if if that is not already matched by an that is not already matched by an else else . .
This association is expressed in the following This association is expressed in the following (unambiguous) grammar:
(unambiguous) grammar:
S S MatchedMatched
| Unmatched| Unmatched
Matched Matched ifif thenthen Matched Matched else Matchedelse Matched
| |
Unmatched Unmatched ifif then Sthen S
||ifif then Matched then Matched else Unmatchedelse Unmatched
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
Ambiguity is a property of the Ambiguity is a property of the grammar
grammar
It is undecidable whether a context It is undecidable whether a context free grammar is ambiguous
free grammar is ambiguous
The proof is done by reduction to The proof is done by reduction to Post’s correspondence problem Post’s correspondence problem
Although there is no general Although there is no general
algorithm, it is possible to isolate algorithm, it is possible to isolate certain constructs in productions certain constructs in productions
which lead to ambiguous grammars
which lead to ambiguous grammars
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
For example, a grammar containing the production For example, a grammar containing the production
AAAA |AA | would be ambiguous, because the would be ambiguous, because the substring
substring has two parses: has two parses:
A
A A
A
A
A A
A A
A
This ambiguity disappears if we use the productions This ambiguity disappears if we use the productions
AAAB |AB | BB and and BB or the productions or the productions
AABA |BA | BB and and BB..
Eliminating Ambiguity (Cont.) Eliminating Ambiguity (Cont.)
Examples of ambiguous productions: Examples of ambiguous productions:
AAA AA | A
AA | AA
A CF language is inherently ambiguous if A CF language is inherently ambiguous if it has no unambiguous CFG
it has no unambiguous CFG
– An example of such a language is
L = {aibjcm | i=j or j=m} which can be generated by the grammar:
SAB | DC
AaA | CcC |
BbBc | DaDb |
Elimination of Left Recursion Elimination of Left Recursion
A grammar is left recursive if it has a A grammar is left recursive if it has a nonterminal
nonterminal A A and a derivation and a derivation A A
A A for for some string
some string
– Top-down parsing methods cannot handle left-
recursive grammars, so a transformation to eliminate left recursion is needed
Immediate left recursion (productions of the Immediate left recursion (productions of the form
form A A A A ) can be easily eliminated: ) can be easily eliminated:
1. Group the A-productions as
A A1 | A2 | … | Am | 1| 2 | … | n where no i begins with A
2. Replace the A-productions by
A 1A’ | 2A’ | … | nA’
A’ 1A’ |2A’| … | mA’ |
Elimination of Left Recursion Elimination of Left Recursion
(Cont.) (Cont.)
The previous transformation, The previous transformation,
however, does not eliminate left however, does not eliminate left
recursion involving two or more steps recursion involving two or more steps
For example, consider the grammar For example, consider the grammar
S A
a |
bA A
c |
Sd | S is left-recursive because S is left-recursive because S S
A A a a S S da da but it is not immediately left but it is not immediately left recursive
recursive
Elimination of Left Recursion Elimination of Left Recursion
(Cont.) (Cont.)
Algorithm. Left recursion elimination.
Algorithm. Left recursion elimination.
Arrange nonterminals in some order
Arrange nonterminals in some order AA11, , AA2 ,2 ,,…, ,…, AAnn for i = 1 to n {
for i = 1 to n {
for j = 1 to i - 1 { for j = 1 to i - 1 {
replace each production of the formreplace each production of the form Ai Aj
byby
Ai 1 | … | n
where where AAj j 1 1 |…| |…| n nare the current productions for are the current productions for AAjj }}
eliminate the immediate left recursion among the A eliminate the immediate left recursion among the Aii--
productions productions }}
Elimination of Left Recursion Elimination of Left Recursion
(Cont.) (Cont.)
Notice that iteration Notice that iteration ii only changes productions only changes productions with
with AAii on the left-hand side, and on the left-hand side, and AAjj with with jj > > ii in in the right-hand side
the right-hand side
Correctness induction proof: Correctness induction proof:
– Clearly true for i = 1
– If true for all i < k, then when the outer loop is executed for i = k, the inner loop will remove all productions Ai
Aj with j < i
– Finally, after the elimination of self recursion, m in any
AiAm productions will be > i
At the end of the algorithm, all derivations of the At the end of the algorithm, all derivations of the form
form AAi i AAmmwill have will have mm > > ii and therefore left and therefore left recursion will not be present
recursion will not be present
Left Factoring Left Factoring
Left factoring helps transform a grammar for Left factoring helps transform a grammar for predictive parsing
predictive parsing
For example, if we have the two productionsFor example, if we have the two productions S S ifif thenthen SS elseelse SS
| | if if thenthen SS
on seeing the input token
on seeing the input token if, we cannot if, we cannot
immediately tell which production to choose to immediately tell which production to choose to
expand expand SS
In general, if we have In general, if we have A A 1 1 ||22 and the input and the input begins with
begins with , we do not know, we do not know (without looking (without looking further) which production to use to expand
further) which production to use to expand AA
Left Factoring (Cont.) Left Factoring (Cont.)
However, we may defer the decision However, we may defer the decision by expanding A to
by expanding A to A’ A’
Then after seeing the input derived Then after seeing the input derived from
from , we may expand A’ to , we may expand A’ to
1 1or to or to
22
Left-factored, the original Left-factored, the original productions become
productions become
A A A’ A’
A’ A’
1 1| |
22Non-Context-Free Language Non-Context-Free Language
Constructs Constructs
Examples of non-context-free languages are:Examples of non-context-free languages are:
– L1 = {wcw | w is of the form (a|b)*}
– L2 = {anbmcndm | n 1 and m 1 } – L3 = {anbncn | n 0 }
Languages similar to these that are context freeLanguages similar to these that are context free
– L'1 = {wcwR | w is of the form (a|b)*} (wR stands for w reversed)
This language is generated by the grammar
S aSa | bSb | c
– L'2 = {anbmcmdn | n 1 and m 1 }
This language is generated by the grammar
S aSd | aAd
A bAc | bc
Non-Context-Free Language Non-Context-Free Language
Constructs (Cont.) Constructs (Cont.)
L'' L''
2 2= { = { a a
nnb b
nnc c
mmd d
mm| | n n 1 1 and and m m 1 1 } }
is generated by the grammar is generated by the grammar
S
AB
A
aAb | ab B
cBd | cd
L' L'
3 3= { = { a a
nnb b
nn| | n n 1 1 } }
is generated by the grammar is generated by the grammar
S
aSb | ab
This language is not definable by any This language is not definable by any
regular expression
regular expression
CFG vs DFSA CFG vs DFSA
L‘ L‘
4 4= { = { a a
nnb b
mm| | n n > 0, > 0, n n > 0} > 0}
0 1
b a
start b
Non-Context-Free Language Non-Context-Free Language
Constructs (Cont.) Constructs (Cont.)
Suppose we could construct a DFSM Suppose we could construct a DFSM DD accepting accepting L'L'3. 3.
DD must have a finite number of states, say must have a finite number of states, say kk. .
Consider the sequence of states Consider the sequence of states ss00, , ss11, , ss22, …, , …, sskk entered by
entered by DD having read having read , , aa, , aaaa, …, , …, aakk. .
Since Since DD only has only has kk states, two of the states in the states, two of the states in the sequence have to be equal. Say,
sequence have to be equal. Say, s sii ssjj ( (ii jj). ).
From From ssii, a sequence of , a sequence of ii bbs leads to an accepting s leads to an accepting (final) state. Therefore, the same sequence of
(final) state. Therefore, the same sequence of ii bs bs will also lead to an accepting state from
will also lead to an accepting state from ssjj. . Therefore
Therefore DD would accept would accept aajjbbii which means that which means that the language accepted by
the language accepted by DD is not identical to L’ is not identical to L’33. . A contradiction.
A contradiction.
Parsing Parsing
The parsing problem is: Given string of The parsing problem is: Given string of
tokens
tokens w w , find a parse tree whose frontier , find a parse tree whose frontier is is w w . (Equivalently, find a derivation from . (Equivalently, find a derivation from w w ) )
A A parser parser for a grammar for a grammar G G reads a list of reads a list of
tokens and finds a parse tree if they form tokens and finds a parse tree if they form a sentence (or reports an error otherwise) a sentence (or reports an error otherwise) Two classes of algorithms for parsing:
Two classes of algorithms for parsing:
– Top-down – Bottom-up
Parser generators Parser generators
A A parser generator parser generator is a program that reads is a program that reads a grammar and produces a parser
a grammar and produces a parser
The best known parser generator is The best known parser generator is yacc yacc It It produces bottom-up parsers
produces bottom-up parsers
Most parser generators - including yacc - Most parser generators - including yacc - do not work for every CFG; they accept a do not work for every CFG; they accept a
restricted class of CFG’s that can be restricted class of CFG’s that can be
parsed efficiently using the method parsed efficiently using the method
employed by that parser generator
employed by that parser generator
Top-down (predictive) Top-down (predictive)
parsing parsing
Starting from parse tree containing Starting from parse tree containing just
just S S , build tree down toward input. , build tree down toward input.
Expand left-most non-terminal.
Expand left-most non-terminal.
Top-down parsing algorithm Top-down parsing algorithm
Let input = a
Let input = a
11a a
22...a ...a
nncurrent sentential form (csf) = current sentential form (csf) = S S
loop { loop {
suppose csf = a suppose csf = a
11…a …a
kkA A
based on a based on a
kk+1+1…, choose production …, choose production A A
csf becomes a csf becomes a
11…a …a
kk
} }
Top-down parsing example Top-down parsing example
Grammar:
Grammar: H H : L : L E E ; ; L | E L | E E
E a a | | b b Input:
Input: a;b a;b
Parse tree Sentential form
Parse tree Sentential form Input Input
L a;b
E;L a;b
L L E ;L
L E ;L a
a;L a;b
Top-down parsing example Top-down parsing example
(cont.) (cont.)
Parse tree Sentential form Input Parse tree Sentential form Input
a;E a;b
L E ;L a E
L E ;L a E b
a;b a;b
LL(1) parsing LL(1) parsing
Efficient form of top-down parsing Efficient form of top-down parsing
Use only first symbol of remaining Use only first symbol of remaining input (
input ( a a
kk+1+1) to choose next ) to choose next
production. That is, employ a production. That is, employ a
function M:
function M: N N P in “choose P in “choose production” step of algorithm.
production” step of algorithm.
When this is possible, grammar is When this is possible, grammar is called LL(1)
called LL(1)
LL(1) examples LL(1) examples
Example 1: Example 1:
H: L E ; L | E E a | b
Given input a;b, so next symbol is a.
Which production to use? Can’t tell.
H not LL(1)
LL(1) examples LL(1) examples
Example 2: Example 2:
Exp Term Exp’
Exp’ $ | + Exp Term id
(Use $ for “end-of-input” symbol.)
Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.
Grammar is LL(1): Exp and Term have only one production; Exp’ has two productions but only one is applicable at any time.
Nonrecursive predictive Nonrecursive predictive
parsing parsing
Maintain a stack explicitly, rather Maintain a stack explicitly, rather than implicitly via recursive calls than implicitly via recursive calls
Key problem during predictive Key problem during predictive
parsing: determining the production parsing: determining the production
to be applied for a non-terminal
to be applied for a non-terminal
Nonrecursive predictive parsing Nonrecursive predictive parsing
Algorithm.
Algorithm. Nonrecursive predictive parsingNonrecursive predictive parsing Set Set ipip to point to the first symbol of to point to the first symbol of ww$.$.
Push S onto the stack.
Push S onto the stack.
repeat repeat
Let XLet X be the top of the stack symbol and a the symbol pointed to by be the top of the stack symbol and a the symbol pointed to by ipip if if X is a terminal or $ X is a terminal or $ thenthen
ifif XX == == aa thenthen
pop Xpop X from the stack and advance from the stack and advance ipip else error()else error()
elseelse // // XX is a nonterminal is a nonterminal
if if M[M[X,a] == X,a] == XXYY11 Y Y22 … Y … Y kk thenthen pop Xpop X from the stack from the stack push Y
push YkkY Y k-1k-1, …, Y, …, Y11 onto the stack with Y onto the stack with Y11 on top on top (push nothing if
(push nothing if YY11 Y Y22 … Y … Y kk is is ) ) output the production
output the production XXYY11 Y Y22 … Y … Y kk
else error()else error() until
until X == $ X == $
LL(1) grammars LL(1) grammars
No left recursion No left recursion
A A : If this production is chosen, parse makes no progress.
No common prefixes No common prefixes
A |
Can fix by “left factoring”:
A A’
’|
LL(1) grammars (cont.) LL(1) grammars (cont.)
No ambiguity No ambiguity
Precise definition requires that
production to choose be unique
(“choose” function M very hard to
calculate otherwise)
LL(1) definition LL(1) definition
Define: Grammar G = (N, Define: Grammar G = (N, , P, S) is LL(1), P, S) is LL(1) iff whenever there iff whenever there are two left-most derivations
are two left-most derivations SS * * wAwA ww * * wtxwtx
SS * * wAwA ww * * wtywty it follows that
it follows that = =
Leftmost-derivation: where the leftmost non-terminal is Leftmost-derivation: where the leftmost non-terminal is
always expanded first always expanded first
In other words, given In other words, given 1. a string
1. a string wA wA in V* and in V* and
2. t, the first terminal symbol to be derived from 2. t, the first terminal symbol to be derived from AA there is at most one production that can be applied to
there is at most one production that can be applied to A toA to yield a derivation of any terminal string beginning with yield a derivation of any terminal string beginning with wtwt
Checking LL(1)-ness Checking LL(1)-ness
For any sequence of grammar symbols For any sequence of grammar symbols , , define set FIRST(
define set FIRST( ) ) to be to be FIRST(
FIRST( ) = { ) = { a a | | * * a a for some for some } }
FIRST sets can often be calculated by FIRST sets can often be calculated by inspection
inspection
FIRST Sets FIRST Sets
Exp Term Exp’
Exp’ $ | + Exp Term id
(Use $ for “end-of-input” symbol)
FIRST($) = {$}
FIRST(
+
Exp) = {+
}FIRST($) FIRST(
+
Exp) = {} grammar is LL(1)
FIRST($) = {$}
FIRST(
+
Exp) = {+
}FIRST($) FIRST(
+
Exp) = {} grammar is LL(1)
FIRST Sets FIRST Sets
L E ; L | E E a | b
FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L) FIRST(E) {}
grammar not LL(1).
FIRST(E ; L) = {a, b} = FIRST(E) FIRST(E ; L) FIRST(E) {}
grammar not LL(1).
Computing FIRST Sets (no Computing FIRST Sets (no
productions) productions)
Algorithm
Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX
forall
forall XX V do FIRST( V do FIRST(XX) = {}) = {}
forall
forall XX do FIRST( do FIRST(XX) = {) = {XX}} repeat
repeat
forall productions forall productions X X YY11YY22 … Y … Ykk do do FIRST(
FIRST(XX) = FIRST() = FIRST(XX) ) FIRST( FIRST(YY11))
until no more elements are added to any FIRST set until no more elements are added to any FIRST set
Computing FIRST Sets (with Computing FIRST Sets (with
productions) productions)
Algorithm
Algorithm. Compute FIRST(. Compute FIRST(XX) for all grammar symbols ) for all grammar symbols XX
forall
forall XX V do FIRST( V do FIRST(XX) = {}) = {}
forall
forall XX do FIRST( do FIRST(XX) = {) = {XX}} forall productions
forall productions XX do FIRST( do FIRST(XX) = FIRST() = FIRST(XX) ) { {}} repeat
repeat
forall productions forall productions X X YY11YY22 … Y … Ykk do do for i = 1, k do
FIRST(
FIRST(XX) = FIRST() = FIRST(XX) ) (FIRST( (FIRST(YYii) - {) - {})}) if if FIRST( FIRST(YYii) then break) then break
else if i == k then
FIRST(X) = FIRST(X) {}
until no more elements are added to any FIRST set until no more elements are added to any FIRST set
FIRST Sets of Strings of Symbols FIRST Sets of Strings of Symbols
FIRST(X FIRST(X
11X X
22…X …X
nn) is the union of ) is the union of FIRST(X
FIRST(X
11) and all FIRST(X ) and all FIRST(X
ii) such that ) such that
FIRST( FIRST( X X
kk) for ) for k k = 1, 2, …, = 1, 2, …, i i -1 -1
FIRST(X FIRST(X
11X X
22…X …X
nn) contains ) contains iff iff FIRST(
FIRST( X X
kk) for k = 1, 2, …, ) for k = 1, 2, …, n n
FIRST Sets do not Suffice FIRST Sets do not Suffice
Given the productions Given the productions
A A aT x aT x
A A bT y bT y
T T w w T T
T T w w should be applied when the next should be applied when the next input token is w.
input token is w.
T T should be applied whenever the should be applied whenever the next terminal is either x or y
next terminal is either x or y
FOLLOW Sets FOLLOW Sets
For any nonterminal For any nonterminal X X , define the set , define the set FOLLOW(
FOLLOW( X X ) ) as as FOLLOW(
FOLLOW( X X ) = { ) = { a a | S | S * * X X a a } }
Computing the FOLLOW Set Computing the FOLLOW Set
Algorithm. Compute FOLLOW(X) for all nonterminals Algorithm. Compute FOLLOW(X) for all nonterminals XX
FOLLOW(S) = {$}
FOLLOW(S) = {$}
forall productions A
forall productions A BB do do
FOLLOW(B) = Follow(B) FOLLOW(B) = Follow(B) (FIRST( (FIRST() - {) - {})}) repeat
repeat
forall productions A forall productions A B or A B or A BB with with FIRST(
FIRST() do) do
FOLLOW(B) = FOLLOW(B)
FOLLOW(B) = FOLLOW(B) FOLLOW(A) FOLLOW(A) until all FOLLOW sets remain the same
until all FOLLOW sets remain the same
Another Definition of LL(1) Another Definition of LL(1)
Define: Grammar G is LL(1)
Define: Grammar G is LL(1) if for every if for every A A N with productions A N with productions A
11
nnFIRST(
FIRST(
i iFOLLOW(A)) FOLLOW(A)) FIRST(
FIRST(
j jFOLLOW(A)) = {} for all FOLLOW(A)) = {} for all i i , , j j
Top-down Parsing Top-down Parsing
Input tokens: <t0,t1,…,ti,...>
L
E0 … En
Start symbol and root of parse tree Start symbol and root of parse tree
Input tokens: <ti,...> L E0 … En From left to right, ...
“grow” the parse tree downwards From left to right,
“grow” the parse tree downwards
Construction of a predictive parsing table Construction of a predictive parsing table
Algorithm
Algorithm. Construction of a predictive parsing . Construction of a predictive parsing table
table
M[,] = {}
M[,] = {}
forall productions A
forall productions A do do forall a
forall a FIRST( FIRST() do ) do M[A, a] = M[A, a]
M[A, a] = M[A, a] {A {A }} if if FIRST( FIRST() then ) then
forall b
forall b FOLLOW(A) do FOLLOW(A) do M[A, b] = M[A, b]
M[A, b] = M[A, b] {A {A }} Make all empty entries of M be error
Make all empty entries of M be error
Recursive Descent Recursive Descent
Parsing
Parsing
Recursive Descent Parser Recursive Descent Parser
stat
stat → var → var = = expr expr ; ; expr → term [
expr → term [ + + term] term]
term → factor [
term → factor [ * * factor] factor]
factor →
factor → ( ( expr expr ) ) | var | | var | constant | constant | call( fun, expr )
call( fun, expr ) var →
var → identifier identifier
Scanner Scanner
public class Scanner {
private StreamTokenizer input;
private Type lastToken;
public enum Type { INVALID_CHAR, NO_TOKEN , VARIABLE, PLUS, FALSE, TRUE,
// etc. for remaining tokens, then:
}; EOF
public Scanner (Reader r) {
input = new StreamTokenizer(r);
input.resetSyntax();
input.eolIsSignificant(false);
input.wordChars('a', 'z');
input.wordChars('A', 'Z');
input.ordinaryChar('+');
input.ordinaryChar('*');
input.ordinaryChar('=');
input.ordinaryChar('(');
input.ordinaryChar(')');
}