Please read this disclaimer before
proceeding:
This document is confidential and intended solely for the educational
purpose of RMK Group of Educational Institutions. If you have
received this document through email in error, please notify the
system manager. This document contains proprietary information and
is intended only to the respective group / learning community as
intended. If you are not the addressee you should not disseminate,
distribute or copy through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake
and delete this document from your system. If you are not the
intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
information is strictly prohibited.
22CS601
Compiler Design (Lab Integrated)
Department: CSE
Created by
Date: 18.12.2024
CONTENTS
S. No    Contents    Page No.
1 Course Objectives 6
2 Pre Requisites 7
3 Syllabus 8
4 Course outcomes 10
6 Lecture Plan 12
8 Lecture Notes 14
9 Assignments 105
1. COURSE OBJECTIVES
2. PRE REQUISITES
Semester VI : 22CS601 Compiler Design (Lab Integrated)
Semester V : 22CS502 Theory of Computation (Lab Integrated)
Semester III : 22MA301 Discrete Mathematics
Semester II : 22CS201 Data Structures (Lab Integrated)
Semester II : 22IT202 DBMS (Lab Integrated)
Semester I : 22CS101 Problem Solving using C++ (Lab Integrated)
3. SYLLABUS
OBJECTIVES
Principal sources of optimization – Peephole Optimization – DAG construction – Basic blocks
and flow graph – Optimization in Basic blocks – Data flow analysis.
LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C (e.g., identifiers,
constants, comments, operators, etc.). Create a symbol table while recognizing
identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs, new lines, comments, etc.
4. COURSE OUTCOME
5. CO - PO / PSO MAPPING
CO    K Level    PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12    PSO1 PSO2 PSO3
CO1 K2 3 2 1 - - - - 1 1 1 - 1 2 - -
CO2 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
CO3 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
CO4 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
CO5 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
CO6 K3 3 2 1 - - - - 1 1 1 - 1 2 - -
Correlation Levels: 1. Slight (Low), 2. Moderate (Medium), 3. Substantial (High).
If there is no correlation, put “-“.
6. LECTURE PLAN : UNIT – II
SYNTAX ANALYSIS
1 Role of the Parser, Context-free grammars 03.01.2025 03.01.2025 MD1 & MD5 T1 Define parser role K2
7. ACTIVITY BASED LEARNING : UNIT – II
Construct the following grammar in JFLAP and verify the result:
E -> E + T | T
T -> T * F | F
F -> (E) | id
8. LECTURE NOTES : UNIT – II
SYNTAX ANALYSIS
1. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
2. Bottom-up parser: builds parse trees from the leaves and works up to the root.
There are three general types of parsers for grammars: universal, top-down, and
bottom-up. Universal parsing methods such as the Cocke-Younger-Kasami
algorithm and Earley's algorithm can parse any grammar. These general methods
are, however, too inefficient to use in production compilers.
The most efficient top-down and bottom-up methods work only for subclasses of
grammars, but several of these classes, particularly, LL and LR grammars, are
expressive enough to describe most of the syntactic constructs in modern
programming languages.
Syntax Error Handling
• Syntactic errors include misplaced semicolons or extra or missing braces; that is,
"{" or "}". As another example, in C or Java, the appearance of a case statement
without an enclosing switch is a syntactic error (however, this situation is usually
allowed by the parser and caught later in the processing, as the compiler attempts
to generate code).
• Logical errors can be anything from incorrect reasoning on the part of the
programmer to the use in a C program of the assignment operator = instead of the
comparison operator ==. The program containing = may be well formed; however,
it may not reflect the programmer's intent.
Error-Recovery Strategies
1. Panic-Mode Recovery:
In this method, successive characters from the input are removed one at a time
until a designated set of synchronizing tokens is found. Synchronizing tokens are
delimiters such as ; or }.
The advantage is that it is easy to implement and guarantees not to go into an
infinite loop. The disadvantage is that a considerable amount of input is skipped
without checking it for additional errors.
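To make the idea concrete, here is a minimal C sketch of panic-mode skipping. The token names and the toy one-character scanner are hypothetical, invented only for this illustration:

#include <stdio.h>

/* Hypothetical panic-mode skeleton: discard tokens until a
   synchronizing delimiter such as ';' or '}' is found. */
enum { TOK_SEMI = ';', TOK_RBRACE = '}', TOK_EOF = 0 };

static const char *src = "x = * 3 y ;";   /* erroneous fragment (toy input) */
static int next_token(void)               /* toy scanner: one char = one token */
{
    static int i = 0;
    while (src[i] == ' ') i++;
    return src[i] ? src[i++] : TOK_EOF;
}

static void panic_mode_recover(void)
{
    int tok = next_token();
    while (tok != TOK_SEMI && tok != TOK_RBRACE && tok != TOK_EOF)
        tok = next_token();               /* skip until a synchronizing token */
    if (tok != TOK_EOF)
        printf("recovered; parsing resumes after '%c'\n", tok);
}

int main(void) { panic_mode_recover(); return 0; }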
2. Phrase-Level Recovery :
• The correction can be deletion of extra semicolons, replacing the comma with
semicolons, or inserting a missing semicolon.
• While performing correction, utmost care should be taken for not going in an
infinite loop.
3. Error production
• If a user has knowledge of common errors that can be encountered then, these
errors can be incorporated by augmenting the grammar with error productions
that generate erroneous constructs.
• If this is used then, during parsing appropriate error messages can be generated
and parsing can be continued.
4. Global Correction
• The parser examines the whole program and tries to find out the closest match
for it which is error-free.
• The closest match program has as few insertions, deletions, and changes of tokens as possible.
CONTEXT-FREE GRAMMARS
A context-free grammar (CFG) is defined by the four tuple
G = (V, T, P, S)
Where,
G is the grammar, which consists of a set of production rules. It is used to generate the strings of a language.
V is a set of non-terminal symbols (variables).
T is a set of terminal symbols.
P is a set of production rules, which is used for replacing non-terminal symbols (on the left side of the production) in a string with other terminal or non-terminal symbols (on the right side of the production).
S is the start symbol, which is used to derive the string. We can derive the string by repeatedly replacing a non-terminal by the right-hand side of the production until all non-terminals have been replaced by terminal symbols.
Example 1: Construct the CFG for the language having any number of a's over the set ∑ = {a}.
Solution: The production rules are
S → aS (rule 1)
S → ε (rule 2)
Now if we want to derive the string "aaaaaa", we can start with the start symbol.
S
⇒ aS
⇒ aaS rule 1
⇒ aaaS rule 1
⇒ aaaaS rule 1
⇒ aaaaaS rule 1
⇒ aaaaaaS rule 1
⇒ aaaaaaε rule 2
⇒ aaaaaa
The r.e. a* can generate the set of strings {ε, a, aa, aaa, .....}. We can have a null
string because S is a start symbol and rule 2 gives S → ε.
Example 2: Construct a CFG for the regular expression (0+1)*.
Solution: The rules combine 0's and 1's with the start symbol: S → 0S | 1S | ε. Since (0+1)*
denotes {ε, 0, 1, 01, 10, 00, 11, ....} and ε is a string in this set, we include the rule S → ε.
Example 3: Construct a CFG for the language L = {wcwᴿ | w ∈ (a, b)*}.
Solution: The strings that can be generated for this language are {aacaa, bcb, abcba, bacab, abbcbba, ....}. The production rules are
S → aSa (rule 1)
S → bSb (rule 2)
S → c (rule 3)
Now if we want to derive the string "abbcbba", we can start with the start symbol.
S ⇒ aSa (rule 1)
⇒ abSba (rule 2)
⇒ abbSbba (rule 2)
⇒ abbcbba (rule 3)
Thus any of this kind of string can be derived from the given production rules.
Example 4: Construct a CFG for the language L = {aⁿb²ⁿ | n ≥ 1}.
Solution: The strings that can be generated for this language are {abb, aabbbb, aaabbbbbb, ....}. The production rules are
S → aSbb | abb
Now if we want to derive the string "aabbbb", we can start with the start symbol.
S ⇒ aSbb
⇒ aabbbb
Example 5:
Solution: We can easily show using the pumping lemma that the language
L = { wwᴿ | w ∈ (0+1)* } is not regular.
However, we can describe this language by the following context-free grammar over
the alphabet {0,1}:
· P→ε
· P → 0
· P→1
· P → 0P0
· P → 1P1
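A small C sketch (an illustration added here, not part of the original text) that tests membership by mirroring these productions:

#include <stdio.h>
#include <string.h>

/* Checks whether s[i..j) can be derived from
   P -> eps | 0 | 1 | 0P0 | 1P1, i.e., whether s is a
   palindrome over {0,1}. */
static int derivable(const char *s, int i, int j)
{
    if (j - i <= 1) return 1;                 /* P -> eps, P -> 0, P -> 1 */
    if (s[i] == s[j - 1] && (s[i] == '0' || s[i] == '1'))
        return derivable(s, i + 1, j - 1);    /* P -> 0P0 or P -> 1P1 */
    return 0;
}

int main(void)
{
    const char *w = "0110";                   /* w wR with w = 01 */
    printf("%s is %sderivable from P\n", w,
           derivable(w, 0, (int)strlen(w)) ? "" : "not ");
    return 0;
}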
Example 6 :
Give context-free grammars that generate the following languages.
A → 0 | 0A | 0A1
B → 1 | B1 | 0B1
C → 2 | 2C
D → 0 | 0D
E → 1 | 1E | 1E2
F → 2 | F2 | 1F2
Leftmost Derivation :
In the previous example we used a derivation called a leftmost derivation. We can
specifically denote a leftmost derivation using the subscript “lm”, like:
⇒lm or ⇒*lm
A leftmost derivation is simply one in which we replace the leftmost variable in a sentential form
by one of its production bodies first, and then work our way from left to right.
Rightmost Derivation :
Not surprisingly, we also have a rightmost derivation which we can specifically denote via:
⇒rm or ⇒*rm
A rightmost derivation is one in which we replace the rightmost variable by one of its
production bodies first , and then work our way from right to left.
Derivation:
Derivation is a sequence of production rules. It is used to get the input string through these
production rules. During parsing, we have to take two decisions:
• We have to decide the non-terminal which is to be replaced.
• We have to decide the production rule by which that non-terminal will be replaced.
1. Leftmost Derivation:
In the leftmost derivation, the input is scanned and replaced with the production rule from left
to right. So in leftmost derivation, we read the input string from left to right.
Example:
Production rules:
1. E -> E + E
2. E -> E - E
3. E ->a | b
Input:
a-b+a
E => E + E
=> E - E + E
=> a - E + E
=> a - b + E
=> a - b + a
2. Rightmost Derivation:
In rightmost derivation, the input is scanned and replaced with the production rule from right
to left. So in rightmost derivation, we read the input string from right to left.
Example
Production rules:
1. E ->E + E
2. E ->E - E
3. E -> a | b
Input : a - b + a
E => E - E
=> E - E + E
=> E - E + a
=> E - b + a
=> a - b + a
Whether we use the leftmost derivation or the rightmost derivation, we may get the same string;
the type of derivation does not affect the string that is obtained.
Examples of Derivation:
Example 1:Derive the string "abb" for leftmost derivation and rightmost derivation using a
CFG given by,
1. S → AB | ε
2. A → aB
3. B → Sb
Solution:
Leftmost derivation:
S ⇒ AB
⇒ aBB (A → aB)
⇒ aSbB (B → Sb)
⇒ abB (S → ε)
⇒ abSb (B → Sb)
⇒ abb (S → ε)
Rightmost derivation:
S ⇒ AB
⇒ ASb (B → Sb)
⇒ Ab (S → ε)
⇒ aBb (A → aB)
⇒ aSbb (B → Sb)
⇒ abb (S → ε)
Example 2:
Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given
by,
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
Solution:
Leftmost derivation:
S
=> aB S → aB
=>aaBB B → aBB
=> aabB B→b
=> aabbS B → bS
=> aabbaB S → aB
=> aabbabS B → bS
=> aabbabbA S → bA
=>aabbabba A→a
Rightmost derivation:
S
=> aB (S → aB)
=> aaBB (B → aBB)
=> aaBbS (B → bS)
=> aaBbbA (S → bA)
=> aaBbba (A → a)
=> aabSbba (B → bS)
=> aabbAbba (S → bA)
=> aabbabba (A → a)
Derivation Tree
Derivation tree is a graphical representation for the derivation of the given production rules for
a given CFG. It is the simple way to show how the derivation can be done to obtain some
string from a given set of production rules. The derivation tree is also called a parse tree.
A parse tree follows the precedence of operators. The deepest sub-tree is traversed first, so the
operator in the parent node has lower precedence than the operator in its sub-tree.
Example 1:
Production rules:
1. E = E + E
2. E = E * E
3. E = a | b | c
Input
1. a * b + c
Step 1:
Step 2:
Step 3:
Step 4:
Step 5:
Note: We can draw a derivation tree step by step or directly in one step.
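Since the step-by-step tree figures are not reproduced here, a sketch of the final derivation tree from Step 5 for a * b + c (with * in the deeper sub-tree, per the precedence rule above) is:

          E
        / | \
       E  +  E
     / | \   |
    E  *  E  c
    |     |
    a     b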
Example 2:
Draw a derivation tree for the string "bab" from the CFG given by
1. S → bSb | a | b
Solution:
The above tree is a derivation tree drawn for deriving the string bab. By simply
reading the leaf nodes, we can obtain the desired string. The same tree can also be
denoted by,
Example 3:
Construct a derivation tree for the string aabbabba for the CFG given by,
1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB
Solution:
To draw a tree, we will first try to obtain derivation for the string aabbabba
S => aB
=> aaBB
=> aabSB
=> aabbAB
=> aabbaB
=> aabbabS
=> aabbabbA
=> aabbabba
Example 6: Construct a parse tree for the string w = abaabb:
S → SS | aSb | ε
Solution:
Ambiguity in Grammar :
A grammar is said to be ambiguous if there exists more than one leftmost
derivation or more than one rightmost derivation or more than one parse tree for
the given input string. If the grammar is not ambiguous, then it is called
unambiguous.
If a grammar has ambiguity, then it is not good for compiler construction. No
method can automatically detect and remove ambiguity in general, but we can remove
ambiguity by re-writing the whole grammar without ambiguity.
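For instance, the ambiguous expression grammar E → E + E | E * E | id (used in the examples below) can be rewritten without ambiguity by layering precedence and associativity, giving the unambiguous grammar used elsewhere in this unit:

E → E + T | T
T → T * F | F
F → (E) | id

Here + and * are left associative and * has higher precedence than +, so every string now has exactly one parse tree.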
Example 1:
1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → ε | 0 | 1 | 2 | ... | 9
Solution:
For the string "3 * 2 + 5", the above grammar can generate two parse trees by
leftmost derivation:
Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is
ambiguous.
Example 2:
1. E → E + E
2. E → E - E
3. E → id
Solution:
From the above grammar, the string "id + id - id" can be derived in two ways:
First leftmost derivation:
E => E + E
=> id + E
=> id + E - E
=> id + id - E
=> id + id - id
Second leftmost derivation:
E => E - E
=> E + E - E
=> id + E - E
=> id + id - E
=> id + id - id
Since there are two leftmost derivations for the single string "id + id - id", the grammar G
is ambiguous.
Example 3:
1. S → aSb | SS
2. S → ε
Solution:
For the string "aabb" the above grammar can generate two parse trees
Since there are two parse trees for a single string "aabb", the grammar G is
ambiguous.
Example 4:
1. A → AA
2. A → (A)
3. A → a
Solution:
For the string "a(a)aa" the above grammar can generate two parse trees:
Since there are two parse trees for a single string "a(a)aa", the grammar G is
ambiguous.
WRITING A GRAMMAR
TOP-DOWN PARSING
A program that performs syntax analysis is called a parser. A syntax analyzer takes
tokens as input and outputs an error message if the program syntax is wrong. The
parser uses a symbol lookahead table and an approach called top-down parsing
without backtracking.
A usual implementation of an LL(1) parser is: initialize its data structures, get the
lookahead token by calling scanner routines, and call the routine that implements
the start symbol.
Here is an example.
proc syntaxAnalysis()
begin
initialize(); // initialize global data and structures
nextToken(); // get the lookahead token
program(); // program is the routine for the start symbol of the grammar
end
Back-tracking
Top-down parsers start from the root node (start symbol) and match the input
string against the production rules to replace them (if matched). To understand
this, take the following example of CFG:
S → rXd | rZd
X → oa | ea
Z → ai
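A minimal C sketch of a backtracking top-down parser for this grammar (an illustration only; the function and variable names are invented for the sketch):

#include <stdio.h>

/* Backtracking top-down parser for
   S -> rXd | rZd, X -> oa | ea, Z -> ai */
static const char *in;
static int pos;

static int term(char c) { if (in[pos] == c) { pos++; return 1; } return 0; }

static int X(void)
{
    int save = pos;
    if (term('o') && term('a')) return 1;
    pos = save;                    /* backtrack: try X -> ea */
    if (term('e') && term('a')) return 1;
    pos = save;
    return 0;
}

static int Z(void)
{
    int save = pos;
    if (term('a') && term('i')) return 1;
    pos = save;
    return 0;
}

static int S(void)
{
    int save = pos;
    if (term('r') && X() && term('d')) return 1;
    pos = save;                    /* S -> rXd failed: backtrack, try S -> rZd */
    if (term('r') && Z() && term('d')) return 1;
    pos = save;
    return 0;
}

int main(void)
{
    in = "read"; pos = 0;          /* matches S -> rXd with X -> ea */
    printf("%s\n", (S() && in[pos] == '\0') ? "accepted" : "rejected");
    return 0;
}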
Left Factoring
Left factoring is a process by which the grammar with common prefixes is
transformed to make it useful for Top down parsers.
S → iEtS / iEtSeS / a
E→b
The left factored grammar is-
S → iEtSS’ / a
S’ → eS / ∈
E→b
Left Recursion:
A grammar is left-recursive if and only if there exists a nonterminal symbol that can derive a
sentential form with itself as the leftmost symbol.
A → Aα | β
Elimination of Left Recursion:
For A → Aα | β, introduce a new nonterminal A′ and rewrite the rule as
A → βA′
A′ → αA′ | ε
1) E → E + T | T
Solution:
E → TE′
E′ → +TE′ | ε
2) T → T * F | F
Solution:
T → FT′
T′ → *FT′ | ε
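The practical payoff of eliminating left recursion is that the transformed grammar maps directly onto recursive descent routines. Below is a self-contained C sketch (an illustration, not the course's prescribed program; single letters stand for id):

#include <stdio.h>
#include <ctype.h>

/* Recursive descent for:
   E -> T E', E' -> +T E' | eps, T -> F T', T' -> *F T' | eps, F -> (E) | id */
static const char *in; static int p;
static int E(void);

static int F(void)
{
    if (in[p] == '(') {
        p++;
        if (!E() || in[p] != ')') return 0;
        p++; return 1;
    }
    if (isalpha((unsigned char)in[p])) { p++; return 1; }   /* id */
    return 0;
}
static int Tp(void)                 /* T' -> *F T' | eps */
{
    if (in[p] == '*') { p++; return F() && Tp(); }
    return 1;                       /* eps: consume nothing */
}
static int T(void) { return F() && Tp(); }
static int Ep(void)                 /* E' -> +T E' | eps */
{
    if (in[p] == '+') { p++; return T() && Ep(); }
    return 1;
}
static int E(void) { return T() && Ep(); }

int main(void)
{
    in = "a+b*c"; p = 0;
    printf("%s\n", (E() && in[p] == '\0') ? "accepted" : "rejected");
    return 0;
}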
To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be added to any FIRST set:
1. If X is a terminal, then FIRST(X) = {X}.
2. If X → ε is a production, then add ε to FIRST(X).
3. If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then add FIRST(Y1) to FIRST(X); if Y1 derives ε, then also add FIRST(Y2) to FIRST(X), and so on.
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any
FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in FOLLOW(A) is in FOLLOW(B).
PROBLEM :
Consider the following example to understand the concept of FIRST and FOLLOW. Find the FIRST and FOLLOW of all nonterminals in the grammar:
E -> TE'
E' -> +TE' | e
T -> FT'
T' -> *FT' | e
F -> (E) | id
Then:
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E') = {+, e}
FIRST(T') = {*, e}
FOLLOW(E) = FOLLOW(E') = {), $}
FOLLOW(T) = FOLLOW(T') = {+, ), $}
FOLLOW(F) = {+, *, ), $}
For example, id and left parenthesis are added to FIRST(F) by rule 3 in the definition of FIRST with i=1
in each case, since FIRST(id) = {id} and FIRST('(') = {(} by rule 1. Then by rule 3 with i=1, the
production T -> FT' implies that id and left parenthesis belong to FIRST(T) also.
To compute FOLLOW,we put $ in FOLLOW(E) by rule 1 for FOLLOW. By rule 2 applied to production
F-> (E), right parenthesis is also in FOLLOW(E). By rule 3 applied to production E-> TE', $ and right
parenthesis are in FOLLOW(E').
Example 2:
S → aBDh
B ->cC
C → bC / ∈
D → EF
E→g/∈
F→f/∈
Follow Functions
Follow(S) = { $ }
Follow(B) = { First(D) – ∈ } ∪ follow(D) = { g , f , h }
Follow(C) = Follow(B) = { g , f , h }
Follow(D) = First(h) = { h }
Follow(E) = { First(F) – ∈ } ∪ Follow(F) = { f , h }
Follow(F) = Follow(D) = { h }
PREDICTIVE PARSER :
A predictive parser is a recursive descent parser with no backtracking or backup. It
is a top-down parser that does not require backtracking. At each step, the choice of
the rule to be expanded is made upon the next terminal symbol.
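A compact C sketch of a table-driven predictive parser for the expression grammar of this unit (E → TE′, E′ → +TE′|ε, T → FT′, T′ → *FT′|ε, F → (E)|id), using the parsing table constructed in the solved problems later in this unit. The single-character encodings ('i' for id, lowercase 'e' and 't' for E′ and T′, "~" for ε) are assumptions made only for this sketch:

#include <stdio.h>
#include <string.h>

enum { NT_E, NT_Ep, NT_T, NT_Tp, NT_F, NNT };
static const char COLS[] = "i+*()$";     /* terminal columns */

static int nt_index(char c)              /* E,e,T,t,F encode E,E',T,T',F */
{
    switch (c) {
    case 'E': return NT_E;  case 'e': return NT_Ep;
    case 'T': return NT_T;  case 't': return NT_Tp;
    case 'F': return NT_F;  default:  return -1;
    }
}
static int t_index(char c)
{
    const char *q = strchr(COLS, c);
    return q ? (int)(q - COLS) : -1;
}
/* M[A][a] = production body to push ("~" = epsilon, "" = error entry) */
static const char *M[NNT][6] = {
    /* E  */ { "Te", "",    "",    "Te",  "",  ""  },
    /* E' */ { "",   "+Te", "",    "",    "~", "~" },
    /* T  */ { "Ft", "",    "",    "Ft",  "",  ""  },
    /* T' */ { "",   "~",   "*Ft", "",    "~", "~" },
    /* F  */ { "i",  "",    "",    "(E)", "",  ""  },
};

int main(void)
{
    const char *w = "i+i*i$";            /* id + id * id $ */
    char stack[128] = "$E";              /* $ at bottom, start symbol on top */
    int top = 1, ip = 0;
    while (top >= 0) {
        char X = stack[top], a = w[ip];
        if (X == a) {                    /* terminal on top matches input */
            if (X == '$') { printf("accepted\n"); return 0; }
            top--; ip++;
        } else if (nt_index(X) >= 0 && t_index(a) >= 0) {
            const char *body = M[nt_index(X)][t_index(a)];
            if (!*body) break;           /* empty table entry: error */
            top--;                       /* pop X, push the body reversed */
            if (strcmp(body, "~") != 0)
                for (int k = (int)strlen(body) - 1; k >= 0; k--)
                    stack[++top] = body[k];
        } else break;
    }
    printf("error\n");
    return 1;
}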
LL(1) GRAMMAR
The above algorithm can be applied to any grammar G to produce a parsing table M. For
some Grammars, for example if G is left recursive or ambiguous, then M will have at least
one multiply-defined entry. A grammar whose parsing table has no multiply defined
entries is said to be LL(1). It can be shown that the above algorithm can be used to
produce for every LL(1) grammar G a parsing table M that parses all and only the
sentences of G. LL(1) grammars have several distinctive properties. No ambiguous or left
recursive grammar can be LL(1).
There remains a question of what should be done in case of multiply defined entries. One
easy solution is to eliminate all left recursion and left factoring, hoping to produce a
grammar which will produce no multiply defined entries in the parse tables. Unfortunately,
there are some grammars which will not give an LL(1) grammar after any kind of alteration. In
general, there are no universal rules to convert multiply defined entries into single valued
entries without affecting the language recognized by the parser.
The main difficulty in using predictive parsing is in writing a grammar for the source
language such that a predictive parser can be constructed from the grammar. Although left
recursion elimination and left factoring are easy to do, they make the resulting grammar
hard to read and difficult to use for translation purposes. To alleviate some of this
difficulty, a common organization for a parser in a compiler is to use a predictive parser for
control constructs and to use operator precedence for expressions. However, if an LR
parser generator is available, one can get all the benefits of predictive parsing and operator
precedence automatically.
An error is detected during predictive parsing when the terminal on top of the stack
does not match the next input symbol or when non-terminal A is on top of the
stack, a is the next input symbol, and the parsing table entry M[A,a] is empty.
• Panic-mode error recovery is based on the idea of skipping symbols on the input until a token in a selected set of synchronizing tokens appears.
Zero or more input symbols are then discarded until a symbol a is found that can
legitimately follow A. The situation might exist where there is more than one choice
for the nonterminal A.
The parser determines that a string derivable from A contains an error. Part of that
string has already been processed, and the result of this processing is a sequence
of states on top of the stack.
The remainder of the string is still in the input, and the parser attempts to skip over
the remainder of this string by looking for a symbol on the input that can
legitimately follow A.
By removing states from the stack, skipping over the input, and pushing GOTO(s,
A) on the stack, the parser pretends that it has found an instance of A and resumes
normal parsing.
PHRASE-LEVEL RECOVERY
Phrase-level recovery is implemented by examining each error entry in the LR
action table and deciding on the basis of language usage the most likely
programmer error that would give rise to that error. An appropriate recovery
procedure can then be constructed; presumably the top of the stack and/or first
input symbol would be modified in a way deemed appropriate for each error entry.
In designing specific error-handling routines for an LR parser, we can fill in each
blank entry in the action field with a pointer to an error routine that will take the
appropriate action selected by the compiler designer.
The actions may include insertion or deletion of symbols from the stack or the input
or both, or alteration and transposition of input symbols. We must make our
choices so that the LR parser will not get into an infinite loop. A safe strategy will
assure that at least one input symbol will be removed or shifted eventually, or that
the stack will eventually shrink if the end of the input has been reached. Popping a
stack state that covers a non terminal should be avoided, because this modification
eliminates from the stack a construct that has already been successfully parsed.
Bottom-Up Parser:
Bottom-up parsing can be defined as an attempt to reduce the input string w to the
start symbol of grammar by tracing out the rightmost derivations of w in reverse.
E -> E + T | T
T -> T * F | F
F -> (E) | id
Example: id*id
LR PARSING INTRODUCTION
The "L" is for left-to-right scanning of the input and the "R" is for constructing a
rightmost derivation in reverse.
WHY LR PARSING:
• LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written.
• The class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a
left- to- right scan of the input.
MODELS OF LR PARSERS
The schematic form of an LR parser is shown below.
• The program uses a stack to store a string of the form s0 X1 s1 X2...Xm sm where
sm is on top.
• Each Xi is a grammar symbol and each si is a symbol representing a state. Each state
symbol summarizes the information contained in the stack below it.
• The combination of the state symbol on top of the stack and the current input symbol
are used to index the parsing table and determine the shift-reduce parsing decision.
• The parsing table consists of two parts: a parsing action function action and a goto
function goto.
This configuration represents the right-sentential form X1 X2 ... Xm ai ai+1 ...an
in essentially the same way a shift-reduce parser would; only the presence of the states
on the stack is new.
Recall the sample parse we did (see Example 1: Sample bottom-up parse) in which
we assembled the right-sentential form by concatenating the remainder of the input buffer to
the top of the stack.
The next move of the parser is determined by reading ai and sm, and consulting the
parsing action table entry action[sm, ai]. Note that we are just looking at the
state here and no symbol below it. We'll see how this actually works later.
The configurations resulting after each of the four types of move are as follows:
If action[sm, ai] = shift s, the parser executes a shift move entering the
configuration (s0 X1 s1 X2 s2... Xm sm ai s, ai+1... an$)
Here the parser has shifted both the current input symbol ai and the next state s onto the stack; ai+1 becomes the current input symbol.
If action[sm, ai] = reduce A -> b, then the parser executes a reduce move, entering the
configuration, (s0 X1 s1 X2 s2... Xm-r sm-r A s, ai ai+1... an$)
where s = goto[sm-r, A] and r is the length of b, the right side of the production. The parser
first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing
state sm-r. The parser then pushed both A, the left side of the production, and s, the entry
for goto[sm-r, A], o44nto the stack. The current input symbol is not changed in a reduce
move.
The output of an LR parser is generated after a reduce move by executing the semantic
action associated with the reducing production. For example, we might just print out the
production reduced.
If action[sm, ai] = accept, parsing is completed.
If action[sm, ai] = error, the parser has discovered an error and calls an error recovery routine.
OPERATOR GRAMMARS
Examples
This is an example of operator grammar:
E->E+E/E*E/id
However, the grammar given below is not an operator grammar because two non-
terminals are adjacent to each other:
S->SAS/a
A->bSb/b
It can, however, be converted into an operator grammar by substituting for A:
S->SbSbS/SbS/a
A->bSb/b
A->bSb/b
a ⋖ b This means a “yields precedence to” b.
a ≐ b This means a “has the same precedence as” b.
a ⋗ b This means a “takes precedence over” b.
E → E A E | (E) | id
A → + | − | ∗
Solution
After substituting for A: E → E + E | E − E | E ∗ E | (E) | id.
Depending upon these precedence Relations, we can decide which operations will be
executed or parsed first.
Example −
In the statement a + b * c, ∗ has higher precedence than +:
∴ + <· ∗
In the statement a * b + c, it is parsed as (a * b) + c:
∴ ∗ ·> +
Since + is left associative, a + b + c is parsed as (a + b) + c:
∴ + ·> +
Since ↑ is right associative, a ↑ b ↑ c will become a ↑ (b ↑ c):
∴ ↑ <· ↑
For any operator θ: θ <· id and id ·> θ.
Relations with the end marker and parentheses: $ <· id, id ·> $, id ·> ), ( <· id, $ <· (, ) ·> $, $ <· +, $ <· ∗.
E → E + E | E ∗ E | id
Solution
Operator-Precedence Relations:
      id    +     *     $
id          ·>    ·>    ·>
+     <·    ·>    <·    ·>
*     <·    ·>    ·>    ·>
$     <·    <·    <·
An operator like minus can be unary or binary, so such an operator can have two different precedences depending on its use.
Bottom-up parsing starts from the leaf nodes of a tree and works in upward
direction till it reaches the root node. Here, we start from a sentence and then apply
production rules in reverse manner in order to reach the start symbol. The image
given below depicts the bottom-up parsers available.
Shift-Reduce Parsing
Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are
known as shift-step and reduce-step.
Shift step: The shift step refers to the advancement of the input pointer to the
next input symbol, which is called the shifted symbol. This symbol is pushed onto
the stack. The shifted symbol is treated as a single node of the parse tree.
Reduce step: When the parser finds a complete grammar rule (RHS) and replaces it
with its (LHS), it is known as a reduce-step. This occurs when the top of the stack contains
a handle. To reduce, a POP function is performed on the stack, which pops off the
handle and replaces it with the LHS non-terminal symbol.
Constructing a parse tree for an input string beginning at the leaves and going
towards the root is called bottom-up parsing.
Example:
Consider the grammar:
S → aABe
A → Abc | b
B → d
The sentence to be recognized is abbcde.
The reduction steps are:
abbcde → aAbcde (A → b)
aAbcde → aAde (A → Abc)
aAde → aABe (B → d)
aABe → S (S → aABe)
Handles:
Example:
E → E+E
E→ E*E
E → (E)
E → id
The rightmost derivation of the string id1 + id2 * id3 is:
E → E+E
→ E+E*E
→ E+E*id3
→ E+id2*id3
→ id1+id2*id3
Handle pruning: The string id1 + id2 * id3 is reduced to the start symbol E as follows:
Stack         Input             Action
$             id1+id2*id3 $     shift
$ id1         +id2*id3 $        reduce by E → id
$ E           +id2*id3 $        shift
$ E+          id2*id3 $         shift
$ E+id2       *id3 $            reduce by E → id
$ E+E         *id3 $            shift
$ E+E*        id3 $             shift
$ E+E*id3     $                 reduce by E → id
$ E+E*E       $                 reduce by E → E*E
$ E+E         $                 reduce by E → E+E
$ E           $                 accept
Shift-reduce conflict:
A shift-reduce conflict occurs when the parser cannot decide whether to shift the next input symbol or to reduce the handle on top of the stack.
1. Reduce-reduce conflict:
Consider the grammar:
M → R+R | R+c | R
R → c
and the input c+c. The parser proceeds as follows and then cannot decide which of two reductions to apply; this is a reduce-reduce conflict.
Stack      Input     Action
$          c+c $     Shift
$ c        +c $      Reduce by R → c
$ R        +c $      Shift
$ R+       c $       Shift
$ R+c      $         Reduce by R → c (towards M → R+R) or reduce towards M → R+c
Viable prefixes:
➢ α is a viable prefix of the grammar if there is w such that αw is a right sentential form.
➢ The set of prefixes of right sentential forms that can appear on the stack of a shift-reduce parser are called viable prefixes.
➢ The set of viable prefixes is a regular language.
LR Parser
There are three widely used algorithms available for constructing an LR parser:
SLR(1) – Simple LR Parser:
o Works on the smallest class of grammars
o Few states, hence a small table
o Simple and fast construction
LR(1) – LR Parser:
o Works on the complete set of LR(1) grammars
o Generates a large table and a large number of states
o Slow construction
LALR(1) – Lookahead LR Parser:
o Works on an intermediate size of grammars
o Number of states is the same as in SLR(1)
LR Parsing Algorithm
token = next_token()
repeat forever
  s = top of stack
  if action[s, token] = "shift si" then
    PUSH token
    PUSH si
    token = next_token()
  else if action[s, token] = "reduce A ::= β" then
    POP 2 * |β| symbols
    s = top of stack
    PUSH A
    PUSH goto[s, A]
  else if action[s, token] = "accept" then
    return
  else
    error()
LL vs. LR
LL: Starts with the root nonterminal on the stack. LR: Ends with the root nonterminal on the stack.
LL: Uses the stack for designating what is still to be expected. LR: Uses the stack for designating what is already seen.
LL: Builds the parse tree top-down. LR: Builds the parse tree bottom-up.
LL: Reads the terminals when it pops one off the stack. LR: Reads the terminals while it pushes them on the stack.
LL: Pre-order traversal of the parse tree. LR: Post-order traversal of the parse tree.
LALR(1) Parsers
Let's create an SLR(1) parse table for the following augmented grammar, G, from
page 255 of the Dragon Book.
0. S′ → S
1. S → L = R
2. S → R
3. L → * R
4. L → id
5. R → L
State 0:
S′ → ⋅ S
S→⋅L=R
S→⋅R
L→⋅*R
L → ⋅ id
R→⋅L
State 1:
S′ → S ⋅
State 2:
S→L⋅=R
R→L⋅
State 3:
S→R⋅
State 4:
L→*⋅R
R→⋅L
L→⋅*R
L → ⋅ id
State 5:
L → id ⋅
State 6:
S→L=⋅R
R→⋅L
L→⋅*R
L → ⋅ id
State 7:
L→*R⋅
State 8:
R→L⋅
State 9:
S→L=R⋅
FOLLOW(S) = {$}
FOLLOW(L) = {$, =}
FOLLOW(R) = {$, =}
Look at state 2. LR(0) item S → L ⋅ = R calls for a shift on lookahead =. But LR(0)
item R → L ⋅ calls for a reduce on lookahead =, since FOLLOW(L) contains =. That is
a parsing conflict.
So the shift reduce conflict in state 2 can be resolved by choosing the shift action
without affecting any parses.
Imagine you have an SLR(1) grammar G1. In one part of G1 you have the production
R → B
and productions that affect FOLLOW(B) appear in an entirely different section of the grammar. Call the new grammar G2.
The problem is that FOLLOW sets are global, taking information from the entire
grammar.
LALR(1) parsers
A LALR(1) parser uses the same LR(0) finite-state machine that an SLR(1) parser
uses. But the LALR algorithm is more sensitive, and can remove spurious conflicts
like the one above, by using a more local notion of FOLLOW sets.
Ambiguous grammars provide a shorter and more natural specification for certain constructs, such as expressions.
One can isolate a construct for optimization purposes using ambiguous grammars.
No ambiguous grammar can be LR. But by carefully resolving conflicts, an LR
parser can still be built for certain ambiguous grammars.
Resolving conflicts can be done by either one of the two following methods.
Method 1:
E → E + E | E * E | ( E ) | id
We know that '+' and '*' are left associative and '*' has precedence over '+'.
The augmented grammar is:
E' → E
E → E + E
E → E * E
E → ( E )
E → id
and the LR(0) items are:
I0: E'→.E, E→.E+E, E→.E*E, E→.(E), E→.id
I1: E'→E., E→E.+E, E→E.*E
I2: E→(.E), E→.E+E, E→.E*E, E→.(E), E→.id
I3: E→id.
I4: E→E+.E, E→.E+E, E→.E*E, E→.(E), E→.id
I5: E→E*.E, E→.E+E, E→.E*E, E→.(E), E→.id
I6: E→(E.), E→E.+E, E→E.*E
I7: E→E+E., E→E.+E, E→E.*E
I8: E→E*E., E→E.+E, E→E.*E
I9: E→(E).
Method 1:
While designing the SLR parser table using its algorithm, conflicts arise as the grammar is ambiguous.
Let the input be id+id*id$. After consuming input up to '*', the stack content will be
0 E 1 + 4 E 7
Now on '*' a shift must happen, as '*' has precedence over '+'. So Action[7, *] = shift.
Let the input be id+id+id$. After consuming input up to the second '+', the stack content will be
0 E 1 + 4 E 7
Now on '+' a reduction must happen, as '+' is left associative. So Action[7, +] = reduce E → E + E.
Let the input be id*id+id$. After consuming input up to '+', the stack content is
0 E 1 * 5 E 8
Now on '+' a reduction must happen, as '*' has precedence over '+'. So Action[8, +] = reduce E → E * E.
Let the input be id*id*id$. After consuming input up to the second '*', the stack content will be
0 E 1 * 5 E 8
Now on '*' a reduction must happen, as '*' is left associative. So Action[8, *] = reduce E → E * E.
After resolving the conflicts, the SLR parsing table is as shown in the table.
To construct the parser table we must convert our NFA into a DFA. The states in
the LR table will be the ε-closures of the states corresponding to the items.
So the process of creating the LR state table parallels the process of constructing
an equivalent DFA from a machine with ε-transitions. Been there, done that: this is
essentially the subset construction algorithm, so we are in familiar territory here.
We need two operations: closure() and goto().
closure()
If A → α.Bβ is in closure(I), and B → γ is a production, then add the initial item [B → .γ] to I, if it is not already there. Apply this rule until no more new items can be added to closure(I).
From our grammar above, if I is the set of one item {[E' → .E]}, then closure(I) contains:
I0: E' → .E
E → .E + T
E → .T
T → .T * F
T → .F
F → .(E)
F → .id
goto()
goto(I, X) is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I.
SETS-OF-ITEMS-CONSTRUCTION
Method:
Construct C = {I0, I1 , ..., In} the collection of sets of LR(0) items for G'.
The initial state of the parser is the one constructed from the set of
items containing [S' -> .S].
An Example
(1) E -> E * B
(2) E -> E + B
(3) E -> B
(4) B -> 0
(5) B -> 1
The Action and Goto Table The two LR(0) parsing tables for this grammar look as
follows:
LALR PARSER:
We begin with two observations. First, some of the states generated for LR(1)
parsing have the same set of core (or first) components and differ only in their
second component, the lookahead symbol.
Our intuition is that we should be able to merge these states and reduce the
number of states we have, getting close to the number of states that would be
generated for LR(0) parsing.
This observation suggests a hybrid approach: We can construct the canonical LR(1)
sets of items and then look for sets of items having the same core. We merge
these sets with common cores into one set of items.
The merging of states with common cores can never produce a shift/reduce
conflict that was not present in one of the original states because shift actions
depend only on the core, not the lookahead.
Our second observation is that we are really only interested in the lookahead symbol
in places where there is a problem.
So our next thought is to take the LR(0) set of items and add lookaheads only
where they are needed. This leads to a more efficient, but much more complicated
method.
Input: G’
Output: LALR parsing table functions with action and goto for G'. Method:
1. Construct C = {I0, I1 , ..., In} the collection of sets of LR(1) items for G'.
2. For each core present among the set of LR(1) items, find all sets having that
core and replace these sets by the union.
3. Let C' = {J0, J1 , ..., Jm} be the resulting sets of LR(1) items. The parsing
actions for state i are constructed from Ji in the same manner as in the
construction of the canonical LR parsing table.
4. If there is a conflict, the grammar is not LALR(1) and the algorithm fails.
5. The goto table is constructed as follows: If J is the union of one or more sets of
LR(1) items, that is, J = I0U I1 U ... U Ik, then the cores of goto(I0, X), goto(I1, X),
..., goto(Ik, X) are the same, since I0, I1 , ..., Ik all have the same core. Let K be
the union of all sets of items having the same core as goto(I1, X); then goto(J, X) = K.
For example, for the grammar S → CC, C → cC | d, merging the LR(1) states with common cores gives the LALR parsing table:
state    c      d      $         S    C
0        S36    S47              1    2
1                      Accept
2        S36    S47                   5
36       S36    S47                   89
47       R3     R3     R3
5                      R1
89       R2     R2     R2
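As an illustration, the table above drives the input cdd$ as follows (a worked trace added here, not part of the original slide; R1 = S → CC, R2 = C → cC, R3 = C → d):

Stack            Input    Action
0                cdd$     shift 36
0 c 36           dd$      shift 47
0 c 36 d 47      d$       reduce by R3 (C → d), goto(36, C) = 89
0 c 36 C 89      d$       reduce by R2 (C → cC), goto(0, C) = 2
0 C 2            d$       shift 47
0 C 2 d 47       $        reduce by R3 (C → d), goto(2, C) = 5
0 C 2 C 5        $        reduce by R1 (S → CC), goto(0, S) = 1
0 S 1            $        Accept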
HANDLING ERRORS
The LALR parser may continue to do reductions after the LR parser would have
spotted an error, but the LALR parser will never do a shift after the point the LR
parser would have discovered the error and will eventually find the error.
LR ERROR RECOVERY
An LR parser will detect an error when it consults the parsing action table and find
a blank or error entry. Errors are never detected by consulting the goto table. An
LR parser will detect an error as soon as there is no valid continuation for the
portion of the input thus far scanned. A canonical LR parser will not make even a
single reduction before announcing the error. SLR and LALR parsers may make
several reductions before detecting an error, but they will never shift an erroneous
input symbol onto the stack.
Steps:
1. create augmented grammar
2. generate kernel items
3. find closure
4. compute goto()
5. construct parsing table
6. parse the string
Consider the grammar:
S -> a
S -> (L)
L -> S
L -> L, S
The augmented production is S' -> .S. Taking its closure gives the initial state:
I0:
S' -> .S
S -> .a
S -> .(L)
ACTION and GOTO table:
State   a     (     )       ,     $         S   L
0       S2    S3                            1
1                                 accept
2                   R1      R1    R1
3       S2    S3                            5   4
4                   S6      S7
5                   R3      R3
6                   R2      R2    R2
7       S2    S3                            8
8                   R4      R4
R0: S' -> S .     (I1)    Follow(S') = { $ }
R1: S -> a .      (I2)    Follow(S) = { $, ), , }
R2: S -> ( L ) .  (I6)    Follow(S) = { $, ), , }
R3: L -> S .      (I5)    Follow(L) = { ), , }
R4: L -> L , S .  (I8)    Follow(L) = { ), , }
Stack          Input     Action
$0             (a,a)$    S3
$0(3           a,a)$     S2
$0(3a2         ,a)$      R1
$0(3S5         ,a)$      R3
$0(3L4         ,a)$      S7
$0(3L4,7       a)$       S2
$0(3L4,7a2     )$        R1
$0(3L4,7S8     )$        R4
$0(3L4         )$        S6
$0(3L4)6       $         R2
$0S1           $         accept
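The trace above can be mechanized. Below is a C sketch of a table-driven LR driver for this grammar; the integer encoding of the action/goto table (positive = shift/goto target + 1, negative = reduce by production n, 99 = accept) is an assumption of this sketch, not a standard:

#include <stdio.h>
#include <string.h>

/* LR driver sketch for S -> a | (L), L -> S | L,S using the table above.
   Input characters are assumed to be drawn from SYM. */
static const char SYM[] = "a(),$SL";     /* columns: a ( ) , $ S L */
static int col(char c) { return (int)(strchr(SYM, c) - SYM); }

/* positive = shift/goto (state+1), negative = reduce by production -n, 99 = accept */
static const int tbl[9][7] = {
 /*          a   (   )   ,   $    S   L */
 /* 0 */ {   3,  4,  0,  0,  0,   2,  0 },
 /* 1 */ {   0,  0,  0,  0, 99,   0,  0 },
 /* 2 */ {   0,  0, -1, -1, -1,   0,  0 },
 /* 3 */ {   3,  4,  0,  0,  0,   6,  5 },
 /* 4 */ {   0,  0,  7,  8,  0,   0,  0 },
 /* 5 */ {   0,  0, -3, -3,  0,   0,  0 },
 /* 6 */ {   0,  0, -2, -2, -2,   0,  0 },
 /* 7 */ {   3,  4,  0,  0,  0,   9,  0 },
 /* 8 */ {   0,  0, -4, -4,  0,   0,  0 },
};
static const char plhs[] = { 0, 'S', 'S', 'L', 'L' };  /* R1:S->a R2:S->(L) R3:L->S R4:L->L,S */
static const int  plen[] = { 0,  1,   3,   1,   3  };

int main(void)
{
    const char *w = "(a,a)$";
    int stack[64], top = 0, ip = 0;
    stack[0] = 0;                        /* initial state */
    for (;;) {
        int act = tbl[stack[top]][col(w[ip])];
        if (act == 99) { printf("accepted\n"); return 0; }
        if (act > 0) {                   /* shift */
            stack[++top] = act - 1; ip++;
        } else if (act < 0) {            /* reduce by production -act */
            int n = -act;
            top -= plen[n];              /* pop |rhs| states */
            stack[top + 1] = tbl[stack[top]][col(plhs[n])] - 1;  /* push goto */
            top++;
            printf("reduce by R%d\n", n);
        } else { printf("error\n"); return 1; }
    }
}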
Steps:
1. create augmented grammar
2. generate kernel items
3. find closure
4. compute goto()
5. construct parsing table
6. parse the string
S->L=R
S->R
L->*R
L->id
R->L
I0:
S' -> .S, $
S -> .L=R
S -> .R
L -> .*R
L -> .id
R -> .L
Next find the 2nd component (the lookahead), comparing each production with A -> α.Bβ, a:
S' -> .S, $      here there is no β, so $ is the 2nd component of S
S -> .L=R, $     here β is '=R', so FIRST(=R) = {=} joins $ as a 2nd component of L
S -> .R, $       here there is no β, so $ is the 2nd component of R
R -> .L, $       here there is no β, so $ is the 2nd component of L
So I0 with lookaheads is:
I0:
S' -> .S, $
S -> .L=R, $
S -> .R, $
L -> .*R, =/$
L -> .id, =/$
R -> .L, $
Rule to find the 2nd component:
Consider a production of the form A -> α.Bβ, a. Then the 2nd component of B is:
β, if β is a terminal;
FIRST(β), if β is a non-terminal;
a, if there is no β.
We notice that some states in CLR parser have the same core items and differ only
in possible lookahead symbols.
Such as
I4 and I4’
I5 and I5’
I7 and I7’
I8 and I8’
So we shrink the obtained CLR parser by merging such states to form LALR Parser
Hence
CLR PARSER has 14 states (I0, I1, I2, I3, I4, I4', I5, I5', I6, I7, I7', I8, I8', I9)
LALR PARSER or LALR(1) has 10 states (I0, I1, I2, I3, I4, I5, I6, I7, I8, I9)
A YACC specification consists of three parts:
1. Definitions
2. Rules
3. Auxiliary routines
Definition:
The definition part includes information about the tokens used in the syntax
definition.
%token NUMBER
%token ID
Yacc automatically assigns numbers for tokens, but it can be overridden by
%token NUMBER 621
Yacc also recognizes single characters as tokens. Therefore, assigned token
numbers should not overlap ASCII codes.
The definition part can include C code external to the definition of the parser and
variable declarations, within %{ and %} in the first column.
It can also include the specification of the starting symbol in the grammar:
%start nonterminal
Rule Part:
The rules part contains grammar definition in a modified BNF form.
Actions is C code in { } and can be embedded inside (Translation schemes).
Auxiliary Routines Part:
The auxiliary routines part is only C code.
It includes function definitions for every function needed in rules part.
It can also contain the main() function definition if the parser is going to be run as a
program.
The main() function must call the function yyparse().
Steps for Compiling a YACC Program:
Write the lex program in a file file.l and the yacc program in a file file.y.
Open a terminal and navigate to the directory where you have saved the files.
type lex file.l
type yacc -d file.y
type cc lex.yy.c y.tab.c -ll
type ./a.out
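For reference, here is a minimal lex/yacc pair in the style described above (a sketch of an illustrative calculator, not the course's prescribed program):

/* ---- calc.y (yacc input) ---- */
%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token NUMBER
%left '+'
%left '*'
%%
line : expr '\n'      { printf("= %d\n", $1); }
     ;
expr : expr '+' expr  { $$ = $1 + $3; }
     | expr '*' expr  { $$ = $1 * $3; }
     | '(' expr ')'   { $$ = $2; }
     | NUMBER         { $$ = $1; }
     ;
%%
int main(void) { return yyparse(); }

/* ---- calc.l (lex input) ---- */
%{
#include <stdlib.h>
#include "y.tab.h"
%}
%%
[0-9]+   { yylval = atoi(yytext); return NUMBER; }
[ \t]    ;
\n|.     { return yytext[0]; }
%%

Build and run: lex calc.l; yacc -d calc.y; cc lex.yy.c y.tab.c -ll; ./a.out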
main(argc, argv)
int argc;
char *argv[];
{
progname=argv[0];
yyparse();
}
yylex()
{
int c;
while((c=getchar())==' '||c=='\t')
;
if(c==EOF)
return 0;
if(c=='.'||isdigit(c))
{
ungetc(c,stdin);
scanf("%lf",&yylval);
return NUMBER;
}
if(c=='\n')
lineno++;
return(c);
}
bison textFile.y - a y.tab.c file containing C code for a parser is created.
gcc -c y.tab.c - compile parser code normally using gcc C compiler.
gcc -o parse y.tab.o lex.yy.o -ll -ly - we link the parser, scanner and libraries.
./parse - we execute the parser that reads from standard input.
The bison input file format is similar to the yacc file format therefore we shall not repeat
it here.
Following the example calculator we defined in the previous section the makefile in
this case would resemble the one below:
calc: lex.yy.o y.tab.o
	gcc -o calc lex.yy.o y.tab.o -ly -ll
lex.yy.c: calc.l y.tab.c
	flex calc.l
y.tab.c: calc.y
	bison -vdty calc.y
SOLVED PROBLEMS:
Example 1
Construct a Predictive Parsing table for the following grammar & also check whether
string id + id * id is accepted or not.
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Solution −
Step1− Elimination of Left Recursion & perform Left Factoring
As there is no left recursion in Grammar so, we will proceed as it is. Also, there is no
need for Left Factoring.
Step 2 − Computation of FIRST
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E′) = {+, ε}
FIRST(T′) = {*, ε}
Step 3 − Computation of FOLLOW
FOLLOW(E) = FOLLOW(E′) = {), $}
FOLLOW(T) = FOLLOW(T′) = {+, ), $}
FOLLOW(F) = {+, *, ), $}
Step 4 − Now, fill the table by applying rules from the construction of the Predictive Parsing Table.
E → TE′
Comparing E → TE′ with A → α:
E → TE′
A → α
∴ A = E, α = TE′
∴ FIRST(α) = FIRST(TE′) = FIRST(T) = {(, id}
Applying Rule (1) of Predictive Parsing Table
∴ ADD E → TE′ to M[E, ( ] and M[E, id]
∴ write E → TE′in front of Row (E)and Columns {(, id} (1)
E′ → +TE′|𝜀
Comparing it with A → α:
E′ → +TE′
A → α
∴ A = E′, α = +TE′
∴ FIRST(α) = FIRST(+TE′) = {+}
∴ ADD E′ → +TE′ to M[E′, +]
∴ write production E′ → +TE′ in front of Row (E′) and Column (+) (2)
E′ → ε
Comparing it with A → α:
E′ → ε
A → α
∴α=ε
∴ FIRST(α) = {ε}
∴ Applying Rule (2)of the Predictive Parsing Table.
Find FOLLOW (E′) = { ), $}
∴ ADD Production E′ → ε to M[E′, )] and M[E′, $]
∴ write E′ → ε in front of Row (E′)and Column {$,
)} (3)
T → FT′
Comparing it with A → α
T→ FT′
A→ α
∴ A = T, α = FT′
∴ FIRST(α) = FIRST (FT′) = FIRST (F) = {(, id}
∴ ADD Production T → FT′ to M[T, (] and M[T, id]
∴ write T → FT′ in front of Row (T)and Column {(, id}
(4)
T′ →*FT′
Comparing it with A → α
T′ → *FT′
A → α
∴ A = T′, α = *FT′
∴ FIRST(α) = FIRST(*FT′) = {*}
∴ ADD Production T′ → *FT′ to M[T′, *]
∴ write T′ → *FT′ in front of Row (T′) and Column {*} (5)
T′ → ε
Comparing it with A → α
T′ → ε
A→ α
∴ A = T′ α = ε
∴ FIRST(α) = FIRST {𝜀} = {ε}
∴ Applying Rule (2)of the Predictive Parsing Table. Find
FOLLOW (A) = FOLLOW (T′) = {+, ), $}
∴ ADD T′ → ε to M[T′, +], M[T′, )] and M[T′, $]
∴ write T′ → ε in front of Row (T′)and Column {+, ), $} (6)
F →(E)
Comparing it with A → α:
F → (E)
A → α
∴ A = F, α = (E)
∴ FIRST(α) = FIRST((E)) = {(}
∴ ADD F → (E) to M[F, (]
∴ write F → (E) in front of Row (F)and Column (( ) (7)
F → id
Comparing it with A → α:
F → id
A → α
∴ FIRST(α) = FIRST (id) = {id}
∴ ADD F → id to M[F, id]
∴ write F → id in front of Row (F)and Column (id) (8)
Combining all statements (1) to (8) will generate the following Predictive or LL (1)
Parsing Table −
        id          +            *            (           )          $
E       E → TE′                               E → TE′
E′                  E′ → +TE′                             E′ → ε     E′ → ε
T       T → FT′                               T → FT′
T′                  T′ → ε       T′ → *FT′                T′ → ε     T′ → ε
F       F → id                                F → (E)
Step5− Checking Acceptance of String id + id * id using Predictive Parsing Program
Initially, the stack will contain the starting symbol E and $ at the bottom of the stack.
Input Buffer will contain a string attached with $ at the right end.
If the top of stack = Current Input Symbol, then symbol from the top of the stack will
be popped, and also input pointer advances or reads next symbol.
The sequence of moves by the Predictive Parser:
Stack        Input              Action
$E           id + id * id $     E → TE′
$E′T         id + id * id $     T → FT′
$E′T′F       id + id * id $     F → id
$E′T′id      id + id * id $     Remove id
$E′T′        + id * id $        T′ → ε
$E′          + id * id $        E′ → +TE′
$E′T+        + id * id $        Remove +
$E′T         id * id $          T → FT′
$E′T′F       id * id $          F → id
$E′T′id      id * id $          Remove id
$E′T′        * id $             T′ → *FT′
$E′T′F*      * id $             Remove *
$E′T′F       id $               F → id
$E′T′id      id $               Remove id
$E′T′        $                  T′ → ε
$E′          $                  E′ → ε
$            $                  Accept
EXAMPLE 2:
Consider the ambiguous grammar:
E → E + E
E → E * E
E → (E)
E → id
(a) Construct the LR(0) items for the above grammar. (b) Construct the SLR parsing table for the grammar. (c) Parse the input string id + id * id.
Solution
Applying goto on I9
∵ goto cannot be applied on I9, as the dot in E → (E). is on the last position.
Step 3 − Computation of FOLLOW
E → E + E
Comparing E → E + E with A → αBβ:
∴ α = ε, B = E, β = +E
∵ FIRST(β) = FIRST(+E) = {+}
∴ By Rule (2) of FOLLOW, {+} ⊆ FOLLOW(E). (2)
Comparing E → E + E with A → αB:
∴ A = E, α = E+, B = E
∴ By Rule (3), FOLLOW(E) ⊆ FOLLOW(E). (3)
E → E ∗ E
Applying Rule (2) and Rule (3) of FOLLOW, we get
{∗} ⊆ FOLLOW(E) and FOLLOW(E) ⊆ FOLLOW(E). (4)
E → (E)
Applying Rule (2), {)} ⊆ FOLLOW(E). (5)
E → id
Combining, FOLLOW(E) = {+, ∗, ), $}.
Step 4 − Construction of SLR Parsing Table
Parsing the string id + id * id:
Stack                     Input            Action
0                         id + id * id $   Shift
0 id 3                    + id * id $      Reduce by E → id
0 E 1                     + id * id $      Shift
0 E 1 + 4                 id * id $        Shift
0 E 1 + 4 id 3            * id $           Reduce by E → id
0 E 1 + 4 E 7             * id $           Shift
0 E 1 + 4 E 7 * 5         id $             Shift
0 E 1 + 4 E 7 * 5 id 3    $                Reduce by E → id
0 E 1 + 4 E 7 * 5 E 8     $                Reduce by E → E * E
0 E 1 + 4 E 7             $                Reduce by E → E + E
0 E 1                     $                Accept
The above parsing solves the conflicting problem in Action [7, *]. So,
Action [7, *] = s5 instead of r1.
Similarly, on parsing the string id + id + id:
Stack                     Input            Action
0                         id + id + id $   Shift
0 id 3                    + id + id $      Reduce by E → id
0 E 1                     + id + id $      Shift
0 E 1 + 4                 id + id $        Shift
0 E 1 + 4 id 3            + id $           Reduce by E → id
0 E 1 + 4 E 7             + id $           Reduce by E → E + E
0 E 1                     + id $           Shift
0 E 1 + 4                 id $             Shift
0 E 1 + 4 id 3            $                Reduce by E → id
0 E 1 + 4 E 7             $                Reduce by E → E + E
0 E 1                     $                Accept
The above parsing shows how the shift-reduce conflict at Action[7, +] is resolved: Action[7, +] = r1 instead of s4.
Similarly, other entries such as Action [8, +] and Action [8, *] can be solved by
taking strings.
id * id * id and id * id + id
9. ASSIGNMENT : UNIT – II
10. Part A – Questions & Answers
S.No    Questions    K Level    CO Level
1. Differentiate Top Down Parser And Bottom Up
Parser? Give Example for each.
Top-down parsers construct the parse tree from the root to the leaves in pre-order for the given input string. Example: Predictive Parser, Recursive Descent Parser.
Bottom-up parsers construct the parse tree from the leaves to the root for the given input string. Example: Shift-Reduce Parser, LR Parser.
K1 CO2
3. Define Handles.
A handle of a string is a substring that matches the right side of a production. Reducing the handle helps in constructing the parse tree or the rightmost derivation in reverse.
K1 CO2
4. Define ambiguous grammar with an example, and specify its demerits.
If a grammar produces more than one parse tree for the given input string, then it is called an ambiguous grammar. Its demerit is that it is difficult to select or determine which parse tree is suitable for an input string.
Ex: E → E+E | E*E | id
K1 CO2
5. Mention the properties of parse tree.
The root is labeled by the start symbol.
Each leaf is labeled by a token or by ε.
Each interior node is labeled by a nonterminal.
8. How will you define a context free grammar?
A context free grammar consists of terminals, non-terminals,
a start symbol, and productions.
i.Terminals are the basic symbols from which strings are
formed. “Token” is a synonym for terminal. Ex: if, then,
K1 CO2
else.
ii.Nonterminals are syntactic variables that denote sets of
strings, which help define the language generated by the
grammar. Ex: stmt, expr.
10. Draw the syntax tree for the expression a = b * -c + b * -c.
K2 CO2
13. What are the various conflicts that occur during shift
reduce parsing?
K1 CO2
Reduce/Reduce conflict
Shift/ Reduce conflict
14. What is dangling reference?
A dangling reference occurs when there is a reference to
storage that has been deallocated. It is a logical error to use
dangling references, since the value of deallocated storage is
K1 CO2
undefined according to the semantics of most languages.
18. Mention the role of semantic analysis.
It is used to check the type information of the syntactically verified statements. K1 CO2
19. What are the general types of parsers for grammars?
1) Universal parsing
2) Top-down
3) Bottom-up
K1 CO2
20. Write short notes on YACC.
YACC is an automatic tool for generating the parser program. YACC stands for Yet Another Compiler Compiler, which is basically a utility available from UNIX. Basically, YACC is an LALR parser generator. K1 CO2
22. What are the different strategies that a parser can
employ to recover from a syntactic error?
• Panic mode
• Phrase level
• Error productions K1 CO2
• Global correction
25. Write the algorithm for FIRST and FOLLOW in parser.
FIRST(α) is the set of terminals that begin strings derived from α.
Rules for FIRST (to compute FIRST(X), where X is a grammar symbol):
If X is a terminal, then FIRST(X) = {X}.
If X → ε is a production, then add ε to FIRST(X).
If X is a non-terminal and X → Y1 Y2 ... Yk is a production, then add FIRST(Y1) to FIRST(X); if Y1 derives ε, then also add FIRST(Y2) to FIRST(X), and so on.
Rules for FOLLOW:
If $ is the input end-marker, and S is the start symbol, $ ∈ FOLLOW(S).
If there is a production A → αBβ, then (FIRST(β) – ε) ⊆ FOLLOW(B).
If there is a production A → αB, or a production A → αBβ where ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(B).
26. Differentiate sentence and sentential form.
Sentence: If S ⇒* w, then the string w is called a sentence of G. A sentence is a string of terminals; it is a sentential form with no nonterminals.
Sentential form: If S ⇒* α, then α is a sentential form of G. A sentential form may contain nonterminals.
K1 CO2
27. Define a context free grammar.
A context free grammar G is a collection of the following:
V is a set of non-terminals
T is a set of terminals
S is a start symbol
P is a set of production rules
G can be represented as G = (V, T, S, P).
Production rules are given in the form: Non-terminal → (V ∪ T)*
K1 CO2
28. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more K1 CO2
than one parse tree for some sentence of language L(G).
30. Define Terminals.
Terminals are the basic symbols from which strings are formed. The productions (P) of a grammar specify the manner in which the terminals and non-terminals can be combined to form strings. K1 CO2
34. What are the different strategies that a parser can employ to
recover from a syntactic error?
There are four common error-recovery strategies that can be
implemented in the parser to deal with errors in the code.
Panic mode
Statement mode
Error productions
Global correction
K2 CO2
35. Define context free language. When will you say that two K1 CO2
CFGs are equal?
A context-free grammar is a set of recursive rules used
to generate patterns of strings. A context-free grammar
can describe all regular languages and more, but they
cannot describe all possible languages. Context-free
grammars are studied in fields of theoretical computer
science, compiler design, and linguistics.
Two grammars G1 and G2 are defined to be equivalent if
they generate the same language. The
grammar equivalence problem is the problem of deciding
whether two context-free grammars are equivalent.
11. Part B – Questions
S.No    Questions    K Level    CO Level
1. Explain error-recovery strategies in detail. K2 CO2
5. Construct an SLR parsing table for the given grammar.
K3 CO2
G: E -> E+T | T T -> TF | F F -> F* | a | b
9. Construct Parsing table for the grammar and find moves
made by predictive parser on input id + id * id and find
FIRST and FOLLOW.
E -> E + T
E -> T CO2
T -> T * F K3
T -> F
F -> (E)
F-> id
F-> id
Construct a LALR parsing table for the grammar given
above. Verify whether the input string id + id * id is
accepted by the grammar or not.
Consider the grammar given below.
14.
E -> E + T
E -> T
T -> T * F
T -> F CO2
F -> (E) K3
F-> id
Construct an LR parsing table for the above grammar.
Give the moves of LR parser on id*id+id.
21. Consider the grammar given below.
S -> CC
C -> aC CO2
C -> d
Construct a CLR parsing table for the above grammar. K3
22. i) Construct parse tree for the input string w = cad using
top-down parser.
S -> cAd
A -> ab | a
K3
ii)List all LR(0) items for the following grammar.
S->AS|b
A->SA|a
23. Construct the LALR parsing table for the grammar.
E->E+T/T
K3 CO2
T->TF/F
F->F*/a/b
24. Show that the following grammar
Construct a predictive parser for the grammar CO2
T’ -> !F|(E)|1|0
S -> iE tS | iEtSeS | a
K3
E -> b.
S -> (L)/a
L -> L, S|S.
29. S -> AaAb | BbBa
A -> ε
B -> ε
K3
11. Part C – Questions
1. Consider the CFG depicted below where “begin”, “end” and “x” are all
terminal symbols of the grammar and stat is considered the starting symbol
for this grammar. Productions are numbered in parenthesis and you can
abbreviate “begin” to “b” and “end” to “e” respectively.
B->d
And parse the statement “bdc” and “dd”.
Show that the following grammar:
S -> Aa | bAc | dc | bda
A -> a
12. Supportive online Certification courses
NPTEL : https://nptel.ac.in/courses/106105190
Swayam : https://www.classcentral.com/course/swayam-compiler-design-12926
coursera : https://www.coursera.org/learn/nand2tetris2
Udemy : https://www.udemy.com/course/introduction-to-compiler-construction-and-design/
Mooc : https://www.mooc-list.com/course/compilers-coursera
Edx : https://www.edx.org/course/compilers
13. REAL TIME APPLICATIONS : UNIT – II
· Coreference resolution. It is about understanding references to multiple
entities existing in the text and disambiguating that reference. For example the
difference between a mouse (the animal) and a mouse (PC input device).
·Argument mining which is the automatic extraction and identification of
argumentative structures in a text piece typically an article of a journal.
14. CONTENTS BEYOND SYLLABUS : UNIT – II
15. ASSESSMENT SCHEDULE
S.NO    Name of the Assessment    Start Date    End Date    Portion
• TEXT BOOKS:
• REFERENCE BOOKS:
1. Randy Allen, Ken Kennedy, “Optimizing Compilers for Modern Architectures: A
Dependence-based Approach”, Morgan Kaufmann Publishers, 2002.
2. Steven S. Muchnick, “Advanced Compiler Design and Implementation”, Morgan
Kaufmann Publishers - Elsevier Science, India, Indian Reprint, 2003.
3. Keith D Cooper and Linda Torczon, “Engineering a Compiler”, Morgan
Kaufmann Publishers, Elsevier Science, 2004.
4. V. Raghavan, “Principles of Compiler Design”, Tata McGraw Hill Education
Publishers, 2010.
5. Allen I. Holub, “Compiler Design in C”, Prentice-Hall Software Series, 1993.
17. MINI PROJECT SUGGESTION
• Objective:
This module facilitate hands-on skills of the students (from the practical courses
more effectively) and they can try the following mini projects for deep
understanding in Compiler Design
Planning:
• This method is mostly used to improve the ability of students in application domain
and also to reinforce knowledge imparted during the lecture.
• Being a technical institute, this method is extensively used to provide empirical
evidence of theory learnt.
• Students are asked to prepare mini projects involving application of the
concepts, principles or laws learnt.
• The faculty guides the students at various stages of developing the project
and gives timely inputs for the development of the model.
Projects:
Set Number (Category)    Mini Project Title
Set - 1 (Toppers)    Design, develop and implement a YACC/C program to demonstrate the Shift Reduce Parsing technique for the grammar rules: E → E+T | T, T → T*F | F, F → (E) | id and parse the sentence: id + id * id (CO2, K3)
Set - 4 (Below Average)    Generate SLR Parse Table from CFG Grammar (CO2, K3)