Please read this disclaimer before
proceeding:
This document is confidential and intended solely for the educational
purpose of RMK Group of Educational Institutions. If you have
received this document through email in error, please notify the
system manager. This document contains proprietary information and
is intended only for the respective group / learning community as
intended. If you are not the addressee you should not disseminate,
distribute or copy it through e-mail. Please notify the sender
immediately by e-mail if you have received this document by mistake
and delete this document from your system. If you are not the
intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
information is strictly prohibited.

22CS601
Compiler Design (Lab Integrated)

Department: CSE

Batch / Year: 2022 - 2026 / III Year CSE

Created by

Dr.P. Ezhumalai Professor/CSE/RMDEC


Dr. Selvi S, Professor / CSE / RMKEC
Dr.A.K.Jaithunbi ASP/CSE/RMDEC
Dr. Pranamita Nanda ASP/CSE/RMKCET
(Course Coordinator)
Mr. Prabhu V, Associate Professor / CSE / RMKEC
Mr.D.Jayakumar Asst. Professor/CSE/RMDEC

Date: 18.12.2024

CONTENTS

S. No    Contents

1        Course Objectives

2        Pre Requisites

3        Syllabus

4        Course Outcomes

5        CO - PO/PSO Mapping

6        Lecture Plan

7        Activity Based Learning

8        Lecture Notes

9        Assignments

10       Part A Questions & Answers

11       Part B Questions

12       Supportive Online Certification Courses

13       Real Time Applications

14       Contents Beyond the Syllabus

15       Assessment Schedule

16       Prescribed Text Books & Reference Books

17       Mini Project Suggestions

1. COURSE OBJECTIVES

• To study the different phases of compiler

• To understand the techniques for tokenization and parsing

• To understand the conversion of source program into an intermediate representation

• To learn the different techniques used for assembly code generation

• To analyze various code optimization techniques

2. PRE REQUISITES

Semester VI: 22CS601 Compiler Design (Lab Integrated)

Semester V: 22CS502 Theory of Computation (Lab Integrated)

Semester III: 22MA301 Discrete Mathematics

Semester II: 22CS201 Data Structures (Lab Integrated); 22IT202 DBMS (Lab Integrated)

Semester I: 22CS101 Problem Solving using C++ (Lab Integrated)

3. SYLLABUS

22CS601 COMPILER DESIGN (LAB INTEGRATED) L T P C


3 0 2 4

OBJECTIVES

• To study the different phases of compiler


• To understand the techniques for tokenization and parsing
• To understand the conversion of source program into an intermediate
representation
• To learn the different techniques used for assembly code generation
• To analyze various code optimization techniques

UNIT I INTRODUCTION TO COMPILERS 9 + 6 = 15


Introduction–Structure of a Compiler–Role of the Lexical Analyzer - Input Buffering -
Specification of Tokens - Recognition of Tokens–The Lexical Analyzer Generator LEX- Finite
Automata - Regular Expressions to NFA-Optimization of DFA based pattern matches -
Conversion from NFA to DFA - Minimization of DFA.
UNIT II SYNTAX ANALYSIS 9 + 6 = 15
Role of the Parser - Context-free grammars – Derivation Trees – Ambiguity in Grammars and
Languages- Writing a grammar – Types of parsing - Top-Down Parsing – Predictive parser or
LL(1) Parser –Bottom-Up Parsing – Shift Reduce Parser - LR Parsers - SLR, CLR, LALR Parser -
Parser Generator YACC.
UNIT III INTERMEDIATE CODE GENERATION 9 + 6 = 15
Syntax Directed Definitions - Evaluation Orders for Syntax Directed Definitions – Application of
Syntax Directed Translation - Intermediate Languages - Syntax Tree – Three Address Code –
Implementation of Three address code – Declarations - Translation of Expressions - Type
Checking
UNIT IV RUN-TIME ENVIRONMENT AND CODE GENERATION 9 + 6 = 15
Run Time Environment: Storage Organization-Storage allocation strategies - Access to nonlocal
data on stack – Heap management - Parameter Passing - Issues in the design of Code
Generator – Design of simple Code Generator –Register allocation and assignment.
UNIT V CODE OPTIMIZATION 9 + 6 = 15

Principal sources of optimization – Peephole Optimization – DAG construction - Basic blocks
and flow graph - Optimization in Basic blocks – Data flow analysis.


LIST OF EXPERIMENTS:
1. Develop a lexical analyzer to recognize a few patterns in C. (Ex.
identifiers, constants, comments, operators etc.). Create a symbol table,
while recognizing identifiers.
2. Design a lexical analyzer for the given language. The lexical analyzer
should ignore redundant spaces, tabs and new lines, comments etc.

3. Implement a Lexical Analyzer using Lex Tool


4. Design Predictive Parser for the given language
5. Implement an Arithmetic Calculator using LEX and YACC
6. Generate three address code for a simple program using LEX and YACC.
7. Implement simple code optimization techniques (Constant folding,
Strength reduction and Algebraic transformation)
8. Implement back-end of the compiler for which the three address code is
given as input and the 8086 assembly language code is produced as
output.

4. COURSE OUTCOMES

Upon completion of the course, the students will be able to:

CO1: Explain the different phases of compiler (K2)

CO2: Describe the fundamental components of a compiler (K3)

CO3: Design and implement a lexical analyzer using finite automata and regular expressions (K3)

CO4: Compare various parsing techniques (K3)

CO5: Implement code optimization techniques with simple code generators (K3)

CO6: Develop code generation strategies that translate intermediate representations into target machine code (K3)

• HKL = Highest Knowledge Level

5. CO - PO / PSO MAPPING

PROGRAM OUTCOMES AND PSO

CO    HKL   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12   PSO1  PSO2  PSO3
(PO levels:  K3   K4   K5   K5  K3,K4,K5  A3   A2   A3   A3    A3    A2)

CO1   K2    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

CO2   K3    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

CO3   K3    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

CO4   K3    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

CO5   K3    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

CO6   K3    3    2    1    -    -    -    -    1    1    1     -     1      2     -     -

• Correlation Level:
• 1. Slight (Low)
• 2. Moderate (Medium)
3. Substantial (High)
If there is no correlation, put “-“.

6. LECTURE PLAN : UNIT – II

SYNTAX ANALYSIS

UNIT – II SYNTAX ANALYSIS


(For every session: CO2; Mode of Delivery MD1 & MD5; Delivery Resource T1; the proposed and actual lecture dates coincide.)

S.No  Topic                                                   Date         Highest Cognitive Level   LU Outcomes

1     Role of the Parser, Context-free grammars               03.01.2025   K2   Define parser role

2     Derivation Trees, Ambiguity in Grammars and Languages   04.01.2025   K2   Operations on CFG and ambiguity

3     Writing a grammar                                       06.01.2025   K2   Constructions of a grammar

4     Types of Parsing - Top-Down Parsing                     07.01.2025   K2   Explains concept of types of parsers - top-down parsing

5     Predictive parser or LL(1) Parser                       08.01.2025   K3   Predictive parser or LL(1) Parser

6     Bottom-Up Parsing - Shift Reduce Parser - SLR           10.01.2025   K3   Explains bottom-up parsing and construction of SLR parser

7     LR Parsers – SLR                                        11.01.2025   K3   Define LR parser and role of SLR parser

8     LR Parsers – CLR - LALR                                 20.01.2025   K3   Explains construction of CLR & LALR parsers

9     Parser Generator YACC                                   22.01.2025   K3   Use of YACC tool to design syntax analyzer

• ASSESSMENT COMPONENTS                MODE OF DELIVERY

• AC 1. Unit Test                      MD 1. Oral presentation
• AC 2. Assignment                     MD 2. Tutorial
• AC 3. Course Seminar                 MD 3. Seminar
• AC 4. Course Quiz                    MD 4. Hands On
• AC 5. Case Study                     MD 5. Videos
• AC 6. Record Work                    MD 6. Field Visit
• AC 7. Lab / Mini Project
• AC 8. Lab Model Exam
• AC 9. Project Review

7. ACTIVITY BASED LEARNING : UNIT – II

Group Discussion on Different Parsing Methods

1. Use JFLAP to demonstrate the construction of the LL(1) parsing table for the grammar

S -> aABb
A -> aAc | λ

2. Use the above grammar to demonstrate the construction of the SLR parsing table and parse a string.

3. Compute FIRST() and FOLLOW() for the grammar below and display the result in JFLAP.

E -> E+T | T
T -> T*F | F
F -> (E) | id

8. LECTURE NOTES : UNIT – II

SYNTAX ANALYSIS

ROLE OF THE PARSER


The parser obtains a string of tokens from the lexical analyzer and verifies that it can be
generated by the grammar for the source language.
Parsing is the second phase of the compiler. The parser should report any syntax
errors in an intelligible fashion. To get a token, the parser requests the lexical
analyzer using the method getnexttoken(); the lexical analyzer then returns the
next token to the parser.
The two types of parsers employed are:

1. Top-down parser: builds parse trees from the top (root) to the bottom (leaves).
2. Bottom-up parser: builds parse trees from the leaves and works up to the root.

There are three general types of parsers for grammars: universal, top-down, and
bottom-up. Universal parsing methods such as the Cocke-Younger-Kasami
algorithm and Earley's algorithm can parse any grammar. These general methods
are, however, too inefficient to use in production compilers.
The most efficient top-down and bottom-up methods work only for subclasses of
grammars, but several of these classes, particularly, LL and LR grammars, are
expressive enough to describe most of the syntactic constructs in modern
programming languages.

Syntax Error Handling

Common programming errors can occur at many different levels.

• Lexical errors include misspellings of identifiers, keywords, or operators — e.g.,
the use of an identifier elipseSize instead of ellipseSize — and missing quotes
around text intended as a string.

• Syntactic errors include misplaced semicolons or extra or missing braces; that is,
"{" or "}". As another example, in C or Java, the appearance of a case statement
without an enclosing switch is a syntactic error (however, this situation is usually
allowed by the parser and caught later in the processing, as the compiler attempts
to generate code).

• Semantic errors include type mismatches between operators and operands. An
example is a return statement in a Java method with result type void.

• Logical errors can be anything from incorrect reasoning on the part of the
programmer to the use in a C program of the assignment operator = instead of the
comparison operator ==. The program containing = may be well formed; however,
it may not reflect the programmer's intent.

Error-Recovery Strategies

1. Panic Mode Recovery

In this method, successive symbols from the input are removed one at a time
until a designated set of synchronizing tokens is found. Synchronizing tokens are
delimiters such as ; or }.

The advantage is that it is easy to implement and guarantees not to go into an
infinite loop. The disadvantage is that a considerable amount of input is skipped
without checking it for additional errors.
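
A minimal sketch of this strategy in C (the next_token() interface and the choice of ';' and '}' as synchronizing tokens are illustrative assumptions):

#include <stdio.h>

/* Stand-in for the lexical analyzer interface (assumed for illustration). */
static int next_token(void) { return getchar(); }

/* A token is synchronizing if it is a delimiter such as ';' or '}'. */
static int is_sync(int tok) { return tok == ';' || tok == '}' || tok == EOF; }

/* On a syntax error: discard input tokens one at a time until a
   synchronizing token appears, then let parsing resume from there. */
static int panic_recover(void) {
    int tok;
    do { tok = next_token(); } while (!is_sync(tok));
    return tok;
}

int main(void) {
    printf("recovered at token '%c'\n", panic_recover());
    return 0;
}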

2. Phrase-Level Recovery :

• In this method, when a parser encounters an error, it performs the necessary


correction on the remaining input so that the rest of the input statement allows
the parser to parse ahead.

• The correction can be deletion of extra semicolons, replacing the comma with
semicolons, or inserting a missing semicolon.

• While performing correction, utmost care should be taken for not going in an
infinite loop.

• A disadvantage is that it is difficult to handle situations where the actual
error occurred before the point of detection.

3. Error production

• If a user has knowledge of common errors that can be encountered then, these
errors can be incorporated by augmenting the grammar with error productions
that generate erroneous constructs.

• If this is used then, during parsing appropriate error messages can be generated
and parsing can be continued.

• The disadvantage is that it’s difficult to maintain.

4. Global Correction

• The parser examines the whole program and tries to find out the closest match
for it which is error-free.

• The closest match program requires the fewest insertions, deletions, and
changes of tokens to recover from the erroneous input.



Context-Free Grammar (CFG)


CFG stands for context-free grammar. It is a formal grammar which is used to
generate all possible strings in a given formal language. A context-free
grammar G can be defined by four tuples as:

G = (V, T, P, S)

Where,

G is the grammar, which consists of a set of the production rule. It is used to generate
the string of a language.

T is a finite set of terminal symbols. Terminals are denoted by lower-case letters.

V is a finite set of non-terminal symbols. Non-terminals are denoted by capital letters.

P is a set of production rules, which are used for replacing non-terminal symbols (on the
left side of a production) in a string with other terminal or non-terminal symbols (on
the right side of the production).

S is the start symbol which is used to derive the string. We can derive the string by
repeatedly replacing a non-terminal by the right-hand side of a production until all
non-terminals have been replaced by terminal symbols.

Example 1:Construct the CFG for the language having any number of a's over the set
∑= {a}.

Solution: As we know the regular expression for the above language is


1. r.e. = a*

Production rule for the Regular expression is as follows:


1. S → aS rule 1
2. S→ε rule 2

Now if we want to derive a string "aaaaaa", we can start with start symbols.


S
⇒ aS
⇒ aaS rule 1
⇒ aaaS rule 1
⇒ aaaaS rule 1
⇒ aaaaaS rule 1
⇒ aaaaaaS rule 1
⇒ aaaaaaε rule 2
⇒ aaaaaa

The r.e. = a* can generate a set of string {ε, a, aa, aaa,.....}. We can have a null
string because S is a start symbol and rule 2 gives S → ε.

Example 2:Construct a CFG for the regular expression (0+1)*

Solution: The CFG can be given by,


1. Production rule (P):
2. S → 0S | 1S
3. S → ε

The rules are in the combination of 0's and 1's with the start symbol. Since (0+1)*
indicates {ε, 0, 1, 01, 10, 00, 11, ....}. In this set, ε is a string, so in the rule, we can
set the rule S → ε.

Example 3: Construct a CFG for the language L = { wcwR | w ∈ (a, b)* }.

Solution: The string that can be generated for a given language is {aacaa, bcb,
abcba, bacab, abbcbba, ....}

The grammar could be:


S → aSa rule 1
S → bSb rule 2
S→c rule 3


Now if we want to derive a string "abbcbba", we can start with start symbols.
S ⇒aSa
⇒abSba from rule 2
⇒ abbSbba from rule 2
⇒abbcbba from rule 3

Thus any of this kind of string can be derived from the given production rules.
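
The three productions map one-to-one onto a recursive recognizer; a minimal C sketch (illustrative, with the test string hard-coded):

#include <stdio.h>

static const char *s;   /* current position in the input string */

/* Returns 1 if a sentence derivable from S is read, mirroring the rules:
   S → aSa (rule 1), S → bSb (rule 2), S → c (rule 3). */
static int S(void) {
    if (*s == 'a') { s++; if (!S()) return 0; return *s++ == 'a'; } /* rule 1 */
    if (*s == 'b') { s++; if (!S()) return 0; return *s++ == 'b'; } /* rule 2 */
    if (*s == 'c') { s++; return 1; }                               /* rule 3 */
    return 0;
}

int main(void) {
    s = "abbcbba";
    printf("%s\n", S() && *s == '\0' ? "derivable" : "not derivable");
    return 0;
}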

Example 4: Construct a CFG for the language L = { a^n b^2n | n >= 1 }.

Solution: The string that can be generated for a given language is {abb, aabbbb,
aaabbbbbb....}.

The grammar could be:


1. S → aSbb | abb

Now if we want to derive a string "aabbbb", we can start with start symbols.
S ⇒aSbb
⇒ aabbbb

Example 5: Language of palindromes

Solution: We can easily show using the pumping lemma that the language
L = { w wR | w ∈ (0+1)* } is not regular.

However, we can describe this language by the following context-free grammar over
the alphabet {0,1}:
· P → ε
· P → 0
· P → 1
· P → 0P0
· P → 1P1


Example 6 :
Give context-free grammars that generate the following languages.

S.No   Language                                                       Corresponding grammar productions

(i)    L = { w ∈ {0, 1}* | w contains at least three 1s }             S → X1X1X1X
                                                                      X → 0X | 1X | ε

(ii)   L = { w ∈ {0, 1}* | w = wR and |w| is even }                   S → 0S0 | 1S1 | ε

(iii)  L = { w ∈ {0, 1}* | the length of w is odd and                 S → 0S0 | 0S1 | 1S0 | 1S1 | 0
       the middle symbol is 0 }

(iv)   L = { a^i b^j c^k | i, j, k ≥ 0, and i = j or i = k }          S → XY | W
                                                                      X → aXb | ε
                                                                      Y → cY | ε
                                                                      W → aWc | Z
                                                                      Z → bZ | ε

(v)    L = { a^i b^j c^k | i, j, k ≥ 0 and i + j = k }                S → aSc | X
                                                                      X → bXc | ε

(vi)   L(G) = { a^n b^m | 0 ≤ n ≤ m ≤ 2n }                            S → aSb | aSbb | ε

(vii)  L = { 0^i 1^j 2^k | i ≠ j or j ≠ k }                           S → AC | BC | DE | DF
                                                                      A → 0 | 0A | 0A1
                                                                      B → 1 | B1 | 0B1
                                                                      C → 2 | 2C
                                                                      D → 0 | 0D
                                                                      E → 1 | 1E | 1E2
                                                                      F → 2 | F2 | 1F2


Recursive Inference – Bottom up process


· The process of coming up with strings that satisfy individual productions and then
concatenating them together according to more general rules is called Recursive
Inference.

Derivation Tree – Top Down


· Similar to recursive inference, but top-down instead of bottom-up
· Expand start symbol first and derive way down in such a manner that it matches the
input string

· For the given CFG example,


E→I // Expression is an identifier
E→E+E // Add two expressions
E→E*E // Multiply two expressions
E→ (E) // Add parenthesis
I→ L // Identifier is a Letter
I→ ID // Identifier + Digit
I→ IL // Identifier + Letter
D → 0|1|2|3|4|5|6|7|8|9 // Digits
L → a|b|c|…|A|B|…|Z // Letters

Given a*(a+b1) we can derive this by:
E ⇒ E*E ⇒ I*E ⇒ L*E ⇒ a*E ⇒ a*(E) ⇒ a*(E+E) ⇒ a*(I+E)
⇒ a*(L+E) ⇒ a*(a+E) ⇒ a*(a+I) ⇒ a*(a+ID) ⇒ a*(a+LD)
⇒ a*(a+bD) ⇒ a*(a+b1)

· Note the use of the symbol ⇒ in deriving a*(a+b1).

Leftmost Derivation :
In the previous example we used a derivation called a leftmost derivation. We can
specifically denote a leftmost derivation using the subscript “lm”, like:
⇒lm or ⇒*lm
A leftmost derivation is simply one in which we replace the leftmost variable in a production
body by one of its production bodies first, and then work our way from left to right.

Rightmost Derivation :
Not surprisingly, we also have a rightmost derivation which we can specifically denote via:
⇒rm or ⇒*rm
A rightmost derivation is one in which we replace the rightmost variable by one of its
production bodies first , and then work our way from right to left.


Derivation:
Derivation is a sequence of production rules. It is used to get the input string through these
production rules. During parsing, we have to take two decisions. These are as follows:

● We have to decide the non-terminal which is to be replaced.


● We have to decide the production rule by which the non-terminal will be
replaced.

We have two options to decide which non-terminal is to be replaced with a production rule.

1. Leftmost Derivation:

In the leftmost derivation, the input is scanned and replaced with the production rule from left
to right. So in leftmost derivation, we read the input string from left to right.

Example:

Production rules:

1. E -> E + E
2. E -> E - E
3. E ->a | b

Input:

a-b+a

The leftmost derivation is:

E => E + E

=> E - E + E
=> a - E + E
=> a - b + E
=> a - b + a


2. Rightmost Derivation:

In rightmost derivation, the input is scanned and replaced with the production rule from right
to left. So in rightmost derivation, we read the input string from right to left.

Example

Production rules:

1. E ->E + E
2. E ->E - E
3. E -> a | b

Input : a - b + a

The rightmost derivation is:

E => E - E
=> E - E + E
=> E - E + a
=> E - b + a
=> a - b + a

When we use the leftmost derivation or rightmost derivation, we may get the same string. The
choice of derivation does not affect the string obtained.

Examples of Derivation:

Example 1:Derive the string "abb" for leftmost derivation and rightmost derivation using a
CFG given by,

1. S → AB | ε
2. A → aB
3. B → Sb


Solution:

Leftmost derivation:

S => AB

=> aBB      (A → aB)

=> aSbB     (B → Sb)

=> abB      (S → ε)

=> abSb     (B → Sb)

=> abb      (S → ε)

Rightmost derivation:

S => AB

=> ASb      (B → Sb)

=> Ab       (S → ε)

=> aBb      (A → aB)

=> aSbb     (B → Sb)

=> abb      (S → ε)


Example 2:

Derive the string "aabbabba" for leftmost derivation and rightmost derivation using a CFG given
by,

1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB

Solution:

Leftmost derivation:

S
=> aB S → aB
=>aaBB B → aBB
=> aabB B→b
=> aabbS B → bS
=> aabbaB S → aB
=> aabbabS B → bS
=> aabbabbA S → bA
=>aabbabba A→a

Rightmost derivation:

S
=> aB S → aB

=> aaBB B → aBB

=> aaBbS B → bS

=> aaBbbA S → bA
=> aaBbba A→a
=> aabSbba B → bS

=> aabbAbba S → bA

=> aabbabba A→a


Derivation Tree

Derivation tree is a graphical representation for the derivation of the given production rules for
a given CFG. It is the simple way to show how the derivation can be done to obtain some
string from a given set of production rules. The derivation tree is also called a parse tree.

A parse tree follows the precedence of operators. The deepest sub-tree is traversed first,
so the operator in a parent node has lower precedence than the operator in its sub-tree.

A parse tree contains the following properties:

1. The root node is always labelled with the start symbol.

2. The derivation is read from left to right.
3. Leaf nodes are always terminal symbols.
4. Interior nodes are always non-terminal symbols.

Example 1:

Production rules:

1. E = E + E
2. E = E * E
3. E = a | b | c

Input

1. a * b + c

Steps 1 to 5 (parse tree figures): the tree is grown top-down from E, one production at a
time, following E ⇒ E + E ⇒ E * E + E ⇒ a * E + E ⇒ a * b + E ⇒ a * b + c; the final
parse tree has + at the root, the subtree for a * b as its left child, and c as its right child.

Note: We can draw a derivation tree step by step or directly in one step.

Example 2:

Draw a derivation tree for the string "bbabb" from the CFG given by

1. S → bSb | a | b

Solution:

Now, the derivation tree for the string "bbabb", built from the derivation S ⇒ bSb ⇒ bbSbb ⇒ bbabb, is as follows:


The above tree is a derivation tree drawn for deriving the string bbabb. By simply
reading the leaf nodes from left to right, we can obtain the desired string.

Example 3:

Construct a derivation tree for the string aabbabba for the CFG given by,

1. S → aB | bA
2. A → a | aS | bAA
3. B → b | bS | aBB

Solution:

To draw a tree, we will first try to obtain derivation for the string aabbabba


=> aB        (S → aB)

=> aaBB      (B → aBB)

=> aabSB     (B → bS)

=> aabbAB    (S → bA)

=> aabbaB    (A → a)

=> aabbabS   (B → bS)

=> aabbabbA  (S → bA)

=> aabbabba  (A → a)

Now, the derivation tree is as follows:

Example 6: Construct a parse tree for the string w = abaabb:
S → SS | aSb | ε
Solution: One derivation is S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abaaSbb ⇒ abaabb; the parse tree mirrors this derivation.

Ambiguity in Grammar :
A grammar is said to be ambiguous if there exists more than one leftmost
derivation or more than one rightmost derivation or more than one parse tree for
the given input string. If the grammar is not ambiguous, then it is called
unambiguous.

If the grammar has ambiguity, then it is not good for compiler construction. No
method can automatically detect and remove the ambiguity, but we can remove
ambiguity by re-writing the whole grammar without ambiguity.

Example 1:

Let us consider a grammar G with the production rule

1. E → I
2. E → E + E
3. E → E * E
4. E → (E)
5. I → 0 | 1 | 2 | ... | 9


Solution:

For the string "3 * 2 + 5", the above grammar can generate two parse trees by
leftmost derivation:

Since there are two parse trees for a single string "3 * 2 + 5", the grammar G is
ambiguous.

Example 2:

Check whether the given grammar G is ambiguous or not.

1. E → E + E
2. E → E - E
3. E → id

Solution:

From the above grammar String "id + id - id" can be derived in 2 ways:


First Leftmost derivation


E => E + E
=> id + E
=>id + E - E
=> id + id - E
=> id + id- id

Second Leftmost derivation

E => E - E
=> E + E - E
=> id + E - E
=> id + id - E
=> id + id - id

Since there are two leftmost derivation for a single string "id + id - id", the grammar G
is ambiguous.

Example 3:

Check whether the given grammar G is ambiguous or not.

1. S → aSb | SS
2. S → ε

Solution:

For the string "aabb" the above grammar can generate two parse trees


Since there are two parse trees for a single string "aabb", the grammar G is
ambiguous.

Example 4:

Check whether the given grammar G is ambiguous or not.

1. A → AA
2. A → (A)
3. A → a

Solution:

For the string "a(a)aa" the above grammar can generate two parse trees:

Since there are two parse trees for a single string "a(a)aa", the grammar G is
ambiguous.


WRITING A GRAMMAR

A grammar consists of a number of productions. Each production has an abstract


symbol called a nonterminal as its left-hand side, and a sequence of one or more
nonterminal and terminal symbols as its right-hand side. For each grammar, the
terminal symbols are drawn from a specified alphabet.

Starting from a sentence consisting of a single distinguished nonterminal, called the


goal symbol, a given context-free grammar specifies a language, namely, the set of
possible sequences of terminal symbols that can result from repeatedly replacing
any nonterminal in the sequence with a right-hand side of a production for which
the nonterminal is the left-hand side.

Example, Grammar G: stmt → if expr then stmt | if expr then stmt else stmt | other

TOP-DOWN PARSING

A program that performs syntax analysis is called a parser. A syntax analyzer takes
tokens as input and outputs an error message if the program syntax is wrong. The
parser uses a symbol look-ahead table and an approach called top-down parsing
without backtracking.

Top-down parsers check to see if a string can be generated by a grammar by


creating a parse tree starting from the initial symbol and working down.

Bottom-up parsers, however, check to see if a string can be generated from a
grammar by creating a parse tree from the leaves and working up. Early parser
generators such as YACC create bottom-up parsers, whereas many Java parser
generators such as JavaCC create top-down parsers.


RECURSIVE DESCENT PARSING


• Typically, top-down parsers are implemented as a set of recursive functions that
descend through a parse tree for a string.
• This approach is known as recursive descent parsing, also known as LL(k) parsing
where the first L stands for left-to-right, the second L stands for leftmost-
derivation, and k indicates k-symbol look-ahead.
• Therefore, a parser using the single symbol look-ahead method and top-down
parsing without backtracking is called LL(1) parser.
• In the following sections, we will also use an extended BNF notation in which
some regular expression operators are incorporated.
• Recursive descent is a top-down parsing technique that constructs the parse tree
from the top and the input is read from left to right.
• It uses procedures for every terminal and non-terminal entity. This parsing
technique recursively parses the input to make a parse tree, which may or may
not require back-tracking.
• But the grammar associated with it (if not left factored) cannot avoid back-
tracking. A form of recursive-descent parsing that does not require any back-
tracking is known as predictive parsing.
• This parsing technique is regarded recursive as it uses context-free grammar
which is recursive in nature.

A syntax expression of the form α | β defines sentences of the form α or of the
form β. A syntax of the form αβγ defines sentences that consist of a sentence of
the form α followed by a sentence of the form β followed by a sentence of the
form γ. A syntax of the form [α] defines zero or one occurrence of the form α.
A syntax of the form {α} defines zero or more occurrences of the form α.

A usual implementation of an LL(1) parser is: initialize its data structures, get the
lookahead token by calling scanner routines, and call the routine that implements
the start symbol.

Here is an example.

proc syntaxAnalysis()
begin
    initialize();   // initialize global data and structures
    nextToken();    // get the lookahead token
    program();      // parser routine that implements the start symbol
end;

Back-tracking
Top- down parsers start from the root node (start symbol) and match the input
string against the production rules to replace them (if matched). To understand
this, take the following example of CFG:
S → rXd | rZd

X → oa | ea
Z → ai

Left Factoring
Left factoring is a transformation by which a grammar with common prefixes is
rewritten to make it suitable for top-down parsers.
S → iEtS / iEtSeS / a
E → b
The left factored grammar is:
S → iEtSS’ / a
S’ → eS / ∈
E → b
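
The point of the transformation shows up directly in a recursive-descent parser: with the factored rules, one lookahead symbol selects the production. A C sketch (illustrative assumptions: single-character tokens i, t, e, a, b stand for if, then, else, a, b):

#include <stdio.h>
#include <stdlib.h>

static const char *in_;
static char look;

static void adv(void)     { look = *in_ ? *in_++ : '$'; }
static void err(void)     { puts("rejected"); exit(1); }
static void match(char c) { if (look == c) adv(); else err(); }

static void S(void), Sp(void), E(void);

static void S(void) {                       /* S  -> i E t S S' | a         */
    if (look == 'i') { match('i'); E(); match('t'); S(); Sp(); }
    else if (look == 'a') match('a');
    else err();
}
static void Sp(void) {                      /* S' -> e S | ε                */
    if (look == 'e') { match('e'); S(); }   /* 'else' binds to nearest 'if' */
}
static void E(void) { match('b'); }         /* E  -> b                      */

int main(void) {
    in_ = "ibtibtaea";    /* if b then (if b then a else a) */
    adv(); S();
    puts(look == '$' ? "accepted" : "rejected");
    return 0;
}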

Left Recursion:
A grammar is left-recursive if and only if there exists a nonterminal symbol that can derive a sentential form
with itself as the leftmost symbol.
A→A𝛼/𝛽
Elimination of Left Recursion
A→A𝛼/𝛽
Introduce a new nonterminal A' and rewrite the rule as
𝐴 → 𝛽𝐴′

𝐴′ → 𝛼𝐴′/∈

Eliminate left recursion for the following grammar:

1) E →E +T / T

Solution:

E→ 𝑇𝐸′ 𝐸′→ +𝑇 𝐸′ / ∈

2) T →T*F / F

Solution:

T→ FT′ T′→ *F T′ / ∈

FIRST AND FOLLOW

To compute FIRST(X) for all grammar symbols X, apply the following rules until no more terminals or ε can be
added to any FIRST set.

1. If X is a terminal, then FIRST(X) is {X}.

2. If X → ε is a production, then add ε to FIRST(X).

3. If X is a non-terminal and X → Y1Y2...Yk is a production, then place a in FIRST(X) if for some i, a is in
FIRST(Yi) and ε is in all of FIRST(Y1), ..., FIRST(Yi-1); that is, Y1...Yi-1 ⇒* ε. If ε is in FIRST(Yj) for all
j = 1, 2, ..., k, then add ε to FIRST(X). For example, everything in FIRST(Y1) is surely in FIRST(X). If Y1 does
not derive ε, then we add nothing more to FIRST(X), but if Y1 ⇒* ε, then we add FIRST(Y2), and so on.

To compute FOLLOW(A) for all non-terminals A, apply the following rules until nothing can be added to any
FOLLOW set.
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β) except ε is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε, then everything in
FOLLOW(A) is in FOLLOW(B).

PROBLEM:

Consider the following example to understand the concept of FIRST and FOLLOW. Find the FIRST and FOLLOW
of all non-terminals in the grammar:

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

Then:


FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E′) = {+, ε}
FIRST(T′) = {*, ε}
FOLLOW(E) = FOLLOW(E′) = {), $}
FOLLOW(T) = FOLLOW(T′) = {+, ), $}
FOLLOW(F) = {+, *, ), $}

For example, id and left parenthesis are added to FIRST(F) by rule 3 in the definition of FIRST with i = 1
in each case, since FIRST(id) = {id} and FIRST('(') = {(} by rule 1. Then by rule 3 with i = 1, the
production T → FT′ implies that id and left parenthesis belong to FIRST(T) also.
To compute FOLLOW, we put $ in FOLLOW(E) by rule 1 for FOLLOW. By rule 2 applied to production
F → (E), right parenthesis is also in FOLLOW(E). By rule 3 applied to production E → TE′, $ and right
parenthesis are in FOLLOW(E′).
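
The FIRST rules can be computed mechanically as a fixpoint: keep applying them until no set changes. A C sketch for this grammar (illustrative encoding: e and t stand for E′ and T′, i for id; FOLLOW can be computed by an analogous loop):

#include <stdio.h>
#include <string.h>

/* Productions "A=rhs"; an empty rhs is an ε-production. */
static const char *prods[] = { "E=Te", "e=+Te", "e=", "T=Ft", "t=*Ft", "t=", "F=(E)", "F=i" };
#define NP 8
static const char *NT = "EeTtF";     /* the nonterminals */

static int  nullable[5];
static char first[5][16];            /* FIRST sets, stored as strings of terminals */

static int nt_index(char c) { const char *p = strchr(NT, c); return p ? (int)(p - NT) : -1; }
static int add(char *set, char a) {  /* add terminal a; return 1 if it was new */
    if (strchr(set, a)) return 0;
    size_t n = strlen(set); set[n] = a; set[n + 1] = '\0'; return 1;
}

int main(void) {
    int changed = 1;
    while (changed) {                /* iterate until nothing can be added */
        changed = 0;
        for (int p = 0; p < NP; p++) {
            int A = nt_index(prods[p][0]);
            const char *rhs = prods[p] + 2;
            int allnull = 1;
            for (int i = 0; rhs[i] && allnull; i++) {
                int B = nt_index(rhs[i]);
                if (B < 0) {         /* terminal: it enters FIRST(A); stop here */
                    changed |= add(first[A], rhs[i]);
                    allnull = 0;
                } else {             /* nonterminal: copy FIRST(B); continue only if B is nullable */
                    for (const char *q = first[B]; *q; q++)
                        changed |= add(first[A], *q);
                    allnull = nullable[B];
                }
            }
            if (allnull && !nullable[A]) nullable[A] = changed = 1;
        }
    }
    for (int k = 0; k < 5; k++)
        printf("FIRST(%c) = { %s%s }\n", NT[k], first[k], nullable[k] ? " ε" : "");
    return 0;
}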

Example 2:
S → aBDh
B → cC
C → bC / ∈
D → EF
E → g / ∈
F → f / ∈

Solution: FIRST Functions

First(S) = { a }
First(B) = { c }
First(C) = { b, ∈ }
First(D) = { First(E) – ∈ } ∪ First(F) = { g, f, ∈ }
First(E) = { g, ∈ }
First(F) = { f, ∈ }

FOLLOW Functions
Follow(S) = { $ }
Follow(B) = { First(D) – ∈ } ∪ Follow(D) = { g, f, h }
Follow(C) = Follow(B) = { g, f, h }
Follow(D) = First(h) = { h }
Follow(E) = { First(F) – ∈ } ∪ Follow(D) = { f, h }
Follow(F) = Follow(D) = { h }

PREDICTIVE PARSER :
A predictive parser is a recursive descent parser with no backtracking or backup. It
is a top-down parser that does not require backtracking. At each step, the choice of
the rule to be expanded is made upon the next terminal symbol.

Pseudocode for a predictive parser is:
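
A minimal C sketch of such a parser for the expression grammar E → TE′, E′ → +TE′ | ε, T → FT′, T′ → *FT′ | ε, F → (E) | id (the single-character token i standing for id is an assumption of this sketch):

#include <stdio.h>
#include <stdlib.h>

static const char *in_;   /* remaining input, e.g. "i+i*i" for id+id*id */
static char look;         /* the one-symbol lookahead */

static void err(void)     { puts("rejected"); exit(1); }
static void adv(void)     { look = *in_ ? *in_++ : '$'; }
static void match(char t) { if (look == t) adv(); else err(); }

static void E(void); static void Ep(void);
static void T(void); static void Tp(void); static void F(void);

static void E(void)  { T(); Ep(); }                                  /* E  -> T E'     */
static void Ep(void) { if (look == '+') { match('+'); T(); Ep(); } } /* E' -> +TE' | ε */
static void T(void)  { F(); Tp(); }                                  /* T  -> F T'     */
static void Tp(void) { if (look == '*') { match('*'); F(); Tp(); } } /* T' -> *FT' | ε */
static void F(void) {                                                /* F  -> (E) | id */
    if (look == '(') { match('('); E(); match(')'); }
    else if (look == 'i') match('i');
    else err();
}
/* In Ep and Tp, any other lookahead silently takes the ε-production;
   a genuine error then surfaces at the next match(). */

int main(void) {
    in_ = "i+i*i";        /* id + id * id */
    adv(); E();
    puts(look == '$' ? "accepted" : "rejected");
    return 0;
}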

Steps for Designing a Predictive Parser:


1.Eliminate the Left Recursion
2.Eliminate the Left Factoring
3. Compute FIRST and FOLLOW
4. Generate the Predictive Parsing table
5. Check the given input is accepted or not


CONSTRUCTION OF PREDICTIVE PARSING TABLES

For any grammar G, the following algorithm can be used to construct the predictive parsing
table.
Input: Grammar G. Output: Parsing table M. Method:
1. For each production A → α of the grammar, do steps 2 and 3.
2. For each terminal a in FIRST(α), add A → α to M[A, a].
3. If ε is in FIRST(α), add A → α to M[A, b] for each terminal b in FOLLOW(A). If
ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
4. Make each undefined entry of M be error.

LL(1) GRAMMAR

The above algorithm can be applied to any grammar G to produce a parsing table M. For
some Grammars, for example if G is left recursive or ambiguous, then M will have at least
one multiply-defined entry. A grammar whose parsing table has no multiply defined
entries is said to be LL(1). It can be shown that the above algorithm can be used to
produce for every LL(1) grammar G a parsing table M that parses all and only the
sentences of G. LL(1) grammars have several distinctive properties. No ambiguous or left
recursive grammar can be LL(1).

There remains a question of what should be done in case of multiply defined entries. One
easy solution is to eliminate all left recursion and left factoring, hoping to produce a
grammar which will produce no multiply defined entries in the parse tables. Unfortunately,
there are some grammars which will not give an LL(1) grammar after any kind of alteration. In
general, there are no universal rules to convert multiply defined entries into single valued
entries without affecting the language recognized by the parser.

The main difficulty in using predictive parsing is in writing a grammar for the source
language such that a predictive parser can be constructed from the grammar. Although left
recursion elimination and left factoring are easy to do, they make the resulting grammar
hard to read and difficult to use for translation purposes. To alleviate some of this
difficulty, a common organization for a parser in a compiler is to use a predictive parser for
control constructs and to use operator precedence for expressions. However, if an LR
parser generator is available, one can get all the benefits of predictive parsing and operator
precedence automatically.


Problem based on Predictive parser:

Construct a Predictive Parsing table for the following grammar and also check whether the string id + id * id is accepted or not.

E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id

Solution:

The steps to follow to construct the predictive parser are:

Step 1: Eliminate Left recursion


Step 2: Eliminate Left factoring
Step 3: Computation of FIRST() and FOLLOW()
Step 4: Construction of Predictive parsing table
Step 5: Parsing the input using the table

Step 1 − Elimination of Left Recursion

As there is no left recursion in the grammar, we proceed with it as it is.

Step 2 − Elimination of Left Factoring

As there is no left factoring in the grammar, no change is needed.
Step 3− Computation of FIRST()
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST (E′) = {+, ε}
FIRST (T′) = {*, ε}
Computation of FOLLOW()
FOLLOW (E) = FOLLOW(E′) = {), $}
FOLLOW (T) = FOLLOW(T′) = {+, ), $}
FOLLOW (F) = {+,*,),$}
Step4− Construction of Predictive Parsing Table
Create the table, i.e., write all non-terminals row-wise & all terminal column-wise.

Now, fill the table by applying rules from the construction of the Predictive Parsing
Table.

 E → TE′

Comparing E → TE′ with A → α:
∴ A = E, α = TE′
∴ FIRST(α) = FIRST(TE′) = FIRST(T) = {(, id}
Applying Rule (1) of the Predictive Parsing Table:
∴ ADD E → TE′ to M[E, (] and M[E, id]
∴ write E → TE′ in front of Row (E) and Columns {(, id} (1)

 E′ → +TE′|𝜀
Comparing E′ → +TE′ with A → α:
∴ A = E′, α = +TE′
∴ FIRST(α) = FIRST(+TE′) = {+}
∴ ADD E′ → +TE′ to M[E′, +]
∴ write production E′ → +TE′ in front of Row (E′) and Column (+) (2)

E′ → ε

Comparing E′ → ε with A → α:
∴ A = E′, α = ε
∴ FIRST(α) = {ε}
∴ Applying Rule (2) of the Predictive Parsing Table:
Find FOLLOW(E′) = {), $}

∴ ADD Production E′ → ε to M[E′, )] and M[E′, $]

∴ write E′ → ε in front of Row (E′) and Columns {), $} (3)

T → FT′

Comparing T → FT′ with A → α:
∴ A = T, α = FT′
∴ FIRST(α) = FIRST(FT′) = FIRST(F) = {(, id}
∴ ADD Production T → FT′ to M[T, (] and M[T, id]
∴ write T → FT′ in front of Row (T) and Columns {(, id} (4)

T′ → *FT′

Comparing T′ → *FT′ with A → α:
∴ A = T′, α = *FT′
∴ FIRST(α) = FIRST(*FT′) = {*}
∴ ADD Production T′ → *FT′ to M[T′, *]
∴ write T′ → *FT′ in front of Row (T′) and Column {*} (5)

 T′ → ε

Comparing T′ → ε with A → α:
∴ A = T′, α = ε

∴ FIRST(α) = FIRST{ε} = {ε}

∴ Applying Rule (2) of the Predictive Parsing Table:
Find FOLLOW(A) = FOLLOW(T′) = {+, ), $}
∴ ADD T′ → ε to M[T′, +], M[T′, )] and M[T′, $]
∴ write T′ → ε in front of Row (T′) and Columns {+, ), $} (6)

 F → (E)

Comparing F → (E) with A → α:
∴ A = F, α = (E)
∴ FIRST(α) = FIRST((E)) = {(}
∴ ADD F → (E) to M[F, (]
∴ write F → (E) in front of Row (F) and Column (() (7)

 F → id

Comparing F → id with A → α:
∴ A = F, α = id
∴ FIRST(α) = FIRST(id) = {id}
∴ ADD F → id to M[F, id]
∴ write F → id in front of Row (F) and Column (id) (8)

Combining all statements (1) to (8) will generate the following Predictive or LL(1)
Parsing Table −

          id          +            *            (           )          $

E         E → TE′                               E → TE′

E′                    E′ → +TE′                             E′ → ε     E′ → ε

T         T → FT′                               T → FT′

T′                    T′ → ε       T′ → *FT′                T′ → ε     T′ → ε

F         F → id                                F → (E)

Step 5 − Checking Acceptance of the String id + id * id using the Predictive Parsing Program

Initially, the stack will contain the starting symbol E with $ at the bottom of the stack.
The input buffer will contain the string with $ attached at the right end.
If the top of the stack = the current input symbol, then the symbol on top of the stack is
popped and the input pointer advances to the next symbol.

The sequence of moves by the Predictive Parser:

Stack        Input             Action

$E           id + id * id $    E → TE′

$E′T         id + id * id $    T → FT′

$E′T′F       id + id * id $    F → id

$E′T′id      id + id * id $    Remove id

$E′T′        + id * id $       T′ → ε

$E′          + id * id $       E′ → +TE′

$E′T+        + id * id $       Remove +

$E′T         id * id $         T → FT′

$E′T′F       id * id $         F → id

Stack        Input        Action

$E′T′id      id * id $    Remove id

$E′T′        * id $       T′ → *FT′

$E′T′F*      * id $       Remove *

$E′T′F       id $         F → id

$E′T′id      id $         Remove id

$E′T′        $            T′ → ε

$E′          $            E′ → ε

$            $            Accept
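
The same sequence of moves can be produced by a small table-driven program. A C sketch (illustrative encoding: E′ and T′ are written e and t, id is the single character i, and prod() plays the role of table M):

#include <stdio.h>
#include <string.h>

/* Table M as a function: the right-hand side to use for nonterminal nt on
   lookahead a; "" encodes ε, NULL encodes a blank (error) entry. */
static const char *prod(char nt, char a) {
    switch (nt) {
    case 'E': return (a == 'i' || a == '(') ? "Te"  : NULL;                  /* E  -> TE'   */
    case 'e': return a == '+' ? "+Te" : (a == ')' || a == '$') ? "" : NULL;  /* E' row      */
    case 'T': return (a == 'i' || a == '(') ? "Ft"  : NULL;                  /* T  -> FT'   */
    case 't': return a == '*' ? "*Ft" : (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': return a == 'i' ? "i" : a == '(' ? "(E)" : NULL;               /* F -> id|(E) */
    }
    return NULL;
}

int main(void) {
    const char *w = "i+i*i$";        /* input with endmarker */
    char stack[100] = "$E";          /* $ at the bottom, start symbol on top */
    int top = 1, ip = 0;
    for (;;) {
        char X = stack[top], a = w[ip];
        if (X == '$' && a == '$') { puts("accepted"); return 0; }
        if (X == a) { top--; ip++; continue; }             /* pop and advance */
        const char *rhs = strchr("EeTtF", X) ? prod(X, a) : NULL;
        if (!rhs) { puts("rejected"); return 1; }
        top--;                                             /* pop X ...             */
        for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
            stack[++top] = rhs[k];                         /* ... push RHS reversed */
    }
}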


ERROR RECOVERY IN PREDICTIVE PARSING


The stack of a non-recursive predictive parser makes explicit the terminals and non-
terminals that the parser hopes to match with the remainder of the input. We shall
therefore refer to symbols on the parser stack in the following discussion.

An error is detected during predictive parsing when the terminal on top of the stack
does not match the next input symbol or when non-terminal A is on top of the
stack, a is the next input symbol, and the parsing table entry M[A,a] is empty.

• Panic-mode error recovery is based on the idea of skipping symbols on the input

until a token in a selected set of synchronizing tokens appears. Its effectiveness


depends on the choice of synchronizing set.
• The sets should be chosen so that the parser recovers quickly from errors that are
likely to occur in practice. Some heuristics are as follows
• As a starting point, we can place all symbols in FOLLOW(A) into the synchronizing
set for non terminal A. If we skip tokens until an element of FOLLOW(A) is seen
and pop A from the stack, it is likely that parsing can continue.

• It is not enough to use FOLLOW(A) as the synchronizing set for A. For example, if
semicolons terminate statements, as in C, then keywords that begin statements
may not appear in the FOLLOW set of the non-terminal generating expressions. A
missing semicolon after an assignment may therefore result in the keyword
beginning the next statement being skipped. Often, there is a hierarchical
structure on constructs in a language; e.g., expressions appear within statements,
which appear within blocks, and so on. We can add to the synchronizing set of
a lower construct the symbols that begin higher constructs. For example, we
might add keywords that begin statements to the synchronizing sets for the non-
terminals generating expressions.


PANIC-MODE ERROR RECOVERY


We can implement panic-mode error recovery by scanning down the stack until a
state s with a goto on a particular nonterminal A is found.

Zero or more input symbols are then discarded until a symbol a is found that can
legitimately follow A. The situation might exist where there is more than one choice
for the nonterminal A.

Normally these would be nonterminals representing major program pieces, e.g. an


expression, a statement, or a block. For example, if A is the nonterminal stmt, a
might be semicolon or }, which marks the end of a statement sequence. This
method of error recovery attempts to eliminate the phrase containing the syntactic
error.

The parser determines that a string derivable from A contains an error. Part of that
string has already been processed, and the result of this processing is a sequence
of states on top of the stack.

The remainder of the string is still in the input, and the parser attempts to skip over
the remainder of this string by looking for a symbol on the input that can
legitimately follow A.

By removing states from the stack, skipping over the input, and pushing GOTO(s,
A) on the stack, the parser pretends that it has found an instance of A and resumes
normal parsing.


PHRASE-LEVEL RECOVERY
Phrase-level recovery is implemented by examining each error entry in the LR
action table and deciding on the basis of language usage the most likely
programmer error that would give rise to that error. An appropriate recovery
procedure can then be constructed; presumably the top of the stack and/or first
input symbol would be modified in a way deemed appropriate for each error entry.
In designing specific error-handling routines for an LR parser, we can fill in each
blank entry in the action field with a pointer to an error routine that will take the
appropriate action selected by the compiler designer.

The actions may include insertion or deletion of symbols from the stack or the input
or both, or alteration and transposition of input symbols. We must make our
choices so that the LR parser will not get into an infinite loop. A safe strategy will
assure that at least one input symbol will be removed or shifted eventually, or that
the stack will eventually shrink if the end of the input has been reached. Popping a
stack state that covers a non terminal should be avoided, because this modification
eliminates from the stack a construct that has already been successfully parsed.


Bottom-Up Parser:

Bottom-up parsing can be defined as an attempt to reduce the input string w to the
start symbol of grammar by tracing out the rightmost derivations of w in reverse.

E -> E + T | T

T -> T * F | F

F -> (E) | id

Example: id*id


LR PARSING INTRODUCTION

The "L" is for left-to-right scanning of the input and the "R" is for constructing a
rightmost derivation in reverse.

WHY LR PARSING:
• LR parsers can be constructed to recognize virtually all programming-language
constructs for which context-free grammars can be written.

• The LR parsing method is the most general non-backtracking shift-reduce parsing


method known, yet it can be implemented as efficiently as other shift-reduce
methods.

• The class of grammars that can be parsed using LR methods is a proper superset of
the class of grammars that can be parsed with predictive parsers.
• An LR parser can detect a syntactic error as soon as it is possible to do so on a
left- to- right scan of the input.

The disadvantage is that it takes too much work to construct an LR parser by


hand for a typical programming-language grammar. But there are lots of LR parser
generators available to make this task easy.

MODELS OF LR PARSERS
The schematic form of an LR parser is shown below.
• The program uses a stack to store a string of the form s0 X1 s1 X2 ... Xm sm, where
sm is on top.
• Each Xi is a grammar symbol and each si is a symbol representing a state. Each state
symbol summarizes the information contained in the stack below it.
• The combination of the state symbol on top of the stack and the current input symbol
are used to index the parsing table and determine the shift-reduce parsing decision.
• The parsing table consists of two parts: a parsing action function action and a goto
function goto.
This configuration represents the right-sentential form X1 X2 ... Xm ai ai+1 ... an
in essentially the same way a shift-reduce parser would; only the presence of the states
on the stack is new.

Recall the sample parse we did (see Example 1: Sample bottom-up parse) in which
we assembled the right-sentential form by concatenating the remainder of the input buffer to
the top of the stack.
The next move of the parser is determined by reading ai and sm, and consulting the
parsing action table entry action[sm, ai]. Note that we are just looking at the
state here and no symbol below it. We'll see how this actually works later.
The configurations resulting after each of the four types of move are as follows:
If action[sm, ai] = shift s, the parser executes a shift move, entering the
configuration (s0 X1 s1 X2 s2 ... Xm sm ai s, ai+1 ... an$).
Here the parser has shifted both the current input symbol ai and the next state s.
If action[sm, ai] = reduce A → β, then the parser executes a reduce move, entering the
configuration (s0 X1 s1 X2 s2 ... Xm-r sm-r A s, ai ai+1 ... an$),
where s = goto[sm-r, A] and r is the length of β, the right side of the production. The parser
first popped 2r symbols off the stack (r state symbols and r grammar symbols), exposing
state sm-r. The parser then pushed both A, the left side of the production, and s, the entry
for goto[sm-r, A], onto the stack. The current input symbol is not changed in a reduce
move.
The output of an LR parser is generated after a reduce move by executing the semantic
action associated with the reducing production. For example, we might just print out the
production reduced.
If action[sm, ai] = accept, parsing is completed.
OPERATOR GRAMMARS

A grammar that is used to define mathematical operators is called an operator


grammar or operator precedence grammar. Such grammars have the restriction
that no production has either an empty right-hand side (null productions) or two
adjacent non-terminals in its right-hand side.

Examples
This is an example of operator grammar:
E->E+E/E*E/id

However, the grammar given below is not an operator grammar because two non-
terminals are adjacent to each other:

S->SAS/a

A->bSb/b

We can convert it into an operator grammar, though:

S->SbSbS/SbS/a

A->bSb/b

Operator precedence parser – An operator precedence parser is a bottom-up


parser that interprets an operator grammar. This parser is only used for operator
grammars. Ambiguous grammars are not allowed in any parser except operator
precedence parser.
There are two methods for determining what precedence relations should hold
between a pair of terminals:

1. Use the conventional associativity and precedence of operator.


2.The second method of selecting operator-precedence relations is first to construct
an unambiguous grammar for the language, a grammar that reflects the correct
associativity and precedence in its parse trees.

This parser relies on the following three precedence relations: ⋖, ≐, ⋗


a ⋖ b This means a “yields precedence to” b.

54
LECTURE NOTES
a ⋗ b This means a “takes precedence over” b.
a ≐ b This means a “has same precedence as” b.

Figure – Operator precedence relation table for grammar E->E+E/E*E/id


No relation is given between id and id, as id will not be compared and
two variables cannot come side by side. There is also a disadvantage of this table:
if we have n operators then the size of the table will be n*n and the complexity will be O(n2).
In order to decrease the size of the table, we use the operator function table.
Operator precedence parsers usually do not store the precedence table with the
relations; rather they are implemented in a special way. Operator precedence
parsers use precedence functions that map terminal symbols to integers, and the
precedence relations between the symbols are implemented by numerical
comparison. The parsing table can be encoded by two precedence
functions f and g that map terminal symbols to integers. We select f and g such
that:

1. f(a) < g(b) whenever a yields precedence to b


2. f(a) = g(b) whenever a and b have the same precedence
3.f(a) > g(b) whenever a takes precedence over b
Example – Consider the following grammar:

E -> E + E/E * E/( E )/id

This is the directed graph representing the precedence function:

55
LECTURE NOTES

f(id) → g(*) → f(+) → g(+) → f($)

g(id) → f(*) → g(*) → f(+) → g(+) → f($)

The size of the table is 2n.
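
Stored as two small integer tables, the functions make each precedence test a numeric comparison. A C sketch with one standard assignment of values consistent with the graph above (i stands for id; the concrete values are an assumption, since any assignment satisfying the relations works):

#include <stdio.h>

static int f(char a) {            /* precedence of a as the stack symbol   */
    switch (a) { case '+': return 2; case '*': return 4;
                 case 'i': return 4; case '$': return 0; }
    return -1;
}
static int g(char b) {            /* precedence of b as the incoming token */
    switch (b) { case '+': return 1; case '*': return 3;
                 case 'i': return 5; case '$': return 0; }
    return -1;
}

int main(void) {
    /* f(a) < g(b) means a <. b (shift); f(a) > g(b) means a .> b (reduce). */
    printf("+ vs *: %s\n", f('+') < g('*') ? "shift"  : "reduce"); /* + <. *           */
    printf("* vs +: %s\n", f('*') > g('+') ? "reduce" : "shift");  /* * .> +           */
    printf("+ vs +: %s\n", f('+') > g('+') ? "reduce" : "shift");  /* left-associative */
    return 0;
}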


Operator Precedence Parsing is also a type of Bottom-Up Parsing that can be used
to a class of Grammars known as Operator Grammar.

A Grammar G is Operator Grammar if it has the following properties −

 Production should not contain ϵ on its right side.


There should not be two adjacent non-terminals at the right side of production.

Example1 − Verify whether the following Grammar is operator Grammar or not.

E → E A E |(E)|id

A → +| − | ∗

Solution

No, it is not an operator Grammar as it does not satisfy property 2 of operator


Grammar.

As it contains two adjacent Non-terminals on R.H.S of production E → E A E.

We can convert it into the operator Grammar by substituting the value of A in E → E


A E.

E → E + E |E − E |E * E |(E) | id.

Operator Precedence Relations

Three precedence relations exist between the pair of terminals.

Relation Meaning

p <. q p has less precedence than q.

p >. q p has more precedence than q.

p =. q p has equal precedence to q.

Depending upon these precedence Relations, we can decide which operations will be
executed or parsed first.

Association and Precedence Rules

If operators have different precedence, use the precedence of the operators. Since
* has higher precedence than +:

Example −

In the statement a + b * c:

∴ + <. *

In the statement a * b + c:

∴ * .> +

 If operators have equal precedence, then use associativity rules.

(a) Example − In the statement a + b + c, the + operators have equal
precedence.

As '+' is left associative in a + b + c,

∴ (a + b) will be computed first, and then it will be added to c.

i.e., (a + b) + c

+ .> +

Similarly, '*' is left Associative in a * b * c

(b) Example − In a statement a ↑ b ↑ c here, ↑ is the Right Associative operator

∴ It will become a ↑ (b ↑ c)

∴ (b ↑ c) will be computed first.

∴ ↑<. ↑

 An identifier has higher precedence than all operators and symbols.

∴ θ <. id, $ <. id, id .> θ, id .> $, id .> ), ( <. id

 $ has less precedence than all other operators and symbols.

$ <. (, $ <. +, $ <. *, id .> $, ) .> $
Example 2 − Construct the Precedence Relation table for the grammar:

E → E + E | E ∗ E | id

Solution

Operator-Precedence Relations:

         id      +       *       $

id               .>      .>      .>

+        <.      .>      <.      .>

*        <.      .>      .>      .>

$        <.      <.      <.
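
These relations are enough to drive a complete operator-precedence parser. A C sketch (illustrative: only terminals are kept on the stack, i stands for id, rel encodes the table above, and the ≐ relation is omitted because it does not occur in this table):

#include <stdio.h>

static int idx(char c) {
    switch (c) { case 'i': return 0; case '+': return 1;
                 case '*': return 2; case '$': return 3; }
    return -1;
}
/* rel[a][b]: '<' for a <. b, '>' for a .> b, 0 for a blank (error) entry. */
static const char rel[4][4] = {
    /*          id    +     *     $  */
    /* id */ {  0,   '>',  '>',  '>' },
    /* +  */ { '<',  '>',  '<',  '>' },
    /* *  */ { '<',  '>',  '>',  '>' },
    /* $  */ { '<',  '<',  '<',   0  },
};

int main(void) {
    const char *w = "i+i*i$";          /* id + id * id, with endmarker */
    char stack[100] = "$"; int top = 0, ip = 0;
    for (;;) {
        char a = stack[top], b = w[ip];
        if (a == '$' && b == '$') { puts("accepted"); return 0; }
        char r = rel[idx(a)][idx(b)];
        if (r == '<') { stack[++top] = b; ip++; }   /* shift b onto the stack     */
        else if (r == '>') {                        /* a handle ends here: reduce */
            char popped;
            do { popped = stack[top--]; }
            while (rel[idx(stack[top])][idx(popped)] != '<');
            printf("reduce handle ending in %c\n", popped);
        }
        else { puts("error"); return 1; }
    }
}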

Advantages of Operator Precedence Parsing

 It is easy to implement.

Disadvantages of Operator Precedence Parsing

 An operator like minus can be unary or binary, so this operator can have
different precedences in different statements.

 Operator Precedence Parsing applies to only a small class of grammars.

LR(0), SLR(1), LR(1), LALR(1) GRAMMARS AND BOTTOM-UP PARSING


BOTTOM-UP PARSER

Bottom-up parsing starts from the leaf nodes of a tree and works in upward
direction till it reaches the root node. Here, we start from a sentence and then apply
production rules in reverse manner in order to reach the start symbol. The image
given below depicts the bottom-up parsers available.


Shift-Reduce Parsing

Shift-reduce parsing uses two unique steps for bottom-up parsing. These steps are
known as shift-step and reduce-step.

Shift step: The shift step refers to the advancement of the input pointer to the
next input symbol, which is called the shifted symbol. This symbol is pushed onto
the stack. The shifted symbol is treated as a single node of the parse tree.
Reduce step: When the parser finds a complete grammar rule (RHS) and replaces it
with the (LHS), it is known as the reduce-step. This occurs when the top of the stack contains
a handle. To reduce, a POP function is performed on the stack, which pops off the
handle and replaces it with the LHS non-terminal symbol.
Constructing a parse tree for an input string beginning at the leaves and going
towards the root is called bottom-up parsing.

A general type of bottom-up parser is a shift-reduce parser.


SHIFT-REDUCE PARSING
Shift-reduce parsing is a type of bottom-up parsing that attempts to construct a
parse tree for an input string beginning at the leaves (the bottom) and working up
towards the root (the top).

Example:
Consider the grammar:
S → aABe
A → Abc | b

B→d
The sentence to be recognized is abbcde.

REDUCTION (LEFTMOST)            RIGHTMOST DERIVATION

abbcde   (A → b)                S → aABe
aAbcde   (A → Abc)                → aAde
aAde     (B → d)                  → aAbcde
aABe     (S → aABe)               → abbcde
S

The reductions trace out the rightmost derivation in reverse.

Handles:

A handle of a string is a substring that matches the right side of a


production, and whose reduction to the non-terminal on the left side of the
production represents one step along the reverse of a rightmost derivation.

Example:

Consider the grammar:

E → E+E
E→ E*E

E → (E)

E → id

And the input string id1+id2*id3

The rightmost derivation is :

E →E+E

→ E+E*E

→ E+E*id3

→ E+id2*id3

→ id1+id2*id3

In the above derivation, the underlined substrings are called handles.

Handle pruning:

A rightmost derivation in reverse can be obtained by “handle pruning”.

(i.e.) if w is a sentence or string of the grammar at hand, then w = γn,
where γn is the nth right-sentential form of some rightmost derivation.
Stack implementation of shift-reduce parsing :

Stack Input Action


$ id1+id2*id3 $ shift

$ id1 +id2*id3 $ reduce by E→id

$E +id2*id3 $ shift

$ E+ id2*id3 $ shift

$ E+id2 *id3 $ reduce by E→id

$ E+E *id3 $ shift

$ E+E* id3 $ shift

$ E+E*id3 $ reduce by E→id

$ E+E*E $ reduce by E→ E *E

$ E+E $ reduce by E→ E+E

$E $ accept

Actions in shift-reduce parser:

• shift – The next input symbol is shifted onto the top of the stack.
• reduce – The parser replaces the handle within the stack with a non-terminal.
• accept – The parser announces successful completion of parsing.
• error – The parser discovers that a syntax error has occurred and
calls an error recovery routine.

Conflicts in shift-reduce parsing:

There are two conflicts that occur in shift-reduce parsing:

1. Shift-reduce conflict: The parser cannot decide whether to shift or to


reduce.

2. Reduce-reduce conflict: The parser cannot decide which of several


reductions to make.

Shift-reduce conflict:

Example:

Consider the grammar:

E→E+E | E*E | id and input id+id*id

The parser reaches a configuration where both actions are possible.

Option 1 (reduce first):

Stack      Input      Action
$E+E       *id $      Reduce by E→E+E
$E         *id $      Shift
$E*        id $       Shift
$E*id      $          Reduce by E→id
$E*E       $          Reduce by E→E*E
$E         $

Option 2 (shift first):

Stack      Input      Action
$E+E       *id $      Shift
$E+E*      id $       Shift
$E+E*id    $          Reduce by E→id
$E+E*E     $          Reduce by E→E*E
$E+E       $          Reduce by E→E+E
$E         $

2. Reduce-reduce conflict:

Consider the grammar:

M → R+R | R+c | R
R → c

and input c+c

Option 1 (reduce by R→c):

Stack    Input    Action
$        c+c $    Shift
$c       +c $     Reduce by R→c
$R       +c $     Shift
$R+      c $      Shift
$R+c     $        Reduce by R→c
$R+R     $        Reduce by M→R+R
$M       $

Option 2 (reduce by M→R+c):

Stack    Input    Action
$        c+c $    Shift
$c       +c $     Reduce by R→c
$R       +c $     Shift
$R+      c $      Shift
$R+c     $        Reduce by M→R+c
$M       $

Viable prefixes:

➢ α is a viable prefix of the grammar if there is w such that αw is a right-sentential
form.
➢ The set of prefixes of right-sentential forms that can appear on the stack
of a shift-reduce parser are called viable prefixes.
➢ The set of viable prefixes is a regular language.
LR Parser

The LR parser is a non-recursive, shift-reduce, bottom-up parser. It uses a wide class of context-free grammars, which makes it the most efficient syntax analysis technique. LR parsers are also known as LR(k) parsers, where L stands for left-to-right scanning of the input stream, R stands for the construction of a rightmost derivation in reverse, and k denotes the number of lookahead symbols used to make parsing decisions.

There are three widely used algorithms available for constructing an LR parser:

• SLR(1) – Simple LR Parser:
  o Works on the smallest class of grammars
  o Few states, hence a very small table
  o Simple and fast construction

• LR(1) – Canonical LR (CLR) Parser:
  o Works on the complete set of LR(1) grammars
  o Generates a large table and a large number of states
  o Slow construction

• LALR(1) – Look-Ahead LR Parser:
  o Works on an intermediate size of grammar
  o Number of states is the same as in SLR(1)

LR Parsing Algorithm

Here we describe a skeleton algorithm of an LR parser:

token = next_token()
repeat forever
    s = top of stack
    if action[s, token] = "shift si" then
        PUSH token
        PUSH si
        token = next_token()
    else if action[s, token] = "reduce A ::= β" then
        POP 2 * |β| symbols
        s = top of stack
        PUSH A
        PUSH goto[s, A]
    else if action[s, token] = "accept" then
        return
    else
        error()

LL vs. LR

LL                                           LR
Does a leftmost derivation.                  Does a rightmost derivation in reverse.
Starts with the root nonterminal on the      Ends with the root nonterminal on the
stack.                                       stack.
Ends when the stack is empty.                Starts with an empty stack.
Uses the stack for designating what is       Uses the stack for designating what is
still to be expected.                        already seen.
Builds the parse tree top-down.              Builds the parse tree bottom-up.
Continuously pops a nonterminal off the      Tries to recognize a right-hand side on
stack and pushes the corresponding           the stack, pops it, and pushes the
right-hand side.                             corresponding nonterminal.
Expands the non-terminals.                   Reduces the non-terminals.
Reads the terminals when it pops one         Reads the terminals while it pushes
off the stack.                               them on the stack.
Pre-order traversal of the parse tree.       Post-order traversal of the parse tree.

LALR(1) Parsers

Let's create an SLR(1) parse table for the following augmented grammar, G, from page 255 of the Dragon Book.

0. S′ → S
1. S → L = R
2. S → R
3. L → * R
4. L → id
5. R → L

The LR(0) finite-state machine for G has the following states.

State 0:
S′ → ⋅ S
S→⋅L=R
S→⋅R
L→⋅*R
L → ⋅ id
R→⋅L

State 1:
S′ → S ⋅

State 2:
S→L⋅=R
R→L⋅

State 3:
S→R⋅

State 4:
L→*⋅R
R→⋅L
L→⋅*R
L → ⋅ id

State 5:
L → id ⋅

State 6:
S→L=⋅R
R→⋅L
L→⋅*R
L → ⋅ id

State 7:
L→*R⋅

State 8:

R→L⋅

State 9:
S→L=R⋅

The FIRST and FOLLOW sets are as follows.

FIRST(S) = {*, id}

FIRST(L) = {*, id}


FIRST(R) = {*, id}

FOLLOW(S) = {$}
FOLLOW(L) = {$, =}
FOLLOW(R) = {$, =}

Look at state 2. LR(0) item S → L ⋅ = R calls for a shift on lookahead =. But LR(0)
item R → L ⋅ calls for a reduce on lookahead =, since FOLLOW(L) contains =. That is
a parsing conflict.

But, in reality, = can only follow L when L occurs in the context of


production S → L = R, never when L is derived using production R → L.

So the shift-reduce conflict in state 2 can be resolved by choosing the shift action without affecting any parses.

The source of spurious conflicts

Imagine you have an SLR(1) grammar G1. In one part of G1 you have production

R→B

and all is well. Imagine that + is not in FOLLOW(B).

Now, you modify G1 slightly by adding production


A→B+C

in an entirely different section of the grammar. Call the new grammar G2.

Now + is in FOLLOW(B). That can introduce a parsing conflict in a state of G2's


machine containing LR(0) item R → B ⋅, even though the two parts of the grammar
have nothing to do with one another.

The problem is that FOLLOW sets are global, taking information from the entire
grammar.

LALR(1) parsers

A LALR(1) parser uses the same LR(0) finite-state machine that an SLR(1) parser
uses. But the LALR algorithm is more sensitive, and can remove spurious conflicts
like the one above, by using a more local notion of FOLLOW sets.

Grammar G above is not an SLR(1) grammar, but it is an LALR(1) grammar.

AMBIGUITY AND LR PARSING

Advantages of using ambiguous grammars:
• Ambiguous grammars provide a shorter and more natural specification for certain constructs when compared to the equivalent unambiguous grammars.
• One can isolate a construct for optimization purposes using ambiguous grammars.
• One can incorporate a special construct by adding new productions to an existing ambiguous grammar.
• Ambiguous grammars can be handled by bottom-up parsers.

Creating LR Parsers for Ambiguous Grammars:

No ambiguous grammar can be LR. But by carefully resolving conflicts, an LR parser for an ambiguous grammar can be designed. Resolving conflicts can be done by either of the two following methods:

Method 1: By using associativity and precedence, in the case of expressions
Method 2: By keeping in view the correct sentences in the language

Method 1:

• Consider the following ambiguous grammar for expressions:
E → E + E | E * E | ( E ) | id
We know that ‘+’ and ‘*’ are left associative and ‘*’ has precedence over ‘+’.

• The unambiguous version of the equivalent grammar is:
E → E + T | T
T → T * F | F
F → ( E ) | id

Reasons to use the ambiguous version:
• We can change operator precedence.
• The parser will have fewer states compared to the unambiguous version.
• The parser never wastes time performing reductions like E → T or T → F.

Steps for creating an LR parser:
• Step 1: Construct LR(0) items.
• Step 2: Construct the LR parsing table with the help of step 1.
• Step 3: Parse an input string with the help of step 2.

Steps for constructing LR(0) items:
• Step 1: Add the augmented grammar.
• Step 2: Apply the closure and goto operations.

Method 1:
E′ → E
E → E + E
E → E * E
E → ( E )
E → id

5
3
LECTURE NOTES
and the LR(0) items are:

I0: E′→.E, E→.E+E, E→.E*E, E→.(E), E→.id
I1: E′→E., E→E.+E, E→E.*E
I2: E→(.E), E→.E+E, E→.E*E, E→.(E), E→.id
I3: E→id.
I4: E→E+.E, E→.E+E, E→.E*E, E→.(E), E→.id
I5: E→E*.E, E→.E+E, E→.E*E, E→.(E), E→.id
I6: E→(E.), E→E.+E, E→E.*E
I7: E→E+E., E→E.+E, E→E.*E
I8: E→E*E., E→E.+E, E→E.*E
I9: E→(E).

Creating LR parser for Ambiguous Grammar:


Method 1:

While designing the SLR parser table using its algorithm, conflicts arise because the grammar is ambiguous. The states involved in conflicts are:

I7: reduction by E→E+E and shift on ‘+’ and ‘*’
I8: reduction by E→E*E and shift on ‘+’ and ‘*’

Resolving conflicts with I7 on top:

Let the input be id+id*id$. Then after consuming input up to ‘*’, the stack content will be
0 E 1 + 4 E 7
Now on ‘*’ a shift must happen, as ‘*’ has precedence over ‘+’.

Let the input be id+id+id$. After consuming input up to the second ‘+’, the stack content must be
0 E 1 + 4 E 7
Now on ‘+’ a reduction must happen, as ‘+’ is left associative.

So, I7 must reduce on ‘+’ and must shift on ‘*’.

Resolving conflicts with I8 on top:

Let the input be id*id+id$. After consuming input up to ‘+’, the stack content is
0 E 1 * 5 E 8
Now on ‘+’ a reduction must happen, as ‘*’ has precedence over ‘+’.

Let the input be id*id*id$. After consuming input up to the second ‘*’, the stack content will be
0 E 1 * 5 E 8
Now on ‘*’ a reduction must happen, as ‘*’ is left associative.

So, I8 must reduce on both ‘+’ and ‘*’.

After resolving conflicts the SLR parsing table is as shown in the table.
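
In practice, parser generators apply exactly this resolution declaratively. As an illustration (ours, not from the notes) in yacc notation, which is introduced later in this unit, the two %left lines below reproduce the I7/I8 decisions: a later declaration means higher precedence, so '*' wins over '+', and %left makes a conflict between equal-precedence operators reduce.

%{
#include <stdio.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
%}
%token ID
%left '+'    /* lower precedence; left associative, so I7 reduces on '+' */
%left '*'    /* higher precedence, so I7 shifts on '*' and I8 reduces on both */
%%
expr : expr '+' expr
     | expr '*' expr
     | ID
     ;
%%
/* toy scanner: 'i' stands for id; everything else is returned as itself */
int yylex(void) { int c = getchar(); return c == 'i' ? ID : (c == EOF ? 0 : c); }
int main(void) { return yyparse(); }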


CONSTRUCTING THE SLR PARSING TABLE

To construct the parser table we must convert our NFA into a DFA. The states in the LR table will be the ε-closures of the states corresponding to the items, so the process of creating the LR state table parallels the process of constructing an equivalent DFA from a machine with ε-transitions. This is essentially the subset construction algorithm, so we are in familiar territory here. We need two operations: closure() and goto().

closure()

If I is a set of items for a grammar G, then closure(I) is the set of items constructed from I by two rules. Initially, every item in I is added to closure(I). Then, if A → α.Bβ is in closure(I) and B → γ is a production, add the initial item [B → .γ] to closure(I), if it is not already there. Apply this rule until no more new items can be added to closure(I).

From our grammar above, if I is the set of one item {[E′ → .E]}, then closure(I) contains:

I0: E′ → .E
    E → .E + T
    E → .T
    T → .T * F
    T → .F
    F → .(E)
    F → .id
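
A compact sketch of closure() in C (ours, not from the notes) under an assumed toy encoding: upper-case letters are non-terminals, 'i' stands for id, 'Z' plays the role of E′, and an item is a production index plus a dot position. Running it prints exactly the I0 items above.

#include <stdio.h>

typedef struct { char lhs; const char *rhs; } Prod;
typedef struct { int prod; int dot; } Item;

/* the expression grammar, augmented with Z -> E in the role of E' -> E */
static const Prod g[] = { {'Z', "E"},  {'E', "E+T"}, {'E', "T"},
                          {'T', "T*F"}, {'T', "F"},  {'F', "(E)"}, {'F', "i"} };
enum { NPROD = 7 };

static int has(const Item *set, int n, Item it)
{
    for (int k = 0; k < n; k++)
        if (set[k].prod == it.prod && set[k].dot == it.dot)
            return 1;
    return 0;
}

/* closure(I): keep adding [B -> .gamma] whenever a dot stands before B */
static int closure(Item *set, int n)
{
    for (int k = 0; k < n; k++) {            /* the set grows while we scan */
        char B = g[set[k].prod].rhs[set[k].dot];
        if (B >= 'A' && B <= 'Z')
            for (int p = 0; p < NPROD; p++)
                if (g[p].lhs == B) {
                    Item it = { p, 0 };
                    if (!has(set, n, it))
                        set[n++] = it;
                }
    }
    return n;
}

int main(void)
{
    Item set[32] = { {0, 0} };               /* start from [Z -> .E] */
    int n = closure(set, 1);
    for (int k = 0; k < n; k++)
        printf("%c -> %.*s.%s\n", g[set[k].prod].lhs, set[k].dot,
               g[set[k].prod].rhs, g[set[k].prod].rhs + set[k].dot);
    return 0;
}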


goto()

goto(I, X), where I is a set of items and X is a grammar symbol, is defined to be the closure of the set of all items [A → αX.β] such that [A → α.Xβ] is in I. The idea here is fairly intuitive: if I is the set of items that are valid for some viable prefix γ, then goto(I, X) is the set of items that are valid for the viable prefix γX.
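
Continuing the same toy encoding (again ours), goto() just advances the dot over X in every item that has the dot immediately before X, and then closes the result:

/* goto(I, X): reuses Item, g[], has() and closure() from the sketch above */
static int goto_set(const Item *in, int n, char X, Item *out)
{
    int m = 0;
    for (int k = 0; k < n; k++)
        if (g[in[k].prod].rhs[in[k].dot] == X) {   /* dot right before X */
            Item it = { in[k].prod, in[k].dot + 1 };
            if (!has(out, m, it))
                out[m++] = it;
        }
    return closure(out, m);                        /* complete with closure */
}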

SETS-OF-ITEMS CONSTRUCTION

To construct the canonical collection of sets of LR(0) items for an augmented grammar G′:

procedure items(G′)
begin
    C := { closure({ [S′ -> .S] }) };
    repeat
        for each set of items I in C and each grammar symbol X
                such that goto(I, X) is not empty and not in C do
            add goto(I, X) to C
    until no more sets of items can be added to C
end

ALGORITHM FOR CONSTRUCTING AN SLR PARSING TABLE

Input: augmented grammar G’

Output: SLR parsing table functions action and goto for G’

Method:

Construct C = {I0, I1, ..., In}, the collection of sets of LR(0) items for G′. State i is constructed from Ii:

1. If [A → α.aβ] is in Ii and goto(Ii, a) = Ij, then set action[i, a] to "shift j". Here a must be a terminal.
2. If [A → α.] is in Ii, then set action[i, a] to "reduce A → α" for all a in FOLLOW(A). Here A may not be S′.
3. If [S′ → S.] is in Ii, then set action[i, $] to "accept".

If any conflicting actions are generated by these rules, the grammar is not SLR(1) and the algorithm fails to produce a parser. The goto transitions for state i are constructed for all non-terminals A using the rule: if goto(Ii, A) = Ij, then goto[i, A] = j.

All remaining undefined entries are made "error".

The initial state of the parser is the one constructed from the set of items containing [S′ → .S].


Let's work an example to get a feel for what is going on.

An Example

(1) E -> E * B
(2) E -> E + B
(3) E -> B
(4) B -> 0
(5) B -> 1

The Action and Goto tables: the two LR(0) parsing tables for this grammar look as follows:


LALR PARSER:

We begin with two observations. First, some of the states generated for LR(1)
parsing have the same set of core (or first) components and differ only in their
second component, the lookahead symbol.

Our intuition is that we should be able to merge these states and reduce the
number of states we have, getting close to the number of states that would be
generated for LR(0) parsing.

This observation suggests a hybrid approach: We can construct the canonical LR(1)
sets of items and then look for sets of items having the same core. We merge
these sets with common cores into one set of items.

The merging of states with common cores can never produce a shift/reduce
conflict that was not present in one of the original states because shift actions
depend only on the core, not the lookahead.

But it is possible for the merger to produce a reduce/reduce conflict.

Our second observation is that we are really only interested in the lookahead symbol
in places where there is a problem.

So our next thought is to take the LR(0) set of items and add lookaheads only
where they are needed. This leads to a more efficient, but much more complicated
method.


ALGORITHM FOR EASY CONSTRUCTION OF AN LALR TABLE

Input: G’

Output: LALR parsing table functions action and goto for G′.

Method:
1. Construct C = {I0, I1, ..., In}, the collection of sets of LR(1) items for G′.
2. For each core present among the sets of LR(1) items, find all sets having that core and replace these sets by their union.
3. Let C′ = {J0, J1, ..., Jm} be the resulting sets of LR(1) items. The parsing actions for state i are constructed from Ji in the same manner as in the construction of the canonical LR parsing table.
4. If there is a conflict, the grammar is not LALR(1) and the algorithm fails.
5. The goto table is constructed as follows: if J is the union of one or more sets of LR(1) items, that is, J = I0 ∪ I1 ∪ ... ∪ Ik, then the cores of goto(I0, X), goto(I1, X), ..., goto(Ik, X) are the same, since I0, I1, ..., Ik all have the same core. Let K be the union of all sets of items having the same core as goto(I1, X).
6. Then goto(J, X) = K.

Consider the above example (the grammar S → CC, C → cC | d):
I3 and I6 can be replaced by their union I36: C → c.C, c/d/$; C → .cC, c/d/$; C → .d, c/d/$
I4 and I7 by their union I47: C → d., c/d/$
I8 and I9 by their union I89: C → cC., c/d/$

Parsing Table


state   c      d      $        S    C
0       s36    s47             1    2
1                      accept
2       s36    s47                  5
36      s36    s47                  89
47      r3     r3     r3
5                      r1
89      r2     r2     r2

HANDLING ERRORS

The LALR parser may continue to do reductions after the LR parser would have
spotted an error, but the LALR parser will never do a shift after the point the LR
parser would have discovered the error and will eventually find the error.

LR ERROR RECOVERY
An LR parser will detect an error when it consults the parsing action table and find
a blank or error entry. Errors are never detected by consulting the goto table. An
LR parser will detect an error as soon as there is no valid continuation for the
portion of the input thus far scanned. A canonical LR parser will not make even a
single reduction before announcing the error. SLR and LALR parsers may make
several reductions before detecting an error, but they will never shift an erroneous
input symbol onto the stack.


2) SLR Parser or LR(0)

Steps:
1. Create the augmented grammar
2. Generate kernel items
3. Find closure
4. Compute goto()
5. Construct the parsing table
6. Parse the string

Let us consider the grammar:
S -> a
S -> (L)
L -> S
L -> L,S

Step 1: Create the augmented grammar
S is the start symbol in the given grammar. The augmented grammar is S′ -> S.

Step 2: Generate kernel items
Introduce a dot in the RHS of the production:
S′ -> .S

Step 3: Find closure
(Rule: for A -> α.Xβ, i.e. if there is a non-terminal X next to the dot, then include the X-productions.)

I0: S′ -> .S
    S -> .a
    S -> .(L)


Step :5 construct parsing table


Label the rows of table with no. of iterations

Divide the Column into two parts

• Action part: terminal symbols


• Goto part: Nonterminal symbols

        ACTION                          GOTO
     a     (     )     ,     $       S     L
0    s2    s3                        1
1                          accept
2                r1    r1    r1
3    s2    s3                        5     4
4                s6    s7
5                r3    r3
6                r2    r2    r2
7    s2    s3                        8
8                r4    r4


Step 4: Compute goto()

Goto(I0, S) = 1      Goto(I3, () = 3
Goto(I0, a) = 2      Goto(I4, )) = 6
Goto(I0, () = 3      Goto(I4, ,) = 7
Goto(I3, L) = 4      Goto(I7, S) = 8
Goto(I3, S) = 5      Goto(I7, a) = 2
Goto(I3, a) = 2      Goto(I7, () = 3

label   Production    Ending iteration   FOLLOW(LHS)
R0      S′ -> S.      I1                 FOLLOW(S′) = { $ }
R1      S -> a.       I2                 FOLLOW(S) = { $, ), , }
R2      S -> (L).     I6                 FOLLOW(S) = { $, ), , }
R3      L -> S.       I5                 FOLLOW(L) = { ), , }
R4      L -> L,S.     I8                 FOLLOW(L) = { ), , }


Stack Input Action

$0 (a,a)$ S3

$0(3 a,a)$ S2

$0(3a2 ,a)$ R1

$0(3S5 ,a)$ R3

$0(3L4 ,a)$ S7

$0(3L4 , 7 a)$ S2

$0(3L4 , 7a2 )$ R1

$0(3L4 , 7S8 )$ R4

$0(3L4 )$ S6

$0(3L4)6 $ R2

$0S1 $ accept

Hence the string is parsed successfully.
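
The table above can also be executed mechanically. Below is a table-driven sketch in C (ours, with an assumed encoding: a positive action entry means shift and go to that state, a negative entry means reduce by that rule number, ACC accepts, 0 is an error). It replays exactly the trace above on the input (a,a)$.

#include <stdio.h>

enum { ACC = 99 };

/* terminal columns: a ( ) , $ */
static int tindex(char c)
{
    switch (c) {
    case 'a': return 0; case '(': return 1; case ')': return 2;
    case ',': return 3; case '$': return 4;
    }
    return 4;                        /* treat anything else as the end marker */
}

static const int act[9][5] = {       /* the ACTION table built above */
    /* 0 */ { 2, 3,  0,  0,  0   },
    /* 1 */ { 0, 0,  0,  0,  ACC },
    /* 2 */ { 0, 0, -1, -1, -1   },
    /* 3 */ { 2, 3,  0,  0,  0   },
    /* 4 */ { 0, 0,  6,  7,  0   },
    /* 5 */ { 0, 0, -3, -3,  0   },
    /* 6 */ { 0, 0, -2, -2, -2   },
    /* 7 */ { 2, 3,  0,  0,  0   },
    /* 8 */ { 0, 0, -4, -4,  0   },
};
static const int go[9][2] = {        /* GOTO: columns S, L */
    {1,0},{0,0},{0,0},{5,4},{0,0},{0,0},{0,0},{8,0},{0,0}
};
static const int lhs[] = { 0, 0, 0, 1, 1 };  /* rule -> LHS (0 = S, 1 = L) */
static const int len[] = { 0, 1, 3, 1, 3 };  /* rule -> |RHS|; rule 0 unused */

int main(void)
{
    const char *w = "(a,a)$";
    int stack[64] = { 0 }, top = 0, ip = 0;  /* state stack, start state 0 */
    for (;;) {
        int a = act[stack[top]][tindex(w[ip])];
        if (a == ACC)      { puts("accept"); return 0; }
        else if (a > 0)    { stack[++top] = a; ip++; }        /* shift */
        else if (a < 0) {                                     /* reduce */
            top -= len[-a];                  /* pop |RHS| states */
            stack[top + 1] = go[stack[top]][lhs[-a]];
            top++;
            printf("reduce by R%d\n", -a);
        }
        else               { puts("error"); return 1; }
    }
}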


CLR Parser or LR(1):

Steps:
1. Create the augmented grammar
2. Generate kernel items and add the 2nd component (lookahead)
3. Find closure
4. Compute goto()
5. Construct the parsing table
6. Parse the string

Let us consider grammar:

S->L=R

S->R

L->*R

L->id

R->L

Rule to find the 2nd component (lookahead):
Consider a production of the form A -> α.Bβ, a.
Then the 2nd component of the B-items is:
• β, if β is a terminal
• FIRST(β), if β is a non-terminal
• a, if there is no β

Step :1 Create augment grammar


S is start symbol in the given grammar
Augment grammar is S’-> S


Step 2: Generate kernel items and add the 2nd component
Introduce a dot in the RHS of the production and add $ as the 2nd component, separated by a comma:
S′ -> .S, $

Step 3: Find closure
(Rule: for A -> α.Xβ, i.e. if there is a non-terminal next to the dot, then include the X-productions.)
To find the 2nd components, compare each item with A -> α.Bβ, a:

I0: S′ -> .S, $        (no β, so $ is the 2nd component of S)
    S -> .L=R, $       (β is '=', a terminal, so '=' becomes a lookahead of L)
    S -> .R, $         (no β, so $ is the 2nd component of R)
    L -> .*R, =/$
    L -> .id, =/$
    R -> .L, $         (no β, so $ is the 2nd component of L)


We notice that some states in the CLR parser have the same core items and differ only in their possible lookahead symbols, such as:
I4 and I4′, I5 and I5′, I7 and I7′, I8 and I8′

So we shrink the obtained CLR parser by merging such states to form the LALR parser. Hence:
The CLR parser has 14 states (I0, I1, I2, I3, I4, I4′, I5, I5′, I6, I7, I7′, I8, I8′, I9).
The LALR(1) parser has 10 states (I0, I1, I2, I3, I4, I5, I6, I7, I8, I9).


PARSER GENERATOR YACC

A parser generator is a program that takes as input a specification of a syntax and produces as output a procedure for recognizing that language. Historically, they are also called compiler-compilers.

YACC (Yet Another Compiler-Compiler) is an LALR(1) (Look-Ahead, Left-to-right scan, Rightmost derivation in reverse, with 1 lookahead token) parser generator. YACC was originally designed to be complemented by Lex.

A YACC input file is divided into three parts:
1. Definitions
2. Rules
3. Auxiliary routines
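
A minimal sketch of that three-part layout (ours, not from the notes; the hand-written yylex() below stands in for the Lex-generated scanner that would normally supply tokens):

%{
/* part 1: definitions -- C declarations and token declarations */
#include <stdio.h>
#include <ctype.h>
int yylex(void);
void yyerror(const char *s) { fprintf(stderr, "error: %s\n", s); }
%}
%token NUMBER
%%
/* part 2: rules -- grammar in modified BNF, actions as C code in { } */
line : NUMBER '\n'   { printf("read a number\n"); }
     ;
%%
/* part 3: auxiliary routines -- plain C code */
int yylex(void)
{
    int c = getchar();
    if (isdigit(c)) {
        while (isdigit(c = getchar()))
            ;
        ungetc(c, stdin);
        return NUMBER;
    }
    return c == EOF ? 0 : c;
}
int main(void) { return yyparse(); }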

Definitions:
The definitions part includes information about the tokens used in the syntax definition:
%token NUMBER
%token ID
Yacc automatically assigns numbers to tokens, but this can be overridden by
%token NUMBER 621
Yacc also recognizes single characters as tokens; therefore, assigned token numbers should not overlap ASCII codes.
The definitions part can include C code external to the definition of the parser and variable declarations, within %{ and %} starting in the first column.
It can also include the specification of the starting symbol in the grammar:
%start nonterminal

Rules part:
The rules part contains the grammar definition in a modified BNF form.
Actions are C code in { } and can be embedded inside the rules (translation schemes).

Auxiliary routines part:
The auxiliary routines part is only C code.
It includes function definitions for every function needed in the rules part.
It can also contain the main() function definition if the parser is going to be run as a program.
The main() function must call the function yyparse().

Steps for compiling a YACC program:
1. Write the lex program in a file file.l and the yacc program in a file file.y
2. Open a terminal and navigate to the directory where you have saved the files
3. Type: lex file.l
4. Type: yacc -d file.y
5. Type: cc lex.yy.c y.tab.c -ll
6. Type: ./a.out


// Program to implement a calculator using the yacc tool

%{
#define YYSTYPE double           /* semantic values are doubles */
%}
%token NUMBER
%left '+' '-'
%left '*' '/'
%%
list : /* nothing */
     | list '\n'
     | list expr '\n'  { printf("\t%.8g\n", $2); }
     ;
expr : NUMBER          { $$ = $1; }
     | expr '+' expr   { $$ = $1 + $3; }
     | expr '-' expr   { $$ = $1 - $3; }
     | expr '*' expr   { $$ = $1 * $3; }
     | expr '/' expr   { $$ = $1 / $3; }
     | '(' expr ')'    { $$ = $2; }
     ;
%%
#include <stdio.h>
#include <ctype.h>
char *progname;
int lineno = 1;

int main(int argc, char *argv[])
{
    progname = argv[0];
    yyparse();
    return 0;
}

int yylex(void)                  /* hand-written scanner */
{
    int c;
    while ((c = getchar()) == ' ' || c == '\t')
        ;                        /* skip blanks and tabs */
    if (c == EOF)
        return 0;
    if (c == '.' || isdigit(c)) {
        ungetc(c, stdin);        /* push the character back and rescan */
        scanf("%lf", &yylval);
        return NUMBER;
    }
    if (c == '\n')
        lineno++;
    return c;
}

void yyerror(const char *s)      /* called by yyparse on a syntax error */
{
    fprintf(stderr, "%s: %s near line %d\n", progname, s, lineno);
}

• bison textFile.y – a y.tab.c file containing C code for a parser is created.
• gcc -c y.tab.c – compile the parser code normally using the gcc C compiler.
• gcc -o parse y.tab.o lex.yy.o -ll -ly – we link the parser, scanner and libraries.
• ./parse – we execute the parser, which reads from standard input.

The bison input file format is similar to the yacc file format, therefore we shall not repeat it here. Following the example calculator we defined in the previous section, the makefile in this case would resemble the one below:

calc: lex.yy.o y.tab.o
	gcc -o calc lex.yy.o y.tab.o -ly -ll

lex.yy.c: calc.l y.tab.c
	flex calc.l

y.tab.c: calc.y
	bison -vdty calc.y
SOLVED PROBLEMS:
Example 1
Construct a Predictive Parsing table for the following grammar & also check whether
string id + id * id is accepted or not.
E → TE′
E′ → +TE′ | ε
T → FT′
T′ → *FT′ | ε
F → (E) | id
Solution −
Step1− Elimination of Left Recursion & perform Left Factoring
As there is no left recursion in Grammar so, we will proceed as it is. Also, there is no
need for Left Factoring.
Step 2 − Computation of FIRST
FIRST(E) = FIRST(T) = FIRST(F) = {(, id}
FIRST(E′) = {+, ε}
FIRST(T′) = {*, ε}

Step3− Computation of FOLLOW
FOLLOW (E) = FOLLOW(E′) = {), $}
FOLLOW (T) = FOLLOW(T′) = {+, ), $}
FOLLOW (F) = {+,*,),$}
Step4− Construction of Predictive Parsing Table
Create the table, i.e., write all non-terminals row-wise & all terminal column-
wise.

Now, fill the table by applying rules from the construction of the Predictive Parsing
Table.

• E → TE′
Comparing E → TE′ with A → α gives A = E, α = TE′.
∴ FIRST(α) = FIRST(TE′) = FIRST(T) = {(, id}
Applying Rule (1) of the Predictive Parsing Table:
∴ ADD E → TE′ to M[E, (] and M[E, id]
∴ write E → TE′ in front of Row (E) and Columns {(, id}    (1)
• E′ → +TE′
Comparing it with A → α gives A = E′, α = +TE′.
∴ FIRST(α) = FIRST(+TE′) = {+}
∴ ADD E′ → +TE′ to M[E′, +]
∴ write production E′ → +TE′ in front of Row (E′) and Column (+)    (2)

• E′ → ε
Comparing it with A → α gives A = E′, α = ε.
∴ FIRST(α) = {ε}
∴ Applying Rule (2) of the Predictive Parsing Table:
Find FOLLOW(E′) = {), $}
∴ ADD production E′ → ε to M[E′, )] and M[E′, $]
∴ write E′ → ε in front of Row (E′) and Columns {), $}    (3)
• T → FT′
Comparing it with A → α gives A = T, α = FT′.
∴ FIRST(α) = FIRST(FT′) = FIRST(F) = {(, id}
∴ ADD production T → FT′ to M[T, (] and M[T, id]
∴ write T → FT′ in front of Row (T) and Columns {(, id}    (4)
• T′ → *FT′
Comparing it with A → α gives A = T′, α = *FT′.
∴ FIRST(α) = FIRST(*FT′) = {*}
∴ ADD production T′ → *FT′ to M[T′, *]
∴ write T′ → *FT′ in front of Row (T′) and Column {*}    (5)
• T′ → ε
Comparing it with A → α gives A = T′, α = ε.
∴ FIRST(α) = FIRST(ε) = {ε}
∴ Applying Rule (2) of the Predictive Parsing Table:
Find FOLLOW(T′) = {+, ), $}
∴ ADD T′ → ε to M[T′, +], M[T′, )] and M[T′, $]
∴ write T′ → ε in front of Row (T′) and Columns {+, ), $}    (6)
• F → (E)
Comparing it with A → α gives A = F, α = (E).
∴ FIRST(α) = FIRST((E)) = {(}
∴ ADD F → (E) to M[F, (]
∴ write F → (E) in front of Row (F) and Column (()    (7)

• F → id
Comparing it with A → α gives A = F, α = id.
∴ FIRST(α) = FIRST(id) = {id}
∴ ADD F → id to M[F, id]
∴ write F → id in front of Row (F) and Column (id)    (8)
Combining all statements (1) to (8) will generate the following Predictive or LL(1) Parsing Table:

      id         +            *            (          )         $
E     E → TE′                              E → TE′
E′               E′ → +TE′                            E′ → ε    E′ → ε
T     T → FT′                              T → FT′
T′               T′ → ε       T′ → *FT′               T′ → ε    T′ → ε
F     F → id                               F → (E)
Step 5 − Checking acceptance of the string id + id * id using the Predictive Parsing program
Initially, the stack will contain the starting symbol E with $ at the bottom of the stack. The input buffer will contain the string with $ attached at the right end.
If the top of the stack = the current input symbol, then the symbol on top of the stack is popped and the input pointer advances to the next symbol.
The sequence of moves by the Predictive Parser:

Stack Input Action

$E id + id * id $ E → TE′

$E′T id + id * id $ T → FT′

$E′T′F id + id * id $ F → id

$E′T′id id + id * id $ Remove id

$E′T′ +id * id $ T′ → ε

$E′ +id * id $ E′ → +TE′

$E′T + +id * id $ Remove +

$E′T id * id $ T → FT′

$E′T′F id * id $ F → id
$E′T′id id * id $ Remove id

$E′T′ * id $ T′ →* FT′

$E′T′F * * id $ Remove *

$E′T′F id $ F → id

$E′T′id id $ Remove id

$E′T′ $ T′ → ε

$E′ $ E′ → ε

$E $ Accept
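
The move sequence above can be reproduced by the standard non-recursive predictive parsing loop. Below is a sketch in C (ours, not from the source), with an assumed compact encoding: 'E', 'T', 'F' as in the grammar, 'e' for E′, 't' for T′, and 'i' standing for the token id; each table entry is the right-hand side to push, the empty string for an ε-production and NULL for error.

#include <stdio.h>
#include <string.h>

/* M[A, a]: the LL(1) table built above, encoded as the RHS string to push */
static const char *M(char A, char a)
{
    switch (A) {
    case 'E': return (a == 'i' || a == '(') ? "Te" : NULL;
    case 'e': if (a == '+') return "+Te";
              return (a == ')' || a == '$') ? "" : NULL;
    case 'T': return (a == 'i' || a == '(') ? "Ft" : NULL;
    case 't': if (a == '*') return "*Ft";
              return (a == '+' || a == ')' || a == '$') ? "" : NULL;
    case 'F': if (a == 'i') return "i";
              return (a == '(') ? "(E)" : NULL;
    }
    return NULL;
}

int main(void)
{
    const char *w = "i+i*i$";                /* id + id * id, $-terminated */
    char stack[64] = "$E";                   /* $ at the bottom, E on top */
    int top = 1, ip = 0;
    for (;;) {
        char X = stack[top], a = w[ip];
        if (X == a) {                        /* match a terminal */
            if (X == '$') { puts("accept"); return 0; }
            top--; ip++;
        } else if (strchr("EeTtF", X)) {     /* expand a non-terminal */
            const char *rhs = M(X, a);
            if (!rhs) { puts("error"); return 1; }
            top--;                           /* pop X, push RHS reversed */
            for (int k = (int)strlen(rhs) - 1; k >= 0; k--)
                stack[++top] = rhs[k];
        } else {
            puts("error");                   /* terminal mismatch */
            return 1;
        }
    }
}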

EXAMPLE 2:
Consider the ambiguous grammar E → E + E, E → E * E, E → (E), E → id.
(a) Construct the LR(0) items for the above grammar. (b) Construct the SLR parsing table for the grammar. (c) Parse the input string id + id * id.

Solution

Step 1 − Construct the augmented grammar:

(0) E′ → E
(1) E → E + E
(2) E → E * E
(3) E → (E)
(4) E → id
Step 2 − Find the closure and goto functions to construct the LR(0) items. Closure(E′ → ∙E) gives the item sets I0–I9 listed earlier in this unit for the same grammar.

Applying goto on I9:
∵ goto cannot be applied on I9, as the dot in E → (E)∙ is in the last position.
Step 3 − Computation of FOLLOW

Applying Rule (1) of FOLLOW: since E′ is the start symbol, $ is placed in FOLLOW(E′), and hence reaches FOLLOW(E).    (1)

• E → E + E
Comparing E → E + E with A → αBβ gives α = ε, B = E, β = +E.
∵ FIRST(β) = FIRST(+E) = {+}
∴ by Rule (2) of FOLLOW, FOLLOW(E) ⊇ {+}    (2)
Applying Rule (3): comparing E → E + E with A → αB gives A = E, α = E+, B = E, so FOLLOW(E) ⊇ FOLLOW(E).    (3)

• E → E * E
Applying Rule (2) and Rule (3) of FOLLOW, we get
FOLLOW(E) ⊇ {*} and FOLLOW(E) ⊇ FOLLOW(E)    (4)

• E → (E)
Applying Rule (2): comparing E → (E) with A → αBβ gives FOLLOW(E) ⊇ {)}    (5)
Rule (3) cannot be applied to this production, as E → (E) cannot be compared with A → αB.    (6)

• E → id
Rules (2) and (3) of FOLLOW cannot be applied to E → id, as it has no non-terminal on the right side.

Combining (1) to (6), we get
FOLLOW(E) = {$, +, *, )}

Step4 − Construction of SLR Parsing Table

In the parsing table, conflicts occur in the rows for states 7 and 8, in the columns for * and +:
in Action[7, +], Action[7, *], Action[8, +] and Action[8, *] there occurs a shift-reduce conflict.
The associativity and precedence rules can remove this conflict.

Parsing the string id + id * id:

Stack                    Input           Action
0                        id + id * id $  shift
0 id 3                   + id * id $     reduce by E → id
0 E 1                    + id * id $     shift
0 E 1 + 4                id * id $       shift
0 E 1 + 4 id 3           * id $          reduce by E → id
0 E 1 + 4 E 7            * id $          conflict, i.e. s5 or r1; ∵ * has higher precedence than +, Action[7, *] = s5, so shift
0 E 1 + 4 E 7 * 5        id $            shift
0 E 1 + 4 E 7 * 5 id 3   $               reduce by E → id
0 E 1 + 4 E 7 * 5 E 8    $               reduce by E → E * E
0 E 1 + 4 E 7            $               reduce by E → E + E
0 E 1                    $               accept

The above parse resolves the conflict in Action[7, *]: Action[7, *] = s5 instead of r1.
Similarly, on parsing the string id + id + id:

Stack                    Input           Action
0                        id + id + id $  shift
0 id 3                   + id + id $     reduce by E → id
0 E 1                    + id + id $     shift
0 E 1 + 4                id + id $       shift
0 E 1 + 4 id 3           + id $          reduce by E → id
0 E 1 + 4 E 7            + id $          conflict, i.e. Action[7, +] = s4 or r1; ∵ + is left associative, 0 E 1 + 4 E 7 will be reduced before shifting +, so Action[7, +] = r1: reduce by E → E + E
0 E 1                    + id $          shift
0 E 1 + 4                id $            shift
0 E 1 + 4 id 3           $               reduce by E → id
0 E 1 + 4 E 7            $               reduce by E → E + E
0 E 1                    $               accept

So the above parse shows how to resolve the shift-reduce conflict at Action[7, +]: Action[7, +] = r1 instead of s4.
Similarly, the other entries Action[8, +] and Action[8, *] can be resolved by taking the strings id * id * id and id * id + id. The resolution is Action[8, +] = r2 and Action[8, *] = r2.

9. ASSIGNMENT : UNIT – II

Set Number (Category)    Questions

Set - 1 (Toppers)
Perform LR(1) parsing for the given grammar:
S → Aa / bAc / Bc / bBa
A → d
B → d
Check whether the grammar is LR(1) but not LALR(1). (CO2, K2)

Set - 2 (Above Average)
Consider the following grammar:
E → E + T / T
T → TF / F
F → F* / a / b
Construct the SLR parsing table for this grammar and also parse the input a * b + a. (CO2, K2)

Set - 3 (Average)
Consider the following grammar:
E → E + T / T
T → T * F / F
F → (E) / id
Construct the SLR(1) parsing table for this grammar and also parse the input string id * id + id. (CO2, K2)

Set - 4 (Below Average)
Construct the predictive parser for the following grammar:
S → a / ^ / (T)
T → T,S / S
and parse the input string (a,(a,a)). (CO2, K2)

Set - 5 (Slow Learners)
Perform shift-reduce parsing for the following grammar:
S → (L) / a
L → L,S / S
and parse the input string (a,(a,a)). (CO2, K2)

10. Part A – Questions & Answers

S.No Questions K CO
Level Level
1. Differentiate Top Down Parser And Bottom Up
Parser? Give Example for each.
Top down Parser are the parsers which constructs the
parse tree from the root to the leaves in pre- order for the
given input string. Predictive Parser, Recursive Descendent
K1
Parser. CO2

Bottom Up Parser are the parsers which constructs the


parse tree from the leaves to the root for the given input
string. LR Parser, SLR Parser.

2. Compare syntax tree and parse tree.


Syntax tree is a variant of a parse tree in which each leaf
represents an operand and each interior node represents an
operator.
A parse tree may be viewed as a graphical representation for
a derivation that filters out the choice regarding replacement
order. Each interior node of a parse tree is labeled by some K1 CO2

nonterminal A and that the children of the node are labeled


from left to right by symbols in the right side of the
production by which this A was replaced in the derivation.
The leaves of the parse tree are terminal symbols.

3. Define Handles.
A handle of a string is a substring that matches the right side
of a production. This reduction helps in constructing the
K1 CO2
parse tree or right most derivation.

4. Define ambiguous grammar with an example, and
specify it demerits.
If a grammar produces more than one parse tree for the
given input string then it is called ambiguous grammar. Its
K1 CO2
demerit is It is difficult to select or determine which parse
tree is suitable for an input string.

Ex:
E E+E / E*E / id
5. Mention the properties of parse tree.
• The root is labeled by the start symbol.
• Each leaf is labeled by a token or by ε.
• Each interior node is labeled by a non-terminal.    K1 CO2
• If A is the non-terminal labeling some interior node and x1, x2, …, xn are the labels of its children from left to right, then A → x1 x2 … xn is a production.

6. What do you mean by a syntax tree?


Syntax tree is a variant of a parse tree in which each leaf
K1 CO2
represents an operand and each interior node represents
an operator.

7. Define Handle pruning.


A technique to obtain the rightmost derivation in reverse
(called canonical reduction sequence) is known as handle
pruning (i.e.) starting with a string of terminals w to be CO2
K1
parsed. If w is the sentence of the grammar then = n
where n is the nth right sentential form of unknown right
most derivation.

8. How will you define a context free grammar?
A context free grammar consists of terminals, non-terminals,
a start symbol, and productions.
i.Terminals are the basic symbols from which strings are
formed. “Token” is a synonym for terminal. Ex: if, then,
K1 CO2
else.
ii.Nonterminals are syntactic variables that denote sets of
strings, which help define the language generated by the
grammar. Ex: stmt, expr.

iii. Start symbol is one of the nonterminals in a grammar and


the set of strings it denotes is the language defined by the
grammar. Ex: S.
iv.The productions of a grammar specify the manner in
which the terminals and non-terminals can be combined to
form strings Ex: expr-> id

9 What is left factoring? Give an example.


Left factoring is a grammar transformation that is useful for K3 CO2
producing a grammar suitable for predictive parsing.

10. Draw syntax tree for the expression a=b*– c+b*– c.

K2 CO2

11. Construct a parse tree of (a+b)*c for the grammar E -> E+E / E*E / (E) / id, (or) a parse tree of –(id+id) for the same grammar.
>E+E/E*E/(E)/id. (or) grammar –(id+id).

CO2
K2

12. Eliminate left recursion for the grammar A -> Ac | Aad | bd.    K2 CO2
A -> bd A'
A' -> c A' | ad A' | ε

13. What are the various conflicts that occur during shift
reduce parsing?

K1 CO2
Reduce/Reduce conflict
Shift/ Reduce conflict

14. What is dangling reference?
A dangling reference occurs when there is a reference to
storage that has been deallocated. It is a logical error to use
dangling references, since the value of deallocated storage is
K1 CO2
undefined according to the semantics of most languages.

15. Write the rule to eliminate left recursion in a


grammar. K1 CO2
A -> Aα | β  :  A -> βA' ; A' -> αA' | ε
16. Derive the string and construct a syntax tree for the
input string ceaedae using the grammar S -> SaA | A, A -> AbB | B, B -> cSd | e.    K2 CO2
S ⇒ SaA
  ⇒ AaA
  ⇒ BaA
  ⇒ cSdaA
  ⇒ cSaAdaA
  ⇒ cAaAdaA
  ⇒ cBaAdaA
  ⇒ ceaAdaA
  ⇒ ceaBdaA
  ⇒ ceaedaA
  ⇒ ceaedaB
  ⇒ ceaedae
17. List the factors to be considered for top-down
parsing.
We begin with the start symbol and at each step, expand
one of the remaining non-terminals by replacing it with the K1 CO2
right side of one of its productions. We repeat until only
terminals remain. The top-down parse produces a leftmost
derivation of the sentence.

18. Mention the role of semantic analysis.
It is used to check the type information of the syntactically
verified statements. K1 CO2

19. What is the output of syntax analysis phase? What


are the three general types of parsers for grammars?
Parser (or) parse tree is the output of syntax analysis phase
General types of parsers:

1) Universal parsing
2) Top-down K1 CO2
3) Bottom-up
20. Write short notes on YACC.
YACC is an automatic tool for generating the parser program.
YACC stands for Yet Another Compiler Compiler which is
basically the utility available from UNIX. Basically YACC is K1 CO2

LALR parser generator. It can report conflict or ambiguities in


the form of error messages.

21. What is phrase level error recovery?


Phrase Level Recovery strategy is employed by most error
repairing compilers to recover from syntactic error. The main
idea in this method is to perform local correction on those K1 CO2

input strings that were remaining unchecked on discovering


an error.

22. What are the different strategies that a parser can
employ to recover from a syntactic error?
• Panic mode
• Phrase level
• Error productions K1 CO2
• Global correction

23. What is phrase level error recovery?


On discovering an error, a parser may perform local
correction on the remaining input; that is, it may
K1
replace a prefix of the remaining input by some
string that allows the parser to continue. This is
CO2
known as phrase level error recovery.

24. What is a parse tree?


A parse tree may be viewed as a graphical
representation for a derivation that filters out the
choice regarding replacement order. Each interior
node of a parse tree is labeled by some nonterminal
K1 CO2
A and that the children of the node are labeled from
left to right by symbols in the right side of the
production by which this A was replaced in the
derivation. The leaves of the parse tree are terminal
symbols.

25. Write the algorithm for FIRST and FOLLOW in a parser.
FIRST(α) is the set of terminals that begin strings derived from α.

Rules to compute FIRST(X), where X is a grammar symbol:
• If X is a terminal, then FIRST(X) = {X}.
• If X → ε is a production, then add ε to FIRST(X).
• If X is a non-terminal and X → Y1 Y2 … Yk is a production, then add FIRST(Y1) to FIRST(X). If Y1 derives ε, then also add FIRST(Y2) to FIRST(X), and so on.

FOLLOW(A) is the set of terminals a that can appear immediately to the right of A in some sentential form. For a rightmost sentential form ending in A, $ will be in FOLLOW(A).    K2 CO2

Rules:
• If $ is the input end-marker and S is the start symbol, $ ∈ FOLLOW(S).
• If there is a production A → αBβ, then (FIRST(β) – {ε}) ⊆ FOLLOW(B).
• If there is a production A → αB, or a production A → αBβ where ε ∈ FIRST(β), then FOLLOW(A) ⊆ FOLLOW(B).
26. Differentiate sentence and sentential form.    K1 CO2
Sentence: if S ⇒* w, then the string w is called a sentence of G. A sentence is a string of terminals; it is a sentential form with no non-terminals.
Sentential form: if S ⇒* α, then α is a sentential form of G. A sentential form may contain non-terminals.

27. Define a context free grammar.
A context free grammar G is a collection of the following
V is a set of non terminals
T is a set of terminals
S is a start symbol
K1 CO2
P is a set of production rules
G can be represented as G = (V,T,S,P)
Production rules are given in the following form
Non terminal → (V U T)*
28. Define ambiguous grammar.
A grammar G is said to be ambiguous if it generates more K1 CO2
than one parse tree for some sentence of language L(G).

29. Define left factoring.


Left factoring is a
grammar transformation that is useful for producing a
grammar suitable for predictive parsing. The basic idea is
K1 CO2
that when it is not clear which of two alternative productions
to use to expand a nonterminal “A”, we may be able to
rewrite the “A” productions to refer the decision until we
have seen enough of the input to make the right choice.

30. Define Terminals.
Terminals are the basic symbols from which strings are
formed. A set of productions (P). The productions of a K1 CO2
grammar specify the manner in which the terminals and non-
terminals can be combined to form strings.

31. Define Non-Terminals.


Non-terminals are syntactic variables that denote sets of
strings. The non-terminals define sets of strings that help
K1 CO2
define the language generated by the grammar. A set of
tokens, known as terminal symbols (Σ). Terminals are the
basic symbols from which strings are formed. A set of
productions (P).

32. Define Distinguished symbol.


In a grammar, one non terminal is distinguished as the start K1 CO2
symbol, and the set of strings it denotes is the language
defined by the grammar.
33. Define Production.
The productions of a grammar specify the manner in which
the terminals and non-terminals can be combined to form
strings. Each production consists of a non-terminal called K1 CO2

the left side of the production, an arrow, and a sequence of


tokens and/or on- terminals, called the right side of the
production.

34. What are the different strategies that a parser can employ to
recover from a syntactic error?
There are four common error-recovery strategies that can be
implemented in the parser to deal with errors in the code.
Panic mode
K2 CO2
Statement mode
Error productions
Global correction
Abstract Syntax Trees

35. Define context free language. When will you say that two K1 CO2
CFGs are equal?
A context-free grammar is a set of recursive rules used
to generate patterns of strings. A context-free grammar
can describe all regular languages and more, but they
cannot describe all possible languages. Context-free
grammars are studied in fields of theoretical computer
science, compiler design, and linguistics.
Two grammars G1 and G2 are defined to be equivalent if
they generate the same language. The
grammar equivalence problem is the problem of deciding
whether two context-free grammars are equivalent.

11. Part B – Questions

S.No Questions K CO
Level Level
1. Explain error-recovery strategies in detail. K2 CO2

2. Explain Top- Down parsing and Bottom up Parsing. K2 CO2

3. Explain Error Recovery in Predictive Parsing. K2 CO2

4. Construct an SLR parsing table for the above grammar.


E -> E + T
E -> T
T -> T * F K3 CO2
T -> F
F -> (E)

F-> id
5. Construct an SLR parsing table for the given grammar.
K3 CO2
G: E -> E+T | T,  T -> TF | F,  F -> F* | a | b

6. Explain LR parsing algorithm with an example. K2 CO2

7. Check whether the following grammar is a LL(1)


grammar.
S-> iEtS | iEtSeS | a K3 CO2
E-> b

Also explain the FIRST and FOLLOW procedures.


8. Consider the grammar E -> E + E | E * E | (E) | id. Show
the sequence of moves made by the shift-reduce parser
K3 CO2
on the input id1 + id2 * id3 and determine whether the
given string is accepted by the parser or not.

9. Construct Parsing table for the grammar and find moves
made by predictive parser on input id + id * id and find
FIRST and FOLLOW.
E -> E + T
E -> T CO2
T -> T * F K3

T -> F

F -> (E)
F-> id

10. Give an algorithm for finding the FIRST and FOLLOW


K2 CO2
positions for a given non-terminal.

11. Explain Context free grammars with examples. K2 CO2

12. Consider the grammar,


E -> E + T
E -> T
T -> T * F
T -> F
F -> (E) K3 CO2

F-> id
Construct a LALR parsing table for the grammar given
above. Verify whether the input string id + id * id is
accepted by the grammar or not.

13. What is a shift-reduce parser? Explain in detail the


K2 CO2
conflicts that may occur during shift-reduce parsing.

Consider the grammar given below.
14.
E -> E + T
E -> T
T -> T * F
T -> F CO2
F -> (E) K3
F-> id
Construct an LR parsing table for the above grammar.
Give the moves of LR parser on id*id+id.

15. Explain the non-recursive predictive parsing with its


CO2
algorithm. Explain the LR parsing algorithm in detail. K2

What is an ambiguous grammar? Is the following


16. grammar ambiguous? Prove CO2
K3
E -> E + E | E * E | (E) | id.

Explain a syntax rule (YACC) for arithmetic expression in


17. K2 CO2
detail.

18. Explain SLR parsing table with an example. K2 CO2

19. Explain CLR construction algorithm with an example. K2 CO2

20. Construct a predictive parser for the grammar. Verify


whether the input string id+id*id is accepted by the
grammar or not.
E -> TE’
K3 CO2
E’ -> +TE/ ε
T -> FT’
T’ -> *FT’ / ε
F -> (E)/id
21. Consider the grammar given below.
S -> CC
C -> aC CO2
C -> d
Construct a CLR parsing table for the above grammar. K3
22. i) Construct parse tree for the input string w = cad using
top-down parser.
S -> cAd
A -> ab | a    CO2
K3
K3
ii)List all LR(0) items for the following grammar.
S->AS|b
A->SA|a
23. Construct the LALR parsing table for the grammar.

E->E+T/T
K3 CO2
T->TF/F

F->F*/a/b
24. Show that the following grammar
S -> Aa / bAc / dc / bda    K3 CO2
A -> d
is LALR(1) but not SLR(1).

25. Show that the following grammar is LR(1) but not LALR(1).
S -> Aa / bAc / Bc / bBa    K3 CO2
A -> d
B -> d

26. Construct a predictive parser for the grammar    CO2 K3
E -> E '|' T | T
T -> T '&' F | F
F -> !F | (E) | 1 | 0

27. Check whether the following grammar is LL(1) grammar. CO2

S -> iE tS | iEtSeS | a
K3
E -> b.

Construct the predictive parser for the following CO2


28. grammar. Construct the behaviour of the parser on the K3
sentence (a,a) using the grammar

S -> (L)/a

L -> L, S|S.

Show that the following grammar CO2

S -> AaAb/BbBa

29. A -> ε
K3
B -> ε

is LL(1) but not SLR(1).

11. Part C – Questions

PART C QUESTIONS (CO2, K3)

1. Consider the CFG depicted below where “begin”, “end” and “x” are all
terminal symbols of the grammar and stat is considered the starting symbol
for this grammar. Productions are numbered in parenthesis and you can
abbreviate “begin” to “b” and “end” to “e” respectively.

Stat -> Block


Block -> begin Block end
Block -> Body
Body -> x
(i) Compute the set of LR(1) items for this grammar and draw the corresponding DFA. Do not forget to augment the grammar with the initial production S′ → Stat $ as production (0).
(ii) Construct the corresponding LR parsing table.

2. Show that the grammar is LR(1) but not LALR(1)


S- > Aa | bAc | Bc |bBa
A->d

B->d
And parse the statement “bdc” and “dd”.
3. Show that the following grammar:
S -> Aa | bAc | dc | bda
A -> d
is LALR(1) but not SLR(1).

4. Design the syntax analyzer using the YACC tool.

12. Supportive online Certification courses

NPTEL : https://nptel.ac.in/courses/106105190
Swayam : https://www.classcentral.com/course/swayam-compiler-design-12926
coursera : https://www.coursera.org/learn/nand2tetris2
Udemy : https://www.udemy.com/course/introduction-to-compiler-construction-and-design/
Mooc : https://www.mooc-list.com/course/compilers-coursera
Edx : https://www.edx.org/course/compilers

13. REAL TIME APPLICATIONS : UNIT – II

Natural Language Processing - Syntactic Analysis


Syntactic analysis or parsing or syntax analysis is the third phase of NLP. The
purpose of this phase is to draw exact meaning, or you can say dictionary meaning
from the text. Syntax analysis checks the text for meaningfulness comparing to the
rules of formal grammar. For example, the sentence like “hot ice-cream” would be
rejected by semantic analyzer. In this sense, syntactic analysis or parsing may be
defined as the process of analyzing the strings of symbols in natural language
conforming to the rules of formal grammar. The origin of the word ‘parsing’ is from
Latin word ‘pars’ which means ‘part’.

Detailed description is available :


https://www.tutorialspoint.com/natural_language_processing/natural_language_pro
cessing_syntactic_analysis.htm

Some of the most common real-world applications:

• Text summarization, to automatically provide the most important points in the original document. This is particularly good for news summaries and for learning semantic relations between named entities.
• POS or part-of-speech tagging of the language. It is about finding parts of speech such as nouns, adjectives and verbs based on context and relationship to adjacent words.
• Coreference resolution. It is about understanding references to multiple entities existing in the text and disambiguating those references, for example the difference between a mouse (the animal) and a mouse (PC input device).
• Argument mining, which is the automatic extraction and identification of argumentative structures in a piece of text, typically a journal article.

14. CONTENTS BEYOND SYLLABUS : UNIT – II

In computer-based language recognition, ANTLR (pronounced antler), or ANother Tool


for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is
the successor to the Purdue Compiler Construction Tool Set (PCCTS), first
developed in 1989, and is under active development. Its maintainer is Professor Terence
Parr of the University of San Francisco.
Example
In the following example, a parser described in ANTLR recognizes sums of expressions of the form "1 + 2 + 3":

// Common options, for example, the target language


options
{
language = "CSharp";
}
// Followed by the parser
class SumParser extends Parser;
options
{
k = 1; // Parser Lookahead: 1 Token
}
// Definition of an expression
statement: INTEGER (PLUS^ INTEGER)*;
// Here is the Lexer
class SumLexer extends Lexer;
options
{
k = 1; // Lexer Lookahead: 1 characters
}
PLUS: '+';
DIGIT: ('0'..'9');
INTEGER: (DIGIT)+;

15. ASSESSMENT SCHEDULE

• Tentative schedule for the Assessment During 2024-2025 Even Semester:

S.NO   Name of the Assessment       Start Date    End Date      Portion
1      Unit Test 1                  06.01.2025    11.01.2025    UNIT 1
2      Internal Assessment Test 1   28.01.2025    03.02.2025    UNIT 1 & 2
3      Unit Test 2                  17.02.2025    22.02.2025    UNIT 3
4      Internal Assessment Test 2   10.03.2025    15.03.2025    UNIT 3 & 4
5      Model Examination            03.04.2025    17.04.2025    ALL 5 UNITS
16. PRESCRIBED TEXT BOOKS & REFERENCE BOOKS

• TEXT BOOKS:

Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman, “Compilers:


Principles, Techniques and Tools”, Second Edition, Pearson Education
Limited, 2014.

• REFERENCE BOOKS:
1. Randy Allen, Ken Kennedy, “Optimizing Compilers for Modern Architectures: A
Dependence-based Approach”, Morgan Kaufmann Publishers, 2002.
2. Steven S. Muchnick, “Advanced Compiler Design and Implementation”, Morgan
Kaufmann Publishers - Elsevier Science, India, Indian Reprint, 2003.
3. Keith D Cooper and Linda Torczon, “Engineering a Compiler”, Morgan
Kaufmann Publishers, Elsevier Science, 2004.
4. V. Raghavan, “Principles of Compiler Design”, Tata McGraw Hill Education
Publishers, 2010.
5. Allen I. Holub, “Compiler Design in C”, Prentice-Hall Software Series, 1993.

17. MINI PROJECT SUGGESTION
• Objective:
This module facilitate hands-on skills of the students (from the practical courses
more effectively) and they can try the following mini projects for deep
understanding in Compiler Design
Planning:
• This method is mostly used to improve the ability of students in application domain
and also to reinforce knowledge imparted during the lecture.
• Being a technical institute, this method is extensively used to provide empirical
evidence of theory learnt.
• Students are asked to prepare mini projects involving application of the
concepts, principles or laws learnt.
• The faculty guides the students at various stages of developing the project and gives timely inputs for the development of the model.
Projects:
Set
Number Mini Project Title
(Category)
Set - 1 Design, develop and implement YACC/C program to
(Toppers)
demonstrate Shift Reduce Parsing technique for the
grammar rules: E →E+T | T, T →T*F | F, F → (E) |id and parse
the sentence: id + id * id (CO2, K3)

Set - 2 Design, develop and implement YACC/C program to


(Above
Average) demonstrate Shift Reduce Parsing technique for the
grammar rules: E →E+T | T, T →T*F | F, F → (E) |id and parse
the sentence: id + id * id (CO2, K3)
Set - 3 Design a "dynamic" LL(1) parser program (CO2, K3)
(Average)

Set - 4 Generate SLR Parse Table From CFG Grammar (CO2, K3)
(Below
Average)

Set - 5 Design a Simple Calculator using LEX and YACC tools


(Slow (CO2, K3)
Learners)
Thank you
