CSC412 Compiler Construction I March 24 2022 NOUN-pages-1
CSC412 Compiler Construction I March 24 2022 NOUN-pages-1
GUIDE
CIT 316
COMPILER CONSTRUCTION I (PRINCIPLES
AND TECHNIQUES OF COMPILER)
Published by:
National Open University of Nigeria
ISBN: 978-058-470-6
ii
CIT 316 COURSE GUIDE
CONTENTS PAGE
Introduction ……………………………………………………. iv
What You Will Learn In This Course …………………………. iv
Course Aims …………………………………………………… v
Course Objectives ……………………………………………… v
Working through this Course …………………………………... v
Course Materials ……………………………………………….. vi
Study Units ……………………………………………………… vi
Textbooks and References ……………………………………… vii
Assignment File ………………………………………………… viii
Presentation Schedule ………………………………………….. viii
Assessment ……………………………………………………… viii
Tutor-Marked Assignment ……………………………………… ix
Final Examination and Grading ………………………………… ix
Course Marking Scheme ………………………………………... x
Course Overview ……………………………………………….. xi
How to Get the most of this Course ……………………………. xi
Tutors and Tutorials …………………………………………….. xiii
Summary ………………………………………………………… xiv
iii
CIT 316 COURSE GUIDE
INTRODUCTION
This course is divided into four modules. The first module deals with
the review of grammars, languages and automata, introduction to
compiler.
The third module deals with syntax analysis and discusses context-
free grammars, LL (k), LR (k), operator-precedence grammars, etc.
Also, implementation of various parsing techniques was discussed.
This Course Guide gives you a brief overview of the course contents,
course duration, and course materials.
iv
CIT 316 COURSE GUIDE
COURSE AIMS
The third goal is to build the foundation for students to pursue the
research in the areas of compiler, programme analysis, programme
modelling, and operating systems
COURSE OBJECTIVES
Certain objectives have been set out to ensure that the course
achieves its aims. Apart from the course objectives, every unit of this
course has set objectives. In the course of the study, you will need to
confirm, at the end of each unit, if you have met the objectives set at
the beginning of each unit. Upon completing this course you should
be able to:
Related Courses
Prerequisites: CIT 342; Computer Science students only
COURSE MATERIALS
These include:
1. Course Guide
2. Study Units
3. Recommended Texts
4. A file for your assignments and for records to monitor your
progress.
STUDY UNITS
vi
CIT 316 COURSE GUIDE
Aho, Alfred & Sethi, Ravi & Ullman, Jeffrey. Compilers: Principles,
Techniques, and Tools ISBN 0201100886 The Classic Dragon
book.
vii
CIT 316 COURSE GUIDE
ASSIGNMENTS FILE
These are of two types: the self-assessment exercises and the Tutor-
Marked Assignments. The self-assessment exercises will enable you
monitor your performance by yourself, while the Tutor-Marked
Assignment is a supervised assignment. The assignments take a
certain percentage of your total score in this course. The Tutor-
Marked Assignments will be assessed by your tutor within a specified
period. The examination at the end of this course will aim at
determining the level of mastery of the subject matter. This course
includes twelve Tutor-Marked Assignments and each must be done
and submitted accordingly. Your best scores however, will be
recorded for you. Be sure to send these assignments to your tutor
before the deadline to avoid loss of marks.
PRESENTATION SCHEDULE
ASSESSMENT
There are two aspects to the assessment of the course. First are the
tutor marked assignments; second, is a written examination.
At the end of the course, you will need to sit for a final three-hour
examination. This will also count for 70% of your total course mark.
viii
CIT 316 COURSE GUIDE
Assignment questions for the units in this course are contained in the
Assignment File. You should be able to complete your assignments
from the information and materials contained in your set textbooks,
reading and study units. However, you may wish to use other
references to broaden your viewpoint and provide a deeper
understanding of the subject.
The final examination for the course will carry 70% percentage of the
total marks available for this course. The examination will cover
every aspect of the course, so you are advised to revise all your
corrected assignments before the examination.
This course endows you with the status of a teacher and that of a
learner. This means that you teach yourself and that you learn, as
your learning capabilities would allow. It also means that you are in a
better position to determine and to ascertain the what, the how, and
the when of your language learning. No teacher imposes any method
of learning on you.
ix
CIT 316 COURSE GUIDE
This table shows how the actual course marking is broken down.
Assessment Marks
Assignment 1- 4 Four assignments, best three marks of the
four count at 30% of course marks
Final Examination 70% of overall course marks
Total 100% of course marks
COURSE OVERVIEW
x
CIT 316 COURSE GUIDE
Each of the study units follows a common format. The first item is
an introduction to the subject matter of the unit and how a particular
unit is integrated with the other units and the course as a whole. Next
is a set of learning objectives. These objectives enable you know
what you should be able to do by the time you have completed the
unit. You should use these objectives to guide your study. When you
have finished the units you must go back and check whether you have
achieved the objectives. If you make a habit of doing this you will
significantly improve your chances of passing the course.
Remember that your tutor’s job is to assist you. When you need help,
don’t hesitate to call and ask your tutor to provide it.
xi
CIT 316 COURSE GUIDE
7. Review the objectives for each study unit to confirm that you
have achieved them. If you feel unsure about any of the
objectives, review the study material or consult your tutor.
8. When you are confident that you have achieved a unit’s
objectives, you can then start on the next unit. Proceed unit by
unit through the course and try to pace your study so that you
keep yourself on schedule.
9. When you have submitted an assignment to your tutor for
marking, do not wait for its return before starting on the next
unit. Keep to your schedule. When the assignment is
returned, pay particular attention to your tutor’s comments,
both on the tutor-marked assignment form and also written on
the assignment. Consult your tutor as soon as possible if you
have any questions or problems.
10. After completing the last unit, review the course and prepare
yourself for the final examination. Check that you have
achieved the unit objectives (listed at the beginning of each
unit) and the course objectives (listed in this Course Guide).
Your tutor will mark and comment on your assignments, keep a close
watch on your progress and on any difficulties you might encounter
and provide assistance to you during the course. You must mail or
submit your tutor-marked assignments to your tutor well before the
due date (at least two working days are required). They will be
marked by your tutor and returned to you as soon as possible.
You should try your best to attend the tutorials. This is the only
chance to have face to face contact with your tutor and to ask
xii
CIT 316 COURSE GUIDE
questions which are answered instantly. You can raise any problem
encountered in the course of your study. To gain the maximum
benefit from course tutorials, prepare a question list before attending
them. You will learn a lot from participating in discussions actively.
SUMMARY
We hope that by the end of this course you would have acquired the
required knowledge to view compilers, programming languages and
programming environments in a new way.
We wish you success with the course and hope that you will find it
both interesting and useful.
xiii
MAIN
COURSE
CONTENTS PAGE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Formal Grammar
3.1.1 The Syntax of Grammars
3.1.2 The Semantic of Grammars
3.2 Types of Grammar and Automata
3.2.1 Type-0: Unrestricted Grammars
3.2.2 Type-1: Context-Sensitive Grammars
3.2.3 Type-2: Context-Free Grammars
3.2.4 Type-3: Regular Grammars
3.2.5 Analytic Grammars
3.3 Chomsky Hierarchy
3.3.1 The Hierarchy
3.4 Formal Languages
3.4.1 Words over an Alphabet
3.4.2 Language-Specification Formalisms
3.4.3 Operations on Languages
3.4.4 Uses of Formal Languages
3.5 Programming Languages
3.5.1 Formal Theories, Systems and Proofs
3.5.2 Interpretations and Models
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
1
CIT 316 COMPILER CONSTRUCTION
the grammar. The set of all strings that can be generated from the
grammar constitutes the language of the grammar.
In this unit, you will be taken through some of the things you learnt
previously on formal grammar, formal language and automata.
2.0 OBJECTIVES
Example 3.1
1.
2.
then we start with S, and can choose a rule to apply to it. If we choose
rule 1, we obtain the string aSb. If we choose rule 1 again, we replace S
with aSb and obtain the string aaSbb. If we now choose rule 2, we
replace S with ba and obtain the string aababb, and are done. We can
write this series of choices more briefly, using symbols:
. The language of the grammar is then
the infinite
set , where ak
is a repeated k times (and n in particular represents the number of times
production rule 1 has been applied).
• where * is the Kleene star operator and denotes set union. That
is, each production rule maps from one string of symbols to
another, where the first string (the "head") contains at least one
non-terminal symbol. In the case that the second string (the
"body") consists solely of the empty string – i.e. it contains no
symbols at all, it may be denoted with a special notation (often Λ,
e or ε) in order to avoid confusion.
4
CIT 316 MODULE 1
Example 3.2
Note that for these examples, formal languages are specified using set-
builder notation.
1.
2.
3.
4.
5
CIT 316 COMPILER CONSTRUCTION
The difference between these types is that they have increasingly strict
production rules and can express fewer formal languages. Two
important types are context-free grammars (Type 2) and regular
grammars (Type 3). The languages that can be described with such a
grammar are called context-free languages and regular languages,
respectively. Although much less powerful than unrestricted grammars
(Type 0), which can in fact express any language that can be accepted
by a Turing machine, these two restricted types of grammars are most
often used because parsers for them can be efficiently implemented
The language defined above is not a context-free language, and this can
be strictly proven using the pumping lemma for context-free languages,
but for example the language (at least 1 a followed by
the same number of b's) is context-free, as it can be defined by the
grammar G2 with , , S the start symbol, and the
following production rules:
6
CIT 316 MODULE 1
1.
2.
In regular grammars, the left hand side is again only a single non-
terminal symbol, but now the right-hand side is also restricted. The right
side may be the empty string, or a single terminal symbol, or a single
terminal symbol followed by a non-terminal symbol, but nothing else.
(Sometimes a broader definition is used: one can allow longer strings of
terminals or single non-terminals without anything else, making
languages easier to denote while still defining the same class of
languages.)
7
CIT 316 COMPILER CONSTRUCTION
8
CIT 316 MODULE 1
Recursively enumerable
Context-sensitive
Context-free
Regular
9
CIT 316 COMPILER CONSTRUCTION
However, there are further categories of formal languages that you can
read more about in the further reading.
10
CIT 316 MODULE 1
Examples 3.3
a. The following rules describe a formal language L over the
alphabet Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, +, =}:
b. Every nonempty string that does not contain + or = and does not
start with 0 is in L.
The string 0 is in L.
c. A string containing = is in L if and only if there is exactly one =,
and it separates two valid strings in L.
d. A string containing + but not = is in L if and only if every + in
the string separates two valid strings in L.
e. No string is in L other than those implied by the previous rules.
f. Under these rules, the string "23+4=555" is in L, but the string
"=234=+" is not. This formal language expresses natural
numbers, well-formed addition statements, and well-formed
addition equalities, but it expresses only what they look like (their
syntax), not what they mean (semantics). For instance, nowhere
11
CIT 316 COMPILER CONSTRUCTION
12
CIT 316 MODULE 1
Example 3.4
a. Suppose L1 and L2 are languages over some common alphabet.
b. The concatenation L1L2 consists of all strings of the form vw
where v is a string from L1 and w is a string from L2.
c. The intersection L1 ∩ L2 of L1 and L2 consists of all strings which
are contained in both languages
d. The complement ¬L of a language with respect to a given
alphabet consists of all strings over the alphabets that are not in
the language.
e. The Kleene star: the language consisting of all words that are
concatenations of 0 or more words in the original language;
Reversal:
f. Let e be the empty word, then eR = e, and
g. for each non-empty word w = x1…xn over some alphabet, let
wR = xn…x1,
h. then for a formal language L, LR = {wR | w ∈ L}.
String homomorphism
Such string operations are used to investigate closure properties of
classes of languages. A class of languages is closed under a particular
operation when the operation, applied to languages in the class, always
produces a language in the same class again. For instance, the context-
13
CIT 316 COMPILER CONSTRUCTION
Formal languages are often used as the basis for richer constructs
endowed with semantics. In computer science they are used, among
other things, for the precise definition of data formats and the syntax of
programming languages.
Formal languages play a crucial role in the development of compilers,
typically produced by means of a compiler, which may be a single
programme or may be separated in tools like lexical analyser generators
(e.g. lex), and parser generators (e.g. yacc). Since formal languages
alone do not have semantics, other formal constructs are needed for the
formal specification of programme semantics.
SELF-ASSESSMENT EXERCISE II
14
CIT 316 MODULE 1
15
CIT 316 COMPILER CONSTRUCTION
4.0 CONCLUSION
In this unit you have been taken through a brief revision of formal
grammars, formal languages and automata because of crucial roles they
play in the development of compilers. You should read more on these
various topics before proceeding to the next unit. In the subsequent
units, you will be learning about compiler construction. Precisely, unit 2
of this module will introduce you to the concept of compilers.
5.0 SUMMARY
16
CIT 316 MODULE 1
17
CIT 316 COMPILER CONSTRUCTION
18
CIT 316 MODULE 1
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Translators
3.2 Why do we need Translators?
3.3 What is a Compiler?
3.4 The Challenges in Compiler Development
3.5 Compiler Architecture
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
In the previous unit you were taken through some basic concepts you
learnt in an earlier course. This was done because of their
relevance/importance to your understanding of this course.
In this unit you will be introduced to the concept of compilers and their
importance to programme development.
2.0 OBJECTIVES
3.1 Translators
19
CIT 316 COMPILER CONSTRUCTION
Input
Fig. 1: Compilation and Execution
20
CIT 316 MODULE 1
You should note that both interpreters and compilers (like any other
programme) are written in some high-level programming language
(which may be different from the language they accept) and they are
translated into machine code. For example, a Java interpreter can be
completely written in Pascal, or even Java. The interpreter source
programme is machine independent since it does not generate machine
code. (Note the difference between generate and translated into
machine code.) An interpreter is generally slower than a compiler
because it processes and interprets each statement in a programme as
many times as the number of the evaluations of this statement. For
example, when a for-loop is interpreted, the statements inside the for-
loop body will be analysed and evaluated on every loop step. Some
languages, such as Java and Lisp, come with both an interpreter and a
compiler. Java source programmes (Java classes with .java extension)
are translated by the java compiler into byte-code files (with .class
extension). The Java interpreter, java, called the Java Virtual Machine
(JVM), may actually interpret byte codes directly or may internally
compile them to machine code and then execute that code.
Like was mention in section 3.1, compilers and interpreters are not the
only examples of translators. In the table below are a few more:
21
CIT 316 COMPILER CONSTRUCTION
SELF-ASSESSMENT EXERCISE
i. What is a translator?
ii. What is the importance of translators in programming?
iii. Distinguish among a translator, compiler and an interpreter.
1) Many variations:
a. many programming languages (e.g. FORTRAN, C++,
Java)
b. many programming paradigms (e.g. object-oriented,
functional, logic)
c. many computer architectures (e.g. MIPS, SPARC, Intel,
alpha)
d. many operating systems (e.g. Linux, Solaris, Windows)
2) Qualities of a compiler: these concerns the qualities that are
compiler must possess in other to be effective and useful. These
are listed below in order of importance:
a. the compiler itself must be bug-free
b. it must generate correct machine code
c. the generated machine code must run fast
22
CIT 316 MODULE 1
23
CIT 316 COMPILER CONSTRUCTION
But the above ideal separation of compilation into two phases does not
work very well for real programming languages and architectures.
Ideally, you must encode all knowledge about the source programming
language in the front end, you must handle all machine architecture
features in the back end, and you must design your IRs in such a way
that all language and machine features are captured properly.
The generated machine code is written in an object file. This file is not
executable since it may refer to external symbols (such as system calls).
The operating system provides the following utilities to execute the
code:
24
CIT 316 MODULE 1
4.0 CONCLUSION
In this unit you have been taken through the definition, functions, and
architecture of a compiler in the next unit you will be learning more
about the structure of a compiler and the various phases involved in
compilation process
5.0 SUMMARY
25
CIT 316 COMPILER CONSTRUCTION
26
CIT 316 MODULE 1
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 The Structure of a Compiler
3.2 Phases of a Compiler
3.2.1 The Lexical Analyser
3.2.2 The Syntax Analyser
3.2.3 The Intermediate Code Generator
3.2.4 Code Optimisation
3.2.5 Code Generation
3.2.6 The Table Management or Bookkeeping
3.2.7 The Error Handler
3.3 Passes
3.4 Reducing the Number of Passes
3.5 Cross Compilation
3.6 T-Diagrams
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
In the previous unit you were introduced to the concept of compilers and
their roles in programming.
In this unit you will learn about the structure of a compiler and the
various phases involved in the compilation process.
2.0 OBJECTIVES
27
CIT 316 COMPILER CONSTRUCTION
1. Front end
2. Back-end
3. Tables of information
4. Runtime library
a. Machine language
b. Assembly language
c. High level language or high level language with bootstrapping
facilities for flexibility and transporting.
28
CIT 316 MODULE 1
this is the first phase and it is also referred to as the Scanner. It separates
characters of the source language into groups that logically belong
together; these groups are called tokens. The usual tokens are keywords,
such as DO or IF, identifiers such as X or NUM, operator symbol such
as <= or +, and punctuation symbol such as parentheses or commas. The
output of the lexical analyser is a stream of tokens, which is passed to
the next phase, the syntax analyser or parser. The tokens in this stream
can be represented by codes which we may regard as integers. Thus DO
might be represented by 1, + by 2, and “identifier” by 3. In the case of a
token like “identifier”, a second quantity, telling which of those
identifiers used by the programme is represented by this instance of
token “identifier” is passed along with the integer code for “identifier”.
For Example, in the FORTRAN statement:
Source Programme
Lexical Analysis
Syntax Analysis
Code Optimisation
Code Generation
Target Programme
Fig. 1: Phases of a Compiler
29
CIT 316 COMPILER CONSTRUCTION
This groups tokens together into syntactic structures. For example, the
three tokens representing A+B might be grouped into a syntactic
structure called an expression. Expressions might further be combined to
form statements. Often the syntactic structure can be regarded as a tree
whose leaves are the tokens. The interior nodes of the tree represent
strings of tokens that logically belong together. The parser has two
functions. It checks that the tokens appearing in its input, which is the
output of the lexical analyser, occur in patterns that are permitted by the
specification for the source language. It also imposes on the tokens a
tree-like structure that is used by the subsequent phases of the compiler.
This is the final phase and it produces the object code by deciding on the
memory locations for data, selecting code to access each datum, and
selecting the registers in which each computation is to be done.
Designing a code generator that produces truly efficient object
programmes is one of the most difficult parts of a compiler design, both
practically and theoretically.
30
CIT 316 MODULE 1
This portion of the compiler keeps track of the names used by the
programme and records essential information about each, such as its
type (integer, real, etc.). The data structure used to record this
information is called a symbol table.
SELF-ASSESSMENT EXERCISE
3.2 Passes
The numbers of passes, and the grouping of phases into passes, are
usually dictated by a variety of considerations germane to a particular
language and machine, rather than by any mathematical optimality
criteria.
The structure of the source language has strong effect on the number of
passes. Certain languages require at least two passes to generate code
easily. For example, languages such as PL/I or ALGOL 68 allow the
declaration of a name to occur after uses of that name. Code for
expression containing such a name cannot be generated conveniently
until the declaration has been seen.
31
CIT 316 COMPILER CONSTRUCTION
As you have seen in section 3.1 of this unit, the operation of a compiler
includes:
32
CIT 316 MODULE 1
In this course, we will concern ourselves with mostly the front-end i.e.
those parts of compilation that can be automated which are lexical,
syntax and probably semantic analyses.
3.6 T-Diagrams
Compiler
Language
Implementation
Fig. 2: T-Diagrams
4.0 CONCLUSION
In this unit you have been taken through the structure, phases and
functions of each phase of a compiler. In the next unit you will be
learning more about the compilation process; such as cross compilation
and hand implementation.
5.0 SUMMARY
34