Exploring Programming Language Architecture in Perl: Bill Hails
Bill Hails
Description: an online book using the Perl programming language to explore various
aspects of programming language architecture.
Keywords: perl, scheme, interpreter, pscheme
ISBN: 978-1-4452-2592-0
Contents
1 Introduction
1.1 Why Perl?
1.2 Why Scheme?
1.3 References
1.4 Typography
1.5 A Note on the Interpreter Versions
2 An Introduction to PScheme
2.1 PScheme Syntax
2.2 Simple Expressions
2.3 Conditionals
2.4 Global Variables
2.5 Functions
2.6 Local Variables
3.11.6 PScm/SpecialForm.pm
3.11.7 PScm/Expr.pm
3.11.8 t/PScm.t
3.11.9 t/lib/PScm/Test.pm
3.11.10 t/interactive
4 Implementing let
4.1 The Environment
4.1.1 A Stack-based Environment
4.1.2 A Linked List Environment
4.2 Global Environments have a Problem
4.3 Environment Passing
4.4 let Itself
4.5 Summary
4.6 Tests
4.7 Listings
4.7.1 t/PScm_Let.t
5 Implementing lambda
5.1 lambda
5.2 Evaluating a Closure
5.3 Printing a Closure
5.4 Summary
5.5 Tests
5.6 Listings
5.6.1 PScm/Closure.pm
5.6.2 t/PScm_Lambda.t
8 List Processing
8.1 quote
8.2 list
8.3 car and cdr
8.4 cons
8.4.1 Dot Notation
8.5 Implementation
8.5.1 Changes to Expressions
8.5.2 Changes to Primitives and Special Forms
8.5.3 Changes to Closures
8.5.4 Changes to the Environment
8.5.5 Changes to the Reader
8.6 Summary
8.7 Tests
8.8 Listings
8.8.1 t/PScm_List.t
8.8.2 t/PScm_Dot.t
9 Macros
9.1 macro
9.2 Evaluating Macros
9.2.1 Trying it out
9.2.2 An Improvement
9.2.3 One Last Addition
9.3 Summary
9.4 Tests
9.5 Listings
9.5.1 t/PScm_Macro.t
9.5.2 t/PScm_Eval.t
11 define
11.1 Environment Changes
11.2 The define Special Form
11.3 Persistent Environments
11.4 Tests
11.5 Listings
11.5.1 t/PScm_Define.t
13 Continuations
13.1 Tail Recursion and Tail Call Optimization
13.2 Continuation Passing Style
13.3 Example cps Transformations
13.4 The Trampoline
13.5 Using cps
13.6 Implementation
13.6.1 Our Trampoline Implementation
13.6.2 cps let and lambda
13.6.3 cps letrec
13.6.4 cps let*
13.6.5 cps List Processing
13.6.6 cps macro and unquote
13.6.7 cps Sequences and Assignment
13.6.8 cps define
13.6.9 cps oop
13.7 cps Without Closures
13.8 cps Fun
13.8.1 An error Handler
13.8.2 yield
13.9 Summary
13.10 Listings
13.10.1 PScm/Continuation.pm
13.10.2 t/PScm_CallCC.t
13.10.3 t/CPS_Error.t
13.10.4 t/CPS_Yield.t
14 Threads
14.1 Variations
14.2 Tests
14.3 Listings
14.3.1 t/CPS_Spawn.t
18 Summary
Bibliography
Index
List of Figures
8.1 Cons Cell Representation of a nested list (foo ("bar" 10) baz)
8.2 The pair (a . b)
8.3 The structure (a b . c)
Chapter 1
Introduction
Madness.
– Larry Wall
By the end of this book you should have a thorough understanding of the inner workings of a programming
language interpreter. The source code is presented in full, and several iterations add more features until
it could be considered pretty complete. The interpreter is written to be as easy to understand as possible;
it has no clever optimizations that might obscure the basic ideas, and the code and the ideas will be
described to the best of my ability without any unexplained technical jargon. It is however assumed that
you have a good working knowledge of Perl (Perl5), including its object-oriented features.
The final implementation will demonstrate:
• conditional evaluation;
• local variables;
• recursion;
• list processing;
• quote—preventing evaluation;
• continuations;
• threads;
• exceptions;
• logic programming.
Having said that, time and space are not wasted fleshing the interpreter out with numerous cut’n’paste
system interfaces, i/o or even much basic arithmetic (the final implementation has only multiplication,
addition and subtraction, enough for the tests and examples to work), but by then it should be a trivial
matter for anyone to add those themselves if they feel so inclined. Another point worth mentioning
up front is that no claims are made that this is in any way a production-quality, or even an efficient
implementation. It is just meant to be easy to understand.
Putting it another way, if you’ve come here looking for an off-the-shelf Scheme-like interpreter that
you can use, you’ve come to the wrong place: there are many and better freely available implementations
on the net. On the other hand if you’re more interested in how such interpreters might work, I’d like to
think that you might find what you’re looking for here.
that if C is the “chess” of programming languages, then Scheme is more like “go”. The official standard
for Scheme, the “Revised(6) Report on the Algorithmic Language Scheme” or R6RS [12] as it is known,
has this to say:
Programming languages should be designed not by piling feature on top of feature, but by
removing the weaknesses and restrictions that make additional features appear necessary.
Whether or not one agrees with that, and it’s hard to argue, it strongly suggests that such a consistent
language might be pretty straightforward to implement.
Another interesting feature of Scheme and the Lisp family of languages is that the list data that they
work with is also used to construct the internal representation of the programs themselves. Therefore
Scheme programs can directly manipulate their own syntax trees. This makes the definition of macros
(syntactic extensions) particularly easy from within the language itself, without recourse to any separate
preprocessing stage. Finally, another good reason for choosing Scheme is that it is extremely easy to
parse, as we shall see.
1.3 References
I provide and refer to a select bibliography. Almost all of the concepts in this book are well known in
academic circles and it would be disingenuous of me to try to pass them off as my own. The bibliography
should provide you with a small collection of useful jumping off points should you wish to investigate
any of these topics further.
1.4 Typography
All of the source code listings and extracts from the source are shown in fixed-width type with line
numbers, and are pulled directly from the code of the working interpreter. Furthermore, when displaying
a newer version of an individual method or package, the differences between that version and the
previous one are calculated automatically and displayed in bold. Package names are displayed Like::This
and methods like_this(). Scheme code looks (like this).
Other in-line code, such as Scheme and occasionally Perl examples, is unfortunately not so rigorously
constrained. The possibility exists that even though I have manually tested all of those examples there
could be an error or two in there, for which I can only apologise.
Chapter 2
An Introduction to PScheme
• The result of evaluating a symbol is the value that that symbol currently has, or an error if the
symbol has no value;
• The result of evaluating a list of expressions is the result of evaluating each expression in turn,
then applying the first evaluated expression (which should be a function) to the other evaluated
expressions.
> 2
2
The “>” is the PScheme prompt. We gave the interpreter a 2, and it replied with 2, because 2 is 2 is 2.
Let’s try something a bit more adventurous:
> x
Error: no binding for x in PScm::Env
We asked for the value of a symbol, x, and because the interpreter doesn’t know what x is, we get an
error.
Here’s something that does work:
> (* 2 2)
4
Now that might look strange at first, but remember the first subexpression in a list should evaluate to a
function. The multiplication symbol “*” does indeed evaluate to the internal primitive definition of how
to multiply; we told PScheme to multiply 2 by 2, and it replied 4. In detail what it has done is:
One important thing to note here is that PScheme makes no distinction between functions and operators:
the operation always comes first. This has some advantages; because the operation always comes first,
it can often apply to variable numbers of arguments:
> (* 2 2 2 2)
16
A more syntax-rich language would require something like 2 * 2 * 2 * 2 to get the same result.
Now for something just a little more complex:
> (* (- 8 3) 2)
10
Here we told the interpreter to subtract 3 from 8, then multiply the result by 2. It did it by:
3. Evaluating 2 to get 2;
Hopefully it is obvious that the interpreter is following a very simple set of rules here, albeit recursively.
This incidentally demonstrates another big simplification that PScheme makes: it is impossible for
there to be any ambiguity about operator precedence, because the language forces the precedence to be
explicit. In fact there is no notion of operator precedence in PScheme. In a more syntax-rich language,
to achieve the above result one would have to write (8 - 3) * 2 because the equally legal 8 - 3 * 2
would be misinterpreted (a lovely expression) as 8 - (3 * 2).
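The recursive rule the interpreter follows can be sketched in a few lines of Perl. This is only an illustration, not the PScheme implementation: expressions are modelled as nested array references, the operator is looked up in a table rather than evaluated as an expression in its own right, and the names %ops and evaluate() are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical operator table: each symbol maps to a function that
# accepts any number of already-evaluated arguments.
my %ops = (
    '*' => sub { my $product = 1; $product *= $_ for @_; $product },
    '-' => sub { my $result = shift; $result -= $_ for @_; $result },
);

# Evaluate a nested array ref such as ['*', ['-', 8, 3], 2].
sub evaluate {
    my ($expr) = @_;
    return $expr unless ref $expr;           # a number evaluates to itself
    my ($op, @args) = @$expr;
    my $function = $ops{$op}
        or die "Error: no binding for $op\n";
    # Evaluate every argument first, then apply the operation.
    return $function->(map { evaluate($_) } @args);
}

print evaluate(['*', ['-', 8, 3], 2]), "\n";   # (* (- 8 3) 2)
print evaluate(['*', 2, 2, 2, 2]), "\n";       # (* 2 2 2 2)
```

Because the nesting of the array references already says which operation applies to which arguments, the sketch needs no precedence table at all.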
2.3 Conditionals
The keyword if introduces a conditional statement. The general form of an if expression is:
(if ⟨test⟩
    ⟨true-result⟩
    ⟨false-result⟩)
This is simple enough; if expects (in this implementation at least) three arguments: a test, a consequent
(true result) and an alternative (false result). For example:
> (if 0
> 3
> (- 8 3))
5
In this example since the test, 0, is false (again, in this implementation) the alternative (8 − 3 = 5) is
returned.
Even here we can start to see some of the power of the language:
> ((if 0 - *) 4 5)
20
In the author’s opinion this is a beautiful example of “removing the weaknesses and restrictions that
make additional features appear necessary”; because the language treats the operator position just like
any other expression, any expression that evaluates to an operation is valid in that position. Furthermore
because primitive operations are represented by symbols just like anything else, they can be treated just
like any other variable: the if with a false (0) test argument selects the value of “*” to return, rather
than the value of “-”. So it’s the multiplication function that gets applied to the arguments 4 and 5.
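A rough Perl analogue, offered only as a sketch: once the primitives are ordinary values (here two anonymous subs whose names are invented for the example), a conditional can choose between them and the chosen one can be applied straight away.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the PScheme primitives "-" and "*".
my $subtract = sub { $_[0] - $_[1] };
my $multiply = sub { $_[0] * $_[1] };

# Perl treats 0 as false, as this PScheme implementation does,
# so the conditional selects $multiply.
my $result = (0 ? $subtract : $multiply)->(4, 5);
print "$result\n";
```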
However, there is a slight complication. Consider this:
> (if 0
> (a-long-calculation)
> (- 8 3))
5
Were if a normal function, the normal rules for evaluation would apply: evaluate all the components
of the list, then apply the if function to the evaluated arguments. That would mean
(a-long-calculation) and (- 8 3) would both get evaluated, then if would pick the result. Although
the value of the whole if expression is unaffected, provided (a-long-calculation) doesn’t
have any side-effects, we still don’t want to have that calculation executed unnecessarily. Now remember
it was said that PScheme evaluates each component of the list in a list expression? Well that’s not
entirely the case. It always evaluates the first component of the list, and if the result is a simple function
like multiplication, it then goes on to evaluate the other items on the list and passes the results to the
function just as has already been described. However if the first component is what is called a special
form, such as the definition of if, PScheme passes the un-evaluated arguments to the special form and
that special form can do what it likes with them.
In the case of if, if evaluates its first argument (the test) and if the result is true it evaluates and
returns its second argument (the consequent), otherwise it evaluates and returns its third argument (the
alternative). We can demonstrate that with a simple example:
> (if 1
> 10
> x)
10
Because the test result was true, the if only evaluated the consequent expression, so there was no error
from the undefined symbol x in the alternative.
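The difference between a normal function and a special form can be imitated in Perl by passing the branches as unevaluated code references. The following is a sketch under that assumption; if_function(), if_special() and long_calculation() are invented names, not part of PScheme.

```perl
use strict;
use warnings;

my @log;    # records each time the long calculation actually runs

# An "if" written as a normal function: both branches have been
# evaluated by the caller before the function even starts.
sub if_function {
    my ($test, $consequent, $alternative) = @_;
    return $test ? $consequent : $alternative;
}

# An "if" that behaves like a special form: the branches arrive
# unevaluated (as code refs) and only the chosen one is run.
sub if_special {
    my ($test, $consequent, $alternative) = @_;
    return $test ? $consequent->() : $alternative->();
}

sub long_calculation { push @log, 'long'; return 42 }

my $eager = if_function(0, long_calculation(), 8 - 3);
my $lazy  = if_special(0, sub { long_calculation() }, sub { 8 - 3 });

print "eager=$eager lazy=$lazy calls=", scalar(@log), "\n";
```

Both calls return the same value, but the long calculation runs only for the function version; the special-form version never touches the branch it does not need.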
> (define x 5)
x
> x
5
In the above example we defined x to be 5. Then when we asked for the value of x PScheme replied 5.
Note again that the operation (define in this case) always comes first. Note also that define must be
a special form, because we didn’t get an error attempting to evaluate x during the definition. define
does however evaluate its second argument so:
> (define a b)
Error: no binding for b in PScm::Env
causes an immediate error attempting to evaluate the undefined symbol b before assigning the result to
a.
2.5 Functions
lambda, another special form, creates a function. The general form of a lambda expression is:
(lambda (⟨symbol⟩ ...) ⟨expression⟩)
The (⟨symbol⟩ ...) part is the names of the arguments to the function, and the ⟨expression⟩ is the
body of the function.
Here’s an example:
> (define square
> (lambda (x) (* x x)))
square
Now that may also look a bit strange at first, but simply put, lambda creates an anonymous function,
and that is separate from giving that function a name with define. The function being defined in this
example takes one argument x and its function body is (* x x). The function body will execute when
the function is invoked. This is more or less equivalent to this Perl snippet:
our $square = sub {
    my ($x) = @_;
    $x * $x;
};
In fact, Perl’s anonymous sub {...} syntax can be considered pretty much synonymous with PScheme’s
(lambda ...). The big difference is that in PScheme that’s the only way to create a function².
² There are examples of Scheme code that show things like:
(define (square x)
  (* x x))
This form of define, where the expression being defined is a list, is just syntactic sugar for the underlying form. define
essentially re-writes it into the simpler lambda statement before evaluating it. Since the definition here mimics the intended
usage of the function it is certainly a little bit easier to read, but personally I find that since I have to use lambda in some
expressions anyway, it makes sense to always use it. Plus the syntactic sugar tends to obscure what is really going on. In
any case PScheme does not support this alternative syntax for function definition.
Having created a square function, it can be called:
> (square 4)
16
Although square was created by assignment, when it is used it is syntactically indistinguishable from
any built-in function.
Anonymous functions can also be called directly without giving them a name first:
Again this is much simpler than it might first appear. The first term of the list expression, the lambda
expression, gets evaluated resulting in a function which will square its argument. That function then
immediately gets applied to 3 resulting in 9. It is possible to do something similar in Perl, like this:
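One way to write it, as a minimal sketch using only core Perl (the variable name $result is incidental):

```perl
use strict;
use warnings;

# Build an anonymous squaring function and apply it at once,
# the counterpart of ((lambda (x) (* x x)) 3).
my $result = sub { my ($x) = @_; $x * $x }->(3);
print "$result\n";
```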
As an aside, you may be wondering what the eleventh letter of the Greek alphabet has to do with the
creation of a function. The term comes from a branch of mathematics called the lambda calculus which
is concerned with describing and reasoning about the behaviour of mathematical functions in general.
Even though the lambda calculus was devised before the creation of the first computer, it turns out that
it provides a sound theoretical basis for the implementation of programming languages, and Lisp was the
first programming language to exploit that fact. There is a good introduction to the lambda calculus in
[10], and a more detailed and rigorous treatment in [11].
(⟨symbol⟩ ⟨expression⟩)
let takes a list of bindings (symbol-value pairs) and a body to execute with those bindings in effect.
For example:
That can be read aloud as “let a = 10 and b = 10 + 10 in the expression a + b”. Symbol a is given the
value 10 and symbol b the value 20 while the body is evaluated. However, if a later expression were to ask
for the value of a or b outside of the scope (the last closing brace) of the let, there would be an error
(assuming there weren’t global bindings of a and b in effect).
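The nearest everyday Perl idiom is a block with lexical variables. The sketch below is an approximation rather than an exact equivalent: the variable names are invented, and unlike let, Perl's sequential my declarations would allow the second initialiser to refer to the first.

```perl
use strict;
use warnings;

# A do-block gives $first and $second a scope that ends at the
# closing brace, much as let does for a and b.
my $sum = do {
    my $first  = 10;
    my $second = 10 + 10;
    $first + $second;
};
print "$sum\n";    # $first and $second are no longer visible here
```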
The careful reader will have noticed that these were described as lexically scoped variables, and yes,
any functions defined in the scope of those variables are closures just like Perl closures and have access
to those variables when executed even if executed outside of that scope. For example:
When reading this it’s useful to remember that define does evaluate its second argument. That means
that this expression defines times2 to be the result of evaluating the let expression. Now that let
expression binds n to 2, then returns the result of evaluating the lambda expression (creating a function)
with that binding in effect. It is that newly created function that gets bound to the symbol times2.
When times2 is later used, for example in (times2 4), the body of the function (* n x) can still “see”
the value of n that was supplied by the let, even though the function is executed outside of that scope.
This is similar to the common Perl trick to get a private static variable:
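# Variant 1: a named sub closing over a private lexical $n.
{
    my $n = 2;
    sub times2 {
        my ($x) = @_;
        $n * $x;
    }
}
# Variant 2: the same closure as an anonymous sub stored in a variable.
our $times2 = do {
    my $n = 2;
    sub {
        my ($x) = @_;
        $n * $x;
    }
};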
And that’s pretty much all that is needed for now. Of course the final language has many other interesting
features, but these will be introduced in later sections as the need arises. Let’s take a look at our first
cut at an interpreter³.
³ If you want more of an introduction to Scheme in general, you could do worse than look at [6].
Chapter 3
Interpreter Version 0.0.0
This preliminary version of the interpreter supports only three operations, namely multiplication (*),
subtraction (-), and conditional evaluation (if). It does however lay the groundwork for more sophisticated
interpreters later on.
Scheme and Lisp interpreters, being interactive, are based around what is called a “read eval print loop”:
first read an expression, then evaluate it, then print the result, then loop. This long-winded term is
often abbreviated to repl. In order for the repl to evaluate the expression, there must additionally be an
environment in which symbols can be given values and in which values can be looked up. All this means
that there are six principal components to such an interpreter.
A Structure returned by the Reader, representing the expression (and incidentally returned by
the Evaluator, representing the result);
An Environment in which symbols can be associated with values and the values of symbols can
be looked up.
A Set of Primitive Operations bound to symbols in the initial environment, which implement
all of the individual built in commands.
A Print System which converts the result of evaluation back to text and displays it to the user.
The implementation we’re about to discuss takes a fairly strict OO approach, with each of these components
and pretty much everything else represented by classes of objects. As a consequence of this the
Evaluator and the Print system are distributed throughout the Structure component. This means that
for example to evaluate an expression you call its Eval method, and to print a result you call the Print
method on the result object. There is a good deal of scope for polymorphism with this approach, since
different types of object can respond differently to the same message.
There are only three things in that environment. They are the objects that will perform the primitive
operations of multiplication, subtraction and conditional evaluation, and they’re bound to “*”, “-” and
“if” respectively. We’ll see how they work presently.
ReadEvalPrint() on Lines 37-46 is the central control routine of the whole interpreter. It takes an
input file handle and an output file handle as arguments. Starting on Line 40 it defaults the output
file handle to stdout, then on Line 41 it creates a new PScm::Read object on the input file handle,
and on Lines 42-45 it enters its main loop. The loop repeatedly collects an expression from the Reader,
then evaluates the expression by calling its Eval() method, then prints the result by calling its Print()
method:
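The shape of such a loop can be sketched as follows. This is not the book's listing: Sketch::Number and Sketch::Reader are hypothetical stand-ins for the real expression and reader classes, and read_eval_print() is an invented name. Only the read, Eval(), Print() cycle and the defaulting of the output handle follow the description above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expression object: it evaluates to
# itself and knows how to print itself.
package Sketch::Number;
sub new   { my ($class, $value) = @_; return bless { value => $value }, $class }
sub Eval  { my ($self) = @_; return $self }
sub Print { my ($self, $out) = @_; print {$out} $self->{value}, "\n" }

# Hypothetical stand-in for the Reader: one number per input line,
# undef at end of file.
package Sketch::Reader;
sub new { my ($class, $in) = @_; return bless { in => $in }, $class }
sub Read {
    my ($self) = @_;
    my $line = readline($self->{in});
    return undef unless defined $line;
    chomp $line;
    return Sketch::Number->new($line);
}

package main;

sub read_eval_print {
    my ($in, $out) = @_;
    $out ||= \*STDOUT;                       # default to standard output
    my $reader = Sketch::Reader->new($in);
    while (defined(my $expr = $reader->Read)) {
        my $result = $expr->Eval();          # evaluate...
        $result->Print($out);                # ...then print, then loop
    }
}

my $text = "2\n4\n";
open my $in, '<', \$text or die "cannot open in-memory input: $!";
read_eval_print($in);
```

Because the loop only ever calls Read(), Eval() and Print(), any object that answers those messages can take part, which is the polymorphism the OO design relies on.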
The basis of the print system can be seen in the Print() and as_string() methods in PScm.pm, but
we’re going to leave discussion of the print system until later on. In the next section we’ll look at our
first, very simple, implementation of an environment.
The LookUp() method on Lines 13-22 looks up a symbol in the bindings, die-ing if the symbol does not
have a binding:
Note that the $symbol passed in is an object, and LookUp() must call the symbol’s value() method
to get a string suitable for a hash key. The value() method for a symbol just returns the name of the
symbol as a Perl string.
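A minimal sketch of such a lookup is shown below. The class names Sketch::Symbol and Sketch::Env and the hash layout are assumptions made for the example; only the behaviour (call value() on the symbol, die if there is no binding) follows the description.

```perl
use strict;
use warnings;

# Hypothetical symbol class: value() returns the symbol's name.
package Sketch::Symbol;
sub new   { my ($class, $name) = @_; return bless { name => $name }, $class }
sub value { my ($self) = @_; return $self->{name} }

# Hypothetical environment: a hash from symbol names to values.
package Sketch::Env;
sub new {
    my ($class, %bindings) = @_;
    return bless { bindings => \%bindings }, $class;
}
sub LookUp {
    my ($self, $symbol) = @_;
    my $name = $symbol->value;           # plain string, usable as a hash key
    die "no binding for $name in ", ref($self), "\n"
        unless exists $self->{bindings}{$name};
    return $self->{bindings}{$name};
}

package main;

my $env = Sketch::Env->new('*' => 'multiply', '-' => 'subtract');
print $env->LookUp(Sketch::Symbol->new('*')), "\n";

# Looking up an unbound symbol dies; catch the error and report it.
my $ok = eval { $env->LookUp(Sketch::Symbol->new('x')); 1 };
print "Error: $@" unless $ok;
```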
Because this first version of the interpreter has no support for local variables, this class doesn’t
provide any methods for adding values to the environment. That will come later.
And that’s all there is to our environment class. Let’s move on to look at the Reader.
Figure 3.1: Example PScheme Structure for (foo ("bar" 10) baz)
[Figure: a top-level List containing foo, a nested List (String "bar", Number 10) and baz]
In this figure, showing the result of parsing that expression, the top-level list object has three components.
Reading left to right it contains the symbol object foo, another list object and the symbol object baz. The
sub-list contains the string object "bar" and the number object 10. It is apparent that the structure
is a direct representation of the text, where each list corresponds to the contents of a matching pair of
braces. It should also be obvious that these structures are practically identical to Perl list references.
The Scheme list (foo ("bar" 10) baz) corresponds directly to the nested Perl listref [$foo, ["bar",
10], $baz]¹.
To simplify the creation of such a structure from an input stream, it is often convenient to split the
process into two parts:
A tokeniser which recognises and returns the basic tokens of the text (braces, symbols, numbers and
strings);
A builder or parser which assembles those tokens into meaningful structures (lists).
Apart from new() the only other publicly available method is Read(), which returns the next complete
expression, as a structure, from the input file. The Read() method calls the private next_token()
method (the tokeniser) for its tokens.
Skipping over the Read() method for now, next_token() on Lines 38-61 simply chomps the next
token off the input stream and returns it. It knows enough to skip whitespace and blank lines and to
return undef at eof (Lines 41-45). If there is a line left to tokenise, then a few simple regexes are tried in
turn to strip the next token from it. As soon as a token of a particular type is recognised, it is returned
to the caller.
Lines 47-59 do the actual tokenisation. The tokeniser only needs to distinguish open and close braces,
numbers, strings and symbols, where anything that doesn’t look like an open or close brace, a number
or a string must be a symbol. next_token() returns its data in objects, which incidentally happens
to be a very convenient way of tagging the type of token returned. The objects are of two basic types:
PScm::Token; and PScm::Expr.
The PScm::Token types PScm::Token::Open and PScm::Token::Close represent an open and
a close brace respectively, and contain no data. The three PScm::Expr types, PScm::Expr::Number,
PScm::Expr::String and PScm::Expr::Symbol contain the relevant number, string or symbol.
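The "try a few regexes in turn" approach can be sketched like this. It is a simplification, not the book's tokeniser: it works on a single string instead of a file handle, and it returns [type, text] pairs with invented type names where PScheme returns token and expression objects.

```perl
use strict;
use warnings;

# Split one line of text into [type, text] pairs, trying each
# pattern in turn and stripping the matched token from the line.
sub tokenise {
    my ($line) = @_;
    my @tokens;
    while (1) {
        $line =~ s/^\s+//;                   # skip leading whitespace
        last if $line eq '';
        if ($line =~ s/^\(//) {
            push @tokens, ['open', '('];
        }
        elsif ($line =~ s/^\)//) {
            push @tokens, ['close', ')'];
        }
        elsif ($line =~ s/^(-?\d+)(?=[\s()]|\z)//) {
            push @tokens, ['number', $1];
        }
        elsif ($line =~ s/^"((?:[^"\\]|\\.)*)"//) {
            push @tokens, ['string', $1];
        }
        elsif ($line =~ s/^([^\s()"]+)//) {
            push @tokens, ['symbol', $1];    # anything else is a symbol
        }
        else {
            die "cannot tokenise: $line\n";
        }
    }
    return @tokens;
}

print join(' ', map { "$_->[0]:$_->[1]" } tokenise('(foo ("bar" 10) baz)')), "\n";
```

Tagging each token with its type plays the same role here as the choice of class does in the object-based version.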
Now that we know how next_token() works, we can go back and take a look at Read().
The Read() method (Lines 17-36) has to return the next complete expression from the input stream.
That could be a simple symbol, string or number, or an arbitrarily nested list. It starts by calling
next_token() at Line 20 and returning undef if next_token() returned undef (signifying end of file).
Then, at Line 23, if the token is anything other than an open brace (determined by the call to
is_open_token()²), Read() just returns it. Otherwise, the token just read is an open brace, so Read() initialises
an empty result @res to hold the list it expects to accumulate, then enters a loop calling itself recursively
to collect the (possibly nested) components of the list. It is an error if it detects eof while a list is
unclosed, and if it detects a close brace (is_close_token()) it knows its work is done and it returns the
accumulated list as a new PScm::Expr::List object.
The structure returned by Read() is completely composed of subtypes of PScm::Expr, since the
PScm::Token types do not actually get entered into the structure. Let’s work through the parsing of
that simple expression (foo ("bar" 10) baz). In the following, the subscript number keeps track of
which particular invocation of Read() we are talking about.
• Read1 adds the ("bar" 10) to the end of its own growing list: (foo ("bar" 10).
² It could have just said
return $token unless $token->isa('PScm::Token::Open');
but I always think it’s a bit rude to peep into the implementation like that, much better to ask it what it thinks it is, not
forcibly extract its data type.
3.3. THE READER 21
• Read1 adds the baz to the end of its own growing list: (foo ("bar" 10) baz.
• Read1 gets the ) so it knows it has reached the end of its list and returns the result: (foo ("bar"
10) baz).
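The recursion in that walk-through can be sketched in a few lines. This is a simplification under stated assumptions, not the Read() method itself: tokens are plain strings held in an array (a hypothetical stand-in for the tokeniser), and nested lists come back as array references rather than PScm::Expr::List objects:

```perl
use strict;
use warnings;

# Hypothetical simplification: tokens are plain strings in an array,
# and nested lists are returned as array references.
sub read_expr {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    return undef unless defined $token;       # end of input
    return $token unless $token eq '(';       # atoms and ')' are returned as-is

    my @res;
    while (1) {
        my $item = read_expr($tokens);        # may recurse into a nested list
        die "unexpected EOF\n" unless defined $item;
        last if !ref($item) && $item eq ')';  # a close paren ends this list
        push @res, $item;
    }
    return \@res;
}

my @tokens = ('(', 'foo', '(', '"bar"', '10', ')', 'baz', ')');
my $tree = read_expr(\@tokens);
print scalar(@$tree), " top-level components\n";   # foo, ("bar" 10) and baz
```

As in the text, the close token is consumed by the loop and never ends up in the returned structure.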
PScm::Token inherits a stub new() method from the PScm class that just blesses an empty hash with
the argument class.
22 CHAPTER 3. INTERPRETER VERSION 0.0.0
As for the PScm::Expr objects that Read() accumulates and returns, as noted Read() has done all of
the work in constructing a tree of them for us, so they are more properly discussed in the next section
where we look at expressions.
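[Figure: the PScm::Expr class hierarchy. PScm::Expr is the root, with subclasses PScm::Expr::List (which aggregates zero or more PScm::Expr) and PScm::Expr::Atom. PScm::Expr::Atom has subclasses PScm::Expr::Literal and PScm::Expr::Symbol, and PScm::Expr::Literal has subclasses PScm::Expr::String and PScm::Expr::Number.]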
This figure is drawn using a standard set of conventions for diagramming the relationships between
classes in an OO design, called “the Unified Modelling Language”, or UML. [5]
For those who don’t know UML, the triangular shape means “inherits from” or “is a subclass of”,
and the black arrow and circle coming from the white diamond means “aggregates zero or more of”.
The classes with names in italics are “abstract” classes. As far as Perl is concerned, calling a class
“abstract” just means that we promise not to create any actual object instances of that particular class.
The unterminated dotted line simply implies that we will be deriving other classes from PScm::Expr
later on.
The root of the hierarchy is PScm::Expr, representing any and all expressions. That divides into
lists (PScm::Expr::List) and atoms (PScm::Expr::Atom).
3.4. PSCHEME EXPRESSIONS 23
[Figure: the PScm::Expr class hierarchy annotated with its new() and value() methods. PScm::Expr: value. PScm::Expr::List: new, value. PScm::Expr::Atom: new, value. PScm::Expr::Number: new.]
As you can see, there are three new() methods in the class structure. The PScm::Expr::Atom abstract
class is the parent class for strings and numbers (via PScm::Expr::Literal) and for symbols. Since all
of these types are simple scalars, the new() method in PScm::Expr::Atom does for most of them: it
blesses a reference to the scalar into the appropriate class.
However the PScm::Expr::Number package supplies its own new() method, because we avail ourselves
of the core Math::BigInt package for our integers. While it is nice to have arbitrary sized integers by
default, the main reason for doing this is to avoid the embarrassment of Perl’s automatic type conversion
to floating point on integer overflow when implementing a language that is only supposed to support
integer arithmetic.
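The overflow problem is easy to reproduce. The following sketch is ours, not part of the interpreter; it squares the same large number used in the test suite, once with native Perl arithmetic and once with Math::BigInt:

```perl
use strict;
use warnings;
use Math::BigInt;

my $n = '1234567890987654321';

# Native Perl arithmetic: the product does not fit in a machine integer,
# so Perl silently produces a floating point result and digits are lost.
my $native = $n * $n;

# Math::BigInt keeps every digit.
my $big   = Math::BigInt->new($n);
my $exact = $big * $big;

print "native: $native\n";
print "exact:  $exact\n";
```

The exact line shows all 37 digits of the product, while the native line shows only a rounded floating point approximation.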
The PScm::Expr::List class has the other new() method that simply bundles up its argument Perl list
in a new object:
All three of these new() methods have already been seen in action in the Reader.
Alongside most of the new() methods is a value() method that does the exact reverse of new() and
retrieves the underlying value from the object. In the case of atoms, it dereferences the scalar value:
Even though PScm::Expr::Number has its own new() method, we don’t need a separate value()
method for numbers: we never need to retrieve the actual Perl number from the Math::BigInt object, so
we just inherit value() from PScm::Expr::Atom. We do however provide a default value() method
in PScm::Expr. This default method just returns $self.
This is solely for the benefit of those as-yet undescribed additional PScm::Expr subclasses, which will
all stand for their own values.
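As a compact illustration of how new() and value() mirror each other, here is a sketch using hypothetical Mini::* classes as shortened stand-ins for the PScm::Expr classes. It follows the shape described above but is not a copy of the listing:

```perl
use strict;
use warnings;

# Hypothetical miniature of the hierarchy (stand-ins for PScm::Expr::*).
package Mini::Expr;
sub value { $_[0] }                  # default: an expression stands for itself

package Mini::Atom;
our @ISA = ('Mini::Expr');
sub new   { my ($class, $value) = @_; bless \$value, $class }
sub value { ${ $_[0] } }             # dereference the blessed scalar

package Mini::List;
our @ISA = ('Mini::Expr');
sub new   { my ($class, @list) = @_; bless [@list], $class }
sub value { @{ $_[0] } }             # dereference the blessed array

package main;

my $atom = Mini::Atom->new('foo');
my $list = Mini::List->new($atom, Mini::Atom->new('bar'));
print $atom->value, "\n";                                 # foo
print join(' ', map { $_->value } $list->value), "\n";    # foo bar
```

Blessing a reference to the scalar or array directly, rather than a hash containing it, is what keeps value() down to a single dereference.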
We’ve seen that the various PScheme expression types (lists, numbers, strings and symbols) arrange
themselves naturally into a hierarchy of types and also form a recognised design pattern called “Composite”.
Next we’re going to look at how those expressions are evaluated.
[Figure 3.4: the PScm::Expr class hierarchy annotated with its Eval() methods, which are located in PScm::Expr, PScm::Expr::List and PScm::Expr::Symbol.]
3.5 Evaluation
To evaluate a PScm::Expr, as mentioned earlier, the top level ReadEvalPrint() loop just calls the
expression’s Eval() method. The Eval() methods of PScm::Expr are located in three of its subclasses
as shown in Figure 3.4.
The figure shows that there is a separate Eval() method for lists and for symbols, and a default method
for all other PScm::Expr.
This means that numbers and strings evaluate to themselves, as they should, and if we were to add other
types of expression later on, they too would by default evaluate to themselves.
Remember that LookUp() from PScm::Env expects a symbol object as argument and calls its value()
method to get a string that it can then use to retrieve the actual value from the hash representing the
environment.
The rest() method of PScm::Expr::List returns all but the first component of the list as a new
PScm::Expr::List object:
It’s surprisingly simple. A PScm::Expr::List just evaluates its first element (Line 64). That should
return one of PScm::Primitive::Multiply, PScm::Primitive::Subtract or PScm::SpecialForm::If,
which gets assigned to $op. Of course because we’re not doing any error checking, first() could
return anything, so we’re assuming valid input.
Because PScm::Expr::List’s Eval() does not know or care whether the operation $op it derived on
Line 64 is a simple primitive or a special form, on Line 65 it passes the rest of itself (the list of arguments)
unevaluated to that operation’s Apply() method which applies itself to those arguments. Each individual
operation’s Apply() method will decide whether or not to evaluate its arguments, and what to do with
them afterwards³.
³ Lisp purists might raise an eyebrow at this point, because Eval() is supposed to know what kind of form it is evaluating
and decide whether or not to evaluate the arguments. But this is an object-oriented application, and it makes much more
sense to leave that decision to the objects that need to know.
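The hand-off between Eval() and Apply() can be seen end to end in a small sketch. Everything here is an assumption made for illustration: the Mini::* classes are cut-down stand-ins for the PScm classes, and a lexical %env hash stands in for the global environment:

```perl
use strict;
use warnings;

# Hypothetical miniature: the Mini::* classes are shortened stand-ins for the
# PScm classes, and %env stands in for the global environment.
my %env;

package Mini::Atom;
sub new   { my ($class, $v) = @_; bless \$v, $class }
sub value { ${ $_[0] } }
sub Eval  { $_[0] }                          # literals evaluate to themselves

package Mini::Symbol;
our @ISA = ('Mini::Atom');
sub Eval  { $env{ $_[0]->value } }           # symbols are looked up

package Mini::List;
sub new   { my ($class, @items) = @_; bless [@items], $class }
sub value { @{ $_[0] } }
sub first { $_[0][0] }
sub rest  { my ($self) = @_; my @v = $self->value; shift @v; return ref($self)->new(@v) }
sub Eval  {
    my ($self) = @_;
    my $op = $self->first->Eval();           # evaluate the operator position
    return $op->Apply($self->rest);          # pass the arguments on unevaluated
}

package Mini::Multiply;
sub new   { my ($class) = @_; bless {}, $class }
sub Apply {
    my ($self, $form) = @_;
    my $product = 1;
    $product *= $_->Eval()->value for $form->value;   # a primitive evaluates every argument
    return Mini::Atom->new($product);
}

package main;

$env{'*'} = Mini::Multiply->new;

# (* 6 (* 2 3))
my $expr = Mini::List->new(
    Mini::Symbol->new('*'),
    Mini::Atom->new(6),
    Mini::List->new(Mini::Symbol->new('*'), Mini::Atom->new(2), Mini::Atom->new(3)),
);
print $expr->Eval()->value, "\n";   # prints 36
```

Note that the list's Eval() never inspects what kind of thing $op is; the decision to evaluate the arguments lives entirely in Apply().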
3.6. PRIMITIVE OPERATIONS 27
So we’ve seen how PScm::Expr objects evaluate themselves. In particular we’ve seen how a list
evaluates itself by evaluating its first component to get a primitive operation or special form, then calling
that object’s Apply() method with the rest of the list, unevaluated, as argument. Next we’re going to
look at one of those Apply() methods, the PScm::Primitive Apply() method.
On Line 10 it extracts the arguments to the operation from the $form by calling the $form’s value()
method. $form is a PScm::Expr::List and we’ve already seen that the value() method for a list object
dereferences and returns the underlying list. Then, on Line 11, Apply() evaluates each argument by
mapping a call to each one’s Eval() method. Finally, on Line 12, it passes the resulting list of evaluated
arguments to a private _apply() method and returns the result.
_apply() is implemented differently by each primitive operation. So each primitive operation (each
subclass of PScm::Primitive) only needs an _apply() method which will be called with a list of already
evaluated arguments.
The _apply() in PScm::Primitive::Multiply is very straightforward. It simply multiplies its arguments
together and returns the result as a new PScm::Expr::Number. Note that, somewhat accidentally,
if only given one argument it will simply return it, and if given no arguments it will return
1.
On Line 31 the rather convoluted trick to get an initial value will work whether or not the underlying
implementation of PScm::Expr::Number uses Math::BigInt.
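The "accidental" behaviour for zero or one arguments falls straight out of starting the accumulation at 1, the identity for multiplication. A small sketch, using a hypothetical product() helper over Math::BigInt values rather than the interpreter's own classes:

```perl
use strict;
use warnings;
use Math::BigInt;

# Hypothetical helper: multiplies any number of Math::BigInt values.
sub product {
    my (@args) = @_;
    my $result = Math::BigInt->new(1);   # identity element for multiplication
    $result *= $_ for @args;
    return $result;
}

print product(), "\n";                                        # 1
print product(Math::BigInt->new(7)), "\n";                    # 7
print product(map { Math::BigInt->new($_) } 2, 3, 4), "\n";   # 24
```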
The _check_type() method in the base class just saves us some typing, since we are checking the
type of argument to the primitive:
That’s all the primitive operations we support. There are a whole host of others that could trivially be
added here and it might be entertaining to add them, but all the really interesting stuff is happening
over in the special forms, discussed next.
For special forms, the Apply() method is in the individual operation’s class. On Line 15
PScm::SpecialForm::If’s Apply() method extracts the condition, the expression to evaluate if the condition is
true, and the expression to evaluate if the condition is false, from the argument $form. Then on Line 17
it evaluates the condition, and calls the result’s isTrue() method to determine which branch to evaluate:
012 sub Apply {
013 my ($self, $form) = @_;
014
015 my ($condition, $true_branch, $false_branch) = $form->value;
016
017 if ($condition->Eval()->isTrue) {
018 return $true_branch->Eval();
019 } else {
020 return $false_branch->Eval();
021 }
022 }
If the condition is true, PScm::SpecialForm::If::Apply() evaluates and returns the true branch (Line
18), otherwise it evaluates and returns the false branch (Line 20). The decision of what is true or false
is delegated to an isTrue() method. The one and only isTrue() method is defined in PScm/Expr.pm
right at the top of the data type hierarchy, in the PScm::Expr class as:
007 sub isTrue {
008 my ($self) = @_;
009 scalar($self->value);
010 }
Remembering that value() just dereferences the underlying list or scalar, isTrue() then pretty much
agrees with Perl’s idea of truth, namely that zero, the empty string, and the empty list are false,
everything else is true⁴.
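The reason a single isTrue() suffices is that scalar() forces value() into scalar context, where a dereferenced array yields its element count. A sketch with hypothetical Mini::* stand-ins for the PScm::Expr classes shows the cases mentioned above:

```perl
use strict;
use warnings;

# Hypothetical miniature classes mirroring the value()/isTrue() arrangement.
package Mini::Expr;
sub isTrue { my ($self) = @_; scalar($self->value) }

package Mini::Atom;
our @ISA = ('Mini::Expr');
sub new   { my ($class, $v) = @_; bless \$v, $class }
sub value { ${ $_[0] } }

package Mini::List;
our @ISA = ('Mini::Expr');
sub new   { my ($class, @l) = @_; bless [@l], $class }
sub value { @{ $_[0] } }      # in scalar context this yields the element count

package main;

for my $case (
    [ 'zero'           => Mini::Atom->new(0) ],
    [ 'empty string'   => Mini::Atom->new('') ],
    [ 'empty list'     => Mini::List->new() ],
    [ 'non-empty list' => Mini::List->new(Mini::Atom->new(0)) ],
    [ 'string "no"'    => Mini::Atom->new('no') ],
) {
    my ($label, $expr) = @$case;
    printf "%-15s %s\n", $label, $expr->isTrue ? 'true' : 'false';
}
```

The first three cases report false and the last two report true; in particular a list containing only a zero is still true, because it is not empty.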
That really is all there is to evaluation. Next we’re going to take a look at the print system.
3.8 Output
After Eval() returns the result to the repl, ReadEvalPrint() calls the result’s Print() method with
the output handle as argument. That method is defined in PScm.pm:
048 sub Print {
049 my ($self, $outfh) = @_;
050 print $outfh $self->as_string, "\n";
051 }
All it does is print the string representation of the object obtained by calling its as_string() method.
A fallback as_string() method is provided in this class at Line 53.
⁴ This differs from a true Scheme implementation where special boolean values #t and #f represent truth and falsehood,
and everything else is true. The reason for having an isTrue() is to encapsulate the chosen behaviour. If we wanted to
change the meaning of truth, we need only do so here.
It just returns the class name of the object. This is needed occasionally in the case where internals such
as primitive operations might be returned by the evaluator, for example:
> *
PScm::Primitive::Multiply
But that is an unusual and usually unintentional situation. The main as_string() methods are strategically
placed around the by now familiar PScm::Expr hierarchy, as shown in Figure 3.5.
[Figure 3.5: the PScm::Expr class hierarchy annotated with its as_string() methods, which are located in PScm::Expr::List, PScm::Expr::Atom and PScm::Expr::String.]
Finally, PScm::Expr::String’s as_string() method at Lines 97-104 overrides the one in
PScm::Expr::Atom because it needs to put back any backslashes that the parser took out, and wrap itself in double
quotes.
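As a rough idea of what "putting the backslashes back" involves, here is a sketch. It is not the method from the listing: string_as_string() is a hypothetical helper, and we are assuming that the characters needing an escape are the backslash and the double quote:

```perl
use strict;
use warnings;

# Hypothetical helper, not the book's method: re-escape backslashes and
# double quotes (an assumed set), then wrap the text in double quotes.
sub string_as_string {
    my ($text) = @_;
    (my $escaped = $text) =~ s/(["\\])/\\$1/g;
    return qq{"$escaped"};
}

print string_as_string('hello'), "\n";
print string_as_string('say "hi"'), "\n";
```

The point of the override is that the printed form of a string should be something the reader could read back in as the same string.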
3.9 Summary
We’re finally in a position to understand the whole of PScm::Expr as shown in Listing 3.11.7 on page 43.
The final version of our diagram, with all of the methods from PScm::Expr in place is shown in
Figure 3.6 on the following page.
That may seem like a lot of code for what is effectively just a pocket calculator⁶, but what has been
done is to lay the groundwork for a much more powerful set of language constructs that will be added
in subsequent chapters. Let’s recap with an overview of the whole thing.
• A global environment is set up in the PScm package, containing bindings for defined operations.
• The top-level read-eval-print loop (repl) in the PScm package creates a PScm::Read object and
repeatedly calls its Read() method.
• That Read() method returns PScm::Expr objects which the repl evaluates. It evaluates them by
calling their Eval() method.
[Figure 3.6: the PScm::Expr class hierarchy with all of its methods. PScm::Expr: isTrue, Eval, value. PScm::Expr::List: new, value, first, rest, as_string, Eval. PScm::Expr::Atom: new, value, as_string. PScm::Expr::Symbol: Eval. PScm::Expr::String: as_string. PScm::Expr::Number: new.]
∗ PScm::Primitive objects share an Apply() method that evaluates each of the arguments
and then passes them to the individual primitive’s private _apply() method.
∗ PScm::SpecialForm objects each have their own Apply() method that decides whether,
and how, to evaluate the arguments.
• The repl then takes the result of the evaluation and calls its Print() method, which is defined in
the PScm base class.
– That Print() method just calls $self->as_string() and prints the result.
∗ The PScm::Expr::Atom class has an as_string() method that returns the underlying
scalar, but PScm::Expr::String provides an override that wraps the result in double
quotes.
∗ The PScm::Expr::List class has an as_string() method that recursively calls
as_string() on its components and returns the result wrapped in braces.
At the heart of the whole interpreter is the dynamic between Eval() which evaluates expressions, and
Apply() which applies operations to their arguments.
3.10 Tests
The test module for our first version of the interpreter is in Listing 3.11.8 on page 46. The PScm::Test
package shown in Listing 3.11.9 on page 47 provides an eval_ok() sub which takes a string expression,
writes it out to a file, and calls ReadEvalPrint() on it, with the output redirected to another file. It
then reads that output back in and compares it to its second argument⁷. The various simple tests just
exercise the system.
To allow users to play a little more with the interpreter, there’s a tiny interactive shell that requires
Term::ReadLine::Gnu and the libreadline library. It’s in t/interactive and can be run, without
installing the interpreter, by doing:
from the root of any version of the distribution. It’s short enough to show here in its entirety, in
Listing 3.11.10 on page 48.
⁷ Ok, I should have used IO::String, so sue me.
3.11 Listings
3.11.1 PScm.pm
001 package PScm;
002
003 use strict;
004 use warnings;
005 use PScm::Read;
006 use PScm::Env;
007 use PScm::Primitive;
008 use PScm::SpecialForm;
009 use FileHandle;
010
011 require Exporter;
012
013 our @ISA = qw(Exporter);
014 our @EXPORT = qw(ReadEvalPrint);
015
016 =head1 NAME
017
018 PScm - Scheme-like interpreter written in Perl
019
020 =head1 SYNOPSIS
021
022 use PScm;
023 ReadEvalPrint($in_filehandle[, $out_filehandle]);
024
025 =head1 DESCRIPTION
026
027 Just messing about, A toy lisp interpreter.
028
029 =cut
030
031 our $GlobalEnv = new PScm::Env(
032 '*' => new PScm::Primitive::Multiply(),
033 '-' => new PScm::Primitive::Subtract(),
034 if => new PScm::SpecialForm::If(),
035 );
036
037 sub ReadEvalPrint {
038 my ($infh, $outfh) = @_;
039
040 $outfh ||= new FileHandle(">-");
041 my $reader = new PScm::Read($infh);
042 while (defined(my $expr = $reader->Read)) {
043 my $result = $expr->Eval();
044 $result->Print($outfh);
045 }
046 }
047
048 sub Print {
049 my ($self, $outfh) = @_;
3.11.2 PScm/Env.pm
001 package PScm::Env;
002
003 use strict;
004 use warnings;
005 use base qw(PScm);
006
007 sub new {
008 my ($class, %bindings) = @_;
009
010 bless { bindings => {%bindings}, }, $class;
011 }
012
013 sub LookUp {
014 my ($self, $symbol) = @_;
015
016 if (exists($self->{bindings}{ $symbol->value })) {
017 return $self->{bindings}{ $symbol->value };
018 } else {
019 die "no binding for @{[$symbol->value]} ",
020 "in @{[ref($self)]}\n";
021 }
022 }
023
024 1;
3.11.3 PScm/Read.pm
001 package PScm::Read;
002
003 use strict;
004 use warnings;
005 use PScm::Expr;
006 use PScm::Token;
007 use base qw(PScm);
008
009 sub new {
010 my ($class, $fh) = @_;
011 bless {
012 FileHandle => $fh,
013 Line => '',
014 }, $class;
015 }
016
017 sub Read {
018 my ($self) = @_;
019
020 my $token = $self->_next_token();
021 return undef unless defined $token;
022
023 return $token unless $token->is_open_token;
024
025 my @res = ();
026
027 while (1) {
028 $token = $self->Read;
029 die "unexpected EOF"
030 if !defined $token;
031 last if $token->is_close_token;
032 push @res, $token;
033 }
034
035 return new PScm::Expr::List(@res);
036 }
037
038 sub _next_token {
039 my ($self) = @_;
040
041 while (!$self->{Line}) {
042 $self->{Line} = $self->{FileHandle}->getline();
043 return undef unless defined $self->{Line};
044 $self->{Line} =~ s/^\s+//s;
045 }
046
047 for ($self->{Line}) {
048 s/^\(\s*// && return PScm::Token::Open->new();
049 s/^\)\s*// && return PScm::Token::Close->new();
050 s/^([-+]?\d+)\s*//
051 && return PScm::Expr::Number->new($1);
3.11.4 PScm/Token.pm
001 package PScm::Token;
002
003 use strict;
004 use warnings;
005 use base qw(PScm);
006
007 sub is_open_token { 0 }
008 sub is_close_token { 0 }
009
010 ##########################
011 package PScm::Token::Open;
012
013 use base qw(PScm::Token);
014
015 sub is_open_token { 1 }
016
017 ###########################
018 package PScm::Token::Close;
019
020 use base qw(PScm::Token);
021
022 sub is_close_token { 1 }
023
024 1;
3.11.5 PScm/Primitive.pm
001 package PScm::Primitive;
002
003 use strict;
004 use warnings;
005 use base qw(PScm::Expr);
006
007 sub Apply {
008 my ($self, $form) = @_;
009
010 my @unevaluated_args = $form->value;
011 my @evaluated_args = map { $_->Eval() } @unevaluated_args;
012 return $self->_apply(@evaluated_args);
013 }
014
015 sub _check_type {
016 my ($self, $thing, $type) = @_;
017
018 die "wrong type argument(", ref($thing), ") to ", ref($self),
019 "\n"
020 unless $thing->isa($type);
021 }
022
023 ##################################
024 package PScm::Primitive::Multiply;
025
026 use base qw(PScm::Primitive);
027
028 sub _apply {
029 my ($self, @args) = @_;
030
031 my $result = PScm::Expr::Number->new(1)->value();
032
033 while (@args) {
034 my $arg = shift @args;
035 $self->_check_type($arg, 'PScm::Expr::Number');
036 $result *= $arg->value;
037 }
038
039 return new PScm::Expr::Number($result);
040 }
041
042 ##################################
043 package PScm::Primitive::Subtract;
044
045 use base qw(PScm::Primitive);
046
047 sub _apply {
048 my ($self, @args) = @_;
049
050 unshift @args, PScm::Expr::Number->new(0) if @args < 2;
051
3.11.6 PScm/SpecialForm.pm
001 package PScm::SpecialForm;
002
003 use strict;
004 use warnings;
005 use base qw(PScm::Expr);
006
007 ##############################
008 package PScm::SpecialForm::If;
009
010 use base qw(PScm::SpecialForm);
011
012 sub Apply {
013 my ($self, $form) = @_;
014
015 my ($condition, $true_branch, $false_branch) = $form->value;
016
017 if ($condition->Eval()->isTrue) {
018 return $true_branch->Eval();
019 } else {
020 return $false_branch->Eval();
021 }
022 }
023
024 1;
3.11.7 PScm/Expr.pm
001 package PScm::Expr;
002
003 use strict;
004 use warnings;
005 use base qw(PScm::Token);
006
007 sub isTrue {
008 my ($self) = @_;
009 scalar($self->value);
010 }
011
012 sub Eval {
013 my ($self) = @_;
014 return $self;
015 }
016
017 sub value { $_[0] }
018
019 #########################
020 package PScm::Expr::Atom;
021 use base qw(PScm::Expr);
022
023 sub new {
024 my ($class, $value) = @_;
025 bless \$value, $class;
026 }
027
028 sub value { ${ $_[0] } }
029
030 sub as_string { $_[0]->value }
031
032 #########################
033 package PScm::Expr::List;
034 use base qw(PScm::Expr);
035
036 sub new {
037 my ($class, @list) = @_;
038
039 $class = ref($class) || $class;
040 bless [@list], $class;
041 }
042
043 sub value { @{ $_[0] } }
044
045 sub first { $_[0][0] }
046
047 sub rest {
048 my ($self) = @_;
049
050 my @value = $self->value;
051 shift @value;
104 }
105
106 1;
3.11.8 t/PScm.t
001 use strict;
002 use warnings;
003 use Test::More;
004 use lib './t/lib';
005 use PScm::Test tests => 10;
006
007 BEGIN { use_ok('PScm') }
008
009 eval_ok('1', '1', 'numbers');
010 eval_ok('+1', '1', 'explicit positive numbers');
011 eval_ok('-1', '-1', 'negative numbers');
012 eval_ok('"hello"', '"hello"', 'strings');
013 eval_ok('(* 2 3 4)', '24', 'multiplication');
014 eval_ok('(- 10 2 3)', '5', 'subtraction');
015 eval_ok('(- 10)', '-10', 'negation');
016 eval_ok('(if (* 0 1) 10 20)', '20', 'simple conditional');
017 eval_ok(<<EOT, <<EOR, 'no overflow');
018 (* 1234567890987654321 1234567890987654321)
019 EOT
020 1524157877457704723228166437789971041
021 EOR
022
023 # vim: ft=perl
3.11.9 t/lib/PScm/Test.pm
001 package PScm::Test;
002 use strict;
003 use warnings;
004 use FileHandle;
005 require Exporter;
006
007 our @ISA = qw(Exporter);
008 our @EXPORT = qw(eval_ok evaluate);
009
010 my $Test = Test::Builder->new;
011
012 sub import {
013 my ($self) = shift;
014 my $pack = caller;
015 $Test->exported_to($pack);
016 $Test->plan(@_);
017
018 $self->export_to_level(1, $self, 'eval_ok');
019 $self->export_to_level(1, $self, 'evaluate');
020 }
021
022 sub eval_ok {
023 my ($expr, $expected, $name) = @_;
024 my $result = evaluate($expr);
025 $result .= "\n" if $expected =~ /\n/;
026 $Test->is_eq($result, $expected, $name);
027 }
028
029 sub evaluate {
030 my ($expression) = @_;
031
032 my $fh = new FileHandle("> junk");
033 $fh->print($expression);
034 $fh = new FileHandle('< junk');
035 my $outfh = new FileHandle("> junk2");
036 PScm::ReadEvalPrint($fh, $outfh);
037 $fh = 0;
038 $outfh = 0;
039 my $res = `cat junk2`;
040 chomp $res;
041 unlink('junk');
042 unlink('junk2');
043
044 # warn "# [$res]\n";
045 return $res;
046 }
047
048 1;
3.11.10 t/interactive
001 use PScm;
002
003 package GetLine;
004
005 use Term::ReadLine;
006
007 sub new {
008 my ($class) = @_;
009 bless {
010 term => new Term::ReadLine('PScheme'),
011 }, $class;
012 }
013
014 sub getline {
015 my ($self) = @_;
016 $self->{term}->readline('> ');
017 }
018
019 package main;
020
021 my $in = new GetLine();
022
023 ReadEvalPrint($in);
024
025 # vim: ft=perl
Implementing let
let allows the extension of the environment, temporarily, to include new bindings of symbols to data.
let was introduced in Section 2.6 on page 12 but as a quick reminder, here it is in action:
Of course the Environment that has been described so far is not extensible, so the first thing to do is to
look at how we might change the environment package to allow extension.
50 CHAPTER 4. IMPLEMENTING LET
Perl lists have push and pop operations, so we could easily use those with a simple array representing
the stack. Alternatively we could keep a current “top of stack” index, and increment that to push, or
decrement it to pop, something like:
sub push {
    my ($self, $frame) = @_;
    $self->{stack}[$self->{index}] = $frame;
    ++$self->{index};
}
sub pop {
    my ($self) = @_;
    # Check before decrementing so a failed pop leaves the index untouched.
    die "stack underflow" if $self->{index} <= 0;
    --$self->{index};
    return $self->{stack}[$self->{index}];
}
This has the minor advantage of not immediately losing what was previously on the top of the stack
after a pop().
The major drawback of a stack is that it is a linear structure, and extending the stack again necessarily
obliterates what was previously there, see Figure 4.1. If we plan at a later stage to support closures, where
functions hang on to their environments after control has left them, then a stack is obviously inadequate
unless some potentially complex additional code protects and copies those vulnerable environment frames.
The only change is on Lines 30–31 (in bold) where if LookUp() can’t find the symbol in the current
environment frame it looks in its parent frame, if it has one.
The PScm::Env::new() method is little changed: it additionally checks the argument class in case
new() is being called as an object method (which it will be), and adds a parent field to the object, with
an initial zero value meaning “no parent”.
Finally we need an Extend() method of PScm::Env that will create a new environment from an existing
one by creating a new frame with the new bindings, and setting the new frame’s parent to be the original
environment.
Because the Extend() method will be used by let and other constructs later, it takes a reference to an
array of symbols and a reference to an array of values, rather than the simple %initial hash that new()
takes. On Line 18 it maps the symbols to a list of strings, then on Line 19 it uses those strings as keys
in a hash mapping them to their equivalent values. On Line 20 it creates a new environment with that
hash. Finally on Line 21 it sets that new environment’s parent to be the original environment $self and
returns the new environment.
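Putting new(), Extend() and the parent-aware LookUp() together, the chained environment might look something like the following sketch. It is a simplification under stated assumptions: Mini::Env is a hypothetical stand-in for PScm::Env, and symbols are plain strings here, so the step that maps symbol objects to strings is omitted:

```perl
use strict;
use warnings;

# Hypothetical simplification: symbols are plain strings here, whereas the
# text's Extend() takes symbol objects and maps them to strings first.
package Mini::Env;

sub new {
    my ($class, %bindings) = @_;
    $class = ref($class) || $class;          # allow calling new() on an object
    bless { bindings => {%bindings}, parent => 0 }, $class;
}

sub Extend {
    my ($self, $symbols, $values) = @_;
    my %bindings;
    @bindings{@$symbols} = @$values;         # pair each symbol with its value
    my $child = $self->new(%bindings);
    $child->{parent} = $self;                # the new frame points at the old one
    return $child;
}

sub LookUp {
    my ($self, $name) = @_;
    return $self->{bindings}{$name} if exists $self->{bindings}{$name};
    return $self->{parent}->LookUp($name) if $self->{parent};
    die "no binding for $name\n";
}

package main;

my $global = Mini::Env->new(a => 1);
my $inner  = $global->Extend([qw(a b)], [10, 20]);
print $inner->LookUp('a'), " ", $inner->LookUp('b'), " ", $global->LookUp('a'), "\n";   # 10 20 1
```

Because Extend() returns a new frame and never modifies $self, the original environment is still intact after the extension, which is exactly the property a stack could not give us.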
It will have already extended the environment with the bindings for a and b so the global environment
at that point will look like Figure 4.3.
[Figure 4.3: a PScm::Env frame binding "a" to PScm::Expr::Number 10 and "b" to PScm::Expr::Number 20, whose parent is the PScm::Env frame binding "let" to PScm::SpecialForm::Let, "if" to PScm::SpecialForm::If, "*" to PScm::Primitive::Multiply and "-" to PScm::Primitive::Subtract.]
Now, consider what let might have had to do to extend the environment and might have to do to restore
it again afterwards:
4.3. ENVIRONMENT PASSING 53
2. Call Extend() on $PScm::GlobalEnv to get a new one with a and b appropriately bound;
There’s something not quite right there, something ugly. We’ve made it the responsibility of let to
restore that previous environment, and if we go down that road, all of the other operations that extend
environments will similarly be required to restore the environment for their callers. There’s another bit
of ugliness too, the simple existence of a global variable. It’s the only one in our application. Does it
have to be there? What could replace it?
1. Call Extend() on the argument environment to get a new one with a and b appropriately bound;
2. Return the result of calling Eval() on the expression (+ a b) with the new environment as argu-
ment.
Now you should remember from Section 3.5 on page 25 that there are three implementations of Eval(),
one in PScm::Expr, one in PScm::Expr::Symbol and one in PScm::Expr::List. Each of these must
deal with the extra environment argument they are now being passed.
The default Eval() method for literal atoms (strings, numbers and others) is functionally unchanged.
It ignores any argument environment because literals evaluate to themselves.
The Eval() method for symbols now uses the argument environment rather than a global one in which
to look up its value:
The Eval() method for lists (expressions) is little changed either: it evaluates the operation in the current
(argument) environment then calls the operation’s Apply() method, passing the current environment as
an additional argument.
So Apply() must change too. As shown earlier, there is one Apply() method for all PScm::Primitive
classes, which evaluates all of the arguments to the primitive operation then calls the operation’s private
_apply() method with its pre-evaluated arguments. That needs to change only to evaluate those
arguments in the argument environment:
4.4. LET ITSELF 55
Note particularly that there is no need to pass the environment to the private _apply() method: since all
its arguments are already evaluated it has no need of an environment to evaluate anything in. Therefore
the primitive multiply and subtract operations are unchanged from the previous version of the interpreter.
The Apply() method for special forms is separately implemented by each special form. In our previous
interpreter there was only one special form: if, so let’s take a look at how that has changed.
Pretty simple: the only change is that PScm::SpecialForm::If::Apply() passes its additional argument
$env to each call to Eval().
(⟨symbol⟩ ⟨expression⟩)
It starts off at Line 15 extracting the bindings and body from the argument form. Then it sets up two
empty lists to collect the symbols and the values (Line 16) separately. Then in the loop on Lines 18-22
it iterates over each binding collecting the unevaluated symbol in one list and the evaluated argument
in the other. Finally on Line 24 it calls the body’s Eval() method with an extended environment where
those symbols are bound to those values.
Line 24 encapsulates our new simple definition for let quite concisely. The environment created by
Extend() is passed directly to Eval() and the result of calling Eval() is returned directly.
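To see environment passing and let working together without the surrounding class machinery, here is a deliberately compact sketch. It is not the interpreter's design: expressions are plain Perl data (numbers, symbol strings and array references) rather than PScm::Expr objects, operations are dispatched by name inside a hypothetical evaluate() function, and an environment is just a hash with an optional parent:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical compact evaluator: an environment is { bindings => {...}, parent => ... }.
sub lookup {
    my ($env, $name) = @_;
    for (my $e = $env; $e; $e = $e->{parent}) {
        return $e->{bindings}{$name} if exists $e->{bindings}{$name};
    }
    die "no binding for $name\n";
}

sub evaluate {
    my ($expr, $env) = @_;
    if (!ref $expr) {
        return $expr if looks_like_number($expr);   # literal number
        return lookup($env, $expr);                 # symbol
    }
    my ($op, @args) = @$expr;
    if ($op eq 'let') {
        my ($bindings, $body) = @args;
        my %frame;
        for my $binding (@$bindings) {
            my ($symbol, $value_expr) = @$binding;
            $frame{$symbol} = evaluate($value_expr, $env);   # values use the outer environment
        }
        # The body runs in the extended environment; nothing needs restoring afterwards.
        return evaluate($body, { bindings => \%frame, parent => $env });
    }
    if ($op eq '*') {
        my $product = 1;
        $product *= evaluate($_, $env) for @args;
        return $product;
    }
    die "unknown operation $op\n";
}

my $global = { bindings => {}, parent => undef };

# (let ((a 10) (b 20)) (* a b))
print evaluate(['let', [['a', 10], ['b', 20]], ['*', 'a', 'b']], $global), "\n";   # 200
```

The caller's environment is never modified, so once the let returns the caller simply carries on with the environment it already had.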
4.5 Summary
In order to get let working easily we had to make two changes to the original implementation. Firstly
adding an Extend() method to the PScm::Env class to create new environments, and secondly altering
the various Eval() and Apply() methods to pass the environment as argument rather than using a
global environment. Having done that, the actual implementation of let was trivial.
4.6 Tests
Rather than adding more tests to t/PScm.t, there’s a new t/PScm_Let.t which you can see in
Listing 4.7.1 on the facing page. It adds two tests: the first just tests that a value bound by a let expression
is available in the body of the let, and the second proves that the body of the let can be an arbitrary
expression.
4.7 Listings
4.7.1 t/PScm_Let.t
001 use strict;
002 use warnings;
003 use Test::More;
004 use lib './t/lib';
005 use PScm::Test tests => 3;
006
007 BEGIN { use_ok('PScm') }
008
009 eval_ok('(let ((x 2)) x)', '2', 'simple let');
010
011 eval_ok(<<EOF, '20', 'conditional evaluation');
012 (let ((a 00)
013 (b 10)
014 (c 20))
015 (if a b c))
016 EOF
017
018 # vim: ft=perl
Implementing lambda
Having derived an environment passing interpreter in version 0.0.1, the addition of functions, specifically
closures, becomes much more tractable.
So far the text has been pretty relaxed about the uses of the words function and closure, which is ok
because in PScheme and Perl all functions are in fact closures. But before we go any further we’d better
have a clearer definition of what a closure is, and what the difference between a function and a closure
might be.
First of all what precisely is a function? On consideration, functions are a lot like let expressions:
they both extend an environment then execute an expression in the extension. A let expression extends
an environment with key-value pairs, then evaluates its body in that new environment. A function
extends an environment by binding its formal arguments to its actual arguments¹ then evaluates its
body in that new environment. For example, assuming the definition:
(square 4)
the global environment would be extended with a binding of x to 4, and the body of the function, (* x
x) would be evaluated in that new environment, as in Figure 5.1 on the next page.
Now, a closure is simply a function that when executed will extend the environment that was current at
the time the closure was created. Consider an example we’ve seen before.
The lambda expression is being executed in an environment where n is bound to 2. The result of that
lambda expression, a closure, is also the result of the let expression and therefore gets bound to the
symbol times2 in the global environment.
¹ Formal arguments are the names that a function gives to its arguments. Actual arguments are the values passed to a
function.
60 CHAPTER 5. IMPLEMENTING LAMBDA
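[Figure 5.1: (square 4) is evaluated in Env1, the global environment containing let, lambda, if, *, - and square. The function body (* x x) is evaluated in Env2, which extends Env1 with x bound to 4.]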
Now when times2 is called, the closure body (* n x) must execute in an environment where n is
still bound to 2, as in Figure 5.2.
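[Figure 5.2: (times2 4) is evaluated in Env1, the global environment containing let, lambda, if, *, - and times2. The closure body (* n x) is evaluated in Env3, which binds x to 4 and extends the captured environment in which n is bound to 2.]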
So referring to that figure: let extended the global environment to Env2 with a binding of n to 2. Then
the closure, when it was created by lambda in Env2, must have somehow held on to, or “captured” Env2,
so that when the closure is later executed Env2 is the one that it extends to Env3 with its own argument
x.
A function which is not a closure would have to pick a different environment to extend. It could
choose the environment it is being executed in but that would cause horrendous confusion: any variables
referred to in the function body that were not declared by the function might pick up values randomly
from the caller’s environment. Alternatively it could extend the global environment. The latter choice
is the standard one for non-closure implementations, but as already noted all functions in PScheme are
closures (and there are no advantages to them not being closures) so we don’t have to worry about that.
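Perl happens to offer both behaviours, which makes the difference concrete. The following is only an illustrative sketch: the names read_n, call_with_local and call_with_my are invented here and have nothing to do with the interpreter. A package variable given a temporary value with local is visible to whatever the caller goes on to call, much like a function that extends its caller’s environment, whereas a closure over a my variable is unaffected by its caller.

```perl
use strict;
use warnings;

our $n = 1;                    # a package (global) variable

sub read_n { return $n }       # refers to the global $n

# local() gives $n a temporary value for the duration of this call,
# and read_n() sees it: the callee picks up a value from its caller.
sub call_with_local {
    local $n = 2;
    return read_n();
}

# A closure over a lexical variable is immune to what its caller does.
my $read_m = do {
    my $m = 1;
    sub { return $m };
};

sub call_with_my {
    my $m = 2;                 # a different, unrelated $m
    return $read_m->();
}

print call_with_local(), "\n"; # 2: the caller's temporary value
print call_with_my(),    "\n"; # 1: the captured value
print read_n(),          "\n"; # 1: the global value is restored
```

The second call is the behaviour that PScheme closures have: the value seen is the one that was in scope where the function was created.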
So we’re going to continue to use the words function and closure pretty much interchangeably, but
when we use the word function we’re emphasizing the functional aspects of the object under discussion,
and when we use the word closure, we’re emphasizing its environmental behaviour.
When considering the actual implementation of closures (functions) there are two parts to the story.
The first part is how lambda creates a closure, and the second is how the closure gets evaluated when it
is called. In the next section we’ll look at the first part, how lambda creates a closure.
5.1 lambda
We need a good, simple example of closures in action. The following example fits our purposes, but is a
bit more complicated than the examples we’ve seen so far:
(let ((a 4)
      (times2
       (let ((n 2))
         (lambda (x)
           (* x n)))))
  (times2 a))
This example is not much different from our earlier times2 example, except that an outer let provides
bindings for both times2 and a variable a that will be the argument to times2. It is however just a little
tricky, so in detail:
• The outer let reads: “let a be 4 and times2 be the result of evaluating the inner let in the
expression (times2 a).”
• The value of that inner let is the result of evaluating that lambda expression and thus a closure,
and that is what gets bound to the symbol times2 by the outer let.
• When (times2 a) is evaluated, the closure bound to times2 can still “see” the variable n from
the environment that was current when it was created, and so the body of the closure, (* x n),
with x bound to 4 and n bound to 2, produces the expected result 8.
Just to be absolutely sure that the semantics of that expression are well understood, here is an equivalent in
Perl:
{
    my $a = 4;
    my $times2 = do {
        my $n = 2;
        sub {
            my ($x) = @_;
            $x * $n;
        }
    };
    $times2->($a);
}
Now we’re going to walk through the execution of the PScheme statement in a lot more detail, considering
what the interpreter is actually doing to produce the final result.
The very first thing that happens when evaluating our PScheme example is that the outer let
evaluates the number 4 in the global environment. It does not yet bind that value to a, it first must
evaluate the expression that will be bound to times2.
The next thing that happens is the outer let initiates the evaluation of the inner let. The inner let
extends the global environment with a binding of n to 2, as highlighted in the following code and shown in
Figure 5.3 on the next page.
Then, in that new environment, labelled Env2, the let evaluates the lambda expression:
Evaluating a lambda expression is just the same as evaluating any other list expression: its (unevaluated)
arguments are passed to its Apply() method, along with the current environment. In our example the
arguments to the lambda’s Apply() would be:
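1. the unevaluated arguments to lambda: the formal arguments (x) and the body (* x n);
2. the current environment, Env2, that the let just created, with a binding of n to 2.
Figure 5.3: The inner let creates Env2, with n bound to 2, as an extension of the global environment
Env1 (which binds let, lambda, if, * and -).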
To start to make this happen we first need to add a new subclass of PScm::SpecialForm, rather
unsurprisingly called PScm::SpecialForm::Lambda, and we need to add a binding from the symbol
lambda to an object of that class in the initial environment. Firstly, here’s ReadEvalPrint() with the
additional binding:
The only change is the addition of Line 43 with the new binding for lambda.
Now we can look at that new package PScm::SpecialForm::Lambda. All its Apply() method has
to do is to store the details of the function definition and the current environment in another new type
of object representing the closure:
045 package PScm::SpecialForm::Lambda;
046
047 use base qw(PScm::SpecialForm);
048 use PScm::Closure;
049
050 sub Apply {
051 my ($self, $form, $env) = @_;
052
053 my ($args, $body) = $form->value;
054 return PScm::Closure::Function->new($args, $body, $env);
055 }
056
057 1;
On Line 53 it unpacks the formal arguments (i.e. (x)) and body ((* x n)) of its argument $form (the
arguments to the lambda expression) and on Line 54 it returns a new PScm::Closure::Function object
containing those values and, most importantly, also containing the current environment (Env2 in our
example.)
That PScm::Closure::Function::new() method (actually in PScm::Closure) does no more than
bundle its arguments:
007 sub new {
008 my ($class, $args, $body, $env) = @_;
009
010 bless {
011 args => $args,
012 body => $body,
013 env => $env,
014 }, $class;
015 }
So in our example it is Env2 that is captured, along with the arguments and body of the function, in
the resulting closure. This is shown in Figure 5.4 on the facing page.
As we’ve noted, the value of the inner let expression is that new Closure object, and next the outer
let receives the value of the inner let, and extends the global environment with a binding of times2 to
that. It also binds a to 4:
> (let ((a 4)
> (times2
> (let ((n 2))
> (lambda (x)
> (* x n)))))
> (times2 a))
8
Figure 5.4: The Closure created by lambda, with args (x), body (* x n) and env referring to Env2
(n bound to 2), which extends the global environment Env1 (let, lambda, if, *, -).
Figure 5.5: The outer let creates Env3, an extension of Env1 in which a is bound to 4 and times2 to
the Closure; the Closure still refers to Env2.
Now at this point the only thing hanging on to the old Env2, where n has a value, is that Closure, and
the only thing hanging on to the Closure is the binding for times2 in Env3 (the code for the Apply()
method of the outer let is currently holding on to Env3.)
Having created Env3, the outer let evaluates its body, (times2 a) in that environment.
That brings us to the second part of our story, how a function (a closure) gets evaluated.
5.2 Evaluating a Closure
First of all, on Line 46 of PScm/Closure.pm (Listing 5.6.1) the closure’s Apply() method evaluates each
component of the form (each argument to the function) with map, passing the argument $env (Env3) to
each call to Eval(). After all, closures are functions, and functions take their arguments evaluated.
At Line 47 our closure’s Apply() returns the result of calling a separate _apply() method on those
evaluated arguments, much as primitive operations do. Note particularly that it does not pass its
argument $env to the private _apply() method.
The private _apply() method is in the parent PScm::Closure class, because a later version of the
interpreter will support more than one type of closure.
This _apply() method does not need an argument environment because the correct environment to
extend is the one that was captured when the closure object was created. On Line 24 it extends that
previously captured environment with bindings from its formal arguments, also collected when the closure
object was created (i.e. x), to the actual arguments it was passed (i.e. 4, already evaluated). Because the
arguments are already evaluated, it must call a new variant of PScm::Env::Extend() called
ExtendUnevaluated(), which binds the values as given without evaluating them again. Lastly _apply()
evaluates its body (the body of the function, (* x n)), passing that extended environment as argument,
and returns the result (Line 26).
Returning to our example, we’re still considering the evaluation of the subexpression (times2 a).
As we’ve said the closure’s Apply() method evaluates its argument a in the environment it was passed,
Env3, resulting in 4. But it is the captured environment, Env2, that the closure extends with a binding
of x to 4, resulting in Env4 (Figure 5.6). It is in Env4, with x bound to 4 and n still bound to 2, that
the closure executes the body of the function (* x n).
Figure 5.6: (1) The inner let creates Env2 (n bound to 2) from Env1. (2) lambda creates the Closure
(args (x), body (* x n), env Env2). (3) The outer let creates Env3 (a bound to 4, times2 bound to the
Closure) from Env1. (4) (times2 a) is evaluated in Env3. (5) The closure creates Env4 (x bound to 4)
from Env2 and evaluates its body there.
Figure 5.6 pretty much tells the whole story. Here’s our example one last time so it can be walked
through referring to the figure:
(let ((a 4)
(times2
(let ((n 2))
(lambda (x)
(* x n)))))
(times2 a))
• At (1) in the figure, the inner let extends the global env Env1 with a binding of n to 2 producing
Env2.
• At (2) the inner let then evaluates the lambda expression in the context of Env2, creating a Closure
that captures Env2.
• At (3), the outer let extends the global environment Env1 with bindings of a to 4 and times2 to
the value of the inner let: the Closure.
• At (4) the outer let evaluates the subexpression (times2 a) in the context of Env3. In this
environment times2 evaluates to the closure, and its Apply() evaluates a in the same environment
Env3 where it evaluates to 4.
• Finally, at (5), the closure extends the originally captured environment Env2 with a binding of x
to 4 producing Env4 and evaluates its body, (* x n), in this environment.
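The five steps above can be mimicked in a few lines of stand-alone Perl. This is only a sketch under stated assumptions: extend, look_up, make_closure and apply_closure are hypothetical helpers built on plain hashes, not the PScm::Env and PScm::Closure classes, and the function body is represented by a Perl sub that is handed the environment rather than by a PScheme expression.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an environment: a frame of bindings plus
# a reference to the parent environment that it extends.
sub extend {
    my ($parent, %bindings) = @_;
    return { bindings => {%bindings}, parent => $parent };
}

# Search this frame first, then each parent in turn.
sub look_up {
    my ($env, $symbol) = @_;
    for (my $e = $env; defined $e; $e = $e->{parent}) {
        return $e->{bindings}{$symbol} if exists $e->{bindings}{$symbol};
    }
    die "no binding for $symbol\n";
}

# A closure just bundles formal arguments, body and environment.
sub make_closure {
    my ($args, $body, $env) = @_;
    return { args => $args, body => $body, env => $env };
}

sub apply_closure {
    my ($closure, @values) = @_;
    my %bindings;
    @bindings{ @{ $closure->{args} } } = @values;
    # Extend the captured environment, not the caller's.
    my $env = extend($closure->{env}, %bindings);
    return $closure->{body}->($env);
}

my $env1   = extend(undef);                                  # global
my $env2   = extend($env1, n => 2);                          # (1)
my $times2 = make_closure(                                   # (2)
    ['x'],
    sub { my ($env) = @_; look_up($env, 'x') * look_up($env, 'n') },
    $env2,
);
my $env3   = extend($env1, a => 4, times2 => $times2);       # (3)
my $arg    = look_up($env3, 'a');                            # (4)
my $result = apply_closure(look_up($env3, 'times2'), $arg);  # (5)
print "$result\n";                                           # 8
```

Note that Env3 is only used to find the closure and to evaluate its argument; the body never sees it.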
Printing a closure in a readable form is trivially accomplished by overriding the default as_string()
method in PScm::Closure. That override is on Lines 29-36 of Listing 5.6.1.
All it does is to construct a new PScm::Expr::List containing the symbol that constructed the closure
(lambda), the formal arguments to the closure and the body of the closure. It then calls that list’s
as_string() method and returns the result. The acquisition of the lambda symbol is deferred to a separate
method _symbol() in PScm::Closure::Function (again because later versions of the interpreter
will have different kinds of closures). Here’s _symbol().
Job done. Closures, when printed, will now produce a useful representation of the function they perform.
5.4 Summary
Hopefully the power, flexibility and elegance of an environment-passing interpreter combined with a
linked-list environment implementation is becoming apparent. The enormous advantage over a stack
discipline is that individual environments need not go away just because a particular construct returns.
They can hang around as long as they are needed and garbage collection will remove them when the
time comes. Without further ado then, here’s the full source for our new PScm::Closure package in
Listing 5.6.1 on the next page.
5.5 Tests
You can see the tests for the lambda form in a new file, t/PScm_Lambda.t, in Listing 5.6.2 on page 72.
The first test exercises the simple creation of a lambda expression, its binding to a symbol, and its
application to arguments. The second works through pretty much exactly the example we’ve been
working through. The third starts to flex the muscles of our nascent interpreter a little more. It creates
a local makemultiplier function that when called with an argument n will return another function that
will multiply n by its argument. It then binds the result of calling (makemultiplier 3) to times3 and
calls (times3 5), confirming that the result is 15, as expected. Incidentally, this demonstrates that the
environment created by a lambda expression is equally amenable to capture by a closure.
We could rewrite the body of that last test in Perl as follows:
{
    my $times3 = do {
        my $makemultiplier = sub {
            my ($n) = @_;
            return sub {
                my ($x) = @_;
                return $n * $x;
            }
        };
        $makemultiplier->(3);
    };
    $times3->(5);
}
5.6 Listings
5.6.1 PScm/Closure.pm
001 package PScm::Closure;
002
003 use strict;
004 use warnings;
005 use base qw(PScm);
006
007 sub new {
008     my ($class, $args, $body, $env) = @_;
009
010     bless {
011         args => $args,
012         body => $body,
013         env => $env,
014     }, $class;
015 }
016
017 sub args { $_[0]->{args}->value }
018 sub body { $_[0]->{body} }
019 sub env { $_[0]->{env} }
020
021 sub _apply {
022     my ($self, @args) = @_;
023
024     my $extended_env =
025         $self->env->ExtendUnevaluated([$self->args], [@args]);
026     return $self->body->Eval($extended_env);
027 }
028
029 sub as_string {
030     my ($self) = @_;
031     return PScm::Expr::List->new(
032         $self->_symbol,
033         $self->{args},
034         $self->{body}
035     )->as_string;
036 }
037
038 ################################
039 package PScm::Closure::Function;
040
041 use base qw(PScm::Closure);
042
043 sub Apply {
044     my ($self, $form, $env) = @_;
045
046     my @evaluated_args = map { $_->Eval($env) } $form->value;
047     return $self->_apply(@evaluated_args);
048 }
049
Recursion and letrec
Let’s try a little experiment with the interpreter version 0.0.2. We’ll try to use let to define a recursive
function, the perennial factorial function1 .
It didn’t work. The reason that it didn’t work is obvious, considering how let works.
let evaluates the expression half of its bindings in the enclosing environment, before it binds the
values to the symbols in a new environment, so it is the enclosing environment (the global environment
in this case) that the lambda captures. Now that environment doesn’t have a binding for factorial,
factorial is only visible within the body of the let, so any recursive call to factorial from the body
of the function (closure) is bound to fail.
Putting it another way, let will create a binding for factorial, but only by extending the global
environment after the lambda expression has been evaluated in the global environment, therefore
capturing the global environment.
So the error is not coming from the call to (factorial 3), it’s coming from the attempted recursive
call to (factorial (- n 1)) inside the body of the factorial definition. The environment diagram in
Figure 6.1 on the next page should help to make that clear.
The let evaluates the lambda expression in the initial environment, Env1 at (1), so that’s the environment
that gets captured by the Closure. Then the let binds the closure to the symbol factorial in an
extended environment Env2, and that’s where the body of the let, (factorial 3), gets evaluated at
(2). Now after evaluating its argument 3 in Env2, the closure proceeds to extend the environment it
captured, the global environment Env1, with a binding of n to 3 producing Env3. It’s in Env3 that
the body of the factorial function gets evaluated at (3). Now n has a binding in that environment, but
unfortunately factorial doesn’t, so the recursive call fails.
1
Factorial(n), often written n!, is n × (n − 1) × (n − 2) × · · · × 1.
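Figure 6.1: With let, the Closure (args (n), body (if n ...)) captures Env1, where the let expression
is evaluated at (1). (factorial 3) is evaluated at (2) in Env2, which binds factorial. The recursive
call (factorial (- n 1)) is evaluated at (3) in Env3, which binds n to 3 and extends Env1, so it has
no binding for factorial.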
6.1 letrec
What we need is a variation of let that arranges to evaluate the values for its bindings in an environment
where the bindings are already in place. Essentially the environments would appear as in Figure 6.2.
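Figure 6.2: (factorial 3) is evaluated at (2) in Env2, which binds factorial to a Closure that has
itself captured Env2. The recursive call (factorial (- n 1)) is evaluated at (3) in Env3, which binds
n to 3 and extends Env2.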
In this figure the closure has been persuaded to capture an environment Env2 containing a binding that
refers back to the closure itself (a circular reference in effect.) In this circumstance any recursive call to
factorial from the body of the closure would work because the closure would have extended Env2 and
its body would execute in a context (Env3) where factorial did have a value.
The special form we’re looking for is called letrec (short for “let recursive”) and it isn’t too tricky,
although a bit of a hack. Let’s first remind ourselves how let works.
1. Evaluate the value component of each binding in the current, passed in environment;
2. Create a new environment as an extension of the current one, with those values bound to their
symbols;
3. Evaluate the body of the let in that new environment.
Our variant, letrec, isn’t all that different. What it does is:
1. Create a new extended environment first, with dummy values bound to the symbols;
2. Evaluate the values in that new environment;
3. Assign the values to their symbols in that new environment;
4. Evaluate the body in that new environment.
Obviously if any of the values in a letrec are expressions other than lambda expressions, and they
make reference to other letrec values in the same scope, then there will be problems. Remember that
all lambda does is to capture the current environment along with formal arguments and function body.
It does not immediately evaluate anything in that captured environment. For that reason real letrec
implementations may typically only allow lambda expressions as values. PScheme doesn’t bother making
that check2 .
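Perl programmers meet the same issue and usually solve it the same way. A my variable is not visible until the statement after the one that declares it, so a sub written as the initial value of its own variable cannot refer to itself, just as with let. The sketch below (an illustration only, not part of the interpreter) follows the four letrec steps instead: declare the variable first, then assign to it.

```perl
use strict;
use warnings;

# Step 1: create the binding first, with a dummy (undef) value.
my $factorial;

# Steps 2 and 3: evaluate the sub, which captures a scope that
# already contains $factorial, then assign it to that variable.
$factorial = sub {
    my ($n) = @_;
    return $n ? $n * $factorial->($n - 1) : 1;
};

# Step 4: evaluate the body.
print $factorial->(4), "\n";    # 24
```

As in Figure 6.2, the result is a circular reference: the sub refers to the variable that refers to the sub.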
6.1.1 Assignment
To implement letrec then, we first need to add a method to the environment class PScm::Env that
will allow assignment to existing bindings. Here is that method.
Assign() uses a helper function _lookup_ref() to actually do the symbol lookup. If _lookup_ref()
finds a binding, then Assign() puts the new value in place through the reference that _lookup_ref()
returns. It is an error if there’s not currently such a symbol in the environment. This makes sense
because it keeps the distinction between variable binding and assignment clear: variable binding creates
a new binding; assignment changes an existing one.
_lookup_ref() is simple enough: it does pretty much what LookUp() does, except that it returns a
reference to what it finds, and returns undef, rather than die()-ing, if it doesn’t find a value:
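To make the division of labour concrete, here is a minimal sketch of the idea. It is not the book’s PScm::Env: Sketch::Env is a hypothetical class invented for this illustration, with hash-based frames and a simplified Extend() that takes key-value pairs, and only the method names Assign(), LookUp() and _lookup_ref() are borrowed from the text.

```perl
use strict;
use warnings;

package Sketch::Env;    # hypothetical stand-in, not PScm::Env

sub new {
    my ($class, %bindings) = @_;
    return bless { bindings => {%bindings}, parent => undef }, $class;
}

sub Extend {
    my ($self, %bindings) = @_;
    my $child = ref($self)->new(%bindings);
    $child->{parent} = $self;
    return $child;
}

# Return a reference to the slot holding the value, or undef.
sub _lookup_ref {
    my ($self, $symbol) = @_;
    for (my $env = $self; defined $env; $env = $env->{parent}) {
        return \$env->{bindings}{$symbol}
            if exists $env->{bindings}{$symbol};
    }
    return undef;
}

sub LookUp {
    my ($self, $symbol) = @_;
    my $ref = $self->_lookup_ref($symbol)
        or die "no binding for $symbol\n";
    return $$ref;
}

# Change an existing binding; never create a new one.
sub Assign {
    my ($self, $symbol, $value) = @_;
    my $ref = $self->_lookup_ref($symbol)
        or die "no binding for $symbol\n";
    $$ref = $value;
}

package main;

my $global = Sketch::Env->new(a => 1);
my $inner  = $global->Extend(b => 2);
$inner->Assign(a => 10);           # updates the binding in $global
print $global->LookUp('a'), "\n";  # 10
```

Because Assign() writes through the reference, the change lands in whichever frame actually holds the binding, here the outer one.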
2
One possible way to detect this type of error would be to bind dummy objects to the symbols. These dummy objects
would have an Eval() method that would die() with an informative error message if it was ever called.
Incidentally, LookUp() itself has been modified and simplified to make use of it.
The common code in PScm::SpecialForm::Let::UnPack() just unpacks the symbols, bindings and
body from the argument $form and returns them:
Now, our new Apply() in PScm::SpecialForm::LetRec makes use of that same UnPack() method (by
inheriting from PScm::SpecialForm::Let). It differs from the original Apply() only in that it calls the
environment’s ExtendRecursively() method, rather than Extend().
ExtendRecursively() creates a new environment by extending the current environment, $self, with the
symbols bound to their unevaluated values. Then it calls a new, private method _eval_values() on the
new environment. Here’s that method:
All that does is to loop over all of its bindings, replacing the unevaluated expression with the result of
evaluating the expression in the current environment. Since all of those expressions are expected to be
lambda expressions, the resulting closures capture the environment that they are themselves bound in.
QED.
A careful reader may have realised that a valid alternative implementation of letrec would just
create an empty environment extension, then populate the environment afterwards with an alternative
version of Assign() which did not require the symbols to pre-exist in the environment. The main reason
it is not done that way is that the current behaviour of Assign() is more appropriate for later extensions
to the interpreter.
Just to be complete, here’s the new version of PScm::ReadEvalPrint() with the binding for letrec.
6.2 Summary
Let evaluates the values of its bindings in the enclosing environment. Then it creates an extended
environment with each symbol bound to its value, in which to evaluate the body of the let expression.
This means that recursive lambda expressions defined by let won’t work, because there’s not yet a
binding for the recursive function when the lambda expression is evaluated to create the closure. In
order to get recursion to work, we needed to create a variant of let, called letrec (let recursive)
which sets up a dummy environment with stub values for the symbols in which to evaluate the lambda
expressions, so that the lambda expressions could capture that environment in their resulting closures.
Having evaluated those expressions, letrec assigns their values to the existing bindings in the new
environment, replacing the dummy values. Thus when the closure executes later, the environment it
has captured, and which it will extend with its formal arguments bound to actual values, will contain a
reference to the closure itself, so a recursive call is successful.
6.3 Tests
The tests for the letrec form are in t/PScm_Letrec.t which you can see in Listing 6.4.1 on the next
page.
There are three tests. The first two just prove what we already know, that let does not (and should
not) support recursion. The third replaces let with letrec and proves that letrec, on the
other hand, does support recursive function definitions.
6.4 Listings
6.4.1 t/PScm_Letrec.t
use strict;
use warnings;
use Test::More;
use lib './t/lib';
use PScm::Test tests => 4;

BEGIN { use_ok('PScm') }

ok(
    !defined(eval {
        evaluate(<<EOF) }), 'let does not support recursion');
(let ((factorial
       (lambda (n)
         (if n (* n (factorial (- n 1)))
             1))))
  (factorial 4))
EOF

is($@, "no binding for factorial in PScm::Env\n",
    'let does not support recursion [2]');

eval_ok(<<EOF, "24", 'letrec and recursion');
(letrec ((factorial
          (lambda (n)
            (if n (* n (factorial (- n 1)))
                1))))
  (factorial 4))
EOF

# vim: ft=perl
Another Variation on let
Suppose we have a fairly complicated calculation to make. We’d like to compute intermediate values in
order to simplify our code. For example:
It didn’t work. The error occurs when attempting to evaluate (* a 2). Why? Well in much the same
way as let fails for recursive definitions: because let binds its arguments in parallel, at the point that
it is trying to evaluate (* a 2), it is still doing so in the environment prior to binding a.
letrec can’t help here, because it sets up an environment with dummy values to evaluate its values
in, which is ok if those values are closures that just capture that environment for later, but very bad if
they’re actually going to try to do any immediate evaluations with those dummy values.
Nesting one let inside another for each binding would work, giving us a set of environments as in
Figure 7.1 on the following page.
The outer let binds a to 5 to create Env2. The next let evaluates (* a 2) in the context of Env2 and
creates Env3 with a binding of b to the result 10. The innermost let evaluates (- b 3) in Env3 and
binds c to the result, creating Env4. In Env4 the final evaluation of c results in 7, which is the result of
the expression.
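Figure 7.1: Nested let expressions. Env2 binds a to 5, Env3 extends it with b bound to 10, and Env4
extends that with c bound to 7. The final expression c is evaluated in Env4.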
While that works fine, it’s rather ugly and verbose code. Wouldn’t it be better if there was a variant
of let that did all that for us, binding its variables sequentially? This variant of let is called let*
(let-star) and is found in most Lisp implementations.
7.2 let*
To implement let*, in the same way as we implemented letrec, we create a new sub-class of
PScm::SpecialForm::Let and give it an Apply() method, then bind a symbol (let*) to an instance of that
class in the initial environment. Our new class will be called PScm::SpecialForm::LetStar and here’s
that Apply() method.
Again it only differs from the let and letrec implementations of Apply() in the way it extends the
environment it passes to the Eval() of its body. In this case it calls the new PScm::Env method
ExtendIteratively().
This method implements the algorithm we discussed earlier, creating a new environment frame for each
individual binding, and evaluating the value part in the context of the previous environment frame. The
last environment frame, the head of the list of frames rooted in the original environment, is returned by
the method1 .
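The frame-per-binding idea can be sketched without any of the interpreter’s classes. Everything here is an assumption made for illustration: extend, look_up and extend_iteratively are hypothetical helpers, each frame holds exactly one binding, and an “expression” is a Perl sub that receives the environment, standing in for an object with an Eval() method. The bindings mirror the a, b and c of this chapter’s running example.

```perl
use strict;
use warnings;

# Hypothetical one-binding frames linked to their parent.
sub extend {
    my ($parent, $symbol, $value) = @_;
    return { symbol => $symbol, value => $value, parent => $parent };
}

sub look_up {
    my ($env, $symbol) = @_;
    for (my $e = $env; defined $e; $e = $e->{parent}) {
        return $e->{value} if $e->{symbol} eq $symbol;
    }
    die "no binding for $symbol\n";
}

# Each binding is [symbol, expression].
sub extend_iteratively {
    my ($env, @bindings) = @_;
    for my $binding (@bindings) {
        my ($symbol, $expr) = @$binding;
        # Evaluate in the environment built so far, then add a frame.
        $env = extend($env, $symbol, $expr->($env));
    }
    return $env;    # the last frame created
}

# In the spirit of (let* ((a 5) (b (* a 2)) (c (- b 3))) c)
my $env = extend_iteratively(
    undef,
    [ 'a', sub { 5 } ],
    [ 'b', sub { look_up($_[0], 'a') * 2 } ],
    [ 'c', sub { look_up($_[0], 'b') - 3 } ],
);
print look_up($env, 'c'), "\n";    # 7
```

Each expression sees every binding made before it, which is exactly what parallel binding with let cannot offer.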
7.3 Summary
This may all seem a bit academic, but let’s remember that Perl supports both types of variable binding,
let and let*, in the following way.
Parallel assignment, like let, is done in a list context, while sequential binding, like let*, is done with a
series of separate assignments:
my $a = 5;
my $b = $a * 2;
my $c = $b - 3;
let has its uses, just as assignment in a list context does. For instance with parallel assignment it
becomes possible to swap the values of variables without needing an additional temporary variable. In
Perl:
($a, $b) = ($b, $a);
and in PScheme:
(let ((a b)
      (b a))
  ...)
1 An alternative implementation would be to only create one new environment frame, then iteratively
evaluate and bind each value in turn, in the context of that new environment.
Again, just for completeness, here’s our 0.0.4 version of ReadEvalPrint() with the additional let*
binding.
7.4 Tests
The additional tests for 0.0.4 are in t/PScm_LetStar.t which you can see in Listing 7.5.1 on the next
page.
The first test proves that ordinary let binds in parallel, by doing the variable swapping trick. The second
test demonstrates let* binding sequentially since the innermost binding of b to a sees the immediately
previous binding of a to the outer b.
7.5 Listings
7.5.1 t/PScm_LetStar.t
use strict;
use warnings;
use Test::More;
use lib 't/lib';
use PScm::Test tests => 3;

BEGIN { use_ok('PScm') }

eval_ok(<<EOF, '1', 'let binds in parallel');
(let ((a 1)
      (b 2))
  (let ((a b)
        (b a))
    b))
EOF

eval_ok(<<EOF, '2', 'let* binds sequentially');
(let ((a 1)
      (b 2))
  (let* ((a b)
         (b a))
    b))
EOF

# vim: ft=perl
List Processing
It was mentioned in Section 1.2 on page 4 that one of the great strengths of the Lisp family of languages
is their ability to treat programs as data: to manipulate the same list structures that their expressions
are composed of. So far we haven’t seen any of that functionality implemented in our interpreter.
Those list structures are the ones constructed by the Reader, and the Reader can be considered
a general purpose data input package: all it does is categorise and collect input into strings, numbers,
symbols and lists. That’s a very useful structure for organizing any kind of information, not just PScheme
programs. The read-eval-print loop will, however, attempt to evaluate any such structure read in, so we
need a way of stopping that.
8.1 quote
The appropriate form is called quote and is a PScm::SpecialForm. It takes a single argument and
returns it unevaluated:
> (quote a)
a
> (quote (+ 1 2))
(+ 1 2)
The implementation is rather trivial: Quote is used to turn off evaluation. Since special forms don’t have
their arguments evaluated for them, all that the Apply() method in PScm::SpecialForm::Quote need
do is to return its first argument, still unevaluated.
Here’s PScm::SpecialForm::Quote.
8.2 list
Another useful operation is called list. It takes a list of arguments and constructs a new list from them.
It is just a PScm::Primitive and so its arguments are evaluated:
It does nothing itself but return the list of its evaluated arguments to the caller as a new
PScm::Expr::List, so it’s also trivial to implement. To recap, all PScm::Primitive classes share a
common Apply() method that evaluates the arguments then calls the class-specific _apply() method. So
all we have to do is to subclass PScm::Primitive to PScm::Primitive::List and give that new subclass an
appropriate _apply() method.
As has been said, it’s trivial because it just returns its arguments as a new PScm::Expr::List.
The _apply() method of the car primitive uses _check_type() to verify that its argument is a list, then
calls the list’s first() method, returning the result.
Here’s the equivalent PScm::Primitive::Cdr class.
8.4 cons
We’re only missing one piece from our set of basic list operations now, but before adding that it is
necessary to explain and rectify a significant deviation that PScheme has so far made from other Lisp
and Scheme implementations. In our PScheme implementation lists have been implemented as object
wrappers around Perl lists. This had the advantage that the Perl implementation was as simple as it
could be. However real Lisp systems implement lists as chains of what are called cons cells, or more
commonly pairs. A cons cell is a structure with two components, both references to other data types.
For a true list, the first component points at the current list element and the second component points
at the rest of the list. The first component is called the car and the second component the cdr, hence
the eponymous functions that access those components. So for example the Lisp expression (foo ("bar"
10) baz) is not properly implemented as in Figure 3.1 on page 17, but as shown in Figure 8.1 on the
following page.
The unfilled cdr pointers in the figure represent null pointers and terminate the list structure.
This means that a true Lisp list is in fact a linked list. A primary advantage of this is that the
internal first() and rest() (car and cdr) operations are equally efficient: there is no need for rest()
to construct a new list object, it just returns its rest component. A second advantage is that cons cells
are more flexible than lists. A true list is a chain of cons cells linked by their cdr component, ending in
a cons cell with an empty cdr. In general the cdr need not point to another cons cell, it could equally
well point to any other data type. A cons cell is constructed with the primitive operator cons.
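A few lines of Perl are enough to experiment with the idea. This is only a sketch and not the representation the interpreter uses: here a cell is assumed to be a two-element array, undef stands for the empty list, and cons, car, cdr and list are hypothetical helper subs named after the operations they imitate.

```perl
use strict;
use warnings;

# Hypothetical helpers: a cell is [car, cdr]; undef is the empty list.
sub cons { my ($car, $cdr) = @_; return [ $car, $cdr ] }
sub car  { return $_[0][0] }
sub cdr  { return $_[0][1] }

# Build a true list by consing onto the empty list from the right.
sub list {
    my @items = @_;
    my $list;
    $list = cons(pop(@items), $list) while @items;
    return $list;
}

# (foo (bar 10) baz), with plain strings standing in for symbols
my $inner = list('bar', 10);
my $outer = list('foo', $inner, 'baz');

print car($outer), "\n";                 # foo
print car(cdr(car(cdr($outer)))), "\n";  # 10
print car(cdr(cdr($outer))), "\n";       # baz

# cdr() hands back the existing tail; nothing is copied.
print "same cell\n" if cdr($outer) == cdr($outer);

# The cdr need not be a list at all.
my $pair = cons('a', 'b');
print cdr($pair), "\n";                  # b
```

Comparing the two references numerically shows that repeated calls to cdr() return the very same cell, which is why rest() is as cheap as first() in this representation.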
Figure 8.1: Cons Cell Representation of a nested list (foo ("bar" 10) baz)
Figure 8.2: A dotted pair: a single Cell whose car and cdr each refer to a Symbol.
Dot notation is not limited to just dotted pairs; for example, in Figure 8.3 on the facing page you can
see that more complex structures can also be represented.
Dot notation is actually capable of representing any structure we can envisage2 . In fact it is reasonable
to think of the normal list notation we have been using so far as merely a convenient shorthand for dot
notation. For example the list (a) is the same as the pair (a . ()) (because () is the empty list.)
Likewise the list (a b c) can be represented as (a . (b . (c . ()))). Obviously this unwieldy
notation is to be avoided unless necessary, but you should at least be aware of it.
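The relationship between the two notations is easy to demonstrate with a small printer. The sketch below assumes a deliberately simple representation that is not the interpreter’s own: a cell is a two-element array, undef is the empty list, anything else is printed as it stands, and cons and to_string are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical representation: a cell is [car, cdr]; undef is ().
sub cons { my ($car, $cdr) = @_; return [ $car, $cdr ] }

sub to_string {
    my ($expr) = @_;
    return '()' unless defined $expr;
    return $expr unless ref $expr;       # a symbol or a number
    my @parts;
    my $cell = $expr;
    while (ref $cell) {                  # walk the chain of cdrs
        push @parts, to_string($cell->[0]);
        $cell = $cell->[1];
    }
    # A chain that does not end in () needs the dot.
    push @parts, '.', to_string($cell) if defined $cell;
    return '(' . join(' ', @parts) . ')';
}

print to_string(cons('a', undef)), "\n";                        # (a)
print to_string(cons('a', cons('b', cons('c', undef)))), "\n";  # (a b c)
print to_string(cons('a', 'b')), "\n";                          # (a . b)
print to_string(cons('a', cons('b', 'c'))), "\n";               # (a b . c)
```

The dot only appears when the final cdr is something other than the empty list, which is why ordinary lists never show it.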
Figure 8.3: A more complex dotted structure built from two Cells and the Symbols a, b and c.
2 Well, actually it is possible to imagine circular structures that would defeat any notation. A true
Scheme will even allow the creation of such circular lists, with such dubious expressions as
(set-cdr! x x), but that’s a world of pain that we will stay well away from.
Dot notation has a number of uses. Most importantly it allows us to easily specify variable numbers
of arguments to our lambda expressions: if the formal arguments in a lambda expression are dotted, then
that can be taken to mean the dotted symbol is to be bound to a list of the remaining actual arguments.
For example:
The a and b are bound to 1 and 2 as always, but the c, because it occupies the entirety of the rest of the
formal argument list, gets bound to the remaining list of additional arguments.
Interestingly this also allows entirely arbitrary lists of arguments. If you think about it the dotted
pair notation ( . a) can be made perfectly legal for input, and is equivalent to the symbol a: the
opening parenthesis implies a list, but the first symbol encountered is the cdr of a list that has not started
to form yet, so the result is just that symbol. Since we must accept lambda expressions with such an
argument declaration, we must also accept lambda expressions with a single symbol instead of a list of
arguments. For example we could define our list primitive in PScheme like:
list takes any number of arguments as a single list args and returns them.
8.5 Implementation
Let’s make the change to use that alternative implementation. Since the PScm::Expr::List class hides
its internal structure and provides accessor methods, technically that should be the only package that
needs to change. However it is worthwhile making use of the new list structure in other parts of the
PScheme system. The language is highly recursive, and this new linked list structure lends itself to
recursion much more naturally than a plain perl @list does.
The new() method on Lines 62-72 is a little more complicated than it was, because it has to recurse
on its argument list building a linked list. If the list is not empty then on Line 68 it calls an ancillary
method Cons() (defined on Lines 74-77) to actually construct a new PScm::Expr::List::Pair node (a
cons cell). So the PScm::Expr::List class is now in fact abstract. Although it has a new() method, that
method actually returns instances of either PScm::Expr::List::Pair or another new object,
PScm::Expr::List::Null, which represents the empty list.
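The shape of that arrangement can be sketched in a few lines of Perl. This is only an illustration under assumed names (the Sketch:: packages are invented here, and the method bodies are guesses at the idea rather than the PScm source): an abstract list class whose new() recurses and hands back either a pair or a null object.

```perl
use strict;
use warnings;

# Illustrative stand-in for the abstract list class.
package Sketch::List;

sub new {
    my ($class, @items) = @_;
    return Sketch::List::Null->new() unless @items;
    my $first = shift @items;
    # Recurse on the remaining items, then cons the first onto the result.
    return $class->Cons($first, $class->new(@items));
}

sub Cons {
    my ($class, $first, $rest) = @_;
    return Sketch::List::Pair->new($first, $rest);
}

# A cons cell, stored as a two-element array ref.
package Sketch::List::Pair;
our @ISA = ('Sketch::List');
use constant { FIRST => 0, REST => 1 };

sub new {
    my ($class, $first, $rest) = @_;
    return bless [$first, $rest], $class;
}
sub first { $_[0][FIRST] }
sub rest  { $_[0][REST] }

# Convert a proper linked list back to a flat Perl list.
sub value { my ($self) = @_; return ($self->first, $self->rest->value) }

# The empty list, which also ends the recursion in value().
package Sketch::List::Null;
our @ISA = ('Sketch::List');

sub new   { my ($class) = @_; return bless [], $class }
sub first { $_[0] }    # the car of the empty list is the empty list
sub rest  { $_[0] }    # and so is its cdr
sub value { () }

package main;

my $list = Sketch::List->new(1, 2, 3);
print join(' ', $list->value), "\n";    # 1 2 3
```

Note that callers only ever say Sketch::List->new(...); which concrete class they get back depends entirely on whether any items were supplied.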
If you remember, the old implementation of new() just wrapped its argument list:
The as_string() method of PScm::Expr::List has changed too. Rather than just mapping
as_string() over the components of the list, it calls a separate strings() method that will return an array
of strings, and joins and wraps that. The main reason for that is to cope with dotted pair notation.
We’ll see how strings() works soon.
Much of the functionality that was in PScm::Expr::List has been moved out into
PScm::Expr::List::Pair, shown next:
As a minor optimization, PScm::Expr::List::Pair will store its first and rest components in an array
ref rather than a hash. For that reason it declares two constants FIRST and REST to act as indexes into
that structure. PScm::Expr::List::Pair has its own new() method which we’ve already seen being
called by Cons(). It just collects its two arguments into a new object.
The other methods in PScm::Expr::List::Pair are fairly straightforward.
• The value() method on Lines 98-102 converts the linked list back into a Perl list [3]. Because the
structure is no longer necessarily a true list we must supply an alternative value() method in
PScm::Expr which just returns $self (ignoring the dot notation). There is another value()
method in PScm::Expr::List::Null that returns the empty (Perl) list and likewise terminates the
recursion of PScm::Expr::List::value().
• The first() and rest() methods are simplified: they are now just accessors to their equivalent
fields.
• As mentioned above, the as_string() method from PScm::Expr::List has to deal with dot
notation, so cannot simply map an as_string() over the list’s value(). Instead it calls a helper
method strings().
• strings(), on Lines 108-112, returns a Perl list of the string representation of the first item on the
list, plus the result of calling itself on the rest of the list. There is an implementation of strings()
in PScm::Expr::List::Null that just returns the empty (Perl) list:
and another at the root of the hierarchy in PScm::Expr which catches the situation where a type
other than a list or null is the cdr of a list:
It returns a list of the string ’.’ plus the result of calling as_string() on itself. Since it knows it
must be terminating a list, it need not recurse.
[3] [...] then the scalar context imposed by the isTrue() method in PScm::Expr would cause Perl to treat the comma as the
comma operator, throwing away the left hand side and returning only the right, recursively, so all lists would end up being
treated as false.
• A new map_eval() method on Lines 120-124 will come in very handy later. It takes an environment
as argument and builds a copy of itself with each component replaced by the result of evaluating
that component in the argument environment. Because, like strings(), it must deal with the
possibility that the structure is not a true list, an additional map_eval() method is provided in the
base PScm::Expr class that just returns the result of calling Eval() on $self.
• Finally, an identifying is_pair() method is defined to be true in this class only. It is defined false
by default in PScm::Expr.
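The way as_string() and strings() cooperate to print dotted pairs can be sketched without the class hierarchy. In this sketch plain functions stand in for the three method implementations (null, pair, and anything else); the array-ref cells and the helper names are assumptions made for the illustration, not the PScm code.

```perl
use strict;
use warnings;

my $NIL = [];    # stand-in for the empty list object
sub cons    { my ($car, $cdr) = @_; return [$car, $cdr] }
sub is_null { my ($x) = @_; return ref($x) eq 'ARRAY' && $x == $NIL }
sub is_pair { my ($x) = @_; return ref($x) eq 'ARRAY' && $x != $NIL }

# Return the printed pieces of a list body, without the parentheses.
sub strings {
    my ($x) = @_;
    # The empty list contributes nothing and ends the recursion.
    return () if is_null($x);
    # A pair contributes its first item, then whatever the rest contributes.
    return (as_string($x->[0]), strings($x->[1])) if is_pair($x);
    # Anything else in cdr position is printed after a dot.
    return ('.', as_string($x));
}

sub as_string {
    my ($x) = @_;
    return $x unless ref($x);    # atoms print as themselves
    return '(' . join(' ', strings($x)) . ')';
}

print as_string(cons(1, cons(2, 3))), "\n";                # (1 2 . 3)
print as_string(cons(1, cons(2, cons(3, $NIL)))), "\n";    # (1 2 3)
```

Because the "anything else" case never recurses, a non-list cdr always terminates the printed list, which is why the dot can only ever appear just before the last item.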
Another part of our alternative implementation of lists is that new PScm::Expr::List::Null class. It
represents the PScheme empty list, and also quite reasonably descends from the list class. It provides
a simple new() method with no arguments, and overrides the value() and strings() methods to just
return an empty perl list.
Interestingly, it also overrides first() and rest() to return $self, so the car or cdr of the empty list
is the empty list, and it overrides Eval() to just return $self too, so an empty list evaluates to itself [4].
Back to our cons function. Scheme implementations have a cons operation that takes two arguments
and creates a cons cell with its car referencing the first argument, and its cdr referencing the second.
Thus the car and cdr operations are the precise complement of cons: cons constructs a cell, and car
and cdr take the cell apart.
[4] This interesting trick of having an object to represent the absence of another object is a well-known design pattern
called the Null Object Pattern.
Provided the second argument to cons is a list, the result will also be a list, but there is no requirement
for the second argument to be a list. That makes cons a second way to create dotted pairs, other than
inputting them directly:
cons is implemented in the normal way, by subclassing PScm::Primitive and giving the new class,
PScm::Primitive::Cons in this case, an _apply() method. Here’s that method.
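As a rough idea of how little such a method has to do, here is a self-contained sketch. The Sketch:: package names are invented for the illustration, and the body is an assumption about the general shape, not a copy of the PScm method: the primitive receives two already-evaluated arguments and wraps them in a new cell.

```perl
use strict;
use warnings;

# Minimal cell class, standing in for the real pair class.
package Sketch::Pair;

sub new   { my ($class, $first, $rest) = @_; return bless [$first, $rest], $class }
sub first { $_[0][0] }
sub rest  { $_[0][1] }

# Hypothetical primitive: by the time _apply() is called the
# arguments have already been evaluated by the caller.
package Sketch::Primitive::Cons;

sub new { return bless {}, shift }

sub _apply {
    my ($self, $car, $cdr) = @_;
    return Sketch::Pair->new($car, $cdr);
}

package main;

my $cell = Sketch::Primitive::Cons->new->_apply('a', 'b');
print $cell->first, ' . ', $cell->rest, "\n";    # a . b
```

Nothing in the sketch checks that the second argument is a list, which is precisely what lets cons build dotted pairs.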
The question arises as to why I’m then just calling value() on the result and passing an ordinary perl
list to the individual primitives, after I just said that it was worthwhile passing around the new linked
list structures. It’s just a personal choice, but I feel that the individual primitives should present an
“abstraction layer” such that anything below that layer is pure Perl and does not depend on the details
of the PScheme implementation above that layer. Anyway, that’s how I see it.
Special forms, on the other hand, are very much part of the PScheme implementation and do make
full use of the new linked lists.
First up is PScm::SpecialForm::Let. If you remember, let and friends make use of a shared
UnPack() method which used to return a list of two references to Perl lists, one for the symbols and one
for the values, plus the body of the let expression. Now it will return PScheme lists instead, and those
lists will be passed to the various Extend*() methods in the environment implementation, which will
also have to change.
Anyway, here’s the new PScm::SpecialForm::Let::Apply():
Only the variable names have changed: $symbols and $values used to be called $ra_symbols and
$ra_values to indicate that they were references to arrays. This is no longer the case.
The UnPack() still just calls value() on the $form to get the bindings and the body, but now it
makes use of an additional unpack_bindings() method to build the new PScheme lists:
040 $values)
041 );
042 }
043 }
If it has reached the end of the bindings it creates a new null object and returns it twice. Otherwise it
calls itself on the rest of the bindings, collects the results, then prepends the symbol from the current
binding onto the first list and the value from the current binding onto the second list. Finally it returns
those two new lists. Essentially it recurses to the end of the bindings then builds a pair of lists on the
way back up. This guarantees that the symbols and values are in the same order in the results as they
were in the bindings.
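The recursion just described can be sketched as follows. The sketch uses array-ref cells and plain functions rather than the PScm classes, and every name in it (cons, list, flat, and this unpack_bindings itself) is a stand-in chosen for the illustration.

```perl
use strict;
use warnings;

my $NIL = [];    # stand-in for the null object
sub cons    { my ($car, $cdr) = @_; return [$car, $cdr] }
sub is_null { my ($x) = @_; return ref($x) eq 'ARRAY' && $x == $NIL }
sub list    { my $l = $NIL; $l = cons($_, $l) for reverse @_; return $l }

# Split ((sym val) ...) into a list of symbols and a list of values.
sub unpack_bindings {
    my ($bindings) = @_;

    # End of the bindings: start both result lists off as empty.
    return ($NIL, $NIL) if is_null($bindings);

    # Recurse first, so the results are built on the way back up.
    my ($symbols, $values) = unpack_bindings($bindings->[1]);
    my $binding = $bindings->[0];    # the current (sym val) binding

    return (
        cons($binding->[0],    $symbols),    # car of the binding
        cons($binding->[1][0], $values),     # cadr of the binding
    );
}

# Flatten a proper list back to a Perl list, for display only.
sub flat { my ($l) = @_; return is_null($l) ? () : ($l->[0], flat($l->[1])) }

my ($symbols, $values) = unpack_bindings(list(list('a', 1), list('b', 2)));
print "@{[ flat($symbols) ]} / @{[ flat($values) ]}\n";    # a b / 1 2
```

Because each binding is consed onto results that already hold the later bindings, no reversal step is needed to keep the original order.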
Given the new UnPack() method, the Apply() methods for PScm::SpecialForm::LetRec:
and PScm::SpecialForm::LetStar:
Because it extracts the condition, true and false branches from the $form by using combinations of
first() and rest() rather than just using value(), and because the first() and rest() of the empty
list is the empty list, and because the empty list evaluates to itself, our new if no longer requires a third
argument. If the test fails and a false branch is not supplied the result will be the empty list.
PScm::SpecialForm::Lambda::Apply() is unchanged, but the closure that it constructs will make
use of the new lists when it interacts with the changed environment implementation.
The only remaining special form is the new PScm::SpecialForm::Quote, but we’ve seen that
already.
The shared _apply() method receives a PScheme list of actual arguments rather than a reference to
an array and it passes that, plus its PScheme list of formal arguments directly to the environment’s
ExtendUnevaluated() method:
Variable names have changed to reflect the fact that they are no longer references to arrays, and Extend()
uses map_eval() to evaluate the list of values before passing them to ExtendUnevaluated().
ExtendUnevaluated() is similarly changed:
It uses a new private _populate_bindings() method to populate a hash of bindings from the $symbols
and $values lists. After that it does what it always did, creating a new PScm::Env and setting its
parent field to $self before returning it.
The private _populate_bindings() method actually does the “parsing” of the argument list of
symbols and their binding to equivalent values.
On Line 33 it checks to see if it has reached the end of the list of symbols. If it has, then it throws an
error if it has not also reached the end of the list of values (too many arguments).
If $symbols is not null, then on Line 35 it checks to see if it is a pair. If it is, then it gets the current
symbol, checks that it is a symbol, binds it to the equivalent value, and recurses on the rest of the two
lists.
If $symbols is not a pair, then on Line 38 it checks to see if it is itself a symbol. This would correspond
to either a single args in a lambda statement like (lambda args ...), or a dotted pair terminating a
list of arguments. In either case if $symbols is a symbol, it binds it to the entirety of the rest of the
$values list and terminates the recursion.
If $symbols is none of the empty list, a pair or a symbol then something is seriously wrong and
_populate_bindings() throws an exception.
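The four cases can be sketched in Perl like this. It is a simplified model rather than the PScm method: symbols are plain strings, cells are array refs, the function names are stand-ins, and the "not enough arguments" check is an extra added for this sketch that the text above does not describe.

```perl
use strict;
use warnings;

my $NIL = [];    # stand-in for the empty list
sub cons    { my ($car, $cdr) = @_; return [$car, $cdr] }
sub is_null { my ($x) = @_; return ref($x) eq 'ARRAY' && $x == $NIL }
sub is_pair { my ($x) = @_; return ref($x) eq 'ARRAY' && $x != $NIL }
sub list    { my $l = $NIL; $l = cons($_, $l) for reverse @_; return $l }

# Symbols are modelled as plain strings in this sketch.
sub populate_bindings {
    my ($bindings, $symbols, $values) = @_;

    if (is_null($symbols)) {
        # No formals left, so there must be no actuals left either.
        die "too many arguments\n" unless is_null($values);
    }
    elsif (is_pair($symbols)) {
        # Extra check, specific to this sketch.
        die "not enough arguments\n" unless is_pair($values);
        my $symbol = $symbols->[0];
        die "bad formal argument\n" if ref($symbol);
        $bindings->{$symbol} = $values->[0];
        populate_bindings($bindings, $symbols->[1], $values->[1]);
    }
    elsif (!ref($symbols)) {
        # A lone symbol, or the symbol after the dot, takes all that is left.
        $bindings->{$symbols} = $values;
    }
    else {
        die "bad formal arguments\n";
    }
    return $bindings;
}

# Formals (a b . c) applied to the actual arguments (1 2 3 4 5):
# a and b get 1 and 2, and c gets the list (3 4 5).
my $formals  = cons('a', cons('b', 'c'));
my $bindings = populate_bindings({}, $formals, list(1 .. 5));
```

Passing a bare symbol such as 'args' as the formals goes straight to the third case, so (lambda args ...) needs no special handling.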
Next up is ExtendRecursively(). It is unchanged except that, as in other cases, its arguments have
been renamed because they are no longer array references:
Finally, ExtendIteratively() benefits from the new lists too. It returns $self if the list of symbols is
empty, otherwise it calls Extend() on the first of each list then calls itself on the extended result with
the rest of the list:
Note that ExtendIteratively() is only used by let*, and so does not need to deal with non-lists.
It may look very different but all that has really happened is that the old looping code which collects a
list has been moved out into a separate method, and that method, read_list(), is now recursive instead
of iterative:
read_list() collects its components using another new method read_list_element(). On Line 33 if
it detects a closing brace it returns the empty list. If the token is not a closing brace, then on Line 35
if the token is a dot, it reads the next element, checks that it is an expression, checks that the element
after that is a closing brace, and returns the element just read as the entire list. In the final case, if the
token is neither a closing brace nor a dot, it uses Cons() to construct a list with the current token as the
first element and the result of calling read_list() as the rest. Hopefully you can convince yourself that
this will deal correctly with both ordinary lists and dotted pair notation.
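If you would like to check that reasoning by experiment, the following sketch implements the same three cases over a simple queue of string tokens. It is a simplification under assumed names: there are no token classes, read_expr() and this read_list() are stand-ins, and cells are again plain array refs.

```perl
use strict;
use warnings;

my $NIL = [];    # stand-in for the empty list
sub cons { my ($car, $cdr) = @_; return [$car, $cdr] }

# Read one expression from a queue of tokens (an array ref of strings).
sub read_expr {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "unexpected end of input\n" unless defined $token;
    return read_list($tokens) if $token eq '(';
    die "unexpected '$token'\n" if $token eq ')' || $token eq '.';
    return $token;
}

# Called just after an opening brace, or after any element of the list.
sub read_list {
    my ($tokens) = @_;
    die "unterminated list\n" unless @$tokens;

    if ($tokens->[0] eq ')') {
        shift @$tokens;
        return $NIL;    # closing brace: the rest is the empty list
    }
    if ($tokens->[0] eq '.') {
        shift @$tokens;
        my $last  = read_expr($tokens);    # the expression after the dot
        my $close = shift @$tokens;
        die "expected ')' after dotted tail\n"
            unless defined $close && $close eq ')';
        return $last;    # it becomes the cdr just as it stands
    }
    my $first = read_expr($tokens);
    return cons($first, read_list($tokens));
}

# Reads (1 2 . 3): the result is the same as cons(1, cons(2, 3)).
my $expr = read_expr([qw/( 1 2 . 3 )/]);
```

Feeding it the tokens for ( . a) returns just the symbol a, since the dotted tail is returned before any cell has been built.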
read_list_element() just centralises the otherwise tedious and repetitive checking for eof which is
a syntax error (unterminated list) while collecting list elements:
Finally, _next_token() has an extra clause to detect a standalone period (dot) and if it does it returns
an instance of a new token class PScm::Token::Dot:
8.6 Summary
In interpreter 0.0.5 five related operations for creating and manipulating list structures have been added.
We’ll put those to good use in the next version of the interpreter when we look at macros. In the process
of implementing one of those operations, cons, the basic list implementation was changed to be closer to
a “standard” scheme implementation, and out of that we won the ability to construct dotted pairs, and
from that we got variable arguments to lambda expressions.
Just for the sake of completeness, here are the changes to our top-level PScm::ReadEvalPrint() method,
where we add the new bindings for these functions in the initial environment.
8.7 Tests
Tests for version 0.0.5 of the interpreter are in t/PScm_List.t and t/PScm_dot.t, which you can see in
Listing 8.8.1 on the next page and Listing 8.8.2 on page 106.
The tests in t/PScm_List.t exercise our new list functions. The first test shows that list evaluates
its arguments then returns a list of them. The second test proves that the car of the list (1 2 3) is
the value 1. The third test proves that the cdr of the list (1 2 3) is the list (2 3). The fourth test
proves that (cons 1 (list 2 3)) is (1 2 3). The fifth test proves that quote protects its argument
from evaluation: (quote (list 1 2 3)) is (list 1 2 3). Lastly, three tests verify that dot notation
on input and output behaves as expected.
The tests in Listing 8.8.2 on page 106 verify the new variable arguments to lambda expressions, much as
our previous examples demonstrated.
8.8 Listings
8.8.1 t/PScm_List.t
use strict;
use warnings;
use Test::More;
use lib 't/lib';
use PScm::Test tests => 9;

BEGIN { use_ok('PScm') }

eval_ok(<<EOF, '(1 2 3)', 'list primitive');
(let ((a 1)
      (b 2)
      (c 3))
  (list a b c))
EOF

eval_ok(<<EOF, '1', 'car primitive');
(let ((a (list 1 2 3)))
  (car a))
EOF

eval_ok(<<EOF, '(2 3)', 'cdr primitive');
(let ((a (list 1 2 3)))
  (cdr a))
EOF

eval_ok(<<EOF, '(1 2 3)', 'cons primitive');
(cons 1 (list 2 3))
EOF

eval_ok('(quote (list 1 2 3))', '(list 1 2 3)',
        'quote special form');

eval_ok('(quote (1 2 . 3))', '(1 2 . 3)', 'dot in');

eval_ok('(quote (1 2 . (3)))', '(1 2 3)', 'dot out');

eval_ok('(quote (1 . (2 . (3 .()))))',
        '(1 2 3)', 'complex dot out');

# vim: ft=perl
Macros
What is a macro? People familiar with the C programming language will probably think of macros as
being purely a textual substitution mechanism done in some sort of preprocessing step before the compiler
proper gets to look at the code. However that’s a somewhat limited perspective, perfectly adequate for
languages like C but constraining from our point of view. A better definition of a macro is any sort of
substitution or replacement that can happen before the final code is executed.
The real importance of macros is their potential to allow syntactic extensions to their language. In
the case of PScheme, each special form is a syntactic extension to the language, and so our working
definition of a PScheme macro could be something that allows us to define our own special forms within
the language itself. Here’s an example. Suppose the language lacked the let special form. As was
mentioned in Chapter 5 on page 59, let shares a good deal in common with lambda. In fact any let
expression, say
The body of the let is the same as the body of the lambda, and the bindings of the let are split between
the formal and actual arguments to the lambda expression. In general any let expression:
Of course internally let doesn’t make use of closures, but in the case of the lambda equivalent to let,
the lambda expression is evaluated immediately in the same environment as it was defined, so closure is
immaterial. All that our purported let macro need do then, is to rewrite its arguments into an equivalent
lambda form and have that executed in its place. We developed all of the list manipulation tools we will
need to do that in the 0.0.5 version of the interpreter from Chapter 8 on page 87 (remember that code
and data are the same list objects so list functions can operate on both). All we need to do now is to
think of a way to allow us to define macros.
Macros will obviously share a great deal in common with functions. They will have a separate
declaration and use. They will also take arguments, and have a body that is evaluated in some way. In
fact the first part of their implementation, that of parsing their declaration will be virtually identical to
that of lambda expressions, except that the lambda keyword is already taken. We’ll use “macro” in its
place.
9.1 macro
As before then, we subclass PScm::SpecialForm and give the new class an Apply() method. The new
class is called PScm::SpecialForm::Macro after its eponymous symbol. Here’s the Apply() method
for PScm::SpecialForm::Macro.
And here again is the private _apply() method in the base PScm::Closure class:
9.2 Evaluating Macros
Any implementation of macros will share something in common with this implementation of functions,
but there will be differences. Obviously a macro should be passed its arguments unevaluated. That way
it can perform whatever (list) operations it likes on that structure. Then when it returns a new form, it
is that form that gets evaluated.
In fact it’s as simple as that, and here’s the Apply() method for PScm::Closure::Macro:
(let* ((mylet
        (macro (bindings body)
          (let* ((names (cars bindings))
                 (values (cadrs bindings)))
            (cons (list (quote lambda) names body) values)))))
  (mylet ((a 1)
          (b 2))
    (list a b)))
This code uses let* (remember we’re pretending that we don’t have let) to bind mylet to a macro
definition, then it uses mylet in the body of the let*. It makes use of some supporting functions that
we’ll define presently, but first let’s try to get a feel for what it is doing. As stated above, the symbol
macro introduces a macro definition. The arguments to mylet will be the same as those to let, namely
a list of bindings and a body to execute with those bindings in place. It has to separate the bindings
(symbol-value pairs) into two lists, one of the symbols and one of the values. It might be useful in the
following discussion to refer to Figure 9.1 which shows the internal structure of the mylet form that we’ll
be rearranging.
[Figure 9.1: structure of the mylet form: the op mylet, the bindings (a 1) and (b 2), and the body (list a b)]
The mylet macro uses a function cars to extract the car of each binding (the symbol) into the list called
names.
Here’s the definition of cars:
(letrec (...
         (cars
          (lambda (lst)
            (map car lst)))
         ...)
  ...)
It uses another yet to be defined function map, which does the same as Perl’s built-in map: it applies
a function to each element of a list and returns a new list of the results [1]. map is surprisingly easy to
implement in PScheme:
(letrec ((map
          (lambda (op lst)
            (if lst
                (cons (op (car lst))
                      (map op (cdr lst)))
                ())))
         ...)
[1] Perl actually borrows its map function from Lisp, which has had one for many years.
It’s a recursive function, hence the need for letrec to bind it. Passed a function and list of zero or
more bindings, if the list is empty it returns the empty list, otherwise it cons-es the result of calling the
function on the car of the list onto the result of calling itself on the rest (cdr) of the list. So for
example if lst is ((a 1) (b 2)), then (map car lst) would return the list (a b), and that is exactly
what the cars function does.
cadrs [2] is very similar. It walks the list collecting the second component of each sublist (the values
of the bindings). So for example given the list ((a 1) (b 2)), cadrs will return the list (1 2).
(letrec (...
         (cadrs
          (lambda (lst)
            (map (lambda (x) (car (cdr x))) lst)))
         ...)
  ...)
Again it makes use of map, this time passing it an anonymous function that will take the car of the cdr
of its argument. This is very much in the style of real Scheme programming now: constructing lambda
expressions on the fly and passing them to other functions as arguments. I hope you are acquiring a taste
for it. Anyway, here’s the whole mylet definition plus some code that calls it.
(let* ((mylet
        (letrec ((map
                  (lambda (op lst)
                    (if lst
                        (cons (op (car lst))
                              (map op (cdr lst)))
                        ())))
                 (cars
                  (lambda (lst)
                    (map car lst)))
                 (cadrs
                  (lambda (lst)
                    (map (lambda (x) (car (cdr x))) lst))))
          (macro (bindings body)
            (let* ((names (cars bindings))
                   (values (cadrs bindings)))
              (cons (list (quote lambda) names body)
                    values))))))
  (mylet ((a 1)
          (b 2))
    (list a b)))
[2] The term cadr is a contraction of “car of the cdr”, e.g. (cadr x) == (car (cdr x)). This sort of contraction is often
seen in Scheme code, sometimes nested as much as four or five levels deep, i.e. cadadr.
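After collecting the names into one list and the values into another, the mylet macro builds a form of
the shape ((lambda <names> <body>) <values>), where <names>, <body> and <values> are expanded using the
appropriate magic. For the call above the result is ((lambda (a b) (list a b)) 1 2).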
A point worth noting is that the constructed mylet macro is a true closure, since it has captured the
definitions of the cars and cadrs functions and executes in an outer environment (the let*) where those
functions are not visible.
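For readers more at home in Perl, the same rearrangement can be sketched on nested array refs. This is only an analogy under assumed names (expand_mylet and show are invented here, and Perl arrays stand in for PScheme lists); it is not how the interpreter itself expands macros.

```perl
use strict;
use warnings;

# Forms are nested Perl array refs here, e.g. ['list', 'a', 'b'].
sub expand_mylet {
    my ($bindings, $body) = @_;
    my @names  = map { $_->[0] } @$bindings;    # what cars collects
    my @values = map { $_->[1] } @$bindings;    # what cadrs collects

    # Build the lambda expression, then follow it with the values,
    # which is what the cons in the macro body achieves.
    return [ [ 'lambda', \@names, $body ], @values ];
}

# Print a nested form in list notation.
sub show {
    my ($form) = @_;
    return $form unless ref($form);
    return '(' . join(' ', map { show($_) } @$form) . ')';
}

print show(expand_mylet([ [ 'a', 1 ], [ 'b', 2 ] ], [ 'list', 'a', 'b' ])), "\n";
# ((lambda (a b) (list a b)) 1 2)
```

The two map calls play the roles of cars and cadrs, and the returned array is the new form that would then be evaluated in place of the original mylet expression.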
9.2.2 An Improvement
The macro substitution system demonstrated so far is pretty crude. After all, it requires the programmer
to directly manipulate low-level list structures, rather than just supplying an “example” of how the
transformation is to be performed. In fact the topic of macro expansion as provided by a full Scheme
implementation is deserving of a book to itself. Apart from the templating ability, there are also issues
of avoiding variable collision (so-called hygienic macros) so that full Scheme macros are much closer to
the idea of C++’s inline functions than they are to C’s #define [3].
However there is one simple addition that we can make, which will greatly improve the usefulness
of macros, and that involves an extension to the quote special form that we introduced in Section 8.1
on page 87. If you remember quote just returns its argument, preventing unwanted evaluation. This
already has proved useful in the construction of macros, as we have seen above.
Now one perfect use of quote would be to provide templates for macros, if we could arrange that parts
of the quoted template could be substituted before the quoted template is returned. To that purpose
we introduce a keyword unquote which marks a section of a quoted form for evaluation. Perhaps an
example might make this clear:
(extend-syntax (mylet)
(mylet ((var val) ...) body)
((lambda (var ...) body) val ...))
The let bindings bind x to the string "rain" etc. That is not the important part. The important part is
the body of the let where the use of the unquote keyword allows evaluation of the contained expressions
(x etc.) despite their being inside a quote.
How can this help us with macro definitions? Well, in a big way! Consider this macro definition of a
while loop:
(define while
  (macro (test body)
    (quote (letrec
             ((loop
               (lambda ()
                 (if (unquote test)
                     (begin
                       (unquote body)
                       (loop))
                     ()))))
             (loop)))))
It uses a few features that aren’t available yet, like define and begin (which just executes one expression
after another), and it would seem to be in danger of running out of stack, but I hope you can see that
essentially the quote and unquote are doing all of the work building the body of the macro. The quoted
result is shown in bold, with the internal substitutions unbolded again.
Implementing unquote is easy, but it’s a little different from the normal special forms and primitives
we’ve seen up to now. I’ve been careful to only refer to it as a “keyword”, because it means nothing
special outside of a quoted expression.
We’ll obviously have to change the way quote works to make this happen, so let’s start by looking at
the changed PScm::SpecialForm::Quote::Apply().
Rather than just returning its first argument, it now calls a new method Quote() on it, passing Quote()
the current environment. Quote() essentially just makes a copy of the expressions concerned, but it keeps
an eye out for unquote symbols. Now this method will be implemented in the PScm::Expr classes as
follows:
The default Quote() in PScm::Expr just returns $self:
On Line 135 it checks to see if the first element of the list is the symbol unquote (is_unquote). If it is
then it evaluates the second element in the current environment and returns it. If the first element is
not unquote then it hands over control to a helper routine quote_rest().
Here’s quote_rest().
It just walks the list, recursively, constructing a copy as it goes by calling Quote() on each element and
calling Cons() on the quoted subexpression and the result of the recursive call [4].
The PScm::Expr::List::Null package inherits Quote() from PScm::Expr, which just returns
$self, and PScm::Expr also has a quote_rest() method which also just returns $self and usefully
terminates the recursion of the non-empty PScm::Expr::List quote_rest() method.
That just leaves that is_unquote() method. Well, since only a symbol could possibly be unquote, we
can put a default is_unquote() method at the top of the expression type hierarchy, in PScm::Expr,
which just returns false:
Then for PScm::Expr::Symbol only, we override that with a method that checks to see if its value()
is the string "unquote":
That completes our re-implementation of quote to allow the recognition of the unquote keyword, but
we’re not quite done yet.
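The essence of the new quote can be sketched in Perl on nested array refs. This is a deliberately reduced model with invented names (quote_form and evaluate): "evaluation" here is just a lookup of a symbol in a hash, which is enough to show where unquote switches evaluation back on inside a quoted form.

```perl
use strict;
use warnings;

# Forms are nested array refs; symbols are plain strings.
sub quote_form {
    my ($form, $env) = @_;

    # Atoms quote to themselves.
    return $form unless ref($form);

    # (unquote x) is replaced by the value of x in the environment.
    if (@$form && !ref($form->[0]) && $form->[0] eq 'unquote') {
        return evaluate($form->[1], $env);
    }

    # Otherwise build a copy, quoting each element in turn.
    return [ map { quote_form($_, $env) } @$form ];
}

# Stand-in for Eval(): only symbol lookup is supported in this sketch.
sub evaluate {
    my ($expr, $env) = @_;
    die "no binding for $expr\n" unless exists $env->{$expr};
    return $env->{$expr};
}

my %env      = (x => 'rain', y => 'spain', z => 'plain');
my $template = [
    'the', [ 'unquote', 'x' ], 'in', [ 'unquote', 'y' ],
    'falls', 'mainly', 'on', 'the', [ 'unquote', 'z' ],
];
my $result = quote_form($template, \%env);
print "(@$result)\n";    # (the rain in spain falls mainly on the plain)
```

As in the real implementation, the template is copied rather than modified, so the same quoted form can be expanded again in a different environment.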
quote and unquote turn out to be so useful in the definition of macros that PScheme provides
shorthand syntactic sugar for these forms. The construct '<expression> (note the single quote) gets
expanded to (quote <expression>), and similarly the construct ,<expression> with a leading comma
gets expanded to (unquote <expression>). This is fairly easy to do, so let’s see what changes we need
to make to the reader to make this happen.
First, here are the changes to PScm::Read::_next_token().
[4] Note the similarity between this method and the definition of map in PScheme above.
The change is very simple. You can see that on Lines 78–79 if it strips a leading quote or comma, it
returns an equivalent token object. Those new token types are both in PScm/Token.pm; here’s
PScm::Token::Quote.
It inherits from PScm::Token and a default is_quote_token() there returns false.
The equivalent PScm::Token::Unquote deliberately inherits from PScm::Token::Quote rather
than just PScm::Token so it gets the overridden is_quote_token() method, and supplies an additional
is_unquote_token() method returning true. Again a default is_unquote_token() in PScm::Token
returns false.
sub is_unquote_token { 1 }
The upshot of this is that both PScm::Token::Quote and PScm::Token::Unquote return true for
is_quote_token(), but only PScm::Token::Unquote returns true for is_unquote_token(). Finally,
let’s see how the reader PScm::Read::Read() makes use of these new token objects.
The additional code on Lines 23–33 checks to see if the token is a quote or unquote token, and if so
reads the next expression, checks that it is valid and returns a new PScm::Expr::List containing the
appropriate quote or unquote symbol and the expression read afterwards. The is_expr() method is
defined to be true in PScm::Expr and false in PScm::Token, and its use here stops dubious constructs
like “’)”.
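The effect of those reader changes can be sketched with a toy reader over string tokens. Everything here is an assumption made for illustration (the %SUGAR table, this read_expr(), and the use of Perl arrays for lists); the real reader works with token objects and PScm::Expr::List.

```perl
use strict;
use warnings;

# Map each shorthand token to the symbol it expands to.
my %SUGAR = ("'" => 'quote', ',' => 'unquote');

# Read one expression from a queue of tokens (an array ref of strings).
sub read_expr {
    my ($tokens) = @_;
    my $token = shift @$tokens;
    die "unexpected end of input\n" unless defined $token;

    if (exists $SUGAR{$token}) {
        # 'x reads as (quote x) and ,x reads as (unquote x). The recursive
        # call dies on a stray closing brace, rejecting input like ')
        return [ $SUGAR{$token}, read_expr($tokens) ];
    }
    if ($token eq '(') {
        my @items;
        while (1) {
            die "unterminated list\n" unless @$tokens;
            last if $tokens->[0] eq ')';
            push @items, read_expr($tokens);
        }
        shift @$tokens;    # discard the closing brace
        return \@items;
    }
    die "unexpected ')'\n" if $token eq ')';
    return $token;
}

# The tokens of '(a ,x) read as (quote (a (unquote x)))
my $form = read_expr([ "'", '(', 'a', ',', 'x', ')' ]);
```

Since the expansion happens in the reader, nothing downstream ever sees the shorthand: quote and macro only have to deal with ordinary lists.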
So we now have a convenient shorthand for quote and unquote. To demonstrate it in action, here’s
that while macro again, this time using the new tokens.
(define while
  (macro (test body)
    '(letrec
       ((loop
         (lambda ()
           (if ,test
               (begin
                 ,body
                 (loop))
               ()))))
       (loop))))
We’ll be making significant use of macro, quote and unquote in subsequent chapters, so it’s worth
familiarizing yourself now with this new idiom [5].
The quote stopped the first round of evaluation, but eval got another try at it. Here’s another example:
eval is quite simple. It is a special form because it needs access to an environment in which to perform
the evaluation (remember primitives have their arguments evaluated for them and so don’t need an
environment.) It evaluates its first argument in the current environment (special forms don’t have their
arguments evaluated for them,) then it evaluates the result a second time, this time in the top-level
environment. Here’s PScm::SpecialForm::Eval:
You can see that the second round of evaluation is done in the context of the top-level environment
obtained by calling a new method top() on the current environment. That top() method is also very
simple:
[5] The quote and unquote described here are done differently in true Scheme. A true Scheme implementation distinguishes
between a simple quote which does not recognize unquote, and an alternative quasiquote which does. This means quote is
as efficient as our original implementation, but we still have access to an unquote mechanism. The quote form still has the
“'” syntactic sugar, and quasiquote uses the alternative “`” (backtick) shorthand. Additionally a full Scheme provides an
unquote-splicing (“,@”) which expects a list and splices it into the existing form at that point.
It just checks to see if it has a parent, calling top() recursively on that if it has, and returning itself if
it hasn’t.
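A minimal sketch of such a method, using an invented Sketch::Env class with a hash-based object and a parent field (an assumption about the general layout, not the PScm::Env source), might look like this:

```perl
use strict;
use warnings;

package Sketch::Env;

sub new {
    my ($class, %bindings) = @_;
    return bless { bindings => {%bindings}, parent => undef }, $class;
}

# Create a child environment whose parent is the invocant.
sub extend {
    my ($self, %bindings) = @_;
    my $child = ref($self)->new(%bindings);
    $child->{parent} = $self;
    return $child;
}

# Walk the parent chain until an environment with no parent is found.
sub top {
    my ($self) = @_;
    return $self->{parent} ? $self->{parent}->top : $self;
}

package main;

my $global = Sketch::Env->new(x => 1);
my $inner  = $global->extend(y => 2)->extend(z => 3);
print $inner->top == $global ? "same\n" : "different\n";    # same
```

However deeply the environment is nested, top() always arrives at the one environment that has no parent, which here is $global.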
One thing to watch out for with eval: the code that is evaluated is not a closure. Any variables in that
code will be looked up in the top-level environment, not the one where the expression was constructed,
nor the one that is current when eval is called. For example:
Nonetheless eval is a useful tool in your kit, and we’ll see it in action in later chapters.
9.3 Summary
Here are the additions to ReadEvalPrint() which bind our new macro feature and eval in the initial
environment. The quote binding was already there, and as shown above, unquote is only a keyword and
does not need a binding:
9.4 Tests
The tests for macro and unquote are in Listing 9.5.1 on the next page.
The first test just implements and tests the mylet example we worked through in the text, and the
second test shows unquote in action with a variation on another example we’ve already seen. The third
test exercises the syntax extensions in the reader, and the fourth test demonstrates that macros, like
closures, produce a textual representation of themselves when printed.
The tests for eval are in Listing 9.5.2 on page 122. This just does a simple evaluation of a quoted
form.
9.5 Listings
9.5.1 t/PScm_Macro.t
use strict;
use warnings;
use Test::More;
use lib 't/lib';
use PScm::Test tests => 5;

BEGIN { use_ok('PScm') }

eval_ok(<<EOF, '(1 2)', 'macros');
(let* ((mylet
        (letrec ((map
                  (lambda (op lst)
                    (if lst
                        (cons (op (car lst))
                              (map op (cdr lst)))
                        ())))
                 (cars
                  (lambda (lst)
                    (map car lst)))
                 (cadrs
                  (lambda (lst)
                    (map (lambda (x) (car (cdr x))) lst))))
          (macro (bindings body)
            (let* ((names (cars bindings))
                   (values (cadrs bindings)))
              (cons (list (quote lambda) names body)
                    values))))))
  (mylet ((a 1)
          (b 2))
    (list a b)))
EOF

eval_ok(<<EOF, <<EOR, 'unquote');
(let ((x (quote rain))
      (y (quote spain))
      (z (quote plain)))
  (quote (the (unquote x)
          in (unquote y)
          falls mainly on the
          (unquote z))))
EOF
(the rain in spain falls mainly on the plain)
EOR

eval_ok(<<EOF, <<EOR, 'quote and unquote syntactic sugar');
(let ((x 'rain)
      (y 'spain)
      (z 'plain))
  '(the ,x
    in ,y
    falls mainly on the
    ,z))
EOF
(the rain in spain falls mainly on the plain)
EOR

eval_ok(<<EOF, <<EOR, 'macro to string');
(macro (x)
  '(a ,x))
EOF
(macro (x) (quote (a (unquote x))))
EOR

# vim: ft=perl
10 Side Effects
The question arises as to why the implementation of the define special form, described back in Section 2.4
on page 10, has been deferred for so long, when it would have made so much of the previous discussion
easier, particularly the Scheme examples. Well, there are good reasons. Consider what the language so
far has got.
...
(let ((a (func1 x))
(b (func2 y))
(c (func3 z)))
(func4 (func5 a) (func6 b) (func7 c)))
...
The let could elect to send the subexpression (func1 x) (plus the environment where func1 and x
have a value) to one process, and the subexpression (func2 y) to another. While those two expressions
are being evaluated the let could get on with evaluating (func3 z) itself. Then when it had finished
that evaluation it would collect the results of the other two evaluations and proceed to evaluate its body.
Similarly the evaluation of the body (the call to func4) could outsource the evaluation of the arguments
(func5 a) and (func6 b) and get on with evaluating (func7 c), collecting the other results when it
finished, and then proceeding to evaluate the func4 call with those evaluated arguments. It probably
shouldn’t outsource something as simple as variable lookup.
The implementation of this networked programming language is left to you. Some sort of central
ticketing server would be needed, with a queue where requestors could post their requests in exchange
for a ticket, and a pool where clients could post their results so that they need not wait for the requestor
to get back to them, and there’d have to be some way of telling the server that a particular ticket depends
on other tickets, so that a client would never block waiting for the server to return an outstanding ticket...
quite an interesting project. The real point though is that both let and lambda not only conceptually
evaluate and bind their arguments in parallel, they could actually do so without disturbing the sense of
the program.
Beyond this point that kind of application becomes nearly impossible because we are about to introduce
side effects, in particular variable assignment and, to make that useful, sequences of expressions.
That’s just the proper name for something we’re all very familiar with—one statement following another.
In fact, the primary difference between a statement and an expression is that a statement is executed
purely for its side effects. There are only a couple of different side effects that we need to consider.
The first is variable assignment, that is to say the changing of the existing value of a variable binding.
The second is definition, the creation of new entries in the current environment. Note that this is very
different from what the various let special forms do. They always create a new environment; they never
modify an existing one. Even letrec, which may appear to be modifying an existing environment by
changing bindings, is not really doing so: nothing actually gets to use that environment until after letrec
has finished building it, so the creation of that new environment is atomic as far as the PScheme language
is concerned.
Just to reiterate before we move on. In a functional language it makes no difference in what order
the arguments to a function are evaluated, but in a language with side-effects, if those arguments cause
side-effects during their evaluation, then the order of evaluation is significant and must be taken into
account when designing a program.
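A minimal Perl sketch of that point, under stated assumptions: the helpers eval_args and make_args are hypothetical and are not part of PScheme. Two argument thunks share one counter, so the values they produce depend on which one is run first.

```perl
use strict;
use warnings;

# Hypothetical helper: run the argument thunks in the given order and
# return the results in their original argument positions.
sub eval_args {
    my ($thunks, $order) = @_;
    my @results;
    $results[$_] = $thunks->[$_]->() for @$order;
    return @results;
}

# Build two "arguments" whose evaluation mutates a shared counter.
sub make_args {
    my $counter = 0;
    return [
        sub { $counter += 1;  $counter },    # side effect: increment
        sub { $counter *= 10; $counter },    # side effect: multiply by ten
    ];
}

my @left_to_right = eval_args(make_args(), [0, 1]);
my @right_to_left = eval_args(make_args(), [1, 0]);

print "left to right: @left_to_right\n";
print "right to left: @right_to_left\n";
```

Without the shared counter both orders would give the same answer, which is exactly the property the purely functional language had up to this point.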
(set! a 10)
Error: no binding for a in PScm::Env
This may sound unnecessarily picky, but for variable assignment to work, there must already be a binding
in place that the assignment can affect. The reasoning is that the variable being assigned to need not be
in the current environment frame: it must be searched for, and if we allowed set! to create new variables
then they would probably be installed in the current top frame. So the scope of a variable that was set!
would depend on whether the variable already existed or not, which is inconsistent at best.
¹ The exclamation point is used to suffix all such side-effecting operations, with the exception of define, for some reason.
Anyway, this works:
This is a bit contrived: we use a dummy variable to allow the set! expression to be evaluated before
the body of the let* returns the new value of a. However it should be clear from the example that the
set! did take effect.
So how do we go about implementing set!? Well, as luck would have it, we already have a method
for changing existing bindings in an environment: the Assign() method that was developed
for the letrec special form in a previous incarnation. It has precisely the right semantics too (what a
coincidence!). It searches through the chain of environment frames until it finds an appropriate binding,
assigning the value if it does and die()-ing if it doesn’t. Here it is again:
So all we have to do is wire it up. The process of creating new special forms should be familiar by now.
Firstly we subclass PScm::SpecialForm and give our new class an Apply() method. The new class
in this case is PScm::SpecialForm::Set. Its Apply() method as usual takes a form and the current
environment as arguments. In this case the form should contain a symbol and a value expression. This
Apply() will evaluate the expression and call the environment’s Assign() method with the symbol and
resulting value as arguments:
And that’s all there is to it, apart from adding a PScm::SpecialForm::Set object to the initial
environment, bound to the symbol set!.
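The behaviour described for Assign() can be sketched in a few lines of stand-alone Perl. This is only an illustration under assumptions: the package Sketch::Env, its hash layout, its LookUp() helper and the value returned by Assign() are all hypothetical and are not taken from PScm::Env.

```perl
use strict;
use warnings;

package Sketch::Env;

# Create a frame with its own bindings and an optional enclosing frame.
sub new {
    my ($class, $bindings, $parent) = @_;
    return bless { bindings => { %{ $bindings || {} } }, parent => $parent }, $class;
}

# Return the value bound to $name in the nearest frame that has it.
sub LookUp {
    my ($self, $name) = @_;
    for (my $frame = $self; $frame; $frame = $frame->{parent}) {
        return $frame->{bindings}{$name} if exists $frame->{bindings}{$name};
    }
    die "no binding for $name in " . ref($self) . "\n";
}

# Change an existing binding in the nearest frame that has it.
# A new binding is never created: an unknown name is an error.
sub Assign {
    my ($self, $name, $value) = @_;
    for (my $frame = $self; $frame; $frame = $frame->{parent}) {
        next unless exists $frame->{bindings}{$name};
        $frame->{bindings}{$name} = $value;
        return $value;
    }
    die "no binding for $name in " . ref($self) . "\n";
}

package main;

my $outer = Sketch::Env->new({ a => 1 });
my $inner = Sketch::Env->new({ dummy => 0 }, $outer);

$inner->Assign(a => 10);                        # found in the outer frame
print "a is now ", $outer->LookUp('a'), "\n";

my $ok = eval { $inner->Assign(b => 2); 1 };    # b was never bound
print "assigning to b failed: $@" unless $ok;
```

Because the search never falls back to creating a binding, the scope of an assigned variable does not depend on whether it happened to exist already.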
10.3 Sequences
Sequences are another fairly trivial addition to the language. Rather than a single expression, a sequence
contains zero or more expressions. Each expression is evaluated in order, and the value of the last
expression is the value of the sequence. The value of an empty sequence is null, the empty list. This is
such a common thing in many other languages (such as Perl) that it goes without notice, but thinking
about it, sequences only become relevant and useful in the presence of side effects. Since only the value
of the last expression is returned, preceding expressions can only affect the computation if they have side
effects.
The keyword introducing a sequence in PScm is begin. begin takes a list of zero or more expressions,
evaluates each one, and returns the value of the last expression, or null if there are no expressions to
evaluate. It functions something like a block in Perl, except that a Perl block also encloses a variable
scope like let does, but begin does not imply any scope.
With begin, we can write that rather awkward set! example from the previous section a lot more
clearly:
begin could, in fact, be implemented as a function, provided that functions are guaranteed to evaluate
their arguments in left-to-right order, as this implementation does. However it is safer not to make that
assumption (remember the networked language where evaluation was envisaged in parallel), so begin is
better implemented as a special form which can guarantee that left-to-right behaviour.
As might be imagined the code is quite trivial: evaluate each component of the form and return the
last value. We pick as an initial value the empty list, so that if the body of the begin is empty, that is
the result. As usual we subclass PScm::SpecialForm, in this case to PScm::SpecialForm::Begin,
and give the new class an Apply() method:
On Line 155 it extracts the expressions from the argument $form. Then on Line 157 it initialises the
return value $ret to an initial value. On Lines 159-161 it loops over each expression, evaluating it in the
current environment, and assigning the result to $ret, replacing the previous value. Lastly on Line 163
it returns the final value.
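The loop just described can be sketched outside the interpreter. In this hypothetical model each expression is a Perl code ref that takes an environment (a plain hash ref) and returns a value, and an empty array ref stands in for the empty list; begin_sequence is an illustrative name, not the real Apply() method.

```perl
use strict;
use warnings;

# Evaluate each expression in order, keeping only the most recent result.
sub begin_sequence {
    my ($env, @expressions) = @_;
    my $ret = [];                  # result of an empty sequence
    for my $expr (@expressions) {
        $ret = $expr->($env);      # each value replaces the previous one
    }
    return $ret;
}

my $env = { a => 1 };

my $value = begin_sequence(
    $env,
    sub { $_[0]{a} = 2 },          # like (set! a 2)
    sub { $_[0]{a} },              # like a
);
print "value of the sequence: $value\n";

my $empty = begin_sequence($env);
print "an empty sequence yields a list of ", scalar(@$empty), " items\n";
```

Only the side effect of the first expression survives; its value is discarded, which is why sequences are pointless in a language without side effects.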
10.4 Summary
In this section we’ve seen variable assignment added to the language, but also taken some time to consider
some drawbacks of that feature. We’ve also looked at how sequences become useful in the presence of
variable assignment. In fact sequences serve no purpose without side-effects and side-effects are difficult
to do without sequences.
As usual we need to add our new forms to the interpreter by binding them in the initial environment.
Here’s ReadEvalPrint() with the additional bindings.
058 }
059 }
It’s worth pointing out a major difference between PScheme and standard Scheme implementations here.
In standard Scheme, function bodies and the bodies of let expressions and their variants are all implicit
sequences. So in fact in a real Scheme implementation you could just write:
In PScheme, function and let bodies are single expressions and require an explicit begin to create a
sequence, because I wanted to keep the distinction between single expressions and sequences clear to the
reader.
10.5 Tests
A test of set! and begin can be seen in Listing 10.6.1 on the facing page. The test binds a to 1 in a
let, then in the body of the let it performs a begin which set!s a to 2 then returns the new value of
a.
10.6 Listings
10.6.1 t/PScm_SideEffects.t
use strict;
use warnings;
use Test::More;
use lib 't/lib';
use PScm::Test tests => 2;

BEGIN { use_ok('PScm') }

eval_ok(<<EOF, '2', 'begin and assignment');
(let ((a 1))
  (begin (set! a 2)
         a))
EOF

# vim: ft=perl
11 define
And so to define. define is another type of side effect. It differs from set! in that it is not an error
if the symbol does not exist, and it differs also in that the binding is always installed in the current
environment. Therefore executing define at the top level prompt will install the binding in the global
environment.
It takes a symbol and a value (already evaluated) as arguments. On Line 134 it directly adds the binding
from the symbol to the value, regardless of any previous value, and on Line 135 it returns the symbol
being defined, to give the print system something sensible to print.
All it does is extract the symbol and the expression from the argument $form on Line 173, then
on Line 174 call the Define() environment method described above with the symbol and evaluated
expression (value) as arguments.
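The contrast with assignment can be sketched as follows. The package Sketch::Frame and its layout are hypothetical; only the behaviour (install in the current frame, overwrite any previous value, return the symbol) is taken from the description above.

```perl
use strict;
use warnings;

package Sketch::Frame;

sub new {
    my ($class, $parent) = @_;
    return bless { bindings => {}, parent => $parent }, $class;
}

# Install the binding in this frame only, overwriting any previous value,
# and return the name so that a REPL has something sensible to print.
sub Define {
    my ($self, $name, $value) = @_;
    $self->{bindings}{$name} = $value;
    return $name;
}

# Search this frame, then each enclosing frame in turn.
sub LookUp {
    my ($self, $name) = @_;
    for (my $frame = $self; $frame; $frame = $frame->{parent}) {
        return $frame->{bindings}{$name} if exists $frame->{bindings}{$name};
    }
    die "no binding for $name\n";
}

package main;

my $global = Sketch::Frame->new;
my $local  = Sketch::Frame->new($global);

print $global->Define(x => 1), "\n";    # prints the name that was defined
$local->Define(x => 2);                 # shadows x without touching $global

print "local x: ",  $local->LookUp('x'),  "\n";
print "global x: ", $global->LookUp('x'), "\n";
```

Unlike the search performed for set!, nothing here walks the chain when writing, so a definition made in an inner frame can never alter an outer one.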
058 $result->Print($outfh);
059 }
060 }
Now we can actually write some of the earliest examples from this book in the language at hand.
The factorial function was chosen to demonstrate that closures created by define can call themselves
recursively. After all, the environment they capture must, by virtue of how define operates, contain a
binding for the closure itself.
define can be used for other things too. Because of the simple semantics of the PScheme language,
define is perfectly suited for creating aliases to existing functions. For instance if a programmer doesn’t
like the rather obscure names for the functions car and cdr, they can provide aliases:
These are completely equivalent to the original functions except in name. The primitive definitions are
bound to the new symbols first and rest in the top level environment exactly as they are still bound
to their original symbols.
11.4 Tests
Tests for define can be seen in Listing 11.5.1 on the next page. The first test demonstrates global
definition, and also that closures bound by define are naturally capable of recursion since define
assigns them in the current environment. The second test shows that define can be used in any
environmental context, even in the body of a closure, to create new bindings. This second test is quite
interesting because it demonstrates a function that creates a “helper” function (h) that is only visible to
the containing times2 closure.
11.5 Listings
11.5.1 t/PScm_Define.t
use strict;
use warnings;
use Test::More;
use lib 't/lib';
use PScm::Test tests => 3;

BEGIN { use_ok('PScm') }

eval_ok(<<EOF, <<EOX, 'global definition');
(define square
  (lambda (x) (* x x)))
(square 4)
EOF
square
16
EOX

eval_ok(<<EOF, <<EOX, 'local definition');
(define times2
  (lambda (x)
    (begin
      (define h
        (lambda (k)
          (- k (- k))))
      (h x))))
(times2 5)
EOF
times2
10
EOX

# vim: ft=perl
12 Classes and Objects
Almost every modern programming language has an object-oriented extension or variant available. Some
languages, such as SmallTalk and Ruby, are “pure” OO languages in that everything in the language is
an object¹. Other languages such as Perl and this PScheme implementation add OO features to what is
essentially a procedural core.
Every object implementation has its peculiarities. There are a lot of trade-offs and choices to be
made. Most of these differences come down to issues of visibility of object components from other parts
of a program: should the fields of an object be visible at all outside of that object? Should an object
be able to see the fields in an object it inherits from? Should an object of a particular class be able to
see fields of another object of the same class? Should certain methods of an object be hidden from the
outside world? From its descendants?
The implementation discussed here makes choices in order to leverage existing code. Those choices
result in a particular OO “style”. I’ve also decided, somewhat perversely, to be as different from the Perl
5 object implementation as possible within the constraints imposed, in order to give the reader a sense
of the different choices that are available.
⟨parent-expression⟩ is an expression evaluating to another class. Each ⟨field⟩ is a symbol naming one of
the object’s fields. Each ⟨method⟩ has the form:
(⟨name⟩ (⟨arg⟩ ...) ⟨body⟩)
where ⟨name⟩ is the name of the method, the ⟨arg⟩s are the arguments, and ⟨body⟩ is the body, much like
lambda expressions. Also, somewhat like lambda expressions, but not identically, method bodies capture
the lexical environment current when the class is created.
¹ If you don’t know SmallTalk, you might be surprised at how far that statement goes. Not only are the simple numeric
and string data types objects, but arrays, hashes (called Dictionaries), booleans, code blocks, exceptions and even classes
are objects in SmallTalk. Furthermore even the simplest operations such as addition are methods: adding 2 + 2 involves
sending the object 2 the message + with argument 2, and conditional expressions like if are implemented by sending a
boolean object representing the condition a message ifTrue with argument the code block to execute if the condition is
true. See [9] if you want more information.
The system provides a pre-built root class to act as a starting point for any class hierarchy. That class
is bound to the symbol root.
So here is how we might create a crude “bank account” class:
(define Account
  (make-class
    root
    (balance)
    (init (amount) (set! balance amount))
    (deposit (amount) (set! balance (+ balance amount)))
    (withdraw (amount) (set! balance (- balance amount)))
    (balance () balance)
    (clone () (class balance))))
make-class returns the new class, and we bind that to the symbol Account with define.
Our new Account descends directly from the root class. It has a single balance field, and five
methods called init, deposit, withdraw, balance and clone. Note that there is no conflict between
the field called balance and the method of the same name: methods exist in a separate namespace.
The init method is special. It will get called whenever a new object is created. It should normally
assign values to the object’s fields, since they initially all have a value of zero.
We’ll come back to that clone method in a bit.
Creating instances of Account simply involves invoking the Account class with whatever arguments
its init method takes. It will return an object of class Account, suitably initialised:
This creates an object of class Account with an initial balance of 20, since init assigns the argument
20 to the balance field. define binds the new object to the symbol my-account.
This starts to explain that mysterious clone method. All methods have access to a special variable
called class, that refers to the class of the object; this has some parallels with the Perl __PACKAGE__
identifier. So clone need only call (class balance) to create a copy of the current object.
To call a method on an object you invoke the object, with the method name as the first argument
and arguments to the method itself as the remaining arguments:
The deposit method takes the argument 10 and adds it to the existing balance.
12.1.1 Inheritance
Classes and objects wouldn’t be much fun without inheritance, so here’s an example of a derived class:
(define InterestAccount
  (make-class
    Account
    (rate)
    (init (interest amount)
      (begin (super init amount)
             (set! rate interest)))
    (accumulate ()
      (this deposit (* (this balance) rate)))))
Note a few things in particular.
• The parent class in this case is Account, the class we created previously.
• The InterestAccount class adds an extra field, rate.
• The InterestAccount’s init method, before setting the new object’s rate to interest, invokes
the parent’s init method with the call (super init amount) to set the balance. This is more
or less equivalent to the Perl SUPER method qualifier:
$self->SUPER::init($amount);
The super object is an implicit field of the class, and is automatically initialised when an object is
created. It represents the parent object.
• The special variable this is an implicit argument to methods; it represents the object on which
the method was originally invoked, just as $self conventionally does for Perl methods².
• The InterestAccount class cannot see its parent’s fields, only its methods. It has to call (this
balance) to get the value of balance and (this deposit ⟨arg⟩) to change it.
• Methods are always called from an object. There are no shorthands. Even within a method body it
is necessary to use one of the special objects this or super to call a method on the current object.
The InterestAccount class can be used as follows:
> (define my-account (InterestAccount 2 20))
my-account
> (my-account deposit 10)
30
> (my-account balance)
30
> (my-account accumulate)
90
> (my-account balance)
90
Quite a nice rate of interest that is.
² We could have chosen the name self instead, to make the examples easier for Perl programmers to read, but the Perl
under the hood might start to get ugly.
(define Account
  (let ((total 0))
    (make-class
      root
      (balance)
      (set-balance (op amount)
        (begin (set! balance (op balance amount))
               (set! total (op total amount))))
      (init (amount) (this set-balance + amount))
      (deposit (amount) (this set-balance + amount))
      (withdraw (amount) (this set-balance - amount))
      (balance () balance)
      (total () total))))
The let binds total to an initial value of 0 then evaluates the make-class construct in this new
environment. The newly created class captures that environment. That new class is then returned by
the let expression and bound to Account.
Every instance of Account will share the lexically scoped variable total. Rather than change each
of init, deposit and withdraw to individually maintain both the value of total and balance, a new
method set-balance has been added. It takes an operation op (+ or -) and an amount and applies
the operation with the amount separately to both the balance and the total. The init, deposit
and withdraw methods have been modified to use this new method, and another new method, total,
provides read-only access to the value of total.
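The same trick of faking a class variable with a captured lexical can be sketched in plain Perl closures. This is a hypothetical analogue rather than anything from the PScheme implementation: make_account_class and the hash-of-code-refs "object" are illustrative names, with $total playing the part of the let-bound variable and $balance the per-object field.

```perl
use strict;
use warnings;

sub make_account_class {
    my $total = 0;                        # shared by every account
    my $plus  = sub { $_[0] + $_[1] };
    my $minus = sub { $_[0] - $_[1] };

    # The returned code ref acts as the class: calling it makes an object.
    return sub {
        my ($amount) = @_;
        my $balance = 0;                  # private to this account

        # Apply one operation to both the balance and the shared total.
        my $set_balance = sub {
            my ($op, $n) = @_;
            $balance = $op->($balance, $n);
            $total   = $op->($total,   $n);
            return $total;
        };
        $set_balance->($plus, $amount);   # plays the part of init

        return {
            deposit  => sub { $set_balance->($plus,  $_[0]) },
            withdraw => sub { $set_balance->($minus, $_[0]) },
            balance  => sub { $balance },
            total    => sub { $total },
        };
    };
}

my $Account = make_account_class();
my $first   = $Account->(20);
my $second  = $Account->(5);

$first->{deposit}->(10);
$second->{withdraw}->(3);

print "first balance: ",  $first->{balance}->(),  "\n";
print "second balance: ", $second->{balance}->(), "\n";
print "shared total: ",   $first->{total}->(),    "\n";
```

Each call to make_account_class() would create an independent total, just as each evaluation of the let around make-class would.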
> (make-class
> cat
> ()
> (poke () (super poke))
> (respond () ’roar)
> ))
lion
> (define leo (lion))
leo
> (leo poke)
roar
• leo is an instance of lion and the call (leo poke) invokes lion’s poke method.
• lion’s poke method does (super poke), which invokes cat’s poke method.
• cat’s poke method does (this respond) but since, even after the super call, this is still the
originating object leo of class lion, it is lion’s respond method that gets invoked, resulting in
“roar” rather than the “purr” from cat’s respond method.
• There are no class variables or methods, but the capture of the lexical environment by a class
definition allows us to fake them.
• Only methods are inherited: fields are visible only to methods of the class in which they are defined.
• The special variable this is always available as an implicit argument to each method and refers to
the object that the method was originally invoked on.
• The special variable super is always available as an implicit field of every object and refers to the
parent object of the object that owns the method.
• Calling a method on the super object passes this, not super, as the implicit object argument to
the called method.
• The special variable class is always available as an implicit field in every object and refers to the
class of the object.
12.2 Implementation
So how do we go about implementing this extension? Well, to be frank, in a fairly ad hoc way. It
should be obvious that PScheme objects have a lot in common with environments, namely that they
store the values of variables. Our existing environment implementation can easily be pressed into
service as a PScheme object. In fact in this implementation objects are environments, or rather, chains
of environments linked via a super field where each individual environment represents an instance of the
equivalent class in the class hierarchy.
We need only add a couple of methods to our existing environment implementation to get all we
need.
• The first of those new PScm::Env methods would of course be an Apply(), since now environments
are exposed in the language as objects and are invoked as operators: (⟨object⟩ ⟨method⟩
⟨arg⟩...).
• The second method we’ll need is some sort of lookup_method() method because we’ve said that
PScheme methods live in a separate namespace from normal fields. We can always recognise
PScheme method invocation and hence the PScheme method name by context: it is always the
first “argument” to an object. We cheat egregiously here and just use the class binding in each
PScheme object to locate the PScheme class, and check to see if the PScheme method is in it. If
not found then lookup_method() recurses down the chain of objects via the super binding and
tries again.
Of course we need an object to represent PScheme classes, but that is just going to contain the parent
PScheme class, the fields and methods of the class, and the environment that was current at the point
of its creation. That nascent PScheme Class package will need an Apply() method that will create
PScheme objects on demand, passing any arguments to that object’s nearest init method.
That is pretty much all we need to do. Methods can act just like closures but will extend the
environment representing the object when they are called. There are a few fiddly details around method
invocation on a super object, but we’ll deal with that later.
193 }
194
195 1;
As is usual for a special form it just has an Apply() method. On Line 185 that unpacks the parent
expression, fields and methods from the argument $form, and on Line 188 it evaluates the parent expres-
sion in the current environment to get an actual parent class (PScm::Class) object. Finally on Line
189 it returns a new instance of PScm::Class capturing those values and the current environment.
The new() method in PScm::Class doesn’t do anything too clever either. On Line 10 it declares a
hashref $rh_methods, and then on Line 11 it calls a helper static method _populate_methods_hash()
which will chop up the PScheme methods into names, arguments and bodies, storing each pair of args
and body in the hash keyed on the PScheme method name. Then starting on Line 13 it returns a new
instance containing that hash along with the parent PScheme class, fields and current environment:
007 sub new {
008 my ($class, $parent, $fields, $methods, $env) = @_;
009
010 my $rh_methods = {};
011 $class->_populate_methods_hash($rh_methods, $methods);
012
013 return bless {
014 parent => $parent,
015 fields => [$fields->value],
016 methods => $rh_methods,
017 env => $env,
018 }, $class;
019 }
Here’s _populate_methods_hash():
021 sub _populate_methods_hash {
022 my ($class, $rh_methods, $methods) = @_;
023 if ($methods->is_pair) {
024 my $method = $methods->first;
025 my ($name, $args, $body) = $method->value;
026 $rh_methods->{ $name->value } =
027 { args => $args, body => $body };
028 $class->_populate_methods_hash($rh_methods, $methods->rest);
029 }
030 }
That’s it for make-class. Following our previous course, we next need to look at PScheme object
creation, which occurs when a PScheme class is invoked with arguments intended for its init method:
(⟨class⟩ ⟨arg⟩...).
was captured by its class. Furthermore each PScheme object (environment) has a super field referring
to the anonymous PScheme object created by its parent PScheme class.
That means we end up with a situation something like Figure 12.1
[Figure 12.1: three PScm::Env frames, each linked to the next by a super binding]
Although this figure does not tell the whole story, it at least emphasizes that a PScheme object consists
of a number of environment frames, one for each class in the equivalent PScheme class hierarchy, but those
environment frames are connected not by a direct parent/child relationship but via an ordinary variable
in each frame called super. The environment frames representing each object extend the environment
that their respective classes captured. This is implied, but not shown, by the unterminated arrows in
the figure.
So we were about to take a look at how PScheme classes create PScheme objects. Classes create
objects when they are invoked as (⟨class⟩ ⟨arg⟩...). To make anything invokable we just need to give
it an Apply() method, and here’s the one for PScm::Class:
On Line 35 it calls a make_instance() method to create a new PScheme object (really a PScm::Env).
Then on Line 36 it calls the PScheme init method of the new object. This is done using a call_method()
method of PScm::Env. This takes the PScheme object on which the method is being invoked
($new_object, which will be passed to the PScheme method as this), the name of the PScheme method
("init"), and the arguments to the method itself ($form and $env). We’ll look at call_method() later.
The make_instance() method of PScm::Class must recurse down to the root of the PScheme class
hierarchy, creating a chain of anonymous PScheme objects on the way back up, each linked to its parent
by a super field. Here it is:
The first thing it does, on Line 43, is to call its parent PScheme class’ make_instance() method to get
an instance of its parent class. Then starting on Line 45 the rest of the method extends the environment
that the PScm::Class object captured when it was created, with appropriate bindings from class to
$self, super to the $parent instance and from each of the PScheme class’s fields to initial values
of zero. It is this new environment that is returned by make_instance().
If you were reading the above code carefully, you’d have noticed that the super field does not link
directly to the parent instance, but via a derivative of PScm::Env called PScm::Env::Super. This is
so that the super object can have a separate Apply() method. That gives the lie to our simple picture
of environments-as-objects in Figure 12.1 on the preceding page. In fact the true situation is shown in
Figure 12.2.
[Figure 12.2: an object (a PScm::Env with class, super and field bindings) joined through a PScm::Env::Super
to its root object (a PScm::Env); the object’s class binding refers to a PScm::Class and the root object’s to a
PScm::Class::Root, each holding a captured env]
To keep things simple this figure only shows an object whose immediate parent is root. You can see that
the PScheme object is joined to its parent via a PScm::Env::Super object bound to its super field,
and that the PScm::Env::Super object also has a super field providing the link to the real parent.
Additionally each PScheme object has a class binding referring to the PScheme class that created it.
That is a PScm::Class for all but the root object, which has no super binding and has a class binding
that refers to a PScm::Class::Root object. PScm::Class::Root is a derivative of PScm::Class, and
it is a PScm::Class::Root instance that will be bound to root in the initial environment.
That conveniently brings us back round to the make_instance() method, and how that recursive call
to the parent PScheme class’ make_instance() is terminated. That happens when it hits the
make_instance() method of the PScm::Class::Root package, shown next.
The new() method on Lines 74-83 is just meant to be easy to call from the repl where the root class
will be initialised. It creates a PScheme class with no parent, no fields, no methods, and whatever env
is passed in³.
The make_instance() method on Lines 85-93 is not recursive; it just extends the captured environment
with a binding of class to $self (the PScm::Class::Root object), returning the result. Note
that it takes advantage of the fact that ExtendUnevaluated() can cope with a single symbol and value
as well as lists of the same.
³ It’s actually redundant for that root environment to have a class binding or a parent environment, since the root class
currently has no methods. However if we did want to extend the implementation to add generic methods to the root class
then all the pieces we need are in place, so we can accept that redundancy for now.
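The shape of that recursion, ending at the root class, can be sketched with plain hashes. Everything here is a simplification under assumptions: classes and objects are hypothetical hash refs, the captured environments are left out, and the super link points straight at the parent part of the object rather than going through a PScm::Env::Super.

```perl
use strict;
use warnings;

# Hypothetical class descriptions: a name, a parent class and a field list.
my $root             = { name => 'root',            parent => undef,    fields => [] };
my $account          = { name => 'Account',         parent => $root,    fields => ['balance'] };
my $interest_account = { name => 'InterestAccount', parent => $account, fields => ['rate'] };

sub make_instance {
    my ($class) = @_;

    # The root class ends the recursion: its object has only a class binding.
    return +{ class => $class } unless $class->{parent};

    # Build the parent's part of the object first, then this class's part.
    my $object = {
        class => $class,
        super => make_instance($class->{parent}),
    };
    $object->{$_} = 0 for @{ $class->{fields} };    # fields start at zero
    return $object;
}

my $object = make_instance($interest_account);
for (my $part = $object; $part; $part = $part->{super}) {
    my @fields = grep { $_ ne 'class' && $_ ne 'super' } sort keys %$part;
    print "$part->{class}{name}: fields (@fields)\n";
}
```

Each class contributes one part to the object, holding only that class's own fields, which is why a derived class cannot see its parent's fields directly.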
call_method() is passed both the “real” Perl object $self and the object representing PScheme’s idea
of the current object, $this, that the PScheme method is being invoked on. Normally these are one and
the same. Additionally it is passed the method name, arguments and another environment in which the
arguments are to be evaluated if a method is found. On Line 171 it uses lookup_method() (discussed
next) to find the method, and if found then on Line 172 it invokes the PScheme method by calling its
ApplyMethod() and returns the result. If no method can be found it returns undef, and since in the
case of PScm::Class’ Apply() the result of calling init is discarded anyway, it is not fatal if an init
method is not found.
lookup_method() employs a simple strategy to locate a method. First it checks in the current
PScheme object’s class, and if it can’t find the method there, it recurses to its super object. That leads
to the equally simple definition below.
So lookup_method() breaks down into two simpler methods: lookup_method_here() and
lookup_method_in_super(), which it tries in turn. lookup_method_here() is similarly simple.
149 }
150 }
It checks to see if the current object has a class binding, and if so it calls get_method() on the class,
returning the result. get_method() will return undef if it can’t find the method in the class, and
lookup_method_here() returns undef if there is no class binding.
lookup_method_in_super() is equally simple.
It checks to see if the current PScheme object has a super, and if so it calls lookup_method() on it.
Otherwise it returns undef.
Since lookup_method(), lookup_method_here() and lookup_method_in_super() are all methods
of PScm::Env, they are all available to PScm::Env::Super where they work without modification:
super objects have a super field but no class field.
Going back to lookup_method_here(), if that found a class binding, it called get_method() on the
PScheme class, passing it the method name to look for, and perhaps less obviously, $self as well. Here’s
what get_method() back in PScm::Class does with those arguments.
On Line 62 it looks in its methods subhash for a key matching the string $method_name. If it finds one it
knows it has found the method and returns a new instance of PScm::Closure::Method, a closure just
like a lambda expression, containing the relevant method args and method body from the subhash, and
most importantly capturing the environment $object. Reasoning backwards, this is correct: $object
(the $self from lookup_method_here()) is the environment in which the method was found (via class),
and that is the environment that the method should extend when it executes, so that the method body
can “see” the fields of the object.
I’d just like to emphasize a point here: the object $object passed to get_method() is not necessarily
the same as the this that will be passed to the method when it executes. That would only be true if
the method was found in the first PScheme object that lookup_method() looked in.
There’s very little left to cover now. We just need to take a look at PScm::Closure::Method. This
is a subclass of PScm::Closure, as you can see.
12.2. IMPLEMENTATION 147
The new() method on Lines 71-81 is the one we just saw being called by get_method(). What
differentiates it from the normal PScm::Closure::Function new() method is that on Line 75 it prepends the
symbol this to the argument list as it constructs the closure. That “implicit” argument will be supplied
by ApplyMethod() which you can also see in this package.
The defining feature of a closure is that it captures an environment when it is created and extends it
when it is executed. These method closures are no different, but the environment that they capture is
the object in whose class the method was found. Hence method bodies can see the fields of the object
as normal variables: they are normal variables.
ApplyMethod() also behaves pretty much like the normal closure’s Apply(), but it differs in having
an extra $this argument. On Line 85 it calls map_eval() on the argument $form with the current
environment to get a list of evaluated arguments, just as the normal closure’s Apply() does. But then
on Line 86 it prepends $this (the PScheme this) to those actual arguments when calling the generic
PScm::Closure _apply() method. This ties in with the new() method having supplied an extra symbol
this to the list of formal arguments.
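The pairing of an extra formal argument with an extra actual argument can be shown in miniature. This is only a sketch under loud assumptions: make_method() and apply_method() are hypothetical helpers, environments are plain hashes that are copied rather than extended, and method bodies are Perl subs rather than PScheme expressions.

```perl
use strict;
use warnings;

# Hypothetical helper: build a "method closure" from formal argument names,
# a body and a captured environment, prepending the implicit "this".
sub make_method {
    my ($formals, $body, $env) = @_;
    return { formals => ['this', @$formals], body => $body, env => $env };
}

# Hypothetical helper: bind this plus the actual arguments to the formals
# in a copy of the captured environment, then run the body in it.
sub apply_method {
    my ($method, $this, @actuals) = @_;
    my %bindings = %{ $method->{env} };
    @bindings{ @{ $method->{formals} } } = ($this, @actuals);
    return $method->{body}->(\%bindings);
}

my $account = { balance => 10 };    # stands in for an object's fields

my $deposit = make_method(
    ['amount'],
    sub { my ($env) = @_; return $env->{balance} + $env->{amount} },
    $account,
);
my $who = make_method([], sub { my ($env) = @_; return $env->{this} }, $account);

my $total   = apply_method($deposit, $account, 5);
my $this    = apply_method($who, $account);
my $verdict = ($this == $account) ? 'this is the account' : 'this is something else';

print "$total\n";
print "$verdict\n";
```

The body of the first method sees both the field balance and the argument amount as ordinary bindings, and the second shows that this arrives as just another argument.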
We’ve now covered everything to do with object creation and initialisation in PScheme. Along
the way we’ve seen, by following the process of calling an object’s init method, most of the machinery
behind method invocation. There are only two remaining details to fill in.
148 CHAPTER 12. CLASSES AND OBJECTS
Since objects (environments) are now directly invokeable, they too must have an Apply() method, shown
here:
On Line 179 it splits the argument $form into a method name (a symbol) and a list of arguments to the
method. Then on Line 181 it attempts to call the method, and collects the result. Now the result will
only be undefined if a method could not be found, in which case an exception is raised. Otherwise the
result is returned.
If you compare the Apply() method here with the one in PScm::Env above, you can see they differ in
that on Line 199 the Apply() looks up this in the current environment. Then on Line 201 it passes
$this instead of $self to call_method(). The upshot of that is the variable $this will be the one that
gets bound to the implicit this argument to the PScheme method when it is invoked.
12.2.6 Wiring it up
And finally, we just need to see how the new object code is wired into the repl. Here’s ReadEvalPrint()
for version 0.0.9 of our interpreter.
061             PScm::Class::Root->new($initial_env)
062     );
063
064     while (defined(my $expr = $reader->Read)) {
065         my $result = $expr->Eval($initial_env);
066         $result->Print($outfh);
067     }
068 }
The changes are in bold. On Line 41 I finally caved in and added primitive addition as a builtin. I
leave it to the reader to do the same. On Line 56 you can see the additional binding of make-class to
a PScm::SpecialForm::MakeClass object, and on Line 59 we attach a new PScm::Class::Root to
the symbol root in the initial environment. That needs to be done using Define() because we need to
pass the value of $initial_env to the new() method of PScm::Class::Root.
• A PScm::Class::Root object bound to root in the initial environment provides a base class in
which other classes can be rooted.
• PScm::Class has an Apply() method, and when a PScm::Class is invoked with arguments, that
Apply() method first creates a PScheme object, which is in fact just an instance of PScm::Env,
then calls that new object’s init method with the arguments that were passed to the class.
• To create a new object (environment) PScm::Class’s Apply() calls make_instance() which
recurses down the chain of PScheme classes to the root and creates a chain of objects (environments)
on the way back up:
– Each element of this chain has a class binding referring to the PScm::Class instance that
created it.
– Each element is joined indirectly to the previous by a super binding referring to a
PScm::Env::Super object which itself has a super binding referring to the actual parent object.
– Each object in the chain extends the environment that its class captured when it was created.
• To call a PScheme method, init or otherwise, the call_method() method of PScm::Env is used.
This uses lookup_method() to locate the method and create an instance of
PScm::Closure::Method from it. If a method is found call_method() invokes the method’s ApplyMethod().
• lookup_method() looks first in the current environment for a class binding and, if there is one,
checks that class for the method. If that fails to produce a method it recurses on the super field.
• Apply() in PScm::Env passes $self (the object on which the method is being invoked) as the
value of this to call_method().
• The Apply() method of PScm::Env::Super instead looks up the value of this in the current
environment and passes that as the value of this to call_method().
• When a PScheme method is found in a PScheme class, the PScm::Class method get_method()
creates an instance of PScm::Closure::Method, a closure which captures the environment (object)
in whose class the method was found, and which has an additional implicit this argument.
Since the new PScm::Class has a file to itself there’s a full listing in Listing 12.5.1 on page 153.
To recap, let’s consider our original example classes: Account and InterestAccount.
Figure 12.3 on the following page shows the situation after the creation of the Account and
InterestAccount classes, and the my-account instance of an InterestAccount that was discussed
in the examples in Section 12.1 on page 135.
You can see that the my-account object is really just a PScm::Env and its parent env is the global
environment (implied by the unterminated heavy arrows). The my-account object’s parent environment
is the global environment because that is the environment that the InterestAccount class was created
in. If the InterestAccount class had captured a different environment, then that would have been the
one that instances of that class extended.
Note the three bindings in the my-account object. The rate variable is the one supplied by the class
definition, the other two, super and class are automatically provided by the implementation when new
objects are created.
The super variable refers to a PScm::Env::Super object, derived from PScm::Env, which in turn
has a super variable, and differs from PScm::Env only in its Apply() method, which arranges to
forward the current value of this (rather than the super object itself) to the called method.
The class variable refers to a PScm::Class object which contains field (variable) names, method
names along with their definitions, the environment that was current at the time of the creation of the
PScm::Class, and a parent field pointing at the parent PScheme class.
12.4 Tests
Tests for our OO extension are in Listing 12.5.2 on page 155.
The first test exercises the creation of a class. The second test creates a class (our account class
from the examples above) then creates an object from it and calls a couple of its methods. The third
test uses the interest-account example that we’ve looked at to test inheritance. The fourth test
demonstrates that lexical variables outside of a class are visible to its methods and can therefore be used
as class variables. Finally, the fifth test uses an abstract form of that “leo” example to demonstrate
that method calls on a super object persist the current value of this.
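The behaviour that the fifth test checks has a direct analogue in Perl's own object system, which may make it easier to picture. The sketch below is not part of the PScheme test suite; the package names C1 and C2 are borrowed from that test purely for illustration. A call through SUPER:: still passes the original object along, so a method reached that way dispatches later calls on the real object.

```perl
use strict;
use warnings;

package C1;
sub new { my ($class) = @_; return bless {}, $class }
sub ma  { my ($self) = @_; return $self->mb }    # dispatches on the real object
sub mb  { return 0 }

package C2;
our @ISA = ('C1');
sub ma { my ($self) = @_; return $self->SUPER::ma() }    # $self is passed along
sub mb { return 1 }

package main;

my $result = C2->new->ma;
print "$result\n";    # prints 1, because mb is found in C2, not C1
```

If the super call had replaced the current object with the parent, mb would have resolved to the C1 version and the result would have been 0.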
[Figure 12.3: the my-account object (a PScm::Env); the InterestAccount class with methods accumulate and init; the Account class with methods deposit, withdraw, balance and init; and the root PScm::Class::Root, connected by env, parent and class links.]
12.5 Listings
12.5.1 PScm/Class.pm
001 package PScm::Class;
002
003 use strict;
004 use warnings;
005 use base qw(PScm);
006
007 sub new {
008     my ($class, $parent, $fields, $methods, $env) = @_;
009
010     my $rh_methods = {};
011     $class->_populate_methods_hash($rh_methods, $methods);
012
013     return bless {
014         parent  => $parent,
015         fields  => [$fields->value],
016         methods => $rh_methods,
017         env     => $env,
018     }, $class;
019 }
020
021 sub _populate_methods_hash {
022     my ($class, $rh_methods, $methods) = @_;
023     if ($methods->is_pair) {
024         my $method = $methods->first;
025         my ($name, $args, $body) = $method->value;
026         $rh_methods->{ $name->value } =
027           { args => $args, body => $body };
028         $class->_populate_methods_hash($rh_methods, $methods->rest);
029     }
030 }
031
032 sub Apply {
033     my ($self, $form, $env) = @_;
034
035     my $new_object = $self->make_instance();
036     $new_object->call_method($new_object, "init", $form, $env);
037     return $new_object;
038 }
039
040 sub make_instance {
041     my ($self) = @_;
042
043     my $parent_instance = $self->{parent}->make_instance();
044
045     return $self->{env}->ExtendUnevaluated(
046         new PScm::Expr::List(
047             PScm::Expr::Symbol->new("class"), # $self
048             PScm::Expr::Symbol->new("super"), # $parent_instance
049             @{ $self->{fields} }, # 0...
050         ),
051         new PScm::Expr::List(
052             $self, # "class"
053             PScm::Env::Super->new(super => $parent_instance), # "super"
054             ((PScm::Expr::Number->new(0)) x @{ $self->{fields} }), # field...
055         )
056     );
057 }
058
059 sub get_method {
060     my ($self, $method_name, $object) = @_;
061
062     if (exists $self->{methods}{$method_name}) {
063         return PScm::Closure::Method->new(
064             $self->{methods}{$method_name}{args},
065             $self->{methods}{$method_name}{body}, $object);
066     }
067 }
068
069 ##########################
070 package PScm::Class::Root;
071
072 use base qw(PScm::Class);
073
074 sub new {
075     my ($class, $env) = @_;
076
077     return bless {
078         parent  => 0,
079         fields  => [],
080         methods => {},
081         env     => $env,
082     }, $class;
083 }
084
085 sub make_instance {
086     my ($self) = @_;
087
088     return $self->{env}
089       ->ExtendUnevaluated(
090         new PScm::Expr::Symbol("class"),
091         $self
092       );
093 }
094
095 1;
052 (init (x r)
053 (begin
054 (super init x)
055 (set! rate r)))
056 (accumulate ()
057 (this deposit (* (this balance) rate)))
058 ))
059 (define myaccount (interest-account 10 2))
060 (myaccount balance)
061 (myaccount withdraw 2)
062 (myaccount balance)
063 (myaccount accumulate)
064 (myaccount balance)
065 EOF
066 account
067 interest-account
068 myaccount
069 10
070 8
071 8
072 24
073 24
074 EOR
075
076 eval_ok(<<EOF, <<EOR, 'class variables');
077 (define counter-class
078 (let ((count 0))
079 (make-class
080 root
081 ()
082 (init () (set! count (+ count 1)))
083 (count () count)
084 )))
085 (define o1 (counter-class))
086 (o1 count)
087 (let ((o2 (counter-class))
088 (o3 (counter-class)))
089 (o1 count))
090 EOF
091 counter-class
092 o1
093 1
094 3
095 EOR
096
097 eval_ok(<<EOF, <<EOR, 'super calls');
098 (define c1
099 (make-class
100 root
101 ()
102 (ma () (this mb))
103 (mb () 0)
104 ))
105 (define c2
106 (make-class
107 c1
108 ()
109 (ma () (super ma))
110 (mb () 1)
111 ))
112 ((c2) ma)
113 EOF
114 c1
115 c2
116 1
117 EOR
118
119 # vim: ft=perl
Continuations
What are continuations? Why should you want to know about them? The rest of this chapter is devoted
to answering the first of those questions, but the second question deserves some sort of an answer early
on, if only to encourage you to pursue the answer to the first.
I hope you can remember (I certainly do) that wonderful eureka moment when you first “got”
recursion, and all its implications. Grasping the concept of continuations is an even more rewarding and
dare I say transcendental experience, and well worth the effort.
Continuations are an advanced control-flow technique that can be used to implement any and all
standard control-flow mechanisms including but not limited to conditional branching, loops (with break
statements), goto, return etc. Beyond the standard control-flow mechanisms, continuations also promise
an almost limitless potential for new types of control flow that might be difficult or near impossible to
achieve in any other way, for example:
• co-routines;
• threads;
• exceptions;
• logic programming.
Let’s talk a bit about co-routines. Co-routines are groups of two or more functions or methods that
interact with one another in a much more even-handed way than just “A() calls B().” A classic example
is a producer-consumer pair of routines, which pass data, possibly via some intermediate structure such
as a list. The producer produces data, pushing it on to the list, and the consumer consumes it, shifting
it off the list again, something like Figure 13.1 on the next page.
Think of the producer using some complex algorithm to generate a stream of data while the consumer
uses an equally complex algorithm to parse it. Both the producer and consumer are independent loops,
so on the face of it, if the producer was called it would never relinquish control to give the consumer a
look-in, it would just continue to push data onto that list. Likewise the consumer, if it were running,
would just consume data until the list was exhausted. Both loops could have extensive internal logic and
state such that even if the producer could simply call the consumer when it had produced something, the
consumer would have great difficulty returning control to the producer without losing all of its internal
state. Reversing the roles, so that the consumer called the producer, would still have exactly the same
issues.
160 CHAPTER 13. CONTINUATIONS
[Figure 13.1: a Producer puts items into a shared Data structure and a Consumer gets them out again.]
The only apparent solution would be to implement the producer and the consumer as separate threads,
or even as completely separate processes, and have some IPC mechanism to pass the data between them.
But this pair of co-routines might only be part of a much larger system and the division into separate
threads or processes might be inelegant or inappropriate. Continuations provide a solution by allowing
a sort of “procedural goto” whereby control passes directly from the center of one routine to the heart of
the other, and then back again, resuming exactly where the “goto” left off!
Threads are a common enough concept nowadays, but you might be surprised to hear that continuations
make it almost trivial to implement so called “green” threads (application as opposed to operating system
threads). We’ll actually do this in a later chapter.
Exceptions are a simple application of continuations, where control, rather than unwinding down a stack,
proceeds immediately to some handler routine. Perl’s eval{die} construct is an example of this sort of
thing. We’ll demonstrate a very simple error construct towards the end of this chapter.
Logic Programming, as demonstrated by languages such as Prolog [3], is a completely different paradigm;
it has more in common with recursive database search and large-scale pattern matching than the mostly
functional style of programming presented by PScheme. However a more advanced application of con-
tinuations makes it possible to implement the basis of such a language, and we will attempt that later
on too.
I hope that I have whetted your appetite for the potential of continuations. However, the topic of
continuations is somewhat difficult, and this chapter is a long one.
Before diving in, it would be a good idea to discuss a couple of related topics, namely tail recursion and
tail call optimization. Then with those under our belts, we can progress to continuations themselves. We
will talk about continuations by discussing continuation passing style, a programming technique available
to many languages, including Perl. Then we proceed to re-write the interpreter from Chapter 12 on
page 135 in continuation passing style, and by exposing the underlying continuations in the PScheme
language, we show what an incredibly powerful tool they are.
Much of the above may not make much sense on first reading, but hopefully the rest of the chapter
will make it clear, so let’s get started.
13.1 Tail Recursion and Tail Call Optimization
sub factorial {
    my ($n) = @_;
    if ($n == 0) {
        return 1;
    } else {
        return $n * factorial($n - 1);
    }
}
This is the classic recursive definition of the factorial function: the factorial of 0 is 1, and the factorial of
any other positive number is that number times the factorial of one less than that number (factorial is
not defined for negative numbers). You will be getting very familiar with this function in various guises
from here on in so it is probably worth taking a good long look at it now in its simplest form before we
start to change things.
To start off, consider the behaviour of this function when called with a positive numeric argument.
The evolution on the stack of the call to factorial(5) would proceed as follows:
factorial(5)
5 * factorial(4)
5 * 4 * factorial(3)
5 * 4 * 3 * factorial(2)
5 * 4 * 3 * 2 * factorial(1)
5 * 4 * 3 * 2 * 1 * factorial(0)
5 * 4 * 3 * 2 * 1 * 1
5 * 4 * 3 * 2 * 1
5 * 4 * 3 * 2
5 * 4 * 6
5 * 24
120
Although this picture omits many details, it is obvious that the stack grows (to the right in the example)
and that there are deferred multiplications that only get performed as the calls to factorial() return
and the stack is unwound again.
We can rewrite that factorial function in a different form with the addition of a helper function, like
this:
sub factorial {
    my ($n) = @_;
    return factorial_helper($n, 1);
}
sub factorial_helper {
    my ($n, $result) = @_;
    return $result if $n == 0;
    return factorial_helper($n - 1, $n * $result);
}
This version works by moving the body of the factorial function into the helper function and passing
it an additional value, an accumulator with an initial value of 1. This means that the helper function
can calculate the result as it proceeds up the stack rather than having to wait for $n to reach zero and
calculating the result on the way back down.
This is still a recursive definition, but because of the way the result is calculated it differs from the
original factorial() in one absolutely crucial detail: the last thing it does is to call itself recursively and
it immediately returns the result. In the original definition the result of the recursive call to factorial()
had to be multiplied by the current value of $n before it could be returned.
A function which is called and its result immediately returned is said to be in tail position and the
code making the call is said to be making a tail call. A recursive function which calls itself in tail position
is said to be tail recursive. Tail calls are special because the stack setup to make the call and teardown
afterwards is essentially redundant: the result of the function making the tail call is the result of the
function being called, and the caller’s stack frame is destroyed immediately after the called function’s
stack frame is. If we could overwrite the caller’s arguments with the arguments to the called function,
then goto the called function, then when that function does a return it will return not to the caller,
but to the caller of the caller. This is called tail-call optimization (tco).
Figure 13.2 shows a normal procedure call in tail position. You can see that the stack is extended by
the called function’s frame (which includes the return address), then that extension is discarded as the
called function returns, then the calling function’s frame is subsequently discarded as the calling function
returns to the previous caller (we are talking about the Perl stack here, not PScheme environments).
[Figure 13.2: a normal call in tail position: the stack grows on the call and shrinks again on each return.]
Figure 13.3 on the facing page shows the effect of tail call optimization. The caller’s frame is replaced
by the called function’s frame, then the caller jumps to the called function. When the called function
returns it does so directly to the previous caller.
[Figure 13.3: a tail call with tco: the caller’s frame is replaced by a goto, and the single return goes to the previous caller.]
Perl allows us to do precisely this, by means of assignment to @_ and the special goto &sub syntax.
Here’s our factorial_helper() again, this time with tco:
sub factorial_helper {
    my ($n, $result) = @_;
    if ($n == 0) {
        return $result;
    } else {
        @_ = ($n - 1, $n * $result);
        goto \&factorial_helper;
    }
}
This function, although written in a recursive style, operates in constant space and consumes no
additional stack. In fact it is pretty much equivalent to this iterative definition:
sub factorial_helper {
    my ($n, $result) = @_;
  REPEAT:
    if ($n == 0) {
        return $result;
    } else {
        $result *= $n; --$n;    # multiply by the current $n before decrementing it
        goto REPEAT;
    }
}
which just emphasizes the point that tco’d tail-calls are really just gotos with arguments.
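The claim that the goto version consumes no extra stack can be checked from inside Perl. The following is a small sketch rather than anything from the interpreter: stack_depth(), helper_plain() and helper_tco() are names invented here, and the depth is measured by walking caller() frames, relative to the depth at which the measurement starts.

```perl
use strict;
use warnings;

# Number of subroutine frames currently on Perl's call stack,
# not counting stack_depth() itself.
sub stack_depth {
    my $depth = 0;
    $depth++ while caller($depth + 1);
    return $depth;
}

my %deepest;    # stack depth seen by each variant when $n reaches 0

sub helper_plain {
    my ($n, $result) = @_;
    if ($n == 0) {
        $deepest{plain} = stack_depth();
        return $result;
    }
    return helper_plain($n - 1, $n * $result);    # ordinary call in tail position
}

sub helper_tco {
    my ($n, $result) = @_;
    if ($n == 0) {
        $deepest{tco} = stack_depth();
        return $result;
    }
    @_ = ($n - 1, $n * $result);
    goto \&helper_tco;                            # the current frame is replaced
}

my $base        = stack_depth();
my $plain       = helper_plain(5, 1);
my $tco         = helper_tco(5, 1);
my $plain_extra = $deepest{plain} - $base;
my $tco_extra   = $deepest{tco} - $base;

print "plain: $plain, using $plain_extra frames\n";
print "tco:   $tco, using $tco_extra frames\n";
```

Both variants compute 120, but the plain version is six frames deep when it reaches the base case (one per call), while the goto version is still in its first and only frame.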
Many language implementations (gcc springs to mind) can perform implicit tco, detecting calls in
tail position and replacing the call with a goto, and that’s all calls in tail position, not just recursive
ones. Furthermore, some languages such as Scheme require this behaviour of their implementations
(see footnote 1). Our PScheme implementation, through the use of continuations, will by the end of this
chapter support something equivalent.
That’s really all there is to tail recursion and tco (see footnote 2). I’ve already said they have a direct
bearing on continuations, but there is a lot more to continuations than that, so let’s take a look at using
continuations in Perl.
Footnote 1: R6RS requires that tail calls consume no resources, not necessarily that they perform precisely
the tco demonstrated here. For now if you can just accept that implicit tco or an equivalent is a desirable
thing, all will become clear later.
Footnote 2: For an alternative exposition of tco (sometimes called Tail Call Elimination) see [4, pp229–234].
Continuation passing style eliminates the second of these operations; in pure continuation passing style
no function you call ever returns!
13.2. CONTINUATION PASSING STYLE 165
. . . That being the case, you need to figure out how to tell a cps function what to do with its result. So,
since a cps function can’t return a result, it is instead passed an additional procedure as argument: a
continuation, and it passes its result to that.
The continuation represents the remainder of the computation after a function “returns”. Since
calling a continuation is equivalent to returning a result in non-cps, you can also think of a continuation
as a reference to your function’s return statement.
As you might imagine, a computation which never returns will simply consume stack indefinitely,
until it completes, but I hope the discussion of tco above has addressed some of your reservations on that
score, and as I’ve said, continuations themselves, when fully realised, provide an alternative mechanism
for dealing with the same issue.
So what does continuation passing style look like in Perl? Well since continuations are procedures,
closures are an obvious and easy way to implement them. So our continuations can be created by sub {
... } and called by $continuation->(...).
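Before the factorial example, here are the bare mechanics in isolation. This is a minimal sketch: add_cps() is a made-up function used only to show a continuation being created with sub { ... }, called directly with ->(...), and called on our behalf by a function written in cps.

```perl
use strict;
use warnings;

my @log;

# A continuation is just a closure: it captures @log and waits for a value.
my $continuation = sub {
    my ($value) = @_;
    push @log, "received $value";
};

# A function in continuation passing style hands its result to the
# continuation it was given instead of returning it.
sub add_cps {
    my ($x, $y, $cont) = @_;
    $cont->($x + $y);
}

$continuation->(42);             # calling a continuation directly
add_cps(2, 3, $continuation);    # letting a cps function call it

print "$_\n" for @log;
```

The log ends up holding "received 42" followed by "received 5": the caller never looks at a return value, it only decides what should happen next by choosing the continuation.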
For a first example of cps transformation, we’ll go back to our original factorial() function from
Section 13.1 on page 161 and re-write it in cps. To save you having to refer back to it, here it is again.
sub factorial {
    my ($n) = @_;
    if ($n == 0) {
        return 1;
    } else {
        return $n * factorial($n - 1);
    }
}
Now as we have said, all cps functions take an additional continuation argument. The continuation we
pass it depends on what we want to do with the result. Our original example printed the result, so let’s
just pass a continuation to do that:
The additional continuation argument sub { print shift, "\n" } just takes one argument and prints
it.
Next up is factorial() itself. This cps factorial() takes an additional continuation as argument,
so the first couple of lines are easy:
sub factorial {
    my ($n, $cont) = @_;
Next remember that whenever the function used to return a result, it must now call its continuation on
that result, so the next couple of lines are also pretty easy: whereas the original returned 1 if $n was 0,
the cps version calls its continuation on the value 1 instead.
    if ($n == 0) {
        $cont->(1);
This works for factorial(0, sub { print shift, "\n" } ): the continuation will get and print a 1.
That leaves the tricky bit. The original function reads:
    } else {
        return $n * factorial($n - 1);
    }
You can see that recursive call to factorial() has some deferred computation, namely the multiplication
by $n to be done when the call returns. But as we’ve said a cps function never returns so we must
somehow wrap that deferred computation up in a new continuation and pass it to factorial().
If you get stuck on a difficult cps transform, it almost always pays to break the expression into a
sequence of simpler operations first. We can do that here easily enough:
    } else {
        my $factorial_result = factorial($n - 1);
        my $result = $n * $factorial_result;
        return $result;
    }
So this is much easier now. You can see that the first thing that happens is that factorial() calls itself.
Then the result is multiplied by $n, and finally it is returned. So our new continuation is just the code
that now follows the call to factorial(), wrapped in a function:
sub {
    my ($factorial_result) = @_;
    my $result = $n * $factorial_result;
    return $result;
}
Since factorial() will call this continuation with its result, $factorial_result is the argument to the
continuation.
There is one additional change that we need to make. Where the original code did a return $result,
our new continuation must call the original continuation on the $result instead.
sub {
    my ($factorial_result) = @_;
    my $result = $n * $factorial_result;
    $cont->($result);
}
This is our new continuation. All that remains is to pass it to our recursive factorial call:
    } else {
        factorial($n - 1, sub {
                my ($factorial_result) = @_;
                my $result = $n * $factorial_result;
                $cont->($result);
            }
        );
    }
Dropping the temporary variables gives a more compact equivalent:
    } else {
        factorial($n - 1, sub { $cont->($n * shift) });
    }
The new continuation is sub { $cont->($n * shift) }. It takes one argument: the result so far (this
is the value that our non-cps factorial() would have returned). It multiplies the result by the current
value of $n then calls the current continuation $cont on that value3 .
That completes our initial cps re-implementation of factorial():
sub factorial {
    my ($n, $cont) = @_;
    if ($n == 0) {
        $cont->(1);
    } else {
        factorial($n - 1, sub { $cont->($n * shift) });
    }
}
sub {
    print shift, "\n"
}

sub {
    sub {
        print shift, "\n"
    }->(3 * shift)
}

sub {
    sub {
        sub {
            print shift, "\n"
        }->(3 * shift)
    }->(2 * shift)
}

sub {
    sub {
        sub {
            sub {
                print shift, "\n"
            }->(3 * shift)
        }->(2 * shift)
    }->(1 * shift)
}

sub {
    sub {
        sub {
            sub {
                print shift, "\n"
            }->(3 * shift)
        }->(2 * shift)
    }->(1 * shift)
}->(1) # factorial 0

sub {
    sub {
        sub {
            print shift, "\n"
        }->(3 * shift)
    }->(2 * shift)
}->(1)

sub {
    sub {
        print shift, "\n"
    }->(3 * shift)
}->(2)

sub {
    print shift, "\n"
}->(6)

print 6, "\n"
The deferred multiplications accumulate until we reach the point where the entire accumulated continua-
tion is finally called with argument 1, then they unwind from the outside in until the original continuation
gets invoked on the argument 6 and 6 is printed. If you think about it, this evolution is functionally
identical with the implicit deferred computations on the stack in our original factorial(), the only
difference being that now we have a variable $cont that explicitly refers to the continuation.
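Because the continuation is now an ordinary value, the caller decides what happens to the result simply by choosing what to pass. The short sketch below reuses the cps factorial() exactly as given above; only the two continuations and the variables $answer and $doubled are additions made here for illustration.

```perl
use strict;
use warnings;

sub factorial {
    my ($n, $cont) = @_;
    if ($n == 0) {
        $cont->(1);
    } else {
        factorial($n - 1, sub { $cont->($n * shift) });
    }
}

my $answer;
factorial(5, sub { $answer = shift });         # store the result

my $doubled;
factorial(3, sub { $doubled = 2 * shift });    # carry on computing with it

print "$answer $doubled\n";    # prints "120 12"
```

Nothing inside factorial() had to change between the two calls: the "rest of the computation" is entirely in the hands of whoever supplies $cont.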
Still sticking with our cps factorial(), there is more that we can do. Because in cps no function
ever returns, all function calls must be in tail position! (See footnote 4.) As you can see our recursive
call to factorial() is now in tail position, so we can use tco to remove the spurious use of stack:
sub factorial {
    my ($n, $cont) = @_;
    if ($n == 0) {
        $cont->(1);
    } else {
        @_ = ($n - 1, sub { $cont->($n * shift) });
        goto \&factorial;
    }
}
This is still a “recursive” definition of factorial(), but now it is not the stack which is growing, but
the continuation itself which consumes more and more space as our computation proceeds.
An astute reader will have realised that, in fact, we are still using stack when the continuations
actually get triggered: those calls to $cont->($n * shift) will of course use just as much stack as the
original did. However note that the continuations themselves must be called in tail position, so with a
little more work we can eliminate that stack overhead too:
sub factorial {
    my ($n, $cont) = @_;
    if ($n == 0) {
        @_ = (1);
        goto $cont;
    } else {
        @_ = ($n - 1, sub { @_ = ($n * shift); goto $cont; });
        goto \&factorial;
    }
}
Footnote 4: It’s obvious really: Since no cps function ever returns, any deferred computation must have
been moved into the continuation, and a function call without deferred computation is by definition in
tail position.
This is very messy, but it works as advertised: it consumes absolutely no stack at any point; all deferred
computations are in the continuations. Just bear in mind that in a language that provided implicit tco,
we wouldn’t need any of those assignments to @_ or the gotos, and I’ve promised that continuations
themselves will allow an alternative and cleaner solution in our interpreter.
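The difference between calling and going to the continuations can be measured with the same caller()-based trick as before. This is an illustrative sketch only: stack_depth(), factorial_calls() and factorial_gotos() are names chosen here, and the depth is reported from inside the final continuation, relative to the depth at which the measurement starts.

```perl
use strict;
use warnings;

# Number of subroutine frames on Perl's call stack, excluding this one.
sub stack_depth {
    my $depth = 0;
    $depth++ while caller($depth + 1);
    return $depth;
}

# cps factorial using ordinary calls throughout.
sub factorial_calls {
    my ($n, $cont) = @_;
    if ($n == 0) {
        $cont->(1);
    } else {
        factorial_calls($n - 1, sub { $cont->($n * shift) });
    }
}

# cps factorial where every call has been replaced by a goto.
sub factorial_gotos {
    my ($n, $cont) = @_;
    if ($n == 0) {
        @_ = (1);
        goto $cont;
    } else {
        @_ = ($n - 1, sub { @_ = ($n * shift); goto $cont; });
        goto \&factorial_gotos;
    }
}

my $base = stack_depth();
my (%result, %extra);

factorial_calls(5, sub {
    $result{calls} = shift;
    $extra{calls}  = stack_depth() - $base;
});
factorial_gotos(5, sub {
    $result{gotos} = shift;
    $extra{gotos}  = stack_depth() - $base;
});

print "$_: $result{$_} with $extra{$_} frames\n" for qw(calls gotos);
```

With ordinary calls the final continuation runs twelve frames deep for an argument of 5 (six calls to the function, five intermediate continuations, and itself). With gotos it runs in a single frame, whatever the argument.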
Moving on, what about that iterative/recursive definition of factorial() with a helper function
from Section 13.1 on page 161? We can re-write that in cps too. How does it compare? Well first here’s
a non-cps variation on the original again, thoroughly tco’d this time:
sub factorial {
    my ($n) = @_;
    @_ = ($n, 1);
    goto \&factorial_helper;
}
sub factorial_helper {
    my ($n, $result) = @_;
    if ($n == 0) {
        return ($result);
    } else {
        @_ = ($n - 1, $n * $result);
        goto \&factorial_helper;
    }
}
And here is the same pair of functions in cps:
sub factorial {
    my ($n, $cont) = @_;
    @_ = ($n, 1, $cont);
    goto \&factorial_helper;
}
sub factorial_helper {
    my ($n, $result, $cont) = @_;
    if ($n == 0) {
        @_ = ($result);
        goto $cont;
    } else {
        @_ = ($n - 1, $n * $result, $cont);
        goto \&factorial_helper;
    }
}
Our new tail recursive cps factorial() function takes an additional continuation argument and passes
that to factorial_helper(). factorial_helper() either goes to the continuation with the result, or
goes to itself with new values for $n and $result; but since it has no deferred computation, it does not
need to construct a new continuation and just passes the existing continuation to the recursive call.
The take home message here is that this tail recursive definition of factorial() using
factorial_helper() translates into a cps where neither the stack nor the continuation grows. This is a general
result: functions written to be tail-recursive consume no stack when tco’d, and do not build new
continuations when rewritten into cps.
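One way to see this result concretely is to record which continuation is finally invoked when $n reaches zero. The sketch below uses plain calls rather than gotos to keep it short, and the names factorial_deferred(), factorial_accumulating() and %at_base are invented for the purpose.

```perl
use strict;
use warnings;

my %at_base;    # the continuation each variant holds when $n reaches 0

# Deferred multiplication: each step wraps the continuation in a new closure.
sub factorial_deferred {
    my ($n, $cont) = @_;
    if ($n == 0) {
        $at_base{deferred} = $cont;
        $cont->(1);
    } else {
        factorial_deferred($n - 1, sub { $cont->($n * shift) });
    }
}

# Accumulator version: the continuation is passed along untouched.
sub factorial_accumulating {
    my ($n, $result, $cont) = @_;
    if ($n == 0) {
        $at_base{accumulating} = $cont;
        $cont->($result);
    } else {
        factorial_accumulating($n - 1, $n * $result, $cont);
    }
}

my @results;
my $k = sub { push @results, shift };

factorial_deferred(4, $k);
factorial_accumulating(4, 1, $k);

print "results: @results\n";
for my $variant (qw(deferred accumulating)) {
    my $same = ($at_base{$variant} == $k) ? 'yes' : 'no';
    print "$variant reaches the base case holding the original continuation: $same\n";
}
```

Both variants deliver 24, but only the accumulating one arrives at the base case still holding the very continuation it was first given; the deferred one has wrapped it four times on the way.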
The “normal” way, or at least the easiest way to produce cps code is to do what we did above: take
non-cps code and translate it into cps. In the next section we’re going to look at a few examples of
simple, hypothetical function forms and how they translate into cps.
• The simplest kind of function is one that takes no arguments and returns a constant:
sub A {
    return 'hello';
}
The cps form of this takes a continuation as argument and calls the continuation on the constant:
sub A {
    my ($ret) = @_;
    $ret->('hello');
}
I’ve called the continuation $ret instead of $cont to emphasize its equivalence with a return
statement. One of the guiding principles of converting to cps is that calling the argument
continuation in cps is equivalent to doing a return in non-cps. In fact you can mentally substitute
return(...) for $ret->(...) in many of these examples without disturbing the sense of them.
• The next simplest form is a function that takes arguments and performs only primitive operations
such as addition on them, returning the result:
sub A {
    my ($x, $y) = @_;
    return $x + $y;
}
In this case, since primitive operations can’t take a continuation, and because they are “terminal”
operations that won’t run away off up the stack, we again can just call the continuation on the
result:
sub A {
    my ($x, $y, $ret) = @_;
    $ret->($x + $y);
}
• Next come simple functions that just call another function without any deferred computation:
sub A {
    my ($x) = @_;
    return B($x);
}
Here, since there is no deferred computation, there need be no new continuation; we just pass the
existing continuation to the called function:
sub A {
    my ($x, $ret) = @_;
    B($x, $ret);
}
• Now we’re starting to get into areas where there is deferred computation, and this is where it starts
to get just a little bit tricky:
sub A {
    return C(B());
}
B() gets called first, and the value it returns is passed as argument to C(). In cps B() would never
return so we must also call it first, passing it a new continuation that calls C() with B()’s result
and the current continuation:
sub A {
    my ($ret) = @_;
    B(
        sub {
            my ($B_result) = @_;
            C($B_result, $ret);
        }
    );
}
The new continuation calls C() with two arguments: the result of the call to B(), and the original
continuation $ret to which C() should return its result. Since the original call to C() was returned
as the result of the call to A(), C() is being told “return your result here”.
• Next, a sequence of calls made one after another:
sub A {
    B();
    C();
    D();
}
Here we must construct a nest of continuations to ensure that C() and D() get called in the correct
order after B():
sub A {
    my ($ret) = @_;
    B(
        sub {
            C(
                sub {
                    D($ret);
                }
            );
        }
    );
}
We call B() with a continuation that will call C() with a continuation that will call D() with the
original continuation $ret since the result of the call to D() was the result of the original call to
A(). Again this is just saying to D() “return your result here.”
• Conditional expressions come next:
sub A {
    my ($x) = @_;
    if (B($x)) {
        C($x);
    } else {
        D($x);
    }
}
The call to B() in the condition will never return, so we must pass it a continuation that tests its
result and decides which branch to take accordingly, passing the original continuation to the
chosen branch:
sub A {
    my ($x, $ret) = @_;
    B($x,
        sub {
            my ($B_result) = @_;
            if ($B_result) {
                C($x, $ret);
            } else {
                D($x, $ret);
            }
        }
    );
}
Both the true and the false branch used to make a single call in tail position to C() or D(), so now
we simply pass the original continuation unchanged as an additional argument to C() or D().
• Finally, loops:
sub A {
    my $i = 0;
    while ($i < 10) {
        B();
        ++$i;
    }
}
This needs a bit more thought. It turns out to be easiest to do a preliminary rewrite of this example
into a recursive form as follows:
sub A {
    A_h(0);
}
sub A_h {
    my ($i) = @_;
    if ($i < 10) {
        B();
        A_h($i + 1);
    }
}
Turning that into cps then becomes just a re-application of examples we’ve seen before:
sub A {
    my ($ret) = @_;
    A_h(0, $ret);
}
sub A_h {
    my ($i, $ret) = @_;
    if ($i < 10) {
        B(
            sub { A_h($i + 1, $ret); }
        );
    } else {
        $ret->();    # loop finished: "return" through the continuation
    }
}
A() calls A_h() with the continuation unchanged (A_h(), return your result here). Since B() will
not return, it is passed a continuation that carries on the recursion on A_h(), passing the original
continuation $ret. When the loop condition finally fails there is nothing left to do but call $ret.
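As a quick check of the loop transformation, here is a minimal runnable sketch. The body of B(), the @calls log and the final continuation are assumptions made purely for illustration: B() records that it was called and then invokes its continuation. A_h() calls $ret once the loop condition fails, so that whatever follows the call to A() still gets to run.

```perl
use strict;
use warnings;

my @calls;    # assumption: B() just records each call here

sub B {
    my ($ret) = @_;
    push @calls, scalar @calls;
    $ret->();
}

sub A {
    my ($ret) = @_;
    A_h(0, $ret);
}

sub A_h {
    my ($i, $ret) = @_;
    if ($i < 10) {
        B(sub { A_h($i + 1, $ret) });
    } else {
        $ret->();    # loop finished: "return" to the caller of A()
    }
}

my $finished = 0;
A(sub { $finished = 1 });
print scalar(@calls), " calls to B, finished = $finished\n";
```

Without tco or a trampoline each trip round the loop still adds Perl stack frames, which is harmless for ten iterations.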
The examples above give a taste of the sorts of transformations that we shall be applying to our interpreter
soon. There are other more difficult cases that might appear impossible at first sight (uses of map, for
example), but again they can be resolved by first re-writing the expressions in a more tractable form
before converting to cps. We’ll see examples of this sort of thing when we get to them.
It happens that there does exist a formal methodology for transforming statements in any language
capable of supporting cps into cps. The above example transformations are samples from that ruleset.
All such transformations can be automated. When I started this chapter I was hopeful that perhaps
something in the B package, the Perl compiler, would be available that could perform the transform but
that appears not to be the case. Anyhow we’ll learn a lot more about cps by performing the transform
manually, so that is the best approach to take.
3. If you were to return something it would be guaranteed to return all the way down the stack to
the originating caller[5].
Now just suppose that at well chosen points we do return something, and not just anything. Suppose
we return another continuation, this time taking no arguments, that when called just continues the
calculation from where it left off!
That is one of the surprising things about continuations, that they are completely self-contained and
require no external context to operate. You may need to convince yourself that this will work: since we
can tco a cps function, such that it uses no Perl stack at all, then even if the cps code is not tco’d
there can be nothing on the Perl stack that it actually needs, just a long chain of return addresses that
it passes through after the computation is finished. Returning a continuation like this merely interleaves
this otherwise laborious chain of returns with the normal flow of control up the stack.
So how do we deal with this returned continuation? A handler routine, called a trampoline, starts
off by being called with a continuation of no arguments. It loops, repeatedly calling the continuation
and assigning the result (another continuation of no arguments) back to the continuation itself until the
result is undef. The code is easier to write than to describe:
sub trampoline {
    my ($cont) = @_;
    $cont = $cont->() while defined $cont;
}
To give you a feel of how this might work, let’s return once more to our cps factorial() function and
re-write it to make use of a trampoline instead of tco. First to refresh your memory here’s our first cps
attempt again (slightly modified) before we tco’d it:
sub factorial {
    my ($n, $ret) = @_;
    if ($n == 0) {
        $ret->(1);
    } else {
        factorial($n - 1,
            sub {
                my ($a) = @_;
                $ret->($n * $a);
            });
    }
}
And here it is again, rewritten to return closures to a trampoline:
sub factorial {
    my ($n, $ret) = @_;
    if ($n == 0) {
        return sub { $ret->(1); };
    } else {
        return sub {
            factorial($n - 1,
                sub {
                    my ($a) = @_;
                    return sub { $ret->($n * $a) };
                });
        };
    }
}
sub trampoline {
    my ($cont) = @_;
    $cont = $cont->() while defined $cont;
}
trampoline(
    sub {
        factorial(5, sub { print shift, "\n"; return undef; });
    }
); # still prints 120
Changes from the original are in bold as usual. The key to understanding this is to notice that whenever
a function call was done in the original, either to factorial() or to the continuation, a closure which
will make that call is returned to the trampoline instead. Each time this happens the stack is completely
cleared down and the trampoline resumes the computation by calling the returned closure. Finally, at
the end of the computation, the original continuation passed to factorial() gets invoked, printing the
result and returning undef to the trampoline causing it to stop.
Like tco, the trampoline technique is not specific to cps, but both techniques require that the
modified calls be in tail position, making cps a prime candidate for either kind of optimisation[6].
Well that’s pretty scary stuff. Both tco and the trampoline are simply alternative strategies to avoid
unlimited use of the stack, and you may be wondering if the trampoline has any advantages over tco at
this point. I’d like to make a few arguments in favour of the trampoline here.
1. Our factorial() example is a very tight piece of code which somewhat overemphasizes the role of
the trampoline by doing a lot with it in a small space. In particular, the explicit return of a closure
to make the recursive call does not have to be done for every tail call; we just need to ensure it
happens fairly regularly on our way up the stack. For example in a set of mutually tail recursive
subroutines, A() calling B() calling C() calling A()..., only one of those subroutines need do that
return. This is in contrast to tco, where any unoptimised tail call constitutes a permanently
unclaimed stack frame.
2. Some languages do not allow the possibility of doing tco, so any cps implementation using such
a language would have to use a trampoline.
3. We can hide the trampoline from client cps code by representing continuations as objects which
contain the closures, and putting the return to the trampoline in the method that invokes the
closure (provided that method is invoked in tail position).
[6] The trampoline further requires that all intermediate calls also be in tail position, and that the values of all tail calls
are returned, so that a returned value would be guaranteed to reach the trampoline. Fortunately, cps satisfies the first of
these requirements, and Perl satisfies the second.
It is the third argument that swings the case, and that’s exactly what we’ll be doing. If you don’t get
that argument yet, hold on and it will be made clear later.
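The first argument can be illustrated with a minimal sketch. The pair of mutually tail recursive cps predicates below is an assumption invented for the purpose (is_even(), is_odd() and the $thunks counter do not appear in the interpreter). Only is_odd() returns a closure to the trampoline; is_even() makes an ordinary tail call, yet the Perl stack never gets more than a few frames deep.

```perl
use strict;
use warnings;

my $thunks = 0;    # how many closures were handed back to the trampoline

# Hypothetical cps predicates; $ret is the continuation.
sub is_even {
    my ($n, $ret) = @_;
    return $ret->(1) if $n == 0;
    return is_odd($n - 1, $ret);             # plain tail call, no bounce
}

sub is_odd {
    my ($n, $ret) = @_;
    return $ret->(0) if $n == 0;
    $thunks++;
    return sub { is_even($n - 1, $ret) };    # bounce: unwind to the trampoline
}

sub trampoline {
    my ($cont) = @_;
    $cont = $cont->() while defined $cont;
}

my $answer;
trampoline(sub { is_even(100_000, sub { $answer = shift; return undef; }) });
print "100000 is ", ($answer ? "even" : "odd"), " after $thunks bounces\n";
```

Every call is in tail position and every value is returned, which is what lets the closure produced by is_odd() fall all the way back to the trampoline.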
13.5 Using cps
[7] This is what is meant by “Stackless Python”: an implementation of that language in cps with complete tco.
sub A {
    print "in A\n";
    B();
    print "back in A\n";
}
sub B {
    print " in B\n";
    C();
    print " back in B\n";
}
sub C {
    print " in C\n";
}
sub X {
    print "in X\n";
    Y();
    print "back in X\n";
}
sub Y {
    print " in Y\n";
    Z();
    print " back in Y\n";
}
sub Z {
    print " in Z\n";
}
A();
X();
A() calls B() which calls C(), and X() calls Y() which calls Z(). The top level calls A() then X(). You
shouldn’t take too long to convince yourself that it will produce the following output:
in A
in B
in C
back in B
back in A
in X
in Y
in Z
back in Y
back in X
Just to hammer home the simple point, Figure 13.4 on the facing page shows the thread of control flow
passing through A(), B(), C(), X(), Y() and Z().
[Figure 13.4: the thread of control through A(), B(), C(), X(), Y() and Z().]
Now let’s re-write that program into cps, without changing any of its behaviour:
sub A {
    my ($ret) = @_;
    print "in A\n";
    B(sub { $ret->(print "back in A\n") });
}
sub B {
    my ($ret) = @_;
    print " in B\n";
    C(sub { $ret->(print " back in B\n") });
}
sub C {
    my ($ret) = @_;
    $ret->(print " in C\n");
}
sub X {
    my ($ret) = @_;
    print "in X\n";
    Y(sub { $ret->(print "back in X\n") });
}
sub Y {
    my ($ret) = @_;
    print " in Y\n";
    Z(sub { $ret->(print " back in Y\n") });
}
sub Z {
    my ($ret) = @_;
    $ret->(print " in Z\n");
}
There are no new tricks that haven’t already been described in Section 13.3 on page 172 above; the only
difference is that since none of the original functions actually returned anything interesting (they returned
the results of print statements), the equivalent continuations don’t bother looking at their arguments.
This produces identical output to the original program, and exhibits exactly the same control flow.
Now let’s make just three tiny changes.
my $C_ret;
sub A {
    my ($ret) = @_;
    print "in A\n";
    B(sub { $ret->(print "back in A\n") });
}
sub B {
    my ($ret) = @_;
    print " in B\n";
    C(sub { $ret->(print " back in B\n") });
}
sub C {
    my ($ret) = @_;
    $C_ret = $ret;
    $ret->(print " in C\n");
}
sub X {
    my ($ret) = @_;
    print "in X\n";
    Y(sub { $ret->(print "back in X\n") });
}
sub Y {
    my ($ret) = @_;
    print " in Y\n";
    Z(sub { $ret->(print " back in Y\n") });
}
sub Z {
    my ($ret) = @_;
    $C_ret->(print " in Z\n");
}
The first change is to declare a $C_ret variable to hold a continuation. Then C(), before it calls its
continuation, stores it in this $C_ret variable. Finally Z(), instead of calling its own continuation $ret,
calls the saved continuation $C_ret instead.
This produces the output below. Whether or not you find this surprising will depend on how closely
you’ve been following the discussion:
in A
in B
in C
back in B
back in A
in X
in Y
in Z
back in B
back in A
in X
in Y
in Z
back in B
back in A
in X
in Y
in Z
back in B
back in A
...
All proceeds normally until we reach the first call to Z(). Since Z() calls the continuation that C() saved,
Z() instead of returning to X(), returns to B() instead. Then normal service is resumed, starting from
the return to B(), until the next return from Z(), which again returns to B() and so on, ad infinitum.
What we have achieved is the control flow shown in Figure 13.5 on the following page.
(Cue the Monty Python music.)
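To watch this happen without the program running forever, here is a self-contained sketch of the same idea. Several details are assumptions added for the demonstration: the top-level call (A() is given a continuation that calls X()), the record() helper that stands in for print, and the $max_visits limit after which Z() falls back to its own continuation so that the run can end.

```perl
use strict;
use warnings;

my @log;                # output lines, collected so they can be inspected
my $C_ret;              # the continuation saved by C()
my $visits = 0;         # how many times Z() has run
my $max_visits = 3;     # assumption: leave the loop on the third visit

# record() stands in for print; like print it returns a true value.
sub record { push @log, $_[0]; return 1; }

sub A { my ($ret) = @_; record("in A");  B(sub { $ret->(record("back in A")) }); }
sub B { my ($ret) = @_; record(" in B"); C(sub { $ret->(record(" back in B")) }); }
sub C { my ($ret) = @_; $C_ret = $ret;   $ret->(record(" in C")); }
sub X { my ($ret) = @_; record("in X");  Y(sub { $ret->(record("back in X")) }); }
sub Y { my ($ret) = @_; record(" in Y"); Z(sub { $ret->(record(" back in Y")) }); }

sub Z {
    my ($ret) = @_;
    record(" in Z");
    # Normally jump to the continuation saved by C(); on the last visit
    # use Z()'s own continuation so that the program can finish.
    my $next = ++$visits < $max_visits ? $C_ret : $ret;
    $next->(1);
}

# The continuation of A() is "call X()"; the continuation of X() ends the run.
A(sub { X(sub { record("done") }) });
print "$_\n" for @log;
```

The first two visits to Z() are followed by " back in B" rather than " back in Y"; only the third visit returns through Y() and X().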
If this still isn’t clear, which I suspect may be the case, look at Figure 13.6 on the next page. In this
figure I’ve “broken apart” the functions from their continuations. A() calls B() calls C() which calls the
continuation of B() (i.e. cB()) which calls the continuation of A() etc. Now the continuation of B() is
just “return to A()” (call cA()) and the continuation of A() is to call X() etc.
I’m deliberately down-playing the idea of “return” now; this really is just function calls, and in that case
Figure 13.7 on the following page shows that there is really nothing special about Z calling cB: it’s just
a recursive loop, and tco or a trampoline will take care of the stack for us.
This is what I meant by saying that cps is a simplification. It linearizes control flow, so that it is just
a straight line of function calls. Once you get that idea, a whole world of possibilities opens up. For
instance you can probably imagine at this stage that with a little more work, adding loops and passing
continuations around, we could easily arrive at a coroutine implementation, where control does jump
from the heart of one loop to the heart of another and back again without disturbing the state of either
loop.
[Figure 13.5: control flow through A(), B(), C(), X(), Y() and Z() when Z() calls the continuation saved by C().]
[Figure 13.6: the functions A, B, C, X, Y and Z shown apart from their continuations cA, cB, cX and cY.]
[Figure 13.7: the same calls laid out as a single line: A B C cB cA X Y Z cY cX.]
There is a big downside to writing in cps however, and that is that it makes your head hurt. A far
better approach is to use a language that has continuations built in “under the hood”. Then when you
write “return $val” you are really calling a continuation on $val, but you don’t have to worry about
it, and when you need to get hold of a continuation, you can ask for one. A language like that provides
continuations as first class objects in that they can be passed around as variables, much in the same way
as Perl provides anonymous subroutines (closures) as first class objects.
For example, if Perl had built-in continuations, and we could get at the current continuation by, say,
taking a reference to the return keyword[8], then we could rewrite all of this example without cps, as
follows:
my $cont;
sub A {
    print "in A\n";
    B();
    print "back in A\n";
}
sub B {
    print " in B\n";
    C();
    print " back in B\n";
}
sub C {
    $cont = \return;
    print " in C\n";
}
sub X {
    print "in X\n";
    Y();
    print "back in X\n";
}
sub Y {
    print " in Y\n";
    Z();
    print " back in Y\n";
}
sub Z {
    print " in Z\n";
    $cont->()
}
A();
X();
Bold text shows the differences from the original non-cps version.
[8] Thanks to Tom Christiansen for this idea.
We are going to turn our PScheme interpreter into just such a language. The next few sections will
describe the changes we need to make.
13.6 Implementation
Rather than attempting to rewrite the interpreter of Chapter 12 on page 135 from start to finish in
cps, we’re going to backtrack to our first “interesting” interpreter, from Chapter 5 on page 59, which
has only let and lambda, and re-implement that. This has the advantage that we get a real working
interpreter with continuations which we can test early on, and we can demonstrate some of the power of
continuations with it. Then I’ll gloss the re-writing of the final interpreter in stages by working through
the intermediate versions pausing only to study any previously unencountered constructs that require
novel treatment. Finally we’ll have a continuation-passing version of the interpreter from Chapter 12 on
page 135 to play with.
In pseudocode, the top-level read-eval-print loop of that interpreter looks like this:
sub repl {
    my ($reader, $outfh) = @_;
    while (my $expr = $reader->read()) {
        my $result = $expr->Eval(new env);
        $result->Print($outfh);
    }
}
We’ve already seen in Section 13.3 on page 172 that the easiest way to transform a while loop into cps
is first to rewrite it into a recursive form, and this is easy to do here:
sub repl {
    my ($reader, $outfh) = @_;
    if (my $expr = $reader->read()) {
        my $result = $expr->Eval(new env);
        $result->Print($outfh);
        repl($reader, $outfh);
    }
}
Now to recast that into cps is fairly trivial, especially if we remember that the reader
PScm::Read::Read() already returns undef on eof, and it can continue to do so, telling the trampoline to stop, and
only calling its continuation if there is something to evaluate.
sub repl {
    my ($reader, $outfh, $ret) = @_;
    $reader->read(
        sub {
            my ($expr) = @_;
            $expr->Eval(
                new env,
                sub {
                    my ($result) = @_;
                    $result->Print(
                        $outfh,
                        sub { repl($reader, $outfh, $ret) }
                    )
                }
            )
        }
    )
}
So apart from the return of undef by the reader, where would we put these return statements that
return a continuation to the trampoline? Well as I’ve said we could place them throughout the code,
but there’s a better idea.
Instead of continuations being simple anonymous subroutines, we make them into objects that contain
those anonymous subroutines, with a Cont() method to invoke the underlying closure. Then instead of
just writing:
$ret->($arg);
we write:
$ret->Cont($arg);
Now if, instead of defining Cont() to call the closure directly:
sub Cont {
    my ($self, $arg) = @_;
    $self->{cont}->($arg);
}
we instead write
sub Cont {
    my ($self, $arg) = @_;
    return sub { $self->{cont}->($arg) };
}
we have both effected the return of a continuation to the trampoline, and completely hidden the fact
from the client code[9]!
In reality there are a few minor complications with this approach, but the above discussion is very
close to our final implementation.
It takes an anonymous subroutine as argument and stores it in a cont field. We don’t want to be writing
new PScm::Continuation( sub {...} ) all over the place, so we sweeten things with a little syntactic
sugar:
It’s functionally equivalent to the prototype trampoline() subroutine discussed above. The Bounce()
method is defined in PScm::Continuation to immediately invoke the continuation with no arguments:
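As a rough, self-contained sketch of how these pieces might fit together (not the book's listing): the package name My::Continuation, the glob assignment that stands in for exporting cont, and the factorial() driver are all assumptions made for illustration. Here Cont() wraps the pending call in a new continuation object rather than a bare closure, so that the trampoline can always call Bounce().

```perl
use strict;
use warnings;

package My::Continuation;    # hypothetical stand-in for PScm::Continuation

sub new {
    my ($class, $cont) = @_;
    return bless { cont => $cont }, $class;
}

# Syntactic sugar: "cont { ... }" instead of "My::Continuation->new(sub { ... })".
sub cont(&) {
    my ($code) = @_;
    return __PACKAGE__->new($code);
}

# What client code calls. Rather than invoking the closure, hand back a
# continuation of no arguments for the trampoline to run.
sub Cont {
    my ($self, $arg) = @_;
    return cont { $self->{cont}->($arg) };
}

# What the trampoline calls: invoke the closure now, with no arguments.
sub Bounce {
    my ($self) = @_;
    return $self->{cont}->();
}

package main;

BEGIN { *cont = \&My::Continuation::cont }    # a real module would export this

sub trampoline {
    my ($cont) = @_;
    $cont = $cont->Bounce() while defined $cont;
}

sub factorial {
    my ($n, $ret) = @_;
    return $ret->Cont(1) if $n == 0;
    return factorial($n - 1, cont { my ($value) = @_; $ret->Cont($n * $value) });
}

my $result;
trampoline(cont { factorial(5, cont { $result = shift; return undef }) });
print "$result\n";
```

The client code in factorial() only ever says $ret->Cont(...); nothing in it mentions the trampoline.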
[9] Assuming of course that the Cont() method is invoked in tail position, but we’ve been here before.
[10] In general it is always considered better form to use @EXPORT_OK rather than @EXPORT. However it is justifiable here
firstly because PScm::Continuation is part of PScheme, not a standalone library module, and secondly the only reason
another class would use PScm::Continuation would be to gain access to the cont construct.
It’s the same as we’ve seen before up to Line 37 where instead of entering its loop, it invokes the
trampoline with a continuation. That continuation invokes a new helper routine repl() with the reader
and the current output handle as arguments. Here’s repl().
So the guts of the old ReadEvalPrint() have been moved to repl(). It’s just an expansion of the cps
pseudocode for repl() in the previous section, and not nearly as bad as it might first appear: it’s really
just Read() calling Eval() calling Print() calling repl(), all through passed continuations.
There is also something new added to the environment. We’ll see what that new binding call/cc on
Line 53 is about later.
So the Read(), Eval() and Print() methods now all take continuations and must be modified
accordingly. Thankfully the modifications to Read() and Print() are trivial.
First we need to look at the cps Print() method.
It just does what it used to do, then calls its continuation with an arbitrary argument. That is the
continuation that will restart the repl and it doesn’t actually expect an argument, but Cont() does so
we’re just playing nice.
Notice that on Line 77 we don’t pass a continuation to the as_string() method. This is just a
normal non-cps method call. The reasoning behind that is that although as_string() is potentially
recursive, at no point will it cause evaluation of any PScheme expressions. Since we are only interested
in continuations that might be exposed to user code, we can classify any method call that cannot result
in a call to Eval() as a simple expression and deal with it as an atomic operation. Contrariwise, calls to
Eval() or calls to methods that might result in a call to Eval() are classified as significant expressions,
and must be rewritten into cps. This distinction makes our rewrite much simpler[11].
As described above, that Cont() method actually returns a continuation of zero arguments which the
trampoline will execute (by calling Bounce() on it). This is the trick I was enthusing about earlier: to
return a continuation to the trampoline that will call the current continuation, rather than just directly
calling the current continuation. The return will fall all the way back to the trampoline, effecting a
complete cleardown of whatever stack might have accumulated up to this point, then the trampoline will
kick things off again:
The really neat thing about this is that the code that is written to use this method neither knows
nor cares that the continuation is not simply being invoked directly at this point. The presence of the
trampoline is completely invisible to the client cps code.
Let’s take a look at Read() in PScm::Read next. In fact what we have done is to rename Read()
to read(), leaving it otherwise unchanged:
[11] The reasoning is that each call to Eval() within the interpreter corresponds to a value being calculated and, most
importantly, returned in the PScheme language. These points of return are exactly the points that require continuations
to be used instead. If however in the later cps rewrite of the object system from Chapter 12 on page 135 we wanted to
allow PScheme objects to supply some sort of to-string method, and have that called in preference to the underlying Perl
as_string() method, then we would have to rewrite as_string() into cps.
Read() collects the result of the call to read(), and if it is undef signifying eof it returns undef to the
trampoline telling it to stop. Otherwise it calls its continuation on the result.
Next we need to take a look at Eval().
All Eval() methods now also take an additional continuation as argument. All the Eval() methods
are in subclasses of PScm::Expr. Let’s start by looking at the simplest of those expressions: literals
and symbols.
The old Eval() method in PScm::Expr just returned $self (numbers, strings and anything else by
default evaluate to themselves). The new version is little different, it calls its argument continuation on
itself:
Evaluation of lists is a little more tricky, so to refresh our memories here’s the original
PScm::Expr::List::Eval() before cps transformation:
On Line 64 it evaluates the first component of the list to get the operator $op, then on Line 65 it applies
the operation $op to the rest of the unevaluated list.
Here’s the cps form:
Then try
continuation as if it were the body of the function. It would still work, but might run out of stack in the
long run.
Secondly inside the method proper we assume that the first call to Eval(), in order to get the
$op, will not return, so we pass it a continuation which accepts the result $op, and applies it to the rest
of the list, passing in the original continuation (Line 70). We must pass the original continuation $cont
to Apply(), rather than just calling the continuation on the result of the Apply(), because the Apply()
might make calls to Eval() to evaluate arguments to the $op, among other things, and must therefore
be rewritten in cps.
So that’s it for the rewrite of all of the Eval() methods in PScm::Expr. Now we need to follow the
chain of continuation passing into the various Apply() methods we have. Since this is an early version
of the interpreter, there aren’t too many, in fact they are in:
• PScm::Primitive;
• PScm::SpecialForm::Let;
• PScm::SpecialForm::If;
• PScm::SpecialForm::Lambda and
• PScm::Closure::Function.
Starting with PScm::Primitive::Apply(), you’ll remember that all primitive operations share a common
Apply() method. Now individual primitives do not have to accept continuations because they are
terminal operations, so all that we have to do is to call the continuation that was passed to the shared
primitive Apply() on the result of applying the individual primitive to its arguments.
Unfortunately this is complicated by the fact that the primitive Apply() must first evaluate its
arguments. The original primitive Apply() did it with map:
This is a little tricky to rewrite in cps, so we’re going to attack it in stages. Stage one will be to
write a recursive version of the builtin map, which instead of taking a sub and list, takes a listref and an
environment, and for each element of the listref, calls that element’s Eval() method with the environment
as argument, accumulating the result in a new listref. But wait a minute, don’t we already have such
a recursive map_eval() method? Yes, we wrote just such a method when we implemented true list
processing for version 0.0.5 back in Section 8.5.1 on page 92.
> (factorial 170)
7257415615307998967396728211129263114716991681296451376543577798900561
8434017061578523507492426174595114909912378385207766660225654427530253
2890077320751090240043028005829560396661259965825710439855829425756896
6313439612262571094946806711205568880457193340212661452800000000000000
000000000000000000000000000
Perl would have complained long before it hit that many levels of recursion.
Here’s that method again.
Now remember that that is code from 0.0.5, and here we’re just rewriting version 0.0.2, so we don’t have
true list processing yet: lots of our methods are still expecting Perl array references, we don’t have a
Cons() method, and we don’t have any PScm::Expr::List::Null class. Nonetheless we can cast this
method back into 0.0.2 terms quite easily.
This 0.0.2 map_eval() method is not yet in cps form:
sub map_eval {
    my ($self, $env) = @_;
    if (@$self) {
        return [ $self->first->Eval($env),
                 @{ $self->rest->map_eval($env) } ];
    } else {
        return [];
    }
}
090 }
091 );
092 }
093 );
094 } else {
095 $cont->Cont([]);
096 }
097 }
This is the trickiest piece of code in the entire cps re-write. Fortunately having done it, it is useful in a
number of other scenarios. Now that we have map_eval() we can use it to re-write
PScm::Primitive::Apply():
Not too bad. The map_eval() is passed a continuation that applies the primitive operation to the
evaluated arguments and calls the original argument continuation on the result.
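The shape of both pieces can be sketched in a few lines of standalone Perl. This is an illustration under stated assumptions, not the interpreter's code: expressions are modelled as plain code refs that take an environment and a continuation (standing in for Eval()), continuations are bare closures called directly rather than objects with a Cont() method, and literal(), variable(), multiply() and apply_primitive() are names invented here.

```perl
use strict;
use warnings;

# Stand-ins (assumptions): an "expression" is a code ref that takes an
# environment and a continuation and passes its value to the continuation.
sub literal  { my ($v) = @_;    return sub { my ($env, $cont) = @_; $cont->($v) }; }
sub variable { my ($name) = @_; return sub { my ($env, $cont) = @_; $cont->($env->{$name}) }; }

# cps map_eval: evaluate each expression in turn, collecting the values
# in an array ref which is finally passed to $cont.
sub map_eval {
    my ($exprs, $env, $cont) = @_;
    if (@$exprs) {
        my ($first, @rest) = @$exprs;
        return $first->(
            $env,
            sub {
                my ($value) = @_;
                map_eval(
                    \@rest, $env,
                    sub {
                        my ($values) = @_;
                        $cont->([ $value, @$values ]);
                    }
                );
            }
        );
    } else {
        return $cont->([]);
    }
}

# A primitive needs no continuation of its own: it is a terminal operation.
sub multiply { my $product = 1; $product *= $_ for @_; return $product; }

# Shared apply: evaluate the arguments in cps, then call $cont on the result.
sub apply_primitive {
    my ($primitive, $args, $env, $cont) = @_;
    return map_eval($args, $env,
        sub { my ($values) = @_; $cont->($primitive->(@$values)) });
}

my $result;
apply_primitive(\&multiply, [ literal(2), variable('x'), literal(7) ], { x => 3 },
    sub { $result = shift });
print "$result\n";
```

The nesting is the point: the continuation for the first element starts the evaluation of the rest, and the continuation for the rest conses the first value back on before calling the original continuation.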
It is worth noting again that there was no need to pass any continuation to the individual private
_apply() methods for each primitive, so PScm::Primitive::Multiply etc. are unchanged.
The rest of the cps transformations are much simpler, on the whole, and others that require the
rewriting of map can make use of map_eval().
Next up is PScm::SpecialForm::Let; here are the changes:
027 \@values,
028 cont {
029 my ($newenv) = @_;
030 $body->Eval($newenv, $cont);
031 }
032 );
033 }
Since we know that the call to $env->Extend() will not return (those @values are still to be evaluated),
we instead have to pass a continuation that will accept the resulting extended environment and evaluate
the body in it. We have already dealt with all the Eval() methods (they’re all in PScm::Expr) and
they all take a continuation, so we pass the original continuation argument, since the Eval() is the
expression that this Apply() was previously returning.
Remembering to add PScm::Env::Extend() to our list of methods that will need looking at, we
proceed to PScm::SpecialForm::If::Apply(). We’ve already discussed how to transform a conditional
expression into cps form, but since this is our first encounter in the wild, let’s refresh our memory by
first looking at the original non-cps version:
It evaluates the condition in the current env, and calls isTrue() on the result, then uses that to decide
whether to evaluate the true branch or the false branch, both in the current environment.
Our cps version is not that different:
It evaluates the condition in the current environment and passes a continuation that will accept the
result. That continuation calls isTrue() on the result and uses that to decide, in exactly the same way,
whether to evaluate the true branch or the false branch. In either case the original continuation that
was argument to PScm::SpecialForm::If::Apply() is passed to the chosen branch’s Eval() method.
Staying with the program, our next task is the invocation of lambda handled by
PScm::SpecialForm::Lambda::Apply():
There’s nothing very interesting here. lambda just creates a closure. There are no calls to Eval() that
it must make during this creation, so we can treat the call to new as a simple expression and invoke our
argument continuation on the result.
That just leaves PScm::Closure::Function::Apply() and PScm::Env::Extend(). Let’s start with
PScm::Closure::Function. The original just mapped Eval() over its arguments then called a private
_apply() method on the results:
When the function invokes the continuation as a function, control returns to the call/cc and the
argument to the continuation becomes the result of the call to call/cc.
If the previous example doesn’t seem too exciting, how about this:
> (call/cc
> (lambda (cont)
> (if (cont 10)
> 20
> 30)))
10
Here the call to (cont 10) produced an immediate return of the value 10 through the call/cc even
though it was executed in the conditional position of an if statement.
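The same behaviour can be mimicked directly in cps Perl. The sketch below is only an illustration: call_cc() and cps_if() are hypothetical helpers written for this example, and every function takes its continuation as its last argument. Invoking the escape function discards whatever continuation is current at that moment and resumes the one captured by call_cc().

```perl
use strict;
use warnings;

# call_cc: call $f with the current continuation reified as a function.
# Calling that function abandons the continuation current at the time
# ($ignored) and resumes the one captured here ($ret).
sub call_cc {
    my ($f, $ret) = @_;
    my $escape = sub {
        my ($value, $ignored) = @_;
        return $ret->($value);
    };
    return $f->($escape, $ret);
}

# cps conditional: evaluate the test, then pick a branch.
sub cps_if {
    my ($test, $then, $else, $ret) = @_;
    return $test->(sub { my ($t) = @_; $t ? $then->($ret) : $else->($ret) });
}

my @results;
my $top = sub { push @results, $_[0]; return; };

# (call/cc (lambda (cont) (if (cont 10) 20 30)))
call_cc(
    sub {
        my ($cont, $ret) = @_;
        cps_if(
            sub { my ($k) = @_; $cont->(10, $k) },    # the test invokes cont
            sub { my ($k) = @_; $k->(20) },
            sub { my ($k) = @_; $k->(30) },
            $ret,
        );
    },
    $top,
);

# (call/cc (lambda (cont) 20)): cont is unused, so this is a normal return
call_cc(sub { my ($cont, $ret) = @_; $ret->(20) }, $top);

print "@results\n";
```

In the first call neither branch of the conditional is ever reached, because the continuation waiting for the test result is the one that gets thrown away.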
These two examples only show control passing down the “stack” when a continuation is invoked.
However it is perfectly reasonable for control to return up the stack to a procedure that has already
returned. It is simply not easy to demonstrate with this version of the interpreter. Once we have an
interpreter with assignment and sequences, it becomes much easier.
call/cc is in fact a low-level, if not the lowest level continuation tool. It is possible to build higher
level control constructs using it. Abandoning PScheme for a moment, consider this Fibonacci[13] sequence
generator in some hypothetical Perl-like language that supports co-routines:
sub fib {
    my ($i, $j) = (0, 1);
    for (;;) {
        yield $i;
        ($i, $j) = ($j, $i + $j);
    }
}
That yield call not only behaves like a return statement, but also remembers the current state of the
function so that the next time the function is called control resumes where it last left off. With built in
continuations this sort of control flow is very easy to achieve.
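Real Perl has no yield, but the observable behaviour of fib() can be approximated with a closure that keeps $i and $j alive between calls. This is a different technique from the continuation-based one described here, shown only to make the expected sequence concrete; make_fib() is a hypothetical helper.

```perl
use strict;
use warnings;

# make_fib() returns an iterator closure. The captured ($i, $j) pair plays
# the part of the suspended state that yield would have remembered.
sub make_fib {
    my ($i, $j) = (0, 1);
    return sub {
        my $current = $i;
        ($i, $j) = ($j, $i + $j);
        return $current;
    };
}

my $fib = make_fib();
my @first = map { $fib->() } 1 .. 9;
print "@first\n";    # prints "0 1 1 2 3 5 8 13 21"
```

What the closure cannot do is suspend in the middle of an arbitrary loop body; that is exactly the part that needs continuations.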
Anyway I hope this has whetted your appetite a little for what call/cc can do, so let’s have a look at
its implementation.
It is of course a special form, and as usual it has an Apply() method:
[13] The Fibonacci series starts with 0 and 1. The next number in the series is always the sum of the previous two, e.g. 0,
1, 1, 2, 3, 5, 8, 13, 21...
It evaluates its first argument, which should result in a function of one argument, passing the Eval() a
continuation which will Apply() the function to a form explicitly containing the current continuation. It
also passes the current env and the current continuation a second time, this time as the normal implicit
argument.
That’s all there is to it. Of course the continuation itself will need an Apply() method so that it can
be invoked as an operator.
We’re now ready to see the whole of the PScm::Continuation package, in Listing 13.10.1 on
page 225.
We’ve already seen most of this, only the Apply() method is new.
Apply() on Lines 23-32 is another method that makes use of map_eval() to evaluate its arguments.
It passes it a continuation that calls itself on the first of its evaluated arguments, totally ignoring the
passed-in, current continuation, and effecting transfer of control to whatever context this continuation
represents.
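The key point, that an invoked continuation throws away whatever continuation is current, can be seen in miniature outside the interpreter. This is only a sketch using plain Perl closures and no trampoline; add_cps, mul_cps, call_cc_toy and demo are hypothetical names and are not the PScm methods.

```perl
use strict;
use warnings;

# Toy CPS helpers (hypothetical names, plain closures, no trampoline).
sub add_cps { my ($x, $y, $k) = @_; return $k->($x + $y) }
sub mul_cps { my ($x, $y, $k) = @_; return $k->($x * $y) }

# call_cc_toy passes $f an escape procedure. Like any CPS function the
# escape procedure receives a "current" continuation, but it ignores it
# and resumes the continuation that was captured instead.
sub call_cc_toy {
    my ($f, $k) = @_;
    my $escape = sub {
        my ($value, $ignored_k) = @_;
        return $k->($value);
    };
    return $f->($escape, $k);
}

# Roughly (+ 1 (call/cc (lambda (k) (* 10 (k 5)))))
sub demo {
    return call_cc_toy(
        sub {
            my ($escape, $k) = @_;
            # The pending multiplication is the continuation of (k 5).
            return $escape->(5, sub { my ($v) = @_; mul_cps(10, $v, $k) });
        },
        sub { my ($v) = @_; add_cps(1, $v, sub { $_[0] }) },
    );
}

print demo(), "\n";    # prints 6: the multiplication by 10 never happens
```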
And we’re done.
13.6. IMPLEMENTATION 201
The cps version calls a modified PScm::Env::ExtendRecursively(), passing a continuation that takes
the recursively extended environment and evaluates the body in it, passing the original continuation to
that Eval().
It uses a new helper map_bindings(), where the original eval_values() just used map. This works
in a similar way to map_eval(), evaluating each value in the environment but then assigning the result
back to the original binding:
Other methods are unchanged so we have completed the cps rewrite of letrec, and all tests for 0.0.3
still pass in 0.1.3.
Just like letrec (which called a modified PScm::Env::ExtendRecursively()), this calls a modified
ExtendIteratively(), passing a continuation that evaluates the body of the let* in the new environ-
ment with the original continuation.
Here are the modifications to PScm::Env::ExtendIteratively():
The old version just iterated over the name/value pairs, creating an additional nested environment each
time around the loop and returning the final result. cps is easier with recursive definitions so this
ExtendIteratively() has been recast as a recursive method. It still does the same job, but additionally
arranges that the original continuation gets called on the final, extended environment.
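To see why recursion suits cps here, consider a stripped-down sketch of the same shape. It assumes a toy environment made of hash frames and "expressions" that are just Perl subs taking an environment and a continuation; extend, lookup, extend_iteratively and demo_let_star are hypothetical stand-ins, not the PScm::Env methods.

```perl
use strict;
use warnings;

# Toy environment: a hash of bindings plus a link to the parent frame.
sub extend {
    my ($env, $name, $value) = @_;
    return { parent => $env, vars => { $name => $value } };
}

sub lookup {
    my ($env, $name) = @_;
    for (my $e = $env; $e; $e = $e->{parent}) {
        return $e->{vars}{$name} if exists $e->{vars}{$name};
    }
    die "no binding for $name\n";
}

# Each binding is [name, expr], where expr is a CPS sub taking ($env, $k).
# One frame is added per binding, and each expr sees the frames before it.
sub extend_iteratively {
    my ($env, $bindings, $k) = @_;
    return $k->($env) unless @$bindings;    # done: pass on the final env
    my ($first, @rest) = @$bindings;
    my ($name, $expr) = @$first;
    return $expr->($env, sub {
        my ($value) = @_;
        return extend_iteratively(extend($env, $name, $value), \@rest, $k);
    });
}

# Roughly (let* ((a 2) (b (* a 10))) (+ a b))
sub demo_let_star {
    return extend_iteratively(
        undef,
        [
            [ a => sub { my ($env, $k) = @_; $k->(2) } ],
            [ b => sub { my ($env, $k) = @_; $k->(lookup($env, 'a') * 10) } ],
        ],
        sub { my ($env) = @_; lookup($env, 'a') + lookup($env, 'b') },
    );
}

print demo_let_star(), "\n";    # prints 22
```

The loop counter of an iterative version has become the shrinking @rest list, and "the rest of the loop" is exactly the continuation handed to each expression.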
The original returned its first argument unevaluated, the cps form calls its continuation on it. Remember
that the quote system was re-written for a later version of the interpreter to support unquote back in
Section 9.2.2 on page 112, so we’ll be returning to quote later on, in Section 13.6.6 on page 208, where
we rewrite that rewrite!
The other additional functions: car, cdr, cons and list are all primitives that share an Apply()
method that has already been rewritten into cps in Section 13.6.2 on page 188.
Among the PScm::Expr classes, the only thing that changes is the map_eval() method. That
method was introduced in version 0.0.5 to work with pscheme lists, then re-introduced at an earlier stage
of the cps rewrite, in version 0.1.2, because we needed a recursive alternative to Perl’s map. Finally, here,
we combine the two implementations. Here’s PScm::Expr::List::Pair::map_eval():
And here’s the new default PScm::Expr::map_eval() that terminates the recursion of map_eval() if
$self is PScm::Expr::Null, does the right thing if the cdr of the list is not a list, and handles
continuations, all in one tiny method:
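As an illustration of the idea only (this is not the PScm code), here is a sketch of a cps map over a toy list representation. It assumes undef stands for the empty list, a two-element array reference stands for a pair, and that "doing the right thing" for an improper tail means evaluating it; eval_cps, map_eval_toy and to_string are hypothetical names.

```perl
use strict;
use warnings;

# Toy list: undef is the empty list, [car, cdr] is a pair. Anything else
# in cdr position is an improper tail. Hypothetical names throughout.
sub eval_cps { my ($x, $k) = @_; return $k->($x * 2) }    # stand-in for Eval()

sub map_eval_toy {
    my ($list, $k) = @_;
    return $k->(undef) unless defined $list;        # end of list
    return eval_cps($list, $k) unless ref $list;    # improper tail
    my ($car, $cdr) = @$list;
    return eval_cps($car, sub {
        my ($evaluated_car) = @_;
        return map_eval_toy($cdr, sub {
            my ($evaluated_cdr) = @_;
            return $k->([ $evaluated_car, $evaluated_cdr ]);
        });
    });
}

# Render a toy list in Scheme notation.
sub to_string {
    my ($list) = @_;
    my @parts;
    while (ref $list) { push @parts, $list->[0]; $list = $list->[1] }
    push @parts, '.', $list if defined $list;
    return '(' . join(' ', @parts) . ')';
}

print to_string(map_eval_toy([1, [2, [3, undef]]], sub { $_[0] })), "\n";    # (2 4 6)
print to_string(map_eval_toy([1, 2], sub { $_[0] })), "\n";                  # (2 . 4)
```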
So far so good, we just pass the current continuation along with the current environment to the Quote()
method of whatever expression we’re quoting.
Let’s deal with the easy stuff first. PScm::Expr::Quote() used to just return $self, the cps version
calls the continuation on $self instead:
Great! Since the calls to both Eval() and quote_rest() are in tail position, it need only pass the
continuation along to both. All the Eval() methods have already been dealt with of course, so that
leaves quote_rest(). Let’s first refresh our memories by looking at the non-cps original:
This is definitely not tail recursive. But if we think it through there are no problems. The first thing it
does is call Quote() on its first() element, then it calls itself on the rest() of the list, then finally it
calls Cons() on those two results. Both the call to Quote() and quote_rest() could potentially result
in calls to Eval() so we need to pass continuations to both. We can rewrite it a little first to make the
order of operations more explicit:
sub quote_rest {
    my ($self, $env) = @_;
    my $quoted_first = $self->first->Quote($env);
    my $quoted_rest = $self->rest->quote_rest($env);
Now all we do is rewrite that so that the call to Quote() gets a continuation that performs the remaining
two operations, including passing a second continuation to quote_rest() that performs the last Cons().
Here’s the cps rewrite:
It calls Quote() on its first element, passing a continuation (Lines 176-186) that accepts the
$quoted_first and then calls quote_rest() on the rest of the elements, passing that a continuation (Lines
180-184) that accepts the $quoted_rest and calls the original continuation on the result of Cons()-ing
the $quoted_first and $quoted_rest together.
Finally, as before, where the default PScm::Expr::quote_rest() just returned $self, now it calls
its argument continuation on $self:
That’s it for our cps rewrite of the unquote facility. All other methods are unchanged.
The original was iterative. This version has been recast in a recursive mould to make the cps transform
easier and to take advantage of PScm::Expr::List. If the list is empty it calls the continuation on the
empty list, otherwise it passes the form, environment and continuation to a helper method apply_next():
It evaluates the first element of the list, passing a continuation that accepts the result. If there is more
of the list to process, it calls itself recursively on the rest of the list, otherwise it calls the original
continuation on the $val (the value of the last expression on the list). Note the similarity between this
method and map_eval(). The difference is only that apply_next() does not need to construct a new list
of all the evaluated results.
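The shape of that recursion can be sketched independently of the interpreter. In the toy below each "expression" is a Perl sub that takes a continuation, and undef stands in for the empty list; begin_toy, apply_next_toy and demo_begin are hypothetical names, not the PScm methods.

```perl
use strict;
use warnings;

# Each "expression" is a CPS sub taking a continuation. Hypothetical names.
sub begin_toy {
    my ($exprs, $k) = @_;
    return $k->(undef) unless @$exprs;    # empty body: nothing to evaluate
    return apply_next_toy($exprs, $k);
}

# Evaluate the first expression; either carry on with the rest or hand
# the last value to the original continuation. No result list is built.
sub apply_next_toy {
    my ($exprs, $k) = @_;
    my ($first, @rest) = @$exprs;
    return $first->(sub {
        my ($val) = @_;
        return @rest ? apply_next_toy(\@rest, $k) : $k->($val);
    });
}

my @log;

sub demo_begin {
    @log = ();
    return begin_toy(
        [
            sub { my ($k) = @_; push @log, 'one';   $k->(1) },
            sub { my ($k) = @_; push @log, 'two';   $k->(2) },
            sub { my ($k) = @_; push @log, 'three'; $k->(3) },
        ],
        sub { $_[0] },
    );
}

my $value = demo_begin();
print "$value after @log\n";    # prints 3 after one two three
```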
Next and last is set!. set! uses PScm::Env::Assign() that was developed for letrec to locate the
nearest binding for a symbol and change its value. set! is a special form since it evaluates its second
argument (the value) but not its first (the symbol).
The cps rewrite is pretty straightforward. It evaluates the value passing a continuation that will perform
the assignment (a simple expression) calling the original continuation on the result.
It evaluates the expression, passing a continuation that accepts the result and calls PScm::Env::Define()
to bind the result to the symbol in the current environment. We can treat the call to
PScm::Env::Define() as a simple expression and just call the original continuation on the result.
The first thing the old version did was to evaluate the parent expression (the value of the parent class)
then use that along with the fields and methods (unevaluated) to create the new class. Our rewrite
evaluates the parent expression passing a continuation that will create the class. We can treat the class
creation as a simple expression (it makes no further calls to Eval()) and just call the original continuation
on the result.
Nothing to follow up on there, so next we turn our attention to the application of a class to arguments,
which creates a new object. This is in PScm::Class::Apply():
As usual, Apply() now takes a continuation. The old version called make_instance() to create a new
instance of the class, then called the init method of the new object with the arguments to the class, then
returned the new object. The cps version can still just call make_instance() as a simple expression, but
needs to pass a continuation to call_method() because call_method() will be calling Eval() on both
the arguments and the body of the method. The continuation just calls the original continuation on the
new object (as if it were returning it).
Remembering that objects in this implementation are just environments, here’s the rewrite of
PScm::Env::call_method():
Hey, this isn’t too bad at all. If call_method() finds the method, it calls ApplyMethod() on it, passing
the original continuation as an extra argument. Whereas the old version implicitly returned undef if the
method was not found, the cps version must explicitly call the continuation on undef (see footnote 14).
Next we need to look at PScm::Closure::Method::ApplyMethod():
Again we use map_eval() on the $form (the arguments to the method) to evaluate them, this time
passing a continuation that applies the method to its evaluated arguments, using
PScm::Expr::List::Cons() to prepend the current object $this to the PScm::Expr::List of arguments to the core
PScm::Closure::_apply() method, and passing in the original continuation. We’ve already discussed that core
_apply() method in Section 13.6.2 on page 188 when we re-wrote lambda.
That just leaves the calling of methods on the objects themselves. Both PScm::Env and PScm::
Env::Super have an Apply() method. The one from PScm::Env::Super arranges to pass the current
value of this to the called method, otherwise they are very similar. Here’s PScm::Env::Apply():
Footnote 14: Note that calling a continuation on undef is not the same as returning undef: the trampoline
never sees this undef, so it does not terminate the computation prematurely.
The PScm::SpecialForm::Print package is unusual, therefore, in that it has a new() method and
creates its own instance, because it needs to save the argument filehandle. Other than that it is a
standard cps special form:
PScm::SpecialForm::Print::Apply() invokes its argument $form’s Eval() method with the current
environment and a continuation that will print the result, then call the original continuation on the
evaluated result. So print returns the expression that it printed.
You can see three continuations in this one method: the continuation $cont passed in as argument, the
outer cont{} returned to the trampoline, and the inner cont{} that applies the evaluated $op to the
as-yet unevaluated arguments. There is a lot of dependency on the lexical scope of various variables in
this code. If we were going to do this without closures we would have to make all that explicit. Here’s
an attempt:
sub Eval {
    my ($self, $env, $cont) = @_;
    return Bouncer->new(
        ListEvalFirstCont->new($self, $env, $cont)
    );
}
The Bouncer class would cope with any continuations returned to the trampoline, it would have a
new() method to capture the argument continuation and a Bounce() method that invoked the captured
continuation:
package Bouncer;
sub new {
    my ($class, $cont) = @_;
    bless { cont => $cont }, $class;
}
sub Bounce {
    my ($self) = @_;
    $self->{cont}->Cont();
}
Then ListEvalFirstCont would need a new() method, and a Cont() method of no arguments, since
that is what the Bouncer would call on it:
package ListEvalFirstCont;
sub new {
    my ($class, $list, $env, $cont) = @_;
    bless {
        list => $list,
        env => $env,
        cont => $cont,
    }, $class;
}
sub Cont {
    my ($self) = @_;
    $self->{list}->first()->Eval(
        $self->{env},
        ListEvalRestCont->new($self->{list},
                              $self->{env},
                              $self->{cont})
    );
}
You can see that the Cont() method here is doing the same thing that the closure was doing in the
original code, but it in turn must create a new ListEvalRestCont object rather than a closure. That
ListEvalRestCont would in turn need a new() and a Cont() method, since that is what the operation’s
Eval() method would call:
package ListEvalRestCont;
sub new {
    my ($class, $list, $env, $cont) = @_;
    bless {
        list => $list,
        env => $env,
        cont => $cont,
    }, $class;
}
sub Cont {
And that’s not the end of it, since really that last Cont() method should be returning a continuation to
the trampoline rather than just calling Apply() directly. . .
Admittedly this is just first-pass, untested code to give you an idea, but I’m not writing this just
to scare you off. The point to note is that there are three basic things that have to be kept track
of when implementing cps without closures. The first is the current position in the control flow (in
this case the list being evaluated). The second is the current environment: the values of variables that
the continuations need to execute. The third is the containing continuation. In fact what the closure
implementation makes implicit and effectively hides from us is that there is a chain of continuations,
from closure to closure, back to the originating continuation in the repl. This is an important observation.
It will greatly simplify the writing of a PScheme compiler later.
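Those three ingredients are easier to see in a complete, if tiny, program. The sketch below sums a list of numbers in cps using only objects, with no closures at all. It is not interpreter code: HaltCont, SumCont, run and sum_list are hypothetical names, and the bounce is represented as a simple [continuation, value] pair rather than a Bouncer object.

```perl
use strict;
use warnings;

# Final continuation: records the answer and stops the trampoline.
package HaltCont;
sub new  { my ($class) = @_; return bless { result => undef }, $class }
sub Cont { my ($self, $value) = @_; $self->{result} = $value; return undef }

# "Add the incoming value to a running total, then carry on with the rest."
# Position (rest), environment (total) and parent continuation (next) are
# all explicit fields; a closure would have captured them implicitly.
package SumCont;
sub new {
    my ($class, %args) = @_;
    return bless {
        rest  => $args{rest},
        total => $args{total},
        next  => $args{next},
    }, $class;
}
sub Cont {
    my ($self, $value) = @_;
    my $total = $self->{total} + $value;
    my @rest  = @{ $self->{rest} };
    # Nothing left: hand the total to the parent continuation.
    return [ $self->{next}, $total ] unless @rest;
    my $head = shift @rest;
    my $k = SumCont->new(rest => \@rest, total => $total, next => $self->{next});
    return [ $k, $head ];
}

package main;

# Trampoline: each bounce is a [continuation, value] pair; undef stops.
sub run {
    my ($bounce) = @_;
    while (defined $bounce) {
        my ($k, $value) = @$bounce;
        $bounce = $k->Cont($value);
    }
}

sub sum_list {
    my (@numbers) = @_;
    my $halt = HaltCont->new;
    my $k = SumCont->new(rest => \@numbers, total => 0, next => $halt);
    run([ $k, 0 ]);
    return $halt->{result};
}

print sum_list(1, 2, 3, 4), "\n";    # prints 10
```

Every SumCont holds a reference to its parent in the next field, so the chain of continuations back to HaltCont is an ordinary data structure that could be walked or inspected.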
There are even some advantages to implementing continuations in this way. Primarily because the
chain of continuations is explicit, it can be traversed and searched making all sorts of additional control
flow constructs easier to implement. For example a try / throw / catch / resume mechanism need merely
traverse back up the continuation chain looking for a catching continuation, invoke it with the originating
continuation, and if the catch block could fix the problem it would resume from the instruction after
the throw. While this is possible with our existing implementation, it is more tricky.
The addition never happened. Control returned directly to the top level. Using call/cc, we can define
an error escape procedure in the PScheme language itself, without needing to make further changes to
the interpreter.
All error has to do is to print its error message and call a continuation that returns control to the
top level. So, assuming that top level continuation is already installed as ^escape (I’m just using a caret,
“^”, to prefix any continuation names so they stand out), the error procedure itself is straightforward:
(define error
  (lambda (msg)
    (begin
      (print msg)
      (^escape ()))))
Note that the ^escape continuation expects an argument, so error passes the empty list (), and that
is what the repl prints as the result of whatever expression error is called from.
Next we need to create that top level continuation. Here’s a first attempt:
This looks promising. The call/cc calls the anonymous lambda expression passing in the current
continuation. The lambda expression just returns its argument, which is what call/cc returns, and
that is what gets bound to the global ^escape. This is good as far as it goes: the current continuation
certainly is bound to ^escape.
The problem is that the operation of defining ^escape is part of the continuation saved in ^escape.
Put another way, the first time error calls ^escape, control resumes at the point that call/cc is
returning its value and so the define is re-executed, binding the empty list to ^escape and forgetting
the previously bound continuation. So the ^escape continuation is only useable once.
There are two ways to fix this.
The first way would be to change error so that it passes ^escape as argument to itself: (^escape
^escape). The downside of that is that you will get a PScm::Continuation printed out as the result
of any call to error.
The other way to fix this is to change the way we set up ^escape in the first place. First of all we
create a global ^escape variable with an arbitrary initial binding:
(define ^escape 0)
Then we use call/cc as before but call an expression that directly assigns to ^escape:
(call/cc
  (lambda (cont)
    (begin
      (set! ^escape cont))))
This way the call/cc has no enclosing context, no deferred operations to perform, and when ^escape
is invoked control returns directly to the top level.
A test of the error handler, which just duplicates the above code, is in Listing 13.10.3 on page 227.
Next let’s try something a bit more challenging.
13.8.2 yield
Remember that strange Fibonacci generator that was described towards the end of Section 13.6.2 on
page 188, the one that used a hypothetical yield command to return a value while remembering its
current position so that a subsequent call to the function would resume where it left off? Well here’s
how we’d like to write it in PScheme:
(define fib
  (yielder
    (letrec ((fib-loop
               (lambda (i j)
                 (begin
                   (yield i)
                   (fib-loop j (+ i j))))))
      (fib-loop 0 1))))
Note that there is nothing at all special about the code that uses fib. All of the interesting details are
in the definition of fib itself. The function fib is defined as a yielder (my term). It creates a recursive
helper routine fib-loop and calls it with arguments 0 and 1. That helper routine yields the current
value of its first argument i, then calls itself with argument i replaced by j and j replaced with i + j.
The second part of the example loops 10 times printing the next value from fib each time around
the loop.
Of course this presupposes a few features that PScheme doesn’t appear to have. The while macro
was already introduced in Section 9.2.2 on page 112, but here it is again just for completeness:
(define while
  (macro (test body)
    '(letrec
       ((loop
          (lambda ()
            (if ,test
                (begin
                  ,body
                  (loop))
                ()))))
       (loop))))
You can see that it’s a recursive definition, but by now that shouldn’t worry you: as each call to loop is
in tail position we won’t grow any context or eat any stack with this definition.
13.8. CPS FUN 223
The other two features we don’t have yet are yield and yielder. yield is actually quite easy to
write. When we say (yield value), what we really want to do is return not only the value, but also
the current continuation so that the next time we are called that continuation can be called instead,
returning us to where we left off. We can return more than one value by wrapping the values in a list,
so notionally (yield value) means:
(call/cc
  (lambda (current-continuation)
    (^return (list current-continuation value))))
In fact that’s it, all we need to do is wrap that up in a macro, and we have yield:
(define yield
  (macro (value)
    '(call/cc
       (lambda (^here)
         (^return (list ^here ,value))))))
Next we need to look at yielder. You probably won’t be surprised to find out it’s another macro.
It returns a function that, when called for the first time, invokes the body of the yielder expression,
saving the current continuation in a ^return variable. When yield is invoked control returns through
the ^return continuation. The ^here continuation part of the returned result is saved and the value
part is returned by the function as a whole.
On subsequent calls, rather than invoking the body of the yielder again, the saved ^here continu-
ation is invoked, returning control to where the yield left off. Here’s yielder:
(define yielder
  (macro (body)
    '(let ((firsttime 1)
           (^resume 0)
           (^return 0))
       (lambda ()
         (if firsttime
             (let ((res (call/cc
                          (lambda (^cont)
                            (begin
                              (set! ^return ^cont)
                              ,body)))))
               (begin
                 (set! firsttime 0)
                 (set! ^resume (car res))
                 (car (cdr res))))
13.9 Summary
This has been a long chapter and a difficult one, particularly if you were unfamiliar with the subject of
continuations. It has however provided numerous real-world examples of cps during the rewrite of the
interpreter and hopefully the basic principles of cps have been well covered.
To reiterate the basic idea, continuations can be thought of as the “rest of the computation”, or
perhaps more graphically as a “reference to a return statement” that can be called as a function.
Anyway, having achieved a cps interpreter in Section 13.6.2 on page 188 we then introduced the
call/cc form, which passes the current continuation to its argument function. If Perl functions could
take a reference to their return statement with a syntax like \return, then we could write call/cc in
Perl like this:
sub call_cc {
    my ($sub) = @_;
    $sub->(\return);
}
Unfortunately Perl 5 does not support that syntax yet, but lest you think this is all irrelevant to Perl,
you should be aware that the Parrot virtual machine which will run Perl6 has continuations built in from
the ground up!
Finally, having completely re-worked the interpreter in cps, in Section 13.8 on page 220 we showed
how we could use call/cc in conjunction with macro to create two high-level control constructs: error
and yield from within the PScheme language itself.
The next few chapters will take continuations a little further, to show the sorts of things that can be
done by varying the internal details of the implementation of continuations.
13.10 Listings
13.10.1 PScm/Continuation.pm
001 package PScm::Continuation;
002
003 use strict;
004 use warnings;
005 use base qw(PScm);
006
007 require Exporter;
008
009 our @ISA = qw(PScm Exporter);
010
011 our @EXPORT = qw(cont);
012
013 sub new {
014     my ($class, $cont) = @_;
015     bless { cont => $cont }, $class;
016 }
017
018 sub cont(&) {
019     my ($cont) = @_;
020     return __PACKAGE__->new($cont);
021 }
022
023 sub Apply {
024     my ($self, $form, $env, $cont) = @_;
025     $form->map_eval(
026         $env,
027         cont {
028             my ($ra_evaluated_args) = @_;
029             $self->Cont($ra_evaluated_args->[0]);
030         }
031     );
032 }
033
034 sub Cont {
035     my ($self, $arg) = @_;
036     return cont { $self->{cont}->($arg) };
037 }
038
039 sub Bounce {
040     my ($self) = @_;
041     $self->{cont}->();
042 }
043
044 sub Eval {
045     my ($self, $env, $cont) = @_;
046     $cont->Cont($self);
047 }
048
049 1;
052
053 (define fib
054   (yielder
055     (letrec ((fib-loop
056                (lambda (i j)
057                  (begin
058                    (yield i)
059                    (fib-loop j (+ i j))))))
060       (fib-loop 0 1))))
061
062 (let ((n 10))
063   (while n
064     (begin
065       (set! n (- n 1))
066       (print (fib)))))
067 EOF
068 yield
069 yielder
070 while
071 fib
072 0
073 1
074 1
075 2
076 3
077 5
078 8
079 13
080 21
081 34
082 ()
083 EOR
084
085 # vim: ft=perl
Chapter 14
Threads
Continuations make the implementation of threads almost trivial. The trick is in the trampoline. Our old
trampoline method repeatedly called Bounce() on the current continuation to get the next continuation,
until a continuation returned undef:
If you think about it, a continuation already represents a single thread of computation. The trampoline
is just managing that single thread, ensuring that it does not consume too much stack. Suppose that
trampoline(), instead of just repeatedly invoking the current continuation, kept a queue of continuations,
and after bouncing the one at the front of the queue put the result onto the back of the queue (if
the result was not undef), looping until the queue was empty. This version does exactly that:
Note that it no longer takes an argument continuation, instead it gets the next continuation from the
front of the queue. @thread_queue is a new lexical “my” variable in the PScm package.
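The queueing idea can be tried out in isolation. The following is a minimal sketch, not the PScm code: continuations are plain code references rather than objects with a Bounce() method, and counter and run_demo are hypothetical names invented for the demonstration.

```perl
use strict;
use warnings;

my @thread_queue;    # pending continuations, one per thread
my @output;

sub new_thread { my ($cont) = @_; push @thread_queue, $cont }

# Bounce the thread at the front; if it yields a further continuation,
# requeue it at the back. A thread that returns undef has finished.
sub trampoline {
    while (my $cont = shift @thread_queue) {
        my $next = $cont->();
        push @thread_queue, $next if defined $next;
    }
}

# A toy thread: records its name $n times, one step per bounce.
sub counter {
    my ($name, $n) = @_;
    return sub {
        return undef if $n == 0;
        push @output, $name;
        return counter($name, $n - 1);
    };
}

sub run_demo {
    @thread_queue = ();
    @output       = ();
    new_thread(counter('a', 3));
    new_thread(counter('b', 2));
    trampoline();
    return "@output";
}

print run_demo(), "\n";    # prints a b a b a
```

The interleaving falls out of the queue discipline alone: neither toy thread knows that the other exists.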
We place new threads on that queue with a new_thread() method, also in the PScm package:
can write code that does different things in different threads by testing the result, much like the Unix
fork system call does:
Notice that although both threads run in parallel, one thread does an (exit) so only the result 1 from
the other thread gets printed.
spawn is a new special form in PScm::SpecialForm::Spawn, and it’s surprisingly easy to implement:
On Line 294 it calls new_thread() with a new continuation that will call the current continuation with
argument 0, and on Line 298 it directly calls the current continuation with an argument of 1. This is so
easy I feel like I have cheated, but really that’s all there is to it. The new continuation will get executed
in turn when the Cont() on Line 298 returns control to the trampoline, and the trampoline will continue
executing any threads on its queue until all threads have finished and the queue is empty.
exit is even more trivial. It has to be a special form because individual primitives do not get called
in tail position, but all that it has to do is to return undef to the trampoline:
Incidentally, exit provides a useful way of terminating the interactive interpreter. Typing (exit) at the
prompt while only one thread is running will result in an empty @thread_queue so the trampoline will
finish.
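To see the fork-like behaviour of spawn and the effect of exit without the rest of the interpreter, here is a small sketch under the same assumptions as before: continuations are plain code references, and spawn_toy, exit_toy, program and run_demo are hypothetical names. The order in which the two threads run is a property of this toy and is not a claim about the real interpreter.

```perl
use strict;
use warnings;

my @thread_queue;
my @printed;

sub new_thread { my ($cont) = @_; push @thread_queue, $cont }

sub trampoline {
    while (my $cont = shift @thread_queue) {
        my $next = $cont->();
        push @thread_queue, $next if defined $next;
    }
}

# spawn: queue a thread that will resume $k with 0, then resume $k with 1
# directly in the current thread.
sub spawn_toy {
    my ($k) = @_;
    new_thread(sub { $k->(0) });
    return $k->(1);
}

# exit: give the trampoline nothing further to run for this thread.
sub exit_toy { return undef }

# Roughly (if (spawn) (begin (print "hello") 1) (begin (print "goodbye") (exit)))
sub program {
    my ($done) = @_;
    return spawn_toy(sub {
        my ($is_parent) = @_;
        if ($is_parent) {
            push @printed, 'hello';
            return sub { $done->(1) };    # bounce, then deliver the result
        }
        push @printed, 'goodbye';
        return exit_toy();                # this thread never reaches $done
    });
}

sub run_demo {
    @thread_queue = ();
    @printed      = ();
    new_thread(sub { program(sub { push @printed, "result $_[0]"; return undef }) });
    trampoline();
    return join(', ', @printed);
}

print run_demo(), "\n";    # prints hello, goodbye, result 1
```

Only one result is reported, because the thread that received 0 exits before it can reach the final continuation.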
All that remains is to wire this in to the repl:
068 PScm::Expr::Symbol->new("root"),
069 PScm::Class::Root->new($initial_env)
070 );
071 __PACKAGE__->new_thread(cont { repl($initial_env, $reader, $outfh) });
072 trampoline();
073 }
Apart from the addition of spawn and exit to the initial environment, there is only one change.
The repl uses new_thread() to add the initial thread (continuation) to the @thread_queue then calls
trampoline() with no arguments, rather than passing the continuation directly to trampoline().
14.1 Variations
A more complete thread implementation would also provide mechanisms for collecting the result of one
thread in another with a wait command. That is not so easy: you’d need to put the waiting thread on a separate
queue and have the exit command take an argument and put it somewhere that the wait command
could find.
You would also need to be able to prevent concurrent access to sections of code, best done with some
sort of atomic semaphore operation. But atomicity is easy to guarantee at the level of the interpreter
internals, as long as no continuations are called during the claiming of a semaphore.
These variations are exercises you can try at home.
14.2 Tests
A simple test for spawn is in Listing 14.3.1 on the facing page.
14.3 Listings
14.3.1 t/CPS_Spawn.t
001 use strict;
002 use warnings;
003 use Test::More;
004 use lib 't/lib';
005 use PScm::Test tests => 3;
006
007 BEGIN { use_ok('PScm') }
008
009 eval_ok(<<EOF, <<EOR, 'spawn');
010 (if (spawn)
011     (begin
012         (print "hello")
013         (print "hello")
014         1)
015     (begin
016         (print "goodbye")
017         (print "goodbye")
018         (exit)))
019 EOF
020 "hello"
021 "goodbye"
022 "hello"
023 "goodbye"
024 1
025 EOR
026
027 eval_ok('(exit)', '', 'exit');
028
029 # vim: ft=perl
Section 13.8.1 on page 220 described how we could write our own error routine in the PScheme language,
using an escape procedure to return control to the top level and resuming the read-eval-print loop. That
implementation had a couple of drawbacks however.
• Apart from printing an error message, the error handler returned a value (the empty list) to the
repl which printed it.
• It would be useful if we could gain access to the error continuation from within the PScheme
interpreter, so that recoverable errors would no longer have to be fatal.
In this short chapter we remedy these deficiencies by providing a built-in error primitive, and show how
our interpreter can interface with it.
238 CHAPTER 15. BETTER ERROR HANDLING
You can see the token “error” being bound to a new PScm::SpecialForm::Error object, and the
constructor for that object is passed both the current $outfh and a continuation which just calls repl()
with appropriate arguments.
The constructor for PScm::SpecialForm::Error just stashes its arguments:
When we invoke error with for example (error "my error message") its Apply() method is invoked.
Here it is:
It has to use cps because the error message itself might be computed; we can’t just assume that it is
already a string. So it Eval()’s the message, passing in a continuation that will first of all convert the
resulting message to a string suitable for display, and then call a secondary method do_error() on that
string. The display_string() method is defined in PScm::Expr to just call as_string():
The upshot of this is that the error message, if it’s a PScm::Expr::String, won’t be wrapped in quotes
when printed, which is what the PScm::Expr::String::as_string() method would have done.
Returning to PScm::SpecialForm::Error, do_error() is also quite simple:
It expects only a simple Perl string. It strips any trailing newline from the error message, prints it to
the stored output file handle, then returns the stored continuation to the trampoline. That continuation
will restart the repl, skipping the print stage of the current loop.
Apart from making the code a little easier on the eye, there is another reason for having a separate
do_error() method, and that brings us to the second part of this chapter.
The only thing it has to be careful of is that it calls do_error() in tail position, so that the continuation
gets returned to the trampoline.
Let’s look at a few places where we can make use of this new method. If you remember, way back in
Section 3.6 on page 27 we saw how the various primitive operations made use of a check_type() method,
which would die if the argument object was not of the desired type. Now we can cheat a little, and rather
than rewriting those primitives in cps, we just catch the error with a (Perl) eval in the shared
PScm::Primitive::Apply() method, and call Error() with argument $@ if an error was detected. Here’s the
previous version of that PScm::Primitive::Apply():
It is safe for Apply(), on Line 16 to evaluate the individual primitive separately, since it is not in cps
form. Then all it has to do is either call the current continuation on the result, or invoke Error() with
$@, both calls being in tail position.
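The pattern is worth seeing on its own: run the non-cps code inside a Perl eval block, then make exactly one tail call, either to the normal continuation or to the error handler. This is a sketch with plain closures; multiply, make_error_handler and apply_primitive are hypothetical stand-ins and do not reproduce the PScm methods or their error text.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for a primitive, the error handler and Apply().
sub multiply {
    my ($x, $y) = @_;
    for my $arg ($x, $y) {
        die "wrong type argument($arg) to multiply\n"
            unless $arg =~ /\A-?\d+\z/;
    }
    return $x * $y;
}

# The error handler abandons the current continuation and resumes a
# stored "restart" continuation instead.
sub make_error_handler {
    my ($restart) = @_;
    return sub {
        my ($message) = @_;
        chomp $message;
        return $restart->("Error: $message");
    };
}

sub apply_primitive {
    my ($primitive, $args, $cont, $error) = @_;
    # The primitive is not in CPS, so it is safe to run it to completion first.
    my $result = eval { $primitive->(@$args) };
    return $error->($@) if $@;    # tail call: report and restart
    return $cont->($result);      # tail call: normal flow
}

my $error = make_error_handler(sub { $_[0] });
print apply_primitive(\&multiply, [2, 2],   sub { "Result: $_[0]" }, $error), "\n";
print apply_primitive(\&multiply, [2, 'x'], sub { "Result: $_[0]" }, $error), "\n";
# prints:
#   Result: 4
#   Error: wrong type argument(x) to multiply
```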
15.2. USING THE ERROR BUILTIN FOR INTERNAL ERRORS 241
Apart from primitive expressions, another place where we throw an exception on a recoverable error is
in the LookUp() method of PScm::Env, when we don’t find a binding for a variable. Unfortunately
LookUp() was treated as a simple expression in our cps rewrite, so we need to backtrack to find the cps
code that invokes LookUp() in order to install the error handling. Fortunately there is only one place
where that happens, when a symbol is evaluated. Here’s the previous PScm::Expr::Symbol::Eval().
Again, as with the primitive Apply() above, it is safe for it to execute the LookUp() first, since LookUp()
is not in cps. Then, depending on $@, it either invokes Error() or calls the continuation on the result
of the lookup.
Another place where we died was in PScm::Env::_populate_bindings() where we handle the possibility
of dot notation and single values in the formal arguments to a lambda expression. This routine is only
called by ExtendUnevaluated(), but unfortunately ExtendUnevaluated() is not yet in cps form. In
this case, because ExtendUnevaluated() is called from a number of places and all those places would
have to be aware that ExtendUnevaluated() could throw a Perl exception, it seems better to rewrite
ExtendUnevaluated() into cps, and change its callers to use the cps form. Here’s the cps version of
ExtendUnevaluated().
Most of the methods that call ExtendUnevaluated() are already in cps so we don’t really need to see
the changes to them. One method, make_instance() in PScm::Class is not in cps, so we need to
rewrite that too:
The caller of make_instance(), PScm::Class::Apply(), was already in cps so transforming that to call
the cps form of make_instance() is trivial:
041 cont {
042 $cont->Cont($new_object);
043 }
044 );
045 });
046 }
228 }
229 }
230 );
231 }
Very simple: the die was already in tail position, so where it used to die, it invokes Error() instead.
15.3 Tests
A few simple tests for error are in Listing 15.4.1 on the next page. Primarily, besides demonstrating
that the error builtin works, they show that the repl is still up and running afterwards.
15.4 Listings
15.4.1 t/CPS_BuiltInError.t
001 use strict;
002 use warnings;
003 use Test::More;
004 use lib 't/lib';
005 use PScm::Test tests => 8;
006
007 BEGIN { use_ok('PScm') }
008
009 eval_ok(<<EOF, <<EOR, 'built in error');
010 (define div
011   (lambda (numerator denominator)
012     (if denominator
013         (/ numerator denominator)
014         (error "division by zero"))))
015 (+ (div 2 0) 1)
016 EOF
017 div
018 Error: division by zero
019 EOR
020
021 eval_ok(<<EOF, <<EOR, 'argument to error need not be a string');
022 (error '(an error "message"))
023 EOF
024 Error: (an error "message")
025 EOR
026
027 eval_ok(<<EOF, <<EOR, 'internal type error and recovery');
028 (* 2 "2")
029 (* 2 2)
030 EOF
031 Error: wrong type argument(PScm::Expr::String) to PScm::Primitive::Multiply
032 4
033 EOR
034
035 eval_ok(<<EOF, <<EOR, 'internal lookup error and recovery');
036 x
037 2
038 EOF
039 Error: no binding for x in PScm::Env
040 2
041 EOR
042
043 eval_ok(<<EOF, <<EOR, 'method lookup error and recovery');
044 (define testclass
045   (make-class
046     root
047     ()
048     (say-hello () 'hello)))
049 (define testobj (testclass))
Chronological Backtracking
Whereas, with the exception of Chapter 12 on page 135, we have been extending this interpreter to be
more like a complete scheme implementation, this chapter makes a deliberate departure from the R6RS
[12] specification to add a feature not normally found in functional or procedural languages. This feature
is best introduced by example, but suffice to say that it is one step on the way to implementing a logic
programming language.
Understanding this chapter relies heavily on previous chapters. If you have skipped ahead to here,
you should at least make sure that you understand the implementation details of cps from Chapter 13
on page 159 before diving in to the details that follow. However you can read the next few sections on
their own if you want to get a taste of what this chapter has to offer.
> (amb 1 2 3)
1
> ?
2
> ?
3
> ?
Error: no more solutions
> ?
Error: no current problem
What’s going on here? Well, amb is given a list of values and returns all of them, but it returns them one
at a time. When a “?” is typed at the PScheme prompt, control backtracks to amb and it returns its next
result. So the execution of expressions involving amb is somehow threaded into the read-eval-print loop
itself. I should probably point out that this new behaviour is not specific to amb, but rather a general
property of the interpreter:
250 CHAPTER 16. CHRONOLOGICAL BACKTRACKING
> ?
Error: no current problem
> (+ 2 2)
4
> ?
Error: no more solutions
> ?
Error: no current problem
The Error: no current problem message means just that: there is no current problem so no
backtracking is possible, whereas the Error: no more solutions message means that the current
“problem” has just exhausted all of its possibilities. With no occurrence of amb in the “problem” there is
only one possible outcome (4 in the (+ 2 2) example above) so the repl continues to behave as normal for
“normal” input.
amb will only return a subsequent value if it is told that the previous value is not acceptable. One
way of doing that, as we have seen, is by typing “?” at the PScheme prompt. We can do the same thing
within our code, however, as I’ll demonstrate next:
> (list (amb 1 2) (amb ’a ’b))
(1 a)
> ?
(1 b)
> ?
(2 a)
> ?
(2 b)
> ?
Error: no more solutions
Now that’s interesting. There are two calls to amb, and list collects the results. Best we go through
this one step at a time.
1. The expression first returns a list of the first arguments to each call to amb, namely 1 and a.
2. When we tell the interpreter that we’d like to see more results by typing ? at the prompt, the
second amb call intercepts the request and returns its second argument, so the whole expression
returns (1 b).
3. When we ask for a third result, the second amb again intercepts the request, but this time it has
run out of arguments, so it fails to satisfy the request and control propagates back to the first call
to amb. The first amb now returns its second result, 2, and control passes forwards again to the
second amb. This second amb is now being called afresh, as it were, and is back in its initial state
where it returns its first argument, so the whole third result is (2 a).
4. The request for a fourth result proceeds as the request for the second result did, with the second
amb producing b, resulting in (2 b).
5. With the fifth and final request, the second amb again fails, so it propagates the failure back to the
first amb, but this time the first amb has also exhausted its results, so it propagates the failure back
to the command loop and we get the “error”.
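The five steps above can be imitated outside PScheme. The following is only a rough Python sketch, not the book’s Perl implementation: the two amb calls are modelled as nested loops inside a generator, and a hypothetical Repl class (an invented name) keeps the suspended generator so that a retry() call plays the part of typing “?”.

```python
def problem():
    """Yield each result of (list (amb 1 2) (amb 'a 'b)) in backtracking order."""
    for first in (1, 2):            # the first amb
        for second in ("a", "b"):   # the second amb, restarted for each `first`
            yield [first, second]


class Repl:
    """Hypothetical stand-in for the PScheme prompt; not the book's Perl code."""

    def __init__(self):
        self.current = None  # the suspended "problem", if any

    def evaluate(self, make_problem):
        self.current = make_problem()
        return self.retry()

    def retry(self):
        """What typing "?" does: resume the most recent choice point."""
        if self.current is None:
            return "Error: no current problem"
        try:
            return next(self.current)
        except StopIteration:
            self.current = None
            return "Error: no more solutions"


repl = Repl()
transcript = [repl.evaluate(problem)] + [repl.retry() for _ in range(5)]
for line in transcript:
    print(line)
```

The inner loop restarting for every value of the outer loop corresponds to the second amb being “called afresh” in step 3.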
16.1. INTRODUCING AMB 251
The diagram in Figure 16.1 attempts to show this control flow in action (see footnote 1).
[Figure 16.1: control flow for (list (amb 1 2) (amb 'a 'b)) through read, eval and print. Each “?” read at
the prompt backtracks to the most recent amb, yielding (1 a), (1 b), (2 a) and (2 b) in turn.]
Footnote 1: Of course none of this would be possible without continuations. Only with cps do we have a
situation where Read() calls Eval() and so forth, but that’s best left for Section 16.3 on page 266, which
discusses implementation.
So in what way does this demonstrate that we can control the backtracking behaviour of amb? Simple.
When amb itself fails it propagates control back to the chronologically previous call to amb, just as typing a
“?” at the prompt does. When the second amb call ran out of options in the example, control propagated
back to the first amb call. Now a call to amb with no arguments must immediately fail, because it has no
arguments to choose from:
> (amb)
Error: no more solutions
So calling amb with no arguments forces any previous amb to deliver up its next value. We can wrap
that behaviour in a function that tests some condition, and forces another choice if the condition is false.
That function is called require:
(define require
  (lambda (x)
    (if x x (amb))))
The return value of x if the test succeeds is merely utilitarian; it is the call to amb with no arguments
if the test fails that is important. So how can we use require? Well, for example, let’s assume we have
a predicate even? that returns true if its argument is even. We can use that to filter the results of an
earlier amb:
The expression (require (even? x)) filtered out the odd values of x, so only the even values were
propagated to the result(s) of the expression.
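As an illustration of that filtering, here is a small Python sketch. It is an analogy rather than PScheme: amb is modelled as a generator, and a failed require is modelled by skipping to the generator’s next value. The choice of the values 1 to 6 and the names is_even and even_choices are assumptions made for the example.

```python
def amb(*choices):
    """Offer the choices one at a time, like PScheme's amb."""
    yield from choices


def is_even(n):
    return n % 2 == 0


def even_choices():
    for x in amb(1, 2, 3, 4, 5, 6):
        if not is_even(x):   # (require (even? x)) fails ...
            continue         # ... so control returns to amb for its next value
        yield x              # the requirement held, so x reaches the result


evens = list(even_choices())
print(evens)
```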
You should be starting to see how amb and cps are deeply interlinked, and how backtracking can
therefore return to any chronologically previous point in the computation, not just “down the stack” to
a caller of the code that initiates the backtracking.
Liars
Five schoolgirls sat for an examination. Their parents—so they thought—showed an undue
degree of interest in the result. They therefore agreed that, in writing home about the
examination, each girl should make one true statement and one untrue one. The following
are the relevant passages from their letters:
What in fact was the order in which the five girls were placed?
amb makes it easy to solve this type of problem by merely enumerating all the possibilities then eliminating
those possibilities that are wrong in some way:
The bindings in the let supply all possible grades to each of the girls, then the first require in the body
of the let makes sure that all the girls have different grades: the distinct? function only returns true if
there are no duplicates in its argument list. I’ll show you the implementation, and that of other functions
here, later. The remaining requirements simply list the two parts of each girl’s statement, requiring that
one is true and one is false: the xor (exclusive or) function returns true only if one of its arguments is
true and the other is false. The eq? function tests if two expressions are equal.
So we start out by requiring that all five girls have distinct positions in the exam results. Then we go
on to require that exactly one of each girl’s two statements is true. Finally we build and return
a list of pairs of the girls’ names and the associated positions that satisfy all the requirements, using
quote and unquote.
Of course this is horribly inefficient. There are 5^5 = 3125 possible assignments of positions to betty,
ethel, joan, kitty and mary, and that first distinct requirement forces a re-evaluation of all but
5! = 120 of them (see footnote 5), so about 96% of the initial possibilities are pruned at the first step,
and backtracking is provoked. In fact, when writing tests for this amb example, this single function took
so long to run (about 14 seconds on my laptop) that I was forced to find ways to optimize it. The
optimizations demonstrate some additional behaviour of amb, so here’s the optimized version:
(define liars
  (lambda ()
    (let* ((betty (amb 1 2 3 4 5))
           (ethel (one-of (exclude (list betty)
                                   (list 1 2 3 4 5))))
           (joan (one-of (exclude (list betty ethel)
                                  (list 1 2 3 4 5))))
           (kitty (one-of (exclude (list betty ethel joan)
                                   (list 1 2 3 4 5))))
           (mary (car (exclude (list betty ethel joan kitty)
                               (list 1 2 3 4 5)))))
      (begin
        (require (xor (eq? kitty 2) (eq? betty 3)))
        (require (xor (eq? ethel 1) (eq? joan 2)))
        (require (xor (eq? joan 3) (eq? ethel 5)))
        (require (xor (eq? kitty 2) (eq? mary 4)))
        (require (xor (eq? mary 4) (eq? betty 1)))
        '((betty ,betty)
          (ethel ,ethel)
          (joan ,joan)
          (kitty ,kitty)
          (mary ,mary))))))
It starts out as before, setting betty to an amb choice from the available positions, but then calls a couple
of new functions to calculate the value for ethel and the rest of the girls. exclude returns a list of all
the elements in its second list that aren’t in its first list. So for example if betty is 1, then ethel only
gets the choice of values 2 through 5. I’ll show you exclude later. one-of is more interesting, since
it makes use of require and amb. It does the same thing as amb, but takes a single list of values as
argument rather than individual arguments:
(define one-of
  (lambda (lst)
    (begin
      (require lst)
      (amb (car lst) (one-of (cdr lst))))))
Footnote 5: Yes, a practical application of the factorial function!
16.2. EXAMPLES OF AMB IN ACTION 255
Firstly it requires that the list is not empty, then it uses amb to choose either the car of the list, or
one-of the cdr of the list. This in fact demonstrates that amb must be a special form: this function
would not work if amb had its arguments evaluated for it; if both arguments to that second amb were
evaluated before amb saw them then one-of would get recursively executed until the list was empty, then
the first amb to be actually invoked would be the one that terminates recursion when (require lst)
fails, so this function would always fail if amb were a primitive.
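The difference between the two evaluation orders can be seen in a small Python sketch. This is an analogy, not PScheme: one_of_lazy uses a generator so that the second alternative is only computed when backtracked to, while one_of_eager passes already evaluated arguments to an ordinary function. The names Fail, eager_amb, one_of_lazy and one_of_eager are all invented for the illustration.

```python
class Fail(Exception):
    """Stands in for a failed require with no earlier amb to return to."""


def one_of_lazy(lst):
    """amb as a special form: alternatives are evaluated only when tried."""
    if not lst:                       # (require lst) fails: no results here
        return
    yield lst[0]                      # first alternative: (car lst)
    yield from one_of_lazy(lst[1:])   # second alternative, computed on demand


def eager_amb(*values):
    """amb as a primitive: its arguments arrive already evaluated."""
    return values[0]


def one_of_eager(lst):
    if not lst:
        raise Fail("no more solutions")
    # The recursive call is evaluated before eager_amb is ever entered, so the
    # recursion always runs down to the empty list and fails there.
    return eager_amb(lst[0], one_of_eager(lst[1:]))


lazy_results = list(one_of_lazy([1, 2, 3]))
try:
    one_of_eager([1, 2, 3])
    eager_outcome = "succeeded"
except Fail as err:
    eager_outcome = f"Error: {err}"

print(lazy_results)
print(eager_outcome)
```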
Back to our optimized liars example. The use of let* instead of let makes the values of the
previous bindings available to subsequent ones. By the time we get to assigning to mary, there is only
one choice left, so we just take it with car rather than using one-of. Since our values are now guaranteed
to be distinct, we can remove that explicit requirement from the code, and the optimized version runs
in a little under a second on the same machine.
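The saving can be checked by brute force. The sketch below is plain Python using itertools rather than amb, so it only illustrates the size of the two search spaces and the answer; the five conditions are copied from the require calls in the optimized liars function, and the helper name consistent is an assumption of the example.

```python
from itertools import permutations, product

GIRLS = ("betty", "ethel", "joan", "kitty", "mary")


def consistent(betty, ethel, joan, kitty, mary):
    """Each girl made exactly one true statement (`!=` on booleans acts as xor)."""
    return all((
        (kitty == 2) != (betty == 3),
        (ethel == 1) != (joan == 2),
        (joan == 3) != (ethel == 5),
        (kitty == 2) != (mary == 4),
        (mary == 4) != (betty == 1),
    ))


naive = list(product(range(1, 6), repeat=5))   # every girl picks from 1..5
distinct = list(permutations(range(1, 6)))     # positions already distinct
solutions = [dict(zip(GIRLS, p)) for p in distinct if consistent(*p)]

print(len(naive), len(distinct))
print(solutions)
```

Generating only distinct positions, as the let* version does, shrinks the candidates from 3125 to 120 before any of the statements are tested.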
As promised, here are the rest of the Scheme functions needed to implement the solution to the “Liars”
puzzle and other examples seen earlier. You can skim these if you’re not interested in the details; they
don’t really do anything new.
To make these functions easier to write (and read) I’ve introduced the boolean short-circuiting special
forms and and or to this version of the interpreter: (and a b) will return a without evaluating b if a is
false, and (or a b) will return a without evaluating b if a is true.
Some of these functions also make use of not. and and or have been added as special forms to the
interpreter and so interact with the amb rewrite, so you’ll have to wait to see those, but not is just:
(define not
  (lambda (x)
    (if x 0 1)))
And so to our support routines. Firstly even?:
(define divisible-by
  (lambda (n)
    (lambda (v)
      (begin
        (define loop
          (lambda (o)
            (if (eq? o v)
                1
                (if (> o v)
                    0
                    (loop (+ o n))))))
        (loop 0)))))
(define even?
  (lambda (a)
    ((divisible-by 2) a)))
This is really just demonstrating the functional programming style that Scheme promotes (see footnote 6).
The function divisible-by takes an argument number n and returns another function that will return true
if its argument is divisible by n. It creates an inner loop function which loops over 0, n, 2n, 3n, ... until
it is either equal to or greater than the number being tested. even? uses this to create a function that
tests for divisibility by 2, and calls it on its argument a. It is total overkill to do it this way, but fun.
Footnote 6: And that we can still get away without a “/” primitive.
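For readers more at home outside Scheme, the same closure-returning shape can be sketched in Python. This is a transliteration for illustration only; it assumes n is positive and returns 1 and 0 as PScheme does rather than Python booleans.

```python
def divisible_by(n):
    """Return a function that tests divisibility by n (n assumed positive)."""
    def test(v):
        o = 0
        while o < v:      # step through 0, n, 2n, 3n, ... as the inner loop does
            o += n
        return 1 if o == v else 0
    return test


even = divisible_by(2)
print(even(4), even(7), divisible_by(3)(9))
```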
Next distinct?:
(define distinct?
  (lambda (lst)
    (if lst
        (and (not (member? (car lst) (cdr lst)))
             (distinct? (cdr lst)))
        1)))
distinct? says if the list is not empty, then it is distinct if its first element (its car) is not a member
of the rest of the list and the rest of the list is distinct. If the list is empty, then it is distinct. distinct?
makes use of another function, member?, shown next.
(define member?
  (lambda (item lst)
    (if lst
        (or (eq? item (car lst))
            (member? item (cdr lst)))
        0)))
member? determines if its argument item is a member of its argument lst. It says if the list is not
empty, then the item is a member of the list if it is equal to the car of the list or a member of the cdr of
the list. The item is not a member of an empty list. member? uses another function eq? to test equality,
but that’s been added to the interpreter as a primitive, so we’ll leave that for later.
Next up for consideration is xor. xor takes two arguments and returns true only if precisely one of
those arguments is true.
(define xor
  (lambda (x y)
    (or (and x (not y))
        (and y (not x)))))
Lastly for the support routines, our optimized example made use of exclude which returns its second
argument list after removing any items on its first argument list. It’s easy to do now that we have
member?:
(define exclude
  (lambda (items lst)
    (if lst
        (if (member? (car lst) items)
            (exclude items (cdr lst))
            (cons (car lst)
                  (exclude items (cdr lst))))
        ())))
For a non-empty list: if the first element is to be excluded then just return the result of calling exclude
on the rest of the list. If it is not to be excluded, then prepend it to the result of calling exclude on the
rest of the list. For an empty list the only result can be the empty list.
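The four list helpers translate almost word for word into Python, which gives a quick way to check the recursive definitions on small inputs. This is a sketch only: Python lists and booleans stand in for PScheme lists and 0/1, and the names drop the trailing “?”.

```python
def member(item, lst):
    """True if item occurs in lst (car/cdr recursion, as in member?)."""
    return bool(lst) and (item == lst[0] or member(item, lst[1:]))


def distinct(lst):
    """True if no element of lst is repeated; the empty list is distinct."""
    return (not lst) or (not member(lst[0], lst[1:]) and distinct(lst[1:]))


def xor(x, y):
    """True only if exactly one of x and y is true."""
    return bool((x and not y) or (y and not x))


def exclude(items, lst):
    """Return lst without any element that appears in items."""
    if not lst:
        return []
    rest = exclude(items, lst[1:])
    return rest if member(lst[0], items) else [lst[0]] + rest


print(exclude([1, 3], [1, 2, 3, 4, 5]))
print(distinct([1, 2, 3]), distinct([1, 2, 1]))
```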
Barrels of Fun
A wine merchant has six barrels of wine and beer containing:
• 30 gallons
• 32 gallons
• 36 gallons
• 38 gallons
• 40 gallons
• 62 gallons
Five barrels are filled with wine and one with beer. The first customer purchases two barrels
of wine. The second customer purchases twice as much wine as the first customer. Which
barrel contains beer?
Here’s a solution:
(define barrels-of-fun
  (lambda ()
    (let* ((barrels (list 30 32 36 38 40 62))
           (beer (one-of barrels))
           (wine (exclude (list beer) barrels))
           (barrel-1 (one-of wine))
           (barrel-2 (one-of (exclude (list barrel-1) wine)))
           (purchase (some-of (exclude (list barrel-1 barrel-2) wine))))
      (begin
        (require (eq? (* 2 (+ barrel-1 barrel-2))
                      (sum purchase)))
        beer))))
Again, it is more or less just a statement of the problem. We start off by picking the beer barrel at
random. Then we say that the wine barrels are the remaining barrels. Next we randomly pick the first
barrel of wine bought from the wine barrels, and the second from the remaining wine barrels. We don’t
know how many barrels the second customer bought, so we merely assign some-of the remaining barrels
to that purchase. Finally in the body of the let we require that the second customer buys twice as much
wine as the first, then return the beer barrel (the answer is 40 by the way).
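The stated answer can be checked with an exhaustive search. The following Python sketch is not a translation of the amb version: it uses itertools.combinations, so the first customer’s two barrels are treated as an unordered pair, and the helper names non_empty_subsets and beer_barrels are invented for the example.

```python
from itertools import combinations

BARRELS = [30, 32, 36, 38, 40, 62]


def non_empty_subsets(items):
    """Every non-empty subset of items, smallest first."""
    for size in range(1, len(items) + 1):
        yield from combinations(items, size)


def beer_barrels():
    """Yield (beer, first purchase, second purchase) for every consistent choice."""
    for beer in BARRELS:
        wine = [b for b in BARRELS if b != beer]
        for first in combinations(wine, 2):
            rest = [b for b in wine if b not in first]
            for second in non_empty_subsets(rest):
                if 2 * sum(first) == sum(second):
                    yield beer, first, second


answers = list(beer_barrels())
print(answers)
```

Under these assumptions there is a single consistent assignment: the first customer takes the 30 and 36 gallon barrels (66 gallons) and the second takes 32, 38 and 62 (132 gallons).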
We haven’t seen some-of before. It is very similar to one-of described above, and makes direct use
of amb.
(define some-of
  (lambda (lst)
    (begin
      (require lst)
      (amb (list (car lst))
           (some-of (cdr lst))
           (cons (car lst)
                 (some-of (cdr lst)))))))
It requires the list to be non-empty, then chooses between just the first element of the list (as a list), some
of the rest of the list, or the first element prepended to some of the rest of the list. This will eventually
produce all non-empty subsets of the list.
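To see the order in which those subsets appear, the three alternatives can be mirrored by a Python generator. This is an analogy, with generator exhaustion standing in for amb failing, rather than the PScheme code itself.

```python
def some_of(lst):
    """Yield every non-empty subset of lst, mirroring the three-way amb."""
    if not lst:                    # (require lst) fails on the empty list
        return
    head, tail = lst[0], lst[1:]
    yield [head]                   # just the first element, as a list
    yield from some_of(tail)       # some of the rest of the list
    for rest in some_of(tail):     # the first element consed onto some of the rest
        yield [head] + rest


subsets = list(some_of([1, 2, 3]))
print(subsets)
```

A list of n elements gives 2^n - 1 results, so the three remaining barrels in the puzzle offer at most seven candidate purchases.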
The only other function we haven’t seen before is sum. It adds all the values in its argument list and
is quite trivial:
(define sum
  (lambda (lst)
    (if lst
        (+ (car lst)
           (sum (cdr lst)))
        0)))
The sum of a list is the car of the list plus the sum of the cdr of the list. The sum of an empty list is
zero.
((x 6) (y 8) (z 10))
> ?
((x 8) (y 6) (z 10))
> ?
Error: no more solutions
And so it was. After defining square, we pick some ranges of numbers x, y and z, then require that
the sum of the squares of x and y equals the square of z.
Although it is simple and easy to understand, that’s a terribly naive implementation. We just guessed
the range 1...8 for x and y based on a fixed range 1...12 for z. Plus the result includes duplicates:
((x 3) (y 4) (z 5)) is the same as ((x 4) (y 3) (z 5)). Plus, the number of results is constrained by
the highest value of z. Altogether not very satisfactory.
With the addition of a couple more functions, we can remedy all of these deficiencies. Firstly, here’s
a function integers-between that will ambivalently return every number between its lower bound and
its upper bound, in ascending order:
(define integers-between
  (lambda (lower upper)
    (begin
      (require (<= lower upper))
      (amb lower
           (integers-between (+ lower 1) upper)))))
It begins by requiring that its lower bound is less than or equal to its upper bound, then ambivalently
returns first the lower bound, then the result of calling itself with its lower bound incremented by one.
Thinking about it, if we were to remove the bounds check from integers-between it would continue
to produce new integers, one at a time, ad infinitum, and without the bounds check it would have no
need for the upper bound argument. That realisation gives us our second function, integers-from:
(define integers-from
  (lambda (x)
    (amb x
         (integers-from (+ x 1)))))
This function will just carry on returning one integer after another as long as it is backtracked to.
Given these two simple functions we can write a much more satisfactory version of
pythagorean-triples:
(define pythagorean-triples
  (lambda ()
    (let* ((z (integers-from 1))
           (x (integers-between 1 z))
           (y (integers-between x z)))
      (begin
        (require (eq? (+ (square x)
                         (square y))
                      (square z)))
        '((x ,x) (y ,y) (z ,z))))))
It uses let* to make the value of z available to the definition of x, and likewise the value of x available
to the definition of y, much as in the liars puzzle above. It lets z equal each of the positive integers in
turn, then it lets x range over the values 1 to z. Then, to avoid duplication, it only allows y to range over
the values x to z. The rest of the implementation is unchanged. It’s a little slow, but it will continue to
generate unique pythagorean triples as long as you keep asking it for more:
> (pythagorean-triples)
((x 3) (y 4) (z 5))
> ?
((x 6) (y 8) (z 10))
> ?
((x 5) (y 12) (z 13))
> ?
((x 9) (y 12) (z 15))
> ?
((x 8) (y 15) (z 17))
> ?
((x 12) (y 16) (z 20))
> ?
((x 7) (y 24) (z 25))
> ?
...
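The same search order can be reproduced with Python generators, which gives an independent check of the transcript above. This is a sketch, not the PScheme code: itertools.count plays the part of integers-from and range the part of integers-between.

```python
from itertools import count, islice


def pythagorean_triples():
    for z in count(1):                    # integers-from 1: never runs out
        for x in range(1, z + 1):         # integers-between 1 z
            for y in range(x, z + 1):     # integers-between x z, so x <= y
                if x * x + y * y == z * z:
                    yield (x, y, z)


first_seven = list(islice(pythagorean_triples(), 7))
print(first_seven)
```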
To wrap up this section, although it should be obvious, it’s probably worth pointing out that there is a
pitfall to using amb to generate infinite sequences like this. The function integers-from can never fail,
so unless it is the first call to amb in your program, any previous calls to amb will never get backtracked to.
This works out pretty well for pythagorean-triples: since we need the current value of z to constrain
the values of x, the call to integers-from had to happen first, but even if we hadn’t needed the value
of z first, we would still have to have calculated it first, otherwise any previous calls to amb would never
get a chance to yield more than their first result. For example the following just won’t work:
...
(let ((x (integers-from 1))
      (y (integers-from 1))
      (z (integers-from 1)))
  ...
The last call to integers-from to provide the value of z, when backtracked to (by hitting “?” or by
some downstream call to amb), would just keep on producing values, so the declarations of x and y would
never get backtracked to and never produce alternative values.
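The pitfall shows up in exactly the same way with nested infinite loops in Python. In this sketch (an analogy, with the hypothetical name stuck) the innermost loop never finishes, so the two outer loops never advance past their first value.

```python
from itertools import count, islice


def stuck():
    for x in count(1):
        for y in count(1):
            for z in count(1):   # never exhausted, so x and y stay at 1 forever
                yield (x, y, z)


sample = list(islice(stuck(), 5))
print(sample)
```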
any simple bottom-up parser like the one used to parse PScheme itself. In fact it is quite capable of
parsing some restricted subsets of natural language.
To understand what follows, it is essential to realise that even set!, when backtracked through, will
have its effect undone. This is what is meant by “chronological backtracking”: chronological
backtracking really does restore the state of the machine to a previous time, as if nothing since the amb
being backtracked to ever happened. I think that is quite amazing.
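One way to picture an assignment being undone is with explicit success and failure continuations. The Python sketch below is an illustration under assumptions, not the interpreter’s code: set_bang, amb2 and the state dictionary are invented names, and the undo is done by wrapping the incoming failure continuation in one that first restores the old value.

```python
state = {"unparsed": ["fruit", "flies"]}
log = []


def set_bang(name, value, succeed, fail):
    """Assign, but hand on a failure continuation that restores the old value."""
    old = state[name]
    state[name] = value

    def undo():
        state[name] = old      # put things back before backtracking further
        return fail()

    return succeed(value, undo)


def first_branch(succeed, fail):
    def after_set(_value, fail_after_set):
        log.append(("inside first branch", list(state["unparsed"])))
        return fail_after_set()   # behaves like (amb): give up on this branch

    return set_bang("unparsed", state["unparsed"][1:], after_set, fail)


def second_branch(succeed, fail):
    log.append(("inside second branch", list(state["unparsed"])))
    return succeed("done", fail)


def amb2(first, second, succeed, fail):
    """Try `first`; if it fails, fall back to `second`."""
    return first(succeed, lambda: second(succeed, fail))


result = amb2(first_branch, second_branch,
              lambda value, fail: value,
              lambda: "Error: no more solutions")
print(result)
print(log)
```

By the time the second branch runs, the word consumed in the first branch is back in place.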
To start the discussion on parsing, consider the following two English sentences:
• Time flies like an arrow.
• Fruit flies like a banana.
Although superficially very similar, the two sentences have radically different structures and semantics:
Time, “the indefinite continued progress of existence”, is noted to always fly forward in the manner of an
arrow, whereas fruit flies of the genus Drosophila are known to be quite partial to bananas.
This demonstrates quite vividly that it is in fact impossible to correctly parse natural language
without involving semantics, and of course it is impossible to extract the semantics without parsing; a
chicken and egg problem that I hope to show amb can neatly circumvent.
Drawing on old school grammar lessons, Figure 16.2 shows a reasonable parse tree for the first
sentence. It consists of the noun “time” and a verb phrase. The verb phrase consists of the verb “flies”
and a prepositional phrase. The prepositional phrase consists of the preposition “like” and a noun phrase.
The noun phrase consists of the determinant “an” and the noun “arrow”.
[Figure 16.2: parse tree for “Time flies like an arrow”.]
Similarly, Figure 16.3 on the next page shows a parse tree for the second sentence. This time the
sentence breaks down into the classic noun phrase plus verb phrase structure (as did the first, but the
noun phrase just contained a noun). The noun phrase contains the adjective “fruit” and the noun “flies”.
The verb phrase contains the verb “like” and another noun phrase. This second noun phrase consists of
the determinant “a” and the noun “banana”.
[Figure 16.3: parse tree for “Fruit flies like a banana”.]
In order to parse these sentences, we can start off by categorizing the individual words:
The first symbol on each list identifies the type of the rest of the words on the list. Note that a number
of the words occur on more than one of the lists: “like” acts as a preposition in the first sentence, while
it is a verb in the second. Similarly “flies” is the verb in the first sentence, but a noun in the second.
Additionally, I’ve added a couple of categorisations that aren’t needed to parse those sentences correctly,
but would nonetheless be present in a sufficiently general lexicon: “fruit” is certainly a noun, and “time”
is a perfectly acceptable adjective (“time travel” for example). These additional classifications are exactly
what cause us to do that double take when we first encounter these two sentences, and will make the
parsing more realistic.
Next we create a global variable *unparsed* to hold the words remaining to be parsed. This is initially
defined to be empty:
(define parse
  (lambda (input)
    (begin
      (set! *unparsed* input)
      (let ((sentence (parse-sentence)))
        (begin
          (require (not *unparsed*))
          sentence)))))
parse starts by setting the global *unparsed* to its argument. Then it calls parse-sentence, collecting
the result. Finally it requires that there is nothing left in *unparsed* and returns the result of
parse-sentence.
Readers who appreciate the dangers of global state and mutation might be wondering what on earth
is going on here. A function that accepts an argument then just assigns it to a global variable? Worse,
it then proceeds to mutate that global as the parse proceeds? Surely that is the antithesis of good
programming? There is a very sound reason that it is done this way, and that is to demonstrate what
amb is capable of. Please bear with me.
parse will be called like (parse '(fruit flies like a banana)) and should return a parse tree
with the nodes of the tree labelled, like:
We have seen that parse calls parse-sentence, and we shall see shortly that parse-sentence calls
out to other routines (parse-noun-phrase etc.) to further break down the sentence. The various parse-*
routines all indirectly consume tokens from the global *unparsed* variable, but the only function that
directly removes tokens from *unparsed* is the function parse-word:
(define parse-word
  (lambda (words)
    (begin
      (require *unparsed*)
      (require (member? (car *unparsed*) (cdr words)))
      (let ((found-word (car *unparsed*)))
        (begin
          (set! *unparsed* (cdr *unparsed*))
          (list (car words) found-word))))))
The argument words will be one of the lists of words defined above, where the car is the type of the words
and the cdr is the actual words to be recognized. Hence the use of car and cdr to get the appropriate
components.
So parse-word is called like (parse-word nouns) and will succeed and return a list of a type and a
word if the first word of *unparsed* is one of its argument words. For example if *unparsed* is
'(flies like an arrow) and we call (parse-word nouns) it should return the list (noun flies) and as a
side effect set *unparsed* to '(like an arrow).
parse-word requires that there are tokens left to parse, then requires that the first word of
*unparsed* is a member of its list of candidate words. If so then it removes the first word from
*unparsed* and returns it, paired with the category of words that matched. If there are no words left
to parse, or if the next word in *unparsed* is not one of the argument words, then parse-word fails
and control backtracks to the previous decision point where the next alternative is tried. It is important
to remember here that the effect of set! on *unparsed* can be undone by the backtracking of amb.
Back to parse. parse calls parse-sentence:
(define parse-sentence
  (lambda ()
    (amb (list 'sentence
               (parse-word nouns)
               (parse-verb-phrase))
         (list 'sentence
               (parse-noun-phrase)
               (parse-verb-phrase)))))
parse-sentence ambivalently chooses to parse either the structure of the first sentence or the structure
of the second. It prepends the result with the appropriate grammatical label just as parse-word did.
Since the second part of both sentences is the same (a verb phrase) we could equivalently have said:
(define parse-sentence
  (lambda ()
    (list 'sentence
          (amb (parse-word nouns)
               (parse-noun-phrase))
          (parse-verb-phrase))))
In fact this second formulation is likely to be more efficient since it doesn’t have to backtrack through
parse-verb-phrase unnecessarily.
Next let’s look at parse-verb-phrase. Our two example verb phrases are different. The first consists
of a verb and a prepositional phrase, the second consists of a verb and a noun phrase. We can combine the
two, eliminating the duplication on verbs for a slightly more efficient parse. Here’s parse-verb-phrase:
(define parse-verb-phrase
  (lambda ()
    (list 'verb-phrase
          (parse-word verbs)
          (amb (parse-prep-phrase)
               (parse-noun-phrase)))))

(define parse-noun-phrase
  (lambda ()
    (list 'noun-phrase
          (amb (parse-word adjectives)
               (parse-word determinants))
          (parse-word nouns))))
We have two example noun phrases: an adjective followed by a noun and a determinant followed by a
noun. Again we’ve removed the duplication, this time on the noun.
Lastly, we have to parse prepositional phrases, of which we have only one example: a preposition
followed by a noun phrase:
(define parse-prep-phrase
  (lambda ()
    (list 'prep-phrase
          (parse-word prepositions)
          (parse-noun-phrase))))
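The shape of these parse-* functions can be mirrored in Python with generators. What follows is a sketch under stated assumptions and not the PScheme parser: the LEXICON contents and the category labels are reconstructed from the prose description of the word lists, and instead of mutating a global and relying on backtracking to undo it, each function passes the remaining words along explicitly and yields (tree, remaining words) pairs.

```python
# Assumed lexicon, reconstructed from the prose; not the book's actual lists.
LEXICON = {
    "noun": {"time", "fruit", "flies", "banana", "arrow"},
    "verb": {"flies", "like"},
    "prep": {"like"},
    "adjective": {"time", "fruit"},
    "determinant": {"a", "an"},
}


def parse_word(kind, words):
    """Consume one word of the given kind, or yield nothing (fail)."""
    if words and words[0] in LEXICON[kind]:
        yield (kind, words[0]), words[1:]


def parse_noun_phrase(words):
    for kind in ("adjective", "determinant"):
        for first, rest in parse_word(kind, words):
            for noun, rest2 in parse_word("noun", rest):
                yield ("noun-phrase", first, noun), rest2


def parse_prep_phrase(words):
    for prep, rest in parse_word("prep", words):
        for phrase, rest2 in parse_noun_phrase(rest):
            yield ("prep-phrase", prep, phrase), rest2


def parse_verb_phrase(words):
    for verb, rest in parse_word("verb", words):
        for parser in (parse_prep_phrase, parse_noun_phrase):
            for tail, rest2 in parser(rest):
                yield ("verb-phrase", verb, tail), rest2


def parse(sentence):
    """Yield every parse tree that consumes the whole sentence."""
    words = tuple(sentence.split())
    subjects = list(parse_word("noun", words)) + list(parse_noun_phrase(words))
    for subject, rest in subjects:
        for phrase, rest2 in parse_verb_phrase(rest):
            if not rest2:      # like (require (not *unparsed*))
                yield ("sentence", subject, phrase)


time_parses = list(parse("time flies like an arrow"))
fruit_parses = list(parse("fruit flies like a banana"))
print(time_parses[0])
print(fruit_parses[1])
```

With this assumed lexicon each sentence has two grammatical readings, which is the ambiguity the extra noun and adjective entries were added to expose.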
With these definitions in place, we can attempt to parse our two sentences (output reformatted manually
to aid readability):
So you’ve seen what amb can do. The rest of this chapter discusses its implementation.
The default Eval() method in PScm::Expr demonstrates the simplest kind of transformation. The
previous version simply called its continuation on $self, so by default expressions evaluate to themselves.
The new amb version takes an extra $fail continuation as argument, and passes it along to the original
continuation as an extra argument:
Next up, let’s take a look at transforming an example method that creates a new continuation. The
PScm::SpecialForm::Let::Apply() method does that. It extends the current environment with the
new bindings for the let expression, passing a continuation that will evaluate the body of the let in
that new environment. The new version for amb is not that different. As you can see all the method
calls that used to take a single continuation as argument now take an extra $fail continuation, and the
original continuations themselves now take an extra $fail continuation, passing it to any method that
now expects it. Otherwise, it’s unchanged:
Please note however that there are two $fail variables here. The first one is passed to Apply() as
argument on Line 14 and gets passed on as an additional argument to Extend() on Line 24. The second
$fail is argument to the new continuation on Line 21 and is passed on as an additional argument to
Eval() on Line 22. It is very important that these two $fail variables are kept distinct.
Before we finally get around to some code that actually does more than just pass the failure
continuation around, let’s take a look at a fairly involved use of continuations, and the (still mechanical)
transformation that amb requires of it. In Section 8.5.1 on page 92 we introduced
PScm::Expr::List::Pair::map_eval(), which evaluates each component of its list and returns an arrayref
of those evaluated components. That method was introduced even earlier in our cps rewrite in Section
13.6.2 on page 188 and was finally reunited with its original list implementation in Section 13.6.5 on
page 206 where it deals with both continuations and true PScheme lists. Here’s
PScm::Expr::List::Pair::map_eval() so far:
sub map_eval {
    my ($self, $env, $cont) = @_;

    $self->[FIRST]->Eval(
        $env,
        cont {
            my ($evaluated_first) = @_;
            $self->[REST]->map_eval(
                $env,
                cont {
                    my ($evaluated_rest) = @_;
                    $cont->Cont($self->Cons($evaluated_first,
                                            $evaluated_rest));
                }
            );
        }
    );
}
And here it is after the amb changes:
sub map_eval {
    my ($self, $env, $cont, $fail) = @_;

    $self->[FIRST]->Eval(
        $env,
        cont {
            my ($evaluated_first, $fail) = @_;
            $self->[REST]->map_eval(
                $env,
                cont {
                    my ($evaluated_rest, $fail) = @_;
                    $cont->Cont(
                        $self->Cons($evaluated_first,
                                    $evaluated_rest),
                        $fail
                    );
                },
                $fail
            );
        },
        $fail
    );
}
It’s a bit longer, but I hope you can see that the only change is that extra $fail argument alongside
each passed continuation, and as an extra argument to any continuation which is actually called. Note
again that it’s very important that each continuation actually declares its extra argument. Although the
same $fail variable name is used throughout, the actual scope of each variable is different, and could
easily have a different value. Having said that, this is the main reason that this rewrite is so mechanical.
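To see why each continuation has to declare its own failure argument, here is a small Python model of the same shape. It is an illustration under assumptions rather than the Perl code: expressions are modelled as functions taking (cont, fail), amb here chooses between plain values, and map_eval_buggy is a deliberately broken variant in which the inner continuations reuse the outermost fail.

```python
def amb(*choices):
    """An 'expression' that offers each value in turn when backtracked to."""
    def run(cont, fail):
        if not choices:
            return fail()
        return cont(choices[0], lambda: amb(*choices[1:])(cont, fail))
    return run


def map_eval(exprs, cont, fail):
    if not exprs:
        return cont([], fail)

    def after_first(first, fail):        # declares its own fail
        def after_rest(rest, fail):      # and so does this one
            return cont([first] + rest, fail)
        return map_eval(exprs[1:], after_rest, fail)

    return exprs[0](after_first, fail)


def map_eval_buggy(exprs, cont, fail):
    if not exprs:
        return cont([], fail)

    def after_first(first, _ignored):
        def after_rest(rest, _ignored):
            return cont([first] + rest, fail)               # BUG: outer fail
        return map_eval_buggy(exprs[1:], after_rest, fail)  # BUG: outer fail

    return exprs[0](after_first, fail)


def all_results(evaluator, exprs):
    results = []

    def collect(value, fail):
        results.append(value)
        return fail()                    # ask for another, like typing "?"

    evaluator(exprs, collect, lambda: None)
    return results


good = all_results(map_eval, [amb(1, 2), amb("a", "b")])
bad = all_results(map_eval_buggy, [amb(1, 2), amb("a", "b")])
print(good)
print(bad)
```

In the broken variant the choice points installed by the two amb expressions are thrown away, so the first request for another solution falls straight through to the outermost failure continuation.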
We have to manually import the cont and bounce subroutines from PScm::Continuation because
they’re in the same file (a failure of use base). Then on Lines 45-55 we see the Apply() method for
continuations (remember call/cc presents continuations as functions so they need an Apply() method).
Apply() is unchanged except for the passing of the extra $fail continuation. This means that the failure
continuation is kept track of even through the use of call/cc.
Lastly the Cont() method is similarly unchanged except that it uses bounce{} instead of cont{}
to create a PScm::Continuation::Bounce for the trampoline, and of course it has the extra $fail
continuation to pass on.
PScm::Continuation::Fail is somewhat shorter:
Again we must manually import the bounce{} construct that we need, but then the Fail() method,
which takes no arguments, merely returns a bounce{} continuation to the trampoline that will invoke
the failure continuation with no arguments.
PScm::Continuation::Bounce is even shorter. Its single Bounce() method, again with no
arguments, directly invokes its stored continuation as it always did.
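The division of labour between the three classes can be sketched in Python. This is a loose model under assumptions, not a translation of the Perl: Bounce, trampoline, call_cont and call_fail are illustrative names, with call_cont and call_fail playing roughly the roles described for Cont() and Fail().

```python
class Bounce:
    """A deferred call, handed back to the trampoline instead of being made."""

    def __init__(self, thunk):
        self.thunk = thunk

    def bounce(self):
        return self.thunk()


def trampoline(step):
    """Keep bouncing until something other than a Bounce comes back."""
    while isinstance(step, Bounce):
        step = step.bounce()
    return step


def call_cont(cont, value, fail):
    """Success: defer calling cont with a value and the failure continuation."""
    return Bounce(lambda: cont(value, fail))


def call_fail(fail):
    """Failure: defer calling the failure continuation with no arguments."""
    return Bounce(lambda: fail())


def count_down(n, cont, fail):
    """A long chain of steps that would overflow the stack if made directly."""
    if n == 0:
        return call_cont(cont, "done", fail)
    return Bounce(lambda: count_down(n - 1, cont, fail))


def give_up():
    return "Error: no more solutions"


finished = trampoline(count_down(100_000, lambda value, fail: value, give_up))
failed = trampoline(call_fail(give_up))
print(finished)
print(failed)
```

Because every step returns to the loop in trampoline before the next one runs, the Python call stack stays shallow however long the chain is.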
So back to the rewrite. How about actually doing something with the failure continuation? As I’ve said,
there are only a few places in the interpreter where a new failure continuation is constructed, namely
• in the repl.
These are the only places in the interpreter where the amb rewrite is not purely mechanical. We’ll go
through these cases in the same order, starting with PScm::SpecialForm::Amb::Apply().
It’s really not that bad. It takes the same arguments as any normal Apply() method, including the
extra failure continuation. On Line 485 it tests to see if the argument $choices (the actual arguments
to the amb function) is the empty list. If $choices is not empty, then on Line 486 it evaluates the first
choice, passing the original success continuation $cont which will return the result downstream to the
caller. But instead of just passing in its argument $fail continuation, on Lines 489-496 it passes a new
fail{} continuation that will, if backtracked to, call Amb::Apply() again on the rest of the arguments.
Note that on Line 494 the new failure continuation passes amb’s original failure continuation to Amb::
Apply(). So if amb itself decides to backtrack by calling that, control will pass immediately back to
whatever failure continuation was in place before amb installed this new one. If on the other hand
the new failure continuation is ever invoked downstream of this, it will cause control to proceed back
upstream to this occurrence of amb which then returns its next value back downstream via Apply() to
the current success continuation.
16.3. IMPLEMENTING AMB 273
If the list of arguments is empty, then on Line 499 Apply() invokes its original argument $fail
continuation causing execution to immediately backtrack further upstream.
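To make the shape of that control flow concrete, here is a minimal sketch of the same idea using plain Perl closures. It is not the interpreter’s code: amb_apply() is a hypothetical stand-in for Amb::Apply(), ordinary anonymous subs stand in for cont{} and fail{}, the choices are plain Perl values rather than expressions to be evaluated, and there is no trampoline, so every call is a direct call.

```perl
use strict;
use warnings;

# amb_apply(\@choices, $cont, $fail)  (hypothetical name)
#   $cont->($value, $fail) is the success continuation,
#   $fail->()              is the failure continuation.
sub amb_apply {
    my ($choices, $cont, $fail) = @_;

    # No choices left: backtrack further upstream.
    return $fail->() unless @$choices;

    my ($first, @rest) = @$choices;

    # Send the first choice downstream. The new failure continuation
    # retries with the remaining choices and still carries the
    # original $fail for when those run out.
    return $cont->($first, sub { amb_apply(\@rest, $cont, $fail) });
}

# Find the first pair (x, y) with x + y == 7.
my $result = amb_apply(
    [1, 2, 3],
    sub {
        my ($x, $fail_x) = @_;
        amb_apply(
            [4, 5, 6],
            sub {
                my ($y, $fail_y) = @_;
                # A failed test backtracks by calling the failure continuation.
                return $x + $y == 7 ? [$x, $y] : $fail_y->();
            },
            $fail_x
        );
    },
    sub { 'no more solutions' }
);

print "@$result\n";    # prints "1 6"
```

When the inner amb runs out of choices it calls $fail_x, which is how control gets back to the outer amb for its next value.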
define in the previous version evaluated its value part in the current environment, passing a continuation
that would call the top environment frame’s Define() method on the symbol and the result. This new
version must additionally keep track of the previous value of the symbol, if any, and arrange that its failure
continuation restores that value before backtracking further. Apart from the extra $fail argument, the
first thing that is new is that on Line 334 it calls a new method PScm::Env::LookUpHere() which only
looks in the top frame and returns either the value of the argument symbol or undef. Then things
proceed as normal apart from the extra $fail continuation until Lines 342-349 where a replacement
fail{} continuation is passed to define’s original success continuation.
That new fail{} continuation checks to see if the $old value is defined. If it is, then it calls
Assign() on the environment to restore the old value. If it is not defined (there was no previous value)
then it must call a new method of PScm::Env, UnSet(), to remove the binding from the top frame. In
either case, it finally returns through the original $fail continuation to backtrack further upstream.
The location of the fail{} is quite subtle; in fact an earlier version of this code had a bug that
went unnoticed for a considerable time. Consider the following PScheme fragment (assume x is already
defined):
Obviously, when backtracking, we want the previous value of x to be restored before we cons the next
value from amb on to it, otherwise we would be breaking the semantics of chronological backtracking, i.e.
if x starts out as (5), then after the first time through it will obviously be (1 5), and the second time
through it should be (2 5).
Now, referring to Figure 16.4, think about the order that things happen here. Passing continuations
is much like tearing a function into two or more pieces: the first piece is the “head” of the function,
before it makes any calls of its own. The remaining pieces are the continuations that it passes to the
functions that it calls. This figure omits many details, but you can see that define calls cons which
calls amb, then amb calls the continuation of cons which in turn calls the waiting continuation of define.
By the way, this is another example of CPS being a simplification in that it linearizes control flow.
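[Figure 16.4: control flow through define, cons and amb, with the points 0, 1 and 2 marked where a failure continuation could be installed.]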
In this figure, “downstream” is left to right and “upstream” is right to left. Additionally the circles
represent new failure continuations being created and passed downstream. If control backtracks upstream
into this piece of code, it will first encounter the most recently installed, i.e. the rightmost failure
continuation. You can see that amb installs a new failure continuation at 1, and that in order for define
to have its failure continuation supplant the one set up by amb it must be created downstream of amb’s.
Therefore it is define’s success continuation that must install the failure continuation, at 2 in the figure.
If instead the initial code on entry to define had installed the failure continuation at 0 (by passing
it as the last argument to the outermost Eval), then backtracking would find amb’s failure continuation
first, and define would not get a chance to undo its effect before amb sent its next value downstream
again.
That was the bug of course, setting up the failure continuation at 0 instead of 2—it works almost all
of the time, unless evaluation of the second argument to define or set! results in a call to amb.
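The ordering can be checked with a small model in plain Perl. This is only a sketch under loud assumptions: %env is a single flat hash standing in for the environment, define_cps() and amb_apply() are hypothetical helpers rather than the interpreter’s methods, there is no trampoline, and the expression being defined is assumed to be of the form (cons (amb 1 2) x), which matches the description above but is not taken from the original listing. The undo is installed inside define’s success continuation (point 2), so backtracking reaches it before it reaches amb’s failure continuation.

```perl
use strict;
use warnings;

my %env = (x => [5]);    # x starts out as (5)
my @seen;                # values of x observed downstream of define

sub amb_apply {
    my ($choices, $cont, $fail) = @_;
    return $fail->() unless @$choices;
    my ($first, @rest) = @$choices;
    return $cont->($first, sub { amb_apply(\@rest, $cont, $fail) });
}

# define_cps($name, $eval_value, $cont, $fail)  (hypothetical name)
sub define_cps {
    my ($name, $eval_value, $cont, $fail) = @_;
    my $old = $env{$name};    # remember any previous value
    return $eval_value->(
        sub {
            my ($value, $fail_from_value) = @_;
            $env{$name} = $value;
            # Point 2: the undo is created downstream of amb, so it
            # is the first failure continuation found on the way back.
            return $cont->(
                $name,
                sub {
                    if (defined $old) { $env{$name} = $old }
                    else              { delete $env{$name} }
                    $fail_from_value->();    # then backtrack into amb
                }
            );
        },
        $fail
    );
}

my $final = define_cps(
    'x',
    sub {    # models (cons (amb 1 2) x)
        my ($cont, $fail) = @_;
        amb_apply(
            [1, 2],
            sub {
                my ($n, $fail_amb) = @_;
                $cont->([$n, @{ $env{x} }], $fail_amb);
            },
            $fail
        );
    },
    sub {
        my ($name, $fail) = @_;
        push @seen, "(@{ $env{x} })";
        $fail->();    # behave like "?": ask for another solution
    },
    sub { 'no more solutions' }
);

print "@seen\n";    # prints "(1 5) (2 5)"
```

Moving the undo into the $fail passed to $eval_value (point 0) would let amb cons its second value onto the already modified x, giving (2 1 5) the second time through.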
        cont {
            my ($expr) = @_;
            $expr->Eval(
                $env,
                cont {
                    my ($result) = @_;
                    $result->Print(
                        $outfh,
                        cont {
                            repl($env, $reader, $outfh);
                        }
                    )
                }
            )
        }
    )
}
As I’ve said before, it’s really just Read() called with a continuation that calls Eval() with a continuation
that calls Print() with a continuation that calls repl() again. As before there are going to be extra
failure continuations passed around, but that part of the rewrite is purely mechanical. The additional
complications are because repl() must also install the final upstream failure continuations, and
must check if the expression just read is a “?” request to backtrack. Bearing all that in
mind it’s really not too bad:
109 )
110 },
111 $fail1
112 )
113 }
To aid readability somewhat, I’ve named the various occurrences of the failure continuation separately:
$fail1, $fail2 etc. They could all just be called $fail without breaking anything, but it would be
more confusing.
The $fail1 argument to repl() is optional. Neither Error() nor the new_thread() call that initially
installs the repl on the trampoline bother to pass one. If no failure continuation is passed, then on Line
89 repl() defaults it to a call to Error() with a “no current problem” message.
Then, as before repl() calls Read() with a continuation that will call Eval() etc. Read() itself
changes slightly however: if it reads a “?” it will invoke the current failure continuation.
If the expression read is not a retry request, then everything proceeds as normal, bar the extra failure
continuations: Eval() is called with a continuation that calls Print() with a continuation that calls
repl() again. Note that on Line 101 the continuation passed to Print() calls repl() with its argument
failure continuation $fail4, which is how backtracking works when a “?” is read subsequently.
One last thing to notice. On Lines 106-108 the failure continuation passed to Eval() produces the
“no more solutions” error, which will be printed if required before repl() reinstates the default “no
current problem” failure.
As mentioned above, Read() has changed a little. Here’s the new definition:
Before returning its value, Read() must first check that the expression it is about to pass to its success
continuation is not “?”. On Line 98 Read() checks to see if the expression returned by read() is a retry
request. is_retry() is defined to return false in PScm::Expr, but PScm::Expr::Symbol redefines
this to return true if the symbol’s value() is “?”. If it is a retry request, Read() invokes its argument
failure continuation. At the very start this will be the “no current problem” error, so typing “?” at a
fresh PScheme prompt will produce this error.
That’s really all there is to amb. The rest of this section joins the dots by showing the support routines
that I’ve glossed over.
LookUpHere() just checks the current frame to see if the binding exists. It is called by our new define
to save any previous value before define replaces it.
Next is LookUpNoError():
It uses the existing lookup_ref() method to locate the symbol, either dereferencing and returning the
value if it was found, or returning undef. LookUpNoError() is called by set! before assigning a new
value to the found variable.
The other addition to PScm::Env was an UnSet() method which would remove a binding from the
environment.
This method just deletes a binding from the current frame. It is only called by define when backtracking
to remove the setting that define added to the current environment frame, so it need not, and should
not recurse.
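As a rough illustration of how these three methods relate, here is a self-contained sketch of a chained environment. Everything about its layout is an assumption made for the example: Sketch::Env is a hypothetical class, each frame is a hash with a bindings table and a parent link, and the real PScm::Env may be organised quite differently (for one thing, this LookUpNoError() walks the chain directly instead of going through lookup_ref()).

```perl
use strict;
use warnings;

package Sketch::Env;    # hypothetical stand-in, not PScm::Env

sub new {
    my ($class, $parent, %bindings) = @_;
    return bless { bindings => { %bindings }, parent => $parent }, $class;
}

# Look only in the top frame; undef if the symbol is not bound here.
sub LookUpHere {
    my ($self, $name) = @_;
    return $self->{bindings}{$name};
}

# Search the whole chain of frames; undef instead of an error
# if the symbol is not bound anywhere.
sub LookUpNoError {
    my ($self, $name) = @_;
    for (my $env = $self; $env; $env = $env->{parent}) {
        return $env->{bindings}{$name} if exists $env->{bindings}{$name};
    }
    return undef;
}

# Remove a binding from the top frame only; deliberately no recursion.
sub UnSet {
    my ($self, $name) = @_;
    delete $self->{bindings}{$name};
    return;
}

package main;

my $global = Sketch::Env->new(undef, x => 1);
my $local  = Sketch::Env->new($global, y => 2);

print defined $local->LookUpHere('x') ? "found\n" : "not in top frame\n";
print $local->LookUpNoError('x'), "\n";    # found in the parent frame
```

Here x is visible through LookUpNoError() but not through LookUpHere(), which is the distinction define relies on when it decides between restoring an old value and calling UnSet().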
Finally, and most trivially, there is an is_retry() method of PScm::Expr, so that the continuation
passed to Read() can ask politely if the expression just read is a request to backtrack (“?”). The base
PScm::Expr class defines this to be false as a default:
But PScm::Expr::Symbol redefines this to return true if the symbol’s value() is "?".
16.4. SUPPORT FOR TESTING AMB 279
• and and or. Both are special forms so that arguments do not get evaluated unnecessarily and
short-circuit evaluation is possible.
• A general equality test eq?. This function will work for numbers, symbols, strings and lists (two
lists are considered eq? if their cars are eq? and their cdrs are eq?)7 .
PScm::SpecialForm::Begin inherits from that to get its Apply() method. Its apply_next() is
unchanged other than having the extra failure continuation:
7 This differs from Scheme, which has separate equality tests for symbols and lists for efficiency reasons.
294 $fail
295 );
296 }
PScm::SpecialForm::Or behaves similarly. It evaluates each of its arguments until one of them is
true, in which case it returns that result. If all of its arguments are false, it returns false.
So they all take an arbitrary number of arguments like the arithmetic primitives. For example (<=
2 3 3 4) is true because each argument is “<=” the next argument. apply() iterates over its argu-
ments, checking each one is a number and calling a separate compare() method on each pair. The
compare() method called on Line 170 is implemented separately by each of PScm::Primitive::Lt,
PScm::Primitive::Gt, PScm::Primitive::Le and PScm::Primitive::Ge. They all go exactly the
same way, so for example here’s PScm::Primitive::Lt:
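Independently of the actual classes, the pairwise iteration described above can be sketched in a few lines of plain Perl. This is not the listing for PScm::Primitive::Lt: chain_compare() and %compare are names made up for the illustration, a table of anonymous subs stands in for the four subclasses, and the check that each argument is a number is left out.

```perl
use strict;
use warnings;

# One comparison per operator, standing in for the per-class
# compare() methods.
my %compare = (
    '<'  => sub { $_[0] <  $_[1] },
    '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] },
    '>=' => sub { $_[0] >= $_[1] },
);

# True (1) only if the comparison holds between every argument
# and the argument that follows it.
sub chain_compare {
    my ($op, @args) = @_;
    my $test = $compare{$op} or die "unknown operator $op\n";
    for my $i (0 .. $#args - 1) {
        return 0 unless $test->($args[$i], $args[$i + 1]);
    }
    return 1;
}

print chain_compare('<=', 2, 3, 3, 4), "\n";    # prints 1, like (<= 2 3 3 4)
print chain_compare('<',  1, 2, 3, 3), "\n";    # prints 0, like (< 1 2 3 3)
```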
16.4.3 eq?
Finally eq?. The eq? implementation is a bit more interesting. It can be used to compare any PScheme
data types that inherit from PScm::Expr. Equality is a relative term, however. For instance, unlike in
Perl, a string and a number will never be considered equal, but two lists with the same content are
considered equal. Here’s the new PScm::Primitive::Eq class:
As you can see, like the inequality tests above, it will take an arbitrary number of arguments. Apply()
keeps comparing adjacent arguments by calling their Eq() method until a test fails, or all tests pass.
The Eq() method is defined differently for various types of PScm::Expr. A default method in the base
PScm::Expr just compares object identity:
sub Eq {
    my ($self, $other) = @_;
    return $self == $other;
}
This means that, for example, two functions with the same arguments, env and body would still not be
considered equal. This could be fixed, but I’m not sure it’s worth it.
Anyway PScm::Expr::Atom overrides this Eq() method to do a string comparison on the (scalar)
values of the two objects, first checking that the two objects are of the same type. This is good enough
for strings, numbers and symbols:
sub Eq {
    my ($self, $other) = @_;
    return 0 unless $other->isa(ref($self));
    return $self->value eq $other->value;
}
PScm::Expr::List::Pair::Eq() is more interesting. Firstly it does a quick check for object identity,
that will save unnecessary recursion if the two objects are actually the same object. Then it checks that
the object is a list, and finally it recursively calls itself on both first() and rest() to complete the
test:
sub Eq {
    my ($self, $other) = @_;
    return 1 if $self == $other;
    return 0 unless $other->is_pair;
    return $self->[FIRST]->Eq($other->[FIRST]) &&
           $self->[REST]->Eq($other->[REST]);
}
Last of the Eq() methods is in PScm::Expr::List::Null. This method returns true only if the other
object is also a PScm::Expr::List::Null, since null is only equal to null:
sub Eq {
    my ($self, $other) = @_;
    return $other->is_null;
}
16.4.4 Wiring it up
Finally, here are the additional methods wired in to ReadEvalPrint(). You can also see that on Line
67 the new_thread() routine installs a bounce{} continuation that starts the repl. That continuation
doesn’t pass a failure continuation to repl(), so repl() will default that to the Error: no current
problem error.
081 PScm::Class::Root->new($initial_env)
082 );
083 __PACKAGE__->new_thread(bounce { repl($initial_env, $reader, $outfh) });
084 trampoline();
085 }
thing? Well, imagine a language where all the control flow (for and while loops, break, continue,
return etc.) were implemented by continuations. Then a for loop would install break and continue
continuations (and uninstall them again), a subroutine would install a return continuation, etc.
Even more exciting, consider an environment of continuations as a parameter object. Then for
example nested for loops would push and pop their break and continue continuations. It would then
be relatively easy to break or continue or return to an arbitrary containing point.
These alternatives, while exciting, are left as an open exercise should you wish to pursue them.
16.6 Tests
The first set of tests in Listing 16.7.1 on the next page tries out our equality and inequality operators. It’s
nice to know they all work as expected.
The next set of tests in Listing 16.7.2 on page 289 exercises the new repl itself, ensuring that the
appropriate error messages are produced after various requests for backtracking, and that the repl recovers
gracefully in all situations.
The last set of tests, in Listing 16.7.3 on page 290, gives amb a thorough workout. It tests most of the
examples that we have seen in this chapter, plus a few more for good measure. Additionally, it tests that
set! and define do in fact undo their assignments in the face of backtracking.
16.7. LISTINGS 287
16.7 Listings
16.7.1 t/PScm_Compare.t
use strict;
use warnings;
use Test::More;
use lib './t/lib';
use PScm::Test tests => 38;

BEGIN { use_ok('PScm') }

eval_ok('(eq? 1 1)', '1', 'eq numbers');
eval_ok('(eq? 1 2)', '0', 'neq numbers');
eval_ok('(eq? 1 "1")', '0', 'neq numbers and strings');
eval_ok("(eq? 1 'a)", '0', 'neq numbers and symbols');
eval_ok("(eq? 1 (list 1))", '0', 'neq numbers and lists');

eval_ok('(eq? "a" "a")', '1', 'eq strings');
eval_ok('(eq? "a" "b")', '0', 'neq strings');
eval_ok('(eq? "1" 1)', '0', 'neq strings and numbers');
eval_ok('(eq? "a" \'a)', '0', 'neq strings and symbols');
eval_ok('(eq? "a" (list "a"))', '0', 'neq strings and lists');

eval_ok("(eq? 'a 'a)", '1', 'eq symbols');
eval_ok("(eq? 'a 'b)", '0', 'neq symbols');
eval_ok("(eq? 'a 1)", '0', 'neq symbols and numbers');
eval_ok('(eq? \'a "a")', '0', 'neq symbols and strings');
eval_ok("(eq? 'a (list 'a))", '0', 'neq symbols and lists');

eval_ok("(eq? (list 1 2) (list 1 2))", '1', 'eq lists');
eval_ok("(eq? (list 1 2) (list 1 2 3))", '0', 'neq lists');
eval_ok("(eq? (list 1) 1)", '0', 'neq lists and numbers');
eval_ok('(eq? (list "a") "a")', '0', 'neq lists and strings');
eval_ok("(eq? (list 'a) 'a)", '0', 'neq lists and symbols');

eval_ok("(eq? () ())", '1', 'eq empty lists');
eval_ok("(eq? () (list 1))", '0', 'neq empty lists');
eval_ok("(eq? 1 1 1 1)", '1', 'eq multiple arguments');
eval_ok("(eq? 1 1 1 2)", '0', 'neq multiple arguments');

eval_ok("(< 1 2 3 4)", '1', '< multiple arguments');
eval_ok("(< 1 2 3 3)", '0', '!< multiple arguments');

eval_ok("(<= 1 2 3 3)", '1', '<= multiple arguments');
eval_ok("(<= 1 2 3 2)", '0', '!<= multiple arguments');

eval_ok("(> 4 3 2 1)", '1', '> multiple arguments');
eval_ok("(> 4 3 2 2)", '0', '!> multiple arguments');

eval_ok("(>= 4 3 2 2)", '1', '>= multiple arguments');
eval_ok("(>= 4 3 2 3)", '0', '!>= multiple arguments');
EOR

# Baker, Cooper, Fletcher, Miller and Smith live on different
# floors of a five-storey building. Baker does not live on the
# top floor. Cooper does not live on the bottom floor. Fletcher
# does not live on the top or the bottom floor. Miller lives
# on a higher floor than Cooper. Smith does not live on a
# floor adjacent to Fletcher's. Fletcher does not live on a floor
# adjacent to Cooper's. Where does everyone live?

eval_ok(<<EOT, <<EOR, 'amb');
$prereqs

(define multiple-dwelling
  (lambda ()
    (let* ((baker (one-of (list 1 2 3 4)))
           (cooper (one-of (exclude (list baker) (list 2 3 4 5))))
           (fletcher (one-of (exclude (list baker cooper) (list 2 3 4))))
           (miller (one-of (exclude (list baker cooper fletcher)
                                    (list 1 2 3 4 5))))
           (smith (car (exclude (list baker cooper fletcher miller)
                                (list 1 2 3 4 5)))))
      (begin
        (require (> miller cooper))
        (require (not (eq? (difference smith fletcher) 1)))
        (require (not (eq? (difference cooper fletcher) 1)))
        (list (list 'baker baker)
              (list 'cooper cooper)
              (list 'fletcher fletcher)
              (list 'miller miller)
              (list 'smith smith))))))

(multiple-dwelling)
EOT
$prereqs_output
multiple-dwelling
((baker 3) (cooper 2) (fletcher 4) (miller 5) (smith 1))
EOR

eval_ok(<<EOF, <<EOR, 'Liars (optimized)');
$prereqs
(define liars
  (lambda ()
    (let* ((betty (amb 1 2 3 4 5))
           (ethel (one-of (exclude (list betty)
                                   (list 1 2 3 4 5))))
           (joan (one-of (exclude (list betty ethel)
                                  (list 1 2 3 4 5))))
           (kitty (one-of (exclude (list betty ethel joan)
                                   (list 1 2 3 4 5))))
           (mary (car (exclude (list betty ethel joan kitty)
                               (list 1 2 3 4 5)))))
      (begin
        (require (xor (eq? kitty 2) (eq? betty 3)))
        (require (xor (eq? ethel 1) (eq? joan 2)))
        (require (xor (eq? joan 3) (eq? ethel 5)))
        (require (xor (eq? kitty 2) (eq? mary 4)))
        (require (xor (eq? mary 4) (eq? betty 1)))
        '((betty ,betty)
          (ethel ,ethel)
          (joan ,joan)
          (kitty ,kitty)
          (mary ,mary))))))
(liars)
?
EOF
$prereqs_output
liars
((betty 3) (ethel 5) (joan 2) (kitty 1) (mary 4))
Error: no more solutions
EOR

eval_ok(<<EOF, <<EOR, 'set! backtracking');
(let ((x 1))
  (let ((y (amb 'a 'b)))
    (begin
      (print (list 'x x))
      (set! x 2)
      (print (list 'x x))
      y)))
?
EOF
(x 1)
(x 2)
a
(x 1)
(x 2)
b
EOR

eval_ok(<<EOF, <<EOR, 'define backtracking');
(define x 1)
(let ((y (amb 'a 'b)))
  (begin
    (print (list 'x x))
    (define x 2)
    (print (list 'x x))
    y))
?
EOF
x
(x 1)
(x 2)
a
(x 1)
(x 2)
b
EOR

eval_ok(<<EOT, <<EOR, 'parsing');
$prereqs

(define proper-nouns '(john paul))
(define nouns '(car garage))
(define auxilliaries '(will has))
(define verbs '(put))
(define articles '(the a his))
(define prepositions '(in to with))
(define degrees '(very quite))
(define adjectives '(red green old new))

(define parse-sentance
  (lambda ()
    (amb (list (parse-noun-phrase)
               (parse-word auxilliaries)
               (parse-verb-phrase))
         (list (parse-noun-phrase)
               (parse-verb-phrase)))))

(define parse-noun-phrase
  (lambda ()
    (amb (parse-word proper-nouns)
         (list (parse-word articles)
               (parse-adj-phrase)))))

(define parse-adj-phrase
  (lambda ()
    (amb (list (parse-deg-phrase)
               (parse-adj-phrase))
         (parse-word nouns))))

(define parse-deg-phrase
  (lambda ()
    (amb (list (parse-word degrees)
               (parse-deg-phrase))
         (parse-word adjectives))))

(define parse-verb-phrase
  (lambda ()
    (list (parse-word verbs)
          (parse-noun-phrase)
          (parse-prep-phrase))))

(define parse-prep-phrase
  (lambda ()
    (list (parse-word prepositions)
          (parse-noun-phrase))))

(define parse-word
  (lambda (words)
    (begin
      (require *unparsed*)
      (require (member? (car *unparsed*) words))
      (let ((found-word (car *unparsed*)))
        (begin
          (set! *unparsed* (cdr *unparsed*))
          found-word)))))

(define *unparsed* ())

(define parse
  (lambda (input)
    (begin
      (set! *unparsed* input)
      (let ((sentance (parse-sentance)))
        (begin
          (require (not *unparsed*))
          sentance)))))

(parse '(john will put his car in the garage))
(parse '(paul put a car in his garage))
(parse '(paul has put a very very old car in his quite new red garage))
EOT
$prereqs_output
proper-nouns
nouns
auxilliaries
verbs
articles
prepositions
degrees
adjectives
parse-sentance
parse-noun-phrase
parse-adj-phrase
parse-deg-phrase
parse-verb-phrase
parse-prep-phrase
parse-word
*unparsed*
parse
(john will (put (his car) (in (the garage))))
(paul (put (a car) (in (his garage))))
(paul has (put (a ((very (very old)) car)) (in (his ((quite new) (red garage))))))
EOR

# vim: ft=perl
This chapter first gives an example of the sort of things that logic programming is capable of, then
gets on with the implementation by introducing the concept of pattern matching. Then it explores a
generalization of pattern matching called unification. Having implemented unification and other more
specialized support routines in the interpreter core, we then proceed, with the help of amb from Chapter 16
on page 249, to build a logic programming system in the PScheme language itself. The advantages of
doing this in a language which has continuation-passing and backtracking built in should become very
apparent by the end of the chapter. This implementation is based on the one given in [7, pp295–300],
but made a little easier by using amb.
(define the-rules
  '(((mary likes wine))
    ((mary likes cheese))
    ((john likes beer))
    ((john likes wine))
    ((john likes chips))
    ((person mary))
    ((person john))))
Our database is called the-rules and it is a list of statements of various sorts. This database lists facts
about what mary and john like, and also the facts that both mary and john are people. There is nothing
special about the structure of these individual facts, just as long as we remain consistent in their use,
and write queries that interrogate the database appropriately. In other words it is we, the programmers,
who decide the “meaning” of ((mary likes wine)): the system attaches no particular significance to
that structure in itself; in particular likes is not a keyword or an operator of the logic programming
system, it is just a symbol that I have chosen to use.
300 CHAPTER 17. UNIFICATION AND LOGIC PROGRAMMING
In our logic programming implementation, we will use symbols with initial capital letters as pattern
variables which can match parts of the database. So given the database above, the system can respond
to a query such as (mary likes X) with the facts (mary likes wine) and (mary likes cheese), and
can respond to the question (person X) with the facts (person mary) and (person john). Of course
there is nothing here that a simple SQL query could not do, so let’s make things more interesting by
adding a rule to the facts.
This rule states “mary likes anyone who likes chips.” This is written as:
((mary likes X) (person X) (X likes chips))
You can read this as “mary likes X if person X and X likes chips.”
Rules all have this general form. The first statement in the rule is true if all of the other statements
in the rule are true or can be proved to be true. The first statement is called the head of the rule, and
the remaining statements are called the body of the rule.
Looked at in this way, bare facts are just rules with no body: they are true in themselves because
there is nothing left to prove. This explains the apparently redundant extra parentheses around each
fact in our example database.
Given our extended database, which now looks like this:
(define the-rules
  '(((mary likes wine))
    ((mary likes cheese))
    ((john likes beer))
    ((john likes wine))
    ((john likes chips))
    ((person mary))
    ((person john))
    ((mary likes X) (person X) (X likes chips))))
the system can answer the question (mary likes john) in the affirmative, and furthermore, when
prompted to list all the things that mary likes with (mary likes X), john will be among the results:
> (prove '((mary likes X)))
((mary likes wine))
> ?
((mary likes cheese))
> ?
((mary likes john))
Also note that the statement to be proved has those apparently redundant extra parentheses too. This is
because the system can be asked to prove any number of things at once, for example:
> (prove '((mary likes X) (X likes beer)))
((mary likes john) (john likes beer))
> ?
Error: no more solutions
This can be read as “prove mary likes X and X likes beer.” The query only succeeds if all of the
components succeed, so it is just like the body of a rule in this respect.
17.1. LOGIC PROGRAMMING EXAMPLES 301
[Figure: family tree; surviving node labels: Johann, Ambrosius, WF, Ernst.]
Now, we have a mixture of facts about fathers alone, and parents where the mother is known. But we
can rectify that with a few extra rules as follows:
That underscore is a special pattern variable that always matches anything. The first rule then says “X
is the father of Y if X and anybody are parents of Y” (the parents-of facts always list the father first.)
Likewise the second rule says “X is the mother of Y if anybody and X are parents of Y.” The last pair of
rules (actually really one rule in two parts) says that “X is a parent of Y if X is the father of Y or X is the
mother of Y.”
So we have a general way of expressing both and and or in our rules: and is expressed by adjacent
statements in the body of a single rule, while or is expressed as alternative versions of the same rule.
Now that we have a general parent-of relation, we can add a recursive ancestor-of relationship as
follows:
We can add yet more rules to this database. For example X and Y are siblings if X and Y have the same
parent:
If we didn’t require that X and Y are not equal, then this rule would conclude that old J. S. was his own
brother. Given that, we can ask:
etc.
(define append
  (lambda (a b)
    (if a
        (cons (car a)
              (append (cdr a)
                      b))
        b)))
It walks to the end of the list a, then at that point returns the list b, and as the recursion unwinds it
builds a copy of a prepended to b. So for example:
> (append '(a b) '(c d))
(a b c d)
This definition is useful enough in itself, but logic programming allows a much more flexible definition
of append as follows:
(define the-rules
  (list '((append () Y Y))
        '((append (A . X) Y (A . Z)) (append X Y Z))))
Note that this uses the PScheme dotted pair notation introduced in Section 8.4.1 on page 89. So the
expression (A . X) refers to a list whose car is A and whose cdr is X.
The first rule says that the result of appending something to the empty list is just that something.
The second rule says that you can join (A . X) and Y to make (A . Z) if you can join X and Y to make
Z.
Why is that more powerful than the PScheme append? Because we can ask lots of questions of it.
Not only can we ask “what do we get if we append (a b) and (c d)?”:
> (prove '((append (a b) (c d) X)))
((append (a b) (c d) (a b c d)))
We can also ask “what do we need to append to (a) to get (a b c d e)?”:
> (prove '((append (a) X (a b c d e))))
((append (a) (b c d e) (a b c d e)))
And even “what can we append together to make (a b c d)?”:
> (prove '((append X Y (a b c d))))
((append () (a b c d) (a b c d)))
> ?
((append (a) (b c d) (a b c d)))
> ?
((append (a b) (c d) (a b c d)))
> ?
((append (a b c) (d) (a b c d)))
> ?
((append (a b c d) () (a b c d)))
> ?
Error: no more solutions
This idea is absolutely core to the concept of logic programming. A rule that states that “A and B
make C” is equally capable of describing “what do I need to make C?” provided C has a value when the
question is asked. It is as if the relationship described by the rule can be inspected from many different
angles when seeking a solution to a problem.
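The “?” interaction in the last append query can be imitated in plain Perl for that one question. The sketch below is not a logic engine and knows nothing about rules or unification: split_iterator() is a hypothetical helper that simply enumerates, in the same order as the transcript, every X and Y that append to give a fixed list, handing back one more solution each time it is asked.

```perl
use strict;
use warnings;

# Returns a closure. Each call yields the next [\@x, \@y] such that
# (@x, @y) is the original list, or nothing when no solutions remain.
sub split_iterator {
    my @list = @_;
    my $n = 0;    # how many leading elements go into X
    return sub {
        return if $n > @list;
        my @x = @list[0 .. $n - 1];
        my @y = @list[$n .. $#list];
        $n++;
        return [\@x, \@y];
    };
}

my $next = split_iterator(qw(a b c d));
while (my $solution = $next->()) {    # each call plays the role of "?"
    my ($x, $y) = @$solution;
    print "(append (@$x) (@$y) (a b c d))\n";
}
print "Error: no more solutions\n";
```

The point of the logic programming version is that nothing like this has to be written by hand: the same two append rules answer this question and the two earlier ones.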
Back to append. The rules for append can of course be made available to other rules, for example
(peeking ahead a bit)
Says S is a sentence if you can append NP and VP to make S, and NP is a noun phrase, and VP is a verb
phrase. Rules for noun-phrase and verb-phrase would be very similar, and rules for individual words
would just be of the form ((noun (garage))) etc.
(define the-rules
  (list
    '((factorial 0 1))
    '((factorial N X) (T is (- N 1)) (factorial T U) (X is (* N U)))))
This isn’t as bad as it might first look. The first rule is the bare fact that the factorial of 0 is 1. The
second rule says that the factorial of N is X if T is N - 1 and the factorial of T is U and X is N * U. The
special infix is operator forces arithmetic evaluation of its right hand side, then requires that its left
hand side is the same as the result of that evaluation.
Given the above we can calculate factorials:
However there is a limitation here. Because of the unidirectional nature of that is operator, we cannot
ask “what number can we apply factorial to to get x”:
So it goes.
• The derivative of x in x is 1.
If you don’t remember the maths from school, just sit back and enjoy the ride. These rules can translate
directly into our logic system as:
(define the-rules
  (list '((derivative X X 1))
        '((derivative N X 0) (require (number? 'N)))
        '((derivative (X ^ N) X (N * (X ^ P))) (P is (- N 1)))
        '((derivative (F + G) X (DF + DG))
          (derivative F X DF)
          (derivative G X DG))
        '((derivative (F - G) X (DF - DG))
          (derivative F X DF)
          (derivative G X DG))
        '((derivative (F * G) X ((F * DG) + (G * DF)))
          (derivative F X DF)
          (derivative G X DG))
        '((derivative (1 / F) X ((- DF) / (F * F)))
          (derivative F X DF))
        '((derivative (F / G) X (((G * DF) - (F * DG)) / (G * G)))
          (derivative F X DF)
          (derivative G X DG))))
This is obviously way more complex than Mary and John, but there isn’t actually much that you haven’t
seen before. The first and most important thing to realise is that, with one exception, the arithmetic
operators “+”, “-” etc. mean nothing special to the logic program: they are just symbols in patterns
to be matched. Having said that, we do need ways to perform numeric tests and to do arithmetic. The
body of the second rule requires that N is a number, and the body of the third rule evaluates (- N 1) before
“assigning” it to P. Rules of the form (require ⟨expr⟩) and (⟨var⟩ is ⟨expr⟩) are recognized and
treated specially by the system.
Anyway, having entered these rules we can ask the system:
(prove '((derivative
          (((x ^ 2) + x) + 1)
          x
          X)))
((derivative
  (((x ^ 2) + x) + 1)
  x
  (((2 * (x ^ 1)) + 1) + 0)))
The last line is the computed value for X, the pattern variable in the query. For now we will have to
simplify that manually:
(((2 * (x ^ 1)) + 1) + 0)
(2 * x) + 1
You can see that the result is indeed the derivative of x^2 + x + 1, namely 2x + 1.
That should be enough to whet your appetite for what the rest of this chapter has to offer. We next
turn our attention to pattern matching, which is the basis of unification, which along with amb from the
previous chapter is the basis of our logic programming implementation.
• The pattern (a b c) will only match the structure (a b c) because the pattern contains no
  variables.
• The pattern (a b c) will not match the structure (a b foo) because c does not equal foo.
• The pattern (a X c) will match the structure (a b c) because the variable X can stand for the
  symbol b.
• The pattern X will match the structure (a b c) because X can stand for the entire structure.
• The pattern (a X c) will match the structure (a (1 2 3) c) because X can stand for (1 2 3).
• The pattern (a (1 X 3) c) will match the structure (a (1 2 3) c) because X can stand for 2.
• The pattern (a X Y) will match the structure (a b c) because X can stand for b and Y can stand
  for c.
• The pattern (a X X) will not match the structure (a b c) because the variable X must stand for
  the same thing throughout the matching process: it cannot be both b and c.

2 A real Scheme implementation is case-insensitive, so does not have this luxury. PScheme is case-sensitive, which allows
us to follow the convention of the logic programming language Prolog, where capital letters introduce pattern variables.
17.2. PATTERN MATCHING 307
where capitalized strings represent the pattern variables. It matches them against structures such as
Here’s a bird’s-eye view of how our first pattern matcher will work. It walks both the pattern and the
structure in parallel, also passing an additional, initially empty environment around. If it encounters a
variable in the pattern then it checks to see if the variable is already set in the environment. If it is,
then it checks that the current structure component is the same as the value of the variable, failing if it
is not. If the variable is not set in the environment, it extends the environment, binding the variable to
the current structure component, and continues. Of course matching will also fail if the pattern and the
structure are not otherwise identical. On success, it returns the environment, and on failure, it dies.
Both for the fun of it, and for completeness’ sake, the pattern matcher described here can handle
Perl hashrefs as well as listrefs. This means it will accept patterns and structures containing mixtures of
both types. However, only the values of a hash in a pattern will be recognised as pattern variables; the
keys will not.
Here’s the top-level match() routine:
sub match {
    my ($pattern, $struct, $env) = @_;
    $env ||= {};
    if (var($pattern)) {
        match_var($pattern, $struct, $env);
    } elsif (hashes($pattern, $struct)) {
        match_hashes($pattern, $struct, $env);
    } elsif (arrays($pattern, $struct)) {
        match_arrays($pattern, $struct, $env);
    } elsif (strings($pattern, $struct)) {
        match_strings($pattern, $struct, $env);
    } else {
        fail();
    }
    return $env;
}
Firstly, if $env is not passed, then match() initializes it to an empty hashref. The pattern matching
algorithm is never required to undo any variable bindings that it creates, so we can just pass around a
hash by reference and allow the bindings to accumulate in it. Then we see various tests for the types of
$pattern and $struct. The var() check is just:
sub var {
    my ($thing) = @_;
    !ref($thing) && $thing =~ /^[A-Z]/;
}

sub hashes {
    my ($a, $b) = @_;
    hash($a) && hash($b);
}

sub hash {
    my ($thing) = @_;
    ref($thing) eq 'HASH';
}
The other two checks, arrays() and strings() are defined equivalently:
sub arrays {
    my ($a, $b) = @_;
    array($a) && array($b);
}

sub array {
    my ($thing) = @_;
    ref($thing) eq 'ARRAY';
}

sub strings {
    my ($a, $b) = @_;
    string($a) && string($b);
}

sub string {
    my ($thing) = @_;
    !var($thing) && !ref($thing);
}
sub match_var {
    my ($var, $struct, $env) = @_;
    if (exists($env->{$var})) {
        match($env->{$var}, $struct, $env);
    } else {
        $env->{$var} = $struct;
    }
}
match_var(), called when the pattern is a variable, checks to see if $var is in the environment. If it is,
then it attempts to match() the value of the variable against the $struct (see footnote 3). If $var is not
already in the environment then it puts it there with a value equal to the current $struct (an unassigned
variable will always match the current structure component, and will be instantiated to it).
Returning to match(), if both the $pattern and the $struct are arrays(), match() calls
match_arrays() on them.
match_arrays() walks both arrays in tandem, calling match() on each pair of elements. If the arrays
are not the same length then they cannot possibly match, so this sanity check is performed first:
sub match_arrays {
    my ($pattern, $struct, $env) = @_;
    my @patterns = @$pattern;
    my @structs = @$struct;
    if (@patterns != @structs) { fail(); }
    while (@patterns) {
        match(shift @patterns, shift @structs, $env);
    }
}

3 We use match() here only because it is convenient: neither the value of the variable nor the structure will contain
variables, so match() is being used as a recursive equality test (like eq?).
Back to match() again. If the $pattern and the $struct are both hashes(), then match() calls
match_hashes() on them:
sub match_hashes {
    my ($pattern, $struct, $env) = @_;
    check_keys_eq($pattern, $struct);
    foreach my $key (sort keys %$pattern) {
        match($pattern->{$key}, $struct->{$key}, $env);
    }
}
Much as match_arrays() checks that the two arrays are the same length, match_hashes() checks that
the two hashes have the same keys using check_keys_eq():
sub check_keys_eq {
    my ($as, $bs) = @_;
    my $astr = join('.', sort keys %$as);
    my $bstr = join('.', sort keys %$bs);
    fail() unless $astr eq $bstr;
}
This is a cheap trick and could easily be fooled, but it’s good enough for our demonstration purposes.
Assuming that the hashes have equal keys (this pattern matcher does not allow, or at least expect,
hash keys to be variables), match_hashes() walks the keys matching the individual components in much
the same way as match_arrays() did. It sorts the keys before traversing them to ensure the order of
any variable assignment is at least deterministic.
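To see how the joined-string comparison can be fooled, consider keys that themselves contain the separator. The following is only a sketch: keys_eq_joined() restates the comparison from check_keys_eq() as a boolean, and keys_eq_strict() is a hypothetical stricter alternative; neither name appears in the original code.

```perl
use strict;
use warnings;

# The joined-string comparison from check_keys_eq(), returned as a boolean
# instead of failing (helper name invented for this sketch).
sub keys_eq_joined {
    my ($as, $bs) = @_;
    return join('.', sort keys %$as) eq join('.', sort keys %$bs);
}

# A stricter check (hypothetical): the same number of keys, and every key
# of the first hash also exists in the second.
sub keys_eq_strict {
    my ($as, $bs) = @_;
    return 0 unless keys(%$as) == keys(%$bs);
    for my $key (keys %$as) {
        return 0 unless exists $bs->{$key};
    }
    return 1;
}

# One key containing the separator versus two separate keys.
my $one_key  = { 'a.b' => 1 };
my $two_keys = { a => 1, b => 2 };

print "joined: ", (keys_eq_joined($one_key, $two_keys) ? "equal" : "different"), "\n";
print "strict: ", (keys_eq_strict($one_key, $two_keys) ? "equal" : "different"), "\n";
```

The joined comparison reports the two key sets as equal because both collapse to the string a.b, while the stricter check tells them apart.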
Back to match() yet again. If both the $pattern and the $struct are strings(), match() calls
match_strings() on them. This is the most trivial of the matching subroutines: it just compares the
strings and fails if they are not equal:
sub match_strings {
    my ($pattern, $struct, $env) = @_;
    fail() if $pattern ne $struct;
}
my @facts = (
    {
        composer => 'beethoven',
        initials => 'lv',
        lived    => [1770, 1827]
    },
    {
        composer => 'mozart',
        initials => 'wa',
        lived    => [1756, 1791]
    },
    {
        composer => 'bach',
        initials => 'js',
        lived    => [1685, 1750]
    }
);
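As an illustration of how such a list of facts might be queried, here is a condensed, self-contained sketch of the matcher run against each fact in turn. It makes two assumptions that are not in the listings above: fail() is taken to simply die (the text says the matcher dies on failure), and the per-type helpers are folded into match() to keep the example short.

```perl
use strict;
use warnings;

# Assumption: fail() just dies, since the matcher is said to die on failure.
sub fail { die "match failed\n" }

sub var    { my ($t) = @_; !ref($t) && $t =~ /^[A-Z]/ }
sub hash   { ref($_[0]) eq 'HASH' }
sub array  { ref($_[0]) eq 'ARRAY' }
sub string { !var($_[0]) && !ref($_[0]) }

# Condensed matcher: same logic as match() and its helpers, in one sub.
sub match {
    my ($pattern, $struct, $env) = @_;
    $env ||= {};
    if (var($pattern)) {
        if (exists $env->{$pattern}) {
            match($env->{$pattern}, $struct, $env);
        } else {
            $env->{$pattern} = $struct;
        }
    } elsif (hash($pattern) && hash($struct)) {
        fail() unless join('.', sort keys %$pattern) eq join('.', sort keys %$struct);
        match($pattern->{$_}, $struct->{$_}, $env) for sort keys %$pattern;
    } elsif (array($pattern) && array($struct)) {
        fail() unless @$pattern == @$struct;
        match($pattern->[$_], $struct->[$_], $env) for 0 .. $#$pattern;
    } elsif (string($pattern) && string($struct)) {
        fail() if $pattern ne $struct;
    } else {
        fail();
    }
    return $env;
}

my @facts = (
    { composer => 'beethoven', initials => 'lv', lived => [1770, 1827] },
    { composer => 'mozart',    initials => 'wa', lived => [1756, 1791] },
    { composer => 'bach',      initials => 'js', lived => [1685, 1750] },
);

# Who died in 1791? Each fact is tried with a fresh environment, and eval
# traps the die from any fact that does not match.
my $pattern = { composer => 'COMPOSER', initials => 'INITIALS', lived => ['BORN', 1791] };
my @found;
for my $fact (@facts) {
    my $env = eval { match($pattern, $fact) } or next;
    push @found, $env;
    print "$env->{INITIALS} $env->{COMPOSER}, born $env->{BORN}\n";
}
```

Only the Mozart fact matches, so the loop collects a single environment binding COMPOSER, INITIALS and BORN.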
That’s it for pattern matching. We next turn our attention to Unification, which as I’ve said is an
extension to Pattern Matching and is much more interesting.
17.3 Unification
Unification, in a nutshell, is the matching of two patterns. It solves the problem “given two patterns,
that might both contain variables, find values for those variables that will make the two patterns equal.”
Our pattern matcher from the previous section is a good jumping off point for implementing true
unification. In fact it has most of the things we’ll need already in place. The next section discusses the
modifications we will need to make to it.
and
    A => 'abc',
    B => ['g', 'abc']
You can see the process graphically in Figure 17.2. The variable 'A' unifies with the term 'abc' while
the variable 'B' unifies with the compound term ['g', 'A'], where 'A' is provided with a value 'abc'
from the previous unification.
Figure 17.2: Unification of ['f', ['g', 'A'], 'A'] with ['f', 'B', 'abc']
[ f [ g A ] A ]
[ f B abc ]
[ f [ g abc ] abc ]
Unification is capable of even more complex resolutions, for example it can unify (omitting quotes for
brevity this time)
    [F, [A, 1, B], [A, B], 2]
with
    [C, [D, D, E], C, E]
to show that
• A = 1
• B = 2
• C = [1, 2]
• D = 1
• E = 2
• F = [1, 2]
You can see this in action in Figure 17.3 if you just follow the differently styled arrows starting from the
three ringed nodes in the figure as they propagate information around.
[ F [ A 1 B ] [ A B ] 2 ]
[ C [ D D E ] C E ]
[ [ 1 2 ] [ 1 1 2 ] [ 1 2 ] 2 ]
This unifier has an additional feature: the anonymous variable “_” (underscore) behaves like a normal
variable but will always match anything, since it is never instantiated. This allows you to specify a
“don’t care” condition. For example, going back to our database of composers, the pattern:
{
    composer => 'COMPOSER',
    initials => '_',
    lived    => '_'
}
will just retrieve all of the composers’ names from the database, without testing or instantiating any other
variables. Note also that since “_” always matches, and is never instantiated, it can be reused throughout
a pattern.
This unifier is a direct modification of the pattern matcher from the previous section, so let’s just
concentrate on the differences. Firstly match() has been renamed to unify(), and it has an extra clause,
in case the old structure, which is now also a pattern, contains variables. The various match_* subs have
also been renamed unify_*, and the variables $pattern and $struct, now both patterns, have been
renamed to just $a and $b:
sub unify {
    my ($a, $b, $env) = @_;
    $env ||= {};
    if (var($a)) {
        unify_var($a, $b, $env);
    } elsif (var($b)) {
        unify_var($b, $a, $env);
    } elsif (hashes($a, $b)) {
        unify_hashes($a, $b, $env);
    } elsif (arrays($a, $b)) {
        unify_arrays($a, $b, $env);
    } elsif (strings($a, $b)) {
        unify_strings($a, $b, $env);
    } else {
        fail();
    }
    return $env;
}
The extra clause, if $b is a var, simply reverses the order of the arguments to unify_var(). Note that
the single environment means that variables will share across the two patterns. If you don’t want this,
simply make sure that the two patterns don’t use the same variable names.
unify_hashes(), unify_arrays() and unify_strings() are identical to their match_* equivalents,
except that unify_arrays() and unify_hashes() call unify() instead of match() on their components.
The var() check is slightly different, to allow for the anonymous variable:
sub var {
    my ($thing) = @_;
    !ref($thing) && ($thing eq '_' || $thing =~ /^[A-Z]/);
}
That leaves unify_var(), where the action is. unify_var() is still quite similar to match_var(); it just
has more things to watch out for:
sub unify_var {
    my ($var, $other, $env) = @_;
    if (exists($env->{$var})) {
        unify($env->{$var}, $other, $env);
    } elsif (var($other) && exists($env->{$other})) {
        unify($var, $env->{$other}, $env);
    } elsif ($var eq '_') {
        return;
    } else {
        $env->{$var} = $other;
    }
}
So $struct was renamed to $other, and unify_var() calls unify() instead of match(). If $var is not
set in the environment, instead of immediately assuming it can match $other, unify_var() looks to
see if $other is a var and already has a value. If so it calls unify() on $var and the value of $other.
If $other is not a var, or has no binding, unify_var() next checks to see if $var is the anonymous
variable. If it is, then because the anonymous variable always matches and is never instantiated, it just
returns. Lastly, only when all other options have been tried, it adds a binding from the $var to $other
and returns.
Let’s walk through the actions of unify() as it attempts to unify the two patterns ['f', ['g',
'A'], 'A'] and ['f', 'B', 'abc'].
• unify(['f', ['g', 'A'], 'A'], ['f', 'B', 'abc'], {}) is called with the two complete patterns
  and an empty environment, and determines that both patterns are arrays, so calls
  unify_arrays().
  – unify_arrays(['f', ['g', 'A'], 'A'], ['f', 'B', 'abc'], {}) simply calls unify()
    on each component.
    ∗ unify('f', 'f', {}) determines that both its arguments are strings, and calls
      unify_strings().
      · unify_strings('f', 'f', {}) = {} succeeds but the environment is unchanged.
    ∗ unify(['g', 'A'], 'B', {}) determines that its second argument is a variable and so
      calls unify_var() with the arguments reversed.
      · unify_var('B', ['g', 'A'], {}) = {B => ['g', 'A']} succeeds, and unify_var()
        extends the environment with 'B' bound to ['g', 'A'].
    ∗ unify('A', 'abc', {B => ['g', 'A']}) determines that its first argument is a variable
      and so calls unify_var() again, passing the environment that was extended by the
      previous call to unify_var().
      · unify_var('A', 'abc', {B => ['g', 'A']}) = {B => ['g', 'A'], A => 'abc'} also
        succeeds, extending the environment with a new binding of 'A' to 'abc'. This
        environment is the final result of the entire unification.
So the final result {B => ['g', 'A'], A => 'abc'} falls a little short of our expectations, because the
value for 'B' still contains a reference to the variable 'A' (see footnote 4). However this is not a problem
as such: we can patch up the result with a separate routine called resolve().
sub resolve {
    my ($pattern, $env) = @_;
    while (var($pattern)) {
        if (exists $env->{$pattern}) {
            $pattern = $env->{$pattern};
        } else {
            return $pattern;
        }
    }
    if (hash($pattern)) {
        my $ret = {};
        foreach my $key (keys %$pattern) {
            $ret->{$key} = resolve($pattern->{$key}, $env);
        }
        return $ret;
    } elsif (array($pattern)) {
        my $ret = [];
        foreach my $item (@$pattern) {
            push @$ret, resolve($item, $env);
        }
        return $ret;
    } else {
        return $pattern;
    }
}

4 In fact it is quite possible to retrieve bindings of one variable directly to another, like {A => 'B'} in other circumstances,
for example if 'B' did not have a value when it was unified with 'A'.
resolve() takes a pattern, and the environment that was returned by unify(). If and while the pattern
is a variable, it repeatedly replaces it with its value from the environment, returning the variable if it
cannot further resolve it. Then if the result is a hash or an array reference, resolve() recursively calls
itself on each component of the result, also passing the environment. The final result of resolve() is a
structure where any variables in the pattern that have values in the environment have been replaced by
those values. Note that resolve() does not change the environment in any way.
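To see unify() and resolve() working together, the following self-contained sketch trims the unifier down to arrays and plain strings (hash support is left out for brevity) and runs it on the two examples discussed in this section. The fail() routine, assumed here to die, and the show() pretty-printer are additions made for this sketch only.

```perl
use strict;
use warnings;

# Assumption: fail() dies, as in the pattern matcher.
sub fail { die "unification failed\n" }

sub var   { my ($t) = @_; !ref($t) && ($t eq '_' || $t =~ /^[A-Z]/) }
sub array { ref($_[0]) eq 'ARRAY' }

# Trimmed unifier: variables, arrays and plain strings only.
sub unify {
    my ($x, $y, $env) = @_;
    $env ||= {};
    if (var($x)) {
        unify_var($x, $y, $env);
    } elsif (var($y)) {
        unify_var($y, $x, $env);
    } elsif (array($x) && array($y)) {
        fail() unless @$x == @$y;
        unify($x->[$_], $y->[$_], $env) for 0 .. $#$x;
    } elsif (!ref($x) && !ref($y)) {
        fail() if $x ne $y;
    } else {
        fail();
    }
    return $env;
}

sub unify_var {
    my ($var, $other, $env) = @_;
    if (exists $env->{$var}) {
        unify($env->{$var}, $other, $env);
    } elsif (var($other) && exists $env->{$other}) {
        unify($var, $env->{$other}, $env);
    } elsif ($var eq '_') {
        return;
    } else {
        $env->{$var} = $other;
    }
}

# Replace variables by their values, recursing into arrays.
sub resolve {
    my ($pattern, $env) = @_;
    while (var($pattern)) {
        return $pattern unless exists $env->{$pattern};
        $pattern = $env->{$pattern};
    }
    return [ map { resolve($_, $env) } @$pattern ] if array($pattern);
    return $pattern;
}

# Hypothetical helper: render a nested structure as text.
sub show {
    my ($t) = @_;
    return array($t) ? '[' . join(', ', map { show($_) } @$t) . ']' : $t;
}

my $env = unify(['f', ['g', 'A'], 'A'], ['f', 'B', 'abc']);
print show(resolve(['f', 'B', 'A'], $env)), "\n";    # prints [f, [g, abc], abc]

my $first = ['F', ['A', 1, 'B'], ['A', 'B'], 2];
my $env2  = unify($first, ['C', ['D', 'D', 'E'], 'C', 'E']);
print show(resolve($first, $env2)), "\n";            # prints [[1, 2], [1, 1, 2], [1, 2], 2]
```

Note that the raw environment from the first call still binds B to ['g', 'A']; it is only resolve() that chases A through to 'abc'.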
This completes our stand-alone implementation of unify(). Hopefully seeing it out in the open like
this will make the subsequent implementation within PScheme easier to digest. It is a little difficult to
demonstrate the utility of unify() at this point, since its purpose is mostly part of the requirements
of logic programming. However, to give you some idea, consider that the result of one unification, an
environment or hash, can be passed as an argument to a second unification, thus constraining the values
that the pattern variables in the second unification can potentially take.
There is one extremely interesting and useful application of unification outside of logic programming
which makes use of this idea. Consider that we might want to check that the types of the variables
in a PScheme expression are correct before we eval the expression. Assume that we know the types of
the arguments and return values from all primitives in the language. Furthermore we can also detect
the types of any variables which are assigned constants directly. It is therefore possible to detect if a
variable’s assigned value does not match its eventual use, even if that eventual use is remote (through
layers of function calls) from the original assignment. Such a language, which does not declare types
but is nonetheless capable of detecting type mismatches, is called an implicitly typed language. We can
use unification to do this type checking, by treating PScheme variables as pattern variables and unifying
them with their types and with each other across function calls, accumulating types of arguments and
return values for lambda expressions and functions in the process.
That, however, is for another chapter. Next we’re going to look at the implementation of unify in
PScheme.
like PScheme we can distribute a Unify() method around the various data types and avoid this explicit
type checking for the most part.
A second point worth noting is that where the above unify() did a die on failure, our new Unify()
can quite reasonably invoke backtracking instead, to try another option, which fits in quite neatly with
our existing amb implementation.
A third and final point. unify() above made use of a flat hash to keep track of variable bindings,
but PScheme already has a serviceable environment implementation and we should make use of it. This
will mean exposing the environment as a PScheme data type since that is what is explicitly passed to
and returned by Unify(), but this is not a concern since we have done this once before in our classes
and objects extension from Chapter 12 on page 135.
We’d better start by looking at the unify command in action in the interpreter. The result of a call
to unify is a PScm::Env which isn’t much direct use. However we can add another PScheme command
that will help us out there. substitute takes a form and an environment, and replaces any pattern
variables in the form with their values from the argument environment. It also performs the resolve()
function on each value before substitution. So for example:
> (substitute
>   '(f B A)
>   (unify '(f (g A) A)
>          '(f B abc)))
(f (g abc) abc)
The call to unify provides the second argument, an environment, to substitute, which then performs
the appropriate replacements on the expression (f B A) to produce the result (f (g abc) abc). Note
that in all cases we have to quote the expressions to prevent them from being evaluated. We do need
the interpreter to evaluate the arguments to substitute and unify in most cases however, because the
actual forms being substituted and unified may be passed in as (normal PScheme) variables or otherwise
calculated.
I should probably also demonstrate that unify does proper backtracking if it fails:
It’s still somewhat difficult to demonstrate the usefulness of unify combined with amb at this stage
however. That will have to wait until the next section where we finally get to see logic programming in
action.
The first thing we need to do then, is to create a new special form PScm::SpecialForm::Unify
and give it an Apply() method. This special form will be bound to the symbol unify in the top-
level environment. unify will take two or three arguments. The first two arguments are the patterns
to be unified. The third, optional argument is an environment to extend. If unify is not passed an
environment, it will create a new, empty one. We have to make unify a special form because it needs
access to the failure continuation. Here’s PScm::SpecialForm::Unify:
You can see that it uses map_eval() from Section 13.6.5 on page 206 to evaluate its argument $form,
passing it a continuation that breaks out the patterns $a and $b, and the optional environment $qenv
from the evaluated arguments. Then it defaults $qenv to a new, empty environment, and calls Unify()
on $a passing it the other pattern, the query environment and the success and failure continuations.
Referring back to our test implementation of unify() in Section 17.3.1 on page 312 we can see that
the first thing that implementation does is to check if its first argument is a var, and if so call unify_var()
on it. We can replace this explicit conditional with polymorphism by putting a Unify() method at an
appropriate place in the PScm::Expr hierarchy. But the PScm::Expr::Symbol class is not the best
place: not all symbols are pattern variables, only those starting with capital letters or underscores. So
here’s the trick. We create a new subclass of PScm::Expr::Symbol called PScm::Expr::Var and put
the method there. Read() can detect pattern variables on input and create instances of this new class
instead of PScm::Expr::Symbol. Since the new class inherits from PScm::Expr::Symbol, and we do
not override any of that class’s existing methods, these new PScm::Expr::Var objects behave exactly
like ordinary symbols to the rest of the PScheme implementation. Here’s the change to next_token()
from PScm::Read to make this happen.
The only change is on Lines 88–89 where if the token matched starts with a capital letter or underscore
then next_token() returns a new PScm::Expr::Var where otherwise it would have returned a
PScm::Expr::Symbol.
Now we have somewhere to hang the functionality equivalent to unify_var() from our test implementation,
we can put it in a method called Unify() in PScm::Expr::Var:
It’s identical to the earlier unify_var() except that it makes use of method calls and is written in cps.
The is_var() method is defined to be false at the root of the expression hierarchy in PScm::Expr, but
is overridden to be true in PScm::Expr::Var alone. Equivalently is_anon() is defined false in PScm::
The only other occurrence of Unify() is at the root of the hierarchy in PScm::Expr:
This really just takes care of the case where the first pattern is not a var but the second pattern is. If the
second pattern is a var it calls Unify() on it, passing $self as the argument, reversing the order in the
same way as our prototype unify() did. If the second pattern is not a var, then it calls a new method
UnifyType() on $self. UnifyType() is just another name for Unify() and allows a second crack at
polymorphism since it is implemented separately in a couple of places in the PScm::Expr hierarchy.
The first such place is in PScm::Expr itself.
This works for all atomic data types. If the two patterns are Eq() then succeed, otherwise fail. This
is the first place we’ve actually seen the failure continuation invoked. Note that the equality test Eq()
implicitly deals with type equivalence for us, so we don’t need the arrays() routines etc. from the
prototype. Now the only other place we need to put UnifyType() is in PScm::Expr::List::Pair
            cont {
                my ($qenv, $fail) = @_;
                $self->[REST]->Unify(
                    $other->[REST],
                    $qenv,
                    $cont,
                    $fail
                );
            },
            $fail
        );
    } else {
        $fail->Fail();
    }
}
This PScm::Expr::List::Pair::UnifyType() is in fact simpler in one sense than the unify_arrays()
from our test implementation. First it performs a simple check that the $other is a list. If not it calls the
failure continuation. Then, rather than walking both lists, it only needs to call Unify() on its first()
and rest(), passing the $other’s first() or rest() appropriately. Of course this is a little complicated
because it’s in cps, but nonetheless that is all it has to do.
That’s all for unify itself. If you remember from the start of this section, we will also need a substitute
builtin to replace pattern variables with values in the environment. It is called like (substitute
⟨pattern⟩ ⟨env⟩) and returns the ⟨pattern⟩ suitably instantiated. We can make this a primitive rather
than a special form because it has no need of an environment (other than the one that is explicitly
passed) and no need to access the failure continuation (it always succeeds). Here’s
PScm::Primitive::Substitute:
It does nothing much by itself, merely calling a ResolveAll() method on the argument environment
then passing the result to the $body’s Substitute() method. We’ll take a look at that new
PScm::Env::ResolveAll() method first.
This ResolveAll() loops over each key in the environment, calling a subsidiary Resolve() method on
each and saving the result in a temporary %bindings hash. Then it creates and returns a new
PScm::Env with those bindings.
If you refer back to our resolve() function in the prototype, you can see that in the first stage, if the
$pattern is a variable, it repeatedly attempts to replace it with a value from the environment until either
it is not a variable anymore or it cannot find a value. This ResolveAll() is effectively pre-processing the
environment so that any subsequent lookup for a pattern variable will not need to perform that iteration.
Keys() just collects all the keys from the environment:
and Resolve() does exactly what resolve() did in our test implementation: it repeatedly replaces the
variable with its value from the environment until the variable is not a variable any more, or cannot be
found. If it finds a non-variable value it calls its ResolveTerm() method on it, passing the env $self as
argument, and returning the result.
ResolveTerm() gives any compound term a chance to resolve any pattern variables it may contain.
There are two definitions of ResolveTerm(). The only compound terms in PScheme are lists, and
pattern variables themselves have already been resolved, so the default ResolveTerm() in PScm::Expr
just returns $self:
It walks itself, calling the argument $qenv’s Resolve() method on each component, and returning a new
PScm::Expr::List of the results.
So we’re talking about how substitute works, and we saw that primitive’s apply() method called
the argument $qenv’s ResolveAll() method to return a new environment with any pattern variables in
the values replaced, where possible. Then apply() passed that new environment to its argument $body’s
Substitute() method. We’ve just seen how ResolveAll() works, now we can look at Substitute()
itself.
Only pattern variables can be substituted, but lists need to examine themselves to see if they contain
any pattern variables. So a default Substitute() method in PScm::Expr takes care of all the things
that can’t be substituted, it just returns $self:
The Substitute() in PScm::Expr::Var returns either its value from the environment or itself if it is
not in the environment:
And that’s substitute. To sum up, it tries as hard as it can to replace all pattern variables in the form
with values from the environment, recursing not only into the form it is substituting, but also into the
values of the variables themselves.
There are a few more things we need to add to the interpreter before we can show off its new prowess.
Firstly we will have occasion to pass an empty environment into unify (indirectly), and for that we’ll
need a new-env primitive. This is as simple as it gets:
Another thing we’ll need goes back to a passing comment I made a while back. It can be a problem if
you try to unify two patterns that inadvertently use the same pattern variable names. Sometimes you
want the variables to share a value, and sometimes you don’t. For this reason we need something that
will take a pattern and replace its variable names with new variable names that are guaranteed to be
unique. The same variable occurring more than once in the pattern should correspond to the same new
variable occurring more than once in the result, but we should be reasonably confident that the new
variable name won’t appear anywhere else in the program by accident.
This new PScheme function is called instantiate. It could be written in the PScheme language,
but that would require adding other less germane primitives for creating symbols etc. so all in all it is
probably better to build it in to the core. It can be a primitive, but it will need a bit more than just an
apply() method:
Remember that there is only one instance of any given primitive or special form in the PScheme environ-
ment, and that persists for the duration of the repl. So by giving this primitive its own new() method
we provide a convenient place to store a singleton counter that we can use to generate new symbols.
The seen field of the object however, which keeps track of which variables the instantiate function
has already encountered, must be re-initialized to an empty hash on each separate application of the
primitive. After (re-)initializing seen on Line 281, apply() calls its argument $body’s Instantiate()
method passing $self as argument.
Obviously the only PScm::Expr type that will avail itself of the instantiate object is PScm::Expr::Var,
and that will make use of the callback method Replace() to find or generate a replacement variable.
Replace() then, first checks to see if the variable is the anonymous variable (Line 287). If so then it
just returns the variable, since the anonymous variable never shares and should never be replaced by
a variable that will share. Then unless it has already seen the variable it creates a new alias for it by
using the incrementing counter (Lines 288-291). This works well because the reader would never create
a PScm::Expr::Var from a number.
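The idea can be tried out away from the interpreter. The sketch below is a stand-alone analogue working on plain Perl arrays and strings rather than PScm::Expr objects: the package name Sketch::Instantiator, the lower-case method names and the <N> spelling of generated variables are all assumptions made for illustration, not the interpreter's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-alone analogue of the instantiate primitive.
package Sketch::Instantiator;

sub new {
    my ($class) = @_;
    # The counter lives as long as the object; seen is reset per application.
    return bless { counter => 0, seen => {} }, $class;
}

# Re-initialise seen on every application, but keep the counter running so
# that names generated by different applications never collide.
sub apply {
    my ($self, $pattern) = @_;
    $self->{seen} = {};
    return $self->instantiate($pattern);
}

sub instantiate {
    my ($self, $term) = @_;
    if (ref($term) eq 'ARRAY') {
        return [ map { $self->instantiate($_) } @$term ];
    }
    # Pattern variables start with a capital letter or an underscore.
    return $self->replace($term) if $term =~ /^[A-Z_]/;
    return $term;
}

# Find or create the replacement for a variable; the anonymous variable
# is returned unchanged because it must never share.
sub replace {
    my ($self, $var) = @_;
    return $var if $var eq '_';
    $self->{seen}{$var} = '<' . $self->{counter}++ . '>'
        unless exists $self->{seen}{$var};
    return $self->{seen}{$var};
}

package main;

my $inst  = Sketch::Instantiator->new;
my $rule1 = $inst->apply([ ['mortal', 'X'], ['man', 'X'] ]);
my $rule2 = $inst->apply([ ['likes', 'X', '_', 'Y'] ]);

print join(' ', @{ $rule1->[0] }), " / ", join(' ', @{ $rule1->[1] }), "\n";
print join(' ', @{ $rule2->[0] }), "\n";
```

Both occurrences of X in the first pattern receive the same replacement, while the X in the second pattern gets a fresh one because seen was reset and the counter was not.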
Just as with Unify() and Substitute(), a default method Instantiate() in PScm::Expr handles
most cases and just returns $self:
The PScm::Expr::Var version of instantiate calls the $instantiator’s Replace() callback to get a
new variable, passing $self as argument because Replace() needs to keep track of the variables it has
seen already:
The last thing we’re going to need, for pragmatic reasons, is a way to check the type of various expressions
from within the PScheme language. A proper scheme implementation has a full set of such type checking
functions, but we’re only going to need pair?, number? and var? (note the question marks.) They are
all primitives, and in fact have so much in common that we will create an abstract parent class called
PScm::Primitive::TypeCheck and put a shared apply() method in there:
You can see that it calls a test() method which we must subclass for each test, then it returns an
appropriate true or false value depending on the test. Here’s the test for pair? in
PScm::Primitive::TypeCheck::Pair:
We’ve already seen that is_pair() is defined false in PScm::Expr and overridden to be true in
PScm::Expr::List::Pair alone. The equivalent number? and var? PScheme functions are bound to
PScm::Primitive::TypeCheck::Number and PScm::Primitive::TypeCheck::Var, and make use of
equivalent is_number() and is_var() methods in PScm::Expr.
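The shape of this arrangement, a shared apply() in an abstract parent with a one-line test() in each subclass, can be sketched in miniature as follows. Everything here is a hypothetical stand-in: the Sketch:: packages are not the real PScm classes, and apply() returns the plain strings 'true' and 'false' where the interpreter would return proper PScheme values.

```perl
use strict;
use warnings;

# Illustrative expression classes: each predicate is false at the root
# and overridden to be true in exactly one subclass.
package Sketch::Expr;
sub new       { my ($class, $value) = @_; return bless { value => $value }, $class }
sub is_pair   { 0 }
sub is_number { 0 }
sub is_var    { 0 }

package Sketch::Expr::Number;
our @ISA = ('Sketch::Expr');
sub is_number { 1 }

package Sketch::Expr::Var;
our @ISA = ('Sketch::Expr');
sub is_var { 1 }

# Abstract parent: apply() is shared, test() must be supplied by subclasses.
package Sketch::TypeCheck;
sub new   { my ($class) = @_; return bless {}, $class }
sub apply { my ($self, $expr) = @_; return $self->test($expr) ? 'true' : 'false' }
sub test  { die "test() must be overridden\n" }

package Sketch::TypeCheck::Number;
our @ISA = ('Sketch::TypeCheck');
sub test { my ($self, $expr) = @_; return $expr->is_number }

package Sketch::TypeCheck::Var;
our @ISA = ('Sketch::TypeCheck');
sub test { my ($self, $expr) = @_; return $expr->is_var }

package main;

my $number_p = Sketch::TypeCheck::Number->new;
my $var_p    = Sketch::TypeCheck::Var->new;
my $three    = Sketch::Expr::Number->new(3);

print "number? ", $number_p->apply($three), "\n";
print "var?    ", $var_p->apply($three), "\n";
```

Adding a further type test under this scheme needs only one new predicate method and one new subclass whose test() calls it.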
We have now implemented the four components we need to get on with defining a logic programming
language: unify, substitute, new-env and instantiate. Along with those we have also added the
three type tests pair?, var? and number? which just check if their argument is of that type. They are
all wired in to the repl in the normal way; here are the additions:
Socrates
While a mortal man should have no difficulty answering “yes” to the above puzzle, SQL queries might
have some difficulty.
Here’s a formulation of the rules in our system:
(define the-rules
  '(
    ((mortal X) (man X))
    ((man socrates))
   ))
The first rule on the list should be read as (mortal X) if (man X) that is, “X is mortal if X is a man,”
or colloquially “all men are mortal”.
The second rule is just the bare fact “Socrates is a man.”
In response to a query (mortal socrates) the system will respond in the affirmative. In response
to a query like (mortal aristotle) you will just get Error: no more solutions.
You already know what unification does, so you should be able to start to see what is happening here.
The system is given the query ((mortal socrates)) so it scans through the-rules looking only at the
head of each rule, trying to unify it with the first term in the query, (mortal socrates). It succeeds
in unifying it with (mortal X). The result of that unification is an environment where X is bound to
socrates. In the context of that environment, it descends into the body of the rule, attempting to prove
each component of the body just as if it had been entered as a direct query, but with the variable parts
17.4. LOGIC PROGRAMMING 329
substituted for their values. In this case that means trying to find a rule that matches (man X) where
X=socrates. This succeeds matching the fact (man socrates) and so the entire query succeeds.
What about asking “Who is mortal?”
In response to the query (mortal X) the head of each rule is again scanned. But I haven’t told you
the full story at this point.
If you remember from Section 17.3.2 on page 316 we said there might be problems trying to unify
two patterns which happened to contain the same variable names, and for that reason we implemented
instantiate to replace the variables in a pattern with others that were equivalent, but guaranteed to
be unique. In fact when the database of rules is scanned, instantiate is called on each rule before the
unification with the head is attempted. This has no effect on our first example query (mortal socrates)
but it does make a difference for (mortal X), because the X appears in both the query and the rule.
Thanks to instantiate, (mortal X) unifies with the head of a rule that looks like ((mortal ⟨0⟩) (man
⟨0⟩)).
So the variable X unifies with the variable ⟨0⟩ rather than itself, and it is in this environment that
the body of the rule (man ⟨0⟩) is investigated.
The statement (man ⟨0⟩) succeeds in unifying with (man socrates), and in the process ⟨0⟩ is bound
to socrates. Now the entire rule has succeeded and the query succeeds, resulting in an environment
where X=⟨0⟩ and ⟨0⟩=socrates. substitute is given the form (mortal X) and that environment, and
produces the result (mortal socrates).
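That last substitution step can be pictured with a small sketch. It is written in Python rather than PScheme, purely for illustration, and it assumes that the environment is a plain dictionary and that the instantiated variable ⟨0⟩ is spelled as the string "<0>". The names resolve and substitute are stand-ins chosen for this sketch, not the interpreter's own Perl methods.

```python
def resolve(term, env):
    """Follow a chain of bindings until reaching a value or an unbound name."""
    while isinstance(term, str) and term in env:
        term = env[term]
    return term


def substitute(form, env):
    """Replace every variable in a (possibly nested) form with its resolved value."""
    if isinstance(form, tuple):
        return tuple(substitute(part, env) for part in form)
    return resolve(form, env)


# The environment left behind by the query (mortal X): X=<0> and <0>=socrates.
env = {"X": "<0>", "<0>": "socrates"}
answer = substitute(("mortal", "X"), env)
print(answer)  # ('mortal', 'socrates')
```

The point to notice is that X is never bound to socrates directly: the answer only appears because substitution follows the chain through the instantiated variable.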
So let’s see how to build this system from our “toolkit” of unify, substitute, instantiate, and
new-env. Before diving into the code, here is a slightly more formal definition of what we shall be doing.
So you can see that the body of a rule is treated exactly the same as a top-level query, except that the
current environment may contain variables that were instantiated by prior unifications. Therefore our
implementation can recurse on the body of a rule, re-using the code that we shall write to process a
top-level query. You should also be aware that the environment always accumulates on its way
downstream to a solution; only backtracking causes bindings in the accumulating environment to be
discarded on the way back upstream.
This is the first time we have really used the PScheme language to implement any serious piece of
functionality. I’ve chosen to do it this way for two reasons. Firstly it emphasises a nice abstraction barrier
330 CHAPTER 17. UNIFICATION AND LOGIC PROGRAMMING
between our “toolkit” of primitive Perl operations and the PScheme “glue” that binds them into a logic
programming system; secondly and perhaps more importantly, it’s actually easier to do in PScheme than
it would have been in Perl, which I’m honestly quite pleased about. Having said all that, the code may
take a little more study if you’re not used to reading Scheme programs yet, but it is blissfully short and
sweet.
The top-level function, as you have seen from examples above, is called prove, and here’s its definition:
(define prove
  (lambda (goals)
    (substitute goals
                (match-goals goals
                             (new-env)))))
prove is given a list of goals (statements to be proved). It calls another function, match-goals, passing
both the goals and a new empty environment. If match-goals succeeds, it will return an environment
with appropriate variables bound, and prove passes that environment along with the original goals
to substitute, which replaces variables in the goals with their values then returns the result. If
match-goals fails, control will backtrack through prove and we will see Error: no more solutions.
Here’s match-goals:
(define match-goals
  (lambda (goals env)
    (if goals
        (match-goals (cdr goals)
                     (match-goal (car goals)
                                 env))
        env)))
match-goals walks its list of goals, calling another function match-goal on each, and collecting the
resulting extended environment. If there are no goals left to prove, then match-goals succeeds and
returns the environment it was passed. Incidentally this means that prove with an empty list of goals
will always succeed, since match-goals simply returns its argument environment unchanged.
Here’s match-goal:
(define match-goal
  (lambda (goal env)
    (match-goal-to-rule goal
                        (one-of the-rules)
                        env)))
Here is where we start to see amb coming into play. match-goal uses the one-of function that we
defined in Section 16.1 on page 249 to pick one of the list of rules to try to match against the goal. It
passes the goal, the chosen rule, and the environment to match-goal-to-rule, which does the actual
unification and recursion.
Here’s match-goal-to-rule:
(define match-goal-to-rule
  (lambda (goal rule env)
    (let* ((instantiated-rule (instantiate rule))
           (head (car instantiated-rule))
           (body (cdr instantiated-rule))
           (extended-env (unify (substitute goal env)
                                head
                                env)))
      (match-goals body extended-env))))
It uses let* to first create an instantiated copy of the rule, and then extract the head and the body from
the instantiated-rule. Then it calls unify on the (substituted) goal and the head of the instantiated
rule. If unify succeeds, then the result is an extended environment that match-goal-to-rule uses to
recursively call match-goals on the body of the rule. This is the point of recursion discussed above,
where the body of a rule is treated as a new query.
That’s all there is to it! Of course what is implicit in the above code is the backtracking that
both unify and amb provoke if a unification fails or the list of rules to try is exhausted. This is most
apparent in match-goal-to-rule above: if unify fails, then control simply backtracks out of the
function at that point. Likewise in match-goal, when one-of runs out of options, control backtracks and
match-goal-to-rule is never called.
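For readers more at home outside Scheme, here is a rough Python rendering of the same three functions. It is only a sketch under several assumptions: Python generators stand in for amb (each yield is one solution, and falling out of a loop is backtracking), a failed unify returns None instead of invoking a failure continuation, variables are strings that start with a capital letter or an underscore, and there is no occurs check. None of these names are part of PScheme itself.

```python
from itertools import count

fresh = count()  # source of unique numbers for instantiated variables


def is_var(term):
    return isinstance(term, str) and (term[:1].isupper() or term.startswith("_"))


def walk(term, env):
    while is_var(term) and term in env:
        term = env[term]
    return term


def unify(a, b, env):
    """Return an extended environment, or None if a and b cannot be unified."""
    a, b = walk(a, env), walk(b, env)
    if a == b:
        return env
    if is_var(a):
        return {**env, a: b}
    if is_var(b):
        return {**env, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            env = unify(x, y, env)
            if env is None:
                return None
        return env
    return None


def substitute(term, env):
    term = walk(term, env)
    if isinstance(term, tuple):
        return tuple(substitute(t, env) for t in term)
    return term


def instantiate(term, names):
    """Copy a rule, consistently renaming its variables to fresh ones."""
    if is_var(term):
        if term not in names:
            names[term] = f"_{next(fresh)}"
        return names[term]
    if isinstance(term, tuple):
        return tuple(instantiate(t, names) for t in term)
    return term


def match_goals(goals, env, rules):
    if not goals:
        yield env
        return
    for env1 in match_goal(goals[0], env, rules):
        yield from match_goals(goals[1:], env1, rules)


def match_goal(goal, env, rules):
    for rule in rules:  # plays the role of (one-of the-rules)
        head, *body = instantiate(rule, {})
        env1 = unify(substitute(goal, env), head, env)
        if env1 is not None:  # a failed unify just moves on to the next rule
            yield from match_goals(tuple(body), env1, rules)


def prove(goals, rules):
    for env in match_goals(goals, {}, rules):
        yield substitute(goals, env)


the_rules = (
    (("mortal", "X"), ("man", "X")),
    (("man", "socrates"),),
)
print(list(prove((("mortal", "X"),), the_rules)))
```

Asking for further solutions corresponds to pulling the next value from the generator; an exhausted generator is this sketch's version of Error: no more solutions.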
There are a couple of refinements we can make, however. We would like to support require and is from
our examples at the start of the chapter⁶.
Additionally, it would be useful if we could check that the result returned by prove does not contain
any unresolved variables. If it does, this should be considered a failure. For that reason we make use of
those two type checking functions pair? and var? to write a little recursive test called no-vars?:
(define no-vars?
  (lambda (expr)
    (if (pair? expr)
        (and (no-vars? (car expr))
             (no-vars? (cdr expr)))
        (not (var? expr)))))
We can use that to write a variant of substitute called substitute-all that first of all performs the
substitution then requires that the result contains no vars:
(define substitute-all
  (lambda (expr env)
    (let ((subst-expr (substitute expr env)))
      (begin
        (require (no-vars? subst-expr))
        subst-expr))))
⁶ (require ⟨expression⟩) fails if ⟨expression⟩ is false, and (⟨expr1⟩ is ⟨expr2⟩) fails if ⟨expr1⟩ does not equal ⟨expr2⟩,
in both cases after variable substitution and evaluation.
If the require succeeds, then substitute-all returns the substituted expression just as substitute
does. Otherwise substitute-all backtracks. We can use this instead of substitute in the top-level
prove function:
(define prove
  (lambda (goals)
    (substitute-all goals
                    (match-goals goals
                                 (new-env)))))
To support is and require, match-goal must check for those two special forms before falling back on
the rules:
(define match-goal
  (lambda (goal env)
    (if (eq? (car (cdr goal)) 'is)
        (match-is goal env)
        (if (eq? (car goal) 'require)
            (match-require goal env)
            (match-goal-to-rule goal
                                (one-of the-rules)
                                env)))))
I’ve added two new functions match-is and match-require to actually deal with those special rules.
Neither of them is particularly complex. match-is extracts the var and the expression from the goal,
then it calls substitute-all on the expression. It will therefore fail and backtrack if the expression
contains variables that have not been bound. This makes sense because the next thing it does is to pass
the expression to eval (defined in Section 9.2.3 on page 117) and it makes no sense for eval to operate
on an expression which contains instantiated pattern variables (⟨0⟩, ⟨1⟩ etc.) that cannot possibly have
values in the normal PScheme environment. If it gets that far, then match-is finally unifies the var term
with the substituted and evaluated expression, returning the new environment:
(define match-is
  (lambda (goal env)
    (let* ((var (car goal))
           (value (car (cdr (cdr goal))))
           (svalue (substitute-all value env)))
      (unify var (eval svalue) env))))
Note particularly that “is” is not necessarily a test; it is a unification that will fail if the left hand
expression cannot be unified with the right, so it can be considered both an assertion and potentially an
assignment. To be clear, match-is allows us to deal with statements like:
17.5. MORE LOGIC PROGRAMMING EXAMPLES 333
(N is (+ X Y))
provided X and Y are bound. In this example either N must already have a numeric value equal to the
sum of X and Y, or it must be unbound, in which case it will receive that value.
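The dual role of “is” may be easier to see in a small Python sketch. This is an illustration only, with several simplifying assumptions: variables are capitalised strings, expressions are prefix tuples such as ("+", "X", "Y"), failure is reported by returning None rather than by backtracking, and a bound left hand side is compared with == where the real match-is would call unify. The helper names are invented for this sketch.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def is_var(term):
    # Assumption for this sketch: variables are capitalised strings.
    return isinstance(term, str) and term[:1].isupper()


def substitute(term, env):
    while is_var(term) and term in env:
        term = env[term]
    if isinstance(term, tuple):
        return tuple(substitute(t, env) for t in term)
    return term


def has_vars(term):
    if isinstance(term, tuple):
        return any(has_vars(t) for t in term)
    return is_var(term)


def evaluate(expr):
    """Evaluate a prefix expression such as ("+", 2, 3)."""
    if not isinstance(expr, tuple):
        return expr
    op, first, *rest = expr
    result = evaluate(first)
    for arg in rest:
        result = OPS[op](result, evaluate(arg))
    return result


def match_is(goal, env):
    """Handle (VAR is EXPR); return the new environment, or None on failure."""
    var, _, expr = goal
    svalue = substitute(expr, env)
    if has_vars(svalue):  # the check that substitute-all performs in the text
        return None
    value = evaluate(svalue)
    target = substitute(var, env)
    if is_var(target):  # unbound: "is" behaves like an assignment
        return {**env, target: value}
    return env if target == value else None  # bound: "is" behaves like a test


goal = ("N", "is", ("+", "X", "Y"))
print(match_is(goal, {"X": 2, "Y": 3}))          # N receives the sum
print(match_is(goal, {"X": 2, "Y": 3, "N": 6}))  # N is bound to something else
```

With N unbound the goal extends the environment; with N already bound it succeeds only when the values agree; and with X or Y unbound it fails before anything is evaluated.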
match-require is quite similar to match-is.
(define match-require
  (lambda (goal env)
    (let ((sgoal (substitute-all goal env)))
      (begin
        (eval sgoal)
        env))))
It too calls substitute-all, this time on the entire expression, for the same reasons match-is did.
Then it passes the whole expression to eval. If the require in the substituted goal fails, control will
backtrack from that point as usual.
That concludes our implementation. Let’s try it out!
• “Paul has put a very very old car in his quite new red garage.”
These are obviously more complex than the sentences we worked through with amb, but we can deal with
them easily enough here:
(define the-rules
  (list
    '((proper-noun (john)))
    '((proper-noun (paul)))
    '((noun (car)))
    '((noun (garage)))
    '((auxilliary (will)))
    '((auxilliary (has)))
    '((verb (put)))
    '((article (the)))
    '((article (a)))
    '((article (his)))
    '((preposition (in)))
    '((preposition (to)))
    '((preposition (with)))
    '((degree (very)))
    '((degree (quite)))
    '((adjective (red)))
    '((adjective (green)))
    '((adjective (old)))
    '((adjective (new)))
    '((append () Y Y))
    '((append (A . X) Y (A . Z)) (append X Y Z))
    '((sentence S)
       (append NP VP S)
       (noun-phrase NP) (verb-phrase VP))
    '((sentence S)
       (append _X VP S) (append NP AUX _X)
       (noun-phrase NP) (auxilliary AUX) (verb-phrase VP))
    '((noun-phrase NP)
       (append ART ADJP NP)
       (article ART) (adj-phrase ADJP))
    '((noun-phrase NP) (proper-noun NP))
    '((verb-phrase VP)
       (append _X PP VP) (append VB NP _X)
       (verb VB) (noun-phrase NP) (prep-phrase PP))
    '((prep-phrase PP)
       (append PR NP PP)
This is little more than a declaration of the rules of the grammar. Let’s look at a few of those rules a
little more closely.
This rule about adjectival phrases is in two parts. The first part says that ADJP is an adj-phrase if ADJP
is a noun. The second part says ADJP is an adj-phrase if some DGP and ADJP2 append to form ADJP,
and DGP is a degree-phrase, and ADJP2 is an adj-phrase.
'((noun (car)))
'((noun (garage)))
This pair of rules defines the nouns we know about. They say (car) is a noun and (garage) is a noun.
The nouns themselves are in lists because the system deals with lists: consider the adj-phrase rules
above, where ADJP must be a list for append to work on it.
Given the above, we can ask the obvious question:
You get the desired response, eventually⁷. Interestingly, you can also ask it “what is a sentence”:
It gets stuck in a recursive rut after a while, but it’s exciting to see that it has the potential to generate
all sentences of a grammar if it could avoid those traps⁸. However that’s not the real problem; the real
problem is that it is horribly inefficient. All of those calls to append, most of which produce useless
results, consume huge amounts of resources.
⁷ It took over ten seconds on my computer.
⁸ One way to avoid recursive traps would be to randomize the order in which one-of returns its values. Unfortunately
that will cause other problems in general.
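To get a feel for where the time goes, here is a Python sketch (again an illustration, not PScheme) of what a goal like (append NP VP S) does when only S is known: it proposes every possible way of cutting S in two. The function names splits and three_way are made up for this sketch. A rule that uses two appends, such as the verb-phrase rule, proposes every three-way cut.

```python
def splits(tokens):
    """Yield every (front, back) pair whose concatenation is tokens."""
    for i in range(len(tokens) + 1):
        yield tokens[:i], tokens[i:]


def three_way(tokens):
    """Yield every (a, b, c) whose concatenation is tokens, as two chained appends would."""
    for front, back in splits(tokens):
        for a, b in splits(front):
            yield a, b, back


for front, back in splits(("a", "b", "c", "d")):
    print(front, back)

sentence = tuple("john will put his car in the garage".split())
print(len(list(three_way(sentence))))  # 45
```

Each of those candidates is then handed to the other goals in the rule body, most of which reject it, and the same thing happens again at every level of the grammar. The five splits printed above are the same five solutions that the append [2] test in the listings expects.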
However we can take a hint from the way that we implemented parsing with amb in Section 16.2.4
on page 260. In that implementation, the routine parse-word removed tokens from the front of the
*unparsed* global, effectively “directing” the progress of the parse, since it exposed the next token and
only certain words would match that newly exposed token. We can’t use “global variables” in a logic
programming system—the concept has no meaning—but we can nonetheless keep track of what has been
parsed so far.
We can do this by making all of our grammar rules take a second argument. For example:
(noun-phrase S R)
will succeed if there is a noun phrase at the start of S, and if R is the remainder of S after removing that
noun phrase. Perhaps this approach is most easily demonstrated for individual words. Here’s the new
definition of noun:
'((noun (car . S) S))
'((noun (garage . S) S))
It will match if its first argument is a list starting with car or garage, and in the process instantiate its
second argument to the remainder of the list. For example:
> (prove '((noun (garage is red) X)))
((noun (garage is red) (is red)))
If we write the other rules for single words similarly, we can start to build up more complex rules on
top:
'((noun-phrase S X)
   (article S S1) (adj-phrase S1 X))
This says that there is a noun phrase at the start of S, leaving X remaining if there is an article at the
start of S leaving S1 remaining, and an adjectival phrase at the start of S1, leaving X remaining. Note
that we no longer need to use append to “generate and test”. We can further build on these intermediate
rules just as with the previous grammar:
'((sentence S X)
   (noun-phrase S S1) (verb-phrase S1 X))
In order to use this new parser, we must remember to pass an empty list as the second argument to
sentence, to ensure that all of the tokens in the first argument are consumed:
> (prove '((sentence (john put his car in the garage) ())))
((sentence (john put his car in the garage) ()))
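The same “consume from the front and hand back the remainder” idea can be sketched in Python, with generators once more standing in for backtracking. This is an illustration under stated assumptions: the grammar is deliberately cut down (no adjectival phrases, no auxiliaries), and the helper names word, seq and alt are invented for the sketch rather than taken from the book's code.

```python
LEXICON = {
    "proper-noun": {"john", "paul"},
    "noun": {"car", "garage"},
    "verb": {"put"},
    "article": {"the", "a", "his"},
    "preposition": {"in", "to", "with"},
}


def word(category):
    """Parser that consumes one token of the given category and yields the remainder."""
    def parse(tokens):
        if tokens and tokens[0] in LEXICON[category]:
            yield tokens[1:]
    return parse


def seq(*parsers):
    """Run parsers one after another, threading each remainder into the next."""
    def parse(tokens):
        if not parsers:
            yield tokens
            return
        for rest in parsers[0](tokens):
            yield from seq(*parsers[1:])(rest)
    return parse


def alt(*parsers):
    """Try each alternative in turn, like several rules with the same head."""
    def parse(tokens):
        for parser in parsers:
            yield from parser(tokens)
    return parse


# A simplified grammar: each rule maps a token list to its possible remainders.
noun_phrase = alt(seq(word("article"), word("noun")), word("proper-noun"))
prep_phrase = seq(word("preposition"), noun_phrase)
verb_phrase = seq(word("verb"), noun_phrase, prep_phrase)
sentence = seq(noun_phrase, verb_phrase)


def is_sentence(text):
    # Requiring an empty remainder mirrors passing () as the second argument.
    return () in sentence(tuple(text.split()))


print(list(word("noun")(("garage", "is", "red"))))
print(is_sentence("john put his car in the garage"))
```

Because each parser only looks at the token in front of it, nothing is generated just to be thrown away, which is where the speed-up over the append version comes from.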
Putting all of this together, we can see the considerably faster (but uglier) version in the test at the end
of Listing 17.8.2 on page 342.
Ugliness is not just an aesthetic thing; it gets in the way of clear and readable code. For that
reason any full Prolog implementation provides rewriting rules that will accept a simple grammar with
statements like⁹:
⁹ When I say “like”, I’m not suggesting that Prolog looks exactly like this; its actual syntax is somewhat different. I’m
only saying these are conceptually alike.
'((s (X * 0) 0))
'((s (0 * X) 0))
'((s (X * 1) X))
'((s (1 * X) X))
'((s (X * Y) Z)
   (require (and (number? 'X) (number? 'Y)))
   (Z is (* X Y)))
'((s (X * Y) (X * Y)))
Likewise for exponentiation, except that we don’t have a “^” operator in PScheme so we can’t perform
any actual numeric computation in this case:
'((s (X ^ 0) 1))
'((s (X ^ 1) X))
'((s (X ^ Y) (X ^ Y)))))
With these rules in place we can ask the system to simplify that unwieldy expression we saw before:
Note that these rules for simplification are very incomplete; most noticeably they do not deal with
subtraction or division. However they are sufficient to demonstrate that they work and you can add the
extra rules yourself if you want to extend or experiment (this example is part of the tests in Listing 17.8.2
on page 342).
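As a cross-check on what the rules are meant to compute, here is the same simplifier written as ordinary Python functions. It is a sketch, not a translation: expressions are assumed to be infix triples such as (2, "*", "x"), and because a Python function returns a single value it models only the first solution the logic rules would find, ignoring any alternatives reachable by backtracking.

```python
def simplify(expr):
    """Simplify both operands first, then apply the s rules to the result."""
    if not isinstance(expr, tuple):
        return expr  # atoms simplify to themselves
    x, op, y = expr
    return s(simplify(x), op, simplify(y))


def is_number(value):
    return isinstance(value, (int, float))


def s(x, op, y):
    """Apply the first matching rule for one operator, mirroring the rule order."""
    if op == "+":
        if y == 0:
            return x
        if x == 0:
            return y
        if is_number(x) and is_number(y):
            return x + y
    elif op == "*":
        if x == 0 or y == 0:
            return 0
        if y == 1:
            return x
        if x == 1:
            return y
        if is_number(x) and is_number(y):
            return x * y
    elif op == "^":
        if y == 0:
            return 1
        if y == 1:
            return x
    return (x, op, y)  # the catch-all rule: leave the expression alone


unwieldy = (((2, "*", ("x", "^", 1)), "+", 1), "+", 0)
print(simplify(unwieldy))  # ((2, '*', 'x'), '+', 1)
```

The result corresponds to ((2 * x) + 1), which is what the simplification test in the listings expects for the same input.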
likes(mary, cheese).
In this example likes is called the functor of the rule. Because it takes two arguments, it is said to
have an arity of 2, and is classified as likes/2. This is distinct from any likes functor with a different
number of arguments, just as in our implementation lists of different length cannot match.
The big advantage in distinguishing functor/arity like this is that Prolog can index its database on
this basis. Unification is expensive, so when searching for a rule to match e.g. likes(mary, X) Prolog
need only inspect rules that are likes/2.
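A minimal Python sketch shows the indexing idea. It makes no claim about how any real Prolog system stores its clauses; the class name RuleIndex and its methods are invented here, and terms are assumed to be tuples whose first element is the functor.

```python
from collections import defaultdict


class RuleIndex:
    """Store rules under a (functor, arity) key so lookups skip unrelated rules."""

    def __init__(self):
        self._rules = defaultdict(list)

    def add(self, head, body=()):
        key = (head[0], len(head) - 1)  # e.g. ("likes", 2)
        self._rules[key].append((head, tuple(body)))

    def candidates(self, goal):
        """Return only the rules whose head could possibly unify with goal."""
        return self._rules.get((goal[0], len(goal) - 1), [])


index = RuleIndex()
index.add(("likes", "mary", "cheese"))
index.add(("likes", "mary", "wine"))
index.add(("person", "mary"))
index.add(("likes", "mary"))  # likes/1, a different functor/arity from likes/2

print(index.candidates(("likes", "mary", "X")))
```

Only the two likes/2 facts come back, so unification is attempted twice instead of four times; with a database of thousands of rules the saving is considerable.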
Of course this means that the functor cannot be matched by a pattern variable. The expression:
X(mary, wine)
is not valid Prolog. However Prolog has mechanisms to extract the functor from a term as a variable,
and to call an expression constructed with a variable functor, so this apparent limitation can be worked
around.
Prolog also recognises operators and operator precedence, such that the normal mathematical
operators are infix and are parsed with the correct precedence and associativity. Prolog operators are
functors, so for example the form:
a + b * c
is equivalent to:
+(a, *(b, c))
Prolog also allows you to define new operators as any sequence of non-alphanumeric characters, and
assign them a precedence and associativity. These user defined operators don’t actually do anything:
they just make it easier to write Prolog terms as they translate internally to normal functors just as the
built-in operators do.
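How an infix form turns into nested functors can be sketched with a few lines of Python using the technique usually called precedence climbing. This is only an illustration: the precedence table and the function name parse are assumptions of the sketch (Prolog's own operator table uses a different numbering scheme), and the resulting terms are written as (functor, left, right) tuples.

```python
# Higher numbers bind more tightly in this sketch.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def parse(tokens, min_prec=1):
    """Precedence climbing over a list of tokens, consuming them from the front."""
    left = tokens.pop(0)
    while tokens and PRECEDENCE.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Requiring a strictly higher precedence on the right makes operators left-associative.
        right = parse(tokens, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left


print(parse(["a", "+", "b", "*", "c"]))  # ('+', 'a', ('*', 'b', 'c'))
print(parse(["a", "-", "b", "-", "c"]))  # ('-', ('-', 'a', 'b'), 'c')
```

A user-defined operator would simply be another entry in the table, which is consistent with the remark that such operators do not do anything beyond changing how terms are written.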
This was saying that you can simplify E to E if E is not a pair (is atomic). We might translate this into
Prolog as:
simplify(E, E) :- atomic(E).
Adding a cut, written !, to the end of the rule gives:
simplify(E, E) :- atomic(E), !.
This says that if this rule succeeds, then no other simplify/2 rules should be considered if backtracking
occurs through the cut. If the expression E is atomic, then it cannot be simplified, end of story.
There are other uses of the cut, but they are best described in a book on Prolog.
In order to properly implement the cut, we would have to pass a third cut continuation around,
which makes the Parameter Object pattern discussed in Section 16.5.1 on page 285 even more attractive.
17.7 Tests
The first set of tests in Listing 17.8.1 on the next page exercise the individual additions to this version
of the interpreter.
The first test proves that unify elicits backtracking on failure.
The second test proves that unify returns a PScm::Env on success.
The third test demonstrates that unify plus substitute can resolve the variable terms in an example
that we’ve seen before.
The last test in this file shows instantiate in action. Although the digits in the result look like
numbers, they are actually PScm::Expr::Vars.
The second set of tests in Listing 17.8.2 on page 342 tries out our logic programming system. It works
through pretty much the examples we’ve already covered in this chapter.
¹⁰ i.e. I don’t know how to make it work yet.
17.8 Listings
17.8.1 t/PScm_Unify.t
use strict;
use warnings;
use Test::More;
use lib './t/lib';
use PScm::Test tests => 5;

BEGIN { use_ok('PScm') }

eval_ok(<<'EOT', <<'EOR', 'simple unify');
(unify '(a b c) '(a b d))
EOT
Error: no more solutions
EOR

eval_ok(<<'EOT', <<'EOR', 'simple unify 2');
(unify '(a b c) '(a b c))
EOT
PScm::Env
EOR

eval_ok(<<'EOT', <<'EOR', 'unify and substitute');
(substitute
    '((a A) (b B))
    (unify '(f (g A) A)
           '(f B abc)))
EOT
((a abc) (b (g abc)))
EOR

eval_ok(<<'EOT', <<'EOR', 'instantiate');
(instantiate '((f (g A) A) (f B abc)))
EOT
((f (g 0) 0) (f 1 abc))
EOR

# vim: ft=perl
342 CHAPTER 17. UNIFICATION AND LOGIC PROGRAMMING

(define match-goal
  (lambda (goal env)
    (if (eq? (car (cdr goal)) 'is)
        (match-is goal env)
        (if (eq? (car goal) 'require)
            (match-require goal env)
            (match-goal-to-rule goal
                                (one-of the-rules)
                                env)))))

(define match-is
  (lambda (goal env)
    (let* ((var (car goal))
           (value (car (cdr (cdr goal))))
           (svalue (substitute-all value env)))
      (unify var (eval svalue) env))))

(define match-require
  (lambda (goal env)
    (let ((sgoal (substitute-all goal env)))
      (begin
        (eval sgoal)
        env))))

(define match-goal-to-rule
  (lambda (goal rule env)
    (let* ((instantiated-rule (instantiate rule))
           (head (car instantiated-rule))
           (body (cdr instantiated-rule))
           (extended-env (unify (substitute goal env)
                                head
                                env)))
      (match-goals body extended-env))))
EOT

my $prereqs_output = <<EOT;
not
require
one-of
no-vars?
substitute-all
prove
match-goals
match-goal
match-is
match-require
match-goal-to-rule
EOT

$prereqs_output =~ s/\n$//s;


eval_ok(<<EOT, <<EOR, 'socrates');
$prereqs
(define the-rules
    (list '((man socrates))
          '((mortal X) (man X))))
(prove '((mortal socrates)))
EOT
$prereqs_output
the-rules
((mortal socrates))
EOR

my $rules = <<'EOT';
(define the-rules
    (list '((mary likes cheese))
          '((mary likes wine))
          '((john likes beer))
          '((john likes wine))
          '((john likes chips))
          '((person mary))
          '((person john))
          '((mary likes X) (person X)
             (require (not (eq? 'X 'mary)))
             (X likes Y)
             (mary likes Y))))
EOT

eval_ok(<<EOT, <<EOR, 'mary and john [1]');
$prereqs
$rules
(prove '((mary likes john)))
EOT
$prereqs_output
the-rules
((mary likes john))
EOR

eval_ok(<<EOT, <<EOR, 'mary and john [2]');
$prereqs
$rules
(prove '((mary likes X)))
?
?
?
EOT
$prereqs_output
the-rules
((mary likes cheese))
((mary likes wine))
((mary likes john))
Error: no more solutions
EOR

$rules = <<'EOT';
(define the-rules
    (list '((append () Y Y))
          '((append (A . X) Y (A . Z)) (append X Y Z))))
EOT

eval_ok(<<EOT, <<EOR, 'append [1]');
$prereqs
$rules
(prove '((append (a b) (c d) X)))
EOT
$prereqs_output
the-rules
((append (a b) (c d) (a b c d)))
EOR

eval_ok(<<EOT, <<EOR, 'append [2]');
$prereqs
$rules
(prove '((append X Y (a b c d))))
?
?
?
?
?
EOT
$prereqs_output
the-rules
((append () (a b c d) (a b c d)))
((append (a) (b c d) (a b c d)))
((append (a b) (c d) (a b c d)))
((append (a b c) (d) (a b c d)))
((append (a b c d) () (a b c d)))
Error: no more solutions
EOR

eval_ok(<<EOT, <<EOR, 'symbolic differentiation');
$prereqs
(define the-rules
    (list '((derivative X X 1))
          '((derivative N X 0) (require (number? 'N)))
          '((derivative (X ^ N) X (N * (X ^ P))) (P is (- N 1)))
          '((derivative (log X) X (1 / X)))
          '((derivative (F + G) X (DF + DG))
             (derivative F X DF)
             (derivative G X DG))
          '((derivative (F - G) X (DF - DG))
             (derivative F X DF)
             (derivative G X DG))
          '((derivative (F * G) X ((F * DG) + (G * DF)))
346 CHAPTER 17. UNIFICATION AND LOGIC PROGRAMMING
EOF

eval_ok(<<EOF, <<EOT, 'parsing');
$prereqs
$rules
(prove '((sentance (john will put his car in the garage) ())))
EOF
$prereqs_output
the-rules
((sentance (john will put his car in the garage) ()))
EOT

$rules = <<EOF;
(define the-rules
    (list
        '((simplify E E) (require (not (pair? 'E))))
        '((simplify (X OP Y) E)
           (simplify X X1)
           (simplify Y Y1)
           (s (X1 OP Y1) E))
        '((s (X + 0) X))
        '((s (0 + X) X))
        '((s (X + Y) Z)
           (require (and (number? 'X) (number? 'Y)))
           (Z is (+ X Y)))
        '((s (X + Y) (X + Y)))
        '((s (X * 0) 0))
        '((s (0 * X) 0))
        '((s (X * 1) X))
        '((s (1 * X) X))
        '((s (X * Y) Z)
           (require (and (number? 'X) (number? 'Y)))
           (Z is (* X Y)))
        '((s (X * Y) (X * Y)))
        '((s (X ^ 0) 1))
        '((s (X ^ 1) X))
        '((s (X ^ Y) (X ^ Y)))))
EOF

eval_ok(<<EOF, <<EOT, 'simplification');
$prereqs
$rules
(prove '((simplify (((2 * (x ^ 1)) + 1) + 0) X)))
EOF
$prereqs_output
the-rules
((simplify (((2 * (x ^ 1)) + 1) + 0) ((2 * x) + 1)))
EOT

$rules = <<EOR;
(define the-rules
    (list
Summary
In this book we’ve watched the evolution of a programming language from humble beginnings to a
powerful if somewhat incomplete implementation.
Starting from a global environment model in Chapter 3 on page 15 with basic arithmetic and
conditional evaluation, in Chapter 4 on page 49 we introduced an environment passing model which made
possible the implementation of local variables and much else besides. We also reasoned that trees are
a much better structure for combining environment frames than stacks are, especially in Chapter 5 on
page 59 where we introduced function definition and closure.
We then went on to introduce recursive functions in Chapter 6 on page 73, and showed that a different
kind of binding is necessary to get recursive functions to work. Moving on, in Chapter 7 on page 81, we
introduced another type of binding which is performed sequentially. In the next chapter we looked at
adding list processing to the language, allowing it to manipulate directly the structures that the language
is composed of. Then, in possession of that new set of functions, in Chapter 9 on page 107 we added a
macro facility that allowed the program to generate parts of its own structure.
Before adding other desirable features to the language we paused to describe the benefits of a language
without such features, and noted that such a pure functional language was amenable to parallel
evaluation. Brushing aside those concerns we moved on to add side effects (both definition and
assignment), sequences and global definition.
Chapter 12 on page 135 described a simple object-oriented extension to the language, using our
existing environment implementation to model objects.
In Chapter 13 on page 159 we re-wrote the entire interpreter in Continuation Passing Style, giving
the language direct access to those continuations via call/cc (call-with-current-continuation) and
showed how powerful a control tool continuations are.
In the short but sweet Chapter 14 on page 231 we showed how trivial a threaded interpreter is once
continuations are available, and in the equally short Chapter 15 on page 237 we added built-in error
handling and error recovery.
Chapter 16 on page 249 took continuations even further, making a radical departure¹ from a standard
Scheme implementation to add the amb operator and backtracking. By showing that it is possible for
an interpreter to pass both a normal (success) continuation and a failure continuation, backtracking was
easily incorporated into the PScheme core.
Chapter 17 on page 299 discussed pattern matching and unification, and added unification and other
support routines as extensions to the interpreter. Then it used amb alongside those extensions to
implement a simple but complete logic programming application in the PScheme language, demonstrating the
¹ Why are departures always radical?
352 CHAPTER 18. SUMMARY
[1] Harold Abelson, Gerald Jay Sussman, and Julie Sussman. Structure and Interpretation of Computer
Programs, 2nd Edition. The MIT Press, Cambridge, Massachusetts, 1996.
[2] Philip Carter and Ken Russel. Logic Brainteasers. Carlton, London, 2006.
[3] W. F. Clocksin and C. S. Mellish. Programming in Prolog. Springer-Verlag, Berlin, Heidelberg, New
York, London, Paris, Tokyo, 1987.
[4] Mark Jason Dominus. Higher Order Perl. Morgan Kaufmann Publishers, Amsterdam, 2005.
[6] Daniel P. Friedman and Matthias Felleisen. The Little Schemer. The MIT Press, Cambridge,
Massachusetts, 1996.
[7] Daniel P. Friedman, Mitchell Wand, and Christopher T. Haynes. Essentials of Programming Lan-
guages, 2nd Edition. The MIT Press, Cambridge, Massachusetts, 2001.
[8] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns. Addison Wesley,
Boston, 1995.
[9] Adele Goldberg and David Robson. Smalltalk-80 the Language. Addison Wesley, Boston, 1989.
[10] John C. Mitchell. Concepts in Programming Languages. Cambridge University Press, Cambridge,
England, 2003.
[11] Benjamin C. Pierce. Types and Programming Languages. The MIT Press, Cambridge, Massachusetts,
2002.
[12] Michael Sperber, R. Kent Dybvig, Matthew Flatt, Anton van Straaten, Richard Kelsey, William
Clinger, Jonathan Rees, Robert Bruce Findler, and Jacob Matthews. Revised6 report on the algo-
rithmic language scheme. http://www.r6rs.org/.
[13] Larry Wall, Tom Christiansen, and Jon Orwant. Programming Perl, 3rd Edition. O’Reilly, Se-
bastopol, 2000.
353