Scheme Basics¶
Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp.
—Greenspun’s tenth rule
Obligatory, relevant XKCD:
Why Scheme? Scheme is incredibly simple so should be easy to understand the implementation of (famous last words) but has a sophistication that rivals any other programming language and surpasses most [citation needed].
If it’s that good, then why haven’t Scheme (and any other Lisp-like language, say, Common Lisp) taken over the world? Almost certainly the first reason is that it looks too alien for our ALGOL accustomed eyes – Lisps are Lots of Irritating Stupid Parentheses, as the old gag goes – but it might be that it is too different in other ways. We shall see.
The language’s name was Schemer but fell victim to the original implementation’s file system limits of six character entries.
Scheme is a Lisp – there are lots of Lisps – with a distinction that the core consists of half a dozen forms which, combined with Scheme’s use of homoiconicity, allows you to build anything else.
This ease of implementation allows for many shouts of “I’ve implemented Scheme in 50 lines of Brainfuck!” and more general use in academia as a pedagogical tool.
The downside of that small core implementation means that, nominally, Scheme doesn’t have arrays, hash tables, indeed, much of anything. People wanted some standard libraries and standardisation, for that matter, and so a series of Revised Report on Scheme then Revised Revised Report on Scheme (becoming Revised2 Report on Scheme or, more commonly, R2RS) through to the current R7RS small. R6RS caused something of a stink by departing from the minimalist feel resulting in the decision to split Scheme into large and small variants.
Basic Operations¶
Scheme only really has two high level operations: a reader reads in source code and having de-/re-constructed it, much like a lexer, passes it on to the evaluator. The evaluator does all the heavy lifting.
These two are generally called in a loop:
read an expression
evaluate it
loop back to the start
There is a variation on this for interactive sessions which looks like:
read an expression
evaluate it
print the result
loop back to the start
Which give us the well-known acronym, REPL.
Reader¶
The reader isn’t quite innocent, it is gifted with a few the power to do a few re-writes, usually trivial and accommodating the laziness inherent in all programmers.
Importantly, though, it does no interpretation of the entities it sees, it really only wants to figure out distinct code blocks through matching parentheses and bundle them up into lists (of lists (…)).
Evaluator¶
In the beginning, the evaluator looked at the lists of lists and figured out some meaning from them. Particularly the element in functional position and what kind of an element it is.
For some forms it can apply some special treatment, for others it may do some syntactic transformations (which may alter the meaning of the expression!) before finally deciding to actually invoke some behaviour and do something useful.
More advanced implementations might start performing some combinations of:
translation from Scheme-ish forms into intermediate representations
code analysis and subsequent optimisation
translation into pseudo-machine code for, commonly, a byte compiler. Home grown or more widely supported, eg. the JVM.
generation of a compilable external language, eg. C (and subsequent compilation)
direct generation of host-specific machine code
Instances running pseudo-machine code in a byte compiler (in a programming language virtual machine) might want to invest in a JIT compilation system.
Syntactic Structure¶
Forms¶
Famously, Scheme, like all Lisps, uses s-expressions and its basic syntactic structure is a form which looks like:
(thing thing thing ...)
Where each thing can be another s-exp or an atom which, basically, is something that is not an s-exp (numbers, strings and so on).
And… that’s it. There is some syntactic sugar for expressions that are used a lot but, in essence, the syntactic structure of Scheme is a list (of lists (of lists (of lists (can you see what I’m doing, here?)))).
That said, technically, lists don’t exist in Scheme.
Wait, what? In truth, a list, such as (1 2 3)
, is syntactic
sugar for a chained series of pairs. We’ll get onto
that.
Atoms¶
Atoms are those things that are not s-exps meaning they’re numbers, strings and so on. The likes of:
booleans:
#t
and#f
numbers:
-1
,23
,3.14
etc.Let’s take a moment to think about implementation.
Numbers in the source code, like,
-1
and23
, should be looked as constructors for values representing the numbers -1 and 23.That they are constructors should be obvious (or distracting or confusing) as we don’t know what a value is in the memory of our Scheme instance. They enter Scheme as a sequence of digits in the source code and emerge as a sequence of digits when printed out. What happened in between?
A number could be represented by a C
int
, say (hint: it almost certainly isn’t). That doesn’t tell us much about how big a number we can create (an 8, 16, 32, 64 or 128 bit integer?) and doesn’t look like it’ll hold a floating point number so something is going to have to make a decision on what kind of a number we are storing with the implication that there are going to be more than one. Unless we only deal with (small) integers.Scheme also handles “large” numbers in the form of bignums. So, if you fancy counting all the atoms in the universe…
So, all we can really say is that Scheme will consume the sequence of characters
-
and1
or2
and3
or3
and.
and1
and4
and construct a value in memory such that various arithmetic operations can be performed using it and that that value can be printed out.strings:
"foo"
etc.Again, a constructor for a value in memory representing the characters in between the double quotes.
What implementation here? C-like
NUL
-terminated strings? UTF-8? UTF-32? We don’t know and shouldn’t care so long as Scheme allows us to manipulate them with the usual string functions.symbols:
a
,foo
,user-name
,*ENV*
,arity+1
etc.Names, if you like, used as both identifiers (which are used to refer to values or, in the Scheme nomenclature, they are bound to the value) and as symbolic (hah!) flags.
There is a little curiosity with symbols, they are unique across the program in the sense that when you use the symbol
x
in one function and the symbolx
in another function then the symbol is the same. There is only one symbol,x
, in the program.Clearly (hopefully!) they refer to different values as the functions are being evaluated but the symbol is the same. This means that comparing symbols – though the circumstances when you might do that might not be obvious right now – is as fast as any equality can be in the underlying implementation. In C that is going to be a pointer comparison.
Note
Many punctuation characters are allowed in symbols which, depending on taste, allows for a more readable style. I, for one, like a hyphen instead of an underscore.
Scheme is largely delimited by whitespace and parentheses which frees up a lot of characters to add meaning and colour to identifiers.
There’s no hard and fast rules but there are conventions, for example:
a predicate will usually have a
?
character appended so Scheme might have a functionnumber?
rather thanisa-number
(which itself is using punctuation, a hyphen, in its name).a memory-modifying function will usually have a
!
character appended to make it clear you’re up to no good:set!
(for assignment) is the canonical example.
Of course, that means that statements like the shell’s assignment:
a=2
with no whitespace is a single symbol in Scheme. If Scheme had infix notation then we would be obliged to type:
a = 2
for an assignment.
Of course, the shell demands that there is no whitespace in an assignment – the very opposite!
There are other data types: chars, vectors etc. which we’ll come to when necessary.
Pairs¶
A pair is (1 . 2)
or (a . b)
etc..
A simple tuple of any two other objects – including other pairs – where the source code and printed output representation contains a dot/period, hence the common name, a dotted-pair.
A pair can be constructed with cons
(short for, uh, construct):
(cons 1 2)
results in (1 . 2)
.
(cons 1 (cons 2 3))
results in (1 . (2 . 3))
.
If the second element is another pair and the final element is (the
well-known symbol) nil
, eg. (1 . (2 . (3 . nil)))
, then the
source/printed representation can be reduced to a (more traditional
looking) list: (1 2 3)
.
This is such a common idiom that you can save some typing as the
constructor for a list, say, (cons 1 (cons 2 (cons 3 nil)))
can be
simplified to (list 1 2 3)
.
This form isn’t used often but does show up to handle arguments for a variable-arity function, for example.
If the final element is not nil
then the collapsed format retains
the final dot and becomes an improper list: (1 . (2 . 3))
becomes (1 2 . 3)
.
(1 . nil)
will be a list containing a single element: (1)
.
()
, the empty list is equivalent to nil
, so (1 . ())
is
also (1)
.
The first element of a pair can be retrieved with car
(CAR means
Contents of the Address part of the Register from the IBM 704 that
Lisp was first developed on):
(car (cons 1 2))
results in 1
.
(car (list 1 2 3))
also results in 1
.
(car (list (list 1 2) 3))
results in (1 2)
.
The second element of a pair can be retrieved with cdr
(Contents
of the Decrement Register):
(cdr (cons 1 2))
results in 2
.
Note
The cdr
of a list (not a pair) will be a list:
(cdr (list 1 2 3))
, results in (2 3)
or nil
:
(cdr (list 1))
results in nil
(because (list 1)
is
really (cons 1 nil)
).
Of course either element of a pair can be a pair (or a list) which allow us to build arbitrarily complex data structures, essentially, lists of lists (of lists (…)).
((1 . 2) . (3 . 4))
is not the same as ((1 2) . (3 4))
as in
the latter case each of the sub-elements of the outermost pair is a
proper list whereas it is a simple dotted-pair in the first.
Delving into the depths of complex data structures then requires
liberal use of car
and cdr
for which the ever efficient folk
in Scheme-land have some shorthand: cadr
is equivalent to
calling car
on the result of calling cdr
on some value. Given
the symbol value
bound to the data structure ((1 2) . (3 4
5))
:
(car value)
results in (1 2)
(caar value)
results in 1
(cdar value)
results in (2)
(cadar value)
results in 2
(cdr value)
results in (3 4 5)
(cadr value)
results in 3
(cddr value)
results in (4 5)
(caddr value)
results in 4
Remember the cd*
functions return a list if their argument is a
list, not the first element of that list even if it is the only thing
left – of course, it isn’t, there’s a unwritten nil
hiding at the
end – hence you require an extra car
to get the element itself.
Much of the interpretation of Scheme involves walking around such list data structures, you’ll be pleased to know. Although when you see how neatly a recursive implementation makes it happen you’ll nod sagely in appreciation.
Mental Model¶
In the above description for pair I wrote:
(cons 1 2)
results in (1 . 2)
.
which is slightly disingenuous. (cons 1 2)
results in a value
in memory which, you can probably imagine, is going to be some data
structure that ultimately has two pointers off to other things, here,
things that represent the numbers 1 and 2.
(1 . 2)
is the printed form of that value but I think we are
happy enough to use it as our mental model of that value in memory
hence we can say that (cons 1 2)
results in (1 . 2)
. Much
like, however numbers are stored internally, we represent that
implementation in our heads as 1.
Homoiconicity Again¶
Back to (cons 1 2)
, that is itself a list, right? It’s got that
(thing thing thing ...)
vibe going where the first value is the symbol cons
, the second
value is the number 1 and the third value is the number 2.
So, hang on. I can get the symbol cons
by calling car
on that
list and the number 1 by calling cadr
and… Whoa! …the
syntactic form of the language can be manipulated by the language
itself.
That’s what they mean by homoiconic.
Etc.¶
Interestingly, Scheme, being quite old, is missing what many modern languages have an expectation of being normal. Hash tables, for example, are not a default Scheme data type. That leads many, if not most, Scheme examples to work without them, which they do quite well.
In the meanwhile, things like hash tables are nice to have so a suite of “implemented in Scheme” examples have been created under the banner of Scheme Requests For Implementations, aka. SRFIs.
Evaluation Model¶
The evaluation model is equally simple: boolean, number and string values evaluate to themselves, symbols evaluate to the value they are bound to in memory and lists are treated as function calls where the first element is the function to invoke (often a symbol, ie. by name) and the remaining elements in the list are arguments to the function.
Note
Everything returns a value.
Some examples:
12
This will be interpreted as the constructor for a number value. Evaluation of the constructor returns a number value. Not much use as it stands!
(number-add a 1)
A list. The first element of the list is expected to be a function.
In this case it is the symbol number-add
. Symbols, when evaluated
return the value they are bound (ie. refer) to.
Note
In Scheme it is assumed that the value returned by a symbol in the first position of a list being evaluated is a function, no particular type checking is done while the symbol is evaluated! Obviously, when Scheme tries to apply the function to the arguments then it will quickly discover whether the value is a function or not.
The remaining elements are the arguments to the function, in this case
another symbol, a
, and 1
, a number (constructor). For these,
the symbol will be evaluated and the bound value returned and the
number will be constructed and the value returned.
Finally, the function value (the result of evaluating number-add
to get its bound value) will be applied to the argument values
(whatever the value was that a
was bound to and the number value
of 1
). Which is a slightly roundabout way of saying we’re going
to add 1 to the value being referenced by a
.
Except that it isn’t particularly roundabout. We’ve made assumption
after assumption about how things work, or rather, you’ve just
followed what been’s said and accepted it as true. What happens if
number-add
is not a function, a
is not a number,
number-add
doesn’t take two arguments let alone two numbers,
number-add
(or a
for that matter) is not defined anywhere (and
whether we have any rights to know about it if it has been defined
elsewhere)? Should we have evaluated the arguments before calling the
function? Should we have evaluated the element in the functional
position before or after the arguments? Should we have evaluated the
arguments left to right, vice versa or in parallel (if we could)?
Does any of that lot matter? Yes, of course. If a language is not well defined then implementers are at liberty to make decisions which means users will see different behaviour if they run their programs using different implementations. Evaluating arguments left to right (or vice versa) may have different side-effects and make or break the user’s program. Not good.
That said, number-add
is not a standard function, +
is:
(+ a 1)
Now, a famous bane of Scheme is that the function is always
the first element of the list – there are no infix or postfix
operators. We are very used to seeing arithmetic written with infix
operators, ie. a + 1
, and, for many people, there is a natural
cognitive pause when reading prefix notation for arithmetic (and
assignment).
But that’s bizarre, every other function call we make in the vast majority of programming languages uses prefix notation:
number_add (a, 1);
number_add ($a, 1);
[number_add $a 1]
(number-add a 1)
We don’t skip a beat when reading those yet have a mental hiccup with regular arithmetic functions in prefix notation:
(+ a 1)
There’s no particular answer to that. Other than to enable infix notation – which is messy in Scheme and so Lisp language programmers just get used to prefix notation of arithmetic.
As an aside, prefix notation has a benefit in that +
isn’t limited
to 2 arguments: (+ 1 2 3 4 5)
is just fine. As, indeed, is (+
1)
.
—
Extending our trivial example, now a list of lists:
(+ 1 (* 2 3))
Aha! We’ve got this. The outer list is really (+ 1 X)
where
X
is the result of evaluating the inner list, the expression (*
2 3)
, so we would expect that +
will we called with the
arguments 1
and 6
. Easy.
Indeed, so. Notice that there was no arithmetic precedence rules,
eg. 1 + 2 * 3
, we had to explicitly state the multiplication
expression. In fact, we have to explicitly delimit every expression
in Scheme – there are no shortcuts! This leads to another bane of
Scheme, parentheses overload, there’s a lot of them floating about.
That said, if you took away arithmetic precedence and used regular
prefix function calls, every other language would look similar:
add (1, mul (2, 3))
which, if you remove commas:
add (1 mul (2 3))
and pull the function names inside the parentheses:
(add 1 (mul 2 3))
and replace arithmetic function names with symbols:
(+ 1 (* 2 3))
then we’re back to Scheme – we’ve only changed the syntactic sugar, not the substance of the expression. Arithmetic precedence rules and infix operators, that’s really the only difference.
Incidentally, given that everything is going to be wrapped in parentheses and that we’re going to have lists of lists of lists as we delimit every expression, Scheme calls these lists of lists (of lists …) forms.
Special Forms¶
Wait, though, there some special cases where we can’t just perform
evaluation like the above. if
is a case in point:
(if #t (carry-on) (sleep-forever))
The if
form takes three arguments: a condition expression which
should result in a boolean value, a consequent expression to be
evaluated if the boolean result was true and an alternate expression
to be evaluated if the boolean expression was false.
Notice the careful wording there: to be evaluated if…. if
cannot have had its arguments evaluated before it is applied to the
arguments otherwise we would have tried evaluating (sleep-forever)
(and the other arguments) to find out their values. We might surmise
that (sleep-forever)
is going to take a while to complete and
since the condition was #t
then logically we shouldn’t have been
running it anyway.
So if
must be being treated specially and indeed it is called a
special form. Special forms do not have their arguments evaluated
before the function value is applied to the arguments.
To handle this, the evaluation engine will have compared the car
of the form to the symbol if
and if it matches then starts to run
the behaviour of the special form. The arguments remain unevaluated
and are passed as is (atoms or lists) to the if
behaviour code.
OK, let’s think this through. Inside the function that implements
if
there must be something that evaluates the condition
expression and then if the result was true then evaluates the
consequent expression otherwise evaluates the alternate
expression. Who implements this if
inside if
?
Well, here we get into the murky meta world of language implementation
and the implementing language. In the pedagogical examples of SICP
([AwJS96]) and LiSP ([Que94]), the implementation of if
(the if
of our proto-Scheme, Scheme0,
say) will be written in some existing Scheme,
Scheme1, say.
Scheme2, of course.
OK, so what is Scheme1’s if
written in?
Well, you’ll have to go and look at the source code but you can
imagine that you’ll eventually find an implementation of
Schemen in C and therefore if
will have
been implemented using C’s if
. C’s if
is,
in turn, (probably) implemented with a machine code instruction to
test and branch, itself written in processor microcode and eventually
into transistor NAND gates where, um, … Look! A squirrel!
In the meanwhile, glossing over the detail, most functions are derived forms and their arguments are evaluated before the function value is applied to them but some names (in function position) are identified as special and the behaviour code is given the arguments as is.
The term derived is from that idea that there are a few core forms, usually special forms, from which everything else can be… derived. We’ll also have the distinction between primitive and (I don’t think there is a special name for them so let’s use) regular functions. A regular function is one we write in Scheme. A primitive function is one written in the underlying implementation language, say, C, whose interface is exposed such that it can be invoked as if it were a regular function.
You create primitive functions as a combination of:
elementary memory management functions – think of
cons
to create some, say, C structure on the heap, andcar
andcdr
to poke about in itbootstrap functions – you might want a simple
map
function before it can be re-written into all of its gloryplain old expediency and efficiency functions – some things just want a fast, tight loop.
Special forms sounds great for system-defined functions and people who
want to hack away at the source code but what if I want one?
Well, in many languages you’re clean out of luck. Want to add a new
syntax operator alongside if
, while
etc. to C?
That’s not going to happen. However, the folks in
Scheme-land are more generous and have blessed us with
macros.
And arguably cursed us with macros. A double-edged sword.
Bindings¶
How do we bind symbols to values? We had (+ a 1)
before. What is
a
referring to, how did a
get bound to a value?
There are a number of binding forms.
let¶
(let bindings body+)
where bindings
is a list of bindings and each binding is a list of
a symbol and an expression and body+
is one or more forms (hence
the +
in body+
) to be evaluated in the context of the
bindings.
Visually, bindings
is potentially confusing. If we go over that a
step at a time:
bindings
is a list of bindings:(let (binding1 binding2 binding3) body+)
although your more likely to see it written (and write it) as:
(let (binding1 binding2 binding3) body+)
each binding is a list of a symbol and an expression:
(let ((a 2) (b 1) (c (* 3 4))) body+)
For each binding, the expression is evaluated and the symbol is bound
to the resultant value then the remaining body+
forms are
evaluated in sequence with the symbols available for use.
After the body+
forms have been evaluated then the symbols are no
longer available – in other words, the extent of the symbols is
limited to the lifetime of body+
.
The result of the let
form is the value of the last form in the
body+
forms.
(let ((a 2))
(+ a 1))
Here, bindings
is ((a 2))
, ie. a list of a single binding,
(a 2)
, and the body+
forms is just a single form, (+ a 1)
.
For our single binding, (a 2)
, recall that it is a list of a
symbol and an expression. Here, the symbol is a
, the car
, and
the expression is 2
, the cadr
. The expression, 2
, a
number constructor, is evaluated resulting in a number value and a
is bound to that value.
If any of the forms in body+
requires the evaluation of the symbol
a
then the result will be the number value 2.
Of course, we do want to use a
in body+
where the expression
(+ a 1)
can now have its arguments successfully evaluated
resulting in (+ 2 1)
.
The function name, +
, can be looked up and (we presume) results in
a function value that will add two numbers together for us. The
result of applying the function value of +
to the argument values
1 and 2 should be a number value (representing 3) and as it is the
only form in body+
then it is also the last and therefore the
result of let
itself.
Wait a minute, let
returns a value? Yes, as noted earlier,
everything returns a value (albeit that some of them are
unspecified). In fact, that let
returns a value is used
innumerable times to create closures. More of that later.
A fractionally more complex example:
(let ((a 1)
(b 2))
(+ a b)
(- a b))
we’ve now bound two symbols, a
and b
, to two numbers, 1
and 2
and we have two forms in body+
. The first form adds the
two numbers together and returns that result.
Wait, returns it to whom? Well, we did say that the body+
forms
were going to be evaluated in sequence which means there must be some
entity iterating over each form and it is to that entity that we’ve
returned the number value.
Scheme has been a little underhand here and rewrites forms
from time to time. For a let
, the body+
is re-written to
(begin body+)
where the sequencing function, begin
will
iterate through the individual forms in body+
discarding all
results except the last which it returns as its own result.
This means the original multiple-body form variant of let
with
body+
has been transformed into a pure single body form: (let
bindings body)
. This sort of transformation (from an impure to a
pure(r) format) is at the heart of syntactic transformations,
aka. macros.
So Scheme re-wrote our little snippet of code to:
(let ((a 1)
(b 2))
(begin
(+ a b)
(- a b)))
and begin
has consumed the result of the addition and will now
evaluate the next form, a subtraction. This is the last (original)
body+
form and so the result of the subtraction, the number value
-1, becomes the result of the begin
function – everything returns
a value, remember – and as the begin
function has inserted itself
as the only and therefore last body form of let
, it will be the
result of the let
too.
Lexical Scope¶
Let’s get busy:
(let ((a 1))
(let ((b 2))
a))
has an outer let
binding a
to 1 and its body is another
let
.
This inner let
binds b
to 2 and evaluates its body which is
the simple symbol, a
. The extent of the outer let
’s binding
is still valid – we’re still in the outer let
’s body – so a
is bound to 1. The inner let
has finished processing its body
which has returned a value, 1, which it returns itself.
The inner let
’s body has now returned a value and as it was the
only body form so the outer let
also returns 1.
Some visual gymnastics, now.
(let ((a 1))
(let ((a 2))
a))
has an outer let
binding a
to 1 and its body is another
let
.
This inner let
binds a new a
to 2 and evaluates its body
which is the simple symbol, a
. In this inner let
a
is 2
– it is the first a
found when walking outwards across lexical
bindings. The inner let
has a value, 2, from its body which it
returns.
The inner let
’s body has now returned a value and so the outer
let
also returns 2.
What if the outer let
’s body did a bit more?
(let ((a 1))
(let ((a 2))
a)
a)
The inner let
continues to create a new a
bound to the number
value 2 and returns it. However, the inner let
is the first of
two body forms in the outer let
and therefore its result is
ignored by the implicit begin
. All of the bindings of the inner
let
are now irrelevant and when begin
evaluates the final
form, a
the only binding for a
in scope is that of the outer
let
meaning a
evaluates to 1.
let
has its limits, though, for a very specific reason:
(let ((a 1)
(b (+ a 1)))
(+ a b))
This seems reasonable enough, a
is bound to the number 1
and b
is going to be the result of adding 1
to a
.
Except it is an error. let
won’t allow you to use a symbol being
defined in the bindings in the expression used to bind another symbol.
We’ll see why in the advanced section.
Functions¶
Function definitions are likely to look something like a list of formal parameters and a body. The formal parameters are bound to the values of some expressions passed as arguments when the function is called and the formal arguments are available for use for the lifetime of the body. It’s sounding quite similar to the let, above.
Functions are first-class values in Scheme, this is to say you can pass them around like numbers or strings. All you have to do to use a function value is to put it in the first slot of a form.
In Scheme function values are created with the special form
lambda
(which you might think of as a function
keyword):
(lambda formals body+)
Where formals
is a list of formal parameters (possibly the empty
list!) followed by a list of one or more body forms, evaluated in
sequence – sounds familiar again! Hence we might expect a similar
syntactic transformation to a more pure form:
(lambda formals (begin body+))
Hmm, notice we haven’t given the function a name. What we have is a function constructor – like the constructors for numbers and strings – the result of which is a function value. Unless we bind it to a variable we’re going to lose it.
Noting that it is without a name this might be called an anonymous function – though that’s not a term you particularly see in Lisp languages as it’s just a function (value).
let* and letrec¶
The pedantry in defining things continues with a couple of variations on a theme!
(let* ((a 1)
(b (+ a 1)))
(+ a b))
Finally, we can have variables dependent on previous ones!
But what’s happening here? let*
, “let iteratively”, iterates over
the bindings, one at a time, placing each parameter in the scope of
later ones. It has transformed:
(let* ((formal1 arg1) (formal2 arg2)) body+)
into:
(let ((formal1 arg1)) (let ((formal2 arg2)) body+))
Where each let
only handles a single binding and the remaining
bindings are performed in the body form of that binding, hence the
earlier parameters are available for use in subsequent argument
expressions.
letrec
, “let recursively,” is a slightly different idea, it
handles the situation where two or more function definitions are
mutually dependent. The usual example is these rather inefficient
odd?
/even?
predicates:
(letrec ((odd? (lambda (n) (if (= n 0)
#f
(even? (- n 1)))))
(even? (lambda (n) (if (= n 0)
#t
(odd? (- n 1))))))
(even? 4))
Here, each predicate will call the definition of the other, so who do we define first? Does it matter, though, as these are function definitions and not invocations?
Yes, it does, as the interpreter, when it is analysing the function
definition for odd?
will see a reference to even?
. It now has
to look that up. In the best case there isn’t an even?
in scope
and you get a error. In the worse case it’ll find some dubious
binding to even?
hanging about and good luck to everyone when it
gets called. Clearly these two definitions are meant to go together
– they are concomitant.
letrec
performs a little trick, rather than define one before the
other, it creates placeholder bindings for the symbols and then
redefines the values in the placeholders in turn to their proper
definition – it doesn’t matter in which order, now, as references to
even?
/odd?
will find either the proper definition or the
placeholder (to be filled in with the definition in a moment’s time).
As whichever is found will be the locally defined placeholder
even?
/odd?
then we won’t be unearthing some dubious code from
elsewhere. So letrec
might be transformed from:
(letrec ((formal1 arg1) (formal2 arg2)) body+)
into:
(let ((formal1 #<undefined>) (formal2 #<undefined>)) (let ((tmp1 arg1) (tmp2 arg2)) (set! formal1 tmp1) (set! formal2 tmp2) body+))
which itself introduces a few issues: the idea of an undefined value
(perhaps an internal implementation thing?); temporary variables (we
don’t want to conflict with the user’s code); and set!
, the
destroyer of things!
Define¶
With the various forms of let
we can introduce new variables and
override them but sometimes, to much gnashing of teeth from the
purists, we need to change the value a symbol is bound to. By and
large in Scheme, where something is going to change something
then its name has an exclamation mark at the end signifying the danger
of mutation!
In our putative transformation of letrec
, above, we use set!
to modify the values that the formal parameters were bound to. That’s
fine inside the let
that introduces those formal parameters but
what do we do at the top level (outside of any binding form) or
anywhere else where a symbol has not yet been introduced?
(set! a 1)
is an error in Scheme as a
does not exist – we can’t change the
value a symbol is bound to if the symbol doesn’t have a binding yet.
This may appear as another bout of pedantry as set!
could create
the binding if it didn’t already exist.
You might well ask, then, where set!
should create the binding.
In the current scope (let
/lambda
level)? At the top level
(whatever that is)?
Whilst we ponder the possibilities, Scheme says that you should use
define
to introduce a symbol binding at the current level – a
binding you can subsequently set!
.
(define a 1)
Introduces the binding of the symbol a
to the number 1
after
which anything can use it:
(let ((b 2))
(+ a b))
should return the number value 3 even though a
was not introduced
by the let
, the define
at the top level has meant it is in
scope.
We could have written:
(let ((b 2))
(define c 3)
(+ b c))
which would return the number value 5. The define
, here, inside
the let
seems slightly wasteful – as you could have simply had
let
introduced a new binding for c
itself – however the
technique is used frequently for defining ancillary functions within
another function.
Yes, of course, functions within functions. What’s not to like?
Defining Functions¶
Associating a name with a function value – we know how to create
function value with lambda
and we now have define
so it is
obviously:
(define +1 (lambda (n) (+ 1 n)))
(a function called +1
? Outrageous!)
This is such a common idiom that define
has an alternate syntax:
(define (name formals) body+)
to be re-arranged as:
(define name (lambda (formals) body+))
ie. for our +1
function:
(define (+1 n) (+ 1 n))
or
(define (+1 n)
(+ 1 n))
That’s a bit cleaner!
Notice that define
(without an exclamation mark) is introducing a
new binding whereas set!
is modifying an existing one (hence
define
and let
rather than define!
and let!
).
It’s not quite that simple as the number of formal arguments cause a twist.
No Arguments¶
For the define
form that’s easy, just don’t specify any when you
define the function – and obviously don’t pass any when you call it!
(define (hi-string)
"hi!")
(hi-string)
For the lambda
form it’s not dissimilar, we just have an empty
list:
(lambda nil "hi!")
although you must explicitly write nil
otherwise the evaluator
will be left with an incoherent set of arguments for lambda
, just
a string in this case.
Varargs¶
What if you have a variable number of arguments? Remember, improper
lists, (a b . c)
? There’s two variants here:
Some Formal Parameters¶
(define (foo a b . c)
...)
I must call foo
with at least two arguments, for the formal
parameters, a
and b
, and any remaining arguments are bundled
up into a list and bound to c
.
c
could be an empty list, nil
, or a list of one or more
values, (...)
.
No Formal Parameters¶
Here, all arguments are bundled up into a list and passed as the formal parameter.
For define, this is OK:
(define (foo . c)
...)
and works like above. In fact the function list
itself – whose
purpose is to bundle up its arguments into a list – is usually rather
cheekily defined as:
(define (list . ls)
ls)
where it has had the evaluator do the hard word of bundling the arguments into a list and passes it off as its own work. Clever.
For the lambda
form it is visually different:
(lambda c
...)
where c
is now a standalone symbol, not in a list of formal
parameters, to capture all the arguments as a list.
Closures¶
All functions are closures, that is they can use variables in scope in their bodies. That’s probably not a huge shock until you note that a lot of the time a function is returned from an expression and will continue to be able to use the variables that were in the lexical scope of it’s definition from far far away.
(define a 1)
(define (a+ n)
(+ a n))
(a+ 5)
You’ll expect to get the number value 6 – which you do. However, there’s a couple of variations on a theme here:
(set! a 2)
(a+ 5)
will result in the number value 7 but
(let ((a 3))
(a+ 5))
will give you the number value 7 again. Why?
We’re back to bindings. When the interpreter analysed the definition
of a+
it saw that the variable a
was not bound by the
parameters of the function (nor by any bindings introduced within its
body) and was therefore a free identifier.
Rummaging about it will have found the top level a
, introduced by
define
, and will have used that binding in the implementation of
a+
. Even though the binding was modified by set!
the function
body would still be referring to that binding irrespective of any
subsequently introduced bindings.
In the terminology, a+
was closed over a
and therefore
a+
is a closure.
What if we wanted to use the locally current binding of a
(to the
number value 3) at the time we evaluated the function call? That
would then be using dynamic scope. Scheme does use dynamic scope
but prefers lexical scope in the definition of functions. It could
have chosen to do differently but it didn’t.
The shell, on the other hand, is quite the reverse. Dynamic scope is
used unless a variable is declared with lexical scope (with
typeset
).
Quoting¶
There’s a few reader macros to help us. Reader macros aren’t true macros we will describe later but rather allow for some cheap syntactic tricks to save typing.
If the evaluation engine sees a list it will assume the first element is a function and the remaining elements are arguments to that function. What if we want to create an actual list and not have it evaluated? We need to quote the list:
(quote (1 2 3))
quote
stops the evaluation engine from trying to find a function
associated with the number constructor 1
. Typing (quote
thing)
gets tiresome so we can use a single quote, ie. 'thing
as
shorthand:
'(1 2 3)
The reader macro associated with '
will read the next expression,
thing
, say, and return the longwinded (quote thing)
for the
evaluation engine to process.
Types¶
Values have types (boolean, number, string etc.) in Scheme but identifiers do not. You can freely rebind a symbol to any value – though you’re only likely to confuse yourself.
This contrasts with strongly-typed languages where a type is associated with an identifier:
int i = 3;
so that the compiler can keep track of things and call out any obvious errors:
char *s = strlen (i);
Common Scheme Functions¶
Equivalence Predicates¶
What does “the same” really mean?
eq?
is most exacting in that, where appropriate, the comparison is
on the (internal) pointer into memory where the value is stored. A
pointer test can be very quick.
This leads to the slightly odd:
(eq? 1 1)
returning #f
.
eqv?
is the same as eq?
except that the values of numbers and
chars are compared. Thank goodness for that!
equal?
will recursively descend into pairs, vectors and strings
applying eqv?
to their contents. Broadly, does the printed
representation look the same?
C, by way of comparison, can only perform integer
comparisons, ==
, <
and the like. Any complex data type like a
string or a struct requires that you delve inside and compare the
constituent parts “int by int”.
Conditional Expressions¶
(if c1 s1
(if c2 s2
(if c3 s3
...)))
Cascading if
clauses aren’t easy on the eye. cond
is the
alternative:
(cond
(c1 s1+)
(c2 s2+)
(c3 s3+))
where, more than likely, each of the cn
and sn+
clauses are themselves lists:
(cond
((boolean? exp) "boolean")
((number? exp) "number")
((string? exp) (string-append "string-" exp)))
cond
is slightly different to if
in that multiple expressions,
the sn+
, are allowed if the condition is true.
It has an else
form, of course, and a very different creature, an
=>
clause:
(cond
((string-match str "foo") => func)
(else (string-append "string-" exp)))
My example is rather poor in that you might presume that
string-match
produces an index into a string but,
seemingly, func
has no reference to str
. Maybe
func
was closed over str
, maybe it’s just
accumulating a list of matches, maybe I should think
harder about my examples….
For =>
the cn
expression is evaluated resulting in some
value
. If the value
is not false then func
is
applied to the value
, ie. (func value)
, or
(func (string-match str "foo"))
if and only if (string-match str
"foo")
is not false.
case
is more like C’s switch
or the shell’s case
statements:
(case key
((o1+) e1+)
((o2+) e2+)
((o3+) e3+))
where key
is evaluated and then compared to each of the
on+
objects with eqv?
. If any object matches key
then the expressions en+
are evaluated in sequence with the
result being that of the last expression.
Sequences¶
and
, or
and begin
are all logical sequencing instructions
which only differ in their initial premise (#t
, #f
and
unspecified respectively) and when they stop processing the sequence
(when an expression returns false, when an expression returns true and
to the end, respectively). They all return the value the last
expression they evaluated returned (or the initial premise if there
were no expressions!).
Common Scheme Idioms¶
Result List¶
A very common idiom is for a function to construct a result list by
cons
ing together the putative calculation on the car
of the
list of arguments and then calling itself on the remaining arguments.
So to add 1 to every element of a list:
(define (inc-list lis)
(cons (+ (car lis) 1)
(inc-list (cdr lis))))
Here we are cons
ing together the per-element calculation (+ X
1)
where X
is the first element of the list and a recursive call
to itself on the rest of the list, (inc-list (cdr lis))
.
Calling the argument list lis
is a common enough idiom itself as
calling the argument list list
would conflict with the function
list
and calling it l
as ever risks visual confusion with
1
.
Actually, that’s not quite correct as there’s nothing identifying the end of the list!
(define (inc-list lis)
(if (pair? lis)
(cons (+ (car lis) 1)
(inc-list (cdr lis))))
nil)
When we reach the last pair in the list, where the cdr
is nil
and call inc-list
on that nil
then the if
’s condition is
false and that invocation of inc-list
returns nil
which is
just what we want in the caller’s cons
, (cons thing nil)
. As
the recursive calls unwind we get a neatly constructed set of nested
cons
calls resulting in the list we want.
Indeed, applying a function to each element of a list is the work of
map
:
(map func list)
and in our “add 1” case, an anonymous function is suitable:
(map (lambda (n) (+ n 1))
list)
map
itself will follow the result list idiom – constructing the
result list as the list is descended.
That’s similar to map
in most languages but Lispers
like a list so the real map
will iterate down multiple lists at
once calling func
with multiple args (one per list).
Safe Bootstrap¶
Particularly in bootstrap code you might see the peculiar:
(define (foo args) ...)
(define bar
(let ((foo foo))
(lambda (args)
...use foo...
)))
What’s the point of defining foo
as foo
in the let
in
bar
?
The problem is that we don’t know when bar
is going to be called
and, if foo
is something that may be redefined with slightly
different behaviour then we need to be sure that the foo
that
bar
uses is this one, the one defined immediately before it. In
other words, we’ve bound ourselves to the local definition of foo
so that we don’t get any unexpected behaviour.
Anonymous Closures¶
Anonymous closures returned from function calls are very common:
(define (adder-factory arg)
(lambda (a)
(+ arg a)))
(define add3 (adder-factory 3))
(add3 5)
should get us the number 8 as a result.
Thunks¶
A thunk is a zero parameter function which doesn’t sound much use in itself but following on from anonymous closures is a useful trick to delay the evaluation of some code.
(define a (+ 3 5))
(+ a 1)
will perform the calculation (+ 3 5)
and bind a
to the result.
No biggie. But:
(define a (lambda () (+ 3 5)))
(+ (a) 1)
Notice that a
is a now a zero-argument function and therefore
requires a zero-argument function invocation, (a)
, in the final
expression!
Here, the calculation, (+ 3 5)
, will only be performed when a
is invoked as a function.
This is a trivial case but preparing a body of code to be evaluated on demand will come in very useful.
Recall, that if
is a special form, (if condition consequent
alternative)
? Perhaps we could have transformed it into
(if condition consequent-thunk alternative-thunk)
by replacing, say, consequent
with (lambda () consequent)
,
then neither of the consequent nor alternative expressions would be
evaluated until their respective thunks were invoked when if
decided what to do with the result of condition.
OK, using a macro to skip the special form-ness of if
seems a bit
pointless but the principle remains:
(define (preparer args)
(lambda ()
... args ...
))
we can prepare some functionality to be evaluated on demand – perhaps never, of course.
Last built at 2024-12-21T07:11:17Z+0000 from 463152b (dev)