Advanced Scheme

Much of what we’ve seen so far has been slightly different ways of expressing the same things as other languages.

Can we go a bit deeper?

Syntax Transformers

As suggested previously, this is the word of macros. The basic premise is that you can determine some minimalist or comfortable expression and a macro can re-write your code into proper Scheme.

It’s not so far from, say, C pre-processor macros except you don’t have to figure out what the list and string manipulation limitations of the C pre-processor are, you have the full power of Scheme at your disposal. Including, of course, other macros.

Of interest, Python is considering Syntactic Macros.

To be fair, raw macros are “a bit tricky” so people have gone a bit meta and written syntax transforming macros for you to use. These let you, in turn, create syntax transforming macros. It seems a bit redundant but the syntax transforming macros limit what you can do with the idea being to keep people away from the real deal when they don’t need to be there.

let

let is a macro, it is a syntax transformer. Here’s how.

Just to make you hate the parenthesis a little bit more, our evaluation model for Scheme says that functions are invoked when they are the first element in a list and we’ve said that lambda is a function constructor. Let’s combine the two. So the func in:

(func args)

can be replaced with a (lambda ...) construction:

((lambda formals body+) args)

The number of arguments in args should clearly match the number of formal parameters in formals.

Can we work an example? How about a function that adds two numbers:

(lambda (a b) (+ a b))

or, in a more common styling:

(lambda (a b)
 (+ a b))

This is constructing a function value which takes two formal parameters, a and b, to be in scope for the extent of the body forms. It has a single body form, (+ a b) the result of which is what the function will return.

Like a more obvious example, say, (+ 1 2), had this function been bound to the symbol add we would have happily invoked (add 1 2) where add is going to evaluate to a function value. A function value is just the thing we created with lambda. So put it all together:

((lambda (a b) (+ a b)) 1 2)

(I told you you wouldn’t like it.)

There’s a lot of visual clutter there to figure out it is saying:

(func 1 2)

and we haven’t made the arguments complicated expressions at all!

But that is what it is saying and so the function value, created on the hoof, is applied to the arguments and we get a result. (Hopefully, 3!)

Where does this errant nonsense with on-the-fly function values get us?

let, of course! (The clue was in the heading…)

Consider a let statement:

(let ((a 1)
      (b (* 2 3)))
  (+ a b))

In the bindings we said each binding was a list containing a symbol and an expression, so our symbols are the cars of each binding, a and b, and the expressions are the cadrs, 1 and (* 2 3).

So a syntactic transformation of let could walk down the list of bindings putting the car of each binding into a list of symbols called, say, formals, and the cadr of each binding into a list of expressions called, say, args. The body+ forms look like they’re going to be handled the same way in a let as in a function so we can easily transform the let:

(let ((formal1 arg1)
      (formal2 arg2))
 body+)

in the first case into a function definition:

(lambda (formal1 formal2)
 (begin body+))

and then into a function call form by putting the function definition as the first element and the arguments to be evaluated as the remaining elements:

((lambda (formal1 formal2) (begin body+)) arg1 arg2)

And, lo! let has been transformed into an on-the-fly function call.

This transformation also explains why let does not allow you to re-use a formal argument in the definition of another as arg1 and arg2 are presented as independent values to the function. arg2 cannot use arg1 because arg1 exists only as a value not as a name bound to a value that you can reference. arg1 is not usable as formal1 until inside the function.

No one expects to be able to write the following in C:

int main (char **argv, int argc = length(argv))

where one formal parameter is derived from an earlier one. The formal parameters are independent and, so far as you can tell (from inside the function), created in parallel.

Why are we doing this anyway? What was wrong with plain old let? In principle, nothing, in practice it means we have to support two different ways of introducing variables into scope. With a syntax transformation like this we only have to support function calls. let vanishes.

In some sense it is much like any transformation of an iterative loop into a tail recursive loop (or vice versa) which might be more convenient for your language to support.

Let Variant

let has another funky form:

(let name bindings body+)

which really needs an example to understand.

(let foo ((f1 e1)
          (f2 e2))
 body+)

gets transformed into:

(letrec ((foo (lambda (f1 f2))
               body+))
 (foo e1 e2))

Eh? What’s happened there and what’s that doing?

Look carefully. We’ve done a similar trick with the bindings. Firstly, the symbols have become the formal arguments again but, rather than for an anonymous function, this time for a named function, foo. Secondly, rather than apply the anonymous function to the arguments we’ve called the named function with those arguments.

What we’ve done is declared a (usually self-recursive, hence letrec) function called foo which takes formal parameters, f1 and f2, and we automatically called it with arguments, e1 and e2.

So we’ve declared and invoked a function in one statement. It seems a bit… exciting …although not a million miles removed from the standard let.

However, it’s actually really useful. You recall that Scheme prefers a recursive loop and there’s any number of occasions when we want to loop over something, counting, filtering, whatever and this let form allows us to declare a loop function (that can self-recurse) and kick it off with some starting values.

(define stuff '(1 "two" 3 'four))

(let numbers ((ls stuff)
              (r nil))
 (if (null? ls)
     (reverse r)
     (numbers (cdr ls) (if (number? (car ls))
                           (cons (car ls) r)
                           r))))

The function numbers is defined as taking formal parameters ls (for the “current” value of the list) and r (for the accumulated result). numbers is initially called with stuff and nil.

The body of the function is a single if statement where it asks, is ls an empty list? If so return the reversed value of r (seems a bit odd).

Otherwise, recursively call numbers again with the tail of the list (ah, clever) and the result of another if statement.

This second if statement is checking to see if the head of the list is a number with number?. If it is then prepend the head of the list to r otherwise just use r as it is. So, as we walk down the original list, stuff, left to right, if the head of the list is a number we create a new r which has this most recent value prepended to it. Hence the need to reverse the final result.

It is a different way of thinking.

For what it’s worth, the iterative variant would look something like:

(define stuff '(1 "two" 3 'four))

(let ((ls stuff)
      (r nil))
 (while (not (null? ls))
  (if (number? (car ls))
   (set! r (cons (car ls) r)))
  (set! ls (cdr ls)))
 (reverse r))

Horses for courses, I suppose, though the use of set! will upset the purists.

Just to keep you on your toes, as Scheme is tail-call optimised then it makes sense to transform any iterative calls, like while, into tail recursive calls as originally penned. Then we have the discomfort of re-working the user’s set! calls inside our elegant recursive function. Bah!

Macros

Thar be Dragons!

everyone

A macro, which is ostensibly a regular function, is created using a special form, define-macro that tags your function’s name as a macro. This tagging function clearly has to have hooks into the backend because, having looked out for all the special forms, the Scheme engine will check to see if the function name for a form is tagged as a macro and if so it does not evaluate the arguments but passes them onto the function/macro as is (atoms, lists). The macro can then poke about with its arguments as it sees fit and return some value.

The trick with macros is that the returned value must be something that can be immediately re-evaluated – so the return value must be an atom or a list. Macros, then, are code generators.

A great example is the production of boilerplate code. Given a name we might want to produce a host of related functions that do name-related stuff. Imagine some object-oriented system where you might want to define a class which has a parent class and some fields:

(define-class name parent field+)

Who, in their right minds, wants to type in the constructor:

(define make-name (args)
  ...
  parent
  ...)

and accessors for the arbitrary number of fields:

(define name-field1 (o)
  field1
)

(define set-name-field1! (o v)
  (set! field1 v)
)

and a predicate:

(define name? (o)
  ...
)

and whatever else is necessary for every class where all the ... are tracts of code that are identical across all classes when you already have name, parent and all the field names in your hands? The define-class call has been given everything it needs to know, the rest of it should get built automatically.

To help create those chunks of code the Schemers have come up with quasiquoting and, normally, a macro will return a quasiquoted expression:

(define-macro (name formals)
 ...do any prep work...
 quasiquoted-expression)

Quasiquoting

Literally quoting things, ', doesn’t let us do much in the way of substituting in values, well, not without a struggle:

(let ((a 1))
  '(a 2 3))

returns (a 2 3) which is probably not the (1 2 3) you wanted. You could try:

(let ((a 1))
  (list a 2 3))

which is fine but will itself start to get rather long winded and you’ll have to explicitly quote all of the other arguments that don’t want evaluating.

Instead we can quasiquote things where, like variable interpolation in strings in the shell, eg.:

echo "this is $0"

will iterate through its argument expression looking for unquoting expressions otherwise leave things quoted just like quote:

(let ((a 1))
  (quasiquote ((unquote a) 2 3)))

Here, quasiquote’s argument, ((unquote a) 2 3), is a list of three elements, the first is an unquote expression so quasiquote will evaluate the argument to unquote and replace the unquote expression with the result of the evaluation. The other two arguments remain quoted. Evaluating a results in the number value 1 hence the result of the quasiquote expression is (1 2 3).

Like quote, quasiquote’s result will not be evaluated. Eh? Isn’t that the point? Not quite. What we want to do is return what the user might have typed, the Scheme engine can then choose to evaluate it or not.

quasiquote and unquote have reader macros similar to quote’s ' which are ` (backquote/backtick) and , (comma) respectively. We could have written the above example as:

(let ((a 1))
  `(,a 2 3))

So unquote/, is very much like variable interpolation in the shell, with quote/quasiquote akin to the use of single quotes or double quotes in the shell.

One important difference, though, is that it is an expression that is evaluated not simply evaluating a variable:

(let ((a 1))
  `(,(a+ 5) 2 3))

will return (7 2 3) (assuming we still have the previous definition for a+ floating about).

There is another unquoting expression, unquote-splicing (with a reader macro of ,@) which expects its argument expression, when evaluated, to be a list then it will splice the elements of that list into the quasiquote expression whereas unquote would have simply inserted the list as a single entity. Given:

(define b '(1 2 3))

Then:

`(a ,b c)

returns (a (1 2 3) c) whereas:

`(a ,@b c)

returns (a 1 2 3 c).

It goes without saying that real macros are not trivial examples like this. In practice the a in

`(,a 2 3)

is likely to be an expression and surprisingly commonly it is a map where each time round the loop you will be generating another quasiquoted expression. Think about the field+ from the class suggestion above. The engine doesn’t know how many field names you’re passing in so it must use a loop. Of course, , isn’t ideal for handling the list of results from map, we probably want ,@ to splice in the results to the output of the macro:

(define-macro (define-class name parent field+)
 ...class prep work...
 `(
   ...quasiquote with ,name ,parent etc...
   ,@(map (lambda (field)
           ...field prep work...
           `(...quasiquote with ,field ,name etc...))
           field+)
  ...
  ))

(The prep work is likely to be creating symbols from combining the symbols for the name and field variables and so on.)

Probably the most subtle misstep people take is that macros are expanded “at compile time” (whatever that is but you get the impression that it’s not an ideal time). Which, in turn, should really say that macros are expanded (recursively and in a separate evaluation-space) before the resultant program is evaluated. In fact, it’s even more complicated than that as if a macro defines and uses macros itself then those macro-macros live in a meta-evaluation space and another level again if those macro-macros define and use macros….

Macro Issues

The three primary problems with macros are:

  1. you forget its arguments are not evaluated – so you can’t pass a variable you’ve calculated

  2. you forget it is run at compile time – your variable doesn’t exist anyway

  3. you’re a dirty hacker

The latter point (probably well made) is about hygiene. If your macro is off generating code then you’re likely as not going to be using some variables. Variables, of course, are names which, unless you’ve washed your hands carefully, are going to get in the way of the real program’s variables.

Suppose we wanted to define some logical-or functionality where you would evaluate a pair of forms in sequence until one of them returned a non-false result and then you would return that result. As we’re not going to evaluate all the arguments (forms) before invoking the function we must define a macro:

(define-macro (my-or e1 e2)
  `(let ((tmp ,e1))
     (if tmp
         tmp
         ,e2)))

which looks good. We’ve even been smart enough to use a temporary variable, tmp so that e1 is only evaluated once in the enclosed if statement. Less smart users would have written:

(define-macro (my-or e1 e2)
  `((if ,e1
        ,e1
        ,e2)))

and if e1 had been (honk-horn) then their macro would have expanded out to:

(if (honk-horn)
    (honk-horn)
    ...)

and they’d have heard “parp! parp!”. Hah, suckers!

In the meanwhile our sophisticated winning solution for:

(my-or "this" "that")

will expand to:

(let ((tmp "this"))
  (if tmp
      tmp
      "that"))

which wins all the plaudits until someone types:

(let ((tmp "foo"))
  (my-or #f tmp))

The macro is expanded thusly:

(let ((tmp "foo"))
  (let ((tmp #f))
    (if tmp tmp tmp)))

Yikes! (if tmp tmp tmp)? Which tmp is which?

This problem is solved by using hygienic macros – actually macros are still macros but rather the macro writer uses a little code trick to inject a unique variable name in the prep work:

(define-macro (my-or e1 e2)
  (let ((tmp (gensym)))
    `(let ((,tmp ,e1))
       (if ,tmp
           ,tmp
           ,e2))))

Here, we used gensym (generate symbol!) to come up with a unique symbol that cannot conflict with anything else. tmp, now a regular locally-introduced symbol during the evaluation of my-or (albeit in “compile-time macro-space” – do keep up at the back!), is bound to that unique symbol value and, having been defined in the outer let, can be unquoted and evaluated (becoming the unique symbol) inside the quasiquote expression.

So, tmp will have the symbol value G707, say, and the result of the macro, the last form, the quasiquote expression will look like:

(let ((G707 e1))
  (if G707
      G707
      e2))

If we called my-or again, tmp would be bound to a new (unique) symbol value, G713, say, and we’ll end up with a similar expansion but one that did not conflict – which means we could have my-ors inside my-ors without the per-invocation local tmp variables colliding (as they never appear in the expanded code).

However, macro writers are not perfect and hygiene is difficult to ensure so much work has been made to remove risk from macro writing. There have been two thrusts: syntax-rules and the more advanced syntax-case both of which have pattern matching at their hearts. Mind you, the latter is so complicated that it requires a 30,000 line pre-built version of itself to bootstrap!

The broad idea of both is to re-imagine macros as syntax transformers (which they always were…) within strict limitations (that’s the key). Elements of the macro calls are tagged and the tags are managed so that no unhygienic variables are introduced. (Unless you really really want to!)

Closures

Let’s try a more dynamic variant:

(define a 1)

(let ((ab+ (let ((b 3))
            (lambda (n)
             (+ a b n)))))
 (ab+ 5))

Cripes! What’s happening here? The inner let’s body is a function constructor (lambda) and so the inner let is returning a function value. The function is using n, a formal parameter, b, a variable introduced by the inner let, and a from the top level.

The outer let is binding that function value to the symbol ab+ which is called in the outer let’s body. The result (from the outer let) should be 9, the sum of a, b and n.

But wait! b isn’t in scope in the outer let! No it isn’t, but the function value in the inner let was closed over b trapping it, if you like, within its body. We can’t access b in any sense from ab+, other than the fact that it is used inside ab+, so there’s no risk of b being used directly outside of its bound scope. We can’t change it, it is stuck forever bound to the value 3.

So we have here the idea of enclosing the access to a variable inside a function. If the code was different, could we modify such a closed variable?

Yes, of course. This idea of, in effect, creating a private variable is as common in Scheme as anywhere else. You might want to keep a counter such that every time your closure was called it incremented the private variable and returned the result. You might want to maintain a list of stuff.

The critical part of this trick is that only this function has access to this variable. There is no other way for anything to use it or modify it. It is completely private.

You can extend this idea to enclosing a suite of functions over a common set of variables which the functions co-operatively maintain. Clearly you can’t return multiple function definitions from a single let but you can play a similar trick to letrec:

(define this #f)
(define that #f)

(let ((private 0))
 (set! this (lambda ...))
 (set! that (lambda ...)))

Both of the functions bound to this and that have global scope (because the names were declared at top level with define) and both of the function values have access to private (because they were created in the scope of private). No-one else has access to private other than through the interfaces, this and that.

This is about as complicated as it gets!

Ports

Scheme uses ports to describe things that you can read from and write to. The obvious use case is files.

In the pernickety way of Scheme there are separate functions to open files for reading and writing: open-input-file and open-output-file. After that you can read, write, check for end-of-file, close, “seek”, “tell” and so on. All the usual things.

On top of that Scheme has the notion of the current input, output and error ports. Obviously, maybe, those start off being the usual stdin, stdout and stderr that we’re used to. However, there are common idioms to temporarily switch those noting that many functions are well aware that they might be switched and will rigorously ask for the current input, output or error port.

This little snippet from S9fES ([Hol14]) combines several techniques to create a function that takes a file name and a thunk then switches the current input port to the newly opened file, runs the thunk, saving its result, closes the file, resets the input port and returns the saved result:

(define with-input-from-file
  (let ((set-input-port! set-input-port!))
    (lambda (file thunk)
      (let ((outer-port (current-input-port))
            (new-port (open-input-file file)))
        (set-input-port! new-port)
        (let ((input (thunk)))
          (close-input-port new-port)
          (set-input-port! outer-port)
          input)))))

Most programming languages have the capability to manipulate files in a similar fashion. We’re not breaking new ground, here.

String Ports

This is a bit more interesting. Instead of ports meaning an interface to files, how about an interface to strings?

Eh?

For an input string port you need to be able to:

  • open one – that means wrappering an existing string with a port construct to give it the port interfaces

  • read from one – we can probably get characters from a string, right? All we have to do is maintain a pointer to where we’ve read so far in the string.

  • check for EOF – are we at the end of the string yet?

  • close one – stop using it?

  • seek and tell? Well those sound like jumping about to different indexes in the string, setting or returning the current pointer into the string. Can’t be that hard.

As it is an input port then all write operations should fail on principle.

For an output string port we need a little more trickery under the hood:

  • open – requires an output port construct where the underlying string is extensible.

  • write – every time we write to the output string port we extend the underlying dynamic string.

  • check for EOF – is that meaningful?

  • close – stop using it? Maybe set a flag.

  • seek and tell? Well those sound like jumping about to different indexes in the string – so we should have been maintaining a current pointer into the string like above. Can’t be that hard.

As it is an output port then all read operations are invalid. However, unlike files, where we have an (externally) examinable result – one we can call open-input-file on, if nothing else – this output string port is hiding the underlying string in memory. We must be able to retrieve our carefully crafted musings, hence, get-output-string which returns the (underlying) string from the string port.

Maybe, at this point, string ports seem a little quixotic but, if nothing else, you can recognise that if all kinds of ports are maintaining the same interfaces then they should be interchangeable. Where, previously, you might have been writing to a temporary file, you can now be writing to a temporary string and you wouldn’t know. Whoa!

To be fair, that does rely on the code doing the writing to be scrupulously honest and ask for the current output port if it is going to write anything. Which is why (most) Scheme functions do.

Error Handling

Lisps do things a little differently, here. Rather than have “errors” or “exceptions” they have conditions. Which have a sort of class hierarchy feel to them as the different types of conditions are (looks for a better word but fails) sub-typed from one another. A difference is that they are not restricted to errors but users can create their own condition type (hierarchies) and such a condition can be signalled (ie. raised) whenever the user deems that a particular state of processing has occurred.

Mostly errors and exceptions, though.

A second, somewhat more profound difference, is where in the evaluation the condition handler is run. For most languages that have some kind of exception mechanism, say Python’s try mechanism:

try:
    risky_command ()
except Exception as e:
    print ("ERROR: risky_command() said: {0}".format (e))

you don’t really know, and probably don’t care, quite what risky_command was doing when it raiseed the exception. What you do know is that, however deep into its evaluation stack it has gone (think: how deeply nested the function calls in the risky library got), the evaluation stack is truncated back to here – all those function calls from risky_command through to the failure are discarded, we call print and move on with our lives.

Lisps don’t go in for that, rather, the condition handler is run instead of the failed function (that raised the condition). It can choose how to proceed: return an appropriate value on behalf of the failed function; call the next condition handler up; raise a different condition.

Sometimes you do want to truncate processing in which case you need continuations.

Nested Handlers

Your handler can be buggy (I know, unlikely) in which case who handles that? Whenever a condition handler is installed it sits first in the queue, if you like, of handlers to be consulted when a condition is raised.

Correspondingly, when a handler is run it is run in the context of the next handler out. In other words, you don’t handle your own bugs!

Obviously this cascades back out to some system installed handlers – which, you would like to think, are bug-free and can take anything in their stride.

Restarts

In concert with conditions, Scheme supports the idea of restarts. The idea is that the flow of the code gathers up a set of labelled blocks of code called restarts – in one sense a bit like picking up variables defined in the source that you normally wouldn’t use. When a condition is raised the condition handler is at liberty to determine the set of restarts collected and make a decision about whether to call one of them or not or revert to a higher (outer) handler which might have a better view (or a better restart) available.

If chosen, control is handed to the restart code block and the program continues. The trick being that, having repaired whatever mess had been caused, it continues back into the code that raised the error in the first place, this time, hopefully, continuing without error.

Consider a process managing backup files where the number of backup files is less important than the amount of disk space they are using. Eventually you will fill the disk which requires operator intervention.

On the other hand, the “disk full” handler might identify there is a “throw out the old” restart which we picked up before entering the section to create backup files. The restart can be called, have the old backup files cleared out and then carry on back into the section to create backups.

I, like many people, have written plenty of code to pro-actively (although often, post-actively) trim the set of backup files but in neither case is that code reactive.

Whether that’s the right way to approach problem solving is another question.

Continuations

Timey-wimey stuff.

—The Doctor

Continuations are the greatest idea in computing… that you’ve never heard of – despite using continuations all day every day!

They are also, possibly, the last thing you should be allowed to touch. So, noting we are curious but not cats, let’s dive in!

I think I have an example that explains the idea of a continuation to a non-continuation audience – if not, it’s going to be tricky. Consider a little snippet of C:

...
int i = a + b * c;
...

There are (arguably) four continuations there, four! Let’s break it down. As a starter for ten, we know from C’s operator precedence rules that a + b * c is really a + (b * c) that is, the calculation of b * c is performed first and the result passed to the next sub-expression, a + [] – with [] standing in for the result of the previous sub-expression, b * c.

In turn, the initialisation of i is waiting for the result of the addition, so, in this style, it looks like int i = []. If we were to rewrite our original snippet it might look like:

...
b * c
a + []
int i = []
...

This is now looking like a little chain of sub-expressions where each sub-expression generates a (single!) value ready for the next sub-expression to use in turn. From the perspective of the b * c sub-expression, it will calculate its value then continue (aha!) onto a + [], therefore a + [] is described as being the continuation of b * c.

a + [], in turn will calculate its value and continue onto int i = []. int i = [] is a + []’s continuation.

You said four continuations? Yes. The continuation of the line before b * c has a continuation too: it is b * c except b * c chooses not to do anything with the value provided by the line before.

Similarly, int i = [] has a continuation as well, the line after it. We can’t see from this snippet whether the line after chooses to use the value that int i = [] provides. (Which is itself an interesting question, what is the result of an assignment? Discuss.)

Each of these sub-expressions is a bit of code that is expecting an argument then calls another bit of code, like a chain of (continuation) functions. We can re-fashion our snippet as (the vastly uglier):

...
k_l(v) { k_m(b * c); }
k_m(v) { k_n(a + v); }
k_n(v) { k_o(int i = v); }
...

Again, exactly the same thing is happening, b * c is now encoded in k_l() and ignores the parameter, v. It calls k_m() with its calculated value.

a + [] is now k_m(), it does use the parameter v and calls k_n() with the result.

int i = [] is now k_n(), assigns the parameter v to i and (assuming you’ve figured out what the result of an assignment is) passes its result to k_o() and the chain continues.

There’s a variation called continuation-passing style (CPS) which might look a bit like:

...
k_l(v, k) { k(b * c, k_n); }
k_m(v, k) { k(a + v, k_o); }
k_n(v, k) { k(int i = v, k_p); }
...

where the continuation that you should pass your result to is passed to you and you have to tell your continuation whom to call in turn. That’s looks “difficult” to create but it turns out to be easier than you think.

You can even go through a process of transforming your, say, C code into a CPS program which, with your C hat on, looks like you are in a never-ending chain of callbacks.

Which is exactly what you are in!

Of course, you can’t be in a never ending chain of callbacks because your program will clearly(?) be in a tight loop. At the end of the chain is a NULL pointer (or some other sentinel value) which tells you to stop.

How you start and how you stop are another matter.

In order to generate the chain, the way I like to think about it is you start with your end-of-chain sentinel value. Each splodge of code you add is like a balloon inflating between where you are now and the sentinel, with the new code scribbled over the (arbitrarily expandable) surface. For the next bit of code you again squeeze it in before the sentinel but with the addition that the last code balloon no longer points at the sentinel as its continuation but now points at the start of the new balloon.

When you’re done processing the source code you can call the first of your balloons with some dummy value and the chain kicks off and will trundle through, function calling function calling function.

OK, back on course. So now we have a seemingly pointless rewrite of our code into the tiniest sub-expressions each of which are doomed to call the next sub-expression. What have we achieved? To be honest? Not much.

However, and this is the trick. These continuation functions, taking a value at a time, they look quite a lot like regular functions, taking a value at a time. Suppose I could ask for one of these continuation functions then I could call it with some value of my choosing, right?

Let’s take, a + []. If I had access to k_m() then I could call k_m(37) and then the program would continue from that point onwards – because everything is chained together and the links in the chain cannot be reordered – with k_m() calculating a + 37 and passing that result onto k_n(). I have dived into the middle of the chain and inserted a value that ignores the (immediately) previous calculation? What happened to b * c? Pfft! Don’t care, we’re continuing with 37.

Apart from that immediately previous calculation everything else is the same. The code still accesses the same lexical and global variables, will call the same functions in due course, it’s all hard-coded into the program. It has the same prior state every time you (re-)call it. Meh! Except not quite the same state if you have modified one of those (global) variables in the meanwhile.

Whoa! That’s a bit weird. So weird, in fact, that most languages don’t give you access to them. Although, to be honest, it’s more likely because they can (and probably will) create causal havoc.

Once you’ve got a continuation (function) you can keep calling it. Wooo! Although that will get boring after a while.

Except when it’s really cool. Think about generators where you want to go back and ask for the next value from a sequence without having to have pre-created all values in advance. Those require that the generating function stop processing and yield a value and then when you next call it the generator continues (hint, hint) from where it left off.

The greater use of continuations is not, however, repeatedly calling the same thing but rather for being able to jump to another point in the program – usually a local point but not always. Indeed, continuations have been called programmatic gotos with all the baggage that anything that is called “goto” incurs.

Jumping to another point in the code sounds awful but talk to me again about:

  • return in a function

  • next/last or break/continue in while and for loops

  • try/except – remember the risky_command example where all that stack of functions calls was discarded? How did that happen? Hint: non-local goto.

where you mess about with the natural control flow of the code? Of course, you might be so attuned to something like return that it has never occurred to you that you are prematurely jumping out of the flow of the code. But return and next/last are only jumping about in the loop/function? Sure, but they’re still jumping about, something has to make that happen.

What we’re saying here with continuations is that that ability to decide on some kind of control flow jump is no longer the preserve of the language implementers (for example, granting you return out of the decency of their hearts) but instead is exposed to the user for them to create their own.

In fact, continuations are capable of creating any control flow structure. Any.

OK, it sounds like a dangerous free-for-all, there must be some constraints? Yes, thankfully, you have to be able to get hold of a continuation (function).

call/cc

You can’t get any old continuation – that would be madness (though there’s probably some programming languages where you can). In Scheme you have to have reached the statement before the continuation you want to capture – that clearly places a severe limitation on being able to jump about. On top of that it is caught with a slightly confusing function, call-with-current-continuation or call/cc (a relief to everyone’s keyboard, if nothing else).

call/cc does exactly what it says on the tin. It figures out its own continuation and passes that to the function you supplied. It has called your function with the current continuation of the call to call/cc. Can’t fault the name!

call/cc calls the function with the continuation and, sort of like a single form’ed body of a function, will return the value that the function returns. Confusing, eh?

Actually, that’s even more annoying than at first blush. What on earth can I do with a continuation if it is just a parameter in a function? That’s the least of your worries. Your first problem is which continuation are you capturing?

Remember our snippet of C where we had four continuations, albeit just the two in our single line of code?

...
b * c
a + []
int i = []
...

If we want to capture k_m() – so we can jump in at a + [] with a value – we need to back-track a little and ask, whose continuation is k_m()? Here, as we know, k_m() is the continuation of b * c.

Now, this is where it gets tricky. We said call/cc will return its own continuation to the supplied function and we want it to be the continuation of b * c so we need to insert call/cc where b * c currently is.

OK, so what happens to b * c? Recall that call/cc will call the supplied function with the continuation and return the value the function returns. Aha! So we can put b * c in the body of the supplied function and that calculation will be returned by call/cc to its continuation which is k_m(), ie. a + [].

That sounds like a mess! It is but it’s a well-structured mess – and you get used to it.

Let’s see if we can visualise that:

func(k) { b * c; }

...
call/cc (func)
a + []
int i = []
...

I’ve got a unary function (one-argument!), func, whose body is the calculation b * c so that’s the value func should return.

We’ve also slipped in call/cc into the sub-expression stream in place of b * c so that call/cc’s continuation is k_m()/a + [].

call/cc should now call func with the argument k_m(). func will calculate b * c and return it to call/cc which will return it in turn to… k_m().

Yay! … Yay? Have we done anything there? Hmm, fortunately, no. But that’s the idea, we’ve managed to slip in some continuation capturing code and we haven’t changed a thing. Definitely a big wooo! there, believe you me!

But I thought we were going to capture k_m() ? Well, technically we did, in func, except we didn’t do anything with it.

It’s not like we can do anything useful with it in func either. Imagine if you called k in func:

func(k) {
          k (37);
          b * c;
        }

Cool! You will immediately goto k_m() with the value 37. Not only that, you will only ever call k_m(37) because continuations are control flow operators. It happens right there, right now. b * c is never going to be called.

The only useful thing you can do with a continuation inside the function supplied to call/cc is save it in a variable with wider scope. Wide enough scope that you can call it sometime later. A variable with global scope is considered a bad place to store it as it means that anyone at any time can jump straight back in with gvar(37) (or whatever you’ve called it).

Instead, you’re probably going to be storing it a variable with the least wide scope you can get away with to solve your problem (doubtless there will be situations where global scope is appropriate). Think about the private variables discussed earlier.

Finally, we’ve been playing with the sub-expressions, what do we do in the real code?

func(k) {
 private_k = k;
 b * c;
}

...
int i = a + call/cc (func)
...

Sweet!

Naturally, Schemers don’t like named functions because it’s a one-use function and we don’t need to clutter the namespace – and no need when you can throw in a lambda expression and get those parentheses going!

(let ((private-k #f))
 (let ((i (+ a (call/cc (lambda (k)
                         (set! private-k k)
                         (* b c)))))))
 (private-k 37))

Which looks cool (we’ll gloss over the set!) but will loop forever.

The trouble is, the continuation that we’ve captured is part of the lead up to the call to (private-k 37) the invocation of which means we instantly appear to return from the call/cc function with the value 37 and work our way through to the (private-k 37) line whereon we instantly appear to return from the call/cc function with the value 37 and work our way…

Oops. Continuations, here, with a popular narrative of “call once, return many times…” need some careful thought.

Another, annoying/entertaining example, whilst we’re befuddling ourselves is the innocent capture of the current continuation into a top level variable:

(define top-level-k (call/cc (lambda (k)
                              k)))

which certainly sets top-level-k to a continuation. We can call that continuation with (top-level-k 37) whereon we return (again) from call/cc with the value, uh, 37 which gets assigned to top-level-k. Oopsies, we’ve just trashed our saved continuation on its first use.

I might have mentioned there are “issues” with poorly chosen continuations.

Corner Cases

Choosing to capture a continuation in the middle of a sequence isn’t quite fair. What about our two “hidden” continuations, the one for the line before and the one for the line after?

The line before’s continuation is quite easy. We know – because we’re looking at the code – that b * c doesn’t use the value passed from the previous expression so if we want to capture this continuation then we can simply insert a dummy call/cc before our line. Dummy in the sense that its only job is to capture the continuation but not provide any useful calculation. In fact, it will produce a result, what ever the result of the act of saving k is (you have figured out the result of assignment, right?):

(call/cc (lambda (k)
          ...save k...))
(set! i (a + (b * c)))

k, if invoked with any argument, will then start processing b * c (which will ignore the value passed) and we continue cleanly.

The line after’s continuation is more of a mixed bag. From looking at the code we might determine that the next statement does nothing with the value from the previous statement in which case we can insert a similar dummy call/cc or we realise that our (set! i (a + (b * c))) statement is in the middle of something else which is reliant on the result in which case we need to wrapper the whole set! form in call/cc:

(call/cc (lambda (k)
          ...save k...
          (set! i (a + (b * c)))))

where the value returned by call/cc (the value returned by set!) is passed to whomever wants it. k, now, is in the position of giving whomever a different value than whatever you’ve spent all that time figuring out what the result of an assignment is.

Escapes

We’re familiar with (or have at least heard of!) various exception handling memes:

  • Python’s try/except/else/finally and proactive raise

  • Java’s try/catch/finally and throw

But those are far from any complete set. Lispers have thrown a few more hats into the ring:

  • (catch label form+) allowing during the execution of form+ a (throw label form) which requires a dynamic escape environment (as those labels are evaluated and then associated with continuations)

  • (block label form+) and the corresponding (return-from label form) where label is not evaluated (ie. is a symbol/identifier) but still bound to a continuation but the combination is only available in a lexical escape environment.

  • let/cc (and Dylan’s bind-exit) support assigning a continuation to a dynamic variable

  • delimited, composable continuations or prompts

    • shift/reset

None of those support the idea of a finally clause. The headline solution to that is Scheme’s unwind-protect which is a derivative of dynamic-wind which, despite being in the RnRS, is probably the most subtle and hackiest thing I have seen. So we will implement it.

The implementation, though, requires several layers to support it which we will get to in due course.

Continuations in C

Naturally, these abstract computer science notions are too high brow for C. Until you realise that C has had the same tools since forever in setjmp(3) and longjmp(3) and their signal-safe cousins sigsetjmp(3) and siglongjmp(3) (probably all on the same man page!).

Indeed, there’s a reasonable likelihood that Scheme’s continuations are implemented using C’s sigsetjmp/siglongjmp.

The Linux man page notes:

In summary, nonlocal gotos can make programs harder to understand and maintain, and an alternative should be used if possible.

Wise words.

Compound Data Types

There is a tendency for these to be implementation-specific but broadly you can use the usual compound data types of strings, arrays (or vectors), hash tables and structures.

Strings, perhaps unusually for Scheme, allow you to modify them (poor form!). I guess this is a commonly expected functionality. string-ref and string-set! are accessors.

Arrays and hash tables work much as you would expect with the element accessor functions following a similar style: array-ref/array-set! and hash-ref/hash-set!.

Structures

Various Scheme implementations introduce the idea of records (and have normalised the interfaces in various SRFIs). It is very much like the suggestions been given previously for creating classes in that a structure has a type, defining its name and fields (and possibly parent!) and then instances of that structure can be created and passed around.

Accessor functions for the fields are created when the record type is created giving you same style of names as above: commonly name-field1 (omitting the -ref) and set-name-field1! and the like.

Object Orientation

Object-oriented programming is often bound tightly to the concept of message passing. In the original Klingon Smalltalk:

object <- message arguments

which you might imagine, in a Scheme-ly way, to look like:

(object message arguments)

which would be perfectly fine so long as we allow an object in functional position – remember that the first element in a list is going to be applied to the remaining elements in the list. So you might think that should be:

(message object arguments)

although we’ve only just said we can call k(37) where k is a continuation not a “real” function. Quite what is allowed to be in functional position in a Scheme form is becoming a little greyer.

EPLAiP ([Hai10]) p.135, implements the former (in Perl) in a straightforward manner. If the evaluated value in functional position is an object then the first argument is assumed to be an object-specific function, a method, and there’s a mechanism for looking up a method within the hierarchy of the class. The method is then applied to the arguments with the object itself bound to the variable this (cf. self) for the duration of the method.

On the whole, though, that’s not very Scheme-ly which prefers functions in functional position.

There’s a second argument in favour of not using the message passing idiom in that if there are multiple objects within the arguments then only the first argument can be used to distinguish the call. For example, if we wanted to add two numbers together where we might have a class hierarchy of Integer is a Real is a Number, then:

(add Number1 Number2)

we only get to make decisions about which actual function to dispatch to based on Number1:

(case (class-of Number1)
  ((Integer) (add-integer Number1 Number2))
  ((Real)    (add-real    Number1 Number2))
  ((Number)  (add-number  Number1 Number2)))

which means the implementation functions, add-integer, add-real and add-number, must look at the class of Number2 to really determine what to do:

(define  (add-integer n1 n2)
  (if (not (eq (class-of n2) 'Integer))
      (add-real n1 n2)
      (make-Integer (+ (Integer-value n1) (Integer-value n2)))))

Using multiple arguments to determine the correct implementation method is called multiple dispatch and for that we need generic functions.

LiSP ([Que94]) p.87 introduces a basic object system will allow us to define a class:

(define-class classname superclass (field+))

which, as a side-effect, creates a flurry of class-oriented functions based on the identifiers used above. A constructor:

(make-classname arguments)

field accessors for an object of that type:

(classname-field1 obj)

field mutators:

(set-classname-field1! obj value)

(note the trailing !) and a predicate:

(classname? obj)

Generic Functions

Again, from LiSP ([Que94]) p.88

Generic functions is a mechanism for declaring the preferred behaviour based on the class of (at least) one of the arguments (although, single dispatch is the norm, multiple dispatch is possible albeit with exponentially increased complexity which is why it isn’t common). The point being you can choose which of your arguments should be the one to distinguish behaviour rather than being forced to choose the first.

Queinnec’s generic function is introduced with:

(define-generic (name arguments)
  body)

Where body will commonly be a call to an error function to catch instances where the programmer has forgotten to define a class-specific method.

One of the arguments must be distinguished (not necessarily the first argument!). That is done by making it a list (of itself):

(define-generic (foo arg1 (arg2) arg3)
  (error "bad args"))

Here, we’ve made the second argument, arg2, the distinguishing one.

Remember, define-generic is a syntax transformer so its arguments are passed to it direct. There is no danger of (arg2) being evaluated. Not by the Scheme engine, anyway.

A generic method (ie. a class-specific method) is introduced with:

(define-method (name arguments)
  body)

notably, syntactically identical to the define-generic above

Here the distinguished argument must again be identified as a list but this time of itself and the name of the class this method is specific to. Our foo example might look like:

(define-method (foo arg1 (arg2 <X>) arg3)
  ...)

where this foo method is specific to calls where arg2 is an object of class <X>.

Our number example looks like:

(define-generic (add (n1) n2)
  (wrong "no method defined for" n1 n2))

(define-method (add (n1 Integer) n2)
  (add-integer n1 n2))

(define-method (add (n1 Real) n2)
  (add-real n1 n2))

CLOS

Queinnec doesn’t need to head down the road of multiple-dispatch (where more than one argument has a class distinguisher) for his purposes.

The Common Lisp Object System (CLOS) is multiple-dispatch and has a very complicated Meta Object Protocol (MOP) which is far too overbearing for our needs.

Subsequently, Gregor Kiczales developed Tiny-CLOS (Tiny-CLOS) at Xerox Parc in 1992 which is a much simpler but still multiple-dispatch with a simple MOP variation on CLOS. Tiny-CLOS has been re-implemented in many Schemes as well as other languages.

Type Inference

That said, whilst Scheme won’t help you, there is opportunity for compiler writers to introduce type inference where in principle, if only the basic functions have any type information about them stored, everything else can be derived:

(define (strlen s)
  (string-length s))
(define i 3)
(strlen i)

A type inferencing compiler, knowing only that string-length takes a string as an argument would deduce therefore that s, the first argument of strlen, should be a string, that i is a number therefore the function call (strlen i) is a type error.

Last built at 2024-09-07T06:11:30Z+0000 from 463152b (dev)