Error Handling

Basic traps

I’ve noted before that the error handling is a spin on Queinnec’s [Que94]’s monitor extended to take one or more conditions and called trap.

Ostensibly, as the code runs it picks up a series of traps, each interested in their own set of conditions. Each trap has a link to its parent handler with the bottom-most trap linking to itself. (Which we assume is coded well enough to cope with any error!)

digraph lower { node [ shape=box; fontsize=12; ] t4 [label="trap *this*"]; t3 [label="trap *that*"]; t2 [label="trap *next*"]; t1 [label="trap *...*"]; t0 [label="trap *reset*"]; t4 -> t3; t3 -> t2; t2 -> t1; t1 -> t0; t0 -> t0; }

When a condition arises, we can trot through the set of established traps and look to run the handler associated with the first one prepared to handle this kind of condition and then we play a little trick. In case this handler itself fails (or decides that, in fact, it can’t handle the condition and wants to pass the buck) we run the handler in the context of its own parent handler.

To make that work we copy the parent handler and stick it on the top of the stack. So, suppose a *that* condition occurred. We would skip *this* (assuming it wasn’t an ancestor of *that*!), find the trap for *that* and put a copy of *that*’s parent, *next*, on the top of the stack:

digraph lower { node [ shape=box; fontsize=12; ] t5 [label="trap *next* (copy)"]; t4 [label="trap *this*"]; t3 [label="trap *that* (handler being run)"]; t2 [label="trap *next*"]; t1 [label="trap *...*"]; t0 [label="trap *reset*"]; t5 -> t1; t4 -> t3; t3 -> t2; t2 -> t1; t1 -> t0; t0 -> t0; { edge[style=invis]; t5 -> t4; } }

Now, should this handler raise a condition itself (by design or foul play) then the standard mechanism for finding a handler will start at the top of the stack and find… this trap’s parent, which as we copied it, points at its own parent skipping all the unwanted (and doomed to fail?) traps set up in the meanwhile.

In Extremis

Sadly, it’s not traps all the way down. Almost but not quite. Something needs to do some serious error handling when all else fails.

The last (first?) two traps (automatically installed) are less inclined to play nicely and will attempt to reset the system. That doesn’t mean they will always succeed, mind!

In the first instance, the restart condition handler will invoke a continuation quietly wrapped around every top level form. Nominally called ABORT in the code generator and VM, its continuation is the next top level form.

This is prone to issues with (global) state where we’ve crashed out of some complicated processing leaving (global) variables in an unhelpful state.

In the second instance, the reset condition handler should be trying to restart the VM in a safe manner, hence the name, except that I’m not particularly clear on what safe state is. Arguably, there are only two known safe states, the start of the program and the end. And given that we ran through the program and got into a mess that only leaves us with the end of the program as a safe state, hence the reset condition handler invokes the “exit” continuation.

To be fair it could actually just call exit(3) but I want it to stagger onto attempting to shutting the program down (cleanly). Part of that will be to drop out some VM state (if requested) which is useful for debugging.

That said, the internal state is quite often messed up – not helped by my debugging trying to print out values that have been garbage collected – and so it quite often fails. More work required.

*

Neither of these fallback trap handlers satisfy one of my own favourite shell settings, exit on error, though. As Idio matures, exit on error will migrate towards the fore!

Implementation

Pushing extra stuff on the stack as we process traps needs some careful management. We can’t just leave it there for the next guy along to trip over.

First of all, what do we know has happened? Well, we were called from somewhere so we need to remember where to go back to, some PC. We ought to save the current machine state (although as we are notionally running in place of the code raising the condition then it’s a bit moot. Plus, of course, we’re pushing another trap on the stack.

All things being well, then, the “repair” should be to pop the trap off the stack, restore the machine state and return to said PC. What’s more, we’ll be doing that for every invocation of a trap handler.

Well, the same thing being run every time has the whiff of a snippet of code in the Prologue and so that’s what we’ll have. We’ll call it “condition handler return” or CHR.

Finally, we need to convince the trap handler code to run through our CHR snippet. Of course, all bits of handler code will call RETURN to whatever is on the top of the stack. So the obvious thing is for us to push the PC of the CHR code on the stack.

So, for the following bit of code:

trap ^rt-divide-by-zero-error (function (c) {
                                 ;; {HERE}
                                 3
}) {
  display* "1: 1 / 0 =" (1 / 0)
}

then at the point {HERE} we’ll have pushed the next trap onto the stack (in case we mess up) as well as the PC for where we were called from:

digraph lower { graph [ rankdir="LR" ]; node [ shape=box; fontsize=12; ] body [ shape="record" label="(body)|<f0> ...|<f1>1 / 0|<f2>..." ]; handler [ shape="record" label="(handler)|<f0> ;; HERE|<f1>3|<f2>RETURN" ]; stack [ shape="record" label="(stack)|<f0>*CHR* PC|<f1>TRAP (next is ?)|<f2>STATE|<f3>*body* PC|..." ]; CHR [ shape="record" label="(prologue)|...|<f0> POP-TRAP|<f1>RESTORE-STATE|<f2>RETURN|..." ]; body:f1 -> handler:f0 handler:f2 -> stack:f0 [style="dashed"] stack:f0 -> CHR:f0 CHR:f2 -> stack:f3 [style="dashed"] stack:f3 -> body:f2 }

(Obviously, the handler doesn’t call anything on the stack, I’m just trying to represent the linkages between bits of code and the PC they jump to.)

The trap information on the stack has:

  • the condition the trap will handle

  • the trap handler function

  • the (stack position) of the next handler

Hence, in this case, we don’t know what the next handler is as it’s not in this snippet. Quite possibly one of the default handlers. The point being, though, that whatever this next trap is, it has been pushed onto the stack in readiness for this handler going awry.

Operational Behaviour

The expression 1 / 0 will get a little way into some piece of code before something realises that the denominator is zero. That code will raise an ^rt-divide-by-zero-error condition and call idio_raise_condition() in src/vm.c.

That code will rummage about on the stack looking down the list of traps on the stack looking for one which will handle ^rt-divide-by-zero-error or one of its ancestors.

Having found such a trap it will then push onto the stack:

  • the current PC – ie. where to return to

  • the program state

  • the trap details for this (just found) trap’s parent so that if this handler fails and the code circles around, idio_raise_condition() will invoke its parent (and not itself).

  • the PC for CHR in the prologue

We then run the handler.

In this case the handler sets val to 3 and ends, ie. calls RETURN.

RETURN finds the PC for CHR on the top of the stack which will pop a trap off the stack (our handler’s parent), restore the program state and call RETURN.

This RETURN (in CHR) will find the PC for the original body code and the invocation of 1 / 0 will return with the value… 3.

OK, OK, possibly not the best way to handle a divide by zero error!

Double-stacking

What if my handler reverts to another handler? Whelp, steady on.

trap ^idio-error (function (c) {
                    ;; {HERE}
                    3
}) {
  trap ^rt-divide-by-zero-error (function (c) {
                                   raise c
  }) {
    display* "1: 1 / 0 =" (1 / 0)
  }
}

Now we’ll have had two (protecting?) traps pushed onto the stack. In invoking the inner ^rt-divide-by-zero-error handler, idio_raise_condition() will have pushed the next trap handler onto the stack (irrespective of whether it handles ^rt-divide-by-zero-error because we don’t know what the handler is going to do!).

As we can see, that handler passes the buck and idio_raise_condition() finds that, as it happens, the next trap handler does handle ^rt-divide-by-zero-error (as ^idio-error is an ancestor of all Idio errors) and so it has to push the parent of that handler onto the stack. Again, like before, we don’t know who that parent is.

That gives us the slightly messy graph:

digraph lower { graph [ rankdir="LR" ]; node [ shape=box; fontsize=12; ] body [ shape="record" label="(body)|<f0> ...|<f1>1 / 0|<f2>..." ]; handler_c1 [ shape="record" label="(handler c1)|<f0> raise condition|<f21>RETURN" ]; handler_c2 [ shape="record" label="(handler c2)|<f0> ;; HERE|<f1>3|<f2>RETURN" ]; stack [ shape="record" label="(stack)|<f0>*CHR* PC|<f1>TRAP (next is ?)|<f2>STATE|<f3>*handler_c1* PC|<f4> *CHR* PC|<f5>TRAP (next is ^idio-error)|<f6>STATE|<f7>*body* PC|..." ]; CHR [ shape="record" label="(prologue)|...|<f0> POP-TRAP|<f1>RESTORE-STATE|<f2>RETURN|..." ]; body:f1 -> handler_c1:f0 handler_c1:f0 -> handler_c2:f0 handler_c2:f2 -> stack:f0 [style="dashed"] stack:f0 -> CHR:f0 CHR:f2 -> stack:f3 [style="dashed"] stack:f3 -> handler_c1:f21 handler_c1:f21 -> stack:f4 [color="blue";style="dashed"] stack:f4 -> CHR:f0 [color="blue"] CHR:f2 -> stack:f7 [color="blue";style="dashed"] stack:f7 -> body:f2 [color="blue"] }

Where, ultimately, we bounce through the two handlers returning 3.

Note

It’s important to note here that raise returned with a value.

An awful lot of the time, errors are unexpected and the engine will eventually engage the default handlers which will try to cope.

Indeed, we might go further and only be intending to use raise when we can’t cope and want the main engine to bail out for us. In this case we’re not expecting raise to return. But as you can see it is entirely possible for it to return which may very well come as “a surprise” to some.

The same problem exists in the underlying C code.

reraise

Handling conditions like that works a treat although there’s a corner case (isn’t there always?) where you might want to deliberately throw up a different condition in response to a first and allow all of the user-created traps to kick in.

My go-to example (seeing as I had to implement it for this) is handling child processes exiting. We, being the parent, will get an operating system signal, SIGCHLD. We don’t want the user to be handling SIGCHLD (as we’re the shell and it becomes very messy, very quickly) but we do want to alert them if the process failed that their (child) process has failed – if it exits with 0 then all is well and we don’t need to say anything. So, when a child process fails we want to raise a “command status error” condition.

Now, the problem lies in that the (default) job control SIGCHLD handler is way down the list of traps. In fact it’s not much above *reset* in the graphs, at the top of the page, which means we’ve bypassed all of the user-created traps which are the ones looking to handle ^rt-command-status-error. Oops!

So, rather than raise the “rcse” condition, we reraise it which, rather than look for the first trap in the stack, looks for the trap with the highest “next” pointer. Given that all traps have a next pointer pointing further down the stack then the one with the highest next pointer must be at the top of the (original) trap tree.

Advanced traps

In the previous chapter we went to a lot of trouble (possibly far too much) to investigate and handle conditions and then delimited conditions and dance around the problem of unwinding. With traps we can get into a similar position.

trap-return

Most people who have used exceptions are used to the try/except form where the extant state of the machine is trancated when the handler kicks in:

try:
    something_risky()
except ThisException:
    ...
except ThatException:
    ...

where we aren’t handling the exception in any way, we are simply acknowledging that the exception occurred, somewhere, and maybe we’ll get a stringified clue in the exception object.

Which is perfectly fine! Let’s make that clear. The code “dun goofed” and we’re still running, able to acknowledge the exception and move on.

It is a paradigm for handling exceptions, it’s one people are used to. Can we do the same? Just to re-iterate, we want trap-return to continue processing at the continuation of the trap.

In the first instance, a trap’s handler can only do one of two things:

  1. return a value

  2. raise a condition (quite often the one we were run to handle!)

Neither of those seem to give us much leeway to truncate the state of the machine.

Of course, the whole of the last chapter was about (slightly bonkers) ways to jump around in the code. We should be able to do something there.

We can, of course, but it requires a bit of tinkering. The obvious and main thrust is going to be to rewrite trap to use a delimited continuation with prompt-at.

But wait! trap (at the time of writing) was a special form and we saw how those are not amenable to…er, anything.

OK, job #1, rename the special form %trap and write a trap template that is a simple wrapper around %trap. Done!

Next job, make the simple wrapper use prompt-at and a standard trap-return prompt tag. With that, introduce a new expression, trap-return which can use that prompt-tag. OK:

lib/delim-control.idio
trap-return-tag := make-prompt-tag 'trap-return
(define-syntax trap-return
  (syntax-rules ()
    ((trap-return)   (trap-return (void)))
    ((trap-return v) (control-at trap-return-tag k v))))

(define-syntax trap
  (syntax-rules ()
   ((trap conds handler body ...)  {
     prompt-at trap-return-tag {
       %trap conds handler {
         (begin body ...)
       }
     }
   })))

and now we can write:

trap ^rt-divide-by-zero-error (function (c) {
                                 display "trapping ^rt-divide-by-zero-error"
                                 trap-return 3
}) {
  display* "1: 1 / 0 =" (1 / 0)
}
display* "done"

which gets us:

trapping ^rt-divide-by-zero-error
done

which seems promising enough until we introduce our old friend…

unwind-protect

*shakes fist*

Now, this is a bit subtle (and took me a while to spot) so I’ve added a few more helpful prints.

trap ^rt-divide-by-zero-error (function (c) {
                                 display* "trapping ^rt-divide-by-zero-error"
                                 trap-return 3
}) {
  unwind-protect {
    display* "1: 1 / 0 =" (1 / 0)
    display* "not run"
  } {
    display* "cleaning up!"
  }
  display* "post unwind"
}
display* "done"

from which we get:

trapping ^rt-divide-by-zero-error
cleaning up!
post unwind
done

which is mostly what we’d expect apart from the post unwind bit. What is our nominal trap-return doing continuing to run the body of the trap?

Here’s the problem with dynamic-wind: ultimately, when it is run, it captures its own continuation (here, the post unwind expression and, essentially, the rest of the trap’s body) and then mixes it up with the cleanup code thunk.

When we invoke trap-return which, as a thin shim around control-at, will start unwinding holes. The holes (and their continuations) are for the cleanup code and unwind-protect’s continuation. This means we’ll have the cleanup code run which will automatically (and out of our control) run the continuation of unwind-protect.

Bah! That’s a bit annoying.

We’ve a couple of choices. We could simply throw all the intermediary holes between us and our trap-return prompt-tag away. Which is an option although it seems a bit…unfriendly.

Otherwise we could try to see if we can pick apart these holes and just run the cleanup clauses.

Hmm, in the first instance we can’t actually tell them apart as they’re all (default tagged) reset/shift tuples.

What we really want to do is capture the continuation that is the cleanup clause in order that we can find it and call it. For that we need to revisit dyn-wind-yield.

dynamic-wind III

Here, we’re creating an unwind-protect prompt-tag and jiggling dyn-wind-yield around a bit so that the after-thunk can be invoked in a separate clause:

unwind-protect-tag := make-prompt-tag 'unwind-protect

define (dyn-wind-yield before-thunk thunk after-thunk) {
  dwy-loop :+ function (th) {
                prompt-at unwind-protect-tag {
                  (function (res) {
                     (after-thunk)
                     res
                  }) (control-at unwind-protect-tag dwy-k {
                    (before-thunk)
                    res := (th)
                    dwy-k res
                    (try-yield res
                               (r r)                         ; return the result
                               (v k {
                                 reenter := yield v
                                 dwy-loop (function () {
                                             k reenter
                                 })
                               }))
                  })
                }
  }

  dwy-loop (function () {
              reset (thunk)
  })
}

dynamic-wind := dyn-wind-yield

There’s a couple of things hiding in there:

  • in order that we can do more than the most trivial operation in the unwind clause we need to do the closed function trick we mentioned previously with the control-at clause being the argument to it

  • in place of invoking after-thunk in control-at’s body we call the continuation, here, dwy-k, to do it for us.

    The argument we’re passing is the result of (th) (ie. the original (thunk)) thus allowing the unwinding clause to return it (or something else, if called directly).

We’re not done yet, though. Every unwind-protect expression is going to drop two holes onto the list: a default-reset tagged hole and an unwind-protect tagged hole.

So we call the continuation in the unwind-protect tagged hole, right? No, where did you get that idea from?

We have to remind ourselves of the holes and continuations being created. reset will create a hole, tagged with default-reset and with a continuation of [outer-k], that of reset itself.

reset then runs (thunk), its own body.

In the case of our improved dyn-wind-yield, above, that body has a prompt-at unwind-protect with a body of a closed function expression with an argument that is the control-at unwind-protect expression. So the hole that is being created there is tagged with unwind-protect and whose continuation is the closed function expression.

But wait. That continuation (pointing at the closed function expression) is not the substance of closed function expression but merely a future intention. The substance of the closed function expression is still embedded in that first hole, tagged default-reset with a continuation of after the reset, itself embedded in a thunk.

Where does that leave us? Confused for a start, if you’re me! The upshot is, though, that our holes doing things are (arguably) mis-labelled. The one tagged unwind-protect contains the code for the before-thunk and th/thunk and the hole, rather generically, tagged default-reset has the code for running the after-thunk.

Knowing that we can identify the holes that do the cleanup code we can now write an alternate unwinding procedure and change trap-return to use it:

define (unwind-to* tag v) {
  holes-prefix := unwind-till-tagged! tag

  loop :+ function (h*) {
            if (null? h*) #n {
              if (and (hole-tagged? (ph h*) unwind-protect-tag)
                      (pair? (pt h*))) {
                        vm/hole-push! (pht h*)
                        loop (ptt h*)
              } {
                loop (pt h*)
              }
            }
  }

  loop (reverse holes-prefix)

  abort-top! v
}

(define-syntax trap-return
  (syntax-rules ()
    ((trap-return)   (trap-return (void)))
    ((trap-return v) (unwind-to* trap-return-tag v))))

So unwind-to* is a bit like the control-shift-at* code in that we get all the holes leading up to the one we’re interested in, trap-return. We now have a shufti at those holes.

With the holes reversed, when we’ve had an unwind-protect expression we’ll get one tagged unwind-protect then one tagged default-reset where the default-reset one is the code we want to run.

So, let’s put those default-reset-after-unwind-protect holes back on the list and let the normal unwinding behaviour kick in.

Last built at 2024-09-16T06:11:12Z+0000 from 463152b (dev)