Evaluating

The evaluator is how we find meaning in the source code.

Knowing that the reader has supplied us with lists of lists, it shouldn’t come as a huge surprise that the evaluator is quite recursive in its nature, but it is relatively straightforward all the same.

There are plenty of complications, of course. For example, there are several situations where the source code is normalised by re-writing it.

This normalisation is a form of implicit syntax transformation (as in, the Idio language makes the transformation according to its built-in rules, which we’re about to discuss). The use of syntax expanders (via templates aka macros) allows users to make explicit syntax transformations.

Context

We’re not deviating a great deal from the technique outlined in LiSP ([Que94]), so our search for meaning is going to involve a few basic repeating variables:

  • e – the expression we’re currently evaluating (not a huge surprise)

    As we recursively evaluate the elements of a list, say, then the expression to be evaluated will become the, say, head of the list. When the evaluation recursion unwinds, the expression “to hand” will revert as expected.

    We’ll likely have eh and et as the head and tail of a pair/list and further derivatives.

  • src – an Idio addition is to maintain the original source expression in order that we can pass on any source code properties (namely the lexical object defined by the reader) to any derived expression we might generate

  • nametree – a “name tree”

    As we walk through the lists of lists and determine that new lexical variables are introduced then they push in front of previous lexical variables giving us a hierarchy of known names. This name tree is then available for us to check down when a variable is referenced.

    As the lists of lists recursion unwinds then the nametree unwinds with it.

    It is a list of lists where the inner lists are of the variables introduced by a given variable-introducing statement.

    A name tree is slightly more obvious in Scheme where multiple variables can be introduced in a single let (or variant) statement but the effect is still true in Idio where an assignment operator introduces variables one let* at a time.

    {
      a := 1
    
      ;; nametree ~ ((a))
    
      ;; => a is first list, first slot ~> SHALLOW-ARGUMENT-REF0
    
      b := 2
    
      ;; nametree ~ ((b) (a))
    
      ;; => a is second list, first slot ~> DEEP-ARGUMENT-REF 1 0
    
    }
    

    Hmm, not the most clear example but our list of names in the name tree has:

    • after the first variable assignment a variable a in scope

    • after the second variable assignment it then has a b in scope and then the a, now a level out

    The reason this is important goes back to the LiSP mechanism for accessing lexical variables through a linked list of frames. The opcodes look in the nearest frame, SHALLOW-ARGUMENT-REFi, or some number of frames further back, DEEP-ARGUMENT-REF d i (for some depth d and index within the frame i).

    The nested frame mechanism is required because when we call a closure we’ll create a frame here for the arguments to go into then invoke the closure. The first thing the closure mechanism does is reset the frame hierarchy to that which the closure had when it was created. The frame we just created is linked into that “historic” frame hierarchy and the closure runs.

    From the closure’s perspective, it sees the arguments to itself in front of the original set of lexical variables when the closure was created.

    let is still legal syntax so we can make it a bit more obvious:

    {
      let ((a 1)
           (b 2)
           (c 3)) {
    
      ;; nametree ~ ((a b c))
    
      ;; => c is first list, third slot ~> SHALLOW-ARGUMENT-REF2
    
      x := 2
    
      ;; nametree ~ ((x) (a b c))
    
      ;; => c is second list, third slot ~> DEEP-ARGUMENT-REF 1 2
    
      }
    }
    

    Here, now, after the x assignment, we have an x in scope and then all three of a, b and c are known names another level out. All three were created with the same variable-introducing statement, let.

  • flags – we’ll need some flags to indicate whether:

    • the expression is in tail position

      This is very important – and surprisingly easy to maintain – to give us the power of tail call optimisation.

    • an Idio addition is the nature of variables being created

      Here, we’re looking at whether the variable is being created:

      • lexically, because we found it in the name tree

      • at top-level, because we couldn’t find it in the lexical name tree

      • in a dynamic or environmental or computed context – which is effectively top-level but managed in a different way

  • cs – a set of known constants

    Nominally, this can be used as a “known top-level names” list (amongst other things) but in Idio it is used to map a constant of any kind (symbol, list, array, etc.) into a unique integer for embedding in the byte code.

  • cm – an Idio addition is the current module

    As the source code switches between modules the expectation is that the evaluator can find the correct variable (ie. my v not the other guy’s v) and to effect that we need to track any changes to the sense of the current module by latching onto any module changing statements in the source code.

All of which are C lexical variables used throughout src/evaluate.c (and Idio lexical variables in the Idio variant lib/evaluate.idio, the metacircular evaluator).

In effect, all of the above become formal parameters to almost every function in the evaluator.
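The name tree lookup described above can be sketched in a few lines of C. This is a minimal sketch, not the code in src/evaluate.c: the name_frame_t type and nametree_lookup() are hypothetical, and plain C strings stand in for Idio symbols. It walks the frames innermost first and reports the depth d and slot i that a SHALLOW-ARGUMENT-REFi or DEEP-ARGUMENT-REF d i would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one inner list of the name tree. */
typedef struct {
    const char **names;  /* names introduced by one statement */
    size_t count;
} name_frame_t;

/*
 * Search the name tree, innermost frame first.  On success return 1
 * and report the depth d and slot i; return 0 if the name is not
 * lexical (the evaluator would then try the top level).
 */
int nametree_lookup(const name_frame_t *tree, size_t depth_count,
                    const char *name, size_t *d, size_t *i)
{
    assert(name != NULL && d != NULL && i != NULL);

    for (size_t fd = 0; fd < depth_count; fd++) {
        for (size_t fi = 0; fi < tree[fd].count; fi++) {
            if (strcmp(tree[fd].names[fi], name) == 0) {
                *d = fd;
                *i = fi;
                return 1;
            }
        }
    }
    return 0;
}
```

With a tree equivalent to ((x) (a b c)), looking up c reports depth 1 and slot 2, which corresponds to the DEEP-ARGUMENT-REF 1 2 of the let example.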

In case anyone is still reading, the s in cs for the constants is for “star”, as in a more Lispy or EBNF-y c*, meaning zero or more.

There is also use of the likes of ep with the p for “plus” as the C equivalent of e+ meaning one or more.
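The run-time half of the name tree story, the linked list of frames, might look something like the following. Again this is only a sketch under stated assumptions: frame_t, shallow_argument_ref() and deep_argument_ref() are hypothetical names, the frame has a small fixed size and the values are plain ints rather than IDIO values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame: a few argument slots plus a link outwards. */
typedef struct frame {
    struct frame *next;  /* the enclosing (older) frame */
    size_t nargs;
    int args[4];         /* ints stand in for real values */
} frame_t;

/* Slot i of the nearest frame. */
int shallow_argument_ref(const frame_t *f, size_t i)
{
    assert(f != NULL && i < f->nargs);
    return f->args[i];
}

/* Walk d frames outwards, then take slot i of that frame. */
int deep_argument_ref(const frame_t *f, size_t d, size_t i)
{
    for (; d > 0; d--) {
        assert(f != NULL);
        f = f->next;
    }
    return shallow_argument_ref(f, i);
}
```

Calling a closure then amounts to building a new frame_t for the arguments and pointing its next at the frame the closure captured when it was created.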

evaluate

Kicking it all off is idio_evaluate() which looks like this:

IDIO idio_evaluate (IDIO src, IDIO cs)
{
    ...

    IDIO m = idio_meaning (src,
                           src,
                           idio_S_nil,       /* name tree */
                           IDIO_MEANING_FLAG_NONE,
                           cs,
                           idio_thread_current_module ());

    ...

    return IDIO_LIST2 (..., m);
}

It’ll take some source code, src, and a list of known constants, cs. The source code bit is obvious and most invocations will pass the virtual machine’s constants array for constants.

Fundamental Meaning

As noted elsewhere, we rely on the evaluator distinguishing between special forms and templates, with anything left over being a derived form or a constant.

It is hugely tempting to add to the list of special forms. Of course, the magic works, but it becomes a bind that is hard to extract yourself from.

However, even Scheme has a minimal set of special forms to let you bootstrap everything else:

  • define and set! allow you to bind a name to a value and to change that binding

  • quote prevents the evaluator evaluating an expression

  • if provides the conditional consequent / alternative without the evaluator evaluating either

  • lambda (or function in Idio) lets you define abstractions which you can subsequently invoke – these are the derived forms

  • define-macro (or define-template in Idio) lets you define your own “special form” – special in that the arguments are not evaluated – albeit all you can do is return more code for the evaluator to evaluate.

There are other special forms which have a genuine need to be handled specially – think of anything that needs to manipulate internal C state – and, of course, some that have snuck in because it is convenient etc.

So, the premise of the main evaluation loop is simply to look at the expression to hand and determine if it is special, a template or otherwise treat it as a derived form or constant.

Lispy languages always have the functional part in the first position of a list so, if the expression is a list we simply need to look at what the first element is.

idio_meaning() in src/evaluate.c (a debatably poor name as it is the evaluator but almost everything is called idio_meaning_something!) has a big test:

IDIO idio_meaning (IDIO src, IDIO e, IDIO nametree, int flags, IDIO cs, IDIO cm)
{

    if (idio_isa_pair (e)) {
        IDIO eh = IDIO_PAIR_H (e);
        IDIO et = IDIO_PAIR_T (e);

        /* e is (eh ...) */

        if (...) {
            ...
        } else if (idio_S_quote == eh) {
            ...
        } else if (idio_S_function == eh) {
            ...
        } else if (idio_S_if == eh) {
            ...
        } else if (idio_S_set == eh) {
            ...
        } else if (idio_S_define_template == eh) {
            ...
        } else if (idio_S_define == eh) {
            ...
        } else {
            /* could be a template */
            if (idio_isa_symbol (eh)) {
                if (idio_expanderp (eh)) {
                    return idio_meaning_expander (e, e, nametree, flags, cs, cm);
                }
            }

            /* default is a function call */
            return idio_meaning_application (src, eh, et, nametree, flags, cs, cm);
        }
    } else {
        /*
         * do something with:
         *
         * symbols: (de-)reference them
         *
         * constants: quote them -- evaluate (12) -> 12
         */
    }
}

and, without suggesting that that is everything, in fact, that single (large) conditional clause is the guts of a Lisp evaluator.

The idio_meaning() function is physically large because it also embeds the initial syntactic checking.

For example, quote takes a single argument to be quoted, which means that no argument, (quote), or more than one argument, (quote 1 2), must be caught and flagged as errors.

These are slightly obscure and might not happen in practice – as most use of quote is through 'expr where the reader ensures that there is only one expression passed to quote – but we should flag up the error to catch wayward typing.

This testing could be devolved to the specific special form handler, idio_meaning_quotation(), in this case. Yeah, maybe, but I’ve started so I’ll finish.
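The kind of syntactic check being described can be sketched as an argument count over the tail of the expression. This is an illustration only: cell_t, arity_t and check_arity() are hypothetical and the real checks in idio_meaning() are written inline against IDIO pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list cell; strings stand in for expressions. */
typedef struct cell {
    const char *h;
    struct cell *t;
} cell_t;

typedef enum { ARITY_OK, ARITY_TOO_FEW, ARITY_TOO_MANY } arity_t;

/*
 * Check that the argument list et has between min and max elements.
 * We stop counting as soon as we pass max.
 */
arity_t check_arity(const cell_t *et, size_t min, size_t max)
{
    assert(min <= max);

    size_t n = 0;
    for (; et != NULL; et = et->t) {
        if (++n > max) {
            return ARITY_TOO_MANY;
        }
    }
    return n < min ? ARITY_TOO_FEW : ARITY_OK;
}
```

For quote the call would be check_arity (et, 1, 1), so (quote) comes back as too few and (quote 1 2) as too many.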

The Result of the Meaning

Not yet defined is what idio_meaning() is meant to return. What is it meant to do?

Our goal, from inferring some meaning from the lists of lists the reader gave us, will be to head off to the code generator so we probably want something amenable to that.

In our case, we’re going to have the evaluator generate some “intermediate code.” By this we mean to have reduced the source code expressions down to some high level statements of intent with a vague eye on how the virtual machine works. I confess, that’s not a terribly clear description as, for me, it’s a bit hard to describe without showing examples (coming in the next section).

You can imagine, though, in our highfalutin source language we bind variables to values whereas in the grubby world of machine code we’re going to “set” something.

The intermediate language has a group of constants, IDIO_I_some_thing – with the _I_ for intermediate, which, when we’re finished doing whatever we intend to do with intermediate code, will be translated reasonably straightforwardly into our virtual machine’s machine code, another group of constants, IDIO_A_some_thing – with the _A_ for assembler.

Often, though, I’ll refer to SOME-THING arg meaning the corresponding assembly code written in a more Idio-sympathetic way.

The structure of the intermediate code is… you guessed it, a list of lists of lists. The code generator is expecting that, of course, but as it descends the tree of intermediate code statements it will eventually reach the point where it has to emit a stream of byte code, one intermediate instruction at a time.

In that sense the list of lists of lists becomes a depth-first sequence of instructions for the virtual machine.

Specific Meaning

I don’t want to go through all of the special forms but we can look at a few to get the general gist.

quote

idio_meaning() invokes a slightly truncated argument list with:

return idio_meaning_quotation (IDIO_PAIR_H (et),
                               IDIO_PAIR_H (et),
                               nametree,
                               flags);

which, on reflection, could be even shorter still as idio_meaning_quotation() is the straightforward:

static IDIO idio_meaning_quotation (IDIO src, IDIO v, IDIO nametree, int flags)
{
    ...

    return IDIO_LIST2 (IDIO_I_CONSTANT_SYM_REF, v);
}

in other words, only the argument to quote, the head-of-the-tail of the original e, is used.

What we’re doing is returning an intermediate instruction to create a “symbolic reference” to a constant from v.

We haven’t created the constant – the code generator will do that – but that is our intent.

What we imagine, then, is that the code generator will add v to the virtual machine’s array of constants and get back the integer index into the array. The code generator will then encode a corresponding IDIO_A_CONSTANT_SYM_REF and then the integer into the byte code.

When the VM runs it’ll hit the IDIO_A_CONSTANT_SYM_REF instruction which will prompt it to read an integer from the byte code and then set the val register to the element in the constants array (indexed by the integer it just read).

So, slightly indirectly, the current value being processed will be v.

The code generator is much more complicated as it tries to make a few educated guesses about how to speed things up. For example, the integer 1 is used “a bit” so it might make some sense to have a special IDIO_A_CONSTANT_1 opcode that simply deposits 1 in the val register and avoids a lengthy indirection via the constants array.
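The indirection through the constants array can be sketched end to end with a toy VM. Everything here is an assumption made for illustration: vm_t, the opcode numbers, the one-byte index encoding and the use of C strings as constants are all hypothetical and much simpler than the real code generator and VM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { OP_HALT = 0, OP_CONSTANT_REF = 1 };  /* hypothetical opcodes */
#define MAX_CONSTANTS 16

typedef struct {
    const char *constants[MAX_CONSTANTS];
    size_t nconstants;
    const char *val;  /* the val register */
} vm_t;

/* Find v in the constants array, adding it if it is not there yet. */
size_t vm_constant_index(vm_t *vm, const char *v)
{
    for (size_t ci = 0; ci < vm->nconstants; ci++) {
        if (strcmp(vm->constants[ci], v) == 0) {
            return ci;
        }
    }
    assert(vm->nconstants < MAX_CONSTANTS);
    vm->constants[vm->nconstants] = v;
    return vm->nconstants++;
}

/* Code generator side: emit the opcode then the constants index. */
size_t emit_constant_ref(vm_t *vm, unsigned char *code, size_t pc,
                         const char *v)
{
    code[pc++] = OP_CONSTANT_REF;
    code[pc++] = (unsigned char) vm_constant_index(vm, v);
    return pc;
}

/* VM side: read the index back and load val from the array. */
void vm_run(vm_t *vm, const unsigned char *code)
{
    size_t pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_CONSTANT_REF:
            vm->val = vm->constants[code[pc++]];
            break;
        case OP_HALT:
            return;
        default:
            assert(!"unknown opcode");
            return;
        }
    }
}
```

Note that referencing the same constant twice reuses the same index, which is the sense in which the constants array maps a constant to a unique integer.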

if

if is the canonical special form in the sense that it must not have its arguments evaluated before calling the “function” if – there is no function if, of course, its behaviour is encoded in the byte code generated from the special form’s behaviour.

The other Schemely aspect to if is that everything is “true” except #f.

As a side-effect, that means that #t’s existence is very nearly pointless as any value other than #f is true. However, people like a solid two values to choose from in a boolean set so we need to keep #t around.

First, of course, there’s a bit of argument checking. if takes two or three arguments: (if condition consequent alternate) and a variant for when there’s no “else” clause, (if condition consequent).

The latter causes us a problem when some wise-guy rumbles: (if #f consequent). Um, if must return a value – everything returns a value – yet there is no alternate clause… what gives? The Scheme answer appears to be: “void”. A special value suggesting “no computed answer.” The “void” value has no printed representation – well, it’ll come out as #<void> which the reader will reject – although you can test for it with the primitive predicate void?.

For the most part, you suspect it is used in situations where the result from the if clause is thrown away anyway. In the meanwhile, we have a shoo-in value for a non-existent alternate clause, idio_S_void – another magic constant-symbol.

idio_meaning() invokes:

return idio_meaning_alternative (src,
                                 IDIO_PAIR_H (et),   /* condition */
                                 IDIO_PAIR_H (ett),  /* consequent */
                                 ehttt,      /* alternate -- could be <void> */
                                 nametree,
                                 flags,
                                 cs,
                                 cm);

In other words the full set of lexical state. This is because any of condition, consequent or alternate could be of arbitrary complexity.

idio_meaning_alternative() is the surprisingly concise:

static IDIO idio_meaning_alternative (IDIO src, IDIO e1, IDIO e2, IDIO e3, ...)
{
    ...

    IDIO m1 = idio_meaning (e1, e1, nametree, IDIO_MEANING_NOT_TAILP (flags), cs, cm);
    IDIO m2 = idio_meaning (e2, e2, nametree, flags, cs, cm);
    IDIO m3 = idio_meaning (e3, e3, nametree, flags, cs, cm);

    return IDIO_LIST4 (IDIO_I_ALTERNATIVE, m1, m2, m3);
}

where we recursively figure out the meanings of the three arguments and return them in a list with the IDIO_I_ALTERNATIVE intermediate code.

So, nothing interesting at all. The code generator for if is quite cunning, mind.

tailp

The only thing that will catch your eye is the use of IDIO_MEANING_NOT_TAILP (flags) which unsets the “in tail position” bit in flags. What’s going on here?

Let’s have a quick think about things in tail position. If your if expression is in the middle of a sequence:

define (foo) {
  this
  if condition consequent alternate
  that
}

then you can assume that whatever is processing the sequence will already have noted that this if is not in tail position, so our unsetting the “tailp” flag is neither here nor there.

What if we are in tail position?

define (foo) {
  this
  that
  if condition consequent alternate
}

We know that one of two possible code sequences will apply: either the evaluation of the condition results in “true” and then we’ll run the code for the consequent:

define (foo) {
  this
  that
  condition
  consequent
}

or the evaluation of the condition results in “false” and then we’ll run the code for the alternate:

define (foo) {
  this
  that
  condition
  alternate
}

In both cases, though, the evaluation of the condition is not the last thing to be run. It is never in tail position hence we can scrub the flag when processing it.

Either of the consequent or alternate could be in tail position so we’ll leave the flag alone.

But notice that we don’t set the flag. We only ever disable it.

How does it ever get set, then? Well, it’s only ever set for the body clause of a function definition. The reason is slightly back-to-front.

The whole reason to have tail call optimisation is to avoid “blowing up the stack” by making too many nested function calls. Every function call tacks a bit more stuff on the stack – we save a bit of state in case the thing we call overwrites it – and that accumulated stuff will, eventually, add up.

If we know that we’re in a function call and the last thing we do in this function call is make a function call to someone else then we can skip any state preservation nonsense because whatever the guy we’re about to call is going to return is what we would be returning ourselves in turn. So this guy might as well return direct to our caller.

The details for returning to our caller are on the stack ready for us to use so instead of the full function invocation palaver we effect a sort of function “goto.” This next guy replaces me and, instead of returning a value to me, will none the wiser be returning the value to my caller.

So, this “tailp” trickery must require that we’re in a function call – otherwise the replacement and expectation about a function return won’t be on the stack – for us to enact it. Hence the “tailp” flag is only set during the evaluation of a function definition.

A function’s body, however, is usually a sequence – as in a block – in which case the “tailp” flag is suppressed for all but the last statement in the sequence.

Thereafter, whenever a function is invoked, when it reaches the last statement in the body, “tailp” would have been enabled during the evaluation of the meaning of that statement and if that statement resulted, ultimately, in a function call at the end, then the function call will be a function “goto.”
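The flag handling described in this section boils down to a little bit-twiddling. The sketch below uses hypothetical names throughout (FLAG_TAILP, the three macros and the helper functions are not the definitions in src/evaluate.c) and only shows who sets the bit, who clears it and who passes it through untouched.

```c
#include <assert.h>
#include <stddef.h>

enum { FLAG_NONE = 0, FLAG_TAILP = 1 << 0 };  /* hypothetical values */

#define SET_TAILP(flags) ((flags) | FLAG_TAILP)
#define NOT_TAILP(flags) ((flags) & ~FLAG_TAILP)
#define IS_TAILP(flags)  (((flags) & FLAG_TAILP) != 0)

/* Only the body of a function definition ever sets the flag. */
int function_body_flags(int flags)
{
    return SET_TAILP(flags);
}

/* Statement k of n in a sequence: only the last keeps the flag. */
int sequence_flags(int flags, size_t k, size_t n)
{
    assert(k < n);
    return (k + 1 == n) ? flags : NOT_TAILP(flags);
}

/* The condition of an if is never the last thing to be run. */
int condition_flags(int flags)
{
    return NOT_TAILP(flags);
}
```

The consequent and alternate of an if need no helper at all: they simply inherit whatever flags the if itself was given.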

define

define introduces a variable at “top level” and then assigns a value to it, or, more properly, binds it to a value.

The English language expression, “assign to”, suggests that the variable might be a container for the value. In practice, most Idio values are allocated on the C heap and the underlying C IDIO values refer/point to the allocated heap memory – unless it’s a constant or fixnum in which case we squeeze it into the upper bits of the IDIO “pointer”.

So, correctly (most of the time), the C IDIO variable refers/points to some splodge of memory and, by extension, the Idio variable is bound to that splodge of memory (value).

If we subsequently “assign” a different value to the variable then in practice we are simply changing the reference in the IDIO entity to point at a different splodge of memory and the Idio variable is now bound to a different value.

The phrase “assigning a value to a variable” is endemic and mostly incorrect. However, it’s what we say.
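As an aside, squeezing a fixnum into the upper bits of a “pointer” can be sketched as below. This is a generic tagged-value illustration, not Idio’s actual layout: the two-bit tag, its value and the names value_t, make_fixnum() and fixnum_value() are all assumptions. It relies on heap pointers being at least 4-byte aligned so that their low two bits are always zero.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t value_t;          /* a pointer-sized tagged value */

#define TAG_MASK   ((intptr_t) 3)  /* assumed: low two bits are the tag */
#define TAG_FIXNUM ((intptr_t) 1)  /* assumed: 01 marks a fixnum */

int is_fixnum(value_t v)
{
    return (v & TAG_MASK) == TAG_FIXNUM;
}

/* Store n in the upper bits; multiply rather than shift so that
 * negative numbers are handled without undefined behaviour. */
value_t make_fixnum(intptr_t n)
{
    assert(n >= INTPTR_MIN / 4 && n <= INTPTR_MAX / 4);
    return n * 4 + TAG_FIXNUM;
}

intptr_t fixnum_value(value_t v)
{
    assert(is_fixnum(v));
    return (v - TAG_FIXNUM) / 4;
}
```

Anything with a zero tag would be treated as a real pointer to heap memory, which is the “most of the time” case described above.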

“Top level” could mean a global table of known names or, as in the case for Idio, a module-specific table of known names.

This “top level” is usually described as the environment during Lisp language processing. Of course “environment” has an alternative meaning to us shell-people so I’m slightly loath to use it. The virtual machine’s register is still env, though, as a throwback to our Scheme-ly origins.

You might ask why we want to define things rather than simply assign to them, auto-creating the name in the top level as we go? Well, I suppose we could (and, indeed, we can) but there’s an air of organisation and clarity if we’re defining things.

In addition, if a variable is defined before it is (otherwise) used – ie. there are no forward lookups of variables – then we don’t have to employ extra checks to ensure a variable was eventually defined and we’ve not just been left hanging in the wind, here.

define itself has a couple of forms it can be used in:

  1. define name expression – for the straightforward assignment/binding of name, a symbol, to some value resulting from the evaluation of expression

  2. define (name formals*) expression – for the definition of a function with the resultant function value assigned to name

    expression will most likely be a block:

    define (name formals*) {
      ...
    }

    This second form is the equivalent of:

    define name (function (formals*) {
      ...
    })

    and this rewrite is exactly what the evaluator does.

    You’ll note the extra parentheses around the function definition which, in the first instance, mean that define isn’t given an arbitrary number of arguments but just two, the name and expression, and secondly give the impression (realised in practice) that like any other argument, say, (+ 1 2), the anonymous function definition is instantiated into a function value and it is the function value that is passed to define.

    We’ll see this rewrite in a second.

I’m as lazy as the next guy so the := operator has been co-opted into use as a synonym for the first form of define: name := expression.

Of course, if it’s the second form, ie. the second argument is a list, and we’re implicitly constructing a function from it then we need to re-tag the newly created function with the source code properties of the original.

idio_meaning() invokes:

idio_meaning_define (src, IDIO_PAIR_H (et), ett, nametree, flags, cs, cm);

where idio_meaning_define() looks like:

static IDIO idio_meaning_define (IDIO src, IDIO name, IDIO e, ...)

Here, name might be a symbol or a list – depending on which form of define was in use.

If name is a list then we know it is (name formals*) so we can extract both name and formals* (the head and tail of the incoming name) to construct a new function, rewriting both name and e in the process:

if (idio_isa_pair (name)) {
    /*
     * (define (func arg) ...) => (define func (function (arg) ...))
     *
     * NB e is already a list
     */
    e = idio_list_append2 (IDIO_LIST2 (idio_S_function,
                                       IDIO_PAIR_T (name)),
                           e);
    name = IDIO_PAIR_H (name);

    idio_meaning_copy_src_properties (src, e);
}

If name wasn’t a list then this is a simple assignment/binding and we can do a quick check on e as that should just be a simple expression.

if (idio_isa_pair (name)) {
    ...
} else {
    if (idio_isa_pair (e)) {
        e = IDIO_PAIR_H (e);
        idio_meaning_copy_src_properties (src, e);
    }
}

This means that define name expr1 expr2 ... is quietly reduced to just define name expr1. Perhaps we should complain more?
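The two rewrites of define can be tried out on a toy list representation. This is a sketch under stated assumptions: obj_t, symbol(), pair() and rewrite_define() are hypothetical, memory is never freed, and the copying of source properties is left out entirely.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object: a symbol if sym is set, otherwise a pair.
 * NULL plays the part of the empty list. */
typedef struct obj {
    const char *sym;
    struct obj *h;
    struct obj *t;
} obj_t;

static obj_t *make(const char *sym, obj_t *h, obj_t *t)
{
    obj_t *o = malloc(sizeof *o);
    assert(o != NULL);
    o->sym = sym;
    o->h = h;
    o->t = t;
    return o;
}

obj_t *symbol(const char *name) { return make(name, NULL, NULL); }
obj_t *pair(obj_t *h, obj_t *t) { return make(NULL, h, t); }

int is_pair(const obj_t *o) { return o != NULL && o->sym == NULL; }

int is_symbol(const obj_t *o, const char *name)
{
    return o != NULL && o->sym != NULL && strcmp(o->sym, name) == 0;
}

/*
 * On entry *name is the first argument to define and *e is the list
 * of remaining arguments.
 *
 * (define (func arg) ...) => name: func, e: (function (arg) ...)
 * (define name expr ...)  => name: name, e: expr
 */
void rewrite_define(obj_t **name, obj_t **e)
{
    if (is_pair(*name)) {
        *e = pair(symbol("function"), pair((*name)->t, *e));
        *name = (*name)->h;
    } else if (is_pair(*e)) {
        *e = (*e)->h;
    }
}
```

In the first case the original body list is reused as the tail of the new function expression, which mirrors the idio_list_append2() call above.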

Next we need to look name up. It might already exist. In fact, it might be a lexical variable! In both of those cases, we’ll simply be reverting to assignment of the existing variable – not creating a new one.

IDIO sk = idio_meaning_variable_kind (src,
                                      nametree,
                                      name,
                                      IDIO_MEANING_TOPLEVEL_SCOPE (flags),
                                      cs,
                                      cm);

/* some top level variable creation hocus-pocus if required */

IDIO_MEANING_TOPLEVEL_SCOPE (flags) is used to indicate what sort of variable should be created if an existing variable is not found (hint: a toplevel variable).

The “hocus-pocus” is important – though the details aren’t as it’s a bit bespoke – in that if the result of the variable lookup does not have a VM variable array index associated with it then we generate one right now. We are defining the variable, it definitely exists.

Almost done. We now have an existing or new (top level) variable in our hands so we can do the real action: the assignment. Assignment, = or the Scheme-ish set!, needs to be handled in its own right anyway, so we can simply jump on the back of it:

return idio_meaning_assignment (src,
                                name,
                                e,
                                nametree,
                                IDIO_MEANING_DEFINE (IDIO_MEANING_TOPLEVEL_SCOPE (flags)),
                                cs,
                                cm);

We pass in a “define” flag with IDIO_MEANING_DEFINE (flags), which prompts the assignment function to add a prefix to what it generates.

We could pull the prefix code the assignment function adds back here but two other places (defining dynamic and environment variables) also do the same. So, put the prefix code in three places or one?

Assignment

Assignment is a lot more interesting. Remember it’s called directly as well as from define.

A quick recap on the various ways we might stumble over the assignment of, in particular, a free variable. If we have previously defined a variable (or are in the act of defining one) then we should have an index into the VM’s variable array to hand, vi, and can perform the assignment directly with a GLOBAL-VAL-SET vi instruction.

On the other hand, if we’re mid-function assigning to a variable we haven’t seen defined yet, ie. a forward reference, then we ought to complain if, come the time of assignment when the code is run, the variable had never been defined. That’s poor form on the part of the coder (bad user!).

This is where it gets a little tricky. We know the variable is used – we’re about to assign to it – but we need to know separately whether the variable was defined. So the variable lookup also returns the extra information – in particular it returns 0 (zero) for the VM variable array index.

Under these circumstances we need to have the VM perform a check, which means a different opcode, GLOBAL-SYM-SET ci, where we are required to pass in an index into the VM’s constants array in order that we can dig out the symbol and perform the necessary lookups (through the module’s top-level and the exports of its imported modules) to find out if it’s been defined yet.

Clearly, this isn’t as lean a process as simply assigning to a known variable. What is worse is that we cannot change the opcodes (it’s been a while since you’ve been able to modify assembler mid-run – think: read-only .TEXT segments – and we should not be bucking any trends here) so this assignment will always have to perform this relatively convoluted lookup to get the variable array index it ultimately needs to do the real assignment.

When I get round to Pre-Compiled Modules which will require a double dereference for pre-compiled byte code brought in “from the cold” then I suspect that all the generated byte code will fall into line – for consistency if nothing else.

The only thing that will lose out are any known direct variable assignments, GLOBAL-VAL-SET vi, which would be replaced with a double dereference.

Unless it’s left in as an option.

Anyway, back to assignment in idio_meaning_assignment().

We’ll skip the bit about Setters (too advanced) and syntax checking (too dull).

We’ll figure out the meaning of the expression passed in:

IDIO m = idio_meaning (e,
                       e,
                       nametree,
                       IDIO_MEANING_NO_DEFINE (IDIO_MEANING_NOT_TAILP (flags)),
                       cs,
                       cm);

which handles two things:

  1. the expression is not being evaluated in tail position

    This is the expression on the right hand side of an assignment. It will be evaluated before the assignment itself and therefore cannot be in tail position.

  2. we turn off the “define” flag (if it was turned on)

We’ll then look up what kind of a variable name is. If the variable didn’t exist previously then it will now, as a top level one, except it’ll have no value index associated with it.

The kind of variable is now important as it affects the code we want generated:

  • if it is a lexical variable then we can generate SHALLOW-ARGUMENT-SETi or DEEP-ARGUMENT-SET d i code as appropriate where the variable lookup will have informed us of the relevant values for d and i (and it’s a “shallow” reference if d is zero)

    These can be returned immediately as there’s nothing more to do.

  • if it is a top-level variable then:

    • if we haven’t seen a definition yet then we can generate a GLOBAL-SYM-SET ci assignment

      assign = IDIO_LIST3 (IDIO_I_GLOBAL_SYM_SET, fmci, m);
      
    • otherwise we can generate a GLOBAL-VAL-SET vi assignment

      assign = IDIO_LIST3 (IDIO_I_GLOBAL_VAL_SET, fgvi, m);
      
  • if it is a dynamic or environment variable we generate a GLOBAL-SYM-SET ci assignment

  • if it is a computed variable we generate a COMPUTED-SYM-SET ci with or without a definition tag and return immediately

  • if it is a predefined variable – ie. a primitive – then there’s a bit of a dance regarding templates which might get run between now (when we’ve just created a new toplevel variable overriding the predefined variable) and when the byte code is run to (re-)define this new toplevel variable.

    So we temporarily set the new toplevel to the old predefined value.

    In this sense, there is a general assumption that if you intend to redefine map, say, then your intention is to create a new function to iterate over lists, applying a function and collecting a result and not, say, go off on some cartographic odyssey.

    Maintaining the old functionality until the new functionality is defined seems sensible enough.

Finally, then, we can return either the assignment or the assignment with a “define” prefix:

if (IDIO_MEANING_IS_DEFINE (flags)) {
    return IDIO_LIST2 (IDIO_LIST4 (IDIO_I_GLOBAL_SYM_DEF, name, kind, fmci),
                       assign);
} else {
    return assign;
}

sequence

For a sequence of statements it is quite important to squash the tailp flag for all but the final statement.

Otherwise the three sequence functions, begin, and and or, only really differ by:

  1. their default value if they are not passed any arguments:

    • (begin) is “void” (see if)

    • (and) is #t

    • (or) is #f

  2. how they decide to stop processing the sequence and what value to return

    • begin – stop when it gets to the end of the sequence and return the value of the last expression

    • and – stop if any value is #f and return the last value computed

    • or – stop when any value is not #f and return the last value computed

    Remember, these are the sequence functions not the and and or operators.

They are processed identically, though, at this stage.

Assuming they do have some arguments, idio_meaning() calls:

return idio_meaning_sequence (et, et, nametree, flags, eh, cs, cm);

where eh will be one of begin, and or or, and et will be the list of argument expressions.

idio_meaning_sequence() does a quick test:

  • if the arguments are, in fact, a single argument then we call idio_meanings_single_sequence() which (recursively) returns the meaning of the head of the list of argument expressions.

  • otherwise we would have followed in the footsteps of LiSP in calling a function idio_meanings_multiple_sequence(), except that LiSP, using the underlying Scheme implementation, can recurse to its heart’s content whereas we will eventually blow up our C stack if the sequence is too large.

    The exemplar “large sequence” is that sequence of statements in a large source file.

    In practice, then, we convert the Schemely recursion into a C-friendly iterative loop and walk down the list of arguments converting each one in turn into some meaning and tacking it onto a list.

    Technically, we push it onto the front of a now reversed list of meanings which, come the end of the loop, we reverse.

    However we have managed it, we have a correctly ordered list of meanings onto the front of which we tack the IDIO_I_sequence intermediate code – IDIO_I_BEGIN etc.

In other words:

  • (and e1) becomes just m1 (converting an expression into a meaning)

  • (and e1 e2 e3) becomes (IDIO_I_AND m1 m2 m3)
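The iterative loop for the multiple-argument case can be sketched as follows. This is a toy, not the real idio_meanings_multiple_sequence(): the `cell` type, `cons()`, `reverse_in_place()` and `toy_meaning()` are hypothetical stand-ins, expressions and meanings are just strings, and nothing is freed, for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A toy pair; real Idio pairs hold arbitrary values. */
typedef struct cell {
    const char *head;
    struct cell *tail;
} cell;

cell *cons (const char *head, cell *tail)
{
    cell *c = malloc (sizeof *c);
    assert (c != NULL);
    c->head = head;
    c->tail = tail;
    return c;
}

/* Reverse a list in place and return the new head. */
cell *reverse_in_place (cell *l)
{
    cell *r = NULL;
    while (l != NULL) {
        cell *next = l->tail;
        l->tail = r;
        r = l;
        l = next;
    }
    return r;
}

/* Hypothetical stand-in for finding a meaning: "e1" becomes "m1". */
char *toy_meaning (const char *e)
{
    size_t n = strlen (e);
    char *m = malloc (n + 1);
    assert (m != NULL);
    memcpy (m, e, n + 1);
    if (n > 0) {
        m[0] = 'm';
    }
    return m;
}

/* Walk the expressions in a loop, no recursion on the C stack. */
cell *toy_meanings_multiple_sequence (const char *op, cell *exprs)
{
    cell *rev = NULL;
    for (cell *e = exprs; e != NULL; e = e->tail) {
        /* push onto the front, so the list is built in reverse */
        rev = cons (toy_meaning (e->head), rev);
    }
    /* restore the order and tack the sequence code on the front */
    return cons (op, reverse_in_place (rev));
}
```

Passing the list (e1 e2 e3) with the string "IDIO_I_AND" as `op` gives back (IDIO_I_AND m1 m2 m3), and the C stack depth stays constant however long the sequence is.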

module

As mentioned previously, the evaluator cares about the current module and the virtual machine… not so much. The virtual machine does retain the value for the current module if only to have a value to return for (current-module).

The evaluator, of course, needs to keep track of the current module so it can figure out which v you are referring to.

Today it all just works but back when I was loading files a little differently, module and friends required some evaluator support. This section gives a little history you might learn from.

First, a quick diversion.

In the source code you’ll be using module m to change module. module is a template, though, partly because it needs to be concomitant with load.

We have a semantic problem in that if you load in a file which, at the top, says module foo then when do you stop being in module foo? Naturally, you will say, at the end of the file. When is that, given that you are reading a sequence of statements from the file?

There needs to be a hook into load to handle this – but not the hook you necessarily expect. load could also fail and quit early because of any kind of error when reading and evaluating the file. You would expect it to “unwind” the module statement then too.

For handling modules I’ve taken an idea from STklos, that of a module “stack”, which lets you nest (define-module name & body) statements. define-module will catch any conditions and unwind the module stack.

I don’t actually use define-module but rather have a simple module name statement which flips the rest of the file (or to the next module statement) into module name.

I did add an (in-module name & body) which functions identically to define-module but just feels better purposed. Not that I use it much but it can come in handy.

OK, when we run load it needs to be module-aware – and condition-aware! – and reset the current module back to whatever it was when load started. It must also remember to return the result of the actual (original) load call – not that many people will look at it.
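The save-and-restore shape of such a load can be sketched in C, using setjmp/longjmp as a crude stand-in for conditions. Everything here is hypothetical: `toy_load()`, `toy_raise()`, the `toy_current_module` variable and the two sample file bodies are illustrative names, not Idio's own.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

const char *toy_current_module = "start";  /* toy "current module" */
jmp_buf *toy_handler = NULL;               /* innermost handler, if any */

/* Non-local exit to the innermost handler, like a raised condition. */
void toy_raise (void)
{
    assert (toy_handler != NULL);
    longjmp (*toy_handler, 1);
}

int toy_in_module (const char *name)
{
    return 0 == strcmp (toy_current_module, name);
}

typedef int (*toy_file_body) (void);

/*
 * Run a "file", then put the current module back however the file
 * finished.  Returns 0 and stores the file's result on success, or
 * returns 1 if the file raised.
 */
int toy_load (toy_file_body body, int *result)
{
    const char *saved_module = toy_current_module;
    jmp_buf *saved_handler = toy_handler;
    jmp_buf here;
    int failed = 0;

    toy_handler = &here;
    if (setjmp (here) == 0) {
        *result = body ();          /* the original load's result */
    } else {
        failed = 1;                 /* arrived here via toy_raise() */
    }
    toy_handler = saved_handler;
    toy_current_module = saved_module;
    return failed;
}

/* Two sample "files": both switch module, the second then fails. */
int toy_body_ok (void)
{
    toy_current_module = "foo";
    return 42;
}

int toy_body_fails (void)
{
    toy_current_module = "bar";
    toy_raise ();
    return 0;                       /* not reached */
}
```

After `toy_load (toy_body_ok, &r)` the result 42 is in `r` and `toy_in_module ("start")` is true again. The same holds for the module after `toy_load (toy_body_fails, &r)`, even though that file never reached its end.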

Back to the evaluator. In fact, back to when I was entertaining myself with the idea of reading all the expressions in from the source, evaluating them all then running the generated code from them all. (Rather than read, evaluate and run one expression, read, evaluate and run the next expression, etc.)

The module statement – as the evaluator sees it – isn’t going to change the sense of the current module until we actually get round to running the code which is going to be ages away after we’ve evaluated the rest of the statements in the file. The very statements that want to know they’re in a different module. Hmm.

The above diversion tells us that module is a template – which ultimately calls the primitive %set-current-module! name. It seems we have a choice: we could replace the primitive %set-current-module! with a special form (which makes a single function call to set the mod register in the VM) or we could have the evaluator spot module as a special form and then run the expander code for module anyway.

For some reason I did the latter. I think it’s because %set-current-module! can be given a parameter rather than a symbol and therefore the evaluator won’t know the value of it until the code is run. module, on the other hand, is forced to be passed a symbol (because it’s a template).

Anyway, for the evaluator, when we see the module statement, we’ll steal the argument (which must be a symbol because module is a template and so won’t have had any arguments evaluated) and set the current module directly here and now. This immediately affects all future variable lookups, which will use the current module as their starting point.

This feels slightly wrong. We’re changing the state of the currently running process whilst evaluating and therefore before any code is run. However, it does mean that the evaluator has the correct sense of the current module and subsequent variable lookups do the right thing.

Also note that nothing has set the module back to its original value. We rely on the improved concomitant load to do that work for us.
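The effect of stealing the module argument at evaluation time can be sketched with a toy "evaluate everything first" pass. The `stmt` type and `toy_evaluate_all()` are hypothetical: a file is reduced to module statements and variable references, and the function records which module each reference would be looked up in.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STMT_MODULE, STMT_REF } stmt_kind;

typedef struct {
    stmt_kind kind;
    const char *name;   /* module name or variable name */
} stmt;

/*
 * Evaluate every statement before running any of them.  For each
 * variable reference, record the module the evaluator would start
 * its lookup in.  Returns the number of references seen.
 *
 * With steal_module set, a module statement changes the evaluator's
 * idea of the current module here and now; without it, the change
 * would only happen when the generated code is run, much later.
 */
size_t toy_evaluate_all (const stmt *stmts, size_t n, int steal_module,
                         const char *start_module, const char **resolved)
{
    const char *current = start_module;
    size_t nrefs = 0;

    assert (stmts != NULL || 0 == n);
    for (size_t i = 0; i < n; i++) {
        if (STMT_MODULE == stmts[i].kind) {
            if (steal_module) {
                current = stmts[i].name;
            }
        } else {
            resolved[nrefs++] = current;
        }
    }
    return nrefs;
}
```

For a file that references a, says module foo, then references b, the pass without stealing resolves both a and b in the starting module. With stealing, b is resolved in foo, which is what the author of the file meant.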

There is a similar knock-on effect on module imports and, arguably, exports, as, in particular, module imports need updating immediately in order that the rest of the statements can successfully use variables exported from other modules. We can’t wait until the code is run before knowing what we’ve imported from other modules.

So, the problem here is entirely the “all in one” loading method. If we read, evaluate and run a statement at a time then everything just falls into place.

Last built at 2024-10-18T06:11:17Z+0000 from 463152b (dev)