Pre-Compilation I

This has been a huge investment of time and effort with a pragmatic but incomplete result. I’ve conflated ideas, tried to pursue too much at once, missed swathes. A hopeless mess. Let’s get into it.

The basic plan is to replace the live evaluation of source code when Idio starts up with a system that reads in the pre-compiled byte code saving time and effort.

Many Lispy languages support the notion of undumping where, having parsed all of the startup source code into its internal (byte code) form, the system saves that internal form back out ready to roll for next time.

The Problem

The crux of the problem with reading in some byte code it that that byte code cannot know anything about the running Idio engine when it starts. Well, it can make some presumptions about the set of primitives in the idio executable, probably, but it can’t know whether, say, the symbol load, refers to a primitive or a closure. (In fact, load will refer to both, and be used in the wrong order!)

This is individually true for every file loaded in as none of the loaded files can predict what a previously loaded file has done to modify the state of the Idio engine.

The fix to not knowing anything is to have to look everything up. Easy!

Strategy

I ran around some thoughts previously in Pre-Compiled Modules and hopefully have brought it all together here.

A couple of things from that discussion didn’t sink in at the time. The first is the x86_64’s use of relative instruction pointers (RIP) into the GOT to access variables for the loaded object which might be extremely convenient there but does seem to my eye to be a touch confusing. A naïve x + x would use two different RIPs to access the same variable. How would you keep track?

The second was the global offset table (GOT) where all the doubly-dereferenced entities live. I couldn’t quite see where the GOT was kept as it is loaded-object specific. It turns out that it is in a register. Which register? Why, any register, of course, it’s up to you to maintain it and apply the appropriate register to your instructions.

This then led me to the idea that the VM’s threads should have another “register” maintaining a reference to whichever set of tables needed double dereferencing for this block of code.

For the VM

That leads us to the idea of a table, or series of tables, where the byte code (restricted to small positive integers) will embed a symbol index, si, which will be mapped through (some other tables) to a global value index, gvi, containing the value the VM wants to operate on. We need a common global value index because my symbol index to the global variable A is different to your symbol index for A. If I then used my own value index for A then how are you supposed to know what to access/update?

In between symbol index and global value index we want to:

  • do a cached lookup of the gvi immediately

  • otherwise, a one-time:

    • recover a symbol from the loaded object’s tables

    • do a symbol lookup of the symbol through the current and imported modules (which will contain the global value index or be auto-vivified as a symbol – think ls)

    • set the cached gvi for next time

    • use the gvi as above

This creates the idea of an execution environment, or xenv, one per loaded file, with an associated xi, the index into the table of all xenvs.

I settled on a basic three tables for the VM:

  1. a symbol table mapping the symbol index, si, to an index into the constants table, ci

    You only need symbol indices for things you need to lookup (as though they were variables). symbol symbols are just constants (if that makes any sense!):

    Idio> printf "%s is %s\n" 'load load
    load is #<CLOS load [14]@50196/0x7f2dc6f7f610/Idio>
    

    Here, we quote the first use of load, as in 'load, to access the symbol load which will result in a lookup of some constant index, ci in the byte code. The second use of load is as a variable which means we must do a SYM-REF of a symbol index, si, to force the VM to resolve a gvi.

    In particular, the si for load (the second) will map to the same ci for 'load as they reference the same symbol and there can be only one!

  2. a constants table containing all of the, uh, constants used in the loaded file

    Here, constants are, essentially, anything that isn’t a small positive integer that can be embedded into the byte code – think symbols, strings, bignums etc..

    In the example from above, there will be constants for the symbol printf, the string "%s is %s\n" and the symbol load (only once, there can be only one… [Ed - stop it!] )

    There is an associated constants hash table which reverses the constants table lookup – very handy.

  3. a values table which maps loaded file value indices, vi, to the global value indices, gvi used by the VM

    A gvi of 0 (zero) is a sentinel value indicating that the global value index hasn’t been looked up yet.

    Again, for our example, there will be value indices for the symbol indices for printf and load.

Here, it is critical that the si and vi align. That’s because the very first thing we do with a si is query the values table at slot si to see if the corresponding index is non-zero. If non-zero we can access the global values table straight away.

If it is zero then we’ll have to:

  1. use the symbol table to map the si to some ci

  2. use the constants table to map the ci to a symbol

  3. lookup the symbol information tuple in the current environment (or create a new symbol tuple for the auto-vivified symbol)

  4. extract the gvi from the symbol information tuple

  5. set the values table at index si (the si and notional vi align) to gvi ready for next time

  6. use the gvi index into the global values table as before

Note, however, that for operational reasons, you will have more value indices (values you need to use) than you will have symbol indices (symbols you want to lookup) leaving the symbol table with some gaps.

A prominent example, here, is the previous work to separate the creation of closures into two parts: a one-time CREATE-FUNCTION and a per-use CREATE-CLOSURE. The CREATE-CLOSURE needs to refer to the value resulting from the CREATE-FUNCTION and a value index is reserved for that purpose leaving the corresponding symbol table index unused.

There are some knock-on effects of xenvs in that, say, any closure defined in an xenv will have embedded in its byte code references to that xenv’s tables. As a consequence, the definition of a closure must now include the xenv or, rather, the xi, it was defined in such that those lookups will be correct. Similarly, the arguments constant index, aci, for a frame refers to a constant in the current xenv and so to decode it, the frame needs to include the xi for the aci.

You can see that in the printed form of the closure load, above: #<CLOS load [14]@50196/0x7f2dc6f7f610/Idio> where the [14] is indicating that load was defined in xi 14.

The VM also has some other tables of interest:

  1. a source code expressions table

    This is an array of source code expressions.

    This exists whilst the source code is being parsed but is not saved as it is not possible to recreate this data sensibly without re-parsing the source code.

    Consider, ((1 + 2) * (3 + 4)) where the reader can successfully maintain references to the internal sub-lists of the main expression but when written out becomes three distinct lists, ((1 + 2) * (3 + 4)), (1 + 2) and (3 + 4), with the larger one repeating the smaller two. This duplication of lists balloons the memory requirements enormously.

    It may be reported in the disassembled code if the --vm-tables flag is passed to idio but otherwise is lost.

  2. a source code properties table

    (This is one-for-one with the source code expressions table, above.)

    This is an array of (fi line) tuples where fi is the index into the constants table for the source code’s file name.

    The source code properties table is what is indexed by the SRC-EXPR opcode.

Modules

The symbol/variable lookup works much as before in that there is a module-specific tables of known (and possibly) exported symbols which contain some internal symbol information data.

That symbol information is a tuple of:

  • the kind of symbol ('predef (primitive), 'toplevel, 'dynamic, 'environ, etc.)

  • an si

    Up to r0.2 this was an unreliably module-specific ci.

  • a ci

    Up to r0.2 this was a global ci.

  • a gvi

  • the module the symbol was defined in

    This is useful for debugging as, in order to get this tuple, you must have known the correct module to ask the question of in the first place. In other words, a module is already known. However, as modules import names from other modules, some import history is observable.

  • a descriptive string

    This is useful for debugging as it usually hints at the last modifier of this data. Although this isn’t reliable.

Evaluator

In addition, the evaluator wants to maintain slightly more information, essentially an association list of the symbol and the symbol information tuple.

It’ll keep track of things in an evaluation environment, or eenv.

Three Problems

I tried to do three things at once which, you might guess, didn’t end well.

Maintain the Existing Mechanism

As I already had a perfectly good working system, despite it not implementing this new strategy, I thought I could maintain it. The plan was to pre-compile the known startup files (using lib/evaluate.idio rather than src/evaluate.c).

Here, several things went wrong:

  • a lack of surety that lib/evaluate.idio actually does the same thing as src/evaluate.c

    The problem here being that, whilst lib/evaluate.idio can run the test suite quite happily, the test suite doesn’t exercise the evaluator quite as much as the startup code does.

  • the pre-compilation was flawed (see below)

  • lib/evaluate.idio, via lib/expander.idio, implements a different template expansion mechanism than src/evaluate.c, via src/expander.c

    This means the resultant (disassembled) code is hard to compare (easily).

  • when you load the pre-compiled code, using the new strategy, back in, the Idio engine behaves differently to the way it behaves when evaluating the startup code “live.”

    This makes debugging very difficult as the engine is following different paths through the code base.

In the end I junked the idea of a special “first pass”, reworking it, and therefore everything, to use the new strategy.

Pre-Compile

I hadn’t thought pre-compilation through properly.

Here the problem is that the behaviour of the code is dependent on its current state. If one expression would have modified that state then the next expression may react differently. Most of the time this isn’t an issue as the VM would “just” look a value up or, at least, just use a value as modified by the previous expression changing it.

However, we made the judicious decision to allow predefs (primitives) to be shadowed by (hopefully) improved versions in the form of closures. The problem here is that the calling conventions for primitives and closures are not the same. Internally, once the arguments have been computed you can simply call the primitive’s underlying C function with the various arguments. For a closure, we construct a frame and copy the argument values into the frame, change the PC to the start of the function and have the closure code extract arguments from the frame. The creation and filling of the frame is completely unnecessary for primitives.

For pre-compilation we have to pretend to modify the program state so as to know whether the use of a symbol will be for a predef or a closure.

As it stands, though, the code modifies the real program state not a pretend one. That’s sort of OK for source code you’ve not seen before (although, in effect, you are running the source code, not performing a standoff-ish pre-compilation) but is close to disastrous for source code you have seen before such as the startup code.

For the startup code we are trying to distinguish between running the code for the first time and any subsequent runs which we don’t want to modify the program state because it affects the running program…

A great example, here, is the use of operators by the reader. Having started up we now have a full set of operators and we want to run lib/evaluate.idio over the startup code. However, as the startup code is being read in, which would normally have no operators present (as they haven’t been defined yet) operators get applied because we’re a fully operational[sic] system.

I did try an entertaining method of removing the user-defined operators then carefully re-adding them as the user-definition appeared. That (surprisingly) worked but didn’t help with the other modifications of real program state.

Of course it took a while to realise that in modifying the extant state I wasn’t pre-compiling but merely sowing confusion.

Implement the Strategy

Actually implementing the strategy is a thing in itself, of course, with its own set of problems and nuances.

We’ll cover this in detail in the next sections. In the meanwhile…

Undump

I spent a lot of time debugging and requiring more and more detailed dumping of state, the sort you can see with the --vm-tables flag, before realising that the “live” state I was dumping was, in fact, the very thing I should be using. All I needed to do was reformat the output into the chosen cache file format and… Boom! Job done.

Of course, this is exactly the undumping described previously. Ho hum.

Implementation

It’s a little bit complicated with a symbiotic relationship between xenvs and eenvs. Here, it transpires, as the evaluator reads in source code it must maintain the newly extended source, constant and value tables as well as generating the byte code for the VM.

The reason for this is that any interactive session must cycle between reading in source code and running the compiled byte code. The evaluator sets everything up ready for the VM to roll.

The VM also needs to be able to refer back to the original source file used by the evaluator so it maintains a reference to the eenv through the xenv.

Evaluator Changes

Much of the evaluator’s new behaviour has been abstracted behind some idio_meaning_find_scope_symbol() (and meaning-find-scope-symbol) functions which, if the symbol is not already known to the evaluator will call idio_meaning_extend_tables() (meaning-extend-tables) which looks to ensure that the symbol and values tables are aligned.

The symbol itself can be added to the constants table (and hash), if necessary, and that ci used in the symbol table.

The values table is extended to keep it aligned but note that, at this point, no gvi can be known and the value table entry is set to zero.

It will grow the (association) list of known symbols (in this eenv) and, optionally, set the module’s symbol information tuple with the same information (including the zero gvi).

xi 0

xi 0 (zero) is the xenv for the global tables, notably the global values table which everything else indirectly seeks to refer to.

C add_primitives

Most of the C modules have an idio_X_add_primitives() function to bootstrap the system with several hundred primitives.

There’s nothing to stop module A defining the same named primitive as module B – it does happen: number? is either the regular number? (for fixnums and bignums) or the C-specific C/number? that identifies C numeric types. So the code in idio_meaning_predef_extend() needs to select a module-specific eenv to add the primitive into.

This is the primary reason there are so many xenvs “prior to” starting up: they’re the result of the evaluator creating a separate eenv (and therefore xenv) for each Idio module that has had a primitive defined for it. These are really just holding spaces with no associated code.

This addition of a primitive is one of the rare occasions that a value is known in the here and now for the evaluator. Accordingly, it can request a new global value index and retro-fit that gvi back into the symbol information tuple created by the extend tables code.

Evaluation

Evaluation proceeds much as it did before except that name lookups are performed through the eenv-specific list of known symbols rather than resorting to any recursive lookup through imported modules.

The resultant symbol information tuple now contains the all important si for the symbol which is going to be embedded in the byte code.

This means that the first time the evaluator sees the symbol, say, printf, in your code it will fail to find it in its list of previously seen symbols and so extend its tables for this eenv to add printf as a new constant, an si mapping to that new ci and a new values table entry of zero (meaning, the VM needs to look this up).

This is deferring that resolution of printf to runtime. It only needs to be resolved once (as the resolving code will have updated the values table index with the real live gvi) but it is delayed to runtime.

This is the correct behaviour for a “pre-compiled” file being loaded in. It doesn’t know how this idio executable has laid everything out so it must look it up at runtime.

Shadowing Predefs

This doesn’t happen much but it does happen with a rather critical function, load, so we need to sort it out.

The primitive load opens the file and tries to evaluate and run every expression. That’s great until there’s an error or even where the loaded file redefines the current module. Who resets everything?

That’s when our closure load (in lib/bootstrap/module.idio) comes into its own. It utilizes unwind-protect to ensure that the current module is reset to whatever it was before load was invoked. Of course, otherwise, this load simply calls the primitive load.

As noted previously, the calling convention for primitives and closures is different therefore we need to ensure that we know when which variant of load is being called.

In practice, the closure load isn’t a definition, it is an assignment to the symbol load because the closure is being defined inside a block to retain access to the original primitive load, something like:

{
  orig-load := load

  load = function (filename) {
    ...
      orig-load filename
    ...
  }
}

We can’t define load inside the block as it will become a block local variable (who’s scope ends at the end of the block).

So we need to capture assignments to a symbol which is currently flagged as a predef.

We can’t just override the value because somewhere there will be byte code (you imagine the call to orig-load filename being one of them) which is expecting a primitive and it’s calling convention and, at the same time, we need to prep for code that is expecting our new shiny closure and its calling convention.

So we need to extend the eenv’s symbols again with the same symbol but now different information. Future lookups of load will get a symbol information tuple for a regular toplevel definition and do the right thing.

xi Predefs

Part of the problem, here, for load is that the assignment happens before the primitive is called. Technically, the primitive load is invoked in order to load bootstrap.idio but that happens in a non-xenv-specific way. When the bootstrap code is run, load (closure or primitive) is not invoked until after we’ve performed this assignment which makes you wonder who set up the original association between some si and the primitive load in this xenv in order that the assignment to orig-load at the start of the block finds anything at all.

Answer: no-one!

So we need to record the set of predefs that this xenv uses in order that, having loaded the compiled code file in we can run round the set of used predefs setting up the value table mapping from si to the gvi that the C “add primitives” code set up.

xi Operators

In a similar way, we might define an operator in this eenv/xenv but operators always exist in the operator module, not the current module. That’s because their usage is effectively global in that they must be visible through the reader which looks for them in a consistent place, the operator module.

That said, the definition of the operator closure is in this eenv and all of its internal symbol references will use this eenv’s symbol table etc..

So, operators are handled slightly differently to other toplevel definitions in that they exist in this eenv but their symbol information tuples are set in the operator module.

This distinct set of known operators is saved and restored for no good reason other than general tidiness.

Stack Values

The VM will push some “values” onto the stack as it is processing. For example, the condition a trap is for, dynamic and environment variables.

What we can’t do is push values that are xenv-specific as we won’t know which xenv we are referring to. We need to push values that are globally recognised.

That’s not a big deal but it is a deal we need to do.

For all of them, we’ll pass an si in the usual way and have the runtime code discover the appropriate current global value.

Traps

For a trap we want to push the global equivalent of the si’s ci. So that’s an extra constants_lookup_or_extend in the global constants table (in xi zero).

Dynamics and Environs

For dynamic and environment variables we’ll (continue to) push the gvi of the variable.

That allows separate bodies of code to temporarily create a shadowing dynamic/environment variable, which will use a new gvi.

Invoking

--save-xenvs

Operational Effects

It’s a mixed bag, I’ll be honest.

In the first instance, loading the compiled startup files, there is a huge runtime saving. When evaluating the code, the first time, the evaluator has to expand all the templates which is very expensive. This results in some 7.5 million opcodes being run, all in, by the VM.

The resultant code is at least an order of magnitude smaller requiring around 208 thousand opcodes to be run.

However, the constant double dereferencing far outweighs that benefit and the overall effect is that the test suite, say, takes about 15-20% longer.

It’s not just the double de-reference per se, it is the jumping about in memory and the complete lack of cache coherence in the code that is the problem.

On reflection, it might be necessary to revisit this with a view to a leaner “first pass” with the startup code (and only the startup code?).

Here we’d be going back to the idea of using xi zero directly, ie. the si, and therefore nominal vi, is the gvi and needn’t be dereferenced again.

Problems Problems

We have a problem if C code wants to look a variable up as no xenv specifically represents a module.

We have a problem if a loaded file redefines a value already set in this file.

We have a problem in that gensyms are saved.

Last built at 2024-09-07T06:11:23Z+0000 from 463152b (dev)