Extensions Overview

We’ll use the JSON5 extension, which defines a json5 module, as an example.

Inspired by Simon Schoenenberger’s standalone C library for JSON5, I wrote a standalone JSON5 parser/generator which is then bundled into an Idio extension. The standalone part is important, for our purposes, as it demonstrates the ability to hook in non-Idio code.

There is a little hoop-jumping, perhaps, in that, to provide a common base, the standalone code needs to copy the (generated) Idio src/usi.[ch] files and use its own, reduced, usi-wrap.[ch] files to create a nominal libjson5-bare.so shared library.

To create an Idio extension I’ve created two C source files (one would have been enough!):

  • json5-module.c

    Here, I’ve put the error handling functions and generic module functions, idio_init_json5 () etc.

  • json5-api.c

    Which, like src/lib-api.c, provides Idio primitives for the underlying JSON5 library functions.

The core JSON5 library files, json5-token.[ch] etc. are identical for the standalone and Idio libraries and make no reference to Idio features.

Ideally, both the Idio module and the standalone code would call the same libjson5.so but in this case we have re-used the USI code which would make things a mess.

How and When?

How and when do we know if there is a dynamic shared library to be loaded? There are lots of possible mechanisms so let’s consider something really basic.

In the bowels of the loader there’s a table of readers and evaluators by filename extension. In practice there’s only one reader and evaluator, today, but we can imagine, say, a reader for pre-compiled Idio code.

We can extend that table, initially with an entry for the .so extension and some dummy reader – there’s no Idio “reading” (as in REPL) to be done for a shared library – and, this being our choice, with the .so entry coming first. We can further extend the table with prefixes and suffixes so that we can try a variety of constructed filenames from a given root, e.g. libfoo.so and foo.idio from the original request to load foo.
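As a rough illustration of such a table, here is a minimal sketch in C. The type names, the reader kinds and the candidate_filename () helper are all made up for this example and are not the names used in the Idio loader; the only things taken from the text are the prefix/suffix idea and the .so entry coming first.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reader kinds: illustrative names, not Idio's. */
typedef enum { READER_SHARED_LIB, READER_IDIO } reader_kind_t;

typedef struct {
    const char *prefix;
    const char *suffix;
    reader_kind_t reader;
} load_candidate_t;

/* Order matters: the .so entry is tried first. */
static const load_candidate_t load_candidates[] = {
    { "lib", ".so",   READER_SHARED_LIB },
    { "",    ".idio", READER_IDIO },
};

#define N_LOAD_CANDIDATES (sizeof load_candidates / sizeof load_candidates[0])

/*
 * Build "dir/<prefix><root><suffix>" for table entry i into buf.
 * Returns 0 on success, -1 if root is empty, i is out of range or
 * buf is too small.
 */
int candidate_filename (const char *dir, const char *root, size_t i,
                        char *buf, size_t buflen)
{
    if (strlen (root) == 0 || i >= N_LOAD_CANDIDATES) {
        return -1;
    }

    int n = snprintf (buf, buflen, "%s/%s%s%s", dir,
                      load_candidates[i].prefix, root,
                      load_candidates[i].suffix);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A loader would walk the table in order for each directory it searches and stop at the first candidate that exists.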

If, then, when we say load json5, we find libjson5.so (and the “reader” is our dummy reader) then we can choose to dlopen(3) the shared library and initialise it.

Extending this mechanism a little further and, again, our choice, we can say, “surely the user will have an associated Idio file containing relevant functionality?” and look for an adjacent json5.idio file to be loaded with the normal Idio reader and evaluator.

Note that this is an adjacent .idio file and not one in a far-flung IDIOLIB directory. The idea is that these two were installed and meant to be run together. Who knows what that other json5.idio file is expecting?

Obviously, the .so and .idio might not be in literally the same directory but they should certainly be in the same hierarchy, with one path constructible from the other.

There are any number of problems with architecture-dependent files (shared libraries being one such type) in shared filesystems where you would want The Right Thing™ to happen and so we should expect architecture-dependent and architecture-independent subtrees to form.

Modules

The modules code is table-driven in the sense that as modules are initialised in the C code-base, they register a “finalizer” function, idio_final_module (), which can unwind any data structures and generally free up memory allocations. Those finalizers are called in reverse order of initialization.

For our extension we need to be able to hook into that mechanism so the action immediately after loading a shared library is to dlsym(3) and call the idio_init_module (void *handle) function, idio_init_json5 (handle), in this case.
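The load-then-initialise step might be sketched like this. The ext_load () and ext_init_symbol () names are invented for the example, and the real loader will have rather more error handling; dlopen(3), dlsym(3), dlerror(3) and dlclose(3) are the standard POSIX calls.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

/* The entry point every extension exports, per the text. */
typedef void (*ext_init_fn_t) (void *handle);

/* Construct "idio_init_<module>" in buf; 0 on success, -1 otherwise. */
int ext_init_symbol (const char *module, char *buf, size_t buflen)
{
    if (module == NULL || strlen (module) == 0) {
        return -1;
    }

    int n = snprintf (buf, buflen, "idio_init_%s", module);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}

/*
 * Open the shared library at path, look up idio_init_<module> and
 * call it with the handle.  Returns the handle, to be dlclose(3)'d
 * later, or NULL on failure.
 */
void *ext_load (const char *path, const char *module)
{
    char symbol[128];
    if (ext_init_symbol (module, symbol, sizeof symbol) != 0) {
        return NULL;
    }

    void *handle = dlopen (path, RTLD_NOW);
    if (handle == NULL) {
        fprintf (stderr, "dlopen: %s\n", dlerror ());
        return NULL;
    }

    ext_init_fn_t init;
    /* the POSIX idiom for turning dlsym's void * into a function pointer */
    *(void **) (&init) = dlsym (handle, symbol);
    if (init == NULL) {
        const char *err = dlerror ();
        fprintf (stderr, "dlsym %s: %s\n", symbol, err ? err : "symbol is NULL");
        dlclose (handle);
        return NULL;
    }

    init (handle);

    return handle;
}
```

On older glibc systems this needs linking with -ldl.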

The void *handle parameter is the dlopen(3) return value. I did originally use a regular GC finalizer but it transpired that the GC would choose an unfortunate time to invoke it, and the corresponding dlclose(3): notably, before any attempt had been made to call the idio_final_module () function.

That means the modules tables are extended by an optional void *handle which can be dlclose(3)’d at a safe time.
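To make the ordering concrete, here is a small sketch of a module table with an optional handle. The table layout and the function names are assumptions for illustration only, not the Idio source; the point is that finalizers run in reverse order of registration and a handle is only closed after its module’s finalizer has run.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <string.h>

#define MAX_MODULES 64

typedef void (*module_final_fn_t) (void);

typedef struct {
    const char *name;
    module_final_fn_t finalizer;  /* may be NULL */
    void *handle;                 /* dlopen(3) handle, NULL for built-ins */
} module_entry_t;

static module_entry_t module_table[MAX_MODULES];
static size_t n_modules = 0;

/* Record a module in initialisation order; -1 if the table is full. */
int module_register (const char *name, module_final_fn_t finalizer, void *handle)
{
    if (n_modules >= MAX_MODULES) {
        return -1;
    }

    module_table[n_modules++] = (module_entry_t) { name, finalizer, handle };

    return 0;
}

/* Run finalizers in reverse order, closing a handle only afterwards. */
void module_finalize_all (void)
{
    while (n_modules > 0) {
        module_entry_t *m = &module_table[--n_modules];

        if (m->finalizer != NULL) {
            m->finalizer ();
        }
        if (m->handle != NULL) {
            dlclose (m->handle);
        }
    }
}

/* Example finalizers: each appends a tag to a log so the order is visible. */
char final_log[32] = "";

static void log_tag (const char *tag)
{
    size_t used = strlen (final_log);
    snprintf (final_log + used, sizeof final_log - used, "%s ", tag);
}

void final_core (void)  { log_tag ("core"); }
void final_json5 (void) { log_tag ("json5"); }
```

Registering final_core and then final_json5 and calling module_finalize_all () runs the json5 finalizer first.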

This also means that all extension shared libraries must have an idio_init_module (void *handle) function. Whether that function chooses to register a finalizer is up to it. It could be a no-op but it must exist.

Note

This function signature differs from that of the core Idio modules defined in src, which do not take the void *handle argument. They weren’t dynamically loaded so don’t need a handle to be closed.

I suppose you could change all of them to accept a void *handle and pass NULL when calling them but that would appear disingenuous to my eye.

Source Code

The initialising function, idio_init_json5 (void *handle), really is our only required hook – and it needn’t do anything.

However, we do want to do something for our JSON5 module so there are some bits and pieces dotted about.

JSON5 API

The JSON5 lexing, tokenizing and parsing all take place in json5-token.c and related files. Remember that this is also standalone functionality.

  • json5-unicode.c

    We define a json5_unicode_string_t which is remarkably similar to an Idio string in that it is an array of Unicode code points, each 1, 2 or 4 bytes wide.

    The rest of the struct differs but the array format is identical to Idio meaning it is cheap to copy.

    There is also a plethora of ECMAScript-oriented tests – partly a feature of JSON5’s willingness to allow ECMAScript Identifiers as object member names but also to accommodate the various escape sequences that JSON5 strings support.

  • utf8.c

    This is Bjoern Hoehrmann’s DFA-based decoder, again the same as in Idio. Its only real purpose here is to transcribe the UTF-8 input stream into a json5_unicode_string_t.

  • json5-token.c

    This constructs json5_value_t values as part of the tokenizing.

  • json5-parser.c

    This validates the JSON5 token stream and returns the aggregated json5_value_t.

    It also provides the main C interfaces to parse either a C string or read from a file descriptor.
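The 1-, 2- or 4-byte array idea from json5-unicode.c can be sketched as follows. This is not the real json5_unicode_string_t: the struct, its field names and the ustring_* functions are hypothetical, and the sketch only shows how the narrowest element width might be chosen from the largest code point.

```c
#include <stdint.h>
#include <stdlib.h>

typedef enum { CP_WIDTH_1 = 1, CP_WIDTH_2 = 2, CP_WIDTH_4 = 4 } cp_width_t;

/* Hypothetical stand-in for json5_unicode_string_t. */
typedef struct {
    cp_width_t width;  /* bytes per code point */
    size_t len;        /* number of code points */
    void *data;        /* len elements of width bytes each */
} ustring_t;

/* Narrowest width that can hold every code point up to max_cp. */
cp_width_t ustring_width_for (uint32_t max_cp)
{
    if (max_cp <= 0xFF)   return CP_WIDTH_1;
    if (max_cp <= 0xFFFF) return CP_WIDTH_2;
    return CP_WIDTH_4;
}

/* Build a string from code points; NULL on allocation failure. */
ustring_t *ustring_from_code_points (const uint32_t *cps, size_t len)
{
    uint32_t max_cp = 0;
    for (size_t i = 0; i < len; i++) {
        if (cps[i] > max_cp) max_cp = cps[i];
    }

    ustring_t *us = malloc (sizeof *us);
    if (us == NULL) return NULL;

    us->width = ustring_width_for (max_cp);
    us->len = len;
    us->data = calloc (len ? len : 1, (size_t) us->width);
    if (us->data == NULL) {
        free (us);
        return NULL;
    }

    for (size_t i = 0; i < len; i++) {
        switch (us->width) {
        case CP_WIDTH_1: ((uint8_t *) us->data)[i] = (uint8_t) cps[i]; break;
        case CP_WIDTH_2: ((uint16_t *) us->data)[i] = (uint16_t) cps[i]; break;
        case CP_WIDTH_4: ((uint32_t *) us->data)[i] = cps[i]; break;
        }
    }

    return us;
}

/* Fetch code point i (no bounds check in this sketch). */
uint32_t ustring_ref (const ustring_t *us, size_t i)
{
    switch (us->width) {
    case CP_WIDTH_1: return ((const uint8_t *) us->data)[i];
    case CP_WIDTH_2: return ((const uint16_t *) us->data)[i];
    default:         return ((const uint32_t *) us->data)[i];
    }
}

void ustring_free (ustring_t *us)
{
    if (us != NULL) {
        free (us->data);
        free (us);
    }
}
```

If two string types share this array layout then converting between them is a width check and a block copy of the data, which is what makes it cheap.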

Note that there is no JSON5 generator, per se, although the standalone code does have one in the code that uses the library. A generator need only walk around the json5_value_t printing out suitable forms.
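By way of illustration, a generator of that kind might look like the sketch below. The val_t type is a made-up, much reduced stand-in for json5_value_t (no objects, for brevity) and the out_* helpers are invented for the example; string escaping is limited to double quotes and backslashes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for json5_value_t: the real struct will differ. */
typedef enum { VAL_NULL, VAL_BOOLEAN, VAL_NUMBER, VAL_STRING, VAL_ARRAY } val_type_t;

typedef struct val_s {
    val_type_t type;
    union {
        int boolean;
        double number;
        const char *string;
        struct { const struct val_s *elems; size_t len; } array;
    } u;
} val_t;

/* A fixed-size output buffer; output that does not fit is dropped. */
typedef struct { char *buf; size_t len; size_t used; } out_t;

void out_init (out_t *o, char *buf, size_t len)
{
    o->buf = buf;
    o->len = len;
    o->used = 0;
    if (len > 0) buf[0] = '\0';
}

void out_putc (out_t *o, char c)
{
    if (o->used + 1 < o->len) {
        o->buf[o->used++] = c;
        o->buf[o->used] = '\0';
    }
}

void out_puts (out_t *o, const char *s)
{
    while (*s != '\0') out_putc (o, *s++);
}

/* Walk the value, printing a suitable form for each node. */
void val_generate (const val_t *v, out_t *o)
{
    char num[32];

    switch (v->type) {
    case VAL_NULL:
        out_puts (o, "null");
        break;
    case VAL_BOOLEAN:
        out_puts (o, v->u.boolean ? "true" : "false");
        break;
    case VAL_NUMBER:
        snprintf (num, sizeof num, "%g", v->u.number);
        out_puts (o, num);
        break;
    case VAL_STRING:
        out_putc (o, '"');
        for (const char *s = v->u.string; *s != '\0'; s++) {
            if (strchr ("\"\\", *s) != NULL) out_putc (o, '\\');
            out_putc (o, *s);
        }
        out_putc (o, '"');
        break;
    case VAL_ARRAY:
        out_putc (o, '[');
        for (size_t i = 0; i < v->u.array.len; i++) {
            if (i > 0) out_puts (o, ", ");
            val_generate (&v->u.array.elems[i], o);
        }
        out_putc (o, ']');
        break;
    }
}
```

Objects would follow the same pattern as arrays, printing a member name and a colon before recursing into each value.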

JSON5 Module

Ostensibly, then, we have two JSON5 parser interfaces: a file descriptor, something we can extract from file (and pipe) handles, and a C string.

We can quickly augment the latter by having a function to create a json5_unicode_string_t from an Idio string and separate out the parse (json5_unicode_string_t *) from the parse (char *) interfaces and, hey presto, we have the ability to parse Idio strings as JSON5.

That’s not quite enough on two counts:

  1. the JSON5 API has left us with a json5_value_t – albeit one that has fairly direct associations with Idio values

    It’s easy enough, of course, to walk over the json5_value_t creating the corresponding Idio values as we go.

  2. there’s no generator

This time, the generator is slightly more complex. At no time has Idio seen a “native” JSON5 value. JSON5 is a data interchange format and the UTF-8 byte stream has been reified into an Idio value.

The json5_value_t was an intermediate form. As it happens, that’s all the standalone code needs but it is of no use in Idio as Idio wants Idio values.

Clearly, what we want is for an Idio value to be serialised as a UTF-8 stream. Idio values can be quite rich – a closure can be the key or value of a hash table, say – so the generator needs to be slightly leery about what is valid JSON5 as it walks over the Idio value.

In fact, we can be a little bit more generous and offer a JSON (rather than JSON5) generator as well which limits the set of valid values further.
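One concrete place where the two formats differ is numbers: JSON5 permits Infinity, -Infinity and NaN whereas JSON has no spelling for them. A sketch of a per-mode number formatter follows. The gen_mode_t and gen_number () names are invented for this example and say nothing about how the json5 module actually structures its generators.

```c
#include <math.h>
#include <stdio.h>
#include <string.h>

typedef enum { GEN_JSON, GEN_JSON5 } gen_mode_t;

/*
 * Format d into buf for the given mode.  Returns 0 on success, -1 if
 * d has no representation in that mode or buf is too small.
 */
int gen_number (double d, gen_mode_t mode, char *buf, size_t buflen)
{
    const char *literal = NULL;

    if (isnan (d)) {
        literal = "NaN";
    } else if (isinf (d)) {
        literal = (d > 0) ? "Infinity" : "-Infinity";
    }

    if (literal != NULL) {
        /* JSON has no way to write these; JSON5 does */
        if (mode == GEN_JSON || strlen (literal) >= buflen) {
            return -1;
        }
        strcpy (buf, literal);
        return 0;
    }

    /* 17 significant digits round-trips an IEEE 754 double */
    int n = snprintf (buf, buflen, "%.17g", d);

    return (n < 0 || (size_t) n >= buflen) ? -1 : 0;
}
```

A caller walking an Idio value would treat the -1 as its cue to raise a condition rather than emit something the other end cannot parse.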

Errors

The final missing part is error handling. In the standalone code, the json5_error_printf () function prints the message and calls exit (1) which suffices for its use case.

In our case we want to replace that behaviour with raising a condition but that raises a thorny problem of its own. After lots of iterations I finally had the C code invoke the Idio function condition-report to normalize the way conditions are, um, reported.

That’s great except this JSON5 module wants to create new conditions so we need some way of hooking these new conditions in.

On the C side, the code looks no different to any other module. Or rather, it would look no different except that all of the core conditions are bundled into src/condition.[ch]. That’s purely an administrative choice: the definitions of the conditions could have been scattered about the code-base like everything else.

So, all we need do is declare the condition and then call the definition macro:

IDIO_DEFINE_CONDITION0 (idio_condition_rt_json5_error_type, "^rt-json5-error", idio_condition_runtime_error_type);

where it is just another descendant of ^runtime-error.

This is where automatically loading the adjacent json5.idio is useful as we can invoke the define-condition-type-accessors-only calls there.

We still don’t have anything for condition-report, though. For that we’ve had to augment it with a helper function, condition-report-extend, which records a callback function to be called when a particular condition type is being reported.

As condition-report uses a couple of private functions to construct its message, those need to be passed to the callback as arguments along with the original condition.

In the end, nothing too traumatic.

Last built at 2024-12-21T07:11:01Z+0000 from 463152b (dev)