.. include:: ../../global.rst ********* Functions ********* We've two types of functions, *primitives* and *closures*, which are both *abstractions* to encapsulate parameterised behaviour. *Primitives* are :lname:`C` code which have several broad categorisations: #. they are the only things that can manipulate :lname:`C` memory Which sounds a bit obvious but at some point it'll get lost. #. they bootstrap the user-level environment Quite a lot of the bootstrap code is then wrappered with more exotic forms of the same thing. This is partly because doing anything complicated in :lname:`C` gets a bit awkward. #. they are quick Again, it sounds a bit obvious but after writing your glorious :lname:`Idio` function you might just have to bite the bullet and write a :lname:`C` version to get any performance. #. they are our interface to another world Whether that is system calls or inter-language calls. *Closures* are :lname:`Idio` code. The VM has to be able to cope with both types of functions as invocable things but there is an interesting reality-warping difference: *primitives* are instantaneous (mostly). So far as the VM is concerned, the moment a *primitive* is called the answer is back in its hands. .. _primitives: Primitives ========== Primitives are far harder to conceptualise and implement. So we'll start with them. VM Usage -------- Rather than working from the bottom up let's ask the question, what does the VM need? Well, it'll want to know this is a primitive, we'll need a pointer to the :lname:`C` function, we'll need to know how many formal parameters it is expecting and whether it is expecting a variable number of parameters, over and above the formal parameters. And we can probably chuck in a :lname:`C` string for a name: .. code-block:: c :caption: gc.h typedef struct idio_primitive_s { struct idio_s *grey; struct idio_s *(*f) (); /* don't declare args */ char *name; uint8_t arity; char varargs; } idio_primitive_t; #define IDIO_PRIMITIVE_GREY(P) ((P)->u.primitive->grey) #define IDIO_PRIMITIVE_F(P) ((P)->u.primitive->f) #define IDIO_PRIMITIVE_NAME(P) ((P)->u.primitive->name) #define IDIO_PRIMITIVE_ARITY(P) ((P)->u.primitive->arity) #define IDIO_PRIMITIVE_VARARGS(P) ((P)->u.primitive->varargs) As the comment notes, we don't declare any arguments for the :lname:`C` function pointer as that will vary from primitive to primitive. I've limited the number of formal parameters, ``arity``, to 255. That *should* be OK for most people. In practice, you need to edit ``idio_vm_invoke()`` in :file:`src/vm.c` to have more than five formal parameters. ``varargs`` is a boolean and could, perhaps, should, be a type-specific flag. *\*shrugs\** I've skipped how one of these is created. That's a complication we can come back to. .. rst-class:: center \* The actual function *call* needs to gather together the arguments to be able to say, broadly, for a binary function: .. parsed-literal:: :name: ``idio_vm_invoke()`` in :file:`src/vm.c` result = (IDIO_PRIMITIVE_F (prim)) (*arg1*, *arg2*, *args*); The arguments will have been marshalled up by the VM ready for a generic function call, actually, marshalled up for a *closure* call. For a *primitive* we have to un-marshall them with :samp:`{args}` being the remaining arguments beyond the formal ones. :socrates:`Wait! What happened with varargs?` Ah, dirty secret time, *all* functions, *closures* and *primitives*, are *always* called with varargs arguments. The ``varargs`` field is used to syntactically verify the call point, *not* the invocation. *Bah!* Even that's not true. Fixed argument primitives are encoded such that they can be called directly by the VM with just their fixed argument, er, arguments. Varargs primitives go through the full function call interface (marshalled arguments and all). :lname:`C` Construction ----------------------- This is a good deal more involved. We have an idea of what the VM wants, now we need to construct a primitive. Obviously we need to write the actual code (*duh!*) but we simultaneously need to record the primitive's *pointer*, *arity*, *varargs* and *name* -- and that's just for the VM. We might also want to record meta-information like the primitive's *signature* and *documentation string*. I like the idea of some sort of `Type Inference `_ in which case it is a requirement that the primitive functions fully declare their parameter and return types. This is to bootstrap the whole type inference system. More *stuff* to record! That sort of data, describing the primitive, looks like: .. code-block:: c :caption: gc.h typedef struct idio_primitive_desc_s { struct idio_s *(*f) (); char *name; uint8_t arity; char varargs; char *sigstr; char *docstr; } idio_primitive_desc_t; To make that recording happen I've cobbled together some increasingly complex :lname:`C` macros which require the developer to be on song. When a primitive function is written we won't use a stock :lname:`C` function signature, instead we'll use a macro. So, rather than: .. code-block:: c IDIO idio_pair_primitive (IDIO h, IDIO t) { return idio_pair (h, t); /* the actual constructor */ } we use (in its simplest form): .. code-block:: c IDIO_DEFINE_PRIMITIVE2 ("pair", pair, (IDIO h, IDIO t)) { return idio_pair (h, t); } Let's break this down: * ``IDIO_DEFINE_PRIMITIVE2``, from :file:`src/idio.h`, says I'm defining a primitive that takes 2 formal parameters. ``IDIO_DEFINE_PRIMITIVE2V`` says two formal parameters and varargs. (You can guess the rest of the variations. Hopefully.) * ``"pair"`` is the :lname:`C` string name which will become the function's :lname:`Idio` name In this case, I can expect to find an :lname:`Idio` function called ``pair``: .. code-block:: idio-console Idio> pair # * ``pair`` is the :lname:`C` function name snippet that will be prefixed with ``idio_defprimitive_`` Clearly, it should be unique amongst all other primitives which, if you keep it trivially aligned with the function's :lname:`Idio` name should be OK. Notice that this newly created :lname:`C` function, ``idio_defprimitive_pair`` does not now clash with the pair constructor, ``idio_pair``. It's hardly in a separate namespace but so long as developers don't call their regular :lname:`C` functions :samp:`idio_defprimitive_{name}` then we're pretty safe. * ``(IDIO h, IDIO t)`` is the formal parameter list to the :lname:`C` function. :samp:`IDIO_DEFINE_PRIMITIVE{x}` do two key things: #. create a ``static`` primitive function description structure for use later #. prefix the actual function code with an appropriate header In combination, that means: .. code-block:: c IDIO_DEFINE_PRIMITIVE2 ("foo-idio", foo_C, (T1 a1, T2 a2)) { ... } will expand to: .. code-block:: c IDIO idio_defprimitive_foo_C (T1 a1, T2 a2); static struct idio_primitive_desc_s idio_primitive_data_foo_C = { idio_defprimitive_foo_C, "foo-idio", 2, 0, "", "" }; IDIO idio_defprimitive_foo_C (T1 a1, T2 a2) { ... } which looks pretty promising. The next variants have a ``_DS`` suffix meaning they expect a *signature string* and a *documentation string*. The two strings are used when constructing user-friendly information for help and debugging. The signature string for a closure is, in essence, the formal parameter list and so it should be similar for a primitive. This is straightforward for non-varargs functions, like ``pair`` where the signature string can be just ``h t``. For varargs functions, the interpretation varies considerably. For some functions the varargs might be "more of the same" but for others it might be optional parameters, like with ``display`` which takes a formal parameter, the value to be displayed, and an optional argument, encoded as varargs, for the *handle* to display to. If no handle is passed, it defaults to the *current output handle*. I'd lean towards a signature string of ``v [handle]`` but this ability to be more specific about the meaning of varargs parameters is inconsistent with closures which can only report ``v & args`` from the formal parameters. *Dunno.* The documentation string is for use with ``help`` where it will get printed out. I have this fanciful idea that the documentation string might be processed by, say, :program:`rst2txt` (ReStructuredText to text) and so I've encoded most documentation strings with that in mind. What I haven't done is made any progress in doing the processing. The actual ``pair`` PRIMITIVE looks like: .. code-block:: c IDIO_DEFINE_PRIMITIVE2_DS ("pair", pair, (IDIO h, IDIO t), "h t", "\ create a `pair` from `h` and `t` \n\ ") { IDIO_ASSERT (h); IDIO_ASSERT (t); return idio_pair (h, t); } Note the assertions of the parameters being passed in. The only difference is that we've filled in the ``sigstr`` and ``docstr`` fields. You can immediately guess that the non-``_DS`` variants simply call the ``_DS`` variants with ``""`` for the signature and documentation strings. .. rst-class:: center \* So far, then, we've organised to create a static instance of a ``struct idio_primitive_desc_s`` and defined a :lname:`C` function. We need to do something with it otherwise the pair of them will just hang about in the executable doing nothing useful as no-one knows to call them. That requires an explicit use the of primitive's description structure although we don't need to know anything about that directly, we just need to know the :lname:`C` function name snippet and we can work out the rest from there. You may recall all of the :file:`src/*.c` files relating to types have a similar bootstrap structure including: .. code-block:: c void idio_pair_add_primitives () { } into which we need to "add" our primitive: .. code-block:: c void idio_pair_add_primitives () { IDIO_ADD_PRIMITIVE (pair); } :samp:`IDIO_ADD_PRIMITIVE(pair)` is another :lname:`C` macro which expands into: .. code-block:: c void idio_pair_add_primitives () { idio_add_primitive (&idio_primitive_data_pair, idio_vm_constants, __FILE__, __LINE__); } and ``idio_add_primitive()`` does the dirty business of adding a new symbol (derived from the :lname:`C` string :samp:`{name}`) and have it reference the ``idio_primitive_t`` (itself constructed from the ``struct idio_primitive_desc_s``). In principle, this arrangement allows us to construct a primitive function "manually" as separate from the above sequence of "automated" construction. That said, I have yet to create a primitive manually so this dance through a static object feels like a bit of a waste. That's it. For primitives, then, we have a two-step shimmy, a :lname:`C` macro at the start of the function's implementation and another macro to add it into the :lname:`Idio` engine. .. rst-class:: center --- There are, of course, a few variations on the theme. Normally, primitives are added to the ``Idio`` module and implicitly exported from that module. However, we might want to add a primitive to another module. :samp:`libc` is a module wrappering a number of :lname:`libc` system calls where the result is useful as an :lname:`Idio` value (and thus requires some data representation transmutation). :lname:`libc`'s systems call names frequently clash with "normal" :lname:`Idio` usage -- ``read`` is our canonical example. So, here, we want to add the primitive to a different module (:samp:`libc` rather than :samp:`Idio`). Separately, we might want to export that primitive from the module. Indeed, all of the primitives in :samp:`libc` are exported. By exporting the name we can call it "long-hand" from elsewhere: .. code-block:: idio libc/getrlimit libc/RLIMIT_NOFILE If ``getrlimit`` (and ``RLIMIT_NOFILE``) was not exported from :samp:`libc` then the above call would fail. For what it's worth, :manpage:`getrlimit(2)` will update a :lname:`C` ``struct rlimit``. For us, the ``getrlimit`` primitive has then transcribed the results into an :lname:`Idio` ``struct-rlimit`` with a couple of familiar looking fields: .. code-block:: idio-console # Had we stored the result in a variable and quizzed the (simplistic) help system it would have told us a bit more about the structure instance: .. code-block:: idio-console Idio> x := libc/getrlimit libc/RLIMIT_NOFILE # Idio> help x struct-instance: x struct-type: struct-rlimit fields (rlim_cur rlim_max) Closures ======== Closures are a bit more tricky to describe without knowing how the VM works. But let's give it a go. In essence, a closure needs two things: * some code (*duh!*) * an *environment* within which that code is run -- from which we access the *free variables* (non-lexical variables) in the function In our case we have compiled our code for a *byte compiler* which means everything is a bit more like assembly language programming and the *code* becomes a *program counter* (PC) such that we jump to that *PC* and keep processing. The *environment*, here, is the combination of the lexical context at the point of creation and the module the closure was created in. Remember that many closures are created with the intent of managing private variables. It is quite unlike, say, :lname:`C`. Documentation ------------- Like `Emacs Lisp `_ we should allow the user to write a documentation string for their function which should be the third argument of four: :samp:`function {formals} {docstr} {body}` or :samp:`define ({name} {formals}) {docstr} {body}`. If there are only three arguments then the :samp:`{body}` takes precedence -- if all you want is for your function to return a string then that seems legitimate. The use of :samp:`{docstr}` will need to adhere to our "single line" reader processing -- which is partly why strings are allowed to be multi-line. So we'll have something like: .. code-block:: idio :caption: common.idio define (atom? x) "predicate to test if object is an atom :param x: object to test" { not (pair? x) } Notice how the multi-line documentation string continues the "single line" from the end of the declaration through to the start of the body form. At the moment the documentation string is not processed though I would expect some ReStructuredText-style formatting to handle parameter definitions and cross-references. Implementation -------------- Like primitives there is a two-step process to manipulating closures. We need to create one and then use it. The data we need for a closure looks like: .. code-block:: c :caption: gc.h typedef struct idio_closure_s { struct idio_s *grey; size_t code_pc; size_t code_len; struct idio_s *frame; struct idio_s *env; } idio_closure_t; #define IDIO_CLOSURE_GREY(C) ((C)->u.closure->grey) #define IDIO_CLOSURE_CODE_PC(C) ((C)->u.closure->code_pc) #define IDIO_CLOSURE_CODE_LEN(C) ((C)->u.closure->code_len) #define IDIO_CLOSURE_FRAME(C) ((C)->u.closure->frame) #define IDIO_CLOSURE_ENV(C) ((C)->u.closure->env) Let's take a look: * ``code_pc`` is the *program counter* for this closure It is a ``size_t`` because it is in index into the byte array of compiled code. * ``code_len`` is, not surprisingly, the length of the byte compiled code for this closure It's not really used in anger but turns out to be quite handy if you want to disassemble the code for just one closure (otherwise the disassembler doesn't know when to stop). * ``frame`` is a reference to the lexical context * ``env`` is a reference to the module the closure was created in The *frame* and *env* require a little more explanation. When we *run* a closure, in our mind's eye we look at the source code and can see two things: #. the lexical environment -- which the closure body may or may not use #. the arguments passed into the function by the calling code Let's try an about-as-complicated-as-it-gets example. Here in module ``foo`` we create a function ``f`` which uses both "top level" variables from module ``foo`` as well as lexical variables defined outside of the function as well as the parameters passed in: .. code-block:: idio :caption: file :file:`foo.idio` module foo export (f) ; top level variable in *foo* this := 1 f = { ; lexical variable only visible inside this block that := 2 function (the-other) { + this that the-other } } and .. code-block:: idio :caption: file :file:`bar.idio` module bar import foo n := 3 ; call the "remote" function f n We have a reasonable feeling that the result of calling ``f 3`` in module ``bar`` should be 6. But wait! These two sets of information are disjoint. We have a set of lexical information which we can see by looking at the source that is defined in :file:`foo.idio` and therefore the closure must have access to it and *at the same time* we have this set of parameters being passed in from who knows where (:file:`bar.idio`). How does that external information get linked in with our local lexical information? Well, that's the nature of a linked list of frames of parameters. At the time of creation of the closure we stash the current linked list of frames -- which will be all the parameters leading up to the definition of the closure in the source code -- *in* the closure structure, the ``frame`` field. The protocol is that when someone is due to invoke a closure they will construct a frame and fill it with the evaluated (or unevaluated for templates!) values and leave it in the *val* register. The act of invocation does a sequence of *frame*\ y things: #. it stashes the current frame (of the caller) on the stack #. it replaces the current frame with the stored frame of the closure, the original lexical environment of the closure #. it calls the closure: #. the very first thing the closure does is verify the arity is correct by checking the number of args in the frame in *val* #. the next thing it does is link the frame in *val* into the current frame (for the function defined in :file:`foo.idio`) which we just swapped with the original lexical environment frame (from the call point in :file:`bar.idio`) At this point, the frame tree looks like the values from the caller in the top-most frame and then everything the closure saw at the point of its definition: .. code-block:: text frame 0: [ 3 ] ; from the call in bar frame 1: [ {that} ] ; from the block in foo frame 2: #n ; no other frames! {this} is in *env* #. at some point the closure has computed a value and left it in *val* and ``RETURN``\ s #. the calling code's frame is restored from the stack At this point we are exactly as before the call to the closure except we have had the frame of evaluated arguments in *val* replaced with the computed result of the closure. Having said all that about the *frame*, we can repeat it in kind for *env*, the "top-level" environment that the closure was defined in: #. stash the caller's top-level *env* on the stack #. it replaces the current env with the stored env of the closure, the original lexical top-level of the closure #. it calls the closure Now we can find :samp:`{this}` in the environment of the function ``f`` defined in module ``foo``. In fact the (stored) environment is just the name of the environment at the time of definition, ie. ``foo`` in this case. That variables defined in this environment are an attribute of the module. So, even though ``f`` was called in ``bar``, because ``f`` was defined in ``foo`` the current environment is toggled to ``foo`` and we can (successfully) find :samp:`{this}` in the known variables of ``foo`` -- even though the call is "live" in ``bar``. #. the calling code's *env* is restored from the stack .. _`function creation`: Creation ^^^^^^^^ Creating a closure is a little bit of art. In the first instance, we don't have to but we will encode the *creation* of the closure as well as the code for the *implementation* of the closure in the output byte code. Subsequently, we have a generated code stream that means if we could store the byte code out as a loadable module then we have the creation code embedded in it. When we hit a function abstraction: .. code-block:: idio function (a b) { a + b } we have two things: * a function prototype: this example takes two formal parameters and no varargs * the function implementation or body In addition we have the lexical context (we *should* have the lexical context, *we're* doing the evaluation!) and the current module. In combination we have everything we require to satisfy the need to create a closure. Remember, though, with a function declaration, we're only looking to define the function, not run it, so: .. code-block:: idio x := 1 define (y a b) { a + b } z := x + 1 is the pseudo-code: .. parsed-literal:: define :samp:`{x}` as the result of evaluating :samp:`1` define :samp:`{y}` as the result of evaluation the function definition :samp:`function (a b) \\{ a + b }` into a closure value define :samp:`{z}` as the result of evaluating :samp:`{x} + 1` In other words, we're not *running* the function during this process, just establishing it as something that can be run. A *closure value* is, from the data structure above, a program counter, a code length, a future frame (of arguments) and a pointer to an environment. The process is: #. we'll create the code for the body in a temporary array From that we can calculate the ``code_len``. Actually, the body: * is prefaced with an arity check and something to link the argument frames * has a ``RETURN`` instruction tacked on the end but that's by the by. #. in the normal byte code array we'll add the ``CREATE-CLOSURE`` instruction which requires an *offset* to the ``code_pc`` (which will be after the upcoming ``JUMP`` instruction) plus some standard fields: * the code length * a reference to the signature string * a reference to the document string All of which we will need to be able to construct the ``struct idio_closure_s``. #. since we are only *creating* the closure, we don't want to run it right now, we'll add a ``JUMP`` instruction followed by the code length (again) so that we (during creation) jump over the function body. This means that when this creation code is *run*, we'll see: a. ``CREATE-CLOSURE`` with a ``code_pc`` (and its three arguments: code length, signature string and documentation string) then a #. ``JUMP`` to *beyond the end of the body* #. the definition of :samp:`{z}` in the example, above. The ``CREATE-CLOSURE`` has been given the *offset* to the ``code_pc`` and can create the closure value. *Neat!* However we do it, we want to jump over the function body so as not to run it. #. We can now copy the function body from the temporary array The first byte of which (now in the normal byte code array) is ``code_pc``. But wait, this is now an *unknown* number of bytes beyond the ``CREATE-CLOSURE`` instruction because we're not sure how many bytes it took to encode the ``JUMP`` instruction. How do we determine what *offset* should have been? Of course, the answer is that we're *expecting* this to happen, so made sure to look at the code length and figure out that either: a. it is a short jump in which case the jump length is encoded in a byte and the overall *offset* is two bytes (including one for the jump instruction) #. it is a long jump in which case we encode the jump length in (yet another) temporary byte code array and can say that the *offset* is one plus the length of this temporary array -- plus one because the jump instruction takes one byte! The code generator can now add the correct *offset* to the ``code_pc`` after the ``CREATE-CLOSURE`` instruction. Another way of looking at it is this abstraction of the generated byte code: .. code-block:: text :linenos: ... CREATE-CLOSURE (length of #3) sigstr docstr JUMP to #5 function body (including a trailing RETURN) ... When we come to *run* the closure, the closure value has had ``code_pc`` set to ``#4`` and will stop processing before it hits ``#5`` -- the definition of :samp:`{z}` -- because it has a ``RETURN`` statement stamped on the end. Local Assignments ================= We saw in :ref:`scheme-let` how ``let`` statements can be transformed into closed function calls. *Woo!* Great *Art* is being made. Except it's pretty inefficient. Every function call requires some saving and restoration of the current state, the creation of and assignment to a frame and the closed function call itself. For *every* "local" variable. It doesn't look pretty. So I've created a Bit of a Hackā„¢. The goal is to say that, where we are creating a "local" variable in the context of a function body, then do a bit of shuffling so that the variable is treated like an extra, unseen parameter to the function: .. code-block:: idio define (foo a) { b := 1 a + 1 } will have had created a *frame* for ``a`` and space for the not-necessarily-used varargs -- which we'll call :samp:`{v}`: ``#[ *a* *v* ]``. We know that access to ``a`` is made with efficient ``SHALLOW-ARGUMENT-REF0``-style calls in the VM. When we see ``b`` being defined in the context of a function body then: * extend the list of names in the *nametree* for this level with ``b`` -- we'll need to insert a dummy name, ``#f`` for the varargs argument otherwise we'll get an off-by-one error. * we'll extend the *frame* with a slot for ``b``, ``#[ *a* *v* *b* ]`` * and set that slot with a ``SHALLOW-ARGUMENT-SET2`` and then carry on with further references to ``b`` now being to the otherwise unknown parameter ``b`` in slot #2 of the frame. As we've updated the *nametree* then any enclosed blocks which make reference to ``b`` will successfully find it and reference with :samp:`DEEP-ARGUMENT-REF {i} 2` as appropriate. Top-Level Block --------------- In the case of a top-level block such as: .. code-block:: idio { a := 1 b := 2 a + b } there is no enclosing function body for the definition of ``a`` so we have to revert to the previously described :ref:`scheme-let` mechanism. The defintion of ``b``, however, is now inside a function body, created by the implied ``let`` and so ``b`` can be handled as a local assignment. Operations ========== :samp:`function? {value}` is :samp:`{value}` a function, ie. a primitive or a closure .. include:: ../../commit.rst