.. include:: ../global.rst
******************
:lname:`Idio` Look
******************
:lname:`Scheme` looks like it can be used to program computers,
indeed, it looks like it can do far more in terms of programming
computers than we, simple scripters, are used to or, more likely,
capable of! It is very succinct with several key implementation
details:
* closures, we can close over variables and use them in a function
body
* anonymous functions, we can create functions on the fly
* functions are first class, we can construct and return them for
others to call
* compound data types
* error handling
* macros, at least syntax transforming macros
* continuations, as a means of deriving exceptional behaviour they are
required though whether we would want the full power of
continuations available is debatable. If we don't then users cannot
create their own escape procedures -- remembering that we're meant
to be writing a shell!
Variables might be introduced differently but are otherwise the same.
Data structures are broadly the same with a pair simply being the guts
of a linked list.
Calling functions is essentially no different, function followed by
arguments.
But, and I suppose it is the never ending bugbear of :lname:`Lisp`\ s,
it *looks a bit funny*. However, thanks to syntax transformations, it
can *look like* whatever we want it to, so long as we can transform it
back into something the underlying :lname:`Scheme`-ish engine can
handle.
We should recognise that we're describing two things: firstly we need
the core functionality to make some behaviour happen, think about the
ability to create and manipulate file descriptors, and secondly, some
syntactic sugar to abstract away the repetitive detail.
So, how should the language *look*\ ?
Line-Oriented
=============
I'm quite content with my ALGOL-inspired syntax and with my shell hat
on, I want, nay, *insist* that I be able to type:
.. code-block:: idio
ls -l
With no (obvious) punctuation, just like the shell. Isn't that the
point?
Mind you, with our :lname:`Scheme` hats on, ``ls`` is *undefined* in
the program (probably). We know, with our shell hats on, that if
``ls`` is not a function or alias then the shell will take it upon
itself to have a rummage around the shell's :envvar:`PATH` to find an
:program:`ls` executable. From a :lname:`Scheme` perspective, though,
we need to break the behaviour that ``ls`` is undefined.
Similarly, for ``-l``. Is that a funky function to subtract the value
of ``l`` from its argument (like a ``+1`` function) or just a string
of characters to be passed as an argument to some command?
.. sidebox::
It's an error because there's no sensible translation of an
:lname:`Idio` function (value) to an argument in
:manpage:`execve(2)`.
``-l`` is OK because, even though we will have it in our hands as
an :lname:`Idio` *symbol* there is an obvious translation of a
symbol into a string of characters.
The inverse is also an issue. Typing ``make debug`` results in an
error when ``debug`` is the name of a function. Tricky.
I like the idea that I can, at long last, have hyphens in my variable
names saving us from the thrall of underscores or CamelCase or
whatever -- albeit with the cost that whitespace is *required* to
distinguish between terms. I can live with that, whitespace is
*cool*.
I like that other punctuation characters are available to add meaning
to my names, ``?`` and ``!``, and I have a cunning plan for accessing
structure fields with ``.`` in the Jinja_ way (my original inspiration
was :lname:`Perl`'s Template::Toolkit).
However, there is a far worse bind we put ourselves under if we allow
punctuation characters in names as it conflicts with the
meta-characters the shell uses to spot pathname patterns:
.. code-block:: sh
ls -l *foo*.txt
Is that:
* the variable ``*foo*.txt``?
* the variable ``*foo*``, a structure, where we want to access the
field ``txt``?
* the variable ``*foo*``, an array where we want to access the
``txt``\ :sup:`th` element (``txt`` itself being a variable)?
* a pattern match expression to be passed to :manpage:`glob(3)`?
Hmm.
What if none of the above are true? If ``*foo*`` (or ``*foo*.txt``)
is yet to be defined at the top level, say, then we have a *runtime*
decision to make which is not a good place to be if we are thinking
about any form of compilation.
I confess I'm at a bit of a loss here and I'm leaning in favour of a
consistent programming language over shell-ish conveniences.
For pathname matching I'm turning towards the idea that since we
intend that the result of a :manpage:`glob(3)` pattern match be a list
that we *preserve as a list* until it is required then perhaps we
should distinguish the pattern match that creates it. Does something
like:
.. code-block:: idio
ls -l #P{ *foo*.txt }
cause palpitations? In one sense it is a bit like preparing a regular
expression in :lname:`Python` or :lname:`Perl`.
It's not *Art*, I agree. However, there is some precedent with
Murex_ taking to the likes of:
.. code-block:: text
ls -l @{g *.go} # globbing
ls -l @{rx \.go$} # by regex
although I have this feeling that it would be better/more generic/more
in the style of *something?* if we indicated a filter to the globbing
mechanism through the environment in the same way we might use
:envvar:`IFS`.
There's a very similar problem with plain filenames that you might use
instinctively in the shell:
.. code-block:: idio
ls -l foo
As this is :lname:`Idio` and not the shell, ``foo`` is a candidate for
evaluation. It could be a variable, otherwise it is at risk of causing
an *unbound* error like ``ls`` might. Here we get a mixed bag: if
``foo`` is defined then we'll get the value of the variable ``foo``
(which could be anything, of course) otherwise the *symbol* ``foo``
will be passed through to the (external) command execution system and
the string of characters ``foo`` will be passed to :program:`ls`.
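
The mixed bag can be sketched in :lname:`Python`. This is only an
illustration of the rule described above, not the actual implementation:
``resolve_argument`` and the ``bindings`` dictionary are hypothetical
names, and treating a function value as a ``TypeError`` is an assumption
standing in for whatever error the real engine raises.

.. code-block:: python

    def resolve_argument(word, bindings):
        """Turn one word of a command line into a string for an external command."""
        if word in bindings:
            value = bindings[word]
            if callable(value):
                # No sensible translation of a function value into an argument string.
                raise TypeError(f"cannot pass function {word!r} to an external command")
            return str(value)
        # Unbound: treat the word as a symbol and pass its name through.
        return word

    # Hypothetical top level: foo is a variable, debug is a function.
    bindings = {"foo": "/tmp/report.txt", "debug": print}
    argv = [resolve_argument(w, bindings) for w in ["-l", "foo", "bar"]]
    print(argv)

Here ``-l`` and ``bar`` are unbound so pass through as their own names,
``foo`` is replaced by its value, and ``make debug`` would fail in the
way the sidebox complains about.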
As a :strike:`transient` feature most things that expect a filename --
recall we don't have a filename *type* internally -- will work with a
string (implying no globbing!):
.. code-block:: idio
ls -l > "foo"
open-input-file "foo"
This has led to a few thoughts that perhaps filenames should be a
subclass of strings -- an extra flag on the underlying string type,
maybe -- allowing for them to be handled appropriately.
There's a myriad of problems here, though, in that until we have
successfully :manpage:`stat(2)`'ed the file we don't know it does or
even *can* exist. Further complicating matters is that operations
such as ``open-output-file`` are allowed to *create* files therefore
we must allow for a filename that we don't know if it can exist.
Single Word Feature
-------------------
By and large command lines are a command and some arguments, ``ls -l``
etc. That's fine because it translates easily into a function
application, ``(ls -l)``, just like ``(+ 1 2)``.
However, there is an outstanding feature which is a side-effect of the
REPL. If your command consists of a single word, eg. ``ls``, then it
is indistinguishable from a variable reference in a standard
:lname:`Scheme`-ly REPL where any
single word will be evaluated and its value printed:
.. code-block:: idio-console
Idio> n := 10
Idio> n
10
Idio> map
#
giving us the value of ``n`` and some internal representation of the
closure, ``map``.
(Internal representations having no useful meaning possibly even to
developers. Indeed, I had to look at the source to remind myself what
those values represent as I only really look at the ``CLOS`` part
telling me it is a closure.)
Which is what you want. Well, it's what many people would expect from
an interactive :lname:`Lisp`\ y session.
Consequently, typing ``ls`` will get, um, ``ls`` printed back as it is
determined to be an undefined symbol where we choose, in an
un-:lname:`Scheme`-ly fashion, to simply print the symbol (rather than
a more :lname:`Scheme`-ly raise an *unbound* error).
If you want to force the single word command ``ls`` to be invoked then
you *must* put it in parentheses:
.. code-block:: idio-console
Idio> (ls)
My Documents
...
That's true of functions as well:
.. code-block:: idio
osh := (open-output-string)
It's quite annoying for me where I habitually type ``make`` at the
:lname:`Idio` prompt to have ``make`` printed back at me. Not
helpful. I've taken to typing ``make -k`` a lot more....
.. sidebox::
TBD
I don't *like* it. I'm not sure what a better behaviour might be.
Complex Commands
----------------
There's another awkwardness from the idea of a line-oriented shell for
complex functions, ones that have multiple clauses. Take, ``cond``,
for example, which is nominally:
.. code-block:: idio
(cond (c1 e1)
(c2 e2)
(else e3))
``cond`` by rights, should be invokable in the same way as ``ls``,
ie. without leading parenthesis but that would leave us with:
.. code-block:: idio
cond (c1 e1)
(c2 e2)
(else e3)
which our line-oriented engine is going to see as three distinct
statements -- albeit with the second two having exaggerated indents.
:lname:`Python` supports the idea of indented code -- indeed you can
see references to *indent* and *deindent* in the parser -- but it
doesn't feel like the indentation here is a syntactic thing, it's
really a visual *aide-mémoire*, after all, we could have written:
.. code-block:: idio
cond (c1 e1) (c2 e2) (else e3)
and be done.
Except the condition and expression clauses are almost certainly
complex and the resultant enormous line would be difficult to read let
alone maintain.
The original ``(cond ...)`` across multiple lines works because the
:lname:`Scheme`-ish engine is looking for the matching close-parens
(for the leading open-parens on the first line) and so will consume
all lines until it gets it.
For our "unwrapped" ``cond``, we can use a regular shell-ish line
continuation character:
.. code-block:: idio
cond (c1 e1) \
(c2 e2) \
(else e3)
But, be honest, it looks a bit clumsy. And I can say that with some
confidence as I had, out of a duty to see it through, written *all*
the complex multi-clause forms using this style. (What an *idiot*\ !
I sense growing sagacity of the language name...).
This gives us the dreadful:
.. code-block:: idio
if (some test) \
(truth-fully) \
(not so truthy)
I know, I know! (And it gets worse.)
Of course, you *can* continue to use the wrapping parentheses -- all
that the non-wrapped line is doing is having the wrapping parentheses
silently added -- but the result is like the `Curate's egg`_:
.. code-block:: idio
ls -l
(if (some test)
(truth-fully)
(not so truthy))
and, to be honest, I find it less appealing than the clumsy variant.
The line-continuation style has the decency to be consistent.
.. rst-class:: center
\*
Actually, when I came to write the :term:`metacircular evaluator` in
:file:`lib/evaluate.idio` I gave up and used the "wrapped" form.
Mostly because it was easier to convince my editor to line the
elements up to preserve my sanity.
It's OK, in the end.
Infix Operators
===============
There is the issue of infix operators, ``|`` and arithmetic operators
amongst a plethora of others.
I think there's a trick we can pull here following in the footsteps of
the reader macros for ``quote`` and friends.
Suppose we have a means to declare a symbol as an infix operator
together with some behavioural code. Then, after the reader has read
the whole line/parenthetical expression in, it goes back and looks to
see if any of the words are an infix operator. This is much like
macros where the evaluator goes and looks for macros and behaves
differently except we are running this before reaching the evaluator.
So, if I had typed:
.. code-block:: idio
zcat file | tar tf -
then the reader will have read in six words in a(n implied) list. It
can scan along, find that ``|`` is an infix operator and call its
behavioural code. I'll assume we're all happy that it wants to rework
this into:
.. code-block:: idio
(| (zcat file)
(tar tf -))
.. sidebox::
Always a good position.
which is a simple list transformation requiring no knowledge of
anything.
After this transformation we have a ``|`` in functional position and
so the evaluator will expect it to be the name of a function.
Had someone typed the second form in directly then the reader would
have left it alone as the thing that *looks* like an infix operator
can't be, because it's the first element in the list. An infix
operator (surely?) has to have something *before it* to be *in*\ fix.
Recall I suggested that this happens in the reader for parenthetical
expressions so that if you'd typed:
.. code-block:: idio
zcat file | (tar tf - | grep foo)
(I'm leaving ``foo`` in there as it fails my point about symbols and
expansion but is easier to read whilst we mull over the idea.)
Although we start reading ``zcat file ...``, the first *complete*
parenthetical expression read would be ``(tar tf - | grep foo)`` which
can be re-written as:
.. code-block:: idio
(| (tar tf -)
(grep foo))
to become part of the outer line-oriented expression when it is
eventually completely read in (by hitting the end of line):
.. code-block:: idio
zcat file | (| (tar tf -)
(grep foo))
This time, even though there's a ``|`` in the middle of the second
expression it isn't directly in the outer expression which looks, to
the reader, like:
.. parsed-literal::
*rhubarb* *rhubarb* | *rhubarb* *rhubarb*
.. aside::
I should be careful of referencing 1970s British TV comedy for fear
of attracting :ref-title:`The Phantom Raspberry Blower of Old
London Town`!
and can be transformed into:
.. code-block:: scheme
(| (zcat file)
(| (tar tf -)
(grep foo)))
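
As a rough sketch of that reader pass, here is the same list shuffling
in :lname:`Python`, with nested lists standing in for parenthetical
expressions. The names ``rewrite_infix`` and ``group`` are made up for
the illustration and it only copes with one operator per expression
(several in a row are a separate matter).

.. code-block:: python

    def group(side):
        """A single parenthesised expression stands for itself."""
        if len(side) == 1 and isinstance(side[0], list):
            return side[0]
        return side

    def rewrite_infix(words, op="|"):
        """Rewrite [a, b, op, c, d] as [op, [a, b], [c, d]]."""
        # Inner parenthetical expressions are complete first, so rewrite them first.
        words = [rewrite_infix(w, op) if isinstance(w, list) else w for w in words]
        # An operator in first position cannot be infix: leave the list alone.
        if op not in words[1:]:
            return words
        at = words.index(op, 1)
        return [op, group(words[:at]), group(words[at + 1:])]

    flat = rewrite_infix(["zcat", "file", "|", "tar", "tf", "-"])
    nested = rewrite_infix(
        ["zcat", "file", "|", ["tar", "tf", "-", "|", "grep", "foo"]])
    print(flat)
    print(nested)

The transformation needs no knowledge of what ``zcat`` or ``tar`` are,
and feeding it the already-prefixed form gives back the same list.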
Contrast that with multiple instances of the operator in the same
expression:
.. code-block:: idio
zcat ... | tar ... | grep ...
which we might transform into:
.. code-block:: idio
(| (zcat ...)
(tar ...)
(grep ...))
It's more subtle than that, though, as a pipeline (and the logical
operators ``and`` and ``or``) take multiple words as their arguments,
*including other operators*, yet arithmetic operators (and IO
redirection) take only a single argument either side.
:lname:`Scheme` might allow:
.. code-block:: scheme
(+ 1 2 3)
but
.. code-block:: idio
1 + 2 3
is incorrect in regular arithmetic. That means that the code for
operators needs to do some syntax checking. It's not great that
syntax checking is happening in the reader but, hey ho. Let's run
with it.
To complicate matters, ``+`` and ``-`` are commonly unary operators as
well as binary ones: ``- n`` should return negative ``n`` (remembering
that ``-n`` is a symbol!).
Operator Associativity
----------------------
The arithmetic operators have *associativity*, that is ``1 - 2 - 3`` is
equivalent to ``(1 - 2) - 3`` as ``-`` is left-associative. ``+`` is,
mathematically, associative (the grouping makes no difference) although
usually defined in programming languages as left-associative. Assignment is
right-associative -- evaluate the value first!
Pipelines are left associative, hence the triple pipeline example is
quite likely to be *executed* as:
.. code-block:: idio
   (| (| (zcat ...)
         (tar ...))
      (grep ...))
Even if its nominal form is all three children parented by the same
``|`` operator.
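
The difference between the nominal form and the left-associative form
can be sketched in :lname:`Python` as a split followed by a left fold.
The function names are hypothetical and this is only one way the
grouping might be arrived at.

.. code-block:: python

    from functools import reduce

    def split_on(words, op="|"):
        """Split a flat word list into the segments between each operator."""
        segments, current = [], []
        for word in words:
            if word == op:
                segments.append(current)
                current = []
            else:
                current.append(word)
        segments.append(current)
        return segments

    def nominal(words, op="|"):
        """All segments parented by a single operator."""
        return [op] + split_on(words, op)

    def left_assoc(words, op="|"):
        """Fold the segments from the left: ((a | b) | c)."""
        return reduce(lambda left, right: [op, left, right], split_on(words, op))

    words = ["zcat", "f", "|", "tar", "tf", "-", "|", "grep", "foo"]
    print(nominal(words))
    print(left_assoc(words))
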
Operator Precedence
-------------------
.. sidebox::
*Careful!* You might have internalised :lname:`C`'s operator
precedence rules as The Truth!
We'll use the same rules but only for familiarity reasons.
There's also *precedence*: ``(1 + 2 * 3)`` could be ``((1 + 2) * 3)``
or ``(1 + (2 * 3))`` depending on which operator was run first.
Logical operators, pipelines, arithmetic and IO
redirection are all ordered by precedence:
.. code-block:: bash
tar *.txt 2>/dev/null | gzip > foo || echo whoops
should be interpreted as first ``pipeline || pipeline``, then ``cmd+io
| cmd+io`` then, finally, arrange ``io`` followed by executing
``cmd``. Hence we might derive:
.. code-block:: scheme
(or (| (io-> 2 /dev/null
tar *.txt)
(io-> 1 foo
gzip))
(echo whoops))
using some putative ``(io-> fd file cmd . args)`` function to handle
IO redirection (which doesn't handle multiple redirections so doesn't
exist).
Notice no :manpage:`execve(2)` function has been introduced as, at
this stage, we don't know if ``tar`` and ``gzip`` are internal
functions or external commands. All we're doing is rewriting the
statements involving infix operators.
The Reader and Infix Operators
------------------------------
Such transforms will also mean that:
.. code-block:: idio
echo 1 + 2 3 * 4
will be re-written as:
.. code-block:: idio
(echo (+ 1 2) (* 3 4))
resulting in:
.. code-block:: idio-console
3 12
but the original form was hard for the human mind to scan -- the
pedantic grouping of sub-expressions of :lname:`Scheme` would have
forced us to write:
.. code-block:: idio
echo (1 + 2) (3 * 4)
which is, at least, clearer in intent!
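
For the curious, the single-word-either-side behaviour might be
sketched in :lname:`Python` like so. The ``PRECEDENCE`` table is an
assumption for the example (multiplication before addition, as in
:lname:`C`), ``rewrite_binary`` is a made-up name and unary operators
are ignored entirely.

.. code-block:: python

    # Assumed precedence table: tightest-binding operators first.
    PRECEDENCE = [("*", "/"), ("+", "-")]

    def rewrite_binary(words):
        """Replace each "left op right" triple with the prefix list [op, left, right]."""
        words = list(words)
        for tier in PRECEDENCE:
            i = 1
            while i < len(words) - 1:
                if words[i] in tier:
                    # Collapse three words into one; staying at i makes
                    # chains such as 1 - 2 - 3 group to the left.
                    words[i - 1:i + 2] = [[words[i], words[i - 1], words[i + 1]]]
                else:
                    i += 1
        return words

    echoed = rewrite_binary(["echo", "1", "+", "2", "3", "*", "4"])
    chained = rewrite_binary(["1", "-", "2", "-", "3"])
    mixed = rewrite_binary(["1", "+", "2", "*", "3"])
    print(echoed)

Each operator only ever consumes the one word on either side of it,
which is why ``echo`` ends up with two arguments rather than being
swallowed by the ``+``.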
It's not uncommon to have multiple clauses in a logical statement,
drawn out over multiple lines for clarity:
.. code-block:: idio
if (this and
that and
the-other) \
...
So I feel that if the last word on a line is an infix operator then
the expression is assumed to continue on the next line.
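
A minimal :lname:`Python` sketch of that rule, assuming a hypothetical
``INFIX_OPERATORS`` set and whitespace-separated words. Note that ``-``
is deliberately left out of the set in this sketch, otherwise
``tar tf -`` would look like an unfinished subtraction.

.. code-block:: python

    # Assumed set of operators that may end a line and continue the expression.
    INFIX_OPERATORS = {"and", "or", "|"}

    def join_continued(lines):
        """Merge physical lines into logical lines of words."""
        logical, pending = [], []
        for line in lines:
            words = line.split()
            pending.extend(words)
            if words and words[-1] in INFIX_OPERATORS:
                continue              # the expression carries on to the next line
            if pending:
                logical.append(pending)
            pending = []
        if pending:
            logical.append(pending)   # dangling operator at end of input
        return logical

    joined = join_continued(["this and", "that and", "the-other", "ls -l"])
    print(joined)
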
That said, I've gotten quite used to writing the more
:lname:`Scheme`-ly:
.. code-block:: idio
if (and this
that
the-other) \
...
but the trailing operator trick still stands.
Operator Overloading
--------------------
"Operator overloading" is a fan favourite in other languages -- which
is another way of saying, heavily controversial.
Think ``+`` is just about adding integers together? It's common
enough to appear as string concatenation and so I guess people would
be happy enough to see it used for any kind of append operation
(lists, arrays, hashes(?)).
Our infix operators are blind to your types, though. They simply
massage lists into another form. You'll be wanting function
overloading which means :ref:`generic functions`.
Operator Summary
----------------
I think this reader macro trick has some mileage.
Lexical Blocks
==============
If you're a :lname:`Scheme`\ r, you're quite used to:
.. code-block:: scheme
(let ((a 1))
(+ a 1))
whereas others would be more at home with a more :lname:`ALGOL`-ish:
.. code-block:: idio
{
a = 1
a + 1
}
where ``{`` starts a lexical block in which we can introduce lexically
scoped variables. They would likely all be introduced as a ``let*``
or ``letrec`` type as that's the sort of behaviour non-\
:lname:`Scheme`\ rs would expect where we can have one variable
derived from another:
.. code-block:: idio
{
a = 1
b = a + 1
odd? = function ... even? ...
even? = function ... odd? ...
}
Additionally, the lexical block has an implied ``begin`` meaning the
last calculated value is the one to be returned by the block.
The reader is intent on matching bracket-type things so will read
multiple lines -- the lexical block's body -- to get the closing
``}``.
They sound quite handy for :ref:`functions`.
Assignment
==========
I like the idea of ``define`` to force the user to declare names --
and we'll see the (pedantic :lname:`Scheme`\ ly) reasons why later --
however I think we :lname:`ALGOL`-types generally prefer the ``=``
style.
I was thinking of some way to avoid the ``=``/``==`` mistakes
prevalent in :lname:`C`:
.. code-block:: c
if (a = 1) ...
is probably not what you want, and started off thinking that a
:lname:`Pascal`-style ``:=`` would come in handy. We can also make it
an infix operator:
[confident narrator voice] *...we have the technology...*
.. code-block:: idio
a := 1
...
which would be transformed by the reader into ``(:= a 1)`` and then
the evaluator can introduce the variable ``a`` giving it the value 1.
Technically, in the underlying :lname:`Scheme`-ish engine that's going
to be a ``define`` or a ``let``. In a lexical block, for example:
.. code-block:: idio
{
a := 1
...
a
}
would get transformed into:
.. code-block:: idio
(let ((a 1))
(begin
...
a))
in other words, after any ``:=`` statement, the entire rest of the
lexical block becomes the body of the implied ``let`` such that the
``let*``-ish:
.. code-block:: idio
a := 1
b := a + 2
...
b
would get transformed into:
.. code-block:: idio
(let ((a 1))
(let ((b (+ a 2)))
(begin
...
b)))
which, I think, works OK.
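
The "rest of the block becomes the body" rule is a small recursion. A
:lname:`Python` sketch, assuming each statement has already been through
the operator pass so that a definition arrives as the three-element
list ``[name, ":=", value]``; ``block_to_let`` is a hypothetical name.

.. code-block:: python

    def block_to_let(statements):
        """Rewrite the statements of a lexical block into nested let forms."""
        for i, stmt in enumerate(statements):
            if isinstance(stmt, list) and len(stmt) == 3 and stmt[1] == ":=":
                name, _, value = stmt
                # Everything after the definition becomes the body of the let.
                let = ["let", [[name, value]], block_to_let(statements[i + 1:])]
                before = statements[:i]
                return ["begin", *before, let] if before else let
        return ["begin", *statements]

    block = [["a", ":=", "1"], ["b", ":=", ["+", "a", "2"]], "b"]
    nested = block_to_let(block)
    print(nested)
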
I have, however, failed in my visual-distinction task in that to
"modify" ``a`` we use regular ``=``:
.. code-block:: idio
a := 1
...
a = a + 1
Noting that the *reader* will transform ``a + 1`` into ``(+ a 1)``
*first* as ``+`` has a higher infix operator precedence than the
``=``:
.. code-block:: idio
(= a (+ a 1))
Function calls
--------------
Function calls in assignments come out in the wash, here, as:
.. code-block:: idio
a = func sol bruh va
is transformed into:
.. code-block:: idio
(= a (func sol bruh va))
and ``(func sol bruh va)`` is a regular evaluable form returning its
result to the assignment operator, ``=``.
We *are* still stuck with the single word feature as described above
so you need to type:
.. code-block:: idio
a = (func)
if you're not passing any arguments. *\*shrugs\**
Top Level Assignments
---------------------
Defining variables at the top level, ie. outside of a lexical block,
comes in a couple of forms.
``define`` has two forms itself:
.. parsed-literal::
define *variable* *value*
define (*func-name* *formals+*) *body*
or you can use one of the assignment infix operators:
.. parsed-literal::
*variable* := *value*
*function-name* := function (*formals+*) *body*
The ``define`` form for functions is much cleaner but the assignment
variant is used liberally when re-defining an existing function.
Non-lexical Variables
---------------------
We want to get to handling environment variables seamlessly which I've
been suggesting are a kind of tagged dynamic variable.
Dynamic Variables
^^^^^^^^^^^^^^^^^
Dynamic variables "live on the stack", that is to say that their
existence is dependent on the code path you have run and as your
function call hierarchy unwinds the dynamic variables disappear.
Access to the variable is more work because you need to run back up
(down?) through the stack looking for your transient variable.
(This idea is reused -- probably a bit too much!)
Nominally, there's the ``dynamic-let`` call which introduces a dynamic
variable (onto the stack) and starts processing:
.. code-block:: idio
(dynamic-let ((X 10))
(foo))
Normally, some function calls deeper, you would call ``(dynamic X)``
to get the value of ``X``, your dynamic variable.
Obviously, you don't want to be asking for a dynamic variable in some
random bit of code on the off-chance. You're meant to know what
you're doing!
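
To make the stack-walking concrete, here is a toy model in
:lname:`Python`. It is a sketch of the idea only: ``dynamic_let`` and
``dynamic`` mimic the forms above, and the explicit ``_stack`` list is
an assumption standing in for the real call stack.

.. code-block:: python

    from contextlib import contextmanager

    _stack = []                # innermost binding is at the end

    @contextmanager
    def dynamic_let(name, value):
        _stack.append((name, value))
        try:
            yield
        finally:
            _stack.pop()       # unwound even if the body raises

    def dynamic(name):
        # Walk back through the stack looking for the nearest binding.
        for bound_name, value in reversed(_stack):
            if bound_name == name:
                return value
        raise LookupError(f"no dynamic binding for {name}")

    def foo():
        # foo has no lexical X: it sees whatever its callers bound.
        return dynamic("X") + 1

    with dynamic_let("X", 10):
        inner = foo()
        with dynamic_let("X", 20):
            shadowed = foo()
        restored = foo()
    print(inner, shadowed, restored)

Once the outer ``with`` has finished, asking for ``X`` fails, which is
the "on the off-chance" problem in miniature.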
.. sidebox::
This evaluator trickery doesn't work quite as cleanly as we'd like.
Dynamic variables effectively become top level variables.
Digressing a little, there is a mechanism in the evaluator to keep
track of variable names which can keep track of the lexical or dynamic
nature of a variable -- by remembering what kind of form introduced
it. Subsequently, if we were to reference ``X`` we would know it was
a dynamic variable and can therefore replace the variable reference,
``X`` with ``(dynamic X)`` -- essentially to provoke the stack-walking
mechanism -- and all is good.
Back to our variable initialisation.
.. aside::
*Genius!*
We have ``:=`` for lexical assignments, how about ``:~`` for dynamic
variables? ``~`` representing the maybe, maybe not dynamic nature of
the beastie.
Environment Variables
^^^^^^^^^^^^^^^^^^^^^
Environment variables are much more shell-ish. I'm suggesting we want
to implement them in a similar fashion, as an "environment" variable
with a dynamic nature. They are different to dynamic variables in
that whenever a program is executed it will have an environment
created which is built from any extant "environment" variables.
.. sidebox::
Look, it's just a little bit of technical debt that, when we get
round to it, we can apply the extra experience we'll have picked up
in the meanwhile to do a better refactoring job.
What's not to like?
It could be implemented by tagging some dynamic variables as
"environment" variables or it could be implemented as an entirely
parallel and separate set of dynamic variables. *\*cough\**
The reason for this dynamic nature is I want to be able to say:
.. code-block:: idio
{
PATH := path-prepend PATH /some/where/else
do stuff
}
Here, for the duration of this lexical block, ie. whatever effect we
create should be unwound at the end of the block, I am creating a
*new* ``PATH`` variable which should be used by anyone looking up the
:envvar:`PATH` environment variable if, say, they want to find an
executable.
After this lexical block people can find the old value.
.. aside::
*Stop it, please!*
I'm thinking ``:*`` here, with ``*`` signifying the stars, the
*environment* surrounding us!
So that should have been:
.. code-block:: idio
{
PATH :* path-prepend PATH /some/where/else
do stuff
}
You might ask why didn't I just modify the value of ``PATH``? Well,
modifying it means that everyone after this lexical block will see my
transient changes unless I ensure that I can unwind the changes
manually before I'm done.
:socrates:`Even on error and in the face of continuations?` Hmmm, "tricky."
Of course, modifying ``PATH``, say, for everyone following is
perfectly normal; we're just covering the variable assignment prefix
case of:
.. code-block:: sh
PATH=/some/where/else:$PATH do stuff
Un-setting Transient Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
There is another corner case where you might want to unset such
dynamic variables, possibly transiently, and almost always for
environment variables.
For that we need to add a stack marker that says: stop looking any
further and return "failed to find".
We need some kind of indication we're stomping on the normal way of
things so I'm penning in ``!~`` and ``!*`` for dynamic and environment
variables respectively.
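
A stack marker of that kind might look like this in :lname:`Python`.
``UNSET`` and ``lookup`` are hypothetical names and returning ``None``
is just this sketch's way of saying "failed to find".

.. code-block:: python

    UNSET = object()           # hypothetical marker meaning "stop looking"

    def lookup(stack, name):
        """Search from the innermost binding outwards."""
        for bound_name, value in reversed(stack):
            if bound_name == name:
                if value is UNSET:
                    break      # transiently unset: ignore older bindings
                return value
        return None            # "failed to find"

    stack = [("PATH", "/bin")]
    before = lookup(stack, "PATH")
    stack.append(("PATH", UNSET))
    during = lookup(stack, "PATH")
    stack.pop()
    after = lookup(stack, "PATH")
    print(before, during, after)
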
Computed Variables
^^^^^^^^^^^^^^^^^^
There's another class of shell variables we want to emulate. Remember
``SECONDS`` which returns us the number of seconds the shell has been
running for? (There's ``$^T``, the program's start time, in :lname:`Perl` as well.)
There's clearly a bit of magic there where the simple act of accessing
a variable has resulted in a (hidden) function call.
I'm calling these *computed variables* (not least because many others
have done so before).
It would be neat if we could allow the user to create these.
``SECONDS``, mind, is probably one for the language implementer as it
requires something in the language bootstrap to start the clock
rolling!
There's another twist for computed variables. They might be
read-only, like ``SECONDS`` for which it makes no sense to assign a
value to them. They might be write-only like :manpage:`srandom(3)`,
the seeding mechanism for :manpage:`random(3)` where it defeats the
purpose to get back that secret seed value. It might be read-write
like the shell's ``RANDOM`` (which combines the behaviour of both
``srandom`` and ``random``).
If the user is defining a computed variable then they must pass two
parameters to the initialisation: a getter and a setter. If you want
it to be read-only pass ``#n`` (aka. ``nil``) for the setter. Pass
``#n`` for the getter for a write-only variable. Passing ``#n`` for
both should result in an error -- *don't be annoying!*
.. aside::
*Oh, puh-lease!*
As for the infix operation/function name, try ``:$`` for size!
We can then rustle up something like:
.. code-block:: idio
getter := #f
setter := #f
{
p := 0
getter = function () {
p = p + 1
p
}
setter = function (v) {
p = v
}
}
cv :$ getter setter
printf "%d %d\n" cv cv
cv = 10
printf "%d %d\n" cv cv
which should display:
.. code-block:: idio-console
1 2
11 12
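
The hidden function call can be modelled in :lname:`Python` with a
namespace that checks what it is holding before handing a value back.
This is a sketch under assumptions, not how :lname:`Idio` does it:
``ComputedVariable``, ``Namespace`` and ``make_counter`` are invented
for the example and ``None`` plays the part of ``#n``.

.. code-block:: python

    class ComputedVariable:
        def __init__(self, getter=None, setter=None):
            if getter is None and setter is None:
                raise ValueError("a computed variable needs a getter or a setter")
            self.getter, self.setter = getter, setter

    class Namespace:
        def __init__(self):
            self._values = {}

        def get(self, name):
            value = self._values[name]
            if isinstance(value, ComputedVariable):
                if value.getter is None:
                    raise AttributeError(f"{name} is write-only")
                return value.getter()      # the hidden function call
            return value

        def set(self, name, new):
            current = self._values.get(name)
            if isinstance(current, ComputedVariable):
                if current.setter is None:
                    raise AttributeError(f"{name} is read-only")
                current.setter(new)
            else:
                self._values[name] = new

    def make_counter():
        p = 0                              # private state closed over by both functions
        def getter():
            nonlocal p
            p += 1
            return p
        def setter(v):
            nonlocal p
            p = v
        return getter, setter

    ns = Namespace()
    ns.set("cv", ComputedVariable(*make_counter()))
    first = (ns.get("cv"), ns.get("cv"))
    ns.set("cv", 10)
    second = (ns.get("cv"), ns.get("cv"))
    print(first, second)
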
.. _functions:
Functions
=========
Of course, functions could change very little from how they exist
now:
.. code-block:: sh
function foo ()
{
echo $(( $1 + 1));
}
other than a little textual transformation into, say:
.. code-block:: idio
define (foo a) {
b = a + 1
b
}
the only useful difference being that we have formal parameters.
Notice too the start of a lexical block, ``{``, *on the end of the
first line* of the function declaration. That means our line-oriented
reader will continue reading through to the matching ``}`` thus
creating the function body.
If it wasn't on the end of that line:
.. code-block:: idio
define (foo a)
{
b = a + 1
b
}
would be a semantic error as ``define (foo a)`` was read and deemed by
the line-oriented code reader to be a whole expression. But it is a
function declaration with no body and so is an error. The remaining
lexical block is, well, just a lexical block. That's legal although
you might get complaints about this new variable ``a`` that has
appeared.
.. aside::
Haters gonna hate!
I like the ``{`` on the end of the line as it is, by and large, the
way I write code anyway. Others will, no doubt, be very angry.
Functional Block
----------------
:lname:`Ruby` and :lname:`Swift` both allow a block to take formal
arguments thus allowing an alternate form of creating anonymous
closures:
.. code-block:: ruby
{ |n| n + 1 } # Ruby
.. code-block:: swift
{ (n) in n + 1 } // Swift - could have said { $0 + 1 }
The more you're used to seeing it the easier it is to scan.
I'm not so tied to the idea, although anonymous function declarations
are quite wordy. The ``define`` above is re-written internally to:
.. code-block:: idio
define foo (function (a) {
b = a + 1
b
})
with the extra parentheses around ``function ...`` to create the
function value -- otherwise it is just a string of words giving
``define`` too many arguments!
What the :lname:`Ruby` and :lname:`Swift` variants are doing is
getting rid of the ``function`` keyword and shifting the formal
arguments inside the block.
Pipelines
=========
We're fairly happy, I think, that we should have regular shell-ish
pipelines:
.. code-block:: idio
zcat file | tar tf -
I've not implemented it yet -- partly because ``&`` is used for pairs
-- but you can imagine a form of postfix operator, ``&``, to background a
pipeline, like the shell:
.. code-block:: idio
zcat file | tar tf - &
Previously I said I was going to reserve the :lname:`C`/shell logical
operators ``&&`` and ``||`` for nefarious purposes.
Again, not implemented, but I'm thinking that ``||`` is more of an
object pipeline -- so, like PowerShell_ and friends -- we might be
able to create some functional composition/comprehension/cascade
(there must be a proper term) where the value returned by one function
call is the argument to the next:
.. code-block:: idio
func args || f2 || f3
although you immediately think that the later functions should be
allowed to have other arguments themselves in which case you'd need
some symbolic argument for the value being passed down.
Alternatively, ``f2`` and ``f3`` could be the result of currying
themselves and only take a single argument.
:lname:`Perl` uses ``$_`` for the anonymous value so our
equivalent would be something like:
.. code-block:: idio
func args || f2 a1 _ a2 || f3 _ b1 b2
However it might be done, you can recognise that it is a
straightforward enough transformation into a nested call:
.. code-block:: idio
(f3 (f2 a1 (func args) a2) b1 b2)
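
In :lname:`Python` the nesting is a short loop. This is a sketch of one
possible rule, not a settled design: ``compose`` is a hypothetical name,
the stages are assumed to have been split on ``||`` already, and a stage
with no ``_`` is assumed to take the value as its last argument.

.. code-block:: python

    def compose(stages, placeholder="_"):
        """Nest [[func, args], [f2, a1, _, a2], ...] into a single call."""
        value = stages[0]
        for stage in stages[1:]:
            if placeholder in stage:
                # Substitute the previous call wherever the placeholder appears.
                value = [value if word == placeholder else word for word in stage]
            else:
                value = stage + [value]    # no placeholder: pass as the last argument
        return value

    call = compose([["func", "args"],
                    ["f2", "a1", "_", "a2"],
                    ["f3", "_", "b1", "b2"]])
    print(call)
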
On a different tack, ``&&``, might be used to "kick off in the
background" a thread (should we have any):
.. code-block:: idio
keep-an-eye-on stuff &&
This double-punctuation character thing might have some mileage.
How about ``>>`` to collect "output" in a string? Of course, ``>>``
is the append variant of ``>`` for (pipeline) IO redirection. That's
a bit unfortunate.
I appear to have lost my reference but I thought I'd seen some
alternate IO redirection forms, eg. ``>+`` might be used for append
and ``>=`` with an offset might be used to :manpage:`fseek(3)`
somewhere in a file before writing.
Traps
=====
We need to be able to handle errors and exceptions and we'll use the
broad idea of :lname:`Scheme`'s conditions. We want to trap a
condition.
A condition (a state of being of the program, if you like) is the
parent of errors and exceptions (and other derived conditions, of
course). For no particularly strong reason I wanted to distinguish
between errors and exceptions. To some degree it doesn't really
matter as they are all conditions and all get managed in the same way.
The distinction is largely taxonomic and poorly implemented as I look
back now and find that I've called *everything*
something-something-:code:`error`.
An *error* is something that cannot be fixed without rewriting the
source code:
.. code-block:: idio
pair 1
is an error because no matter how many times you run it or whatever
external/environmental changes you make, ``pair`` takes two arguments
and you've only given it one. It is unrecoverable in any sense other
than to edit the source code.
Accordingly, for an error, it is not possible to continue with this
thread of processing and the engine will revert to some safe place.
The user cannot trap it.
An *exception* is any kind of transient condition, usually outside of
the program's control:
- if a system call goes wrong we'll get an exception raised (by the
:lname:`C` code wrapping the call) indicating ``errno``
- if you fail to open a file -- permissions, disk issues (inodes and
data blocks in Unix), etc.
- if you try to get the ``string-length`` of a number the function
will raise an exception.
Some of those are genuinely transient in that you can, say, fix the
file system outside of the (running?) program to allow your file
operation to go ahead. Some of them are effectively permanent if you
are passing the wrong kinds of parameters to system calls or functions
but are still classed as exceptions as a subsequent run through the
code could have changed the values the parameters are bound to and the
system/function call might now succeed.
:lname:`Bash` ``trap``\ s signals, mostly, but also has a
pseudo-signal, ``ERR``, representing the failure of a simple command.
We can do the same but be a bit more profuse with the kinds of errors
(er, exceptions!).
Actually, it's also pretty reasonable to throw in some signal handling
through the exact same mechanism just using a different part of the
condition hierarchy tree.
Condition Types
---------------
Condition types are a hierarchy of structures with ``^condition`` at
the root. I chose to have a leading caret, ``^``, at the start of
condition type names to suggest the handling of conditions "above" the
normal operation of the code. I'm not massively taken by it on
reflection and might engage in a code-wide edit.
An ``^error`` is a child of ``^condition`` -- merely to start a clean
tree.
``^idio-error``, a child of ``^error``, is the start of our tree of
interest and introduces three fields:
- ``message`` -- the nominal problem
- ``location`` -- some indication, preferably relating to the user's
source code, of where the error occurred
- ``detail`` -- any other pertinent information, for example, the
value of a likely problematic variable.
We can now start deriving trees of condition types from
``^idio-error``, for example:
- ``^read-error`` with additional fields of ``line`` and ``position``
(byte index into the source *handle* -- file or string!) being the
common root of all reader errors.
- ``^evaluation-error`` with the additional field of ``expr`` being
the common root of all evaluation errors.
- ``^io-error`` and its derivatives primarily relating to files
- ``^runtime-error`` which has a wide tree of children including:
- ``^rt-parameter-type-error`` for passing the wrong type of value
into a function, for example we can't get the ``ph`` (pair-head)
of a number.
- ``^rt-command-status-error`` for when external commands fail
- ``^rt-signal`` for, er, signals albeit this is derived from
``^error`` directly as its asynchronous, out-of-band provenance
means there's no particular association with any message, location
or detail that ``^idio-error`` requires for its fields
And so on. There are quite a few.
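To make the shape of that tree concrete, here is a small model in
:lname:`Python`, not the actual :lname:`Idio` implementation.
``ConditionType``, ``all_fields`` and ``is_a`` are names invented for
this sketch; only the condition type names and their fields come from
the descriptions above:

.. code-block:: python

   class ConditionType:
       """A condition type with a parent and some fields of its own."""

       def __init__(self, name, parent=None, fields=()):
           self.name = name
           self.parent = parent
           self.fields = tuple(fields)

       def all_fields(self):
           """Inherited fields first, then this type's own fields."""
           inherited = self.parent.all_fields() if self.parent else ()
           return inherited + self.fields

       def is_a(self, other):
           """True if other is this type or one of its ancestors."""
           t = self
           while t is not None:
               if t is other:
                   return True
               t = t.parent
           return False


   condition = ConditionType("^condition")
   error = ConditionType("^error", condition)
   idio_error = ConditionType("^idio-error", error,
                              ("message", "location", "detail"))
   read_error = ConditionType("^read-error", idio_error,
                              ("line", "position"))
   evaluation_error = ConditionType("^evaluation-error", idio_error,
                                    ("expr",))
   runtime_error = ConditionType("^runtime-error", idio_error)
   rt_parameter_type_error = ConditionType("^rt-parameter-type-error",
                                           runtime_error)
   # ^rt-signal hangs off ^error directly; its own fields are not
   # described above so none are modelled here.
   rt_signal = ConditionType("^rt-signal", error)

Here ``read_error.all_fields()`` is ``("message", "location",
"detail", "line", "position")`` whereas ``rt_signal.is_a(idio_error)``
is false, so ``^rt-signal`` inherits none of ``message``, ``location``
or ``detail``.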
SIGCHLD
^^^^^^^
Just as food for thought, there's some fun with *SIGCHLD*. When an
external process completes (or is stopped but that confuses this train
of thought) we, :lname:`Idio`, will get a *SIGCHLD* which means we'll
have updated our ``volatile sig_atomic_t`` and, at an appropriate
time, run an :lname:`Idio` *interrupt handler* -- a condition handler
being run out of turn -- which will, almost certainly, be the code
written for *Job Control* (because we're a shell, remember).
If the external process exited non-zero then *that* code will raise an
``^rt-command-status-error`` condition which the user can write a
handler for.
Separately, the code that ran the external process will eventually
unwind to return ``#f``, if the process failed, which can be used by
any of the conditional forms in the shell-ish way:
.. code-block:: idio-console
Idio> if (false) "worked" "failed"
job 95645: (false): completed: (exit 1)
"failed"
.. sidebox::
For many many years I have had a :lname:`Bash` ``PROMPT_COMMAND``
which has figured out the exit/killed status and printed something
useful.
I feel that if *I*'m writing a shell, everyone should get that
useful behaviour.
The ``job 95645: ...`` line is printed by the *Job Control* code
separately to ``if`` being returned ``#f`` by the expression
``(false)`` -- which was reworked into an invocation of the external
command ``false`` which, er, exited non-zero.
``if`` can then run its *alternate* clause which is the string
"failed" which is what ``if`` returns and what the REPL prints.
Trap
----
Remember, first, that the :lname:`Scheme` model is to effectively
*replace* the failing code with the *handler*, in other words,
whatever the handler returns is what the original call returns.
We need to be able to indicate which kind of condition we are
interested in, some handler code and the set of code this is going to
be valid for:
.. parsed-literal::
trap *condition* *handler* *form+*
where
- ``condition`` can be a single named condition type or a list of
named condition types
- ``handler`` is a unary function which is given the condition
instance if the condition type matches
- ``form+`` is the set of code the trap is valid for
This gives us code of the likes of (using both an anonymous function
and the ``{`` on the end of the line trick visually twice, albeit
lexically once):
.. code-block:: idio
a := #[ 1 2 3 ]
trap ^rt-array-bounds-error (function (c) {
#f
}) {
array-ref a 4
}
where
.. code-block:: idio
(function (c) {
#f
})
constructs my handler -- which could have been reduced to
``(function (c) #f)`` -- and
.. code-block:: idio
{
array-ref a 4
}
are my body forms.
The slightly weird indentation of the body of the anonymous
function is the result of my limited abilities with hacking an
Emacs mode although I've come to rather like it. The body
statements are two spaces indented from the word ``function``.
Here in the body forms, ``array-ref a 4``, we try to access the fifth
element of a three element array, ``a``. This will result in an
instance of ``^rt-array-bounds-error`` being created and raised which
will look for a handler by walking back out through the installed
handlers.
We happen to have installed an ``^rt-array-bounds-error`` handler
immediately beforehand (:socrates:`lucky break!`) so this will be
found and run. In this case the body of the handler is simply ``#f``
and as this handler is being run in place of ``array-ref a 4`` then
``#f`` is passed to its continuation (not shown).
Now that might not be what you expect, shouldn't we have seen some
error? Shouldn't we have collapsed back to the prompt? What does my
code do with ``#f``?
Certainly in the case of the last point, why are you asking me, you
wrote the code! What you're really asking is how do I handle array
bounds exceptions? Well, that's up to you but you might want to test
the value you just got back and compare it to ``#f``.
``#f`` isn't a very good sentinel value anyway as, amongst other
things, it is the default value for an array element! Rather, you
might want to use a genuinely unique magic number courtesy of
``eq?`` and the uniqueness of values created on the underlying
:lname:`C` heap:
.. code-block:: idio
a := #[ 1 2 3 ]
magic := pair 'magic 'value
trap ^rt-array-bounds-error (function (c) {
magic
}) {
v := array-ref a 4
if (eq? v magic) ...
}
Here, ``magic`` the variable refers to a pair on the heap which
*cannot* ``eq?`` any other pair on the heap (even if created with
the same elements) because ``eq?`` does a :lname:`C` pointer
comparison. So, armed with something unique, return that from
your handler and compare against it. (Even better if ``magic`` is
a lexically private variable.)
Others might have suggested that you check the size of the array
first but at least this method flexes your algorithmic muscles.
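The same trick can be sketched in :lname:`Python`, where ``is`` plays
the role I've given ``eq?`` above (an identity comparison) and ``==``
compares contents. ``array_ref_or`` is a hypothetical helper standing
in for the ``trap`` and its handler:

.. code-block:: python

   magic = ["magic", "value"]        # a fresh object, like pair 'magic 'value
   lookalike = ["magic", "value"]    # same contents, different object


   def array_ref_or(array, index, fallback):
       """Return array[index], or fallback if index is out of bounds."""
       try:
           return array[index]
       except IndexError:
           return fallback


   v = array_ref_or([1, 2, 3], 4, magic)
   out_of_bounds = v is magic        # only our own object is identical to magic

``magic == lookalike`` is true but ``magic is lookalike`` is false, so
even an array that happens to contain ``["magic", "value"]`` can't be
confused with the value the handler returned.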
We can change the handler to print a message and get someone else to
sort it out:
.. code-block:: idio
a := #[ 1 2 3 ]
trap ^rt-array-bounds-error (function (c) {
eprintf "Dun goofed!\n"
raise c
}) {
array-ref a 4
}
(no surprises that ``eprintf`` *printf*\ s to *stderr*)
Now, the ``raise c`` expression (re-)raises ``c`` to whomever further
out in the condition handler tree is willing to handle
``^rt-array-bounds-error``.
The chances are that no-one is, except the system condition handlers,
which will splurge some message to *stderr* and restart processing
from some safe place:
.. code-block:: idio-console
Dun goofed!
foo.idio:line 7: ^rt-array-bounds-error: array bounds error: abs (4) >= #elem 3: idio_array_ref
(The ``abs (4)`` refers to the idea that you can index an array from
the end using negative numbers but either way is more than the array's
size, 3.)
We don't have to (re-)raise ``c``. Conditions are just things, we can
create a different condition and raise that instead.
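Pulling those pieces together, the following is a toy model in
:lname:`Python` of how I think about ``trap``, not how :lname:`Idio`
implements it. ``trap``, ``raise_condition``, ``array_ref`` and the
handler stack are all invented for the sketch. The handler's return
value stands in for the failing call and a handler that re-raises only
sees the handlers further out:

.. code-block:: python

   class Condition:
       """Stands in for ^condition."""


   class ArrayBoundsError(Condition):
       """Stands in for ^rt-array-bounds-error (intermediate types omitted)."""

       def __init__(self, index, size):
           self.index = index
           self.size = size


   _handlers = []   # installed handlers, innermost last


   def trap(condition_type, handler, body):
       """Run body() with handler installed for condition_type."""
       _handlers.append((condition_type, handler))
       try:
           return body()
       finally:
           _handlers.pop()


   def raise_condition(c):
       """Walk back out through the installed handlers looking for a match."""
       global _handlers
       for depth in range(len(_handlers) - 1, -1, -1):
           condition_type, handler = _handlers[depth]
           if isinstance(c, condition_type):
               saved = _handlers
               _handlers = saved[:depth]   # the handler only sees outer handlers
               try:
                   return handler(c)       # this value replaces the failing call
               finally:
                   _handlers = saved
       # Stand-in for the system condition handlers.
       raise RuntimeError("unhandled condition: " + type(c).__name__)


   def array_ref(array, index):
       """Return array[index], raising a condition if index is out of bounds."""
       if not 0 <= index < len(array):
           return raise_condition(ArrayBoundsError(index, len(array)))
       return array[index]

In this model ``trap(ArrayBoundsError, lambda c: False, lambda:
array_ref([1, 2, 3], 4))`` returns ``False``, and a handler written as
``lambda c: raise_condition(c)`` hands the condition to whichever
matching handler was installed further out.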
Syntactic Sugar
^^^^^^^^^^^^^^^
The basic idea behind trapping conditions works well enough but we're
lacking some syntactic sugar to do two things:
- we probably want something like the familiarity of :lname:`Python`'s
``try``/``except`` where there is a single body form but multiple
handlers associated with it:
.. code-block:: python
try:
form+
except TypeA_Exception:
handler1
except TypeB_Exception:
handler2
which looks like it can be re-shaped into nested ``trap`` statements
fairly easily:
.. code-block:: idio
trap TypeB_Exception (function (c) {
handler2
}) {
trap TypeA_Exception (function (c) {
handler1
}) {
form+
}
}
noting the ordering rework from lexical to nested!
- add ``trap``'s continuation as something a handler can invoke to
collapse the function call tree:
.. code-block:: idio
trap $condition (function (c) {
eprintf "Giving up!\n"
trap-return 37
}) {
...
}
Not implemented.
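For the first of those, the re-shaping is mechanical enough to sketch
in :lname:`Python`. ``nest_traps`` is a made-up name and the nested
lists merely stand in for :lname:`Idio` forms. Each clause wraps the
form built so far, so the first ``except`` clause ends up innermost,
where it will be found first:

.. code-block:: python

   def nest_traps(clauses, body):
       """Wrap body in one trap per (condition_type, handler) clause.

       clauses are in lexical order, as they would appear after a try.
       """
       form = body
       for condition_type, handler in clauses:
           form = ["trap", condition_type, handler, form]
       return form

``nest_traps([("TypeA_Exception", "handler1"), ("TypeB_Exception",
"handler2")], "form+")`` gives ``["trap", "TypeB_Exception",
"handler2", ["trap", "TypeA_Exception", "handler1", "form+"]]``,
matching the nesting above.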
Comments
========
:lname:`Scheme` has a rich set of comment mechanisms. So let's do
*more*.
Unfortunately, the normal scripting comment character, ``#`` has been
sucked into playing the wake-up call to the reader that a "funny"
expression is coming. Even more annoying, the stock :lname:`Scheme`
comment character is ``;``, which is a shell's normal statement
separator. *Bah!*
I can't honestly claim to have spent too much time thinking about it.
Things are different, get used to it.
Line
----
You can comment everything to the end of the line with ``;``.
There is some sense of styling when using this form of comment where a
single ``;`` is normally beyond the end of the text and is a
line-specific comment. Usually indented to column 40 (or more).
A double semicolon, ``;;`` is usually indented to the same level as
the block of code.
A triple semicolon, ``;;;`` is usually indented at column 0.
S-exp
-----
You can usually comment out an entire s-exp with ``#;``, including
multi-line s-exps:
.. code-block:: scheme
(let ((a 1))
#;(set! a
(+ a 1))
a)
should avoid ``a`` changing value.
I'm not sure if all :lname:`Scheme` implementations support this.
Multi-line
----------
Traditionally, :lname:`Scheme`\ s support multi-line comments with
``#|`` through to ``|#``. A commenting style that can be nested --
making commenting out blocks of code containing already commented out
blocks much easier.
That's a good idea.
.. sidebox::
I appreciate this isn't quite the paradigm Knuth is suggesting but
just let's roll with it.
But this is where I want to differ. I like the idea of
:ref-author:`Donald Knuth`'s `literate programming`_ although I'm
unsure that it should be natural language interspersed with code
snippets. I fancy it should be code interspersed with natural
language -- the sort of thing that can be contained in commentary.
However what we need is something to extract those comments and pipe
them into some documentation generation system, and that's where a
multi-line commentary system bracketed by something involving a ``|``
symbol says to me "pipe it out", albeit we've not identified to
what.
Suppose, then, we take the first line (of the ``#|``) and it can
describe the necessary documentation generation system and whomsoever
is extracting the commentary can execute some appropriate system.
I'm being slightly vague there as, primarily, I don't have any feel
for what that documentation generation system might be. So,
currently, ``#|`` is just a nested multi-line comment waiting... *for
Godot?*
We should still be able to have regular nested multi-line comments
though without the fear that they'll be emitted to some external
system? Of course. Let's pencil in ``#*`` through to ``*#``.
And what's the point unless those two look out for one another? So
you should be able to both embed literate commentary inside regular
commentary and, of course, be able to comment out parts of your
literate commentary.
Finally, some complexity. The nested commentary systems are a bit
naive. They are only looking out for themselves and each other. You
might be a bit unfortunate with something like:
.. code-block:: idio
#*
; don't use *# here
*#
with the predictably hilarious result that the reader will find the
closing ``*#`` in the middle of the line comment and start processing
again with the word ``here``. Not brilliant.
To prevent that we need an escape character which, to no-one's
surprise, is ``\``:
.. code-block:: idio
#*
; don't use \*# here
*#
But wait, ``\`` is my favourite character for my comments!
OK, the first character after ``#*`` or ``#|``, if it is a graphic
character (ie. not whitespace) is the escape character:
.. code-block:: idio
#*%
; don't use \% %*# here
*#
where ``\`` does nothing special and ``%``, the escape character,
first escapes a space character and then escapes the ``*``, preventing
the line comment from ending the multi-line comment. The nominal
result is ``...\ *#...``.
*No, you can't use whitespace as an escape character!*
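A sketch, in :lname:`Python`, of a scanner that follows those rules
might look like the following. It is not the :lname:`Idio` reader:
``skip_comment`` is a made-up name, and letting each nested comment
pick its own escape character is my assumption:

.. code-block:: python

   CLOSERS = {"#*": "*#", "#|": "|#"}   # opener -> matching closer


   def skip_comment(text, start=0):
       """Return the index just past the comment that opens at text[start]."""
       opener = text[start:start + 2]
       closer = CLOSERS[opener]         # KeyError if there is no opener here
       pos = start + 2
       escape = "\\"
       # A graphic (non-whitespace) character right after the opener
       # replaces the default escape character.
       if pos < len(text) and not text[pos].isspace():
           escape = text[pos]
           pos += 1
       while pos < len(text):
           if text[pos] == escape:
               pos += 2                 # skip the escape and what it protects
           elif text[pos:pos + 2] == closer:
               return pos + 2
           elif text[pos:pos + 2] in CLOSERS:
               pos = skip_comment(text, pos)   # nested comment, either kind
           else:
               pos += 1
       raise ValueError("unterminated comment")

Fed the three examples above, the first stops early and resumes just
before ``here``, whereas the second and third only resume after the
final ``*#``.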
.. include:: ../commit.rst