Idio Look¶
Scheme looks like it can be used to program computers, indeed, it looks like it can do far more in terms of programming computers than we, simple scripters, are used to or, more likely, are capable of! It is very succinct, with several key implementation details:
- closures, we can close over variables and use them in a function body
- anonymous functions, we can create functions on the fly
- functions are first class, we can construct and return them for others to call
- compound data types
- error handling
- macros, at least syntax transforming macros
- continuations, as a means of deriving exceptional behaviour they are required though whether we would want the full power of continuations available is debatable. If we don’t then users cannot create their own escape procedures – remembering that we’re meant to be writing a shell!
Variables might be introduced differently but are otherwise the same. Data structures are broadly the same with a pair simply being the guts of a linked list.
Calling functions is essentially no different, function followed by arguments.
But, and I suppose it is the never-ending bugbear of Lisps, it looks a bit funny. However, thanks to syntax transformations, it can look like whatever we want it to, so long as we can transform it back into something the underlying Scheme-ish engine can handle.
We should recognise that we’re describing two things: firstly we need the core functionality to make some behaviour happen, think about the ability to create and manipulate file descriptors, and secondly, some syntactic sugar to abstract away the repetitive detail.
So, how should the language look?
Line-Oriented¶
I’m quite content with my ALGOL-inspired syntax and with my shell hat on, I want, nay, insist that I be able to type:
ls -l
With no (obvious) punctuation, just like the shell. Isn’t that the point?
Mind you, with our Scheme hats on, ls is undefined in the program (probably). We know, with our shell hats on, that if ls is not a function or alias then the shell will take it upon itself to have a rummage around the shell’s PATH to find an ls executable. From a Scheme perspective, though, we need to break the behaviour that ls is undefined.
Similarly, for -l. Is that a funky function to subtract the value of l from its argument (like a +1 function) or just a string of characters to be passed as an argument to some command?

It’s an error because there’s no sensible translation of an Idio function (value) to an argument in execve(3). -l is OK because, even though we will have it in our hands as an Idio symbol, there is an obvious translation of a symbol into a string of characters.
The inverse is also an issue. Typing make debug results in an error when debug is the name of a function. Tricky.
I like the idea that I can, at long last, have hyphens in my variable names saving us from the thrall of underscores or CamelCase or whatever – albeit with the cost that whitespace is required to distinguish between terms. I can live with that, whitespace is cool.
I like that other punctuation characters are available to add meaning to my names, ? and !, and I have a cunning plan for accessing structure fields with . in the Jinja way (my original inspiration was Perl’s Template::Toolkit).
However, there is a far worse bind we put ourselves under if we allow punctuation characters in names as it conflicts with the meta-characters the shell uses to spot pathname patterns:
ls -l *foo*.txt
Is that:
- the variable *foo*.txt?
- the variable *foo*, a structure, where we want to access the field txt?
- the variable *foo*, an array where we want to access the txt'th element (txt itself being a variable)?
- a pattern match expression to be passed to glob(3)?
Hmm.
What if none of the above are true? If *foo* (or *foo*.txt) is yet to be defined at the top level, say, then we have a runtime decision to make which is not a good place to be if we are thinking about any form of compilation.
I confess I’m at a bit of a loss here and I’m leaning in favour of a consistent programming language over shell-ish conveniences.
For pathname matching I’m turning towards the idea that, since we intend that the result of a glob(3) pattern match be a list that we preserve as a list until it is required, perhaps we should distinguish the pattern match that creates it. Does something like:
ls -l #P{ *foo*.txt }
cause palpitations? In one sense it is a bit like preparing a regular expression in Python or Perl.
It’s not Art, I agree. However, there is some precedent, with Murex taking to the likes of:
ls -l @{g *.go} # globbing
ls -l @{rx \.go$} # by regex
although I have this feeling that it would be better/more generic/more in the style of something? if we indicated a filter to the globbing mechanism through the environment in the same way we might use IFS.
There’s a very similar problem with plain filenames that you might use instinctively in the shell:
ls -l foo
As this is Idio and not the shell, foo will be a candidate for evaluation – and could be evaluated – otherwise it would be at risk of causing an unbound error like ls might. Here we get a mixed bag: if foo is defined then we’ll get the value of the variable foo (which could be anything, of course), otherwise the symbol foo will be passed through to the (external) command execution system and the string of characters foo will be passed to ls.
As a transient feature most things that expect a filename – recall we don’t have a filename type internally – will work with a string (implying no globbing!):
ls -l > "foo"
open-input-file "foo"
This has led to a few thoughts that perhaps filenames should be a subclass of strings – an extra flag on the underlying string type, maybe – allowing for them to be handled appropriately.
There’s a myriad of problems here, though, in that until we have successfully stat(2)’ed the file we don’t know that it does, or even can, exist. Further complicating matters is that operations such as open-output-file are allowed to create files, therefore we must allow for a filename for a file that may not (yet) exist.
Single Word Feature¶
By and large command lines are a command and some arguments, ls -l etc. That’s fine because it translates easily into a function application, (ls -l), just like (+ 1 2).
However, there is an outstanding feature which is a side-effect of the REPL. If your command consists of a single word, eg. ls, then it is indistinguishable from a variable reference in a standard Scheme-ly REPL, where any single word will be evaluated and its value printed:
Idio> n := 10
Idio> n
10
Idio> map
#<CLOS map @51321/0x30c4580/Idio>
giving us the value of n and some internal representation of the closure, map.

(Internal representations having no useful meaning, possibly even to developers. Indeed, I had to look at the source to remind myself what those values represent as I only really look at the CLOS part telling me it is a closure.)
Which is what you want. Well, it’s what many people would expect from an interactive Lispy session.
Consequently, typing ls will get, um, ls printed back as it is determined to be an undefined symbol where we choose, in an un-Scheme-ly fashion, to simply print the symbol (rather than a more Scheme-ly raise an unbound error).

If you want to force the single word command ls to be invoked then you must put it in parentheses:
Idio> (ls)
My Documents
...
That’s true of functions as well:
osh := (open-output-string)
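Without the parentheses you’d capture the function value itself, consistent with the single word evaluation above (a sketch):

osh := open-output-string       ; osh is now the closure itself!
osh := (open-output-string)     ; osh is now a new output string handle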
It’s quite annoying for me where I habitually type make at the Idio prompt to have make printed back at me. Not helpful. I’ve taken to typing make -k a lot more….
TBD
I don’t like it. I’m not sure what a better behaviour might be.
Complex Commands¶
There’s another awkwardness from the idea of a line-oriented shell for complex functions, ones that have multiple clauses. Take cond, for example, which is nominally:
(cond (c1 e1)
(c2 e2)
(else e3))
cond, by rights, should be invokable in the same way as ls, ie. without leading parenthesis, but that would leave us with:
cond (c1 e1)
(c2 e2)
(else e3)
which our line-oriented engine is going to see as three distinct statements – albeit with the second two having exaggerated indents.
Python supports the idea of indented code – indeed you can see references to indent and deindent in the parser – but it doesn’t feel like the indentation here is a syntactic thing, it’s really a visual aide-mémoire; after all, we could have written:
cond (c1 e1) (c2 e2) (else e3)
and be done.
Except the condition and expression clauses are almost certainly complex and the resultant enormous line would be difficult to read let alone maintain.
The original (cond ...) across multiple lines works because the Scheme-ish engine is looking for the matching close-parens (for the leading open-parens on the first line) and so will consume all lines until it gets it.
For our “unwrapped” cond, we can use a regular shell-ish line continuation character:
cond (c1 e1) \
(c2 e2) \
(else e3)
But, be honest, it looks a bit clumsy. And I can say that with some confidence as I had, out of a duty to see it through, written all the complex multi-clause forms using this style. (What an idiot! I sense growing sagacity of the language name…).
This gives us the dreadful:
if (some test) \
(truth-fully) \
(not so truthy)
I know, I know! (And it gets worse.)
Of course, you can continue to use the wrapping parenthesis – all that the non-wrapped line is doing is having the wrapping parenthesis silently added – but the result is like the Curate’s egg:
ls -l
(if (some test)
(truth-fully)
(not so truthy))
and, to be honest, I find it less appealing than the clumsy variant. The line-continuation style has the decency to be consistent.
* Actually, when I came to write the metacircular evaluator in lib/evaluate.idio I gave up and used the “wrapped” form. Mostly because it was easier to convince my editor to line the elements up to preserve my sanity. It’s OK, in the end.
Infix Operators¶
There is the issue of infix operators, | and arithmetic operators amongst a plethora of others.
I think there’s a trick we can pull here following in the footsteps of the reader macros for quote and friends.
Suppose we have a means to declare a symbol as an infix operator together with some behavioural code. Then, after the reader has read the whole line/parenthetical expression in, it goes back and looks to see if any of the words are an infix operator. This is much like macros where the evaluator goes and looks for macros and behaves differently except we are running this before reaching the evaluator.
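Purely as a sketch of the idea – the define-infix-operator name, the precedence number and the before/after parameter names are illustrative assumptions here, not settled syntax:

; hypothetical: declare | as an infix operator with some
; behavioural code that reworks the list of words
define-infix-operator | 80 {
  ; before and after would be the words either side of the |;
  ; rework them into a prefix form for the evaluator
  list '| before after
}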
So, if I had typed:
zcat file | tar tf -
then the reader will have read in six words in a(n implied) list. It can scan along, find that | is an infix operator and call its behavioural code. I’ll assume we’re all happy that it wants to rework this into:
(| (zcat file)
(tar tf -))
which is a simple list transformation requiring no knowledge of anything. Always a good position.
After this transformation we have a | in functional position and so the evaluator will expect it to be the name of a function.
Had someone typed the second form in directly then the reader would have left it alone as the thing that looks like an infix operator can’t be, because it’s the first element in the list. An infix operator (surely?) has to have something before it to be infix.
Recall I suggested that this happens in the reader for parenthetical expressions so that if you’d typed:
zcat file | (tar tf - | grep foo)
(I’m leaving foo in there as it fails my point about symbols and expansion but is easier to read whilst we mull over the idea.)
Although we start reading zcat file ..., the first complete parenthetical expression read would be (tar tf - | grep foo) which can be re-written as:
(| (tar tf -)
(grep foo))
to become part of the outer line-oriented expression when it is eventually completely read in (by hitting the end of line):
zcat file | (| (tar tf -)
(grep foo))
This time, even though there’s a | in the middle of the second expression it isn’t directly in the outer expression which looks, to the reader, like:
rhubarb rhubarb | rhubarb rhubarb
and can be transformed into:
(| (zcat file)
(| (tar tf -)
(grep foo)))
Contrast that with multiple instances of the operator in the same expression:
zcat ... | tar ... | grep ...
which we might transform into:
(| (zcat ...)
(tar ...)
(grep ...))
It’s more subtle than that, though, as a pipeline (and the logical operators and and or) take multiple words as their arguments, including other operators, yet arithmetic operators (and IO redirection) take only a single argument either side.
Scheme might allow:
(+ 1 2 3)
but
1 + 2 3
is incorrect in regular arithmetic. That means that the code for operators needs to do some syntax checking. It’s not great that syntax checking is happening in the reader but, hey ho. Let’s run with it.
To complicate matters, + and - are commonly unary operators as well as binary ones: - n should return negative n (remembering that -n is a symbol!).
Operator Associativity¶
The arithmetic operators have associativity, that is 1 - 2 - 3 is equivalent to (1 - 2) - 3 as - is left-associative. + is, mathematically, associative – either grouping gives the same result – although it is usually defined in programming languages as left-associative. Assignment is right-associative – evaluate the value first!
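As a concrete illustration, left-associativity groups the leftmost pair first:

1 - 2 - 3
; is reworked, left-associatively, into
(- (- 1 2) 3)
; => -4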
Pipelines are left associative, hence the triple pipeline example is quite likely to be executed as:
(| (| (zcat ...)
      (tar ...))
   (grep ...))
even if its nominal form is all three children parented by the same | operator.
Operator Precedence¶
Careful! You might have internalised C’s operator precedence rules as The Truth!
We’ll use the same rules but only for familiarity reasons.
There’s also precedence: (1 + 2 * 3) could be ((1 + 2) * 3) or (1 + (2 * 3)) depending on which operator was run first.
Logical operators, pipelines, arithmetic and IO redirection are all ordered by precedence:
tar *.txt 2>/dev/null | gzip > foo || echo whoops
should be interpreted as first pipeline || pipeline, then cmd+io | cmd+io then, finally, arrange io followed by executing cmd. Hence we might derive:
(or (| (io-> 2 /dev/null
tar *.txt)
(io-> 1 foo
gzip))
    (echo whoops))
using some putative (io-> fd file cmd . args) function to handle IO redirection (which doesn’t handle multiple redirections so doesn’t exist).
Notice no execve(3) function has been introduced as, at this stage, we don’t know if tar and gzip are internal functions or external commands. All we’re doing is rewriting the statements involving infix operators.
The Reader and Infix Operators¶
Such transforms will also mean that:
echo 1 + 2 3 * 4
will be re-written as:
(echo (+ 1 2) (* 3 4))
resulting in:
3 12
but the original form was hard for the human mind to scan – the pedantic grouping of sub-expressions of Scheme would have forced us to write:
echo (1 + 2) (3 * 4)
which is, at least, clearer in intent!
It’s not uncommon to have multiple clauses in a logical statement, drawn out over multiple lines for clarity:
if (this and
that and
the-other) \
...
So I feel that if the last word on a line is an infix operator then the expression is assumed to continue on the next line.
That said, I’ve gotten quite used to writing the more Scheme-ly:
if (and this
that
the-other) \
...
but the trailing operator trick still stands.
Operator Overloading¶
“Operator overloading” is a fan favourite in other languages – which is another way of saying it is heavily controversial.
Think + is just about adding integers together? It’s common enough to appear as string concatenation and so I guess people would be happy enough to see it used for any kind of append operation (lists, arrays, hashes(?)).
Our infix operators are blind to your types, though. They simply massage lists into another form. You’ll be wanting function overloading which means Generic Functions.
Operator Summary¶
I think this reader macro trick has some mileage.
Lexical Blocks¶
If you’re a Schemer, you’re quite used to:
(let ((a 1))
(+ a 1))
whereas others would be more at home with a more ALGOL-ish:
{
a = 1
a + 1
}
where { starts a lexical block in which we can introduce lexically scoped variables. They would likely all be introduced as a let* or letrec type as that’s the sort of behaviour non-Schemers would expect, where we can have one variable derived from another:
{
a = 1
b = a + 1
odd? = function ... even? ...
even? = function ... odd? ...
}
Additionally, the lexical block has an implied begin meaning the last calculated value is the one to be returned by the block.
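For example, extending the sketch above – the last expression’s value is the block’s value:

{
  a = 1
  b = a + 1
  b * 2         ; the block's value is 4
}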
The reader is intent on matching bracket-type things so will read multiple lines – the lexical block’s body – to get the closing }.
They sound quite handy for Functions.
Assignment¶
I like the idea of define to force the user to declare names – and we’ll see the (pedantic Schemely) reasons why later – however I think we ALGOL-types generally prefer the = style.
I was thinking of some way to avoid the =/== mistakes prevalent in C:
if (a = 1) ...
is probably not what you want, and I started off thinking that a Pascal-style := would come in handy. We can also make it an infix operator:
[confident narrator voice] …we have the technology…
a := 1
...
which would be transformed by the reader into (:= a 1) and then the evaluator can introduce the variable a giving it the value 1.
Technically, in the underlying Scheme-ish engine that’s going to be a define or a let. In a lexical block, for example:
{
a := 1
...
a
}
would get transformed into:
(let ((a 1))
(begin
...
a))
in other words, after any := statement, the entire rest of the lexical block becomes the body of the implied let such that the let*-ish:
a := 1
b := a + 2
...
b
would get transformed into:
(let ((a 1))
  (let ((b (+ a 2)))
    (begin
      ...
      b)))
which, I think, works OK.
I have, however, failed in my visual-distinction task in that to “modify” a we use regular =:
a := 1
...
a = a + 1
Noting that the reader will transform a + 1 into (+ a 1) first as + has a higher infix operator precedence than the =:
(= a (+ a 1))
Function calls¶
Function calls in assignments come out in the wash, here, as:
a = func sol bruh va
is transformed into:
(= a (func sol bruh va))
and (func sol bruh va) is a regular evaluable form returning its result to the assignment operator, =.
We are still stuck with the single word feature as described above so you need to type:
a = (func)
if you’re not passing any arguments. *shrugs*
Top Level Assignments¶
Defining variables at the top level, ie. outside of a lexical block, comes in a couple of forms.
define has two forms itself:

define variable value
define (func-name formals+) body
or you can use one of the assignment infix operators:
variable := value
function-name := function (formals+) body
The define form for functions is much cleaner but the assignment variant is used liberally when re-defining an existing function.
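As a concrete (sketch) pairing of the two styles – answer and add1 are made-up names:

define answer 42
define (add1 n) {
  n + 1
}

; or, equivalently, with the assignment operators
answer := 42
add1 := function (n) {
  n + 1
}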
Non-lexical Variables¶
We want to get to handling environment variables seamlessly which I’ve been suggesting are a kind of tagged dynamic variable.
Dynamic Variables¶
Dynamic variables “live on the stack”, that is to say that their existence is dependent on the code path you have run and as your function call hierarchy unwinds the dynamic variables disappear. Access to the variable is more work because you need to run back up (down?) through the stack looking for your transient variable.
(This idea is reused – probably a bit too much!)
Nominally, there’s the dynamic-let call which introduces a dynamic variable (onto the stack) and starts processing:
(dynamic-let ((X 10))
(foo))
Normally, some function calls deeper, you would call (dynamic X) to get the value of X, your dynamic variable.
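A minimal sketch of the pair in use – foo here is a made-up function:

; foo reads the dynamic variable X from the stack
define (foo) {
  (dynamic X) + 1
}

(dynamic-let ((X 10))
  (foo))                ; => 11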
Obviously, you don’t want to be asking for a dynamic variable in some random bit of code on the off-chance. You’re meant to know what you’re doing!
This evaluator trickery doesn’t work quite as cleanly as we’d like. Dynamic variables effectively become top level variables.
Digressing a little, there is a mechanism in the evaluator to keep track of variable names which can record the lexical or dynamic nature of a variable – by remembering what kind of form introduced it. Subsequently, if we were to reference X we would know it was a dynamic variable and can therefore replace the variable reference, X, with (dynamic X) – essentially to provoke the stack-walking mechanism – and all is good.
Back to our variable initialisation.
We have := for lexical assignments, how about :~ for dynamic variables? ~ representing the maybe, maybe not dynamic nature of the beastie.
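Pencilled-in syntax only, but the earlier dynamic-let sketch might then become:

{
  X :~ 10        ; dynamic, not lexical
  (foo)          ; foo's plain reference to X is rewritten by the
                 ;   evaluator into (dynamic X)
}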
Environment Variables¶
Environment variables are much more shell-ish. I’m suggesting we want to implement them in a similar fashion, as an “environment” variable with a dynamic nature. They are different to dynamic variables in that whenever a program is executed it will have an environment created which is built from any extant “environment” variables.
Look, it’s just a little bit of technical debt that, when we get round to it, we can apply the extra experience we’ll have picked up in the meanwhile to do a better refactoring job.
What’s not to like?
It could be implemented by tagging some dynamic variables as “environment” variables or it could be implemented as an entirely parallel and separate set of dynamic variables. *cough*
The reason for this dynamic nature is I want to be able to say:
{
PATH := path-prepend PATH /some/where/else
do stuff
}
Here, for the duration of this lexical block, ie. whatever effect we create should be unwound at the end of the block, I am creating a new PATH variable which should be used by anyone looking up the PATH environment variable if, say, they want to find an executable.
After this lexical block people can find the old value.
I’m thinking :* here, with * signifying the stars, the environment surrounding us!
So that should have been:
{
PATH :* path-prepend PATH /some/where/else
do stuff
}
You might ask why didn’t I just modify the value of PATH? Well, modifying it means that everyone after this lexical block will see my transient changes unless I ensure that I can unwind the changes manually before I’m done.
Even on error and in the face of continuations? Hmmm, “tricky.”
Of course, modifying PATH, say, for everyone following is perfectly normal; we’re just covering the variable assignment prefix case of:
PATH=/some/where/else:$PATH do stuff
Un-setting Transient Variables¶
There is another corner case where you might want to unset such dynamic variables, possibly transiently, and almost always for environment variables.
For that we need to add a stack marker that says: stop looking any further and return “failed to find”.
We need some kind of indication that we’re stomping on the normal way of things so I’m penning in !~ and !* for dynamic and environment variables respectively.
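Pencilled-in syntax again, so a sketch only – LD_PRELOAD is just an illustrative choice of environment variable:

{
  LD_PRELOAD !*        ; a stack marker: lookups of LD_PRELOAD now
                       ;   report "failed to find" within this block
  do stuff
}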
Computed Variables¶
There’s another class of shell variables we want to emulate. Remember SECONDS which returns us the number of seconds the shell has been running for? (There’s $^T in Perl as well.)
There’s clearly a bit of magic there where the simple act of accessing a variable has resulted in a (hidden) function call.
I’m calling these computed variables (not least because many others have done so before).
It would be neat if we could allow the user to create these.
SECONDS, mind, is probably one for the language implementer as it requires something in the language bootstrap to start the clock rolling!
There’s another twist for computed variables. They might be read-only, like SECONDS, for which it makes no sense to assign a value. They might be write-only, like srandom(3), the seeding mechanism for random(3), where it defeats the purpose to get back that secret seed value. They might be read-write, like the shell’s RANDOM (which combines the behaviour of both srandom and random).
If the user is defining a computed variable then they must pass two parameters to the initialisation: a getter and a setter. If you want it to be read-only pass #n (aka. nil) for the setter. Pass #n for the getter for a write-only variable. Passing #n for both should result in an error – don’t be annoying!
As for the infix operation/function name, try :$ for size!
We can then rustle up something like:
getter := #f
setter := #f
{
p := 0
getter = function () {
p = p + 1
p
}
setter = function (v) {
p = v
}
}
cv :$ getter setter
printf "%d %d\n" cv cv
cv = 10
printf "%d %d\n" cv cv
which should display:
1 2
11 12
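Conversely, following the rules above, a read-only computed variable would pass #n for the setter – a sketch with made-up names:

answer-getter := function () {
  42
}

ANSWER :$ answer-getter #n

ANSWER               ; 42, via the getter
ANSWER = 10          ; an error: no setter was supplied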
Functions¶
Of course, functions could change very little from how they exist now:
function foo ()
{
echo $(( $1 + 1));
}
other than a little textual transformation into, say:
define (foo a) {
b = a + 1
b
}
the only useful difference being that we have formal parameters.
Notice too the start of a lexical block, {, on the end of the first line of the function declaration. That means our line-oriented reader will continue reading through to the matching } thus creating the function body.
If it wasn’t on the end of that line:
define (foo a)
{
b = a + 1
b
}
would be a semantic error as define (foo a) was read and deemed by the line-oriented code reader to be a whole expression. But it is a function declaration with no body and so is an error. The remaining lexical block is, well, just a lexical block. That’s legal although you might get complaints about this new variable a that has appeared.
I like the { on the end of the line as it is, by and large, the way I write code anyway. Others will, no doubt, be very angry.
Functional Block¶
Ruby and Swift both allow a block to take formal arguments thus allowing an alternate form of creating anonymous closures:
{ |n| n + 1 } ; Ruby
{ (n) in n + 1 } ; Swift - could have said { $0 + 1 }
The more you’re used to seeing it the easier it is to scan.
I’m not so tied to the idea, although anonymous function declarations are quite wordy. The define above is re-written internally to:
define foo (function (a) {
b = a + 1
b
})
with the extra parentheses around function ... to create the function value – otherwise it is just a string of words giving define too many arguments!
What the Ruby and Swift variants are doing is getting rid of the function keyword and shifting the formal arguments inside the block.
Pipelines¶
We’re fairly happy, I think, that we should have regular shell-ish pipelines:
zcat file | tar tf -
I’ve not implemented it yet – partly because & is used for pairs – but you can imagine a form of postfix operator, &, to background a pipeline, like the shell:
zcat file | tar tf - &
Previously I said I was going to reserve the C/shell logical operators && and || for nefarious purposes.
Again, not implemented, but I’m thinking that || is more of an object pipeline – so, like PowerShell and friends – we might be able to create some functional composition/comprehension/cascade (there must be a proper term) where the value returned by one function call is the argument to the next:
func args || f2 || f3
although you immediately think that the later functions should be allowed to have other arguments themselves in which case you’d need some symbolic argument for the value being passed down.
Alternatively, f2 and f3 could be the result of currying themselves and only take a single argument.
Perl used to use $_ for the anonymous value so our equivalent would be something like:
func args || f2 a1 _ a2 || f3 _ b1 b2
However it might be done, you can recognise that it is a straightforward enough transformation into a nested call:
(f3 (f2 a1 (func args) a2) b1 b2)
On a different tack, && might be used to “kick off in the background” a thread (should we have any):
keep-an-eye-on stuff &&
This double-punctuation character thing might have some mileage.
How about >> to collect “output” in a string? Of course, >> is the append variant of > for (pipeline) IO redirection. That’s a bit unfortunate.
I appear to have lost my reference but I thought I’d seen some alternate IO redirection forms, eg. >+ might be used for append and >= with an offset might be used to fseek(3) somewhere in a file before writing.
Traps¶
We need to be able to handle errors and exceptions and we’ll use the broad idea of Scheme’s conditions. We want to trap a condition.
A condition (a state of being of the program, if you like) is the parent of errors and exceptions (and other derived conditions, of course). For no particularly strong reason I wanted to distinguish between errors and exceptions. To some degree it doesn’t really matter as they are all conditions and all get managed in the same way. The distinction is largely taxonomic and poorly implemented as I look back now and find that I’ve called everything something-something-error.
An error is something that cannot be fixed without rewriting the source code:
pair 1
is an error because no matter how many times you run it or whatever external/environmental changes you make, pair takes two arguments and you’ve only given it one. It is unrecoverable in any sense other than to edit the source code.
Accordingly, for an error, it is not possible to continue with this thread of processing and the engine will revert to some safe place. The user cannot trap it.
An exception is any kind of transient condition, usually outside of the program’s control:

- if a system call goes wrong we’ll get an exception raised (by the C code wrappering the call) indicating errno
- if you fail to open a file – permissions, disk issues (inodes and data blocks in Unix), etc.
- if you try to get the string-length of a number the function will raise an exception.
Some of those are genuinely transient in that you can, say, fix the file system outside of the (running?) program to allow your file operation to go ahead. Some of them are effectively permanent if you are passing the wrong kinds of parameters to system calls or functions but are still classed as exceptions as a subsequent run through the code could have changed the values the parameters are bound to and the system/function call might now succeed.
Bash traps signals, mostly, but also has a pseudo-signal, ERR, representing the failure of a simple command. We can do the same but be a bit more profuse with the kinds of errors (er, exceptions!).
Actually, it’s also pretty reasonable to throw in some signal handling through the exact same mechanism just using a different part of the condition hierarchy tree.
Condition Types¶
Condition types are a hierarchy of structures with ^condition at the root. I chose to have a leading caret, ^, at the start of condition type names to suggest the handling of conditions “above” the normal operation of the code. I’m not massively taken by it on reflection and might engage in a code-wide edit.
An ^error is a child of ^condition – merely to start a clean tree.
^idio-error, a child of ^error, is the start of our tree of interest and introduces three fields:

- message – the nominal problem
- location – some indication, preferably relating to the user’s source code, of where the error occurred
- detail – any other pertinent information, for example, the value of a likely problematic variable.
We can now start deriving trees of condition types from ^idio-error, for example:

- ^read-error, with additional fields of line and position (byte index into the source handle – file or string!), being the common root of all reader errors.
- ^evaluation-error, with the additional field of expr, being the common root of all evaluation errors.
- ^io-error and its derivatives, primarily relating to files
- ^runtime-error, which has a wide tree of children including:
  - ^rt-parameter-type-error for passing the wrong type of value into a function, for example we can’t get the ph (pair-head) of a number.
  - ^rt-command-status-error for when external commands fail
- ^rt-signal for, er, signals, albeit this is derived from ^error directly as its asynchronous, out-of-band provenance means there’s no particular association with any message, location or detail that ^idio-error requires for its fields
And so on. There’s quite a few.
SIGCHLD¶
Just as food for thought, there’s some fun with SIGCHLD. When an external process completes (or is stopped, but that confuses this train of thought) we, Idio, will get a SIGCHLD which means we’ll have updated our volatile sig_atomic_t and, at an appropriate time, run an Idio interrupt handler – a condition handler being run out of turn – which will, almost certainly, be the code written for Job Control (because we’re a shell, remember).
If the external process exited non-zero then that code will raise an ^rt-command-status-error condition which the user can write a handler for.
Separately, the code that ran the external process will eventually unwind to return #f, if the process failed, which can be used by any of the conditional forms in the shell-ish way:
Idio> if (false) "worked" "failed"
job 95645: (false): completed: (exit 1)
"failed"
For many many years I have had a Bash PROMPT_COMMAND which has figured out the exit/killed status and printed something useful.
I feel that if I’m writing a shell, everyone should get that useful behaviour.
The job 95645: ... line is printed by the Job Control code separately to if being returned #f by the expression (false) – which was reworked into an invocation of the external command false which, er, exited non-zero.
if can then run its alternate clause which is the string “failed” which is what if returns and what the REPL prints.
Trap¶
Remember, first, that the Scheme model is to effectively replace the failing code with the handler, in other words, whatever the handler returns is what the original call returns.
We need to be able to indicate which kind of condition we are interested in, some handler code and the set of code this is going to be valid for:
trap condition handler form+
where:

- condition can be a single named condition type or a list of named condition types
- handler is a unary function which is given the condition instance if the condition type matches
- form+ is the set of code the trap is valid for
This gives us code of the likes of (using both an anonymous function and the { on the end of the line trick visually twice, albeit lexically once):
a := #[ 1 2 3 ]
trap ^rt-array-bounds-error (function (c) {
#f
}) {
array-ref a 4
}
where
(function (c) {
#f
})
constructs my handler – which could have been reduced to (function (c) #f) – and
{
array-ref a 4
}
is my body forms.
The slightly weird indentation of the body of the anonymous function is the result of my limited abilities with hacking an Emacs mode although I’ve come to rather like it. The body statements are two spaces indented from the word function.
Here in the body forms, array-ref a 4, we try to access the fifth element of a three element array, a. This will result in an instance of ^rt-array-bounds-error being created and raised which will look for a handler by walking back out through the installed handlers.
We happen to have installed an ^rt-array-bounds-error handler immediately beforehand (lucky break!) so this will be found and run. In this case the body of the handler is simply #f and as this handler is being run in place of array-ref a 4 then #f is passed to its continuation (not shown).
Now that might not be what you expect. Shouldn’t we have seen some error? Shouldn’t we have collapsed back to the prompt? What does my code do with #f?
Certainly in the case of the latter point, why are you asking me, you wrote the code! What you’re really asking is: how do I handle array bounds exceptions? Well, that’s up to you but you might want to test the value you just got back and compare it to #f.
#f isn’t a very good sentinel value anyway as, amongst other things, it is the default value for an array element! Rather, you might want to use a genuinely unique magic number courtesy of eq? and the uniqueness of values created on the underlying C heap:

a := #[ 1 2 3 ]
magic := pair 'magic 'value

trap ^rt-array-bounds-error (function (c) {
  magic
}) {
  v := array-ref a 4
  if (eq? v magic) ...
}

Here, magic the variable refers to a pair on the heap which cannot eq? any other pair on the heap (even if created with the same elements) because eq? does a C pointer comparison. So, armed with something unique, return that from your handler and compare against it. (Even better if magic is a lexically private variable.)

Others might have suggested that you check the size of the array first but at least this method flexes your algorithmic muscles.
We can change the handler to print a message and get someone else to sort it out:
a := #[ 1 2 3 ]
trap ^rt-array-bounds-error (function (c) {
eprintf "Dun goofed!\n"
raise c
}) {
array-ref a 4
}
(no surprises that eprintf printfs to stderr)
Now, the raise c expression (re-)raises c to whomever further out in the condition handler tree is willing to handle ^rt-array-bounds-error.
The chances are that no-one is except the system condition handlers which will splurge some message to stderr and restart processing from some safe place:
Dun goofed!
foo.idio:line 7: ^rt-array-bounds-error: array bounds error: abs (4) >= #elem 3: idio_array_ref
(The abs (4) refers to the idea that you can index an array from the end using negative numbers but either way is more than the array’s size, 3.)
We don’t have to (re-)raise c. Conditions are just things; we can create a different condition and raise that instead.
Syntactic Sugar¶
The basic idea behind trapping conditions works well enough but we’re lacking some syntactic sugar to do two things:
- we probably want something like the familiarity of Python’s try/except where there is a single body form but multiple handlers associated with it:

try:
    form+
except TypeA_Exception:
    handler1
except TypeB_Exception:
    handler2
which looks like it can be re-shaped into nested trap statements fairly easily:

trap TypeB_Exception (function (c) {
  handler2
}) {
  trap TypeA_Exception (function (c) {
    handler1
  }) {
    form+
  }
}
noting the ordering rework from lexical to nested!
- add trap’s continuation as something a handler can invoke to collapse the function call tree:

trap $condition (function (c) {
  eprintf "Giving up!\n"
  trap-return 37
}) {
  ...
}
<trap-return to here>
Not implemented.
Comments¶
Scheme has a rich set of comment mechanisms. So let’s do more.
Unfortunately, the normal scripting comment character, #, has been sucked into playing the wake-up call to the reader that a “funny” expression is coming. Even more annoying, the stock Scheme comment character is ;, which is a shell’s normal statement separator. Bah!

I can’t honestly claim to have spent too much time thinking about it. Things are different, get used to it.
Line¶
You can comment everything to the end of the line with ;.

There is some sense of styling when using this form of comment where a single ; is normally beyond the end of the text and is a line-specific comment. Usually indented to column 40 (or more).

A double semicolon, ;;, is usually indented to the same level as the block of code.

A triple semicolon, ;;;, is usually indented at column 0.
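An illustrative sketch of the three styles together:

;;; greeting machinery

;; work out the greeting before printing it
greeting := "hello"

printf "%s\n" greeting                  ; say it out loud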
S-exp¶
You can usually comment out an entire s-exp with #;, including multi-line s-exps: putting #; in front of, say, a multi-line expression that modifies a should avoid a changing value.

I’m not sure if all Scheme implementations support this.
Multi-line¶
Traditionally, Schemes support multi-line comments with #| through to |#. A commenting style that can be nested – making commenting out blocks of code containing already commented out blocks much easier. That’s a good idea.
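A quick illustration of the nesting:

#|
commented out
#| an already commented out block |#
still commented out
|#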
But this is where I want to differ. I like the idea of Donald Knuth’s literate programming although I’m unsure that it should be natural language interspersed with code snippets. I fancy it should be code interspersed with natural language – the sort of thing that can be contained in commentary. (I appreciate this isn’t quite the paradigm Knuth is suggesting but just let’s roll with it.)

However, what we need is something to extract those comments and pipe them into some documentation generation system, and that’s where a multi-line commentary system bracketed by something involving a | symbol says to me “pipe it out”. Albeit we’ve not identified, to what.

Suppose, then, we take the first line (of the #|) and it can describe the necessary documentation generation system and whomsoever is extracting the commentary can execute some appropriate system.

I’m being slightly vague there as, primarily, I don’t have any feel for what that documentation generation system might be. So, currently, #| is just a nested multi-line comment waiting… for Godot?

We should still be able to have regular nested multi-line comments, though, without the fear that they’ll be emitted to some external system? Of course. Let’s pencil in #* through to *#.

And what’s the point unless those two look out for one another? So you should be able to both embed literate commentary inside regular commentary and, of course, be able to comment out parts of your literate commentary.
Finally, some complexity. The nested commentary systems are a bit naive. They are only looking out for themselves and each other. You might be a bit unfortunate with a line comment inside the block whose text happens to contain *# here – with the predictably hilarious result that the reader will find the closing *# in the middle of the line comment and start processing again with the word here. Not brilliant.

To prevent that we need an escape character which, to no-one’s surprise, is \ – escaping the * in \*# stops a line comment’s stray *# ending the multi-line comment.

But wait, \ is my favourite character for my comments! <sigh>

OK, the first character after #* or #|, if it is a graphic character (ie. not whitespace), is the escape character. Given, say, % as that declared escape character, \ does nothing special whereas % will first escape a space character then escape the *, preventing the line comment ending the multi-line comment. The nominal result being ...\ *#....

No, you can’t use whitespace as an escape character!