Creating Variables and Values

So, three basic rules:

  1. whitespace is important

  2. one expression per line – remembering that the reader will keep going until it finds the matching brace, parenthesis, bracket or double-quote

  3. everything returns a value
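
As a small illustration of the third rule, a block is itself an expression whose value is that of its last expression. The following is a minimal sketch; the names x and y are only for illustration:

x := {
  y := 2

  y + 3
}

x            ; 5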

Also, Idio is a bit pernickety about things not working.

The memory used by Idio is Garbage Collected so that, by and large, you don’t need to be concerned about pro-actively allocating or freeing memory.

Of course, as with any programming language, if you maintain references to scarce operating system resources, like file descriptors through file or pipe handles, then you’ll eventually run out.

Variables

Variable names can contain lots of meaningful punctuation characters, which is why whitespace is important.

Declaration

Variables must be declared otherwise the system will think that you don’t know what you’re doing.

In a very formal way you would define variables:

define a 10

but there’s a more familiar infix operator, :=:

a := 10

That’s created a “top-level” lexical scope variable in the current module.

If you were inside a block, { ... }, then the declaration is local to the block:

a := 10

{
  a := 5     ; local to block declaration,
             ; occludes the top-level

  a          ; 5
}

a            ; 10

Environment Variables

As a variation on a theme, :* [1] creates an environment variable or, more precisely, a dynamically scoped variable tagged as an environment variable.

The environment part doesn’t do much, yet. Only when the system decides to run an external command will it gather up all the existing tagged-as-environment variables and create an environment for the command.

All existing operating system environment variables (environ(3P)) at startup are created as these environment variables. Just like a regular shell.

Rather than remembering the old value, assigning a new and then re-assigning the old value back, you might want to “re-create” an environment variable for the transient period of a block:

ls -l                ; /usr/bin/ls (probably)

{
  PATH :* "/danger/will/robinson"

  ls -l              ; /danger/will/robinson/ls (possibly, or not found)
}

ls -l                ; /usr/bin/ls (again)

Here, the “new” PATH occludes the “main” one (or any other similarly defined “new” versions) and lasts, like any dynamically scoped variable, until the end of the block it is defined in. Next time PATH is used we’ll be back to the original version.

Note

String interpolation, say, to prepend something to PATH, is a bit clumsy in Idio; see below.

Dynamic Variables

Similarly, :~ [2] creates a regular dynamically scoped variable.

For both dynamic and environment variables it is an error to try to use the variable outside of the scope of its definition.
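
By analogy with the PATH example above, a dynamic variable can be re-declared for the duration of a nested block. This is only a sketch and depth is a made-up name:

{
  depth :~ 1

  depth      ; 1

  {
    depth :~ 2     ; occludes the outer depth until the end of this block

    depth    ; 2
  }

  depth      ; 1 (again)
}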

Computed Variables

Similarly, :$ [3] creates a lexically scoped computed variable.

When you create a computed variable you define it with either or both of a getter and setter function, the getter taking no arguments and the setter one (the value you are setting it with).

Those functions can do whatever.

Subsequently, whenever the evaluator sees a reference to the variable it will replace it with a call to the getter and where it is being assigned to it will be replaced with a call to the setter.

If you don’t define a getter or setter then it is an error to reference or set the variable as appropriate.

var-get := #n
var-set := #n

{
  local-v := 19

  var-get = function () {
    local-v + 10
  }

  var-set = function (v) {
    local-v = v
  }
}

var :$ var-get var-set

var                          ; 29
var = -9                     ; local-v is -9
var                          ; 1

As a more practical example, Idio defines SECONDS as a read-only value which returns the number of seconds that idio has been running for. There is no setter.

printf "running for %ds\n" SECONDS

Nested Recursive Functions

As mentioned before, :+ [4] creates a lexically scoped recursively aware variable inside a block. Clearly this is only useful for a function.

You cannot use this at the top-level, use regular define.

Assignment

Changing a variable or, rather, changing the value a variable references could use the formal set! or its more familiar = infix operator:

a := 10

{
  a = 7      ; change top-level

  a          ; 7

  a := 5     ; local to block declaration from now on,
             ; occludes the top-level

  a          ; 5

  a = 3      ; the local a is updated

  a          ; 3
}

             ; back to using the top-level

a            ; 7

Assignment works for variables of all kinds and scopes.
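
The formal set! form should behave the same as the infix operator. A minimal sketch, assuming set! takes the variable name followed by the new value in the same way that define does:

a := 10

set! a 7     ; the same as a = 7

a            ; 7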

Values

Variables reference values and only values have types.

Constants

#t and #f are the usual booleans.

Only #f is false and everything else is true. As a consequence, many lookup functions, in particular, return a useful value or #f so that you can test with the returned value and go.
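
A short sketch of that truthiness rule, using if with a condition, a consequent and an alternative:

if #f "yes" "no"     ; "no"
if #t "yes" "no"     ; "yes"
if #n "yes" "no"     ; "yes", #n is not #f
if 0 "yes" "no"      ; "yes", as is zero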

#n is a nil/null value. Its most common use is marking the end of a list.

A few other constants appear, in errors or at the prompt. They are all unusable in source code as there’s no mechanism to create them. They’ll have the form #<name>: #<unspec>, #<void>, #<eof> etc.

Numbers

There are two types of numbers, “small” integer fixnums or “arbitrary” precision bignums, which can be used interchangeably.

They look like regular numbers: -1, 1.23, 10e99 etc.

There’s a small caveat that, to avoid a clash with one of the infix operators, floating point numbers must have a digit before the decimal point.

There is no support for non-finite quantities such as NaN or infinities.

Characters

“Characters” is a loose term in Unicode/ISO10646 when you really mean code points. There are two ways of expressing a Unicode code point:

  • #U+HHHH where the number of hex digits, H, is “enough” and leading zeroes are not required:

    c1 := #U+127
    c2 := #U+0127
    

    both are U+0127 (LATIN SMALL LETTER H WITH STROKE)

  • #\x where x is the UTF-8 representation of the code point

    c1 := #\ħ
    

    is also U+0127 (LATIN SMALL LETTER H WITH STROKE)

    The usefulness of this second form depends on editors and font support.

Strings

Strings are an efficient array of unicode code points contained within double-quotes and using \ as an escape character.

Strings can be multi-line.

The usual escapes are:

  • ANSI C: \t, \n etc.

    s1 := "hello\nworld"
    s2 := "hello
    world"
    

    are the same (assuming there are no other non-printing characters in the second example, just the string extending across two lines).

  • Unicode escapes:

    • \uHHHH where there are up to four Hs

    • \UHHHHHHHH where there are up to eight Hs – albeit only six should ever be required for Unicode code points.

  • byte escapes: \xHH with up to two hex digits

    This exists to create pathnames with non-UTF-8 byte sequences.

    Idio strings with ASCII NULs in them are fine, "hello\x0world", but operating system interfaces will not be so happy. A “format” error will be raised if you forget and pass such a string to those functions that might try to pass it to an operating system interface.
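
As a brief sketch of the Unicode escapes, and assuming your editor and font can show the literal character, the following strings should all contain the single code point U+0127:

s1 := "\u0127"
s2 := "\U00000127"
s3 := "ħ"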

Pathnames

Pathnames are a specially tagged string. They exist because filesystems don’t have an encoding associated with them and are strictly a sequence of bytes (excluding 0x00) and so “strings” returned from the filesystem need to be flagged as different.

It’s a bit awkward but mostly correct.

Problems arise if you try to compare your UTF-8 encoded source code string with the byte-sequence string from the filesystem.

Of course, you can create pathnames in source to do such comparisons with the %P{...} expression:

utf8-name := "© 2021"
file-name := %P{\xA9 2021}

Here, utf8-name is seven bytes long as the UTF-8 encoding of U+00A9 (COPYRIGHT SIGN) is 0xC2 0xA9 whereas file-name starts with 0xA9 directly (and is invalid UTF-8).

Hint

If an operating system interface is queried for a filename, Idio will always return the filename tagged as a pathname.

Octet Strings

A final variant is an octet-string which is a sequence of bytes including ASCII NULs. It has a %B{...} reader form.

Mixing Strings

You can append-string strings together and join-string strings with a delimiter but be careful as mixing string variants will result in a gracefully degraded result: unicode to pathname to octet-string.

String Interpolation

There’s no “trivial” string interpolation in Idio, we need to do a little more work than the common shell-like embedding of variables, "/danger/will/robinson:${PATH}", although the expression to be expanded can be more useful:

{
  PATH :* #S{/danger/will/robinson:${PATH}}

  ...
}

where the #S{...} construct is looking for embedded ${expr} expressions. The expr is a proper expression although it will commonly be a variable name.

Whatever the result of evaluating expr, it will be converted to a string (if not one already) and the various snippets of string joined together.
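
For instance, a number is converted on the way in. A minimal sketch, where count and msg are made-up names:

count := 3
msg := #S{found ${count} files}

msg          ; "found 3 files"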

It’s very useful for code generation.

Symbols

Symbols, “names”, perhaps, are a type in their own right. They are normally used as variable names, that is, references to values, but are quite handy as flags.

In a small number of cases you might need to quote the symbol to prevent the evaluator resolving a variable name to a value when you actually want the variable name:

a := 10

printf "%s is %d\n" 'a a

Keywords

Keywords are, essentially, symbols that start with : and are used as semantic flags: indicating an optional parameter to a function, say.

Handles

Handles are used for I/O and come in a few flavours: the obvious file handle, and the less obvious pipe handle, which together are file descriptor (FD) handles. There are also string handles which act like file handles but use memory rather than the file system.

You don’t create a handle directly but, rather, create one of the flavours then largely continue using generic handle functions.

File Handles

open-input-file, open-output-file etc. The usual sorts of interfaces.

Pipe Handles

Pipe handles tend to be created as input to or output from an external command. They are notionally identical to file handles except you can’t seek on them.

String Handles

open-input-string returns an input handle based on the supplied string. You can read and seek about in it like a file handle.

open-output-string returns an output handle where the backing store is memory. Write to it like an output file handle.

To get the resultant string back you need get-output-string which returns whatever you wrote. Compare with using, say, cat, to get back the contents of the file you just wrote.
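
Putting those together, a hedged sketch of writing to a string handle and reading the result back. It assumes a handle-aware formatted print function, here hprintf taking the handle as its first argument, and that a function of no arguments is called by wrapping it in parentheses; osh is a made-up name:

osh := (open-output-string)

;; hprintf is an assumption, see above
hprintf osh "hello %s\n" "world"

get-output-string osh        ; whatever was written to osh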

C Data Types

For interaction with libc, the standard library, and C extensions, Idio supports the fourteen C base types and C library-oriented typedefs thereof and C pointer types through the C module.

There is limited manipulation of C data types, they are generally passed around opaquely. You can do some comparisons and some limited arithmetic of identical types – there’s no implicit casting between types. The general idea being that you ask the C API for some value which you can pass on to another C API.

The C pointer types can be tagged which, with appropriate support, means you can access fields of C structs reasonably easily:

sb := libc/stat "."
sb.st_ino            ; 69256525

Here, sb is a C/pointer type (and, specifically, a “CSI” tagged as libc/struct-stat) and the st_ino field is the portable C ino_t type.

You can query the C/pointer with C/pointer-name and C/pointer-members.

The member names are operating system-dependent and may include additional members (eg. st_atime for a libc/struct-stat which is likely to be a #define to st_atim.tv_sec) if supported.

Idio> C/pointer-members sb
(st_dev st_ino st_nlink st_mode st_uid st_gid st_rdev st_size st_blksize st_blocks st_atim st_mtim st_ctim st_atime st_mtime st_ctime)

Going deeper, you can query what an ino_t really is on your system in a couple of ways:

Idio> type->string sb.st_ino
"C/ulong"
Idio> libc/ino_t
ulong

  • type->string will return a representation of the C base type

  • there is a variable named after the module’s typedef which has the (symbolic) value of the C base type which is useful for creating instances of that type with C/integer->:

    C/integer-> 2 libc/ino_t
    

    for those people who feel confident about their inodes…

Functions

Functions are first class values in their own right. It is normal for them to be returned from function calls or blocks like any other value.

Declaration

The most common definition form is:

define (add a b) "
add `a` to `b`
...
" {
  ;; calculate a result in a local variable
  r := a + b

  ;; return the result
  r
}

define is playing a little trick and is actually creating an anonymous function value under the hood:

define add (function (a b) "
add `a` to `b`
...
" {
  ;; calculate a result in a local variable
  r := a + b

  ;; return the result
  r
})

The anonymous function, here, is wrapped in parentheses otherwise define would be being given (far) too many arguments.

Just to re-iterate: define is define var val where val is a single expression like "hello" or (2 * 3) or, here, a function value definition, which itself is function formals [docstr] body and which, because define wants a single expression, is wrapped in parentheses like the multiplication.

We can play this anonymous function trick ourselves with (a variant not bothering with local variables or documentation):

add := function (a b) {
  a + b
}

Here, everything to the right of the declaration, :=, is given over to the evaluator which recognises it as a function declaration: function formals [docstr] body.

We can even go one step further and have the function returned at the end of a block – because everything returns a value:

add := {
  n := 10

  function (a b) {
    a + b + n
  }
}

This is a bit more interesting as the block also defines a local variable, n, which the function is using in its calculation. Nothing else can see that variable, it is entirely private to this function.

You can only return one thing at a time so there is a final trick if you want two or more functions to share such a private variable. Here, you recall, we can assign values to top-level variables inside the block:

add := #f
sub := #f

{
  n := 10

  add = function (a b) {
    a + b + n
  }

  sub = function (a b) {
    n - (a + b)
  }
}

Here, whilst the top-level variables add and sub are re-assigned inside the block, the block itself returns #<unspec>, a “there is no sensible value to return” value – not that anyone cares as no-one is using the value returned from the block, it is dropped on the floor.

At this point, you might think that with n as a fixed value things are pretty limited. But that block looks very much like the blocks being used as the bodies of the functions. Functions take parameters, so n could have been a parameter. Call a function create-add-sub with some value, x, and the functions add and sub (technically, the function values the add and sub variables reference!) now use x in their calculations.
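
A minimal sketch of that idea, using only the forms shown so far (create-add-sub and x are the names suggested above):

add := #f
sub := #f

define (create-add-sub x) {
  ;; re-assign the top-level variables, capturing x
  add = function (a b) {
    a + b + x
  }

  sub = function (a b) {
    x - (a + b)
  }
}

create-add-sub 10

add 1 2      ; 13
sub 1 2      ; 7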

Nested Functions

Now we’ve got the idea that we can create, that is define, functions inside other functions, we have nested functions. That is, we don’t have to save them out to the top-level we could just be using them as helper functions within the body of the outer function.

There’s nothing to stop the helper functions having helper functions inside them. You could go a bit wild, here, but try to think of the person who has to maintain your code.
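
A small sketch of a helper function nested inside another function; sum-of-squares and square are made-up names:

define (sum-of-squares a b) {
  ;; helper, only visible inside sum-of-squares
  define (square n) {
    n * n
  }

  (square a) + (square b)
}

sum-of-squares 3 4   ; 25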

Nested Recursive Functions

Corner case alert!

Free variables are the problem here: these are variables which are referenced inside a block of code that does not contain their definition:

define (foo a) {
  a + n
}

n isn’t defined anywhere (obviously) so we assume that it is defined in our top-level or the top-level of one of the modules we import. That seems reasonable, what else can we do? When we come to use n we’ll shuffle about looking for an n in our top-level or the top-level of one of the modules we import.

What if we want to call ourselves?

define (foo a*) {
  if (null? a*) #n {
    foo (pt a*)
  }
}

(this function doesn’t do anything useful other than walk along a list to the end and return #n)

We know that is going to be rewritten to:

define foo (function (a*) {
  if (null? a*) #n {
    foo (pt a*)
  }
})

and, as the precursor to defining foo, the evaluator will see the function value creation:

function (a*) {
  if (null? a*) #n {
    foo (pt a*)
  }
}

which contains the free variable foo. Hmm, we haven’t defined foo yet but we’re trying to use it.

Well, we sort of get away with this as when we come to use the function value we’ll find foo in our own top-level and promptly call ourselves, which is what we want.

Technically, though, you could subsequently redefine foo and we’ll get the new foo instead. Which might not be what you want.

This is even worse inside a block as names disappear and when the code is called there’ll be no foo at the top-level. Oh dear.

{
  ...

  define (foo a*) {
    if (null? a*) #n {
      foo (pt a*)
    }
  }

  foo '(1 2 3)
}

There is an answer to all of this pernickety nonsense with what is called letrec in Lispy languages. Inside a block, any use of define is rewritten as a call to letrec.

letrec plays a little trick like the one we played above where we defined add and sub as #f outside the block and re-assigned to them inside the block. That means that the evaluator can see a use of the variable in its lexical scope and can therefore reference that instance rather than guessing that the variable is probably defined at the top-level somewhere.

Inside a block, letrec has a :+ infix operator:

{
  ...

  foo :+ function (a*) {
    if (null? a*) #n {
      foo (pt a*)
    }
  }

  foo '(1 2 3)
}

Pairs and Lists

A pair is, unsurprisingly, an object that references two things, a head and a tail. You’d create one with pair head tail.

The head and the tail can reference any value but the most common structure is for the head to reference something and the tail to reference another pair. That second pair’s head is likely to reference a similar kind of thing as the first pair’s head and the second pair’s tail to reference another pair. And so on. The final pair’s tail will reference #n.

That arrangement of pairs is called a list and occurs pretty frequently. So frequently that there’s a couple of ways in:

l1 := pair 1 (pair 2 (pair 3 #n))
l2 := list 1 2 3

l1 and l2 are identical constructions. Most people would prefer the latter construction if you have all the elements to hand but if you’re building a list piecemeal then you’ll see a lot of recursive loops using pair.
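
As a sketch of the piecemeal style, the following builds a new list with pair while walking an existing one. It assumes ph returns the head of a pair in the same way that pt, used elsewhere in this document, returns the tail; double-all is a made-up name:

define (double-all l) {
  if (null? l) #n {
    ;; new head, then recurse on the tail
    pair ((ph l) * 2) (double-all (pt l))
  }
}

double-all '(1 2 3)  ; (2 4 6)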

The value the head references doesn’t have to be as simple as an integer, it could be a list itself, a hash table, a ….

In the case of the head being a list, it is commonly called an association list:

l3 := '((#\a "apple" 'fruit)
        (#\b "banana" 'fruit)
        (#\c "carrot" 'vegetable))

where #\a is the key “associated” with the list (#\a "apple" 'fruit). Functions like assq etc. look for a key, the #\a, and return the associated list if found.
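
A hedged sketch of such a lookup, reusing l3 from above and assuming assq takes the key followed by the association list. Because assq returns #f when nothing is found, the result can be tested directly:

entry := assq #\b l3

if entry {
  printf "found %s\n" entry
} {
  printf "no such key\n"
}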

Arrays

An array is an integer indexed collection of references to values starting at 0 (zero). You can create an array with make-array or by using a simple static initialiser:

arr := #[ 1 2 3 ]

(It’s the reader creating the initial array, here, and it doesn’t understand variables so simple numbers and strings only.)

or something a bit more flexible:

arr := array this that the-other

(array isn’t really a constructor per se but just calls list->array on your behalf.)

They are dynamically allocated but you can only access those elements that have been placed. You can’t expect to create an array of one element then try to access the fourteenth.

You can grow them by pushing elements onto the end with array-push! or front with array-unshift!. The array will grow by one element.

array-pop! and array-shift! will shrink the array by one element.

You can use negative indices which become array-length + index such that -1 gets the last index, -2 the second last index, etc. Obviously, if array-length + index is actually negative you’ll get a range error.
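
A short sketch of growing, shrinking and indexing an array. It assumes array-ref takes the array followed by the index and that array-push! takes the array followed by the new value:

arr := array 1 2 3

array-push! arr 4    ; arr now has four elements

array-ref arr 0      ; 1
array-ref arr -1     ; 4, the last element

array-shift! arr     ; removes the first element

array-ref arr 0      ; 2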

You can iterate over the array with array-for-each-set and fold-array.

Hash Tables

A hash table is an indexed collection of references to values. The index can be any value except #n.

So, want to use function values as an index to something? Go right ahead.

Test for a hash table with hash?.

You can create a hash table with make-hash or by using a simple static initialiser:

ht := #{ (#\a & "apple") (#\b & "banana") }

(It’s the reader creating the initial hash table, here, and it doesn’t understand variables so simple numbers and strings only and single values for the head and tail defining each key-value pair.)

or something a bit more flexible (noting the reader form ultimately calls alist->hash):

ht := alist->hash '((#\a "apple")
                    (#\b "banana"))

(which doesn’t quite create the same values as the previous example as here the hash values are lists of one string)

make-hash lets you specify your own equivalence and hashing functions.

Use hash-ref and hash-set! to access and create/override entries and hash-delete! to delete entries. Use hash-exists? to test whether a key is present.

Use hash-keys and hash-values to get the set of keys and values.

You can iterate over the hash with hash-walk and fold-hash.
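
A minimal sketch of those accessors. It assumes make-hash can be called with no arguments and that each function takes the hash table as its first argument, followed by the key and, for hash-set!, the value:

ht := (make-hash)

hash-set! ht #\a "apple"
hash-set! ht #\b "banana"

hash-ref ht #\a              ; "apple"
hash-exists? ht #\c          ; #f

hash-delete! ht #\b

hash-exists? ht #\b          ; #f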

Structures

A struct is a named indexed collection of references to values.

Names as in symbols which means you might need to be careful and quote the field name if someone has created a variable that uses your cunningly named n field.

There are two parts to structs, a type which declares the field names and instances of that struct type.

Formally, structs have a parent type leading to a graph of relationships between structs as in the condition types hierarchy.

If you define a structure you can control how it is printed.

Modules

A module allows you to create namespaces.

The mechanism is simple and you can only use true names, not variables:

module name

export (func1 value2)
export func2

import other-module

...

provide name

With the module’s source code in name.idio.

Any names other than the ones you export cannot be accessed by any importing module. They can’t see your top-level n variable or func function unless you export it.

It’s all-the-names-or-nothing on the import front, at the moment.

Some modules are quite polluting, the standard library, libc, defines names that clash with regular shell commands, for example.

Instead, you can pick and choose “direct” names, say, libc/mkdir, using the combined module/name scheme, rather than get every libc name with a crude import:

module Foo

mkdir "foo"          ; /usr/bin/mkdir (probably)

libc/mkdir "bar" (C/integer-> #x644 libc/mode_t)

libc is an awkward example as it is used by the bootstrap and therefore has already been loaded by someone. In general, if you want to use a name from a module, require the module first:

module Foo

require json5        ; Foo's namespace not modified but
                     ; json5's names can be used directly

define (parse-file fn) {
  json5/parse-file fn
}

; no possible conflict in names
parse-file "foo.json"

Last built at 2026-01-04T22:40:03Z+0000 from da47fd3 (dev)