.. include:: ../global.rst ************* Scheme Basics ************* .. aside:: Hey, buddy, some of us *aspire* to writing an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp! .. epigraph:: Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. -- Greenspun's tenth rule Obligatory, relevant XKCD: .. image:: https://imgs.xkcd.com/comics/lisp_cycles.png :target: https://xkcd.com/297/ Why :lname:`Scheme`? :lname:`Scheme` is incredibly simple so should be easy to understand the implementation of (famous last words) but has a sophistication that rivals any other programming language and surpasses most *[citation needed]*. If it's that good, then why haven't :lname:`Scheme` (and any other Lisp-like language, say, :lname:`Common Lisp`) taken over the world? Almost certainly the first reason is that it looks too alien for our ALGOL_ accustomed eyes -- Lisps are *L*\ ots of *I*\ rritating *S*\ tupid *P*\ arentheses, as the old gag goes -- but it might be that it is too different in other ways. We shall see. .. sidebox:: The language's name was :lname:`Schemer` but fell victim to the original implementation's file system limits of six character entries. :lname:`Scheme` is *a* :lname:`Lisp` -- there are lots of :lname:`Lisp`\ s -- with a distinction that the core consists of half a dozen *forms* which, combined with :lname:`Scheme`'s use of :term:`homoiconicity`, allows you to build anything else. .. aside:: Whilst lines can be quite long in :lname:`Brainfuck` I suspect this is one shout we won't see. This ease of implementation allows for many shouts of "I've implemented :lname:`Scheme` in 50 lines of :lname:`Brainfuck`!" and more general use in academia as a pedagogical tool. The downside of that small core implementation means that, nominally, :lname:`Scheme` doesn't have arrays, hash tables, indeed, much of anything. People wanted some standard libraries and standardisation, for that matter, and so a series of :ref-title:`Revised Report on Scheme` then :ref-title:`Revised Revised Report on Scheme` (becoming :ref-title:`Revised`:superscript:`2` :ref-title:`Report on Scheme` or, more commonly, :ref-title:`R2RS`) through to the current :ref-title:`R7RS small`. :ref-title:`R6RS` caused something of a stink by departing from the minimalist feel resulting in the decision to split :lname:`Scheme` into large and small variants. Basic Operations ================ :lname:`Scheme` only really has two high level operations: a *reader* reads in source code and having de-/re-constructed it, much like a lexer, passes it on to the *evaluator*. The evaluator does all the heavy lifting. These two are generally called in a loop: #. read an expression #. evaluate it #. loop back to the start There is a variation on this for interactive sessions which looks like: #. read an expression #. evaluate it #. print the result #. loop back to the start Which give us the well-known acronym, :abbr:`REPL (Read Evaluate Print Loop)`. Reader ------ The reader isn't quite innocent, it is gifted with a few the power to do a few re-writes, usually trivial and accommodating the laziness inherent in all programmers. Importantly, though, it does no interpretation of the entities it sees, it really only wants to figure out distinct code blocks through matching parentheses and bundle them up into lists (of lists (...)). Evaluator --------- In the beginning, the evaluator looked at the lists of lists and figured out some meaning from them. Particularly the element in functional position and what kind of an element it is. For some forms it can apply some special treatment, for others it may do some syntactic transformations (which may alter the meaning of the expression!) before finally deciding to actually invoke some behaviour and do something useful. More advanced implementations might start performing some combinations of: * translation from :lname:`Scheme`-ish forms into intermediate representations * code analysis and subsequent optimisation * translation into pseudo-machine code for, commonly, a byte compiler. Home grown or more widely supported, eg. the JVM_. * generation of a compilable external language, eg. :lname:`C` (and subsequent compilation) * direct generation of host-specific machine code Instances running pseudo-machine code in a byte compiler (in a programming language *virtual machine*) might want to invest in a :abbr:`JIT (Just In Time)` compilation system. Syntactic Structure =================== Forms ----- Famously, :lname:`Scheme`, like all :lname:`Lisp`\ s, uses s-expression_\ s and its basic syntactic structure is a *form* which looks like: .. parsed-literal:: (*thing* *thing* *thing* ...) Where each *thing* can be another *s-exp* or an *atom* which, basically, is something that is not an *s-exp* (numbers, strings and so on). And... that's it. There is some syntactic sugar for expressions that are used a lot but, in essence, the syntactic structure of :lname:`Scheme` is a list (of lists (of lists (of lists (:socrates:`can you see what I'm doing, here?`)))). That said, *technically*, *lists* don't exist in :lname:`Scheme`. *Wait, what?* In truth, a *list*, such as ``(1 2 3)``, is syntactic sugar for a chained series of :ref:`pair `\ s. We'll get onto that. Atoms ----- Atoms are those things that are not *s-exp*\ s meaning they're numbers, strings and so on. The likes of: - booleans: ``#t`` and ``#f`` - numbers: ``-1``, ``23``, ``3.14`` etc. Let's take a moment to think about implementation. Numbers in the source code, like, ``-1`` and ``23``, should be looked as constructors for *values* representing the numbers -1 and 23. That they are constructors should be obvious (or distracting or confusing) as we don't know what a *value* is in the memory of our :lname:`Scheme` instance. They enter :lname:`Scheme` as a sequence of digits in the source code and emerge as a sequence of digits when printed out. What happened in between? A number could be represented by a :lname:`C` ``int``, say (hint: it almost certainly isn't). That doesn't tell us much about how big a number we can create (an 8, 16, 32, 64 or 128 bit integer?) and doesn't look like it'll hold a floating point number so something is going to have to make a decision on what *kind* of a number we are storing with the implication that there are going to be more than one. Unless we only deal with (small) integers. .. aside:: If only the clouds would keep still! :lname:`Scheme` also handles "large" numbers in the form of *bignums*. So, if you fancy counting all the atoms in the universe... So, all we can really say is that :lname:`Scheme` will consume the sequence of characters ``-`` and ``1`` or ``2`` and ``3`` or ``3`` and ``.`` and ``1`` and ``4`` and construct a value in memory such that various arithmetic operations can be performed using it and that that value can be printed out. - strings: ``"foo"`` etc. Again, a constructor for a value in memory representing the characters in between the double quotes. What implementation here? :lname:`C`-like ``NUL``-terminated strings? UTF-8? UTF-32? We don't know and shouldn't care so long as :lname:`Scheme` allows us to manipulate them with the usual string functions. - symbols: ``a``, ``foo``, ``user-name``, ``*ENV*``, ``arity+1`` etc. Names, if you like, used as both identifiers (which are used to refer to values or, in the :lname:`Scheme` nomenclature, they are *bound* to the value) and as symbolic (hah!) flags. There is a little curiosity with symbols, they are unique across the program in the sense that when you use the symbol ``x`` in one function and the symbol ``x`` in another function then the *symbol* is the same. There is only one symbol, ``x``, in the program. Clearly (hopefully!) they refer to different values as the functions are being evaluated but the symbol is the same. This means that comparing symbols -- though the circumstances when you might do that might not be obvious right now -- is as fast as any equality can be in the underlying implementation. In :lname:`C` that is going to be a pointer comparison. .. note:: Many punctuation characters are allowed in symbols which, depending on taste, allows for a more readable style. I, for one, like a hyphen instead of an underscore. :lname:`Scheme` is largely delimited by whitespace and parentheses which frees up a lot of characters to add meaning and colour to identifiers. There's no hard and fast rules but there are conventions, for example: * a predicate will usually have a ``?`` character appended so :lname:`Scheme` might have a function ``number?`` rather than ``isa-number`` (which itself is using punctuation, a hyphen, in its name). * a memory-modifying function will usually have a ``!`` character appended to make it clear you're up to no good: ``set!`` (for assignment) is the canonical example. Of course, that means that statements like the shell's assignment: .. code-block:: bash a=2 with no whitespace is a single symbol in :lname:`Scheme`. If :lname:`Scheme` had infix notation then we would be obliged to type: .. code-block:: bash a = 2 for an assignment. Of course, the shell demands that there is *no* whitespace in an assignment -- the very opposite! There are other data types: chars, vectors etc. which we'll come to when necessary. .. _pair: Pairs ----- A *pair* is ``(1 . 2)`` or ``(a . b)`` etc.. A simple tuple of any two other objects -- including other pairs -- where the source code and printed output representation contains a dot/period, hence the common name, a *dotted-pair*. A pair can be constructed with ``cons`` (short for, uh, *construct*): ``(cons 1 2)`` results in ``(1 . 2)``. ``(cons 1 (cons 2 3))`` results in ``(1 . (2 . 3))``. If the second element is another pair and the final element is (the well-known symbol) ``nil``, eg. ``(1 . (2 . (3 . nil)))``, then the source/printed representation can be reduced to a (more traditional looking) list: ``(1 2 3)``. This is such a common idiom that you can save some typing as the constructor for a list, say, ``(cons 1 (cons 2 (cons 3 nil)))`` can be simplified to ``(list 1 2 3)``. .. sidebox:: This form isn't used often but does show up to handle arguments for a variable-arity function, for example. If the final element is not ``nil`` then the collapsed format retains the final dot and becomes an *improper* list: ``(1 . (2 . 3))`` becomes ``(1 2 . 3)``. ``(1 . nil)`` will be a list containing a single element: ``(1)``. ``()``, the empty list is equivalent to ``nil``, so ``(1 . ())`` is also ``(1)``. The first element of a pair can be retrieved with ``car`` (CAR means *Contents of the Address part of the Register* from the IBM 704 that :lname:`Lisp` was first developed on): ``(car (cons 1 2))`` results in ``1``. ``(car (list 1 2 3))`` also results in ``1``. ``(car (list (list 1 2) 3))`` results in ``(1 2)``. The second element of a pair can be retrieved with ``cdr`` (*Contents of the Decrement Register*): ``(cdr (cons 1 2))`` results in ``2``. .. note:: The ``cdr`` of a list (not a pair) will be a list: ``(cdr (list 1 2 3))``, results in ``(2 3)`` or ``nil``: ``(cdr (list 1))`` results in ``nil`` (because ``(list 1)`` is really ``(cons 1 nil)``). Of course either element of a pair can be a pair (or a list) which allow us to build arbitrarily complex data structures, essentially, lists of lists (of lists (...)). ``((1 . 2) . (3 . 4))`` is not the same as ``((1 2) . (3 4))`` as in the latter case each of the sub-elements of the outermost pair is a proper list whereas it is a simple dotted-pair in the first. Delving into the depths of complex data structures then requires liberal use of ``car`` and ``cdr`` for which the ever efficient folk in :lname:`Scheme`-land have some shorthand: ``cadr`` is equivalent to calling ``car`` on the result of calling ``cdr`` on some value. Given the symbol ``value`` bound to the data structure ``((1 2) . (3 4 5))``: ``(car value)`` results in ``(1 2)`` ``(caar value)`` results in ``1`` ``(cdar value)`` results in ``(2)`` ``(cadar value)`` results in ``2`` ``(cdr value)`` results in ``(3 4 5)`` ``(cadr value)`` results in ``3`` ``(cddr value)`` results in ``(4 5)`` ``(caddr value)`` results in ``4`` Remember the ``cd*`` functions return a list if their argument is a list, not the first element of that list even if it is the only thing left -- of course, it isn't, there's a unwritten ``nil`` hiding at the end -- hence you require an extra ``car`` to get the element itself. Much of the interpretation of :lname:`Scheme` involves walking around such list data structures, you'll be pleased to know. Although when you see how neatly a recursive implementation makes it happen you'll nod sagely in appreciation. Mental Model ^^^^^^^^^^^^ In the above description for *pair* I wrote: ``(cons 1 2)`` results in ``(1 . 2)``. which is slightly disingenuous. ``(cons 1 2)`` *results* in a value in memory which, you can probably imagine, is going to be some data structure that ultimately has two pointers off to other things, here, things that represent the numbers 1 and 2. ``(1 . 2)`` is the *printed* form of that value but I think we are happy enough to use it as our mental model of that value in memory hence we can say that ``(cons 1 2)`` results in ``(1 . 2)``. Much like, however numbers are stored internally, we represent that implementation in our heads as 1. Homoiconicity Again ^^^^^^^^^^^^^^^^^^^ Back to ``(cons 1 2)``, that is *itself* a list, right? It's got that .. parsed-literal:: (*thing* *thing* *thing* ...) vibe going where the first value is the symbol ``cons``, the second value is the number 1 and the third value is the number 2. So, hang on. I can get the symbol ``cons`` by calling ``car`` on that list and the number 1 by calling ``cadr`` and... *Whoa!* ...the syntactic form of the language can be manipulated by the language itself. .. aside:: Wait, I think I can implement :lname:`Scheme` in 50 lines of :lname:`Scheme`! That's what they mean by :term:`homoiconic `. Etc. ---- Interestingly, :lname:`Scheme`, being quite old, is missing what many modern languages have an expectation of being normal. Hash tables, for example, are not a default :lname:`Scheme` data type. That leads many, if not most, :lname:`Scheme` examples to work without them, which they do quite well. In the meanwhile, things like hash tables are nice to have so a suite of "implemented in :lname:`Scheme`" examples have been created under the banner of *Scheme Requests For Implementations*, aka. SRFI_\ s. Evaluation Model ================ The evaluation model is equally simple: boolean, number and string values evaluate to themselves, symbols evaluate to the value they are bound to in memory and lists are treated as function calls where the first element is the function to invoke (often a symbol, ie. by name) and the remaining elements in the list are arguments to the function. .. note:: *Everything* returns a value. Some examples: .. code-block:: scheme 12 This will be interpreted as the constructor for a number value. Evaluation of the constructor returns a number value. Not much use as it stands! .. code-block:: scheme (number-add a 1) A list. The first element of the list is expected to be a function. In this case it is the symbol ``number-add``. Symbols, when evaluated return the value they are *bound* (ie. refer) to. .. note:: In :lname:`Scheme` it is assumed that the value returned by a symbol in the first position of a list being evaluated is a function, no particular type checking is done while the symbol is evaluated! Obviously, when :lname:`Scheme` tries to apply the function to the arguments then it will quickly discover whether the value is a function or not. The remaining elements are the arguments to the function, in this case another symbol, ``a``, and ``1``, a number (constructor). For these, the symbol will be evaluated and the bound value returned and the number will be constructed and the value returned. Finally, the function value (the result of evaluating ``number-add`` to get its bound value) will be applied to the argument values (whatever the value was that ``a`` was bound to and the number value of ``1``). Which is a slightly roundabout way of saying we're going to add 1 to the value being referenced by ``a``. Except that it isn't particularly roundabout. We've made assumption after assumption about how things work, or rather, you've just followed what been's said and accepted it as true. What happens if ``number-add`` is not a function, ``a`` is not a number, ``number-add`` doesn't take two arguments let alone two numbers, ``number-add`` (or ``a`` for that matter) is not defined anywhere (and whether we have any rights to know about it if it has been defined elsewhere)? Should we have evaluated the arguments before calling the function? Should we have evaluated the element in the functional position before or after the arguments? Should we have evaluated the arguments left to right, vice versa or in parallel (if we could)? Does any of that lot matter? Yes, of course. If a language is not well defined then implementers are at liberty to make decisions which means users will see different behaviour if they run their programs using different implementations. Evaluating arguments left to right (or vice versa) may have different side-effects and make or break the user's program. Not good. That said, ``number-add`` is not a standard function, ``+`` is: .. code-block:: scheme (+ a 1) Now, a famous bane of :lname:`Scheme` is that the function is *always* the first element of the list -- there are no infix or postfix operators. We are very used to seeing arithmetic written with infix operators, ie. ``a + 1``, and, for many people, there is a natural cognitive pause when reading prefix notation for arithmetic (and assignment). But that's bizarre, every other function call we make in the vast majority of programming languages uses prefix notation: .. code-block:: c :caption: C number_add (a, 1); .. code-block:: perl :caption: Perl number_add ($a, 1); .. code-block:: tcl :caption: Tcl [number_add $a 1] .. code-block:: scheme :caption: Scheme (number-add a 1) We don't skip a beat when reading those yet have a mental hiccup with regular arithmetic functions in prefix notation: .. code-block:: scheme (+ a 1) There's no particular answer to that. Other than to enable infix notation -- which is messy in :lname:`Scheme` and so *Lisp* language programmers just get used to prefix notation of arithmetic. As an aside, prefix notation has a benefit in that ``+`` isn't limited to 2 arguments: ``(+ 1 2 3 4 5)`` is just fine. As, indeed, is ``(+ 1)``. .. rst-class:: center --- Extending our trivial example, now a list of lists: .. code-block:: scheme (+ 1 (* 2 3)) Aha! We've got this. The outer list is really ``(+ 1 X)`` where ``X`` is the result of evaluating the inner list, the expression ``(* 2 3)``, so we would expect that ``+`` will we called with the arguments ``1`` and ``6``. Easy. Indeed, so. Notice that there was no arithmetic precedence rules, eg. ``1 + 2 * 3``, we had to explicitly state the multiplication expression. In fact, we have to explicitly delimit every expression in :lname:`Scheme` -- there are no shortcuts! This leads to another bane of :lname:`Scheme`, parentheses overload, there's a lot of them floating about. That said, if you took away arithmetic precedence and used regular prefix function calls, every other language would look similar: .. code-block:: c add (1, mul (2, 3)) which, if you remove commas: .. code-block:: c add (1 mul (2 3)) and pull the function names inside the parentheses: .. code-block:: scheme (add 1 (mul 2 3)) and replace arithmetic function names with symbols: .. code-block:: scheme (+ 1 (* 2 3)) then we're back to :lname:`Scheme` -- we've only changed the syntactic sugar, not the substance of the expression. Arithmetic precedence rules and infix operators, that's really the only difference. Incidentally, given that everything is going to be wrapped in parentheses and that we're going to have lists of lists of lists as we delimit every expression, :lname:`Scheme` calls these lists of lists (of lists ...) *forms*. Special Forms ------------- Wait, though, there some special cases where we can't just perform evaluation like the above. ``if`` is a case in point: .. code-block:: scheme (if #t (carry-on) (sleep-forever)) The ``if`` form takes three arguments: a *condition* expression which should result in a boolean value, a *consequent* expression to be evaluated if the boolean result was true and an *alternate* expression to be evaluated if the boolean expression was false. Notice the careful wording there: to be evaluated *if*.... ``if`` cannot have had its arguments evaluated before it is applied to the arguments otherwise we would have tried evaluating ``(sleep-forever)`` (and the other arguments) to find out their values. We might surmise that ``(sleep-forever)`` is going to take a while to complete and since the *condition* was ``#t`` then logically we shouldn't have been running it anyway. So ``if`` must be being treated specially and indeed it is called a *special form*. Special forms do not have their arguments evaluated before the function value is applied to the arguments. To handle this, the evaluation engine will have compared the ``car`` of the form to the symbol ``if`` and if it matches then starts to run the behaviour of the special form. The arguments remain unevaluated and are passed as is (atoms or lists) to the ``if`` behaviour code. OK, let's think this through. Inside the function that implements ``if`` there must be something that evaluates the *condition* expression and then if the result was true then evaluates the *consequent* expression otherwise evaluates the *alternate* expression. Who implements this ``if`` inside ``if``? Well, here we get into the murky meta world of language implementation and the implementing language. In the pedagogical examples of SICP (:cite:`SICP`) and LiSP (:cite:`LiSP`), the implementation of ``if`` (the ``if`` of our proto-:lname:`Scheme`, :lname:`Scheme`\ :sub:`0`, say) will be written in some existing :lname:`Scheme`, :lname:`Scheme`\ :sub:`1`, say. .. sidebox:: :lname:`Scheme`\ :sub:`2`, of course. .. aside:: :socrates:`And it?` Why, it's :lname:`Scheme`\ s all the way down. OK, so what is :lname:`Scheme`\ :sub:`1`’s ``if`` written in? Well, you'll have to go and look at the source code but you can imagine that you'll eventually find an implementation of :lname:`Scheme`\ :sub:`n` in :lname:`C` and therefore ``if`` will have been implemented using :lname:`C`’s ``if``. :lname:`C`’s ``if`` is, in turn, (probably) implemented with a machine code instruction to test and branch, itself written in processor microcode and eventually into transistor NAND gates where, um, ... *Look!* A squirrel! In the meanwhile, glossing over the detail, most functions are *derived* forms and their arguments are evaluated before the function value is applied to them but some *names* (in function position) are identified as special and the behaviour code is given the arguments as is. The term *derived* is from that idea that there are a few *core* forms, usually special forms, from which everything else can be... derived. We'll also have the distinction between *primitive* and (I don't think there is a special name for them so let's use) regular functions. A regular function is one we write in :lname:`Scheme`. A primitive function is one written in the underlying implementation language, say, :lname:`C`, whose interface is exposed such that it can be invoked as if it were a regular function. You create primitive functions as a combination of: * elementary memory management functions -- think of ``cons`` to create some, say, :lname:`C` structure on the heap, and ``car`` and ``cdr`` to poke about in it * bootstrap functions -- you might want a simple ``map`` function before it can be re-written into all of its glory * plain old expediency and efficiency functions -- some things just want a fast, tight loop. .. aside:: Local forms, for local people! Special forms sounds great for system-defined functions and people who want to hack away at the source code but what if **I** want one? Well, in many languages you're clean out of luck. Want to add a new syntax operator alongside ``if``, ``while`` etc. to :lname:`C`? That's not going to happen. However, the folks in :lname:`Scheme`-land are more generous and have blessed us with *macros*. And arguably cursed us with macros. A double-edged sword. Bindings -------- How do we bind symbols to values? We had ``(+ a 1)`` before. What is ``a`` referring to, how did ``a`` get bound to a value? There are a number of *binding forms*. .. _let: let ^^^ .. parsed-literal:: (let *bindings* *body+*) where ``bindings`` is a list of bindings and each binding is a list of a symbol and an expression and ``body+`` is one or more forms (hence the ``+`` in ``body+``) to be evaluated in the context of the bindings. Visually, ``bindings`` is potentially confusing. If we go over that a step at a time: #. ``bindings`` is a *list* of bindings: .. parsed-literal:: (let (*binding1* *binding2* *binding3*) *body+*) although your more likely to see it written (and write it) as: .. parsed-literal:: (let (*binding1* *binding2* *binding3*) *body+*) #. each binding is a *list* of a symbol and an expression: .. parsed-literal:: (let ((a 2) (b 1) (c (* 3 4))) *body+*) For each binding, the expression is evaluated and the symbol is bound to the resultant value then the remaining ``body+`` forms are evaluated in sequence with the symbols available for use. After the ``body+`` forms have been evaluated then the symbols are no longer available -- in other words, the *extent* of the symbols is limited to the lifetime of ``body+``. The result of the ``let`` form is the value of the last form in the ``body+`` forms. .. code-block:: scheme (let ((a 2)) (+ a 1)) Here, ``bindings`` is ``((a 2))``, ie. a list of a single binding, ``(a 2)``, and the ``body+`` forms is just a single form, ``(+ a 1)``. For our single binding, ``(a 2)``, recall that it is a list of a symbol and an expression. Here, the symbol is ``a``, the ``car``, and the expression is ``2``, the ``cadr``. The expression, ``2``, a number constructor, is evaluated resulting in a number value and ``a`` is *bound* to that value. If any of the forms in ``body+`` requires the evaluation of the symbol ``a`` then the result will be the number value 2. Of course, we *do* want to use ``a`` in ``body+`` where the expression ``(+ a 1)`` can now have its arguments successfully evaluated resulting in ``(+ 2 1)``. The function name, ``+``, can be looked up and (we presume) results in a function value that will add two numbers together for us. The result of applying the function value of ``+`` to the argument values *1* and *2* should be a number value (representing 3) and as it is the only form in ``body+`` then it is also the last and therefore the result of ``let`` itself. Wait a minute, ``let`` returns a value? Yes, as noted earlier, *everything* returns a value (albeit that some of them are *unspecified*). In fact, that ``let`` returns a value is used innumerable times to create closures. More of that later. A fractionally more complex example: .. code-block:: scheme (let ((a 1) (b 2)) (+ a b) (- a b)) we've now bound two symbols, ``a`` and ``b``, to two numbers, ``1`` and ``2`` and we have two forms in ``body+``. The first form adds the two numbers together and returns that result. Wait, returns it to whom? Well, we did say that the ``body+`` forms were going to be evaluated in sequence which means there must be some entity iterating over each form and it is to that entity that we've returned the number value. :lname:`Scheme` has been a little underhand here and rewrites forms from time to time. For a ``let``, the ``body+`` is re-written to ``(begin body+)`` where the sequencing function, ``begin`` will iterate through the individual forms in ``body+`` discarding all results except the last which it returns as its own result. This means the original multiple-body form variant of ``let`` with ``body+`` has been transformed into a pure single body form: ``(let bindings body)``. This sort of transformation (from an impure to a pure(r) format) is at the heart of syntactic transformations, aka. macros. So :lname:`Scheme` re-wrote our little snippet of code to: .. code-block:: scheme (let ((a 1) (b 2)) (begin (+ a b) (- a b))) and ``begin`` has consumed the result of the addition and will now evaluate the next form, a subtraction. This is the last (original) ``body+`` form and so the result of the subtraction, the number value -1, becomes the result of the ``begin`` function -- everything returns a value, remember -- and as the ``begin`` function has inserted itself as the only and therefore last body form of ``let``, it will be the result of the ``let`` too. Lexical Scope """"""""""""" Let's get busy: .. code-block:: scheme (let ((a 1)) (let ((b 2)) a)) has an outer ``let`` binding ``a`` to 1 and its body is another ``let``. This inner ``let`` binds ``b`` to 2 and evaluates its body which is the simple symbol, ``a``. The extent of the outer ``let``'s binding is still valid -- we're still in the outer ``let``'s body -- so ``a`` is bound to 1. The inner ``let`` has finished processing its body which has returned a value, 1, which it returns itself. The inner ``let``'s body has now returned a value and as it was the only body form so the outer ``let`` also returns 1. Some visual gymnastics, now. .. code-block:: scheme (let ((a 1)) (let ((a 2)) a)) has an outer ``let`` binding ``a`` to 1 and its body is another ``let``. This inner ``let`` binds a *new* ``a`` to 2 and evaluates its body which is the simple symbol, ``a``. In this inner ``let`` ``a`` is 2 -- it is the first ``a`` found when walking outwards across lexical bindings. The inner ``let`` has a value, 2, from its body which it returns. The inner ``let``'s body has now returned a value and so the outer ``let`` also returns 2. What if the outer ``let``'s body did a bit more? .. code-block:: scheme (let ((a 1)) (let ((a 2)) a) a) The inner ``let`` continues to create a new ``a`` bound to the number value 2 and returns it. However, the inner ``let`` is the first of two body forms in the outer ``let`` and therefore its result is ignored by the implicit ``begin``. All of the bindings of the inner ``let`` are now irrelevant and when ``begin`` evaluates the final form, ``a`` the only binding for ``a`` in scope is that of the outer ``let`` meaning ``a`` evaluates to 1. ``let`` has its limits, though, for a very specific reason: .. code-block:: scheme (let ((a 1) (b (+ a 1))) (+ a b)) This seems reasonable enough, ``a`` is bound to the number ``1`` and ``b`` is going to be the result of adding ``1`` to ``a``. Except it is an error. ``let`` won't allow you to use a symbol being defined in the bindings in the expression used to bind another symbol. We'll see why in the advanced section. Functions --------- Function *definitions* are likely to look something like a list of formal parameters and a body. The formal parameters are bound to the values of some expressions passed as arguments when the function is called and the formal arguments are available for use for the lifetime of the body. It's sounding quite similar to the :ref:`let`, above. Functions are first-class values in :lname:`Scheme`, this is to say you can pass them around like numbers or strings. All you have to do to use a function value is to put it in the first slot of a form. In :lname:`Scheme` function *values* are created with the special form ``lambda`` (which you might think of as a ``function`` keyword): .. parsed-literal:: (lambda *formals* *body+*) Where ``formals`` is a list of formal parameters (possibly the empty list!) followed by a list of one or more body forms, evaluated in sequence -- sounds familiar again! Hence we might expect a similar syntactic transformation to a more pure form: .. parsed-literal:: (lambda *formals* (begin *body+*)) Hmm, notice we haven't given the function a *name*. What we have is a function constructor -- like the constructors for numbers and strings -- the result of which is a function value. Unless we bind it to a variable we're going to lose it. Noting that it is without a name this might be called an *anonymous function* -- though that's not a term you particularly see in :lname:`Lisp` languages as it's just a function (value). let* and letrec --------------- The pedantry in defining things continues with a couple of variations on a theme! .. code-block:: scheme (let* ((a 1) (b (+ a 1))) (+ a b)) Finally, we can have variables dependent on previous ones! But what's happening here? ``let*``, "let iteratively", iterates over the bindings, one at a time, placing each parameter in the scope of later ones. It has transformed: .. parsed-literal:: (let* ((*formal1* *arg1*) (*formal2* *arg2*)) *body+*) into: .. parsed-literal:: (let ((*formal1* *arg1*)) (let ((*formal2* *arg2*)) *body+*)) Where each ``let`` only handles a single binding and the remaining bindings are performed in the body form of that binding, hence the earlier parameters are available for use in subsequent argument expressions. ``letrec``, "let recursively," is a slightly different idea, it handles the situation where two or more function definitions are mutually dependent. The usual example is these rather inefficient ``odd?``/``even?`` predicates: .. code-block:: scheme (letrec ((odd? (lambda (n) (if (= n 0) #f (even? (- n 1))))) (even? (lambda (n) (if (= n 0) #t (odd? (- n 1)))))) (even? 4)) Here, each predicate will call the definition of the other, so who do we define first? Does it matter, though, as these are function definitions and not invocations? Yes, it does, as the interpreter, when it is analysing the function definition for ``odd?`` will see a reference to ``even?``. It now has to look that up. In the best case there isn't an ``even?`` in scope and you get a error. In the worse case it'll find some dubious binding to ``even?`` hanging about and good luck to everyone when it gets called. Clearly these two definitions are meant to go together -- they are :term:`concomitant`. ``letrec`` performs a little trick, rather than define one before the other, it creates placeholder bindings for the symbols and then redefines the values in the placeholders in turn to their proper definition -- it doesn't matter in which order, now, as references to ``even?``/``odd?`` will find either the proper definition or the placeholder (to be filled in with the definition in a moment's time). As whichever is found will be the locally defined placeholder ``even?``/``odd?`` then we won't be unearthing some dubious code from elsewhere. So ``letrec`` might be transformed from: .. parsed-literal:: (letrec ((*formal1* *arg1*) (*formal2* *arg2*)) *body+*) into: .. parsed-literal:: (let ((*formal1* #) (*formal2* #)) (let ((tmp1 *arg1*) (tmp2 *arg2*)) (set! *formal1* tmp1) (set! *formal2* tmp2) *body+*)) which itself introduces a few issues: the idea of an *undefined* value (perhaps an internal implementation thing?); temporary variables (we don't want to conflict with the user's code); and ``set!``, the destroyer of things! Define ------ With the various forms of ``let`` we can introduce new variables and override them but sometimes, to much gnashing of teeth from the purists, we need to change the value a symbol is bound to. By and large in :lname:`Scheme`, where something is going to change something then its name has an exclamation mark at the end signifying the danger of mutation! In our putative transformation of ``letrec``, above, we use ``set!`` to modify the values that the formal parameters were bound to. That's fine inside the ``let`` that introduces those formal parameters but what do we do at the top level (outside of any binding form) or anywhere else where a symbol has not yet been introduced? .. code-block:: scheme (set! a 1) is an error in :lname:`Scheme` as ``a`` does not exist -- we can't change the value a symbol is bound to if the symbol doesn't have a binding yet. This may appear as another bout of pedantry as ``set!`` could create the binding if it didn't already exist. You might well ask, then, *where* ``set!`` should create the binding. In the current scope (``let``/``lambda`` level)? At the top level (whatever that is)? Whilst we ponder the possibilities, :lname:`Scheme` says that you should use ``define`` to introduce a symbol binding at the current level -- a binding you can subsequently ``set!``. .. code-block:: scheme (define a 1) Introduces the binding of the symbol ``a`` to the number ``1`` after which anything can use it: .. code-block:: scheme (let ((b 2)) (+ a b)) should return the number value 3 even though ``a`` was not introduced by the ``let``, the ``define`` at the top level has meant it is in scope. We could have written: .. code-block:: scheme (let ((b 2)) (define c 3) (+ b c)) which would return the number value 5. The ``define``, here, *inside* the ``let`` seems slightly wasteful -- as you could have simply had ``let`` introduced a new binding for ``c`` itself -- however the technique is used frequently for defining ancillary functions within another function. Yes, of course, functions within functions. What's not to like? Defining Functions ------------------ Associating a name with a function value -- we know how to create function *value* with ``lambda`` and we now have ``define`` so it is obviously: .. code-block:: scheme (define +1 (lambda (n) (+ 1 n))) (a function called ``+1``? *Outrageous!*) This is such a common idiom that ``define`` has an alternate syntax: .. parsed-literal:: (define (*name* *formals*) *body+*) to be re-arranged as: .. parsed-literal:: (define *name* (lambda (*formals*) *body+*)) ie. for our ``+1`` function: .. code-block:: scheme (define (+1 n) (+ 1 n)) or .. code-block:: scheme (define (+1 n) (+ 1 n)) That's a bit cleaner! Notice that ``define`` (without an exclamation mark) is *introducing a new* binding whereas ``set!`` is *modifying an existing* one (hence ``define`` and ``let`` rather than ``define!`` and ``let!``). It's not quite that simple as the number of formal arguments cause a twist. No Arguments ^^^^^^^^^^^^ For the ``define`` form that's easy, just don't specify any when you define the function -- and obviously don't pass any when you call it! .. code-block:: scheme (define (hi-string) "hi!") (hi-string) For the ``lambda`` form it's not dissimilar, we just have an empty list: .. code-block:: scheme (lambda nil "hi!") although you must explicitly write ``nil`` otherwise the evaluator will be left with an incoherent set of arguments for ``lambda``, just a string in this case. Varargs ^^^^^^^ What if you have a variable number of arguments? Remember, improper lists, ``(a b . c)``? There's two variants here: Some Formal Parameters """""""""""""""""""""" .. code-block:: scheme (define (foo a b . c) ...) I must call ``foo`` with at least two arguments, for the formal parameters, ``a`` and ``b``, and any remaining arguments are bundled up into a list and bound to ``c``. ``c`` could be an empty list, ``nil``, or a list of one or more values, ``(...)``. No Formal Parameters """""""""""""""""""" Here, *all* arguments are bundled up into a list and passed as the formal parameter. For define, this is OK: .. code-block:: scheme (define (foo . c) ...) and works like above. In fact the function ``list`` itself -- whose purpose is to bundle up its arguments into a list -- is usually rather cheekily defined as: .. code-block:: scheme (define (list . ls) ls) where it has had the evaluator do the hard word of bundling the arguments into a list and passes it off as its own work. Clever. For the ``lambda`` form it is visually different: .. code-block:: scheme (lambda c ...) where ``c`` is now a standalone symbol, not in a list of formal parameters, to capture all the arguments as a list. Closures -------- All functions are closures_, that is they can use variables in scope in their bodies. That's probably not a huge shock until you note that a lot of the time a function is returned from an expression and will continue to be able to use the variables that were in the lexical scope of it's *definition* from far far away. .. code-block:: scheme (define a 1) (define (a+ n) (+ a n)) (a+ 5) You'll expect to get the number value 6 -- which you do. However, there's a couple of variations on a theme here: .. code-block:: scheme (set! a 2) (a+ 5) will result in the number value 7 but .. code-block:: scheme (let ((a 3)) (a+ 5)) will give you the number value 7 again. Why? We're back to bindings. When the interpreter analysed the definition of ``a+`` it saw that the variable ``a`` was not bound by the parameters of the function (nor by any bindings introduced within its body) and was therefore a *free identifier*. Rummaging about it will have found the top level ``a``, introduced by ``define``, and will have used that binding in the implementation of ``a+``. Even though the binding was modified by ``set!`` the function body would still be referring to that binding irrespective of any subsequently introduced bindings. In the terminology, ``a+`` was *closed over* ``a`` and therefore ``a+`` is a *closure*. What if we wanted to use the locally current binding of ``a`` (to the number value 3) at the time we evaluated the function call? That would then be using *dynamic scope*. :lname:`Scheme` does use dynamic scope but prefers *lexical scope* in the definition of functions. It could have chosen to do differently but it didn't. The shell, on the other hand, is quite the reverse. Dynamic scope is used unless a variable is declared with lexical scope (with ``typeset``). .. _quoting: Quoting ^^^^^^^ There's a few *reader macros* to help us. Reader macros aren't true macros we will describe later but rather allow for some cheap syntactic tricks to save typing. If the evaluation engine sees a list it will assume the first element is a function and the remaining elements are arguments to that function. What if we want to create an actual list and not have it evaluated? We need to *quote* the list: .. code-block:: scheme (quote (1 2 3)) ``quote`` stops the evaluation engine from trying to find a function associated with the number constructor ``1``. Typing ``(quote thing)`` gets tiresome so we can use a single quote, ie. ``'thing`` as shorthand: .. code-block:: scheme '(1 2 3) The reader macro associated with ``'`` will read the next expression, ``thing``, say, and return the longwinded ``(quote thing)`` for the evaluation engine to process. Types ----- Values have types (boolean, number, string etc.) in :lname:`Scheme` but identifiers do not. You can freely rebind a symbol to any value -- though you're only likely to confuse yourself. This contrasts with strongly-typed languages where a type is associated with an identifier: .. code-block:: c int i = 3; so that the compiler can keep track of things and call out any obvious errors: .. code-block:: c char *s = strlen (i); Common :lname:`Scheme` Functions -------------------------------- Equivalence Predicates ^^^^^^^^^^^^^^^^^^^^^^ What does "the same" really mean? ``eq?`` is most exacting in that, where appropriate, the comparison is on the (internal) pointer into memory where the value is stored. A pointer test can be very quick. This leads to the slightly odd: .. code-block:: scheme (eq? 1 1) returning ``#f``. ``eqv?`` is the same as ``eq?`` except that the values of numbers and chars are compared. :socrates:`Thank goodness for that!` ``equal?`` will recursively descend into pairs, vectors and strings applying ``eqv?`` to their contents. Broadly, does the printed representation look the same? :lname:`C`, by way of comparison, can only perform integer comparisons, ``==``, ``<`` and the like. Any complex data type like a string or a struct requires that you delve inside and compare the constituent parts "int by int". Conditional Expressions ^^^^^^^^^^^^^^^^^^^^^^^ .. code-block:: scheme (if c1 s1 (if c2 s2 (if c3 s3 ...))) Cascading ``if`` clauses aren't easy on the eye. ``cond`` is the alternative: .. code-block:: scheme (cond (c1 s1+) (c2 s2+) (c3 s3+)) where, more than likely, each of the :samp:`c{n}` and :samp:`s{n}+` clauses are themselves lists: .. code-block:: scheme (cond ((boolean? exp) "boolean") ((number? exp) "number") ((string? exp) (string-append "string-" exp))) ``cond`` is slightly different to ``if`` in that multiple expressions, the :samp:`s{n}+`, are allowed if the condition is true. It has an ``else`` form, of course, and a very different creature, an ``=>`` clause: .. code-block:: scheme (cond ((string-match str "foo") => func) (else (string-append "string-" exp))) .. sidebox:: My example is rather poor in that you might presume that ``string-match`` produces an index into a string but, seemingly, ``func`` has no reference to ``str``. Maybe ``func`` was closed over ``str``, maybe it's just accumulating a list of matches, maybe I should think harder about my examples.... For ``=>`` the :samp:`c{n}` expression is evaluated resulting in some :samp:`{value}`. If the :samp:`{value}` is not false then ``func`` is applied to the :samp:`{value}`, ie. :samp:`(func {value})`, or ``(func (string-match str "foo"))`` if and only if ``(string-match str "foo")`` is not false. ``case`` is more like :lname:`C`’s ``switch`` or the shell's ``case`` statements: .. code-block:: scheme (case key ((o1+) e1+) ((o2+) e2+) ((o3+) e3+)) where ``key`` is evaluated and then compared to each of the :samp:`o{n}+` objects with ``eqv?``. If any object matches ``key`` then the expressions :samp:`e{n}+` are evaluated in sequence with the result being that of the last expression. Sequences ^^^^^^^^^ ``and``, ``or`` and ``begin`` are all logical sequencing instructions which only differ in their initial premise (``#t``, ``#f`` and *unspecified* respectively) and when they stop processing the sequence (when an expression returns false, when an expression returns true and to the end, respectively). They all return the value the last expression they evaluated returned (or the initial premise if there were no expressions!). Common :lname:`Scheme` Idioms ----------------------------- Result List ^^^^^^^^^^^ A very common idiom is for a function to construct a result list by ``cons``\ ing together the putative calculation on the ``car`` of the list of arguments and then calling itself on the remaining arguments. So to add 1 to every element of a list: .. code-block:: scheme (define (inc-list lis) (cons (+ (car lis) 1) (inc-list (cdr lis)))) Here we are ``cons``\ ing together the per-element calculation ``(+ X 1)`` where ``X`` is the first element of the list and a recursive call to itself on the rest of the list, ``(inc-list (cdr lis))``. Calling the argument list ``lis`` is a common enough idiom itself as calling the argument list ``list`` would conflict with the function ``list`` and calling it ``l`` as ever risks visual confusion with ``1``. Actually, that's not quite correct as there's nothing identifying the end of the list! .. code-block:: scheme (define (inc-list lis) (if (pair? lis) (cons (+ (car lis) 1) (inc-list (cdr lis)))) nil) When we reach the last pair in the list, where the ``cdr`` is ``nil`` and call ``inc-list`` on that ``nil`` then the ``if``’s condition is false and that invocation of ``inc-list`` returns ``nil`` which is just what we want in the caller's ``cons``, ``(cons thing nil)``. As the recursive calls unwind we get a neatly constructed set of nested ``cons`` calls resulting in the list we want. Indeed, applying a function to each element of a list is the work of ``map``: .. parsed-literal:: (map *func* *list*) and in our "add 1" case, an anonymous function is suitable: .. code-block:: scheme (map (lambda (n) (+ n 1)) list) ``map`` itself will follow the result list idiom -- constructing the result list as the list is descended. That's similar to ``map`` in most languages but :lname:`Lisp`\ ers like a list so the real ``map`` will iterate down multiple lists at once calling ``func`` with multiple args (one per list). Safe Bootstrap ^^^^^^^^^^^^^^ Particularly in bootstrap code you might see the peculiar: .. code-block:: scheme (define (foo args) ...) (define bar (let ((foo foo)) (lambda (args) ...use foo... ))) What's the point of defining ``foo`` as ``foo`` in the ``let`` in ``bar``? The problem is that we don't know when ``bar`` is going to be called and, if ``foo`` is something that may be redefined with slightly different behaviour then we need to be sure that the ``foo`` that ``bar`` uses is this one, the one defined immediately before it. In other words, we've bound ourselves to the local definition of ``foo`` so that we don't get any unexpected behaviour. Anonymous Closures ^^^^^^^^^^^^^^^^^^ Anonymous closures returned from function calls are very common: .. code-block:: scheme (define (adder-factory arg) (lambda (a) (+ arg a))) (define add3 (adder-factory 3)) (add3 5) should get us the number 8 as a result. .. _thunk: Thunks ^^^^^^ A thunk is a zero parameter function which doesn't sound much use in itself but following on from anonymous closures is a useful trick to delay the evaluation of some code. .. code-block:: scheme (define a (+ 3 5)) (+ a 1) will perform the calculation ``(+ 3 5)`` and bind ``a`` to the result. No biggie. But: .. code-block:: scheme (define a (lambda () (+ 3 5))) (+ (a) 1) Notice that ``a`` is a now a zero-argument function and therefore requires a zero-argument function invocation, ``(a)``, in the final expression! Here, the calculation, ``(+ 3 5)``, will only be performed when ``a`` is invoked as a function. This is a trivial case but preparing a body of code to be evaluated on demand will come in very useful. Recall, that ``if`` is a special form, ``(if condition consequent alternative)``? Perhaps we could have transformed it into .. parsed-literal:: (if *condition* *consequent-thunk* *alternative-thunk*) by replacing, say, ``consequent`` with ``(lambda () consequent)``, then neither of the consequent nor alternative expressions would be evaluated until their respective thunks were invoked when ``if`` decided what to do with the result of condition. OK, using a macro to skip the special form-ness of ``if`` seems a bit pointless but the principle remains: .. code-block:: scheme (define (preparer args) (lambda () ... args ... )) we can prepare some functionality to be evaluated on demand -- perhaps never, of course. .. include:: ../commit.rst