.. include:: ../../global.rst .. _bignums: ******* Bignums ******* I mentioned before that I've taken :ref-author:`Nils M Holm`'s (2009) :ref-title:`S9fES` (:cite:`S9fES`) bignum implementation and re-written it for :lname:`Idio`. Bignums are, um, big numbers. *How big?* Quite big. The goal is ostensibly arbitrary precision numbers which use as many significant digits as required. Both integer and floating point maths. You might want to ask, how many significant digits do I need to do real-world maths? I have two sobering examples: * No modern treatise is complete without at least one relevant XKCD: .. image:: https://imgs.xkcd.com/comics/coordinate_precision.png :alt: XKCD: Coordinate Precision :target: https://xkcd.com/2170/ .. aside:: `Angels on the head of a pin? `_ *We've got you now!* which, reading the bottom row suggests we need 17 significant digits to track the latitude and longitude of individual *atoms*. * This 2016 question to NASA on `how many decimals of PI we really need `_. The upshot of which is that: #. 15 digits for the circumference a sphere of radius 12.5 billion miles (encompassing Voyager 1 at the time, now only Voyager 2 -- check `where is Voyager `_ for their current whereabouts) to an accuracy of 1.5 *inches* The same 15 digits would have your odometer off by the size of a *molecule* (whose sizes vary, granted) had you cycled/kayaked around the Earth's equator. #. 40 digits to calculate the circumference of a circle the size of the visible universe (46 billion light years -- at the time) to an accuracy equal to the diameter of a hydrogen atom So, not really very many required at all, I'm thinking. As mentioned previously, many projects choose to use the GMP_ (GNU Multi Precision Arithmetic Library) but my selfish need-to-know means we *must* have one we've figured out for ourselves -- albeit by re-implementing someone else's code. (We must re-implement the :ref-title:`S9fES` version because the GCs for :lname:`Idio` and :ref-title:`S9fES` are quite different.) :ref-title:`S9fES` (and GMP) aren't the only implementations we could look at. I see that `Emacs `_ comes with a *mini-gmp* library though that is, so far as I can tell, simply a port of the :file:`mini-gmp.c` code included in GMP. :ref-author:`Fredrik Johansson`'s :ref-title:`mpmath` library for :lname:`Python` is another alternative. However, we are reasonably confident that the :ref-title:`S9fES` code is targeting :lname:`Scheme` so is a decent place to start. Implementation ============== The broad thrust of the mechanics is long-hand arithmetic using arrays of ``intptr_t``\ s for the digits in the *mantissa*/integer. The array elements are called segments, here. We'll need an *exponent* and some flags (integer, real, etc.) and the implementation uses the top bit of the most significant segment as the sign bit for integers -- all the other segments for an integer have the top bit ignored. How many decimal digits can fit in a machine word? log\ :sub:`2`\ (10) is 3.322, that is each decimal digit uses 3.322 binary bits. 19 of those would be 63.12 bits which means we couldn't also fit the sign bit in to a 64-bit word. So, round that down and we can fit 18 decimal digits in a 64-bit word. Similarly 9 decimal digits in a 32-bit word. .. aside:: *And* keep an eye on those pesky angels. Certainly, then, with a 64-bit system we can do decent *space maths* with a single segment. *Cool!* That's the maximum number of decimal digits per word you could squeeze in and the algorithm allows you to define a particular number of digits per word (up to the maximum). As mentioned previously there is a slight trick to the way numbers are handled in that we don't attempt to pack data into our segments with maximal efficiency. Instead we take a step back and say, OK, we can fit 18 decimal digits into our 64-bit word. In other words, the absolute decimal value 1,000,000,000,000,000,000 (19 digits) marks overflow. If we have overflowed we can account for *carry* in our long-hand arithmetic and therefore avoid the undefined behaviour that plagues integer maths in :lname:`C`. .. sidebox:: Obviously, ``pi`` (π) is a `transcendental number `_ and any numeric representation of it should be inexact. Inexactness is transitive so any use of ``pi`` results in another inexact number. :lname:`Scheme`'s number tower supports slightly more interesting numbers than many programming languages in that it has the concept of *exact* and *inexact* numbers. Inexact numbers are where we've lost precision because of rounding errors or, because we can, we were given an inexact number in the first place. *I* don't have any immediate use for inexact numbers in their own right but someone might. Summing that up: * we need some flags which can be type-specific. A bignum is going to be an integer or a real and if it is a real then we need a flag for being negative and another for being inexact. I've thrown in a not-a-number although it's not used. .. code-block:: c :caption: gc.h #define IDIO_BIGNUM_FLAG_NONE (0) #define IDIO_BIGNUM_FLAG_INTEGER (1<<0) #define IDIO_BIGNUM_FLAG_REAL (1<<1) #define IDIO_BIGNUM_FLAG_REAL_NEGATIVE (1<<2) #define IDIO_BIGNUM_FLAG_REAL_INEXACT (1<<3) #define IDIO_BIGNUM_FLAG_NAN (1<<4) * we need an exponent and mantissa. We'll use an ``int32_t`` exponent because... .. sidebox:: because *we can*. So what if there are only 10\ :sup:`80` atoms in the universe? You're obviously not including all the other universes! Well, from a practical perspective, because it is a separate field in the ``idio_bignum_s`` structure and we don't have any others meaning it **will** consume a machine word. So, 32 bits for a machine word on a 32-bit system and we'll take the hit on a 64-bit system for consistency. .. code-block:: c :caption: gc.h typedef int32_t IDIO_BE_T; typedef struct idio_bignum_s { IDIO_BE_T exp; IDIO_BSA sig; } idio_bignum_t; #define IDIO_BIGNUM_FLAGS(B) ((B)->tflags) #define IDIO_BIGNUM_EXP(B) ((B)->u.bignum.exp) #define IDIO_BIGNUM_SIG(B) ((B)->u.bignum.sig) where ``sig`` is our *significand array* of segments: .. code-block:: c :caption: gc.h typedef struct idio_bsa_s { size_t refs; size_t avail; size_t size; IDIO_BS_T *ae; } idio_bsa_t; typedef idio_bsa_t* IDIO_BSA; #define IDIO_BSA_AVAIL(BSA) ((BSA)->avail) #define IDIO_BSA_SIZE(BSA) ((BSA)->size) From ``refs`` you can deduce that I had an intention to use reference counting for these significand arrays as the low-level mechanics of bignums didn't seem appropriate for the main GC to get involved in. It turns out no reference counting occurs (other than creation and destruction). ``avail`` is the number of segments in the significand array and ``size`` is the number in use. That might seem a bit odd but shift operations will reduce the number of significant digits in use which will trip over segment boundaries. Note that the significand array is not fixed in the sense that as the number is being constructed the array may grow. Here, our ``pi`` inexactness comment comes to the fore. We (correctly) construct it with 61 significant figures (why not?) but, in normalising the constructed bignum back into the nominal 18 digits we clearly truncated this four (or eight) segment number down to one (or two) and flagged it as inexact. Job done! The definition should really start ``#i3.14159...`` to avoid any ne're-do-wells fiddling in :file:`src/bignum.h` and changing the defaults. * we have the usual accessors seen above with the slightly more interesting array accessors: .. code-block:: c :caption: gc.h #define IDIO_BSA_AE(BSA,i) ((BSA)->ae[i]) #define IDIO_BIGNUM_SIG_AE(B,i) IDIO_BSA_AE((B)->u.bignum.sig,i) * we have some bignum definitions: .. code-block:: c :caption: bignum.h #ifdef __LP64__ #define IDIO_BIGNUM_MDPW 18 #define IDIO_BIGNUM_DPW 18 #define IDIO_BIGNUM_INT_SEG_LIMIT 1000000000000000000LL #define IDIO_BIGNUM_SIG_SEGMENTS 1 #else #define IDIO_BIGNUM_MDPW 9 #define IDIO_BIGNUM_DPW 9 #define IDIO_BIGNUM_INT_SEG_LIMIT 1000000000L #define IDIO_BIGNUM_SIG_SEGMENTS 2 #endif Where ``IDIO_BIGNUM_SIG_SEGMENTS`` is the maximum number of segments we will use. Suffice it to say that whilst the results will be more accurate calculations becomes exponentially longer the more segments you use. Here, 32-bit computers using two segments will be to some degree slower than 64-bit machines using one segment. Bignums are comparatively expensive to operate -- they require memory, time to consume, time to operate with and time to print out -- so we should avoid them if necessary. Unfortunately the opportunities to avoid them are slim. Reading ======= Reading is a two-step process. #. In the reader, ``idio_read_number_C()`` in :file:`src/read.c` we are concerned with *recognising* a number, a syntactic check of sorts Actually, there's another number-recogniser/constructor, ``idio_read_bignum_radix()`` in :file:`src/read.c` (which I suppose should have a trailing ``_C`` to be consistent) for a particular input form. #. In ``idio_bignum_C`` in :file:`src/bignum.c` we are concerned with *constructing* a number from the character-based representation The Reader ---------- A generic number (in :lname:`Scheme`, here from `Numbers in Scheme `_) is something like: * ``[+-]?[0-9]+`` * ``[+-]?[0-9]+.[0-9]*`` * ``[+-]?[0-9]+.[0-9]*E[+-]?[0-9]+`` (There must be a numeric part before any ``.`` otherwise it will be interpreted as the ``value-index`` operator.) The exponent character, ``E`` in the last example, can be one of several characters: * for base-16 numbers (see below): ``e`` and ``E`` quite obviously clash with the possible digits of hexadecimal numbers so for base-16 numbers you can use the exponent characters ``s``/``S`` or ``l``/``L`` -- I don't know what the history of those are. * other exponent-able numbers can use ``d``/``D``, ``e``/``E``, ``f``/``F`` -- and ``s``/``S`` and ``l``/``L`` -- which seem to be the generally accepted set of exponent characters. ``+10``, ``1.``, ``-20.1``, ``0.3e1``, ``-4e-5``, ``6L7`` ``idio_read_number_C()`` in :file:`read.c` will check the overall format and make some decisions about the way forward: * if the number is definitely not an integer, ie. it has: - a decimal point - an exponent - is inexact then it will call ``idio_bignum_C()`` * has a small number of digits then we can construct a fixnum directly. The guesstimate is based on that 3.32 bits per decimal digit -- which we'll round up to four for safety. With :samp:`{i}` digits we'll require :samp:`4{i}` bits. The fixnum implementation can cope with 8 (technically, ``CHAR_BIT``) bits per byte of an ``intptr_t`` less the number of tags bits. If one is less than the other then we're good to go. (That should be more closely analysed for leading sign characters.) ``idio_fixnum_C()`` in :file:`src/fixnum.c` uses :manpage:`strtoll(3)` which I think is non-controversial and we shouldn't be pushing the boundaries of an ``intptr_t``. * otherwise we call ``idio_bignum_C()`` for a full conversion to a bignum then attempt to convert the bignum back to a fixnum with ``idio_bignum_to_fixnum()`` in :file:`src/bignum.c`. (Some fixnums may not be realised by this conversion. I need to (re-)\ *do the math*.) Non-base-10 Numbers ^^^^^^^^^^^^^^^^^^^ There are several reader input forms for non-base-10 numbers all of which call ``idio_read_bignum_radix()`` in :file:`src/read.c` with a different *radix*: .. csv-table:: Non-base-10 reader number formats :header: "form", "radix", "example", "decimal equivalent" :widths: auto :align: left ``#b``, 2, ``#b101``, 5 ``#o``, 8, ``#o101``, 65 ``#d``, 10, ``#d101``, 101 ``#x``, 16, ``#x101``, 257 ``idio_read_bignum_radix()`` supports bases up to 36 (using 0-9 then a-z/A-Z) although there are no specific reader input forms. :samp:`read-number {str} [{radix}]` can be used to read numbers in your specialised base-23 number-system. ``idio_read_bignum_radix()`` is a slight hybrid recogniser/constructor in that: * it initialises an accumulated bignum value to zero * as it walks through the input stream asserting that the next character is appropriate for the radix then * multiplies the accumulated value by the radix (base) then * adds to the accumulator the (decimalised) value of the character bignum.c -------- :file:`src/bignum.c` looks fearsomely complex but that's because there's: * a sort of combinatorial explosion for handling integers and reals * a variety of calls to create bignums from :lname:`C` integers and construct :lname:`C` integers from large (but not too large) bignums * printing bignums is possibly complicated * actual bignum operations (add, subtract etc.) Bignums are normalized after most operations to provide some consistency: .. code-block:: idio-console Idio> 1234e-3 1.234e+0 Idio> 12340000e-7 1.234e+0 there's a ``bignum-dump`` left in for debug. On a 64-bit machine: .. code-block:: idio-console Idio> pi #i3.14159265358979323e+0 Idio> bigum-dump pi idio_bignum_dump: Ri segs[ 1]: 314159265358979323 e-17 and on a 32-bit machine: .. code-block:: idio-console Idio> pi #i3.14159265358979323e+0 Idio> bigum-dump pi idio_bignum_dump: Ri segs[ 2]: 314159265 358979323 e-17 What you can see is that ``pi`` is an *inexact* number, ``#i`` in its normal printed form. The ``bignum-dump`` output has some flags ``Ri`` for *real* number and *inexact*, that it use 1 or 2 segments (machine-specific) and then each segment is printed in its decimal form and finally the exponent is printed. What you can readily see from the internal dumped format is that the significand array is storing the significant digits as a (long-hand) integer and the exponent is scaled appropriately. Writing ======= ``idio_bignum_as_string()`` in :file:`src/bignum.c` is called from ``idio_as_string()`` in :file:`src/util.c`. Integers -------- Integer number bignums are printed by ``idio_bignum_integer_as_string()`` but the format is fixed to being *decimal*. The problem relates to us having split the value of the bignum into DPW *decimal* segments. If I've got the first of two segments in my hands and its value is 1, that won't guarantee to be 1 in any other format. For example, imagine we can store 3 digits per word then 1234 will be stored as 1 and 234. Then the 1 in the first segment won't guarantee to be a 1 in the hex (#x4D2), octal (#o2322) or binary (#b10011010010) -- OK, a reasonable chance with the binary. To generate some other :samp:`{base}`, hexadecimal, say, we would have to do the reverse of the reader's input method for non-base-10 numbers. We would need to calculate :samp:`1234 % {base}`, subtract that from 1234, divide the result by :samp:`{base}` and loop around again until the result was zero. So, *do-able*, but expensive. I haven't found a need. You can only usefully specify a precision which, for an integer, just becomes a padding with leading zeroes. You can get this through :ref:`format ` or one of the *printf* functions. Reals ----- Real number bignums are printed by ``idio_bignum_real_as_string()`` which supports ``e`` and ``f`` :manpage:`printf(3)`-style formatting. You can get either of these through :ref:`format ` or one of the *printf* functions. The default output format is still the :lname:`Scheme`-ish one -- or rather the original :ref-title:`S9fES` format -- which is similar to the :manpage:`printf(3)` ``e`` except prints full precision. Operations ========== .. idio:function:: bignum? value is `value` a bignum .. idio:function:: real? bignum is bignum `bignum` a real (ie. not an integer) .. idio:function:: exact? number is number `number` an exact number (ie. an integer or not inexact) .. idio:function:: inexact? number is number `number` an inexact number (ie. not an integer or is inexact) .. idio:function:: exact->inexact number an inexact version of `number` is returned .. idio:function:: inexact->exact number an exact version of `number` is returned .. idio:function:: mantissa number the mantissa of `number` is returned .. idio:function:: exponent number the exponent of `number` is returned .. idio:function:: bignum-dump bignum display some of the internal structure of `bignum` .. include:: ../../commit.rst