Bignums¶

I mentioned before that I’ve taken Nils M Holm’s (2009) S9fES ([Hol14]) bignum implementation and re-written it for Idio.

Bignums are, um, big numbers. How big? Quite big. The goal is ostensibly arbitrary precision numbers which use as many significant digits as required. Both integer and floating point maths.

You might want to ask, how many significant digits do I need to do real-world maths? I have two sobering examples:

No modern treatise is complete without at least one relevant XKCD:

Angels on the head of a pin? We’ve got you now!

which, reading the bottom row suggests we need 17 significant digits to track the latitude and longitude of individual atoms.
This 2016 question to NASA on how many decimals of PI we really need.

The upshot of which is that:
1. 15 digits for the circumference a sphere of radius 12.5 billion miles (encompassing Voyager 1 at the time, now only Voyager 2 – check where is Voyager for their current whereabouts) to an accuracy of 1.5 inches
  
  The same 15 digits would have your odometer off by the size of a molecule (whose sizes vary, granted) had you cycled/kayaked around the Earth’s equator.
2. 40 digits to calculate the circumference of a circle the size of the visible universe (46 billion light years – at the time) to an accuracy equal to the diameter of a hydrogen atom

So, not really very many required at all, I’m thinking.

As mentioned previously, many projects choose to use the GMP (GNU Multi Precision Arithmetic Library) but my selfish need-to-know means we must have one we’ve figured out for ourselves – albeit by re-implementing someone else’s code.

(We must re-implement the S9fES version because the GCs for Idio and S9fES are quite different.)

S9fES (and GMP) aren’t the only implementations we could look at. I see that Emacs comes with a mini-gmp library though that is, so far as I can tell, simply a port of the mini-gmp.c code included in GMP.

Fredrik Johansson’s mpmath library for Python is another alternative.

However, we are reasonably confident that the S9fES code is targeting Scheme so is a decent place to start.

Implementation¶

The broad thrust of the mechanics is long-hand arithmetic using arrays of intptr_ts for the digits in the mantissa/integer. The array elements are called segments, here. We’ll need an exponent and some flags (integer, real, etc.) and the implementation uses the top bit of the most significant segment as the sign bit for integers – all the other segments for an integer have the top bit ignored.

How many decimal digits can fit in a machine word? log₂(10) is 3.322, that is each decimal digit uses 3.322 binary bits. 19 of those would be 63.12 bits which means we couldn’t also fit the sign bit in to a 64-bit word. So, round that down and we can fit 18 decimal digits in a 64-bit word. Similarly 9 decimal digits in a 32-bit word.

Certainly, then, with a 64-bit system we can do decent space maths with a single segment. Cool!

That’s the maximum number of decimal digits per word you could squeeze in and the algorithm allows you to define a particular number of digits per word (up to the maximum).

As mentioned previously there is a slight trick to the way numbers are handled in that we don’t attempt to pack data into our segments with maximal efficiency. Instead we take a step back and say, OK, we can fit 18 decimal digits into our 64-bit word. In other words, the absolute decimal value 1,000,000,000,000,000,000 (19 digits) marks overflow. If we have overflowed we can account for carry in our long-hand arithmetic and therefore avoid the undefined behaviour that plagues integer maths in C.

Obviously, pi (π) is a transcendental number and any numeric representation of it should be inexact.

Inexactness is transitive so any use of pi results in another inexact number.

Scheme’s number tower supports slightly more interesting numbers than many programming languages in that it has the concept of exact and inexact numbers. Inexact numbers are where we’ve lost precision because of rounding errors or, because we can, we were given an inexact number in the first place. I don’t have any immediate use for inexact numbers in their own right but someone might.

Summing that up:

we need some flags which can be type-specific. A bignum is going to be an integer or a real and if it is a real then we need a flag for being negative and another for being inexact.

I’ve thrown in a not-a-number although it’s not used.

gc.h¶

#define IDIO_BIGNUM_FLAG_NONE          (0)
#define IDIO_BIGNUM_FLAG_INTEGER       (1<<0)
#define IDIO_BIGNUM_FLAG_REAL          (1<<1)
#define IDIO_BIGNUM_FLAG_REAL_NEGATIVE (1<<2)
#define IDIO_BIGNUM_FLAG_REAL_INEXACT  (1<<3)
#define IDIO_BIGNUM_FLAG_NAN           (1<<4)

we need an exponent and mantissa. We’ll use an int32_t exponent because…

because we can. So what if there are only 10⁸⁰ atoms in the universe? You’re obviously not including all the other universes!

Well, from a practical perspective, because it is a separate field in the idio_bignum_s structure and we don’t have any others meaning it will consume a machine word. So, 32 bits for a machine word on a 32-bit system and we’ll take the hit on a 64-bit system for consistency.
gc.h¶
```
typedef int32_t IDIO_BE_T;

typedef struct idio_bignum_s {
    IDIO_BE_T exp;
    IDIO_BSA sig;
} idio_bignum_t;

#define IDIO_BIGNUM_FLAGS(B)       ((B)->tflags)
#define IDIO_BIGNUM_EXP(B)         ((B)->u.bignum.exp)
#define IDIO_BIGNUM_SIG(B)         ((B)->u.bignum.sig)
```
where sig is our significand array of segments:
gc.h¶
```
typedef struct idio_bsa_s {
    size_t refs;
    size_t avail;
    size_t size;
    IDIO_BS_T *ae;
} idio_bsa_t;

typedef idio_bsa_t* IDIO_BSA;

#define IDIO_BSA_AVAIL(BSA)        ((BSA)->avail)
#define IDIO_BSA_SIZE(BSA)         ((BSA)->size)
```
From refs you can deduce that I had an intention to use reference counting for these significand arrays as the low-level mechanics of bignums didn’t seem appropriate for the main GC to get involved in. It turns out no reference counting occurs (other than creation and destruction).

avail is the number of segments in the significand array and size is the number in use. That might seem a bit odd but shift operations will reduce the number of significant digits in use which will trip over segment boundaries.

Note that the significand array is not fixed in the sense that as the number is being constructed the array may grow.

Here, our pi inexactness comment comes to the fore. We (correctly) construct it with 61 significant figures (why not?) but, in normalising the constructed bignum back into the nominal 18 digits we clearly truncated this four (or eight) segment number down to one (or two) and flagged it as inexact. Job done!

The definition should really start #i3.14159... to avoid any ne’re-do-wells fiddling in src/bignum.h and changing the defaults.

we have the usual accessors seen above with the slightly more interesting array accessors:

gc.h¶

#define IDIO_BSA_AE(BSA,i)         ((BSA)->ae[i])

#define IDIO_BIGNUM_SIG_AE(B,i)    IDIO_BSA_AE((B)->u.bignum.sig,i)

we have some bignum definitions:

bignum.h¶

#ifdef __LP64__
#define IDIO_BIGNUM_MDPW          18
#define IDIO_BIGNUM_DPW           18
#define IDIO_BIGNUM_INT_SEG_LIMIT 1000000000000000000LL
#define IDIO_BIGNUM_SIG_SEGMENTS  1
#else
#define IDIO_BIGNUM_MDPW          9
#define IDIO_BIGNUM_DPW           9
#define IDIO_BIGNUM_INT_SEG_LIMIT 1000000000L
#define IDIO_BIGNUM_SIG_SEGMENTS  2
#endif

Where IDIO_BIGNUM_SIG_SEGMENTS is the maximum number of segments we will use. Suffice it to say that whilst the results will be more accurate calculations becomes exponentially longer the more segments you use. Here, 32-bit computers using two segments will be to some degree slower than 64-bit machines using one segment.

Bignums are comparatively expensive to operate – they require memory, time to consume, time to operate with and time to print out – so we should avoid them if necessary. Unfortunately the opportunities to avoid them are slim.

Reading¶

Reading is a two-step process.

In the reader, idio_read_number_C() in src/read.c we are concerned with recognising a number, a syntactic check of sorts

Actually, there’s another number-recogniser/constructor, idio_read_bignum_radix() in src/read.c (which I suppose should have a trailing _C to be consistent) for a particular input form.
In idio_bignum_C in src/bignum.c we are concerned with constructing a number from the character-based representation

The Reader¶

A generic number (in Scheme, here from Numbers in Scheme) is something like:

[+-]?[0-9]+
[+-]?[0-9]+.[0-9]*
[+-]?[0-9]+.[0-9]*E[+-]?[0-9]+

(There must be a numeric part before any . otherwise it will be interpreted as the value-index operator.)

The exponent character, E in the last example, can be one of several characters:

for base-16 numbers (see below): e and E quite obviously clash with the possible digits of hexadecimal numbers so for base-16 numbers you can use the exponent characters s/S or l/L – I don’t know what the history of those are.
other exponent-able numbers can use d/D, e/E, f/F – and s/S and l/L – which seem to be the generally accepted set of exponent characters.

+10, 1., -20.1, 0.3e1, -4e-5, 6L7

idio_read_number_C() in read.c will check the overall format and make some decisions about the way forward:

if the number is definitely not an integer, ie. it has:
- a decimal point
- an exponent
- is inexact
then it will call idio_bignum_C()
has a small number of digits then we can construct a fixnum directly.

The guesstimate is based on that 3.32 bits per decimal digit – which we’ll round up to four for safety. With i digits we’ll require 4i bits. The fixnum implementation can cope with 8 (technically, CHAR_BIT) bits per byte of an intptr_t less the number of tags bits. If one is less than the other then we’re good to go.

(That should be more closely analysed for leading sign characters.)

idio_fixnum_C() in src/fixnum.c uses strtoll(3) which I think is non-controversial and we shouldn’t be pushing the boundaries of an intptr_t.
otherwise we call idio_bignum_C() for a full conversion to a bignum then attempt to convert the bignum back to a fixnum with idio_bignum_to_fixnum() in src/bignum.c.

(Some fixnums may not be realised by this conversion. I need to (re-)do the math.)

Non-base-10 Numbers¶

There are several reader input forms for non-base-10 numbers all of which call idio_read_bignum_radix() in src/read.c with a different radix:

Non-base-10 reader number formats¶
form	radix	example	decimal equivalent
`#b`	2	`#b101`	5
`#o`	8	`#o101`	65
`#d`	10	`#d101`	101
`#x`	16	`#x101`	257

idio_read_bignum_radix() supports bases up to 36 (using 0-9 then a-z/A-Z) although there are no specific reader input forms. read-number str [radix] can be used to read numbers in your specialised base-23 number-system.

idio_read_bignum_radix() is a slight hybrid recogniser/constructor in that:

it initialises an accumulated bignum value to zero
as it walks through the input stream asserting that the next character is appropriate for the radix then
- multiplies the accumulated value by the radix (base) then
- adds to the accumulator the (decimalised) value of the character

bignum.c¶

src/bignum.c looks fearsomely complex but that’s because there’s:

a sort of combinatorial explosion for handling integers and reals
a variety of calls to create bignums from C integers and construct C integers from large (but not too large) bignums
printing bignums is possibly complicated
actual bignum operations (add, subtract etc.)

Bignums are normalized after most operations to provide some consistency:

Idio> 1234e-3
1.234e+0
Idio> 12340000e-7
1.234e+0

there’s a bignum-dump left in for debug. On a 64-bit machine:

Idio> pi
#i3.14159265358979323e+0
Idio> bigum-dump pi
idio_bignum_dump:  Ri segs[ 1]: 314159265358979323 e-17

and on a 32-bit machine:

Idio> pi
#i3.14159265358979323e+0
Idio> bigum-dump pi
idio_bignum_dump:  Ri segs[ 2]: 314159265 358979323 e-17

What you can see is that pi is an inexact number, #i in its normal printed form. The bignum-dump output has some flags Ri for real number and inexact, that it use 1 or 2 segments (machine-specific) and then each segment is printed in its decimal form and finally the exponent is printed.

What you can readily see from the internal dumped format is that the significand array is storing the significant digits as a (long-hand) integer and the exponent is scaled appropriately.

Writing¶

idio_bignum_as_string() in src/bignum.c is called from idio_as_string() in src/util.c.

Integers¶

Integer number bignums are printed by idio_bignum_integer_as_string() but the format is fixed to being decimal.

The problem relates to us having split the value of the bignum into DPW decimal segments. If I’ve got the first of two segments in my hands and its value is 1, that won’t guarantee to be 1 in any other format.

For example, imagine we can store 3 digits per word then 1234 will be stored as 1 and 234. Then the 1 in the first segment won’t guarantee to be a 1 in the hex (#x4D2), octal (#o2322) or binary (#b10011010010) – OK, a reasonable chance with the binary.

To generate some other base, hexadecimal, say, we would have to do the reverse of the reader’s input method for non-base-10 numbers. We would need to calculate 1234 % base, subtract that from 1234, divide the result by base and loop around again until the result was zero.

So, do-able, but expensive. I haven’t found a need.

You can only usefully specify a precision which, for an integer, just becomes a padding with leading zeroes. You can get this through format or one of the printf functions.

Reals¶

Real number bignums are printed by idio_bignum_real_as_string() which supports e and f printf(3)-style formatting. You can get either of these through format or one of the printf functions.

The default output format is still the Scheme-ish one – or rather the original S9fES format – which is similar to the printf(3) e except prints full precision.

Operations¶

function bignum? value¶: is value a bignum

function real? bignum¶: is bignum bignum a real (ie. not an integer)

function exact? number¶: is number number an exact number (ie. an integer or not inexact)

function inexact? number¶: is number number an inexact number (ie. not an integer or is inexact)

function exact->inexact number¶: an inexact version of number is returned

function inexact->exact number¶: an exact version of number is returned

function mantissa number¶: the mantissa of number is returned

function exponent number¶: the exponent of number is returned

function bignum-dump bignum¶: display some of the internal structure of bignum

Last built at 2024-12-21T07:11:04Z+0000 from 463152b (dev)

Bignums¶

Implementation¶

Reading¶

The Reader¶

Non-base-10 Numbers¶

bignum.c¶

Writing¶

Integers¶

Reals¶

Operations¶

DIPS

Table of Contents

Related Topics