Bitsets¶
Bitsets are simple sets of integral-indexed bits originally introduced to support attribute testing of Unicode char-sets so both integers and Unicode code points can be used as indices. You can use them as a set of flags much like bitmasks in C.
The first bit is index 0 (zero).
;; create a five element bitset -- all bits are initially unset
bs1 := make-bitset 5
printf "bs1 is %s\n" bs1
;; set the first two bits
bitset-set! bs1 0
bitset-set! bs1 1
printf "bs1 is %s\n" bs1
;; if we know the starting bits we can create it immediately
bs2 := #B{ 5 11 }
printf "bs2 is %s\n" bs2
;; more than 8 bits
bs3 := make-bitset 40
;; set the 9th and tenth bits
bitset-set! bs3 8
bitset-set! bs3 9
printf "bs3 is %s\n" bs3
;; an inverted bs3
bs4 := not-bitset bs3
;; clear the last bit
bitset-clear! bs4 ((bitset-size bs4) - 1)
printf "bs4 is %s\n" bs4
$ idio simple-bitsets
bs1 is #B{ 5 }
bs1 is #B{ 5 11000 }
bs2 is #B{ 5 11000 }
bs3 is #B{ 40 8:11000000 }
bs4 is #B{ 40 0-0 00111111 10-18 11111110 }
The printed (and reader) form is a condensed set with listed ranges
all set (otherwise the bits are unset). You can use the #B{ ... }
form as bitset constructors.
In the previous example’s output:
bs3 has the first two bits set in the block starting at index 8 (a hex index)
bs4 has
all bits in the blocks starting at (hex) 0 through to the block starting at (hex) 0 inclusive are set
the next block has the first two bits unset then the rest set
all bits in the blocks starting at (hex) 10 through to the block starting at (hex) 18 inclusive are set
the next block has all the bits set except the last
SRFI-14¶
SRFI-14 defines several sets of char-sets which use arrays of bitsets (representing Unicode planes) to cover the 221 bits of Unicode code points.
require SRFI-14
;; the ASCII-only variant of char-set:lower-case
printf "%s\n" %char-set:lower-case
$ idio simple-char-sets
#<SI sparse-char-set size=1114112 planes=#[
#B{ 65536 60:01111111 68-70 11100000 }
#f #f #f #f #f #f #f #f #f #f #f #f #f #f #f #f ]>
NB. The output has been edited for readability.
For the ASCII-only variant you can see the bitset for the first plane
is #B{ 65536 60:01111111 68-70 11100000 }
which reads as bits from
index 61 (hex) through to 7a (hex). Which looks reasonable.
Try char-set:lower-case
(without the %
) for the complete
Unicode SRFI 14 definition of lower case code points.
Last built at 2025-01-22T07:11:09Z+0000 from 77077af (dev) for Idio 0.3