Unicode Type
The unicode type represents Unicode (arguably ISO 10646) code points. In practice a unicode value is much like an integer in the range 0 to 0x10FFFF, but unicode is a separate type: its values do not overlap with integers or interact with them directly.
Reader Forms
The canonical reader form is #U+HHHH where the number of hex digits, H, is “enough.” Leading zeroes are not required: #U+127 is the same as #U+0127.
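The canonical form can be modelled outside Idio. The Python sketch below is illustrative only: `parse_canonical` is a hypothetical helper, not part of Idio, and it assumes that hex digits may be upper or lower case and that values above 0x10FFFF are rejected. It shows that the number of leading zeroes does not change the code point that is read.

```python
import re

# Hypothetical model of the #U+HHHH reader form; not Idio's actual reader.
_CANONICAL = re.compile(r"#U\+([0-9A-Fa-f]+)\Z")
MAX_CODE_POINT = 0x10FFFF


def parse_canonical(text: str) -> int:
    """Return the code point named by a #U+HHHH form as a plain int."""
    match = _CANONICAL.match(text)
    if match is None:
        raise ValueError(f"not a #U+HHHH form: {text!r}")
    code_point = int(match.group(1), 16)
    # Assumption: anything above the Unicode range is an error.
    if code_point > MAX_CODE_POINT:
        raise ValueError(f"code point out of range: {text!r}")
    return code_point


# Leading zeroes are not significant.
same = parse_canonical("#U+127") == parse_canonical("#U+0127")
```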
An alternate reader form is #\x where x is the UTF-8 representation of the code point. For example, #\ħ would be read as U+0127 (LATIN SMALL LETTER H WITH STROKE). Clearly, the usefulness of this form depends on general support by fonts and editors.
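To make the UTF-8 relationship concrete, here is a small Python sketch. `read_char_form` is a hypothetical helper, not Idio's reader, and it deliberately ignores the named form described next. It takes the raw source bytes, decodes whatever follows `#\` as UTF-8 and returns the resulting code point.

```python
def read_char_form(source: bytes) -> int:
    """Hypothetical model of the #\\x reader form: return the code point."""
    prefix = b"#\\"
    if not source.startswith(prefix):
        raise ValueError("expected a #\\x form")
    # Decode the bytes after the prefix as UTF-8.
    chars = source[len(prefix):].decode("utf-8")
    if len(chars) != 1:
        raise ValueError("expected exactly one code point after #\\")
    return ord(chars)


# U+0127 is two bytes in UTF-8: 0xC4 0xA7.
h_stroke_bytes = "ħ".encode("utf-8")
code_point = read_char_form(b"#\\" + h_stroke_bytes)
```

Here `code_point` is 0x127, the same value that #U+0127 denotes.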
A final reader form is for a limited number of named characters, say, #\{newline}, with the name in braces. We could, but we don’t, use the Unicode Character Database names, say, #\{LATIN SMALL LETTER H WITH STROKE}.
Instead the set of names is limited to:
name | code point | C escape sequence
---|---|---
nul | U+0000 |
soh | U+0001 |
stx | U+0002 |
etx | U+0003 |
eot | U+0004 |
enq | U+0005 |
ack | U+0006 |
bel | U+0007 | \a
bs | U+0008 | \b
ht | U+0009 | \t
lf | U+000A | \n
vt | U+000B | \v
ff | U+000C | \f
cr | U+000D | \r
so | U+000E |
si | U+000F |
dle | U+0010 |
dc1 | U+0011 |
dc2 | U+0012 |
dc3 | U+0013 |
dc4 | U+0014 |
nak | U+0015 |
syn | U+0016 |
etb | U+0017 |
can | U+0018 |
em | U+0019 |
sub | U+001A |
esc | U+001B |
fs | U+001C |
gs | U+001D |
rs | U+001E |
us | U+001F |
sp | U+0020 |
alarm | U+0007 | \a
backspace | U+0008 | \b
tab | U+0009 | \t
linefeed | U+000A | \n
newline | U+000A | \n
vtab | U+000B | \v
page | U+000C | \f
return | U+000D | \r
carriage-return | U+000D | \r
escape | U+001B |
space | U+0020 |
del | U+007F |
delete | U+007F |
lbrace | U+007B |
{ | U+007B |
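The named form amounts to a fixed lookup in which several names may map to the same code point. A minimal Python sketch follows, using only a subset of the names in the table. `read_named` and `NAMED` are hypothetical, and the sketch assumes that names are matched exactly as written.

```python
import re

# A subset of the table above; aliases share a code point.
NAMED = {
    "nul": 0x00,
    "ht": 0x09, "tab": 0x09,
    "lf": 0x0A, "linefeed": 0x0A, "newline": 0x0A,
    "cr": 0x0D, "return": 0x0D, "carriage-return": 0x0D,
    "sp": 0x20, "space": 0x20,
    "lbrace": 0x7B, "{": 0x7B,
    "del": 0x7F, "delete": 0x7F,
}

# Matches #\{name}; the name is everything between the outer braces.
_NAMED_FORM = re.compile(r"#\\\{(.+)\}\Z")


def read_named(text: str) -> int:
    """Hypothetical model of the #\\{name} reader form."""
    match = _NAMED_FORM.match(text)
    if match is None:
        raise ValueError(f"not a named form: {text!r}")
    name = match.group(1)
    if name not in NAMED:
        raise ValueError(f"unknown character name: {name!r}")
    return NAMED[name]
```

With this model, `read_named("#\\{newline}")` and `read_named("#\\{lf}")` both give 0x0A, and a Unicode Character Database name such as LATIN SMALL LETTER H WITH STROKE is rejected because it is not in the table.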
Unicode Predicates
- function unicode? o
  test if o is a unicode value
  - Param o: object to test
  - Return: #t if o is a unicode value, #f otherwise
Unicode Constructors
- function integer->unicode i
  convert integer i to a Unicode code point
  - Param i: number
  - Type i: integer
  - Return: Unicode code point
  - Rtype: unicode
Unicode Functions
- function unicode=? cp1 cp2 [...]
  test if unicode arguments are equal
  - Param cp1: unicode
  - Param cp2: unicode
  - Param …: unicode
  - Return: #t if arguments are equal, #f otherwise
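The predicate, constructor and comparison documented on this page can be modelled together in a short Python sketch. This is not Idio code. The `Unicode` class and the functions `is_unicode`, `integer_to_unicode` and `unicode_eq` are hypothetical stand-ins for the unicode type, `unicode?`, `integer->unicode` and `unicode=?`. The range check and the type errors are assumptions, since this page does not say how Idio treats invalid arguments.

```python
from dataclasses import dataclass

MAX_CODE_POINT = 0x10FFFF


@dataclass(frozen=True)
class Unicode:
    """Hypothetical stand-in for the unicode type: distinct from int."""
    code_point: int


def is_unicode(o) -> bool:
    """Model of unicode?: true only for Unicode values, never for ints."""
    return isinstance(o, Unicode)


def integer_to_unicode(i: int) -> Unicode:
    """Model of integer->unicode."""
    if isinstance(i, bool) or not isinstance(i, int):
        raise TypeError("expected an integer")
    # Assumption: integers outside 0..0x10FFFF are rejected.
    if not 0 <= i <= MAX_CODE_POINT:
        raise ValueError("integer is not a Unicode code point")
    return Unicode(i)


def unicode_eq(cp1, cp2, *rest) -> bool:
    """Model of unicode=?: true if every argument is the same code point."""
    args = (cp1, cp2, *rest)
    # Assumption: non-unicode arguments are a type error.
    if not all(is_unicode(a) for a in args):
        raise TypeError("expected unicode arguments")
    return all(a == cp1 for a in args[1:])


h_stroke = integer_to_unicode(0x127)
```

In this model `is_unicode(h_stroke)` is true while `is_unicode(0x127)` is false, which mirrors the point that unicode values and integers are separate types.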
Last built at 2024-12-21T07:10:46Z+0000 from 62cca4c (dev) for Idio 0.3.b.6