unicode Functions
Each predicate in this list tests for a Unicode Category or Property. There are at least 65 Categories and Properties; the ones here are those Idio itself requires.
There are also three functions for converting between cases.
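To illustrate what a Category predicate tests, here is a minimal sketch in Python using the standard `unicodedata` module. It is an analogue for orientation only, not Idio's implementation, and the function names are hypothetical. A `L*`-style test matches any General_Category starting with that letter, while `Nd` is an exact match.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    """True if the General_Category of ch is one of L* (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")


def is_mark(ch: str) -> bool:
    """True if the General_Category of ch is one of M* (Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def is_decimal_number(ch: str) -> bool:
    """True if the General_Category of ch is exactly Nd."""
    return unicodedata.category(ch) == "Nd"


def is_ascii_decimal_number(ch: str) -> bool:
    """True if ch is in Nd and its code point is below 0x80."""
    return is_decimal_number(ch) and ord(ch) < 0x80
```

Python's `unicodedata` exposes the General_Category but not most of the Properties listed here (for example Alphabetic or White_Space), so only the Category-style predicates are sketched.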
- function unicode/ASCII_Hex_Digit? o
Does o have the Unicode Property ASCII_Hex_Digit?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Alphabetic? o
Does o have the Unicode Property Alphabetic?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Control? o
Does o have the Unicode Property Control?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Decimal_Number? o
Is o in the Unicode Category Nd?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Extend? o
Does o have the Unicode Property Extend?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Fractional_Number? o
Is the Unicode Property Numeric_Value of o a fraction?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/L? o
Does o have the Unicode Property L?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/LV? o
Does o have the Unicode Property LV?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/LVT? o
Does o have the Unicode Property LVT?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Letter? o
Is o in any of the Unicode Categories L*?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Lowercase? o
Does o have the Unicode Property Lowercase?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Mark? o
Is o in any of the Unicode Categories M*?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Number? o
Is o in any of the Unicode Categories N*?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Punctuation? o
Does o have the Unicode Property Punctuation?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Regional_Indicator? o
Does o have the Unicode Property Regional_Indicator?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Separator? o
Is o in any of the Unicode Categories Z*?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/SpacingMark? o
Does o have the Unicode Property SpacingMark?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Symbol? o
Is o in any of the Unicode Categories S*?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/T? o
Does o have the Unicode Property T?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Titlecase_Letter? o
Does o have the Unicode Property Titlecase_Letter?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/Uppercase? o
Does o have the Unicode Property Uppercase?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/V? o
Does o have the Unicode Property V?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/White_Space? o
Does o have the Unicode Property White_Space?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/ZWJ? o
Does o have the Unicode Property ZWJ?
- Param o:
object to test
- Type o:
unicode|string
- Return:
boolean
- function unicode/ASCII-Decimal_Number? cp
Is cp in the Unicode Category Nd and less than 0x80?
- Param cp:
code point to test
- Type cp:
unicode
- Return:
boolean
This function overrides the ASCII-Decimal_Number? version found in lib/bootstrap/common.idio.
- function unicode/->Lowercase cp
Return the Unicode Simple_Lowercase_Mapping of cp
- Param cp:
value to convert
- Type cp:
unicode
- Return:
unicode
Note that if cp has no lowercase mapping, the mapping defaults to cp itself.
- function unicode/->Titlecase cp
Return the Unicode Simple_Titlecase_Mapping of cp
- Param cp:
value to convert
- Type cp:
unicode
- Return:
unicode
Note that if cp has no titlecase mapping, the mapping defaults to cp itself.
- function unicode/->Uppercase cp
Return the Unicode Simple_Uppercase_Mapping of cp
- Param cp:
value to convert
- Type cp:
unicode
- Return:
unicode
Note that if cp has no uppercase mapping, the mapping defaults to cp itself.
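The "defaults to cp" behaviour of the simple case mappings can be sketched in Python. This is only an approximation under a stated assumption: Python exposes the full case mappings, not the simple ones, so the hypothetical helper below falls back to the input whenever the full mapping expands to more than one code point. That agrees with the simple mapping for the cases shown but is not guaranteed for every code point.

```python
def simple_case(ch: str, full_mapping) -> str:
    """Approximate a simple case mapping from a full one.

    full_mapping is one of str.upper, str.lower or str.title.
    If the full mapping is not a single code point, return ch unchanged.
    """
    mapped = full_mapping(ch)
    return mapped if len(mapped) == 1 else ch


upper_e_acute = simple_case("\u00e9", str.upper)   # U+00E9 maps to U+00C9
upper_digit = simple_case("7", str.upper)          # no mapping: stays "7"
upper_sharp_s = simple_case("\u00df", str.upper)   # full mapping is "SS": stays U+00DF
title_dz = simple_case("\u01c6", str.title)        # U+01C6 titlecases to U+01C5
```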
- function unicode/numeric-value cp
Return the Unicode Numeric_Value of cp
- Param cp:
code point
- Type cp:
unicode
- Return:
integer or string
- Raises ^rt-param-value-error:
if cp is not Numeric?
The Unicode Numeric_Value can be a decimal integer or a rational; a rational is returned as a string.
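The integer-or-string return shape can be sketched in Python with the standard `unicodedata` and `fractions` modules. This is an analogue, not Idio's implementation: the helper name is hypothetical, and the `n/d` string format is an assumption borrowed from the notation used in the Unicode Character Database. `unicodedata.numeric` returns a float, so the sketch recovers the rational with `limit_denominator`.

```python
import unicodedata
from fractions import Fraction


def numeric_value(ch: str):
    """Return an int for integral Numeric_Values, else an "n/d" string.

    Raises ValueError if ch has no Numeric_Value.
    """
    # unicodedata.numeric gives a float; recover a small exact rational from it.
    value = Fraction(unicodedata.numeric(ch)).limit_denominator(1000)
    if value.denominator == 1:
        return value.numerator
    return f"{value.numerator}/{value.denominator}"
```

For example, `numeric_value("7")` gives the integer 7, while `numeric_value("\u00bd")` (VULGAR FRACTION ONE HALF) gives the string `"1/2"`.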
- function unicode/describe o
Print the Unicode attributes of o
- Param o:
value to describe
- Type o:
unicode or string
- Return:
#<unspec>
unicode/describe reports a pseudo Unicode Character Database entry for the code point, plus its associated Categories and Properties, any Lowercase, Uppercase or Titlecase variants, and its Numeric_Value if it has one.
It does the same for each code point in a string (which may, of course, contain more code points than “characters”).
Idio> unicode/describe "é"
00E9;;Ll;;;;;;;;;;00C9;;00C9 # Letter Lowercase Alphabetic Uppercase=00C9 Titlecase=00C9
Idio> unicode/describe "é"
0065;;Ll;;;;;;;;;;0045;;0045 # Letter Lowercase Alphabetic ASCII_Hex_Digit Uppercase=0045 Titlecase=0045
0301;;Mn;;;;;;;;;;;; # Mark Extend
Idio> describe "🏴󠁧󠁢󠁳󠁣󠁴󠁿"
1F3F4;;So;;;;;;;;;;;; # Symbol
E0067;;Cf;;;;;;;;;;;; # Extend
E0062;;Cf;;;;;;;;;;;; # Extend
E0073;;Cf;;;;;;;;;;;; # Extend
E0063;;Cf;;;;;;;;;;;; # Extend
E0074;;Cf;;;;;;;;;;;; # Extend
E007F;;Cf;;;;;;;;;;;; # Extend
The third string is an example of an emoji, in this case flag: Scotland, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, in the Subdivision Flags category. Don’t worry if you see only a black flag rather than a Saltire: desktop browser support is poor, although mobile phone support is better. The point is that a single (double-width) “character” is, in this case, constructed from seven Unicode code points:
- 1F3F4 WAVING BLACK FLAG
- E0067 TAG LATIN SMALL LETTER G
- E0062 TAG LATIN SMALL LETTER B
- E0073 TAG LATIN SMALL LETTER S
- E0063 TAG LATIN SMALL LETTER C
- E0074 TAG LATIN SMALL LETTER T
- E007F CANCEL TAG
The tag letters spell out the GB then SCT elements. There are corresponding WLS and ENG variants for the flags of Wales and England, respectively. These abbreviations appear to be derived from ISO 3166-2:GB.
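The gap between “characters” and code points can be checked outside Idio as well. The following Python sketch (an analogue with hypothetical helper names, not Idio code) lists the code points of the three strings used above and decodes the tag characters back to ASCII by subtracting 0xE0000, which is how the tag block mirrors ASCII.

```python
# The Scotland flag: WAVING BLACK FLAG, five tag letters, then CANCEL TAG.
FLAG = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"


def code_points(s: str) -> list:
    """Return the code points of s as upper-case hex strings."""
    return [f"{ord(c):04X}" for c in s]


def tag_payload(s: str) -> str:
    """Decode tag characters U+E0020..U+E007E to their ASCII counterparts.

    Other code points, including CANCEL TAG (U+E007F), are skipped.
    """
    return "".join(
        chr(ord(c) - 0xE0000) for c in s if 0xE0020 <= ord(c) <= 0xE007E
    )


precomposed = code_points("\u00e9")    # one code point
decomposed = code_points("e\u0301")    # letter plus combining acute accent
flag_points = code_points(FLAG)        # seven code points
flag_tags = tag_payload(FLAG)          # "gbsct"
```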
Utility Functions
Some utility functions for dealing with SRFI-14 Module char-sets.
- function unicode/unicode->plane cp
Return the Unicode plane of cp
- Param cp:
unicode to analyse
- Return:
Unicode plane of cp
- Rtype:
fixnum
- function unicode/unicode->plane-codepoint cp
Return the lower 16 bits of cp
- Param cp:
unicode to convert
- Return:
lower 16 bits of cp
- Rtype:
fixnum
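The arithmetic behind these two utilities is a shift and a mask: each Unicode plane holds 0x10000 code points, so the plane is the code point shifted right by 16 bits and the position within the plane is the lower 16 bits. A minimal Python sketch, with hypothetical function names standing in for the Idio ones:

```python
def unicode_plane(cp: int) -> int:
    """Return the plane (0..16) containing code point cp."""
    return cp >> 16


def plane_codepoint(cp: int) -> int:
    """Return the lower 16 bits of cp, its offset within the plane."""
    return cp & 0xFFFF


# U+1F3F4 WAVING BLACK FLAG is in plane 1 at offset 0xF3F4.
flag_plane = unicode_plane(0x1F3F4)
flag_offset = plane_codepoint(0x1F3F4)
```

The tag characters from the flag example, such as U+E0067, sit in plane 14 at offset 0x0067.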
Last built at 2025-02-05T07:10:43Z+0000 from 62cca4c (dev) for Idio 0.3.b.6