unicode Functions

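Each predicate in this list tests for a Unicode Category or Property. There are at least 65 Categories and Properties; only those that Idio itself requires are provided here. Several of the names (Control, Extend, L, LV, LVT, SpacingMark, T, V and ZWJ) are values of the Grapheme_Cluster_Break property defined in Unicode Standard Annex #29 rather than standalone properties.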

There are also three case-conversion functions.

function unicode/ASCII_Hex_Digit? o

Does o have the Unicode Property ASCII_Hex_Digit?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Alphabetic? o

Does o have the Unicode Property Alphabetic?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Control? o

Does o have the Unicode Property Control?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Decimal_Number? o

Is o in the Unicode Category Nd?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Extend? o

Does o have the Unicode Property Extend?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Fractional_Number? o

Is the Unicode Property Numeric_Value of o a fraction?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/L? o

Does o have the Unicode Property L?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/LV? o

Does o have the Unicode Property LV?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/LVT? o

Does o have the Unicode Property LVT?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Letter? o

Is o in any of the Unicode Categories L*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean
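Letter?, Mark?, Number?, Separator? and Symbol? each ask whether the first letter of a code point's two-letter General_Category matches a given major class. The following is a minimal Python sketch using the standard `unicodedata` module, offered as an outside cross-check and not as Idio code. The helper `in_major_category` is hypothetical. It handles only a single character, because this section does not describe how the Idio predicates treat a multi-code-point string.

```python
import unicodedata

def in_major_category(ch: str, major: str) -> bool:
    """True if the General_Category of the single character ch starts with major, e.g. "L"."""
    # unicodedata.category returns a two-letter code such as "Ll", "Mn", "Nd", "Zs" or "So"
    return unicodedata.category(ch)[0] == major

# One check per predicate family, using code points that appear in this section
print(in_major_category("\u00e9", "L"))      # Letter?:    U+00E9 is Ll
print(in_major_category("\u0301", "M"))      # Mark?:      U+0301 is Mn
print(in_major_category("7", "N"))           # Number?:    U+0037 is Nd
print(in_major_category(" ", "Z"))           # Separator?: U+0020 is Zs
print(in_major_category("\U0001F3F4", "S"))  # Symbol?:    U+1F3F4 is So
```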

function unicode/Lowercase? o

Does o have the Unicode Property Lowercase?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Mark? o

Is o in any of the Unicode Categories M*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Number? o

Is o in any of the Unicode Categories N*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Punctuation? o

Is o in any of the Unicode Categories P*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Regional_Indicator? o

Does o have the Unicode Property Regional_Indicator?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Separator? o

Is o in any of the Unicode Categories Z*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/SpacingMark? o

Does o have the Unicode Property SpacingMark?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Symbol? o

Is o in any of the Unicode Categories S*?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/T? o

Does o have the Unicode Property T?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Titlecase_Letter? o

Is o in the Unicode Category Lt (Titlecase_Letter)?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/Uppercase? o

Does o have the Unicode Property Uppercase?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/V? o

Does o have the Unicode Property V?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/White_Space? o

Does o have the Unicode Property White_Space?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/ZWJ? o

Does o have the Unicode Property ZWJ?

Param o:

object to test

Type o:

unicode|string

Return:

boolean

function unicode/ASCII-Decimal_Number? cp

Is cp in the Unicode Category Nd and below U+0080, that is, one of the ASCII digits 0 to 9?

Param cp:

code point to test

Type cp:

unicode

Return:

boolean

This function overrides the ASCII-Decimal_Number? version found in lib/bootstrap/common.idio.
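The test combines a category check with a range check. The following Python sketch uses `unicodedata` and is not the Idio implementation; `ascii_decimal_number` is a hypothetical name. It shows that the only code points passing both tests are the ASCII digits 0 to 9. A digit such as U+0663 ARABIC-INDIC DIGIT THREE is Nd but fails the range check.

```python
import unicodedata

def ascii_decimal_number(ch: str) -> bool:
    # Category Nd (Decimal_Number) and an ASCII code point, i.e. below 0x80
    return unicodedata.category(ch) == "Nd" and ord(ch) < 0x80

print(ascii_decimal_number("7"))       # U+0037 DIGIT SEVEN: passes both tests
print(ascii_decimal_number("\u0663"))  # ARABIC-INDIC DIGIT THREE: Nd, but not ASCII
print(ascii_decimal_number("a"))       # U+0061 is Ll, not Nd
```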

function unicode/->Lowercase cp

Return the Unicode Simple_Lowercase_Mapping of cp.

Param cp:

value to convert

Type cp:

unicode

Return:

unicode

If cp has no Simple_Lowercase_Mapping, cp itself is returned.

function unicode/->Titlecase cp

Return the Unicode Simple_Titlecase_Mapping of cp.

Param cp:

value to convert

Type cp:

unicode

Return:

unicode

If cp has no Simple_Titlecase_Mapping, cp itself is returned.

function unicode/->Uppercase cp

Return the Unicode Simple_Uppercase_Mapping of cp.

Param cp:

value to convert

Type cp:

unicode

Return:

unicode

If cp has no Simple_Uppercase_Mapping, cp itself is returned.
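The simple mappings are the single-code-point fields of UnicodeData.txt: field 12 (Simple_Uppercase_Mapping), field 13 (Simple_Lowercase_Mapping) and field 14 (Simple_Titlecase_Mapping). An empty field means the code point maps to itself. The following minimal Python sketch shows that default and is not Idio code. The tables are hypothetical excerpts holding only a few entries.

Two cases are worth noting:

- U+01C6 LATIN SMALL LETTER DZ WITH CARON has different titlecase and uppercase mappings.
- U+00DF LATIN SMALL LETTER SHARP S has no simple uppercase mapping, even though Python's own `str.upper` applies the full mapping from SpecialCasing.txt.

```python
# Hypothetical excerpts of UnicodeData.txt fields 12-14; a real table covers every code point
SIMPLE_UPPERCASE = {0x00E9: 0x00C9, 0x01C6: 0x01C4}
SIMPLE_LOWERCASE = {0x00C9: 0x00E9, 0x01C4: 0x01C6}
SIMPLE_TITLECASE = {0x00E9: 0x00C9, 0x01C6: 0x01C5}

def simple_case_map(table: dict, cp: int) -> int:
    # An empty field in UnicodeData.txt means the mapping defaults to cp itself
    return table.get(cp, cp)

print(hex(simple_case_map(SIMPLE_UPPERCASE, 0x01C6)))  # U+01C4, uppercase DZ WITH CARON
print(hex(simple_case_map(SIMPLE_TITLECASE, 0x01C6)))  # U+01C5, titlecase form
print(hex(simple_case_map(SIMPLE_UPPERCASE, 0x00DF)))  # no simple mapping, so cp is returned
print("\u00df".upper())                                # Python's str.upper uses the full mapping
```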

function unicode/numeric-value cp

Return the Unicode Numeric_Value of cp.

Param cp:

code point

Type cp:

unicode

Return:

integer or string

Raises ^rt-param-value-error:

if cp is not Numeric?

The Unicode Numeric_Value can be a decimal integer or a rational; a rational is returned as a string.
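UnicodeData.txt stores Numeric_Value exactly, either as an integer or as a fraction such as 1/2. Python's `unicodedata.numeric` returns a float instead. This minimal sketch (not Idio code; `numeric_value` is a hypothetical name) recovers an exact rational with `fractions.Fraction`. Mirroring the behaviour described above, it returns an int for whole values and a string otherwise. The "num/den" string format is an assumption, because the exact format Idio uses is not shown here.

```python
import unicodedata
from fractions import Fraction

def numeric_value(ch: str):
    """Return an int for whole Numeric_Values, otherwise a "num/den" string (assumed format)."""
    # unicodedata.numeric raises ValueError if ch has no Numeric_Value
    as_float = unicodedata.numeric(ch)
    # Recover the exact rational from the inexact float (a heuristic, not a UCD lookup)
    value = Fraction(as_float).limit_denominator(10**6)
    if value.denominator == 1:
        return value.numerator
    return f"{value.numerator}/{value.denominator}"

print(numeric_value("7"))       # DIGIT SEVEN
print(numeric_value("\u00bd"))  # VULGAR FRACTION ONE HALF
print(numeric_value("\u2153"))  # VULGAR FRACTION ONE THIRD
```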

function unicode/describe o

Print the Unicode attributes of o.

Param o:

value to describe

Type o:

unicode|string

Return:

#<unspec>

unicode/describe reports a pseudo Unicode Character Database entry, followed by the Categories and Properties associated with the code point, any Lowercase, Uppercase or Titlecase variants and any Numeric_Value.

For a string, it does the same for each code point; there may, of course, be more code points than “characters”.

Idio> unicode/describe "é"
00E9;;Ll;;;;;;;;;;00C9;;00C9 # Letter Lowercase Alphabetic Uppercase=00C9 Titlecase=00C9
Idio> unicode/describe "é"
0065;;Ll;;;;;;;;;;0045;;0045 # Letter Lowercase Alphabetic ASCII_Hex_Digit Uppercase=0045 Titlecase=0045
0301;;Mn;;;;;;;;;;;; # Mark Extend
Idio> describe "🏴󠁧󠁢󠁳󠁣󠁴󠁿"
1F3F4;;So;;;;;;;;;;;; # Symbol
E0067;;Cf;;;;;;;;;;;; # Extend
E0062;;Cf;;;;;;;;;;;; # Extend
E0073;;Cf;;;;;;;;;;;; # Extend
E0063;;Cf;;;;;;;;;;;; # Extend
E0074;;Cf;;;;;;;;;;;; # Extend
E007F;;Cf;;;;;;;;;;;; # Extend
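The first two strings look the same but are different code point sequences, which is why the second produces two lines of output:

- The first is the single precomposed U+00E9 LATIN SMALL LETTER E WITH ACUTE.
- The second is e followed by U+0301 COMBINING ACUTE ACCENT.

The following Python cross-check uses the standard `unicodedata` module and is not Idio code. It lists the code points and categories of each string. It also shows that Unicode normalization (NFC and NFD) converts between the two forms.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT

for s in (precomposed, decomposed):
    # One entry per code point, much as unicode/describe prints one line per code point
    print([f"{ord(c):04X} {unicodedata.category(c)}" for c in s])

# NFC composes the pair into U+00E9; NFD decomposes U+00E9 into the pair
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```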

The third string is an example of an emoji, in this case flag: Scotland, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, from the Subdivision Flags Category. Don’t worry if you can’t see a Saltire, perhaps just a black flag: support for these flags is poor in desktop browsers and better on mobile phones. The point is that a single (double-width) “character” is, in this case, constructed from seven Unicode code points:

1F3F4 WAVING BLACK FLAG
E0067 TAG LATIN SMALL LETTER G
E0062 TAG LATIN SMALL LETTER B
E0073 TAG LATIN SMALL LETTER S
E0063 TAG LATIN SMALL LETTER C
E0074 TAG LATIN SMALL LETTER T
E007F CANCEL TAG

The tag letters spell out gb and then sct. The flags for Wales and England use the corresponding wls and eng sequences. These codes match the ISO 3166-2:GB subdivision codes GB-SCT, GB-WLS and GB-ENG.
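Tag characters U+E0020 to U+E007E mirror the printable ASCII characters at the same offset from U+E0000, which is how the letters can be read back out of the sequence. The following minimal Python sketch recovers them from the flag above. `decode_tag_sequence` is a hypothetical helper and is not part of Idio.

```python
def decode_tag_sequence(s: str) -> str:
    """Recover the ASCII characters carried by Unicode tag characters (U+E0020..U+E007E)."""
    letters = []
    for ch in s:
        cp = ord(ch)
        if 0xE0020 <= cp <= 0xE007E:
            # Each tag character mirrors the ASCII character at cp - 0xE0000
            letters.append(chr(cp - 0xE0000))
        # The base emoji U+1F3F4 and CANCEL TAG U+E007F carry no letter and are skipped
    return "".join(letters)

scotland = "\U0001F3F4\U000E0067\U000E0062\U000E0073\U000E0063\U000E0074\U000E007F"
print(len(scotland), decode_tag_sequence(scotland))
```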

Utility Functions

Some utility functions for working with the char-sets of the SRFI-14 Module.

function unicode/unicode->plane cp

Return the Unicode plane of cp.

Param cp:

unicode to analyse

Return:

the Unicode plane of cp

Rtype:

fixnum

function unicode/unicode->plane-codepoint cp

Return the lower 16 bits of cp.

Param cp:

unicode to convert

Return:

the lower 16 bits of cp

Rtype:

fixnum
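Each Unicode plane holds 0x10000 code points. Assuming cp is a valid code point between 0 and 0x10FFFF, the plane is the value above the lower 16 bits, and the position within the plane is those 16 bits. The following Python sketch shows the two calculations. The function names are hypothetical, and this is not Idio's implementation.

```python
def unicode_to_plane(cp: int) -> int:
    # 17 planes of 0x10000 code points each, so the plane is cp // 0x10000
    return cp >> 16

def unicode_to_plane_codepoint(cp: int) -> int:
    # Offset within the plane: the lower 16 bits of cp
    return cp & 0xFFFF

for cp in (0x00E9, 0x1F3F4, 0xE0067):
    print(f"U+{cp:04X}: plane {unicode_to_plane(cp)}, offset {unicode_to_plane_codepoint(cp):04X}")
```

With the code points from the examples above, U+00E9 is in plane 0, U+1F3F4 WAVING BLACK FLAG is in plane 1 and the tag characters are in plane 14.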

Last built at 2024-05-17T06:10:59Z+0000 from 62cca4c (dev) for Idio 0.3.b.6