String Type
Strings are arrays of Unicode code points efficiently packed into variable-width arrays.
Substrings are references into sections of Idio strings but are otherwise handled the same.
Pathnames are a subset of strings where the elements of the string are not treated as UTF-8. Any file name value returned from the operating system will be a pathname.
Consequently, you cannot directly compare a file name from the file system to a string from your source code. See string->pathname for a conversion function. There is no reverse function (pathname to string) because a file name has no encoding: it is just a sequence of bytes.
Reader Form
The input form for a string is the usual "...", that is a U+0022
(QUOTATION MARK) delimited value.
The collected bytes are assumed to be part of a valid UTF-8 sequence. If the byte sequence is invalid UTF-8 you will get the (standard) �, U+FFFD (REPLACEMENT CHARACTER) and the decoding will resume with the next byte. This may result in several replacement characters being generated.
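The replacement behaviour can be illustrated outside Idio. The sketch below uses Python's built-in UTF-8 decoder with errors="replace", which follows the same general policy (substitute U+FFFD and carry on). It is a stand-in for illustration, not Idio's reader, and the exact number of replacement characters produced for a given malformed sequence may differ between the two. The helper name decode_lenient is hypothetical.

```python
# Python's decoder stands in for Idio's reader here (an assumption for illustration).
REPLACEMENT = "\ufffd"

def decode_lenient(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid input."""
    return raw.decode("utf-8", errors="replace")

# 0xC2 0xA9 is the valid UTF-8 encoding of U+00A9 (COPYRIGHT SIGN).
valid = decode_lenient(b"\xc2\xa9 2021")

# A lone 0xA9 is a continuation byte with no lead byte: invalid UTF-8.
lone = decode_lenient(b"\xa9 2021")

# 0xFF and 0xFE can never appear in UTF-8, so each one is replaced separately.
several = decode_lenient(b"\xff\xfeabc")

print(valid)
print(lone.count(REPLACEMENT), several.count(REPLACEMENT))
```

Decoding carries on after each bad byte, so the trailing text survives in both malformed cases.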
There are a couple of notes:
\, U+005C (REVERSE SOLIDUS, backslash) is the escape character. The obvious character to escape is " itself, allowing you to embed a double-quote symbol in a double-quoted string: "hello\"world".
In the spirit of C escape sequences Idio also allows:
Supported escape sequences in strings

sequence   (hex) ASCII   description
\a         07            alert / bell
\b         08            backspace
\e         1B            escape character
\f         0C            form feed
\n         0A            newline
\r         0D            carriage return
\t         09            horizontal tab
\v         0B            vertical tab
\\         5C            backslash
\x...                    up to 2 hex digits representing any byte
\u...                    up to 4 hex digits representing a Unicode code point
\U...                    up to 8 hex digits representing a Unicode code point
Any other escaped character results in that character.
For \x, \u and \U the code will stop consuming code points if it sees one of the usual delimiters or a code point that is not a hex digit: "\Ua9 2021" silently stops at the SPACE character giving "© 2021" and, correspondingly, "\u00a92021" gives "©2021" as a maximum of 4 hex digits are consumed by \u.
\x is unrestricted, other than its value lying between 0x0 and 0xff, and \u and \U will have the hex digits converted into UTF-8.
Adding \x bytes into a string is an exercise in due diligence.
Idio allows multi-line strings:
str1 := "Hello
World"
str2 := "Hello\nWorld"
The string constructors for str1 and str2 are equivalent.
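The digit-consumption rule for \u and \U can be sketched in a few lines. This is a Python model under stated assumptions, not Idio's reader: it handles only \u and \U (not \x, which yields a raw byte, nor the single-character escapes), it copies an escape with no hex digits through unchanged, and the name expand_unicode_escapes is hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
MAX_DIGITS = {"u": 4, "U": 8}  # digit limits as documented for \u and \U

def expand_unicode_escapes(text: str) -> str:
    """Expand \\u and \\U escapes in text; everything else passes through."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in MAX_DIGITS:
            limit = MAX_DIGITS[text[i + 1]]
            start = i + 2
            j = start
            # Consume hex digits until the limit or the first non-hex character.
            while j < len(text) and j - start < limit and text[j] in HEX_DIGITS:
                j += 1
            digits = text[start:j]
            if digits:
                # chr() raises ValueError for values beyond U+10FFFF.
                out.append(chr(int(digits, 16)))
                i = j
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

print(expand_unicode_escapes("\\Ua9 2021"))
print(expand_unicode_escapes("\\u00a92021"))
```

The first call stops at the space after two digits; the second stops after the fourth digit, leaving 2021 as ordinary text.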
Pathnames
The reader form for a pathname is %P"..." (or matching brackets, %P(...) or %P{...} or
%P[...] or, in general, %Pc...c) where the ... is
a regular string as above.
That’s where the \xHH escape for strings comes into its
own. If we know that a filename starts with ISO8859-1’s 0xA9 (the
same “character” as ©, U+00A9 (COPYRIGHT SIGN)), as in a literal byte,
0xA9, and not the UTF-8 sequence 0xC2 0xA9, then we can create such a
string: %P"\xa9...".
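To see why the literal byte matters, the following Python sketch prints the bytes that each encoding uses for U+00A9. It only illustrates the byte-level difference described above; it does not involve Idio or the file system.

```python
copyright_sign = "\u00a9"  # U+00A9 COPYRIGHT SIGN

# The bytes a UTF-8 encoded string holds for this code point.
utf8_bytes = copyright_sign.encode("utf-8")

# The single byte an ISO8859-1 (Latin-1) file name holds for the same character.
latin1_bytes = copyright_sign.encode("latin-1")

print(utf8_bytes.hex(), latin1_bytes.hex())

# A byte-for-byte comparison of the two representations fails.
same = utf8_bytes == latin1_bytes
print(same)
```

A file name created under ISO8859-1 therefore never matches the UTF-8 form of the same visible character, which is the case the \xHH escape in a pathname addresses.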
Pathnames, or strings being used as pathnames, that contain an ASCII NUL
(\x00) will result in a format error when an attempt is made to use
them. ASCII NUL is a perfectly valid code point in an Idio string
but it cannot appear in a C string, which is what is
passed to the operating system’s API.
Octet Strings
The reader form for an octet string is %B"..." (or matching brackets, %B(...) or %B{...} or
%B[...] or, in general, %Bc...c) where the ... is
a regular string as above.
Note
The name, byte string, seems too overloaded but the nominal reader
form, %O is too easily confused with a putative %0. So we
have a mixed result, the name, octet string, with a reader form
derived from byte string.
Mixing Strings
You can append-string strings together and join-string strings with a delimiter but be careful as mixing string variants will result in a gracefully degraded result: unicode to pathname to octet-string.
Interpolated Strings
From time to time it is convenient to expand references to variables inside a string. There is a special reader form for such interpolated strings:
#S{...${expr}...}
Here, everything between the outermost matching { and } is
scanned for instances of the interpolation sigil, $. A matching
set of { and } is read in and the expression therein is
evaluated, the result being converted to a string (if required) and
replacing the interpolated expression. The rest of the string is
added in a similar way.
If you want to embed an actual interpolation sigil, $, you can
escape it with the default escape character \:
#S{Your \$PATH will be '${(frob-path)}'!}
Whatever the call to frob-path returns will be converted to a
string (if necessary) giving a string equivalent to:
"Your $PATH will be '...'!"
In this particular case, there’s little advantage over using sprintf etc. but in code generation it is much more convenient to see (pre-)constructed variable references in situ in the expected output.
There are two options you can pass, between the #S and opening
brace: an alternative interpolation sigil and an alternative escape
character.
In effect, normal behaviour is:
#S$\{...}
If you only want to change the escape character, use . for the
interpolation sigil, which implies that the interpolation sigil
itself cannot be a . character.
If the use of braces, { and }, means you would need to escape
braces within the interpolated string a lot, you can use parentheses or
brackets as the delimiting pair:
; generate some C code
printf #S[
if ($condition) {
doit(${c-name arg1}, ${c-name arg2});
}
]
although note that you can only use braces for the expression delimiters.
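The scanning rule can be modelled in Python. This is a deliberately reduced sketch: it looks up plain variable names in a dictionary where Idio evaluates an arbitrary expression, it does not handle nested braces, and it assumes that a sigil not followed by { is copied through unchanged. The function name interpolate and the env parameter are hypothetical.

```python
def interpolate(template: str, env: dict, sigil: str = "$", escape: str = "\\") -> str:
    """Expand sigil{name} references in template using values from env."""
    out = []
    i = 0
    n = len(template)
    while i < n:
        ch = template[i]
        if ch == escape and i + 1 < n:
            # An escaped character is copied through literally.
            out.append(template[i + 1])
            i += 2
        elif ch == sigil and i + 1 < n and template[i + 1] == "{":
            end = template.index("}", i + 2)  # ValueError if the brace is unmatched
            out.append(str(env[template[i + 2:end]]))
            i = end + 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

default = interpolate(r"Your \$PATH will be '${path}'!", {"path": "/usr/bin:/bin"})
custom = interpolate("cost: $5, name: @{n}", {"n": "x"}, sigil="@")
print(default)
print(custom)
```

With an alternative sigil such as @, a literal $ no longer needs escaping, which is the motivation for the option in generated shell or C text.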
String Predicates
- function string? o
test if o is a string
- Param o:
object to test
- Return:
#t if o is a string, #f otherwise
- function pathname? o
test if o is a pathname
- Param o:
object to test
- Return:
#t if o is a pathname, #f otherwise
Note
type->string will report a pathname as a string.
- function octet-string? o
test if o is an octet string
- Param o:
object to test
- Return:
#t if o is an octet string, #f otherwise
Note
type->string will report an octet-string as a string.
String Constructors
- function make-string size [fillc]
create a string with an initial length of size
- Param size:
initial string size
- Type size:
integer
- Param fillc:
fill character value, defaults to #\{space}
- Type fillc:
unicode, optional
- Return:
the new string
- Rtype:
string
- function substring s p0 [pn]
return a substring of s from position p0 through to but excluding position pn
- Param s:
string
- Type s:
string
- Param p0:
position
- Type p0:
integer
- Param pn:
position, defaults to string length
- Type pn:
integer, optional
- Return:
the substring
- Rtype:
string
If p0 or pn are negative they are considered to be with respect to the end of the string. This can still result in a negative index.
Note
Technically, the return type is a substring but as substrings are indistinct from strings at a user level then a return type of string suffices.
type->string will reveal the difference.
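The position handling for substring can be sketched as a Python model. The documented parts are the exclusive end position, the default of the string length for pn, and negative positions counting from the end. Raising an error when a position is still out of range, and requiring p0 to be no greater than pn, are assumptions made for this sketch.

```python
def substring(s: str, p0: int, pn=None) -> str:
    """Return s from p0 up to but excluding pn; negatives count from the end."""
    length = len(s)
    if pn is None:
        pn = length  # documented default: the string length
    if p0 < 0:
        p0 += length
    if pn < 0:
        pn += length
    # Assumption: positions that are still out of range are rejected.
    if not (0 <= p0 <= pn <= length):
        raise IndexError(f"substring bounds out of range: {p0} {pn}")
    return s[p0:pn]

print(substring("hello world", 0, 5))
print(substring("hello world", -5))
print(substring("hello world", 6, -1))
```

A call such as substring("abc", -5) shows the case where the adjusted index is still negative: -5 plus the length 3 is -2.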
- function list->string l
return a string from the list of the Unicode code points in l
- Param l:
list of code points
- Type l:
list
- Return:
string
- Rtype:
string
- function symbol->string s
convert symbol s into a string
- Param s:
symbol to convert
- Type s:
symbol
- Return:
string
- Rtype:
string
- function keyword->string kw
convert keyword kw to a string
- Param kw:
keyword to convert
- Type kw:
keyword
- Return:
string
- function string->pathname s
return a pathname of the UTF-8 encoding of s
- Param s:
string
- Type s:
string
- Return:
pathname
- Rtype:
pathname
- function string->octet-string s
return an octet string of the UTF-8 encoding of s
- Param s:
string
- Type s:
string
- Return:
octet string
- Rtype:
octet string
- function octet-string->string s
return a string from the UTF-8 decoding of s
- Param s:
octet string
- Type s:
octet string
- Return:
string
- Rtype:
string
Warning
This is highly likely to generate #U+FFFD REPLACEMENT CHARACTER in the resultant string.
- function ->string o
convert o to a string unless it already is a string
- Param o:
object to convert
- Return:
a string representation of o
->string differs from string in that it won’t stringify a string!
- function string o
convert o to a string
- Param o:
object to convert
- Return:
a string representation of o
String Attributes
- function string-length s
return the number of code points in s
- Param s:
string
- Type s:
string
- Return:
number of code points
- Rtype:
integer
- function string-ref s index
return code point at position index in s
positions start at 0
- Param s:
string
- Type s:
string
- Param index:
position
- Type index:
integer
- Return:
code point
- Rtype:
unicode
- function string-set! s index c
set position index of s to c
positions start at 0
- Param s:
string
- Type s:
string
- Param index:
position
- Type index:
integer
- Param c:
code point
- Type c:
unicode
- Return:
#<unspec>
string-set! will fail if c is wider than the existing storage allocation for s
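The width restriction follows from the variable-width packing of strings. The Python model below assumes storage widths of 1, 2 or 4 bytes per code point, chosen from the widest code point present when the string was created. Those thresholds are an assumption for illustration and are not stated in this document. The names storage_width and can_set are hypothetical.

```python
def storage_width(s: str) -> int:
    """Bytes per code point needed to hold every code point in s (assumed thresholds)."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1
    if widest <= 0xFFFF:
        return 2
    return 4

def can_set(s: str, c: str) -> bool:
    """Would code point c fit in the storage already allocated for s?"""
    return storage_width(c) <= storage_width(s)

print(can_set("hello", "z"))             # both fit in one byte per code point
print(can_set("hello", "\u2021"))        # U+2021 needs wider storage than "hello" has
print(can_set("h\u2021llo", "\u00a9"))   # storage is already wide enough
```

Under this model a string made only of ASCII cannot later accept a code point above U+00FF, which matches the failure described for string-set!.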
- function string-fill! s fill
set all positions of s to fill
- Param s:
string
- Type s:
string
- Param fill:
code point
- Type fill:
unicode
- Return:
#<unspec>
string-fill! will fail if fill is wider than the existing storage allocation for s
String Functions
- function append-string [args]
append strings
- Param args:
strings to append together
- Type args:
list, optional
- Return:
string ("" if no args supplied)
append-string will gracefully degrade the string variant based on the arguments: unicode > pathname > octet-string
append-string takes multiple arguments each of which is a string.
See also
concatenate-string which takes a single argument which is a list of strings.
- function concatenate-string ls
concatenate strings in list ls
- Param ls:
list of strings to concatenate together
- Type ls:
list, optional
- Return:
string ("" if ls is #n)
concatenate-string takes a single argument, which is a list of strings. It is roughly comparable to apply append-string ls
See also
append-string which takes multiple arguments each of which is a string.
- function copy-string s
return a copy of s which is not eq? to s
- Param s:
string
- Type s:
string
- Return:
string
- Rtype:
string
- function join-string delim args
return a string of args interspersed with delim
- Param delim:
string
- Type delim:
string
- Param args:
string(s) to be joined
- Type args:
list, optional
- Return:
string ("" if args is #n)
- function string-index s c
return the index of c in s or #f
- Param s:
string
- Type s:
string
- Param c:
code point
- Type c:
unicode
- Return:
index or #f
- Rtype:
integer or #f
- function string-rindex s c
return the rightmost index of c in s or #f
- Param s:
string
- Type s:
string
- Param c:
code point
- Type c:
unicode
- Return:
index or #f
- Rtype:
integer or #f
- function fields in
split string in using characters from IFS into an array whose first element is the original string
- Param in:
string to split
- Type in:
string
- Return:
array (of strings)
Adjacent characters from IFS are considered a single delimiter.
See also
split-string which returns a list
- function split-string in [delim]
split string in using characters from delim into a list of strings
- Param in:
string to split
- Type in:
string
- Param delim:
delimiter characters, defaults to IFS
- Type delim:
string, optional
- Return:
list (of strings)
Adjacent characters from delim are considered a single delimiter.
See also
split-string-exactly which treats delim more fastidiously and fields which returns an array
- function split-string-exactly in [delim]
split string in using characters from delim into a list of strings
- Param in:
string to split
- Type in:
string
- Param delim:
delimiter characters, defaults to IFS
- Type delim:
string, optional
- Return:
list (of strings)
Adjacent characters from delim are considered separate delimiters.
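The difference between the two splitting functions is easiest to see side by side. The Python model below captures only the documented distinction, namely whether adjacent delimiter characters count as one delimiter or as several. How Idio treats leading or trailing delimiters is not modelled, and delim is assumed to be a non-empty string. The Python function names are hypothetical stand-ins.

```python
import re

def split_string(text: str, delim: str) -> list:
    """Treat a run of adjacent delimiter characters as a single delimiter."""
    pattern = "[" + re.escape(delim) + "]+"
    return re.split(pattern, text)

def split_string_exactly(text: str, delim: str) -> list:
    """Treat every delimiter character as a separate delimiter."""
    pattern = "[" + re.escape(delim) + "]"
    return re.split(pattern, text)

loose = split_string("a::b:c", ":")
exact = split_string_exactly("a::b:c", ":")
print(loose)
print(exact)
```

The exact variant keeps the empty field between the two adjacent colons, which matters for formats such as colon-separated records where an empty field is meaningful.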
See also
- function string<? s1 s2 [...]
apply the less-than comparator to strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string<? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string<? converts each Idio string to a UTF-8 representation in a C string then uses strncmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
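The comparison rule can be modelled in Python as follows. The sketch compares UTF-8 bytes over the length of the shorter string and then falls back to length, as described. It assumes the conventional argument order, in which string<? is true when each argument is less than the one to its right, and it ignores the fact that strncmp(3) stops at a NUL byte. The names compare and string_lt are hypothetical.

```python
def compare(s1: str, s2: str) -> int:
    """Return a negative, zero or positive value, in the style of strncmp(3)."""
    b1, b2 = s1.encode("utf-8"), s2.encode("utf-8")
    n = min(len(b1), len(b2))
    # Compare only the first n bytes, as strncmp(3) would with length n.
    for x, y in zip(b1[:n], b2[:n]):
        if x != y:
            return x - y
    # Equal over the shared length: the shorter string sorts first.
    return len(b1) - len(b2)

def string_lt(*args: str) -> bool:
    """True if every argument is less than its right-hand neighbour (assumed order)."""
    if len(args) < 2:
        raise TypeError("at least two strings are required")
    return all(compare(a, b) < 0 for a, b in zip(args, args[1:]))

print(string_lt("abc", "abcd"))
print(string_lt("a", "b", "c"))
print(string_lt("a", "c", "b"))
```

The other comparators differ only in the test applied to the value returned by compare, and the case-insensitive family would fold case before comparing.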
- function string<=? s1 s2 [...]
apply the less-than-or-equal comparator to strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string<=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string<=? converts each Idio string to a UTF-8 representation in a C string then uses strncmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string=? s1 s2 [...]
apply the equality comparator to strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string=? converts each Idio string to a UTF-8 representation in a C string then uses strncmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string>=? s1 s2 [...]
apply the greater-than-or-equal comparator to strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string>=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string>=? converts each Idio string to a UTF-8 representation in a C string then uses strncmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string>? s1 s2 [...]
apply the greater-than comparator to strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string>? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string>? converts each Idio string to a UTF-8 representation in a C string then uses strncmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string-ci<? s1 s2 [...]
apply the less-than comparator to case-insensitive strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string-ci<? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string-ci<? converts each Idio string to a UTF-8 representation in a C string then uses strncasecmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string-ci<=? s1 s2 [...]
apply the less-than-or-equal comparator to case-insensitive strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string-ci<=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string-ci<=? converts each Idio string to a UTF-8 representation in a C string then uses strncasecmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string-ci=? s1 s2 [...]
apply the equality comparator to case-insensitive strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string-ci=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string-ci=? converts each Idio string to a UTF-8 representation in a C string then uses strncasecmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string-ci>=? s1 s2 [...]
apply the greater-than-or-equal comparator to case-insensitive strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string-ci>=? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string-ci>=? converts each Idio string to a UTF-8 representation in a C string then uses strncasecmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function string-ci>? s1 s2 [...]
apply the greater-than comparator to case-insensitive strings
- Param s1:
string
- Type s1:
string
- Param s2:
string
- Type s2:
string
- Return:
the result of comparing the arguments
- Rtype:
boolean
string-ci>? requires a minimum of two arguments. The comparator is applied to each adjacent pair of arguments, working left to right, and the result is #t if it holds for every pair, otherwise the result is #f.
string-ci>? converts each Idio string to a UTF-8 representation in a C string then uses strncasecmp(3) to compare over the length of the shorter string.
If the strings are equal over that length then the shorter string is considered less than the longer string.
- function strip-string str discard [ends]
return a string which is str without leading, trailing (or both) discard characters
- Param str:
string
- Type str:
string
- Param discard:
string
- Type discard:
string
- Param ends:
'left, 'right (default), 'both or 'none
- Type ends:
symbol, optional
- Return:
string
The returned value could be str or a substring of str
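The effect of the ends argument can be modelled with Python's own strip methods. The sketch assumes that discard is treated as a set of characters, any of which is removed from the chosen end or ends, and it uses plain Python strings in place of Idio symbols for ends. The default of "right" follows the parameter description above.

```python
def strip_string(s: str, discard: str, ends: str = "right") -> str:
    """Remove characters found in discard from the chosen end(s) of s."""
    if ends == "left":
        return s.lstrip(discard)
    if ends == "right":
        return s.rstrip(discard)
    if ends == "both":
        return s.strip(discard)
    if ends == "none":
        return s
    raise ValueError(f"unknown ends value: {ends!r}")

print(strip_string("xxhixx", "x"))
print(strip_string("xxhixx", "x", "left"))
print(strip_string("xxhixx", "x", "both"))
print(strip_string("xxhixx", "x", "none"))
```

When nothing needs removing, the model returns the input unchanged, which parallels the note that the returned value could be str itself.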
Last built at 2025-10-28T07:10:47Z+0000 from 3d9f9d3 (dev) for Idio 0.3.b.6