Strings

Strings are the usual "-delimited sequences of “characters” which, here, are Unicode code points. The normal escape sequences are available: \n, \t, etc., along with \u... (followed by up to four hex digits) and \U... (followed by up to eight hex digits).

There is an additional escape sequence, \x.. (followed by up to two hex digits), which is only useful for constructing non-UTF-8 pathnames.

Multi-line strings are perfectly reasonable: a newline typed between the quotes is part of the string, just as \n is.

strings.idio
s1 := "hello\nworld"

s2 := "hello
world"

printf "Does <<%s>> equal? <<%s>>? %s\n" s1 s2 (equal? s1 s2)

;; enter one directly (source files are UTF-8)
s1 = "ħello"

;; or using an escape sequence
s2 = "\u0127ello"

printf "Does <<%s>> equal? <<%s>>? %s\n" s1 s2 (equal? s1 s2)

;; Note that you only need to pass as many hex digits as necessary to
;; identify the code point so long as the following character is not
;; a possible hex digit.
;;
;; We can't say \u127ello as the "up to four digits" will consume 127e
;; which is ቾ (U+127E ETHIOPIC SYLLABLE CO).
;;
;; In the next example U+0050 LATIN CAPITAL LETTER P can be reduced to
;; \u50 because the next code point is l (U+006C LATIN SMALL LETTER L)
;; which is not a hex digit.
s1 = "\u50lay time!"

;; unicode/describe works for strings too, which is helpful when the
;; visual representation is confusing. For example, this is not é
;; (U+00E9 LATIN SMALL LETTER E WITH ACUTE) but e followed by U+0301
;; COMBINING ACUTE ACCENT
unicode/describe "é"
$ idio strings
Does <<hello
world>> equal? <<hello
world>>? #t
Does <<ħello>> equal? <<ħello>>? #t
0065;;Ll;;;;;;;;;;0045;;0045 # Letter Lowercase Alphabetic ASCII_Hex_Digit Uppercase=0045 Titlecase=0045
0301;;Mn;;;;;;;;;;;; # Mark Extend
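
For code points beyond U+FFFF the four-digit \u... form is not enough and \U... is needed. The following is a minimal sketch, assuming only the escape rules described above; the variable name s3 is illustrative.

```idio
;; U+1F600 GRINNING FACE needs five hex digits so use \U.  The SPACE
;; that follows is not a hex digit and so ends the escape sequence.
s3 := "\U1F600 is a single code point"

;; string-length counts code points
printf "<<%s>> is %d code points\n" s3 (string-length s3)

;; padding to the full eight digits avoids any ambiguity when the
;; following characters happen to be hex digits
printf "%s\n" (equal? "\U0001F600" (substring s3 0 1))
```

Padding the escape to its full width is the safer habit whenever the text after it is not under your control.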

String Indexing

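You can index a string with an integer to get back the Unicode code point at that position. Indexing starts at 0 and, as s1.-1 in the example shows, a negative index counts back from the end of the string.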

You can extract a substring from an index position p0 up to, but excluding, a second index position, which defaults to the end of the string.

string-access.idio
s1 := "ħello"

;; the s1.0 form is slightly slower than calling string-ref
printf "first code point is %s (or %s)\n" (string-ref s1 0) s1.0

printf "last code point is %s\n" s1.-1

slen := string-length s1

ss1 := substring s1 1 (slen - 1)

printf "s1 from 1 up to %d is %s\n" (slen - 1) ss1
printf "ss1 is %d code points\n" (string-length ss1)

;; you can loop over the code points in strings
for c in (substring s1 0 3) {
  write c                            ; output a reader-friendly format
  (newline)
}

printf "the first l of %s is at index %d\n" s1 (string-index s1 #\l)
printf "the last l of %s is at index %d\n" s1 (string-rindex s1 #\l)
$ idio string-access
first code point is ħ (or ħ)
last code point is o
s1 from 1 up to 4 is ell
ss1 is 3 code points
#U+0127
#\e
#\l
the first l of ħello is at index 2
the last l of ħello is at index 3
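
Index lookups and substring combine naturally to cut a string in two. This sketch uses only the functions shown above and assumes, as the text says, that the end position of substring may be omitted; the names i, head and tail are illustrative.

```idio
s1 := "ħello"

;; index of the first l
i := string-index s1 #\l

;; the code points before index i (the end position is excluded)
head := substring s1 0 i

;; from index i to the end of the string; this relies on the end
;; position defaulting to the rest of the string
tail := substring s1 i

printf "head is %s and tail is %s\n" head tail
```

Because the end position is excluded, passing the same index as the end of one substring and the start of the next splits the string with no overlap and no gap.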

Split/Join/Trim

Idio defaults to shell-like behaviour for splitting strings in that multiple adjacent instances of a delimiter are consumed together giving the sense of splitting a line of text into fields or words. You can, of course, be more exacting than that.

string-parts.idio
;; two SPACEs at start and end and a TAB in the middle
s1 := "  hello\tworld  "

words := split-string s1 " \t"
printf "words are %s\n" words

;; fields is similar but uses IFS as the delimiter (which defaults to
;; the usual " \t\n") and returns an array with the first element
;; being the original string
printf "fields are %s\n" (fields s1)

printf "joined up words are %s\n" (join-string "-+-" words)

printf "right-stripped string is  '%s'\n" (strip-string s1 " ")
printf "double-stripped string is '%s'\n" (strip-string s1 " " 'both)
$ idio string-parts
words are ("hello" "world")
fields are #[ "  hello       world  " "hello" "world" ]
joined up words are hello-+-world
right-stripped string is  '  hello   world'
double-stripped string is 'hello     world'
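
The same calls work with any delimiter. The sketch below splits a colon-separated record, relying only on split-string and join-string as used above; the name entry and its contents are made up for illustration.

```idio
;; a passwd-style record with an empty field between the two :s
entry := "root:x:0:0::/root:/bin/sh"

;; adjacent delimiters are consumed together so, by the rule above,
;; no empty string should appear in the result
parts := split-string entry ":"
printf "parts are %s\n" parts

;; join-string takes the delimiter first and then the list
printf "rejoined: %s\n" (join-string ":" parts)
```

Note that, if the empty field is dropped as described, the rejoined string will not match the original record, which is exactly when more exacting splitting is called for.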

Interpolated Strings

Idio strings are not interpolated (a bit like the shell’s single-quoted strings) but we have frequent need for interpolated strings.

Here we can go one better and not just allow the expansion of a variable but the expansion of an expression.

Interpolated strings are written as #S{...${expr}...} where everything between the outermost matching { and } is scanned for instances of the interpolation sigil, $. The matching set of { and } following a sigil is read in and the expression therein is evaluated, with the result converted to a string (if required) and replacing the interpolated expression. The remainder of the string is then processed in the same way.

interpolated-string.idio
printf #S{Your SHELL is ${SHELL} (${string-length SHELL} code points)\n}
$ idio interpolated-string
Your SHELL is /bin/bash (9 code points)
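
Any expression can appear inside ${...}, not just a variable name or a simple function call. This is a minimal sketch, assuming an interpolated string can be bound to a name like any other string value; msg is an illustrative name.

```idio
s1 := "ħello"
slen := string-length s1

;; the substring call is evaluated and its result spliced into the text
msg := #S{${s1} without its ends is ${substring s1 1 (slen - 1)}}

printf "%s\n" msg
```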

Interpolated strings are used very heavily in code generation.

Last built at 2024-10-13T06:11:44Z+0000 from 77077af (dev) for Idio 0.3