.. include:: ../global.rst

.. _`POSIX regex`:

POSIX regex
^^^^^^^^^^^

:lname:`Idio` uses the POSIX :manpage:`regex(7)` regular expression
primitives :ref:`regcomp <regcomp>` and :ref:`regexec <regexec>`.

These are combined in the function :ref:`regex-matches <regex-matches>`.

The template :ref:`regex-case <regex-case>` is better suited to loops,
as it caches the compiled regular expressions of literal strings.  It
works like a simplified :ref:`cond ` except the clause "conditions" are
regular expressions to be matched.

``regex-case`` then supplies the consequent block with the result of
the call to ``regexec`` as the variable :var:`r`.  As such :var:`r.0`
is the whole of the matched string, :var:`r.1` is the first matched
sub-expression, :var:`r.2` the second matched sub-expression, and so
on.

Similarly, :ref:`pattern-case <pattern-case>` provides something like
the shell's *Pattern Matching* where ``*`` and ``?`` are really ``.*``
and ``.`` respectively.  In particular, see
:ref:`regex-pattern-string <regex-pattern-string>` for how the string
is processed.

.. _`regcomp`:

.. idio:function:: regcomp rx [flags]

   Compile the regular expression in `rx` (see POSIX
   :manpage:`regex(3)`) for subsequent use in
   :ref:`regexec <regexec>`.

   The `flags` are:

   * ``REG_EXTENDED``
   * ``REG_ICASE``
   * ``REG_NOSUB`` (ignored)
   * ``REG_NEWLINE``

   This code defaults to ``REG_EXTENDED`` so there is an extra
   ``REG_BASIC`` flag to disable ``REG_EXTENDED``.

   :param rx: regular expression
   :type rx: string
   :param flags: regcomp flags
   :type flags: list of symbols
   :return: compiled :manpage:`regex(3)`
   :rtype: C/pointer

.. _`regexec`:

.. idio:function:: regexec rx str [flags]

   Match the regular expression in `rx` against the string `str` (see
   POSIX :manpage:`regex(3)`), where `rx` was compiled using
   :ref:`regcomp <regcomp>`.

   The `flags` are:

   * ``REG_NOTBOL``
   * ``REG_NOTEOL``
   * ``REG_STARTEND`` (if supported, see below)
   * ``REG_VERBOSE`` return verbose results

   On a successful match an array of the subexpressions in `rx` is
   returned with the first (zero-th) being the entire matched string.
   If a subexpression in `rx` matched, the corresponding array element
   will be the matched string.  If a subexpression in `rx` did not
   match, the corresponding array element will be ``#f``.

   :param rx: compiled regular expression
   :type rx: C/pointer
   :param str: string to match against
   :type str: string
   :param flags: regexec flags
   :type flags: list of symbols
   :return: see below
   :rtype: array or ``#f``

   By default `regexec` returns an array of matching subexpressions or
   ``#f`` for no match.

   If ``REG_VERBOSE`` is passed in flags then each element of the array
   is a list of the matched sub-expression, its starting offset and
   its ending offset plus one (suitable for :ref:`substring `).

   ``REG_STARTEND`` (if supported) is a valid :lname:`C` flag and is
   accepted here but is ignored as there is no means to pre-supply
   ``pmatch[0]`` (see :manpage:`regexec(3)`).

.. _`regex-matches`:

.. idio:function:: regex-matches rx str

   Does `rx` match `str`?

   :param rx: regular expression
   :type rx: string
   :param str: string to match against
   :type str: string
   :return: see :ref:`regexec <regexec>`

.. _`regex-case`:

.. idio:template:: regex-case e [clauses]

   ``regex-case`` works like a simplified :ref:`cond ` where `e` is
   the string to be matched against and the "conditions" in each
   clause are the regular expressions to test with.

   :param e: the string to be matched against
   :type e: string
   :param clauses: clauses like :samp:`("{regex}" {expr})`
   :return: whatever any matched clause's consequent expression returns.

   `e` will be evaluated and should return a string.

   If the regular expression matches then the consequent expression is
   treated like an implicit ``=>`` clause where the supplied parameter
   is :var:`r`.  Thus :var:`r.0` represents the whole of the matched
   string, :var:`r.1` the first matched sub-expression, :var:`r.2` the
   second matched sub-expression, and so on.

   :Example:

   Suppose we want to match a common :samp:`{var}={value}` assignment:

   .. code-block:: idio

      (regex-case (read-line)
        ("^([[:alpha:]][[:alnum:]_]*)=(.*)" {
          ; r.1 is the variable name, r.2 is the value
          printf "%s is '%s'\n" r.1 r.2
        }))

   .. note::

      ``regex-case`` stashes the compiled regular expression for
      literal strings in a global table.  This means that in loops the
      regular expression doesn't need to be recompiled.

      It also means the compiled regular expressions are not reaped
      until :lname:`Idio` exits.

   .. seealso:: :ref:`pattern-case <pattern-case>`

.. _`regex-exact-string`:

.. idio:function:: regex-exact-string str

   Return a :manpage:`regcomp(3)`-safe version of `str`.

   :param str: string to make safe
   :type str: string
   :return: regcomp-safe string
   :rtype: string

   In particular, code points in the set ``$^.[()|*+?{`` (see
   :manpage:`regex(7)`) are escaped.

.. _`regex-pattern-string`:

.. idio:function:: regex-pattern-string str

   Return a `Pattern Matching` version of `str`.

   :param str: string to convert
   :type str: string
   :return: pattern-like string
   :rtype: string

   In particular:

   * ``*`` is replaced with ``.*``

   * ``?`` is replaced with ``.``

   * (simple) `bracket expressions` are allowed with optional ``*``,
     ``+`` or ``?`` qualifiers

     A simple bracket expression is one with no collating elements
     (e.g. ``[:alpha:]``) or at most one collating element so long as
     it is the last element of the bracket expression list.

   * ``.^$|+`` are (otherwise) escaped and become literals

   * ``{`` is escaped and is a literal, therefore `bounds` (``{n,m}``)
     are not allowed

   * ``()`` are escaped and are literals, therefore sub-expressions are
     not allowed

.. _`pattern-case`:

.. idio:template:: pattern-case e [clauses]

   ``pattern-case`` works like a simplified :ref:`cond ` where `e` is
   the string to be matched against and the "conditions" in each
   clause are the *pattern matches* to test with.

   :param e: the string to be matched against
   :type e: string
   :param clauses: clauses like :samp:`("{pattern-match}" {expr})`
   :return: whatever any matched clause's consequent expression returns.

   `e` will be evaluated and should return a string.

   Here, *pattern matches* have
   :ref:`regex-pattern-string <regex-pattern-string>` applied, are
   anchored to the entire string and the code continues like
   :ref:`regex-case <regex-case>`.

   The string manipulation is like:

      :samp:`sprintf "^%s$" (regex-pattern-string {pattern-match})`

   If the pattern matches then the consequent expression is treated
   like an implicit ``=>`` clause where the supplied parameter is
   :var:`r`.  Thus :var:`r.0` represents the whole of the matched
   string, :var:`r.1` the first matched sub-expression, :var:`r.2` the
   second matched sub-expression, and so on.

   :Example:

   Suppose we want an unreliable method to determine if this is a
   BSD-style operating system:

   .. code-block:: idio

      (pattern-case (collect-output uname -s)
        ("*BSD" {
          printf "%s is a BSD\n" r.0
        }))

   .. note::

      ``pattern-case`` stashes the compiled regular expression for
      literal strings in a global table.  This means that in loops the
      regular expression doesn't need to be recompiled.

      It also means the compiled regular expressions are not reaped
      until :lname:`Idio` exits.

   .. seealso:: :ref:`regex-case <regex-case>`

.. include:: ../commit.rst