Idio Form

I have a strong desire to make Idio be strict. I have raged over shell scripts that do not handle errors.

To be fair, it’s not the easiest thing to get to grips with as errors occur in the shell all the time. Even TAB-completion can generate errors which, if you had had errexit enabled, is a touch annoying.

Shell scripts which run without error handling are at chronic risk of blithely crashing on through despite earlier commands failing. What could possibly go wrong? This is made far worse when the screen is clogged with command output with errors and warnings lost in the noise.

Even something as simple as ls *.tar can be a life-saver (script-saver?) as if the expected files don’t exist then Pathname Expansion will have failed passing the literal text *.tar to ls which will, unsurprisingly, be unable to find such a file and fail.

There’s your canary in the coal mine, right there. ls not only voiced a complaint: ls: cannot access '*.tar': No such file or directory but also exited non-zero. You could have stopped the script right there, almost for free, and saved whatever was coming next from crashing and burning.

What’s not to like?

Strictly Scripting

I have an idea that all technically correct rules and behaviours should be enabled by default.

This then requires a clear and unambiguous statement in the script that the rule or behaviour is being disabled. Something like:

suppress exit-on-error

Of course, regular users wouldn’t want to go suppressing exit-on-error as it defeats the very thing I’ve been raging about.

However, there are several circumstances when we don’t want full-on exit-on-error behaviour. Let’s take a look at some Bash casework.

For these examples we’ll modify our prompt to include the Bash environment variable SHLVL and a small indent so we know where we are and you can check $- for current shell settings – albeit that it doesn’t report on set -o long-form options.

set -e

Exit on error, right? Except…

Subshells spawned to execute command substitutions inherit the value of the -e option from the parent shell. When not in posix mode, bash clears the -e option in such subshells.

bash(1)

Hmm, I don’t think I’ve ever run bash --posix (or typed set -o posix). We’re not feeling particularly beholden to POSIX’s views on shells so we can ignore that and note that Command Substitution (alone!) gets the special treatment:

[ 2]$ set -e
[ 2]$ echo $(echo hello ; false ; echo world;)
hello world
[ 2]$

whereas for Process Substitution:

[ 2]$ cat <(echo hello ; false ; echo world;)
hello
[ 2]$

but notice that whilst false failed (hopefully!) and the rest of the commands were not run the overall “asynchronous process” did not result in a failure being propagated to bash even though errexit is in effect in shell #2. You feel there’s a sort of exit-0-on-error mechanism being used in the Process Substitution part.

Compare that with a regular subshell (for which we use an extra bash to demonstrate):

[ 2]$ bash
[  3]$ set -e
[  3]$ (echo hello ; false ; echo world;)
hello
[1]$ echo $?
1
[1]$

Whoa! All the way back out to shell #1 which reports that shell #2 exited with 1.

Looking more closely, shell #3 has errexit set and we ran a subshell which inherits that value. The subshell sees false fail and so exits non-zero itself. Shell #3 sees the subshell fail and because it has errexit set will fail itself.

Of course, we forgot that shell #2 also had errexit set, because we were testing Command and Process Substitution, and so it exited non-zero because shell #3 exited non-zero.

Finally, shell #1 brought a halt to our toppling dominoes because no-one enables errexit in an interactive shell because this is what happens and no-one wants to log in again.

Shell scripts are a different matter and we do want them to fail.

One interesting thing to note from both the Process Substitution and subshell examples is that we do get hello\n from them before they exit. In other words we haven’t had any weird stdio-style buffering.

The details are in, for Linux, pipe(7) where we see that whilst a pipe (or FIFO) can buffer something like 65,536 bytes (technically, 16 pages) the individual writes to the pipe will be atomic if they are PIPE_BUF bytes or less (for example, 4096 on Linux but 512 on POSIX).

pipefail

I don’t normally run with pipefail enabled. In the first instance because I didn’t know it was there and then because some reasonable tasks become burdensome.

For example, I quite often run some fiendishly complex filter piping the result into, say, head -1 as I only want the first line.

However, we are at risk of head stopping reading its input pipe before the (fiendishly complex filter on the) left hand side has finished writing. We are at risk of the SIGPIPE/EPIPE pincer movement.

As it turns out, the in-my-head example doesn’t work – and for reasons I don’t understand:

[ 2]$ man bash | head -1
BASH(1)                     General Commands Manual                    BASH(1)
[ 2]$ echo ${PIPESTATUS[*]}
0 0
[ 2]$

Hmm, no errors. The Bash man page is nearly 400k bytes so man should block writing to the pipe and head should exit. Dunno. Maybe man does something funny because it so often sees PAGER fail when the user stops reading the man page?

Moving swiftly on, let’s try something more back to basics:

[ 2]$ (echo hello; sleep 1; echo world;) | head -1
hello
[ 2]$ echo ${PIPESTATUS[*]}
141 0
[ 2]$

Aha! That’s better. The exit status, 141, is over 128 meaning the cause was a signal and 141 - 128 is 13, aka SIGPIPE.

Now with added pipefail:

[ 2]$ set -o pipefail
[ 2]$ (echo hello; sleep 1; echo world;) | head -1
hello
[ 2]$

Erm…. PIPESTATUS reports the SIGPIPE again, though.

Ah, wait a minute, the pipeline may well have returned a non-zero status because of the SIGPIPE but we haven’t told shell #2 to do anything about that. We need errexit as well as pipefail:

[ 2]$ set -e
[ 2]$ set -o pipefail
[ 2]$ (echo hello; sleep 1; echo world;) | head -1
hello
[1]$ echo $?
141
[1]$

Top! This time shell #1 reports that shell #2 passed on the SIGPIPE exit status.

Observations

I don’t like the Bash Command Substitution mechanism of un-setting errexit. That feels wrong in some fundamental way. If something went wrong, processing should stop.

That’s what happens with the Bash Process Substitution case with a putative exit-0-on-error mechanism. Something went wrong so we stopped.

That still doesn’t feel quite right. I don’t know that something went wrong. I am unaware that Process Substitution was unable to complete its processing. I think I ought to know.

Clearly, the nominal exit-on-error mechanism to too crude for these “asynchronous” processes. It feels like there should be another kind of “trap” that we can hook onto.

Maybe the default behaviour is to ignore asynchronous process failure. Maybe it should be to raise the standard exit-on-error (but at least offering the opportunity to capture it separately).

Here’s a (possibly) more illuminating example. Not all systems have GNU tar with its z option so I’d habitually separate the processing myself:

$ gzcat file.tar.gz | tar xf -

In that example, if gzcat fails we have some options to handle it. However, a little reworking of the above for Process Substitution:

$ tar xf <(gzcat file.tar.gz)

and now, no error handling at all. Suddenly life is a little harder.

Last built at 2024-10-17T06:11:12Z+0000 from 463152b (dev)