Process Substitution¶
Unlike the similarly named Command Substitution, where we substitute the collected output of a command, with Process Substitution we substitute a filename for another command to use, where the filename is really a pipe to a process.
This is a rather subtle distinction but the obvious use case is where an external command would be expecting a filename and not the contents of a file:
$ cat file
This is a file
If you were to give cat the contents of file then it won’t be very happy:
$ cat $(cat file)
cat: This: No such file or directory
cat: is: No such file or directory
cat: a: No such file or directory
This is a file
Note
Note that the file, file, contains its own name in its contents, hence the last line.
which leaves us with the slightly tautological:
$ cat <(cat file)
This is a file
where the “outer” cat <(...) is, in practice, something like cat /dev/fd/63 and /dev/fd/63 is attached to the output of the (asynchronous) command cat file.
The, rather more understandable, canonical example is:
$ diff <(sort fileA) <(sort fileB)
which is a very powerful expression. Is it a form of process meta-programming? Maybe.
External commands are not the only use; I frequently use Process Substitution in loops:
while read line ; do
...
done < <(cat file)
which has the ever so subtle advantage of keeping the while
statement in the current shell so that it can modify any local
variables in the loop. Compare with:
cat file | while read line ; do
...
done
where the while is in a subshell (because of the |).
*
So there are a few things to do:
we have to recognise the command we are going to run for Process Substitution
we need to run it
we need to marry this up with a (possibly named) pipe
(Named) Pipes¶
You would think this would be the easy bit but oh, no!
In the first instance, it would be nice to use the /dev/fd/N
format as it clearly represents what it is and has the nice advantage
that the operating system maintains it and we don’t have to run around
creating and removing files in the filesystem.
Clearly, there will be some operating system process-indirection
involved as we can’t be having people poking each others’
/dev/fd/0
file descriptors.
In essence, if we create a pipe(2) for inter-process
communication then each of the pipe’s file descriptors will have a
/dev/fd/N
entry associated with it.
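A minimal C sketch of that idea, on a /dev/fd-capable system:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    if (pipe(fds) == -1) {
        perror("pipe");
        return 1;
    }
    /* each end of the pipe now has a /dev/fd/N entry which we could
       hand to another command as a "filename" */
    printf("read end:  /dev/fd/%d\n", fds[0]);
    printf("write end: /dev/fd/%d\n", fds[1]);
    return 0;
}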
For those systems that don’t support /dev/fd we’ll have to create a true named pipe, a FIFO, see mkfifo(2), and use its name instead.
Note
The nomenclature isn’t always consistent but let’s try to use a FIFO to mean the filesystem-oriented pipe. Once opened, the entry in the filesystem has no bearing on the functionality of the pipe.
We’ll use named pipe to indicate we intend to use a pathname to coordinate access which may or may not be an actual FIFO.
Of course, the reason that’s important is that not all operating systems play the same game. Bash lets us easily take a look at its thoughts:
[Linux ]$ ls -l <(echo "hello")
lr-x------. 1 idf idf 64 Jun 11 12:14 /dev/fd/63 -> 'pipe:[1781424]'
[SunOS ]$ ls -l <(echo "hello")
crw-rw-rw- 1 root root 578, 63 Jun 11 12:15 /dev/fd/63
[Mac OS X]$ ls -l <(echo "hello")
prw-rw---- 0 idf staff 0 11 Jun 12:16 /dev/fd/63
[FreeBSD ]$ ls -l <(echo "hello")
prw------- 1 idf wheel 0 Jun 11 12:17 /tmp//sh-np.OrAlUF
Hmm. There’s a few things to digest there.
There’s a little bit of bootstrap artifice going on here as Bash’s test for some functionality it is about to compile requires a shell that is able to implement the test.
First up, FreeBSD doesn’t use the /dev/fd form. Well, careful, it does for file descriptors 0, 1 and 2 but not for anything else. Bash figures this out with a judicious exec test -r /dev/fd/3 3</dev/null in aclocal.m4. On the plus side, it is a pipe.
Of the operating systems that do support /dev/fd
you’ll notice
they are all using file descriptor 63. That’s because Bash
chooses to:
Move the parent end of the pipe to some high file descriptor, to avoid clashes with FDs used by the script.
— process_substitute() in subst.c
I guess that defers the problem of users having to guess which file descriptors are free when they want to stash a file descriptor, as in exec >&3, and generally just plumping for a low numbered one, as in 3. Too bad if that was important.
SunOS is a bit more interesting, /dev/fd/63
is a character
special device (and owned by root). In fact, many (all possible?)
/dev/fd/N
exist. I guess you’re meant to know your own file
descriptors (which doesn’t seem that unreasonable).
Mac OS X is possibly the most honest in that /dev/fd/63
is a
(named) pipe.
On Linux, /dev/fd/63 is a symlink to what we presume must be a pipe. ls -lL confirms that.
Running¶
Running the command, married up to a (named) pipe, should have the heavy work done already. We have pipe-into and pipe-from mechanisms from the work on the pipeline meta-commands so, presumably, we can extend those with named-pipe-into and named-pipe-from variants which will return a pathname rather than a pipe handle.
In terms of how and when we do all this it ought to come out in the wash as:
ls -l <{ echo "hello" }
is just a regular function call where we evaluate each argument in turn (with ls and -l just being symbols) and <{ echo "hello" } is another argument. So long as the evaluation of <{ ... } results in a pathname to a named pipe (/dev/fd or otherwise) then we’re good to go.
But not so good to stop. Let’s have a think.
The trickier bit is arranging the longevity of the pipe’s name. Well, the name should be easy enough but I mean, for example, in the case of having established a pipe-from-style sub-command then we would normally use the read end of the pipe.
Except we need to be careful. Linux says:
If all file descriptors referring to the write end of a pipe have been closed, then an attempt to read(2) from the pipe will see end-of-file (read(2) will return 0).
—pipe(7)
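We can see that rule with a minimal C sketch: data buffered in the pipe survives the writer closing its end, after which read(2) returns 0:

#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    char buf[32];
    pipe(fds);
    write(fds[1], "hello\n", 6);
    close(fds[1]);                     /* the last write end is closed */
    ssize_t n = read(fds[0], buf, sizeof(buf));
    printf("read %zd bytes\n", n);     /* 6: the buffered data */
    n = read(fds[0], buf, sizeof(buf));
    printf("read %zd bytes\n", n);     /* 0: end-of-file */
    return 0;
}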
The problem is that if we close the write end of a pipe-from
pipeline in the parent then Linux is at liberty to close the pipe
completely, under our feet, as soon as the asynchronous process
associated with the pipe is done. Which is quite possibly going to be
before we get around to calling read-line
on the pipe handle we
got back from launching the thing in the first place.
However, with a named-pipe-from
we intend that another
sub-program utilise this name (although, there’s no reason why we
shouldn’t use it – that depends on what the code says – albeit there
is a “complication” which we discuss below).
The process attached to the named pipe, the echo "hello" part, is only going to hang around whilst someone has the read end of the pipe open. If no-one is holding the read end open then the writer, echo "hello", will get hit by a SIGPIPE/EPIPE pincer movement. (OK, not in the case of something like <{ sleep 1 } as it doesn’t write anything but you get my drift.)
Note
Technically, pipe(7) on Linux says:
If all file descriptors referring to the read end of a pipe have been closed, then a write(2) will cause a SIGPIPE signal to be generated for the calling process. If the calling process is ignoring this signal, then write(2) fails with the error EPIPE.
Named pipes are no different in this regard. That means there’s a sort of gap between us having:
created the named pipe and the “writer” (echo "hello")
created the sub-process “reader” (assuming the user of the named pipe is an external command)
and then the sub-process reader actually opening the named pipe
We have to hold the pipe open during this time to avoid the SIGPIPE/EPIPE combo.
Bash doesn’t hold the pipe open beyond that encapsulating expression:
$ ( fn=<(echo "hello"); cat $fn )
cat: /dev/fd/63: No such file or directory
Ideally, we could pick up on some inter-expression gap but our evaluation model of cascading function calls (aided and abetted by templates) means we don’t really have a handle on individual expressions.
Ostensibly, our only safe point to close up is when the sub-process, technically, the job, associated with the Process Substitution completes. Whether it succeeds or fails, it is complete and therefore “done” with the pipe.
This works even if the reader of the named pipe is just more Idio code. Consider:
fh := open-input-file <{ printf "hello world\n" }
Here, the printf ...
part will be in a sub-process writing to the
named pipe but the read end of the named pipe is us, not a separate
program like ls.
We can close our read end of the named pipe when the sub-process completes.
Hence, lib/job-control.idio
maintains a table of
%process-substitution-job
entries, indexed by Job Control job.
Each entry is a simple structure of fd path dir where /dev/fd-capable systems set fd and the other two to #f.
Systems using named pipes will set fd to #f and path to the named pipe, with dir the (unique) directory the named pipe is in.
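Rendered as a C struct just to fix ideas (the real thing is an Idio structure in lib/job-control.idio and these field types are a guess):

struct ps_job_entry {
    int   fd;      /* the parent's pipe fd on /dev/fd systems, else unset (#f) */
    char *path;    /* the named pipe's pathname on FIFO systems, else unset */
    char *dir;     /* the (unique) directory holding the named pipe, else unset */
};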
Note
Regarding both path
and dir
for the named pipe, recent
linkers have begun warning about the use of mktemp(3)
which Bash uses through sh_mktmpname()
to create a
temporary file name which it uses as a named pipe:
warning: the use of `mktemp' is dangerous, better use `mkstemp' or `mkdtemp'
The suggested solution is to use mkdtemp(3) to create a unique directory then create the named pipe in there. That’s fine but we now have to remove two things when we’re done, the named pipe and the temporary directory.
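A minimal C sketch of that suggestion, with the idio-np prefix from below and an illustrative np filename inside the directory:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void) {
    char dir[] = "/tmp/idio-np-XXXXXX";
    if (mkdtemp(dir) == NULL) {
        perror("mkdtemp");
        return 1;
    }
    char path[sizeof(dir) + 8];
    snprintf(path, sizeof(path), "%s/np", dir);
    if (mkfifo(path, 0600) == -1) {
        perror("mkfifo");
        return 1;
    }
    printf("named pipe: %s\n", path);
    /* ... hand path to the asynchronous command ... */
    unlink(path);    /* the two things to remove when we're done */
    rmdir(dir);
    return 0;
}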
Back in our SIGCHLD handler, if we see that a job has completed then we can check the table of %process-substitution-jobs and either close(2) or unlink(2) and rmdir(2) the appropriate parts.
During final shutdown we can also do any otherwise neglected tidying up for named pipes. We don’t care so much for the file descriptors as they’ll get closed during shutdown anyway.
Bash manages this with its fifo_list
array in
subst.c
which it periodically trims.
Warning
There’s always a possibility that we won’t have seen the child process complete even during shutdown in which case any outstanding named pipes will remain in the filesystem.
You can contrive this with Bash on FreeBSD (which uses named pipes) with something like:
$ bash -c "{ sleep 2 ; ls <(echo hello); } &"
$ /tmp//sh-np.TixduF
(Technically, you will have gotten your prompt back before ls runs, hence the prompt in $ /tmp//sh-np.TixduF.)
Here we are backgrounding the Group Command meaning that
bash, as in bash -c
, will exit leaving the
backgrounded process, another bash, running.
Notice, given that the named pipe remains, that the sub-shell bash is not the one tracking fifo_list so it isn’t removed – or there is an exit_shell() race condition!
Look out for directories called sh-np.xxxxxx
from
Bash or idio-np-xxxxxx
from Idio in
TMPDIR
or /tmp
or wherever.
Complications¶
There is a complication with the named-pipe-* variants. Process Substitution is meant to be for external commands to use, diff, cat etc., but there’s nothing to stop us getting the pathname to the named pipe back.
/dev/fd¶
However, for /dev/fd
enabled systems, that is, ones where
we’ve used a regular pipe(2) under the hood, this pathname,
/dev/fd/n
, masquerades as a true pathname but is really a
contrivance. It is not a file to be opened anew, it is already
open.
Opening /dev/fd/n does not even result in file descriptor n. In fact, opening /dev/fd/n is defined as being indistinguishable from calling dup(n) or fcntl(n, F_DUPFD). Opening /dev/fd/n is going to get you another file descriptor, m, and now you’ve got the file open twice.
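A minimal C demonstration on a /dev/fd-capable system:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    char path[32];
    pipe(fds);
    snprintf(path, sizeof(path), "/dev/fd/%d", fds[0]);
    int m = open(path, O_RDONLY);    /* behaves like dup(fds[0]) */
    printf("n = %d, m = %d\n", fds[0], m);
    /* two file descriptors, one pipe: the file is open twice */
    return 0;
}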
External Commands¶
Let’s consider an example using our trusty auto-exit test
command to write a named-pipe-into
test:
auto-exit -o 3 -O (named-pipe-into {
  sed -e "s/^/Boo /"
})
from which we might get something like:
Boo auto-exit: 1054079: wrote line 1 to /dev/fd/8
Boo auto-exit: 1054079: wrote line 2 to /dev/fd/8
Boo auto-exit: 1054079: wrote line 3 to /dev/fd/8
We’re about to have four processes and several copies of file descriptors so let’s see what happens.
We are PID1 and you suspect that, at some point, we are going to fork and exec auto-exit. Not before we’ve processed all the arguments. After all, it’s just a complicated version of cmd args.
The interesting argument is (named-pipe-into ...), of course. For this we will call fork-command in lib/job-control.idio which will do some prep work, fork and let the sub-Idio figure out how to run .... The sub-Idio is PID2.
Before we launch it, though, the prep work is going to include a call, by us, PID1, to libc/proc-subst-named-pipe, in src/libc-wrap.c, which, as we’re not on FreeBSD, is going to return the embellished output from pipe(2). Let’s say we get back the two pipe file descriptors, pr and pw.
The embellished form is a list of those two and their names in the file system, /dev/fd/pr and /dev/fd/pw.
The important thing to remember here is that we, PID1, have both of those file descriptors open. We then fork the sub-Idio as PID2 and then it too has both of those file descriptors open.
Part of the prep work for named-pipe-into is to indicate that the future stdin of the exec’d sub-process, PID2, is going to be pr. The action of prepping IO is to dup2(2) the target file descriptor to 0 and close the target. The net effect of that is that we still hold a file descriptor open to the read end of the pipe that we knew as pr but it is now file descriptor 0 and file descriptor pr has been closed.
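In C, that prep step amounts to the familiar idiom, sketched here:

#include <unistd.h>

/* make the read end of the pipe be stdin then lose the original
   descriptor (assuming pr is not already 0) */
void prep_stdin(int pr) {
    dup2(pr, STDIN_FILENO);    /* fd 0 now refers to the read end */
    close(pr);                 /* the descriptor we knew as pr is closed */
}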
In the case of named-pipe-into, the sub-Idio, PID2, is looking to read from the pipe so it can close the write end, pw, and, as we’ve not done anything else, the sub-Idio will have inherited PID1’s stdout.
The sub-Idio can now determine what to do with ... which, in this case, is a simple block with a single sed command in it. Of course it could be a series of commands doing whatever but here it is a sed which has the conceptual advantage of reading its stdin until it sees EOF.
As we know with pipes, it will get an EOF when all of the write ends are closed. At the moment, the sub-Idio, PID2, has closed its copy of the write end but the original Idio, PID1 still has it open.
Let’s assume the sub-Idio makes some progress and manages to fork and exec sed, now PID3, which inherits file descriptors from PID2 which include 0, which is the read end of the pipe, and 1, the same stdout as everyone else.
So far, then, we have sed, PID3, blocked reading from the
read end of the pipe, the sub-Idio, PID2, waiting for
sed to finish so it can carry on with whatever is left of
...
and the original Idio, PID1.
Whilst all that was going on, PID1 still has both pr and pw open. With its named-pipe-into hat on, the parent Idio doesn’t need the read end of the pipe open, so it can close pr, leaving it with just the write end of the pipe, pw, in its hands.
The parent Idio’s task here was to evaluate the argument (named-pipe-into ...) for which the return value is the pathname representing the opened pipe that someone can write into. Here, then, it is the fourth element of the list returned from libc/proc-subst-named-pipe, /dev/fd/pw.
*
Let’s take a slight pivot at this point. Having evaluated the
argument (named-pipe-into ...)
, PID1 has created an asynchronous
command (the combination of the sub-Idio and its sub-process
running sed) which it can associate with the returned named
pipe (either the file descriptor pw
or the pathname of a
true named pipe).
*
In terms of evaluating our original line we have:
| auto-exit | a symbol. |
| -o | a symbol. |
| 3 | a fixnum. |
| -O | a symbol. |
| /dev/fd/pw | a pathname |
Great! Let’s fork and exec!
We fork and exec auto-exit as PID4 which inherits PID1’s file descriptors which include:
0 – stdin which we’re not going to use
1 – stdout which we’re not going to use because of the -O FILE argument
pw – the write end of the pipe
So, here in auto-exit, PID4, we have pw open but we’re going to ignore it. Instead, using -O FILE, we will run exec >FILE meaning our stdout is going to /dev/fd/pw, the same place as pw, the write end of the pipe.
*
Hey, the write end of the pipe is open twice, what gives?
Well, to some degree, we don’t care. Indeed, that’s the nature of the
beast in passing a /dev/fd/n
form. The file (device?)
/dev/fd/n
only exists if file descriptor n
is open
and there is no way of opening /dev/fd/n
without duplicating
the file descriptor.
In one sense we are reliant on the fact that we are running an external command which will do its thing and then exit, implicitly closing all file descriptors. It feels a bit inelegant but, uh, that’s the way it goes. We can sort of see the same from Bash with something like:
$ ls -l /dev/fd/ <(sleep 1)
lr-x------. 1 ... /dev/fd/63 -> 'pipe:[6122009]'
/dev/fd/:
total 0
lrwx------. 1 ... 0 -> /dev/pts/1
lrwx------. 1 ... 1 -> /dev/pts/1
lrwx------. 1 ... 2 -> /dev/pts/1
lr-x------. 1 ... 3 -> /proc/1072573/fd
lr-x------. 1 ... 63 -> 'pipe:[6122009]'
Here, ls has re-ordered its arguments to show
/dev/fd/63
first which is a symlink to pipe #6122009.
ls is also showing its own /dev/fd
listing which
includes:
/dev/fd/3, the open directory for /dev/fd (magically mapped to /proc/PID-of-ls/fd) as ls loops calling getdents(2)
/dev/fd/63, the Process Substitution argument which is that same symlink to pipe #6122009.
Now this example isn’t quite right because ls hasn’t opened the argument /dev/fd/63 it was given (from the Process Substitution argument <(sleep 1)) but it demonstrates that file descriptor 63 is open whilst ls is running and that if ls opens its argument /dev/fd/63 then it’ll be using file descriptor 4, say, as well as file descriptor 63.
*
In the meanwhile, auto-exit will now write three lines to its stdout which, being the write end of the pipe seamlessly appear on the stdin of sed and we get our nice output.
And then it all hangs.
The problem is that, although auto-exit wrote three lines and quit, thus closing both its stdout and the inherited pw, the write end of the pipe was inherited from PID1 which still has it open. D’oh!
We need to have PID1 close pw
but it can’t do that until
after it has forked auto-exit as otherwise we’ll have
closed pw
and the pathname we’re passing to
auto-exit is invalid.
The twist, here, is that pw is associated with the argument (named-pipe-into ...) and not the (external) command auto-exit so it’s not even the case that we can close pw when auto-exit completes as we don’t hold that relationship. pw is associated with the asynchronous command of the sub-Idio and sed, remember, and sed is blocked reading from the read end of the pipe, meaning that it hasn’t exited and so we never get the indication that we are free to close pw through the %process-substitution-job mechanism.
Classic deadlock.
Could we create such a relationship between auto-exit and pw? Maybe, but the cascading nature of evaluation means that in practice we’d be throwing a nominal “close this, will ya?” out into the aether in the hope that it is caught and managed.
There is another way, though, which is a bit hacky so you never heard me say this, right? We could flag the /dev/fd/n pathname we create as special – the sort of special that users can’t abuse. If PID1 were to walk along those arguments and identify any such special pathnames it could take the liberty of closing the associated file descriptor after having forked the (external) command.
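A minimal C sketch of that dance, with illustrative names:

#include <unistd.h>

/* fork the external command then drop the parent's copy of the file
   descriptor behind the special /dev/fd/n pathname we passed along */
pid_t launch_with_special(char *argv[], int special_fd) {
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);    /* the child inherits special_fd, as intended */
        _exit(127);
    }
    if (pid > 0) {
        close(special_fd);        /* the parent no longer holds pw open */
    }
    return pid;
}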
It feels slightly awful but it is quite practical. So, uh, let’s move on.
Idio Commands¶
Hopefully, we can all now see the problem with Process Substitution and Idio commands. If it takes a bit of magic hackery to be inserted between fork and exec of an external command to ensure that PID1 closes any special pathname argument file descriptors then who is going to do it for regular Idio commands?
Ans: no one.
fh := open-output-file (named-pipe-into {
  sed -e "s/^/Boo /"
})
We now have both fh open into the asynchronous command and the write end of the pipe, pw, open in PID1. No-one knows to close pw and sed will not see EOF. *shakes fist*
To paper over this “contrived pathname and actually open file
descriptor” mess we can make open-output-file
act a bit like
open-output-file-from-fd
and have it figure out the file
descriptor from the supplied pathname if it is a special.
fh will now use the file descriptor of the special pathname, pw, and when the user closes fh they will have closed the last reference to the write end of the pipe and sed will get its EOF.
That, he says, hesitantly, seems to work… for output pipes. At least those that read their input until EOF.
For input pipes we have a slightly different problem. What we’re asking the asynchronous command to do might not take very long:
fn := named-pipe-from {
  printf "hello\n"
}
fh := open-input-file fn
There is a race condition here between the asynchronous command being launched and running to completion before we even get round to opening the file handle, let alone reading from it or closing it. The open could fail even though it’s the statement after the creation of the asynchronous command. The point being that we don’t have control over the operating system’s scheduling so who knows what might happen when a second process is in play.
Indeed, we can contrive a delayed-open command which will “sleep 1” before invoking the open. Here, we have no chance: the (named) pipe’s asynchronous command will (almost!) certainly have been and gone before we call open which will fail with ENOENT.
In general, though, the act of opening the file handle may result in open or fcntl system calls failing with EBADF or, when we try to read or close the file handle later, we can get the same EBADF. The pipe between us and the asynchronous command has been closed because the writer, the asynchronous command, has quit. Any action we take on the read end of the pipe will get EBADF.
Oh dear.
I’m not sure there is a sensible fix for this. If you associate an asynchronous command with an input pipe and then you delay opening, reading or closing the pipe then the asynchronous command could have completed and you’re going to get EBADF errors.
Hiccup¶
There is a minor “dotting the i’s and crossing the t’s” problem in that the default open modes for open-input-file and open-output-file are re and we respectively but the underlying pipe(2) does not have the O_CLOEXEC flag set.
The obvious action is to disable that default “e” flag (and set the
handle’s type to be a pipe). However, had the user called
open-file
, say, with an explicit mode including “e” then we will
have upset their expectations.
So, update the code with a flag as to whether the user supplied the mode or not and remove the CLOEXEC component if they didn’t.
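In C, clearing the close-on-exec flag is a two-step fcntl(2), something like:

#include <fcntl.h>

/* clear FD_CLOEXEC on the pipe's fd when the user didn't ask for "e" */
int clear_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC);
}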
FIFOs¶
That tricksome hackery involved with /dev/fd/n
contrived
pathnames doesn’t mean we can remove our %process-substitution-job
mechanism, though. If we created a FIFO for FreeBSD then both the
FIFO and its parent directory still need to be removed when we get
notification that the associated asynchronous command has completed.
That said, we can leave the “close fd” parts in situ just in case.
We’ll do some mitigation. Originally I thought to use
suppress-errors!
but that applies a host of template and trap code
when in practice we know that most of the time the file descriptor
will have been closed. So I added a libc/close-if-open
which
reduces it down to a couple of system calls.
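Something like this C sketch, though the real libc/close-if-open may differ:

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* close fd only if it is still open: a couple of system calls, no traps */
int close_if_open(int fd) {
    if (fcntl(fd, F_GETFD) == -1 && errno == EBADF) {
        return 0;    /* already closed: nothing to do */
    }
    return close(fd);
}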
However, and somewhat more importantly, FIFOs introduce their own features as FIFOs are not regular files and have some peculiar behaviour.
You can experiment with this on FreeBSD although its version of truss(1) doesn’t tell you what the currently blocked system call is – you know, the one you’re interested in – so you need to read between the lines or force the use of FIFOs on Linux where strace will tell you which system call it is blocked in.
$ mkfifo une-pipe
$ cat une-pipe
Uh, nothing.
It’s not even blocked in a read, it is blocked in open(2) as:
When opening a FIFO with O_RDONLY or O_WRONLY set:
An open() for reading only will block the calling thread until a thread opens the file for writing. An open() for writing only will block the calling thread until a thread opens the file for reading.
—The Open Group: open()
Hmm. Let’s move on and into another window:
$ echo hello > une-pipe
and our cat echoes “hello” to the terminal… and exits. Wait, what?
Ah, yes. The act of echo closing the FIFO has generated an EOF which is enough for cat to quit.
Well, that’s not quite true. The actual behaviour is now more like pipe(2) pipes in that once one other thread has opened the FIFO then the FIFO remains open until all threads with the FIFO open close it. So, in other words, you can run
$ (echo hello; sleep 10; echo world) > une-pipe
in one window, holding the FIFO open for a little over ten seconds and in (yet) another window run:
$ echo breaking > une-pipe
and you’ll just get “breaking” somewhere in between the “hello” and “world”. “breaking” does not generate the EOF; it is the number of threads holding the FIFO open for writing reducing down to zero that generates the EOF.
Subtle!
This may seem a bit academic but as we’re handling the FIFO we need to be all over this. If the asynchronous command associated with a FIFO fails, we need to have an appropriate card up our sleeve.
For example, if, in the preparation for our echo command we were to crash and burn:
fn := named-pipe-from {
  libc/exit 9
  echo "hello"
}
fh := open-output-file fn
Then the chances are that we will block trying to open the FIFO – both times. Eh? We, the parent, clearly try to open the FIFO but the asynchronous command, or rather the sub-Idio launched to run the asynchronous command, tries to open the FIFO during the prep stage as it tries to make the FIFO its stdin.
In principle, then, both processes, us, the parent, and the sub-Idio, will synchronise and will both be trying to open the FIFO at the same time and then the operating system will allow us both to proceed.
The asynchronous command immediately exits with 9 and any subsequent activity by the parent Idio, notably, read(2), will get an EOF indication.
There’s another, slightly more subtle aspect that affects our simple tests. Suppose our test is:
fn := named-pipe-from {
  cat "/dev/fd/0" > testfile
}
fh := open-output-file fn
puts "hello\n" fh
close-handle fh
In the parent, we will coordinate our open of the FIFO with the asynchronous command opening the FIFO for its stdin and we’ll write “hello\n” and be done.
The asynchronous command, however, is likely to be a little slower off the mark as it does its prep, figures out the redirection and then launches cat. That, in itself, isn’t the problem. The problem is that cat sees /dev/fd/0 as “just another file” and will open it.
If the parent has already written “hello\n” to the FIFO and closed the handle then there’s no-one with the FIFO open for writing and read(2) will block.
Bah!
We can start a fix with non-blocking I/O but there are knock-on effects.
If we tag the pathname we’re returning as a FIFO (much like we tagged
pipes for /dev/fd
systems) then we can know to use O_NONBLOCK.
But, careful, though! Only the parent wants O_NONBLOCK. If the
asynchronous command side has O_NONBLOCK set then, with our race
condition hats on, it will get EOF the moment it tries to read,
potentially long before we get round to opening our end of the pipe.
This forces proc-subst-named-pipe to diverge into proc-subst-named-pipe-into and proc-subst-named-pipe-from forms so that the inner code can know whether to tag the read/write pathnames with the magic O_NONBLOCK flag.
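For the parent’s read end that means something like this C sketch:

#include <fcntl.h>

/* the parent's (reading) end: open(2) returns at once even though
   no writer has opened the FIFO yet */
int open_fifo_reader(const char *path) {
    return open(path, O_RDONLY | O_NONBLOCK);
}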
With O_NONBLOCK, the open will return immediately. However, it is our future read(2) that we need to be careful with:
When attempting to read from an empty pipe or FIFO:
If no process has the pipe open for writing, read() will return 0 to indicate end-of-file.
If some process has the pipe open for writing and O_NONBLOCK is set, read() will return -1 and set errno to [EAGAIN].
—The Open Group: read()
and write(2) is even more complicated:
If the O_NONBLOCK flag is set, write() requests will be handled differently, in the following ways:
The write() function will not block the thread.
A write request for {PIPE_BUF} or fewer bytes will have the following effect: If there is sufficient space available in the pipe, write() will transfer all the data and return the number of bytes requested. Otherwise, write() will transfer no data and return -1 with errno set to [EAGAIN].
A write request for more than {PIPE_BUF} bytes will cause one of the following:
When at least one byte can be written, transfer what it can and return the number of bytes written. When all data previously written to the pipe is read, it will transfer at least {PIPE_BUF} bytes.
When no data can be written, transfer no data and return -1 with errno set to [EAGAIN].
—The Open Group: write()
<sigh> Just for completeness, then:
If the O_NONBLOCK flag is set, or if there are any pending signals, close() does not wait for output to drain, and dismantles the STREAM immediately.
—The Open Group: close()
That’s all too complicated for us (read: me!) right now. There’s a reason why programming languages triumphantly announce their “Async I/O” package some time down the line. Asynchronous I/O is hard and requires some fiddly exactness and therefore some fiddlingly exact thinking.
Let’s come back to that.
In the meanwhile: don’t use “named pipes” in Idio code, just use regular pipes!
Tidying Up Process Substitution¶
What do we do about tidying up? When do we do tidying up?
We will have created an entry in %%process-substitution-jobs
when
we created the asynchronous command, a map of job to a
%process-substitution-job
struct.
tidy-process-substitution-job
will get invoked when the job
completes – which includes when it errors.
/dev/fd/¶
Suppose we were reading from a pipe and the asynchronous command completes normally. We should not pro-actively close the pipe as the writer will have left some data in the pipe and we, the reader, should be able to retrieve it normally and, having read all the stored data, get an EOF.
What happens if the asynchronous command failed in some way? We may
or may not get a warning – depending on the value of
suppress-async-command-report!
– but what should we do about the
pipe?
Clearly, we have two choices. We could do nothing and have the reader read whatever the asynchronous command had gotten round to writing into the pipe and then get an EOF. It would be none the wiser that the pipe had failed, though.
Alternatively, we could pro-actively close the pipe, under the feet of the reader (us!), and subsequent operations will get EBADF. (Hopefully!)
I think I prefer the latter. Something went wrong and the user should get to know because we started throwing errors everywhere.
Do we need some means of suppressing this behaviour, though? Maybe the writer is a known ropey program.
What if we were writing to the pipe? Of interest, can the
asynchronous command exit successfully while we’re still writing?
head -1
suggests that it can so we need to consider our options.
If there’s any kind of error, we should let the user know by closing the pipe under their feet.
If there was no error, the head -1
example, I still think we
should close the pipe under the writer’s feet. It seems to me to be
the right thing to do, however inconvenient.
You fancy that this might be a more popular candidate for suppression!
FIFOs¶
FIFOs are more interesting again as they are entries in the file
system which we need to remove. We can do that in a similar way to
closing the file descriptor for /dev/fd
systems when the
asynchronous command completes.
However, we don’t have a handle [sic] on the file handle the user is using to access the FIFO. In other words we have no control other than to remove the FIFO from the file system before the user opens it – in all likelihood, a very small window.
That said, if the user hadn’t opened the FIFO before we removed it
then they’ll get an ^i/o-no-such-file-error
(ENOENT). If they had
already opened the FIFO then… I don’t know.
Something to come back to.