Extensions Location
Where to put stuff is (finally) being recognised as a thing.
In the Beginning, there was only one disk and it was full. So stuff spilled over into any handy spare space.
Wherever you put it, though, it was always the same kind of stuff, the same kind of binaries because they were all running on your computer.
With the advent of networked file systems you immediately began to hit, at the very least, version issues between the sharing systems. If you then threw a different kind of machine into the mix then you were in heaps of trouble.
Skipping history, in the modern age we can use QEMU to run:
operating systems for any machine, on any supported architecture
Now we’re in real trouble! And yet, amazingly, we still install executables in /usr/bin and libraries in /usr/lib – albeit there’s been a concession to use /usr/lib64 for 64-bit libraries.

It’s still pretty hopeless.
Multiarch
hier(7) (which is available on the *BSDs as well) describes the current layout.

There are attempts to do better and file-hierarchy(7) (Linux or systemd systems only?) differs from hier(7), referencing the File System Hierarchy and the XDG Base Directory Specification and the Debian Multiarch Architecture Specifiers (Tuples) amongst others.
Operating Systems take differing views on the detail: Fedora sticks with /usr/lib64 for the suggested $libdir (systemd-path system-library-arch) whereas Ubuntu goes all in with /usr/lib/x86_64-linux-gnu and /usr/lib/aarch64-linux-gnu, for example. My Raspberry Pi 3B+ (standard unit of computing) reports /usr/lib/arm-linux-gnueabihf with the eabihf part reflecting that it is a (little endian 32-bit) “ARM EABI, hard-float” – which I don’t think you need me to explain….

Those tuples don’t feel particularly easy to derive without asking systemd-path (or gcc -print-multiarch (since 4.9), I see) and stashing the results.
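A rough sketch of that derivation, trying the tools just mentioned in turn – the probing order and the uname(1) fallback are my assumptions, nothing standardised:

```shell
# Best-effort sketch: derive this system's multiarch tuple by asking
# the tools mentioned above, falling back to faking one from uname(1).
multiarch_tuple () {
    # GCC >= 4.9; prints nothing on non-multiarch builds
    tuple=$(gcc -print-multiarch 2>/dev/null)
    if [ -n "$tuple" ] ; then
        printf '%s\n' "$tuple"
        return
    fi

    # systemd-path prints a directory, eg. /usr/lib/x86_64-linux-gnu
    # on Ubuntu -- but /usr/lib64 on Fedora, which is no tuple at all
    dir=$(systemd-path system-library-arch 2>/dev/null)
    case "$dir" in
    /usr/lib/?*-?*)
        printf '%s\n' "${dir##*/}"
        return
        ;;
    esac

    # last resort: something tuple-ish from uname(1)
    printf '%s-%s\n' "$(uname -m)" "$(uname -s | tr '[:upper:]' '[:lower:]')"
}
```

Which rather proves the point about needing to stash the results.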
With the multiarch work, though:
The existing proposals allow for the co-installation of libraries and headers for different architectures, but not (yet) binaries.
which rather seems to defeat the point.
Not that I have any particular answer, I can only report what I see.
*
For my own sins I did do some work with shell script wrappers where everything that is architecture-dependent is installed in an architecture-dependent directory, leaving the question of: how do I know what to run?

Easy! For every executable you expect to be run you create a link for the named executable to the wrapper script in .../bin, the place on your PATH.
When you run the command, you are really running the wrapper script which figures out where it is, ..., and then calculates any appropriate set of related environment variables, say, LD_LIBRARY_PATH with architecture-dependence included and PERL5LIB, and then runs the architecture-dependent executable.

For example, .../bin/foo, on an x86_64 system, might figure out:

- an LD_LIBRARY_PATH of .../lib/x86_64
- a PERL5LIB of .../lib/perl5

and the real command to be run, .../bin/x86_64/foo, and runs it with all the original arguments intact.
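A minimal sketch of that wrapper logic, written here as a function so it can be exercised inline; a deployed .../bin/foo would be a standalone #!/bin/sh script doing the same with "$0" in place of "$1". The layout follows the description above; the details are assumptions:

```shell
# Sketch of an architecture-dispatching wrapper.  $1 is the pathname
# of the wrapper itself, remaining arguments are passed through.
run_wrapped () {
    wrapper=$1; shift
    bindir=$(dirname "$wrapper")
    topdir=$(dirname "$bindir")

    # a simple architecture tag; a real system might fold in OS
    # name/version compatibility rules too
    arch=$(uname -m)

    # prefix the architecture-dependent library directories
    LD_LIBRARY_PATH=$topdir/lib/$arch${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
    PERL5LIB=$topdir/lib/perl5${PERL5LIB:+:$PERL5LIB}
    export LD_LIBRARY_PATH PERL5LIB

    # run the architecture-dependent executable, arguments intact
    "$bindir/$arch/$(basename "$wrapper")" "$@"
}
```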
You can imagine all sorts of:

- operating system binary compatibility rules can be played out – in my case SunOS-sparc-5.10 systems could run SunOS-sparc-5.9 executables and SunOS-sparc-5.8 executables and …
- some reasonably sensible rules to avoid rewriting environment variables unnecessarily if .../bin/foo was likely to call .../bin/bar – you could update PATH to point directly at .../bin/x86_64 in preference to .../bin as LD_LIBRARY_PATH is now correct.
It also had the handy (for filthily casual sysadmins [*cough*])
property of allowing you to tar the whole ...
hierarchy up and drop it anywhere on any other (compatible) system and
it would just work.
*
The ultimate trick, there, though, was replacing the executable with a shell script. They don’t take very long to run and can figure out a lot of stuff.
Nothing seems to be being proposed which has the same functionality,
hence we’re tied to binaries in /usr/bin
and if you stomp over
it with a binary from a different architecture then, uh, User Error.
Plenty of people have been looking at the problem from which you conclude there is some nuance I am not appreciating.
*
The one thing the multiarch work does do is address the disparate ABIs with a variant of the GNU triplet, something like x86_64-linux-gnu.
That gives us an ARCH
variant to distinguish this
architecture from some other architecture.
Versions
We also have some version numbers to throw into the mix.
Idio has a version number, say, IDIO_VER
, and your
extension has a version number, say, EXT_VER
.
You would think we ought to be able to have more than one version of Idio installed – think python2.7 and python3.9, say, which, each in turn, use /usr/lib/pythonver/site-packages and /usr/lib64/pythonver on my system.
And you’d like to think you can have multiple versions of your
multi-featured yet remarkably stable foo
extension module at
the same time.
Ideally, then, the path to our shared library should involve all of the above, something like .../lib/idio/IDIO_VER/foo/FOO_VER/ARCH/libfoo.so.
Some immediate thoughts:

- that doesn’t look great – but what do you care, this is for a machine
- you don’t really want users to have to second guess any of that, .../lib/idio ought to be enough for a user and the machine can figure out the rest
- the positioning of /ARCH/ is contentious (even to me!).

  Take GCC, for example, which has followed file-hierarchy(7)’s suggestion to use a subdirectory of /usr/lib (specifically, rather than $libdir) for its own nefarious purposes then chosen /ARCH/MAJOR/ where ARCH is a multiarch form even on a non-multiarch system (eg, Fedora where it uses a GNU triplet). That hierarchy does include header files.

  That seems too high up for ARCH where much of the use is for architecture-independent .idio files, so push it further down.
- Python3 uses something like:

  $libdir/python3.9/lib-dynload/EXT.cpython-3.9-ARCH.so

  albeit the EXT is often _EXT to distinguish the shared library extension from any pure-Python code which is likely to be in $libdir/python3.9/EXT.py (and various $libdir/python3.9/__pycache__/EXT.cpython-3.9[.opt-n].pyc byte-compiled files).
- and just where is ... anyway? I imagine most people have their sights set on /usr/lib (or /usr/lib64) but it could be somewhere more centrally managed or shared like:

  - /usr/local – albeit the *BSDs like to use /usr/local as the installation point of choice for their ports(7) packages treating it as a (genuinely) local filesystem that the OS has full control over
  - /opt – which SunOS likes to use for third party package installs (let’s not start on the multitude of /usr/collection trees to keep your PATH full and busy)
  - $HOME on a network filesystem or, better, $HOME/.local for some XDG compliance

In fact, how might we figure any of that out?
Elsewhere, for shared libraries:

- Perl uses something like $libdir/perl5/auto/ext/ext.so (where perl5 might be variants of perl5/PERL5_VER) except SunOS where it is generally in /usr/perl5/PERL5_VER and uses lib/arch-tuple/auto/ext/ext.so where arch-tuple is a variation on the theme of the Debian multiarch, above.
- Python uses a more straight-forward system for building C extensions where it is PYTHONPATH-dir/ext.so – except where they’re not, see above.
Idio
In the first instance, since idio is running, it ought to
know its own version number, IDIO_VER
, although we have a
bootstrap issue of which version of Idio will we run when we
type idio?
Python solves that by having /usr/bin/python
be a symlink to
/usr/bin/python3
which is a symlink to
/usr/bin/python3.9
. You need to explicitly run
/usr/bin/python2
etc. to get the older Python.
In fact, Python’s virtualenvs create a similar
set of symlinks in .../venv/bin
(albeit using a different
schema).
Of interest is the value of IDIO_VER
. In Python’s case it
is a major.minor
number yet my python --version
reports “Python 3.9.6” and, presumably, has a more specific version
number than that, too, see Version Numbers.
Extension
Slightly more problematic is deriving the version number of the extension, foo, here, FOO_VER. You’ll guess we started with something like load foo, so how do we get to FOO_VER?
Here, I think, we need some concept of a “latest” – which, I admit,
is probably the wrong word but we’ve started so let’s continue the
thought. Suppose we have a bit of a development frenzy and install
several versions of foo
:
.../lib/idio/IDIO_VER/foo/FOO_VER1/ARCH/libfoo.so
.../lib/idio/IDIO_VER/foo/FOO_VER2/ARCH/libfoo.so
.../lib/idio/IDIO_VER/foo/FOO_VER3/ARCH/libfoo.so
and then decide that FOO_VER3 is a bit rubbish and re-deploy FOO_VER2.
We would expect that anyone invoking load foo
will get
FOO_VER2
and so something has to say FOO_VER2
is
the latest to be deployed – whatever the possible versions,
especially as FOO_VER3
sorts higher than FOO_VER2
(however it is that you manage to sort version numbers).
Clearly, there needs to be a loading mechanism to support loading
FOO_VER3
specifically (crazy fools!) which might, *thinks
for too little time*, look like load foo@FOO_VER3
. Of
course, anyone loading an explicit version will not be able to take
automatic advantage of the newest shiny FOO_VER4
which will
solve all known problems when it is finally released. Indeed, you
will never know it is running against an outdated version.
So we can imagine a bunch of .../lib/idio/IDIO_VER/ext/latest files which contain, say, EXT_VER – or, better, ext@EXT_VER – to indicate the latest deployed version which is actually .../lib/idio/IDIO_VER/ext/EXT_VER/ARCH/libext.so.
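That lookup might be sketched as follows, with the hierarchy, the latest file contents and the ext@EXT_VER syntax all as assumed above (libdir, IDIO_VER and ARCH are presumed already known):

```shell
# Resolve "load NAME" / "load NAME@VER" to a shared library pathname.
# Assumes libdir, IDIO_VER and ARCH are already set and that a
# per-extension "latest" file contains, say, foo@FOO_VER2.
resolve_extension () {
    name=$1
    case "$name" in
    *@*)
        # an explicit version pins the lookup (crazy fools!)
        ext=${name%@*}
        ver=${name#*@}
        ;;
    *)
        ext=$name
        # no explicit version: defer to the "latest" marker file
        ver=$(cat "$libdir/idio/$IDIO_VER/$ext/latest")
        ver=${ver#*@}
        ;;
    esac
    printf '%s\n' "$libdir/idio/$IDIO_VER/$ext/$ver/$ARCH/lib$ext.so"
}
```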
All good.
Double Trouble
But wait! Suppose we now release another version of Idio?
Or, more perniciously, another spin on the existing IDIO_VER
if it takes the form major.minor
like Python.
Here, I’m thinking in Python terms of a 3.9.7 release updated
from the 3.9.6 we have installed but both bearing the same 3.9
IDIO_VER
.
That’s bound to change many of those latest files to whatever the latest bits are. But the previously installed idio.major.minor won’t be expecting those (newer) extension releases. If you’re very careful not to change any of the internals of Idio then it’ll probably work – until it doesn’t.
Tricky.
Or is it? As we can only install a single binary in /usr/bin
then we will have overridden that previous
idio.major.minor
executable.
Is it the only idio.major.minor
executable on the system,
though? Is it the only idio.major.minor
executable using
the deployed .../lib/idio/IDIO_VER
hierarchy?
To avoid disappointment you have to mandate that only the /usr/bin/idio executable (for some IDIO_VER) can use the $libdir/idio/IDIO_VER hierarchy – or, at least, is the only one to use it without risk.
Python gets away with that as almost everything is a symlink to the (solitary) one true python3.9 executable.
If you want to deploy a .../bin/idio
then you’ll need to
deploy a .../lib/idio/...
hierarchy.
Python does do that with its virtualenvs giving, say:

- .../venv/bin/python (a symlink, eventually, to /usr/bin/python3.9)
- .../venv/lib/python3.9/site-packages
Multiple Installations
In a previous life, when GNU software was changing fairly rapidly, I
reached the stage of installing new releases in separate hierarchies
in /usr/local
, so, /usr/local/emacs-ver
, say, and
updated startup scripts to have a nosey
and allow daring people to pick up the latest bits and the more
risk/surprise-averse could stick with a stable release.
In the modern age of packaged installs, people compiling software is a
rarity and we are forced into the single instance /usr/bin
cul-de-sac.
Mixed Releases
I have wondered about a slightly different approach where, at
deployment time, you might create a build-specific
.../lib/idio/IDIO_BUILDVER/latest
containing a list of all
the extension releases, say, foo@FOO_VERx
and
bar@BAR_VERy
and baz@BAZ_VERz
, each appropriate to
that IDIO_BUILDVER
.
If Idio didn’t know about an extension at the time of
deployment then you would derive the extension version number from its
latest
file – what else have you got to go on?
This method would allow multiple IDIO_BUILDVER
to work
alongside one another but neither could take useful advantage of any
newer EXT_VER
than their latest
files allowed
without risking invoking some far more advanced release of code.
Virtualenvs
We are rolling back around to that sort of Python-esque virtualenv system where we use a system installed executable but we pick up the virtualenv-specific installed extensions in preference.
Now we have to be a bit more careful when resolving where we are.
Where Are We?
I often install bundles where you want to pick up items related to the
executable being run. If you know this executable is
.../bin/idio
, say, then you can find the libraries that this
executable was meant to run with in .../lib/idio
.
This gives your bundle the air of position independence which is extremely useful if you have multiple, potentially incompatible, versions lying around. Even better when you can tar the bundle up and drop it elsewhere and have it just work.
This sort of position independence is similar to the Python-style virtualenv and RedHat Software Collections both of which require that you explicitly run a command to “activate” the new environment. I’ve always preferred the idea that running a command should be enough to activate the environment on its own.
That brings up a bit of a dance around auto-updating environment variables which is influenced by whether or not environment variables have been set at all.
There are two “executable” pathnames of interest, here:

- the pathname you executed by dint of the specific executable you ran (./bin/idio or /path/to/bin/idio) or found on your PATH – either of which, in the case of bundles/virtualenvs, might be a symlink (or chain of symlinks) to the real executable
- the real executable – which, when you’re not using a bundle/virtualenv, is probably the same as the first value
For example, suppose I have created an XDG approved
$HOME/.local/bin/idio
symlink to a real deployed executable
/path/to/deployed/bin/idio
and that $HOME/.local
hierarchy contains your favourite Idio extensions in
$HOME/.local/lib/idio
.
If I run, via PATH or directly, $HOME/.local/bin/idio I would expect to see both $HOME/.local/lib/idio and /path/to/deployed/lib/idio on IDIOLIB.

In particular, $HOME/.local/lib/idio before /path/to/deployed/lib/idio.
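A sketch of that computation, assuming GNU realpath(1) for the “normalized” (symlinks untouched) and “resolved” (symlinks chased) pathnames; the function name is made up:

```shell
# Compute the IDIOLIB prefix for an invoked executable: the lib/idio
# directory next to the pathname we ran (which may be a symlink)
# followed by the lib/idio directory next to the resolved executable.
idiolib_prefix () {
    invoked=$1
    norm=$(realpath -s "$invoked")    # absolute, symlinks untouched
    real=$(realpath "$invoked")       # absolute, symlinks chased

    norm_lib=$(dirname "$(dirname "$norm")")/lib/idio
    real_lib=$(dirname "$(dirname "$real")")/lib/idio

    if [ "$norm_lib" = "$real_lib" ] ; then
        # not a symlinked bundle/virtualenv: just the one entry
        printf '%s\n' "$norm_lib"
    else
        printf '%s:%s\n' "$norm_lib" "$real_lib"
    fi
}
```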
There’s a slight variation for system executables (in /usr/bin
or /bin
) as the system will expect their library files to be
in $libdir
(/usr/lib64
or
/usr/lib/x86_64-linux-gnu
or wherever) but we can deal with
that.
The question is, where does any existing IDIOLIB fit with respect to these two executable-oriented paths?
My bundling belief is that the executable-orientated paths should be
before any existing IDIOLIB
.
See the commentary on virtualenvs, below, as well.
Figuring out the pathname of the currently running executable is a non-trivial exercise with plenty of operating system-bespoke special cases and with the potential for no useful result whatsoever.
argv[0]
argv[0]
has its own issues. We’ll discuss these in a moment.
Most of the commands you run will be found on the PATH
and
will be launched (with execve(2) or some-such) such that
argv[0]
is what you type, say, ls
. If you want to know
which ls is being run we have to hunt along the
PATH
ourselves to find it.
On the other hand, if you explicitly run a command with a specific
pathname element, say, ./bin/ls
then whilst argv[0]
is still
what you typed, we can derive the full pathname from argv[0]
itself. realpath(3) is your friend, here, and you’ll get
back some /path/to/bin/ls
.
You can figure it out yourself without realpath(3) as
argv[0]
will be either an absolute pathname or will require the
melding of the current working directory and argv[0]
resolving
symlinks and flattening out any .
and ..
elements
along the way.
Unfortunately, resolving symlinks hides the pathname to the original
argv[0]
which will be a problem for virtualenvs. So we’ll need to
figure out a “normalized” argv[0]
(not resolving symlinks) and a,
maybe different, maybe not, “resolved” argv[0]
(which has resolved
symlinks).
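Doing the melding by hand for the “normalized” variant might look like this – symlinks are deliberately left alone, which is exactly what makes the textual flattening of .. safe here:

```shell
# A "normalized" argv[0]: absolute, with . and .. flattened but with
# symlinks untouched (textual .. removal is only safe because we are
# not resolving symlinks).
normalize_argv0 () {
    case "$1" in
    /*) p=$1 ;;
    *)  p=$(pwd)/$1 ;;          # meld in the current working directory
    esac

    result=
    oldIFS=$IFS
    IFS=/
    for part in $p ; do
        case "$part" in
        ''|.) ;;                        # empty and . elements vanish
        ..)   result=${result%/*} ;;    # .. drops the last element
        *)    result=$result/$part ;;
        esac
    done
    IFS=$oldIFS
    printf '%s\n' "${result:-/}"
}
```

The “resolved” variant is what realpath(3) (or GNU realpath(1)) gives you.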
In the meanwhile, ls (probably) doesn’t much care where the
binary was when it was launched. If I copy /usr/bin/ls
to my
home directory and then run $HOME/ls
I’ll still get a listing.
On the other hand, we might care a little bit more about which
executable has been launched as for multiple installations we would
expect to have some installation-specific library files nearby. If we
had been launched as .../bin/idio
then we would probably
expect to have .../lib/idio
with a complement of library
files.
That’s important in development as new features in the executable are likely to go hand in hand with the use of those new features in the library files.
The essence of the issue is that if you have run an explicit pathname(d) executable then the associated libraries should be prefixed to any existing IDIOLIB.
Of course, if we don’t have an IDIOLIB when we start up then we should create one with the operating system appropriate library location.
Note, however, that argv[0]
cannot be relied upon to actually be
the name of the executable. I’m sure many of us have written a
“pretty name” tool to replace the otherwise indistinguishable command
names in ps output. We’ll be cursing ourselves now!
What else can we use?
/proc
Many Unix systems have a /proc
filesystem which has useful
information about running processes. /proc
has no standard
format and so the appropriate entry to probe for is operating
system-dependent. There’s more details here
and here
amongst others, no doubt.
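A sketch of such probing; the non-Linux /proc entries here are from memory and worth verifying, and an empty result means falling back to the argv[0] heuristics:

```shell
# Probe the operating system-dependent /proc entry for the running
# executable: Linux, FreeBSD (with procfs mounted), NetBSD and
# Solaris respectively.  Returns non-zero if nothing useful exists.
self_exe () {
    for f in /proc/self/exe \
             /proc/curproc/file \
             /proc/curproc/exe \
             /proc/self/path/a.out ; do
        if [ -e "$f" ] ; then
            readlink "$f"
            return
        fi
    done
    return 1
}
```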
caveats
Yes, this is something of a race condition as the very first thing the code does is try to figure out where it is. However, that’s the very nature of race conditions, it might happen this time.
One problem that both argv[0]
and /proc
suffer from is the
(legitimate) use of unlink(2) to remove the running
executable from the filesystem. Maybe the /proc
variant might
survive that experience but we won’t be able to stat(2)
anything we lookup on the PATH
based on argv[0]
.
There’s no particular answer to that as there is no answer to the “pretty name” variant. From this we recognise we must handle no valid answer.
What’s the answer in this case, then? Firstly print out a warning that something awful has happened. Then I guess we have to use the operating system-dependent default values and trust that the user can identify the external issue.
Is it likely? Well it’s not uncommon for Continuous Integration systems to delete build artifacts, including target executables, before moving onto a test stage. All they would need to do is run that cleanup stage in parallel with the test stage, for efficiency reasons, and suddenly we’re at risk.
Another problem is if chroot(2) has been called in between the exec(2) and us trying to resolve the real path. That’s unlikely to be us…isn’t it?
Again, there’s not much to be doing here other than use a fallback.
Virtualenvs
What happens with a virtualenv?
We are going to run .../bin/idio
, or, possibly,
.../bin/link
, which will be a symlink to, say,
/usr/bin/idio
(or some other deployed executable). Or it
could be a chain of symlinks (think: python
to python3
to python3.9
to /usr/bin/python3.9
) for which we
ignore all the intermediaries. We’re just looking at
.../bin/idio
, the original script interpreter and
/usr/bin/idio
the actual executable.
We will want to have both .../lib/idio
, the “virtualenv
libraries”, and $usrlib/idio
(or whatever is appropriate for
the system), the “executable libraries”, on IDIOLIB
.
Obviously, if .../bin/idio is a symlink to /path/to/bin/idio then we’ll be looking for .../lib/idio and /path/to/lib/idio to be used.
Here’s a subtlety, though, suppose you have set IDIOLIB
before you start. I think the result should be (broadly):
venv_lib:exe_lib:IDIOLIB
That is, any virtualenv and executable libraries should prefix any existing IDIOLIB – even if IDIOLIB already contains venv_lib or exe_lib.
I think I would rather protect the integrity of the script being run
(which is expecting particular library suites in venv_lib
and exe_lib
) than accommodate astute users. The more adept
can work their way around anyway.
I’m thinking in terms of multiple virtualenvs, A, B and C, where each call commands from the other.
*
The one thing you may (will?) end up with is repeated prefixing (or suffixing) of library paths.
Historically, I’ve written a trimpath
function to re-write any
colon-separated PATH-style variable with its equivalent without
repetitions. A useful tool in the bag that’s worth adding, I think.
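A version of that trimpath function might look like:

```shell
# trimpath: rewrite a colon-separated PATH-style value keeping only
# the first occurrence of each element.
trimpath () {
    result=
    oldIFS=$IFS
    IFS=:
    for dir in $1 ; do
        case ":$result:" in
        *":$dir:"*) ;;                          # seen this one already
        *)          result=$result${result:+:}$dir ;;
        esac
    done
    IFS=$oldIFS
    printf '%s\n' "$result"
}
```

So PATH=$(trimpath "$PATH"), say, after any round of prefixing.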
*
If .../bin/idio
is a hard link to an executable, this doesn’t
work. It is not (usefully) possible to determine the other name(s)
for the executable (the other reference(s) to the inode) and even if
we did pause the bootstrap to search the entire filesystem we can’t
reliably determine which is the one true name if all have a
corresponding lib
directory.
The upshot is that if we lstat(2) argv[0] (or the realpath(3) version of it) and it is a symlink then we need to add the corresponding lib hierarchy to IDIOLIB followed by the corresponding lib directory associated with the resolution of readlink(2) of the original interpreter.
Last built at 2024-12-21T07:11:01Z+0000 from 463152b (dev)