Extensions Location

Where to put stuff is (finally) being recognised as a thing.

In the Beginning, there was only one disk and it was full. So stuff spilled over into any handy spare space.

Wherever you put it, though, it was always the same kind of stuff, the same kind of binaries because they were all running on your computer.

With the advent of networked file systems you immediately began to hit, at the very least, version issues between the sharing systems. If you then threw a different kind of machine into the mix then you were in heaps of trouble.

Skipping history, in the modern age we can use QEMU to run:

operating systems for any machine, on any supported architecture

Now we’re in real trouble! And yet, amazingly, we still install executables in /usr/bin and libraries in /usr/lib – albeit there’s been a concession to use /usr/lib64 for 64-bit libraries.

It’s still pretty hopeless.

Multiarch

There are attempts to do better and file-hierarchy(7) (Linux or systemd-systems only?) references the File System Hierarchy and the XDG Base Directory Specification and the Debian Multiarch Architecture Specifiers (Tuples) amongst others.

Operating Systems take differing views on the detail, Fedora sticks with /usr/lib64 for the suggested $libdir (systemd-path system-library-arch) whereas Ubuntu goes all in with /usr/lib/x86_64-linux-gnu and /usr/lib/aarch64-linux-gnu, for example. My Raspberry Pi 3B+ (standard unit of computing) reports /usr/lib/arm-linux-gnueabihf with the eabihf part reflecting that it is a (little endian 32-bit) “ARM EABI, hard-float” – which I don’t think you need me to explain….

Those tuples don’t feel particularly easy to derive without asking systemd-path (or gcc -print-multiarch (since 4.9), I see) and stashing the results.
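As a sketch of that derive-and-stash, here is one possible ordering of the sources just mentioned – the fallback order, and the use of the final path component of the `systemd-path` answer, are my assumptions, not anything standardized:

```shell
# Try the known sources for the multiarch tuple in turn; the last
# resort, uname -m, gives the machine name with no ABI detail.
arch_tuple () {
    t=$(gcc -print-multiarch 2>/dev/null)    # empty on non-multiarch gcc
    if [ -n "$t" ] ; then
        printf '%s\n' "$t"
        return
    fi
    d=$(systemd-path system-library-arch 2>/dev/null)  # eg /usr/lib/x86_64-linux-gnu
    case "$d" in
    */lib/*) printf '%s\n' "${d##*/}" ; return ;;      # final component is the tuple
    esac
    uname -m
}

arch_tuple
```

On a Fedora system this falls all the way through to `uname -m` (its `$libdir` is `/usr/lib64`, no tuple to be had) whereas an Ubuntu system answers at the first or second hurdle.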

With the multiarch work, though:

The existing proposals allow for the co-installation of libraries and headers for different architectures, but not (yet) binaries.

which rather seems to defeat the point.

Not that I have any particular answer, I can only report what I see.

*

For my own sins I did do some work with shell script wrappers where everything that is architecture-dependent is installed in an architecture-dependent directory leaving the question of, how do I know what to run?

Easy! For every executable you expect to be run you create a link for the named executable to the wrapper script in .../bin, the place on your PATH.

When you run the command, you are really running the wrapper script which figures out where it is, ..., and then calculates any appropriate set of related environment variables, say, LD_LIBRARY_PATH with architecture-dependence included and PERL5LIB, and then runs the architecture-dependent executable.

For example, .../bin/foo, on an x86_64 system, might figure out:

  • an LD_LIBRARY_PATH of .../lib/x86_64

  • a PERL5LIB of .../lib/perl5

  • and the real command to be run, .../bin/x86_64/foo, and runs it with all the original arguments intact.

You can imagine all sorts of refinements:

  • operating system binary compatibility rules can be played out – in my case SunOS-sparc-5.10 systems could run SunOS-sparc-5.9 executables and SunOS-sparc-5.8 executables and …

  • and some reasonably sensible rules to avoid rewriting environment variables unnecessarily if .../bin/foo was likely to call .../bin/bar – you could update PATH to point directly at .../bin/x86_64 in preference to .../bin as LD_LIBRARY_PATH is now correct.

It also had the handy (for filthily casual sysadmins [*cough*]) property of allowing you to tar the whole ... hierarchy up and drop it anywhere on any other (compatible) system and it would just work.

*

The ultimate trick, there, though, was replacing the executable with a shell script. They don’t take very long to run and can figure out a lot of stuff.

Nothing seems to be being proposed which has the same functionality, hence we’re tied to binaries in /usr/bin and if you stomp over it with a binary from a different architecture then, uh, User Error.

Plenty of people have been looking at this problem, from which I conclude there is some nuance I am not appreciating.

*

The one thing the multiarch work does do is address the disparate ABIs with a variant of the GNU triplet, something like x86_64-linux-gnu.

That gives us an ARCH variant to distinguish this architecture from some other architecture.

Versions

We also have some version numbers to throw into the mix. Idio has a version number, say, IDIO_VER, and your extension has a version number, say, EXT_VER.

You would think we ought to be able to have more than one version of Idio installed – think python2.7 and python3.9, say, which, each in turn, use /usr/lib/pythonver/site-packages and /usr/lib64/pythonver on my system.

And you’d like to think you can have multiple versions of your multi-featured yet remarkably stable foo extension module at the same time.

Ideally, then, the path to our shared library should involve all of the above, something like .../lib/idio/IDIO_VER/foo/FOO_VER/ARCH/libfoo.so.

Some immediate thoughts:

  1. that doesn’t look great – but what do you care, this is for a machine

  2. you don’t really want users to have to second guess any of that, .../lib/idio ought to be enough for a user and the machine can figure out the rest

  3. the positioning of /ARCH/ is contentious (even to me!).

    Take GCC, for example, which has followed file-hierarchy(7)’s suggestion to use a subdirectory of /usr/lib (specifically, rather than $libdir) for its own nefarious purposes then chosen /ARCH/MAJOR/ where ARCH is a multiarch form even on a non-multiarch system (eg, Fedora where it uses a GNU triplet). That hierarchy does include header files.

    That seems too high up for ARCH where much of the use is for architecture-independent .idio files, so push it further down.

    Python3 uses something like:

    $libdir/python3.9/lib-dynload/EXT.cpython-3.9-ARCH.so

    albeit the EXT is often _EXT to distinguish the shared library extension from any pure-Python code which is likely to be in $libdir/python3.9/EXT.py (and various $libdir/python3.9/__pycache__/EXT.cpython-3.9[.opt-n].pyc byte-compiled files).

  4. and just where is ... anyway? I imagine most people have their sights set on /usr/lib (or /usr/lib64) but it could be somewhere more centrally managed or shared like:

    • /usr/local – albeit the *BSDs like to use /usr/local as the installation point of choice for their ports(7) packages treating it as a (genuinely) local filesystem that the OS has full control over

    • /opt – which SunOS likes to use for third party package installs (let’s not start on the multitude of /usr/collection trees to keep your PATH full and busy)

    • $HOME on a network filesystem or, better, $HOME/.local for some XDG compliance

In fact, how might we figure any of that out?

Elsewhere, for shared libraries

  • Perl uses something like $libdir/perl5/auto/ext/ext.so (where perl5 might be a variant of perl5/PERL5_VER) except on SunOS where it is generally in /usr/perl5/PERL5_VER and uses lib/arch-tuple/auto/ext/ext.so where arch-tuple is a variation on the theme of the Debian multiarch, above.

  • Python uses a more straightforward system for building C extensions where it is PYTHONPATH-dir/ext.so – except where it isn’t, see above.

Idio

In the first instance, since idio is running, it ought to know its own version number, IDIO_VER, although we have a bootstrap issue of which version of Idio will we run when we type idio?

Python solves that by having /usr/bin/python be a symlink to /usr/bin/python3 which is a symlink to /usr/bin/python3.9. You need to explicitly run /usr/bin/python2 etc. to get the older Python.

In fact, Python’s virtualenvs create a similar set of symlinks in .../venv/bin (albeit using a different schema).

Of interest is the value of IDIO_VER. In Python’s case it is a major.minor number yet my python --version reports “Python 3.9.6” and, presumably, has a more specific version number than that, too, see Version Numbers.

Extension

Slightly more problematic is deriving the version number of the extension, foo, here, FOO_VER. We presumably started with something like load foo so how do we get to FOO_VER?

Here, I think, we need some concept of a “latest” – which, I admit, is probably the wrong word but we’ve started so let’s continue the thought. Suppose we have a bit of a development frenzy and install several versions of foo:

  • .../lib/idio/IDIO_VER/foo/FOO_VER1/ARCH/libfoo.so

  • .../lib/idio/IDIO_VER/foo/FOO_VER2/ARCH/libfoo.so

  • .../lib/idio/IDIO_VER/foo/FOO_VER3/ARCH/libfoo.so

and then decide that FOO_VER3 is a bit rubbish and re-deploy FOO_VER2.

We would expect that anyone invoking load foo will get FOO_VER2 and so something has to say FOO_VER2 is the latest to be deployed – whatever the possible versions, especially as FOO_VER3 sorts higher than FOO_VER2 (however it is that you manage to sort version numbers).

Clearly, there needs to be a loading mechanism to support loading FOO_VER3 specifically (crazy fools!) which might, *thinks for too little time*, look like load foo@FOO_VER3. Of course, anyone loading an explicit version will not be able to take automatic advantage of the newest shiny FOO_VER4 which will solve all known problems when it is finally released. Indeed, you will never know it is running against an outdated version.

So we can imagine a bunch of .../lib/idio/IDIO_VER/ext/latest files which contain, say, EXT_VER – or, better, ext@EXT_VER – to indicate the latest deployed version which is actually .../lib/idio/IDIO_VER/ext/EXT_VER/ARCH/libext.so.

All good.
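A sketch of that resolution – resolve_ext is a hypothetical helper and the latest file format (a single ext@EXT_VER line) is the assumption made just above:

```shell
# Resolve "load foo" or "load foo@FOO_VER" to a shared library
# pathname under $libdir (eg .../lib/idio/IDIO_VER) for some $ARCH.
resolve_ext () {
    libdir=$1 ext=$2
    case "$ext" in
    *@*) name=${ext%%@*} ver=${ext#*@} ;;              # explicit version
    *)   name=$ext
         ver=$(cat "$libdir/$name/latest" 2>/dev/null) || return 1
         ver=${ver#*@} ;;                              # latest deployed version
    esac
    printf '%s\n' "$libdir/$name/$ver/$ARCH/lib$name.so"
}
```

So resolve_ext $libdir foo consults foo/latest whereas resolve_ext $libdir foo@FOO_VER3 takes the crazy fool at their word.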

Double Trouble

But wait! Suppose we now release another version of Idio? Or, more perniciously, another spin on the existing IDIO_VER if it takes the form major.minor like Python. Here, I’m thinking in Python terms of a 3.9.7 release updated from the 3.9.6 we have installed but both bearing the same 3.9 IDIO_VER.

That’s bound to change many of those latest files to whatever the latest bits are. But the previously installed idio.major.minor won’t be expecting those (newer) extension releases. If you’re very careful not to change any of the internals of Idio then it’ll probably work – until it doesn’t. Tricky.

Or is it? As we can only install a single binary in /usr/bin then we will have overridden that previous idio.major.minor executable.

Is it the only idio.major.minor executable on the system, though? Is it the only idio.major.minor executable using the deployed .../lib/idio/IDIO_VER hierarchy?

To avoid disappointment you have to mandate that only the /usr/bin/idio executable (for some IDIO_VER) can use the $libdir/idio/IDIO_VER hierarchy – or, at least, be the only one to use it without risk.

Python gets away with that as almost everything is a symlink to the (solitary) one true python3.9 executable.

If you want to deploy a .../bin/idio then you’ll need to deploy a .../lib/idio/... hierarchy.

Python does do that with its virtualenvs giving, say:

  • .../venv/bin/python (a symlink, eventually, to /usr/bin/python3.9)

  • .../venv/lib/python3.9/site-packages

Multiple Installations

In a previous life, when GNU software was changing fairly rapidly, I reached the stage of installing new releases in separate hierarchies in /usr/local, so, /usr/local/emacs-ver, say, and updated startup scripts to have a nosey and allow daring people to pick up the latest bits and the more risk/surprise-averse could stick with a stable release.

In the modern age of packaged installs, people compiling software is a rarity and we are forced into the single instance /usr/bin cul-de-sac.

Mixed Releases

I have wondered about a slightly different approach where, at deployment time, you might create a build-specific .../lib/idio/IDIO_BUILDVER/latest containing a list of all the extension releases, say, foo@FOO_VERx and bar@BAR_VERy and baz@BAZ_VERz, each appropriate to that IDIO_BUILDVER.

If Idio didn’t know about an extension at the time of deployment then you would derive the extension version number from its latest file – what else have you got to go on?

This method would allow multiple IDIO_BUILDVER to work alongside one another but neither could take useful advantage of any newer EXT_VER than their latest files allowed without risking invoking some far more advanced release of code.

Virtualenvs

We are rolling back around to that sort of Python-esque virtualenv system where we use a system installed executable but we pick up the virtualenv-specific installed extensions in preference.

Now we have to be a bit more careful when resolving where we are.

Shared Library

A final note is that it is common practice for files in the standard system library directories – and, presumably, for all libraries as part of standardized packaging instructions – to follow the shared library naming conventions resulting in a linker name, libfoo.so, being a symlink to the soname (shared object name), libfoo.so.major, which is a symlink to a real name, libfoo.so.major.minor[.rev].

In this format, major is updated for interface-breaking changes, minor is for updates to major and rev is the revision of minor.
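For a hypothetical libfoo at 1.2.3 the chain looks like this – a real build would also embed the soname in the library with -Wl,-soname,libfoo.so.1 at link time:

```shell
demo=$(mktemp -d) && cd "$demo"
touch libfoo.so.1.2.3                 # stand-in for the real name
ln -s libfoo.so.1.2.3 libfoo.so.1     # soname: major bumped on breaking changes
ln -s libfoo.so.1 libfoo.so           # linker name: what "-lfoo" finds at build time
readlink libfoo.so                    # libfoo.so.1
```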

In principle, /usr/lib64/libc.so is a symlink to, say, /usr/lib64/libc.so.2.33 although as I test that this Fedora system has libc.so as “GNU ld script”:

/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

which suggests the use of /lib64/libc.so.6 which is a symlink to libc-2.33.so.

So, “along those lines”, then.

*

For added fun, libtool uses a different version numbering scheme involving more precise backwards compatibility.

Where Are We?

I often install bundles where you want to pick up items related to the executable being run. If you know this executable is .../bin/idio, say, then you can find the libraries that this executable was meant to run with in .../lib/idio.

This gives your bundle the air of position independence which is extremely useful if you have multiple, potentially incompatible, versions lying around. Even better when you can tar the bundle up and drop it elsewhere and have it just work.

This sort of position independence is similar to the Python-style virtualenv and RedHat Software Collections both of which require that you explicitly run a command to “activate” the new environment. I’ve always preferred the idea that running a command should be enough to activate the environment on its own.

That brings up a bit of a dance around auto-updating environment variables which is influenced by whether or not environment variables have been set at all.

There are two “executable” pathnames of interest, here:

  1. the pathname you executed by dint of the specific executable you ran (./bin/idio or /path/to/bin/idio) or is found on your PATH either of which, in the case of bundles/virtualenvs, might be a symlink (or chain of symlinks) to the real executable

  2. the real executable

    Which, when you’re not using a bundle/virtualenv, is probably the same as the first value.

For example, suppose I have created an XDG approved $HOME/.local/bin/idio symlink to a real deployed executable /path/to/deployed/bin/idio and that $HOME/.local hierarchy contains your favourite Idio extensions in $HOME/.local/lib/idio.

If I run, via PATH or directly, $HOME/.local/bin/idio I would expect to see both $HOME/.local/lib/idio and /path/to/deployed/lib/idio on IDIOLIB.

In particular, $HOME/.local/lib/idio before /path/to/deployed/lib/idio.

There’s a slight variation for system executables (in /usr/bin or /bin) as the system will expect their library files to be in $libdir (/usr/lib64 or /usr/lib/x86_64-linux-gnu or wherever) but we can deal with that.

The question is, where does any existing IDIOLIB fit with respect to these two executable-oriented paths?

My bundling belief is that the executable-oriented paths should come before any existing IDIOLIB.

See the commentary on virtualenvs, below, as well.

Figuring out the pathname of the currently running executable is a non-trivial exercise with plenty of operating system-bespoke special cases and with the potential for no useful result whatsoever.

argv[0]

Most of the commands you run will be found on the PATH and will be launched (with execve(2) or some-such) such that argv[0] is what you type, say, ls. If we want to know which ls is being run we have to hunt along the PATH ourselves to find it.

On the other hand, if you explicitly run a command with a specific pathname element, say, ./bin/ls then whilst argv[0] is still what you typed, we can derive the full pathname from argv[0] itself. realpath(3) is your friend, here, and you’ll get back some /path/to/bin/ls.

You can figure it out yourself without realpath(3) as argv[0] will be either an absolute pathname or will require the melding of the current working directory and argv[0] resolving symlinks and flattening out any . and .. elements along the way.

Unfortunately, resolving symlinks hides the pathname to the original argv[0] which will be a problem for virtualenvs. So we’ll need to figure out a “normalized” argv[0] (not resolving symlinks) and a, maybe different, maybe not, “resolved” argv[0] (which has resolved symlinks).
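A sketch of those two views – normalize_argv0 and resolve_argv0 are hypothetical helpers, and the latter assumes a realpath(1) (GNU coreutils or similar) is available:

```shell
# "normalized": meld in the cwd or hunt the PATH, flatten . and ..
# logically, but keep any symlinks in the path as typed.
normalize_argv0 () {
    case "$1" in
    */*) p=$1 ;;                               # relative or absolute pathname
    *)   p=$(command -v "$1") || return 1 ;;   # hunt the PATH ourselves
    esac
    d=$(cd "$(dirname -- "$p")" && pwd) || return 1    # logical pwd: keeps symlinks
    printf '%s/%s\n' "$d" "$(basename -- "$p")"
}

# "resolved": the same but with symlinks followed too
resolve_argv0 () {
    realpath "$(normalize_argv0 "$1")"
}
```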

In the meanwhile, ls (probably) doesn’t much care where the binary was when it was launched. If I copy /usr/bin/ls to my home directory and then run $HOME/ls I’ll still get a listing.

On the other hand, we might care a little bit more about which executable has been launched as for multiple installations we would expect to have some installation-specific library files nearby. If we had been launched as .../bin/idio then we would probably expect to have .../lib/idio with a complement of library files.

That’s important in development as new features in the executable are likely to go hand in hand with the use of those new features in the library files.

The essence of the issue is that if you have run an explicit pathname(d) executable then the associated libraries should be prefixed to any existing IDIOLIB.

Of course, if we don’t have an IDIOLIB when we start up then we should create one with the operating system appropriate library location.

Note, however, that argv[0] cannot be relied upon to actually be the name of the executable. I’m sure many of us have written a “pretty name” tool to replace the otherwise indistinguishable command names in ps output. We’ll be cursing ourselves now!

What else can we use?

/proc

Many Unix systems have a /proc filesystem which has useful information about running processes. /proc has no standard format and so the appropriate entry to probe for is operating system-dependent; each system’s documentation has more details, no doubt.
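A sketch of the usual probes – the non-Linux entries assume procfs is actually mounted which, these days, it often is not:

```shell
# Probe the operating system-dependent /proc entry for the pathname
# of the running executable, here, the shell running this script.
self_exe () {
    case "$(uname -s)" in
    Linux)             readlink /proc/self/exe ;;
    NetBSD)            readlink /proc/curproc/exe ;;
    FreeBSD|DragonFly) readlink /proc/curproc/file ;;
    SunOS)             readlink /proc/self/path/a.out ;;
    *)                 return 1 ;;       # no /proc help here
    esac
}

self_exe        # this shell's executable, eg /usr/bin/dash
```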

caveats

One problem that both argv[0] and /proc suffer from is the (legitimate) use of unlink(2) to remove the running executable from the filesystem. Maybe the /proc variant might survive that experience but we won’t be able to stat(2) anything we lookup on the PATH based on argv[0].

There’s no particular answer to that, as there is no answer to the “pretty name” variant. From this we recognise that we must handle having no valid answer.

What’s the answer in this case, then? Firstly print out a warning that something awful has happened. Then I guess we have to use the operating system-dependent default values and trust that the user can identify the external issue.

Is it likely? Well it’s not uncommon for Continuous Integration systems to delete build artifacts, including target executables, before moving onto a test stage. All they would need to do is run that cleanup stage in parallel with the test stage, for efficiency reasons, and suddenly we’re at risk.

Another problem is if chroot(2) has been called in between the exec(2) and us trying to resolve the real path. That’s unlikely to be us…isn’t it?

Again, there’s not much to be doing here other than use a fallback.

Virtualenvs

What happens with a virtualenv?

We are going to run .../bin/idio, or, possibly, .../bin/link, which will be a symlink to, say, /usr/bin/idio (or some other deployed executable). Or it could be a chain of symlinks (think: python to python3 to python3.9 to /usr/bin/python3.9) for which we ignore all the intermediaries. We’re just looking at .../bin/idio, the original script interpreter and /usr/bin/idio the actual executable.

We will want to have both .../lib/idio, the “virtualenv libraries”, and $usrlib/idio (or whatever is appropriate for the system), the “executable libraries”, on IDIOLIB.

Obviously, if .../bin/idio is a symlink to /path/to/bin/idio then we would be looking for .../lib/idio and /path/to/lib/idio to be used.

Here’s a subtlety, though, suppose you have set IDIOLIB before you start. I think the result should be (broadly):

venv_lib:exe_lib:IDIOLIB

That is, any virtualenv and executable libraries should prefix any existing IDIOLIB, even if IDIOLIB already contains venv_lib or exe_lib.

I think I would rather protect the integrity of the script being run (which is expecting particular library suites in venv_lib and exe_lib) than accommodate astute users. The more adept can work their way around anyway.

I’m thinking in terms of multiple virtualenvs, A, B and C, where each calls commands from the others.
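A sketch of that composition – the directory names are illustrative and the ${IDIOLIB:+...} expansion avoids a trailing colon when IDIOLIB is unset or empty:

```shell
venv_lib=/path/to/venv/lib/idio        # from the normalized argv[0]
exe_lib=/usr/lib64/idio                # from the resolved executable
IDIOLIB=$venv_lib:$exe_lib${IDIOLIB:+:$IDIOLIB}
export IDIOLIB
```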

*

The one thing you may (will?) end up with is repeated prefixing (or suffixing) of library paths.

Historically, I’ve written a trimpath function to re-write any colon-separated PATH-style variable with its equivalent without repetitions. A useful tool in the bag that’s worth adding, I think.
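A sketch of such a trimpath, keeping the first occurrence of each element:

```shell
# Rewrite a colon-separated PATH-style value without repetitions.
trimpath () {
    _old_IFS=$IFS
    IFS=:
    set -f                              # no globbing of elements
    _result=
    for _d in $1 ; do
        case ":$_result:" in
        *":$_d:"*) ;;                   # seen it already
        *) _result=${_result:+$_result:}$_d ;;
        esac
    done
    set +f
    IFS=$_old_IFS
    printf '%s\n' "$_result"
}

trimpath "/a/bin:/b/bin:/a/bin:/c/bin"    # /a/bin:/b/bin:/c/bin
```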

*

If .../bin/idio is a hard link to an executable, this doesn’t work. It is not (usefully) possible to determine the other name(s) for the executable (the other reference(s) to the inode) and even if we did pause the bootstrap to search the entire filesystem we can’t reliably determine which is the one true name if all have a corresponding lib directory.

The upshot is that if we lstat(2) argv[0] (or the realpath(3) version of it) and it is a symlink then we need to add the corresponding lib hierarchy to IDIOLIB followed by the corresponding lib directory associated with the resolution of readlink(2) of the original interpreter.

Last built at 2024-09-16T06:11:15Z+0000 from 463152b (dev)