.. include:: ../../global.rst
.. _`C-api`:
*******
C API
*******
Rationale
=========
:lname:`Idio` is written in :lname:`C` and therefore uses "the"
:lname:`C` API. So what's the problem?
The problem comes in the form of portability. Let's say I want to
call and use the result of :manpage:`getpid(2)`. Easy enough, it's:
.. aside:: No flies on us!
.. code-block:: c
pid_t pid = getpid ();
OK. Now I need to store this ``pid_t`` to be able to pass it around
as an ``IDIO`` value. And a ``pid_t`` is a what, exactly, on your
system? I have access to systems where it is either an ``int`` or a
``long`` (which doesn't tell me if an ``int`` differs from a ``long``
anyway). So, er, which one?
Another slightly more famous example is a ``time_t``. It was (and
still will be in some cases) a 32-bit entity resulting in the
impending `Epochalypse
`_ in January 2038.
Except every modern Linux operating system doesn't have that problem
as Arnd Bergmann and John Stultz plodded though and changed the
interfaces to 64-bit (kicking the Epochalyse can down the road for `a
few hundred billion years
`_...).
And, another bugbear, how do I print a ``time_t`` out without
upsetting the compiler on some system or another?
Underlying this is that the :lname:`C` API uses an opaque *typedef*
for some base :lname:`C` type. For us to be able to store and
transport it we could do with knowing what that type is.
In the first instance, we might just use ``intmax_t`` and
``uintmax_t`` (whatever *they* are) to store all :lname:`C` integral
types and let the :lname:`C` compiler figure out the casting. But
that doesn't seem like *Art*. Not least because we *still* don't know
if a ``pid_t`` or a ``time_t`` is signed or unsigned -- OK, we can
take a look/reasonable guess and hope that everyone else uses the
same.
Instead, wouldn't it be useful if we could write code that looked
like:
.. code-block:: c
pid_t pid = getpid ();
IDIO ipid = idio_C_pid_t (pid);
where the ``idio_C_pid_t`` constructor is aware of whatever underlying
base type a ``pid_t`` is on *this* system and stashes it as an ``int``
or ``long`` as appropriate.
Now that the value is an ``IDIO`` value we can pass it merrily around
to anyone else. The only people who will actually care whether it is
an ``int`` or a ``long`` are those manipulating it and they would want
a deconstructor, ``IDIO_C_TYPE_pid_t``, that is equally aware of the
stash leaving us to write:
.. code-block:: c
void idio_func (IDIO v)
{
IDIO_USER_C_TYPE_ASSERT (pid_t, v);
pid_t pid = IDIO_C_TYPE_pid_t (v);
...
}
where the type assertion knows the mapping from ``pid_t`` to ``int``
or ``long`` as well.
Conversions
-----------
Hmm, wait a minute, what if *we* in :lname:`Idio`\ -land want to
create a ``pid_t``? Suppose we've read in the output of :program:`ps`
and decide we need to :manpage:`kill(2)` some egregious process?
We'll have started with some string which we can convert into an
:lname:`Idio` number with ``read-number`` but we, probably, don't even
know if that number is even a fixnum or a bignum. How are we going to
create a ``pid_t``?
We do have ``C/integer->`` to help us which can be given a clue as to
which :lname:`C` type to create.
So, we need to pass it some clue as to which of the fourteen
:lname:`C` base types we want. Obviously, we're going to pass it
``pid_t`` -- because that's *all* we want to know about this value --
which needs to map to some symbol (the usual :lname:`Scheme`\ -ish
way) which eventually resolves to either ``'int`` or ``'long``
dependent on system.
.. code-block:: idio-console
Idio> C/integer-> 23 libc/pid_t
23
.. aside::
I don't think the :lname:`C` compiler would cope with two typedefs
of ``pid_t`` so maybe we should have a flat namespace (here, just
``pid_t``) as well.
However, see the commentary in :ref:`namespaces`, below.
In practice, ``pid_t`` is really ``libc/pid_t``, ie. to accommodate
different libraries' potential name clashes we'll prefix the typedef
name with the module the name comes from.
Those names are exported from ``libc`` but so are many many other
names with a huge potential for clashes:
.. code-block:: idio
import libc
can be interesting.
Also notice that the printed representation of a :lname:`C` ``pid_t``
is, perhaps unsurprisingly, indistinguishable from the fixnum we
constructed it from! I'm not sure there's a useful solution to this.
There is the similar ``C/->integer`` which will return an
:lname:`Idio` fixnum or bignum (depending on the size of the
:lname:`C` value). On a 64-bit machine:
.. code-block:: idio-console
Idio> fixnum? (C/->integer libc/INT_MAX)
#t
Idio> fixnum? (C/->integer libc/INTMAX_MAX)
#f
On a 32-bit system, ``libc/INT_MAX`` is also a bignum.
Predicates
----------
In the same manner any ``libc``-oriented code will want to be able to
test that some :lname:`Idio` value is a ``libc/pid_t``. That
predicate should, by rights, be ``libc/pid_t?``.
Given that ``pid_t`` is a typedef to ``int`` (or ``long`` or whatever)
then we're going to need a mapping from ``libc/pid_t?`` through to the
:lname:`C` domain's ``C/int?`` or ``C/long?`` as appropriate.
Note that the change, arguably, *chain*, from a :lname:`C` library
(ie. :lname:`Idio` module) into the :lname:`C` domain's (fourteen)
base types might well involve some multi-library hops depending on how
the typedefs roll.
Caveats
=======
The main caveat is that if your interface uses ``#define`` then this
mechanism won't help you. ``#define``\ s are part of the :lname:`C`
pre-processor and so those definitions will have disappeared by the
time the :lname:`C` compiler has its way and we get to look at the
generated code.
Another issue is that the set of values that some parameters take
might be system-dependent. Think of the ``resource`` parameter to
:manpage:`getrlimit(2)`, the likes of ``RLIMIT_NOFILE``,
``RLIMIT_NPROC`` etc.. On the one hand that ought not be an issue
except that it presumes we do, somehow, know what values have been
defined on this system. Some Linux systems have migrated these values
into an enumerated type, which we can see, and maintained the
:lname:`C` macros for backwards compatibility. Other systems are
``#define`` only.
For these to be portable you will have to resort to using something
along the lines of the error, signal and rlimit value tests:
.. code-block:: c
:caption: :file:`src/libc-wrap.c`
#if defined (EBADF)
IDIO_LIBC_ERRNO (EBADF);
#endif
and empirical knowledge of the ported systems.
.. _namespaces:
Namespaces
==========
I started out distinguishing between the base :lname:`C` types in the
existing ``C`` namespace and those new typedefs being introduced from
libc in the ``libc`` namespace before thinking that :lname:`C` doesn't
have any namespaces, what am I doing?
However, :lname:`C++` does have namespaces so I guess it would be
prudent to maintain this artiface.
This will lead to the mildly confusing ``idio_libc_pid_t`` constructor
and some :samp:`idio_{libfoo}_pid_t` constructor from some
:samp:`{libfoo}` API which also happens to use a ``pid_t`` in its
interface.
So, to correct the earlier names, the proper convention is that for
true :lname:`C` domain types (``int``, ``long`` etc.) we'll use:
.. code-block:: c
IDIO_USER_C_TYPE_ASSERT ({base-type}, v);
{base-type} C_v = IDIO_C_TYPE_{base-type} (v);
IDIO r = idio_C_{base-type} (C_v);
and for some :samp:`{libfoo}` API using a typedef'd symbol we'll have:
.. code-block:: c
IDIO_USER_{libfoo}_TYPE_ASSERT ({type-def}, v);
{type-def} C_v = IDIO_{libfoo}_{type-def} (v);
IDIO r = idio_{libfoo}_{type-def} (C_v);
therefore, for the ``pid_t`` in ``libc`` example, we'll have:
.. code-block:: c
IDIO_USER_libc_TYPE_ASSERT (pid_t, v);
pid_t C_v = IDIO_libc_pid_t (v);
IDIO r = idio_libc_pid_t (C_v);
Build Bootstrap I
=================
We'll come back to this but :lname:`Idio` uses the :lname:`C` API for
:file:`libc` to run and yet we're somehow using :lname:`Idio` to
*build* that :file:`libc` interface. How does *that* work?
.. aside::
"Cheat" is a very loaded term. Of course we mean that we use our
experience, wisdom and guile buried under a weight of fortune...
Well, we cheat, of course. In the first instance there are some
:file:`libc` API files that are at least consistent, if not
necessarily correct for this system, these are in
:file:`build-bootstrap` subdirectories.
That is enough to allow us to run :program:`idio-c-api-gen` to build
correct :file:`libc` API files for this system. The presence of which
prompts :program:`make` to rebuild :program:`idio`.
DWARF
=====
.. sidebox::
DWARF was originally a `play on words
`_
given its relationship to ELF, the Executable and Linking Format
for object files but later backronym'ed to...well, who cares?
`DWARF `_ is our friend, here. One observation
is that the debugger seems to know a lot about the types in our system
and if we investigate the output of a "debug" object file it turns out
to contain a lot of information about the types in the object file.
But, as noted, nothing from the :lname:`C` pre-processor.
.. sidebox::
:program:`readelf` is not available on Mac OS X -- noting, in
turn, that :program:`objdump` on Mac OS X is rubbish and you need
to run :program:`dwarfdump` to get anything useful.
:program:`objdump` seems to exist on all the systems I use which
produces a wealth of information in text format. Given we'll have to
parse something, text format seems a lot easier for us.
DIEs
----
The output is in the form of cascades of :abbr:`DIE (Debugging
Information Entry)`\ s. These look something like:
.. code-block:: text
<1><50>: Abbrev Number: 2 (DW_TAG_base_type)
<51> DW_AT_byte_size : 2
<52> DW_AT_encoding : 5 (signed)
<53> DW_AT_name : (indirect string, offset: 0x9c4): short int
<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
<58> DW_AT_byte_size : 4
<59> DW_AT_encoding : 5 (signed)
<5a> DW_AT_name : int
The output varies with:
* the (historic) version of :program:`objdump` changes the exact format
* the set of entries is dependent on the types used in the object file (duh!)
* the *order* of output varies on the contents of the object file
However, reading between the lines we can see there appear to be two
``base_type``\ s defined in that snippet:
* a ``short int`` with a ``byte_size`` of 2 is type ``<50>``
* an ``int`` with a ``byte_size`` of 4 is type ``<57>``
The sizes seem about right for this system for a start!
The ``encoding`` attribute is interesting as it is a clue as to how to
interpret those 2 or 4 bytes. ``signed``, ``signed char``,
``unsigned`` and ``float`` are examples. I don't think in :lname:`C`
there's really very much going on, here, but I get the impression that
the ``encoding`` allows for much more like the
``UTF-8``/``UCS-2``\ -style of encoding.
.. note::
A DIE without a type uses the implied type of ``void``.
pid_t
^^^^^
Let's try a ``pid_t``:
.. code-block:: c
:caption: :file:`src/libc-api.c`
int main (int argc, char ** argv)
{
pid_t pid = getpid ();
return 0;
}
which compiling, ``gcc -g -c -o libc-api.o libc-api.c`` gives us
(amongst other things):
.. code-block:: text
<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
<58> DW_AT_byte_size : 4
<59> DW_AT_encoding : 5 (signed)
<5a> DW_AT_name : int
<1>: Abbrev Number: 4 (DW_TAG_typedef)
DW_AT_name : (indirect string, offset: 0x7f1): __pid_t
DW_AT_decl_file : 2
DW_AT_decl_line : 154
DW_AT_decl_column : 25
DW_AT_type : <0x57>
<1><1ea>: Abbrev Number: 4 (DW_TAG_typedef)
<1eb> DW_AT_name : (indirect string, offset: 0x727): pid_t
<1ef> DW_AT_decl_file : 3
<1f0> DW_AT_decl_line : 97
<1f1> DW_AT_decl_column : 17
<1f2> DW_AT_type : <0xe9>
The third DIE shown, ``<1ea>`` at the bottom, is a ``typedef`` for
``pid_t`` and has some clues about a file and that its ``type`` is
``<0xe9>``.
The middle DIE shown, is ```` which is itself a ``typedef`` for
``__pid_t`` from a different file (you would guess) and in turn has
its type as ``<0x57>``.
The first DIE shown is ``<57>`` and is a ``base_type``, an ``int``.
Sweet!
As noted, another system typedefs ``pid_t`` directly to a ``long int``
base type.
That raises another spectre. Where has this ``__pid_t`` type come
from and is it portable?
Well, the answer is that it is obviously some local definition and is
clearly not portable so we need to be more careful. Not only are the
ultimate base types of the typedefs different but the mappings through
to them are different too.
That might come back to bite us, it might not.
.. rst-class:: center
\*
In the meanwhile we can see a means to automatically generate many of
the interfaces we've been talking about, most of which are simply
redefinitions of existing base types.
In the :lname:`C` world, you can imagine creating:
.. code-block:: c
:caption: :file:`src/libc-api.h`
#define IDIO_TYPE_C_libc___pid_t IDIO_TYPE_C_INT
#define idio_libc___pid_t idio_C_int
#define IDIO_C_TYPE_libc___pid_t IDIO_C_TYPE_int
#define idio_isa_libc___pid_t idio_isa_C_int
#define IDIO_TYPE_C_libc_pid_t IDIO_TYPE_C_libc___pid_t
#define idio_libc_pid_t idio_libc___pid_t
#define IDIO_C_TYPE_libc_pid_t IDIO_C_TYPE_libc___pid_t
#define idio_isa_libc_pid_t idio_isa_libc___pid_t
Here,
* :samp:`IDIO_TYPE_C_{module}_{name}` is a mapping from the :lname:`C`
constructed name to an :lname:`Idio` :file:`src/gc.h` compatible
macro for the :lname:`C` base types
This allows us to figure out the correct :manpage:`printf(3)` format
string for a given type.
* :samp:`idio_{module}_{name}` is the constructor
* :samp:`IDIO_C_TYPE_{module}_{name}` is the destructor
An unfortunate naming near-miss with
:samp:`IDIO_TYPE_C_{module}_{name}`.
* :samp:`idio_isa_{module}_{name}` is a predicate
The :samp:`IDIO_USER_{module}_TYPE_ASSERT({type},{x})` is a generic
macro which ultimately is going to call
:samp:`idio_isa_{module}_{type} ({x})` which we've just defined.
and for the :lname:`Idio` world:
.. code-block:: idio
:caption: :file:`lib/libc-api.idio`
export (
__pid_t
__pid_t?
pid_t
pid_t?
)
__pid_t := 'int
define __pid_t? C/int?
pid_t := __pid_t
define pid_t? libc/__pid_t?
Meaning that some :lname:`Idio` code can:
.. code-block:: idio
pid := ...
c-pid := C/integer-> pid libc/pid_t
and everything lines up!
Structures and Unions
^^^^^^^^^^^^^^^^^^^^^
Structures and unions are all "lifted" up to the top level, even if
they are only used inside another structure or union.
That said, they are very simple being the ``structure_type`` itself
(with or without a ``name``) and a sequence of ``member``\ s each of
which has a ``type`` in turn.
Hmm, now we know about the fields in a structure, we ought to be able
to generate some code to be able to access the fields.
From :lname:`Idio` you can imagine wanting to access a
:manpage:`stat(2)` ``struct stat`` along the lines of:
.. code-block:: idio
sb := libc/stat "."
printf "size=%s\n" sb.st_size
which is utilizing the ``value-index`` operator ``.`` to access the
``st_size`` member using a symbol, ``st_size``.
What do we need for that? Well, we should define the symbol, for a
start, then we need a function that can poke about in the ``struct
stat``, let's call it ``struct-stat-ref``:
.. code-block:: c
IDIO_SYMBOL_DECL (st_dev);
IDIO_SYMBOL_DECL (st_ino);
...
IDIO_SYMBOL_DECL (st_size);
...
IDIO_DEFINE_PRIMITIVE2_DS ("struct-stat-ref", libc_struct_stat_ref, (IDIO stat, IDIO member), "stat member", "\
in C, stat->member \n\
\n\
:param stat: C struct stat \n\
:type stat: C/pointer \n\
:param member: C struct member \n\
:type member: symbol \n\
:return: stat->member \n\
:rtype: varies on member \n\
")
{
IDIO_ASSERT (stat);
IDIO_ASSERT (member);
/*
* Test Case: libc-errors/struct-stat-ref-bad-pointer-type.idio
*
* struct-stat-ref #t #t
*/
IDIO_USER_C_TYPE_ASSERT (pointer, stat);
/*
* Test Case: libc-errors/struct-stat-ref-bad-member-type.idio
*
* struct-stat-ref v #t
*/
IDIO_USER_TYPE_ASSERT (symbol, member);
... can we check that stat is a pointer to a struct stat?
struct stat *statp = IDIO_C_TYPE_POINTER_P (stat);
if (idio_S_st_dev == member) {
return idio_libc_dev_t (statp->st_dev);
} else if (idio_S_st_ino == member) {
...
} else if (idio_S_st_size == member) {
return idio_libc_off_t (statp->st_size);
} else
...
}
We should be able to generate a similar ``struct-stat-set!`` although
whether such a function is warranted is another question. *a priori*
it's a valid operation.
For the answer to testing whether ``stat`` really is a ``struct
stat``, see :ref:`subprograms`, below.
Separately, though related, there is the question, about how the
system might know that it should use ``struct-stat-ref``, in
particular, when de-structuring some (random) :lname:`C` pointer,
rather than any other primitive (or function). More on that below.
Pointers
^^^^^^^^
Pointers are flagged en route to the real type underneath. Here's the
case for the ``formal_parameter`` :samp:`{argv}`:
.. code-block:: text
<1><178>: Abbrev Number: 9 (DW_TAG_pointer_type)
<179> DW_AT_byte_size : 8
<17a> DW_AT_type : <0x17e>
<1><17e>: Abbrev Number: 2 (DW_TAG_base_type)
<17f> DW_AT_byte_size : 1
<180> DW_AT_encoding : 6 (signed char)
<181> DW_AT_name : (indirect string, offset: 0x351): char
<1><2fa>: Abbrev Number: 9 (DW_TAG_pointer_type)
<2fb> DW_AT_byte_size : 8
<2fc> DW_AT_type : <0x178>
<2><115d>: Abbrev Number: 36 (DW_TAG_formal_parameter)
<115e> DW_AT_name : (indirect string, offset: 0x57b): argv
<1162> DW_AT_decl_file : 1
<1163> DW_AT_decl_line : 55
<1164> DW_AT_decl_column : 28
<1165> DW_AT_type : <0x2fa>
<1169> DW_AT_location : 4 byte block: 91 f0 b9 7f (DW_OP_fbreg: -8976)
Which you should be able to see both that :samp:`{argv}` is a pointer
to a pointer to a ``char``, ie. ``char **argv`` and that DIEs are not
necessarily printed in an obvious order.
Arrays
""""""
Array types are a bit more subtle. They are described distinctly from
pointers, as they should be, though *we* might use them
indiscriminately from pointers.
In this example, all of the member of the ``struct utsname`` are
``char []`` rather than ``char *``:
.. code-block:: text
<1>: Abbrev Number: 28 (DW_TAG_structure_type)
DW_AT_name : (indirect string, offset: 0x1e8): utsname
DW_AT_byte_size : 390
DW_AT_decl_file : 23
DW_AT_decl_line : 48
DW_AT_decl_column : 8
DW_AT_sibling : <0xcdf>
<2>: Abbrev Number: 12 (DW_TAG_member)
DW_AT_name : (indirect string, offset: 0x5e1): sysname
DW_AT_decl_file : 23
DW_AT_decl_line : 51
DW_AT_decl_column : 10
DW_AT_type : <0xcdf>
DW_AT_data_member_location: 0
<2>: Abbrev Number: 12 (DW_TAG_member)
DW_AT_name : (indirect string, offset: 0x459): nodename
DW_AT_decl_file : 23
DW_AT_decl_line : 54
DW_AT_decl_column : 10
DW_AT_type : <0xcdf>
DW_AT_data_member_location: 65
<1>: Abbrev Number: 5 (DW_TAG_array_type)
DW_AT_type : <0x17e>
DW_AT_sibling : <0xcef>
``<17e>`` was the ``char`` in the pointer example, above.
Notice also, part of the ABI (as opposed to the API) is exposed in
that the ``data_member_location`` for ``nodename`` is 65 bytes along
suggesting that ``sysname`` is defined as ``char sysname[65]`` on this
system.
:manpage:`uname(2)` on this system notes:
The length of the fields in the struct varies. Some operating
systems or libraries use a hardcoded 9 or 33 or 65 or 257. Other
systems use SYS_NMLN or _SYS_NMLN or UTSLEN or _UTSNAME_LENGTH.
Clearly, it is a bad idea to use any of these constants; just use
sizeof(...). Often 257 is chosen in order to have room for an
internet hostname.
Enumerated Types
^^^^^^^^^^^^^^^^
Enumerated types have a ``type`` and a number of ``enumerator``\ s
each of which has a ``const_value``.
.. sidebox::
"A bit loose"? How Unusual for :lname:`C`... *not!*
The wording around the type for an enumerated type is a bit loose and
merely suggests something big enough to hold the entire set of
enumerators.
I can see both ``int`` and ``unsigned int`` enumerated types.
.. _subprograms:
Subprograms
^^^^^^^^^^^
I'm not sure what ``subprogram`` is meant to describe but you will
usually get to see ``main`` or whatever the main enclosing function
is.
The information provided includes the formal parameters and any
variables used in the function (possibly including any lexical blocks
they exist in).
.. sidebox::
Fedora using GCC 10!
However, rather usefully, on some systems it will also have a
``subprogram`` entry for each function it calls.
This is strikingly useful as it only requires one system, using
portable definitions, to produce this information from which we can
sketch out the framework for a series of ``IDIO_DEFINE_PRIMITIVE``
functions describing those calls. Once defined they are, by
definition(?), using the portable :lname:`C` API and are therefore
applicable to all systems.
Due to the absence of debugging information in, in my case, libc,
there's no formal parameter names for the system and library calls I'm
referencing but we do get the formal parameter types and the return
type. We can obviously invent some arguments names, *arg1*, *arg2*,
etc..
For :manpage:`kill(2)`, whose prototype looks like:
.. code-block:: c
int kill(pid_t pid, int sig);
this becomes:
.. code-block:: c
IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO arg1, IDIO arg2), "arg1 arg2", "\
in C: kill (arg1, arg2) \n\
a wrapper to libc kill() \n\
\n\
:param arg1: \n\
:type arg1: libc/__pid_t \n\
:param arg2: \n\
:type arg2: C/int \n\
:return:
:rtype: C/int
")
{
IDIO_ASSERT (arg1);
IDIO_ASSERT (arg2);
/*
* Test Case: libc-errors/kill-bad-arg1-type.idio
*
* kill #t #t
*/
IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);
__pid_t C_arg1 = IDIO_C_TYPE_libc___pid_t (arg1);
/*
* Test Case: libc-errors/kill-bad-arg2-type.idio
*
* kill #t #t
*/
IDIO_USER_C_TYPE_ASSERT (int, arg2);
int C_arg2 = IDIO_C_TYPE_int (arg2);
int kill_r = kill (C_arg1, C_arg2);
/* check for errors */
if (-1 == kill_r) {
idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
/*
* WARNING: this is probably an incorrect return
*/
return idio_C_int (kill_r);
}
Now, that's not too shabby for something automatically generated.
.. rst-class:: center
\*
If we *do* install the libc debugging symbols we *still* don't get
``formal_parameter`` names -- well, for the ``subprogram``\s we're interested in,
anyway.
Installing the debugging symbols isn't quite as obvious as you might
think. In the case of Fedora these are in separate packages in
disabled repos but on the plus side tools like :program:`gdb` know to
go looking for them and programs like :program:`objdump` can be
persuaded with an extra argument (``K`` in this case).
.. code-block:: console
$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install glibc-debuginfo
$ objdump -WilK /lib64/libc-2.33.so
.. code-block:: text
<1><3452f>: Abbrev Number: 52 (DW_TAG_subprogram)
<34530> DW_AT_external : 1
<34530> DW_AT_name : (indirect string, offset: 0x108c0): kill
<34534> DW_AT_decl_file : 93
<34535> DW_AT_decl_line : 112
<34536> DW_AT_decl_column : 12
<34537> DW_AT_prototyped : 1
<34537> DW_AT_type : <0x2a>
<3453b> DW_AT_declaration : 1
<3453b> DW_AT_sibling : <0x34547>
<2><3453c>: Abbrev Number: 24 (DW_TAG_formal_parameter)
<3453d> DW_AT_type : <0x2341>
<2><34541>: Abbrev Number: 24 (DW_TAG_formal_parameter)
<34542> DW_AT_type : <0x2a>
<2><34546>: Abbrev Number: 0
Some ``subprogram``\s do have named formal parameters in libc which
makes me think the construction of things might be considerably more
interesting than at first blush.
Wait a minute, isn't there some ``__kill`` nonsense floating about
with system calls? Hmm. The ``__kill`` ``subprogram`` also declines
to offer us any formal parameter names. So let's dig deeper:
.. code-block:: console
$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install kernel-debuginfo-$(uname -r)
$ cd /usr/lib/debug/lib/modules/$(uname -r)
$ nm vmlinux | grep '[tT] kill_'
...
ffffffff810ecb50 T kill_pgrp
ffffffff810ecc20 T kill_pid
ffffffff810ecb90 T kill_pid_info
ffffffff810ea580 T kill_pid_usb_asyncio
ffffffff8131ecc0 t kill_procs
...
*I don't think we're in Kansas any more, Toto!*
.. code-block:: console
$ objdump -WilK vmlinux | less +/kill_pid
.. code-block:: text
<1><10e9d79>: Abbrev Number: 53 (DW_TAG_subprogram)
<10e9d7a> DW_AT_external : 1
<10e9d7a> DW_AT_name : (indirect string, offset: 0x179603): kill_pid
<10e9d7e> DW_AT_decl_file : 2
<10e9d7f> DW_AT_decl_line : 1793
<10e9d81> DW_AT_decl_column : 5
<10e9d82> DW_AT_prototyped : 1
<10e9d82> DW_AT_type : <0x10bcbd0>
<10e9d86> DW_AT_low_pc : 0xffffffff810ecc20
<10e9d8e> DW_AT_high_pc : 0x1a
<10e9d96> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<10e9d98> DW_AT_GNU_all_call_sites: 1
<10e9d98> DW_AT_sibling : <0x10e9e07>
<2><10e9d9c>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<10e9d9d> DW_AT_name : pid
<10e9da1> DW_AT_decl_file : 2
<10e9da2> DW_AT_decl_line : 1793
<10e9da4> DW_AT_decl_column : 26
<10e9da5> DW_AT_type : <0x10c3638>
<10e9da9> DW_AT_location : 0x3a232c (location list)
<10e9dad> DW_AT_GNU_locviews: 0x3a2326
<2><10e9db1>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<10e9db2> DW_AT_name : sig
<10e9db6> DW_AT_decl_file : 2
<10e9db7> DW_AT_decl_line : 1793
<10e9db9> DW_AT_decl_column : 35
<10e9dba> DW_AT_type : <0x10bcbd0>
<10e9dbe> DW_AT_location : 0x3a237e (location list)
<10e9dc2> DW_AT_GNU_locviews: 0x3a2378
<2><10e9dc6>: Abbrev Number: 28 (DW_TAG_formal_parameter)
<10e9dc7> DW_AT_name : (indirect string, offset: 0x3b5017): priv
<10e9dcb> DW_AT_decl_file : 2
<10e9dcc> DW_AT_decl_line : 1793
<10e9dce> DW_AT_decl_column : 44
<10e9dcf> DW_AT_type : <0x10bcbd0>
<10e9dd3> DW_AT_location : 0x3a23ce (location list)
<10e9dd7> DW_AT_GNU_locviews: 0x3a23ca
<2><10e9ddb>: Abbrev Number: 90 (DW_TAG_GNU_call_site)
...
<3><10e9e05>: Abbrev Number: 0
<2><10e9e06>: Abbrev Number: 0
Hmm, *three* parameters. I think we've gone *too* deep.
So, *query-replace* :samp:`arg{n}` seems fine to me. We almost
certainly need to tweak the code anyway.
.. rst-class:: center
\*
We have a portability issue as Fedora has defined the API with
``__pid_t`` but if we *query-replaced* the double-underscore we're in
a much better position.
We can obviously query-replace ``arg1`` for ``pid`` and ``arg2`` for
``sig`` leaving us with:
.. code-block:: c
IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO pid, IDIO sig), "pid sig", "\
in C: kill (pid, sig) \n\
a wrapper to libc kill() \n\
\n\
:param pid: \n\
:type pid: libc/pid_t \n\
:param sig: \n\
:type sig: C/int \n\
:return:
:rtype: C/int
")
{
IDIO_ASSERT (pid);
IDIO_ASSERT (sig);
/*
* Test Case: libc-errors/kill-bad-pid-type.idio
*
* kill #t #t
*/
IDIO_USER_libc_TYPE_ASSERT (pid_t, pid);
pid_t C_pid = IDIO_C_TYPE_libc_pid_t (pid);
/*
* Test Case: libc-errors/kill-bad-sig-type.idio
*
* kill #t #t
*/
IDIO_USER_C_TYPE_ASSERT (int, sig);
int C_sig = IDIO_C_TYPE_int (sig);
int kill_r = kill (C_pid, C_sig);
/* check for errors */
if (-1 == kill_r) {
idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
/*
* WARNING: this is probably an incorrect return
*/
return idio_C_int (kill_r);
}
and we may just have a working, portable, interface to the ``libc``
API!
Re-imagining APIs
-----------------
.. sidebox::
It does cover a lot of cases, though!
Now, this almost certainly won't work for you `off the bat
`_ as I've clearly
directed the automatic code generation to handle the most common form
of error that system and library calls produce and assumed you can
directly return the value from the API call.
For calls like :manpage:`getcwd(3)` the value returned is a ``char *``
and for error checking we should be comparing to ``NULL``.
For something like :manpage:`times(3)` which is expecting a ``struct
tms`` to be supplied (unlikely from :lname:`Idio`-land!) but also the
returned value is a ``clock_t``. Here, whilst the "check for errors"
is nominally correct we should be retaining the value and we will end
up returning a list of the ``clock_t`` and the ``struct tms`` back to
the user.
Similarly, a direct copy of the :lname:`C` API is not usefully correct
for something like :manpage:`stat(2)` where the user is in no position
to create a ``struct stat`` to pass as an argument. In this case we
would only accept a ``pathname`` argument and allocate a ``struct
stat`` to be freed later.
This leads to the idea that the automatic code generation will give us
a starter for ten which we can edit into permanence.
.. _`CSI`:
Auto-Application of Methods
^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now, about that ``struct stat`` we've just allocated. We've gone to
the trouble of creating ``struct-stat-ref`` (and the moot
``struct-stat-set!``) to manipulate it. How might we get that to
happen auto-magically?
Well, suppose we associate with the ``C_pointer`` in the ``IDIO``
value we're about to return some useful type information. What's
useful? Hmm, how about a couple of things: a string,
``"libc/struct-stat"`` (useful for reporting) and a reference to
``struct-stat-ref``?
In the latter case, we don't need to also add ``struct-stat-set!`` as
we can use our trusty :ref:`setters` mechanism to do the right thing.
We want something that is unique for each type and a ``pair`` always
falls into that category -- even if the contents are the same the
actual ``pair`` itself is unique in memory.
So, at the time we define our structure, let's throw out a "C
Structure Identification":
.. code-block:: c
IDIO_C_STRUCT_IDENT_DECL (libc_struct_stat);
IDIO_SYMBOL_DECL (st_dev);
...
and then a bit later on, when we're adding primitives etc. we can
expand that into a list:
.. code-block:: c
IDIO fgvi = IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_ref);
IDIO_C_STRUCT_IDENT_DEF ("libc/struct-stat", libc_struct_stat, fgvi);
IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_set);
Here, we take advantage of the fact that
``IDIO_EXPORT_MODULE_PRIMITIVE`` always returned the actual primitive
reference (the value associated with the symbol ``struct-stat-ref``)
-- except we normally throw it away.
We can stick that reference in a list through the :lname:`C` macro
``IDIO_C_STRUCT_IDENT_DEF``.
In practice, it defines a name in :lname:`C`\ -land,
``idio_CSI_libc_struct_stat`` whose value is the list ``("struct stat"
struct-stat-ref)``.
So, we have something unique per structure that we can associate with
a pointer and the ``stat`` primitive can return:
.. code-block:: c
return idio_C_pointer_type (idio_CSI_libc_struct_stat, statp);
Now we can revisit ``struct-stat-ref`` and perform the check:
.. code-block:: c
IDIO_USER_C_TYPE_ASSERT (pointer, stat);
if (idio_CSI_libc_struct_stat != IDIO_C_TYPE_POINTER_PTYPE (stat)) {
idio_error_param_value ("stat", "should be a libc/struct-stat", IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
and we can extend the code for ``value-index`` to have a nosey in any
:lname:`C` pointers that come its way:
.. code-block:: c
:caption: :file:`src/util.c`
...
case IDIO_TYPE_C_POINTER:
{
IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
if (idio_S_nil != t) {
IDIO cmd = IDIO_LIST3 (IDIO_PAIR_HT (t), o, i);
IDIO r = idio_vm_invoke_C (idio_thread_current_thread (), cmd);
return r;
}
}
break;
...
We need to do a couple of changes for the *-set!* side of things to
work.
In :lname:`Idio`\ -land we can check that both *-ref* and *-set!*
primitives exist -- after all, someone might deem modifying a
:samp:`struct {foo}` a poor move and have removed ``struct-stat-set!``
-- but the auto-generated :file:`lib/libc-api.idio` doesn't know that:
.. code-block:: idio
:caption: :file:`lib/libc-api.idio`
if (and (function? struct-stat-ref)
(function? struct-stat-set!)) {
set! (setter struct-stat-ref) struct-stat-set!
}
and the code for ``set-value-index!`` can be updated for :lname:`C`
pointers as well:
.. code-block:: c
:caption: :file:`src/util.c`
...
case IDIO_TYPE_C_POINTER:
{
IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
if (idio_S_nil != t) {
/*
* We want: (setter ref) o i v
*
* but we have to invoke by stage:
*/
IDIO setter_cmd = IDIO_LIST2 (idio_module_symbol_value (idio_S_setter,
idio_Idio_module,
idio_S_nil),
IDIO_PAIR_HT (t));
IDIO setter_r = idio_vm_invoke_C (idio_thread_current_thread (), setter_cmd);
if (! idio_isa_function (setter_r)) {
idio_debug ("(setter %s) did not yield a function\n", IDIO_PAIR_HT (t));
break;
}
IDIO set_cmd = IDIO_LIST4 (setter_r, o, i, v);
IDIO set_r = idio_vm_invoke_C (idio_thread_current_thread (), set_cmd);
return set_r;
}
}
break;
...
Printing
^^^^^^^^
Of course, with your bespoke structure, you might want a bespoke
printer. There's a mechanism there too.
The ``add-as-string`` system for adding bespoke printing for
:lname:`Idio` structures has been extended to support :lname:`C`
structures with ``idio_CSI_`` support. The printer is associated with
the ``idio_CSI_`` value such that all :lname:`C` pointers to the same
kind of ``struct`` use the same printer.
The generated printers have two parts, a :lname:`C` part
``_as_string()`` which creates the results via ``idio_display_C()``
and ``idio_display()`` and an output string handle and an
:lname:`Idio` part which calls the :lname:`C` part.
The normal structure printing is to:
#. add :samp:`#``
For some structure types there is a natural printed format. For
example, a ``struct timeval`` has seconds and micro-seconds parts and
is commonly displayed in a :samp:`{seconds}.{micro-seconds}` form.
We know that the ``tv_usec`` member of a ``struct timeval`` represent
micro-seconds and can only be six decimal digits even though its type,
``suseconds_t``, is probably a ``long``. In fact, it *must* be
displayed as six leading-0-padded digits otherwise it makes no sense.
For example, 1s and 213us would be displayed as ``1.000213`` and not
``1.213``.
Further, if there is a precision pending, say, 3, then the precision
is applied to the leading-0-padded string, not the literal ``tv_usec``
value, giving a result of ``1.000``.
.. sidebox::
You wouldn't read back in the printed form, would you?
No brownie points for you!
Note that the resultant printed form does not include the structure
name, it appears as just a floating point number. The *value* is a
``struct timeval``, it's just the printed form that looks like a
floating point number. Compare that with the printed representation
of the fixnum, 23, and the ``libc/pid_t``, 23, we had before.
The auto printing of :lname:`C` structures comes in quite handy. For
example, noting I accidentally called the external command
:program:`stat` first, we get to compare results:
.. code-block:: idio-console
Idio> stat "."
File: .
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: fd00h/64768d Inode: 17910888 Links: 3
Access: (0775/drwxrwxr-x) Uid: ( 1000/ idf) Gid: ( 1000/ idf)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2021-07-05 12:24:42.072310969 +0100
Modify: 2021-07-05 12:24:43.574321757 +0100
Change: 2021-07-05 12:24:43.574321757 +0100
Birth: 2021-05-11 11:19:01.916839841 +0100
#t
Idio> libc/stat "."
#
(I've broken the CSI printout for viewing convenience -- it is one long line!)
In this case, a ``struct timespec`` with a ``tv_nsec`` field for
nano-seconds is seen for the timestamp fields. Notice the leading 0
for the access time entries.
And, not wanting to emphasise the point, those ``struct timespec``
printed representations use 19 significant digits which, if you recall
the work on :ref:`bignums`, is one too many for an accurate floating
point value. Off to inexact school for you!
Files
-----
:program:`idio-c-api-gen` takes a nominal library name as an argument,
say, ``libc``. It then seeks out an :file:`.../ext/libc` directory
where :file:`.../ext` is derived from possible directories called
:file:`.../lib/idio` in :envvar:`IDIOLIB`.
It then looks for :file:`.../ext/libc/api/libc-api.c` and compiles it
into :file:`.../ext/libc/api/libc-api.o` using the local
:file:`Makefile`.
It then runs :program:`objdump` on that :file:`.o` file and starts
generating output in :file:`.../ext/libc/gen`:
* :file:`gen/libc-api.c` (unfortunate name clash with the original
source file)
This :lname:`C` source file contains:
- primitives for the :lname:`C` ``struct`` accessors and printer
- primitives for any ``subprogram`` definitions -- excluding
``main``
**It will be incorrect!**
It is impossible to infer the correct handling of any errors. The
sample code is for the most common form of system errors.
It is also not possible to identify when, commonly, a :lname:`C`
pointer in the API is meant to be allocated by the caller.
That sort of :lname:`C` API is likely to be replaced by an
:lname:`Idio` API where the pointer is not required as an argument
but is allocated and supplied internally and subsequently returned
to the user rather than the nominal return value from the API
call.
Think of the example of :manpage:`stat(2)` where the :lname:`Idio`
user cannot supply a ``struct stat`` and have an expectation that
the value returned from a ``stat`` call is the ``struct stat``
(and not the ``int`` that :manpage:`stat(2)` returns).
- a putative :samp:`idio_{libc}_api_add_primitives()` function
- a putative :samp:`idio_init_{libc}_api()` function which will
contain:
* ``#ifdef``\ -wrappered definitions for any ``enumeration_type``\
s
* the definitions for any ``struct`` members' symbols
* the corresponding ``idio_CSI_`` definition for the ``struct``
itself
* :file:`gen/libc-api.h`
This :lname:`C` header file contains:
- a helpful description of the used ``base_type``\ s in a comment
- a generic :samp:`IDIO_USER_{lib}_TYPE_ASSERT()` macro
- a series of :lname:`C` macro expansions for each ``typedef``:
* a constructor
* an accessor
* a predicate
- declarations for the ``struct`` definitions in
:file:`gen/libc-api.c`
* :file:`gen/libc-api.idio`
This :lname:`Idio` source file contains:
- exports of:
* the ``typedef`` type mappings (for the ``C/integer->`` function)
* (commented out) the ``enumerated_type`` ``enumeration``\ s
- the definitions of the entities just exported
- the setup for some :ref:`setters`
* :file:`gen/test-libc-error.idio`
This :lname:`Idio` source file contains a reasonable attempt at a
test suite based on the known type and value tests that can be
automatically inferred from the ``struct``\ s and ``subprogram``\ s
described.
It should be able to alert when a test fails to generate the
expected error but do not rely on this.
* :file:`gen/libc-errors/*`
This directory contains putative instances of all the test cases
described above.
**It will be incorrect!**
For example, for ``struct`` test cases, to be able to reach some
test cases then an valid argument needs to be supplied.
It is impossible to infer how to create such an argument and the
sample code simply refers to :samp:`{v}`.
For example, for ``stat``-related tests which require a valid
``struct stat`` I've replaced :samp:`{v}` with ``(libc/stat ".")``
which, since we've been writing the above code, successfully returns
a ``struct stat`` with appropriate ``idio_CSI_`` definition.
Inconsistent Outputs
--------------------
You will not get the same output from all systems at the very least
because of the issues regarding ``typedef`` mappings as described
previously.
There are further complications with ``struct`` definitions and
``subprogram`` API types.
In the case of a ``struct`` you may find that some systems define
extra structure members over and above the nominal :lname:`C` API.
These should have no effect -- other than adding extra member name
symbols that you have no reason to use. I have generally removed them
from the code to reduce the members down to the portable set.
For both ``struct`` and ``subprogram`` definitions you may find that
the actual :lname:`C` API uses some of these intermediate typedefs
we've mentioned.
For example, my Fedora system seems to use ``__pid_t`` everywhere
where the nominal :lname:`C` API thinks a ``pid_t`` should be used.
There are two problems here:
#. the generated :file:`gen/libc-api.c` will be using ``__pid_t``,
eg. ``IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);``
That's not too traumatic to fix but you need to be aware of it if
you choose a Fedora system as the source for your proto-permanent
:file:`src/libc-api.c`.
#. unless you make an effort then the typedef for ``pid_t`` itself
will be nowhere to be found
This can be trivially solved by re-writing the original source file
to say:
.. code-block:: c
pid_t pid = getpid ();
thus forcing the ``typedef`` mapping to appear.
Although, obviously, you won't know that that is required until you
discover that the expected ``pid_t`` is missing.
In other words, the creation of :file:`.../ext/libc/gen/libc-api.c`
could take a couple of iterations around the loop.
Inconsistent API
----------------
In some cases the API has changed over time. Historically, the
:manpage:`stat(2)` API had three ``time_t`` members, ``st_atime``,
``st_mtime`` and ``st_ctime``.
Since the ``time_t`` work described above the API has (mostly) become
three ``struct timespec`` members, ``st_atim``, ``st_mtim`` and
``st_ctim``. There's lots of good things here and one bad one.
.. sidebox::
The latter has potential typedef issues but appears to be a
``long`` on most systems. I guess any 16-bit systems might have an
issue.
A ``struct timespec`` has both a ``time_t`` member, ``tv_sec``, now,
of course, 64 bits wide and a nano-second-capable member, ``tv_nsec``.
The positives, then, are that we have our billions of years in a
``time_t`` *and* we have nano-second granularity.
The downside is that there is no (longer) a reference to the
``time_t``, the notional ``st_atime`` etc., in the API.
Most systems:
.. code-block:: c
#define st_atime st_atim.tv_sec
but as we know, no traces of the :lname:`C` pre-processor are left in
the object file.
That means that :program:`idio-c-api-gen` cannot generate any such
references.
Of course, that's easy enough to fix when we're patching up the
generated code for other reasons. We can manually write the code to
declare and use the extra symbols, ``st_atime`` etc., and add extra
clauses in the ``struct`` accessor primitives to handle the extra
symbols and, of course, we can correctly return the value with the
constructor ``idio_libc_time_t``.
Oddities
^^^^^^^^
Mac OS X chooses to be different and uses ``st_atimespec`` etc. as the
member names.
OpenIndiana uses a ``timespec_t`` which ought to cause us a problem as
we don't *use* a ``timespec_t`` in the source file so no ``typedef``
mapping is created. Luckily, it has typedef'd ``struct timespec`` as
``timespec_t`` and so the nominal :lname:`C` API code which access
``statp->st_atim`` etc. just works.
Many systems don't typedef a ``suseconds_t`` for a ``struct timeval``
(returned by :manpage:`gettimeofday(2)` and :manpage:`getrusage(2)`)
even though they seem to get most of the way with :lname:`C` macros
for ``__suseconds_t_defined`` or ``_SUSECONDS_T_DECLARED``.
Evolution
=========
In the first instance, `muggins
`_, here, wrote the
``libc`` interfaces by hand in :file:`src/libc-wrap.c` -- the initial
prompt to look to automate the process as I was getting fed up trying
to figure out that a ``pid_t`` was on my collection of test systems.
I can then run :program:`idio-c-api-gen` for ``libc`` and take a copy
of the resultant :file:`gen/libc-api.c` and refashion it to replace
the interfaces in :file:`src/libc-wrap.c`.
Refashioning for me consisted largely of replacing the likes of
``__pid_t`` with ``pid_t`` and query-replacing the interface argument
names.
Interfaces to the likes of :manpage:`stat(2)` require more involvement
as the user wouldn't be supplying a ``struct stat`` and we want to
return the (suitably tagged) ``struct stat`` back to the user rather
than the ``int`` that :manpage:`stat(2)` returns.
Interfaces to the likes of :manpage:`getcwd(3)` require different
error tests and something like :manpage:`mkstemp(3)` requires even
more fiddling to return the open file descriptor and the name of the
file (from the modified template passed in).
Thus :file:`src/libc-api.c` requires the definitions in
:file:`src/libc-api.h` to build and, once I'd rejigged all the callers
-- think all the ``libc`` interfaces in :file:`lib/job-control.idio`
-- :program:`idio` requires the definitions in
:file:`lib/libc-api.idio` to run.
So, :file:`src/libc-api.*` are great for me on this box. But, whilst
:file:`src/libc-api.c` has been manually tweaked to use the nominal
:lname:`C` API, :file:`src/libc-api.h` and :file:`lib/libc-api.idio`
are full of system-specific definitions.
I can't check those into source control as they're simply wrong for
anyone else.
Build Bootstrap II
------------------
Well, they're wrong but not *too* wrong which let's us play a trick.
Let's put a copy of whatever I've generated here, on my dev system, in
a :file:`src/build-bootstrap` directory which *all other systems* will
use to get going.
We can say that :program:`bin/idio` depends on a locally created
:file:`src/libc-api.h` and :file:`lib/libc-api.idio` and have a
specific rule to create those.
That specific rule can change the include paths for both the
:lname:`C` compiler and :program:`bin/idio` such that it uses the
:file:`src/build-bootstrap` directories just for long enough to run
:program:`idio-c-api-gen`.
Compiling this bootstrap version is likely to generate some warnings
about overflow and implicit constant conversions and others. We care
deeply about this and... *Look! A squirrel!*
Having run :program:`idio-c-api-gen` on this system we will have
generated correct typedef mappings in :file:`src/libc-api.h` and
:file:`lib/libc-api.idio` and :program:`make` should convince itself
to rebuild :program:`idio` because a header file has changed
(technically, appeared).
:file:`src/libc-api.c` was refashioned to use the nominal :lname:`C`
API so requires no adjustment on any other system.
.. include:: ../../commit.rst