C API

Rationale

Idio is written in C and therefore uses “the” C API. So what’s the problem?

The problem comes in the form of portability. Let’s say I want to call and use the result of getpid(2). Easy enough, it’s:

pid_t pid = getpid ();

OK. Now I need to store this pid_t to be able to pass it around as an IDIO value. And a pid_t is a what, exactly, on your system? I have access to systems where it is either an int or a long (which doesn’t tell me if an int differs from a long anyway). So, er, which one?

Another slightly more famous example is a time_t. It was (and still will be in some cases) a 32-bit entity resulting in the impending Epochalypse in January 2038. Except every modern Linux operating system doesn’t have that problem as Arnd Bergmann and John Stultz plodded though and changed the interfaces to 64-bit (kicking the Epochalyse can down the road for a few hundred billion years…).

And, another bugbear, how do I print a time_t out without upsetting the compiler on some system or another?

Underlying this is that the C API uses an opaque typedef for some base C type. For us to be able to store and transport it we could do with knowing what that type is.

In the first instance, we might just use intmax_t and uintmax_t (whatever they are) to store all C integral types and let the C compiler figure out the casting. But that doesn’t seem like Art. Not least because we still don’t know if a pid_t or a time_t is signed or unsigned – OK, we can take a look/reasonable guess and hope that everyone else uses the same.

Instead, wouldn’t it be useful if we could write code that looked like:

pid_t pid = getpid ();
IDIO ipid = idio_C_pid_t (pid);

where the idio_C_pid_t constructor is aware of whatever underlying base type a pid_t is on this system and stashes it as an int or long as appropriate.

Now that the value is an IDIO value we can pass it merrily around to anyone else. The only people who will actually care whether it is an int or a long are those manipulating it and they would want a deconstructor, IDIO_C_TYPE_pid_t, that is equally aware of the stash leaving us to write:

void idio_func (IDIO v)
{
    IDIO_USER_C_TYPE_ASSERT (pid_t, v);

    pid_t pid = IDIO_C_TYPE_pid_t (v);
    ...
}

where the type assertion knows the mapping from pid_t to int or long as well.

Conversions

Hmm, wait a minute, what if we in Idio-land want to create a pid_t? Suppose we’ve read in the output of ps and decide we need to kill(2) some egregious process? We’ll have started with some string which we can convert into an Idio number with read-number but we, probably, don’t even know if that number is even a fixnum or a bignum. How are we going to create a pid_t?

We do have C/integer-> to help us which can be given a clue as to which C type to create.

So, we need to pass it some clue as to which of the fourteen C base types we want. Obviously, we’re going to pass it pid_t – because that’s all we want to know about this value – which needs to map to some symbol (the usual Scheme-ish way) which eventually resolves to either 'int or 'long dependent on system.

Idio> C/integer-> 23 libc/pid_t
23

In practice, pid_t is really libc/pid_t, ie. to accommodate different libraries’ potential name clashes we’ll prefix the typedef name with the module the name comes from.

Those names are exported from libc but so are many many other names with a huge potential for clashes:

import libc

can be interesting.

Also notice that the printed representation of a C pid_t is, perhaps unsurprisingly, indistinguishable from the fixnum we constructed it from! I’m not sure there’s a useful solution to this.

There is the similar C/->integer which will return an Idio fixnum or bignum (depending on the size of the C value). On a 64-bit machine:

Idio> fixnum? (C/->integer libc/INT_MAX)
#t
Idio> fixnum? (C/->integer libc/INTMAX_MAX)
#f

On a 32-bit system, libc/INT_MAX is also a bignum.

Predicates

In the same manner any libc-oriented code will want to be able to test that some Idio value is a libc/pid_t. That predicate should, by rights, be libc/pid_t?.

Given that pid_t is a typedef to int (or long or whatever) then we’re going to need a mapping from libc/pid_t? through to the C domain’s C/int? or C/long? as appropriate.

Note that the change, arguably, chain, from a C library (ie. Idio module) into the C domain’s (fourteen) base types might well involve some multi-library hops depending on how the typedefs roll.

Caveats

The main caveat is that if your interface uses #define then this mechanism won’t help you. #defines are part of the C pre-processor and so those definitions will have disappeared by the time the C compiler has its way and we get to look at the generated code.

Another issue is that the set of values that some parameters take might be system-dependent. Think of the resource parameter to getrlimit(2), the likes of RLIMIT_NOFILE, RLIMIT_NPROC etc.. On the one hand that ought not be an issue except that it presumes we do, somehow, know what values have been defined on this system. Some Linux systems have migrated these values into an enumerated type, which we can see, and maintained the C macros for backwards compatibility. Other systems are #define only.

For these to be portable you will have to resort to using something along the lines of the error, signal and rlimit value tests:

src/libc-wrap.c
#if defined (EBADF)
    IDIO_LIBC_ERRNO (EBADF);
#endif

and empirical knowledge of the ported systems.

Namespaces

I started out distinguishing between the base C types in the existing C namespace and those new typedefs being introduced from libc in the libc namespace before thinking that C doesn’t have any namespaces, what am I doing?

However, C++ does have namespaces so I guess it would be prudent to maintain this artiface.

This will lead to the mildly confusing idio_libc_pid_t constructor and some idio_libfoo_pid_t constructor from some libfoo API which also happens to use a pid_t in its interface.

So, to correct the earlier names, the proper convention is that for true C domain types (int, long etc.) we’ll use:

IDIO_USER_C_TYPE_ASSERT ({base-type}, v);

{base-type} C_v = IDIO_C_TYPE_{base-type} (v);

IDIO r = idio_C_{base-type} (C_v);

and for some libfoo API using a typedef’d symbol we’ll have:

IDIO_USER_{libfoo}_TYPE_ASSERT ({type-def}, v);

{type-def} C_v = IDIO_{libfoo}_{type-def} (v);

IDIO r = idio_{libfoo}_{type-def} (C_v);

therefore, for the pid_t in libc example, we’ll have:

IDIO_USER_libc_TYPE_ASSERT (pid_t, v);

pid_t C_v = IDIO_libc_pid_t (v);

IDIO r = idio_libc_pid_t (C_v);

Build Bootstrap I

We’ll come back to this but Idio uses the C API for libc to run and yet we’re somehow using Idio to build that libc interface. How does that work?

Well, we cheat, of course. In the first instance there are some libc API files that are at least consistent, if not necessarily correct for this system, these are in build-bootstrap subdirectories.

That is enough to allow us to run idio-c-api-gen to build correct libc API files for this system. The presence of which prompts make to rebuild idio.

DWARF

DWARF is our friend, here. One observation is that the debugger seems to know a lot about the types in our system and if we investigate the output of a “debug” object file it turns out to contain a lot of information about the types in the object file.

But, as noted, nothing from the C pre-processor.

objdump seems to exist on all the systems I use which produces a wealth of information in text format. Given we’ll have to parse something, text format seems a lot easier for us.

DIEs

The output is in the form of cascades of DIEs. These look something like:

<1><50>: Abbrev Number: 2 (DW_TAG_base_type)
   <51>   DW_AT_byte_size   : 2
   <52>   DW_AT_encoding    : 5        (signed)
   <53>   DW_AT_name        : (indirect string, offset: 0x9c4): short int
<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
   <58>   DW_AT_byte_size   : 4
   <59>   DW_AT_encoding    : 5        (signed)
   <5a>   DW_AT_name        : int

The output varies with:

  • the (historic) version of objdump changes the exact format

  • the set of entries is dependent on the types used in the object file (duh!)

  • the order of output varies on the contents of the object file

However, reading between the lines we can see there appear to be two base_types defined in that snippet:

  • a short int with a byte_size of 2 is type <50>

  • an int with a byte_size of 4 is type <57>

The sizes seem about right for this system for a start!

The encoding attribute is interesting as it is a clue as to how to interpret those 2 or 4 bytes. signed, signed char, unsigned and float are examples. I don’t think in C there’s really very much going on, here, but I get the impression that the encoding allows for much more like the UTF-8/UCS-2-style of encoding.

Note

A DIE without a type uses the implied type of void.

pid_t

Let’s try a pid_t:

src/libc-api.c
int main (int argc, char ** argv)
{
    pid_t pid = getpid ();

    return 0;
}

which compiling, gcc -g -c -o libc-api.o libc-api.c gives us (amongst other things):

<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
   <58>   DW_AT_byte_size   : 4
   <59>   DW_AT_encoding    : 5        (signed)
   <5a>   DW_AT_name        : int

<1><e9>: Abbrev Number: 4 (DW_TAG_typedef)
   <ea>   DW_AT_name        : (indirect string, offset: 0x7f1): __pid_t
   <ee>   DW_AT_decl_file   : 2
   <ef>   DW_AT_decl_line   : 154
   <f0>   DW_AT_decl_column : 25
   <f1>   DW_AT_type        : <0x57>

<1><1ea>: Abbrev Number: 4 (DW_TAG_typedef)
   <1eb>   DW_AT_name        : (indirect string, offset: 0x727): pid_t
   <1ef>   DW_AT_decl_file   : 3
   <1f0>   DW_AT_decl_line   : 97
   <1f1>   DW_AT_decl_column : 17
   <1f2>   DW_AT_type        : <0xe9>

The third DIE shown, <1ea> at the bottom, is a typedef for pid_t and has some clues about a file and that its type is <0xe9>.

The middle DIE shown, is <e9> which is itself a typedef for __pid_t from a different file (you would guess) and in turn has its type as <0x57>.

The first DIE shown is <57> and is a base_type, an int.

Sweet!

As noted, another system typedefs pid_t directly to a long int base type.

That raises another spectre. Where has this __pid_t type come from and is it portable?

Well, the answer is that it is obviously some local definition and is clearly not portable so we need to be more careful. Not only are the ultimate base types of the typedefs different but the mappings through to them are different too.

That might come back to bite us, it might not.

*

In the meanwhile we can see a means to automatically generate many of the interfaces we’ve been talking about, most of which are simply redefinitions of existing base types.

In the C world, you can imagine creating:

src/libc-api.h
#define IDIO_TYPE_C_libc___pid_t            IDIO_TYPE_C_INT
#define idio_libc___pid_t                   idio_C_int
#define IDIO_C_TYPE_libc___pid_t            IDIO_C_TYPE_int
#define idio_isa_libc___pid_t               idio_isa_C_int

#define IDIO_TYPE_C_libc_pid_t              IDIO_TYPE_C_libc___pid_t
#define idio_libc_pid_t                     idio_libc___pid_t
#define IDIO_C_TYPE_libc_pid_t              IDIO_C_TYPE_libc___pid_t
#define idio_isa_libc_pid_t                 idio_isa_libc___pid_t

Here,

  • IDIO_TYPE_C_module_name is a mapping from the C constructed name to an Idio src/gc.h compatible macro for the C base types

    This allows us to figure out the correct printf(3) format string for a given type.

  • idio_module_name is the constructor

  • IDIO_C_TYPE_module_name is the destructor

    An unfortunate naming near-miss with IDIO_TYPE_C_module_name.

  • idio_isa_module_name is a predicate

The IDIO_USER_module_TYPE_ASSERT(type,x) is a generic macro which ultimately is going to call idio_isa_module_type (x) which we’ve just defined.

and for the Idio world:

lib/libc-api.idio
export (
        __pid_t
        __pid_t?

        pid_t
        pid_t?
        )

__pid_t                            := 'int
define __pid_t?                       C/int?

pid_t                              := __pid_t
define pid_t?                         libc/__pid_t?

Meaning that some Idio code can:

pid := ...
c-pid := C/integer-> pid libc/pid_t

and everything lines up!

Structures and Unions

Structures and unions are all “lifted” up to the top level, even if they are only used inside another structure or union.

That said, they are very simple being the structure_type itself (with or without a name) and a sequence of members each of which has a type in turn.

Hmm, now we know about the fields in a structure, we ought to be able to generate some code to be able to access the fields.

From Idio you can imagine wanting to access a stat(2) struct stat along the lines of:

sb := libc/stat "."
printf "size=%s\n" sb.st_size

which is utilizing the value-index operator . to access the st_size member using a symbol, st_size.

What do we need for that? Well, we should define the symbol, for a start, then we need a function that can poke about in the struct stat, let’s call it struct-stat-ref:

IDIO_SYMBOL_DECL (st_dev);
IDIO_SYMBOL_DECL (st_ino);
...
IDIO_SYMBOL_DECL (st_size);
...

IDIO_DEFINE_PRIMITIVE2_DS ("struct-stat-ref", libc_struct_stat_ref, (IDIO stat, IDIO member), "stat member", "\
in C, stat->member                      \n\
                                        \n\
:param stat: C struct stat      \n\
:type stat: C/pointer           \n\
:param member: C struct member          \n\
:type member: symbol                    \n\
:return: stat->member           \n\
:rtype: varies on member                \n\
")
{
    IDIO_ASSERT (stat);
    IDIO_ASSERT (member);

    /*
     * Test Case: libc-errors/struct-stat-ref-bad-pointer-type.idio
     *
     * struct-stat-ref #t #t
     */
    IDIO_USER_C_TYPE_ASSERT (pointer, stat);
    /*
     * Test Case: libc-errors/struct-stat-ref-bad-member-type.idio
     *
     * struct-stat-ref v #t
     */
    IDIO_USER_TYPE_ASSERT (symbol, member);

    ... can we check that stat is a pointer to a struct stat?

    struct stat *statp = IDIO_C_TYPE_POINTER_P (stat);
    if (idio_S_st_dev == member) {
        return idio_libc_dev_t (statp->st_dev);
    } else if (idio_S_st_ino == member) {
    ...
    } else if (idio_S_st_size == member) {
        return idio_libc_off_t (statp->st_size);
    } else
    ...
}

We should be able to generate a similar struct-stat-set! although whether such a function is warranted is another question. a priori it’s a valid operation.

For the answer to testing whether stat really is a struct stat, see Subprograms, below.

Separately, though related, there is the question, about how the system might know that it should use struct-stat-ref, in particular, when de-structuring some (random) C pointer, rather than any other primitive (or function). More on that below.

Pointers

Pointers are flagged en route to the real type underneath. Here’s the case for the formal_parameter argv:

<1><178>: Abbrev Number: 9 (DW_TAG_pointer_type)
   <179>   DW_AT_byte_size   : 8
   <17a>   DW_AT_type        : <0x17e>
<1><17e>: Abbrev Number: 2 (DW_TAG_base_type)
   <17f>   DW_AT_byte_size   : 1
   <180>   DW_AT_encoding    : 6       (signed char)
   <181>   DW_AT_name        : (indirect string, offset: 0x351): char
<1><2fa>: Abbrev Number: 9 (DW_TAG_pointer_type)
   <2fb>   DW_AT_byte_size   : 8
   <2fc>   DW_AT_type        : <0x178>
<2><115d>: Abbrev Number: 36 (DW_TAG_formal_parameter)
   <115e>   DW_AT_name        : (indirect string, offset: 0x57b): argv
   <1162>   DW_AT_decl_file   : 1
   <1163>   DW_AT_decl_line   : 55
   <1164>   DW_AT_decl_column : 28
   <1165>   DW_AT_type        : <0x2fa>
   <1169>   DW_AT_location    : 4 byte block: 91 f0 b9 7f      (DW_OP_fbreg: -8976)

Which you should be able to see both that argv is a pointer to a pointer to a char, ie. char **argv and that DIEs are not necessarily printed in an obvious order.

Arrays

Array types are a bit more subtle. They are described distinctly from pointers, as they should be, though we might use them indiscriminately from pointers.

In this example, all of the member of the struct utsname are char [] rather than char *:

<1><c80>: Abbrev Number: 28 (DW_TAG_structure_type)
   <c81>   DW_AT_name        : (indirect string, offset: 0x1e8): utsname
   <c85>   DW_AT_byte_size   : 390
   <c87>   DW_AT_decl_file   : 23
   <c88>   DW_AT_decl_line   : 48
   <c89>   DW_AT_decl_column : 8
   <c8a>   DW_AT_sibling     : <0xcdf>
<2><c8e>: Abbrev Number: 12 (DW_TAG_member)
   <c8f>   DW_AT_name        : (indirect string, offset: 0x5e1): sysname
   <c93>   DW_AT_decl_file   : 23
   <c94>   DW_AT_decl_line   : 51
   <c95>   DW_AT_decl_column : 10
   <c96>   DW_AT_type        : <0xcdf>
   <c9a>   DW_AT_data_member_location: 0
<2><c9b>: Abbrev Number: 12 (DW_TAG_member)
   <c9c>   DW_AT_name        : (indirect string, offset: 0x459): nodename
   <ca0>   DW_AT_decl_file   : 23
   <ca1>   DW_AT_decl_line   : 54
   <ca2>   DW_AT_decl_column : 10
   <ca3>   DW_AT_type        : <0xcdf>
   <ca7>   DW_AT_data_member_location: 65

<1><cdf>: Abbrev Number: 5 (DW_TAG_array_type)
   <ce0>   DW_AT_type        : <0x17e>
   <ce4>   DW_AT_sibling     : <0xcef>

<17e> was the char in the pointer example, above.

Notice also, part of the ABI (as opposed to the API) is exposed in that the data_member_location for nodename is 65 bytes along suggesting that sysname is defined as char sysname[65] on this system.

uname(2) on this system notes:

The length of the fields in the struct varies. Some operating systems or libraries use a hardcoded 9 or 33 or 65 or 257. Other systems use SYS_NMLN or _SYS_NMLN or UTSLEN or _UTSNAME_LENGTH. Clearly, it is a bad idea to use any of these constants; just use sizeof(…). Often 257 is chosen in order to have room for an internet hostname.

Enumerated Types

Enumerated types have a type and a number of enumerators each of which has a const_value.

The wording around the type for an enumerated type is a bit loose and merely suggests something big enough to hold the entire set of enumerators.

I can see both int and unsigned int enumerated types.

Subprograms

I’m not sure what subprogram is meant to describe but you will usually get to see main or whatever the main enclosing function is.

The information provided includes the formal parameters and any variables used in the function (possibly including any lexical blocks they exist in).

However, rather usefully, on some systems it will also have a subprogram entry for each function it calls.

This is strikingly useful as it only requires one system, using portable definitions, to produce this information from which we can sketch out the framework for a series of IDIO_DEFINE_PRIMITIVE functions describing those calls. Once defined they are, by definition(?), using the portable C API and are therefore applicable to all systems.

Due to the absence of debugging information in, in my case, libc, there’s no formal parameter names for the system and library calls I’m referencing but we do get the formal parameter types and the return type. We can obviously invent some arguments names, arg1, arg2, etc..

For kill(2), whose prototype looks like:

int kill(pid_t pid, int sig);

this becomes:

IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO arg1, IDIO arg2), "arg1 arg2", "\
in C: kill (arg1, arg2)         \n\
a wrapper to libc kill()                \n\
                                        \n\
:param arg1:                            \n\
:type arg1: libc/__pid_t                        \n\
:param arg2:                            \n\
:type arg2: C/int                       \n\
:return:
:rtype: C/int
")
{
    IDIO_ASSERT (arg1);
    IDIO_ASSERT (arg2);

   /*
    * Test Case: libc-errors/kill-bad-arg1-type.idio
    *
    * kill #t #t
    */
    IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);
    __pid_t C_arg1 = IDIO_C_TYPE_libc___pid_t (arg1);

   /*
    * Test Case: libc-errors/kill-bad-arg2-type.idio
    *
    * kill #t #t
    */
    IDIO_USER_C_TYPE_ASSERT (int, arg2);
    int C_arg2 = IDIO_C_TYPE_int (arg2);

    int kill_r = kill (C_arg1, C_arg2);

    /* check for errors */
    if (-1 == kill_r) {
        idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());

        return idio_S_notreached;
    }


    /*
     * WARNING: this is probably an incorrect return
     */
    return idio_C_int (kill_r);

}

Now, that’s not too shabby for something automatically generated.

*

If we do install the libc debugging symbols we still don’t get formal_parameter names – well, for the subprograms we’re interested in, anyway.

Installing the debugging symbols isn’t quite as obvious as you might think. In the case of Fedora these are in separate packages in disabled repos but on the plus side tools like gdb know to go looking for them and programs like objdump can be persuaded with an extra argument (K in this case).

$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install glibc-debuginfo
$ objdump -WilK /lib64/libc-2.33.so
<1><3452f>: Abbrev Number: 52 (DW_TAG_subprogram)
   <34530>   DW_AT_external    : 1
   <34530>   DW_AT_name        : (indirect string, offset: 0x108c0): kill
   <34534>   DW_AT_decl_file   : 93
   <34535>   DW_AT_decl_line   : 112
   <34536>   DW_AT_decl_column : 12
   <34537>   DW_AT_prototyped  : 1
   <34537>   DW_AT_type        : <0x2a>
   <3453b>   DW_AT_declaration : 1
   <3453b>   DW_AT_sibling     : <0x34547>
<2><3453c>: Abbrev Number: 24 (DW_TAG_formal_parameter)
   <3453d>   DW_AT_type        : <0x2341>
<2><34541>: Abbrev Number: 24 (DW_TAG_formal_parameter)
   <34542>   DW_AT_type        : <0x2a>
<2><34546>: Abbrev Number: 0

Some subprograms do have named formal parameters in libc which makes me think the construction of things might be considerably more interesting than at first blush.

Wait a minute, isn’t there some __kill nonsense floating about with system calls? Hmm. The __kill subprogram also declines to offer us any formal parameter names. So let’s dig deeper:

$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install kernel-debuginfo-$(uname -r)
$ cd /usr/lib/debug/lib/modules/$(uname -r)
$ nm vmlinux | grep '[tT] kill_'
...
ffffffff810ecb50 T kill_pgrp
ffffffff810ecc20 T kill_pid
ffffffff810ecb90 T kill_pid_info
ffffffff810ea580 T kill_pid_usb_asyncio
ffffffff8131ecc0 t kill_procs
...

I don’t think we’re in Kansas any more, Toto!

$ objdump -WilK vmlinux  | less +/kill_pid
<1><10e9d79>: Abbrev Number: 53 (DW_TAG_subprogram)
   <10e9d7a>   DW_AT_external    : 1
   <10e9d7a>   DW_AT_name        : (indirect string, offset: 0x179603): kill_pid
   <10e9d7e>   DW_AT_decl_file   : 2
   <10e9d7f>   DW_AT_decl_line   : 1793
   <10e9d81>   DW_AT_decl_column : 5
   <10e9d82>   DW_AT_prototyped  : 1
   <10e9d82>   DW_AT_type        : <0x10bcbd0>
   <10e9d86>   DW_AT_low_pc      : 0xffffffff810ecc20
   <10e9d8e>   DW_AT_high_pc     : 0x1a
   <10e9d96>   DW_AT_frame_base  : 1 byte block: 9c    (DW_OP_call_frame_cfa)
   <10e9d98>   DW_AT_GNU_all_call_sites: 1
   <10e9d98>   DW_AT_sibling     : <0x10e9e07>
<2><10e9d9c>: Abbrev Number: 51 (DW_TAG_formal_parameter)
   <10e9d9d>   DW_AT_name        : pid
   <10e9da1>   DW_AT_decl_file   : 2
   <10e9da2>   DW_AT_decl_line   : 1793
   <10e9da4>   DW_AT_decl_column : 26
   <10e9da5>   DW_AT_type        : <0x10c3638>
   <10e9da9>   DW_AT_location    : 0x3a232c (location list)
   <10e9dad>   DW_AT_GNU_locviews: 0x3a2326
<2><10e9db1>: Abbrev Number: 51 (DW_TAG_formal_parameter)
   <10e9db2>   DW_AT_name        : sig
   <10e9db6>   DW_AT_decl_file   : 2
   <10e9db7>   DW_AT_decl_line   : 1793
   <10e9db9>   DW_AT_decl_column : 35
   <10e9dba>   DW_AT_type        : <0x10bcbd0>
   <10e9dbe>   DW_AT_location    : 0x3a237e (location list)
   <10e9dc2>   DW_AT_GNU_locviews: 0x3a2378
<2><10e9dc6>: Abbrev Number: 28 (DW_TAG_formal_parameter)
   <10e9dc7>   DW_AT_name        : (indirect string, offset: 0x3b5017): priv
   <10e9dcb>   DW_AT_decl_file   : 2
   <10e9dcc>   DW_AT_decl_line   : 1793
   <10e9dce>   DW_AT_decl_column : 44
   <10e9dcf>   DW_AT_type        : <0x10bcbd0>
   <10e9dd3>   DW_AT_location    : 0x3a23ce (location list)
   <10e9dd7>   DW_AT_GNU_locviews: 0x3a23ca
<2><10e9ddb>: Abbrev Number: 90 (DW_TAG_GNU_call_site)
   ...
<3><10e9e05>: Abbrev Number: 0
<2><10e9e06>: Abbrev Number: 0

Hmm, three parameters. I think we’ve gone too deep.

So, query-replace argn seems fine to me. We almost certainly need to tweak the code anyway.

*

We have a portability issue as Fedora has defined the API with __pid_t but if we query-replaced the double-underscore we’re in a much better position.

We can obviously query-replace arg1 for pid and arg2 for sig leaving us with:

IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO pid, IDIO sig), "pid sig", "\
in C: kill (pid, sig)         \n\
a wrapper to libc kill()                \n\
                                        \n\
:param pid:                            \n\
:type pid: libc/pid_t                        \n\
:param sig:                            \n\
:type sig: C/int                       \n\
:return:
:rtype: C/int
")
{
    IDIO_ASSERT (pid);
    IDIO_ASSERT (sig);

   /*
    * Test Case: libc-errors/kill-bad-pid-type.idio
    *
    * kill #t #t
    */
    IDIO_USER_libc_TYPE_ASSERT (pid_t, pid);
    pid_t C_pid = IDIO_C_TYPE_libc_pid_t (pid);

   /*
    * Test Case: libc-errors/kill-bad-sig-type.idio
    *
    * kill #t #t
    */
    IDIO_USER_C_TYPE_ASSERT (int, sig);
    int C_sig = IDIO_C_TYPE_int (sig);

    int kill_r = kill (C_pid, C_sig);

    /* check for errors */
    if (-1 == kill_r) {
        idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());

        return idio_S_notreached;
    }


    /*
     * WARNING: this is probably an incorrect return
     */
    return idio_C_int (kill_r);

}

and we may just have a working, portable, interface to the libc API!

Re-imagining APIs

Now, this almost certainly won’t work for you off the bat as I’ve clearly directed the automatic code generation to handle the most common form of error that system and library calls produce and assumed you can directly return the value from the API call.

For calls like getcwd(3) the value returned is a char * and for error checking we should be comparing to NULL.

For something like times(3) which is expecting a struct tms to be supplied (unlikely from Idio-land!) but also the returned value is a clock_t. Here, whilst the “check for errors” is nominally correct we should be retaining the value and we will end up returning a list of the clock_t and the struct tms back to the user.

Similarly, a direct copy of the C API is not usefully correct for something like stat(2) where the user is in no position to create a struct stat to pass as an argument. In this case we would only accept a pathname argument and allocate a struct stat to be freed later.

This leads to the idea that the automatic code generation will give us a starter for ten which we can edit into permanence.

Auto-Application of Methods

Now, about that struct stat we’ve just allocated. We’ve gone to the trouble of creating struct-stat-ref (and the moot struct-stat-set!) to manipulate it. How might we get that to happen auto-magically?

Well, suppose we associate with the C_pointer in the IDIO value we’re about to return some useful type information. What’s useful? Hmm, how about a couple of things: a string, "libc/struct-stat" (useful for reporting) and a reference to struct-stat-ref?

In the latter case, we don’t need to also add struct-stat-set! as we can use our trusty Setters mechanism to do the right thing.

We want something that is unique for each type and a pair always falls into that category – even if the contents are the same the actual pair itself is unique in memory.

So, at the time we define our structure, let’s throw out a “C Structure Identification”:

IDIO_C_STRUCT_IDENT_DECL (libc_struct_stat);
IDIO_SYMBOL_DECL (st_dev);
...

and then a bit later on, when we’re adding primitives etc. we can expand that into a list:

IDIO fgvi = IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_ref);
IDIO_C_STRUCT_IDENT_DEF ("libc/struct-stat", libc_struct_stat, fgvi);
IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_set);

Here, we take advantage of the fact that IDIO_EXPORT_MODULE_PRIMITIVE always returned the actual primitive reference (the value associated with the symbol struct-stat-ref) – except we normally throw it away.

We can stick that reference in a list through the C macro IDIO_C_STRUCT_IDENT_DEF.

In practice, it defines a name in C-land, idio_CSI_libc_struct_stat whose value is the list ("struct stat" struct-stat-ref).

So, we have something unique per structure that we can associate with a pointer and the stat primitive can return:

return idio_C_pointer_type (idio_CSI_libc_struct_stat, statp);

Now we can revisit struct-stat-ref and perform the check:

IDIO_USER_C_TYPE_ASSERT (pointer, stat);
if (idio_CSI_libc_struct_stat != IDIO_C_TYPE_POINTER_PTYPE (stat)) {
    idio_error_param_value ("stat", "should be a libc/struct-stat", IDIO_C_FUNC_LOCATION ());

    return idio_S_notreached;
}

and we can extend the code for value-index to have a nosey in any C pointers that come its way:

src/util.c
         ...
         case IDIO_TYPE_C_POINTER:
             {
                 IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
                 if (idio_S_nil != t) {
                     IDIO cmd = IDIO_LIST3 (IDIO_PAIR_HT (t), o, i);

                     IDIO r = idio_vm_invoke_C (idio_thread_current_thread (), cmd);

                     return r;
                 }
             }
             break;
         ...

We need to do a couple of changes for the -set! side of things to work.

In Idio-land we can check that both -ref and -set! primitives exist – after all, someone might deem modifying a struct foo a poor move and have removed struct-stat-set! – but the auto-generated lib/libc-api.idio doesn’t know that:

lib/libc-api.idio
if (and (function? struct-stat-ref)
        (function? struct-stat-set!)) {
  set! (setter struct-stat-ref) struct-stat-set!
}

and the code for set-value-index! can be updated for C pointers as well:

src/util.c
         ...
         case IDIO_TYPE_C_POINTER:
             {
                 IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
                 if (idio_S_nil != t) {
                     /*
                      * We want: (setter ref) o i v
                      *
                      * but we have to invoke by stage:
                      */
                     IDIO setter_cmd = IDIO_LIST2 (idio_module_symbol_value (idio_S_setter,
                                                                             idio_Idio_module,
                                                                             idio_S_nil),
                                                   IDIO_PAIR_HT (t));

                     IDIO setter_r = idio_vm_invoke_C (idio_thread_current_thread (), setter_cmd);

                     if (! idio_isa_function (setter_r)) {
                         idio_debug ("(setter %s) did not yield a function\n", IDIO_PAIR_HT (t));
                         break;
                     }

                     IDIO set_cmd = IDIO_LIST4 (setter_r, o, i, v);

                     IDIO set_r = idio_vm_invoke_C (idio_thread_current_thread (), set_cmd);

                     return set_r;
                 }
             }
             break;
         ...

Printing

Of course, with your bespoke structure, you might want a bespoke printer. There’s a mechanism there too.

The add-as-string system for adding bespoke printing for Idio structures has been extended to support C structures with idio_CSI_ support. The printer is associated with the idio_CSI_ value such that all C pointers to the same kind of struct use the same printer.

The generated printers have two parts, a C part _as_string() which creates the results via idio_display_C() and idio_display() and an output string handle and an Idio part which calls the C part.

The normal structure printing is to:

  1. add #<CSI structure-name

  2. for each structure member:

    1. add member-name:

    2. add the printed form of the structure member

      This requires a helper function, idio_C_type_format_string() which can map, say, libc/pid_t into the appropriate printf(3) format string (probably, %ld or %d)

  3. add >

For some structure types there is a natural printed format. For example, a struct timeval has seconds and micro-seconds parts and is commonly displayed in a seconds.micro-seconds form.

We know that the tv_usec member of a struct timeval represent micro-seconds and can only be six decimal digits even though its type, suseconds_t, is probably a long. In fact, it must be displayed as six leading-0-padded digits otherwise it makes no sense. For example, 1s and 213us would be displayed as 1.000213 and not 1.213.

Further, if there is a precision pending, say, 3, then the precision is applied to the leading-0-padded string, not the literal tv_usec value, giving a result of 1.000.

Note that the resultant printed form does not include the structure name, it appears as just a floating point number. The value is a struct timeval, it’s just the printed form that looks like a floating point number. Compare that with the printed representation of the fixnum, 23, and the libc/pid_t, 23, we had before.

The auto printing of C structures comes in quite handy. For example, noting I accidentally called the external command stat first, we get to compare results:

Idio> stat "."
  File: .
  Size: 8192            Blocks: 24         IO Block: 4096   directory
Device: fd00h/64768d    Inode: 17910888    Links: 3
Access: (0775/drwxrwxr-x)  Uid: ( 1000/     idf)   Gid: ( 1000/     idf)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2021-07-05 12:24:42.072310969 +0100
Modify: 2021-07-05 12:24:43.574321757 +0100
Change: 2021-07-05 12:24:43.574321757 +0100
 Birth: 2021-05-11 11:19:01.916839841 +0100
#t
Idio> libc/stat "."
#<CSI libc/struct-stat
      st_dev:64768
      st_ino:17910888
      st_nlink:3
      st_mode:16893
      st_uid:1000
      st_gid:1000
      st_rdev:0
      st_size:8192
      st_blksize:4096
      st_blocks:24
      st_atim:1625484282.072310969
      st_mtim:1625484283.574321757
      st_ctim:1625484283.574321757>

(I’ve broken the CSI printout for viewing convenience – it is one long line!)

In this case, a struct timespec with a tv_nsec field for nano-seconds is seen for the timestamp fields. Notice the leading 0 for the access time entries.

And, not wanting to emphasise the point, those struct timespec printed representations use 19 significant digits which, if you recall the work on Bignums, is one too many for an accurate floating point value. Off to inexact school for you!

Files

idio-c-api-gen takes a nominal library name as an argument, say, libc. It then seeks out an .../ext/libc directory where .../ext is derived from possible directories called .../lib/idio in IDIOLIB.

It then looks for .../ext/libc/api/libc-api.c and compiles it into .../ext/libc/api/libc-api.o using the local Makefile.

It then runs objdump on that .o file and starts generating output in .../ext/libc/gen:

  • gen/libc-api.c (unfortunate name clash with the original source file)

    This C source file contains:

    • primitives for the C struct accessors and printer

    • primitives for any subprogram definitions – excluding main

      It will be incorrect!

      It is impossible to infer the correct handling of any errors. The sample code is for the most common form of system errors.

      It is also not possible to identify when, commonly, a C pointer in the API is meant to be allocated by the caller.

      That sort of C API is likely to be replaced by an Idio API where the pointer is not required as an argument but is allocated and supplied internally and subsequently returned to the user rather than the nominal return value from the API call.

      Think of the example of stat(2) where the Idio user cannot supply a struct stat and have an expectation that the value returned from a stat call is the struct stat (and not the int that stat(2) returns).

    • a putative idio_libc_api_add_primitives() function

    • a putative idio_init_libc_api() function which will contain:

      • #ifdef-wrappered definitions for any enumeration_types

      • the definitions for any struct members’ symbols

      • the corresponding idio_CSI_ definition for the struct itself

  • gen/libc-api.h

    This C header file contains:

    • a helpful description of the used base_types in a comment

    • a generic IDIO_USER_lib_TYPE_ASSERT() macro

    • a series of C macro expansions for each typedef:

      • a constructor

      • an accessor

      • a predicate

    • declarations for the struct definitions in gen/libc-api.c

  • gen/libc-api.idio

    This Idio source file contains:

    • exports of:

      • the typedef type mappings (for the C/integer-> function)

      • (commented out) the enumerated_type enumerations

    • the definitions of the entities just exported

    • the setup for some Setters

  • gen/test-libc-error.idio

    This Idio source file contains a reasonable attempt at a test suite based on the known type and value tests that can be automatically inferred from the structs and subprograms described.

    It should be able to alert when a test fails to generate the expected error but do not rely on this.

  • gen/libc-errors/*

    This directory contains putative instances of all the test cases described above.

    It will be incorrect!

    For example, for struct test cases, to be able to reach some test cases then an valid argument needs to be supplied.

    It is impossible to infer how to create such an argument and the sample code simply refers to v.

    For example, for stat-related tests which require a valid struct stat I’ve replaced v with (libc/stat ".") which, since we’ve been writing the above code, successfully returns a struct stat with appropriate idio_CSI_ definition.

Inconsistent Outputs

You will not get the same output from all systems at the very least because of the issues regarding typedef mappings as described previously.

There are further complications with struct definitions and subprogram API types.

In the case of a struct you may find that some systems define extra structure members over and above the nominal C API. These should have no effect – other than adding extra member name symbols that you have no reason to use. I have generally removed them from the code to reduce the members down to the portable set.

For both struct and subprogram definitions you may find that the actual C API uses some of these intermediate typedefs we’ve mentioned.

For example, my Fedora system seems to use __pid_t everywhere where the nominal C API thinks a pid_t should be used.

There are two problems here:

  1. the generated gen/libc-api.c will be using __pid_t, eg. IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);

    That’s not too traumatic to fix but you need to be aware of it if you choose a Fedora system as the source for your proto-permanent src/libc-api.c.

  2. unless you make an effort then the typedef for pid_t itself will be nowhere to be found

    This can be trivially solved by re-writing the original source file to say:

    pid_t pid = getpid ();
    

    thus forcing the typedef mapping to appear.

    Although, obviously, you won’t know that that is required until you discover that the expected pid_t is missing.

    In other words, the creation of .../ext/libc/gen/libc-api.c could take a couple of iterations around the loop.

Inconsistent API

In some cases the API has changed over time. Historically, the stat(2) API had three time_t members, st_atime, st_mtime and st_ctime.

Since the time_t work described above the API has (mostly) become three struct timespec members, st_atim, st_mtim and st_ctim. There’s lots of good things here and one bad one.

A struct timespec has both a time_t member, tv_sec, now, of course, 64 bits wide and a nano-second-capable member, tv_nsec.

The positives, then, are that we have our billions of years in a time_t and we have nano-second granularity.

The downside is that there is no (longer) a reference to the time_t, the notional st_atime etc., in the API.

Most systems:

#define st_atime st_atim.tv_sec

but as we know, no traces of the C pre-processor are left in the object file.

That means that idio-c-api-gen cannot generate any such references.

Of course, that’s easy enough to fix when we’re patching up the generated code for other reasons. We can manually write the code to declare and use the extra symbols, st_atime etc., and add extra clauses in the struct accessor primitives to handle the extra symbols and, of course, we can correctly return the value with the constructor idio_libc_time_t.

Oddities

Mac OS X chooses to be different and uses st_atimespec etc. as the member names.

OpenIndiana uses a timespec_t which ought to cause us a problem as we don’t use a timespec_t in the source file so no typedef mapping is created. Luckily, it has typedef’d struct timespec as timespec_t and so the nominal C API code which access statp->st_atim etc. just works.

Many systems don’t typedef a suseconds_t for a struct timeval (returned by gettimeofday(2) and getrusage(2)) even though they seem to get most of the way with C macros for __suseconds_t_defined or _SUSECONDS_T_DECLARED.

Evolution

In the first instance, muggins, here, wrote the libc interfaces by hand in src/libc-wrap.c – the initial prompt to look to automate the process as I was getting fed up trying to figure out that a pid_t was on my collection of test systems.

I can then run idio-c-api-gen for libc and take a copy of the resultant gen/libc-api.c and refashion it to replace the interfaces in src/libc-wrap.c.

Refashioning for me consisted largely of replacing the likes of __pid_t with pid_t and query-replacing the interface argument names.

Interfaces to the likes of stat(2) require more involvement as the user wouldn’t be supplying a struct stat and we want to return the (suitably tagged) struct stat back to the user rather than the int that stat(2) returns.

Interfaces to the likes of getcwd(3) require different error tests and something like mkstemp(3) requires even more fiddling to return the open file descriptor and the name of the file (from the modified template passed in).

Thus src/libc-api.c requires the definitions in src/libc-api.h to build and, once I’d rejigged all the callers – think all the libc interfaces in lib/job-control.idioidio requires the definitions in lib/libc-api.idio to run.

So, src/libc-api.* are great for me on this box. But, whilst src/libc-api.c has been manually tweaked to use the nominal C API, src/libc-api.h and lib/libc-api.idio are full of system-specific definitions.

I can’t check those into source control as they’re simply wrong for anyone else.

Build Bootstrap II

Well, they’re wrong but not too wrong which let’s us play a trick. Let’s put a copy of whatever I’ve generated here, on my dev system, in a src/build-bootstrap directory which all other systems will use to get going.

We can say that bin/idio depends on a locally created src/libc-api.h and lib/libc-api.idio and have a specific rule to create those.

That specific rule can change the include paths for both the C compiler and bin/idio such that it uses the src/build-bootstrap directories just for long enough to run idio-c-api-gen.

Compiling this bootstrap version is likely to generate some warnings about overflow and implicit constant conversions and others. We care deeply about this and… Look! A squirrel!

Having run idio-c-api-gen on this system we will have generated correct typedef mappings in src/libc-api.h and lib/libc-api.idio and make should convince itself to rebuild idio because a header file has changed (technically, appeared).

src/libc-api.c was refashioned to use the nominal C API so requires no adjustment on any other system.

Last built at 2024-09-07T06:11:17Z+0000 from 463152b (dev)