C API
Rationale
Idio is written in C and therefore uses “the” C API. So what’s the problem?
The problem comes in the form of portability. Let’s say I want to call and use the result of getpid(2). Easy enough, it’s:
pid_t pid = getpid ();
OK. Now I need to store this pid_t to be able to pass it around as an IDIO value. And a pid_t is a what, exactly, on your system? I have access to systems where it is either an int or a long (which doesn’t tell me if an int differs from a long anyway). So, er, which one?
Another slightly more famous example is a time_t. It was (and in some cases still will be) a 32-bit entity, resulting in the impending Epochalypse in January 2038. Except that no modern Linux system has that problem any more, as Arnd Bergmann and John Stultz plodded through and changed the interfaces to 64-bit (kicking the Epochalypse can down the road for a few hundred billion years…).
And, another bugbear, how do I print a time_t out without upsetting the compiler on some system or another?
Underlying this is that the C API uses an opaque typedef for some base C type. For us to be able to store and transport it we could do with knowing what that type is.
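C can at least tell us the size and signedness of an opaque typedef on the system it is compiled on. The sketch below assumes nothing beyond standard C and POSIX headers. The IS_SIGNED_TYPE macro and the helper functions are hypothetical names for illustration and are not part of Idio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/*
 * (T) -1 is negative only if T is a signed type: for an unsigned T
 * the conversion wraps round to T's maximum value.
 */
#define IS_SIGNED_TYPE(T) ((T) -1 < (T) 0)

/* hypothetical helpers describing two opaque typedefs on this system */
static size_t pid_t_size (void)       { return sizeof (pid_t); }
static bool   pid_t_is_signed (void)  { return IS_SIGNED_TYPE (pid_t); }
static size_t time_t_size (void)      { return sizeof (time_t); }
static bool   time_t_is_signed (void) { return IS_SIGNED_TYPE (time_t); }
```

What this cannot tell us is the *name* of the underlying type. On a system where int and long are the same size, a 4-byte signed pid_t could be either.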
In the first instance, we might just use intmax_t and uintmax_t (whatever they are) to store all C integral types and let the C compiler figure out the casting. But that doesn’t seem like Art. Not least because we still don’t know if a pid_t or a time_t is signed or unsigned – OK, we can take a look/reasonable guess and hope that everyone else uses the same.
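That widening is also the usual answer to the printing bugbear: cast to intmax_t and use printf(3)’s %jd conversion. Here is a minimal sketch with hypothetical format_ helpers. It quietly assumes both typedefs are signed, which is the very guess being worried about.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/*
 * Widen to intmax_t so that one printf(3) conversion, %jd, suits every
 * system whatever the underlying type is.  This assumes the typedef is
 * signed; an unsigned one would want uintmax_t and %ju instead.
 */
static int format_time_t (char *buf, size_t len, time_t t)
{
    return snprintf (buf, len, "%jd", (intmax_t) t);
}

static int format_pid_t (char *buf, size_t len, pid_t pid)
{
    return snprintf (buf, len, "%jd", (intmax_t) pid);
}
```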
Instead, wouldn’t it be useful if we could write code that looked like:
pid_t pid = getpid ();
IDIO ipid = idio_C_pid_t (pid);
where the idio_C_pid_t constructor is aware of whatever underlying base type a pid_t is on this system and stashes it as an int or long as appropriate.
Now that the value is an IDIO value we can pass it merrily around to anyone else. The only people who will actually care whether it is an int or a long are those manipulating it, and they would want a deconstructor, IDIO_C_TYPE_pid_t, that is equally aware of the stash, leaving us to write:
void idio_func (IDIO v)
{
IDIO_USER_C_TYPE_ASSERT (pid_t, v);
pid_t pid = IDIO_C_TYPE_pid_t (v);
...
}
where the type assertion knows the mapping from pid_t to int or long as well.
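To make the idea concrete, here is a much-simplified sketch in plain C. It is not Idio’s implementation. The toy_ value type, its constructors, and the size-based choice between int and long are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* a hypothetical, much-simplified stand-in for an IDIO C integer value */
typedef enum { TOY_C_INT, TOY_C_LONG } toy_c_base;

typedef struct {
    toy_c_base type;
    union {
        int i;
        long l;
    } u;
} toy_value;

static toy_value toy_C_int (int i)   { toy_value v = { .type = TOY_C_INT,  .u.i = i }; return v; }
static toy_value toy_C_long (long l) { toy_value v = { .type = TOY_C_LONG, .u.l = l }; return v; }

/* which base type a pid_t is stashed as on this system, decided by size */
static toy_c_base toy_pid_t_base (void)
{
    return sizeof (pid_t) == sizeof (int) ? TOY_C_INT : TOY_C_LONG;
}

/* constructor: the caller never needs to know int from long */
static toy_value toy_C_pid_t (pid_t pid)
{
    if (TOY_C_INT == toy_pid_t_base ())
        return toy_C_int ((int) pid);
    return toy_C_long ((long) pid);
}

/* predicate: the analogue of the type assertion */
static bool toy_isa_pid_t (toy_value v)
{
    return v.type == toy_pid_t_base ();
}

/* deconstructor: equally aware of the stash */
static pid_t toy_C_TYPE_pid_t (toy_value v)
{
    assert (toy_isa_pid_t (v));
    return TOY_C_INT == v.type ? (pid_t) v.u.i : (pid_t) v.u.l;
}
```

The predicate only checks the stashed base type. As a result, a value built with toy_C_int passes as a pid_t wherever a pid_t is an int.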
Conversions
Hmm, wait a minute, what if we in Idio-land want to create a pid_t? Suppose we’ve read in the output of ps and decide we need to kill(2) some egregious process?
We’ll have started with some string which we can convert into an Idio number with read-number but we probably don’t even know if that number is a fixnum or a bignum. How are we going to create a pid_t?
We do have C/integer-> to help us, which can be given a clue as to which C type to create.
So, we need to pass it some clue as to which of the fourteen C base types we want. Obviously, we’re going to pass it pid_t – because that’s all we want to know about this value – which needs to map to some symbol (the usual Scheme-ish way) which eventually resolves to either 'int or 'long depending on the system.
Idio> C/integer-> 23 libc/pid_t
23
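Whatever the clue resolves to, the conversion has to range-check the number against the target C type. The minimal sketch below makes two assumptions: the Idio number has already been reduced to an intmax_t (a real bignum might not fit even that), and a hypothetical three-member clue enum stands in for the fourteen base types.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* hypothetical subset of the C base types a clue might resolve to */
typedef enum { CLUE_INT, CLUE_LONG, CLUE_UINT } c_clue;

typedef struct {
    c_clue type;
    union {
        int i;
        long l;
        unsigned int ui;
    } u;
} c_int_value;

/* convert n into the C type named by the clue, refusing values that do not fit */
static bool integer_to_C (intmax_t n, c_clue clue, c_int_value *out)
{
    out->type = clue;
    switch (clue) {
    case CLUE_INT:
        if (n < INT_MIN || n > INT_MAX)
            return false;
        out->u.i = (int) n;
        return true;
    case CLUE_LONG:
        if (n < LONG_MIN || n > LONG_MAX)
            return false;
        out->u.l = (long) n;
        return true;
    case CLUE_UINT:
        if (n < 0 || (uintmax_t) n > UINT_MAX)
            return false;
        out->u.ui = (unsigned int) n;
        return true;
    }
    return false;
}
```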
In practice, pid_t is really libc/pid_t, ie. to accommodate different libraries’ potential name clashes we’ll prefix the typedef name with the module the name comes from.
Those names are exported from libc but so are many, many other names with a huge potential for clashes: import libc can be interesting.
Also notice that the printed representation of a C pid_t is, perhaps unsurprisingly, indistinguishable from the fixnum we constructed it from! I’m not sure there’s a useful solution to this.
There is the similar C/->integer which will return an Idio fixnum or bignum (depending on the size of the C value). On a 64-bit machine:
Idio> fixnum? (C/->integer libc/INT_MAX)
#t
Idio> fixnum? (C/->integer libc/INTMAX_MAX)
#f
On a 32-bit system, libc/INT_MAX is a bignum too.
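The fixnum-or-bignum decision comes down to a range check against the largest fixnum. This sketch assumes a fixnum is a pointer-sized word with two low-order tag bits. FIXNUM_TAG_BITS is an assumption, not taken from Idio’s sources, although it is consistent with INT_MAX being a bignum on a 32-bit system but not on a 64-bit one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* assumption: a fixnum is a pointer-sized word less FIXNUM_TAG_BITS tag bits */
#define FIXNUM_TAG_BITS 2
#define FIXNUM_MAX      (INTPTR_MAX >> FIXNUM_TAG_BITS)
#define FIXNUM_MIN      (-FIXNUM_MAX - 1)

/* would C/->integer hand back a fixnum (true) or need a bignum (false)? */
static bool fits_in_fixnum (intmax_t v)
{
    return v >= FIXNUM_MIN && v <= FIXNUM_MAX;
}
```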
Predicates
In the same manner, any libc-oriented code will want to be able to test that some Idio value is a libc/pid_t. That predicate should, by rights, be libc/pid_t?.
Given that pid_t is a typedef to int (or long or whatever) then we’re going to need a mapping from libc/pid_t? through to the C domain’s C/int? or C/long? as appropriate.
Note that the chain from a C library (ie. Idio module) into the C domain’s (fourteen) base types might well involve several hops through multiple libraries, depending on how the typedefs roll.
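Following such a chain of hops is a simple loop. This sketch uses a hypothetical, hard-wired table of name links; a generator would build such a table from the typedefs it finds. The hop limit guards against a malformed, circular table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* each (hypothetical) entry says: this type name is defined in terms of that one */
struct type_link {
    const char *name;
    const char *parent;
};

static const struct type_link type_links[] = {
    { "libc/pid_t",   "libc/__pid_t" },
    { "libc/__pid_t", "C/int" },
};

#define MAX_HOPS 16

/*
 * Follow the chain until we reach a name with no parent, which should
 * be one of the C base types.  Returns NULL if the chain looks circular.
 */
static const char *resolve_base_type (const char *name)
{
    size_t n = sizeof (type_links) / sizeof (type_links[0]);

    for (int hops = 0; hops < MAX_HOPS; hops++) {
        const char *parent = NULL;

        for (size_t i = 0; i < n; i++) {
            if (0 == strcmp (type_links[i].name, name)) {
                parent = type_links[i].parent;
                break;
            }
        }
        if (NULL == parent)
            return name;
        name = parent;
    }
    return NULL;
}
```

On a system that typedefs pid_t straight to long, the table would hold a single link and the same loop would stop after one hop.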
Caveats
The main caveat is that if your interface uses #define then this mechanism won’t help you. #defines are part of the C pre-processor and so those definitions will have disappeared by the time the C compiler has its way and we get to look at the generated code.
Another issue is that the set of values that some parameters take might be system-dependent. Think of the resource parameter to getrlimit(2), the likes of RLIMIT_NOFILE, RLIMIT_NPROC etc. On the one hand that ought not to be an issue, except that it presumes we do, somehow, know what values have been defined on this system. Some Linux systems have migrated these values into an enumerated type, which we can see, and maintained the C macros for backwards compatibility. Other systems are #define only.
For these to be portable you will have to resort to using something along the lines of the error, signal and rlimit value tests:
#if defined (EBADF)
IDIO_LIBC_ERRNO (EBADF);
#endif
and empirical knowledge of the ported systems.
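A table of names and values built this way only contains what this system’s headers define. This minimal sketch covers a few getrlimit(2) resources. The named_value table and rlimit_by_name() are hypothetical. RLIMIT_RTPRIO is included as an example of a resource that some systems, for instance Mac OS X, lack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/resource.h>

struct named_value {
    const char *name;
    int value;
};

/* only the names this system's headers actually define make it into the table */
static const struct named_value rlimits[] = {
#if defined (RLIMIT_NOFILE)
    { "RLIMIT_NOFILE", RLIMIT_NOFILE },
#endif
#if defined (RLIMIT_NPROC)
    { "RLIMIT_NPROC", RLIMIT_NPROC },
#endif
#if defined (RLIMIT_RTPRIO)
    { "RLIMIT_RTPRIO", RLIMIT_RTPRIO },
#endif
};

/* returns 1 and sets *value if name is known on this system, else 0 */
static int rlimit_by_name (const char *name, int *value)
{
    for (size_t i = 0; i < sizeof (rlimits) / sizeof (rlimits[0]); i++) {
        if (0 == strcmp (rlimits[i].name, name)) {
            *value = rlimits[i].value;
            return 1;
        }
    }
    return 0;
}
```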
Namespaces
I started out distinguishing between the base C types in the existing C namespace and those new typedefs being introduced from libc in the libc namespace, before thinking that C doesn’t have any namespaces, so what am I doing?
However, C++ does have namespaces so I guess it would be prudent to maintain this artifice.
This will lead to the mildly confusing idio_libc_pid_t constructor and some idio_libfoo_pid_t constructor from some libfoo API which also happens to use a pid_t in its interface.
So, to correct the earlier names, the proper convention is that for true C domain types (int, long etc.) we’ll use:
IDIO_USER_C_TYPE_ASSERT ({base-type}, v);
{base-type} C_v = IDIO_C_TYPE_{base-type} (v);
IDIO r = idio_C_{base-type} (C_v);
and for some libfoo API using a typedef’d symbol we’ll have:
IDIO_USER_{libfoo}_TYPE_ASSERT ({type-def}, v);
{type-def} C_v = IDIO_C_TYPE_{libfoo}_{type-def} (v);
IDIO r = idio_{libfoo}_{type-def} (C_v);
therefore, for the pid_t in libc example, we’ll have:
IDIO_USER_libc_TYPE_ASSERT (pid_t, v);
pid_t C_v = IDIO_C_TYPE_libc_pid_t (v);
IDIO r = idio_libc_pid_t (C_v);
Build Bootstrap I
We’ll come back to this but Idio uses the C API for libc to run and yet we’re somehow using Idio to build that libc interface. How does that work?
Well, we cheat, of course. In the first instance there are some libc API files that are at least consistent, if not necessarily correct for this system. These are in build-bootstrap subdirectories.
That is enough to allow us to run idio-c-api-gen to build correct libc API files for this system, the presence of which prompts make to rebuild idio.
DWARF
DWARF was originally a play on words given its relationship to ELF, the Executable and Linkable Format for object files, but was later backronym’ed to…well, who cares?
DWARF is our friend, here. One observation is that the debugger seems to know a lot about the types in our system and if we investigate the output of a “debug” object file it turns out to contain a lot of information about the types in the object file.
But, as noted, nothing from the C pre-processor.
readelf is not available on Mac OS X – noting, in turn, that objdump on Mac OS X is rubbish and you need to run dwarfdump to get anything useful.
objdump seems to exist on all the systems I use and produces a wealth of information in text format. Given we’ll have to parse something, text format seems a lot easier for us.
DIEs
The output is in the form of cascades of DIEs (Debugging Information Entries). These look something like:
<1><50>: Abbrev Number: 2 (DW_TAG_base_type)
<51> DW_AT_byte_size : 2
<52> DW_AT_encoding : 5 (signed)
<53> DW_AT_name : (indirect string, offset: 0x9c4): short int
<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
<58> DW_AT_byte_size : 4
<59> DW_AT_encoding : 5 (signed)
<5a> DW_AT_name : int
The output varies with:
- the (historic) version of objdump, which changes the exact format
- the set of entries, which depends on the types used in the object file (duh!)
- the order of output, which depends on the contents of the object file
However, reading between the lines we can see there appear to be two base_types defined in that snippet:
- a short int with a byte_size of 2 is type <50>
- an int with a byte_size of 4 is type <57>
The sizes seem about right for this system for a start!
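Given that the format varies, any parser has to be forgiving. This minimal sketch uses sscanf(3) to pull apart the two kinds of line seen above. The function names are hypothetical, and the patterns are only known to match this sample of objdump output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse a DIE header line such as
 *   <1><57>: Abbrev Number: 3 (DW_TAG_base_type)
 * into its nesting depth, offset and tag.  Returns 0 for anything else,
 * including the "Abbrev Number: 0" lines that close a set of children.
 */
static int parse_die_header (const char *line, int *depth, unsigned long *offset,
                             char *tag, size_t taglen)
{
    int abbrev;
    char buf[64];

    if (4 != sscanf (line, " <%d><%lx>: Abbrev Number: %d (%63[^)])",
                     depth, offset, &abbrev, buf))
        return 0;
    snprintf (tag, taglen, "%s", buf);
    return 1;
}

/*
 * Parse an attribute line such as
 *   <58>   DW_AT_byte_size   : 4
 * into the attribute name and the rest of the line, its value.
 */
static int parse_die_attr (const char *line, char *attr, size_t attrlen,
                           char *value, size_t valuelen)
{
    unsigned long offset;
    char name[64];
    int consumed = 0;

    if (2 != sscanf (line, " <%lx> %63[A-Za-z0-9_] : %n", &offset, name, &consumed)
        || 0 == consumed)
        return 0;
    snprintf (attr, attrlen, "%s", name);
    snprintf (value, valuelen, "%s", line + consumed);
    return 1;
}
```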
The encoding attribute is interesting as it is a clue as to how to interpret those 2 or 4 bytes. signed, signed char, unsigned and float are examples. I don’t think there’s really very much going on in C, here, but I get the impression that the encoding allows for much more, like UTF-8/UCS-2-style encodings.
Note
A DIE without a type uses the implied type of void.
pid_t
Let’s try a pid_t:
#include <sys/types.h>
#include <unistd.h>

int main (int argc, char ** argv)
{
pid_t pid = getpid ();
return 0;
}
which, compiled with gcc -g -c -o libc-api.o libc-api.c, gives us (amongst other things):
<1><57>: Abbrev Number: 3 (DW_TAG_base_type)
<58> DW_AT_byte_size : 4
<59> DW_AT_encoding : 5 (signed)
<5a> DW_AT_name : int
<1><e9>: Abbrev Number: 4 (DW_TAG_typedef)
<ea> DW_AT_name : (indirect string, offset: 0x7f1): __pid_t
<ee> DW_AT_decl_file : 2
<ef> DW_AT_decl_line : 154
<f0> DW_AT_decl_column : 25
<f1> DW_AT_type : <0x57>
<1><1ea>: Abbrev Number: 4 (DW_TAG_typedef)
<1eb> DW_AT_name : (indirect string, offset: 0x727): pid_t
<1ef> DW_AT_decl_file : 3
<1f0> DW_AT_decl_line : 97
<1f1> DW_AT_decl_column : 17
<1f2> DW_AT_type : <0xe9>
The third DIE shown, <1ea> at the bottom, is a typedef for pid_t and has some clues about a file and that its type is <0xe9>.
The middle DIE shown is <e9>, which is itself a typedef for __pid_t from a different file (you would guess) and in turn has its type as <0x57>.
The first DIE shown is <57> and is a base_type, an int.
Sweet!
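Following the DW_AT_type references from the pid_t typedef down to its base type is a short loop. This sketch works over the three DIEs above, hand-copied into a hypothetical struct die table. A real tool would populate the table by parsing objdump’s output.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* a pared-down DIE: only what is needed to follow typedefs */
struct die {
    unsigned long offset;
    const char *tag;
    const char *name;
    unsigned long type;         /* DW_AT_type reference, 0 if absent (void) */
};

/* the three DIEs from the pid_t example */
static const struct die dies[] = {
    { 0x57,  "DW_TAG_base_type", "int",     0    },
    { 0xe9,  "DW_TAG_typedef",   "__pid_t", 0x57 },
    { 0x1ea, "DW_TAG_typedef",   "pid_t",   0xe9 },
};

static const struct die *find_die (unsigned long offset)
{
    for (size_t i = 0; i < sizeof (dies) / sizeof (dies[0]); i++)
        if (dies[i].offset == offset)
            return &dies[i];
    return NULL;
}

/* follow DW_AT_type references through any typedefs to the underlying type */
static const struct die *resolve_typedef (unsigned long offset)
{
    const struct die *d = find_die (offset);

    while (NULL != d && 0 == strcmp (d->tag, "DW_TAG_typedef"))
        d = find_die (d->type);
    return d;
}
```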
As noted, another system typedefs pid_t directly to a long int base type.
That raises another spectre. Where has this __pid_t type come from and is it portable?
Well, the answer is that it is obviously some local definition and is clearly not portable so we need to be more careful. Not only are the ultimate base types of the typedefs different but the mappings through to them are different too.
That might come back to bite us, it might not.
*
In the meanwhile we can see a means to automatically generate many of the interfaces we’ve been talking about, most of which are simply redefinitions of existing base types.
In the C world, you can imagine creating:
#define IDIO_TYPE_C_libc___pid_t IDIO_TYPE_C_INT
#define idio_libc___pid_t idio_C_int
#define IDIO_C_TYPE_libc___pid_t IDIO_C_TYPE_int
#define idio_isa_libc___pid_t idio_isa_C_int
#define IDIO_TYPE_C_libc_pid_t IDIO_TYPE_C_libc___pid_t
#define idio_libc_pid_t idio_libc___pid_t
#define IDIO_C_TYPE_libc_pid_t IDIO_C_TYPE_libc___pid_t
#define idio_isa_libc_pid_t idio_isa_libc___pid_t
Here,
- IDIO_TYPE_C_module_name is a mapping from the C constructed name to an Idio src/gc.h-compatible macro for the C base types. This allows us to figure out the correct printf(3) format string for a given type.
- idio_module_name is the constructor
- IDIO_C_TYPE_module_name is the destructor. An unfortunate naming near-miss with IDIO_TYPE_C_module_name.
- idio_isa_module_name is a predicate
The IDIO_USER_module_TYPE_ASSERT(type,x) is a generic macro which ultimately is going to call idio_isa_module_type (x) which we’ve just defined.
and for the Idio world:
export (
__pid_t
__pid_t?
pid_t
pid_t?
)
__pid_t := 'int
define __pid_t? C/int?
pid_t := __pid_t
define pid_t? libc/__pid_t?
Meaning that some Idio code can:
pid := ...
c-pid := C/integer-> pid libc/pid_t
and everything lines up!
Structures and Unions
Structures and unions are all “lifted” up to the top level, even if they are only used inside another structure or union.
That said, they are very simple, being the structure_type itself (with or without a name) and a sequence of members, each of which has a type in turn.
Hmm, now we know about the fields in a structure, we ought to be able to generate some code to be able to access the fields.
From Idio you can imagine wanting to access a stat(2) struct stat along the lines of:
sb := libc/stat "."
printf "size=%s\n" sb.st_size
which is utilizing the value-index operator . to access the st_size member using a symbol, st_size.
What do we need for that? Well, we should define the symbol, for a start, then we need a function that can poke about in the struct stat, let’s call it struct-stat-ref:
IDIO_SYMBOL_DECL (st_dev);
IDIO_SYMBOL_DECL (st_ino);
...
IDIO_SYMBOL_DECL (st_size);
...
IDIO_DEFINE_PRIMITIVE2_DS ("struct-stat-ref", libc_struct_stat_ref, (IDIO stat, IDIO member), "stat member", "\
in C, stat->member \n\
\n\
:param stat: C struct stat \n\
:type stat: C/pointer \n\
:param member: C struct member \n\
:type member: symbol \n\
:return: stat->member \n\
:rtype: varies on member \n\
")
{
IDIO_ASSERT (stat);
IDIO_ASSERT (member);
/*
* Test Case: libc-errors/struct-stat-ref-bad-pointer-type.idio
*
* struct-stat-ref #t #t
*/
IDIO_USER_C_TYPE_ASSERT (pointer, stat);
/*
* Test Case: libc-errors/struct-stat-ref-bad-member-type.idio
*
* struct-stat-ref v #t
*/
IDIO_USER_TYPE_ASSERT (symbol, member);
/* ... can we check that stat is a pointer to a struct stat? */
struct stat *statp = IDIO_C_TYPE_POINTER_P (stat);
if (idio_S_st_dev == member) {
return idio_libc_dev_t (statp->st_dev);
} else if (idio_S_st_ino == member) {
...
} else if (idio_S_st_size == member) {
return idio_libc_off_t (statp->st_size);
} else {
...
}
}
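Here is an equivalent in plain C, with strings standing in for Idio symbols and a table in place of the if/else chain. It is a sketch only. The getter names are hypothetical, and widening every member to intmax_t is an assumption that would be wrong for, say, a large unsigned st_ino.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>

/* generate one small getter per member, widening the value to intmax_t */
#define STAT_GETTER(m) \
    static intmax_t get_##m (const struct stat *s) { return (intmax_t) s->m; }

STAT_GETTER (st_mode)
STAT_GETTER (st_nlink)
STAT_GETTER (st_size)

/* member name to getter: the kind of table a generator could emit */
static const struct {
    const char *name;
    intmax_t (*get) (const struct stat *);
} stat_members[] = {
    { "st_mode",  get_st_mode  },
    { "st_nlink", get_st_nlink },
    { "st_size",  get_st_size  },
};

/* look up member by name; returns 0 for an unknown member */
static int struct_stat_ref (const struct stat *statp, const char *member, intmax_t *out)
{
    for (size_t i = 0; i < sizeof (stat_members) / sizeof (stat_members[0]); i++) {
        if (0 == strcmp (stat_members[i].name, member)) {
            *out = stat_members[i].get (statp);
            return 1;
        }
    }
    return 0;
}
```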
We should be able to generate a similar struct-stat-set!, although whether such a function is warranted is another question. A priori, it’s a valid operation.
For the answer to testing whether stat really is a struct stat, see Auto-Application of Methods, below.
Separately, though related, there is the question of how the system might know that it should use struct-stat-ref, in particular, when de-structuring some (random) C pointer, rather than any other primitive (or function). More on that below.
Pointers
Pointers are flagged en route to the real type underneath. Here’s the case for the formal_parameter argv:
<1><178>: Abbrev Number: 9 (DW_TAG_pointer_type)
<179> DW_AT_byte_size : 8
<17a> DW_AT_type : <0x17e>
<1><17e>: Abbrev Number: 2 (DW_TAG_base_type)
<17f> DW_AT_byte_size : 1
<180> DW_AT_encoding : 6 (signed char)
<181> DW_AT_name : (indirect string, offset: 0x351): char
<1><2fa>: Abbrev Number: 9 (DW_TAG_pointer_type)
<2fb> DW_AT_byte_size : 8
<2fc> DW_AT_type : <0x178>
<2><115d>: Abbrev Number: 36 (DW_TAG_formal_parameter)
<115e> DW_AT_name : (indirect string, offset: 0x57b): argv
<1162> DW_AT_decl_file : 1
<1163> DW_AT_decl_line : 55
<1164> DW_AT_decl_column : 28
<1165> DW_AT_type : <0x2fa>
<1169> DW_AT_location : 4 byte block: 91 f0 b9 7f (DW_OP_fbreg: -8976)
From this you should be able to see both that argv is a pointer to a pointer to a char, ie. char **argv, and that DIEs are not necessarily printed in an obvious order.
Arrays
Array types are a bit more subtle. They are described distinctly from pointers, as they should be, though we might use them interchangeably with pointers.
In this example, all of the members of the struct utsname are char [] rather than char *:
<1><c80>: Abbrev Number: 28 (DW_TAG_structure_type)
<c81> DW_AT_name : (indirect string, offset: 0x1e8): utsname
<c85> DW_AT_byte_size : 390
<c87> DW_AT_decl_file : 23
<c88> DW_AT_decl_line : 48
<c89> DW_AT_decl_column : 8
<c8a> DW_AT_sibling : <0xcdf>
<2><c8e>: Abbrev Number: 12 (DW_TAG_member)
<c8f> DW_AT_name : (indirect string, offset: 0x5e1): sysname
<c93> DW_AT_decl_file : 23
<c94> DW_AT_decl_line : 51
<c95> DW_AT_decl_column : 10
<c96> DW_AT_type : <0xcdf>
<c9a> DW_AT_data_member_location: 0
<2><c9b>: Abbrev Number: 12 (DW_TAG_member)
<c9c> DW_AT_name : (indirect string, offset: 0x459): nodename
<ca0> DW_AT_decl_file : 23
<ca1> DW_AT_decl_line : 54
<ca2> DW_AT_decl_column : 10
<ca3> DW_AT_type : <0xcdf>
<ca7> DW_AT_data_member_location: 65
<1><cdf>: Abbrev Number: 5 (DW_TAG_array_type)
<ce0> DW_AT_type : <0x17e>
<ce4> DW_AT_sibling : <0xcef>
<17e> was the char in the pointer example, above.
Notice also that part of the ABI (as opposed to the API) is exposed, in that the data_member_location for nodename is 65 bytes along, suggesting that sysname is defined as char sysname[65] on this system.
uname(2) on this system notes:
The length of the fields in the struct varies. Some operating systems or libraries use a hardcoded 9 or 33 or 65 or 257. Other systems use SYS_NMLN or _SYS_NMLN or UTSLEN or _UTSNAME_LENGTH. Clearly, it is a bad idea to use any of these constants; just use sizeof(…). Often 257 is chosen in order to have room for an internet hostname.
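The same ABI detail can be recovered in C with offsetof and sizeof, which is what uname(2)’s advice amounts to. The helper names in this small sketch are hypothetical. The equality it relies on holds only where, as here, sysname is the first member and nodename immediately follows it.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/utsname.h>

/* the size of the char [] members, whatever this system chose */
static size_t utsname_sysname_size (void)
{
    return sizeof (((struct utsname *) 0)->sysname);
}

/* where nodename starts: what DWARF reports as its DW_AT_data_member_location */
static size_t utsname_nodename_offset (void)
{
    return offsetof (struct utsname, nodename);
}
```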
Enumerated Types
Enumerated types have a type and a number of enumerators, each of which has a const_value.
“A bit loose”? How unusual for C… not!
The wording around the type for an enumerated type is a bit loose and merely suggests something big enough to hold the entire set of enumerators.
I can see both int and unsigned int enumerated types.
Subprograms
I’m not sure what subprogram is meant to describe but you will usually get to see main or whatever the main enclosing function is.
The information provided includes the formal parameters and any variables used in the function (possibly including any lexical blocks they exist in).
Fedora using GCC 10!
However, rather usefully, on some systems it will also have a subprogram entry for each function it calls.
This is strikingly useful as it only requires one system, using portable definitions, to produce this information, from which we can sketch out the framework for a series of IDIO_DEFINE_PRIMITIVE functions describing those calls. Once defined they are, by definition(?), using the portable C API and are therefore applicable to all systems.
Due to the absence of debugging information in, in my case, libc, there are no formal parameter names for the system and library calls I’m referencing, but we do get the formal parameter types and the return type. We can obviously invent some argument names: arg1, arg2, etc.
For kill(2), whose prototype looks like:
int kill(pid_t pid, int sig);
this becomes:
IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO arg1, IDIO arg2), "arg1 arg2", "\
in C: kill (arg1, arg2) \n\
a wrapper to libc kill() \n\
\n\
:param arg1: \n\
:type arg1: libc/__pid_t \n\
:param arg2: \n\
:type arg2: C/int \n\
:return: \n\
:rtype: C/int \n\
")
{
IDIO_ASSERT (arg1);
IDIO_ASSERT (arg2);
/*
* Test Case: libc-errors/kill-bad-arg1-type.idio
*
* kill #t #t
*/
IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);
__pid_t C_arg1 = IDIO_C_TYPE_libc___pid_t (arg1);
/*
* Test Case: libc-errors/kill-bad-arg2-type.idio
*
* kill #t #t
*/
IDIO_USER_C_TYPE_ASSERT (int, arg2);
int C_arg2 = IDIO_C_TYPE_int (arg2);
int kill_r = kill (C_arg1, C_arg2);
/* check for errors */
if (-1 == kill_r) {
idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
/*
* WARNING: this is probably an incorrect return
*/
return idio_C_int (kill_r);
}
Now, that’s not too shabby for something automatically generated.
*
If we do install the libc debugging symbols we still don’t get formal_parameter names – well, for the subprograms we’re interested in, anyway.
Installing the debugging symbols isn’t quite as obvious as you might think. In the case of Fedora these are in separate packages in disabled repos, but on the plus side tools like gdb know to go looking for them and programs like objdump can be persuaded with an extra argument (K in this case).
$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install glibc-debuginfo
$ objdump -WilK /lib64/libc-2.33.so
<1><3452f>: Abbrev Number: 52 (DW_TAG_subprogram)
<34530> DW_AT_external : 1
<34530> DW_AT_name : (indirect string, offset: 0x108c0): kill
<34534> DW_AT_decl_file : 93
<34535> DW_AT_decl_line : 112
<34536> DW_AT_decl_column : 12
<34537> DW_AT_prototyped : 1
<34537> DW_AT_type : <0x2a>
<3453b> DW_AT_declaration : 1
<3453b> DW_AT_sibling : <0x34547>
<2><3453c>: Abbrev Number: 24 (DW_TAG_formal_parameter)
<3453d> DW_AT_type : <0x2341>
<2><34541>: Abbrev Number: 24 (DW_TAG_formal_parameter)
<34542> DW_AT_type : <0x2a>
<2><34546>: Abbrev Number: 0
Some subprograms do have named formal parameters in libc, which makes me think the construction of things might be considerably more interesting than at first blush.
Wait a minute, isn’t there some __kill nonsense floating about with system calls? Hmm. The __kill subprogram also declines to offer us any formal parameter names. So let’s dig deeper:
$ sudo dnf --enablerepo=fedora-debuginfo debuginfo-install kernel-debuginfo-$(uname -r)
$ cd /usr/lib/debug/lib/modules/$(uname -r)
$ nm vmlinux | grep '[tT] kill_'
...
ffffffff810ecb50 T kill_pgrp
ffffffff810ecc20 T kill_pid
ffffffff810ecb90 T kill_pid_info
ffffffff810ea580 T kill_pid_usb_asyncio
ffffffff8131ecc0 t kill_procs
...
I don’t think we’re in Kansas any more, Toto!
$ objdump -WilK vmlinux | less +/kill_pid
<1><10e9d79>: Abbrev Number: 53 (DW_TAG_subprogram)
<10e9d7a> DW_AT_external : 1
<10e9d7a> DW_AT_name : (indirect string, offset: 0x179603): kill_pid
<10e9d7e> DW_AT_decl_file : 2
<10e9d7f> DW_AT_decl_line : 1793
<10e9d81> DW_AT_decl_column : 5
<10e9d82> DW_AT_prototyped : 1
<10e9d82> DW_AT_type : <0x10bcbd0>
<10e9d86> DW_AT_low_pc : 0xffffffff810ecc20
<10e9d8e> DW_AT_high_pc : 0x1a
<10e9d96> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<10e9d98> DW_AT_GNU_all_call_sites: 1
<10e9d98> DW_AT_sibling : <0x10e9e07>
<2><10e9d9c>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<10e9d9d> DW_AT_name : pid
<10e9da1> DW_AT_decl_file : 2
<10e9da2> DW_AT_decl_line : 1793
<10e9da4> DW_AT_decl_column : 26
<10e9da5> DW_AT_type : <0x10c3638>
<10e9da9> DW_AT_location : 0x3a232c (location list)
<10e9dad> DW_AT_GNU_locviews: 0x3a2326
<2><10e9db1>: Abbrev Number: 51 (DW_TAG_formal_parameter)
<10e9db2> DW_AT_name : sig
<10e9db6> DW_AT_decl_file : 2
<10e9db7> DW_AT_decl_line : 1793
<10e9db9> DW_AT_decl_column : 35
<10e9dba> DW_AT_type : <0x10bcbd0>
<10e9dbe> DW_AT_location : 0x3a237e (location list)
<10e9dc2> DW_AT_GNU_locviews: 0x3a2378
<2><10e9dc6>: Abbrev Number: 28 (DW_TAG_formal_parameter)
<10e9dc7> DW_AT_name : (indirect string, offset: 0x3b5017): priv
<10e9dcb> DW_AT_decl_file : 2
<10e9dcc> DW_AT_decl_line : 1793
<10e9dce> DW_AT_decl_column : 44
<10e9dcf> DW_AT_type : <0x10bcbd0>
<10e9dd3> DW_AT_location : 0x3a23ce (location list)
<10e9dd7> DW_AT_GNU_locviews: 0x3a23ca
<2><10e9ddb>: Abbrev Number: 90 (DW_TAG_GNU_call_site)
...
<3><10e9e05>: Abbrev Number: 0
<2><10e9e06>: Abbrev Number: 0
Hmm, three parameters. I think we’ve gone too deep.
So query-replacing argn seems fine to me. We almost certainly need to tweak the code anyway.
*
We have a portability issue as Fedora has defined the API with __pid_t, but if we query-replace the double-underscored __pid_t with pid_t we’re in a much better position.
We can obviously query-replace arg1 with pid and arg2 with sig, leaving us with:
IDIO_DEFINE_PRIMITIVE2_DS ("kill", libc_kill, (IDIO pid, IDIO sig), "pid sig", "\
in C: kill (pid, sig) \n\
a wrapper to libc kill() \n\
\n\
:param pid: \n\
:type pid: libc/pid_t \n\
:param sig: \n\
:type sig: C/int \n\
:return: \n\
:rtype: C/int \n\
")
{
IDIO_ASSERT (pid);
IDIO_ASSERT (sig);
/*
* Test Case: libc-errors/kill-bad-pid-type.idio
*
* kill #t #t
*/
IDIO_USER_libc_TYPE_ASSERT (pid_t, pid);
pid_t C_pid = IDIO_C_TYPE_libc_pid_t (pid);
/*
* Test Case: libc-errors/kill-bad-sig-type.idio
*
* kill #t #t
*/
IDIO_USER_C_TYPE_ASSERT (int, sig);
int C_sig = IDIO_C_TYPE_int (sig);
int kill_r = kill (C_pid, C_sig);
/* check for errors */
if (-1 == kill_r) {
idio_error_system_errno ("kill", idio_S_nil, IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
/*
* WARNING: this is probably an incorrect return
*/
return idio_C_int (kill_r);
}
and we may just have a working, portable interface to the libc API!
Re-imagining APIs
It does cover a lot of cases, though!
Now, this almost certainly won’t work for you off the bat as I’ve clearly directed the automatic code generation to handle the most common form of error that system and library calls produce and assumed you can directly return the value from the API call.
For calls like getcwd(3) the value returned is a char * and for error checking we should be comparing to NULL.
Something like times(3) expects a struct tms to be supplied (unlikely from Idio-land!) and also returns a clock_t. Here, whilst the “check for errors” is nominally correct, we should be retaining the value, and we will end up returning a list of the clock_t and the struct tms back to the user.
Similarly, a direct copy of the C API is not usefully correct for something like stat(2), where the user is in no position to create a struct stat to pass as an argument. In this case we would only accept a pathname argument and allocate a struct stat to be freed later.
This leads to the idea that the automatic code generation will give us a starter for ten which we can edit into permanence.
Auto-Application of Methods
Now, about that struct stat we’ve just allocated. We’ve gone to the trouble of creating struct-stat-ref (and the moot struct-stat-set!) to manipulate it. How might we get that to happen auto-magically?
Well, suppose we associate some useful type information with the C_pointer in the IDIO value we’re about to return. What’s useful? Hmm, how about a couple of things: a string, "libc/struct-stat" (useful for reporting), and a reference to struct-stat-ref?
In the latter case, we don’t need to also add struct-stat-set! as we can use our trusty Setters mechanism to do the right thing.
We want something that is unique for each type, and a pair always falls into that category – even if the contents are the same, the actual pair itself is unique in memory.
So, at the time we define our structure, let’s throw out a “C Structure Identification”:
IDIO_C_STRUCT_IDENT_DECL (libc_struct_stat);
IDIO_SYMBOL_DECL (st_dev);
...
and then a bit later on, when we’re adding primitives etc. we can expand that into a list:
IDIO fgvi = IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_ref);
IDIO_C_STRUCT_IDENT_DEF ("libc/struct-stat", libc_struct_stat, fgvi);
IDIO_EXPORT_MODULE_PRIMITIVE (idio_libc_module, libc_struct_stat_set);
Here, we take advantage of the fact that IDIO_EXPORT_MODULE_PRIMITIVE always returns the actual primitive reference (the value associated with the symbol struct-stat-ref) – except we normally throw it away.
We can stick that reference in a list through the C macro IDIO_C_STRUCT_IDENT_DEF.
In practice, it defines a name in C-land, idio_CSI_libc_struct_stat, whose value is the list ("libc/struct-stat" struct-stat-ref).
So, we have something unique per structure that we can associate with a pointer, and the stat primitive can return:
return idio_C_pointer_type (idio_CSI_libc_struct_stat, statp);
Now we can revisit struct-stat-ref and perform the check:
IDIO_USER_C_TYPE_ASSERT (pointer, stat);
if (idio_CSI_libc_struct_stat != IDIO_C_TYPE_POINTER_PTYPE (stat)) {
idio_error_param_value ("stat", "should be a libc/struct-stat", IDIO_C_FUNC_LOCATION ());
return idio_S_notreached;
}
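The essence of the idea, uniqueness by identity rather than by contents, can be sketched in plain C. Here a hypothetical struct csi stands in for the idio_CSI_ list and its address plays the role of the unique pair. None of these names are Idio’s.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* hypothetical identity for a kind of C struct: its address is what is unique */
struct csi {
    const char *name;
};

static const struct csi csi_libc_struct_stat = { "libc/struct-stat" };

/* a C pointer that remembers what kind of thing it points at */
struct typed_pointer {
    const struct csi *ptype;
    void *p;
};

static struct typed_pointer typed_pointer (const struct csi *ptype, void *p)
{
    struct typed_pointer tp = { ptype, p };
    return tp;
}

/* the check struct-stat-ref can now make: compare identities, not contents */
static struct stat *as_struct_stat (struct typed_pointer tp)
{
    return &csi_libc_struct_stat == tp.ptype ? tp.p : NULL;
}
```

A second struct csi with identical contents is still a different identity, just as two pairs with the same contents are different pairs.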
and we can extend the code for value-index to have a nosey in any C pointers that come its way:
...
case IDIO_TYPE_C_POINTER:
{
IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
if (idio_S_nil != t) {
IDIO cmd = IDIO_LIST3 (IDIO_PAIR_HT (t), o, i);
IDIO r = idio_vm_invoke_C (idio_thread_current_thread (), cmd);
return r;
}
}
break;
...
We need to make a couple of changes for the -set! side of things to work.
In Idio-land we can check that both -ref and -set! primitives exist – after all, someone might deem modifying a struct foo a poor move and have removed struct-stat-set! – but the auto-generated lib/libc-api.idio doesn’t know that:
if (and (function? struct-stat-ref)
(function? struct-stat-set!)) {
set! (setter struct-stat-ref) struct-stat-set!
}
and the code for set-value-index! can be updated for C pointers as well:
...
case IDIO_TYPE_C_POINTER:
{
IDIO t = IDIO_C_TYPE_POINTER_PTYPE (o);
if (idio_S_nil != t) {
/*
* We want: (setter ref) o i v
*
* but we have to invoke by stage:
*/
IDIO setter_cmd = IDIO_LIST2 (idio_module_symbol_value (idio_S_setter,
idio_Idio_module,
idio_S_nil),
IDIO_PAIR_HT (t));
IDIO setter_r = idio_vm_invoke_C (idio_thread_current_thread (), setter_cmd);
if (! idio_isa_function (setter_r)) {
idio_debug ("(setter %s) did not yield a function\n", IDIO_PAIR_HT (t));
break;
}
IDIO set_cmd = IDIO_LIST4 (setter_r, o, i, v);
IDIO set_r = idio_vm_invoke_C (idio_thread_current_thread (), set_cmd);
return set_r;
}
}
break;
...
Printing
Of course, with your bespoke structure, you might want a bespoke printer. There’s a mechanism there too.
The add-as-string system for adding bespoke printing for Idio structures has been extended to support C structures with idio_CSI_ support. The printer is associated with the idio_CSI_ value such that all C pointers to the same kind of struct use the same printer.
The generated printers have two parts: a C part, _as_string(), which creates the result via idio_display_C() and idio_display() and an output string handle; and an Idio part which calls the C part.
The normal structure printing is to:
- add #<CSI structure-name
- for each structure member:
  - add member-name:
  - add the printed form of the structure member
    This requires a helper function, idio_C_type_format_string(), which can map, say, libc/pid_t into the appropriate printf(3) format string (probably %ld or %d)
- add >
For some structure types there is a natural printed format. For example, a struct timeval has seconds and micro-seconds parts and is commonly displayed in a seconds.micro-seconds form.
We know that the tv_usec member of a struct timeval represents micro-seconds and can only be six decimal digits even though its type, suseconds_t, is probably a long. In fact, it must be displayed as six leading-0-padded digits, otherwise it makes no sense. For example, 1s and 213us would be displayed as 1.000213 and not 1.213.
Further, if there is a precision pending, say, 3, then the precision is applied to the leading-0-padded string, not the literal tv_usec value, giving a result of 1.000.
You wouldn’t read back in the printed form, would you?
No brownie points for you!
Note that the resultant printed form does not include the structure name; it appears as just a floating point number. The value is a struct timeval, it’s just the printed form that looks like a floating point number. Compare that with the printed representation of the fixnum, 23, and the libc/pid_t, 23, we had before.
The auto printing of C structures comes in quite handy. For example, noting I accidentally called the external command stat first, we get to compare results:
Idio> stat "."
File: .
Size: 8192 Blocks: 24 IO Block: 4096 directory
Device: fd00h/64768d Inode: 17910888 Links: 3
Access: (0775/drwxrwxr-x) Uid: ( 1000/ idf) Gid: ( 1000/ idf)
Context: unconfined_u:object_r:user_home_t:s0
Access: 2021-07-05 12:24:42.072310969 +0100
Modify: 2021-07-05 12:24:43.574321757 +0100
Change: 2021-07-05 12:24:43.574321757 +0100
Birth: 2021-05-11 11:19:01.916839841 +0100
#t
Idio> libc/stat "."
#<CSI libc/struct-stat
st_dev:64768
st_ino:17910888
st_nlink:3
st_mode:16893
st_uid:1000
st_gid:1000
st_rdev:0
st_size:8192
st_blksize:4096
st_blocks:24
st_atim:1625484282.072310969
st_mtim:1625484283.574321757
st_ctim:1625484283.574321757>
(I’ve broken the CSI printout for viewing convenience – it is one long line!)
In this case, a struct timespec with a tv_nsec field for nano-seconds is seen for the timestamp fields. Notice the leading 0 for the access time entries.
And, not wanting to emphasise the point, those struct timespec printed representations use 19 significant digits which, if you recall the work on Bignums, is one too many for an accurate floating point value. Off to inexact school for you!
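You can see a similar effect with a C double by round-tripping the access time through strtod(3) and printf(3). This uses IEEE 754 doubles, not anything Idio-specific. A double carries roughly 15 to 17 significant decimal digits, so the final nano-second digits of a 19 digit timestamp cannot survive. round_trips_as_double is a hypothetical helper for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a decimal string into a double, print it
 * back with nine fractional digits into out and report whether the
 * text survived the round trip unchanged. */
static bool round_trips_as_double (const char *text, char *out, size_t len)
{
    double d = strtod (text, NULL);

    snprintf (out, len, "%.9f", d);
    return strcmp (text, out) == 0;
}
```

On a typical IEEE 754 system, "1625484282.072310969" does not round-trip because its trailing digits come back different. "1.500000000" round-trips exactly.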
Files¶
idio-c-api-gen takes a nominal library name as an argument, say, libc. It then seeks out an .../ext/libc directory where .../ext is derived from possible directories called .../lib/idio in IDIOLIB.
It then looks for .../ext/libc/api/libc-api.c and compiles it into .../ext/libc/api/libc-api.o using the local Makefile.
It then runs objdump on that .o file and starts generating output in .../ext/libc/gen:
gen/libc-api.c (an unfortunate name clash with the original source file)
This C source file contains:
- primitives for the C struct accessors and printer
- primitives for any subprogram definitions, excluding main
  It will be incorrect!
  It is impossible to infer the correct handling of any errors. The sample code handles the most common form of system error.
  It is also not possible to identify when, commonly, a C pointer in the API is meant to be allocated by the caller.
  That sort of C API is likely to be replaced by an Idio API where the pointer is not required as an argument but is allocated and supplied internally and subsequently returned to the user rather than the nominal return value from the API call.
  Think of the example of stat(2) where the Idio user cannot supply a struct stat and will expect the value returned from a stat call to be the struct stat (and not the int that stat(2) returns).
- a putative idio_libc_api_add_primitives() function
- a putative idio_init_libc_api() function which will contain:
  - #ifdef-wrappered definitions for any enumeration_types
  - the definitions for any struct members’ symbols
  - the corresponding idio_CSI_ definition for the struct itself
gen/libc-api.h
This C header file contains:
- a helpful description of the base_types used, in a comment
- a generic IDIO_USER_lib_TYPE_ASSERT() macro
- a series of C macro expansions for each typedef:
  - a constructor
  - an accessor
  - a predicate
- declarations for the struct definitions in gen/libc-api.c
gen/libc-api.idio
This Idio source file contains:
- exports of:
  - the typedef type mappings (for the C/integer-> function)
  - (commented out) the enumerated_type enumerations
- the definitions of the entities just exported
- the setup for some Setters
gen/test-libc-error.idio
This Idio source file contains a reasonable attempt at a test suite based on the known type and value tests that can be automatically inferred from the structs and subprograms described.
It should be able to alert when a test fails to generate the expected error but do not rely on this.
gen/libc-errors/*
This directory contains putative instances of all the test cases described above.
It will be incorrect!
For example, some struct test cases can only be reached if a valid argument is supplied. It is impossible to infer how to create such an argument and the sample code simply refers to v.
For stat-related tests, say, which require a valid struct stat, I’ve replaced v with (libc/stat ".") which, since we’ve been writing the above code, successfully returns a struct stat with the appropriate idio_CSI_ definition.
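The constructor, accessor and predicate generated into gen/libc-api.h for each typedef can be illustrated with a self-contained toy in plain C. toy_value, TOY_C_TYPEDEF and the toy_C_ names are all hypothetical stand-ins for the real IDIO value and generated macros. The point is only that one macro, once told a typedef's base type, can stamp out all three functions.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Toy stand-in for an IDIO value: a tag plus the base C types a
 * typedef might resolve to.  Hypothetical, not Idio's representation. */
typedef enum { TOY_C_INT, TOY_C_LONG } toy_c_type;

typedef struct {
    toy_c_type type;
    union {
        int i;
        long l;
    } u;
} toy_value;

/* Hypothetical generator macro: given typedef T, the tag for its base
 * type and the union member holding it, define a constructor, an
 * accessor and a predicate for T. */
#define TOY_C_TYPEDEF(T, TAG, MEMBER)                                   \
    static toy_value toy_C_##T (T v)                                    \
    {                                                                   \
        toy_value x;                                                    \
        x.type = TAG;                                                   \
        x.u.MEMBER = v;                                                 \
        return x;                                                       \
    }                                                                   \
    static T toy_C_##T##_val (toy_value x) { return (T) x.u.MEMBER; }   \
    static bool toy_isa_C_##T (toy_value x) { return x.type == TAG; }

/* A generator would discover these mappings per system.  Here we
 * assume pid_t is an int and time_t is a long, as on x86_64 Linux. */
TOY_C_TYPEDEF (pid_t, TOY_C_INT, i)
TOY_C_TYPEDEF (time_t, TOY_C_LONG, l)
```

On a system where pid_t is a long, only the TOY_C_TYPEDEF line for pid_t would change. That per-system line is exactly what the generated header captures.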
Inconsistent Outputs¶
You will not get the same output from all systems, at the very least because of the issues regarding typedef mappings as described previously.
There are further complications with struct definitions and subprogram API types.
In the case of a struct you may find that some systems define extra structure members over and above the nominal C API.
These should have no effect, other than adding extra member name symbols that you have no reason to use. I have generally removed them from the code to reduce the members down to the portable set.
For both struct and subprogram definitions you may find that the actual C API uses some of the intermediate typedefs we’ve mentioned.
For example, my Fedora system seems to use __pid_t everywhere the nominal C API would use a pid_t.
There are two problems here:
- the generated gen/libc-api.c will be using __pid_t, eg. IDIO_USER_libc_TYPE_ASSERT (__pid_t, arg1);
  That’s not too traumatic to fix but you need to be aware of it if you choose a Fedora system as the source for your proto-permanent src/libc-api.c.
- unless you make an effort, the typedef for pid_t itself will be nowhere to be found
  This can be trivially solved by re-writing the original source file to say:
  pid_t pid = getpid ();
  thus forcing the typedef mapping to appear.
  Although, obviously, you won’t know that that is required until you discover that the expected pid_t is missing.
In other words, the creation of .../ext/libc/gen/libc-api.c could take a couple of iterations around the loop.
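As an aside, C11's _Generic can tell you at compile time which base type a typedef like pid_t or time_t resolves to on a given system, if that is all you need. This is not what idio-c-api-gen does, since it reads the object file instead. The macro names below are made up for illustration. Because a typedef is only an alias, on glibc (pid_t) 0 and (__pid_t) 0 select the same branch. C_IS_SIGNED answers the signed-or-unsigned question as well.

```c
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Map an expression's type to the name of its base C integral type.
 * A typedef is just an alias, so (pid_t) 0 selects whichever of these
 * pid_t really is on this system. */
#define C_BASE_TYPE_NAME(x) _Generic ((x),              \
    signed char: "signed char",                         \
    unsigned char: "unsigned char",                     \
    short: "short",                                     \
    unsigned short: "unsigned short",                   \
    int: "int",                                         \
    unsigned int: "unsigned int",                       \
    long: "long",                                       \
    unsigned long: "unsigned long",                     \
    long long: "long long",                             \
    unsigned long long: "unsigned long long",           \
    default: "something else")

/* Does expression x have the base type called want? */
#define C_BASE_TYPE_IS(x, want) (strcmp (C_BASE_TYPE_NAME (x), (want)) == 0)

/* Is integral type T signed?  (T) -1 wraps to a large value if not. */
#define C_IS_SIGNED(T) ((T) -1 < (T) 0)
```

For example, C_BASE_TYPE_NAME ((time_t) 0) yields "long" on x86_64 Linux, though other systems may differ.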
Inconsistent API¶
In some cases the API has changed over time. Historically, the stat(2) API had three time_t members, st_atime, st_mtime and st_ctime.
Since the time_t work described above the API has (mostly) become three struct timespec members, st_atim, st_mtim and st_ctim. There are lots of good things here and one bad one.
A struct timespec has both a time_t member, tv_sec, now, of course, 64 bits wide, and a nano-second-capable member, tv_nsec. The latter has potential typedef issues but appears to be a long on most systems. I guess any 16-bit systems might have an issue.
The positives, then, are that we have our billions of years in a time_t and we have nano-second granularity.
The downside is that there is no longer a reference to the time_t, the notional st_atime etc., in the API.
Most systems have:
#define st_atime st_atim.tv_sec
but, as we know, no traces of the C pre-processor are left in the object file.
That means that idio-c-api-gen cannot generate any such references.
Of course, that’s easy enough to fix when we’re patching up the generated code for other reasons. We can manually write the code to declare and use the extra symbols, st_atime etc., and add extra clauses in the struct accessor primitives to handle the extra symbols and, of course, we can correctly return the value with the constructor idio_libc_time_t.
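The extra accessor clauses can be sketched in plain C. stat_legacy_time is a hypothetical helper that answers st_atime, st_mtime and st_ctime by name from the tv_sec of the real st_atim family. This mirrors what the missing #define would have done. The sketch assumes POSIX.1-2008 member names as on Linux/glibc, so Mac OS X differs as noted under Oddities. Idio's real primitives return IDIO values rather than a time_t.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical accessor: look up one of the legacy time_t "members"
 * of a struct stat by name.  They are derived from tv_sec of the real
 * struct timespec members because the #define that provides them
 * never reaches the object file. */
static bool stat_legacy_time (const struct stat *sb, const char *name,
                              time_t *secs)
{
    if (strcmp (name, "st_atime") == 0) {
        *secs = sb->st_atim.tv_sec;
    } else if (strcmp (name, "st_mtime") == 0) {
        *secs = sb->st_mtim.tv_sec;
    } else if (strcmp (name, "st_ctime") == 0) {
        *secs = sb->st_ctim.tv_sec;
    } else {
        return false;           /* not one of the derived members */
    }

    return true;
}
```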
Oddities¶
Mac OS X chooses to be different and uses st_atimespec etc. as the member names.
OpenIndiana uses a timespec_t, which ought to cause us a problem as we don’t use a timespec_t in the source file so no typedef mapping is created. Luckily, it has typedef’d struct timespec as timespec_t and so the nominal C API code which accesses statp->st_atim etc. just works.
Many systems don’t typedef a suseconds_t for a struct timeval (as used by gettimeofday(2) and getrusage(2)) even though they seem to get most of the way with C macros for __suseconds_t_defined or _SUSECONDS_T_DECLARED.
Evolution¶
In the first instance, muggins, here, wrote the libc interfaces by hand in src/libc-wrap.c. That was the initial prompt to automate the process, as I was getting fed up trying to figure out what a pid_t was on my collection of test systems.
I can then run idio-c-api-gen for libc and take a copy of the resultant gen/libc-api.c and refashion it to replace the interfaces in src/libc-wrap.c.
Refashioning for me consisted largely of replacing the likes of __pid_t with pid_t and query-replacing the interface argument names.
Interfaces to the likes of stat(2) require more involvement as the user wouldn’t be supplying a struct stat and we want to return the (suitably tagged) struct stat back to the user rather than the int that stat(2) returns.
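In plain C, such an interface might look like the sketch below. The caller supplies only the path and gets back an allocated struct stat, or NULL with errno set. stat_returning_struct is a hypothetical name. An Idio primitive would presumably raise a condition rather than return NULL.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical wrapper: the struct stat is allocated here rather than
 * by the caller and is returned instead of stat(2)'s int.  Returns
 * NULL, with errno set, on failure.  The caller frees the result. */
static struct stat *stat_returning_struct (const char *path)
{
    struct stat *sb = malloc (sizeof (*sb));

    if (sb == NULL) {
        return NULL;            /* POSIX malloc sets errno to ENOMEM */
    }

    if (stat (path, sb) == -1) {
        int saved = errno;      /* preserve stat's errno across free */
        free (sb);
        errno = saved;
        return NULL;
    }

    return sb;
}
```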
Interfaces to the likes of getcwd(3) require different error tests and something like mkstemp(3) requires even more fiddling to return the open file descriptor and the name of the file (from the modified template passed in).
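mkstemp(3) is awkward because it produces two results. The open file descriptor is its return value, and the file name is produced by overwriting the template in place. By contrast, getcwd(3) signals failure by returning NULL rather than -1, hence its different error test. The C sketch below bundles both mkstemp results together. mkstemp_result, mkstemp_with_name and mkstemp_result_discard are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result pairing the two things mkstemp(3) produces. */
struct mkstemp_result {
    int fd;                     /* the open file descriptor */
    char *name;                 /* malloc'd copy of the modified template */
};

/* Returns 0 on success, -1 (errno set) on failure.  The caller's
 * template is left untouched; mkstemp(3) edits a private copy. */
static int mkstemp_with_name (const char *tmpl, struct mkstemp_result *r)
{
    char *copy = strdup (tmpl);

    if (copy == NULL) {
        return -1;
    }

    int fd = mkstemp (copy);
    if (fd == -1) {
        int saved = errno;      /* preserve mkstemp's errno across free */
        free (copy);
        errno = saved;
        return -1;
    }

    r->fd = fd;
    r->name = copy;
    return 0;
}

/* Close and remove the file, then release the name. */
static void mkstemp_result_discard (struct mkstemp_result *r)
{
    close (r->fd);
    unlink (r->name);
    free (r->name);
    r->fd = -1;
    r->name = NULL;
}
```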
Thus src/libc-api.c requires the definitions in src/libc-api.h to build and, once I’d rejigged all the callers (think all the libc interfaces in lib/job-control.idio), idio requires the definitions in lib/libc-api.idio to run.
So, src/libc-api.* are great for me on this box. But, whilst src/libc-api.c has been manually tweaked to use the nominal C API, src/libc-api.h and lib/libc-api.idio are full of system-specific definitions.
I can’t check those into source control as they’re simply wrong for anyone else.
Build Bootstrap II¶
Well, they’re wrong but not too wrong, which lets us play a trick.
Let’s put a copy of whatever I’ve generated here, on my dev system, in a src/build-bootstrap directory which all other systems will use to get going.
We can say that bin/idio depends on a locally created src/libc-api.h and lib/libc-api.idio and have a specific rule to create those.
That specific rule can change the include paths for both the C compiler and bin/idio such that they use the src/build-bootstrap directories just for long enough to run idio-c-api-gen.
Compiling this bootstrap version is likely to generate some warnings about overflow and implicit constant conversions and others. We care deeply about this and… Look! A squirrel!
Having run idio-c-api-gen on this system we will have generated correct typedef mappings in src/libc-api.h and lib/libc-api.idio, and make should convince itself to rebuild idio because a header file has changed (technically, appeared).
src/libc-api.c was refashioned to use the nominal C API so requires no adjustment on any other system.
Last built at 2024-12-21T07:10:59Z+0000 from 463152b (dev)