Further Challenges in Generating C Bindings

After parsing the output from gcc, I ran into more challenges in generating bindings that work well for my language. C header files are a really inaccessible description of an API, despite being standardized and plain text.

Macros need some parsing too, and then some more

In C files it's not unusual to store important constants in macros. Such a macro is an expression that folds into the correct constant. Macros are defined as rewriting rules that expand into C, so it would be very difficult to convert all of them. I was able to parse some of these macros with the same parser I used for constants appearing in arrays and enumerations.

GCC doesn't retain the order of the macros in its debug output, so I needed separate parsing and evaluation stages to retrieve the constants. Macro constants can depend on the enumerated constants, so macro parsing happens after the actual parsing.
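The two stages can be sketched roughly as follows. This is a minimal illustration and not the parser used for the actual bindings: it borrows Python's ast module, so it only covers integer expressions that happen to read the same in C and Python, and the names resolve_macros, evaluate and Unresolved are made up for the example. Every macro is parsed first, then evaluated repeatedly against the known enum constants until nothing more resolves, so the order of the macros does not matter.

```python
import ast
import operator

# Integer operators that mean the same thing in C and in Python.
BINOPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.LShift: operator.lshift, ast.RShift: operator.rshift,
    ast.BitOr: operator.or_, ast.BitAnd: operator.and_,
    ast.BitXor: operator.xor,
}
UNOPS = {ast.USub: operator.neg, ast.Invert: operator.invert}


class Unresolved(Exception):
    """Raised when an expression refers to a name that has no value yet."""


def evaluate(node, env):
    # Fold a parsed expression into an integer, looking names up in env.
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        if node.id not in env:
            raise Unresolved(node.id)
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNOPS:
        return UNOPS[type(node.op)](evaluate(node.operand, env))
    raise ValueError("unsupported construct")


def resolve_macros(macros, enums):
    # Stage 1: parse every macro body; bodies that are not expressions are skipped.
    parsed = {}
    for name, body in macros.items():
        try:
            parsed[name] = ast.parse(body.strip(), mode="eval")
        except SyntaxError:
            pass
    # Stage 2: evaluate in passes until a pass resolves nothing new.
    env = dict(enums)
    progress = True
    while progress:
        progress = False
        for name in list(parsed):
            try:
                env[name] = evaluate(parsed[name], env)
            except Unresolved:
                continue  # depends on something not known yet, retry later
            except ValueError:
                del parsed[name]  # uses a construct this sketch cannot fold
                continue
            del parsed[name]
            progress = True
    return env


macros = {
    "FLAG_B": "FLAG_A << 1",       # listed before the macro it depends on
    "FLAG_A": "1 << BASE_SHIFT",   # depends on an enum constant
    "MASK": "(FLAG_A | FLAG_B)",
    "CALL": "do { } while (0)",    # not a constant, silently dropped
}
constants = resolve_macros(macros, {"BASE_SHIFT": 4})
```

Macros that never resolve simply stay out of the result, which matches the idea that only some of the macros can be converted.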

Partial Evaluation

Most of the time I tried to evaluate constants as soon as I encountered them. Some C headers define constants as the size of another structure. The problem is that I can't fix the size of structures at this point, as the API specification should be portable, if possible. I decided to produce a partially evaluated expression whenever I hit these constructs. I drop a lot of the constants and type definitions I parse, so partial evaluation lets me acknowledge the constructs without having to deal with them right away.
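To illustrate what partial evaluation means here, the sketch below folds every subexpression whose operands are known integers and leaves anything that touches a structure size as a symbolic node. The classes SizeOf and BinOp, the function partial_eval and the type name some_struct are hypothetical and only serve the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeOf:
    # Size of a C type, unknown until a concrete platform is chosen.
    type_name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "<<": lambda a, b: a << b,
    "|": lambda a, b: a | b,
}


def partial_eval(expr):
    # Integers and sizeof nodes are already as simple as they can get.
    if isinstance(expr, (int, SizeOf)):
        return expr
    left = partial_eval(expr.left)
    right = partial_eval(expr.right)
    # Fold only when both sides are known integers.
    if isinstance(left, int) and isinstance(right, int):
        return OPS[expr.op](left, right)
    # Otherwise keep a smaller symbolic expression for a later stage.
    return BinOp(expr.op, left, right)


# (2 * 4) + sizeof(struct some_struct) keeps the sizeof but folds 2 * 4.
residual = partial_eval(BinOp("+", BinOp("*", 2, 4), SizeOf("some_struct")))
```

The residual expression can be written out as it is or dropped later, so nothing forces a platform-specific size into the portable description.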

Namespace Restriction

Once parsed, the header files contain a lot more than the API. I solved this by going through the functions and using a regex to select the ones I generate bindings for. It would also be quite goofy to keep separate namespaces for unions and structures. I solved this by scanning through every function and structure I'm writing out. If there's a typedef that produces an alias for the structure in the global space, I coalesce the structure to hold that name. If there's nothing of the same name in the global space, I drop the structure out of the struct space and it keeps its plain name. Otherwise I prefix its name with 'struct_'.
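The selection and renaming rules could look roughly like this. It is a sketch under assumptions: the data shapes (plain dicts and sets), the function names select_functions and flatten_struct_names, and the example declarations are all invented for illustration, and the middle rule is read as "a struct whose tag is unused in the global space keeps its plain name".

```python
import re


def select_functions(functions, pattern):
    # Keep only the functions whose name matches the API pattern.
    regex = re.compile(pattern)
    return {name: sig for name, sig in functions.items() if regex.match(name)}


def flatten_struct_names(struct_tags, typedefs, global_names):
    """Map each struct tag to the single name it gets in the bindings.

    struct_tags:  tags living in the C struct namespace
    typedefs:     alias name -> struct tag it refers to
    global_names: names already taken in the global namespace
    """
    # Sorted so the result is deterministic if a tag has several aliases.
    alias_of = {tag: alias for alias, tag in sorted(typedefs.items())}
    result = {}
    for tag in sorted(struct_tags):
        if tag in alias_of:
            result[tag] = alias_of[tag]      # coalesce with the typedef alias
        elif tag not in global_names:
            result[tag] = tag                # no clash, keep the plain name
        else:
            result[tag] = "struct_" + tag    # clash, disambiguate
    return result


functions = {
    "SDL_Init": "int (Uint32)",
    "SDL_Quit": "void (void)",
    "printf": "int (const char *, ...)",
}
api = select_functions(functions, r"SDL_")

names = flatten_struct_names(
    struct_tags={"rect", "timeval", "stat"},
    typedefs={"Rect_t": "rect"},
    global_names={"Rect_t", "stat"},  # stat is also a function name
)
```

The stat case shows why the prefix is needed at all: C lets a struct and a function share a name, while a flat namespace cannot.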

Namespace Reconstruction

Namespace prefixes helped me recognise the functions that are part of the API, but it would be frustrating to have to write sdl.SDL_Init instead of just sdl.Init. I rewrite every name using a regex substitution. I have to verify that the substitution doesn't produce a namespace collision.
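A minimal sketch of the substitution with its collision check, assuming the names arrive as a plain list. The function name strip_prefix and the choice to raise an error on a collision are assumptions made for the example.

```python
import re


def strip_prefix(names, pattern, replacement=""):
    # Returns a mapping from the short binding name to the original C name.
    renamed = {}
    for name in names:
        short = re.sub(pattern, replacement, name)
        if not short.isidentifier():
            raise ValueError(f"{name} turns into invalid name {short!r}")
        if short in renamed:
            # Two different C names would end up sharing one binding name.
            raise ValueError(
                f"{renamed[short]} and {name} both map to {short}")
        renamed[short] = name
    return renamed


bindings = strip_prefix(["SDL_Init", "SDL_Quit"], r"^SDL_")
```

With this, sdl.Init can be looked up in bindings and still resolve to the SDL_Init symbol in the shared library.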

Finally

Linus Torvalds makes sure not to break the ABI and userspace of Linux. Library API definitions written as C/C++ headers make sure we don't need negligence from Linus to break the userspace.

I wish C weren't the standard for describing how to interface with shared libraries. Considering the effort required to generate bindings, it is nearly unreadable for non-C programmers. For a similar reason it is also easy to design a C API which doesn't stay consistent across platforms. We are bullying ourselves here.
