Compiling with Continuation Passing Style
I'm about to describe an insanely effective way to implement a self-hosting dynamic programming language. Effective in the sense that it brings the language self-hosting in matter of weeks or days.
The outline of this compilation strategy is described in the Cheney on the M.T.A. -paper. The method to convert the program into continuation passing style has been explained by Matt Might. At the first try I didn't understood this explanation though, so I believe it deserves my attempt at explanation.
Outline
This compilation strategy is suitable for a lexically scoped programming language that passes function call arguments by value, and may have complex control flow constructs such as greenlets, bactracking or exceptions.
The source code is translated into continuation passing style, which can be described roughly by this language:
call -> (parameter parameter*)
parameter -> (lambda (variable*) call)
parameter -> variable
Once in CPS form, the computation can be decomposed into blocks, divided into following pieces:
[upscope variables]
[argument variables]
[new variables]
[closure instantiation]
[variable assignments] [inline operations]
[call with continuation]
Once compiled to C, all constructs are allocated in the stack. Once the stack starts to fill up the GC is invoked to copy the live objects into heap and flush the stack.
Translation to Continuation Passing Style
I did this slightly harder at the first time. The technique that I used introduced (lambda (a) (k a))
- objects, which are entirely unnecessary and needed a coalescing -operation to remove. Anyway, here's how Matt Might proposes to do it:
function compile(expr, k)
expr is an expression
k is a continuation or function
If expression is symbol
retrieve(k, expr)
If expression is lambda special form
add continuation variable 'cont'
into the special form
compile the body with the
introduced continuation
retrieve(k, lambda)
If expression is call
create empty list 'args'
create list 'seq', consisting from
parameters of the call
next_parameter is a function with argument 'a'
push 'a' to list 'args'
if no parameters left in seq
build a function call
using 'args' and lift(k)
otherwise
compile(
take next parameter from 'seq',
next_parameter)
compile(
take first parameter from 'seq',
next_parameter)
function retrieve(k, a)
k is a continuation or function
a is a valid cps parameter
if continuation then build a
function call(k, a)
otherwise call k(a) and return the result
function lift(k):
k is continuation or function
if continuation, just return the continuation
otherwise build lambda with
argument 'a' and body k(a)
In short, the function is called with the expression, accompanied by a continuation or a function which produces a call when you give it a valid cps parameter. The functions lift
and retrieve
are used to control what kind of construct is built. This way, the code never needs to create superfluous continuations that'd have to be pruned out of the code.
Compiling step: CPS to C
To make the compiling easy, the CPS must not contain cycles. It can be made sure by binding loop bodies into variables, so the body doesn't introduce a cycle. The first step is to define a SHADOW
-set for every function.
SHADOW
- consists of all variables that
are produced by the function.
[argument variables]+[new variables]
If you've been doing some booking during the CPS pass, you already have the shadow set and you don't need to compute it yourself.
Next one needs to compute variables that must be fetched from the upscope. It happens by roughly following rules:
given a lambda
- find the upscope sets of the lambdas
and calls connected to this lambda.
- union them together, discard every
variable in the SHADOW set of this lambda.
Now you have everything you need to construct the equivalent set of C -functions. From here on you don't need to do anything else but give every lambda a name, plot them into a C-template. Then dump each function's header and the function itself into a .c -file, accompanied by the header file.
Compiling generic
This technique doesn't depend on C. There is practically no need to compiling into C. Once the upscope has been computed into every lambda, you can convert them into blocks that I described under the outline -subtitle. That might be your intermediate form, reserving a way for various ahead-of-time or just-in-time -optimizations. The SSA -forms are said to be equivalent to a subset of CPS -forms.
The C compiling step is useful for bootstrapping the language. It's simple and leaves tons of pathways open. You won't end up coding up anything that'd force your new language into a fixed path. When you need to get your next dynamic programming language up and alive, this is just the perfect way to do it.