Compiling with Continuation Passing Style
I'm about to describe an insanely effective way to implement a self-hosting dynamic programming language. Effective in the sense that it brings the language self-hosting in matter of weeks or days.
The outline of this compilation strategy is described in the Cheney on the M.T.A. -paper. The method to convert the program into continuation passing style has been explained by Matt Might. At the first try I didn't understood this explanation though, so I believe it deserves my attempt at explanation.
Outline
This compilation strategy is suitable for a lexically scoped programming language that passes function call arguments by value, and may have complex control flow constructs such as greenlets, bactracking or exceptions.
The source code is translated into continuation passing style, which can be described roughly by this language:
call -> (parameter parameter*)parameter -> (lambda (variable*) call)parameter -> variable
Once in CPS form, the computation can be decomposed into blocks, divided into following pieces:
[upscope variables][argument variables][new variables][closure instantiation][variable assignments] [inline operations][call with continuation]
Once compiled to C, all constructs are allocated in the stack. Once the stack starts to fill up the GC is invoked to copy the live objects into heap and flush the stack.
Translation to Continuation Passing Style
I did this slightly harder at the first time. The technique that I used introduced (lambda (a) (k a)) - objects, which are entirely unnecessary and needed a coalescing -operation to remove. Anyway, here's how Matt Might proposes to do it:
function compile(expr, k)expr is an expressionk is a continuation or functionIf expression is symbolretrieve(k, expr)If expression is lambda special formadd continuation variable 'cont'into the special formcompile the body with theintroduced continuationretrieve(k, lambda)If expression is callcreate empty list 'args'create list 'seq', consisting fromparameters of the callnext_parameter is a function with argument 'a'push 'a' to list 'args'if no parameters left in seqbuild a function callusing 'args' and lift(k)otherwisecompile(take next parameter from 'seq',next_parameter)compile(take first parameter from 'seq',next_parameter)function retrieve(k, a)k is a continuation or functiona is a valid cps parameterif continuation then build afunction call(k, a)otherwise call k(a) and return the resultfunction lift(k):k is continuation or functionif continuation, just return the continuationotherwise build lambda withargument 'a' and body k(a)
In short, the function is called with the expression, accompanied by a continuation or a function which produces a call when you give it a valid cps parameter. The functions lift and retrieve are used to control what kind of construct is built. This way, the code never needs to create superfluous continuations that'd have to be pruned out of the code.
Compiling step: CPS to C
To make the compiling easy, the CPS must not contain cycles. It can be made sure by binding loop bodies into variables, so the body doesn't introduce a cycle. The first step is to define a SHADOW -set for every function.
SHADOW- consists of all variables thatare produced by the function.[argument variables]+[new variables]
If you've been doing some booking during the CPS pass, you already have the shadow set and you don't need to compute it yourself.
Next one needs to compute variables that must be fetched from the upscope. It happens by roughly following rules:
given a lambda- find the upscope sets of the lambdasand calls connected to this lambda.- union them together, discard everyvariable in the SHADOW set of this lambda.
Now you have everything you need to construct the equivalent set of C -functions. From here on you don't need to do anything else but give every lambda a name, plot them into a C-template. Then dump each function's header and the function itself into a .c -file, accompanied by the header file.
Compiling generic
This technique doesn't depend on C. There is practically no need to compiling into C. Once the upscope has been computed into every lambda, you can convert them into blocks that I described under the outline -subtitle. That might be your intermediate form, reserving a way for various ahead-of-time or just-in-time -optimizations. The SSA -forms are said to be equivalent to a subset of CPS -forms.
The C compiling step is useful for bootstrapping the language. It's simple and leaves tons of pathways open. You won't end up coding up anything that'd force your new language into a fixed path. When you need to get your next dynamic programming language up and alive, this is just the perfect way to do it.