Thoughts on Writing a Compiler

Last week I wrote a compiler for my language. It translates the source code into bytecode so it can be evaluated by a runtime written in C. Now I share some of the thoughts I caught.

I realised the compiler could be written entirely in my own language. Both the specializer and the bytecode compilers. The bytecode compiler could be always load from the bytecode, and the specializer could specialize itself. This kind of organization would make the language easier to modify and adjust.

I wrote my bytecode and IR to obey the SSA form, with substitution operation for representing store into upscope variables and for code that isn't analyzed yet. All the operations are stored in blocks such that the program flow enters from the top and exits from the bottom. I am not sure how useful properties these are yet, but I guess I will discover more about it soon enough.

To compile the parse tree into bytecode, it seems to be practical to provide an object that works as a state for the builder. The builder can tell at any time the context for compiling, which is useful for example when compiling return, break, continue -statements. Otherwise the code to compile the program flow would end up being hard to understand. I do not compile scope yet, but it seems that it's best to compile one function at a time, so there would be full scope information available for compiling the nested functions.

The point in optimizing a program is to make the transformations to code, that will improve performance but not affect the behavior of the program. The very simplest operations, such as removing an unnecessary jump in the program can be done directly. To have better optimizations it seems that the program needs to be analysed to prove that nothing bad happens if you rewrite or eliminate some code, or leave away a test, or let the two values use same register.

The postorder traversal through the flow graph is, interestingly, one thing that I ended up using often. It's useful if you're doing linear register allocation, determining variable flow or just doing abstract interpretation on the code. It might have something to do with an observation that the immediate dominator of every node in the graph ends up being after the node in the list produced by the traversal.

The dominance seems to be useful concept in overall. It tells which nodes are for sure visited when one arrives into a node. The SSA seems more like a way to realise things based on dominance, rather than being an optimization in itself. Overall where the program arrives from into the node seems to be really interesting for optimization. I see myself needing immediate predecessors table as often as I need the postorder traversal through the flowgraph.

I'll publish the code when I have written the runtime for my language. It should appear into the language repository.

Similar posts