Start of abstract interpretation in Lever

Today I provided introspection features to Lever:

import optable
optable.enc          # encoding table.
optable.dec          # decoding table.

function.spec.argc        # required arguments
function.spec.optional    # optionals, null allowed if missing.
function.spec.is_variadic # has variadic argument?
function.spec.varnames    # all variable names.
code = function.code
code.closure         # same as 'function.code'
code.module          # module for this function
code.is_generator    # is this a generator?
code.excs            # exception blocks
code.regc            # register count
code.localc          # locals count
code.length          # bytecode length
code[i]              # i:th byte in code
code.get_sourceloc(pc)
code.getupv(frame, index)
code.constant(index)

Equipped with these, I can do abstract interpretation on Lever byte code.

Using these you can dump and read the source code after it has been loaded into memory. This is the feature that made it possible for RPython to translate Python programs into machine code.

As soon as I got the introspection to work, I dumped the bytecode of this simple function:

(a, b):
    return a + b

Here's the bytecode printout:

0000: 0 = getloc 0(index)
0003: 1 = getloc 1(index)
0006: 2 = getglob "+"
0009: 3 = call 2 3 4
0014: return 1
0016: 5 = getglob "null"
0019: return 1

These instructions are internally represented in 16-bit unsigned integers. The first integer encodes the length in its lowest byte.

Please note that this might not be the most efficient format for interpretation, therefore the design focused on aspects such as on making it nice and enjoyable to use.

I can still remember how stubborn I was about the efficiency questions. The people working at PyPy insisted me that the bytecode was irrelevant for performance because the JIT performance really doesn't make any connection with specifics of the bytecode.

The rules on how the JIT transforms machine code can be thought as a partial evaluator that selects on what it partially evaluates. The bytecode is a constant, and therefore it is immediately elided away from every trace.

There is still a thing I am concerned about in the bytecode though. I believe it could further be simpler while fitting on what purpose it has. I believe the jumps and hoops and imitation of machine code with cond -instructions and exception handling lists might be entirely redundant and pointless.

I have already dug up my link lists and references for compiling and various instruction formats. In the few next months Lever runtime will get a translator and a compiler.

I will apply the translation framework to graphics programming and computer algebra systems.

Similar posts