Subtyping

There was a post on r/programminglanguages about a new subtyping algorithm. I've been looking for something like this so I did take a look at their demo.

I was pleasantly surprised at the results on the screen. It matched close to how I would analyse the sample programs verbally. I had the feeling that I have found the typing system for my compiler framework.

It's been a rough ride trying to understand the paper and implement it into an usable module. I'm slowly starting to understand it and it's turning out to be brilliant.

The subtyping library based on this wouldn't need to have any awareness about the specifics of types it solves, it could get the information in the input.

So how would it work in practice? Consider the following implementation of factorial written into extended basic blocks:

entry(n):
  fac = constant 1
  i   = constant 2
  jump loop(fac, i, n)
loop(fac, i, n):
  cond = le i, n
  if cond then exit(fac)
  fac = mul fac, i
  v   = constant 1
  i   = add i, v
  jump loop(fac, i, n)
exit(fac):
  return fac

All lever programs can be translated into the above form. It is not the form where they are represented for the virtual machine that would normally run them, but there may be a reason to converge the two formats eventually. The reason might be simple as making it possible to store the type inference results in the modules.

It is important to understand that for every free variable in the subtyping algorithm, there is a separate idea of how the inputs and outputs of that variable are constrained. The operations that constrain the output aren't going to constrain the input, and vice versa.

The implementation of the algorithm uses type automatons, but we can also handle it on the paper just as well. Let's start and type-inference the entry block.

entry(n-):
  fac = constant 1 +<uint>
  i   = constant 2 +<uint>
  jump loop(fac+, i+, n+)

The + stands for output and the - stands for input. The explanation for the above output is that the positive integer constants have a natural output constraint of <uint>. Note that this is in SSA form, so we have sufficiently typed this program so far.

There is no information about the n so it is passed to the loop unconstrained.

The jump into another part of a function is treated like an ordinary function call, except that there does not exist an output variable. Therefore there's a fac+ ≤ fac- constraint in the program at this point. Now we are ready to inference the types for the loop block:

loop(fac -<uint>, i -<uint>, n):
  cond = le i, n <?>
  if cond -<bool> then exit(fac+)
  fac = mul fac, i <?>
  v   = constant 1 +<uint>
  i   = add i, v <?>
  jump loop(fac+, i+, n+)

I typed this as far as I could, but we can see there are still some gaps in our type information. To solve those gaps we are no longer in the problem domain of the subtyping module.

Given that the behavior of multimethods in Lever is very predictable, we can solve all of the types in the programs with what we already know.

In the case of le we have to do a bit more analysis than usual. The le multimethod table looks like this for the given input:

[int, int] -> bool

That'd seem we have solved the problem, but what if our type isn't an integer? What happens then? The answer is that we call default, which is doing the following:

try find method cmp(a, b), it if succeeds,
  return cmp(a, b) <= 0
otherwise a, b = coerce(a, b) and re-attempt
  without calling .default again.

The cmp() table is empty, but in the future it may also contain [int, int]. It doesn't matter because the first constraint subsumes it.

What does coerce() give? It coerces the arguments. It has the following implementations:

[int, float] -> [float, float]
[float, int] -> [float, float]

So we may also have to consider the [float, float] case here. So our final type signatures are:

[int, int]   -> bool  (int_lt)
[int, float] -> bool  (float_lt . coerce)

Merging these our lt would constraint to:

(int, int | float) -> bool

The mul and add are very simple! There is the int_add that matches to the all integer types and has the system-defined type signature (a, a) -> a.

So now we have our constraints for the loop block:

loop(fac -<uint>, i -<uint>, n -<number>):
  cond = le(i, n) +<bool>
  if cond -<bool> then exit(fac+)
  fac = mul fac, i +<uint>
  v   = constant 1 +<uint>
  i   = add i, v +<uint>
  jump loop(fac+, i+, n+)

The typing system has the rule that negative types cannot hold | types, so we cannot type n with the information we have. If we have <number> up in our subtype hierarchy above the integers and floats, we can use that though.

And then we can finally also type the exit:

exit(fac -<uint>):
  return fac+

We have the type signature for our factorial now. It is (uint, number) -> uint. This answer can seem like nonsense because Lever doesn't have a number type for floats and integers. But the type system is partially separate from the Lever language though, and it can have whichever abstractions that are necessary for typing the program.

If we don't have the number abstraction in our type system, we will get the type (uint, T) -> uint for this factorial. That would mean the second argument can be anything, so some would conclude that our type inference has failed. Depending on what we do and seek for, these different results can be useful.

If we are crazy, we might even describe type cmp<uint, *> which gets resolved if the fitting subtype is given. The arguments would collect the constraints.. arith<y, *> & cmp<uint, *> and so on as the program is read. In some cases this kind of scheme would cause a state explosion though.

The subtyping group restriction is present because the mixed & and | notations would be frustrating to work with, so they are restricted to their "polarities" as a tradeoff, giving this kind of polar type system. They were chosed based on what kind of constraints the programs tend to gather as they search out the constraints present in the program.

On the input you have to know the values the function requires and on the output you look for all the possible values a function can return. So this is a fairly nice system from that perspective as long as you're careful with the polarities.

I still have few open questions about how to design the type system around of the whole language around these concepts, but if I can use this I may be looking into the next paradigm change where a programming language is no longer defined through whether it supports dynamic or static typing.

Similar posts