Subtyping
There was a post on r/programminglanguages about a new subtyping algorithm. I've been looking for something like this so I did take a look at their demo.
I was pleasantly surprised at the results on the screen. It matched close to how I would analyse the sample programs verbally. I had the feeling that I have found the typing system for my compiler framework.
It's been a rough ride trying to understand the paper and implement it into an usable module. I'm slowly starting to understand it and it's turning out to be brilliant.
The subtyping
library based on this wouldn't need to have
any awareness about the specifics of types it solves, it
could get the information in the input.
So how would it work in practice? Consider the following implementation of factorial written into extended basic blocks:
entry(n):
fac = constant 1
i = constant 2
jump loop(fac, i, n)
loop(fac, i, n):
cond = le i, n
if cond then exit(fac)
fac = mul fac, i
v = constant 1
i = add i, v
jump loop(fac, i, n)
exit(fac):
return fac
All lever programs can be translated into the above form. It is not the form where they are represented for the virtual machine that would normally run them, but there may be a reason to converge the two formats eventually. The reason might be simple as making it possible to store the type inference results in the modules.
It is important to understand that for every free variable in the subtyping algorithm, there is a separate idea of how the inputs and outputs of that variable are constrained. The operations that constrain the output aren't going to constrain the input, and vice versa.
The implementation of the algorithm uses type automatons, but we can also handle it on the paper just as well. Let's start and type-inference the entry block.
entry(n-):
fac = constant 1 +<uint>
i = constant 2 +<uint>
jump loop(fac+, i+, n+)
The +
stands for output and the -
stands for input. The
explanation for the above output is that the positive integer
constants have a natural output constraint of <uint>
. Note
that this is in SSA form, so we have sufficiently typed this
program so far.
There is no information about the n
so it is passed to the
loop unconstrained.
The jump into another part of a function is treated like an
ordinary function call, except that there does not exist an
output variable. Therefore there's a fac+ ≤ fac-
constraint in the program at this point. Now we are ready to
inference the types for the loop block:
loop(fac -<uint>, i -<uint>, n):
cond = le i, n <?>
if cond -<bool> then exit(fac+)
fac = mul fac, i <?>
v = constant 1 +<uint>
i = add i, v <?>
jump loop(fac+, i+, n+)
I typed this as far as I could, but we can see there are still some gaps in our type information. To solve those gaps we are no longer in the problem domain of the subtyping module.
Given that the behavior of multimethods in Lever is very predictable, we can solve all of the types in the programs with what we already know.
- We know that the first argument to
le
is integer here. - We know that the both arguments to
mul
andadd
are integers.
In the case of le
we have to do a bit more analysis than
usual. The le
multimethod table looks like this for the
given input:
[int, int] -> bool
That'd seem we have solved the problem, but what if our type isn't an integer? What happens then? The answer is that we call default, which is doing the following:
try find method cmp(a, b), it if succeeds,
return cmp(a, b) <= 0
otherwise a, b = coerce(a, b) and re-attempt
without calling .default again.
The cmp()
table is empty, but in the future it may also
contain [int, int]
. It doesn't matter because the first
constraint subsumes it.
What does coerce()
give? It coerces the arguments. It has
the following implementations:
[int, float] -> [float, float]
[float, int] -> [float, float]
So we may also have to consider the [float, float]
case
here. So our final type signatures are:
[int, int] -> bool (int_lt)
[int, float] -> bool (float_lt . coerce)
Merging these our lt
would constraint to:
(int, int | float) -> bool
The mul
and add
are very simple! There is the int_add
that matches to the all integer types and has the
system-defined type signature (a, a) -> a
.
So now we have our constraints for the loop block:
loop(fac -<uint>, i -<uint>, n -<number>):
cond = le(i, n) +<bool>
if cond -<bool> then exit(fac+)
fac = mul fac, i +<uint>
v = constant 1 +<uint>
i = add i, v +<uint>
jump loop(fac+, i+, n+)
The typing system has the rule that negative types cannot
hold |
types, so we cannot type n
with the information
we have. If we have <number>
up in our subtype hierarchy
above the integers and floats, we can use that though.
And then we can finally also type the exit
:
exit(fac -<uint>):
return fac+
We have the type signature for our factorial now. It is
(uint, number) -> uint
. This answer can seem like
nonsense because Lever doesn't have a number type for floats
and integers. But the type system is partially separate from the
Lever language though, and it can have whichever
abstractions that are necessary for typing the program.
If we don't have the number abstraction in our type system,
we will get the type (uint, T) -> uint
for this factorial.
That would mean the second argument can be anything, so some
would conclude that our type inference has failed. Depending
on what we do and seek for, these different results can be
useful.
If we are crazy, we might even describe type
cmp<uint, *>
which gets resolved if the fitting subtype is
given. The arguments would collect the constraints..
arith<y, *> & cmp<uint, *>
and so on as the program is
read. In some cases this kind of scheme would cause a state
explosion though.
The subtyping group restriction is present because the mixed &
and |
notations would be frustrating to work with, so they are
restricted to their "polarities" as a tradeoff, giving this
kind of polar type system. They were chosed based on what kind of
constraints the programs tend to gather as they search out
the constraints present in the program.
On the input you have to know the values the function requires and on the output you look for all the possible values a function can return. So this is a fairly nice system from that perspective as long as you're careful with the polarities.
I still have few open questions about how to design the type system around of the whole language around these concepts, but if I can use this I may be looking into the next paradigm change where a programming language is no longer defined through whether it supports dynamic or static typing.