MLsub subtyping in SSA IR
Stephen Dolan's work on a subtyping system seemed brilliant, but to really use it I had to gain an intuition into what is going on and how it works.
The moment I saw the demo in the above link, I felt the type annotations were different from the usual. They felt acceptable, humane, sane, correct, fresh and pleasing.
Dolan's thesis utilizes a lot of theory and presents proofs for the algorithms. It is a brilliant thesis but very hard to approach for laymen. Still, what is presented is appreciable. I hope this post builds up your intuition on the contents of that thesis.
I already attempted this earlier but my methods had severe flaws that hindered a faithful explanation of the important concepts. This post will hopefully do better.
Perspective on types
First, when understanding subtyping there is a useful perspective on what a type is, which probably isn't at the top of your mind when you begin reading this.
A type is a shorthand for a constraint. It represents a capability a value is required to have if it appears under that type.
An integer type requires that a value is able to represent an integer value and partake in integer arithmetic.
A record type requires that you can take some value out from a slot in the record, or store something into the slot of that record. It may have requirements set on the value it accepts.
A function type requires that you can call it with some values, and in return you get some value out. Just like the record, it may have requirements set on the values that it accepts.
To force the types to form a lattice, we additionally have a top and a bottom.
A top type does not assign requirements for a value. It can be anything at all.
A bottom type assigns all requirements for a value, including those we do not know about yet. In practice nothing can satisfy those requirements.
Here are their representations that we are going to use:
int integer type
{n} record type (has label 'n')
{n: t} record type (has label 'n' with type 't')
(a, b) -> c function type
top top type
bot bottom type
We assume that these are not the only types in our system; rather, these are the types we are aware of.
The type system we are about to build forms a lattice, so some of the terminology here comes from lattice theory.
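To make the discussion concrete, here is a minimal Python sketch of how these representations could be encoded as plain data. The class names and fields are my own choices for illustration, not anything prescribed by the thesis.

    from dataclasses import dataclass

    @dataclass
    class Int:                  # int, the integer type
        pass

    @dataclass
    class Record:               # {n: t, ...}, a record with labeled fields
        fields: dict            # label -> type

    @dataclass
    class Func:                 # (a, b) -> c, a function type
        args: tuple             # argument types
        result: object          # result type

    @dataclass
    class Top:                  # top, no requirements at all
        pass

    @dataclass
    class Bot:                  # bot, every requirement at once
        pass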
Type variables and subtyping constraints
When it is not clear what type constraints a value has, it is represented with a variable. In this illustration, lowercase letters are used to represent those variables.
Variables may be constrained with a subtyping relation, denoted with less-or-equal sign:
a <= b the 'a' must be a subtype of 'b'.
The type 'a' here is more constrained than the type 'b'. In layman's terms, this means that values of type 'a' are able to satisfy the requirements of type 'b'.
Type recursion
When a type variable occurs within its own type, this can be denoted by recursion:
µa.{next: a}
This notation resembles the function notation of lambda calculus, but the meaning is different. To get an idea of the meaning, let's unfold the recursion once:
{next: µa.{next: a}}
Although the representation unfolded, it is still the same type.
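For illustration, here is a toy Python encoding of a µ-type with an explicit binder, together with the one-step unfolding described above. The representation is my own and only meant to show that unfolding doesn't change the type.

    from dataclasses import dataclass

    @dataclass
    class Var:                  # a type variable such as 'a'
        name: str

    @dataclass
    class Record:               # {next: t}
        fields: dict

    @dataclass
    class Mu:                   # µa.body
        var: str
        body: object

    def unfold(t):
        # Replace the bound variable with the whole µ-type, once.
        def subst(u):
            if isinstance(u, Var) and u.name == t.var:
                return t
            if isinstance(u, Record):
                return Record({k: subst(v) for k, v in u.fields.items()})
            return u
        return subst(t.body)

    # unfold(µa.{next: a})  ==  {next: µa.{next: a}}
    linked = Mu("a", Record({"next": Var("a")}))
    print(unfold(linked))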
Meet and Join
Types can be combined without restrictions; after all, they are constraints that demand abilities from values.
The meet operator takes two types and forms a type that requires that a value has capabilities of both types.
a ∧ b the meet operator, resulting type has
capabilities of both.
The meet is also called a conjunction of upper bound constraints because it satisfies the following:
x <= a and x <= b iff x <= a ∧ b
The join operator similarly takes two types and forms a type that only has the capabilities both of those types have.
a ∨ b the join operator, resulting type has
only capabilities both of 'a' and 'b' have.
The join is also called a conjunction of lower bound constraints because it satisfies the following:
a <= x and b <= x iff a ∨ b <= x
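Read operationally, the two 'iff' rules mean that a meet on the right-hand side of a constraint, or a join on the left-hand side, can always be split into simpler constraints. Here is a small sketch of that decomposition, with my own Meet/Join encoding:

    from dataclasses import dataclass

    @dataclass
    class Meet:                 # a ∧ b
        left: object
        right: object

    @dataclass
    class Join:                 # a ∨ b
        left: object
        right: object

    def decompose(lhs, rhs):
        # Split one constraint lhs <= rhs into equivalent smaller ones.
        if isinstance(rhs, Meet):       # x <= a ∧ b  iff  x <= a and x <= b
            return decompose(lhs, rhs.left) + decompose(lhs, rhs.right)
        if isinstance(lhs, Join):       # a ∨ b <= x  iff  a <= x and b <= x
            return decompose(lhs.left, rhs) + decompose(lhs.right, rhs)
        return [(lhs, rhs)]

    # decompose("int", Meet("a", "b"))  ==  [("int", "a"), ("int", "b")]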
Every meet and join is valid, no matter which types they represent. They may come as valid results from the type inference engine. It is a different matter whether they resolve to a type that has been implemented on the machine.
To give you a grasp of what is going on, I'll shove some examples down your throat.
(a -> b) ∧ {x: c}
This could be satisfied by a record that is also callable. After all, you can make this kind of object in Python. This next one, on the other hand:
(a -> b) ∧ {x: c} ∧ int
This could be satisfied by a record that is a callable, and you can also do integer arithmetic on it. I doubt anyone would want that. The type system doesn't get stuck on this, but the implementation is able to tell that it cannot implement this kind of type.
Joined types can also resolve to valid types. The compiler could resolve such a join into a tagged pointer that you access with a type promotion later.
Could such join types cause problems with this type system? Before answering that, let's continue a little further.
Subtyping relations between types
Some of these types are naturally valid subtypes of each other.
Every type in the system is a valid subtype of top. And bot is a valid subtype of every type in the system.
The other subtype relations are associated with the types themselves:
{m, n} <= {n}
(a) -> c <= (d) -> f iff c <= f and d <= a
The meaning is roughly that records may provide more fields than are consumed. Functions match if they have the same shape, provide at minimum what is required, and require at most what is given.
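These rules can be written down as a structural check. Below is a rough sketch over the representations from the earlier sketch (Int, Record, Func, Top, Bot), with record fields taken to be covariant; this only states the relation, it is not the algorithm the thesis actually runs.

    def is_subtype(a, b):
        # top accepts everything, bot satisfies everything
        if isinstance(b, Top) or isinstance(a, Bot):
            return True
        if isinstance(a, Int) and isinstance(b, Int):
            return True
        if isinstance(a, Record) and isinstance(b, Record):
            # {m, n} <= {n}: 'a' must provide every field that 'b' consumes
            return all(label in a.fields and is_subtype(a.fields[label], t)
                       for label, t in b.fields.items())
        if isinstance(a, Func) and isinstance(b, Func):
            # (a) -> c <= (d) -> f  iff  d <= a and c <= f
            return (len(a.args) == len(b.args)
                    and all(is_subtype(d, arg)
                            for arg, d in zip(a.args, b.args))
                    and is_subtype(a.result, b.result))
        return False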
Type inference
The type inference system defined in the MLsub thesis is based on the Hindley-Milner method, which solves the problem through unification.
Unification in the Hindley-Milner type system extracts equality constraints from the program and eliminates them by substituting into type expressions. MLsub does the same, but uses inequality constraints instead.
Meaning of subtyping constraints
Before going further, let's discuss an example of extracting inequality constraints from a program. Consider the following short program:
x = {m: 5, n: 2}
x.n
The first line assigns a record to a variable. This means that the variable x must accept such a record as an input; the constraint representing this is:
{m: int, n: int} <= x
The second line retrieves a field from the record in the variable. So the variable must provide such a field, which is represented with this:
x <= {n: y}
These are all the constraints that are provided. If we can
change the type of x
such that they are satisfied, we are
done here.
Eliminating inequality constraints
The solution can be found in the conjunction clauses of the join and meet operators:
x <= a and x <= b iff x <= a ∧ b
a <= x and b <= x iff a ∨ b <= x
Without going into details just yet, the idea is that we can transform inequality constraints into substitutions that produce types which satisfy them, like this:
{m: int, n: int} <= x
becomes...
x := {m: int, n: int} ∨ x
That is, x
becomes the join of the original x
and the
value which provided a lower boundary to it.
Likewise, the other constraint transforms into:
x <= {n: y}
becomes...
x := {n: y} ∧ x
And this substitution gives x a new upper boundary.
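As a sketch of how a solver could record these rewrites, here is one way to turn a constraint on a variable into a substitution entry instead of solving it in place. The TVar/Join/Meet names and the two dictionaries are my own bookkeeping, not the thesis' presentation.

    from dataclasses import dataclass

    @dataclass
    class TVar:                 # a type variable such as x
        name: str

    @dataclass
    class Join:                 # a ∨ b
        left: object
        right: object

    @dataclass
    class Meet:                 # a ∧ b
        left: object
        right: object

    def eliminate(lhs, rhs, lower, upper):
        # Turn one inequality into a substitution instead of solving it directly.
        if isinstance(rhs, TVar):
            # t <= x   becomes   x := t ∨ x   (a new lower boundary for x)
            lower[rhs.name] = Join(lhs, lower.get(rhs.name, rhs))
        elif isinstance(lhs, TVar):
            # x <= t   becomes   x := t ∧ x   (a new upper boundary for x)
            upper[lhs.name] = Meet(rhs, upper.get(lhs.name, lhs))

    lower, upper = {}, {}
    eliminate("{m: int, n: int}", TVar("x"), lower, upper)
    eliminate(TVar("x"), "{n: y}", lower, upper)
    # lower["x"] is now {m: int, n: int} ∨ x, and upper["x"] is {n: y} ∧ x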
Polar types
The upper boundary of a variable describes what can be passed in as a value, whereas the lower boundary tells what the value satisfies if it is retrieved.
A new upper boundary only affects inputs, whereas a new lower boundary only affects outputs. To make use of this, we will introduce type polarities.
The negative polar types:
-a variables
int integers
(+a, +b) -> -c functions
{n: -a} records
-a ∧ -b meet
top
The positive polar types:
+a variables
int integers
(-a, -b) -> +c functions
{n: +a} records
+a ∨ +b join
bot
Substitutions that introduce new upper boundaries only affect negative instances of variables, whereas the ones that introduce new lower boundaries only affect positive instances.
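In code, such a polarity-directed substitution could be a walk over the type that tracks the polarity of the position it is in, flips the polarity when it descends into a function argument, and only replaces the variable at occurrences of the requested polarity. A sketch, assuming the TVar, Join, Meet, Record and Func classes from the earlier sketches:

    def subst(t, var, replacement, want_positive, positive=True):
        # Replace 'var' with 'replacement', but only at occurrences whose
        # polarity matches 'want_positive'; 'positive' is the polarity of
        # the position we are currently in.
        if isinstance(t, TVar):
            return replacement if (t.name == var and positive == want_positive) else t
        if isinstance(t, Func):
            # function arguments flip the polarity, the result keeps it
            return Func(tuple(subst(a, var, replacement, want_positive, not positive)
                              for a in t.args),
                        subst(t.result, var, replacement, want_positive, positive))
        if isinstance(t, Record):
            return Record({k: subst(v, var, replacement, want_positive, positive)
                           for k, v in t.fields.items()})
        if isinstance(t, (Join, Meet)):
            return type(t)(subst(t.left, var, replacement, want_positive, positive),
                           subst(t.right, var, replacement, want_positive, positive))
        return t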
A full example, or two
Let's take a new program, but this time define it as a function:
argument z
x = {n: z}
y = x.n + 1
return x
It does nothing interesting, but this time it has a function signature of type (z) -> x, and the type inference should give us a more accurate signature.
The constraints that come out are:
{n: +z} <= -x
+x <= {n: -g}
+g <= int
int <= -y
It is important to note that these expressions have polarities too. The reason is that we also substitute into them.
The first constraint reveals the substitution
+x = {n: +z} ∨ +x
that we can use twice; the state ends up being:
function (-z) -> {n: +z} ∨ +x
{n: +z} ∨ +x <= {n: -g}
+g <= int
int <= -y
The topmost remaining constraint now decomposes into more:
{n: +z} <= {n: -g}
+x <= {n: -g}
One of the resulting constraints further decomposes to reveal a subtyping relation between z and g:
+z <= -g
+x <= {n: -g}
One substitution at a time we go forward...
function (-z) -> {n: +z} ∨ +x
+g = +z ∨ +g
+x <= {n: -g}
+g <= int
int <= -y
In the next step the +g and +x variables describe trivialities and can be erased:
function (-z) -> {n: +z} ∨ +x
-x = {n: -g} ∧ -x
+z ∨ +g <= int
int <= -y
The current substitution causes no results because the -x
no longer appears anywhere.
function (-z) -> {n: +z}
-z = int ∧ -z
int <= -y
Also, -y doesn't appear anywhere either, although it would reveal +y = int for the intermediary variable. We can finally retrieve the type signature:
(int ∧ z) -> {n: z}
The result is weird; did we go wrong somewhere? No. It is just surprising. It tells you that the value you pass into this function must be a valid integer and must have the properties required of what is passed out inside the record.
It's so correct that it ends up surprising you.
BUT WHAT IF?
Earlier in this article I discussed the possibility of using types produced by joins as a sort of "promotionable" type.
The situation could happen in the following kind of program:
1:
x = 10
goto 4
2:
x = {a: 5}
goto 4
3:
x = () -> 7
goto 4
4:
x = phi(x@3, x@2, x@1)
x + 1
x()
The x@4 variable in this example is demanded to be both a function and an integer at the same time, though it can only become one of them. The type constraints here would be:
int ∨ {a: int} ∨ (() -> int) <= x
x <= int ∧ () -> int
Are we in trouble?
int <= -x
{a: int} <= -x
(() -> int) <= -x
+x <= int
+x <= () -> int
Let's find out. The x doesn't appear in any of our inputs, so we can write out the substitution forms straight away; they won't be affected.
+x = int ∨ +x
+x = {a: int} ∨ +x
+x = (() -> int) ∨ +x
+x <= int
+x <= () -> int
And we can coalesce them, then substitute:
int ∨ {a: int} ∨ (() -> int) ∨ +x <= int
int ∨ {a: int} ∨ (() -> int) ∨ +x <= () -> int
It seems we have been caught. These constraints cannot be satisfied; the type inferencer would have to report a failure.
If we proceed despite that violation, we get some nonsensical constraints:
int <= int
(() -> int) <= int
{a: int} <= int
+x <= int
At this point something is definitely wrong, we are certainly unable to continue.
Could we put an explicit promotion command in between and have it work? Maybe. The biunification requires the constraints to have the +t <= -t polarity, but the integer is a valid subtype of the join.
It is probably best to let the type constraints mean what they mean, and use explicit box types or tagged unions if you want boxing.
Notes from the use in practice
If you implement a compiler based on this for a dynamically typed language, all of these things happen inside a translation unit. You have separate type systems in your runtime and C FFI; during compilation those types should be imported into the type environment.
The type environment exists inside the translation unit and ceases to be once the target code is generated.
C structures and their members are a possibly interesting case here. C structures differ from the types above in that byte offsets are required for accessing the members.
A little try:
x = {g: int(3) @4}
x.g
We have the constraints:
{g: int(3) @4} <= -x
+x <= {g: -y@?}
Outcome of the substitution:
{g: int(3) @4} <= {g: -y@?}
For this substitution to succeed we have to carry the offset into our attribute access. Therefore it is perfectly possible to use this subtyping scheme to produce subtyping-capable C structs and unions.
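A small sketch of how a struct-field head could carry that offset, so that the check either matches a fixed offset or fills in an open one; the encoding and names are mine:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class FieldHead:
        label: str                  # field name, e.g. 'g'
        kind: str                   # field type, e.g. 'int(3)'
        offset: Optional[int]       # byte offset, e.g. 4, or None for 'not known yet'

    def field_subtype(provider, consumer):
        # {g: int(3) @4} <= {g: y @?}: the labels must match, and the
        # consumer either already agrees on the offset or learns it here.
        if provider.label != consumer.label:
            return False
        if consumer.offset is None:
            consumer.offset = provider.offset   # the access inherits the offset
            return True
        return consumer.offset == provider.offset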
Type automaton
The naive implementation of the type inference here would produce very cluttered results. Dolan's thesis also provides a way to implement a type automaton graph data structure for solving these constraints.
The details of the type inference engine are potentially very complicated and hard to understand, so they will be covered next.
The types are represented as a certain kind of automaton for the biunification.
Every circle in the above picture is a state. There is a finite set of states. Each state has a polarity and zero or more head constructors.
The negative polarity states are colored black, and the heads of each state are written inside them in braces.
The heads represent types, and to represent type variables inside them they introduce labeled transitions from one state to another, drawn as solid lines.
The automaton doesn't contain type variables directly; they are represented implicitly as flow edges, the dotted lines.
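Concretely, a state could be a small record like the following sketch; the field names are my own:

    from dataclasses import dataclass, field

    @dataclass(eq=False)            # identity comparison, so states can live in sets
    class State:
        positive: bool              # True for an output (+) state, False for an input (-) state
        heads: set = field(default_factory=set)          # head constructors, e.g. int, {n}
        transitions: dict = field(default_factory=dict)  # label -> set of successor states
        flow: set = field(default_factory=set)           # flow edges to opposite-polarity states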
Biunification in the automaton
The biunification proceeds by merging states together. The
operation merge(q1, q2)
copies the heads, transitions and
flow edges of the q2
into q1
.
One potential source of confusion can originate from the fact that the automaton can become nondeterministic when states are merged. It appears that each state is simply allowed to have multiple transitions with the same label.
To get the details right, remember that the outcome of this biunification algorithm should be exactly the same as that of the earlier one.
The biunification algorithm as it is described in the thesis:
biunification of +p <= -q
Stop the biunify if it has already been done for p and q.
If there is some head x in H(+p) and some head y in H(-q) such that
x is not a subtype of y, then there is a type error.
Stop the biunify.
for each state -s that has a flow edge with +p
merge(-s, -q)
for each state +s that has a flow edge with -q
merge(+s, +p)
take each +j, -k where the following holds, i.e. they have the same label and the polarity does not change:
transition(+p, +j, label)
transition(-q, -k, label)
do
biunify(+j, -k)
take each -j, +k where the following holds, i.e. they have the same label and the polarity changes:
transition(+p, -j, label)
transition(-q, +k, label)
do
biunify(+k, -j)
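Here is a sketch of those steps in Python, reusing the State sketch from before. The helpers heads_compatible (the user-supplied subtype check between head constructors) and flips_polarity (whether a label is contravariant, like a function argument, or covariant, like a record field or function result) are my own names, not anything from the thesis.

    def merge(q1, q2):
        # Copy the heads, transitions and flow edges of q2 into q1.
        q1.heads |= q2.heads
        for label, succs in q2.transitions.items():
            q1.transitions.setdefault(label, set()).update(succs)
        for s in q2.flow:
            q1.flow.add(s)
            s.flow.add(q1)

    def biunify(p, q, heads_compatible, flips_polarity, seen=None):
        # Solve the constraint +p <= -q by mutating the automaton.
        seen = set() if seen is None else seen
        if (id(p), id(q)) in seen:              # already done for this pair
            return
        seen.add((id(p), id(q)))
        # every head of +p must be a subtype of every head of -q
        for x in p.heads:
            for y in q.heads:
                if not heads_compatible(x, y):
                    raise TypeError("%r is not a subtype of %r" % (x, y))
        for s in list(p.flow):                  # negative states that consume what +p provides
            merge(s, q)
        for s in list(q.flow):                  # positive states that feed into -q
            merge(s, p)
        for label, succs in p.transitions.items():
            for j in succs:
                for k in q.transitions.get(label, ()):
                    if flips_polarity(label):   # contravariant label: biunify(+k <= -j)
                        biunify(k, j, heads_compatible, flips_polarity, seen)
                    else:                       # covariant label: biunify(+j <= -k)
                        biunify(j, k, heads_compatible, flips_polarity, seen)

The two merge loops are the bisubstitution from before: the states reached through the flow edges stand for the occurrences of the implicit type variable, and copying q or p into them adds the new boundary to those occurrences.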
To understand this, and be able to trust it, we should be able to verify that this indeed eliminates the constraint through the substitution just like before. It is worthwhile to check that.
The first action, stopping biunification if it has already been done, works because once the constraint has been eliminated from the equation it can be ignored from then on.
The second rule, raising an error if the subtyping check fails, ensures correct typing, as seen in the experiment we made earlier.
The last rules of traversing transitions make sense as well. They decompose larger constraints into many smaller ones.
Only the merging is left to explain. Why does it modify the states at the ends of the flow edges? And what are the conditions for it to work correctly?
Variables of the same polarity cannot mix the join and meet operations, so it would seem that the substitution operation turns out to be some kind of merge.
Let's recall that the join-substitution is:
+a <= -b
+b = +b ∨ +a
And the meet-substitution is:
+a <= -b
-a = -b ∧ -a
The polarity of the occurrence that gets substituted is the opposite of the one that appears in the constraint. The flow edges point out exactly what is substituted.
What is a state in this automaton?
The flow edges form a bipartite graph because they cannot occur between states of the same polarity. What we used to call a type variable ends up being a biclique in that graph. A biclique is a subgraph where every vertex is connected to every vertex of the opposite kind. The type variables are just labels for data flow.
The states themselves would appear to be input and output ports for data flow. For each variable in the program there are two of them. The positive one is an output port, and the negative is an input port.
If you do not have an output port for the variable then it cannot be observed. If you do not have an input port for it then it is not writable.
Next we should hold on tight to the intuition we have so far and go through a few examples, this time with the automaton in the picture. I think it is enough to work through two examples:
+a ∨ {x} <= -b ∧ {y}
If you expand this, you will find that it produces the following:
+a <= -b
+a <= {y}
{x} <= -b
{x} <= {y}
This explains why the biunification is bidirectional. You can imagine a slack variable for every state, so a biunification in our automaton represents the above example whenever both states have some heads; the bisubstitution then has to work in both directions.
The first constraint represents the flow edge update. The last constraint represents the subtyping check. So we can conclude that the steps in the biunification algorithm actually match these facts.
The other case to consider is the one below:
+a <= {x: b}
+a <= {x: c}
As a result, you should get -a merging the two record states into itself and adding flow edges between them. The result should explain what those flow edges are and why a state can end up with many transition edges carrying the same label.
You may also notice that if your state does not represent a variable, then you can get away without creating a flow pair for the state.
At this point you should have very clear intuition about what everything means in this automaton.
Flow edges illustrate value flow
The flow edges appear between every input and output where the value given as input may appear in the output.
This should give you an intuition of how the constraints are eliminated by the automaton. They are just distributed such that an input cannot accept anything that another input downstream from it cannot accept. Additionally, the inputs are continuously linked together so that the program doesn't need to traverse the whole chain of states to do the restriction.
Likewise the same happens with outputs, but the meaning is inverted. Outputs are constrained such that they represent the intersected properties of types that may pass through.
Each biunification produces new flow relationships.
Transition merges
In every merge, the transitions of one state are copied into another. You will produce some of these transitions at every point where the program retrieves an attribute.
Having multiple transition edges with the same label is remotely similar to having the target states merged. The additional edges are redundant and do not create new kinds of types you could represent with the automaton, but they keep the merge operation simple.
Differences from Hindley-Milner type inference
If you didn't already notice, this typing scheme adds information as it goes: the graph grows on every constraint it solves. That is different from the unification algorithm.
Hindley-Milner's inference algorithm destroys information as it operates, and the graph it works on can shrink at every unification step.
The performance of a good implementation of this algorithm may be worse than that of a good implementation of Hindley-Milner.
But better quality of results, even if that quality is subjective, is something you would expect to cost a bit more.
Towards the practicalities
At this point you should have understood the algorithm itself pretty well. You should also understand that the ideas here could be applied in many ways and your way to apply them might turn out to be better than mine.
I implemented a subtyping inference engine for my intermediate representation. The source code given as input to this engine was dynamically typed, and it was required to be in SSA form.
So far I didn't get any benefit from the SSA requirement in implementing the type engine, but it helped with the implementation of the partial evaluator.
The type automaton described before became a piece of the translation unit. The unit consists of all the information that is needed to produce the target code.
One important detail is that my type inference engine is supposed to work in tandem with the partial evaluation step. It is the first step, and the purpose of that step is to remove the operations that can be computed straight away.
When the IR arrives in my translation unit, it contains Python-like functions for getattr, setattr and call. Each of those dispatches based on the types of its operands.
The partial evaluator must run substitutions to produce the final program. A getattr should transform into one thing, a setattr into another. But before they can do so, they must obtain some information about what their meaning is in the first place.
Every SSA IR instruction and phi node that can be passed around as a value gets a state flow pair. The phi functions are translated to constraints between their inputs and outputs.
Best approach to handling C structs and pointers?
All of the above may make you wonder: how would you really map C, with its structs and pointers, into this framework such that it doesn't become terrible to maintain?
How should structs be treated, and what about pointers? I think it really depends on how your language's semantics handle these concepts.
In my case it was really simple. The idea of a struct type included the idea that there's a pointer to the struct. The IR would not even pretend that structs could appear as direct objects. The attribute access would always require a struct, and the pointers would have to be dereferenced into structs before the attribute access would be allowed on them. This would only apply within the automaton though, outside the translation unit the C structs would be decorated with pointers as usual.
Each struct is also treated as a valid head. This allows the type engine to not enforce the offset constraint, yet describe the offset correctly to the generator by giving it all the structures that may arrive at an attribute access.
Getattrs that are informed that their base object is potentially a structure end up rewritten into load/access instructions. If the value of a load is not used in anything other than an extract, then it is erased and the extract is replaced by load/access. This behavior originated from SPIR-V, which was one of the target languages for my translator.
The pointers would get the same referencing/dereferencing behavior.
How about type promotion semantics?
The Vulkan community proposed that I start with explicit typing. In optimized programs the typing is often explicit because it matters for vectorization and such. For people looking for performance, implicit type promotions turn out to be unpleasant surprises.
I think I will provide type promotion as an option in the engine, but it won't be a type engine matter. Good type promotion semantics should rely on range analysis, and when range analysis comes into the picture we are completely out of the type inference's turf.
If the partial evaluator and range analyzer interacted, we could optimize some programs further, but the behavior of the compiler would become much more difficult to predict.
Treatment of head constructors?
One remaining question left to answer is the treatment of head constructors.
The head constructors are the only pieces that represent the constraints in the automaton. They are the only pieces that describe the type lattice and the rules that lattice follows.
Theoretically, when you output a value, you can produce a head constructor that has a shape such as {a,b,c}, giving it the attributes that the new structure obtains. When something has to be constrained, you produce a head constructor that defines the constraint, e.g. {b}.
As the type engine just adds the head constructors of one state to another, the above constraining works just fine.
In practice you can also provide a
merge_to(this, group, polarity)
-method which attempts to merge the head constructor into the
existing head constructors in the group.
Therefore you can provide the subtyping engine, and then require that the user of the engine provides the following interface to describe the rules of the types they are going to use:
is_subtype(x, y) true/false depending on whether x <= y
head.merge_to(group, polarity) allows the head to describe
compaction behavior when merged
into a set.
Just remember that the polarity changes the merging behavior significantly. For example, in records the + polarity should cause the attribute fields to be intersected, e.g. {a, b} ∨ {b, c} == {b} when two record head constructors become one record head constructor.
Likewise, the - polarity should cause the attribute fields to be unioned, e.g. {a, b} ∧ {b, c} == {a, b, c}.
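As a sketch of that interface, here is a record head whose merge_to intersects the field set under + polarity and unions it under -, together with a matching is_subtype; the class and method shapes are my own reading of the interface above, not a fixed API.

    class RecordHead:
        def __init__(self, fields):
            self.fields = set(fields)           # e.g. {'a', 'b'}

        def merge_to(self, group, positive):
            # Try to compact this head into an existing record head of the group.
            # Returns True when it merged, so the caller doesn't add it separately.
            for head in group:
                if isinstance(head, RecordHead):
                    if positive:    # + side: {a,b} ∨ {b,c} == {b}
                        head.fields &= self.fields
                    else:           # - side: {a,b} ∧ {b,c} == {a,b,c}
                        head.fields |= self.fields
                    return True
            return False

    def is_subtype(x, y):
        # {m, n} <= {n}: the provider must have every field the consumer requires.
        if isinstance(x, RecordHead) and isinstance(y, RecordHead):
            return y.fields <= x.fields
        return False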
I also tried to store the transition edges in the head constructors. Let me warn you: that won't be the same thing. The head constructor attributes may end up being removed, but the transitions in states won't ever be removed. The latter may be necessary to ensure that the constraining order doesn't matter.
The results
The awesome thing about this all is that the type inference engine made like this ends up being rather simple. A nice implementation easily ends up being under 1000 lines of code.
The current implementation converts the getattr/setattr/getitem/setitem directly into the appropriate access/load/store/extract instructions. The conversion could be done later in the code generation step, but it was a good idea to exercise the partial evaluator, even if the test program was the simple buffer filler from a few weeks ago.
Another nice discovery is that we can treat integer constants as a special subtype of integers, one that is a subtype of all integer-like types. Therefore you don't need to differentiate between integer or float constants of different bit widths, as they derive their properties from the environment they're put in.
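A sketch of how that could look as head constructors: a constant head that counts as a subtype of any sized integer head, so the constant adopts whatever width its context demands. The names and classes are mine.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class IntConst:             # an integer literal, such as the 1 in 'x + 1'
        value: int

    @dataclass(frozen=True)
    class SizedInt:             # a machine integer type, such as u32
        bits: int
        signed: bool

    def is_subtype(x, y):
        if isinstance(x, IntConst) and isinstance(y, SizedInt):
            # the constant takes its width from the context; a real
            # implementation might also check that the value fits.
            return True
        if isinstance(x, SizedInt) and isinstance(y, SizedInt):
            return x == y
        return False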
Even though I am using C types for translation, the language ends up being surprisingly strongly typed. It even hints a little at how it could be strengthened further while remaining backwards compatible with the C language.
Here's the IR and type annotations for the test program from my system:
L0:
v0 = access <OpVariable> 0
v1 = load v0
v2 = access <OpVariable> 0
v3 = access v2 v1
store v3 v1
ret null
L0:
v0 -> -{} +{u32*(Input)}
v1 -> -{} +{u32}
v2 -> -{} +{u32*(Uniform)}
v3 -> -{} +{u32*(Uniform)}
The code producing this result is still a bit incomplete. There are still many details I have to get right before you can begin using this. Type inference is no longer on the list, though!
Next I am likely going to read through the resources I've collected about value range analysis. After that it is probably time to improve the IR and have a closer look into the head constructors that are needed by the C/GLSL types.
Of course, it is also time to relax. I learned a lot by writing today's post.