Adjoint functors & normalization
Given a category and adjoint functors you can produce a calculus around natural transformations.
In this blogpost I attempt to implement a strongly normalizing system around cartesian closed categories. I also present how to implement typechecking for such structures.
Reinterpretation of free theorems
Parametricity presents some "free theorems". The idea is that application of some functions can be permuted. A nice, useful example of this is the "head" -function.
Take an arbitrary function f
.
Apply it to every element on a list and then take the first element.
You could also take the first element and then apply the function.
λlist. head (fmap f list) = λlist. f (head list)
It is easier to express this through compositions.
head ∘ fmap f = f ∘ head
The head
has a type List a → a
.
This theorem is true for any program of this type.
The list example is nice because it makes sense informally. Parametric polymorphism results in similar kind of theorems for any polymorphic type.
Natural transformations present similar behavior.
Recall that it's a map between functors.
Natural transformation between functors F and G
must associate a morphism n(x) : F(x) → G(x)
for every object x
.
The transformation must come with a property that the composition
inside the structure can "slide" past it.
f : x → y
n(y) ∘ F(f) = G(f) ∘ n(x)
Adjoint functors form pairs of natural transformations, unit and counit, that are required to form reduction rules.
Cartesian closed category is defined as a category that has a terminal, product and exponentials. All of these structures can be described as right adjoint functors.
Terminal results in one natural transformation because the counit vanishes.
const : a → 1
Since it's a natural transformation, the following holds for any morphism "f".
const ∘ f = const
Cartesian product constructs pairs. It has a counit that splits into two.
pair : a → (a*a)
fst : (a*b) → a
snd : (a*b) → b
And likewise because it's a natural transformation, we get the theorems:
(f*f) ∘ pair = pair ∘ f
fst ∘ (f*g) = f ∘ fst
snd ∘ (f*g) = g ∘ snd
The star is a product bifunctor that also comes from the adjoint definition.
f
, g
can be an any pair of morphisms here.
f : a → b
g : c → d
(f*g) : (a*c) → (b*d)
Exponentials construct function-like structures.
cur : a → (a*b)^b
ap : (a^b)*b → a
Since they are natural transformations they have the theorems:
cur ∘ f = (f*c)^c ∘ cur
ap ∘ (f^c)*c = f ∘ ap
f
can be any morphism while c
can be any identity morphism here.
The exponential forms a profunctor.
f : a → b
g : c → d
(f^g) : (a^d) → (b^c)
Aside being natural transformations, exponentials are also something else. They have some additional theorems, these are like natural transformations but inverted.
f : a → b
(c*f)^a ∘ cur = (c*b)^f ∘ cur
ap ∘ (c^b)*f = ap ∘ (c^f)*a
The conventions are same here. I use a, b, c...
for identity morphisms
and f, g, h...
for arbitrary morphisms.
The extranatural transformation happens between covariant and contravariant functor.
f : x → y
n(p) ∘ F(f) = n(q) ∘ G(f)
It seem to arise from the exponent being an adjoint, but I don't know how it goes there exactly.
That these are adjoints also mean that they come with the following reduction rules.
[1] fst∘pair = a
[2] snd∘pair = b
[3] fst*snd ∘ pair = a*b
[4] ap ∘ cur*b = a
[5] ap^b ∘ cur = a^b
These follow the same conventions as before. Precedence order is: exponentials, products, composition.
Natural transformation is a family of morphisms, when it's used alone in an expression, it means for some morphism from that family that fits the context.
The reduction rules are numbered because each of them have their own "cross section", an identity natural transformation.
[1] a*b
[2] a*b
[3] (a*b)*(a*b)
[4] (a*b)^b*b
[5] (a^b*b)^b
This gives some intuition over reduction. Normalization could be taken as a task of emptying space between introduction and elimination forms such that they can be collapsed.
We can also examine normalization as task of permuting the expression and removing pairs of unit&counit until we cannot remove any more of them.
Now there's a problem that pair
doesn't identify
which morphism we used.
It suggests that natural transformations
contain parameters just like how functors do.
Therefore we would encode them like this:
const a, pair f, cur f g, fst f b, snd a g, ap f g
Some natural transformations erase their contents into identity morphisms and I've marked them to match, however they also take any morphisms here.
This yields a slightly different "categorical abstract machine" where counits and units end up having their own internal structure.
Some computation rules have to track identity transformations here. Therefore lets define taking of left and right identity from a natural transformation.
f : a → b
lid f = b
rid f = a
The "left" and "right" here is defined by the way how composition aligns. It's this way because we are mainly examining compositions here.
The rules for constructing these functions are readable from the definitions given so far. For example the right and left identity for exponential unit is:
lid (ap f g) = lid f
rid (ap f g) = (rid f ^ lid g) * rid g
Identity natural transformations correspond to functors. Every functor has an identity natural transformation.
With the rules given so far, the computation rules resulting from units/counits alone can be summarized as:
const f = const ∘ f = const
pair f = (f*f) ∘ pair = pair ∘ f
cur f g = ((f*g)^rid g) ∘ cur = ((lid f*lid g)^g) ∘ cur ∘ f
fst f g = f ∘ fst = fst ∘ (f*g)
snd f g = g ∘ snd = snd ∘ (f*g)
ap f g = f ∘ ap ∘ ((rid f ^ g) * rid g) = ap ∘ ((f^lid g) * g)
The resulting structures are very nice computationally, except to an one detail. How do we define what is a normal form for them?
Normal form
If we define normalization as permuting the structure until we cannot remove any structure without introducing new structure. Then it's very well defined what stands as a normal form.
That yields a definition for the algorithm to detect a normal form. We'd have to permute the structure out in every way and produce every valid rewrite that contains a reduction.
Fold-in problem
One obvious thing that helps finding a normal form is to fold away every composition. This way we don't need to permute across composition.
In most cases we can compute a dot away.
fst f g ∘ pair h = fst (f ∘ h) (rid h)
If we could do this for every structure then composition would behave almost like addition with numbers.
However it doesn't work that way. There are four cases where we cannot remove a composition.
ap f g ∘ pair h = ap ∘ (f^(g∘h)∘h)*c ∘ pair
ap f g ∘ (h * k) = ap ∘ (f^(g∘k)∘h)
(f^g) ∘ cur h k = (f∘h*(k∘g))^c ∘ cur
f*g ∘ pair h = (f∘h)*(g∘h) ∘ pair
These cases can summarized as:
- exponential counit and products do not fold together.
- exponential counit and product unit do not fold together.
- exponentials and exponential unit do not fold together.
- products and product units do not fold together.
If we examine the reduction rules a bit, they reveal that structures could be divided a bit. This division allows them to fold together.
ap (f^g)
cur (f*g)
pair (diag f)
Now z ∘ ap f ∘ x * y
folds neatly into ap (z*y ∘ f ∘ x)
.
We can reformulate the natural transformations as:
const f = 1 ∘ const ∘ f
pair (f*f) = f*f ∘ pair ∘ f
cur (f*g) = f*g^~g ∘ cur ∘ f
fst f g = f ∘ fst ∘ (f*g)
snd f g = g ∘ snd ∘ (f*g)
ap (f^g) = f ∘ ap ∘ (f^~g)*g
An important thing to observe:
These structures fold together from sides where they don't interact.
This property comes from having units and counits in here.
const
, pair
, cur
are units and the rest are counits.
Reduction algorithm (incorrect)
To reduce structures we have to describe unit/counit chains. Note that the structures are permutable along their chains. Lets define them:
L f = fst (L f) _
L f = snd _ (L f)
L f = ap (L f^_)
L f = f
L f = pair (L f^_)
R f = const (R f)
R f = pair (R f) (R f)
R f = cur (R f*_)
R f = f
Notice that the 'pair' expands outside from both of its sides, therefore it needs to passthrough reduction on both sides.
These rules have an use because they have this property:
R f ∘ g = R (f∘g)
f ∘ L g = L (f∘g)
Now we can reformulate the reduction rules:
pair (R (fst x _)) = x
pair (R (snd _ y)) = y
pair (R (R (fst x _)*R (snd _ y))) = x*y
cur (R (ap x)) = x
ap (L (cur x)) = x
const x = const (rid x)
NOTE: These reformulated rules are wrong.
They are contradicted by examples such as ap (fst (cur x) _
.
To get a better grasp of this, you can slide the structures in the expression
but check the type after sliding the structure around.
cur (pair (ap x * ap y))
is another fairly eluding structure.
Lets unfold it to make it clear what's going on:
((ap ∘ (x*_)) * (ap ∘ (y*_)) ∘ pair _)^z ∘ cur
.
You can see the original problem of structures not fusing is present.
If the structure worked, it'd yield a rewriting system. We could fuse every composition and then erase every reducible pair.
Note that two equal structures wouldn't be syntactically equal. You would have to apply the permutation to check equality.
Typechecking of structures
It is a surprisingly simple task to typecheck these structures. You have to verify that the splitted structures compose internally.
ap x → lid x = f^g
cur x → rid x = f*g
pair x → rid x = f*f
Then given any two natural transformation and their compositions.
g ∘ f → rid g = lid f
The left and right identity can be easily retrieved from the natural transformations they form:
lid 1 = 1
lid _ = _
lid (f*g) = lid f * lid g
lid (f^g) = lid f ^ rid g
lid (const f) = 1
lid (pair f) = f
lid (cur f) = lid f ^ prod2 (rid f)
lid (fst f g) = lid f
lid (snd f g) = lid g
lid (ap f) = exp1 (lid f)
rid 1 = 1
rid _ = _
rid (f*g) = rid f * rid g
rid (f^g) = rid f ^ lid g
rid (const f) = rid f
rid (pair f) = prod1 (rid f)
rid (pair f) = prod2 (rid f)
rid (cur f) = prod1 (rid f)
rid (fst f g) = rid f * rid g
rid (snd f g) = rid f * rid g
rid (ap f) = rid f * exp2 (lid f)
It is interesting to compare this to bidirectional typechecking. Bidirectional typechecking allows to separate programs/proofs in normal-form from their types/propositions. Here we have something similar going on.
Relation to bidirectional typing
Actually to illustrate, lets pick up those rules for typed lambda calculus and compare them.
───────────────
Γ, x:t ⊢ x : t↓
Γ ⊢ x : t → u↓ ∧ y : t↑
───────────────────────
Γ ⊢ x y : t↓
Γ, x:t ⊢ y↑
─────────────────
Γ ⊢ λx.y : t → u↑
To make sense of this, we have to translate the structures into our language. We can take a scope to the right side. Then we can apply a fairly well-known translation.
It starts by converting lambda expressions to cur
-expressions.
{λx.y} = cur (wr {y})
We obtain a scope that rolls downwards.
((Γ,2),1),0
Next we need to translate application. You can see it duplicates a scope and applies. Therefore we can likely translate it like this:
{x y} = pair (ap [x]y)
[x y]z = pair (ap [x](y,z))
Note that for traversing neutral forms we need an another rule. The argument ends up inside the "fst/snd" stack.
And finally we may guess that the variables can be translated like this. Take their De-Bruijn index and convert it to structure:
{var 0} = snd ? ?
{var n} = fst {var (n-1)} ?
[var 0]z = snd ? (unfold z)
[var n]z = fst [var (n-1)]z ?
unfold x = ? ^ {x}
unfold x,y = unfold x^{y}
The question marks are holes where we need to leave type information in order to make the program typecheck.
If you fill the holes with identity transformations without specifying them then this structure is "missing" type information and it must be supplied from outside.
Also now if we look at our reformulated natural transformations, they show that all the internal information leaks out from the transformations.
This seem to work even if we just folded structure together only using natural transformation rules, leaving reduction rules away.
This may have made bidirectional typechecking a bit easier to understand. Maybe it does the same for type inference?
Insights into type inference
If you think of proof as a program and proposition as a type, then Hindley-Milner type inference is an odd algorithm. It infers a proposition for a proof.
But it seems to make sense here. To see it lets first give a categorical definition for unification.
Unification finds a coequalizer for any two substitutions in a category made of substitutions.
With dependent type theory it yields the following program that satisfies two theorems:
unify : Subs → Subs → Maybe Subs
unify_thm1 : Σ(z:Subs. Just z = unify x y) → (z.x = z.y)
unify_thm2 : (q:Subs) → (q.x = q.y)
→ Σ(w:Subs. Σz:Subs. Just z = unify x y ∧ z.w = q)
The first theorem asserts that unification generates a substitution that makes two input substitutions equal.
The second theorem asserts that given any substitution that makes two substitutions equal, unification gives a substitution that can be turned into that substitution.
The definition as coequalizer would additionally require
a clause that the w
in the second theorem is unique for any x
and y
.
Another way to state this, lets say you're given some substitution scheme that shares "input" and "output" variables.
{x ↦ f (g a), y ↦ g (g c)}
{x ↦ f (g (h c)), y ↦ g (g c)}
Unification algorithm must find a substitution that makes these substitutions equal, if there's a way to make them equal.
coeq = {a ↦ h c}
{x ↦ f (g (h c)), y ↦ g (g c)}
{x ↦ f (g (h c)), y ↦ g (g c)}
Unification algorithm must find an unique substitution that can be used to produce any other coequalizing substitution.
Lets examine a program that lacks type information, and it doesn't typecheck.
cur f
So why is this program invalid?
It is because there's a constraint: rid f = x*y
.
If we substitute f = 1
, the program becomes invalid.
But the program should stay same under any substitution.
No problem then, find a coequalizer for rid f
and x*y
,
then apply it to the program.
You are going to obtain.
cur (g*h)
Type inference appears to be proof search after all. We just assume it is filling the holes with identity proofs.
Proof focusing/proof search
So it gives a nice clean perspective on type inference. How about proof search?
It turns out it can be easier to explain proof focusing in this system.
Any program in this system has a normal form. Therefore if there's a proof, then that proof has a normal form. So if we search for proofs, we only need to search for normal-form proofs.
Lets find a proof for S combinator:
? : 1 ⊢ (a → (b → c)) → ((a → b) → (a → c))
[/] const f
[ ] pair f
[ ] cur f
[/] fst f g
[/] snd f g
[/] ap f
[/] x*y
[/] x^y
Arrows translate to exponentials, but I do it when needed.
The context erases most options because they simply don't fit.
The pair
would not progress the search at the moment.
We don't have any other options than applying the 'cur' at first.
cur ? : 1*(a → (b → c)) ⊢ (a → b) → (a → c)
[/] const f
[ ] pair f
[ ] cur f
[ ] fst f g
[ ] snd f g
[ ] ap f
[/] x*y
[/] x^y
Should we apply cur
, ap
, fst
or snd
here?
Well we could apply cur
because fst
can be applied later and then
we can apply to it to obtain our structure.
Applying cur
doesn't restrict our search here so we can apply it now!
In fact we can apply it twice.
cur cur ? : (1*(a → b))*(a → (b → c)) ⊢ (a → c)
cur cur cur : ((1*a)*(a → b))*(a → (b → c)) ⊢ c
[/] const f
[ ] pair f
[/] cur f
[ ] fst f g
[ ] snd f g
[ ] ap f
[/] x*y
[/] x^y
There's exactly one item in context that can deliver c
,
this can be used to prune away lots of choices.
Likewise it's behind an arrow so it means that only way to proceed
is to apply something.
cur cur cur pair ap ?
: ((1*a)*(a → b))*(a → (b → c))
⊢ c ^ ((1*a)*(a → b))*(a → (b → c))
So far the search has been deterministic.
It stays deterministic because b → c
needs to be applied to something.
cur cur cur pair ap pair ap ?
: ((1*a)*(a → b))*(a → (b → c))
⊢ (c ^ ((1*a)*(a → b))*(a → (b → c))) ^ ((1*a)*(a → b))*(a → (b → c))
Now there's a match in the context.
cur cur cur pair ap pair ap snd ((1*a)*(b^a)) ?
: (c^b)^a
⊢ (c ^ ((1*a)*(a → b))*(a → (b → c))) ^ ((1*a)*(a → b))*(a → (b → c))
And now split.
cur cur cur pair ap pair ap snd ((1*a)*(b^a)) ((c^f)^g)
f = ? : ((1*a)*(a → b))*(a → (b → c))) ⊢ b
g = ? : ((1*a)*(a → b))*(a → (b → c))) ⊢ a
You may see how there's really no reason to apply certain operations. There's a match in the second context.
g = fst ? ((c^b)^a) : ((1*a)*(a → b)) ⊢ a
g = fst (fst (snd 1 a) (b^a)) ((c^b)^a)
For the first context.
f = pair ap (fst (snd (1*a) (b^h) ((c^b)^a)))
h = ? : ((1*a)*(a → b))*(a → (b → c))) ⊢ a
h = g
The proof in whole is:
cur cur cur pair ap pair ap snd ((1*a)*(b^a)) ((c^f)^g)
f = pair ap (fst (snd (1*a) (b^g)) ((c^b)^a))
g = fst (fst (snd 1 a) (b^a)) ((c^b)^a)
Lets drop some information that's already in the proposition.
cur cur cur pair ap pair ap snd id ((id^f)^g)
f = pair ap (fst (snd id (id^g)) id)
g = fst (fst (snd id id) id) id
And then use de-bruijn indices.
cur cur cur pair ap pair ap 0 ((id^(pair ap 1 (id^2 id)))^2 id)
Nice. I think there's likely no other proofs.
Some difficult details remain, but the point was to show that proof focusing is easier to explain in this system.
Lets put the result aside for now. We'll try readback on it later.
Prolog presentation/Herbrand universe
Since we have proof search, can we model logic programming in this system? Turns out we can, but we don't get the whole thing yet.
Herbrand universe consists of types freely constructed
from a vocabulary of function and constant symbols.
Eg, we'd have types that look like logic program terms. f(x)
Now if we take logic expressions as propositions, we can put them into the context. You'd get propositions such as:
x^(z * w * y) * x^(y * q) ...
Then put the query to the other side.
? : context ⊢ query
It's much more narrow search, only using few transformations. But the thing checks out.
For modeling Prolog we'd need some kind of an equality type and quantifiers into the system.
This has been also something well-known but it's glaringly obvious here.
Fixed points/strongly connected components
That's all good but as programs all of these structures are terminating. So you'd think that'd make a poor programming language. Such an ugly syntax and programs aren't even terminating. Fortunately things can be broken even further.
To make non-terminating programs, all you need is a fixed point. Fixed point is this kind of a structure:
fix x^x = x ∘ fix ∘ x^x
Now if you have a large program made from smaller expressions
that refer to each other, you can group them into strongly connected components.
Then for every component make a product x
.
Cut the cyclic references in the context and turn it into x^x
.
Then wrap that function into fix
.
Overall the context seem like a good tool to pass in references and "link" the program.
Efficient normalization?
So far the thing I've implemented forms an awful evaluator but it's quite interesting. Type checking is trivial and reduction happens vertically in the structure but it's not substitution and it happens with every structure not just the exponential structures.
The interaction between structures is complicated enough that I got it wrong.
Reduction operators arrange into a chain, but if it's forming a "line", then why reduction is jumping about the structure? Recall the "space" must be empty between the structures to contract it. Behavior of the "pair" is a bit odd as well because it just passes into itself either way even if it's an unit transformation.
Applications in computer aided design
Aside yielding computational systems, it'd seem these same structures would yield a nice CAD environment.
Products or exponentials could be also graphical primitives and they could be shown on the screen. Now if the language had actual graphical primitives such as points. You could state a point by writing.
pt - - -
3 dashes for three values in this case. So this would be a graphical primitive that you can edit on the screen.
Next make three points.
pair (pair pt - - - pt - - -) pt - - -
And you make a curve through 3 points.
curve3pt ∘ pair (pair pt - - - pt - - -) pt - - -
The program can present you a curve but doesn't need to forget how it was constructed. You can take constructs apart or share points between different curves.
Readback?
Maybe from there we can get to the last subject for this post. How would you read back these structures?
We have a sketch of an algorithm to encode lambda expressions into this weird format.
{λx.y} = cur (wr {y})
{x y} = pair (ap [x]y)
[x y]z = pair (ap [x](y,z))
{var 0} = snd ? ?
{var n} = fst {var (n-1)} ?
[var 0]z = snd ? (unfold z)
[var n]z = fst [var (n-1)]z ?
unfold x = ? ^ {x}
unfold x,y = unfold x^{y}
If we'd like to present something it'd be clearly in a normal form. It means we could try to do the writeout in reverse. How about we just do reduction and record the results into a structure while tracking the scope?
cur cur cur pair ap pair ap 0 ((id^(pair ap 1 (id^2 id)))^2 id)
x y z ↦
pair ap pair ap 0 ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {1,x,y,z}
Good so far, lets then zap out the pair.
x y z ↦
ap pair ap 0 ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {1,x,y,z}*{1,x,y,z}
Then ap. This is probably disturbing because the context flips to the other side. Lets mark some boxes so we remember that it's constructing application.
x y z ↦ ⊔ ⊔
$^{1,x,y,z} ∘ pair ap 0 ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {1,x,y,z}
Zap the pair again, then zap ap out.
x y z ↦ ⊔ ⊔
$^{1,x,y,z} ∘ ap 0 ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {1,x,y,z}*{1,x,y,z}
x y z ↦ (⊔ ⊔ ) ⊔
($^{1,x,y,z})^{1,x,y,z} ∘ 0 ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {1,x,y,z}
The context could be split now, but lets remove the index first.
x y z ↦ (⊔ ⊔ ) ⊔
($^{1,x,y,z})^{1,x,y,z} ∘ ((id^(pair ap 1 (id^2 id)))^2 id) ∘ {z}
x y z ↦ (⊔ ⊔ ) ⊔
($^(pair ap 1 (id^2 id)∘{1,x,y,z}))^(2 id∘{1,x,y,z}) ∘ {z}
You may see how it's starting to break out.
x y z ↦ (⊔ ⊔ ) ⊔
($^(pair ap 1 (id^2 id)∘{1,x,y,z}))^(2 id∘{1,x,y,z}) ∘ {z}
x y z ↦ (⊔ ⊔ ) ⊔
($^(pair ap 1 (id^2 id)∘{1,x,y,z}))^({x}) ∘ {z}
x y z ↦ (⊔ ⊔ ) x
$^(pair ap 1 (id^2 id)∘{1,x,y,z}) ∘ {z}
x y z ↦ (⊔ ⊔ ) x
$^(ap 1 (id^2 id∘{1,x,y,z})∘{1,x,y,z}) ∘ {z}
x y z ↦ (⊔ (⊔ ⊔)) x
$^($^($^2 id∘{1,x,y,z})∘{y}) ∘ {z}
x y z ↦ (⊔ (⊔ ⊔)) x
$^($^($^{x})∘{y}) ∘ {z}
x y z ↦ (z (⊔ ⊔)) x
($^($^{x})∘{y}
x y z ↦ (z (y ⊔)) x
($^{x})
x y z ↦ (z (y x)) x
x y z ↦ z (y x) x
I did some error there. I think I ordered the scope wrong way around.
x y z ↦ x (y z) z
And it seems having mixed the order of arguments would be enough to get it wrong.
x y z ↦ x z (y z)
It's not plausible that this kind of a readback algorithm would always do it. But lets see..
- zap cur, create a lambda expression.
- zap ap, put a marker in front and convert back to expression.
- zap fst/snd, remove items from scope and finally add 'fst/snd' clauses if items run out.
- zap pair without marker, build a pair.
- zap exponent from left with a marker, build apply.
- zap product from left, bifunctor map.
- zap exponent from left without a marker, build profunctor map.
Maybe the structure can be "evaluated" back into the lambda expression.
(Partial, incorrect) Implementation in Haskell
This implements typechecking and normalization. Though it doesn't implement anything else described here.
The description of the algorithm is quite simple: You just fold things together and remove the structures with the rules. It yields an equally simple implementation although a bit convoluted.
data NT
= Top
| Term
| Comp NT NT
| Prod NT NT
| Exp NT NT
| Const NT
| Pair NT
| Cur NT
| Fst NT NT
| Snd NT NT
| Ap NT
deriving (Eq, Show)
-- The typechecking rules are complicated a bit
-- because they have to look inside the structures.
proj1,proj2,exp1,exp2 :: NT -> Maybe NT
proj1 (Prod x _) = pure x
proj1 _ = Nothing
proj2 (Prod _ x) = pure x
proj2 _ = Nothing
exp1 (Exp x _) = pure x
exp1 _ = Nothing
exp2 (Exp _ x) = pure x
exp2 _ = Nothing
-- Also the splitup of pair f introduces a diagonal -construct
-- that represents sharing of morphisms between branches.
diag :: NT -> NT
diag a = Prod a a
-- Left and right identity
-- Every natural transformation has a left and right identity
lid, rid :: NT -> Maybe NT
lid Top = pure Top
lid Term = pure Term
lid (Comp g _) = lid g
lid (Prod x y) = do
a <- lid x
b <- lid y
pure (Prod a b)
lid (Exp x y) = do
a <- lid x
b <- rid y
pure (Exp a b)
lid (Const _) = pure Term
lid (Pair fg) = lid fg
lid (Cur fg) = do
l <- lid fg
r <- rid fg >>= proj2
pure (Exp l r)
lid (Fst f _) = lid f
lid (Snd _ g) = lid g
lid (Ap fg) = lid fg >>= exp1
rid Top = pure Top
rid Term = pure Term
rid (Comp _ f) = rid f
rid (Prod x y) = do
a <- rid x
b <- rid y
pure (Prod a b)
rid (Exp x y) = do
a <- rid x
b <- lid y
pure (Exp a b)
rid (Const f) = rid f
rid (Pair fg) = do
z <- rid fg
a <- proj1 z
b <- proj2 z
if a == b then pure a else Nothing
rid (Cur fg) = rid fg >>= proj1
rid (Fst f g) = do
a <- rid f
b <- rid g
pure (Prod a b)
rid (Snd f g) = do
a <- rid f
b <- rid g
pure (Prod a b)
rid (Ap fg) = do
l <- rid fg
r <- lid fg >>= exp2
pure (Prod l r)
-- Typechecking of structures.
-- We check that left and right identity of each structure matches up.
-- On Ap/Cur/Pair structures we have to check that their left/right identity
-- matches a pattern characteristic to the structure.
is_valid :: NT -> Bool
is_valid Top = True
is_valid Term = True
is_valid (Comp g f) = condition $ do
x <- rid g
y <- lid f
pure (x == y && is_valid g && is_valid f)
is_valid (Prod x y) = is_valid x && is_valid y
is_valid (Exp x y) = is_valid x && is_valid y
is_valid (Const x) = is_valid x
is_valid (Pair f) = condition (rid (Pair f) >> (pure (is_valid f)))
is_valid (Cur f) = condition (rid (Cur f) >> pure (is_valid f))
is_valid (Fst f g) = is_valid f && is_valid g
is_valid (Snd f g) = is_valid f && is_valid g
is_valid (Ap f) = condition (lid (Ap f) >> pure (is_valid f))
condition :: Maybe Bool -> Bool
condition = maybe False id
-- A structure is in normal form when we cannot permute it
-- and remove a pair of unit&counit from it
-- without adding a pair of unit&counit.
-- Normalization rewrites compositions away from the structure
-- while folding structures whenever permitted.
norm :: NT -> NT
norm Top = Top
norm Term = Term
norm (Prod f g) = Prod (norm f) (norm g)
norm (Exp f g) = Exp (norm f) (norm g)
norm (Const f) = Const (certain (rid f))
norm (Pair fg) = pair_fold (norm fg)
norm (Cur fg) = cur_fold (norm fg)
norm (Fst f g) = Fst (norm f) (certain (rid g))
norm (Snd f g) = Snd (certain (rid f)) (norm g)
norm (Ap fg) = ap_fold (norm fg)
norm (Comp g f) = norm2 g f
norm2 :: NT -> NT -> NT
----- Constant interactions
norm2 Top Top = Top
norm2 Term Term = Term
----- N-Functor interactions
norm2 (Prod x y) (Prod z w) = Prod (norm2 x z) (norm2 y w)
norm2 (Exp x y) (Exp z w) = Exp (norm2 x w) (norm2 y z)
----- Generic interactions
norm2 f (Fst g h) = Fst (norm2 f g) (certain (rid h))
norm2 f (Snd g h) = Snd (certain (rid g)) (norm2 f h)
norm2 f (Ap gh) = ap_fold (norm2 (ap_left_in f gh) gh)
norm2 (Cur fg) h = cur_fold (norm2 fg (cur_right_in fg h))
norm2 (Const f) g = norm (Const (certain (rid g)))
norm2 (Pair fg) h = pair_fold (norm2 fg (diag h))
----- Specific interactions
norm2 (Fst f g) (Prod x y) = Fst (norm2 f x) (certain (rid y))
norm2 (Snd f g) (Prod x y) = Snd (certain (rid x)) (norm2 g y)
norm2 (Fst f g) (Pair k) = pair_fold (norm2 (Fst f g) k)
norm2 (Snd f g) (Pair k) = pair_fold (norm2 (Snd f g) k)
norm2 (Ap fg) (Prod x y) = ap_fold ((ap_side_in fg y) `norm2` (fg `norm2` x))
norm2 (Exp z w) (Cur gh) = cur_fold (z `norm2` (gh `norm2` cur_side_in w gh))
norm2 Term (Const f) = norm (Const f)
norm2 f (Pair gh) = pair_fold (norm2 f gh)
----- Composition interactions
norm2 (Comp f g) h = f `norm2` (g `norm2` h)
norm2 f (Comp g h) = f `norm2` (g `norm2` h)
ap_left_in :: NT -> NT -> NT
ap_left_in f gh = certain (lid gh >>= exp2 >>= pure . Exp f)
ap_side_in :: NT -> NT -> NT
ap_side_in fg h = certain (lid fg >>= exp1 >>= pure . (\z -> Exp z h))
cur_right_in :: NT -> NT -> NT
cur_right_in fg h = certain (rid fg >>= proj2 >>= pure . Prod h)
cur_side_in :: NT -> NT -> NT
cur_side_in f gh = certain (rid gh >>= proj1 >>= pure . (\z -> Prod z f))
certain = maybe undefined id
-- Folds attempt to erase structure.
pair_fold,cur_fold,ap_fold :: NT -> NT
pair_fold a = case right_zap zap_copair a of
Just z -> z
Nothing -> Pair a
cur_fold a = case right_zap zap_ap a of
Just z -> z
Nothing -> Cur a
ap_fold a = case left_zap zap_cur a of
Just z -> z
Nothing -> Ap a
-- Zappers attempt to destroy structure
-- If they succeed they signal it down.
left_zap, right_zap :: (NT -> Maybe NT) -> NT -> Maybe NT
left_zap zap (Comp f g) = zap f >>= (\z -> pure (Comp z g))
left_zap zap (Fst x y) = zap x >>= (\z -> pure (Fst z y))
left_zap zap (Snd x y) = zap y >>= (\z -> pure (Snd x z))
left_zap zap (Ap xy) = left_zap expzap xy where
expzap (Exp x y) = left_zap zap x >>= (\z -> pure (Ap (Exp z y)))
expzap _ = Nothing
left_zap zap (Pair xy) = left_zap zap xy >>= (\z -> pure (Pair xy))
left_zap zap x = zap x
right_zap zap (Comp f g) = zap g >>= (\z -> pure (Comp f z))
right_zap zap (Const x) = zap (Const x)
right_zap zap (Pair xy) = right_zap prodzap xy where
prodzap (Prod x y) = do
a <- right_zap zap x
b <- right_zap zap y
pure (Prod a b)
prodzap _ = Nothing
right_zap zap (Cur xy) = right_zap prodzap xy where
prodzap (Prod x y) = right_zap zap x >>= (\z -> pure (Cur (Prod z y)))
prodzap _ = Nothing
right_zap zap x = zap x
-- Zap on Const at right side is bit special as it should always succeed.
zap_ap :: NT -> Maybe NT
zap_ap (Ap z) = pure z
zap_ap (Const a) = pure (Const (certain (rid a >>= proj1)))
zap_ap _ = Nothing
zap_cur :: NT -> Maybe NT
zap_cur (Cur z) = pure z
zap_cur _ = Nothing
zap_copair :: NT -> Maybe NT
zap_copair (Prod x y) = do
a <- right_zap zap_fst x
b <- right_zap zap_snd y
pure (Prod a b)
zap_copair (Fst f _) = pure f
zap_copair (Snd _ g) = pure g
zap_copair (Const a) = pure (Const (certain (rid a >>= proj1))) -- proj1&proj2 produce same result.
zap_copair _ = Nothing
zap_fst :: NT -> Maybe NT
zap_fst (Fst f _) = pure f
zap_fst (Const a) = pure (Const (certain (rid a >>= proj1))) -- proj1&proj2 produce same result.
zap_fst _ = Nothing
zap_snd :: NT -> Maybe NT
zap_snd (Snd _ g) = pure g
zap_snd (Const a) = pure (Const (certain (rid a >>= proj1))) -- proj1&proj2 produce same result.
zap_snd _ = Nothing
I likely attempt to write the proper reduction algorithm as well. I can still try to work outwards from that exact definition of a normal form.
Conclusion
This post relates whole lot of programming language theory together. Hindley-Milner algorithm is proof-search after all, even if it's searching for types.
It's also interesting that bidirectional typing does work even with structures that'd be far away from being in a normal form.
I'm thrilled about how much computational content there is in a proof that some functors form an adjoint.