Adjoint functors visualized

I finally understand adjoint functors in detail. I can perhaps explain it such that everybody understands what adjoint functors are from now on.

I've made some SVG illustrations with Elm programming language. They are along in this post.

Lets start with categories.

Category is a mathematical object consisting of directed paths. Paths have endpoints that they may share with other paths.

Paths and endpoints can be labeled and both are regarded as paths that are composible.

Composition identifies paths that are combinations of other paths.

For an example. Let's take a small category with points a,b and path f:a->b

We have paths:

a, b, f

And we can identify their compositions.

f ∘ a
f ∘ a ∘ a
b ∘ f
b ∘ b ∘ f
b ∘ b ∘ f ∘ a

These all compositions are just different names for the path f.

Thinking about paths convey the rules that a category must satisfy.

Formally it is required there are objects and morphisms and composition between them that satisfies some rules.

Composition must be associative:

h ∘ (g ∘ f) = (h ∘ g) ∘ f = h ∘ g ∘ f

For every object x, there must be a morphism id : x → x such that it works as an identity for composition.

x ∘ f = f
g ∘ x = g

Likewise, composition is valid only when composed morphisms share an endpoint.

f : a → b,  g : b → c
---------------------
g ∘ f

Now you should have a rough idea of what a category is. We are going to examine these categories.

Category theory doesn't look inside points or paths. It's very generic that way. But if we don't identify contents of points and paths it can be bit too abstract to handle at first.

If you treat functions as paths, function composition as composition, and program space as a category. You obtain a programming language and you can derive rules of computation for it by recognizing constructs that produce adjoint functors. I'm going to show how this works.

Equivalent paths

Equations describe which paths are regarded as same. Earlier you saw the associativity rule:

h ∘ g ∘ f = (h ∘ g) ∘ f = h ∘ (g ∘ f)

The idea of this rule is that we can just drop parentheses around the composition.

Likewise we had the rule for identities. Given any path f:a → b:

f ∘ a = f
b ∘ f = f

We're going to have more equations, some of them are suitable computation rules.

Equations convey the structure that a category has and there's a way to represent them with diagrams. You may draw a diagram and say that the shape inside the paths commutes.

Aside describing that paths are identical, an equation may also state that paths are equivalent. If we think that paths are continuous, then we may think that equivalent paths can "continuously deform" into the other.

For an example, we might identify differently ordered pairs as equivalent.

(a, b) ≃ (b, a)

Then we could create a "thick path" from such equivalent pairs to other equivalent pairs.

Identical paths and equivalent paths are similar concepts. When two paths are equivalent you may imagine there forms a continuous "plane" between them. Likewise when paths are identical then you could think that you can stretch them such that they look like an equivalence.

A continuous deformation may also have a direction, in that case it forms a path.

Next I describe some constructs we need in order to define adjoint functors. They can be thought of as different kinds of continuous deformations.

Functors

Functor F is a mapping between categories that covers everything from it's source category and maps it to the target category.

Visually we can think Functor is a smooth projection of the source category into the target category.

Every source path has a corresponding path in the target category.

F : C → D
F(f) = f'
for every path f in C
there is some path f' in D

The smoothness of the mapping means that paths may stretch or collapse but not break. Paths preserve their structure such that composition remains the same operation.

F(g ∘ f) = F(g) ∘ F(f)
For every f:X → Y and g:Y → Z.

For an example lets pick a diagonal functor that maps a category into it's cartesian product.

It's usually written down like this:

Diag : C → C*C

If we start with a category where there's 1 path f, the diagonal brings it into the following category:

source points: a,b
source paths: f

target points: (a,a), (a,b), (b,a), (b,b)
target paths: (f,a), (a,f), (f,b), (b,f), (f,f)

The diagonal forms the following mapping:

a ↦ (a,a)
b ↦ (b,b)
f ↦ (f,f)

And since it's a small category it suffices to just check through every pair to verify that we have a functor:

F(f ∘ a) = F(f) ∘ F(a) = F(f)
F(b ∘ f) = F(b) ∘ F(f) = F(f)

You could also think that a functor displays an image of some category in an another category.

To describe adjoint functors, we need a concept of smoothly transforming a functor into an another. This can be thought of as moving the image a functor displays.

Natural transformations

Natural transformation is a transformation of a functor into an another functor.

Since functors are required to cover their source category, the natural transformation ends up continuously transforming the image of one functor into another functor's image.

Formally the "continuous" transform is represented as constraint that must be satisfied by the natural transformation:

ŋ : F(C) → G(C)
f : a → b
ŋ(b) ∘ F(f) = G(f) ∘ ŋ(a)

Both functors and natural transformations can be used as paths to form a category. This means that there's an identity functor for every category. Likewise there's an identity natural transformation for every functor.

Adjoint functors

Take two functors forming a loop between two categories, F and G.

We're only drawing the images each functor is tracing to their target categories. Note that we don't build this with the earlier category with only one path in it. Also there are more paths in these categories than what we show here. To make the picture complete we'd have to trace the image of G back to D category through F functor and vice versa.

The diagram represents the diagonal functor on the left side. On the right side there is its right adjoint. This construct can define pairs in programming languages.

The functors are adjoint if there are the following natural transformations:

unit   : C → G∘F
counit : F∘G → D

The unit transform the identity functor of the upper category to G∘F. The second one counit transforms from F∘G to identity functor of lower category.

The visualization hopefully makes it easier to understand what's going on. The unit transforms the image of upper category into image of G∘F. The counit transforms the image of F∘G into the image of lower category.

In our example the unit ends up being a family of paths a → a*a inside the C -category. The counit ends up being a family of paths <a*b, a*b> → <a,b> inside the C*C -category.

The counit splits into two families of paths because the C*C -category is just forming from the pairs that can be formed by pairing things from C.

Adjoint functors are required to satisfy an another condition. Pick either left or right functor. If you apply unit transformation and then counit transformation, you should get the same functor back.

More formally described, if you first do the transformation unit∘F, then transformation F∘counit. This should be the identity of F -functor.

  F --F∘unit-> F∘G∘F --counit∘F-> F
= F --F-> F

Likewise for the G functor.

  G --unit∘G-> G∘F∘G --G∘counit-> G
= G --G-> G

We can start unraveling this in the same way we examined the unit and counit.

Examining products

Lets look at the image of F. It's the <a,a> in category C*C. Applying unit to it means that the starting-point of the functor expands into a loop through the C*C category. So we get the following:

a
---------- F
<a,a>
---------- G
a*a
---------- F
<a*a, a*a>

Now the counit is applied to the loop to squeeze the functor back to F from the end-point The result is:

a
---------- F
<a,a>

So that's a quite simple rule. For a pair this requirement means that the introduction/elimination rules (unit/counit) are locally sound. We can duplicate the values/functions with unit, and apply counit to them, yet we retrieve the same things we put in.

Next lets look at what it means with G. So G is the image a*b in category C. The application of unit expands the ending point of the functor and we get this:

<a,b>
----------- G
a*b
----------- F
<a*b,a*b>
----------- G
(a*b)*(a*b)

However, now the counit shrinks the functor from the beginning-point such that it resumes to a*b.

At first it may seem it's doing the same thing, but actually it means that the introduction/elimination rules are locally complete.

You see we start with a*b instead of two instances of a. Next we duplicate it to a pair and take the counit inside the pair, and we end up with the same thing we had.

In short the rules mean that:

counit ∘ F unit = C*C
G counit ∘ unit = C

And if we replace counit with <fst,snd> and unit with dup. It shows up that we get these reduction rules:

<fst∘dup, snd∘dup> = <a,a>
(fst*snd) ∘ dup = a
for any a in C.

These are very useful rules and all, but if we've already forgotten how we defined them it'd seem quite bland because it seems like we are unable to do anything for a pair that's not a duplicate of some computation.

However we have a way to represent non-duplicate pairs because there's the functor and it applies to paths as well.

fst ∘ G <x,y>

So we got to look at it again... First from the left side.

<f,f>

What does it mean to apply unit to a path? Recall that the unit is a natural transformation and therefore the following is true:

unit ∘ f = (f*f) ∘ unit

Likewise the counit is a natural transformation so..

<fst,snd> ∘ <f*g,f*g> = <f,g> ∘ <fst,snd>
<fst∘(f*g), snd∘(f*g)> = <f∘fst, g∘snd>

Okay, so we get.. Um. We did this already didn't we?

<f,f>
---------- unit
<f*f, f*f>
---------- counit
<f,f>

f*g
---------- unit
<f*g, f*g>
---------- counit
f*g

And if you recall the G is a functor, you'll obtain:

G (<f,f'> ∘ <g,g'>) = G <f,f'> ∘ G <g,g'>

And diagonal functor weren't very interesting in the first place.

Lets collect all the rules together so far.

fst∘dup = a
snd∘dup = a
(fst*snd)∘dup = a
dup∘f = (f*f)∘dup
fst∘(f*g) = f∘fst
snd∘(f*g) = g∘snd
(f∘k)*(g∘h) = (f*g)∘(k*h)

So aside retrieving those elimination rules from adjoint condition, the requirement that things are functors, natural transformations, etc. ensures we have plenty of rules to simplify expressions here.

Lets examine exponentials next

Ok, lets examine yet an another familiar adjunction. If the category has product, then it's possible to recognize an another structure, called exponentials. We construct an endofunctor C → C. It has the following image:

f
--- F
f*a

The requirement that it's a functor must be satisfied of course.

(f∘g)*a = (f*a) ∘ (g*a)

You've seen that before, right.

(f∘g)*(a∘a) = (f*a) ∘ (g*a)

Applying of this functor is equivalent to assuming we have some value. The right adjunction for it would produce the exponential.

unit : a → (f*a)^a

How about the counit? Exponential can be applied to any type as it must also be an endofunctor. Also since we are required to "cross back" through the assumption we get:

counit : (f^a)*a → f

So they seem like familiar constructs. Lets call them curry and apply.

We should get two elimination rules again.

apply ∘ (curry*a) = C
(apply^a) ∘ unit = C

These should seem very familiar again. The first one is beta reduction with empty function. The second one is eta reduction.

Are we able to use these? Recall we are required to have the natural transformations.

curry ∘ f = ((f*a)^a) ∘ curry
apply ∘ (f^a)*a = f ∘ apply

And we have the right functor of course.

(f∘g)^a = f^a ∘ g^a

In total we have the following rules now:

apply ∘ (curry*a) = b
(apply^a) ∘ curry = b
curry ∘ f = ((f*a)^a) ∘ curry
apply ∘ (f^a)*a = f ∘ apply
(f∘g)^a = f^a ∘ g^a

We've extracted a whole set of rules from here. Yet there's still perhaps a little bit we can squeeze from here.

Adjunctions compose

Adjunctions form a category themselves because they compose. Since we got C,C and C,C*C adjunctions, we could try compose them and see what we get. The left functor would compose assumption to diagonal. And we'd get the following image:

<f*a, f*a>

Likewise the right functor would give the:

(f*g)^a

We'd have the following unit and counit:

unit : f → (f*f)^a
counit : <(f*g)^a*a, (f*g)^a*a> → <f,g>

We already have the rules and they just compose like expected. And it means we get these natural transformations:

unit ∘ f = (f*f)^a ∘ unit
counit ∘ <(f*g)^a*a, (f*g)^a*a> = <f,g> ∘ counit

Likewise we have these elimination rules:

counit ∘ <unit*a,unit*a> = C*C
(counit*counit)^a ∘ unit = C

What are the unit and counit? They seem familiar but how do the adjunctions compose? Well, both seem familiar, they're probably:

unit = dup^a ∘ curry
counit = <fst∘apply,snd∘apply>

The types match and it makes a bit of sense as well. I'm not sure it's correct but we get the elimination rules:

fst∘apply∘(dup^a∘curry)*a = b
snd∘apply∘(dup^a∘curry)*a = b
((fst∘apply)*(snd∘apply))^a ∘ dup^a∘curry = b

Here's the interesting question: Are these new rules?

[1]  fst∘dup = a
[2]  snd∘dup = a
[3]  (fst*snd)∘dup = a
[4]  dup∘f = (f*f)∘dup
[5]  fst∘(f*g) = f∘fst
[6]  snd∘(f*g) = g∘snd
[7]  (f∘k)*(g∘h) = (f*g)∘(k*h)
[8]  apply ∘ (curry*a) = b
[9]  (apply^a) ∘ curry = b
[10] curry ∘ f = ((f*a)^a) ∘ curry
[11] apply ∘ (f^a)*a = f ∘ apply
[12] (f∘g)^a = f^a ∘ g^a

Probably not because we can apply other rules to derive them. Like this:

fst∘apply∘(dup^a∘curry)*a = b
-----------^^^^^^^^^^^^^^------ [7]
fst∘apply∘(dup^a)*a∘curry*a = b
----^^^^^^^^^^^^^^^------------ [11]
fst∘dup∘apply∘curry*a = b
^^^^^^^------------------ [1]
apply*curry*a = b
----------------- [8]
b = b
-----

Likewise lets look at the last one:

((fst∘apply)*(snd∘apply))^a ∘ dup^a∘curry = b
-^^^^^^^^^^^^^^^^^^^^^^^--------------------- [7]
(fst*snd∘apply*apply)^a ∘ dup^a∘curry = b
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^---------- [12]
(fst*snd∘apply*apply∘dup)^a ∘ curry = b
---------^^^^^^^^^^^^^^^--------------- [4]
(fst*snd∘dup∘apply)^a ∘ curry = b
-^^^^^^^^^^^--------------------- [3]
apply^a ∘ curry = b
------------------- [9]
b = b
-----

Unless there's more interesting way to interpret the composition of adjunctions, we aren't getting more axioms. However the composition seem to produce neat tests for evaluators.

Another example of adjoint functors

The use for programming languages isn't the only way to apply adjoint functors. You can describe lot of different things as adjoints. I remember this next one was in Spivak's category theory book.

Pick a functor from whole number line to real number line. This time the morphisms are order relations eg. 1 <= 5, 4 <= 4, etc. Now consider that this functor has a right adjoint.

So we got a functor F that does the transformations such as:

(1 <= 1) ↦ (1.0 <= 1.0)
(5 <= 7) ↦ (5.0 <= 7.0)
(2 <= 3) ↦ (2.0 <= 3.0)

And we got some functor G that does the same backwards, such that we got two additional family of relations:

unit   : a <= (G∘F) a
counit : (F∘G) a <= a
for all a.

Now if we start looking at this..

0 <= G 0.0
(F∘G) 0.0 <= 0.0

1 <= G 1.0
(F∘G) 1.0 <= 1.0

1 <= G 2.0
(F∘G) 2.0 <= 2.0

If G takes a whole number larger than the real number, we'd be in trouble.

(F∘G) 2.0 <= 2.0
----------------
(F) 3 <= 2.0
------------
3.0 <= 2.0

If G takes a whole number that's too small, we'll be also in trouble? Not by the counit-rule at least:

(F∘G) 1.0 <= 1.0
----------------
F 0 <= 1.0
----------------
0.0 <= 1.0

But by the unit-rule

1 <= G 1.0
----------
1 <= 0

Recall that G must be a functor and morphisms can be thought of as intervals.

G(g ∘ f) = G(g) ∘ G(f)
G(a <= b <= c) = G(a) <= G(b) <= G(cc)

This means that G must be the floor -function. It cannot return a whole number lower than floor(a) because otherwise the unit -law isn't satisfied. Likewise it cannot return a whole number higher than a because then the counit -law isn't satisfied.

The condition for adjoint unit and counit should be satisfied as well.

float(3) <= float(5)
---------------------------------------------- float∘unit
float(3) <= float(5) <= float∘(floor∘float)(5)
---------------------------------------------- counit∘float
float(3) <= (float∘floor)∘float(5) <= float(5)

3.0 <= 5.0
---------------------------------------------- float∘unit
3.0 <= 5.0 <= 5.0
---------------------------------------------- counit∘float
3.0 <= 5.0 <= 5.0

Likewise for the floor -functor.

floor(3.5) <= floor(4.2)
--------------------------------------------------- unit∘floor
floor(3.5) <= floor(4.2) <= (floor∘float)floor(4.2)
---------------------------------------------------- floor∘counit
floor(3.5) <= floor∘(float∘floor)(4.2) <= floor(4.2)

3 <= 4
----------- unit∘floor
3 <= 4 <= 4
----------- floor∘counit
3 <= 4 <= 4

You're likely able to retrieve some abstract reduction rules as well. I'm not sure if I figure it out.

Conclusion

After reading this you perhaps understand what adjoint functors are. Thought at first it probably doesn't seem very useful.

The difficulty of category theory seem to be that you can understand and apply it in so many ways. For example, there's this idea that simple algebraic datatypes form functors. If you give function to f:a → b and g:c → d, the following type-expressions make sense:

f+g, f*g, f^c

They form functions such as:

f+g : a+c → b+d
f*g : a*c → b*d
f^c : a^c → b^d

Since they're functors, you can compose them.

(+) : C*C → C
(*) : C*C → C
(^a) : C → C

Therefore if you have an algebraic datatype of any kind, you can apply any function within the structure by lifting it. Overall, you can do "algebra" with algebraic datatypes.

Monads/Comonads, structured recursion and optics are also expressible in category theory.