Understand IO Monad and implement it yourself in Haskell

This post is a rewrite of a message I wrote last week. It's a filler to cap an otherwise empty weekend slot.

I'm going to explain what distinguishes purely functional programming languages and why they need an IO monad. This post for anybody who is curious about what an IO monad is. I try to explain things on detail, such that it could be understood with basic education. No programming knowledge needed.

Contents of the post aren't new. In fact the post I responded to referenced Chris Done's post about pure-io. There should be nothing here that wouldn't be already explained elsewhere.

IO monad solves a problem of how a purely functional programming language can represent communication with its environment.

Informal explanation

You should be able to understand what's going on below.

  5 * 4
= 5 * 2 * 2
= 10 * 2
= 20

Computation is similar to rewriting expressions like this, such that the meaning of expression remains same.

Mathematical expressions can name interactions. For example, we can describe mathematically a desire to print something. Or get a line from the user.

print abcd
getline

Then we come up with rewriting rules that match what we expect.

print abcd » ()      {you must write abcd to the paper after this}
getline    » y       {you must take the 'y' from the user}

Now if you try to build programs with these rules, you quickly notice that you have to introduce an order for how you apply these rewriting rules.

If a programming language is purely functional it forbids such side conditions. Though, how do you represent programs that interact then?

Side conditions in computation aren't needed for interaction. That is, we can take every computation rule with a side condition and turn them into an interaction rule.

Instead of doing the thing above, you can introduce sequencing. We can take (x;y) to mean that x is sequenced before y.

x := getline; print ("hello" + x)

Now the interaction of this program has been separated from the rewriting rules. To interpret this program as interaction with user, Take the item from the left, in this case it's x := getline. Do what it's supposed to do. Replace 'x' with result you get. Then continue with the remaining expression. The remaining expression is.

print ("hello" + "you.")

And I would be mandated to say "helloyou." to complete the interaction.

Now it's your turn. Let me give you a program.

5 + print "hello"

I haven't told you how to interpret this, though.

You may invent several ways to interpret this program. But if no rule has been specified there, none of them is correct. What should we do? The answer is to ask what we should take as a valid expression.

But then we should ask what's valid for x;y, or what's valid with getline. Do this for every expression that you have.

Lets start with describing numbers. We can denote that we have numbers, starting from zero.

0 : number
1 : number
2 : number
3 : number
...

How do you state what is a valid addition? Well we could continue with the same way. The arrow is implication here, it means that if the things on left side hold, then the thing on the right side holds.

(x : number), (y : number) → (x + y : number)

We can remove the terms and arrive to this concise presentation.

(+) : (number, number) → number

If you fiddle around you may notice I used addition to mean at text concatenation earlier on. That expression would be valid no longer. Don't worry about that for now because it's a hard aesthetical question far outside of the scope of this text. We can use a different symbol for putting text together.

Lets move on denoting what we take as a valid getline, print, and composition ;.

We could start with these:

getline : command
print : text → command
(;) : (command, command) → command

If you're sharp person or you already know the subject, you're asking "where's the result that's given by the getline?" These aren't complete because we have to introduce parameters to our types.

getline : io text
print : text → io 1 
(;) : ∀a,b. (io a, io b) → io b

The stands for "for all", so a and b refer to any type here and 1 can be only filled by a specific constant.

These are correct, but you may be wondering where's this one thing? How do you access the value getline returns? Well, we introduce some syntactic sugar. This equation:

x := f; g

Is same as this equation:

f »= λx. g

If you meet the λ for the first time here. It just constructs the a → b and makes the a available for construction of b. To supply the value to the construct you apply it to a value.

(λx.g) v = g   {replace 'x' with 'v',
                following some rules related to variables}

There are few rules for variables inside λ. First of all, we should consider x to be present only when reading g. It's not available for use outside of its scope. Second, when you have stacks of these expressions, eg. λx.λy.λz.g, they should come with different variables so that you can distinguish between them.

Now we can describe when assignment is valid.

(»=) : ∀a,b. (io a, (a → io b)) → io b

So we've provided the constructs you need to describe interactions. There's perhaps still an one thing you might like to have.

pure : ∀a. a → io a

I've explained how getline and print should be interpreted. The pure needs a little bit of explanation.

pure x » x          {do nothing}

Now we have some simple rules for constructing programs that we deem computable. Lets collect them together. I took a freedom to add some rules for text processing.

0, 1, 2, 3.. : number
(+) : (number, number) → number

"a", "b", "c".. : text
(++) : (text, text) → text

getline : io text
print : text → io 1 

(;)  : ∀a,b. (io a, io b) → io b
(»=) : ∀a,b. (io a, (a → io b)) → io b
pure : ∀a.   a → io a

These last three rules define the IO monad. Monad is a certain kind of commonly appearing structure that describes many such structures that we can modify without changing it's meaning.

Something being a monad means that it has (»=) that is associative, and it has an unit called pure.

The (;) is same as (»=), except that it discards the value produced from the computation.

We can convey the rules that monads have to satisfy like this:

(x;y);z = x;(y;z) = x;y;z
a := pure a;x = x
a := x;pure a = x

Associativity allows to drop the parentheses from a sequence. These are similar to what we'd have available for mathematical manipulation of expressions. You could compare them to these equations that are for numbers:

a+(b+c) = (a+b)+c = a+b+c
0+a = a
a+0 = a

We introduced a lot of structure by describing what stands as a valid equation. But it should be clear why we need it. These rules help with reasoning about programs.

Now we can try interpreting programs again. I can give you the same program.

5 + print "hello"

Now you can ask me: "How is this valid exactly?"

I would not be able to present you a type that makes it a valid equation. This means that you don't need to do anything to that program.

However, I can still give you a valid program:

print ("Hi I'm Santa, tell me your name? ");
name := getline;
print ("Hello " ++ name ++ ", and what do you wish for Christmas this year?");
item := getline;
print ("Ok.")

Now if you ask me "How is this valid?" I'm going to tell you it's io 1. You can verify that by following the rules that were laid out. We got sequences of commands, and you can verify that the "getline" produces text, which is then used in a context that is valid with text.

IO monads aren't the only solution to representing communication. Though they are excellent for conveying behavior where the next step taken would depend on the previous step taken.

Rationale

This pure functional programming would seem to be a whole lot of trouble. Or then you think "Oh I I know how to write programs now!".

The way pure functional languages simplify reasoning may be taken as constraining. I can show you how to do this because I've been good at it. First take a type, for example this type:

number → number

Now if you've been following, you know this describes a program that gets a number and must construct a number. Simplest program that fills the type would be:

λn. n

But you could also produce a number that is multiplied, something added to it, etc.

Now start complaining that you can't write to a memory address, or read from some memory address to fill this type.

That's how you can see this as constraining.

But in other hand you don't need to worry about anything like that. If you rewrite something in the program, you don't need to worry that its meaning would change because you cause interference in something that's happening behind the curtains. In fact well written purely functional programs don't depend on how the computer is structured. You can already see it in the examples given so far. The earlier 'getline' and 'print'. They never talk about keyboards, printers, speech recognition or anything like that. By leaving away details you're freed to fill them in later on.

This concludes the easy parts of the blog post. Next we discuss why these structures come out like they do.

Technical details

If you'd like to understand what the IO monad exactly solves. You need to read this part. It's possibly still accessible.

The illustrations aren't too much different from the earlier. There's let x in y -expression in here used for taking apart pairs.

It's easiest to illustrate with a counterexample of a system that is slightly weaker than IO monads, and fails for that reason.

Here's an alternative weaker way to represent communication:

getline : channel → (channel, text)
print : (channel, text) → channel

To represent a communication, instead of io 1 you now fill a type channel → channel.

With this system you can fill a type channel → (channel, channel) An example of that would be:

λch. (print(ch, "hello"), print(ch, "Hello"))

To convert (channel, channel) into channel, you select either side. This means that the other interaction from these two ends up being speculation.

Now lets pick an example of how this representation of communication fails:

λch0. let (x, ch1) = read (print("input?", ch0))
      in print("hello " ++ x, ch0)

Two interactions are described here and the second one is chosen. But the second interaction is constructed with the results from the first interaction. It means that if you try to interpret this program you're going to get stuck.

IO monad prevents the production of results such that you can only read those results that you actually chained to be part of the interaction. Examine the (»=) signature, I cleaned off the quantifiers for you so it's a bit easier to read:

(»=) : (io a, (a → io b)) → io b

Here's an "implementation" for this bind, in terms of channels:

io a = (ch → (a, ch))
(r »= f) = λch0. let (x, ch1) = r ch0
                 in  f x ch1

Now if you don't tell the user what io a is exactly. The outcome is that he can't speculate with values that weren't produced. If he uses results of some interaction it forces to prepend that interaction to be the part of the interaction that is being produced with the result.

Note that the signature of the (»=) allows you to guess what is going on there. Why is it taking prior interaction as input? Why is it taking subsequent interaction as input and then give similar interaction back afterwards?

There's a short jump from this kind of typed purely functional programming to proving theorems. It's possible to write types that constrain the implementation so much that it can be only filled with implementations that are correct. It's not even that difficult.

Next we go and implement IO monads in Haskell. I won't explain Haskell in the same detail I explained the expressions presented earlier. You may need some knowledge of Haskell to understand what you read.

Haskell implementations

I'm going to present several implementations, but they all go into the same module.

{-# LANGUAGE GADTs #-}

module MY_IO where

This next data structure is a generic algebraic data structure. It allows me to describe nearly arbitrary expressions as "constructors" that I can pattern match on later.

data MyIo a where
    M_print :: String -> MyIo ()
    M_getLine :: MyIo String
    M_bind :: MyIo a -> (a -> MyIo b) -> MyIo b

Here's a simple interaction expressed with the constructs:

m_hello :: MyIo ()
m_hello = M_getLine `M_bind` \name -> M_print ("Hello " ++ name)

We can interpret the interaction through Haskell's own IO monad.

io_interpret :: MyIo a -> IO a
io_interpret (M_print s) = print s
io_interpret (M_getLine) = getLine
io_interpret (M_bind x f) = io_interpret x >>= io_interpret . f

But we could also interpret such interaction in a more abstract way. For example, we could pass in a list of strings as input, and examine how the program reacts to that.

l_interpret :: MyIo a -> [String]
    -> Either [String] (a, [String], [String])
l_interpret (M_print s) x      = Right ((), x, [s])
l_interpret (M_getLine) (x:xs) = Right (x, xs, []) 
l_interpret (M_getLine) []     = Left []
l_interpret (M_bind x f) xs = case l_interpret x xs of
    Left ys1 -> Left ys1
    Right (z,xs1,ys1) -> case l_interpret (f z) xs1 of
        Left ys2 -> Left (ys1 ++ ys2)
        Right (q,xs2,ys2) -> Right (q, xs2, ys1 ++ ys2)

The l_interpret turns the interaction into a function that gets a list of inputs and either returns partial or completed result.

*MY_IO> l_interpret m_hello []
Left []
*MY_IO> l_interpret m_hello ["foo"]
Right ((),[],["Hello foo"])
*MY_IO> l_interpret m_hello ["foo", "bar"]
Right ((),["bar"],["Hello foo"])

It may seem impractical to examine the program by running it over again after every interaction.

data Cnt a = Cnt [String] (String -> Cnt a)
           | Ret [String] a

outp :: Cnt a -> [String]
outp (Cnt out _) = out
outp (Ret out _) = out

nextp :: Cnt a -> String -> Cnt a
nextp (Cnt _ fn) = fn 
nextp (Ret out x) = const (Ret [] x)

c_interpret :: MyIo a -> [String] -> Cnt a
c_interpret (M_print s)  out = Ret (out++[s]) ()
c_interpret (M_getLine)  out = Cnt out (\x -> Ret [] x)
c_interpret (M_bind m f) out 
    = let rep (Ret out x)   = c_interpret (f x) out
          rep (Cnt out nxt) = Cnt out (\i -> rep (nxt i))
      in rep (c_interpret m out)

Now we don't need to recompute so that we can continue interaction. We get a "continuation" along the output that we can dig up.

*MY_IO> let a = c_interpret m_hello []
*MY_IO> outp a
[]
*MY_IO> let a' = nextp a "foo"
*MY_IO> outp a'
["Hello foo"]
*MY_IO> let a'' = nextp a' "bar"
*MY_IO> outp a''
[]
*MY_IO> outp $ c_interpret (M_print "hello" `M_bind` const (M_print "hello")) []
["hello","hello"]

Now this may seem quite restrictive way to implement an IO monad. You have to describe every command in the GADT and you can't mix different command sets. Or how about you'd like to represent a program that can only print out results?

We can do that. We could do it "Tagless-final style".

class PIn m where
    g_getLine :: m String

class POut m where
    g_print :: String -> m ()

instance PIn IO where
    g_getLine = getLine

instance POut IO where
    g_print = print

It should be clear that you can now describe different kind of interfaces here and they don't do anything with each other.

Now try these:

*MY_IO> g_print "hello" >> g_print "hello"
"hello"
"hello"

So that looks like nothing exciting. But perhaps something interesting still happened? Lets examine the type it infers.

*MY_IO> :t g_print "hello" >> g_print "hello"
g_print "hello" >> g_print "hello" :: (POut m, Monad m) => m ()

So the type is this weird..

(POut m, Monad m) => m ()

Now do you think that a program typed like this could ask for input?

test_program :: (POut m, Monad m) => m ()
test_program = do
    a <- g_getLine
    g_print ("hello" ++ a)

[1 of 1] Compiling MY_IO            ( MY_IO.hs, interpreted )

MY_IO.hs:63:10: error:
    • Could not deduce (PIn m) arising from a use of ‘g_getLine’
      from the context: (POut m, Monad m)
        bound by the type signature for:
                   test_program :: (POut m, Monad m) => m ()
        at MY_IO.hs:61:1-41
      Possible fix:
        add (PIn m) to the context of
          the type signature for:
            test_program :: (POut m, Monad m) => m ()
    • In a stmt of a 'do' block: a <- g_getLine
      In the expression:
        do { a <- g_getLine;
             g_print ("hello" ++ a) }
      In an equation for ‘test_program’:
          test_program
            = do { a <- g_getLine;
                   g_print ("hello" ++ a) }
Failed, modules loaded: none.

How about we try to cheat? We know it's implemented as IO anyway..

test_program2 :: (POut m, Monad m) => m ()
test_program2 = do
    a <- getLine
    g_print ("hello" ++ a)

[1 of 1] Compiling MY_IO            ( MY_IO.hs, interpreted )

MY_IO.hs:63:10: error:
    • Couldn't match type ‘m’ with ‘IO’
      ‘m’ is a rigid type variable bound by
        the type signature for:
          test_program2 :: forall (m :: * -> *). (POut m, Monad m) => m ()
        at MY_IO.hs:61:18
      Expected type: m String
        Actual type: IO String
    • In a stmt of a 'do' block: a <- getLine
      In the expression:
        do { a <- getLine;
             g_print ("hello" ++ a) }
      In an equation for ‘test_program2’:
          test_program2
            = do { a <- getLine;
                   g_print ("hello" ++ a) }
    • Relevant bindings include
        test_program2 :: m () (bound at MY_IO.hs:62:1)
Failed, modules loaded: none.

Types now seem to convey what the program can do, despite that they didn't do so before.

test_program3 :: (POut m, Monad m) => m ()
test_program3 = do
    g_print ("hello")

You can also hide the "Monad m" by adding constraints into typeclasses. Like this: class Monad m => PIn m instead of the simpler class PIn m because it being a monad is a bit of implied from the context.

You may have already guessed that implementing IO monad yourself isn't just a fun exercise. You can do it to write better, more abstract programs.

Similar posts