Writing STLC Evaluator to explore into structure of proof assistants

I wrote a lambda calculus evaluator to learn how to write a proof assistant.

The evaluator is structured into:

  1. Main interface
  2. Parsing
  3. Printing
  4. Type checking
  5. Normalization

I call it Alonzo N.B.E.. People have had traditional to name STLC evaluators Alonzo for some time now. This evaluator is based on normalization-by-evaluation -strategy, so that's the reason for the name.

The software has not been finished, but I still likely do only few remaining changes during the next two days, then I let it be.

This program has been different to those I've written before in a way that I am mainly determining how clear and easy it is to understand through the types that have been declared.

Main interface

My main issues with statically typed languages and compilers has been that they're intolerant for bad user input. Your best way to see whether something's wrong is evaluation, yet the compiler asks you to figure out what's wrong with minimal explanation of what went wrong.

The point with theorem-proving assisted programming is that you never have to run the program to understand the problems there are. You don't even have to read the program. You can just read from the propositions what kind of properties the program is guaranteed to have.

Though this means that the interaction between user and type checker becomes a fairly major event that should be smooth as possible.

The parser is built to be relisient, such that the user always gets some result from the system as long as he has written at least partially correct program.

main :: IO ()
load_module :: String -> IO Module

The first sign of attempt to make it relisient is the load_module that just returns a module. I still would have work to do in order to do it better. An attempt was done though.

The module may contain portions that failed to parse, it also accumulates any errors the type checker gives during the passthrough.

data Module = Module {
    get_structure :: [Structure],
    get_declarations :: Map.Map String Term,
    get_judgements :: Map.Map String Type,
    get_unchecked :: [String],
    get_checked :: [String] }

data Structure =
    Declaration String
  | Judgement String
  | NoParse [Token]
  | Problems String
  | Errors String [Fail]

The [Structure] describes the structure of the input file. It is recorded to form the output of the program. Declarations and judgements are separated into maps for easy access.

Terms and types are shaped as they come from the parser, even if the typechecking still rejects structures that aren't shaped right.

data Term =
    Var Int
  | App Term Term
  | Abs Term
  | Name String
  | Ann Term Type

data Type = Some String | Arrow Type Type

The parser places everything into the unchecked -list.

The type checker gets a role of a very ill-natured and systematic Santa Claus with a double-entry bookkeeping system; It removes "kids" from unchecked -list and if they've been "nice", move them to checked -list or then drop the entry while writing a complaint.

The default mode of the program is to interleave the error report into the code that was passed in, and normalize the parts that correctly typechecked. File quite doesn't come in the exact shape as it was inserted, but quite close.

Overall I'd like to preserve the shape and structure of the input while where possible, but I'm not sure how to achieve it well.

The pipeline didn't shape-out quite well yet, and it contains only one task mentioned earlier:

normalize_all :: Module -> [Structure] -> [(Module, Structure)]

The last thing done to any module is that it is printed with the given page width counted in characters.

print_module_entry :: Int -> (Module, Structure) -> IO ()

That's about what this program is doing. Now we get to the details, only in slightly more coarse detail than what the type information reveals.


I only describe the syntax informally as that's as accurate explanation as I can give. The implementation is likely incorrect as it is just plain Haskell.

The input is line-based. Every expression is supposed to start with a line that doesn't have space trailing in front. Every indented line is treated as part of a single expression that started with a non-indented line.

We have '#' -comments that end up to a newline. They've been fairly nice, clean and easy to work with in Vim.

The language uses unicode characters and . When using my language I completely expect you have overloaded your 'INSERT' or 'CAPS-LOCK' -key to input some math unicode characters.

The basic input you give are judgements and declarations.

name : type
name = term

For types you can just write some words and separate them by arrows.

S : (a → a → a) → (a → a) → a

Type checker treats the types to come from same environment that is separate from the term environment, and that every type with same name is equal.

No type polymorphism in place or anything such.

For declarations you write maple-arrows for lambdas. I decided to do it this way as it's relatively clean way to write functions.

S = x ↦ y ↦ z ↦ x z (y z)

The parsing interface is fairly simple inside the program. It just transforms from a string to tokens, then it transforms into a module.

tokenize_all :: String -> [Token]
parse_module :: Parser Module

Parsing itself is implemented with parsing monads and combinators.

Pretty printing

The pretty printing tends to happen in two stages. First we produce a document for the pretty printer to work on and then we render it.

The document rendering is taken care by pretty_run and the algorithm used isn't very fancy even if it looks majestetic. It just looks three lines ahead on the input to see whether to break a line.

pretty_run :: Int -> [String] -> PrettyState () -> String

The renderer is also supplied with list of the free variables. It's needed for stringifying context, as I don't record the variables that the user used, although I probably should do that.

In the retrospect it was a bad idea to supply free variable names into the pretty printing routine. Every run of the pretty printer doesn't need free variables being supplied, to counteract this, an empty list is passed when no variables are needed.

The document structure of the pretty printer is a sequence built with Haskell's monads, made from the following combinators:

  1. textb text: write text out in single contiguous nonbreakable string.
  2. blankb text indent: a breakable portion in the document. Add the indent if broken at this point.
  3. leftb/rightb: Left/Right boundary marker. Used along parentheses. Shapes the line breaking procedure.

The shape is probably roughly what to expect. The AST visits some intermediate presentation for a document before being written into a string.


The type checking is done right before normalization, although it should be a separate step that runs when it's asked or needed in order to do something else.

It verifies a name at a time in a module, recursively doing the referenced items first. I wanted that you can write the judgemetns and declarations in the order that makes sense, rather than have to follow the order of dependency.

The typechecking strategy is bidirectional type checking, that was just covered in the previous post.

verify :: String -> Module -> Either Fail Module
infer :: Module -> [Type] -> Term -> Either Fail Type
check :: Module -> [Type] -> Term -> Type -> Either Fail ()

The type checking isn't quite as relisient as I'd want it to be. It fails to the first issue, rather than trying to find more issues to complain about in a single declaration.

Ideally we'd like it to run into as many failures as possible before returning.


The normalization is done through evaluation. The algorithms are explained elsewhere better than what I could describe, so I only explain shortly what's going on.

The normalization-by-evaluation interprets the lambda calculus as small programs that resemble the implementation language, that is then evaluated. The evaluation result is read back from the structure, which may trigger further evaluations.

type Env = (String -> Maybe Term)
type Scope = [Value]

data Value =
    Closure Env Scope Term
  | NVar Int
  | NName String
  | NAp Value Value

eval :: Env -> Scope -> Term -> Value
apply :: Value -> Value -> Value
readback :: Int -> Value -> Term

Note how the returned Values roughly resemble normal-forms of lambda calculus.

The evaluator returns a β-normal-form, though NBE -evaluators are able to give ηβ-normal-forms as well.


I thought it was a good idea to practice a little bit before rolling out anything larger again. Now I'm confirmed that it was a good idea, although I weren't sure of all the objectives I'd have with such project.

This project helped me to study some ideas and present some opinions, nice.

It's also a working program that you can use to normalize lambda calculus expressions. Like other this kind of programs it likely still has some bugs in it, because it's not formally verified code.

There are still some ideas and thoughts that arose while working on this, and that aren't represented by the program.

Ideas about parsing & pretty printing

The parsing routines should always be equipped with unparsing routines as well. They are important for displaying results and automating program-writing.

Both parsing and printing routines should be exposed toward the user, so that she can use them in her program as libraries.

Parsers should be generated from grammars, and they shouldn't contain any features that cannot be generalized and formalized somehow.

Example: If you can't introduce layout-sensitivity in a general format, just forget about it.

The languages themselves should be fairly simple, and how they present types should take precedence over program representation.

Line/column information should be separable from the input programs. Were the program to be re-printed, it should be re-introducable.

I think that system documentation should be describing a "letter-page" that you can program into your keyboard to insert the mathematical symbols it is using in its standard library. Otherwise I find it acceptable to use unicode symbols in new programming languages.

I read few influential papers while working on this, by Nils Anders Danielsson:

  1. Parsing Mixfix Operators
  2. Correct-by-construction pretty printing

The Mixfix Operators presents how custom precedence orders can be introduced and handled by users of the programming environment, and how they can be converted into grammars. I think I will apply the ideas somehow in the future.

The Correct-by-construction pretty printing formally verifies that the pretty-printer produces the output that can be parsed back while retaining same meaning. It also has overall, very nice formal presentation for grammars.

Though I remain a bit mystified what to do about ambiguity in the user-supplied grammars. The paper isn't treating this part.

Btw. How to provide forms of indentation and layout that are formally presentable and do not excessively complicate parsing. Suggestions are welcome.

Ideas about error reporting

My overall impression of the use of compilers now, suggests that error messaging should be presented in a way that allows quick glancing through them.

Error messages shouldn't be long or descriptive. They should be near-symbolic and short, displayed along with the code that caused them, if possible.

A proof-checking routine shouldn't look like a nagging idiot. It should be like an extension to your perception.

Ideas about multiple modules

The implementation is missing module support.

I think that a module system should be hierarchical and scoped, such that you can create your own scopes and structures that can be then reloaded in a bunch if needed.

If you aren't explicitly focusing on a specific module, the errors about them should be suppressed until it's relevant or explicitly requested.

The content in modules should be preprocessed such that depending on them is not slowing down the work more than is necessary.

How the modules were created should be written into the modules themselves, and should also reveal whether the module should be rebuilt from its source.

If the module is from a library, it should be formally checked whether an updated module fits the program. The end user of the software should experience the same convenience as the programmer.

The formal verification elements should be exposeable to the end-user. Preferably every program would come with its associated, documented and unobfuscated source code, and the runtime environment could not be modified to deliver it any other way.

It won't be in the license conditions. Rather it'll be written into documentation, and the enforcement will be social. Companies not distributing source code will be banned from community-kept events, their commits into repositories are refused, they may have issues with employing capable people, their software may be dropped from community-kept services without a warning. Also I'll invent more stuff like that as I get on it. Plus going against the letter of this will gets the same treatment.

Final thoughts

Overall I treat compilers as if they were high-end professional industrial equipment with user-serviceable and interchangeable parts.

I hope you are inspired.

Similar posts