Towards general-purpose logic programming

I've been struggling to get Lever to type-infer its programs and started to feel overwhelmed. This week I explored alternative ways to design programming languages.

Let's look into this idea of a general-purpose logic programming language. This should sound preposterous to anyone who has tried Prolog but did not stick with it.

Why?

I studied this subject after observing similarities in implementations of µKanren and Icon/Unicon.

  1. Icon procedures return a stream of values.
  2. µKanren expressions return a stream of states.

To understand why this is such a big deal, it suffices to mention that this connects iterators and generators to logic programming. The reasoning does not stop there, though.
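
A tiny Python sketch of the similarity, simplified and hypothetical rather than taken from either system: an Icon-style procedure and a µKanren-style goal can both be written as generators, one yielding values and the other yielding updated states:

def upto(i, j):
    # Icon's `i to j`: a procedure that suspends with a stream of values.
    while i <= j:
        yield i
        i += 1

def eq(var, value):
    # A µKanren-style goal: takes a state (a substitution) and yields
    # zero or more extended states.
    def goal(state):
        if var not in state:
            yield {**state, var: value}
        elif state[var] == value:
            yield state
    return goal

print(list(upto(1, 3)))        # [1, 2, 3]
print(list(eq("x", 42)({})))   # [{'x': 42}]

In both cases failure is an empty stream and backtracking is just asking the generator for its next element.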

Control flow complexity

A good imperative programming language comes with generators and exception handling. These significantly reduce errors in software, yet they are a burden for implementors to support.

Three-address-code and stack-based virtual machines require that you implement one or two special-purpose instructions for iteration, then some more for exception control flow, and eventually a few more for pattern matching.

The virtual machine can usually handle this complexity, but it eventually propagates down into the other utilities that have to read the bytecode.
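
CPython illustrates the point. Its stack VM carries dedicated opcodes for iteration, and the exception machinery shows up as SETUP blocks or exception tables depending on the interpreter version; the snippet below is only meant to make that visible:

import dis

def f(xs):
    try:
        for x in xs:
            yield x + 1
    except ValueError:
        pass

dis.dis(f)   # look for FOR_ITER and the exception-handling setup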

User studies

On the rare occasion when I meet people and get to hear how they reason about programs, it turns out that lines such as x = x + 1 confuse them. It seems to be difficult to grasp the idea that the value of a variable changes while the program executes.

Prolog does not change a variable once it is bound, and that seems to be helpful when analysing what a program is doing.

Transaction logic

When logic programs backtrack, they have to resume into a previous state. Transaction logic takes this further by wrapping changes to mutable data inside a transaction.

This means that logic programs written in transaction logic can use mutable state without increasing the chance of corrupting the program state inadvertently. It comes with the cost of keeping a log of every change done to every structure during the transaction.

Inevitably you cannot ban mutable data structures, so it is nice if your system has a way to cope with them as well. That is what TR brings to the table.
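
A minimal sketch of the idea in Python, entirely my own construction rather than TR's actual machinery: keep a trail of undo actions for every mutation and unwind it when a branch fails:

class Trail:
    def __init__(self):
        self.undo = []

    def set_item(self, obj, key, value):
        # Record how to reverse the change before performing it.
        if key in obj:
            old = obj[key]
            self.undo.append(lambda: obj.__setitem__(key, old))
        else:
            self.undo.append(lambda: obj.__delitem__(key))
        obj[key] = value

    def mark(self):
        return len(self.undo)

    def backtrack(self, mark):
        # Roll every structure back to the state it had at `mark`.
        while len(self.undo) > mark:
            self.undo.pop()()

db = {"count": 1}
trail = Trail()
m = trail.mark()
trail.set_item(db, "count", 2)
trail.set_item(db, "seen", True)
trail.backtrack(m)   # a failed branch undoes its own changes
print(db)            # {'count': 1}

The cost mentioned above is exactly this: every write has to go through the trail.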

Side-effects

Side-effects in logic programming languages seem like something that would need extra semantics, but it turns out not to be a problem. You can use variables for control flow. This approach is in use in the Mercury programming language and it works well.

As an example, this is how you would write a simple program that prompts for input:

main I0, I3 <-
    write "What is your name?", I0, I1
    readline Name, I1, I2
    write "Hello" ++ Name, I2, I3

When we pose the query ?- main(Action, exit)., we get the following response:

Action={write "What is your name?",
        {readline <V1>,
         {write "Hello" ++ <V1>,
          return}}}

We have several ways to go about this. Mercury requires that you verify the program is deterministic, so that it can evaluate the effects as it runs through the program.

Another valid approach would be to query the program, retrieve the result, and pass it to an evaluator if it is well-formed.
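
As a sketch of that second approach, here is a small Python evaluator for an action structure like the one above. The tuple encoding and the helper names are my own assumptions, not part of any existing language:

def run(action, env=None):
    # Interpret a well-formed action chain: ("write", text, next),
    # ("readline", var, next) or the terminator "return".
    env = {} if env is None else env
    while action != "return":
        op = action[0]
        if op == "write":
            _, text, action = action
            print(evaluate(text, env))
        elif op == "readline":
            _, var, action = action
            env[var] = input()
        else:
            raise ValueError("ill-formed action: %r" % (action,))

def evaluate(text, env):
    # Resolve logic variables such as <V1> inside concatenations.
    if isinstance(text, tuple) and text[0] == "++":
        return evaluate(text[1], env) + evaluate(text[2], env)
    return env.get(text, text)

run(("write", "What is your name?",
     ("readline", "V1",
      ("write", ("++", "Hello ", "V1"),
       "return"))))   # prompts on stdin, then greets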

Divergence

Languages akin to Prolog seem to be perfect for experiencing the halting problem and its variations in practice. This is the main reason why I have condemned logic programming before.

But Prolog is not different from other programming languages in this aspect. Every program can hang or crash irrespective of the language it is written in.

Rather the problem would appear to be that in order to program in Prolog you have to know an awful lot of details about the implementation, and some of those details result in very unpredictable behavior.

You don't like it if the software breaks when you wear socks with the wrong color of stripes in them. Especially if it does that without ever reporting the error: WrongSockStripeColor(expecting Yellow)

This is especially a problem in relational logic programs. William Byrd provides several tools to avoid divergence in relational programs in his thesis, "Relational Programming in miniKanren: Techniques, Applications, and Implementations".

Relational programming presents a problem because clauses written to be used in a relational manner can be used in so many ways.

For example member/2, the relational containment clause, can be used in four different ways:

"a" in ["a", "b", "c"]
A   in ["a", "b", "c"]
"a" in B
A   in B

Try not to assume that B is a list when reading these clauses.

  1. The first clause is a simple containment check.
  2. The second clause retrieves items present in the list.
  3. The third clause constrains B to contain "a". Note that we cannot assume that B is a list, therefore we cannot produce a value for B here.
  4. The fourth clause establishes a bidirectional constraint that whatever A is, it must be within B.

Ensuring that none of the above cases fails in unpredictable ways or performs badly is a very difficult problem to solve.
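
To make the four modes concrete, here is a hedged, minimal µKanren-style sketch of member/2 in Python. Lists are encoded as nested pairs and the helper names are mine; a serious implementation would interleave answer streams and add constraints:

from itertools import count

class Var:
    _ids = count()
    def __init__(self): self.id = next(Var._ids)
    def __repr__(self): return "<V%d>" % self.id

def walk(t, s):
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    a, b = walk(a, s), walk(b, s)
    if a is b or (not isinstance(a, Var) and a == b):
        return s
    if isinstance(a, Var): return {**s, a: b}
    if isinstance(b, Var): return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b) == 2:
        s = unify(a[0], b[0], s)
        return None if s is None else unify(a[1], b[1], s)
    return None

def eq(a, b):
    def goal(s):
        s2 = unify(a, b, s)
        if s2 is not None:
            yield s2
    return goal

def member(x, l):
    def goal(s):
        head, tail = Var(), Var()
        for s1 in eq(l, (head, tail))(s):    # l = [head | tail]
            yield from eq(x, head)(s1)       # x is the head...
            yield from member(x, tail)(s1)   # ...or x is in the tail
    return goal

def tolist(xs):
    l = ()
    for x in reversed(xs): l = (x, l)
    return l

abc = tolist(["a", "b", "c"])
print(any(True for _ in member("a", abc)({})))     # mode 1: containment check
A = Var()
print([walk(A, s) for s in member(A, abc)({})])    # mode 2: ['a', 'b', 'c']

With B a fresh variable, modes 3 and 4 show the problem: this naive depth-first sketch keeps enumerating candidate structures for B without end, which is exactly the kind of divergence the thesis above is concerned with.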

Performance

Logic programs may branch in so many places and store so many values that their interpretation can turn out to be slow. This makes optimization tools all the more important.

When programming in high-level languages, the showstopper really is performance. The absolute showstopper is that many dynamically typed programming languages treat their variances so sloppily that it is nearly impossible to translate their code into something that runs as fast as possible.

Variances are the parts that may change during the program's execution. They form a starting point for the optimizing compiler: every value that may vary during execution must be considered by the compiler.

For example, Python modules are mutable, and this presents a variance for every global value in the language. The values in modules are accessed through their variables, so to optimize a Python program you either have to prove or assume that an accessed module variable does not change during evaluation. In practice few people would notice if Python scopes were immutable, so mutable modules are mostly an unnecessary variance that causes harm to optimizers.
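
A small illustration of that variance, together with a common Python workaround:

# Because the module scope is mutable, an optimizer cannot assume FACTOR
# keeps its value between calls.
FACTOR = 3

def scale(x):
    return x * FACTOR              # FACTOR must be looked up on every call

def scale_frozen(x, _factor=FACTOR):
    return x * _factor             # idiom: capture the value once, at
                                   # function definition time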

Another example of sloppy handling of variances: I spent plenty of time on Lever 0.8.0 giving it a lexical scope with mutable upscope variables. You may wonder how many times I've used that feature? Well, I used it once, to parse C language trigraphs. I abuse it to look ahead a few characters and rewrite them into one character when they form a trigraph.

In retrospect I would probably not bother with a writable scope anymore, because it was so difficult to implement and it introduces a variance into every variable that I have to eliminate if I do any sort of reasoning about the program.

I know that the opposite showstopper waiting in C/C++/Java is that the application is uselessly buggy, too expensive to write and too inflexible to change. The comparison can be won by shedding the absolute showstoppers from your own side.

"Can Logic Programming Execute as Fast as Imperative Programming?" by Peter Lodewijk Van Roy. The answer he presents in 1990, is that yes, you can do it by introducing global dataflow analysis, dropping out nondeterminism from where it's not needed and pre-evaluating unification into simpler operations.

Horn-Clauses as an IR

I was searching for ways to transform logic programs into SSA. Instead I found a way to convert from three-address code to logic programming.

"Horn-Clauses as an Intermediate Representation for Program Analysis and Transformation" by Jorge Navas. This paper describes the following grammar:

Prog:
    Proc*
Proc:
    Clause*
Clause:
    Head <- Goal*
Head:
    Name(Var*; Var*)
Goal: 
    op(Val*; Var*)
    guard(Val*;)
    Name(Val*; Var*)
Val: Var | Const

There is an additional constraint that I didn't quite understand on the first reading, but it seems to relate to forcing a deterministic execution flow and passing information to subsequent clauses from the guards that must fail in order to reach them.

Consider this C program for the Euclidean algorithm, straight from the paper:

int gcd(int a, int b) {
    while (b != 0) {
        int t = b;
        b = a % t;
        a = t;
    }
    return a;
}

Here's the translated and simplified logic program:

gcd(a, b; ret) <-
    b != 0, gcd(b, a % b; ret)
gcd(a, b; ret) <-
    b == 0, ret = a

I would argue this is easier for people to understand as well. An especially helpful part seems to be the requirement of the complementary guard clause.

I also like that this form can be easily adjusted for iterators, pattern matching and exception flow.

Iterators and pattern matchers are a form of guard clause that returns values.

read_list(name; output) <-
    open(name; fd),
    read_and_close(fd; output)
read_list(name; output) <-
    on file_error(not_exist),
    produce_default_list(;output)

The exception guards the except clause, so that on failure of the open we end up in the exception branch. The next snippet demonstrates closing the file handle, no matter what happens:

read_and_close(fd; output) <-
    output = [],
    read_items(iter(fd), output)
read_and_close(fd; finally) <-
    close(fd)

I write the 'finally' into the output side because this clause must not interfere with the production of those values.

A really funny thing: at first I wrote this wrong, but I realised it because the clause I wrote did not receive 'fd' in its control flow.

Iterator extraction is just a different sort of guard clause.

read_items(items, output) <-
    next(items; value, rest),
    output.append(value),
    read_items(rest, output)
read_items(items, output) <-
    empty(items;)

How about generators? What would Python generators look like in this language? Maybe the implementation of range would look something like this:

generator
range(start, stop, step; g) <-
    step > 0,
    start < stop,
    yield start,
    resume range(start+step, stop, step; g)
range(start, stop, step; g) <-
    step < 0,
    start > stop,
    yield start,
    resume range(start+step, stop, step; g)
range(start, stop, step; g) <-
    step > 0,
    start >= stop
range(start, stop, step; g) <-
    step < 0,
    start <= stop
range(start, stop, step; g) <-
    step == 0,
    raise ValueError("range() step argument must not be zero")

Note that getting a sign wrong in this program would cause it to never terminate on some of its inputs.
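
For comparison, here is the same case analysis as an ordinary Python generator. This is only a direct counterpart of the clauses above, not how the builtin range is implemented:

def my_range(start, stop, step):
    if step == 0:
        raise ValueError("range() step argument must not be zero")
    if step > 0:
        while start < stop:
            yield start
            start += step
    else:
        while start > stop:
            yield start
            start += step

print(list(my_range(0, 10, 3)))   # [0, 3, 6, 9]
print(list(my_range(5, 0, -2)))   # [5, 3, 1]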

It could be much more concise than this, but it would appear that we can directly prove from these clauses that the program terminates.

  1. The clause produces a loop.
  2. step does not change.
  3. stop does not change.
  4. start changes to start+step on every iteration.
  5. step > 0 means that start+step is greater than start.
  6. start < stop means that we are approaching stop from below; eventually this clause stops applying.
  7. When start >= stop, we end up in the clause guarded by step > 0, start >= stop, which terminates.
  8. Likewise when step < 0, we can see that start+step decreases and that we approach stop from above, so we observe termination there as well.

The above proof assumes that the start, stop, and step values are from the same domain, that comparison forms a total ordering for the elements being compared in that domain, and that for all values in that domain step > 0 implies start+step > start and step < 0 implies start+step < start.

We also directly see that if step can be zero, we may observe this code producing a ValueError.

Seeing how easy this conclusion is to make, I predict that if we can treat the divergence issues in logic programming, it could be extremely helpful for writing correct and safe software.

Conversion back to SSA

I could have stopped there, but there's one remaining question: how do we go back into SSA form from the LP form? The conversion from LP to SSA seems to be straightforward. Let's look at the gcd again:

int32 a, b, c, ret;
gcd(a, b; ret) <-
    b != 0, c = a % b, gcd(b, c; ret)
gcd(a, b; ret) <-
    b == 0, ret = a

This is what we need to have produced after type inference. We must have recognized that the abstract 0 in the program can be constant-folded into i32(0), and we must have observed that the a % b occurs on integers. The type must also have been inferred for every variable.

  1. We will produce the C function signature: int32(int32, int32) and an entry point.
  2. The first branch has multiple choices, so the next step is to produce a conditional.
  3. The first clause contains an integer-valued mod and tail-calls gcd. The tail call can be converted into a jump. The value flow requires that we insert some phi-nodes here.
  4. The last clause returns a value and doesn't branch, so we can terminate it with ret.

The final program is:

int32 gcd(int32 a, int32 b)
entry:
    a = phi(a, label0: b)
    b = phi(b, label0: c)
    x = icmp(NE, b, 0)
    brc x, label0, label1
label0:
    c = imod(a, b)
    br entry
label1:
    ret a

What if after simplification there's a call that is not a tail call? Then you produce a separate function that uses the same graph, but with that label as its entry point.

If the insertion of phi-nodes causes any trouble, there's a paper describing an SSA construction algorithm that translates directly from an abstract syntax tree or bytecode into static single assignment form.

I find it pleasing that all of the steps involved in optimizing the program by using this format seem to be straightforward to explain.
