Build systems are logic programming languages (à la Carte paper discussion)

I just finished reading Build Systems à la Carte (pdf link). This paper implements a build system in Haskell, capturing several behaviors observed in other build systems in order to compare their features.

I also read Olle Fredriksson's "Query-based compiler architectures". This post inspired me to look into the paper.

Since Olle calls these things "query-based", people may have already figured out that there's a connection between logic programming and build systems. But let's elaborate on that.

I also discuss a few other things along the way.

Correspondence with logic programs

Let's pick a simple makefile.

file1: file2 file3
    cat file2 file3 > file1

file2: file4 file5
    cat file4 file5 > file2 

file3:
    echo File3 contents > file3

Assuming you have file4 and file5 in your project, when you do make --dry-run file1 it prints the corresponding script:

cat file4 file5 > file2 
echo File3 contents > file3
cat file2 file3 > file1

If you run these commands, then touch file4 and run the same command again, you get the following script:

cat file4 file5 > file2 
cat file2 file3 > file1

The make command checks every dependency and builds a plan to update them. This corresponds to proof search, and the proof is a program that produces an up-to-date file.
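The proof search above can be sketched directly in Haskell. This is a minimal, hypothetical model of the makefile: `rules`, `plan`, and the list-of-pairs rule table are all names invented for this illustration, and the sketch ignores timestamps, rebuilding everything unconditionally.

```haskell
import Data.List (nub)

-- A hypothetical rule table mirroring the makefile above: each
-- target lists its dependencies and the command that rebuilds it.
rules :: [(String, ([String], String))]
rules =
  [ ("file1", (["file2", "file3"], "cat file2 file3 > file1"))
  , ("file2", (["file4", "file5"], "cat file4 file5 > file2"))
  , ("file3", ([],                 "echo File3 contents > file3"))
  ]

-- A naive "proof search": to prove a target can be brought up to
-- date, first prove all of its dependencies, then emit its own
-- command. The collected commands are the proof, i.e. the script.
plan :: String -> [String]
plan target = case lookup target rules of
  Nothing          -> []                -- a source file: nothing to prove
  Just (deps, cmd) -> nub (concatMap plan deps ++ [cmd])

main :: IO ()
main = mapM_ putStrLn (plan "file1")
```

Running this reproduces the dry-run script in dependency order: file2 and file3 are rebuilt before file1.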

The correspondence of build systems with logic programs is interesting for a few reasons.

  1. You can use any logic programming implementation as a basis for a build system, for example in Haskell you could embed microKanren.
  2. The goals in a logic program are propositions. Propositions correspond to types.

The logic correspondence means that it's possible to represent build scripts through types:

d1 :: F "file2" -> F "file3" -> F "file1"
d1 file2 file3
    = (++) <$> file2 <*> file3
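For the snippet above to type-check, the `F` wrapper needs a definition. Here is one possible sketch, assuming `F` wraps an action that produces a file's contents, with the file name tracked only at the type level; the `F`/`fetchF` names and the `IO String` representation are assumptions for illustration.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import GHC.TypeLits (Symbol)

-- A hypothetical wrapper: an action producing the contents of the
-- file named by the type-level string. The name lives only in the
-- types; at runtime an F is just an IO action returning a String.
newtype F (name :: Symbol) = F { fetchF :: IO String }

-- Building file1 is a proof of F "file1" from its dependencies.
d1 :: F "file2" -> F "file3" -> F "file1"
d1 (F file2) (F file3) = F ((++) <$> file2 <*> file3)

main :: IO ()
main = fetchF (d1 (F (pure "foo")) (F (pure "bar"))) >>= putStrLn
```

The type signature is the proposition, and the function body is its proof.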

At first this correspondence may seem loose, like a mere play with syntax. Still, the proof-search aspect of build systems is real.

How about logic variables? Could you use those in a build system? Probably. If a logic program successfully answers a query about a build, it means there's a plan to bring the target up to date. If variables end up in the resulting plan, they'll be structures you can fill in however you like.

Even in the structures chosen by the paper and the blog post, you can notice a resemblance to logic programs, which might also use Applicative or MonadPlus as their structure.

For example, consider these task descriptors that represent spreadsheet formulas.

spreadsheet1 :: Tasks Applicative String Integer
spreadsheet1 "B1" = Just (Task b1fn)
    where b1fn fetch = (+) <$> fetch "A1" <*> fetch "A2"
spreadsheet1 "B2" = Just (Task b2fn)
    where b2fn fetch = (*2) <$> fetch "B1"
spreadsheet1 _    = Nothing

This is the kind of structure you might find in a microKanren implementation. The Applicative works as conjunction here: computing B1 requires both A1 and A2.
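To make the spreadsheet example concrete, here is a self-contained sketch with the `Task`/`Tasks` types in the style of the paper, plus a naive recursive evaluator. The `compute` helper and the input-cell function are my own simplified additions, not the paper's build algorithm: fetching a key recursively evaluates its whole dependency tree with no caching.

```haskell
{-# LANGUAGE RankNTypes, ConstraintKinds, KindSignatures #-}
import Data.Functor.Identity (Identity (..))
import Data.Kind (Constraint, Type)

-- Task and Tasks in the style of the paper: a task knows how to
-- compute its value, given a way to fetch the values of other keys.
newtype Task (c :: (Type -> Type) -> Constraint) k v =
  Task { runTask :: forall f. c f => (k -> f v) -> f v }
type Tasks c k v = k -> Maybe (Task c k v)

spreadsheet1 :: Tasks Applicative String Integer
spreadsheet1 "B1" = Just (Task b1fn)
    where b1fn fetch = (+) <$> fetch "A1" <*> fetch "A2"
spreadsheet1 "B2" = Just (Task b2fn)
    where b2fn fetch = (*2) <$> fetch "B1"
spreadsheet1 _    = Nothing

-- A naive evaluator: fetch recursively computes each dependency,
-- so evaluating a task resolves its whole "proof tree" at once.
compute :: Tasks Applicative String Integer
        -> (String -> Integer)  -- values of the input cells
        -> String -> Integer
compute tasks inputs key = case tasks key of
  Nothing       -> inputs key
  Just (Task t) -> runIdentity (t (Identity . compute tasks inputs))

main :: IO ()
main = print (compute spreadsheet1 inputs "B2")
  where inputs "A1" = 10
        inputs "A2" = 20
        inputs _    = 0
```

With A1 = 10 and A2 = 20, B1 evaluates to 30 and B2 to 60: the Applicative conjoins the two fetches that B1 depends on.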

Frustration with the code

The code in the paper is turning out to be very frustrating to examine. It uses some combination of Haskell extensions but doesn't list them in a LANGUAGE pragma. I keep getting compile errors such as "Couldn't match type ‘v’ with ‘v1’" when I try to compile the sample programs, and I haven't the slightest idea how to fix them.

The paper explains quite well what it's doing, so it's not a big problem that I can't get the code to compile.

Composition of a build system

In the paper the build system is split into two pieces:

  1. How the tasks are scheduled, that is, how it is decided which task runs next.
  2. How it is determined which tasks need to be updated.

The latter commonly uses a particular strategy, and I have a question about it.

The use of verifying traces in a build system

Regarding section 4.2.2, I think the approach of using hashes to determine whether keys are dirty is incorrect.

A hash function maps an arbitrary-size value into a fixed-size value. If you compute the hashes of two values and those hashes are different, then the values are different.

hash a ≠ hash b → a ≠ b

A hash function is surjective but not injective. That is, if two hashes are the same, the values that produced them may still differ.

(hash a = hash b → a = b) → 0

If you keep driving the size of the hash down, the probability of finding two colliding hashes increases.

This means a verifying trace can fail to detect that a dependency changed. If you use too small a hash here, it will render the verifying trace useless.
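A deliberately undersized hash makes the failure easy to demonstrate. In this sketch, `tinyHash` and `isClean` are names invented for the illustration: an 8-bit hash stands in for whatever a verifying trace would record, and two different file contents collide, so the check wrongly reports the key as clean.

```haskell
import Data.Char (ord)
import Data.Word (Word8)

-- A deliberately tiny 8-bit "hash": sum of character codes mod 256.
-- Real build systems use much wider hashes; this one only exists to
-- make collisions trivial to find.
tinyHash :: String -> Word8
tinyHash = fromIntegral . sum . map ord

-- Verifying-trace check: the key counts as clean if the recorded
-- hash matches the hash of the current contents.
isClean :: Word8 -> String -> Bool
isClean recorded contents = recorded == tinyHash contents

main :: IO ()
main = do
  let recorded = tinyHash "ab"   -- hash stored when "ab" was built
  print (isClean recorded "ba")  -- contents changed, hash did not
```

Since "ab" and "ba" contain the same characters, `tinyHash` cannot tell them apart, and the changed dependency goes undetected.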

I suppose it's not a problem in practice: just use a big enough hash. But that raises a question: how do you prove that the resulting behavior is acceptable?

Are demand-driven compilers overrated?

The post appeared on Reddit's discussion boards and u/gasche replied to it:

I suspect that the query-based or demand-driven compilation approaches are overrated. The pitch sounds great: "IDEs are important, demand-driven makes IDEs easier, so you should implement your language this way". But is there solid evidence for this idea, and why are we assuming that conclusions from one project generalize to other projects?

The pitch is actually pretty rad. It does make IDEs easier to implement if you have build-system-like incremental compilation in place. And the features you can implement this way are useful.

I've seen a few of these things working in Idris and Idris 2. The compiler was able to expand case expressions and display what's missing from an expression with holes in it. It could also engage a proof-search engine that was able to automatically fill in some structures. These features are useful on their own.

I don't know if query-based compilers generalize, but if they don't, would a Makefile work with your compiler either? Makefiles are technically one step down from query-based compilation.

What I really like about the idea is that you get the same result every time something builds, and that the build gradually updates itself when I modify something. This happens because the system is designed to work exactly that way.

There's also an interesting possibility, an afterthought I had while finishing this blog post. You could make the build system run the program when it's finished building. The program could then reload depending on which parts of it changed, giving a sort of live programming environment that you can engage gradually as you start to need it.