Lever JSON decoding/encoding
The Lever programming language received its first JSON encoder yesterday. The encoder is capable of pretty printing. I also improved the decoder.
Testing and verification code suggest that the new encoder and decoder behave correctly. Benchmarks suggest the decoder is as fast as the one in Python's json library.
The new json library was prototyped in Python. The results of that prototyping ended up in the json-algorithm repository. I think the contents there will be of considerable help to anyone who needs to roll their own JSON implementation.
Lever already had a JSON decoder, which it used to read the header files for FFI. It was not complete, though.
The new decoder is a pushdown automaton. I handwrote the state transitions and associated them with actions the automaton should take during each transition. Once I was satisfied with the results, I wrote a program that compacts the state transitions into a table, and a verifier that checks whether my tests cover every valid state transition.
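This is not the Lever implementation, but the general scheme can be sketched in Python: a table maps (state, token) pairs to a next state plus a stack action, and the stack remembers where to resume once a nested value closes. The state names, token names, and table layout here are invented for the example.

```python
# Sketch of a table-driven pushdown automaton that validates a stream
# of JSON tokens. Illustrative only; not Lever's actual tables.

# States name what the automaton expects to see next.
VAL    = 'value'            # any value
AVAL   = 'value-or-]'       # just after '['
ANEXT  = ',-or-]'           # inside an array, after a value
OKEY0  = 'key-or-}'         # just after '{'
OKEY   = 'key'              # after ',' inside an object
OCOLON = ':'                # after an object key
ONEXT  = ',-or-}'           # inside an object, after a value
END    = 'end'              # one complete value has been read

POP, POP2 = 'pop', 'pop2'   # "next state" markers that read the stack

# (state, token) -> (next state, resume state to push or None).
# 'atom' stands for any string/number/true/false/null token.
TABLE = {
    (VAL,    'atom'): (POP,    None),
    (VAL,    '['):    (AVAL,   ANEXT),
    (VAL,    '{'):    (OKEY0,  ONEXT),
    (AVAL,   'atom'): (POP,    None),
    (AVAL,   '['):    (AVAL,   ANEXT),
    (AVAL,   '{'):    (OKEY0,  ONEXT),
    (AVAL,   ']'):    (POP2,   None),   # empty array
    (ANEXT,  ','):    (VAL,    ANEXT),
    (ANEXT,  ']'):    (POP,    None),
    (OKEY0,  'atom'): (OCOLON, None),
    (OKEY0,  '}'):    (POP2,   None),   # empty object
    (OKEY,   'atom'): (OCOLON, None),
    (OCOLON, ':'):    (VAL,    None),
    (ONEXT,  ','):    (OKEY,   ONEXT),
    (ONEXT,  '}'):    (POP,    None),
}

def validate(tokens):
    """True if the token stream forms exactly one complete JSON value."""
    stack, state = [END], VAL   # stack holds states to resume after a value
    for tok in tokens:
        entry = TABLE.get((state, tok))
        if entry is None:       # no valid transition: reject
            return False
        state, push = entry
        if push is not None:
            stack.append(push)
        if state == POP:
            state = stack.pop()
        elif state == POP2:
            stack.pop()         # discard the unused resume state
            state = stack.pop()
    return state == END and not stack
```

A verifier in this style can simply enumerate the `(state, token)` keys of `TABLE` and check that a test exercises each one.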
Now that I think about it, it came out so well that it's really boring.
The encoder is a bit more interesting. I wrote it following CS-TR-79-770, a report from 1979.
In a modern context the algorithm is extremely simple, and it produces very good output for what it is. I am excited about where this algorithm could lead. It is inspiring.
The printing algorithm described in the report consists of two subroutines that operate on an input stream of tokens. In my implementation the JSON stringifier calls the first subroutine with each token it produces. The two subroutines are called the printer and the scanner.
The scanner runs on the stream, marking on each control token how much space the things that come after it would take on an unbroken line.
The printer runs three line widths' worth of tokens behind the scanner. It acts on the information the scanner gathers ahead of it. It splits the blanks that would result in line overflows and controls indentation and line breaking around groups.
The scanner communicates with the printer by annotating the tokens it has already read. The pair is designed so that a lack of information doesn't prevent the whole system from proceeding. If the scanner annotates a token too late, the printer can act anyway, because it knows the scanner is three line widths ahead of it.
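The scanner/printer pair can be sketched as follows. For clarity this Python version scans the whole token stream before printing, rather than streaming with the printer a bounded distance behind the scanner as the report describes; the token shapes (`begin`/`end` groups, `blank` break points, `text`) are my own simplification.

```python
# Simplified sketch of a scanner/printer prettyprinting pair.
# Tokens: ('begin',) / ('end',) delimit a group, ('blank',) is a
# breakable space, ('text', s) is literal output.

def scan(tokens):
    """Annotate tokens with sizes: for a 'begin', the width of the whole
    group on one line; for a 'blank', its own space plus the width of
    the text run that follows it (up to the next blank or group end)."""
    size = {}      # token index -> computed size
    stack = []     # indices of 'begin'/'blank' tokens not yet sized
    total = 0      # running width of everything seen so far
    for i, tok in enumerate(tokens):
        kind = tok[0]
        if kind == 'text':
            total += len(tok[1])
        elif kind == 'begin':
            stack.append(i)
            size[i] = -total        # completed when the matching end arrives
        elif kind == 'blank':
            if stack and tokens[stack[-1]][0] == 'blank':
                size[stack.pop()] += total   # close the previous blank
            stack.append(i)
            size[i] = -total
            total += 1              # a kept blank prints as one space
        elif kind == 'end':
            if stack and tokens[stack[-1]][0] == 'blank':
                size[stack.pop()] += total   # close the last open blank
            size[stack.pop()] += total       # close the matching begin
    return size

def output(tokens, size, margin=80):
    """Emit the stream, breaking at blanks when the scanner's size
    annotations say the following material would overflow the line."""
    out = []
    space = margin      # space left on the current line
    indents = []        # column to indent to, per open group
    breaks = []         # whether each open group breaks its blanks
    for i, tok in enumerate(tokens):
        kind = tok[0]
        if kind == 'text':
            out.append(tok[1])
            space -= len(tok[1])
        elif kind == 'begin':
            indents.append(margin - space)   # remember current column
            breaks.append(size[i] > space)   # break if group won't fit
        elif kind == 'end':
            indents.pop()
            breaks.pop()
        elif kind == 'blank':
            if breaks and (breaks[-1] or size[i] > space):
                out.append('\n' + ' ' * indents[-1])
                space = margin - indents[-1]
            else:
                out.append(' ')
                space -= 1
    return ''.join(out)
```

For example, the token stream for a three-element list fits on one line under a wide margin and breaks at every blank under a narrow one, which is the behavior the two-subroutine design buys in a single pass.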
The report itself is pretty interesting, but it has not aged well at all. There are lots of errors. If the paper wasn't written in a hurry, then the errors came from OCR software, or a careless scribe left out important lines. The pseudocode has degraded into something barely readable, and it was hard to work out what the authors meant by the different symbols they used.
It was still easier to read than most papers written after the functional craze, which present their pseudocode in Haskell, and those are occasionally explaining simpler things than this. In the best case the paper itself describes the subject well enough that the code is superfluous.
I understand the excitement that comes with Haskell. The programs take the form of an algebra. Some people can take that algebra, calculate with it on paper, or transform it to obtain proofs.
The problem is that Haskell is strict about its rules. It doesn't let you violate them, and if you do, you just confuse the heck out of everyone.
The best mathematical notations bend with the subject they describe. They may be ambiguous with the notations of other branches of mathematics, but they can explain their own subject to the extent that you understand it very well.
As a notation, Haskell suffers from the problems of LISP: it's a bad notation for nearly everything. It manages this by overloading precedence rules to the point that you naturally write code where you have to track precedence in both the wide horizontal and the wide vertical direction to make any sense of it.
Therefore it sucks for representing and understanding programs.