Marpa: Parsing Revolution

While ago I wrote that PyPy threw my projects bottom-up. This time it was the marpa library and the algorithm it pivots on.

Similar to my LR&GLL studies I were looking for parsing techniques that could be applied across applications such as compilers and IDEs. Though this time I didn't expect that one single parser could cover every case I have.

Getting used to the idea of writing this stuff in python, I finished writing an Earley variant of a parser very similar to marpa. I didn't execute this with top score, but I still ended up with much more than I anticipated for.

The algorithm I used originates from "Practical Earley Parsing (Aycock & Horspool 2002)", additionally I tried to implement Leo's algorith, but I only got it to work if I use my parser as a recognizer. I'm sure I would get it to work once I actually need right recursive grammars.

Key to projectional editing

After I got the parser to work, I ended up thinking it would solve the user interface problems I had with structure editing. Since my previous editor didn't entirely fit the design of the parser, I decided to pick some Qt4 code from the eco editor and apply it to create a testbench. The little list editor implements structure editing in much smaller form and without all the features that'd be crucial for production ready editor that textended was supposed to be.

I picked relatively simple approach to embedding the parser. Editing operations would destroy the structure leaving flags around for parser to detect and reconstruct into semantically valid structures. Every text cell user filled would be classified into token classes in the grammar. This turned out to work surprisingly well. Though, it breaks down on ambiguous grammars and does not always produce entirely expectable results. The source code turned to resemble remarkably lot for the description of incremental parsing in harmonia research project.

Here's the key to plausibly good structure editing, but the discovery itself is demotivating me to look further into the subject, because it greatly extends what I can do with plain text parsing.

Pyllisp parsing engine

I'm writing to replace a hand-written parser in pyllisp with bytecode and the same parsing engine I used in the little list editor. I'm going to shape it into a grammar file such as this one:

file => expr
expr => binop: expr plus:"+" symbol
expr => symbol

And into a module that uses the grammar, which will look something like this:

import grammarlang

def post_binop(env, lhs, op, rhs):
    pass # will contain code to
         # compile bytecode for
         # a binary operation.

parser = grammarlang.load_parser("pyllisp.grammar")

def build(source):
    env = Env()
    parser(globals(), env, source)
    return env.close()

It will let me design a language similar to the language pyllisp already parsed, but I've got far much greater control over how to extend the language. In fact this kind of construct lets me design the grammar and the compiler in sync! This is a kind of revolutionary find:

I can probably come up with some more bullet points, but I just got tired from writing them up.

Similar posts