# Earley parsing in Idris: WellFounded (Part 2)

In this series we write a certified Earley recognizer and I tell how it went.

In the second part we discuss well-founded relations as they appear in Idris, then represent string generation through shared packed parse forests.

## More Idris-suckage

Although Idris is mostly working, it tends to be really picky over which seemingly similar ways you use to represent your program. Sometimes the pattern matching declarations confuse it while other times the case expressions don't work.

Similarly sometimes the code starts working when you
drop it out from the `where`

-clause.
This isn't a well-made programming environment you'd want to use.

Too bad everything else sucks even worse after you've figured out what you can do with Idris. Going back from Idris to Python now feels like going from Python to Visual Basic. Going to statically typed languages is like going straight down to coding Cobol or Algol.

## WellFounded

Well-founded relations are easier to see than understand.
Idris has this `wfRec`

-function in its library.

`wfRec : WellFounded rel =>`

`(step : (x : a) -> ((y : a) -> rel y x -> b) -> b)`

`-> a -> b`

Here's a common way to structure programs around `wfRec`

:

`SomeRelation : A -> A -> Type`

`WellFounded SomeRelation`

`doSomething : A -> b`

`doSomething = wfRec step`

`where step : (x : A) -> ((y : A) -> SomeRelation y x -> b) -> b`

`step x rec = ?some_b`

We can define recursive functions as long as we prove our recursion is covered by a well-founded relation.

Relation seem to be well-founded if it's accessible over all values it relates:

`interface WellFounded (rel : a -> a -> Type) where`

`wellFounded : (x : _) -> Accessible rel x`

Accessibility is difficult to understand, and it's described by the following construction:

`data Accessible : (rel : a -> a -> Type) -> (x : a) -> Type where`

`Access : (rec : (y : a) -> rel y x -> Accessible rel y)`

`-> Accessible rel x`

Lets form some sort of a relation:

`data Relation1 : Nat -> Nat -> Type where`

`SAB : (S a = b) -> Relation1 a b`

`sab_acc : (x : Nat) -> (y : Nat) -> Relation1 y x -> Accessible Relation1 y`

`sab_acc Z Z (SAB Refl) impossible`

`sab_acc (S k) _ (SAB Refl) = Access (sab_acc k)`

`WellFounded Relation1 where`

`wellFounded x = Access (sab_acc x)`

Unfortunately we aren't smarter after looking at `Accessible`

-construction.
Fortunately once we have such relations,
it seems we can form recursive functions on them if we happen to have them.

Idris library contains several variations in the WellFounded -library. One is the induction by size. We can use this to do implement a transitive closure over a list.

`transitive_closure : (s -> w -> Bool) -> (w -> s)`

`-> List w -> List s -> List w`

`transitive_closure {s} {w} ck conv g vs = let`

`ind = sizeInd {a=List w} {P=TCR s w}`

`in ind step g vs`

`where step : (x : List w)`

`-> ((y : List w) -> Smaller y x -> TCR s w y)`

`-> TCR s w x`

`step g rec [] = []`

`step g rec (a::aa) = case inspect (bin (ck a) g) of`

`(With (ll,rr) prf) => case inspect (length rr) of`

`(With Z prf2) => step g rec aa`

`(With (S z) prf2) => let`

`ree = rec ll (bin_smaller (ck a) prf (flip prf2) {k=g})`

`in rr ++ ree (map conv rr)`

Unfortunately the type signature is not precise enough to provide the necessary proofs for us to verify the parsing algorithm with this, but the recipe is there. To calculate the transitive closure we repeat some single step first on the whole set then proceed to repeat it on subsets of that set.

## SPPF/PT structures

The "Generates" -relation from the previous post, that described correct behavior of the software, quite resembled like a parse tree. This is a recurring thing in proving things correct as the proofs tend to convey interesting behavior.

Since parse trees can convey the idea that the grammar generates some string, I was encouraged to pick up structures that is commonly used with Earley parsers. Shared packed parse forests.

`data SPPF : (g:Grammar s) -> SPPS s g -> List s -> Type where`

`Leaf : SPPF g (Sym s) [s]`

`Short : (rule:Index g) -> SPPF g (Sym x) xs`

`-> {auto short_rule:(rhs (nth rule) = [x])}`

`-> {auto z_is:(lhs (nth rule) = z)}`

`-> SPPF g (Sym z) xs`

`Branch : SPPF g (prev_dot rule dot) xs -> SPPF g (Sym (nth dot)) ys`

`-> {default concat_theorem ok:(k:Index zs ** cut k = (xs,ys))}`

`-> {auto z_is:(this_dot rule dot = z)}`

`-> SPPF g z zs`

`DeepShort : (rule:Index g) -> SPPF g (Sym x) xs`

`-> {auto short_rule:(rhs (nth rule) = [x])}`

`-> {auto z_is:(lhs (nth rule) = z)}`

`-> SPPF g (Sym z) xs`

`-> SPPF g (Sym z) xs`

`DeepBranch : SPPF g (prev_dot rule dot) xs -> SPPF g (Sym (nth dot)) ys`

`-> {default concat_theorem ok:(k:Index zs ** cut k = (xs,ys))}`

`-> {auto z_is:(this_dot rule dot = z)}`

`-> SPPF g z zs`

`-> SPPF g z zs`

I wanted to convey it like it appears in papers and books, so we have chains of branches and short chains. The parsed symbol is replaced by a parsing state, which may be either a completed symbol or a partial item.

`data SPPS : (s:Type) -> Grammar s -> Type where`

`Sym : s -> SPPS s g`

`Dot : {g:Grammar s} -> (rule:Index g) -> Index (rhs_tail (nth rule)) -> SPPS s g`

We can easily prove that we won't have non-empty sppf-structures:

`sppf_nonempty : SPPF g s zs -> NonEmpty zs`

`sppf_nonempty Leaf = IsNonEmpty`

`sppf_nonempty (Short _ x) = sppf_nonempty x`

`sppf_nonempty (Branch _ _ {ok=(k**_)}) = index_nonempty k`

`sppf_nonempty (DeepShort _ _ z) = sppf_nonempty z`

`sppf_nonempty (DeepBranch _ _ _ {ok=(k**_)}) = index_nonempty k`

Now the SPPF actually stands for multiple parse trees. We can describe the corresponding parse tree by dropping the chaining structures:

`data PT : (g:Grammar s) -> SPPS s g -> List s -> Type where`

`SLeaf : PT g (Sym s) [s]`

`SShort : (rule:Index g) -> PT g (Sym x) xs`

`-> {auto short_rule:(rhs (nth rule) = [x])}`

`-> {auto z_is:(lhs (nth rule) = z)}`

`-> PT g (Sym z) xs`

`SBranch : PT g (prev_dot rule dot) xs -> PT g (Sym (nth dot)) ys`

`-> {default concat_theorem ok:(k:Index zs ** cut k = (xs,ys))}`

`-> {auto z_is:(this_dot rule dot = z)}`

`-> PT g z zs`

The structure cannot be built if you don't actually have a correct result, so having this parse forest over some string implies you have at least one recognized interpretation of the input. It can be confirmed, we can convert any sppf to some parse tree it represents.

`sppf_to_any_pt : SPPF g s zs -> PT g s zs`

`sppf_to_any_pt Leaf = SLeaf`

`sppf_to_any_pt (Short rule y {short_rule=short_rule} {z_is=z_is}) =`

`SShort rule (sppf_to_any_pt y) {short_rule=short_rule} {z_is=z_is}`

`sppf_to_any_pt (Branch {rule} {dot} {ok} {z_is} x y) =`

`SBranch {rule=rule} {dot=dot} (sppf_to_any_pt x) (sppf_to_any_pt y) {ok=ok} {z_is=z_is}`

`sppf_to_any_pt (DeepShort rule y w {short_rule=short_rule} {z_is=z_is}) = sppf_to_any_pt w`

`sppf_to_any_pt (DeepBranch {rule} {dot} {ok} {z_is} x y z) = sppf_to_any_pt z`

Then we can conclude that if it's impossible to get a parse tree, then it's also impossible to get a SPPF. We can prove this through contraposition.

`pt_theorem : Not (PT g z zs) -> (SPPF g z zs) -> Void`

`pt_theorem f = (f . sppf_to_any_pt)`

The recognizers we have had so far are unable to produce these SPPF structures because they provide too few guarantees about their behavior.

## What kind of results can we get?

It seems it's sufficient to prove `¬(PT g z zs)`

to conclude that some string `zs`

cannot be parsed.
But we can't be as loose if we get a positive result.

Ideally we would like to return a `SPPF`

,
and that should contain every possible parse tree that
can be built with the given constraints.

We have to prove that given any parse-tree `a`

, we can find it from the SPPF
that we build:

`(a:PT g z zs) -> Contains a p`

Though we have a problem we have to treat before we proceed. In some cases we can't build such a "full" SPPF because there are infinite amount of parses!

Consider the following type's inhabitants:

`PT [B ::= [A], B ::= [B]] (Sym B) [A]`

That's right, we can build...

`(rule 1) $ (rule 0) (Leaf)`

`(rule 1) $ (rule 1) $ (rule 0) (Leaf)`

`(rule 1) $ (rule 1) $ (rule 1) $ (rule 0) (Leaf)`

`(rule 1) $ (rule 1) $ (rule 1) $ (rule 1) $ (rule 0) (Leaf)`

`...`

Yet we can't build an infinite SPPF that would cover it. It's also not something that we'd even want. It's nice that our SPPFs are finite. To account for this, we have to add that the structures we recognize won't be redundant. They have to satisfy the following non-redundancy condition.

`data NonRedundant : List (Index g) -> PT g z zs -> Type where`

`LeafND : NonRedundant [] SLeaf`

`ShortND : NonRedundant a x`

`-> Not (Elem rule a)`

`-> NonRedundant (rule::a)`

`(SShort rule x {short_rule=short_rule} {z_is=z_is})`

`BranchND : NonRedundant _ x -> NonRedundant _ y`

`-> NonRedundant [] (SBranch x y {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

And we can finally tell what condition we want to hold for the SPPF we want to examine:

`sppf_is_complete : (p:SPPF g z zs) -> Type`

`sppf_is_complete {g} {z} {zs} p =`

`({rs:List (Index g)} -> (a:PT g z zs) -> NonRedundant rs a -> Contains a p)`

The conclusion is that our recognizer should fill up the following structure:

`data ParseResult : (g:Grammar s) -> SPPS s g -> List s -> Type where`

`Parse : (p:SPPF g z zs) -> sppf_is_complete p -> ParseResult g z zs`

`NoParse : {g:Grammar s}`

`-> Not (PT g z zs)`

`-> ParseResult g z zs`

Technically this is still insufficient, because our parser knows more than this. I'm just not sure how to describe an idea of a "viable prefix".

The "Containment" is a fairly simple relational statement, though I'd hope to make these simpler:

`data Contains : PT g z zs -> SPPF g z zs -> Type where`

`LeafIn : Contains SLeaf Leaf`

`ShortInTop : Contains x xs -> Contains`

`(SShort rule x {short_rule=short_rule} {z_is=z_is})`

`(Short rule xs {short_rule=short_rule} {z_is})`

`BranchInTop : Contains x xs -> Contains y ys -> Contains`

`(SBranch x y {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

`(Branch xs ys {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

`ShortIn : Contains x xs -> Contains`

`(SShort rule x {short_rule=short_rule} {z_is=z_is})`

`(DeepShort rule xs _ {short_rule=short_rule} {z_is})`

`BranchIn : Contains x xs -> Contains y ys -> Contains`

`(SBranch x y {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

`(DeepBranch xs ys _ {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

`ShortInDeep : Contains x ys -> Contains x`

`(DeepShort rule xs ys {short_rule=short_rule} {z_is})`

`BranchInDeep : Contains x zs -> Contains x`

`(DeepBranch xs ys zs {rule=rule} {dot=dot} {ok=ok} {z_is=z_is})`

I suppose to build the proof, we have to construct items like before, except that they carry the SPPF.

`data Item : (g:Grammar s) -> List s -> Type where`

`Pr : (rule:Index g) -> Item g xs`

`Sh : (rule:Index g)`

`-> (dot:Index (rhs_tail (nth rule)))`

`-> (p:SPPF g (prev_dot rule dot) zs ** sppf_is_complete p)`

`-> (k:Index xs ** (cut k = (_,zs)))`

`-> Item g xs`

To construct anything at all, we need the base case that is trivial. I suppose it'd be the symbol shifting with empty grammar, it always succeeds:

`lemma_shift_symbol : (a : PT [] (Sym x) [x]) -> NonRedundant pr a -> Contains a Leaf`

`lemma_shift_symbol SLeaf x = LeafIn`

`lemma_shift_symbol (SShort rule z) x = absurd rule`

`lemma_shift_symbol (SBranch {rule} y z) x = absurd rule`

`shift_symbol : (sym:s) -> (p:SPPF [] (Sym sym) [sym] ** sppf_is_complete p)`

`shift_symbol sym = (Leaf ** lemma_shift_symbol)`

Now I have to tell you that I haven't gotten further than this. I'd have to correctly define what is an item set, and the parser itself.

`ParseState : (g:Grammar s) -> SPPS s g -> List s -> Type`

`init : (g:Grammar s) -> (acc:s) -> ParseState g (Sym acc) []`

`shift : (g:Grammar s) -> ParseState g z zs -> (sym:s) -> ParseState g z (zs ++ [sym])`

`check : ParseState g z zs -> ParseResult g z zs`

If we had those, then we could build this:

`append_lemma : (ys ++ []) = ys`

`append_lemma {ys = []} = Refl`

`append_lemma {ys = (x :: xs)} = cong (append_lemma {ys=xs})`

`append_lemma_2 : (ys:List a) -> (x:a) -> (xs:List a) -> ((ys ++ [x]) ++ xs) = ys ++ (x :: xs)`

`append_lemma_2 [] x xs = Refl`

`append_lemma_2 (y :: ys) x xs = cong (append_lemma_2 ys x xs)`

`parse : DecEq s => (g : Grammar s) -> (acc:s) -> (input:List s) ->`

`ParseResult g (Sym acc) input`

`parse g acc xs = check $ step g xs $ init g acc`

`where step : (g:Grammar s) -> (xs:List s) -> ParseState g z ys -> ParseState g z (ys++xs)`

`step g [] pst {ys} = rewrite append_lemma {ys=ys} in pst`

`step g (x :: xs) pst {z} {ys} = let`

`pst2 = shift g pst x {z=z} {zs=ys}`

`pstn = step g xs pst2 {z=z} {ys=ys ++ [x]}`

`in rewrite flip $ append_lemma_2 ys x xs in pstn`

The difficult parts are still up ahead and I'm not sure if I am able to finish it.