# Studying Axiom with a Context-Free Grammar

Axiom is implemented in Common Lisp and parsed with a modified Pratt parser. There is a preparsing step that inserts parentheses and semicolons to implement indentation sensitivity.
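The idea behind such a preparsing step can be sketched like this: keep a stack of open indentation levels, open a block when indentation increases, close blocks when it decreases, and separate same-level lines with a semicolon. A minimal illustration using braces as block delimiters; the actual delimiters and rules Axiom uses may well differ.

```python
def delimit(source):
    """Insert braces and semicolons based on indentation.

    A toy version of the kind of preprocessing described above:
    deeper indentation opens a block, returning to a shallower
    level closes it, and same-level lines are joined with ';'.
    """
    out = []
    stack = [0]  # indentation levels currently open
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if indent > stack[-1]:
            stack.append(indent)
            out.append("{")
        else:
            while indent < stack[-1]:
                stack.pop()
                out.append("}")
            if out:
                out.append(";")
        out.append(stripped)
    while len(stack) > 1:  # close any blocks left open at EOF
        stack.pop()
        out.append("}")
    return " ".join(out)
```

For example, `delimit("a\n  b\n  c\nd")` turns the indented block under `a` into an explicit `{ b ; c }` group.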

It works, but it is difficult to find out how this language ought to be parsed from the contents of a hand-written parser.

I wrote a context-free grammar for Axiom. Ultimately, I am attempting to interpret this language and learn some details about its innards.

## The properties of the grammar

The language presented in this grammar is not exactly equivalent to the one parsed by the original implementation. However, I ensured that it unambiguously parses the source code of the packages that come with Axiom.

I am not certain that the whole parse is correct, because there are small details that were difficult to understand.

## Preliminary

I assume that you understand what context-free grammars are. To follow the syntax used here, a short introduction may help. The following grammar generates the string "little mary had a lamb":

```
sentence: who "had" what
who:
    "little" "mary"
    "duke" "nukem"
what:
    "a" "lamb"
```

A grammar's purpose is to reveal the structure of the strings that are valid in the language it represents. The production rules are generative, so a string may be ambiguous, e.g. it may parse in multiple ways.
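To make the toy grammar concrete, here is a sketch that enumerates every string it generates. The encoding of the rules as a dictionary is my own; nonterminals map to lists of alternatives, and anything not in the dictionary is a terminal.

```python
from itertools import product

# The toy grammar from above, written as a Python dictionary.
grammar = {
    "sentence": [["who", "had", "what"]],
    "who": [["little", "mary"], ["duke", "nukem"]],
    "what": [["a", "lamb"]],
}

def expand(symbol):
    """Yield every terminal string a symbol can derive."""
    if symbol not in grammar:  # terminal: derives only itself
        yield symbol
        return
    for alternative in grammar[symbol]:
        # Combine every derivation of every part of the alternative.
        for parts in product(*(expand(s) for s in alternative)):
            yield " ".join(parts)
```

Running `sorted(expand("sentence"))` yields the two sentences this grammar generates, "little mary had a lamb" among them.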

## Acquisition

Obtaining a grammar for a hand-written parser is difficult. I tried several ways to obtain a grammar for the Pratt parser until I found one that worked.

The method that worked was to start with the following grammar:

```
expr:
    atom+
atom:
    symbol
    string
    int
    all_special_characters
    "(" expr? ")"
    "[" expr? "]"
    "{" expr? "}"
```

Next, you search out the topmost elements that exist in the language and place them in their respective places in the grammar. Eventually I figured out how to put together everything that appears below.
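The seed grammar is easy to parse directly: an expression is a run of atoms, and an atom is either a plain token or a bracketed sub-expression. A minimal recursive-descent sketch over a pre-split token list; this is an illustration of the seed grammar only, not Axiom's parser.

```python
# Matching bracket pairs from the seed grammar.
PAIRS = {"(": ")", "[": "]", "{": "}"}

def parse(tokens):
    tree, rest = parse_expr(tokens)
    assert not rest, f"trailing tokens: {rest}"
    return tree

def parse_expr(tokens):
    # expr: atom+  (also allows zero atoms, for the `expr?` cases)
    atoms = []
    while tokens and tokens[0] not in PAIRS.values():
        atom, tokens = parse_atom(tokens)
        atoms.append(atom)
    return atoms, tokens

def parse_atom(tokens):
    head, rest = tokens[0], tokens[1:]
    if head in PAIRS:  # bracketed, possibly empty, sub-expression
        inner, rest = parse_expr(rest)
        assert rest and rest[0] == PAIRS[head], "unbalanced brackets"
        return (head, inner), rest[1:]
    return head, rest  # symbol / string / int / special character
```

For example, `parse(["f", "(", "x", "+", "1", ")"])` groups the parenthesized part into a sub-tree under `f`.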

## Semicolon separators

Semicolons divide the file into statements; commas do the same within parentheses.

```
stmt_0:
    stmt_0 ";" stmt_1
    stmt_1
expr_0:
    expr_1 ["for" symbol "in" for_loop_head, "while" expr_100]+
    expr_1
expr_1:
    expr_1 ";" expr_10
    expr_10
expr_10:
    expr_10 "," statement
    statement
```

Parentheses may also enclose generator expressions. These are told apart by a `for` or `while` without a `repeat` inside them.
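The layering of the separator rules can be sketched as a repeated split: first on semicolons at bracket depth zero, then on commas inside each piece. A minimal illustration with my own function names, not Axiom's mechanism; the key point is that separators inside brackets do not split the outer statement.

```python
def split_top(tokens, sep):
    """Split a token list on `sep`, ignoring separators inside brackets."""
    parts, depth, current = [], 0, []
    for tok in tokens:
        if tok in "([{":
            depth += 1
        elif tok in ")]}":
            depth -= 1
        if tok == sep and depth == 0:
            parts.append(current)
            current = []
        else:
            current.append(tok)
    parts.append(current)
    return parts
```

For example, splitting `a ; f ( b ; c )` on `";"` yields two statements, because the inner semicolon sits at depth one.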

## Where clause

Most of the toplevel statements in the `.spad` files have a `where` clause. It divides the statement into two parts.

```
stmt_1:
    statement "where" statement
    statement
```

Later I changed the grammar further to make use of the differences at this point.

## Statements - Dangling else

Statements present several difficulties for the grammar. The worst of them is the dangling else.

The `s_prefix` macro implements the numerous assignment operators this language comes with.

```
statement:
    s_prefix(if_then)
    s_prefix(if_then_else)
if_then_else:
    "if" expr_100 "then" s_prefix(if_then_else) "else" s_prefix(if_then_else)
    iterator* "repeat" statement
    expr_100
    "return" expr_100?
    "leave" expr_100?
if_then:
    "if" expr_100 "then" statement
    "if" expr_100 "then" s_prefix(if_then_else) "else" s_prefix(if_then)
```

The above grammar ensures that `else` expressions end up associated with the innermost conditional in a statement.

The `if_then_else` rule coalesces the remaining statements, because those other statements are found within these ones. Without the dangling-else problem this would be a much simpler grammar.
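The resolution that the `if_then`/`if_then_else` split encodes can be demonstrated with a toy parser. Here conditions are single tokens, which is my own simplification; the point is only that `else` always attaches to the innermost `if`.

```python
def parse_stmt(toks, i=0):
    """Parse one statement starting at index i; return (tree, next_index)."""
    if i < len(toks) and toks[i] == "if":
        cond = toks[i + 1]            # condition is a single token here
        assert toks[i + 2] == "then"
        then_branch, i = parse_stmt(toks, i + 3)
        # Greedily consume a following `else`: this is exactly the
        # innermost-if binding the grammar above enforces.
        if i < len(toks) and toks[i] == "else":
            else_branch, i = parse_stmt(toks, i + 1)
            return ("if", cond, then_branch, else_branch), i
        return ("if", cond, then_branch), i
    return toks[i], i + 1             # a bare expression
```

Parsing `if a then if b then x else y` attaches `else y` to the inner `if b`, not the outer `if a`.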

## Statements - loops

The syntax of Axiom presents the idea that generator clauses, streams and loops are related constructs. This is a nice viewpoint that I will likely incorporate into my own language's syntax eventually.

The same idea is present in Common Lisp's `loop` macro, perhaps in a slightly stronger form.

I am not sure how these iterators stack within each other. The `repeat` clause builds a loop block out of them.

`repeat` may also appear alone; in that case it is the same as `while true repeat ...`.

```
iterator:
    "for" symbol "in" for_loop_head
    "while" expr_100
    "until" expr_100
for_loop_head:
    expr_100 (["|", "suchthat"] expr_100)?
```

## Statements - assignment

Assignment ends up awkwardly entangled with the dangling-else problem. The result is two sub-grammars, `s_prefix(if_then_else)` and `s_prefix(if_then)`, that have a similar shape:

```
s_prefix(stmt):
    expr_100 "==>" s_prefix(stmt)
    expr_100 "==" s_prefix(stmt)
    expr_100 "with" s_prefix(stmt)
    expr_100 "add" s_prefix(stmt)
    expr_100 "==>" "with" s_prefix(stmt)
    expr_100 "==>" "add" s_prefix(stmt)
    expr_100 "==" "with" s_prefix(stmt)
    expr_100 "==" "add" s_prefix(stmt)
    expr_100 ":=" s_prefix(stmt)
    expr_100 ":-" s_prefix(stmt)
    expr_100 "=>" s_prefix(stmt)
    expr_100 "+->" s_prefix(stmt) # likely not the correct place for this
    ":" s_prefix(stmt)
    stmt
```

The `with` and `add` constructs are weird but important parts of this language. `with` is used to provide properties for domains, whereas `add` provides implementations for operations in domains.

The category and domain constructors are allowed to contain conditionals and loops. This hints that domains and categories are not parametric; instead, they are always constructed by a function that has parameters.

## Expressions

The "or" and "and" rules here are left-recursive. It may be necessary to check in practice which precedence rules Axiom actually uses.

```
expr_100:
    expr_100 "or" expr_120
    expr_100 "\\/" expr_120
    expr_120
expr_120:
    expr_120 "and" expr_130
    expr_120 "/\\" expr_130
    expr_130
```

The original parser likely accepts `^`, `not` and `~` as stacked prefixes, but this did not appear in practice, so I left the recursion out of these groups.

```
expr_130:
    "^" expr_150
    "not" expr_150
    "~" expr_150
    expr_150
expr_150: # 399 400
    "has" expr_200
    "=" opsuffix? expr_200
    expr_200 "<" opsuffix? expr_200
    expr_200 ">" opsuffix? expr_200
    expr_200 "<=" opsuffix? expr_200
    expr_200 ">=" opsuffix? expr_200
    expr_200 "=" opsuffix? expr_200
    expr_200 "^=" opsuffix? expr_200
    expr_200 "~=" opsuffix? expr_200
    expr_200 "in" expr_200
    expr_200 "case" expr_200
    expr_200 "has" expr_200
    expr_200 "is" expr_200
    expr_200 "isnt" expr_200
    expr_200
```

The `..` forms a generator expression. I suspect it only appears in certain places, though I am not sure which; in any case, the original parser appears to accept it almost anywhere.

```
expr_200:
    expr_200 ".."
    expr_200 ".." expr_300
    expr_300
```

There is an optional opsuffix spread across the grammar; what it is will be explained later.

The precedence rules of Axiom seem to group neatly even if there are a lot of them.

```
expr_300:
    expr_300 "+" opsuffix? expr_400
    expr_300 "-" opsuffix? expr_400
    expr_400
expr_400:
    "-" opsuffix? expr_400
    "+" opsuffix? expr_400
    expr_500
expr_500:
    expr_500 "*" opsuffix? expr_800
    expr_500 "rem" opsuffix? expr_800
    expr_500 "mod" opsuffix? expr_800
    expr_500 "quo" opsuffix? expr_800
    expr_500 "div" opsuffix? expr_800
    expr_500 "/" opsuffix? expr_800
    expr_500 "exquo" opsuffix? expr_800
    expr_800
```
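The numbered `expr_NNN` levels map naturally onto a binding-power table in a Pratt-style (precedence-climbing) parser. A sketch with a trimmed operator set; the numbers mirror the rule names above, but the table itself is my own construction, not Axiom's.

```python
# Binding powers named after the grammar levels: higher binds tighter.
PREC = {"or": 100, "and": 120, "+": 300, "-": 300, "*": 500, "/": 500}

def parse(tokens, min_bp=0):
    """Precedence climbing over a token list; returns (tree, rest)."""
    lhs, tokens = tokens[0], tokens[1:]
    while tokens and PREC.get(tokens[0], -1) > min_bp:
        op, tokens = tokens[0], tokens[1:]
        # Passing PREC[op] (strict '>' in the loop test) makes
        # equal-precedence operators left-associative.
        rhs, tokens = parse(tokens, PREC[op])
        lhs = (op, lhs, rhs)
    return lhs, tokens
```

Parsing `a + b * c - d` groups `b * c` first and then associates `+` and `-` to the left.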

## Reducers

In some places I saw expressions such as `+/[1,2,3]`. I suppose that evaluates to `1 + (2 + 3)`, i.e. this would be a syntax for reducing sequences.

I only saw it used with a few kinds of operators, but you can likely use any operator you want.

```
reducer:
    "+"
    "-"
    "*"
    "or"
    "and"
expr_800:
    reducer "/" expr_900
    expr_800 "**" opsuffix? expr_900
    expr_800 "^" expr_900
    expr_900
```
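If `+/[1,2,3]` really reduces right-associatively as guessed above, it corresponds to a right fold. A sketch of that semantics; the operator-name mapping is mine, not Axiom's.

```python
import operator

# Map reducer tokens to Python binary operators (my own mapping).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def reduce_right(op, xs):
    """Fold a sequence right-associatively: op/[a,b,c] = a op (b op c)."""
    f = OPS[op]
    result = xs[-1]
    for x in reversed(xs[:-1]):
        result = f(x, result)
    return result
```

The direction matters for non-commutative operators: `-/[1,2,3]` as a right fold is `1 - (2 - 3) = 2`, whereas a left fold would give `(1 - 2) - 3 = -4`.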

## Various tight binding operators

It took a lot of effort to figure out the correct order of the following operators, and I don't think I got it entirely right. For example, sometimes the opsuffix appears in a place that hints it doesn't structure the way I believe it does.

```
expr_900:
    expr_900 ":" expr_910
    expr_900 "::" expr_910
    expr_900 "@" expr_910
    expr_910
expr_910:
    expr_910 "pretend" expr_920
    expr_910 "." expr_920
    expr_920
expr_920:
    expr_930
expr_930:
    "->" expr_950
    expr_930 "->" expr_950
    expr_930 "<-" expr_950
    expr_950
expr_950:
    "!" expr_955
    expr_955 "!!" expr_955
    expr_955
expr_955:
    "#" opsuffix? expr_955
    "'" expr_955
    expr_960
```

## Trouble with the dollar sign

The interaction of the dollar sign is troublesome. It is there to select the source package for an operation when it would otherwise be ambiguous. In practice it also seems to be used to denote when the expression should be evaluated from the Common Lisp environment, and it means something else again when it appears alone or at the end of an expression.

The nud/led naming comes from the Pratt parser. It refers to whether the expression appears in prefix position or as a subsequent item.

Additionally, you can apply items without parentheses: `a b` is the same as `a(b)`. This easily causes ambiguities, because parentheses are also used for grouping items, so the production rules for parentheses have to be carefully crafted.

```
expr_960:
    expr_1000 "$" expr_960
    expr_1000
expr_1000:
    "$"
    atom_nud expr_1000_led
    atom_nud
expr_1000_led:
    "#" expr_1000_led
    atom_led expr_1000_led
    atom_led
    "$"
atom_nud:
    atom
    atom_nud "(" expr_0? ")"
    "(" expr_0? ")"
atom_led:
    atom_led "(" expr_0? ")"
    atom
atom:
    symbol
    string
    int
    float
    "%"
    "[" expr_0? "]"
    "{" expr_0? "}"
```

## Opsuffix

The opsuffix is one of those things I really don't like at all.

```
opsuffix:
    "$" "%"
    "$" symbol
    "$" "(" expr_0? ")"
```

## Unused keywords

The parser defines several keywords that do not appear in the above grammar.

```
keywords:
    "exit"
    "from"
    "iterate"
    "yield"
    "otherwise"
    "when"
    "unless"
```

I left them out because they aren't used anywhere in the source code, but I wrote them down so that if they ever appear, it will be noticed.

## Final thoughts

Axiom is a well-preserved programming language that is more than 40 years old.

I didn't say much about the tokenizer, but the tokenizer in Axiom doesn't look very interesting. The only odd-looking thing it does is use the underscore as an escape character.
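My reading of that escape behaviour, sketched as a small helper: an underscore makes the character after it an ordinary literal character. This is a guess at the semantics, not Axiom's actual tokenizer.

```python
def unescape(text):
    """Resolve underscore escapes: '_' makes the next char literal."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "_" and i + 1 < len(text):
            out.append(text[i + 1])  # take the next character as-is
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```

Under this reading, an identifier written as `has_-dollar` would carry a literal `-` in its name.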

I keep asking myself why I found this, and why only now. I think the question that brought me here is:

What does `A + B` mean when A and B are not the same specific type, or not identified as specific types of something?

I keep going until I find answers.