Packrat parsing gives us “the simplicity, elegance, and generality of the backtracking model, but eliminates the risk of super-linear parse time, by saving all intermediate parsing results as they are computed and ensuring that no result is evaluated more than once.”
The main tradeoff is space. We need lots of space for memoization, but we get that very elegantly in lazy functional programming languages! No need for explicit HashMap
s.
In return, we get linear time parsing of any LL(
Building a parser
Packrat parsing is recursive descent parsing with some spice.
Naïve recursive descent
Let’s define a datatype to represent the result of a parse.
Parsing functions are pretty straightforward:
This is backtracking, and it’s exponential.
Adding memoization
Instead, we can build a memoization table, kinda like the LL(1) parsing table, where each row is a nonterminal and each column is a position in the input. At each point, we store the parsed value, and the column number corresponding to where the rest of the input starts:
To build this table, we can start from the bottom right and move right-to-left. Within each column, we move bottom-to-top. i.e., outer loop is right-to-left, inner is bottom-to-top.
This gets us linear runtime, even with backtracking. This also gives us unlimited lookahead. However, doing this manually builds lots of unneeded results, and we have to carefully order how the results in a particular column are calculated!
Packrat parsing
This is lazy memoization basically. A lazy functional language is an ideal place to implement this, since it owes itself well to not needing explicit memoization structures to keep track of this info.
Now, we create a struct representing a column of our memoization table. Additionally, a Parsed
value now corresponds to the tuple of parsed value and forward pointer to a column in our memoization table above!
We modify nonterminal parsing functions accordingly to take this into account:
Finally, we create a special top-level function parse
to tie up all this recursion:
There’s some mutual recursion stuff hapening with d
! This only works in a lazy language. We only evaluate what we need to, and every column is only evaluated at most once by recursive calls of parse
. Nuts.
Handling right-recursion
We convert it to left-recursion lmao.
TODO
There’s some other extra add-ons; I’ve skipped those for brevity.