Day 2 homework experiment: incremental parser without recursion

Miguel Lechón

#14587

March 18, 2018

I've written precedence climbing parsers before, so instead of repeating the exercise, I've decided to tweak the constraints of this exercise and build an incremental parser that is fed one character at a time and will always output a valid s-expression when asked for it.

Here's the sample output for the example Per gave us:

1                   -> 1
12                  -> 12
12*                 -> (* 12 0)
12*3                -> (* 12 3)
12*34               -> (* 12 34)
12*34 +             -> (+ (* 12 34) 0)
12*34 + 4           -> (+ (* 12 34) 4)
12*34 + 45          -> (+ (* 12 34) 45)
12*34 + 45/         -> (+ (* 12 34) (/ 45 0))
12*34 + 45/5        -> (+ (* 12 34) (/ 45 5))
12*34 + 45/56       -> (+ (* 12 34) (/ 45 56))
12*34 + 45/56 +     -> (+ (+ (* 12 34) (/ 45 56)) 0)
12*34 + 45/56 + ~   -> (+ (+ (* 12 34) (/ 45 56)) (~ 0))
12*34 + 45/56 + ~2  -> (+ (+ (* 12 34) (/ 45 56)) (~ 2))
12*34 + 45/56 + ~25 -> (+ (+ (* 12 34) (/ 45 56)) (~ 25))

On top of operator precedence, arity and associativity, I also ask for the operator's neutral element, in order to have a convenient right child placeholder for binary operators. Here's an example of that in action:

arithmetic example:
==================
1                                   -> 1
1 +                                 -> (+ 1 0)
1 + 2                               -> (+ 1 2)
1 + 2 *                             -> (+ 1 (* 2 1))
1 + 2 * (                           -> (+ 1 (* 2 1))
1 + 2 * (3                          -> (+ 1 (* 2 3))
1 + 2 * (3 -                        -> (+ 1 (* 2 (- 3 0)))
1 + 2 * (3 - 4                      -> (+ 1 (* 2 (- 3 4)))
1 + 2 * (3 - 4)                     -> (+ 1 (* 2 (- 3 4)))
1 + 2 * (3 - 4) -                   -> (- (+ 1 (* 2 (- 3 4))) 0)
1 + 2 * (3 - 4) - -                 -> (- (+ 1 (* 2 (- 3 4))) (- 0))
1 + 2 * (3 - 4) - -5                -> (- (+ 1 (* 2 (- 3 4))) (- 5))
1 + 2 * (3 - 4) - -5^               -> (- (+ 1 (* 2 (- 3 4))) (- (^ 5 1)))
1 + 2 * (3 - 4) - -5^6              -> (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6)))
1 + 2 * (3 - 4) - -5^6+             -> (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 0)
1 + 2 * (3 - 4) - -5^6+7            -> (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7)
1 + 2 * (3 - 4) - -5^6+7 +          -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) 0)
1 + 2 * (3 - 4) - -5^6+7 + 8        -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) 8)
1 + 2 * (3 - 4) - -5^6+7 + 8^       -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 1))
1 + 2 * (3 - 4) - -5^6+7 + 8^(      -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 1))
1 + 2 * (3 - 4) - -5^6+7 + 8^(9     -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 9))
1 + 2 * (3 - 4) - -5^6+7 + 8^(9/    -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 (/ 9 1)))
1 + 2 * (3 - 4) - -5^6+7 + 8^(9/1   -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 (/ 9 1)))
1 + 2 * (3 - 4) - -5^6+7 + 8^(9/10  -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 (/ 9 10)))
1 + 2 * (3 - 4) - -5^6+7 + 8^(9/10) -> (+ (+ (- (+ 1 (* 2 (- 3 4))) (- (^ 5 6))) 7) (^ 8 (/ 9 10)))

A couple of mildly interesting properties of this parser is that it doesn't use a lexer and that it traverses the in-memory expression tree explicitly instead of relying on recursion for precedence climbing.

I don't expect this approach to scale beyond the scope of homework, but maybe I can still reuse it for day 3.

Here's the code. The meat is mainly inside the procedure "feed".

Edited by Miguel Lechón on March 18, 2018, 7:34am Reason: Initial post

Maykeye

#14591

March 18, 2018

Are such parsers good at error recovery? For example,

1	foo(10 bar(30,20), 34 ;

in (manually written) recursive descend it's relatively easy to recover this into either "foo(10,34)"(recover at next comma) or "foo(10, bar(30,20),34)"(insert comma)

Is it easy to make precedence climbing to climb into either of these?

Miguel Lechón

#14595

March 18, 2018

Since my parser takes its input character by character, I think the "insert comma" strategy is probably easy to add, but the "recover at next comma" would require a separate ParserState_SomethingWentWrongAndImWaitingForAComma branch on the main switch-case inside the "feed" procedure and I just don't see that approach scaling up to general syntactical error recovery.

If I ever use this kind of parser for anything, it will be to take interactive input for a calculator or a REPL or something along those lines.