"Cheating" on Day 2 with a Pratt parser

← Back to index

"Cheating" on Day 2 with a Pratt parser

Anikki

#14558

March 17, 2018

I think this is a very simple way of parsing expressions. It's called a Pratt parser. Changing the associativity of the binary operators is as simple as subtracting 1.

enum {
    LBP_ADD = 1,
    LBP_MUL = 2,
    LBP_NEG = 3,
    LBP_POW = 4
}

int[256] lbps;

void init_lbps() {
    lbps['+'] = LBP_ADD;
    lbps['-'] = LBP_ADD;

    lbps['*'] = LBP_MUL;
    lbps['/'] = LBP_MUL;
    lbps['%'] = LBP_MUL;
    lbps[TOKEN_LEFT_SHIFT] = LBP_MUL;
    lbps[TOKEN_RIGHT_SHIFT] = LBP_MUL;
    lbps['&'] = LBP_MUL;

    lbps[TOKEN_POW] = LBP_POW;
}

int lbp(TokenKind kind) { return lbps[kind]; }

int nud(Token token) {
    switch (token.kind) {
    case TOKEN_INT: return token.val;
    case '-': return -parse_expr(LBP_NEG);
    case '~': return ~parse_expr(LBP_NEG);
    case '(':
        int result = parse_expr(0);
        expect(')');
        return result;
    default: assert(false);
    }
}

int led(TokenKind kind, int lhs) {
    int assoc = 0; // left associative
    if (kind == TokenKind.Pow) {
        assoc = 1; // right associative
    }
    int rhs = parse_expr(lbp(kind) - assoc);
    switch (kind) {
    case '+': return lhs + rhs;
    case '-': return lhs - rhs;
    case '*': return lhs * rhs;
    case '/': assert(rhs != 0); return lhs / rhs;
    case '%': return lhs % rhs;
    case '&': return lhs & rhs;
    case TOKEN_LEFT_SHIFT: return lhs << rhs;
    case TOKEN_RIGHT_SHIFT: return lhs >> rhs;
    case TOKEN_POW: return cast(int) pow(cast(float) lhs, cast(float) rhs);
    default: assert(false);
    }
}

int parse_expr(int rbp) {
    Token curr_tok = token;
    next_token();
    int lhs = nud(curr_tok);
    while (rbp < lbp(token.kind)) {
        curr_tok = token;
        next_token();
        lhs = led(curr_tok.kind, lhs);
    }
    return lhs;
}

Edited by Anikki on March 17, 2018, 3:38am Reason: Initial post

Per Vognsen

#14560

March 17, 2018

Yeah, Pratt parsing is neat and easy to code, but most people couldn't have invented it from scratch. The nice thing about precedence climbing is that it's intuitive enough that many people can invent (and have invented) it on their own, and it doesn't require a different mental model from the rest of your recursive-descent parser. If you do the "make it table driven" exercise from day 2, you'll also see the code compresses down heavily and becomes generic by a straightforward parameterization and refactoring of your existing code. The sequence leading up to the table-driven refactoring is intended as a guided exercise culminating in inventing a "name brand" computer science algorithm on your own that would have been non-obvious or opaque if it had just been sprung on you fully formed. I encourage you to try it out and see for yourself even if you know other techniques like Pratt parsing or Dijkstra's shunting yard algorithm.

Edited by Per Vognsen on March 17, 2018, 5:41am

Reply to Thread

← Back to index