SH:Blog:

### Precedence Parsing

We've all heard about "order of operations," which tells us how to treat syntax like a * b + c * d ^ e ^ f / g as if it were (a * b) + ((c * (d ^ (e ^ f))) / g). For the sake of example we're using ^ for exponentiation, because that's right-associative.

We can specify precedence levels for binary operators by giving them numbers and telling whether they're left or right associative.

```+, - : 10, left
*, / : 20, left
^    : 30, right
```

The operators with higher precedence get their neighbors attached to them first, and ties are resolved based on left/right associativity. But here's what I do. Instead of saying "left" and "right," specify "left-precedence" and "right-precedence" values, or "grabbing-strengths," that say how hard an operator grabs at its neighbor.

```+, - : 104, 106
*, / : 204, 206
^    : 306, 304
```

We can see the grabbing strengths in action:

a 204*206 b 104+106 c 204*206 d 306^304 e 306^304 f 204/206 g

Based on the grabbing strengths,

• b gets pulled left because 206 > 104
• c gets pulled right because 106 < 204
• d gets pulled right because 206 < 306
• e gets pulled right because 304 < 306
• f gets pulled left because 304 > 204

(Hold on. You might be bored. Skip to non-associative operators if you want to skip the hum-drum rehash of precedence climbing.)

What happens next? Operators that successfully pull both of their neighbors inward will "win." We parenthesize those sub-expressions first:

(a * b) 104+106 c 204*206 d 306^304 (e ^ f) 204/206 g

And then we keep parenthesizing winners until we're done:

(a * b) 104+106 c 204*206 (d ^ (e ^ f)) 204/206 g

(a * b) 104+106 (c * (d ^ (e ^ f))) 204/206 g

(a * b) 104+106 ((c * (d ^ (e ^ f))) / g)

((a * b) + ((c * (d ^ (e ^ f))) / g)

Let's say we're writing a parser by hand – in other words, we consume and peek at input off an input stream. We define a function to parse expressions, from left-to-right, like this:

```enum Direction { Left, Right, Indeterminate };

Expr parse_expr(InputStream *p) {
return parse_expr_after_prec(p, 0);
}

// We're walking through a text file from left to right. prec is
// the right precedence of whatever operator we've already parsed,
// grabbing at the stuff we're about to parse, from the left side.
// This function consumes atoms and binary operators until
// it gets to a weaker operator.
Expr parse_expr_after_prec(InputStream *p, int prec) {
// First consume an "atom" -- something that isn't a binary operator.
Expr lhs = parse_atom(p);

continue_after_lhs:
// Look for the operator but don't consume it.
char op;
if (!try_peek_operator(p, &op)) {
// No operator, we're at the end of the expression.
return lhs;
}
int op_left_prec = left_precedence(op);
Direction dir = grab_direction(prec, op_left_prec);
switch (dir) {
case Left:
// The next operator grabs more weakly, so we abandon!
return lhs;
case Right: {
// (Consume the operator we peeked at.)
char op2 = parse_operator(p);
assert(op2 == op);

// op grabs more strongly.  So, first, we consume its
// right-hand expression, using its right precedence.
Expr rhs = parse_expr_after_prec(p, right_precedence(op));

// op grabbed more strongly than whatever's to the left of lhs,
// and it grabbed more strongly than whatever's to the right of
// rhs.    So we'd parenthesize (lhs op rhs), so-to-speak, and make
// that our new lhs.
lhs = build_binop_expr(lhs, op, rhs);
goto continue_after_lhs;
} break;
case Indeterminate:
// There's no ordering relation between prec and op_left_prec.
// Parse fails!
throw parse_error("sorry!");
}
}

Direction grab_direction(int lhs, int rhs) {
if (lhs > rhs) {
return Left;
} else if (lhs < rhs) {
return Right;
} else {
return Indeterminate;
}
}

int left_precedence(char op) {
switch (op) {
case '+': return 104;
case '-': return 104;
case '*': return 204;
case '/': return 204;
case '^': return 306;
default:
}
}

int right_precedence(char op) {
switch (op) {
case '+': return 106;
case '-': return 106;
case '*': return 206;
case '/': return 206;
case '^': return 304;
default:
}
}
```

The loop in parse_expr_after_prec is a standard technique for parsing operators. You can modify it to handle ternary operators, function call syntax (if it's not highest-precedence(??)) and generally, whatever you need. Unary operators should be given maximum precedence and be handled by parse_atom.

### Non-associative operators

We use grab_direction to compare grabbing strengths. This gives us the flexibility to define a partial ordering. Suppose we change its definition:

```Direction grab_direction(int x, int y) {
// Treats e.g. 105 as being incomparable to 104 and 106.
if (lhs > rhs + 1) {
return Left;
} else if (lhs + 1 < rhs) {
return Right;
} else {
return Indeterminate;
}
}
```

Suppose we add an operator "%" (the modulo operator) with left-precedence 205 and right-precedence 205. Then expressions like

a 204/206 b 205%205 c

are invalid. The reason is, ask yourself: which grabs b harder? It's a stalemate. Thus you're not allowed to mix modulo up with multiplication and division – you're forced to use parentheses.

In fact, I recommend the following choices, if you're in charge of the syntax:

```+, - : 104, 106
*    : 204, 206
/    : 204, 205
%    : 205, 205
^    : 306, 304
```

This prohibits expressions like a / b / c and a / b * c.

We're currently using numbers to represent grabbing strengths, and requiring the number to differ by at least 2 in order for an operator to "win." This defines a partial ordering. We could make a fancier partial ordering. Suppose we add the bitwise operators "&" and "|", and the equality operator, "==", to the language. We want to:

• prohibit mixing bitwise operators with any others, as in a & b | c and a & b + c
• permit associativity with the same operator, like a & b & c
• permit comparisons like a + b == c | d

We could define precedence levels this way:

```+, - : A-104, A-106
*    : A-204, A-206
/    : A-204, A-205
%    : A-205, A-205
^    : A-306, A-304  // exponentiation, not xor

&    : B-1, B-1
|    : C-1, C-1

==   : D-0, D-0
```

The partial ordering is that the A's worked as before, and A > D, B > D, and C > D. In code, like this:

```struct PrecLevel {
char letter;  // A, B, C, D
int number;
};

Direction grab_direction(PrecLevel lhs, PrecLevel rhs) {
switch (lhs.letter) {
case 'A':
switch (rhs.letter) {
case 'A':
if (lhs.number > rhs.number + 1) {
return Left;
} else if (lhs.number + 1 < rhs.number) {
return Right;
} else {
return Indeterminate;
}
case 'B': return Indeterminate;
case 'C': return Indeterminate;
case 'D': return Left;
}
case 'B':
switch (rhs.letter) {
case 'A': return Indeterminate;
case 'B': return Left;
case 'C': return Indeterminate;
case 'D': return Left;
}
case 'C':
switch (rhs.letter) {
case 'A': return Indeterminate;
case 'B': return Indeterminate;
case 'C': return Left;
case 'D': return Left;
}
case 'D':
switch (rhs.letter) {
case 'A': return Right;
case 'B': return Right;
case 'C': return Right;
case 'D': return Indeterminate;
}
}
}
```

And that's how you can make a language with better syntax.

(Do you want to know why the levels are 104, 106, 204, ... and not 14, 16, 24, ...? Ask a BASIC programmer.)

- Sam

(posted June 28 '16)

### Update:Table-Based Precedence

A simpler way to formulate operator precedence is with a table of size n2. For operators x and y, the table specifies whether the expression a x b y c pulls left or right. In other words, the table displays the value of grab_direction(prec_level(x), prec_level(y)).

+ - * / % ^ & | == L L R R R R L L L R R R R L L L L L R L L L R L L L R L L L L L L R L L L L L R R R R R R R R

(updated May 27 '20)