We've all heard about "order of operations," which tells us how to treat syntax like `a * b + c * d ^ e ^ f / g` as if it were `(a * b) + ((c * (d ^ (e ^ f))) / g)`. For the sake of example we're using `^` for exponentiation, because that's right-associative.

We can specify precedence levels for binary operators by giving them numbers and telling whether they're left or right associative.

+, - : 10, left *, / : 20, left ^ : 30, right

The operators with higher precedence get their neighbors attached to them first, and ties are resolved based on left/right associativity. But here's what I do. Instead of saying "left" and "right," specify "left-precedence" and "right-precedence" values, or "grabbing-strengths," that say how hard an operator grabs at its neighbor.

+, - : 104, 106 *, / : 204, 206 ^ : 306, 304

We can see the grabbing strengths in action:

a 204*206 b 104+106 c 204*206 d 306^304 e 306^304 f 204/206 g

Based on the grabbing strengths,

**b**gets pulled*left*because 206 > 104**c**gets pulled*right*because 106 < 204**d**gets pulled*right*because 206 < 306**e**gets pulled*right*because 304 < 306**f**gets pulled*left*because 304 > 204

(Hold on. You might be bored. Skip to non-associative operators if you want to skip the hum-drum rehash of precedence climbing.)

What happens next? Operators that successfully pull *both* of their neighbors inward will "win." We parenthesize those sub-expressions first:

(a * b) 104+106 c 204*206 d 306^304 (e ^ f) 204/206 g

And then we keep parenthesizing winners until we're done:

(a * b) 104+106 c 204*206 (d ^ (e ^ f)) 204/206 g

(a * b) 104+106 (c * (d ^ (e ^ f))) 204/206 g

(a * b) 104+106 ((c * (d ^ (e ^ f))) / g)

((a * b) + ((c * (d ^ (e ^ f))) / g)

Let's say we're writing a parser by hand – in other words, we consume and peek at input off an input stream. We define a function to parse expressions, from left-to-right, like this:

enum Direction { Left, Right, Indeterminate }; Expr parse_expr(InputStream *p) { return parse_expr_after_prec(p, 0); } // We're walking through a text file from left to right. prec is // the right precedence of whatever operator we've already parsed, // grabbing at the stuff we're about to parse, from the left side. // This function consumes atoms and binary operatorsuntil// it gets to a weaker operator. Expr parse_expr_after_prec(InputStream *p, int prec) { // First consume an "atom" -- something that isn't a binary operator. Expr lhs = parse_atom(p); continue_after_lhs: // Look for the operator but don't consume it. char op; if (!try_peek_operator(p, &op)) { // No operator, we're at the end of the expression. return lhs; } int op_left_prec = left_precedence(op); Direction dir = grab_direction(prec, op_left_prec); switch (dir) { case Left: // The next operator grabs more weakly, so we abandon! return lhs; case Right: { // (Consume the operator we peeked at.) char op2 = parse_operator(p); assert(op2 == op); // op grabs more strongly. So, first, we consumeits// right-hand expression, using itsrightprecedence. Expr rhs = parse_expr_after_prec(p, right_precedence(op)); // op grabbed more strongly than whatever's to the left of lhs, // and it grabbed more strongly than whatever's to the right of // rhs. So we'd parenthesize (lhs op rhs), so-to-speak, and make // that our new lhs. lhs = build_binop_expr(lhs, op, rhs); goto continue_after_lhs; } break; case Indeterminate: // There's no ordering relation between prec and op_left_prec. // Parse fails! throw parse_error("sorry!"); } } Direction grab_direction(int lhs, int rhs) { if (lhs > rhs) { return Left; } else if (lhs < rhs) { return Right; } else { return Indeterminate; } } int left_precedence(char op) { switch (op) { case '+': return 104; case '-': return 104; case '*': return 204; case '/': return 204; case '^': return 306; default: throw logic_error("bad operator!"); } } int right_precedence(char op) { switch (op) { case '+': return 106; case '-': return 106; case '*': return 206; case '/': return 206; case '^': return 304; default: throw logic_error("bad operator!"); } }

The loop in `parse_expr_after_prec` is a standard technique for parsing operators. You can modify it to handle ternary operators, function call syntax (if it's not highest-precedence(??)) and generally, whatever you need. Unary operators should be given maximum precedence and be handled by `parse_atom`.

We use `grab_direction` to compare grabbing strengths. This gives us the flexibility to define a partial ordering. Suppose we change its definition:

Direction grab_direction(int x, int y) { // Treats e.g. 105 as being incomparable to 104 and 106. if (lhs > rhs + 1) { return Left; } else if (lhs + 1 < rhs) { return Right; } else { return Indeterminate; } }

Suppose we add an operator "`%`" (the modulo operator) with left-precedence 205 and right-precedence 205. Then expressions like

a 204/206 b 205%205 c

are invalid. The reason is, ask yourself: which grabs **b** harder? It's a stalemate. Thus you're not allowed to mix modulo up with multiplication and division – you're forced to use parentheses.

In fact, I recommend the following choices, if you're in charge of the syntax:

+, - : 104, 106 * : 204, 206 / : 204, 205 % : 205, 205 ^ : 306, 304

This prohibits expressions like `a / b / c` and `a / b * c`.

We're currently using numbers to represent grabbing strengths, and requiring the number to differ by at least 2 in order for an operator to "win." This defines a partial ordering. We could make a fancier partial ordering. Suppose we add the bitwise operators "`&`" and "`|`", and the equality operator, "`==`", to the language. We want to:

- prohibit mixing bitwise operators with any others, as in
`a & b | c`and`a & b + c` - permit associativity with the same operator, like
`a & b & c` - permit comparisons like
`a + b == c | d`

We could define precedence levels this way:

+, - : A-104, A-106 * : A-204, A-206 / : A-204, A-205 % : A-205, A-205 ^ : A-306, A-304 // exponentiation, not xor & : B-1, B-1 | : C-1, C-1 == : D-0, D-0

The partial ordering is that the A's worked as before, and A > D, B > D, and C > D. In code, like this:

struct PrecLevel { char letter; // A, B, C, D int number; }; Direction grab_direction(PrecLevel lhs, PrecLevel rhs) { switch (lhs.letter) { case 'A': switch (rhs.letter) { case 'A': if (lhs.number > rhs.number + 1) { return Left; } else if (lhs.number + 1 < rhs.number) { return Right; } else { return Indeterminate; } case 'B': return Indeterminate; case 'C': return Indeterminate; case 'D': return Left; } case 'B': switch (rhs.letter) { case 'A': return Indeterminate; case 'B': return Left; case 'C': return Indeterminate; case 'D': return Left; } case 'C': switch (rhs.letter) { case 'A': return Indeterminate; case 'B': return Indeterminate; case 'C': return Left; case 'D': return Left; } case 'D': switch (rhs.letter) { case 'A': return Right; case 'B': return Right; case 'C': return Right; case 'D': return Indeterminate; } } }

And that's how you can make a language with better syntax.

(Do you want to know why the levels are 104, 106, 204, ... and not 14, 16, 24, ...? Ask a BASIC programmer.)

- Sam

(posted June 28 '16)