We've all heard about "order of operations," which tells us how to treat syntax like a * b + c * d ^ e ^ f / g as if it were (a * b) + ((c * (d ^ (e ^ f))) / g). For the sake of example we're using ^ for exponentiation, because that's right-associative.
We can specify precedence levels for binary operators by giving them numbers and telling whether they're left or right associative.
+, - : 10, left *, / : 20, left ^ : 30, right
The operators with higher precedence get their neighbors attached to them first, and ties are resolved based on left/right associativity. But here's what I do. Instead of saying "left" and "right," specify "left-precedence" and "right-precedence" values, or "grabbing-strengths," that say how hard an operator grabs at its neighbor.
+, - : 104, 106 *, / : 204, 206 ^ : 306, 304
We can see the grabbing strengths in action:
a 204*206 b 104+106 c 204*206 d 306^304 e 306^304 f 204/206 g
Based on the grabbing strengths,
(Hold on. You might be bored. Skip to non-associative operators if you want to skip the hum-drum rehash of precedence climbing.)
What happens next? Operators that successfully pull both of their neighbors inward will "win." We parenthesize those sub-expressions first:
(a * b) 104+106 c 204*206 d 306^304 (e ^ f) 204/206 g
And then we keep parenthesizing winners until we're done:
(a * b) 104+106 c 204*206 (d ^ (e ^ f)) 204/206 g
(a * b) 104+106 (c * (d ^ (e ^ f))) 204/206 g
(a * b) 104+106 ((c * (d ^ (e ^ f))) / g)
((a * b) + ((c * (d ^ (e ^ f))) / g)
Let's say we're writing a parser by hand – in other words, we consume and peek at input off an input stream. We define a function to parse expressions, from left-to-right, like this:
enum Direction { Left, Right, Indeterminate }; Expr parse_expr(InputStream *p) { return parse_expr_after_prec(p, 0); } // We're walking through a text file from left to right. prec is // the right precedence of whatever operator we've already parsed, // grabbing at the stuff we're about to parse, from the left side. // This function consumes atoms and binary operators until // it gets to a weaker operator. Expr parse_expr_after_prec(InputStream *p, int prec) { // First consume an "atom" -- something that isn't a binary operator. Expr lhs = parse_atom(p); continue_after_lhs: // Look for the operator but don't consume it. char op; if (!try_peek_operator(p, &op)) { // No operator, we're at the end of the expression. return lhs; } int op_left_prec = left_precedence(op); Direction dir = grab_direction(prec, op_left_prec); switch (dir) { case Left: // The next operator grabs more weakly, so we abandon! return lhs; case Right: { // (Consume the operator we peeked at.) char op2 = parse_operator(p); assert(op2 == op); // op grabs more strongly. So, first, we consume its // right-hand expression, using its right precedence. Expr rhs = parse_expr_after_prec(p, right_precedence(op)); // op grabbed more strongly than whatever's to the left of lhs, // and it grabbed more strongly than whatever's to the right of // rhs. So we'd parenthesize (lhs op rhs), so-to-speak, and make // that our new lhs. lhs = build_binop_expr(lhs, op, rhs); goto continue_after_lhs; } break; case Indeterminate: // There's no ordering relation between prec and op_left_prec. // Parse fails! throw parse_error("sorry!"); } } Direction grab_direction(int lhs, int rhs) { if (lhs > rhs) { return Left; } else if (lhs < rhs) { return Right; } else { return Indeterminate; } } int left_precedence(char op) { switch (op) { case '+': return 104; case '-': return 104; case '*': return 204; case '/': return 204; case '^': return 306; default: throw logic_error("bad operator!"); } } int right_precedence(char op) { switch (op) { case '+': return 106; case '-': return 106; case '*': return 206; case '/': return 206; case '^': return 304; default: throw logic_error("bad operator!"); } }
The loop in parse_expr_after_prec is a standard technique for parsing operators. You can modify it to handle ternary operators, function call syntax (if it's not highest-precedence(??)) and generally, whatever you need. Unary operators should be given maximum precedence and be handled by parse_atom.
We use grab_direction to compare grabbing strengths. This gives us the flexibility to define a partial ordering. Suppose we change its definition:
Direction grab_direction(int x, int y) { // Treats e.g. 105 as being incomparable to 104 and 106. if (lhs > rhs + 1) { return Left; } else if (lhs + 1 < rhs) { return Right; } else { return Indeterminate; } }
Suppose we add an operator "%" (the modulo operator) with left-precedence 205 and right-precedence 205. Then expressions like
a 204/206 b 205%205 c
are invalid. The reason is, ask yourself: which grabs b harder? It's a stalemate. Thus you're not allowed to mix modulo up with multiplication and division – you're forced to use parentheses.
In fact, I recommend the following choices, if you're in charge of the syntax:
+, - : 104, 106 * : 204, 206 / : 204, 205 % : 205, 205 ^ : 306, 304
This prohibits expressions like a / b / c and a / b * c.
We're currently using numbers to represent grabbing strengths, and requiring the number to differ by at least 2 in order for an operator to "win." This defines a partial ordering. We could make a fancier partial ordering. Suppose we add the bitwise operators "&" and "|", and the equality operator, "==", to the language. We want to:
We could define precedence levels this way:
+, - : A-104, A-106 * : A-204, A-206 / : A-204, A-205 % : A-205, A-205 ^ : A-306, A-304 // exponentiation, not xor & : B-1, B-1 | : C-1, C-1 == : D-0, D-0
The partial ordering is that the A's worked as before, and A > D, B > D, and C > D. In code, like this:
struct PrecLevel { char letter; // A, B, C, D int number; }; Direction grab_direction(PrecLevel lhs, PrecLevel rhs) { switch (lhs.letter) { case 'A': switch (rhs.letter) { case 'A': if (lhs.number > rhs.number + 1) { return Left; } else if (lhs.number + 1 < rhs.number) { return Right; } else { return Indeterminate; } case 'B': return Indeterminate; case 'C': return Indeterminate; case 'D': return Left; } case 'B': switch (rhs.letter) { case 'A': return Indeterminate; case 'B': return Left; case 'C': return Indeterminate; case 'D': return Left; } case 'C': switch (rhs.letter) { case 'A': return Indeterminate; case 'B': return Indeterminate; case 'C': return Left; case 'D': return Left; } case 'D': switch (rhs.letter) { case 'A': return Right; case 'B': return Right; case 'C': return Right; case 'D': return Indeterminate; } } }
And that's how you can make a language with better syntax.
(Do you want to know why the levels are 104, 106, 204, ... and not 14, 16, 24, ...? Ask a BASIC programmer.)
- Sam
(posted June 28 '16)
A simpler way to formulate operator precedence is with a table of size n2. For operators x and y, the table specifies whether the expression a x b y c pulls left or right. In other words, the table displays the value of grab_direction(prec_level(x), prec_level(y)).
+ | - | * | / | % | ^ | & | | | == | |
---|---|---|---|---|---|---|---|---|---|
+ | L | L | R | R | R | R | L | ||
- | L | L | R | R | R | R | L | ||
* | L | L | L | L | R | L | |||
/ | L | L | R | L | |||||
% | L | L | R | L | |||||
^ | L | L | L | L | L | R | L | ||
| | L | L | |||||||
& | L | L | |||||||
== | R | R | R | R | R | R | R | R |
(updated May 27 '20)