We've all heard about "order of operations," which tells us how to treat syntax like a * b + c * d ^ e ^ f / g as if it were (a * b) + ((c * (d ^ (e ^ f))) / g). For the sake of example we're using ^ for exponentiation, because that's right-associative.
We can specify precedence levels for binary operators by giving them numbers and telling whether they're left or right associative.
+, - : 10, left *, / : 20, left ^ : 30, right
The operators with higher precedence get their neighbors attached to them first, and ties are resolved based on left/right associativity. But here's what I do. Instead of saying "left" and "right," specify "left-precedence" and "right-precedence" values, or "grabbing-strengths," that say how hard an operator grabs at its neighbor.
+, - : 104, 106 *, / : 204, 206 ^ : 306, 304
We can see the grabbing strengths in action:
a 204*206 b 104+106 c 204*206 d 306^304 e 306^304 f 204/206 g
Based on the grabbing strengths,
(Hold on. You might be bored. Skip to non-associative operators if you want to skip the hum-drum rehash of precedence climbing.)
What happens next? Operators that successfully pull both of their neighbors inward will "win." We parenthesize those sub-expressions first:
(a * b) 104+106 c 204*206 d 306^304 (e ^ f) 204/206 g
And then we keep parenthesizing winners until we're done:
(a * b) 104+106 c 204*206 (d ^ (e ^ f)) 204/206 g
(a * b) 104+106 (c * (d ^ (e ^ f))) 204/206 g
(a * b) 104+106 ((c * (d ^ (e ^ f))) / g)
((a * b) + ((c * (d ^ (e ^ f))) / g)
Let's say we're writing a parser by hand – in other words, we consume and peek at input off an input stream. We define a function to parse expressions, from left-to-right, like this:
enum Direction { Left, Right, Indeterminate };
Expr parse_expr(InputStream *p) {
return parse_expr_after_prec(p, 0);
}
// We're walking through a text file from left to right. prec is
// the right precedence of whatever operator we've already parsed,
// grabbing at the stuff we're about to parse, from the left side.
// This function consumes atoms and binary operators until
// it gets to a weaker operator.
Expr parse_expr_after_prec(InputStream *p, int prec) {
// First consume an "atom" -- something that isn't a binary operator.
Expr lhs = parse_atom(p);
continue_after_lhs:
// Look for the operator but don't consume it.
char op;
if (!try_peek_operator(p, &op)) {
// No operator, we're at the end of the expression.
return lhs;
}
int op_left_prec = left_precedence(op);
Direction dir = grab_direction(prec, op_left_prec);
switch (dir) {
case Left:
// The next operator grabs more weakly, so we abandon!
return lhs;
case Right: {
// (Consume the operator we peeked at.)
char op2 = parse_operator(p);
assert(op2 == op);
// op grabs more strongly. So, first, we consume its
// right-hand expression, using its right precedence.
Expr rhs = parse_expr_after_prec(p, right_precedence(op));
// op grabbed more strongly than whatever's to the left of lhs,
// and it grabbed more strongly than whatever's to the right of
// rhs. So we'd parenthesize (lhs op rhs), so-to-speak, and make
// that our new lhs.
lhs = build_binop_expr(lhs, op, rhs);
goto continue_after_lhs;
} break;
case Indeterminate:
// There's no ordering relation between prec and op_left_prec.
// Parse fails!
throw parse_error("sorry!");
}
}
Direction grab_direction(int lhs, int rhs) {
if (lhs > rhs) {
return Left;
} else if (lhs < rhs) {
return Right;
} else {
return Indeterminate;
}
}
int left_precedence(char op) {
switch (op) {
case '+': return 104;
case '-': return 104;
case '*': return 204;
case '/': return 204;
case '^': return 306;
default:
throw logic_error("bad operator!");
}
}
int right_precedence(char op) {
switch (op) {
case '+': return 106;
case '-': return 106;
case '*': return 206;
case '/': return 206;
case '^': return 304;
default:
throw logic_error("bad operator!");
}
}
The loop in parse_expr_after_prec is a standard technique for parsing operators. You can modify it to handle ternary operators, function call syntax (if it's not highest-precedence(??)) and generally, whatever you need. Unary operators should be given maximum precedence and be handled by parse_atom.
We use grab_direction to compare grabbing strengths. This gives us the flexibility to define a partial ordering. Suppose we change its definition:
Direction grab_direction(int x, int y) {
// Treats e.g. 105 as being incomparable to 104 and 106.
if (lhs > rhs + 1) {
return Left;
} else if (lhs + 1 < rhs) {
return Right;
} else {
return Indeterminate;
}
}
Suppose we add an operator "%" (the modulo operator) with left-precedence 205 and right-precedence 205. Then expressions like
a 204/206 b 205%205 c
are invalid. The reason is, ask yourself: which grabs b harder? It's a stalemate. Thus you're not allowed to mix modulo up with multiplication and division – you're forced to use parentheses.
In fact, I recommend the following choices, if you're in charge of the syntax:
+, - : 104, 106 * : 204, 206 / : 204, 205 % : 205, 205 ^ : 306, 304
This prohibits expressions like a / b / c and a / b * c.
We're currently using numbers to represent grabbing strengths, and requiring the number to differ by at least 2 in order for an operator to "win." This defines a partial ordering. We could make a fancier partial ordering. Suppose we add the bitwise operators "&" and "|", and the equality operator, "==", to the language. We want to:
We could define precedence levels this way:
+, - : A-104, A-106 * : A-204, A-206 / : A-204, A-205 % : A-205, A-205 ^ : A-306, A-304 // exponentiation, not xor & : B-1, B-1 | : C-1, C-1 == : D-0, D-0
The partial ordering is that the A's worked as before, and A > D, B > D, and C > D. In code, like this:
struct PrecLevel {
char letter; // A, B, C, D
int number;
};
Direction grab_direction(PrecLevel lhs, PrecLevel rhs) {
switch (lhs.letter) {
case 'A':
switch (rhs.letter) {
case 'A':
if (lhs.number > rhs.number + 1) {
return Left;
} else if (lhs.number + 1 < rhs.number) {
return Right;
} else {
return Indeterminate;
}
case 'B': return Indeterminate;
case 'C': return Indeterminate;
case 'D': return Left;
}
case 'B':
switch (rhs.letter) {
case 'A': return Indeterminate;
case 'B': return Left;
case 'C': return Indeterminate;
case 'D': return Left;
}
case 'C':
switch (rhs.letter) {
case 'A': return Indeterminate;
case 'B': return Indeterminate;
case 'C': return Left;
case 'D': return Left;
}
case 'D':
switch (rhs.letter) {
case 'A': return Right;
case 'B': return Right;
case 'C': return Right;
case 'D': return Indeterminate;
}
}
}
And that's how you can make a language with better syntax.
(Do you want to know why the levels are 104, 106, 204, ... and not 14, 16, 24, ...? Ask a BASIC programmer.)
- Sam
(posted June 28 '16)
A simpler way to formulate operator precedence is with a table of size n2. For operators x and y, the table specifies whether the expression a x b y c pulls left or right. In other words, the table displays the value of grab_direction(prec_level(x), prec_level(y)).
| + | - | * | / | % | ^ | & | | | == | |
|---|---|---|---|---|---|---|---|---|---|
| + | L | L | R | R | R | R | L | ||
| - | L | L | R | R | R | R | L | ||
| * | L | L | L | L | R | L | |||
| / | L | L | R | L | |||||
| % | L | L | R | L | |||||
| ^ | L | L | L | L | L | R | L | ||
| | | L | L | |||||||
| & | L | L | |||||||
| == | R | R | R | R | R | R | R | R |
(updated May 27 '20)