diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index b72e5e420352a..ca0f051e8fc75 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -54,6 +54,16 @@ repos:
- ''
- '// '
- ''
+ - --custom_format
+ - '\.l$'
+ - '/*'
+ - ''
+ - '*/'
+ - --custom_format
+ - '\.y$'
+ - '/*'
+ - ''
+ - '*/'
exclude: |
(?x)^(
.bazelversion|
@@ -61,6 +71,7 @@ repos:
third_party/examples/.*/compile_flags.carbon.txt|
website/(firebase/.firebaserc|jekyll/(Gemfile.lock|theme/.*))|
.*\.def|
+ .*\.svg|
.*/testdata/.*\.golden
)$
- id: check-google-doc-style
diff --git a/proposals/README.md b/proposals/README.md
index 3c6fd6583968d..b8a348b6147e6 100644
--- a/proposals/README.md
+++ b/proposals/README.md
@@ -56,5 +56,6 @@ request:
- [0444 - GitHub Discussions](p0444.md)
- [0447 - Generics terminology](p0447.md)
- [0538 - `return` with no argument](p0538.md)
+- [0555 - Operator precedence](p0555.md)
diff --git a/proposals/p0555.md b/proposals/p0555.md
new file mode 100644
index 0000000000000..d94b85e094464
--- /dev/null
+++ b/proposals/p0555.md
@@ -0,0 +1,308 @@
+# Operator precedence
+
+
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/555)
+
+
+
+## Table of contents
+
+- [Problem](#problem)
+- [Background](#background)
+- [Proposal](#proposal)
+- [Details](#details)
+  - [Notational convention](#notational-convention)
+  - [When to add precedence edges](#when-to-add-precedence-edges)
+  - [Parsing with a partial precedence order](#parsing-with-a-partial-precedence-order)
+- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
+- [Alternatives considered](#alternatives-considered)
+  - [Total order](#total-order)
+  - [Different precedence for different operands](#different-precedence-for-different-operands)
+  - [Require less than a partial order](#require-less-than-a-partial-order)
+
+
+
+## Problem
+
+Most expression-oriented languages use a strict hierarchy of precedence levels.
+That approach is error-prone, because it assigns meaning to programs that
+developers may not understand, or may misunderstand.
+
+## Background
+
+Given an expression, we need to be able to infer its structure: what are the
+operands of each of the operators? This may be ambiguous in the absence of rules
+that determine which operator is preferred, such as in the expression
+`a $ b ^ c`: is this `(a $ b) ^ c` or `a $ (b ^ c)`?
+
+Starting with a sequence of operators and non-operator terms, we can completely
+determine the structure of an expression by determining which operator in our
+sequence will be the root of the parse tree, splitting the expression at that
+point, and recursively determining the structure of each subexpression. The
+operator that forms the root of the parse tree is said to have the lowest
+precedence in the expression.
+
+Traditionally, this is accomplished by assigning a precedence level to each
+operator and devising a total ordering over precedence levels. For example, we
+could assign a higher precedence level to an infix `*` operator than an infix
+`+` operator. With that choice of precedence levels, an infix `*` operator would
+bind tighter than an infix `+` operator, regardless of the order in which they
+appear.
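+
+The recursive splitting described above can be sketched in Python for a flat
+sequence of operands and binary operators under a total order. The levels in
+`LEVELS`, the `split_parse` helper, and the choice to treat every operator as
+left-associative are illustrative assumptions, not Carbon's rules.
+
+```python
+# Illustrative precedence levels (assumed): a higher number binds tighter.
+LEVELS = {"+": 1, "*": 2}
+
+
+def split_parse(tokens):
+    """Parse a flat token list such as ["a", "+", "b", "*", "c"] into tuples.
+
+    Operands sit at even indices and binary operators at odd indices; every
+    operator is treated as left-associative.
+    """
+    if len(tokens) == 1:
+        return tokens[0]
+    # The root is the operator with the lowest precedence level. Among ties,
+    # choose the rightmost one so that equal operators group to the left.
+    root = min(range(1, len(tokens), 2), key=lambda i: (LEVELS[tokens[i]], -i))
+    return (
+        tokens[root],
+        split_parse(tokens[:root]),
+        split_parse(tokens[root + 1 :]),
+    )
+
+
+print(split_parse("a + b * c".split()))  # ('+', 'a', ('*', 'b', 'c'))
+print(split_parse("a * b + c".split()))  # ('+', ('*', 'a', 'b'), 'c')
+print(split_parse("a + b + c".split()))  # ('+', ('+', 'a', 'b'), 'c')
+```
+
+Both `a + b * c` and `a * b + c` get `+` at the root: the result depends only
+on the precedence levels, not on the order in which the operators appear.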
+
+This approach is well-understood, but is problematic. For example, in C++,
+expressions such as `a & b << c * 3` are valid, but the meaning of such an
+expression is unlikely to be readily apparent to many developers. Worse, for
+cases such as `a & 3 == 3`, there is a clear intended meaning, namely
+`(a & 3) == 3`, but the actual meaning is something else -- in this case,
+`a & (3 == 3)`.
+
+Because the precedence rules are not widely known and are sometimes quite
+surprising, parentheses are used as a matter of course for certain kinds of C++
+expressions. However, the absence of such parentheses is not diagnosed in all
+cases, even by many linting tools, and forgetting those parentheses can lead to
+subtle bugs.
+
+## Proposal
+
+Do not have a total ordering of precedence levels. Instead, define a partial
+ordering of precedence levels. Expressions using operators that lack relative
+orderings must be disambiguated by the developer, for example by adding
+parentheses; when a program's meaning depends on an undefined relative ordering
+of two operators, it will be rejected due to ambiguity.
+
+The default behavior for any new operator is for it to be unordered with respect
+to all other operators, thereby requiring parentheses when combining that
+operator with any other operator. Precedence rules should be added only if it is
+reasonable to expect most or all developers who regularly use Carbon to reliably
+remember the precedence rule.
+
+## Details
+
+### Notational convention
+
+For pedagogical purposes, our documentation will use
+[Hasse diagrams](https://en.wikipedia.org/wiki/Hasse_diagram) to represent
+operator precedence partial orders, where operators with lower precedence are
+considered less than (and therefore depicted lower than and connected to)
+operators with higher precedence. In our diagrams, an enclosing arrow will be
+used to show associativity within a precedence group, if any: a left-to-right
+arrow means a left-associative operator, and a right-to-left arrow means a
+right-associative operator.
+
+For example:
+
+![Example operator precedence diagram](p0555/example.svg)
+
+... would depict a higher-precedence `*` operator and a lower-precedence `+`
+operator, both of which are left-associative, and a non-associative `<<`
+operator. The `==` operator is lower precedence than all of those operators, and
+parentheses are higher precedence than all of those operators.
+
+With those precedence rules:
+
+- `a + b * c` would be parsed as `a + (b * c)`, because `+` has lower
+ precedence than `*`.
+- `a + b << c` would be an error, requiring parentheses, because the
+ precedence levels of `+` and `<<` are unordered.
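+
+To make the example order concrete, the following Python sketch encodes the
+diagram's edges as (lower, higher) pairs, takes their transitive closure, and
+reports whether two operators are ordered. The names `HASSE_EDGES`,
+`transitive_closure`, and `compare` are illustrative only.
+
+```python
+# Edges of the example diagram as (lower, higher) pairs. "(...)" stands for
+# parenthesized expressions, which bind tighter than every operator.
+HASSE_EDGES = {
+    ("==", "+"),
+    ("==", "<<"),
+    ("+", "*"),
+    ("*", "(...)"),
+    ("<<", "(...)"),
+}
+
+
+def transitive_closure(edges):
+    """Return every (lower, higher) pair implied by the diagram's edges."""
+    closure = set(edges)
+    while True:
+        implied = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
+        if implied <= closure:
+            return closure
+        closure |= implied
+
+
+LOWER = transitive_closure(HASSE_EDGES)
+
+
+def compare(x, y):
+    """Return "<", ">", or "=" when x and y are ordered, else None."""
+    if x == y:
+        return "="
+    if (x, y) in LOWER:
+        return "<"
+    if (y, x) in LOWER:
+        return ">"
+    return None
+
+
+print(compare("+", "*"))   # prints <: `a + b * c` means `a + (b * c)`
+print(compare("+", "<<"))  # prints None: `a + b << c` needs parentheses
+print(compare("==", "*"))  # prints <, by transitivity through `+`
+```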
+
+A [Python script](p0555/figures.py) to generate these diagrams is included with
+this proposal.
+
+### When to add precedence edges
+
+Given a program whose meaning is ambiguous to a reader, it is preferable to
+reject the program rather than to arbitrarily pick a meaning. For Carbon's
+operators, we should only add an ordering between two operators if there is a
+logical reason for that ordering, not merely to provide _some_ answer. **Goal:
+for every combination of operators, either it should be reasonable to expect
+most or all developers who regularly use Carbon to reliably remember the
+precedence, or there should not be a precedence rule.**
+
+As an example, consider the expression `a * b ^ c`, where `*` is assumed to be a
+multiplication operator and `^` is assumed to be a bitwise XOR operator. We
+should reject this expression because there is no logical reason to apply
+either operator first, and it would be unreasonable to expect Carbon developers
+to remember an arbitrary tie-breaker between the two options.
+
+This still leaves open the question of how high a bar of knowledge we put on our
+developers (what is reasonable for us to expect?). We can use experience from
+C++ to direct this decision: just as many developers who regularly use C++ do
+not remember the relative precedence of `&&` vs `||`, and `&` vs `|`, and `&` vs
+`<<`, and so on, we shouldn't expect them to remember similar precedence rules
+in Carbon. If we are in doubt, omitting a precedence rule and waiting for
+real-world experience should be preferred.
+
+### Parsing with a partial precedence order
+
+A traditional, totally-ordered precedence scheme can be implemented by an
+[operator precedence parser](https://en.wikipedia.org/wiki/Operator-precedence_parser):
+
+- Keep track of the current left-hand-side operand and an ambient precedence
+  level. The ambient precedence level is the precedence of the operator whose
+  operand is being parsed, or a placeholder "lowest" precedence level when
+  parsing an expression that is not the operand of an operator.
+- When a new operator is encountered, its precedence is compared to the
+  ambient precedence level:
+  - If its precedence is higher than the ambient precedence level, then
+    recurse ("shift") with that as the new ambient precedence level to form
+    the right-hand side of the new operator. After forming the right-hand
+    side, build an operator expression from the current left-hand-side
+    operand and the right-hand-side operand; that is the new current
+    left-hand side.
+  - If its precedence is equal to the ambient precedence level, then use the
+    associativity of that precedence level to determine what to do:
+    - If the operator is left-associative, return the expression formed so
+      far; it is the complete right-hand operand of the earlier operator,
+      whose operator expression is therefore built first.
+    - If the operator is right-associative, recurse, as in the
+      higher-precedence case.
+    - If the operator is non-associative, produce an error.
+  - If its precedence is lower than the ambient precedence level, return the
+    expression formed so far; it's the complete operand to an earlier
+    operator.
+
+This is, for example, the strategy
+[currently used in Clang](https://github.com/llvm/llvm-project/blob/5f0903e9bec97e67bf34d887bcbe9d05790de934/clang/lib/Parse/ParseExpr.cpp#L396).
+
+The above algorithm is only suited to parsing in the case where precedence
+levels are totally ordered, because it does not say what to do if the new
+precedence is not comparable with the ambient precedence. However, the algorithm
+can easily be adapted to also parse with a partial precedence order by adding
+one more case:
+
+- If the precedence level of the new operator is not comparable with the
+ ambient precedence level, produce an ambiguity error.
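+
+A minimal Python sketch of the adapted parser follows. It assumes
+whitespace-separated tokens, binary infix operators plus parentheses, and the
+precedence order and associativity of the example above (`+` and `*`
+left-associative, `<<` and `==` non-associative). `Parser`, `relation`, and
+`ParseError` are illustrative names, not the Carbon toolchain's code. Under a
+total order the `"incomparable"` branch is never taken, and the code reduces to
+the traditional algorithm.
+
+```python
+class ParseError(Exception):
+    pass
+
+
+# Example order as (lower, higher) pairs, transitively closed by hand.
+LOWER = {("==", "+"), ("==", "*"), ("==", "<<"), ("+", "*")}
+ASSOC = {"+": "left", "*": "left", "<<": "none", "==": "none"}
+
+
+def relation(new, ambient):
+    """Compare the new operator with the ambient level (None means lowest)."""
+    if ambient is None or (ambient, new) in LOWER:
+        return "higher"
+    if new == ambient:
+        return "equal"
+    if (new, ambient) in LOWER:
+        return "lower"
+    return "incomparable"
+
+
+class Parser:
+    def __init__(self, text):
+        self.tokens = text.split()
+        self.pos = 0
+
+    def peek(self):
+        return self.tokens[self.pos] if self.pos < len(self.tokens) else None
+
+    def advance(self):
+        token = self.peek()
+        self.pos += 1
+        return token
+
+    def primary(self):
+        token = self.advance()
+        if token == "(":
+            inner = self.expression(None)
+            if self.advance() != ")":
+                raise ParseError("expected ')'")
+            return inner
+        if token is None or token in ASSOC or token == ")":
+            raise ParseError("expected an operand, got %r" % token)
+        return token
+
+    def expression(self, ambient):
+        lhs = self.primary()
+        while self.peek() in ASSOC:
+            op = self.peek()
+            rel = relation(op, ambient)
+            if rel == "incomparable":
+                raise ParseError("%r and %r are unordered" % (ambient, op))
+            if rel == "lower":
+                break  # lhs is the complete operand of an earlier operator.
+            if rel == "equal":
+                if ASSOC[op] == "none":
+                    raise ParseError("%r is non-associative" % op)
+                if ASSOC[op] == "left":
+                    break  # The earlier operator takes lhs first.
+                # Right-associative: fall through and recurse.
+            self.advance()
+            lhs = (op, lhs, self.expression(op))
+        return lhs
+
+
+def parse(text):
+    parser = Parser(text)
+    tree = parser.expression(None)
+    if parser.peek() is not None:
+        raise ParseError("unexpected %r" % parser.peek())
+    return tree
+
+
+print(parse("a + b * c"))       # ('+', 'a', ('*', 'b', 'c'))
+print(parse("( a + b ) << c"))  # ('<<', ('+', 'a', 'b'), 'c')
+for bad in ["a + b << c", "a << b << c", "a * b == c == d"]:
+    try:
+        parse(bad)
+    except ParseError as e:
+        print("rejected:", bad, "-", e)
+```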
+
+The key observation here is that, if we ever see `... a * b ^ c ...`, where `*`
+and `^` have incomparable precedence, no later tokens can ever resolve the
+ambiguity, so we can diagnose it immediately. Sketch proof: If there were a
+valid parse tree for this expression, one of `*` and `^` must end up as an
+ancestor of the other; otherwise their lowest common ancestor would be an
+operator lying between them, but only the operand `b` lies between them. In a
+valid parse tree, precedences never decrease along the path from an ancestor
+operator to a descendant operator, so by transitivity of the precedence partial
+ordering, the ancestor operator has lower or equal precedence compared to the
+descendant operator. That contradicts the assumption that `*` and `^` are
+incomparable.
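+
+The observation can also be checked by brute force on small inputs. The Python
+sketch below, an illustrative aid rather than part of the proposal, counts the
+parse trees of a flat expression in which each operator's unparenthesized
+operands have higher precedence, or equal precedence on the side its
+associativity allows. With `^` assumed to be unordered with respect to `+` and
+`*`, any expression containing `a * b ^ c` has no valid tree, whatever
+surrounds it.
+
+```python
+from functools import lru_cache
+
+# Assumed order for illustration: "+" < "*", while "^" is unordered with
+# respect to both. All three operators are left-associative.
+LOWER = {("+", "*")}
+ASSOC = {"+": "left", "*": "left", "^": "left"}
+
+
+def allowed(parent, child, side):
+    """Can an unparenthesized `child` operator be the `side` operand of `parent`?"""
+    if (parent, child) in LOWER:
+        return True
+    return parent == child and ASSOC[parent] == side
+
+
+def count_parses(text):
+    """Count the valid parse trees of a flat, parenthesis-free expression."""
+    toks = text.split()
+
+    @lru_cache(maxsize=None)
+    def trees(lo, hi):
+        # Map each possible root operator of toks[lo:hi] to its tree count;
+        # a lone operand is recorded under the root None.
+        if hi - lo == 1:
+            return {None: 1}
+        counts = {}
+        for i in range(lo + 1, hi, 2):
+            op = toks[i]
+            for lroot, lcount in trees(lo, i).items():
+                if lroot is not None and not allowed(op, lroot, "left"):
+                    continue
+                for rroot, rcount in trees(i + 1, hi).items():
+                    if rroot is None or allowed(op, rroot, "right"):
+                        counts[op] = counts.get(op, 0) + lcount * rcount
+        return counts
+
+    return sum(trees(0, len(toks)).values())
+
+
+for text in ["a + b * c", "a * b ^ c", "a * b ^ c + d", "x + a * b ^ c"]:
+    print(text, "->", count_parses(text))
+# a + b * c -> 1
+# a * b ^ c -> 0
+# a * b ^ c + d -> 0
+# x + a * b ^ c -> 0
+```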
+
+An operator precedence parser with a partial ordering of precedence levels
+[has been implemented](https://github.com/carbon-language/carbon-lang/commit/b8afadb3c6af5e68192d585232fee759180ea1e3)
+as a proof-of-concept in the Carbon toolchain.
+
+Operator precedence partial ordering can also be implemented in yacc / bison
+parser generators by using a variant of the
+[precedence climbing method](https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method).
+For example, here is a yacc grammar for the Hasse diagram shown above:
+
+```
+expression: compare_expression | compare_operand;
+
+compare_expression: compare_operand EQEQ compare_operand { $$ = ($1 == $3); };
+compare_operand: add_expression | multiply_expression | shift_expression | primary_expression;
+
+add_expression: add_lhs '+' add_operand { $$ = ($1 + $3); };
+add_lhs: add_expression | add_operand;
+add_operand: multiply_expression | multiply_operand;
+
+multiply_expression: multiply_lhs '*' multiply_operand { $$ = ($1 * $3); };
+multiply_lhs: multiply_expression | multiply_operand;
+multiply_operand: primary_expression;
+
+shift_expression: shift_operand LSH shift_operand { $$ = ($1 << $3); };
+shift_operand: primary_expression;
+shift_operand: primary_expression;
+
+primary_expression: INT | '(' expression ')' { $$ = $2; };
+```
+
+Note that some care must be taken to avoid grammar ambiguities. Under the
+precedence climbing method, a `primary_expression` would be a
+`shift_expression`, a `multiply_expression`, and an `add_expression`, and
+therefore interpreting a `primary_expression` as an `expression` would be
+ambiguous: we could take either the `shift_expression` path or the
+`multiply_expression` path through the grammar. The above formulation avoids
+this ambiguity by requiring every `add_expression`, `multiply_expression`, and
+`shift_expression` to contain its operator, so that none of them can be a bare
+`primary_expression`, and instead listing `primary_expression` as a distinct
+production for `compare_operand`. A yacc grammar such as the above can be
+produced systematically for any precedence partial ordering.
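+
+One way to do that is sketched below in Python. It follows a slightly flatter
+variant of the pattern above: each level gets an `X_expression` nonterminal
+that always contains its operator, and an `X_operand` that lists the
+expressions of every strictly higher level plus `primary_expression`;
+associative levels also accept an expression of the same level on one side.
+The `yacc_rules` helper and its input format are hypothetical, the transitive
+closure is assumed to be supplied by the caller, and the sketch does not check
+for LALR(1) conflicts, so its output should still be run through bison.
+
+```python
+def yacc_rules(levels, lower_than):
+    """Emit stratified yacc rules for a precedence partial order.
+
+    levels: maps a level name to (token, assoc, c_operator), where assoc is
+        "left", "right", or "none".
+    lower_than: set of (lower, higher) level-name pairs, transitively closed.
+    """
+    names = list(levels)
+    alternatives = [n + "_expression" for n in names] + ["primary_expression"]
+    rules = ["expression: %s;" % " | ".join(alternatives)]
+    for name, (token, assoc, c_op) in levels.items():
+        lhs = rhs = name + "_operand"
+        # An associative side may also hold an expression of the same level.
+        if assoc == "left":
+            lhs = name + "_lhs"
+            rules.append("%s: %s_expression | %s_operand;" % (lhs, name, name))
+        elif assoc == "right":
+            rhs = name + "_rhs"
+            rules.append("%s: %s_expression | %s_operand;" % (rhs, name, name))
+        action = "{ $$ = ($1 %s $3); }" % c_op
+        rules.append("%s_expression: %s %s %s %s;" % (name, lhs, token, rhs, action))
+        # Operands are expressions of strictly higher levels, or primaries.
+        higher = [h + "_expression" for h in names if (name, h) in lower_than]
+        operands = " | ".join(higher + ["primary_expression"])
+        rules.append("%s_operand: %s;" % (name, operands))
+    return "\n".join(rules)
+
+
+LEVELS = {
+    "compare": ("EQEQ", "none", "=="),
+    "add": ("'+'", "left", "+"),
+    "multiply": ("'*'", "left", "*"),
+    "shift": ("LSH", "none", "<<"),
+}
+LOWER_THAN = {
+    ("compare", "add"),
+    ("compare", "multiply"),
+    ("compare", "shift"),
+    ("add", "multiply"),
+}
+print(yacc_rules(LEVELS, LOWER_THAN))
+```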
+
+A complete example of a yacc parser with operator precedence partial ordering is
+available [alongside this proposal](p0555/yacc-parser).
+
+## Rationale based on Carbon's goals
+
+- Software and language evolution
+
+  - The advice to not supply an operator precedence relationship if in doubt
+    is based on the idea that it's easier to add a precedence rule as an
+    evolutionary step than to remove one.
+
+- Code that is easy to read, understand, and write
+
+  - This proposal aims to support this goal by ensuring that the operator
+    expressions that are used in programs are readily understood by
+    practitioners, by making unreadable constructs invalid.
+
+## Alternatives considered
+
+### Total order
+
+We could provide a total order over operator precedence. This proposal is not
+strictly in conflict with doing so, if every ordering relationship is justified,
+but in practice we expect there to be pairs of operators for which there is no
+obvious precedence relationship.
+
+For:
+
+- This is established practice across most languages.
+
+Against:
+
+- This practice is a common source of bugs in the case where an arbitrary or
+ bad choice is made.
+
+### Different precedence for different operands
+
+We could provide different precedence relationships for the left and right sides
+of infix operators. For example, we could allow multiplication on the left of a
+`<<` operator but not on the right. This is precedented in C++: the `?` in a
+`?:` allows a comma operator on its right but not on its left.
+
+For:
+
+- This may allow some additional cases that would be clear and unsurprising.
+
+Against:
+
+- The resulting rules would be more challenging to learn, and it seems likely
+ that they would fail the test that most developers who regularly use Carbon
+ know the rules.
+
+This proposal is not incompatible with adopting such a direction in future if we
+find motivation to do so.
+
+### Require less than a partial order
+
+We could require something weaker than a partial ordering of precedence levels.
+This proposal assumes the following three points are useful for human
+comprehension of operator precedence:
+
+- The lowest-precedence operator does not depend on the relative order of
+ operators in the expression (except as a tie-breaker when there are multiple
+ operators with the same precedence, where the associativity of the operator
+ is considered).
+- If an `^` expression can appear indirectly (but unparenthesized) within an
+ `$` expression, then an `^` expression can appear directly within an `$`
+ expression.
+- If the lowest-precedence operator in `a $ b ^ c` is `$`, and the
+ lowest-precedence operator in `b ^ c # d` is `^`, then the lowest-precedence
+ operator in `a $ b ^ c # d` is `$`.
+
+These assumptions lead to the conclusion that operator precedence should form a
+partial order over equivalence classes of operators. However, these assumptions
+could be wrong.
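+
+One practical consequence is that, whenever precedence edges are added, the
+resulting relation between precedence levels must stay acyclic: a cycle would
+make one level both higher and lower than another, which no partial order
+allows. A minimal Python check over (lower, higher) pairs, using a hypothetical
+`find_cycle` helper, might look like this.
+
+```python
+def find_cycle(edges):
+    """Return a cycle among (lower, higher) precedence edges, or None."""
+    graph = {}
+    for lower, higher in edges:
+        graph.setdefault(lower, []).append(higher)
+        graph.setdefault(higher, [])
+    state = {}  # Level -> "active" while on the DFS path, then "done".
+    path = []
+
+    def visit(node):
+        state[node] = "active"
+        path.append(node)
+        for succ in graph[node]:
+            if state.get(succ) == "active":
+                return path[path.index(succ):] + [succ]
+            if succ not in state:
+                cycle = visit(succ)
+                if cycle:
+                    return cycle
+        path.pop()
+        state[node] = "done"
+        return None
+
+    for node in graph:
+        if node not in state:
+            cycle = visit(node)
+            if cycle:
+                return cycle
+    return None
+
+
+print(find_cycle([("==", "+"), ("+", "*"), ("==", "<<")]))  # None
+print(find_cycle([("+", "*"), ("*", "<<"), ("<<", "+")]))  # ['+', '*', '<<', '+']
+```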
+
+If we find motivation to select rules that violate the above assumptions, we
+should reconsider the approach of using a partial precedence ordering, but no
+motivating case is currently known.
diff --git a/proposals/p0555/example.svg b/proposals/p0555/example.svg
new file mode 100644
index 0000000000000..0ac4abeaa6228
--- /dev/null
+++ b/proposals/p0555/example.svg
@@ -0,0 +1,118 @@
+
+
+
+
+
diff --git a/proposals/p0555/figures.py b/proposals/p0555/figures.py
new file mode 100755
index 0000000000000..7366029e75ce8
--- /dev/null
+++ b/proposals/p0555/figures.py
@@ -0,0 +1,138 @@
+#! /usr/bin/env python
+
+__copyright__ = """
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+"""
+
+fmt = "svg"
+
+
+def escape(s):
+    return (
+        s.replace("&", "&amp;")
+        .replace("<", "&lt;")
+        .replace(">", "&gt;")
+        .replace("[", "&#91;")
+        .replace("]", "&#93;")
+    )
+
+
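+def tablejoin(items, separator):
+    # Lay the items out as cells of a single borderless table row, with a
+    # separator cell between adjacent items.
+    data = ("<td>%s</td>" % separator).join(
+        "<td>%s</td>" % item for item in items
+    )
+    return (
+        '<table border="0" cellborder="0" cellspacing="0"><tr>%s</tr></table>'
+        % data
+    )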
+
+
+def code(s):
+    # FIXME: GraphViz's handling of font metrics appears to be pretty broken.
+    # Add a little extra width to each character with a non-code-font space to
+    # compensate.
+    codefont = "".join(
+        '<font face="monospace">%s</font> ' % escape(part) for part in s
+    )
+    return codefont
+
+
+def math(s):
+    # Render math in italics but otherwise unchanged.
+    return "<i>%s</i>" % s
+
+
+def raw(s):
+    return s
+
+
+LtR = ' shape="rarrow"'
+RtL = ' shape="larrow"'
+NonAssoc = ""
+
+out = None
+num = 0
+
+
+def group(ops, assoc=NonAssoc, style=code):
+    global num
+    num = num + 1
+    name = "op%d" % num
+    print(
+        "  %s [label=<%s>%s]"
+        % (
+            name,
+            tablejoin((style(op) for op in ops), ", "),
+            assoc,
+        ),
+        file=out,
+    )
+    return name
+
+
+def edge(a, b):
+    print("  %s -> %s" % (a, b), file=out)
+
+
+def combine(name, items):
+    if len(items) <= 1:
+        return items
+    print("  %s [label=<%s> shape=ellipse]" % (name, name), file=out)
+    res = name
+    for i in items:
+        edge(i, name)
+    return [res]
+
+
+def graph(f):
+    import subprocess
+
+    global out
+    # Pipe the generated DOT source through `dot`, writing the rendered figure
+    # to <function name>.<fmt>.
+    with open(f.__name__ + "." + fmt, "w") as outfile:
+        process = subprocess.Popen(
+            ["dot", "-T" + fmt],
+            stdin=subprocess.PIPE,
+            stdout=outfile,
+            encoding="utf8",
+        )
+        out = process.stdin
+        print(
+            """
+digraph {
+  layout = dot
+  rankdir = TB
+  rank = "min"
+  node [shape="none" fontsize="12" height="0"
+        fontname="BlinkMacSystemFont,Segoe UI,Helvetica,Arial,sans-serif"]
+  edge [dir="none"]
+  """.strip(),
+            file=out,
+        )
+        f()
+        print("}", file=out)
+        process.communicate()
+    return f
+
+
+@graph
+def example():
+    term = group(["(...)"], NonAssoc)
+    mul = group(["a * b"], LtR)
+    add = group(["a + b"], LtR)
+    shl = group(["a << b"], NonAssoc)
+    compare = group(["a == b"], NonAssoc)
+
+    edge(term, mul)
+    edge(mul, add)
+    edge(term, shl)
+    edge(add, compare)
+    edge(shl, compare)
diff --git a/proposals/p0555/yacc-parser/Makefile b/proposals/p0555/yacc-parser/Makefile
new file mode 100644
index 0000000000000..ee58095ba0704
--- /dev/null
+++ b/proposals/p0555/yacc-parser/Makefile
@@ -0,0 +1,11 @@
+# Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+# Exceptions. See /LICENSE for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+example: example.l example.y
+	flex example.l
+	bison example.y --defines
+	clang example.tab.c lex.yy.c -o example
+
+clean:
+	rm -f example.tab.c example.tab.h lex.yy.c example
diff --git a/proposals/p0555/yacc-parser/example.l b/proposals/p0555/yacc-parser/example.l
new file mode 100644
index 0000000000000..92062a862db7f
--- /dev/null
+++ b/proposals/p0555/yacc-parser/example.l
@@ -0,0 +1,25 @@
+/*
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+*/
+
+%option noyywrap
+%{
+#include <stdlib.h>
+#define YY_DECL int yylex()
+#include "example.tab.h"
+%}
+
+%%
+
+[0-9]+ { yylval = atoi(yytext); return INT; }
+"*" { return '*'; }
+"+" { return '+'; }
+"<<" { return LSH; }
+"==" { return EQEQ; }
+"(" { return '('; }
+")" { return ')'; }
+";" { return ';'; }
+
+%%
diff --git a/proposals/p0555/yacc-parser/example.y b/proposals/p0555/yacc-parser/example.y
new file mode 100644
index 0000000000000..1344890c54e1a
--- /dev/null
+++ b/proposals/p0555/yacc-parser/example.y
@@ -0,0 +1,47 @@
+/*
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+*/
+
+%{
+#include <stdio.h>
+
+extern int yylex();
+extern int yyparse();
+extern FILE* yyin;
+
+void yyerror(const char *s) { fprintf(stderr, "%s\n", s); }
+int main() { while (yyparse()) {} }
+%}
+
+%define api.value.type {int}
+%token INT LSH EQEQ
+
+%%
+
+interpreter:
+ %empty
+| interpreter expression ';' { printf("%d\n", $2); }
+
+expression: compare_expression | compare_operand;
+
+compare_expression: compare_operand EQEQ compare_operand { $$ = ($1 == $3); };
+compare_operand: add_expression | multiply_expression | shift_expression | primary_expression;
+
+add_expression: add_lhs '+' add_operand { $$ = ($1 + $3); };
+add_lhs: add_expression | add_operand;
+add_operand: multiply_expression | multiply_operand;
+
+multiply_expression: multiply_lhs '*' multiply_operand { $$ = ($1 * $3); };
+multiply_lhs: multiply_expression | multiply_operand;
+multiply_operand: primary_expression;
+
+shift_expression: shift_operand LSH shift_operand { $$ = ($1 << $3); };
+shift_operand: primary_expression;
+
+primary_expression: INT | '(' expression ')' { $$ = $2; };
+
+%%