carbon-language · zygoloid · Jun 25, 2021 · May 27, 2021 · Jun 24, 2021 · Jun 24, 2021
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -54,13 +54,24 @@ repos:
           - ''
           - '// '
           - ''
+          - --custom_format
+          - '\.l$'
+          - '/*'
+          - ''
+          - '*/'
+          - --custom_format
+          - '\.y$'
+          - '/*'
+          - ''
+          - '*/'
         exclude: |
           (?x)^(
               .bazelversion|
               compile_flags.txt|
               third_party/examples/.*/compile_flags.carbon.txt|
               website/(firebase/.firebaserc|jekyll/(Gemfile.lock|theme/.*))|
               .*\.def|
+              .*\.svg|
               .*/testdata/.*\.golden
           )$
       - id: check-google-doc-style

diff --git a/proposals/README.md b/proposals/README.md
@@ -54,5 +54,6 @@ request:
 -   [0438 - Functions](p0438.md)
 -   [0444 - GitHub Discussions](p0444.md)
 -   [0447 - Generics terminology](p0447.md)
+-   [0555 - Operator precedence](p0555.md)
 
 <!-- endproposals -->
diff --git a/proposals/p0555.md b/proposals/p0555.md
@@ -0,0 +1,308 @@
+# Operator precedence
+
+<!--
+Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+Exceptions. See /LICENSE for license information.
+SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+-->
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/555)
+
+<!-- toc -->
+
+## Table of contents
+
+-   [Problem](#problem)
+-   [Background](#background)
+-   [Proposal](#proposal)
+-   [Details](#details)
+    -   [Notational convention](#notational-convention)
+    -   [When to add precedence edges](#when-to-add-precedence-edges)
+    -   [Parsing with a partial precedence order](#parsing-with-a-partial-precedence-order)
+-   [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
+-   [Alternatives considered](#alternatives-considered)
+    -   [Total order](#total-order)
+    -   [Different precedence for different operands](#different-precedence-for-different-operands)
+    -   [Require less than a partial order](#require-less-than-a-partial-order)
+
+<!-- tocstop -->
+
+## Problem
+
+Most expression-oriented languages use a strict hierarchy of precedence levels.
+That approach is error-prone, as it assigns meaning to programs that programmers
+may either not understand or may misunderstand.
+
+## Background
+
+Given an expression, we need to be able to infer its structure: what are the
+operands of each of the operators? This may be ambiguous in the absence of rules
+that determine which operator is preferred, such as in the expression
+`a $ b ^ c`: is this `(a $ b) ^ c` or `a $ (b ^ c)`?
+
+Starting with a sequence of operators and non-operator terms, we can completely
+determine the structure of an expression by determining which operator in our
+sequence will be the root of the parse tree, splitting the expression at that
+point, and recursively determining the structure of each subexpression. The
+operator that forms the root of the parse tree is said to have the lowest
+precedence in the expression.
+
+Traditionally, this is accomplished by assigning a precedence level to each
+operator and devising a total ordering over precedence levels. For example, we
+could assign a higher precedence level to an infix `*` operator than an infix
+`+` operator. With that choice of precedence levels, an infix `*` operator would
+bind tighter than an infix `+` operator, regardless of the order in which they
+appear.
+
+This approach is well-understood, but is problematic. For example, in C++,
+expressions such as `a & b << c * 3` are valid, but the meaning of such an
+expression is unlikely to be readily apparent to many programmers. Worse, for
+cases such as `a & 3 == 3`, there is a clear intended meaning, namely
+`(a & 3) == 3`, but the actual meaning is something else -- in this case,
+`a & (3 == 3)`.
+
+Because the precedence rules are not widely known and are sometimes quite
+surprising, parentheses are used as a matter of course for certain kinds of C++
+expressions. However, the absence of such parentheses is not diagnosed in all
+cases, even by many linting tools, and forgetting those parentheses can lead to
+subtle bugs.
+
+## Proposal
+
+Do not have a total ordering of precedence levels. Instead, define a partial
+ordering of precedence levels. Expressions using operators that lack relative
+orderings must be disambiguated by the programmer, for example by adding
+parentheses; when a program's meaning depends on an undefined relative ordering
+of two operators, it will be rejected due to ambiguity.
+
+The default behavior for any new operator is for it to be unordered with respect
+to all other operators, thereby requiring parentheses when combining that
+operator with any other operator. Precedence rules should be added only if it is
+reasonable to expect most or all professional Carbon developers to remember the
+precedence rule.
+
+## Details
+
+### Notational convention
+
+For pedagogical purposes, our documentation will use
+[Hasse diagrams](https://en.wikipedia.org/wiki/Hasse_diagram) to represent
+operator precedence partial orders, where operators with lower precedence are
+considered less than (and therefore depicted lower than and connected to)
+operators with higher precedence. In our diagrams, an enclosing arrow will be
+used to show associativity within precedence groups, if there is any, with a
+left-to-right arrow meaning a left-associative operator.
+
+For example:
+
+<div align="center">
+<img src="p0555/example.svg" alt="Example operator precedence diagram">
+</div>
+
+... would depict a higher-precedence `*` operator and a lower-precedence `+`
+operator, both of which are left-associative, and a non-associative `<<`
+operator. The `==` operator is lower precedence than all of those operators, and
+parentheses are higher precedence than all of those operators.
+
+With those precedence rules:
+
+-   `a + b * c` would be parsed as `a + (b * c)`, because `+` has lower
+    precedence than `*`.
+-   `a + b << c` would be an error, requiring parentheses, because the
+    precedence levels of `+` and `<<` are unordered.
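+
+As a minimal sketch of how such a diagram could be queried, the following Python
+encodes the edges of the example as a set of pairs and compares two operators
+through the transitive closure. The names `EDGES`, `transitive_closure`, and
+`compare` are hypothetical and are not taken from the script included with this
+proposal; parentheses are left out because they bind tighter than every
+operator.
+
+```python
+from itertools import product
+
+# Direct edges read off the example diagram, as (higher, lower) pairs.
+# Assumption: the diagram contains exactly these three edges between operators.
+EDGES = {("*", "+"), ("+", "=="), ("<<", "==")}
+
+
+def transitive_closure(edges):
+    """Return the smallest transitive relation containing `edges`."""
+    closure = set(edges)
+    changed = True
+    while changed:
+        changed = False
+        # Iterate over a snapshot so that the set can grow inside the loop.
+        for (a, b), (c, d) in product(list(closure), repeat=2):
+            if b == c and (a, d) not in closure:
+                closure.add((a, d))
+                changed = True
+    return closure
+
+
+HIGHER = transitive_closure(EDGES)
+
+
+def compare(op1, op2):
+    """Describe the precedence of op1 relative to op2."""
+    if op1 == op2:
+        return "equal"
+    if (op1, op2) in HIGHER:
+        return "higher"
+    if (op2, op1) in HIGHER:
+        return "lower"
+    # Neither direction is reachable: the pair needs parentheses.
+    return "unordered"
+```
+
+Under these assumptions, `compare("+", "*")` yields `"lower"`, while
+`compare("+", "<<")` yields `"unordered"`, matching the two cases listed above.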
+
+A [Python script](p0555/figures.py) to generate these diagrams is included with
+this proposal.
+
+### When to add precedence edges
+
+Given a program whose meaning is ambiguous to a reader, it is preferable to
+reject the program rather than to arbitrarily pick a meaning. For Carbon's
+operators, we should only add an ordering between two operators if there is a
+logical reason for that ordering, not merely to provide _some_ answer. **Goal:
+for every combination of operators, either it should be reasonable to expect
+most or all professional Carbon developers to remember the precedence, or there
+should not be a precedence rule.**
+
+As an example, consider the expression `a * b ^ c`, where `*` is assumed to be a
+multiplication operator and `^` is assumed to be a bitwise XOR operation. We
+should reject this expression because there is no logical reason to perform
+either operator first and it would be unreasonable to expect Carbon developers
+to remember an arbitrary tie-breaker between the two options.
+
+This still leaves open the question of how high a bar of knowledge we put on our
+programmers (what is reasonable for us to expect?). We can use experience from
+C++ to direct this decision: just as many professional C++ programmers do not
+remember the relative precedence of `&&` vs `||`, and `&` vs `|`, and `&` vs
+`<<`, and so on, we shouldn't expect them to remember similar precedence rules
+in Carbon. If we are in doubt, omitting a precedence rule and waiting for
+real-world experience should be preferred.
+
+### Parsing with a partial precedence order
+
+A traditional, totally-ordered precedence scheme can be implemented by an
+[operator precedence parser](https://en.wikipedia.org/wiki/Operator-precedence_parser):
+
+-   Keep track of the current left-hand-side operand and an ambient precedence
+    level. The ambient precedence level is the precedence of the operator whose
+    operand is being parsed, or a placeholder "lowest" precedence level when
+    parsing an expression that is not the operand of an operator.
+-   When a new operator is encountered, its precedence is compared to the
+    ambient precedence level:
+    -   If its precedence is higher than the ambient precedence level, then
+        recurse ("shift") with that as the new ambient precedence level to form
+        the right-hand side of the new operator. After forming the right-hand
+        side, build an operator expression from the current left-hand side
+        operand and the right-hand side operand; that is the new current
+        left-hand side.
+    -   If its precedence is equal to the ambient precedence level, then use the
+        associativity of that precedence level to determine what to do:
+        -   If the operator is left-associative, build an operator expression.
+        -   If the operator is right-associative, recurse.
+        -   If the operator is non-associative, produce an error.
+    -   If its precedence is lower than the ambient precedence level, return the
+        expression formed so far; it's the complete operand to an earlier
+        operator.
+
+This is, for example, the strategy
+[currently used in Clang](https://github.com/llvm/llvm-project/blob/5f0903e9bec97e67bf34d887bcbe9d05790de934/clang/lib/Parse/ParseExpr.cpp#L396).
+
+The above algorithm is only suited to parsing in the case where precedence
+levels are totally ordered, because it does not say what to do if the new
+precedence is not comparable with the ambient precedence. However, the algorithm
+can easily be adapted to also parse with a partial precedence order by adding
+one more case:
+
+-   If the precedence level of the new operator is not comparable with the
+    ambient precedence level, produce an ambiguity error.
+
+The key observation here is that, if we ever see `... a * b ^ c ...`, where `*`
+and `^` have incomparable precedence, no later tokens can ever resolve the
+ambiguity, so we can diagnose it immediately. Sketch proof: If there were a
+valid parse tree for this expression, one of `*` and `^` must end up as an
+ancestor of the other. But in a valid parse tree, along the path from one
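+operator to the other, precedences monotonically increase, so by transitivity of
+the precedence partial ordering, the ancestor operator has lower precedence than
+the descendant operator. That contradicts the assumption that the two operators
+are incomparable, so no valid parse tree exists.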
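+
+The adapted algorithm can be sketched in Python as follows. This is an
+illustrative sketch, not the Carbon toolchain implementation: the `TIGHTER`
+table hard-codes the example diagram from the notational convention section,
+every precedence level holds a single operator, and treating `==` as
+non-associative is an assumption. All names (`TIGHTER`, `LEFT_ASSOC`,
+`ParseError`, `parse`) are hypothetical.
+
+```python
+import re
+
+# For each operator, the operators that bind strictly tighter (already
+# transitively closed). Assumption: this mirrors the example diagram.
+TIGHTER = {"==": {"+", "*", "<<"}, "+": {"*"}, "*": set(), "<<": set()}
+LEFT_ASSOC = {"+", "*"}  # "<<" and "==" are treated as non-associative
+
+
+class ParseError(Exception):
+    pass
+
+
+def parse(text):
+    """Parse `text` into nested (operator, lhs, rhs) tuples."""
+    # Characters that match no token pattern are silently skipped.
+    tokens = re.findall(r"\w+|<<|==|[+*()]", text)
+    pos = 0
+
+    def peek():
+        return tokens[pos] if pos < len(tokens) else None
+
+    def advance():
+        nonlocal pos
+        if pos >= len(tokens):
+            raise ParseError("unexpected end of input")
+        pos += 1
+        return tokens[pos - 1]
+
+    def primary():
+        tok = advance()
+        if tok == "(":
+            inner = expression(None)  # parentheses reset the ambient level
+            if peek() != ")":
+                raise ParseError("expected ')'")
+            advance()
+            return inner
+        if tok in TIGHTER or tok == ")":
+            raise ParseError(f"unexpected token {tok!r}")
+        return tok
+
+    def expression(ambient):
+        lhs = primary()
+        while peek() in TIGHTER:
+            op = peek()
+            if ambient is not None:
+                if op == ambient:
+                    if op in LEFT_ASSOC:
+                        return lhs  # the earlier operator is built first
+                    raise ParseError(f"{op!r} is non-associative")
+                if ambient in TIGHTER[op]:
+                    return lhs  # op is lower: operand is complete
+                if op not in TIGHTER[ambient]:
+                    # The added case: the two levels are not comparable.
+                    raise ParseError(f"{ambient!r} and {op!r} are unordered")
+            advance()
+            lhs = (op, lhs, expression(op))  # op is higher: recurse
+        return lhs
+
+    result = expression(None)
+    if peek() is not None:
+        raise ParseError(f"unexpected token {peek()!r}")
+    return result
+```
+
+With this table, `parse("a + b * c")` returns `("+", "a", ("*", "b", "c"))`,
+and `parse("a + b << c")` raises `ParseError` as soon as `<<` is seen, without
+looking at any later token.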
+
+An operator precedence parser with a partial ordering of precedence levels
+[has been implemented](https://github.com/carbon-language/carbon-lang/commit/b8afadb3c6af5e68192d585232fee759180ea1e3)
+as a proof-of-concept in the Carbon toolchain.
+
+Operator precedence partial ordering can also be implemented in yacc / bison
+parser generators by using a variant of the
+[precedence climbing method](https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method).
+For example, here is a yacc grammar for the Hasse diagram shown above:
+
+```
+expression: compare_expression | compare_operand;
+
+compare_expression: compare_lhs EQEQ compare_operand { $$ = ($1 == $3); };
+compare_lhs: compare_expression | compare_operand;
+compare_operand: add_expression | multiply_expression | shift_expression | primary_expression;
+
+add_expression: add_lhs '+' add_operand { $$ = ($1 + $3); };
+add_lhs: add_expression | add_operand;
+add_operand: multiply_expression | multiply_operand;
+
+multiply_expression: multiply_lhs '*' multiply_operand { $$ = ($1 * $3); };
+multiply_lhs: multiply_expression | multiply_operand;
+multiply_operand: primary_expression;
+
+shift_expression: shift_lhs LSH shift_operand { $$ = ($1 << $3); };
+shift_lhs: shift_expression | shift_operand;
+shift_operand: primary_expression;
+
+primary_expression: INT | '(' expression ')' { $$ = $2; };
+```
+
+Note that some care must be taken to avoid grammar ambiguities. Under the
+precedence climbing method, a `primary_expression` would be a
+`shift_expression`, a `multiply_expression`, and an `add_expression`, and
+therefore interpreting a `primary_expression` as an `expression` would be
+ambiguous: we could take either the `shift_expression` path or the
+`multiply_expression` path through the grammar. The above formulation avoids
+this ambiguity by excluding `primary_expression` from `add_expression` and
+`shift_expression`, and instead listing it as a distinct production for
+`compare_operand`. A yacc grammar such as the above can be produced
+systematically for any precedence partial ordering.
+
+A complete example of a yacc parser with operator precedence partial ordering is
+available [alongside this proposal](p0555/yacc-parser).
+
+## Rationale based on Carbon's goals
+
+-   Software and language evolution
+
+    -   The advice to not supply an operator precedence relationship if in doubt
+        is based on the idea that it's easier to add a precedence rule as an
+        evolutionary step than to remove one.
+
+-   Code that is easy to read, understand, and write
+
+    -   This proposal aims to support this goal by ensuring that the operator
+        expressions that are used in programs are readily understood by
+        practitioners, by making unreadable constructs invalid.
+
+## Alternatives considered
+
+### Total order
+
+We could provide a total order over operator precedence. This proposal is not
+strictly in conflict with doing so, if every ordering relationship is justified,
+but in practice we expect there to be pairs of operators for which there is no
+obvious precedence relationship.
+
+For:
+
+-   This is established practice across most languages.
+
+Against:
+
+-   This practice is a common source of bugs in the case where an arbitrary or
+    bad choice is made.
+
+### Different precedence for different operands
+
+We could provide different precedence relationships for the left and right sides
+of infix operators. For example, we could allow multiplication on the left of a
+`<<` operator but not on the right. This has precedent in C++: the `?` in a
+`?:` allows a comma operator on its right but not on its left.
+
+For:
+
+-   This may allow some additional cases that would be clear and unsurprising.
+
+Against:
+
+-   The resulting rules would be more challenging to learn, and it seems likely
+    that they would fail the test that most professional Carbon programmers know
+    the rules.
+
+This proposal is not incompatible with adopting such a direction in future if we
+find motivation to do so.
+
+### Require less than a partial order
+
+We could require something weaker than a partial ordering of precedence levels.
+This proposal assumes the following points are useful for human
+comprehension of operator precedence:
+
+-   The lowest-precedence operator does not depend on the relative order of
+    operators in the expression (except as a tie-breaker when there are multiple
+    operators with the same precedence, where the associativity of the operator
+    is considered).
+-   If an `^` expression can appear indirectly (but unparenthesized) within an
+    `$` expression, then an `^` expression can appear directly within an `$`
+    expression.
+-   If the lowest-precedence operator in `a $ b ^ c` is `$`, and the
+    lowest-precedence operator in `b ^ c # d` is `^`, then the lowest-precedence
+    operator in `a $ b ^ c # d` is `$`.
+
+These assumptions lead to the conclusion that operator precedence should form a
+partial order over equivalence classes of operators. However, these assumptions
+could be wrong.
+
+If we find motivation to select rules that violate the above assumptions, we
+should reconsider the approach of using a partial precedence ordering, but no
+motivating case is currently known.