From 1cd901738c8ece711f3b51b4e199f132f118bedd Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Mon, 9 Apr 2018 11:38:44 +0200 Subject: [PATCH 01/26] rfc, nested-disjunction-patterns: some initial notes" --- text/0000-nested-patterns.md | 109 +++++++++++++++++++++++++++++++++++ 1 file changed, 109 insertions(+) create mode 100644 text/0000-nested-patterns.md diff --git a/text/0000-nested-patterns.md b/text/0000-nested-patterns.md new file mode 100644 index 00000000000..ed00b1c4eb2 --- /dev/null +++ b/text/0000-nested-patterns.md @@ -0,0 +1,109 @@ +- Feature Name: `nested_disjunction_patterns` +- Start Date: 2018-04-07 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Allow `|` to be used within a pattern such that `Some(A(x) | B(x))` becomes +a valid pattern. You can also nest `|` by an arbitrary amount. + +# Motivation +[motivation]: #motivation + +## DRY + +## Mental model + ++ French ++ German ++ Spanish ++ Esperanto ++ Swedish ++ Farsi ++ Finnish ++ Japanese ++ Portuguese + +## Readability + +## Ergonomics + + + +TODO + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +```rust +match (first, next!()) { + (0xE0 , 0xA0 ... 0xBF) | + (0xE1 ... 0xEC, 0x80 ... 0xBF) | + (0xED , 0x80 ... 0x9F) | + (0xEE ... 0xEF, 0x80 ... 0xBF) => {} + _ => err!(Some(1)) +} +``` + +rewrite as: + +```rust +match (first, next!()) { + (0xE0 , 0xA0 ... 0xBF) | + (0xE1 ... 0xEC | 0xEE ... 0xEF, 0x80 ... 0xBF) | + (0xED , 0x80 ... 0x9F) => {} + _ => err!(Some(1)) +} +``` + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +TODO + +# Drawbacks +[drawbacks]: #drawbacks + +TODO + +# Rationale and alternatives +[alternatives]: #alternatives + +TODO + +# Prior art +[prior-art]: #prior-art + +## CSS4 selectors + +[CSS4]: https://drafts.csswg.org/selectors/#matches + +In [CSS4] (draft proposal), it is possible to write a selector +`div > *:matches(ul, ol)` which is equivalent to `div > ul, div > ol`. +The moral equivalent of this in Rust would be: `Div(Ul | Ol)`. + +## Regex + +TODO + +## OCaml + +[This is supported](https://caml.inria.fr/pub/docs/manual-ocaml/patterns.html#sec108) in OCaml. An example from Real World OCaml is: +```ocaml +let is_ocaml_source s = + match String.rsplit2 s ~on:'.' with + | Some (_, ("ml" | "mli")) -> true + | _ -> false +``` + +## Haskell + +The equivalent proposal is currently being discussed for inclusion in Haskell. +See: for that process. + +# Unresolved questions +[unresolved]: #unresolved-questions + +TODO \ No newline at end of file From 7f9ae99a58a5948d3c8511b4b6200ff5123a6f0e Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 22:20:08 +0200 Subject: [PATCH 02/26] rfc, or-patterns: initial version. --- text/0000-nested-patterns.md | 641 ++++++++++++++++++++++++++++++++--- 1 file changed, 599 insertions(+), 42 deletions(-) diff --git a/text/0000-nested-patterns.md b/text/0000-nested-patterns.md index ed00b1c4eb2..c011ef08ef8 100644 --- a/text/0000-nested-patterns.md +++ b/text/0000-nested-patterns.md @@ -1,77 +1,552 @@ -- Feature Name: `nested_disjunction_patterns` -- Start Date: 2018-04-07 -- RFC PR: (leave this empty) -- Rust Issue: (leave this empty) +- Feature Name: `or_patterns` +- Start Date: 2018-08-29 +- RFC PR: _ +- Rust Issue: _ # Summary [summary]: #summary -Allow `|` to be used within a pattern such that `Some(A(x) | B(x))` becomes -a valid pattern. You can also nest `|` by an arbitrary amount. +Allow `|` to be arbitrarily nested within a pattern such +that `Some(A(0) | B(1 | 2))` becomes a valid pattern. +The operator `|` has low precedence such that `i @ p | q` means `(i @ p) | q`. # Motivation [motivation]: #motivation -## DRY +Nothing this RFC proposes adds anything with respect to expressive power. +Instead, the aim is to make the power we already have more easy to wield. +For example, we wish to improve ergonomics, readability, and the mental model. -## Mental model +## Don't repeat yourself + +Consider an example match arm such as (1): -+ French -+ German -+ Spanish -+ Esperanto -+ Swedish -+ Farsi -+ Finnish -+ Japanese -+ Portuguese +```rust +Some(Enum::A) | Some(Enum::B) | Some(Enum::C) | Some(Enum::D) => .. +``` -## Readability +Here, we are repeating `Some($pat)` three times. + +Compare (1) to how we could have written this with this RFC (2): + +```rust +Some(Enum::A | Enum::B | Enum::C | Enum::D) => .. +``` -## Ergonomics +We can see that this is clearly shorter and that the amount of extra work +we have to do scales linearly with the number of inner variants we mention. +The ability to nest patterns in this way therefore results in improved +writing ergonomics. +## Mental model +However, as we know, code is read more than it is written. So are we trading +readability for increased ergonomics? We believe this is not the case. +Instead, this RFC aims to improve the readability of code by reducing the +amount of redundant information that needs to be scanned. -TODO +In addition, we aim to more closely align Rust with the mental model that +*humans* have and how we usually speak and communicate. + +Consider that you wanted to ask someone what the *colour* of their *car* was. +Would you be more inclined to ask: + +> Is your car red, white, or blue? + +Or would you instead ask: + +> Is your car red, your car white, or your car blue? + +[CNF]: https://en.wikipedia.org/wiki/Conjunctive_normal_form + +[DNF]: https://en.wikipedia.org/wiki/Disjunctive_normal_form + +When researching for this RFC; many people were asked and all of them preferred +the first alternative. This user testing was done on both programmers and +non-programmers alike and included speakers of: ++ Indo-European: + + English + + German (2) + + Swedish (3) + + French (2) + + Portuguese (1) + + Spanish (2) + + Farsi (3) ++ Finnish (1) ++ Esperanto (1) ++ Japanese (1) + +Thus, we conjecture that it's more common for humans to not distribute and to +instead use something akin to *conjunctive normal form* ([CNF]) when communicating. +A likely consequence of this is that a common way to understand snippet (1) +formulated in [DNF] is to first mentally reconstruct it into CNF and then +understand the implications of the pattern. + +By allowing users to encode their logic in the way they think instead of going +through more indirect routes, we can improve the understandability of code. + +## Reducing complexity with uniformity + +A principal way in which programming languages accumulate complexity is by +adding more and more rules that a programmer needs to keep in their head to +write or understand what a program does. A consequence of this is that often +times, caveats and corner cases make for a language that is harder to learn, +understand, and write in. To avoid such caveats, it is thus imperative that +we should try to keep the language more uniform rather than less. +This is an important means through which it becomes possible to give users +more expressiveness but at the same time limit the cost each feature takes +from our complexity budget. + +With this RFC, we try to reduce the complexity of the language by extending +a feature which already exists, and which many users already know about, +to another place. In a sense, giving the user more capabilities results +in a negative increase in complexity. + +[RFC 2175]: https://github.com/rust-lang/rfcs/pull/2175 + +In concrete terms, where before we only allowed a pattern of the form +`'|'? pat | pat` at the top level of `match` and [similar constructs][RFC 2175], +which special cased the language, we now allow `pat | pat` anywhere a pattern +may occur whereby we simplify the ruleset of the language. +In fact, there are already users that try this expecting it to work but +then find out that it does not. + +Furthermore, allowing `pat | pat` in the pattern grammar also allows macros to +produce disjunctions such as `$p | $q`. + +## Real world use cases + +This RFC wouldn't be complete without concrete use cases which it would +facilitate. While there are not an overabundance of cases where `pat | pat` +would help, there are some where it would. Let's go through a few of them. + +[precursor]: https://github.com/rust-lang/rfcs/blob/de235887a80555427314c7eb25c6214523d50cce/text/0000-pipe-in-patterns.md + +1. One example which was raised in the [precursor] to this RFC was building a + state machine which is iterating through `chars_indices`: + + ```rust + match iter.next() { + Some(_, ' ' | '\n' | '\r' | '\u{21A1}') => { + // Change state + } + Some(index, ch) => { + // Look at char + } + None => return Err(Eof), + } + ``` + +[GHC proposal]: https://github.com/osa1/ghc-proposals/blob/or_patterns/proposals/0000-or-patterns.rst#real-world-examples + +2. Other examples are listed in the equivalent [GHC proposal]. + +3. Another example which was provided in the [precursor] RFC was: + + ```rust + for event in event_pump.poll_iter() { + use sdl2::event::Event; + use sdl2::keyboard::Keycode::{Escape, Q}; + match event { + Event::KeyDown { keycode: Some(Escape | Q), ... } => break 'game, + _ => {}, + } + ... + } + ``` + +4. Other cases where this feature was requested include: + + + + + +[alercah_discord]: https://discordapp.com/channels/442252698964721669/448237931136679936/483325957130813440 + +5. Another use case due to [@alercah][alercah_discord] is: + + ```rust + pub fn is_green(self) -> bool { + match self { + | Tile::Suited(Suit::Souzu, 2 | 3 | 4 | 6 | 8) + | Tile::Dragon(Dragon::Green) => true, + _ => false, + } + } + ``` + +6. Some further examples found with sourcegraph include: + + From [cc-rs](https://sourcegraph.com/github.com/alexcrichton/cc-rs/-/blob/src/lib.rs#L1307:18): + + ```rust + match (self.cpp_set_stdlib.as_ref(), cmd.family) { + (None, _) => {} + (Some(stdlib), ToolFamily::Gnu | ToolFamily::Clang) => { + cmd.push_cc_arg(format!("-stdlib=lib{}", stdlib).into()); + } + _ => { + ... + } + } + ``` + + From [capnproto](https://sourcegraph.com/github.com/capnproto/capnproto-rust/-/blob/capnp/src/private/layout.rs#L1979-2002): + + ```rust + // Check whether the size is compatible. + match expected_element_size { + None | Some(Void | InlineComposite) => (), + Some(Bit) => { ... } + Some(Byte | TwoBytes | FourBytes | EightBytes) => { ... }, + ... + } + ``` + + From [chrono](https://sourcegraph.com/github.com/chronotope/chrono/-/blob/src/format/parsed.rs#L271-308): + + ```rust + fn resolve_year(y: Option, q: Option, + r: Option) -> ParseResult> { + match (y, q, r) { + (y, None, None) => Ok(y), + (Some(y), q, r @ (Some(0...99) None)) => { ... }, + ... + } + } + ``` + + From maidsafe's [routing](https://sourcegraph.com/github.com/maidsafe/routing/-/blob/src/states/node.rs#L2138-2180): + + ```rust + match self.peer_mgr.connection_info_received(...) { + ..., + Ok(IsProxy | IsClient | IsJoiningNode) => { ... }, + Ok(Waiting | IsConnected) | Err(_) => (), + } + ``` + + Also from [routing](https://sourcegraph.com/github.com/maidsafe/routing/-/blob/src/states/node.rs#L2215-2245): + + ```rust + match self.peer_mgr.connection_info_received(...) { + Ok(Ready(our_info, their_info)) => { ... } + Ok(Prepare(_) | IsProxy | IsClient | IsJoiningNode) => { ... } + Ok(Waiting | IsConnected) | Err(_) => (), + } + + ``` + + From [termion](https://sourcegraph.com/github.com/ticki/termion/-/blob/src/input.rs#L143-153): + + ```rust + for c in self.bytes() { + match c { + Err(e) => return Err(e), + Ok(0 | 3 | 4) => return Ok(None), + Ok(0x7f) => { buf.pop(); } + Ok(b'\n' | b'\r') => break, + Ok(c) => buf.push(c), + } + } + ``` + +7. Some other use cases are: + + From git2-rs: + + ```rust + match obj.kind() { + Some(Commit | Tag | Tree) => ... + Some(Blob) => ... + None => ... + } + ``` + + From [debcargo](https://salsa.debian.org/rust-team/debcargo/blob/master/src/debian/dependency.rs#L234): + + ```rust + match (op, &mmp.clone()) { + (&Lt, &M(0) | &MM(0, 0) | &MMP(0, 0, 0)) => debcargo_bail!( + "Unrepresentable dependency version predicate: {} {:?}", + dep.name(), + p + ), + (&Tilde, &M(_) | &MM(_, _)) => { + vr.constrain_lt(mmp.inclast()); + vr.constrain_ge(mmp); + } + (&Compatible, &MMP(0, minor, _) | &MM(0, minor)) => { + vr.constrain_lt(MM(0, minor + 1)); + vr.constrain_ge(mmp); + } + (&Compatible, &MMP(major, _, _) | &MM(major, _) | &M(major)) => { + vr.constrain_lt(M(major + 1)); + vr.constrain_ge(mmp); + } + ..., + } + ``` + +8. From rustc, we have: + + In `src/librustc_mir/interpret/eval_context.rs`: + ```rust + Some(Def::Static(..) | Def::Const(..) | Def::AssociatedConst(..)) => {}, + ``` + + In `src/librustc_mir/util/borrowck_errors.rs`: + ```rust + (&ty::TyArray(_, _), Some(true) | None) => "array", + ``` + + In `src/librustc/middle/reachable.rs`: + ```rust + Some(Def::Local(node_id) | Def::Upvar(node_id, ..)) => { .. } + ``` + + In `src/librustc/infer/error_reporting/mod.rs`: + ```rust + Some(hir_map::NodeBlock(_) | hir_map::NodeExpr(_)) => "body", + ``` + + In `src/libfmt_macros/lib.rs`: + ```rust + Some((_, '>' | '<' | '^')) => { .. } + ``` + + In `src/librustc/traits/select.rs`: + ```rust + ty::TyInfer(ty::IntVar(_) | ty::FloatVar(_)) | .. => { .. } + ``` + + In `src/librustc_typeck/check/mod.rs`: + ```rust + ty::TyInt(ast::IntTy::I8 | ast::IntTy::I16) | ty::TyBool => { .. } + + ... + + ty::TyUint(ast::UintTy::U8 | ast::UintTy::U16) => { .. } + ``` + + In `src/tools/cargo/src/cargo/sources/path.rs`: + ```rust + Some("Cargo.lock" | "target") => continue, + ``` + + In `src/libsyntax_ext/format_foreign.rs`: + ```rust + ('h' | 'l' | 'L' | 'z' | 'j' | 't' | 'q', _) => { + state = Type; + length = Some(at.slice_between(next).unwrap()); + move_to!(next); + }, + + ... + + let width = match self.width { + Some(Num::Next) => { + // NOTE: Rust doesn't support this. + return None; + } + w @ Some(Num::Arg(_) | Num::Num(_)) => w, + None => None, + }; + ``` + + In `src/libsyntax/parse/token.rs`: + + ```rust + BinOp(Minus | Star | Or | And) | OrOr => true, + ``` # Guide-level explanation [guide-level-explanation]: #guide-level-explanation +Simply put, `$p | $q` where `$p` and `$q` are some patterns is now itself +a legal pattern. + +This means that you may for example write: + ```rust -match (first, next!()) { - (0xE0 , 0xA0 ... 0xBF) | - (0xE1 ... 0xEC, 0x80 ... 0xBF) | - (0xED , 0x80 ... 0x9F) | - (0xEE ... 0xEF, 0x80 ... 0xBF) => {} - _ => err!(Some(1)) +enum Foo { + Bar, + Baz, + Quux(T), +} + +fn main() { + match Some(Foo::Bar) { + Some(Foo::Bar | Foo::Baz) => { .. }, + _ => { .. }, + } } ``` -rewrite as: +Because `$p | $q` is itself a pattern, this means that you can nest arbitrarily: ```rust -match (first, next!()) { - (0xE0 , 0xA0 ... 0xBF) | - (0xE1 ... 0xEC | 0xEE ... 0xEF, 0x80 ... 0xBF) | - (0xED , 0x80 ... 0x9F) => {} - _ => err!(Some(1)) +fn main() { + match Some(Foo::Bar) { + Some(Foo::Bar | Foo::Quux(0 | 1 | 3)) => { .. }, + _ => { .. } + } } ``` +Note that the operator `|` has a low precedence. This means that you +have to write `foo @ (1 | 2 | 3)` instead of writing `foo @ 1 | 2 | 3`. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -TODO +## Grammar + +To the pattern grammar we introduce the following alternative: + +```rust +pat : pat '|' pat + | ... + ; +``` + +Here `|` has the lowest precedence. +In particular, the operator `@` binds more tightly than `|` does. +Thus, `i @ p | q` associates as `i @ (p | q)` as opposed to `(i @ p) | q`. + +We then introduce a production: + +```rust +top_pat : '|'? pat ; +``` + +We then change the grammar of `let` statements to (as compared to [RFC 2175]): + +```rust +let : LET top_pat maybe_ty_ascription maybe_init_expr ';' ; +``` + +We change the grammar of `if let` expressions to: + +```rust +expr_if_let : IF LET top_pat '=' expr_nostruct block (ELSE block_or_if)? ; +``` + +And for `while let` expressions: + +```rust +expr_while_let : maybe_label WHILE LET top_pat '=' expr_nostruct block ; +``` + +For `for` loop expressions we now have: + +```rust +expr_for : maybe_label FOR top_pat IN expr_nostruct block ; +``` + +For `match` expressions we now have: + +```rust +expr_match : MATCH expr_nostruct '{' match_clause* nonblock_match_clause? '}' ; +match_clause : nonblock_match_clause ',' | block_match_clause ','? ; +nonblock_match_clause : match_arm (nonblock_expr | block_expr_dot) ; +block_match_clause : match_arm (block | block_expr) ; + +match_arm : maybe_outer_attrs top_pat (IF expr_nostruct)? FAT_ARROW ; +``` + +In other words, in all of the contexts where a pattern is currently accepted, +the compiler will now accept pattern alternations of form `p | q` where +`p` and `q` are arbitrary patterns. + +## Static semantics + +1. Given a pattern `p | q` at some depth for some arbitrary patterns `p` and `q`, + the pattern is considered ill-formed if: + + + the type inferred for `p` does not unify with the type inferred for `q`, or + + the same set of bindings are not introduced in `p` and `q`, or + + the type two bindings with the same name in `p` and `q` do not unify. + +2. When type checking an expression `match e_s { a_1 => e_1, ... a_n => e_n }`, + for each match arm `a_i` which contains a pattern of form `p_i | q_i`, + the pattern `p_i | q_i` is considered ill formed if, + at the depth `d` where it exists the fragment of `e_s` at depth `d`, + the type of the expression fragment does not unify with `p_i | q_i`. + +3. With respect to exhaustiveness checking, a pattern `p | q` is + considered to cover `p` as well as `q`. For some constructor `c(x, ..)` + the distributive law applies such that `c(p | q, ..rest)` covers the same + set of value as `c(p, ..rest) | c(q, ..rest)` does. This can be applied + recursively until there are no more nested patterns of form `p | q` other + than those that exist at the top level. + + Note that by *"constructor"* we do not refer to tuple struct patterns, + but rather we refer to a pattern for any product type. + This includes enum variants, tuple structs, structs with named fields, + arrays, tuples, and slices. + +## Dynamic semantics + +1. The dynamic semantics of pattern matching a scrutinee expression `e_s` + against a pattern `c(p | q, ..rest)` at depth `d` where `c` is some constructor, + `p` and `q` are arbitrary patterns, and `rest` is optionally any remaining + potential factors in `c`, is defined as being the same as that of + `c(p, ..rest) | c(q, ..rest)`. + +## Implementation notes + +With respect to both static and dynamic semantics, +it is always valid to first desugar a pattern `c(p | q)` +in CNF to its equivalent form in DNF, i.e. `c(p) | c(q)`. + +However, implementing `c(p | q)` in terms of a pure desugaring to `c(p) | c(q)` +may not be optimal as the desugaring can result in exponential blow-up of patterns. +Instead, it is more likely that a one-step case analysis will be more efficient. + +Which implementation technique to use is left open to each Rust compiler. # Drawbacks [drawbacks]: #drawbacks -TODO +1. Some parsers will have to be rewritten by a tiny bit; + We do this with any syntactic change in the language so + there should not be any problem. # Rationale and alternatives -[alternatives]: #alternatives +[alternatives]: #rationale-and-alternatives + +As for why the change as proposed in this RFC should be done, +it is discussed in the [motivation]. + +Since we already use `|` for alternation at the top level, the only consistent +operator syntax for alternations in nested patterns would be `|`. +Therefore, there are not many design choices to make with respect to *how* +this change should be done rather than *if*. + +With respect to the precedence of `|`, we already cannot interpret `i @ p | q` +as `i @ (p | q)` because it is already legal to write `i @ p | j @ q`. +Therefore, if we say that `|` binds more tightly, +then `i @ p | j @ q` will associate as `i @ (p | j @ q)` which as a different +meaning than what we currently have, thus causing a breaking change. + +The only real choice that we do have to make is whether the new addition to the +pattern grammar should be `pat : .. | pat "|" pat ;` or if it instead should be +`pat : .. | "|"? pat "|" pat ;`. We have chosen the latter for 4 reasons: + +1. If we chose the former we can later change to the latter but not vice versa. + This is thus the conservative choice. + +2. There is precedent for such a decision due to [OCaml][ocaml]. + +3. The benefit to macros is dubious as they don't have to produce leading + alternations. + +4. Leading alternations inside patterns is considered poor style. + +However, there is one notable advantage to permitting leading `|` in nested +pattern: -TODO +1. Libraries or tools such as `syn` will have *slightly* easier time parsing + the grammar of Rust. # Prior art [prior-art]: #prior-art @@ -84,13 +559,49 @@ In [CSS4] (draft proposal), it is possible to write a selector `div > *:matches(ul, ol)` which is equivalent to `div > ul, div > ol`. The moral equivalent of this in Rust would be: `Div(Ul | Ol)`. -## Regex +## Regular expressions + +[regex]: https://en.wikipedia.org/wiki/Regular_expression + +Most [regular expression][regex] formalisms support at least the +following operations (where `a`, `b`, and `c` are arbitrary regexes): + ++ Concatenation: *"`a` followed by `b`"*. + Commonly written by just saying `ab`. + ++ Alternation: *"first match `a` or otherwise match `b`"* + Commonly written as `a | b`. + `|` binds more loosely than concatenation. + ++ Grouping: used to define the scope of what operators apply to. + Commonly written as `(a)`. + +Formally, the the minimal formalism we need is: + +```rust +pat : terminal | pat pat | pat "|" pat | "(" pat ")" ; +``` + +Given this formalism, it is then possible to encode a regex: + +```rust +a(b | c) +``` + +By the law of distributivity, we can rewrite this as: -TODO +```rust +ab | ac +``` ## OCaml +[ocaml]: #ocaml + +[ocaml_support]: https://caml.inria.fr/pub/docs/manual-ocaml/patterns.html#sec108 + +[This is supported][ocaml_support] in OCaml. +An example from "Real World OCaml" is: -[This is supported](https://caml.inria.fr/pub/docs/manual-ocaml/patterns.html#sec108) in OCaml. An example from Real World OCaml is: ```ocaml let is_ocaml_source s = match String.rsplit2 s ~on:'.' with @@ -98,12 +609,58 @@ let is_ocaml_source s = | _ -> false ``` +While OCaml will permit the following: + +```ocaml +let foo = + match Some(1) with + | Some(1 | 2) -> true + | _ -> false +``` + +the OCaml compiler will reject: + +```ocaml +let foo = + match Some(1) with + | Some(| 1 | 2) -> true (* Note in particular the leading | in Some(..). *) + | _ -> false +``` + +We have chosen to impose the same restriction as OCaml here with respect to +not allowing leading `|` in nested pattern alternations. + +## F# + +[fsharp_patterns]: https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching + +A language which is quite similar to OCaml is F#. +With respect to [pattern matching][fsharp_patterns], we may write: + +```fsharp +let detectZeroOR point = + match point with + | (0, 0) | (0, _) | (_, 0) -> printfn "Zero found." + | _ -> printfn "Both nonzero." +``` + +F# calls these "OR pattern"s and includes +`pattern1 | pattern2` in the pattern grammar. + ## Haskell -The equivalent proposal is currently being discussed for inclusion in Haskell. -See: for that process. +[ghc_proposal_43]: https://github.com/ghc-proposals/ghc-proposals/pull/43 + +The [equivalent proposal][ghc_proposal_43] is currently being discussed for +inclusion in Haskell. + +## Lisp + +[lisp_libs]: https://stackoverflow.com/a/3798659/1063961 + +There is support for or-patterns in [various lisp libraries][lisp_libs]. # Unresolved questions [unresolved]: #unresolved-questions -TODO \ No newline at end of file +There are none. From 99db31da2c7e9787dac6a2e115a5acc6d5a787f2 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 22:20:36 +0200 Subject: [PATCH 03/26] rfc, or-patterns: move file. --- text/{0000-nested-patterns.md => 0000-or-patterns.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename text/{0000-nested-patterns.md => 0000-or-patterns.md} (100%) diff --git a/text/0000-nested-patterns.md b/text/0000-or-patterns.md similarity index 100% rename from text/0000-nested-patterns.md rename to text/0000-or-patterns.md From 27c8467f17845d66293ba32b7999a94c6b178cca Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 23:05:43 +0200 Subject: [PATCH 04/26] rfc, or-patterns: fix typo. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index c011ef08ef8..2a2d7bf186c 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -208,7 +208,7 @@ would help, there are some where it would. Let's go through a few of them. r: Option) -> ParseResult> { match (y, q, r) { (y, None, None) => Ok(y), - (Some(y), q, r @ (Some(0...99) None)) => { ... }, + (Some(y), q, r @ (Some(0...99) | None)) => { ... }, ... } } From fb37a51e5bddfe1030cd682d0cdf513bcb7f2d1d Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 23:08:35 +0200 Subject: [PATCH 05/26] rfc, or-patterns: simplify to &(p | q | ..). --- text/0000-or-patterns.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 2a2d7bf186c..8c2a921248c 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -265,20 +265,20 @@ would help, there are some where it would. Let's go through a few of them. ```rust match (op, &mmp.clone()) { - (&Lt, &M(0) | &MM(0, 0) | &MMP(0, 0, 0)) => debcargo_bail!( + (&Lt, &(M(0) | MM(0, 0) | MMP(0, 0, 0))) => debcargo_bail!( "Unrepresentable dependency version predicate: {} {:?}", dep.name(), p ), - (&Tilde, &M(_) | &MM(_, _)) => { + (&Tilde, &(M(_) | MM(_, _))) => { vr.constrain_lt(mmp.inclast()); vr.constrain_ge(mmp); } - (&Compatible, &MMP(0, minor, _) | &MM(0, minor)) => { + (&Compatible, &(MMP(0, minor, _) | MM(0, minor))) => { vr.constrain_lt(MM(0, minor + 1)); vr.constrain_ge(mmp); } - (&Compatible, &MMP(major, _, _) | &MM(major, _) | &M(major)) => { + (&Compatible, &(MMP(major, _, _) | MM(major, _) | M(major))) => { vr.constrain_lt(M(major + 1)); vr.constrain_ge(mmp); } From 1d959b93cb153f966c4155b51837bdf941d0fae3 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 23:10:07 +0200 Subject: [PATCH 06/26] rfc, or-patterns: fix typo. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 8c2a921248c..326144f3b63 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -464,7 +464,7 @@ the compiler will now accept pattern alternations of form `p | q` where + the type inferred for `p` does not unify with the type inferred for `q`, or + the same set of bindings are not introduced in `p` and `q`, or - + the type two bindings with the same name in `p` and `q` do not unify. + + the type of any two bindings with the same name in `p` and `q` do not unify. 2. When type checking an expression `match e_s { a_1 => e_1, ... a_n => e_n }`, for each match arm `a_i` which contains a pattern of form `p_i | q_i`, From 6db6ac985816e96efe213f26e3a5b0381d1091d0 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Wed, 29 Aug 2018 23:14:34 +0200 Subject: [PATCH 07/26] rfc, or-patterns: expand initialism DNF. --- text/0000-or-patterns.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 326144f3b63..3c6551ae2c4 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -79,8 +79,8 @@ non-programmers alike and included speakers of: Thus, we conjecture that it's more common for humans to not distribute and to instead use something akin to *conjunctive normal form* ([CNF]) when communicating. A likely consequence of this is that a common way to understand snippet (1) -formulated in [DNF] is to first mentally reconstruct it into CNF and then -understand the implications of the pattern. +formulated in *disjunctive normal form* ([DNF]) is to first mentally reconstruct +it into CNF and then understand the implications of the pattern. By allowing users to encode their logic in the way they think instead of going through more indirect routes, we can improve the understandability of code. From 61d494822eef57031811bc1fc90bddee1e79e9c9 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 01:21:39 +0200 Subject: [PATCH 08/26] rfc, or-patterns: fix an oopsie. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 3c6551ae2c4..8451576e68f 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -530,7 +530,7 @@ meaning than what we currently have, thus causing a breaking change. The only real choice that we do have to make is whether the new addition to the pattern grammar should be `pat : .. | pat "|" pat ;` or if it instead should be -`pat : .. | "|"? pat "|" pat ;`. We have chosen the latter for 4 reasons: +`pat : .. | "|"? pat "|" pat ;`. We have chosen the former for 4 reasons: 1. If we chose the former we can later change to the latter but not vice versa. This is thus the conservative choice. From 2f75fbcb84e28dcc2ad8b14dd06cfac9f1f871b7 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 05:46:27 +0200 Subject: [PATCH 09/26] rfc, or-patterns: fix inconsistency. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 8451576e68f..86898fa7ea1 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -410,7 +410,7 @@ pat : pat '|' pat Here `|` has the lowest precedence. In particular, the operator `@` binds more tightly than `|` does. -Thus, `i @ p | q` associates as `i @ (p | q)` as opposed to `(i @ p) | q`. +Thus, `i @ p | q` associates as `(i @ p) | q` as opposed to `i @ (p | q)`. We then introduce a production: From a1a17bd96904007fe350fef595b245580ec3fa2c Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 05:47:53 +0200 Subject: [PATCH 10/26] rfc, or-patterns: flatten list of languages. --- text/0000-or-patterns.md | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 86898fa7ea1..c6bf9ed10b0 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -63,18 +63,9 @@ Or would you instead ask: When researching for this RFC; many people were asked and all of them preferred the first alternative. This user testing was done on both programmers and -non-programmers alike and included speakers of: -+ Indo-European: - + English - + German (2) - + Swedish (3) - + French (2) - + Portuguese (1) - + Spanish (2) - + Farsi (3) -+ Finnish (1) -+ Esperanto (1) -+ Japanese (1) +non-programmers alike and included speakers of: English, German (2) Swedish (3), +French (2), Portuguese (1), Spanish (2), Farsi (3), Finnish (1), Esperanto (1), +and Japanese (1). Thus, we conjecture that it's more common for humans to not distribute and to instead use something akin to *conjunctive normal form* ([CNF]) when communicating. From 0501bd4a15b7f70370fa0c372c26f345a774a9f5 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 05:49:59 +0200 Subject: [PATCH 11/26] rfc, or-patterns: be more to the point. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index c6bf9ed10b0..c25941fd336 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -96,7 +96,7 @@ in a negative increase in complexity. [RFC 2175]: https://github.com/rust-lang/rfcs/pull/2175 In concrete terms, where before we only allowed a pattern of the form -`'|'? pat | pat` at the top level of `match` and [similar constructs][RFC 2175], +`pat | pat` at the top level of `match` and [similar constructs][RFC 2175], which special cased the language, we now allow `pat | pat` anywhere a pattern may occur whereby we simplify the ruleset of the language. In fact, there are already users that try this expecting it to work but From e410faf9ed4466800f63d2e6934fe601924b8acd Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 05:53:18 +0200 Subject: [PATCH 12/26] rfc, or-patterns: clarify where i @ p | j @ q is allowed today. --- text/0000-or-patterns.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index c25941fd336..73afe40d8d0 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -513,9 +513,9 @@ operator syntax for alternations in nested patterns would be `|`. Therefore, there are not many design choices to make with respect to *how* this change should be done rather than *if*. -With respect to the precedence of `|`, we already cannot interpret `i @ p | q` -as `i @ (p | q)` because it is already legal to write `i @ p | j @ q`. -Therefore, if we say that `|` binds more tightly, +With respect to the precedence of `|`, we cannot interpret `i @ p | q` +as `i @ (p | q)` because it is already legal to write `i @ p | j @ q` +at the top level of a pattern. Therefore, if we say that `|` binds more tightly, then `i @ p | j @ q` will associate as `i @ (p | j @ q)` which as a different meaning than what we currently have, thus causing a breaking change. From 2c974ae56116d1f59815776b0c2865c6155bd490 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 05:57:46 +0200 Subject: [PATCH 13/26] rfc, or-patterns: code using git2-rs, not from it. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 73afe40d8d0..6c5697f21c0 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -242,7 +242,7 @@ would help, there are some where it would. Let's go through a few of them. 7. Some other use cases are: - From git2-rs: + In code using git2-rs: ```rust match obj.kind() { From d477b2cff6461d6c2c511ca381aeb792124b6530 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 06:32:52 +0200 Subject: [PATCH 14/26] rfc, or-patterns: permalinks. --- text/0000-or-patterns.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 6c5697f21c0..671e8c89a0a 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -128,7 +128,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` -[GHC proposal]: https://github.com/osa1/ghc-proposals/blob/or_patterns/proposals/0000-or-patterns.rst#real-world-examples +[GHC proposal]: https://github.com/osa1/ghc-proposals/blob/77ee8e615aa28fbf2d0ef2be876a852c4e63c53b/proposals/0000-or-patterns.rst#real-world-examples 2. Other examples are listed in the equivalent [GHC proposal]. @@ -166,7 +166,7 @@ would help, there are some where it would. Let's go through a few of them. 6. Some further examples found with sourcegraph include: - From [cc-rs](https://sourcegraph.com/github.com/alexcrichton/cc-rs/-/blob/src/lib.rs#L1307:18): + From [cc-rs](https://github.com/alexcrichton/cc-rs/blob/74ce606aa227a30a97d7c1990c1e8d322e01c6d8/src/lib.rs#L1307-L1319): ```rust match (self.cpp_set_stdlib.as_ref(), cmd.family) { @@ -180,7 +180,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` - From [capnproto](https://sourcegraph.com/github.com/capnproto/capnproto-rust/-/blob/capnp/src/private/layout.rs#L1979-2002): + From [capnproto](https://github.com/capnproto/capnproto-rust/blob/35027494bb6e741aa478597358bac8ac92108a30/capnp/src/private/layout.rs#L1979-L2002): ```rust // Check whether the size is compatible. @@ -192,7 +192,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` - From [chrono](https://sourcegraph.com/github.com/chronotope/chrono/-/blob/src/format/parsed.rs#L271-308): + From [chrono](https://github.com/chronotope/chrono/blob/94b43fa2e8bd43e7f42bb5b67afd1c3415b27683/src/format/parsed.rs#L271-L308): ```rust fn resolve_year(y: Option, q: Option, @@ -205,7 +205,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` - From maidsafe's [routing](https://sourcegraph.com/github.com/maidsafe/routing/-/blob/src/states/node.rs#L2138-2180): + From maidsafe's [routing](https://github.com/maidsafe/routing/blob/0081a48d59e4fe3fb86b20da1fceb8f757855112/src/states/node.rs#L2138-L2180): ```rust match self.peer_mgr.connection_info_received(...) { @@ -215,7 +215,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` - Also from [routing](https://sourcegraph.com/github.com/maidsafe/routing/-/blob/src/states/node.rs#L2215-2245): + Also from [routing](https://github.com/maidsafe/routing/blob/0081a48d59e4fe3fb86b20da1fceb8f757855112/src/states/node.rs#L2215-L2245): ```rust match self.peer_mgr.connection_info_received(...) { @@ -226,7 +226,7 @@ would help, there are some where it would. Let's go through a few of them. ``` - From [termion](https://sourcegraph.com/github.com/ticki/termion/-/blob/src/input.rs#L143-153): + From [termion](https://github.com/redox-os/termion/blob/d2945cd36c452824aeabd5d7c13980d9567eb8a2/src/input.rs#L143-L153): ```rust for c in self.bytes() { @@ -252,7 +252,7 @@ would help, there are some where it would. Let's go through a few of them. } ``` - From [debcargo](https://salsa.debian.org/rust-team/debcargo/blob/master/src/debian/dependency.rs#L234): + From [debcargo](https://salsa.debian.org/rust-team/debcargo/blob/4355097810264644cb08ddaa8f7464d5887275f1/src/debian/dependency.rs#L234-291): ```rust match (op, &mmp.clone()) { From 55754ae475f92676d2f08c941c154d0f42f7bb4e Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 06:41:03 +0200 Subject: [PATCH 15/26] rfc, or-patterns: give example blowup. --- text/0000-or-patterns.md | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 671e8c89a0a..fecb73e1cdd 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -490,7 +490,40 @@ it is always valid to first desugar a pattern `c(p | q)` in CNF to its equivalent form in DNF, i.e. `c(p) | c(q)`. However, implementing `c(p | q)` in terms of a pure desugaring to `c(p) | c(q)` -may not be optimal as the desugaring can result in exponential blow-up of patterns. +may not be optimal as the desugaring can result in multiplicative blow-up of patterns. +An example of such blow up can be seen with: + +```rust +match expr { + (0 | 1, 0 | 1, 0 | 1, 0 | 1) => { ... }, +} +``` + +If we expanded this naively to DNF we would get: + +```rust +match expr { + | (0, 0, 0, 0) + | (0, 0, 0, 1) + | (0, 0, 1, 0) + | (0, 0, 1, 1) + | (0, 1, 0, 0) + | (0, 1, 0, 1) + | (0, 1, 1, 0) + | (0, 1, 1, 1) + | (1, 0, 0, 0) + | (1, 0, 0, 1) + | (1, 0, 1, 0) + | (1, 0, 1, 1) + | (1, 1, 0, 0) + | (1, 1, 0, 1) + | (1, 1, 1, 0) + | (1, 1, 1, 1) + => { ... }, +} +``` + + Instead, it is more likely that a one-step case analysis will be more efficient. Which implementation technique to use is left open to each Rust compiler. From a00d80255a9146a52c6c6cdb17a8e48be6e2c683 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 06:52:58 +0200 Subject: [PATCH 16/26] rfc, or-patterns: more rationale for precedence. --- text/0000-or-patterns.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index fecb73e1cdd..f98f4d25232 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -8,7 +8,6 @@ Allow `|` to be arbitrarily nested within a pattern such that `Some(A(0) | B(1 | 2))` becomes a valid pattern. -The operator `|` has low precedence such that `i @ p | q` means `(i @ p) | q`. # Motivation [motivation]: #motivation @@ -383,8 +382,10 @@ fn main() { } ``` -Note that the operator `|` has a low precedence. This means that you -have to write `foo @ (1 | 2 | 3)` instead of writing `foo @ 1 | 2 | 3`. +Note that the operator `|` has a low precedence. This means that if you +want the same outcome as `foo @ 1 | foo @ 2 | foo @ 3`, you have to write +`foo @ (1 | 2 | 3)` instead of writing `foo @ 1 | 2 | 3`. +This is discussed in the [rationale][alternatives]. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -552,6 +553,15 @@ at the top level of a pattern. Therefore, if we say that `|` binds more tightly, then `i @ p | j @ q` will associate as `i @ (p | j @ q)` which as a different meaning than what we currently have, thus causing a breaking change. +And even if we could associate `i @ p | q` as `i @ (p | q)` there is a good +reason why we should not. Simply put, we should understand `@` as a +pattern / set intersection operator and the operator `|` as the union operator. +This is analogous to multiplication and addition as well as conjunction and +disjunction in logic. In these fields, it is customary for multiplication and +conjunction to bind more tightly. That is, we interpret `a * b + c` as +`(a * b) + c` and not `a * (b + c)`. Similarly, we interpret `p ∧ q ∨ r` +as `(p ∧ q) ∨ r` and not `p ∧ (q ∨ r)`. + The only real choice that we do have to make is whether the new addition to the pattern grammar should be `pat : .. | pat "|" pat ;` or if it instead should be `pat : .. | "|"? pat "|" pat ;`. We have chosen the former for 4 reasons: From bf7a7bd0469108402c324826eee9a3b7ca058563 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 06:55:13 +0200 Subject: [PATCH 17/26] rfc, or-patterns: binding mode is important. --- text/0000-or-patterns.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index f98f4d25232..a0d20457b18 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -456,7 +456,8 @@ the compiler will now accept pattern alternations of form `p | q` where + the type inferred for `p` does not unify with the type inferred for `q`, or + the same set of bindings are not introduced in `p` and `q`, or - + the type of any two bindings with the same name in `p` and `q` do not unify. + + the type of any two bindings with the same name in `p` and `q` do not unify + with respect to types or binding modes. 2. When type checking an expression `match e_s { a_1 => e_1, ... a_n => e_n }`, for each match arm `a_i` which contains a pattern of form `p_i | q_i`, From a20658609e35d31158f9677641d0adb2e4369443 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 09:33:54 +0200 Subject: [PATCH 18/26] rfc, or-patterns: discuss special casing E0408. --- text/0000-or-patterns.md | 50 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 49 insertions(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index a0d20457b18..76ad8cb35b4 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -449,6 +449,55 @@ In other words, in all of the contexts where a pattern is currently accepted, the compiler will now accept pattern alternations of form `p | q` where `p` and `q` are arbitrary patterns. +### Error messages + +As previously noted, the precedence of the operator `|` is lower than that of +the operator `@`. This results in `i @ p | q` being interpreted as `(i @ p) | q`. +In turn, this would result in an error because `i` is not defined in all +alternations. An example: + +```rust +fn main() { + match 1 { + i @ 0 | 1 => {}, + } +} +``` + +This would result in: + +```rust +error[E0408]: variable `i` is not bound in all patterns + --> src/main.rs:3:17 + | +3 | i @ 0 | 1 => {}, + | - ^ pattern doesn't bind `i` + | | + | variable not in all patterns +``` + +However, it is quite likely that a user who wrote `i @ p | q` wanted the +semantics of `i @ (p | q)` because it would be the only thing that would +be a well formed pattern. To guide the user on the way, we recommend special +casing the error message for such circumstances with for example: + +```rust +error[E0408]: variable `i` is not bound in all patterns + --> src/main.rs:3:17 + | +3 | i @ 0 | 1 => {}, + | - ^ pattern doesn't bind `i` + | | + | variable not in all patterns + | + | hint: if you wanted `i` to cover both cases, try adding parenthesis around: + | + | i @ 0 | 1 + | ^^^^^ +``` + +The particular design of such an error message is left open to implementations. + ## Static semantics 1. Given a pattern `p | q` at some depth for some arbitrary patterns `p` and `q`, @@ -525,7 +574,6 @@ match expr { } ``` - Instead, it is more likely that a one-step case analysis will be more efficient. Which implementation technique to use is left open to each Rust compiler. From 839008b4b15accd7ca61fdf3d1dd3ace7461a0ef Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 30 Aug 2018 09:37:50 +0200 Subject: [PATCH 19/26] rfc, or-patterns: fix typo. --- text/0000-or-patterns.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 76ad8cb35b4..c3d96ffe755 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -490,7 +490,7 @@ error[E0408]: variable `i` is not bound in all patterns | | | variable not in all patterns | - | hint: if you wanted `i` to cover both cases, try adding parenthesis around: + | hint: if you wanted `i` to cover both cases, try adding parentheses around: | | i @ 0 | 1 | ^^^^^ From 5a820ccda210602ceab9b4647418c20882b632f2 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 20 Sep 2018 19:49:51 +0200 Subject: [PATCH 20/26] rfc, or-patterns: clarify that top_pat is permitted in fn arguments. --- text/0000-or-patterns.md | 54 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index c3d96ffe755..fcd1846ae46 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -387,6 +387,42 @@ want the same outcome as `foo @ 1 | foo @ 2 | foo @ 3`, you have to write `foo @ (1 | 2 | 3)` instead of writing `foo @ 1 | 2 | 3`. This is discussed in the [rationale][alternatives]. +You can also use `p | q` in: + +1. `if let` expressions: + + ```rust + if let Foo::Bar | Foo::Quux(1 | 2) = some_computation() { + ... + } + ``` + +1. `while let` expressions: + + ```rust + while let Ok(1 | 2) | Err(3) = different_computation() { + ... + } + ``` + +3. `let` statements: + + ```rust + let Ok(x) | Err(x) = another_computation(); + ``` + + In this case, the pattern must be irrefutable as `Ok(x) | Err(x)` is. + +4. `fn` arguments: + + ```rust + fn foo(Ok(x) | Err(x): Result) { + ... + } + ``` + + Here too, the pattern must be irrefutable. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -445,6 +481,12 @@ block_match_clause : match_arm (block | block_expr) ; match_arm : maybe_outer_attrs top_pat (IF expr_nostruct)? FAT_ARROW ; ``` +For the patterns of `fn` arguments we now have: + +```rust +param : top_pat ':' ty_sum ; +``` + In other words, in all of the contexts where a pattern is currently accepted, the compiler will now accept pattern alternations of form `p | q` where `p` and `q` are arbitrary patterns. @@ -591,11 +633,15 @@ Which implementation technique to use is left open to each Rust compiler. As for why the change as proposed in this RFC should be done, it is discussed in the [motivation]. +## Syntax + Since we already use `|` for alternation at the top level, the only consistent operator syntax for alternations in nested patterns would be `|`. Therefore, there are not many design choices to make with respect to *how* this change should be done rather than *if*. +## Precedence + With respect to the precedence of `|`, we cannot interpret `i @ p | q` as `i @ (p | q)` because it is already legal to write `i @ p | j @ q` at the top level of a pattern. Therefore, if we say that `|` binds more tightly, @@ -611,6 +657,8 @@ conjunction to bind more tightly. That is, we interpret `a * b + c` as `(a * b) + c` and not `a * (b + c)`. Similarly, we interpret `p ∧ q ∨ r` as `(p ∧ q) ∨ r` and not `p ∧ (q ∨ r)`. +## Leading `|` + The only real choice that we do have to make is whether the new addition to the pattern grammar should be `pat : .. | pat "|" pat ;` or if it instead should be `pat : .. | "|"? pat "|" pat ;`. We have chosen the former for 4 reasons: @@ -631,6 +679,12 @@ pattern: 1. Libraries or tools such as `syn` will have *slightly* easier time parsing the grammar of Rust. +## `fn` arguments + +In this RFC, we allow `p | q` inside patterns of `fn` arguments. +The rationale for this is simply consistency with `let` which also permit +these and did so before this RFC at the top level with [RFC 2175]. + # Prior art [prior-art]: #prior-art From 3b016a3e6d5df58a50e9eb4381d9ef5d19c19fab Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 20 Sep 2018 20:32:48 +0200 Subject: [PATCH 21/26] rfc, or-patterns: discuss macros and closures. --- text/0000-or-patterns.md | 57 +++++++++++++++++++++++++++++++++++----- 1 file changed, 51 insertions(+), 6 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index fcd1846ae46..e90bdb2a674 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -423,27 +423,47 @@ You can also use `p | q` in: Here too, the pattern must be irrefutable. +5. closure arguments: + + ```rust + let closure = |(Ok(x) | Err(x))| x + 1; + ``` + + Notice that in this case, we have to wrap the pattern in parenthesis. + This restriction is currently enforced to avoid backtracking but may possibly + be lifted in the future based on other developments in the grammar. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation ## Grammar -To the pattern grammar we introduce the following alternative: +We parameterize the `pat` grammar by the choice whether to allow top level +`pat | pat`. We then change the pattern grammar to: ```rust -pat : pat '|' pat - | ... - ; +pat +: pat '|' pat +| ... +; + +pat +: "(" pat ")" +| ... +; ``` Here `|` has the lowest precedence. In particular, the operator `@` binds more tightly than `|` does. Thus, `i @ p | q` associates as `(i @ p) | q` as opposed to `i @ (p | q)`. +Note: `pat` does not entail that the grammar of Rust is context sensitive +because we "monomorphize" the parameterization below. + We then introduce a production: ```rust -top_pat : '|'? pat ; +top_pat : '|'? pat ; ``` We then change the grammar of `let` statements to (as compared to [RFC 2175]): @@ -491,6 +511,15 @@ In other words, in all of the contexts where a pattern is currently accepted, the compiler will now accept pattern alternations of form `p | q` where `p` and `q` are arbitrary patterns. +For closures we now have: + +```rust +inferrable_param : pat maybe_ty_ascription ; +``` + +Finally, `pat` macro fragment specifiers will also match the `pat` +production as opposed to `pat`. + ### Error messages As previously noted, the precedence of the operator `|` is lower than that of @@ -685,6 +714,10 @@ In this RFC, we allow `p | q` inside patterns of `fn` arguments. The rationale for this is simply consistency with `let` which also permit these and did so before this RFC at the top level with [RFC 2175]. +## Macros and closures + +See the section on [unresolved] questions for a brief discussion. + # Prior art [prior-art]: #prior-art @@ -800,4 +833,16 @@ There is support for or-patterns in [various lisp libraries][lisp_libs]. # Unresolved questions [unresolved]: #unresolved-questions -There are none. +1. Should we allow `top_pat` or `pat` in `inferrable_param` such + that closures permit `|Ok(x) | Err(x)|` without first wrapping in parenthesis? + We defer this decision to stabilization as it may depend on experimentation. + +2. Should the `pat` macro fragment specifier match `top_pat` in different + Rust editions or should it match `pat` as currently specified? + We defer such decisions to stabilization because it depends on the outcome + of crater runs to see what the extent of the breakage would be. + +The benefit of avoiding `pat` in as many places as possible would +both be grammatical consistency and fewer surprises for uses. +The drawbacks would be possible ambiguity or backtracking for closures and +breakage for macros. From 879fd99cac0594c249e0abcc49474adad6b26251 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 20 Sep 2018 22:28:20 +0200 Subject: [PATCH 22/26] rfc, or-patterns: discuss closures more + func args => pat. --- text/0000-or-patterns.md | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index e90bdb2a674..3eeba5be859 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -416,7 +416,7 @@ You can also use `p | q` in: 4. `fn` arguments: ```rust - fn foo(Ok(x) | Err(x): Result) { + fn foo((Ok(x) | Err(x)): Result) { ... } ``` @@ -501,16 +501,16 @@ block_match_clause : match_arm (block | block_expr) ; match_arm : maybe_outer_attrs top_pat (IF expr_nostruct)? FAT_ARROW ; ``` +In other words, in all of the contexts where a pattern is currently accepted, +the compiler will now accept pattern alternations of form `p | q` where +`p` and `q` are arbitrary patterns. + For the patterns of `fn` arguments we now have: ```rust -param : top_pat ':' ty_sum ; +param : pat ':' ty_sum ; ``` -In other words, in all of the contexts where a pattern is currently accepted, -the compiler will now accept pattern alternations of form `p | q` where -`p` and `q` are arbitrary patterns. - For closures we now have: ```rust @@ -835,7 +835,17 @@ There is support for or-patterns in [various lisp libraries][lisp_libs]. 1. Should we allow `top_pat` or `pat` in `inferrable_param` such that closures permit `|Ok(x) | Err(x)|` without first wrapping in parenthesis? + We defer this decision to stabilization as it may depend on experimentation. + Or current inclination is to keep the RFC as-is because the ambiguity is not + just for the compiler; for humans, it is likely also ambiguous and thus + harder to read. + + This also applies to functions, which although do not look as ambiguous + benefit from better consistency with closures. With respect to function + arguments there's also the issue that not disambiguating with parenthesis + makes it less clear whether the type ascription applies to the or-pattern + as a whole or just the last alternative. 2. Should the `pat` macro fragment specifier match `top_pat` in different Rust editions or should it match `pat` as currently specified? From 151c8a6014d6f39257ac6d4fa433f13a39bc2c05 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Thu, 20 Sep 2018 22:54:12 +0200 Subject: [PATCH 23/26] rfc, or-patterns: typo & grammar nit?. --- text/0000-or-patterns.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 3eeba5be859..7eea43dd62f 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -837,11 +837,11 @@ There is support for or-patterns in [various lisp libraries][lisp_libs]. that closures permit `|Ok(x) | Err(x)|` without first wrapping in parenthesis? We defer this decision to stabilization as it may depend on experimentation. - Or current inclination is to keep the RFC as-is because the ambiguity is not + Our current inclination is to keep the RFC as-is because the ambiguity is not just for the compiler; for humans, it is likely also ambiguous and thus harder to read. - This also applies to functions, which although do not look as ambiguous + This also applies to functions which, although do not look as ambiguous, benefit from better consistency with closures. With respect to function arguments there's also the issue that not disambiguating with parenthesis makes it less clear whether the type ascription applies to the or-pattern From 1ad3e9de79e57ce0bf917f3ff64850e4ebba43b4 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Tue, 25 Sep 2018 04:50:14 +0200 Subject: [PATCH 24/26] rfc, or-patterns: clarify unification semantics for p | q. --- text/0000-or-patterns.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index 7eea43dd62f..fab8e86c84c 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -579,6 +579,11 @@ The particular design of such an error message is left open to implementations. + the type of any two bindings with the same name in `p` and `q` do not unify with respect to types or binding modes. + [type coercions]: https://doc.rust-lang.org/reference/type-coercions.html + + Unification of types is in all instances aforementioned exact and + implicit [type coercions] do not apply. + 2. When type checking an expression `match e_s { a_1 => e_1, ... a_n => e_n }`, for each match arm `a_i` which contains a pattern of form `p_i | q_i`, the pattern `p_i | q_i` is considered ill formed if, From 0032122829d127623e36a2a36a45300d8f8929c9 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Sun, 7 Oct 2018 05:06:16 +0200 Subject: [PATCH 25/26] rfc, or-patterns: note about macros in guide. --- text/0000-or-patterns.md | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/text/0000-or-patterns.md b/text/0000-or-patterns.md index fab8e86c84c..2df3ac76890 100644 --- a/text/0000-or-patterns.md +++ b/text/0000-or-patterns.md @@ -433,6 +433,19 @@ You can also use `p | q` in: This restriction is currently enforced to avoid backtracking but may possibly be lifted in the future based on other developments in the grammar. +6. macros by example: + + ```rust + macro_rules! foo { + ($p:pat) => { ... } + } + + foo!((Ok(x) | Err(x))); + ``` + + Here we must wrap the pattern in parenthesis since `$p:pat | $q:pat` is + already legal in patterns. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation From c73692fcdfbad3e1a968b35af556ad5541feb53d Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Sun, 7 Oct 2018 05:11:07 +0200 Subject: [PATCH 26/26] RFC 2535 --- text/{0000-or-patterns.md => 2535-or-patterns.md} | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) rename text/{0000-or-patterns.md => 2535-or-patterns.md} (99%) diff --git a/text/0000-or-patterns.md b/text/2535-or-patterns.md similarity index 99% rename from text/0000-or-patterns.md rename to text/2535-or-patterns.md index 2df3ac76890..1d7aeec183d 100644 --- a/text/0000-or-patterns.md +++ b/text/2535-or-patterns.md @@ -1,7 +1,7 @@ - Feature Name: `or_patterns` - Start Date: 2018-08-29 -- RFC PR: _ -- Rust Issue: _ +- RFC PR: [rust-lang/rfcs#2535](https://github.com/rust-lang/rfcs/pull/2535) +- Rust Issue: [rust-lang/rust#54883](https://github.com/rust-lang/rust/issues/54883) # Summary [summary]: #summary