Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add match/in statements #2144

Closed
wants to merge 5 commits into from
Closed

RFC: Add match/in statements #2144

wants to merge 5 commits into from

Conversation

Nokel81
Copy link
Contributor

@Nokel81 Nokel81 commented Sep 10, 2017

This is an RFC for adding overlapping match statements using the syntax of match/in.

Edit: rendered

@Ixrec
Copy link
Contributor

Ixrec commented Sep 10, 2017

It's not really clear to me from the RFC as written, so: Is the only benefit of this new syntax the ability to have multiple patterns per-arm using the pipe operator? If so, why does that require a new kind of match block?

#1882 essentially proposed adding the pipe operator to the existing match block. If I understand that comment thread correctly, it was closed not because of some fundamental problem with that idea, but because the RFC required a lot more work and wasn't a high priority.

Copy link
Contributor

@jonas-schievink jonas-schievink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't help but feel that this RFC is a bit poorly written (no offense). It confuses terms, uses inaccurate language and contains grammatical mistakes. This can be fixed, of course.

It also feels like it is missing a proper reason to complicate the language with non-obvious syntax like this, only containing a single example that works almost just as well today.

# Summary
[summary]: #summary

This feature is to facilitate the writing and using of `match` statements in a whole new way.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"a whole new way" makes it sound like an advertisement and doesn't convey any information the reader might be interested in. It's also grammatically questionable.


Benefits of this syntax:
1. No new keywords need to be used. This is good thing since it means for a relatively small
addition there would be no breaks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"there would be no breaks"? What's that supposed to mean?


Basic Syntax:
```rust
match val in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the only difference to the existing match is a small keyword in the first line, I feel like it's too hard to see what kind of match block you're in when editing something in an arm.

I also think in as a distinguishing keyword is far too meaningless for such a large change in semantics. It's not obvious at all what's happening.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose in because it was the best choice from all the keywords already in the language. Also, as I mentioned in the list of benefits for/in is used for doing something with every element in an iterator so match/in is used for doing something with every branch.

match val in {
pat | pat => expr,
pat => expr
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deviation from the _ => ... syntax used by "regular" match is a bit unfortunate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be a bit unfortunate but if you consider the meaning of _ it makes sense. The _ pattern means to match everything.


Meaning of parts:
1. The `else` is used in a similar sort of vein to that of the `_` pattern in normal matches and
could include several of the same warnings. The expression enclosed within this is only executed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"several of the same warnings" which warning are you referring to?


Edge cases:
1. If the `_` pattern in present in any of the contained matches and the `else` block is also
present then a `unreachable_code` lint on the code within the `else` block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another missing verb. Perhaps "is emitted"? Also it feels weird that it's not an unreachable pattern warning, but it's not a pattern, so it seems technically correct...


Implementation Assumptions:
1. Assuming that a `match` statement is currently implemented similar to a long chain of
`if/else if` statements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not, how would it be implemented in this way? if only works on booleans. match is built into the language and lowered into MIR later in the compiler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rewrite this. I chose to use the word if because it signified a check and a jump

`if/else if` statements.

Implementation:
1. This can be implemented as if it was a list of `if` statements.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but it could be implemented as a list of matches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that work? I do not see an implementation via matches

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Take a look at the macro I posted below

1. This can be implemented as if it was a list of `if` statements.
2. To cover the `else` case the location to jump to at the end after checking all the branches
can be stored, initially set to the start of the `else` block but if it enters any of the
branches then it is set to immediately after the `else` block.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just storing a bool is sufficient.

[how-we-teach-this]: #how-we-teach-this

This should be called `match/in` statements since that is the combination of keywords that are
used similar to `for/in` statements. This idea would be best presented as a continuation of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of these are actually statements, they're expressions.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 10, 2017

@Ixrec No I am not proposing just allowing multiple arms use the pipe operator since that is currently already allowed. This allows multiple arms to be executed.

@jonas-schievink
Copy link
Contributor

@Nokel81 That RFC proposed allowing | in all patterns. | is not currently part of pattern syntax, match just accepts a list of patterns separated by |. IIRC it didn't propose a "sequential match" like this, though.

@jonas-schievink
Copy link
Contributor

FWIW this is reasonably easy to implement with a macro (with many caveats and without lints, of course)

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 10, 2017

@jonas-schievink Yes I know that it can be implemented with a macro but as you said it doesn't have all the lints or exhaustiveness checking the main two benefits of adding it as a fully supported control flow structure

```
into
```rust
match cmp.compare(&array[left], &array[right]) in {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic equality of these two versions implies that match ... in, where a pattern is used in more than one match arm, has well-defined semantics for the order of match arms such that the statements of the first mention of pattern happens-before the statements of the second mention. This might be intuitive, or not at all - and such a divergence from the semantics of match should be syntactically clearer, in my opinion.

I feel this proposal is too close to the switch statements of C/C++/Java which, as far as I know, are confusing to newcomers. I think this is because intuitively, humans think of choice as exclusive - so "A or B" is parsed as "A xor B", which is why a lot of introductory books in discrete mathematics and logic has to clarify "or vs xor" (example: Logic in Computer Science, Huth & Ryan, p.4).

I think the price of repetition is worth the potential unreadability of the proposed match ... in construct.

addition there would be no code breaks of existing code with this change.
2. The `in` seems to imply that it sort of like an "iterator" of statements and will go through
each of them in turn.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Semantically, I consider this a big addition - which is why I'd like a more visible syntactic difference where this RFC to be accepted. Therefore, I consider the slight variation in syntax to be a downside. I expand on this further down.

Instead of in, I'd recommend using a terminal that more clearly indicates that this is top-down and more crucially that multiple branches may be taken. Perhaps the syntax might be: match inclusive <expr> { <patterns> }, or match topdown <expr> { <patterns> }.

  1. I don't agree that in implies this nearly strong enough for this to be instantly understandable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fair and I am pretty sure that adding an additional keyword here wouldn't break any code since if a variable was called what ever it was that would be able to be checked but that is definitely something to consider

Copy link
Contributor

@Centril Centril Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, adding the additional "keyword" should work w/o backwards incompatibility since there's no (iirc) parsing rule "<expr> <expr>".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a correct DFA adding that state would be rather complicated but I agree is do able

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... DFA for what? Lexing? The language is not regular...
Do note that parsing Rust is already at least context sensitive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not actually looked that deep into Rust I didn't consider it to be that high order, though tbh I don't know why

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust's lexical grammar is already context-sensitive due to r##"strings like this"##. The actual grammar after tokenization makes do with a fixed lookahead (I forgot how much exactly, something in the order of 5 tokens or so).

Not a single real programming language is regular (not counting assembly and machine code). Fun fact: Even the simple-looking Lua grammar requires infinite lookahead to parse and is thus harder to parse than Rust.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


This should not be done because it increases the size of language and might not be used by
everyone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see some discussion of the readability of inclusive top-down multiple-branch match constructs.

@Centril
Copy link
Contributor

Centril commented Sep 11, 2017

Sorry to invoke Wadler's Law on you, but I believe you have to really consider the readability of the syntax w.r.t. its semantics and possible confusion with the normal semantics of match. If you believe very strongly in this proposal, I'd consider a more obvious and clear syntax.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

@Centril Since I do believe strongly in this proposal I will consider a more obvious syntax. However, I also would not like to have to wait for an epoch for a new keyword so maybe having a keyword with @ prepended to it.

Would something like this be easier to differentiate:

@all_branches
match x {
    pat => expr
    ...
} else {
    expr
}

@Centril
Copy link
Contributor

Centril commented Sep 11, 2017

@Nokel81 I don't think you have to wait for an epoch for match <modifer> <expr> { ... } or match <expr> <modifer> { ... } and prepending @all_branches seems a bit un-rustic.

On with the 🚲 🏠 ... =)

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

I think that match topdown is the best solution and I will update the PR

@Techcable
Copy link

Techcable commented Sep 11, 2017

Wouldn't a 'fallthrough' statement work just as well and better indicate the desired behavior?
The pattern wouldn't need to be repeated and it's clear that control should fallthrough to the next case just like it would implicitly in C or Java. I believe Go has this feature and I've often wished I could explicitly 'opt in' to fallthrough.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

Do you mean match fallthrough?

@Techcable
Copy link

Sorry about the confusion, I was editing my post to elaborate as you were asking that question.
I mean that the user could explicitly request fallthrough behavior only on a specific case with a fallthrough statement like go has.
Although I don't think a simple fallthrough statement wouldn't work for your specific example, I think a 'labeled' fallthrough statement might work, though you'd have to play around with go to see how it works.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

Thank you for the clarification. However I don't think that is an alternative to this proposal because fallthrough seems to indicate not checking (sort of like how C switch statements fall through automatically) and that is not what I want, the pattern matching is a key part of the idea

@Centril
Copy link
Contributor

Centril commented Sep 11, 2017

@Techcable That might imply that it is like C/C++/Javas fall-through which is not exactly what this RFC proposes. The only commonality is that this RFC and C style switch fall-through can execute the statements of multiple match arms - but in completely different ways. This is not "go to the next case even if the pattern did not match" but rather "execute all matching paths starting with the first one mentioned lexically". This RFC proposes something much less worse than C style fall-throughs.

@Nokel81 Another issue you have to clarify in this RFC is how to deal with the scenario when you have multiple patterns being valid but the type of their last expression is other than ().

How do you deal with the following?

let x = match topdown expr {
    A | B => 1,
    C | B => 2,
};

What is the value of x? I see 2 options: 1) Do not allow this, i.e: the type must be (), 2) The value is the last (lexically) expression.

Also - I think I'm slightly more in favour of match inclusive because it indicates that all branches matching will be taken - however, it does not indicate the top-down nature. However, match <expr> in all patterns matching and evaluate top down is a bit of a mouthful.

Btw... what is the most important part of this RFC...? The inclusive part, or the top-down part? You could have inclusive behavior but let the execution order be unspecified and depending on it be undefined behaviour.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

@Centril I need to add that to the file. The inclusive part is much more important and I think that the return type must be () because any other ordering of what the final value is would be completely arbitrary.

However, having the last branch's value being chosen also doesn't seem that strange since Rust code is sequential in nature to begin with and some people might want a return value with some side effect.

@burdges
Copy link

burdges commented Sep 11, 2017

There was previously discussion on this in #2003 where folks produced some macro to do roughly this. Also, the one example provided by the RFC can be handled using the ordering impls for Ordering :

let x = cmp.compare(&array[left], &array[right]);
if x <= Equal { .. }
if x >= Equal { .. }

I think match statements should stick with providing both exhaustiveness and disjointness checks because together they provide a more natural code invariant, i.e. exhaustiveness does not mean the same thing without disjointness.

If you want to avoid the disjointness checks of match, like the RFC does, then you should be using if let instead. We should focus on supporting | for if let and while let instead of the syntax proposed here. It'll be easier to audit exhaustiveness manually in the rare cases where you want this than to confuse readers about the presence or absence of disjointness checks everywhere.

Also, if you do want exhaustiveness but not disjointness checks, like this RFC proposes, then you might want unusual exhaustiveness checks that this RFC cannot provide, like asking that two arms get entered or something more subtle. In such cases, you could provide the desired assurance with assertions and tests, but one could imagine more flexible tools that prove specific code invariants hold at compile time.

tl;dr We have a mechanism to drop exhaustiveness and disjointness checks via if let that should be improved upon instead of creating niche syntaxes that harm readability of match.

@burdges
Copy link

burdges commented Sep 11, 2017

Also, if you have complex flow control, then frequently nested functions or closures produce more readable code, partially because they preserve disjointness, and they provide far more flexibility regardless.

fn foo(...) {
    ...
    fn doAnB() { .. }
    fn doBnC() { .. }
    fn doAnC() { .. }
    match x {
        A => { doAnB(); doAnC(); }
        B => { doAnB(); doBnC(); }
        C => { doAnC(); doBnC(); } 
    }
    ...
}

In particular, we should observe that this approach supports order of execution impossible with the proposed RFC, meaning the RFC is truly quite a niche construct.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

@burdges Yes I know of the rough macro but even with a macro exhaustiveness checks are not possible.

Currently there is no way to have something more subtle and I would disagree that extending if let is the better option because once you want exhaustiveness you are asking the compiler to do what is basically done in a match.

Also something as subtle as only entering two branches I would consider part of what you call "complex flow control" and that would able to be done with this RFC and a label for breaking.

sensitive language. And adding such a keyword increases the perceptual area of the new syntax
so as to make it clear which type of match is being used.
2. The word `fallthrough` is used because it implies that after a branch is finished then the
control falls through to the check of the next branch.

Copy link
Contributor

@Centril Centril Sep 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fallthrough implies that it will execute the statements / yield the expressions of the next match arm irrespective of whether the it's pattern holds or not, i.e: readers will expect C++ style switch behavior.

Instead I'd like to add match many to the 🚲 🏠

`else` would be marked as unreadable.

Edge cases:
1. If the `_` pattern in present in any of the contained matches and the `else` block is also
present then a `unreachable_code` lint is emitted on the code within the `else` block
2. Since the main reason for using a `match` is the exhaustiveness checks as long as there isn't
an `else` block then the compiler will output an error for `non-exhaustive patterns` if not all
branches of the `match/in` are exhaustive.
branches of the `match/fallthrough` are exhaustive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility would be to use ! (never type) pattern instead of else to make the syntax more terse.
This would work because let value: T = panic!(); where T is any type is valid - i.e: bottom inhabits every type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So would it be the following (for clarification):

match many x {
    _ if x % 5 == 0 => expr,
    ! => expr
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely see the appeal of it, since as you say it is much more terse, but I think that it would be more confusing since it is in the area of pat/expr of the expression but does not follow the same "rules" as the other patterns since it will only be executed if no other patterns are executed sort of how the else block of an if statement is only executed if none of the other ifs or else ifs are executed

@Centril
Copy link
Contributor

Centril commented Sep 11, 2017

@burdges I agree completely on extending if let and while let with | - perhaps a new RFC? or this one? What does @Nokel81 think?

When it comes to exhaustiveness checking, couldn't you simply reuse the current checks of match that all patterns are used at least once - and if the set of patterns is then over-specified, you simply drop the unreachable_patterns warning?

As you argue, this RFC is probably too niche, but it might be worth it to tests its use in nightly and see if it becomes popular...? We could make this an experimental RFC perhaps? Technically implementing this RFC should not be too complicated. As long as it is clearly marked syntactically that the construct is different from an ordinary match, I think it might be fine.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Sep 11, 2017

@Centril I think that that would be a good idea but I think it should be a new RFC (I can write it up).

However with the exhaustiveness checking, yes it might be possible to reuse the current checks but then there has to be some sort of syntax for what if lets to check and I would argue that this is much more concise and understandable.

… like a C-style `switch` statement and `match all` impies that every branch is automatically matched
@withoutboats withoutboats added the T-lang Relevant to the language team, which will review and decide on the RFC. label Sep 14, 2017
@pthariensflame
Copy link
Contributor

What happens to data that gets moved out of a pattern more than once?

enum Foo {
    Bar(String),
    Baz(String),
    Quux(String),
}

let foo = Foo::Baz("example".to_string());
match foo in {
    Bar(s) | Baz(s) => …,
    Quux(s) | Baz(s) => …,
} // ???

@Centril
Copy link
Contributor

Centril commented Sep 14, 2017

@pthariensflame A workable semantic would be to simply throw a borrowck error on the second pattern indicating that a move occurred on the first pattern. This fits well with the principle of least surprise.

It would still be possible to use a by-ref pattern to avoid the move.

@hadronized
Copy link
Contributor

hadronized commented Oct 4, 2017

Btw, our current match blocks already support piping and moving if the types of the captures match, so what’s the whole point of that RFC in the end?

fn main() {
    let x: Result<&str, &str> = Ok("Hello, world!");
    
    match x {
      Ok(x) | Err(x) => println!("{}", x)
    }
}

EDIT: nevermind, I got it. And I don’t see the whole point of that. Use if let.

@Nokel81
Copy link
Contributor Author

Nokel81 commented Oct 4, 2017

Having every branch being checked and have the possibility to execute if matched even if a previous branch has matched

@nikomatsakis
Copy link
Contributor

@rfcbot fcp close

This RFC proposes a form of match in which more than one arm can execute (a kind of fall through). This seems like something that would potentially be useful from time to time, but not often enough to merit addition as a major feature, and match blocks can already be a source of stumbling blocks for learners. Moreover, it could readily be implemented as a procedural macro. Therefore, I move to close the RFC.

Thanks @Nokel81 for the suggestion, in any case!

@rfcbot
Copy link
Collaborator

rfcbot commented Jan 25, 2018

Team member @nikomatsakis has proposed to close this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added the proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. label Jan 25, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Jan 31, 2018

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels Jan 31, 2018
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 10, 2018

The final comment period is now complete.

1 similar comment
@rfcbot
Copy link
Collaborator

rfcbot commented Feb 10, 2018

The final comment period is now complete.

@aturon
Copy link
Member

aturon commented Feb 14, 2018

Closing as per @nikomatsakis's summary. Thanks again, @Nokel81!

@aturon aturon closed this Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.