Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse error recovery is obversable by macros in several cases #103534

Open
fmease opened this issue Oct 25, 2022 · 13 comments
Open

Parse error recovery is obversable by macros in several cases #103534

fmease opened this issue Oct 25, 2022 · 13 comments
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@fmease
Copy link
Member

fmease commented Oct 25, 2022

Introduction

In multiple cases, macro fragment specifiers like expr and stmt match more token streams than they should as a consequence of the parser trying to recover from obviously invalid Rust code to provide better immediate and subsequent error messages to the user.

Why Is This a Concern?

The user should be allowed to assume that a fragment specifier only matches valid Rust code, anything else would make the fragment specifier not live up to its name and as a result render it useless (to exaggerate).

One use of macros is the ability to define embedded / internal domain-specific languages (DSLs). Part of that is defining new syntax which might not necessarily be valid Rust syntax. Declarative macros allow users to create several macro rules / matchers enabling relatively fine-grained matching on tokens. Obviously, when writing those rules, macro authors need to know what a given fragment specifier accepts in order to confidently determine which specific rule applies for a given input. If the grammar used by a fragment specifier is actually larger than promised and basically unknown (implementation-defined to be precise), this becomes an impossible task.

Not only that. If we don't do anything, the grammars matched by fragment specifiers will keep changing over time as more and more recovery code gets added. This breaks Rust's backward compatibility guarantees! Macro calls that used to compile at some fixed point in time might potentially no longer compile in a future version of the compiler. In fact, backward compatibility has already been broken multiple times in the past without notice by (some) PRs introducing more error recovery.

Examples

There might be many more cases than listed below but it takes a lot of time experimenting and looking through the parser. I'll try to extend the list over time.

Expressions

macro_rules! check {
    ($e:expr) => {};

    // or `===`, `!==`, `<>`, `<=>`, `or` instead of `and`
    ($a:literal and $b:literal) => {};
    ($a:literal AND $b:literal) => {};

    (not $a:literal) => {};
    (NOT $a:literal) => {};
}

check! { 0 AND 1 } // passes (all good)
check! { 0 and 1 } // ⚠️ FAILS but should pass! "`and` is not a logical operator"

check! { NOT 1 } // passes (all good)
check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"
stderr
error: `and` is not a logical operator
  --> src/lib.rs:13:12
   |
13 | check! { 0 and 1 } // ⚠️ FAILS but should pass! "invalid comparison operator"
   |            ^^^ help: use `&&` to perform logical conjunction
   |
   = note: unlike in e.g., Python and PHP, `&&` and `||` are used for logical operators

error: unexpected `1` after identifier
  --> src/lib.rs:16:14
   |
16 | check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"
   |          ----^
   |          |
   |          help: use `!` to perform bitwise not

Statements

macro_rules! check {
    ($s:stmt) => {};

    // or `auto` instead of `var`
    (var $i:ident) => {};
    (VAR $i:ident) => {};
}

check! { VAR x } // passes (all good)
check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
stderr
error: invalid variable declaration
  --> src/lib.rs:10:10
   |
10 | check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ^^^
   |
help: write `let` instead of `var` to introduce a new variable
   |
10 | check! { let x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ~~~

Other Fragments (e.g. Items, Types)

[no known cases at the time of this writing (active search ongoing)]

Editorial Notes

editorial notes

I used to list some more cases above (which some of you might have noticed) but I've since removed them as they've turned out to be incorrect. Here they are:

macro_rules! check {
    ($e:expr) => {};

    ($a:literal++) => {};
    ($a:literal@@) => {};

    ($n:ident : $e:expr) => {};
}

check! { 1@@ } // passes (all good)
check! { 1++ } // FAILS but would fail ANYWAY! "Rust has no postfix increment operator"
// ^ without the recovery code, we would try to parse a binary expression (operator `+`) and fail at the 2nd `+`

check! { struct : loop {} } // passes (all good)
check! { main : loop {} } // FAILS but would fail ANYWAY! "malformed loop label"
// ^ without the recovery code, we would try to parse type ascription and fail at `loop {}` which isn't a type

Related issue: #90256 (concerning procedural macros).

@rustbot label A-macros A-diagnostics A-parser T-compiler

@fmease fmease added the C-bug Category: This is a bug. label Oct 25, 2022
@rustbot rustbot added A-diagnostics Area: Messages for errors, warnings, and lints A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST I-prioritize Issue: Indicates that prioritization has been requested for this issue. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 25, 2022
@fmease fmease changed the title Multiple cases where parse error recovery is obversable by macros Parse error recovery is obversable by macros in several cases Oct 25, 2022
@compiler-errors
Copy link
Member

We probably should be a lot more conservative with recoveries in macros. This reminds me of #103224.

@Noratrieb
Copy link
Member

I will put up a PR to add some more info the parser to make these checks easier in the future.

@compiler-errors
Copy link
Member

compiler-errors commented Oct 25, 2022

@Nilstrieb and I discussed this a bit.

Nils will add a flag to the parser state to identify when we want recovery to be enabled, and only do recovery when this is set to true. Then all parser recoveries that exist currently should eventually be gated behind this flag, so that errors are properly bubbled up within macros, so that we call fall ahead to subsequent matcher branches.

For example, given this:

macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}

fn main() {
    blah!(not 1);
}

We should not eagerly recover not as ! when parsing not 1, at least not until no other match arms correctly match the macro input.

In the case that no arms match successfully, we can retry all of the match arms with recovery enabled, which allows something like this:

macro_rules! blah {
    ($expr:expr) => {};
    (let) => {};
}

fn main() {
    blah!(not 1);
}

... to continue to give us a useful error message:

error: unexpected `1` after identifier
 --> <source>:7:15
  |
7 |     blah!(not 1);
  |           ----^
  |           |
  |           help: use `!` to perform bitwise not

error: aborting due to previous error

@Noratrieb
Copy link
Member

Noratrieb commented Oct 25, 2022

This brings up a fun question. Should this code compile?

macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

fn wh() {
    what! {
        match 1;
    }
}

My knowledge about macros (and the two sentences the reference says about this) indicates that this should work. The first arm fails, but the second matches. But this doesn't compile, as the parser .emit()s a diagnostic on this broken code (via some cool lookahead and recovery).

So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag. There are currently 165 references to .emit() in rustc_parse, so that's gonna be some work. But I believe that this is worth it, as the current behavior just makes no sense. Given enough awareness around compiler contributors, I think we can make sure that no new recovery is added without gating it behind the flag.

@compiler-errors
Copy link
Member

I believe it should be compile, and that my changes described above and discussed in your comment ("So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag.") would be sufficient.

Perhaps it's worthwhile to nominate this for lang team discussion, since that code you linked @Nilstrieb hasn't compiled since before 1.0 😅

@compiler-errors compiler-errors added the I-lang-nominated Nominated for discussion during a lang team meeting. label Oct 25, 2022
@compiler-errors
Copy link
Member

Question for the lang team:

  1. Should this code compile?
macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

fn wh() {
    what! {
        match 1;
    }
}
  1. Is it generally okay to "pare back" existing recovery behavior to make sure that something like this compiles?
macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}

fn main() {
    blah!(not 1);
}

--

This may even be more of a "implementation" question than a lang question, to perhaps this nomination is moot.

@scottmcm
Copy link
Member

I think the core problem is that macro_rules are insufficiently future-proofed, leading to other problems too, like #86730.

My ideal solution (if we can find a way to do it without breaking the world) would be to make the second arm in

macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}

be useless, as anything it could potentially match would go into the first arm instead, and then produce an error if it's not actually a valid expr under the relevant edition rules. (TBH I don't even know if $e:expr here uses the macro definition's edition or the caller's edition when parsing.) This would probably

Otherwise for anything we try to add to the expr grammar there's an example like this that's a breaking change.

After all, if people really want special handing for something, it would be better for them to put that first, like

macro_rules! check {
    (not $a:literal) => {};
    ($e:expr) => {};
}

so that if one day we did add not false as supported syntax, it would keep working.

(Maybe that implies that for each macro matcher we'd have some "start set", like can_begin_expr? And then it's only adding something new to the matcher's start set that would be a breaking change, for which we'd need an edition change or a new matcher.)

@Noratrieb
Copy link
Member

Noratrieb commented Oct 26, 2022

The matching logic uses the starting set already (presumably as a fast path)
EDIT: Not just as a fast path, it's important semantically

@fmease
Copy link
Member Author

fmease commented Oct 26, 2022

My ideal solution (if we can find a way to do it without breaking the world) would be to make the second arm in […] useless

We could definitely do that for decl macros 2.0 since they are unstable. For decl macros 1.2, we could start issuing future incompat warnings.

I wonder if that'd make them less powerful.

@Noratrieb
Copy link
Member

Noratrieb commented Oct 26, 2022

I did some more investigation on this. This thread contains quite some confusion about the actually implemented matching semantics (mostly because of myself) but I think I can now clean this up.

The currently implemented macro matching semantics

One arm is tried after another in declaration order. If matching or an arm fails with the Failure error, the next arm is tried. When no more arms are left, the macro matching failed and an error is reported. If an arm fails with the Error/ErrorReported error (the difference between the two is an implementation detail and does not matter here), the macro matching is considered failed, an error is reported and no further arms are tried.

When encountering a nonterminal specifier, a check is done whether the first token can start this nonterminal kind. If it cannot, the nonterminal is not entered (which can either lead to a Failure later as the matcher is out of matchers, or work just fine, for example with $( $_:literal)? struct matching struct). If the first token can start this nonterminal, the parser is invoked to parse the current tokens as that nonterminal.

The parser will then parse as many tokens as necessary into that nonterminal. Importantly, it may not consume all input if it deems the further grammar to be invalid. For example, NOT 1 is successfully parsed as the NOT identifier, not consuming the 1 yet.

If the parser fails to parse the nonterminal, a fatal error is emitted and ErrorReported is returned:

let nt = match parser.to_mut().parse_nonterminal(kind) {
Err(mut err) => {
err.span_label(
span,
format!(
"while parsing argument for this `{kind}` macro fragment"
),
)
.emit();

If, after all matchers have been exhausted, there are still tokens around in the input, the matching is considered a Failure. This also applies the other way around, if there are matchers remaining but all tokens consumed.

With these semantics, this code should indeed not compile, as the expression grammar requires a { after the scrutinee expression:

macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

what! {
    match 1;
}

On the other hand, this code works just fine:

macro_rules! check {
    ($e:expr) => {};
    (NOT $a:literal) => {};
}

check! { NOT 1 }

The expression parser is invoked with NOT 1, because an identifier can start an expression. It then parses the NOT identifier as an identifier. It looks ahead to see whether the expression continues, but a literal cannot follow after an expression, so it deems its parsing to be completed and returns a successful result, leaving 1 behind in the token stream. After the expression, the matchers are over, but there are still tokens remaining in the input stream. This is a Failure error condition, so it goes to the next arm, which then matches successfully.

Where this goes wrong

macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}

check! { not 1 }

This should be equivalent to NOT example. But this is where the parser recovery comes in. After successfully parsing a not identifier, the parser eagerly looks at the next token to see whether it can start an expression. If it can, report and error and recover into !expr. In the correct Rust grammar, this can never happen in valid code, so this is fully legal for it to do normally.

token::Ident(..) if this.is_mistaken_not_ident_negation() => {
make_it!(this, attrs, |this, _| this.recover_not_expr(lo))
}

But this breaks the semantics described above. During nonterminal parsing, the parser must not consume more tokens than necessary, as its early return while leaving tokens behind is an important part of matching nonterminals.

What now

Based on all of this, I have two conclusions. Firstly, I believe that the current semantics make sense and should continue working like this. I would still prefer this being discussed in the lang team anyways just to make sure.

Also, it is clear that doing eager recovery by consuming more tokens than necessary should not happen in the nonterminal parser, ever. So the effort started by #103544 should continue, as this is clearly a bug.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 27, 2022
…ler-errors

Add flag to forbid recovery in the parser

To start the effort of fixing rust-lang#103534, this adds a new flag to the parser, which forbids the parser from doing recovery, which it shouldn't do in macros.

This doesn't add any new checks for recoveries yet and is just here to bikeshed the names for the functions here before doing more.

r? `@compiler-errors`
@rustbot rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 28, 2022
@joshtriplett
Copy link
Member

We discussed this in today's @rust-lang/lang meeting.

We'd like both of these examples to work. Making the one with compile_error work seems like it'll take more work, and may need a more detailed proposal. Making the one with not work seems like a bugfix that lang doesn't need to review and approve.

@joshtriplett joshtriplett removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Nov 8, 2022
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 16, 2022
Only do parser recovery on retried macro matching

Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics.

Helps with rust-lang#103534
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Nov 16, 2022
Only do parser recovery on retried macro matching

Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics.

Helps with rust-lang#103534
@fmease
Copy link
Member Author

fmease commented Nov 19, 2022

I wonder if we should take a different approach for fixing this issue instead of using .may_recover(). Bear in mind that I am not 100% familiar with the parser & macro matching internals, so the following idea might not be workable.

What about changing PResult / adding a flag to the parser to keep track of whether a recovery occured. If it did then the token stream would not actually match a given macro matcher (hmm, but that would probably mean we need to backtrack somehow, unclear). This might introduce a performance penalty.

This way, we can still show a custom error message without making the syntactically malformed input well-formed. Right now for example, m! { fn<>() } (where m expects a ty) yields the error expected ( found < instead of function pointer types may not have generic parameters (#104223) as we use .may_recover() for this case.

Just throwing my thoughts out there.

@Noratrieb
Copy link
Member

In some cases this may work, but not always. Looking at the not 0 example, the parser needs to be edited. The parser looked ahead and eagerly recovers, emits an error and returns success. So even if the matcher then knew that recovery occurred (and the parser would somehow need to keep track of that, so we'd still need to annotate recovery) the session has the emitted error anyways.

But I agree that may_recover isn't great, but I haven't found anything better yet.

Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
Only do parser recovery on retried macro matching

Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics.

Helps with rust-lang#103534
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 13, 2023
…iser

Only suggest turbofish in patterns if we may recover

Fixes [after backport] rust-lang#115780.

CC rust-lang#103534.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-macros Area: All kinds of macros (custom derive, macro_rules!, proc macros, ..) A-parser Area: The parsing of Rust source code to an AST C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants