Parse error recovery is obversable by macros in several cases #103534

fmease · 2022-10-25T16:50:42Z

Introduction

In multiple cases, macro fragment specifiers like expr and stmt match more token streams than they should as a consequence of the parser trying to recover from obviously invalid Rust code to provide better immediate and subsequent error messages to the user.

Why Is This a Concern?

The user should be allowed to assume that a fragment specifier only matches valid Rust code, anything else would make the fragment specifier not live up to its name and as a result render it useless (to exaggerate).

One use of macros is the ability to define embedded / internal domain-specific languages (DSLs). Part of that is defining new syntax which might not necessarily be valid Rust syntax. Declarative macros allow users to create several macro rules / matchers enabling relatively fine-grained matching on tokens. Obviously, when writing those rules, macro authors need to know what a given fragment specifier accepts in order to confidently determine which specific rule applies for a given input. If the grammar used by a fragment specifier is actually larger than promised and basically unknown (implementation-defined to be precise), this becomes an impossible task.

Not only that. If we don't do anything, the grammars matched by fragment specifiers will keep changing over time as more and more recovery code gets added. This breaks Rust's backward compatibility guarantees! Macro calls that used to compile at some fixed point in time might potentially no longer compile in a future version of the compiler. In fact, backward compatibility has already been broken multiple times in the past without notice by (some) PRs introducing more error recovery.

Examples

There might be many more cases than listed below but it takes a lot of time experimenting and looking through the parser. I'll try to extend the list over time.

Expressions

macro_rules! check {
    ($e:expr) => {};

    // or `===`, `!==`, `<>`, `<=>`, `or` instead of `and`
    ($a:literal and $b:literal) => {};
    ($a:literal AND $b:literal) => {};

    (not $a:literal) => {};
    (NOT $a:literal) => {};
}

check! { 0 AND 1 } // passes (all good)
check! { 0 and 1 } // ⚠️ FAILS but should pass! "`and` is not a logical operator"

check! { NOT 1 } // passes (all good)
check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"

stderr

error: `and` is not a logical operator
  --> src/lib.rs:13:12
   |
13 | check! { 0 and 1 } // ⚠️ FAILS but should pass! "invalid comparison operator"
   |            ^^^ help: use `&&` to perform logical conjunction
   |
   = note: unlike in e.g., Python and PHP, `&&` and `||` are used for logical operators

error: unexpected `1` after identifier
  --> src/lib.rs:16:14
   |
16 | check! { not 1 } // ⚠️ FAILS but should pass! "unexpected `1` after identifier"
   |          ----^
   |          |
   |          help: use `!` to perform bitwise not

Statements

macro_rules! check {
    ($s:stmt) => {};

    // or `auto` instead of `var`
    (var $i:ident) => {};
    (VAR $i:ident) => {};
}

check! { VAR x } // passes (all good)
check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)

stderr

error: invalid variable declaration
  --> src/lib.rs:10:10
   |
10 | check! { var x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ^^^
   |
help: write `let` instead of `var` to introduce a new variable
   |
10 | check! { let x } // ⚠️ FAILS but should pass! "invalid variable declaration" (+ ICE, #103529)
   |          ~~~

Other Fragments (e.g. Items, Types)

[no known cases at the time of this writing (active search ongoing)]

Editorial Notes

editorial notes

I used to list some more cases above (which some of you might have noticed) but I've since removed them as they've turned out to be incorrect. Here they are:

macro_rules! check {
    ($e:expr) => {};

    ($a:literal++) => {};
    ($a:literal@@) => {};

    ($n:ident : $e:expr) => {};
}

check! { 1@@ } // passes (all good)
check! { 1++ } // FAILS but would fail ANYWAY! "Rust has no postfix increment operator"
// ^ without the recovery code, we would try to parse a binary expression (operator `+`) and fail at the 2nd `+`

check! { struct : loop {} } // passes (all good)
check! { main : loop {} } // FAILS but would fail ANYWAY! "malformed loop label"
// ^ without the recovery code, we would try to parse type ascription and fail at `loop {}` which isn't a type

Related issue: #90256 (concerning procedural macros).

@rustbot label A-macros A-diagnostics A-parser T-compiler

The text was updated successfully, but these errors were encountered:

compiler-errors · 2022-10-25T17:56:55Z

We probably should be a lot more conservative with recoveries in macros. This reminds me of #103224.

Noratrieb · 2022-10-25T18:35:54Z

I will put up a PR to add some more info the parser to make these checks easier in the future.

compiler-errors · 2022-10-25T19:17:21Z

@Nilstrieb and I discussed this a bit.

Nils will add a flag to the parser state to identify when we want recovery to be enabled, and only do recovery when this is set to true. Then all parser recoveries that exist currently should eventually be gated behind this flag, so that errors are properly bubbled up within macros, so that we call fall ahead to subsequent matcher branches.

For example, given this:

macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}

fn main() {
    blah!(not 1);
}

We should not eagerly recover not as ! when parsing not 1, at least not until no other match arms correctly match the macro input.

In the case that no arms match successfully, we can retry all of the match arms with recovery enabled, which allows something like this:

macro_rules! blah {
    ($expr:expr) => {};
    (let) => {};
}

fn main() {
    blah!(not 1);
}

... to continue to give us a useful error message:

error: unexpected `1` after identifier
 --> <source>:7:15
  |
7 |     blah!(not 1);
  |           ----^
  |           |
  |           help: use `!` to perform bitwise not

error: aborting due to previous error

Noratrieb · 2022-10-25T19:21:15Z

This brings up a fun question. Should this code compile?

macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

fn wh() {
    what! {
        match 1;
    }
}

My knowledge about macros (and the two sentences the reference says about this) indicates that this should work. The first arm fails, but the second matches. But this doesn't compile, as the parser .emit()s a diagnostic on this broken code (via some cool lookahead and recovery).

So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag. There are currently 165 references to .emit() in rustc_parse, so that's gonna be some work. But I believe that this is worth it, as the current behavior just makes no sense. Given enough awareness around compiler contributors, I think we can make sure that no new recovery is added without gating it behind the flag.

compiler-errors · 2022-10-25T19:25:29Z

I believe it should be compile, and that my changes described above and discussed in your comment ("So we not only have to gate all recovery directly, but also all emissions of diagnostics behind the flag.") would be sufficient.

Perhaps it's worthwhile to nominate this for lang team discussion, since that code you linked @Nilstrieb hasn't compiled since before 1.0 😅

compiler-errors · 2022-10-25T19:27:18Z

Question for the lang team:

Should this code compile?

macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

fn wh() {
    what! {
        match 1;
    }
}

Is it generally okay to "pare back" existing recovery behavior to make sure that something like this compiles?

macro_rules! blah {
    ($expr:expr) => {};
    (not $expr:expr) => {};
}

fn main() {
    blah!(not 1);
}

--

This may even be more of a "implementation" question than a lang question, to perhaps this nomination is moot.

scottmcm · 2022-10-26T00:43:04Z

I think the core problem is that macro_rules are insufficiently future-proofed, leading to other problems too, like #86730.

My ideal solution (if we can find a way to do it without breaking the world) would be to make the second arm in

macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}

be useless, as anything it could potentially match would go into the first arm instead, and then produce an error if it's not actually a valid expr under the relevant edition rules. (TBH I don't even know if $e:expr here uses the macro definition's edition or the caller's edition when parsing.) This would probably

Otherwise for anything we try to add to the expr grammar there's an example like this that's a breaking change.

After all, if people really want special handing for something, it would be better for them to put that first, like

macro_rules! check {
    (not $a:literal) => {};
    ($e:expr) => {};
}

so that if one day we did add not false as supported syntax, it would keep working.

(Maybe that implies that for each macro matcher we'd have some "start set", like can_begin_expr? And then it's only adding something new to the matcher's start set that would be a breaking change, for which we'd need an edition change or a new matcher.)

Noratrieb · 2022-10-26T04:25:29Z

The matching logic uses the starting set already (presumably as a fast path)
EDIT: Not just as a fast path, it's important semantically

fmease · 2022-10-26T14:29:24Z

My ideal solution (if we can find a way to do it without breaking the world) would be to make the second arm in […] useless

We could definitely do that for decl macros 2.0 since they are unstable. For decl macros 1.2, we could start issuing future incompat warnings.

I wonder if that'd make them less powerful.

Noratrieb · 2022-10-26T18:41:44Z

I did some more investigation on this. This thread contains quite some confusion about the actually implemented matching semantics (mostly because of myself) but I think I can now clean this up.

The currently implemented macro matching semantics

One arm is tried after another in declaration order. If matching or an arm fails with the Failure error, the next arm is tried. When no more arms are left, the macro matching failed and an error is reported. If an arm fails with the Error/ErrorReported error (the difference between the two is an implementation detail and does not matter here), the macro matching is considered failed, an error is reported and no further arms are tried.

When encountering a nonterminal specifier, a check is done whether the first token can start this nonterminal kind. If it cannot, the nonterminal is not entered (which can either lead to a Failure later as the matcher is out of matchers, or work just fine, for example with $( $_:literal)? struct matching struct). If the first token can start this nonterminal, the parser is invoked to parse the current tokens as that nonterminal.

The parser will then parse as many tokens as necessary into that nonterminal. Importantly, it may not consume all input if it deems the further grammar to be invalid. For example, NOT 1 is successfully parsed as the NOT identifier, not consuming the 1 yet.

If the parser fails to parse the nonterminal, a fatal error is emitted and ErrorReported is returned:

rust/compiler/rustc_expand/src/mbe/macro_parser.rs

Lines 613 to 621 in 0a6b941

    
           let nt = match parser.to_mut().parse_nonterminal(kind) { 
        
               Err(mut err) => { 
        
                   err.span_label( 
        
                       span, 
        
                       format!( 
        
                           "while parsing argument for this `{kind}` macro fragment" 
        
                       ), 
        
                   ) 
        
                   .emit();

If, after all matchers have been exhausted, there are still tokens around in the input, the matching is considered a Failure. This also applies the other way around, if there are matchers remaining but all tokens consumed.

With these semantics, this code should indeed not compile, as the expression grammar requires a { after the scrutinee expression:

macro_rules! what {
    ($e:expr) => { compile_error!("no") };
    ($($tt:tt)*) => {};
}

what! {
    match 1;
}

On the other hand, this code works just fine:

macro_rules! check {
    ($e:expr) => {};
    (NOT $a:literal) => {};
}

check! { NOT 1 }

The expression parser is invoked with NOT 1, because an identifier can start an expression. It then parses the NOT identifier as an identifier. It looks ahead to see whether the expression continues, but a literal cannot follow after an expression, so it deems its parsing to be completed and returns a successful result, leaving 1 behind in the token stream. After the expression, the matchers are over, but there are still tokens remaining in the input stream. This is a Failure error condition, so it goes to the next arm, which then matches successfully.

Where this goes wrong

macro_rules! check {
    ($e:expr) => {};
    (not $a:literal) => {};
}

check! { not 1 }

This should be equivalent to NOT example. But this is where the parser recovery comes in. After successfully parsing a not identifier, the parser eagerly looks at the next token to see whether it can start an expression. If it can, report and error and recover into !expr. In the correct Rust grammar, this can never happen in valid code, so this is fully legal for it to do normally.

rust/compiler/rustc_parse/src/parser/expr.rs

Lines 618 to 620 in 0a6b941

    
           token::Ident(..) if this.is_mistaken_not_ident_negation() => { 
        
               make_it!(this, attrs, |this, _| this.recover_not_expr(lo)) 
        
           }

But this breaks the semantics described above. During nonterminal parsing, the parser must not consume more tokens than necessary, as its early return while leaving tokens behind is an important part of matching nonterminals.

What now

Based on all of this, I have two conclusions. Firstly, I believe that the current semantics make sense and should continue working like this. I would still prefer this being discussed in the lang team anyways just to make sure.

Also, it is clear that doing eager recovery by consuming more tokens than necessary should not happen in the nonterminal parser, ever. So the effort started by #103544 should continue, as this is clearly a bug.

…ler-errors Add flag to forbid recovery in the parser To start the effort of fixing rust-lang#103534, this adds a new flag to the parser, which forbids the parser from doing recovery, which it shouldn't do in macros. This doesn't add any new checks for recoveries yet and is just here to bikeshed the names for the functions here before doing more. r? `@compiler-errors`

joshtriplett · 2022-11-01T18:24:23Z

We discussed this in today's @rust-lang/lang meeting.

We'd like both of these examples to work. Making the one with compile_error work seems like it'll take more work, and may need a more detailed proposal. Making the one with not work seems like a bugfix that lang doesn't need to review and approve.

Only do parser recovery on retried macro matching Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics. Helps with rust-lang#103534

fmease · 2022-11-19T23:17:48Z

I wonder if we should take a different approach for fixing this issue instead of using .may_recover(). Bear in mind that I am not 100% familiar with the parser & macro matching internals, so the following idea might not be workable.

What about changing PResult / adding a flag to the parser to keep track of whether a recovery occured. If it did then the token stream would not actually match a given macro matcher (hmm, but that would probably mean we need to backtrack somehow, unclear). This might introduce a performance penalty.

This way, we can still show a custom error message without making the syntactically malformed input well-formed. Right now for example, m! { fn<>() } (where m expects a ty) yields the error expected ( found < instead of function pointer types may not have generic parameters (#104223) as we use .may_recover() for this case.

Just throwing my thoughts out there.

Noratrieb · 2022-11-20T08:22:12Z

In some cases this may work, but not always. Looking at the not 0 example, the parser needs to be edited. The parser looked ahead and eagerly recovers, emits an error and returns success. So even if the matcher then knew that recovery occurred (and the parser would somehow need to keep track of that, so we'd still need to annotate recovery) the session has the emitted error anyways.

But I agree that may_recover isn't great, but I haven't found anything better yet.

Only do parser recovery on retried macro matching Eager parser recovery can break macros, so we don't do it at first. But when we already know that the macro failed, we can retry it with recovery enabled to still emit useful diagnostics. Helps with rust-lang#103534

…iser Only suggest turbofish in patterns if we may recover Fixes [after backport] rust-lang#115780. CC rust-lang#103534.

fmease added the C-bug Category: This is a bug. label Oct 25, 2022

fmease changed the title ~~Multiple cases where parse error recovery is obversable by macros~~ Parse error recovery is obversable by macros in several cases Oct 25, 2022

compiler-errors added the I-lang-nominated Nominated for discussion during a lang team meeting. label Oct 25, 2022

Noratrieb mentioned this issue Oct 25, 2022

Add flag to forbid recovery in the parser #103544

Merged

fmease mentioned this issue Oct 27, 2022

Parser crashes after encountering invalid variable declaration in macro call #103529

Closed

rustbot removed the I-prioritize Issue: Indicates that prioritization has been requested for this issue. label Oct 28, 2022

joshtriplett removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Nov 8, 2022

Noratrieb mentioned this issue Nov 12, 2022

Only do parser recovery on retried macro matching #104335

Merged

fmease mentioned this issue Nov 17, 2022

Provide a better error and a suggestion for Fn traits with lifetime params #104531

Merged

Noratrieb mentioned this issue Jan 6, 2023

in declarative macros, "literal" matching arm can throw compiler error rather than matching or failing to match #106504

Closed

This was referenced Jan 11, 2023

make error emitted on impl &Trait nicer #106712

Merged

Teach parser to understand fake anonymous enum syntax #106960

Merged

fmease mentioned this issue Mar 12, 2023

macro parses let keyword as pattern and fails + masked regression #109045

Open

fmease mentioned this issue Sep 12, 2023

Only suggest turbofish in patterns if we may recover #115785

Merged

bors added a commit to rust-lang-ci/rust that referenced this issue Sep 13, 2023

Auto merge of rust-lang#115785 - fmease:fix-pat-regression, r=wesleyw…

367db83

…iser Only suggest turbofish in patterns if we may recover Fixes [after backport] rust-lang#115780. CC rust-lang#103534.

fmease mentioned this issue Feb 29, 2024

Confusing error messages when a pattern match contains .foo() expressions #107523

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parse error recovery is obversable by macros in several cases #103534

Parse error recovery is obversable by macros in several cases #103534

fmease commented Oct 25, 2022 •

edited

Loading

compiler-errors commented Oct 25, 2022

Noratrieb commented Oct 25, 2022

compiler-errors commented Oct 25, 2022 •

edited

Loading

Noratrieb commented Oct 25, 2022 •

edited

Loading

compiler-errors commented Oct 25, 2022

compiler-errors commented Oct 25, 2022

scottmcm commented Oct 26, 2022

Noratrieb commented Oct 26, 2022 •

edited

Loading

fmease commented Oct 26, 2022

Noratrieb commented Oct 26, 2022 •

edited

Loading

joshtriplett commented Nov 1, 2022

fmease commented Nov 19, 2022 •

edited

Loading

Noratrieb commented Nov 20, 2022

Parse error recovery is obversable by macros in several cases #103534

Parse error recovery is obversable by macros in several cases #103534

Comments

fmease commented Oct 25, 2022 • edited Loading

Introduction

Why Is This a Concern?

Examples

Expressions

Statements

Other Fragments (e.g. Items, Types)

Editorial Notes

compiler-errors commented Oct 25, 2022

Noratrieb commented Oct 25, 2022

compiler-errors commented Oct 25, 2022 • edited Loading

Noratrieb commented Oct 25, 2022 • edited Loading

compiler-errors commented Oct 25, 2022

compiler-errors commented Oct 25, 2022

scottmcm commented Oct 26, 2022

Noratrieb commented Oct 26, 2022 • edited Loading

fmease commented Oct 26, 2022

Noratrieb commented Oct 26, 2022 • edited Loading

The currently implemented macro matching semantics

Where this goes wrong

What now

joshtriplett commented Nov 1, 2022

fmease commented Nov 19, 2022 • edited Loading

Noratrieb commented Nov 20, 2022

fmease commented Oct 25, 2022 •

edited

Loading

compiler-errors commented Oct 25, 2022 •

edited

Loading

Noratrieb commented Oct 25, 2022 •

edited

Loading

Noratrieb commented Oct 26, 2022 •

edited

Loading

Noratrieb commented Oct 26, 2022 •

edited

Loading

fmease commented Nov 19, 2022 •

edited

Loading