Start avoiding parse diagnostics on error tokens #4431

chandlerc · 2024-10-20T08:40:41Z

An invalid parse due to an error token isn't likely a great diagnostic
as it will already have been diagnosed by the lexer. A common case to
start handling that is when the parser encounters an invalid token when
expecting an expression.

This removes a number of unhelpful diagnostics after the lexer has done
a good job diagnosing.

This also means that there may be parse tree errors that aren't
diagnosed when there are lexer-diagnosed errors, so track that.

Follow-up to #4430 that almost finishes addressing its diagnostic TODO.
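
A rough sketch of what "track that" could mean in practice. This is an illustration, not the toolchain's code: `TokenizedBuffer`, `Tree`, `diagnostics_emitted`, and `ErrorsAreAccountedFor` are hypothetical names. The idea is that a parse tree marked as erroneous is still acceptable if either the parser diagnosed something itself or the lexer already did.

```cpp
// Hypothetical stand-ins; these are not the real toolchain types.
struct TokenizedBuffer {
  // True when the lexer emitted at least one diagnostic.
  bool has_errors = false;
};

struct Tree {
  // True when the tree contains invalid nodes.
  bool has_errors = false;
  // Number of diagnostics the parser itself emitted.
  int diagnostics_emitted = 0;
};

// Returns true when every error in the tree is explained by some diagnostic:
// either the parser emitted one, or the lexer already diagnosed the input.
auto ErrorsAreAccountedFor(const TokenizedBuffer& tokens, const Tree& tree)
    -> bool {
  if (!tree.has_errors) {
    return true;
  }
  return tree.diagnostics_emitted > 0 || tokens.has_errors;
}
```

With a check like this, an erroneous tree with no parser diagnostics only passes when the lexer reported errors, which is the new situation this change allows.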

jonmeow

Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.

Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?

jonmeow · 2024-10-21T18:34:52Z

toolchain/parse/handle_expr.cpp

+      // Fallthrough to the error token case -- we don't need to diagnose those.
+      [[fallthrough]];
+    }
+    case Lex::TokenKind::Error: {


I'm not used to seeing a non-last default case (enough that I wouldn't have expected this to work), but maybe others are more versed with that structure? Had you considered writing this as an `if` instead of fallthrough, e.g. `default: if (token_kind != Lex::TokenKind::Error) { ...Emit... }`?

🤷 The structure didn't give me any pause, but we've established that I'm not necessarily representative there...

I had tried both ways of writing it, but the switch seemed better. Using an if is a bit awkward as you have to dig the kind back out and then re-test it. As we're already testing it for the switch, and there is a natural fallthrough structure, it seemed clean to use that instead.

What do you mean by digging it back out? Couldn't you add `auto token_kind =` in the switch statement?

I mean, yes, I could also store it.

But it still ends up with an awkward thing where every other value is handled by a case, but this one isn't.

If you feel strongly that using fallthrough isn't OK here, I can change it I guess? Didn't seem like a big thing either way.

I'm really having trouble. #style to see if it's just me.

No stress, this really wasn't an important one, I'll switch it to the other form. Was really just trying to understand if it was just surprise or causing more trouble. It's slightly awkward to use the if, but as you say, very slight so seems easily outweighed given this isn't working at all.

Back to an if, with the suggested formulation.
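
For readers following the thread, here is a minimal sketch of the two formulations discussed above. It is not the code from `handle_expr.cpp`: the `TokenKind` enum, the function names, and the diagnostic text are assumptions made for illustration. Both versions behave the same way; in C++ a `default` label does not need to come last, and control only reaches it when no `case` matches.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for Lex::TokenKind.
enum class TokenKind { IntLiteral, Identifier, Semi, Error };

// First formulation: a non-last `default` that falls through into the
// `Error` case, so both share the same recovery path.
auto HandleWithFallthrough(TokenKind kind,
                           std::vector<std::string>& diagnostics) -> bool {
  switch (kind) {
    case TokenKind::IntLiteral:
    case TokenKind::Identifier:
      return true;
    default:
      diagnostics.push_back("expected expression");
      // Fall through to the error token case, which is not diagnosed.
      [[fallthrough]];
    case TokenKind::Error:
      // The lexer already diagnosed error tokens; only recover here.
      return false;
  }
}

// Second formulation: `default` stays last and re-tests the kind.
auto HandleWithIf(TokenKind kind, std::vector<std::string>& diagnostics)
    -> bool {
  switch (kind) {
    case TokenKind::IntLiteral:
    case TokenKind::Identifier:
      return true;
    default:
      if (kind != TokenKind::Error) {
        diagnostics.push_back("expected expression");
      }
      return false;
  }
}
```

Passing `TokenKind::Error` to either function returns `false` without adding a diagnostic, while `TokenKind::Semi` returns `false` and adds one.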

chandlerc · 2024-10-24T07:25:48Z

> Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.
>
> Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?

I don't think we necessarily want to always elide a diagnostic because the location is an error token... That seems like a fairly subtle coupling between the location's kind and the diagnostic behavior. It seems clearer to explicitly control whether to emit the diagnostic based on whether there is some already-diagnosed error.

It does mean we'll need to add support in other places, but I would somewhat want to consider in that place whether the diagnostic makes sense or not, and also whether or what recovery would be best given an error token. Even if we end up making similar choices, the context of the choice seems relevant, so I wouldn't necessarily factor it until/unless we find some more underlying pattern we want to model with that factoring?

That said, some of my thinking is just initial thinking here. I've only really looked at the expression case so far, so I'm more comfortable confining the change to that. When we get to other cases, can always revisit?
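
To make the two designs in this exchange concrete, here is a small sketch under stated assumptions. None of these names are the toolchain's real API: `Emitter`, `EmitUnlessErrorToken`, `Token`, and `HandleExprStart` are hypothetical. Option A hides the decision inside the emitter based on the location's token kind; option B keeps `Emit` unconditional and lets the call site decide.

```cpp
#include <string>
#include <vector>

// Hypothetical token model; not the real Lex types.
enum class TokenKind { Identifier, Semi, Error };

struct Token {
  TokenKind kind;
  int line;
};

class Emitter {
 public:
  // Unconditional emission: every call records a diagnostic.
  auto Emit(const Token& loc, const std::string& message) -> void {
    messages_.push_back(std::to_string(loc.line) + ": " + message);
  }

  // Option A: the emitter silently drops diagnostics whose location is an
  // error token. Callers cannot tell from the call site that this happens.
  auto EmitUnlessErrorToken(const Token& loc, const std::string& message)
      -> void {
    if (loc.kind == TokenKind::Error) {
      return;
    }
    Emit(loc, message);
  }

  auto messages() const -> const std::vector<std::string>& {
    return messages_;
  }

 private:
  std::vector<std::string> messages_;
};

// Option B: the call site checks for an already-diagnosed error token and
// decides both whether to diagnose and how to recover.
auto HandleExprStart(const Token& token, Emitter& emitter) -> bool {
  if (token.kind == TokenKind::Identifier) {
    return true;
  }
  if (token.kind != TokenKind::Error) {
    emitter.Emit(token, "expected expression");
  }
  return false;
}
```

Option B repeats the check at each call site, which is the cost raised in review, but it keeps the choice of diagnostic and recovery visible where the context is known.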

jonmeow · 2024-10-24T16:30:07Z

> > Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.
> > Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?
>
> I don't think we necessarily want to always elide a diagnostic because the location is an error token... That seems like a fairly subtle coupling between the location's kind and the diagnostic behavior. It seems clearer to explicitly control whether to emit the diagnostic based on whether there is some already-diagnosed error.
>
> It does mean we'll need to add support in other places, but I would somewhat want to consider in that place whether the diagnostic makes sense or not, and also whether or what recovery would be best given an error token. Even if we end up making similar choices, the context of the choice seems relevant, so I wouldn't necessarily factor it until/unless we find some more underlying pattern we want to model with that factoring?
>
> That said, some of my thinking is just initial thinking here. I've only really looked at the expression case so far, so I'm more comfortable confining the change to that. When we get to other cases, can always revisit?

Okay, but I'll note, my assumption would've been in the opposite direction -- do the more generic thing, and revisit if it has a problem.

chandlerc · 2024-10-24T16:33:29Z

> Okay, but I'll note, my assumption would've been in the opposite direction -- do the more generic thing, and revisit if it has a problem.

To be clear, for me the bigger thing is sinking that commonality into Emit -- I think that making this conditionally emit based on the location token kind doesn't seem like a great API. If we want a generic thing, I think we should build something more dedicated to that, and so it seemed more like building a new generic thing rather than using an existing one.

I also asked @zygoloid to take a look though, maybe he has other thoughts here.

This rejects type literals with more digits than we can lex without APInt's help, and using a custom diagnostic. This is a pretty arbitrary implementation limit, I'm wide open to even more strict rules here.

Despite no special casing and a very simplistic approach, by not using APInt this completely eliminates the lexing overhead for `i32` in the generated compilation benchmark where that specific type literal is very common. We see a 10% improvement in lexing there:

```
BM_CompileAPIFileDenseDecls<Phase::Lex>/256      39.0µs ± 4%  34.8µs ± 2%  -10.86%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024      180µs ± 1%   158µs ± 2%  -12.22%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096      731µs ± 2%   641µs ± 1%  -12.31%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384    3.20ms ± 2%  2.86ms ± 2%  -10.47%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536    13.8ms ± 1%  12.4ms ± 2%   -9.78%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144   64.0ms ± 2%  58.4ms ± 2%   -8.70%  (p=0.000 n=19+18)
```

This starts to fix a TODO in the diagnostic for these by giving a reasonably good diagnostic about a very large type literal. However, in practice it regresses the diagnostics because error tokens produce noisy extraneous diagnostics from parse and check currently. Leaving the TODO there, and I have a follow-up PR to start improving the extraneous diagnostics.
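
As a rough illustration of the approach described above (parse the digits of a type literal such as `i32` with a plain loop, and reject anything too long instead of reaching for an arbitrary-precision integer), here is a minimal sketch. The function name `ParseTypeLiteralWidth` and the nine-digit limit are assumptions chosen for this example; the actual limit and diagnostic used by the toolchain are not stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Assumed limit: any decimal number with at most 9 digits fits in 32 bits,
// so the loop below cannot overflow. The real limit may differ.
constexpr std::size_t MaxTypeLiteralDigits = 9;

// Parses the digits that follow the type prefix (the "32" in `i32`).
// Returns nullopt for empty input, non-digit characters, or too many
// digits; a caller would emit its own diagnostic in those cases.
auto ParseTypeLiteralWidth(std::string_view digits)
    -> std::optional<std::uint32_t> {
  if (digits.empty() || digits.size() > MaxTypeLiteralDigits) {
    return std::nullopt;
  }
  std::uint32_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') {
      return std::nullopt;
    }
    value = value * 10 + static_cast<std::uint32_t>(c - '0');
  }
  return value;
}
```

For example, `ParseTypeLiteralWidth("32")` yields 32, while a ten-digit input such as `"1234567890"` is rejected before any arithmetic happens. This sketch does not address leading zeros or other literal rules.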

zygoloid · 2024-10-25T16:45:58Z

> I also asked @zygoloid to take a look though, maybe he has other thoughts here.

Some discussion of this PR moved to discord, and I wrote up some of my thoughts there.

jonmeow

Approving. I trust you'll adjust the case, and I assume any commonality from zygoloid's comment will be handled separately (if there's anything to do right now)

chandlerc · 2024-11-02T05:48:08Z

Thanks, and merging with the fixed case structure.

github-actions bot requested a review from jonmeow October 20, 2024 08:40

github-actions bot added the toolchain label Oct 20, 2024

jonmeow reviewed Oct 21, 2024

chandlerc added 5 commits October 24, 2024 20:43

improve testing and implementation

a4721fd

format

c40cc9f

swich back to a loop as from_chars is missing in Clang 16

6329d91

chandlerc force-pushed the skip-error-expr branch from 70c7987 to 144ae9c Compare October 24, 2024 23:17

jonmeow approved these changes Oct 25, 2024

chandlerc added 5 commits November 2, 2024 04:56

Merge branch 'fast-ints' into skip-error-expr

abb5e87

Merge branch 'trunk' into skip-error-expr

4718bca

address review feedback

458d433

format

dca1a2b

fix variable name

80d4820

chandlerc enabled auto-merge November 2, 2024 05:48

chandlerc added this pull request to the merge queue Nov 2, 2024

Merged via the queue into carbon-language:trunk with commit 1b2eb42 Nov 2, 2024
8 checks passed

chandlerc deleted the skip-error-expr branch November 2, 2024 06:03
