-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start avoiding parse diagnostics on error tokens #4431
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.
Had you considered adding support to Emit()
in order to elide diagnostics where the location is an error token?
toolchain/parse/handle_expr.cpp
Outdated
// Fallthrough to the error token case -- we don't need to diagnose those. | ||
[[fallthrough]]; | ||
} | ||
case Lex::TokenKind::Error: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not used to seeing a non-last default
case (enough that I wouldn't have expected this to work), but maybe others are more versed with that structure? Had you considered writing this as an if
instead of fallthrough
, e.g.: default: if (token_kind != Lex::TokenKind::Error) { ...Emit... }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 The structure didn't give me any pause, but we've established that I'm not necessarily representative there...
I had tried both ways of writing it, but the switch seemed better. Using an if is a bit awkward as you have to dig the kind back out and then re-test it. As we're already testing it for the switch, and there is a natural fallthrough structure, it seemed clean to use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤷 The structure didn't give me any pause, but we've established that I'm not necessarily representative there...
I had tried both ways of writing it, but the switch seemed better. Using an if is a bit awkward as you have to dig the kind back out and then re-test it. As we're already testing it for the switch, and there is a natural fallthrough structure, it seemed clean to use that instead.
What do you mean by digging it back out? Couldn't you add auto token_kind =
in the switch
statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, yes, I could also store it.
But it still ends up with an awkward thing where every other value is handled by a case, but this one isn't.
If you feel strongly that using fallthrough isn't OK here, I can change it I guess? Didn't seem like a big thing either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really having trouble. #style to see if it's just me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No stress, this really wasn't an important one, I'll switch it to the other form. Was really just trying to understand if it was just surprise or causing more trouble. It's slightly awkward to use the if, but as you say, very slight so seems easily outweighed given this isn't working at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to an if
, with the suggested formulation.
I don't think we necessarily want to always elide a diagnostic because the location is an error token... That seems like a fairly subtle coupling between the location's kind and the diagnostic behavior. It seems more clear to explicitly control the whether to emit the diagnostic based on whether there is some already-diagnosed error. It does mean we'll need to add support in other places, but I would somewhat want to consider in that place whether the diagnostic makes sense or not, and also whether or what recovery would be best given an error token. Even if we end up making similar choices, the context of teh choice seems relevant, so I wouldn't necessarily factor it until/unless we find some more underlying pattern we want to model with that factoring? That said, some of my thinking is just initial thinking here. I've only really looked at the expression case so far, so I'm more comfortable confining the change to that. When we get to other cases, can always revisit? |
Okay, but I'll note, my assumption would've been in the opposite direction -- do the more generic thing, and revisit if it has a problem. |
To be clear, for me the bigger thing is sinking that commonality into I also asked @zygoloid to take a look though, maybe he has other thoughts here. |
This rejects type literals with more digits than we can lex without APInt's help, and using a custom diagnostic. This is a pretty arbitrary implementation limit, I'm wide open to even more strict rules here. Despite no special casing and a very simplistic approach, by not using APInt this completely eliminates the lexing overhead for `i32` in the generated compilation benchmark where that specific type literal is very common. We see a 10% improvement in lexing there: ``` BM_CompileAPIFileDenseDecls<Phase::Lex>/256 39.0µs ± 4% 34.8µs ± 2% -10.86% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Lex>/1024 180µs ± 1% 158µs ± 2% -12.22% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Lex>/4096 731µs ± 2% 641µs ± 1% -12.31% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Lex>/16384 3.20ms ± 2% 2.86ms ± 2% -10.47% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/65536 13.8ms ± 1% 12.4ms ± 2% -9.78% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Lex>/262144 64.0ms ± 2% 58.4ms ± 2% -8.70% (p=0.000 n=19+18) ``` This starts to fix a TODO in the diagnostic for these by giving a reasonably good diagnostic about a very large type literal. However, in practice it regresses the diagnostics because error tokens produce noisy extraneous diagnostics from parse and check currently. Leaving the TODO there, and I have a follow-up PR to start improving the extraneous diagnostics.
An invalid parse due to an error token isn't likely a great diagnostic as it will already have been diagnosed by the lexer. A common case to start handling that is when the parser encounters an invalid token when expecting an expression. This removes a number of unhelpful diagnostics after the lexer has done a good job diagnosing. This also means that there may be parse tree errors that aren't diagnosed when there are lexer-diagnosed errors, so track that.
70c7987
to
144ae9c
Compare
Some discussion of this PR moved to discord, and I wrote up some of my thoughts there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving. I trust you'll adjust the case, and I assume any commonality from zygoloid's comment will be handled separately (if there's anything to do right now)
Thanks, and merging with the fixed |
An invalid parse due to an error token isn't likely a great diagnostic
as it will already have been diagnosed by the lexer. A common case to
start handling that is when the parser encounters an invalid token when
expecting an expression.
This removes a number of unhelpful diagnostics after the lexer has done
a good job diagnosing.
This also means that there may be parse tree errors that aren't
diagnosed when there are lexer-diagnosed errors, so track that.
Follow-up to #4430 that almost finishes addressing its diagnostic TODO.