Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start avoiding parse diagnostics on error tokens #4431

Merged
merged 10 commits into from
Nov 2, 2024

Conversation

chandlerc
Copy link
Contributor

@chandlerc chandlerc commented Oct 20, 2024

An invalid parse due to an error token isn't likely a great diagnostic
as it will already have been diagnosed by the lexer. A common case to
start handling that is when the parser encounters an invalid token when
expecting an expression.

This removes a number of unhelpful diagnostics after the lexer has done
a good job diagnosing.

This also means that there may be parse tree errors that aren't
diagnosed when there are lexer-diagnosed errors, so track that.

Follow-up to #4430 that almost finishes addressing its diagnostic TODO.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.

Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?

Comment on lines 214 to 217
// Fallthrough to the error token case -- we don't need to diagnose those.
[[fallthrough]];
}
case Lex::TokenKind::Error: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not used to seeing a non-last default case (enough that I wouldn't have expected this to work), but maybe others are more versed with that structure? Had you considered writing this as an if instead of fallthrough, e.g.: default: if (token_kind != Lex::TokenKind::Error) { ...Emit... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 The structure didn't give me any pause, but we've established that I'm not necessarily representative there...

I had tried both ways of writing it, but the switch seemed better. Using an if is a bit awkward as you have to dig the kind back out and then re-test it. As we're already testing it for the switch, and there is a natural fallthrough structure, it seemed clean to use that instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 The structure didn't give me any pause, but we've established that I'm not necessarily representative there...

I had tried both ways of writing it, but the switch seemed better. Using an if is a bit awkward as you have to dig the kind back out and then re-test it. As we're already testing it for the switch, and there is a natural fallthrough structure, it seemed clean to use that instead.

What do you mean by digging it back out? Couldn't you add auto token_kind = in the switch statement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, yes, I could also store it.

But it still ends up with an awkward thing where every other value is handled by a case, but this one isn't.

If you feel strongly that using fallthrough isn't OK here, I can change it I guess? Didn't seem like a big thing either way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really having trouble. #style to see if it's just me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No stress, this really wasn't an important one, I'll switch it to the other form. Was really just trying to understand if it was just surprise or causing more trouble. It's slightly awkward to use the if, but as you say, very slight so seems easily outweighed given this isn't working at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to an if, with the suggested formulation.

toolchain/parse/handle_expr.cpp Outdated Show resolved Hide resolved
@chandlerc
Copy link
Contributor Author

Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.

Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?

I don't think we necessarily want to always elide a diagnostic because the location is an error token... That seems like a fairly subtle coupling between the location's kind and the diagnostic behavior. It seems more clear to explicitly control the whether to emit the diagnostic based on whether there is some already-diagnosed error.

It does mean we'll need to add support in other places, but I would somewhat want to consider in that place whether the diagnostic makes sense or not, and also whether or what recovery would be best given an error token. Even if we end up making similar choices, the context of teh choice seems relevant, so I wouldn't necessarily factor it until/unless we find some more underlying pattern we want to model with that factoring?

That said, some of my thinking is just initial thinking here. I've only really looked at the expression case so far, so I'm more comfortable confining the change to that. When we get to other cases, can always revisit?

@jonmeow
Copy link
Contributor

jonmeow commented Oct 24, 2024

Seems good, but to be sure, you're doing this in a way that's specific to expressions. I think this'd miss, for example, declarations.
Had you considered adding support to Emit() in order to elide diagnostics where the location is an error token?

I don't think we necessarily want to always elide a diagnostic because the location is an error token... That seems like a fairly subtle coupling between the location's kind and the diagnostic behavior. It seems more clear to explicitly control the whether to emit the diagnostic based on whether there is some already-diagnosed error.

It does mean we'll need to add support in other places, but I would somewhat want to consider in that place whether the diagnostic makes sense or not, and also whether or what recovery would be best given an error token. Even if we end up making similar choices, the context of teh choice seems relevant, so I wouldn't necessarily factor it until/unless we find some more underlying pattern we want to model with that factoring?

That said, some of my thinking is just initial thinking here. I've only really looked at the expression case so far, so I'm more comfortable confining the change to that. When we get to other cases, can always revisit?

Okay, but I'll note, my assumption would've been in the opposite direction -- do the more generic thing, and revisit if it has a problem.

@chandlerc
Copy link
Contributor Author

Okay, but I'll note, my assumption would've been in the opposite direction -- do the more generic thing, and revisit if it has a problem.

To be clear, for me the bigger thing is sinking that commonality into Emit -- I think that making this conditionally emit based on the location token kind doesn't seem like a great API. If we want a generic thing, I think we should build something more dedicated to that, and so it seemed more like building a new generic thing rather than using an existing one.

I also asked @zygoloid to take a look though, maybe he has other thoughts here.

This rejects type literals with more digits than we can lex without
APInt's help, and using a custom diagnostic. This is a pretty arbitrary
implementation limit, I'm wide open to even more strict rules here.

Despite no special casing and a very simplistic approach, by not using
APInt this completely eliminates the lexing overhead for `i32` in the
generated compilation benchmark where that specific type literal is very
common. We see a 10% improvement in lexing there:
```
BM_CompileAPIFileDenseDecls<Phase::Lex>/256        39.0µs ± 4%  34.8µs ± 2%  -10.86%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/1024        180µs ± 1%   158µs ± 2%  -12.22%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/4096        731µs ± 2%   641µs ± 1%  -12.31%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Lex>/16384      3.20ms ± 2%  2.86ms ± 2%  -10.47%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/65536      13.8ms ± 1%  12.4ms ± 2%   -9.78%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Lex>/262144     64.0ms ± 2%  58.4ms ± 2%   -8.70%  (p=0.000 n=19+18)
```

This starts to fix a TODO in the diagnostic for these by giving
a reasonably good diagnostic about a very large type literal. However,
in practice it regresses the diagnostics because error tokens produce
noisy extraneous diagnostics from parse and check currently. Leaving the
TODO there, and I have a follow-up PR to start improving the extraneous
diagnostics.
An invalid parse due to an error token isn't likely a great diagnostic
as it will already have been diagnosed by the lexer. A common case to
start handling that is when the parser encounters an invalid token when
expecting an expression.

This removes a number of unhelpful diagnostics after the lexer has done
a good job diagnosing.

This also means that there may be parse tree errors that aren't
diagnosed when there are lexer-diagnosed errors, so track that.
@zygoloid
Copy link
Contributor

I also asked @zygoloid to take a look though, maybe he has other thoughts here.

Some discussion of this PR moved to discord, and I wrote up some of my thoughts there.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. I trust you'll adjust the case, and I assume any commonality from zygoloid's comment will be handled separately (if there's anything to do right now)

@chandlerc
Copy link
Contributor Author

Thanks, and merging with the fixed case structure.

@chandlerc chandlerc added this pull request to the merge queue Nov 2, 2024
Merged via the queue into carbon-language:trunk with commit 1b2eb42 Nov 2, 2024
8 checks passed
@chandlerc chandlerc deleted the skip-error-expr branch November 2, 2024 06:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants