Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DotLambda: add parser recovery #16238

Merged
merged 2 commits into from
Nov 14, 2023
Merged

Conversation

DedSec256
Copy link
Contributor

This PR adds parser recovery for the SynExpr.DotLambda for cases _. and _

Special thanks to @auduchinok for the consultation.

@DedSec256 DedSec256 requested a review from a team as a code owner November 7, 2023 19:40
src/Compiler/pars.fsy Outdated Show resolved Hide resolved
tests/service/ParserTests.fs Outdated Show resolved Hide resolved
src/Compiler/pars.fsy Outdated Show resolved Hide resolved
Copy link
Member

@T-Gro T-Gro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this change.
I would also prefer to have these tests as syntaxtree tests.

Reason is, if anything changes in the SyntaxTree, the baselines can be easily re-created, whereas specific assertions need to be manually checked.

(which is not a big deal when there is 2 of them, but it is when we have hundreds)

@DedSec256 DedSec256 force-pushed the ber.a/dotLambda-recovery branch from 4fdcfc4 to 601f0d0 Compare November 8, 2023 18:44
@DedSec256
Copy link
Contributor Author

Interesting observations on the commit 27e8d65:

After adding recovery, in the following invalid code DotLambdas began to be parsed

let a = ( upcast _ ) : obj

image

let b = ( _ :> _ ) : obj

image

let c = ( _ :> obj)

image

Also parens expressions began to be parsed more correctly.

src/Compiler/pars.fsy Outdated Show resolved Hide resolved
src/Compiler/pars.fsy Outdated Show resolved Hide resolved
@T-Gro
Copy link
Member

T-Gro commented Nov 9, 2023

@DedSec256 :

Seeing the newly appeared _ recoveries, I would be inclined to only do a recovery for UNDERSCORE DOT .., but not for a standalone UNDERSCORE only.
Chance of breaking something existing (also precedence rules) would decrease dramatically.

@T-Gro T-Gro self-requested a review November 9, 2023 10:59
@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

Seeing the newly appeared _ recoveries, I would be inclined to only do a recovery for UNDERSCORE DOT .., but not for a standalone UNDERSCORE only.

Why, though? There're still errors on the incorrect code, so we aren't changing how any correct code is parsed, but surroundings are much better parsed in all the unfinished code examples.

The purpose of the parser recovery is to parse as much code in the most correct way as we can, so all the analysis works with the best possible tree. We don't have any ambiguity here, so it's clear that _ is an unfinished dot lambda, and being able to analyze things like parens or other pars of unfinished code is crucial for the tooling.

@T-Gro
Copy link
Member

T-Gro commented Nov 9, 2023

Seeing the newly appeared _ recoveries, I would be inclined to only do a recovery for UNDERSCORE DOT .., but not for a standalone UNDERSCORE only.

Why, though? There're still errors on the incorrect code, so we aren't changing how any correct code is parsed, but surroundings are much better parsed in all the unfinished code examples.

The purpose of the parser recovery is to parse as much code in the most correct way as we can, so all the analysis works with the best possible tree. We don't have any ambiguity here, so it's clear that _ is an unfinished dot lambda, and being able to analyze things like parens or other pars of unfinished code is crucial for the tooling.

I do not see a standalone _ as a lambda, e.g. in this example let a = ( upcast _ ) : obj from @DedSec256 above.

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

I do not see a standalone _ as a lambda, e.g. in this example let a = ( upcast _ ) : obj from @DedSec256 above.

Yes, but it should be something parseable when possible. And here it's unambiguously an unfinished dot lambda, as it's the only kind of an expression that could start with _. When there're more cases in the language it's possible to update these recovery rules, so it's not a problem.

Even looking at the same examples, it's clear that none of upcast or parens expressions were parsed prior to this change. It means that as soon as you type _, you get completely broken analysis in the tooling for some part of the tree, even though it could be parsed correctly prior the typing. This PR improves this a lot.

It's the same kind of recovery as in the most of these PRs and it's what makes a much better IDE experience.

It's OK to make assumptions about what an unfinished code may become after the the user types more into the editor. Not doing that makes analysis of surrounding code break. Consider interface recovery in #15943. We don't know whether it's going to be an interface ... end type representation or whether it's an interface ... with member declaration. But it doesn't matter much as long as we can provide improved tooling, and being able to parse it as something is the crucial part here.

@T-Gro
Copy link
Member

T-Gro commented Nov 9, 2023

... And here it's unambiguously an unfinished dot lambda, as it's the only kind of an expression that could start with _. When there're more cases in the language it's possible to update these recovery rules, so it's not a problem...

Couldn't it also be an unfinished identifier starting with an underscore?
What is what I would expect here, at least without the DotLambda feature in place.

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

Couldn't it also be an unfinished identifier starting with an underscore?

It's a good example, thanks! I think would be much better to produce an identifier expression indeed. It's still going to be parsed as an atomic expression, so everything seems good. The downside is it'd require more checks here and there about _ not actually being a valid identifier.

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

Another option is to produce a generic error expression node covering _ range. It's maybe even a better option.

So we have options to produce a dot lambda, an identifier, and a generic error node.

I think the choice should be made based on these things:

  • how many checks we need to add/update to make it ignored in the type checking
  • how many changes we need in features like code completion for it to work properly, i.e. suggesting names prefixed with _

@T-Gro
Copy link
Member

T-Gro commented Nov 9, 2023

If it were "just parsed" as an _ identifier, leaving potential error for typechecking.
Wouldn't it then cover all the needs, including completion working OOB for items prefixed with _ ?

@auduchinok
Copy link
Member

If it were "just parsed" as an _ identifier, leaving potential error for typechecking.
Wouldn't it then cover all the needs, including completion working OOB for items prefixed with _ ?

That's what I'd hope for! 🙂

@auduchinok
Copy link
Member

auduchinok commented Nov 9, 2023

If it were "just parsed" as an _ identifier, leaving potential error for typechecking.
Wouldn't it then cover all the needs, including completion working OOB for items prefixed with _ ?

That's what I'd hope for! 🙂

The problem I see is it'd be really nice if we the tree could also show that this is an error node somehow in this case. Producing just an identifier moves the responsibility to various features to check it at a later stage, potentially duplicating code. The more I think about it, the more I'd prefer it to be parsed as a dotLambda (or an error, in the worst case) if features like code completion keep working, at least for now.

Maybe I'd suggest to keep it as is in this PR if we know that there's no regression in code completion for _ prefix, and do another iteration later, when we're not constrained by a close release date (i.e. we still would like to cherry-pick these changes into our release)? I think it needs a bit more thoughtful consideration and testing here.

@auduchinok
Copy link
Member

Maybe SynExpr.FromParseError(SynExpr.Named(...)) would work, but it needs testing...

@DedSec256 DedSec256 force-pushed the ber.a/dotLambda-recovery branch from 27e8d65 to 8128b34 Compare November 13, 2023 16:53
src/Compiler/SyntaxTree/SyntaxTrivia.fs Outdated Show resolved Hide resolved
src/Compiler/pars.fsy Outdated Show resolved Hide resolved
@DedSec256 DedSec256 force-pushed the ber.a/dotLambda-recovery branch from 2918a7f to 6a16eaa Compare November 13, 2023 19:20
@T-Gro
Copy link
Member

T-Gro commented Nov 14, 2023

@DedSec256 : Thanks for the changes! Are we ready to merge here?

@psfinaki psfinaki merged commit 52d98fe into dotnet:main Nov 14, 2023
24 checks passed
@DedSec256 DedSec256 deleted the ber.a/dotLambda-recovery branch November 14, 2023 14:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants