Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disambiguate fold vs parenthesized assignment #239

Conversation

brandonspark
Copy link
Contributor

Currently, there is an ambiguity in parsing fold expressions and parenthesized assignment expressions. This is described in issues #201 and #212.

This arises in the following trace:

( <id>    =
       ^ where we are here

The crux of the issue is that we do not know whether to shift the = forward, or to reduce the <id> to an expression_not_binary. If we do the former, we can no longer parse a fold expression, which expects <exp> <fold_operator> '...'. If we do the latter, we can no longer parse a parenthesized assignment expression, as an assignment expression does not permit arbitrary expressions to appear on the LHS.

The current behavior is that tree-sitter-cpp will reduce the <id>, and thus fail to parse any parenthesized assignment expression. Unfortunately, these are reasonably common, so this is actually disastrous.

This PR makes it so that parenthesized expressions specifically allow assignment expressions which feature expressions on the LHS, so that it can still parse even if we reduce the identifier. This causes the grammar to be a little more permissive than it needs to be, but it does not disallow valid programs, as another proposed solution to this problem does.

I believe it is better for us to be more permissive, in this case, so that we no longer disallow valid programs, with the understanding that we can restore stringency at a later date.

@aryx aryx requested review from jdrouhard and amaanq January 26, 2024 09:56
grammar.js Outdated Show resolved Hide resolved
@jdrouhard jdrouhard merged commit 4ca37be into tree-sitter:master Jan 26, 2024
1 check failed
@jdrouhard
Copy link
Collaborator

Thanks!

@amaanq
Copy link
Member

amaanq commented Feb 1, 2024

Hey this looks great, this was actually my original idea to fix this but I felt there could've been a better way, since this feels more like a bodge that just happens to work (in my head), but if it's the only solution then it's totally okay.

Though I do have a few nitpicks to point out (and wish we got resolved before merging) - namely not generating with master and the naming/exposing of the lhs rule, I'll just push a fix for that though so no worries.

brandonspark added a commit to semgrep/semgrep that referenced this pull request Feb 3, 2024
## What:
This PR updates both the `semgrep-c` and `semgrep-cpp` modules with the
latest changes from `tree-sitter-c` and `tree-sitter-cpp`.

## Why:
I recently fixed an [upstream tree-sitter
issue](tree-sitter/tree-sitter-cpp#239) in the
C++ grammar. Unfortunately, this means we need to take all of the new
`tree-sitter-cpp` changes to pull it in.

## How:
Lots of grammar hacking.

In several places, I augmented the C AST to be more in line with the CPP
AST. This makes it noticeably less simple, but is required, as the
`tree-sitter-c` grammar becomes more complex. This leads to a gross
amount of code duplication between the C and C++ translation code, but
since the typed CSTs are different at the moment, there isn't a good way
to prevent that.

This PR also makes use of the groundwork I laid down in
#9681, which introduces a
`preproc_if_poly` (and friends) types, which are polymorphic types that
encode the structure of the preprocessor statements that occur in C and
C++. Here, we see that we save massive amounts of logic duplication with
minimal boilerplate.

## Test plan:
`make test`

Added a parsing test for the parenthesized assignment thing that all of
this was originally for, too.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants