Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a new parser on top of the new lexer #370

Merged
merged 14 commits into from
Dec 18, 2024
Merged

Implement a new parser on top of the new lexer #370

merged 14 commits into from
Dec 18, 2024

Conversation

mcy
Copy link
Member

@mcy mcy commented Nov 19, 2024

This change adds a new Protobuf parser to the experimental/parse package. It also includes a corpus of tests for this parser, some of which currently pass (incorrectly, due to the missing legalizer).

@mcy mcy force-pushed the mcy/parser2 branch 3 times, most recently from f579da3 to 60f10a1 Compare November 25, 2024 20:34
@mcy mcy requested a review from jhump November 25, 2024 20:34
@mcy mcy marked this pull request as ready for review November 25, 2024 20:34
@mcy mcy force-pushed the mcy/parser2 branch 8 times, most recently from 55372b6 to 4fe7cad Compare December 2, 2024 22:30
experimental/ast/nil.go Outdated Show resolved Hide resolved
experimental/internal/donotuse.go Outdated Show resolved Hide resolved
experimental/ast/nodes.go Outdated Show resolved Hide resolved
experimental/ast/expr_literal.go Outdated Show resolved Hide resolved
experimental/ast/path.go Outdated Show resolved Hide resolved
internal/golden/golden.go Outdated Show resolved Hide resolved
internal/iters/collect.go Outdated Show resolved Hide resolved
internal/iters/nth.go Outdated Show resolved Hide resolved
internal/iters/partition.go Outdated Show resolved Hide resolved
internal/iters/iters.go Outdated Show resolved Hide resolved
mcy added a commit that referenced this pull request Dec 5, 2024
This PR contains all changes from #370 that are not the parser nor its
tests. This PR has been separated out to break review dependencies.

New additions include:
- `ast.File.ToProto`, which is used for constructing a JSON
representation of the AST for golden diffing.
- `ast.Path` has new operations, including path splitting.
- `internal/iters`, since I keep adding iterator helpers.
- `internal/ast2`, which contains internal, layout-dependent AST
operations.
- `report.Renderer` will now show the backtrace at which a diagnostic
was created if it panics during rendering.
- `report.Snippet` will now discard snippets that have a nil span.
- Fixed bugs in `ExprAny`/`TypeAny` conversions.
@mcy
Copy link
Member Author

mcy commented Dec 5, 2024

I've rebased this PR over #385. The last few commits are the reviewable content of this PR.

@mcy mcy requested a review from jhump December 5, 2024 22:10
Copy link
Member

@jhump jhump left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I scanned the test cases. They are so numerous and long that they are hard to review with full attention -- easy to get mentally exhausted staring at them.

The more important points I'd like to make are:

  1. This structure feels like it might be hard to change to improve error handling.
  2. Some of the code is complicated enough that it's hard to really verify that it's going to accept/enforce all aspects of the actual grammar. It obviously is lenient, but it's hard to tell if there are edge cases where it fails to accept something that is actually valid. I've tried to add comments to places where I could tell it's not right, but it's hard to really see everything.

As far as the two points above, and any comment I left in that vein that feels like it might warrant a non-trivial overhaul: I am not asking that we make a lot of changes now. I'm just noting my concerns and think it will be easier to handle (in particular, it will be easier for me to review) with incremental changes/improvements after merging this. I know getting more sophisticated with error recovery may require quite a lot of intrusive changes, but I still think that's better as a follow-on than trying to tackle it in this PR.

experimental/parser/diagnostics_internal.go Outdated Show resolved Hide resolved
experimental/parser/diagnostics_internal.go Outdated Show resolved Hide resolved
experimental/parser/parse.go Outdated Show resolved Hide resolved
experimental/parser/parse.go Outdated Show resolved Hide resolved
experimental/parser/parse_decl.go Outdated Show resolved Hide resolved
experimental/parser/parse_state.go Outdated Show resolved Hide resolved
experimental/parser/parse_type.go Show resolved Hide resolved
experimental/parser/parse_type.go Show resolved Hide resolved
Comment on lines +187 to +193
// Finally, apply any remaining modifiers (in reverse order) to ty.
for i := len(mods) - 1; i >= 0; i-- {
ty = p.NewTypePrefixed(ast.TypePrefixedArgs{
Prefix: mods[i],
Type: ty,
}).AsAny()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically, I think this belongs above: after we've process any generic type arguments but before we look for an optional trailing path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't, and the comment in the previous stanza explains why. If we parse optional optional, and we want to return a path, we return optional, optional: a path type and a path. But if we don't want a path, we would return optional optional, nil: a prefixed type and a nil path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably it could also return optional optional optional as optional optional type and optional path? That is allowed in the language. I think the loop that adds prefixes would greedily capture all, so I think this would work, too.


M x returns (T) returns (T);
M x [foo = bar] returns (T);
M x { /* ... */ } returns (T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned before, This looks to me like it would rather be parsed as M x { } followed by returns (T);. For one, I don't think this particular error is that likely; and two, I think it would make parseDef easier to read and understand.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a much more interesting issue when the thing after the {} is a compact options. Making it work just for compact options but not for the others is probably more work than what I've done in the refactor of parseDef.

@mcy mcy requested a review from jhump December 13, 2024 19:05
}
}

// If we didn't see any braces, this def needs to be ended by a semicolon.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still doesn't seem right. I would think we only allow omitting the semicolon if the last thing was a body. So foo { ... } returns (bar) should still expect a semicolon.

I think the condition should instead be:

var isBody bool
if lastFollower >= 0 {
    _, isBody = defFollowers[idx].(defBody)
}
if !isBody {
    // expect semicolon...

I'm still worried about the complexity of this logic -- it makes it much less intuitive exactly what the parser will/should parse, like when semicolons are expected or not, etc.

While a fuzz tester could be used to try to test for all cases, the result will still be a pain to maintain. I would much rather trade away some of these diagnostics for greater certainty of correctness and maintainability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, there is a test that is intended to exhaustively test the O(n^2) things this code handles. IDK, IME the way to deal with this in any compiler is to just exhaustively test everything, and use fuzzing to generate test cases. This is why I built the golden test framework the way I did: I expect to write A LOT more tests.

Once the new compiler is working end to end I plan to walk through the whole spec and write test cases for everything mentioned therein.

experimental/parser/parse_def.go Outdated Show resolved Hide resolved
experimental/parser/parse_def.go Show resolved Hide resolved
experimental/parser/parse_def.go Outdated Show resolved Hide resolved
Comment on lines +256 to +257
case p.args.Type.Nil():
return taxa.EnumValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit counter-intuitive. I think the keyword of the enclosing element should dictate whether we interpret this as a field or enum. So a declaration like foo = 1; should not (IMO) be interpreted as an enum value if it's inside a message -- it's clearly a mangled field instead.

experimental/parser/parse_expr.go Outdated Show resolved Hide resolved
@mcy mcy requested a review from jhump December 16, 2024 22:54
@mcy mcy enabled auto-merge (squash) December 18, 2024 19:13
@mcy mcy merged commit 12612dc into main Dec 18, 2024
9 checks passed
@mcy mcy deleted the mcy/parser2 branch December 18, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants