Implement a new parser on top of the new lexer #370

mcy · 2024-11-19T00:49:12Z

This change adds a new Protobuf parser to the experimental/parse package. It also includes a corpus of tests for this parser, some of which currently pass (incorrectly, due to the missing legalizer).

experimental/ast/nil.go

experimental/internal/donotuse.go

experimental/ast/nodes.go

experimental/ast/expr_literal.go

experimental/ast/path.go

internal/golden/golden.go

internal/iters/collect.go

internal/iters/nth.go

internal/iters/partition.go

internal/iters/iters.go

This PR contains all changes from #370 that are neither the parser nor its tests. It has been separated out to break review dependencies. New additions include:

- `ast.File.ToProto`, which is used for constructing a JSON representation of the AST for golden diffing.
- `ast.Path` has new operations, including path splitting.
- `internal/iters`, since I keep adding iterator helpers.
- `internal/ast2`, which contains internal, layout-dependent AST operations.
- `report.Renderer` will now show the backtrace at which a diagnostic was created if it panics during rendering.
- `report.Snippet` will now discard snippets that have a nil span.
- Fixed bugs in `ExprAny`/`TypeAny` conversions.

mcy · 2024-12-05T22:10:09Z

I've rebased this PR over #385. The last few commits are the reviewable content of this PR.

jhump

I scanned the test cases. They are so numerous and long that they are hard to review with full attention -- easy to get mentally exhausted staring at them.

The more important points I'd like to make are:

1. This structure feels like it might be hard to change to improve error handling.
2. Some of the code is complicated enough that it's hard to really verify that it's going to accept/enforce all aspects of the actual grammar. It obviously is lenient, but it's hard to tell if there are edge cases where it fails to accept something that is actually valid. I've tried to add comments to places where I could tell it's not right, but it's hard to really see everything.

As far as the two points above, and any comment I left in that vein that feels like it might warrant a non-trivial overhaul: I am not asking that we make a lot of changes now. I'm just noting my concerns and think it will be easier to handle (in particular, it will be easier for me to review) with incremental changes/improvements after merging this. I know getting more sophisticated with error recovery may require quite a lot of intrusive changes, but I still think that's better as a follow-on than trying to tackle it in this PR.

experimental/parser/diagnostics_internal.go

experimental/parser/parse.go

experimental/parser/parse_decl.go

experimental/parser/parse_state.go

experimental/parser/parse_type.go

jhump · 2024-12-11T19:41:09Z

experimental/parser/parse_type.go

+	// Finally, apply any remaining modifiers (in reverse order) to ty.
+	for i := len(mods) - 1; i >= 0; i-- {
+		ty = p.NewTypePrefixed(ast.TypePrefixedArgs{
+			Prefix: mods[i],
+			Type:   ty,
+		}).AsAny()
+	}


Logically, I think this belongs above: after we've processed any generic type arguments but before we look for an optional trailing path.

It can't, and the comment in the previous stanza explains why. If we parse `optional optional` and we want to return a path, we return `optional`, `optional`: a path type and a path. But if we don't want a path, we would return `optional optional`, `nil`: a prefixed type and a nil path.

Presumably it could also return `optional optional optional` as an `optional optional` type and an `optional` path? That is allowed in the language. I think the loop that adds prefixes would greedily capture all, so I think this would work, too.
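To make the ambiguity in this thread concrete, here is a minimal sketch of splitting a run of identifiers into a type and an optional trailing path. `Type`, `wrap`, and `split` are hypothetical names for illustration and are not the real `experimental/ast` or `experimental/parser` API. The sketch also assumes every leading identifier acts as a modifier, which the real parser does not.

```go
package main

import "fmt"

// Type is a hypothetical stand-in for an AST type node; it is not the
// real ast.TypeAny.
type Type struct {
	Prefix string // modifier keyword such as "optional"; empty for a plain path
	Inner  *Type  // the wrapped type when Prefix is set
	Path   string // the type name when Prefix is empty
}

// String renders the type back in source order.
func (t Type) String() string {
	if t.Prefix == "" {
		return t.Path
	}
	return t.Prefix + " " + t.Inner.String()
}

// wrap applies mods to ty so that mods[0] ends up outermost, which is
// why the loop walks the slice from the end.
func wrap(mods []string, ty Type) Type {
	for i := len(mods) - 1; i >= 0; i-- {
		inner := ty
		ty = Type{Prefix: mods[i], Inner: &inner}
	}
	return ty
}

// split decides how a run of identifiers divides into a type and a
// trailing path. The modifiers can only be applied once we know whether
// the last identifier is the path, so wrapping happens last.
func split(idents []string, wantPath bool) (Type, string) {
	name := ""
	if wantPath && len(idents) >= 2 {
		name = idents[len(idents)-1]
		idents = idents[:len(idents)-1]
	}
	if len(idents) == 0 {
		return Type{}, name
	}
	last := len(idents) - 1
	return wrap(idents[:last], Type{Path: idents[last]}), name
}

func main() {
	run := []string{"optional", "optional"}
	ty, name := split(run, true)
	fmt.Printf("type=%q path=%q\n", ty.String(), name)
	ty, name = split(run, false)
	fmt.Printf("type=%q path=%q\n", ty.String(), name)
}
```

With a path requested, `optional optional` yields a path type `optional` and a path `optional`; without one, the same tokens yield the prefixed type `optional optional` and an empty path.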

jhump · 2024-12-11T19:43:11Z

experimental/parser/testdata/parser/def/ordering.proto

+
+    M x returns (T) returns (T);
+    M x [foo = bar] returns (T);
+    M x { /* ... */ } returns (T);


As mentioned before, this looks to me like it should rather be parsed as `M x { }` followed by `returns (T);`. For one, I don't think this particular error is that likely; for another, I think it would make `parseDef` easier to read and understand.

This is a much more interesting issue when the thing after the `{}` is compact options. Making it work just for compact options but not for the others is probably more work than what I've done in the refactor of `parseDef`.

jhump · 2024-12-16T14:51:38Z

experimental/parser/parse_def.go

+		}
+	}
+
+	// If we didn't see any braces, this def needs to be ended by a semicolon.


This still doesn't seem right. I would think we only allow omitting the semicolon if the last thing was a body. So `foo { ... } returns (bar)` should still expect a semicolon.

I think the condition should instead be:

    var isBody bool
    if lastFollower >= 0 {
    	_, isBody = defFollowers[idx].(defBody)
    }
    if !isBody {
    	// expect semicolon...

I'm still worried about the complexity of this logic -- it makes it much less intuitive exactly what the parser will/should parse, like when semicolons are expected or not, etc.

While a fuzz tester could be used to try to test for all cases, the result will still be a pain to maintain. I would much rather trade away some of these diagnostics for greater certainty of correctness and maintainability.
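The rule proposed in this comment (a semicolon may be omitted only when the last clause is a body) can be sketched in isolation. The `follower` interface and the `defReturns`, `defOptions`, and `needsSemicolon` names are assumptions made for this example; only `defBody` appears in the comment above, and its real definition may differ.

```go
package main

import "fmt"

// follower is a hypothetical stand-in for a clause that may trail a
// definition: a returns clause, compact options, or a body.
type follower interface{ isFollower() }

type defReturns struct{}
type defOptions struct{}
type defBody struct{}

func (defReturns) isFollower() {}
func (defOptions) isFollower() {}
func (defBody) isFollower()    {}

// needsSemicolon reports whether a definition must end with ';'.
// Only a body in the final position lets the semicolon be omitted.
func needsSemicolon(followers []follower) bool {
	if len(followers) == 0 {
		return true
	}
	_, isBody := followers[len(followers)-1].(defBody)
	return !isBody
}

func main() {
	// foo { ... } returns (bar): the body is not last, so ';' is required.
	fmt.Println(needsSemicolon([]follower{defBody{}, defReturns{}}))
	// foo returns (bar) { ... }: the body is last, so ';' may be omitted.
	fmt.Println(needsSemicolon([]follower{defReturns{}, defBody{}}))
}
```

Keeping the decision in one small predicate is one way to make it obvious when a semicolon is expected, which is the readability concern raised here.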

Well, there is a test that is intended to exhaustively test the O(n^2) things this code handles. IDK, IME the way to deal with this in any compiler is to just exhaustively test everything, and use fuzzing to generate test cases. This is why I built the golden test framework the way I did: I expect to write A LOT more tests.

Once the new compiler is working end to end I plan to walk through the whole spec and write test cases for everything mentioned therein.
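As an illustration of what exhaustively covering the O(n^2) clause orderings might look like, the sketch below enumerates every ordered pair of trailing clauses as a candidate test declaration. `orderedPairs` is a hypothetical helper and is not how the test corpus in this PR was produced; the clause strings are taken from the `ordering.proto` excerpt above.

```go
package main

import "fmt"

// orderedPairs returns one candidate declaration for every ordered pair
// of clauses, including a clause paired with itself, giving n*n inputs.
func orderedPairs(prefix string, clauses []string) []string {
	out := make([]string, 0, len(clauses)*len(clauses))
	for _, a := range clauses {
		for _, b := range clauses {
			out = append(out, prefix+" "+a+" "+b+";")
		}
	}
	return out
}

func main() {
	clauses := []string{"returns (T)", "[foo = bar]", "{ /* ... */ }"}
	for _, decl := range orderedPairs("M x", clauses) {
		fmt.Println(decl)
	}
}
```

Each generated line could then be fed to a golden test, so that a change in how any ordering is parsed or diagnosed shows up as a diff.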

experimental/parser/parse_def.go

jhump · 2024-12-16T15:02:07Z

experimental/parser/parse_def.go

+	case p.args.Type.Nil():
+		return taxa.EnumValue


This is a bit counter-intuitive. I think the keyword of the enclosing element should dictate whether we interpret this as a field or enum. So a declaration like `foo = 1;` should not (IMO) be interpreted as an enum value if it's inside a message -- it's clearly a mangled field instead.
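A minimal sketch of the classification suggested here, where the enclosing element decides first and the presence of a type only refines the result. The `parent` type, its constants, and `classify` are hypothetical and do not correspond to the `taxa` package or the real parser code.

```go
package main

import "fmt"

// parent is a hypothetical description of the enclosing declaration.
type parent int

const (
	parentMessage parent = iota
	parentEnum
)

// classify decides what a declaration shaped like `name = number;` is.
// The enclosing element is consulted before the shape of the declaration.
func classify(p parent, hasType bool) string {
	switch {
	case p == parentEnum:
		return "enum value"
	case hasType:
		return "field"
	default:
		// Inside a message, a missing type means a malformed field,
		// not an enum value.
		return "field (missing type)"
	}
}

func main() {
	fmt.Println(classify(parentMessage, false)) // foo = 1; inside a message
	fmt.Println(classify(parentEnum, false))    // foo = 1; inside an enum
}
```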

experimental/parser/parse_expr.go

mcy force-pushed the mcy/parser2 branch 3 times, most recently from f579da3 to 60f10a1 Compare November 25, 2024 20:34

mcy requested a review from jhump November 25, 2024 20:34

mcy marked this pull request as ready for review November 25, 2024 20:34

mcy force-pushed the mcy/parser2 branch 8 times, most recently from 55372b6 to 4fe7cad Compare December 2, 2024 22:30

jhump reviewed Dec 3, 2024

mcy mentioned this pull request Dec 4, 2024

Refactor prerequisites for #370 #381

Merged

mcy added 4 commits December 5, 2024 13:46

add taxa

e295c4f

is.go -> classify.go

0a2a17c

merge updates

8737b48

missed diffs for iters

c2701f2

mcy force-pushed the mcy/parser2 branch from 13c6e01 to a73a82c Compare December 5, 2024 22:09

mcy requested a review from jhump December 5, 2024 22:10

mcy added 3 commits December 5, 2024 14:22

🤦

ce41282

parser

ee92ee1

tests

b766dab

mcy force-pushed the mcy/parser2 branch from a73a82c to b766dab Compare December 5, 2024 22:22

jhump reviewed Dec 11, 2024

mcy added 2 commits December 11, 2024 15:28

cr

e4aa648

Merge remote-tracking branch 'origin/main' into mcy/parser2

ff242ec

lint

4c2f04f

mcy requested a review from jhump December 13, 2024 19:05

jhump reviewed Dec 16, 2024

mcy added 3 commits December 16, 2024 12:10

Merge remote-tracking branch 'origin/main' into mcy/parser2

8a93feb

cr

ad4b604

update tests

e57743b

mcy requested a review from jhump December 16, 2024 22:54

jhump approved these changes Dec 18, 2024

Merge remote-tracking branch 'origin/main' into mcy/parser2

1230042

mcy enabled auto-merge (squash) December 18, 2024 19:13

mcy merged commit 12612dc into main Dec 18, 2024
9 checks passed

mcy deleted the mcy/parser2 branch December 18, 2024 19:18
