-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement a new parser on top of the new lexer #370
Conversation
f579da3
to
60f10a1
Compare
55372b6
to
4fe7cad
Compare
This PR contains all changes from #370 that are not the parser nor its tests. This PR has been separated out to break review dependencies. New additions include: - `ast.File.ToProto`, which is used for constructing a JSON representation of the AST for golden diffing. - `ast.Path` has new operations, including path splitting. - `internal/iters`, since I keep adding iterator helpers. - `internal/ast2`, which contains internal, layout-dependent AST operations. - `report.Renderer` will now show the backtrace at which a diagnostic was created if it panics during rendering. - `report.Snippet` will now discard snippets that have a nil span. - Fixed bugs in `ExprAny`/`TypeAny` conversions.
I've rebased this PR over #385. The last few commits are the reviewable content of this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I scanned the test cases. They are so numerous and long that they are hard to review with full attention -- easy to get mentally exhausted staring at them.
The more important points I'd like to make are:
- This structure feels like it might be hard to change to improve error handling.
- Some of the code is complicated enough that it's hard to really verify that it's going to accept/enforce all aspects of the actual grammar. It obviously is lenient, but it's hard to tell if there are edge cases where it fails to accept something that is actually valid. I've tried to add comments to places where I could tell it's not right, but it's hard to really see everything.
As far as the two points above, and any comment I left in that vein that feels like it might warrant a non-trivial overhaul: I am not asking that we make a lot of changes now. I'm just noting my concerns and think it will be easier to handle (in particular, it will be easier for me to review) with incremental changes/improvements after merging this. I know getting more sophisticated with error recovery may require quite a lot of intrusive changes, but I still think that's better as a follow-on than trying to tackle it in this PR.
// Finally, apply any remaining modifiers (in reverse order) to ty. | ||
for i := len(mods) - 1; i >= 0; i-- { | ||
ty = p.NewTypePrefixed(ast.TypePrefixedArgs{ | ||
Prefix: mods[i], | ||
Type: ty, | ||
}).AsAny() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logically, I think this belongs above: after we've process any generic type arguments but before we look for an optional trailing path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can't, and the comment in the previous stanza explains why. If we parse optional optional
, and we want to return a path, we return optional, optional
: a path type and a path. But if we don't want a path, we would return optional optional, nil
: a prefixed type and a nil path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably it could also return optional optional optional
as optional optional
type and optional
path? That is allowed in the language. I think the loop that adds prefixes would greedily capture all, so I think this would work, too.
|
||
M x returns (T) returns (T); | ||
M x [foo = bar] returns (T); | ||
M x { /* ... */ } returns (T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned before, This looks to me like it would rather be parsed as M x { }
followed by returns (T);
. For one, I don't think this particular error is that likely; and two, I think it would make parseDef
easier to read and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a much more interesting issue when the thing after the {}
is a compact options. Making it work just for compact options but not for the others is probably more work than what I've done in the refactor of parseDef.
} | ||
} | ||
|
||
// If we didn't see any braces, this def needs to be ended by a semicolon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still doesn't seem right. I would think we only allow omitting the semicolon if the last thing was a body. So foo { ... } returns (bar)
should still expect a semicolon.
I think the condition should instead be:
var isBody bool
if lastFollower >= 0 {
_, isBody = defFollowers[idx].(defBody)
}
if !isBody {
// expect semicolon...
I'm still worried about the complexity of this logic -- it makes it much less intuitive exactly what the parser will/should parse, like when semicolons are expected or not, etc.
While a fuzz tester could be used to try to test for all cases, the result will still be a pain to maintain. I would much rather trade away some of these diagnostics for greater certainty of correctness and maintainability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, there is a test that is intended to exhaustively test the O(n^2) things this code handles. IDK, IME the way to deal with this in any compiler is to just exhaustively test everything, and use fuzzing to generate test cases. This is why I built the golden test framework the way I did: I expect to write A LOT more tests.
Once the new compiler is working end to end I plan to walk through the whole spec and write test cases for everything mentioned therein.
case p.args.Type.Nil(): | ||
return taxa.EnumValue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit counter-intuitive. I think the keyword of the enclosing element should dictate whether we interpret this as a field or enum. So a declaration like foo = 1;
should not (IMO) be interpreted as an enum value if it's inside a message -- it's clearly a mangled field instead.
This change adds a new Protobuf parser to the
experimental/parse
package. It also includes a corpus of tests for this parser, some of which currently pass (incorrectly, due to the missing legalizer).