-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
decouple parser from PNode #425
Conversation
0ca4f69
to
8bb510d
Compare
8bb510d
to
dbe2edd
Compare
@saem this is a first pass on the implementation, which uncovered some implementation questions that I wrote down in changes - look them over and tell me about your thoughts on direction/solution. I'm considering whether I should also include the change to the DOD lexer in this PR or split it into a separate subsequent one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got them all, the PR is already really starting to shape up.
# consistently copies this information around anyway, | ||
# so I will let it stay this way for now. | ||
token*: Token # TODO Replace full token value with an index information | ||
kind*: TNodeKind # NOTE/QUESTION - for now the same kind of nodes is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, or at least one will be a super set of the other. Ultimately separate enums are likely best.
|
||
# HACK explicit flags in order to track down all 'extra' information | ||
# that is collected during parsing. | ||
isBlockArg*: bool # QUESTION add 'nkStmtListBlockArg' or similar node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You found it too!
OK, so this dumb thing is used to tell the semantic analysis layer about a statement list (possibly also block) that are being passed to a macro, template, whatever. During semantic analysis we collapse/flatten nested statement lists into one level, but we also do statement list unwrapping, so if there is only one node in a statement list we unwrap it except if there is a defer (I think this might be a bug/design flaw).
That first flattening is guarded by this dumb flag. I personally think we should always flatten, no matter what, but never unwrap in semStmtList (it's the last/second last proc in semstmts, I rewrote it recently so it should be readable). Without that guard a macro can receive arbitrary chunks of AST not always wrapped in a statement list and it makes the really annoying/painful to write.
I do believe between fixing the unpacking logic and the AST the parser generates this shouldn't be a problem. My guess at the most correct fix is sematic analysis layer should:
- unwrap as determined by the receiving node, so be done on add/index assign based on receiving node kind
- flattening should always take place in semantic analysis of a statement list because that's just always right
Then this dumb guard should not be necessary. Separating the AST used by sem, parsing, and backend means we can change add and index assign behavior and it only impacts sem so we can do destination based unwrapping rather than have it live incorrectly in semStmtList. 🎉
Sorry that was a rather lengthy explanation.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
@@ -97,6 +97,8 @@ const | |||
callableDefs* = nkLambdaKinds + routineDefs | |||
|
|||
nkSymChoices* = {nkClosedSymChoice, nkOpenSymChoice} | |||
nkFloatKinds* = nkFloatLiterals # QUESTION remove float literals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, but maybe I'm missing some key point? Whatchya thinking?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have node kinds for everything and only float literals. It is a naming question, and it is a set of node kinds, not a set of literals. Also nkLiterals
is inconsistent with Atomic kinds in the macros.nim
, but that's another question. I think nkLiteralKinds
, nkFloatKinds
or even nkFloatLiteralKinds
make more sense (the last one is the most sensible IMO, but a bit longer)
TBH I always found this enum set definitions a bit weirdly structured and lacking in some places, but it is hard to see the underlying structure immediately from the code, it must be an incremental improvement
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nimskull/compiler/ast/ast_query.nim
Line 687 in 837238f
proc `[]`*(node: PNode, pos: NodePosName): PNode = |
[]
named indexing operator), but for now that's just a new idea I got while elaborating on the "literals/kinds" distinction above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For nkFloatLiterals
vs nkFloatKinds
, the former is literals only and the latter might include more, but I can't think of a single thing not covered by the former. The is a difference in meaning but I'm really hard pressed to think of a case where they would not be equivalent. I would check the sites and consolidate personally.
nimskull/compiler/ast/ast_query.nim
Line 687 in 837238f
proc `[]`*(node: PNode, pos: NodePosName): PNode = I think those two must be intrinsically related to each other (the collection of the enum kind sets and the
[]
named indexing operator), but for now that's just a new idea I got while elaborating on the "literals/kinds" distinction above
That's exactly the types of insights that form by looking at this stuff.
I agree with the sentiment, the naming and categorization needs to make sense based on purpose and not solely because a new programmer's first instincts are to lump them together.
So to recap:
- naming needs to be more consistent
- naming needs to better convey intention
- sets need to serve a purpose(s), eg: many sites in the compiler need to handle all literals one way (
nkLiterals
) or lots of literal handling is weird about floats vs other numbers (nkFloatLiterals
)
Doesn't have to be those precise names but I think we're on the same page. For actually changing sets carefully looking through their usage is required because there are unfortunate subtleties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I answered a question and left some small suggestions regarding code style.
@saem I think the optimal plan would be to split in following steps (PRs)
|
This does look better and I'm sure it'll be revised more as we continue, let's do it. Additional benefit, smaller chunks, easier review, testing, etc.
Cool, that's what I was thinking things are shaping up around.
Yes, less hacks and more Hax. :D
Yup, will need to support:
But I'm assuming that's implied. The former is because we might be asked to handle various code fragments like repls and cases where we don't have a file perse. The latter is where we are formatting a subsection of some file. The amount of effort and support needed here might be near zero. Just highlighting two interesting cases besides the majority case of program/file(s) compilation.
Yup, then it would only need the left and right index and probably be done.
Based on what I could glean from the paper and @alaviss 's remarks I think knowing this is important for these reasons:
Yup, I think it'll be easier if the lexer has had a once over already. |
Step #5 is to add reduced parsed node set - based on the
|
d05b275
to
601b71c
Compare
601b71c
to
bb16d0a
Compare
c9f7f87
to
50a7ab1
Compare
7969466
to
98d807a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a pass on all the comments.
|
||
# HACK explicit flags in order to track down all 'extra' information | ||
# that is collected during parsing. | ||
isBlockArg*: bool # QUESTION add 'nkStmtListBlockArg' or similar node |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
The main change - introduce a `ParsedNode` type which replaces `PNode` in the parser. This change allows for further work on decoupling `sem` from other parts of the compiler, making it easier to implement improvements in a way that would not rip through the whole codebase and test suite. Right now introduced type closely mimics the `PNode` counterpart, but this is just a temporary measure for the transition period. This commit is a part of multi-step series - full list can be seen in the related issue nim-works#423 * Documentation changes - Add missing documentation for changes in the earlier commit, add more how-tos to the debugging section (I haven't coded in a while, so was especially important to write down explanations for anything I had trouble with) nim-works@602367b * Tangentially related refactoring work - Cleanup the `passes.nim` implementation a bit - despite common (at least seemingly shared by many of the previous authors of the codebase) misconception longer variable names actually *do* increase readability. Also infamous recommendations for the "structured programming" also do not really mesh with proliferation of `break` statements in the code. Add todo/bug comment for the main processing loop bug related to the phase ordering in `compiler/sem/passes.nim:234` * Debugging tools improvements - Implement `astrepr.nim` support for the `ParsedNode` and `PIdent` - `debug` and `treeRepr` procedures. - Allow skipping repeated symbol in the `(open|closed)SymChoice` node kinds in the `astrepr` - Restructure imports of the `astepr` and move it closer to the 'primitive' modules - type definitions and trivial data queries. The most important change is removal of the `ast.nim` and `renderer.nim` imports, which opens these modules for debugging as well. - Consider possibility of a nil `owner` in the symbol owner chain representation calculations in `astrepr` - Semantic tracer debug output file rotation now uses location of the first `.define(` call as a file name base instead of integer-based ones. Added basic logging information about created files - now a developer can see what is going on and what gets written. For example, running with `--define=nimCompilerDebugTraceDir=/tmp` and seveal `define(...)` sections produces the following output: ``` comparisons.nim(269, 8): opening /tmp/comparisons_nim_0 trace comparisons.nim(274, 7): closing trace, wrote 44 records comparisons.nim(276, 8): opening /tmp/comparisons_nim_1 trace comparisons.nim(285, 7): closing trace, wrote 329 records ``` - Simplify implementation of the `reportInst` handling in the debug utils tracer - now each toplevel tracer template must submit the location by itself - this solution avoids unintuitive and fragile `instLoc(-5)` call which might break with more templates introduced. Also updated documentation on the `reportInst` and `reportFrom` in the reports file. - compiler/front/options.nim:693 :: Unconditionally output debugging traces if they are requested, regardless of the surrounding hooks and filters. Introduce the `bypassWriteHookForTrace` flag in the debugging hack controller which makes it possible to bypass the `writeln` hook. * Further work - compiler/ast/parser.nim:744 :: introduce two tokens in order to handle custom literals. There is no real need to mash together everything in a single chunk of text that would have to be split apart down the line.
98d807a
to
6103e24
Compare
bors r+ |
Merging now, @haxscramper have a look at the unresolved comment related suggestions. It might be easiest to simply use the same branch, apply the suggestions, and then do an rebase onto devel. |
Build succeeded: |
follow-up PR for review comments from the nim-works#425
428: parser refactor follow-up code style fixes r=haxscramper a=haxscramper follow-up PR for review comments from the #425 Co-authored-by: haxscramper <[email protected]>
wip implementation of the #423