compiler: introduce an IR for the code generators #551
The operation now expects its operands to be `NimNode`s already. A new instruction (`opcDataToAst`) is introduced for creating the AST representation of VM data. While this also simplifies the VM a bit, the main reason behind the change is to avoid having to provide the full AST of the template call expression, as doing so is not possible when the code generator no longer operates on `PNode` AST.
Attempting to derive the first revision from the MIR was a mistake, as too many changes to the code generators are required for that to work. I'm going to revert all changes and start over, but now with a focus on keeping both the overall changeset and development of the new IR to a minimum. However, there is still some decoupling left to do before the work here can resume. This includes things like making dynlib handling part of the unified backend processing pipeline, or replacing/removing dependencies on routines that are not part of the code generators and use |
The `nfAllFieldsSet` flag stopped reaching the code generator with the introduction of the MIR, meaning that the condition always evaluates to 'true'.
The C and JS code-gen orchestrators were passing the AST produced for the main procedure directly to the code generators. This is no longer going to work once the code generators don't work with `PNode` anymore, so `canonicalize` is now used on the AST.
Instead of an integer-literal node, the procedure now accepts the value directly.
`astgen` is adjusted to produce `CgNode` instead of `PNode`. For this, multiple `PNode` -> `CgNode` translation procedures had to be introduced and `canUseView` + `flattenExpr` duplicated and adjusted for `CgNode`. The general processing logic stays the same. The module's document comment is also adjusted and an outdated mention of "sections" (they are called "regions" in the MIR) fixed. `astgen` as the name doesn't make much sense anymore and is going to be changed to something more fitting.
`canonicalize` and `generateAST` now return `CgNode` trees. For debug rendering, a `treeRepr` procedure for `CgNode` is added to the `cgirutils` module.
The instruction-emission procedures now accept a `TLineInfo` as input directly, instead of unnecessarily requiring a `PNode`. Wrappers that still use `PNode` are added for convenience.
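The overload-plus-wrapper arrangement described in this commit could look roughly like the sketch below. All names (`LineInfo`, `AstNode`, `Emitter`, `emit`) are hypothetical stand-ins and not the compiler's actual types or procedures.

```nim
type
  LineInfo = object      # hypothetical stand-in for `TLineInfo`
    line, col: int
  AstNode = ref object   # hypothetical stand-in for `PNode`
    info: LineInfo
  Emitter = object       # hypothetical; records what was emitted
    emitted: seq[LineInfo]

proc emit(e: var Emitter, info: LineInfo) =
  ## Core procedure: only needs the source location.
  e.emitted.add info

proc emit(e: var Emitter, n: AstNode) =
  ## Convenience wrapper for callers that still hold an AST node.
  e.emit(n.info)

var e = Emitter()
e.emit(LineInfo(line: 3, col: 1))                 # direct use
e.emit(AstNode(info: LineInfo(line: 7, col: 2)))  # via the wrapper
```

With this shape, callers that no longer have an AST node at hand only need a source location, while existing call sites keep compiling through the wrapper.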
All three code generators now use the `CgNode` IR. The changes to the modules are kept minimal in order to make review easier. As an additional way to keep the amount of changes smaller, the `compat` module is introduced.
It's obsolete now.
In addition, the module is moved to the `backend` directory.
i really like all the node simplification and rework, one question about it, did you already consider naming |
Hm, no, I didn't consider it. My thinking was that while it voids (discards) a value, it itself acts as a statement (returns no value), so, for consistency with the other statements, used the |
this really is an incredible milestone in decoupling sem and the backends, along with backend unification! 🎉
cnkAsgn     ## a = b
cnkFastAsgn ## fast assign b to a
# future direction: have ``cnkAsgn`` mean "assign without implying any
I wonder if a better distinction is assign vs initialize?
For context I'm simply broadly remarking upon nomenclature.
That's a good question. Everything related to assignment is very fuzzy at the moment, mainly because of "assignment" having different meaning depending on the used code generator (they don't operate on the same language level with regards to assignments).
Broadly speaking, `cnkAsgn` combines both "assign" (copy to non-empty destination) and "initialize" (copy to empty destination), while `cnkFastAsgn` means "create a shallow copy". Whether combining "assign" and "initialize" is a good idea, I'm not sure (it probably isn't), but `PNode` did it, so for initial compatibility, I carried it over.
Sounds like we're on the same page, needs some more thinking.
of cnkWithItems:
  childs*: seq[CgNode]
# future direction: move to a single-sequence-based, data-oriented design
@@ -704,12 +711,12 @@ proc genWhileStmt(p: PProc, n: PNode) =
   p.blocks[^1].isLoop = true
   let labl = p.unique.rope
   lineF(p, "Label$1: while (true) {$n", [labl])
-  p.nested: genStmt(p, n[1])
+  p.nested: genStmt(p, n[0])
The index change is because of the new structure of repeat vs while?
Yep, the loop's body was previously in the second slot, but now it's in the first (and only) one.
The module is named `cgirgen` now. `debug.rst` also contained outdated mentions of `PNode` being the IR of the code generators -- this is fixed too.
Thank you for the review, @saem!
/merge
Merge requested by: @zerbina
Contents after the first section break of the PR description have been removed and preserved below:
Summary
Add the `CgNode` intermediate representation, change `astgen` (which is renamed to `cgirgen`) to output it instead of `PNode`, and update all code generators to use `CgNode`. In order to keep the changes required for the transition low, the differences of the new IR compared to `PNode` are kept minimal.
The intent is to have an IR that can be evolved independently from sem and the macro API. Only `PNode` is replaced so far, but both `PType` and `PSym` are planned to also get a dedicated version for use by the code generators. In addition, `PNode` can now be evolved without the code generators having to be considered.
Details
The core of the changes is the introduction of `CgNode`. It, for now, also uses a `ref`-tree-based data representation and is very similar, in both structure and naming, to the `PNode` subset previously used by the code generators, but with some small simplifications/renamings already applied where it makes sense.
Naming differences:
- `nkNimNodeLit` -> `cnkAstLit`
- `nkCurly` -> `cnkSetConstr`
- `nkBracket` -> `cnkArrayConstr`
- `nkClosure` -> `cnkClosureConstr`
- `nkHiddenDeref` -> `cnkDerefView`
- `nkHiddenStdConv` -> `cnkHiddenConv`
- `nkDiscardStmt` -> `cnkVoidStmt`
- `nkExprColonExpr` -> `cnkBinding`
Structural differences:
- the `while` statement is replaced with the `repeat` statement, since conditional loops (i.e. `while` statements) don't exist after the MIR phase. A `cnkRepeatStmt` node has a single sub-node, which is the loop's body
- there are no `elif` or `else` branches for `if` statements, since those don't exist during and after the MIR phase. A `cnkIfStmt` has the same structure as an `nkElifBranch`
- a `cnkReturnStmt` node has no sub-nodes
- instead of `emit` being represented via a pragma statement, a dedicated statement (`cnkEmitStmt`) is used
- `cnkPragmaStmt`s don't have sub-nodes, but instead directly store the pragma's name
- `cnkObjConstr` doesn't have an extra type slot in the first position
- `int`, `uint`, `float`, and `string` types
- there is no distinction between `of` and `else` in `case` statements, both use `cnkBranch`
- there are no `nkVarSection|nkLetSection` + `nkIdentDefs` counterparts. Instead, definitions use the `cnkDef` node
Many more simplifications are possible, but they are left for future PRs.
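To make the loop-related differences above more concrete, here is a minimal sketch of a `ref`-tree node type in which a repeat statement has exactly one sub-node and an `if` has only a condition and a body. The type and all `sk`-prefixed names are hypothetical and heavily simplified; they are not the actual `CgNode` definition. The shape chosen for lowering a conditional loop (an unconditional loop that starts with a guarded `break`) is an assumption made for illustration.

```nim
type
  SketchKind = enum
    skIdent                    # leaf: a name
    skNot                      # [operand]
    skIfStmt                   # [condition, body]; no elif/else branches
    skRepeatStmt               # [body]; loops until a break is executed
    skBreakStmt, skReturnStmt  # no sub-nodes
    skStmtList                 # [stmt, ...]

  SketchNode = ref object
    case kind: SketchKind
    of skIdent:
      name: string
    of skBreakStmt, skReturnStmt:
      discard
    of skNot, skIfStmt, skRepeatStmt, skStmtList:
      kids: seq[SketchNode]

proc lowerWhile(cond, body: SketchNode): SketchNode =
  ## Rewrites `while cond: body` into
  ## `repeat: (if not cond: break); body` (assumed lowering).
  let guard = SketchNode(kind: skIfStmt, kids: @[
    SketchNode(kind: skNot, kids: @[cond]),
    SketchNode(kind: skBreakStmt)])
  result = SketchNode(kind: skRepeatStmt, kids: @[
    SketchNode(kind: skStmtList, kids: @[guard, body])])

proc loopBody(n: SketchNode): SketchNode =
  ## The body is the first (and only) sub-node of a repeat statement.
  doAssert n.kind == skRepeatStmt
  result = n.kids[0]
```

A consumer of such a tree reads the loop body from slot 0, which matches the `n[1]` to `n[0]` index change discussed in the review thread.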
Producing the IR
`astgen` is changed to produce `CgNode` instead of `PNode` IR. The `canUseView` and `flattenExpr` procedures for `PNode` are copied and adjusted for `CgNode`.
For literals, which are transported through the MIR phase as `PNode`s, translation-to-`CgNode` logic is added (`translateLit`). For integer-like literals, whether the output node is a `cnkIntLit` or `cnkUIntLit` node is decided based on the input node's type, not on its node kind -- this helps catch issues with incorrectly typed integer literals. For `float32` literals, `translateLit` already narrows the value.
Since the module's name now doesn't apply anymore, it is changed to `cgirgen`. In addition, the module is moved to the `backend` directory and `generateAST` is renamed to `generateIR` -- the module's documentation is also updated.
Code generation
All three code generators and their surroundings are adjusted to operate on `CgNode` instead of `PNode`, which for the most part means replacing occurrences of `PNode` with `CgNode` and adjusting for the structural differences listed above. Unrelated refactorings and clean-ups are kept to a minimum.
There are two sources of `PNode`s still reaching into the code generators: constants and type AST. When generating the code for constants, their AST is first translated to `CgNode` IR before being passed on to normal code generation, while for the type AST case, special-purpose routines that still use `PNode` are added.
Routines still used by the code generators that are only available for `PNode` are copied into the new `compat` module and adjusted to use `CgNode`. This is meant to be an interim solution, and they are planned to be phased out and removed again in the future.
Both the C and JavaScript code generation orchestrators passed the `PNode` body of the main procedure directly to the code generator. This doesn't work anymore, and it is replaced with canonicalizing (which produces a `CgNode` tree) the body first.