compiler: make `injectdestructors` a MIR pass #450
I think I understand it, going to write that out for confirmation: injectdestructors:
Prior to reading it, what I was looking for:
It seems you've already accounted for the majority of it -- if not all, and it's just my understanding falling short. What I'm not sure about is how we flow the info back up to Sem, in order for us to ensure we can flow constraints (mostly type info) down from Sem and warnings/errors telemetry back up. Apologies if it's just my lack of understanding of the code.
Your understanding is correct. I apologize for not having provided a high-level overview. I think I insufficiently described my motivation for this PR, and since that information can help in understanding some decisions behind the current implementation, here goes a second attempt. The main contributors to my decision to work on this PR first are as follows:
So to rephrase, the goal here is to introduce an IR that:
My current plan for how the flow should look in the near term:
Using
The main reason behind the
A
I'm not sure I understand. In which case is it required for information to flow back from the MIR to Sem? Sorry if it's obvious. |
Thank you for the detailed explanation. The motivations make a lot of sense and the overall flow is sensible. I didn't need it, but consider me extra convinced. 😁
Somewhat of a guess on my part: since we're doing lifetime analysis here, any invariant enforcement would also need to be here? Possibilities aside, at the time of my prior comment I had done a very quick search in the current inject destructors to look for it. Reading between the lines, we'll likely end up with origin being tracked, and nothing about the existing approach seems to make that an impossibility. I think you nailed it. 🎉
The most recent commit contains a partial redesign of the MIR. The previous iteration was a simple, linear-tree-based IR, structurally similar to the AST. One of the main differences compared to the AST is that complex expressions are flattened. For example:

```nim
let x =
  if a: true
  else: false
```

gets translated to the MIR equivalent of:

```nim
var tmp: bool
if a:
  tmp = true
else:
  tmp = false
let x = tmp
```

A significant issue with this is the loss of information about the control-flow dependencies between separate expressions. Consider:

```nim
proc fsink(a: sink string, b: string) = ...
fsink(notLastRead x, (if cond: y else: z))
```

which got transformed to the MIR equivalent of:

```nim
var tmp: string
if cond:
  tmp = y
else:
  tmp = z
fsink(notLastRead x, tmp)
```
One solution to this problem would be to assign every argument expression to a temporary first, use the temporaries as the actual arguments, and then inject the copy (or any other operations performed as part of argument passing) before the assignment to the relevant temporary. There also were other issues, so I opted for a partial redesign instead.

The core of the redesign is a shift to a representation where the focus is more on the flow of values. Expressions are no longer tree-based, but are instead sequences of operations applied to lvalues and values. Each sequence starts with an input (e.g. a literal or the name of some entity), is followed by zero or more in-out operations, and ends in a sink. A sink is a kind of operation that doesn't produce a value.

As a simple demonstration:

```nim
doSomething(x, "a", (echo "a"; y.b.c.int))
```

would be represented in the MIR as:
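The original rendering of the node sequences did not survive here, but the shape of the idea can be sketched with invented names (none of these correspond to real MIR node kinds): each expression becomes its own chain of steps, from an input through in-out operations to the point where the value is consumed.

```nim
# Invented step/operation names -- a rough picture of
# "input -> in-out operations -> sink", not the real MIR encoding.
type Step = tuple[op, operand: string]

let flows: seq[seq[Step]] = @[
  @[("input", "x"), ("arg", "0")],                      # x
  @[("input", "\"a\""), ("arg", "1")],                  # "a"
  @[("input", "\"a\""), ("call", "echo")],              # echo "a" (ends in a sink)
  @[("input", "y"), ("path", ".b"), ("path", ".c"),
    ("conv", "int"), ("arg", "2")],                     # y.b.c.int
  @[("input", "doSomething"), ("callee", "")]           # the callee is an input too
]

# reading each chain left-to-right follows the flow of one value
doAssert flows[3][^1].op == "arg"
```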
Note that the callee procedure is now also only an argument to the call operation. Reading the above top-to-bottom, left-to-right yields the nodes in the order they're placed in memory. The arrows are only meant for visualizing the flow of values. For documentation purposes, I wrote down the new grammar as a doc comment.

I originally wanted path expressions to use their own IR, mainly because this would allow for de-duplicating them and assigning each one a unique ID, making comparing two path expressions for equality very efficient. The separate IR is also how they're currently implemented, but none of the expected benefits have materialized so far (comparing two paths for equality hasn't come up yet). There are many not-yet-solved issues, and de-duplicating paths is complicated by the fact that conversions can be part of them.

In general, there are still a lot of open questions, and some constructs, such as user-ops, are under- or unspecified. Nonetheless, I believe that in its current state, the refactored MIR is already better than the previous iteration.

Related to the MIR changes, there's an experiment with a small template-based DSL for making MIR generation easier. I also made general progress in regard to the actual `injectdestructors` pass. There are some pre-existing issues with some of the rewrite-rules, mainly in how they interact (or not) with unstructured control-flow. I'll create tests for them, but am probably not going to fix them as part of this PR.
I'm a little (very) sleep deprived, so my review wasn't very insightful I gather -- feel free to ignore the review comments if they're unhelpful. I've read the code as best as I can and revisited your comment; I think the new approach is better (operations on l-values). I'll take another peek tomorrow -- hopefully I'll be able to understand things better then. I wouldn't consider that a blocker. Aside: a minor thing is that perhaps there might be some value in differentiating between
compiler/sem/mirtrees.nim
```nim
## This module contains the type definitions for the mid-end IR (MIR) together
## with routines for querying them.
##
## Routines for traversing the used flat-tree-structure are also located here.
```
Bringing over the high level description of the MIR approach from your recent PR comment would be a nice to have.
I'm popping back and forth because my crappy working memory has to continually remind itself of what's up. 😅
compiler/sem/mirtrees.nim
```nim
import
  compiler/ast/[
    ast_types
  ]
```
Casual remark:
I had a fun little realization. I know this module is the entry point, in terms of learning, for the MIR "package" because it's got basically no dependencies. Paired with some instructive comment that's a pretty nice cue in terms of navigating this bit of the compiler.
compiler/sem/mirconstr.nim
```nim
import
  compiler/ast/ast_types,
  compiler/sem/mirtrees
```
Any particular reason to separate these out from mirtrees?
The construction procedures and utilities are only needed when generating new MIR code. For just comprehending and processing MIR (e.g. mirtraverse.nim, astgen.nim, etc.), they're not needed, so I thought it'd make sense to put them into a separate module.
Fair enough, I can go either way on this sort of thing. Maaaaybe consider renaming it mirtrees_constr.nim so it feels a bit subordinate to the other module. Might hint at "don't read me first".
I'm suggesting small bits like doc comments right now as I'm refreshing my memory; thankfully the complexity is largely in the subject matter and not the implementation. 😄
Observations thus far:
- the code is clean and I feel like if there were issues I could test and fix them
- similar to above, I can see myself or others extending it
- this obviously took a lot of effort and it shows, thank you thank you!
- my confidence in the change is mostly around CI at this point
🎉
I'm marking it approved. I'll keep reading and if I find areas of improvement that meaningfully simplify things or can easily be extracted, I'll note them in the PR.
```nim
## Stores information about the produced values plus the lvalue effects of
## operations
values: IndexedSeq[OpValue, ValueInfo]
# XXX: `values` currently stores an entry for each *node*. Not every node
```
Tables are really really really slow. :(
They absolutely tanked performance when I was first prototyping a DOD AST with them.
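As a minimal, hypothetical illustration of the contrast (identifiers invented, not from the compiler): a `Table` keyed by node IDs hashes on every access, while the DOD-friendly alternative is a dense `seq` indexed directly by the ID, which is a single array load.

```nim
import std/tables

type NodeId = int32

# Table-based side storage: every lookup hashes the key
var typesTab = initTable[NodeId, string]()
typesTab[NodeId(3)] = "int"

# DOD-style alternative: a dense seq indexed directly by the node ID
var typesSeq = newSeq[string](8)
typesSeq[3] = "int"

doAssert typesTab[NodeId(3)] == typesSeq[3]
```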
compiler/sem/mirexec.nim
```nim
## Computes the control-flow graph for the given `tree`. This is a very
## expensive operation!
```
Suggested change:

```nim
## Computes the control-flow graph for the given `tree`. This is a very
## expensive operation! The high cost is due to two essential reasons:
##
## 1. a control flow graph needs to materialize all paths for a given `tree`
## 2. the amount of allocations/bookkeeping necessary to do so
```
This is partly me just confirming my understanding; the first point I'm fairly confident about, and the second one is more me guessing by looking at the push/pop of sequences and the `Table` in `ClosureEnv`.
Aside, I spent a few minutes trying to reason about replacing that table and the best I could figure was maybe the following:
```nim
type
  ClosureEnv = object
    # ... existing bits except joins
    joinedNodes: seq[NodePosition]
    nodeIsJoined: PackedSet[NodePosition]

proc addJoin(env: var ClosureEnv, dest: NodePosition): JoinPoint =
  let (id, isNew) =
    if env.nodeIsJoined.containsOrIncl(dest):
      # already registered as a join point: look up the existing ID
      (env.joinedNodes.find dest, false) # likely more efficient with a reverse find
    else:
      env.joinedNodes.add dest
      (env.joinedNodes.len - 1, true)
  JoinPoint(id: id, node: dest, isNew: isNew)
```
This is mostly a test of my understanding and a bit of fun, I doubt it's an improvement.
I wouldn't say that the control-flow graph needs to materialize all paths for `tree`, rather only all edges (in the form of instructions). However, I might be misunderstanding what you mean.

Regarding the reason for why `computeCfg` is expensive: the last time I profiled it, by far the largest amount of time was spent inside the tree traversal routines. For a general perspective, using a `-d:release` compiler built with `--gc:orc` to compile itself with `--gc:orc` as the benchmark, more time (~15%) is spent in `computeCfg` than in `mirgen` (i.e. `genProcBody`).

I did another profiling session using the newest commit -- the six most time-consuming calls inside `computeCfg` are:
| Procedure | Time spent |
|---|---|
| `mirtrees.findEnd` | 0.401 % |
| `mirtrees.parent` | 0.224 % |
| `algorithm.sort` | 0.0476 % |
| `commit` | 0.0174 % |
| `mirtrees.findParent` | 0.0174 % |
| `addEdge` | 0.011 % |
The "Time spent" is a percentage of the total compilation time. Using either recursion or a skip-list (or both) could reduce the amount of time spent on tree traversal.
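The skip-list idea can be sketched as follows (all names invented, far simpler than the real `mirtrees` encoding): precompute, for every node of the flat pre-order buffer, the index one past its last descendant, so that skipping a whole subtree during traversal becomes a single lookup instead of a walk.

```nim
type
  NodeKind = enum nkLeaf, nkTree
  Node = object
    kind: NodeKind
    numChildren: int   # direct children, for nkTree

# flat pre-order buffer for: tree(leaf, tree(leaf, leaf))
let nodes = @[Node(kind: nkTree, numChildren: 2), Node(kind: nkLeaf),
              Node(kind: nkTree, numChildren: 2), Node(kind: nkLeaf),
              Node(kind: nkLeaf)]

proc computeEnds(nodes: seq[Node]): seq[int] =
  ## for each node, the index one past its last descendant
  result = newSeq[int](nodes.len)
  # process in reverse: each child's end is already computed when needed
  for i in countdown(nodes.high, 0):
    var e = i + 1
    if nodes[i].kind == nkTree:
      for _ in 1 .. nodes[i].numChildren:
        e = result[e]   # children start right after the node; hop over each
    result[i] = e

let ends = computeEnds(nodes)
doAssert ends == @[5, 2, 5, 4, 5]  # e.g. the inner tree at index 2 spans 2..4
```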
The snippet you posted is, apart from two problems that I believe are unintended, correct. I was curious and ran a comparison between your code and the current implementation. Instrumenting a release compiler that is booted with `--gc:orc` and using it to compile the `devel` compiler with `--gc:orc` yields:

| | Time taken (in milliseconds) | Allocations | Deallocations |
|---|---|---|---|
| `Table` | 151 | 35219 | 17779 |
| `PackedSet + seq + find` | 152 - 153 | 36865 | 19425 |
| `PackedSet + seq + rfind` | 154 - 155 | 36865 | 19425 |
The allocations really dictate performance; kinda sad that `PackedSet`s weren't able to get ahead. I'll have to look closer, especially once the AST starts to get the DOD treatment, as I'd been hoping they'd do a lot of heavy lifting in a few areas.
Thank you for the review @haxscramper and @saem! The |
Merging the MIR is progressing rather slowly, as there are a lot of separate fixes required first. The most recent ones are fixes to the JS code-generator that are blocking the merging of #498, which is the PR with the fixes for the issue that is making bootstrapping fail in the Linux CI for this PR (the bug causing the
The PR is now rebased onto As for the test failures, the relevant ones are caused by the temporary injection pass breaking the assumptions of
Since the fix does not depend on this PR and requires changes to
After fixing the remaining bugs (the backward traversal algorithm didn't work for nested loops, which I only noticed due to a single test failing with Valgrind reporting a leak) all tests pass, and I consider this PR to be done (apart from squashing and writing a commit message).
While fixing many bugs the old implementation had, a few new ones were introduced. I noted them down, together with their impact, via review comments, but in general I think that none of them are severe enough to block this PR. The most critical one can be hacked around with a medium amount of effort, though I'd prefer not to do so as part of this PR.
```nim
# FIXME: non-lvalue-conversions also count as reads, but because no
#        distinction is being made between lvalue- and value-conversions
#        at the MIR level, they're currently not considered. This is an
#        issue and it needs to be fixed
```
This is a small pre-existing issue that `dfa.nim` also has. In the current implementation it's only a cosmetic issue, but it should eventually be fixed nonetheless. I postponed it, since the fix requires adjustments to the MIR.
Example:

```nim
var x = float(0.0)
discard int(x)
# `x` is treated as being used when control-flow reaches the `discard`
# statement, not when control-flow reaches the conversion
```
So, paired with very aggressive/eager free placement, this could introduce a bug where we free before the last use? I agree that this is minor; IIUC we're not that aggressive currently, nor would we want to be in the future.
It's very likely that there's never going to be an issue, even if we're freeing very aggressively. The value resulting from a non-lvalue conversion is always used directly after, and there's nothing in between where a destruction could be placed.
```nim
# Another possible improvement is moving the actual hook injection
# to a follow-up pass. The pass here would only inject ``mCopyAsgn`` and
# ``mSinkAsgn`` magics, which the aforementioned follow-up pass then
# expands into the hook calls. This would simplify the logic here a bit
```
It is almost certain that this is the approach (without the magics) that I'll go with when rewriting/refactoring the pass. In my opinion, the benefits (composes better, much simpler logic, easier to understand) far outweigh the downsides (one more pass -> likely slower).
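A toy sketch of that split, with all names invented: the first pass only records *what kind* of assignment each one is, and a second, trivially simple pass expands each abstract marker into the concrete hook call. Each pass stays small, and other rewrites can be slotted in between.

```nim
type
  AsgnKind = enum akCopy, akSink
  Instr = object
    kind: AsgnKind
    dst, src: string

# pass 1: only classify; no hook calls are materialized here
proc classify(isMove: seq[bool], dsts, srcs: seq[string]): seq[Instr] =
  for i in 0 ..< isMove.len:
    result.add Instr(kind: (if isMove[i]: akSink else: akCopy),
                     dst: dsts[i], src: srcs[i])

# pass 2: expand the abstract markers into the concrete hook calls
proc expand(instrs: seq[Instr]): seq[string] =
  for it in instrs:
    case it.kind
    of akCopy: result.add "=copy(" & it.dst & ", " & it.src & ")"
    of akSink: result.add "=sink(" & it.dst & ", " & it.src & ")"

let code = expand(classify(@[false, true], @["a", "b"], @["x", "y"]))
doAssert code == @["=copy(a, x)", "=sink(b, y)"]
```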
Agreed. I suspect pass-logic coalescing, which fuses a bunch of these operations, will be easier if each pass's logic is very simple/straightforward. Not to mention, it's likely easier to get some quick wins with a bit of pass-composition-based optimization than to maintain a bunch of very carefully hand-tuned implementations.
```nim
# FIXME: discriminant assignment lowering also needs to be disabled
#        when generating code meant to run at compile-time (e.g. inside a
#        macro)
```
This is going to be a problem once the `injectdestructors` pass is also used on code meant for running in the VM, but since that's not the case yet, no action is required for now.
Agree that it can be deferred. Moreover, that's likely a good thing; I have a funny feeling about handling the same thing at different levels.
```nim
# TODO: with this approach, either the last branch needs to fork
#       control-flow to the next exception handler or ``mirgen`` has to
#       introduce a catch-all handler
```
This is the second most severe issue (and is somewhat pre-existing). In the following example:

```nim
try:
  try:
    discard # 1
    raise SomeError.newException("")
  except OtherError:
    discard
except SomeError:
  discard # 2
```

there will be no path between 1 and 2 in the control-flow graph. The current plan for fixing this is to, as part of `transf`, add what amounts to `except: raise` to an `nkTryStmt` that has at least one exception handler but not a "catch-all" one. Not only would that fix the issue in question here, there's also logic in the C and JS code-generators that can be removed then.
Handling it in a common place makes sense. Although, I'm not sure I could quite put together the proposed solution; would it effectively be this?

```nim
try:
  try:
    try:
      discard # 1
      raise SomeError.newException("")
    except OtherError:
      discard
  except SomeError:
    discard # 2
except:
  raise
```
Almost. The proposed solution would look like this:

```nim
try:
  try:
    discard # 1
    raise SomeError.newException("")
  except OtherError:
    discard
  except:
    raise
except SomeError:
  discard # 2
except:
  raise
```

This is roughly what the C and JS code-generators already do, and the control-flow graph generation logic would generate the correct edges without requiring any changes to it.
```nim
var saved
var :tmp_1 = sibling[].right
`=copy`(saved, :tmp_1)
var :tmp_2 = sibling[].right
```
Although it looks like it, no copy is created here. The `var :tmp_2 = ...` is actually a `var :tmp_2: var T`, but there's an `Empty` node in the `IdentDefs` type slot, so nothing is rendered.
Hmm, wish there was a way to comment it without mucking up the input. 🤔
```nim
while readLine(f, res):
  block label_1:
    while true:
      if op(readLine(f, res)):
```
Since `astgen` has no way of knowing the name of the original symbol (apart from maybe inspecting the source node), it turns `mnkMagic` MIR nodes into symbols with the name `op`.
This is going to require far more fundamental fixes, outside the scope of this PR, of course.
```nim
# a literal constructor expression (which ``initValue`` also expands to)
# not used *directly* in a consume context only produces a non-owning
# container
# TODO: this doesn't match the behaviour the specification describes (i.e.
#       constructor expression having the same semantics as procedure calls
#       in the context of lifetime hook injection). The behaviour here
#       should either be explicitly added to the spec or changed
```
This is pre-existing behaviour that's not described by the specification.
It's useful to know if it changed. Noting that this behaviour is not per the spec, and that one or the other should change, is the right thing to do. I want to have more of a think about what it means; then we can discuss what makes sense. Documenting it and leaving it as-is is good!
```nim
knownIssue: '''
  Semantic analysis duplicates the analysed argument into each usage
  inside the template. Since it contains definitions of local variables,
  multiple definitions of the same entity exist in the expanded code, which
  is illegal at the MIR level, thus making the `injectdestructors` pass fail
'''
```
I decided to disable this test for now. It's the most severe issue, but there are so many hacks in `mirgen` already that it felt wrong to introduce yet another one. For now, an entity being defined multiple times will raise an internal compiler error when the code reaches the `injectdestructors` pass.
This sounds like it should be fixed in semantic analysis, rather than hacking up `injectdestructors`. My next two days are really busy, but I could peek at the sem and eval-template code on the weekend; not sure what the fix will entail, though.
Yep, semantic analysis is the correct place to fix this. The issue here was what originally motivated me to look into the macro output sanitizer. Turning the typed AST passed to a `typed` template parameter into untyped AST before inserting it into the body could fix the problem, but work on the sanitizer is currently stuck due to issues with gensym'ed symbols.
## Summary

The previous `injectdestructors` pass operated on `PNode` AST, and while it should have been able to work with the AST as generated by `astgen`, it didn't, which required the canonicalization step to be disabled when the `injectdestructors` pass was needed. Since the goal is to have all code-generators operate on either the MIR or another IR derived from the former, it is necessary that the MIR -> AST step happens immediately before code-generation, with no further transformation in-between.

The pass is rewritten from the ground up, fixing many issues / bugs the previous implementation had, and also making the pass significantly more time and memory efficient than it was previously. In addition, the pass generates more efficient code due to the smarter omission of `wasMoved` and `=sink` calls.

An additional change is that calling a procedure that only raises `Defect`s is now also considered as having control-flow related side-effects. They previously weren't, which meant that locations sometimes weren't cleaned up (via invoking the destructor) when raising a `Defect`.

## Fixes

Assignments to locations with destructors where the right-hand side is a complex expression now respect the strict left-to-right evaluation order:

```nim
var i = 0
s = "str"

# works now:
(i = 1; s) = if i == 1: "str2"
             else: doAssert false; ""

# also works now:
proc p(s: var string): var string =
  i = 2
  s

p(s) = block:
  doAssert i == 2
  "str2"
```

A mutation of the argument is now considered to happen when control-flow reaches the call, not when the argument is passed to a `var` parameter. The difference is subtle but important:

```nim
proc f_sink(x: var seq[int], y: sink seq[int]) =
  y[0] = 2
  doAssert x == [1, 2] # passes

proc main() =
  var s = @[1, 2]
  f_sink(s, s) # a full copy is now introduced for the second argument

main()
```

Globals are now only moved if it can be proven that doing so does not affect program semantics. Previously, assigning a location derived from a global would perform a destructive move in certain circumstances, such as:

```nim
# at module scope
var a = "str"
var b = a # moved

proc p() = discard

echo a # echoes an empty string
```

Due to the way the new implementation works, it is now possible for locations derived from globals to be consumed (if safe) when passing them to `sink` parameters.

As a side-effect of operating on the MIR, the alias analysis now handles tuple access correctly:

```nim
proc prc(x: var string) =
  var tup = (x: 0, y: "str")
  # `tup[1]` and `tup.y` were treated as two disjoint locations,
  # causing the assignment below to be erroneously optimized into a
  # move
  x = tup[1]
  doAssert tup.y == "str"

var o: string
prc(o)
```

## Details

In contrast to the previous implementation, the new one splits the pass up into multiple sub-passes. This makes the code easier to reason about, more modular, and reduces the amount of state each sub-pass has access to.

The core part, the lifetime-tracking hook injection, is split into a *setup*, *analysis*, and *application* step (executed in that order). During the *setup* step, a control-flow graph of the code is computed and a database about the entities (i.e. locations) relevant to the pass is built. Using both of them, the *analysis* step (which is itself split up into multiple sub-steps) computes, for all values appearing in a *consume* context (e.g. argument to a `sink` parameter, right-hand side of an assignment), whether they're owned, and what locations still store a value at the end of their scope (and thus need destruction). Finally, the *application* step uses all previously gathered information to rewrite assignments into copy or sink hook calls, inject copies for unowned values passed to `sink` parameters, and, where necessary, inject calls to the destructors.

For both simplicity and efficiency, the assignment rewriting and copy injection both perform additional data-flow analysis in order to omit unnecessary calls to `wasMoved` and to turn sink assignments into shallow assignments (i.e. `FastAsgn`).

### Data-flow analysis

Compared to the data-flow analysis implementation (`dfa.nim`) used previously, definitions (`def`) and uses (`use`) are not encoded in the control-flow graph but rather stored separately. The new forward analysis also doesn't have quadratic time and memory complexity in the case of loops.

How data-flow analysis is employed is also different: instead of computing the last-reads and first-writes of *all* locations during a single traversal of the full control-flow graph, they're now computed *on-demand*. For example, to figure out whether the value coming from an lvalue expression is owned (in a consume context), forward data-flow analysis is performed to check whether a further read is reachable.

### Additional changes

- fix `mirchangesets` not compiling
- fix multiple of the sets defined in `mirtrees` having the wrong elements
- fix the `indexLike` tag routine for `NodeInstance` not being exported from `mirtrees`
- fix `parentEnd` yielding the wrong node in some cases
- add a procedure for querying the enum of a type's equality operator magic without the symbol
- add the `OrdinalSeq` type to the `utils/containers.nim` module
- add the `Cursor` type; meant as a temporary placeholder for immutable views
- add multiple new destructor-related test cases
- remove the `nfLastRead` and `nfFirstWrite` node flags
- remove the `sfSingleUsedTemp` symbol flag
- remove the `rsemUncollectableRefCycle` warning. It had questionable utility and removing it was simpler than making it work again. Should there exist the desire to bring it back, the problematic assignment should be detected in `sempass2`
- remove the `optimizer` module
- flag the `arc/tstrformat.nim` test as a known issue

### Timings

Timings and peakmem are taken from compiling the compiler itself with the options `-d:release --compileOnly` and no cached compilation artifacts. The original compiler is built with `-d:release --exceptions:goto`.

|compiler from|bootstrapped with|using|time taken|peakmem|
|:------------|:----------------|:----|:---------|:------|
|#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB|
|this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB|
|#d7a70238d0|`--gc:refc`|`--gc:orc`|45.206s|1.99GiB|
|this commit|`--gc:refc`|`--gc:orc`|29.531s|1.222GiB|
|#d7a70238d0|`--gc:orc`|`--gc:refc`|15.492s|978.938MiB|
|this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB|
|#d7a70238d0|`--gc:orc`|`--gc:orc`|22.218s|1.933GiB|
|this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
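The on-demand idea can be pictured with a toy sketch (instruction set and encoding invented, far simpler than `mirexec`'s): rather than one global pass computing last-reads for every location, a single "is a read of this location reachable from here?" query walks the instruction-based CFG only when the answer is actually needed.

```nim
type
  InstrKind = enum ikUse, ikDef, ikGoto, ikJoin
  Instr = object
    kind: InstrKind
    loc: string  # for ikUse/ikDef
    target: int  # for ikGoto (unconditional jump, for simplicity)

proc readReachable(code: seq[Instr], start: int, loc: string): bool =
  ## forward traversal from `start`: a later use answers the query,
  ## while a def of the same location kills it on that path
  var visited = newSeq[bool](code.len)
  var stack = @[start]
  while stack.len > 0:
    var i = stack.pop()
    while i < code.len and not visited[i]:
      visited[i] = true
      case code[i].kind
      of ikUse:
        if code[i].loc == loc: return true
      of ikDef:
        if code[i].loc == loc: break  # overwritten before any read
      of ikGoto:
        stack.add code[i].target
        break
      of ikJoin: discard
      inc i

let body = @[
  Instr(kind: ikDef, loc: "x"),
  Instr(kind: ikGoto, target: 3),
  Instr(kind: ikUse, loc: "x"),  # unreachable: jumped over
  Instr(kind: ikJoin),
  Instr(kind: ikUse, loc: "y")]

doAssert not readReachable(body, 1, "x")
doAssert readReachable(body, 1, "y")
```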
bors r+

Build succeeded.
First, a short summary of how `injectdestructors` used to operate:

- a control-flow graph (including the data-flow instructions `use` and `def`) is generated for the code fragment (in this case, either the body of a procedure or a fragment of a module's top-level code) the transformations are to be applied to
- the `PNode`s corresponding to the `use` and `def` instructions for which a "last read" or "first write" is detected are marked with the respective node flags `nfLastRead` and `nfFirstWrite`
- `=destroy` calls and temporaries are injected; assignments are rewritten into either `=copy` calls, `=sink` calls, or shallow copies (i.e. `nkFastAsgn`); copies are introduced for arguments to `sink` parameters where necessary; `wasMoved` calls are inserted when an lvalue is consumed; scopes are wrapped in `try`/`finally` statements/expressions if necessary and the `nkIdentDefs` of relevant locals are moved to the start of their scope
- a separate pass (`optimizer.nim`) that eliminates `wasMoved(x); =destroy(x)` pairs that cancel each other out is run

With the new implementation, the transformations are split into multiple passes. The control-flow graph still uses an instruction-based representation, but no longer contains the data-flow-related instructions. There are two new instructions, `join` and `loop`, that together enable loop bodies (independent of their nesting) to be evaluated for their effects only once, and also allow the graph to be efficiently traversed backwards. The `mirexec` module implements both the CFG construction and the traversal routines.

Instead of analyzing all uses of all locations for "last reads" in one single CFG traversal, each usage in a consume context (e.g. something passed to a `sink` parameter) is now analyzed separately. This greatly simplifies the analysis and reduces the amount of state that needs to be tracked, resulting in a large speedup.

The aforementioned analysis is modeled as "solving" ownership. Each value that results from a computation gets an ownership status assigned -- a value is considered to be "owned" if there are no operations coming after the operation it resulted from that observe its presence in the location that stores it. As with the previous implementation, location access through aliases (i.e. a pointer) is not considered here; that is, accessing a location through a pointer is not considered to be a read or write of the underlying location. If a value is an r-value (e.g. it's the result of a procedure call and is not a view) it is always considered to be owned.
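To illustrate the ownership model described above, a hypothetical sketch (the procedures and values are invented for this example, not taken from the PR):

```nim
proc consume(s: sink string) = discard

proc p() =
  var a = "text" & $42  # an r-value: the computed result is always owned
  let b = a             # the value in `a` is observed again below, so it
                        # is not owned here -- a copy (`=copy`) is required
  consume(a)            # last read of `a`: the value is owned and can be
                        # moved into the `sink` parameter
  discard b.len
```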
For destructor injection, instead of tentatively injecting a destructor for each location that has a destructor and letting the `optimizer` elide it later, a destructor is now only inserted for locations for which it is not certain whether they store (and own) a value once control-flow exits their scope. This analysis also takes implicit moves into account.

Previously, a scope was wrapped in a `try` and the destructors called inside the corresponding `finally` clause if one of the following was present inside the scope: a call that raises or an explicit raise (`Defect`s were ignored, independent of `--panics`), a `break`/`continue` statement, a `return` statement, or another scope that contains one of the aforementioned. A `finally` clause is now only required if at least one location requiring destruction still stores an (owned) value when its scope is exited via unstructured control-flow. For example, given a block where a location `s` is consumed before a `raise`: `s` requires a destructor call at the end of the block, but a `finally` is not needed, since `s` no longer contains a value once control-flow reaches the `raise`.
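A hypothetical sketch of such a block (`consume` and the surrounding code are invented for illustration):

```nim
proc consume(x: sink seq[int]) = discard

proc p(cond: bool) =
  block:
    var s = @[1, 2, 3]
    if cond:
      consume(s)  # `s` is moved from and no longer owns a value
      raise newException(CatchableError, "error")
    # the destructor call for `s` is injected here, on the normal
    # exit path; no `try`/`finally` is required for the `raise` above
```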
To be able to detect consumes, the alive analysis requires value ownership to be figured out already.

The actual assignment rewriting and destructor injection are performed as two separate but concurrently applied MIR passes. While it's possible to perform the copy injection for arguments that are not owned and are used in a consume context as another separate pass, it is applied as part of the assignment rewriting pass, in order to reduce the number of `MirTree` traversals.

When an assignment doesn't own the source operand and the value represents a resource (i.e. something that has a `=destroy` hook), the assignment is rewritten into a call to the type-bound `=copy` hook. If the source value is owned, depending on whether or not the destination location is empty, the assignment is rewritten into either a bit-wise copy (`nkFastAsgn`) or a call to the type-bound `=sink` hook. If the source is an lvalue (i.e. there exists a name for the containing location), a destructive move has to be performed (via a call to `wasMoved`), but only if it can't be proven that no connected write observes the now no-longer-owned value being present in the source location. The analysis (see `needsReset` and `isLastWrite`) takes into account if a destructor is present for the source location.

Since destructors are only injected for alive locations and `wasMoved` calls are only generated where necessary, the separate `optimizer` pass is no longer needed.

### Fixes
Assignments to locations with destructors where the right-hand side is a complex expression now respect the strict left-to-right evaluation order:
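A hypothetical illustration of such an assignment (`f` and the exact values are invented for this sketch):

```nim
proc f(s: var seq[int]): seq[int] =
  s.add(4)
  result = @[s.len]

proc p() =
  var s = @[1, 2, 3]
  # the right-hand side first reads `s` and then calls `f(s)`, which
  # mutates it; strict left-to-right evaluation requires the read of
  # `s` to happen before the mutation
  s = s & f(s)
```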
A mutation of the argument is now considered to happen when control-flow reaches the call, not when the argument is passed to a `var` parameter. The difference is subtle but important.

Globals are now only moved if it can be proven that doing so does not affect program semantics. Previously, assigning a location derived from a global would perform a destructive move in certain circumstances, such as:
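A hypothetical sketch of the kind of code that was previously affected (the global and procedure are invented for illustration):

```nim
var g = @["shared"]  # a global that other code may also observe

proc p(): seq[string] =
  # previously, this assignment could destructively move the value
  # out of `g`, leaving the global empty even though later code may
  # still read it; `g` is now only moved when doing so is provably safe
  result = g
```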
Due to the way the new implementation works, it is now possible for locations derived from globals to be consumed (if safe) when passing them to `sink` parameters.

As a side-effect of the MIR's canonicalization, the alias analysis now handles tuple access correctly:
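A hypothetical sketch of the kind of tuple access this affects (not the PR's original example):

```nim
proc consume(x: sink seq[int]) = discard

proc p() =
  var t = (@[1], @[2])
  # `t[0]` and `t[1]` are distinct locations: passing `t[0]` to a
  # `sink` parameter must not be treated as affecting the later
  # read of `t[1]`
  consume(t[0])
  discard t[1].len
```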
### Timings

Timings and peakmem are taken from compiling the compiler itself with the options `-d:release --compileOnly` and no cached compilation artifacts. The original compiler is built with `-d:release --exceptions:goto`.

|compiler from|bootstrapped with|using|time taken|peakmem|
|:------------|:----------------|:----|:---------|:------|
|#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB|
|this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB|
|#d7a70238d0|`--gc:refc`|`--gc:orc`|45.206s|1.99GiB|
|this commit|`--gc:refc`|`--gc:orc`|29.531s|1.222GiB|
|#d7a70238d0|`--gc:orc`|`--gc:refc`|15.492s|978.938MiB|
|this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB|
|#d7a70238d0|`--gc:orc`|`--gc:orc`|22.218s|1.933GiB|
|this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
### To-Do

- `--expandArc` tests: `tests/lang_objects/destructor/ttests.nim` (`devel`)
- remove the `compiler/sem/optimizer.nim` module -- it's unused now
- flag the `arc/tstrformat.nim` test as a known issue; should there exist the desire to bring it back, the problematic assignment should be detected in `sempass2`
- `mirexec.nim` (partially done)

### Notes for reviewers