Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compiler: make injectdestructors a MIR pass #450

Merged
merged 1 commit into from
Feb 17, 2023

Conversation

zerbina
Copy link
Collaborator

@zerbina zerbina commented Oct 16, 2022

First, a short summary of how injectdestructors used to operate:

  1. an instruction-based control-flow-graph that also contains instruction related to data-flow analysis (i.e. use and def) is generated for the code fragment (in this case, either the body of a procedure or a fragment of a module's top-level code) the transformations are to be applied to
  2. the control- / data-flow-graph is traversed (executed) in order to figure out which uses of locations are "last reads" and which writes to them are "first writes". The PNodes corresponding to the use and def instructions for which a "last read" or "first write" is detected are marked with the respective node flags nfLastRead and nfFirstWrite.
  3. the AST is processed -- this is where the core logic happens. =destroy calls and temporaries are injected; assignments are rewritten into either =copy calls, =sink calls, or shallow copies (i.e. nkFastAsgn); copies are introduced for arguments to sink parameters where necessary; wasMoved calls are inserted when an lvalue is consumed; scopes are wrapped in try/finally statements/expression if necessary and the nkIdentDefs of relevant locals are moved to the start of their scope.
  4. the optimizer (optimizer.nim) that eliminates wasMoved(x); =destroy(x) pairs that cancel each other out is run

With the new implementation, the transformations are split into multiple passes. The control-flow graph still uses an instruction-based representation, but no longer contains the data-flow-related instructions. There are two new instructions, join and loop, that together enable loop bodies (independent of their nesting) to be evaluated for their effects only once, and also for the graph to be efficiently traversed backwards. The mirexec module implements both the CFG construction and the traversal routines.

Instead of analyzing all uses of all locations for "last reads" in one single CFG traversal, each usage in a consume context (e.g. something passed to a sink parameter) is now analyzed separately. This greatly simplifies the analysis and reduces the amount of state that needs to be tracked, resulting in a large speedup.

The aforementioned analysis is modeled as "solving" ownership. Each value that results from a computation gets an ownership status assigned -- a value is considered to be "owned" if there are no operations coming after the operation it resulted from that observe its presence in the location that stores it. Same as with the previous implementation, location access through aliases (i.e. a pointer) is not considered here. That is, accessing a location through a pointer is not considered to be a read or write of the underlying location. If a value is an r-value (e.g. it's the result of a procedure call and is not a view) it is always considered to be owned.

For destructor injection, instead of tentatively injecting a destructor for each location that has a destructor and letting the optimizer elide it later, a destructor is now only inserted for locations for which it is not certain whether they store (and own) a value once control-flow exits their scope. This analysis also takes implicit moves into account.

Previously, a scope was wrapped in a try and the destructors called inside the corresponding finally clause if one of the following was present inside the scope: a call that raises or an explicit raise (Defects were ignored, independent of --panics), a break/continue statement, a return statement, another scope that contains one of the aforementioned. A finally clause is now only required if at least one location requiring destruction still stores a (owned) value when its scope is exited via unstructured control-flow. For example:

block:
  var s = @[1, 2]
  if cond:
    f_sink(s) # `s` is consumed
    raise ValueError.newException("")
  else:
    discard

s requires a destructor call at the end of the block, but a finally is not needed, since s no longer contains a value once control-flow reaches the raise. To be able to detect consumes, the alive analysis requires value ownership to be figured out already.

The actual assignment rewriting and destructor injection are performed as two separate but concurrently applied MIR passes. While it's possible to perform the copy injection for arguments that are not owned and are used in a consume context as another separate pass, it is applied as part of the assignment rewriting pass, in order to reduce the number of MirTree traversals.

When an assignment doesn't own the source operand and the value represents a resource (i.e. something that has a =destroy hook), the assignment is rewritten into a call to the type-bound =copy hook. If the source value is owned, depending on whether or not the destination location is empty, the assignment is rewritten into either a bit-wise copy (nkFastAsgn) or a call to the type-bound =sink hook. If the source is an lvalue (i.e. there exists a name for the containing location), a destructive move has to be performed (via a call to wasMoved), but only if it can't be proven that no connected write observes the now no longer owned value being present in the source location. The analysis (see needsReset and isLastWrite) takes into account if a destructor is present for the source location.

Since destructors are only injected for alive locations and wasMoved calls are only generated where necessary, the separate optimizer pass is no longer needed.

Fixes

Assignments to location with destructors where the right-hand side is a complex expression now respect the strict left-to-right evaluation order:

var
  i = 0
  s = "str"

# works now:
(i = 1; s) =
  if i == 1: "str2"
  else: doAssert false; ""

# also works now:
proc p(s: var string): var string =
  i = 2
  s

p(s) = block:
  doAssert i == 2
  "str2"

A mutation of the argument is now considered to happen when control-flow reaches the call, not when the argument is passed to a var parameter. The difference is subtle but important:

proc f_sink(x: var seq[int], y: sink seq[int]) =
  y[0] = 2
  doAssert x == [1, 2] # passes

proc main() =
  var s = @[1, 2]
  f_sink(s, s) # a full copy is now introduced for the second argument

main()

Globals are now only moved if it can proven that doing so does not affect program semantics. Previously, assigning a location derived from a global would perform a destructive move in certain circumstances, such as:

# at module scope
var a = "str"
var b = a # moved

proc p() =
  discard

echo a # echoes an empty string

Due to the way the new implementation works, it is now possible for locations derived from globals to be consumed (if safe) when passing them to sink parameters.

As a side-effect of the MIR's canonicalization, the alias analysis now handles tuple access correctly:

proc prc(x: var string) =
  var tup = (x: 0, y: "str")
  # `tup[1]` and `tup.y` were treated as two disjoint locations, causing the assignment
  # below to be erroneously optimized into a move
  x = tup[1]
  doAssert tup.y == "str"

var o: string
prc(o)

Details

Timings

Timings and peakmem are taken from compiling the compiler itself with the options -d:release --compileOnly and no cached compilation artifacts. The original compiler is build with -d:release --exceptions:goto.

compiler from bootstrapped with using time taken peakmem
#d7a70238d0 --gc:refc --gc:refc 24.426s 1015.844MiB
this commit --gc:refc --gc:refc 24.687s 1014.254MiB
#d7a70238d0 --gc:refc --gc:orc 45.206s 1.99GiB
this commit --gc:refc --gc:orc 29.531s 1.222GiB
#d7a70238d0 --gc:orc --gc:refc 15.492s 978.938MiB
this commit --gc:orc --gc:refc 15.714s 979.023MiB
#d7a70238d0 --gc:orc --gc:orc 22.218s 1.933GiB
this commit --gc:orc --gc:orc 18.266s 1.188GiB

To-Do

  • adjust the test spec for the tests using --expandArc
  • split up tests/lang_objects/destructor/ttests.nim
  • either fix/cleanup or remove the MIR unit tests
  • move the MIR introduction into a separate PR
  • rebase this PR onto devel
  • remove the compiler/sem/optimizer.nim module -- it's unused now
  • fix the issue with procedure-level globals
  • address the remaining review comments
  • improve the documentation of mirexec.nim (partially done)
  • compile a list of pre-existing issues and newly introduced regressions
  • squash the commits and write a PR/commit message

Notes for reviewers

  • I collected a list of issues/notes in the form of review comments

@saem
Copy link
Collaborator

saem commented Oct 17, 2022

I think I understand it, going to write that out for confirmation:

injectdestructors:

  1. has mirgen generates an initial MirTree (genProcBody)
  2. then comes up with a list of changes it's going to make (analyse)
  3. applies them to the MirTree
  4. converts the updated MirTree to PNode

Prior to reading it, what I was looking for:

  • that it's basically all lisp-y on the inside, the AST is a program (we briefly discussed it before in chat and it turns out we're very much on the same page)
  • translation to any intermediate should "remember" where it came from
  • the preivous point is most true wrt to changes made for precise error/warnings (need lineage)

It seems you've already accounted for the majority of it -- if not all and it's just my understanding falling short. I think it's mostly covered with the ChangeSet + Action which remembers the PNode it's acting upon. So errors should be able to trace their lineage back within MIR. The second bullet point I'm a little less sure about because I can't see the how we could correlate original PNode to initial MirNode -- the final hop.

So I'm not sure how we flow the info back up to Sem n order for us to ensure we can flow constraints (mostly types info) down from Sem and warnings/errors telemetry backup. Apologies if it's just my lack of understanding the code.

@zerbina
Copy link
Collaborator Author

zerbina commented Oct 17, 2022

I think I understand it, going to write that out for confirmation:

Your understanding is correct. I apologize for not having provided a high-level overview.


I think I insufficiently described my motivation for this PR, and since that information can help in understanding some decisions behind the current implementation, here goes a second attempt:

The main contributors to my decision of working on this PR first are as following:

  • the pre-PR injectdestructors produces AST that violates various (reasonable) expectations of irgen. For example, sometimes a produced nkStmtListExpr has no type (i.e. it isn't actually an expression) or it has a type but doesn't end in an expression. While possible to work around these things, additional logic would be required and irgen would also have to become less strict about it's input.
  • the idea of implementing the move-analyser at the back-end IR level has been around since the new back-end's inception, but I no longer think that it's a good idea. Over time, the additional complexity of having the back-end IR accommodate for the move-analyser started to outweigh the benefits -- the ideas I collected on how the move-analyser could be implemented work better (if at all) with a slightly higher-level IR.
  • irgen is quite complex. It does a lot of different things, like performing general transformations and lowering some magics. Then there's also the deferred type translation plus the subsequent fix-up of all type IDs required by it.

So to rephrase, the goal here is to introduce an IR that:

  • is meant to support injectdestructors, lambdalifting, and, later on, maybe also other parts of transf (e.g. closureiterators, although this one will probably become unnecessary with the introduction of CPS transforms) efficiently operating on it
  • works as the input to irgen

My current plan for the how the flow should look like in the near-term:

input ->
  transform() ->               # `transf`
    toMir() ->                 # currently `genProcBody`
      lambdalifting() ->
        injectdestructors() ->
          backend()            # `irgen`

Using MirNode in it's current form as the input to new back-end (or more specifically, irgen) would have the following benefits already:

  • multiple transformations previously performed during irgen (listed in the PR's description) are now performed in mirgen. This would simplify irgen a lot.
  • the elision of nodes irrelevant to the back-end (e.g. nkBindStmt, nkTypeDef, etc) together with the linear tree structure make scanning of dependencies (types, procedures, constants, literals, etc.) trivial, rendering, among other things, deferred type translation for irgen obsolete.

The main reason behind the MirNode -> PNode translation is to keep the current code-generators working without having to keep two implementations of injectdestructors around.


The second bullet point I'm a little less sure about because I can't see the how we could correlate original PNode to initial MirNode -- the final hop.

A MirNode's origin is currently not kept track of (maybe to a source location, but not to a PNode) - this is something that's still missing.

So I'm not sure how we flow the info back up to Sem n order for us to ensure we can flow constraints (mostly types info) down from Sem and warnings/errors telemetry backup.

I'm not sure I understand. In which case is it required for information to flow back from the MIR to Sem? Sorry if it's obvious.

@saem
Copy link
Collaborator

saem commented Oct 18, 2022

Thank you for the detailed explanation. The motivations make a lot of sense and the overall flow is sensible. I didn't need it, but consider me extra convinced. 😁


So I'm not sure how we flow the info back up to Sem n order for us to ensure we can flow constraints (mostly types info) down from Sem and warnings/errors telemetry backup.

I'm not sure I understand. In which case is it required for information to flow back from the MIR to Sem? Sorry if it's obvious.

Somewhat of a guess on my part, as we're doing lifetime analysis here, then any invariant enforcement would also need to be here?

Possibilities aside, at the time of my prior comment I had done a very quick search in the current inject destructors to look for localReport and saw a few hits raising sem errors. I'm presuming we'll have to do the same-ish. Not a very thorough investigation admittedly.

Reading between the lines we'll likely end up with origin being tracked and nothing about the existing approach seems to make that an impossibility, I think you nailed it. 🎉

compiler/sem/mirtraverse.nim Outdated Show resolved Hide resolved
@zerbina
Copy link
Collaborator Author

zerbina commented Nov 2, 2022

The most recent commit contains a partial redesign of the MIR.

The previous iteration was a simple, linear-tree-based IR, which was structurally similar to PNode (albeit with a lot less node kinds). Except for things like symbols, literals, and other atoms, expressions and statements all used forward trees. That is, first comes the head node (mnkCall for example), then all child-nodes (which can be other sub-tree), and then the mnkEnd node.

One of the main differences when compared to PNode AST is the omission of complex expressions (if, case, try, block, and statement list expressions) in MIR -- those are transformed into assignments to temporaries. This was done to simplify both data-flow analysis and further code modifications. For example:

let x =
  if a: true
  else: false

get translated to the MIR equivalent of:

var tmp: bool
if a:
  tmp = true
else:
  tmp = false
let x = tmp

A significant issue with this is the loss of information about the control-flow dependencies between separate expressions. Consider:

proc fsink(a: sink string, b: string) = ...

fsink(notLastRead x, (if cond: y else: z))

which got transformed to the MIR equivalent of:

var tmp: string
if cond:
  tmp = y
else:
  tmp = z

fsink(notLastRead x, tmp)

x needs to copied, but it's not clear at which point, as the argument expressions x and tmp are entirely unrelated in the IR. While the observable effects would be minor in this specific case (string copies happening in a different order), injecting a copy operation directly before the call is wrong, as the copy is required to happen as part of evaluating the respective argument expression.

One solution to this problem would be to assign every argument expression to a temporary first, use the temporaries as the actual arguments, and then inject the copy (or any other operations performed as part of injectdestructors, e.g. sinking into an argument) at the position where the respective temporary is defined. This has the significant downside that these extra constraints and behaviour are not expressed explicitly in the IR.

There also were other issues, so I opted for a partial redesign instead.


The core of the redesign is a shift to a representation where the focus is more on the flow of values. Expressions are now not tree-based anymore but are instead a sequence of applying operations to lvalues and values. Except for if and case, statements and their representation were left untouched -- they still use sub-trees.

Each sequence starts with an input (e.g. a literal or the name of some entity), is followed by zero or more in-out operations, and ends in a sink. A sink is a kind of operation that doesn't produce a value. For example, both if and raise statements are sinks. A procedure call is treated as always returning a value and has to be explicitly followed by a dedicated void sink. Multiple arguments to an operation are supported via argument blocks and the arg sink.

As a simple demonstration:

doSomething(x, "a", (echo "a"; y.b.c.int))

would be represented in the MIR as:

argBlock:
  proc doSomething -> arg
  local x -> arg
  lit "a" -> arg
  argBlock:
    proc echo -> arg
    lit "a" -> arg
  end argBlock ->
  call -> void
  local y -> path -> stdConv -> arg
end argBlock ->
call -> void

Note that the callee procedure is now also only an argument to the call operation.

Reading the above top-to-bottom left-to-right yields the nodes in the order they're placed in memory. The arrows are only meant for visualizing the flow of values. For documentation purposes, I wrote down the new grammar as a doc comment in compiler/sem/mirtrees.nim.

I originally wanted for path expressions to use their own IR, mainly because this would allow for de-duplicating them and assigning each one a unique ID, making comparing two path expression for equality very efficient.

The separate IR is also how they're currently implemented, but none of the expected benefits have materialized so far (comparing two paths for equality hasn't come up yet). There are many not-yet-solved issues, and de-duplicating them is complicated by the fact that conversions can be part of paths, so sameType is required as part of the comparison (or it would be at least -- de-duplication is not yet implemented).

In general, there are still a lot of open questions, and some constructs, such as user-ops, are under- or unspecified. Nonetheless, I believe that in it's current state, the refactored MIR is already better than the previous iteration.

Related to the MIR changes, there's an experiment with a small template-based DSL for making MIR generation easier -- see mirconstr for the DSL and mirgen for how it's currently applied.


I also made general progress in regards to the actual injectdestructors rewrite, and did some down-scoping by removing the unfinished attempt of a re-implementation of cursor inference. Half of the tests in the destructors and arc categories succeed now.

There are some pre-existing issue with some of the rewrite-rules, mainly in how they interact (or not) with unstructured control-flow. I'll create tests for them, but am probably not going to fix them as part of this PR.

@saem
Copy link
Collaborator

saem commented Nov 6, 2022

I'm a little (very) sleep deprived so my review wasn't not very insightful I gather -- feel free to ignore the review comments if they're just unhelpful. I've read the code as best as I can and revisited your comment, I think the new approach is better (operatinos on l-values).

I'll take another peek tomorrow -- hopefully I'll be able to understand things better then. I wouldn't consider that a blocker.


Aside: minor thing is that perhaps there might be some vlaue in differentiating between callReturnValue and callUnit or something like that in order to drop the void requirement. But I'm pretty motivated to destroy conditionals/nested code.

compiler/sem/mirtrees.nim Outdated Show resolved Hide resolved
## This module contais the type defitions for the mid-end IR (MIR) together
## with routines for querying them.
##
## Routines for traversing the used flat-tree-structure are also located here.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bringing over the high level description of the MIR approach from your recent PR comment would be a nice to have.

I'm popping back and forth because my crappy working memory has to continually remind itself of what's up. 😅

Comment on lines 6 to 9
import
compiler/ast/[
ast_types
]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Casual remark:

I had a fun little realization. I know this module is the entry point, in terms of learning, for the MIR "package" because it's got basically no dependencies. Paired with some instructive comment that's a pretty nice cue in terms of navigating this bit of the compiler.

compiler/sem/mirmodify.nim Outdated Show resolved Hide resolved
compiler/sem/mirtraverse.nim Outdated Show resolved Hide resolved
compiler/sem/mirtraverse.nim Outdated Show resolved Hide resolved

import
compiler/ast/ast_types,
compiler/sem/mirtrees
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any particular reason to separate these out from mirtrees?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The construction procedures and utilities are only needed when generating new MIR code. For just comprehending and processing MIR (e.g. mirtraverse.nim, astgen.nim, etc.), they're not needed, so I thought it'd make sense to put them into a separate module.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, I can go either way on this sort of thing. Maaaaybe consider renaming it mirtrees_constr.nim so it feels a bit subordinate to the other module. Might hint at "don't read me first".

@haxscramper haxscramper added compiler General compiler tag compiler/sem Related to semantic-analysis system of the compiler refactor Implementation refactor labels Nov 20, 2022
@haxscramper haxscramper added this to the C backend refactoring milestone Nov 22, 2022
@zerbina zerbina changed the title WIP: rewrite injectdestructors compiler: rewrite injectdestructors as MIR passes Dec 3, 2022
@zerbina zerbina marked this pull request as ready for review December 3, 2022 22:31
compiler/ast/ast_query.nim Outdated Show resolved Hide resolved
compiler/ast/ast_query.nim Outdated Show resolved Hide resolved
Copy link
Collaborator

@saem saem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting small bits like doc comments right now as I'm refreshing my memory, thankfully the complexity is largely in the subject matter and not the implementaiton. 😄

Observations thus far:

  • the code is clean and I feel like if there were issues I could tests and fix them
  • similar to above, I can see myself or others extending it
  • this obviously took a lot of effort and it shows, thank you thank you!
  • my confidence in the change is mostly around CI at this point

🎉


I'm marking it approved. I'll keep reading and if I find areas of improvement that meaningfully simplify things or can easily be extracted, I'll note them in the PR.

compiler/mir/analysis.nim Outdated Show resolved Hide resolved
compiler/mir/analysis.nim Outdated Show resolved Hide resolved
compiler/mir/analysis.nim Outdated Show resolved Hide resolved
## Stores information about the produced values plus the lvalue effects of
## operations
values: IndexedSeq[OpValue, ValueInfo]
# XXX: `values` currently stores an entry for each *node*. Not every node
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tables are really really really slow. :(

They absolutely tanked performance when I was first prototyping a DOD AST with them.

Comment on lines 114 to 115
## Computes the control-flow graph for the given `tree`. This is a very
## expensive operation!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## Computes the control-flow graph for the given `tree`. This is a very
## expensive operation!
## Computes the control-flow graph for the given `tree`. This is a very
## expensive operation! The high cost is due to two essential reasons:
##
## 1. a control flow graph needs to materialize all paths for a given `tree`
## 2. the amount of allocations/bookkeeping necessary to do so

This is partly me just confirming my understanding, the first point I'm fairly confident about and the second one is more me guessing by looking at the push/pop of sequences and the Table in ClosureEnv.

Aside, I spent a few minutes trying to reason about replacing that table and the best I could figure was maybe the following:

type
  ClosureEnv = object
    # ... existing bits except joins
    joinedNodes: seq[NodePosition]
    nodeIsJoined: PackedSet[NodePosition]`

proc addJoin(env: var ClosureEnv, dest: NodePosition): JoinPoint =
  let (id, isNew) =
    if env.nodeIsJoined.containsOrIncl(dest):
      env.nodeIsJoined.add dest
      (env.joinedNodes.len - 1, true)
    else:
      (env.joinedNodes.find dest, false)  # likely more efficient with a reverse find

  JoinPoint(id: id, node: dest, isNew: isNew)

This is mostly a test of my understanding and a bit of fun, I doubt it's an improvement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't say that the control-flow graph needs to materialize all paths for tree, rather only all edges (in the form of instructions). However, I might be misunderstanding what you mean.

Regarding the reason for why computeCfg is expensive, the last time I profiled it, the by far largest amount of time was spent inside the tree traversal routines. For a general perspective, using a -d:release compiler built with --gc:orc to compile itself with --gc:orc as the benchmark, more time (~15%) is spent in computeCfg than in mirgen (i.e. genProcBody).

I did another profiling session using the newest commit -- the six most time consuming calls inside computeCfg are:

Procedure Time spent
mirtrees.findEnd 0.401 %
mirtrees.parent 0.224 %
algorithm.sort 0.0476 %
commit 0.0174 %
mirtrees.findParent 0.0174 %
addEdge 0.011 %

The "Time spent" is a percentage of the total compilation time. Using either recursion or a skip-list (or both) could reduce the amount of time spent on tree traversal.


The snippet you posted is, apart from two problems that I believe are unintended, correct. I was curious and ran a comparison between your code and the current implementation. Instrumenting a release compiler that is booted with --gc:orc and using it to compile the devel compiler with --gc:orc yields:

Time taken (in milliseconds) Allocations Deallocations
Table 151 35219 17779
PackedSet + seq + find 152 - 153 36865 19425
PackedSet + seq + rfind 154 - 155 36865 19425

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allocations really dictate performance, kinda sad that PackedSets weren't able to get ahead. I'll have to look closer especially once the AST starts to get the DOD treatment as I'd been hoping they'd do a lot of heavy lifting in a few areas.

@zerbina
Copy link
Collaborator Author

zerbina commented Dec 9, 2022

Thank you for the review @haxscramper and @saem!

The for-loop PR and should fix the issue that currently causes a C compiler error for nimgrep.

@zerbina
Copy link
Collaborator Author

zerbina commented Dec 22, 2022

Merging the MIR is progressing rather slowly, as there are a lot of separate fixes required first. The most recent ones being fixes to the JS code-generator that are blocking the merging of #498, which is the PR with the fixes for the issue that is making bootstrapping fail in the Linux CI for this PR (the bug causing the nimgrep to not compile is already fixed in the MIR PR).

@saem saem assigned saem and unassigned saem Jan 8, 2023
@zerbina zerbina force-pushed the injectdestructors-rewrite branch from 52d106a to 6849732 Compare February 10, 2023 20:27
@zerbina
Copy link
Collaborator Author

zerbina commented Feb 10, 2023

The PR is now rebased onto devel (a slightly outdated version of it), meaning that the things already split out and merged separately are no longer part of the changeset.

As for the test failures, the relevant ones are caused by the temporary injection pass breaking the assumptions of mirgen.

mirgen translates the definitions (i.e. nkIdentDefs) of procedure-level globals in a way so that the resulting MIR code is turned back into an nkVarSection + nkIdentDefs by astgen, but with the injection of temporaries now happening, the assumption made by mirgen that the code it generates reaches astgen without modifications is no longer true.

Since the fix does not depend on the this PR and requires changes to transf and the code-generators, I'm going implement it in a separate PR.

@zerbina zerbina marked this pull request as draft February 12, 2023 16:04
@zerbina zerbina marked this pull request as ready for review February 15, 2023 00:12
Copy link
Collaborator Author

@zerbina zerbina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After fixing the remaining bugs (the backward traversal algorithm didn't work for nested loops, which I only noticed due to a single test failing with Valgrind reporting a leak) all tests pass, and I consider this PR to be done (apart from squashing and writing a commit message).

While fixing many bugs the old implementation had, a few new ones were introduced. I noted them, together with their impact, down via review comments, but in general think that none of them are severe enough to block this PR. The most critical one can be hacked around with a medium amount of effort, though I'd prefer to not do so as part of this PR.

Comment on lines +125 to +128
# FIXME: none-lvalue-conversions also count as reads, but because no
# distinction is being made between lvalue- and value-conversions
# at the MIR level, they're currently not considered. This is an
# issue and it needs to be fixed
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a small pre-existing issue that dfa.nim also has. In the current implementation it's only a cosmetic issue, but it should eventually be fixed nonetheless. I post-poned it, since the fix requires adjustments of the MIR.

Example:

var x = float(0.0)
discard int(x)
# `x` is treated as being used when control-flow reaches the `discard`
# statement, not when control-flow reaches the conversion  

Copy link
Collaborator

@saem saem Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this paired with very aggressive/eager free placement we could introduce a bug where we free before the last use?

I agree that this is minor, IIUC we're not that aggressive currently, nor would we want to be in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very likely that there's never going to be an issue, even if we're freeing very aggressively. The value resulting from a non-lvalue conversion is always used directly after, and there's nothing in between where a destruction could be placed.

Comment on lines +965 to +971
# Another possible improvement is moving the actual hook injection
# to a follow-up pass. The pass here would only inject ``mCopyAsgn`` and
# ``mSinkAsgn`` magics, which the aforementioned follow-up pass then
# expands into the hook calls. This would simplify the logic here a bit
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is almost certain that this is the approach (without the magics) that I'll go with when rewriting/refactoring the pass. In my opinion, the benefits (composes better, much simpler logic, easier to understand) far outweigh the downsides (one more pass -> likely slower).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I suspect looking at pass logic coalescing which fuses a bunch of these operations will be easier if each pass's logic is very simple/straightforwad. Not to mention, it's likely easier to get some quick wins with a little bit of pass composition based optimizations, than maintain a bunch of very carefully hand tuned implementations.

Comment on lines +1439 to +1444
# FIXME: discriminant assignment lowering also needs to be disabled for
# when generating code running at compile-time (e.g. inside a
# macro)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be a problem once the injectdestructors pass is also used on code meant for running in the VM, but since that's not the case yet, no action is required for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that it can be deferred. Moreover, that it's likely a good thing, I have a funny feeling about handling the same thing at different levels.

Comment on lines +344 to +346
# TODO: with this approach, either the last branch needs to fork
# control-flow to the next exception handler or ``mirgen`` has to
# introduce a catch-all handler
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the second most severe issue (that is somewhat pre-exisiting). In the following example:

try:
  try:
    discard # 1
    raise SomeError.newException("")
  except OtherError:
    discard
except SomeError:
  discard # 2

there will be no path between 1 and 2 in the control-flow graph. The current plan for fixing this is to, as part of transf, add what amounts to except: raise to an nkTryStmt that has at least one exception handler but not a "catch-all" one. Not only would that fix the issue in question here, there's also logic in the C and JS code-generators that can be removed then.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handling it in a common place makes sense.

Although, not sure I could quite put together the proposed solution, would it effectively be this?

try:
  try:
    try:
      discard # 1
      raise SomeError.newException("")
    except OtherError:
      discard
  except SomeError:
    discard # 2
except:
  raise

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost. The proposed solution would like this:

try:
  try:
    discard # 1
    raise SomeError.newException("")
  except OtherError:
    discard
  except:
    raise
except SomeError:
  discard # 2
except:
  raise

This is roughly what the C and JS code-generators already do, and the control-flow graph generation logic would generate the correct edges without requiring any changes to it.

compiler/sem/mirexec.nim Outdated Show resolved Hide resolved
var saved
var :tmp_1 = sibling[].right
`=copy`(saved, :tmp_1)
var :tmp_2 = sibling[].right
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it looks like it, no copy is created here. The var :tmp2 = ... is actually a var :tmp2: var T, but there's an Empty node in the IdentDefs type slot, so nothing is rendered.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wish there was a way to comment it without mucking up the input. 🤔

while readLine(f, res):
block label_1:
while true:
if op(readLine(f, res)):
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since astgen has no way of knowing the name of the original symbol (apart from maybe inspecting the source node), it turns mnkMagic MIR nodes into symbols with the name op.

Copy link
Collaborator

@saem saem Feb 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to require far more fundamental fixes, outside the scope of this PR, of course.

Comment on lines +60 to +66
# a literal constructor expression (which ``initValue`` also expands to)
# not used *directly* in a consume context only produces a non-owning
# container
# TODO: this doesn't match the behaviour the specification describes (i.e.
# constructor expression having the same semantics as procedure calls
# in the context of lifetime hook injection). The behaviour here
# should either be explicitly added to the spec or changed
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pre-existing behaviour that's not described by the specification.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's useful to know if it changed. Noting that this behaviour is not per the spec and one or the other should change is the right thing to do. I want to have more of a think about what it means, then we can discuss what makes sense.

Documenting it and leaving it as is good!

Comment on lines +7 to +12
knownIssue: '''
Semantic analysis duplicates the analysed argument into each usage
inside the template. Since it contains definitions of local variables,
multiple definitions of the same entity exist in the expanded code, which
is illegal at the MIR level, thus making the `injectdestructors` pass fail
'''
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to disable this test for now. It's the most severe issue, but there are so many hacks in mirgen already that it felt wrong to introduce yet another one. For now, an entity being defined multiple times will raise an internal compiler error when the code reaches the injectdestructors pass.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like it should be fixed in semantic analysis, rather than hack up injectdestructors.

My next two days are really busy, but I could peek at sem and eval template code on the weekend, not sure what the fix will entail, though

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, semantic analysis is the correct place to fix this.

The issue here was what originally motivated me to look into the macro output sanitizer. Turning the typed AST passed to a typed template parameter into untyped AST before inserting it into the body could fix the problem, but work on the sanitizer is currently stuck due to issues with gensym'ed symbols.

compiler/sem/mirexec.nim Outdated Show resolved Hide resolved
## Summary
The previous `injectdestructors` pass operated on `PNode` AST, and
while it should have been able to work with the AST as generated by
`astgen`, it didn't, which required the canonicalization step to be
disabled when the `injecdestructors` pass was needed.

Since the goal is to have all code-generators operate on either the MIR
or another IR derived from the former, it is necessary that the
MIR -> AST step happens immediately before code-generation, with no
further transformation in-between.

The pass is rewritten from the ground up, fixing many issues / bugs the
previous implementation had and also making the the pass significantly
more time and memory efficient than it was previously. In addition,
the pass generates more efficient code due to the smarter omission of
`wasMoved` and `=sink` calls.

An additional change is that calling a procedure that only raises
`Defect`s is now also considered as having control-flow related side-
effects. They previously weren't, which meant that locations sometimes
weren't cleaned up (via invoking the destructor) when raising a
`Defect`.

## Fixes
Assignments to location with destructors where the right-hand side is a
complex expression now respect the strict left-to-right evaluation
order:
```nim
var
  i = 0
  s = "str"

# works now:
(i = 1; s) =
  if i == 1: "str2"
  else: doAssert false; ""

# also works now:
proc p(s: var string): var string =
  i = 2
  s

p(s) = block:
  doAssert i == 2
  "str2"

```

A mutation of the argument is now considered to happen when
control-flow reaches the call, not when the argument is passed to a
`var` parameter. The difference is subtle but important:
```nim
proc f_sink(x: var seq[int], y: sink seq[int]) =
  y[0] = 2
  doAssert x == [1, 2] # passes

proc main() =
  var s = @[1, 2]
  f_sink(s, s) # a full copy is now introduced for the second argument

main()
```

Globals are now only moved if it can proven that doing so does not
affect program semantics. Previously, assigning a location derived from
a global would perform a destructive move in certain circumstances,
such as:

```nim
# at module scope
var a = "str"
var b = a # moved

proc p() =
  discard

echo a # echoes an empty string
```

Due to the way the new implementation works, it is now possible for
locations derived from globals to be consumed (if safe) when passing
them to `sink` parameters.

As a side-effect of operating on the MIR, the alias analysis now
handles tuple access correctly:
```nim
proc prc(x: var string) =
  var tup = (x: 0, y: "str")
  # `tup[1]` and `tup.y` were treated as two disjoint locations,
  # causing the assignment below to be erroneously optimized into a
  # move
  x = tup[1]
  doAssert tup.y == "str"

var o: string
prc(o)
```

## Details
In contrast to the previous implementation, the new one splits the pass
up into multiple sub-passes. This makes the code easier to reason
about, more modular, and reduces the amount of state each sub-pass has
access to.

The core part, the lifetime-tracking hook injection, is split into a
*setup*, *analysis*, and *application* step (executed in that order).

During the *setup* step, a control-flow graph of the code is computed
and a database about the entities (i.e. locations) relevant to the pass
built. Using both of them, the *analysis* step (which is itself split up
into multiple sub-steps) computes for all values appearing in a
*consume* context (e.g. argument to a `sink` parameter, right-hand-side
of an assignment) whether they're owned and what locations still store
a value at the end of their scope (and thus needing destruction).

Finally, the *application* step uses all previously gather information
to rewrite assignments into copy or sink hook calls, inject copies for
unowned values passed to `sink` parameters, and to, where necessary,
inject calls to the destructors.

For both simplicity and efficiency, the assignment rewriting and copy
injection both perform additional data-flow analysis in order to
omit unnecessary calls to `wasMoved` and to turn sink assignments
into shallow assignments (i.e. `FastAsgn`).

### Data-flow analysis
Compared to the data-flow analysis implementation (`dfa.nim`) used
previously, definitions (`def`) and uses (`use`) are not encoded in the
control-flow graph but rather stored separately. The new forward
analysis also doesn't have quadratic time and memory complexity in the
case of loops.

How data-flow analysis is employed is also different: instead of
computing the last-reads and first-writes of *all* locations during a
single traversal of the full control-flow graph, they're now computed
*on-demand*. For example, to figure whether the value coming from an
lvalue expression is owned (in a consume context) forward data-flow
analysis is perfomed to check whether a further read is reachable.

### Additional changes
- fix `mirchangesets` not compiling
- fix multiple of the sets defined in `mirtrees` having the wrong
  elements
- fix the `indexLike` tag routine for `NodeInstance` not being exported
  from `mirtrees`
- fix `parentEnd` yielding the wrong node in some cases
- add a procedure for querying the enum of a types equality operator
  magic without the symbol
- add the `OrdinalSeq` type to the `utils/containers.nim` module
- add the `Cursor` type; meant as a temporary placeholder for immutable
  views
- add multiple new destructor-related test cases
- remove the `nfLastRead` and `nfFirstWrite` node flags
- remove the `sfSingleUsedTemp` symbol flag
- remove the `rsemUncollectableRefCycle` warning. It had questionable
  utility and removing it was simpler than to make it work again.
  Should there exist the desire to bring it back, the problematic
  assignment should be detected in `sempass2`
- remove the `optimizer` module
- flag the `arc/tstrformat.nim` test as a known issue

### Timings
Timings and peakmem are taken from compiling the compiler itself with
the options `-d:release --compileOnly` and no cached compilation
artifacts. The original compiler is build with
`-d:release --exceptions:goto`.

|compiler from|bootstrapped with|using|time taken|peakmem|
|:------------|:----------------|:----|:---------|:------|
|#d7a70238d0|`--gc:refc`|`--gc:refc`|24.426s|1015.844MiB|
|this commit|`--gc:refc`|`--gc:refc`|24.687s|1014.254MiB|
|#d7a70238d0  |`--gc:refc`|`--gc:orc`|45.206s|1.99GiB|
|this commit |`--gc:refc`|`--gc:orc`|29.531s|1.222GiB|
|#d7a70238d0  |`--gc:orc`|`--gc:refc`|15.492s|978.938MiB|
|this commit|`--gc:orc`|`--gc:refc`|15.714s|979.023MiB|
|#d7a70238d0  |`--gc:orc`|`--gc:orc`|22.218s|1.933GiB|
|this commit|`--gc:orc`|`--gc:orc`|18.266s|1.188GiB|
@zerbina zerbina force-pushed the injectdestructors-rewrite branch from 3d10776 to da47c7b Compare February 17, 2023 14:36
@zerbina zerbina changed the title compiler: rewrite injectdestructors as MIR passes compiler: make injectdestructors a MIR pass Feb 17, 2023
@zerbina
Copy link
Collaborator Author

zerbina commented Feb 17, 2023

bors r+

@bors
Copy link
Contributor

bors bot commented Feb 17, 2023

Build succeeded:

@bors bors bot merged commit ba81827 into nim-works:devel Feb 17, 2023
@zerbina zerbina deleted the injectdestructors-rewrite branch February 20, 2023 20:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/sem Related to semantic-analysis system of the compiler compiler General compiler tag refactor Implementation refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants