-
Notifications
You must be signed in to change notification settings - Fork 2.7k
JIT: null out inline gc type locals after inline body #9479
Conversation
@sivarv this is preliminary, but take a look Overall code size impact is surprisingly small, 0.02% across the jit-diff set. Most of the null stores get removed because most inlinee locals are tracked refs. There are some good and bad RA diffs. Will post some examples subsequently. binarytrees3 improves by about 2% in my local runs. |
src/jit/importer.cpp
Outdated
} | ||
|
||
if (impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclIsPinned) | ||
// Look for GC refs that we will null out later on. | ||
if (lclTyp == TYP_REF || lclTyp == TYP_BYREF) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (lclTyp == TYP_REF || lclTyp == TYP_BYREF) [](start = 1, length = 52)
Minor: we can replace this with varTypeIsGC()
src/jit/flowgraph.cpp
Outdated
if (!lvaTable[tmpNum].lvPinned) | ||
// Is the local TYP_REF or TYP_BYREF? | ||
const var_types lclTyp = lvaTable[tmpNum].lvType; | ||
if ((lclTyp != TYP_REF) && (lclTyp != TYP_BYREF)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Here too varTypeIsGC() can be used.
Code changes look good to me. Please run SPMI diffs on desktop as well. Few questions
|
@dotnet-bot test Windows_NT perf |
@JosephTremoulet any idea how long that takes to run? Also whose credentials does it show up under? |
@AndyAyersMS looks like it didn't actually trigger -- it should show up in the inline CI status list here. Maybe if you say the magic words? It worked for me in #9169 |
(and in that case it showed up under my credentials after the CI task completed) |
@dotnet-bot test Windows_NT perf |
Looks like perf job triggering may be broken. @DrewScoggins any ideas? |
(Edit: commented on the wrong PR, leaving relevant x-ref bits behind) It would be really nice if jit-diff would also report how many methods remained unchaged; I'm curious what % of methods this change impacts. |
@sivarv locally I measure about a 20% improvement on binarytrees3 (765 -> 591) but there is a lot of run to run variance. |
Sorry, this is my fault. We recently added x86 runs and when I did I changed the CI phrases to include the architecture. So you now need Windows_NT_{arch}. I want to keep this so you can launch just the jobs you want, but I will see if I can add back the old trigger and have it trigger both x86 and x64. |
test Windows_NT_x64 perf |
Ah, the perf tests entries have shown up (edit, see Drew fixed this -- thanks!) |
Some unfortunate diffs, mostly in cases where the null stores get optimized away.
|
Looks like the perf job sees 11% or so for BinaryTrees3. Nothing else moves much; csc/dataflow claims 4% regression but this is within the noise level and the instruction retired stayed flat. Similar but slightly smaller win on x86. |
Desktop SPMI: 12200 method impacted, 103768 bytes net size increase; 0.8% net size increase for the impacted methods. IIRC SPMI has a base set of nearly 2M methods, so about 1 in 200 methods is impacted. |
Looks like this has a bad interaction with the inline implicit tail call work. If an inlinee introduces an implicit tail call then we don't want to place nulling stores after the inlinee body, since the subsequent tail call transformation only expects to see side-effect free operations there. Odd though that this only caused failures on non-windows tests. Seems sensible enough to suppress nulling inlinee locals if the inlinee is in tail position, since presumably control is about to leave the frame anyways, so the lifetime trimming from nulling locals in such cases is minimal at best. |
LGTM |
Ah, windows crossgen also failing, but we don't run this CHECKED so we don't see it. |
The assert firing off is:
This is paranoia in tail call expansion, trying to ensure that any post-call operation can be safely deleted without removing important work, since we only expect to see The root cause of this assert is actually bad flag maintenance when updating return value expression placeholders. Return value placeholders are always given the If the inline fails, the call will be swapped in for the placeholder and the tree flags will at least be somewhat correct. But if the inline succeeds and the post-inline return value expression doesn't have any side effect (say because it is a spilled return value temp) then the tree flags are stale. And if the inline introduces a call in tail position then the stale flags cause the above assert to fire, since there is now a tail call followed by an apparently side-effecting operation. This is a latent issue that was introduced by allowing inlinee call sites to be recognized as tail calls, exposed now because this change makes the use of the return value spill temp much more common. It could happen even without this change (say by inlining a tail call to a method with pinned locals that itself has a non-inlined tail call) but evidently such cases are quite rare. The behavior of the retail jit is fine, since it is actually OK to remove the post-call operation; the bug is that its side effects are overstated. Ideally we'd get the side effects fully and properly updated once the return value placeholder is removed. Looking at how to do that now. |
Inlining can sometimes end up stretching GC lifetimes for inlinee locals past the end of the inlinee method body. This change extends the work done for inlining methods with pinned locals to null out all gc ref type locals. If there are any gc type locals, the return value is copied to a temp inside the inlinee body, and then at the end of the body the gc type locals are set to null. In most cases these null stores end up getting removed by dead code elimination, but if the local ends up untracked, the store remains and limits the active GC lifetime. This change requires more extensive use of the return value spill temp (since we must be absolutely sure the return value expression does not depend on any of the locals). We now use the spill temp if any local is a gc ref type (used or not). More widespread use of the spill temp exposed an issue in the tail call transformation, where the side effect flags were overly pessimistic for a post tail-call GT_COMMA statement that represented an ignored return value from an inlined call. The root issue is that the return value placeholder is given a GTF_CALL effect but later when the actual return value expression is substituted in place there may be no side effect -- in particular this happens when the return value expression is the spill temp. So, we also update the post tail-call side-effect-free statment check to look through the potentially pessimistic GT_COMMA flags and check for side effects on the child nodes. Closes #9218.
1a398aa
to
05721ee
Compare
Revised this a bit:
Local testing looks good; going to run this through desktop as well. |
Running some jit stress legs to try out variant inlining patterns... @dotnet-bot test Windows_NT jitstress2 |
@dotnet-bot test Windows_NT_x64 perf |
@sivarv think this is ready to go.... |
// Notes: | ||
// If the call we're inlining is in tail position then | ||
// we skip nulling the locals, since it can interfere | ||
// with tail calls introduced by the local. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo? "by the call" instead of "by the local".
return; | ||
} | ||
|
||
JITDUMP("Unpin inlinee locals:\n"); | ||
if (inlineInfo->iciCall->IsImplicitTailCall()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there a possibility that we create a spillTemp for return value if the inlinee method has gc-ref locals but later don't append null assignments because the inlinee happens to be an implicit tail call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can that lead to degraded CQ because of spillTemp for return value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we won't insert null assignments after a potential tail call, and it's intentional.
Since we're upstream of the tail call transform, we can't put the null assignments after since they will cause asserts during the tail call transformation.
And even if the tail call transformation doesn't happen, the method is going to return immediately anyways after the call (it is a call in tail position after all), so there's no real benefit to the null assignments, since the frame is about to be torn down.
There are some small CQ issues here and there with register shuffles; I mentioned one up thread. I'll open an issue and point out a few others.
if (inlineInfo->iciCall->IsImplicitTailCall()) | ||
{ | ||
JITDUMP("fgInlineAppendStatements: implicit tail call; skipping nulling.\n"); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A call marked as implicit tail call doesn't guarantee that it will be honored. fgMorphCall() makes further checks and could ignore implicit-tail-call flag if any of those checks fail. The fact now the same condition is used for appending un-pinning and nulling out gc-ref locals, there is a possibility that we don't append unpin stmts in some cases?
I am trying to understand is this a conservative check? If yes, can there be cases JIT misses to append unpins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted above it is intentional to skip this case; even if we could put assignments there the method is about to return.
Desktop SPMI now shows a size increase of 101905 bytes over 11998 functions, down a bit from the numbers above, presumably because we now skip some tail call cases. |
Looks good to me. |
See #9612 for a few examples where we generate unnecessary extra code. |
JIT: null out inline gc type locals after inline body Commit migrated from dotnet/coreclr@995b869
Inlining can sometimes end up stretching GC lifetimes for inlinee
locals past the end of the inlinee method body.
This change extends the work done for inlining methods with pinned
locals to null out all gc ref type locals. If there are any gc type
locals the return value is copied to a temp inside the inlinee body,
and then at the end of the body the gc type locals are set to null.
In most cases these null stores end up getting removed by dead code
elimination, but if the local ends up untracked, the store remains
and limits the active GC lifetime.
Closes #9218.