JIT: null out inline gc type locals after inline body #9479

AndyAyersMS · 2017-02-10T01:28:26Z

Inlining can sometimes end up stretching GC lifetimes for inlinee
locals past the end of the inlinee method body.

This change extends the work done for inlining methods with pinned
locals to null out all gc ref type locals. If there are any gc type
locals the return value is copied to a temp inside the inlinee body,
and then at the end of the body the gc type locals are set to null.

In most cases these null stores end up getting removed by dead code
elimination, but if the local ends up untracked, the store remains
and limits the active GC lifetime.

Closes #9218.
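
As a rough analogy for the lifetime effect described above, here is a minimal sketch using reference counting in place of the CLR's tracing GC. Everything in it (`Node`, `CallerFrame`, `runInlinedBody`) is hypothetical and exists only for illustration; the point is that once an inlinee's local becomes a slot in the caller's frame, the object it refers to stays reachable until that slot is cleared.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a heap object; not a CLR type.
struct Node
{
    int value = 0;
};

// Models a caller frame after inlining: the inlinee's local has become a
// slot in the caller's frame, so it lives as long as the caller's frame does.
struct CallerFrame
{
    std::shared_ptr<Node> inlineeLocal;        // inlinee GC-ref local
    int                   returnSpillTemp = 0; // holds the inlinee's result
};

// "Inlined body": allocates an object, copies the result into the spill
// temp, and optionally nulls the local at the end of the body.
// Returns an observer that reports whether the object is still reachable.
inline std::weak_ptr<Node> runInlinedBody(CallerFrame& frame, bool nullLocalsAfterBody)
{
    assert(!frame.inlineeLocal); // the slot starts out empty

    frame.inlineeLocal        = std::make_shared<Node>();
    frame.inlineeLocal->value = 42;
    std::weak_ptr<Node> observer = frame.inlineeLocal;

    // Copy the return value out first, so it cannot depend on the local.
    frame.returnSpillTemp = frame.inlineeLocal->value;

    if (nullLocalsAfterBody)
    {
        frame.inlineeLocal.reset(); // analogue of the appended null store
    }
    return observer;
}
```

Without the reset, the object remains reachable from the caller's frame after the inlined body has finished; with it, the object becomes unreachable as soon as the body ends, while the spilled return value is unaffected.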

AndyAyersMS · 2017-02-10T01:30:48Z

@sivarv this is preliminary, but take a look
cc @dotnet/jit-contrib

Overall code size impact is surprisingly small, 0.02% across the jit-diff set. Most of the null stores get removed because most inlinee locals are tracked refs.

There are some good and bad RA diffs. Will post some examples subsequently.

binarytrees3 improves by about 2% in my local runs.

sivarv · 2017-02-10T03:31:40Z

src/jit/importer.cpp

        }

-        if (impInlineInfo->lclVarInfo[lclNum + impInlineInfo->argCnt].lclIsPinned)
+        // Look for GC refs that we will null out later on.
+        if (lclTyp == TYP_REF || lclTyp == TYP_BYREF)



Minor: we can replace this with varTypeIsGC()

sivarv · 2017-02-10T03:38:27Z

src/jit/flowgraph.cpp

-        if (!lvaTable[tmpNum].lvPinned)
+        // Is the local TYP_REF or TYP_BYREF?
+        const var_types lclTyp = lvaTable[tmpNum].lvType;
+        if ((lclTyp != TYP_REF) && (lclTyp != TYP_BYREF))


Minor: Here too varTypeIsGC() can be used.
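
To illustrate the two review suggestions, here is a minimal sketch with a trimmed-down, hypothetical `var_types` enum and a simplified stand-in for `varTypeIsGC()`. The real helper lives in the JIT sources and may be implemented differently; `shouldNullAfterInline` and `shouldSkipLocal` are made-up names for the two call sites.

```cpp
#include <cassert>

// Hypothetical, trimmed-down stand-in for the JIT's var_types enum.
enum var_types
{
    TYP_INT,
    TYP_LONG,
    TYP_REF,
    TYP_BYREF,
    TYP_STRUCT
};

// Simplified stand-in for varTypeIsGC(): true for object references and
// byrefs, which is exactly what the open-coded checks in the diff test for.
constexpr bool varTypeIsGC(var_types t)
{
    return (t == TYP_REF) || (t == TYP_BYREF);
}

// importer.cpp site: was (lclTyp == TYP_REF || lclTyp == TYP_BYREF)
constexpr bool shouldNullAfterInline(var_types lclTyp)
{
    return varTypeIsGC(lclTyp);
}

// flowgraph.cpp site: was ((lclTyp != TYP_REF) && (lclTyp != TYP_BYREF))
constexpr bool shouldSkipLocal(var_types lclTyp)
{
    return !varTypeIsGC(lclTyp);
}
```

Routing both sites through one predicate keeps the positive and negated forms of the check from drifting apart.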

sivarv · 2017-02-10T03:52:29Z

Code changes look good to me.

Please run SPMI diffs on desktop as well.

A few questions:

Is the BinaryTrees benchmark at parity with Legacy JIT64 after this change? Maybe not? When I measured on my machine the difference was close to 15%. With this change, we will be confining the gc-ness of the inlinee local of bottomUpTree(), but ChildTreeNodes() is still not inlined. Inlining ChildTreeNodes() would have the added benefit of not only avoiding call overhead but also avoiding the write barrier needed to update its out parameter.
In particular, are there any other CqPerf benchmarks that have benefited from this change besides BinaryTrees?

JosephTremoulet · 2017-02-10T12:36:23Z

@dotnet-bot test Windows_NT perf

AndyAyersMS · 2017-02-10T16:01:29Z

@JosephTremoulet any idea how long that takes to run? Also whose credentials does it show up under?

JosephTremoulet · 2017-02-10T16:10:29Z

@AndyAyersMS looks like it didn't actually trigger -- it should show up in the inline CI status list here. Maybe if you say the magic words? It worked for me in #9169

JosephTremoulet · 2017-02-10T16:23:45Z

(and in that case it showed up under my credentials after the CI task completed)

AndyAyersMS · 2017-02-10T16:27:02Z

@dotnet-bot test Windows_NT perf

AndyAyersMS · 2017-02-10T16:35:25Z

Looks like perf job triggering may be broken. @DrewScoggins any ideas?

AndyAyersMS · 2017-02-10T16:38:28Z

(Edit: commented on the wrong PR, leaving relevant x-ref bits behind)

It would be really nice if jit-diff would also report how many methods remained unchanged; I'm curious what % of methods this change impacts.

AndyAyersMS · 2017-02-10T18:56:02Z

@sivarv locally I measure about a 20% improvement on binarytrees3 (765 -> 591) but there is a lot of run to run variance.

DrewScoggins · 2017-02-10T19:03:44Z

Sorry, this is my fault. We recently added x86 runs and when I did I changed the CI phrases to include the architecture. So you now need Windows_NT_{arch}. I want to keep this so you can launch just the jobs you want, but I will see if I can add back the old trigger and have it trigger both x86 and x64.

DrewScoggins · 2017-02-10T19:04:12Z

test Windows_NT_x64 perf
test Windows_NT_x86 perf

AndyAyersMS · 2017-02-10T19:08:16Z

Ah, the perf test entries have shown up (edit: I see Drew fixed this -- thanks!)

AndyAyersMS · 2017-02-10T20:15:16Z

Some unfortunate diffs, mostly in cases where the null stores get optimized away.
Likely because we are now using the return value spill temp more often.

;; System.DateTimeOffset:.ctor(struct):this

;; Before

       mov      rcx, gword ptr [rbx+8]
       test     rcx, rcx
       jne      SHORT G_M55312_IG03
       mov      rcx, rbx
       call     CachedData:CreateLocal():ref:this
       mov      rcx, rax

;; After
;; must save/restore rbp -- only appearances below

       mov      rbp, gword ptr [rbx+8]
       test     rbp, rbp
       jne      SHORT G_M55312_IG03
       mov      rcx, rbx
       call     CachedData:CreateLocal():ref:this
       mov      rbp, rax
G_M55312_IG03:
       mov      rcx, rbp

AndyAyersMS · 2017-02-10T21:52:36Z

Looks like the perf job sees an improvement of 11% or so for BinaryTrees3. Nothing else moves much; csc/dataflow claims a 4% regression, but this is within the noise level and instructions retired stayed flat.

Similar but slightly smaller win on x86.

AndyAyersMS · 2017-02-10T23:33:35Z

Desktop SPMI: 12200 methods impacted, 103768 bytes net size increase; 0.8% net size increase for the impacted methods. IIRC SPMI has a base set of nearly 2M methods, so about 1 in 200 methods is impacted.

AndyAyersMS · 2017-02-10T23:49:51Z

Looks like this has a bad interaction with the inline implicit tail call work. If an inlinee introduces an implicit tail call then we don't want to place nulling stores after the inlinee body, since the subsequent tail call transformation only expects to see side-effect free operations there. Odd though that this only caused failures on non-windows tests.

Seems sensible enough to suppress nulling inlinee locals if the inlinee is in tail position, since presumably control is about to leave the frame anyways, so the lifetime trimming from nulling locals in such cases is minimal at best.

briansull · 2017-02-10T23:53:43Z

LGTM

AndyAyersMS · 2017-02-11T02:13:20Z

Ah, windows crossgen also failing, but we don't run this CHECKED so we don't see it.

AndyAyersMS · 2017-02-11T17:47:12Z

The assert firing off is:

Assertion failed '(popStmt->gtStmtExpr->gtFlags & GTF_ALL_EFFECT) == 0' in 'System.Internal:WinRT_IReadOnlyList(ref,ref,ref)' (IL size 38)
c:\repos\coreclr\src\jit\morph.cpp Line: 7946

This is paranoia in tail call expansion, trying to ensure that any post-call operation can be safely deleted without removing important work, since we only expect to see pop and nop there in the IL stream.

The root cause of this assert is actually bad flag maintenance when updating return value expression placeholders.

Return value placeholders are always given the GTF_CALL side effect (see gtNewInlineCandidateReturnExpr) because there needs to be some side effect on the return value placeholder to ensure a subsequent pop doesn't just throw the return value away, since the real return value expression (as of yet unknown) may well have a side effect.

If the inline fails, the call will be swapped in for the placeholder and the tree flags will at least be somewhat correct.

But if the inline succeeds and the post-inline return value expression doesn't have any side effect (say because it is a spilled return value temp) then the tree flags are stale. And if the inline introduces a call in tail position then the stale flags cause the above assert to fire, since there is now a tail call followed by an apparently side-effecting operation.

This is a latent issue that was introduced by allowing inlinee call sites to be recognized as tail calls, exposed now because this change makes the use of the return value spill temp much more common. It could happen even without this change (say by inlining a tail call to a method with pinned locals that itself has a non-inlined tail call) but evidently such cases are quite rare.

The behavior of the retail jit is fine, since it is actually OK to remove the post-call operation; the bug is that its side effects are overstated.

Ideally we'd get the side effects fully and properly updated once the return value placeholder is removed. Looking at how to do that now.

Inlining can sometimes end up stretching GC lifetimes for inlinee locals past the end of the inlinee method body.

This change extends the work done for inlining methods with pinned locals to null out all gc ref type locals. If there are any gc type locals, the return value is copied to a temp inside the inlinee body, and then at the end of the body the gc type locals are set to null.

In most cases these null stores end up getting removed by dead code elimination, but if the local ends up untracked, the store remains and limits the active GC lifetime.

This change requires more extensive use of the return value spill temp (since we must be absolutely sure the return value expression does not depend on any of the locals). We now use the spill temp if any local is a gc ref type (used or not).

More widespread use of the spill temp exposed an issue in the tail call transformation, where the side effect flags were overly pessimistic for a post tail-call GT_COMMA statement that represented an ignored return value from an inlined call. The root issue is that the return value placeholder is given a GTF_CALL effect but later when the actual return value expression is substituted in place there may be no side effect -- in particular this happens when the return value expression is the spill temp.

So, we also update the post tail-call side-effect-free statement check to look through the potentially pessimistic GT_COMMA flags and check for side effects on the child nodes.

Closes #9218.
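
The look-through check described in the commit message above can be sketched on a toy tree as follows. The thread does not show the actual change to morph.cpp, so `Tree`, `Oper`, the `EFFECT_*` constants and `hasSideEffects` are all hypothetical names chosen for illustration; only the idea (ignore a comma node's own flags and trust its operands) comes from the discussion.

```cpp
#include <cassert>

// Hypothetical miniature of JIT tree nodes; illustrative only.
enum Oper
{
    OP_COMMA,
    OP_LCL_VAR,
    OP_NOP,
    OP_CALL
};

constexpr unsigned EFFECT_CALL = 0x1;
constexpr unsigned EFFECT_ASG  = 0x2;
constexpr unsigned EFFECT_ALL  = EFFECT_CALL | EFFECT_ASG;

struct Tree
{
    Oper        oper;
    unsigned    flags; // may be stale (overly pessimistic) on OP_COMMA
    const Tree* op1;
    const Tree* op2;
};

// Returns true if the tree really has a side effect. For a comma node the
// node's own flags are ignored, since they may still carry the call effect
// inherited from a return value placeholder; only the operands are checked.
inline bool hasSideEffects(const Tree* tree)
{
    if (tree == nullptr)
    {
        return false;
    }
    if (tree->oper == OP_COMMA)
    {
        return hasSideEffects(tree->op1) || hasSideEffects(tree->op2);
    }
    return (tree->flags & EFFECT_ALL) != 0;
}
```

A comma whose operands are a spill temp and a nop is then treated as side-effect free even though its own flags still claim a call effect, which is the situation that tripped the assert.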

AndyAyersMS · 2017-02-14T20:12:11Z

Revised this a bit:

- Look for GC ref locals up front rather than as they're used, and force use of the return spill temp early. I was worried the earlier approach might see a return before the first use of a gc local and potentially drop return values.
- When nulling, account for all GC type locals, then screen out ones that aren't used.
- Deal with pessimistic stmt flags in tail call opt.

Local testing looks good; going to run this through desktop as well.
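
The up-front scan and the screened nulling pass, together with the tail-position skip discussed earlier in the thread, can be sketched as below. This is a simplified model, not the JIT's code: `InlineeLocal`, `needsReturnSpillTemp` and `nullStoresAfterBody` are hypothetical names, and statements are modeled as plain strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an inlinee local; not the JIT's real data structure.
struct InlineeLocal
{
    std::string name;
    bool        isGcType; // TYP_REF or TYP_BYREF in the real JIT
    bool        isUsed;   // did the inlinee body actually reference it?
};

// Up front: any GC-typed local, used or not, forces the return value into
// a spill temp, so the decision cannot be missed by an early return.
inline bool needsReturnSpillTemp(const std::vector<InlineeLocal>& locals)
{
    for (const InlineeLocal& local : locals)
    {
        if (local.isGcType)
        {
            return true;
        }
    }
    return false;
}

// After the body: consider every GC-typed local, screen out unused ones,
// and emit nothing at all when the call site is in tail position.
inline std::vector<std::string> nullStoresAfterBody(const std::vector<InlineeLocal>& locals,
                                                    bool callIsImplicitTailCall)
{
    std::vector<std::string> stmts;
    if (callIsImplicitTailCall)
    {
        return stmts; // the frame is about to be torn down anyway
    }
    for (const InlineeLocal& local : locals)
    {
        if (local.isGcType && local.isUsed)
        {
            stmts.push_back(local.name + " = null");
        }
    }
    return stmts;
}
```

Note the asymmetry: an unused GC local still forces the spill temp, but never produces a null store.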

AndyAyersMS · 2017-02-14T21:32:04Z

Running some jit stress legs to try out variant inlining patterns...

@dotnet-bot test Windows_NT jitstress2
@dotnet-bot test Ubuntu jitstress2

AndyAyersMS · 2017-02-14T23:23:32Z

@dotnet-bot test Windows_NT_x64 perf

AndyAyersMS · 2017-02-15T02:59:52Z

@sivarv think this is ready to go....

sivarv · 2017-02-15T04:27:27Z

src/jit/flowgraph.cpp

+// Notes:
+//    If the call we're inlining is in tail position then
+//    we skip nulling the locals, since it can interfere
+//    with tail calls introduced by the local.


typo? "by the call" instead of "by the local".

sivarv · 2017-02-15T04:35:28Z

src/jit/flowgraph.cpp

        return;
    }

-    JITDUMP("Unpin inlinee locals:\n");
+    if (inlineInfo->iciCall->IsImplicitTailCall())


Is there a possibility that we create a spillTemp for the return value if the inlinee method has gc-ref locals, but later don't append null assignments because the inlinee call happens to be an implicit tail call?

Can that lead to degraded CQ because of the spillTemp for the return value?

Yes we won't insert null assignments after a potential tail call, and it's intentional.

Since we're upstream of the tail call transform, we can't put the null assignments after since they will cause asserts during the tail call transformation.

And even if the tail call transformation doesn't happen, the method is going to return immediately anyways after the call (it is a call in tail position after all), so there's no real benefit to the null assignments, since the frame is about to be torn down.

There are some small CQ issues here and there with register shuffles; I mentioned one up thread. I'll open an issue and point out a few others.

sivarv · 2017-02-15T04:39:43Z

src/jit/flowgraph.cpp

+    if (inlineInfo->iciCall->IsImplicitTailCall())
+    {
+        JITDUMP("fgInlineAppendStatements: implicit tail call; skipping nulling.\n");
+        return;


A call marked as an implicit tail call isn't guaranteed to be honored as one. fgMorphCall() makes further checks and could ignore the implicit-tail-call flag if any of those checks fail. Given that the same condition now guards appending both the unpin statements and the nulling of gc-ref locals, is there a possibility that we don't append unpin stmts in some cases?

I am trying to understand whether this is a conservative check. If so, can there be cases where the JIT fails to append unpins?

As noted above it is intentional to skip this case; even if we could put assignments there the method is about to return.

AndyAyersMS · 2017-02-15T08:13:31Z

Desktop SPMI now shows a size increase of 101905 bytes over 11998 functions, down a bit from the numbers above, presumably because we now skip some tail call cases.

sivarv · 2017-02-15T15:36:50Z

Looks good to me.

AndyAyersMS · 2017-02-15T17:36:37Z

See #9612 for a few examples where we generate unnecessary extra code.

JIT: null out inline gc type locals after inline body Commit migrated from dotnet/coreclr@995b869

AndyAyersMS added the * NO MERGE *, area-CodeGen, and optimization labels Feb 10, 2017

AndyAyersMS requested a review from sivarv February 10, 2017 01:28

dnfclas added the cla-already-signed label Feb 10, 2017

sivarv reviewed Feb 10, 2017

russellhadley mentioned this pull request Feb 10, 2017

jit-diff should report number of unchanged methods dotnet/jitutils#81

Closed

AndyAyersMS force-pushed the TrimInlineGcRefs branch from 1a398aa to 05721ee Compare February 14, 2017 20:05

AndyAyersMS removed the * NO MERGE * label Feb 14, 2017

sivarv reviewed Feb 15, 2017

AndyAyersMS merged commit 995b869 into dotnet:master Feb 15, 2017

AndyAyersMS deleted the TrimInlineGcRefs branch February 15, 2017 17:36

AndyAyersMS mentioned this pull request Feb 17, 2017

Update CoreClr, CoreFx, External, ProjectNTfs, Standard to beta-25019-02, beta-25018-01, beta-25016-01, beta-25017-00, beta-25017-01, respectively (master) dotnet/corefx#16252

Merged

karelz modified the milestone: 2.0.0 Aug 28, 2017

AndyAyersMS mentioned this pull request Jun 13, 2019

make sure we close OS handle if socket is finalized without disposing dotnet/corefx#38499

Merged

AndyAyersMS mentioned this pull request Jan 31, 2020

JIT: register shuffles after the null inlinee gc refs change dotnet/runtime#7447

Closed

tmds mentioned this pull request May 29, 2020

Move epoll event handling to a non-inlined method dotnet/runtime#37138

Merged

AndyAyersMS mentioned this pull request Nov 25, 2020

RyuJIT in-lining policy: factor in the benefit of eliminating an address exposed obj-ref local dotnet/runtime#7201

Open

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022

Merge pull request dotnet/coreclr#9479 from AndyAyersMS/TrimInlineGcRefs

beb796f

JIT: null out inline gc type locals after inline body Commit migrated from dotnet/coreclr@995b869
