Fix lsra memory consumption #11233

sandreenko · 2017-04-26T20:23:38Z

Do not allocate new buffer each time when we call Clear for bitset.
It removes quadratic memory consumption for predecessors in lsra (N * N/ 64 size* were allocated, where N is the number of blocks).

Use the same preallocated buffer for lsra preds.

There are no spmi asm diffs.
Fix #11142, it allocated 3GB of memory, now it takes 300 MB,

PTAL @dotnet/jit-contrib , as I understand I have to merge it into the release branch too.

russellhadley · 2017-04-26T20:29:38Z

Pretty cool. Does this move the needle on memstats or TP?

BruceForstall · 2017-04-26T20:30:21Z

src/jit/lsra.h

@@ -1131,7 +1131,7 @@ class LinearScan : public LinearScanInterface
    int compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights);
    BasicBlockList* blockSequenceWorkList;
    bool            blockSequencingDone;
-    void addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block);
+    void addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block, BlockSet predSet);
    void removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode);


These should really use BlockSet_ValArg_T instead of BlockSet

Although I guess you don't want to pass by value, do you? Should it be BlockSet&?

In reply to: 113554764 [](ancestors = 113554764)

Maybe:

void addToBlockSequenceWorkList(const BlockSet& sequencedBlockSet, BasicBlock* block, BlockSet& predSet);

since sequencedBlockSet is const/unchanged, and predSet is changed. If you don't use & then for short form bitsets you'll pass by value. It turns out that doesn't (currently) matter, since they don't allocate memory and you only care about reducing memory, but it would be weird for short bitsets to be passed by value and long bitsets to be passed by reference, accidentally.

A value arg should be fine here, since the bitset is cleared upon each call to addBlockSequenceWorklist. This should probably use BlockSet_ValArg_T, though.

Conceptually, we don't want to be passing these bitsets "by value".

That's fair. Your suggestion to use a reference parameter should suffice, then.

CarolEidt

The changes look good, with additional desktop testing.

CarolEidt · 2017-04-26T20:30:59Z

src/jit/bitsetasuint64inclass.h

@@ -180,11 +180,10 @@ class BitSetUint64

    inline void ClearD(Env env)
    {
-        // Recall that ClearD does *not* require "*this" to be of the current epoch.
+        // Recall that ClearD requires "*this" to be of the current epoch.
        Uint64BitSetOps::ClearD(env, m_bits);


Given that this is changing the requirements on ClearD, have you examined all the callsites to ensure that they are consistent with this? I would suggest that this undergo additional testing (e.g. desktop SPMI) to ensure that we don't encounter this assert.

I checked that there are no spmi failures or spmi diffs.

briansull · 2017-04-26T20:33:04Z

@sandreenko
I would like to take this fix for the Desktop 4.7.1 release of the JIT as well.
@CarolEidt
Can you confirm that it is low risk.

BruceForstall · 2017-04-26T20:36:59Z

src/jit/lsra.cpp

@@ -1540,6 +1540,7 @@ int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block
 // Arguments:
 //    sequencedBlockSet - the set of blocks that are already sequenced
 //    block             - the new block to be added
+//    predSet           - the buffer to save predessesors set


typo: predessesors

I think you should elaborate, e.g.: "a block set allocated by the caller used here as a temporary block set for constructing a predecessor set. Allocated by the caller to avoid reallocating a new block set with every call to this function"

Fixed, thanks.

pgavlin

The changes to LSRA look fine, but the changes to BitSet::ClearDLong are too risky.

pgavlin · 2017-04-26T20:46:00Z

src/jit/bitsetasshortlong.h

@@ -664,9 +664,11 @@ void BitSetOps</*BitSetType*/ BitSetShortLongRep,
               /*BitSetTraits*/ BitSetTraits>::ClearDLong(Env env, BitSetShortLongRep& bs)
 {
    assert(!IsShort(env));
-    // Recall that ClearD does *not* require "bs" to be of the current epoch.
-    // Therefore, we must allocate a new representation.
-    bs = MakeEmptyArrayBits(env);


This change makes me very nervous, as we do not seem to have any way to verify that the size of the bitset has not changed since we created it (i.e. there appears to be no way to check that the epoch at creation is the same as the epoch at the clear). I think you should introduce a new method (perhaps ClearDUnsafe? I don't have a great name for it) with the semantics you've introduced here.

@sandreenko We had talked about you converting existing ClearD() callers to call MakeEmpty() directly, for existing callers that could not easily be proven to work with the new semantics. I presume the existing semantics, and huge comment you deleted for ClearD(), were there for a reason.

Is there some reason you decided not to do that? Could you somehow convince yourself the new semantics work for every call site?

I manually checked all calls to ClearD, the initial idea was to replace them with MakeEmpty when it is necessary. But It appeared that all current users of ClearD do not change epoch.

One can assert(CheckEpoch(env)) which is being done below in bitsetasuint64inclass.h. I believe that assert could be made here as well.

But It appeared that all current users of ClearD do not change epoch.

I see twenty uses of ClearD in the compiler today:

src/jit/assertionprop.cpp:4559: BitVecOps::ClearD(apTraits, fgFirstBB->bbAssertionIn); src/jit/codegenlegacy.cpp:12585: VarSetOps::ClearD(compiler, compiler->compCurLife); src/jit/copyprop.cpp:299: VarSetOps::ClearD(this, optCopyPropKillSet); src/jit/emit.cpp:1475: VarSetOps::ClearD(emitComp, emitInitGCrefVars); src/jit/emit.cpp:1476: VarSetOps::ClearD(emitComp, emitPrevGCrefVars); src/jit/emit.cpp:4567: VarSetOps::ClearD(emitComp, emitThisGCrefVars); // This is initialized to Empty at the start of codegen. src/jit/flowgraph.cpp:1818: // Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD() src/jit/flowgraph.cpp:10014: BlockSetOps::ClearD(this, bNext->bbReach); src/jit/liveness.cpp:489: VarSetOps::ClearD(this, fgCurUseSet); src/jit/liveness.cpp:490: VarSetOps::ClearD(this, fgCurDefSet); src/jit/liveness.cpp:725: VarSetOps::ClearD(this, inScope); src/jit/liveness.cpp:915: VarSetOps::ClearD(this, initVars); src/jit/liveness.cpp:1151: VarSetOps::ClearD(m_compiler, m_liveOut); src/jit/liveness.cpp:1265: VarSetOps::ClearD(m_compiler, m_liveIn); src/jit/liveness.cpp:1266: VarSetOps::ClearD(m_compiler, m_liveOut); src/jit/lsra.cpp:1741: BlockSetOps::ClearD(compiler, bbVisitedSet); src/jit/lsra.cpp:9317: VarSetOps::ClearD(compiler, sameResolutionSet); src/jit/lsra.h:1109: BlockSetOps::ClearD(compiler, bbVisitedSet); src/jit/regalloc.cpp:1343: VarSetOps::ClearD(this, varAsSet); src/jit/regalloc.cpp:6351: VarSetOps::ClearD(this, raLclRegIntf[i]);

How did you verify that the epoch of each bitset does not change between the creation of each bitset and the point at which they are cleared? If the epoch has changed, the failure mode could be pretty bad, as if the number of bits has increased we could end up scribbling over arbitrary memory.

This sounds like a much safer change to me.

I think that in this case if we replace all ClearD with MakeEmpty we will lose a valuable information.

I think that in this case if we replace all ClearD with MakeEmpty we will lose a valuable information.

Yes, this is a good point. You should instead create a separate ClearD method with the desired semantics.

FWIW, I can attest that liveness would also be able to use the new clear method in all cases that it currently uses ClearD.

Agreed. We could replace it with a new Clear() function that is non-destructive.

Now, it's possible that all the ClearD functions today can use the new semantics, meaning we don't need a new Clear function. It's just very suspicious that is true, given the huge comments about why ClearD is (currently) special.

It's just very suspicious that is true, given the huge comments about why ClearD is (currently) special.

This is my concern as well, especially given that the potential impacts of a mismatch are rather severe.

When I first looked at ClearD I thought it just clear bits without this epoch magic.
It was unique function, but nobody used it in such way.
For example:
flowgraph.cpp:

// Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD() // when re-running this computation, because if the epoch changes, the size and representation of the // sets might change).

So I prefer not to create the second function, that will confuse us in the future, but to replace the first.

However: liveness, lsra, regalloc, assertionprop, copyprop and emit do not change the epoch.
flowgraph does, but it calls MakeEmpty before ClearD. But let me check that with noway_asserts.

BruceForstall · 2017-04-26T21:47:09Z

src/jit/bitsetasuint64inclass.h

        Uint64BitSetOps::ClearD(env, m_bits);
 #ifdef DEBUG
-        // But it updates it to of the current epoch.
-        m_epoch = BitSetTraits::GetEpoch(env);
+        assert(CheckEpoch(env));
 #endif


No need for #ifdef DEBUG. Just make it:

inline void ClearD(Env env) { assert(CheckEpoch(env)); Uint64BitSetOps::ClearD(env, m_bits); }

The new ClearD reuses existing buffer and assumes that epoch wasn't changed.

sandreenko · 2017-04-27T21:58:57Z

@pgavlin @BruceForstall I updated PR according to our discussion.

CarolEidt · 2017-04-27T22:20:46Z

src/jit/emit.cpp

-    VarSetOps::ClearD(emitComp, emitInitGCrefVars);
-    VarSetOps::ClearD(emitComp, emitPrevGCrefVars);
+    VarSetOps::OldStyleClearD(emitComp, emitInitGCrefVars);
+    VarSetOps::OldStyleClearD(emitComp, emitPrevGCrefVars);


This begs the question of whether these should not be initialized until CodeGen.

I agree, but we can make that change (if necessary) with a post-2.0 PR.

CarolEidt · 2017-04-27T22:21:47Z

src/jit/flowgraph.cpp

@@ -1815,7 +1815,7 @@ void Compiler::fgComputeReachabilitySets()

    for (block = fgFirstBB; block != nullptr; block = block->bbNext)
    {
-        // Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD()


I think this comment is incorrect. I'm fine with leaving the code as-is, but at least the comment should be fixed.

pgavlin

LGTM aside from @CarolEidt's suggested comment change.

Thanks for doing this!

CarolEidt · 2017-04-27T22:24:52Z

I would like to see an issue to either remove all uses of OldStyleClearD or rename it, and to generally review the usage of the two flavors of ClearD and MakeEmpty

pgavlin · 2017-04-27T22:26:54Z

I would like to see an issue to either remove all uses of OldStyleClearD or rename it, and to generally review the usage of the two flavors of ClearD and MakeEmpty

@sandreenko filed https://github.com/dotnet/coreclr/issues/11263. Does that suffice?

CarolEidt · 2017-04-27T22:36:30Z

@sandreenko filed #11263. Does that suffice?

Yes, thanks.

sandreenko added area-CodeGen optimization labels Apr 26, 2017

sandreenko requested review from pgavlin and BruceForstall April 26, 2017 20:23

dnfclas added the cla-already-signed label Apr 26, 2017

BruceForstall reviewed Apr 26, 2017

View reviewed changes

CarolEidt approved these changes Apr 26, 2017

View reviewed changes

BruceForstall reviewed Apr 26, 2017

View reviewed changes

pgavlin suggested changes Apr 26, 2017

View reviewed changes

BruceForstall reviewed Apr 26, 2017

View reviewed changes

Sergey Andreenko added 3 commits April 27, 2017 12:27

Rename ClearD to OldStyleClearD

07faf74

create fast ClearD function.

5da8acb

The new ClearD reuses existing buffer and assumes that epoch wasn't changed.

Reuse existing buffer for the predSet in lsra.

4701304

sandreenko force-pushed the fix-lsra-memory-consumption branch from dafa057 to 4701304 Compare April 27, 2017 21:40

Sergey Andreenko added 2 commits April 27, 2017 14:53

Use fast ClearD in LSRA.

045a08e

Use fast ClearD in liveness

e8c1600

CarolEidt reviewed Apr 27, 2017

View reviewed changes

pgavlin approved these changes Apr 27, 2017

View reviewed changes

Fix the comment

38eb91c

delete the test from issues

6a14c6f

BruceForstall approved these changes Apr 27, 2017

View reviewed changes

sandreenko merged commit 70014f8 into dotnet:master Apr 28, 2017

sandreenko deleted the fix-lsra-memory-consumption branch April 28, 2017 01:07

karelz modified the milestone: 2.0.0 Aug 28, 2017

sandreenko mentioned this pull request Apr 16, 2019

Fix lsraBlockEpoch check. #24009

Merged

pgavlin mentioned this pull request Jan 31, 2020

Add code to verify that operations on short/long bitsets operate on the correct epoch. dotnet/runtime#7957

Open

sandreenko mentioned this pull request Jan 31, 2020

Out of memory exception in JIT/Regression/JitBlue/DevDiv_255294 dotnet/runtime#8034

Closed

Fix lsra memory consumption #11233

Fix lsra memory consumption #11233

Conversation

sandreenko commented Apr 26, 2017

russellhadley commented Apr 26, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

briansull commented Apr 26, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgavlin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sandreenko Apr 26, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sandreenko commented Apr 27, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pgavlin left a comment

Choose a reason for hiding this comment

CarolEidt commented Apr 27, 2017

pgavlin commented Apr 27, 2017

CarolEidt commented Apr 27, 2017

sandreenko Apr 26, 2017 •

edited

Loading