-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Fix lsra memory consumption #11233
Fix lsra memory consumption #11233
Conversation
Pretty cool. Does this move the needle on memstats or TP? |
src/jit/lsra.h
Outdated
@@ -1131,7 +1131,7 @@ class LinearScan : public LinearScanInterface | |||
int compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block2, bool useBlockWeights); | |||
BasicBlockList* blockSequenceWorkList; | |||
bool blockSequencingDone; | |||
void addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block); | |||
void addToBlockSequenceWorkList(BlockSet sequencedBlockSet, BasicBlock* block, BlockSet predSet); | |||
void removeFromBlockSequenceWorkList(BasicBlockList* listNode, BasicBlockList* prevNode); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should really use BlockSet_ValArg_T instead of BlockSet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I guess you don't want to pass by value, do you? Should it be BlockSet&
?
In reply to: 113554764 [](ancestors = 113554764)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
void addToBlockSequenceWorkList(const BlockSet& sequencedBlockSet, BasicBlock* block, BlockSet& predSet);
since sequencedBlockSet is const/unchanged, and predSet is changed. If you don't use &
then for short form bitsets you'll pass by value. It turns out that doesn't (currently) matter, since they don't allocate memory and you only care about reducing memory, but it would be weird for short bitsets to be passed by value and long bitsets to be passed by reference, accidentally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A value arg should be fine here, since the bitset is cleared upon each call to addBlockSequenceWorklist
. This should probably use BlockSet_ValArg_T
, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conceptually, we don't want to be passing these bitsets "by value".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fair. Your suggestion to use a reference parameter should suffice, then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good, with additional desktop testing.
src/jit/bitsetasuint64inclass.h
Outdated
@@ -180,11 +180,10 @@ class BitSetUint64 | |||
|
|||
inline void ClearD(Env env) | |||
{ | |||
// Recall that ClearD does *not* require "*this" to be of the current epoch. | |||
// Recall that ClearD requires "*this" to be of the current epoch. | |||
Uint64BitSetOps::ClearD(env, m_bits); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is changing the requirements on ClearD, have you examined all the callsites to ensure that they are consistent with this? I would suggest that this undergo additional testing (e.g. desktop SPMI) to ensure that we don't encounter this assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked that there are no spmi failures or spmi diffs.
@sandreenko |
src/jit/lsra.cpp
Outdated
@@ -1540,6 +1540,7 @@ int LinearScan::compareBlocksForSequencing(BasicBlock* block1, BasicBlock* block | |||
// Arguments: | |||
// sequencedBlockSet - the set of blocks that are already sequenced | |||
// block - the new block to be added | |||
// predSet - the buffer to save predessesors set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: predessesors
I think you should elaborate, e.g.: "a block set allocated by the caller used here as a temporary block set for constructing a predecessor set. Allocated by the caller to avoid reallocating a new block set with every call to this function"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes to LSRA look fine, but the changes to BitSet::ClearDLong
are too risky.
src/jit/bitsetasshortlong.h
Outdated
@@ -664,9 +664,11 @@ void BitSetOps</*BitSetType*/ BitSetShortLongRep, | |||
/*BitSetTraits*/ BitSetTraits>::ClearDLong(Env env, BitSetShortLongRep& bs) | |||
{ | |||
assert(!IsShort(env)); | |||
// Recall that ClearD does *not* require "bs" to be of the current epoch. | |||
// Therefore, we must allocate a new representation. | |||
bs = MakeEmptyArrayBits(env); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes me very nervous, as we do not seem to have any way to verify that the size of the bitset has not changed since we created it (i.e. there appears to be no way to check that the epoch at creation is the same as the epoch at the clear). I think you should introduce a new method (perhaps ClearDUnsafe
? I don't have a great name for it) with the semantics you've introduced here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sandreenko We had talked about you converting existing ClearD() callers to call MakeEmpty() directly, for existing callers that could not easily be proven to work with the new semantics. I presume the existing semantics, and huge comment you deleted for ClearD(), were there for a reason.
Is there some reason you decided not to do that? Could you somehow convince yourself the new semantics work for every call site?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually checked all calls to ClearD, the initial idea was to replace them with MakeEmpty when it is necessary. But It appeared that all current users of ClearD do not change epoch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One can assert(CheckEpoch(env))
which is being done below in bitsetasuint64inclass.h. I believe that assert could be made here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But It appeared that all current users of ClearD do not change epoch.
I see twenty uses of ClearD
in the compiler today:
src/jit/assertionprop.cpp:4559: BitVecOps::ClearD(apTraits, fgFirstBB->bbAssertionIn);
src/jit/codegenlegacy.cpp:12585: VarSetOps::ClearD(compiler, compiler->compCurLife);
src/jit/copyprop.cpp:299: VarSetOps::ClearD(this, optCopyPropKillSet);
src/jit/emit.cpp:1475: VarSetOps::ClearD(emitComp, emitInitGCrefVars);
src/jit/emit.cpp:1476: VarSetOps::ClearD(emitComp, emitPrevGCrefVars);
src/jit/emit.cpp:4567: VarSetOps::ClearD(emitComp, emitThisGCrefVars); // This is initialized to Empty at the start of codegen.
src/jit/flowgraph.cpp:1818: // Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD()
src/jit/flowgraph.cpp:10014: BlockSetOps::ClearD(this, bNext->bbReach);
src/jit/liveness.cpp:489: VarSetOps::ClearD(this, fgCurUseSet);
src/jit/liveness.cpp:490: VarSetOps::ClearD(this, fgCurDefSet);
src/jit/liveness.cpp:725: VarSetOps::ClearD(this, inScope);
src/jit/liveness.cpp:915: VarSetOps::ClearD(this, initVars);
src/jit/liveness.cpp:1151: VarSetOps::ClearD(m_compiler, m_liveOut);
src/jit/liveness.cpp:1265: VarSetOps::ClearD(m_compiler, m_liveIn);
src/jit/liveness.cpp:1266: VarSetOps::ClearD(m_compiler, m_liveOut);
src/jit/lsra.cpp:1741: BlockSetOps::ClearD(compiler, bbVisitedSet);
src/jit/lsra.cpp:9317: VarSetOps::ClearD(compiler, sameResolutionSet);
src/jit/lsra.h:1109: BlockSetOps::ClearD(compiler, bbVisitedSet);
src/jit/regalloc.cpp:1343: VarSetOps::ClearD(this, varAsSet);
src/jit/regalloc.cpp:6351: VarSetOps::ClearD(this, raLclRegIntf[i]);
How did you verify that the epoch of each bitset does not change between the creation of each bitset and the point at which they are cleared? If the epoch has changed, the failure mode could be pretty bad, as if the number of bits has increased we could end up scribbling over arbitrary memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like a much safer change to me.
I think that in this case if we replace all ClearD with MakeEmpty we will lose a valuable information.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that in this case if we replace all ClearD with MakeEmpty we will lose a valuable information.
Yes, this is a good point. You should instead create a separate ClearD
method with the desired semantics.
FWIW, I can attest that liveness would also be able to use the new clear method in all cases that it currently uses ClearD
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We could replace it with a new Clear() function that is non-destructive.
Now, it's possible that all the ClearD functions today can use the new semantics, meaning we don't need a new Clear function. It's just very suspicious that is true, given the huge comments about why ClearD is (currently) special.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just very suspicious that is true, given the huge comments about why ClearD is (currently) special.
This is my concern as well, especially given that the potential impacts of a mismatch are rather severe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first looked at ClearD I thought it just clear bits without this epoch magic.
It was unique function, but nobody used it in such way.
For example:
flowgraph.cpp:
// Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD()
// when re-running this computation, because if the epoch changes, the size and representation of the
// sets might change).
So I prefer not to create the second function, that will confuse us in the future, but to replace the first.
However: liveness, lsra, regalloc, assertionprop, copyprop and emit do not change the epoch.
flowgraph does, but it calls MakeEmpty before ClearD. But let me check that with noway_asserts.
src/jit/bitsetasuint64inclass.h
Outdated
Uint64BitSetOps::ClearD(env, m_bits); | ||
#ifdef DEBUG | ||
// But it updates it to of the current epoch. | ||
m_epoch = BitSetTraits::GetEpoch(env); | ||
assert(CheckEpoch(env)); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for #ifdef DEBUG
. Just make it:
inline void ClearD(Env env)
{
assert(CheckEpoch(env));
Uint64BitSetOps::ClearD(env, m_bits);
}
The new ClearD reuses existing buffer and assumes that epoch wasn't changed.
dafa057
to
4701304
Compare
@pgavlin @BruceForstall I updated PR according to our discussion. |
VarSetOps::ClearD(emitComp, emitInitGCrefVars); | ||
VarSetOps::ClearD(emitComp, emitPrevGCrefVars); | ||
VarSetOps::OldStyleClearD(emitComp, emitInitGCrefVars); | ||
VarSetOps::OldStyleClearD(emitComp, emitPrevGCrefVars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This begs the question of whether these should not be initialized until CodeGen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but we can make that change (if necessary) with a post-2.0 PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@@ -1815,7 +1815,7 @@ void Compiler::fgComputeReachabilitySets() | |||
|
|||
for (block = fgFirstBB; block != nullptr; block = block->bbNext) | |||
{ | |||
// Initialize the per-block bbReach sets. (Note that we can't just call BlockSetOps::ClearD() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment is incorrect. I'm fine with leaving the code as-is, but at least the comment should be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM aside from @CarolEidt's suggested comment change.
Thanks for doing this!
I would like to see an issue to either remove all uses of |
@sandreenko filed https://github.com/dotnet/coreclr/issues/11263. Does that suffice? |
Yes, thanks. |
Do not allocate new buffer each time when we call Clear for bitset.
It removes quadratic memory consumption for predecessors in lsra (N * N/ 64 size* were allocated, where N is the number of blocks).
Use the same preallocated buffer for lsra preds.
There are no spmi asm diffs.
Fix #11142, it allocated 3GB of memory, now it takes 300 MB,
PTAL @dotnet/jit-contrib , as I understand I have to merge it into the release branch too.