Support Write-Thru of EH variables in LSRA #543
@dotnet/jit-contrib PTAL
360d67b to 67c62ea
I expect this change has an impact on throughput and memory consumption. How big is it?
The implementation looks good, but I am wary of having so many places where we check writeThru; it will be very easy to forget some of them.
bool isLocalVar : 1;

// Is this Interval currently in a register and live?
bool isActive;
Shouldn't it be declared as a bitfield?
I moved this field from the base type (Referenceable) to Interval, and changing it to a bitfield seemed to have a minor negative impact.
bool lvaVarDoNotEnregister(unsigned varNum);

bool lvaEnregEHVars;
Same question: why is it not a bitfield?
We don't currently have any bitfields on Compiler, and given that there is only one instance per compilation, it seems that the efficiency of querying a byte outweighs the storage impact.
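To make the bitfield trade-off in the two threads above concrete, here is a minimal sketch. The struct and function names are hypothetical stand-ins, not the JIT's Interval or Compiler types, and the exact layout of bit-fields is implementation-defined.

```cpp
// Hypothetical stand-ins; these are not the JIT's Interval or Compiler types.
struct PackedFlagsSketch
{
    bool isLocalVar : 1;
    bool isActive : 1; // may share a byte with isLocalVar
};

struct ByteFlagsSketch
{
    bool isLocalVar;
    bool isActive; // occupies its own byte
};

// Reading a bit-field typically compiles to a load plus a mask or shift.
inline bool readIsActive(const PackedFlagsSketch& flags)
{
    return flags.isActive;
}

// Reading a plain bool typically compiles to a single byte load.
inline bool readIsActive(const ByteFlagsSketch& flags)
{
    return flags.isActive;
}
```

With many instances (as with Interval) the packed form can save memory; with a single instance per compilation (as with Compiler) the saving is negligible, so the cheaper read is the more relevant factor.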
src/coreclr/src/jit/instr.cpp
Outdated
if (isSpilledValue)
{
// Is this the special case of a write-thru lclVar?
// We mark it as SPILLED to denote that its value is valid in memroy.
typo: memroy
src/coreclr/src/jit/instr.cpp
Outdated
if (tree->gtFlags & GTF_SPILLED)
// Is this a spilled value?
bool isSpilledValue = ((tree->gtFlags & GTF_SPILLED) != 0);
What do we actually need to know here:
- is the value valid in memory?
- is the value valid in a register?
Looks like the second, so maybe rename isSpilledValue to validInReg?
Makes sense. With the removal of legacy jit, I'm pretty sure this code is only used for lclVar spilling, so I think I'll try to simplify it.
src/coreclr/src/jit/lclvars.cpp
Outdated
@@ -85,6 +85,8 @@ void Compiler::lvaInit()
lvaCurEpoch = 0;

structPromotionHelper = new (this, CMK_Generic) StructPromotionHelper(this);

lvaEnregEHVars = ((opts.compFlags & CLFLG_REGVAR) != 0);
What will happen if lsra::enregisterLocalVars and compiler::lvaEnregEHVars have different values, for example enregisterLocalVars==false and lvaEnregEHVars==true? Will we try to allocate registers for such variables or not?
If not, why do we need a separate flag?
Although I hadn't added it yet, I anticipated having a COMPlus variable to separately control this (which I'm hopefully about to push as an update to this PR). However, LSRA uses the condition above to set enregisterLocalVars, so it wouldn't be possible to have lvaEnregEHVars be true and enregisterLocalVars be false.
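The relationship between the two flags described in this reply can be sketched as follows. This is only an illustration under stated assumptions: ConfigSketch, DerivedFlagsSketch and deriveFlags are hypothetical names, and combining the settings with a logical AND is an assumption about how the anticipated COMPlus variable would be applied.

```cpp
// Hypothetical inputs; not the JIT's real option structures.
struct ConfigSketch
{
    bool regVarEnabled;      // stands in for (opts.compFlags & CLFLG_REGVAR) != 0
    bool ehWriteThruEnabled; // stands in for the anticipated COMPlus setting
};

struct DerivedFlagsSketch
{
    bool enregisterLocalVars; // the LSRA flag
    bool lvaEnregEHVars;      // the Compiler flag
};

// Both flags start from the same REGVAR condition, so lvaEnregEHVars
// can only be true when enregisterLocalVars is also true.
inline DerivedFlagsSketch deriveFlags(const ConfigSketch& config)
{
    DerivedFlagsSketch flags;
    flags.enregisterLocalVars = config.regVarEnabled;
    flags.lvaEnregEHVars      = config.regVarEnabled && config.ehWriteThruEnabled; // assumed combination
    return flags;
}
```

Under these assumptions the case the reviewer asks about (enregisterLocalVars==false with lvaEnregEHVars==true) cannot arise.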
@@ -3815,7 +3825,7 @@ void CodeGen::genFnPrologCalleeRegArgs(regNumber xtraReg, bool* pXtraRegClobbere
#endif // !_TARGET_64BIT_
{
// If not a stack arg go to the next one
Maybe // if only register homed go to the next one?
How about // If this arg is never on the stack, go to the next one.?
// by a non-def GT_LCL_VAR that is marked GTF_SPILL.
// This case occurs at the end of a block when its register is going dead.
// It is already valid on the stack.
if (((tree->gtFlags & GTF_VAR_DEF) != 0) || !varDsc->lvLiveInOutOfHndlr)
Could you please update that comment so it includes normal spills, not only lvLiveInOutOfHndlr? At least for me, that was not obvious after the first read.
Agree, this comment is a bit confusing...
This still seems confusing: the comment is talking about GTF_SPILL, but the code you changed here does not look at this flag.
I've pushed another change to this comment, hopefully making it clear.
I think that's better, yes. @sandreenko do you agree?
{
// Treat this as having incoming EH flow, since we can't insert resolution moves into
That change confuses me, but since hasEHBoundaryIn was not used before this PR, there could not be any regressions from it.
What this does is change the block to have no predecessor, so that it doesn't have to deal with mismatched locations of EH vars across the boundary. However, this change actually causes some small diffs when the EH write-thru is disabled, because it creates mismatches on the non-EH edges. I'm looking into how complex it would be to eliminate this change, but it may be excessively complex. The crossgen diffs over all jits & altjits for frameworks & tests show only 4 bytes of diff each over two large methods for arm32, and 40 bytes of diff for one 4020 byte method for x64/ux altjit.
src/coreclr/src/jit/lsra.cpp
Outdated
@@ -4435,7 +4498,24 @@ void LinearScan::spillGCRefs(RefPosition* killRefPosition)
{
continue;
}
unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
bool needsKill = varTypeIsGC(assignedInterval->registerType);
I think that part was already merged in #679; why does GitHub show it as a diff?
I believe it's because this is actually showing the original change that I extracted for that PR. I haven't yet pushed a rebased version of this PR.
src/coreclr/src/jit/lsrabuild.cpp
Outdated
// Compute set difference: newLiveIn = currentLiveVars - predBlock->bbLiveOut
VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
}
bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
Could you please explain how this PR relates to the deletion of the block != compiler->fgFirstBB condition here?
With this PR, we have blocks with an incoming EH boundary that may have live-in candidate vars. Previously those blocks would have no live-in candidate vars, by construction, so we didn't need to worry about whether they needed dummy defs. Now that we have live-in EH vars, we don't want dummy defs for those (they should always be on stack on entry), so they are treated as having no pred block (like fgFirstBB), and that becomes the right condition for determining when we don't want dummy defs.
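The condition described in this reply can be sketched roughly as below. This is a simplified model, not the LSRA code: VarSet, BlockSketch and both functions are hypothetical, std::set stands in for the JIT's VarSetOps bit vectors, and a null predBlock models a block treated as having no predecessor (fgFirstBB or a block with an incoming EH boundary).

```cpp
#include <set>

using VarSet = std::set<unsigned>; // hypothetical stand-in for a JIT variable set

struct BlockSketch
{
    VarSet liveOut; // plays the role of bbLiveOut
};

// newLiveIn = currentLiveVars - predBlock->liveOut.
// With no predecessor, nothing is subtracted.
inline VarSet computeNewLiveIn(const VarSet& currentLiveVars, const BlockSketch* predBlock)
{
    VarSet newLiveIn = currentLiveVars;
    if (predBlock != nullptr)
    {
        for (unsigned varIndex : predBlock->liveOut)
        {
            newLiveIn.erase(varIndex);
        }
    }
    return newLiveIn;
}

// Dummy defs are only wanted when there is a predecessor that fails to
// supply some live-in variable. Blocks treated as having no predecessor
// get their live-in values from the stack (or the prolog) instead.
inline bool needsDummyDefs(const VarSet& currentLiveVars, const BlockSketch* predBlock)
{
    if (predBlock == nullptr)
    {
        return false;
    }
    return !computeNewLiveIn(currentLiveVars, predBlock).empty();
}
```

In this model, "has no predecessor" replaces the earlier explicit comparison against the first block, which is why that comparison could be dropped.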
The Throughput tuning commit brings this from roughly a 0.05% throughput loss on x64 (in the noise on x86) to a 0.05% improvement on x64 and a 0.007% improvement (barely above the noise) on x86 for crossgen of SPC.dll.
I'm not sure I fully understand. This is pretty fundamental, as these variables need to be handled differently.
Some initial comments. Still need to look through lsra.cpp ...
@@ -415,6 +415,8 @@ class LclVarDsc
unsigned char lvDoNotEnregister : 1; // Do not enregister this variable.
unsigned char lvFieldAccessed : 1; // The var is a struct local, and a field of the variable is accessed. Affects
// struct promotion.
unsigned char lvLiveInOutOfHndlr : 1; // The variable is live in or out of an exception handler, and therefore must
// be on the stack (at least at those boundaries.)
Can you update the text for this when we dump the local var table? In existing code this is debug only and only gets dumped if the local is DNER. Would be nice to have it show up more prominently.
I've added an " EH" annotation unconditionally, and left the 'H' character in the DNER dump.
// If this is going live, the register must not have a variable in it, except
// in the case of an exception variable, which may be already treated as live
// in the register.
assert(varDsc->lvLiveInOutOfHndlr || ((regSet.GetMaskVars() & regMask) == 0));
Should we consider any of these lvLiveInOutOfHndlr asserts as candidates for noway_assert?
I've pretty much left the assert/noway_assert distinction as-is; these changes are generally just either expanding or contracting the condition for an assert.
if (varDsc->lvLiveInOutOfHndlr && !varDsc->lvDoNotEnregister &&
((node->gtFlags & GTF_VAR_DEF) != 0))
{
varDsc->incRefCnts(0, this);
Is there some other way to accomplish this? I never like to see us bias this sort of accounting if we can avoid it.
Say down the road we want to use weighted ref counts for other optimizations -- for instance trying to allocate the most frequently accessed locals at small FP or SP offsets -- we'd want these weighted ref counts to reflect our best model of reality.
This is tricky because of the way we overload the ref counts for use by optimization and register allocation - the register allocator uses the weight to determine the value of allocating a register.
Note that this is only the IsLir() path.
// Store current ExecutionContext and SynchronizationContext as "previousXxx".
// This allows us to restore them and undo any Context changes made in stateMachine.MoveNext
// so that they won't "leak" out of the first await.
ExecutionContext? previousExecutionCtx = previousExecutionCtx0;
SynchronizationContext? previousSyncCtx = currentThread0._synchronizationContext;
ExecutionContext? previousExecutionCtx = currentThread._executionContext;
Would be good to post diffs for these (maybe in a gist). Also when these "manual write-thru" changes were added there was some kind of perf testing -- have you looked at trying to revisit those tests?
Ditto for the other Fx edits below...
I am unaware of perf testing that was done when those mitigations were added - @stephentoub do you know how those might have been tested?
There are some perf tests in dotnet/coreclr#15629.
Possible follow-up: seems like this change might allow us to get rid of the Also if you're interested in stress-testing, you might try the
@AndyAyersMS - thanks. Right now I'm working on getting this checked in disabled. Turns out there are some minor regressions due to the way block boundaries are handled. I'm trying to get those eliminated. At that point it would be interesting to work on the stress testing; I think the other is best to leave for later.
if (newPreferences != RBM_NONE)
if (!interval->isWriteThru || !isCallKill)
Can you (perhaps in a comment) explain this part of the change?
// so we won't break even until we have at least 4 * BB_UNITY_WEIGHT.
// Given that we also don't have a good way to tell whether the variable is live
// across a call in the non-EH code, we'll be extra conservative about this.
// Note that for writeThru intervals we don't update the preferences to be only callee-save.
I take it this is the matching bit of logic for line 1185?
Yes, I'll add a pointer
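For readers following the two matching pieces of logic discussed here, a rough sketch of the guarded preference update is below. It is an illustration only: RegMask, IntervalSketch and updatePreferencesAtKill are hypothetical names, and computing the new preferences as the current preferences minus the killed registers is an assumption, not something shown in the quoted diff.

```cpp
#include <cstdint>

using RegMask = std::uint64_t;       // hypothetical stand-in for regMaskTP
constexpr RegMask kRegMaskNone = 0;  // plays the role of RBM_NONE

struct IntervalSketch
{
    RegMask registerPreferences;
    bool    isWriteThru;
};

// At a kill, narrow the interval's preferences to registers that survive it,
// except for write-thru intervals at call kills: per the code comment quoted
// above, their preferences are not updated to be only callee-save.
inline void updatePreferencesAtKill(IntervalSketch& interval, RegMask killMask, bool isCallKill)
{
    if (interval.isWriteThru && isCallKill)
    {
        return;
    }
    RegMask newPreferences = interval.registerPreferences & ~killMask; // assumed computation
    if (newPreferences != kRegMaskNone)
    {
        interval.registerPreferences = newPreferences;
    }
}
```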
Have now looked over everything but lsra.cpp; not sure how much help I can be with the changes there.
cf0b78a to 9cbe9af
@dotnet/jit-contrib - This is ready for review. It is disabled by default and there are zero diffs.
I plan to rebase and run jitstress jitstressregs once my two stress fixes are merged.
9cbe9af to 0bb9330
/azp run runtime-coreclr jitstress2-jitstressregs
No pipelines are associated with this pull request.
/azp list
{
if (isBorn)
{
compiler->codeGen->genUpdateVarReg(fldVarDsc, tree);
}
compiler->codeGen->genUpdateRegLife(fldVarDsc, isBorn, isDying DEBUGARG(tree));
}
else
if (isInMemory)
Would these changes from else to if (isInMemory) be better as else if (isInMemory) with another else with a noway_assert()?
No; the whole point of this change is that a variable can now be valid both in register and in memory, because when we "write-thru" a definition of an EH exposed variable it can remain live in the register as well.
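The point made in this reply, that the register and memory locations are no longer mutually exclusive, can be modeled with two independent flags. The sketch below is a toy model with hypothetical names (VarLocationSketch and the four helpers); it is not the codegen liveness-tracking code.

```cpp
// Hypothetical model of where a local's current value is valid.
struct VarLocationSketch
{
    bool validInReg    = false;
    bool validInMemory = false;
};

// Ordinary enregistered def: only the register holds the new value.
inline void defineInRegister(VarLocationSketch& var)
{
    var.validInReg    = true;
    var.validInMemory = false;
}

// Write-thru def of an EH-exposed variable: the value is stored to the
// stack and also stays live in the register.
inline void defineWriteThru(VarLocationSketch& var)
{
    var.validInReg    = true;
    var.validInMemory = true;
}

// Ordinary spill: store to the stack and give up the register.
inline void spill(VarLocationSketch& var)
{
    var.validInMemory = true;
    var.validInReg    = false;
}

// Register going dead for a write-thru variable: no store is needed,
// because the stack copy is already valid.
inline void releaseRegister(VarLocationSketch& var)
{
    var.validInReg = false;
}
```

Because both flags can be set at once, an else branch that assumes "not in register implies in memory, and vice versa" would skip the memory update after a write-thru def, which is why the two checks are independent.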
@dotnet/jit-contrib ping
Left a few more notes; I think there are still some unaddressed comments from earlier too.
// The GTF_SPILL flag generally means that we need to spill this local.
// The exception is the case of an EH var that is being "spilled"
// to the stack, indicated by aby an EH GT_LCL_VAR use that is marked GTF_SPILL
Typo 'by aby'
@@ -401,6 +401,7 @@ RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_FeatureSIMD, W("FeatureSIMD"), EXTERNAL_Fea
RETAIL_CONFIG_DWORD_INFO(INTERNAL_SIMD16ByteOnly, W("SIMD16ByteOnly"), 0, "Limit maximum SIMD vector length to 16 bytes (used by x64_arm64_altjit)")
RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_EnableAVX, W("EnableAVX"), EXTERNAL_JitEnableAVX_Default, "Enable AVX instruction set for wide operations as default", CLRConfig::REGUTIL_default)
RETAIL_CONFIG_DWORD_INFO_EX(UNSUPPORTED_TrackDynamicMethodDebugInfo, W("TrackDynamicMethodDebugInfo"), 0, "Specifies whether debug info should be generated and tracked for dynamic methods", CLRConfig::REGUTIL_default)
RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_EnableEHWriteThru, W("EnableEHWriteThru"), 0, "Enable enregistration of variables live on EH edges", CLRConfig::REGUTIL_default)
I don't think you need this, the runtime doesn't care...
Ah, yes - thanks.
src/coreclr/src/jit/lsra.h
Outdated
@@ -78,6 +78,14 @@ inline regMaskTP calleeSaveRegs(RegisterType rt)
return varTypeIsIntegralOrI(rt) ? RBM_INT_CALLEE_SAVED : RBM_FLT_CALLEE_SAVED;
}

//------------------------------------------------------------------------
// registerTypesEquivalent: Get the set of callee-save registers of the given RegisterType
Cut and paste issue in comment?
Looks like same just above too, in calleeSaveRegs
Mark EH variables (those that are live in or out of exception regions) only as lvLiveInOutOfHndlr, not necessarily lvDoNotEnregister.
During register allocation, mark these as write-thru, and mark all defs as write-thru, ensuring that the stack value is always valid.
Mark those defs with GTF_SPILLED (this is the "reload" flag and is not currently used for pure defs) to indicate that it should be kept in the register.
Mark blocks that enter EH regions as having no predecessor, and set the location of all live-in vars to be on the stack.
Change genFnPrologCalleeRegArgs to store EH vars also to the stack if they have a register assignment.
…cal register RefPositions during allocation.
0bb9330 to c8c0f17
I believe I addressed all of the previous comments, along with the latest; if not, could you point out what's left?
Looks like just this one.
@@ -724,13 +729,16 @@ void Compiler::compChangeLife(VARSET_VALARG_TP newLife)

if (varDsc->lvIsInReg())
{
#ifdef DEBUG
if (VarSetOps::IsMember(this, codeGen->gcInfo.gcVarPtrSetCur, bornVarIndex))
if (!varDsc->lvLiveInOutOfHndlr)
Add a comment explaining what/why we are doing here..
#if defined(TARGET_ARM)
if (storeType == TYP_DOUBLE)
if ((storeType == TYP_DOUBLE) && !regArgTab[argNum].writeThru)
Add a comment regarding why we special case writeThru here
I believe that I've addressed all the PR feedback.
LGTM.
Bruce and I were talking about creating a rolling test where we could enable off by default changes like this so they don't regress while we're working on getting them to be on by default.