Support Write-Thru of EH variables in LSRA #543

Merged

CarolEidt merged 7 commits into dotnet:master on Feb 19, 2020

CarolEidt
Contributor

Mark EH variables (those that are live in or out of exception regions) only as lvLiveInOutOfHndlr, not necessarily lvDoNotEnregister.
During register allocation, mark these as write-thru, and mark all defs as write-thru, ensuring that the stack value is always valid.
Mark those defs with GTF_SPILLED (this is the "reload" flag and is not currently used for pure defs) to indicate that the value should be kept in the register.
Mark blocks that enter EH regions as having no predecessor, and set the location of all live-in vars to be on the stack.
Change genFnPrologCalleeRegArgs to store EH vars also to the stack if they have a register assignment.
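A minimal sketch of the location bookkeeping described above, assuming a simplified model in which each variable has just two flags (value valid in its register, value valid on the stack). VarLocation, onDef and onEHBlockEntry are hypothetical names for illustration, not JIT APIs; the sketch only shows that a write-thru def leaves the value valid in both places, while an ordinary def leaves any stack copy stale.

```cpp
#include <cassert>

// Hypothetical model of where a local's current value is valid.
// This is not a JIT data structure; it only mirrors the PR description.
struct VarLocation {
    bool inReg = false;   // value is valid in the assigned register
    bool onStack = false; // value is valid in the stack home
};

// For an enregistered var, a def makes the register copy valid.
// A write-thru def (EH var) also stores to the stack;
// an ordinary def leaves the stack copy stale.
VarLocation onDef(bool isWriteThru) {
    VarLocation loc;
    loc.inReg = true;
    loc.onStack = isWriteThru;
    return loc;
}

// A block entered across an EH boundary is treated as having no
// predecessor, so each live-in EH var is assumed to be only on the stack.
VarLocation onEHBlockEntry() {
    VarLocation loc;
    loc.onStack = true;
    return loc;
}
```

For example, onDef(true) models a def of an lvLiveInOutOfHndlr local: the store keeps the stack home current for any handler that reads it, and the register copy stays usable in the non-EH code that follows.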

CarolEidt added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Dec 5, 2019
@CarolEidt
Contributor Author

@dotnet/jit-contrib PTAL

Contributor

sandreenko left a comment

I expect that change has an impact on throughput and memory consumption, how big is it?

The implementation looks good, but I am scared to have so many places where we check writeThru, that means it will be very easy to forget some of them.

bool isLocalVar : 1;

// Is this Interval currently in a register and live?
bool isActive;
Contributor

Shouldn't it be declared as a bitfield?

Contributor Author

I moved this field from the base type (Referenceable) to Interval, and changing it to a bitfield seemed to have a minor negative impact.

bool lvaVarDoNotEnregister(unsigned varNum);

bool lvaEnregEHVars;
Contributor

Same question: why is it not a bitfield?

Contributor Author

We don't currently have any bitfields on Compiler, and given that there is only one instance per compilation, it seems that the efficiency of querying a byte outweighs the storage impact.

if (isSpilledValue)
{
// Is this the special case of a write-thru lclVar?
// We mark it as SPILLED to denote that its value is valid in memroy.
Contributor

typo: memroy


if (tree->gtFlags & GTF_SPILLED)
// Is this a spilled value?
bool isSpilledValue = ((tree->gtFlags & GTF_SPILLED) != 0);
Contributor

What do we actually need to know here:

  1. Is the value valid in memory?
  2. Is the value valid in a register?

It looks like the second, so maybe rename isSpilledValue to validInReg?

Contributor Author

Makes sense. With the removal of legacy jit, I'm pretty sure this code is only used for lclVar spilling, so I think I'll try to simplify it.

@@ -85,6 +85,8 @@ void Compiler::lvaInit()
lvaCurEpoch = 0;

structPromotionHelper = new (this, CMK_Generic) StructPromotionHelper(this);

lvaEnregEHVars = ((opts.compFlags & CLFLG_REGVAR) != 0);
Contributor

What will happen if lsra::enregisterLocalVars and compiler::lvaEnregEHVars have different values, for example enregisterLocalVars==false and lvaEnregEHVars==true? Will we try to allocate registers for such variables or not?
If not, why do we need a separate flag?

Contributor Author

Although I hadn't added it yet, I anticipated having a COMPlus variable to separately control this (which I'm hopefully about to push as an update to this PR). However, LSRA uses the condition above to set enregisterLocalVars so it wouldn't be possible to have lvaEnregEHVars be true and enregisterLocalVars be false.
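To illustrate the invariant described in this reply, here is a sketch under stated assumptions: kRegVarFlag stands in for CLFLG_REGVAR (its real value is defined by the JIT and not shown here), and ehWriteThruKnob stands in for the separate configuration switch mentioned above. Because both flags are derived from the same compFlags bit, lvaEnregEHVars can be true only when enregisterLocalVars is also true.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CLFLG_REGVAR; the real value is defined by the JIT.
constexpr std::uint32_t kRegVarFlag = 0x1;

struct EnregFlags {
    bool enregisterLocalVars; // LSRA: allocate registers for lclVars at all
    bool enregEHVars;         // Compiler: also enregister EH-live vars
};

// Both flags come from the same compFlags bit; the EH flag is additionally
// gated by a separate configuration switch (hypothetical here).
EnregFlags computeEnregFlags(std::uint32_t compFlags, bool ehWriteThruKnob) {
    const bool regVar = (compFlags & kRegVarFlag) != 0;
    return EnregFlags{regVar, regVar && ehWriteThruKnob};
}
```

Across all four input combinations, enregEHVars implies enregisterLocalVars, so the enregisterLocalVars==false, lvaEnregEHVars==true case asked about above cannot arise.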

@@ -3815,7 +3825,7 @@ void CodeGen::genFnPrologCalleeRegArgs(regNumber xtraReg, bool* pXtraRegClobbere
#endif // !_TARGET_64BIT_
{
// If not a stack arg go to the next one
Contributor

Maybe "// If only register-homed, go to the next one"?

Contributor Author

How about "// If this arg is never on the stack, go to the next one."?
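The condition this comment describes can be sketched as a predicate over a hypothetical RegArgInfo summary (not a CodeGen type), assuming the PR's rule that an EH var with a register assignment is also stored to the stack in the prolog. An incoming register argument is skipped only if it is never on the stack, that is, it is enregistered and is not an EH var.

```cpp
#include <cassert>

// Hypothetical summary of how an incoming register argument was allocated.
struct RegArgInfo {
    bool hasRegAssignment; // LSRA assigned the parameter a register
    bool isEHVar;          // lvLiveInOutOfHndlr: stack copy must stay valid
};

// The prolog must store the incoming value to the arg's stack home if the
// arg lives only on the stack, or if it is an EH var (write-thru).
// Otherwise the arg is never on the stack and can be skipped.
bool needsStackStore(const RegArgInfo& arg) {
    return !arg.hasRegAssignment || arg.isEHVar;
}
```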

// by a non-def GT_LCL_VAR that is marked GTF_SPILL.
// This case occurs at the end of a block when its register is going dead.
// It is already valid on the stack.
if (((tree->gtFlags & GTF_VAR_DEF) != 0) || !varDsc->lvLiveInOutOfHndlr)
Contributor

Could you please update that comment so it includes normal spills, not only lvLiveInOutOfHndlr, because, at least for me, that was not obvious after the first read.

Member

Agree, this comment is a bit confusing...

Member

This still seems confusing, the comment is talking about GTF_SPILL but the code you changed here does not look at this flag.

Contributor Author

I've pushed another change to this comment, hopefully making it clear.

Member

I think that's better, yes. @sandreenko do you agree?
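One way to read the comment under discussion is as a decision table. SpillAction and spillAction are hypothetical names, with plain bools standing in for GTF_SPILL, GTF_VAR_DEF, GTF_SPILLED and lvLiveInOutOfHndlr. The sketch assumes, per the PR description, that write-thru defs carry both GTF_SPILL and GTF_SPILLED, and that a non-def use of an EH var marked GTF_SPILL needs no store because its stack copy is already valid.

```cpp
#include <cassert>

enum class SpillAction {
    None,            // no GTF_SPILL on the node
    StoreAndKeepReg, // write-thru def: store to stack, value stays in register
    StoreAndFreeReg, // ordinary spill: store to stack, register released
    FreeRegOnly      // EH var use at a block end: stack copy already valid
};

// isSpill ~ GTF_SPILL, isDef ~ GTF_VAR_DEF, isSpilled ~ GTF_SPILLED,
// isEHVar ~ lvLiveInOutOfHndlr (plain bools standing in for the JIT flags).
SpillAction spillAction(bool isSpill, bool isDef, bool isSpilled, bool isEHVar) {
    if (!isSpill) {
        return SpillAction::None;
    }
    if (!isDef && isEHVar) {
        return SpillAction::FreeRegOnly; // no store needed
    }
    return (isDef && isSpilled) ? SpillAction::StoreAndKeepReg
                                : SpillAction::StoreAndFreeReg;
}
```

The store is skipped exactly when the node is not a def and the var is an EH var, matching the condition ((tree->gtFlags & GTF_VAR_DEF) != 0) || !varDsc->lvLiveInOutOfHndlr quoted above.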

{
// Treat this as having incoming EH flow, since we can't insert resolution moves into
Contributor

That change confuses me, but since hasEHBoundaryIn was not used before this PR, there could not be any regressions from it.

Contributor Author

What this does is change the block to have no predecessor, so that it doesn't have to deal with mismatched locations of EH vars across the boundary. However, this change actually causes some small diffs when EH write-thru is disabled, because it creates mismatches on the non-EH edges. I'm looking into how complex it would be to eliminate this change, but it may be excessively complex. The crossgen diffs over all JITs and altjits, for frameworks and tests, show only 4 bytes of diff in each of two large methods on arm32, and 40 bytes of diff in one 4020-byte method for the x64/ux altjit.

@@ -4435,7 +4498,24 @@ void LinearScan::spillGCRefs(RefPosition* killRefPosition)
{
continue;
}
unassignPhysReg(regRecord, assignedInterval->recentRefPosition);
bool needsKill = varTypeIsGC(assignedInterval->registerType);
Contributor

I think that part was already merged in #679; why does GitHub show it as a diff?

Contributor Author

I believe it's because this is actually showing the original change that I extracted for that PR. I haven't yet pushed a rebased version of this PR.

// Compute set difference: newLiveIn = currentLiveVars - predBlock->bbLiveOut
VarSetOps::DiffD(compiler, newLiveIn, predBlock->bbLiveOut);
}
bool needsDummyDefs = (!VarSetOps::IsEmpty(compiler, newLiveIn) && block != compiler->fgFirstBB);
Contributor

Could you please explain how this PR relates to the deletion of block != compiler->fgFirstBB condition here?

Contributor Author

With this PR, we have blocks with an incoming EH boundary that may have live-in candidate vars. Previously those blocks would have had no live-in candidate vars, by construction, so we didn't need to worry about whether they needed dummy defs. Now that we have live-in EH vars, we don't want dummy defs for them (they should always be on the stack on entry), so these blocks are treated as having no predecessor (like fgFirstBB), and that becomes the right condition for determining when we don't want dummy defs.
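A simplified model of the dummy-def condition discussed here, assuming live sets are fixed-size bitsets and that "no predecessor" covers both fgFirstBB and blocks entered across an EH boundary. needsDummyDefs and VarSet are hypothetical names; this is not the actual LSRA code.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical: tracked-variable sets as fixed-size bitsets.
using VarSet = std::bitset<64>;

// Vars live into a block but not out of its chosen predecessor would need
// dummy defs. A block with no predecessor (fgFirstBB, or with this PR a
// block entered across an EH boundary, whose live-in EH vars are on the
// stack) gets none.
bool needsDummyDefs(const VarSet& blockLiveIn, bool hasPred, const VarSet& predLiveOut) {
    if (!hasPred) {
        return false;
    }
    const VarSet newLiveIn = blockLiveIn & ~predLiveOut; // set difference
    return newLiveIn.any();
}
```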

@CarolEidt
Contributor Author

I expect that change has an impact on throughput and memory consumption, how big is it?

The throughput-tuning commit takes this from roughly a 0.05% throughput loss on x64 (in the noise on x86) to a 0.05% improvement on x64 and a 0.007% improvement (barely above the noise) on x86, for crossgen of SPC.dll.

The implementation looks good, but I am scared to have so many places where we check writeThru, that means it will be very easy to forget some of them.

I'm not sure I fully understand. This is pretty fundamental, as these variables need to be handled differently.

Member

AndyAyersMS left a comment

Some initial comments. Still need to look through lsra.cpp ...

@@ -415,6 +415,8 @@ class LclVarDsc
unsigned char lvDoNotEnregister : 1; // Do not enregister this variable.
unsigned char lvFieldAccessed : 1; // The var is a struct local, and a field of the variable is accessed. Affects
// struct promotion.
unsigned char lvLiveInOutOfHndlr : 1; // The variable is live in or out of an exception handler, and therefore must
// be on the stack (at least at those boundaries.)
Member

Can you update the text for this when we dump the local var table? In existing code this is debug only and only gets dumped if the local is DNER. Would be nice to have it show up more prominently.

Contributor Author

I've added an " EH" annotation unconditionally, and left the 'H' character in the DNER dump.

// If this is going live, the register must not have a variable in it, except
// in the case of an exception variable, which may be already treated as live
// in the register.
assert(varDsc->lvLiveInOutOfHndlr || ((regSet.GetMaskVars() & regMask) == 0));
Member

Should we consider any of these lvLiveInOutOfHndlr asserts as candidates for noway_assert?

Contributor Author

I've pretty much left the assert/noway_assert distinction as-is; these changes generally just expand or contract the condition for an assert.

if (varDsc->lvLiveInOutOfHndlr && !varDsc->lvDoNotEnregister &&
((node->gtFlags & GTF_VAR_DEF) != 0))
{
varDsc->incRefCnts(0, this);
Member

Is there some other way to accomplish this? I never like to see us bias this sort of accounting if we can avoid it.

Say down the road we want to use weighted ref counts for other optimizations -- for instance trying to allocate the most frequently accessed locals at small FP or SP offsets -- we'd want these weighted ref counts to reflect our best model of reality.

Contributor Author

This is tricky because of the way we overload the ref counts for use by optimization and register allocation - the register allocator uses the weight to determine the value of allocating a register.
Note that this is only the IsLir() path.


// Store current ExecutionContext and SynchronizationContext as "previousXxx".
// This allows us to restore them and undo any Context changes made in stateMachine.MoveNext
// so that they won't "leak" out of the first await.
ExecutionContext? previousExecutionCtx = previousExecutionCtx0;
SynchronizationContext? previousSyncCtx = currentThread0._synchronizationContext;
ExecutionContext? previousExecutionCtx = currentThread._executionContext;
Member

Would be good to post diffs for these (maybe in a gist). Also when these "manual write-thru" changes were added there was some kind of perf testing -- have you looked at trying to revisit those tests?

Member

Ditto for the other Fx edits below...

Contributor Author

I am unaware of perf testing that was done when those mitigations were added - @stephentoub do you know how those might have been tested?

Member

AndyAyersMS, Dec 17, 2019

There are some perf tests in dotnet/coreclr#15629.

@AndyAyersMS
Member

AndyAyersMS commented Dec 19, 2019

Possible follow-up: seems like this change might allow us to get rid of the lvVolatileHint computations we do in lvaMarkLclRefs and the code to copy volatile hinted locals in optAddCopies.

Also if you're interested in stress-testing, you might try the mutate-test tool from jitutils, it can introduce EH into all methods in a test case.

@CarolEidt
Contributor Author

@AndyAyersMS - thanks. Right now I'm working on getting this checked in disabled. Turns out there are some minor regressions due to the way block boundaries are handled. I'm trying to get those eliminated. At that point it would be interesting to work on the stress testing; I think the other is best to leave for later.


if (newPreferences != RBM_NONE)
if (!interval->isWriteThru || !isCallKill)
Member

Can you (perhaps in a comment) explain this part of the change?

// so we won't break even until we have at least 4 * BB_UNITY_WEIGHT.
// Given that we also don't have a good way to tell whether the variable is live
// across a call in the non-EH code, we'll be extra conservative about this.
// Note that for writeThru intervals we don't update the preferences to be only callee-save.
Member

I take it this is the matching bit of logic for line 1185?

Contributor Author

Yes, I'll add a pointer
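A sketch of the preference update discussed in this thread, under assumptions: RegMask stands in for regMaskTP and updatePreferences is a hypothetical name. It narrows preferences away from killed registers as usual, but skips that step for a write-thru interval at a call kill, so the interval is not restricted to callee-saved registers. The rationale is presumably that its stack copy is already valid, so losing a caller-saved register at a call does not require a spill; the PR does not state this explicitly.

```cpp
#include <cassert>
#include <cstdint>

using RegMask = std::uint64_t; // hypothetical stand-in for regMaskTP

// At a kill site, narrow a live interval's preferences away from the killed
// registers, unless that would leave nothing. A write-thru interval at a call
// kill is left alone, so it is not restricted to callee-saved registers.
RegMask updatePreferences(RegMask prefs, RegMask killMask, bool isWriteThru, bool isCallKill) {
    if (!isWriteThru || !isCallKill) {
        const RegMask newPrefs = prefs & ~killMask;
        if (newPrefs != 0) {
            return newPrefs;
        }
    }
    return prefs;
}
```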

@AndyAyersMS
Member

Have now looked over everything but lsra.cpp; not sure how much help I can be with the changes there.

@CarolEidt
Contributor Author

@dotnet/jit-contrib - This is ready for review. It is disabled by default and there are zero diffs.

@CarolEidt
Contributor Author

I plan to rebase and run jitstress jitstressregs once my two stress fixes are merged.

@CarolEidt
Contributor Author

/azp run runtime-coreclr jitstress2-jitstressregs

@azure-pipelines

No pipelines are associated with this pull request.

@echesakov
Contributor

/azp list

{
if (isBorn)
{
compiler->codeGen->genUpdateVarReg(fldVarDsc, tree);
}
compiler->codeGen->genUpdateRegLife(fldVarDsc, isBorn, isDying DEBUGARG(tree));
}
else
if (isInMemory)
Contributor

Would these changes from else to if (isInMemory) be better as else if (isInMemory) with another else with a noway_assert()?

Contributor Author

No; the whole point of this change is that a variable can now be valid both in register and in memory, because when we "write-thru" a definition of an EH exposed variable it can remain live in the register as well.
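The point of replacing else with if (isInMemory) can be shown with two independent live sets. This is a minimal sketch with hypothetical names (LifeState, updateLife), not the JIT's liveness code: because a write-thru variable may be live in its register and on the stack at the same time, the two updates must not be mutually exclusive.

```cpp
#include <bitset>
#include <cassert>

using VarSet = std::bitset<64>; // hypothetical tracked-variable set

struct LifeState {
    VarSet liveInReg;    // vars whose register copy is live
    VarSet liveInMemory; // vars whose stack copy is live
};

// With write-thru, a var can be live in its register and on the stack at the
// same time, so the two updates are independent ifs rather than if/else.
void updateLife(LifeState& state, unsigned varIndex, bool isInReg, bool isInMemory, bool isBorn) {
    if (isInReg) {
        state.liveInReg.set(varIndex, isBorn);
    }
    if (isInMemory) {
        state.liveInMemory.set(varIndex, isBorn);
    }
}
```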

@CarolEidt
Contributor Author

@dotnet/jit-contrib ping

Member

AndyAyersMS left a comment

Left a few more notes; I think there are still some unaddressed comments from earlier too.


// The GTF_SPILL flag generally means that we need to spill this local.
// The exception is the case of an EH var that is being "spilled"
// to the stack, indicated by aby an EH GT_LCL_VAR use that is marked GTF_SPILL
Member

Typo 'by aby'

@@ -401,6 +401,7 @@ RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_FeatureSIMD, W("FeatureSIMD"), EXTERNAL_Fea
RETAIL_CONFIG_DWORD_INFO(INTERNAL_SIMD16ByteOnly, W("SIMD16ByteOnly"), 0, "Limit maximum SIMD vector length to 16 bytes (used by x64_arm64_altjit)")
RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_EnableAVX, W("EnableAVX"), EXTERNAL_JitEnableAVX_Default, "Enable AVX instruction set for wide operations as default", CLRConfig::REGUTIL_default)
RETAIL_CONFIG_DWORD_INFO_EX(UNSUPPORTED_TrackDynamicMethodDebugInfo, W("TrackDynamicMethodDebugInfo"), 0, "Specifies whether debug info should be generated and tracked for dynamic methods", CLRConfig::REGUTIL_default)
RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_EnableEHWriteThru, W("EnableEHWriteThru"), 0, "Enable enregistration of variables live on EH edges", CLRConfig::REGUTIL_default)
Member

I don't think you need this; the runtime doesn't care...

Contributor Author

Ah, yes - thanks.

@@ -78,6 +78,14 @@ inline regMaskTP calleeSaveRegs(RegisterType rt)
return varTypeIsIntegralOrI(rt) ? RBM_INT_CALLEE_SAVED : RBM_FLT_CALLEE_SAVED;
}

//------------------------------------------------------------------------
// registerTypesEquivalent: Get the set of callee-save registers of the given RegisterType
Member

Cut and paste issue in comment?

Member

Looks like the same issue just above, too, in calleeSaveRegs.

@CarolEidt
Contributor Author

I think there are still some unaddressed comments from earlier too.

I believe I addressed all of the previous comments, along with the latest; if not, could you point out what's left?

@AndyAyersMS
Member

could you point out what's left?

Looks like just this one.

@@ -724,13 +729,16 @@ void Compiler::compChangeLife(VARSET_VALARG_TP newLife)

if (varDsc->lvIsInReg())
{
#ifdef DEBUG
if (VarSetOps::IsMember(this, codeGen->gcInfo.gcVarPtrSetCur, bornVarIndex))
if (!varDsc->lvLiveInOutOfHndlr)
Contributor

Add a comment explaining what we are doing here and why.


#if defined(TARGET_ARM)
if (storeType == TYP_DOUBLE)
if ((storeType == TYP_DOUBLE) && !regArgTab[argNum].writeThru)
Contributor

Add a comment regarding why we special case writeThru here

@CarolEidt
Contributor Author

I believe that I've addressed all the PR feedback.

Member

AndyAyersMS left a comment

LGTM.

Bruce and I were talking about creating a rolling test where we could enable off-by-default changes like this, so they don't regress while we're working on getting them to be on by default.

CarolEidt merged commit 3be5238 into dotnet:master on Feb 19, 2020
CarolEidt deleted the EHWriteThru branch on July 16, 2020 16:58
ghost locked this conversation as resolved and limited it to collaborators on Dec 11, 2020
6 participants