-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: enable tail calls and copy omission for implicit byref structs #33004
Changes from 1 commit
13f0f6a
364a302
5a34292
a0b9591
c3cc4b0
cf98b3e
50da720
7084ebf
a33348c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -16193,6 +16193,55 @@ bool GenTree::IsLocalAddrExpr(Compiler* comp, GenTreeLclVarCommon** pLclVarTree, | |||||||
return false; | ||||||||
} | ||||||||
|
||||||||
//------------------------------------------------------------------------ | ||||||||
// IsImplicitByrefParameterValue: determine if this tree is the entire | ||||||||
// value of a local implicit byref parameter | ||||||||
// | ||||||||
// Arguments: | ||||||||
// compiler -- compiler instance | ||||||||
// | ||||||||
// Return Value: | ||||||||
// GenTreeLclVarCommon node for the local, or nullptr. | ||||||||
|
||||||||
GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler) | ||||||||
{ | ||||||||
#if defined(TARGET_AMD64) || defined(TARGET_ARM64) | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't this be limited to windows amd64? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Windows, yes. We seem to think arm64 can have these too. I don't know the ABI well enough to say one way or another. runtime/src/coreclr/src/jit/compiler.h Lines 444 to 446 in 5ea5b57
Perhaps we need a define just for this and a cleanup of all references? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like windows arm64 uses the standard ARM64 EABI, so there are no implicit byrefs there. I'll put up a separate PR to clean this up here and throughout the rest of the jit codebase. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was under the impression that arm64 has implicit byrefs, but defining it for unix amd64 seems incorrect for sure. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. > 16 byte structs on arm64 windows or linux I believe are passed by reference. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, I see it now.. (from the EABI)
So the right condition is windows x64, and all arm64. |
||||||||
|
||||||||
GenTreeLclVarCommon* lcl = nullptr; | ||||||||
|
||||||||
if (OperIs(GT_LCL_VAR)) | ||||||||
{ | ||||||||
lcl = AsLclVarCommon(); | ||||||||
} | ||||||||
else if (OperIs(GT_OBJ)) | ||||||||
{ | ||||||||
GenTree* addr = AsIndir()->Addr(); | ||||||||
|
||||||||
if (addr->OperIs(GT_LCL_VAR)) | ||||||||
{ | ||||||||
lcl = addr->AsLclVarCommon(); | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't understand this case, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, this is potentially confusing. The short answer is that before morph, arguments passed implicitly by-reference are represented by-value in the IR. Note the first two "match" cases here are pre-existing logic; all I did was move this to a helper method and add the last case. |
||||||||
} | ||||||||
else if (addr->OperIs(GT_ADDR)) | ||||||||
{ | ||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a semantic difference between There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What this means changes over time, and we're straddling that change. See above? |
||||||||
GenTree* base = addr->AsOp()->gtOp1; | ||||||||
|
||||||||
if (base->OperIs(GT_LCL_VAR)) | ||||||||
{ | ||||||||
lcl = base->AsLclVarCommon(); | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
|
||||||||
if ((lcl != nullptr) && compiler->lvaIsImplicitByRefLocal(lcl->GetLclNum())) | ||||||||
{ | ||||||||
return lcl; | ||||||||
} | ||||||||
|
||||||||
#endif // defined(TARGET_AMD64) || defined(TARGET_ARM64) | ||||||||
|
||||||||
return nullptr; | ||||||||
} | ||||||||
|
||||||||
//------------------------------------------------------------------------ | ||||||||
// IsLclVarUpdateTree: Determine whether this is an assignment tree of the | ||||||||
// form Vn = Vn 'oper' 'otherTree' where Vn is a lclVar | ||||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4796,46 +4796,49 @@ void Compiler::fgMakeOutgoingStructArgCopy(GenTreeCall* call, | |
// | ||
// We don't need a copy if this is the last use of an implicit by-ref local. | ||
// | ||
// We can't determine that all of the time, but if there is only | ||
// one use and the method has no loops, then this use must be the last. | ||
if (opts.OptimizationEnabled()) | ||
{ | ||
GenTreeLclVarCommon* lcl = nullptr; | ||
|
||
if (argx->OperIsLocal()) | ||
{ | ||
lcl = argx->AsLclVarCommon(); | ||
} | ||
else if ((argx->OperGet() == GT_OBJ) && argx->AsIndir()->Addr()->OperIsLocal()) | ||
{ | ||
lcl = argx->AsObj()->Addr()->AsLclVarCommon(); | ||
} | ||
GenTreeLclVarCommon* const lcl = argx->IsImplicitByrefParameterValue(this); | ||
|
||
if (lcl != nullptr) | ||
{ | ||
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum(); | ||
if (lvaIsImplicitByRefLocal(varNum)) | ||
{ | ||
LclVarDsc* varDsc = &lvaTable[varNum]; | ||
// JIT_TailCall helper has an implicit assumption that all tail call arguments live | ||
// on the caller's frame. If an argument lives on the caller caller's frame, it may get | ||
// overwritten if that frame is reused for the tail call. Therefore, we should always copy | ||
// struct parameters if they are passed as arguments to a tail call. | ||
if (!call->IsTailCallViaHelper() && (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop()) | ||
{ | ||
assert(!call->IsTailCall()); | ||
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need for |
||
LclVarDsc* varDsc = lvaGetDesc(varNum); | ||
const unsigned short totalAppearances = varDsc->lvRefCnt(RCS_EARLY); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. question: how do you decide what is marked as const and what is not? Like why There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I try to mark all locals that won't be redefined as const, but am more likely to do this for lines I explicitly edit. |
||
|
||
varDsc->setLvRefCnt(0, RCS_EARLY); | ||
args->SetNode(lcl); | ||
assert(argEntry->GetNode() == lcl); | ||
// We don't have liveness so we rely on other indications of last use. | ||
// | ||
// We handle these cases: | ||
// | ||
// * (must not copy) If the call is a tail call, the use is a last use. | ||
// We must skip the copy if we have a fast tail call. | ||
// | ||
// * (must copy) However the current slow tail call helper always copies | ||
// the tail call args from the current frame, so we must copy | ||
// if the tail call is a slow tail call. | ||
// | ||
// * (may not copy) if the call is noreturn, the use is a last use. | ||
// | ||
// * (may not copy) if there is exactly one use of the local in the method, | ||
// and the call is not in loop, this is a last use. | ||
// | ||
const bool isTailCallLastUse = call->IsTailCall(); | ||
const bool isCallLastUse = (totalAppearances == 1) && !fgMightHaveLoop(); | ||
const bool isNoReturnLastUse = call->IsNoReturn(); | ||
if (!call->IsTailCallViaHelper() && (isTailCallLastUse || isCallLastUse || isNoReturnLastUse)) | ||
{ | ||
varDsc->setLvRefCnt(0, RCS_EARLY); | ||
args->SetNode(lcl); | ||
assert(argEntry->GetNode() == lcl); | ||
|
||
JITDUMP("did not have to make outgoing copy for V%2d", varNum); | ||
return; | ||
} | ||
JITDUMP("did not need to make outgoing copy for last use of implicit byref V%2d\n", varNum); | ||
return; | ||
} | ||
} | ||
} | ||
|
||
JITDUMP("making an outgoing copy for struct arg\n"); | ||
|
||
if (fgOutgoingArgTemps == nullptr) | ||
{ | ||
fgOutgoingArgTemps = hashBv::Create(this); | ||
|
@@ -6550,7 +6553,7 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result) | |
// | ||
// Windows Amd64: | ||
// A fast tail call can be made whenever the number of callee arguments | ||
// is larger than or equal to the number of caller arguments, or we have four | ||
// is less than or equal to the number of caller arguments, or we have four | ||
// or fewer callee arguments. This is because, on Windows AMD64, each | ||
// argument uses exactly one register or one 8-byte stack slot. Thus, we only | ||
// need to count arguments, and not be concerned with the size of each | ||
|
@@ -6688,12 +6691,30 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee, const char** failReason) | |
|
||
if (arg->isStruct) | ||
{ | ||
// Byref struct arguments are not allowed to fast tail call as the information | ||
// of the caller's stack is lost when the callee is compiled. | ||
if (arg->passedByRef) | ||
{ | ||
hasByrefParameter = true; | ||
break; | ||
// If we're optimizing, we may be able to pass our caller's byref to our callee, | ||
// and so still be able to tail call. | ||
GenTreeLclVarCommon* const lcl = arg->GetNode()->IsImplicitByrefParameterValue(this); | ||
|
||
// For this to work we must not have promoted the parameter; if we've promoted | ||
// then we'll make a local struct at the call. | ||
// | ||
// We could handle this if there was some way to copy back to the original struct. | ||
if (opts.OptimizationEnabled() && (lcl != nullptr) && !lvaGetDesc(lcl)->lvPromoted) | ||
{ | ||
// We can still tail call in this case, provided we | ||
// suppress the copy in fgMakeOutgoingStructArgCopy | ||
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u\n", dspTreeID(arg->GetNode()), | ||
lcl->GetLclNum()); | ||
} | ||
else | ||
{ | ||
// We can't tail call in this case; note the reason why | ||
// and skip looking at the rest of the args. | ||
hasByrefParameter = true; | ||
break; | ||
} | ||
} | ||
} | ||
} | ||
|
@@ -6748,17 +6769,18 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee, const char** failReason) | |
#endif // DEBUG | ||
}; | ||
|
||
// Note on vararg methods: | ||
// If the caller is vararg method, we don't know the number of arguments passed by caller's caller. | ||
// But we can be sure that in-coming arg area of vararg caller would be sufficient to hold its | ||
// fixed args. Therefore, we can allow a vararg method to fast tail call other methods as long as | ||
// out-going area required for callee is bounded by caller's fixed argument space. | ||
// | ||
// Note that callee being a vararg method is not a problem since we can account the params being passed. | ||
// | ||
// We will currently decide to not fast tail call on Windows armarch if the caller or callee is a vararg | ||
// method. This is due to the ABI differences for native vararg methods for these platforms. There is | ||
// work required to shuffle arguments to the correct locations. | ||
// Note on vararg methods: | ||
// If the caller is vararg method, we don't know the number of arguments passed by caller's caller. | ||
// But we can be sure that in-coming arg area of vararg caller would be sufficient to hold its | ||
// fixed args. Therefore, we can allow a vararg method to fast tail call other methods as long as | ||
// out-going area required for callee is bounded by caller's fixed argument space. | ||
// | ||
// Note that callee being a vararg method is not a problem since we can account the params being passed. | ||
// | ||
// We will currently decide to not fast tail call on Windows armarch if the caller or callee is a vararg | ||
// method. This is due to the ABI differences for native vararg methods for these platforms. There is | ||
// work required to shuffle arguments to the correct locations. | ||
CLANG_FORMAT_COMMENT_ANCHOR; | ||
|
||
#if (defined(TARGET_WINDOWS) && defined(TARGET_ARM)) || (defined(TARGET_WINDOWS) && defined(TARGET_ARM64)) | ||
if (info.compIsVarArgs || callee->IsVarargs()) | ||
|
@@ -6836,13 +6858,18 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call) | |
// It cannot be an inline candidate | ||
assert(!call->IsInlineCandidate()); | ||
|
||
auto failTailCall = [&](const char* reason) { | ||
auto failTailCall = [&](const char* reason, unsigned lclNum = BAD_VAR_NUM) { | ||
#ifdef DEBUG | ||
if (verbose) | ||
{ | ||
printf("\nRejecting tail call in morph for call "); | ||
printTreeID(call); | ||
printf(": %s\n", reason); | ||
printf(": %s", reason); | ||
if (lclNum != BAD_VAR_NUM) | ||
{ | ||
printf(" V%02u", lclNum); | ||
} | ||
printf("\n"); | ||
} | ||
#endif | ||
|
||
|
@@ -6946,9 +6973,9 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call) | |
// | ||
if (call->IsImplicitTailCall()) | ||
{ | ||
if (varDsc->lvHasLdAddrOp) | ||
if (varDsc->lvHasLdAddrOp && !lvaIsImplicitByRefLocal(varNum)) | ||
{ | ||
failTailCall("Local address taken"); | ||
failTailCall("Local address taken", varNum); | ||
return nullptr; | ||
} | ||
if (varDsc->lvAddrExposed) | ||
|
@@ -6971,20 +6998,20 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call) | |
} | ||
else | ||
{ | ||
failTailCall("Local address taken"); | ||
failTailCall("Local address taken", varNum); | ||
return nullptr; | ||
} | ||
} | ||
if (varDsc->lvPromoted && varDsc->lvIsParam && !lvaIsImplicitByRefLocal(varNum)) | ||
{ | ||
failTailCall("Has Struct Promoted Param"); | ||
failTailCall("Has Struct Promoted Param", varNum); | ||
return nullptr; | ||
} | ||
if (varDsc->lvPinned) | ||
{ | ||
// A tail call removes the method from the stack, which means the pinning | ||
// goes away for the callee. We can't allow that. | ||
failTailCall("Has Pinned Vars"); | ||
failTailCall("Has Pinned Vars", varNum); | ||
return nullptr; | ||
} | ||
} | ||
|
@@ -16828,8 +16855,23 @@ void Compiler::fgRetypeImplicitByRefArgs() | |
// through the pointer parameter, the same as we'd do for this | ||
// parameter if it weren't promoted at all (otherwise the initialization | ||
// of the new temp would just be a needless memcpy at method entry). | ||
bool undoPromotion = (lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) || | ||
(varDsc->lvRefCnt(RCS_EARLY) <= varDsc->lvFieldCnt); | ||
// | ||
// Otherwise, see how many appearances there are. We keep two early ref counts: total | ||
// number of references to the struct or some field, and how many of these are | ||
// arguments to calls. We undo promotion unless we see enougn non-call uses. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: |
||
// | ||
const unsigned totalAppearances = varDsc->lvRefCnt(RCS_EARLY); | ||
const unsigned callAppearances = varDsc->lvRefCntWtd(RCS_EARLY); | ||
assert(totalAppearances >= callAppearances); | ||
const unsigned nonCallAppearances = totalAppearances - callAppearances; | ||
|
||
bool undoPromotion = ((lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) || | ||
(nonCallAppearances <= varDsc->lvFieldCnt)); | ||
|
||
JITDUMP("%s promotion of implicit by-ref V%02u: %s total: %u non-call: %u fields: %u\n", | ||
undoPromotion ? "Undoing" : "Keeping", lclNum, | ||
(lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) ? "dependent;" : "", | ||
totalAppearances, nonCallAppearances, varDsc->lvFieldCnt); | ||
|
||
if (!undoPromotion) | ||
{ | ||
|
@@ -16863,7 +16905,7 @@ void Compiler::fgRetypeImplicitByRefArgs() | |
{ | ||
// Set the new parent. | ||
fieldVarDsc->lvParentLcl = newLclNum; | ||
// Clear the ref count field; it is used to communicate the nubmer of references | ||
// Clear the ref count field; it is used to communicate the number of references | ||
// to the implicit byref parameter when morphing calls that pass the implicit byref | ||
// out as an outgoing argument value, but that doesn't pertain to this field local | ||
// which is now a field of a non-arg local. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can change
GenTreeLclVarCommon
toGenTreeLclVar
, it is more precise for this case.