Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: enable tail calls and copy omission for implicit byref structs #33004

Merged
merged 9 commits into from
Mar 11, 2020
Merged
49 changes: 49 additions & 0 deletions src/coreclr/src/jit/gentree.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -16193,6 +16193,55 @@ bool GenTree::IsLocalAddrExpr(Compiler* comp, GenTreeLclVarCommon** pLclVarTree,
return false;
}

//------------------------------------------------------------------------
// IsImplicitByrefParameterValue: determine if this tree is the entire
// value of a local implicit byref parameter
//
// Arguments:
// compiler -- compiler instance
//
// Return Value:
// GenTreeLclVarCommon node for the local, or nullptr.

GenTreeLclVarCommon* GenTree::IsImplicitByrefParameterValue(Compiler* compiler)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can change GenTreeLclVarCommon to GenTreeLclVar, it is more precise for this case.

{
#if defined(TARGET_AMD64) || defined(TARGET_ARM64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be limited to windows amd64?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows, yes.

We seem to think arm64 can have these too. I don't know the ABI well enough to say one way or another.

#if defined(TARGET_AMD64) || defined(TARGET_ARM64)
unsigned char lvIsImplicitByRef : 1; // Set if the argument is an implicit byref.
#endif // defined(TARGET_AMD64) || defined(TARGET_ARM64)

Perhaps we need a define just for this and a cleanup of all references?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like windows arm64 uses the standard ARM64 EABI, so there are no implicit byrefs there. I'll put up a separate PR to clean this up here and throughout the rest of the jit codebase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that arm64 has implicit byrefs, but defining it for unix amd64 seems incorrect for sure.

Copy link
Contributor

@jashook jashook Mar 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> 16 byte structs on arm64 windows or linux I believe are passed by reference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see it now.. (from the EABI)

If the argument type is a Composite Type larger than 16 bytes, then the argument is copied to memory allocated by the caller, and the argument is replaced by a pointer to the copy.

So the right condition is windows x64, and all arm64.


GenTreeLclVarCommon* lcl = nullptr;

if (OperIs(GT_LCL_VAR))
{
lcl = AsLclVarCommon();
}
else if (OperIs(GT_OBJ))
{
GenTree* addr = AsIndir()->Addr();

if (addr->OperIs(GT_LCL_VAR))
{
lcl = addr->AsLclVarCommon();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this case, LCL_VAR is used as an address here (in other cases it is used as a value), so it has to be 'byref' or 'ref' type. Then how can it be lvaIsImplicitByRefLocal if it always should return false for TYP_IMPL size lclVars?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is potentially confusing.

The short answer is that before morph, arguments passed implicitly by-reference are represented by-value in the IR.
This gets changed in fgMorphImplicitByRefArgs where we make the by-reference aspect explicit. But the above helper gets called both before and after, and so needs to recognize cases where the local either represents a struct or a pointer to a struct.

Note the first two "match" cases here are pre-existing logic; all I did was move this to a helper method and add the last case.

}
else if (addr->OperIs(GT_ADDR))
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a semantic difference between OBJ(lcl) and OBJ(ADDR(lcl)) where lcl is an implicit byref parameter? Do they both represent a copy of the struct? When do we end up with one vs. the other?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this means changes over time, and we're straddling that change.

See above?

GenTree* base = addr->AsOp()->gtOp1;

if (base->OperIs(GT_LCL_VAR))
{
lcl = base->AsLclVarCommon();
}
}
}

if ((lcl != nullptr) && compiler->lvaIsImplicitByRefLocal(lcl->GetLclNum()))
{
return lcl;
}

#endif // defined(TARGET_AMD64) || defined(TARGET_ARM64)

return nullptr;
}

//------------------------------------------------------------------------
// IsLclVarUpdateTree: Determine whether this is an assignment tree of the
// form Vn = Vn 'oper' 'otherTree' where Vn is a lclVar
Expand Down
4 changes: 4 additions & 0 deletions src/coreclr/src/jit/gentree.h
Original file line number Diff line number Diff line change
Expand Up @@ -1841,6 +1841,10 @@ struct GenTree
// yields an address into a local
GenTreeLclVarCommon* IsLocalAddrExpr();

// Determine if this tree represents the value of an entire implict byref parameter,
// and if so return the tree for the parameter.
GenTreeLclVarCommon* IsImplicitByrefParameterValue(Compiler* compiler);

// Determine if this is a LclVarCommon node and return some additional info about it in the
// two out parameters.
bool IsLocalExpr(Compiler* comp, GenTreeLclVarCommon** pLclVarTree, FieldSeqNode** pFldSeq);
Expand Down
61 changes: 59 additions & 2 deletions src/coreclr/src/jit/lclmorph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1097,10 +1097,67 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor>
{
return;
}

LclVarDsc* varDsc = m_compiler->lvaGetDesc(lclNum);
JITDUMP("LocalAddressVisitor incrementing ref count from %d to %d for V%02d\n", varDsc->lvRefCnt(RCS_EARLY),
varDsc->lvRefCnt(RCS_EARLY) + 1, lclNum);
JITDUMP("LocalAddressVisitor incrementing ref count from %d to %d for implict by-ref V%02d\n",
varDsc->lvRefCnt(RCS_EARLY), varDsc->lvRefCnt(RCS_EARLY) + 1, lclNum);
varDsc->incLvRefCnt(1, RCS_EARLY);

// See if this struct is an argument to a call. This information is recorded
// via the weighted early ref count for the local, and feeds the undo promition
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
// heuristic.
//
// It can be approximate, so the pattern match below need not be exhaustive.
// But the pattern should at least subset the implicit byref cases that are
// handed in fgCanFastTailCall and fgMakeOutgoingStructArgCopy.
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
//
// CALL(OBJ(ADDR(LCL_VAR...)))
bool isArgToCall = false;
bool keepSearching = true;
for (int i = 0; i < m_ancestors.Height() && keepSearching; i++)
{
GenTree* node = m_ancestors.Top(i);
switch (i)
{
case 0:
{
keepSearching = node->OperIs(GT_LCL_VAR);
}
break;

case 1:
{
keepSearching = node->OperIs(GT_ADDR);
}
break;

case 2:
{
keepSearching = node->OperIs(GT_OBJ);
}
break;

case 3:
{
keepSearching = false;
isArgToCall = node->IsCall();
}
break;
default:
{
keepSearching = false;
}
break;
}
}

if (isArgToCall)
{
JITDUMP("LocalAddressVisitor incrementing weighted ref count from %d to %d"
" for implict by-ref V%02d arg passed to call\n",
varDsc->lvRefCntWtd(RCS_EARLY), varDsc->lvRefCntWtd(RCS_EARLY) + 1, lclNum);
varDsc->incLvRefCntWtd(1, RCS_EARLY);
}
}
};

Expand Down
152 changes: 97 additions & 55 deletions src/coreclr/src/jit/morph.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4796,46 +4796,49 @@ void Compiler::fgMakeOutgoingStructArgCopy(GenTreeCall* call,
//
// We don't need a copy if this is the last use of an implicit by-ref local.
//
// We can't determine that all of the time, but if there is only
// one use and the method has no loops, then this use must be the last.
if (opts.OptimizationEnabled())
{
GenTreeLclVarCommon* lcl = nullptr;

if (argx->OperIsLocal())
{
lcl = argx->AsLclVarCommon();
}
else if ((argx->OperGet() == GT_OBJ) && argx->AsIndir()->Addr()->OperIsLocal())
{
lcl = argx->AsObj()->Addr()->AsLclVarCommon();
}
GenTreeLclVarCommon* const lcl = argx->IsImplicitByrefParameterValue(this);

if (lcl != nullptr)
{
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum();
if (lvaIsImplicitByRefLocal(varNum))
{
LclVarDsc* varDsc = &lvaTable[varNum];
// JIT_TailCall helper has an implicit assumption that all tail call arguments live
// on the caller's frame. If an argument lives on the caller caller's frame, it may get
// overwritten if that frame is reused for the tail call. Therefore, we should always copy
// struct parameters if they are passed as arguments to a tail call.
if (!call->IsTailCallViaHelper() && (varDsc->lvRefCnt(RCS_EARLY) == 1) && !fgMightHaveLoop())
{
assert(!call->IsTailCall());
unsigned varNum = lcl->AsLclVarCommon()->GetLclNum();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for AsLclVarCommon.

LclVarDsc* varDsc = lvaGetDesc(varNum);
const unsigned short totalAppearances = varDsc->lvRefCnt(RCS_EARLY);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: how do you decide what is marked as const and what is not? Like why varNum is not in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I try to mark all locals that won't be redefined as const, but am more likely to do this for lines I explicitly edit.


varDsc->setLvRefCnt(0, RCS_EARLY);
args->SetNode(lcl);
assert(argEntry->GetNode() == lcl);
// We don't have liveness so we rely on other indications of last use.
//
// We handle these cases:
//
// * (must not copy) If the call is a tail call, the use is a last use.
// We must skip the copy if we have a fast tail call.
//
// * (must copy) However the current slow tail call helper always copies
// the tail call args from the current frame, so we must copy
// if the tail call is a slow tail call.
//
// * (may not copy) if the call is noreturn, the use is a last use.
//
// * (may not copy) if there is exactly one use of the local in the method,
// and the call is not in loop, this is a last use.
//
const bool isTailCallLastUse = call->IsTailCall();
const bool isCallLastUse = (totalAppearances == 1) && !fgMightHaveLoop();
const bool isNoReturnLastUse = call->IsNoReturn();
if (!call->IsTailCallViaHelper() && (isTailCallLastUse || isCallLastUse || isNoReturnLastUse))
{
varDsc->setLvRefCnt(0, RCS_EARLY);
args->SetNode(lcl);
assert(argEntry->GetNode() == lcl);

JITDUMP("did not have to make outgoing copy for V%2d", varNum);
return;
}
JITDUMP("did not need to make outgoing copy for last use of implicit byref V%2d\n", varNum);
return;
}
}
}

JITDUMP("making an outgoing copy for struct arg\n");

if (fgOutgoingArgTemps == nullptr)
{
fgOutgoingArgTemps = hashBv::Create(this);
Expand Down Expand Up @@ -6550,7 +6553,7 @@ void Compiler::fgMorphCallInlineHelper(GenTreeCall* call, InlineResult* result)
//
// Windows Amd64:
// A fast tail call can be made whenever the number of callee arguments
// is larger than or equal to the number of caller arguments, or we have four
// is less than or equal to the number of caller arguments, or we have four
// or fewer callee arguments. This is because, on Windows AMD64, each
// argument uses exactly one register or one 8-byte stack slot. Thus, we only
// need to count arguments, and not be concerned with the size of each
Expand Down Expand Up @@ -6688,12 +6691,30 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee, const char** failReason)

if (arg->isStruct)
{
// Byref struct arguments are not allowed to fast tail call as the information
// of the caller's stack is lost when the callee is compiled.
if (arg->passedByRef)
{
hasByrefParameter = true;
break;
// If we're optimizing, we may be able to pass our caller's byref to our callee,
// and so still be able to tail call.
GenTreeLclVarCommon* const lcl = arg->GetNode()->IsImplicitByrefParameterValue(this);

// For this to work we must not have promoted the parameter; if we've promoted
// then we'll make a local struct at the call.
//
// We could handle this if there was some way to copy back to the original struct.
if (opts.OptimizationEnabled() && (lcl != nullptr) && !lvaGetDesc(lcl)->lvPromoted)
{
// We can still tail call in this case, provided we
// suppress the copy in fgMakeOutgoingStructArgCopy
JITDUMP("Arg [%06u] is unpromoted implicit byref V%02u\n", dspTreeID(arg->GetNode()),
lcl->GetLclNum());
}
else
{
// We can't tail call in this case; note the reason why
// and skip looking at the rest of the args.
hasByrefParameter = true;
break;
}
}
}
}
Expand Down Expand Up @@ -6748,17 +6769,18 @@ bool Compiler::fgCanFastTailCall(GenTreeCall* callee, const char** failReason)
#endif // DEBUG
};

// Note on vararg methods:
// If the caller is vararg method, we don't know the number of arguments passed by caller's caller.
// But we can be sure that in-coming arg area of vararg caller would be sufficient to hold its
// fixed args. Therefore, we can allow a vararg method to fast tail call other methods as long as
// out-going area required for callee is bounded by caller's fixed argument space.
//
// Note that callee being a vararg method is not a problem since we can account the params being passed.
//
// We will currently decide to not fast tail call on Windows armarch if the caller or callee is a vararg
// method. This is due to the ABI differences for native vararg methods for these platforms. There is
// work required to shuffle arguments to the correct locations.
// Note on vararg methods:
// If the caller is vararg method, we don't know the number of arguments passed by caller's caller.
// But we can be sure that in-coming arg area of vararg caller would be sufficient to hold its
// fixed args. Therefore, we can allow a vararg method to fast tail call other methods as long as
// out-going area required for callee is bounded by caller's fixed argument space.
//
// Note that callee being a vararg method is not a problem since we can account the params being passed.
//
// We will currently decide to not fast tail call on Windows armarch if the caller or callee is a vararg
// method. This is due to the ABI differences for native vararg methods for these platforms. There is
// work required to shuffle arguments to the correct locations.
CLANG_FORMAT_COMMENT_ANCHOR;

#if (defined(TARGET_WINDOWS) && defined(TARGET_ARM)) || (defined(TARGET_WINDOWS) && defined(TARGET_ARM64))
if (info.compIsVarArgs || callee->IsVarargs())
Expand Down Expand Up @@ -6836,13 +6858,18 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call)
// It cannot be an inline candidate
assert(!call->IsInlineCandidate());

auto failTailCall = [&](const char* reason) {
auto failTailCall = [&](const char* reason, unsigned lclNum = BAD_VAR_NUM) {
#ifdef DEBUG
if (verbose)
{
printf("\nRejecting tail call in morph for call ");
printTreeID(call);
printf(": %s\n", reason);
printf(": %s", reason);
if (lclNum != BAD_VAR_NUM)
{
printf(" V%02u", lclNum);
}
printf("\n");
}
#endif

Expand Down Expand Up @@ -6946,9 +6973,9 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call)
//
if (call->IsImplicitTailCall())
{
if (varDsc->lvHasLdAddrOp)
if (varDsc->lvHasLdAddrOp && !lvaIsImplicitByRefLocal(varNum))
{
failTailCall("Local address taken");
failTailCall("Local address taken", varNum);
return nullptr;
}
if (varDsc->lvAddrExposed)
Expand All @@ -6971,20 +6998,20 @@ GenTree* Compiler::fgMorphPotentialTailCall(GenTreeCall* call)
}
else
{
failTailCall("Local address taken");
failTailCall("Local address taken", varNum);
return nullptr;
}
}
if (varDsc->lvPromoted && varDsc->lvIsParam && !lvaIsImplicitByRefLocal(varNum))
{
failTailCall("Has Struct Promoted Param");
failTailCall("Has Struct Promoted Param", varNum);
return nullptr;
}
if (varDsc->lvPinned)
{
// A tail call removes the method from the stack, which means the pinning
// goes away for the callee. We can't allow that.
failTailCall("Has Pinned Vars");
failTailCall("Has Pinned Vars", varNum);
return nullptr;
}
}
Expand Down Expand Up @@ -16828,8 +16855,23 @@ void Compiler::fgRetypeImplicitByRefArgs()
// through the pointer parameter, the same as we'd do for this
// parameter if it weren't promoted at all (otherwise the initialization
// of the new temp would just be a needless memcpy at method entry).
bool undoPromotion = (lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) ||
(varDsc->lvRefCnt(RCS_EARLY) <= varDsc->lvFieldCnt);
//
// Otherwise, see how many appearances there are. We keep two early ref counts: total
// number of references to the struct or some field, and how many of these are
// arguments to calls. We undo promotion unless we see enougn non-call uses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: enougn -> enough

//
const unsigned totalAppearances = varDsc->lvRefCnt(RCS_EARLY);
const unsigned callAppearances = varDsc->lvRefCntWtd(RCS_EARLY);
assert(totalAppearances >= callAppearances);
const unsigned nonCallAppearances = totalAppearances - callAppearances;

bool undoPromotion = ((lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) ||
(nonCallAppearances <= varDsc->lvFieldCnt));

JITDUMP("%s promotion of implicit by-ref V%02u: %s total: %u non-call: %u fields: %u\n",
undoPromotion ? "Undoing" : "Keeping", lclNum,
(lvaGetPromotionType(newVarDsc) == PROMOTION_TYPE_DEPENDENT) ? "dependent;" : "",
totalAppearances, nonCallAppearances, varDsc->lvFieldCnt);

if (!undoPromotion)
{
Expand Down Expand Up @@ -16863,7 +16905,7 @@ void Compiler::fgRetypeImplicitByRefArgs()
{
// Set the new parent.
fieldVarDsc->lvParentLcl = newLclNum;
// Clear the ref count field; it is used to communicate the nubmer of references
// Clear the ref count field; it is used to communicate the number of references
// to the implicit byref parameter when morphing calls that pass the implicit byref
// out as an outgoing argument value, but that doesn't pertain to this field local
// which is now a field of a non-arg local.
Expand Down
Loading