Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep precise argument sizes between import and morph. #43130

Merged
merged 1 commit into from
Nov 25, 2020

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Oct 7, 2020

On arm64 OSX we need to know precise argument sizes. I have added support for that between morph and lowering in #42503, this PR supports in between importation and morph.

When do we need it:

IL: load byte; call1(int) -> IR: CALL1(IND byte);
IL load int; call2(byte) -> IR: CALL2(IND int);

without the change, morph was seeing the incorrect type of the nodes but for other platforms it was not important (because all arguments were passed in TARGET_POINTER_SIZE slots).

I have stopped on the solution that adds a special GT_OPER to keep this information between these phases and a debug code in morph to check that it has survived phases between them (mostly inlining). Thanks @CarolEidt for the idea.
I want to enable this code for all platforms because:

  1. do not want to introduce platform dependency in HIR;
  2. it will make arm64 OSX work cheaper because we will catch bugs in this component on other platforms until we have arm64 OSX in ci;
  3. the overhead is small, it should not affect TP;

I was also considering other solutions, the main 2:

  1. add a new artificial structure to keep this information, map/dictionary, but
  • we can't increase call node size;
  • it will be an error-prone component from our previous experience with fgArgInfo;
  • it will have non-const access time or memory overhead, also non-trivial copying;
  1. change impImplicitIorI4Cast to insert cast not only between i4 and i8 but between all sizes, but
  • we don't need it if it is not under a call;
  • it will require changes during inlining to ignore such nodes;
  • it will be easy for such nodes to slip through morph phase;

No asm diffs (SPMI x64/x86). Update: ci shows related failures during the check phase, I will fix them soon, but I think it should not block review.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 7, 2020
@sandreenko sandreenko force-pushed the Arm64Apple-Jit-Part4 branch from 9d5efd5 to c986f65 Compare October 8, 2020 05:55
@sandreenko sandreenko force-pushed the Arm64Apple-Jit-Part4 branch 4 times, most recently from 0a0669b to 115e44c Compare October 12, 2020 07:02
@sandreenko sandreenko marked this pull request as ready for review October 12, 2020 07:03
@sandreenko sandreenko changed the title Add GenTreePutArgSmall. Keep precise argument sizes between import and morph. Oct 12, 2020

if (corType != CORINFO_TYPE_CLASS && corType != CORINFO_TYPE_BYREF && corType != CORINFO_TYPE_PTR &&
corType != CORINFO_TYPE_VAR && (argRealClass = info.compCompHnd->getArgClass(sig, argLst)) != nullptr)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we did not see a warning about an assignment inside if.

// load byte; call(int) -> CALL(PUTARG_TYPE byte(IND byte));
// load int; call(byte) -> CALL(PUTARG_TYPE int (IND int));
// etc.
if ((call->callSig != nullptr) && (Target::g_tgtArgOrder == Target::ARG_ORDER_R2L))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callSig == nullptr means mostly a helper call, all helper calls that I have checked are working with i4 and i8 types but I expect I will have to add more PUTARG_TYPE there once I run more tests on arm64 OSX and find bugs with missed type information.

@sandreenko
Copy link
Contributor Author

PTAL @CarolEidt @AndyAyersMS @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member

I'm puzzled why we need to do things this way -- doesn't the callee sig tell us what size the callee expects?

@sandreenko
Copy link
Contributor Author

I'm puzzled why we need to do things this way -- doesn't the callee sig tell us what size the callee expects?

We get this information from callee sig, but it is available only during importation and then we lose it. In debug we save it in

#ifdef DEBUG
    // Used to register callsites with the EE
    CORINFO_SIG_INFO* callSig;
#endif

if we decide to keep it in Release it will increase the size of all large nodes.

@AndyAyersMS
Copy link
Member

Seems like we could deal with that easily enough, we have various extensions already hanging off of calls. For instance we could refactor TailCallSiteInfo which already holds onto a CORINFO_SIG_INFO and use it more broadly for cases where we need the sig later in compilation.

Is there other benefit to having these extra IR nodes beyond being able to handle this ABI quirk?

@sandreenko
Copy link
Contributor Author

Is there other benefit to having these extra IR nodes beyond being able to handle this ABI quirk?

These extra nodes are only for the ABI quirk, I do not see benefits to other places from it. I was thinking to postpone casts creations but it was a long shot (if we pass 1 byte as 1 byte we don't need CAST int<-byte<-int).
I was using the signature in my prototype (from the debug field) to get the correct handle and there were some issues:

  1. no way to handle calls without a signature if we get such;
  2. not easy to match the signature to the actual argument nodes in morph. I am trying to do it for checking now, but with additional arguments from both sides of the list it is not trivial;
  3. need to do extra calls to strip(info.compCompHnd->getArgType(sig, sigArgs, &classHnd)) (minor);

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments and suggestions

src/coreclr/src/jit/flowgraph.cpp Show resolved Hide resolved
{
GenTree* effectiveVal = this;
for (;;)
{
assert(!effectiveVal->OperIs(GT_PUTARG_TYPE));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why you added this here - that is, I can see why we wouldn't expect to encounter these under a COMMA, but I'm not sure why this special assert is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have several similar functions like gtEffectiveVal, gtRetExprVal and gtSkipPutArgType. When we call each of them we often don't expect nodes that can be skipped by two others to be on top.
For example, when we call gtEffectiveVal we don't expect something like GT_RET_EXPR(GT_COMMA) or GT_PUTARG_TYPE(GT_COMMA), similar for gtRetExprVal it should not see GT_PUTARG_TYPE(GT_PUTARG_TYPE).
If we do it means the caller did not check for these nodes and we don't skip what we hoped to skip, it doesn't cause correctness issues in my understanding but introduces asm regressions when we can't parse something like we did before the change.

src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/importer.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
// After tail call transformations we could have more nodes in the list + special arguments +
// setup trees. I need to find a clean way to match this list with the original signature if
// is possible. If not we will be left without a check that PUTARG_TYPE survive as they should.
if (nodeArgsCount == sigArgsCount)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be !=? Otherwise, the loop below will never execute, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a temporary condition, the idea here is to match the signature with the call arguments as we have them in morph and check that we still have the correct types. However, it is tricky to do because we could have some nodes added to gtCallArgs between importation and call to this function (also this function can be called several times).
So we need to skip all additional nodes that were added and I have problems with identifying them. I am planning to fix it once I pass all the other tests.

src/coreclr/src/jit/rationalize.cpp Outdated Show resolved Hide resolved
@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor Author

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR was updated and is passing the tests now.
Thanks to @AndyAyersMS for helping me offline with the previous test failures.

The changes include #44790 that I hope to merge first.

This is the last big JIT change for arm64 apple, with it we will have precise arguments sizes from importer to codegen. The next changes will be on VM side for interop stubs.

{
InlArgInfo* inlCurArgInfo = &pInlineInfo->inlArgInfo[argNum];

inlCurArgInfo->argNode = curArgVal; // Save the original tree, with PUT_ARG and RET_EXPR.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change is that we save PUTARG_TYPE and RET_EXPR in inlCurArgInfo->argNode.

.maxstack 8
IL_0000: nop
IL_0001: ldarg.0
IL_0002: call void Runtime_43130::CallAMethod1(int16)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int32 is passed as int16 to Inline1 where it is passed to CallAMethod1, before the last changes the saved node was LCL_VAR int and we added another PUTARG_TYPE on top of an existing one, now it is fixed.

@@ -23820,7 +23848,8 @@ Statement* Compiler::fgInlinePrependStatements(InlineInfo* inlineInfo)

GenTree* argSingleUseNode = argInfo.argBashTmpNode;

if ((argSingleUseNode != nullptr) && !(argSingleUseNode->gtFlags & GTF_VAR_CLONED) && argIsSingleDef)
if ((argSingleUseNode != nullptr) && !(argSingleUseNode->gtFlags & GTF_VAR_CLONED) && argIsSingleDef &&
!argHasPutArg)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new condition allows us to always the argument as a temp with the correct type, we need it because for a correct direct substitution we need to know if the parent is a call or not but we don't have this information.
We create the node when we see an IL opcode like ldlocal and don't have parent information, later, when the node is used, we can't distinguish it from a regular LCL_VAR.
The change causes a small regression on all platforms:

windows x64 crossgen: 5134 (0.02% of base)
windows x64 pmi:  14911 (0.03% of base)
windows arm64 crossgen: 5688 (0.01% of base)
linux x64 crossgen: 9036 (0.02% of base)

, etc. all lower than 0.04%.

What happens there is:

  1. during importation we create a LCL_VAR tempForOurArg, save a pointer to it;
  2. in fgInsertInlineeBlocks we iterate over all such pointers and do in-place replacement if there was only one use of the lclVar, but we don't know the user, so we can't say if it is a call that need a PUTARG or not.

I have not found a simple way to avoid it or compensate for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a comment in the code here explaining why argHasPutArg disqualifies this arg from direct substitution.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left you some questions about this change.

@@ -5179,7 +5179,7 @@ void Compiler::fgObserveInlineConstants(OPCODE opcode, const FgStack& stack, boo
// Check for the double whammy of an incoming constant argument
// feeding a constant test.
unsigned varNum = FgStack::SlotTypeToArgNum(slot0);
if (impInlineInfo->inlArgInfo[varNum].argNode->OperIsConst())
if (impInlineInfo->inlArgInfo[varNum].argIsInvariant)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this change for correctness? If not perhaps you should PR it separately.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto for the other two below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not need this for correctness. I am planning to merge them in #44790 (comment), if we decide not to merge it I will need to change this code to use argNode->gtSkipPutArgType()->gtRetExprVal().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just approved #44790.

@@ -23820,7 +23848,8 @@ Statement* Compiler::fgInlinePrependStatements(InlineInfo* inlineInfo)

GenTree* argSingleUseNode = argInfo.argBashTmpNode;

if ((argSingleUseNode != nullptr) && !(argSingleUseNode->gtFlags & GTF_VAR_CLONED) && argIsSingleDef)
if ((argSingleUseNode != nullptr) && !(argSingleUseNode->gtFlags & GTF_VAR_CLONED) && argIsSingleDef &&
!argHasPutArg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have a comment in the code here explaining why argHasPutArg disqualifies this arg from direct substitution.

src/coreclr/src/jit/compiler.h Show resolved Hide resolved
src/coreclr/src/jit/flowgraph.cpp Outdated Show resolved Hide resolved
GenTree* argNode = inlArgInfo[argNum].argNode;
const bool argHasPutArg = argNode->OperIs(GT_PUTARG_TYPE);

unsigned __int64 bbFlags = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See note above; I don't believe you can set these to zero like this and pass jit stress.

src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
@sandreenko
Copy link
Contributor Author

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good to merge once you've updated for #44790 and perhaps added a comment indicating why PUTARG_TYPE can't be directly substituted.

Create `GT_PUTARG_TYPE` when signature type does not match node type.
Check in morph that this information has survived inlining and other phases between.
@sandreenko
Copy link
Contributor Author

I have checked that the diffs are still as expected before the merge, PR and jitstress are clear,
the only ci failure is due to #45168.

@sandreenko sandreenko merged commit 3e65d68 into dotnet:master Nov 25, 2020
@sandreenko
Copy link
Contributor Author

Thanks @AndyAyersMS and @CarolEidt for your help with this change.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants