Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hoist the nullchecks for 'this' object #68588

Merged
merged 4 commits into from
Apr 29, 2022

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Apr 27, 2022

Marking nullcheck around this as hoistable-candidate should be safe IMO.

image

@ghost ghost assigned kunalspathak Apr 27, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 27, 2022
@ghost
Copy link

ghost commented Apr 27, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Marking nullcheck around this as cse-candidate should be safe IMO.

image

Author: kunalspathak
Assignees: kunalspathak
Labels:

area-CodeGen-coreclr

Milestone: -

@@ -3684,6 +3684,14 @@ bool Compiler::optIsCSEcandidate(GenTree* tree)
case GT_RETURN:
return false; // Currently the only special nodes that we hit
// that we know that we don't want to CSE
case GT_NULLCHECK:
Copy link
Contributor

@SingleAccretion SingleAccretion Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NULLCHECK does not produce values and gets the "void" VN, so, how does this work at all...?

Edit: I see how it works, this actually just unblocks hoisting, not CSE. It would be better to special-case this in IsNodeHoistable.

Also, I don't think there is a need to check for this here, we can hoist any NULLCHECKs (it is always profitable, which is also where the cost function should be adjusted for these non-value nodes).

Copy link
Member Author

@kunalspathak kunalspathak Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optIsCSEcandidate() has 2 purposes.

  1. During loop hoisting we check if it will be handled by CSE and if yes, we hoist it out of the loop. During CSE phase, when we actually have to do the expression elimination. By telling that it is CSE candidate, it helps hoisting the check out of loop (because it is an invariant). While working on Hoist the invariants out of multi-level nested loops #68061, I realized cases where we would hoist the field address calculation ahead of null-check although there might not be a visible functional difference. But this PR fixes that to some extent at least for nullchecks around this.

image

  1. While doing actual CSE, we use optIsCSEcandidate(). However, right after that, we check if has any reserved VN and if yes, skip CSEing it.

ValueNum valueVN = vnStore->VNNormalValue(tree->GetVN(VNK_Liberal));
if (ValueNumStore::isReservedVN(valueVN) && (valueVN != ValueNumStore::VNForNull()))
{
continue;
}

Later, global propagation phase takes care of eliminating the null-check that is present inside the loop.

Copy link
Contributor

@SingleAccretion SingleAccretion Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Until now, it has been the case that a "hoistable tree" is (almost, modulo one struct pessimization) the same as "CSEable tree", and both must be values, as the design of hoisting is that we rely on CSE to "clean things up".

But if we start to hoist non-value trees (which is a perfectly valid thing to do), that's not really related to CSE, because: a) non-value trees will never be actual CSE candidates, b) the "cleanup" phase is assertion propagation.

So that's why I think it is confusing to put this check inside optIsCSECandidate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's why I think it is confusing to put this check inside optIsCSECandidate.

Sure, I can probably put this in IsNodeHoistable() which is explicitly for hoisting. In future, we can add more non-value trees in it that can be hoisting candidates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think based on the above discussion it would be good to add a comment in IsNodeHoistable saying that we also can hoist null checks since assertion prop deals with the redundant check left in the loop. It sort of ties loop hoisting together with assertion prop in the same way it is tied together with CSE.

Also, as @SingleAccretion pointed out, I think you can allow this in more places than just for 'this' pointer. We hoist a tree only if its children are invariant which should be safe enough in general for GT_NULLCHECK. Of course that assumes/hopes assertion prop can pick up the additional cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried experimenting with relaxing the hoisting for other object's nullchecks and there are some wins, but I see some regressions of this sort.

image

Basically, we overcome the number of variables hoisted out of the loop as compared to the available registers. I have plans to revisit these heuristics (specially for arm64) in or after #68061 and I would look into "other object's nullcheck hoisting" at that time.

      [000167] hoistable    tree cost too low: 3 < 6 (loopVarCount 6 >= availableRegCount 6)
   ... not profitable to hoist

it would be good to add a comment

If there are no other changes to be done, I would rather do it in my #68061

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new on the right? It looks like an improvement to me, both in size and perf score (the loop body is smaller and with fewer memory accesses). Or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I was hoping to also get lea too out of the loop and it was not (because of reasons I stated above). I reran the jit-analyze with PerfScore and I did some improvements. Let me push the changes that enables this for other objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I think there is still work to be done in hoisting, especially with regards to hoisting more invariant statements.

@kunalspathak kunalspathak changed the title CSE nullcheck involving 'this' Hoist the nullchecks for 'this' object Apr 27, 2022
@kunalspathak
Copy link
Member Author

Failures are related to dotnet/arcade#9208

@kunalspathak kunalspathak marked this pull request as ready for review April 27, 2022 23:37
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

1 similar comment
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

@DrewScoggins
Copy link
Member

Improvements: dotnet/perf-autofiling-issues#5074

@kunalspathak
Copy link
Member Author

kunalspathak commented May 5, 2022

Windows x64 improvements: dotnet/perf-autofiling-issues#5117
Ubuntu x64 improvements: dotnet/perf-autofiling-issues#5096

@ghost ghost locked as resolved and limited conversation to collaborators Jun 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants