-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64: Add If conversion pass #73472
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThis patch builds upon the work in #71705 A single condition can be optimised by replacing a JTRUE For example:
Additional JTRUE statements can be replaced by placing AND For example:
Performance tests are here: dotnet/performance#2517
|
The first commit in this PR is a copy of #71705. Once that PR is merged I will remove that commit from this patch. This was just so I could get this out a little earlier. Also, this probably needs an additional check in the optimizer pass to limit the max size of the chains. |
28201a3
to
7563441
Compare
7563441
to
931c48f
Compare
c7edb88
to
a2c0607
Compare
PR is now rebased on top of HEAD, and is good for a review. Taking this out of draft. I do have the one failure being hit a few times in the testsuite - debugging this now. However, I'm not expecting that to make big changes to the patch. I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the logic looks right on the surface to me, but I would like a review of the flow checks/changes from @AndyAyersMS.
The current implementation looks quite limited, seems like we could get more out of it, I've left a few comments about that. I'm ok with leaving that for future PRs.
The diffs look like size wise regressions, why is that? I would expect this pattern should give smaller code in most cases.
The code needs to be updated to follow our code style, in particular camelCase
for variables and parentheses around all binary operations.
I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is.
Yes, I think it's unlikely. We will also need to do benchmarking and check regressions, plus run extended testing, and doing all of that before Friday seems unlikely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @jakobbotsch that this is unlikely to get merged by Friday provided code size diff is a concern and we want to see performance improvements on some small microbenchmarks. Our .NET 8 branch should open up in couple of weeks and we should merge this PR at that point. The previous related PRs still is giving us good code coverage and is part of .NET 7.
if (block->getBBWeight(this) > BB_UNITY_WEIGHT) | ||
{ | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For min/max reductions or sort inner loops seems like we might want to also if convert?
Also perhaps some idea of how costly it might be to evaluate the conditional input eagerly -- we are limited to just one tree so perhaps some cap on GetCostEx
for that input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@a74nh are you working on this? (Note that GetCostSz()
below does not capture the fact that the optimization changes some code to be eagerly evaluated, as Andy points out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was avoiding using this as loops as csel can be slower inside loops. Specifically, it's slower when the loop iterator is being used as an input to the condition (therefore blocking it from executing). I'm not sure exactly how we would test for that.
For all the cases where the code is longer that will be due to no longer using
now becomes:
Which is one instruction longer. Assuming the branch taken/not taken is fairly random, then the new code will be faster. I should probably extend dotnet/performance#2517 to make sure those cases are being tested. |
I'm not sure this is a fair assumption in practice. Can you post some micro benchmarks for all cases (both randomly taken branches and biased branches)? |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress |
Azure Pipelines successfully started running 2 pipeline(s). |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
I'll take a look at the failures and see if they look related to this PR. |
Most of the failures looked to be about installing the .NET SDK. I've rerun them, but if they fail again I will just merge, since this is an arm64 change I think we had sufficient coverage in the arm64 jobs that actually were able to run. |
Do you mind adding some disasm-checks tests as well? |
Sure, happy to do that. If @jakobbotsch plans on merging this, then I could add it as a new PR. |
I'm ok with doing this in a follow-up PR (for example, @EgorBo and I are hoping for a follow up to handle ternaries that it could be added as part of :-) ). Does that sound fine to you @kunalspathak? |
That patch is sat ready awaiting this :) Quite a bit of refactoring, but all the changes are in the one function/file. |
Sounds good to me. I will go through the PR today (since it is awaiting my respond). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the valuable feedback @jakobbotsch and thanks for your patience @a74nh .
//----------------------------------------------------------------------------- | ||
// optIfConvert | ||
// | ||
// Find blocks representing simple if statements represented by conditional jumps |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the wonderful summary.
{ | ||
#ifndef TARGET_ARM64 | ||
return false; | ||
#else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#else | |
#endif |
nit: No need of #else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might give warnings due to unreachable code?
In any case, let's leave it for the follow-up PR so we don't need to rerun the CI.
|
||
// Reverse iterate through the blocks. | ||
BasicBlock* block = fgLastBB; | ||
while (block != nullptr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but just a thought - Is there a quick way to check if the block
doesn't definitely have conditional in the end that can be optimized? I am trying to see if have information that we can do a quick check here to eliminate calls to optIfConvert()
and doing analysis only to find out that block cannot be optimized with if-convert.
Preliminerary scan seems an infra issue:
|
This one is #77241.
Yeah, it's not related to this PR. It was fixed by #77504 but needed a new collection after that PR for the fix to kick in. It should work on a rerun now that we have new collections. |
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
Ah, well now superpmi-diffs conflicted with the JIT-EE GUID update in #77593 :-) ok, well this is ready anyway so I will merge it. Thanks for working with us on this @a74nh! |
@AndyAyersMS - Did we ever transfer any of the regression issue to the |
No, I don't think any were transferred. |
This patch builds upon the work in #71705
A single condition can be optimised by replacing a JTRUE
with a SELECT and removing the control flow.
For example:
Additional JTRUE statements can be replaced by placing AND
chains onto the conditional part of a SELECT.
For example:
Performance tests are here: dotnet/performance#2517