Arm64: Add If conversion pass #73472

a74nh · 2022-08-05T16:18:18Z

This patch builds upon the work in #71705

A single condition can be optimised by replacing a JTRUE
with a SELECT and removing the control flow.

For example:

if (op1 > 0) { op1 = 5; }
IN0001: 000008                    mov     w2, dotnet/perf-autofiling-issues#5
IN0002: 00000C                    cmp     w0, #0
IN0003: 000010                    csel    w0, w0, w2, eq

Additional JTRUE statements can be replaced by placing AND
chains onto the conditional part of a SELECT.

For example:

if (op1 > 3 && op2 != 10 && op3 < 7) { op1 = 5; }
IN0001: 000008                    cmp     w2, dotnet/runtime#56402
IN0002: 00000C                    ccmp    w1, dotnet/perf-autofiling-issues#10, z, lt
IN0003: 000010                    ccmp    w0, dotnet/perf-autofiling-issues#3, nzc, ne
IN0004: 000014                    mov     w2, dotnet/perf-autofiling-issues#5
IN0005: 000018                    csel    w0, w2, w0, gt

Performance tests are here: dotnet/performance#2517

ghost · 2022-08-05T16:18:33Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

This patch builds upon the work in #71705

A single condition can be optimised by replacing a JTRUE
with a SELECT and removing the control flow.

For example:

if (op1 > 0) { op1 = 5; }
IN0001: 000008                    mov     w2, dotnet/perf-autofiling-issues#5
IN0002: 00000C                    cmp     w0, #0
IN0003: 000010                    csel    w0, w0, w2, eq

Additional JTRUE statements can be replaced by placing AND
chains onto the conditional part of a SELECT.

For example:

if (op1 > 3 && op2 != 10 && op3 < 7) { op1 = 5; }
IN0001: 000008                    cmp     w2, dotnet/runtime#56402
IN0002: 00000C                    ccmp    w1, dotnet/perf-autofiling-issues#10, z, lt
IN0003: 000010                    ccmp    w0, dotnet/perf-autofiling-issues#3, nzc, ne
IN0004: 000014                    mov     w2, dotnet/perf-autofiling-issues#5
IN0005: 000018                    csel    w0, w2, w0, gt

Performance tests are here: dotnet/performance#2517

Author:	a74nh
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

a74nh · 2022-08-05T16:21:19Z

The first commit in this PR is a copy of #71705. Once that PR is merged I will remove that commit from this patch. This was just so I could get this out a little earlier.

@kunalspathak @jakobbotsch

Also, this probably needs an additional check in the optimizer pass to limit the max size of the chains.

a74nh · 2022-08-10T11:09:44Z

PR is now rebased on top of HEAD, and is good for a review. Taking this out of draft.

I do have the one failure being hit a few times in the testsuite - debugging this now. However, I'm not expecting that to make big changes to the patch.

I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is.

@kunalspathak @jakobbotsch

jakobbotsch

Most of the logic looks right on the surface to me, but I would like a review of the flow checks/changes from @AndyAyersMS.

The current implementation looks quite limited, seems like we could get more out of it, I've left a few comments about that. I'm ok with leaving that for future PRs.

The diffs look like size wise regressions, why is that? I would expect this pattern should give smaller code in most cases.

The code needs to be updated to follow our code style, in particular camelCase for variables and parentheses around all binary operations.

I'm away for two weeks from Monday. Getting this in before then feels unlikely. However, it'd be good to know how close it is.

Yes, I think it's unlikely. We will also need to do benchmarking and check regressions, plus run extended testing, and doing all of that before Friday seems unlikely.

src/coreclr/jit/gentree.cpp

src/coreclr/jit/optimizer.cpp

src/coreclr/jit/valuenum.cpp

src/coreclr/jit/assertionprop.cpp

kunalspathak

I agree with @jakobbotsch that this is unlikely to get merged by Friday provided code size diff is a concern and we want to see performance improvements on some small microbenchmarks. Our .NET 8 branch should open up in couple of weeks and we should merge this PR at that point. The previous related PRs still is giving us good code coverage and is part of .NET 7.

src/coreclr/jit/optimizer.cpp

AndyAyersMS · 2022-08-10T15:52:26Z

src/coreclr/jit/optimizer.cpp

+    if (block->getBBWeight(this) > BB_UNITY_WEIGHT)
+    {
+        return false;
+    }


For min/max reductions or sort inner loops seems like we might want to also if convert?

Also perhaps some idea of how costly it might be to evaluate the conditional input eagerly -- we are limited to just one tree so perhaps some cap on GetCostEx for that input.

@a74nh are you working on this? (Note that GetCostSz() below does not capture the fact that the optimization changes some code to be eagerly evaluated, as Andy points out)

I was avoiding using this as loops as csel can be slower inside loops. Specifically, it's slower when the loop iterator is being used as an input to the condition (therefore blocking it from executing). I'm not sure exactly how we would test for that.

src/coreclr/jit/optimizer.cpp

a74nh · 2022-08-11T08:58:12Z

The diffs look like size wise regressions, why is that? I would expect this pattern should give smaller code in most cases.

For all the cases where the code is longer that will be due to no longer using cbz/cbnz instructions:

cbnz x0, LABEL1
mov x1, 5
LABEL1:

now becomes:

cmp x0, 0
mov x2, 5
csel x1, x2, x1, eq

Which is one instruction longer. Assuming the branch taken/not taken is fairly random, then the new code will be faster. I should probably extend dotnet/performance#2517 to make sure those cases are being tested.

jakobbotsch · 2022-08-15T08:59:46Z

Assuming the branch taken/not taken is fairly random, then the new code will be faster.

I'm not sure this is a fair assumption in practice. Can you post some micro benchmarks for all cases (both randomly taken branches and biased branches)?

jakobbotsch · 2022-08-15T10:49:32Z

/azp run Fuzzlyn

azure-pipelines · 2022-08-15T10:49:46Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2022-10-28T15:59:25Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr gcstress

azure-pipelines · 2022-10-28T15:59:42Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2022-10-28T16:01:47Z

/azp run runtime-coreclr gcstress0x3-gcstress0xc

azure-pipelines · 2022-10-28T16:02:00Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2022-10-31T08:51:14Z

I'll take a look at the failures and see if they look related to this PR.

jakobbotsch · 2022-10-31T13:54:09Z

Most of the failures looked to be about installing the .NET SDK. I've rerun them, but if they fail again I will just merge, since this is an arm64 change I think we had sufficient coverage in the arm64 jobs that actually were able to run.

kunalspathak · 2022-10-31T14:55:24Z

Do you mind adding some disasm-checks tests as well?

a74nh · 2022-10-31T15:05:49Z

Do you mind adding some disasm-checks tests as well?

Sure, happy to do that. If @jakobbotsch plans on merging this, then I could add it as a new PR.

jakobbotsch · 2022-10-31T15:43:34Z

Sure, happy to do that. If @jakobbotsch plans on merging this, then I could add it as a new PR.

I'm ok with doing this in a follow-up PR (for example, @EgorBo and I are hoping for a follow up to handle ternaries that it could be added as part of :-) ). Does that sound fine to you @kunalspathak?

a74nh · 2022-10-31T16:22:58Z

I'm ok with doing this in a follow-up PR (for example, @EgorBo and I are hoping for a follow up to handle ternaries that it could be added as part of :-) ). Does that sound fine to you @kunalspathak?

That patch is sat ready awaiting this :) Quite a bit of refactoring, but all the changes are in the one function/file.

kunalspathak · 2022-10-31T16:25:00Z

Does that sound fine to you @kunalspathak?

Sounds good to me. I will go through the PR today (since it is awaiting my respond).

kunalspathak

LGTM. Thanks for the valuable feedback @jakobbotsch and thanks for your patience @a74nh .

kunalspathak · 2022-10-31T19:01:01Z

src/coreclr/jit/optimizer.cpp

+//-----------------------------------------------------------------------------
+// optIfConvert
+//
+// Find blocks representing simple if statements represented by conditional jumps


Thanks for the wonderful summary.

kunalspathak · 2022-10-31T19:05:54Z

src/coreclr/jit/optimizer.cpp

+{
+#ifndef TARGET_ARM64
+    return false;
+#else


Suggested change

#else

#endif

nit: No need of #else

Might give warnings due to unreachable code?
In any case, let's leave it for the follow-up PR so we don't need to rerun the CI.

kunalspathak · 2022-10-31T19:12:16Z

src/coreclr/jit/optimizer.cpp

+
+    // Reverse iterate through the blocks.
+    BasicBlock* block = fgLastBB;
+    while (block != nullptr)


Not for this PR, but just a thought - Is there a quick way to check if the block doesn't definitely have conditional in the end that can be optimized? I am trying to see if have information that we can do a quick check here to eliminate calls to optIfConvert() and doing analysis only to find out that block cannot be optimized with if-convert.

kunalspathak · 2022-10-31T19:23:21Z

System.Transactions.Tests.OleTxTests.ImplicitDistributedTransactions_cannot_be_changed_after_being_read_as_true seems from this PR?

superpmi-diff

Preliminerary scan seems an infra issue:

C:\h\w\AD3B0987\p\superpmi.py:600: DeprecationWarning: There is no current event loop

  loop = asyncio.get_event_loop()

Traceback (most recent call last):

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1720, in create_one_artifact

    with open(item_path, 'r') as file_handle:

FileNotFoundError: [Errno 2] No such file or directory: 'C:\\h\\w\\AD3B0987\\p\\artifacts\\spmi\\asm.coreclr_tests.run.Linux.arm64.checked\\base\\209719.dasm'



The above exception was the direct cause of the following exception:



Traceback (most recent call last):

  File "C:\h\w\AD3B0987\p\superpmi.py", line 4451, in <module>

    sys.exit(main(args))

  File "C:\h\w\AD3B0987\p\superpmi.py", line 4342, in main

    success = asm_diffs.replay_with_asm_diffs()

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1736, in replay_with_asm_diffs

    subproc_helper.run_to_completion(create_replay_artifacts, self, mch_file, asm_dotnet_vars_full_env, text_differences, base_asm_location, diff_asm_location, ".dasm")

  File "C:\h\w\AD3B0987\p\superpmi.py", line 601, in run_to_completion

    loop.run_until_complete(self.__run_to_completion__(async_callback, *extra_args))

  File "C:\python3\lib\asyncio\base_events.py", line 649, in run_until_complete

    return future.result()

  File "C:\h\w\AD3B0987\p\superpmi.py", line 584, in __run_to_completion__

    await asyncio.gather(*tasks)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 556, in __get_item__

    await async_callback(print_prefix, item, *extra_args)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1726, in create_replay_artifacts

    base_txt = await create_one_artifact(self.base_jit_path, base_location, flags + base_option_flags_for_diff_artifact)

  File "C:\h\w\AD3B0987\p\superpmi.py", line 1723, in create_one_artifact

    raise create_exception() from err

Exception: Failure while creating JitStdOutFile.

Exit code: 0

jakobbotsch · 2022-11-01T09:54:29Z

System.Transactions.Tests.OleTxTests.ImplicitDistributedTransactions_cannot_be_changed_after_being_read_as_true seems from this PR?

This one is #77241.

superpmi-diff

Preliminerary scan seems an infra issue:

Yeah, it's not related to this PR. It was fixed by #77504 but needed a new collection after that PR for the fix to kick in. It should work on a rerun now that we have new collections.

jakobbotsch · 2022-11-01T10:01:57Z

/azp run runtime-coreclr superpmi-diffs

azure-pipelines · 2022-11-01T10:02:14Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2022-11-01T12:40:12Z

Ah, well now superpmi-diffs conflicted with the JIT-EE GUID update in #77593 :-) ok, well this is ready anyway so I will merge it.
For posterity, the last fully successful diffs is at https://dev.azure.com/dnceng-public/public/_build/results?buildId=64480&view=ms.vss-build-web.run-extensions-tab.

Thanks for working with us on this @a74nh!

AndyAyersMS · 2022-11-03T16:22:12Z

Regressions:

Improvements:

[Perf] Linux/arm64: 6 Improvements on 11/1/2022 1:15:18 PM perf-autofiling-issues#9496
[Perf] Windows/arm64: 2 Improvements on 11/1/2022 1:15:18 PM perf-autofiling-issues#9477
[Perf] Linux/arm64: 10 Improvements on 11/1/2022 1:15:18 PM perf-autofiling-issues#9677

kunalspathak · 2022-11-07T17:24:19Z

Regressions:

[Perf] Windows/arm64: 5 Regressions on 11/1/2022 1:15:18 PM perf-autofiling-issues#9471

[Perf] Linux/arm64: 2 Regressions on 11/1/2022 1:15:18 PM perf-autofiling-issues#9489

[Perf] Windows/arm64: 4 Regressions on 10/27/2022 5:53:52 AM perf-autofiling-issues#9469

Improvements:

[Perf] Linux/arm64: 6 Improvements on 11/1/2022 1:15:18 PM perf-autofiling-issues#9496

[Perf] Windows/arm64: 2 Improvements on 11/1/2022 1:15:18 PM perf-autofiling-issues#9477

@AndyAyersMS - Did we ever transfer any of the regression issue to the runtime repo?

AndyAyersMS · 2022-11-07T18:01:38Z

No, I don't think any were transferred.

kunalspathak · 2022-11-07T18:04:39Z

#77984

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 5, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Aug 5, 2022

a74nh force-pushed the github_a74nh_csel_optimizer branch from 28201a3 to 7563441 Compare August 8, 2022 10:32

a74nh marked this pull request as draft August 8, 2022 16:13

a74nh force-pushed the github_a74nh_csel_optimizer branch from 7563441 to 931c48f Compare August 8, 2022 16:26

a74nh force-pushed the github_a74nh_csel_optimizer branch 2 times, most recently from c7edb88 to a2c0607 Compare August 10, 2022 08:14

a74nh marked this pull request as ready for review August 10, 2022 11:09

jakobbotsch reviewed Aug 10, 2022

View reviewed changes

SingleAccretion reviewed Aug 10, 2022

View reviewed changes

src/coreclr/jit/valuenum.cpp Outdated Show resolved Hide resolved

src/coreclr/jit/assertionprop.cpp Outdated Show resolved Hide resolved

kunalspathak requested changes Aug 10, 2022

View reviewed changes

ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Aug 10, 2022

AndyAyersMS reviewed Aug 10, 2022

View reviewed changes

jakobbotsch reviewed Aug 10, 2022

View reviewed changes

src/coreclr/jit/optimizer.cpp Outdated Show resolved Hide resolved

jakobbotsch reviewed Aug 10, 2022

View reviewed changes

src/coreclr/jit/optimizer.cpp Show resolved Hide resolved

JulieLeeMSFT assigned a74nh and kunalspathak Aug 11, 2022

kunalspathak added this to the 8.0.0 milestone Aug 11, 2022

build-analysis bot mentioned this pull request Oct 28, 2022

Test failure JIT\\HardwareIntrinsics\\General\\Vector256\\Vector256_r\\Vector256_r.cmd #76280

Closed

kunalspathak approved these changes Oct 31, 2022

View reviewed changes

jakobbotsch merged commit 33be29e into dotnet:main Nov 1, 2022

a74nh deleted the github_a74nh_csel_optimizer branch November 1, 2022 13:05

a74nh mentioned this pull request Nov 1, 2022

Add simple Else conditions to If Conversion #77728

Merged

EgorBo mentioned this pull request Nov 3, 2022

[Perf] Linux/arm64: 2 Regressions on 11/1/2022 1:15:18 PM dotnet/perf-autofiling-issues#9489

Closed

jakobbotsch mentioned this pull request Nov 4, 2022

Assertion failed '!"Should not see phi nodes after rationalize"' during 'Lowering nodeinfo' #77886

Closed

kunalspathak mentioned this pull request Nov 7, 2022

[Perf] Windows/arm64: Regressions in IfStatements #77984

Closed

EgorBo mentioned this pull request Nov 10, 2022

[Perf] Windows/arm64: 1 Regression on 11/1/2022 1:15:18 PM dotnet/perf-autofiling-issues#9464

Closed

ghost locked as resolved and limited conversation to collaborators Dec 10, 2022

Arm64: Add If conversion pass #73472

Arm64: Add If conversion pass #73472

Conversation

a74nh commented Aug 5, 2022

ghost commented Aug 5, 2022

a74nh commented Aug 5, 2022

a74nh commented Aug 10, 2022

jakobbotsch left a comment

Choose a reason for hiding this comment

kunalspathak left a comment

Choose a reason for hiding this comment

AndyAyersMS Aug 10, 2022

Choose a reason for hiding this comment

jakobbotsch Sep 30, 2022

Choose a reason for hiding this comment

a74nh Oct 5, 2022

Choose a reason for hiding this comment

a74nh commented Aug 11, 2022

jakobbotsch commented Aug 15, 2022

jakobbotsch commented Aug 15, 2022

azure-pipelines bot commented Aug 15, 2022

jakobbotsch commented Oct 28, 2022

azure-pipelines bot commented Oct 28, 2022

jakobbotsch commented Oct 28, 2022

azure-pipelines bot commented Oct 28, 2022

jakobbotsch commented Oct 31, 2022

jakobbotsch commented Oct 31, 2022

kunalspathak commented Oct 31, 2022

a74nh commented Oct 31, 2022

jakobbotsch commented Oct 31, 2022

a74nh commented Oct 31, 2022

kunalspathak commented Oct 31, 2022

kunalspathak left a comment

Choose a reason for hiding this comment

kunalspathak Oct 31, 2022

Choose a reason for hiding this comment

kunalspathak Oct 31, 2022

Choose a reason for hiding this comment

jakobbotsch Nov 1, 2022

Choose a reason for hiding this comment

kunalspathak Oct 31, 2022

Choose a reason for hiding this comment

kunalspathak commented Oct 31, 2022

jakobbotsch commented Nov 1, 2022 • edited Loading

jakobbotsch commented Nov 1, 2022

azure-pipelines bot commented Nov 1, 2022

jakobbotsch commented Nov 1, 2022

AndyAyersMS commented Nov 3, 2022 • edited by EgorBo Loading

kunalspathak commented Nov 7, 2022

AndyAyersMS commented Nov 7, 2022

kunalspathak commented Nov 7, 2022

jakobbotsch commented Nov 1, 2022 •

edited

Loading

AndyAyersMS commented Nov 3, 2022 •

edited by EgorBo

Loading