JIT: ARM64 SVE format encodings, `SVE_GP_3A` to `SVE_HV_4A` #98141

TIHan · 2024-02-08T01:37:53Z

Contributes to #94549

Adds formats:

SVE_GP_3A
SVE_HI_3A
SVE_HM_2A
SVE_HN_2A
SVE_HP_3A
SVE_HU_4B
SVE_HV_4A

[Screenshot: disassembly comparison. Left: Capstone, right: JIT]

You will notice that instructions in format SVE_HM_2A, such as fadd and fsub, show a different constant value in Capstone than in the JIT. This is a Capstone bug: it displays the wrong constant.

To show that it is a Capstone bug, take the following example:

    theEmitter->emitIns_R_R_F(INS_sve_fadd, EA_SCALABLE, REG_V0, REG_P0, 0.5,
                              INS_OPTS_SCALABLE_H); // FADD    <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

This example is based on an LLVM test:

    // CHECK-INST: fadd    z0.h, p0/m, z0.h, #0.5
    // CHECK-ENCODING: [0x00,0x80,0x58,0x65]

JIT output:

    fadd z0.h, p0/m, z0.h, #0.5000

Capstone output:

    00805865 fadd z0.h, p0/m, z0.h, #0.0

Notice that Capstone shows #0.0 instead of the expected #0.5, even though the encoded bytes match LLVM's CHECK-ENCODING.
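The disagreement can be checked by hand from the bytes. The sketch below assumes the Arm A64 layout of the SVE predicated floating-point arithmetic-with-immediate class (`01100101 size 011 opc 100 Pg 0000 i1 Zdn`), where `#<const>` is the single bit `i1` (bit 5) and its meaning depends on `opc`. All helper names are hypothetical; they are not JIT or Capstone APIs.

```cpp
#include <cstdint>

// Hypothetical helpers for the SVE predicated FP arithmetic-with-immediate
// class, assumed layout:
//   31-24: 01100101  23-22: size  21-19: 011  18-16: opc
//   15-13: 100       12-10: Pg    9-6: 0000   5: i1   4-0: Zdn

// Combine the little-endian bytes from a CHECK-ENCODING line into one word.
constexpr uint32_t wordFromLittleEndian(uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
    return uint32_t(b0) | (uint32_t(b1) << 8) | (uint32_t(b2) << 16) | (uint32_t(b3) << 24);
}

// True if every fixed bit of the class matches.
constexpr bool isSveFpImmPredicated(uint32_t word)
{
    return (word & 0xFF38E3C0u) == 0x65188000u;
}

constexpr uint32_t sizeField(uint32_t word) { return (word >> 22) & 0x3; } // 01 = .H
constexpr uint32_t opcField(uint32_t word)  { return (word >> 16) & 0x7; }
constexpr uint32_t pgField(uint32_t word)   { return (word >> 10) & 0x7; }
constexpr uint32_t zdnField(uint32_t word)  { return word & 0x1F; }

// Map i1 (bit 5) to the constant it selects for the given opc.
constexpr double decodeSveFpImmConst(uint32_t word)
{
    const bool i1 = ((word >> 5) & 0x1) != 0;
    switch (opcField(word))
    {
        case 0: // FADD
        case 1: // FSUB
        case 3: // FSUBR
            return i1 ? 1.0 : 0.5;
        case 2: // FMUL
            return i1 ? 2.0 : 0.5;
        default: // FMAXNM, FMINNM, FMAX, FMIN
            return i1 ? 1.0 : 0.0;
    }
}
```

For the LLVM bytes above, the word is 0x65588000: `size` is 01 (.H), `opc` is 000 (FADD), P0, Z0, and `i1` is 0, which selects #0.5. Under this layout, #0.0 is not a legal constant for FADD at all.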

ghost · 2024-02-08T01:43:01Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	TIHan
Assignees:	TIHan
Labels:	`area-CodeGen-coreclr`
Milestone:	-

TIHan · 2024-02-08T01:57:44Z

@dotnet/arm64-contrib @dotnet/jit-contrib @kunalspathak @a74nh this is ready.

ryujit-bot · 2024-02-08T06:12:29Z

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
benchmarks.run.linux.arm64.checked.mch	+0.01%
benchmarks.run_tiered.linux.arm64.checked.mch	+0.01%
coreclr_tests.run.linux.arm64.checked.mch	+0.01%
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
libraries.pmi.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)

Collection	PDIFF
benchmarks.run.osx.arm64.checked.mch	+0.01%
benchmarks.run_pgo.osx.arm64.checked.mch	+0.01%
benchmarks.run_tiered.osx.arm64.checked.mch	+0.01%
coreclr_tests.run.osx.arm64.checked.mch	+0.01%
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
libraries.pmi.osx.arm64.checked.mch	+0.01%
realworld.run.osx.arm64.checked.mch	+0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
benchmarks.run.windows.arm64.checked.mch	+0.01%
benchmarks.run_pgo.windows.arm64.checked.mch	+0.01%
benchmarks.run_tiered.windows.arm64.checked.mch	+0.01%
coreclr_tests.run.windows.arm64.checked.mch	+0.01%
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%
realworld.run.windows.arm64.checked.mch	+0.01%

a74nh

Not sure about the encodings of the imms. Everything else LGTM

a74nh · 2024-02-08T10:58:50Z

src/coreclr/jit/emitarm64.cpp

+
+            if (immDbl != 0.0)
+            {
+                fpi.immFPIVal = 0;


For all of these the float value is one of two values. It feels heavyweight to fully encode the float. Also I'm not sure if that requires a larger instrDesc to fit in the immediate.

If it does, we could instead continue to overload _idRegBit, adding an instrDesc::idImmBit() function, and set the bit the same way it is set in the instruction encoding.

Then all the encoding and displaying becomes simpler too.

This could also be used for the group that has a 90 or 270 immediate.

I'm not super happy about it, but this is how we handle storing immediate floats in instrDesc. The current encode and decode functions are complicated, but using them isn't so bad.

For the 90 and 270 values, they are just like any other immediate value so I don't think we need to change that.

Also I think here, we should encode them as 0, 1, etc. Related: #98187 (comment)

We could; it doesn't matter much, since we have to decode them at display time anyway. But since we already have encode/decode for immediate floats, it won't be much different from what is already there.
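A minimal sketch of the "store a small index, decode at display time" idea, assuming rotations are stored as `degrees / 90` (0-3) and a two-valued float constant as 0 or 1. All names are hypothetical; the helpers the PR actually added may differ.

```cpp
#include <cstdint>

// Hypothetical helpers: keep small immediates in the instrDesc as compact
// indices and expand them only when encoding or displaying the instruction.

// Rotation 0, 90, 180, 270 -> index 0..3. Returns -1 for anything else.
constexpr int64_t encodeRotationIndex(int64_t degrees)
{
    return (degrees >= 0 && degrees <= 270 && degrees % 90 == 0) ? degrees / 90 : -1;
}

// Index 0..3 -> rotation in degrees.
constexpr int64_t decodeRotationIndex(int64_t index)
{
    return index * 90;
}

// Two-valued float immediate -> 0 (first legal value) or 1 (second legal value).
// Returns -1 if the value is neither legal constant.
constexpr int64_t encodeTwoValueFpImm(double value, double first, double second)
{
    return value == first ? 0 : (value == second ? 1 : -1);
}

// Index 0/1 -> the corresponding legal constant.
constexpr double decodeTwoValueFpImm(int64_t index, double first, double second)
{
    return index == 0 ? first : second;
}
```

For FADD, the pair would be (0.5, 1.0), so `encodeTwoValueFpImm(0.5, 0.5, 1.0)` stores 0; the display path decodes it back to 0.5 without a general float encoder.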

I am wondering why we need special handling for immDbl == 0.0. Doesn't canEncodeFloatImm8() handle it?

It doesn't handle 0, which is why I had to do it myself. However, since we are going to encode the rotation values as 0-3, I guess we should do the same here.
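Why 0.0 falls outside an 8-bit float immediate can be shown directly. The A64 8-bit floating-point immediate (the format of `FMOV (immediate)`, expanded by the Arm pseudocode `VFPExpandImm`) represents only ±(16..31)/16 * 2^n for n in -3..4, so zero has no encoding. The sketch below brute-forces all 256 encodings; `isFpImm8Encodable` is a hypothetical stand-in and is assumed, not confirmed, to match what `canEncodeFloatImm8()` checks.

```cpp
#include <cstdint>

// Expand imm8 = a:b:c:d:e:f:g:h as in VFPExpandImm:
// sign = a, unbiased exponent = b ? (cd - 3) : (cd + 1), fraction = 1 + efgh/16.
constexpr double expandFpImm8(uint8_t imm8)
{
    const int sign = (imm8 >> 7) & 1;
    const int b = (imm8 >> 6) & 1;
    const int cd = (imm8 >> 4) & 3;
    const int efgh = imm8 & 0xF;
    const int exponent = b ? (cd - 3) : (cd + 1); // range -3..4

    double value = (16.0 + efgh) / 16.0;          // range 1.0..1.9375
    for (int i = 0; i < exponent; i++)
        value *= 2.0;                             // exact: powers of two
    for (int i = 0; i > exponent; i--)
        value /= 2.0;
    return sign ? -value : value;
}

// Hypothetical check: is value exactly one of the 256 expansions?
constexpr bool isFpImm8Encodable(double value)
{
    for (int imm8 = 0; imm8 < 256; imm8++)
    {
        if (expandFpImm8(static_cast<uint8_t>(imm8)) == value)
            return true;
    }
    return false;
}
```

The smallest magnitude is 0.125 and the largest is 31.0, so a check built on this format rejects 0.0, which is why the zero case needed its own branch.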

a74nh · 2024-02-08T11:17:50Z

You will notice that the instructions from format SVE_HM_2A, such as fadd, fsub, etc. show differences of constant value between capstone and the JIT. This is a capstone bug as it is incorrectly displaying the wrong constant.

Agreed. Capstone is showing values that are not valid for that instruction.

It can't be your code overflowing, as insEncodeSveFloatImmZero_to_Two() explicitly sets only a single bit or nothing. Plus, the next bits are hardcoded to 0, and Capstone would error if those bits were set.

Looks like Capstone is just interpreting the single immediate bit incorrectly.
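The same argument can be checked from the encoder side. This sketch assumes the field layout implied by the LLVM example (`01100101 size 011 opc 100 Pg 0000 i1 Zdn`); `encodeSveFpImmPredicated` is a hypothetical helper, not the JIT's emitter. Toggling the constant changes only bit 5, so a correct encoder cannot spill into the hardcoded-zero bits 9-6.

```cpp
#include <cstdint>

// Hypothetical encoder for the SVE predicated FP arithmetic-with-immediate
// class (the SVE_HM_2A group). The constant occupies only bit 5 (i1).
constexpr uint32_t encodeSveFpImmPredicated(uint32_t opc, uint32_t size, uint32_t pg, uint32_t zdn, bool i1)
{
    return 0x65188000u             // fixed bits of the class
           | ((size & 0x3) << 22)  // element size: 01 = .H, 10 = .S, 11 = .D
           | ((opc & 0x7) << 16)   // 000 = FADD, 001 = FSUB, ...
           | ((pg & 0x7) << 10)    // governing predicate P0-P7
           | (i1 ? (1u << 5) : 0u) // the only bit the constant can touch
           | (zdn & 0x1F);         // Zdn: both source and destination
}
```

With `opc = 000`, `size = 01`, P0 and Z0 this yields 0x65588000, the LLVM CHECK-ENCODING word for #0.5; setting `i1` yields 0x65588020 (#1.0).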

How recent is the Capstone branch we are working from? If the issue is still present in HEAD, I think it would be worth raising a bug with the Capstone project.

TIHan · 2024-02-08T21:01:19Z

How recent is the Capstone branch we are working from?

It is from early January; I doubt this issue has been fixed. I checked their repo and found no reports about it. I agree it would be good to report it.

kunalspathak

With the encode change I suggested, can you double-check the difference between Capstone and the JIT?

src/coreclr/jit/emitarm64.cpp

TIHan · 2024-02-09T20:50:46Z

@kunalspathak this is ready. I added the encodings that you suggested for the rotation values and the float constants.

@amanasifkhalid This will have the encodings for the rotation values for you to use.

ryujit-bot · 2024-02-09T22:17:06Z

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on linux/x64

MinOpts (+0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
realworld.run.osx.arm64.checked.mch	+0.01%

Throughput diffs for windows/arm64 ran on linux/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%

kunalspathak

LGTM. Thanks!

src/coreclr/jit/emitarm64.h

ryujit-bot · 2024-02-09T23:17:19Z

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
realworld.run.osx.arm64.checked.mch	+0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%

TIHan · 2024-02-10T00:12:20Z

Merging this now. The only change since approval was formatting; everything was passing before, and the build succeeds with the formatting change.

ryujit-bot · 2024-02-10T01:17:40Z

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.linux.arm64.checked.mch	+0.01%
libraries.pmi.linux.arm64.checked.mch	+0.01%
realworld.run.linux.arm64.checked.mch	+0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.osx.arm64.checked.mch	+0.01%
libraries.pmi.osx.arm64.checked.mch	+0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

Collection	PDIFF
libraries.crossgen2.windows.arm64.checked.mch	+0.01%
libraries.pmi.windows.arm64.checked.mch	+0.01%

TIHan added 4 commits February 6, 2024 20:21

Added SVE_GP_3A, SVE_GT_4A, SVE_HI_3A, SVE_HM_2A formats

a299d60

Minor fix for display

40ec094

Added more formats

5968609

Small tweak to test

7b6063c

ghost assigned TIHan Feb 8, 2024

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 8, 2024

Merged

895c2ec

TIHan mentioned this pull request Feb 8, 2024

Arm64: Implement SVE encodings #94549

Closed

This was referenced Feb 8, 2024

Tracking issue for CI build timeouts #76454

Closed

Tests crashing in CI with no dump: exit code 137 means SIGKILL Killed #97049

Closed

a74nh reviewed Feb 8, 2024

build-analysis bot mentioned this pull request Feb 8, 2024

PortableSourceBuild failures in runtime-dev-innerloop #98160

Closed

kunalspathak added the arm-sve label (work related to arm64 SVE/SVE2 support) Feb 8, 2024

TIHan mentioned this pull request Feb 8, 2024

JIT ARM64-SVE: Add FK_3{A,B,C}, EJ_3A, EK_3A, EY_3B, EW_3{A,B} #98187

Merged

kunalspathak requested changes Feb 9, 2024

src/coreclr/jit/emitarm64.cpp (outdated, resolved)

ghost added and then removed the needs-author-action label (an issue or pull request that requires more info or actions from the author) Feb 9, 2024

TIHan added 2 commits February 9, 2024 12:36

Feedback

bd12649

Merged with main

0135f5c

kunalspathak approved these changes Feb 9, 2024

kunalspathak reviewed Feb 9, 2024

src/coreclr/jit/emitarm64.h (resolved)

Formatting

92f4cd5

TIHan merged commit 334cb02 into dotnet:main Feb 10, 2024
94 of 129 checks passed

TIHan deleted the arm64_sve_format_group7 branch February 10, 2024 00:12

build-analysis bot mentioned this pull request Feb 10, 2024

System.Net.Security.Tests.SslStreamCertificateContextOcspLinuxTests.RefreshOcspResponse_BeforeExpiration test failure #97779

Closed

github-actions bot locked and limited conversation to collaborators Mar 12, 2024
