JIT: ARM64 SVE format encodings, SVE_GP_3A to SVE_HV_4A #98141

Merged: 8 commits, Feb 10, 2024

@TIHan TIHan commented Feb 8, 2024

Contributes to #94549

Adds formats:

  • SVE_GP_3A
  • SVE_HI_3A
  • SVE_HM_2A
  • SVE_HN_2A
  • SVE_HP_3A
  • SVE_HU_4B
  • SVE_HV_4A

[image: left: capstone, right: JIT]

You will notice that instructions from format SVE_HM_2A, such as fadd and fsub, show different constant values in capstone and in the JIT. This is a capstone bug: it displays the wrong constant.

To show that it is a capstone bug, let's take the example:

    theEmitter->emitIns_R_R_F(INS_sve_fadd, EA_SCALABLE, REG_V0, REG_P0, 0.5,
                              INS_OPTS_SCALABLE_H); // FADD    <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

This example is based on an LLVM test:

    // CHECK-INST: fadd    z0.h, p0/m, z0.h, #0.5
    // CHECK-ENCODING: [0x00,0x80,0x58,0x65]

JIT outputs: fadd z0.h, p0/m, z0.h, #0.5000
Capstone outputs: 00805865 fadd z0.h, p0/m, z0.h, #0.0

Notice that capstone shows #0.0 instead of the expected #0.5, even though the hex bytes it prints (00805865) match LLVM's CHECK-ENCODING.
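
To check this by hand, here is a minimal C++ sketch that reassembles the four bytes from CHECK-ENCODING and pulls out the fields of the FADD (immediate) layout. The struct and function names are hypothetical and are not JIT or capstone APIs. The bit positions (size at 23:22, Pg at 12:10, i1 at bit 5, Zdn at 4:0) are my reading of the Arm SVE encoding for this instruction class, so treat them as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the fields of an SVE "FP arithmetic with immediate" encoding.
struct SveFpImmFields
{
    uint32_t size; // bits 23:22, element size (1 = H, 2 = S, 3 = D)
    uint32_t pg;   // bits 12:10, governing predicate register
    uint32_t i1;   // bit 5, selects one of the two allowed constants
    uint32_t zdn;  // bits 4:0, source/destination vector register
};

// Reassemble a 32-bit instruction word from its little-endian byte sequence.
uint32_t wordFromBytes(const uint8_t* bytes)
{
    assert(bytes != nullptr);
    return static_cast<uint32_t>(bytes[0]) | (static_cast<uint32_t>(bytes[1]) << 8) |
           (static_cast<uint32_t>(bytes[2]) << 16) | (static_cast<uint32_t>(bytes[3]) << 24);
}

// Extract the fields listed above from an instruction word (assumed layout).
SveFpImmFields decodeFields(uint32_t code)
{
    SveFpImmFields f;
    f.size = (code >> 22) & 0x3;
    f.pg   = (code >> 10) & 0x7;
    f.i1   = (code >> 5) & 0x1;
    f.zdn  = code & 0x1F;
    return f;
}
```

For the bytes 00 80 58 65 this gives the word 0x65588000 with size = 1 (H), Pg = 0, Zdn = 0 and i1 = 0. For fadd, i1 = 0 selects #0.5, so the JIT's display agrees with the encoding and capstone's #0.0 does not.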

@ghost ghost assigned TIHan Feb 8, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2024

ghost commented Feb 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author: TIHan
Assignees: TIHan
Labels:

area-CodeGen-coreclr

Milestone: -

TIHan commented Feb 8, 2024

@dotnet/arm64-contrib @dotnet/jit-contrib @kunalspathak @a74nh this is ready.

@ryujit-bot

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
benchmarks.run_tiered.linux.arm64.checked.mch +0.01%
coreclr_tests.run.linux.arm64.checked.mch +0.01%
libraries.crossgen2.linux.arm64.checked.mch +0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_pgo.osx.arm64.checked.mch +0.01%
benchmarks.run_tiered.osx.arm64.checked.mch +0.01%
coreclr_tests.run.osx.arm64.checked.mch +0.01%
libraries.crossgen2.osx.arm64.checked.mch +0.01%
libraries.pmi.osx.arm64.checked.mch +0.01%
realworld.run.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_pgo.windows.arm64.checked.mch +0.01%
benchmarks.run_tiered.windows.arm64.checked.mch +0.01%
coreclr_tests.run.windows.arm64.checked.mch +0.01%
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%
realworld.run.windows.arm64.checked.mch +0.01%

Details here


@a74nh a74nh left a comment

Not sure about the encodings of the imms. Everything else LGTM


    if (immDbl != 0.0)
    {
        fpi.immFPIVal = 0;

Contributor

For all of these the float value is one of two values. It feels heavyweight to fully encode the float. Also I'm not sure if that requires a larger instrDesc to fit in the immediate.

If it does, then alternatively we could continue to overload _idRegBit with an instrDesc::idImmBit() function. Set the bit the same way it's set in the instruction encoding.

Then all the encoding and displaying becomes simpler too.

This could also be used for the group that has a 90 or 270 immediate.
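
A rough sketch of the single-bit idea, assuming the immediate bit sits at bit 5 of the encoding (the i1 field). `TwoValueImm`, `encodeImmBit`, `decodeImmBit` and `placeImmBit` are made-up names for illustration, not the instrDesc API. The value pairs are the ones I believe the SVE immediate forms accept.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a two-valued immediate:
// bit 0 selects `whenClear`, bit 1 selects `whenSet`.
struct TwoValueImm
{
    double whenClear;
    double whenSet;
};

// Value pairs assumed for the SVE floating-point immediate forms.
constexpr TwoValueImm kHalfOrOne{0.5, 1.0}; // e.g. fadd, fsub
constexpr TwoValueImm kHalfOrTwo{0.5, 2.0}; // e.g. fmul
constexpr TwoValueImm kZeroOrOne{0.0, 1.0}; // e.g. fmax, fmin

// Map a constant to the single bit that would be stored in the descriptor.
uint32_t encodeImmBit(const TwoValueImm& pair, double value)
{
    assert(value == pair.whenClear || value == pair.whenSet);
    return (value == pair.whenSet) ? 1u : 0u;
}

// Recover the constant from the stored bit, for display.
double decodeImmBit(const TwoValueImm& pair, uint32_t bit)
{
    assert(bit <= 1u);
    return (bit != 0u) ? pair.whenSet : pair.whenClear;
}

// Position the bit in the instruction word (bit 5 is an assumption, see above).
uint32_t placeImmBit(uint32_t bit)
{
    return (bit & 1u) << 5;
}
```

With this shape the stored value is already the encoded bit, so emitting the instruction is a shift and display is a table lookup.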

Contributor Author

I'm not super happy about it, but this is how we handle storing immediate floats in instrDesc. The current encode and decode functions are complicated, but using them isn't so bad.

For the 90 and 270 values, they are just like any other immediate value so I don't think we need to change that.

Member

Also I think here, we should encode them as 0, 1, etc. Related: #98187 (comment)

Contributor Author

We could; it doesn't matter much, since we have to decode them at display time either way. But since we already have encode/decode helpers for immediate floats, it wouldn't be much different from what is already there.

Member

I am wondering why we need special handling for immDbl == 0.0. Doesn't canEncodeFloatImm8() handle it?

Contributor Author

It doesn't handle 0 which is why I had to do it myself. However, since we are going to encode the rotation values as 0 - 3, I guess we should do the same here.
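
For context on why 0.0 falls outside the 8-bit float immediate, here is a small sketch that expands the Arm "abcdefgh" immediate to a double, following my understanding of the architecture's expansion rule. It is not the JIT's canEncodeFloatImm8(); `expandFloatImm8` and `canRepresentAsImm8` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Expand an Arm 8-bit floating-point immediate (bits "abcdefgh") to a double.
// a = sign, b:cd = exponent selector, efgh = fraction.
double expandFloatImm8(uint8_t imm8)
{
    const int sign = (imm8 >> 7) & 1;
    const int b    = (imm8 >> 6) & 1;
    const int cd   = (imm8 >> 4) & 3;
    const int frac = imm8 & 0xF;

    // b == 1 gives exponents -3..0, b == 0 gives exponents 1..4.
    const int exponent = (b != 0) ? (cd - 3) : (cd + 1);
    assert(exponent >= -3 && exponent <= 4);

    // The mantissa always has an implicit leading 1, so the result is never zero.
    const double magnitude = std::ldexp(1.0 + frac / 16.0, exponent);
    return (sign != 0) ? -magnitude : magnitude;
}

// Brute-force check: does any of the 256 immediates expand exactly to `value`?
bool canRepresentAsImm8(double value)
{
    for (int imm = 0; imm < 256; imm++)
    {
        if (expandFloatImm8(static_cast<uint8_t>(imm)) == value)
        {
            return true;
        }
    }
    return false;
}
```

The smallest magnitude this format can produce is 0.125, so 0.5, 1.0 and 2.0 are representable but 0.0 is not, which is consistent with needing a separate path for zero.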

a74nh commented Feb 8, 2024

You will notice that instructions from format SVE_HM_2A, such as fadd and fsub, show different constant values in capstone and in the JIT. This is a capstone bug: it displays the wrong constant.

Agreed. Capstone is showing values that are not valid for that instruction.

It can't be your code overflowing, as insEncodeSveFloatImmZero_to_Two() explicitly sets only a single bit or nothing. Plus, the next bits are hardcoded to 0, and capstone would error if those bits were set.

Looks like Capstone is just interpreting the single immediate bit incorrectly.

How recent is the Capstone branch we are working from? If it's still present in HEAD, I think it would be worth raising a bug on the capstone project.

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 8, 2024

TIHan commented Feb 8, 2024

How recent is the Capstone branch we are working from?

It was from early January, so I doubt this issue has been fixed. I checked their repo and found no reports about it. I agree it would be good to report it.

@kunalspathak kunalspathak left a comment

With the encode change I suggested, can you double-check the difference between capstone and the JIT?

src/coreclr/jit/emitarm64.cpp (outdated, resolved)
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Feb 9, 2024

TIHan commented Feb 9, 2024

@kunalspathak this is ready. I added the encodings that you suggested for the rotation values and the float constants.

@amanasifkhalid This will have the encodings for the rotation values for you to use.
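
As an illustration of the rotation encoding discussed in the review (0 - 3 rather than raw degrees), a sketch could look like this. `encodeRotation` and `decodeRotation` are hypothetical names and not necessarily what the PR adds to the emitter.

```cpp
#include <cassert>
#include <cstdint>

// Map a rotation in degrees (0, 90, 180, 270) to a compact 2-bit value 0..3.
uint32_t encodeRotation(int degrees)
{
    assert(degrees == 0 || degrees == 90 || degrees == 180 || degrees == 270);
    return static_cast<uint32_t>(degrees / 90);
}

// Map the compact 2-bit value back to degrees, for display.
int decodeRotation(uint32_t imm)
{
    assert(imm <= 3u);
    return static_cast<int>(imm) * 90;
}
```

Storing the compact value means the emitter can place it in the instruction bits directly and only convert back to degrees when printing.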

@ryujit-bot

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on linux/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.osx.arm64.checked.mch +0.01%
realworld.run.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on linux/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


@kunalspathak kunalspathak left a comment

LGTM. Thanks!

@ryujit-bot

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.osx.arm64.checked.mch +0.01%
realworld.run.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


TIHan commented Feb 10, 2024

Merging this now. The only thing I changed was formatting; everything was fine before, and the build is successful with the formatting change.

@TIHan TIHan merged commit 334cb02 into dotnet:main Feb 10, 2024
94 of 129 checks passed
@TIHan TIHan deleted the arm64_sve_format_group7 branch February 10, 2024 00:12
@ryujit-bot

Diff results for #98141

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.linux.arm64.checked.mch +0.01%
libraries.pmi.linux.arm64.checked.mch +0.01%
realworld.run.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (+0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.osx.arm64.checked.mch +0.01%
libraries.pmi.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.windows.arm64.checked.mch +0.01%
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here

