JIT ARM64-SVE: Add IF_SVE_GU_3*, IF_SVE_GX_3*, IF_SVE_FF_3*, IF_SVE_GY_3B #98136

Merged 4 commits on Feb 8, 2024

amanasifkhalid (Member) commented

Part of #94549. Adds the following encodings:

  • IF_SVE_GU_3A
  • IF_SVE_GU_3B
  • IF_SVE_GU_3C
  • IF_SVE_GX_3A
  • IF_SVE_GX_3B
  • IF_SVE_GX_3C
  • IF_SVE_FF_3A
  • IF_SVE_FF_3B
  • IF_SVE_FF_3C
  • IF_SVE_GY_3B

cstool output:

fmla  z0.s, z2.s, z1.s[0]
fmla  z4.s, z6.s, z3.s[1]
fmls  z8.s, z10.s, z5.s[2]
fmls  z12.s, z14.s, z7.s[3]
fmla  z1.d, z0.d, z0.d[0]
fmla  z3.d, z2.d, z5.d[1]
fmls  z5.d, z4.d, z10.d[0]
fmls  z7.d, z6.d, z15.d[1]
bfmla z1.h, z2.h, z0.h[0]
bfmla z3.h, z4.h, z2.h[2]
bfmls z5.h, z6.h, z4.h[5]
bfmls z7.h, z8.h, z7.h[7]
fmul  z0.s, z2.s, z1.s[0]
fmul  z4.s, z6.s, z3.s[1]
fmul  z8.s, z10.s, z5.s[2]
fmul  z12.s, z14.s, z7.s[3]
fmul  z1.d, z0.d, z0.d[0]
fmul  z3.d, z2.d, z5.d[1]
fmul  z5.d, z4.d, z10.d[0]
fmul  z7.d, z6.d, z15.d[1]
bfmul z1.h, z2.h, z0.h[0]
bfmul z3.h, z4.h, z2.h[2]
bfmul z5.h, z6.h, z4.h[5]
bfmul z7.h, z8.h, z7.h[7]
fdot  z0.s, z2.h, z1.h[0]
fdot  z4.s, z6.h, z3.h[1]
bfdot z8.s, z10.h, z5.h[2]
bfdot z12.s, z14.h, z7.h[3]
mla   z0.h, z1.h, z1.h[1]
mla   z2.h, z3.h, z3.h[3]
mls   z4.h, z5.h, z5.h[5]
mls   z6.h, z7.h, z7.h[7]
mla   z8.s, z9.s, z1.s[0]
mla   z10.s, z11.s, z3.s[1]
mls   z12.s, z13.s, z5.s[2]
mls   z14.s, z15.s, z7.s[3]
mla   z16.d, z17.d, z0.d[0]
mla   z18.d, z19.d, z5.d[1]
mls   z20.d, z21.d, z10.d[0]
mls   z22.d, z23.d, z15.d[1]
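
The listing above exercises the edges of the indexed operand (for example `z7.s[3]`, `z15.d[1]`, `z7.h[7]`). As a rough sketch of those limits, the Python below checks one line at a time. The `LIMITS` table is an assumption based on the Arm A64 SVE documentation for same-size indexed multiply forms, not something taken from this PR, and `check_indexed` is a hypothetical helper name. Widening forms such as `fdot` and `bfdot` are deliberately left out of scope.

```python
import re

# Assumed limits for the indexed operand Zm.<T>[imm], keyed by element suffix.
# The index selects an element within each 128-bit segment, and Zm is taken to
# use a 3-bit register field for .h/.s and a 4-bit field for .d.
LIMITS = {
    "h": {"max_index": 7, "max_zm": 7},
    "s": {"max_index": 3, "max_zm": 7},
    "d": {"max_index": 1, "max_zm": 15},
}

# Matches lines such as "fmla  z0.s, z2.s, z1.s[0]".
OPERANDS = re.compile(
    r"^\s*(\w+)\s+z(\d+)\.([hsd]),\s*z(\d+)\.([hsd]),\s*z(\d+)\.([hsd])\[(\d+)\]\s*$"
)


def check_indexed(line):
    """Return True/False for same-size forms, None for widening forms."""
    m = OPERANDS.match(line)
    if m is None:
        raise ValueError(f"unrecognised line: {line!r}")
    _mnemonic, _zd, zd_t, _zn, zn_t, zm, zm_t, index = m.groups()
    if not (zd_t == zn_t == zm_t):
        return None  # widening form (e.g. fdot), out of scope for this sketch
    lim = LIMITS[zm_t]
    return int(zm) <= lim["max_zm"] and int(index) <= lim["max_index"]
```

Under these assumptions, `check_indexed("fmla  z4.s, z6.s, z3.s[1]")` is accepted, while a `.s` form that names `z8` as the indexed register would be rejected.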

JitDisasm output:

fmla    z0.s, z2.s, z1.s[0]
fmla    z4.s, z6.s, z3.s[1]
fmls    z8.s, z10.s, z5.s[2]
fmls    z12.s, z14.s, z7.s[3]
fmla    z1.d, z0.d, z0.d[0]
fmla    z3.d, z2.d, z5.d[1]
fmls    z5.d, z4.d, z10.d[0]
fmls    z7.d, z6.d, z15.d[1]
bfmla   z1.h, z2.h, z0.h[0]
bfmla   z3.h, z4.h, z2.h[2]
bfmls   z5.h, z6.h, z4.h[5]
bfmls   z7.h, z8.h, z7.h[7]
fmul    z0.s, z2.s, z1.s[0]
fmul    z4.s, z6.s, z3.s[1]
fmul    z8.s, z10.s, z5.s[2]
fmul    z12.s, z14.s, z7.s[3]
fmul    z1.d, z0.d, z0.d[0]
fmul    z3.d, z2.d, z5.d[1]
fmul    z5.d, z4.d, z10.d[0]
fmul    z7.d, z6.d, z15.d[1]
bfmul   z1.h, z2.h, z0.h[0]
bfmul   z3.h, z4.h, z2.h[2]
bfmul   z5.h, z6.h, z4.h[5]
bfmul   z7.h, z8.h, z7.h[7]
fdot    z0.s, z2.h, z1.h[0]
fdot    z4.s, z6.h, z3.h[1]
bfdot   z8.s, z10.h, z5.h[2]
bfdot   z12.s, z14.h, z7.h[3]
mla     z0.h, z1.h, z1.h[1]
mla     z2.h, z3.h, z3.h[3]
mls     z4.h, z5.h, z5.h[5]
mls     z6.h, z7.h, z7.h[7]
mla     z8.s, z9.s, z1.s[0]
mla     z10.s, z11.s, z3.s[1]
mls     z12.s, z13.s, z5.s[2]
mls     z14.s, z15.s, z7.s[3]
mla     z16.d, z17.d, z0.d[0]
mla     z18.d, z19.d, z5.d[1]
mls     z20.d, z21.d, z10.d[0]
mls     z22.d, z23.d, z15.d[1]
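
The two listings differ only in column padding. A minimal sketch of how such a comparison could be automated is shown below; it is not the tooling used in this PR, and `normalize` and `diff_listings` are hypothetical helper names. It collapses whitespace in each line and reports any line where the two disassemblers disagree.

```python
from itertools import zip_longest


def normalize(listing):
    """Collapse runs of whitespace so column alignment does not matter."""
    return [" ".join(line.split()) for line in listing.splitlines() if line.strip()]


def diff_listings(expected, actual):
    """Return (line_number, expected, actual) for each mismatching line.

    A line missing from one listing shows up as None on that side.
    """
    pairs = zip_longest(normalize(expected), normalize(actual))
    return [(i, e, a) for i, (e, a) in enumerate(pairs, start=1) if e != a]
```

An empty result means the listings agree after whitespace normalization; any tuple returned points at the first place to look.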

I'm not sure if IF_SVE_GY_3A and IF_SVE_GY_3B_D are valid encodings. I tried implementing them locally, but cstool wouldn't recognize them, and I don't see any other variants of FDOT (indexed) in the docs. Am I looking in the wrong place?

cc @dotnet/arm64-contrib

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Feb 7, 2024
ghost assigned amanasifkhalid on Feb 7, 2024
ghost commented Feb 7, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

amanasifkhalid added the arm-sve label (Work related to arm64 SVE/SVE2 support) on Feb 7, 2024
amanasifkhalid added this to the 9.0.0 milestone on Feb 7, 2024
TIHan (Contributor) left a comment

LGTM from my point of view

@ryujit-bot

Diff results for #98136

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (0.00% to +0.01%)
Collection                                 PDIFF
libraries.pmi.windows.arm64.checked.mch    +0.01%


a74nh (Contributor) left a comment

LGTM. Very happy to see lots of code reuse happening.

a74nh (Contributor) commented Feb 8, 2024

> I'm not sure if IF_SVE_GY_3A and IF_SVE_GY_3B_D are valid encodings. I tried implementing them locally, but cstool wouldn't recognize them, and I don't see any other variants of FDOT (indexed) in the docs. Am I looking in the wrong place?

Try https://docsmirror.github.io/A64/2023-09/sveindex.html - note it's the 2023-09 release instead of 2023-06. There have been a few new instructions added for FEAT_FP8DOT2 and FEAT_FP8DOT4.

I'm not surprised these are not supported in cstool yet.

There were some other instructions not in cstool. For those the encodings were fairly straightforward, so I added the code and then ifdefed out the tests using ALL_ARM64_EMITTER_UNIT_TESTS_SVE_UNSUPPORTED. I also added an unreached() in the emitIns_R_R_*() function. When support is added to cstool, it'll be fairly quick to test (and it's easier to write the code now while it's in our heads).

Interestingly, there is now a 2023-12 release. But there's nothing on docsmirror yet and the autogenerated stuff is based on 2023-09, so we'll stick with that and should aim to get all of 2023-09 support in. But, it'll be a few years before anything very recent gets into real hardware.

amanasifkhalid (Member, Author) commented

@a74nh thanks for the updated docs link. I'll follow your lead with adding and disabling those encodings in a follow-up PR.

Thank you both for the reviews!

amanasifkhalid (Member, Author) commented

Failures are known.

amanasifkhalid merged commit 247f8cd into dotnet:main on Feb 8, 2024
122 of 129 checks passed
amanasifkhalid deleted the sve-mul branch on February 8, 2024 15:49
github-actions bot locked and limited conversation to collaborators on Mar 10, 2024