JIT ARM64-SVE: Add IF_SVE_F{E,G,H,I,J}_3* #98142

Merged: amanasifkhalid merged 8 commits into dotnet:main from sve-smullb on Feb 8, 2024

amanasifkhalid
Member

Part of #94549. Implements the following encodings:

  • IF_SVE_FE_3A
  • IF_SVE_FE_3B
  • IF_SVE_FG_3A
  • IF_SVE_FG_3B
  • IF_SVE_FH_3A
  • IF_SVE_FH_3B
  • IF_SVE_FI_3A
  • IF_SVE_FI_3B
  • IF_SVE_FI_3C
  • IF_SVE_FJ_3A
  • IF_SVE_FJ_3B

cstool output:

smullb        z0.s, z1.h, z0.h[0]
smullb        z2.s, z3.h, z1.h[1]
smullt        z4.s, z5.h, z2.h[2]
smullt        z6.s, z7.h, z3.h[3]
umullb        z8.s, z9.h, z4.h[4]
umullb        z10.s, z11.h, z5.h[5]
umullt        z12.s, z13.h, z6.h[6]
umullt        z14.s, z15.h, z7.h[7]
smullb        z0.d, z1.s, z0.s[0]
smullb        z2.d, z3.s, z2.s[1]
smullt        z4.d, z5.s, z4.s[2]
smullt        z6.d, z7.s, z6.s[3]
umullb        z8.d, z9.s, z8.s[0]
umullb        z10.d, z11.s, z10.s[1]
umullt        z12.d, z13.s, z12.s[2]
umullt        z14.d, z15.s, z14.s[3]
smlalb        z0.s, z1.h, z0.h[0]
smlalt        z2.s, z3.h, z1.h[1]
smlslb        z4.s, z5.h, z2.h[2]
smlslt        z6.s, z7.h, z3.h[3]
umlalb        z8.s, z9.h, z4.h[4]
umlalt        z10.s, z11.h, z5.h[5]
umlslb        z12.s, z13.h, z6.h[6]
umlslt        z14.s, z15.h, z7.h[7]
smlalb        z0.d, z1.s, z0.s[0]
smlalt        z2.d, z3.s, z2.s[1]
smlslb        z4.d, z5.s, z4.s[2]
smlslt        z6.d, z7.s, z6.s[3]
umlalb        z8.d, z9.s, z8.s[0]
umlalt        z10.d, z11.s, z10.s[1]
umlslb        z12.d, z13.s, z12.s[2]
umlslt        z14.d, z15.s, z14.s[3]
sqdmullb      z0.s, z2.h, z1.h[1]
sqdmullb      z4.s, z6.h, z3.h[3]
sqdmullt      z8.s, z10.h, z5.h[5]
sqdmullt      z12.s, z14.h, z7.h[7]
sqdmullb      z0.d, z2.s, z0.s[0]
sqdmullb      z4.d, z6.s, z5.s[1]
sqdmullt      z8.d, z10.s, z10.s[2]
sqdmullt      z12.d, z14.s, z15.s[3]
sqdmulh       z0.h, z1.h, z1.h[1]
sqdmulh       z2.h, z3.h, z3.h[3]
sqrdmulh      z4.h, z5.h, z5.h[5]
sqrdmulh      z6.h, z7.h, z7.h[7]
sqdmulh       z8.s, z9.s, z0.s[0]
sqdmulh       z10.s, z11.s, z2.s[1]
sqrdmulh      z12.s, z13.s, z4.s[2]
sqrdmulh      z14.s, z15.s, z6.s[3]
sqdmulh       z16.d, z17.d, z0.d[0]
sqdmulh       z18.d, z19.d, z5.d[1]
sqrdmulh      z20.d, z21.d, z10.d[0]
sqrdmulh      z22.d, z23.d, z15.d[1]
sqdmlalb      z0.s, z1.h, z1.h[1]
sqdmlalt      z2.s, z3.h, z3.h[3]
sqdmlslb      z4.s, z5.h, z5.h[5]
sqdmlslt      z6.s, z0.h, z7.h[7]
sqdmlalb      z8.d, z9.s, z0.s[0]
sqdmlalt      z10.d, z11.s, z5.s[1]
sqdmlslb      z12.d, z13.s, z10.s[2]
sqdmlslt      z14.d, z15.s, z15.s[3]
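
Every instruction in the listing above ends with an indexed operand such as z7.h[7]. In SVE the index selects an element within a 128-bit segment, so the largest valid index depends on the element size: 7 for .h, 3 for .s, 1 for .d. The following is a minimal Python sketch of that rule, not code from this PR; the helper names (max_index, index_in_range) are hypothetical and exist only for illustration.

```python
import re

# Element size in bits for each SVE size suffix used in the listing.
ELEMENT_BITS = {"h": 16, "s": 32, "d": 64}

# Matches a trailing indexed operand such as "z3.h[7]".
INDEXED_OPERAND = re.compile(r"z(\d+)\.([hsd])\[(\d+)\]\s*$")


def max_index(suffix: str) -> int:
    """Largest element index that fits in one 128-bit segment."""
    return 128 // ELEMENT_BITS[suffix] - 1


def index_in_range(line: str) -> bool:
    """Return True if the line's trailing indexed operand uses a valid index."""
    match = INDEXED_OPERAND.search(line)
    if match is None:
        raise ValueError(f"no indexed operand in: {line!r}")
    _, suffix, index = match.groups()
    return 0 <= int(index) <= max_index(suffix)
```

For example, index_in_range("umullt        z14.s, z15.h, z7.h[7]") accepts the line, while a hypothetical "sqdmulh z0.d, z1.d, z0.d[2]" would be rejected because .d elements only allow indices 0 and 1.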

JitDisasm output:

smullb  z0.s, z1.h, z0.h[0]
smullb  z2.s, z3.h, z1.h[1]
smullt  z4.s, z5.h, z2.h[2]
smullt  z6.s, z7.h, z3.h[3]
umullb  z8.s, z9.h, z4.h[4]
umullb  z10.s, z11.h, z5.h[5]
umullt  z12.s, z13.h, z6.h[6]
umullt  z14.s, z15.h, z7.h[7]
smullb  z0.d, z1.s, z0.s[0]
smullb  z2.d, z3.s, z2.s[1]
smullt  z4.d, z5.s, z4.s[2]
smullt  z6.d, z7.s, z6.s[3]
umullb  z8.d, z9.s, z8.s[0]
umullb  z10.d, z11.s, z10.s[1]
umullt  z12.d, z13.s, z12.s[2]
umullt  z14.d, z15.s, z14.s[3]
smlalb  z0.s, z1.h, z0.h[0]
smlalt  z2.s, z3.h, z1.h[1]
smlslb  z4.s, z5.h, z2.h[2]
smlslt  z6.s, z7.h, z3.h[3]
umlalb  z8.s, z9.h, z4.h[4]
umlalt  z10.s, z11.h, z5.h[5]
umlslb  z12.s, z13.h, z6.h[6]
umlslt  z14.s, z15.h, z7.h[7]
smlalb  z0.d, z1.s, z0.s[0]
smlalt  z2.d, z3.s, z2.s[1]
smlslb  z4.d, z5.s, z4.s[2]
smlslt  z6.d, z7.s, z6.s[3]
umlalb  z8.d, z9.s, z8.s[0]
umlalt  z10.d, z11.s, z10.s[1]
umlslb  z12.d, z13.s, z12.s[2]
umlslt  z14.d, z15.s, z14.s[3]
sqdmullb z0.s, z2.h, z1.h[1]
sqdmullb z4.s, z6.h, z3.h[3]
sqdmullt z8.s, z10.h, z5.h[5]
sqdmullt z12.s, z14.h, z7.h[7]
sqdmullb z0.d, z2.s, z0.s[0]
sqdmullb z4.d, z6.s, z5.s[1]
sqdmullt z8.d, z10.s, z10.s[2]
sqdmullt z12.d, z14.s, z15.s[3]
sqdmulh z0.h, z1.h, z1.h[1]
sqdmulh z2.h, z3.h, z3.h[3]
sqrdmulh z4.h, z5.h, z5.h[5]
sqrdmulh z6.h, z7.h, z7.h[7]
sqdmulh z8.s, z9.s, z0.s[0]
sqdmulh z10.s, z11.s, z2.s[1]
sqrdmulh z12.s, z13.s, z4.s[2]
sqrdmulh z14.s, z15.s, z6.s[3]
sqdmulh z16.d, z17.d, z0.d[0]
sqdmulh z18.d, z19.d, z5.d[1]
sqrdmulh z20.d, z21.d, z10.d[0]
sqrdmulh z22.d, z23.d, z15.d[1]
sqdmlalb z0.s, z1.h, z1.h[1]
sqdmlalt z2.s, z3.h, z3.h[3]
sqdmlslb z4.s, z5.h, z5.h[5]
sqdmlslt z6.s, z0.h, z7.h[7]
sqdmlalb z8.d, z9.s, z0.s[0]
sqdmlalt z10.d, z11.s, z5.s[1]
sqdmlslb z12.d, z13.s, z10.s[2]
sqdmlslt z14.d, z15.s, z15.s[3]
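
The cstool and JitDisasm listings above differ only in column alignment, which is what shows the new encodings disassemble identically. A minimal Python sketch of how such a comparison could be automated follows; it assumes both listings are available as plain strings, and the function names (normalize, mismatches) are hypothetical rather than part of any JIT tooling.

```python
from itertools import zip_longest


def normalize(line: str) -> str:
    """Collapse runs of whitespace so column alignment does not matter."""
    return " ".join(line.split())


def mismatches(expected: str, actual: str) -> list[tuple[int, str, str]]:
    """Return (line number, expected, actual) for every differing instruction.

    Blank lines are skipped. If one listing is shorter, the missing
    entries are reported as empty strings.
    """
    exp = [normalize(l) for l in expected.splitlines() if l.strip()]
    act = [normalize(l) for l in actual.splitlines() if l.strip()]
    return [
        (number, e, a)
        for number, (e, a) in enumerate(zip_longest(exp, act, fillvalue=""), start=1)
        if e != a
    ]
```

An empty result means the two disassemblers agree on every instruction once whitespace is ignored.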

cc @dotnet/arm64-contrib.

@ghost ghost assigned amanasifkhalid Feb 8, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 8, 2024

ghost commented Feb 8, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@amanasifkhalid amanasifkhalid added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 8, 2024
@ryujit-bot

Diff results for #98142

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here


Contributor

@a74nh a74nh left a comment

LGTM

@ryujit-bot

Diff results for #98142

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


Contributor

@TIHan TIHan left a comment

LGTM as well

@amanasifkhalid amanasifkhalid merged commit 87c1431 into dotnet:main Feb 8, 2024
127 of 129 checks passed
@amanasifkhalid amanasifkhalid deleted the sve-smullb branch February 8, 2024 23:27
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2024