JIT ARM64-SVE: Add encodings FL_3A to FX_3A #98832

Merged
merged 11 commits into from
Feb 23, 2024


@amanasifkhalid (Member) commented Feb 22, 2024

Part of #94549. Implements the following encodings:

  • IF_SVE_FL_3A
  • IF_SVE_FM_3A
  • IF_SVE_FN_3A
  • IF_SVE_FN_3B
  • IF_SVE_FO_3A
  • IF_SVE_FP_3A
  • IF_SVE_FQ_3A
  • IF_SVE_FS_3A
  • IF_SVE_FW_3A
  • IF_SVE_FX_3A

cstool output:

sabdlb        z0.h, z1.b, z2.b
sabdlt        z3.s, z4.h, z5.h
saddlb        z6.d, z7.s, z8.s
saddlt        z9.h, z10.b, z11.b
ssublb        z12.s, z13.h, z14.h
ssublt        z15.d, z16.s, z17.s
uabdlb        z18.h, z19.b, z20.b
uabdlt        z21.s, z22.h, z24.h
uaddlb        z24.d, z25.s, z26.s
uaddlt        z27.h, z28.b, z29.b
usublb        z30.s, z31.h, z0.h
usublt        z1.d, z2.s, z3.s
saddwb        z0.h, z1.h, z2.b
saddwt        z3.s, z4.s, z5.h
ssubwb        z6.d, z7.d, z8.s
ssubwt        z9.h, z10.h, z11.b
uaddwb        z12.s, z13.s, z14.h
uaddwt        z15.d, z16.d, z17.s
usubwb        z18.h, z19.h, z20.b
usubwt        z21.s, z22.s, z23.h
pmullb        z0.h, z1.b, z2.b
pmullt        z3.d, z4.s, z5.s
smullb        z6.h, z7.b, z8.b
smullt        z9.d, z10.s, z11.s
sqdmullb      z12.h, z13.b, z14.b
sqdmullt      z15.d, z16.s, z17.s
umullb        z18.h, z19.b, z20.b
umullt        z21.d, z22.s, z23.s
pmullb        z0.q, z1.d, z2.d
pmullt        z3.q, z4.d, z5.d
smmla z0.s, z1.b, z2.b
ummla z3.s, z4.b, z5.b
usmmla        z6.s, z7.b, z8.b
eorbt z0.b, z1.b, z2.b
eorbt z3.h, z4.h, z5.h
eortb z6.s, z7.s, z8.s
eortb z9.d, z10.d, z11.d
bdep  z0.b, z1.b, z2.b
bext  z3.h, z4.h, z5.h
bgrp  z6.s, z7.s, z8.s
bgrp  z9.d, z10.d, z11.d
saddlbt       z0.h, z1.b, z2.b
ssublbt       z3.s, z4.h, z5.h
ssubltb       z6.d, z7.s, z8.s
saba  z0.b, z1.b, z2.b
saba  z3.h, z4.h, z5.h
uaba  z6.s, z7.s, z8.s
uaba  z9.d, z10.d, z11.d
sabalb        z0.h, z1.b, z2.b
sabalt        z3.s, z4.h, z5.h
uabalb        z6.d, z7.s, z8.s
uabalt        z9.h, z10.b, z11.b

JitDisasm output:

sabdlb  z0.h, z1.b, z2.b
sabdlt  z3.s, z4.h, z5.h
saddlb  z6.d, z7.s, z8.s
saddlt  z9.h, z10.b, z11.b
ssublb  z12.s, z13.h, z14.h
ssublt  z15.d, z16.s, z17.s
uabdlb  z18.h, z19.b, z20.b
uabdlt  z21.s, z22.h, z24.h
uaddlb  z24.d, z25.s, z26.s
uaddlt  z27.h, z28.b, z29.b
usublb  z30.s, z31.h, z0.h
usublt  z1.d, z2.s, z3.s
saddwb  z0.h, z1.h, z2.b
saddwt  z3.s, z4.s, z5.h
ssubwb  z6.d, z7.d, z8.s
ssubwt  z9.h, z10.h, z11.b
uaddwb  z12.s, z13.s, z14.h
uaddwt  z15.d, z16.d, z17.s
usubwb  z18.h, z19.h, z20.b
usubwt  z21.s, z22.s, z23.h
pmullb  z0.h, z1.b, z2.b
pmullt  z3.d, z4.s, z5.s
smullb  z6.h, z7.b, z8.b
smullt  z9.d, z10.s, z11.s
sqdmullb z12.h, z13.b, z14.b
sqdmullt z15.d, z16.s, z17.s
umullb  z18.h, z19.b, z20.b
umullt  z21.d, z22.s, z23.s
pmullb  z0.q, z1.d, z2.d
pmullt  z3.q, z4.d, z5.d
smmla   z0.s, z1.b, z2.b
ummla   z3.s, z4.b, z5.b
usmmla  z6.s, z7.b, z8.b
eorbt   z0.b, z1.b, z2.b
eorbt   z3.h, z4.h, z5.h
eortb   z6.s, z7.s, z8.s
eortb   z9.d, z10.d, z11.d
bdep    z0.b, z1.b, z2.b
bext    z3.h, z4.h, z5.h
bgrp    z6.s, z7.s, z8.s
bgrp    z9.d, z10.d, z11.d
saddlbt z0.h, z1.b, z2.b
ssublbt z3.s, z4.h, z5.h
ssubltb z6.d, z7.s, z8.s
saba    z0.b, z1.b, z2.b
saba    z3.h, z4.h, z5.h
uaba    z6.s, z7.s, z8.s
uaba    z9.d, z10.d, z11.d
sabalb  z0.h, z1.b, z2.b
sabalt  z3.s, z4.h, z5.h
uabalb  z6.d, z7.s, z8.s
uabalt  z9.h, z10.b, z11.b
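
The two listings above differ only in the whitespace between mnemonic and operands. A minimal sketch of how such a comparison could be automated is shown here; the helper names `normalize` and `first_mismatch` are hypothetical and are not part of this PR or of the JIT's test tooling, and the sample strings are three lines copied from the listings.

```python
import re


def normalize(listing: str) -> list[str]:
    """Drop blank lines and collapse whitespace runs to a single space."""
    return [
        re.sub(r"\s+", " ", line.strip())
        for line in listing.splitlines()
        if line.strip()
    ]


def first_mismatch(expected: str, actual: str):
    """Return (index, expected_line, actual_line) for the first difference,
    or None when both listings match after normalization."""
    exp, act = normalize(expected), normalize(actual)
    for i, (e, a) in enumerate(zip(exp, act)):
        if e != a:
            return i, e, a
    if len(exp) != len(act):
        # One listing has extra lines; report the first unmatched one.
        n = min(len(exp), len(act))
        return (
            n,
            exp[n] if n < len(exp) else None,
            act[n] if n < len(act) else None,
        )
    return None


# Sample lines taken from the cstool and JitDisasm listings (assumed to be
# captured as plain text; how they are captured is outside this sketch).
cstool = """
sabdlb        z0.h, z1.b, z2.b
smmla z0.s, z1.b, z2.b
sqdmullb      z12.h, z13.b, z14.b
"""
jitdisasm = """
sabdlb  z0.h, z1.b, z2.b
smmla   z0.s, z1.b, z2.b
sqdmullb z12.h, z13.b, z14.b
"""

print(first_mismatch(cstool, jitdisasm))
```

Normalizing whitespace rather than comparing raw text keeps the check insensitive to the different column padding the two tools use.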

cc @dotnet/arm64-contrib

@ghost assigned amanasifkhalid Feb 22, 2024
@amanasifkhalid marked this pull request as ready for review February 22, 2024 20:17
@amanasifkhalid added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and arm-sve (Work related to arm64 SVE/SVE2 support) labels Feb 22, 2024
@ghost commented Feb 22, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels: area-CodeGen-coreclr, arch-arm64-sve

Milestone: -

@ryujit-bot

Diff results for #98832

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,554,585 contexts (1,019,526 MinOpts, 1,535,059 FullOpts).

MISSED contexts: 172 (0.01%)

Overall (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.linux.arm64.Release.mch | 381,315,748 | +0 | 0.00% |

FullOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.linux.arm64.Release.mch | 165,778,652 | +0 | 0.00% |

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,543,224 contexts (988,245 MinOpts, 1,554,979 FullOpts).

MISSED contexts: 177 (0.01%)

Overall (-3 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.linux.x64.Release.mch | 329,922,608 | -3 | +0.16% |

FullOpts (-3 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.linux.x64.Release.mch | 147,120,622 | -3 | +0.16% |

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,317,543 contexts (945,402 MinOpts, 1,372,141 FullOpts).

MISSED contexts: 170 (0.01%)

Overall (+12 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.osx.arm64.Release.mch | 312,729,720 | +12 | +0.04% |

FullOpts (+12 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.osx.arm64.Release.mch | 111,327,984 | +12 | +0.04% |

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,402,908 contexts (955,693 MinOpts, 1,447,215 FullOpts).

MISSED contexts: 174 (0.01%)

Overall (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.windows.arm64.Release.mch | 328,693,680 | +0 | 0.00% |

FullOpts (+0 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| libraries_tests.run.windows.arm64.Release.mch | 123,601,008 | +0 | 0.00% |

Details here


Throughput diffs

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)

| Collection | PDIFF |
|---|---|
| libraries.pmi.osx.arm64.checked.mch | -0.01% |

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)

| Collection | PDIFF |
|---|---|
| libraries.pmi.windows.arm64.checked.mch | +0.01% |

Details here


@TIHan (Contributor) left a comment

LGTM, glad the formats have been straightforward.

@amanasifkhalid (Member, Author) commented Feb 22, 2024

SPMI diffs look wrong for this PR? Those are the diffs I got for #98789; these changes shouldn't affect codegen at all.

Edit: This seems to be an issue on several PRs, so not related to this one.

@TIHan (Contributor) commented Feb 22, 2024

Yea, your changes shouldn't cause any diffs - so I think it's just a fluke.

@amanasifkhalid merged commit 9512aab into dotnet:main Feb 23, 2024
127 of 129 checks passed
@amanasifkhalid deleted the sve-fl-3a branch February 23, 2024 01:05
@github-actions bot locked and limited conversation to collaborators Mar 24, 2024