
JIT ARM64_SVE: Add IF_SVE_BR_3A to IF_SVE_EX_3A #98722

Merged: amanasifkhalid merged 7 commits into dotnet:main from sve-br-3a on Feb 21, 2024

amanasifkhalid (Member):

Part of #94549. Adds the following encodings:

  • IF_SVE_BR_3A
  • IF_SVE_BR_3B
  • IF_SVE_BZ_3A
  • IF_SVE_BZ_3A_A
  • IF_SVE_CA_3A
  • IF_SVE_EH_3A
  • IF_SVE_EL_3A
  • IF_SVE_EM_3A
  • IF_SVE_EN_3A
  • IF_SVE_EO_3A
  • IF_SVE_EV_3A
  • IF_SVE_EX_3A

cstool output:

trn1  z0.b, z1.b, z2.b
trn1  z3.h, z4.h, z5.h
trn2  z6.s, z7.s, z8.s
trn2  z9.d, z10.d, z11.d
uzp1  z12.b, z13.b, z14.b
uzp1  z15.h, z16.h, z17.h
uzp2  z18.s, z19.s, z20.s
uzp2  z21.d, z22.d, z23.d
zip1  z24.b, z25.b, z26.b
zip1  z27.h, z28.h, z29.h
zip2  z30.s, z31.s, z0.s
zip2  z1.d, z2.d, z3.d
trn1  z0.q, z1.q, z2.q
trn2  z3.q, z4.q, z5.q
uzp1  z6.q, z7.q, z8.q
uzp2  z9.q, z10.q, z11.q
zip1  z12.q, z13.q, z14.q
zip2  z15.q, z16.q, z17.q
tbl   z0.b, { z1.b }, z2.b
tbl   z3.h, { z4.h }, z5.h
tbx   z6.s, z7.s, z8.s
tbx   z9.d, z10.d, z11.d
tbl   z0.b, { z1.b, z2.b }, z2.b
tbl   z3.h, { z4.h, z5.h }, z5.h
tbl   z6.s, { z7.s, z8.s }, z8.s
tbl   z9.d, { z10.d, z11.d }, z11.d
tbxq  z0.b, z1.b, z2.b
tbxq  z3.h, z4.h, z5.h
tbxq  z6.s, z7.s, z8.s
tbxq  z9.d, z10.d, z11.d
sdot  z0.s, z1.b, z2.b
sdot  z3.d, z4.h, z5.h
udot  z6.s, z7.b, z8.b
udot  z9.d, z10.h, z11.h
smlalb        z0.h, z1.b, z2.b
smlalt        z3.s, z4.h, z5.h
smlslb        z6.d, z7.s, z8.s
smlslt        z9.h, z10.b, z11.b
umlalb        z12.s, z13.h, z14.h
umlalt        z15.d, z16.s, z17.s
umlslb        z18.h, z19.b, z20.b
umlslt        z21.s, z22.h, z23.h
sqrdmlah      z0.b, z1.b, z2.b
sqrdmlah      z3.h, z4.h, z5.h
sqrdmlsh      z6.s, z7.s, z8.s
sqrdmlsh      z9.d, z10.d, z11.d
sqdmlalbt     z0.h, z1.b, z2.b
sqdmlslbt     z3.s, z4.h, z5.h
sqdmlslbt     z6.d, z7.s, z8.s
sqdmlalb      z0.h, z1.b, z2.b
sqdmlalt      z3.s, z4.h, z5.h
sqdmlslb      z6.d, z7.s, z8.s
sqdmlslt      z9.h, z10.b, z11.b
sclamp        z0.b, z1.b, z2.b
sclamp        z3.h, z4.h, z5.h
uclamp        z6.s, z7.s, z8.s
uclamp        z9.d, z10.d, z11.d
tblq  z0.b, { z1.b }, z2.b
uzpq1 z3.h, z4.h, z5.h
uzpq2 z6.s, z7.s, z8.s
zipq1 z9.d, z10.d, z11.d
zipq2 z12.b, z13.b, z14.b

JitDisasm output:

trn1    z0.b, z1.b, z2.b
trn1    z3.h, z4.h, z5.h
trn2    z6.s, z7.s, z8.s
trn2    z9.d, z10.d, z11.d
uzp1    z12.b, z13.b, z14.b
uzp1    z15.h, z16.h, z17.h
uzp2    z18.s, z19.s, z20.s
uzp2    z21.d, z22.d, z23.d
zip1    z24.b, z25.b, z26.b
zip1    z27.h, z28.h, z29.h
zip2    z30.s, z31.s, z0.s
zip2    z1.d, z2.d, z3.d
trn1    z0.q, z1.q, z2.q
trn2    z3.q, z4.q, z5.q
uzp1    z6.q, z7.q, z8.q
uzp2    z9.q, z10.q, z11.q
zip1    z12.q, z13.q, z14.q
zip2    z15.q, z16.q, z17.q
tbl     z0.b, { z1.b }, z2.b
tbl     z3.h, { z4.h }, z5.h
tbx     z6.s, z7.s, z8.s
tbx     z9.d, z10.d, z11.d
tbl     z0.b, { z1.b, z2.b }, z2.b
tbl     z3.h, { z4.h, z5.h }, z5.h
tbl     z6.s, { z7.s, z8.s }, z8.s
tbl     z9.d, { z10.d, z11.d }, z11.d
tbxq    z0.b, z1.b, z2.b
tbxq    z3.h, z4.h, z5.h
tbxq    z6.s, z7.s, z8.s
tbxq    z9.d, z10.d, z11.d
sdot    z0.s, z1.b, z2.b
sdot    z3.d, z4.h, z5.h
udot    z6.s, z7.b, z8.b
udot    z9.d, z10.h, z11.h
smlalb  z0.h, z1.b, z2.b
smlalt  z3.s, z4.h, z5.h
smlslb  z6.d, z7.s, z8.s
smlslt  z9.h, z10.b, z11.b
umlalb  z12.s, z13.h, z14.h
umlalt  z15.d, z16.s, z17.s
umlslb  z18.h, z19.b, z20.b
umlslt  z21.s, z22.h, z23.h
sqrdmlah z0.b, z1.b, z2.b
sqrdmlah z3.h, z4.h, z5.h
sqrdmlsh z6.s, z7.s, z8.s
sqrdmlsh z9.d, z10.d, z11.d
sqdmlalbt z0.h, z1.b, z2.b
sqdmlslbt z3.s, z4.h, z5.h
sqdmlslbt z6.d, z7.s, z8.s
sqdmlalb z0.h, z1.b, z2.b
sqdmlalt z3.s, z4.h, z5.h
sqdmlslb z6.d, z7.s, z8.s
sqdmlslt z9.h, z10.b, z11.b
sclamp  z0.b, z1.b, z2.b
sclamp  z3.h, z4.h, z5.h
uclamp  z6.s, z7.s, z8.s
uclamp  z9.d, z10.d, z11.d
tblq    z0.b, { z1.b }, z2.b
uzpq1   z3.h, z4.h, z5.h
uzpq2   z6.s, z7.s, z8.s
zipq1   z9.d, z10.d, z11.d
zipq2   z12.b, z13.b, z14.b
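The cstool and JitDisasm listings above contain the same instructions and differ only in the padding after each mnemonic. One way to check that mechanically is a whitespace-insensitive line comparison. This is a minimal sketch, not part of the PR: `isPad` and `sameIgnoringPadding` are hypothetical helpers (not part of cstool or the JIT), and it assumes the two listings have already been split into corresponding lines. It uses `std::string_view`, so it assumes C++17.

```cpp
#include <cstddef>
#include <string_view>

// Spaces and tabs are the only characters whose run length differs between
// the cstool and JitDisasm columns.
constexpr bool isPad(char c)
{
    return c == ' ' || c == '\t';
}

// Hypothetical helper (not part of cstool or the JIT): returns true when two
// disassembly lines are identical after every run of spaces/tabs is treated
// as a single separator and leading/trailing padding is ignored.
constexpr bool sameIgnoringPadding(std::string_view a, std::string_view b)
{
    std::size_t i = 0;
    std::size_t j = 0;

    // Skip leading padding on both lines.
    while (i < a.size() && isPad(a[i])) ++i;
    while (j < b.size() && isPad(b[j])) ++j;

    while (i < a.size() && j < b.size())
    {
        const bool padA = isPad(a[i]);
        const bool padB = isPad(b[j]);

        // A separator on one side but not the other means the token
        // boundaries differ (e.g. "z6.s, z7.s" vs "z6.s,z7.s").
        if (padA != padB)
            return false;

        if (padA)
        {
            // Collapse each run of padding to a single separator.
            while (i < a.size() && isPad(a[i])) ++i;
            while (j < b.size() && isPad(b[j])) ++j;
            continue;
        }

        if (a[i] != b[j])
            return false;
        ++i;
        ++j;
    }

    // Trailing padding is ignored; anything else left over is a mismatch.
    while (i < a.size() && isPad(a[i])) ++i;
    while (j < b.size() && isPad(b[j])) ++j;
    return i == a.size() && j == b.size();
}
```

For example, `sameIgnoringPadding("smlalb        z0.h, z1.b, z2.b", "smlalb  z0.h, z1.b, z2.b")` is true. Calling it once per pair of corresponding lines, rather than concatenating each listing into one string, keeps any mismatch tied to a single instruction.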

cc @dotnet/arm64-contrib

amanasifkhalid added the arm-sve label (Work related to arm64 SVE/SVE2 support) Feb 20, 2024
dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Feb 20, 2024
ghost assigned amanasifkhalid Feb 20, 2024

ghost commented Feb 20, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels: area-CodeGen-coreclr, arch-arm64-sve
Milestone: -

TIHan (Contributor) commented Feb 20, 2024

cc @a74nh

case IF_SVE_EN_3A: // ........xx.mmmmm ......nnnnnddddd -- SVE2 saturating multiply-add interleaved long
case IF_SVE_EO_3A: // ........xx.mmmmm ......nnnnnddddd -- SVE2 saturating multiply-add long
case IF_SVE_EV_3A: // ........xx.mmmmm ......nnnnnddddd -- SVE integer clamp
case IF_SVE_EX_3A: // ........xx.mmmmm ......nnnnnddddd -- SVE permute vector elements (quadwords)
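The comments on these case labels spell out one shared field layout, `........xx.mmmmm ......nnnnnddddd`. As a rough illustration of what that layout means, here is a sketch that packs the size field and the three register numbers into a 32-bit word. `packSve3Reg` and the `field*` extractors are hypothetical names, not the JIT's emitter API. The fixed opcode bits (the `.` positions) vary per instruction and are passed in by the caller rather than reproduced here, and which `xx` value selects which element size depends on the individual instruction, so treat the size argument as a raw field value.

```cpp
#include <cstdint>

// Field positions from the layout in the case-label comments:
//   ........xx.mmmmm ......nnnnnddddd
constexpr unsigned kSizeShift = 22; // xx    : bits 23-22
constexpr unsigned kZmShift = 16;   // mmmmm : bits 20-16
constexpr unsigned kZnShift = 5;    // nnnnn : bits 9-5
                                    // ddddd : bits 4-0 (no shift)

constexpr std::uint32_t kSizeMask = 0x3u;
constexpr std::uint32_t kRegMask = 0x1Fu;

// Hypothetical helper, not the JIT's emitter API. ORs the size field and the
// Zd/Zn/Zm register numbers into `base`, which holds the instruction's fixed
// bits (the '.' positions). Inputs are masked to their field widths here;
// a real emitter would more likely assert on out-of-range values.
constexpr std::uint32_t packSve3Reg(std::uint32_t base, std::uint32_t size,
                                    std::uint32_t zd, std::uint32_t zn,
                                    std::uint32_t zm)
{
    return base | ((size & kSizeMask) << kSizeShift) |
           ((zm & kRegMask) << kZmShift) | ((zn & kRegMask) << kZnShift) |
           (zd & kRegMask);
}

// Extractors for checking a packed word field by field.
constexpr std::uint32_t fieldSize(std::uint32_t w) { return (w >> kSizeShift) & kSizeMask; }
constexpr std::uint32_t fieldZm(std::uint32_t w) { return (w >> kZmShift) & kRegMask; }
constexpr std::uint32_t fieldZn(std::uint32_t w) { return (w >> kZnShift) & kRegMask; }
constexpr std::uint32_t fieldZd(std::uint32_t w) { return w & kRegMask; }
```

For example, `packSve3Reg(0, 3, 9, 10, 11)` places Zd=9, Zn=10, Zm=11 and size=0b11, yielding 0x00CB0149; OR-ing in the correct fixed bits for a given instruction is the remaining step. Because all four formats share these positions, a single packing path can serve every case label in the group.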
Contributor:

It's nice that you didn't have to add any additional encodings.

amanasifkhalid (Member, Author):

Yeah, these ones were pretty slick; the next batch should be similarly easy.

TIHan (Contributor) left a comment:

LGTM!

ryujit-bot:

Diff results for #98722

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (0.00% to +0.01%)
Collection                               PDIFF
libraries.pmi.linux.arm64.checked.mch    +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection                               PDIFF
libraries.pmi.osx.arm64.checked.mch      -0.01%

a74nh (Contributor) left a comment:

LGTM. These all follow existing formats and styles.

amanasifkhalid merged commit 52e1858 into dotnet:main Feb 21, 2024
129 checks passed
amanasifkhalid deleted the sve-br-3a branch February 21, 2024 13:53
github-actions bot locked and limited conversation to collaborators Mar 23, 2024