JIT ARM64-SVE: Add AW_2A to AZ_2A, BM_1A, BN_1A #99211

Merged: 8 commits, Mar 4, 2024

amanasifkhalid
Member

Part of #94549. Adds the following encodings:

  • SVE_AW_2A
  • SVE_AX_1A
  • SVE_AY_2A
  • SVE_AZ_2A
  • SVE_BM_1A
  • SVE_BN_1A

cstool output:

xar   z0.b, z0.b, z1.b, #1
xar   z2.b, z2.b, z3.b, #8
xar   z4.h, z4.h, z5.h, #2
xar   z6.h, z6.h, z7.h, #16
xar   z8.s, z8.s, z9.s, #3
xar   z10.s, z10.s, z11.s, #32
xar   z12.d, z12.d, z13.d, #4
xar   z14.d, z14.d, z15.d, #64
index z0.b, #-0x10, #0xF
index z1.h, #0xF, #-0x10
index z2.s, #0, #0
index z3.d, #-5, #5
index z0.b, #-0x10, w0
index z1.h, #0, w1
index z2.s, #5, w2
index z3.d, #10, x3
index z4.b, #-0x10, wzr
index z5.d, #15, xzr
index z0.b, w0, #-0x10
index z1.h, w1, #0
index z2.s, w2, #5
index z3.d, x3, #10
index z4.b, wzr, #-0x10
index z5.d, xzr, #15
decb  x0, pow2
decd  x1, vl16, mul #3
dech  x2, vl32, mul #5
decw  x3, vl64, mul #7
incb  x4, vl128, mul #9
incd  x5, mul3, mul #10
inch  x6, mul4, mul #13
incw  x7, all, mul #16
decd  z0.d, pow2
dech  z1.h, vl2, mul #2
decw  z2.s, vl3, mul #4
incd  z3.d, vl4, mul #8
inch  z4.h, vl5, mul #12
incw  z5.s, vl6, mul #16

JitDisasm output:

xar     z0.b, z0.b, z1.b, #1
xar     z2.b, z2.b, z3.b, #8
xar     z4.h, z4.h, z5.h, #2
xar     z6.h, z6.h, z7.h, #16
xar     z8.s, z8.s, z9.s, #3
xar     z10.s, z10.s, z11.s, #32
xar     z12.d, z12.d, z13.d, #4
xar     z14.d, z14.d, z15.d, #64
index   z0.b, #-16, #15
index   z1.h, #15, #-16
index   z2.s, #0, #0
index   z3.d, #-5, #5
index   z0.b, #-16, w0
index   z1.h, #0, w1
index   z2.s, #5, w2
index   z3.d, #10, x3
index   z4.b, #-16, wzr
index   z5.d, #15, xzr
index   z0.b, w0, #-16
index   z1.h, w1, #0
index   z2.s, w2, #5
index   z3.d, x3, #10
index   z4.b, wzr, #-16
index   z5.d, xzr, #15
decb    x0, pow2
decd    x1, vl16, mul #3
dech    x2, vl32, mul #5
decw    x3, vl64, mul #7
incb    x4, vl128, mul #9
incd    x5, mul3, mul #10
inch    x6, mul4, mul #13
incw    x7, all, mul #16
decd    z0.d, pow2
dech    z1.h, vl2, mul #2
decw    z2.s, vl3, mul #4
incd    z3.d, vl4, mul #8
inch    z4.h, vl5, mul #12
incw    z5.s, vl6, mul #16
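For reviewers less familiar with these instructions, the following C++ sketch models what the xar and index lines above compute per element. It reflects our reading of the Arm SVE/SVE2 semantics and is not code from the JIT or its tests. The helper names `xarElement` and `indexModel` are hypothetical, and the element count passed to `indexModel` stands in for the vector length, which SVE leaves implementation-defined. Note also that cstool prints the INDEX immediates in hex (`#-0x10`, `#0xF`) while JitDisasm prints the same values in decimal (`#-16`, `#15`); both are the signed 5-bit range -16..15.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// XAR on one element: XOR the two inputs, then rotate right by `rot`, where
// 1 <= rot <= element width in bits (#1..#8 for .b up to #1..#64 for .d).
template <typename T>
T xarElement(T a, T b, unsigned rot)
{
    static_assert(std::is_unsigned<T>::value, "model lanes as unsigned integers");
    constexpr unsigned bits = sizeof(T) * 8;
    assert(rot >= 1 && rot <= bits);
    T x = static_cast<T>(a ^ b);
    rot %= bits; // rotating by the full element width leaves the value unchanged
    if (rot == 0)
    {
        return x;
    }
    return static_cast<T>((x >> rot) | (x << (bits - rot)));
}

// INDEX: lane i = base + i * step, wrapping modulo the element width.
// base/step are simm5 immediates (-16..15) or register values; numElems is
// VL / element width and is an assumed parameter in this sketch.
template <typename T>
std::vector<T> indexModel(int64_t base, int64_t step, std::size_t numElems)
{
    static_assert(std::is_unsigned<T>::value, "model lanes as unsigned integers");
    std::vector<T> lanes(numElems);
    for (std::size_t i = 0; i < numElems; ++i)
    {
        // Unsigned 64-bit arithmetic wraps; the cast keeps the low element bits.
        uint64_t v = static_cast<uint64_t>(base) + static_cast<uint64_t>(i) * static_cast<uint64_t>(step);
        lanes[i] = static_cast<T>(v);
    }
    return lanes;
}
```

Under this model, with a 128-bit vector (16 byte lanes), `index z0.b, #-16, #15` yields lanes 240, 255, 14, and so on, because each byte lane wraps modulo 256.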

cc @dotnet/arm64-contrib
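The last block of both listings exercises the inc/dec-by-pattern forms. As a hedged sketch of their semantics (our reading of the Arm architecture reference, not the JIT's code), the amount added or subtracted is the number of elements the pattern selects for the mnemonic's element size at the current vector length, multiplied by the `mul` immediate (1 to 16, default 1). `patternCount`, `incScalarSketch`, `decScalarSketch` and the `vlBits` parameter are hypothetical names introduced for illustration. The vector forms (e.g. `incd z3.d, vl4, mul #8`) apply the same count to every element.

```cpp
#include <cassert>
#include <cstdint>

// Named predicate-constraint patterns. The unnamed #uimm5 encodings (which
// select no elements) are not modelled.
enum class Pattern
{
    Pow2, Vl1, Vl2, Vl3, Vl4, Vl5, Vl6, Vl7, Vl8,
    Vl16, Vl32, Vl64, Vl128, Vl256, Mul4, Mul3, All
};

// Number of elements selected by `pat` when a vector holds `n` elements.
uint64_t patternCount(Pattern pat, uint64_t n)
{
    // VLk selects exactly k elements if the vector has at least k, otherwise none.
    auto fixed = [n](uint64_t k) { return n >= k ? k : uint64_t{0}; };
    switch (pat)
    {
        case Pattern::Pow2:
        {
            uint64_t p = 0;
            for (uint64_t q = 1; q <= n; q *= 2)
            {
                p = q; // largest power of two <= n
            }
            return p;
        }
        case Pattern::Vl1:   return fixed(1);
        case Pattern::Vl2:   return fixed(2);
        case Pattern::Vl3:   return fixed(3);
        case Pattern::Vl4:   return fixed(4);
        case Pattern::Vl5:   return fixed(5);
        case Pattern::Vl6:   return fixed(6);
        case Pattern::Vl7:   return fixed(7);
        case Pattern::Vl8:   return fixed(8);
        case Pattern::Vl16:  return fixed(16);
        case Pattern::Vl32:  return fixed(32);
        case Pattern::Vl64:  return fixed(64);
        case Pattern::Vl128: return fixed(128);
        case Pattern::Vl256: return fixed(256);
        case Pattern::Mul4:  return n - n % 4;
        case Pattern::Mul3:  return n - n % 3;
        case Pattern::All:   return n;
    }
    return 0;
}

// Scalar forms, e.g. `incd x5, mul3, mul #10`: elemBits is 8/16/32/64 for the
// b/h/w/d mnemonics and `mul` is 1..16. The arithmetic wraps (no saturation).
uint64_t incScalarSketch(uint64_t x, Pattern pat, unsigned mul, unsigned vlBits, unsigned elemBits)
{
    assert(mul >= 1 && mul <= 16);
    return x + patternCount(pat, vlBits / elemBits) * mul;
}

uint64_t decScalarSketch(uint64_t x, Pattern pat, unsigned mul, unsigned vlBits, unsigned elemBits)
{
    assert(mul >= 1 && mul <= 16);
    return x - patternCount(pat, vlBits / elemBits) * mul;
}
```

At a 256-bit vector length, for instance, this model gives `decb x0, pow2` a decrement of 32 and leaves the register unchanged for `decw x3, vl64, mul #7`, since only 8 words fit and VL64 therefore selects nothing.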

@ghost assigned amanasifkhalid Mar 3, 2024
@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Mar 3, 2024
@ghost commented Mar 3, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels: area-CodeGen-coreclr
Milestone: -

@@ -9300,6 +9378,7 @@ void emitter::emitIns_R_I_I(instruction ins,

id->idIns(ins);
id->idInsFmt(fmt);
+id->idInsOpt(opt);
Contributor

Interesting that we weren't setting the opt before. Makes sense that we need to. I assume it doesn't cause any problems for the existing instructions.

Member Author

Yeah, the existing instructions don't seem to check opt anywhere else -- not even to assert that it's INS_OPTS_NONE. That's the default value in emitIns_R_I_I, so I imagine always initializing idInsOpt is fine; I didn't notice any issues.

@@ -24094,8 +24345,9 @@ BYTE* emitter::emitOutput_InstrSve(BYTE* dst, instrDesc* id)
dst += emitOutput_Instr(dst, code);
break;

// Immediate and patterm to general purpose.
Contributor

"patterm" xD

@@ -24628,7 +24935,7 @@ BYTE* emitter::emitOutput_InstrSve(BYTE* dst, instrDesc* id)
code = emitInsCodeSve(ins, fmt);
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd
code |= insEncodeReg_V_9_to_5(id->idReg2()); // nnnnn
-code |= insEncodeSveElemsize_tszh_22_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
+code |= insEncodeSveElemsize_tszh_23_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
Contributor

The existing uses of insEncodeSveElemsize_tszh_22_tszl_20_to_19 only assumed 1/2/4-byte sizes, not 8, so we should probably put an assert before each of them, e.g. assert(optGetSveElemsize(id->idInsOpt()) != EA_8BYTE). Alternatively, we could have two tszh:tszl functions: one that allows 1/2/4-byte sizes and another that allows 1/2/4/8-byte sizes.

Member Author

I prefer the assert approach; I'll add those in.

Contributor

+1 for asserts. The size restriction is (usually) a property of the instruction operation, whereas the helper function only cares about bit encodings.
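To make the tszh:tszl discussion concrete, here is a small C++ sketch of what the two kinds of encoder compute. The names are hypothetical and this is not the JIT's implementation. The element size is a one-hot marker in tsz, split into tszh (bit 22, or bits 23:22) and tszl (bits 20:19); the lower tsz bits, left zero here, carry part of the immediate. Because the 4-bit variant produces the same bits as the 3-bit one for B/H/S, the only new behavior is bit 23 for D, and the B/H/S-only restriction belongs to the instruction, which is why the check sits at the call site.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the JIT's element-size attribute. The byte counts
// double as the one-hot tsz value: B=0b0001, H=0b0010, S=0b0100, D=0b1000.
enum class ElemSize : uint32_t { B = 1, H = 2, S = 4, D = 8 };

// Modelled on the tszh_22 variant: tszh is a single bit (22), so tsz has
// only 3 bits and 8-byte elements cannot be encoded at all.
uint32_t encodeTszh22Tszl20To19Sketch(ElemSize size)
{
    assert(size != ElemSize::D);
    uint32_t tsz = static_cast<uint32_t>(size);
    return (((tsz >> 2) & 0b1u) << 22) | ((tsz & 0b11u) << 19);
}

// Modelled on the tszh_23 variant: tszh spans bits 23:22, so tsz has 4 bits
// and covers B, H, S and D. Only D sets bit 23.
uint32_t encodeTszh23Tszl20To19Sketch(ElemSize size)
{
    uint32_t tsz = static_cast<uint32_t>(size);
    return (((tsz >> 2) & 0b11u) << 22) | ((tsz & 0b11u) << 19);
}

// Operation-level restriction at the call site: a hypothetical encoder for an
// instruction that only accepts B/H/S elements.
uint32_t encodeElemsizeForBhsOnlyInstruction(ElemSize size)
{
    assert(size != ElemSize::D && "this instruction has no 8-byte form");
    return encodeTszh23Tszl20To19Sketch(size);
}
```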

@@ -24638,7 +24945,7 @@ BYTE* emitter::emitOutput_InstrSve(BYTE* dst, instrDesc* id)
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd
code |= insEncodeReg_V_9_to_5(id->idReg2()); // nnnnn
code |= insEncodeUimm5_20_to_16(emitGetInsSC(id)); // iii
-code |= insEncodeSveElemsize_tszh_22_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
+code |= insEncodeSveElemsize_tszh_23_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
Contributor

Same as comment above regarding tszh:tszl.

@@ -24648,7 +24955,7 @@ BYTE* emitter::emitOutput_InstrSve(BYTE* dst, instrDesc* id)
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd
code |= insEncodeReg_V_9_to_5(id->idReg2()); // nnnnn
code |= insEncodeUimm5_20_to_16(insGetImmDiff(emitGetInsSC(id), id->idInsOpt())); // iii
-code |= insEncodeSveElemsize_tszh_22_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
+code |= insEncodeSveElemsize_tszh_23_tszl_20_to_19(optGetSveElemsize(id->idInsOpt())); // xx
Contributor

Same as comment above regarding tszh:tszl.

@@ -21738,6 +21922,10 @@ void emitter::emitIns_Call(EmitCallType callType,
assert(isValidUimm5From1(imm));
return (32 - imm);

+case INS_OPTS_SCALABLE_D:
Contributor

Do the existing uses of insGetImmDiff assume only B, H, and S? If so, we should either put asserts before the calls to insGetImmDiff to ensure we don't accidentally pass D, or just have two different functions. This is similar to my suggestion for insEncodeSveElemsize_tszh_22_tszl_20_to_19.
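The same concern applies to the immediate. Assuming the decode used for xar is `rot = 2 * esize - UInt(tsz:imm3)` (our reading of the Arm architecture reference), the field value is the element-size marker plus `esize - rot`, which generalizes the visible `return (32 - imm);` case of insGetImmDiff. The sketch below uses hypothetical names and is not the JIT's implementation. It shows that a D rotate needs a 6-bit difference whose top bit lands in tszh (bit 22), so a helper written for B/H/S cannot silently be handed D.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counterpart of insGetImmDiff: returns esize - rot, the part of
// tsz:imm3 that sits below the element-size marker bit.
uint32_t immDiffSketch(uint32_t rot, uint32_t esizeBits)
{
    assert(esizeBits == 8 || esizeBits == 16 || esizeBits == 32 || esizeBits == 64);
    assert(rot >= 1 && rot <= esizeBits);
    return esizeBits - rot; // 0..7 for B, 0..15 for H, 0..31 for S, 0..63 for D
}

// Pack the whole 7-bit tsz:imm3 field (value 2*esize - rot) and scatter it
// into tszh (bits 23:22), tszl (bits 20:19) and imm3 (bits 18:16).
uint32_t encodeRotateImmSketch(uint32_t rot, uint32_t esizeBits)
{
    // esizeBits is a power of two larger than the diff, so OR equals addition.
    uint32_t field = esizeBits | immDiffSketch(rot, esizeBits);
    uint32_t imm3 = field & 0b111u;
    uint32_t tszl = (field >> 3) & 0b11u;
    uint32_t tszh = (field >> 5) & 0b11u;
    return (tszh << 22) | (tszl << 19) | (imm3 << 16);
}
```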

Contributor

@TIHan left a comment

LGTM, just some suggestions for insGetImmDiff and insEncodeSveElemsize_tszh_23_tszl_20_to_19 if they are applicable.

Contributor

@a74nh left a comment

Assuming all the other review comments are fixed up, LGTM.

@amanasifkhalid
Member Author

Thanks for the reviews!

@amanasifkhalid merged commit 962d15c into dotnet:main Mar 4, 2024
129 checks passed
@amanasifkhalid deleted the aw-2a branch March 4, 2024 14:40
@github-actions bot locked and limited conversation to collaborators Apr 4, 2024
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
3 participants