-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Arm64] Vector Load/Store structure instructions #33461
[Arm64] Vector Load/Store structure instructions #33461
Conversation
* Load/Store multiple structures base register * Load single structure and replicate base register
* Load/Store multiple structures post-indexed by an immediate * Load/Store single structure base register
* Load single structure and replicate post-indexed by an immediate
* Load/Store multiple structures post-indexed by a register * Load single structure and replicate post-indexed by a register
* Load/Store single structure post-indexed by a register
* Load/Store single structure post-indexed by an immediate
c0754ee
to
7ecdcc6
Compare
7ecdcc6
to
1e482c5
Compare
@dotnet/jit-contrib PTAL |
This looks reasonable - is the plan to put these in now, and then defer the associated intrinsics that require the register allocator to support allocating contiguous registers? |
Yes, LD1/ST1 (multple structures) will used via LoadVector64/LoadVector128/Store and there is a proposal how to utilize LD1R #33490 But this is almost it for now. However, we should decide how we are going to expose contiguous registers in API soon. |
We can bring this up in the upcoming API review next Tuesday (the 17th). |
@dotnet/jit-contrib ping |
src/coreclr/src/jit/emitarm64.cpp
Outdated
// | ||
/*static*/ unsigned emitter::insGetLoadStoreVectorSelem(instruction ins) | ||
{ | ||
unsigned selem = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we call this elementCount
or structureElements
or something a bit more descriptive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same goes for the method name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I can rename it to insGetLoadStoreVectorStructureElements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After more carefull reading Arm specs I don't think it should be called insGetLoadStoreVectorStructureElements
It should be insGetLoadStoreRegisterListSize
For LD1 (2 registers) - number of structure elements = 1 since the instruction loads 2 single-element structures to 2 registers.
And this function reflects how many registers ins
operates on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's sounds good. I just don't think Selem
is very self explanatory.
* Display a register | ||
*/ | ||
//------------------------------------------------------------------------ | ||
// emitDispReg: Display a general-purpose register name or SIMD and floating-point scalar register name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If we are updating these, we should likely do it fully and also document the parameters, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, just a comment about not liking the selem
name.
It also would have been nice to separate out the refactorings into a separate PR to help with review.
Just realized, what about C# tests for the functions? |
@tannergooding this part is only implementing instructions in the backend - there is a separate wip PR for Store intrinsic #33535 - where I have c# tests |
Thanks for the review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With comment changes
elemsize = optGetElemsize(id->idInsOpt()); | ||
imm = emitGetInsSC(id); | ||
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn | ||
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have the xx
fields as well:
.Q.............. xx.Sssnnnnnttttt Vt[] Rn
emitDispReg(id->idReg1(), emitInsTargetRegSize(id), true); | ||
emitDispAddrRI(id->idReg2(), id->idInsOpt(), 0); | ||
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn | ||
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have the xx fields as well:
.Q.............. xx.Sssnnnnnttttt Vt[] Rn
case IF_LS_2D: // LS_2D .Q.............. ....ssnnnnnttttt Vt Rn | ||
case IF_LS_2E: // LS_2E .Q.............. ....ssnnnnnttttt Vt Rn | ||
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn | ||
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two IF_LS_2F and IF_LS_2G should have the xx fields as well:
.Q.............. xx.Sssnnnnnttttt Vt[] Rn
case IF_LS_2D: // LS_2D .Q.............. ....ssnnnnnttttt Vt Rn | ||
case IF_LS_2E: // LS_2E .Q.............. ....ssnnnnnttttt Vt Rn | ||
case IF_LS_2F: // LS_2F .Q.............. ...Sssnnnnnttttt Vt[] Rn | ||
case IF_LS_2G: // LS_2G .Q.............. ...Sssnnnnnttttt Vt[] Rn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two IF_LS_2F and IF_LS_2G should have the xx fields as well:
.Q.............. xx.Sssnnnnnttttt Vt[] Rn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will make these changes in #33535
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a few nits
@@ -39,7 +39,10 @@ void emitDispLSExtendOpts(insOpts opt); | |||
void emitDispReg(regNumber reg, emitAttr attr, bool addComma); | |||
void emitDispVectorReg(regNumber reg, insOpts opt, bool addComma); | |||
void emitDispVectorRegIndex(regNumber reg, emitAttr elemsize, ssize_t index, bool addComma); | |||
void emitDispVectorRegList(regNumber firstReg, unsigned listSize, insOpts opt, bool addComma); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: listSize => listLength? (same below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will follow up in #33535
void emitDispArrangement(insOpts opt); | ||
void emitDispElemsize(emitAttr elemsize); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: capitalize "size", emitDispElemsize
=> emitDispElemSize
?
@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion); | |||
// for an element of size 'elemsize' in a vector register of size 'datasize' | |||
static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index); | |||
|
|||
// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers | |||
// the instruction loads to/store from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "store from" => "stores from"
@@ -445,6 +448,10 @@ static emitAttr optGetSrcsize(insOpts conversion); | |||
// for an element of size 'elemsize' in a vector register of size 'datasize' | |||
static bool isValidVectorIndex(emitAttr datasize, emitAttr elemsize, ssize_t index); | |||
|
|||
// For a given Load/Store Vector instruction 'ins' returns a number of consecutive SIMD registers | |||
// the instruction loads to/store from. | |||
static unsigned insGetLoadStoreRegisterListSize(instruction ins); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: ListSize => ListLength?
// ld2 {Vt,Vt2},[Xn] LS_2D 0Q00110001000000 1000ssnnnnnttttt 0C40 8000 base register | ||
// ld2 {Vt,Vt2},[Xn],Xm LS_3F 0Q001100110mmmmm 1000ssnnnnnttttt 0CC0 8000 post-indexed by a register | ||
// ld2 {Vt,Vt2},[Xn],#imm LS_2E 0Q001100110mmmmm 1000ssnnnnnttttt 0CDF 8000 post-indexed by an immediate | ||
// ld2 {Vt,Vt2}[],[Xn] LS_2F 0Q00110101100000 xx0Sssnnnnnttttt 0D60 0000 base register |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it would be helpful to include a comment for each of these that would lead you to the correct reference manual page, e.g., "C7.2.174 LD2 (single structure)" (hopefully ARM doesn't renumber these... but at least the name would be the same)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's a good idea - I will add
@@ -378,6 +436,56 @@ INST3(mvn, "mvn", 0, 0, IF_EN3I, 0x2A2003E0, 0x2A2003E0, 0x2E205800) | |||
// mvn Rd,(Rm,shk,imm) DR_2F X0101010sh1mmmmm iiiiii11111ddddd 2A20 03E0 Rm {LSL,LSR,ASR} imm(0-63) | |||
// mvn Vd,Vn DV_2M 0Q10111000100000 010110nnnnnddddd 2E20 5800 Vd,Vn (vector) | |||
|
|||
// enum name FP LD/ST LS_2D LS_3F LS_2E | |||
INST3(ld1_2regs,"ld1", 0,LD, IF_EN3J, 0x0C40A000, 0x0CC0A000, 0x0CDFA000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you couldn't figure out a way to merge these (ld1_2regs, ld1_3regs, ld1_4regs, st1_2regs, etc.) with their respective ld1/st1 definitions, above? Or create a single ld1_multiregs/st1_multiregs that is distinguished in code with insOpts or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have considered different approaches while adding these instructions.
Since multiple registers variants exist only for a small number of instructions, namely ld[1234]
, ld[1234]r
,st[1234]
,tbx
and tbl
and only a subset of those require a way to specify a number of registers (since it's implied for ld[1234]r
, ld[234]
, st[234]
) I decided not to disrupt other emitter parts with adding number of registers to insOpts
.
@@ -5219,6 +5219,726 @@ void CodeGen::genArm64EmitterUnitTests() | |||
|
|||
#endif // ALL_ARM64_EMITTER_UNIT_TESTS | |||
|
|||
#ifdef ALL_ARM64_EMITTER_UNIT_TESTS | |||
// | |||
// Loads to /Stores from one, two, three, or four SIMD&FP registers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Loads to /Stores" => "Loads to and Stores from"? (same below)
I thought "/Stores" was a typo so I was confused reading it.
// ins - A Load/Store Vector instruction (e.g. ld1 (2 registers), ld1r, st1). | ||
// | ||
// Return value: | ||
// A number of consecutive SIMD and floating-point registers the instruction loads to/store from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: "store from"=>"stores from"
* Display an vector register index suffix | ||
*/ | ||
//------------------------------------------------------------------------ | ||
// emitDispVectorRegIndex: Display a SIMD vector register name with element index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be clearer named emitDispVectorRegWithIndex
? or emitDispVectorRegWithSizeAndIndex
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "official" name for this is SIMD vector element name (see C1.2.5 Register names) so we might also rename it to emitDispVectorElementName
//------------------------------------------------------------------------ | ||
// emitDispVectorElemList: Display a SIMD vector element list | ||
// | ||
void emitter::emitDispVectorElemList( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this be clearer as emitDispVectorRegListWithSize
? As named, I wasn't sure the difference between emitDispVectorRegList
and emitDispVectorElemList
-- namely, what is a "VectorElem"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These terminology from C1.2.5 Register names - I don't have a strong preference how to name the function - I tried to follow what the Arm docs said.
The two Linux sigsegv failures are due to an issue reported in #33562 (comment). An arm64 leg passes, so merging. |
This adds support in the JIT emitter for Vector Load/Store structure instructions:
in the following addressing modes:
I supported multiple structures as well as single structure variants.
This also adds support in JitDump for printing
ld1 {v5.16b, v6.16b, v7.16b, v8.16b}, [x9]
st1 {v0.b}[3], [x1],#1
PerfScore numbers are according to Arm Cortex-A55 Software Optimization Guide
I validated the correctness of the instructions' encodings by comparing JitDump with WinDbg u command output. I attached the outputs of both:
jitDump.txt
windbg-u.txt
Examples of the instruction usages can be found in codegenarm64.cpp in Arm64 emitter unit tests collection.
This is needed for #24771