Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm #71
Thanks for following up! This will also need a corresponding encoding in the opcode table here. I suppose the least breaking way to add, and to maintain a logical grouping of opcode numbers is to introduce two new opcode numbers at the end for |
I was originally thinking that |
I agree that I would still suggest to keep the
|
Wikipedia distinguishes "permutations" from "n-tuples" (permutations with repetition):
To convey that this is a single-vector dynamic shuffle, in Rust, we ended up calling this operation |
A downside to including 2-input dynamic shuffle can be constructed as follows: |
Indeed, this is a valid emulation, but if hardware support for the |
One alternative naming scheme going forward is that If this seems reasonable then I can update this PR and the spec to suggest that this is the convention. |
IMO And maybe "immediate" is a more precise distinction than "dynamic? So something like: |
I'd like to figure out how to stop bikeshedding this. To me it doesn't truly matter whether to call this permute or shuffle, which suffix to add, and whether to rename the existing instruction (consistency vs churn) - what does matter is that this instruction lands. How do we proceed? |
Then you shouldn't have changed the name that was already chosen and agreed on (I don't say it was perfect). That being said, "bikeshedding" is important before a release (so before the MVP), as after it, names will be forever, and inconsistent names are frustrating and error-prone. In the end, I don't care which name is chosen, as long as it makes sense and is consistent. |
The initial version of this PR said |
To be precise, you were not asked to change it. But the entire community was asked if it would make sense to change (and my opinion would have been: "no it does not make sense to change").
I don't blame anybody, and I don't really care about the outcome (this instruction will be added whatever happens). I just want to explain why it ended up like this. |
I restored the proposal to its original form, since it seems like |
What do people think of |
I'm also fine keeping "permute" for a single vector and "shuffle" for two vectors. I think that gives a fair intuition of what they do. |
Note that this proposal is for a single-vector shuffle; two-vector shuffle isn’t as widely supported or applicable. So names need to be distinct and explicitly highlight the difference (either via a different name or a numeric suffix) |
Right, thanks for the clarification. I'm referring to the existing |
FWIW my thoughts on this are that for a programming language (like Rust, C++, etc.) the names by which these instructions are exposed at the language level matter because those are the names that the users interface with. Whether the C++ compiler lowers this to swizzle, shuffle, permute, or foobar isn't that important IMO, and different architectures use different names for these already, so the compiler is going to be lowering to some of these names independently of what one chooses to do here. Choosing a name similar to one of the more mainstream architectures like x86 or arm is probably a better trade-off than inventing new names just to be "technically correct", since that allows transferring knowledge from the mainstream assembly languages to wasm.
The compiler implementation has to support the variable general shuffle function that is available now to support variables as an input to the shuffle anyway. A change to permute seems like a good idea to me. I like the idea of a naming convention of permute var and a permute imm with two input vectors as well. |
I wouldn't worry too much about names at this point. SIMD is still relatively early, and there will be plenty of time for bikeshedding names later (see |
@tlively Right, I guess what I meant that in the context of this discussion if we were to rename existing This does seem preferable - if the current shuffle name isn't set in stone then maybe we should call both |
As the opcode number of the shuffle instruction is being changed, and any implementations will need to be updated anyway, changing the name to be more indicative makes sense. I like @AndrewScheidecker's suggestion above of using just the _imm suffix to distinguish, because that's more precise about what the operation is doing.
Sounds good - I was thinking that renaming after changing opcode might be beneficial to reduce confusion anyway. |
I've updated the proposal to rename both shuffle instructions as follows:
In the future this naming scheme will allow us to ship more variants without friction, such as v8x16.shuffle2 (non-immediate version with out-of-range handling, this is supported by several architectures and can be trivially emulated using shuffle1 on others), v8x16.shuffle1_imm (this is a weaker version of v8x16.shuffle2_imm, which could be encoded using 8 bytes for the mask instead of 16 bytes for shuffle2_imm, which could help code size) or any other variants. Hopefully this is a reasonable compromise given that the names can be changed in the future before the spec is finalized. |
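The comment above says a non-immediate v8x16.shuffle2 could be emulated using shuffle1 but does not give the construction. One possible construction is sketched below in Python as a scalar model. It assumes that shuffle2 would also return 0 for indices outside `[0, 31]` and that index arithmetic wraps at 8 bits; the function names are hypothetical and chosen only for this illustration.

```python
LANES = 16

def shuffle1(a, s):
    """Scalar model of v8x16.shuffle1: out-of-range indices select 0."""
    return [a[i] if i < LANES else 0 for i in s]

def shuffle2_reference(a, b, s):
    """Assumed semantics of a dynamic two-vector shuffle (not in the proposal)."""
    out = []
    for i in s:
        if i < LANES:
            out.append(a[i])
        elif i < 2 * LANES:
            out.append(b[i - LANES])
        else:
            out.append(0)  # assumption: out-of-range selects 0
    return out

def shuffle2_via_shuffle1(a, b, s):
    """Emulate the two-vector shuffle with two shuffle1 calls and a bitwise OR."""
    from_a = shuffle1(a, s)
    # Wrapping 8-bit subtract: indices 0..15 become 240..255 (out of range),
    # indices 16..31 become 0..15, and indices >= 32 stay out of range.
    shifted = [(i - LANES) & 0xFF for i in s]
    from_b = shuffle1(b, shifted)
    # At most one of the two partial results is nonzero in each lane.
    return [x | y for x, y in zip(from_a, from_b)]
```

For any index byte, at most one of the two shuffle1 results is in range, so the OR simply picks whichever lane was selected.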
This change adds a variable shuffle instruction to the SIMD proposal. When indices are out of range, the result is specified as 0 for each lane. This matches hardware behavior on ARM and RISCV architectures. On x86_64 and MIPS, the hardware provides instructions that can select 0 when the high bit is set to 1 (x86_64) or any of the two high bits are set to 1 (MIPS). On these architectures, the backend is expected to emit a pair of instructions, saturating add (saturate(x + (128 - 16)) for x86_64) and permute, to emulate the proposed behavior. To distinguish variable shuffles from immediate shuffles, the existing v8x16.shuffle instruction is renamed to v8x16.shuffle2_imm to be explicit about the fact that it shuffles two vectors with an immediate argument. This naming scheme allows for adding variants like v8x16.shuffle2 and v8x16.shuffle1_imm in the future. Fixes #68. Contributes to #24. Fixes #11.
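To make the x86_64 emulation described above concrete, here is a minimal scalar sketch in Python. It assumes the saturating add is an unsigned byte add, and it models the hardware permute as "high bit set selects 0, otherwise the low 4 bits index the vector", as the description states. All function names are hypothetical; this is an illustration of the arithmetic, not backend code.

```python
LANES = 16

def shuffle1(a, s):
    """Proposed semantics: lane i of the result is a[s[i]], or 0 if s[i] >= 16."""
    return [a[i] if i < LANES else 0 for i in s]

def saturating_add_u8(x, k):
    """Unsigned 8-bit saturating add (assumed flavor of the saturating add)."""
    return min(x + k, 255)

def x86_style_permute(a, s):
    """Model of the x86_64 byte permute: high bit set -> 0, else low 4 bits index."""
    return [0 if i & 0x80 else a[i & 0x0F] for i in s]

def shuffle1_x86(a, s):
    """Emulation from the description: saturate(x + (128 - 16)), then permute."""
    adjusted = [saturating_add_u8(i, 128 - 16) for i in s]
    return x86_style_permute(a, adjusted)
```

Indices 0..15 map to 112..127, which keep the high bit clear and preserve the low 4 bits; every index from 16 upward maps to 128..255 (saturating at 255), so the high bit is set and the lane becomes 0.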
I think merging this into the proposal should not be blocked on resolving the names. The opcodes and semantics are the important part, and I haven't heard any pushback on those. So I think if @dtig and @arunetm sign off on this, then it is good to be merged. Back on the name bikeshedding, I'm not a fan of having numbers in the instruction names (apart from the type prefix, of course). No other WebAssembly instruction includes numbers like that. This is why I prefer using I also like the idea of having suffixes to identify the selectors. I think Let's have a non-binding, informational vote! |
I agree that this is good to be merged, approving this change. As pointed out a couple of times on this thread, the SIMD proposal is still in its early stages, and we can continue to bikeshed the names without blocking including the opcode. There is broad consensus on the semantics of a dynamic shuffle that shuffles the contents of one vector using lanes of the second vector as indices, and it is well supported by hardware, and we have performance numbers from @zeux that point to the usefulness of including the shuffle. As to naming, I apologize I wasn't clear in my previous comment about using the _imm suffix to distinguish. My comment was only about using the _imm suffix, and not the inclusion of numbers in the opcode name. I agree with tlively@ that having numbers in opcode names is inconsistent with the rest of the WebAssembly opcodes; this can always be revisited in the future if/when including other shuffle variants makes sense.
minor nitpicks
-v8x16.shuffle i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
+v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
This is a bit confusing. These are not really `i5`s, but they are `i8`s restricted to the `i5` domain.
Maybe this should use `LaneIdx32`?
This is specifying text format so I believe i5 is accurate.
@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
indices in range `[16, 31]` select the `i - 16`-th element of `b`.
This should IMO also mention what happens with out-of-range indices (e.g. validation error) for consistency with the variable index shuffle.
My understanding was that LaneIdx32 implies that indices must be 0-31. The encoding for these isn't documented atm.
It is not strictly necessary to document these here. The definition of `LaneIdx32` states:
[...] Many have a limited valid range, and it is a validation error if the immediate operands are out of range.
ImmLaneIdx32: A byte with values in the range 0–31 identifying a lane.
but I think a "(note: out-of-range indices are a validation error)" might be helpful.
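As a rough illustration of the behavior discussed in this review thread, the Python sketch below models the immediate two-vector shuffle: indices in `[0, 15]` select from `a`, indices in `[16, 31]` select from `b`, and anything else is rejected up front. Raising `ValueError` stands in for a validation error, which in a real engine would happen at module validation time rather than at execution; the function names are hypothetical.

```python
def validate_lane_idx32(indices):
    """Reject immediates outside 0..31, mirroring the LaneIdx32 range rule."""
    if len(indices) != 16:
        raise ValueError("expected 16 lane indices")
    for i in indices:
        if not 0 <= i <= 31:
            raise ValueError(f"lane index {i} is out of range 0..31")

def shuffle2_imm(a, b, indices):
    """Scalar model of v8x16.shuffle2_imm over two 16-lane byte vectors."""
    validate_lane_idx32(indices)
    # Indices 0..15 pick from a; indices 16..31 pick lane i - 16 from b.
    return [a[i] if i < 16 else b[i - 16] for i in indices]
```

For instance, the index list `[0, 16, 1, 17, ...]` interleaves the low lanes of `a` and `b`, while an index of 32 never reaches the selection step.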
This is not the final spec text, so as long as the intended semantics and opcodes are clear it should be good to go.
If numbers are an issue, we can always spell them out: |
This change adds a dynamic permute instruction to the SIMD proposal.
The name "permute", as opposed to shuffle, is used to distinguish it from instructions that operate on two input vectors.
When indices are out of range, the result is specified as 0 for each
lane. This matches hardware behavior on ARM and RISCV architectures.
On x86_64 and MIPS, the hardware provides instructions that can select 0
when the high bit is set to 1 (x86_64) or any of the two high bits are
set to 1 (MIPS). On these architectures, the backend is expected to emit
a pair of instructions, saturating byte add (`saturate(x + (128 - 16))`
for x86_64) and permute, to emulate the proposed behavior.
Fixes #68.
Contributes to #24.
Fixes #11.