Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm #71

zeux · 2019-03-12T07:03:28Z

This change adds a dynamic permute instruction to SIMD proposal.

The name "permute", as opposed to shuffle, is used to distinguish it from
instructions that operate on two input vectors.

When indices are out of range, the result is specified as 0 for each
lane. This matches hardware behavior on ARM and RISCV architectures.

On x86_64 and MIPS, the hardware provides instructions that can select 0
when the high bit is set to 1 (x86_64) or any of the two high bits are
set to 1 (MIPS). On these architectures, the backend is expected to emit
a pair of instructions, saturating byte add (saturate(x + (128 - 16)) for
x86_64) and permute, to emulate the proposed behavior.

Fixes #68.
Contributes to #24.
Fixes #11.

dtig · 2019-03-12T07:21:33Z

Thanks for following up! This will also need a corresponding encoding in the opcode table here. I suppose the least breaking way to add, and to maintain a logical grouping of opcode numbers is to introduce two new opcode numbers at the end for v8x16.shuffle, and the permute instruction and leave 0x03 to be a reserved opcode. Any objections to calling this v8x16.permute instead of v8x16.permute_dyn? I think permute distinguishes sufficiently from the shuffle without the _dyn suffix, but I don't feel very strongly about this.

zeux · 2019-03-12T07:28:48Z

I was originally thinking that _dyn provides for a nice extension going forward to add permute but I now realize that immediate version of permute is redundant wrt existing shuffle modulo minor code size concerns. I'll rename to just "permute" and add the encoding entries to the end of the table as suggested, leaving a bit of a gap for existing floating point instructions similar to the integer instructions (gap around 0x35).

lemaitre · 2019-03-12T07:40:58Z

I agree that permute with immediate shuffling rule is redundant with shuffle and may never be added.

I would still suggest to keep the _dyn suffix.
Because we might end up adding a shuffle_dyn instruction, and it would be weird and inconsistent to have:

shuffle: 2 inputs, immediate shuffling rule
shuffle_dyn: 2 input, dynamic shuffling rule
permute: 1 input, dynamic shuffling rule

gnzlbg · 2019-03-12T07:45:10Z

Wikipedia distinguishes "permutations" from "n-tuples" (permutations with repetition):

Ordered arrangements of the elements of a set S of length n where repetition is allowed are called n-tuples, but have sometimes been referred to as permutations with repetition although they are not permutations in general.

To convey that this is a single-vector dynamic shuffle, in Rust, we ended up calling this operation shuffle1_dyn to distinguish it from "permutation" (without element repetition) and to leave the door open for adding a two-vector shuffle_dyn in the future.

zeux · 2019-03-12T07:45:58Z

A downside to including _dyn suffix is that in the current specification it looks out of place since all other instructions assume a specific encoding.

2-input dynamic shuffle can be constructed as follows:
or(permute(a, s), permute(b, s - 16))
I don't think the native version has hardware support that's as universal as permute although I haven't researched this fully. (edit I guess the missing archs are x64 and RISCV, seems like other archs support this)

lemaitre · 2019-03-12T07:53:45Z

2-input dynamic shuffle can be constructed as follows:
or(permute(a, s), permute(b, s - 16))
I don't think the native version has hardware support that's as universal as permute although I haven't researched this fully.

Indeed, this is a valid emulation, but if hardware support for the permute_dyn increases in the future, we might still want a dedicated instruction (like we have an instruction for f32x4.max which is easily emulateable).
In other words, keeping _dyn is an easy way to increase future-proofness of WASM without increasing complexity.

zeux · 2019-03-12T08:05:15Z

One alternative naming scheme going forward is that shuffle is a compile-time (immediate) construct and permute is a run-time (variable/dynamic) construct. So when/if we need a 2-input... something... we call it permute2. I looked at the naming conventions of different architectures and I guess they don't consistently assign any meaning to either permute or shuffle, e.g. SSSE3 actually calls the instruction discussed in this proposal "shuffle bytes", whereas AltiVec and AMD XOP have "permute" with 2 inputs.

If this seems reasonable then I can update this PR and the spec to suggest that this is the convention.

AndrewScheidecker · 2019-03-12T13:12:47Z

IMO shuffle vs permute is arbitrary, and shouldn't be used to distinguish any semantics. I'd just stick with shuffle, since it's what the proposal already uses, and distinguish the number of input vectors with shuffle1 vs shuffle2 (or maybe shuffle vs shuffle2).

And maybe "immediate" is a more precise distinction than "dynamic?

So something like:
v8x16.shuffle1_imm v8x16.shuffle1
v8x16.shuffle2_imm v8x16.shuffle2
where v8x16.shuffle2_imm corresponds to the current v8x16.shuffle.

zeux · 2019-03-12T14:06:50Z

I'd like to figure out how to stop bikeshedding this. To me it doesn't truly matter whether to call this permute or shuffle, which suffix to add, and whether to rename the existing instruction (consistency vs churn) - what does matter is that this instruction lands. How do we proceed?

lemaitre · 2019-03-12T14:36:42Z

I'd like to figure out how to stop bikeshedding this

Then you shouldn't have changed the name that was already chosen and agreed on (I don't say it was perfect).
If you change a name, you should expect discussion about the name change.

That being said, "bikeshedding" is important before a release (so before the MVP), as after it, names will be forever, and inconsistent names are frustrating and error-prone.

In the end, I don't care which name is chosen, as long as it makes sense and is consistent.

zeux · 2019-03-12T14:47:46Z

Then you shouldn't have changed the name that was already chosen and agreed on (I don't say it was perfect).

The initial version of this PR said permute_dyn, and I was asked by the project maintainer to change this. I didn't realize there was agreement about the _dyn suffix.

lemaitre · 2019-03-12T15:00:47Z

To be precise, you were not ask to change it. But the entire community was asked if it would make sense to change (and my opinion would have been: "no it does not make sense to change").

Any objections to calling this v8x16.permute instead of v8x16.permute_dyn? I think permute distinguishes sufficiently from the shuffle without the _dyn suffix, but I don't feel very strongly about this.

I don't blame anybody, and I don't really care about the outcome (this instruction will be added whatever happens). I just want to explain why it ended up like this.

zeux · 2019-03-12T15:03:44Z

I restored the proposal to its original form, since it seems like _dyn suffix is least controversial. I've kept the change of shuffle bytecode as originally suggested - it does introduce a change to the current specification, but it means that shuffle/permute_dyn are grouped together with, hopefully, space for more permutation opcodes in the future so it may be worthwhile? This does break all existing implementations of WebAssembly SIMD, I'll be happy to put shuffle back at 0x03 if necessary.

tlively · 2019-03-12T16:35:40Z

What do people think of v8x16.shuffle_imm and v8x16.shuffle_vec? I would also be happy to drop the suffix on one or the other.

tlively · 2019-03-12T16:38:59Z

I'm also fine keeping "permute" for a single vector and "shuffle" for two vectors. I think that gives a fair intuition of what they do.

zeux · 2019-03-12T16:39:53Z

Note that this proposal is for a single-vector shuffle; two-vector shuffle isn’t as widely supported or applicable. So names need to be distinct and explicitly highlight the difference (either via a different name or a numeric suffix)

tlively · 2019-03-12T16:42:38Z

Right, thanks for the clarification. I'm referring to the existing v8x16.shuffle instruction, which I would be fine renaming if there exists a better name in the presence of the new permute instruction.

gnzlbg · 2019-03-12T17:26:01Z

FWIW my thoughts on this is that for a programming language (like Rust, C++, etc.) the names by which these instructions at exposed at the language level matters because those are the names that the users interface with. Whether the C++ compiler lowers this to swizzle, shuffle, permute, or foobar, isn't that important IMO, and different architectures use different names for these already, so the compiler is going to be lowering to some of these names independently of what one chooses to do here. Choosing a similar name to one of the more mainstream architectures like x86 or arm is probably a better trade-off than inventing a new names just to be "technically correct", since that allows transferring knowledge from the mainstream assembly languages to wasm.

rrwinterton · 2019-03-12T20:07:26Z

The compiler implementation has to support the variable general shuffle function that is available now to support variables as an input to the shuffle anyway. A change to permute seems like a good idea to me. I like the idea of a naming convention of permute var and a permute imm with two input vectors as well.

binji · 2019-03-12T23:17:48Z

I wouldn't worry too much about names at this point. SIMD is still relatively early, and there will be plenty of time for bikeshedding names later (see get_local vs local.get, br vs. break, etc. i32.add vs add.i32, etc. etc)

zeux · 2019-03-13T03:21:54Z

@tlively Right, I guess what I meant that in the context of this discussion if we were to rename existing shuffle instruction, it would probably make sense to have something like v8x16.shuffle2_imm (for existing instruction) and v8x16.shuffle1_dyn (for proposed instruction), as opposed to unadorned shuffle. Alternatively we could use _var or _vec instead of dyn.

This does seem preferable - if the current shuffle name isn't set in stone then maybe we should call both shuffle as long as we clearly identify the variants using numeric suffixes for the number of inputs, and _imm/_? suffix for the type of the selector. This should be future proof wrt additions like shuffle2_dyn or shuffle1_imm.

dtig · 2019-03-13T15:15:45Z

As the opcode number of the shuffle instruction is being changed, and the any implementations will need to be updated anyway, changing the name to be more indicative makes sense.

I like @AndrewScheidecker's suggestion above of using just the _imm prefix to distinguish, because that's more precise with what the operation is doing.

zeux · 2019-03-13T15:24:01Z

Sounds good - I was thinking that renaming after changing opcode might be beneficial to reduce confusion anyway.

zeux · 2019-03-14T00:24:32Z

I've updated the proposal to rename both shuffle instructions as follows:

v8x16.shuffle from the current draft becomes v8x16.shuffle2_imm
v8x16.permute_dyn from this proposal becomes v8x16.shuffle1

In the future this naming scheme will allow us to ship more variants without friction, such as v8x16.shuffle2 (non-immediate version with out-of-range handling, this is supported by several architectures and can be trivially emulated using shuffle1 on others), v8x16.shuffle1_imm (this is a weaker version of v8x16.shuffle2_imm, which could be encoded using 8 bytes for the mask instead of 16 bytes for shuffle2_imm, which could help code size) or any other variants.

Hopefully this is a reasonable compromise given that the names can be changed in the future before the spec is finalized.

This change adds a variable shuffle instruction to SIMD proposal. When indices are out of range, the result is specified as 0 for each lane. This matches hardware behavior on ARM and RISCV architectures. On x86_64 and MIPS, the hardware provides instructions that can select 0 when the high bit is set to 1 (x86_64) or any of the two high bits are set to 1 (MIPS). On these architectures, the backend is expected to emit a pair of instructions, saturating add (saturate(x + (128 - 16)) for x86_64) and permute, to emulate the proposed behavior. To distinguish variable shuffles with immediate shuffles, existing v8x16.shuffle instruction is renamed to v8x16.shuffle2_imm to be explicit about the fact that it shuffles two vectors with an immediate argument. This naming scheme allows for adding variants like v8x16.shuffle2 and v8x16.shuffle1_imm in the future. Fixes #68. Contributes to #24. Fixes #11.

tlively · 2019-03-14T03:33:06Z

I think merging this into the proposal should not be blocked on resolving the names. The opcodes and semantics are the important part, and I haven't heard any pushback on those. So I think if @dtig and @arunetm sign off on this, then it is good to be merged.

Back on the name bikeshedding, I'm not a fan of having numbers in the instruction names (apart from the type prefix, of course). No other WebAssembly instruction includes numbers like that. This is why I prefer using permute and shuffle to distinguish the one and two source operations. This scheme should still be future-proof because there will never be more than two sources.

I also like the idea of having suffixes to identify the selectors. I think _imm and _vec or _imm and <none> would sound better than _imm and _dyn, though. I think I don't like _dyn because its opposite is "static", not "immediate", and there is no nice way to make "static" into a suffix.

Let's have a non-binding, informational vote!
👍 to support permute* and shuffle*
😄 to support shuffle1* and shuffle2*
🎉 to support _dyn
❤️ to support _vec
🚀 to support no suffix for dynamic indices

dtig · 2019-03-14T05:01:46Z

I agree that this is good to be merged, approving this change. As pointed out a couple of this times on this thread, the SIMD proposal is still in it's early stages, and we can continue to bikeshed the names without blocking including the opcode. There is broad consensus on the semantics of a dynamic shuffle that shuffles the contents of one vector using lanes of the second vector as indices, and it is well supported by hardware, and we have performance numbers from @zeux that point to the usefulness of including the shuffle.

As to naming, I apologize I wasn't clear in my previous comment about using the _imm suffix to distinguish. My comment was only about using the _imm suffix, and not the inclusion of numbers in the opcode name. I agree with tlively@ that having numbers in opcode names is inconsistent with the rest of the WebAssembly opcodes, this can always be revisited in the future if/when including other shuffle variants makes sense.

gnzlbg

minor nitpicks

gnzlbg · 2019-03-14T07:38:17Z

proposals/simd/TextSIMD.md


 ```
-v8x16.shuffle i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
+v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5


This is a bit confusing. These are not really i5s, but they are i8s restricted to the i5 domain.

Maybe this should use LaneIdx32 ?

This is specifying text format so I believe i5 is accurate.

gnzlbg · 2019-03-14T07:40:55Z

proposals/simd/SIMD.md

@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
 indices in range `[16, 31]` select the `i - 16`-th element of `b`.


This should IMO also mention what happens with out-of-range indices (e.g. validation error) for consistency with the variable index shuffle.

My understanding was that LaneIdx32 implies that indices must be 0-31. The encoding for these isn't documented atm.

It is not strictly necessary to document these here. The definition of LaneIdx32 states:

[...] Many have a limited valid range, and it is a validation error if the immediate operands are out of range.

ImmLaneIdx32: A byte with values in the range 0–31 identifying a lane.

but I think a "(note: out-of-range indices are a validation error)" might be helpful.

This is not the final spec text, so as long as the intended semantics and opcodes are clear it should be good to go.

proposals/simd/SIMD.md

gnzlbg · 2019-03-14T07:46:21Z

I agree with tlively@ that having numbers in opcode names is inconsistent with the rest of the WebAssembly opcodes,

If numbers are an issue, we can always spell them out: binary_shuffle_imm/unary_shuffle.

zeux changed the title ~~Add v8x16.permute_dyn instruction~~ Add v8x16.permute instruction Mar 12, 2019

gnzlbg mentioned this pull request Mar 12, 2019

Update the shuffle1_dyn API rust-lang/packed_simd#228

Open

3 tasks

zeux changed the title ~~Add v8x16.permute instruction~~ Add v8x16.permute_dyn instruction Mar 12, 2019

zeux changed the title ~~Add v8x16.permute_dyn instruction~~ Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm Mar 14, 2019

dtig approved these changes Mar 14, 2019

View reviewed changes

gnzlbg suggested changes Mar 14, 2019

View reviewed changes

dtig merged commit 7f4d54d into WebAssembly:master Mar 27, 2019

gnzlbg mentioned this pull request Mar 29, 2019

Rename v8x16.shuffle to v8x16.shuffle2_imm WebAssembly/wabt#1053

Closed

tlively mentioned this pull request May 15, 2019

Naming shuffle instructions #78

Closed

		@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
		indices in range `[16, 31]` select the `i - 16`-th element of `b`.

Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm #71

Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm #71

Conversation

zeux commented Mar 12, 2019 • edited Loading

dtig commented Mar 12, 2019

zeux commented Mar 12, 2019

lemaitre commented Mar 12, 2019

gnzlbg commented Mar 12, 2019 • edited Loading

zeux commented Mar 12, 2019 • edited Loading

lemaitre commented Mar 12, 2019

zeux commented Mar 12, 2019 • edited Loading

AndrewScheidecker commented Mar 12, 2019

zeux commented Mar 12, 2019

lemaitre commented Mar 12, 2019

zeux commented Mar 12, 2019 • edited Loading

lemaitre commented Mar 12, 2019

zeux commented Mar 12, 2019 • edited Loading

tlively commented Mar 12, 2019

tlively commented Mar 12, 2019

zeux commented Mar 12, 2019

tlively commented Mar 12, 2019

gnzlbg commented Mar 12, 2019

rrwinterton commented Mar 12, 2019

binji commented Mar 12, 2019

zeux commented Mar 13, 2019

dtig commented Mar 13, 2019

zeux commented Mar 13, 2019

zeux commented Mar 14, 2019 • edited Loading

tlively commented Mar 14, 2019 • edited Loading

dtig commented Mar 14, 2019

gnzlbg left a comment

Choose a reason for hiding this comment

gnzlbg Mar 14, 2019

Choose a reason for hiding this comment

gnzlbg Mar 14, 2019

Choose a reason for hiding this comment

zeux Mar 14, 2019

Choose a reason for hiding this comment

gnzlbg Mar 14, 2019

Choose a reason for hiding this comment

zeux Mar 14, 2019 • edited Loading

Choose a reason for hiding this comment

gnzlbg Mar 14, 2019

Choose a reason for hiding this comment

tlively Mar 14, 2019

Choose a reason for hiding this comment

gnzlbg commented Mar 14, 2019

zeux commented Mar 12, 2019 •

edited

Loading

gnzlbg commented Mar 12, 2019 •

edited

Loading

zeux commented Mar 12, 2019 •

edited

Loading

zeux commented Mar 12, 2019 •

edited

Loading

zeux commented Mar 12, 2019 •

edited

Loading

zeux commented Mar 12, 2019 •

edited

Loading

zeux commented Mar 14, 2019 •

edited

Loading

tlively commented Mar 14, 2019 •

edited

Loading

zeux Mar 14, 2019 •

edited

Loading