Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm #71

Merged
merged 1 commit into from
Mar 27, 2019

Conversation

zeux
Copy link
Contributor

@zeux zeux commented Mar 12, 2019

This change adds a dynamic permute instruction to SIMD proposal.

The name "permute", as opposed to shuffle, is used to distinguish it from
instructions that operate on two input vectors.

When indices are out of range, the result is specified as 0 for each
lane. This matches hardware behavior on ARM and RISCV architectures.

On x86_64 and MIPS, the hardware provides instructions that can select 0
when the high bit is set to 1 (x86_64) or any of the two high bits are
set to 1 (MIPS). On these architectures, the backend is expected to emit
a pair of instructions, saturating byte add (saturate(x + (128 - 16)) for
x86_64) and permute, to emulate the proposed behavior.

Fixes #68.
Contributes to #24.
Fixes #11.

@dtig
Copy link
Member

dtig commented Mar 12, 2019

Thanks for following up! This will also need a corresponding encoding in the opcode table here. I suppose the least breaking way to add, and to maintain a logical grouping of opcode numbers is to introduce two new opcode numbers at the end for v8x16.shuffle, and the permute instruction and leave 0x03 to be a reserved opcode. Any objections to calling this v8x16.permute instead of v8x16.permute_dyn? I think permute distinguishes sufficiently from the shuffle without the _dyn suffix, but I don't feel very strongly about this.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

I was originally thinking that _dyn provides for a nice extension going forward to add permute but I now realize that immediate version of permute is redundant wrt existing shuffle modulo minor code size concerns. I'll rename to just "permute" and add the encoding entries to the end of the table as suggested, leaving a bit of a gap for existing floating point instructions similar to the integer instructions (gap around 0x35).

@zeux zeux changed the title Add v8x16.permute_dyn instruction Add v8x16.permute instruction Mar 12, 2019
@lemaitre
Copy link

I agree that permute with immediate shuffling rule is redundant with shuffle and may never be added.

I would still suggest to keep the _dyn suffix.
Because we might end up adding a shuffle_dyn instruction, and it would be weird and inconsistent to have:

  • shuffle: 2 inputs, immediate shuffling rule
  • shuffle_dyn: 2 input, dynamic shuffling rule
  • permute: 1 input, dynamic shuffling rule

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 12, 2019

Wikipedia distinguishes "permutations" from "n-tuples" (permutations with repetition):

Ordered arrangements of the elements of a set S of length n where repetition is allowed are called n-tuples, but have sometimes been referred to as permutations with repetition although they are not permutations in general.

To convey that this is a single-vector dynamic shuffle, in Rust, we ended up calling this operation shuffle1_dyn to distinguish it from "permutation" (without element repetition) and to leave the door open for adding a two-vector shuffle_dyn in the future.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

A downside to including _dyn suffix is that in the current specification it looks out of place since all other instructions assume a specific encoding.

2-input dynamic shuffle can be constructed as follows:
or(permute(a, s), permute(b, s - 16))
I don't think the native version has hardware support that's as universal as permute although I haven't researched this fully. (edit I guess the missing archs are x64 and RISCV, seems like other archs support this)

@lemaitre
Copy link

2-input dynamic shuffle can be constructed as follows:
or(permute(a, s), permute(b, s - 16))
I don't think the native version has hardware support that's as universal as permute although I haven't researched this fully.

Indeed, this is a valid emulation, but if hardware support for the permute_dyn increases in the future, we might still want a dedicated instruction (like we have an instruction for f32x4.max which is easily emulateable).
In other words, keeping _dyn is an easy way to increase future-proofness of WASM without increasing complexity.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

One alternative naming scheme going forward is that shuffle is a compile-time (immediate) construct and permute is a run-time (variable/dynamic) construct. So when/if we need a 2-input... something... we call it permute2. I looked at the naming conventions of different architectures and I guess they don't consistently assign any meaning to either permute or shuffle, e.g. SSSE3 actually calls the instruction discussed in this proposal "shuffle bytes", whereas AltiVec and AMD XOP have "permute" with 2 inputs.

If this seems reasonable then I can update this PR and the spec to suggest that this is the convention.

@AndrewScheidecker
Copy link
Contributor

IMO shuffle vs permute is arbitrary, and shouldn't be used to distinguish any semantics. I'd just stick with shuffle, since it's what the proposal already uses, and distinguish the number of input vectors with shuffle1 vs shuffle2 (or maybe shuffle vs shuffle2).

And maybe "immediate" is a more precise distinction than "dynamic?

So something like:
v8x16.shuffle1_imm v8x16.shuffle1
v8x16.shuffle2_imm v8x16.shuffle2
where v8x16.shuffle2_imm corresponds to the current v8x16.shuffle.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

I'd like to figure out how to stop bikeshedding this. To me it doesn't truly matter whether to call this permute or shuffle, which suffix to add, and whether to rename the existing instruction (consistency vs churn) - what does matter is that this instruction lands. How do we proceed?

@lemaitre
Copy link

I'd like to figure out how to stop bikeshedding this

Then you shouldn't have changed the name that was already chosen and agreed on (I don't say it was perfect).
If you change a name, you should expect discussion about the name change.

That being said, "bikeshedding" is important before a release (so before the MVP), as after it, names will be forever, and inconsistent names are frustrating and error-prone.

In the end, I don't care which name is chosen, as long as it makes sense and is consistent.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

Then you shouldn't have changed the name that was already chosen and agreed on (I don't say it was perfect).

The initial version of this PR said permute_dyn, and I was asked by the project maintainer to change this. I didn't realize there was agreement about the _dyn suffix.

@lemaitre
Copy link

To be precise, you were not ask to change it. But the entire community was asked if it would make sense to change (and my opinion would have been: "no it does not make sense to change").

Any objections to calling this v8x16.permute instead of v8x16.permute_dyn? I think permute distinguishes sufficiently from the shuffle without the _dyn suffix, but I don't feel very strongly about this.

I don't blame anybody, and I don't really care about the outcome (this instruction will be added whatever happens). I just want to explain why it ended up like this.

@zeux zeux changed the title Add v8x16.permute instruction Add v8x16.permute_dyn instruction Mar 12, 2019
@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

I restored the proposal to its original form, since it seems like _dyn suffix is least controversial. I've kept the change of shuffle bytecode as originally suggested - it does introduce a change to the current specification, but it means that shuffle/permute_dyn are grouped together with, hopefully, space for more permutation opcodes in the future so it may be worthwhile? This does break all existing implementations of WebAssembly SIMD, I'll be happy to put shuffle back at 0x03 if necessary.

@tlively
Copy link
Member

tlively commented Mar 12, 2019

What do people think of v8x16.shuffle_imm and v8x16.shuffle_vec? I would also be happy to drop the suffix on one or the other.

@tlively
Copy link
Member

tlively commented Mar 12, 2019

I'm also fine keeping "permute" for a single vector and "shuffle" for two vectors. I think that gives a fair intuition of what they do.

@zeux
Copy link
Contributor Author

zeux commented Mar 12, 2019

Note that this proposal is for a single-vector shuffle; two-vector shuffle isn’t as widely supported or applicable. So names need to be distinct and explicitly highlight the difference (either via a different name or a numeric suffix)

@tlively
Copy link
Member

tlively commented Mar 12, 2019

Right, thanks for the clarification. I'm referring to the existing v8x16.shuffle instruction, which I would be fine renaming if there exists a better name in the presence of the new permute instruction.

@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 12, 2019

FWIW my thoughts on this is that for a programming language (like Rust, C++, etc.) the names by which these instructions at exposed at the language level matters because those are the names that the users interface with. Whether the C++ compiler lowers this to swizzle, shuffle, permute, or foobar, isn't that important IMO, and different architectures use different names for these already, so the compiler is going to be lowering to some of these names independently of what one chooses to do here. Choosing a similar name to one of the more mainstream architectures like x86 or arm is probably a better trade-off than inventing a new names just to be "technically correct", since that allows transferring knowledge from the mainstream assembly languages to wasm.

@rrwinterton
Copy link

The compiler implementation has to support the variable general shuffle function that is available now to support variables as an input to the shuffle anyway. A change to permute seems like a good idea to me. I like the idea of a naming convention of permute var and a permute imm with two input vectors as well.

@binji
Copy link
Member

binji commented Mar 12, 2019

I wouldn't worry too much about names at this point. SIMD is still relatively early, and there will be plenty of time for bikeshedding names later (see get_local vs local.get, br vs. break, etc. i32.add vs add.i32, etc. etc)

@zeux
Copy link
Contributor Author

zeux commented Mar 13, 2019

@tlively Right, I guess what I meant that in the context of this discussion if we were to rename existing shuffle instruction, it would probably make sense to have something like v8x16.shuffle2_imm (for existing instruction) and v8x16.shuffle1_dyn (for proposed instruction), as opposed to unadorned shuffle. Alternatively we could use _var or _vec instead of dyn.

This does seem preferable - if the current shuffle name isn't set in stone then maybe we should call both shuffle as long as we clearly identify the variants using numeric suffixes for the number of inputs, and _imm/_? suffix for the type of the selector. This should be future proof wrt additions like shuffle2_dyn or shuffle1_imm.

@dtig
Copy link
Member

dtig commented Mar 13, 2019

As the opcode number of the shuffle instruction is being changed, and the any implementations will need to be updated anyway, changing the name to be more indicative makes sense.

I like @AndrewScheidecker's suggestion above of using just the _imm prefix to distinguish, because that's more precise with what the operation is doing.

@zeux
Copy link
Contributor Author

zeux commented Mar 13, 2019

Sounds good - I was thinking that renaming after changing opcode might be beneficial to reduce confusion anyway.

@zeux zeux changed the title Add v8x16.permute_dyn instruction Add v8x16.shuffle1 instruction and rename v8x16.shuffle to v8x16.shuffle2_imm Mar 14, 2019
@zeux
Copy link
Contributor Author

zeux commented Mar 14, 2019

I've updated the proposal to rename both shuffle instructions as follows:

  • v8x16.shuffle from the current draft becomes v8x16.shuffle2_imm
  • v8x16.permute_dyn from this proposal becomes v8x16.shuffle1

In the future this naming scheme will allow us to ship more variants without friction, such as v8x16.shuffle2 (non-immediate version with out-of-range handling, this is supported by several architectures and can be trivially emulated using shuffle1 on others), v8x16.shuffle1_imm (this is a weaker version of v8x16.shuffle2_imm, which could be encoded using 8 bytes for the mask instead of 16 bytes for shuffle2_imm, which could help code size) or any other variants.

Hopefully this is a reasonable compromise given that the names can be changed in the future before the spec is finalized.

This change adds a variable shuffle instruction to SIMD proposal.

When indices are out of range, the result is specified as 0 for each
lane. This matches hardware behavior on ARM and RISCV architectures.

On x86_64 and MIPS, the hardware provides instructions that can select 0
when the high bit is set to 1 (x86_64) or any of the two high bits are
set to 1 (MIPS). On these architectures, the backend is expected to emit
a pair of instructions, saturating add (saturate(x + (128 - 16)) for
x86_64) and permute, to emulate the proposed behavior.

To distinguish variable shuffles with immediate shuffles, existing
v8x16.shuffle instruction is renamed to v8x16.shuffle2_imm to be
explicit about the fact that it shuffles two vectors with an immediate
argument.

This naming scheme allows for adding variants like v8x16.shuffle2 and
v8x16.shuffle1_imm in the future.

Fixes #68.
Contributes to #24.
Fixes #11.
@tlively
Copy link
Member

tlively commented Mar 14, 2019

I think merging this into the proposal should not be blocked on resolving the names. The opcodes and semantics are the important part, and I haven't heard any pushback on those. So I think if @dtig and @arunetm sign off on this, then it is good to be merged.

Back on the name bikeshedding, I'm not a fan of having numbers in the instruction names (apart from the type prefix, of course). No other WebAssembly instruction includes numbers like that. This is why I prefer using permute and shuffle to distinguish the one and two source operations. This scheme should still be future-proof because there will never be more than two sources.

I also like the idea of having suffixes to identify the selectors. I think _imm and _vec or _imm and <none> would sound better than _imm and _dyn, though. I think I don't like _dyn because its opposite is "static", not "immediate", and there is no nice way to make "static" into a suffix.

Let's have a non-binding, informational vote!
👍 to support permute* and shuffle*
😄 to support shuffle1* and shuffle2*
🎉 to support _dyn
❤️ to support _vec
🚀 to support no suffix for dynamic indices

@dtig
Copy link
Member

dtig commented Mar 14, 2019

I agree that this is good to be merged, approving this change. As pointed out a couple of this times on this thread, the SIMD proposal is still in it's early stages, and we can continue to bikeshed the names without blocking including the opcode. There is broad consensus on the semantics of a dynamic shuffle that shuffles the contents of one vector using lanes of the second vector as indices, and it is well supported by hardware, and we have performance numbers from @zeux that point to the usefulness of including the shuffle.

As to naming, I apologize I wasn't clear in my previous comment about using the _imm suffix to distinguish. My comment was only about using the _imm suffix, and not the inclusion of numbers in the opcode name. I agree with tlively@ that having numbers in opcode names is inconsistent with the rest of the WebAssembly opcodes, this can always be revisited in the future if/when including other shuffle variants makes sense.

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nitpicks


```
v8x16.shuffle i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing. These are not really i5s, but they are i8s restricted to the i5 domain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should use LaneIdx32 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifying text format so I believe i5 is accurate.

@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
indices in range `[16, 31]` select the `i - 16`-th element of `b`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should IMO also mention what happens with out-of-range indices (e.g. validation error) for consistency with the variable index shuffle.

Copy link
Contributor Author

@zeux zeux Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that LaneIdx32 implies that indices must be 0-31. The encoding for these isn't documented atm.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not strictly necessary to document these here. The definition of LaneIdx32 states:

[...] Many have a limited valid range, and it is a validation error if the immediate operands are out of range.

ImmLaneIdx32: A byte with values in the range 0–31 identifying a lane.

but I think a "(note: out-of-range indices are a validation error)" might be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the final spec text, so as long as the intended semantics and opcodes are clear it should be good to go.

proposals/simd/SIMD.md Show resolved Hide resolved
@gnzlbg
Copy link
Contributor

gnzlbg commented Mar 14, 2019

I agree with tlively@ that having numbers in opcode names is inconsistent with the rest of the WebAssembly opcodes,

If numbers are an issue, we can always spell them out: binary_shuffle_imm/unary_shuffle.

@dtig dtig merged commit 7f4d54d into WebAssembly:master Mar 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants