Skip to content
This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

Are shuffle's lane indices dynamic? #33

Closed
gnzlbg opened this issue Aug 7, 2018 · 11 comments
Closed

Are shuffle's lane indices dynamic? #33

gnzlbg opened this issue Aug 7, 2018 · 11 comments

Comments

@gnzlbg
Copy link
Contributor

gnzlbg commented Aug 7, 2018

Reading the spec:

Shuffle lanes

v8x16.shuffle(a: v128, b: v128, s: LaneIdx32[16]) -> v128

it appears that the lane indices argument for the shuffle intrinsic is a non-immediate mode argument.

If my memory doesn't fail me:

  • LLVM's shufflevector requires compile-time indices.

  • Most architectures do not support shuffling vector lanes of arbitrary width with dynamic indices:

    • x86/x86_64 SSSE3's pshufb (and AVX-512 variants) support shuffling bytes with non-immediate mode indices
    • ARM 32/64 vtbl/vtbx family of instructions support shuffling bytes with non-immediate mode indices

That is, if the user is shuffling a v8x16, we can use those instructions directly to shuffle with dynamic indices without issues.

If the user attempts to shuffle a v16x8, v32x4, etc. what's the best machine code that we can generate for dynamic indices?

If I remember correctly, generating the appropriate sequence of shuffle bytes instructions requires knowing the lane indices during machine code generation. If these aren't known, the only thing a code generator could do AFAICT is fall back to scalar code, that is, copy the vectors to memory, do a scalar shuffle there e.g. by copying into a third vector, and copy the result of this third vector back to a vector register. This sounds like a huge performance cliff to me.

What am I missing? How can we generate efficient vector shuffles from dynamic lane indices?

@gnzlbg gnzlbg changed the title shuffle lanes takes dynamic lane indices Are shuffle's lane indices dynamic? Aug 7, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 7, 2018

Duh...

What I was missing is that LaneIdx32 is an immediate mode operand.

Honestly, I think the spec should be a bit more idiot proof here, by:

  • renaming LaneIdx32 to ImmLaneIdx32 (for consistency with ImmByte)
  • renaming the s argument in v128.shuffle to sImm, imm, or similar

@gnzlbg gnzlbg closed this as completed Aug 7, 2018
gnzlbg added a commit to gnzlbg/simd that referenced this issue Aug 7, 2018
This makes it clear everywhere that `ImmLaneIdx` is an immediate operand and also makes the nomenclature for immediate operands consistent (both `ImmByte` and `ImmLaneIdx`  start with the prefix `Imm`) .

See WebAssembly#33 .
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 8, 2018

So it appears that LaneIdx are not always immediate mode operands (e.g. in the extract_lanes instructions), which would mean that my original interpretation of the spec is correct, and that the vector shuffles accept dynamic run-time indices.

So I am re-opening this for clarification.

@gnzlbg gnzlbg reopened this Aug 8, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 8, 2018

@billbudge how does V8 implement these?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 8, 2018

Duh again. So yeah, the lane indices are dynamic, but that's ok because the shuffle only operates on v8x16 and one can generate efficient code for that on pretty much every architecture.

@gnzlbg gnzlbg closed this as completed Aug 8, 2018
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Aug 8, 2018

So... no... the https://github.com/WebAssembly/simd/blob/master/proposals/simd/BinarySIMD.md document says that these operands are immediate mode arguments...

And that the operands for extract lanes are immediates as well...

I'm leaving this open until someone clarifies and the spec is made idiot proof. To check whether v8x16 shuffle takes immediate arguments or not, I read the definition in the spec, which doesn't mention it and doesn't call the argument imm, like other places in the spec do. Then I have to jump to the definition of the LaneIdx types, which do not start with Imm_ like ImmBytes do, and then I have to look at the binary encoding doc which does state that these are immediates. . .

I mean, is it me, or is the spec currently unnecessarily ambiguous ?

If instead of all that I would just read v8x16.shuffle(..., imm: ImmLaneIdx[16]) and had a comment next to it saying that imm is an immediate mode operand, the spec would be crystal clear.

@gnzlbg gnzlbg reopened this Aug 8, 2018
@penzn
Copy link
Contributor

penzn commented Sep 25, 2018

They are immediate mode arguments, and so are the arguments to hardware shuffle operations (at least on some popular platforms). I don't know why the wording in the spec is the way it is, but you can definitely try submitting a PR to correct it :)

@billbudge
Copy link

V8's implementation treats the lane index vector as an immediate value. However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle. See #24

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Dec 13, 2018

For what it's worth, there has been a PR that closes this issue open for a long time (#34), who should I ping to figure out what's required for it to make progress ?

However, there's a strong case to add the dynamic (dynamic shuffle held in a vector register) shuffle.

There has also been a PR open to address that for a long time and that hasn't seen much progress: #30 .

@binji
Copy link
Member

binji commented Dec 13, 2018

who should I ping to figure out what's required for it to make progress ?

@aretmr @PeterJensen are the SIMD champions, perhaps they have thoughts about this.

@dtig
Copy link
Member

dtig commented Dec 14, 2018

I've left comments on #34, and opened issue #58 to handle triage of the older issues/PRs.

dtig pushed a commit that referenced this issue Dec 17, 2018
* Rename LaneIdx{2,4,8,16,32} to ImmLaneIdx{2,4,8,16,32}

This makes it clear everywhere that `ImmLaneIdx` is an immediate operand and also makes the nomenclature for immediate operands consistent (both `ImmByte` and `ImmLaneIdx`  start with the prefix `Imm`) .

See #33 .

* further clarify the representation hierarchies and immediate mode operands

* memarg is an immediate mode argument

* reference scalar load/stores spec

* Fix typo: floating-point value to integer-value

* Fix memarg

* Fix language in replace value

* Update "hierarchy" of types paragraph
@gnzlbg
Copy link
Contributor Author

gnzlbg commented Mar 2, 2019

IIRC this was fixed by #34 , thank you all!

@gnzlbg gnzlbg closed this as completed Mar 2, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants