From 7f4d54d4c9a525f6e04c7edcecdc1969f27e5d09 Mon Sep 17 00:00:00 2001
From: Arseny Kapoulkine
Date: Wed, 27 Mar 2019 10:10:01 -0700
Subject: [PATCH] Add v8x16.shuffle1 instruction (#71)

This change adds a variable shuffle instruction to the SIMD proposal.

When indices are out of range, the result is specified as 0 for each
lane. This matches hardware behavior on ARM and RISCV architectures.

On x86_64 and MIPS, the hardware provides instructions that can select 0
when the high bit is set to 1 (x86_64) or either of the two high bits is
set to 1 (MIPS). On these architectures, the backend is expected to emit
a pair of instructions, saturating add (saturate(x + (128 - 16)) for
x86_64) and permute, to emulate the proposed behavior.

To distinguish variable shuffles from immediate shuffles, the existing
v8x16.shuffle instruction is renamed to v8x16.shuffle2_imm to be
explicit about the fact that it shuffles two vectors with an immediate
argument.

This naming scheme allows for adding variants like v8x16.shuffle2 and
v8x16.shuffle1_imm in the future.

Fixes #68.
Contributes to #24.
Fixes #11.
---
 proposals/simd/BinarySIMD.md |  5 +++--
 proposals/simd/SIMD.md       | 25 ++++++++++++++++++++++---
 proposals/simd/TextSIMD.md   |  4 ++--
 3 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/proposals/simd/BinarySIMD.md b/proposals/simd/BinarySIMD.md
index 08dadda339..ff51a7f0f1 100644
--- a/proposals/simd/BinarySIMD.md
+++ b/proposals/simd/BinarySIMD.md
@@ -23,14 +23,13 @@ instr ::= ...
 ```
 
 Some SIMD instructions have additional immediate operands following `simdop`.
-The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
+The `v8x16.shuffle2_imm` instruction has 16 bytes after `simdop`.
 
 | Instruction               | `simdop` | Immediate operands |
 | --------------------------|---------:|--------------------|
 | `v128.load`               |    `0x00`| m:memarg           |
 | `v128.store`              |    `0x01`| m:memarg           |
 | `v128.const`              |    `0x02`| i:ImmByte[16]      |
-| `v8x16.shuffle`           |    `0x03`| s:LaneIdx32[16]    |
 | `i8x16.splat`             |    `0x04`| -                  |
 | `i8x16.extract_lane_s`    |    `0x05`| i:LaneIdx16        |
 | `i8x16.extract_lane_u`    |    `0x06`| i:LaneIdx16        |
@@ -167,3 +166,5 @@ The `v8x16.shuffle` instruction has 16 bytes after `simdop`.
 | `f32x4.convert_u/i32x4`   |    `0xb0`| -                  |
 | `f64x2.convert_s/i64x2`   |    `0xb1`| -                  |
 | `f64x2.convert_u/i64x2`   |    `0xb2`| -                  |
+| `v8x16.shuffle1`          |    `0xc0`| -                  |
+| `v8x16.shuffle2_imm`      |    `0xc1`| s:LaneIdx32[16]    |
\ No newline at end of file
diff --git a/proposals/simd/SIMD.md b/proposals/simd/SIMD.md
index bb8d1d4a75..4862364e8e 100644
--- a/proposals/simd/SIMD.md
+++ b/proposals/simd/SIMD.md
@@ -284,8 +284,8 @@ def S.replace_lane(a, i, x):
 The input lane value, `x`, is interpreted the same way as for the splat
 instructions. For the `i8` and `i16` lanes, the high bits of `x` are ignored.
 
-### Shuffle lanes
-* `v8x16.shuffle(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128`
+### Shuffling using immediate indices
+* `v8x16.shuffle2_imm(a: v128, b: v128, imm: ImmLaneIdx32[16]) -> v128`
 
 Returns a new vector with lanes selected from the lanes of the two input vectors
 `a` and `b` specified in the 16 byte wide immediate mode operand `imm`. This
@@ -294,7 +294,7 @@ return. The indices `i` in range `[0, 15]` select the `i`-th element of `a`. The
 indices in range `[16, 31]` select the `i - 16`-th element of `b`.
 
 ```python
-def S.shuffle(a, b, s):
+def S.shuffle2_imm(a, b, s):
     result = S.New()
     for i in range(S.Lanes):
         if s[i] < S.lanes:
@@ -304,6 +304,25 @@ def S.shuffle(a, b, s):
     return result
 ```
 
+### Shuffling using variable indices
+* `v8x16.shuffle1(a: v128, s: v128) -> v128`
+
+Returns a new vector with lanes selected from the lanes of the first input
+vector `a` specified in the second input vector `s`. The indices `i` in range
+`[0, 15]` select the `i`-th element of `a`. For indices outside of the range
+the resulting lane is 0.
+
+```python
+def S.shuffle1(a, s):
+    result = S.New()
+    for i in range(S.Lanes):
+        if s[i] < S.lanes:
+            result[i] = a[s[i]]
+        else:
+            result[i] = 0
+    return result
+```
+
 ## Integer arithmetic
 
 Wrapping integer arithmetic discards the high bits of the result.
diff --git a/proposals/simd/TextSIMD.md b/proposals/simd/TextSIMD.md
index 8ba2e4a7b6..fc3a7e7d2c 100644
--- a/proposals/simd/TextSIMD.md
+++ b/proposals/simd/TextSIMD.md
@@ -20,8 +20,8 @@ The canonical text format used for printing `v128.const` instructions is
 v128.const i32x4 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN 0xNNNNNNNN
 ```
 
-### v8x16.shuffle
+### v8x16.shuffle2_imm
 
 ```
-v8x16.shuffle i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
+v8x16.shuffle2_imm i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5 i5
 ```
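
Review note: the commit message describes emulating `v8x16.shuffle1` on x86_64 with a saturating add of `128 - 16` followed by a permute. The sketch below is one way to check that reasoning in plain Python; it is not part of the patch. It assumes the saturating add is unsigned and byte-wide, and that the permute zeroes a lane when the index's high bit is set and otherwise uses the low 4 bits (modeled on PSHUFB semantics). All function names here are hypothetical.

```python
LANES = 16  # v8x16: sixteen 8-bit lanes


def shuffle1(a, s):
    """Reference semantics from the patch: out-of-range indices yield 0."""
    return [a[idx] if idx < LANES else 0 for idx in s]


def saturating_add_u8(x, k):
    """Unsigned 8-bit saturating add (assumed flavor of the 'saturating add')."""
    return min(x + k, 0xFF)


def permute_high_bit_zeroes(a, s):
    """Assumed x86-style byte permute: a set high bit selects 0,
    otherwise the low 4 bits of the index select a lane of `a`."""
    return [0 if idx & 0x80 else a[idx & 0x0F] for idx in s]


def shuffle1_emulated(a, s):
    """Saturating add of (128 - 16), then permute, as the commit message suggests."""
    adjusted = [saturating_add_u8(idx, 128 - 16) for idx in s]
    return permute_high_bit_zeroes(a, adjusted)


# Arbitrary lane values and a mix of in-range and out-of-range indices.
a = list(range(100, 116))
s = [0, 15, 16, 31, 127, 128, 255, 3, 3, 7, 200, 1, 14, 17, 2, 9]
assert shuffle1_emulated(a, s) == shuffle1(a, s)
```

Adding 112 maps indices 0..15 to 112..127, which keeps the high bit clear and leaves the low 4 bits unchanged, while any index of 16 or more lands at 128 or above (saturating at 255), so its high bit is set and the lane becomes 0.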