Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow simd_bitmask to return byte arrays #88868

Merged
merged 5 commits into from
Nov 10, 2021
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 66 additions & 32 deletions compiler/rustc_codegen_llvm/src/intrinsic.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ use rustc_middle::ty::layout::{FnAbiOf, HasTyCtxt, LayoutOf};
use rustc_middle::ty::{self, Ty};
use rustc_middle::{bug, span_bug};
use rustc_span::{sym, symbol::kw, Span, Symbol};
use rustc_target::abi::{self, HasDataLayout, Primitive};
use rustc_target::abi::{self, Align, HasDataLayout, Primitive};
use rustc_target::spec::{HasTargetSpec, PanicStrategy};

use std::cmp::Ordering;
Expand Down Expand Up @@ -857,28 +857,39 @@ fn generic_simd_intrinsic(
let arg_tys = sig.inputs();

if name == sym::simd_select_bitmask {
let in_ty = arg_tys[0];
let m_len = match in_ty.kind() {
// Note that this `.unwrap()` crashes for isize/usize, that's sort
// of intentional as there's not currently a use case for that.
ty::Int(i) => i.bit_width().unwrap(),
ty::Uint(i) => i.bit_width().unwrap(),
_ => return_error!("`{}` is not an integral type", in_ty),
};
require_simd!(arg_tys[1], "argument");
let (v_len, _) = arg_tys[1].simd_size_and_type(bx.tcx());
require!(
// Allow masks for vectors with fewer than 8 elements to be
// represented with a u8 or i8.
m_len == v_len || (m_len == 8 && v_len < 8),
"mismatched lengths: mask length `{}` != other vector length `{}`",
m_len,
v_len
);
let (len, _) = arg_tys[1].simd_size_and_type(bx.tcx());

let expected_int_bits = (len.max(8) - 1).next_power_of_two();
let expected_bytes = len / 8 + ((len % 8 > 0) as u64);

let mask_ty = arg_tys[0];
let mask = match mask_ty.kind() {
ty::Int(i) if i.bit_width() == Some(expected_int_bits) => args[0].immediate(),
ty::Uint(i) if i.bit_width() == Some(expected_int_bits) => args[0].immediate(),
ty::Array(elem, len)
if matches!(elem.kind(), ty::Uint(ty::UintTy::U8))
&& len.try_eval_usize(bx.tcx, ty::ParamEnv::reveal_all())
== Some(expected_bytes) =>
{
let place = PlaceRef::alloca(bx, args[0].layout);
args[0].val.store(bx, place);
let int_ty = bx.type_ix(expected_bytes * 8);
let ptr = bx.pointercast(place.llval, bx.cx.type_ptr_to(int_ty));
bx.load(int_ty, ptr, Align::ONE)
}
_ => return_error!(
"invalid bitmask `{}`, expected `u{}` or `[u8; {}]`",
mask_ty,
expected_int_bits,
expected_bytes
),
};

let i1 = bx.type_i1();
let im = bx.type_ix(v_len);
let i1xn = bx.type_vector(i1, v_len);
let m_im = bx.trunc(args[0].immediate(), im);
let im = bx.type_ix(len);
let i1xn = bx.type_vector(i1, len);
let m_im = bx.trunc(mask, im);
let m_i1s = bx.bitcast(m_im, i1xn);
return Ok(bx.select(m_i1s, args[1].immediate(), args[2].immediate()));
}
Expand Down Expand Up @@ -1056,16 +1067,13 @@ fn generic_simd_intrinsic(

if name == sym::simd_bitmask {
// The `fn simd_bitmask(vector) -> unsigned integer` intrinsic takes a
// vector mask and returns an unsigned integer containing the most
// significant bit (MSB) of each lane.

// If the vector has less than 8 lanes, a u8 is returned with zeroed
// trailing bits.
// vector mask and returns the most significant bit (MSB) of each lane in the form
// of either:
// * an unsigned integer
// * an array of `u8`
// If the vector has less than 8 lanes, a u8 is returned with zeroed trailing bits.
let expected_int_bits = in_len.max(8);
match ret_ty.kind() {
ty::Uint(i) if i.bit_width() == Some(expected_int_bits) => (),
_ => return_error!("bitmask `{}`, expected `u{}`", ret_ty, expected_int_bits),
}
let expected_bytes = expected_int_bits / 8 + ((expected_int_bits % 8 > 0) as u64);

// Integer vector <i{in_bitwidth} x in_len>:
let (i_xn, in_elem_bitwidth) = match in_elem.kind() {
Expand Down Expand Up @@ -1095,8 +1103,34 @@ fn generic_simd_intrinsic(
let i1xn = bx.trunc(i_xn_msb, bx.type_vector(bx.type_i1(), in_len));
// Bitcast <i1 x N> to iN:
let i_ = bx.bitcast(i1xn, bx.type_ix(in_len));
// Zero-extend iN to the bitmask type:
return Ok(bx.zext(i_, bx.type_ix(expected_int_bits)));

match ret_ty.kind() {
ty::Uint(i) if i.bit_width() == Some(expected_int_bits) => {
// Zero-extend iN to the bitmask type:
return Ok(bx.zext(i_, bx.type_ix(expected_int_bits)));
}
ty::Array(elem, len)
if matches!(elem.kind(), ty::Uint(ty::UintTy::U8))
&& len.try_eval_usize(bx.tcx, ty::ParamEnv::reveal_all())
== Some(expected_bytes) =>
{
// Zero-extend iN to the array lengh:
let ze = bx.zext(i_, bx.type_ix(expected_bytes * 8));

// Convert the integer to a byte array
let ptr = bx.alloca(bx.type_ix(expected_bytes * 8), Align::ONE);
bx.store(ze, ptr, Align::ONE);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone more familiar with this than me: is there any reason this alignment or the one below must be greater than a byte?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, when these are handled by SSE or AVX512 instructions, they are normal integers of 16 to 64 bits. Even though we are saying we want to interpret this as a byte array, we should probably use the natural word-alignment of the processor: that way, kmovq mem, reg (reg to mem) naturally hits the alignment to pick it up in a GPR correctly, and if we for some reason are executing AVX512 features on a 32-bit processor, we still are correctly aligned for such (since then it would go into two GPRs).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think anything is preventing LLVM from using a greater alignment if it's more efficient, but the actual alignment is just 1, right? Regular u8 arrays are definitely used with larger alignments in most situations, but their alignment is still 1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as you're coherent about this (I believe PlaceRef tracks it for you, so it may be worth trying to use it more), you can increase the alignment of the alloca (and then pass that alignment to the store/load - whether manually, or relying on the PlaceRef API, or letting LLVM infer it).

I'm not sure how useful it is here, but it may make LLVM more likely to generate the aligned versions of instructions that have both aligned and unaligned variants (assuming LLVM doesn't rewrite this cast away).

Also, if you're using allocas, please consider using place.storage_{live,dead}() before/after all of the memory operations on the alloca - again I'm unsure whether this is significant in the context (since e.g. a small wrapper that gets inlined will do the right thing itself during LLVM inlining), but the default without those (i.e. without llvm.lifetime.{start,end}) is every alloca adds to the stack frame size (for the entire function).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm. At least hypothetically this should in fact be "a small wrapper that gets inlined". I don't think either of us is familiar with the PlaceRef API tho'. But that shouldn't be a blocking concern.

let array_ty = bx.type_array(bx.type_i8(), expected_bytes);
let ptr = bx.pointercast(ptr, bx.cx.type_ptr_to(array_ty));
return Ok(bx.load(array_ty, ptr, Align::ONE));
}
_ => return_error!(
"cannot return `{}`, expected `u{}` or `[u8; {}]`",
ret_ty,
expected_int_bits,
expected_bytes
),
}
}

fn simd_simple_float_intrinsic(
Expand Down
10 changes: 5 additions & 5 deletions src/test/ui/simd/intrinsic/generic-bitmask.rs
Original file line number Diff line number Diff line change
Expand Up @@ -51,19 +51,19 @@ fn main() {
let _: u64 = simd_bitmask(m64);

let _: u16 = simd_bitmask(m2);
//~^ ERROR bitmask `u16`, expected `u8`
//~^ ERROR invalid monomorphization of `simd_bitmask` intrinsic

let _: u16 = simd_bitmask(m8);
//~^ ERROR bitmask `u16`, expected `u8`
//~^ ERROR invalid monomorphization of `simd_bitmask` intrinsic

let _: u32 = simd_bitmask(m16);
//~^ ERROR bitmask `u32`, expected `u16`
//~^ ERROR invalid monomorphization of `simd_bitmask` intrinsic

let _: u64 = simd_bitmask(m32);
//~^ ERROR bitmask `u64`, expected `u32`
//~^ ERROR invalid monomorphization of `simd_bitmask` intrinsic

let _: u128 = simd_bitmask(m64);
//~^ ERROR bitmask `u128`, expected `u64`
//~^ ERROR invalid monomorphization of `simd_bitmask` intrinsic

}
}
10 changes: 5 additions & 5 deletions src/test/ui/simd/intrinsic/generic-bitmask.stderr
Original file line number Diff line number Diff line change
@@ -1,28 +1,28 @@
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: bitmask `u16`, expected `u8`
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: cannot return `u16`, expected `u8` or `[u8; 1]`
--> $DIR/generic-bitmask.rs:53:22
|
LL | let _: u16 = simd_bitmask(m2);
| ^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: bitmask `u16`, expected `u8`
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: cannot return `u16`, expected `u8` or `[u8; 1]`
--> $DIR/generic-bitmask.rs:56:22
|
LL | let _: u16 = simd_bitmask(m8);
| ^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: bitmask `u32`, expected `u16`
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: cannot return `u32`, expected `u16` or `[u8; 2]`
--> $DIR/generic-bitmask.rs:59:22
|
LL | let _: u32 = simd_bitmask(m16);
| ^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: bitmask `u64`, expected `u32`
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: cannot return `u64`, expected `u32` or `[u8; 4]`
--> $DIR/generic-bitmask.rs:62:22
|
LL | let _: u64 = simd_bitmask(m32);
| ^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: bitmask `u128`, expected `u64`
error[E0511]: invalid monomorphization of `simd_bitmask` intrinsic: cannot return `u128`, expected `u64` or `[u8; 8]`
--> $DIR/generic-bitmask.rs:65:23
|
LL | let _: u128 = simd_bitmask(m64);
Expand Down
11 changes: 5 additions & 6 deletions src/test/ui/simd/intrinsic/generic-select.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,7 @@ struct b8x4(pub i8, pub i8, pub i8, pub i8);

#[repr(simd)]
#[derive(Copy, Clone, PartialEq)]
struct b8x8(pub i8, pub i8, pub i8, pub i8,
pub i8, pub i8, pub i8, pub i8);
struct b8x8(pub i8, pub i8, pub i8, pub i8, pub i8, pub i8, pub i8, pub i8);

extern "platform-intrinsic" {
fn simd_select<T, U>(x: T, a: U, b: U) -> U;
Expand Down Expand Up @@ -50,15 +49,15 @@ fn main() {
//~^ ERROR found non-SIMD `u32`

simd_select_bitmask(0u16, x, x);
//~^ ERROR mask length `16` != other vector length `4`
//
//~^ ERROR invalid bitmask `u16`, expected `u8` or `[u8; 1]`

simd_select_bitmask(0u8, 1u32, 2u32);
//~^ ERROR found non-SIMD `u32`

simd_select_bitmask(0.0f32, x, x);
//~^ ERROR `f32` is not an integral type
//~^ ERROR invalid bitmask `f32`, expected `u8` or `[u8; 1]`

simd_select_bitmask("x", x, x);
//~^ ERROR `&str` is not an integral type
//~^ ERROR invalid bitmask `&str`, expected `u8` or `[u8; 1]`
}
}
22 changes: 11 additions & 11 deletions src/test/ui/simd/intrinsic/generic-select.stderr
Original file line number Diff line number Diff line change
@@ -1,47 +1,47 @@
error[E0511]: invalid monomorphization of `simd_select` intrinsic: mismatched lengths: mask length `8` != other vector length `4`
--> $DIR/generic-select.rs:40:9
--> $DIR/generic-select.rs:39:9
|
LL | simd_select(m8, x, x);
| ^^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select` intrinsic: mask element type is `u32`, expected `i_`
--> $DIR/generic-select.rs:43:9
--> $DIR/generic-select.rs:42:9
|
LL | simd_select(x, x, x);
| ^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select` intrinsic: mask element type is `f32`, expected `i_`
--> $DIR/generic-select.rs:46:9
--> $DIR/generic-select.rs:45:9
|
LL | simd_select(z, z, z);
| ^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select` intrinsic: expected SIMD argument type, found non-SIMD `u32`
--> $DIR/generic-select.rs:49:9
--> $DIR/generic-select.rs:48:9
|
LL | simd_select(m4, 0u32, 1u32);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: mismatched lengths: mask length `16` != other vector length `4`
--> $DIR/generic-select.rs:52:9
error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: invalid bitmask `u16`, expected `u8` or `[u8; 1]`
--> $DIR/generic-select.rs:51:9
|
LL | simd_select_bitmask(0u16, x, x);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: expected SIMD argument type, found non-SIMD `u32`
--> $DIR/generic-select.rs:55:9
--> $DIR/generic-select.rs:54:9
|
LL | simd_select_bitmask(0u8, 1u32, 2u32);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: `f32` is not an integral type
--> $DIR/generic-select.rs:58:9
error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: invalid bitmask `f32`, expected `u8` or `[u8; 1]`
--> $DIR/generic-select.rs:57:9
|
LL | simd_select_bitmask(0.0f32, x, x);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: `&str` is not an integral type
--> $DIR/generic-select.rs:61:9
error[E0511]: invalid monomorphization of `simd_select_bitmask` intrinsic: invalid bitmask `&str`, expected `u8` or `[u8; 1]`
--> $DIR/generic-select.rs:60:9
|
LL | simd_select_bitmask("x", x, x);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Expand Down
52 changes: 52 additions & 0 deletions src/test/ui/simd/simd-bitmask.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
//run-pass
//ignore-endian-big behavior of simd_select_bitmask is endian-specific
#![feature(repr_simd, platform_intrinsics)]

extern "platform-intrinsic" {
fn simd_bitmask<T, U>(v: T) -> U;
fn simd_select_bitmask<T, U>(m: T, a: U, b: U) -> U;
}

#[derive(Copy, Clone)]
#[repr(simd)]
struct Simd<T, const N: usize>([T; N]);

fn main() {
unsafe {
let v = Simd::<i8, 4>([-1, 0, -1, 0]);
let i: u8 = simd_bitmask(v);
let a: [u8; 1] = simd_bitmask(v);

assert_eq!(i, 0b0101);
assert_eq!(a, [0b0101]);

let v = Simd::<i8, 16>([0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0]);
let i: u16 = simd_bitmask(v);
let a: [u8; 2] = simd_bitmask(v);

assert_eq!(i, 0b0101000000001100);
assert_eq!(a, [0b1100, 0b01010000]);
}

unsafe {
let a = Simd::<i32, 8>([0, 1, 2, 3, 4, 5, 6, 7]);
let b = Simd::<i32, 8>([8, 9, 10, 11, 12, 13, 14, 15]);
let e = [0, 9, 2, 11, 12, 13, 14, 15];

let r = simd_select_bitmask(0b0101u8, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b0101u8], a, b);
assert_eq!(r.0, e);

let a = Simd::<i32, 16>([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);
let b = Simd::<i32, 16>([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]);
let e = [16, 17, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 29, 14, 31];

let r = simd_select_bitmask(0b0101000000001100u16, a, b);
assert_eq!(r.0, e);

let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

won't this break on big-endian?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure how LLVM handles the bitcast. We've actually seen the opposite issue (bit order, not byte order) on BE: https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/masks/full_masks.rs#L118-L125

Copy link
Member

@programmerjake programmerjake Sep 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bitcasts in llvm are supposed to be semantically equivalent to a memcpy iirc

The ‘bitcast’ instruction converts value to type ty2. It is always a no-op cast because no bits change with this conversion. The conversion is done as if the value had been stored to memory and read back as type ty2.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try running tests on power64 (not power64le), it'll likely break

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlined from conversation, relevant snippets from @programmerjake:

iirc the issue is that architectures have both a bit-level endianness and a byte-level endianness, both of which can independently vary
install qemu-user and use that to run tests†

† for ppc64

I am not as sure that bit-endianness should be a problem, however, per the LLVM LangRef: Vector Type:

In general vector elements are laid out in memory in the same way as array types. Such an analogy works fine as long as the vector elements are byte sized. However, when the elements of the vector aren’t byte sized it gets a bit more complicated. One way to describe the layout is by describing what happens when a vector such as is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed.

That said, the LLVM LangRef on bitcasts also expounds on vectors:

There is a caveat for bitcasts involving vector types in relation to endianess. For example bitcast <2 x i8> to i16 puts element zero of the vector in the least significant bits of the i16 for little-endian while element zero ends up in the most significant bits for big-endian.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I found a comment in a test that completely confirms @programmerjake's remarks, actually:

// ignore-endian-big behavior of simd_select_bitmask is endian-specific

🙃

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just copied this to the test. We can work out the endianness (and add tests if necessary) in another issue.

assert_eq!(r.0, e);
}
}