-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow simd_bitmask to return byte arrays #88868
Changes from 4 commits
7964942
3981ca0
569c51d
1e18869
fe8ae57
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
//run-pass | ||
//ignore-endian-big behavior of simd_select_bitmask is endian-specific | ||
#![feature(repr_simd, platform_intrinsics)] | ||
|
||
extern "platform-intrinsic" { | ||
fn simd_bitmask<T, U>(v: T) -> U; | ||
fn simd_select_bitmask<T, U>(m: T, a: U, b: U) -> U; | ||
} | ||
|
||
#[derive(Copy, Clone)] | ||
#[repr(simd)] | ||
struct Simd<T, const N: usize>([T; N]); | ||
|
||
fn main() { | ||
unsafe { | ||
let v = Simd::<i8, 4>([-1, 0, -1, 0]); | ||
let i: u8 = simd_bitmask(v); | ||
let a: [u8; 1] = simd_bitmask(v); | ||
|
||
assert_eq!(i, 0b0101); | ||
assert_eq!(a, [0b0101]); | ||
|
||
let v = Simd::<i8, 16>([0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, -1, 0]); | ||
let i: u16 = simd_bitmask(v); | ||
let a: [u8; 2] = simd_bitmask(v); | ||
|
||
assert_eq!(i, 0b0101000000001100); | ||
assert_eq!(a, [0b1100, 0b01010000]); | ||
} | ||
|
||
unsafe { | ||
let a = Simd::<i32, 8>([0, 1, 2, 3, 4, 5, 6, 7]); | ||
let b = Simd::<i32, 8>([8, 9, 10, 11, 12, 13, 14, 15]); | ||
let e = [0, 9, 2, 11, 12, 13, 14, 15]; | ||
|
||
let r = simd_select_bitmask(0b0101u8, a, b); | ||
assert_eq!(r.0, e); | ||
|
||
let r = simd_select_bitmask([0b0101u8], a, b); | ||
assert_eq!(r.0, e); | ||
|
||
let a = Simd::<i32, 16>([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]); | ||
let b = Simd::<i32, 16>([16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31]); | ||
let e = [16, 17, 2, 3, 20, 21, 22, 23, 24, 25, 26, 27, 12, 29, 14, 31]; | ||
|
||
let r = simd_select_bitmask(0b0101000000001100u16, a, b); | ||
assert_eq!(r.0, e); | ||
|
||
let r = simd_select_bitmask([0b1100u8, 0b01010000u8], a, b); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. won't this break on big-endian? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not exactly sure how LLVM handles the bitcast. We've actually seen the opposite issue (bit order, not byte order) on BE: https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/masks/full_masks.rs#L118-L125 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. bitcasts in llvm are supposed to be semantically equivalent to a memcpy iirc
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. try running tests on power64 (not power64le), it'll likely break There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Inlined from conversation, relevant snippets from @programmerjake:
† for ppc64 I am not as sure that bit-endianness should be a problem, however, per the LLVM LangRef: Vector Type:
I think the greater problem might be that it is not clear the i1 to integer bitcast was ever well-formed. That said, the LLVM LangRef on bitcasts also expounds on vectors:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hmm, I found a comment in a test that completely confirms @programmerjake's remarks, actually:
🙃 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I just copied this to the test. We can work out the endianness (and add tests if necessary) in another issue. |
||
assert_eq!(r.0, e); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For anyone more familiar with this than me: is there any reason this alignment or the one below must be greater than a byte?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, when these are handled by SSE or AVX512 instructions, they are normal integers of 16 to 64 bits. Even though we are saying we want to interpret this as a byte array, we should probably use the natural word-alignment of the processor: that way,
kmovq mem, reg
(reg to mem) naturally hits the alignment to pick it up in a GPR correctly, and if we for some reason are executing AVX512 features on a 32-bit processor, we still are correctly aligned for such (since then it would go into two GPRs).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anything is preventing LLVM from using a greater alignment if it's more efficient, but the actual alignment is just 1, right? Regular
u8
arrays are definitely used with larger alignments in most situations, but their alignment is still 1.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you're coherent about this (I believe
PlaceRef
tracks it for you, so it may be worth trying to use it more), you can increase the alignment of thealloca
(and then pass that alignment to thestore
/load
- whether manually, or relying on thePlaceRef
API, or letting LLVM infer it).I'm not sure how useful it is here, but it may make LLVM more likely to generate the aligned versions of instructions that have both aligned and unaligned variants (assuming LLVM doesn't rewrite this cast away).
Also, if you're using
alloca
s, please consider usingplace.storage_{live,dead}()
before/after all of the memory operations on thealloca
- again I'm unsure whether this is significant in the context (since e.g. a small wrapper that gets inlined will do the right thing itself during LLVM inlining), but the default without those (i.e. withoutllvm.lifetime.{start,end}
) is everyalloca
adds to the stack frame size (for the entire function).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. At least hypothetically this should in fact be "a small wrapper that gets inlined". I don't think either of us is familiar with the PlaceRef API tho'. But that shouldn't be a blocking concern.