-
Notifications
You must be signed in to change notification settings - Fork 356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
_mm_movemask_epi8
unexpectedly triggers Undefined Behavior
#2617
Comments
I am fairly sure I was told that |
Didn't know that. If that is indeed the case please revert rust-lang/stdarch#1331. |
https://doc.rust-lang.org/nightly/std/simd/struct.Mask.html#method.from_int_unchecked indicates that this is a preconditon of from_int_unchecked, which I took as a sign that the underlying intrinsic has that requirement. |
It isn't documented anywhere (none of the SIMD intrinsics are, which isn't good) but yes, my understanding is that the various bitmask intrinsics expect all 0 or 1 lanes. |
(regardless of the other bits) Changing the requirement may introduce unsoundness into existing crates. |
To clarify, I mean the |
Close as it is not an issue of miri. |
The documentation in rustc's codegen says that |
That's an internal documentation, not necessarily a stable API-side promise. https://github.com/rust-lang/portable-simd/blob/master/crates/core_simd/src/intrinsics.rs is the closest thing we have to API docs for these intrinsics. It's not terribly detailed, but it is clear on this question // truncate integer vector to bitmask
// `fn simd_bitmask(vector) -> unsigned integer` takes a vector of integers and
// returns either an unsigned integer or array of `u8`.
// Every element in the vector becomes a single bit in the returned bitmask.
// If the vector has less than 8 lanes, a u8 is returned with zeroed trailing bits.
// The bit order of the result depends on the byte endianness. LSB-first for little
// endian and MSB-first for big endian.
//
// UB if called on a vector with values other than 0 and -1.
#[allow(unused)]
pub(crate) fn simd_bitmask<T, U>(x: T) -> U; |
I'm not sure who is in charge of these intrinsics, btw -- usually intrinsics are t-lang territory, but I don't think they are involved here. This is so far just @rust-lang/project-portable-simd experimenting. As long as that is the case, the intrinsics should probably not be used outside of https://github.com/rust-lang/portable-simd/ . Everybody else should use the APIs exposed in |
I don't think I think we'd say just to use |
I think the precondition for |
Yes, definitely enforcing this on |
Yeah I don't think anyone meant to say that The question is whether
|
I'm just saying that |
The reason it does that is because that's how For note, I'm fine with saying the definition for |
Miri accepts a |
That sounds terrible, since it muddies the water around what UB is. We have two choices here:
|
It's an internal function never intended for stabilization, so we have some more options I think, including defining it to be the MSB for now and changing that in the future if needed. Both miri and stdarch are tied to releases of the compiler, so they can rely on internal details. I'm necessarily not saying this is the approach we should take (it has maintenance downsides), but it's definitely an option. (I also think calling this the only piece of target-specific behavior in Rust is fairly dubious, and requires ignoring the behavior of many builtin attributes, floating point numbers, much of |
For now:
The three parts work well but can not be put together. What should we do to solve this? BTW there is another potential conflict: |
Do we implement these using a common intrinsic? I believe they all have different underlying behavior in some cases. |
No. They are implemented by special llvm intrinsics now. |
Then there's no conflict? |
Yes. I mean a conflict like |
The |
Fair -- but it is the only piece of target-specific behavior that a MIR-level Rust interpreter or verifier or model like MiniRust has to worry about, so far. Miri is literally entirely target-agnostics except for endianess and pointer size. (The differences in type layouts are factored into a separate part of rustc so Miri just sees the results of that computation -- this is hiding some other target-specific bits like the alignment of
So is that the behavior of LLVM on all targets right now? Also what about other 'boolean' SIMD intrinsics? IMO they should all behave the same.
Do they all look at the MSB only on all targets? |
Most/all of the others would suffer from significantly pessimized codegen on some/all platforms by defining them this way, so I think that's a non-starter. |
I think my stance is that ideally we wouldn't be implementing I can't speak for anybody else, but as a member of wg-portable-simd (not one that implemented these functions, though), it wouldn't really bother me that much to take an approach of carving out a set that's useful for both in a piecemeal fashion, e.g. possibly starting with what I described here. What would bother me to define the |
Ah I see. I think it'd be odd if different SIMD intrinsics had different rules for what the representation of a "SIMD boolean stored as an int vector" is. To me as Miri maintainer and language specifier, these intrinsics are the interface and I don't like seeing them become messy. We could, however, have a
Yeah I think so too.
Ah, fair... I didn't realize that these changes were done to have cranelift (and Miri!) support more of |
Would it make sense to split simd_bitmask into one intrinsic which only looks at the MSB of every lane and one which makes it UB to have lanes that are not all zero or all one? If not as I already said I'm fine with reverting this specific stdarch change. For most simd_* cases in stdarch there is only one possible behavior of the simd_* intrinsic that makes sense. For example for simd_add anything other than a regular twos complement lanewise addition for integer vector types should in all likelyhood use a different name like simd_add_unchecked. For those cases IMO stdarch should keep using them. I have tried to use simd_* intrinsics for float ops in stdarch, but there behavior is more ambiguous and in fact tests failed around NaN. As such I think the float operations in stdarch should keep using LLVM intrinsics. They aren't all that important for cg_clif anyway as integer and bitwise operations are much more common. |
I think that's a good solution for now. |
If having a "convert MSB-bool-vector to full-width-bool-vector" intrinsic does not give the desired result then that seems like the 2nd best thing, yeah. |
Shift right by |
Yeah, I'm not sure LLVM can see through that. My experience is that LLVM often gets confused with bool vector operations, frustratingly. Note that this is often a very hot operation in inner loops, and people tend to reach for simd in code that is performance critical, so I think having two intrinsics would be a better option. As having LLVM codegen two instructions here when it should emit one would be... highly undesirable.
That's kind of unfortunate. The nature of SIMD is that it's inherently extremely performance sensitive, and fairly platform specific. There is some amount of consistency between the actual operations exposed by different SIMD ISAs, but it's somewhat expected that those might not have the internal consistency that you desire. I also think that coming up with an efficient and consistent user-facing API surface for this is hard enough (the semantics of boolean vectors and masks in particular took us a very long time to work out, with a lot of heated debate) that I would really want to push back on adding the similar consistency requirements to private internals APIs. |
As an example, we're already facing a performance issue with masks in rust-lang/portable-simd#312 which is likely going to require codegen change to one of the |
I think these are not quite as private as you might like -- any tool that wants to analyze or verify Rust code for correctness (e.g. Miri but also all the static analysis and verification tools being built) need to have proper models of these functions. So they are about as private as the rest of MIR I would say. (Intrinsics are part of the MIR syntax for all intents and purposes.) We are certainly not making stability promises, but we should keep in mind that not all consumers of this API are in-tree. For SIMD specifically, I don't know for sure that there is an out-of-tree consumer, but I know for sure that there are people that would like to run their static analysis and verification tools on SIMD code and they asked about how to best handle that. Given that modeling all of stdarch is not feasible, the portable-simd intrinsics are a great way to increase the amount of code these tools can handle. That's why Miri supports them. Anyway, two intrinsics seem like a reasonable compromise to me for this particular case. |
What we want ultimately is for there to be specified-in-Rust models of how these intrinsics can be executed using scalar primitives, and for this case, I think we should easily have the option to wrap more basal intrinsics with higher level abstractions. In this case, the x86 |
It looks like the pattern https://godbolt.org/z/7KKrcojY9 #![feature(portable_simd)]
extern crate core;
use core::arch::x86_64::*;
use core::simd::*;
#[inline(always)]
fn i8xn_bitmask<const N: usize>(x: Simd<i8, N>) -> <Mask<i8, N> as ToBitMask>::BitMask
where
LaneCount<N>: SupportedLaneCount,
Mask<i8, N>: ToBitMask,
{
x.simd_lt(Simd::splat(0)).to_bitmask()
}
#[target_feature(enable = "sse2")]
pub unsafe fn is_ascii_sse2(a: &[u8; 16]) -> bool {
let a = _mm_loadu_si128(a.as_ptr().cast());
let m = i8xn_bitmask(a.into());
m == 0
}
#[target_feature(enable = "avx2")]
pub unsafe fn is_ascii_avx2(a: &[u8; 32]) -> bool {
let a = _mm256_loadu_si256(a.as_ptr().cast());
let m = i8xn_bitmask(a.into());
m == 0
} |
Hm, if memory serves, as it's part of std, the result will be precompiled anyways, won't it? |
I'd like us to have codegen tests that ensure this is optimized appropriately if we're going to go this route. I wouldn't feel good about shipping a perf regression to all |
Close as Miri does not need to change for this. |
playground
See also:
The text was updated successfully, but these errors were encountered: