What should SIMD bitmasks look like? #126217
The standard format that LLVM uses on little-endian (and x86 and a few other arches too) is that bits are counted from the LSB end to the MSB end, not MSB to LSB. The idea is that if you have some integer type wide enough to hold one bit per element, then vector element `i` corresponds to bit `i` of the integer, i.e. the bit with value `1 << i`. I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.
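For concreteness, here is a small Rust sketch of that LSB-first convention (the helper name is made up for illustration):

```rust
/// Pack a vector of lane flags into an integer bitmask, LSB-first:
/// element `i` of the vector becomes bit `i` (the bit with value `1 << i`).
fn pack_lsb_first(lanes: &[bool]) -> u64 {
    assert!(lanes.len() <= 64);
    let mut mask = 0u64;
    for (i, &lane) in lanes.iter().enumerate() {
        mask |= (lane as u64) << i;
    }
    mask
}

// e.g. pack_lsb_first(&[true, false, true, true]) == 0b1101
```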
> if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.

I don't know of any initiative working on them. What is the current status?
There's a postponed RFC that people have recently been asking about restarting: rust-lang/rfcs#2581. IIRC 3-4 different people have brought it up in the last month or two (mostly on Zulip or other random corners of the Rust project).
I don't have an opinion either way -- this sounds perfectly reasonable to me. We'd then say: vector element `i` corresponds to bit `i` of the integer, counting from the least significant bit. I think the LLVM IR for big-endian would then be something like a plain bitcast plus a fixup of the bit order. Is there some particular instruction sequence we want to generate here, or would something like that work?
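As a rough sketch of what that fixup would amount to in Rust terms (assuming, on my part, that the raw bitcast result comes out MSB-first on big-endian targets):

```rust
/// Hypothetical fixup: normalize a raw 8-lane bitmask to the LSB-first
/// format, assuming the raw value has element 0 in the most significant
/// bit on big-endian targets and in the least significant bit otherwise.
fn normalize_bitmask8(raw: u8) -> u8 {
    if cfg!(target_endian = "big") {
        // Reverse the bit order so element `i` ends up at bit `i`.
        raw.reverse_bits()
    } else {
        raw
    }
}
```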
FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal. I have no idea what, as I don't really know the design space here. I see my role as that of an advisor with a t-opsem viewpoint. The reason the current semantics seem odd is that Miri currently has exactly 4 places where endianness matters, and two of them are these bitmask intrinsics.
So, the intrinsics are certainly somewhat striking. But maybe that's expected for converting between arrays of bits and a more compact representation; I don't have any intuition for what to expect here.
The reason we stopped is that, because generic const exprs aren't working that well, we have to have the output byte array be the same length as the input mask, despite that being 8x overkill -- this would be solved by working generic const exprs.
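A sketch of the signature problem being described (the functions here are illustrative, not the actual intrinsics): with today's const generics the output array length can only reuse `N`, while the natural length would need an expression like `N.div_ceil(8)`, which requires the unstable `generic_const_exprs` feature:

```rust
// What portable-simd was stuck with: one output byte per input element,
// so an N-element mask yields an N-byte array, 8x larger than needed.
fn bitmask_oversized<const N: usize>(mask: [bool; N]) -> [u8; N] {
    let mut out = [0u8; N];
    for (i, &b) in mask.iter().enumerate() {
        out[i / 8] |= (b as u8) << (i % 8);
    }
    out
}

// What one would actually want cannot be written on stable today:
// fn bitmask<const N: usize>(mask: [bool; N]) -> [u8; N.div_ceil(8)] { ... }
```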
The portable-simd intrinsics `simd_bitmask` and `simd_select_bitmask` work with a bit-efficient representation of a SIMD vector of `bool`. Specifically, they actually support two representations: as an integer of sufficient size, and as an array of `u8`. However, the exact format of the array-based bitmask is endianness-dependent in subtle ways, so much so that portable-simd stopped using that format. The integer-based format is pretty simple (albeit still endianness-dependent) but does not scale to vectors of arbitrary size.
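For reference, the underlying rustc intrinsics are generic over both the vector type and the bitmask type, which is how both representations are supported; roughly (sketched from memory, exact attributes and declarations elided):

```rust
// `U` can be instantiated with an unsigned integer or with `[u8; N]`.
extern "rust-intrinsic" {
    fn simd_bitmask<T, U>(x: T) -> U;
    fn simd_select_bitmask<M, T>(mask: M, if_true: T, if_false: T) -> T;
}
```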
IMO we should only have one format supported by the low-level intrinsics, and leave it to higher-level code like portable-simd to convert that to other formats if needed. That one format probably has to be the array-based one, since that is the only way to actually support vectors of arbitrary size. But what should that format look like? Currently it is endianness-dependent, and then portable-simd has to do some work to convert that into an endianness-independent format and back. Can we make the intrinsic directly use an endianness-independent format? (It is extremely rare for our intrinsics to behave in an endianness-dependent way.)
What are the key constraints the format has to satisfy? Without knowing those, here's a completely naive proposal: the array must be big enough to contain at least as many bits as the vector has elements (but that's just a lower bound; arbitrarily bigger arrays are allowed), and then vector elements are mapped to bits in the array as follows: vector element `i` is represented in array element `i / 8`, in bit `i % 8`, where bits are indexed from most significant to least significant. So for instance the vector `[-1, -1, -1, 0, 0, 0, 0, 0, 0, -1]` becomes the bitmask `[0b11100000, 0b01_000000]`, simply filling in the vector elements left-to-right bit-for-bit and then padding with `0` until the array is full. I don't know if that format is any good -- probably it is not -- but it is certainly easy to explain. :) A sketch of this mapping in code follows below.

Cc @calebzulawski @workingjubilee @programmerjake @rust-lang/opsem
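A minimal sketch of this naive proposal (the function name is made up), reproducing the example above:

```rust
/// Pack lane flags into a byte array, MSB-first within each byte:
/// element `i` goes to byte `i / 8`, at bit `i % 8` counted from the
/// most significant bit down; remaining bits are left as `0` padding.
fn pack_msb_first<const BYTES: usize>(lanes: &[bool]) -> [u8; BYTES] {
    assert!(lanes.len() <= BYTES * 8);
    let mut out = [0u8; BYTES];
    for (i, &lane) in lanes.iter().enumerate() {
        out[i / 8] |= (lane as u8) << (7 - i % 8);
    }
    out
}

// The example from above: ten lanes, the first three and the last one set.
// let v = [true, true, true, false, false, false, false, false, false, true];
// assert_eq!(pack_msb_first::<2>(&v), [0b1110_0000, 0b0100_0000]);
```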