-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Add all integer overloads for AlignRight/BlendVariable and unsigned overloads for MultiplyLow #19420
Conversation
CC. @terrajobst as well. This probably deserves some more API design discussion. Effectively, this is just avoiding the need to insert |
This change was approved by the API review meeting. Can we approve here? @tannergooding @CarolEidt @eerhardt |
|
||
/// <summary> | ||
/// __m256i _mm256_alignr_epi8 (__m256i a, __m256i b, const int count) | ||
/// VPALIGNR ymm, ymm, ymm/m256, imm8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It might be nice to have a comment on these explaining that the actual instruction operates on bytes and that you should adjust the mask appropriately
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@@ -44,23 +44,12 @@ public abstract class Sse41 : Ssse3 | |||
/// <summary> | |||
/// __m128i _mm_blendv_epi8 (__m128i a, __m128i b, __m128i mask) | |||
/// PBLENDVB xmm, xmm/m128, xmm | |||
/// </summary> | |||
public static Vector128<sbyte> BlendVariable(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<sbyte> mask) { throw new PlatformNotSupportedException(); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the API review determined that we should just explode these, rather than making them generic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, since AVX2 BlendVariable
is for integer vectors and floating versions are in AVX.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. But the diff looks like you are making them generic, instead of exploding them...
Maybe, I'm missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you are talking about SSE4.1 versions. SSE4.1 supports integer and floating both, so after exploding integer types, that will support all the 10 base types. And we have decided that should be generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the last API review, and the ARM Review before that, determined that we should just always explode and be explicit.
This also makes it clear when/if support for new types (like Half, Quad, etc) are added in the future that they aren't supported in the older ISAs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC. @terrajobst for confirmation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, but that is another story, and we have had some other generic APIs. Shall we change them at all?
Btw, I believe that Intel has different Half roadmaps from ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we change them at all
I believe so, but that is likely for a separate PR (as is moving the helper functions to the Vector64
, Vector128
, Vector256
types). I pinged Immo asking for the meeting notes.
I believe that Intel has different Half roadmaps from ARM.
Right, but I believe this was one of the "generalized" API design decisions that was made for intrinsics at the last two meetings. Which was basically: match the underlying instruction/hardware as closely as possible and always explode to avoid ambiguities from possible future expansions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks.
I believe so, but that is likely for a separate PR
Let me put these changes together and open a new issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add unsigned overloads for |
@dotnet-bot test this please |
Merging, Two of the pending jobs actually completed. The other was an out of disk space issue, but a similar job (testing additional functionality) succeeded. |
Thansk @fiigii |
This PR adds all integer overloads for AVX2/SSSE3
AlignRight
and Avx2/SSE4.1BlendVariable
that originally only hasbyte/sbyte
overloads. We make this change because these intrinsics are very common to use in integer SIMD programming, so the change will significantly improve user experience from avoidingStaticCast
Note, although integer
BlendVariable
works on the vector mask of "8-bit", that usually directly consumes the mask generated by other intrinsic (likeCompare
). The "generated" mask has all-one bits on the corresponding selected elements, so that can perfectly work with other (wider) base types. If users want to manually create the mask, the IMM version (Blend
) should be the better choice.Additionally, there are some discussions (https://github.com/dotnet/coreclr/issues/18910#issuecomment-404963232) that request the same change on AVX2/SSSE3
Shuffle
(e.g.,Vector128<byte> Shuffle(Vector128<byte> value, Vector128<byte> mask)
). However, this one is different, whichShuffle
usually needs a specially built "mask" rather than "generated mask" to indicate the selected elements.This PR also temporarily disables the tests of SSE4.1
BlendVariable
.Close https://github.com/dotnet/coreclr/issues/18910