Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Add all integer overloads for AlignRight/BlendVariable and unsigned overloads for MultiplyLow #19420

Merged
merged 3 commits into from
Sep 20, 2018

Conversation

fiigii
Copy link

@fiigii fiigii commented Aug 10, 2018

This PR adds all integer overloads for AVX2/SSSE3 AlignRight and Avx2/SSE4.1 BlendVariable that originally only has byte/sbyte overloads. We make this change because these intrinsics are very common to use in integer SIMD programming, so the change will significantly improve user experience from avoiding StaticCast

Note, although integer BlendVariable works on the vector mask of "8-bit", that usually directly consumes the mask generated by other intrinsic (like Compare). The "generated" mask has all-one bits on the corresponding selected elements, so that can perfectly work with other (wider) base types. If users want to manually create the mask, the IMM version (Blend) should be the better choice.

Additionally, there are some discussions (https://github.com/dotnet/coreclr/issues/18910#issuecomment-404963232) that request the same change on AVX2/SSSE3 Shuffle (e.g., Vector128<byte> Shuffle(Vector128<byte> value, Vector128<byte> mask)). However, this one is different, which Shuffle usually needs a specially built "mask" rather than "generated mask" to indicate the selected elements.

This PR also temporarily disables the tests of SSE4.1 BlendVariable.

Close https://github.com/dotnet/coreclr/issues/18910

@fiigii
Copy link
Author

fiigii commented Aug 13, 2018

@tannergooding
Copy link
Member

CC. @terrajobst as well.

This probably deserves some more API design discussion. Effectively, this is just avoiding the need to insert StaticCast and could probably be solved trivially with helper methods if a user needed them.

@fiigii
Copy link
Author

fiigii commented Sep 19, 2018

This change was approved by the API review meeting. Can we approve here? @tannergooding @CarolEidt @eerhardt


/// <summary>
/// __m256i _mm256_alignr_epi8 (__m256i a, __m256i b, const int count)
/// VPALIGNR ymm, ymm, ymm/m256, imm8
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It might be nice to have a comment on these explaining that the actual instruction operates on bytes and that you should adjust the mask appropriately

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -44,23 +44,12 @@ public abstract class Sse41 : Ssse3
/// <summary>
/// __m128i _mm_blendv_epi8 (__m128i a, __m128i b, __m128i mask)
/// PBLENDVB xmm, xmm/m128, xmm
/// </summary>
public static Vector128<sbyte> BlendVariable(Vector128<sbyte> left, Vector128<sbyte> right, Vector128<sbyte> mask) { throw new PlatformNotSupportedException(); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the API review determined that we should just explode these, rather than making them generic?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, since AVX2 BlendVariable is for integer vectors and floating versions are in AVX.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But the diff looks like you are making them generic, instead of exploding them...

Maybe, I'm missing something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you are talking about SSE4.1 versions. SSE4.1 supports integer and floating both, so after exploding integer types, that will support all the 10 base types. And we have decided that should be generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the last API review, and the ARM Review before that, determined that we should just always explode and be explicit.

This also makes it clear when/if support for new types (like Half, Quad, etc) are added in the future that they aren't supported in the older ISAs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CC. @terrajobst for confirmation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, but that is another story, and we have had some other generic APIs. Shall we change them at all?

Btw, I believe that Intel has different Half roadmaps from ARM.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we change them at all

I believe so, but that is likely for a separate PR (as is moving the helper functions to the Vector64, Vector128, Vector256 types). I pinged Immo asking for the meeting notes.

I believe that Intel has different Half roadmaps from ARM.

Right, but I believe this was one of the "generalized" API design decisions that was made for intrinsics at the last two meetings. Which was basically: match the underlying instruction/hardware as closely as possible and always explode to avoid ambiguities from possible future expansions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

I believe so, but that is likely for a separate PR

Let me put these changes together and open a new issue.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@fiigii fiigii changed the title Add all integer overloads for AVX2/SSSE3 AlignRight and Avx2/SSE4.1 BlendVariable Add all integer overloads for AlignRight/BlendVariable and unsigned overloads for MultiplyLow Sep 19, 2018
@fiigii
Copy link
Author

fiigii commented Sep 19, 2018

Add unsigned overloads for MultiplyLow in this PR to solve https://github.com/dotnet/coreclr/issues/18905

@fiigii
Copy link
Author

fiigii commented Sep 20, 2018

@dotnet-bot test this please

@tannergooding
Copy link
Member

Merging, Two of the pending jobs actually completed. The other was an out of disk space issue, but a similar job (testing additional functionality) succeeded.

@tannergooding
Copy link
Member

Thansk @fiigii

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants