Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add all integer overloads for Avx2.BlendVariable #10679

Closed
fiigii opened this issue Jul 13, 2018 · 8 comments
Closed

Add all integer overloads for Avx2.BlendVariable #10679

fiigii opened this issue Jul 13, 2018 · 8 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Comments

@fiigii
Copy link
Contributor

fiigii commented Jul 13, 2018

Now, we only have the byte/sbyte overload for Avx2.BlendVariable since the instruction uses the mask vector as a byte sequence.
However, Avx2.BlendVariable is very common to use, adding all integer overloads (short/ushort/int/uint/long/ulong) will significantly improve the user experience.

cc @tannergooding @CarolEidt @eerhardt

@saucecontrol
Copy link
Member

If you do it on Avx2.BlendVariable, I'd think it should be done on all the byte-shuffle ops. Ssse3.Shuffle would also need them, for example.

And there are less obvious ones like Ssse3.AlignRight.

Now that the StaticCast calls get folded, it's strictly a usability issue. Perhaps the answer is to improve the StaticCast to be more terse?

@fiigii
Copy link
Contributor Author

fiigii commented Jul 13, 2018

If you do it on Avx2.BlendVariable, I'd think it should be done on all the byte-shuffle ops. Ssse3.Shuffle would also need them, for example.

Agree, I will investigate it and put here later. Thanks!

Perhaps the answer is to improve the StaticCast to be more terse?

Don't you mean eliminating StaticCast and using implicit conversion of Vector128/256 instead?

@saucecontrol
Copy link
Member

saucecontrol commented Jul 13, 2018

Don't you mean eliminating StaticCast and using implicit conversion of Vector128/256 instead?

Yeah, that might be ideal. I remember seeing the discussions around when generics would be used and when the argument types would be exploded, and while the rules make a lot of sense, there's so much flexibility built into the way the Intel intrinsics are defined with __m128i and __m256i that it would mean a ton of overloads to cover everything people will be doing with these instructions.

@tannergooding
Copy link
Member

Perhaps the answer is to improve the StaticCast to be more terse?

One of the proposals here was to partially explode the StaticCast methods. That is, where today you have Vector128<U> StaticCast<T, U>(Vector128<T>) you could instead do Vector128<T> StaticCast<T>(Vector128<float>) (and explode the other 9 types as well). This makes it significantly less verbose.

using implicit conversion of Vector128/256 instead?

I don't think implicit conversions are a good idea, and they aren't supported with some languages (such as F#).

@tannergooding
Copy link
Member

As for the general proposal. I think it would depend on how confusing it is perceived to pass in a Vector128<int> but operate as if it was a Vector128<byte>.

@saucecontrol
Copy link
Member

saucecontrol commented Jul 13, 2018

you could instead do Vector128<T> StaticCast<T>(Vector128<float>)

I like this idea a lot. It should totally be a thing.

Also, what was the call on the naming? I remember seeing a suggestion that it be called ReinterpretCast, but why not just Cast? That's what's used in System.Memory for reinterpreting Span<T>s

Also also, let's say that doesn't make it past API review... If I were to define my own partially exploded Cast, would that get inlined and then folded away same as the official one?

I think it would depend on how confusing it is perceived to pass in a Vector128<int> but operate as if it was a Vector128<byte>.

I think it goes back to the target audience for these. You have to assume the user knows what the instruction does, and if the instruction operates on the byte level, I'd know that if I were calling it, regardless of the argument types.

@tannergooding
Copy link
Member

Also also, let's say that doesn't make it past API review... If I were to define my own partially exploded Cast, would that get inlined and then folded away same as the official one?

Provided it called the underlying StaticCast methods, it should.

@4creators
Copy link
Contributor

One of the proposals here was to partially explode the StaticCast methods. That is, where today you have Vector128 StaticCast<T, U>(Vector128) you could instead do Vector128 StaticCast(Vector128) (and explode the other 9 types as well).

Proposal to simplify StaticCast<T, U> API is here: https://github.com/dotnet/corefx/issues/27911#issuecomment-372684004

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 16, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

4 participants