Lowering Vector512() methods : comparison + shift #84942

Merged
4 commits merged into dotnet:main on Apr 21, 2023

DeepakRajendrakumaran
Contributor

@DeepakRajendrakumaran DeepakRajendrakumaran commented Apr 17, 2023

This PR includes the following:

Comparison: LessThan(), LessThanOrEqual(), GreaterThan(), GreaterThanOrEqual(), plus the corresponding *Any() and *All() variants, and ConditionalSelect()

Arithmetic: ShiftLeft, ShiftRight, ShiftRightArithmetic
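
For readers less familiar with these operations, here is a minimal scalar sketch of the per-lane semantics they are generally understood to have. It is not the JIT lowering from this PR. The `Vec` alias and the function names are illustrative stand-ins for a 512-bit vector of sixteen 32-bit lanes, and only the LessThan family is shown.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative scalar model (an assumption of this sketch, not the real API):
// a 512-bit vector viewed as sixteen signed 32-bit lanes.
using Vec = std::array<std::int32_t, 16>;

// Per-lane compare: a true lane is all ones (-1), a false lane is zero.
Vec LessThan(const Vec& a, const Vec& b) {
    Vec r{};
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = (a[i] < b[i]) ? -1 : 0;
    }
    return r;
}

// True when at least one lane compares less-than.
bool LessThanAny(const Vec& a, const Vec& b) {
    for (std::int32_t lane : LessThan(a, b)) {
        if (lane != 0) return true;
    }
    return false;
}

// True only when every lane compares less-than.
bool LessThanAll(const Vec& a, const Vec& b) {
    for (std::int32_t lane : LessThan(a, b)) {
        if (lane == 0) return false;
    }
    return true;
}

// Logical right shift of one lane: vacated high bits are filled with zeros.
std::uint32_t ShiftRightLogicalLane(std::int32_t v, unsigned count) {
    assert(count < 32);
    return static_cast<std::uint32_t>(v) >> count;
}

// Arithmetic right shift of one lane: vacated high bits copy the sign bit.
// Written without shifting a negative value so it is portable before C++20.
std::int32_t ShiftRightArithmeticLane(std::int32_t v, unsigned count) {
    assert(count < 32);
    return (v >= 0) ? (v >> count) : ~(~v >> count);
}
```

The two right shifts differ only for negative lanes: shifting -8 right by one gives -4 arithmetically, but a large positive value logically.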

Open: Some cases of ConditionalSelect() use blend; I am skipping those for now. I have some other thoughts here anyway, such as using VPTERNLOG.
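
As a rough illustration of why VPTERNLOG is a candidate here, the sketch below models a three-input bitwise ternary-logic operation in scalar code and checks that the truth table 0xCA reproduces a bitwise select. The helper names are hypothetical, and the assumption that the first input supplies the most significant index bit follows the usual description of the instruction rather than anything in this PR.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of a ternary-logic operation (illustrative, not an intrinsic):
// at every bit position, the bits of a, b and c form a 3-bit index
// (a is the most significant bit) into the 8-entry truth table imm8.
std::uint32_t TernaryLogic(std::uint32_t a, std::uint32_t b, std::uint32_t c,
                           std::uint8_t imm8) {
    std::uint32_t result = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        unsigned index = (((a >> bit) & 1u) << 2) |
                         (((b >> bit) & 1u) << 1) |
                         ((c >> bit) & 1u);
        result |= static_cast<std::uint32_t>((imm8 >> index) & 1u) << bit;
    }
    return result;
}

// Reference bitwise select: take bits of left where mask is 1, else right.
std::uint32_t ConditionalSelect(std::uint32_t mask, std::uint32_t left,
                                std::uint32_t right) {
    return (mask & left) | (~mask & right);
}

// Truth table for "a ? b : c": entries 1, 3 (a=0, c=1) and 6, 7 (a=1, b=1)
// are set, which is binary 11001010.
constexpr std::uint8_t kSelectTable = 0xCA;
```

With this model, `TernaryLogic(mask, left, right, kSelectTable)` matches `ConditionalSelect(mask, left, right)` for any inputs, which suggests a select could be expressed as a single ternary-logic operation instead of a blend.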

@dotnet/avx512-contrib

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 17, 2023
@BruceForstall BruceForstall added the avx512 Related to the AVX-512 architecture label Apr 17, 2023
@DeepakRajendrakumaran DeepakRajendrakumaran marked this pull request as ready for review April 17, 2023 23:27
@BruceForstall
Member

cc @dotnet/jit-contrib

@tannergooding
Member

There are some merge conflicts that need to be resolved; let me know if you need any assistance with them.

@DeepakRajendrakumaran
Contributor Author

> There are some merge conflicts that need to be resolved; let me know if you need any assistance with them.

I have resolved them.

@DeepakRajendrakumaran DeepakRajendrakumaran force-pushed the Deepak_comparison branch 3 times, most recently from ad69641 to 46b5933 Compare April 19, 2023 18:24
…corresponding *ANY(), *ALL() and ConditionalSelect()

ShiftLeft, ShiftRight, ShiftRightArithmetic
Comment on lines +20980 to +20985
// TODO-XArch-CQ: It's a non-trivial amount of work to support these
// for floating-point while only utilizing AVX. It would require, among
// other things, inverting the comparison and potentially support for a
// new Avx.TestNotZ intrinsic to ensure the codegen remains efficient.
assert(compIsaSupportedDebugOnly(InstructionSet_AVX2));
intrinsic = NI_Vector256_op_Equality;
Member

For other reviewers: this is just a copy/paste of an existing comment that used to be shared across all of GE/GT/LE/LT.

It's actually a lot easier for us to handle this today if we wanted to, and it may be a small, easy win for .NET 8 (in a separate PR, of course), so having the logic duplicated now will make doing that a bit simpler.
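
As background on why "inverting the comparison" needs care for floating-point, the scalar sketch below shows one possible reason: with NaN lanes, the logical negation of `a < b` is not the same as `a >= b`. This is an illustration under that assumption, not the lowering the TODO has in mind. The `Lanes` alias and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Illustrative model of a 256-bit vector of eight floats.
using Lanes = std::array<float, 8>;

// A NaN value for experimenting with unordered comparisons.
const float kNaN = std::numeric_limits<float>::quiet_NaN();

// Direct form: every lane must satisfy a < b.
bool LessThanAll(const Lanes& a, const Lanes& b) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (!(a[i] < b[i])) return false;
    }
    return true;
}

// Inverted form: "all less-than" holds when no lane satisfies NOT(a < b).
// NOT(a < b) is true for NaN lanes, so this matches the direct form.
bool LessThanAllViaInvertedCompare(const Lanes& a, const Lanes& b) {
    bool anyNotLess = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        anyNotLess = anyNotLess || !(a[i] < b[i]);
    }
    return !anyNotLess;
}

// Naive inversion: treats NOT(a < b) as (a >= b). With a NaN lane, a >= b
// is false, so the NaN lane is wrongly counted as "less-than".
bool LessThanAllNaiveInversion(const Lanes& a, const Lanes& b) {
    bool anyGreaterOrEqual = false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        anyGreaterOrEqual = anyGreaterOrEqual || (a[i] >= b[i]);
    }
    return !anyGreaterOrEqual;
}
```

The three functions agree when no lane is NaN. Once a lane is NaN, only the naive inversion reports that all lanes are less-than, so an inverted comparison has to be the unordered-aware one.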

@BruceForstall
Member

Manually triggered replay to try to get past infra issue: https://dev.azure.com/dnceng-public/public/_build/results?buildId=245191&view=results

@DeepakRajendrakumaran
Contributor Author

> Manually triggered replay to try to get past infra issue: https://dev.azure.com/dnceng-public/public/_build/results?buildId=245191&view=results

Thanks @BruceForstall. It looks like it passed on rerun. Let me know if any other changes are needed or if this is good to go.

intrinsic = NI_Vector512_op_Equality;
}
else if (simdSize == 32)
if (simdSize == 32)
Member

Just curious: why did we decide to check for 32, then 64, and then (implicitly) 16 or less? Is it because Vector256 is more common and should hit that condition first from a throughput (TP) perspective?

Contributor Author

Yep. When we initially had throughput issues, this was one of the things we tried: checking for 32 first, since it is the most common case.
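
The ordering discussed in this thread can be sketched roughly as follows. The enum and function are hypothetical stand-ins, not the actual JIT code, and whether testing the common case first is measurably faster depends on the compiler and the workload.

```cpp
#include <cassert>

// Hypothetical stand-ins for the per-width equality intrinsic IDs.
enum class EqualityIntrinsic { Vector128, Vector256, Vector512 };

// Test the most common size (32 bytes) first, then 64, and treat
// everything else as the 16-bytes-or-less case.
EqualityIntrinsic SelectEqualityIntrinsic(unsigned simdSize) {
    if (simdSize == 32) {
        return EqualityIntrinsic::Vector256;
    } else if (simdSize == 64) {
        return EqualityIntrinsic::Vector512;
    } else {
        assert(simdSize <= 16);
        return EqualityIntrinsic::Vector128;
    }
}
```

The result is the same regardless of branch order; only the number of comparisons taken on the hot path changes.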

Member

@kunalspathak kunalspathak left a comment

A suggestion and a question. Feel free to address them in a follow-up PR or include them in the next PR.

@tannergooding
Member

tannergooding commented Apr 20, 2023

Resolved conflict with the multiply PR

@BruceForstall
Member

BruceForstall commented Apr 20, 2023

Resolved conflict with #85070

@tannergooding tannergooding merged commit f8d1116 into dotnet:main Apr 21, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 21, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture

5 participants