JIT ARM64-SVE: Add CreateWhileLessThan* #100949
Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
The instruction emitted depends on whether the inputs are signed or unsigned. This information is lost when the nodes are created (an input with The easiest way to mark this is to use
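To make the signedness point concrete, here is a minimal standalone sketch of how the choice of SVE instruction could follow from the input signedness. `Compare` and `selectWhileInstruction` are hypothetical names invented for illustration and are not part of the JIT; only the four mnemonics (`whilelt`, `whilele`, `whilelo`, `whilels`) come from the SVE instruction set.

```cpp
#include <string>

// Hypothetical helper for illustration only; this is not JIT code.
enum class Compare { LessThan, LessThanOrEqual };

// Pick the SVE "while" mnemonic from the signedness of the scalar inputs.
// Signed inputs use whilelt/whilele, unsigned inputs use whilelo/whilels.
std::string selectWhileInstruction(bool inputsAreUnsigned, Compare cmp)
{
    if (cmp == Compare::LessThan)
    {
        return inputsAreUnsigned ? "whilelo" : "whilelt";
    }
    return inputsAreUnsigned ? "whilels" : "whilele";
}
```

If the signedness of the inputs is not recorded on the node, a selector like this has nothing to key off, which is the problem described above.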
@kunalspathak @dotnet/arm64-contrib
This information should be tracked by
Problem here is that, for example:
For both of these, the
I've pushed a version that does this. But we still don't have enough information as there are two distinct types - in the example above - the input type ( With the latest code....
The return type is implicit based on the In general, The In the incredibly rare case there needs to be a 3rd type present (such as is done for
Yes! That's a simple fix then
Looks like using this requires arg1/arg2 to be a SIMD type, whereas here we just have a scalar. Is it ok to extend these flags to support scalar args too?
Yep, that should be fine! The parameter is named
Done. Now supports Also had to make changes to convert-mask-to-vector code. Need to ensure it always uses the
LGTM
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Arm/Sve.cs
superpmi-replay failure is #101070
case NI_Sve_CreateWhileLessThanMask8Bit:
case NI_Sve_CreateWhileLessThanOrEqualMask8Bit:
Is this going to be a more common pattern? Should we have a way to make it more table driven?
Problem is we need to know the name of the intrinsic in order to know the opt value (e.g. INS_OPTS_SCALABLE_H).
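One way to picture the table-driven alternative raised above is a lookup from intrinsic to element-size option. The following is only a sketch under stated assumptions: `IntrinsicId`, `ScalableOpt`, `OptEntry` and `tryGetScalableOpt` are made-up stand-ins rather than the JIT's real types, and the 16/32/64-bit entries are assumed by analogy with the 8-bit intrinsic named in this thread.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the JIT's intrinsic ids and emitter options.
enum class IntrinsicId
{
    CreateWhileLessThanMask8Bit,
    CreateWhileLessThanMask16Bit,
    CreateWhileLessThanMask32Bit,
    CreateWhileLessThanMask64Bit,
    Unrelated // placeholder for any intrinsic with no table entry
};

enum class ScalableOpt { B, H, S, D }; // 8, 16, 32 and 64-bit lanes

struct OptEntry
{
    IntrinsicId id;
    ScalableOpt opt;
};

// One row per intrinsic, replacing a switch over intrinsic names.
static const OptEntry kOptTable[] = {
    {IntrinsicId::CreateWhileLessThanMask8Bit, ScalableOpt::B},
    {IntrinsicId::CreateWhileLessThanMask16Bit, ScalableOpt::H},
    {IntrinsicId::CreateWhileLessThanMask32Bit, ScalableOpt::S},
    {IntrinsicId::CreateWhileLessThanMask64Bit, ScalableOpt::D},
};

// Returns true and writes *opt when the intrinsic has a table entry.
bool tryGetScalableOpt(IntrinsicId id, ScalableOpt* opt)
{
    for (std::size_t i = 0; i < sizeof(kOptTable) / sizeof(kOptTable[0]); ++i)
    {
        if (kOptTable[i].id == id)
        {
            *opt = kOptTable[i].opt;
            return true;
        }
    }
    return false;
}
```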
Are we not tracking the vector type anywhere? We have a couple fields (simdBaseJitType and altType for example) so we should be able to track both the overload type (int vs uint vs long vs ulong) and the vector base type/size
Switched the code so that it uses the return vector type as the basetype, and set auxiliary type to arg1 type. That simplifies the code a lot and removes many of my changes.
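As a rough illustration of why tracking two types is enough, the sketch below models a node that carries a base type (the lane type of the returned vector) and an auxiliary type (the scalar argument type), and derives the lane suffix from the first and the signedness and register width from the second. All names here (`LaneType`, `ScalarType`, `WhileNode`, `formatWhileLessThan`) are hypothetical, and the register numbers in the output are fixed placeholders; this is not how the JIT emitter is actually written.

```cpp
#include <string>

// Hypothetical model: lane type of the returned mask vector.
enum class LaneType { Byte, UShort, UInt, ULong };

// Hypothetical model: type of the scalar arguments (the overload chosen).
enum class ScalarType { Int, UInt, Long, ULong };

struct WhileNode
{
    LaneType   baseType; // drives the lane suffix (b/h/s/d)
    ScalarType auxType;  // drives signedness and register width
};

static char laneSuffix(LaneType t)
{
    switch (t)
    {
        case LaneType::Byte:   return 'b';
        case LaneType::UShort: return 'h';
        case LaneType::UInt:   return 's';
        default:               return 'd';
    }
}

// Build an illustrative "while less than" instruction string.
std::string formatWhileLessThan(const WhileNode& node)
{
    const bool isUnsigned = node.auxType == ScalarType::UInt || node.auxType == ScalarType::ULong;
    const bool is64Bit    = node.auxType == ScalarType::Long || node.auxType == ScalarType::ULong;

    std::string text = isUnsigned ? "whilelo" : "whilelt";
    text += " p0.";
    text += laneSuffix(node.baseType);
    text += is64Bit ? ", x0, x1" : ", w0, w1";
    return text;
}
```

With both fields available, nothing in this sketch needs to inspect the intrinsic's name.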
@@ -1515,6 +1515,17 @@ GenTree* Compiler::impHWIntrinsic(NamedIntrinsic intrinsic,
}
break;

case NI_Sve_CreateWhileLessThanMask8Bit:
I don't like how we have these target-specific switch statements in hwintrinsic.cpp; they make the code hard to follow. I fully expect the number of these special cases to grow too.
I'd much prefer it if each of these intrinsics (including all the NEON and x86 ones) were marked with SpecialImport. The special import cases would then have to duplicate the common get-args code, but it's only a few lines (which could go into a helper). Not going to do it for this PR.
👍, we've definitely done that for almost all the x86/x64 code, there's only a couple special handlers left for that platform (like Sse42.Crc32). It would be nice to get both platforms doing this overall consistently here.
Merged to main. Results using the new stress tester:
/ba-g Failure is #101559
* JIT ARM64-SVE: Add CreateWhileLessThan*
* Set simdBaseJitType to type of input args
* Hardcode opt in codegen
* Fix gtNewSimdConvertMaskToVectorNode types
* Use HW_Flag_BaseTypeFromFirstArg
* Set base type to return type and auxiliary type to input type
Support for all CreateWhileLessThanMask and CreateWhileLessThanOrEqualMask APIs.
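For readers unfamiliar with what these masks contain, here is a simplified scalar model of the signed less-than case. It is a sketch only: `whileLessThanMask` is a hypothetical helper, the lane count is passed in explicitly (on real hardware it depends on the vector length and element size), and the or-equal and unsigned variants are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model of a signed "while less than" predicate:
// lane i is active while (start + i) < limit. Once one lane is
// inactive, all later lanes stay inactive.
std::vector<bool> whileLessThanMask(std::int64_t start, std::int64_t limit, std::size_t lanes)
{
    std::vector<bool> mask(lanes, false);
    std::int64_t current = start;
    for (std::size_t i = 0; i < lanes; ++i)
    {
        if (!(current < limit))
        {
            break; // remaining lanes keep their initial value of false
        }
        mask[i] = true;
        ++current; // cannot overflow: current < limit <= INT64_MAX here
    }
    return mask;
}
```

A typical use is loop control: with `start` as the loop index and `limit` as the trip count, the mask covers exactly the elements still left to process, so the final partial iteration needs no scalar tail loop.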