Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove more unnecessary scenarios from HWIntrinsic test templates and fix timeout/failure #85026

Merged
merged 10 commits into from
Apr 19, 2023

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Apr 19, 2023

Follow up to #85008, this removes the unnecessary load scenarios from the rest of the hwintrinsic test templates.

It also removes the ClsVar scenarios as they weren't actually testing CLS_VAR scenarios. That would require passing in something recognizable as a GT_CNS_VEC or similar, which would require a significantly more complex template or static readonly variables + tiering to kick in.

It also removes the ClassLclFld scenario as there is no real difference between local._fld and this._fld, the latter being tested by ClassFld already.

Finally it divides the original 20 striping used in the HardwareIntrinsics_r/ro projects into the new HardwareIntrinsics_*_r/ro projects based on the original test count vs the new test count:

  • Arm - 2390 generated tests, gets 8 stripes
  • General - 2589 generated tests, gets 8 stripes
  • X86 - 504 generated tests, gets 2 stripes
  • X86_Avx - 500 generated tests, gets 2 stripes
  • X86_Avx512 - 200 generated tests (more to be added), gets 2 stripes

This also resolves a timeout issue that was cropping up and in turn fixes a downstream failure that was being hidden. It is unclear why this wasn't surfacing on the Linux or Windows x86 legs.

@ghost
Copy link

ghost commented Apr 19, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Issue Details

Follow up to #85008, this removes the unnecessary load scenarios from the rest of the hwintrinsic test templates.

It also removes the ClsVar scenarios as they weren't actually testing CLS_VAR scenarios. That would require passing in something recognizable as a GT_CNS_VEC or similar, which would require a significantly more complex template or static readonly variables + tiering to kick in.

It also removes the ClassLclFld scenario as there is no real difference between local._fld and this._fld, the latter being tested by ClassFld already.

Finally it divides the original 20 striping used in the HardwareIntrinsics_r/ro projects into the new HardwareIntrinsics_*_r/ro projects based on the original test count vs the new test count:

  • Arm - 2390 generated tests, gets 8 stripes
  • General - 2589 generated tests, gets 8 stripes
  • X86 - 504 generated tests, gets 2 stripes
  • X86_Avx - 500 generated tests, gets 2 stripes
  • X86_Avx512 - 200 generated tests (more to be added), gets 2 stripes
Author: tannergooding
Assignees: -
Labels:

area-System.Runtime.Intrinsics

Milestone: -

@tannergooding
Copy link
Member Author

tannergooding commented Apr 19, 2023

CC. @BruceForstall, @dotnet/avx512-contrib, @dotnet/jit-contrib

Also CC. @markples, @trylek for the striping change.

We're still seeing timeouts and long run times for the Avx512 stress job, regardless of striping count. It's possible that is related to #84967 and the instruction that is being incorrectly encoded; however it seems unique to Windows runs at the moment (possibly an ABI difference causing it...).

@tannergooding tannergooding force-pushed the avx512-2 branch 3 times, most recently from 084e344 to 3a36b4a Compare April 19, 2023 12:30
@tannergooding
Copy link
Member Author

tannergooding commented Apr 19, 2023

Figured out the timeout issue.

Turns out crossgen was only handling the X64 nested class and not the VL nested class and so Avx512DQ.VL.Method() was compiling down to an infinite loop (since it is simply recursive), rather than getting recognized as a "mustExpand" intrinsic.

This meant that the RunReflectionScenario_UnsafeRead scenario would hang the tests indefinitely.

@tannergooding tannergooding changed the title Remove more unnecessary scenarios from HWIntrinsic templates Remove more unnecessary scenarios from HWIntrinsic test templates and fix timeout/failure Apr 19, 2023
@@ -801,7 +801,7 @@ HARDWARE_INTRINSIC(AVX512F, ConvertToVector128UInt16,
HARDWARE_INTRINSIC(AVX512F, ConvertToVector128UInt32, -1, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovqd, INS_vpmovqd, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector256Int16, 64, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovdw, INS_vpmovdw, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector256Int32, 64, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovqd, INS_vpmovqd, INS_invalid, INS_cvtpd2dq}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector128Int32WithTruncation, 64, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cvttpd2dq}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we don't need ConvertToVector128Int32WithTruncation at all for AVX512F?

Copy link
Member Author

@tannergooding tannergooding Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such API for Avx512F

You have Sse2.ConvertToVector128Int32WithTruncation which handles Vector128<float> to Vector128<int> (4x32 to 4x32) and Avx.ConvertToVector128Int32WithTruncation which handles Vector256<double> to Vector128<int> (4x64 to 4x32)

There is then no need for any Avx512F API since we don't have any 4x128 to 4x32 like scenario. This was jsut a typo that should've been Vector256Int32 since it handles Vector512<double> to Vector256<int> (8x64 to 8x32)

@kunalspathak
Copy link
Member

was compiling down to an infinite loop (since it is simply recursive)

Interestingly we didn't find out in the initial PR CI?

@tannergooding
Copy link
Member Author

Interestingly we didn't find out in the initial PR CI?

It was dependent on a couple few factors, such as whether or not the R2R assembly is used. It was unclear why Linux nor Windows x86 failed and why it was only showing up on Windows x64 in CI.

@@ -1786,12 +1786,22 @@ void CodeGen::genAvxFamilyIntrinsic(GenTreeHWIntrinsic* node)
break;
}

case NI_AVX512F_ConvertToVector256Int32:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this moved up?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be consistent with the other places that specially handle this one path and to avoid the cost of the varTypeIsFloating check for the many intrinsics where it can't be true.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tannergooding
Copy link
Member Author

Merging after talking with @BruceForstall.

The only jobs that timed out are the arm32 replay jobs, which won't be impacted by this PR as we're only touching xarch specific files/paths. The reply job did pass earlier today before the minor test fix went in.

@tannergooding tannergooding merged commit 3e79220 into dotnet:main Apr 19, 2023
@tannergooding tannergooding deleted the avx512-2 branch April 19, 2023 20:38
@ghost ghost locked as resolved and limited conversation to collaborators May 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants