Remove more unnecessary scenarios from HWIntrinsic test templates and fix timeout/failure #85026

tannergooding · 2023-04-19T01:42:22Z

Follow up to #85008, this removes the unnecessary load scenarios from the rest of the hwintrinsic test templates.

It also removes the ClsVar scenarios as they weren't actually testing CLS_VAR scenarios. That would require passing in something recognizable as a GT_CNS_VEC or similar, which would require a significantly more complex template or static readonly variables + tiering to kick in.

It also removes the ClassLclFld scenario as there is no real difference between local._fld and this._fld, the latter being tested by ClassFld already.

Finally, it divides the original 20 stripes used in the HardwareIntrinsics_r/ro projects among the new HardwareIntrinsics_*_r/ro projects, based on the original test count versus the new test count:

Arm - 2390 generated tests, gets 8 stripes
General - 2589 generated tests, gets 8 stripes
X86 - 504 generated tests, gets 2 stripes
X86_Avx - 500 generated tests, gets 2 stripes
X86_Avx512 - 200 generated tests (more to be added), gets 2 stripes
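
As a quick sanity check on the split above, the per-stripe load can be worked out from the listed counts. This is a minimal sketch: `Project`, `kProjects`, `testsPerStripe`, and `totalStripes` are hypothetical names for illustration, not part of the test infrastructure, and it assumes tests are spread evenly across a project's stripes.

```cpp
#include <cassert>

// Hypothetical record for illustration; the counts are the ones listed above.
struct Project {
    const char* name;
    int generatedTests;
    int stripes;
};

const Project kProjects[] = {
    {"Arm", 2390, 8},
    {"General", 2589, 8},
    {"X86", 504, 2},
    {"X86_Avx", 500, 2},
    {"X86_Avx512", 200, 2},
};

// Upper bound on the tests in one stripe, assuming an even split
// (ceiling division of tests by stripes).
int testsPerStripe(const Project& p) {
    return (p.generatedTests + p.stripes - 1) / p.stripes;
}

// Sum of the stripe counts across all listed projects.
int totalStripes() {
    int total = 0;
    for (const Project& p : kProjects) {
        total += p.stripes;
    }
    return total;
}
```

With the listed counts, the fullest stripe belongs to General at no more than 324 tests, and the stripe counts sum to 22.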

This also resolves a timeout issue that was cropping up and in turn fixes a downstream failure that was being hidden. It is unclear why this wasn't surfacing on the Linux or Windows x86 legs.

ghost · 2023-04-19T02:00:35Z

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Author:	tannergooding
Assignees:	-
Labels:	`area-System.Runtime.Intrinsics`
Milestone:	-

tannergooding · 2023-04-19T04:18:50Z

CC. @BruceForstall, @dotnet/avx512-contrib, @dotnet/jit-contrib

Also CC. @markples, @trylek for the striping change.

We're still seeing timeouts and long run times for the Avx512 stress job, regardless of striping count. It's possible that this is related to #84967 and the instruction that is being incorrectly encoded; however, it seems unique to Windows runs at the moment (possibly caused by an ABI difference).

tannergooding · 2023-04-19T12:57:38Z

Figured out the timeout issue.

It turns out crossgen was only handling the X64 nested class and not the VL nested class, so Avx512DQ.VL.Method() was compiling down to an infinite loop (since it is simply recursive) rather than being recognized as a "mustExpand" intrinsic.

This meant that the RunReflectionScenario_UnsafeRead scenario would hang the tests indefinitely.
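
The failure mode described above can be modeled in a few lines. This is a toy sketch, not crossgen or JIT code: `resolveCall`, `CallKind`, and the set of handled nested classes are hypothetical names standing in for whatever lookup decides that a call is a must-expand intrinsic. It only shows why a missing entry for a nested class such as `VL` leaves the self-recursive method body in place.

```cpp
#include <cassert>
#include <set>
#include <string>

// Outcome of compiling a call to a hardware-intrinsic method (hypothetical model).
enum class CallKind {
    ExpandedIntrinsic, // replaced by the underlying instruction
    RecursiveCall      // left as a call to itself, which never terminates
};

// Decide how a call to `Isa.Nested.Method()` is compiled, given the nested
// classes the (hypothetical) compiler knows how to handle.
CallKind resolveCall(const std::string& nestedClass,
                     const std::set<std::string>& handledNestedClasses) {
    // An empty name models a method declared directly on the ISA class.
    if (nestedClass.empty() || handledNestedClasses.count(nestedClass) != 0) {
        return CallKind::ExpandedIntrinsic;
    }
    // Not recognized: the managed body, which just calls itself, is kept.
    return CallKind::RecursiveCall;
}
```

In this model, a handled set of `{"X64"}` resolves a `VL` call to `RecursiveCall`, while `{"X64", "VL"}` resolves it to `ExpandedIntrinsic`.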

kunalspathak · 2023-04-19T17:59:12Z

src/coreclr/jit/hwintrinsiclistxarch.h

@@ -801,7 +801,7 @@ HARDWARE_INTRINSIC(AVX512F,         ConvertToVector128UInt16,
 HARDWARE_INTRINSIC(AVX512F,         ConvertToVector128UInt32,                   -1,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_vpmovqd,            INS_vpmovqd,            INS_invalid,            INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
 HARDWARE_INTRINSIC(AVX512F,         ConvertToVector256Int16,                    64,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_vpmovdw,            INS_vpmovdw,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid},           HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
 HARDWARE_INTRINSIC(AVX512F,         ConvertToVector256Int32,                    64,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_vpmovqd,            INS_vpmovqd,            INS_invalid,            INS_cvtpd2dq},          HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
-HARDWARE_INTRINSIC(AVX512F,         ConvertToVector128Int32WithTruncation,      64,              1,     {INS_invalid,           INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_invalid,            INS_cvttpd2dq},         HW_Category_SimpleSIMD,             HW_Flag_BaseTypeFromFirstArg)


So we don't need ConvertToVector128Int32WithTruncation at all for AVX512F?

tannergooding · 2023-04-19

There is no such API for Avx512F.

You have Sse2.ConvertToVector128Int32WithTruncation which handles Vector128<float> to Vector128<int> (4x32 to 4x32) and Avx.ConvertToVector128Int32WithTruncation which handles Vector256<double> to Vector128<int> (4x64 to 4x32)

There is then no need for any Avx512F API since we don't have any 4x128 to 4x32 like scenario. This was just a typo that should've been Vector256Int32 since it handles Vector512<double> to Vector256<int> (8x64 to 8x32).
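
The lane arithmetic behind this reply can be checked directly. A minimal sketch, where `lanes` is a hypothetical helper (vector width in bits divided by element width in bits) and not a runtime API:

```cpp
#include <cassert>

// Number of elements in a vector: total width in bits / element width in bits.
constexpr int lanes(int vectorBits, int elementBits) {
    return vectorBits / elementBits;
}

// Sse2: Vector128<float> -> Vector128<int> (4x32 to 4x32)
static_assert(lanes(128, 32) == 4, "128-bit vectors of 32-bit elements have 4 lanes");

// Avx: Vector256<double> -> Vector128<int> (4x64 to 4x32)
static_assert(lanes(256, 64) == lanes(128, 32), "source and result lane counts match");

// Avx512F: Vector512<double> -> Vector256<int> (8x64 to 8x32)
static_assert(lanes(512, 64) == lanes(256, 32), "source and result lane counts match");

// A 512-bit double source has 8 lanes, which is more than Vector128<int> can hold.
static_assert(lanes(512, 64) > lanes(128, 32), "results do not fit in 4 lanes");
```

The last assertion is the reason a 128-bit result name made no sense for the 512-bit double source.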

kunalspathak · 2023-04-19T18:01:18Z

> was compiling down to an infinite loop (since it is simply recursive)

Interestingly we didn't find out in the initial PR CI?

tannergooding · 2023-04-19T18:04:50Z

> Interestingly we didn't find out in the initial PR CI?

It depended on a few factors, such as whether or not the R2R assembly is used. It is unclear why neither the Linux nor the Windows x86 legs failed and why it only showed up on Windows x64 in CI.

kunalspathak · 2023-04-19T18:30:26Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

@@ -1786,12 +1786,22 @@ void CodeGen::genAvxFamilyIntrinsic(GenTreeHWIntrinsic* node)
            break;
        }

+        case NI_AVX512F_ConvertToVector256Int32:


Why was this moved up?

tannergooding · 2023-04-19

To be consistent with the other places that specially handle this one path and to avoid the cost of the varTypeIsFloating check for the many intrinsics where it can't be true.
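
The second point can be illustrated with a toy dispatcher. This is only a sketch under assumptions: the enum values, `isFloating`, `selectPath`, and the counter are hypothetical stand-ins, not the actual `genAvxFamilyIntrinsic` code. It shows that giving one intrinsic its own `case` label confines the floating-point check to that id, so the ids sharing the generic path never pay for it.

```cpp
#include <cassert>

// Hypothetical ids: one that may have a floating-point base type, two that may not.
enum class Intrinsic { ConvertToVector256Int32, OtherIntrinsicA, OtherIntrinsicB };
enum class BaseType { Long, Double };
enum class Path { FloatingConvert, SharedPath };

// Counts how often the type check runs (instrumentation for this sketch only).
static int g_floatingChecks = 0;

bool isFloating(BaseType t) {
    ++g_floatingChecks;
    return t == BaseType::Double;
}

Path selectPath(Intrinsic id, BaseType baseType) {
    switch (id) {
        case Intrinsic::ConvertToVector256Int32:
            // Dedicated case: the type check runs only for this id.
            if (isFloating(baseType)) {
                return Path::FloatingConvert;
            }
            return Path::SharedPath;

        case Intrinsic::OtherIntrinsicA:
        case Intrinsic::OtherIntrinsicB:
            // Shared case: no type check is needed for these ids.
            return Path::SharedPath;
    }
    return Path::SharedPath;
}
```

Calling `selectPath` for the other ids leaves the counter untouched, which is the cost the reply refers to.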

kunalspathak

LGTM

tannergooding · 2023-04-19T20:38:09Z

Merging after talking with @BruceForstall.

The only jobs that timed out are the arm32 replay jobs, which won't be impacted by this PR as we're only touching xarch-specific files/paths. The replay job did pass earlier today before the minor test fix went in.

tannergooding added 4 commits April 18, 2023 18:36

Remove unnecessary scenarios from other HWIntrinsic templates

1858196

Remove ClsVar scenarios from HWIntrinsic test templates

5df76a5

Removing ClassLclFld scenarios from the HWIntrinsic test templates

7a56982

Divide the original striping between the split hwintrinsic test projects

b6dac50

dotnet-issue-labeler bot added the area-System.Runtime.Intrinsics label Apr 19, 2023

Minor formatting change to trigger JIT tests

3f48142

tannergooding mentioned this pull request Apr 19, 2023

Usage of TargetArchitecture in managed coreclr test #83980

Open

tannergooding marked this pull request as ready for review April 19, 2023 04:13

This was referenced Apr 19, 2023

IOException running NuGet-Migrations during tests in dotnet CLI first run #80619

Closed

WasmTestOnBrowser-System.Text.Json.Tests.WorkItemExecution timing out #84434

Closed

Allow Avx512 tests to run in Pri0

ba62665

tannergooding force-pushed the avx512-2 branch 3 times, most recently from 084e344 to 3a36b4a Compare April 19, 2023 12:30

Ensure that crossgen correctly handles the VL nested class

7fdc37d

tannergooding force-pushed the avx512-2 branch from 3a36b4a to 7fdc37d Compare April 19, 2023 12:56

tannergooding requested a review from MichalStrehovsky as a code owner April 19, 2023 12:56

tannergooding mentioned this pull request Apr 19, 2023

Ensure vextractf64x4 and vextracti64x4 aren't marked DstDstSrc #85030

Merged

tannergooding changed the title ~~Remove more unnecessary scenarios from HWIntrinsic templates~~ Remove more unnecessary scenarios from HWIntrinsic test templates and fix timeout/failure Apr 19, 2023

tannergooding added 3 commits April 19, 2023 06:14

Fixing an issue with NI_AVX512F_ConvertToVector256Int32 for TYP_DOUBLE

2b2bc7b

Fixing a couple small test failures that were masked

56f93c4

Fix a mistyped intrinsic id

0c05a42

kunalspathak reviewed Apr 19, 2023

kunalspathak approved these changes Apr 19, 2023

EgorBo approved these changes Apr 19, 2023

tannergooding merged commit 3e79220 into dotnet:main Apr 19, 2023

BruceForstall approved these changes Apr 19, 2023

tannergooding deleted the avx512-2 branch April 19, 2023 20:38

build-analysis bot mentioned this pull request Apr 19, 2023

Tracking issue for CI build timeouts #76454

Closed

ghost locked as resolved and limited conversation to collaborators May 20, 2023
