Use AVX512 to zero locals #91166

Merged
merged 2 commits into from
Aug 28, 2023

EgorBo
Member

@EgorBo EgorBo commented Aug 27, 2023

Extends #32538 to use AVX-512 (and AVX1) to zero locals on the non-loop path. I plan to refactor the loop path to use AVX as well, but later; this change is low-hanging fruit with nice diffs.

Diff example:

@@ -17,17 +17,13 @@ G_M59697_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
        push     rbx
        sub      rsp, 160
        vxorps   xmm4, xmm4, xmm4
-       vmovdqa  xmmword ptr [rsp+0x20], xmm4
-       vmovdqa  xmmword ptr [rsp+0x30], xmm4
-       mov      rax, -96
-       vmovdqa  xmmword ptr [rsp+rax+0xA0], xmm4
-       vmovdqa  xmmword ptr [rsp+rax+0xB0], xmm4
-       vmovdqa  xmmword ptr [rsp+rax+0xC0], xmm4
-       add      rax, 48
-       jne      SHORT  -5 instr
+       vmovdqu  ymmword ptr [rsp+0x20], ymm4
+       vmovdqu  ymmword ptr [rsp+0x40], ymm4
+       vmovdqu  ymmword ptr [rsp+0x60], ymm4
+       vmovdqu  ymmword ptr [rsp+0x80], ymm4
        mov      rbx, rcx
        ; gcrRegs +[rbx]
-						;; size=70 bbWeight=1 PerfScore 13.33
+						;; size=42 bbWeight=1 PerfScore 9.83
 G_M59697_IG02:        ; bbWeight=1, gcrefRegs=0008 {rbx}, byrefRegs=0000 {}, byref
        lea      rcx, [rsp+0x20]
        call     [<unknown method>]
@@ -46,7 +42,7 @@ G_M59697_IG03:        ; bbWeight=1, epilog, nogc, extend
        ret      
 						;; size=9 bbWeight=1 PerfScore 1.75

(apparently this collection has no AVX-512, so the diff only shows ymm stores, but it still looks better)
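To make the diff above easier to check by hand, here is a small sketch that replays the removed xmm sequence (two straight stores plus the three-store loop) and compares the bytes it writes with the four ymm stores that replace it. The helper names are hypothetical and this only models offsets relative to `rsp`; it is not the JIT's code.

```python
def straight_line_stores(base, size, width):
    """Offsets of back-to-back `width`-byte stores covering [base, base + size)."""
    assert size % width == 0, "this sketch ignores remainders"
    return list(range(base, base + size, width))


def old_sequence_offsets():
    """Replay the removed instructions from the diff and collect store offsets."""
    offsets = [0x20, 0x30]  # two straight 16-byte xmm stores
    rax = -96               # mov rax, -96
    while True:
        # three 16-byte xmm stores per loop iteration
        offsets += [rax + 0xA0, rax + 0xB0, rax + 0xC0]
        rax += 48           # add rax, 48
        if rax == 0:        # jne falls through once rax reaches zero
            break
    return offsets


def covered(offsets, width):
    """Set of byte offsets written by `width`-byte stores at `offsets`."""
    return {b for off in offsets for b in range(off, off + width)}


old = old_sequence_offsets()                 # 8 stores of 16 bytes
new = straight_line_stores(0x20, 128, 32)    # 4 stores of 32 bytes

print([hex(o) for o in new])
print(covered(old, 16) == covered(new, 32))
```

Both sequences zero the same 128 bytes, `[rsp+0x20, rsp+0xA0)`; the new one does it with four stores and no loop counter or branch.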

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Aug 27, 2023
@ghost ghost assigned EgorBo Aug 27, 2023
@ghost

ghost commented Aug 27, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member Author

EgorBo commented Aug 27, 2023

@dotnet/jit-contrib PTAL, simple change with nice diffs (-122kb for benchmarks.pgo collection, -0.13% TP for the same collection).

The logic has plenty of opportunities for further optimization, e.g. using AVX in the loop. I didn't change that here because it requires aligning the data to 32/64 bytes (the remainder can be handled with overlapping stores), so I am leaving it for future follow-ups. I was mostly interested in removing loops by allowing up to 6*64=384 bytes to be zeroed directly with AVX-512, where previously we switched to the loop for >96 bytes.
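A rough sketch of the threshold described above, for illustration only. It assumes the limit is "at most 6 unrolled stores of the widest available vector", which is inferred from 96 = 6*16 and 384 = 6*64; the 6*32 = 192 limit it implies for 256-bit AVX is likewise an assumption, and the real JIT logic has more conditions. `uses_loop` and `overlapping_stores` are hypothetical names. The second helper shows what "handling the remainder with overlapping" could look like: the final store is shifted back so it ends exactly at the end of the block.

```python
MAX_UNROLLED_STORES = 6  # assumption inferred from 6*16 = 96 and 6*64 = 384


def uses_loop(size, simd_width, max_stores=MAX_UNROLLED_STORES):
    """True if `size` bytes exceed what `max_stores` straight stores can cover."""
    return size > simd_width * max_stores


def overlapping_stores(base, size, width):
    """Offsets of full-width stores covering [base, base + size).

    If `size` is not a multiple of `width`, the last store is moved back
    so that it overlaps the previous one instead of running past the end.
    """
    assert size >= width, "block must hold at least one full-width store"
    offsets = list(range(base, base + size - width + 1, width))
    last = base + size - width
    if offsets[-1] != last:
        offsets.append(last)
    return offsets


# The 128-byte block from the diff: loop with 16-byte stores, unrolled with 32-byte ones.
print(uses_loop(128, 16), uses_loop(128, 32))
# 384 bytes is the largest block unrolled with 64-byte stores under this assumption.
print(uses_loop(384, 64), uses_loop(385, 64))
# 100 bytes with 32-byte stores: the last store starts at 68 and overlaps the one at 64.
print(overlapping_stores(0, 100, 32))
```

Re-zeroing a few bytes twice is harmless, which is why the overlap trick avoids a separate scalar tail.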

@EgorBo EgorBo merged commit 3a1570f into dotnet:main Aug 28, 2023
127 checks passed
@EgorBo EgorBo deleted the zero-locals-avx512 branch August 28, 2023 16:29
@EgorBo EgorBo mentioned this pull request Sep 3, 2023
56 tasks
@ghost ghost locked as resolved and limited conversation to collaborators Oct 5, 2023
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI avx512 Related to the AVX-512 architecture