Use AVX512 to zero locals #91166
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
8ac3214 to 61a05ee
@dotnet/jit-contrib PTAL, simple change with nice diffs (-122 KB for the benchmarks.pgo collection, -0.13% TP for the same collection). The logic has plenty of opportunities to optimize further, e.g. using AVX in the loop. I didn't change that here because the loop needs the data aligned to 32/64 bytes, and the remainder can be handled with overlapping stores, so I am leaving it for future follow-ups. I was mostly interested in removing loops by allowing up to 6*64=384 bytes to be zeroed directly with AVX-512, where previously we switched to the loop for >96 bytes.
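To make the size arithmetic above concrete, here is a minimal C++ sketch of the idea, not the JIT's actual codegen. It assumes the non-loop path is capped at six vector-wide stores (inferred from 6*64=384 and from the old 96-byte threshold being 6*16), and it handles the remainder with an overlapping final store. The names `kMaxUnrolledStores`, `maxUnrolledBytes`, `planStoreOffsets`, and `zeroBlock` are hypothetical, and `memset` stands in for a single vector store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumption: the non-loop path emits at most six vector-wide stores.
constexpr std::size_t kMaxUnrolledStores = 6;

// Largest block that can be zeroed without a loop for a given vector width
// (16 = SSE2/XMM, 32 = AVX/YMM, 64 = AVX-512/ZMM).
constexpr std::size_t maxUnrolledBytes(std::size_t simdWidth) {
    return kMaxUnrolledStores * simdWidth;
}

// Returns the byte offsets of the stores needed to zero `size` bytes using
// `simdWidth`-byte stores. Full-width stores are laid out back to back; if a
// remainder is left, one extra store is placed at `size - simdWidth`, so it
// overlaps the previous store instead of writing past the end of the block.
std::vector<std::size_t> planStoreOffsets(std::size_t size, std::size_t simdWidth) {
    assert(size >= simdWidth && size <= maxUnrolledBytes(simdWidth));
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (; offset + simdWidth <= size; offset += simdWidth) {
        offsets.push_back(offset);
    }
    if (offset < size) {
        offsets.push_back(size - simdWidth); // overlapping tail store
    }
    return offsets;
}

// Zeroes `size` bytes at `dst`; each memset models one vector store.
void zeroBlock(unsigned char* dst, std::size_t size, std::size_t simdWidth) {
    for (std::size_t off : planStoreOffsets(size, simdWidth)) {
        std::memset(dst + off, 0, simdWidth);
    }
}
```

Under these assumptions, a 100-byte block with 64-byte stores needs two stores (at offsets 0 and 36), and a 384-byte block needs exactly six, which is why no loop is required up to that size.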
Extends #32538 to use AVX-512 (and AVX1) to zero locals on the non-loop path. I am going to slightly refactor it to use AVX in the loop path too, but later; this seems to be low-hanging fruit with nice diffs.
Diff example:
(apparently this collection has no AVX-512, but it still looks better)