-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Enable CSE for VectorX.Create #50644
Conversation
PTAL @dotnet/jit-contrib @tannergooding |
src/coreclr/jit/valuenum.cpp
Outdated
if ((HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1) || | ||
const bool isVariableArgs = HWIntrinsicInfo::lookupNumArgs(hwIntrinsicNode->gtHWIntrinsicId) == -1; | ||
|
||
// In case of functions with variable number of args we only support single-arg ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, why 0 or 1 and not 0, 1, or 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Variable length args only impact Vector.Create
calls for ARM, but also impact cases like Sse.ReciprocalScalar
for x86, where the upper limit is 2 args (and therefore op1 will not be a GT_LIST
, but NumArgs
will be -1
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I wanted to add support for args >= 2 cases later but since you noticed this case I've added it as part of this pr.
Repro:
static int Test()
{
return Sse2.MoveMask(Sse2.Add(
Vector128.Create(1, 1),
Vector128.Create(1, 1)).AsByte());
}
New codegen:
; Method Program:Test():int
G_M58954_IG01:
vzeroupper
;; bbWeight=1 PerfScore 1.00
G_M58954_IG02:
vmovupd xmm0, xmmword ptr [reloc @RWD00]
vpaddq xmm0, xmm0, xmm0
vpmovmskb eax, xmm0
;; bbWeight=1 PerfScore 4.33
G_M58954_IG03:
ret
;; bbWeight=1 PerfScore 1.00
RWD00 dq 0000000000000001h, 0000000000000001h
; Total bytes of code: 20
* upstream/main: (568 commits) [wasm] Set __DistroRid on Windows to browser-wasm (dotnet#50842) [wasm] Fix order of include paths, to have the obj dir first (dotnet#50303) [wasm] Fix debug build of AOT cross compiler (dotnet#50418) Fix outdated comment (dotnet#50834) [wasm][tests] Add properties to allow passing args to xharness (dotnet#50678) Vectorized common String.Split() paths (dotnet#38001) Fix binplacing symbol files. (dotnet#50819) Move type check to after the null ref branch in out marshalling of blittable classes. (dotnet#50735) Remove extraneous CMake version requirement. (dotnet#50805) [wasm] Remove unncessary condition for EMSDK (dotnet#50810) Add loop alignment stats to JitLogCsv (dotnet#50624) Resolve ILLink warnings in System.Diagnostics.DiagnosticSource (dotnet#50265) Avoid unnecessary closures/delegates in Process (dotnet#50496) Fix for field layout verification across version bubble boundary (dotnet#50364) JIT: Enable CSE for VectorX.Create (dotnet#50644) [main] Update dependencies from mono/linker (dotnet#50779) [mono] More domain cleanup (dotnet#50771) Race condition in Mock reference tracker runtime with GC. (dotnet#50804) Remove IAssemblyName (and various fusion remnants) (dotnet#50755) Disable failing test for GCStress. (dotnet#50828) ...
A low hanging fruit a noticed while I was triaging #13811, but it'd be also nice to allow CSE for GT_LIST-type of args in future.
This PR enables CSE (and hoisting) for Vector128.Create (and other functions with variable amount of args where the actual amount of them is zero or one).
Example:
codegen diff: https://www.diffchecker.com/LMZuC6QS
jit-diff (-f):
The regression is actually a perf improvement: https://www.diffchecker.com/54f2vVgR