JIT: Redundant fmov's on arm64 for a simple function #58954
Tagging subscribers to this area: @JulieLeeMSFT

Issue Details

Repro:
static float Lerp(float v0, float v1, float t) =>
    MathF.FusedMultiplyAdd(t, v1,
        MathF.FusedMultiplyAdd(-t, v0, v0));

arm64:
; Method Program:Lerp(float,float,float):float
G_M22020_IG01: ;; offset=0000H
A9BF7BFD stp fp, lr, [sp,#-16]!
910003FD mov fp, sp
;; bbWeight=1 PerfScore 1.50
G_M22020_IG02: ;; offset=0008H
1E204010 fmov s16, s0
1E204000 fmov s0, s0
1E204051 fmov s17, s2
1F11C000 fmsub s0, s0, s17, s16
1E204000 fmov s0, s0
1E204021 fmov s1, s1
1E204042 fmov s2, s2
1F020020 fmadd s0, s1, s2, s0
;; bbWeight=1 PerfScore 9.00
G_M22020_IG03: ;; offset=0028H
A8C17BFD ldp fp, lr, [sp],#16
D65F03C0 ret lr
;; bbWeight=1 PerfScore 2.00
; Total bytes of code: 48

namely:
1E204000 fmov s0, s0
1E204021 fmov s1, s1
1E204042 fmov s2, s2
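For readers who want to compare against another compiler's output, here is a rough C++ port of the repro. It is only a sketch: `Lerp` here is an illustrative free function, and `std::fma` is assumed to be a reasonable stand-in for `MathF.FusedMultiplyAdd`.

```cpp
#include <cassert>
#include <cmath>

// Rough C++ port of the C# repro (illustrative, not part of the runtime).
// The inner fma computes v0 - t * v0, the outer one adds t * v1,
// giving (1 - t) * v0 + t * v1 with two fused operations.
float Lerp(float v0, float v1, float t)
{
    return std::fma(t, v1, std::fma(-t, v0, v0));
}
```

Calling `Lerp(2.0f, 10.0f, 0.0f)` returns `v0` and `Lerp(2.0f, 10.0f, 1.0f)` returns `v1`, so nothing in the computation itself requires a register to be copied onto itself.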
|
Looks like should be handled in |
This is likely because This is somewhat unfortunate as the side effect doesn't always matter, it only really matters when the user is explicitly aware they are operating on a Because of this, I imagine the fix is slightly more complicated as it likely requires knowing that the source and destination are the same size vs knowing when you are using a hardware intrinsic instead. |
@EgorBo, sorry I don't mean to distract you from

using System.Runtime.CompilerServices;
class C
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static string M1() => "C";
}

In this case, runtime/src/coreclr/jit/emitxarch.cpp Line 4579 in a3f7d29
but we still get:

; Assembly listing for method C:M1():System.String
; Emitting BLENDED_CODE for X64 CPU with AVX - Unix
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;# V00 OutArgs [V00 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;
; Lcl frame size = 0
G_M9818_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M9818_IG02: ;; offset=0000H
48B8B0110068927F0000 mov rax, 0x7F92680011B0 ; string handle
488B00 mov rax, gword ptr [rax]
;; bbWeight=1 PerfScore 2.25
G_M9818_IG03: ;; offset=000DH
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code 14, prolog size 0, PerfScore 4.65, instruction count 3, allocated bytes for code 14 (MethodHash=28ffd9a5) for method C:M1():System.String
; ============================================================ |
Haha, feel free to ask any questions if you're interested in contributing to the JIT 🙂
I'm not sure I understand; I don't see any redundant mov in the output. The codegen is:
uint64_t rax = 0x7F92680011B0ULL;
return *((uint64_t*)(void*)rax);
|
Maybe it's just the encoding difference, but I was expecting one mov like:
mov eax, [0x12e391a0]
ret
or
movabs rax, gword ptr [ds:0x7F92680011B0]
ret
|
That can only be done (we call it "contained") when the constant (imm) fits into a 4-byte integer; see https://godbolt.org/z/v81ozcEKd (and the limit is even smaller on arm64). |
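The size rule in that reply can be sketched as a range check. This is a simplified illustration, not the JIT's actual containment logic: `FitsInInt32` is a hypothetical helper, and treating the "4-byte integer" as signed is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a JIT API): true when the constant can be
// represented as a signed 32-bit value. The signedness is an assumption.
constexpr bool FitsInInt32(std::int64_t value)
{
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}

// The small address from the thread fits; the 64-bit string handle does not.
static_assert(FitsInInt32(0x12e391a0), "small address fits in 4 bytes");
static_assert(!FitsInInt32(0x7F92680011B0), "64-bit handle needs a full-width load");
```

Under this check, the handle `0x7F92680011B0` is too wide to fold into the memory operand, which matches the two-instruction sequence in the listing.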
Moving this to .NET 8 |
@TIHan - assigning this to you. |
For every FMOV emitted by the hardware intrinsics that is currently marked as not skippable: if the src and dest registers are the same and the types match, then the instruction will have no effect and can be safely marked as skippable.
* Arm64: Skip redundant hwintrinsic float movs (#58954)
* Use hardcoded canSkip values
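The rule described in the fix can be sketched as a small predicate. Everything below is hypothetical and for illustration only: `FloatSize`, `FloatMov`, and `CanSkipFloatMov` are made-up names, not the emitter's real types or functions, and operand size is used as a stand-in for "the types match".

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; not the JIT's actual types.
enum class FloatSize { Single = 4, Double = 8 };

struct FloatMov
{
    int       dstReg;  // destination register number, e.g. 0 for s0/d0
    int       srcReg;  // source register number
    FloatSize dstSize; // width written to the destination
    FloatSize srcSize; // width read from the source
};

// True when the move copies a register onto itself at the same width,
// so it has no effect and can be marked as skippable.
constexpr bool CanSkipFloatMov(const FloatMov& mov)
{
    return mov.dstReg == mov.srcReg && mov.dstSize == mov.srcSize;
}

// "fmov s0, s0" from the listing is redundant; "fmov s16, s0" is not.
static_assert(CanSkipFloatMov({0, 0, FloatSize::Single, FloatSize::Single}), "same reg, same size");
static_assert(!CanSkipFloatMov({16, 0, FloatSize::Single, FloatSize::Single}), "different regs");
```

A move between the same register at different widths is deliberately not skipped here, in line with the earlier comment that the fix needs to know the source and destination are the same size.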
Expected codegen: https://godbolt.org/z/9e91zE3j3