JIT: slightly more aggressive tail duplication #61179
Tagging subscribers to this area: @JulieLeeMSFT
Issue Details
Generalize tail duplication to catch patterns like the one in #37904 where a trinary compare feeds a binary compare.
cc @dotnet/jit-contrib
Not sure this is in its final form yet; I'm thinking about ways to generalize the pattern match. It would perhaps be a nice test case for a high-level, declarative-syntax-driven match. I also might want to experiment with making this even more aggressive.
On the example from #37904 this now gets us the same codegen for
;; before
; Assembly listing for method C`1:Compare1(Span`1,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 0 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 8 ) byref -> rcx ld-addr-op single-def
; V01 arg1 [V01,T01] ( 4, 3.50) int -> rdx single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03,T04] ( 0, 0 ) bool -> zero-ref "Inline return value spill temp"
; V04 tmp2 [V04,T02] ( 3, 5 ) int -> rax ld-addr-op "Inlining Arg"
; V05 tmp3 [V05,T03] ( 4, 2.50) int -> rcx "Inline return value spill temp"
;* V06 tmp4 [V06 ] ( 0, 0 ) byref -> zero-ref V08._pointer(offs=0x00) P-INDEP "field V00._pointer (fldOffset=0x0)"
;* V07 tmp5 [V07 ] ( 0, 0 ) int -> zero-ref V08._length(offs=0x08) P-INDEP "field V00._length (fldOffset=0x8)"
;* V08 tmp6 [V08 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
;
; Lcl frame size = 40
G_M38094_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M38094_IG02:
cmp dword ptr [rcx+8], 0
jbe SHORT G_M38094_IG11
mov rax, bword ptr [rcx]
mov eax, dword ptr [rax]
cmp eax, edx
jge SHORT G_M38094_IG04
;; bbWeight=1 PerfScore 9.25
G_M38094_IG03:
mov ecx, -1
jmp SHORT G_M38094_IG06
;; bbWeight=0.50 PerfScore 1.12
G_M38094_IG04:
cmp eax, edx
jle SHORT G_M38094_IG05
mov ecx, 1
jmp SHORT G_M38094_IG06
;; bbWeight=0.50 PerfScore 1.75
G_M38094_IG05:
xor ecx, ecx
;; bbWeight=0.50 PerfScore 0.12
G_M38094_IG06:
test ecx, ecx
jl SHORT G_M38094_IG09
;; bbWeight=1 PerfScore 1.25
G_M38094_IG07:
xor eax, eax
;; bbWeight=0.50 PerfScore 0.12
G_M38094_IG08:
add rsp, 40
ret
;; bbWeight=0.50 PerfScore 0.62
G_M38094_IG09:
mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M38094_IG10:
add rsp, 40
ret
;; bbWeight=0.50 PerfScore 0.62
G_M38094_IG11:
call CORINFO_HELP_RNGCHKFAIL
int3
;; bbWeight=0 PerfScore 0.00
;; after
; Assembly listing for method C`1:Compare1(Span`1,int):int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 0 single block inlinees; 2 inlinees without PGO data
; Final local variable assignments
;
; V00 arg0 [V00,T00] ( 4, 8 ) byref -> rcx ld-addr-op single-def
; V01 arg1 [V01,T01] ( 3, 3 ) int -> rdx single-def
; V02 OutArgs [V02 ] ( 1, 1 ) lclBlk (32) [rsp+00H] "OutgoingArgSpace"
;* V03 tmp1 [V03,T03] ( 0, 0 ) bool -> zero-ref "Inline return value spill temp"
; V04 tmp2 [V04,T02] ( 2, 4 ) int -> rax ld-addr-op "Inlining Arg"
;* V05 tmp3 [V05,T04] ( 0, 0 ) int -> zero-ref "Inline return value spill temp"
;* V06 tmp4 [V06 ] ( 0, 0 ) byref -> zero-ref V08._pointer(offs=0x00) P-INDEP "field V00._pointer (fldOffset=0x0)"
;* V07 tmp5 [V07 ] ( 0, 0 ) int -> zero-ref V08._length(offs=0x08) P-INDEP "field V00._length (fldOffset=0x8)"
;* V08 tmp6 [V08 ] ( 0, 0 ) struct (16) zero-ref "Promoted implicit byref"
;
; Lcl frame size = 40
G_M38094_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M38094_IG02:
cmp dword ptr [rcx+8], 0
jbe SHORT G_M38094_IG07
mov rax, bword ptr [rcx]
mov eax, dword ptr [rax]
cmp eax, edx
jl SHORT G_M38094_IG05
;; bbWeight=1 PerfScore 9.25
G_M38094_IG03:
xor eax, eax
;; bbWeight=0.50 PerfScore 0.12
G_M38094_IG04:
add rsp, 40
ret
;; bbWeight=0.50 PerfScore 0.62
G_M38094_IG05:
mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M38094_IG06:
add rsp, 40
ret
;; bbWeight=0.50 PerfScore 0.62
G_M38094_IG07:
call CORINFO_HELP_RNGCHKFAIL
int3
;; bbWeight=0 PerfScore 0.00
SPMI (on top of #61023) shows a modest number of hits in the following collections:
aspnet.run.windows.x64.checked.mch
benchmarks.run.windows.x64.checked.mch
coreclr_tests.pmi.windows.x64.checked.mch
libraries.crossgen2.windows.x64.checked.mch
libraries.pmi.windows.x64.checked.mch
libraries_tests.pmi.windows.x64.checked.mch
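The before/after listings above can be read back into source-level terms roughly as follows. This is a minimal C++ sketch reconstructed from the disassembly only, not the C# from #37904; the names `threeWayCompare`, `compareBefore`, `compareAfter`, and `checkSameResult` are hypothetical. The "before" shape materializes a -1/0/1 result and then tests its sign; the "after" shape is the single signed compare the JIT now emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical three-way ("trinary") compare producing -1, 0, or 1.
// Corresponds loosely to IG02..IG05 in the "before" listing.
inline int threeWayCompare(int32_t a, int32_t b)
{
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
}

// "Before" shape: the trinary result is stored in a temp and then
// fed to a binary compare against zero (the test/jl in IG06).
inline int compareBefore(int32_t element, int32_t value)
{
    int tmp = threeWayCompare(element, value);
    return (tmp < 0) ? 1 : 0;
}

// "After" shape: once the binary compare is duplicated into each arm
// and folded, only one signed compare remains (cmp/jl in IG02).
inline int compareAfter(int32_t element, int32_t value)
{
    return (element < value) ? 1 : 0;
}

// Asserts that both shapes agree for one pair of inputs.
inline void checkSameResult(int32_t element, int32_t value)
{
    assert(compareBefore(element, value) == compareAfter(element, value));
}
```

Calling `checkSameResult` on a few pairs, including the extremes of `int32_t`, is a quick way to convince yourself that the two shapes are interchangeable.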
Not clear yet what's going wrong with the installer builds; it could be related.
Also have a branch off this change where I'm trying to remove all the
Hmm, this is going to take a bit more work. Right now the If we get rid of these constructs we instead end up with two relops in the same block, I'd hoped that #61023 would be sufficient to clean this up, but (at least in the case of
If the two statements are adjacent then moving instead of copying is not too difficult; something like:
Going to wait on this until after #61275.
Diffs on top of #61275, by collection:
aspnet.run.windows.x64.checked.mch
benchmarks.run.windows.x64.checked.mch
coreclr_tests.pmi.windows.x64.checked.mch
libraries.crossgen2.windows.x64.checked.mch
libraries.pmi.windows.x64.checked.mch
libraries_tests.pmi.windows.x64.checked.mch
Catch patterns like the one in dotnet#37904 where a trinary compare feeds a binary compare.
80acdcd to 314509e
The odd non-Windows installer failure is still there; looks like I'll have to debug it.
@dotnet/jit-contrib I think this is ready for review.
Generalize tail duplication to catch patterns like the one in #37904 where
a trinary compare feeds a binary compare.
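To make the idea concrete, here is a toy C++ model of what duplicating the tail buys in this pattern. It is a sketch under stated assumptions, not the JIT's implementation: `Pred`, `FoldedJump`, and `duplicateTail` are hypothetical names, and the join block is assumed to do nothing but branch on a relop over a temp that every predecessor sets to a known constant. Under that assumption, copying the branch into each predecessor lets the relop be evaluated at compile time, so each predecessor can jump straight to its final target.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy model; these are not the JIT's data structures.
struct Pred
{
    std::string name;     // label of the predecessor block
    int         constant; // constant the predecessor stores into the temp
};

struct FoldedJump
{
    std::string from; // predecessor label
    std::string to;   // target reached after the duplicated branch folds
};

// The join block is modeled as:
//     if (relop(tmp)) goto trueTarget; else goto falseTarget;
// Duplicating that branch into each predecessor makes tmp a known
// constant there, so the relop folds to an unconditional jump.
std::vector<FoldedJump> duplicateTail(const std::vector<Pred>&         preds,
                                      const std::function<bool(int)>& relop,
                                      const std::string&              trueTarget,
                                      const std::string&              falseTarget)
{
    assert(!preds.empty()); // nothing to duplicate into otherwise

    std::vector<FoldedJump> result;
    for (const Pred& p : preds)
    {
        result.push_back({p.name, relop(p.constant) ? trueTarget : falseTarget});
    }
    return result;
}
```

Using the labels from the "before" listing as an illustration: with predecessors storing -1, 1, and 0 and a join block that tests `tmp < 0`, the first predecessor folds to the `mov eax, 1` return path and the other two fold to the `xor eax, eax` return path, which is the shape seen in the "after" listing.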