Optimize indirection cell call sequences more generally #59602
Conversation
Tagging subscribers to this area: @JulieLeeMSFT

Issue Details: Currently we apply an optimization for ARM architectures where we make sure we do not duplicate the instructions that compute the target address for calls involving indirection cells, instead loading the target from the indirection cell directly. We can apply the same optimization to x64 VSD calls and tailcalls that also use indirection cells, which decreases the size of these call sequences. I have also included a bug fix for ARM/ARM64: the optimization was previously enabled only under FEATURE_READYTORUN, which is not always defined (e.g. in single-file scenarios).
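For readers new to the term: an indirection cell is just a memory slot holding the current target address, which the runtime can repoint without patching call sites. A minimal sketch of the concept in C++ (illustrative only; the names and the repointing scenario are assumptions, not JIT code):

```cpp
#include <cstdio>

// Purely illustrative: a function pointer slot standing in for an
// indirection cell. The runtime can repoint the cell without patching
// any call sites, since every call loads the target from the cell.
static void stub()   { std::puts("stub");   }
static void target() { std::puts("target"); }

static void (*g_cell)() = &stub; // cell initially points at a stub

int main()
{
    (*g_cell)();      // call through the cell: one load, then an indirect call
    g_cell = &target; // the runtime repoints the cell (e.g. after compilation)
    (*g_cell)();      // the same call site now reaches the new target
    return 0;
}
```

The optimization in this PR concerns the call site itself: when the cell's address has already been materialized in a register, the call can load the target through that register instead of encoding the cell's address a second time, as the diffs later in the thread show.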
The branch was force-pushed from b342729 to 9f898e1.
/azp run runtime-coreclr outerloop

Azure Pipelines successfully started running 1 pipeline(s).

/azp run runtime-coreclr jitstress

Azure Pipelines successfully started running 1 pipeline(s).
This should be ready, PTAL @dotnet/jit-contrib. The diffs are substantial (-2% code size on x64 crossgen libs). Detail diffs are available for:

- aspnet.run.windows.x64.checked.mch
- coreclr_tests.pmi.windows.x64.checked.mch
- libraries.crossgen2.windows.x64.checked.mch
- libraries.pmi.windows.x64.checked.mch
- libraries_tests.pmi.windows.x64.checked.mch
Can you show some sample diffs?
The regressions are due to alignment changes. Otherwise the diffs all look like this:

```diff
@@ -165,7 +165,7 @@ G_M44684_IG03:   ; , extend
        ; byrRegs -[rcx]
        lea      r11, [(reloc)]
        cmp      dword ptr [rcx], ecx
-       call     [hackishModuleName:hackishMethodName()]
+       call     gword ptr [r11]hackishModuleName:hackishMethodName()
        ; gcrRegs -[rcx rdx rsi] +[rax]
        ; byrRegs -[rdi]
        ; gcr arg pop 0
@@ -201,7 +201,7 @@ G_M44684_IG04:   ; , epilog, nogc, extend
        ret
```

```diff
@@ -22,11 +22,10 @@ G_M51262_IG02:   ; gcrefRegs=00000006 {rcx rdx}, byrefRegs=00000000 {}, byr
        cmp      dword ptr [rcx], ecx
        ;; bbWeight=1 PerfScore 5.50
 G_M51262_IG03:   ; , epilog, nogc, extend
-       tail.jmp [hackishModuleName:hackishMethodName()]
-       ; gcr arg pop 0
+       tail.jmp qword ptr [r11]hackishModuleName:hackishMethodName()
```
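A sketch of why the new form is smaller, assuming standard x64 encodings for FF /2 (`call r/m64`); the byte values below come from the instruction set reference, not from the thread. The `lea r11, [(reloc)]` is needed either way because the calling convention passes the indirection cell address in `r11`, so switching the call itself from a RIP-relative indirection to a call through `r11` saves the four-byte displacement at every such site.

```cpp
#include <cstdint>

// Illustrative encodings, assuming the standard x64 instruction format;
// these bytes are not taken from the PR itself.

// call qword ptr [r11]        : REX.B + opcode + ModRM = 3 bytes
const uint8_t kCallThroughR11[]  = { 0x41, 0xFF, 0x13 };

// call qword ptr [rip+disp32] : opcode + ModRM + disp32 = 6 bytes
// (the cell address still has to be materialized in r11 for the calling
// convention, so reusing r11 for the call is a pure size win)
const uint8_t kCallThroughCell[] = { 0xFF, 0x15, 0x00, 0x00, 0x00, 0x00 };
```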
In …
Looks like we use that bit to indicate the GC-ness of the return value for calls: runtime/src/coreclr/jit/emitxarch.cpp, lines 13443 to 13451 at commit 3649506.
So it is the disassembly that's confusing; we always print it before the addressing mode for …
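A hedged sketch of that explanation (hypothetical helper, not the code at the permalink): the emitter records whether a call returns a GC reference, a byref, or a plain value, and the disassembler derives the printed keyword from that flag, so the `gword ptr` on the new call describes the return value rather than the addressing mode.

```cpp
// Hypothetical sketch, not the emitter's actual code: pick the size
// keyword printed for an indirect call from the GC-ness of its return
// value. The keyword strings mirror what JIT disassembly prints; the
// enum and function names here are made up for illustration.
enum class GCtype { None, GCRef, ByRef };

static const char* ptrKeyword(GCtype returnKind)
{
    switch (returnKind)
    {
        case GCtype::GCRef: return "gword ptr"; // call returns a GC reference
        case GCtype::ByRef: return "bword ptr"; // call returns a byref
        default:            return "qword ptr"; // call returns a plain value
    }
}
```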
Improvements on ubuntu/x64 - dotnet/perf-autofiling-issues#1898