-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: investigate EH Write Through Failures #35534
Comments
Tests show the following issues:
Think I have a fix for some of the runtime failures. Looks like we are not initializing the stack home for a EH live parameter. |
@AndyAyersMS - let me know if you'd like me to track down the LSRA assertions. |
@CarolEidt I'll keep looking, but maybe you can give me some pointers. The assert is in runtime/src/coreclr/src/jit/lsra.cpp Lines 10710 to 10714 in 2b30fdb
|
The associated interval is a write-through local:
|
I would guess that V08 didn't get a register at entry, or was allocated |
For the two ref positions:
I think V08's liveness is over-stated but perhaps the use (in a return) is leaking upwards through infeasible EH paths. First bit of the allocation table:
Method being jitted: runtime/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/VariableBinder.cs Lines 145 to 195 in b4f7380
V08 (local6) is defined/used by that inner return. |
I'm also looking for simpler failing test cases or a simple repro based on the above, but no luck so far. |
I was able to reproduce this using SuperPmi. It is indeed the double zero-init that is causing the problem. I submitted PR #35585 to fix it. |
Still ~34 failures ( a bit less ) with #35585. They all look like unexpected null reference exceptions. I'll look for a simple case and try and see what's up. |
A number of the remaining bugs involve Looks like either there's a missing reload of RCX before the call, or else the xor of RCX at the end of the zeroing done in the jit prolog inadvertently trashes RCX which we expected we could keep live. Will keep digging. ;; System.Diagnostics.Process:WaitForExitCore(int):bool:this
G_M64267_IG01:
55 push rbp
4156 push r14
57 push rdi
56 push rsi
53 push rbx
4883EC40 sub rsp, 64
488D6C2460 lea rbp, [rsp+60H]
33C0 xor rax, rax
488945D8 mov qword ptr [rbp-28H], rax
488965C0 mov qword ptr [rbp-40H], rsp
48894D10 mov gword ptr [rbp+10H], rcx
895518 mov dword ptr [rbp+18H], edx
33C9 xor rcx, rcx
;; bbWeight=1 PerfScore 10.25
G_M64267_IG02:
4533C0 xor r8, r8
33C0 xor rax, rax
488945D8 mov gword ptr [rbp-28H], rax
;; bbWeight=1 PerfScore 1.50
G_M64267_IG03:
; **** need to reload RCX here, or not zero it above ***
BA00001000 mov edx, 0x100000
4533C0 xor r8d, r8d
E8081BFEFF call System.Diagnostics.Process:GetProcessHandle(int,bool):Microsoft.Win32.SafeHandles.SafeProcessHandle:this
|
Looks to me like RCX is getting trashed. The troublemaker is
which is assigned RCX for a stretch down in the try body. We zero both its memory and its register locations in the jit prolog; unfortunately the register location isn't live there. Relevant logic is here: runtime/src/coreclr/src/jit/codegencommon.cpp Lines 7563 to 7573 in c614097
Seems plausible that if a variable is must init and in both memory and a register we only need to zero memory in the prolog, but that's probably too simplistic. @CarolEidt thoughts? |
It seems that the register allocator thinks that the variable is in
Which will show the variables that it believes are live in registers at the start of the block (it shows up twice, once we we are generating code for that block, and once just prior to:
Let me know if you'd like me to take over tracking this down. |
Only V00 (this) is live in BB01:
V02 becomes live in BB01, but at that point it's in RAX. It only live is in RCX later in the method -- that is, its highest refpos appearances are in RCX. I wonder if this carries over to the prolog codegen and that's why we think we need to zero RCX. For a variable that lives in different registers in different parts of the code, what's the intended meaning of |
It is the "current" register occupied by the variable. Unfortunately, I believe that it relies on the invariant that you only query it if the variable is actually live. Otherwise the register allocator would have to reset the register number for all the variables at each boundary, not just those that are live. In this case |
If you want to investigate, here's a simple repro:
Key method is |
Thanks, I'll take a look. |
So ... without EHWriteThru we never have a |
Great -- this fixes my local repro case. I'll add the jit-experimental testing to your new PR. This may be the last issue for x64 Pri1 tests. |
Now that all the fixes are in, I launched a few ASP.NET perf runs. [EDIT] Baseline data here is stale, so improvements are not accurate -- see comments below On
On
@sebastienros can you help us do more comprehensive testing? To enable this you need a build from Saturday or later, and need to set
|
What OSes and architectures should it be tested on? |
It should improve codegen for all OSes/Architectures. |
Preliminary results on 12 core machines
|
Sigh, I had a bug in my script and was comparing new runs to older baseline files. Results vs proper baselines are more in line with the data Sebastien has gathered. Odd though that Json RPS was 350K yesterday (for both baseline and EHWT) and only 330K today. |
Adding more numbers, this time on the citrine environment. INTEL machines are 14/28(ht) cores, ARM is 32 cores.
|
How much variability do you usually see in results? I'd be surprised if enabling this made things slower, but I see cases above where it looks like we lose 2-3% on RPS. If that's accurate, we should try and look at codegen for the key methods more closely. |
Each number is the average of two runs. I will redo the ones that regressed just to be sure. It might happen that a run is bad. |
There are definitely places where EH Write through could be slightly worse. In situations where there are multiple definitions in the Try clause, each of those will do a store, and may also define a register - requiring an additional mov if the value could have been directly defined to memory. It is most effective when there are more uses than defs, and when register pressure is not excessive. |
I ran PlaintextPlatform 5 times for each, and the min value for no EHWT is greater than the max value with it, so it's definitely slower with on this scenario: COMPlus_EnableEHWriteThru=0
COMPlus_EnableEHWriteThru=1
|
I'm guessing you've looked into this, but couldn't we screen candidates and only enable write through for locals that have favorable def/use ratios? |
I didn't pursue that; we don't readily have that information until we build Intervals, and at that point it is difficult for the register allocator to change its mind about making something a candidate. That said, it could presumably decide that some of the candidates should never get a register, effectively making it the same as if it were not a candidate, though that would require some tweaking to avoid actually allocating a register when not needed (it does better with RegOptional uses than defs). It would require retaining additional information on each |
I wonder if we could gather this info during one of our ref count traversals. We currently don't distinguish reads and writes, but it seems like we easily could. Probably best to first dig into the code that's causing slowdowns in the tests above to ensure this is why things end up slower. |
This issue is nominally done. Would like to close and capture what remains in a new issue. @CarolEidt is there an issue for enabling EH Write Thru by default? I couldn't find one. Feels like there is still a moderate amount of work to do before we can turn EH Write Thru on by default, and some risk that we're just not going to see the benefits here we'd hoped to see. I'd like to do a more careful evaluation and try and find or create examples where we clearly do expect to see benefits. Seems like all this might be a stretch for 5.0 but possibly worthwhile; trying to assess if we should find a way to do that sometime soon, or defer all of this until after 5.0. |
Thanks for all your analysis, Andy. I agree that this issue, the investigation, is complete. |
Thanks, I will close this issue. @sebastienros thanks for running all those tests for us -- we'll need to dig in deeper to understand what's going on. Hope we can get to it before too long. |
We enable EH write through on the jit-experimental pipeline which runs every weekend, and there are a good number of test failures
https://dev.azure.com/dnceng/public/_build/results?buildId=618423&view=ms.vss-test-web.build-test-results-tab
The text was updated successfully, but these errors were encountered: