JIT: investigate EH Write Through Failures #35534

AndyAyersMS · 2020-04-27T19:27:56Z

We enable EH write through on the jit-experimental pipeline, which runs every weekend, and there are a good number of test failures:

https://dev.azure.com/dnceng/public/_build/results?buildId=618423&view=ms.vss-test-web.build-test-results-tab

AndyAyersMS · 2020-04-28T01:42:24Z

Tests show the following issues:

runtime failures, eg JIT\opt\Devirtualization\box2.
Assertion failed 'regRecord->assignedInterval == nullptr', eg JIT\Intrinsics\TypeIntrinsics_r\TypeIntrinsics_r

Think I have a fix for some of the runtime failures. Looks like we are not initializing the stack home for an EH-live parameter.

CarolEidt · 2020-04-28T15:12:36Z

@AndyAyersMS - let me know if you'd like me to track down the LSRA assertions.
And thanks for tracking down the runtime failures!

AndyAyersMS · 2020-04-28T17:46:40Z

@CarolEidt I'll keep looking, but maybe you can give me some pointers.

The assert is in verifyFinalAllocation: there is a RefTypeKill (RAX killed by helper call) with an assigned interval:

runtime/src/coreclr/src/jit/lsra.cpp

Lines 10710 to 10714 in 2b30fdb

    
    case RefTypeKill:
        assert(regRecord != nullptr);
        assert(regRecord->assignedInterval == nullptr);
        dumpLsraAllocationEvent(LSRA_EVENT_KEPT_ALLOCATION, nullptr, regRecord->regNum, currentBlock);
        break;

<RefPosition #18  @14  RefTypeKill <Reg:rax> BB01 regmask=[rax] minReg=1 last>

AndyAyersMS · 2020-04-28T17:54:52Z

The associated interval is a write-through local:

;  V08 loc6              ref  EH must-init class-hnd EH-live
Interval  8: (V08) ref (SPILLED) (writeThru) RefPositions {#2@0 #3@0 #754@705 #1006@962 #1036@1009} physReg:NA Preferences=[rax]

CarolEidt · 2020-04-28T19:26:32Z

I would guess that V08 didn't get a register at entry, or was allocated rax and then later spilled, but was not marked spilled. I'm not sure why there are two RefPositions at location zero (i.e. entry). The entry RefPositions (which are usually either RefTypeParamDef or RefTypeZeroInit, but not both) get special handling, so perhaps something is missing there.

AndyAyersMS · 2020-04-28T19:44:34Z

For the two ref positions:

V08 was live in to first block: creating ZeroInit
<RefPosition #2   @0   RefTypeZeroInit <Ivl:8 V08> IL_OFFSET BB01 regmask=[allIntButFP] minReg=1>
V08 is a finally var: creating ZeroInit
<RefPosition #3   @0   RefTypeZeroInit <Ivl:8 V08> IL_OFFSET BB01 regmask=[allIntButFP] minReg=1>

I think V08's liveness is over-stated but perhaps the use (in a return) is leaking upwards through infeasible EH paths.

First bit of the allocation table:

-----------------------------------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
Loc  RP#   Name  Type  Action Reg  |rax  |rcx  |rdx  |rbx  |rbp  |rsi  |rdi  |r8   |r9   |
-----------------------------------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
                                   |     |V0  a|V1  a|     |     |     |     |     |     |
   0.#0    V1    Parm   Alloc rsi  |     |V0  a|     |     |     |V1  a|     |     |     |
   0.#1    V0    Parm   Alloc rdi  |     |     |     |     |     |V1  a|V0  a|     |     |
   0.#2    V8    Zero   Alloc rax  |V8  a|     |     |     |     |V1  a|V0  a|     |     |
   0.#3    V8    Zero   Keep  rax  |V8  a|     |     |     |     |V1  a|V0  a|     |     |
   1.#4    BB1  PredBB0            |V8  a|     |     |     |     |V1  a|V0  a|     |     |
   7.#5    rdx   Fixd   Keep  rdx  |V8  a|     |     |     |     |V1  a|V0  a|     |     |
   7.#6    V1    Use    Copy  rdx  |V8  a|     |V1  a|     |     |V1  a|V0  a|     |     |
   8.#7    rdx   Fixd   Keep  rdx  |V8  a|     |V1  a|     |     |V1  a|V0  a|     |     |
   8.#8    I35   Def    Alloc rdx  |V8  a|     |I35 a|     |     |V1  a|V0  a|     |     |
  10.#9    C36   Def    Alloc rcx  |V8  a|C36 a|I35 a|     |     |V1  a|V0  a|     |     |
  11.#10   rcx   Fixd   Keep  rcx  |V8  a|C36 a|I35 a|     |     |V1  a|V0  a|     |     |
  11.#11   C36   Use *  Keep  rcx  |V8  a|C36 a|I35 a|     |     |V1  a|V0  a|     |     |
  12.#12   rcx   Fixd   Keep  rcx  |V8  a|     |I35 a|     |     |V1  a|V0  a|     |     |
  12.#13   I37   Def    Alloc rcx  |V8  a|I37 a|I35 a|     |     |V1  a|V0  a|     |     |
  13.#14   rdx   Fixd   Keep  rdx  |V8  a|I37 a|I35 a|     |     |V1  a|V0  a|     |     |
  13.#15   I35   Use *  Keep  rdx  |V8  a|I37 a|I35 a|     |     |V1  a|V0  a|     |     |
  13.#16   rcx   Fixd   Keep  rcx  |V8  a|I37 a|I35 a|     |     |V1  a|V0  a|     |     |
  13.#17   I37   Use *  Keep  rcx  |V8  a|I37 a|I35 a|     |     |V1  a|V0  a|     |     |
  14.#18   rax   Kill   Spill rax  |     |     |     |     |     |V1  a|V0  a|     |     |
                        Keep  rax  |     |     |     |     |     |V1  a|V0  a|     |     |

Method being jitted:

runtime/src/libraries/System.Linq.Expressions/src/System/Linq/Expressions/Compiler/VariableBinder.cs

Lines 145 to 195 in b4f7380

    
    // If the immediate child is another scope, merge it into this one
    // This is an optimization to save environment allocations and
    // array accesses.
    private ReadOnlyCollection<Expression> MergeScopes(Expression node)
    {
        ReadOnlyCollection<Expression> body;
        var lambda = node as LambdaExpression;
        if (lambda != null)
        {
            body = new ReadOnlyCollection<Expression>(new[] { lambda.Body });
        }
        else
        {
            body = ((BlockExpression)node).Expressions;
        }
        CompilerScope currentScope = _scopes.Peek();
        // A block body is mergeable if the body only contains one single block node containing variables,
        // and the child block has the same type as the parent block.
        while (body.Count == 1 && body[0].NodeType == ExpressionType.Block)
        {
            var block = (BlockExpression)body[0];
            if (block.Variables.Count > 0)
            {
                // Make sure none of the variables are shadowed. If any
                // are, we can't merge it.
                foreach (ParameterExpression v in block.Variables)
                {
                    if (currentScope.Definitions.ContainsKey(v))
                    {
                        return body;
                    }
                }
                // Otherwise, merge it
                if (currentScope.MergedScopes == null)
                {
                    currentScope.MergedScopes = new HashSet<BlockExpression>(ReferenceEqualityComparer.Instance);
                }
                currentScope.MergedScopes.Add(block);
                foreach (ParameterExpression v in block.Variables)
                {
                    currentScope.Definitions.Add(v, VariableStorageKind.Local);
                }
            }
            body = block.Expressions;
        }
        return body;
    }

V08 (local6) is defined/used by that inner return.

AndyAyersMS · 2020-04-28T19:46:33Z

I'm also looking for simpler failing test cases or a simple repro based on the above, but no luck so far.

CarolEidt · 2020-04-28T21:35:12Z

I was able to reproduce this using SuperPmi. It is indeed the double zero-init that is causing the problem. I submitted PR #35585 to fix it.
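
For readers following along, here is a minimal sketch of one way to avoid a double zero-init at entry. It is not the actual change in PR #35585; the types and functions (`Interval`, `RefPosition`, `addEntryZeroInit`, `checkEntryDefs`) are hypothetical, simplified stand-ins for LSRA's real data structures.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for LSRA data structures.
enum class RefType { ParamDef, ZeroInit, Use, Def, Kill };

struct RefPosition
{
    RefType  type;
    unsigned location;
};

struct Interval
{
    unsigned                 varNum;
    std::vector<RefPosition> refs;
};

// True for a RefPosition that defines the interval at method entry (location 0).
bool isEntryDef(const RefPosition& rp)
{
    return rp.location == 0 && (rp.type == RefType::ZeroInit || rp.type == RefType::ParamDef);
}

// Adds an entry ZeroInit RefPosition unless the interval already has an entry
// definition. Returns true if a new RefPosition was created.
bool addEntryZeroInit(Interval& interval)
{
    for (const RefPosition& rp : interval.refs)
    {
        if (isEntryDef(rp))
        {
            return false; // already defined at entry; don't create a duplicate
        }
    }
    interval.refs.push_back({RefType::ZeroInit, 0});
    return true;
}

// Debug check: an interval should have at most one entry definition.
void checkEntryDefs(const Interval& interval)
{
    int entryDefs = 0;
    for (const RefPosition& rp : interval.refs)
    {
        if (isEntryDef(rp))
        {
            entryDefs++;
        }
    }
    assert(entryDefs <= 1);
}
```

In this model, the "live in to first block" request creates the ZeroInit and the later "finally var" request for the same interval becomes a no-op, so V08 would end up with a single RefPosition at location 0.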

AndyAyersMS · 2020-04-29T07:28:36Z

Still ~34 failures (slightly fewer) with #35585. They all look like unexpected null reference exceptions. I'll look for a simple case and try and see what's up.

AndyAyersMS · 2020-05-01T01:15:22Z

A number of the remaining bugs involve WaitForExitCore -- failure is an AV in the callee GetProcessHandle because we call it with null in RCX.

Looks like either there's a missing reload of RCX before the call, or else the xor of RCX at the end of the zeroing done in the jit prolog inadvertently trashes RCX which we expected we could keep live.

Will keep digging.

;; System.Diagnostics.Process:WaitForExitCore(int):bool:this
G_M64267_IG01:
       55                   push     rbp
       4156                 push     r14
       57                   push     rdi
       56                   push     rsi
       53                   push     rbx
       4883EC40             sub      rsp, 64
       488D6C2460           lea      rbp, [rsp+60H]
       33C0                 xor      rax, rax
       488945D8             mov      qword ptr [rbp-28H], rax
       488965C0             mov      qword ptr [rbp-40H], rsp
       48894D10             mov      gword ptr [rbp+10H], rcx
       895518               mov      dword ptr [rbp+18H], edx
       33C9                 xor      rcx, rcx
						;; bbWeight=1    PerfScore 10.25
G_M64267_IG02:
       4533C0               xor      r8, r8
       33C0                 xor      rax, rax
       488945D8             mov      gword ptr [rbp-28H], rax
						;; bbWeight=1    PerfScore 1.50
G_M64267_IG03:

;  **** need to reload RCX here, or not zero it above ***

       BA00001000           mov      edx, 0x100000
       4533C0               xor      r8d, r8d
       E8081BFEFF           call     System.Diagnostics.Process:GetProcessHandle(int,bool):Microsoft.Win32.SafeHandles.SafeProcessHandle:this

AndyAyersMS · 2020-05-01T02:34:56Z

Looks to me like RCX is getting trashed. The troublemaker is

;  V02 loc0         [V02,T02] ( 11,  4.50)     ref  ->  [rbp-0x28]   EH must-init class-hnd EH-live

which is assigned RCX for a stretch down in the try body. We zero both its memory and its register locations in the jit prolog; unfortunately the register location isn't live there.

Relevant logic is here:

runtime/src/coreclr/src/jit/codegencommon.cpp

Lines 7563 to 7573 in c614097

    
    /* For lvMustInit vars, gather pertinent info */
    if (!varDsc->lvMustInit)
    {
        continue;
    }
    bool isInReg    = varDsc->lvIsInReg();
    bool isInMemory = !isInReg || varDsc->lvLiveInOutOfHndlr;
    if (isInReg)
    {

Seems plausible that if a variable is must init and in both memory and a register we only need to zero memory in the prolog, but that's probably too simplistic.

@CarolEidt thoughts?

CarolEidt · 2020-05-01T16:29:22Z

It seems that the register allocator thinks that the variable is in RCX at procedure entry, so that seems to be the source of the issue. I did a little searching and I didn't find any clear indication of what might be going wrong. If you have a jitdump you can search for:

Recording Var Locations at start of BB01

This shows the variables that it believes are live in registers at the start of the block. It shows up twice: once when we are generating code for that block, and once just prior to:

*************** In genFnProlog()

Let me know if you'd like me to take over tracking this down.

AndyAyersMS · 2020-05-01T17:21:10Z

Only V00 (this) is live in BB01:

Recording Var Locations at start of BB01
  V00(rcx)
*************** In genFnProlog()

V02 becomes live in BB01, but at that point it's in RAX. It is only live in RCX later in the method -- that is, its highest refpos appearances are in RCX. I wonder if this carries over to the prolog codegen and that's why we think we need to zero RCX.

For a variable that lives in different registers in different parts of the code, what's the intended meaning of varDsc->GetRegNum() ?

CarolEidt · 2020-05-01T17:42:46Z

> For a variable that lives in different registers in different parts of the code, what's the intended meaning of varDsc->GetRegNum() ?

It is the "current" register occupied by the variable. Unfortunately, I believe that it relies on the invariant that you only query it if the variable is actually live. Otherwise the register allocator would have to reset the register number for all the variables at each boundary, not just those that are live. In this case V02 isn't live, but its last register was apparently RCX. It seems that this would be an issue for any lvMustInit variable that was in a register at the end of the method, so I'm not sure why this isn't a problem without EHWriteThru.

AndyAyersMS · 2020-05-01T17:52:36Z

If you want to investigate, here's a simple repro:

using System;
using System.Diagnostics;

class X
{
    public static int Main()
    {
        var process = new Process {
            StartInfo = new ProcessStartInfo {
                FileName = "notepad.exe"
            }
        };
        process.Start();
        process.WaitForExit();
        return 100;
    }
}

Key method is WaitForExitCore.

CarolEidt · 2020-05-01T17:54:28Z

Thanks, I'll take a look.

CarolEidt · 2020-05-01T20:18:32Z

So ... without EHWriteThru we never have a lvMustInit variable that's not live-in to the entry block, but with EHWriteThru we set lvMustInit on all the variables that are live-in to a finally block. Here's a fix that addresses this case, and adds an assert that in all other cases it must be live-in: #35723
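
A minimal sketch of the prolog decision described above, under stated assumptions. This is not the code in #35723: `LocalVar`, `PrologInit`, `computePrologInit`, and the `liveInAtEntry` field are hypothetical names used only to illustrate "zero memory only, and leave the stale register alone, when a must-init EH-live variable is not live-in at entry".

```cpp
#include <cassert>

// Hypothetical, simplified model of a local variable descriptor.
struct LocalVar
{
    bool mustInit;         // models lvMustInit
    bool inReg;            // models lvIsInReg()
    bool liveInOutOfHndlr; // models lvLiveInOutOfHndlr
    bool liveInAtEntry;    // assumption: live-in to the first block
};

struct PrologInit
{
    bool zeroRegister;
    bool zeroMemory;
};

PrologInit computePrologInit(const LocalVar& var)
{
    PrologInit init{false, false};
    if (!var.mustInit)
    {
        return init;
    }

    bool isInReg    = var.inReg;
    bool isInMemory = !isInReg || var.liveInOutOfHndlr;

    if (isInReg && !var.liveInAtEntry)
    {
        // Only an EH-live (write-thru) local is expected to be must-init
        // without being live-in at entry.
        assert(var.liveInOutOfHndlr);
        // The recorded register is not meaningful at entry, so don't zero it.
        isInReg = false;
    }

    init.zeroRegister = isInReg;
    init.zeroMemory   = isInMemory;
    return init;
}
```

For a variable shaped like V02 in WaitForExitCore (must-init, EH-live, last seen in RCX, not live-in at entry), this model zeroes only the stack home, so RCX (holding `this`) would survive the prolog.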

AndyAyersMS · 2020-05-01T20:21:14Z

Great -- this fixes my local repro case. I'll add the jit-experimental testing to your new PR.

This may be the last issue for x64 Pri1 tests.

AndyAyersMS · 2020-05-03T20:57:26Z

Now that all the fixes are in, I launched a few ASP.NET perf runs.

[EDIT] Baseline data here is stale, so improvements are not accurate -- see comments below

On Json for windows x64, I am seeing a 7% improvement in RPS and 10% improvement in latency.

Description	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)	Errors	Ratio
Baseline	329,134	75	168	2.47	466	4551	106554	51.12	0.52	0	1.00
EHWT	350,934	78	163	2.21	458	4053	106808	57.89	0.54	0	1.07

On ResponseCachingPlaintextCached for windows x64, 4% improvement on RPS, 10% on latency.

Description	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)	Errors	Ratio
Baseline	900,719	95	168	3.44	459	4540	106554	53.86	0.55	0	1.00
EHWT	936,250	96	168	3.09	460	4056	106808	61.25	0.48	0	1.04

@sebastienros can you help us do more comprehensive testing? To enable this you need a build from Saturday or later, and need to set

COMPlus_EnableEHWriteThru=1

sebastienros · 2020-05-04T15:10:30Z

What OSes and architectures should it be tested on?

AndyAyersMS · 2020-05-04T15:44:08Z

It should improve codegen for all OSes/Architectures.

sebastienros · 2020-05-04T17:47:48Z

Preliminary results on 12 core machines

Scenario	ARCH	OS	ENV	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)
PlaintextPlatform	INTEL	Windows	Baseline	4,974,135	74	53	2.75	266	3867	87990	31.77	0.45
PlaintextPlatform	INTEL	Windows	EHWT	4,965,548	71	54	2.63	272	3545	87990	32.88	0.49
Plaintext	INTEL	Windows	Baseline	2,309,179	85	58	1.93	530	4212	106817	41.3	0.48
Plaintext	INTEL	Windows	EHWT	2,287,358	85	58	2.12	524	4044	106817	41.67	0.41
MvcPlaintext	INTEL	Windows	Baseline	781,889	93	182	4.78	585	4031	106817	72.85	0.44
MvcPlaintext	INTEL	Windows	EHWT	771,005	93	180	4.61	577	4047	106817	72.31	0.37
Json	INTEL	Windows	Baseline	328,910	74	164	2.35	516	4041	106817	53.85	0.39
Json	INTEL	Windows	EHWT	330,200	75	165	2.19	510	4037	106817	52.58	0.44
MvcJson	INTEL	Windows	Baseline	162,222	88	179	3.44	577	4040	106817	82.17	0.48
MvcJson	INTEL	Windows	EHWT	163,071	84	180	3.18	576	4034	106817	87.4	0.68
DbFortunesRaw	INTEL	Windows	Baseline	115,521	88	214	2.63	512	4035	106817	389.58	1.07
DbFortunesRaw	INTEL	Windows	EHWT	115,419	88	207	2.6	519	4028	106817	387.23	1.11
MvcDbFortunesEf	INTEL	Windows	Baseline	42,030	88	333	6.85	640	4029	106817	921.32	1.3
MvcDbFortunesEf	INTEL	Windows	EHWT	41,798	89	325	7.09	644	4038	106817	920.97	1.35
PlaintextPlatform	INTEL	Linux	Baseline	5,115,384	99	69	2.08	224	4502	102118	49.33	0.39
PlaintextPlatform	INTEL	Linux	EHWT	5,176,576	99	70	2.39	231	4002	102118	52.28	0.34
Plaintext	INTEL	Linux	Baseline	2,084,199	99	83	1.33	381	4335	120698	54.46	0.4
Plaintext	INTEL	Linux	EHWT	2,083,963	99	82	1.35	372	4002	120698	59.62	0.51
MvcPlaintext	INTEL	Linux	Baseline	723,120	97	205	3.27	395	4002	120698	101.96	0.46
MvcPlaintext	INTEL	Linux	EHWT	738,900	97	204	3.19	414	4002	120698	93.82	0.52
Json	INTEL	Linux	Baseline	360,628	99	191	1.13	375	4002	120698	68.11	0.49
Json	INTEL	Linux	EHWT	364,641	99	191	1.09	381	4002	120698	67.28	0.45
MvcJson	INTEL	Linux	Baseline	177,961	98	200	1.59	392	4002	120698	105.27	0.74
MvcJson	INTEL	Linux	EHWT	176,572	98	201	1.67	415	4002	120698	110.76	0.72
DbFortunesRaw	INTEL	Linux	Baseline	105,510	97	234	2.7	372	4002	120698	431.16	1.24
DbFortunesRaw	INTEL	Linux	EHWT	105,452	97	235	2.7	385	4002	120698	426.17	1.28
MvcDbFortunesEf	INTEL	Linux	Baseline	44,683	96	254	5.82	471	4002	120698	937.98	1.56
MvcDbFortunesEf	INTEL	Linux	EHWT	44,849	97	259	5.79	481	4002	120698	937.98	1.59

AndyAyersMS · 2020-05-04T18:14:49Z

Sigh, I had a bug in my script and was comparing new runs to older baseline files.

Results vs proper baselines are more in line with the data Sebastien has gathered. Odd though that Json RPS was 350K yesterday (for both baseline and EHWT) and only 330K today.

sebastienros · 2020-05-04T20:52:47Z

Adding more numbers, this time on the citrine environment. INTEL machines are 14/28(ht) cores, ARM is 32 cores.

Scenario	ARCH	OS	ENV	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)
PlaintextPlatform	ARM	Linux	Baseline	5,532,785	98	75	1.04	502	10005	115711	119.75	0.46
PlaintextPlatform	ARM	Linux	EHWT	5,686,295	98	73	0.91	509	10006	115711	114.75	0.18
Plaintext	ARM	Linux	Baseline	2,083,741	99	90	1.54	968	12257	134302	129.02	0.34
Plaintext	ARM	Linux	EHWT	2,074,222	99	91	1.77	952	12006	134302	137.13	0.29
MvcPlaintext	ARM	Linux	Baseline	289,407	93	121	10.09	1096	12006	134302	237.66	1.13
MvcPlaintext	ARM	Linux	EHWT	296,528	92	118	9.72	1112	12257	134302	233.25	0.57
Json	ARM	Linux	Baseline	351,495	98	126	1.48	938	12006	134302	163.35	0.59
Json	ARM	Linux	EHWT	355,377	98	125	1.52	955	12007	134302	159.38	0.55
MvcJson	ARM	Linux	Baseline	105,397	96	120	3.5	1088	12007	134302	258.84	0.82
MvcJson	ARM	Linux	EHWT	111,171	96	125	3.37	1082	12007	134302	268.7	0.68
FortunesPlatform	ARM	Linux	Baseline	82,966	95	167	4.16	516	12507	115713	1278.05	0.8
FortunesPlatform	ARM	Linux	EHWT	82,249	94	196	4.18	498	10005	115713	1303.77	1.29
DbFortunesRaw	ARM	Linux	Baseline	60,219	96	182	5.16	930	12007	134302	1187.43	2.23
DbFortunesRaw	ARM	Linux	EHWT	59,997	95	190	5.2	955	12006	134302	1211.3	1.35
MvcDbFortunesEf	ARM	Linux	Baseline	27,915	95	282	10.66	1254	11757	134302	2890.65	2.75
MvcDbFortunesEf	ARM	Linux	EHWT	27,969	96	300	10.73	1223	12007	134302	2893.05	3.47
PlaintextPlatform	INTEL	Windows	Baseline	7,949,569	82	59	0.45	285	5536	87990	35.75	0.11
PlaintextPlatform	INTEL	Windows	EHWT	7,766,029	82	58	0.4	310	4784	87990	36.08	0.11
Plaintext	INTEL	Windows	Baseline	5,296,659	92	63	2.46	565	7031	106818	45.43	0.15
Plaintext	INTEL	Windows	EHWT	5,396,604	88	62	2.23	547	4527	106818	45.41	0.1
MvcPlaintext	INTEL	Windows	Baseline	2,322,387	98	414	4.27	621	4544	106818	82.96	0.15
MvcPlaintext	INTEL	Windows	EHWT	2,336,585	98	412	5.28	603	4532	106818	83.08	0.13
Json	INTEL	Windows	Baseline	775,633	80	390	0.45	548	4532	106818	59.57	0.13
Json	INTEL	Windows	EHWT	760,177	82	391	0.46	576	4525	106818	59.87	0.12
MvcJson	INTEL	Windows	Baseline	514,506	92	404	4.43	618	4523	106818	94.46	0.22
MvcJson	INTEL	Windows	EHWT	508,750	92	401	3.72	641	4525	106818	93.77	0.21
FortunesPlatform	INTEL	Windows	Baseline	310,423	82	433	1.4	318	4037	87993	513.02	0.38
FortunesPlatform	INTEL	Windows	EHWT	309,203	78	430	1.44	310	4036	87993	505.71	0.4
DbFortunesRaw	INTEL	Windows	Baseline	278,565	83	434	1.58	548	4531	106818	462.88	0.48
DbFortunesRaw	INTEL	Windows	EHWT	276,785	85	435	1.58	541	4539	106818	463.26	0.44
MvcDbFortunesEf	INTEL	Windows	Baseline	121,677	94	476	5.55	709	4532	106818	1097.87	0.56
MvcDbFortunesEf	INTEL	Windows	EHWT	123,023	96	473	4.09	686	4530	106818	1100.22	0.54
PlaintextPlatform	INTEL	Linux	Baseline	9,008,473	99	77	0.49	195	4752	102122	30.89	0.11
PlaintextPlatform	INTEL	Linux	EHWT	9,060,435	98	76	0.48	195	4002	102122	29.7	0.11
Plaintext	INTEL	Linux	Baseline	4,180,523	99	89	0.9	340	7502	120702	36.39	0.11
Plaintext	INTEL	Linux	EHWT	4,181,031	99	89	1.06	337	4001	120702	36.27	0.11
MvcPlaintext	INTEL	Linux	Baseline	1,617,319	98	435	1.65	384	4001	120702	68.48	0.12
MvcPlaintext	INTEL	Linux	EHWT	1,588,558	98	436	1.78	381	4001	120702	68.36	0.13
Json	INTEL	Linux	Baseline	794,491	99	419	1.2	344	4001	120702	48.11	0.11
Json	INTEL	Linux	EHWT	797,031	99	417	1.11	334	4001	120702	48.19	0.12
MvcJson	INTEL	Linux	Baseline	420,179	98	432	1.02	381	4001	120702	77.18	0.19
MvcJson	INTEL	Linux	EHWT	421,569	98	432	1.03	388	4001	120702	77.56	0.2
FortunesPlatform	INTEL	Linux	Baseline	303,162	98	472	1.34	199	4001	102124	406.91	0.34
FortunesPlatform	INTEL	Linux	EHWT	301,909	98	462	1.4	201	3751	102124	403.55	0.32
DbFortunesRaw	INTEL	Linux	Baseline	255,194	98	486	1.46	340	4001	120702	376.94	0.33
DbFortunesRaw	INTEL	Linux	EHWT	254,049	98	489	1.52	341	4001	120702	379.51	0.39
MvcDbFortunesEf	INTEL	Linux	Baseline	101,943	96	488	2.77	436	4001	120702	904.9	0.64
MvcDbFortunesEf	INTEL	Linux	EHWT	103,843	97	492	2.8	427	4001	120702	928.15	0.6

AndyAyersMS · 2020-05-05T00:07:22Z

How much variability do you usually see in results? I'd be surprised if enabling this made things slower, but I see cases above where it looks like we lose 2-3% on RPS. If that's accurate, we should try and look at codegen for the key methods more closely.

sebastienros · 2020-05-05T13:51:48Z

Each number is the average of two runs. I will redo the ones that regressed just to be sure. It might happen that a run is bad.

CarolEidt · 2020-05-05T14:29:38Z

There are definitely places where EH Write through could be slightly worse. In situations where there are multiple definitions in the Try clause, each of those will do a store, and may also define a register - requiring an additional mov if the value could have been directly defined to memory. It is most effective when there are more uses than defs, and when register pressure is not excessive.
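
To make the def/use trade-off concrete, here is a rough cost sketch. The unit costs and the functions `costOnStack` and `costWriteThru` are hypothetical; the model assumes write-thru uses always find the value in a register and it ignores register pressure, so it is only an illustration of the reasoning, not how the JIT estimates cost.

```cpp
#include <cassert>

// Hypothetical unit costs; real instruction costs depend on the target.
struct Costs
{
    int load;  // stack -> register
    int store; // register -> stack
    int mov;   // register -> register
};

// EH-live local kept only on the stack: every def stores, every use loads.
int costOnStack(int defs, int uses, const Costs& c)
{
    assert(defs >= 0 && uses >= 0);
    return defs * c.store + uses * c.load;
}

// EH write-thru local: every def still stores and, in the worst case described
// above, also needs an extra mov; uses are assumed to read the register for free.
int costWriteThru(int defs, int uses, const Costs& c)
{
    assert(defs >= 0 && uses >= 0);
    (void)uses; // uses cost nothing in this simplified model
    return defs * (c.store + c.mov);
}
```

With all unit costs set to 1, one def and four uses cost 5 on the stack versus 2 with write-thru, while four defs and one use cost 5 on the stack versus 8 with write-thru.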

sebastienros · 2020-05-05T14:57:23Z

I ran PlaintextPlatform 5 times for each setting, and the min value without EHWT is greater than the max value with it, so it's definitely slower with it on in this scenario:

COMPlus_EnableEHWriteThru=0

Description	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)
√	8,001,783	84	58	0.381	287.0385	5572.3578	87991	35.5769	0.0977
√	8,039,917	89	58	0.417	281.8752	4027.2104	87991	36.4238	0.1013
√	8,101,041	84	59	0.40608	296.4301	4028.946	87991	35.6608	0.0958
√	7,960,022	85	57	0.41706	286.7777	4043.8363	87991	36.1157	0.1007
√	7,894,460	79	58	0.38552	281.9774	4053.8737	87991	35.8702	0.0974

COMPlus_EnableEHWriteThru=1

Description	RPS	CPU (%)	Memory (MB)	Avg. Latency (ms)	Startup (ms)	Build Time (ms)	Published Size (KB)	First Request (ms)	Latency (ms)
√	7,877,458	82	59	0.383	288.2986	4028.7659	87991	36.1246	0.1018
√	7,891,998	81	58	0.39981	297.0558	4029.3682	87991	35.6291	0.0972
√	7,815,282	78	59	0.39094	286.5538	4045.0774	87991	36.0377	0.1061
√	7,842,825	78	58	0.42286	314.2182	4536.1388	87991	35.9681	0.0999
√	7,804,675	90	58	0.428	286.0867	4021.7947	87991	35.5549	0.0971
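
The min/max comparison can be checked directly from the ten RPS values in the two tables above. The helper names (`mean`, `rangesSeparated`, `relativeDrop`) are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// RPS values copied from the two tables above.
const std::vector<double> kNoWriteThru = {8001783, 8039917, 8101041, 7960022, 7894460};
const std::vector<double> kWriteThru   = {7877458, 7891998, 7815282, 7842825, 7804675};

double mean(const std::vector<double>& v)
{
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / static_cast<double>(v.size());
}

// True when every sample in `a` exceeds every sample in `b`.
bool rangesSeparated(const std::vector<double>& a, const std::vector<double>& b)
{
    assert(!a.empty() && !b.empty());
    return *std::min_element(a.begin(), a.end()) > *std::max_element(b.begin(), b.end());
}

// Relative drop of mean(b) versus mean(a), as a fraction of mean(a).
double relativeDrop(const std::vector<double>& a, const std::vector<double>& b)
{
    return (mean(a) - mean(b)) / mean(a);
}
```

With these numbers the minimum without write-thru (7,894,460) is above the maximum with it (7,891,998), and the mean RPS drops by roughly 1.9%.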

AndyAyersMS · 2020-05-05T15:39:43Z

> It is most effective when there are more uses than defs, and when register pressure is not excessive.

I'm guessing you've looked into this, but couldn't we screen candidates and only enable write through for locals that have favorable def/use ratios?

CarolEidt · 2020-05-05T16:13:10Z

> couldn't we screen candidates and only enable write through for locals that have favorable def/use ratios?

I didn't pursue that; we don't readily have that information until we build Intervals, and at that point it is difficult for the register allocator to change its mind about making something a candidate. That said, it could presumably decide that some of the candidates should never get a register, effectively making it the same as if it were not a candidate, though that would require some tweaking to avoid actually allocating a register when not needed (it does better with RegOptional uses than defs).

It would require retaining additional information on each Interval which would in turn probably require some additional tuning to ensure that it doesn't degrade throughput.

AndyAyersMS · 2020-05-05T18:52:40Z

I wonder if we could gather this info during one of our ref count traversals. We currently don't distinguish reads and writes, but it seems like we easily could.

Probably best to first dig into the code that's causing slowdowns in the tests above to ensure this is why things end up slower.
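
As a rough illustration of counting reads and writes separately in one traversal, here is a sketch over a made-up, flattened list of local references. `LocalRef`, `RefCounts`, `countRefs`, and the "more uses than defs" screening rule are all assumptions for illustration; the sketch ignores block weights and is not how the JIT's ref counting is implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified IR: each entry either writes (def) or reads (use) a local.
struct LocalRef
{
    unsigned varNum;
    bool     isDef;
};

struct RefCounts
{
    unsigned defs = 0;
    unsigned uses = 0;
};

// Single pass over all references, counting defs and uses per local.
std::map<unsigned, RefCounts> countRefs(const std::vector<LocalRef>& refs)
{
    std::map<unsigned, RefCounts> counts;
    for (const LocalRef& ref : refs)
    {
        RefCounts& c = counts[ref.varNum];
        if (ref.isDef)
        {
            c.defs++;
        }
        else
        {
            c.uses++;
        }
    }

    // Sanity check: every reference was classified as exactly one of def or use.
    std::size_t total = 0;
    for (const auto& entry : counts)
    {
        total += entry.second.defs + entry.second.uses;
    }
    assert(total == refs.size());
    return counts;
}

// Assumed screening rule: only locals with more uses than defs are candidates.
bool isWriteThruCandidate(const RefCounts& c)
{
    return c.uses > c.defs;
}
```

A local with one def and two uses passes this screen, while one with two defs and one use does not.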

AndyAyersMS · 2020-05-06T19:17:42Z

This issue is nominally done. Would like to close and capture what remains in a new issue. @CarolEidt is there an issue for enabling EH Write Thru by default? I couldn't find one.

Feels like there is still a moderate amount of work to do before we can turn EH Write Thru on by default, and some risk that we're just not going to see the benefits here we'd hoped to see. I'd like to do a more careful evaluation and try and find or create examples where we clearly do expect to see benefits.

Seems like all this might be a stretch for 5.0 but possibly worthwhile; trying to assess if we should find a way to do that sometime soon, or defer all of this until after 5.0.

CarolEidt · 2020-05-06T22:08:31Z

Thanks for all your analysis, Andy. I agree that this issue, the investigation, is complete.
There's no issue for enabling EH write thru by default - I'll add one.

AndyAyersMS · 2020-05-06T22:45:04Z

Thanks, I will close this issue.

@sebastienros thanks for running all those tests for us -- we'll need to dig in deeper to understand what's going on. Hope we can get to it before too long.

AndyAyersMS added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 27, 2020

AndyAyersMS added this to the 5.0 milestone Apr 27, 2020

AndyAyersMS self-assigned this Apr 27, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Apr 27, 2020

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Apr 28, 2020

AndyAyersMS mentioned this issue Apr 28, 2020

JIT: fixes for EH Write Thru and OSR #35550

Merged

CarolEidt mentioned this issue May 6, 2020

JIT: Enable EH Write Thru by default #35923

Open

4 tasks

AndyAyersMS closed this as completed May 6, 2020

ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
