Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Enable EH Write Thru by default #35923

Open
1 of 4 tasks
CarolEidt opened this issue May 6, 2020 · 7 comments
Open
1 of 4 tasks

JIT: Enable EH Write Thru by default #35923

CarolEidt opened this issue May 6, 2020 · 7 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Bottom Up Work Not part of a theme, epic, or user story User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@CarolEidt
Copy link
Contributor

CarolEidt commented May 6, 2020

Today, if all the variables that are live cross exception handlers (EH) are kept on stack. This proves expensive for hot methods containing EH code. It is also expensive for certain language constructs like presence async which calls in .NET libraries code that has EH. In below example, even if method Calculate doesn't have any EH regions, the presence of async introduces EH region.

public class AsyncTest
{
    private static AsyncTest obj = new AsyncTest();

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static async Task<int> Calculate()
    {
        return await obj.GetResult();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private async Task<int> GetResult()
    {
        Console.WriteLine("Calculating...");
        await Task.Delay(1 * 1000);
        return 1;
    }
}
; Assembly listing for method MiniBench.AsyncTest:Calculate():System.Threading.Tasks.Task`1[Int32]
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows

; Lcl frame size = 56

G_M26483_IG01:              ;; offset=0000H
; ...
; ...
G_M26483_IG02:              ;; offset=0015H
       33C9                 xor      rcx, rcx
       48894C2428           mov      gword ptr [rsp+28H], rcx
       C7442420FFFFFFFF     mov      dword ptr [rsp+20H], -1
       488D4C2420           lea      rcx, bword ptr [rsp+20H]
       E842FFFFFF           call     System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start(byref)
       488B442428           mov      rax, gword ptr [rsp+28H]
       4885C0               test     rax, rax
; ...

The call to AsyncMethodBuilderCore:Start contains EH code and the performance for these methods can be degraded because of accessing the EH vars from stack. We added a work around specially inside AsyncMethodBuilderCore:Start() in dotnet/coreclr#15629 to make sure that EH vars get registers.

We have an implementation of enregistering EH Vars but from the analysis of #35534 showed mixed results for enabling EH Write Thru by default. This issue tracks of enabling the EH Write thru by default. The outcome that we are expecting is that in addition to storing the EH variable values on stack, also store them in a register and use register (as much as possible) during the uses in non-EH code. Additionally, also assign a register for the code that is totally contained inside a EH region.

There are at least a couple of issues, though further analysis should be done to validate that these are the major contributing factors to performance regressions:

  • In situations where there are multiple definitions in a Try clause, each of those will do a store, and may also define a register - requiring an additional move if the value could have been directly defined to memory.
  • In cases of high register pressure, even though the EH vars are considered to have lower spill cost (since they never actually have to be stored, only reloaded at uses), the increase in register pressure due to the EH vars can expose (known) weaknesses in the register allocator's handling of code with high register-pressure.

One option would be to track the number/ratio of defs and uses. Currently we don't readily have that information until we build Intervals, and at that point it is difficult for the register allocator to change its mind about making something a candidate. That said, it could presumably decide that some of the candidates should never get a register, effectively making it the same as if it were not a candidate, though that would require some tweaking to avoid actually allocating a register when not needed (it does better with RegOptional uses than defs).

It's possible that this tracking could be done prior to register allocation at the time of ref counting.

For now, we will take an incremental approach:

Future work

  • Enable EH write thru for local vars having multiple def. This has following dependency:
    • Tune register allocation heuristics to make it prefer spilling lower-weight values rather than assigning bad free register. ( Captured in [LSRA][RyuJIT] Tune register selection heuristics #43318)
    • Further, explore if EH write thru decision can be depend on the ratio of defs/uses and if yes, include that heuristics.

Related issue:

Workaround added so far: dotnet/coreclr#15629

category:cq
theme:register-allocator
skill-level:expert
cost:medium

@CarolEidt CarolEidt added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 6, 2020
@CarolEidt CarolEidt added this to the 5.0 milestone May 6, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 6, 2020
@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label May 6, 2020
@benaadams
Copy link
Member

@CarolEidt is this 5.0? I can look at cleaning up some of the hot path workarounds if that would be helpful?

@CarolEidt
Copy link
Contributor Author

@benaadams - I think I should have put this in "Future". The issue is that it degrades performance in some cases (see the comments above, and the linked issue). What's needed is to first fully understand the causes for the scenarios where performance degrades and then determine the best approach to mitigate those.

@benaadams
Copy link
Member

I'm hoping it's caused by the hot-paths current higher register cost to workaround the EH issue; which then the writethrough adds to; so taking those out should make it lighter? 52d6d27

@CarolEidt
Copy link
Contributor Author

Oh, shoot. At the time I was working on EH WriteThru I had change those workarounds, but I'm pretty sure that wasn't done for the measurements described in #35534. The one in AsyncMethodBuilderCore:Start could just be removed with EH WriteThru, but the other one did slightly worse when removed, because it has multiple definitions, so I'd modified that one to use a temporary variable for the conditional definition (which got a register) and then there could be a single EH-live definition. In that case the JIT could possibly be taught to only do the write on entry to the try, but that would, of course, add more complexity.

@CarolEidt CarolEidt modified the milestones: Future, 6.0.0 Nov 14, 2020
@kunalspathak
Copy link
Member

While starting to analyzing the scenarios where we would see regression, here is an example of:

In situations where there are multiple definitions in a Try clause, each of those will do a store, and may also define a register - requiring an additional move if the value could have been directly defined to memory.

image

During build interval, since EH Var is register candidate, we create an interval for it.

image

Here, #157 is a def and #177 is a use as see below:

image

During register allocation, we assign rax to #157

image

and then use the value from rax at #177
image

We update the register information in our tree as seen here:
image

and then finally, during code generation, we store value of cnst 1 in rax before moving it to a register:

image

Ideally, if there is just a single def, single use, we should avoid allocating register to it. That is one of the criteria of not allocating register as Carol pointed.

That said, it could presumably decide that some of the candidates should never get a register, effectively making it the same as if it were not a candidate, though that would require some tweaking to avoid actually allocating a register when not needed (it does better with RegOptional uses than defs).

Here, although the #157 def was marked with regOptional, we ended up allocating a register to it.

@kunalspathak kunalspathak self-assigned this Jan 19, 2021
@kunalspathak kunalspathak added the User Story A single user-facing feature. Can be grouped under an epic. label Jan 27, 2021
@JulieLeeMSFT JulieLeeMSFT added the Bottom Up Work Not part of a theme, epic, or user story label Feb 11, 2021
@JulieLeeMSFT JulieLeeMSFT modified the milestones: 6.0.0, Future Mar 22, 2021
@JulieLeeMSFT
Copy link
Member

Moved future since .NET 6 scope items are complete now.

@kunalspathak
Copy link
Member

kunalspathak commented May 17, 2023

I spent some time to enable this for multi-vars and the ones that needs zero initialization and from the asmdiffs, we continue to see the regressions because of the reason Carol mentioned:

the increase in register pressure due to the EH vars can expose (known) weaknesses in the register allocator's handling of code with high register-pressure.

The first line of attack would be to track the def/use ratio and see if we can enable enregistration of lvlVar if def/use <= 0.25 (for every def there should be at least 4 uses).

  1. During early phase, in lvaMarkLocalVars(), continue marking the variables as singleDef.
  2. Because of step 1, the liveness update that happens during SSABuild will skip marking singleDef as DoNotRegister.
  3. During liveness update that happens after lowering, calculate the def/use ratio (which should be possible in lvaComputeRefCounts). Mark all the variables that are lvLiveInOutOfHndlr == true and whose def/use > 0.25 as doNotEnregister.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Bottom Up Work Not part of a theme, epic, or user story User Story A single user-facing feature. Can be grouped under an epic.
Projects
Archived in project
Development

No branches or pull requests

6 participants