Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] Support Write-Thru of EH variables in LSRA #26087

Closed
wants to merge 12 commits into from

Conversation

CarolEidt
Copy link

Mark EH variables (those that are live in or out of exception regions) only as lvLiveInOutOfHndlr, not necessarily lvDoNotEnregister
During register allocation, mark these as write-thru, and mark all defs as write-thru, ensuring that the stack value is always valid.
Mark those defs with GTF_SPILLED (this the "reload" flag and is not currently used for pure defs) to indicate that it should be kept in the register.
Mark blocks that enter EH regions as having no predecessor, and set the location of all live-in vars to be on the stack.
Change genFnPrologCalleeRegArgs to store EH vars also to the stack if they have a register assignment.

@benaadams
Copy link
Member

ExecutionContext.RunInternal might be a good example method to test with?

// Enregister variables with 0 post-fix so they can be used in registers without EH forcing them to stack
// Capture references to Thread Contexts
Thread currentThread0 = Thread.CurrentThread;
Thread currentThread = currentThread0;
ExecutionContext? previousExecutionCtx0 = currentThread0._executionContext;
if (previousExecutionCtx0 != null && previousExecutionCtx0.m_isDefault)
{
// Default is a null ExecutionContext internally
previousExecutionCtx0 = null;
}
// Store current ExecutionContext and SynchronizationContext as "previousXxx".
// This allows us to restore them and undo any Context changes made in callback.Invoke
// so that they won't "leak" back into caller.
// These variables will cross EH so be forced to stack
ExecutionContext? previousExecutionCtx = previousExecutionCtx0;
SynchronizationContext? previousSyncCtx = currentThread0._synchronizationContext;
if (executionContext != null && executionContext.m_isDefault)
{
// Default is a null ExecutionContext internally
executionContext = null;
}
if (previousExecutionCtx0 != executionContext)
{
RestoreChangedContextToThread(currentThread0, executionContext, previousExecutionCtx0);
}
ExceptionDispatchInfo? edi = null;
try
{
callback.Invoke(state);
}
catch (Exception ex)
{
// Note: we have a "catch" rather than a "finally" because we want
// to stop the first pass of EH here. That way we can restore the previous
// context before any of our callers' EH filters run.
edi = ExceptionDispatchInfo.Capture(ex);
}
// Re-enregistrer variables post EH with 1 post-fix so they can be used in registers rather than from stack
SynchronizationContext? previousSyncCtx1 = previousSyncCtx;
Thread currentThread1 = currentThread;
// The common case is that these have not changed, so avoid the cost of a write barrier if not needed.
if (currentThread1._synchronizationContext != previousSyncCtx1)
{
// Restore changed SynchronizationContext back to previous
currentThread1._synchronizationContext = previousSyncCtx1;
}
ExecutionContext? previousExecutionCtx1 = previousExecutionCtx;
ExecutionContext? currentExecutionCtx1 = currentThread1._executionContext;
if (currentExecutionCtx1 != previousExecutionCtx1)
{
RestoreChangedContextToThread(currentThread1, previousExecutionCtx1, currentExecutionCtx1);
}
// If exception was thrown by callback, rethrow it now original contexts are restored
edi?.Throw();

Currently it takes second copies of variables so it can enregister them as the EH they pass over otherwise forces everything to stack.

@AndyAyersMS
Copy link
Member

@benaadams can you cross-link the PR/issue where you were working on this? And are there other examples you know of?

@benaadams
Copy link
Member

benaadams commented Aug 9, 2019

@benaadams can you cross-link the PR/issue where you were working on this?

This was the original change #15629 that moved from using the ExecutionContextSwitcher struct to locals and introduced locals that crossed the EH (on stack) and locals to read from them to allow enregistering as part of the feedback and analysis of asm.

#20628 introduced a generic copy of ExecutionContext.RunInternal for for async iterators

I looked to simplify it in #21908 by moving the try/catch block into a different method, so the main body didn't need so many locals; but it wasn't ideal.

And are there other examples you know of?

The two ExecutionContext.RunInternal methods:

RunInternal(ExecutionContext? executionContext, ContextCallback callback, object? state)
RunInternal<TState>(ExecutionContext? executionContext, ContextCallback<TState> callback, ref TState state)

Also every async method, both from
AsyncMethodBuilderCore.Start<TStateMachine>(ref TStateMachine stateMachine) which has the same mitigation pattern, and I think most of the statemachines would benefit due to the liberal use of try/catch in them by the C# compiler.

Those are the 3 methods which have mitigations that I'm aware of. They make the code quite messy; and since it always currently has to spill to stack; the number of reads/uses has to out number the writes to be of benefit. However, these functions are both ubiquitously used and also in most async and ThreadPool hot paths that it made it worth it.

It would be great to simplify the code to just use a single local for each variable rather than 3 (pre-EH, cross-EH, post-EH); and the Jit could potentially generate better asm from it?

@CarolEidt
Copy link
Author

@benaadams - thanks for the pointers! Once I've got this actually working across the board I'll have a look at those.

the number of reads/uses has to out number the writes to be of benefit

Yes, that's true even within the register allocator. I've added some heuristics, but I suspect they'll need further tuning.

@CarolEidt
Copy link
Author

Initial results aren't 100% promising. Although there's an overall code-size reduction (x64 pmi diffs of frameworks and benchmarks), the results are definitely mixed:

Total bytes of diff: -76154 (-0.18% of base)
9239 total methods with size differences (4316 improved, 4923 regressed), 243271 unchanged.

Looking at System.Threading.ExecutionContext:RunInternal and System.Runtime.CompilerServices.AsyncMethodBuilderCore:Start they suffer from excessive use of callee-save regs, resulting in larger code even with the mitigations removed.

There are several factors at play.

  • The benefit of a write-thru local having a callee-save register is half that of a normal local, since it never has to be saved, only restored.
  • Methods with EH are more likely to have multiple prologs and epilogs, exacerbating the impact of saves & restores.
  • The funclet saves & restores the same set of callee-save registers, even if they are not all used in the funclet.

So I'll plan to investigate whether I can come up with better heuristics to deal with the first issue at least.

@benaadams
Copy link
Member

I think removing the mitigations would look like: c9d005a

@erozenfeld
Copy link
Member

@dotnet/jit-contrib

@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:

  1. In your coreclr repository clone, create patch by running git format-patch origin
  2. In your runtime repository clone, apply the patch by running git apply --directory src/coreclr <path to the patch created in step 1>

@BruceForstall BruceForstall added the post-consolidation PRs which will be hand ported to dotnet/runtime label Nov 7, 2019
Mark EH variables (those that are live in or out of exception regions) only as lvLiveInOutOfHndlr, not necessarily lvDoNotEnregister
During register allocation, mark these as write-thru, and mark all defs as write-thru, ensuring that the stack value is always valid.
Mark those defs with GTF_SPILLED (this the "reload" flag and is not currently used for pure defs) to indicate that it should be kept in the register.
Mark blocks that enter EH regions as having no predecessor, and set the location of all live-in vars to be on the stack.
Change genFnPrologCalleeRegArgs to store EH vars also to the stack if they have a register assignment.
Tune callee-save heuristics for EH vars.
Remove/modify mitigations
… they always go to memory.

- Never split a block if the only resolution moves are for EH vars.
- Add a test case to enable diff tracking for #21973
Modify weight for EH vars in LIR
Take into account how many callee save regs there are
@maryamariyan
Copy link
Member

Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen post-consolidation PRs which will be hand ported to dotnet/runtime
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants