-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow Timers to resume async state machines directly without requiring a Task #48875
Comments
Tagging subscribers to this area: Issue DetailsDescriptionToday we allocate a while (someCondition)
{
await DoHealthCheckPingAsync();
await Task.Delay(50);
} DataToday this allocates tasks on every iteration (just showing the timer related allocations here): AnalysisThe public class ValueTask
{
+ public ValueTask Delay(TimeSpan delay);
} PS: I understand exposing another method and potentially type for optimization purposes can be confusing.
|
Does this allocation really matter in practice? The delay itself is inherently throttling the allocation rate. Not disagreeing that it could be done, rather questioning whether it's really worth the extra surface area and, as you say, confusion. You previously opened an issue about an IAsyncEnumerable-based timer. For a recurring timer (which presumably is the reason one might care about the extra allocation) that would have a much lower amortized allocation overhead. |
If you have a pool of timers, you can achieve an even lower amortized allocation rate than creating a new TimerQueueTimer on each use, and that's something that can be done by anyone as well. |
I was thinking that we should reuse TimerQueueTimer instances as well (where possible), that was going to be my next issue 😄 |
In Task.Delay? I doubt it's even close to be worth the cost of pooling. Prove me wrong with real-app numbers and we could certainly implement it. :-) |
I don't know. I actually think the additional diagnostics benefit is nice (though that can be solved in other ways).
Right, and that is a better solution to a recurring timer. I figured I'd bring this up anyways just because I was looking at timer leaks of the form: await Task.WhenAny(SomeAsyncOperation(), Timer.Delay(delay)); |
I hope WaitAsync will make 99.9% of such usage go away eventuality. |
It would help to pair it with an analyzer. I wouldn't die on a hill for this and its hard to change the past but I wanted to call out the inefficiency.
I'll look around. Allocations are indeed throttled by the delay itself is true for the pattern I showed but if requests aren't throttled you can allocate lots of timers, and tasks 😄. Just imagine, with value task pooling turned on, the state machine itself would be pooled and the task would go away. Then you stash store a reusable timer on the AsyncStateMachineBox if the state machine ever waits on the timer and reuse it for future delays. Magic.... |
That's what #33805 (comment) tracks.
I think you'd find various diagnostics actually get worse, at least until every tool is revved to understand the new thing. One of the great things about every async operation mapping to a Task is it provides a single type everything can gravitate to. That includes APIs and tools. ValueTask provides another such thing for the API side of the equation, but tools, not so much. Every IValueTaskSource implementation then needs to be recognized and parsed in a unique way, making them largely opaque to all diagnostic tools today. You also lose some of the implicit tracing Tasks provide, impacting profilers that try to stitch together async flows. It's "just work", but a non-trivial amount. |
Ah, well. I tried 😉 . |
There's something here though.. |
Can you elaborate on the idea? How would you do this without making every yielding await more expensive? |
Essentially the AsyncStateMachineBox gets an extra Timer field that get used like this in the new DelayAwaiter: interface IAsyncStateMachineBox
{
+ TimerQueueTimer Timer { get; }
} public struct DelayAwaiter : ICriticalNotifyCompletion, IStateMachineBoxAwareAwaiter
{
private readonly TimeSpan _ts;
public DelayAwaiter(TimeSpan ts)
{
_ts = ts;
}
public DelayAwaiter GetAwaiter() => this;
public bool IsCompleted => false;
public void GetResult() { }
public void AwaitUnsafeOnCompleted(IAsyncStateMachineBox box)
{
box.Timer.Initialize(box, _ts, _ts);
}
public void OnCompleted(Action continuation) { ... }
public void UnsafeOnCompleted(Action continuation){ ... }
}
}
public static class Timer2
{
public static DelayAwaiter Delay(TimeSpan delay) => new DelayAwaiter(delay);
} This does change the timing a little as the timer isn't started by until AwaitUnsafeOnCompleted is called but lets pretend that's OK. Now the extra field on This is the field version: private class AsyncStateMachineBox<TStateMachine> : // SOS DumpAsync command depends on this name
Task<TResult>, IAsyncStateMachineBox
where TStateMachine : IAsyncStateMachine
{
+ TimerQueueTimer { get; set; }
} The compiler optimized, TimerQueueTimer per state machine version. interface IAsyncStateMachineState // In compiler services
{
object State { get; set; }
}
private class AsyncStateMachineBox<TStateMachine> : // SOS DumpAsync command depends on this name
Task<TResult>, IAsyncStateMachineBox
where TStateMachine : IAsyncStateMachine
{
TimerQueueTimer Timer {
get
{
var stateHolder = (IAsyncStateMachineState )TStateMachine;
return (TimerQueueTimer)stateHolder.State ??= new TimerQueueTimer(); // Empty ctor will be set by caller
}
}
} I think this would work but open to hearing why this is 🍌s |
This isn't a metric of course but look at how many frames we could remove 😄 WebApplication40.dll!WebApplication40.Startup.Configure.AnonymousMethod__1_1(Microsoft.AspNetCore.Http.HttpContext context) Line 142 C#
[Resuming Async Method]
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.__Canon>.ExecutionContextCallback(object s) Unknown
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<WebApplication40.Startup.MoveNext(System.Threading.Thread threadPoolThread) Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.__Canon>.MoveNext() Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining) Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.RunContinuations(object continuationObject) Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.FinishContinuations() Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.TrySetResult() Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise.CompleteTimedOut() Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise..ctor.AnonymousMethod__1_0(object state) Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.CallCallback(bool isThreadPool) Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.Fire(bool isThreadPool) Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueue.FireNextTimers() Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueue.AppDomainTimerCallback(int id) Unknown
[Native to Managed Transition]
[Managed to Native Transition]
System.Private.CoreLib.dll!System.Threading.UnmanagedThreadPoolWorkItem.System.Threading.IThreadPoolWorkItem.Execute() Unknown
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Unknown
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() Unknown
System.Private.CoreLib.dll!System.Threading.Thread.StartCallback() Unknown |
Correct 😉
Right, that's not a good tradeoff. Only a small minority of async methods
You could probably make it work, but:
It's a fine thought experiment, but I don't really know what problem this is trying to solve. Sounds like your goal is to avoid allocating a new timer every time |
I'm not trying to solve a specific problem. I just think that the task overhead that we have is unnecessary. Many advances have been make to async/await and the machinery behind it in .NET Core and I'm looking for opportunities to optimize. Task.Yield didn't need optimizing either but we did it (it's not apples to apples but the idea is the same). Also I'm inspired by how go structures primitives that go routines can wait on; those primitives basically hold onto a go-routine and resume them. I think we're much closer to this since the refactoring (I know I'm splitting hairs though). I like how the bcl sometimes cut through layers when they are unnecessary while still exposing the public API for normal use cases (for example timers don't capture the execution context when used in a Task.Delay). This is in that category of optimization. Anyways that's a long winded way of saying there are lots of interesting opportunities like this that we should have these discussions about 😁. |
If this were just about optimizing an existing implementation, that'd be different; at that point it becomes a conversation purely about the work involved and the technical tradeoffs. But this issue is talking about exposing new public API; that's not the same category.
Great. 😄 I just ask that we focus on cases we know to actual need optimizing because they're actually impactful, in particular when proposing new public API. |
Description
Today we allocate a
Task
(DelayPromise) and aTimerQueueTimer
wheneverTask.Delay
is called. This is fine for most cases, but we can avoid theTask
allocation all together when used in the very common case of pausing and resuming state machines following a very similar pattern to Task.Yield().Data
Today this allocates tasks on every iteration (just showing the timer related allocations here):
Analysis
The
TimerQueueTimer
type should be able to hold a reference to theIAsyncStateMachineBox
directly instead of going through delegate invocations. Then we can use a custom awaiter or a ValueTask backed by IValueTaskSource to implement the new API. To avoid the extra object reference in theTimerQueueTimer
, we could stash theIAsyncStateMachineBox
reference in the current_state
field.public class ValueTask { + public ValueTask Delay(TimeSpan delay); }
Bonus: This direct link between the timer and the state machine would make it a little easier to find the code that is waiting on the timer to fire (I know it's possible today its just more painful using dumpdelegate)
PS: I understand exposing another method and potentially type for optimization purposes can be confusing.
Also PS: This isn't possible to implement today outside of the core libraries.
The text was updated successfully, but these errors were encountered: