Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow Timers to resume async state machines directly without requiring a Task #48875

Closed
davidfowl opened this issue Feb 27, 2021 · 17 comments
Closed
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Threading.Tasks tenet-performance Performance related issue

Comments

@davidfowl
Copy link
Member

davidfowl commented Feb 27, 2021

Description

Today we allocate a Task (DelayPromise) and a TimerQueueTimer whenever Task.Delay is called. This is fine for most cases, but we can avoid the Task allocation all together when used in the very common case of pausing and resuming state machines following a very similar pattern to Task.Yield().

while (someCondition)
{
   await DoHealthCheckPingAsync();
   await Task.Delay(50);
}

Data

Today this allocates tasks on every iteration (just showing the timer related allocations here):

image

Analysis

The TimerQueueTimer type should be able to hold a reference to the IAsyncStateMachineBox directly instead of going through delegate invocations. Then we can use a custom awaiter or a ValueTask backed by IValueTaskSource to implement the new API. To avoid the extra object reference in the TimerQueueTimer, we could stash the IAsyncStateMachineBox reference in the current _state field.

public class ValueTask
{
+   public ValueTask Delay(TimeSpan delay);
}

Bonus: This direct link between the timer and the state machine would make it a little easier to find the code that is waiting on the timer to fire (I know it's possible today its just more painful using dumpdelegate)

PS: I understand exposing another method and potentially type for optimization purposes can be confusing.

Also PS: This isn't possible to implement today outside of the core libraries.

@davidfowl davidfowl added api-suggestion Early API idea and discussion, it is NOT ready for implementation tenet-performance Performance related issue labels Feb 27, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Threading.Tasks untriaged New issue has not been triaged by the area owner labels Feb 27, 2021
@ghost
Copy link

ghost commented Feb 27, 2021

Tagging subscribers to this area:
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Today we allocate a Task (DelayPromise) and a TimerQueueTimer whenever Task.Delay is called. This is fine for most cases, but we can avoid the Task allocation all together when used in the very common case of pausing and resuming state machines following a very similar pattern to Task.Yield().

while (someCondition)
{
   await DoHealthCheckPingAsync();
   await Task.Delay(50);
}

Data

Today this allocates tasks on every iteration (just showing the timer related allocations here):

image

Analysis

The TimerQueueTimer type should be able to hold a reference to the IAsyncStateMachineBox directly instead of going through delegate invocations. Then we can use a custom awaiter or a ValueTask backed by IValueTaskSource to implement the new API. To avoid the extra object reference in the TimerQueueTimer, we could stash the IAsyncStateMachineBox reference in the current _state field.

public class ValueTask
{
+   public ValueTask Delay(TimeSpan delay);
}

PS: I understand exposing another method and potentially type for optimization purposes can be confusing.

Author: davidfowl
Assignees: -
Labels:

api-suggestion, area-System.Threading.Tasks, tenet-performance, untriaged

Milestone: -

@stephentoub
Copy link
Member

stephentoub commented Feb 27, 2021

Does this allocation really matter in practice? The delay itself is inherently throttling the allocation rate.

Not disagreeing that it could be done, rather questioning whether it's really worth the extra surface area and, as you say, confusion.

You previously opened an issue about an IAsyncEnumerable-based timer. For a recurring timer (which presumably is the reason one might care about the extra allocation) that would have a much lower amortized allocation overhead.

@stephentoub
Copy link
Member

Also PS: This isn't possible to implement today outside of the core libraries.

If you have a pool of timers, you can achieve an even lower amortized allocation rate than creating a new TimerQueueTimer on each use, and that's something that can be done by anyone as well.

@davidfowl
Copy link
Member Author

I was thinking that we should reuse TimerQueueTimer instances as well (where possible), that was going to be my next issue 😄

@stephentoub
Copy link
Member

In Task.Delay? I doubt it's even close to be worth the cost of pooling. Prove me wrong with real-app numbers and we could certainly implement it. :-)

@davidfowl
Copy link
Member Author

Does this allocation really matter in practice? The delay itself is inherently throttling the allocation rate.

I don't know. I actually think the additional diagnostics benefit is nice (though that can be solved in other ways).

You previously opened an issue about an IAsyncEnumerable-based timer. For a recurring timer (which presumably is the reason one might care about the extra allocation) that would have a much lower amortized allocation overhead.

Right, and that is a better solution to a recurring timer. I figured I'd bring this up anyways just because I was looking at timer leaks of the form:

await Task.WhenAny(SomeAsyncOperation(), Timer.Delay(delay));

@stephentoub
Copy link
Member

I was looking at timer leaks of the form

I hope WaitAsync will make 99.9% of such usage go away eventuality.

@davidfowl
Copy link
Member Author

I hope WaitAsync will make 99.9% of such usage go away eventuality.

It would help to pair it with an analyzer.

I wouldn't die on a hill for this and its hard to change the past but I wanted to call out the inefficiency.

In Task.Delay? I doubt it's even close to be worth the cost of pooling. Prove me wrong with real-app numbers and we could certainly implement it. :-)

I'll look around. Allocations are indeed throttled by the delay itself is true for the pattern I showed but if requests aren't throttled you can allocate lots of timers, and tasks 😄. Just imagine, with value task pooling turned on, the state machine itself would be pooled and the task would go away. Then you stash store a reusable timer on the AsyncStateMachineBox if the state machine ever waits on the timer and reuse it for future delays.

Magic....

@stephentoub
Copy link
Member

It would help to pair it with an analyzer.

That's what #33805 (comment) tracks.

I actually think the additional diagnostics benefit is nice

I think you'd find various diagnostics actually get worse, at least until every tool is revved to understand the new thing. One of the great things about every async operation mapping to a Task is it provides a single type everything can gravitate to. That includes APIs and tools. ValueTask provides another such thing for the API side of the equation, but tools, not so much. Every IValueTaskSource implementation then needs to be recognized and parsed in a unique way, making them largely opaque to all diagnostic tools today. You also lose some of the implicit tracing Tasks provide, impacting profilers that try to stitch together async flows. It's "just work", but a non-trivial amount.

@davidfowl
Copy link
Member Author

Ah, well. I tried 😉 .

@davidfowl
Copy link
Member Author

Then you stash store a reusable timer on the AsyncStateMachineBox if the state machine ever waits on the timer and reuse it for future delays.

There's something here though..

@stephentoub
Copy link
Member

There's something here though..

Can you elaborate on the idea? How would you do this without making every yielding await more expensive?

@davidfowl
Copy link
Member Author

davidfowl commented Feb 27, 2021

Essentially the AsyncStateMachineBox gets an extra Timer field that get used like this in the new DelayAwaiter:

interface IAsyncStateMachineBox
{
+   TimerQueueTimer Timer { get; }
}
public struct DelayAwaiter : ICriticalNotifyCompletion, IStateMachineBoxAwareAwaiter
{
      private readonly TimeSpan _ts;
      public DelayAwaiter(TimeSpan ts)
      {
          _ts = ts;
      }
      
      public DelayAwaiter GetAwaiter() => this;
      public bool IsCompleted => false;
      public void GetResult() { }

      public void AwaitUnsafeOnCompleted(IAsyncStateMachineBox box)
      {
          box.Timer.Initialize(box, _ts, _ts);
      }

      public void OnCompleted(Action continuation) { ... }
      public void UnsafeOnCompleted(Action continuation){ ... }
  }
}

public static class Timer2
{
    public static DelayAwaiter Delay(TimeSpan delay) => new DelayAwaiter(delay);
}

This does change the timing a little as the timer isn't started by until AwaitUnsafeOnCompleted is called but lets pretend that's OK. Now the extra field on AsyncStateMachineBox<T> would mean all boxed state machines pay this cost. My only other crazy idea is to avoid this is to have the compiler create a TimerQueueTimer field and property (likely backed by some interface) on the state machine itself. This could be further decoupled by letting the compiler generate a read/write object property that could be used as storage for the TimerQueueTimer. That way it wouldn't need to be a public contract.

This is the field version:

private class AsyncStateMachineBox<TStateMachine> : // SOS DumpAsync command depends on this name
            Task<TResult>, IAsyncStateMachineBox
            where TStateMachine : IAsyncStateMachine
{
+    TimerQueueTimer { get; set; } 
}

The compiler optimized, TimerQueueTimer per state machine version.

interface IAsyncStateMachineState // In compiler services
{
    object State { get; set; }
}

private class AsyncStateMachineBox<TStateMachine> : // SOS DumpAsync command depends on this name
            Task<TResult>, IAsyncStateMachineBox
            where TStateMachine : IAsyncStateMachine
{
    TimerQueueTimer Timer { 
        get 
        { 
            var stateHolder = (IAsyncStateMachineState )TStateMachine;
            return (TimerQueueTimer)stateHolder.State ??= new TimerQueueTimer(); // Empty ctor will be set by caller
        }
    }
}

I think this would work but open to hearing why this is 🍌s

@davidfowl
Copy link
Member Author

This isn't a metric of course but look at how many frames we could remove 😄

WebApplication40.dll!WebApplication40.Startup.Configure.AnonymousMethod__1_1(Microsoft.AspNetCore.Http.HttpContext context) Line 142	C#
[Resuming Async Method]	
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.__Canon>.ExecutionContextCallback(object s)	Unknown
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)	Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<WebApplication40.Startup.MoveNext(System.Threading.Thread threadPoolThread)	Unknown
System.Private.CoreLib.dll!System.Runtime.CompilerServices.AsyncTaskMethodBuilder<System.Threading.Tasks.VoidTaskResult>.AsyncStateMachineBox<System.__Canon>.MoveNext()	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Action action, bool allowInlining)	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.RunContinuations(object continuationObject)	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.FinishContinuations()	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.TrySetResult()	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise.CompleteTimedOut()	Unknown
-System.Private.CoreLib.dll!System.Threading.Tasks.Task.DelayPromise..ctor.AnonymousMethod__1_0(object state)	Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.CallCallback(bool isThreadPool)	Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueueTimer.Fire(bool isThreadPool)	Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueue.FireNextTimers()	Unknown
System.Private.CoreLib.dll!System.Threading.TimerQueue.AppDomainTimerCallback(int id)	Unknown
[Native to Managed Transition]	
[Managed to Native Transition]	
System.Private.CoreLib.dll!System.Threading.UnmanagedThreadPoolWorkItem.System.Threading.IThreadPoolWorkItem.Execute()	Unknown
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch()	Unknown
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()	Unknown
System.Private.CoreLib.dll!System.Threading.Thread.StartCallback()	Unknown

@stephentoub
Copy link
Member

This isn't a metric of course

Correct 😉

Now the extra field on AsyncStateMachineBox would mean all boxed state machines pay this cost.

Right, that's not a good tradeoff. Only a small minority of async methods await Task.Delay(...), but every one would pay this cost.

I think this would work but open to hearing why this is 🍌s

You could probably make it work, but:

  • It'd require the compiler to have first-class knowledge of this particular API to know that an additional field/property/interface implementation need be injected.
  • Every task-like builder would need to special-case this awaiter type to provide a custom implementation rather than using the awaiter's implementation.
  • The awaiter would still need an implementation not relying on that field to support other task-like builders.
  • In the general case the compiler would not be able to prove if or how many times the await Whatever.Delay(...) was used, so it could end up increasing the size of the state machine for no good reason, and it would only end up being a win in terms of allocation if it were hit more than once.

It's a fine thought experiment, but I don't really know what problem this is trying to solve. Sounds like your goal is to avoid allocating a new timer every time await Whatever.Delay(...) is hit inside of a loop in an async method; even if that proved valuable (and, again, I'm skeptical), there are way simpler ways to achieve that.

@davidfowl
Copy link
Member Author

I'm not trying to solve a specific problem. I just think that the task overhead that we have is unnecessary. Many advances have been make to async/await and the machinery behind it in .NET Core and I'm looking for opportunities to optimize.

Task.Yield didn't need optimizing either but we did it (it's not apples to apples but the idea is the same). Also I'm inspired by how go structures primitives that go routines can wait on; those primitives basically hold onto a go-routine and resume them. I think we're much closer to this since the refactoring (I know I'm splitting hairs though).

I like how the bcl sometimes cut through layers when they are unnecessary while still exposing the public API for normal use cases (for example timers don't capture the execution context when used in a Task.Delay). This is in that category of optimization.

Anyways that's a long winded way of saying there are lots of interesting opportunities like this that we should have these discussions about 😁.

@stephentoub
Copy link
Member

This is in that category of optimization.

If this were just about optimizing an existing implementation, that'd be different; at that point it becomes a conversation purely about the work involved and the technical tradeoffs. But this issue is talking about exposing new public API; that's not the same category.

I'm looking for opportunities to optimize.

Great. 😄 I just ask that we focus on cases we know to actual need optimizing because they're actually impactful, in particular when proposing new public API.

@ghost ghost locked as resolved and limited conversation to collaborators Mar 30, 2021
@tannergooding tannergooding removed the untriaged New issue has not been triaged by the area owner label Jun 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Threading.Tasks tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

3 participants