rt: Tolerate slow task polls #6315

Open
carllerche opened this issue Jan 30, 2024 · 7 comments
Labels
A-tokio (Area: The main tokio crate), C-feature-request (Category: A feature request), M-runtime (Module: tokio/runtime)

Comments

@carllerche
Member

carllerche commented Jan 30, 2024

Currently, Tokio relies on well-behaved application behavior, including tasks completing their polls in a reasonable time. When user tasks are not well-behaved, the application can suffer high latencies due to the Tokio scheduler being blocked by the user task and unable to process other tasks.

Fundamentally, because Tokio is an asynchronous task scheduler, it requires well-behaved user tasks. Any attempt to mitigate poorly behaved tasks will most likely be less effective than an OS's thread scheduler. That said, it is still valuable to be able to detect and mitigate these poorly behaved tasks.
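For concreteness, here is a minimal sketch of the kind of task that is not well-behaved in this sense (std::thread::sleep stands in for any CPU-bound or otherwise blocking work; this is an illustration, not code from the issue):

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // A poorly behaved task: this single call to poll runs for 500ms,
    // during which the worker thread that polls it cannot run other
    // tasks or drive IO/timer events.
    let bad = tokio::spawn(async {
        // Stand-in for CPU-bound or blocking work done directly in async code.
        std::thread::sleep(Duration::from_millis(500));
    });

    // Other tasks whose wake-ups land on the same worker may be delayed
    // until the poll above returns.
    bad.await.unwrap();
}
```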

I don't know what the best strategy for this is. My gut is that it will be a background process, either in a dedicated background thread or handled by idle workers, that checks that active workers are making consistent progress.

Any solution cannot impact the existing performance of well-behaved apps, so if the best path forward comes with additional overhead, it needs to be opt-in.

Moving this issue forward will require the following:

  1. A survey of other async systems to determine prior art.
  2. A report of our options with an analysis of pros/cons, including performance impact.
  3. A decision on which avenue to take.

Refs: #4730

@carllerche carllerche added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jan 30, 2024
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Jan 30, 2024
@exabytes18

Currently, Tokio relies on well-behaved application behavior, including tasks completing their polls in a reasonable time.

It may be worth trying to further define what "well-behaved application behavior" actually is, as it's probably not immediately obvious to some developers (or me). I don't know how long tasks are reasonably allowed to execute, or what set of circumstances may lead to degraded performance.

Anecdotally, I come from working on JVM-based web services and have some implicit fear of GC. For the sake of argument, let's say a typical GC pause is around 20ms. Thus, I'd draw the line at something like a task taking 20ms to complete. Does this mean that, due to the pathological nature of the scheduler, accidentally running a task that takes 20ms could result in my Tokio/Rust application pausing for longer than the GC pauses we typically see in the JVM world?

  • Depending on your standpoint, maybe 20ms is too much or maybe it's too little. It's hard to say. At what point do you offload that work to a background thread pool / spawn_blocking / whatever?
  • How savvy are developers at knowing how long all tasks take to complete?
    • One case we saw of this was decompressing messages coming off the wire from Kafka. Decompressing a chunk of bytes seems relatively safe on the surface (no IO, anyway), but as a Tokio user you now need to know that doing so in an async task is not "well-behaved application behavior", since, depending on how large those messages are, it can consume more CPU time than you might realize (see the sketch after this list).
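For the Kafka decompression case above, the usual fix is to hand the CPU-heavy step to the blocking pool. A minimal sketch, assuming a hypothetical `decompress` function standing in for whatever codec is actually used:

```rust
use tokio::task;

// Hypothetical CPU-bound decompression routine; a real implementation
// would decode e.g. a zstd- or gzip-compressed Kafka message batch.
fn decompress(compressed: Vec<u8>) -> Vec<u8> {
    compressed
}

async fn handle_batch(compressed: Vec<u8>) -> Vec<u8> {
    // Run the CPU-heavy work on Tokio's blocking thread pool so the async
    // worker stays free to poll other tasks and drive IO.
    task::spawn_blocking(move || decompress(compressed))
        .await
        .expect("decompression task panicked")
}
```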

Last thought for now: the problem is very apparent if you run a very lengthy task (something that takes 1000ms), as demonstrated in past GitHub issues. However, is the problem still present, just at a more difficult-to-measure level, with tasks that take 10μs? The point is that if this actually impacts both long and short tasks, then this issue does represent a performance issue for reasonably well-behaved applications. That is, what is the overall system throughput penalty resulting from running a spurious task that takes 1μs, or 10μs, or 100μs, or 1ms, or 10ms, or 100ms, or 1000ms? And what is the throughput penalty of the various fixes proposed in the previous GitHub issues?

@Darksonn
Contributor

Darksonn commented Feb 1, 2024

Yes, 20ms is definitely too long. In my article on blocking, I recommend 10μs to 100μs as an upper limit.
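One common way to stay within a budget like that (a general technique, not something proposed in this thread) is to split long-running work across multiple polls with `tokio::task::yield_now`:

```rust
// Process a large batch in chunks, returning to the scheduler between
// chunks so that no single poll runs for too long.
async fn process_all(items: Vec<u64>) -> u64 {
    let mut sum = 0;
    for (i, item) in items.into_iter().enumerate() {
        sum += item; // stand-in for a small unit of real work
        if i % 1_000 == 999 {
            // Ends the current poll; the task is rescheduled immediately.
            tokio::task::yield_now().await;
        }
    }
    sum
}
```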

In general, there are three ways that long polls can cause degraded performance:

  1. Every single worker thread is blocked by a different task.
  2. There's a single blocking future, and there is a task in the LIFO slot of that thread (because the blocking future woke it up).
  3. There's a single blocking future on the thread that is currently responsible for receiving incoming IO/timer events.

In the first case, there is nothing we can do. The second case is tracked by #4941. This issue tracks the third case. In all other cases, work stealing will kick in and move tasks to a different worker so that they do not have to wait for the blocking future.

The way that this issue gets triggered is as follows:

  1. All threads are idle. Thread A is waiting for IO/timer events.
  2. Thread A receives an IO or timer event and wakes up to handle it.
  3. A single task is woken up by the IO/timer event, and thread A polls it.
  4. During the call to poll, a new IO (or timer) event becomes available.
  5. Thread A does not see the IO event since it's polling a future. The other worker threads don't see it either, since none of them are currently registered to receive IO events.

An important distinction here is that work-stealing is working correctly. There's only a single ready-to-run task on the entire runtime, so there is no work stealing that could happen. The problem is that the tasks will not get marked as ready until the runtime next asks the OS for IO events.
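A hedged reproduction sketch of that sequence (the exact outcome depends on which worker holds the IO/timer driver and on runtime internals, so treat this as illustrative rather than a guaranteed repro):

```rust
use std::time::{Duration, Instant};

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    // Task whose wake-up latency we measure; its timer fires after 10ms.
    let start = Instant::now();
    let victim = tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(10)).await;
        start.elapsed()
    });

    // Poorly behaved task: woken by a timer while the runtime is idle,
    // it then blocks inside a single poll.
    tokio::spawn(async {
        tokio::time::sleep(Duration::from_millis(5)).await;
        std::thread::sleep(Duration::from_millis(500)); // blocking poll
    });

    // If the blocking poll happens on the worker currently driving timer
    // events, the 10ms timer is not observed until that poll returns.
    println!("wake-up latency: {:?}", victim.await.unwrap());
}
```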

None of this happens if there are other active workers, because they will take over IO events when they next become idle. This is why you can make your own monitor thread by regularly spawning tasks on the runtime.
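A minimal sketch of such a monitor thread (the interval and function name are arbitrary choices for illustration):

```rust
use std::time::Duration;

// Spawn a plain OS thread that periodically submits a no-op task. Each
// spawn wakes an idle worker, and that worker also checks the IO driver
// after running the task, which bounds the latency described above.
fn start_monitor(handle: tokio::runtime::Handle) {
    std::thread::spawn(move || loop {
        handle.spawn(async {});
        std::thread::sleep(Duration::from_millis(10));
    });
}
```

Called as, e.g., `start_monitor(tokio::runtime::Handle::current())` from inside the runtime, this trades a small amount of constant background work for a bound on how long an IO event can sit unprocessed.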

A silver lining of the above is that the issue can only really happen when all threads are idle and a single incoming IO event arrives, so it will not happen while the runtime is under load.

@CinchBlue

Hey -- I just ended up being affected by #4730 -- in my case, only 2 of the 8 Tokio threads I had available were deadlocked, but the other 6 were not making progress either.

I don't know what the best strategy for this is. My gut is it will be a background process, either in a dedicated background thread or handled by idle workers, that checks that active workers are making consistent progress.

Idea: in addition to having this "monitor thread" doing automatic spawning on the runtime, would it make sense to allow manually signaling the runtime to wake up or shuffle tasks/workers so things can move again when it gets stuck? This wouldn't solve the issue definitively, but it would allow testing whether a currently "not-well-formed" program is only partially deadlocked or entirely deadlocked. I can see something like this being valuable while debugging.

Any solution cannot impact the existing performance of well-behaved apps, so if the best path forward comes with additional overhead, it needs to be opt-in.

If I remember the discussion correctly, there was more than one solution being considered. Eventually, as implementations come about, would they just be gated behind tokio_unstable features?

@inevity
Contributor

inevity commented Feb 16, 2024

Does the current-thread runtime with a LocalSet suffer from issues 2 and 3, @Darksonn?
I think it will suffer from 3, but what about 2?

@Darksonn
Contributor

@inevity The current-thread runtime can only suffer from case 1. After all, if you are blocking one thread, you are blocking all of them. (The current-thread runtime has no LIFO slot.)

would it make sense to allow for manually signaling the runtime to do wake-up/shuffling of tasks/workers to allow movement in this case when it gets stuck?

You can call handle.spawn(async {}) on a handle to the runtime. That will wake a new worker thread, which will then check the IO driver after handling your spawned task.

But I want to clarify that there is no shuffling of tasks that can happen here. It's not like there are a bunch of tasks on those two threads that could run if they were just moved to a different worker thread. Other than your blocking tasks, all other tasks on your runtime are considered idle. It's just that if you polled the IO driver, then that would result in more tasks being woken. But since the IO driver isn't being polled, those tasks are considered idle, and that is why the runtime isn't polling them.

@rcoh
Contributor

rcoh commented Nov 1, 2024

I had a thought on this: what if we had a mode (or a way to mark individual tasks) that treats tasks as poorly behaved?

Each poll would run as if the user had wrapped the code with block_in_place. None of the normal block_in_place risks apply here; we know nothing else is going to make progress on that task anyway while a single poll is in progress.
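For reference, the manual version of this today is wrapping the known-slow section yourself in the existing `tokio::task::block_in_place` (multi-thread runtime only); a minimal sketch with a hypothetical handler:

```rust
use std::time::Duration;

async fn handle_request(payload: Vec<u8>) -> usize {
    // block_in_place tells the runtime this worker is about to block, so
    // its other scheduled work is handed off to another worker before the
    // closure runs; the slow section then can't hold up unrelated tasks.
    tokio::task::block_in_place(|| {
        std::thread::sleep(Duration::from_millis(5)); // stand-in for slow work
        payload.len()
    })
}
```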

This obviously would degrade performance for some applications, but it seems like a potential thing to try for a more tolerant Tokio.

In some experiments, even 200μs polls benefited from this behavior.

@Avi-D-coder

I had a thought on this: what if we had a mode (or a way to mark individual tasks) that treats tasks as poorly behaved?

@rcoh I wrote about something similar on Reddit; my post was not well received, but I maintain that something like this is an important feature Tokio should have.
