rt: Tolerate slow task polls #6315

Open
carllerche opened this issue Jan 30, 2024 · 7 comments
Labels
A-tokio (Area: The main tokio crate), C-feature-request (Category: A feature request), M-runtime (Module: tokio/runtime)

Comments

@carllerche
Member

carllerche commented Jan 30, 2024

Currently, Tokio relies on well-behaved application behavior, including tasks completing their polls in a reasonable time. When user tasks are not well-behaved, the application can suffer high latencies due to the Tokio scheduler being blocked by the user task and unable to process other tasks.

Fundamentally, because Tokio is an asynchronous task scheduler, it requires well-behaved user tasks. Any attempt to mitigate poorly behaved tasks will most likely be less effective than an OS's thread scheduler. That said, it is still valuable to be able to detect and mitigate these poorly behaved tasks.
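For concreteness, here is a minimal sketch of the kind of task that is not well-behaved in this sense (std::thread::sleep stands in for any CPU-bound or otherwise blocking work; this is an illustration, not code from the issue):

```rust
use std::time::Duration;

#[tokio::main]
async fn main() {
    // A poorly behaved task: this single call to poll runs for 500ms,
    // during which the worker thread that polls it cannot run other
    // tasks or drive IO/timer events.
    let bad = tokio::spawn(async {
        // Stand-in for CPU-bound or blocking work done directly in async code.
        std::thread::sleep(Duration::from_millis(500));
    });

    // Other tasks whose wake-ups land on the same worker may be delayed
    // until the poll above returns.
    bad.await.unwrap();
}
```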

I don't know what the best strategy for this is. My gut is that it will be a background process, either in a dedicated background thread or handled by idle workers, that checks that active workers are making consistent progress.

Any solution cannot impact the existing performance of well-behaved apps, so if the best path forward comes with additional overhead, it needs to be opt-in.

Moving this issue forward will require the following:

  1. A survey of other async systems to determine prior art.
  2. A report of our options with an analysis of pros/cons, including performance impact.
  3. A decision on which avenue to take.

Refs: #4730

@carllerche carllerche added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels Jan 30, 2024
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Jan 30, 2024
@exabytes18

Currently, Tokio relies on well-behaved application behavior, including tasks completing their polls in a reasonable time.

It may be worth trying to further define what "well-behaved application behavior" actually is, as it's probably not immediately obvious to some developers (or me). I don't know how long tasks are reasonably allowed to execute, or what set of circumstances may lead to degraded performance.

Anecdotally, I come from working on JVM-based web services and have some implicit fear of GC. For the sake of argument, let's say a typical GC pause is around 20ms. Thus, I'd draw the line at something like a task taking 20ms to complete. Does this mean that, due to the pathological nature of the scheduler, accidentally running a task that takes 20ms could result in my Tokio/Rust application pausing for longer than the GC pauses we typically see in the JVM world?

  • Depending on your standpoint, maybe 20ms is too much or maybe it's too little. It's hard to say. At what point do you offload that work to a background thread pool / spawn_blocking / whatever?
  • How savvy are developers at knowing how long all tasks take to complete?
    • One case we saw of this was decompressing messages coming off the wire from Kafka. Decompressing a chunk of bytes seems relatively safe on the surface (no IO, anyway), but as a Tokio user you now need to know that doing so in an async task is not "well-behaved application behavior", since, depending on how large those messages are, it can consume more CPU time than you might realize (see the sketch after this list).
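For the Kafka decompression case above, the usual fix is to hand the CPU-heavy step to the blocking pool. A minimal sketch, assuming a hypothetical `decompress` function standing in for whatever codec is actually used:

```rust
use tokio::task;

// Hypothetical CPU-bound decompression routine; a real implementation
// would decode e.g. a zstd- or gzip-compressed Kafka message batch.
fn decompress(compressed: Vec<u8>) -> Vec<u8> {
    compressed
}

async fn handle_batch(compressed: Vec<u8>) -> Vec<u8> {
    // Run the CPU-heavy work on Tokio's blocking thread pool so the async
    // worker stays free to poll other tasks and drive IO.
    task::spawn_blocking(move || decompress(compressed))
        .await
        .expect("decompression task panicked")
}
```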

Last thought for now: the problem is very apparent if you run a very lengthy task (something that takes 1000ms), as demonstrated in past GitHub issues. However, is the problem still present, just at a more difficult-to-measure level, with tasks that take 10μs? The point is that if this actually impacts both long and short tasks, then this issue does represent a performance issue for reasonably well-behaved applications. That is, what is the overall system throughput penalty resulting from running a spurious task that takes 1μs, or 10μs, or 100μs, or 1ms, or 10ms, or 100ms, or 1000ms? And what is the throughput penalty of the various fixes proposed in the previous GitHub issues?

@Darksonn
Contributor

Darksonn commented Feb 1, 2024

Yes, 20ms is definitely too long. In my article on blocking, I recommend 10μs to 100μs as an upper limit.
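One common way to stay within a budget like that (a general technique, not something proposed in this thread) is to split long-running work across multiple polls with `tokio::task::yield_now`:

```rust
// Process a large batch in chunks, returning to the scheduler between
// chunks so that no single poll runs for too long.
async fn process_all(items: Vec<u64>) -> u64 {
    let mut sum = 0;
    for (i, item) in items.into_iter().enumerate() {
        sum += item; // stand-in for a small unit of real work
        if i % 1_000 == 999 {
            // Ends the current poll; the task is rescheduled immediately.
            tokio::task::yield_now().await;
        }
    }
    sum
}
```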

In general, there are three ways that long polls can cause degraded performance:

  1. Every single worker thread is blocked by a different task.
  2. There's a single blocking future, and there is a task in the LIFO slot of that thread (because the blocking future woke it up).
  3. There's a single blocking future on the thread that is currently responsible for receiving incoming IO/timer events.

In the first case, there is nothing we can do. The second case is tracked by #4941. This issue tracks the third case. In all other cases, work stealing will kick in and move tasks to a different worker so that they do not have to wait for the blocking future.

The way that this issue gets triggered is as follows:

  1. All threads are idle. Thread A is waiting for IO/timer events.
  2. Thread A receives an IO or timer event and wakes up to handle it.
  3. A single task is woken up by the IO/timer event, and thread A polls it.
  4. During the call to poll, a new IO (or timer) event becomes available.
  5. Thread A does not see the IO event since it's polling a future. The other worker threads don't see it either, since none of them are currently registered to receive IO events.

An important distinction here is that work-stealing is working correctly. There's only a single ready-to-run task on the entire runtime, so there is no work stealing that could happen. The problem is that the tasks will not get marked as ready until the runtime next asks the OS for IO events.
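A hedged reproduction sketch of that sequence (the exact outcome depends on which worker holds the IO/timer driver and on runtime internals, so treat this as illustrative rather than a guaranteed repro):

```rust
use std::time::{Duration, Instant};

#[tokio::main(flavor = "multi_thread", worker_threads = 2)]
async fn main() {
    // Task whose wake-up latency we measure; its timer fires after 10ms.
    let start = Instant::now();
    let victim = tokio::spawn(async move {
        tokio::time::sleep(Duration::from_millis(10)).await;
        start.elapsed()
    });

    // Poorly behaved task: woken by a timer while the runtime is idle,
    // it then blocks inside a single poll.
    tokio::spawn(async {
        tokio::time::sleep(Duration::from_millis(5)).await;
        std::thread::sleep(Duration::from_millis(500)); // blocking poll
    });

    // If the blocking poll happens on the worker currently driving timer
    // events, the 10ms timer is not observed until that poll returns.
    println!("wake-up latency: {:?}", victim.await.unwrap());
}
```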

None of this happens if there are other active workers, because they will take over IO events when they next become idle. This is why you can make your own monitor thread by regularly spawning tasks on the runtime.
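A minimal sketch of such a monitor thread (the interval and function name are arbitrary choices for illustration):

```rust
use std::time::Duration;

// Spawn a plain OS thread that periodically submits a no-op task. Each
// spawn wakes an idle worker, and that worker also checks the IO driver
// after running the task, which bounds the latency described above.
fn start_monitor(handle: tokio::runtime::Handle) {
    std::thread::spawn(move || loop {
        handle.spawn(async {});
        std::thread::sleep(Duration::from_millis(10));
    });
}
```

Called as, e.g., `start_monitor(tokio::runtime::Handle::current())` from inside the runtime, this trades a small amount of constant background work for a bound on how long an IO event can sit unprocessed.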

A silver lining of the above is that the issue can only really happen when all threads are idle and a single incoming IO event arrives, so it will not happen while the runtime is under load.

@CinchBlue

Hey -- I just ended up being affected by #4730 -- in my case, only 2 of the 8 Tokio threads I had available were deadlocked, but the other 6 were not making progress either.

I don't know what the best strategy for this is. My gut is it will be a background process, either in a dedicated background thread or handled by idle workers, that checks that active workers are making consistent progress.

Idea: in addition to having this "monitor thread" doing automatic spawning on the runtime, would it make sense to allow manually signaling the runtime to wake up or shuffle tasks/workers so things can move again when it gets stuck? This wouldn't solve the issue definitively, but it would allow testing whether a currently "not-well-formed" program is only partially deadlocked or entirely deadlocked. I can see something like this being valuable while debugging.

Any solution cannot impact the existing performance of well-behaved apps, so if the best path forward comes with additional overhead, it needs to be opt-in.

If I remember the discussion correctly, there was more than one solution being considered. Eventually, as implementations come about, would they just be gated behind tokio_unstable features?

@inevity
Contributor

inevity commented Feb 16, 2024

Does the current-thread runtime with a LocalSet suffer from issues 2 and 3, @Darksonn?
I think it will suffer from 3, but what about 2?

@Darksonn
Contributor

@inevity The current-thread runtime can only suffer from case 1. After all, if you are blocking one thread, you are blocking all of them. (The current-thread runtime has no LIFO slot.)

would it make sense to allow for manually signaling the runtime to do wake-up/shuffling of tasks/workers to allow movement in this case when it gets stuck?

You can call handle.spawn(async {}) on a handle to the runtime. That will wake a new worker thread, which will then check the IO driver after handling your spawned task.

But I want to clarify that there is no shuffling of tasks that can happen here. It's not like there are a bunch of tasks on those two threads that could run if they were just moved to a different worker thread. Other than your blocking tasks, all other tasks on your runtime are considered idle. It's just that if you polled the IO driver, then that would result in more tasks being woken. But since the IO driver isn't being polled, those tasks are considered idle, and that is why the runtime isn't polling them.

@rcoh
Contributor

rcoh commented Nov 1, 2024

I had a thought on this: what if we had a mode (or a way to mark individual tasks) that treats tasks as poorly behaved?

Each poll would run as if the user had wrapped the code with block_in_place. None of the normal block_in_place risks apply here; we know nothing else is going to make progress on that task anyway while a single poll is in progress.
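For reference, the manual version of this today is wrapping the known-slow section yourself in the existing `tokio::task::block_in_place` (multi-thread runtime only); a minimal sketch with a hypothetical handler:

```rust
use std::time::Duration;

async fn handle_request(payload: Vec<u8>) -> usize {
    // block_in_place tells the runtime this worker is about to block, so
    // its other scheduled work is handed off to another worker before the
    // closure runs; the slow section then can't hold up unrelated tasks.
    tokio::task::block_in_place(|| {
        std::thread::sleep(Duration::from_millis(5)); // stand-in for slow work
        payload.len()
    })
}
```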

This obviously would degrade performance for some applications, but it seems like a potential thing to try for a more tolerant Tokio.

In some experiments, even 200μs polls benefited from this behavior.

@Avi-D-coder

I had a thought on this: what if we had a mode (or a way to mark individual tasks) that treats tasks as poorly behaved?

@rcoh I wrote about something similar on Reddit; my post was not well received, but I maintain that something like this is an important feature Tokio should have.
