
LIFO task slot optimization in worker is potential footgun when a task doesn't yield. #4323

Closed
tobz opened this issue Dec 15, 2021 · 1 comment
Labels
A-tokio Area: The main tokio crate C-bug Category: This is a bug. M-runtime Module: tokio/runtime



tobz commented Dec 15, 2021

Version
tokio 1.14.0

Platform
Linux derp 5.11.0-31-generic #33-Ubuntu SMP Wed Aug 11 13:19:04 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Description
In a small application used to benchmark the performance of some code that dealt with writing to and reading from an external data source, I encountered a particularly mysterious issue related to how Tokio handles the worker scheduler optimization of storing the "next task to poll" in a slot that is not considered by the normal work-stealing algorithm.

Essentially[1], the code has a #[tokio::main] annotated async fn main(), which then spawns two tasks -- one for the reader, and one for the writer -- and runs them until both complete. The complication is that the writer, as written, has no need to yield: its work happens off-thread, so it is wrapped in an asynchronous interface (Sink), but it never does anything that would cause it to yield, nor does it yield manually.

The problem arose because the writer task was on the same worker as the reader task and actively notified the reader of progress (via AtomicWaker). This led to a situation where the worker was holding on to the reader task in its "next task to poll" slot, which, as I've been led to understand, is not considered when the normal work-stealing algorithm runs.
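To make the mechanism concrete, here is a deliberately simplified, single-worker model written for illustration. The `Worker` type and its methods are hypothetical and are not Tokio's actual scheduler code. The model assumes that a task woken by the currently running task goes into a LIFO slot (bumping any previous occupant into the local run queue), that idle workers can only steal from the run queue, and that a yielding task is pushed to the back of the run queue rather than into the LIFO slot.

```rust
use std::collections::VecDeque;

/// Hypothetical, heavily simplified model of one worker's local state.
/// This is NOT Tokio's implementation; it only mirrors the behaviour
/// described in this issue.
#[derive(Default)]
pub struct Worker {
    /// Task most recently woken by the task currently running here.
    pub lifo_slot: Option<&'static str>,
    /// Local run queue: the only part other workers can steal from.
    pub run_queue: VecDeque<&'static str>,
}

impl Worker {
    /// The running task wakes `task` (e.g. via an `AtomicWaker`).
    pub fn wake_from_running_task(&mut self, task: &'static str) {
        // A task that is already scheduled is not scheduled twice.
        if self.lifo_slot == Some(task) || self.run_queue.contains(&task) {
            return;
        }
        // The newly woken task takes the LIFO slot; any previous occupant
        // is bumped into the stealable run queue.
        if let Some(prev) = self.lifo_slot.replace(task) {
            self.run_queue.push_back(prev);
        }
    }

    /// The running task yields: in this model it goes to the back of the
    /// run queue, not into the LIFO slot.
    pub fn yield_running_task(&mut self, task: &'static str) {
        self.run_queue.push_back(task);
    }

    /// An idle worker tries to steal work: it only sees the run queue.
    pub fn steal(&mut self) -> Option<&'static str> {
        self.run_queue.pop_front()
    }

    /// When the running task gives up control, this worker polls the
    /// LIFO slot first and only then its run queue.
    pub fn next_task(&mut self) -> Option<&'static str> {
        self.lifo_slot.take().or_else(|| self.run_queue.pop_front())
    }
}
```

In this model, if the writer keeps running and wakes the reader three times, `lifo_slot` holds `"reader"` and `steal()` returns `None`, so an idle worker finds nothing to take, no matter how many workers exist. Only once the writer yields (`yield_running_task("writer")`) does `next_task()` return `"reader"` on this worker while `steal()` can hand `"writer"` to another worker. Waking a second, different task would also bump the reader into the stealable queue.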

While the documentation, in many places, talks about tasks not yielding as being detrimental, and as being able to keep other tasks from being scheduled/polled, it was very unintuitive that, with only two tasks spawned onto a multi-threaded runtime with 16 worker threads, there was still no way for the second task to move to another worker.

I'm not sure if there's even a reasonable way to avoid this, and maybe the answer is simply having something like tokio-console be able to better surface this issue, but it definitely felt like a quirk, and ultimately it took an answer from a core Tokio dev to explain it: there was no existing blog post, GitHub issue, or other piece of information that explained this particular quirk.

@tobz tobz added C-bug Category: This is a bug. A-tokio Area: The main tokio crate labels Dec 15, 2021
@Darksonn Darksonn added the M-runtime Module: tokio/runtime label Dec 15, 2021
facebook-github-bot pushed a commit to facebook/buck2 that referenced this issue Mar 24, 2022
Summary:
Our event consumer task is getting stuck in the lifo slot of other
long-running tasks, causing events to queue until that long-running task
yields.

This change adds a separate single-thread tokio runtime to run that consumer
task. This avoids it getting stuck in another task's lifo slot.

See also tokio-rs/tokio#4323

Reviewed By: swgillespie

Differential Revision: D35067199

fbshipit-source-id: 63f68a00b33e45089eb6f02a4df6322c5991dcc6
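The workaround in the commit above isolates the event consumer on its own single-threaded Tokio runtime (in Tokio terms, one built with `tokio::runtime::Builder::new_current_thread()` and driven from a dedicated thread), so it can never be parked in another task's LIFO slot. To keep the sketch dependency-free, the code below swaps in a plainer analogue of the same idea: a dedicated OS thread fed by a `std::sync::mpsc` channel. The `Event` type and the `spawn_consumer` function are hypothetical and exist only for illustration.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum Event {
    Progress(u32),
    Done,
}

/// Runs the consumer on its own dedicated OS thread, so it drains events
/// as they arrive regardless of what producers on other threads are doing.
/// Returns the sender plus a handle that yields the events seen, in order.
pub fn spawn_consumer() -> (mpsc::Sender<Event>, thread::JoinHandle<Vec<Event>>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut seen = Vec::new();
        // `recv` returns Err once every sender is dropped; also stop on `Done`.
        while let Ok(event) = rx.recv() {
            let done = event == Event::Done;
            seen.push(event);
            if done {
                break;
            }
        }
        seen
    });
    (tx, handle)
}
```

The design trade-off is the same one the commit makes: the consumer gets a thread that no other work can monopolize, at the cost of an extra thread and a channel hop per event.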
carllerche commented

Closing in favor of #4941
