rt: reduce the impact of CPU-bound tasks on the overall runtime scheduler #6251
We discussed in #4730 that CPU-bound tasks increase scheduling delays across the entire Tokio runtime. We recommend using tokio::task::block_in_place for these tasks to prevent latency spikes for other tasks. However, even with a multi-threaded runtime and multiple worker threads, a single CPU-bound task can still cause significant latency.
Some runtimes, such as Go's, have a dedicated thread for polling drivers (Go's sysmon thread). In Tokio, by contrast, each worker thread is responsible for polling the driver under certain conditions, and a Condvar ensures that only one worker thread does so at a time. Go's strategy may not be suitable for Tokio, as its sysmon thread also handles GC-related operations and goroutine preemption.
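To make the "only one worker polls the driver at a time" rule concrete, here is a minimal std-only sketch. It is not Tokio's code: it swaps the Condvar-based parking for `Mutex::try_lock`, and `Driver` and `run_workers` are hypothetical names introduced only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the I/O driver; it only counts polls.
struct Driver {
    polls: usize,
}

/// Runs `workers` threads for `rounds` iterations each and returns
/// (maximum number of workers inside the driver at once, total polls).
fn run_workers(workers: usize, rounds: usize) -> (usize, usize) {
    let driver = Arc::new(Mutex::new(Driver { polls: 0 }));
    let inside = Arc::new(AtomicUsize::new(0));
    let max_inside = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let driver = Arc::clone(&driver);
            let inside = Arc::clone(&inside);
            let max_inside = Arc::clone(&max_inside);
            thread::spawn(move || {
                for _ in 0..rounds {
                    match driver.try_lock() {
                        // This worker won the driver: poll it exclusively.
                        Ok(mut d) => {
                            let now = inside.fetch_add(1, Ordering::SeqCst) + 1;
                            max_inside.fetch_max(now, Ordering::SeqCst);
                            d.polls += 1;
                            // Simulate waiting for I/O events.
                            thread::sleep(Duration::from_micros(50));
                            inside.fetch_sub(1, Ordering::SeqCst);
                        }
                        // Another worker holds the driver. A real runtime
                        // would park here; the sketch just yields.
                        Err(_) => thread::yield_now(),
                    }
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let polls = driver.lock().unwrap().polls;
    (max_inside.load(Ordering::SeqCst), polls)
}

fn main() {
    let (max_inside, polls) = run_workers(4, 100);
    println!("max workers in driver at once: {max_inside}, polls: {polls}");
}
```

The point of the sketch is the invariant, not the mechanism: however many workers compete, at most one is ever inside the driver.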
This PR aims to make fuller use of the CPU capacity of the multi_thread Tokio runtime and to reduce the delay that CPU-bound tasks add to I/O event processing.
Note: This does not solve the underlying problem that asynchronous tasks in Rust cannot currently be scheduled preemptively. If the number of CPU-bound tasks exceeds the number of worker threads, the entire Tokio runtime will still be blocked.
Tokio's scheduling mechanism is very good. It minimizes worker thread wake-ups and has significantly improved performance, as can be seen in #4383. However, the thread wake-up mechanism seems a bit too conservative. It might be worth making it slightly more aggressive to ensure a quicker I/O event response time. This PR includes the following change:
When the worker responsible for polling the driver is unparked, no thread is polling the driver at that moment, because the current worker is about to run tasks. So we immediately try to wake another worker, hoping it can take over polling the driver.
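The hand-off described above can be modeled as a small state transition. This is a simplified sketch under stated assumptions, not the code in this PR: `IdleState`, `driver_owner`, `parked`, and `release_driver` are hypothetical names, and the woken worker is assumed to take over the driver immediately, whereas in a real runtime it would only attempt to.

```rust
/// Hypothetical model of the idle state shared by worker threads.
/// None of these names come from Tokio's source.
#[derive(Debug, Default)]
struct IdleState {
    /// Worker currently parked on the I/O driver, if any.
    driver_owner: Option<usize>,
    /// Workers parked without the driver, waiting to be woken.
    parked: Vec<usize>,
}

impl IdleState {
    /// Called when `worker`, which was polling the driver, is unparked
    /// and is about to run tasks. Returns the worker that should be
    /// woken to take over polling, or `None` if nobody is parked.
    fn release_driver(&mut self, worker: usize) -> Option<usize> {
        if self.driver_owner != Some(worker) {
            // This worker was not polling the driver; nothing to hand off.
            return None;
        }
        // The driver is now unattended.
        self.driver_owner = None;
        // Pick the most recently parked worker (an arbitrary choice here).
        let next = self.parked.pop()?;
        // Simplification: assume the woken worker takes the driver at once.
        self.driver_owner = Some(next);
        Some(next)
    }
}

fn main() {
    let mut idle = IdleState {
        driver_owner: Some(0),
        parked: vec![1, 2],
    };
    let woken = idle.release_driver(0);
    println!("woken: {:?}, driver owner: {:?}", woken, idle.driver_owner);
}
```

Without the hand-off, the driver would stay unattended until some worker ran out of tasks, which is exactly the window in which a CPU-bound task delays I/O events.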
Here is the test case for the current PR:
The test result on master:
The test result on this PR:
Referring to #4383, I benchmarked Hyper's "hello" server using:
wrk -t1 -c400 -d10s http://127.0.0.1:3000/
Master
This PR
There is almost no performance difference in this test case.
This PR will not completely eliminate the negative impact of CPU-bound tasks (or blocking tasks), but it can reduce the processing delay of I/O events when the scheduler is not busy.