Don't always wake up sleeping schedulers #11325
I recently created a benchmark that increments a global variable inside an `extra::sync::Mutex`, and it turned out to be horribly slow. For 100K increments (per thread), the timings I got were:
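For reference, here is a minimal sketch of the kind of benchmark described above. This is an assumption-laden modern analogue: it uses `std::sync::Mutex` and OS threads in place of the original `extra::sync::Mutex` and green tasks, and is only meant to show the shape of the workload.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // Shared counter behind a mutex; each worker does 100K increments,
    // mirroring the benchmark described above.
    let counter = Arc::new(Mutex::new(0u64));
    let mut handles = Vec::new();

    for _ in 0..8 {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            for _ in 0..100_000 {
                // Every increment takes and releases the lock, so lock
                // hand-off between workers dominates the runtime.
                *counter.lock().unwrap() += 1;
            }
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    println!("final count: {}", counter.lock().unwrap());
}
```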
Upon profiling the test, most of the time is spent in `kevent()` (I'm on OSX) and `write()`. I thought that this was because we were falling into epoll too much, but after making the scheduler fall back to `epoll()` only when there is no work and no active I/O handles, the problem remained.
The problem actually turned out to be that the schedulers were in high contention over the tasks being run. With RUST_TASKS=1, this test is blazingly fast (78ms), and with RUST_TASKS=2, it's incredibly slow (3824ms). The reason I found for this is that enqueued tasks are constantly stolen by other schedulers, so tasks just get ping-ponged back and forth between schedulers while the schedulers spend a lot of time in `kevent` and `write` waking each other up.

This optimization only wakes up a sleeping scheduler on every 8th task that is enqueued. I have found this number to be the "low sweet spot" for maximizing performance. The numbers after I made this change are:
This indicates that the 8-thread performance is back up to the level of RUST_TASKS=1, and the other numbers essentially stayed the same.
In other words, this is a 136x improvement for highly contended green-thread programs.
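For illustration, here is a minimal sketch of the wake-up throttling idea, under stated assumptions: the type, field, and function names are hypothetical and do not reflect the actual scheduler code in this PR, and the real wake-up goes through the runtime's pipe/kqueue machinery rather than a stub.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// "Low sweet spot" reported above: only wake a sleeping scheduler on
// every 8th enqueued task. (Constant and type names here are hypothetical.)
const WAKEUP_INTERVAL: usize = 8;

struct WorkQueue {
    enqueue_count: AtomicUsize,
    // ... the actual deque of tasks would live here ...
}

impl WorkQueue {
    fn new() -> WorkQueue {
        WorkQueue { enqueue_count: AtomicUsize::new(0) }
    }

    /// Push a task, but only signal a sleeping scheduler on every 8th push,
    /// so idle schedulers are not constantly woken just to steal one task.
    fn enqueue(&self, _task: Box<dyn FnOnce() + Send>) {
        // ... push `_task` onto the local work queue here ...
        let n = self.enqueue_count.fetch_add(1, Ordering::Relaxed);
        if n % WAKEUP_INTERVAL == 0 {
            self.wake_sleeping_scheduler();
        }
    }

    fn wake_sleeping_scheduler(&self) {
        // In the real runtime this is a write() to a pipe that the sleeping
        // scheduler's kevent()/epoll() loop is blocked on; stubbed out here.
    }
}

fn main() {
    let queue = WorkQueue::new();
    // Enqueue 32 tasks: only 4 of them (every 8th) trigger a wake-up.
    for i in 0..32 {
        queue.enqueue(Box::new(move || println!("task {}", i)));
    }
}
```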