getBlock spike triggers permanent catchup difficulty by rayon/crossbeam bug #22603
Comments
also, this fix should contribute to an overall latency reduction across the board (dunno about how much, though). i'm guessing just a few % decrease.
@ryoqun We've seen this spike a lot, though less recently (possibly with the move to different hardware). We have some older hardware running the SOLANA_RAYON_THREADS=8 hack; do you see any potential downsides to this param?
there should be none as long as your node is able to catch up to the cluster. (the hack slightly increases the likelihood of not being able to catch up.)
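For context, here's a minimal sketch of what an env-var cap like SOLANA_RAYON_THREADS amounts to, assuming plain rayon; this is a hypothetical illustration, not the validator's actual implementation:

```rust
// Hypothetical sketch (not the validator's real code): cap a rayon pool's
// worker count from an env var instead of defaulting to the core count.
use std::env;

fn capped_thread_count(default: usize) -> usize {
    env::var("SOLANA_RAYON_THREADS")
        .ok()
        .and_then(|v| v.parse::<usize>().ok())
        .unwrap_or(default)
}

fn main() {
    let default = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(capped_thread_count(default))
        .build()
        .unwrap();
    println!("rayon pool built with {} threads", pool.current_num_threads());
}
```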
@steviez hi, just want to know how the fix for this is going. i might be able to work on this soon?
Hi @ryoqun - Some stuff came up so I haven't spent as much time as I would have liked, but I'm currently working on a simplified, standalone test that will exhibit the issue (I agree with you that having this would be useful) before starting to tinker in rayon. -- Out of curiosity, I dug through the codebase and wanted to see how many worker threads we make with rayon, to understand what proportion the additional getBlock threads make up.
Here is a catalogue of where we spawn Rayon threads:
Adding all of these up:
On a 16 physical core / 32 thread machine:
On that same 16 physical core / 32 thread machine with SOLANA_RAYON_THREADS=8:
Sticking with the 32-thread machine, one getBlock call spins up yet another rayon pool on top of all of those. Also, the rayon worker cleanup issue aside, I wonder if 16 threads is overkill for fetching blocks from Rocks, especially on a validator under heavy load.
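As a rough illustration of why those totals climb so quickly: each independently-built rayon pool owns its own OS threads, so per-subsystem pools accumulate. The pool names below come from this issue; the sizes are invented for the sketch.

```rust
// Illustrative only: independent rayon pools each own their worker threads,
// so per-subsystem pools add up fast. Names are from this issue; sizes are
// made up, not the validator's actual configuration.
fn pool(n: usize) -> rayon::ThreadPool {
    rayon::ThreadPoolBuilder::new().num_threads(n).build().unwrap()
}

fn main() {
    let logical = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(32); // e.g. 32 on a 16-core / 32-thread box
    let pools = vec![
        ("blockstore_processor (replay)", pool(logical)),
        ("entry verification", pool(logical)),
        ("rpc blockstore fetch (per getBlock burst)", pool(logical / 2)),
    ];
    let total: usize = pools.iter().map(|(_, p)| p.current_num_threads()).sum();
    println!("rayon worker threads across {} pools: {}", pools.len(), total);
}
```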
Problem
technical mechanism of the problem
in short, rayon/crossbeam isn't written with the assumption of lots of idling threads spread across various pools, far exceeding the system's core count. and we're doing exactly that, unfortunately.

firstly, the rayon thread pool is backed by `crossbeam-deque`, which in turn uses `crossbeam-epoch` (a garbage collection library), and each rayon worker thread periodically runs the gc bookkeeping code. and the bookkeeping code isn't well optimized for our usecase: as part of it, it periodically does a linear scan of a (lockless) linked list of all rayon worker threads in the os process. so, as the number of threads increases, each scan takes longer, burdening every rayon thread. and the `PINNINGS_BETWEEN_COLLECT` constant controls the frequency, and it's too short (i.e. too frequent) for us. so, seemingly independent thread pools affect each other. (that was the mystery part)
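To make the effect concrete, here is a minimal, hypothetical repro sketch (roughly the shape a standalone test could take, not the actual test mentioned above), using plain rayon; pool counts and sizes are made up:

```rust
use std::time::{Duration, Instant};

// Time a fixed parallel workload on one "replay-like" pool. Per the analysis
// above, the per-task overhead includes crossbeam-epoch bookkeeping, whose
// periodic scan walks every registered rayon worker thread in the process.
fn time_workload(pool: &rayon::ThreadPool) -> Duration {
    use rayon::prelude::*;
    let start = Instant::now();
    pool.install(|| {
        (0..1_000_000u64)
            .into_par_iter()
            .map(|x| x.wrapping_mul(x))
            .sum::<u64>()
    });
    start.elapsed()
}

fn main() {
    let replay_pool = rayon::ThreadPoolBuilder::new()
        .num_threads(8)
        .build()
        .unwrap();

    println!("baseline: {:?}", time_workload(&replay_pool));

    // Simulate a getBlock spike: build extra pools whose workers then just
    // idle but, per the analysis above, stay registered with crossbeam-epoch
    // and lengthen its periodic scan.
    let _extra_pools: Vec<rayon::ThreadPool> = (0..20)
        .map(|_| rayon::ThreadPoolBuilder::new().num_threads(16).build().unwrap())
        .collect();

    println!(
        "with {} extra idle rayon threads: {:?}",
        20 * 16,
        time_workload(&replay_pool)
    );
}
```

Whether the slowdown is actually measurable on a given machine depends on how often the workers hit the epoch bookkeeping (the `PINNINGS_BETWEEN_COLLECT` cadence above), so treat this as the shape of an experiment rather than a guaranteed reproduction.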
and solana-validator's rpc subsystem spawns a new set of rayon thread pools whenever a blockstore-related jsonrpc method is invoked. that causes a sudden increase in the number of alive rayon threads, so crossbeam's scanning gets more expensive. finally, once a certain threshold is crossed (depending on the system configuration), the catchup-related (replay stage) thread pools (the `blockstore_processor` and `entry` ones) are wasting too many cycles in the gc bookkeeping to process transactions/entries in a timely manner.

Proposed Solution