perf: dynamically batch tx sender recovery #1834
Conversation
82b89a9 to ffe7c3b
Codecov Report
@@ Coverage Diff @@
## main #1834 +/- ##
=======================================
Coverage 73.50% 73.51%
=======================================
Files 410 410
Lines 50515 50527 +12
=======================================
+ Hits 37131 37143 +12
Misses 13384 13384
... and 6 files with indirect coverage changes
gg
.for_each(|result: Result<_, StageError>| {
    let _ = tx.send(result);
});
sending them one by one is totally fine
for chunk in
    &tx_walker.chunks(self.commit_threshold as usize / rayon::current_num_threads())
👍
in hindsight this is kinda obvious
yeah...
The performance regression in the sender recovery stage was caused by effectively queuing ~5000 relatively "fast" jobs, which meant Rayon's worker threads lost a lot of time trying to steal more work.
The solution is to reintroduce batching. For now, batch sizes are derived from the number of worker threads in the Rayon thread pool. This works because we are memory-bound and can't crank the commit threshold very high, so a separate config option for batch sizes doesn't make much sense here. A sketch of the pattern follows.
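To illustrate the idea (this is only a sketch, not the actual reth code: recover_batched, recover_sender, and the transaction tuple shape are hypothetical stand-ins), each Rayon worker gets roughly one chunk per commit window instead of thousands of one-item jobs. With a commit threshold of 5000 and, say, a 16-thread pool, that is 16 chunks of ~312 transactions rather than 5000 tiny jobs, so the work-stealing overhead is paid once per chunk instead of once per transaction.

use std::sync::mpsc;

use itertools::Itertools;

// Sketch of dynamic batching; `recover_sender` and the input shape are placeholders.
fn recover_batched(
    transactions: Vec<(u64, Vec<u8>)>, // hypothetical (tx id, raw tx bytes) pairs
    commit_threshold: u64,
) -> Vec<Result<(u64, [u8; 20]), String>> {
    let (tx, rx) = mpsc::channel();

    // One chunk per Rayon worker per commit window, instead of ~5000 tiny jobs.
    let chunk_size = (commit_threshold as usize / rayon::current_num_threads()).max(1);

    for chunk in &transactions.into_iter().chunks(chunk_size) {
        let chunk: Vec<_> = chunk.collect();
        let tx = tx.clone();
        // One Rayon job per chunk; results are sent back one by one, which is cheap.
        rayon::spawn(move || {
            for (id, raw) in chunk {
                let _ = tx.send(recover_sender(id, &raw));
            }
        });
    }
    drop(tx);

    // Collect results as the workers finish.
    rx.iter().collect()
}

// Placeholder recovery function standing in for the real signature recovery.
fn recover_sender(id: u64, _raw: &[u8]) -> Result<(u64, [u8; 20]), String> {
    Ok((id, [0u8; 20]))
}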
This is a perf grab of the current sender recovery stage:
As we can see here (on the top right), almost 50%(!) of the time is spent trying to get more work.
Compare this with this PR:
We spend almost no time trying to get more work.
In practice, sender recovery now feels snappy again: before, it took roughly 20-30s per 5k blocks; now it takes about 3-4s.