Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: dynamically batch tx sender recovery #1834

Merged
merged 1 commit into from
Mar 18, 2023

Conversation

onbjerg
Copy link
Member

@onbjerg onbjerg commented Mar 18, 2023

The performance regression in the sender recovery stage was caused by us effectively queuing 5000 really "fast" (relatively) jobs, leading to a lot of time lost on Rayon's worker threads trying to steal more jobs.

The solution is to reintroduce batching. For now, we create batches based on the number of worker threads in the Rayon threadpool. This works since we are limited by memory, and can't crank the commit threshold too much, and separate config for batch sizes in this case doesn't make much sense.

This is a perf grab of the current sender recovery stage:

Screenshot from 2023-03-18 13-19-39

As we can see here (on the top right), almost 50%(!) of the time is spent trying to get more work.

Compare this with this PR:

Screenshot from 2023-03-18 14-01-03

We almost spend no time trying to get more work.

The speedup for me is that sender recovery now feels snappy again - before, it felt like it took 20-30s per 5k blocks, now it takes about 3-4.

@onbjerg onbjerg added A-staged-sync Related to staged sync (pipelines and stages) C-perf A change motivated by improving speed, memory usage or disk footprint labels Mar 18, 2023
@onbjerg onbjerg requested a review from rkrasiuk as a code owner March 18, 2023 13:05
@onbjerg onbjerg force-pushed the onbjerg/sender-recovery-perf branch from 82b89a9 to ffe7c3b Compare March 18, 2023 13:08
@codecov-commenter
Copy link

codecov-commenter commented Mar 18, 2023

Codecov Report

Merging #1834 (ffe7c3b) into main (995c5ad) will increase coverage by 0.00%.
The diff coverage is 100.00%.

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@           Coverage Diff           @@
##             main    #1834   +/-   ##
=======================================
  Coverage   73.50%   73.51%           
=======================================
  Files         410      410           
  Lines       50515    50527   +12     
=======================================
+ Hits        37131    37143   +12     
  Misses      13384    13384           
Flag Coverage Δ
integration-tests 19.70% <0.00%> (-0.01%) ⬇️
unit-tests 67.86% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
crates/stages/src/stages/sender_recovery.rs 94.14% <100.00%> (+2.95%) ⬆️

... and 6 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gg

Comment on lines +113 to +115
.for_each(|result: Result<_, StageError>| {
let _ = tx.send(result);
});
Copy link
Collaborator

@mattsse mattsse Mar 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sending them one by one is totally fine

Comment on lines +93 to +94
for chunk in
&tx_walker.chunks(self.commit_threshold as usize / rayon::current_num_threads())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

in hindsight this is kinda obvious

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah...

@onbjerg onbjerg merged commit a05b3ff into main Mar 18, 2023
@onbjerg onbjerg deleted the onbjerg/sender-recovery-perf branch March 18, 2023 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-staged-sync Related to staged sync (pipelines and stages) C-perf A change motivated by improving speed, memory usage or disk footprint
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants