
Make ReplayStage own the threadpool for tx replay #190

Merged
merged 4 commits into anza-xyz:master from rs_own_tx_replay_tpool on Mar 12, 2024

Conversation

@steviez steviez commented Mar 11, 2024

Problem

The threadpool used to replay multiple transactions in parallel is
currently global state via a lazy_static definition. Making this pool
owned by ReplayStage will enable subsequent work to make the pool
size configurable on the CLI.

More context in #105

Summary of Changes

Make ReplayStage create the threadpool and pass it down to blockstore_processor
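At a high level, the ownership change looks roughly like the sketch below. The names and signatures are illustrative rather than the exact agave code, but the idea is that ReplayStage builds a rayon pool once and passes a reference down, instead of confirm_slot() reaching for a lazy_static global.

use rayon::{ThreadPool, ThreadPoolBuilder};

struct ReplayStage {
    replay_tx_thread_pool: ThreadPool,
}

impl ReplayStage {
    fn new(num_threads: usize) -> Self {
        let replay_tx_thread_pool = ThreadPoolBuilder::new()
            // Follow-up work can plumb this value in from a CLI argument.
            .num_threads(num_threads)
            .build()
            .expect("new rayon threadpool");
        Self { replay_tx_thread_pool }
    }

    fn replay_active_banks(&self) {
        // The pool is now an explicit argument instead of a global.
        confirm_slot_sketch(&self.replay_tx_thread_pool);
    }
}

// Stand-in for blockstore_processor::confirm_slot(), which now takes the pool.
fn confirm_slot_sketch(replay_tx_thread_pool: &ThreadPool) {
    replay_tx_thread_pool.install(|| {
        // ... replay the slot's transactions in parallel here ...
    });
}

fn main() {
    let replay_stage = ReplayStage::new(8);
    replay_stage.replay_active_banks();
}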

steviez added 3 commits March 11, 2024 14:26
This item is currently global state via a lazy_static in
blockstore_processor.rs. Making it owned by ReplayStage sets up later
work to allow configuring the size of the pool via the CLI.
@steviez steviez (Author) left a comment

This PR maintains the existing thread-pool size for non-test cases. That being said, I noticed this line, which will probably need to be evaluated when the change to reduce this threadpool size comes up:

let target_batch_count = get_thread_count() as u64;

@@ -978,9 +991,11 @@ fn verify_ticks(
Ok(())
}

#[allow(clippy::too_many_arguments)]
@steviez steviez (Author) commented Mar 11, 2024

These functions as well as ReplayStage are in need of some cleanup. One easy win I think would be packing all the stuff that only gets accessed by the bottom layer of the call-stack (execute_batch()) into a struct and unpacking the struct in the bottom layer instead of passing each individual piece in every function.
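
Roughly something like the sketch below; the field names and types are placeholders I'm using for illustration, not the actual execute_batch() signature.

// Placeholder types standing in for the pieces that currently get passed
// through every layer of the call-stack individually.
#[allow(dead_code)]
struct StatusSender;
#[allow(dead_code)]
struct VoteSender;

#[allow(dead_code)]
struct ExecuteBatchContext<'a> {
    status_sender: Option<&'a StatusSender>,
    vote_sender: Option<&'a VoteSender>,
    log_messages_bytes_limit: Option<usize>,
}

struct TransactionBatch {
    id: u64,
}

// The intermediate layers just thread the context through unchanged...
fn process_batches(ctx: &ExecuteBatchContext<'_>, batches: &[TransactionBatch]) {
    for batch in batches {
        execute_batch(ctx, batch);
    }
}

// ...and only the bottom layer unpacks the individual pieces it needs.
fn execute_batch(ctx: &ExecuteBatchContext<'_>, batch: &TransactionBatch) {
    if ctx.log_messages_bytes_limit.is_some() {
        // e.g. truncate log output for this batch
    }
    println!("executing batch {}", batch.id);
}

fn main() {
    let ctx = ExecuteBatchContext {
        status_sender: None,
        vote_sender: None,
        log_messages_bytes_limit: Some(10_000),
    };
    process_batches(&ctx, &[TransactionBatch { id: 0 }, TransactionBatch { id: 1 }]);
}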

That being said, I would like to defer that work until after this PR. Soon enough, work on this project will be mostly experiments and data collection. At that point, I'll have more "idle" time to come back and perform this cleanup if nothing more pressing comes up.


I'll have more "idle" time

you wish

@@ -871,10 +882,12 @@ pub fn process_blockstore_from_root(
.meta(start_slot)
.unwrap_or_else(|_| panic!("Failed to get meta for slot {start_slot}"))
{
let replay_tx_thread_pool = create_thread_pool(get_max_thread_count());
@steviez steviez (Author) commented

Another pro of this change is that we'll be able to configure this value to be a different size for startup and steady-state. That's good because startup involves replaying entire blocks that are already in the ledger, so a higher amount of parallelism may be warranted versus steady-state, when blocks are gradually streamed in.


Makes sense. Looks like we're using 1 thread per core already, so the opportunity seems to be dialing this down for steady state.

@steviez steviez (Author) commented

so opportunity seems to be dialing this down for steady state

Exactly - we already have a good bit of data in #25 to support this being the correct decision.

@steviez steviez requested review from t-nelson and bw-solana March 11, 2024 20:45
@bw-solana bw-solana left a comment

LGTM

@t-nelson

should we not loft this even higher so banking stage can use the same pool while replay is idle?

@steviez steviez (Author) commented Mar 12, 2024

should we not loft this even higher so banking stage can use the same pool while replay is idle?

Maybe? If I recall correctly, I thought BankingStage started doing some work prior to its leader slots actually starting, something like organizing transactions so that things are primed when the leader slots actually start. If that is correct, sharing the pool could have some adverse effects, i.e. slowing down replay just prior to leader slots / slowing down the BankingStage "priming".

I can follow up with someone in the morning. I also already plan on attempting to use the same pool for entry verification and tx replay; that should be a pretty good gain. Comparatively, BankingStage isn't as abusive on thread counts (can't say the same for the tx ingestion pipeline), so we could also keep things separate for now to claim the low-hanging fruit, and then circle back to condense things further.

@steviez steviez (Author) commented Mar 12, 2024

should we not loft this even higher so banking stage can use the same pool while replay is idle?

I can follow up with someone in the morning...

I talked to @apfitzge, and the short answer is yes: there appears to be some work that BankingStage does prior to leader slots actually starting. I will defer to him for more detail, but given that, I'm inclined to push this PR as-is and pursue any sharing of resources between ReplayStage and BankingStage in the future.

@apfitzge

I think it could probably be done, but we will need to be careful with how we do it. BankingStage still does work before becoming leader:

  1. receiving packets from sigverify
  2. sanitizing transactions
  3. buffering transactions (priority ordered, an O(n log n) operation!)
  4. clean buffer - removes txs that were already processed or expired

We could probably make the ConsumeWorkers in the new central-scheduler mode of banking stage use a shared thread-pool, but not the scheduler thread itself.
Workers should be sleeping (blocking on a crossbeam receive) except when we are leader and the scheduler thread has sent them work.
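
To illustrate the pattern (hypothetical names, not the actual BankingStage code): a worker that blocks on a crossbeam receive, sleeping until the scheduler sends it work, might look roughly like this.

use crossbeam_channel::{unbounded, Receiver};
use std::thread;

// Hypothetical unit of work handed from the scheduler thread to a consume worker.
struct ConsumeWork {
    batch_id: u64,
}

fn spawn_consume_worker(work_receiver: Receiver<ConsumeWork>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // recv() blocks, so the worker sleeps and burns no CPU until the
        // scheduler sends work (i.e. while we are not leader).
        while let Ok(work) = work_receiver.recv() {
            println!("consuming batch {}", work.batch_id);
        }
        // Channel disconnected: the scheduler is gone, so the worker exits.
    })
}

fn main() {
    let (work_sender, work_receiver) = unbounded();
    let worker = spawn_consume_worker(work_receiver);

    // The scheduler only sends work during leader slots.
    for batch_id in 0..3 {
        work_sender.send(ConsumeWork { batch_id }).unwrap();
    }
    drop(work_sender); // closing the channel lets the worker shut down
    worker.join().unwrap();
}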

@steviez steviez merged commit 7a144e2 into anza-xyz:master Mar 12, 2024
35 checks passed
@steviez steviez deleted the rs_own_tx_replay_tpool branch March 12, 2024 18:21
codebender828 pushed a commit to codebender828/agave that referenced this pull request Oct 3, 2024
The threadpool used to replay multiple transactions in parallel is
currently global state via a lazy_static definition. Making this pool
owned by ReplayStage will enable subsequent work to make the pool
size configurable on the CLI.

This makes `ReplayStage` create and hold the threadpool which is passed
down to blockstore_processor::confirm_slot().

blockstore_processor::process_blockstore_from_root() now creates its
own threadpool as well; however, this pool is only alive for the scope
of that function and does not persist for the lifetime of the process.
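
A minimal sketch of that scoping, with illustrative names and assuming a rayon pool as elsewhere in the change:

use rayon::ThreadPoolBuilder;

// Hypothetical stand-in for the get_max_thread_count() helper used in the diff.
fn get_max_thread_count() -> usize {
    std::thread::available_parallelism().map(|n| n.get()).unwrap_or(1)
}

fn process_blockstore_from_root_sketch() {
    // The pool is created when startup processing begins...
    let replay_tx_thread_pool = ThreadPoolBuilder::new()
        .num_threads(get_max_thread_count())
        .build()
        .expect("new rayon threadpool");

    replay_tx_thread_pool.install(|| {
        // ... replay the ledger forward from the root slot ...
    });
}   // ...and dropped when the function returns, so it does not outlive startup.

fn main() {
    process_blockstore_from_root_sketch();
}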