
leader-qos part 1: move buffer from banking_stage to its own module #23508

Merged: 1 commit merged into solana-labs:master on Mar 10, 2022

Conversation

@tao-stones (Contributor) commented Mar 7, 2022:

Problem

This is part 1 of breaking up PR #23257; the proposal review is at #23369.

Summary of Changes

  • move unprocessed_packet_batches from BankingStage to its own module
  • deserialize packets during receiving and buffering (sketched below)

Fixes #
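
For orientation, here is a rough sketch of the shape the new module takes, inferred from this conversation; the type and field names below are simplified stand-ins, not the exact solana-core definitions, and keying unprocessed packets by their index in the batch is an assumption of this sketch:

use std::collections::{HashMap, VecDeque};

struct Packet;                              // stand-in for the wire packet type
struct PacketBatch { packets: Vec<Packet> } // stand-in for a sigverify output batch
struct DeserializedTx;                      // stand-in for a deserialized transaction

struct DeserializedPacketBatch {
    packet_batch: PacketBatch,
    forwarded: bool,
    // transactions deserialized once at buffering time, keyed by packet index
    unprocessed_packets: HashMap<usize, DeserializedTx>,
}

impl DeserializedPacketBatch {
    fn new(packet_batch: PacketBatch, packet_indexes: Vec<usize>, forwarded: bool) -> Self {
        let unprocessed_packets = packet_indexes
            .into_iter()
            .filter_map(|i| deserialize_packet(&packet_batch.packets[i]).map(|tx| (i, tx)))
            .collect();
        Self { packet_batch, forwarded, unprocessed_packets }
    }
}

// hypothetical helper; the real code would do bincode deserialization plus sanitization
fn deserialize_packet(_packet: &Packet) -> Option<DeserializedTx> {
    Some(DeserializedTx)
}

// the buffer itself is a double-ended queue of deserialized batches
type UnprocessedPacketBatches = VecDeque<DeserializedPacketBatch>;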

@tao-stones mentioned this pull request Mar 7, 2022
&working_bank,
&bank_creation_time,
recorder,
packet_batch,
@carllin (Contributor) commented Mar 8, 2022:

Can we pass the deserialize_packet_batch.unprocessed_packets here and rework process_packets_transactions() to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?

@tao-stones (Author) replied:

Similar to above, process_packets_transactions is going to be replaced by process_transactions(&transactions...), to take advantage of the packets having already been deserialized upon receiving.

let should_retain = if let Some(bank) = &end_of_slot.working_bank {
    let new_unprocessed_indexes = Self::filter_unprocessed_packets_at_end_of_slot(
        bank,
        packet_batch,
Contributor commented:

Can we pass the deserialize_packet_batch.unprocessed_packets here and rework filter_unprocessed_packets_at_end_of_slot() to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?

Contributor commented:

I think this also has the added benefit that we no longer need to clone the transaction_indexes above; we just need the length below to compute the end_of_slot_filtered_invalid_count.

@tao-stones (Author) replied:

filter_unprocessed_packets_at_end_of_slot is going to change in the next commits; it'd eventually become this: https://github.com/solana-labs/solana/pull/23257/files#diff-bb8464c2f02c8863fb60e6e29f023f3e6da93a93fbb9199a70a44c9c1b41c7f3R557-R571

In this PR (which moves UnprocessedPacketBatches out to its own module), consume_buffered_packets still iterates through the buffer by packet_batch. In the next steps, individual packets will be indexed and filtered, and there will be no re-deserializing then.
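
For illustration, a minimal sketch of that end state (names are stand-ins, not the actual banking_stage signatures): the consume path reads the already-deserialized transactions out of the buffer and hands them to processing, so nothing is deserialized twice:

use std::collections::{HashMap, VecDeque};

struct DeserializedTx; // stand-in for the deserialized transaction type

struct DeserializedPacketBatch {
    // transactions deserialized once at receive time, keyed by packet index
    unprocessed_packets: HashMap<usize, DeserializedTx>,
}

// hypothetical consume loop over the buffered, already-deserialized batches
fn consume_buffered_packets(buffer: &VecDeque<DeserializedPacketBatch>) {
    for batch in buffer {
        let transactions: Vec<&DeserializedTx> = batch.unprocessed_packets.values().collect();
        process_transactions(&transactions);
    }
}

// stand-in for the real transaction-processing path
fn process_transactions(_transactions: &[&DeserializedTx]) {}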

core/src/banking_stage.rs — two additional review threads (outdated, resolved)
Comment on lines +1977 to +1988
unprocessed_packet_batches.push_back(DeserializedPacketBatch::new(
    packet_batch,
    packet_indexes,
    false,
));
@carllin (Contributor) commented Mar 8, 2022:

This change will now deserialize every PacketBatch before adding it to the buffered queue, whereas the current code only deserializes when we're about to process the packet.

How much of a perf hit does this incur if we're ingesting up to the max buffered packets? Before, we were just adding to a vector, so it probably wouldn't block transaction processing.

If the extra perf hit is significant, we could add a cap on the max time spent receiving / max packets received, to ensure we don't spin in here for too long:

for packet_batch in packet_batch_iter {
    let packet_indexes = Self::generate_packet_indexes(&packet_batch.packets);
    // Track all the packets incoming from sigverify, both valid and invalid
    slot_metrics_tracker.increment_total_new_valid_packets(packet_indexes.len() as u64);
    slot_metrics_tracker.increment_newly_failed_sigverify_count(
        packet_batch
            .packets
            .len()
            .saturating_sub(packet_indexes.len()) as u64,
    );
    Self::push_unprocessed(
        buffered_packet_batches,
        packet_batch,
        packet_indexes,
        &mut dropped_packet_batches_count,
        &mut dropped_packets_count,
        &mut newly_buffered_packets_count,
        batch_limit,
        banking_stage_stats,
        slot_metrics_tracker,
    )
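
For illustration, a hedged sketch of such a cap; the constants, types, and function name here are made up for the example, not the actual banking_stage code:

use std::time::{Duration, Instant};

struct PacketBatch { packets: Vec<u8> } // stand-in; only the packet count matters here

fn receive_until_budget_exhausted(
    batches: impl Iterator<Item = PacketBatch>,
    max_packets: usize,
    max_time: Duration,
) -> Vec<PacketBatch> {
    let start = Instant::now();
    let mut received = 0usize;
    let mut buffered = Vec::new();
    for batch in batches {
        received += batch.packets.len();
        buffered.push(batch); // the real code would deserialize + push_unprocessed here
        // stop early so buffering can't starve transaction processing
        if received >= max_packets || start.elapsed() >= max_time {
            break;
        }
    }
    buffered
}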

Could also make the deserialization lazy in DeserializedPacketBatch
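
One way to read the lazy idea, as a minimal sketch with simplified stand-in types (the real code would hold solana packet/transaction types and use bincode plus sanitization): keep the raw bytes when buffering and deserialize on first access, memoizing the result:

struct ParsedTransaction; // stand-in for the deserialized transaction type

struct LazyDeserializedPacket {
    raw: Vec<u8>,                      // serialized payload kept at buffering time
    parsed: Option<ParsedTransaction>, // filled in on first access
}

impl LazyDeserializedPacket {
    fn new(raw: Vec<u8>) -> Self {
        Self { raw, parsed: None }
    }

    // Deserialize on first use and cache the result: buffering stays a cheap
    // append, and the parse cost is only paid for packets that get processed.
    fn transaction(&mut self) -> Option<&ParsedTransaction> {
        if self.parsed.is_none() {
            self.parsed = parse(&self.raw);
        }
        self.parsed.as_ref()
    }
}

// hypothetical parse helper
fn parse(_raw: &[u8]) -> Option<ParsedTransaction> {
    Some(ParsedTransaction)
}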

@tao-stones (Author) replied:

Good point! I did a perf test to compare. banking_bench on my local box indicates a slight regression in mean TPS, while the GCE 1-hour perf test shows a more noticeable regression (mean TPS went from ~20K -> ~16K). I would think banking-bench better captures the changes here.

Compared to the current code, this deserializes every received packet once, whereas in the current version a packet could be deserialized multiple times if it is to be retried.

In the next commits, I increased the number of banking threads, which reduced each thread's buffer size (calculated as TOTAL_BUFFERED_PACKETS / number_of_threads); that seems to improve perf a lot. So there's a sweet spot for buffer size.

Making it lazy is a good idea. I'll put it into final tuning perhaps.
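
For concreteness, the per-thread sizing mentioned above is just an even split of the total buffer budget; the constant value below is illustrative, not necessarily the one in banking_stage:

// illustrative total budget; the real constant lives in banking_stage
const TOTAL_BUFFERED_PACKETS: usize = 500_000;

// each banking thread gets an equal share, so adding threads shrinks the per-thread buffer
fn per_thread_buffer_size(num_banking_threads: usize) -> usize {
    TOTAL_BUFFERED_PACKETS / num_banking_threads.max(1)
}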

@carllin (Contributor) commented Mar 8, 2022:

mean tps went from ~20K -> ~16K

Hmmm, that's quite a big perf hit. Is that just from this PR, or from the bigger PR here: #23257?

I would prefer not to check in PRs that induce a perf hit and then try to check in fixes in later PRs. Keeping each PR perf-neutral makes management easier, so that if we roll back certain PRs, we don't have to remember the interdependencies.

Here, maybe lazy deserialization would help? Or what if we moved the deserialization into a separate stage / part of sigverify that streamed the deserialized transactions to banking stage?
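
A rough sketch of the second idea, deserializing upstream and streaming results to banking stage (the channel wiring and type names are stand-ins, not the actual sigverify/banking plumbing):

use std::sync::mpsc::{Receiver, Sender};
use std::thread;

struct PacketBatch;       // stand-in for a sigverify output batch
struct DeserializedBatch; // stand-in for a batch of parsed transactions

// hypothetical deserialization step that would live next to sigverify
fn deserialize_batch(_batch: PacketBatch) -> DeserializedBatch {
    DeserializedBatch
}

fn spawn_deserializer(
    verified_rx: Receiver<PacketBatch>,
    banking_tx: Sender<DeserializedBatch>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // deserialize off the banking thread and stream the results, so banking
        // stage only ever buffers already-parsed transactions
        for batch in verified_rx {
            if banking_tx.send(deserialize_batch(batch)).is_err() {
                break; // banking stage hung up
            }
        }
    })
}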

@tao-stones (Author) replied:

that's just from this PR or the bigger PR here: #23257

That's from the bigger all-in-one PR.

But this is a fair point. I just kicked off a 1-hour test; let's see the results. Definitely on board with the idea.

Contributor commented:

cool, thanks!

@tao-stones (Author) replied:

[screenshot: perf test results]

Looks in line with master.

carllin previously approved these changes Mar 9, 2022
- move buffer from banking_stage to its own module
- deserialize packets during receiving and buffering
@tao-stones added the automerge (Merge this Pull Request automatically once CI passes) label Mar 10, 2022
@mergify bot dismissed carllin’s stale review March 10, 2022 16:48

Pull request has been modified.

@mergify bot merged commit 35d1235 into solana-labs:master Mar 10, 2022
@github-actions (bot) commented:

This PR has been automatically locked since there has not been any activity in the past 14 days after it was merged.

@github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022