leader-qos part 1: move buffer from banking_stage to its own module #23508
Conversation
```rust
    &working_bank,
    &bank_creation_time,
    recorder,
    packet_batch,
```
Can we pass the `deserialize_packet_batch.unprocessed_packets` here and rework `process_packets_transactions()` to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?
Similar to above: `process_packets_transactions` is going to be replaced by `process_transactions(&transactions...)`, to take advantage of the packets having already been deserialized upon receipt.
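A minimal sketch of that reshaping (the `Transaction`, `DeserializedPacket`, and `process_transactions` shapes below are stand-ins for the real banking_stage types, not the actual API):

```rust
// Stand-ins for the real solana-sdk / banking_stage types.
struct Transaction;

// A packet paired with the transaction recovered from it at receive time.
struct DeserializedPacket {
    transaction: Transaction,
}

// Reworked path: consumes transactions that were deserialized upon
// receiving, so a retried packet pays no second deserialization cost.
fn process_transactions(transactions: &[Transaction]) {
    for _tx in transactions {
        // execute, record, and commit each transaction here
    }
}

fn main() {
    let buffered = vec![DeserializedPacket {
        transaction: Transaction,
    }];
    // Hand over what was already deserialized instead of re-parsing packets.
    let txs: Vec<Transaction> = buffered.into_iter().map(|p| p.transaction).collect();
    process_transactions(&txs);
}
```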
```rust
let should_retain = if let Some(bank) = &end_of_slot.working_bank {
    let new_unprocessed_indexes = Self::filter_unprocessed_packets_at_end_of_slot(
        bank,
        packet_batch,
```
Can we pass the `deserialize_packet_batch.unprocessed_packets` here and rework `filter_unprocessed_packets_at_end_of_slot()` to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?
I think this also has the added benefit that we no longer need to clone the `transaction_indexes` above; we just need the length below to compute the `end_of_slot_filtered_invalid_count`.
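A small sketch of that bookkeeping, assuming the filter returns the retained indexes (all names here are illustrative, not the real banking_stage code):

```rust
// Only the pre-filter length is needed; the dropped count is the
// difference between it and however many indexes survive the filter.
fn end_of_slot_filtered_invalid_count(
    original_unprocessed_len: usize,
    retained: &[usize],
) -> usize {
    original_unprocessed_len.saturating_sub(retained.len())
}

fn main() {
    let transaction_indexes = vec![0usize, 1, 2, 3, 4];
    let original_len = transaction_indexes.len(); // no clone needed
    let new_unprocessed_indexes = vec![0usize, 3]; // survivors of the filter
    assert_eq!(
        end_of_slot_filtered_invalid_count(original_len, &new_unprocessed_indexes),
        3
    );
}
```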
`filter_unprocessed_packets_at_end_of_slot` is going to change in the next commits; it'd eventually become this: https://github.com/solana-labs/solana/pull/23257/files#diff-bb8464c2f02c8863fb60e6e29f023f3e6da93a93fbb9199a70a44c9c1b41c7f3R557-R571

In this PR (which moves `UnprocessedPacketBatches` out to its own module), `consume_buffered_packets` still iterates through the buffer by `packet_batch`. In the next steps, individual packets will be indexed and filtered; there will be no re-deserializing then.
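Roughly, the shapes in play in this PR, as a simplified sketch (the real `DeserializedPacketBatch` carries the actual packet and transaction types and more bookkeeping):

```rust
use std::collections::{HashMap, VecDeque};

struct Packet; // stand-in for the wire-format packet
struct DeserializedPacket; // stand-in: a packet plus its deserialized transaction

// This PR: the buffer still holds whole batches, and
// consume_buffered_packets walks it batch by batch.
struct DeserializedPacketBatch {
    packets: Vec<Packet>,
    // Index of each unprocessed packet within the batch, mapped to its
    // deserialized form; built once at receive time.
    unprocessed_packets: HashMap<usize, DeserializedPacket>,
    forwarded: bool,
}

type UnprocessedPacketBatches = VecDeque<DeserializedPacketBatch>;

fn main() {
    let mut buffer: UnprocessedPacketBatches = VecDeque::new();
    buffer.push_back(DeserializedPacketBatch {
        packets: vec![Packet],
        unprocessed_packets: HashMap::from([(0usize, DeserializedPacket)]),
        forwarded: false,
    });
    // Next steps: index and filter individual packets across batches,
    // so nothing is ever re-deserialized.
    assert_eq!(buffer.len(), 1);
}
```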
```rust
unprocessed_packet_batches.push_back(DeserializedPacketBatch::new(
    packet_batch,
    packet_indexes,
    false,
));
```
This change will now deserialize every `PacketBatch` before adding it to the buffered queue, whereas the current code only deserializes when we're about to process the packet.

How much of a perf hit does this incur if we're ingesting up to the max buffered packets? Before, we were just adding to a vector, so it probably wouldn't block transaction processing.

If the extra perf hit is significant, we could add a max-time-spent-receiving / max-packets-received cap to ensure we don't spin in here for too long:
solana/core/src/banking_stage.rs, lines 1894 to 1915 in caf2b94:

```rust
for packet_batch in packet_batch_iter {
    let packet_indexes = Self::generate_packet_indexes(&packet_batch.packets);
    // Track all the packets incoming from sigverify, both valid and invalid
    slot_metrics_tracker.increment_total_new_valid_packets(packet_indexes.len() as u64);
    slot_metrics_tracker.increment_newly_failed_sigverify_count(
        packet_batch
            .packets
            .len()
            .saturating_sub(packet_indexes.len()) as u64,
    );
    Self::push_unprocessed(
        buffered_packet_batches,
        packet_batch,
        packet_indexes,
        &mut dropped_packet_batches_count,
        &mut dropped_packets_count,
        &mut newly_buffered_packets_count,
        batch_limit,
        banking_stage_stats,
        slot_metrics_tracker,
    )
```
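A minimal sketch of that suggested cap, with assumed budget constants (none of this is the real banking_stage code; the limits would need tuning):

```rust
use std::time::{Duration, Instant};

struct Packet; // stand-in for a sigverified packet

// Assumed budgets, for illustration only.
const MAX_RECEIVE_TIME: Duration = Duration::from_millis(100);
const MAX_RECEIVED_PACKETS: usize = 5_000;

// Bounded receive loop: drain batches until either budget is spent, so
// eager deserialization cannot starve transaction processing.
fn receive_and_buffer(batches: impl Iterator<Item = Vec<Packet>>) -> usize {
    let started = Instant::now();
    let mut received = 0usize;
    for batch in batches {
        received += batch.len();
        // ... generate indexes, deserialize, and buffer the batch here ...
        if started.elapsed() >= MAX_RECEIVE_TIME || received >= MAX_RECEIVED_PACKETS {
            break;
        }
    }
    received
}

fn main() {
    let batches = (0..1_000).map(|_| (0..128).map(|_| Packet).collect());
    let received = receive_and_buffer(batches);
    println!("buffered {received} packets this pass");
}
```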
Could also make the deserialization lazy in `DeserializedPacketBatch`.
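A sketch of that lazy variant using `std::cell::OnceCell` (the `once_cell` crate offers the same pattern on older toolchains); the types and `deserialize` stub are stand-ins:

```rust
use std::cell::OnceCell;

struct Packet; // stand-in for the wire-format packet
struct Transaction; // stand-in for the deserialized transaction

// Deserialization is deferred until the packet is actually processed;
// repeated accesses (e.g. on retry) reuse the cached result.
struct LazyDeserializedPacket {
    packet: Packet,
    transaction: OnceCell<Transaction>,
}

impl LazyDeserializedPacket {
    fn new(packet: Packet) -> Self {
        Self {
            packet,
            transaction: OnceCell::new(),
        }
    }

    // The first call pays the deserialization cost; later calls are free.
    fn transaction(&self) -> &Transaction {
        self.transaction.get_or_init(|| deserialize(&self.packet))
    }
}

fn deserialize(_packet: &Packet) -> Transaction {
    Transaction // real code would bincode-decode the packet payload
}

fn main() {
    let p = LazyDeserializedPacket::new(Packet);
    let _tx = p.transaction(); // deserialized here, exactly once
    let _tx_again = p.transaction(); // cached; no extra cost
}
```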
Good point! I ran perf tests to compare. `banking_bench` on my local box indicates a slight regression in mean TPS; the GCE 1-hour perf test shows a more noticeable regression (mean TPS went from ~20K to ~16K). I'd think `banking-bench` better captures the changes here.

Compared to the current code, this version deserializes every received packet exactly once, whereas currently a packet can be deserialized multiple times if it is retried.

In the next commits, I increased the number of banking threads, which reduces each thread's buffer size (calculated as `TOTAL_BUFFERED_PACKETS / number_of_threads`); that seems to improve perf a lot. So there's a sweet spot for buffer size.

Making it lazy is a good idea; I'll put it into final tuning, perhaps.
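For illustration of that sizing trade-off (the 500_000 total below is an assumed figure, not necessarily the real constant's value):

```rust
// Assumed total, for illustration; the real TOTAL_BUFFERED_PACKETS lives
// in banking_stage.rs.
const TOTAL_BUFFERED_PACKETS: usize = 500_000;

fn per_thread_buffer_size(num_threads: usize) -> usize {
    TOTAL_BUFFERED_PACKETS / num_threads
}

fn main() {
    // More threads means a smaller per-thread buffer to deserialize and scan.
    assert_eq!(per_thread_buffer_size(4), 125_000);
    assert_eq!(per_thread_buffer_size(8), 62_500);
}
```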
> mean tps went from ~20K -> ~16K

Hmm, that's quite a big perf hit. Is that just from this PR, or from the bigger PR here: #23257?

I would prefer not to check in PRs that induce a perf hit and then try to check in fixes in later PRs. Keeping each PR perf-neutral makes them easier to manage, so that if we roll back certain PRs we don't have to remember the interdependencies.

Here, maybe lazy deserialization might help? Or what if we moved the deserialization into a separate stage/part of sigverify that streamed the deserialized transactions to banking stage?
> that's just from this PR or the bigger PR here: #23257

That's from the bigger all-in-one PR. But this is a fair point; I just kicked off a 1-hour test, let's see the results. I'm definitely signed up to the idea.
cool, thanks!
Force-pushed from caf2b94 to c16095f: module - deserialize packets during receiving and buffering

Force-pushed from c16095f to 573f21a
Problem

This is part 1 of breaking up PR #23257; proposal review at #23369.

Summary of Changes

Move `unprocessed_packet_batches` from `BankingStage` to its own module.

Fixes #