
leader-qos part 1: move buffer from banking_stage to its own module #23508

Merged: 1 commit merged into solana-labs:master on Mar 10, 2022

Conversation

@tao-stones (Contributor) commented Mar 7, 2022:

Problem

This is part 1 of breaking up PR #23257; the proposal review is at #23369.

Summary of Changes

  • move unprocessed_packet_batches from BankingStage to its own module
  • deserialize packets during receiving and buffering (sketched below)

Fixes #
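
For orientation, here is a rough sketch of the shape the new module takes, inferred from this conversation; the type and field names below are simplified stand-ins, not the exact solana-core definitions, and keying unprocessed packets by their index in the batch is an assumption of this sketch:

use std::collections::{HashMap, VecDeque};

struct Packet;                              // stand-in for the wire packet type
struct PacketBatch { packets: Vec<Packet> } // stand-in for a sigverify output batch
struct DeserializedTx;                      // stand-in for a deserialized transaction

struct DeserializedPacketBatch {
    packet_batch: PacketBatch,
    forwarded: bool,
    // transactions deserialized once at buffering time, keyed by packet index
    unprocessed_packets: HashMap<usize, DeserializedTx>,
}

impl DeserializedPacketBatch {
    fn new(packet_batch: PacketBatch, packet_indexes: Vec<usize>, forwarded: bool) -> Self {
        let unprocessed_packets = packet_indexes
            .into_iter()
            .filter_map(|i| deserialize_packet(&packet_batch.packets[i]).map(|tx| (i, tx)))
            .collect();
        Self { packet_batch, forwarded, unprocessed_packets }
    }
}

// hypothetical helper; the real code would do bincode deserialization plus sanitization
fn deserialize_packet(_packet: &Packet) -> Option<DeserializedTx> {
    Some(DeserializedTx)
}

// the buffer itself is a double-ended queue of deserialized batches
type UnprocessedPacketBatches = VecDeque<DeserializedPacketBatch>;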

@tao-stones mentioned this pull request Mar 7, 2022
&working_bank,
&bank_creation_time,
recorder,
packet_batch,
@carllin (Contributor) commented Mar 8, 2022:

Can we pass the deserialize_packet_batch.unprocessed_packets here and rework process_packets_transactions() to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?

@tao-stones (Author) replied:

Similar to above, process_packets_transactions is going to be replaced by process_transactions(&transactions...), to take advantage of the packets having already been deserialized upon receiving.

let should_retain = if let Some(bank) = &end_of_slot.working_bank {
    let new_unprocessed_indexes = Self::filter_unprocessed_packets_at_end_of_slot(
        bank,
        packet_batch,
Contributor commented:

Can we pass the deserialize_packet_batch.unprocessed_packets here and rework filter_unprocessed_packets_at_end_of_slot() to use those deserialized packets instead, so we don't have to incur the cost of deserializing these packets again?

Contributor commented:

I think this also has the added benefit that we no longer need to clone the transaction_indexes above; we just need the length below to compute the end_of_slot_filtered_invalid_count.

@tao-stones (Author) replied:

filter_unprocessed_packets_at_end_of_slot is going to change in the next commits; it'd eventually become this: https://github.com/solana-labs/solana/pull/23257/files#diff-bb8464c2f02c8863fb60e6e29f023f3e6da93a93fbb9199a70a44c9c1b41c7f3R557-R571

In this PR (which moves UnprocessedPacketBatches out to its own module), consume_buffered_packets still iterates through the buffer by packet_batch. In the next steps, individual packets will be indexed and filtered, and there will be no re-deserializing then.
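
For illustration, a minimal sketch of that end state (names are stand-ins, not the actual banking_stage signatures): the consume path reads the already-deserialized transactions out of the buffer and hands them to processing, so nothing is deserialized twice:

use std::collections::{HashMap, VecDeque};

struct DeserializedTx; // stand-in for the deserialized transaction type

struct DeserializedPacketBatch {
    // transactions deserialized once at receive time, keyed by packet index
    unprocessed_packets: HashMap<usize, DeserializedTx>,
}

// hypothetical consume loop over the buffered, already-deserialized batches
fn consume_buffered_packets(buffer: &VecDeque<DeserializedPacketBatch>) {
    for batch in buffer {
        let transactions: Vec<&DeserializedTx> = batch.unprocessed_packets.values().collect();
        process_transactions(&transactions);
    }
}

// stand-in for the real transaction-processing path
fn process_transactions(_transactions: &[&DeserializedTx]) {}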

core/src/banking_stage.rs — two additional review threads (outdated, resolved)
Comment on lines +1977 to +1988
unprocessed_packet_batches.push_back(DeserializedPacketBatch::new(
    packet_batch,
    packet_indexes,
    false,
));
@carllin (Contributor) commented Mar 8, 2022:

This change will now deserialize every PacketBatch before adding it to the buffered queue, whereas the current code only deserializes when we're about to process the packet.

How much of a perf hit does this incur if we're ingesting up to the max buffered packets? Before, we were just adding to a vector, so it probably wouldn't block transaction processing.

If the extra perf hit is significant, we could add a cap on the max time spent receiving / max packets received, to ensure we don't spin in here for too long:

for packet_batch in packet_batch_iter {
    let packet_indexes = Self::generate_packet_indexes(&packet_batch.packets);
    // Track all the packets incoming from sigverify, both valid and invalid
    slot_metrics_tracker.increment_total_new_valid_packets(packet_indexes.len() as u64);
    slot_metrics_tracker.increment_newly_failed_sigverify_count(
        packet_batch
            .packets
            .len()
            .saturating_sub(packet_indexes.len()) as u64,
    );
    Self::push_unprocessed(
        buffered_packet_batches,
        packet_batch,
        packet_indexes,
        &mut dropped_packet_batches_count,
        &mut dropped_packets_count,
        &mut newly_buffered_packets_count,
        batch_limit,
        banking_stage_stats,
        slot_metrics_tracker,
    )
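
For illustration, a hedged sketch of such a cap; the constants, types, and function name here are made up for the example, not the actual banking_stage code:

use std::time::{Duration, Instant};

struct PacketBatch { packets: Vec<u8> } // stand-in; only the packet count matters here

fn receive_until_budget_exhausted(
    batches: impl Iterator<Item = PacketBatch>,
    max_packets: usize,
    max_time: Duration,
) -> Vec<PacketBatch> {
    let start = Instant::now();
    let mut received = 0usize;
    let mut buffered = Vec::new();
    for batch in batches {
        received += batch.packets.len();
        buffered.push(batch); // the real code would deserialize + push_unprocessed here
        // stop early so buffering can't starve transaction processing
        if received >= max_packets || start.elapsed() >= max_time {
            break;
        }
    }
    buffered
}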

Could also make the deserialization lazy in DeserializedPacketBatch
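
One way to read the lazy idea, as a minimal sketch with simplified stand-in types (the real code would hold solana packet/transaction types and use bincode plus sanitization): keep the raw bytes when buffering and deserialize on first access, memoizing the result:

struct ParsedTransaction; // stand-in for the deserialized transaction type

struct LazyDeserializedPacket {
    raw: Vec<u8>,                      // serialized payload kept at buffering time
    parsed: Option<ParsedTransaction>, // filled in on first access
}

impl LazyDeserializedPacket {
    fn new(raw: Vec<u8>) -> Self {
        Self { raw, parsed: None }
    }

    // Deserialize on first use and cache the result: buffering stays a cheap
    // append, and the parse cost is only paid for packets that get processed.
    fn transaction(&mut self) -> Option<&ParsedTransaction> {
        if self.parsed.is_none() {
            self.parsed = parse(&self.raw);
        }
        self.parsed.as_ref()
    }
}

// hypothetical parse helper
fn parse(_raw: &[u8]) -> Option<ParsedTransaction> {
    Some(ParsedTransaction)
}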

@tao-stones (Author) replied:

Good point! I did a perf test to compare. banking_bench on my local box indicates a slight regression in mean TPS, while the GCE 1-hour perf test shows a more noticeable regression (mean TPS went from ~20K -> ~16K). I would think banking-bench better captures the changes here.

Compared to the current code, this deserializes every received packet once, whereas in the current version a packet could be deserialized multiple times if it is to be retried.

In the next commits, I increased the number of banking threads, which reduced each thread's buffer size (calculated as TOTAL_BUFFERED_PACKETS / number_of_threads); that seems to improve perf a lot. So there's a sweet spot for buffer size.

Making it lazy is a good idea. I'll put it into final tuning perhaps.
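
For concreteness, the per-thread sizing mentioned above is just an even split of the total buffer budget; the constant value below is illustrative, not necessarily the one in banking_stage:

// illustrative total budget; the real constant lives in banking_stage
const TOTAL_BUFFERED_PACKETS: usize = 500_000;

// each banking thread gets an equal share, so adding threads shrinks the per-thread buffer
fn per_thread_buffer_size(num_banking_threads: usize) -> usize {
    TOTAL_BUFFERED_PACKETS / num_banking_threads.max(1)
}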

@carllin (Contributor) commented Mar 8, 2022:

mean tps went from ~20K -> ~16K

Hmmm, that's quite a big perf hit. Is that just from this PR, or from the bigger PR here: #23257?

I would prefer not to check in PRs that induce a perf hit and then try to check in fixes in later PRs. Keeping each PR perf-neutral makes management easier, so that if we roll back certain PRs, we don't have to remember the interdependencies.

Here, maybe lazy deserialization would help? Or what if we moved the deserialization into a separate stage / part of sigverify that streamed the deserialized transactions to banking stage?
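
A rough sketch of the second idea, deserializing upstream and streaming results to banking stage (the channel wiring and type names are stand-ins, not the actual sigverify/banking plumbing):

use std::sync::mpsc::{Receiver, Sender};
use std::thread;

struct PacketBatch;       // stand-in for a sigverify output batch
struct DeserializedBatch; // stand-in for a batch of parsed transactions

// hypothetical deserialization step that would live next to sigverify
fn deserialize_batch(_batch: PacketBatch) -> DeserializedBatch {
    DeserializedBatch
}

fn spawn_deserializer(
    verified_rx: Receiver<PacketBatch>,
    banking_tx: Sender<DeserializedBatch>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // deserialize off the banking thread and stream the results, so banking
        // stage only ever buffers already-parsed transactions
        for batch in verified_rx {
            if banking_tx.send(deserialize_batch(batch)).is_err() {
                break; // banking stage hung up
            }
        }
    })
}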

@tao-stones (Author) replied:

that's just from this PR or the bigger PR here: #23257

That's from the bigger all-in-one PR.

But this is a fair point. I just kicked off a 1-hour test; let's see the results. Definitely on board with the idea.

Contributor commented:

cool, thanks!

@tao-stones (Author) replied:

[screenshot: perf test results]

Looks in line with master.

carllin previously approved these changes Mar 9, 2022
- move buffer from banking_stage to its own module
- deserialize packets during receiving and buffering
@tao-stones added the automerge (Merge this Pull Request automatically once CI passes) label Mar 10, 2022
@mergify bot dismissed carllin’s stale review March 10, 2022 16:48

Pull request has been modified.

@mergify bot merged commit 35d1235 into solana-labs:master Mar 10, 2022
@github-actions (bot) commented:

This PR has been automatically locked since there has not been any activity in the past 14 days after it was merged.

@github-actions bot locked as resolved and limited conversation to collaborators Mar 30, 2022