Use fee and sender_stake prioritization to add new packet_batch to ba… #23841
Conversation
Force-pushed from 12b0b3c to cdb9cd8.
Ran a 1-hour, 5-node perf test: this PR reported 16K TPS, while the tip of master is 15K-17K, so it is basically in line with master. A side note on performance: since each banking thread is doing more for prioritization, I will be increasing the banking_stage thread count from 4 to 6. The test I conducted before indicates adding 2 more threads yields a good performance benefit (it also reduces each thread's buffer size).
Codecov Report
@@ Coverage Diff @@
## master #23841 +/- ##
=========================================
- Coverage 81.8% 81.7% -0.1%
=========================================
Files 581 583 +2
Lines 158312 159086 +774
=========================================
+ Hits 129518 130119 +601
- Misses 28794 28967 +173
if self.len() >= batch_limit {
    let (_, dropped_batches_count, dropped_packets_count) =
        self.swap_packet_by_priority(deserialized_packet_batch);
When there are a lot of incoming packets during spikes, I see this pathway being hit repeatedly in the metrics, meaning it can't afford to be expensive to run.
This is currently a pretty expensive operation: figuring out the fees for every transaction in the queue, sorting them, and dropping the unnecessary ones.
Maybe it would be better to change the queue to support ordering by fee-per-cu on insertion, maybe a min-max-heap: https://docs.rs/min-max-heap/latest/min_max_heap/?
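To make the suggestion concrete, here is a minimal sketch (not from this PR) of a buffer backed by the min-max-heap crate, keyed by fee-per-cu with sender stake as a tie breaker, so the lowest-priority entry can be evicted in O(log n) when the buffer is full. `PrioritizedPacket` and `PacketPriorityQueue` are hypothetical illustration types, not Solana's.

```rust
use min_max_heap::MinMaxHeap;

// Hypothetical priority key: the derived Ord compares fee_per_cu first, then sender_stake.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
struct PrioritizedPacket {
    fee_per_cu: u64,     // primary sort key
    sender_stake: u64,   // tie breaker when fee-per-cu is equal
    packet_index: usize, // position of the packet in its owning batch
}

struct PacketPriorityQueue {
    heap: MinMaxHeap<PrioritizedPacket>,
    capacity: usize,
}

impl PacketPriorityQueue {
    fn new(capacity: usize) -> Self {
        Self { heap: MinMaxHeap::new(), capacity }
    }

    /// Insert a packet, returning whichever packet had to be dropped (if any).
    fn insert(&mut self, packet: PrioritizedPacket) -> Option<PrioritizedPacket> {
        if self.heap.len() < self.capacity {
            self.heap.push(packet);
            return None;
        }
        // Buffer is full: only replace the current minimum if the new packet outranks it.
        let outranks_min = self.heap.peek_min().map_or(false, |min| packet > *min);
        if outranks_min {
            let dropped = self.heap.pop_min();
            self.heap.push(packet);
            dropped
        } else {
            Some(packet) // the incoming packet is the lowest priority; drop it instead
        }
    }
}
```

On the consuming side, pop_max would hand back the highest-priority packet to process next.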
This is a good point; I paid attention to it in the bench and perf tests. The tests didn't show noticeable degradation, but the concern is valid.
If the buffered entry stays a packet_batch (which I'd like to avoid changing to keep the change scope small), I'm not sure how a BTreeMap could help here.
Another constraint is that fee-per-cu is bank dependent; it is not simply derived from the transaction itself (at least at the moment), rather it depends on the current bank's fee_structure and lamports_per_signature.
Among the overhead added in order to prioritize by fee, I think computing the fee per transaction is perhaps the most expensive. Sorting by fee (using a BTreeMap) might be second; dropping should cost very little (dropping a packet is just removing its index from the unprocessed vec, and dropping a batch is a HashMap swap op).
As mentioned somewhere earlier, I intend to increase banking_stage non-vote threads from 2 to 4 to compensate for the added overhead, which also automatically reduces each thread's buffer size (= TOTAL_BUFFERED_PACKETS / number_of_threads); that'd help reduce sorting cost a bit. (I tested this before; the benching showed significant improvement with 6 threads, more tests should be done of course.)
Having said that, I think caching the computed fee-per-cu for transactions for up to X slots would help. Ideally X is 1, so it computes fee-per-cu for each transaction once per bank, which would still save a lot during spikes. Or we can increase X if we don't need to be super precise, or if performance proves to be too bad during spikes. I'll start working on this.
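As an illustration of that caching idea (the names and structure here are assumptions, not the actual commit's code): since fee-per-cu depends on the bank's fee_structure and lamports_per_signature, the computed value could be cached per message hash and invalidated whenever the working slot changes, which is the X = 1 case.

```rust
use std::collections::HashMap;

type MessageHash = [u8; 32]; // stand-in for a transaction's message hash

// Hypothetical "X = 1" fee-per-cu cache: values are only reused within one slot.
#[derive(Default)]
struct FeePerCuCache {
    slot: u64,                            // slot the cached values were computed against
    fee_per_cu: HashMap<MessageHash, u64>,
}

impl FeePerCuCache {
    /// Return the cached fee-per-cu, or compute it (bank-dependently) and cache it.
    fn get_or_compute(
        &mut self,
        current_slot: u64,
        message_hash: MessageHash,
        compute: impl FnOnce() -> u64, // the fee_structure / lamports_per_signature math
    ) -> u64 {
        // Invalidate everything when the working bank moves to a new slot.
        if current_slot != self.slot {
            self.slot = current_slot;
            self.fee_per_cu.clear();
        }
        *self.fee_per_cu.entry(message_hash).or_insert_with(compute)
    }
}
```

A larger X would just mean keeping entries for a few slots (e.g. keyed by slot as well), trading precision for fewer recomputations during spikes.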
This commit adds a cache for fee-per-cu for up to 1 slot. Going to add some metrics.
> The tests didn't show noticeable degradation, but the concern is valid.

Hmm, I think to test this drop path properly we would have to spam packets such that each thread hits the buffer limit, i.e. somewhere close to 500k TPS, which I don't think any of the tests currently do.

> Another constraint is that fee-per-cu is bank dependent; it is not simply derived from the transaction itself (at least at the moment), rather it depends on the current bank's fee_structure and lamports_per_signature.

I think if we follow your suggestion here: #23841 (comment), it won't matter too much; we just need to pick one consistent bank so that everyone is relatively equal to one another.

> If the buffered entry stays a packet_batch (which I'd like to avoid changing to keep the change scope small), I'm not sure how a BTreeMap could help here.

Yeah, this is a pain; we really want to be able to weigh each packet separately in a min-max-heap. This will also help when we need to find the highest-priority transaction to process, which my 🔮 is telling me is going to be the follow-up to this 😉
With our current grouping of multiple packets into PacketBatch there's also a correctness problem when a high-priority transaction is grouped into the same batch as low-priority transactions. This is exemplified by compromises like this:
solana/core/src/unprocessed_packet_batches.rs, line 400 in d0a0bf6:
warn!("Cannot find eviction candidate from buffer in single iteration. New packet batch is dropped.");

I think we should consider ripping off the bandaid and switching to operating on individual Packet objects here.
> I think we should consider ripping off the bandaid and switching to operating on individual Packet objects here.

Yeah, the benefits of dealing with individual packets are obvious, particularly from the angle we are in right now 😃; just concerned there may be downsides that aren't apparent. Better to do this separately, imo.
Ahh right right, I got the part about the fee-per-cu. I was just thinking in the case where fee-prioritization is a constant: it would put the smaller validators at an inherent disadvantage when the queue exceeds block size. The only way for them to get any transaction inclusion would then be to submit transactions with higher fee-priority than the larger validators. I guess I just wasn't quite sure what the goal of the stake weight prioritization was if Quic happens to work. I think for both Quic and stake-weight priority it'll be something that validators end up tossing aside at some point anyway if MEV or block rewards make them more $ by not following those rules.
Yea, I can see your point. The original idea behind stake-weight is that a validator with x% stake should have x% say on which transactions are included in a block. At what stage it is implemented is more flexible. If Quic works, we can easily toss this out from banking_stage, I think.
Hmm, maybe the other thing that could be possible is to give precedence to a certain sized set of immediately prior leaders (i.e. the last 8 leaders). This would still be somewhat stake-weighted (because higher-staked validators appear more often in the schedule), but would allow lower-staked validators to appear in the ordering (since they'll pop up in the set too). Also, in the case that the prior leaders' transactions may not have all gone through, they still get some lasting prioritization beyond their leader slots (this is looser than direct forwarding, but maybe helps?? dunno, lol).
Good question, but just want to point out that stake weight here plays a small part in prioritization. Transactions in the buffer are prioritized by their fee-per-cu; sender stakes are used to weight transactions that have the same fee-per-cu. Arguably, using sender stake in leader prioritization is optional, just an extra.
Coming back to this: given that the fee priority rate will be quite fine-grained as of now, I think any type of stake weight ordering is a bit pointless. I think you'll see a wide variety of priorities except maybe at the base level, but if I'm a bot I'll never send at that level; I'll add a few micro-lamports and send transactions across a range of priorities, which doesn't perceptibly cost me anything, then some other bot will add a few micro-lamports in response, until you never see priority fees at the exact same level and things will be too dynamic to ever fall at the same exact priority rate. There won't be any beneficial stake weighted ordering that can occur at that point anymore. I think ordering by stake weight priority might just waste valuable time.
Agree. Perhaps worth seeing how it plays out.
Force-pushed from d445908 to dfa5235.
…nking thread buffer
…e fee-per-cu computing; avoid read-lock blockhash_queue when getting lamports-per-signature;
Force-pushed from dfa5235 to b040999.
Force-pushed from 2bce1c6 to b040999.
Opened issue #24069 to discuss/document the design of indexing. Turning this PR to Draft until that issue is resolved.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This stale pull request has been automatically closed. Thank you for your contributions.
Problem
Banking_stage currently uses FIFO to drop buffered packet_batches to make room for newly received batches. This can drop highly prioritized transactions that happen to be in an old batch; it should instead drop the transactions with the lowest priority.
Priority is determined by a transaction's fee-per-cu, and by sender_stake when the fee-per-cu is the same.
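A minimal sketch of that ordering rule (illustrative only, not the PR's code): compare fee-per-cu first and fall back to sender stake on ties.

```rust
use std::cmp::Ordering;

// Higher fee-per-cu wins; sender stake breaks ties.
fn compare_priority(a: (u64, u64), b: (u64, u64)) -> Ordering {
    let (fee_per_cu_a, stake_a) = a;
    let (fee_per_cu_b, stake_b) = b;
    fee_per_cu_a.cmp(&fee_per_cu_b).then(stake_a.cmp(&stake_b))
}
```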
Summary of Changes
- banking_stage::push_unprocessed() now uses UnprocessedPacketBatches::insert_or_swap_batch() instead of FIFO;
- added UnprocessedPacketBatches::prioritize_by_fee_per_cu() to index transactions by their fee/CU, which is calculated with working_bank;
- swap_packet_by_priority implements the logic of dropping lower-prioritized transactions: indexed by prioritize_by_fee_then_stakes, it finds an empty packet_batch to swap with the newly received packet_batch (a rough sketch of this flow follows below).

Fixes #23369
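A rough, self-contained sketch of the insertion flow described in the summary, using stand-in types rather than the actual UnprocessedPacketBatches API (the real implementation indexes buffered packets by fee-per-cu rather than scanning, and operates on DeserializedPacketBatch):

```rust
use std::collections::VecDeque;

// Stand-in for a buffered batch; priority is (fee_per_cu, sender_stake).
#[derive(Debug)]
struct Batch {
    fee_per_cu: u64,   // would be computed per transaction against the working bank
    sender_stake: u64, // tie breaker when fee-per-cu is equal
}

fn priority(b: &Batch) -> (u64, u64) {
    (b.fee_per_cu, b.sender_stake)
}

/// Insert `new_batch`, evicting the lowest-priority buffered batch when full.
/// Returns the dropped batch, if any.
fn insert_or_swap(buffer: &mut VecDeque<Batch>, new_batch: Batch, limit: usize) -> Option<Batch> {
    if buffer.len() < limit {
        buffer.push_back(new_batch); // below the limit: plain append, nothing dropped
        return None;
    }
    // Find the currently lowest-priority entry (a linear scan here, for simplicity).
    let min_idx = (0..buffer.len()).min_by_key(|&i| priority(&buffer[i]))?;
    if priority(&new_batch) > priority(&buffer[min_idx]) {
        // Swap the low-priority batch out and put the new one in its place.
        Some(std::mem::replace(&mut buffer[min_idx], new_batch))
    } else {
        Some(new_batch) // the new batch itself is the lowest priority; drop it
    }
}

fn main() {
    let mut buffer: VecDeque<Batch> = VecDeque::new();
    buffer.push_back(Batch { fee_per_cu: 5, sender_stake: 10 });
    buffer.push_back(Batch { fee_per_cu: 1, sender_stake: 0 });
    let dropped = insert_or_swap(&mut buffer, Batch { fee_per_cu: 3, sender_stake: 7 }, 2);
    println!("dropped: {:?}", dropped); // the fee_per_cu = 1 batch is evicted
}
```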