
Consider larger lookback window in BankingStage for producing entries #22096

Closed
buffalu opened this issue Dec 23, 2021 · 7 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@buffalu
Contributor

buffalu commented Dec 23, 2021

Note: I haven't gathered data on the below yet, but from my understanding of how the validator works, I believe there could be a better way to handle batching entries.

Problem

It seems that under high load and lots of account contention, you can end up with many small entries in a block, including chains of small entries that all access the same accounts (i.e. entry n = only mango, entry n+1 = only mango, ...).

More, smaller entries mean less parallelization and a longer time to execute a block. The longer it takes a block to execute, the more replay_stage blows up attempting to choose a fork.

For instance, take a look at the below example which illustrates this problem:
[Screenshot: Screen Shot 2021-12-23 at 12 54 09 PM]

Proposed Solution

Instead of examining packets in batches of 128, massively extend the lookback to a much larger window (potentially the entire buffered queue) when packing entries. This would prevent high load from impacting unrelated transactions (NFT minting, transfers, etc.) and make the system pack larger entries, which means more parallelization, less replay_stage blowup, and happier users!

What this would look like is:
[Screenshot: Screen Shot 2021-12-23 at 12 51 32 PM]

In this particular example there's one less entry, but the lookback is much longer, so transactions that don't contend with A & B can at least be processed, instead of draining the entire buffered packet queue and ending up with lots of smaller entries.
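
To make the idea concrete, here's a minimal, self-contained sketch of the packing rule (not the actual BankingStage code; `Tx`, `pack_entry`, and the write-lock-only conflict rule are simplified assumptions):

```rust
use std::collections::HashSet;

// Hypothetical, simplified transaction: just the accounts it would write-lock.
#[derive(Clone)]
struct Tx {
    writable_accounts: Vec<String>,
}

/// Greedily pack one entry by scanning up to `lookback` transactions from the
/// front of the buffer, taking every tx whose write locks don't conflict with
/// anything already taken. Everything else stays buffered for a later entry.
fn pack_entry(buffer: &mut Vec<Tx>, lookback: usize) -> Vec<Tx> {
    let mut locked: HashSet<String> = HashSet::new();
    let mut entry = Vec::new();
    let mut remaining = Vec::new();

    for (i, tx) in buffer.drain(..).enumerate() {
        let in_window = i < lookback;
        let conflicts = tx.writable_accounts.iter().any(|a| locked.contains(a));
        if in_window && !conflicts {
            locked.extend(tx.writable_accounts.iter().cloned());
            entry.push(tx);
        } else {
            remaining.push(tx);
        }
    }
    *buffer = remaining;
    entry
}

fn main() {
    // A toy buffer where 3 out of 4 transactions all write-lock the same
    // "mango" account, interleaved with transfers touching unique wallets.
    let buffer: Vec<Tx> = (0..512)
        .map(|i| Tx {
            writable_accounts: if i % 4 == 0 {
                vec![format!("wallet-{i}")]
            } else {
                vec!["mango".to_string()]
            },
        })
        .collect();

    // A 128-packet window packs 33 txs into the first entry (1 mango + 32
    // transfers); scanning the whole buffer packs 129 (1 mango + 128 transfers).
    for lookback in [128, usize::MAX] {
        let mut buf = buffer.clone();
        let entry = pack_entry(&mut buf, lookback);
        println!("lookback {}: first entry has {} txs", lookback, entry.len());
    }
}
```
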

Challenges:

  • Need to shrink/drop entries from a vector of packets. This may be cheap if using a linked-list-type data structure with dynamic allocation.
  • Entries may now be much larger. Need to think more about the impact of this, but the main selling point of Solana is mass parallelization, so we'll want to address those challenges eventually.
@jstarry
Member

jstarry commented Dec 24, 2021

Each batch of packets in banking stage can contain up to 128 transactions, and we greedily select as many non-conflicting transactions as possible from each batch. That should in theory lead to a decent-sized entry, but I do think a fair number of transactions get locked out by other banking threads. I think it would be helpful to get some data first on how banking stage is currently packing entries to see what low-hanging fruit there is here.
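
For gathering that data, a rough sketch of the measurement could look like the following; it assumes the per-entry transaction counts have already been exported to a CSV (the file name and column layout are hypothetical) and skips tick entries so they don't drag the averages down:

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical export: one `slot,entry_index,tx_count` row per entry.
    let data = fs::read_to_string("entries.csv")?;

    // Collect transaction counts, skipping tick entries (0 transactions).
    let counts: Vec<u64> = data
        .lines()
        .skip(1) // header row
        .filter_map(|line| line.split(',').nth(2)?.trim().parse::<u64>().ok())
        .filter(|&c| c > 0)
        .collect();

    if counts.is_empty() {
        println!("no non-tick entries found");
        return Ok(());
    }

    let total: u64 = counts.iter().sum();
    let mean = total as f64 / counts.len() as f64;
    let mut sorted = counts.clone();
    sorted.sort_unstable();
    let median = sorted[sorted.len() / 2];

    println!(
        "{} non-tick entries, {:.2} txs/entry on average, median {}",
        counts.len(),
        mean,
        median
    );
    Ok(())
}
```
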

@buffalu
Contributor Author

buffalu commented Jan 16, 2022

some more data from snapshot 115343448
[Screenshots of the entry data]

the averages exclude entries that don't have any transactions, so we're not getting skewed by ticks that have 0 transactions.

i had a feeling the non-voting txs weren't being batched in a perfect manner in banking stage, but didn't think it was this bad!
link to data: https://drive.google.com/file/d/1BHJeS7GFI8drkIWebk3AYfGyN3htmhU4/view?usp=sharing

i think it might be combo of:

  • having a suboptimal lookback window (only trying to build an entry from the 128 packets in your batch).
  • spam from same-signature txs (handled in packet_indexes) + bad signatures (marked as discard) can fill up batches, further limiting parallelization.
  • lots of write contention on accounts

@nikhayes

nikhayes commented Mar 11, 2022

> Each batch of packets in banking stage can contain up to 128 transactions, and we greedily select as many non-conflicting transactions as possible from each batch. That should in theory lead to a decent-sized entry, but I do think a fair number of transactions get locked out by other banking threads. I think it would be helpful to get some data first on how banking stage is currently packing entries to see what low-hanging fruit there is here.

Hey Starry, my understanding is that during a Raydium IDO or NFT drop, some 98% of transactions (just taking a guess) seem to originate from these sources (even after dedupe), and with greedy packing, batches will be nearly completely full of non-parallelizable transactions. It doesn't seem like maxing the number of banking threads will help much given the way batches are packed, and I don't think adding fee-prioritization will resolve the issue of batches containing lots of non-parallelizable transactions, since I imagine a lot of jpeg-hungry people will submit all their transactions with priority at the same time.

It would be great to see data for this as well, but even just based on the greedy-batch packing design I think one can predict the outcome of these situations. I believe a well-behaved, QUIC-filtered 10000 NFT drop will result in similar issues due to the same greedy packing, with non-parallelizable transactions overpacked into batches. I worry that QUIC will band-aid the situation for a little while, but this may just hide the underlying issue with the batch design, which might delay efforts to improve it.
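
To put very rough numbers on that intuition (illustrative arithmetic only, not measured data): if a fraction `p` of the packets in a greedily packed 128-packet batch all write-lock the same account, the first entry gets roughly one of the contended transactions plus the unrelated ones, and that falls off quickly as `p` grows:

```rust
// Back-of-the-envelope check of the "98% contended" scenario: expected entry
// size from one 128-packet batch is roughly 1 + 128 * (1 - p). Numbers are
// illustrative, not measured.
fn main() {
    let batch_size = 128.0_f64;
    for p in [0.50, 0.90, 0.98] {
        let expected_entry = 1.0 + batch_size * (1.0 - p);
        println!(
            "contended fraction {:.2} -> ~{:.1} txs per entry from a {}-packet batch",
            p,
            expected_entry,
            batch_size as u64
        );
    }
}
```
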

Btw, @buffalu, is your script to track entry data available? Maybe someone could compare congestion vs. non-congestion situations. If the hypothesis is that it's still duplicates that are causing these small entries, maybe experiments could be run on mainnet with Candy Machine with more tame, controlled amounts of spam in the meantime?

@carllin @taozhu-chicago @ryoqun @aeyakovenko

@tao-stones
Contributor

  • w.r.t. the original problem @buffalu described, the leader QoS PR is being actively reviewed right now, described here. It basically does a similar thing to what @buffalu suggested: it scans through the entire buffer, packet by packet, to select higher-paid transactions to be included in an Entry (a rough sketch of that idea follows below). Though the entry size still stays the same at 128 atm, it can be changed later when we have more data available.
  • @carllin has another proposal that introduces a central scheduler outside of banking_stage. That could be our round 2 of improvements.
  • in the worst-case scenario @nikhayes mentioned above, where the banking_stage's buffer is full of non-parallelizable transactions, I would think fee-prioritization would be a working mechanism; users just need to bid their fees higher.
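
A rough sketch of that "scan the whole buffer, highest fee first" idea, with made-up types rather than the actual banking_stage structures:

```rust
use std::collections::HashSet;

// Hypothetical buffered transaction: its priority fee and write-locked accounts.
struct BufferedTx {
    priority_fee: u64,
    writable_accounts: Vec<String>,
}

/// Consider the whole buffer in descending fee order and greedily take
/// non-conflicting transactions until the entry is full; everything else
/// (lower-paying or conflicting) stays buffered.
fn select_for_entry(buffer: &mut Vec<BufferedTx>, max_entry_size: usize) -> Vec<BufferedTx> {
    buffer.sort_by(|a, b| b.priority_fee.cmp(&a.priority_fee));

    let mut locked = HashSet::new();
    let mut entry = Vec::new();
    let mut remaining = Vec::new();

    for tx in buffer.drain(..) {
        let conflicts = tx.writable_accounts.iter().any(|a| locked.contains(a));
        if entry.len() < max_entry_size && !conflicts {
            locked.extend(tx.writable_accounts.iter().cloned());
            entry.push(tx);
        } else {
            remaining.push(tx);
        }
    }
    *buffer = remaining;
    entry
}

fn main() {
    let mut buffer = vec![
        BufferedTx { priority_fee: 50_000, writable_accounts: vec!["amm-pool".into()] },
        BufferedTx { priority_fee: 10_000, writable_accounts: vec!["amm-pool".into()] },
        BufferedTx { priority_fee: 5_000, writable_accounts: vec!["wallet-a".into()] },
    ];
    let entry = select_for_entry(&mut buffer, 128);
    // Takes the 50k-fee amm tx and the 5k transfer; the 10k amm tx stays buffered.
    println!("selected {} txs, {} still buffered", entry.len(), buffer.len());
}
```
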

@nikhayes

nikhayes commented Mar 17, 2022

For bullet point 1, that seems helpful, but wouldn't the small transactions in an entry get slowed down by the presence of large transactions that still need to execute? I feel like an "easy" speed-up before a more complex transaction scheduler is separating the transaction queue into pools of different transaction sizes (based on compute, instruction count, etc.). Threads would be dedicated to certain pool sizes and wouldn't need to always search through a queue that might contain many, many Raydium transactions, and an entry would include a lot of smaller transactions, so those chunks of transactions could probably execute much more quickly.

For example, for now you could have a thread/batch that processes transactions of 2 instructions or less (Pyth, transfers, etc.), one for 3 to 5 (Serum, etc.), one for 6 to 10, and one for 11+. I forget if chunks are combined across threads to generate an entry or if threads create their own entry; if it's the latter, it might be easier to implement this. Even if they are timed together, and the small-transaction thread is rate-limited by the big-transaction thread/entry, you would still get better transaction distribution and more parallelization (since it doesn't seem that other transactions typically conflict with Raydium transactions?).
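
A rough sketch of that size-pool routing idea (thresholds and names are illustrative, not a real design):

```rust
use std::collections::HashMap;

// Hypothetical pools keyed by transaction size (instruction count).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SizePool {
    Tiny,   // 1-2 instructions: transfers, oracle updates
    Small,  // 3-5 instructions: simple DEX interactions
    Medium, // 6-10 instructions
    Large,  // 11+ instructions
}

fn classify(instruction_count: usize) -> SizePool {
    match instruction_count {
        0..=2 => SizePool::Tiny,
        3..=5 => SizePool::Small,
        6..=10 => SizePool::Medium,
        _ => SizePool::Large,
    }
}

fn main() {
    // Pretend these are the instruction counts of buffered transactions.
    let incoming: [usize; 8] = [1, 2, 4, 4, 7, 12, 1, 3];

    // Each pool would be drained by its own banking thread in this scheme.
    let mut pools: HashMap<SizePool, Vec<usize>> = HashMap::new();
    for ix_count in incoming {
        pools.entry(classify(ix_count)).or_default().push(ix_count);
    }

    for (pool, txs) in &pools {
        println!("{:?}: {} txs", pool, txs.len());
    }
}
```
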

@buffalu
Contributor Author

buffalu commented Apr 11, 2022

#23066 partially fixes this, but doesn't address the entire issue

@buffalu
Contributor Author

buffalu commented May 8, 2022

looks like mainnet could probably still use this. this is printed from rebatched_txs in execute_batches, after the rebatching that tries to maximize parallelism occurs.

looks like #24187 from @carllin should somewhat help w/ this when it's live

[Screenshot: Screen Shot 2022-05-08 at 10 16 39 AM]

github-actions bot added the stale label on May 9, 2023
github-actions bot closed this as not planned on May 17, 2023