
Consider larger lookback window in BankingStage for producing entries #22096

Closed
buffalu opened this issue Dec 23, 2021 · 7 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@buffalu
Contributor

buffalu commented Dec 23, 2021

Note: I haven't gathered data on the below yet, but from my understanding of how the validator works, I believe there could be a better way to handle batching entries.

Problem

It seems that under high load and lots of account contention, you can end up with many small entries in a block, including chains of small entries that all access the same accounts (i.e. entry n = only mango, entry n+1 = only mango, ...).

More, smaller entries mean less parallelization and a longer time to execute a block. The longer it takes a block to execute, the more replay_stage blows up attempting to choose a fork.

For instance, take a look at the below example which illustrates this problem:
[Screenshot: Screen Shot 2021-12-23 at 12 54 09 PM]

Proposed Solution

Instead of examining packets in batches of 128, massively extend the lookback to a much larger window (potentially the entire buffered queue) when packing entries. This would prevent high load from impacting unrelated transactions (NFT minting, transfers, etc.) and make the system pack larger entries, which means more parallelization, less replay_stage blowup, and happier users!

What this would look like is:
[Screenshot: Screen Shot 2021-12-23 at 12 51 32 PM]

In this particular example there's one less entry, but the lookback is much longer, so transactions that don't contend with A & B can at least be processed, instead of draining the entire buffered packet queue and ending up with lots of smaller entries.
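
To make the idea concrete, here's a minimal, self-contained sketch of the packing rule (not the actual BankingStage code; `Tx`, `pack_entry`, and the write-lock-only conflict rule are simplified assumptions):

```rust
use std::collections::HashSet;

// Hypothetical, simplified transaction: just the accounts it would write-lock.
#[derive(Clone)]
struct Tx {
    writable_accounts: Vec<String>,
}

/// Greedily pack one entry by scanning up to `lookback` transactions from the
/// front of the buffer, taking every tx whose write locks don't conflict with
/// anything already taken. Everything else stays buffered for a later entry.
fn pack_entry(buffer: &mut Vec<Tx>, lookback: usize) -> Vec<Tx> {
    let mut locked: HashSet<String> = HashSet::new();
    let mut entry = Vec::new();
    let mut remaining = Vec::new();

    for (i, tx) in buffer.drain(..).enumerate() {
        let in_window = i < lookback;
        let conflicts = tx.writable_accounts.iter().any(|a| locked.contains(a));
        if in_window && !conflicts {
            locked.extend(tx.writable_accounts.iter().cloned());
            entry.push(tx);
        } else {
            remaining.push(tx);
        }
    }
    *buffer = remaining;
    entry
}

fn main() {
    // A toy buffer where 3 out of 4 transactions all write-lock the same
    // "mango" account, interleaved with transfers touching unique wallets.
    let buffer: Vec<Tx> = (0..512)
        .map(|i| Tx {
            writable_accounts: if i % 4 == 0 {
                vec![format!("wallet-{i}")]
            } else {
                vec!["mango".to_string()]
            },
        })
        .collect();

    // A 128-packet window packs 33 txs into the first entry (1 mango + 32
    // transfers); scanning the whole buffer packs 129 (1 mango + 128 transfers).
    for lookback in [128, usize::MAX] {
        let mut buf = buffer.clone();
        let entry = pack_entry(&mut buf, lookback);
        println!("lookback {}: first entry has {} txs", lookback, entry.len());
    }
}
```
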

Challenges:

  • Need to shrink/drop entries from a vector of packets. This may be cheap if using a linked-list-type data structure with dynamic allocation.
  • Entries may now be much larger. Need to think more about the impact of this, but the main selling point of Solana is mass parallelization, so we'll want to address those challenges eventually.
@jstarry
Member

jstarry commented Dec 24, 2021

Each batch of packets in banking stage can contain up to 128 transactions, and we greedily select as many non-conflicting transactions as possible from each batch. That should in theory lead to a decent-sized entry, but I do think a fair number of transactions get locked out by other banking threads. I think it would be helpful to get some data first on how banking stage is currently packing entries to see what low-hanging fruit there is here.
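
For gathering that data, a rough sketch of the measurement could look like the following; it assumes the per-entry transaction counts have already been exported to a CSV (the file name and column layout are hypothetical) and skips tick entries so they don't drag the averages down:

```rust
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical export: one `slot,entry_index,tx_count` row per entry.
    let data = fs::read_to_string("entries.csv")?;

    // Collect transaction counts, skipping tick entries (0 transactions).
    let counts: Vec<u64> = data
        .lines()
        .skip(1) // header row
        .filter_map(|line| line.split(',').nth(2)?.trim().parse::<u64>().ok())
        .filter(|&c| c > 0)
        .collect();

    if counts.is_empty() {
        println!("no non-tick entries found");
        return Ok(());
    }

    let total: u64 = counts.iter().sum();
    let mean = total as f64 / counts.len() as f64;
    let mut sorted = counts.clone();
    sorted.sort_unstable();
    let median = sorted[sorted.len() / 2];

    println!(
        "{} non-tick entries, {:.2} txs/entry on average, median {}",
        counts.len(),
        mean,
        median
    );
    Ok(())
}
```
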

@buffalu
Contributor Author

buffalu commented Jan 16, 2022

some more data from snapshot 115343448
[Screenshots of the entry data]

the averages exclude entries that don't have any transactions, so we're not getting skewed by ticks that have 0 transactions.

i had a feeling the non-voting txs weren't being batched in a perfect manner in banking stage, but didn't think it was this bad!
link to data: https://drive.google.com/file/d/1BHJeS7GFI8drkIWebk3AYfGyN3htmhU4/view?usp=sharing

i think it might be combo of:

  • having a suboptimal lookback window (only trying to build an entry from the 128 packets in your batch).
  • spam from same-signature txs (handled in packet_indexes) + bad signatures (marked as discard) can fill up batches, further limiting parallelization.
  • lots of write contention on accounts

@nikhayes

nikhayes commented Mar 11, 2022

> Each batch of packets in banking stage can contain up to 128 transactions, and we greedily select as many non-conflicting transactions as possible from each batch. That should in theory lead to a decent-sized entry, but I do think a fair number of transactions get locked out by other banking threads. I think it would be helpful to get some data first on how banking stage is currently packing entries to see what low-hanging fruit there is here.

Hey Starry, my understanding is that during a Raydium IDO or NFT drop, some 98% of transactions (just taking a guess) seem to originate from these sources (even after dedupe), and with greedy packing, batches will be nearly completely full of non-parallelizable transactions. It doesn't seem like maxing the number of banking threads will help much given the way batches are packed, and I don't think adding fee-prioritization will resolve the issue of batches containing lots of non-parallelizable transactions, since I imagine a lot of jpeg-hungry people will submit all their transactions with priority at the same time.

It would be great to see data for this as well, but even just based on the greedy-batch packing design I think one can predict the outcome of these situations. I believe a well-behaved, QUIC-filtered 10000 NFT drop will result in similar issues due to the same greedy packing, with non-parallelizable transactions overpacked into batches. I worry that QUIC will band-aid the situation for a little while, but this may just hide the underlying issue with the batch design, which might delay efforts to improve it.
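
To put very rough numbers on that intuition (illustrative arithmetic only, not measured data): if a fraction `p` of the packets in a greedily packed 128-packet batch all write-lock the same account, the first entry gets roughly one of the contended transactions plus the unrelated ones, and that falls off quickly as `p` grows:

```rust
// Back-of-the-envelope check of the "98% contended" scenario: expected entry
// size from one 128-packet batch is roughly 1 + 128 * (1 - p). Numbers are
// illustrative, not measured.
fn main() {
    let batch_size = 128.0_f64;
    for p in [0.50, 0.90, 0.98] {
        let expected_entry = 1.0 + batch_size * (1.0 - p);
        println!(
            "contended fraction {:.2} -> ~{:.1} txs per entry from a {}-packet batch",
            p,
            expected_entry,
            batch_size as u64
        );
    }
}
```
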

Btw, @buffalu, is your script to track entry data available? Maybe someone could compare congestion vs. non-congestion situations. If the hypothesis is that it's still duplicates that are causing these small entries, maybe experiments could be run on mainnet with Candy Machine with more tame, controlled amounts of spam in the meantime?

@carllin @taozhu-chicago @ryoqun @aeyakovenko

@tao-stones
Contributor

  • w.r.t. the original problem @buffalu described, the leader QoS PR is being actively reviewed right now, described here. It basically does a similar thing to what @buffalu suggested: it scans through the entire buffer, packet by packet, to select higher-paid transactions to be included in an Entry (a rough sketch of that idea follows below). Though the entry size still stays the same at 128 atm, it can be changed later when we have more data available.
  • @carllin has another proposal that introduces a central scheduler outside of banking_stage. That could be our round 2 of improvements.
  • in the worst-case scenario @nikhayes mentioned above, where the banking_stage's buffer is full of non-parallelizable transactions, I would think fee-prioritization would be a working mechanism; users just need to bid their fees higher.
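
A rough sketch of that "scan the whole buffer, highest fee first" idea, with made-up types rather than the actual banking_stage structures:

```rust
use std::collections::HashSet;

// Hypothetical buffered transaction: its priority fee and write-locked accounts.
struct BufferedTx {
    priority_fee: u64,
    writable_accounts: Vec<String>,
}

/// Consider the whole buffer in descending fee order and greedily take
/// non-conflicting transactions until the entry is full; everything else
/// (lower-paying or conflicting) stays buffered.
fn select_for_entry(buffer: &mut Vec<BufferedTx>, max_entry_size: usize) -> Vec<BufferedTx> {
    buffer.sort_by(|a, b| b.priority_fee.cmp(&a.priority_fee));

    let mut locked = HashSet::new();
    let mut entry = Vec::new();
    let mut remaining = Vec::new();

    for tx in buffer.drain(..) {
        let conflicts = tx.writable_accounts.iter().any(|a| locked.contains(a));
        if entry.len() < max_entry_size && !conflicts {
            locked.extend(tx.writable_accounts.iter().cloned());
            entry.push(tx);
        } else {
            remaining.push(tx);
        }
    }
    *buffer = remaining;
    entry
}

fn main() {
    let mut buffer = vec![
        BufferedTx { priority_fee: 50_000, writable_accounts: vec!["amm-pool".into()] },
        BufferedTx { priority_fee: 10_000, writable_accounts: vec!["amm-pool".into()] },
        BufferedTx { priority_fee: 5_000, writable_accounts: vec!["wallet-a".into()] },
    ];
    let entry = select_for_entry(&mut buffer, 128);
    // Takes the 50k-fee amm tx and the 5k transfer; the 10k amm tx stays buffered.
    println!("selected {} txs, {} still buffered", entry.len(), buffer.len());
}
```
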

@nikhayes

nikhayes commented Mar 17, 2022

For bullet point 1, that seems helpful, but wouldn't the small transactions in an entry get slowed down by the presence of large transactions that still need to execute? I feel like an "easy" speed-up before a more complex transaction scheduler is separating the transaction queue into pools of different transaction sizes (based on compute, instruction count, etc.). Threads would be dedicated to certain pool sizes and wouldn't need to always search through a queue that might contain many, many Raydium transactions, and an entry would include a lot of smaller transactions, so those chunks of transactions could probably execute much more quickly.

For example, for now you could have a thread/batch that processes transactions of 2 instructions or less (Pyth, transfers, etc.), one for 3 to 5 (Serum, etc.), one for 6 to 10, and one for 11+. I forget if chunks are combined across threads to generate an entry or if threads create their own entry; if it's the latter, it might be easier to implement this. Even if they are timed together, and the small-transaction thread is rate-limited by the big-transaction thread/entry, you would still get better transaction distribution and more parallelization (since it doesn't seem that other transactions typically conflict with Raydium transactions?).
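
A rough sketch of that size-pool routing idea (thresholds and names are illustrative, not a real design):

```rust
use std::collections::HashMap;

// Hypothetical pools keyed by transaction size (instruction count).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum SizePool {
    Tiny,   // 1-2 instructions: transfers, oracle updates
    Small,  // 3-5 instructions: simple DEX interactions
    Medium, // 6-10 instructions
    Large,  // 11+ instructions
}

fn classify(instruction_count: usize) -> SizePool {
    match instruction_count {
        0..=2 => SizePool::Tiny,
        3..=5 => SizePool::Small,
        6..=10 => SizePool::Medium,
        _ => SizePool::Large,
    }
}

fn main() {
    // Pretend these are the instruction counts of buffered transactions.
    let incoming: [usize; 8] = [1, 2, 4, 4, 7, 12, 1, 3];

    // Each pool would be drained by its own banking thread in this scheme.
    let mut pools: HashMap<SizePool, Vec<usize>> = HashMap::new();
    for ix_count in incoming {
        pools.entry(classify(ix_count)).or_default().push(ix_count);
    }

    for (pool, txs) in &pools {
        println!("{:?}: {} txs", pool, txs.len());
    }
}
```
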

@buffalu
Contributor Author

buffalu commented Apr 11, 2022

#23066 partially fixes this, but doesn't address the entire issue

@buffalu
Contributor Author

buffalu commented May 8, 2022

looks like mainnet could probably still use this. this is printed from rebatched_txs in execute_batches, after the rebatching that tries to maximize parallelism occurs.

looks like #24187 from @carllin should somewhat help w/ this when it's live

[Screenshot: Screen Shot 2022-05-08 at 10 16 39 AM]

github-actions bot added the stale label on May 9, 2023
github-actions bot closed this as not planned on May 17, 2023