Proposal for transaction scheduler based on fee priority #23438
Conversation
Additional fees were introduced to transactions as a method to allow users to bid for priority for their transactions in the leader's queue.

Let the additional fee for a transaction `T` be defined as `F(T)`.
F(T) should be defined as `(additional_fee + base_fee) / requested_cu`, a fee-per-cu to prioritize transactions, instead of just using `additional_fee`, so 100 additional lamports for a 1,000 CU transaction should have lower priority compared to 10 additional lamports for a 10 CU transaction. The thing is that the base_fee (e.g. signature, write lock, etc.) is bank-dependent, as the fee_schedule changes over epochs.
updated!
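A minimal sketch of the fee-per-CU priority described in this thread, assuming the fee components have already been extracted from the transaction and the current bank; `fee_per_cu` is a hypothetical helper, and the real `base_fee` would come from `bank::calculate_fee(...)`:

```
// Sketch only: all inputs are assumed to be precomputed elsewhere from the
// transaction and the leader's current bank.
fn fee_per_cu(additional_fee: u64, base_fee: u64, requested_cu: u64) -> u64 {
    // Guard against a zero CU request (assumption: treat it as 1).
    additional_fee.saturating_add(base_fee) / requested_cu.max(1)
}
```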
Pipeline:
1. Sigverify
2. Scheduler
+1 for a separate scheduler.
I would add a "0. Filter" stage: once a state auction is saturated, we can start dropping txs that are below the min price to be considered for inclusion.
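A sketch of such a filter stage, assuming a hypothetical `fee_per_cu()` accessor on `Transaction` that computes `F(T)` and an externally tracked minimum inclusion price:

```
// Drop transactions bidding below the current minimum fee-per-CU once the
// auction is saturated; everything else flows through to the Scheduler.
fn filter_stage(txs: Vec<Transaction>, min_fee_per_cu: u64) -> Vec<Transaction> {
    txs.into_iter()
        .filter(|tx| tx.fee_per_cu() >= min_fee_per_cu)
        .collect()
}
```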
@taozhu-chicago, Yes I think we'll have to push individual transactions into the scheduler's heap, which means deserializing the transactions from the packets. I think this has the added benefit that we don't have to keep deserializing transactions from packets in the BankingStage threads, because they will now receive transactions instead of packets.
Right, deserializing packets into `versioned_transaction` isn't an added cost; I did just that in my proposed PR. But it needs a …; I was trying to avoid the additional copy from …
#### Components of the `Scheduler`:
1. `default_transaction_queue` - A max-heap `BinaryHeap<Transaction>` that tracks all pending transactions. The priority in the heap is the additional fee of the transaction. Transactions are added to this queue
The priority in the heap is `F(T)` = fee-per-cu. Part of the calculation is the base_fee, which includes signature, write_lock, and compute_fee components that all depend on the current bank's feature_set and fee_structure (in `bank::calculate_fee(...)`). Probably need to pass the leader's current bank to the Scheduler somehow.
This sounds familiar 🤔
I need to go over the algorithms again in the morning. Looking good though! Thanks for writing it up!
channel.

Once a BankingStage thread finishes processing a transaction `T`, it sends the `T` back to the scheduler via the same channel to signal completion.
Do we need the whole `T`? Seems like a `()` would be sufficient
Yeah it doesn't need the whole transaction, just needs:
- The locked accounts https://github.com/solana-labs/solana/pull/23438/files#diff-3507d019efe6542cf0a788ecb226e735731e3b344e6bc2342a31d298fa5c7cefR169
- The transaction signature: https://github.com/solana-labs/solana/pull/23438/files#diff-3507d019efe6542cf0a788ecb226e735731e3b344e6bc2342a31d298fa5c7cefR202
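A sketch of a slimmer completion message along those lines; the struct and field names here are assumptions for illustration, not part of the proposal:

```
// Hypothetical completion signal: just enough for the scheduler to release
// locks and look up blocked queues, without carrying the whole transaction.
struct TransactionCompleted {
    signature: Signature,            // keys into `blocked_transactions`
    account_locks: Vec<AccountLock>, // the read/write locks the tx held
}
```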
```
struct BlockedTransactionsQueue {
    // The higher priority transaction blocking all the other transactions in
    // `blocked_transactions` below
    highest_priority_blocked_transaction: Transaction,
```
Isn't the root of the heap already this by definition?
I had it not part of the heap. Since this transaction would be referenced/checked a lot it made sense to me to clearly delineate it from the other transactions.
```
    Write(Pubkey),
}
```
4. `blocked_transactions` - A `HashMap<Signature, Rc<BlockedTransactionsQueue>>` keyed by
One entry per transaction in the `BlockedTransactionsQueue`, right?
Yeah a transaction should only be entered into this heap once, guess we need an existence check/dedup
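A sketch of that existence check, assuming `Signature` implements `Eq + Hash` (as it does in solana-sdk); the function name is hypothetical:

```
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::rc::Rc;

// Returns false (and does nothing) if the transaction is already tracked.
fn track_blocked(
    blocked_transactions: &mut HashMap<Signature, Rc<BlockedTransactionsQueue>>,
    signature: Signature,
    queue: Rc<BlockedTransactionsQueue>,
) -> bool {
    match blocked_transactions.entry(signature) {
        Entry::Occupied(_) => false, // dedup: already blocked
        Entry::Vacant(entry) => {
            entry.insert(queue);
            true
        }
    }
}
```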
```
    other_blocked_transactions: BinaryHeap<Transaction>
}
```
5. `blocked_transaction_queues_by_accounts` - A `HashMap<Pubkey, Rc<BlockedTransactionsQueue>>` keyed by
How will this work if the same account is referenced in multiple transactions?
they all get stuffed into the heap in the same BlockedTransactionsQueue
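A sketch of that fan-out, with a hypothetical function name: every account key a blocked transaction needs maps to the same shared queue via `Rc::clone`:

```
use std::collections::HashMap;
use std::rc::Rc;

// Point every contended account at the same shared queue, so an unlock on
// any one of them can find (and potentially release) the whole queue.
fn index_queue_by_accounts(
    queues_by_account: &mut HashMap<Pubkey, Rc<BlockedTransactionsQueue>>,
    accounts: &[Pubkey],
    queue: &Rc<BlockedTransactionsQueue>,
) {
    for key in accounts {
        queues_by_account.insert(*key, Rc::clone(queue));
    }
}
```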
#### Algorithm (Main Loop):

Assume `N` BankingStage threads:
If we're smart about the implementation, I think we can track a separate state per banking thread for each iteration
what kind of state? Was thinking the banking thread state in the scheduler would be pretty lightweight, just channels to send transactions.
2. If `T1` cannot be processed before `T2` because there's already a transaction currently being processed that contends on an account `A`, then `T2` should not be scheduled if it would grab any account locks needed by `T1`. This prevents lower fee transactions like `T2` from starving higher paying transactions like `T1`.
To clarify, the starvation:
- is a pre-existing issue, because banking-stage threads are isolated from each other; a TX that needs many accounts in one thread can be starved by other threads continually submitting TXs that take one of those accounts.
- becomes necessary to solve now, because we are promising to prioritize TXs by fee/CU.
- needs some kind of central scheduling scheme across banking-stage threads to solve.

Are these the correct premises?
if you have the following 5 txs:
1. fee rate 300, write locks accounts: A, B, C
2. fee rate 200, write locks accounts: B, C, D
3. fee rate 250, write locks accounts: C, D
4. fee rate 400, write locks accounts: E, F
5. fee rate 500, write locks accounts: D

would the tx batching then be: [[5, 4, 1], [3], [2]]?
1. 500: [A, B, C(read)]
2. 450: [A, C(write)]
3. 400: [C(read), D]

what about these transactions with account and rw flags? you could build batches like:
- [[1,3], [2]]
- [[1], [2], [3]]

the second option would respect the fee ordering assuming you care about read-lock fees; the first one would result in fewer batches.

@buffalu feel free to take a look as well :)
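A sketch of the conflict check behind step 2, assuming a hypothetical `account_locks()` accessor on `Transaction`; it illustrates the rule only and ignores read/read compatibility for brevity:

```
// T2 must not be scheduled if any lock it wants is also wanted by the
// higher-fee transaction blocked at the head of a queue.
fn would_starve_blocked_tx(t2: &Transaction, blocked: &BlockedTransactionsQueue) -> bool {
    let t1 = &blocked.highest_priority_blocked_transaction;
    t2.account_locks()
        .iter()
        .any(|lock| t1.account_locks().contains(lock))
}
```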
1. Once a BankingStage thread finishes processing a batch of transactions `completed_transactions_batch`, it sends the `completed_transactions_batch` back to the scheduler via the same channel to signal completion.

2. Upon receiving this signal, the BankingStage thread processes the locked accounts
typo? maybe "Upon receiving this signal, the Scheduler thread processes"
`transaction_accounts` for each `completed_transaction` in `completed_transactions_batch`:
```
let mut unlocked_accounts = vec![];
// First remove all the locks from the tracking list
for locked_account in transaction_accounts {
    if self.locked_accounts.remove_reference(locked_account) {
        unlocked_accounts.push(locked_account.key());
    }
}
```
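For context, a sketch of the reference-counted lock tracking that `remove_reference` implies; the struct and field names are assumptions, and the key handling is simplified to `Pubkey`:

```
use std::collections::HashMap;

// Hypothetical tracker: counts outstanding lock references per account.
struct LockedAccounts {
    ref_counts: HashMap<Pubkey, usize>,
}

impl LockedAccounts {
    // Drop one reference; return true once the account is fully unlocked.
    fn remove_reference(&mut self, key: &Pubkey) -> bool {
        match self.ref_counts.get_mut(key) {
            Some(count) if *count > 1 => {
                *count -= 1;
                false
            }
            Some(_) => {
                self.ref_counts.remove(key);
                true
            }
            None => false,
        }
    }
}
```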
fyi, I'm proposing another drastic change, departing from the batching altogether: #23548

if the scheduler thread still doesn't unlock at all until the whole batch of (completed) transactions is returned from the banking stage, i think we still suffer from somewhat constrained tps due to the problem described there (or i might be wrong...)
Agreed. What I think will happen is that people participating in a Raydium IDO or NFT drop will submit larger numbers of fee-prioritized transactions than other users, and these will even more heavily drown out the ability of other transactions in the pool to be executed in parallel, because the fee-prioritized transactions will get batch preference; this will result in the same tps drops. It'll also be tough to deal with these acute spamming periods via congestion fee raises, since it will take time to increase the fees... Maximizing throughput/parallelism as much as possible will solve demand issues and relieve the spamming more quickly.
> Maximizing throughput/parallelism as much as possible will solve demand issues and relieve the spamming more quickly
Hmm yeah, understandable, but I think induced demand > reduced demand from not meeting the expectations set by Solana's main selling point of high tps. It seems like supply is going down while there's a lot of demand. I was seeing blocks during congestion that at times had 0 votes or non-votes, which seems strange (Zan was the first one who spotted one, and then I saw others thereafter). I just think it would be ideal to have a design where other types of transactions (i.e. payments) are better able to flow around jams caused by certain groups of transactions -- right now they can't when batches are filled with a monoculture of transactions. I mentioned a design idea in response to ryoqun's new thread.

Anyway, just looking to brainstorm with you Trent :)
> spamming periods via congestion fee raises, since it will take time to increase the fees...

well, I don't think Solana's priority fee is completely the same thing as eth gas. eth gas is for a global fee market while solana's is for a local fee market, and the former is under a persistently saturated condition while the latter is under a temporal, spiky condition. that's why i want bidding info to be on chain, so that people can react quickly to the demand.

> It seems like supply is going down while there's a lot of demand. I was seeing blocks during congestion that at times had 0 votes or non-votes, which seems strange

This is true, but this is a bug, not design by intention. Along with this proposal and #23548 and #21883, we're trying to localize the heavily contended accounts, while payments can be as fast and cheap as possible per solana's selling point. :)
Ah yeah, agree with you that it seems like a bug-type situation. With the current situation in which parallelization seems to become compromised (probably?), it essentially turns what should be a local fee market into a global fee market, I think. That's why I worry about the situation where you have an nft drop -> NFT people prioritize their transactions -> parallelism drops with the batch design -> people outside of the nft drop need to add more priority than the NFT people. It looks like the newer transaction scheduler should help deal with this situation better though, so that's good to see.
overall, i think this is quite a good direction with concrete algorithms. some random thoughts: …

that being said, I still think …
Yeah this is one of the tradeoffs here for the single-threaded scheduler, which is now the bottleneck for all of banking stage. For now this might be ok, at …. And +1 for adding a fee for data accesses; that will have to be factored in later, I think.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
```
for account_key in unlocked_accounts {
    if let Some(blocked_transaction_queue) = self.blocked_transaction_queues_by_accounts.get(account_key) {
        // Check if the transaction blocking this queue can be run now, thereby unblocking this queue
        if blocked_transaction_queue.highest_priority_blocked_transaction.can_get_locks() {
```
Instead of unblocking only the top of the queue, you can unblock the first `k` elements to potentially avoid starvation. After processing, you can also allow any of the first `q` elements of the global blocked queue to skip the line if they are now freed up.
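A sketch of that suggestion, assuming `Transaction: Ord` by fee priority, a hypothetical `can_get_locks()` check, and `K` as an assumed tuning parameter:

```
use std::collections::BinaryHeap;

const K: usize = 4; // assumed tuning parameter

// Examine up to the first K blocked transactions; schedule the ones whose
// locks are now free and push the rest back into the heap.
fn unblock_up_to_k(blocked: &mut BinaryHeap<Transaction>, schedulable: &mut Vec<Transaction>) {
    let mut still_blocked = Vec::new();
    for _ in 0..K {
        match blocked.pop() {
            Some(tx) if tx.can_get_locks() => schedulable.push(tx),
            Some(tx) => still_blocked.push(tx),
            None => break,
        }
    }
    for tx in still_blocked {
        blocked.push(tx);
    }
}
```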
We discussed separately, but it probably also makes sense to keep track of the number of times that the leader was cut, so as to prevent the highest-prio tx from also getting starved.
i have a WIP scheduler here that im still trying to convince myself works as designed 😆
Problem
Summary of Changes
Fixes #
related #23211