Skip to content
This repository has been archived by the owner on Jan 13, 2025. It is now read-only.

Proposal for transaction scheduler based on fee priority #23438

Merged
merged 5 commits into from
May 5, 2022

Conversation

carllin
Copy link
Contributor

@carllin carllin commented Mar 2, 2022

Problem

Summary of Changes

Fixes #

related #23211

@carllin carllin changed the title Proposal for scheduler based on fee priority Proposal for transaction scheduler based on fee priority Mar 2, 2022
Additional fees were introduced to transactions as a method to allow users to bid for priority for
their transactions in the leader's queue.

Let the additional fee for a transaction `T` be defined as `F(T)`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

F(T) should be defined as (additional_fee + base_fee) / requested_cuas fee-per-cu to prioritize transactions, instead of just usingadditional_fee`, so 100 additional lamports for 1,000 CU transaction should have lower priority compare to 10 additional lamports for 10 CU transaction. The thing is the base_fee (eg signature, write lock etc) are bank dependent, as the fee_schedule changes over epochs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated!


Pipeline:
1. Sigverify
2. Scheduler
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for a separate scheduler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add 0. Filter stage. once a state auction is saturated, we can start dropping txs that are below the min price to be considered for inclusion.

@carllin
Copy link
Contributor Author

carllin commented Mar 2, 2022

One question, are we doing away with packet_batch?

@taozhu-chicago, Yes I think we'll have to push individual transactions into the the scheduler's heap, which means deserializing the transactions from the PacketBatch's that SigVerify is sending to this new scheduler (maybe we could have a pool of threads doing just this).

I think this has the added benefit that we don't have to keep deserializing transactions from packets in the BankingStage threads because they will now receive transactions instead of Packets

@tao-stones
Copy link
Contributor

One question, are we doing away with packet_batch?

@taozhu-chicago, Yes I think we'll have to push individual transactions into the the scheduler's heap, which means deserializing the transactions from the PacketBatch's that SigVerify is sending to this new scheduler (maybe we could have a pool of threads doing just this).

I think this has the added benefit that we don't have to keep deserializing transactions from packets in the BankingStage threads because they will now receive transactions instead of Packets

Right, deserializing packets into versioned_transaction isn't added cost, I did just that in my proposed PR. But it needs a bank to sanitized into sanitized_transaction.

I was trying to avoid the additional copy from PacketBatch -> Transactions.

#### Components of the `Scheduler`:

1. `default_transaction_queue` - A max-heap `BinaryHeap<Transaction>` that tracks all pending transactions.
The priority in the heap is the additional fee of the transaction. Transactions are added to this queue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The priority in the heap is the F(T) = fee-per-cu. Part of the calculation is base_fee that includes signature, write_lock, and compute_fee that all depends on current bank's feature_set and fee_structure (in bank::calculate_fee(...) ). Probably need to pass leader's current bank to Scheduler somehow

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds familiar 🤔

I need to go over the algorithms again in the morning. Looking good though! Thanks for writing it up!

channel.

Once a BankingStage thread finishes processing a transaction `T` , it sends the `T` back
to the scheduler via the same channel to signal of completion.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the whole T? Seems like a () would be sufficient

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct BlockedTransactionsQueue {
// The higher priority transactin blocking all the other transactions in
// `blocked_transactions` below
highest_priority_blocked_transaction: Transaction,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the root of the heap already this by definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had it not part of the heap. Since this transaction would be referenced/checked a lot it made sense to me to clearly delineate it from the other transactions.

Write(Pubkey),
}
```
4. `blocked_transactions` - A `HashMap<Signature, Rc<BlockedTransactionsQueue>>` keyed by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One entry per transaction in the BlockedTransactionQueue, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah a transaction should only be entered into this heap once, guess we need an existence check/dedup

other_blocked_transactions: BinaryHeap<Transaction>
}
```
5. `blocked_transaction_queues_by_accounts` - A `HashMap<Pubkey, Rc<BlockedTransactionsQueue>>` keyed by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will this work if the same account is referenced in multiple transactions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they all get stuffed into the heap in the same BlockedTransactionsQueue

docs/src/proposals/fee_transaction_priority.md Outdated Show resolved Hide resolved
docs/src/proposals/fee_transaction_priority.md Outdated Show resolved Hide resolved

#### Algorithm (Main Loop):

Assume `N` BankingStage threads:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're smart about the implementation, I think we can track a separate state per banking thread for each iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of state? Was thinking the banking thread state in the scheduler would be pretty lightweight, just channels to send transactions.

docs/src/proposals/fee_transaction_priority.md Outdated Show resolved Hide resolved
2. If `T1` cannot be processed before `T2` because there's already a transaction currently being
processed that contends on an account `A`, then `T2` should not be scheduled if it would grab
any account locks needed by `T1`. This prevents lower fee transactions like `T2` from starving
higher paying transactions like `T1`.
Copy link
Contributor

@tao-stones tao-stones Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify, the starvation is:

  • is a pre-existing issue due to banking-stage threads are isolated from each other, a TX that needs many accounts in one thread can be starved due to other threads keep submitting other TXs that take one of those accounts.
  • it becomes necessary to be solved now because we are to promise prioritize Txs by fee/CU.
  • to solve it, needs some kind central scheduling schema across banking-stage threads.

Is this the correct premises?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you have the following 5 txs:

  1. fee rate 300. write locks accounts: A, B, C
  2. fee rate 200: write locks accounts: B, C, D
  3. fee rate 250: write locks accounts: C, D
  4. fee rate 400. write locks accounts: E, F
  5. fee rate 500. write locks accounts: D

would the tx batching then be: [[5, 4, 1], [3], [2]]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. 500: [A, B, C(read)]
  2. 450: [A, C(write)]
  3. 400: [C(read), D]

what about these transactions with account and rw flags? you could build a batches like:

  • [[1,3], [2]]
  • [[1], [2], [3]]

the second option would respect the fee ordering assuming you care about read lock fees, the first one would result in less batches

@carllin
Copy link
Contributor Author

carllin commented Mar 3, 2022

@buffalu feel free to take a look as well :)

1. Once a BankingStage thread finishes processing a batch of transactions `completed_transactions_batch` ,
it sends the `completed_transactions_batch` back to the scheduler via the same channel to signal of completion.

2. Upon receiving this signal, the BankingStage thread processes the locked accounts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo? maybe Upon receiving this signal, the Scheduler thread processes

Comment on lines +170 to +178
`transaction_accounts` for each `completed_transaction` in `completed_transactions_batch`:
```
let mut unlocked_accounts = vec![];
// First remove all the locks from the tracking list
for locked_account in transaction_accounts {
if self.locked_accounts.remove_reference(locked_account) {
unlocked_accounts.push(locked_account.key());
}
}
Copy link
Contributor

@ryoqun ryoqun Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fyi, I'm proposing another drastic change, departing from the batching altogether: #23548

if the scheduler thread still don't unlock at all until the whole batched (completed) transactions are returned back from the banking stage, i think we still suffer from somewhat constrained tps due to the problem described there (or i might be wrong...)

Copy link

@nikhayes nikhayes Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. What I think will happen is that people participating in a Raydium IDO or NFT drop will be submitting larger numbers of fee-prioritized transactions than other users and these will even more heavily drown at the ability of other transactions in the pool to be executed in parallel, because these fee-prioritized transactions will get batch preference, and will result in the same tps drops. It'll also be tough to deal with these acute spamming periods with the congestion fee raises since it will take time to increase the fees... Maximizing throughput/parallelism as much as possible will more quickly solve demand issues and relieve the spamming more quickly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maximizing throughput/parallelism as much as possible will more quickly solve demand issues and relieve the spamming more quickly

gm! https://en.wikipedia.org/wiki/Induced_demand

Copy link

@nikhayes nikhayes Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah understandable but I think induced demand > reduced demand due to not meeting expectations that have been set by the main selling point of Solana being the high tps. It seems like supply is going down with a lot of demand. I was seeing blocks during congestion that had 0 votes or non-votes at times which seems strange (Zan was the first one who spotted one and then I saw others thereafter). I just think it would be ideal to have a design where other types of transactions (i.e. payments) are better able to flow around jams caused by certain groups of transactions -- right now they can't when batches are filled with a monoculture of transactions. I mentioned a design idea in response to ryoqun's new thread.

Anyway, just looking to brainstorm with you Trent :)

#23548 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spamming periods with the congestion fee raises since it will take time to increase the fees.

well, I don't think Solana's priority fee is completely same thing as the eth gas. eth gas is for global fee market and solana is for local fee market. and the former is under the persistently saturated condition. the later is under the temporal spiky condition. that's why i want bidding info to be on chain so that people can react quickly to the demand.

It seems like supply is going down with a lot of demand. I was seeing blocks during congestion that had 0 votes or non-votes at times which seems strange

This is true. but this is a bug, not design by intention. Along with this proposal and #23548 and #21883, we're trying to localize the heavily contended accounts, while payments can be as fast and cheat as possible par solana's selling point. :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, agree with you that it seems like it's a bug-type of situation. With the current situation in which parallelization seems to become compromised though (probably?) it essentially turns what should be a local fee market into a global fee market though I think. That's why I worry about the situation where you have an nft drop -> NFT people prioritize their transactions -> parallelism drops with the batch design -> people outside of the nft drop need to add more priority than the NFT people. It looks like the newer transaction scheduler should help deal with this situation better though so that's good to see.

@ryoqun
Copy link
Contributor

ryoqun commented Mar 16, 2022

overall, i think this is quite good direction with concrete algorithms

some random thoughts:

  • maybe needs to apply non-linear (yet small?) taxes for txes with huge number (~30) of accounts because of its scheduling cost (will need many collection lookups in the scheduling stage)
  • provide an ability to displace existing pending transaction with higher priority fee while avoiding the replay risk?

that being said, I still think all_transaction_queues should be on-chain-ed:

  • clients still spam the network with their txes as they can't get rid of fear of other's spamming. ;)
  • worse, they still can't have any clue whether their transaction is in the all_transaction_queues or not (i.e. recognized by the cluster)
  • also hard to know current prevailing priority fee for given interested accounts.
  • exposing this off-chain state would be a bit challenging from the operational point of view. rpc nodes aren't staked and staked nodes shouldn't expose rpc.

@carllin
Copy link
Contributor Author

carllin commented Mar 16, 2022

maybe needs to apply non-linear (yet small?) taxes for txes with huge number (~30) of accounts because of its scheduling cost (will need many collection lookups in the scheduling stage)

Yeah this is one of the tradeoffs here for the single-threaded scheduler, which is now the bottleneck for all of banking stage.

For now this might be ok, at <10k tps. My intuition though is that these hashmap lookups should be fast, and since these lookups are only incurred on the next highest fee transaction, that means these lookups only occur on transactions that are about to be scheduled. In comparison to the rest of the transaction execution, I think these will be rather cheap. Even at 10k tps, 30 accounts a transaction, that's 300,000 lookups, which should be worst case a few ms?

And +1 for adding a fee for data accesses that will have to be factored in later I think.

@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added stale [bot only] Added to stale content; results in auto-close after a week. and removed stale [bot only] Added to stale content; results in auto-close after a week. labels Apr 16, 2022
@stale
Copy link

stale bot commented Apr 30, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Apr 30, 2022
@carllin carllin merged commit 6a9a7df into solana-labs:master May 5, 2022
@carllin carllin requested a review from jarry-xiao May 6, 2022 17:54
for account_key in unlocked_accounts {
if let Some(blocked_transaction_queue) = self.blocked_transaction_queues_by_accounts.get(account_key) {
// Check if the transaction blocking this queue can be run now, thereby unblocking this queue
if blocked_transaction_queue.highest_priority_blocked_transaction.can_get_locks() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of unblocking only the top of queue, you can unblock the first k elements to potentially avoid starvation. After processing you can also allow any of the first q elements of the global blocked queue to skip the line if they are now freed up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed separately, but it probably also makes sense to keep track of the number of times that the leader was cut, as to prevent the highest prio tx from also getting starved

@buffalu
Copy link
Contributor

buffalu commented May 19, 2022

i have a WIP scheduler here that im still trying to convince myself works as designed 😆

https://github.com/buffalu/solana/pull/1/files#diff-ef114a0b70f4af706d610895893f1d3146402fdc427971af41db5dc1fe866186

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
stale [bot only] Added to stale content; results in auto-close after a week.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants