
Network Stability rollout for quic/qos/fee markets #23211

Closed
13 of 15 tasks
aeyakovenko opened this issue Feb 17, 2022 · 47 comments
Labels
stale [bot only] Added to stale content; results in auto-close after a week.

Comments

@aeyakovenko
Member

aeyakovenko commented Feb 17, 2022

Problem

Users need to land TXs into a block even when the leader is loaded. This is a summary of the solution so everyone is on the same page.

Proposed Solution

  • implement quic so bots can't send unbounded traffic (1.10)

  • pull txs proportionally from staked nodes, both the TPU and TPUfwd ports. The goal is to prevent anyone from starving the pipeline for proposing txs to the leader. There should be some mechanism for a validator to set a preferred RPC node that overrides the staked list. (v0 in 1.10.15)

  • sort txs by (additional_fee)/(requested compute units). CUs should be deterministic since they are completely specified by the ComputeBudget instruction. Users who are willing to pay more, and devs who can make their TXs more efficient, should be prioritized first. (v0 in 1.10.15; a rough sketch of this computation follows the list)

  • fill the account buckets up to the account write CU limits and the block CU limits. (1.9)

  • keep filling account buckets and forward each bucket to the next leader to avoid starving accounts on forwards. As the account buckets are filled with CUs, forwards should start sending txs from multiple accounts, not just the one with the highest fees. A naive implementation would forward the highest-paid TXs first, which are likely to be for only a single hot account; a single congested market with a ton of demand would then saturate the forward queues. (forward packets by prioritization in desc order #25406)

  • design, https://github.com/solana-labs/solana/blob/master/docs/src/proposals/fee_transaction_priority.md

  • testnet release with 1.10.x

  • mainnet-beta upgrade to 1.10.x

  • backport of quic fix (Backport ghsa-x236-qc46-7v8j #26671) released

  • quic is turned on on mb

  • Priority fee rollout. All major features activated! cli and rpc apis are left (Prioritization Fee function enabling #26017)

  • rpc providers switched to quic, booting with --tpu-use-quic

  • wallets rolled out with compute budget additional fee

  • UDP is disabled on mb
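
As a rough illustration of the fee-per-CU sort referenced in the sort bullet above (a sketch only; the types and the integer scaling are mine, not the actual banking-stage code):

```rust
// Sketch only: illustrative types, not the actual banking-stage code.
#[derive(Debug)]
struct PendingTx {
    additional_fee: u64, // lamports offered via the ComputeBudget instruction
    requested_cus: u64,  // CUs requested via the ComputeBudget instruction
}

impl PendingTx {
    /// additional_fee / requested CUs, scaled to stay in integer math
    /// (floats would risk non-determinism across nodes).
    fn fee_per_cu(&self) -> u64 {
        const SCALE: u64 = 1_000_000;
        self.additional_fee
            .saturating_mul(SCALE)
            .checked_div(self.requested_cus)
            .unwrap_or(0)
    }
}

fn sort_by_priority(txs: &mut [PendingTx]) {
    // Highest fee-per-CU first.
    txs.sort_by(|a, b| b.fee_per_cu().cmp(&a.fee_per_cu()));
}
```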

TBD: optimizations

tag @jackcmay @taozhu-chicago @carllin @t-nelson @sakridge

@t-nelson
Contributor

Can you elaborate on "loaded"? AFAIK we have several notions:

  1. Physical network link
  2. Block "CU" cap
  3. Per-writer "CU" cap
  4. Banking throughput
  5. Replay throughput
  6. ...

It seems unlikely that these all have the same solution. I'm not necessarily convinced that these are root causes or even have a single root cause at this point

@aeyakovenko
Member Author

aeyakovenko commented Feb 17, 2022

@t-nelson

Any place in the pipeline between the TPU and where banking stage puts an entry of TXs into PoH could be saturated to the point that it has more stuff to process than time remaining before PoH ends the block. We want TXs to be prioritized as early as possible by 'fee/cu'. If forwarders are doing the prioritization right, and leaders QoS by forwarder stake weight, then valid txs with a verified fee payer that can afford the TX should get to the head of the queue on the leader most of the time.

@t-nelson
Contributor

  1. pull txs proportionally from staked nodes - the goal is to prevent anyone from starving the pipeline for proposing txs to the leader. There should be some mechanism for a validator to set a preferred RPC node that overrides the staked list.

This will require some changes to TX submission since validators only send to TPUFwd. They shouldn't be a major contributor if at all on the TPU port today, so stake-weighted priority will have little effect

  1. sort txs by (additional_fee + base fee = total fee)/compute_units. CU's should be deterministic since they are completely specified by the ComputeBudget instruction. Users that are willing to pay more, and devs that can make their TXs more efficient, should be prioritized first.

I think we need to be deliberate with wording here that these are requested CUs. The effectiveness of this treatment will hinge on both the coverage and assigned-cost accuracy of our measure of true CU usage. Given that "CU" has become a fairly nebulous term, it's hard to be confident that that's the case or even that we're always talking about the same thing.

  1. fill the account buckets up to the account write CU limits and the block CU limits.

  2. keep filling account buckets and forward each bucket to avoid starving accounts on forwards. Naive implementation would forward highest paid TXs first, which are likely to be only for a single hot account.

"Forward" here is to the next leader or down the pipeline?

Any thoughts on how we'll balance fee-based priority with parallelization maximization?

  1. Need an RPC api to estimate the fee user should pay. Sample last 150 slots of MAX(lowest fee for the block, lowest fee of the writable accounts in the TX), show user the top 5%, 50%, 95% paid per CU across those 150 slots.

Since fees are no longer fully deterministic, it seems like there are a lot of games both validators and users can play to influence this. I'll have to think on it before elaborating
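
To make the sampling in the quoted RPC item concrete, here is a minimal sketch (names are illustrative, not the actual RPC implementation): take one observation per recent slot, sort, and read off the percentiles.

```rust
// Sketch only: one observation per recent slot (e.g. the last 150),
// sorted so percentiles can be read off. Not the actual RPC code.
fn fee_percentiles(mut per_slot_fee_per_cu: Vec<u64>) -> Option<(u64, u64, u64)> {
    if per_slot_fee_per_cu.is_empty() {
        return None;
    }
    per_slot_fee_per_cu.sort_unstable();
    let pick = |p: f64| {
        let idx = ((per_slot_fee_per_cu.len() - 1) as f64 * p).round() as usize;
        per_slot_fee_per_cu[idx]
    };
    // 5th, 50th, and 95th percentile paid per CU across the sample.
    Some((pick(0.05), pick(0.50), pick(0.95)))
}
```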

@aeyakovenko
Member Author

This will require some changes to TX submission since validators only send to TPUFwd. They shouldn't be a major contributor if at all on the TPU port today, so stake-weighted priority will have little effect

both TPU and fwd ports

I think we need to be deliberate with wording here that these are requested CUs. The effectiveness of this treatment will hinge on both the coverage and assigned-cost accuracy of our measure of true CU usage. Given that "CU" has become a fairly nebulous term, it's hard to be confident that that's the case or even that we're always talking about the same thing.

yep, CUs need to be as accurate as gas is in eth.

"Forward" here is to the next leader or down the pipeline?

forward to the next leader

Any thoughts on how we'll balance fee-based priority with parallelization maximization?

keep filling account buckets and forward each bucket to the next leader to avoid starving accounts on forwards.

The idea here is to keep sorting txs into account buckets and then pull from N queues.
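
A rough sketch of those account buckets (illustrative types only; a tx writing several accounts is cloned into each account's queue here, which a real implementation would need to dedupe when draining):

```rust
use std::collections::HashMap;

// Sketch only: not the actual banking-stage data structures.
type Pubkey = [u8; 32];

#[derive(Clone)]
struct Tx {
    writable_accounts: Vec<Pubkey>,
    fee_per_cu: u64,
}

fn bucket_by_writable_account(txs: &[Tx]) -> HashMap<Pubkey, Vec<Tx>> {
    let mut buckets: HashMap<Pubkey, Vec<Tx>> = HashMap::new();
    for tx in txs {
        for acct in &tx.writable_accounts {
            buckets.entry(*acct).or_default().push(tx.clone());
        }
    }
    // Each per-account queue is kept sorted, highest fee-per-cu first.
    for queue in buckets.values_mut() {
        queue.sort_by(|a, b| b.fee_per_cu.cmp(&a.fee_per_cu));
    }
    buckets
}
```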

Since fees are no longer fully deterministic, it seems like there are a lot of games both validators and users can play to influence this. I'll have to think on it before elaborating

They are deterministic; the user signs exactly what they pay for. CUs are specified in the TX; what they request is what they get charged for.

@tao-stones
Contributor

sort txs by (additional_fee + base fee = total fee)/(requested compute units).

where does base fee come from? @aeyakovenko

@t-nelson
Contributor

sort txs by (additional_fee + base fee = total fee)/(requested compute units).

where does base fee come from? @aeyakovenko

Should be the CU bin they bought

@t-nelson
Contributor

Any thoughts on how we'll balance fee-based priority with parallelization maximization?

keep filling account buckets and forward each bucket to the next leader to avoid starving accounts on forwards.

The idea here is to keep sorting txs into account buckets and then pull from N queues.

Yeah, the sorting is what I'm interested in details about. Seems like we need to sort by multiple things that may not have the same ideal weights in all conditions

Since fees are no longer fully deterministic, it seems like there are a lot of games both validators and users can play to influence this. I'll have to think on it before elaborating

They are deterministic; the user signs exactly what they pay for. CUs are specified in the TX; what they request is what they get charged for.

Right. What they're going to spend is still deterministic, but whether the fee they specify is sufficient for block inclusion is not

@t-nelson t-nelson mentioned this issue Feb 22, 2022
@tao-stones
Contributor

Yeah, the sorting is what I'm interested in details about. Seems like we need to sort by multiple things that may not have the same ideal weights in all conditions

In #23257, assuming one packet has one transaction, packets with the same fee-per-cu are put into the same bucket, keeping the same order as the stake-weighted shuffle. So if two transactions have the same fee-per-cu, the one from the higher-staked validator has a better chance of being picked first.

@tao-stones
Contributor

sort txs by (additional_fee + base fee = total fee)/(requested compute units).

where does base fee come from? @aeyakovenko

Should be the CU bin they bought

does https://github.com/solana-labs/solana/blob/master/runtime/src/bank.rs#L3257-L3274 return a total_fee that includes additional_fee and base_fee?

@aeyakovenko
Member Author

@jackcmay ^

@t-nelson
Contributor

In #23257, assuming one packet has one transaction, packets with the same fee-per-cu are put into the same bucket, keeping the same order as the stake-weighted shuffle. So if two transactions have the same fee-per-cu, the one from the higher-staked validator has a better chance of being picked first.

This doesn't address my concern. I'm wondering how parallelization will contribute to the sort. We shouldn't find ourselves in a situation where, whether through malice or incentive-motivated usage, users are buying sub-optimal parallelization

@jackcmay
Contributor

sort txs by (additional_fee + base fee = total fee)/(requested compute units).

where does base fee come from? @aeyakovenko

Should be the CU bin they bought

does https://github.com/solana-labs/solana/blob/master/runtime/src/bank.rs#L3257-L3274 return a total_fee that includes additional_fee and base_fee?

Total fee, aka, includes additional fee

@tao-stones
Contributor

sort txs by (additional_fee + base fee = total fee)/(requested compute units).

where does base fee come from? @aeyakovenko

Should be the CU bin they bought

does https://github.com/solana-labs/solana/blob/master/runtime/src/bank.rs#L3257-L3274 return a total_fee that includes additional_fee and base_fee?

Total fee, aka, includes additional fee

It takes a bank to calculate total_fee, mainly because it gets fee_structure from the bank. This works fine except the leader won't have secured a bank yet when it needs fee-per-cu to prioritize buffered packets. Is it OK to use FeeStructure::default() instead of bank.fee_structure for fee calculation?

@aeyakovenko
Member Author

aeyakovenko commented Feb 22, 2022

@taozhu-chicago @t-nelson I was thinking we sort by fee, then bucket by account/CU limit, then forward by bucket. So if there are N buckets, we round-robin between each bucket up to the block limit and take some X CUs to forward.

@tao-stones
Contributor

tao-stones commented Feb 22, 2022

@taozhu-chicago @t-nelson I was thinking we sort by fee, then bucket by account/CU limit, then forward by bucket. So if there are N buckets, we round-robin between each bucket up to the block limit and take some X CUs to forward.

Right, after sorting the transactions in the buffer by fee-per-cu, the leader starts to fill the block from the highest fee-per-cu TXs, ensuring each write-account is always filled with higher-paid TXs first. We don't really need to create an account bucket per se.

wrt forwarding, I am thinking of doing the following:
create a temp cost_tracker, fill it with sorted txs from the buffer, same as filling an actual block. When it hits block_limit, or the buffer is exhausted, forward the txs in that cost_tracker in the same order. Essentially forwarding a block of TXs prioritized by fee and bucketed by write-accounts.
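
Something like the following sketch, where the cost tracker is a stand-in for the real cost-tracking logic, not its actual API:

```rust
// Sketch only: input CUs must already be sorted by fee-per-cu, highest
// first; the returned indices are forwarded in the same order.
struct TempCostTracker {
    used: u64,
    block_limit: u64,
}

impl TempCostTracker {
    fn try_add(&mut self, cus: u64) -> bool {
        if self.used + cus <= self.block_limit {
            self.used += cus;
            true
        } else {
            false
        }
    }
}

fn select_for_forwarding(sorted_tx_cus: &[u64], block_limit: u64) -> Vec<usize> {
    let mut tracker = TempCostTracker { used: 0, block_limit };
    let mut selected = Vec::new();
    for (idx, &cus) in sorted_tx_cus.iter().enumerate() {
        // Fill a "virtual block"; anything that fits gets forwarded.
        if tracker.try_add(cus) {
            selected.push(idx);
        }
    }
    selected
}
```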

@jackcmay
Contributor

Is it OK to use FeeStructure::default() instead of bank.fee_structure for fee calculation?

FeeStructure is not fixed and will be modified at epoch boundaries over time. It probably falls into a similar category as the compute_budget and congestion where both of those will also be bank-specific based on various conditions (feature activation, congestion, etc...)

@tao-stones
Contributor

Is it OK to use FeeStructure::default() instead of bank.fee_structure for fee calculation?

FeeStructure is not fixed and will be modified at epoch boundaries over time. It probably falls into a similar category as the compute_budget and congestion where both of those will also be bank-specific based on various conditions (feature activation, congestion, etc...)

Understood. Was hoping to calculate fee-per-cu for each packet upfront when it is received, so the results can be reused. It'd be easier to do without requiring a bank.
Guess I can cache the "last working_bank", and use it to calculate the fee.

@jackcmay
Contributor

Or just cache the things you need and yes that is probably the best predictor without looking at current state

@tao-stones
Contributor

Yea, so just cache fee_structure and lamports_per_signature
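
A rough sketch of that caching approach (field and method names are illustrative, not the real structs; the per-CU tier field is a stand-in for whatever the fee_structure provides):

```rust
// Sketch only: cached values may be slightly stale across epoch
// boundaries, which is fine for prioritization (not for charging).
#[derive(Clone)]
struct CachedFeeInputs {
    lamports_per_signature: u64,
    lamports_per_cu_tier: u64, // stand-in for the fee_structure's CU pricing
}

impl CachedFeeInputs {
    /// Refresh from the last working bank whenever one is available.
    fn update(&mut self, lamports_per_signature: u64, lamports_per_cu_tier: u64) {
        self.lamports_per_signature = lamports_per_signature;
        self.lamports_per_cu_tier = lamports_per_cu_tier;
    }

    /// Upfront fee estimate computed at packet receipt, no bank needed.
    fn estimate_total_fee(&self, num_signatures: u64, requested_cus: u64, additional_fee: u64) -> u64 {
        num_signatures * self.lamports_per_signature
            + requested_cus * self.lamports_per_cu_tier
            + additional_fee
    }
}
```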

@carllin
Contributor

carllin commented Feb 23, 2022

In #23257, assuming one packet has one transaction, packets with the same fee-per-cu are put into the same bucket, keeping the same order as the stake-weighted shuffle. So if two transactions have the same fee-per-cu, the one from the higher-staked validator has a better chance of being picked first.

I was thinking we sort by fee, then bucket by account/CU limit, then forward by bucket. So if there are N buckets, we round-robin between each bucket up to the block limit and take some X CUs to forward.

@taozhu-chicago @aeyakovenko if this round robin is done naively, does this open up a way for low-fee transactions to starve higher-fee transactions?

Imagine I had two buckets, with pending transactions of the form (Write Accounts, Fee/CU)

Bucket 1: [(A, 1000000), (AB, 1000000)]
Bucket 2: [(B, 1), (A, 1), (B, 1), (A, 1), repeats, these are long running computations where Task B takes much longer than task A]

If we are doing round robin by first nonconflicting transaction, then we could get a schedule like

  1. Schedule (A, 1000000)
  2. Can't schedule (AB, 1000000), conflicts on A
  3. Schedule (B, 1),
  4. Task 1. finishes running, can't schedule (AB, 1000000), conflicts on B from Task 3. Instead
    schedule first non-conflicting transaction (A, 1)
  5. Task 3. finishes running. Because Task B takes much longer than Task A, then task 4 is still running. This means we can't schedule (AB, 1000000), because it conflicts on A from Task 4. Instead
    schedule first non-conflicting transaction (B, 1)
  6. Repeat steps 4 and 5 over and over, starving (AB, 1000000)

Seems like to prevent this, you should also have blocked higher-fee transactions prevent lower-fee transactions from grabbing the same accounts they need, even if those lower-fee transactions could run without conflict at this moment

@tao-stones
Contributor

the bucket is by writable-account.
In your example above, after sorting, we'd have a view of the buffer like this:
[(A, 1000000), (AB, 1000000), (A, 1), (B, 1), (A, 1), (B, 1), (A, 1), (B, 1) ...]
assume we have chunk_size=4, it'd submit the following chunks for processing, in sequence:

  1. [(A, 1000000), (AB, 1000000), (A, 1), (B, 1)] --> result (B, 1) to be retried due to account_in_use
  2. [(A, 1), (B, 1), (A, 1), (B, 1)]. --> result [(A, 1), (B, 1)] to be retried due to account_in_use

This assumes account_limit >= 2000000.
If account_limit = 1000000, with the same chunk_size, then:

  1. [(A, 1000000), (AB, 1000000), (A, 1), (B, 1)] --> result [(AB, 1000000), (A, 1)] to be retried due to account_limit and account_in_use
  2. [(A, 1), (B, 1), (A, 1), (B, 1)]. --> result [(A, 1), (B, 1)] to be retried due to account_in_use
    at the end of buffer iteration, what's remaining in buffer will be:
    [(AB, 1000000), (A, 1), (A, 1), (B,1)]
    the second iteration of buffer will first pick up (AB, 1000000)

@carllin
Contributor

carllin commented Feb 23, 2022

the bucket is by writable-account.

Hmmm, does this mean every transaction that tries to grab a write lock on A will fall into the same bucket? What if transactions grab write locks on multiple accounts, like A and B? For example if we had three transactions that each grabbed two write locks:

AB, BC, CD

How would these be organized into buckets?

assume we have chunk_size=4, it'd submit the following chunks for processing, in sequence:

  1. [(A, 1000000), (AB, 1000000), (A, 1), (B, 1)] --> result (B, 1) to be retried due to account_in_use

Hmm in this example, why is only B retried due to account in use, won't (AB, 1000000), (A, 1) be retried because they both try to grab the lock on account A, but A is being locked by the first transaction (A, 1000000)?

@tao-stones
Contributor

the bucket is by writable-account.

Hmmm, does this mean every transaction that tries to grab a write lock on A will fall into the same bucket? What if transactions grab write locks on multiple accounts, like A and B? For example if we had three transactions that each grabbed two write locks:

AB, BC, CD

How would these be organized into buckets?

Right now, if (AB, 1000) is included in a block, it will be counted into both A and B buckets. So both A and B buckets will increase by 1000 CUs in this example.

assume we have chunk_size=4, it'd submit the following chunks for processing, in sequence:

  1. [(A, 1000000), (AB, 1000000), (A, 1), (B, 1)] --> result (B, 1) to be retried due to account_in_use

Hmm in this example, why is only B retried due to account in use, won't (AB, 1000000), (A, 1) be retried because they both try to grab the lock on account A, but A is being locked by the first transaction (A, 1000000)?

Oops, mistakenly flipped the outcome. Sorry ... you are right, (B, 1) will be included, but any TXs with A will be retried due to AccountInUse

@carllin
Contributor

carllin commented Feb 24, 2022

Right now, if (AB, 1000) is included in a block, it will be counted into both A and B buckets. So both A and B buckets will increase by 1000 CUs in this example.

In your example above, after sorting, we'd have a view of the buffer like this:
[(A, 1000000), (AB, 1000000), (A, 1), (B, 1), (A, 1), (B, 1), (A, 1), (B, 1) ...]

So from the above statement, the buckets would actually look like this?
Bucket 1: [(A, 1000000), (AB, 1000000), (A, 1), (A, 1), (A, 1) ...]
Bucket 2: [(B, 1), (B, 1), (B, 1)]

  1. How do we pick which buckets to schedule from? Do we use any information about the fees of the individual transactions in each of the buckets to prioritize which bucket to choose from?

  2. Does this open up the starvation of (AB, 1000000) scenario I described earlier? What happens if (A, 1000000) is scheduled, then (B, 1), then (A, 1) ((AB, 1000000) is skipped because it conflicts with the ongoing transaction (B, 1)), and this keeps repeating?

@tao-stones
Contributor

  1. Bucket by write lock is to gate TXs so they don't exceed max_account_limit. (A minor correction: AB will be in bucket 2 as well, as it will also take that amount of CU from B.) However, TXs are submitted to the bank for execution in order of fee-per-cu, the higher-paid TXs first.
  2. AB will be starved, same as in the current version. However, with fee-per-cu, it can be prioritized to guarantee a spot in a block if it pays enough in additional-fee.

Go through the example in steps:

  1. transactions are sorted by fee-per-cu in the following order, where each entry is:
(Write Accounts, Fee/CU, TX_CUs)
(A, 1000000, 100), 
(AB, 1000000, 100), 
(A, 1, 1), 
(B, 1, 2), 
(A, 1, 3), 
(B, 1, 4), 
... 
  2. Assume account_max_limit=101, block_limit=202; TXs will be fitted/filtered by these limits:
(A, 1000000, 100), // scheduled in block, bucket_A is filled by 100cu, 1cu left;
(AB, 1000000, 100), // not scheduled for this block, would exceed bucket_A limit, will retry;
(A, 1, 1), // scheduled in block, bucket_A is filled by 1cu, 0 left;
(B, 1, 2), // scheduled in block, bucket_B is filled by 2cu, 99 left;
(A, 1, 3), // not scheduled for this block, would exceed bucket_A limit, will retry;
(B, 1, 4), // scheduled in block, bucket_B is filled by 4cu, 95 left;
... 

this continues until either block_limit is met or we've gone through the entire buffer. At the end of the iteration, these transactions are submitted to the bank for execution, and their outcomes:

(A, 1000000, 100),  // commit to bank;
(A, 1, 1), // account-in-use, retry;
(B, 1, 2), // commit to bank;
(B, 1, 4), // account-in-use, retry;
... 

And the remaining buffer (after sorting by fee-per-cu) looks like this:

(AB, 1000000, 100), 
(A, 1, 1), 
(A, 1, 3), 
(B, 1, 4), 
... 
  3. for the next slot:
(AB, 1000000, 100),  // will be scheduled for execution, and will be committed to new block;
(A, 1, 1),  // will be scheduled, but not executed due to account-in-use, therefore retry
(A, 1, 3), // will not be scheduled due to exceed bucket_A limit
(B, 1, 4), // will be scheduled for execution, and will be committed to new block;
... 

Note, if AB had paid more so that its fee/cu > 1000000, then it would be at the top of the sorted list (in step 1), and therefore would land in the first block (in step 2).

Also note, this same sorting/fitting/filtering process applies to both block production in the current leader and packet forwarding.

@t-nelson
Contributor

It's starting to feel like this needs a full-fledged proposal rather than a bunch of ad hoc discussions

@carllin
Contributor

carllin commented Feb 24, 2022

  1. Bucket by write lock is to gate TXs so they don't exceed max_account_limit. (A minor correction: AB will be in bucket 2 as well, as it will also take that amount of CU from B.) However, TXs are submitted to the bank for execution in order of fee-per-cu, the higher-paid TXs first.

Got it, thanks!

  1. AB will be starved, same as in the current version.

Hmm, I think with the higher fees, it's no longer sufficient to just uphold current guarantees. We have to guarantee that higher-fee transactions land with much higher priority than lower-fee transactions whenever possible.

  2. Assume account_max_limit=101, block_limit=202; TXs will be fitted/filtered by these limits:
(A, 1000000, 100), // scheduled in block, bucket_A is filled by 100cu, 1cu left;
(AB, 1000000, 100), // not scheduled for this block, would exceed bucket_A limit, will retry;
(A, 1, 1), // scheduled in block, bucket_A is filled by 1cu, 0 left;
(B, 1, 2), // scheduled in block, bucket_B is filled by 2cu, 99 left;
(A, 1, 3), // not scheduled for this block, would exceed bucket_A limit, will retry;
(B, 1, 4), // scheduled in block, bucket_B is filled by 4cu, 95 left;
... 

this continues until either block_limit is met or we've gone through the entire buffer. At the end of the iteration, these transactions are submitted to the bank for execution, and their outcomes:

I think in your example where the account limit is hit, the delay on the (AB, 1000000, 100) transaction is reasonable. But what if the issue is not the block/account limits being hit, but one of pure contention on the write locks?

In this example, suppose the account_max_limit and block_limit are infinitely large, so we can just ignore them.

Say now we have the same queue:

(A, 1000000, 100), // scheduled
(AB, 1000000, 100), // not scheduled because contention on `A`
(A, 1, 1),  // not scheduled because contention on `A`
(B, 1, 2), // scheduled
(Task A ends, (AB, 1000000, 100) once again not scheduled because contention on `B`)
(A, 1, 3), // scheduled
(Task B ends, (AB, 1000000, 100) once again not scheduled because contention on `A`)
(B, 1, 2), // scheduled
... repeats

This seems like a starvation scenario we should actively avoid.

@tao-stones
Contributor

Hmm, isn't this why we have block-limit and account-limit, to help in such a scenario? Plus allowing additional-fee to give AB a better chance?

I feel I am missing something. Can you describe what you'd do to actively avoid this starvation scenario?

@carllin
Contributor

carllin commented Feb 25, 2022

Hmm, isn't this why we have block-limit and account-limit, to help in such a scenario? Plus allowing additional-fee to give AB a better chance?

Based on my understanding thus far, it doesn't seem these limits will help.

Can you describe what you'd do to actively avoid this starvation scenario?

I was thinking maybe you would need to reserve accounts that are touched by higher paying transactions, even if you could schedule another non-conflicting transaction.

So going back to the example above:

(A, 1000000, 100), // scheduled
(AB, 1000000, 100), // not scheduled because contention on `A`
(Reserve A and B)
(A, 1, 1),  // not scheduled because `A` is reserved
(B, 1, 2), // not scheduled because `B` is reserved, even though there is nobody else currently using `B`

It may also be good to factor in estimated compute. For instance it might be fine to schedule B if we're sure the time to execute is less than the remaining time on (A, 1000000, 100)
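
A sketch of that reservation idea (illustrative types; one scheduling pass over txs already sorted by fee-per-cu descending):

```rust
use std::collections::HashSet;

// Sketch only: a blocked high-fee tx reserves its accounts so later
// (lower-fee) txs can't keep leapfrogging it.
type Account = u64;

struct Tx {
    writes: Vec<Account>,
    fee_per_cu: u64, // documents the required sort order of the input
}

fn schedule_pass(sorted_desc: &[Tx], currently_locked: &HashSet<Account>) -> Vec<usize> {
    let mut reserved: HashSet<Account> = HashSet::new();
    let mut scheduled = Vec::new();
    for (i, tx) in sorted_desc.iter().enumerate() {
        let blocked = tx
            .writes
            .iter()
            .any(|a| currently_locked.contains(a) || reserved.contains(a));
        if blocked {
            // Reserve so nothing cheaper grabs these accounts first.
            reserved.extend(tx.writes.iter().copied());
        } else {
            scheduled.push(i);
            // Treat scheduled accounts as taken for the rest of the pass.
            reserved.extend(tx.writes.iter().copied());
        }
    }
    scheduled
}
```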

@carllin
Contributor

carllin commented Feb 25, 2022

@behzadnouri do you have any input on this?

This seems like a variation of the job-shop scheduling problem with n tasks and m machines, where:

  1. We have an estimate of time needed for each task
  2. Each task has a weight w (the fee)
  3. The only constraint is that tasks needing a write lock on an account cannot run at the same time as any other task needing a lock on the same account

And we need to find a valid schedule that maximizes the total weight of the tasks executed. Are you familiar with any of the latest heuristic based algorithms for this type of problem?
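
For concreteness, that objective could be written as follows (notation is mine, not from the thread): task $j$ has weight $w_j$ (the fee), duration $d_j$ (estimated time), start time $s_j$, and write-account set $W_j$; choose a subset $S$ and start times maximizing total weight while tasks sharing a write account never overlap:

```latex
\max_{S,\,\{s_j\}} \sum_{j \in S} w_j
\quad \text{s.t.} \quad
[s_i,\, s_i + d_i) \cap [s_j,\, s_j + d_j) = \emptyset
\;\;\text{for all } i \ne j \in S \text{ with } W_i \cap W_j \ne \emptyset
```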

@tao-stones
Contributor

I am taking it as a simple scheduling issue.

I interpret the issue as: within banking stage, 1. there are many packets wanting to be scheduled for processing by the bank, 2. there are way more packets than block capacity.

The proposed solution is simple: prioritize buffered packets by their fee/cu. Higher-paid packets get to "see" the bank first.

Want to clarify that the prioritization, e.g. sorting by fee/cu, is done before loading and locking accounts, which is before bank execution.

@tao-stones
Contributor

Hmm, isn't this why we have block-limit and account-limit, to help in such a scenario? Plus allowing additional-fee to give AB a better chance?

Based on my understanding thus far, it doesn't seem these limits will help.

With block and account limits, AB would have a chance to be attempted for processing again; as the earlier example showed, it'll be processed. So that'd solve the starvation issue, no?

Can you describe what you'd do to actively avoid this starvation scenario?

I was thinking maybe you would need to reserve accounts that are touched by higher paying transactions, even if you could schedule another non-conflicting transaction.

So going back to the example above:

(A, 1000000, 100), // scheduled
(AB, 1000000, 100), // not scheduled because contention on `A`
(Reserve A and B)
(A, 1, 1),  // not scheduled because `A` is reserved
(B, 1, 2), // not scheduled because `B` is reserved, even though there is nobody else currently using `B`

It may also be good to factor in estimated compute. For instance it might be fine to schedule B if we're sure the time to execute is less than the remaining time on (A, 1000000, 100)

"reserve" isn't really a status for lock. I mean what does "reserved" different from "read locked"?

@aeyakovenko
Member Author

aeyakovenko commented Feb 25, 2022

@taozhu-chicago @carllin @t-nelson

This shouldn't be that complicated. The sorter creates N queues; each queue represents a writable account. The forwarder should forward in RR from the N queues, taking some amount of CUs up to the block limit

On the leader side we should get a mix of the highest paying TXs from the top accounts from all the validators so we shouldn't see anyone starved.

  1. high paying fees shouldn't be starved by any other txs
  2. high paying saturated accounts shouldn't starve every one else
  3. small staked nodes should be able to propose their txs so leader can observe high paying fees (1 and 2) from everyone on the network

Basically, however we are packing the block now, we can continue to do so, except instead of moving TXs into banking stage they are just forwarded, and the current node observes the same block limits, etc...
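
A sketch of that RR-from-N-queues drain (illustrative types only; each per-account queue is assumed already sorted by fee-per-cu, highest first):

```rust
use std::collections::VecDeque;

// Sketch only: take up to the per-account CU limit from each queue in
// turn, stopping when nothing else fits under the block limit.
struct QueuedTx {
    cus: u64,
}

fn round_robin_drain(
    queues: &mut [VecDeque<QueuedTx>],
    per_account_limit: u64,
    block_limit: u64,
) -> Vec<QueuedTx> {
    let mut out = Vec::new();
    let mut block_used = 0u64;
    let mut taken = vec![0u64; queues.len()];
    let mut progress = true;
    while progress {
        progress = false;
        for (i, q) in queues.iter_mut().enumerate() {
            let fits = match q.front() {
                Some(tx) => {
                    taken[i] + tx.cus <= per_account_limit
                        && block_used + tx.cus <= block_limit
                }
                None => false,
            };
            if fits {
                let tx = q.pop_front().unwrap();
                taken[i] += tx.cus;
                block_used += tx.cus;
                out.push(tx);
                progress = true;
            }
        }
    }
    out
}
```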

@carllin
Contributor

carllin commented Feb 25, 2022

@aeyakovenko

high paying fees shouldn't be starved by any other txs

For this to be true, in this round robin, for each queue Q we have to ensure that we never replay a later transaction in Q if an earlier one has not been replayed yet, right?

For instance if you have two queues:

Queue A: {A, 10000}, {AB, 10000}
Queue B: {AB, 10000} (same transaction as in Queue A), {B, 1}

If the order of events is doing round robin:

  1. {A, 10000} from Queue A scheduled
  2. {AB, 10000} from Queue B not scheduled because there's contention on account A
  3. We then do not schedule {B, 1} in Queue B, even if it could be replayed without contention, because an earlier transaction in Queue B has not been played yet.

@carllin
Contributor

carllin commented Feb 25, 2022

The sorter creates N queues; each queue represents a writable account. The forwarder should forward in RR from the N queues, taking some amount of CUs up to the block limit

Seems like you also need top-level prioritization of which bucket to look at first based on fees; it shouldn't be a blind round robin. You could easily have 100,000 buckets for different write accounts, the majority of them low-fee, and only a couple containing high-fee transactions that we need to prioritize
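
One way to get that top-level prioritization without scanning every bucket is a max-heap keyed by each bucket's head fee, sketched here with illustrative types:

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Sketch only: the scheduler always visits the most valuable bucket
// next instead of blindly round-robining over, say, 100,000 of them.
#[derive(PartialEq, Eq)]
struct BucketHead {
    head_fee_per_cu: u64,
    bucket_idx: usize,
}

impl Ord for BucketHead {
    fn cmp(&self, other: &Self) -> Ordering {
        // Max-heap ordered by the fee of the bucket's best pending tx.
        self.head_fee_per_cu.cmp(&other.head_fee_per_cu)
    }
}

impl PartialOrd for BucketHead {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Pop the bucket whose head tx pays the most; after taking one tx the
/// caller pushes the bucket back with its new head fee.
fn next_bucket(heads: &mut BinaryHeap<BucketHead>) -> Option<usize> {
    heads.pop().map(|h| h.bucket_idx)
}
```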

@tao-stones
Contributor

Thinking it might be simpler to select TXs from the sorted list that can fill up a block (with account limits), without worrying about account contention yet. Then submit the selected TXs to the bank for execution; from this point on, it works the same way as it does today - a TX that can't get its account(s) locked will be put back in the buffer as retryable, to be reconsidered for the next bank.

@jstarry
Member

jstarry commented Feb 26, 2022

@taozhu-chicago @carllin @t-nelson

This shouldn't be that complicated. The sorter creates N queues; each queue represents a writable account. The forwarder should forward in RR from the N queues, taking some amount of CUs up to the block limit

In my opinion, this problem is inherently complicated because packing conflicting transactions efficiently by fee weight is not trivial. We are going to have to make some tradeoffs here and I think we just need to get aligned on what those tradeoffs are (including attack vectors) in a proposal.

@t-nelson
Contributor

t-nelson commented Feb 26, 2022

The fact that we're dealing with an NP-hard problem, for which we want an online, time-constrained solution, should preclude "simple"

@ryoqun
Member

ryoqun commented Feb 27, 2022

(hi, let's make the party even merrier. lol)

here's another rando idea: on-chain per-account auctioned mempool with deterministic scheduling (this is like a pre-proposal; did we consider something like this?)

pros

  • no off-chain mechanism for tx ordering decision. (personally, i want to avoid those gaming factors even if the power is weighted by stakes)
  • tx with less contentious accounts will be executed cheaply, as before.

cons

  • yes, mempool! we're the same as ETH in a sense. (but the tx should be very cheap and congestion should be local to specific programs)
  • introduces a bit of single-threaded code
  • rather a big change

the current num-of-sigs tx fee is re-interpreted as a block-inclusion-fee (bi-fee). this fixed fee is now strictly modeled as the sig-verify cost plus the tx blob propagation cost across the cluster. now, no execution is guaranteed; older contentious txes will be pruned from the mempool according to recent_blockhash without execution at all, while the bi-fee (also the ee-fee; described below) is debited.

additionally, users can now tip more as an expedite-execution-fee (ee-fee) with the bidding mechanism detailed below.

background/thoughts:

  • imo, the underlying problem is that clients can't know whether their txes are observed or not by the cluster in the first place. so, they blindly send a ton.
    • so, make tx inclusion very reliable and observable.
    • finally shift the scheduling problem to the original tx submitters (after all, they should know the value of tx most accurately)
  • maybe leader's outgoing bandwidth can be better utilized for turbine not for (possibly redundant) tx forwarding. so pipe all the possibly unused txes into the downstream as much as possible (while collecting tx fees from greedy bots)
    • there should be little security concern for so many txes in mempools; memory should be cheap and ample, compared to other resources.
  • no matter what, people will compete for tx prioritization on-/off-chain, (c)overtly. so let's introduce efficient, transparent, and permissionless markets for it.
  • (bonus) the hard-to-debug my-(dropped)-tx-not-found-on-explorer problem will also be solved.

(for simplicity, assume write accounts only, and that tx runtime is rather uniform)

pending txes:

tx1 acc_a, acc_b (ee-fee: 300)
tx2 acc_a, acc_c (ee-fee: 250)
tx3 acc_a (ee-fee: 20)
tx4 acc_b (ee-fee: 400)
tx5 acc_e (ee-fee: 0)

(informative only) mempools for each account (ordered by ee-fee)

mempool acc_a
300 tx1
250 tx2
20 tx3

mempool acc_b
400 tx4
300 tx1

mempool acc_c
250 tx2

mempool acc_e
0 tx5

so, a single execution batch packing algorithm is simple:

  1. maintain all pending txes sorted by ee-fee
  2. pick highest paying transaction and lock the accounts and put it into the batch
  3. pick the next tx
  4. if the accounts are already locked, skip
  5. if the accounts not locked, lock them and put it into the batch
  6. continue until some kind of maximum concurrent execution limit (like batch max size) is reached.

lastly, the scheduled batch would be like this:

batch1: tx4, tx2, tx5
batch2: tx1
batch3: tx3


finally, the tx submitters can plan ee-fee based on their urgency and prevalent bidding info, and re-adjust ee-fee if their tx sits in the mempool for too long.
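
A sketch of the batch-packing steps above, checked against the tx1..tx5 example (illustrative types only):

```rust
use std::collections::HashSet;

// Sketch only: steps 1-6 of the packing algorithm above.
#[derive(Clone)]
struct MempoolTx {
    name: &'static str,
    writes: Vec<&'static str>,
    ee_fee: u64,
}

fn pack_batches(mut pending: Vec<MempoolTx>, max_batch: usize) -> Vec<Vec<&'static str>> {
    // step 1: keep all pending txes sorted by ee-fee, highest first
    pending.sort_by(|a, b| b.ee_fee.cmp(&a.ee_fee));
    let mut batches = Vec::new();
    while !pending.is_empty() {
        let mut locked: HashSet<&str> = HashSet::new();
        let mut batch = Vec::new();
        let mut rest = Vec::new();
        for tx in pending {
            // steps 2-5: lock the accounts if free, otherwise skip the tx
            if batch.len() < max_batch && tx.writes.iter().all(|a| !locked.contains(a)) {
                locked.extend(tx.writes.iter().copied());
                batch.push(tx.name);
            } else {
                rest.push(tx);
            }
        }
        batches.push(batch);
        pending = rest;
    }
    batches
}

// With tx1..tx5 from the example and max_batch >= 3 this yields
// [["tx4", "tx2", "tx5"], ["tx1"], ["tx3"]], matching the batches above.
```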

@tao-stones
Contributor

Proposal to review Leader QoS #23369
@jstarry @t-nelson @carllin @aeyakovenko

@t-nelson
Contributor

(hi, let's make the party even merrier. lol)

here's another rando idea: on-chain per-account auctioned mempool with deterministic scheduling (this is like a pre-proposal; did we consider something like this?)

We're looking for less complex solutions, ATM. This sounds cool though! Maybe a future effort 😂

@carllin
Contributor

carllin commented Mar 2, 2022

Hmm, isn't this why we have block-limit and account-limit, to help in such a scenario? Plus allowing additional-fee to give AB a better chance?

I feel I am missing something. Can you describe what you'd do to actively avoid this starvation scenario?

Here's a proposal @t-nelson and I were brainstorming on: https://github.com/solana-labs/solana/blob/539cb5c857982444d4d5b238638e8028471e3c70/docs/src/proposals/fee_transaction_priority.md, please take a look as it goes into more detail

@tao-stones
Contributor

tao-stones commented Mar 2, 2022

Here's a proposal @t-nelson and I were brainstorming on: https://github.com/solana-labs/solana/blob/539cb5c857982444d4d5b238638e8028471e3c70/docs/src/proposals/fee_transaction_priority.md, please take a look as it goes into more detail

The proposal looks great! 👍🏼 Is there a PR to put commits?

  • +1 for separation of responsibility; putting the prioritization algo into a self-contained logical model is neat. Potential concern is if it requires an additional copy.
  • +1 for moving the account contention check up in the pipeline; I still have a question wrt the No. 2 point in the proposal tho, mainly on benefit/cost. The proposed approach seems to guarantee fee prioritization within each block, with additional gatekeeping logic; vs. what I did, which lets lower-priority TXs into the current block even if they require accounts needed by a higher-priority TX, but ensures the higher TX has its place in the next block (i.e., fee prioritization across slots).
  • One thing to point out: it should use (additional_fee + base_fee) / requested_cu as the fee-per-cu to prioritize transactions, instead of just using additional_fee, so 100 additional lamports for a 1,000 CU transaction should have lower priority compared to 10 additional lamports for a 10 CU transaction. The thing is the base_fee (eg signature, write lock etc) is bank-dependent, as the fee_schedule changes over epochs.
  • One question: are we doing away with packet_batches?

@carllin
Contributor

carllin commented Mar 2, 2022

Let's move this discussion to the proposal! 😄

@gshum

gshum commented May 17, 2022

What about txs involving the same address that need to be run serially? Is there some way to up fees on them or something like that? Seems like it creates a bottleneck, but I've never analyzed Solana traffic so I can't say 100%

@solana-labs solana-labs deleted a comment Jun 13, 2022
@aeyakovenko aeyakovenko changed the title compute budget/qos/fee priority change summary Network Stability rollout for quic/qos/fee markets Jun 15, 2022
@kasoutsuuka

How do we GUARANTEE a leader will process transactions through QoS?

@hydrogenbond007
Contributor

Hey, had a question. Been going through this report and saw the test results in quic-go/quic-go#2586 (comment); was a bit confused. Isn't quic supposed to be more performant than TCP? And what are the effects of quic on QoS?
Sorry if the questions sound dumb, still learning

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Apr 22, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Apr 29, 2024