Scheduler - prioritization fees/cost #34888
Conversation
Prioritization function in this PR should change slightly when the SIMD-0096 feature gets implemented: #34731
Codecov Report
Attention:
Additional details and impacted files:

@@            Coverage Diff             @@
##           master    #34888      +/-  ##
===========================================
+ Coverage        0     81.6%     +81.6%
===========================================
  Files           0       831       +831
  Lines           0    224808   +224808
===========================================
+ Hits            0    183522   +183522
- Misses          0     41286    +41286
signature_fees
    .saturating_add(priority_fee)
    .saturating_div(cost)
Commenting here just as a note, because my original push was wrong here and did (EDIT(ryoqun): the code was signature_fees / cost + compute_unit_price), which was slightly incorrect since transaction cost is greater than the requested_cus, i.e. the bpf cost. The "priority" (cu_price) is the price of only the requested_cus, whereas cost is what counts towards our limits/supply. So we should be prioritizing to get the best price per unit taken out of the supply.
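A minimal sketch of the two formulas under discussion (the free functions and plain u64 arguments are stand-ins for the PR's actual method and types; cost is assumed non-zero):

```rust
// Original (incorrect) push: cu_price is added *after* the division, so it
// is never scaled against cost and can dominate the result.
fn priority_incorrect(signature_fees: u64, cost: u64, compute_unit_price: u64) -> u64 {
    signature_fees
        .saturating_div(cost)
        .saturating_add(compute_unit_price)
}

// Corrected version from the diff above: total fees divided by total cost,
// i.e. the price paid per unit taken out of the block's cost supply.
fn priority_corrected(signature_fees: u64, priority_fee: u64, cost: u64) -> u64 {
    signature_fees
        .saturating_add(priority_fee)
        .saturating_div(cost)
}
```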
There's also a subtle difference here that currently you can game cu_price and get added priority without paying additional fees. Since we are calculating priority fees here, that boosted prioritization goes away.
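An illustrative (invented) set of numbers for that gaming: under the old cu_price-only ordering, a transaction requesting 300 CUs at a cu_price of 10_000_000 micro-lamports/CU pays only about 3_000 lamports of priority fee (10_000_000 * 300 / 1_000_000) yet outranks a transaction at cu_price 1_000_000 that requests 200_000 CUs and pays about 200_000 lamports. Ranking by fees/cost instead ties the rank to what is actually paid per cost unit.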
Adding the tx signature fee into the priority calculation makes sense from the validator's perspective, but I want to point out the cost impact to transaction submitters. The priority order changed: two txs with the same cu_price now rank differently, and if the larger tx_2 wants to catch up to the same priority as tx_1, it'd have to add 0.04 lamports/cu to its cu_price.
Realized several tests were tied to the cost model and fee-budget implementations. Also realized I never use anything but the sum of the cost, so I just changed to storing that in the meta instead.
lgtm - another possible solution would be replacing the static "signature-fee" with a base_fee = cu_price * cost, where "cost" includes the CUs used for sig verify. In that case, calculate_prioritization() would simply equal cu_price (since fee/cost = (cu_price * cost) / cost = cu_price). But that'd be a consensus-breaking change. This PR gets the same result, but quicker.
Another thought: due to the impact on end users, it might be a good idea to have community sign up before merge, wdyt?
what do you mean by "community sign up"?
A second approval perhaps?
id,
transaction_ttl,
TransactionPriorityDetails {
    priority,
how about renaming this to compute_unit_price in this pr or as a prep pr? two things will be referred to as priority after this pr, which is confusing.
also, rename the struct to ComputeBudgetDetails to remove Priority.
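A sketch of what the suggested rename might look like (the field set is assumed from context, not taken from the PR):

```rust
// Before: the struct name and its `priority` field both overload "priority".
pub struct TransactionPriorityDetails {
    pub priority: u64,
    pub compute_unit_limit: u64,
}

// After: the names say what the values actually are.
pub struct ComputeBudgetDetails {
    pub compute_unit_price: u64,
    pub compute_unit_limit: u64,
}
```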
// For the purposes of calculating prioritization, we multiply the fees by a large number so that
// the cost is a small fraction.
const MULTIPLIER: u64 = 1_000_000;
total_fee.saturating_mul(MULTIPLIER).saturating_div(cost)
nit: it seems cost can't be 0 as far as i read the .calculate_fee() code. however, i think every .*_div(cost) or / needs an explicit comment for the case of 0, or explicit handling (maybe 0 priority in this case) if it's fed with user-controlled inputs.
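A sketch of the explicit handling suggested here; the zero check and the fallback priority of 0 are assumptions, not the PR's actual code:

```rust
fn checked_priority(fee: u64, cost: u64) -> u64 {
    const MULTIPLIER: u64 = 1_000_000;
    if cost == 0 {
        // saturating_div still panics on a zero divisor, so a degenerate,
        // user-controlled cost gets the lowest priority instead.
        return 0;
    }
    fee.saturating_mul(MULTIPLIER).saturating_div(cost)
}
```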
@@ -309,21 +319,24 @@ impl SchedulerController {
        let mut error_counts = TransactionErrorMetrics::default();
        for chunk in packets.chunks(CHUNK_SIZE) {
            let mut post_sanitization_count: usize = 0;
-           let (transactions, priority_details): (Vec<_>, Vec<_>) = chunk
+           let (transactions, budget_limits): (Vec<_>, Vec<_>) = chunk
nit: how about naming these transaction_vec and budget_limits_vec, so that budget_limits doesn't refer to 2 distinct types: Vec<FeeBudgetLimits> here and plain FeeBudgetLimits in the following for loop.
i think this pr is small code-wise but needs careful thought (I'll post such thoughts later). also, i wonder whether we should do some treatment for the …
I know this general spirit. However, is there a strong/concrete motivation in the real world?
My main push for this comes from the fact that it incentivizes accurate requests. Requesting lower CUs means a lower TransactionCost, which means you get boosted priority. This means that just by measuring your app and making accurate requests for those CUs, you can get priority for free. Wildly inaccurate cu requests cause so many problems in banking, replay, and elsewhere. Even so, there are still often non-conflicting transactions that do not use priority fees at all. Let's assume these txs could make reasonably accurate requests of 65k and 135k.
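A back-of-the-envelope illustration of that boost (all numbers assumed: cost is modeled as requested CUs plus a fixed sig-verify component, ignoring write locks and data-byte costs, and the tx pays only the 5,000-lamport base fee):

```rust
fn main() {
    const MULTIPLIER: u64 = 1_000_000;
    const BASE_FEE: u64 = 5_000; // lamports, single fee-payer signature
    const SIG_VERIFY_COST: u64 = 720; // assumed fixed cost component

    // Same transaction, default-ish vs. accurate CU request.
    for requested_cus in [200_000u64, 65_000] {
        let cost = requested_cus + SIG_VERIFY_COST;
        let priority = BASE_FEE * MULTIPLIER / cost;
        println!("requested {requested_cus:>7} CUs -> priority {priority}");
    }
    // prints:
    // requested  200000 CUs -> priority 24910
    // requested   65000 CUs -> priority 76080
}
```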
let cost = CostModel::calculate_cost(&transaction, &bank.feature_set).sum();
let priority = Self::calculate_prioritization(
    fee_structure,
    include_loaded_account_data_size_in_fee,
    transaction.message(),
    &budget_limits,
    cost,
);
imo, this call-site should look like this, to illustrate its high-level relationships for humans at a quick glance:

let (priority, cost) = Self::calculate_priority_and_cost(
    &transaction,
    fee_structure,
    &compute_budget_limits,
    // include_loaded_account_data_size_in_fee // <= depends on perf. penalty of .feature_set().is_active(...) invoked every time for txes.
);

then, the call hierarchy should look like this:

- calculate_priority_and_cost()
  - calculate_fee()
  - calculate_cost()
  - fee.saturating_mul(MULTIPLIER).saturating_div(cost)

these 2 new calls are complex in shape; i think hiding the details in this context (a fn named buffer_packets()) should improve readability.
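A self-contained sketch of the suggested shape; every type and the fee formula here are simplified stand-ins for the real solana types, purely to show the call hierarchy:

```rust
struct FeeBudgetLimits {
    prioritization_fee: u64, // cu_price * requested_cus, precomputed in lamports
}

fn calculate_fee(signature_count: u64, budget_limits: &FeeBudgetLimits) -> u64 {
    const LAMPORTS_PER_SIGNATURE: u64 = 5_000;
    signature_count
        .saturating_mul(LAMPORTS_PER_SIGNATURE)
        .saturating_add(budget_limits.prioritization_fee)
}

// One entry point returning both values, so buffer_packets() only sees the
// high-level relationship between fee, cost, and priority.
fn calculate_priority_and_cost(
    signature_count: u64,
    budget_limits: &FeeBudgetLimits,
    cost: u64,
) -> (u64, u64) {
    const MULTIPLIER: u64 = 1_000_000;
    let fee = calculate_fee(signature_count, budget_limits);
    let priority = if cost == 0 {
        0 // see the division-by-zero discussion above
    } else {
        fee.saturating_mul(MULTIPLIER).saturating_div(cost)
    };
    (priority, cost)
}
```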
    budget_limits: &FeeBudgetLimits,
    cost: u64,
) -> u64 {
    let total_fee = fee_structure.calculate_fee(
nit: I think calling this var just fee is fine, because there are no other *_fees other than this total_fee.
// We need a multiplier here to avoid rounding down too aggressively.
// For many transactions, the cost will be greater than the fees in terms of raw lamports.
// For the purposes of calculating prioritization, we multiply the fees by a large number so that
// the cost is a small fraction.
const MULTIPLIER: u64 = 1_000_000;
good commentary. ❤️
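As a concrete (assumed) illustration of the rounding problem: a transaction paying only the 5,000-lamport base fee with a cost of 200,000 CUs computes 5_000 / 200_000 = 0 in integer math, collapsing all such transactions to priority 0; with the multiplier it computes 5_000 * 1_000_000 / 200_000 = 25_000, so fee and cost differences still order transactions.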
#[test]
fn test_retry_transaction() {
    let mut container = TransactionStateContainer::with_capacity(5);
    push_to_container(&mut container, 5);

    let id = container.pop().unwrap();
    let transaction_ttl = container
        .get_mut_transaction_state(&id.id)
        .unwrap()
        .transition_to_pending();
    container.retry_transaction(id.id, transaction_ttl);
    assert_eq!(container.priority_queue.len(), 5);
}
i wonder how this test is related to the pr's actual changes?
) -> u64 {
    let total_fee = fee_structure.calculate_fee(
        message,
        5_000, // this just needs to be non-zero
I read the called code... i won't insist for yet another prep pr for cleaning up this api mess. ;)
here it is! To be honest, I'm not a super fan of this change. However, this isn't an on-chain/protocol-level change; we can revisit it at will. So I'll approve this pr once all code review comments are addressed.

As for communication stuff for the community as a whole, i think some explanation is needed in some docs (maybe after the central scheduler is enabled by default). And there should be some way to expose this priority via rpc.

The most fundamental issue is that solana's priority fee / cost relationship is awkward in the first place. And I'm afraid this pr looks like a band-aid (hopefully short-term) for the situation, rather than a step of a well-principled solution.

Putting aside all the status quo, historical context, multi-dimensional fees and bla bla, i think solana can loosely model the cost of any given transaction (strictly, the stake-averaged wall time of executing it, imo) with a single scalar value. Then, users should be able to price that cost depending on their perceived intrinsic value of their transactions, as they like, with yet another single scalar value. The former should be the CU. The latter should be the CU_PRICE. (well, don't nit-pick on naming..) However, this isn't true today: CU doesn't factor in all the variables which affect execution duration (sig verify, scheduling), and CU_PRICE only covers the on-chain program execution.

So, I'm saying that in the ideal world CU should cover all of the costs (including the base cost of fee-payer sig verify), CU_PRICE should be applied to that all-encompassing CU, and that should be all that's needed for deriving the priority of a given transaction. It conceptually sounds odd to me that the hard-coded fixed 5000 lamports takes part in the priority calc after this pr, meaning it's gaining additional utility, while I totally admit the simple total_fee/total_cost=priority formula makes sense at the highest level.

some examples of the underlying current problems:
Also, this pr informally establishes a fee/cost ratio from the sig verify, i.e. 5000 lamports / 720 cost units (roughly 6.9 lamports per cost unit).
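To spell out the contrast being drawn here, assuming a single-signature transaction (formulas taken from the diffs above, with cu_price in micro-lamports per CU):

```
ideal:    priority = cu_price   (cu_price applied to an all-encompassing CU)
this PR:  priority = (5_000 + cu_price * requested_cus / 1_000_000) * MULTIPLIER / cost
```

where cost > requested_cus because it also counts sig verify, write locks, etc.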
Thanks for explaining in detail. I think I'm well exposed to the context. That said, I still wonder whether this tweak can really motivate people to limit compute units. After all, i don't think they care enough about this, and I think that's the reason they haven't requested cus for years. I wonder whether they'll appreciate the boosted priority. I think a more direct approach could be taken now that cu_price has been around for a while, like increasing fees 10x for not specifying the ComputeBudget IXes.
@ryoqun after merging the previously requested renames, I started rebasing this. Ended up running into several conflicts which would have created more work with the original commits, so I just squashed & reworded several commits to minimize them.
Not sure I agree with this. This prioritization function is outside of consensus; it's just the leader making an effort to maximize reward. I will likely be testing other modifications to this function to consider the opportunity cost of writing to accounts. So it's entirely possible that this prioritization becomes something we cannot expose via RPC, since it may consider pending transactions held only by the leader. For example, say I have 4 transactions (with exaggerated CUs to show what happens as we run into block limits):
even though the reward/cost of 1 is higher, we may want to take (2, 3, 4) instead, because the total fee collection is higher while using less total CUs, and also using less of the CU-space in accounts A and B, which are potentially hot. Leaders are incentivized to increase reward collection, and the code we provide should make a reasonable effort at doing so. With this PR, savvy operators could pretty easily replace this prioritization function with their own if they felt it could do a better job in the auction of blockspace. Long-term, the transaction-selection method(s) we choose to use will almost certainly not be simply representable by a single statically calculated u64 - this is a natural consequence of the complex and dynamic markets which leaders are incentivized to adapt to.

With all of that said, I do not think UX changes tremendously. We can still estimate fees based on recent fees on the same state. Generally, I think these few properties will tend to hold, regardless of the details of determining inclusion order:
I will add we should communicate point 2 somehow since this threshold will be calculable.
Agree the change in this PR comes from that difference, but I think there will remain a gap even if we were to make CU all-encompassing and the single cu-price the only reward. See above comments and example about this.
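Since the 4-transaction table above didn't survive extraction, here is a hypothetical set of numbers in the same spirit (all values invented; assume a 48M-CU block limit and ignore the per-account dimension):

```
tx   fee (lamports)   cost (CUs)   fee/cost
1        16_000          40M       0.000400
2         6_000          16M       0.000375
3         6_000          16M       0.000375
4         6_000          16M       0.000375
```

Strict fee/cost ordering takes tx 1 first, leaving only 8M CUs, so nothing else fits and the block collects 16,000 lamports. Taking (2, 3, 4) instead fills the 48M limit and collects 18,000 lamports. The per-account limits on hot accounts A and B in the original example add a second dimension, in which skipping tx 1 can also save account CU-space.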
seems there are still some conflicts..
Yeah, seems #35119, which moved an import into a different crate, got merged an hour after I pushed yesterday. Bad timing!
no problem. thanks for addressing my review comments!
Thanks for yet another great write-up. I think we're getting to the bottom line of this constructive discussion: there's a conflict of interest, and which interested party should be represented first? i think it's end users, and you think it's validator operators. firstly, i agree with all of the points you raised, if I also side with the best interest of validator operators. Also, I'll approve this pr as said before, shortly after posting this comment, as i understand this pr is done in good spirit. note that I'm kind of an extremist and quite opinionated.

The reason for my alignment with end users is a bit nuanced; however, the gist is that end users are the source of the value of solana's existence. ie. no non-vote transactions, no meaning of solana. so, transaction processing should be fair and its mechanism should be predictable and transparent.
this is a nice example to illustrate the different stances. imo, 1 should be chosen, following simple and transparent transaction-inclusion rules for the sake of end users.
so, to reiterate my opinion, I still think transaction 1 should be chosen even though this is sub-optimal for validators, preferring to stick to strict priority adherence. that's why I'm nit-picking even the slightest incentive skew in this pr. (no intention to block your work, though; just pointing out my opinion)

if i were with validator operators, I'd just implement some vote-tx manipulation mechanism and generalized front-running, or recommend using jito. putting aside ethical concerns (the context here is blockchain), that's way more effective for maximizing their rewards ;) however, that's not what i want to do. conversely, I'm with end users. so to speak, i think this is a dilemma.

On the other hand, I'm aware that this is far from the actual reality on solana or even on other blockchains. validator operators aren't that evil in practice, and end users aren't worried, to the point of paranoia, about casual state-level censorship around them. ultimately, i want to obsolete this philosophical difficulty altogether with the scrambled-transaction thing, by means of a technical measure removing the possibility of arbitrary transaction censoring and reordering by leaders.
yeah, i agree. i don't think the change in this pr per se materially deviates from the general properties. I'm mainly talking on a theoretical basis, to clarify the different opinions on the general direction of banking stage improvements.
lgtm with nits.
@ryoqun - addressed the last nit (added a comment on github to hopefully clarify confusion about the comment in code). Also, since I'm really enjoying this back-and-forth on weighing the interests of validators and users, let's continue!
I suppose that is indirectly my opinion. I actually believe that a self-serving validator that is optimizing for profit also serves the interests of the users; at least to an extent/limit. I have a lot of thoughts on this and should probably write them up somewhere!
I definitely agree that users are the primary source of value to the network! Actually, I think as the ecosystem evolves, and assuming we do not have the capability to enforce such things at the protocol level, there need to be tools developed to identify validators engaging in these nefarious behaviors. RPCs could avoid sending their users' transactions to those nodes, which ultimately would make the behavior less profitable overall - due to essentially having access cut off from many RPCs, who do have their users' best interests in mind.
Your opinion is actually interesting to me here. From my view, by the validator optimizing for profit and ignoring a strict priority rule, it is actually serving more users. So it seems to me that this self-serving by the validator also helped users. There's potentially an argument to be made that different validators running different styles of compute-unit auctions is bad for UX, because it is harder for the user to predict. Current behavior, and still even with this PR, is similar to a first-price auction, whereas what I described earlier would be more similar to a knapsack auction.
I have been interested in this proposal, but am unsure the trade-off against speed is something the Solana community would accept - although I think there are many concrete benefits to it.
lgtm
I'll reply to the fun discussion later. this lgtm is just to unblock merging this nice pr.
finally! here's the quite delayed reply from me...
👍
Yeah, i agree that there's some sweet spot where both validators and users are happy. That's why solana, and any blockchain for that matter, works in the expected way today. So, I'd say this established fact has great practical significance. However, from the theoretical standpoint, blockchains should be usable and resilient in extremely adversarial environments, in the face of their existential question. I think this guarantee is what blockchains provide, while centralized systems can't. And I want solana to achieve this technical breakthrough.
hehe, I'll look forward to it.
i agree with this as well. thanks for writing down your thinking.
Interesting idea. These tools will be nice to have. That said, i'm living under a threat model where even rpc providers can't be trusted. I'm not day-dreaming that every retail end user will spawn an aws instance just to see their balances on a dapp; however, any serious mms/arbs should run their rpc nodes by themselves. that means running rpc should be easy and have no barrier to entry (i.e. i don't like, and actually am planning to ditch, stake-weighted qos). along those lines, this off-chain reputation mechanism isn't desirable under that threat model either...
Let me elaborate my thinking on this aspect. In my opinion, this will erode the trustworthiness of the compute unit price for users. If there's some chance to cut in line at lower cost, why would they line up obediently? Then, this underprices or overprices cu. I'm not an econ expert, but this larger variance will entail some skew in either direction. This means one party exploits the other, because this is a zero-sum game in the pursuit of maximizing the welfare, in some measured unit, as a whole. thus, validators' expected return could be reduced as a whole. All that being said, i think this is unfair. Overall, it hampers price discovery.

Also, the knapsack situation arises from the fact that there are discrete slot/block boundaries, which are arbitrarily created due to the existence of cu limits. imo, that's the problem in the very first place. Ideally, everything should be continuous, i.e. block boundaries should be determined a posteriori, after the fact of the actual wall-time consumed executing txes.
Yeah, I second implementing some reasonable default strategy in the short term. In the long term, I'm leaning towards introducing huge changes to the protocol.
Yeah, definitely, this is the hardest call. I'm betting my proposal won't be accepted due to pure lack of interest from end users. Still, I'll struggle as much as i can.
I won't say this is impossible at the app level. However, doing this at the l1 level will make things a lot easier for dapp devs, which in turn should attract more dapps. also, true rng (a by-product of scrambled tx) is needed to tackle some long-standing dilemmas at the l1 level.
* v1.18: Scheduler - prioritization fees/cost (solana-labs#34888)
* add commented out functions - simplify diff
Problem
Summary of Changes
Blocked by #34881
Fixes #