expand block capacity calc #24664
Conversation
runtime/src/cost_model.rs
Outdated
  tx_cost.writable_accounts.push(*k);
- tx_cost.write_lock_cost += WRITE_LOCK_UNITS;
+ write_lock_cost += WRITE_LOCK_UNITS;
nit: use `saturating_add` instead
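For example, a sketch of what that suggestion might look like (not the exact diff):

```rust
// Saturate rather than `+=` so the running total can't silently overflow in release builds.
write_lock_cost = write_lock_cost.saturating_add(WRITE_LOCK_UNITS);
```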
ledger/src/blockstore_processor.rs
Outdated
let transaction_cost = cost_model.calculate_cost(transaction);
additional_costs += transaction_cost.signature_cost;
additional_costs += transaction_cost.write_lock_cost;
additional_costs += transaction_cost.data_bytes_cost;
Looks like `builtins_execution_cost` is missing as part of "additional_costs".
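A sketch of the suggested addition (assuming the field named above exists on the calculated cost):

```rust
// Also count builtin (native) program execution in the static total.
additional_costs += transaction_cost.builtins_execution_cost;
```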
@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
    execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
probably want to benchmark the performance impact of adding this function during replay.
let cost_model = CostModel::new();

for transaction in transactions {
    let transaction_cost = cost_model.calculate_cost(transaction);
+1 for using the exact same method to compute tx cost, though I'm not sure if there's a performance impact. By the same token, maybe look into the possibility of replacing `cost_capacity_meter` with `cost_tracker` to avoid duplicated logic?
I like the cost capacity meter for its simplicity, I think. Using the CostTracker to handle this is a large hammer for a very small nail, and I think it would be more confusing (because it tracks all sorts of stuff, reports it, etc., none of which we need here).
The motivation for this change is to have a safety check that prevents a validator from being sucked into replaying a large block for a long time. A "large block" could be one that exceeds `block_max_limit` (the capacity meter sort of covers that), or one with txs that exceed `account_max_limit`, which is not checked. Cost_tracker handles both cases, so it would be good to reuse it for consistency.
  let remaining_block_cost_cap = cost_capacity_meter
      .write()
      .unwrap()
-     .accumulate(execution_cost_units);
+     .accumulate(execution_cost_units + additional_cost_units);

  debug!(
Do you see any blocks being dropped by this gating feature during tests? I'm very interested to see whether a block produced by a leader would be considered too large by validators. Such a case should be carefully avoided.
Not in normal tests, but when I made the cap lower (by hacking the code), I could make tests fail, and I added a few print! statements to show that valid numbers were coming through in the additional_cost section of this.
Is the motivation for this change elaborated in an issue somewhere?
@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
    execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
I would prefer we didn't sum up all these miscellaneous costs for v1. I don't want all these other costs getting grouped together under the same units as execution cost.
If we wanted, for instance, in v2 to limit the number of sequential operations, then we could have a maximum compute per account tracked in a separate data structure. But I don't think they should be mixed into the same unit of measurement.
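As a purely hypothetical illustration of that separation (all names here are made up for this sketch, not existing code), the per-account bound could live in its own structure rather than being folded into the block-level execution units:

```rust
use std::collections::HashMap;
use solana_sdk::pubkey::Pubkey;

// Hypothetical v2 sketch: track compute units per writable account in a
// separate structure instead of mixing them into the block-level total.
#[derive(Default)]
struct PerAccountComputeTracker {
    units_by_account: HashMap<Pubkey, u64>,
}

impl PerAccountComputeTracker {
    /// Returns false (and records nothing) if adding `units` would push any
    /// of the writable accounts past `max_units_per_account`.
    fn try_add(&mut self, writable_accounts: &[Pubkey], units: u64, max_units_per_account: u64) -> bool {
        let would_exceed = writable_accounts.iter().any(|key| {
            self.units_by_account.get(key).copied().unwrap_or(0) + units > max_units_per_account
        });
        if would_exceed {
            return false;
        }
        for key in writable_accounts {
            *self.units_by_account.entry(*key).or_insert(0) += units;
        }
        true
    }
}
```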
I think Jason here sums up the CUs that account for those "const" costs - sigverify, write lock, etc. These CUs are needed if we check against `block_limit`. `account_limit` is not yet checked in this version.

Jason earlier brought up a good concern: if a leader packed a block to `block_limit + 1` CUs (given all the estimated/actual cost adjustments, it is possible), should replay_stage then abandon the block because it is 1 CU above the limit? Probably not. Maybe we want replay to use a different limit. If that is the case, then we may as well set up a `bpf execution limit` for replay, and then we don't need to sum up the other costs. That `bpf execution limit` would just be totally arbitrary.
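If replay did use its own limit, the check itself could be as simple as this (hypothetical names and value, just to show the shape of the alternative):

```rust
// Hypothetical, arbitrarily chosen replay-only execution limit, per the discussion above.
const MAX_REPLAY_BPF_EXECUTION_UNITS: u64 = 50_000_000;

// Gate replay on executed compute units only, without summing the static costs.
fn exceeds_replay_limit(accumulated_execution_units: u64) -> bool {
    accumulated_execution_units > MAX_REPLAY_BPF_EXECUTION_UNITS
}
```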
Carl, I think these are already all in the same unit of measurement. The original code totaled the cu used by the transaction's code, but did not include the static costs, which ARE totaled by the leader when creating the block.
This PR to accumulate additional costs seems unnecessary. What's missing is enforcement on the leader side in banking stage to ensure the leader doesn't pack more than the execution cost limit into the block.
I was just told we've already moved away from estimated CUs to actual computed CUs on the leader banking stage, so we just need to reuse that limit here.
So the new workflow on the leader side is: use the … However, the …
As I understand it, the main concern is that a leader could choose to put more transactions into a block than it should, which would increase its profits (as long as they didn't overdo it). Essentially overclocking the system, which would work as long as enough machines could still finish in time and reach consensus. This validator-side check would prevent that hack from being profitable.
Codecov Report
@@            Coverage Diff             @@
##           master   #24664     +/-   ##
=========================================
- Coverage    70.1%    70.0%     -0.1%
=========================================
  Files          38       37        -1
  Lines        2303     2301        -2
  Branches      325      325
=========================================
- Hits         1615     1613        -2
  Misses        573      573
  Partials      115      115
Force-pushed from daf86f6 to f76c784.
yeah sure, but we're kinda adding pseudo consensus here, though. Two implementations means we're going to screw up modifying it in the future. Prefer to unify them (in …)
    }
});
write_lock_cost
what are we gaining here?
Closing in favor of a re-implementation that doesn't change one of the files.
Problem
A leader could put more transactions into a block than can be completed during the slot. This could put validators behind on future blocks, which is an even bigger issue if that validator will soon be a leader.
Summary of Changes
If, during replay, we've already taken more time (in compute-unit terms) than fits in a slot, the existing code stops work on this block and moves on (if this feature is turned on). This PR expands the capacity gating of blocks during replay to include not only the execution time used, but also three static values -- signature, write-lock, and data-bytes costs. This gives a more accurate (and earlier) stopping point when needed.
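For reference, a minimal sketch of the replay-side summation, assembled from the diff fragments quoted above (not the exact PR code; `batch.sanitized_transactions()` and the surrounding imports in blockstore_processor.rs are assumed):

```rust
// Sketch: sum the static (non-execution) costs of every transaction in a replayed
// batch, so they can be fed into the block capacity meter alongside the executed
// compute units.
fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
    let cost_model = CostModel::new();
    let mut additional_costs: u64 = 0;
    for transaction in batch.sanitized_transactions() {
        let transaction_cost = cost_model.calculate_cost(transaction);
        additional_costs = additional_costs
            .saturating_add(transaction_cost.signature_cost)
            .saturating_add(transaction_cost.write_lock_cost)
            .saturating_add(transaction_cost.data_bytes_cost);
    }
    additional_costs
}
```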
Fixes #