
expand block capacity calc #24664

Closed

Conversation

jdavis103
Contributor

Problem

A leader could put more transactions into a block than can be completed during the slot. That could put validators behind on future blocks, which is an even bigger issue if one of those validators will soon be a leader.

Summary of Changes

If, during replay, we've already taken more time (in compute-unit terms) than fits in a slot, the existing code stops work on the block and moves on (when this feature is turned on). This PR expands the capacity-gating of blocks during replay to include not only the execution time used, but also three static costs -- signature, write-lock, and data bytes. This gives a more accurate (and earlier) stopping point when needed.
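Roughly, the expanded gate works like this (a minimal sketch using a simplified stand-in for the existing capacity meter; the real types in replay and the exact stopping condition differ and are assumed here):

use std::sync::RwLock;

// Simplified stand-in for the existing block capacity meter: it tracks cost
// consumed against a fixed per-slot cap and returns the remaining capacity.
struct BlockCostCapacityMeter {
    capacity: u64,
    used: u64,
}

impl BlockCostCapacityMeter {
    fn accumulate(&mut self, cost: u64) -> u64 {
        self.used = self.used.saturating_add(cost);
        self.capacity.saturating_sub(self.used)
    }
}

// With this PR, replay feeds the meter the execution cost plus the static
// costs (signature, write-lock, data bytes) before deciding whether to keep
// working on the block. The `remaining == 0` stopping condition is an
// assumption for illustration.
fn should_stop_replaying_block(
    execution_cost_units: u64,
    additional_cost_units: u64,
    cost_capacity_meter: &RwLock<BlockCostCapacityMeter>,
) -> bool {
    let remaining = cost_capacity_meter
        .write()
        .unwrap()
        .accumulate(execution_cost_units.saturating_add(additional_cost_units));
    remaining == 0
}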

Fixes #

@jdavis103 requested a review from tao-stones on April 25, 2022 21:35
tx_cost.writable_accounts.push(*k);
tx_cost.write_lock_cost += WRITE_LOCK_UNITS;
write_lock_cost += WRITE_LOCK_UNITS;
Contributor

nit: use saturating_add instead
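For example (sketch of the suggested change, using the same variables as the diff above):

tx_cost.write_lock_cost = tx_cost.write_lock_cost.saturating_add(WRITE_LOCK_UNITS);
write_lock_cost = write_lock_cost.saturating_add(WRITE_LOCK_UNITS);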

let transaction_cost = cost_model.calculate_cost(transaction);
additional_costs += transaction_cost.signature_cost;
additional_costs += transaction_cost.write_lock_cost;
additional_costs += transaction_cost.data_bytes_cost;
Contributor

Looks like builtins_execution_cost is missing from "additional_costs".
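For instance (sketch only, assuming a builtins_execution_cost field exists on the cost struct as this comment suggests, and folding in the saturating_add nit from above):

let transaction_cost = cost_model.calculate_cost(transaction);
additional_costs = additional_costs
    .saturating_add(transaction_cost.signature_cost)
    .saturating_add(transaction_cost.write_lock_cost)
    .saturating_add(transaction_cost.data_bytes_cost)
    .saturating_add(transaction_cost.builtins_execution_cost);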

@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
Contributor

probably want to benchmark the performance impact of adding this function during replay.
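Something quick and dirty could give a first impression (sketch only; a proper criterion/#[bench] benchmark in the existing bench suites would be the real measurement, and sum_additional_costs/batch in the usage comment just refer to this PR's new function):

use std::time::{Duration, Instant};

// Tiny timing harness: runs a closure `iters` times and prints the average
// wall-clock time per iteration.
fn time_it<F: FnMut()>(label: &str, iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let per_iter = start.elapsed() / iters;
    println!("{label}: ~{per_iter:?} per iteration over {iters} iterations");
    per_iter
}

// Hypothetical usage during replay:
// time_it("sum_additional_costs", 1_000, || { sum_additional_costs(&batch); });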

let cost_model = CostModel::new();

for transaction in transactions {
let transaction_cost = cost_model.calculate_cost(transaction);
Contributor

+1 for using the exact same method to compute tx cost, though I'm not sure if there's a performance impact.

By the same token, maybe look into the possibility of replacing cost_capacity_meter with cost_tracker to avoid duplicated logic?

Contributor Author

I like the cost capacity meter for its simplicity. Using the CostTracker to handle this is a large hammer for a very small nail, and I think it would be more confusing (because it tracks all sorts of stuff, reports it, etc., none of which we need here).

Contributor

The motivation for this change is to have a safety check that prevents a validator from getting sucked into replaying a large block for a long time. A "large block" could be one that exceeds block_max_limit (the capacity meter sort of covers that), or one with txs that exceed account_max_limit, which is not checked. Cost_tracker handles both cases, so it would be good to reuse it for consistency.
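Roughly the two checks in question (hypothetical, simplified types for illustration only; the real CostTracker carries more state and also reports metrics):

use std::collections::HashMap;

// Block-wide cap plus a per-writable-account cap, which is the extra case the
// capacity meter does not cover today.
struct SimpleCostTracker {
    block_cost: u64,
    block_max_limit: u64,
    account_costs: HashMap<[u8; 32], u64>, // keyed by writable account address
    account_max_limit: u64,
}

impl SimpleCostTracker {
    // Returns false if adding `cost` would exceed either the block-wide limit
    // or any single writable account's limit.
    fn try_add(&mut self, writable_accounts: &[[u8; 32]], cost: u64) -> bool {
        if self.block_cost.saturating_add(cost) > self.block_max_limit {
            return false;
        }
        for key in writable_accounts {
            let used = self.account_costs.get(key).copied().unwrap_or(0);
            if used.saturating_add(cost) > self.account_max_limit {
                return false;
            }
        }
        self.block_cost = self.block_cost.saturating_add(cost);
        for key in writable_accounts {
            *self.account_costs.entry(*key).or_insert(0) += cost;
        }
        true
    }
}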

let remaining_block_cost_cap = cost_capacity_meter
.write()
.unwrap()
.accumulate(execution_cost_units);
.accumulate(execution_cost_units + additional_cost_units);

debug!(
Contributor

Do you see any blocks being dropped by this gating feature during tests? I'm very interested to see whether a block produced by a leader would be considered too large by validators. Such a case should be carefully avoided.

Contributor Author

Not in normal tests, but when I made the cap lower (by hacking the code), I could make tests fail, and I added a few print! statements to show that valid numbers were coming through in the additional_cost section of this.

@t-nelson
Contributor

Is the motivation for this change elaborated in an issue somewhere?

@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
Contributor

I would prefer we didn't sum up all these miscellaneous costs for v1. I don't want all these other costs getting grouped together under the same units as execution cost.

Contributor

If we wanted, for instance, in v2 to limit the number of sequential operations, then we could have a maximum compute per account tracked in a separate data structure. But I don't think they should be mixed into the same unit of measurement.

Contributor

I think here Jason sums up the CUs that account for those "const" costs - sigverify, write lock, etc. These CUs are needed to check against block_limit. Account_limit is not yet checked in this version.

Jason earlier brought up a good concern: if a leader packed a block to block_limit + 1 CUs (given all the estimated/actual cost adjustment, it is possible), should replay_stage therefore abandon the block because it is 1 CU above the limit? Probably not. Maybe we want replay to use a different limit. If that is the case, then we may as well set up a bpf execution limit for replay, and then we don't need to sum up the other costs. But that bpf execution limit would be totally arbitrary.

Contributor Author

Carl, I think these are already all in the same unit of measurement. The original code totaled the CUs used by the transaction's code, but did not include the static costs, which ARE totaled by the leader when creating the block.

@carllin
Contributor

carllin commented Apr 25, 2022

This PR to accumulate additional costs seems unnecessary. What's missing is the enforcement on the leader side in banking stage to ensure the leader doesn't pack more than the execution cost limit into the block.

@carllin
Contributor

carllin commented Apr 25, 2022

I was just told we've already moved away from the estimated CU into actual computed CU on the leader banking stage, so we just need to reuse that limit here.

@tao-stones
Contributor

I was just told we've already moved away from the estimated CU into actual computed CU on the leader banking stage, so we just need to reuse that limit here.

So the new workflow on the leader side is: use the estimated bpf units to select transactions for execution, and update cost_tracker accordingly; after execution, the cost_tracker is adjusted with the actual bpf units, so the leader does not waste block space (as the estimate is often, if not always, higher than the actual).

However, the block_limit (as well as the other limits) refers to transaction cost, which is the sum of sigverify, write lock, data size, builtin program cost, and bpf program cost. So to reuse that limit/logic, you need to sum up those costs besides bpf.
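In other words (sketch with a hypothetical struct mirroring those components; the real TransactionCost in the cost model has these fields and more):

// Hypothetical mirror of the cost components named above.
struct TxCostParts {
    signature_cost: u64,
    write_lock_cost: u64,
    data_bytes_cost: u64,
    builtins_execution_cost: u64,
    bpf_execution_cost: u64,
}

// block_limit (and the per-account limit) are defined against this total, so
// reusing that logic from replay means summing everything besides bpf as well.
fn total_transaction_cost(c: &TxCostParts) -> u64 {
    c.signature_cost
        .saturating_add(c.write_lock_cost)
        .saturating_add(c.data_bytes_cost)
        .saturating_add(c.builtins_execution_cost)
        .saturating_add(c.bpf_execution_cost)
}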

@mvines changed the title from "Jason expand block capacity calc" to "expand block capacity calc" on Apr 26, 2022
@jdavis103
Contributor Author

Is the motivation for this change elaborated in an issue somewhere?

As I understand it, the main concern is that a leader could choose to put more transactions into a block than it should, which would increase its profits (as long as it didn't overdo it). Essentially it would be overclocking the system, which works as long as enough machines can still finish in time and reach consensus. This validator-side check would prevent that hack from being profitable.

@codecov

codecov bot commented Apr 27, 2022

Codecov Report

Merging #24664 (72c427c) into master (7b5aee7) will decrease coverage by 0.0%.
The diff coverage is n/a.

❗ Current head 72c427c differs from pull request most recent head f76c784. Consider uploading reports for the commit f76c784 to get more accurate results

@@            Coverage Diff            @@
##           master   #24664     +/-   ##
=========================================
- Coverage    70.1%    70.0%   -0.1%     
=========================================
  Files          38       37      -1     
  Lines        2303     2301      -2     
  Branches      325      325             
=========================================
- Hits         1615     1613      -2     
  Misses        573      573             
  Partials      115      115             

@jdavis103 force-pushed the jason_expand_block_capacity_calc branch from daf86f6 to f76c784 on April 27, 2022 19:14
@t-nelson
Contributor

Is the motivation for this change elaborated in an issue somewhere?

As I understand it, the main concern is that a leader could choose to put more transactions into a block than it should, which would increase its profits (as long as it didn't overdo it). Essentially it would be overclocking the system, which works as long as enough machines can still finish in time and reach consensus. This validator-side check would prevent that hack from being profitable.

Yeah sure, but we're kinda adding pseudo-consensus here. Two implementations means we're going to screw up modifying it in the future. Prefer to unify them (in Bank?).

}
});
write_lock_cost
Contributor

what are we gaining here?

@jdavis103
Contributor Author

Closing in favor of a re-implementation that doesn't change one of the files.
