
expand block capacity calc #24664

Closed

Conversation

jdavis103
Contributor

Problem

A leader could put more transactions into a block than can be completed during the slot. That could put validators behind on future blocks, which is an even bigger issue if one of those validators will soon be a leader.

Summary of Changes

If, during replay, we've already taken more time (in compute-unit terms) than fits in a slot, the existing code stops work on the block and moves on (when this feature is turned on). This PR expands the capacity-gating of blocks during replay to include not only the execution time used, but also three static costs -- signature, write-lock, and data bytes. This gives a more accurate (and earlier) stopping point when needed.
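Roughly, the expanded gate works like this (a minimal sketch using a simplified stand-in for the existing capacity meter; the real types in replay and the exact stopping condition differ and are assumed here):

use std::sync::RwLock;

// Simplified stand-in for the existing block capacity meter: it tracks cost
// consumed against a fixed per-slot cap and returns the remaining capacity.
struct BlockCostCapacityMeter {
    capacity: u64,
    used: u64,
}

impl BlockCostCapacityMeter {
    fn accumulate(&mut self, cost: u64) -> u64 {
        self.used = self.used.saturating_add(cost);
        self.capacity.saturating_sub(self.used)
    }
}

// With this PR, replay feeds the meter the execution cost plus the static
// costs (signature, write-lock, data bytes) before deciding whether to keep
// working on the block. The `remaining == 0` stopping condition is an
// assumption for illustration.
fn should_stop_replaying_block(
    execution_cost_units: u64,
    additional_cost_units: u64,
    cost_capacity_meter: &RwLock<BlockCostCapacityMeter>,
) -> bool {
    let remaining = cost_capacity_meter
        .write()
        .unwrap()
        .accumulate(execution_cost_units.saturating_add(additional_cost_units));
    remaining == 0
}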

Fixes #

@jdavis103 requested a review from tao-stones on April 25, 2022 21:35
tx_cost.writable_accounts.push(*k);
tx_cost.write_lock_cost += WRITE_LOCK_UNITS;
write_lock_cost += WRITE_LOCK_UNITS;
Contributor

nit: use saturating_add instead
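For example (sketch of the suggested change, using the same variables as the diff above):

tx_cost.write_lock_cost = tx_cost.write_lock_cost.saturating_add(WRITE_LOCK_UNITS);
write_lock_cost = write_lock_cost.saturating_add(WRITE_LOCK_UNITS);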

let transaction_cost = cost_model.calculate_cost(transaction);
additional_costs += transaction_cost.signature_cost;
additional_costs += transaction_cost.write_lock_cost;
additional_costs += transaction_cost.data_bytes_cost;
Contributor

Looks like builtins_execution_cost is missing from "additional_costs".
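For instance (sketch only, assuming a builtins_execution_cost field exists on the cost struct as this comment suggests, and folding in the saturating_add nit from above):

let transaction_cost = cost_model.calculate_cost(transaction);
additional_costs = additional_costs
    .saturating_add(transaction_cost.signature_cost)
    .saturating_add(transaction_cost.write_lock_cost)
    .saturating_add(transaction_cost.data_bytes_cost)
    .saturating_add(transaction_cost.builtins_execution_cost);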

@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
Contributor

probably want to benchmark the performance impact of adding this function during replay.
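Something quick and dirty could give a first impression (sketch only; a proper criterion/#[bench] benchmark in the existing bench suites would be the real measurement, and sum_additional_costs/batch in the usage comment just refer to this PR's new function):

use std::time::{Duration, Instant};

// Tiny timing harness: runs a closure `iters` times and prints the average
// wall-clock time per iteration.
fn time_it<F: FnMut()>(label: &str, iters: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let per_iter = start.elapsed() / iters;
    println!("{label}: ~{per_iter:?} per iteration over {iters} iterations");
    per_iter
}

// Hypothetical usage during replay:
// time_it("sum_additional_costs", 1_000, || { sum_additional_costs(&batch); });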

let cost_model = CostModel::new();

for transaction in transactions {
let transaction_cost = cost_model.calculate_cost(transaction);
Contributor

+1 for using the exact same method to compute tx cost, though I'm not sure if there's a performance impact.

By the same token, maybe look into the possibility of replacing cost_capacity_meter with cost_tracker to avoid duplicated logic?

Contributor Author

I like the cost capacity meter for its simplicity. Using the CostTracker to handle this is a large hammer for a very small nail, and I think it would be more confusing (because it tracks all sorts of stuff, reports it, etc., none of which we need here).

Contributor

The motivation for this change is to have a safety check that prevents a validator from getting sucked into replaying a large block for a long time. A "large block" could be one that exceeds block_max_limit (the capacity meter sort of covers that), or one with txs that exceed account_max_limit, which is not checked. Cost_tracker handles both cases, so it would be good to reuse it for consistency.
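Roughly the two checks in question (hypothetical, simplified types for illustration only; the real CostTracker carries more state and also reports metrics):

use std::collections::HashMap;

// Block-wide cap plus a per-writable-account cap, which is the extra case the
// capacity meter does not cover today.
struct SimpleCostTracker {
    block_cost: u64,
    block_max_limit: u64,
    account_costs: HashMap<[u8; 32], u64>, // keyed by writable account address
    account_max_limit: u64,
}

impl SimpleCostTracker {
    // Returns false if adding `cost` would exceed either the block-wide limit
    // or any single writable account's limit.
    fn try_add(&mut self, writable_accounts: &[[u8; 32]], cost: u64) -> bool {
        if self.block_cost.saturating_add(cost) > self.block_max_limit {
            return false;
        }
        for key in writable_accounts {
            let used = self.account_costs.get(key).copied().unwrap_or(0);
            if used.saturating_add(cost) > self.account_max_limit {
                return false;
            }
        }
        self.block_cost = self.block_cost.saturating_add(cost);
        for key in writable_accounts {
            *self.account_costs.entry(*key).or_insert(0) += cost;
        }
        true
    }
}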

let remaining_block_cost_cap = cost_capacity_meter
.write()
.unwrap()
.accumulate(execution_cost_units);
.accumulate(execution_cost_units + additional_cost_units);

debug!(
Contributor

Do you see any blocks being dropped by this gating feature during tests? I'm very interested to see whether a block produced by a leader would be considered too large by validators. Such a case should be carefully avoided.

Contributor Author

Not in normal tests, but when I made the cap lower (by hacking the code), I could make tests fail, and I added a few print! statements to show that valid numbers were coming through in the additional_cost section of this.

@t-nelson
Contributor

Is the motivation for this change elaborated in an issue somewhere?

@@ -154,6 +154,20 @@ fn aggregate_total_execution_units(execute_timings: &ExecuteTimings) -> u64 {
execute_cost_units
}

fn sum_additional_costs(batch: &TransactionBatch) -> u64 {
Contributor

I would prefer we didn't sum up all these miscellaneous costs for v1. I don't want all these other costs getting grouped together under the same units as execution cost.

Contributor

If we wanted, for instance, in v2 to limit the number of sequential operations, then we could have a maximum compute per account tracked in a separate data structure. But I don't think they should be mixed into the same unit of measurement.

Contributor

I think here Jason sums up the CUs that account for those "const" costs - sigverify, write lock, etc. These CUs are needed to check against block_limit. Account_limit is not yet checked in this version.

Jason earlier brought up a good concern: if a leader packed a block to block_limit + 1 CUs (given all the estimated/actual cost adjustment, it is possible), should replay_stage therefore abandon the block because it is 1 CU above the limit? Probably not. Maybe we want replay to use a different limit. If that is the case, then we may as well set up a bpf execution limit for replay, and then we don't need to sum up the other costs. But that bpf execution limit would be totally arbitrary.

Contributor Author

Carl, I think these are already all in the same unit of measurement. The original code totaled the CUs used by the transaction's code, but did not include the static costs, which ARE totaled by the leader when creating the block.

@carllin
Contributor

carllin commented Apr 25, 2022

This PR to accumulate additional costs seems unnecessary. What's missing is the enforcement on the leader side in banking stage to ensure the leader doesn't pack more than the execution cost limit into the block.

@carllin
Contributor

carllin commented Apr 25, 2022

I was just told we've already moved away from the estimated CU into actual computed CU on the leader banking stage, so we just need to reuse that limit here.

@tao-stones
Contributor

I was just told we've already moved away from the estimated CU into actual computed CU on the leader banking stage, so we just need to reuse that limit here.

So the new workflow on the leader side is: use the estimated bpf units to select transactions for execution, and update cost_tracker accordingly; after execution, the cost_tracker is adjusted with the actual bpf units, so the leader does not waste block space (as the estimate is often, if not always, higher than the actual).

However, the block_limit (as well as the other limits) refers to transaction cost, which is the sum of sigverify, write lock, data size, builtin program cost, and bpf program cost. So to reuse that limit/logic, you need to sum up those costs besides bpf.
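In other words (sketch with a hypothetical struct mirroring those components; the real TransactionCost in the cost model has these fields and more):

// Hypothetical mirror of the cost components named above.
struct TxCostParts {
    signature_cost: u64,
    write_lock_cost: u64,
    data_bytes_cost: u64,
    builtins_execution_cost: u64,
    bpf_execution_cost: u64,
}

// block_limit (and the per-account limit) are defined against this total, so
// reusing that logic from replay means summing everything besides bpf as well.
fn total_transaction_cost(c: &TxCostParts) -> u64 {
    c.signature_cost
        .saturating_add(c.write_lock_cost)
        .saturating_add(c.data_bytes_cost)
        .saturating_add(c.builtins_execution_cost)
        .saturating_add(c.bpf_execution_cost)
}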

@mvines changed the title from "Jason expand block capacity calc" to "expand block capacity calc" on Apr 26, 2022
@jdavis103
Contributor Author

Is the motivation for this change elaborated in an issue somewhere?

As I understand it, the main concern is that a leader could choose to put more transactions into a block than it should, which would increase its profits (as long as it didn't overdo it). Essentially it would be overclocking the system, which works as long as enough machines can still finish in time and reach consensus. This validator-side check would prevent that hack from being profitable.

@codecov

codecov bot commented Apr 27, 2022

Codecov Report

Merging #24664 (72c427c) into master (7b5aee7) will decrease coverage by 0.0%.
The diff coverage is n/a.

❗ Current head 72c427c differs from pull request most recent head f76c784. Consider uploading reports for the commit f76c784 to get more accurate results

@@            Coverage Diff            @@
##           master   #24664     +/-   ##
=========================================
- Coverage    70.1%    70.0%   -0.1%     
=========================================
  Files          38       37      -1     
  Lines        2303     2301      -2     
  Branches      325      325             
=========================================
- Hits         1615     1613      -2     
  Misses        573      573             
  Partials      115      115             

@jdavis103 force-pushed the jason_expand_block_capacity_calc branch from daf86f6 to f76c784 on April 27, 2022 19:14
@t-nelson
Contributor

Is the motivation for this change elaborated in an issue somewhere?

As I understand it, the main concern is that a leader could choose to put more transactions into a block than it should, which would increase its profits (as long as it didn't overdo it). Essentially it would be overclocking the system, which works as long as enough machines can still finish in time and reach consensus. This validator-side check would prevent that hack from being profitable.

Yeah sure, but we're kinda adding pseudo-consensus here. Two implementations means we're going to screw up modifying it in the future. Prefer to unify them (in Bank?).

}
});
write_lock_cost
Contributor

what are we gaining here?

@jdavis103
Contributor Author

Closing in favor of a re-implementation that doesn't change one of the files.
