[BlockSTM] Per-block Gas Limit #7488

Merged: 65 commits, May 17, 2023
@danielxiangzl danielxiangzl commented Mar 30, 2023

Description

This PR adds support for halting parallel execution (BlockSTM) early, in scenarios such as exceeding the per-block gas limit, module-publishing read / write conflicts, reconfiguration, and VM internal errors. For the per-block gas limit, a new on-chain config is added for compatibility.

The PR builds upon rolling commit, which allows BlockSTM to keep track of the committed transaction prefix and therefore to compute the accumulated gas of the committed transactions.

Once parallel execution needs to halt early, one thread is in charge of waking up any pending threads (those waiting on dependency reads) and marking the execution status of each transaction properly (a new status named ExecutionHalted is introduced).
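
A minimal, single-threaded sketch of the bookkeeping described above, assuming each transaction's gas usage is already known and leaving out the thread wake-up logic entirely. The names `TxnStatus` and `mark_statuses` and the `u64` gas type are hypothetical and not taken from this PR; only the status name ExecutionHalted comes from the description. Treating the transaction that crosses the limit as committed is also an assumption.

```rust
/// Hypothetical per-transaction outcome; only the name `ExecutionHalted`
/// comes from the PR description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TxnStatus {
    /// Transaction is part of the committed prefix; carries its gas usage.
    Committed { gas_used: u64 },
    /// Transaction was skipped because execution halted early.
    ExecutionHalted,
}

/// Walks transactions in commit order and accumulates gas over the
/// committed prefix. Once the accumulated gas reaches `block_gas_limit`,
/// every later transaction is marked `ExecutionHalted`.
/// Assumption: the transaction that crosses the limit is still committed.
pub fn mark_statuses(gas_per_txn: &[u64], block_gas_limit: u64) -> Vec<TxnStatus> {
    let mut accumulated_gas: u64 = 0;
    let mut halted = false;
    let mut statuses = Vec::with_capacity(gas_per_txn.len());

    for &gas_used in gas_per_txn {
        if halted {
            statuses.push(TxnStatus::ExecutionHalted);
            continue;
        }
        accumulated_gas = accumulated_gas.saturating_add(gas_used);
        statuses.push(TxnStatus::Committed { gas_used });
        if accumulated_gas >= block_gas_limit {
            halted = true;
        }
    }
    statuses
}
```

With gas usages of 40, 50, 30, and 10 and a limit of 100, the first three transactions are committed (120 gas in total) and the fourth is marked ExecutionHalted.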

Related tests for the parallel executor and the executor are fixed. An on-chain config is added for compatibility. The forge test passes with block gas limits enabled. Additional monitoring metrics are added to the Execution dashboard, in a row named "Execution Per Block Gas".

TODO:

  • Determine the per-block gas limit.
  • Properly remove the storage error logging in MoveVM, since a storage error is now returned for a read dependency once the parallel executor has halted early. The error should be passed to AptosVM and handled by the speculative logging.

Test Plan

Existing and new unit / prop tests on parallel executor and executor with block gas limit on and off.
Smoke test related to execution / storage.
Forge performance test and forge compatibility test.
Will add additional tests if necessary.


@gelash gelash left a comment


The current semantics seem to be that we skip the rest of the block once the accumulated gas exceeds the limit, but still include the first transaction that crossed it (the one that made the total >= the limit, to be precise). Is this what we want, or should we include only the prefix of txns whose total consumed gas is < limit? (Or <=, but I guess < is fine here.)

@danielxiangzl @zekun000 @grao1991
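
To make the two options in this question concrete, here is a small sketch that computes how many transactions each rule would keep. The function names and the plain `u64` gas values are hypothetical; the sketch only illustrates the comparison and is not the executor's code.

```rust
/// Number of transactions kept when the transaction that first brings the
/// accumulated gas to `limit` or above is still included.
pub fn inclusive_prefix_len(gas_per_txn: &[u64], limit: u64) -> usize {
    let mut accumulated: u64 = 0;
    for (idx, &gas) in gas_per_txn.iter().enumerate() {
        accumulated = accumulated.saturating_add(gas);
        if accumulated >= limit {
            // Keep the crossing transaction itself.
            return idx + 1;
        }
    }
    gas_per_txn.len()
}

/// Number of transactions kept when the total gas of the kept prefix must
/// stay strictly below `limit`.
pub fn exclusive_prefix_len(gas_per_txn: &[u64], limit: u64) -> usize {
    let mut accumulated: u64 = 0;
    for (idx, &gas) in gas_per_txn.iter().enumerate() {
        accumulated = accumulated.saturating_add(gas);
        if accumulated >= limit {
            // Drop the crossing transaction.
            return idx;
        }
    }
    gas_per_txn.len()
}
```

For gas usages of 40, 50, 30, and 10 with a limit of 100, the inclusive rule keeps three transactions (120 gas) and the exclusive rule keeps two (90 gas). The two rules agree whenever the block never reaches the limit.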

aptos-move/block-executor/src/executor.rs (outdated)
@@ -390,6 +435,14 @@ where
if must_skip {

Could this check be merged with the one above, as must_skip || accumulated_gas >= PER_BLOCK_GAS_LIMIT?

Reply from the PR author:

Actually, the two checks need to stay separate so that reconfiguration can be distinguished from exceeding the per-block gas limit.
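
A hedged sketch of why keeping the checks apart is useful: the caller can report which condition ended the block. `HaltReason` and `halt_reason` are hypothetical names, and both the mapping of must_skip to reconfiguration and its precedence over the gas check are assumptions drawn from this thread, not from the merged code.

```rust
/// Hypothetical reasons for skipping the remainder of a block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HaltReason {
    Reconfiguration,
    BlockGasLimitExceeded,
}

/// Evaluates the two conditions separately so the caller can tell them apart.
/// Assumptions: `must_skip` signals reconfiguration, and it takes precedence
/// when both conditions hold.
pub fn halt_reason(
    must_skip: bool,
    accumulated_gas: u64,
    per_block_gas_limit: u64,
) -> Option<HaltReason> {
    if must_skip {
        Some(HaltReason::Reconfiguration)
    } else if accumulated_gas >= per_block_gas_limit {
        Some(HaltReason::BlockGasLimitExceeded)
    } else {
        None
    }
}
```

A merged condition such as must_skip || accumulated_gas >= PER_BLOCK_GAS_LIMIT would yield only a boolean, so the two cases could no longer be handled or logged differently.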


davidiw commented Apr 2, 2023

Do we have a (public) design doc on this topic?

} else {
unreachable!();
}
drop(lock);

ugly

@danielxiangzl danielxiangzl enabled auto-merge (squash) May 17, 2023 02:49

@github-actions

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> dfd15ce3648718b673b745b133ad21757cbf6daf

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> dfd15ce3648718b673b745b133ad21757cbf6daf (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 10187 TPS, 3682 ms latency, 5800 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: dfd15ce3648718b673b745b133ad21757cbf6daf
compatibility::simple-validator-upgrade::single-validator-upgrade : 5526 TPS, 7197 ms latency, 10000 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: dfd15ce3648718b673b745b133ad21757cbf6daf
compatibility::simple-validator-upgrade::half-validator-upgrade : 5771 TPS, 6865 ms latency, 9000 ms p99 latency,no expired txns
4. upgrading second batch to new version: dfd15ce3648718b673b745b133ad21757cbf6daf
compatibility::simple-validator-upgrade::rest-validator-upgrade : 8056 TPS, 4685 ms latency, 8800 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> dfd15ce3648718b673b745b133ad21757cbf6daf passed
Test Ok

@github-actions

✅ Forge suite land_blocking success on dfd15ce3648718b673b745b133ad21757cbf6daf

performance benchmark : 6060 TPS, 6522 ms latency, 26800 ms p99 latency,no expired txns
Test Ok

@github-actions

✅ Forge suite framework_upgrade success on aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> dfd15ce3648718b673b745b133ad21757cbf6daf

Compatibility test results for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> dfd15ce3648718b673b745b133ad21757cbf6daf (PR)
Upgrade the nodes to version: dfd15ce3648718b673b745b133ad21757cbf6daf
framework_upgrade::framework-upgrade::full-framework-upgrade : 6554 TPS, 6035 ms latency, 9600 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for aptos-node-v1.3.0_3fc3d42b6cfe27460004f9a0326451bcda840a60 ==> dfd15ce3648718b673b745b133ad21757cbf6daf passed
Test Ok

@danielxiangzl danielxiangzl merged commit 714ee1c into main May 17, 2023
@danielxiangzl danielxiangzl deleted the daniel-per-block-gas branch May 17, 2023 03:25
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR