Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cherry-pick] PR 7194: [Quorum Store] block timestamp-based expiration of batches #7175

Merged
merged 3 commits into from
Mar 17, 2023

Conversation

bchocho
Copy link
Contributor

@bchocho bchocho commented Mar 15, 2023

Description

Previously, batches were expired using logical time, i.e., (epoch, round). Validators set batch expiration based on their local logical time. For validators that fell behind in consensus, this causes an issue where batches they create are more likely to not reach quorum or expire after quorum, as other validators have a more advanced round. There were several gap buffers introduced to try to mitigate this, which added complexity.

In this PR, logical time is replaced with system clock times (for creation) and block timestamp times (for expiration). Validators set expiration when creating a batch using their local clock time. The system clock is less likely to fall behind than round times (it doesn't depend on consensus progress). Expiration is based on the block timestamp, which is monotonically increasing on committed blocks, so all validators will still see the same expiration at the same round.

This simplifies things, now there is a single expiration limit (default to 60 seconds) at batch creation time.

Test Plan

Ran land-blocking and three regions tests and did not see any regression.

@bchocho bchocho changed the base branch from main to preview March 15, 2023 17:09
@bchocho bchocho added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Mar 15, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@bchocho bchocho changed the title [Quorum Store] block timestamp-based expiration [Quorum Store] block timestamp-based expiration of batches Mar 16, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

bchocho added a commit that referenced this pull request Mar 16, 2023
@bchocho bchocho changed the title [Quorum Store] block timestamp-based expiration of batches [cherry-pick] [Quorum Store] block timestamp-based expiration of batches Mar 16, 2023
@github-actions

This comment has been minimized.

bchocho added a commit that referenced this pull request Mar 16, 2023
@bchocho bchocho force-pushed the brian/preview-batch-expiration branch from 54d5ffe to 2e4be4f Compare March 16, 2023 22:03
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

bchocho added a commit that referenced this pull request Mar 17, 2023
@bchocho bchocho force-pushed the brian/preview-batch-expiration branch from 2e4be4f to 8a258c4 Compare March 17, 2023 04:56
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

❌ Forge suite framework_upgrade failure on cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0

Compatibility test results for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0 (PR)
Upgrade the nodes to version: 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0
Test Failed: Timed out waiting for Node validator-0:744d4e22fbabc88b28c72b97988eedf9581ee9bc3ba12e89e68284d599887626 to be healthy
Trailing Log Lines:
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/stateful_set.rs:145"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-7175-1679030562-cb4ba0a57c998c60cbab","timestamp":"2023-03-17T05:26:00.630725Z","message":"Waiting for pod aptos-node-0-validator-0"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/stateful_set.rs:101"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-7175-1679030562-cb4ba0a57c998c60cbab","timestamp":"2023-03-17T05:26:10.637293Z","message":"StatefulSet aptos-node-0-validator has scaled to 1"}
::error::Timed out waiting for Node validator-0:744d4e22fbabc88b28c72b97988eedf9581ee9bc3ba12e89e68284d599887626 to be healthy
test framework_upgrade::framework-upgrade ... FAILED
Error: Timed out waiting for Node validator-0:744d4e22fbabc88b28c72b97988eedf9581ee9bc3ba12e89e68284d599887626 to be healthy
Test Statistics: 
Compatibility test results for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0 (PR)
Upgrade the nodes to version: 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0
Test Failed: Timed out waiting for Node validator-0:744d4e22fbabc88b28c72b97988eedf9581ee9bc3ba12e89e68284d599887626 to be healthy


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:281"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-7175-1679030562-cb4ba0a57c998c60cbab","timestamp":"2023-03-17T05:27:10.934275Z","message":"Deleting namespace forge-framework-upgrade-pr-7175: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:389"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-7175-1679030562-cb4ba0a57c998c60cbab","timestamp":"2023-03-17T05:27:10.934298Z","message":"aptos-node resources for Forge removed in namespace: forge-framework-upgrade-pr-7175"}

failures:
    framework_upgrade::framework-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                    READY   STATUS      RESTARTS      AGE
aptos-node-0-validator-0                0/1     Error       3 (45s ago)   2m24s
aptos-node-1-validator-0                1/1     Running     0             3m18s
aptos-node-2-validator-0                1/1     Running     0             3m18s
aptos-node-3-validator-0                1/1     Running     0             3m18s
aptos-node-4-validator-0                1/1     Running     0             3m18s
genesis-aptos-genesis-eforge215-shdtf   0/1     Completed   0             3m29s

@bchocho bchocho changed the title [cherry-pick] [Quorum Store] block timestamp-based expiration of batches [cherry-pick] PR 7194: [Quorum Store] block timestamp-based expiration of batches Mar 17, 2023
@bchocho bchocho merged commit 5a96cde into preview Mar 17, 2023
@bchocho bchocho deleted the brian/preview-batch-expiration branch March 17, 2023 05:29
@github-actions
Copy link
Contributor

❌ Forge suite compat failure on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7682 TPS, 4994 ms latency, 6900 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0
Test Failed: Timed out waiting for Node validator-1:e89924ae9f086742fff9c98363ae8d2d2f83803fb89934ff6d2d1111ac738d44 to be healthy
Trailing Log Lines:
::error::Timed out waiting for Node validator-1:e89924ae9f086742fff9c98363ae8d2d2f83803fb89934ff6d2d1111ac738d44 to be healthy
test compatibility::simple-validator-upgrade ... FAILED
Error: Timed out waiting for Node validator-1:e89924ae9f086742fff9c98363ae8d2d2f83803fb89934ff6d2d1111ac738d44 to be healthy
Test Statistics: 
Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0 (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7682 TPS, 4994 ms latency, 6900 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0
Test Failed: Timed out waiting for Node validator-1:e89924ae9f086742fff9c98363ae8d2d2f83803fb89934ff6d2d1111ac738d44 to be healthy


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:281"},"thread_name":"main","hostname":"forge-compat-pr-7175-1679030572-testnet-2d8b1b57553d869190f61df","timestamp":"2023-03-17T05:29:52.960400Z","message":"Deleting namespace forge-compat-pr-7175: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:389"},"thread_name":"main","hostname":"forge-compat-pr-7175-1679030572-testnet-2d8b1b57553d869190f61df","timestamp":"2023-03-17T05:29:52.960425Z","message":"aptos-node resources for Forge removed in namespace: forge-compat-pr-7175"}

failures:
    compatibility::simple-validator-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                    READY   STATUS      RESTARTS      AGE
aptos-node-0-validator-0                1/1     Running     0             4m28s
aptos-node-1-validator-0                1/1     Running     3 (32s ago)   2m7s
aptos-node-2-validator-0                1/1     Running     0             4m28s
aptos-node-3-validator-0                1/1     Running     0             4m28s
aptos-node-4-validator-0                1/1     Running     0             4m28s
genesis-aptos-genesis-eforge113-9f6tx   0/1     Completed   0             6m13s

@github-actions
Copy link
Contributor

❌ Forge suite land_blocking failure on 8a258c43c8b8b72117153dd8f8e6c5ecd94faeb0

performance benchmark with full nodes : 6913 TPS, 5713 ms latency, 9300 ms p99 latency,no expired txns
Test Failed: check for success

Caused by:
    "cpu" metric violated threshold of 12, max_breach_pct: 30, breach_pct: 52 
Trailing Log Lines:

Caused by:
    "cpu" metric violated threshold of 12, max_breach_pct: 30, breach_pct: 52 
Test Statistics: 
performance benchmark with full nodes : 6913 TPS, 5713 ms latency, 9300 ms p99 latency,no expired txns
Test Failed: check for success

Caused by:
    "cpu" metric violated threshold of 12, max_breach_pct: 30, breach_pct: 52 


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:281"},"thread_name":"main","hostname":"forge-e2e-pr-7175-1679030566-8a258c43c8b8b72117153dd8f8e6c5ecd9","timestamp":"2023-03-17T05:37:00.160601Z","message":"Deleting namespace forge-e2e-pr-7175: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:389"},"thread_name":"main","hostname":"forge-e2e-pr-7175-1679030566-8a258c43c8b8b72117153dd8f8e6c5ecd9","timestamp":"2023-03-17T05:37:00.160627Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-7175"}

failures:
    performance benchmark with full nodes

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                  READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-0-validator-0              1/1     Running     0          10m
aptos-node-1-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-1-validator-0              1/1     Running     0          10m
aptos-node-10-validator-0             1/1     Running     0          10m
aptos-node-11-validator-0             1/1     Running     0          10m
aptos-node-12-validator-0             1/1     Running     0          10m
aptos-node-13-validator-0             1/1     Running     0          10m
aptos-node-14-validator-0             1/1     Running     0          10m
aptos-node-15-validator-0             1/1     Running     0          10m
aptos-node-16-validator-0             1/1     Running     0          10m
aptos-node-17-validator-0             1/1     Running     0          10m
aptos-node-18-validator-0             1/1     Running     0          10m
aptos-node-19-validator-0             1/1     Running     0          10m
aptos-node-2-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-2-validator-0              1/1     Running     0          10m
aptos-node-3-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-3-validator-0              1/1     Running     0          10m
aptos-node-4-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-4-validator-0              1/1     Running     0          10m
aptos-node-5-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-5-validator-0              1/1     Running     0          10m
aptos-node-6-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-6-validator-0              1/1     Running     0          10m
aptos-node-7-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-7-validator-0              1/1     Running     0          10m
aptos-node-8-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-8-validator-0              1/1     Running     0          10m
aptos-node-9-fullnode-eforge0-0       1/1     Running     0          10m
aptos-node-9-validator-0              1/1     Running     0          10m
genesis-aptos-genesis-eforge0-m6pbv   0/1     Completed   0          13m

@bchocho bchocho mentioned this pull request Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant