brian/qs revert the revert #13964

Closed
wants to merge 88 commits into from

Contributor

@bchocho bchocho commented Jul 10, 2024

  • Use proof queue asynchronously
  • Committing what I have
  • Sending AddBatches message
  • Calculating the remaining txns
  • Calculate proof queue size correctly
  • Add a counter
  • Update pfn_const_tps test
  • Minor changes
  • Minor change
  • Add some counters
  • Rust lint
  • Increasing quorum store backpressure limits
  • setting dynamic_min_txns_per_sec to 160
  • Fixing the calculation
  • increase vfns to 7
  • Fixing the typo in batch generator
  • Add increase fraction
  • Removing skipped transactions after inserting them
  • Add some counters
  • Update consensus pending duration counter
  • Add more counters
  • Increasing block size to 2500
  • Update a counter
  • Increase block size limit
  • Resetting execution config params
  • Moving proof queue to utils.rs
  • Moving counters
  • Use transaction summary
  • intelligent pull proofs
  • Fix a bug in pull proofs
  • Fix the bug
  • Reset full to false in every iteration
  • Addressing PR comments
  • Move backpressure_tx to proof queue
  • Add info statement
  • Change buckets
  • Add some info statements
  • Cleanup
  • Remove an unrelated change
  • Addressing PR comments
  • Addressing PR comments
  • Add some timer counters
  • Add more timer counters
  • Minor optimization
  • Proof queue to be part of proof manager
  • Move some code to a function
  • Minor fixes
  • Add max_unique_txns parameter
  • Use Lazy
  • Removing comments
  • Minor change
  • Minor change
  • Minor fix
  • Add unit test and address PR comments
  • Minor fix in proof manager
  • Use saturating_sub
  • Exclude expired transactions when counting block size
  • Minor fix
  • Addressing PR comments
  • Minor fix
  • Change the expiration units
  • Fixing unit tests
  • Update unit tests
  • renaming
  • Revert "Revert [quorum store] reduce backpressure significantly for more TPS (#13558) and Swap parameters (#13666)"
  • run graceful overload
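
Several of the commits above ("Use saturating_sub", "Exclude expired transactions when counting block size") describe a counting pattern that can be sketched as follows. This is only an illustration under stated assumptions: `TxnSummary`, `count_unexpired`, and `remaining_capacity` are hypothetical names invented here, not the types or functions changed in this PR, and the expiration unit (seconds) is assumed.

```rust
/// Hypothetical transaction summary; NOT the actual aptos-core type.
#[derive(Debug, Clone, Copy)]
pub struct TxnSummary {
    /// Assumed unit: seconds since the epoch.
    pub expiration_secs: u64,
}

/// Count only the transactions that have not yet expired at `now_secs`.
pub fn count_unexpired(txns: &[TxnSummary], now_secs: u64) -> u64 {
    txns.iter()
        .filter(|t| t.expiration_secs > now_secs)
        .count() as u64
}

/// Remaining room in a block of at most `max_txns` transactions.
/// `saturating_sub` clamps at zero instead of underflowing when the
/// unexpired count already exceeds the limit.
pub fn remaining_capacity(max_txns: u64, txns: &[TxnSummary], now_secs: u64) -> u64 {
    max_txns.saturating_sub(count_unexpired(txns, now_secs))
}
```

With three transactions expiring at 10, 20, and 30 and `now_secs = 15`, two count toward the block, so a limit of 5 leaves room for 3 more and a limit of 1 leaves room for 0 rather than wrapping around.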

Description

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Jul 10, 2024

⏱️ 4h 39m total CI duration on this PR
Job                               Cumulative Duration   Recent Runs
test-fuzzers                      2h 23m                🟩🟩🟩🟩
forge-e2e-test / forge            56m                   🟥🟥🟥
rust-images / rust-all            35m                   🟩🟩🟩
test-target-determinator          13m                   🟩🟩🟩
rust-move-tests                   6m                    🟩
general-lints                     5m                    🟩🟩🟩
rust-cargo-deny                   5m                    🟩🟩🟩
check-dynamic-deps                5m                    🟩🟩🟩🟩
rust-move-tests                   4m                    🟩
rust-move-tests                   3m                    🟩
semgrep/ci                        2m                    🟩🟩🟩🟩
file_change_determinator          47s                   🟩🟩🟩
file_change_determinator          41s                   🟩🟩🟩🟩
file_change_determinator          36s                   🟩🟩🟩
permission-check                  15s                   🟩🟩🟩🟩
permission-check                  10s                   🟩🟩🟩🟩
permission-check                  10s                   🟩🟩🟩🟩
permission-check                  9s                    🟩🟩🟩🟩
permission-check                  8s                    🟩🟩🟩
determine-docker-build-metadata   8s                    🟩🟩🟩
Backport PR                       3s                    🟥
permission-check                  3s                    🟩
rust-move-tests                   1s

🚨 1 job on the last run was significantly faster/slower than expected

Job                      Duration   vs 7d avg   Delta
forge-e2e-test / forge   21m        14m         +47%

@bchocho bchocho added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Jul 10, 2024

Contributor

❌ Forge suite realistic_env_max_load failure on 1b55dd8772d865689983ae167fac2b288d2e2a8e

two traffics test: inner traffic : committed: 7440.6442644701465 txn/s, submitted: 11775.31862461904 txn/s, expired: 4334.674360148894 txn/s, latency: 53515.07778111305 ms, (p50: 59600 ms, p90: 61000 ms, p99: 102800 ms), latency samples: 83285
Test Failed: test NetworkLoadTest

Caused by:
    TPS requirement for inner traffic failed. Average TPS 7440.6442644701465, minimum TPS requirement 7500. Full stats: committed: 7440.6442644701465 txn/s, submitted: 11775.31862461904 txn/s, expired: 4334.674360148894 txn/s, latency: 53515.07778111305 ms, (p50: 59600 ms, p90: 61000 ms, p99: 102800 ms), latency samples: 83285

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_tps
             at ./testsuite/forge/src/success_criteria.rs:467:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_throughput
             at ./testsuite/forge/src/success_criteria.rs:520:9
   3: aptos_forge::success_criteria::SuccessCriteriaChecker::check_core_for_success
             at ./testsuite/forge/src/success_criteria.rs:251:9
   4: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_testcases::NetworkLoadTest>::test::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:74:9
   5: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   6: <dyn aptos_testcases::NetworkLoadTest>::network_load_test::{{closure}}
             at ./testsuite/testcases/src/lib.rs:405:18
   7: <dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:268:14
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   9: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:88:47
  10: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  11: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:601:37
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  13: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
  14: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  15: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  16: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
  17: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
  18: tokio::runtime::handle::Handle::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:310:22
  19: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  20: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:309:9
  21: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:611:49
  22: aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:684:11
  23: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:611:30
  24: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
  25: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
  26: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  27: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  28: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  29: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  30: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  31: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  32: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  33: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  34: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  35: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  36: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  37: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  38: main
  39: __libc_start_main
  40: _start
Trailing Log Lines:
  35: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  36: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  37: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  38: main
  39: __libc_start_main
  40: _start


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-13964-1720634694-1b55dd8772d865689983ae167fac2b288","timestamp":"2024-07-10T18:23:40.505084Z","message":"Deleting namespace forge-e2e-pr-13964: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-e2e-pr-13964-1720634694-1b55dd8772d865689983ae167fac2b288","timestamp":"2024-07-10T18:23:40.505122Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-13964"}

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:636:13
   2: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  15: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  16: main
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                   READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-0-validator-0               1/1     Running     0          17m
aptos-node-1-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-1-validator-0               1/1     Running     0          17m
aptos-node-10-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-10-validator-0              1/1     Running     0          17m
aptos-node-11-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-11-validator-0              1/1     Running     0          17m
aptos-node-12-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-12-validator-0              1/1     Running     0          17m
aptos-node-13-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-13-validator-0              1/1     Running     0          17m
aptos-node-14-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-14-validator-0              1/1     Running     0          17m
aptos-node-15-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-15-validator-0              1/1     Running     0          17m
aptos-node-16-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-16-validator-0              1/1     Running     0          17m
aptos-node-17-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-17-validator-0              1/1     Running     0          17m
aptos-node-18-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-18-validator-0              1/1     Running     0          17m
aptos-node-19-fullnode-eforge15-0      1/1     Running     0          17m
aptos-node-19-validator-0              1/1     Running     0          17m
aptos-node-2-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-2-validator-0               1/1     Running     0          17m
aptos-node-3-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-3-validator-0               1/1     Running     0          17m
aptos-node-4-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-4-validator-0               1/1     Running     0          17m
aptos-node-5-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-5-validator-0               1/1     Running     0          17m
aptos-node-6-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-6-validator-0               1/1     Running     0          17m
aptos-node-7-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-7-validator-0               1/1     Running     0          17m
aptos-node-8-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-8-validator-0               1/1     Running     0          17m
aptos-node-9-fullnode-eforge15-0       1/1     Running     0          17m
aptos-node-9-validator-0               1/1     Running     0          17m
genesis-aptos-genesis-eforge15-k6dkd   0/1     Completed   0          18m
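
The failure reported above comes down to one comparison: the committed rate (about 7440.64 txn/s) is below the 7500 txn/s minimum, a shortfall of roughly 0.8%. In the reported figures, committed plus expired equals submitted, so about 37% of submitted transactions expired. The sketch below reproduces that arithmetic. It is not the code in `testsuite/forge/src/success_criteria.rs`; `TxnStats`, `check_min_tps`, and `shortfall_percent` are hypothetical names chosen for illustration.

```rust
/// Throughput figures from a load-test summary, all in txn/s.
/// Hypothetical struct; NOT the aptos_forge type.
#[derive(Debug, Clone, Copy)]
pub struct TxnStats {
    pub committed: f64,
    pub submitted: f64,
    pub expired: f64,
}

/// Return an error message when the committed rate is below `min_tps`.
pub fn check_min_tps(stats: &TxnStats, min_tps: f64) -> Result<(), String> {
    if stats.committed < min_tps {
        Err(format!(
            "TPS requirement failed. Average TPS {}, minimum TPS requirement {}",
            stats.committed, min_tps
        ))
    } else {
        Ok(())
    }
}

/// How far below the minimum the committed rate is, as a percentage of
/// the minimum. Returns 0.0 when the requirement is met.
pub fn shortfall_percent(stats: &TxnStats, min_tps: f64) -> f64 {
    ((min_tps - stats.committed) / min_tps * 100.0).max(0.0)
}
```

Feeding in the values from the log (`committed: 7440.6442644701465`, `submitted: 11775.31862461904`, `expired: 4334.674360148894`) with `min_tps = 7500.0` yields an error from `check_min_tps`, which matches the failed run.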

@bchocho bchocho closed this Jul 15, 2024
@bchocho bchocho deleted the brian/qs-revert-the-revert branch July 15, 2024 18:05