brian/test graceful overload #13856

Closed
wants to merge 74 commits into from

Conversation

bchocho
Contributor

@bchocho bchocho commented Jun 28, 2024

  • Use proof queue asynchronously
  • Committing what I have
  • Sending AddBatches message
  • Calculating the remaining txns
  • Calculate proof queue size correctly
  • Add a counter
  • Update pfn_const_tps test
  • Minor changes
  • Minor change
  • Add some counters
  • Rust lint
  • Increasing quorum store backpressure limits
  • setting dynamic_min_txns_per_sec to 160
  • Fixing the calculation
  • increase vfns to 7
  • Fixing the typo in batch generator
  • Add increase fraction
  • Removing skipped transactions after inserting them
  • Add some counters
  • Update consensus pending duration counter
  • Add more counters
  • Increasing block size to 2500
  • Update a counter
  • Increase block size limit
  • Resetting execution config params
  • Moving proof queue to utils.rs
  • Moving counters
  • Use transaction summary
  • intelligent pull proofs
  • Fix a bug in pull proofs
  • Fix the bug
  • Reset full to false in every iteration
  • Addressing PR comments
  • Move backpressure_tx to proof queue
  • Add info statement
  • Change buckets
  • Add some info statements
  • Cleanup
  • Remove an unrelated change
  • Addressing PR comments
  • Addressing PR comments
  • Add some timer counters
  • Add more timer counters
  • Minor optimization
  • Proof queue to be part of proof manager
  • Move some code to a function
  • Minor fixes
  • Add max_unique_txns parameter
  • Use Lazy
  • Removing comments
  • Minor change
  • Minor change
  • Minor fix
  • Add unit test and address PR comments
  • Add feature gate and golden test
  • only run test-replay for PRs with label (#13833)
  • [Consensus] Disable flaky tests.
  • [move] Enable type size limit (#13793)
  • compat expansion; forge refactor (#13302)
  • replay last 300m transactions (#13777)
  • Minor fix in proof manager
  • [rand] cleanup unused api v1
  • [perf] re-calibrate single node numbers
  • Use saturating_sub
  • Exclude expired transactions when counting block size
  • Revert "Revert [quorum store] reduce backpressure significantly for more TPS (#13558) and Swap parameters (#13666)"
  • run graceful overload
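
Several commits in the list above ("Calculating the remaining txns", "Use saturating_sub", "Exclude expired transactions when counting block size") touch transaction counting. The following is a minimal sketch of the two ideas, not the PR's actual code: `TxnSummary`, `remaining_txns`, and `unexpired_count` are hypothetical names chosen for illustration.

```rust
/// Hypothetical per-transaction summary; only the expiration time matters here.
#[derive(Debug, Clone, Copy)]
struct TxnSummary {
    expiration_secs: u64,
}

/// Number of transactions still outstanding after `removed` have been taken
/// out of `total`. `saturating_sub` clamps the result at zero, so an
/// over-count in `removed` cannot panic (debug builds) or wrap around
/// (release builds) the way plain `total - removed` would on a u64.
fn remaining_txns(total: u64, removed: u64) -> u64 {
    total.saturating_sub(removed)
}

/// Counts only transactions whose expiration lies strictly after `now_secs`,
/// so already-expired transactions do not count toward a block size limit.
fn unexpired_count(txns: &[TxnSummary], now_secs: u64) -> u64 {
    txns.iter()
        .filter(|t| t.expiration_secs > now_secs)
        .count() as u64
}
```

Whether the real proof queue treats a transaction expiring exactly at the current time as expired is not stated in the PR; the strict comparison here is an assumption.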

Description

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Jun 28, 2024

⏱️ 2h 22m total CI duration on this PR
Job                               Cumulative Duration   Recent Runs
test-fuzzers                      1h 12m                🟩🟩
forge-e2e-test / forge            41m                   🟥🟥
rust-images / rust-all            12m                   🟩
test-target-determinator          5m                    🟩
rust-move-tests                   4m                    🟩
general-lints                     2m                    🟩
check-dynamic-deps                2m                    🟩🟩
rust-move-tests                   50s
semgrep/ci                        50s                   🟩🟩
file_change_determinator          19s                   🟩🟩
file_change_determinator          11s                   🟩
file_change_determinator          10s                   🟩
permission-check                  7s                    🟩🟩
permission-check                  6s                    🟩🟩
permission-check                  5s                    🟩🟩
permission-check                  4s                    🟩🟩
determine-docker-build-metadata   3s                    🟩
permission-check                  2s                    🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job                      Duration   vs 7d avg   Delta
forge-e2e-test / forge   18m        14m         +26%


@bchocho bchocho added the CICD:run-forge-e2e-perf Run the e2e perf forge only label Jun 28, 2024

Contributor

❌ Forge suite realistic_env_max_load failure on 2878fc98ce0f2fd8076e8186ee387752e26003e6

two traffics test: inner traffic : committed: 328.5975808652673 txn/s, submitted: 12821.257449825021 txn/s, expired: 12492.659868959754 txn/s, latency: 16134.580281690141 ms, (p50: 10900 ms, p90: 38400 ms, p99: 49400 ms), latency samples: 2840
Test Failed: test NetworkLoadTest

Caused by:
    TPS requirement for inner traffic failed. Average TPS 328.5975808652673, minimum TPS requirement 7500. Full stats: committed: 328.5975808652673 txn/s, submitted: 12821.257449825021 txn/s, expired: 12492.659868959754 txn/s, latency: 16134.580281690141 ms, (p50: 10900 ms, p90: 38400 ms, p99: 49400 ms), latency samples: 2840

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_tps
             at ./testsuite/forge/src/success_criteria.rs:467:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_throughput
             at ./testsuite/forge/src/success_criteria.rs:520:9
   3: aptos_forge::success_criteria::SuccessCriteriaChecker::check_core_for_success
             at ./testsuite/forge/src/success_criteria.rs:251:9
   4: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_testcases::NetworkLoadTest>::test::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:74:9
   5: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   6: <dyn aptos_testcases::NetworkLoadTest>::network_load_test::{{closure}}
             at ./testsuite/testcases/src/lib.rs:405:18
   7: <dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:268:14
   8: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
   9: <aptos_testcases::two_traffics_test::TwoTrafficsTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/two_traffics_test.rs:88:47
  10: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  11: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run::{{closure}}
             at ./testsuite/testcases/src/lib.rs:601:37
  12: <core::pin::Pin<P> as core::future::future::Future>::poll
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/future/future.rs:123:9
  13: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
  14: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
  15: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
  16: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
  17: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
  18: tokio::runtime::handle::Handle::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:310:22
  19: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  20: tokio::runtime::handle::Handle::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/handle.rs:309:9
  21: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:603:49
  22: aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:676:11
  23: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:603:30
  24: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
  25: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
  26: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  27: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  28: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  29: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  30: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  31: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  32: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  33: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  34: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  35: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  36: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  37: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  38: main
  39: __libc_start_main
  40: _start
Trailing Log Lines:
  35: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  36: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  37: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  38: main
  39: __libc_start_main
  40: _start


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-13856-1719595539-2878fc98ce0f2fd8076e8186ee387752e","timestamp":"2024-06-28T17:41:59.605821Z","message":"Deleting namespace forge-e2e-pr-13856: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-e2e-pr-13856-1719595539-2878fc98ce0f2fd8076e8186ee387752e","timestamp":"2024-06-28T17:41:59.605845Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-13856"}

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:628:13
   2: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:429:11
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:355:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  15: std::rt::lang_start_internal
             at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  16: main
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                    READY   STATUS      RESTARTS      AGE
aptos-node-0-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-0-validator-0                1/1     Running     1 (41s ago)   14m
aptos-node-1-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-1-validator-0                1/1     Running     1 (35s ago)   14m
aptos-node-10-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-10-validator-0               1/1     Running     1 (40s ago)   14m
aptos-node-11-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-11-validator-0               1/1     Running     0             14m
aptos-node-12-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-12-validator-0               1/1     Running     1 (29s ago)   14m
aptos-node-13-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-13-validator-0               1/1     Running     1 (37s ago)   14m
aptos-node-14-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-14-validator-0               1/1     Running     1 (36s ago)   14m
aptos-node-15-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-15-validator-0               1/1     Running     0             14m
aptos-node-16-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-16-validator-0               1/1     Running     1 (33s ago)   14m
aptos-node-17-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-17-validator-0               1/1     Running     1 (19s ago)   14m
aptos-node-18-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-18-validator-0               1/1     Running     1 (30s ago)   14m
aptos-node-19-fullnode-eforge241-0      1/1     Running     0             14m
aptos-node-19-validator-0               1/1     Running     1 (28s ago)   14m
aptos-node-2-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-2-validator-0                1/1     Running     1 (38s ago)   14m
aptos-node-3-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-3-validator-0                1/1     Running     1 (30s ago)   14m
aptos-node-4-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-4-validator-0                1/1     Running     1 (26s ago)   14m
aptos-node-5-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-5-validator-0                1/1     Running     1 (30s ago)   14m
aptos-node-6-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-6-validator-0                1/1     Running     1 (38s ago)   14m
aptos-node-7-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-7-validator-0                1/1     Running     1 (42s ago)   14m
aptos-node-8-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-8-validator-0                1/1     Running     1 (28s ago)   14m
aptos-node-9-fullnode-eforge241-0       1/1     Running     0             14m
aptos-node-9-validator-0                1/1     Running     1 (39s ago)   14m
genesis-aptos-genesis-eforge241-2zhwf   0/1     Completed   0             15m
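
The Forge report above fails on a minimum-throughput criterion: committed TPS of about 328.6 against a requirement of 7500. The sketch below restates that check and the ratio of committed to submitted traffic. It is an illustration under assumptions, not the code in `testsuite/forge/src/success_criteria.rs`; `TxnStats`, `check_min_tps`, and `committed_fraction` are hypothetical names.

```rust
/// Throughput figures as printed in the Forge report, in txn/s.
struct TxnStats {
    committed: f64,
    submitted: f64,
    expired: f64,
}

/// Hypothetical stand-in for a minimum-TPS success criterion:
/// fails when the average committed TPS is below `min_tps`.
fn check_min_tps(stats: &TxnStats, min_tps: f64) -> Result<(), String> {
    if stats.committed < min_tps {
        Err(format!(
            "TPS requirement failed. Average TPS {}, minimum TPS requirement {}",
            stats.committed, min_tps
        ))
    } else {
        Ok(())
    }
}

/// Share of submitted transactions that were committed (0.0 when nothing
/// was submitted, to avoid dividing by zero).
fn committed_fraction(stats: &TxnStats) -> f64 {
    if stats.submitted == 0.0 {
        0.0
    } else {
        stats.committed / stats.submitted
    }
}
```

Fed the numbers from the report (committed 328.5975808652673, submitted 12821.257449825021, expired 12492.659868959754), `check_min_tps` with a threshold of 7500 returns an error, and `committed_fraction` comes out at roughly 2.6%. In this run committed plus expired equals submitted, so nearly all submitted traffic expired rather than committing.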

Contributor

This issue is stale because it has been open 45 days with no activity. Remove the stale label, comment or push a commit - otherwise this will be closed in 15 days.

@github-actions github-actions bot added the Stale label Aug 13, 2024
@github-actions github-actions bot closed this Aug 28, 2024
Labels
CICD:run-forge-e2e-perf Run the e2e perf forge only Stale
2 participants