[preview] [aptos channel] always reset counters on new channel #12346

Merged (3 commits, Mar 2, 2024)


@bchocho (Contributor) commented Mar 2, 2024

Description

The consensus queues are recreated every epoch. Without resetting the counters, the metrics show an ever-increasing number of "stuck" messages, which are simply messages that were still in the queue at the epoch change (and hence were never dequeued or dropped).
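A minimal Rust sketch of the failure mode and the fix, under stated assumptions: the counters are modeled as process-global values keyed by a queue label (an assumption, mirroring how labelled, process-global metrics usually behave), so they outlive each per-epoch channel, and "stuck" is taken to mean enqueued minus dequeued minus dropped. `EpochChannel`, `QueueCounters`, `stuck_after_epochs` and the `reset_counters` flag are hypothetical names for illustration, not the actual `aptos_channel` API or the code changed in this PR.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Mutex, OnceLock};

/// Hypothetical per-queue counters. They live in a process-global registry,
/// so they outlive any single channel instance (assumption: this mirrors
/// labelled, process-global metrics).
#[derive(Default, Clone, Copy, Debug, PartialEq)]
struct QueueCounters {
    enqueued: u64,
    dequeued: u64,
    dropped: u64,
}

impl QueueCounters {
    /// Messages that were enqueued but neither dequeued nor dropped.
    fn stuck(&self) -> u64 {
        self.enqueued - self.dequeued - self.dropped
    }
}

fn registry() -> &'static Mutex<HashMap<&'static str, QueueCounters>> {
    static REGISTRY: OnceLock<Mutex<HashMap<&'static str, QueueCounters>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Applies `f` to the counters for `label`, creating them at zero if absent.
fn update(label: &'static str, f: impl FnOnce(&mut QueueCounters)) {
    let mut reg = registry().lock().unwrap();
    f(reg.entry(label).or_default());
}

/// Returns a snapshot of the counters for `label`.
fn counters(label: &'static str) -> QueueCounters {
    registry().lock().unwrap().get(label).copied().unwrap_or_default()
}

/// Hypothetical bounded channel; a new instance is created every epoch.
struct EpochChannel {
    label: &'static str,
    capacity: usize,
    queue: VecDeque<u64>,
}

impl EpochChannel {
    fn new(label: &'static str, capacity: usize, reset_counters: bool) -> Self {
        if reset_counters {
            // Start every new channel from zero, so messages stranded in the
            // previous epoch's queue stop being reported as "stuck".
            update(label, |c| *c = QueueCounters::default());
        }
        Self { label, capacity, queue: VecDeque::new() }
    }

    fn push(&mut self, msg: u64) {
        update(self.label, |c| c.enqueued += 1);
        if self.queue.len() == self.capacity {
            // Drop the oldest message when full (one possible policy).
            self.queue.pop_front();
            update(self.label, |c| c.dropped += 1);
        }
        self.queue.push_back(msg);
    }

    fn pop(&mut self) -> Option<u64> {
        let msg = self.queue.pop_front();
        if msg.is_some() {
            update(self.label, |c| c.dequeued += 1);
        }
        msg
    }
}

/// Simulates `epochs` epochs: each enqueues 5 messages, dequeues 3, and is
/// then replaced with 2 messages still queued.
fn stuck_after_epochs(label: &'static str, epochs: usize, reset: bool) -> u64 {
    for _ in 0..epochs {
        let mut ch = EpochChannel::new(label, 10, reset);
        for i in 0..5 {
            ch.push(i);
        }
        for _ in 0..3 {
            let _ = ch.pop();
        }
        // `ch` is dropped here, as at an epoch change.
    }
    counters(label).stuck()
}

fn main() {
    // prints "without reset: 6"
    println!("without reset: {}", stuck_after_epochs("no_reset", 3, false));
    // prints "with reset: 2"
    println!("with reset: {}", stuck_after_epochs("reset", 3, true));
}
```

Each simulated epoch discards its channel with two messages still queued. Without the reset, those stranded messages accumulate across epochs (6 after three epochs); with the reset, the counters describe only the most recent channel (2).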

@bchocho added the CICD:run-forge-e2e-perf label (Run the e2e perf forge only) Mar 2, 2024

trunk-io bot commented Mar 2, 2024

⏱️ 5h 59m total CI duration on this PR
Job                               Cumulative Duration   Recent Runs
rust-smoke-coverage               2h 56m                🟩
windows-build                     1h 7m                 🟩🟩🟩
rust-unit-tests                   36m                   🟥🟥
forge-e2e-test / forge            16m                   🟥
rust-images / rust-all            13m                   🟩
rust-lints                        12m                   🟥🟥
rust-unit-coverage                10m                   🟥
run-tests-main-branch             8m                    🟩🟩
check                             8m                    🟩🟩
check-dynamic-deps                5m                    🟩🟩🟩
general-lints                     4m                    🟩🟩
semgrep/ci                        1m                    🟩🟩🟩
file_change_determinator          19s                   🟩🟩
file_change_determinator          17s                   🟩🟩
file_change_determinator          11s                   🟩
permission-check                  11s                   🟩🟩🟩
permission-check                  10s                   🟩🟩🟩
permission-check                  5s                    🟩🟩
permission-check                  5s                    🟩🟩
permission-check                  3s                    🟩
determine-docker-build-metadata   2s                    🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job             Duration   vs 7d avg   Delta
windows-build   27m        20m         +36%

github-actions bot (Contributor) commented Mar 2, 2024

❌ Forge suite realistic_env_max_load failure on 694c3a953330ff9d6ca4922904741bb54127c9dc

two traffics test: inner traffic : committed: 23292 txn/s, latency: 14071 ms, (p50: 12600 ms, p90: 21200 ms, p99: 34300 ms), latency samples: 10248540
two traffics test : committed: 100 txn/s, latency: 3924 ms, (p50: 3600 ms, p90: 4700 ms, p99: 11000 ms), latency samples: 1920
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.395, avg: 0.248", "QsPosToProposal: max: 2.365, avg: 1.709", "ConsensusProposalToOrdered: max: 0.558, avg: 0.535", "ConsensusOrderedToCommit: max: 1.049, avg: 0.910", "ConsensusProposalToCommit: max: 1.603, avg: 1.444"]
Test Failed: check for success

Caused by:
    Failed latency check, for ["P50 latency is 3.6s and exceeds limit of 3.4s", "P90 latency is 4.7s and exceeds limit of 4.5s"]

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::success_criteria::SuccessCriteriaChecker::check_latency
             at ./testsuite/forge/src/success_criteria.rs:531:13
   2: aptos_forge::success_criteria::SuccessCriteriaChecker::check_for_success::{{closure}}
             at ./testsuite/forge/src/success_criteria.rs:271:9
   3: tokio::runtime::park::CachedParkThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:63
   4: tokio::runtime::coop::with_budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:107:5
   5: tokio::runtime::coop::budget
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/coop.rs:73:5
   6: tokio::runtime::park::CachedParkThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/park.rs:282:31
   7: tokio::runtime::context::blocking::BlockingRegionGuard::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/blocking.rs:66:9
   8: tokio::runtime::scheduler::multi_thread::MultiThread::block_on::{{closure}}
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:87:13
   9: tokio::runtime::context::runtime::enter_runtime
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/context/runtime.rs:65:16
  10: tokio::runtime::scheduler::multi_thread::MultiThread::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/scheduler/multi_thread/mod.rs:86:9
  11: tokio::runtime::runtime::Runtime::block_on
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.35.1/src/runtime/runtime.rs:350:50
  12: aptos_forge::interface::network::NetworkContext::check_for_success
             at ./testsuite/forge/src/interface/network.rs:71:9
  13: <dyn aptos_testcases::NetworkLoadTest as aptos_forge::interface::network::NetworkTest>::run
             at ./testsuite/testcases/src/lib.rs:229:13
  14: <aptos_testcases::CompositeNetworkTest as aptos_forge::interface::network::NetworkTest>::run
             at ./testsuite/testcases/src/lib.rs:499:9
  15: aptos_forge::runner::Forge<F>::run::{{closure}}
             at ./testsuite/forge/src/runner.rs:598:42
  16: aptos_forge::runner::run_test
             at ./testsuite/forge/src/runner.rs:666:11
  17: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:598:30
  18: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:425:11
  19: forge::main
             at ./testsuite/forge-cli/src/main.rs:351:21
  20: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
  22: std::rt::lang_start::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
  23: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
  24: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  25: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  26: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  27: std::rt::lang_start_internal::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
  28: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  29: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  30: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  31: std::rt::lang_start_internal
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
  32: std::rt::lang_start
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
  33: __libc_start_main
  34: _start
Trailing Log Lines:


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:292"},"thread_name":"main","hostname":"forge-e2e-pr-12346-1709366764-694c3a953330ff9d6ca4922904741bb54","timestamp":"2024-03-02T08:20:16.191308Z","message":"Deleting namespace forge-e2e-pr-12346: Some(NamespaceStatus { conditions: None, phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:400"},"thread_name":"main","hostname":"forge-e2e-pr-12346-1709366764-694c3a953330ff9d6ca4922904741bb54","timestamp":"2024-03-02T08:20:16.191341Z","message":"aptos-node resources for Forge removed in namespace: forge-e2e-pr-12346"}
Failed to run tests:
Tests Failed

failures:
    CompositeNetworkTest

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Error: Tests Failed

Stack backtrace:
   0: anyhow::error::<impl anyhow::Error>::msg
             at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/anyhow-1.0.79/src/error.rs:83:36
   1: aptos_forge::runner::Forge<F>::run
             at ./testsuite/forge/src/runner.rs:618:13
   2: forge::run_forge
             at ./testsuite/forge-cli/src/main.rs:425:11
   3: forge::main
             at ./testsuite/forge-cli/src/main.rs:351:21
   4: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:154:18
   6: std::rt::lang_start::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:167:18
   7: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:284:13
   8: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
   9: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  10: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  11: std::rt::lang_start_internal::{{closure}}
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:48
  12: std::panicking::try::do_call
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:552:40
  13: std::panicking::try
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:516:19
  14: std::panic::catch_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panic.rs:142:14
  15: std::rt::lang_start_internal
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:148:20
  16: std::rt::lang_start
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/rt.rs:166:17
  17: __libc_start_main
  18: _start
Debugging output:
NAME                                    READY   STATUS      RESTARTS   AGE
aptos-node-0-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-0-validator-0                1/1     Running     0          12m
aptos-node-1-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-1-validator-0                1/1     Running     0          13m
aptos-node-10-validator-0               1/1     Running     0          12m
aptos-node-11-validator-0               1/1     Running     0          12m
aptos-node-12-validator-0               1/1     Running     0          12m
aptos-node-13-validator-0               1/1     Running     0          12m
aptos-node-14-validator-0               1/1     Running     0          12m
aptos-node-15-validator-0               1/1     Running     0          13m
aptos-node-16-validator-0               1/1     Running     0          12m
aptos-node-17-validator-0               1/1     Running     0          12m
aptos-node-18-validator-0               1/1     Running     0          13m
aptos-node-19-validator-0               1/1     Running     0          13m
aptos-node-2-fullnode-eforge253-0       1/1     Running     0          13m
aptos-node-2-validator-0                1/1     Running     0          12m
aptos-node-20-validator-0               1/1     Running     0          12m
aptos-node-21-validator-0               1/1     Running     0          12m
aptos-node-22-validator-0               1/1     Running     0          12m
aptos-node-23-validator-0               1/1     Running     0          12m
aptos-node-24-validator-0               1/1     Running     0          12m
aptos-node-25-validator-0               1/1     Running     0          12m
aptos-node-26-validator-0               1/1     Running     0          12m
aptos-node-27-validator-0               1/1     Running     0          12m
aptos-node-28-validator-0               1/1     Running     0          13m
aptos-node-29-validator-0               1/1     Running     0          12m
aptos-node-3-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-3-validator-0                1/1     Running     0          13m
aptos-node-30-validator-0               1/1     Running     0          12m
aptos-node-31-validator-0               1/1     Running     0          12m
aptos-node-32-validator-0               1/1     Running     0          12m
aptos-node-33-validator-0               1/1     Running     0          12m
aptos-node-34-validator-0               1/1     Running     0          12m
aptos-node-35-validator-0               1/1     Running     0          12m
aptos-node-36-validator-0               1/1     Running     0          12m
aptos-node-37-validator-0               1/1     Running     0          12m
aptos-node-38-validator-0               1/1     Running     0          12m
aptos-node-39-validator-0               1/1     Running     0          12m
aptos-node-4-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-4-validator-0                1/1     Running     0          13m
aptos-node-40-validator-0               1/1     Running     0          12m
aptos-node-41-validator-0               1/1     Running     0          12m
aptos-node-42-validator-0               1/1     Running     0          12m
aptos-node-43-validator-0               1/1     Running     0          12m
aptos-node-44-validator-0               1/1     Running     0          13m
aptos-node-45-validator-0               1/1     Running     0          13m
aptos-node-46-validator-0               1/1     Running     0          12m
aptos-node-47-validator-0               1/1     Running     0          12m
aptos-node-48-validator-0               1/1     Running     0          12m
aptos-node-49-validator-0               1/1     Running     0          12m
aptos-node-5-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-5-validator-0                1/1     Running     0          13m
aptos-node-50-validator-0               1/1     Running     0          12m
aptos-node-51-validator-0               1/1     Running     0          12m
aptos-node-52-validator-0               1/1     Running     0          12m
aptos-node-53-validator-0               1/1     Running     0          13m
aptos-node-54-validator-0               1/1     Running     0          12m
aptos-node-55-validator-0               1/1     Running     0          12m
aptos-node-56-validator-0               1/1     Running     0          12m
aptos-node-57-validator-0               1/1     Running     0          12m
aptos-node-58-validator-0               1/1     Running     0          12m
aptos-node-59-validator-0               1/1     Running     0          12m
aptos-node-6-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-6-validator-0                1/1     Running     0          12m
aptos-node-60-validator-0               1/1     Running     0          12m
aptos-node-61-validator-0               1/1     Running     0          12m
aptos-node-62-validator-0               1/1     Running     0          12m
aptos-node-63-validator-0               1/1     Running     0          12m
aptos-node-64-validator-0               1/1     Running     0          12m
aptos-node-65-validator-0               1/1     Running     0          12m
aptos-node-66-validator-0               1/1     Running     0          12m
aptos-node-67-validator-0               1/1     Running     0          12m
aptos-node-68-validator-0               1/1     Running     0          12m
aptos-node-69-validator-0               1/1     Running     0          12m
aptos-node-7-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-7-validator-0                1/1     Running     0          12m
aptos-node-70-validator-0               1/1     Running     0          12m
aptos-node-71-validator-0               1/1     Running     0          12m
aptos-node-72-validator-0               1/1     Running     0          12m
aptos-node-73-validator-0               1/1     Running     0          12m
aptos-node-74-validator-0               1/1     Running     0          12m
aptos-node-75-validator-0               1/1     Running     0          12m
aptos-node-76-validator-0               1/1     Running     0          12m
aptos-node-77-validator-0               1/1     Running     0          12m
aptos-node-78-validator-0               1/1     Running     0          12m
aptos-node-79-validator-0               1/1     Running     0          12m
aptos-node-8-fullnode-eforge253-0       1/1     Running     0          13m
aptos-node-8-validator-0                1/1     Running     0          12m
aptos-node-80-validator-0               1/1     Running     0          12m
aptos-node-81-validator-0               1/1     Running     0          12m
aptos-node-82-validator-0               1/1     Running     0          12m
aptos-node-83-validator-0               1/1     Running     0          13m
aptos-node-84-validator-0               1/1     Running     0          12m
aptos-node-85-validator-0               1/1     Running     0          12m
aptos-node-86-validator-0               1/1     Running     0          12m
aptos-node-87-validator-0               1/1     Running     0          12m
aptos-node-88-validator-0               1/1     Running     0          12m
aptos-node-89-validator-0               1/1     Running     0          12m
aptos-node-9-fullnode-eforge253-0       1/1     Running     0          12m
aptos-node-9-validator-0                1/1     Running     0          12m
aptos-node-90-validator-0               1/1     Running     0          12m
aptos-node-91-validator-0               1/1     Running     0          12m
aptos-node-92-validator-0               1/1     Running     0          12m
aptos-node-93-validator-0               1/1     Running     0          12m
aptos-node-94-validator-0               1/1     Running     0          12m
aptos-node-95-validator-0               1/1     Running     0          12m
aptos-node-96-validator-0               1/1     Running     0          12m
aptos-node-97-validator-0               1/1     Running     0          12m
aptos-node-98-validator-0               1/1     Running     0          12m
aptos-node-99-validator-0               1/1     Running     0          13m
genesis-aptos-genesis-eforge253-f84d7   0/1     Completed   0          13m

@bchocho removed the CICD:run-forge-e2e-perf label (Run the e2e perf forge only) Mar 2, 2024
@bchocho changed the title from "Brian/reset queue counters" to "[aptos channel] always reset counters on new channel" Mar 2, 2024
@bchocho changed the title from "[aptos channel] always reset counters on new channel" to "[preview] [aptos channel] always reset counters on new channel" Mar 2, 2024
@bchocho marked this pull request as ready for review March 2, 2024 17:19
@bchocho merged commit 1e729d0 into preview Mar 2, 2024
37 of 42 checks passed
@bchocho deleted the brian/reset-queue-counters branch March 2, 2024 17:19
bchocho added a commit that referenced this pull request Mar 7, 2024
bchocho added a commit that referenced this pull request Apr 3, 2024