increase forge node startup timeout #15244

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

@aluon aluon commented Nov 8, 2024

Description

Increase the startup timeout for nodes during Forge tests. Pods sometimes take longer to start up when they are rescheduled to a different node and have to attach a PVC. Also adds a separate check for pod status, which could help us distinguish infra errors from other issues.
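
As a rough illustration of the idea described above (this is not the code from this PR), the following sketch polls a pod phase and a node health check separately until a deadline. All names here (`PodPhase`, `StartupError`, `wait_for_startup`) are hypothetical, the pod phase is a simplified stand-in for what Kubernetes reports, and the timeout is a parameter because the description does not state the actual values.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical, simplified view of a pod phase (not the real Kubernetes type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PodPhase {
    Pending,
    Running,
    Failed,
}

/// Why the wait gave up. Keeping the two cases apart is what lets a caller
/// tell a likely infra problem from a problem with the node itself.
#[derive(Debug, PartialEq, Eq)]
pub enum StartupError {
    /// The pod never reached Running (e.g. still being scheduled or attaching a volume).
    PodNotRunning(PodPhase),
    /// The pod was Running, but the node did not report healthy before the deadline.
    NodeUnhealthy,
}

/// Polls `pod_phase` and `node_healthy` until the node is up or `timeout` elapses.
/// Both closures are assumptions standing in for real cluster and node queries.
pub fn wait_for_startup<P, H>(
    mut pod_phase: P,
    mut node_healthy: H,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<(), StartupError>
where
    P: FnMut() -> PodPhase,
    H: FnMut() -> bool,
{
    let deadline = Instant::now() + timeout;
    loop {
        let phase = pod_phase();
        // A failed pod will not recover by waiting, so stop immediately.
        if phase == PodPhase::Failed {
            return Err(StartupError::PodNotRunning(phase));
        }
        // Only ask the node for health once its pod is actually running.
        if phase == PodPhase::Running && node_healthy() {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(match phase {
                PodPhase::Running => StartupError::NodeUnhealthy,
                other => StartupError::PodNotRunning(other),
            });
        }
        sleep(poll_interval);
    }
}
```

With this split, a timeout that ends in `PodNotRunning(Pending)` points toward scheduling or volume attachment, while `NodeUnhealthy` points toward the node process. Raising `timeout` gives rescheduled pods more room without hiding which of the two cases occurred.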

How Has This Been Tested?

https://github.com/aptos-labs/aptos-core/actions/runs/11750653293

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@aluon aluon added the CICD:run-e2e-tests label (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR) Nov 8, 2024

trunk-io bot commented Nov 8, 2024

⏱️ 2h 51m total CI duration on this PR

| Slowest 15 Jobs | Cumulative Duration | Recent Runs |
| --- | --- | --- |
| execution-performance / test-target-determinator | 34m | 🟩🟩🟩🟩🟩 (+3 more) |
| test-target-determinator | 27m | 🟩🟩🟩🟩🟩 (+2 more) |
| check | 24m | 🟩🟩🟩🟩🟩 (+2 more) |
| rust-cargo-deny | 12m | 🟩🟩🟩🟩🟩 (+2 more) |
| check-dynamic-deps | 12m | 🟩🟩🟩🟩🟩 (+3 more) |
| fetch-last-released-docker-image-tag | 11m | 🟩🟩🟩🟩🟩 (+2 more) |
| rust-doc-tests | 5m | 🟩 |
| rust-doc-tests | 5m | 🟩 |
| rust-doc-tests | 5m | 🟩 |
| rust-doc-tests | 4m | 🟥 |
| rust-doc-tests | 4m | 🟥 |
| rust-doc-tests | 4m | 🟥 |
| semgrep/ci | 3m | 🟩🟩🟩🟩🟩 (+3 more) |
| general-lints | 3m | 🟩🟩🟩🟩🟩 (+2 more) |
| rust-move-tests | 2m | 🟩 |


@aluon aluon force-pushed the aluon/push-qyuwrmyyknus branch 2 times, most recently from b928f23 to 450812a Compare November 8, 2024 22:38
@aluon aluon marked this pull request as ready for review November 8, 2024 22:44
@aluon aluon enabled auto-merge (squash) November 12, 2024 23:08
@aluon aluon disabled auto-merge November 13, 2024 19:24

@aluon aluon enabled auto-merge (squash) November 13, 2024 21:03

✅ Forge suite realistic_env_max_load success on 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb

two traffics test: inner traffic : committed: 14492.02 txn/s, latency: 2744.41 ms, (p50: 2700 ms, p70: 2700, p90: 2900 ms, p99: 3000 ms), latency samples: 5510200
two traffics test : committed: 100.02 txn/s, latency: 1640.38 ms, (p50: 1400 ms, p70: 1400, p90: 1500 ms, p99: 11500 ms), latency samples: 1720
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 2.077, avg: 1.577", "ConsensusProposalToOrdered: max: 0.318, avg: 0.293", "ConsensusOrderedToCommit: max: 0.355, avg: 0.345", "ConsensusProposalToCommit: max: 0.646, avg: 0.638"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.91s no progress at version 2295668 (avg 0.20s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 8.64s no progress at version 2295666 (avg 8.64s) [limit 15].
Test Ok

✅ Forge suite framework_upgrade success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb (PR)
Upgrade the nodes to version: 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1486.28 txn/s, submitted: 1489.53 txn/s, failed submission: 3.26 txn/s, expired: 3.26 txn/s, latency: 2421.14 ms, (p50: 2300 ms, p70: 2400, p90: 3900 ms, p99: 5200 ms), latency samples: 118640
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1291.46 txn/s, submitted: 1293.67 txn/s, failed submission: 2.21 txn/s, expired: 2.21 txn/s, latency: 2341.90 ms, (p50: 2100 ms, p70: 2400, p90: 3900 ms, p99: 5800 ms), latency samples: 116840
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb passed
Upgrade the remaining nodes to version: 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1283.74 txn/s, submitted: 1286.37 txn/s, failed submission: 2.63 txn/s, expired: 2.63 txn/s, latency: 2308.54 ms, (p50: 2100 ms, p70: 2400, p90: 3800 ms, p99: 6000 ms), latency samples: 117180
Test Ok

✅ Forge suite compat success on ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb

Compatibility test results for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb (PR)
1. Check liveness of validators at old version: ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70
compatibility::simple-validator-upgrade::liveness-check : committed: 13929.30 txn/s, latency: 2478.88 ms, (p50: 1900 ms, p70: 2100, p90: 4800 ms, p99: 9000 ms), latency samples: 497520
2. Upgrading first Validator to new version: 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 6018.51 txn/s, latency: 4780.95 ms, (p50: 5500 ms, p70: 5700, p90: 6000 ms, p99: 6100 ms), latency samples: 113740
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6027.90 txn/s, latency: 5359.92 ms, (p50: 5600 ms, p70: 6000, p90: 6800 ms, p99: 7200 ms), latency samples: 213240
3. Upgrading rest of first batch to new version: 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7833.62 txn/s, latency: 3706.76 ms, (p50: 4200 ms, p70: 4300, p90: 4400 ms, p99: 4500 ms), latency samples: 146180
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7764.28 txn/s, latency: 4149.06 ms, (p50: 4400 ms, p70: 4500, p90: 5300 ms, p99: 5800 ms), latency samples: 256600
4. upgrading second batch to new version: 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 10932.60 txn/s, latency: 2491.34 ms, (p50: 2300 ms, p70: 2900, p90: 4000 ms, p99: 4300 ms), latency samples: 187800
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10832.84 txn/s, latency: 2840.02 ms, (p50: 2300 ms, p70: 2800, p90: 5600 ms, p99: 8000 ms), latency samples: 355540
5. check swarm health
Compatibility test for ea6e45f0eee4b6da2ebf93b9b89d269d334fcf70 ==> 6cdf5ff8d3b21d047e3df1b733a06290df6a05cb passed
Test Ok

@aluon aluon merged commit aab8d5d into main Nov 13, 2024
93 checks passed
@aluon aluon deleted the aluon/push-qyuwrmyyknus branch November 13, 2024 21:36
vineethk pushed a commit that referenced this pull request Nov 13, 2024
vineethk pushed a commit that referenced this pull request Nov 14, 2024