Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch high-perf-docker to runs-on.com runners #13658

Merged
merged 1 commit into from
Jun 18, 2024
Merged

switch high-perf-docker to runs-on.com runners #13658

merged 1 commit into from
Jun 18, 2024

Conversation

geekflyer
Copy link
Contributor

@geekflyer geekflyer commented Jun 12, 2024

This switches high-perf-docker (c2-standard-60) to roughly equivalent runs-on.com runners. Specifically AWS machines of the c7 family (runs-on has some heuristics to pick the cheapest spot instance in the c7 family that satisfies the CPU requirement of 64 CPUs).

Test Plan

Ran several tests a bunch of times. Compared performance to baseline PR (#13698 ). Performance is roughly similar, usually a bit faster (5-15%), few times a bit slower (less than 5%).

Re spot preemption: Haven't seen a single spot preemption so far (added a log-based chart here https://aptoslabs.grafana.net/d/cb066e70-b378-4de7-aadb-79b43386e664/gha-self-hosted-runners?orgId=1&from=now-30d&to=now ).

Note: Some of the checks below are failing but to the best of my ability they are just broken and also fail in the baseline PR that uses high-perf-docker.

Something to note

Runs-on in some cases actually picks machines with higher CPU count (96+ etc.). This due its heuristics of picking the cheapest instance, (sometimes the bigger machines are cheaper, who knew).
Probably not a problem for most jobs, but may lead to some surprised for perf/benchmarking jobs.
I think its worth the tradeoff for these types of jobs anyways.

Copy link

trunk-io bot commented Jun 12, 2024

⏱️ 28h 13m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-smoke-coverage 4h 26m 🟩
test-fuzzers 2h 20m 🟩🟩🟩🟩
forge-e2e-test / forge 1h 35m 🟩🟩🟩🟩🟩 (+2 more)
rust-images / rust-all 1h 35m 🟩🟩🟩🟩 (+8 more)
forge-compat-test / forge 1h 25m 🟩🟩🟩🟩🟩 (+2 more)
run-tests-main-branch 1h 15m 🟩🟩🟩🟩🟩 (+12 more)
rust-smoke-coverage 1h 14m 🟩
execution-performance / test-target-determinator 53m 🟩🟩🟩🟩🟩 (+9 more)
check 48m 🟩🟩🟩🟩🟩 (+9 more)
test-target-determinator 47m 🟩🟩🟩🟩🟩 (+9 more)
cli-e2e-tests / run-cli-tests 44m 🟩🟩🟩🟩🟩 (+2 more)
general-lints 30m 🟩🟩🟩🟩🟩 (+11 more)
rust-smoke-tests 25m
rust-unit-tests 22m 🟥
rust-unit-tests 22m 🟥
check-dynamic-deps 19m 🟩🟩🟩🟩🟩 (+12 more)
rust-move-tests 14m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-unit-coverage 13m 🟩
rust-move-unit-coverage 12m 🟩
rust-move-unit-coverage 12m 🟩
rust-move-unit-coverage 12m 🟩
rust-move-unit-coverage 11m 🟩
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose 11m 🟩🟩🟩🟩🟩 (+2 more)
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
prover-inconsistency-test 10m 🟥
rust-unit-coverage 9m 🟥
rust-move-tests 9m 🟩
rust-move-tests 8m 🟩
rust-unit-coverage 8m 🟥
rust-move-tests 8m 🟩
rust-move-tests 8m 🟩
rust-move-tests 8m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-tests 8m 🟩
rust-move-tests 8m 🟩
rust-move-tests 8m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-tests 8m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-unit-coverage 8m 🟩
rust-move-unit-coverage 8m 🟩
run-all-circuit-tests 8m 🟩
run-all-circuit-tests 7m 🟩
run-all-circuit-tests 7m 🟩
run-all-circuit-tests 7m 🟩
rust-move-unit-coverage 7m 🟩
run-all-circuit-tests 7m 🟩
node-api-compatibility-tests / node-api-compatibility-tests 7m 🟥🟩🟩🟩🟩 (+2 more)
semgrep/ci 7m 🟩🟩🟩🟩🟩 (+12 more)
rust-move-tests 6m
rust-move-tests 6m
rust-move-unit-coverage 6m
prover-inconsistency-test 5m
rust-unit-tests 5m
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
run-all-circuit-tests 4m 🟩
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+12 more)
file_change_determinator 3m 🟩🟩🟩🟩🟩 (+12 more)
build 3m 🟥
build 3m 🟥
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+9 more)
rust-move-tests 2m
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
build 2m 🟥
execution-performance / single-node-performance 2m 🟩🟩🟩🟩🟩 (+9 more)
permission-check 51s 🟩🟩🟩🟩🟩 (+12 more)
permission-check 47s 🟩🟩🟩🟩🟩 (+12 more)
permission-check 45s 🟩🟩🟩🟩🟩 (+12 more)
permission-check 44s 🟩🟩🟩🟩🟩 (+9 more)
permission-check 43s 🟩🟩🟩🟩🟩 (+12 more)
determine-docker-build-metadata 35s 🟩🟩🟩🟩🟩 (+9 more)
upload-to-codecov 16s 🟥🟥
run-all-circuit-tests 1s
rust-move-unit-coverage 1s
rust-move-tests 1s
prover-inconsistency-test 1s

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
run-tests-main-branch 6m 5m +26%

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link

codecov bot commented Jun 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.9%. Comparing base (061db31) to head (0686419).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main   #13658       +/-   ##
===========================================
- Coverage    70.8%    58.9%    -11.9%     
===========================================
  Files        2299      817     -1482     
  Lines      451793   195823   -255970     
===========================================
- Hits       319875   115392   -204483     
+ Misses     131918    80431    -51487     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@geekflyer geekflyer force-pushed the runs-on branch 2 times, most recently from 1ebb379 to 064b008 Compare June 14, 2024 03:42
@geekflyer geekflyer added CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR CICD:run-coverage run tests with test coverage instrumentation labels Jun 14, 2024

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@geekflyer geekflyer force-pushed the runs-on branch 3 times, most recently from 5e2a4c1 to 3214c47 Compare June 15, 2024 10:01

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@geekflyer geekflyer force-pushed the runs-on branch 4 times, most recently from b46ffc9 to 22b89f2 Compare June 16, 2024 07:22

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

@geekflyer geekflyer changed the title switch to runs-on.com runners switch high-perf-docker to runs-on.com runners Jun 18, 2024
@geekflyer geekflyer requested a review from ibalajiarun June 18, 2024 07:55
@geekflyer geekflyer marked this pull request as ready for review June 18, 2024 07:58
@geekflyer geekflyer requested a review from a team as a code owner June 18, 2024 07:58

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

✅ Forge suite compat success on a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce

Compatibility test results for a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce (PR)
1. Check liveness of validators at old version: a68e71c05caebf01504d4499110f3fba213fb53d
compatibility::simple-validator-upgrade::liveness-check : committed: 8359.613818878674 txn/s, latency: 3938.6850707004273 ms, (p50: 2700 ms, p90: 6900 ms, p99: 19300 ms), latency samples: 304100
2. Upgrading first Validator to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 3003.0044655720594 txn/s, latency: 10349.235604378173 ms, (p50: 10700 ms, p90: 14200 ms, p99: 14500 ms), latency samples: 126080
3. Upgrading rest of first batch to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 3379.008874254004 txn/s, latency: 9169.159594634004 ms, (p50: 9400 ms, p90: 14000 ms, p99: 14300 ms), latency samples: 137160
4. upgrading second batch to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 4731.03378043793 txn/s, latency: 6802.845256556442 ms, (p50: 5100 ms, p90: 13400 ms, p99: 17700 ms), latency samples: 175400
5. check swarm health
Compatibility test for a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce passed
Test Ok

Copy link
Contributor

✅ Forge suite realistic_env_max_load success on 0686419e48ab6612468a09595f0d0c73a457f5ce

two traffics test: inner traffic : committed: 8724.81672218612 txn/s, latency: 4494.159594391641 ms, (p50: 4500 ms, p90: 5100 ms, p99: 10200 ms), latency samples: 3768660
two traffics test : committed: 99.90328013269485 txn/s, latency: 2149.007222222222 ms, (p50: 2000 ms, p90: 2200 ms, p99: 6500 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.215, avg: 0.209", "QsPosToProposal: max: 0.251, avg: 0.240", "ConsensusProposalToOrdered: max: 0.312, avg: 0.287", "ConsensusOrderedToCommit: max: 0.369, avg: 0.356", "ConsensusProposalToCommit: max: 0.654, avg: 0.643"]
Max round gap was 1 [limit 4] at version 1872855. Max no progress secs was 5.000799 [limit 15] at version 1872855.
Test Ok

@@ -1,4 +1,4 @@
self-hosted-runner:
# Labels of self-hosted runners in array of string
labels:
- high-perf-docker
- runs-on,cpu=64,family=c7,hdd=500,image=ubuntu22-full-x64,run-id=${{ github.run_id }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this necessary ? It looks like a label not a runner spec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just to tell the actionlint linter not to complain about this in the action yamls.

Copy link
Contributor

@ibalajiarun ibalajiarun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should build and use a custom AMI soon with the required tools.

@geekflyer geekflyer merged commit 916dba1 into main Jun 18, 2024
49 of 52 checks passed
@geekflyer geekflyer deleted the runs-on branch June 18, 2024 17:29
geekflyer added a commit that referenced this pull request Jun 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-all-unit-tests Runs all unit tests CICD:run-coverage run tests with test coverage instrumentation CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants