switch high-perf-docker to runs-on.com runners #13658

geekflyer · 2024-06-12T11:00:18Z

This switches high-perf-docker (c2-standard-60) to roughly equivalent runs-on.com runners. Specifically AWS machines of the c7 family (runs-on has some heuristics to pick the cheapest spot instance in the c7 family that satisfies the CPU requirement of 64 CPUs).

Test Plan

Ran several tests a bunch of times. Compared performance to baseline PR (#13698 ). Performance is roughly similar, usually a bit faster (5-15%), few times a bit slower (less than 5%).

Re spot preemption: Haven't seen a single spot preemption so far (added a log-based chart here https://aptoslabs.grafana.net/d/cb066e70-b378-4de7-aadb-79b43386e664/gha-self-hosted-runners?orgId=1&from=now-30d&to=now ).

Note: Some of the checks below are failing but to the best of my ability they are just broken and also fail in the baseline PR that uses high-perf-docker.

Something to note

Runs-on in some cases actually picks machines with higher CPU count (96+ etc.). This due its heuristics of picking the cheapest instance, (sometimes the bigger machines are cheaper, who knew).
Probably not a problem for most jobs, but may lead to some surprised for perf/benchmarking jobs.
I think its worth the tradeoff for these types of jobs anyways.

trunk-io · 2024-06-12T11:00:22Z

⏱️ 28h 13m total CI duration on this PR

Job	Cumulative Duration	Recent Runs
rust-smoke-coverage	4h 26m	🟩
test-fuzzers	2h 20m	🟩 🟩 🟩 🟩
forge-e2e-test / forge	1h 35m	🟩 🟩 🟩 🟩 🟩 (+2 more)
rust-images / rust-all	1h 35m	🟩 🟩 ⬜ 🟩 🟩 (+8 more)
forge-compat-test / forge	1h 25m	🟩 🟩 🟩 🟩 🟩 (+2 more)
run-tests-main-branch	1h 15m	🟩 🟩 🟩 🟩 🟩 (+12 more)
rust-smoke-coverage	1h 14m	🟩
execution-performance / test-target-determinator	53m	🟩 🟩 🟩 🟩 🟩 (+9 more)
check	48m	🟩 🟩 🟩 🟩 🟩 (+9 more)
test-target-determinator	47m	🟩 🟩 🟩 🟩 🟩 (+9 more)
cli-e2e-tests / run-cli-tests	44m	🟩 🟩 🟩 🟩 🟩 (+2 more)
general-lints	30m	🟩 🟩 🟩 🟩 🟩 (+11 more)
rust-smoke-tests	25m	⬜
rust-unit-tests	22m	🟥
rust-unit-tests	22m	🟥
check-dynamic-deps	19m	🟩 🟩 🟩 🟩 🟩 (+12 more)
rust-move-tests	14m	🟩
rust-move-tests	13m	🟩
rust-move-tests	13m	🟩
rust-move-tests	13m	🟩
rust-move-unit-coverage	13m	🟩
rust-move-unit-coverage	12m	🟩
rust-move-unit-coverage	12m	🟩
rust-move-unit-coverage	12m	🟩
rust-move-unit-coverage	11m	🟩
indexer-grpc-e2e-tests / test-indexer-grpc-docker-compose	11m	🟩 🟩 🟩 🟩 🟩 (+2 more)
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
prover-inconsistency-test	10m	🟥
rust-unit-coverage	9m	🟥
rust-move-tests	9m	🟩
rust-move-tests	8m	🟩
rust-unit-coverage	8m	🟥
rust-move-tests	8m	🟩
rust-move-tests	8m	🟩
rust-move-tests	8m	🟩
rust-move-unit-coverage	8m	🟩
rust-move-tests	8m	🟩
rust-move-tests	8m	🟩
rust-move-tests	8m	🟩
rust-move-unit-coverage	8m	🟩
rust-move-unit-coverage	8m	🟩
rust-move-tests	8m	🟩
rust-move-unit-coverage	8m	🟩
rust-move-unit-coverage	8m	🟩
rust-move-unit-coverage	8m	🟩
run-all-circuit-tests	8m	🟩
run-all-circuit-tests	7m	🟩
run-all-circuit-tests	7m	🟩
run-all-circuit-tests	7m	🟩
rust-move-unit-coverage	7m	🟩
run-all-circuit-tests	7m	🟩
node-api-compatibility-tests / node-api-compatibility-tests	7m	🟥 🟩 🟩 🟩 🟩 (+2 more)
semgrep/ci	7m	🟩 🟩 🟩 🟩 🟩 (+12 more)
rust-move-tests	6m	⬜
rust-move-tests	6m	⬜
rust-move-unit-coverage	6m	⬜
prover-inconsistency-test	5m	⬜
rust-unit-tests	5m	⬜
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
run-all-circuit-tests	4m	🟩
file_change_determinator	3m	🟩 🟩 🟩 🟩 🟩 (+12 more)
file_change_determinator	3m	🟩 🟩 🟩 🟩 🟩 (+12 more)
build	3m	🟥
build	3m	🟥
file_change_determinator	2m	🟩 🟩 🟩 🟩 🟩 (+9 more)
rust-move-tests	2m	⬜
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
build	2m	🟥
execution-performance / single-node-performance	2m	🟩 🟩 🟩 🟩 🟩 (+9 more)
permission-check	51s	🟩 🟩 🟩 🟩 🟩 (+12 more)
permission-check	47s	🟩 🟩 🟩 🟩 🟩 (+12 more)
permission-check	45s	🟩 🟩 🟩 🟩 🟩 (+12 more)
permission-check	44s	🟩 🟩 🟩 🟩 🟩 (+9 more)
permission-check	43s	🟩 🟩 🟩 🟩 🟩 (+12 more)
determine-docker-build-metadata	35s	🟩 🟩 🟩 🟩 🟩 (+9 more)
upload-to-codecov	16s	🟥 🟥
run-all-circuit-tests	1s	⬜
rust-move-unit-coverage	1s	⬜
rust-move-tests	1s	⬜
prover-inconsistency-test	1s	⬜

🚨 1 job on the last run was significantly faster/slower than expected

Job	Duration	vs 7d avg	Delta
run-tests-main-branch	6m	5m

_{settings ⋅ feedback ⋅ docs ⋅ learn more about trunk.io}

codecov · 2024-06-12T11:11:34Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 58.9%. Comparing base (061db31) to head (0686419).
Report is 1 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff             @@
##             main   #13658       +/-   ##
===========================================
- Coverage    70.8%    58.9%    -11.9%     
===========================================
  Files        2299      817     -1482     
  Lines      451793   195823   -255970     
===========================================
- Hits       319875   115392   -204483     
+ Misses     131918    80431    -51487

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

github-actions · 2024-06-18T08:16:01Z

✅ Forge suite `compat` success on `a68e71c05caebf01504d4499110f3fba213fb53d` ==> `0686419e48ab6612468a09595f0d0c73a457f5ce`

Compatibility test results for a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce (PR)
1. Check liveness of validators at old version: a68e71c05caebf01504d4499110f3fba213fb53d
compatibility::simple-validator-upgrade::liveness-check : committed: 8359.613818878674 txn/s, latency: 3938.6850707004273 ms, (p50: 2700 ms, p90: 6900 ms, p99: 19300 ms), latency samples: 304100
2. Upgrading first Validator to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 3003.0044655720594 txn/s, latency: 10349.235604378173 ms, (p50: 10700 ms, p90: 14200 ms, p99: 14500 ms), latency samples: 126080
3. Upgrading rest of first batch to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 3379.008874254004 txn/s, latency: 9169.159594634004 ms, (p50: 9400 ms, p90: 14000 ms, p99: 14300 ms), latency samples: 137160
4. upgrading second batch to new version: 0686419e48ab6612468a09595f0d0c73a457f5ce
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 4731.03378043793 txn/s, latency: 6802.845256556442 ms, (p50: 5100 ms, p90: 13400 ms, p99: 17700 ms), latency samples: 175400
5. check swarm health
Compatibility test for a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce passed
Test Ok

github-actions · 2024-06-18T08:18:10Z

✅ Forge suite `realistic_env_max_load` success on `0686419e48ab6612468a09595f0d0c73a457f5ce`

two traffics test: inner traffic : committed: 8724.81672218612 txn/s, latency: 4494.159594391641 ms, (p50: 4500 ms, p90: 5100 ms, p99: 10200 ms), latency samples: 3768660
two traffics test : committed: 99.90328013269485 txn/s, latency: 2149.007222222222 ms, (p50: 2000 ms, p90: 2200 ms, p99: 6500 ms), latency samples: 1800
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.215, avg: 0.209", "QsPosToProposal: max: 0.251, avg: 0.240", "ConsensusProposalToOrdered: max: 0.312, avg: 0.287", "ConsensusOrderedToCommit: max: 0.369, avg: 0.356", "ConsensusProposalToCommit: max: 0.654, avg: 0.643"]
Max round gap was 1 [limit 4] at version 1872855. Max no progress secs was 5.000799 [limit 15] at version 1872855.
Test Ok

sionescu · 2024-06-18T15:50:26Z

.github/actionlint.yaml

@@ -1,4 +1,4 @@
 self-hosted-runner:
  # Labels of self-hosted runners in array of string
  labels:
-    - high-perf-docker
+    - runs-on,cpu=64,family=c7,hdd=500,image=ubuntu22-full-x64,run-id=${{ github.run_id }}


Is this necessary ? It looks like a label not a runner spec.

this is just to tell the actionlint linter not to complain about this in the action yamls.

ibalajiarun

We should build and use a custom AMI soon with the required tools.

This reverts commit 916dba1.

geekflyer force-pushed the runs-on branch 2 times, most recently from 1ebb379 to 064b008 Compare June 14, 2024 03:42

geekflyer added CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR CICD:run-coverage run tests with test coverage instrumentation labels Jun 14, 2024

This comment has been minimized.

Sign in to view

geekflyer force-pushed the runs-on branch from 064b008 to 70f3917 Compare June 14, 2024 09:34

This comment has been minimized.

Sign in to view

geekflyer force-pushed the runs-on branch 3 times, most recently from 5e2a4c1 to 3214c47 Compare June 15, 2024 10:01

This comment has been minimized.

Sign in to view

geekflyer force-pushed the runs-on branch 4 times, most recently from b46ffc9 to 22b89f2 Compare June 16, 2024 07:22

This comment has been minimized.

Sign in to view

geekflyer force-pushed the runs-on branch from 22b89f2 to 744128a Compare June 18, 2024 05:28

This comment has been minimized.

Sign in to view

geekflyer added the CICD:run-all-unit-tests Runs all unit tests label Jun 18, 2024

geekflyer force-pushed the runs-on branch from 744128a to 33089f7 Compare June 18, 2024 06:36

This comment has been minimized.

Sign in to view

switch to runs-on.com runners

0686419

geekflyer force-pushed the runs-on branch from 33089f7 to 0686419 Compare June 18, 2024 07:52

geekflyer changed the title ~~switch to runs-on.com runners~~ switch high-perf-docker to runs-on.com runners Jun 18, 2024

geekflyer requested a review from ibalajiarun June 18, 2024 07:55

geekflyer marked this pull request as ready for review June 18, 2024 07:58

geekflyer requested a review from a team as a code owner June 18, 2024 07:58

This comment has been minimized.

Sign in to view

sionescu approved these changes Jun 18, 2024

View reviewed changes

ibalajiarun approved these changes Jun 18, 2024

View reviewed changes

geekflyer merged commit 916dba1 into main Jun 18, 2024
49 of 52 checks passed

geekflyer deleted the runs-on branch June 18, 2024 17:29

geekflyer added a commit that referenced this pull request Jun 18, 2024

Revert "switch high-perf-docker to runs-on.com runners (#13658)"

a10f880

This reverts commit 916dba1.

geekflyer mentioned this pull request Jun 18, 2024

Revert "switch high-perf-docker to runs-on.com runners" #13752

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

switch high-perf-docker to runs-on.com runners #13658

switch high-perf-docker to runs-on.com runners #13658

geekflyer commented Jun 12, 2024 •

edited

Loading

trunk-io bot commented Jun 12, 2024 •

edited

Loading

codecov bot commented Jun 12, 2024 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Jun 18, 2024

github-actions bot commented Jun 18, 2024

sionescu Jun 18, 2024

geekflyer Jun 18, 2024

ibalajiarun left a comment

switch high-perf-docker to runs-on.com runners #13658

switch high-perf-docker to runs-on.com runners #13658

Conversation

geekflyer commented Jun 12, 2024 • edited Loading

Test Plan

Something to note

trunk-io bot commented Jun 12, 2024 • edited Loading

codecov bot commented Jun 12, 2024 • edited Loading

Codecov Report

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Jun 18, 2024

✅ Forge suite compat success on a68e71c05caebf01504d4499110f3fba213fb53d ==> 0686419e48ab6612468a09595f0d0c73a457f5ce

github-actions bot commented Jun 18, 2024

✅ Forge suite realistic_env_max_load success on 0686419e48ab6612468a09595f0d0c73a457f5ce

sionescu Jun 18, 2024

Choose a reason for hiding this comment

geekflyer Jun 18, 2024

Choose a reason for hiding this comment

ibalajiarun left a comment

Choose a reason for hiding this comment

geekflyer commented Jun 12, 2024 •

edited

Loading

trunk-io bot commented Jun 12, 2024 •

edited

Loading

codecov bot commented Jun 12, 2024 •

edited

Loading

✅ Forge suite `compat` success on `a68e71c05caebf01504d4499110f3fba213fb53d` ==> `0686419e48ab6612468a09595f0d0c73a457f5ce`

✅ Forge suite `realistic_env_max_load` success on `0686419e48ab6612468a09595f0d0c73a457f5ce`