[move-vm][aptos-vm] Basic execution counters #15086

Merged: 2 commits, Oct 30, 2024

georgemitenkov
Contributor

@georgemitenkov georgemitenkov commented Oct 25, 2024

Description

Adds some basic counters that address #14348. Note that some metrics, such as the interpreter loop, should not be collected via this timer, so that there is no performance degradation. Hence, timers are added only in a few interesting places: module loading cache misses, resource loading cache misses, and total execution time for user transactions with a basic breakdown.
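To illustrate the idea of instrumenting only the miss path, here is a minimal sketch using the standard library alone. `TimedCache` and `MissStats` are hypothetical names invented for this example; they are not the types added in this PR, which wraps `HistogramVec` instead. The point is only that a cache hit pays nothing for metrics, while a miss (already expensive) pays for two clock reads and a counter update.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical metrics sink for this sketch (not the PR's actual type).
#[derive(Default)]
struct MissStats {
    misses: u64,
    total_miss_time: Duration,
}

/// Hypothetical cache that records metrics only when it misses.
struct TimedCache {
    entries: HashMap<String, Vec<u8>>,
    stats: MissStats,
}

impl TimedCache {
    fn new() -> Self {
        Self {
            entries: HashMap::new(),
            stats: MissStats::default(),
        }
    }

    /// Returns the cached value, calling `load` (and timing it) only on a miss.
    fn get_or_load<F>(&mut self, name: &str, load: F) -> &Vec<u8>
    where
        F: FnOnce() -> Vec<u8>,
    {
        if !self.entries.contains_key(name) {
            // Miss path: loading is slow anyway, so the timer cost is small
            // relative to the work being measured.
            let start = Instant::now();
            let value = load();
            self.stats.total_miss_time += start.elapsed();
            self.stats.misses += 1;
            self.entries.insert(name.to_string(), value);
        }
        // Hit path: no clock reads and no metric updates.
        &self.entries[name]
    }
}
```

Calling `get_or_load` twice with the same key runs the loader once and records a single miss; the second call touches no metrics at all.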

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Oct 25, 2024

⏱️ 2h 54m total CI duration on this PR
Slowest 15 Jobs | Cumulative Duration | Recent Runs
execution-performance / single-node-performance | 1h 9m | 🟩🟥🟩
check | 13m | 🟩🟩🟩
execution-performance / test-target-determinator | 11m | 🟩🟩🟩
test-target-determinator | 11m | 🟩🟩🟩
rust-move-tests | 9m | 🟩
rust-move-tests | 9m | 🟩
rust-move-tests | 9m | 🟩
check-dynamic-deps | 8m | 🟩🟩🟩🟩🟩
rust-cargo-deny | 8m | 🟩🟩🟩🟩
rust-doc-tests | 5m | 🟩
fetch-last-released-docker-image-tag | 5m | 🟩🟩🟩
rust-doc-tests | 5m | 🟩
rust-doc-tests | 5m | 🟩
general-lints | 2m | 🟩🟩🟩🟩
semgrep/ci | 2m | 🟩🟩🟩🟩🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job | Duration | vs 7d avg | Delta
check-dynamic-deps | 2m | 1m | +92%

@georgemitenkov georgemitenkov changed the title [move] Basic execution counters [move-vm][aptos-vm] Basic execution counters Oct 25, 2024
@georgemitenkov georgemitenkov marked this pull request as ready for review October 25, 2024 18:01
@georgemitenkov georgemitenkov requested review from gelash, brmataptos and ziaptos and removed request for davidiw and wrwg October 25, 2024 18:01
Contributor

@brmataptos brmataptos left a comment

Thanks!

Contributor

@vineethk vineethk left a comment

Approving with comments to be addressed.

Review threads on third_party/move/move-vm/metrics/src/lib.rs: 4 (3 resolved, 1 outdated and resolved)
@georgemitenkov georgemitenkov enabled auto-merge (squash) October 29, 2024 17:09


/// Helper trait to encapsulate [HistogramVec] functionality. Users can use this trait to time
/// different VM parts collecting metrics for different labels. Use wisely as timers do introduce
/// an overhead, so using on extremely hot path is not recommended.
Contributor

I think this wording is too optimistic; let's at least remove the word "extremely". Even moderately hot paths can cause a slowdown, as a histogram is not that cheap.

Things in this PR are probably fine (I assume type_to_type_tag is not called too often?), but it is easy to add overhead (e.g., timing cache hits for data would probably be questionable).

Contributor Author

Good point!

I was testing with e2e, and data accesses are indeed something we do not want to record, as there is a bit of a regression. At least recording gas traversal, loading/verification misses, and total execution time should be enough.
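For readers wondering where the per-call cost discussed in this thread comes from, here is a rough standard-library sketch of a label-keyed, drop-based timer. `LabeledTimings`, `TimerGuard`, and `load_module_on_miss` are hypothetical names for illustration only; the helper trait in this PR is built on `HistogramVec` and may behave differently. Even in this simplified form, each timed scope costs two clock reads plus a synchronized map update, which is negligible next to a module load but not next to a cheap data access.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical sink: per-label sample count and accumulated duration.
#[derive(Default)]
struct LabeledTimings {
    samples: Mutex<HashMap<&'static str, (u64, Duration)>>,
}

/// Guard that records the elapsed time for its label when dropped.
struct TimerGuard<'a> {
    sink: &'a LabeledTimings,
    label: &'static str,
    start: Instant,
}

impl LabeledTimings {
    /// Starts timing; the sample is recorded when the returned guard is dropped.
    fn start_timer(&self, label: &'static str) -> TimerGuard<'_> {
        TimerGuard {
            sink: self,
            label,
            start: Instant::now(), // first clock read
        }
    }

    /// Number of samples recorded so far for `label`.
    fn count(&self, label: &str) -> u64 {
        let samples = self.samples.lock().unwrap();
        let n = samples.get(label).map_or(0, |(n, _)| *n);
        n
    }
}

impl Drop for TimerGuard<'_> {
    fn drop(&mut self) {
        let elapsed = self.start.elapsed(); // second clock read
        // Synchronized update: this is the part that hurts on hot paths.
        let mut samples = self.sink.samples.lock().unwrap();
        let entry = samples.entry(self.label).or_insert((0, Duration::ZERO));
        entry.0 += 1;
        entry.1 += elapsed;
    }
}

/// Hypothetical call site: infrequent, expensive work is a reasonable place to time.
fn load_module_on_miss(timings: &LabeledTimings) {
    let _timer = timings.start_timer("module_load_miss");
    // ... deserialize and verify the module here ...
}
```

Binding the guard to `_timer` (rather than `_`) keeps it alive until the end of the scope, so the whole function body is measured.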

Contributor

✅ Forge suite realistic_env_max_load success on fa283cadbb1f9d988e831f8c4b0849ff8cf5250c

two traffics test: inner traffic : committed: 14234.81 txn/s, latency: 2790.93 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3300 ms), latency samples: 5412400
two traffics test : committed: 99.92 txn/s, latency: 1683.72 ms, (p50: 1400 ms, p70: 1500, p90: 1600 ms, p99: 11900 ms), latency samples: 1780
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.972, avg: 1.592", "ConsensusProposalToOrdered: max: 0.329, avg: 0.295", "ConsensusOrderedToCommit: max: 0.376, avg: 0.355", "ConsensusProposalToCommit: max: 0.668, avg: 0.651"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.66s no progress at version 35486 (avg 0.20s) [limit 15].
Max epoch-change gap was: 1 rounds at version 2733064 (avg 1.00) [limit 4], 9.87s no progress at version 2733064 (avg 9.87s) [limit 15].
Test Ok

Contributor

✅ Forge suite framework_upgrade success on 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c

Compatibility test results for 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c (PR)
Upgrade the nodes to version: fa283cadbb1f9d988e831f8c4b0849ff8cf5250c
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 940.83 txn/s, submitted: 943.04 txn/s, failed submission: 2.21 txn/s, expired: 2.21 txn/s, latency: 3223.71 ms, (p50: 2600 ms, p70: 3600, p90: 6100 ms, p99: 7500 ms), latency samples: 84980
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1085.19 txn/s, submitted: 1086.38 txn/s, failed submission: 1.18 txn/s, expired: 1.18 txn/s, latency: 3307.60 ms, (p50: 2100 ms, p70: 3200, p90: 7200 ms, p99: 13000 ms), latency samples: 91680
5. check swarm health
Compatibility test for 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c passed
Upgrade the remaining nodes to version: fa283cadbb1f9d988e831f8c4b0849ff8cf5250c
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1050.97 txn/s, submitted: 1052.79 txn/s, failed submission: 1.82 txn/s, expired: 1.82 txn/s, latency: 3186.74 ms, (p50: 2400 ms, p70: 3300, p90: 6300 ms, p99: 8100 ms), latency samples: 92600
Test Ok

Contributor

✅ Forge suite compat success on 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c

Compatibility test results for 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c (PR)
1. Check liveness of validators at old version: 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd
compatibility::simple-validator-upgrade::liveness-check : committed: 14410.03 txn/s, latency: 2403.85 ms, (p50: 1800 ms, p70: 2000, p90: 3300 ms, p99: 12700 ms), latency samples: 537860
2. Upgrading first Validator to new version: fa283cadbb1f9d988e831f8c4b0849ff8cf5250c
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 5858.68 txn/s, latency: 4847.73 ms, (p50: 5300 ms, p70: 5800, p90: 6300 ms, p99: 6500 ms), latency samples: 112820
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 6406.99 txn/s, latency: 5048.05 ms, (p50: 5400 ms, p70: 5500, p90: 6800 ms, p99: 7200 ms), latency samples: 218860
3. Upgrading rest of first batch to new version: fa283cadbb1f9d988e831f8c4b0849ff8cf5250c
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 7114.20 txn/s, latency: 3841.17 ms, (p50: 4400 ms, p70: 4800, p90: 5100 ms, p99: 5300 ms), latency samples: 126700
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7208.05 txn/s, latency: 4432.77 ms, (p50: 4600 ms, p70: 4800, p90: 6600 ms, p99: 6900 ms), latency samples: 238700
4. upgrading second batch to new version: fa283cadbb1f9d988e831f8c4b0849ff8cf5250c
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 9209.29 txn/s, latency: 3061.52 ms, (p50: 3500 ms, p70: 3600, p90: 3800 ms, p99: 4000 ms), latency samples: 163380
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 8294.84 txn/s, latency: 3790.03 ms, (p50: 3700 ms, p70: 3900, p90: 5900 ms, p99: 6400 ms), latency samples: 268960
5. check swarm health
Compatibility test for 9c922ebe94f5ff4b58df4617f3ff003e2ce10ccd ==> fa283cadbb1f9d988e831f8c4b0849ff8cf5250c passed
Test Ok

@georgemitenkov georgemitenkov merged commit f8c5a60 into main Oct 30, 2024
48 checks passed
@georgemitenkov georgemitenkov deleted the george/benchmarks branch October 30, 2024 12:21
4 participants