Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[gas] infrastructure for gas profiling #6535

Merged
merged 2 commits into from
Mar 27, 2023

Conversation

vgao1996
Copy link
Contributor

@vgao1996 vgao1996 commented Feb 8, 2023

Overview

This is the initial implementation of the gas profiling infrastructure. More specially, this

  1. Makes AptosGasMeter a trait, which the AptosVM can take to support different gas meter implementations
  2. Renames the original AptosGasMeter to StandardGasMeter
  3. Implements a gas profiler (the aptos-gas-profiling crate) that is capable of recording a transaction gas log and converting it to a flamegraph
  4. Adds an option to the CLI that would allow one to simulate a transaction locally with the gas profiler

How to Run the Gas Profiler?

You can run the gas profiler by appending the --profile-gas option to the aptos cli's move publish, move run & move run-script commands. Here is an example:

>> cargo run -p aptos -- move publish --profile-gas
    Finished dev [unoptimized + debuginfo] target(s) in 0.51s
     Running `/home/vgao/aptos-core/target/debug/aptos move publish --profile-gas`
Compiling, may take a little while to download git dependencies...
BUILDING empty_fun
package size 427 bytes

Simulating transaction locally with the gas profiler...
This is still experimental so results may be inaccurate.

Execution & IO Gas flamegraph saved to gas-profiling/txn-69e19ee4-0x1-code-publish_package_txn.exec_io.svg
Storage fee flamegraph saved to gas-profiling/txn-69e19ee4-0x1-code-publish_package_txn.storage.svg

{
  "Result": {
    "transaction_hash": "0x69e19ee4cc89cb1f84ee21a46e6b281bd8696115aa332275eca38c4857818dfe",
    "gas_used": 1007,
    "gas_unit_price": 100,
    "sender": "dbcbe741d003a7369d87ec8717afb5df425977106497052f96f4e236372f7dd5",
    "success": true,
    "version": 473269362,
    "vm_status": "status EXECUTED of type Execution"
  }
}

Unlike the regular transaction simulation process that happens remotely at the fullnode, this simulates your transaction locally using a tool called "aptos debugger", with the gas profiler enabled. You will find the generated flamegraphs in a directory called gas-profiling.

Samples

txn-911d0966-script exec_io
txn-911d0966-script storage
(If you want to interact with a graph, save it to your local disk and then open the saved copy in your browser. For some reason it is not going to work if you simply click the image.)

Known Issues

  1. Currently, the generated graphs may still contain truncated addresses, which are still quite cumbersome. The plan is to write a smart algorithm that removes addresses, provided that the functions/items can still be unambiguously identified. I'm leaving this for a future PR for both cleanness and to unblock some other projects.
  2. The storage fee graph does not show the free quota for events yet.

Comment on lines 1077 to 1093
/// If enabled, will execute transactions with the gas profiler.
///
/// The result will be saved to the `gas-profiling` directory as a flamegraph.
#[clap(long)]
enable_gas_profiling: bool,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregnazario Even though it's possible to enable gas profiling by modifying the node config, I feel like that would still be too inconvenient for a user so I added this command line option. Feel free to let me know if you think it should be done differently.

@0xrelapse
Copy link

HYYYYPE

@0xrelapse
Copy link

0xrelapse commented Feb 16, 2023

Gave this a try locally and got this error:

details = '''panicked at 'assertion failed: self.bytes.len() > 2', aptos-move/aptos-gas-profiling/src/flamegraph.rs:101:9'''
backtrace = '''
   0:        0x103b4d8f0 - backtrace::backtrace::libunwind::trace::hb390dff9310fa195
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.67/src/backtrace/libunwind.rs:93:5
                           backtrace::backtrace::trace_unsynchronized::h58ab526920c94023
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.67/src/backtrace/mod.rs:66:5
                           backtrace::backtrace::trace::hff02ca95676169cf
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.67/src/backtrace/mod.rs:53:14
                           backtrace::capture::Backtrace::create::h58b154e53acc2910
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.67/src/capture.rs:176:9
                           backtrace::capture::Backtrace::new::h7590fae7b256a4b6
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.67/src/capture.rs:140:22
   1:        0x101fe3022 - aptos_crash_handler::handle_panic::hab19fcfd92dd3c57
                               at /Users/shahmeer/Desktop/ferum-indexer-node/crates/crash-handler/src/lib.rs:36:38
                           aptos_crash_handler::setup_panic_handler::{{closure}}::h101c60026ff28a33
                               at /Users/shahmeer/Desktop/ferum-indexer-node/crates/crash-handler/src/lib.rs:28:9
   2:        0x103d07741 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h45dd192ca0452fea
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/boxed.rs:2001:9
                           std::panicking::rust_panic_with_hook::hed342721bf9addfa
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:692:13
   3:        0x103d07492 - std::panicking::begin_panic_handler::{{closure}}::h3d9af89e51f2fba9
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:577:13
   4:        0x103d05918 - std::sys_common::backtrace::__rust_end_short_backtrace::hfb9719355016e93f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:137:18
   5:        0x103d0719d - rust_begin_unwind
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:575:5
   6:        0x103e14da3 - core::panicking::panic_fmt::h1965fc2159be50bb
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:65:14
   7:        0x103e14e77 - core::panicking::panic::h26331fff934fb8d8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panicking.rs:115:5
   8:        0x102fbbcdf - <aptos_gas_profiling::flamegraph::TableKey as core::fmt::Display>::fmt::h0a6192586689abb2
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-gas-profiling/src/flamegraph.rs:101:9
   9:        0x103d245ca - core::fmt::write::h421d4212716e9716
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
  10:        0x103d2549e - core::fmt::Formatter::write_fmt::hc5d014f739fd5cb2
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1663:9
  11:        0x102fbbd90 - <aptos_gas_profiling::flamegraph::Render<aptos_types::state_store::state_key::StateKey> as core::fmt::Display>::fmt::hc61150965bc6932b
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-gas-profiling/src/flamegraph.rs:113:17
  12:        0x103d245ca - core::fmt::write::h421d4212716e9716
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:1209:17
  13:        0x103d1b569 - core::fmt::Write::write_fmt::h8842a7e955f541d2
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/fmt/mod.rs:192:9
                           alloc::fmt::format::format_inner::h5728e6a9ef417716
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/fmt.rs:612:9
  14:        0x102fbcf7c - alloc::fmt::format::{{closure}}::hb89a943d01606b89
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/fmt.rs:616:34
                           core::option::Option<T>::map_or_else::hbc374bebc684c8c8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/option.rs:1019:21
                           alloc::fmt::format::hd4da209367d6f4be
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/fmt.rs:616:5
                           aptos_gas_profiling::flamegraph::<impl aptos_gas_profiling::log::TransactionGasLog>::to_folded_stack_lines::h4cc5d1e7606de34d
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/macros.rs:118:19
  15:        0x102f6c88e - aptos_gas_profiling::flamegraph::<impl aptos_gas_profiling::log::TransactionGasLog>::save_as_flamegraph::h3e5f59515179d966
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-gas-profiling/src/flamegraph.rs:216:21
  16:        0x102f91aaf - aptos_vm::aptos_vm::AptosVM::generate_and_save_flamegraph::hfa8f5e52c07b6c72
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/aptos_vm.rs:904:9
  17:        0x102cf538a - aptos_vm::aptos_vm::AptosVM::execute_user_transaction_with_gas_profiler::h90e679678caa3260
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/aptos_vm.rs:941:27
  18:        0x102ce7a03 - aptos_vm::aptos_vm::AptosVM::execute_user_transaction::h1aabe7b703d3da0d
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/aptos_vm.rs:959:13
                           <aptos_vm::aptos_vm::AptosVM as aptos_vm::adapter_common::VMAdapter>::execute_single_transaction::h0efc9769c019aa87
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/aptos_vm.rs:1394:21
  19:        0x102c52a97 - <aptos_vm::block_executor::vm_wrapper::AptosExecutorTask<S> as aptos_block_executor::task::ExecutorTask>::execute_transaction::h6ccfbeb575055efe
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/block_executor/vm_wrapper.rs:66:15
  20:        0x102ca6046 - aptos_block_executor::executor::BlockExecutor<T,E,S>::execute_transactions_sequential::h11a4251d678e7f1f
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/block-executor/src/executor.rs:340:23
  21:        0x102ca3b68 - aptos_block_executor::executor::BlockExecutor<T,E,S>::execute_block::hc78a97977ef92cf2
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/block-executor/src/executor.rs:390:13
  22:        0x102c42046 - aptos_vm::block_executor::BlockAptosVM::execute_block::ha1a185cde078caec
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/block_executor/mod.rs:111:19
  23:        0x102ce76b0 - <aptos_vm::aptos_vm::AptosVM as aptos_vm::VMExecutor>::execute_block::h35a70570fca21125
                               at /Users/shahmeer/Desktop/ferum-indexer-node/aptos-move/aptos-vm/src/aptos_vm.rs:1255:13
  24:        0x102c6a136 - aptos_executor::components::chunk_output::ChunkOutput::execute_block::hebc6ca7721246333
                               at /Users/shahmeer/Desktop/ferum-indexer-node/execution/executor/src/components/chunk_output.rs:107:12
                           aptos_executor::components::chunk_output::ChunkOutput::by_transaction_execution::h17d15b2a8da02378
                               at /Users/shahmeer/Desktop/ferum-indexer-node/execution/executor/src/components/chunk_output.rs:38:35
  25:        0x102cf6031 - <aptos_vm::aptos_vm::AptosVM as aptos_executor::block_executor::TransactionBlockExecutor<aptos_types::transaction::Transaction>>::execute_transaction_block::h4d41515c8204543b
                               at /Users/shahmeer/Desktop/ferum-indexer-node/execution/executor/src/block_executor.rs:45:9
  26:        0x102af4345 - aptos_executor::block_executor::BlockExecutorInner<V,T>::execute_block::haa1fea77e670fedb
                               at /Users/shahmeer/Desktop/ferum-indexer-node/execution/executor/src/block_executor.rs:227:17
                           <aptos_executor::block_executor::BlockExecutor<V,T> as aptos_executor_types::BlockExecutorTrait<T>>::execute_block::h5fe9282aac6b8935
                               at /Users/shahmeer/Desktop/ferum-indexer-node/execution/executor/src/block_executor.rs:107:9
  27:        0x102b39efb - <aptos_consensus::state_computer::ExecutionProxy as aptos_consensus::state_replication::StateComputer>::compute::{{closure}}::{{closure}}::hd73a5eef73a42692
                               at /Users/shahmeer/Desktop/ferum-indexer-node/consensus/src/state_computer.rs:122:17
                           <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll::ha7d187bc8723d2aa
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/blocking/task.rs:42:21
  28:        0x102adc51d - tokio::runtime::task::core::Core<T,S>::poll::{{closure}}::heb9506438f88102b
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/core.rs:223:17
                           tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut::h2a99f6639d92fbc5
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/loom/std/unsafe_cell.rs:14:9
                           tokio::runtime::task::core::Core<T,S>::poll::he442ff83fe3a18d6
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/core.rs:212:13
  29:        0x102aa6a9e - tokio::runtime::task::harness::poll_future::{{closure}}::hb9370bb1f8bd7cfa
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/harness.rs:476:19
                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h1dc6c3873da76c77
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panic/unwind_safe.rs:271:9
                           std::panicking::try::do_call::hac846852ccb3f65d
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:483:40
                           std::panicking::try::h6ea2dbfef0965ddb
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:447:19
                           std::panic::catch_unwind::he6df502311b83890
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panic.rs:137:14
                           tokio::runtime::task::harness::poll_future::h84c8b1c128406df3
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/harness.rs:464:18
                           tokio::runtime::task::harness::Harness<T,S>::poll_inner::h35f19fa051d0489b
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/harness.rs:198:27
                           tokio::runtime::task::harness::Harness<T,S>::poll::hb0add9d0e3808f94
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/harness.rs:152:15
  30:        0x103b80aed - tokio::runtime::task::raw::RawTask::poll::h4f31511b0f703bae
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/raw.rs:200:18
                           tokio::runtime::task::UnownedTask<S>::run::h78f9dcc0f17aa87e
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/task/mod.rs:431:9
                           tokio::runtime::blocking::pool::Task::run::heb5ad9f13827d0e6
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/blocking/pool.rs:159:9
                           tokio::runtime::blocking::pool::Inner::run::hc6cfc448d2d1d01e
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/blocking/pool.rs:511:17
  31:        0x103b8266b - tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}::ha37196b3ca7a64ef
                               at /Users/shahmeer/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.25.0/src/runtime/blocking/pool.rs:469:13
                           std::sys_common::backtrace::__rust_begin_short_backtrace::h134c54a361773639
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys_common/backtrace.rs:121:18
  32:        0x103b93548 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::h21036d00d09a6dbd
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/thread/mod.rs:551:17
                           <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hdc7c3895d5aef9d4
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/panic/unwind_safe.rs:271:9
                           std::panicking::try::do_call::h98b9de1746f981b8
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:483:40
                           std::panicking::try::h182ce7f5ad16ee90
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panicking.rs:447:19
                           std::panic::catch_unwind::h84ebd0b17c52913b
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/panic.rs:137:14
                           std::thread::Builder::spawn_unchecked_::{{closure}}::hf826c74bdef4cc69
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/thread/mod.rs:550:30
                           core::ops::function::FnOnce::call_once{{vtable.shim}}::h39cee8bfa474b23f
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/ops/function.rs:251:5
  33:        0x103d0b067 - <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h5044ef3856644fc6
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/boxed.rs:1987:9
                           <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once::h4cd06f57c6ec0db9
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/alloc/src/boxed.rs:1987:9
                           std::sys::unix::thread::Thread::new::thread_start::h1311e3ea18e7aa94
                               at /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/std/src/sys/unix/thread.rs:108:17
  34:     0x7ff81f2e54e1 - __pthread_start
'''

[Logging] Unable to flush recv: timed out waiting on channel
Timed out flushing telemetry: timed out waiting on channel

Some of the bytes array values that get printed out:

[1, 0, 0, 0]
[1, 0]
[1, 0, 0, 0]
[63, 225, 161, 38, 88, 73, 96, 213, 254, 19, 236, 116, 244, 221, 70, 137, 157, 94, 67, 62, 84, 107, 81, 39, 199, 221, 152, 224, 141, 230, 22, 10, 63, 225, 161, 38, 88, 73, 96, 213, 254, 19, 236, 116, 244, 221, 70, 137, 157, 94, 67, 62, 84, 107, 81, 39, 199, 221, 152, 224, 141, 230, 22, 10]

I think you prbly want self.bytes.len() > 2. That seems to work for me.

Also, every action on the node seems to take alot longer even when enable_gas_profiling is false. Will try to dig into why

@0xrelapse
Copy link

0xrelapse commented Feb 16, 2023

Overall, this is great!

One additional feature request would be to also include information about which instructions a function is calling. For example, I'm trying to debug why a repeated borrow is costing an additional ~6000 octas. With flame graphs, I can see that the cost is coming from the additional table borrow but not more beyond that:

image

The left is the trace of the function with the repeated borrow and the right is the trace without. The left call cost 250600 octas while the right cost 244500.

Given that storage cost is free for the first 1kb of space in a slot and the object being borrowed is very small (17 bytes), I'm having trouble explaining where the additional cost is coming from (would love some help with this :p). If this tool was able to help answer this question, it'd be incredibly useful 🙏

EDIT: more granular information would look like how much gas came from the storage portion of the cost (byte + per item) and how much came from the tx portion of the cost

@vgao1996
Copy link
Contributor Author

vgao1996 commented Feb 16, 2023

@0xrelapse thank you for giving this a try and all the great feedback!

Gave this a try locally and got this error:

I'll double check that assertion. It could be that I'm making some wrong assumptions about the keys.

Also, every action on the node seems to take alot longer even when enable_gas_profiling is false. Will try to dig into why

This shouldn't be the case. Will double check. It's normal for things to run a lot slower when enable_gas_profiling is true though.

One additional feature request would be to also include information about which instructions a function is calling.

This should be possible. Right now the profiler is logging more information than it needs, including the cost of every instruction being run. It's just that I'm not including that info in the current graph due to simplicity. The plan is to allow the graph to be configurable so you can choose the granularity that fits you the most, or even generate alternative graphs (e.g. flamegraph has the option to merge and sort items).

EDIT: more granular information would look like how much gas came from the storage portion of the cost (byte + per item) and how much came from the tx portion of the cost

The current graph already shows this distinction: the "write_set" section covers entirely the (byte + per item) costs and the execution (<script> or function name) section covers the transaction (execution) portion.

or example, I'm trying to debug why a repeated borrow is costing an additional ~6000 octas.

Can you upload the generated .svg file? That way I can inspect the entries and see if I can find anything interesting.

@0xrelapse
Copy link

0xrelapse commented Feb 19, 2023

This shouldn't be the case. Will double check. It's normal for things to run a lot slower when enable_gas_profiling is true though.

yeah, could also just be a difference between the version this PR is on top of and the testnet version of the validator. I'll let you know if I notice anything that could potentially be the root cause

This should be possible. Right now the profiler is logging more information than it needs, including the cost of every instruction being run. It's just that I'm not including that info in the current graph due to simplicity. The plan is to allow the graph to be configurable so you can choose the granularity that fits you the most, or even generate alternative graphs (e.g. flamegraph has the option to merge and sort items).

Yeah, looks like all the information is there, just needs to be visualized.

The current graph already shows this distinction: the "write_set" section covers entirely the (byte + per item) costs and the execution (<script> or function name) section covers the transaction (execution) portion.

Ah I didn't notice that! Where are read costs included?

Can you upload the generated .svg file? That way I can inspect the entries and see if I can find anything interesting.

I'll DM you - don't want to crowd the discussion around this PR with specific gas questions

Copy link

@0xrelapse 0xrelapse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code lgtm. I think only blockers for getting this in a mergable state is:

  • double checking if assert!(self.bytes.len() > 2); needs to be assert!(self.bytes.len() >= 2);

Should also probably rebase - I had some merge conflicts with HEAD


fn charge_intrinsic_gas_for_transaction(&mut self, txn_size: NumBytes) -> VMResult<()>;

fn charge_write_item_gas(&mut self, key: &StateKey, op: &WriteOp) -> VMResult<()>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should the public functions of StandardGasMeter also be public functions on the AptosGasMeter trait?

for example, having change_set_configs be available to call in failed_transaction_cleanup will save having to add a new argument to the method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, I agree with what you said and this is what I ended up doing in the latest update. However change_set_configs is a bit different -- it's cleaner logic-wise to have the Aptos VM, rather than the gas meter to carry it.

aptos-move/aptos-gas/src/instr.rs Outdated Show resolved Hide resolved
) -> VMResult<()> {
let cost = self.storage_gas_params.pricing.calculate_write_set_gas(ops);
self.charge(cost).map_err(|e| e.finish(Location::Undefined))
fn charge_write_item_gas(&mut self, key: &StateKey, op: &WriteOp) -> VMResult<()> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: was charge_write_set_gas also removed upstream? Also curious why that change was made

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to make some changes to these APIs to make gas meters composable. See my latest design for details.

@vgao1996
Copy link
Contributor Author

vgao1996 commented Mar 2, 2023

@0xrelapse Hey just to let you know that I had to prioritize #6683 to enable a massive reduction in gas costs that would benefit the whole Aptos community. Because of that, this PR has become a bit outdated, but I'm hoping to update it soon. Again, thank you for being patient and I'm looking forward to sharing the new version with you!

@vgao1996 vgao1996 force-pushed the gas-profiling branch 2 times, most recently from 47d3b27 to 6175746 Compare March 17, 2023 17:35
@vgao1996 vgao1996 requested a review from msmouse March 17, 2023 17:47
@vgao1996 vgao1996 added the CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR label Mar 17, 2023
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

Copy link
Contributor

@davidiw davidiw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very excited for this to get out there. We've already started receiving confusion about the new testnet gas fees and why account creation is so expensive. This framework would reinforce that storage fees are substantially higher htan execution-only.

Beyond that we should consider when and what we want to gas profile. I think it makes more sense to have gas profiling be a result like simulation. Embedding it too deeply in the first pass seems to add a bit of complex code.

aptos-move/aptos-vm/src/aptos_vm.rs Outdated Show resolved Hide resolved
@vgao1996 vgao1996 force-pushed the gas-profiling branch 2 times, most recently from 2954b02 to 30ce9cb Compare March 20, 2023 17:46
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

This implements the infrastructure required to support gas profiling and
adds an alternative mode to aptos-node, which if enabled, will generate
a flamegraph for each transaction being executed.
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

❌ Forge suite framework_upgrade failure on cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> cc7436b070eea2b18804f8b42c22c5ee167246fc

Compatibility test results for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> cc7436b070eea2b18804f8b42c22c5ee167246fc (PR)
Upgrade the nodes to version: cc7436b070eea2b18804f8b42c22c5ee167246fc
Test Failed: API error: Unknown error Transaction committed on chain, but failed execution: Transaction Executed and Committed with Error CONSTRAINT_NOT_SATISFIED
Trailing Log Lines:
INCLUDING DEPENDENCY MoveStdlib
BUILDING ExecuteProposal
::error::API error: Unknown error Transaction committed on chain, but failed execution: Transaction Executed and Committed with Error CONSTRAINT_NOT_SATISFIED
test framework_upgrade::framework-upgrade ... FAILED
Error: API error: Unknown error Transaction committed on chain, but failed execution: Transaction Executed and Committed with Error CONSTRAINT_NOT_SATISFIED
Test Statistics: 
Compatibility test results for cb4ba0a57c998c60cbab65af31a64875d2588ca5 ==> cc7436b070eea2b18804f8b42c22c5ee167246fc (PR)
Upgrade the nodes to version: cc7436b070eea2b18804f8b42c22c5ee167246fc
Test Failed: API error: Unknown error Transaction committed on chain, but failed execution: Transaction Executed and Committed with Error CONSTRAINT_NOT_SATISFIED


Swarm logs can be found here: See fgi output for more information.
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:281"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-6535-1679675680-cb4ba0a57c998c60cbab","timestamp":"2023-03-24T16:58:41.699108Z","message":"Deleting namespace forge-framework-upgrade-pr-6535: Some(NamespaceStatus { phase: Some(\"Terminating\") })"}
{"level":"INFO","source":{"package":"aptos_forge","file":"testsuite/forge/src/backend/k8s/cluster_helper.rs:389"},"thread_name":"main","hostname":"forge-framework-upgrade-pr-6535-1679675680-cb4ba0a57c998c60cbab","timestamp":"2023-03-24T16:58:41.699133Z","message":"aptos-node resources for Forge removed in namespace: forge-framework-upgrade-pr-6535"}

failures:
    framework_upgrade::framework-upgrade

test result: FAILED. 0 passed; 1 failed; 0 filtered out

Failed to run tests:
Tests Failed
Error: Tests Failed
Debugging output:
NAME                                   READY   STATUS      RESTARTS   AGE
genesis-aptos-genesis-eforge93-sgdz7   0/1     Completed   0          21m

Comment on lines +1295 to +1297

/// If this option is set, simulate the transaction locally using the debugger and generate
/// flamegraphs that reflect the gas usage.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a "short" description that will show up with aptos -h instead of aptos --help

Suggested change
/// If this option is set, simulate the transaction locally using the debugger and generate
/// flamegraphs that reflect the gas usage.
/// Profile the transaction's gas cost
///
/// If this option is set, simulate the transaction locally using the debugger and generate
/// flamegraphs that reflect the gas usage.

Comment on lines +1466 to +1468
// TODO(Gas): get the following from the chain
const DEFAULT_GAS_UNIT_PRICE: u64 = 100;
const DEFAULT_MAX_GAS: u64 = 2_000_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can follow up and pull this from onchain. Should just be another REST call

Comment on lines +1480 to +1492
let balance = client
.get_account_balance_at_version(sender_address, version)
.await
.map_err(|err| CliError::ApiError(err.to_string()))?
.into_inner();

let max_gas = self.gas_options.max_gas.unwrap_or_else(|| {
if gas_unit_price == 0 {
DEFAULT_MAX_GAS
} else {
std::cmp::min(balance.coin.value.0 / gas_unit_price, DEFAULT_MAX_GAS)
}
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simulate the transaction to get the max gas expected btw, though making a simulate then a profile is weird.

Just trying to reduce places where this kind of logic can slip from each other.

Comment on lines +1519 to +1525
let addr_short = module_id.address().short_str_lossless();
let addr_truncated = if addr_short.len() > 4 {
&addr_short[..4]
} else {
addr_short.as_str()
};
format!("0x{}-{}-{}", addr_truncated, module_id.name(), name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not invent new ways to output function names, this kind of drift made it really hard to fix address string outputs in the first place. Can we just use the current way that's being outputted elsewhere?

Oh, I see is there a character limit on file name?

let hash = transaction.clone().committed_hash();

// Execute the transaction using the debugger
let debugger = AptosDebugger::rest_client(client).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to throw a CliError instead of an unwrap(), but if it's not expected to fail for now, that should be fine and we can clean it up soon.

// TODO(Gas): double check if this is correct.
let success = match output.status() {
TransactionStatus::Keep(exec_status) => Some(exec_status.is_success()),
TransactionStatus::Discard(_) | TransactionStatus::Retry => None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be, Some(false) because if it's discarded, it's failing.

sequence_number: None, // The transaction is not comitted so there is no new sequence number.
success,
timestamp_us: None,
version: Some(version), // The transaction is not comitted so there is no new version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm debating this, but it is kind of nice to see the version that it's run on. Just interesting to wonder whether that will confuse people that it ran

success,
timestamp_us: None,
version: Some(version), // The transaction is not comitted so there is no new version.
vm_status: Some(vm_status.to_string()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this explain the status, idk how we get the error mapping for that though

Copy link
Contributor

@gerben-stavenga gerben-stavenga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally I would think we should find a solution where gasmeter is not a generic type. I'm doing some investigation in compile times that really annoy me. I think we are over relying on generics as the primary cause of our compile time problem

@gregnazario gregnazario enabled auto-merge (squash) March 27, 2023 15:20
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link
Contributor

✅ Forge suite compat success on testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 904e77dc7b924fdd308b2c8de43090980d29bbca

Compatibility test results for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 904e77dc7b924fdd308b2c8de43090980d29bbca (PR)
1. Check liveness of validators at old version: testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b
compatibility::simple-validator-upgrade::liveness-check : 7970 TPS, 4812 ms latency, 6500 ms p99 latency,no expired txns
2. Upgrading first Validator to new version: 904e77dc7b924fdd308b2c8de43090980d29bbca
compatibility::simple-validator-upgrade::single-validator-upgrade : 4604 TPS, 8544 ms latency, 11400 ms p99 latency,no expired txns
3. Upgrading rest of first batch to new version: 904e77dc7b924fdd308b2c8de43090980d29bbca
compatibility::simple-validator-upgrade::half-validator-upgrade : 4848 TPS, 8371 ms latency, 10400 ms p99 latency,no expired txns
4. upgrading second batch to new version: 904e77dc7b924fdd308b2c8de43090980d29bbca
compatibility::simple-validator-upgrade::rest-validator-upgrade : 6570 TPS, 5817 ms latency, 10600 ms p99 latency,no expired txns
5. check swarm health
Compatibility test for testnet_2d8b1b57553d869190f61df1aaf7f31a8fc19a7b ==> 904e77dc7b924fdd308b2c8de43090980d29bbca passed
Test Ok

@github-actions
Copy link
Contributor

✅ Forge suite land_blocking success on 904e77dc7b924fdd308b2c8de43090980d29bbca

performance benchmark with full nodes : 5888 TPS, 6724 ms latency, 10600 ms p99 latency,no expired txns
Test Ok

@gregnazario gregnazario merged commit f66d717 into aptos-labs:main Mar 27, 2023
@vgao1996
Copy link
Contributor Author

@gerben-stavenga Ah, did you notice any significant change in compile time?

@clay-aptos
Copy link
Contributor

@gregnazario , is this something we should surface on Aptos.dev? Even if a simple link to aptos-move/aptos-gas-profiling/README.md?

May I ask whom would use this feature? Thanks!

@vgao1996
Copy link
Contributor Author

@clay-aptos we should definitely talk about this.

May I ask whom would use this feature? Thanks!

Developers using the CLI. We should let them know that

  1. This tool is now available so try it out!
  2. It's still experimental so there may be inaccuracies/bugs/known issues that haven't been dealt with

@clay-aptos
Copy link
Contributor

@clay-aptos we should definitely talk about this.

May I ask whom would use this feature? Thanks!

Developers using the CLI. We should let them know that

  1. This tool is now available so try it out!
  2. It's still experimental so there may be inaccuracies/bugs/known issues that haven't been dealt with

Thanks, @vgao1996 ! Here is a PR to get this into the docs:
#7457

I am also taking the opportunity to link to the Move examples section of the Aptos CLI user docs from the Move landing page. Please take a look. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants