-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[gas] infrastructure for gas profiling #6535
Conversation
crates/aptos/src/node/mod.rs
Outdated
/// If enabled, will execute transactions with the gas profiler. | ||
/// | ||
/// The result will be saved to the `gas-profiling` directory as a flamegraph. | ||
#[clap(long)] | ||
enable_gas_profiling: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gregnazario Even though it's possible to enable gas profiling by modifying the node config, I feel like that would still be too inconvenient for a user so I added this command line option. Feel free to let me know if you think it should be done differently.
HYYYYPE |
Gave this a try locally and got this error:
Some of the bytes array values that get printed out:
I think you prbly want Also, every action on the node seems to take alot longer even when |
Overall, this is great! One additional feature request would be to also include information about which instructions a function is calling. For example, I'm trying to debug why a repeated borrow is costing an additional ~6000 octas. With flame graphs, I can see that the cost is coming from the additional table borrow but not more beyond that: The left is the trace of the function with the repeated borrow and the right is the trace without. The left call cost 250600 octas while the right cost 244500. Given that storage cost is free for the first 1kb of space in a slot and the object being borrowed is very small (17 bytes), I'm having trouble explaining where the additional cost is coming from (would love some help with this :p). If this tool was able to help answer this question, it'd be incredibly useful 🙏 EDIT: more granular information would look like how much gas came from the storage portion of the cost (byte + per item) and how much came from the tx portion of the cost |
@0xrelapse thank you for giving this a try and all the great feedback!
I'll double check that assertion. It could be that I'm making some wrong assumptions about the keys.
This shouldn't be the case. Will double check. It's normal for things to run a lot slower when enable_gas_profiling is true though.
This should be possible. Right now the profiler is logging more information than it needs, including the cost of every instruction being run. It's just that I'm not including that info in the current graph due to simplicity. The plan is to allow the graph to be configurable so you can choose the granularity that fits you the most, or even generate alternative graphs (e.g. flamegraph has the option to merge and sort items).
The current graph already shows this distinction: the "write_set" section covers entirely the (byte + per item) costs and the execution (<script> or function name) section covers the transaction (execution) portion.
Can you upload the generated .svg file? That way I can inspect the entries and see if I can find anything interesting. |
yeah, could also just be a difference between the version this PR is on top of and the testnet version of the validator. I'll let you know if I notice anything that could potentially be the root cause
Yeah, looks like all the information is there, just needs to be visualized.
Ah I didn't notice that! Where are read costs included?
I'll DM you - don't want to crowd the discussion around this PR with specific gas questions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code lgtm. I think only blockers for getting this in a mergable state is:
- double checking if
assert!(self.bytes.len() > 2);
needs to beassert!(self.bytes.len() >= 2);
Should also probably rebase - I had some merge conflicts with HEAD
|
||
fn charge_intrinsic_gas_for_transaction(&mut self, txn_size: NumBytes) -> VMResult<()>; | ||
|
||
fn charge_write_item_gas(&mut self, key: &StateKey, op: &WriteOp) -> VMResult<()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should the public functions of StandardGasMeter
also be public functions on the AptosGasMeter
trait?
for example, having change_set_configs
be available to call in failed_transaction_cleanup
will save having to add a new argument to the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I agree with what you said and this is what I ended up doing in the latest update. However change_set_configs
is a bit different -- it's cleaner logic-wise to have the Aptos VM, rather than the gas meter to carry it.
) -> VMResult<()> { | ||
let cost = self.storage_gas_params.pricing.calculate_write_set_gas(ops); | ||
self.charge(cost).map_err(|e| e.finish(Location::Undefined)) | ||
fn charge_write_item_gas(&mut self, key: &StateKey, op: &WriteOp) -> VMResult<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: was charge_write_set_gas also removed upstream? Also curious why that change was made
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to make some changes to these APIs to make gas meters composable. See my latest design for details.
@0xrelapse Hey just to let you know that I had to prioritize #6683 to enable a massive reduction in gas costs that would benefit the whole Aptos community. Because of that, this PR has become a bit outdated, but I'm hoping to update it soon. Again, thank you for being patient and I'm looking forward to sharing the new version with you! |
47d3b27
to
6175746
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very excited for this to get out there. We've already started receiving confusion about the new testnet gas fees and why account creation is so expensive. This framework would reinforce that storage fees are substantially higher htan execution-only.
Beyond that we should consider when and what we want to gas profile. I think it makes more sense to have gas profiling be a result like simulation. Embedding it too deeply in the first pass seems to add a bit of complex code.
2954b02
to
30ce9cb
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This implements the infrastructure required to support gas profiling and adds an alternative mode to aptos-node, which if enabled, will generate a flamegraph for each transaction being executed.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
❌ Forge suite
|
|
||
/// If this option is set, simulate the transaction locally using the debugger and generate | ||
/// flamegraphs that reflect the gas usage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a "short" description that will show up with aptos -h
instead of aptos --help
/// If this option is set, simulate the transaction locally using the debugger and generate | |
/// flamegraphs that reflect the gas usage. | |
/// Profile the transaction's gas cost | |
/// | |
/// If this option is set, simulate the transaction locally using the debugger and generate | |
/// flamegraphs that reflect the gas usage. |
// TODO(Gas): get the following from the chain | ||
const DEFAULT_GAS_UNIT_PRICE: u64 = 100; | ||
const DEFAULT_MAX_GAS: u64 = 2_000_000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can follow up and pull this from onchain. Should just be another REST call
let balance = client | ||
.get_account_balance_at_version(sender_address, version) | ||
.await | ||
.map_err(|err| CliError::ApiError(err.to_string()))? | ||
.into_inner(); | ||
|
||
let max_gas = self.gas_options.max_gas.unwrap_or_else(|| { | ||
if gas_unit_price == 0 { | ||
DEFAULT_MAX_GAS | ||
} else { | ||
std::cmp::min(balance.coin.value.0 / gas_unit_price, DEFAULT_MAX_GAS) | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could simulate the transaction to get the max gas expected btw, though making a simulate then a profile is weird.
Just trying to reduce places where this kind of logic can slip from each other.
let addr_short = module_id.address().short_str_lossless(); | ||
let addr_truncated = if addr_short.len() > 4 { | ||
&addr_short[..4] | ||
} else { | ||
addr_short.as_str() | ||
}; | ||
format!("0x{}-{}-{}", addr_truncated, module_id.name(), name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not invent new ways to output function names, this kind of drift made it really hard to fix address string outputs in the first place. Can we just use the current way that's being outputted elsewhere?
Oh, I see is there a character limit on file name?
let hash = transaction.clone().committed_hash(); | ||
|
||
// Execute the transaction using the debugger | ||
let debugger = AptosDebugger::rest_client(client).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to throw a CliError
instead of an unwrap()
, but if it's not expected to fail for now, that should be fine and we can clean it up soon.
// TODO(Gas): double check if this is correct. | ||
let success = match output.status() { | ||
TransactionStatus::Keep(exec_status) => Some(exec_status.is_success()), | ||
TransactionStatus::Discard(_) | TransactionStatus::Retry => None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be, Some(false)
because if it's discarded, it's failing.
sequence_number: None, // The transaction is not comitted so there is no new sequence number. | ||
success, | ||
timestamp_us: None, | ||
version: Some(version), // The transaction is not comitted so there is no new version. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm debating this, but it is kind of nice to see the version that it's run on. Just interesting to wonder whether that will confuse people that it ran
success, | ||
timestamp_us: None, | ||
version: Some(version), // The transaction is not comitted so there is no new version. | ||
vm_status: Some(vm_status.to_string()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make this explain the status, idk how we get the error mapping for that though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally I would think we should find a solution where gasmeter is not a generic type. I'm doing some investigation in compile times that really annoy me. I think we are over relying on generics as the primary cause of our compile time problem
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
✅ Forge suite
|
✅ Forge suite
|
@gerben-stavenga Ah, did you notice any significant change in compile time? |
@gregnazario , is this something we should surface on Aptos.dev? Even if a simple link to May I ask whom would use this feature? Thanks! |
@clay-aptos we should definitely talk about this.
Developers using the CLI. We should let them know that
|
Thanks, @vgao1996 ! Here is a PR to get this into the docs: I am also taking the opportunity to link to the Move examples section of the Aptos CLI user docs from the Move landing page. Please take a look. Thanks! |
Overview
This is the initial implementation of the gas profiling infrastructure. More specially, this
AptosGasMeter
a trait, which the AptosVM can take to support different gas meter implementationsAptosGasMeter
toStandardGasMeter
aptos-gas-profiling
crate) that is capable of recording a transaction gas log and converting it to a flamegraphHow to Run the Gas Profiler?
You can run the gas profiler by appending the
--profile-gas
option to the aptos cli'smove publish
,move run
&move run-script
commands. Here is an example:Unlike the regular transaction simulation process that happens remotely at the fullnode, this simulates your transaction locally using a tool called "aptos debugger", with the gas profiler enabled. You will find the generated flamegraphs in a directory called
gas-profiling
.Samples
(If you want to interact with a graph, save it to your local disk and then open the saved copy in your browser. For some reason it is not going to work if you simply click the image.)
Known Issues