
[movevm] improve automatic creation of account for sponsored txn #11076

Merged: 1 commit merged into main from david-sponsored on Dec 13, 2023

@davidiw (Contributor) commented Nov 25, 2023

  • Suppose an account is created during a sponsored transaction and the txn aborts.
  • The transaction is now kept but aborted.
  • The state associated with account creation is wiped, so the account is no longer created.
  • During the epilogue, we then hit an invariant violation, because the sequence number for the account cannot be incremented.

This ensures that we create the account in the epilogue even when the transaction fails. We then charge as much as possible, recording any error returned while charging for the change set; an error indicates that we have run out of gas.

Note that the minimum gas amount is effectively 2, whereas we need roughly 4 for this operation, so more needs to be figured out. We could consider updating the simulation, but it is probably easier to just mention this specific scenario in the AIP, as that minimizes further code hacking.

There are tests validating that this works with just enough gas to execute and that it fails to execute without enough gas.
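The failure mode and the fix described above can be illustrated with a toy model. This is a hypothetical sketch, not aptos-vm code: the state is a map from address to sequence number, the user session's writes (including automatic account creation) are discarded when the payload aborts, and the `create_on_abort` flag stands in for the change that creates the account again before the epilogue increments the sequence number.

```rust
use std::collections::HashMap;

/// Toy model only: address -> sequence number. Not an aptos-vm type.
pub type State = HashMap<u64, u64>;

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    KeptButAborted,
    InvariantViolation,
}

/// Runs a hypothetical sponsored transaction from `sender`.
/// `create_on_abort` models the fix: when the user payload aborts,
/// create the account in a fresh session so the epilogue can proceed.
pub fn run_sponsored(
    state: &mut State,
    sender: u64,
    payload_aborts: bool,
    create_on_abort: bool,
) -> Outcome {
    // The user session buffers its writes, including automatic account creation.
    let mut session = state.clone();
    session.entry(sender).or_insert(0);

    if !payload_aborts {
        // Success: commit the session, account creation included.
        *state = session;
    } else if create_on_abort {
        // Abort: the session (and the account it created) is wiped,
        // so the fix creates the account again if it does not exist.
        state.entry(sender).or_insert(0);
    }

    // Epilogue: increment the sender's sequence number; the account must exist.
    match state.get_mut(&sender) {
        Some(seq) => {
            *seq += 1;
            if payload_aborts {
                Outcome::KeptButAborted
            } else {
                Outcome::Success
            }
        }
        None => Outcome::InvariantViolation,
    }
}
```

Without the fix, an aborted payload for a new sender reaches the epilogue with no account and yields `InvariantViolation`; with it, the transaction is kept as aborted and the account ends up with sequence number 1.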


codecov bot commented Nov 25, 2023

Codecov Report

Attention: 33 lines in your changes are missing coverage. Please review.

Comparison is base (0e91537) 68.7% compared to head (b003b9f) 68.7%.
Report is 3 commits behind head on main.

File                                                Patch %   Missing lines
aptos-move/aptos-vm/src/aptos_vm.rs                 86.3%     23 ⚠️
aptos-move/aptos-vm/src/gas.rs                      81.4%     5 ⚠️
aptos-move/aptos-vm/src/transaction_validation.rs   37.5%     5 ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main   #11076    +/-   ##
========================================
  Coverage    68.7%    68.7%            
========================================
  Files         764      764            
  Lines      176995   177145   +150     
========================================
+ Hits       121630   121819   +189     
+ Misses      55365    55326    -39     


@davidiw force-pushed the david-sponsored branch 2 times, most recently from 7967a8f to e9ebbc1, November 26, 2023 18:13
@davidiw force-pushed the david-sponsored branch 4 times, most recently from c8fed55 to b0bddb7, November 28, 2023 02:28
aptos-move/aptos-vm/src/aptos_vm.rs (outdated, resolved)
self.vm_impl
if matches!(transaction_status, TransactionStatus::Keep(_))
&& txn_data.fee_payer().is_some()
&& txn_data.sequence_number == 0
Contributor

I think account sequence numbers start from 0, right? So there could be a valid, existing account with sequence number 0. In that case, if the txn fails, I guess it just needs to pay the extra gas for calling create_account_if_does_not_exist.

Contributor Author

it would have to pay that in the happy path anyway...
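The exchange above turns on the condition also matching an existing account whose sequence number is still 0. A hypothetical sketch (the names are illustrative, not aptos-vm's) shows why that is tolerable: creation is idempotent, so an existing account only pays for the existence check, which, as the author notes, the happy path pays anyway.

```rust
/// Mirrors the shape of the condition in the diff excerpt above (illustrative only).
pub fn should_create_on_abort(kept: bool, has_fee_payer: bool, sequence_number: u64) -> bool {
    kept && has_fee_payer && sequence_number == 0
}

/// Hypothetical idempotent creation. Returns the gas charged: an existing
/// account pays only `check_cost`; a missing one also pays `create_cost`.
pub fn create_if_missing(exists: &mut bool, check_cost: u64, create_cost: u64) -> u64 {
    if *exists {
        check_cost
    } else {
        *exists = true;
        check_cost + create_cost
    }
}
```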

aptos-move/aptos-vm/src/aptos_vm.rs (outdated, resolved)
aptos-move/aptos-vm/src/aptos_vm_impl.rs (outdated, resolved)
aptos-move/aptos-vm/src/aptos_vm_impl.rs (outdated, resolved)

// Need to zero this out, because that's the legacy behavior. Otherwise we charge for txn
// size where we used to not.
let mut null_txn_data = txn_data.clone();
Contributor

Why not pass a size to charge_change_set_and_respawn_session? That seems better than cloning the whole struct.

Contributor Author

Hmm... because this isn't the common case and it will be removed, I'd like to leave a minimal footprint. I'm fine either way.
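The two options in this thread can be contrasted in a small sketch. `TxnData`, its fields, and the fee formula are hypothetical stand-ins, not the aptos-vm types; the point is only that zeroing the size on a clone and passing a size of 0 explicitly produce the same charge, while the explicit parameter avoids copying the struct.

```rust
/// Hypothetical stand-in for the transaction metadata; not the aptos-vm type.
#[derive(Clone, Debug)]
pub struct TxnData {
    pub sender: u64,
    pub sequence_number: u64,
    pub transaction_size: u64,
}

/// Hypothetical fee: a flat base plus a per-byte charge on the txn size.
pub fn charge(base: u64, per_byte: u64, txn_data: &TxnData) -> u64 {
    base + per_byte * txn_data.transaction_size
}

/// Approach in the PR: clone the data and zero the size (legacy behaviour).
pub fn charge_via_clone(base: u64, per_byte: u64, txn_data: &TxnData) -> u64 {
    let mut null_txn_data = txn_data.clone();
    null_txn_data.transaction_size = 0;
    charge(base, per_byte, &null_txn_data)
}

/// Reviewer's suggestion: pass the size to charge directly, no clone needed.
pub fn charge_with_size(base: u64, per_byte: u64, size: u64) -> u64 {
    base + per_byte * size
}
```

The clone keeps the change local to one call site, which matches the author's preference for a minimal footprint in code that is expected to be removed.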

aptos-move/aptos-vm/src/aptos_vm.rs (outdated, resolved)
@davidiw force-pushed the david-sponsored branch 2 times, most recently from 5264d3e to 369a4ad, December 5, 2023 08:11
aptos-move/aptos-vm/src/aptos_vm.rs (outdated, resolved)
aptos-move/aptos-vm/src/aptos_vm.rs (outdated, resolved)
})?;

let mut change_set = session.finish(change_set_configs)?;
if let Err(err) = self.charge_change_set(&mut change_set, gas_meter, txn_data) {
Contributor

Shall we add a flag to not record deposits on the new slots created (so no refund for these slots)? Makes me feel safer.

Contributor Author

Why do that? If we can give them, we should.

Contributor

So that we never create a bug where you can get a larger refund than you've paid. I know you defend against that by requiring more than two slots' worth upfront, but I'm not feeling safe: someone could accidentally remove the requirement.
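The risk described above can be made concrete with a toy ledger. This is a hypothetical sketch (none of these names are from aptos-vm): if charging is "as much as possible" but the full storage deposit is still recorded as refundable, a later refund can exceed what was actually collected; not recording deposits for these slots, as suggested, rules that out.

```rust
/// Toy ledger for storage deposits (hypothetical, for illustration only).
#[derive(Debug)]
pub struct Ledger {
    budget: u64,         // what the payer can still be charged
    pub paid: u64,       // what was actually collected
    pub refundable: u64, // what would be refunded if the slots were freed
}

impl Ledger {
    pub fn new(budget: u64) -> Self {
        Ledger { budget, paid: 0, refundable: 0 }
    }

    /// Charges as much of `deposit` as the budget allows. When
    /// `record_refund` is set, the full deposit is recorded as refundable
    /// even if only part of it was collected.
    pub fn charge_slot(&mut self, deposit: u64, record_refund: bool) {
        let charged = deposit.min(self.budget);
        self.budget -= charged;
        self.paid += charged;
        if record_refund {
            self.refundable += deposit;
        }
    }

    /// The invariant the reviewer wants to protect.
    pub fn refund_is_safe(&self) -> bool {
        self.refundable <= self.paid
    }
}
```

With a budget of 5 and a deposit of 10, recording the refund breaks the invariant, while skipping it keeps the ledger safe at the cost of never refunding those slots.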

Contributor (@igor-aptos), Dec 7, 2023

Thinking out loud: can we somehow make this more robust, either by:

  • making sure that the charge we are missing here is below what we already charged, or
  • refunding what we already charged, charging here expecting no failures, and then trying to charge back what we refunded?

Contributor

In the current form, it's also a bit odd that we check whether we have enough only for the storage deposit, but not for the other costs. They might not be high by default, but they could be quite high if, for example, the transaction sets a high gas price.

Either of the above options should account for those extra costs.

If we don't have enough funds for all the costs of account creation alone, we should probably discard the whole transaction instead of forcing account creation.
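The second option, combined with the suggestion to discard when creation alone is unaffordable, could look roughly like the following. It is a hypothetical sketch in abstract gas units, not the approach the PR adopted: refund what execution consumed, charge account creation (which must succeed), then charge back as much of the refund as the budget still allows.

```rust
#[derive(Debug, PartialEq)]
pub enum Decision {
    /// Keep the transaction; the value is the total amount charged.
    Keep(u64),
    /// Even the full budget cannot cover account creation.
    Discard,
}

/// `max_budget`: the most the payer can be charged.
/// `already_charged`: what the aborted execution consumed.
/// `creation_cost`: total cost of creating the account, all components included.
pub fn refund_then_recharge(max_budget: u64, already_charged: u64, creation_cost: u64) -> Decision {
    // 1. Refund: the whole budget becomes available again.
    let mut remaining = max_budget;
    // 2. Account creation must not fail; if it cannot be afforded, discard.
    if creation_cost > remaining {
        return Decision::Discard;
    }
    remaining -= creation_cost;
    // 3. Charge back as much of the refunded execution cost as still fits.
    let recharged = already_charged.min(remaining);
    Decision::Keep(creation_cost + recharged)
}
```

By construction the total charged never exceeds `max_budget`, and account creation is always paid in full before any execution cost is recovered.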

Contributor

I am wondering if we can at least capture these cases in tests, e.g., a test checking that the refund is less than what was paid? That way, if some checks are removed, we will see that something is off.

Contributor

To be clear, this is not just a test issue.
The cheapest way to create an account now is to have an account with 2x the storage slot fee and submit a fee payer txn.
In the idle state the difference is only about 700 octas, but if the network is under load, you can freely submit a txn with a large gas price, get priority, and get it executed without paying the high gas price.

If nothing else, we should check the balance for 2 * storage + 10 * gas_price, and have a test checking that it's enough.
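The suggested rule of thumb, written as a hypothetical helper: all amounts are in octas, and `10 * gas_price` is read here as the cost of 10 gas units at the transaction's gas unit price (an interpretation; the thread does not spell out the units). Checked arithmetic guards against overflow from an adversarially large gas price.

```rust
/// Minimum budget in octas per the reviewer's suggestion:
/// 2 * storage_slot_fee + 10 * gas_unit_price. Returns None on overflow.
pub fn min_required_octas(storage_slot_fee: u64, gas_unit_price: u64) -> Option<u64> {
    let storage = storage_slot_fee.checked_mul(2)?;
    let execution = gas_unit_price.checked_mul(10)?;
    storage.checked_add(execution)
}

/// Hypothetical validation check; an overflowing requirement counts as "not enough".
pub fn can_auto_create(available_octas: u64, storage_slot_fee: u64, gas_unit_price: u64) -> bool {
    min_required_octas(storage_slot_fee, gas_unit_price)
        .map_or(false, |required| available_octas >= required)
}
```

Because the execution term scales with the gas unit price, a sender who bids a high price for priority also has to commit proportionally more up front.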

Contributor Author

I assume you landed on the idea that the validation function should check (2 * storage + 10 * gas_price), which I think is much, much more expensive than creating an account. I think that is acceptable; we should be willing to charge more for aborted transactions.

Contributor Author

@georgemitenkov, we cannot really test this because we don't have the ability to delete an account. I might propose that we only allow those that upgrade from v1 to v2 to actually delete, in which case there's conservation here, because we'll need to create new storage anyway.

&& self
.get_features()
.is_enabled(FeatureFlag::SPONSORED_AUTOMATIC_ACCOUNT_CREATION)
&& max_gas_amount < 2 * storage_slot_cost
Contributor

You are comparing gas units with octas; see how that's dangerous?
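One way to turn that class of mistake into a compile error is to give the two units distinct newtypes. A hypothetical sketch (these types are illustrative, not the aptos-gas types): the only route from gas units to octas is multiplying by a gas unit price, so a bare `max_gas_amount < 2 * storage_slot_cost` between mismatched units no longer type-checks.

```rust
use std::ops::Mul;

/// Hypothetical unit newtypes; the derives allow comparisons within one unit only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct GasUnits(pub u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Octas(pub u64);

/// Price in octas per gas unit.
#[derive(Clone, Copy, Debug)]
pub struct GasUnitPrice(pub u64);

impl Mul<GasUnitPrice> for GasUnits {
    type Output = Octas;
    fn mul(self, price: GasUnitPrice) -> Octas {
        Octas(self.0.saturating_mul(price.0))
    }
}

/// Does the most this txn can pay cover two storage slots? Both sides are Octas.
pub fn covers_two_slots(max_gas_amount: GasUnits, price: GasUnitPrice, storage_slot_cost: Octas) -> bool {
    max_gas_amount * price >= Octas(storage_slot_cost.0.saturating_mul(2))
}
```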

.sign_fee_payer();

let output = h.run_raw(transaction);
// ECOIN_STORE_NOT_PUBLISHED
Contributor

When we run a transaction, we split it into the prologue, user txn, and epilogue, right? The error you are checking here comes from the prologue, I believe, so you are not testing the user txn execution path. Adding a test that divides by zero would let us test the behaviour when the failure happens inside the user txn.
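The distinction the reviewer draws can be shown with a deliberately simplified model (hypothetical names; a real VM has many more statuses): a failure in the prologue discards the transaction, while a failure in the user payload, such as a division by zero, leaves it kept but aborted, which is the path the fix has to handle.

```rust
#[derive(Debug, PartialEq)]
pub enum Status {
    /// Rejected before execution; the transaction is not committed.
    Discard,
    /// Committed and charged, but the payload's effects are dropped.
    KeepAborted,
    KeepSuccess,
}

/// Stand-in for a user payload that divides two numbers.
fn user_payload(a: u64, b: u64) -> Result<u64, &'static str> {
    a.checked_div(b).ok_or("ARITHMETIC_ERROR")
}

/// `prologue_ok` models prologue checks, e.g. the sender's coin store being published.
pub fn execute(prologue_ok: bool, a: u64, b: u64) -> Status {
    if !prologue_ok {
        return Status::Discard;
    }
    match user_payload(a, b) {
        Ok(_) => Status::KeepSuccess,
        Err(_) => Status::KeepAborted,
    }
}
```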

.max_gas_amount(100_010) // This is the minimum to execute this transaction
.gas_unit_price(1)
.sign_fee_payer();
let status = h.run_raw(transaction);
Contributor

nit: shouldn't status be called output here, and in some other places as well?

@davidiw requested a review from 0xmaayan as a code owner December 12, 2023 18:37
@davidiw force-pushed the david-sponsored branch 2 times, most recently from 1132a43 to 361fa80, December 12, 2023 19:08
@davidiw added the CICD:run-e2e-tests label Dec 12, 2023 (when this label is present, GitHub Actions will run all land-blocking e2e tests from the PR)


.gas_unit_price(1)
.sign_fee_payer();
let result = h.run_raw(transaction);
assert_eq!(result.gas_used(), 100_011);
Contributor

nit: if we trigger out of gas, it would be nice to assert the returned status as well.

Contributor Author

These run out of gas in the epilogue, so we don't get those errors; only the logger would.
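The behaviour the author describes, where charging in the epilogue consumes whatever gas is left and the failure surfaces only in logs, can be sketched with a toy meter. This is hypothetical code modelled on the PR description's "charge as much as possible" and "record the error", not the aptos-vm gas meter.

```rust
/// Toy gas meter (hypothetical, not the aptos-vm meter).
pub struct Meter {
    max: u64,
    remaining: u64,
}

impl Meter {
    pub fn new(max: u64) -> Self {
        Meter { max, remaining: max }
    }

    /// Charges `amount`; if it does not fit, consumes everything left and errs.
    pub fn charge(&mut self, amount: u64) -> Result<(), &'static str> {
        if amount > self.remaining {
            self.remaining = 0;
            return Err("OUT_OF_GAS");
        }
        self.remaining -= amount;
        Ok(())
    }

    pub fn gas_used(&self) -> u64 {
        self.max - self.remaining
    }
}

/// Epilogue-path charge: an error is logged, not turned into a txn status.
pub fn charge_change_set_in_epilogue(meter: &mut Meter, cost: u64, log: &mut Vec<String>) {
    if let Err(err) = meter.charge(cost) {
        log.push(format!("epilogue charge failed: {err}"));
    }
}
```

In this model a test can only observe the failure indirectly, through gas used reaching the maximum, which is consistent with the tests asserting on gas_used() rather than on a returned status.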

let bob_start = h.read_aptos_balance(bob.address());

// will trigger a failed execution
let data =
Contributor

nit: can you copy over the comment saying that this does a divide by zero? So we don't forget what these bytes are :)

&TransactionStatus::Discard(StatusCode::MAX_GAS_UNITS_BELOW_MIN_TRANSACTION_GAS_UNITS)
));

let alice_after =
Contributor

Should we explicitly assert that accounts exist or do not exist? I have seen there are checks on the sequence number, which is sort of an existence check, but I think it would be nice to see that in the test.

Contributor Author

From a framework perspective, the sequence number is pretty explicit.

};
use move_core_types::{move_resource::MoveStructType, vm_status::StatusCode};

// Fee payer has several modes and requires several tests to validate:
Contributor

Thank you!

// * Account exists and transaction executes successfully
// * Account exists and transaction aborts but is kept
// * Account doesn't exist (seq num 0) and transaction executes successfully
// * Account doesn't exist (seq num 0), transaction aborts due move abort, and account is created
Contributor

nit: due --> due to

@davidiw (Contributor Author) commented Dec 13, 2023

Thanks everyone for the thorough reviews.

@davidiw enabled auto-merge (rebase) December 13, 2023 16:15
* Suppose an account is created during a sponsored transaction and the txn aborts.
* The transaction is now kept but aborted.
* The state associated with account creation is wiped, so the account is no longer created.
* During the epilogue, we then hit an invariant violation, because the sequence number for the account cannot be incremented.

This ensures that we create an account even on transaction failure in the epilogue:

* during validation, sponsored transactions with sequence number 0 must
  have sufficient gas to create an account
* if the transaction aborts, call create account and charge gas appropriately


Contributor

✅ Forge suite compat success on aptos-node-v1.7.3 ==> b003b9ff50b3f1e03735784648db87d63336541c

Compatibility test results for aptos-node-v1.7.3 ==> b003b9ff50b3f1e03735784648db87d63336541c (PR)
1. Check liveness of validators at old version: aptos-node-v1.7.3
compatibility::simple-validator-upgrade::liveness-check : committed: 4855 txn/s, latency: 6670 ms, (p50: 6800 ms, p90: 9600 ms, p99: 13800 ms), latency samples: 179660
2. Upgrading first Validator to new version: b003b9ff50b3f1e03735784648db87d63336541c
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 1851 txn/s, latency: 15694 ms, (p50: 18600 ms, p90: 22000 ms, p99: 22300 ms), latency samples: 92560
3. Upgrading rest of first batch to new version: b003b9ff50b3f1e03735784648db87d63336541c
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 1789 txn/s, latency: 16038 ms, (p50: 19100 ms, p90: 22300 ms, p99: 22600 ms), latency samples: 93040
4. upgrading second batch to new version: b003b9ff50b3f1e03735784648db87d63336541c
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 3212 txn/s, latency: 9169 ms, (p50: 9700 ms, p90: 13600 ms, p99: 15300 ms), latency samples: 144540
5. check swarm health
Compatibility test for aptos-node-v1.7.3 ==> b003b9ff50b3f1e03735784648db87d63336541c passed
Test Ok

Contributor

✅ Forge suite realistic_env_max_load success on b003b9ff50b3f1e03735784648db87d63336541c

two traffics test: inner traffic : committed: 8488 txn/s, submitted: 8489 txn/s, expired: 1 txn/s, latency: 4617 ms, (p50: 4200 ms, p90: 5400 ms, p99: 12000 ms), latency samples: 3658540
two traffics test : committed: 100 txn/s, latency: 2505 ms, (p50: 2200 ms, p90: 2600 ms, p99: 13800 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.272, avg: 0.209", "QsPosToProposal: max: 0.152, avg: 0.136", "ConsensusProposalToOrdered: max: 0.578, avg: 0.546", "ConsensusOrderedToCommit: max: 0.500, avg: 0.477", "ConsensusProposalToCommit: max: 1.046, avg: 1.024"]
Max round gap was 1 [limit 4] at version 1744375. Max no progress secs was 9.167174 [limit 10] at version 1744375.
Test Ok

@davidiw merged commit 22ae522 into main Dec 13, 2023
46 of 48 checks passed
@davidiw deleted the david-sponsored branch December 13, 2023 17:00
Contributor

❌ Forge suite framework_upgrade failure on aptos-node-v1.7.3 ==> b003b9ff50b3f1e03735784648db87d63336541c

Compatibility test results for aptos-node-v1.7.3 ==> b003b9ff50b3f1e03735784648db87d63336541c (PR)
Upgrade the nodes to version: b003b9ff50b3f1e03735784648db87d63336541c
Test Failed: API error: Unknown error error sending request for url (http://aptos-node-3-validator.forge-framework-upgrade-pr-11076.svc:8080/v1/estimate_gas_price): error trying to connect: dns error: failed to lookup address information: Name or service not known



Swarm logs can be found here: See fgi output for more information.
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ApiError: namespaces "forge-framework-upgrade-pr-11076" not found: NotFound (ErrorResponse { status: "Failure", message: "namespaces \"forge-framework-upgrade-pr-11076\" not found", reason: "NotFound", code: 404 })

Caused by:
    namespaces "forge-framework-upgrade-pr-11076" not found: NotFound

Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: <unknown>
  10: <unknown>
  11: <unknown>
  12: <unknown>
  13: <unknown>
  14: <unknown>
  15: __libc_start_main
  16: <unknown>', testsuite/forge/src/backend/k8s/swarm.rs:676:18
stack backtrace:
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Debugging output:

Labels
CICD:run-e2e-tests when this label is present github actions will run all land-blocking e2e tests from the PR

6 participants