Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[multisig v2] Enhanced multisig v2 with batch operations and the max pending txn limit #12241

Merged
merged 1 commit into from
Mar 6, 2024

Conversation

junkil-park
Copy link
Contributor

@junkil-park junkil-park commented Feb 26, 2024

Description

  • introduced and enforced MAX_PENDING_TRANSACTIONS
  • added entry functions for batch voting
  • added entry functions for batch executing rejected transactions

This PR resolves #8411.

Test Plan

aptos move test

Copy link

trunk-io bot commented Feb 26, 2024

⏱️ 5h 49m total CI duration on this PR
Job Cumulative Duration Recent Runs
rust-unit-tests 1h 31m 🟩🟩🟩
windows-build 1h 11m 🟩🟩🟩🟩
rust-move-tests 54m 🟩🟩🟩
rust-move-unit-coverage 53m 🟩🟩🟩
run-tests-main-branch 24m 🟥🟥🟥🟩
rust-lints 23m 🟩🟩🟩
check 13m 🟩🟩🟩
check-dynamic-deps 8m 🟩🟩🟩🟩
general-lints 8m 🟩🟩🟩
semgrep/ci 2m 🟩🟩🟩🟩🟩
file_change_determinator 47s 🟩🟩🟩🟩
file_change_determinator 41s 🟩🟩🟩🟩
permission-check 18s 🟩🟩🟩🟩
permission-check 13s 🟩🟩🟩🟩
permission-check 13s 🟩🟩🟩🟩
permission-check 13s 🟩🟩🟩🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job Duration vs 7d avg Delta
windows-build 29m 20m +44%

settingsfeedbackdocs ⋅ learn more about trunk.io

Copy link

codecov bot commented Feb 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.9%. Comparing base (53c85b3) to head (a8bff20).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##             main   #12241     +/-   ##
=========================================
- Coverage    63.9%    63.9%   -0.1%     
=========================================
  Files         799      801      +2     
  Lines      177623   177641     +18     
=========================================
+ Hits       113628   113629      +1     
- Misses      63995    64012     +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@alnoki alnoki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the underflow comment, I think this solves the DoS problem. Can you confirm this is the flow that happens in practice for an m-of-n multisig?

  1. Attacker compromises a signer, submits 10 bogus txns to the queue
  2. m signers call vote_all_transactions() with approved = false
  3. Evictor calls create_transaction_with_eviction() with payload of swap_owners_and_update_signatures_required()
  4. m - 1 signers approve the new payload
  5. Evictor executes the payload
  6. Evictor invokes execute_rejected_transactions()
  7. Multisig has empty queue and honest signatory set

#[view]
public fun available_transaction_queue_capacity(multisig_account: address): u64 acquires MultisigAccount {
let multisig_account_resource = borrow_global_mut<MultisigAccount>(multisig_account);
let num_pending_transactions = multisig_account_resource.next_sequence_number - multisig_account_resource.last_executed_sequence_number - 1;

This comment was marked as resolved.

Copy link
Contributor Author

@junkil-park junkil-park Feb 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_sequence_number is initially 1. So, there should be no underflow:

@junkil-park
Copy link
Contributor Author

junkil-park commented Feb 26, 2024

Aside from the underflow comment, I think this solves the DoS problem. Can you confirm this is the flow that happens in practice for an m-of-n multisig?

  1. Attacker compromises a signer, submits 10 bogus txns to the queue
  2. m signers call vote_all_transactions() with approved = false
  3. Evictor calls create_transaction_with_eviction() with payload of swap_owners_and_update_signatures_required()
  4. m - 1 signers approve the new payload
  5. Evictor executes the payload
  6. Evictor invokes execute_rejected_transactions()
  7. Multisig has empty queue and honest signatory set

@alnoki , in the step list above ^^, the order of the step 5 and the step 6 should be swapped because the payload would be at the end of the queue. Other than that, the flow looks right!

@junkil-park junkil-park force-pushed the jpark/multisig-account-dos-mitigation branch from 775cbad to 414fad4 Compare February 27, 2024 10:30
@junkil-park junkil-park marked this pull request as ready for review February 27, 2024 10:30
@junkil-park junkil-park force-pushed the jpark/multisig-account-dos-mitigation branch from 414fad4 to af48cbe Compare February 27, 2024 10:31
@junkil-park junkil-park requested a review from alnoki February 27, 2024 10:31
@junkil-park junkil-park changed the title Implemented a DoS mitigation for multisig accounts [multisig v2] Implemented a DoS mitigation for multisig accounts Feb 27, 2024
@alnoki
Copy link
Contributor

alnoki commented Feb 27, 2024

Aside from the underflow comment, I think this solves the DoS problem. Can you confirm this is the flow that happens in practice for an m-of-n multisig?

  1. Attacker compromises a signer, submits 10 bogus txns to the queue
  2. m signers call vote_all_transactions() with approved = false
  3. Evictor calls create_transaction_with_eviction() with payload of swap_owners_and_update_signatures_required()
  4. m - 1 signers approve the new payload
  5. Evictor executes the payload
  6. Evictor invokes execute_rejected_transactions()
  7. Multisig has empty queue and honest signatory set

@alnoki , in the step list above ^^, the order of the step 5 and the step 6 should be swapped because the payload would be at the end of the queue. Other than that, the flow looks right!

@junkil-park yes, thank you for the clarification.

Noting that in between invocation of execute_rejected_transactions() and execution of the swap_owners_and_update_signatures_required() step the attacker could fill up the queue again with more bogus txns, even though they are destined to be removed

But seeing as the new functionality nevertheless deters flooding behavior, regardless of how many "cycles" of queue packing there are, this flow should solve the DoS problem and I approve the changes

Copy link
Contributor

@movekevin movekevin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions:

  1. Add support for the new bulk operations in the CLI
  2. Remove the _with_eviction version. This is adding a lot of entry functions that are not useful for the common cases. In case where there's a lot of txs to be rejected and a new one to be inserted at the same time, a script can be used with the CLI

@alnoki
Copy link
Contributor

alnoki commented Feb 29, 2024

2. Remove the _with_eviction version. This is adding a lot of entry functions that are not useful for the common cases. In case where there's a lot of txs to be rejected and a new one to be inserted at the same time, a script can be used with the CLI

@movekevin the _with_eviction version is required to mitigate the DoS vector cleanly, without it an attacker could spin up a bot that inserts a bogus txn to clog the queue anytime another one is rejected, blocking the honest signatories from enqueueing the transaction to change the signatory manifest

cc @junkil-park

@junkil-park
Copy link
Contributor Author

In case where there's a lot of txs to be rejected and a new one to be inserted at the same time, a script can be used with the CLI
Just to clarify something, the _with_eviction version would normally reject one txn rather than many txns (@movekevin ).

@movekevin the _with_eviction version is required to mitigate the DoS vector cleanly, without it an attacker could spin up a bot that inserts a bogus txn to clog the queue anytime another one is rejected, blocking the honest signatories from enqueueing the transaction to change the signatory manifest

@alnoki , I think Kevin suggests using a script in this case while simplifying the Multisig module API.

@junkil-park junkil-park force-pushed the jpark/multisig-account-dos-mitigation branch from af48cbe to 30a4349 Compare March 4, 2024 14:41
@alnoki
Copy link
Contributor

alnoki commented Mar 4, 2024

In case where there's a lot of txs to be rejected and a new one to be inserted at the same time, a script can be used with the CLI
Just to clarify something, the _with_eviction version would normally reject one txn rather than many txns (@movekevin ).

@movekevin the _with_eviction version is required to mitigate the DoS vector cleanly, without it an attacker could spin up a bot that inserts a bogus txn to clog the queue anytime another one is rejected, blocking the honest signatories from enqueueing the transaction to change the signatory manifest

@alnoki , I think Kevin suggests using a script in this case while simplifying the Multisig module API.

@junkil-park scripts can offer the same functionality, but they introduce major UX overhead since scripts can essentially only be submitted from the CLI at present. This contrasts with standalone public entry functions which are easily accessible in a variety of formats, most notably through MSafe's UI

@junkil-park junkil-park force-pushed the jpark/multisig-account-dos-mitigation branch from 30a4349 to 7fccf86 Compare March 5, 2024 01:05
@junkil-park junkil-park changed the title [multisig v2] Implemented a DoS mitigation for multisig accounts [multisig v2] Enhanced multisig v2 with batch operations and the max pending txn limit Mar 5, 2024
- introduced and enforced `MAX_PENDING_TRANSACTIONS`
- added entry functions for batch voting
- added entry functions for batch executing rejected transactions
@junkil-park junkil-park force-pushed the jpark/multisig-account-dos-mitigation branch from 7fccf86 to a8bff20 Compare March 6, 2024 01:33
@junkil-park junkil-park enabled auto-merge (squash) March 6, 2024 01:34

This comment has been minimized.

This comment has been minimized.

Copy link
Contributor

github-actions bot commented Mar 6, 2024

✅ Forge suite compat success on aptos-node-v1.9.5 ==> a8bff2020b7c573c784a67f11438ddabc2191d9d

Compatibility test results for aptos-node-v1.9.5 ==> a8bff2020b7c573c784a67f11438ddabc2191d9d (PR)
1. Check liveness of validators at old version: aptos-node-v1.9.5
compatibility::simple-validator-upgrade::liveness-check : committed: 6953 txn/s, latency: 4770 ms, (p50: 4800 ms, p90: 7500 ms, p99: 8100 ms), latency samples: 243380
2. Upgrading first Validator to new version: a8bff2020b7c573c784a67f11438ddabc2191d9d
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 707 txn/s, latency: 34253 ms, (p50: 36700 ms, p90: 53400 ms, p99: 55900 ms), latency samples: 58700
3. Upgrading rest of first batch to new version: a8bff2020b7c573c784a67f11438ddabc2191d9d
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 162 txn/s, submitted: 507 txn/s, expired: 344 txn/s, latency: 47004 ms, (p50: 43300 ms, p90: 55600 ms, p99: 76800 ms), latency samples: 12864
4. upgrading second batch to new version: a8bff2020b7c573c784a67f11438ddabc2191d9d
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 2611 txn/s, latency: 11620 ms, (p50: 11100 ms, p90: 17200 ms, p99: 18100 ms), latency samples: 122740
5. check swarm health
Compatibility test for aptos-node-v1.9.5 ==> a8bff2020b7c573c784a67f11438ddabc2191d9d passed
Test Ok

Copy link
Contributor

github-actions bot commented Mar 6, 2024

✅ Forge suite realistic_env_max_load success on a8bff2020b7c573c784a67f11438ddabc2191d9d

two traffics test: inner traffic : committed: 7866 txn/s, latency: 4986 ms, (p50: 4800 ms, p90: 5700 ms, p99: 9600 ms), latency samples: 3398460
two traffics test : committed: 100 txn/s, latency: 1826 ms, (p50: 1800 ms, p90: 2100 ms, p99: 2700 ms), latency samples: 1760
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.226, avg: 0.203", "QsPosToProposal: max: 0.314, avg: 0.278", "ConsensusProposalToOrdered: max: 0.465, avg: 0.428", "ConsensusOrderedToCommit: max: 0.306, avg: 0.299", "ConsensusProposalToCommit: max: 0.736, avg: 0.727"]
Max round gap was 1 [limit 4] at version 1668201. Max no progress secs was 4.352516 [limit 15] at version 1668201.
Test Ok

@junkil-park junkil-park merged commit 20a162a into main Mar 6, 2024
82 of 86 checks passed
@junkil-park junkil-park deleted the jpark/multisig-account-dos-mitigation branch March 6, 2024 02:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug][Aptos Framework][API][Transaction][VM] Multisig v2 FIFO invariant facilitates DoS attacks
4 participants