[Staking] `check_payees` try-state check failing in Westend #3245

gpestana · 2024-02-07T21:46:41Z

The `check_payees` try-state check in Staking is failing in Westend. Figure out the reason and fix it.

[2024-02-07T16:13:20Z ERROR runtime::frame-support] ❌ "Staking" try_state checks failed: Other("number of entries in payee storage items does not match the number of bonded ledgers")

Example CI job error: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/5142158#L2515

Todo before closing:

- Re-set the required tag in the `check-runtime-migration-westend` CI job.


gpestana · 2024-02-07T22:22:29Z

It seems that a staking ledger has been removed without clearing up its `Bonded` and `Payee` entries.

[2024-02-07T22:18:11Z INFO  runtime::staking] [19451704] 💸  count Ledger 72560, count Payee: 72561, count Bonded: 72561
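The invariant that the try-state check enforces can be illustrated with a small model. This is a sketch under stated assumptions, not pallet-staking's code: the `StakingStorage` struct, `check_payees` and the helper `controllers_without_ledger` are hypothetical, and plain `HashMap`s stand in for the FRAME storage maps `Bonded`, `Ledger` and `Payee`. It shows why one removed ledger whose `Bonded` and `Payee` entries linger produces exactly the off-by-one counts in the log.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified model of the three storage maps compared by the
/// payee check. pallet-staking uses FRAME storage maps, not `HashMap`s.
pub struct StakingStorage {
    /// `Bonded`: stash -> controller.
    pub bonded: HashMap<u32, u32>,
    /// `Ledger`: controller -> stash of that ledger (ledger body omitted).
    pub ledger: HashMap<u32, u32>,
    /// `Payee`: stash -> reward destination (simplified to an account id).
    pub payee: HashMap<u32, u32>,
}

/// Sketch of the invariant: every ledger has exactly one `Bonded` and one
/// `Payee` entry, so the three maps must have the same number of entries.
pub fn check_payees(s: &StakingStorage) -> Result<(), String> {
    let (l, p, b) = (s.ledger.len(), s.payee.len(), s.bonded.len());
    if l == p && l == b {
        Ok(())
    } else {
        Err(format!(
            "number of entries in payee storage items does not match the number of \
             bonded ledgers (ledger: {l}, payee: {p}, bonded: {b})"
        ))
    }
}

/// Controllers referenced by `Bonded` that have no `Ledger` entry, i.e. the
/// kind of lingering entry reported in the logs of this issue.
pub fn controllers_without_ledger(s: &StakingStorage) -> Vec<u32> {
    let mut out: Vec<u32> = s
        .bonded
        .values()
        .filter(|ctrl| !s.ledger.contains_key(*ctrl))
        .copied()
        .collect();
    out.sort();
    out
}
```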

The (old) controller account that is faulty is 5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc

[2024-02-07T22:36:17Z INFO  runtime::staking] [19451704] 💸  controller that does not have a bonded entry: 2228ce54942b2da458b212dc8cb348f59752c28443538ea92324f9b890352611 (5CqVcAhU...)

Timeline:

- 2024-02-07 03:15:36 (+UTC): batch extrinsic with `bond` + `nominate`
- 2024-02-07 03:17:24 (+UTC): register for fast-unstake
  - 2024-02-07 03:17:24 (+UTC): unbonding, chilling and withdrawing from fast-unstake were successful
- 2024-02-07 03:17:42 (+UTC): fast-unstake enactment

Notes

The fast unstake called into `kill_stash(stash)`. Although the `FastUnstake` event reports `success: Ok`, it seems that it did not clean up the corresponding `Bonded<T>`, `Payee<T>` and `SlashingSpans<T>` entries for that stash.

```
// for `5HHaaUvCwAb16KkKy7cpMnuzUgLWH94gEvMVXugc69ZEDfkj` stash

staking.slashingSpans: Option<PalletStakingSlashingSlashingSpans>
{
  spanIndex: 2
  lastStart: 5,639
  lastNonzeroSlash: 5,638
  prior: [
    5
  ]
}

staking.bonded: Option<AccountId32>
5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc
```

gpestana · 2024-02-08T01:19:13Z

Solution

Reaping the stash does not work: `reap_stash` fails with `staking.NotController` since the staking ledger no longer exists in storage. This will require a small migration, which I can work on.

- sudo call to remove the entries from storage
- ~~migration to remove the lingering Bonded and Payee entries for stash 5HHaaUvCwAb16KkKy7cpMnuzUgLWH94gEvMVXugc69ZEDfkj~~

Done. The solution was to use `set_storage` to write `staking.ledger(5CqVcAhUzKbMMwrZiJDSAwXkmoYLpaYBXkobUAp3biVAQoXc)` with a ledger where `total = 0`, and then call `reap_stash` to clean up all the storage items of this ledger.
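Why writing a zero-total ledger unblocks `reap_stash` can be shown with a toy model. This is a minimal sketch, assuming a simplified reap rule (the ledger must be reachable through the bonded controller and hold less than the existential deposit); the real extrinsic checks more conditions. `Storage`, `Ledger`, `ReapError` and the method body are hypothetical; only the error names mirror the ones mentioned in this thread and pallet-staking.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified staking ledger (only what the sketch needs).
#[derive(Clone, Debug, PartialEq)]
pub struct Ledger {
    pub stash: u32,
    pub total: u128,
}

#[derive(Debug, PartialEq)]
pub enum ReapError {
    /// No ledger could be found through the bonded controller.
    NotController,
    /// The ledger still holds at least the existential deposit.
    FundedTarget,
}

#[derive(Default)]
pub struct Storage {
    pub bonded: HashMap<u32, u32>,         // stash -> controller
    pub ledger: HashMap<u32, Ledger>,      // controller -> ledger
    pub payee: HashMap<u32, u32>,          // stash -> payee
    pub slashing_spans: HashMap<u32, u32>, // stash -> span index (simplified)
}

impl Storage {
    /// Simplified `reap_stash`: the ledger must be reachable via the bonded
    /// controller and hold less than the existential deposit; then every
    /// storage item of the stash is cleared (the clean-up `kill_stash` does).
    pub fn reap_stash(&mut self, stash: u32, existential_deposit: u128) -> Result<(), ReapError> {
        let controller = *self.bonded.get(&stash).ok_or(ReapError::NotController)?;
        let ledger = self.ledger.get(&controller).ok_or(ReapError::NotController)?;
        if ledger.total >= existential_deposit {
            return Err(ReapError::FundedTarget);
        }
        self.bonded.remove(&stash);
        self.ledger.remove(&controller);
        self.payee.remove(&stash);
        self.slashing_spans.remove(&stash);
        Ok(())
    }
}
```

In the bad state the lookup through the controller fails, so reaping errors out; once a ledger with `total = 0` is written under the controller key, the same call succeeds and clears every lingering entry.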

gpestana · 2024-02-23T12:47:57Z

The try-state check is failing again, now with 16 faulty accounts. A recent `deprecate_controller_batch` call seems to have been the culprit here. The affected stashes also seem to have bonded in the same timespan as the previously fixed stash.

Looking into this again; I will also check whether this may happen to other bonded stashes (including on Kusama and Polkadot).

The solution should be the same as described above, for all the affected stashes.

Currently, the staking logic does not prevent a controller from becoming a stash of *another* ledger (a regression introduced by [removing this check](https://github.com/paritytech/polkadot-sdk/pull/1484/files#diff-3aa6ceab5aa4e0ab2ed73a7245e0f5b42e0832d8ca5b1ed85d7b2a52fb196524L850)). Since the rest of the code assumes this never happens, bonding a ledger with a stash that is a controller of another ledger may lead to data inconsistencies and data loss in bonded ledgers. For a more detailed explanation of this issue, see https://hackmd.io/@gpestana/HJoBm2tqo/%2FTPdi28H7Qc2mNUqLSMn15w. In a nutshell, when fetching a ledger with a given controller, we may end up getting the wrong ledger, which can lead to unexpected ledger states.

This PR also ensures that `set_controller` does not lead to data inconsistencies in the staking ledger and bonded storage when the controller of a stash is a stash of *another* ledger, and it improves the staking `try-runtime` checks to catch potential storage issues preemptively.

In summary, there are two important cases here:

1. **"Sane" double bonded ledger**

When a controller of a ledger is a stash of *another* ledger. In this case, we have:

```
> Bonded(stash, controller)
(A, B)  // stash A with controller B
(B, C)  // B is also a stash of another ledger
(C, D)

> Ledger(controller)
Ledger(B) = L_a (stash = A)
Ledger(C) = L_b (stash = B)
Ledger(D) = L_c (stash = C)
```

In this case, the ledgers can be mutated and all operations are OK. However, we should not allow `set_controller` to be called if it would result in a "corrupt" double bonded ledger (see below).

2. **"Corrupt" double bonded ledger**

```
> Bonded(stash, controller)
(A, B)  // stash A with controller B
(B, B)
(C, D)
```

In this case, B is both a stash and a controller AND the state is corrupted, since B is responsible for 2 ledgers, which is not correct and will lead to inconsistent states. Thus, in this PR we prevent these ledgers from mutating (i.e. operations like bonding extra, etc.) until the ledger is brought back to a consistent state.

---

**Changes**:

- Checks if the stash is already a controller when calling `Call::bond` (fixes the regression introduced by [removing this check](https://github.com/paritytech/polkadot-sdk/pull/1484/files#diff-3aa6ceab5aa4e0ab2ed73a7245e0f5b42e0832d8ca5b1ed85d7b2a52fb196524L850));
- Ensures that all ledger fetches from storage go through the `StakingLedger` API;
- Ensures that, when fetching a ledger from storage using the `StakingLedger` API, an `Error::BadState` is returned if the ledger bonding is in a bad state. This prevents bad ledgers from mutating their state (e.g. `bond_extra`, `set_controller`, etc.) and avoids further data inconsistencies.
- Prevents stashes which are controllers of another ledger from calling `set_controller`, since that may lead to a bad state.
- Adds further try-state runtime checks that detect ledgers in a bad state based on their bonded metadata.

Related to #3245

---------

Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: kianenigma <[email protected]>
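The difference between the "sane" and "corrupt" double-bonded cases, and the restored `bond` precondition, can be sketched over a toy `Bonded` map. This is an illustration under assumptions, not the pallet's implementation: `Bonded` as a `HashMap<char, char>`, `corrupt_controllers` and `check_bond` are hypothetical, and the error strings only borrow the names of existing pallet-staking errors (`AlreadyBonded`, `AlreadyPaired`) for readability. The key observation is that `Ledger` is keyed by controller, so a controller claimed by two stashes can only hold one of their ledgers.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical simplified `Bonded` map: stash -> controller.
pub type Bonded = HashMap<char, char>;

/// Controllers claimed by more than one stash. Since `Ledger` is keyed by
/// controller, such a controller can hold only one of those ledgers: the
/// "corrupt" case. The "sane" case (a controller that is also the stash of
/// another ledger, with no shared controller) is not reported.
pub fn corrupt_controllers(bonded: &Bonded) -> Vec<char> {
    let mut seen = HashSet::new();
    let mut dup = HashSet::new();
    for ctrl in bonded.values() {
        if !seen.insert(*ctrl) {
            dup.insert(*ctrl);
        }
    }
    let mut out: Vec<char> = dup.into_iter().collect();
    out.sort();
    out
}

/// Sketch of the restored `bond` precondition: a new stash must be neither
/// already bonded nor already the controller of another ledger.
pub fn check_bond(bonded: &Bonded, stash: char) -> Result<(), &'static str> {
    if bonded.contains_key(&stash) {
        return Err("AlreadyBonded");
    }
    if bonded.values().any(|c| *c == stash) {
        return Err("AlreadyPaired");
    }
    Ok(())
}
```

With the sane map `(A, B), (B, C), (C, D)` no controller is shared, while `(A, B), (B, B), (C, D)` reports `B`; and trying to bond `D` as a new stash is rejected because `D` already controls C's ledger.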

gpestana · 2024-03-18T09:04:57Z

For the record and for future reference: the root issue here is that the current staking logic does not prevent controllers from becoming stashes of other ledgers. This may lead to an account being the stash of one ledger and the controller of another. The second-order issue is that `set_controller` does not expect controllers to be stashes of other ledgers, so calling `set_controller` on ledgers in this state may corrupt the ledger data and metadata. For more details on this issue and the backstop patch, refer to #3639.
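The second-order issue can be reproduced in a toy model where `Ledger` is keyed by controller and `set_controller` makes the stash its own controller. This is a hypothetical sketch, not pallet-staking's code: `Staking`, `Ledger`, both methods and the error strings are illustrative assumptions. Starting from the "sane" pair `(A, B), (B, C)`, the naive version moves B's ledger to key `B`, silently overwriting A's ledger and leaving the corrupt `(A, B), (B, B)` state; the guarded version refuses because `B` already controls another stash's ledger.

```rust
use std::collections::HashMap;

/// Hypothetical simplified ledger: only the fields needed to show the bug.
#[derive(Clone, Debug, PartialEq)]
pub struct Ledger {
    pub stash: char,
    pub total: u128,
}

#[derive(Default)]
pub struct Staking {
    pub bonded: HashMap<char, char>,   // stash -> controller
    pub ledger: HashMap<char, Ledger>, // controller -> ledger
}

impl Staking {
    /// Naive sketch of `set_controller` (the stash becomes its own controller):
    /// moves the stash's ledger to key `stash` without checking whether
    /// `stash` already controls *another* ledger.
    pub fn set_controller_naive(&mut self, stash: char) -> Result<(), &'static str> {
        let old = *self.bonded.get(&stash).ok_or("NotStash")?;
        let l = self.ledger.remove(&old).ok_or("NotController")?;
        self.ledger.insert(stash, l); // may overwrite another stash's ledger!
        self.bonded.insert(stash, stash);
        Ok(())
    }

    /// Guarded sketch: refuse if `stash` controls a ledger of a different stash.
    pub fn set_controller_guarded(&mut self, stash: char) -> Result<(), &'static str> {
        if let Some(other) = self.ledger.get(&stash) {
            if other.stash != stash {
                return Err("BadState");
            }
        }
        self.set_controller_naive(stash)
    }
}
```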

A patch release (v1.1.3) has been proposed for Kusama and Polkadot to prevent stashes from becoming controllers of other ledgers and backstop the corruption issue. The plan now is to recover the corrupted ledgers across all chains. Once that's done, we can close this issue.



This PR adds a new extrinsic `Call::restore_ledger`, gated by the `StakingAdmin` origin, that restores a corrupted staking ledger. This extrinsic will be used to recover ledgers that were affected by the issue discussed in #3245.

The extrinsic re-writes the storage items associated with the stash account provided as an input parameter. The data used to reset the ledger can either be i) fetched on-chain or ii) partially or fully set by the input parameters of the call.

In order to use on-chain data to restore the staking locks, we need a way to read the current lock in the balances pallet. This PR adds an `InspectLockableCurrency` trait and implements it in pallet-balances. An alternative would be to tightly couple staking with pallet-balances, but that's inelegant (an example of what it would look like is in [this branch](https://github.com/paritytech/polkadot-sdk/tree/gpestana/ledger-badstate-clean_tightly)).

More details on the types of corruption and the corresponding fixes: https://hackmd.io/DLb5jEYWSmmvqXC9ae4yRg?view#/

We verified that `Call::restore_ledger` fixes all currently corrupted ledgers in Polkadot and Kusama. You can verify it here: https://hackmd.io/v-XNrEoGRpe7APR-EZGhOA.

**Changes introduced**

- Adds the `Call::restore_ledger` extrinsic to recover a corrupted ledger;
- Adds the trait `frame_support::traits::currency::InspectLockableCurrency` to allow external pallets to read current locks given an account and lock ID;
- Implements `InspectLockableCurrency` in pallet-balances;
- Adds staking locks try-runtime checks (#3751).

**Todo**

- [x] benchmark `Call::restore_ledger`
- [x] thorough testing of all ledger recovery cases
- [x] consider adding the staking locks try-runtime checks to this PR (#3751)
- [x] simulate restoring all ledgers (https://hackmd.io/Dsa2tvhISNSs7zcqriTaxQ?view) in Polkadot and Kusama using chopsticks: https://hackmd.io/v-XNrEoGRpe7APR-EZGhOA

Related to #3245
Closes #3751

---------

Co-authored-by: command-bot <>
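The "fetched on-chain or set by the call parameters" logic can be sketched as follows. This is a minimal, hypothetical model: `InspectLock` is a stand-in for the idea behind `InspectLockableCurrency` (check the real trait's signature in `frame_support` before relying on it), `restore_ledger_sketch` and `RestoredLedger` are illustrative names, unlocking chunks are ignored, and defaulting the controller to the stash itself is an assumption. Each field uses the explicit parameter when given and otherwise falls back to on-chain data, with the total read from the staking lock.

```rust
/// Hypothetical stand-in for the idea behind `InspectLockableCurrency`:
/// lets staking read the current lock amount for (lock id, account).
pub trait InspectLock {
    fn balance_locked(&self, id: [u8; 8], who: u32) -> u128;
}

/// Lock identifier used for staking locks in this sketch.
pub const STAKING_ID: [u8; 8] = *b"staking ";

#[derive(Debug, PartialEq)]
pub struct RestoredLedger {
    pub stash: u32,
    pub controller: u32,
    pub total: u128,
}

/// Each field comes from the call parameters if provided, otherwise from
/// on-chain data: the stash as its own controller (an assumption of this
/// sketch) and the current staking lock as the ledger total.
pub fn restore_ledger_sketch<C: InspectLock>(
    currency: &C,
    stash: u32,
    maybe_controller: Option<u32>,
    maybe_total: Option<u128>,
) -> RestoredLedger {
    let controller = maybe_controller.unwrap_or(stash);
    let total = maybe_total.unwrap_or_else(|| currency.balance_locked(STAKING_ID, stash));
    RestoredLedger { stash, controller, total }
}
```

Reading the lock through a trait keeps staking decoupled from a concrete balances pallet, which is the design argument the PR makes against the tightly coupled alternative.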


This backport PR should bump `pallet-staking` from 30.0.1 to 30.0.2. Backports for 1.8: #3639. Relevant issues: #3245.
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: kianenigma <[email protected]>

This backport PR should bump `pallet-staking` from 28.0.0 to 28.0.1. Backports for 1.6: #3639. Relevant issues: #3245.
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: kianenigma <[email protected]>

This backport PR should bump `pallet-staking` from 27.0.0 to 27.0.1. Backports for 1.5: #3639. Relevant issues: #3245.
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: kianenigma <[email protected]>

This backport PR should bump `pallet-staking` from 26.0.1 to 26.0.2. Backports for 1.4: #3639. Relevant issues: #3245.
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: kianenigma <[email protected]>

gpestana added the T2-pallets and T10-tests labels Feb 7, 2024

gpestana self-assigned this Feb 7, 2024

gpestana mentioned this issue Feb 8, 2024

Disable try state payees check from pallet-staking temporarily #3259

Closed

gpestana mentioned this issue Mar 11, 2024

Staking ledger bonding fixes #3639

Merged

gpestana mentioned this issue Mar 14, 2024

Consider implementing a checked_mutate for all the mutations in the staking ledger related storage items #3697

Open

This was referenced Mar 18, 2024

Enable all try-state checks in staking #3728

Open

Extrinsic to restore corrupt staking ledgers #3706

Merged

gpestana added a commit that referenced this issue Mar 27, 2024

Staking ledger bonding fixes (#3639)

3703eab

Backport for 1.7: #3639 Relevant Issues: - #3245

gpestana mentioned this issue Mar 27, 2024

Staking ledger bonding fixes (#3639) #3848

Closed

gpestana added a commit that referenced this issue Mar 27, 2024

Backport staking pallet fix and recover call

07a6733

Backport for 1.7: - #3639 - #3706 Relevant Issues: - #3245

gpestana mentioned this issue Mar 27, 2024

Backport staking pallet fix and recover call #3869

Closed

gpestana mentioned this issue Mar 27, 2024

Release crates io v1.7.0 staking backports #3871

Merged

bkchr pushed a commit that referenced this issue Mar 28, 2024

Release crates io v1.7.0 staking backports (#3871)

297f545

Backports for 1.7: - #3639 - #3706 Relevant Issues: - #3245

gpestana mentioned this issue Apr 11, 2024

Release crates io v1.6.0 staking backport #4082

Closed

gpestana mentioned this issue Apr 11, 2024

Release crates io v1.5.0 staking backport #4084

Closed

gpestana mentioned this issue Apr 11, 2024

Release crates io v1.4.0 staking backport #4086

Closed
