Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faulty slashing when ex-validator is reintroduced into the validator set #1867

Closed
4 tasks
hendrikhofstadt opened this issue Jul 28, 2018 · 6 comments
Closed
4 tasks

Comments

@hendrikhofstadt
Copy link
Contributor

hendrikhofstadt commented Jul 28, 2018

Summary of Bug

When a validator is removed from the validator set because it was on the cliff and is later readded to the voting set the SigningInfo is not reset and the validator is slashed as if it was not signing in the period in which it was not in the validator set.

Example:

https://figment.network/cosmos/hubble/gaia-7003/validators/02C231DA2DFF636671D3789B5D651AED09B1B834

Our secondary validator was on the cliff in block 2943 and replaced by another validator with more voting power. As a result it was removed from the active validator-set.

On block 10004 the first slashing period ended and inactive validators (those who really missed blocks) were dropped from the validator-set. As a result our validator was readded to the validator-set.

However the SigningInfo was not reset so StartHeight and SignedBlocksCounter were still in the state in which they were in block <2943.

That later triggered the slashing code since we obviously did not sign blocks while we were not in the active validator-set but the code still thought we started signing on height 0 (StartHeight ).

That way the following condition was true even though:

if height > minHeight && signInfo.SignedBlocksCounter < k.MinSignedPerWindow(ctx) {
validator := k.validatorSet.ValidatorByPubKey(ctx, pubkey)

Solution

There are many ways to solve this of which some keep the number of missed blocks and others drop it.

The easiest would be resetting the SigningInfo but that would allow Byzantine validators to reset their SigningState by redelegating between multiple validators and cliffing themselves which we obviously not want.

Another way would be tracking the LastSignedHeight and adapting StartHeight (possible causing writes on every block).

Since this is only a fraction of possible solutions and this certainly requires some discussion in the SDK team the Certus.One team decided to not create a PR and rather start a discussion about what would be the best solution.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@rigelrozanski
Copy link
Contributor

rigelrozanski commented Jul 30, 2018

Thanks for this bug report! We should investigate soon. CC @cwgoes @alexanderbez

@rigelrozanski
Copy link
Contributor

Talking with @alexanderbez bout this now

I think that this is a general situation which applies to all validators who have left the validator set and are entering back into the validator set - not just the cliff validators.

We probably need to implement some a custom way of updating the SigningInfo if this situation occurs

@alexanderbez
Copy link
Contributor

The easiest would be resetting the SigningInfo but that would allow Byzantine validators to reset their SigningState by redelegating between multiple validators and cliffing themselves which we obviously not want.

I kind of get the gist of what you're saying, but do you want to elaborate on this a bit more?

Another way would be tracking the LastSignedHeight and adapting StartHeight (possible causing writes on every block).

Interesting idea, but indeed, as you pointed out, this would not be very efficient. Perhaps like @rigelrozanski we could have a special case of updating SigningInfo where if we know the validator was once in the set and we also know the last block that validator signed, we can use this information to update SigningInfo accordingly when said validator is added back into the set.

@alexanderbez alexanderbez changed the title Unjustified slashing when cliffed validator is readded to voting set Faulty slashing when ex-validator is reintroduced into the validator set Jul 30, 2018
@cwgoes
Copy link
Contributor

cwgoes commented Jul 30, 2018

I think this specific reported problem is only an issue when a full SignedBlocksWindow hasn't yet passed.

In the usual case, the validator has been signing for awhile and signInfo.SignedBlocksCounter >> MinSignedPerWindow - regardless of how they dip in and out of the bonded set, the counter will just track whether they've signed at least MinSignedPerWindow of the last SignedBlocksWindow blocks they should have signed, which may not be contiguous.

However, if a full SignedBlocksWindow hasn't yet passed and the validator is unbonded for awhile, when they come back in to the bonded set the counter will only reflect (at best) the number of blocks that they could have signed so far.

I think there are two ways to fix this that preserve the semantics we want:

  • Instead of counting the number of blocks signed, count the number of blocks missed (which will start at zero and count up) and check that the number of blocks missed is less than or equal to SignedBlocksWindow - MinSignedPerWindow, or initially set the counter to SignedBlocksWindow (assume the validator has signed some imaginary past blocks).
  • Track a second counter for the number of blocks a validator could have signed so far, and use that instead of height for the comparision (couldHaveSigned >= SignedBlocksWindow).

The latter requires extra store operations, so I think the former is preferable.

I think we also need to perform additional logic upon entering/leaving the bonded validator set later on after unrevoking, so that validators have at least SignedBlocksWindow blocks to update the counter - if they cycle in and out of the bonded set after unrevoking, that might not happen since the height >= minHeight comparision only tracks the block height. Instead, perhaps, we could reset the counter (and make sure the jail period is greater than SignedBlocksWindow blocks) along with tracking missed instead of signed.

The first thing we should do here (after deciding upon an approach) is update the spec.

@cwgoes
Copy link
Contributor

cwgoes commented Oct 5, 2018

I'm going to try to deal with this before Game of Stakes.

@cwgoes
Copy link
Contributor

cwgoes commented Oct 15, 2018

Latest approach outlined at #2480 (comment)

chillyvee pushed a commit to chillyvee/cosmos-sdk that referenced this issue Mar 1, 2024
* prepare v8 release (backport cosmos#1860) (cosmos#1867)

* prepare v8 release (cosmos#1860)

* chore: changelog

* chore: update changelog

* chore: add changelog for v7.0.3

* chore: add v7.1.0 to changelog

Co-authored-by: Yaru Wang <[email protected]>
(cherry picked from commit f0397fd)

# Conflicts:
#	CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: billy rennekamp <[email protected]>

* Added migrations for quicksilver stuck fund fix

* Migrations for quicksilver

* Check if balance of the refund addess is positive

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: billy rennekamp <[email protected]>
Co-authored-by: lg <[email protected]>
chillyvee pushed a commit to chillyvee/cosmos-sdk that referenced this issue Mar 1, 2024
* prepare v8 release (backport cosmos#1860) (cosmos#1867)

* prepare v8 release (cosmos#1860)

* chore: changelog

* chore: update changelog

* chore: add changelog for v7.0.3

* chore: add v7.1.0 to changelog

Co-authored-by: Yaru Wang <[email protected]>
(cherry picked from commit f0397fd)

# Conflicts:
#	CHANGELOG.md

* Update CHANGELOG.md

Co-authored-by: billy rennekamp <[email protected]>

* docs: update changelog

Signed-off-by: Yaru Wang <[email protected]>

* docs: update docs

Signed-off-by: Yaru Wang <[email protected]>

Signed-off-by: Yaru Wang <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: billy rennekamp <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants