Gaia-7004 halt stack trace #1920

Closed
zmanian opened this issue Aug 4, 2018 · 6 comments

@zmanian
Member

zmanian commented Aug 4, 2018

E[08-04|03:49:20.775] CONSENSUS FAILURE!!!						 module=consensus err="should not already be unbonded,  validator: {A7B3C7DA2964ED56C46142C8254B9249E17FD8FB PubKeyEd25519{11A7882002254A7995D7D5B94DFA2E6772AEA2C6A675B8DDF7E9FB8263D3245E} false 0 29/1 290/9 {Stone [do-not-modify] [do-not-modify] [do-not-modify]} 20127 0  0/1 0/1 0/1 0/1 0/1}"
stack="goroutine 1159 [running]:
runtime/debug.Stack(0xc4327dbb58, 0xd01500, 0xc4327a39e0)
	/snap/go/2130/src/runtime/debug/stack.go:24 +0xa7
	github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine.func1(0xc420110900)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:558 +0x57
panic(0xd01500, 0xc4327a39e0)
	/snap/go/2130/src/runtime/panic.go:502 +0x229
github.com/cosmos/cosmos-sdk/x/stake/keeper.Keeper.unbondValidator(0x105dbe0, 0xc42001cb90, 0xc4200fc540, 0x105dbe0, 0xc42001cb60, 0xfe78b0, 0xc4200fc540, 0x4, 0x10658e0, 0xc42b3caab0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/stake/keeper/validator.go:487 +0x797
github.com/cosmos/cosmos-sdk/x/stake/keeper.Keeper.UpdateBondedValidators(0x105dbe0, 0xc42001cb90, 0xc4200fc540, 0x105dbe0, 0xc42001cb60, 0xfe78b0, 0xc4200fc540, 0x4, 0x10658e0, 0xc42b3caab0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/stake/keeper/validator.go:384 +0x867
github.com/cosmos/cosmos-sdk/x/stake/keeper.Keeper.UpdateValidator(0x105dbe0, 0xc42001cb90, 0xc4200fc540, 0x105dbe0, 0xc42001cb60, 0xfe78b0, 0xc4200fc540, 0x4, 0x10658e0, 0xc42b3caab0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/stake/keeper/validator.go:238 +0x6db
github.com/cosmos/cosmos-sdk/x/stake/keeper.Keeper.Slash(0x105dbe0, 0xc42001cb90, 0xc4200fc540, 0x105dbe0, 0xc42001cb60, 0xfe78b0, 0xc4200fc540, 0x4, 0x10658e0, 0xc42b3caab0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/stake/keeper/slash.go:103 +0xccc
github.com/cosmos/cosmos-sdk/x/slashing.Keeper.handleValidatorSignature(0x105dbe0, 0xc42001cbb0, 0xc4200fc540, 0x106bf80, 0xc420080880, 0xa, 0x10658e0, 0xc42b3caab0, 0xc425e5dc00, 0x9, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/slashing/keeper.go:106 +0x7ea
github.com/cosmos/cosmos-sdk/x/slashing.BeginBlocker(0x10658e0, 0xc42b3caab0, 0xc425e5dc00, 0x9, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, 0x10ea0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/x/slashing/tick.go:28 +0x1d8
github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker(0xc42082c340, 0x10658e0, 0xc42b3caab0, 0xc425e5dc00, 0x9, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:131 +0xc3
github.com/cosmos/cosmos-sdk/cmd/gaia/app.(*GaiaApp).BeginBlocker-fm(0x10658e0, 0xc42b3caab0, 0xc425e5dc00, 0x9, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, 0x10ea0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/cmd/gaia/app/app.go:103 +0xa0
github.com/cosmos/cosmos-sdk/baseapp.(*BaseApp).BeginBlock(0xc4207e0000, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, 0x10ea0, 0x5b65223f, 0x0, 0xa7c, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/baseapp/baseapp.go:432 +0x1ef
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client.(*localClient).BeginBlockSync(0xc42008cd20, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, 0x10ea0, 0x5b65223f, 0x0, 0xa7c, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/abci/client/local_client.go:206 +0xab
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy.(*appConnConsensus).BeginBlockSync(0xc4208c1350, 0xc426ec51e0, 0x14, 0x20, 0xc4333eab50, 0x9, 0x10ea0, 0x5b65223f, 0x0, 0xa7c, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/proxy/app_conn.go:69 +0x78
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.execBlockOnProxyApp(0x10664a0, 0xc421a7eaa0, 0x106b320, 0xc4208c1350, 0xc426b4c680, 0xc424556de0, 0x106f540, 0xc42000e080, 0x1, 0xc424766bc0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:190 +0x53b
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state.(*BlockExecutor).ApplyBlock(0xc421a18060, 0xc421c924d0, 0x9, 0x10e9f, 0xa7c, 0xc431912e40, 0x14, 0x20, 0x1, 0xc424766bc0, ...)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/state/execution.go:76 +0x12f
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).finalizeCommit(0xc420110900, 0x10ea0)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1290 +0xba6
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryFinalizeCommit(0xc420110900, 0x10ea0)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1221 +0x468
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit.func1(0xc420110900, 0x0, 0x10ea0)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1169 +0x98
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).enterCommit(0xc420110900, 0x10ea0, 0x0)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1198 +0x802
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).addVote(0xc420110900, 0xc429435b80, 0xc427fd6060, 0x28, 0x17d3360, 0xc4369ddae0, 0x43e819)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1601 +0xbb4
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).tryAddVote(0xc420110900, 0xc429435b80, 0xc427fd6060, 0x28, 0xfa, 0xf2)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:1459 +0x56
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).handleMsg(0xc420110900, 0xd3e500, 0xc427dcf318, 0xc427fd6060, 0x28)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:628 +0x64f
github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).receiveRoutine(0xc420110900, 0x0)
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:580 +0x6d2
created by github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus.(*ConsensusState).OnStart
	/home/zaki/go/src/github.com/cosmos/cosmos-sdk/vendor/github.com/tendermint/tendermint/consensus/state.go:295 +0x140
"
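For readers unfamiliar with this log shape: the top frames of the trace (a `receiveRoutine.func1` closure calling `debug.Stack`, followed by `panic`) suggest that the panic raised in the staking keeper was caught by a deferred recover handler and logged together with the stack. Below is a minimal sketch of that general Go pattern; `runGuarded` and the report format are hypothetical names for illustration and are not Tendermint's actual code.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// runGuarded is a hypothetical helper: it runs step and, if step panics,
// turns the panic value and the current stack into a report string
// instead of letting the process crash.
func runGuarded(step func()) (report string) {
	defer func() {
		if r := recover(); r != nil {
			// debug.Stack returns the stack of the panicking goroutine as []byte.
			report = fmt.Sprintf("CONSENSUS FAILURE!!! err=%q stack=%q",
				fmt.Sprint(r), debug.Stack())
		}
	}()
	step()
	return ""
}

func main() {
	// A step that panics the way the keeper guard did in the trace above.
	report := runGuarded(func() { panic("should not already be unbonded") })
	fmt.Println(report)
}
```

In this sketch the caller decides what to do with the report; a real node would presumably stop processing blocks at this point, which matches the halt described in the issue title.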
@gamarin2
Contributor

gamarin2 commented Aug 4, 2018

@rigelrozanski
Contributor

rigelrozanski commented Aug 7, 2018

I think I figured out the situation which might cause this - it should be resolved by https://github.com/cosmos/cosmos-sdk/pull/1858/files

Consider this situation:

  • the cliff validator increases power from "5" to "10", but the cliff-power record is not properly updated
  • a new validator now has "6" bonded power, effectively becoming the cliff validator because the cliff power was improperly updated
  • a new validator bonds "7", which kicks out the false cliff validator; when the protocol tries to unbond it, it finds it's already unbonded

I haven't been able to write a failing test for this one yet, however.

@jaekwon
Contributor

jaekwon commented Aug 7, 2018

the cliff validator increases power from "5" to "10", but the cliff-power record is not properly updated

Can you explain this in more detail? What's the sequence around the oldCliffValidatorAddr variable? In your scenario, after step 2 ("6" bonded power), is the cliff validator the one that became 10? Then, a new validator bonds 7 and kicks out "6", wouldn't "10" still be bonded?

@rigelrozanski
Contributor

after step 2 ("6" bonded power), is the cliff validator the one that became 10?

In this situation the cliff validator should be "10"; however, because the old cliff power wasn't updated correctly, the protocol may think that the cliff validator is actually the new validator, which has a power of "6".

Then, a new validator bonds 7 and kicks out "6", wouldn't "10" still be bonded?

Correct. However, if the "6" is being kicked out (which requests that this validator be unbonded), the protocol would panic because this validator was never bonded to begin with. The original cliff validator would remain bonded the whole time.


Here is a more verbose description of the same scenario. Again, I haven't been able to write a test for this that actually panics, which leads me to think that the real scenario is not this one but maybe a close variant.

  • validator-1 is the cliff validator. It increases its power from "5" to "10", which is not enough to bring it out of the cliff position (let's say the next validator has a power of "100"). Due to bad logic, the power recorded for the cliff validator is not updated and remains at "5".
  • an unbonded validator, validator-2, bonds some new tokens, bringing it to "6" bonded power. Because the cliff power is recorded as "5", the protocol thinks that validator-2 should now become the new cliff validator; however, because it is not the true cliff validator, it remains unbonded.
  • another validator, validator-3, bonds some new tokens, bringing it to a power of "7", which then triggers a kick-out of validator-2. When the protocol tries to unbond validator-2, it finds that it is already unbonded and panics.

@ebuchman ebuchman mentioned this issue Aug 8, 2018
@gamarin2
Contributor

I think the postmortem also closes this? @cwgoes

@zmanian
Member Author

zmanian commented Aug 13, 2018

Let's close it because no one seems to have stored a blockchain for gaia-7004. It will be prevented in future testnets.

@zmanian zmanian closed this as completed Aug 13, 2018