Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consensus failure on cosmos-sdk v0.50.2 and comet v0.38.2 #831

Closed
mkaczanowski opened this issue Jan 8, 2024 · 12 comments · Fixed by #837 or #857
Closed

consensus failure on cosmos-sdk v0.50.2 and comet v0.38.2 #831

mkaczanowski opened this issue Jan 8, 2024 · 12 comments · Fixed by #837 or #857

Comments

@mkaczanowski
Copy link
Contributor

TL;DR

tmkms signs the vote and then you see the following error:

ERR CONSENSUS FAILURE!!! err="non-recoverable error when signing vote (587/0)" module=consensus stack="goroutine 2982 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()
\n\tgithub.com/cometbft/[email protected]/consensus/state.go:800 +0x46\npanic({0x2f80d60?, 0xc00eb9a020?})\n\truntime/panic.go:914 +0x21f\ngithub.com/cometbft/cometbft/consensus.(*State).signVote(0xc001402380, 0x2, {0xc006815000, 0x20, 0x20}, {0xd9c8dc?, {0xc006815060?, 0xc006fc12ec?, 0x3314060?}}, 0xc0052ed2c0)
\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2387 +0x639\ngithub.com/cometbft/cometbft/consensus.(*State).signAddVote(0xc001402380, 0x0?, {0xc006815000, 0x20, 0x20}, {0x1?, {0xc006815060?, 0xc007c7d440?, 0x20?}}, 0xc0052ed2c0)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2438 +0x212
\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc001402380, 0x24b, 0x0)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:1536 +0x1337\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc001402380, 0xc009622a90, {0xc003196000, 0x28})\n\tgithub.com/cometbft/[email protected]/co
nsensus/state.go:2295 +0x186f\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc001402380, 0xc009622a90, {0xc003196000?, 0xc006fc1c08?})\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2055 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc001402380, {{0x456b840, 0xc002
10ea90}, {0xc003196000, 0x28}})\n\tgithub.com/cometbft/[email protected]/consensus/state.go:928 +0x3ce\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc001402380, 0x0)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:835 +0x3d1\ncreated by github.com/cometbft/cometbft/consensus.(*Stat
e).OnStart in goroutine 221\n\tgithub.com/cometbft/[email protected]/consensus/state.go:397 +0x10c\n"

it may be related to:
https://docs.cosmos.network/main/build/abci/vote-extensions

@mkaczanowski
Copy link
Contributor Author

My guess is that signed vote extension bytes should be included into signed vote:
https://github.com/cometbft/cometbft/blob/4670b75a26b72867fc4aa70eccbf35f451de341c/types/priv_validator.go#L88-L98

@mkaczanowski
Copy link
Contributor Author

This may be troublesome for yubihsm users. This basically will add additional sign request, and taken that you can't stack too many networks on single yubihsm due to single-thread nature, you'll be able to stack even less...

(the lgtm alternative is to use the hashicorp vault in-transit signing as soft-hsm)

@mkaczanowski
Copy link
Contributor Author

tony-iqlusion added a commit that referenced this issue Jan 11, 2024
When present, computes a secondary signature over `Vote::extension`.

Closes #831 #835
tony-iqlusion added a commit that referenced this issue Jan 11, 2024
When present, computes a secondary signature over `Vote::extension`.

Closes #831 #835
@tony-iqlusion
Copy link
Member

@mkaczanowski can you try running this branch and see if it resolves the issue? #837

@mkaczanowski
Copy link
Contributor Author

yes, lemme try it

tony-iqlusion added a commit that referenced this issue Jan 12, 2024
When present, computes a secondary signature over `Vote::extension`.

Closes #831 #835
@tony-iqlusion
Copy link
Member

This is released in v0.13.1

@tqin7
Copy link

tqin7 commented Feb 15, 2024

Hi folks! With tmkms v0.13.1 running against our chain on cosmos v0.50, we are seeing errors such as CONSENSUS FAILURE!!! err="non-recoverable error when signing vote (6880123/1). However, horcrux v3.3.0-rc1 works. Could I ask two questions:

  1. Do you have any idea on what might be causing the error?
  2. Do you know other chains on cosmos v0.50 where tmkms works?

Much appreciated!

@tony-iqlusion
Copy link
Member

This is the first report we've gotten of a failure since v0.13.1 is released.

The logs from TMKMS when the failure occurred would be very helpful.

@qezz
Copy link

qezz commented Feb 16, 2024

Guessing the chain in question is dydx-testnet, here are some logs.

We run tmkms 0.13.1

Node itself:

dydx_testnet    | 1:21PM INF markets were not included in the price updates: [0 6 11 12 24] module=process_proposal
dydx_testnet    | 1:21PM ERR CONSENSUS FAILURE!!! err="non-recoverable error when signing vote (7296635/0)" module=consensus stack="goroutine 3962 [running]:\nruntime/debug.Stack()\n\truntime/debug/stack.go:24 +0x5e\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine.func2()\n\tgithub.com/cometbft/[email protected]/consensus/state.go:800 +0x46\npanic({0x45aa3e0?, 0xc01a7a0180?})\n\truntime/panic.go:914 +0x21f\ngithub.com/cometbft/cometbft/consensus.(*State).signVote(0xc0035bf880, 0x2, {0xc01cec3520, 0x20, 0x20}, {0x241d3bc?, {0xc01cec3560?, 0xc004b332ec?, 0x499d480?}}, 0xc01a2ee000)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2441 +0x639\ngithub.com/cometbft/cometbft/consensus.(*State).signAddVote(0xc0035bf880, 0x0?, {0xc01cec3520, 0x20, 0x20}, {0x1?, {0xc01cec3560?, 0xc01da87080?, 0x20?}}, 0xc01a2ee000)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2492 +0x212\ngithub.com/cometbft/cometbft/consensus.(*State).enterPrecommit(0xc0035bf880, 0x6f567b, 0x0)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:1590 +0x1337\ngithub.com/cometbft/cometbft/consensus.(*State).addVote(0xc0035bf880, 0xc01ad89c70, {0xc003598930, 0x28})\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2349 +0x186f\ngithub.com/cometbft/cometbft/consensus.(*State).tryAddVote(0xc0035bf880, 0xc01ad89c70, {0xc003598930?, 0xc004b33c08?})\n\tgithub.com/cometbft/[email protected]/consensus/state.go:2109 +0x26\ngithub.com/cometbft/cometbft/consensus.(*State).handleMsg(0xc0035bf880, {{0x56413e0, 0xc019edc700}, {0xc003598930, 0x28}})\n\tgithub.com/cometbft/[email protected]/consensus/state.go:928 +0x3ce\ngithub.com/cometbft/cometbft/consensus.(*State).receiveRoutine(0xc0035bf880, 0x0)\n\tgithub.com/cometbft/[email protected]/consensus/state.go:835 +0x3d1\ncreated by github.com/cometbft/cometbft/consensus.(*State).OnStart in goroutine 573\n\tgithub.com/cometbft/[email protected]/consensus/state.go:397 +0x10c\n"
dydx_testnet    | 1:21PM INF service stop impl=baseWAL module=consensus msg="Stopping baseWAL service" wal=/dydx-testnet/.dydxprotocold/data/cs.wal/wal
dydx_testnet    | 1:21PM INF service stop impl=Group module=consensus msg="Stopping Group service" wal=/dydx-testnet/.dydxprotocold/data/cs.wal/wal

Tmkms:

Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.587181Z DEBUG tmkms::chain::state: writing new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json: State { height: block::Height(7296634), round: block::Round(0), step: 2, block_id: None }
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.587287Z DEBUG tmkms::chain::state: successfully wrote new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.587342Z  INFO tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] signed Precommit:<nil> at h/r/s 7296634/0/2 (0 ms)
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.587350Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: SignedVote(SignedVoteResponse { vote: Some(Vote { r#type: Precommit, height: 7296634, round: 0, block_id: None, timestamp: Some(Timestamp { seconds: 1708089663, nanos: 587083382 }), validator_address: [127, 251, 188, 212, 179, 206, 161, 215, 77, 228, 108, 115, 83, 153, 61, 183, 140, 70, 250, 39], validator_index: 0, signature: [144, 205, 248, 12, 116, 127, 60, 157, 124, 194, 20, 21, 199, 236, 139, 163, 0, 246, 231, 190, 168, 173, 125, 244, 65, 142, 80, 151, 173, 133, 234, 200, 127, 47, 252, 164, 164, 93, 222, 218, 29, 58, 155, 99, 224, 1, 52, 3, 220, 91, 107, 153, 163, 105, 82, 87, 119, 187, 187, 151, 156, 186, 64, 5], extension: [], extension_signature: [] }), error: None })
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.609439Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] received request: ShowPublicKey
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.609464Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: PublicKey(PubKeyResponse { pub_key: Some(PublicKey { sum: Some(Ed25519([151, 137, 20, 167, 191, 138, 245, 197, 97, 230, 200, 162, 5, 62, 87, 158, 149, 135, 90, 239, 146, 227, 67, 66, 9, 112, 98, 14, 95, 86, 139, 142])) }), error: None })
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.791651Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] received request: SignVote(Vote { vote_type: Prevote, height: block::Height(7296635), round: block::Round(0), block_id: Some(Id { hash: Hash::Sha256(E9487BFBCD21607102A89635D238124980FB654498E32173B7A34AC7FF08835F), part_set_header: Header { total: 1, hash: Hash::Sha256(693E1737CC8063EA8FDEF44ADBA1A4FAB619673FF471C6FFD1EF5C116F94FC0A) } }), timestamp: Some(Time(2024-02-16 13:21:03.791594411)), validator_address: account::Id(7FFBBCD4B3CEA1D74DE46C7353993DB78C46FA27), validator_index: vote::ValidatorIndex(0), signature: None, extension: [], extension_signature: None })
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.791678Z DEBUG tmkms::chain::state: writing new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json: State { height: block::Height(7296635), round: block::Round(0), step: 1, block_id: Some(Id { hash: Hash::Sha256(E9487BFBCD21607102A89635D238124980FB654498E32173B7A34AC7FF08835F), part_set_header: Header { total: 1, hash: Hash::Sha256(693E1737CC8063EA8FDEF44ADBA1A4FAB619673FF471C6FFD1EF5C116F94FC0A) } }) }
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.791764Z DEBUG tmkms::chain::state: successfully wrote new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.791809Z  INFO tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] signed Prevote:E9487BFBCD at h/r/s 7296635/0/1 (0 ms)
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.791816Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: SignedVote(SignedVoteResponse { vote: Some(Vote { r#type: Prevote, height: 7296635, round: 0, block_id: Some(BlockId { hash: [233, 72, 123, 251, 205, 33, 96, 113, 2, 168, 150, 53, 210, 56, 18, 73, 128, 251, 101, 68, 152, 227, 33, 115, 183, 163, 74, 199, 255, 8, 131, 95], part_set_header: Some(PartSetHeader { total: 1, hash: [105, 62, 23, 55, 204, 128, 99, 234, 143, 222, 244, 74, 219, 161, 164, 250, 182, 25, 103, 63, 244, 113, 198, 255, 209, 239, 92, 17, 111, 148, 252, 10] }) }), timestamp: Some(Timestamp { seconds: 1708089663, nanos: 791594411 }), validator_address: [127, 251, 188, 212, 179, 206, 161, 215, 77, 228, 108, 115, 83, 153, 61, 183, 140, 70, 250, 39], validator_index: 0, signature: [159, 172, 105, 169, 208, 94, 191, 243, 25, 92, 19, 184, 207, 64, 195, 184, 212, 113, 117, 192, 7, 10, 226, 128, 39, 53, 182, 2, 204, 207, 85, 161, 55, 24, 26, 111, 195, 230, 112, 134, 114, 204, 11, 73, 31, 234, 131, 154, 251, 238, 144, 156, 85, 118, 142, 82, 110, 49, 181, 74, 159, 209, 151, 11], extension: [], extension_signature: [] }), error: None })
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.796718Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] received request: SignVote(Vote { vote_type: Precommit, height: block::Height(7296635), round: block::Round(0), block_id: Some(Id { hash: Hash::Sha256(E9487BFBCD21607102A89635D238124980FB654498E32173B7A34AC7FF08835F), part_set_header: Header { total: 1, hash: Hash::Sha256(693E1737CC8063EA8FDEF44ADBA1A4FAB619673FF471C6FFD1EF5C116F94FC0A) } }), timestamp: Some(Time(2024-02-16 13:21:03.796675881)), validator_address: account::Id(7FFBBCD4B3CEA1D74DE46C7353993DB78C46FA27), validator_index: vote::ValidatorIndex(0), signature: None, extension: [], extension_signature: None })
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.796735Z DEBUG tmkms::chain::state: writing new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json: State { height: block::Height(7296635), round: block::Round(0), step: 2, block_id: Some(Id { hash: Hash::Sha256(E9487BFBCD21607102A89635D238124980FB654498E32173B7A34AC7FF08835F), part_set_header: Header { total: 1, hash: Hash::Sha256(693E1737CC8063EA8FDEF44ADBA1A4FAB619673FF471C6FFD1EF5C116F94FC0A) } }) }
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.796796Z DEBUG tmkms::chain::state: successfully wrote new consensus state to /opt/validators/dydx_testnet-chorus/tmkms/state/dydx-testnet-4-consensus.json
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.796838Z  INFO tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] signed Precommit:E9487BFBCD at h/r/s 7296635/0/2 (0 ms)
Feb 16 13:21:03 tmkms-0.13.1[2143273]: 2024-02-16T13:21:03.796851Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: SignedVote(SignedVoteResponse { vote: Some(Vote { r#type: Precommit, height: 7296635, round: 0, block_id: Some(BlockId { hash: [233, 72, 123, 251, 205, 33, 96, 113, 2, 168, 150, 53, 210, 56, 18, 73, 128, 251, 101, 68, 152, 227, 33, 115, 183, 163, 74, 199, 255, 8, 131, 95], part_set_header: Some(PartSetHeader { total: 1, hash: [105, 62, 23, 55, 204, 128, 99, 234, 143, 222, 244, 74, 219, 161, 164, 250, 182, 25, 103, 63, 244, 113, 198, 255, 209, 239, 92, 17, 111, 148, 252, 10] }) }), timestamp: Some(Timestamp { seconds: 1708089663, nanos: 796675881 }), validator_address: [127, 251, 188, 212, 179, 206, 161, 215, 77, 228, 108, 115, 83, 153, 61, 183, 140, 70, 250, 39], validator_index: 0, signature: [211, 82, 58, 161, 252, 161, 179, 153, 17, 166, 158, 44, 124, 223, 159, 71, 33, 233, 222, 167, 190, 234, 15, 96, 125, 201, 35, 222, 48, 220, 159, 191, 209, 255, 11, 186, 179, 7, 55, 131, 18, 104, 255, 91, 75, 115, 197, 116, 179, 61, 47, 28, 252, 199, 3, 132, 41, 144, 80, 103, 232, 108, 173, 15], extension: [], extension_signature: [] }), error: None })
Feb 16 13:21:07 tmkms-0.13.1[2143273]: 2024-02-16T13:21:07.130164Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] received request: PingRequest
Feb 16 13:21:07 tmkms-0.13.1[2143273]: 2024-02-16T13:21:07.130182Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: Ping(PingResponse)
Feb 16 13:21:10 tmkms-0.13.1[2143273]: 2024-02-16T13:21:10.464329Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] received request: PingRequest
Feb 16 13:21:10 tmkms-0.13.1[2143273]: 2024-02-16T13:21:10.464346Z DEBUG tmkms::session: [dydx-testnet-4@tcp://100.76.204.31:54659] sending response: Ping(PingResponse)

So seems like it fails the moment the node is trying to sign anything. In this case it's 7296635. Tmkms logs include some lines from 7296634. After the validator node fails, tmkms logs are just PingRequest and PingResponse

I don't know if it's actually related to the cosmos-sdk v0.50, but the v0.13.1 (and this issue in particular) was mentioned in Dydx Testnet contact channels as the version that should fix the error of dydx-testnet v4.0.0 not working with tmkms.

@mkaczanowski
Copy link
Contributor Author

If only one would include error in Sprintf...
https://github.com/cometbft/cometbft/blob/v0.38.2/consensus/state.go#L2386

@melekes
Copy link
Contributor

melekes commented Feb 18, 2024

@mkaczanowski :

non-recoverable error when signing vote (1/0): extensions must be present IFF vote is a non-nil Precommit; present false, vote type 2, is nil false

@melekes
Copy link
Contributor

melekes commented Feb 19, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants