
bug: State-sync doesn't work with specific blocks #494

Closed
pcheliniy opened this issue Sep 22, 2022 · 13 comments
Labels
bug Something isn't working

@pcheliniy

pcheliniy commented Sep 22, 2022

What is happening?


Sometimes a node fails consensus and afterwards can't start.
When we try to reinitialize the node with state sync, we get the following error:

9:53PM INF indexed block height=390682 module=txindex
panic: failed to process committed block (390683:BF34A41107424E708F46796033B015688C9D6F87839DB7F05D78C5C9612A883C): wrong Block.Header.AppHash.  Expected D56B1A9B5DD92380EFB23F904A7AC8E457A9B959D89C28DC3FD98079B6BF6009, got 441E638211FDC335862D30983FADADF6B0982EE145C92E3F23D73CE140C56D28 [recovered]
        panic: failed to process committed block (390683:BF34A41107424E708F46796033B015688C9D6F87839DB7F05D78C5C9612A883C): wrong Block.Header.AppHash.  Expected D56B1A9B5DD92380EFB23F904A7AC8E457A9B959D89C28DC3FD98079B6BF6009, got 441E638211FDC335862D30983FADADF6B0982EE145C92E3F23D73CE140C56D28
last events:
0: scBlockReceived{390684#C155675FD6A69F66A4938A9A4337CE5BFE32C77060B9471AE3561AFAF05C9244 from 0da7015ffb74bf4782ccea0d79783c1493798b0f}
1: scBlockReceived{390691#DFF548CB816307686C394C3E3CFE47A231F97FE191CBC9F6FF0C5F1FE0F1435A from 9f9e229c7375ae4cb75aa971fde9ef89f4f00e7f}
2: scBlockReceived{390690#CFCD87379373D1DEDDEF165C6EC0839B2B4403C43A897E3DA527D71AD3EA7381 from 97e5ae4dbac5eb14aedfa8eee26005d4aa1da0a4}
3: scBlockReceived{390689#C9336985EAEDB680E89C3BCE41E5B71730A4CE7BE3BE783C2F03F0C817EE0C49 from 8912f06b337b9f773225bd59e5a139e5af7eb852}
4: scBlockReceived{390688#BC1B31B9164940CCE908F6AC3EE0DF093181C6AC975B71906F1AB6A6E86551A4 from 484e0d3cc02ba868d4ad68ec44caf89dd14d1845}
5: scBlockReceived{390687#CBF85BA9A2CF03729FFF457162AE42EF6AB264CFB51491576CF7422254D21021 from 40f4f9b78f7df5b0bfe9726a76bebd3b37ce12ae}
6: scBlockReceived{390686#CA7635A09C983B708D8CBE5E953572E1E4C280DFDAA7C5B0F222AB5787C28455 from 3c653c2425f8f8f4973bd88e78cb8a5ae9763ed1}
7: scBlockReceived{390685#3BFE53EBEB181AC7B4F31426DB26AF58B8D4F3A571F395E428062DDDDF070A94 from 22ea29ae571078635492544d736c669712971d57}
8: scBlockReceived{390675#F5697C0C6F42321366C791C515F549DFF38B7BE4D5B545D4568EF73377BECD60 from 0da7015ffb74bf4782ccea0d79783c1493798b0f}
9: scBlockReceived{390683#BF34A41107424E708F46796033B015688C9D6F87839DB7F05D78C5C9612A883C from b159364b4e6a3036c36ef6c7c690c5fbc81fa9c4}
10: scBlockReceived{390682#0A82FFB08BD43499D97B2F2095560952A491E5754797AEA8F35DEC500548C32D from 9f9e229c7375ae4cb75aa971fde9ef89f4f00e7f}
11: scBlockReceived{390681#8DEF706A8E8CA495AF1070A66DD69B3554D197C503C851312DCB42BF3A42F1C9 from 97e5ae4dbac5eb14aedfa8eee26005d4aa1da0a4}
12: scBlockReceived{390680#07E282C48241C61141FC55961B1C9803A486A3DD24BFDB4376A34E706CBE7115 from 8912f06b337b9f773225bd59e5a139e5af7eb852}
13: scBlockReceived{390679#D91B7A6D9CD86D6A6E50D02003884C4DA3AF5A65821AA40F647AF4C7BD897D51 from 484e0d3cc02ba868d4ad68ec44caf89dd14d1845}
14: scBlockReceived{390678#D2D5D75AE09C21B973BB9D3FE400DFB258D41DFB4347C9D1328B7C75E76A0FAE from 40f4f9b78f7df5b0bfe9726a76bebd3b37ce12ae}
15: scBlockReceived{390677#BA0DC852930ABB396F05C60E25F54C85DBF62507BEDA75F15A958C438075472B from 3c653c2425f8f8f4973bd88e78cb8a5ae9763ed1}
16: scBlockReceived{390676#06D5945B07E3AE3ACDA8EB0D9D503CFECAF6072105EF0FFD739D73F0DC1E8D14 from 22ea29ae571078635492544d736c669712971d57}
17: scBlockReceived{390666#8C02321CB64C657C2B118BD6500A6B12A8F6EC1164950D67EE71B71E2B44EAF4 from 0da7015ffb74bf4782ccea0d79783c1493798b0f}
18: scBlockReceived{390674#FBF9252907EC46BE7453BD8D9373C4FDB7C6DCC0ABC281FD254400D8F6135984 from b159364b4e6a3036c36ef6c7c690c5fbc81fa9c4}
19: scBlockReceived{390673#2E67AC2D7AE6B8C772F8BD2E38A9FEC7DD56BB59409D91B94C13E00C4A995D10 from 9f9e229c7375ae4cb75aa971fde9ef89f4f00e7f}
20: scBlockReceived{390672#C678C3C778FF7245E70CFE477B46BD62D24309FA716A8831B446EBA2F7C632AB from 97e5ae4dbac5eb14aedfa8eee26005d4aa1da0a4}
21: scBlockReceived{390671#7B8A05FF11693D44B41EBCAC827983A7AC5C4BDB289EA8CFCCCA34F85D8BB406 from 8912f06b337b9f773225bd59e5a139e5af7eb852}
22: scBlockReceived{390670#B5085B0914E802A79B25D0CC2859F56276BBAA1945A6A2A94886C84C5003EBCD from 484e0d3cc02ba868d4ad68ec44caf89dd14d1845}
23: scBlockReceived{390669#EA03DF8B7C2B8D0237A253ABE77AD5808C4C8477E00B0E455E20F11DD7EB89A8 from 40f4f9b78f7df5b0bfe9726a76bebd3b37ce12ae}
24: scBlockReceived{390668#20CDA48672BCED1112D578E0D29BECAD9339D48B3CE7A012BB7F5663DDD88665 from 3c653c2425f8f8f4973bd88e78cb8a5ae9763ed1}


goroutine 166 [running]:
github.com/tendermint/tendermint/blockchain/v2.(*Routine).start.func1()
        /go/pkg/mod/github.com/tendermint/[email protected]/blockchain/v2/routine.go:77 +0x24e
panic({0x19fb440, 0xc007b0c630})
        /usr/local/go/src/runtime/panic.go:838 +0x207
github.com/tendermint/tendermint/blockchain/v2.(*pcState).handle(0xc000f32930, {0x260a960?, 0x36b52d8?})
        /go/pkg/mod/github.com/tendermint/[email protected]/blockchain/v2/processor.go:182 +0x96e
github.com/tendermint/tendermint/blockchain/v2.(*Routine).start(0xc0011a4770)
        /go/pkg/mod/github.com/tendermint/[email protected]/blockchain/v2/routine.go:94 +0x232
created by github.com/tendermint/tendermint/blockchain/v2.(*BlockchainReactor).startSync
        /go/pkg/mod/github.com/tendermint/[email protected]/blockchain/v2/reactor.go:152 +0x17d

Sync only starts working again once the state-sync snapshot is taken at a height past this block.
Meanwhile, the node that provides the state-sync snapshot works correctly, which suggests it has no consensus problem and all of its stored blocks are correct.
I've tried two different validators as state-sync providers, and syncing from both fails at the same block, which suggests the problem is not local to one specific node.

How to reproduce?


The problem started appearing after the update to pigeon 0.8.0 and paloma 0.9.0. We had not seen it before.

What is the expected behaviour?


State sync should work correctly. This problem defeats the whole purpose of state sync (the ability to sync a node even when the protocol has changed).

@pcheliniy pcheliniy added the bug Something isn't working label Sep 22, 2022
@taariq taariq added this to Paloma Sep 22, 2022
@taariq taariq added this to the V 1.0 milestone Sep 22, 2022
@taariq
Contributor

taariq commented Sep 23, 2022

@pcheliniy, could you share the following:

  1. Paloma Configuration files
  2. Pigeon configuration files
  3. Versions of pigeon and paloma

@pcheliniy
Author

@taariq
Contributor

taariq commented Sep 24, 2022

@Vizualni will you review the configuration and errors when you have a moment?

@Vizualni
Contributor

@taariq @pcheliniy If I am not mistaken, the proposal to upgrade palomad was around that block height. I am not really an expert in this, but it seems to me that the sync prior to the proposal should be done with the previous palomad version, and the sync after the proposal was accepted should be done with the latest palomad version. Take this with a grain of salt 🧂.

Also, have you tried using fast-sync as well?

@taariq
Contributor

taariq commented Sep 25, 2022

Closing as a state-sync issue caused by old state from the first governance proposal, which failed. We need a new sync from after the new governance proposal passed.

@taariq taariq closed this as completed Sep 25, 2022
@taariq taariq moved this to Done in Paloma Sep 25, 2022
@pcheliniy
Copy link
Author

The first (failed) proposal was at height 313_947: https://paloma.explorers.guru/proposal/1

The second (successful) proposal was at 353_459: https://paloma.explorers.guru/proposal/2
The problem is at 390_682, so it isn't connected to the version changes.

A lot of people have this problem, and it seems to be connected to contract instantiation activity: https://paloma.explorers.guru/transaction/212D41BB3753873CD45885591499F678E091A0C12DFAC9F61A8AC7DEA0D44B57

@taariq taariq reopened this Sep 25, 2022
@taariq
Contributor

taariq commented Sep 25, 2022

Re-opening with review of the smart contract issues related to this ticket as well. Thanks @pcheliniy.

@Vizualni
Contributor

I am going to assign @measure-fi to this issue, although I am not sure he can do anything here, since that code is deterministic. There might be some undesirable nondeterministic behaviour directly in CosmWasm.

@Vizualni Vizualni assigned measure-fi and unassigned Vizualni Sep 25, 2022
@hdmiimdh

What happens is:

The state-sync RPC node takes a snapshot, BUT the snapshot doesn't include the ./data/wasm folder;
The node runner sets up state sync and runs palomad tendermint unsafe-reset-all, which resets their local WASM;
The node downloads the snapshot from the RPC node;
The node applies the snapshot;
The node starts syncing;
The node reaches a block with a contract call transaction;
The node fails with the error below, because the expected WASM state and the actual state differ.
Error: error during handshake: error on replay: wrong Block.Header.AppHash. Expected E8A9094824059D0AFF18F41E3A751691FB92865033AB2C4871301F79AEFA9E57, got CBBCFCC2DF0C55F2ED3F234A1DD36EA58D5FAC3709358982DD63C138DEB59109
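The failure mode above can be illustrated with a toy model. This is a minimal Go sketch under simplifying assumptions, not wasmd or Tendermint code: the `node` type, `appHash`, `executeContractCall`, and `newNode` are hypothetical. One map stands in for the consensus state that the state-sync snapshot restores (it holds only the contract's code checksum), and a second map stands in for the local ./data/wasm folder, which the snapshot does not carry. Both nodes start with identical state and matching hashes; after executing the same contract call, the node without the bytecode records a different outcome and its hash diverges.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Toy model only: names are hypothetical, not wasmd/Tendermint APIs.
type node struct {
	state   map[string]string // consensus state: restored by the state-sync snapshot
	wasmDir map[string][]byte // local bytecode keyed by checksum: NOT in the snapshot
}

// appHash hashes the consensus state deterministically by iterating sorted keys.
func (n *node) appHash() string {
	keys := make([]string, 0, len(n.state))
	for k := range n.state {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write([]byte(n.state[k]))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// executeContractCall looks up the contract bytecode by checksum and records
// the outcome in state (a simplification of the real execution path).
func (n *node) executeContractCall(contract string) {
	checksum := n.state["code/"+contract]
	if _, ok := n.wasmDir[checksum]; !ok {
		// Bytecode missing locally: this node's result differs from its peers'.
		n.state["result/"+contract] = "error: code not found"
		return
	}
	n.state["result/"+contract] = "ok"
}

// newNode builds a node whose state references contract "c1"; withWasm controls
// whether the bytecode is present in the local wasm folder.
func newNode(withWasm bool) *node {
	code := []byte("contract bytecode")
	sum := sha256.Sum256(code)
	checksum := hex.EncodeToString(sum[:])
	n := &node{
		state:   map[string]string{"code/c1": checksum},
		wasmDir: map[string][]byte{},
	}
	if withWasm {
		n.wasmDir[checksum] = code
	}
	return n
}

func main() {
	provider := newNode(true) // snapshot provider: has the bytecode on disk
	synced := newNode(false)  // state-synced node: same state, empty wasm folder
	fmt.Println("hashes equal before block:", provider.appHash() == synced.appHash()) // true

	// Both nodes execute the same block containing a contract call.
	provider.executeContractCall("c1")
	synced.executeContractCall("c1")
	fmt.Println("hashes equal after block:", provider.appHash() == synced.appHash()) // false
}
```

In this model the state itself was restored correctly; the divergence comes entirely from data that lives outside the snapshot, which matches the explanation that the snapshot provider keeps working while freshly synced nodes fail.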

Here is a block where I faced it recently: 447262. I'm not sure where exactly it should be fixed, but the fix is to include the WASM folder in state-sync snapshots.
https://paloma.explorers.guru/block/447262
https://paloma.explorers.guru/transaction/1B7B35A9C5CED98DA2CCB1A08C3BBE45DF43BAAB881D2E49501026607792F513

Meanwhile, anyone struggling with the same issue can use snapshots instead of state sync; they contain the full WASM folder and are most likely safe.
https://nodejumper.io/paloma-testnet/sync#snap

@taariq
Contributor

taariq commented Sep 26, 2022

Closing as a third-party state-sync issue, now resolved. Thank you @hdmiimdh

@taariq taariq closed this as completed Sep 26, 2022
@taariq taariq reopened this Sep 26, 2022
@taariq
Contributor

taariq commented Sep 26, 2022

Via @hdmiimdh: Thanks for the heads-up
Hdmiimdh | Nodejumper, [Sep 26, 2022 at 7:21:50 AM]:
Btw, you got me wrong though 😅 This is something you should fix, because right now the chain doesn't support state sync: it doesn't include wasm in RPC snapshots.

Using snapshots instead of state sync is just a workaround, but I don't know whether you guys are okay with this.

@hdmiimdh

Whoever takes this issue can reference this PR in the Cosmos SDK, which allows devs to add more things, such as wasm, to RPC snapshots. It might be helpful.
cosmos/cosmos-sdk#10961

@taariq
Contributor

taariq commented Sep 26, 2022

Moved to feature update #499

Projects
Status: Done
Development

No branches or pull requests

5 participants