panicked at 'Storage root must match that calculated.' #219
Ok, the
Can you share your parachain implementation? And is the network public, so that I can sync it on my own?
The network is private for now as it is at a very early stage, and we haven't shared our implementation yet (we had to use a different repo as it required too many changes to support cumulus and ...). I can prepare a docker image with the full network running on it and a parachain binary to connect to, if that helps. Here is another case of the same error (it didn't print the stack trace this time, so I'm not sure it is related to the same one):
It has this UnknownParent error that seems to be the trigger in that case.
Hmm, an UnknownParent error. Would be really nice to have something to reproduce this :)
Updating the ticket based on our discussion.
As requested, I attached the logs including the TRACE state. The first node has a different value set for the state. The left side (node 1, the failing one) shows a failed import because of a re-org, but the block that crashes is parachain block #84.
I initially wanted to explain how to reproduce this here, but it is too long and complex, so I made a branch for it; the readme contains all the steps.
@bkchr If that helps, I can try to keep one network online for you to connect to and observe. That would make the reproduction steps a bit simpler.
Following the suggestion of @JoshOrndorff, I tried to reproduce it using only Substrate transactions (instead of the EVM), and so far I haven't hit the issue, but I observed something that could be similar to what is happening. When running 1 parachain collator and 3 (non-validating) parachain RPC nodes, those RPC nodes sometimes stop importing blocks (also after a re-org) for a long time and then finally re-sync.
@bkchr Could it be the same issue, with the difference being that Substrate doesn't complain when it fails to import the block, whereas with the EVM it does because it panics if the state is invalid?
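For context on where the panic message in the title comes from: as far as I can tell, it is the storage-root check that FRAME's Executive runs after executing a block, which compares the recomputed storage root against the state root claimed in the block header. Below is a simplified, self-contained sketch of that kind of check, using stand-in types rather than the actual upstream code:

```rust
// Simplified sketch with stand-in types (not the actual frame_executive code):
// after executing a block, the node recomputes the storage root and asserts it
// matches the state root the block header claims. A mismatch is unrecoverable
// for that block, hence the panic seen in this issue.
struct Header {
    state_root: [u8; 32],
}

fn final_checks(header: &Header, computed_storage_root: [u8; 32]) {
    assert!(
        header.state_root == computed_storage_root,
        "Storage root must match that calculated."
    );
}

fn main() {
    let header = Header { state_root: [0u8; 32] };
    // Matching roots pass silently; a mismatch would panic with the message above.
    final_checks(&header, [0u8; 32]);
}
```

If that is indeed the source, the panic is generic Substrate behavior for any storage-root mismatch during block execution, so the EVM pallet would only be relevant insofar as it is what causes the state to diverge.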
The output posted in the comment immediately before this one does not look to me like the parachain stopped importing blocks. It looks like a nice demonstration of the steps that @bkchr outlined earlier. The first thing we see is a normally operating collator. Then there is a relay chain re-org. A re-org is one way that a collator could "import a new best relay chain block", which is exactly what starts the process listed. After the re-org, the collator suddenly imports a bunch of blocks all at once (but does not mark them as best). To me this indicates that the old relay fork did not reference these blocks, but the new relay fork does, so the collator imports them. Then, after another second of waiting, we get another new relay chain block (#40) which references all those just-imported parablocks, and thus the best parachain block is updated. So my point is that the output in the immediately preceding comment looks like correct, expected behavior to me, assuming that only the relay block #39 that we re-orged to knew about all those parablocks.
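To make the sequence described above concrete, here is a minimal sketch of the "follow the relay chain's view of the parachain" step. Every type and function name is invented for illustration; this is not the actual Cumulus code, just the behavior as described in this thread.

```rust
use std::collections::HashSet;

// All names here are made up for illustration; this is not the real Cumulus API.
struct ParachainClient {
    imported: HashSet<u64>, // hashes of parachain blocks already imported locally
    best: Option<u64>,      // hash of the current best parachain block
}

impl ParachainClient {
    // Called whenever a new best relay chain block is seen (e.g. after a re-org).
    // `recognised_para_head` is the parachain head that relay block references.
    fn on_new_best_relay_block(&mut self, recognised_para_head: u64) {
        // Blocks can be imported without being marked best (the "sudden mass
        // import" after the re-org); only a relay block that references one
        // promotes it to best.
        if self.imported.contains(&recognised_para_head) {
            self.best = Some(recognised_para_head);
        }
        // Otherwise the block still has to arrive via sync first; a later relay
        // block that references it will promote it at that point.
    }
}

fn main() {
    let mut client = ParachainClient {
        imported: HashSet::from([1, 2, 3]),
        best: Some(1),
    };
    // A relay block like #40 in the logs references parachain head 3: it becomes best.
    client.on_new_best_relay_block(3);
    assert_eq!(client.best, Some(3));
}
```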
I guess the one mystery that remains is why the collator did not import all of those parablocks before the re-org.
This issue does seem to be related to re-orgs on the relay chain. Has paritytech/substrate#7118 made its way back into polkadot yet?
The linked PR was only required for the transaction pool and nothing else, so I don't expect it would change anything here.
Just to keep everyone up to date, I want to share an observation and a hypothesis.

Hypothesis: The crash reported here is a symptom of two interacting issues. The first is that weird stuff (TM) happens on the parachain after a re-org on the relay chain. The second is probably something in Moonbeam or Frontier. In order to eliminate this from the equation, I'll try to reproduce the relevant part with the parachain template.

Observation: A lot more weird / bad / unstable behavior, like nodes getting stuck and sudden mass imports, is much more visible when running without GRANDPA. I'm able to reproduce paranodes getting stuck quite reliably by running the relay chain nodes with GRANDPA disabled.

Here is one full set of logs I generated and found pretty helpful. I'm open to hearing what additional logs I should be recording, if any. https://gist.github.com/JoshOrndorff/fb8230fd059449395730814098d60051
Here's another complete set of logs from the Alice validator, the Bob validator, and three Moonbeam collators: https://gist.github.com/JoshOrndorff/57842a1da7ea464a235afaa5499a721e All the collators demonstrate getting stuck after a relay chain re-organization at various points.
As I mentioned in #225, this seems to be fixed on newer Cumulus, so this issue can be closed.
Ty for reporting.
We have 2 relay nodes.
We have 1 collator node (cumulus 96da14c).
We have 1 RPC node (same command as the collator, but without --validator). It is very similar to the node template, but with the Frontier (Ethereum) pallet included.
When we perform some tests on the RPC node (using Ethereum: deploying contracts, transferring, calling contracts, ...), it inconsistently fails with this error (which then enters a loop):
This doesn't happen on the collator itself, only on the RPC node.
It doesn't seem to happen if we run the RPC node as a collator too (but because it is inconsistent, I'll have to confirm that after a day or two).