Fix past session slashing Zombienet test #578

Open
Tracked by #5852
sandreim opened this issue Aug 16, 2023 · 6 comments
Labels
I2-bug The node fails to follow expected behavior.

Comments

@sandreim
Contributor

There are 2 issues to be fixed:

Finality stall

After resuming, Alice lacks approval votes from the honest validators because they don't distribute them again: the blocks are already approved in their view, and they see no need to enable aggression. 2/4 votes are not enough to approve, so finality stalls.

We should adjust the number of validators to avoid this issue.

Collators stop block production after chain is reverted

This is only relevant on the async backing branch, but it is to be merged soon - paritytech/polkadot#5022

We've fixed Malus/Undying in paritytech/polkadot#7618, and because of this both the cumulus and undying collators stop collating when they fail to build on top of the malus garbage candidate. This persists even after the chain is reverted.

WARN tokio-runtime-worker cumulus-pov-recovery: [Parachain] Failed to decode parachain block data from recovered PoV error=Error { cause: Some(Error { cause: Some(Error { cause: None, desc: "Not enough data to fill buffer" }), desc: "Could not decode `Header::state_root`" }), desc: "Could not decode `ParachainBlockData::header`" }
2023-08-16 17:04:06.035 ERROR tokio-runtime-worker test_parachain_undying_collator: Unable to build on top of HeadData { number: 5, parent_hash: [123, 128, 180, 244, 98, 185, 242, 12, 179, 204, 244, 200, 230, 205, 91, 203, 221, 57, 199, 184, 235, 146, 220, 105, 24, 25, 95, 89, 125, 95, 10, 194], post_state: [114, 105, 244, 217, 48, 123, 250, 165, 199, 115, 164, 210, 45, 19, 127, 62, 26, 175, 171, 187, 173, 112, 27, 213, 21, 131, 176, 156, 103, 118, 55, 37] }: StateMismatch 
@sandreim
Contributor Author

CC @ordian

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added I2-bug The node fails to follow expected behavior. T8-parachains_engineering and removed I3-bug labels Aug 25, 2023
@bkchr
Member

bkchr commented Sep 4, 2023

The test is still flaky. Can we please fix this or disable it?

@eskimor
Member

eskimor commented Oct 20, 2023

@ordian is this fixed?

@ordian
Member

ordian commented Oct 20, 2023

Somewhat. I'll have to ask Zombienet team for stats on test failures to see how flaky it is nowadays.

I haven't looked into the issues with the Malus/Undying collator and simply replaced the collator with a cumulus-based one. So that part is not fixed, but maybe it doesn't need to be fixed.

Sometimes it does fail on the last assertion about the finality stall. In the test we pause a couple of nodes (2/4), which breaks our assumption that no more than 1/3 of the validators are offline at the same time. Approvals are missing in that case, leading to a finality stall. This could be fixed either by being more aggressive in approval-distribution, or by somehow disabling only dispute resolution on the two nodes instead of pausing them.
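To make the 2/4 arithmetic concrete, here is a minimal sketch of the usual "at most one third faulty" bound. It is generic BFT arithmetic, not the actual approval-voting or approval-distribution logic, and the function names are hypothetical, chosen only for illustration.

```rust
/// Largest number of faulty or offline validators that a set of `n`
/// validators can tolerate under the usual one-third assumption.
/// (Hypothetical helper, for illustration only.)
fn max_faulty(n: usize) -> usize {
    n.saturating_sub(1) / 3
}

/// Minimum number of votes that forms a supermajority among `n` validators.
fn supermajority(n: usize) -> usize {
    n - max_faulty(n)
}

/// Returns true if `n` validators, with `paused` of them offline, can still
/// gather a supermajority of votes from the remaining online validators.
fn can_reach_supermajority(n: usize, paused: usize) -> bool {
    n.saturating_sub(paused) >= supermajority(n)
}
```

With 4 validators, `max_faulty(4)` is 1 and `supermajority(4)` is 3, so `can_reach_supermajority(4, 2)` is false while `can_reach_supermajority(4, 1)` is true. Under this assumption, pausing two of four nodes leaves the network outside the fault model.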

It also fails sometimes on other assertions, e.g. no parachain block is produced within 300 seconds after all nodes are up, likely due to slowness in zombienet/nodes.

@pepoviola
Contributor

@ordian let me know if you need those stats. Thx!

@bkchr
Member

bkchr commented Oct 21, 2023

> Somewhat. I'll have to ask Zombienet team for stats on test failures to see how flaky it is nowadays.

I still see it failing quite regularly. So it is still too flaky, especially if you consider that we already restart failing zombienet jobs.
