Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KUSAMA] Chain's Block Finalization Halted - Parachains Stuck #3345

Closed
2 tasks done
SimonKraus opened this issue Feb 15, 2024 · 2 comments · Fixed by #3358
Closed
2 tasks done

[KUSAMA] Chain's Block Finalization Halted - Parachains Stuck #3345

SimonKraus opened this issue Feb 15, 2024 · 2 comments · Fixed by #3358
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.

Comments

@SimonKraus
Copy link

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

Since Kusama block #21893876 the finalization is halted and Parachains are stalling in conclusion of this.

I tried to revert unfinalized blocks but that did obviously not help. #3314 might most probably be related.

After the revert I found some uncommon things in the logs:


Feb 15 22:36:00 ksm1 kusama[1073720]: 2024-02-15 22:36:00 Received msg before first active leaves update. This is not expected - message will be dropped. msg=DetermineUndisputedChain { base: (21893876, 0x2d6c8c4fdd11d33aab55c5f92203735efe9de97a038aeed9a2bfbe1fc636511b), block_descriptions: [], tx: Sender { complete: false } }
Feb 15 22:36:00 ksm1 kusama[1073720]: 2024-02-15 22:36:00 Call to `DetermineUndisputedChain` failed error=DetermineUndisputedChainCanceled(Canceled)
Feb 15 22:36:03 ksm1 kusama[1073720]: 2024-02-15 22:36:03 ⚙️  Preparing, target=#21894143 (4 peers), best: #21893951 (0x2b3f…5581), finalized #21893876 (0x2d6c…511b), ⬇ 2.5MiB/s ⬆ 228.2kiB/s
Feb 15 22:36:05 ksm1 kusama[1073720]: 2024-02-15 22:36:05 Received msg before first active leaves update. This is not expected - message will be dropped. msg=DetermineUndisputedChain { base: (21893876, 0x2d6c8c4fdd11d33aab55c5f92203735efe9de97a038aeed9a2bfbe1fc636511b), block_descriptions: [], tx: Sender { complete: false } }
Feb 15 22:36:05 ksm1 kusama[1073720]: 2024-02-15 22:36:05 Call to `DetermineUndisputedChain` failed error=DetermineUndisputedChainCanceled(Canceled)
Feb 15 22:36:06 ksm1 kusama[1073720]: 2024-02-15 22:36:06 Received msg before first active leaves update. This is not expected - message will be dropped. msg=ImportStatements { candidate_receipt: CandidateReceipt { descriptor: CandidateDescriptor { para_id: Id(2024), relay_parent: 0x7275799e9d486b37519b65741297eb858a9f5405a15eca2148faebe08c6be631, collator: Public(4873f964ae60edf0fb45298243ce6d9b518544cac0c2a5d770817ed228f8516e (EDKR96PC...)), persisted_validation_data_hash: 0xbd9504833e67e61ae95e8e44f8146098089ebb8b2972a9c80c96badfccef77ca, pov_hash: 0xbe600f960cd8eb42cb383b7a748cf08e62ab3a5381feb58493ac36d52a9e1342, erasure_root: 0xaa134e8e99fc868a7b035bc6721f76f068aab6541389c1e7196d2dcfac534f3b, signature: Signature(ee92a3e2994184a35f12ab7ae3a32434b1ea8cc97c1fccbab6a96f86140e520bc128fbceb96ac702fe505a1020b249a81cfa4da96a06cfd69a48e3d3502cfa83), para_head: 0x0b12d62b1a3badedbe0ea61371420e16f074ced9a6addf3ff3e547efcfc715d3, validation_code_hash: 0xaa3fe248257d50a7372520a6854a905309578b2b29e8d2cad4cafb7d70bb5cdf }, commitments_hash: 0x9afd4e93480c1fafc670825631dda5b10f3380bfd21461d4099beaed7c0f54d2 }, session: 36964, statements: [(SignedDisputeStatement { dispute_statement: DisputeStatement::Valid(ValidDisputeStatementKind::Explicit), candidate_hash: 0xc4ddebd193365d558f8a0a780148e7a9dbe4d9d9e621189921684a52b8f832ce, validator_public: Public(48e2f1304d686e7606af2d7a633c4045ad4604bd1d4ddb752999f1bba56e1c78 (EDtP5DwL...)), validator_signature: Signature(a690f67843a0039a73a490ae7feab0fb4f8536c0b690ff7da398e8dd880f0c181ac077f7e8400624d29facadf7992edb59de11824ff560b406a7c898ebb85a8e), session_index: 36964 }, ValidatorIndex(280)), (SignedDisputeStatement { dispute_statement: DisputeStatement::Invalid(InvalidDisputeStatementKind::Explicit), candidate_hash: 0xc4ddebd193365d558f8a0a780148e7a9dbe4d9d9e621189921684a52b8f832ce, validator_public: Public(40555bce2795d72b23da3a6ceea185f4fbd1d5def9045285f8fc1ab3272cd373 (E2fwRm5w...)), validator_signature: Signature(c221ebb12542dfde95a5ffcd21297adb842a586da42b5fddfa07b0c61dc4285e95a24ec37041ebbfc5aabbe27a19e009a36f0facadbfa786ff7e83001a8c178b), session_index: 36964 }, ValidatorIndex(108))], pending_confirmation: Some(Sender { complete: false }) }
Feb 15 22:36:13 ksm1 kusama[1073720]: 2024-02-15 22:36:13 Checking our own signature failed - db corruption?

And there's also a lot of banning going on (but maybe not successful):

Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWQfeHmNoXUnYAjy2L85dkVULBxPLD3PvvYgYM3baK8bAo: -2050470000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWRoW1WYV5D8hxuRPDgET9dUwQid4Hh6Xngc3SPsVXmmx7: -2146763648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWKYei51Se2PpJUsKRVDa59Qm1bYkgyFJdj9zb8F4Ho34D: -631665000 to -1775277775. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWSjdEB8XWHKMbCB6L5NV7Ek8pt2myy5ff1kJsmwd2j5Lm: -2146803648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWDnsrquBgr7q1X4Soo9mS5z6xDoqCkudqX4d3y289EeKZ: -2146783648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWExeGvk1NR6QmL9wEDRv6PNuhGvvseBRBuPmrW4HhvrBJ: -1729840000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWJuzHZgJFQqNQVaMQ3gQH1DtYTerqmZeV7tCwbvcYNZBB: -2146783648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWEBws94EEnx67nj3S6vf59cGQb3uw6xZHyYVRL8AGPhqd: -2000885000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWG5q9eBq9qATLUuuHpoQoeSf1UdvKkqTN4ZPku9s7cPu4: -1262670000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWCRaMkKD2F57oHGXPwX3uTcsU2dDq7AkhqNbNUWQtjrUz: -825830000 to -1983853669. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWQry3bKuDXfT86GoxZCsPrDywYnHazMST5j13iWBHrztP: -2146783648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWT2dRyBzDb2yBq6hmmsuFFAW7PTTUJcw3PiRPvA7d5hgm: -1159335000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWRjoe4poRBkBtuJ5JJGYanS7PWTfPc8sN4zvQ3gFEfSP6: -2147473648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWP2rRjGvHU5wGhxG9y3uLvtpRZTKADDMmbxYVZXzCZqS2: -2146773648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWLD5pJgXngnYmsyXgEgDr86EnEuzUgVZkNxfGT9ikE267: -1930450000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWJp8MZawmNtMLBurPTMvo4uLCECLno4rDBnVrsWbbQ4zv: -2147463648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWH3V1WwM6xaUcpghrF1rDx92vnTX5UshuuZrDjiBhS7kQ: -946535000 to -1806616166. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWPYw2tYP8gpKB9p55Q5D2tAkqmfoEYGseBNacJ8Xoq7FW: -1949260000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWCRTyCoZjarVFPoK6iTqXRRMj6w82nHEiJGajrH8kVc1q: -1478450000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWRLqQTHEr2y7LiYW9EEw7pnPXfkr8fKennB7T3UuhmWxq: -1979210000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWAbz8XJtnXRhBw3Y7FdzLh6pr7zawxpM5NComVCPfMC1o: -1885650000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWCC7QXRYcpku589sBfZqkWZifnLMrdwG9X8WwaVYTkN25: -1723130000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWSSJSmBecyWjraNNSoDtwU7uL8v6gYTZuwJhJHDGttTCf: -2050805000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWHN5SunMk7qM21WgBLtNMntvZsMqUTt5UT9WukFTuSQyt: -2146883648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWBUBJYk73xp37pMQi4mRFpxCGwja9d895JSQNVFSJMeNN: -1104795000 to -2113687465. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWGDFVXGyypQ62HX3fkqfTLkUqwQ2eJVYJaF2XU7iPWFdU: -1725310000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWQxKHvunt8gEJesA2D7LfSrkYTf4p3WpgakQDVzFM6K7T: -1860880000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWQduj1nQn5BynC1RQPyYhLuKysiJYzD6CEvtFgDmoWHhR: -2006620000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWLdpk2c34WePBMbnMxMCX2qimwvSFcVBehixWCBT5UdCA: -1536280000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWSGtkGCSEGPiyjK3b2Pt2g76ohYEhfuYpZBQCpC594JPJ: -2146823648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWSreuRN9vwLuXc2xfXqaW7nPPR4LQDR96oeNSwkaRP71F: -1908070000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWMzFHqBTkvNBwDkrsWmUQovDvQYqZtbUz53D7DTFhUJSm: -2146763648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWM7yn2nskJ1x75x83hcy81MEy4c2tfFLjUAnBSPThpKJe: -2146763648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWJeeExSxzZ6FsoKx6acee468xhCZnsukFmyXa5nHBSi6E: -2146763648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWDSXDeGFKg23GEsDVMTQnG2jjTz6XXNVHcbT82xcyaody: -1541420000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWPENq3uTuj1rNPZLiYirFhCJJfZbHNxLT5SPUQnLSZJkz: -1780495000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWE8aTUUTT54rj974KhMdMoeG766nxnxQaBKpV7P3HjgR7: -1100545000 to -1957126554. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWNuESg13bXicLXwpJQDD3fQ6F9HW1eKQ9eKv35d2diYz4: -1615615000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWJki7YscjWoN5G2er48bACYrPe8xwSQt11J69Vt2n2bEn: -961770000 to -2121218812. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWDB7snmo6VhhunYHTqN6GsivpMcBxLhxrdQwPA7p1ATV1: -2146863648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWFno1RMTDfp8iFxA2kjhR8tuds5oBNWczad38pzUc2Vt5: -753955000 to -1916383669. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWSbgpcJKPv4QGbGcJmeHHcJhc9ePAA85YRydKWNUC6oVz: -2146803648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWHkk7u2XDLMsJm4TTLMnuGavhq8z3waTWhvnmoD9usyXp: -2146763648 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.
Feb 15 23:01:58 ksm1 kusama[1073720]: 2024-02-15 23:01:58 Report 12D3KooWAZA8raRHpt2GBXYUXGLAtYbyj4W6TRtj2pVvSFrSsPAB: -2043060000 to -2147483648. Reason: Aggregated reputation change. Banned, disconnecting.

Steps to reproduce

(globally, included logs from active validator)

@SimonKraus SimonKraus added I10-unconfirmed Issue might be valid, but it's not yet known. I2-bug The node fails to follow expected behavior. labels Feb 15, 2024
@SimonKraus
Copy link
Author

Finalization started again but will leave this open for a resolution to be added later.

@Overkillus
Copy link
Contributor

Overkillus commented Feb 16, 2024

Draft fix is being worked on and tested: #3358
This is the patch for the core of the issue which should prevent further stalls.

Cause issue seems to be related to the introduction of off-chain disabling feature (Validator disabling in Dispute Participation) in the 1.6.0 release. This feature is a small part of the changes coming in the Overhauled Disabling Strategy.

Issue was dormant for a while and only surfaced now because it needed enough disputes to trigger it which simply did not happen for a while.

A proper post-mortem is being worked on but based on our current understanding there's a short summary below:

Change:

The introduction of off-chain disabling made it so nodes that witness a dispute concluding would locally mark the losers as disabled. When we receive dispute votes from a disabled node if those are the only votes stating invalid (for that dispute) we will record them but not participate ourselves. This stops the unnecessary escalation from disabled nodes until we get at least one non disabled vote stating invalid.

Disputes Status flow:
Active -> Confirmed -> ConcludedFor / ConcludedAgainst

Problem:

A validator got disabled. Same validator then attempted to start more disputes. Nodes correctly imported the votes and they did not participate so the dispute never got Confirmed (1/3 participation). The dispute got incorrectly marked as Active as it was imported, but the only votes seen were coming from disabled validators. Despite never getting Confirmed the disputes in their Active state were considered a potential risk by the chain selection logic (used by GRANDPA) which omitted any blocks including them.

Core of the problem is that those disputes were being marked as Active while they were actually non-threatening.

Fix:

#3358 correctly updates their status to non Active to prevent chain selection from seeing them as a threat.

It was a false alarm where something non-threatening was marked as a threat and caused a stall.

Note:

Issue is connected to #3314

github-merge-queue bot pushed a commit that referenced this issue Feb 17, 2024
- [x] test with zombienet-sdk
- [x] prdoc

Relevant Issues:
#3314 (connected to the
cause)
#3345 (solves)

---------

Co-authored-by: Kian Paimani <[email protected]>
eskimor pushed a commit that referenced this issue Feb 19, 2024
- [x] test with zombienet-sdk
- [x] prdoc

Relevant Issues:
#3314 (connected to the
cause)
#3345 (solves)

---------

Co-authored-by: Kian Paimani <[email protected]>
EgorPopelyaev pushed a commit that referenced this issue Feb 19, 2024
Backport for 1.7: #3358

Relevant Issues:
#3314 (connected to the
cause)
#3345 (solves)

---------

---------

Co-authored-by: ordian <[email protected]>
Co-authored-by: Kian Paimani <[email protected]>
Co-authored-by: eskimor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I2-bug The node fails to follow expected behavior. I10-unconfirmed Issue might be valid, but it's not yet known.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants