[Research] Multiple classes of disablement #2005

Overkillus · 2023-10-24T12:32:22Z

Still an open investigation.

The disabling strategy can affect 3 distinct systems:

Parachain consensus*
GRANDPA
Block Authoring (BA)

* Focusing on backing, approvals and disputes, i.e. all PVF-based activities.

Parachain consensus disabling will work as described in 784. As currently proposed, it does not affect GRANDPA at all and suggests leaving block authoring unaffected as well. So do we need a second type of disabling, or maybe even three? A single type would be preferable because it keeps the design simpler.

Preliminary research:
GRANDPA TODOs (some are trivial foundations):

What offences are there in GRANDPA?
If those offences are committed, what damage do they do? (How many resources do we waste, etc.?)
Can honest nodes commit those offences? (+ can we prevent it?)
Is slashing enough of a punishment? What happens when we don't disable in GRANDPA?
Is disabling an optimisation or a requirement?
What's the max number of disabled nodes in GRANDPA?

Block Authoring (BA) TODOs (some are trivial foundations):

What offences are native to BA?
If those offences are committed, what damage do they do? (How many resources do we waste, are there security concerns, etc.?)
Can honest nodes commit those offences? (+ can we prevent it?)
Is slashing enough of a punishment? What happens when we don't disable in Block Authoring?
Is disabling an optimisation or a requirement?
What's the max number of disabled nodes in BA?

Considerations:

honest nodes generally don't commit parachain consensus offences unless nondeterminism is exploited
if honest nodes CAN sometimes commit GRANDPA or BA offences, we should not disable in parachain consensus based on those
we possibly CAN disable in GRANDPA and/or BA if parachain consensus offences are committed
parachain consensus (backing, approvals, disputes) offences are pretty much always connected to incorrect PVF execution
GRANDPA and BA could technically function even if a node is incapable of executing a PVF
GRANDPA and BA also offer a diverse set of responsibilities and could support 2 extra classes of disablement on top of parachain consensus
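The considerations about which direction disablement may propagate can be read as a one-way rule. Below is a minimal Rust sketch of one hypothetical encoding; `Subsystem`, `OffenceOrigin` and `subsystems_to_disable` are illustrative names, not Polkadot SDK APIs, and the policy itself is still an open question here.

```rust
use std::collections::BTreeSet;

/// Subsystems whose participation a disablement could restrict (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub enum Subsystem {
    ParachainConsensus, // backing, approvals, disputes (PVF-based)
    Grandpa,
    BlockAuthoring,
}

/// Where an offence was committed (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum OffenceOrigin {
    ParachainConsensus,
    Grandpa,
    BlockAuthoring,
}

/// One possible policy: parachain consensus offences may propagate to
/// GRANDPA and BA, while GRANDPA/BA offences (which a misconfigured but
/// otherwise honest node might commit) never disable a validator in
/// parachain consensus. Cross-propagation between GRANDPA and BA is left out.
pub fn subsystems_to_disable(origin: OffenceOrigin) -> BTreeSet<Subsystem> {
    match origin {
        OffenceOrigin::ParachainConsensus => BTreeSet::from([
            Subsystem::ParachainConsensus,
            Subsystem::Grandpa,
            Subsystem::BlockAuthoring,
        ]),
        OffenceOrigin::Grandpa => BTreeSet::from([Subsystem::Grandpa]),
        OffenceOrigin::BlockAuthoring => BTreeSet::from([Subsystem::BlockAuthoring]),
    }
}
```

The asymmetry is the point of the sketch: offences that honest nodes are unlikely to commit may widen the disablement, but the reverse direction never reaches parachain consensus.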

Naive exploration - Single class approach TODOs:

Edge case check: what happens when 1/3rd of nodes are disabled in parachain consensus and it propagates to GRANDPA?
Edge case check: what happens when 1/3rd of nodes are disabled in parachain consensus and it propagates to BA?
Edge case check: what happens if there is one rogue node? Does propagating the disablement status improve the situation?
Nondeterminism can be exploited to potentially disable honest nodes through parachain consensus - how does it affect BA and GRANDPA if it propagates?
If honest nodes sometimes commit GRANDPA/BA offences how does it affect parachain consensus?

AlistairStewart · 2023-10-26T09:25:00Z

Let me answer the GRANDPA parts first.

Preliminary research: GRANDPA TODOs (some are trivial foundations):
* [ ]  What offences are there in GRANDPA?

At the moment, only equivocation, i.e. voting twice in one round. The other things we'd want to slash for are more complicated because they require a challenge-response protocol (see section 4.1 of https://github.com/w3f/consensus/blob/master/pdf/grandpa.pdf ).
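As a rough illustration of what "voting twice in one round" means operationally, here is a minimal detector sketch in Rust. It assumes each vote is identified by round, stage (GRANDPA has prevotes and precommits), voter index and target hash, with identifiers simplified to integers; real GRANDPA votes are signed messages, and all names here are hypothetical.

```rust
use std::collections::HashMap;

/// The two voting stages of a GRANDPA round.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Stage {
    Prevote,
    Precommit,
}

// Hypothetical, simplified identifiers.
pub type Round = u64;
pub type VoterId = u32;
pub type BlockHash = u64;

#[derive(Default)]
pub struct EquivocationDetector {
    // First vote seen for each (round, stage, voter).
    seen: HashMap<(Round, Stage, VoterId), BlockHash>,
}

impl EquivocationDetector {
    /// Records a vote and returns `Some((first, second))` if it conflicts with
    /// an earlier vote by the same voter in the same round and stage.
    /// Repeating an identical vote is not an equivocation.
    pub fn observe(
        &mut self,
        round: Round,
        stage: Stage,
        voter: VoterId,
        target: BlockHash,
    ) -> Option<(BlockHash, BlockHash)> {
        let first = *self.seen.entry((round, stage, voter)).or_insert(target);
        if first != target {
            Some((first, target))
        } else {
            None
        }
    }
}
```

Because identical votes are not flagged, two nodes sharing one identity only equivocate in the rounds where they actually vote differently.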

* [ ]  If those offences are committed, what damage do they do? (How many resources do we waste, etc.?)

For a few offences, we waste no resources whatsoever and performance is unaffected, maybe even slightly better. When over 1/3 of validators equivocate, we can finalise two different blocks. This is pretty bad, and we would probably have to hardcode some exception into future clients when it happens.

If finality does fork like this, we should ignore GRANDPA for the current validator set. Having 1/3 of the validator set in a session not show up is in theory less bad, but we still haven't finished adding the special cases that would let us survive it, so in practice it may currently be as bad or worse.
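The "over 1/3" figure follows from quorum intersection. A minimal sketch, assuming equal voting weights and a finality threshold of t = n - floor((n - 1) / 3) votes (the usual BFT supermajority; treat the exact formula as an assumption rather than a quote of the client code): two conflicting blocks can each gather t votes only if the two voter sets overlap in at least 2t - n validators, and every validator in that overlap must have equivocated.

```rust
/// Maximum number of Byzantine voters tolerated with `n` equally weighted
/// voters: f = floor((n - 1) / 3).
pub fn max_faulty(n: u64) -> u64 {
    n.saturating_sub(1) / 3
}

/// Votes needed to finalise a block: t = n - f.
pub fn threshold(n: u64) -> u64 {
    n - max_faulty(n)
}

/// Whether `k` equivocators, with honest voters split across two forks,
/// suffice for two conflicting blocks to both reach the threshold.
/// Two voter sets of size t out of n share at least 2t - n members,
/// and each shared member voted for both blocks, i.e. equivocated.
pub fn can_finalise_conflicting(n: u64, k: u64) -> bool {
    k >= 2 * threshold(n) - n
}
```

For n = 301 this gives 101 equivocators, just over a third; for other set sizes, rounding can push the exact number slightly higher.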

* [ ]  Can honest nodes commit those offences? (+ can we prevent it?)

No, with one exception that is not quite honest but has occurred multiple times: a validator running multiple nodes with the same identity will equivocate.

In principle there is a key-security strategy to prevent it: we would keep some keys only in memory, so this could only happen if someone somehow cloned a running VM, rather than simply by copying the hard drive to set up a new node. The issue is that we might then have to deal with liveness: if a bug requires all validators to restart, would this halt GRANDPA? So there is no easy fix on the horizon.

* [ ]  Is slashing enough of a punishment? What happens when we don't disable in GRANDPA?

The slash for a single offence is actually quite small. It grows the more validators equivocate at the same time, because a single offence is no threat.
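To make "grows the more validators equivocate" concrete: Substrate's GRANDPA equivocation offence has used a quadratic rule, slash fraction = min((3k / n)^2, 1) for k concurrent offenders out of n validators. The sketch below assumes that rule and uses plain `f64` instead of `Perbill`; check the current pallet code before relying on the exact formula.

```rust
/// Slash fraction for `k` concurrent offenders out of `n` validators,
/// assuming the quadratic rule min((3k / n)^2, 1).
pub fn slash_fraction(k: u32, n: u32) -> f64 {
    if n == 0 {
        return 0.0; // no validator set, nothing to scale against
    }
    // Clamp before squaring, which is equivalent to min(x^2, 1) for x >= 0.
    let x = (3.0 * f64::from(k) / f64::from(n)).min(1.0);
    x * x
}
```

With 300 validators, a lone equivocator loses about 0.01% of its stake, while 100 simultaneous equivocators (a third of the set) are slashed in full.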

If a validator runs two nodes as above and is not disabled for a GRANDPA equivocation, it will carry on equivocating in GRANDPA and also in BA. A single equivocation in BA does affect our performance and has a bigger slash.

In GRANDPA itself, there is no problem allowing an isolated equivocating node to carry on voting. In the two-node situation, in many rounds both nodes will vote for the same thing, which is not an equivocation and contributes usefully. When they do equivocate, there is no harm.

Also, slashing in GRANDPA right now, and any future disablement, has no effect on the validators for the current session; it only means that they will not be included in the authority set for the next session. Light clients of GRANDPA, including those used for bridges and warp sync, might go out of consensus if that were not the case.
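The session-boundary rule can be sketched as follows: offenders reported during a session are queued, the authority set that light clients track for that session is left untouched, and the queued offenders are only excluded when the next session's set is built. `SessionAuthorities`, `report_offender` and `rotate` are hypothetical names, not Substrate APIs.

```rust
use std::collections::BTreeSet;

pub type ValidatorId = u32;

/// Hypothetical model of a GRANDPA authority set that is fixed per session.
pub struct SessionAuthorities {
    pub session: u32,
    /// Authorities for the current session; never modified mid-session,
    /// so light clients can keep verifying justifications against it.
    pub current: BTreeSet<ValidatorId>,
    /// Offenders reported this session; excluded from the next set only.
    pending_exclusions: BTreeSet<ValidatorId>,
}

impl SessionAuthorities {
    pub fn new(session: u32, current: BTreeSet<ValidatorId>) -> Self {
        Self { session, current, pending_exclusions: BTreeSet::new() }
    }

    /// Record an offender; this has no effect on `current`.
    pub fn report_offender(&mut self, who: ValidatorId) {
        self.pending_exclusions.insert(who);
    }

    /// Move to the next session, building its authority set from the
    /// candidate validators minus this session's offenders.
    pub fn rotate(&mut self, candidates: &BTreeSet<ValidatorId>) {
        self.current = candidates
            .difference(&self.pending_exclusions)
            .copied()
            .collect();
        self.pending_exclusions.clear();
        self.session += 1;
    }
}
```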

But equivocating in GRANDPA should cause disablement in BA as soon as possible. We should also make sure that continued equivocation does not lead to multiple slashes.
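Both points can be sketched together: a GRANDPA equivocation immediately marks the validator as disabled for block authoring, and a per-session record ensures that repeated equivocations by the same validator in the same session produce a single slash. `OffenceTracker` and its methods are hypothetical and do not model the real offences or staking pallets.

```rust
use std::collections::HashSet;

pub type ValidatorId = u32;
pub type SessionIndex = u32;

#[derive(Default)]
pub struct OffenceTracker {
    /// (session, validator) pairs that have already been slashed.
    slashed: HashSet<(SessionIndex, ValidatorId)>,
    /// Validators currently disabled in block authoring.
    ba_disabled: HashSet<ValidatorId>,
}

impl OffenceTracker {
    /// Handle a GRANDPA equivocation: disable the validator in BA right away
    /// and return `true` only for the first slash in this session.
    pub fn on_grandpa_equivocation(&mut self, session: SessionIndex, who: ValidatorId) -> bool {
        self.ba_disabled.insert(who);
        // `HashSet::insert` returns false if the pair was already present.
        self.slashed.insert((session, who))
    }

    pub fn is_ba_disabled(&self, who: ValidatorId) -> bool {
        self.ba_disabled.contains(&who)
    }
}
```

Keying the record on the session is an arbitrary choice for the sketch; the right deduplication window is a design decision.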

* [ ]  Is disabling an optimisation or a requirement?

It's not even an optimisation, let alone a requirement, for GRANDPA itself, but it does interact with BA.

* [ ]  What's the max number of disabled nodes in GRANDPA?

Because disabling does not and should not affect the current session, the obvious issue of taking out 1/3 of validators and stalling finality is a non-issue. So we could go down to 4 nodes, as long as it is at all reasonable that we maintain 2/3 honesty. But then, if GRANDPA doesn't fork, what reason would an attacker have to equivocate? If we only take out honest nodes, we shouldn't be disabling at all, because it reduces our tolerance for bad guys.
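The point about taking out honest nodes can be checked with the standard BFT bound f = floor((n - 1) / 3), assumed here with equal weights. Removing honest validators from the next set shrinks n without shrinking the number of attackers, so a set that was exactly at its tolerance limit becomes unsafe. The function names are illustrative.

```rust
/// Maximum number of Byzantine validators a set of `n` equally weighted
/// validators can tolerate while keeping GRANDPA safe: floor((n - 1) / 3).
pub fn max_faulty(n: u64) -> u64 {
    n.saturating_sub(1) / 3
}

/// Whether a set of `n` validators containing `byzantine` attackers is safe.
pub fn is_safe(n: u64, byzantine: u64) -> bool {
    byzantine <= max_faulty(n)
}

/// Safety of the next set after disabling `honest_removed` honest validators;
/// the attackers are all still present.
pub fn safe_after_removing_honest(n: u64, byzantine: u64, honest_removed: u64) -> bool {
    is_safe(n.saturating_sub(honest_removed), byzantine)
}
```

With 10 validators and 3 attackers the set is exactly at its limit; disabling a single honest validator drops the tolerance to 2 and breaks safety. With 4 validators the set still tolerates one attacker, which is why shrinking the set is acceptable in principle as long as 2/3 honesty is plausible.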

If it did finalise different forks, we should disable all equivocators for the next session, even if they make up over 1/3 of all validators.

Naive exploration - Single class approach TODOs:

* [ ]  Edge case check: what happens when 1/3rd of nodes are disabled in parachain consensus and it propagates to GRANDPA?

GRANDPA disablement should not take effect in the current session. As long as that holds true, this might not be so bad.

* [ ]  If honest nodes sometimes commit GRANDPA/BA offences how does it affect parachain consensus?

In GRANDPA's case, not at all. But BA equivocation is a headache.

eskimor · 2023-11-07T17:32:46Z

How is this research coming along? Does it threaten our goal to have disabling done by EoY?

eskimor · 2023-11-30T15:39:49Z

All good. This seems to be resolved, one class is fine.

Overkillus added the I10-unconfirmed and T8-polkadot labels Oct 24, 2023

Overkillus self-assigned this Oct 24, 2023

Overkillus added this to parachains team board Oct 24, 2023

Overkillus moved this to Backlog in parachains team board Oct 24, 2023

Overkillus moved this from Backlog to In Progress in parachains team board Oct 24, 2023

eskimor closed this as completed Nov 30, 2023

github-project-automation bot moved this from In Progress to Completed in parachains team board Nov 30, 2023

Overkillus removed the I10-unconfirmed label Dec 15, 2023
