Update statement-distribution for asynchronous backing #5055
Comments
I've been reflecting on how this would be implemented in practice with a few questions:
The architecture that I'm coming to as a consequence is that (1) should be candidate-backing, because prospective-parachains has to be informed when candidates are first seconded and when they're backed, and candidate-backing is the only place where backing votes are counted.

Broadly, statement-distribution needs to be able to ask two kinds of questions. The first is asked when we receive a new statement; the second, when cleaning up old ones. The distinction between the former and the latter is that the former deals with hypothetical statements (potential spam), whereas the latter is garbage collection (potentially outdated), where all statements should already have been imported into the prospective-parachains subsystem. However, without a change in how we pass statements to candidate-backing, there's the distinct possibility that they have not been imported. That is, if we just hand statements over to candidate-backing, it's possible for us to receive a new statement depending on one that hasn't been imported yet. The workaround is to attach a response sender to the messages we hand over, so we know when import has completed (see the sketch below).

One last point: while it's technically possible that importing a candidate A into a fragment tree could enable a previously-known candidate B to be imported as its child, this is a real edge case and we can ignore it in statement-distribution without major consequences. It's a byproduct of the fact that candidates express their parent as head-data, and not as a candidate hash, which means that we can encounter multiple-inheritance problems: with A -> B -> C, receiving a B' with the same head-data as B gives us A -> B' -> C as well in the same tree. This is possible, for instance, if B and B' had different relay-parents (and the PVF didn't reflect that change in initial conditions in the output head-data for the state transition; unreasonable, but maybe possible...). We'll be slightly less strict on spam prevention by ignoring this, but our memory usage is still bounded and the end result is only that crazy parachains may be less reliable.
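A minimal sketch of that workaround, assuming a oneshot channel attached to the handoff message (the names `ImportStatement` and `ImportOutcome` are illustrative, not the actual subsystem API):

```rust
use futures::channel::oneshot;

// Placeholder for the real signed-statement type.
struct SignedStatement;

/// Hypothetical handoff message from statement-distribution to
/// candidate-backing. The sender fires only after candidate-backing has
/// informed prospective-parachains, so statement-distribution can delay
/// processing dependent statements until import has completed.
struct ImportStatement {
    statement: SignedStatement,
    response_sender: oneshot::Sender<ImportOutcome>,
}

enum ImportOutcome {
    /// Imported into prospective-parachains; dependents may proceed.
    Imported,
    /// Rejected, e.g. no valid depths at any active leaf.
    Rejected,
}
```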
Here's an alternative version which doesn't care about equivocations (which are damn complicated to report to the chain anyway until we have sequence numbers in candidate receipts) but still handles spam and keeps bandwidth relatively low. It does this by having validators operate directly within their groups and only circulate backed candidates to the rest of the network.
I believe this captures what we discussed, with way more details of course. We should discuss when/if these possibly-spammed backed candidates get propagated. Imagine every validator only forwards at most one backed candidate per parachain. Maybe it worsens censorship problems for parachains?
I'd answer a slightly different question first: what happens if the grid relayers relay nothing at all to try to censor the parachain? We have two levels of redundancy that seem to alleviate this:
re: censorship, the way I look at it is that the adversary is trying to get some proportion of the parachain's candidates censored. For this analysis I'm assuming that the group is majority-honest, because otherwise censorship of a parachain is easy: just don't vote for anything. I'm also assuming that the group is seconding blocks at the expected rate. In the base case, where nobody except the backing group learns of the candidate, inclusion falls back on the group's own connections, so I'd estimate that the worst case of censorship, if attacking nodes just don't participate in the grid, is to reduce the likelihood of a block being authored containing a candidate from the parachain, not to eliminate it.

To answer the question you actually asked: I suspect that as long as grid relayers relay exactly 1 candidate per parachain at each position (which they might do as a strategy to reduce bandwidth), the entire network will see at least 1 thing that can be bundled in the next relay-chain block, even though they might not all see the same thing. That seems fine to me, and quite possibly a valid argument for implementing the code this way. If the parachain's collator-selection is working correctly, there should only be 1 valid thing at any point.
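As a back-of-envelope illustration of that estimate (my own toy model, not numbers from the protocol): if each recipient learns of a candidate through k independent grid relayers, each censoring with probability f, the candidate is withheld from that recipient only when all k censor:

```rust
fn main() {
    let f: f64 = 1.0 / 3.0; // assumed fraction of censoring validators
    for k in 1..=3 {
        // All k independent relayers must censor to withhold the candidate.
        let p_withheld = f.powi(k);
        println!("k = {k}: P(withheld from a recipient) = {p_withheld:.3}");
    }
    // With two grid paths (row + column), P is roughly 0.111: refusing to
    // relay degrades delivery, it doesn't reliably block the parachain.
}
```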
If 1/3 of validators wanted to censor a parachain, then they might behave worse by stopping approval checks for the parachain, which gives a separate soundness adversary much better odds of breaking soundness, like printing the parachain's governance token for themselves. In this I've broken the 2/3-honest assumption, because I made the soundness adversary separate from the "fuck them" adversary, so yes, we're still going to assume 2/3 honest in censorship worries.

As availability needs connections between all validators anyway, we're always free to vary the grid ephemerally, frequently, or even per message: the relay-parent hash, or the assignment VRF in assignments, determines our permutation of the validator set into the grid positions (see the sketch below). We then assume the censoring adversary is small, so we can ignore the risks of a few unlucky candidates needing to pass through censoring nodes. Or perhaps we vary the grid with each backing validator group rotation; maybe worth analyzing this claim first, of course. It's also possible validator group rotation already handles this just fine.

We should not, imho, vary the grid by candidate receipt, because the grid provides a nice spam-stopping function if all backing messages from a validator for a specific relay-parent hash take the same grid route.

We've discussed set reconciliation before, but we could do set reconciliation on at most one fully-backed candidate per backing validator, which could then run outside the grid or on yet another grid, maybe or maybe not ephemeral. Did you mean the set reconciliation protocols you know do not handle large enough sets?
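A minimal sketch of the "vary the grid per relay-parent" idea, assuming a deterministic shuffle seeded by the relay-parent hash (the use of rand/ChaCha here is illustrative; a real implementation would need an agreed-upon shuffle):

```rust
use rand::seq::SliceRandom;
use rand::SeedableRng;
use rand_chacha::ChaCha8Rng;

/// Permute validator indices into grid positions, seeded by the
/// relay-parent hash: every honest node derives the same grid, but the
/// grid changes whenever the relay-parent does, so a small censoring
/// adversary can't sit on a fixed route.
fn grid_permutation(relay_parent_hash: [u8; 32], n_validators: usize) -> Vec<usize> {
    let mut rng = ChaCha8Rng::from_seed(relay_parent_hash);
    let mut positions: Vec<usize> = (0..n_validators).collect();
    positions.shuffle(&mut rng);
    positions
}
```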
Yeah, there are maybe worse things that attacking validators could do to try to censor a parachain. I was limiting the scope of my comment above to only this protocol.
This probably helps somewhat in practice, although I think the effect would be a 'smoothing' of the censorship curve. Also agreed we shouldn't vary per candidate-hash, as that can be chosen by validators.
They can handle large sets, but not in a way that'd be bandwidth-efficient for statement distribution, as far as I could tell. We have to handle very large sets at very high speeds, and the approach of most set-reconciliation protocols for transactions, which first do a round of flooding and then a couple of rounds of set reconciliation, wasn't suitable here.
Alright, our toolbox should remain grid-like for now then: varying grids, analyzing grid dimension, analyzing non-grids with grid-like properties a la SlimFly, etc. Thanks.
re: paritytech/polkadot-introspector#91 One of the main challenges with these types of closed-topology gossip protocols is observability of network state by external nodes. We should build in some kind of pub/sub thing on top of this, where peers we wouldn't normally send statements to can subscribe to them.
@rphmeier I assume this is just similar to what I was describing there, but from an implementation perspective it will be done at the network-protocol level, right?
I like the observation that with async backing we no longer need to propagate everything to everybody immediately and instead have a two-phase process. The "tell what we have" in a compact form and then fetch if needed also sounds like the way to go. One thing I am not sure I understand is the following:
Keep alternative paths through the topology open? I assume this is because the sender would otherwise assume we are dead/malicious if we don't fetch? But still, how would this close a path through the topology? It would certainly prevent pointless re-sends on view change, though.
Yeah. Doing this at the network-protocol level opens up more doors for alternative implementations or for polkadot-introspector to not require knowing any node. I'd consider this an issue that's not urgent by any means but also quite interesting & useful work. |
I wasn't very clear in my wording there, but what it specifically refers to is the part of the protocol used to distribute additional votes to everyone, i.e. a candidate can be backed with 3 votes but might get 5 eventually. Some node might get two grid peers advertising the same candidate but only fetch it from one of them; the extra statements still need a path to flow along. I amended the language in the post to make things more clear.
Uh, got it. Didn't realize that we keep sending statements after we answered with a packet. So in summary, the protocol in phase 2 is:
On 4 ... only new statements? We actually don't know which ones the requester has, if it fetched them from another peer. To avoid censorship we kind of have to send all of them. Which should be fine, as the heavy load was handled via req/res; we only send compact statements here. What is not quite clear to me: when do we send those statements? What are the triggers? Also on view change, like the Inventory messages?
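Here's my reading of that flow as a hypothetical wire format (names are made up; the heavy payload travels over request/response, while the grid carries only compact messages):

```rust
type CandidateHash = [u8; 32];
type ValidatorIndex = u32;

/// Hypothetical compact messages gossiped along the grid.
enum GridMessage {
    /// Phase 1: advertise that a candidate is backed, with a bitfield
    /// of which group validators' statements we hold.
    Inventory {
        candidate_hash: CandidateHash,
        statement_bitfield: Vec<bool>,
    },
    /// Phase 2: compact statements sent after the initial fetch, e.g.
    /// the 4th and 5th votes on a candidate backed with 3.
    Statement {
        candidate_hash: CandidateHash,
        validator: ValidatorIndex,
    },
}

/// The full receipt and signed statements are fetched on demand.
struct CandidateRequest {
    candidate_hash: CandidateHash,
}
```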
This was implemented in #5999 |
If the runtime API version indicates that asynchronous backing is enabled, we'll need new logic in statement-distribution.
Goals
The statement-distribution subsystem has these high-level goals:
Recap
By statement, we mean something roughly like the following (a simplified sketch; the real type also carries a signing context and validator signatures):
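```rust
// Simplified: the real types live in the polkadot primitives.
struct CommittedCandidateReceipt; // full candidate data
type CandidateHash = [u8; 32];

enum Statement {
    /// "I propose this candidate for inclusion" - carries the receipt.
    Seconded(CommittedCandidateReceipt),
    /// "I have validated this already-seconded candidate."
    Valid(CandidateHash),
}
```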
Validators sign these statements about candidates, which in turn are gossiped over the network and can be included alongside the candidate on-chain.
Validators should only issue `Valid` statements for candidates that they've seen `Seconded`, and there are restrictions on how many `Seconded` statements can be produced by a validator, making `Seconded` statements the core means of preventing spam. In fact, the changes to semantics here as a result of #4963 are the main reason the logic of this subsystem needs to be changed.

Spam Protection
There are two kinds of spam that we are worried about.
Status Quo
Without asynchronous backing, parachain candidates were required to have the most recent relay-chain block as their relay-parent, which in turn implied that no two sequential parachain blocks could have the same relay-parent. Spam prevention was relatively simple: we required that each validator could second no more than 1 candidate per relay-chain block, and that `Valid` statements could only be sent to peers we were sure had the corresponding `Seconded` statement - either because they sent it to us, or we sent it to them. `Valid` statements not corresponding to a known `Seconded` block could be ignored, and the number of `Seconded` statements to consider was bounded by the number of validators.
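A minimal sketch of that per-peer rule, assuming we track which peers have seen `Seconded` for each candidate (hypothetical shape, not the actual implementation):

```rust
use std::collections::HashSet;

type CandidateHash = [u8; 32];

/// Tracks, per peer, the candidates for which a `Seconded` statement
/// has passed between us, in either direction.
#[derive(Default)]
struct PeerKnowledge {
    seconded_known: HashSet<CandidateHash>,
}

impl PeerKnowledge {
    /// Record that a `Seconded` statement was sent to or received from
    /// this peer.
    fn note_seconded(&mut self, candidate: CandidateHash) {
        self.seconded_known.insert(candidate);
    }

    /// A `Valid` statement may only be sent if the peer is known to
    /// hold the corresponding `Seconded` statement.
    fn can_send_valid(&self, candidate: &CandidateHash) -> bool {
        self.seconded_known.contains(candidate)
    }
}
```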
We exchange `View`s with our peers, which indicate our current active leaves. Our peers send us only candidates that have anything to do with these active leaves. Each active leaf is a blank slate.

Changes
With asynchronous backing, there's no longer any requirement for sequential parachain blocks to have different relay-parents. However, we have to rein in the number of possible candidates somehow, otherwise malicious validators or validator groups could produce an infinite number of valid parachain blocks and furthermore, an infinite number of valid prospective parachains of infinite length. These are the problems we need to avoid.
The model we introduce is based on "prospective parachains", where backing subsystems buffer a few parachain blocks off-chain. This is coordinated by the Prospective Parachains subsystem #4963. The buffers are actually trees with an enforced max depth. The prospective parachains subsystem is designed to work hand-in-hand with higher-level spam prevention to avoid the trees growing too large, and statement-distribution fulfills that role.
Each new active leaf is no longer guaranteed to be a blank slate, but instead will initially be populated by a `FragmentTree` from the set of already-known candidates. Peers keep track of which candidates they have sent and received messages for.

Candidates don't have a globally unique depth, and they don't even have a unique depth per active-leaf. It is legal, although unexpected, for head-data to cycle and for relay-parents to stay the same. That is, each fragment tree gives a set of depths for any candidate, which is usually either empty or has length 1, except for pathological parachains. This set of valid depths is defined as the depths in the tree at which the candidate appears.
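A sketch of the "set of valid depths" notion, assuming a naive tree walk (the real `FragmentTree` is considerably more involved):

```rust
type CandidateHash = [u8; 32];

struct TreeNode {
    candidate: CandidateHash,
    children: Vec<TreeNode>,
}

/// Collect every depth at which `candidate` appears in a fragment tree.
/// Usually empty or a single depth; cycling head-data can produce more.
fn valid_depths(root: &TreeNode, candidate: &CandidateHash) -> Vec<usize> {
    let mut depths = Vec::new();
    let mut stack = vec![(root, 0usize)];
    while let Some((node, depth)) = stack.pop() {
        if &node.candidate == candidate {
            depths.push(depth);
        }
        for child in &node.children {
            stack.push((child, depth + 1));
        }
    }
    depths
}
```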
Outdated first attempt (see #5055 (comment))
This is probably the trickiest conceptual bit, but here goes: we keep candidates and messages referencing them around until the union of the valid depths is empty at all active leaves. The valid depths for a candidate trend towards the empty set because the candidate is eventually either included or orphaned.
So here's how we use depths to prevent spam:

- We only accept `Valid` messages after becoming aware of `Seconded` messages for a candidate.
- We accept at most one `Seconded` message per validator per depth per active-leaf.
- `Seconded` messages which refer to candidates that would have an empty set of valid depths at all of our active leaves are considered spam.

We already know which active-leaves our peers are aware of through their views, and we're aware of which candidates we've received from/sent to our peers, which means that it's not too involved for us to build up a view of their own `FragmentTree`s, which we can use to send them things.

So, in practice, this 'depth' approach also means that validators have to choose a fork of the parachain to stick to: it sets an upper bound of n_validators * max_depth nodes in the tree, not n_validators^max_depth.
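A condensed sketch of those acceptance rules, per active leaf (hypothetical shape; the real bookkeeping also spans leaves and peers):

```rust
use std::collections::HashSet;

type CandidateHash = [u8; 32];
type ValidatorIndex = u32;
type Depth = usize;

/// Spam-prevention state for a single active leaf.
#[derive(Default)]
struct LeafSpamState {
    /// At most one `Seconded` per validator per depth.
    seconded_slots: HashSet<(ValidatorIndex, Depth)>,
    /// `Valid` is accepted only for candidates seen `Seconded`.
    known_seconded: HashSet<CandidateHash>,
}

impl LeafSpamState {
    fn accept_seconded(
        &mut self,
        validator: ValidatorIndex,
        candidate: CandidateHash,
        valid_depths: &[Depth],
    ) -> bool {
        // No valid depths at this leaf (and, in the full check, at all
        // other active leaves) means the message is spam.
        if valid_depths.is_empty() {
            return false;
        }
        // Reject if the validator already used a slot at any of these depths.
        if valid_depths.iter().any(|d| self.seconded_slots.contains(&(validator, *d))) {
            return false;
        }
        for d in valid_depths {
            self.seconded_slots.insert((validator, *d));
        }
        self.known_seconded.insert(candidate);
        true
    }

    fn accept_valid(&self, candidate: &CandidateHash) -> bool {
        self.known_seconded.contains(candidate)
    }
}
```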
Practicalities
Most of the issues in implementation will revolve around race conditions: