Approval Distribution Subsystem #1951

rphmeier · 2020-11-13T21:21:32Z

Closes #1603

roadmap/implementers-guide/src/node/approval/approval-distribution.md

roadmap/implementers-guide/src/types/overseer-protocol.md

…tion.md Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

…h/polkadot into rh-approval-networking

roadmap/implementers-guide/src/types/overseer-protocol.md

roadmap/implementers-guide/src/node/approval/approval-distribution.md

roadmap/implementers-guide/src/types/network.md

roadmap/implementers-guide/src/types/overseer-protocol.md

roadmap/implementers-guide/src/node/approval/approval-distribution.md

roadmap/implementers-guide/src/runtime/approvals_inherent.md

roadmap/implementers-guide/src/types/network.md

burdges · 2020-11-22T19:25:56Z

roadmap/implementers-guide/src/node/approval/approval-distribution.md

+  1. Is a particular assignment relevant under a given `View`?
+  2. Is a particular approval relevant to any assignment in a set?
+
+These two queries need not be perfect, but they must never yield false positives. For our own local view, they must not yield false negatives. When applied to our peers' views, it is acceptable for them to yield false negatives. The reason for that is that our peers' views may be beyond ours, and we are not capable of fully evaluating them. Once we have caught up, we can check again for false negatives to continue distributing.


"it is acceptable for our conclusions to yield false negatives"

roadmap/implementers-guide/src/node/approval/approval-distribution.md

burdges · 2020-11-22T20:50:52Z

roadmap/implementers-guide/src/node/approval/approval-distribution.md

+    * If the message fingerprint appears under the `BlockEntry`'s `Knowledge`, give the peer a small positive reputation boost and return. Note that we must do this after checking for out-of-view to avoid being spammed. If we did this check earlier, a peer could provide data out-of-view repeatedly and be rewarded for it.
+    * Dispatch `ApprovalVotingMessage::CheckAndImportAssignment(assignment)` and wait for the response.
+    * If the result is `AssignmentCheckResult::Accepted(candidate_index)` or `AssignmentCheckResult::AcceptedDuplicate(candidate_index)` 
+      * If `candidate_index` does not match `claimed_candidate_index`, punish the peer's reputation, recompute the fingerprint, and re-do our knowledge checks. The goal here is to accept messages which peers send us that are labeled wrongly, but punish them for it as they've made us do extra work.


"labeled wrongly"? Are these indexes the vacated cores number? If so, we could always include the index inside the assignment, even when doing a modulo VRF criteria for tranche zero, so it could be read before proceeding, which makes the assignment outright invalid if the index is wrong. Is that helpful?

Actually I'm unsure where claimed_candidate_index originates. I suppose the peer sends us data it believes important, but why does it need to tell us why the data matters? We either agree or not, yes?

Yeah, I think including that data in the VRF would make this point redundant. At the moment, the vacated core index isn't part of the VRF, and we have peers send us the vacated core index alongside the assignment which makes it easier for us to do some politeness checks.

So the conundrum this is solving is what we should do if a peer sends us something labeled with the wrong core index but that is still a valid assignment for some other core. And my answer was to re-label it, punish the peer, and distribute.

Your general advice so far has been to keep as much data as possible out of the VRF, so it didn't cross my mind that was an option.

Our VRF signatures have an input that corresponds to the output, which yes we minimize, but the same VRF also signs an extra auxiliary message, that influences only it proof, not its output.

We can pack anything we like into this extra auxiliary message, like whatever claims simplify the code, or even the entire gossip message containing the VRF (IndirectAssignmentCert?). In fact, including the whole gossip message saves a signature that costs at least 45ms and 64 bytes per assignment message, so I slanted the draft code I wrote in this direction.

I see. So it is possible to include the core index under the proof, even if that core index was unknown prior to the output being generated? That does indeed save us a ton of trouble. What's the schnorrkel API for that?

burdges · 2020-11-22T21:01:53Z

roadmap/implementers-guide/src/node/approval/approval-distribution.md

+  * Dispatch a `ApprovalDistributionV1Message::Approval(approval)` to all peers in the `BlockEntry`'s `known_by` set, excluding the peer in the `source`, if `source` has kind `MessageSource::Peer`. Add the fingerprint of the assignment to the knowledge of each peer. Note that this obeys the politeness conditions:
+    * We guarantee elsewhere that all peers within `known_by` are aware of all assignments relative to the block.
+    * We've checked that this specific approval has a corresponding assignment within the `BlockEntry`.
+    * Thus, all peers are aware of the assignment or have a message to them in-flight which will make them so. 


All this sounds fine, but it's quadratic so another reason candidates should be rejected long before "escalation by DoS".

It breaks gossip assumptions if one compares fingerprints between peers using publicly keyed bloom filters, but if each pair of peers computes their own shared secret bloom filter key, then they could compare using bloom filters. Yet another optimization I guess.

burdges · 2020-11-22T22:42:19Z

roadmap/implementers-guide/src/node/approval/approval-voting.md

-    * Broadcast on network with an `ApprovalNetworkingMessage::DistributeAssignment`.
-    * Kick off approval work with `launch_approval`
+    * Broadcast on network with an `ApprovalDistributionMessage::DistributeAssignment`.
+    * Kick off approval work with `launch_approval`. Note that if the candidate appears in multiple current blocks, we will launch approval for each block it appears in. It may make sense to shortcut around this with caching either at this level or on the level of the other subsystems invoked by that function.


Yes, these approval checkers being different is an unfortunate consequence of using the relay VRF modulo criteria for tranche zero.

If the relay chain largely avoids forks then the relay VRF modulo criteria ensures all validators get the same number of tranche zero assignments, which improves tranche zero assignments in several respects:

We avoid a choice between either overworking nodes so much or else giving them more arbitrary choices about what they check.

We estimate the number of checkers with a binimial distribution, which gives a variance n p q which looks less than the variance obtained from the more complex distributions resulting from other strategies.

We could support validators of different computing capacities if we force validators to take assignments with sample numbers 1,..,k where k cannot exceed by too much their previous average number of samples completed on the same inclusion batch. In this way, we pay more for validators with more CPU but permit others to participate. This might not be a good thing, but nice option.

All that sounds fine, but the point here is actually about the fact that we literally re-execute the candidate even if we've just validated it and we should have some LRU cache somewhere so that being "re-assigned" to the same candidate across multiple forks is ultra cheap.

I missed interpreted and read this in terms of across validators, thanks!

burdges · 2020-11-22T23:49:46Z

LGTM :)

burdges · 2020-11-25T03:40:13Z

roadmap/implementers-guide/src/node/approval/approval-voting.md

@@ -153,7 +153,7 @@ On receiving an `OverseerSignal::ActiveLeavesUpdate(update)`:
  * Ensure that the `CandidateEntry` contains a `block_assignments` entry for the block, with the correct backing group set.
  * If a validator in this session, compute and assign `our_assignment` for the `block_assignments`
    * Only if not a member of the backing group.
-    * Run `RelayVRFModulo` and `RelayVRFDelay` according to the [the approvals protocol section](../../protocol-approval.md#assignment-criteria)
+    * Run `RelayVRFModulo` and `RelayVRFDelay` according to the [the approvals protocol section](../../protocol-approval.md#assignment-criteria). Ensure that the assigned core derived from the output is covered by the auxiliary signature aggregated in the `VRFPRoof`.


Seems fine. The core is always either and input or an output already. If you want it to always be an input then we can just include it in the message and check it from the output.

Can we feasibly include the core as an input to the VRF, given that it's derived from the output? That seems cyclical, but maybe through the power of cryptography...

burdges · 2020-11-25T03:43:02Z

roadmap/implementers-guide/src/node/approval/approval-voting.md

@@ -166,6 +166,7 @@ On receiving a `ApprovalVotingMessage::CheckAndImportAssignment` message, we che
  * Check the assignment cert
    * If the cert kind is `RelayVRFModulo`, then the certificate is valid as long as `sample < session_info.relay_vrf_samples` and the VRF is valid for the validator's key with the input `block_entry.relay_vrf_story ++ sample.encode()` as described with [the approvals protocol section](../../protocol-approval.md#assignment-criteria). We set `core_index = vrf.make_bytes().to_u32() % session_info.n_cores`. If the `BlockEntry` causes inclusion of a candidate at `core_index`, then this is a valid assignment for the candidate at `core_index` and has delay tranche 0. Otherwise, it can be ignored.
    * If the cert kind is `RelayVRFDelay`, then we check if the VRF is valid for the validator's key with the input `block_entry.relay_vrf_story ++ cert.core_index.encode()` as described in [the approvals protocol section](../../protocol-approval.md#assignment-criteria). The cert can be ignored if the block did not cause inclusion of a candidate on that core index. Otherwise, this is a valid assignment for the included candidate. The delay tranche for the assignment is determined by reducing `(vrf.make_bytes().to_u64() % (session_info.n_delay_tranches + session_info.zeroth_delay_tranche_width)).saturating_sub(session_info.zeroth_delay_tranche_width)`.
+    * We also check that the core index derived by the output is covered by the `VRFProof` by means of an auxiliary signature.


The auxiliary message checked by the proof is just like a signature, so it gives us the ability to include stuff that is not in the VRF input. The core is either an input or output of the given criteria already, but we can include it in the auxiliary message and check that it is correct if that simplifies some code somewhere.

* master: Some code cleanup in overseer (#2008) PoV Distribution optimization (#1990) Approval Distribution Subsystem (#1951) Session management for approval voting (#1973) Do not send messages twice in bitfield distribution (#2005)

rphmeier added 2 commits November 13, 2020 16:17

skeleton flow control

fa94317

tweaks & rename to approvals distribution

5fd754a

rphmeier added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. labels Nov 13, 2020

coriolinus reviewed Nov 16, 2020

View reviewed changes

rphmeier and others added 14 commits November 16, 2020 10:21

Update roadmap/implementers-guide/src/node/approval/approval-distribu…

9f60c32

…tion.md Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

Update roadmap/implementers-guide/src/node/approval/approval-distribu…

755ea5f

…tion.md Co-authored-by: Peter Goodspeed-Niklaus <[email protected]>

add a NewBlocks message and dispatch

ae542b9

new data format for approval distribution

328e834

guide: update view to include finalized block number

febd83a

Merge branch 'rh-approval-networking' of https://github.com/paritytec…

216a65a

…h/polkadot into rh-approval-networking

approvals: document view updating

33922dc

pruning when peers disconnect

3492743

add remaining message types

3f1b3db

fix link

eba1d50

network message type

4db5ec9

handle incoming assignments

acf164f

import_and_circulate_approval

ee28e90

handle new blocks

739bc98

rphmeier added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Nov 18, 2020

rphmeier marked this pull request as ready for review November 18, 2020 02:08

coriolinus reviewed Nov 18, 2020

View reviewed changes

address review comments

ceb50ae

coriolinus approved these changes Nov 19, 2020

View reviewed changes

burdges reviewed Nov 22, 2020

View reviewed changes

roadmap/implementers-guide/src/node/approval/approval-distribution.md Show resolved Hide resolved

burdges reviewed Nov 22, 2020

View reviewed changes

rphmeier added 2 commits November 24, 2020 20:51

address review comments and use nifty VRFProof

4fcd2fe

Merge branch 'master' into rh-approval-networking

a6c8356

burdges reviewed Nov 25, 2020

View reviewed changes

rphmeier merged commit 029c8a2 into master Nov 25, 2020

rphmeier deleted the rh-approval-networking branch November 25, 2020 04:40

rphmeier mentioned this pull request Sep 26, 2022

Member Request polkadot-fellows/seeding#4

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Approval Distribution Subsystem #1951

Approval Distribution Subsystem #1951

rphmeier commented Nov 13, 2020

burdges Nov 22, 2020

burdges Nov 22, 2020

burdges Nov 22, 2020

rphmeier Nov 22, 2020

burdges Nov 22, 2020 •

edited

Loading

rphmeier Nov 25, 2020 •

edited

Loading

burdges Nov 22, 2020

burdges Nov 22, 2020

rphmeier Nov 22, 2020

burdges Nov 22, 2020

burdges commented Nov 22, 2020

burdges Nov 25, 2020

rphmeier Nov 25, 2020

burdges Nov 25, 2020

Approval Distribution Subsystem #1951

Approval Distribution Subsystem #1951

Conversation

rphmeier commented Nov 13, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

burdges Nov 22, 2020 • edited Loading

Choose a reason for hiding this comment

rphmeier Nov 25, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

burdges commented Nov 22, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

burdges Nov 22, 2020 •

edited

Loading

rphmeier Nov 25, 2020 •

edited

Loading