Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Torrent-style fetching for PoVs: high-level #968

Open
rphmeier opened this issue Jun 18, 2021 · 17 comments
Open

Torrent-style fetching for PoVs: high-level #968

rphmeier opened this issue Jun 18, 2021 · 17 comments
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.

Comments

@rphmeier
Copy link
Contributor

Our current Proof-of-Validity (PoV) distribution system for backing is quite simple: A validator fetches a full PoV from a collator, the the validator announces the PoV, and other validators in the group supply it in full.

PoV hashes and trie commitments

It is possible to change the semantics of the pov_hash in the candidate descriptor to refer to a merkle trie of the PoV data as opposed to the entire data. It needs to be done in a backwards compatible way, so we still support full PoVs. What we would allow is something like this:

const CHUNK_SIZE = 32KiB;
struct PoVTrieCommitment(root_hash, n_chunks);

The trie is computed as a mapping from indices i to the chunk hash.

This is the root hash of the merkle trie as well as the number of chunks contained in the trie. The rightmost chunk may have size less than 32KiB, so the upper bound on the size is predicted by n_chunks * CHUNK_SIZE. Storing the number of chunks is important, so validators can tell immediately from the trie root commitment whether the PoV is oversized.

For the pov_hash in the candidate descriptor, it is defined to either be hash(PoV) or hash(PoVTrieCommitment). This allows for backwards compatibility. Note that the erasure root in the candidate receipt is computed over an erasure-coding on the full PoV data, and will not change. However, it will become necessary for approval and dispute checkers to check that the pov_hash in the descriptor matches the trie commitment computed from the full PoV data. Backing checkers, having recovered the PoV from the trie, do not need to do this check.

As the old format of pov_hash = hash(pov) is still supported, validators checking the validity of a descriptor will have to check both pov_hash == hash(pov) || pov_hash == trie_commitment(pov). However, computing the commitment can be done in a single pass over the PoV data without any allocations, just by hashing the stream and ending every 32KiB, so it should be relatively cheap - instead of taking 1 hash pass over the PoV, we take 2, and this is not a bottleneck for approval checkers.

Pre-validation functions (PreVF)

Authenticating a collator's submitted block without downloading the entire PoV is impossible right now. To address this we use the notion of a Pre-validation function, or PreVF for short. PreVFs also have their own counterpart to the PoV - the Proof-of-Pre-Validity, or PrePoV for short. Definitionally, PoVs and PrePoVs are both just byte vectors, however they have different semantic meanings and can have different semantic length limits imposed. For example, while PoVs may be allowed to span multiple Megabytes, a PrePoV may be limited to just a few kilobytes.

This is a new function on the Parachain Validation code which has this approximate signature

fn pre_validate(
    parachain_parent_head_data,
    new_para_head_hash,
    relay_parent_number,
    relay_parent_state_root,
    PrePoV,
) -> Result<CollatorId, Bad>

We introduce the notion of reasonable uniqueness. For a PreVF to function effectively, its result must be reasonably unique. Signatures by a specific validator are reasonably unique so long as equivocations are punished ecnomically. Proofs of Work are reasonably unique if they are sufficiently difficult. And so on. It's allowed to occasionally have collisions in the PreVF, but the multiplicity should be low even if so.

With PreVFs, collators can send over (CandidateReceipt, PrePoV) pairs. The PrePoV can be used by validators to determine preliminary validity.

Note that we don't rely on PreVFs for security of any kind. A Parachain exposing a PreVF which is not plausibly unique only detracts from the liveness of the parachain, as it can only be used to force validators to waste bandwidth and fail to back any candidate for the parachain.

Torrenting PoVs

We introduce a new request type:

struct Request { pov_hash, index };
struct Response { merkle_proof, chunk_data };

We introduce a new gossip statement, to be used exclusively among backing validators:

IntentToRecoverSeconded(CandidateHash, PovTrieCommitment);
IntentToRecover(CandidateReceipt, PovTrieCommitment, PrePoV);

Stage 1: Torrenting among validator groups

Stage 1 can be implemented before PreVFs. At this point, a seconding validator still needs to download the full PoV from a collator and check it before seconding. However, after issuing the Seconded message, the validator can issue an IntentToRecover(seconded, trie_commitment) and the other validators in the group can also signal an intent to recover the data and will recover it from the seconding validator as well as each other.

We have a few pieces of information that can be used to determine the chunk range that validators initially download, to maximize initial coverage:

  1. Each validator's index in the backing group
  2. The amount of validators in the backing group
  3. The overall size of the PoV
  4. The fact that a majority opinion is needed within the backing group

Validators can choose the initial ranges of chunks they download based on this information, to achieve a desirable balance of redundancy and availability.

Stage 2: Torrenting before seconding

PreVFs will enable validators to recover and torrent a PoV among all validators in the group as well as potentially from the collator even before checking the candidate. There should be a limit of 2 or 3 recoveries at a time, and due to the condition of reasonable uniqueness we expect on PreVFs, this should not affect throughput.

A validator, having checked a PreVF, can distribute the PreVF to other validators in the group, and all begin torrenting chunks from each other as well as the initial collator. The same basic principles for choosing chunk ranges to fetch apply, with the addition of potential rules for collators to follow about which chunks to prioritize serving which validators.

@rphmeier
Copy link
Contributor Author

This might all be irrelevant if we just have backing groups of size 1 or 2. Groups of 3 or more are likely to benefit from this.

@Lldenaurois
Copy link

Lldenaurois commented Jun 19, 2021

UPDATE: After discussing with Rob, this currently falls short of the goal. I am going to double back and revisit later after reading further into the CandidateDescriptor logic.

Suggestion:

Instead of committing to the Merkle Root Hash of the PoV, it may be beneficial to some Elliptic Curve magic instead.

Reason: If we commit to the Root Hash alone, then we have the problem that we either need to verify all the merkle proofs for every chunk we receive or we need to download the entire PoV in order to verify that it matches the commitment (without actually needing to validate the entire PoV). One way to circumvent this would be to commit to the hash of every chunk of the PoV, but this allows an attacker to learn the hash of the chunks of the PoV without needing access to the PoV at all.

Goal: How can we transmit a single "Hash" to the validator such that they can verify that a certain chunk of the PoV is a "valid" part of the whole PoV, thereby ensuring that only people with access to the PoV can actually compute the Hash of a certain chunk. Not even the validator should really know the hash of a certain chunk until they actually download the chunks and verify that it is a member of the overarching set of commitments to the PoV.

One way to achieve this would be as follows (let's assume for the sake of discussion that we use BLS signatures):

  1. Chunk the PoV deterministically into chunks { C_1, C_2, ..., C_n } and subsequently hash each of these chunks to curve, i.e \mathcal{H} : { C_i } -> G, where G is some group, e.g. Elliptic Curve.

  2. When the collator advertises to the validator, they can sign each of these Hash points to arrive at \sigma : { C_i } -> G, where \sigma(C_i) = \matchal{H}(C_i)^sk for sk the collator's private key and every chunk index i. Note that \sigma( { C_i }_i ) = { \sigma(C_i) }_i.

  3. Instead of uploading { \sigma(C_i) }_i to the validator upon advertising to the validator (which would be equivalent to hashing each chunk), the collator should compute and subsequently upload:

    (i) H_j = [ \sum_i \mathcal{H}(C_i) ] - \mathcal{H}(C_j), the "complement" of the aggregate hash of chunks to chunk C_j
    (ii) \sigma(\sum_i \matchal{H}(C_i)), a signature on the aggregate Hash point

    Note: The collator does not upload the Hash point, \mathcal{H}(C) = \sum_i \mathcal{H}(C_i), only all of the complements. The validator does not know the value sum_i \matchal{H}(C_i). In theory it is possible to recompute the chunk of each hash from this information, but let's ignore that for now as I'm quite certain there's a simple fix for this.

  4. When torrenting, the validators will receive chunks from collators. What a collator will do is submit

    A. A signature (under their collator key) on the k-th chunk \sigma(C_k), k the index of the chunk they're uploading
    B. The index k (used by the validator to get the complementary hash-point)
    C. A signature on the complementary point \sigma(C'_k)

  5. The validator can now verify that the Hash is valid by downloading the chunk (which can be chosen arbitarily small) and subsequently

    A. Verify the signature on the chunk under the torrenting collator's public key, otherwise move on
    B. Parse k and grab the k-th complementary chunk and subsequently verify a signature on the complement, otherwise move on
    B. Compute the aggregate hash point: C = \matchal{H}(C_k) + C'_k
    C. Verify that the signature submitted by the original collator that declared the commitments is valid under C under that origin collator's public key, otherwise move on

This ensures that

(i). The chunk is a valid chunk of the PoV committed to by the original advertisement that got seconded
(ii). We needn't verify the chunk once we download it and needn't wait until the entire PoV is reconstructed to ensure it hashes to the right value

That is, we don't need to verify the chunks right away and needn't wait before we know whether a chunk is invalid if we don't verify the chunk itself.

If the collator were in addition able to prove that the commitments submitted in the advertisement that got seconded are commitments a PoV for which each Merkle proof is valid under some public root hash, we could "fix" the block at the seconding step without needing to have the PoV ready at that point in time. The PoV will only need to become available in the Approval Voting step (if I understand correctly).

However, due to the depth of a Blake2 circuit, proof generation is completely untenable. Moving to Pedersen Hashing for the PoV Merkle tree will make proof generation more tenable, but depending on the number of Merkle proofs, there may be a better proof system for proving knowledge of all or some leaves of a Merkle tree under a public root hash.

@Lldenaurois
Copy link

Lldenaurois commented Jun 19, 2021

@burdges Would it be possible to ask for a review of the protocol? I'm not sure it works... Need to think about it more. It may be necessary to use random weights to compute the composite hash point.

@burdges
Copy link

burdges commented Jun 19, 2021

Alright first..

Our erasure coding is systemic, meaning chunks 0..f+1 represent an exact copy of the block. We should add this optimization into the decoder @drskalman.

If you simply define the chunk to validator map by chunk_index = validator_index + parachain_id + slot_no % num_validators then over time chunks 0..f+1 land evenly among the validators. We then favor reconstruction from those specific chunks when possible. We also permit anyone who checked the block, aka backing checkers, to supply chunks other than their own, which again always come from 0..f+1.

We already have a chunks Merkle tree so this yields roughly bittorrent but degrades smoothly into full reconstruction.

We'd loose this clean layering if we ever discover some non-systemic codes with much better performance, but that's unlikely and several years away if ever.

@burdges
Copy link

burdges commented Jun 19, 2021

Second..

Your protocol is not secure because summing hashed-to-curve points does not act like an accumulator. H_fake = [ \sum_i \mathcal{H}(C_i) ] - \mathcal{H}(C_fake) falsely claims C_fake to be a valid share.

Just fyi, BLS signatures are crazy slow and should never be used without really careful consideration. The DLEQ proof used by schnorrkel's VRF gives some similar properties, but not aggregation. I already expose vrfs_merge functions in schnorrkel that merge multiple VRF pre-outputs in one DLEQ proof, but this requires a scalar multiplication per point, which costs like 1/2 the CPU that a signature costs, so slow relative to a Merkle proof.

We should already avoid senders doing signatures over they chunks they send because the chunk's Merkle proofs trace back to the candidate receipt on chain. Absolutely nothing you could do would be less CPU than this.

We likely use our radix 16 storage Merkle tree for computing the availability root thing. About every 6 months for the last 2 years, I complain to someone that radix 16 hashing needlessly bloats Merkle proofs in dense Merkle trees, by maybe a factor of 4x in the worst case. I always tell them to switch storage to binary hashing with a "fast forwarding hash function" and radix 16 caching. Invariably, they always explore the issue and come back to me saying that our storage Merkle trees are usually not dense, and so state migration is too much of a pain. They are wrong because some parachains will definitely use dense storage, but by this time I'm always too busy doing something else to push the issue. This is one of those dense Merkle tree situations where radix 16 hashing bloats our proofs of course.

In brief, my second suggestion is that Emeric or someone eventually fix our bloated Merkle proofs.

@rphmeier
Copy link
Contributor Author

rphmeier commented Jun 20, 2021

Ladi and I also had a longer discussion about the protocol and decided that we don't need anything other than regular merklization as I described in the original post.

We likely use our radix 16 storage Merkle tree for computing the availability root thing

Yes, can confirm. re: paritytech/polkadot#3317 we will be able to update the semantic meaning of the erasure-root and migrate to cheaper trie. Although the 16-trie branch proofs are still probably not that bad for 1000 validators.

They will also probably not be that bad for a trie mapping 64 indices to unique pov chunks of 32KB. (2MB PoV)

In brief, my second suggestion is that Emeric or someone eventually fix our bloated Merkle proofs.

Yes. This is orthogonal, though.

@burdges
Copy link

burdges commented Jun 20, 2021

Yes, cool. We could exploit our erasure coding being systemic to streamline fetching PoVs. It gives a clean failover from direct fetching to reconstruction.

@rphmeier
Copy link
Contributor Author

For us to make use of the systemic code effectively we'd have to change the validator indices who hold the systemic chunks more often than we do now, which is once per session. Maybe compute them in some balanced way based on relay-parent number.

@burdges
Copy link

burdges commented Jun 20, 2021

Yes, something like chunk_index = validator_index + parachain_id + relay_parent_slot_no % num_validators like I noted above.

@rphmeier
Copy link
Contributor Author

rphmeier commented Jun 21, 2021

The first two validator_index + parachain_id we can do easily in approval-checking, but the last part relay_parent_slot_no isn't as easily available especially for something like remote disputes. We could have the TrieRootCommitment commit to an offset, maybe, and have the relay chain check this when accepting the backed block.

@burdges
Copy link

burdges commented Jun 21, 2021

I miss-typed above % has high multiplicative precedence in programming, but I wrote it like mod which has precedence below addition in mathematics, oops.

I'd think relay_parent_slot_no spreads work more evenly faster, but other options exist, like distributing parachain_id % num_validators nicer, or maybe just multiplying parachain_id by some prime, maybe even near say 2^16.

At first blush I'd worry about doing (validator_index + parachain_id + TrieRootCommitment) % num_validators because then folks would bias TrieRootCommitment but any VRF output could be used too. At first I dislike using VRFs if not really helpful because someone might think the usage really mattered. Yet, I wanted to use a VRF in the tit-for-tat scheme, so maybe we should use them here so that this gives a distribution suitable for the tit-for-tat scheme? So maybe

chunk_index = ( validator_index + Blake2b("availdist", parachain_id, relay_vrf_output) ) % num_validators

@burdges
Copy link

burdges commented Jun 21, 2021

Initially, I'd everyone using their own VRF in the tit-for-tat, so that nobody knew from who honest people fetch, but..

  • this gives no "security" property per se, merely avoids biasing entirely, and
  • if we've enough parachains then hashing parahain_id spreads things enough already.

Is a relay chain block producer going to not make a block because they've some bad distribution of chunks 0..f? No. We pay at least 10 dots for making a block. We'd pay 1-5 dots per approval check. We'd pay only availability chunk suppliers maybe 0.1 / f dots, and then reduce payments for deviation from the expected chunk suppliers by maybe 0.0001 / f dots, way less than the cost of sending the data. I think biasing this distribution looks unprofitable.

I therefore think this formula works..

chunk_order_seed = Blake2b("availdist", parachain_id, relay_vrf_output)
chunk_index = ( validator_index + chunk_order_seed ) % num_validators

with the fetching rules

  1. fetch chunks 0..f from whoever officially holds that piece and signed for them in their bitfield,
  2. request from the backing checker any chunks in 0..f for for which the offical holder never signed for them in their bitfield
  3. request enough chunks from f+1..3f+1 from whoever officially holds that piece, but do so using a random order defined by Blake2b(my_assignment_publickey, chunk_order_seed) % (2f+1)

You can just straight to 3 anytime you like of course. I prefer doing 1 before 2 but if you need 2 before 1 then I'll get over it. :)

@rymnc
Copy link

rymnc commented Jul 31, 2022

Cross posting from paritytech/substrate#11944

The Episub gossip protocol is a successor to the FloodSub protocol, which allows two types of peers to exist, eager, and lazy peers. The lazy peers are just used to transmit metadata about the message, whereas eager peers receive the whole message.

In the context of distributing PoV's, it is possible for the backing set for the given parachain to be eager peers, and the other validators could be lazy peers, to facilitate gossiping. As an added functionality, the lazy peers who wish to validate the block can request the chunks as well.

There is an existing implementation of the Episub protocol, in Rust.

As per my understanding of this use-case, using Episub for PoV distribution, would result in -

  1. Reduced network overhead for validators that are not part of the backing set
  2. Reduced latency in transmitting data to the backing set

@burdges
Copy link

burdges commented Jul 31, 2022

We'd never widely "gossip" PoV blocks under any circumstances, as that'd be extremely inefficient.

We do "gossip" parablocks among parachain nodes aka collators, but among the parachain nodes these parablocks need not be PoVs of course, which makes the blocks far smaller.

A collator expands its parablock into being a PoV only when sending to its parachains five backing validators. Any backing validator receiving a parablock PoV B could then check B and expand B into its erasure coded chunks B_i to create the candidate receipt R for B.

We then "gossip" Rs among only the five backing validators, which yes then results in those five backing validators "gossiping" the Rs for those Bs. There is no useful notion of eager & lazy here obviously. We also enforce several constraints which do not afaik fit a general pubsub protocol.

In fact, we're currently rewriting gossip within backing groups ala paritytech/polkadot#5055 (comment) because the existing protocol used something resembling eager & lazy peers, but proved too wasteful for network bandwidth.

After some candidate receipt R collects two backing votes and appears on the relay chain then we begin availability distribution, which means sending the erasure coded chunks B_i of R to validator i.

Approval checkers later reconstruct the PoV B of R by fetching enough erasure coded chunks B_i of R, which is what this issue discusses.

I should emphasize that both these last two steps do not constitute gossip, and in fact are asymptotically far more efficient than any possible gossip.

At a high level, I'm dubious pub-sub protocols could ever really be a desirable abstraction. Afaik, one should gossip only when absolutely essential for security & redundancy, and think carefully about the gossip topology for that particular message type, while manually structuring heavier bandwidth flows around direct connection, perhaps using erasure coding to bridge critical gaps. Always think more like bittorrent and less like bitcoin.

At a personal level we need fundamental tools like set reconciliation via BCH codes, as discussed in https://hackmd.io/@rgbPIkIdTwSICPuAq67Jbw/HJ5rpldsB or https://github.com/w3f/research-internal/issues/141 We do not need poorly considered and hard-to-optimize abstractions of the sort libp2p seemingly provides.

Apologize for being gruff here, but you brought pub-sub to "how to be like bittorrent" thread.

@burdges
Copy link

burdges commented Jul 31, 2022

We're clearly happy to pay for someone to work on set reconciliation via BCH codes in Rust of course, maybe @drahnr

@burdges
Copy link

burdges commented Aug 2, 2022

Apologies again for being gruff there. We want good abstractions that makes building p2p networks easier, but afaik they do not appear to be as simple as things like eagerness & laziness. I'll tell a personal story:

I wanted polkadot's availability system not just to be like bittorrent, but to actually become bittorrent, by which I mean I believed a common protocol exists through which we'd benefit from bittorret folks delicate network optimizations, and bittorrent would benefit from our reliability.

I eventually & sadly concluded this protocol does not exist per se, because polkadot needs Reed-Solomn to ensure a undecodable ratio under 1/3, while file sharing likely benefits from rateless erasure codes, under their typical threat model of honest encoders. As these codes are nice for such different reasons, it's dubious that any nice common abstraction exists either above QUIC streams or whatever.

We should do networking abstractions at whatever level looks viable, but we should first build more real applications, so our abstractions hunting works with real material.

@rymnc
Copy link

rymnc commented Aug 2, 2022

Thanks for the explanation! I should've looked a little more into how the distribution is being done.

@Sophia-Gold Sophia-Gold transferred this issue from paritytech/polkadot Aug 24, 2023
@the-right-joyce the-right-joyce added the I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. label Aug 25, 2023
claravanstaden added a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.59 to 0.1.61.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](dtolnay/async-trait@0.1.59...0.1.61)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Mar 26, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Mar 27, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 8, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 9, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 10, 2024
serban300 pushed a commit to serban300/polkadot-sdk that referenced this issue Apr 10, 2024
bkchr pushed a commit that referenced this issue Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task.
Projects
Status: Backlog
Status: No status
Development

No branches or pull requests

6 participants