
PoRep #9772

Closed
jon-chuang opened this issue Apr 28, 2020 · 2 comments
@jon-chuang
Contributor

jon-chuang commented Apr 28, 2020

Comments About Current Proposal
Firstly, it does not seem to utilise the depth-robust-graph-structured CBC encoding that Filecoin presently uses.

This exposes it to Sybil attacks: an archiver can store a single copy of the original file plus periodic checkpoints of the CBC encoding, and then rapidly regenerate the CBC encoding for challenged blocks under multiple identities.

Filecoin's solution is to use DAGs that have both high fan-in, so that it is untenable to reconstruct a large number of missing nodes, and high depth for most nodes, so as to satisfy the VDE (verifiable delay encoding) property.
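
For concreteness, a minimal sketch in Rust of the structural difference, with a toy mixing function standing in for a real keyed cipher/KDF and an ad-hoc parent rule standing in for a genuine depth-robust graph: in plain sequential CBC each node depends only on its predecessor (so stored checkpoints make re-derivation cheap), whereas a DRG-style encoding pulls in several parents, so a cheater who discards nodes is forced to re-derive many ancestors.

```rust
// Toy illustration only: `mix` stands in for a real keyed cipher/KDF
// (e.g. ChaCha or a sealing KDF); the parent selection below is an
// arbitrary stand-in, not a genuine depth-robust graph construction.

fn mix(inputs: &[u64]) -> u64 {
    // Simple non-cryptographic mixer (FNV-style), for structure only.
    inputs.iter().fold(0xcbf29ce484222325u64, |acc, &x| {
        (acc ^ x).wrapping_mul(0x0000_0100_0000_01b3)
    })
}

/// Plain sequential CBC-style encoding: node i depends only on node i-1.
/// An attacker can store every k-th encoded block and cheaply recompute
/// any challenged block from the nearest checkpoint.
fn cbc_encode(data: &[u64], key: u64) -> Vec<u64> {
    let mut out = Vec::with_capacity(data.len());
    let mut prev = key;
    for &d in data {
        let enc = mix(&[prev, d]);
        out.push(enc);
        prev = enc;
    }
    out
}

/// DRG-style encoding: node i depends on several earlier parents, so
/// re-deriving a discarded node forces re-deriving many ancestors.
fn drg_encode(data: &[u64], key: u64, fan_in: usize) -> Vec<u64> {
    let mut out: Vec<u64> = Vec::with_capacity(data.len());
    for (i, &d) in data.iter().enumerate() {
        let mut inputs = vec![key, d, i as u64];
        for p in 1..=fan_in {
            if i >= p * p {
                inputs.push(out[i - p * p]);
            }
        }
        out.push(mix(&inputs));
    }
    out
}

fn main() {
    let data: Vec<u64> = (0..16).collect();
    println!("{:?}", cbc_encode(&data, 42));
    println!("{:?}", drg_encode(&data, 42, 4));
}
```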

Secondly, Solana's current method is to regenerate the entire CBC encoding from scratch and verify it in batch for each identity. This verification is extremely expensive in both bandwidth and compute, and it leads to the rather unsatisfactory situation where the archivers must catch the validators cheating. Further, validators seem to get some freedom over whose proofs they verify. A more elegant design would make proofs inexpensive to verify, for example by means of Merkle inclusion proofs, and let every validator verify every proof. Rather than batching verification, one could then verify continuously. However, this approach still has high ledger storage costs because the proofs are rather large.
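
As an illustration of how cheap inclusion-proof verification would be, here is a minimal Merkle-proof verifier sketch in Rust. The hash is a stand-in (std's DefaultHasher, not collision resistant) and the function names are hypothetical; the point is just that verification costs O(log n) hashes instead of a full re-encoding.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Toy hash standing in for a cryptographic (or SNARK-friendly) hash;
/// DefaultHasher is NOT collision resistant and is used here for brevity.
fn h(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    hasher.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut buf = [0u8; 16];
    buf[..8].copy_from_slice(&left.to_le_bytes());
    buf[8..].copy_from_slice(&right.to_le_bytes());
    h(&buf)
}

/// Verify that `leaf` sits at `index` under `root`, given sibling hashes
/// ordered from leaf level to root. Every validator can run this
/// continuously because it is only O(log n) hashes per proof.
fn verify_inclusion(root: u64, leaf: u64, index: usize, siblings: &[u64]) -> bool {
    let mut node = leaf;
    let mut idx = index;
    for &sib in siblings {
        node = if idx % 2 == 0 {
            hash_pair(node, sib)
        } else {
            hash_pair(sib, node)
        };
        idx /= 2;
    }
    node == root
}

fn main() {
    // Build a 4-leaf tree and check an inclusion proof for leaf 2.
    let leaves: Vec<u64> = (0u64..4).map(|i| h(&i.to_le_bytes())).collect();
    let n01 = hash_pair(leaves[0], leaves[1]);
    let n23 = hash_pair(leaves[2], leaves[3]);
    let root = hash_pair(n01, n23);
    assert!(verify_inclusion(root, leaves[2], 2, &[leaves[3], n01]));
}
```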

Some questions:

  • Are segments intended to be 100 GB–1 TB? If so, it seems like a very bad idea to keep shuffling segments between archivers; downloading even a 1 GB file takes long enough. It seems better for a single, randomly assigned archiver to sit squarely on a given piece of data. Assigning the data randomly in a way that cannot be grinded is something that requires looking into.
  • There doesn't seem to be a way to query data. There should be a transaction that allows fees to be exchanged for retrieval: the archiver sends a Merkle proof that its data is the particular block wanted by the client. The correspondence between blocks and Merkle roots should be verified by all validators at initialisation. Initialisations are expensive one-off costs, so there should only be a few per epoch: some for the most recent segment, and some to cover archiver failure. This should mean that long-term miners are rewarded more than short-term miners, and miners with greater uptime are rewarded more too (see the sketch after this list); this can be computed from storage account data and PoH storage commits.
  • What is the intention of using ChaCha? Why not AES? AES has dedicated CPU instructions (AES-NI), which helps guard against ASIC speed-ups.
  • There should be some logic to determine finality for delinquent archivers and to reinitialise particular segments if an archiver drops out. This should take into account the possibility that archivers simply go offline for periods of time, so one should measure the commitment of archivers.
  • Shouldn't erasure coding be applied to individual transactions?
  • Other than aesthetic reasons, what is the point of storing the entire ledger? Shouldn't we only care about the part of the ledger within the rollback period?
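
On rewarding longevity and uptime (mentioned in the bullets above), a purely hypothetical weighting sketch in Rust; the formula and constants are illustrative only, and in practice the inputs would come from storage account data and PoH storage commits.

```rust
/// Hypothetical reward weighting: longer continuous storage and higher
/// uptime earn a larger share. Both the shape of the formula and the
/// constants are placeholders, not a proposed final scheme.
fn reward_weight(epochs_stored: u64, commits_submitted: u64, commits_expected: u64) -> f64 {
    if commits_expected == 0 {
        return 0.0;
    }
    // Uptime proxy: fraction of expected PoH storage commits actually submitted.
    let uptime = (commits_submitted as f64 / commits_expected as f64).clamp(0.0, 1.0);
    // Longevity bonus that saturates, so very old archivers don't dominate.
    let longevity = 1.0 + (epochs_stored as f64).ln_1p() / 10.0;
    uptime * longevity
}

fn main() {
    // A long-term, high-uptime archiver vs. a new, flaky one.
    println!("{:.3}", reward_weight(100, 95, 100)); // ~1.388
    println!("{:.3}", reward_weight(2, 40, 100));   // ~0.444
}
```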

If we assume that one storage epoch is 12 hours, a lot of proofs need to be generated for a segment of size 600 GB. Assuming a sequential CBC throughput of 10 GB/s and a graph depth of about 1/5 of the data size, each full sequential encoding takes on the order of 10–12 s, which works out to roughly 3,600–4,500 proofs over 12 hours. So there are substantial savings to be made.
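
For reference, the same back-of-the-envelope arithmetic written out under exactly those assumptions (no measured numbers, just the figures above):

```rust
fn main() {
    // Back-of-the-envelope figures from the paragraph above.
    let segment_gb = 600.0;          // segment size
    let depth_fraction = 1.0 / 5.0;  // sequential depth ~ 1/5 of the data
    let throughput_gb_s = 10.0;      // sequential CBC throughput
    let epoch_s = 12.0 * 3600.0;     // 12-hour storage epoch

    let seconds_per_encoding = segment_gb * depth_fraction / throughput_gb_s;
    let encodings_per_epoch = epoch_s / seconds_per_encoding;
    println!("~{seconds_per_encoding} s per encoding, ~{encodings_per_epoch:.0} per epoch");
    // => ~12 s per encoding, ~3600 encodings per 12-hour epoch.
}
```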

Suggestions

  • Rather than continuous verification as above, to reduce the space consumed by proofs of replication (inclusion proofs + CBC of random nodes + parents) in the storage account and ledger, which could be on the order of several kB per proof per identity, we could consider zk-SNARKs, as Filecoin is doing. They combine them with proof composition and VDFs to reduce the proof size. The Solana version could use zk-proofs that the storage commitment hashes in PoH do indeed come from a valid PoRep. As before, these can be one-off costs at the end of epochs, committed to the ledger with small storage and compute cost. A nice thing is that there are SNARK-friendly hashes, so the Merkle trees shouldn't be a huge problem. One can use a multilayer SNARK to do proof aggregation. AES could also potentially be cheap in-circuit through a scheme called PLONK.

Update 1: I think the ZigZag PoRep described in Ben Fisch's paper could be very suitable for our purposes. Unlike Filecoin, which uses PoSt on arbitrary data for election tickets, we are not vulnerable to the SealStack attack, so we can use ZigZag for fast decoding.

Update 2: It seems 24-hour timeouts could be possible (porcuquine's blog). This would be pretty amazing for us. Further, this seems to be achieved with just a 68 GB sector size!

Some resources: Filecoin proof repo

@sakridge
Member

What is the intention of using ChaCha? Why not AES? AES has dedicated CPU instructions (AES-NI), which helps guard against ASIC speed-ups.

ChaCha has better GPU performance than AES, so the relative speedup is greater.

@mvines mvines added this to the The Future! milestone May 21, 2020
@ryoqun
Member

ryoqun commented Oct 27, 2020

I think we can close this for now. (let's reduce open issues!)

@ryoqun ryoqun closed this as completed Oct 27, 2020