This repository has been archived by the owner on Mar 24, 2023. It is now read-only.

Reconsider serializing block data into shares #152

Closed
liamsi opened this issue Mar 31, 2021 · 4 comments
Labels
disambiguation, investigation (Investigation required.), serialization (Serialization definitions)

Comments

@liamsi
Member

liamsi commented Mar 31, 2021

Description

I wanted to bring up everyone's favorite topic again. Namely serialization.

First, I want to note that the way we arrange data into shares is essentially a custom serialization format. It's a very lightweight and smart one, and it additionally uses (length-delimited) protobuf for the hard parts (which IMO is a good choice).

Despite the theoretical simplicity, we ourselves are not having the best time implementing the interesting parts correctly, namely the splitting and merging of shares. This is not optimal because every client that does DA sampling or tries to download particular Txs directly from the p2p network will have to do the merging part, too.
A custom serialization format is an additional burden and hurdle for writing clients in other languages (e.g., Rust, Python, JavaScript, etc.). IMO implementing that part should not take longer than a few hours, maybe a day, in any language. We now have proof that it takes (much) longer even for those who either invented the format or are at least deeply familiar with it. Of course, we could blame this on go, but the reality is that every other language has its own pitfalls and tradeoffs, too.
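
To make concrete what that custom format amounts to, here is a toy Go sketch of this kind of splitting. The share size, namespace size, padding, and layout below are illustrative stand-ins, not the actual values or rules from the spec:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	shareSize = 256 // illustrative share size, not the spec constant
	nidSize   = 8   // illustrative namespace ID size
)

// splitMessage is a toy version of the "arrange data into shares" step:
// length-delimit the payload, then cut it into fixed-size chunks that each
// carry the namespace ID. Real spec rules (reserved bytes, message-start
// markers, exact padding) are omitted.
func splitMessage(nid, data []byte) [][]byte {
	lenBuf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(lenBuf, uint64(len(data)))
	payload := append(lenBuf[:n], data...)

	chunk := shareSize - nidSize
	var shares [][]byte
	for len(payload) > 0 {
		end := chunk
		if len(payload) < end {
			end = len(payload)
		}
		share := make([]byte, 0, shareSize)
		share = append(share, nid...)
		share = append(share, payload[:end]...)
		for len(share) < shareSize { // pad the final share
			share = append(share, 0)
		}
		shares = append(shares, share)
		payload = payload[end:]
	}
	return shares
}

func main() {
	nid := []byte{0, 0, 0, 0, 0, 0, 0, 1}
	shares := splitMessage(nid, make([]byte, 600))
	fmt.Println("shares:", len(shares)) // 602 payload bytes at 248 per share -> 3 shares
}
```

The splitting itself is short; the subtle part is that every merging client has to reimplement the same layout rules exactly.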

Proposal

I'm proposing to describe shares in protobuf, too. The fields of the share proto message would be the namespace, the startIndex (optional), the total length of the serialized message, and finally, the part of the message (in bytes) that ends up in that particular share after splitting.
That way, the spec would really just need to describe the split and merge algorithms and nothing else. We'd need to be careful to do the splitting in a way that still results in equally sized protobuf messages.
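
As a rough illustration of the proposal (not a spec), here is a plain Go struct standing in for the hypothetical share proto message, together with naive split and merge functions. All field names, the chunk size, and the handling of padding and encoded-size overhead are assumptions:

```go
package main

import "fmt"

// Share mirrors the hypothetical proto message sketched above: namespace,
// an optional start index, the total length of the serialized message, and
// the chunk of message bytes carried by this share. Field names are
// illustrative, not taken from the spec.
type Share struct {
	NamespaceID []byte
	StartIndex  uint32
	TotalLength uint64
	Data        []byte
}

// splitIntoShares cuts data into chunks of at most chunkSize bytes. A real
// implementation would also have to pad Data and account for varint field
// overhead so that the *encoded* proto messages come out equally sized;
// this sketch ignores that.
func splitIntoShares(nid, data []byte, chunkSize int) []Share {
	total := uint64(len(data))
	var shares []Share
	for i := 0; len(data) > 0; i++ {
		end := chunkSize
		if len(data) < end {
			end = len(data)
		}
		shares = append(shares, Share{
			NamespaceID: nid,
			StartIndex:  uint32(i),
			TotalLength: total,
			Data:        data[:end],
		})
		data = data[end:]
	}
	return shares
}

// mergeShares is the inverse: concatenate the Data fields (assumed to
// already be ordered by StartIndex) and check against the declared total.
func mergeShares(shares []Share) ([]byte, error) {
	if len(shares) == 0 {
		return nil, fmt.Errorf("no shares to merge")
	}
	out := make([]byte, 0, shares[0].TotalLength)
	for _, s := range shares {
		out = append(out, s.Data...)
	}
	if uint64(len(out)) != shares[0].TotalLength {
		return nil, fmt.Errorf("merged %d bytes, expected %d", len(out), shares[0].TotalLength)
	}
	return out, nil
}

func main() {
	shares := splitIntoShares([]byte{0, 0, 0, 0, 0, 0, 0, 1}, make([]byte, 600), 248)
	merged, err := mergeShares(shares)
	fmt.Println(len(shares), len(merged), err) // 3 600 <nil>
}
```

The hope is that a client in any language would only need a protobuf decoder plus these two small functions, instead of reimplementing a bespoke byte layout.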

I know, I've been kind of advocating against going all-in on protobuf for the shares myself, at least in the sense that I wanted to keep any knowledge of the serialization format out of the NMT (which IMO should simply take in raw bytes and nothing else). I'm not sure yet whether the implication here would be that we'd need to move away from this property.

related: #35


@liamsi liamsi added the serialization, disambiguation, and investigation labels Mar 31, 2021
@evan-forbes
Member

That's an interesting idea, and it definitely needs more investigation. Provided that we could get exact-sized shares, I think it could be a good simplification measure. That said, I'm still not sure how much it would simplify things, as we're already mostly working with raw bytes and would still have to manually split/merge.

I'm not the most familiar with protobuf; how would we get the namespace before the length delimiter?

We're already using protobuf for evidence; would we first serialize the evidence, then create the shares, then serialize the shares?

@adlerjohn
Member

adlerjohn commented Apr 1, 2021

we could blame this on go

Agreed 🦀

the total length of the serialized message, and finally

The problem with this suggestion is that this length does not repeat in each share. So you'd still need at least 2 serialization passes: one to length-delimit the tx/message/whatever, then another to serialize the share struct after splitting.

The issue with using protobuf for this second step is that protobuf doesn't support fixed-length arrays.
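
To illustrate that last point, here is a small Go sketch of the standard protobuf wire format for a length-delimited `bytes` field (the field number is arbitrary). Because the field carries its own length varint instead of being fixed-length, the encoded size of a share message shifts with the payload, which is what the splitter would have to compensate for:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodedBytesFieldLen computes the wire size of a protobuf `bytes` field:
// tag/wire-type key + length varint + payload. The field is
// length-delimited rather than fixed-length, so the encoded size of a
// share message drifts with the payload length, and a splitter aiming for
// exactly equal-sized shares has to compensate (e.g. by padding).
func encodedBytesFieldLen(fieldNumber int, payloadLen int) int {
	key := uint64(fieldNumber)<<3 | 2 // wire type 2 = length-delimited
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, key)
	n += binary.PutUvarint(buf, uint64(payloadLen))
	return n + payloadLen
}

func main() {
	for _, l := range []int{100, 127, 128, 247, 248} {
		fmt.Printf("payload %3d bytes -> encoded field %3d bytes\n", l, encodedBytesFieldLen(4, l))
	}
	// Note the extra byte of overhead starting at 128 bytes, where the
	// length varint needs a second byte.
}
```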

@liamsi
Member Author

liamsi commented Apr 1, 2021

we could blame this on go

Agreed 🦀

Good that you bring this up. We should likely try to implement the current scheme in Rust, as it is the most interesting language to target after go anyway. That said, I'd be surprised if a fully tested implementation in Rust took less time than it took us to implement this in go (maybe it will actually be faster because we've ironed out a lot of the details now, but it will be in the same ballpark).

I'm happy to write up splitting and merging using protobuf messages only, and then we can compare how understandable and how easy or difficult to implement each approach is. But that's for later. Let's revisit this issue after the first test-nets.

@adlerjohn
Member

not planned any time soon

@adlerjohn adlerjohn closed this as not planned Sep 19, 2022
@adlerjohn adlerjohn moved this to Done in Celestia Node Sep 19, 2022