Merkle reference multiformat #357

Open: wants to merge 1 commit into master

@Gozala (Contributor) commented Oct 1, 2024

This PR proposes adding a code for the merkle-references multiformat, described here. Allocating a code would enable us to format merkle references as IPLD links and seamlessly integrate them into the rest of the IPLD ecosystem.

Format

A merkle reference can be viewed as a hashing algorithm defined over all IPLD data types (as opposed to just bytes). However, just as with CIDs, it may use different underlying hash functions (ℹ️ although the same hash function is implied across a full DAG).

Therefore, merkle-reference is proposed as a standalone multiformat with the following format:

<merkle-reference> ::= <merkle-reference-multicodec><content-multihash>
# or, expanded:
<merkle-reference> ::= <0x07, the code for merkle-reference><multihash of merkle-folded data>
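
A minimal Python sketch of the byte layout above, assuming the code 0x07 proposed in this PR and a sha2-256 multihash (multihash code 0x12, 32-byte digest). The merkle-folding step that produces the digest is defined in the linked spec and is not reproduced here; the digest below is a placeholder hash of arbitrary bytes, used only to show the framing. The function names are hypothetical, not the merkle-reference library's API.

```python
import hashlib

MERKLE_REFERENCE_CODE = 0x07  # code proposed in this PR (assumption: not yet allocated)
SHA2_256_CODE = 0x12          # multihash code for sha2-256


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_multihash(code: int, digest: bytes) -> bytes:
    """<hash-code varint><digest-length varint><digest>."""
    return encode_uvarint(code) + encode_uvarint(len(digest)) + digest


def encode_merkle_reference(digest: bytes, hash_code: int = SHA2_256_CODE) -> bytes:
    """<merkle-reference-multicodec><content-multihash>."""
    return encode_uvarint(MERKLE_REFERENCE_CODE) + encode_multihash(hash_code, digest)


# Placeholder digest: a real implementation would merkle-fold the IPLD data
# instead of hashing raw bytes directly.
digest = hashlib.sha256(b"placeholder").digest()
ref = encode_merkle_reference(digest)
print(ref[:3].hex())  # prints 071220
```

Every code here fits in one varint byte, so the reference is 0x07, 0x12, 0x20 followed by the 32-byte digest; the varint helper is kept general so that larger hash codes would still encode correctly.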

Integration

There is a nice duality to merkle references: they can also be viewed as a lossy IPLD codec whose bytes are derived through the merkle-folding process (described in the linked spec).

By prefixing a merkle reference with the varint 0x01, it can be formatted as a valid CIDv1, in which case the 0x07 code is treated as the IPLD codec.

This could be used to integrate merkle references into the rest of the IPLD ecosystem by formatting them as IPLD links.
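
A minimal Python sketch of this bridging, working on binary CIDs only (the string form would additionally need a multibase prefix, which is omitted). It assumes the proposed 0x07 code; the function names are hypothetical and not taken from the merkle-reference library.

```python
MERKLE_REFERENCE_CODE = 0x07  # code proposed in this PR, read as the IPLD codec of a CIDv1
CID_V1 = 0x01                 # CID version; its varint encoding is the single byte 0x01


def decode_uvarint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, offset just past it).

    Note: this sketch does not enforce the multiformats 9-byte varint limit.
    """
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, offset
        shift += 7


def merkle_reference_to_cid(ref: bytes) -> bytes:
    """Prefix the version varint so the bytes read as <0x01><0x07><multihash>."""
    code, _ = decode_uvarint(ref)
    if code != MERKLE_REFERENCE_CODE:
        raise ValueError("not a merkle reference")
    return bytes([CID_V1]) + ref


def cid_to_merkle_reference(cid: bytes) -> bytes:
    """Strip the version varint from a binary CIDv1 whose codec is 0x07."""
    version, offset = decode_uvarint(cid)
    if version != CID_V1:
        raise ValueError("expected a CIDv1")
    codec, _ = decode_uvarint(cid, offset)
    if codec != MERKLE_REFERENCE_CODE:
        raise ValueError("CID codec is not merkle-reference")
    return cid[offset:]


# Hypothetical reference: 0x07 code + sha2-256 multihash header + all-zero digest.
ref = bytes([0x07, 0x12, 0x20]) + bytes(32)
cid = merkle_reference_to_cid(ref)
assert cid[:2] == b"\x01\x07"
assert cid_to_merkle_reference(cid) == ref
```

Because both the version (0x01) and the proposed code (0x07) fit in a single varint byte, converting in either direction amounts to adding or removing one leading byte; the sketch still decodes the varints so that it rejects CIDs with other versions or codecs.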

@Gozala requested review from rvagg and vmx as code owners October 1, 2024 15:37
@Gozala requested a review from alanshaw October 1, 2024 16:07
@Gozala changed the title from "Merkle Reference Code" to "Merkle reference multiformat" Oct 1, 2024
@Gozala requested a review from ribasushi October 1, 2024 16:26
@alanshaw (Member) commented

I like the idea. I don't know the process to have codes accepted here, do we require at least one implementation?

@Gozala (Contributor, Author) commented Oct 24, 2024

> I like the idea. I don't know the process to have codes accepted here, do we require at least one implementation?

I have an implementation that currently lives at https://github.com/Gozala/merkle-reference

@bumblefudge commented Nov 18, 2024

> I like the idea. I don't know the process to have codes accepted here, do we require at least one implementation?

See Robin's process doc: "Provide evidence that the encoding is supported in at least two production implementations" is required for DRAFT status, and this would also be a requirement if the registry were administered at IANA or under W3C's new registry process. Since we're in the extremely scarce single-byte range, it might make more sense to wait until this is a little more developed/further along before robbing future multiformats of one slot needed for a future CID and/or other structured "third layer" tags?

Another thing that would help would be if the current spec were formatted as a complete, linter-passing IETF internet-draft (example from a multibase registration in progress) or W3C CG respec doc with test-vectors and all that, rather than an open PR with unresolved comments on the web3.storage IP process repo... not technically a requirement at DRAFT status but maybe one that would help allay concerns of single-byte squatting from the least-generous possible readers 😉

@Gozala (Contributor, Author) commented Nov 19, 2024

> See Robin's process doc: "Provide evidence that the encoding is supported in at least two production implementations" is required for DRAFT status, and this would also be a requirement if the registry were administered at IANA or under W3C's new registry process.

Ah, I was not aware of the new process; thanks for pointing it out.

> Since we're in the extremely scarce single-byte range, it might make more sense to wait until this is a little more developed/further along before robbing future multiformats of one slot needed for a future CID and/or other structured "third layer" tags?

Sounds reasonable, yet it seems like a double standard when I see lines 4 to 5 of multicodec/table.csv (at commit 352d05a):

cidv2, cid, 0x02, draft, CIDv2
cidv3, cid, 0x03, draft, CIDv3

CIDv2 has been discussed forever, and I would be very surprised if it had two production implementations. I have not even heard of CIDv3; it's probably something new that happened since I fell off the interplanetary space 😅 I won't even mention that most of the codes in that table would fail to meet the new criteria.

As for a second implementation, there is one in development in Rust, and I can update this thread when it's ready.

> Another thing that would help would be if the current spec were formatted as a complete, linter-passing IETF internet-draft (example from a multibase registration in progress) or W3C CG respec doc with test-vectors and all that, rather than an open PR with unresolved comments on the web3.storage IP process repo... not technically a requirement at DRAFT status but maybe one that would help allay concerns of single-byte squatting from the least-generous possible readers 😉

The most up-to-date spec lives at https://github.com/Gozala/merkle-reference/blob/main/docs/spec.md. There is also an interactive version at https://observablehq.com/@gozala/merkle-references that anyone can test with various data sets.

Some test fixtures are available here, and they'll likely move into a more portable form once the Rust implementation is ready.

Reformatting it as an IETF / W3C spec is plausible, but as a one-man show I have to be pragmatic about where I spend my time, and it seemed a lot more reasonable to budget for that after the code is in the table rather than before.

@Gozala (Contributor, Author) commented Nov 19, 2024

I should mention that presence in the multicodec table is nice to have mostly for backwards compatibility with the IPLD addressing scheme. In practice, I don't expect multiformat prefixes to be used beyond bridging with the legacy (IPLD) system.
