Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using multihash for digests #45

Open
hinshun opened this issue Mar 12, 2019 · 7 comments
Open

Using multihash for digests #45

hinshun opened this issue Mar 12, 2019 · 7 comments

Comments

@hinshun
Copy link

hinshun commented Mar 12, 2019

The descriptors spec mentions multihash+base58 as an example of an algorithm that is not registered. I'm interested to use multihashes in application/vnd.oci.image.manifest.v1+json, and to pull tags by multihash digest from distribution.

Multihashes encode the algorithm in the digest, so multihash+base58 itself isn't sufficient to generate digests (FromReader, FromBytes, FromString) or to return the size of the hash. Perhaps multihash+base58:encoded are transformed to its respective algorithm:encoded for validation / verifying purposes.

I'm curious to hear thoughts about supporting multihash (in or out of package) in a way that projects like containerd and distribution can consume them.

@AkihiroSuda
Copy link
Member

Do you already have implementation that uses multihash+base58 with OCI descriptors?

@hinshun
Copy link
Author

hinshun commented Apr 7, 2020

Hi @AkihiroSuda, in my proof of concept that transfers layers over IPFS we continue to use sha256:xyz OCI descriptors. Here's the problem with multihash+base58:

  • multihash is an encoding that may be using any hashing algorithm.
  • The current code of this repo wants the Algorithm to return some function to read the Hex. multihash on its own isn't sufficient, you need to read a few leading bytes from the Hex of the multihash to figure out what algorithm it is.

In IPCS we continue to use OCI descriptors: https://github.com/hinshun/ipcs/blob/master/converter.go#L45-L48

However the digest isn't referring to the layer digest, but rather the wrapper object (known as IPLD) around the layer. The reason why its needed its because in IPFS, large objects needs to chunked into a DAG. The wrapper objects are the overhead of storing the structure of the DAG as well as the digest of each chunk.

@thaJeztah
Copy link
Member

@hinshun just a heads-up: we're preparing tagging of v1.0.0 (#56). Given that no PR was opened yet, I assume this is ok to be implemented as a minor release after v1.0.0?

@hinshun
Copy link
Author

hinshun commented May 13, 2020

@thaJeztah Yes, I think we'll need to find another approach after v1.0.0. Multihash is an alternative to go-digest, rather than an algorithm that can be added. For example, combining things like +base58 is in the spec for multibase.

Perhaps the best way to implement this is just as annotations.

@thaJeztah
Copy link
Member

Ah, thanks; though I'd ping you just in case there was a blocker for the release 👍

@stevvooe
Copy link
Contributor

The example referenced in the spec is just an example of an unregistered algorithm. Typically, the first part is the algorithm and the second part is some modifier describing the details of the encoding. There's no reason we couldn't store a multihash in a go-digest string. The only constraint is that the encoding has to be URL safe (https://github.com/opencontainers/go-digest/blob/master/digest.go#L63).

Seems like multibase would be sufficient as a prefix, since the rest of the decoding information is embedded in the string.

One thing to consider here: it is not necessarily better to support a whole slew of algorithms, since it makes sets of content mutually incompatible without more intelligent backend support. Ideally, you choose one algorithm that is "canonical" then have conversions to/from that main one. Annotations, as you mentioned earlier, might be sufficient for packing images that were built by docker then pushed into the ipcs context.

One other thing we might consider: the use of Algorithm.Size() assumes a fixed size output encoding. I can't remember why this is there, other than being compatible with Go's crypto package, which assumes fixed size hash outputs, so if that is an issue, we might be able to deprecate it.

@hinshun
Copy link
Author

hinshun commented May 14, 2020

The example referenced in the spec is just an example of an unregistered algorithm. Typically, the first part is the algorithm and the second part is some modifier describing the details of the encoding. There's no reason we couldn't store a multihash in a go-digest string. The only constraint is that the encoding has to be URL safe (https://github.com/opencontainers/go-digest/blob/master/digest.go#L63).

Yeah, I think the spec doesn't need to be changed. The existing implementation of go-digest makes a few assumptions like fixed size issue you pointed out. We'll need to fix those before we can add algorithms that have decoding information.

Seems like multibase would be sufficient as a prefix, since the rest of the decoding information is embedded in the string.

If we're going this route, cidv1 seems to be best prefix. I think there is a few more bits of information that IPFS uses than just the base-encoding. It also allows IPFS to upgrade (like going from cidv0 to cidv1) but still maintain backwards compat.

One thing to consider here: it is not necessarily better to support a whole slew of algorithms, since it makes sets of content mutually incompatible without more intelligent backend support. Ideally, you choose one algorithm that is "canonical" then have conversions to/from that main one.

Good point. The IPFS team actually is moving away from storing content using cid to multihash so that the base encoding doesn't make sets of content incompatible. Though they will still suffer from duplicating content that were stored using different algorithms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants