Define CIDv0 to allow sha256 multhashes only. #13

Stebalien · 2017-08-30T20:09:03Z

All of our code assumes this by checking for the Qm prefix. Furthermore, the argument for CIDv0 is backwards compatibility and, as far as I know, nobody is using non-sha256 CIDv0 CIDs at the moment (at least I hope not as our code can't really handle them...).

The text was updated successfully, but these errors were encountered:

Kubuxu · 2017-08-31T01:13:04Z

This very much makes sense.

daviddias · 2017-08-31T12:12:41Z

Although that is true. The hash is the only one thing in CIDv0 that uses a self-describable format, so it should be an issue to support multiple.

Stebalien · 2017-09-01T00:26:54Z

Because CIDv0 isn't a self describing format, the only way to tell if something is a CIDv0 is to check the first two bytes for Qm. If we change the multihash, this changes and we'd have to check every possible base58 encoded multihash prefix. Worse, we might end up with a hash choice that, when base58 encoded, looks like a multibase prefix.

If we say that CIDv0 addresses should be SHA256 only and never assign a multibase prefix of Q, we'll never have a problem.

Note: Accepting non-base58 multihashes would also be a backwards incompatible change due to how we currently decode CIDs...

daviddias · 2017-09-01T03:58:18Z

Because CIDv0 isn't a self describing format, the only way to tell if something is a CIDv0 is to check the first two bytes for Qm.

Not true.

If the CID is a String, then we can check the first two bytes

If the CID is a Buffer, then we can check the first byte for the multihash table, if it matches one codec there, then it is a CIDv0.

Check how we do the matching in JS: https://github.com/ipld/js-cid/blob/master/src/index.js#L24-L40

Stebalien · 2017-09-01T04:52:54Z

The JS algorithm is already broken:

For the string case, there can't exist any multihash such that, when base58 encoded, the encoded multihash starts with an existing multibase prefix. Unfortunately, due to the nature of base58 and the fact that we allow truncating hashes, this means that no letter in the base58btc alphabet can be a multibase prefix (probably). For example, here's a base58 sha256 hash truncated to 26 bytes: BoLNvd7Fkkm1Pz7kBN5F8b7YKmZ8nzvgJGDYjt; B is the base32 prefix.
For the buffer case, we can't pick a CID version that's also a multihash. The identity hash prefix (0x00) breaks this case.

If we limit the valid hashes to sha256-256, we can just never assign Q as a multibase prefix and never use 0x12 as a CID version.

daviddias · 2017-09-01T05:14:20Z

For the string case, there can't exist any multihash such that, when base58 encoded, the encoded multihash starts with an existing multibase prefix. Unfortunately, due to the nature of base58 and the fact that we allow truncating hashes, this means that no letter in the base58btc alphabet can be a multibase prefix (probably). For example, here's a base58 sha256 hash truncated to 26 bytes: BoLNvd7Fkkm1Pz7kBN5F8b7YKmZ8nzvgJGDYjt; B is the base32 prefix.

Without wanting to claim what is the right solution for this case, a possible one is to catch this edge case (and edge as in, there had to be a block out there that uses that format, which is quite unlikely and there is no reason for it to exist now that there is CIDv1) is to check the rest of the the parser breaking and then adding the rule check if we are in this scenario.

For the buffer case, we can't pick a CID version that's also a multihash. The identity hash prefix (0x00) breaks this case.

Good catch. This is definitely an issue introduced when the Identity multihash became a thing.

Similar to above, a not perfect solution but probably safe unless someone intentionally wants to break it, is to assume that CIDv0 will not use the Identity multihash.

These kinds of things should be included in the migration strategy and why it is good to move everyone to CIDv1 as fast as possible than trying to let CIDsv0 continue to be created with new iterations.

Stebalien · 2017-09-01T05:24:27Z

a possible one is to catch this edge case (and edge as in, there had to be a block out there that uses that format,

I just pulled that one out of a hat. There are infinitely many collisions with different hash functions and different bases.

Similar to above, a not perfect solution but probably safe unless someone intentionally wants to break it, is to assume that CIDv0 will not use the Identity multihash.

This is the internet...

These kinds of things should be included in the migration strategy and why it is good to move everyone to CIDv1 as fast as possible than trying to let CIDsv0 continue to be created with new iterations.

We'll still have to support these for the foreseeable future, we might as well restrict them as much as possible.

I really don't see any reason not to forbid everything but sha256-256 CIDv0s (given that this exists solely for backwards compatibility purposes). go-cid has only ever supported sha256-256 CIDv0 CIDs anyways.

Stebalien · 2017-09-01T17:43:11Z

So, apparently this was supposed to be part of the spec already...

Old v0 CIDs are strictly sha2-256 multihashes encoded in base58 -- this is because IPFS tooling only shipped with support for sha2-256. This means the binary versions are 34 bytes long (sha2-256 256 bit multihash), and that the string versions are 46 characters long (base58 encoded). This means we can recognize a v0 CID by ensuring it is a sha256 bit multihash, of length 256 bits, and base58 encoded (when a string).

-- ipfs/specs#130

daviddias added the ready label Mar 19, 2018

Stebalien closed this as completed Apr 12, 2019

ghost removed the ready label Apr 12, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Define CIDv0 to allow sha256 multhashes only. #13

Define CIDv0 to allow sha256 multhashes only. #13

Stebalien commented Aug 30, 2017

Kubuxu commented Aug 31, 2017

daviddias commented Aug 31, 2017 •

edited

Loading

Stebalien commented Sep 1, 2017

daviddias commented Sep 1, 2017

Stebalien commented Sep 1, 2017

daviddias commented Sep 1, 2017

Stebalien commented Sep 1, 2017

Stebalien commented Sep 1, 2017

Define CIDv0 to allow sha256 multhashes only. #13

Define CIDv0 to allow sha256 multhashes only. #13

Comments

Stebalien commented Aug 30, 2017

Kubuxu commented Aug 31, 2017

daviddias commented Aug 31, 2017 • edited Loading

Stebalien commented Sep 1, 2017

daviddias commented Sep 1, 2017

Stebalien commented Sep 1, 2017

daviddias commented Sep 1, 2017

Stebalien commented Sep 1, 2017

Stebalien commented Sep 1, 2017

daviddias commented Aug 31, 2017 •

edited

Loading