Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define CIDv0 to allow sha256 multhashes only. #13

Closed
Stebalien opened this issue Aug 30, 2017 · 8 comments
Closed

Define CIDv0 to allow sha256 multhashes only. #13

Stebalien opened this issue Aug 30, 2017 · 8 comments

Comments

@Stebalien
Copy link
Member

All of our code assumes this by checking for the Qm prefix. Furthermore, the argument for CIDv0 is backwards compatibility and, as far as I know, nobody is using non-sha256 CIDv0 CIDs at the moment (at least I hope not as our code can't really handle them...).

@Kubuxu
Copy link
Member

Kubuxu commented Aug 31, 2017

This very much makes sense.

@daviddias
Copy link
Member

daviddias commented Aug 31, 2017

Although that is true. The hash is the only one thing in CIDv0 that uses a self-describable format, so it should be an issue to support multiple.

@Stebalien
Copy link
Member Author

Because CIDv0 isn't a self describing format, the only way to tell if something is a CIDv0 is to check the first two bytes for Qm. If we change the multihash, this changes and we'd have to check every possible base58 encoded multihash prefix. Worse, we might end up with a hash choice that, when base58 encoded, looks like a multibase prefix.

If we say that CIDv0 addresses should be SHA256 only and never assign a multibase prefix of Q, we'll never have a problem.


Note: Accepting non-base58 multihashes would also be a backwards incompatible change due to how we currently decode CIDs...

@daviddias
Copy link
Member

Because CIDv0 isn't a self describing format, the only way to tell if something is a CIDv0 is to check the first two bytes for Qm.

Not true.

If the CID is a String, then we can check the first two bytes

If the CID is a Buffer, then we can check the first byte for the multihash table, if it matches one codec there, then it is a CIDv0.

Check how we do the matching in JS: https://github.com/ipld/js-cid/blob/master/src/index.js#L24-L40

@Stebalien
Copy link
Member Author

The JS algorithm is already broken:

  1. For the string case, there can't exist any multihash such that, when base58 encoded, the encoded multihash starts with an existing multibase prefix. Unfortunately, due to the nature of base58 and the fact that we allow truncating hashes, this means that no letter in the base58btc alphabet can be a multibase prefix (probably). For example, here's a base58 sha256 hash truncated to 26 bytes: BoLNvd7Fkkm1Pz7kBN5F8b7YKmZ8nzvgJGDYjt; B is the base32 prefix.
  2. For the buffer case, we can't pick a CID version that's also a multihash. The identity hash prefix (0x00) breaks this case.

If we limit the valid hashes to sha256-256, we can just never assign Q as a multibase prefix and never use 0x12 as a CID version.

@daviddias
Copy link
Member

For the string case, there can't exist any multihash such that, when base58 encoded, the encoded multihash starts with an existing multibase prefix. Unfortunately, due to the nature of base58 and the fact that we allow truncating hashes, this means that no letter in the base58btc alphabet can be a multibase prefix (probably). For example, here's a base58 sha256 hash truncated to 26 bytes: BoLNvd7Fkkm1Pz7kBN5F8b7YKmZ8nzvgJGDYjt; B is the base32 prefix.

Without wanting to claim what is the right solution for this case, a possible one is to catch this edge case (and edge as in, there had to be a block out there that uses that format, which is quite unlikely and there is no reason for it to exist now that there is CIDv1) is to check the rest of the the parser breaking and then adding the rule check if we are in this scenario.

For the buffer case, we can't pick a CID version that's also a multihash. The identity hash prefix (0x00) breaks this case.

Good catch. This is definitely an issue introduced when the Identity multihash became a thing.

Similar to above, a not perfect solution but probably safe unless someone intentionally wants to break it, is to assume that CIDv0 will not use the Identity multihash.

These kinds of things should be included in the migration strategy and why it is good to move everyone to CIDv1 as fast as possible than trying to let CIDsv0 continue to be created with new iterations.

@Stebalien
Copy link
Member Author

a possible one is to catch this edge case (and edge as in, there had to be a block out there that uses that format,

I just pulled that one out of a hat. There are infinitely many collisions with different hash functions and different bases.

Similar to above, a not perfect solution but probably safe unless someone intentionally wants to break it, is to assume that CIDv0 will not use the Identity multihash.

This is the internet...

These kinds of things should be included in the migration strategy and why it is good to move everyone to CIDv1 as fast as possible than trying to let CIDsv0 continue to be created with new iterations.

We'll still have to support these for the foreseeable future, we might as well restrict them as much as possible.


I really don't see any reason not to forbid everything but sha256-256 CIDv0s (given that this exists solely for backwards compatibility purposes). go-cid has only ever supported sha256-256 CIDv0 CIDs anyways.

@Stebalien
Copy link
Member Author

So, apparently this was supposed to be part of the spec already...

Old v0 CIDs are strictly sha2-256 multihashes encoded in base58 -- this is because IPFS tooling only shipped with support for sha2-256. This means the binary versions are 34 bytes long (sha2-256 256 bit multihash), and that the string versions are 46 characters long (base58 encoded). This means we can recognize a v0 CID by ensuring it is a sha256 bit multihash, of length 256 bits, and base58 encoded (when a string).

-- ipfs/specs#130

@ghost ghost removed the ready label Apr 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants