Use raw multihashes for providers #755
Currently, we're using CIDs. However, multiple CIDs can map to the same block (v1, v0, different codecs). Due to the base32 CID migration effort, we'll need to switch over the routing system to use raw multihashes.

Note: Given that raw multihashes are still valid V0 CIDs, old servers will continue to work.

Proposed changes:

- In handleAddProvider, just use strings/[]byte; don't parse anything (although we can assert that the length is < 100 bytes, or something like that). We could also just use multihashes, but then we'd need to change the multihash library to handle unknown multihash codes (which we may want to do anyway). (Note: there may be other ways to do this but, at the end of the day, we need to be able to look up the raw multihash of a block in the DHT.)

@kevina is working on this from the go-ipfs side, but he'll need someone on the libp2p side to implement these content-routing changes. Any takers?
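As a rough illustration of the proposed handler change (a sketch, not the actual go-libp2p-kad-dht code; `validateProviderKey` and `providerKeyForCid` are hypothetical names), the server only sanity-checks the key bytes, while a client derives the key from a CID by taking its multihash:

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// validateProviderKey is a hypothetical check a DHT server could run in
// handleAddProvider: bound the key length and confirm it parses as a
// multihash, without caring about CID versions or codecs.
func validateProviderKey(key []byte) error {
	if len(key) == 0 || len(key) > 100 {
		return fmt.Errorf("provider key has unreasonable length %d", len(key))
	}
	if _, err := mh.Cast(key); err != nil {
		return fmt.Errorf("provider key is not a valid multihash: %w", err)
	}
	return nil
}

// providerKeyForCid is what a client would send: the CID's underlying
// multihash, so CIDv0 and CIDv1 forms of the same block map to one DHT key.
func providerKeyForCid(c cid.Cid) []byte {
	return []byte(c.Hash())
}

func main() {
	// Build an example CID locally (arbitrary content) just to exercise the helpers.
	h, err := mh.Sum([]byte("example block"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, h)

	key := providerKeyForCid(c)
	fmt.Println(validateProviderKey(key)) // <nil>
}
```

The point of the sketch is that the server never needs to decode a CID: it can accept and index any reasonably sized multihash it receives.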
Also see https://github.com/ipfs/go-ipfs-blockstore/issues/8 and ipfs/kubo#5510 (comment) for context. I made an attempt to push it through. I basically changed the interface in libp2p/go-libp2p-routing#33 and then semi-blindly pushed this through. Everything seems to work, but not really knowing how the network layers of ipfs work, I am sure I missed things. Sharness tests are passing locally. Most of the significant changes are in libp2p/go-libp2p-kad-dht#203. Other changes are in: libp2p/go-libp2p-pubsub-router#14, libp2p/go-libp2p-routing-helpers#13, ipfs/go-bitswap#18, and ipfs/go-ipfs-routing#15.
Original IRC Conversation (mostly for my own reference):

> kevina: isn't your proposed change in the wire protocol?
@Stebalien I spent some time catching up on how IPFS works at the network layer and I am not sure I like your solution. For simplicity, let's assume we just use the raw binary string for the DHT and don't try to decode it into a CID or multihash. Once the newer nodes switch to using multihashes, this will (1) prevent older nodes from discovering CIDv1 content on newer nodes, as they will be looking up CIDv1 strings while newer nodes index based on the multihash, and (2) prevent newer nodes from discovering CIDv1 content on older nodes, for the same reason in reverse. If we do as you propose and reject older nodes in the newer nodes' DHT, I don't see how it will help with (1) or (2), and it may make the situation worse. I need to study this more, but your insight (or that of the libp2p folks) will be helpful.
I don't have any specific numbers, but my guess would be that the vast majority of requests are for CIDv0 content (note that for stuff with raw leaves we still usually start from a CIDv0 root), so this shouldn't have much impact on most people. Also, AFAIK bitswap will use some workaround for this, so connected nodes won't even notice. For the remaining few impacted users it should be easy enough to say "just update".
So, I can see two solutions:

Unfortunately, publishing these records is already really expensive and, luckily, we don't have many users of CIDv1 anyway. Most users of CIDv1 (and IPLD) are using JavaScript, which doesn't currently interoperate with the DHT. So, yeah, I'm not happy with any of these solutions, but I feel that:

is the most flexible approach.
A few (likely stupid) questions:

@Stebalien is multiple codecs for the same data a common/serious issue? Is the implication here that multihashes are effectively our "data identifiers", while CIDs are "content" (data + metadata) identifiers? I'm trying to understand 1) what CIDs are meant to be used for, if not identifying content, and 2) what the plan is for solving the multiple-addresses-per-content issue as sha256 becomes obsolete; don't we run into these issues all over again?
Not yet, but this could be useful for, e.g., packing IPLD data up into an archive while still deduplicating the underlying blocks. That is, it allows one to build multiple "alternative" DAGs/views with a single set of leaves.

Yeah, this is a bit funky. Really, CIDs are "typed" identifiers while multihashes identify blobs of data. We could also just use the raw data multicodec, but the current proposal of "just use multihashes" is the simplest solution, IMO.
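To make the "many CIDs, one multihash" point concrete, here is a small standalone example (the block contents and codec choices are arbitrary; it only uses the public go-cid and go-multihash APIs):

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	data := []byte("hello ipfs") // arbitrary example block

	// One hash of the block...
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	// ...but several distinct CIDs that all wrap that same multihash.
	v0 := cid.NewCidV0(h)                    // CIDv0 (implicitly dag-pb)
	v1pb := cid.NewCidV1(cid.DagProtobuf, h) // CIDv1, dag-pb codec
	v1raw := cid.NewCidV1(cid.Raw, h)        // CIDv1, raw codec

	fmt.Println(v0)    // Qm... string
	fmt.Println(v1pb)  // a different string
	fmt.Println(v1raw) // yet another string

	// All three report the same underlying multihash.
	fmt.Println(v0.Hash().B58String() == v1raw.Hash().B58String()) // true
}
```

This is exactly why keying provider records by multihash deduplicates the "same block, different CID" cases that CID-keyed records cannot.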
@Stebalien any chance we can do an easier version of this that doesn't require interface changes, so we can ship this soon?

Proposal: leave the interfaces the same, but send multihashes over the wire. Since a CIDv0 is a multihash, none of those records should change; CIDv1 records will change (so newer and older clients will not be able to find each other's CIDv1 data).

Is the CIDv1 record incompatibility a problem? If so, the only options I can think of right now are for clients to do multiple queries for a while before we deprecate publishing CIDv1s, or to rely on a DHT version bump to force the deprecation.
> @Stebalien any chance we can do an easier version of this that doesn't require interface changes, so we can ship this soon?
>
> Proposal: leave the interfaces the same, but send multihashes over the wire. Since a CIDv0 is a multihash, none of those records should change; CIDv1 records will change (so newer and older clients will not be able to find each other's CIDv1 data).

Yes. Let's do it.

> Is the CIDv1 record incompatibility a problem? If so, the only options I can think of right now are for clients to do multiple queries for a while before we deprecate publishing CIDv1s, or to rely on a DHT version bump to force the deprecation.

Hm. It may be a bit annoying for rendezvous (e.g., pubsub), but I think we'll be fine with a clean break.
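A minimal sketch of the "keep the interfaces, send multihashes over the wire" idea discussed above, with hypothetical helper names (`legacyKey`, `multihashKey`) rather than anything in the real codebase, showing why CIDv0 records are unaffected while CIDv1 records change:

```go
package main

import (
	"bytes"
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// legacyKey is (roughly) what older peers announce and look up: the full CID bytes.
func legacyKey(c cid.Cid) []byte {
	return c.Bytes()
}

// multihashKey is what newer peers would send instead: just the multihash.
func multihashKey(c cid.Cid) []byte {
	return []byte(c.Hash())
}

func main() {
	h, err := mh.Sum([]byte("some block"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	v0 := cid.NewCidV0(h)
	v1 := cid.NewCidV1(cid.Raw, h)

	// A CIDv0 is just its multihash, so the provider record key is unchanged...
	fmt.Println(bytes.Equal(legacyKey(v0), multihashKey(v0))) // true

	// ...while a CIDv1 key drops the version + codec prefix, which is the
	// record incompatibility discussed in the comments above.
	fmt.Println(bytes.Equal(legacyKey(v1), multihashKey(v1))) // false
}
```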
Closing this as multihashes are now being used.