Use raw multihashes for providers #755
Currently, we're using CIDs. However, multiple CIDs can map to the same block (v1, v0, different codecs). Due to the base32 CID migration effort, we'll need to switch over the routing system to use raw multihashes.

Note: Given that raw multihashes are still valid V0 CIDs, old servers will continue to work.

Proposed changes:

- In handleAddProvider, just use strings/[]byte; don't parse anything (although we can assert that the length is < 100 bytes, or something like that). We could also just use multihashes, but then we'd need to change the multihash library to handle unknown multihash codes (which we may want to do anyway). (Note: there may be other ways to do this but, at the end of the day, we need to be able to look up the raw multihash of a block in the DHT.)

@kevina is working on this from the go-ipfs side, but he'll need someone on the libp2p side to implement these content-routing changes. Any takers?
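As a rough illustration of the proposed handler change (a sketch, not the actual go-libp2p-kad-dht code; `validateProviderKey` and `providerKeyForCid` are hypothetical names), the server only sanity-checks the key bytes, while a client derives the key from a CID by taking its multihash:

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// validateProviderKey is a hypothetical check a DHT server could run in
// handleAddProvider: bound the key length and confirm it parses as a
// multihash, without caring about CID versions or codecs.
func validateProviderKey(key []byte) error {
	if len(key) == 0 || len(key) > 100 {
		return fmt.Errorf("provider key has unreasonable length %d", len(key))
	}
	if _, err := mh.Cast(key); err != nil {
		return fmt.Errorf("provider key is not a valid multihash: %w", err)
	}
	return nil
}

// providerKeyForCid is what a client would send: the CID's underlying
// multihash, so CIDv0 and CIDv1 forms of the same block map to one DHT key.
func providerKeyForCid(c cid.Cid) []byte {
	return []byte(c.Hash())
}

func main() {
	// Build an example CID locally (arbitrary content) just to exercise the helpers.
	h, err := mh.Sum([]byte("example block"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}
	c := cid.NewCidV1(cid.Raw, h)

	key := providerKeyForCid(c)
	fmt.Println(validateProviderKey(key)) // <nil>
}
```

The point of the sketch is that the server never needs to decode a CID: it can accept and index any reasonably sized multihash it receives.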
Also see https://github.com/ipfs/go-ipfs-blockstore/issues/8 and ipfs/kubo#5510 (comment) for context. I made an attempt to push it through. I basically changed the interface in libp2p/go-libp2p-routing#33 and then semi-blindly pushed this through. Everything seems to work, but not really knowing how the network layers of ipfs work, I am sure I missed things. Sharness tests are passing locally. Most of the significant changes are in libp2p/go-libp2p-kad-dht#203. Other changes are in: libp2p/go-libp2p-pubsub-router#14, libp2p/go-libp2p-routing-helpers#13, ipfs/go-bitswap#18, and ipfs/go-ipfs-routing#15.
Original IRC Conversation (mostly for my own reference):

> kevina: isn't your proposed change in the wire protocol?
@Stebalien I spent some time catching up on how IPFS works at the network layer and I am not sure I like your solution. For simplicity, let's assume we just use the raw binary string for the DHT and don't try to decode it into a CID or multihash. Once the newer nodes switch to using multihashes, this will (1) prevent older nodes from discovering CIDv1 content on newer nodes, as they will be looking up CIDv1 strings while newer nodes index based on the multihash, and (2) prevent newer nodes from discovering CIDv1 content on older nodes, for the same reason in reverse. If we do as you propose and reject older nodes in the newer nodes' DHT, I don't see how it will help with (1) or (2), and it may make the situation worse. I need to study this more, but your insight (or that of the libp2p folks) will be helpful.
I don't have any specific numbers, but my guess would be that the vast majority of requests are for CIDv0 content (note that for stuff with raw leaves we still usually start from a CIDv0 root), so this shouldn't have much impact on most people. Also, AFAIK bitswap will use some workaround for this, so connected nodes won't even notice. For the remaining few impacted users it should be easy enough to say "just update".
So, I can see two solutions:

Unfortunately, publishing these records is already really expensive and, luckily, we don't have many users of CIDv1 anyway. Most users of CIDv1 (and IPLD) are using JavaScript, which doesn't currently interoperate with the DHT. So, yeah, I'm not happy with any of these solutions, but I feel that:

is the most flexible approach.
A few (likely stupid) questions:

@Stebalien is multiple codecs for the same data a common/serious issue? Is the implication here that multihashes are effectively our "data identifiers", while CIDs are "content" (data + metadata) identifiers? I'm trying to understand 1) what CIDs are meant to be used for, if not identifying content, and 2) what the plan is for solving the multiple-addresses-per-content issue as sha256 becomes obsolete; don't we run into these issues all over again?
Not yet, but this could be useful for, e.g., packing IPLD data up into an archive while still deduplicating the underlying blocks. That is, it allows one to build multiple "alternative" DAGs/views with a single set of leaves.

Yeah, this is a bit funky. Really, CIDs are "typed" identifiers while multihashes identify blobs of data. We could also just use the raw data multicodec, but the current proposal of "just use multihashes" is the simplest solution, IMO.
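To make the "many CIDs, one multihash" point concrete, here is a small standalone example (the block contents and codec choices are arbitrary; it only uses the public go-cid and go-multihash APIs):

```go
package main

import (
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

func main() {
	data := []byte("hello ipfs") // arbitrary example block

	// One hash of the block...
	h, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	// ...but several distinct CIDs that all wrap that same multihash.
	v0 := cid.NewCidV0(h)                    // CIDv0 (implicitly dag-pb)
	v1pb := cid.NewCidV1(cid.DagProtobuf, h) // CIDv1, dag-pb codec
	v1raw := cid.NewCidV1(cid.Raw, h)        // CIDv1, raw codec

	fmt.Println(v0)    // Qm... string
	fmt.Println(v1pb)  // a different string
	fmt.Println(v1raw) // yet another string

	// All three report the same underlying multihash.
	fmt.Println(v0.Hash().B58String() == v1raw.Hash().B58String()) // true
}
```

This is exactly why keying provider records by multihash deduplicates the "same block, different CID" cases that CID-keyed records cannot.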
@Stebalien any chance we can do an easier version of this that doesn't require interface changes, so we can ship this soon?

Proposal: leave the interfaces the same, but send multihashes over the wire. Since a CIDv0 is a multihash, none of those records should change; CIDv1 records will change (so newer and older clients will not be able to find each other's CIDv1 data).

Is the CIDv1 record incompatibility a problem? If so, the only options I can think of right now are for clients to do multiple queries for a while before we deprecate publishing CIDv1s, or to rely on a DHT version bump to force the deprecation.
> @Stebalien any chance we can do an easier version of this that doesn't require interface changes, so we can ship this soon?
>
> Proposal: leave the interfaces the same, but send multihashes over the wire. Since a CIDv0 is a multihash, none of those records should change; CIDv1 records will change (so newer and older clients will not be able to find each other's CIDv1 data).

Yes. Let's do it.

> Is the CIDv1 record incompatibility a problem? If so, the only options I can think of right now are for clients to do multiple queries for a while before we deprecate publishing CIDv1s, or to rely on a DHT version bump to force the deprecation.

Hm. It may be a bit annoying for rendezvous (e.g., pubsub), but I think we'll be fine with a clean break.
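A minimal sketch of the "keep the interfaces, send multihashes over the wire" idea discussed above, with hypothetical helper names (`legacyKey`, `multihashKey`) rather than anything in the real codebase, showing why CIDv0 records are unaffected while CIDv1 records change:

```go
package main

import (
	"bytes"
	"fmt"

	cid "github.com/ipfs/go-cid"
	mh "github.com/multiformats/go-multihash"
)

// legacyKey is (roughly) what older peers announce and look up: the full CID bytes.
func legacyKey(c cid.Cid) []byte {
	return c.Bytes()
}

// multihashKey is what newer peers would send instead: just the multihash.
func multihashKey(c cid.Cid) []byte {
	return []byte(c.Hash())
}

func main() {
	h, err := mh.Sum([]byte("some block"), mh.SHA2_256, -1)
	if err != nil {
		panic(err)
	}

	v0 := cid.NewCidV0(h)
	v1 := cid.NewCidV1(cid.Raw, h)

	// A CIDv0 is just its multihash, so the provider record key is unchanged...
	fmt.Println(bytes.Equal(legacyKey(v0), multihashKey(v0))) // true

	// ...while a CIDv1 key drops the version + codec prefix, which is the
	// record incompatibility discussed in the comments above.
	fmt.Println(bytes.Equal(legacyKey(v1), multihashKey(v1))) // false
}
```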
Closing this as multihashes are now being used.