Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use new blockstore that stores Multihashes and not CIDs. #5510

Closed
wants to merge 3 commits into from

Conversation

kevina
Copy link
Contributor

@kevina kevina commented Sep 22, 2018

Depends on ipfs/go-ipfs-blockstore#13.

Will not even compile yet.

Pressing issues that need to be addressed:

  1. The dagservice has a Delete method. This is problematic. For example if you insert a raw and a Protobuf version of the same Node then delete the one of those nodes via the dagservice both will be deleted. We should probably delete these methods. The only place the Delete method seams to be used is in dagutils/utils.go. I have marked two of the three instances with comments. Also see RFC: Remove Delete methods from DagService go-ipld-format#43

  2. Due to the choice to keep Bitswap using Cid's there is one place where we have multihashes but need Cid's. For now I converted them to Raw CIDs for lack of anything better to do. See NewBlockstoreProvider in exchange/reprovide/providers.go (https://github.com/ipfs/go-ipfs/pull/5510/files#diff-a23df8bd7581d29e09f90ef684b34631L16). Is this function even used, that is do we ever publish the complete lists of everything in the blockstore, and if so will this create a problem.

    As an aside, in these cases I believe we should introduce a new Unknown or Invalid codec type to mark the fact that we really don't know the original Codec as it was created via a multihash. Raw may be technically correct but is less informative. This was discussed in blockstore: switch from Cid to Multihash boxo#361 (with the last comment being blockstore: switch from Cid to Multihash boxo#361)

  3. Raw multihashes will now be exposed to the users in several places. Two of those places are ipfs refs local and ipfs repo gc. How should this be handled. Here are some options (a) Use the default string method for multihashes which is a hex string (with no multibase prefix) (b) Tack on a multibase prefix and then display it like a CID (c) Make it a Cid using the Raw codec and display that (d) Make it a Cid using a new codec type to specify an Unknown or Invalid codec as discussed in (2). Right now am using (a) as it makes it very obvious that what is being returned is not a Cid as cids are hardly ever encoded in hex. (b) and (c) don't make it obvious what is going on. The special codec type might be helpful in other situations but it is less obvious than (a) on visual inspection.

CC @Stebalien @whyrusleeping

@kevina kevina added the topic/cidv1b32 Topic cidv1b32 label Sep 22, 2018
@kevina kevina requested a review from Kubuxu as a code owner September 22, 2018 07:36
@ghost ghost assigned kevina Sep 22, 2018
@ghost ghost added the status/in-progress In progress label Sep 22, 2018
@@ -70,7 +70,7 @@ func addLink(ctx context.Context, ds ipld.DAGService, root *dag.ProtoNode, child
return nil, err
}

_ = ds.Remove(ctx, root.Cid())
//_ = ds.Remove(ctx, root.Cid())
Copy link
Contributor Author

@kevina kevina Sep 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of this call to remove.

@@ -124,7 +124,7 @@ func (e *Editor) insertNodeAtPath(ctx context.Context, root *dag.ProtoNode, path
return nil, err
}

_ = e.tmp.Remove(ctx, root.Cid())
//_ = e.tmp.Remove(ctx, root.Cid())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working on a temporary in memory dag service. Maybe we should (1) implement a direct in memory dag service that does that even use a blockservice or (2) just use a different data structure, like maybe something as simple as a go map?

@Kubuxu
Copy link
Member

Kubuxu commented Oct 9, 2018

How would publishing providers records work with this?

@Stebalien
Copy link
Member

1

Honestly, I think we can just keep it and say "be careful". We're not really going to be making this worse because we can already (accidentally) reuse nodes from other DAGs. Basically, there's no way to know that dag.Delete(cid) won't interfere with some other DAG. A safe interface would use pins or a key-value store but we can work on designing that later.

2

This will, unfortunately, break bitswap compatibility with existing nodes. This is also a problem for providing. We'll have to pull in @diasdavid and @whyrusleeping because I know they've done quite a bit of thinking on this but here are some ideas:

DAG-based-bitswap

We can use a DAGService instead of a Blockservice to drive bitswap. We'll do this with graphsync anyways.

a. Unfortunately, for full compatibility, we may still have to issue bitswap queries and provider records for both CID versions (which is going to suck and may yield duplicate blocks). However, we can (a) make all nodes accept any CID version and (b) eventually stop sending CIDv0 queries. IMO, this is the cleanest solution from a "where we're going" standpoint.

b. We could also push this issue up the stack and say that the user (programmer) must make both requests if they're unsure (e.g., they've received a base32 CID from the API). This may be the best solution from a performance standpoint but it will annoying to use...

c. For slightly worse compatibility, we could forcibly downgrade all CIDv1 DagPB CIDs to CIDv0 when we hit the provider system/bitswap. However, that's going to suck for IA (which uses CIDv1 DagPB CIDs, IIRC). It's also completely backwards because we're trying to upgrade to CIDv1, not downgrade.

Just use multihashes

Allow raw multihashes and eventually switch over to them entirely. Basically:

  • A CID will (for now) always start with 0x01 and a multihash will start with something else.
  • CIDv0 CIDs are already valid raw multihashes.

This means CIDv0 CIDs will still work. However, this is kind of crappy from a compatibility perspective and looses information relative to the dag-based-bitswap solution.

Upgrade the network

This is basically the solution you've presented plus an upgrade path:

  1. Make bitswap serve requests regardless of the codec (i.e., plug in a blockstore that doesn't care about the codec).
  2. Upgrade the network.
  3. Start asking for blocks using the raw codec.

However, this would be a major break.

New Bitswap Version

We can, in fact, introduce a new bitswap version. This solution can be combined with one of the other solutions. For example, it can be combined with DAG-based-bitswap, variant a: Per-peer, bitswap decides to either send (a) both CIDs or (b) a single CIDv1 CID (or a raw multihash, or a raw CID, etc).

3

So, multihashes + multibase is probably the most correct solution. Unfortunately, they're ambiguous with CIDs so it's probably confusing from a user perspective.

As you say, we could also use hex-based multihashes (0x...). However, that would be the only place in all of IPFS where we'd be using them so I'm a bit reluctant to go with that. Basically, I'd rather not introduce a new concept to users.

Honestly, I think raw nodes is probably the right balance. We could also have a special codec for "unknown" but I think we should talk to users to see what they think before we add a new codec. My primary reservation is that I'd rather not (a) have a codec that can't be decoded or (b) have a codec duplicating raw. But we'll have to discuss this with users.

@kevina
Copy link
Contributor Author

kevina commented Oct 11, 2018

@Stebalien when you get a minute, can you outline the results of our discussion on lunch today (Thursday the 11th of October) to make sure we are all on the same page.

@kevina
Copy link
Contributor Author

kevina commented Oct 12, 2018

@Stebalien

However, that would be the only place in all of IPFS where we'd be using them so I'm a bit reluctant to go with that. Basically, I'd rather not introduce a new concept to users.

I think it better to introduce a new concept rather then mislead users. By using the raw codec we might give the user the impression that nothing but raw leaves are stored in the blockstore.

We could also have a special codec for "unknown" but I think we should talk to users to see what they think before we add a new codec. My primary reservation is that I'd rather not (a) have a codec that can't be decoded or (b) have a codec duplicating raw. But we'll have to discuss this with users.'

I am going to have to disagree with you on this one. For (a) that is kind of the point. It is a red flag that something is special about this CID. For (b) I do not consider it a duplicate at all, it conveys a different meaning.

@kevina
Copy link
Contributor Author

kevina commented Oct 12, 2018

1

Honestly, I think we can just keep it and say "be careful". We're not really going to be making this worse because we can already (accidentally) reuse nodes from other DAGs. Basically, there's no way to know that dag.Delete(cid) won't interfere with some other DAG. A safe interface would use pins or a key-value store but we can work on designing that later.

I am not exactly conformable with this option, but you are not going to get much of an argument from me.

@Stebalien
Copy link
Member

Conclusions from the lab-week discussion:

  1. The blockservice/blockstore continue to take CIDs for Add/Get. The blockservice returns multihashes from AllNodesChan. I believe this is what you're already doing.
  2. Bitswap continues to work with CIDs.
  3. We change provide and find providers to take raw multihashes.
  4. Locally, wantlists need to be indexed by multihash, not CID. That is map[Multihash][]Cid (or something like that).

@kevina
Copy link
Contributor Author

kevina commented Oct 16, 2018

@Stebalien thanks. I work to push those changes though and let you know if I run into any problems.

@lidel
Copy link
Member

lidel commented Dec 9, 2021

Superseded by #6816

@lidel lidel closed this Dec 9, 2021
@lidel lidel deleted the kevina/multihash branch December 9, 2021 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/in-progress In progress topic/cidv1b32 Topic cidv1b32
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants