-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract dagservice and friends from go-ipfs #8
Conversation
Context: It turns out I will need this to get PubSub+IPLD working. IPLD without the DAG is kind of useless. @whyrusleeping it appears I can't assign reviewers on this repo so I'm going to informally throw this on your plate 🏋️♂️. @kevina You also have a fair amount of experience with this subsystem. |
merkledag.go
Outdated
// | ||
// Why *not* do this? We might decide to store a light-weight DAG of links | ||
// without actually storing the data. I don't really find that to be a | ||
// convincing argument. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plan with this interface is to have link cache/graph DB at some time in the future, fetching object from disk to parse them to just see the links adds not that minor overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I assume this interface needs to be updated for IPLD? Non-IPFS IPLD nodes don't fill out the Name
and Size
fields of the Link
structure (as a matter of fact, they can't accurately fill out the Size
field).
I assume such an interface would want to return an array of:
Link {
Path []string
Cid *cid.Cid
}
(for pathing through "light-weigh" objects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The linkservice interface is useful for optimizing graph traversals. Large ipld graphs might cache links for a given node to make resolving down a path much much faster.
I agree that there might be a cleaner way, but your proposal above includes getting a node, and then calling getlinks on it. the problem there is that you have to spend time and memory unmarshaling the bytes from disk into the in memory object and then calling another function on it that returns the link array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large ipld graphs might cache links for a given node to make resolving down a path much much faster.
Currently (I hope), this is just used for pinning/GC. The only Node type that fills in the Name field of Link structs is PBNode so this can't be used for resolving paths. (but it's obviously still useful or GC/pinning).
An alternative is to use upcasting/specialization (or whatever go calls it) to simplify DAGService implementations:
// NodeGetters can optionally implement this interface to make
// finding linked objects faster.
type GetLinks interface {
NodeGetter
GetLinks(ctx context.Context, nd *cid.Cid) ([]Link, error)
}
func GetLinks(ctx context.Context, ng NodeGetter, nd *cid.Cid) ([]Link, error) {
if c.Type() == cid.Raw {
return nil, nil
}
if gl, ok := ng.(GetLinks); ok {
return gl.GetLinks(ctx, nd)
}
node, err := ng.Get(ctx, c)
if err != nil {
return nil, err
}
return node.Links(), nil
}
However, I don't know how prevalent this pattern is in the go-ipfs codebase and whether or not it's worth introducing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, using the upcasting/specialization pattern here sounds good to me. I also believe youre right about this only being used for GC and pinning
promise.go
Outdated
// 1. NodeGetter is a naming conflict. | ||
// 2. It's a promise... | ||
|
||
// TODO: Should this even be an interface? It seems like a simple struct would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be a simple struct as opposed to an interface. I'm not sure i can see a scenario where we would want to implement this differently
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the thread-safety guarantees here? It looks like single-writer/single-reader. Is that the case?
merkledag.go
Outdated
Remove(Node) error | ||
|
||
// TODO: This is returning them in-order?? Why not just use []NodePromise? | ||
// Maybe add a couple of helpers for getting them in-order and as-available? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is returned as a channel to make its usage in caller contexts much simpler. As a channel, we can just call range
over it to do a 'for each node' type thing. Using an array of things, we would have to range over the array, then call 'get' on each inside the loop, which is slightly more complicated. Maybe we can change this in the future, but for now i think its fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, it's just a bit inconsistent. I'll leave a shorter todo.
merkledag.go
Outdated
|
||
// An interface for batch-adding nodes to a DAG. | ||
|
||
// TODO: Is this really the *right* level to do this at? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for this abstraction is to avoid forcing callers to think about batching. This batch can determine the right time to flush on its own for optimal memory vs batch size tradeoffs, if done differently, each caller would have to reimplement similar logic, and that increases the chance of them doing something incorrectly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of defining some form of func Batching(dag DAGService) Batch
function where Batch
is a concrete helper type that calls AddMany
as needed.
Pro: You don't need to implement a custom batching strategy per DAGService.
Con: You can't implement a custom batching strategy for individual DAGServices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that sounds good. Removing Batch
as a method on the dagservice seems fine to me.
Thoughts @Kubuxu ?
I've addressed the comments and done a little cleanup. To discuss
|
Is there a p.r in go-ipfs for using this new interface? There are several things that concern me in particular I would not approve of this p.r. until there is a corresponding p.r. in go-ipfs that passes all the tests. I don't have a strong opinion either way for LinkGetter vs LinkService as long as the new interface works with go-ipfs. |
@kevina the suggestion for replacement is here: ipfs/kubo#4009 |
Actually, for now, I've added an
I'd rather not spend the time refactoring IPFS until we've agreed on these interfaces.
This PR (intentionally) doesn't actually touch any code that IPFS currently uses. The idea is to extract some IPLD interfaces/code from the go-ipfs repo into this repo and then later update the go-ipfs repo to depend on them (removing the duplicate code from the go-ipfs repo in the process). |
I will have a hard time reviewing the interface without seeing it in use. There are minor details that could be missed. I could probably do the actual refactoring in a few hours. The time consuming part for me would be the gx-publish part. @whyrusleeping thoughts |
I'll take that as "this looks fine but my review can't progress without seeing this in use". I'll go ahead and implement that PR, I just didn't want to spend time on that without getting any feedback here (as it is, that would have been a waste as the code here has changed over the course of this PR). |
|
**DO NOT MERGE!** Refactor IPFS to use: ipfs/go-ipld-format#8 Context: IPLD isn't currently very usable from libp2p because interfaces like the DAGService are bundled into go-ipfs. This PR extracts these interfaces and some of the related helper functions. License: MIT Signed-off-by: Steven Allen <[email protected]>
merkledag.go
Outdated
type LinkGetter interface { | ||
NodeGetter | ||
// TODO(ipfs/go-ipld-format#9): This should return []*cid.Cid | ||
GetLinks(ctx context.Context, nd *cid.Cid) ([]*Link, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One also important reason this interface exists is that raw nodes can't have links so we don't have to fetch them to enumerate their links. This optimization was placed in LinkService.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but I see you placed that short-circuit in helper bellow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; the idea is to avoid requiring that all DAGServices implement this method (even it they don't cache links).
Plan: #23 |
@Stebalien the build is now failing, it looks like you may of forgot to commit something. |
eaf2cad
to
5ed5f98
Compare
Sorry, I'm concurrently working on the next PR (refactoring the Node API). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, Is there anything else that needs to be done here?
I have a second PR that refactors the node API but that's blocked on finalizing the IPLD spec. |
link to PR? and why is it blocked on the spec? what are the open questions? (and where can we answer them) |
I haven't push it yet. I was waiting on answering the questions: Does IPLD have primitive types? If so, what? (#25). (This is that thing I keep on trying to schedule a discussion about). |
My answer was no, it doesn't have canonical types
…On Wed, Oct 4, 2017, 4:59 PM Steven Allen ***@***.***> wrote:
I haven't push it yet. I was waiting on answering the questions: Does IPLD
have primitive types? If so, what? (#25
<#25>).
(This is that thing I keep on trying to schedule a discussion about).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABL4HJKNMz2EB56wtECAmq9a5njVtMOOks5so48ngaJpZM4ODF3C>
.
|
(ipfs/kubo#4296) 1. Modern storage devices (i.e., SSDs) tend to be highly parallel. 2. Allows us to read and write at the same time (avoids pausing while flushing). fixes ipfs/kubo#898 (comment)
1a40eb0
to
0408f8d
Compare
@whyrusleeping You can always add that to the copy of However, we should probably just move forward with this anyways (and leave fixing the node API to a future PR). I have a few more questions open for discussion but we can punt on those if you want. |
@@ -0,0 +1,113 @@ | |||
package format |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lot of code for an interface package. Unfortunately, the interface needs this (unless we decide to introduce a Batch
interface.
merkledag.go
Outdated
NodeGetter | ||
|
||
// Add adds a node to this DAG. | ||
Add(Node) (*cid.Cid, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this really need to return the Cid? The caller can also just call node.Cid()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
merkledag.go
Outdated
// | ||
// Consider using NewBatch instead of calling this directly if you need | ||
// to add an unbounded number of nodes to avoid buffering too much. | ||
AddMany([]Node) ([]*cid.Cid, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. Worse, this will allocate something that we're likely to ignore (that's why I bothered bringing it up).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably fine to not have it return cids, as you say, though there might be cases where we need an array of cids directly after calling AddMany, and that will kindof suck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we can just create a Cids([]Node) []*cid.Cid
function for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that SGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree also.
merkledag.go
Outdated
// Remove removes a node from this DAG. | ||
// | ||
// If the node is not in this DAG, Remove returns ErrNotFound. | ||
Remove(*cid.Cid) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we also want RemoveMany(...)
(for, e.g., dag-level GC)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably punt on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes.
} | ||
|
||
// DAGService is an IPFS Merkle DAG service. | ||
type DAGService interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all of these methods take contexts? Currently, they don't because they never hit the network. However, I assume we'll eventually want remote DAGs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add it now.
Actually, I'm just going to punt on all of those questions and merge this if I don't get any objections within the next few hours. 🚀 |
I wanted to give it a review tomorrow morning. |
@Kubuxu got it. I'll hold off. FYI, I've decided to punt on fixing the Node API till later so we can get stop blocking this on "one more question". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, and also I would love to see those interface changes implemented.
// Node is the base interface all IPLD nodes must implement. | ||
// | ||
// Nodes are **Immutable** and all methods defined on the interface are | ||
// **Thread Safe**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Nodes are immutable, what is the reason for Copy
method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
None. I was planning on fixing that in this PR but have punted on fixing Nodes till a future one.
} | ||
|
||
// DAGService is an IPFS Merkle DAG service. | ||
type DAGService interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to add it now.
merkledag.go
Outdated
NodeGetter | ||
|
||
// Add adds a node to this DAG. | ||
Add(Node) (*cid.Cid, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
merkledag.go
Outdated
// Remove removes a node from this DAG. | ||
// | ||
// If the node is not in this DAG, Remove returns ErrNotFound. | ||
Remove(*cid.Cid) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO yes.
merkledag.go
Outdated
// | ||
// Consider using NewBatch instead of calling this directly if you need | ||
// to add an unbounded number of nodes to avoid buffering too much. | ||
AddMany([]Node) ([]*cid.Cid, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree also.
1. Add is already idempotent. This makes remove match. 2. Generally, all we care about is that the node no longer exists in the DAG. If two callers remove a node at the same time, they should both succeed. Currently, we *ignore* the result of Remove in go-ipfs. IMO, it would be better to let errors be *errors* and only return an error if something goes wrong. 3. This can be significantly faster. It allows us to batch/queue removes (as long as we guarantee that they'll eventually happen). 4. This matches how most databases/key-value stores operate. An alternative would be to return `(deleted bool, err error)` but then we don't get the speedup.
The caller can just call `node.Cid()` and returning CIDs from `AddMany` requires allocation.
We'll need these for slower/remote datastores.
I considered (well, implemented then threw it away) allowing contexts on all calls to Batch (Add, Commit, etc). However, really, you should treat a batch as a single large "operation". I also went down the road of generalizing batches to sessions. However, it became immediately obvious that permitting add *and* remove *and* fetch would require a lot of bookkeeping and that you'd lose a lot of performance. So, we'll do that separately.
@Kubuxu I've decided to just go ahead and address all your comments/the todos now.
Also, improved Batch by adding context support (we'll need that to use batch from the API). However, this is starting to become an ever growing PR so I should probably stop now... Final review? |
If we're on the breaking interfaces train already, i'd love to get a Also fine having that be in a follow-up too though |
@whyrusleeping we could also just lazily deserialize nodes. Are you worried about the intermediate buffer? |
@Kubuxu IIRC, you wanted to do a final review. |
This p.r. sorta interferes with my effort to create better error messages, which I believe will require some API changes to make it happen. I would like to work though ipfs/kubo#4468 first as it is easier to modify things when they are all in the same package. But if this is blocking something I don't want to hold anything in,espacally considering I just created the p.r. yesterday. |
@kevina I'd like to push this through ASAP. @whyrusleeping wants some of these changes and this has been sitting here for months. |
This is a WIP/RFC attempt at extracting DAGService from go-ipfs. Please see the TODOs in the source files as there's a lot to discuss here.
However, the first question is: Is this even the right place to put this? I could create an go-ipfs-dag repo.