-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add the ipld plus dag-cbor protocol v1.1 #323
Conversation
This will fail for now; ipld-prime's bindnode doesn't support enums just yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other mental TODO I have about the schema is how many times it repeats GraphSync as a prefix in the types. That could get pretty verbose in terms of Go code, errors, etc.
Perhaps we could name the package something like ipldgraphsync
, so the types end up being ipldgraphsync.Request
and so on?
I think perhaps we should consider faster version matching in the schema. Eric wrote this up even before we had any code for schemas and it's a really good treatment of protocol versioning and applies here I think: https://ipld.io/docs/schemas/using/migrations/ e.g. nesting in a parent that has an explicit version makes it super simple an fast to select the right schema for the version you're decoding: |
I'll defer on faster version matching to you and Eric, who have more experience there :) For this WIP branch, I've literally just taken the schema that Hannah wrote in ipld/specs#354 - plus some very minor typo fixes. |
So this generates with your latest go-ipld-prime branch now @mvdan, but we still have the missing // type GraphSyncExtensions {String:Any}
type GraphSyncExtensions struct {
Keys []string
Values map[string]ipld.Node
} So we allow ipld-prime to decode whatever it wants into the values of the This could even be the native way that bindnode codegen works when it encounters Presumably (I don't know the bindnode API very well), when we pass one of these |
On versioning: I do still considerably favor use of a dummy union of keyed representation. Fast parse. The rest of that document that Rod linked could use a re-pass for improvements to brevity and focus, but the gist of "a dummy union that is fast to parse and is also evolvability-friendly should be good for all stories" seems to remain true. |
Just so we're all on the same page, here's what Hannah and I discussed at yesterday's lightning sync in terms of next goals:
We briefly discussed ways to implement point 4, because it's not clear in my mind. The simplest would be what you say, to decode in the simplest data model form at run-time on bindnode, and then let each extension poke at the basic structure to read its values. The other approach, which both Hannah and I favor, is something closer to what json and cbor-gen support: a special field (be it with a special type, or a special struct field tag) that signals "just keep the original encoded bytes for now, I'll decode at a later time". Then, an extension can decode into their actual schema node as needed. This will require a bit of coordination between bindnode and the codec package in question, which I'll investigate a bit. There's nothing stopping us from supporting both approaches, letting the user decide. But at least for graphsync extensions, I think the latter will be a better fit. |
Upstream ipld-prime now supports Enum and Any; no more major blockers for now, continuing with the refactor here. |
Status update: the schema and bindnode now work well, pending ipld/go-ipld-prime#328. The tests for the message package all pass; the transition to/from proto still works OK. Now we "just" have to adapt the rest of the codebase to the changes in the message types. In summary, they are:
@hannahhoward I'd love your eyes on what I've done so far before I go ahead with refactoring the rest of the codebase. If you disagree with any of the decisdions I've done up to this point, it's going to be harder for me to change them later in the week :) |
cc @willscott in case you have input, too :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main comment on the changes to Message (Public members instead of accessor methods)
On the one hand, every time graphsync assembles messages, it uses the builder, so we're probably not in much trouble in terms of assembling messages.
BUT the big problem is your going to find is in the use of these interfaces:
https://github.com/ipfs/go-graphsync/blob/main/graphsync.go#L134
https://github.com/ipfs/go-graphsync/blob/main/graphsync.go#L156
These are root level public interfaces that are passed to every hook used with graphsync and backed by the current graphsync request/graphsync response structs. So, you're gonna have to wrap either way, cause these simply can't change for now cause of the breakage involved. :(
(plus giving external consumers of this code modify privileges to these structs seems dangerous)
Anyway, my gut is to push this into the message layer, keep the structs as close to as is as possible, and have seperate structs for the IPLD encoding/decode. It's a bummer in terms of memory allocations and copying, but then we already have that for protobuf if you dig into that code. Ultimately, the single largest part of a message by far is the data
member of GraphSyncBlock, and that's ultimately just a pointer.
I would love to get @rvagg 's take though
message/builder.go
Outdated
Name: graphsync.ExtensionMetadata, | ||
Data: mdRaw, | ||
Data: basicnode.NewBytes(mdRaw), // TODO: likely wrong |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Metadata is itself a CBOR encoded ipld.Node. We use CBOR-gen but don't have to -- it could just be bindnode -- it's a clear schema.
Moreover, if you look at my original spec proposal (albeit now hidden in an archive only repo), one of the big changes was promoting Metadata to an actual member of the response -- I'd really like to do that at least before we release this protocol for real. BUT that doesn't mean we need to do it in this PR. I don't think we should. Let's stay focused on getting the CBOR struct pushed through all of the code.
Background:
actually, to wrap all the breaks at once, I'd really like to change the metadata schema itself from
type GraphSyncMetadatum struct {
link &Any // &Any here just means a Link
blockPresent Bool
} representation tuple
type GraphSyncMetadata [GraphsyncMetadatum]
to
type GraphSyncMetadatum struct {
link &Any // &Any here just means a Link
blockTraversalType Int
} representation tuple
type GraphSyncMetadata [GraphsyncMetadatum]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right, metadata was one of those that got promoted out of an extension. The schema already includes what you posted on the archived repo. I'll fix up the schema as per your last note, and I'll leave a TODO about doing the moving in/out of the extensions map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a quick apology:
We said the other day "just worry about encoding to IPLD and we'll worry about compatibility with proto". I think this might have been confusing guidance, cause it implied we could change the main message/response/request struct a lot without a bunch of breakage through out the code base.... this is not actually the case I think.
Thanks for the input! No worries about the confusion. Rod brought this up yesterday (whether or not we'd use separate types), and I already told him my plan was to figure it out on the way. I just wasn't aware that there were public interfaces to be implemented by the message types :) I'll go ahead and implement something like ToIPLD/FromIPLD. On the plus side, then the refactor across the rest of the repo will be minimal. My only open question, then, is what I should do with ToNet/FromNet as it currently assumes protobuf. I think I'll leave that alone for now, as I'm not sure what the code will need to look like for graphsync to choose between the old and new encodings. |
@mvdan if you look at #332 there's version switching involved, with the key differentiation points being You're welcome to hack on that branch if you want to try and wire that up, otherwise I'm happy to try and consume changes you make here, it's just a large divergence as we work in the same area of the codebase. |
I'll continue the refactor here for now, at least until I've got the equivalent of ToProto/FromProto but for IPLD. I don't think that should cause significant conflicts with your changes, because in theory I shouldn't need to modify the message package much. I'll let you know when that second version is done. From that point, I imagine it would be easier for you or Hannah to take over and finish the protocol changes, because my IPLD-related involvement should be finished at that point - but I'm happy to do it another way if it's better for you :) |
ToIPLD and messageFromIPLD are implemented. See the benchmark, which shows both protobuf and IPLD round-tripping; you can run them via Note that the IPLD one fails to roundtrip right now, as I haven't dealt with translating extensions. The current message APIs use Also note that there's still the replace for go-ipld-prime, but I'll get rid of that shortly once the last PR is merged. |
@mvdan if you wouldn't mind using the |
ACK on working on the same branch, closing |
(see commit message)