WIP: add the ipld plus dag-cbor protocol v1.1 #323

mvdan · 2021-12-17T18:28:38Z

(see commit message)

mvdan · 2021-12-17T18:29:12Z

This will fail for now; ipld-prime's bindnode doesn't support enums just yet.

mvdan

The only other mental TODO I have about the schema is how many times it repeats GraphSync as a prefix in the types. That could get pretty verbose in terms of Go code, errors, etc.

Perhaps we could name the package something like ipldgraphsync, so the types end up being ipldgraphsync.Request and so on?

message/ipldcbor/schema.ipldsch

rvagg · 2021-12-18T06:06:05Z

I think perhaps we should consider faster version matching in the schema. Eric wrote this up even before we had any code for schemas and it's a really good treatment of protocol versioning and applies here I think: https://ipld.io/docs/schemas/using/migrations/

e.g. nesting in a parent that has an explicit version makes it super simple and fast to select the right schema for the version you're decoding: {"graphsync-v1":[...]}. In fact, it can be even faster than that: because we're using dag-cbor, those initial bytes are fixed, so we can match the initial bytes for a version like a set of magic bytes.
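Rod's point about fixed initial bytes can be made concrete. The following is a minimal sketch, assuming a single-entry envelope map keyed by a version string such as "graphsync-v1" (the "graphsync-v2" key is hypothetical). Because dag-cbor requires the shortest possible headers, the bytes that open such a map are fully determined by the key, so the version can be picked with a prefix comparison before any real decoding. Only the Go standard library is used; this is not go-graphsync or go-ipld-prime code.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// versionPrefix returns the opening bytes of the dag-cbor encoding of a
// single-entry map keyed by a version string, e.g. {"graphsync-v1": ...}.
// dag-cbor mandates minimal-length headers, so for keys shorter than 24
// bytes these bytes depend only on the key.
func versionPrefix(key string) []byte {
	if len(key) >= 24 {
		panic("sketch only handles keys shorter than 24 bytes")
	}
	// 0xa1 = map (major type 5) with 1 entry;
	// 0x60|n = text string (major type 3) of length n.
	p := []byte{0xa1, 0x60 | byte(len(key))}
	return append(p, key...)
}

// knownVersions lists envelope keys, one per protocol version
// ("graphsync-v2" is a hypothetical example).
var knownVersions = []string{"graphsync-v1", "graphsync-v2"}

// detectVersion picks the version by matching the fixed prefix, without
// decoding the rest of the message.
func detectVersion(msg []byte) (string, error) {
	for _, v := range knownVersions {
		if bytes.HasPrefix(msg, versionPrefix(v)) {
			return v, nil
		}
	}
	return "", errors.New("unrecognised protocol version")
}

func main() {
	// Envelope {"graphsync-v1": []}: the value is an empty array (0x80).
	msg := append(versionPrefix("graphsync-v1"), 0x80)
	v, err := detectVersion(msg)
	fmt.Println(v, err) // graphsync-v1 <nil>
}
```

A keyed union representation in an IPLD schema encodes as this same single-entry map, so the check does not depend on how the envelope is declared.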

mvdan · 2022-01-06T10:35:04Z

I'll defer on faster version matching to you and Eric, who have more experience there :) For this WIP branch, I've literally just taken the schema that Hannah wrote in ipld/specs#354 - plus some very minor typo fixes.

rvagg · 2022-01-07T06:36:06Z

So this generates with your latest go-ipld-prime branch now, @mvdan, but we still have the missing Any union elements to deal with. But here's a thought: could we allow bridging in bindnode, such that if a struct contains an ipld.Node field, bindnode falls back to basicnode at that point? So, for this, we have the nice Go types all the way up to:

```go
// type GraphSyncExtensions {String:Any}

type GraphSyncExtensions struct {
	Keys   []string             // entry order of the map
	Values map[string]ipld.Node // Any: left as generic data model nodes
}
```

So we allow ipld-prime to decode whatever it wants into the values of the Values map and just accept we have to break out of our nice native Go types at that point.

This could even be the native way that bindnode codegen works when it encounters Any, allowing Any to be a hand-wave to "I don't really know what to expect here, but it could be any valid data model data", which is what Any really is.

Presumably (I don't know the bindnode API very well), when we pass one of these ipld.Node values up to a graphsync caller (like datatransfer, which has its own custom extensions), it could have its own codegen'd bindnode Go types for the specific extension values and it just calls into bindnode to apply the conversion (or it could be some schema-based ipld-prime types that it uses to validate it's got what it wants).
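Rod's fallback, where bindnode stops at Any and hands back generic data model nodes, leaves each consumer to poke at the basic structure itself. As a rough analogy only, the sketch below uses encoding/json decoding into any in place of dag-cbor and basicnode; transferExt, fromGeneric, and the "example/transfer" extension name are hypothetical, and nothing here is the bindnode API.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// transferExt is a hypothetical extension value that a consumer such as
// datatransfer might expect to find under its extension name.
type transferExt struct {
	TransferID int64
	Paused     bool
}

// fromGeneric inspects a generically decoded value (the stand-in for an
// ipld.Node built with basicnode) and converts it into the typed form.
func fromGeneric(v any) (transferExt, error) {
	m, ok := v.(map[string]any)
	if !ok {
		return transferExt{}, fmt.Errorf("expected a map, got %T", v)
	}
	id, ok := m["TransferID"].(float64) // encoding/json decodes numbers into any as float64
	if !ok {
		return transferExt{}, errors.New("missing or non-numeric TransferID")
	}
	paused, ok := m["Paused"].(bool)
	if !ok {
		return transferExt{}, errors.New("missing or non-boolean Paused")
	}
	return transferExt{TransferID: int64(id), Paused: paused}, nil
}

func main() {
	// The whole extensions map is decoded eagerly into generic values.
	wire := []byte(`{"example/transfer":{"TransferID":7,"Paused":false}}`)
	var exts map[string]any
	if err := json.Unmarshal(wire, &exts); err != nil {
		panic(err)
	}
	ext, err := fromGeneric(exts["example/transfer"])
	fmt.Println(ext, err) // {7 false} <nil>
}
```

The cost shows up in fromGeneric: every consumer writes its own shape checks against the generic form.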

warpfork · 2022-01-07T07:07:17Z

On versioning: I do still considerably favor the use of a dummy union with keyed representation. Fast parse.

The rest of that document that Rod linked could use a re-pass for improvements to brevity and focus, but the gist of "a dummy union that is fast to parse and is also evolvability-friendly should be good for all stories" seems to remain true.

mvdan · 2022-01-07T08:44:38Z

Just so we're all on the same page, here's what Hannah and I discussed at yesterday's lightning sync in terms of next goals:

1. Attempt to reuse the existing message package types for bindnode, even if it requires some tweaking.
2. Support Any in ipld-prime's schema package, like Rod says.
3. Use Any for the selector initially, as it simplifies the proof of concept, and the selector needs to be compiled to be verified anyway. We can swap it out for the actual selector schema later.
4. Deferred decoding support, akin to https://pkg.go.dev/encoding/json#RawMessage or what cbor-gen supports. This will be needed for the extensions map, similar to what Rod says.

We briefly discussed ways to implement point 4, because it's not clear in my mind. The simplest would be what you say, to decode in the simplest data model form at run-time on bindnode, and then let each extension poke at the basic structure to read its values.

The other approach, which both Hannah and I favor, is something closer to what json and cbor-gen support: a special field (be it with a special type, or a special struct field tag) that signals "just keep the original encoded bytes for now, I'll decode at a later time". Then, an extension can decode into its actual schema node as needed. This will require a bit of coordination between bindnode and the codec package in question, which I'll investigate a bit.

There's nothing stopping us from supporting both approaches, letting the user decide. But at least for graphsync extensions, I think the latter will be a better fit.

mvdan · 2022-01-10T20:51:43Z

Upstream ipld-prime now supports Enum and Any; no more major blockers for now, continuing with the refactor here.

mvdan · 2022-01-11T12:32:00Z

Status update: the schema and bindnode now work well, pending ipld/go-ipld-prime#328.

The tests for the message package all pass; the transition to/from proto still works OK.

Now we "just" have to adapt the rest of the codebase to the changes in the message types. In summary, they are:

Export the fields; for example, we replace root cid.Cid and Root() cid.Cid with just Root cid.Cid. This is a requirement for bindnode reflection to work. Relatively easy refactor, I think. Perhaps loses us some degree of read-only safety. If we wanted to keep it, we could keep separate types and ToIPLD/FromIPLD methods, but I think it's not worth it.
Extensions use ipld.Node as value rather than []byte. I've added a type NamedExtension struct { Name graphsync.ExtensionName; Data ipld.Node } for the API methods and funcs.
I've tried to keep blocks.Block in the API funcs and methods, for the sake of an easier transition. I've added helpers to convert back and forth with the protocol-native GraphSyncBlock.
Note that requests, responses, and blocks are now stored as in the protocol: as lists. This is nice as it removes the need for the "list accessors" that we had before.

@hannahhoward I'd love your eyes on what I've done so far before I go ahead with refactoring the rest of the codebase. If you disagree with any of the decisions I've made up to this point, it's going to be harder for me to change them later in the week :)

mvdan · 2022-01-11T12:33:00Z

cc @willscott in case you have input, too :)

hannahhoward

My main comment is on the changes to Message (public members instead of accessor methods).

On the one hand, every time graphsync assembles messages, it uses the builder, so we're probably not in much trouble in terms of assembling messages.

BUT the big problem you're going to find is in the use of these interfaces:
https://github.com/ipfs/go-graphsync/blob/main/graphsync.go#L134
https://github.com/ipfs/go-graphsync/blob/main/graphsync.go#L156

These are root-level public interfaces that are passed to every hook used with graphsync and are backed by the current graphsync request/graphsync response structs. So you're gonna have to wrap either way, because these simply can't change for now due to the breakage involved. :(

(plus giving external consumers of this code modify privileges to these structs seems dangerous)

Anyway, my gut is to push this into the message layer, keep the structs as close to as-is as possible, and have separate structs for the IPLD encoding/decoding. It's a bummer in terms of memory allocations and copying, but we already have that for protobuf if you dig into that code. Ultimately, the single largest part of a message by far is the data member of GraphSyncBlock, and that's ultimately just a pointer.
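Hannah's suggestion of keeping the message-layer structs as they are, with separate structs for IPLD encoding, might look roughly like this. This is a sketch under assumptions: field types are simplified (a string instead of cid.Cid), and ipldRequest and requestFromIPLD are hypothetical names; only the ToIPLD name comes from this thread.

```go
package main

import "fmt"

// Request keeps unexported fields behind read-only accessors, so the public
// interfaces passed to hooks stay unchanged. Field types are simplified
// stand-ins (a string instead of cid.Cid).
type Request struct {
	root     string
	priority int32
}

func (r Request) Root() string    { return r.root }
func (r Request) Priority() int32 { return r.priority }

// ipldRequest is a hypothetical wire-facing twin with exported fields, which
// is what reflection-based binding such as bindnode requires.
type ipldRequest struct {
	Root     string
	Priority int32
}

// ToIPLD copies the message-layer struct into its wire-facing form.
func (r Request) ToIPLD() ipldRequest {
	return ipldRequest{Root: r.root, Priority: r.priority}
}

// requestFromIPLD is the reverse conversion, applied after decoding.
func requestFromIPLD(w ipldRequest) Request {
	return Request{root: w.Root, priority: w.Priority}
}

func main() {
	req := Request{root: "bafyexample", priority: 5}
	back := requestFromIPLD(req.ToIPLD())
	fmt.Println(back == req, back.Root(), back.Priority()) // true bafyexample 5
}
```

Copying a struct that holds a []byte copies only the slice header, not the underlying array, which is why a large block data field adds little to the cost of such conversions.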

I would love to get @rvagg's take, though.

hannahhoward · 2022-01-11T22:13:07Z

message/builder.go

```diff
 			Name: graphsync.ExtensionMetadata,
-			Data: mdRaw,
+			Data: basicnode.NewBytes(mdRaw), // TODO: likely wrong
```


Metadata is itself a CBOR-encoded ipld.Node. We use cbor-gen for it, but we don't have to; it could just be bindnode, since it has a clear schema.

Moreover, if you look at my original spec proposal (albeit now hidden in an archive-only repo), one of the big changes was promoting Metadata to an actual member of the response. I'd really like to do that before we release this protocol for real. BUT that doesn't mean we need to do it in this PR. I don't think we should. Let's stay focused on getting the CBOR struct pushed through all of the code.

Background:

https://github.com/ipld/specs/pull/354/files#diff-593c970e9c16b6ef660f023f08d24e28e38667ad76f5a705011f4d873e2a3424R116

Actually, to bundle all the breaking changes at once, I'd really like to change the metadata schema itself from

```ipldsch
type GraphSyncMetadatum struct {
  # &Any here just means a Link
  link &Any
  blockPresent Bool
} representation tuple

type GraphSyncMetadata [GraphSyncMetadatum]
```

to

```ipldsch
type GraphSyncMetadatum struct {
  # &Any here just means a Link
  link &Any
  blockTraversalType Int
} representation tuple

type GraphSyncMetadata [GraphSyncMetadatum]
```

mvdan

Ah, right, metadata was one of those fields that got promoted out of an extension. The schema already includes what you posted on the archived repo. I'll fix up the schema as per your last note, and I'll leave a TODO about moving it in and out of the extensions map.

hannahhoward

Also, a quick apology:
We said the other day "just worry about encoding to IPLD and we'll worry about compatibility with proto". I think this might have been confusing guidance, because it implied we could change the main message/response/request structs a lot without a bunch of breakage throughout the codebase; this is not actually the case, I think.

mvdan · 2022-01-12T10:59:57Z

Thanks for the input! No worries about the confusion. Rod brought this up yesterday (whether or not we'd use separate types), and I already told him my plan was to figure it out on the way. I just wasn't aware that there were public interfaces to be implemented by the message types :)

I'll go ahead and implement something like ToIPLD/FromIPLD. On the plus side, the refactor across the rest of the repo will then be minimal. My only open question, then, is what I should do with ToNet/FromNet, as they currently assume protobuf. I think I'll leave them alone for now, as I'm not sure what the code will need to look like for graphsync to choose between the old and new encodings.

hannahhoward · 2022-01-12T19:12:21Z

@mvdan sounds excellent! Yes, don't worry about ToNet/FromNet -- @rvagg is in the middle of changing these anyway to support protocol versioning.

rvagg · 2022-01-13T05:53:05Z

@mvdan if you look at #332 there's version switching involved, with the key differentiation points being ToProto() and newMessageFromProto(). There's a V1 for both of these that does the current 1.0.0 format, but these methods do the "new" format, which is just the current format but with the request IDs being []byte. So what we need to do is replace these methods with the bindnode wrap and unwrap and other magic to get the wire bytes. As I've mentioned in that PR, the request ID is now a string internally but should be a dag-cbor Bytes over the wire, and that has been a bit of a problem so far.

You're welcome to hack on that branch if you want to try and wire that up, otherwise I'm happy to try and consume changes you make here, it's just a large divergence as we work in the same area of the codebase.

mvdan · 2022-01-13T10:04:15Z

I'll continue the refactor here for now, at least until I've got the equivalent of ToProto/FromProto but for IPLD. I don't think that should cause significant conflicts with your changes, because in theory I shouldn't need to modify the message package much.

I'll let you know when that second version is done. From that point, I imagine it would be easier for you or Hannah to take over and finish the protocol changes, because my IPLD-related involvement should be finished at that point - but I'm happy to do it another way if it's better for you :)

mvdan · 2022-01-13T15:52:35Z

ToIPLD and messageFromIPLD are implemented. See the benchmark, which shows both protobuf and IPLD round-tripping; you can run them via go test -bench=. -benchtime=1x.

Note that the IPLD one fails to roundtrip right now, as I haven't dealt with translating extensions. The current message APIs use []byte, whereas the new IPLD protocol uses the more generic ipld.Node, so I don't think it can work unless we tweak the API to support ipld.Node extensions.

Also note that there's still the replace for go-ipld-prime, but I'll get rid of that shortly once the last PR is merged.

rvagg · 2022-01-14T05:45:26Z

@mvdan if you wouldn't mind using the rvagg/uuid-rebasing branch, from #332, for additional work on this that would be helpful. Even cherry-picking has become painful, mainly because of some refactoring of message.go to split out a messageformat.go. I've also been iterating on your types in there—introducing some optional properties on the schema (and running into trouble, will open an issue in go-ipld-prime for you on this).

mvdan · 2022-01-14T09:44:27Z

ACK on working on the same branch, closing

mvdan marked this pull request as draft December 17, 2021 18:28

mvdan commented Dec 17, 2021

Two review threads on message/ipldcbor/schema.ipldsch (outdated, resolved)

hannahhoward reviewed Jan 11, 2022


This was referenced Jan 13, 2022:

feat(requestid): use uuids for requestids #313 (closed)
[Feature] UUIDs, protocol versioning, v2 protocol w/ dag-cbor messaging #332 (merged)

mvdan added 11 commits January 13, 2022 14:34

546aa22 WIP: add the ipld plus dag-cbor protocol v1.1
c06a9cd more WIP: upstream supports Any now
483c4f5 start using bindnode.Prototype
a6949b7 schema is fully working now
da225d3 remove the replace directive after the ipld merge
15e34dc start refactoring types
7e61e3c some more transition work in builder
37d1137 current types now bind to the schema
e41494b start adapting tests; custom type for extension names
ff6c3d9 go back to blocks.Block for the message funcs
3d9529a message tests build for now

mvdan added 8 commits January 13, 2022 14:34

17d0b7a nearly all tests passing
0186fd8 message tests passing
13b3ea7 remove bindnode-generated types, as we use the native ones now
c0d1a36 join ipldcbor package, no longer needed
be55f94 add first roundtrip benchmark
4800f1c move ipld cbor out of the message package again
01ecf66 add first version of to/from ipld
1647599 wrap up the deep comparisons

mvdan added 2 commits January 13, 2022 15:56

ed7254b drop the ipld-prime replace
6619a07 fix up ID vs Id, remove unintentionally copied code

mvdan closed this Jan 14, 2022
