DagJSON/DagCBOR encode+decode changes selector traversal order? #220

hannahhoward · 2021-08-05T00:01:12Z

I attempted to upgrade go-graphsync to master and found my tests are now broken.

It looks like we added code to "sort" fields on encode/decode from DAG-JSON / DAG-CBOR per the IPLD spec I guess?

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

It's also of course important cause selector traversal order remaining constant is a fairly important gaurantee, at least for Filecoin.

I've produced a test to demonstrate the behavior.

Not sure if this is "works as intended", but definitely quite unexpected and will break several things.

I wonder if it makes sense to force things to get added "in order" when building nodes?

Oh also this might cause a lot of data to change CIDs as well.

willscott · 2021-08-05T04:49:41Z

Relaying some useful context from @rvagg :

the state of world regarding map key sorting is

Old Go stack sorts RFC7049 on encode (via refmt)
Old and new JS stack sorts RFC7049 on encode
ipld-prime did no sorting until now, due to ongoing arguments about where the responsibility for sorting lies, but also for performance concerns
cbor-gen does its own form of sorting, simple bytewise, but maps are not used on main chain at all

But the reason why this isn’t the end of the world is that we don’t care about round-trip consistency in most cases (we do in some, i.e. chain consensus, which is why we need to be careful and why avoiding maps (and floats) is a great choice). On decode, what matters most of the time is that our implementations see the fields in the order that they are encoded in. So ipld-prime does this, traversals will work on the blocks as they are so selector ordering is stable. We have a problem with our JS stack in that we don’t preserve this ordering properly on decode, which is one reason why selectors in JS are a bit of a problem.

warpfork · 2021-08-10T08:55:43Z

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

^ I'm also here. I think we've made more footguns by enforcing this sorting than we remove.

But I feel I've been outvoted.

mvdan · 2021-08-10T16:45:16Z

@rvagg said:

On decode, what matters most of the time is that our implementations see the fields in the order that they are encoded in. So ipld-prime does this, traversals will work on the blocks as they are so selector ordering is stable.

Just so I can understand what's going on here better: what's causing breakage here is not the decoding, but the encoding, since the encoding by default now does the sorting as per the IPLD codec spec.

And again as per Rod's input, it seems like this will generally be fine if users avoid tricky types like maps, where the key order can add edge cases.

So if we have a system that already uses maps and requires determinism with encode-decode roundtrips, could we not compromise by forcing the codec Encode calls use "no map sorting"? That has existed since 083c395.

The question then might be "why should the default be to sort and break those users"? If we swapped the default to not sort map keys for the user, in my opinion we open up even bigger footguns, as the implementation of the codecs would simply not do what the spec requires by default.

Comparatively, deterministic traversal order when using encode-decode roundtrips while using maps feels like a fairly advanced use case. I think it's more reasonable for those users to opt out via MapSortMode or an equivalent option :)

mvdan · 2021-08-10T16:49:44Z

To add another bit of info that might get lost in Slack, Rod has revived the cbor-gen PR to also make it follow the spec: whyrusleeping/cbor-gen#56

rvagg · 2021-08-11T03:26:17Z

(slightly philosophical rant below, feel free to gloss)

IMO the horse has bolted on this, sorting was baked into the codec usage for IPLD, as far as I understand from the beginning, and it's based on the 7049 suggestions. If you want to do something else wrt map ordering then that should probably be a different codec.

One of the primary goals is that of convergence from data produced by multiple systems. When you only have a single implementation then whatever but we're already starting off with two equally prominent stacks and when we move into blockchain-space we have multiple implementations that are having to match the rules of the reference implementation and it can get gnarly (witness the hassles over big.Int for non-Go implementations).

The problem here is that means that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

This sentiment seems reasonable, but it's a shifting of the goalposts. If this is what we really want then we're signing up for significant investment in making this work in both of our stacks. I think there's an important point here about IPLD - it's slowly moved from being primarily about data serialization to be concerned with higher level concerns about instantiated forms, which we've largely captured under the term "data model". This really has been driven almost entirely by @warpfork through go-ipld-prime but has not, at all, been mirrored by the JavaScript stack which still views IPLD as a matter of data serialization and bytes.

Maybe we need to be honest about the status quo and the huge differentials its created and the risks therein, or we're going to be picking up big.Int scraps for years to come as we "just make it work" in JavaScript and any other stack that tries to follow the specifics opinions arising out of go-ipld-prime. @warpfork I understand where your head's at regarding this map ordering issue, for the most part, but you're not taking everyone else along for the ride, we're brute forcing these problems through the stack by means of go-ipld-prime without having a chance to solve them elsewhere. As I consider the problems of graphsync, and more specificially, selectors, in JavaScript I'm terrified at the rabbit holes of patching needed to make it work because nobody's going to allow any kind of official investment in mirroring the go-ipld-prime vision of IPLD in JavaScript. Even our specifications and most of our documentation are focused on serialization and avoid many of these issues of what it means for the "data model" to exist outside of serialization concerns. Even now the web3.storage team is puzzling over why CARs created by https://github.com/web3-storage/ipfs-car are different to those produced by ipfs dag export (with go-ipfs) ... The reasons are here, but how do they get addressed without patching over patches over patches for each new edge that doesn't quite fit.

aschmahmann · 2021-08-11T06:19:58Z

that building an IPLD node, then encoding/decoding from DAG-JSON or DAG-CBOR produces a different selector traversal order, which is unexpected to say the least.

Something I'd like to explore/poke at here a little is what the implications are of the above being considered a problem.

If we are given an IPLD node built with some node builder there is no way to know ahead of time whether we're going to be serializing it to DagJSON, DagCBOR, DagPB, GitRaw, EthTx, or AdinSillyFormat. Unless we were to absolutely preclude support for codecs that ordered any of their fields we'd be out of luck on round-tripping the ordering of anything that's not explicitly ordered (i.e. a list).

For example, say we have two codecs DagAdinAZ and DagAdinZA. The first orders maps from A to Z and the other does the reverse. When I use node builders to create an ipld.Node with a map builder that adds foo = 5 and bar = 6 there is no way that I can traverse that map in a way that would round-trip with both DagAdinAZ and DagAdinZA.

So how could we get out of this if we really wanted to be able to create nodes with mapbuilders that are traversed in order the same way as when encoded and decoded? These are what come to mind:

Have an IPLD library spec that defines a single sort order for map traversals that applies at the data model layer (i.e. above the codec layer)
Have go-ipld-prime refuse to run selectors against nodes that are not backed by a concrete object (i.e. you must traverse over data that's already been encoded to a particular codec)
Have users make sure to add their data to the mapbuilder in sorted order knowing which codec(s) they will be encoding to ahead of time and if the sort orders could conflict then create multiple different nodes with identical data (aside from sorting) one which will be serialized to each codec
Disallow support for any codec that has any defined field order

All of these seem like pretty bad options to me, although if anyone has alternative ideas or disagrees with the badness levels I'd be interested to hear.

This seems to then boil down to the data model only enforcing a universal sort (i.e. option 1) on a single data model type and that is List and not Map. IIUC in the case of the go-graphsync tests they are failing for a good reason which is that until encoded and decoded the ordering of maps cannot be determined. However, IIUC we should be able to deterministically traverse a CID+selector combination since our decoders aren't strict and will still parse non-sorted DagCBOR. This is essentially option 2, but localized to where it's needed (i.e. traversals over CIDs + selectors, since the CID is immutable and defines the encoding) as opposed to the library as a whole.

BigLep · 2021-10-29T06:38:50Z

What are the next steps here?

rvagg · 2021-10-29T07:04:04Z

I think this can be closed, discussion here is a good reference

test(add test demonstrating dag ordering issue):

80b26d5

hannahhoward added the kind/bug A bug in existing code (including security flaws) label Aug 5, 2021

warpfork mentioned this pull request Aug 25, 2021

test failure when updating to use upstream codecs which now enforce map entry sorting ceramicnetwork/go-dag-jose#3

Closed

Merge branch 'master' into issue/dag-ordering

1a0fbce

BigLep added the need/author-input Needs input from the original author label Oct 29, 2021

rvagg closed this Oct 29, 2021

mvdan deleted the issue/dag-ordering branch January 10, 2022 15:15

aschmahmann mentioned this pull request Mar 17, 2022

codec/dagcbor: add DecodeOptions.ExperimentalDeterminism #390

Merged

aschmahmann mentioned this pull request Jun 29, 2022

spec: initial WAC spec ipld/ipld#226

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DagJSON/DagCBOR encode+decode changes selector traversal order? #220

DagJSON/DagCBOR encode+decode changes selector traversal order? #220

hannahhoward commented Aug 5, 2021 •

edited

Loading

willscott commented Aug 5, 2021

warpfork commented Aug 10, 2021

mvdan commented Aug 10, 2021

mvdan commented Aug 10, 2021

rvagg commented Aug 11, 2021

aschmahmann commented Aug 11, 2021

BigLep commented Oct 29, 2021

rvagg commented Oct 29, 2021

DagJSON/DagCBOR encode+decode changes selector traversal order? #220

DagJSON/DagCBOR encode+decode changes selector traversal order? #220

Conversation

hannahhoward commented Aug 5, 2021 • edited Loading

willscott commented Aug 5, 2021

warpfork commented Aug 10, 2021

mvdan commented Aug 10, 2021

mvdan commented Aug 10, 2021

rvagg commented Aug 11, 2021

aschmahmann commented Aug 11, 2021

BigLep commented Oct 29, 2021

rvagg commented Oct 29, 2021

hannahhoward commented Aug 5, 2021 •

edited

Loading