Major API changes #50

vmx · 2018-11-22T16:59:53Z

The whole IPLD APIs get a review to make them more consistent and
easier to use.

The biggest change is that there's not resolve API anymore. From
now on you access the properties of the JavaScript objects directly.

Issues that were taken into account:

Remove isLink() method from formats #31
- Remove isLink() method from formats: isLink() is no
  longer needed as all links will be CID objects you can easily
  check for
Proposal: Move resolver to use CID instances for links #44
- Proposal: Move resolver to use CID instances for links: Not
  applicable anymore as resolve.resolve() is removed
properties: align spec with implementations #46
- properties: align spec with implementation: Covered by spec
Define how toJSON() should look like #49
- Define how toJSON() should look like: Binary and CIDs are
  defined with example
Implementation of nested objects #34
- Implementation of nested objects: Won’t fix as tree() is
  not part of the API anymore

Closes #48.

The whole IPLD APIs get a review to make them more consistent and easier to use. The biggest change is that there's not `resolve` API anymore. From now on you access the properties of the JavaScript objects directly. Issues that were taken into account: - #31 - [x] Remove `isLink()` method from formats: `isLink()` is no longer needed as all links will be CID objects you can easily check for - #44 - [x] Proposal: Move resolver to use CID instances for links: Not applicable anymore as `resolve.resolve()` is removed - #46 - [x] properties: align spec with implementation: Covered by spec - #49 - [x] Define how `toJSON()` should look like: Binary and CIDs are defined with example - #34 - [x] Implementation of nested objects: Won’t fix as `tree()` is not part of the API anymore Closes #48.

README.md

`toJSON()` is out of scope for this spec and can be implemented outside of it.

Renaming properties back to their original names. defaultHashCode -> defaultHashAlg codec - format The intention was to make clear that the type is a multicodec. But this shouldn't be done by the name of the property, but by a type description. Generally we should always use the codes and not the string identifiers of the multicodecs.

vmx · 2018-11-28T13:46:32Z

@mikeal I've removed toJSON(). I'd like your feedback especially on using the multicodec codecs byte values and not their string identifiers. I think we should do that in all of our APIs. What do you think?

README.md

mikeal · 2018-11-28T18:35:15Z

@vmx I like that change, I think it has some other implications I've noted inline.

Instead of requiring them to be in the table, make it recommendation. So it's still possible to have your custom ones.

vmx · 2018-11-28T23:20:03Z

@mikeal I also changed the hash codec to "should".

Once this change is approved, I'd like to merge it myself as I'd like to get a single clean commit and I need to edit the commit message a bit :)

mikeal · 2018-11-29T00:10:30Z

@vmx approved :)

vmx · 2018-11-29T00:11:44Z

@daviddias As this is a major change I'd like to get your approval before I merge it too.

daviddias · 2018-11-29T08:06:43Z

README.md

-    - [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback)
-  - [Local resolver methods](#local-resolver-methods)
-    - [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback)
-    - [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback)


Where did these methods go?

I would like to see the new js-ipld with the formats updated and resolving through things with all of these updates. It seems that we are preemptively removing things but I might be missing something.

From the commit message:

The biggest change is that there's not resolve API anymore. From
now on you access the properties of the JavaScript objects directly.

So instead of resolve.resolve(binaryBlob, '/deeply/nested/path)' you deserialize the node first (which resolve also needs to do) and then on the resolved node, let's cal it ipldNodeyou doipldNode.deeply.nested.path`.

For tree()-like functionality (if needed) you do what dag-json is doing.

So now it is the user that needs to convert an IPLD Path into JS notation?

Note that dag-pb supported resolve as in cid/name-of-the-link-to-the-other-node and cid/links/0/hash/. It was basically packing a transformation inside the format. It will be an extra breaking change.

That's the same behaviour as the current implementation

Hmm, could we make it better?

You could implement the same semantics with JavaScript Proxies if needed.

What's the browser support for that like? I found babel-plugin-proxy but the README says it's got really bad performance and don't recommend using it in production environments.

Hmm, could we make it better?

I would get rid of the named links traversal (I call it "IPFS traversal") or put it under a field (e.g. ipfs) so it won't collide. Though I think being just as broken as we are currently is good enough for now, especially since this is again dag-pb specific and shouldn't be a problem with other formats.

What's the browser support for that like?

Pretty good: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy#Browser_compatibility

I would get rid of the named links traversal

This is not a crazy idea. It's only of limited use for IPFS because there isn't a one-to-one mapping between IPFS paths and IPLD paths for HAMT shards.

E.g.

resolve.resolve(binaryBlob, '/deeply/nested/path.txt')

Might end up needing to be

resolve.resolve(binaryBlob, '/deeply/A0/01/FC-nested/path.txt')

..and you don't know what the prefix for each segment is going to be until you descend into that level of the trie as it's dependent on the siblings.

I mostly kept the named links to show that such things are still possible.

I personally do think that move to JS based traversal has following disadvantages:

As @achingbrain pointed out that implies that that whole node is de-serialized in one step. Sure JS proxies could be utilized but they are very slow and are really difficult to get right as in ensure that they you don't end up creating memory leaks (we used them a lot for cross compartment boundaries in gecko & in my opinion for non trivial cases e.g process.env they're not worth it).

I have being wanted to implement ipld resolver for flatbuffers who's primary advantage over proto-buffs is laziness - only relevant fields can be decoded without instantiating objects for intermediaries. It seems like with this move taking advantage of this would be impossible.

Proxies only work if you can decode everything synchronously, if that is not the case then you wind up with bunch of promises you have to await on to get to the nested data, which IMO is less convenient than it is to provide a path.

vmx · 2018-12-06T14:55:34Z

After implementing this spec for dag-pb I learned about enumerable properties. Hence I'd like to add this paragraph to the deserialize() section:

All enumerable properties (the ones that are returned by a Object.keys() call) of the deserialized object are considered for resolving IPLD Paths. They must only return values whose type is one of the IPLD Data Model.

What do others think about this? /cc @mikeal @daviddias

achingbrain · 2018-12-10T12:42:07Z

Another thing to consider - the performance analysis done by @mcollina and the NearForm team has revealed we spend a lot of time waiting for artificially induced asynchronous behaviour to resolve (basically setImmediate and friends).

If the serialization/deserialization can be done synchronously we should consider making these synchronous methods instead of taking callbacks or returning promises.

The alternative is to invoke callbacks synchronously or use process.nextTick but we run the risk of releasing Zalgo - better to just examine whether async is necessary at all in this case.

mikeal · 2018-12-10T18:11:29Z

FYI, I wrote a synchronous version of dag-cbor in JS. All of our sync code in that module is self-induced :) https://github.com/mikeal/dag-cbor-sync

vmx · 2018-12-11T09:52:41Z

If the serialization/deserialization can be done synchronously we should consider making these synchronous methods instead of taking callbacks or returning promises.

Are already resolved promises a big bottleneck? Promises seem like a good trade-off to make async stuff possible.

mcollina · 2018-12-11T09:58:24Z

Are already resolved promises a big bottleneck? Promises seem like a good trade-off to make async stuff possible.

That will cause the value to be deferred in the microtask queue. In practice, the impact will be similar to process.nextTick. It is better than a full setImmediate as that is triggered after a full run of the event loop.

Note that await Promise.resolve(42) causes multiple tasks (4 in Node v10) to be scheduled in the microtask queue. See https://www.youtube.com/watch?v=DFP5DKDQfOc for more details.

daviddias · 2018-12-12T09:08:24Z

@vmx following your request for the review of this proposal.

This is such a massive change that I don't believe it is wise to merge this one without having sibling PRs to all of its dependents in IPFS land. Can you take the ownership as a builder of IPLD to test the assumptions made by this PR through creating PRs to all the IPFS modules that will require changes. This will give you a complete understanding of how this will impact its usage and also get us ready to merge everything in one swipe.

With these PRs, the Lead Maintainers of which module will also be able to observe how this affects the rest of the code and give direct feedback accordingly.

Create an awesome-endeavour to track this work.

vmx · 2018-12-12T09:49:58Z

@daviddias Of course there will still be changes once things get implemented. For me approval means "looks good for me, I don't see any major problems" and I think that's what I got from you now.

daviddias · 2018-12-12T10:18:58Z

@vmx you have my 👍 of "LGTM, but do try it first with all the other repos before merging and creating a massive breaking change" :)

vmx · 2019-03-18T16:36:13Z

@Gozala I'd like your feedback on this one :)

Gozala · 2019-03-18T18:16:54Z

README.md

-    - [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback)
-  - [Local resolver methods](#local-resolver-methods)
-    - [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback)
-    - [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback)


I personally do think that move to JS based traversal has following disadvantages:

As @achingbrain pointed out that implies that that whole node is de-serialized in one step. Sure JS proxies could be utilized but they are very slow and are really difficult to get right as in ensure that they you don't end up creating memory leaks (we used them a lot for cross compartment boundaries in gecko & in my opinion for non trivial cases e.g process.env they're not worth it).

I have being wanted to implement ipld resolver for flatbuffers who's primary advantage over proto-buffs is laziness - only relevant fields can be decoded without instantiating objects for intermediaries. It seems like with this move taking advantage of this would be impossible.

Proxies only work if you can decode everything synchronously, if that is not the case then you wind up with bunch of promises you have to await on to get to the nested data, which IMO is less convenient than it is to provide a path.

Gozala · 2019-03-18T18:19:10Z

README.md

-    - [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback)
-  - [Local resolver methods](#local-resolver-methods)
-    - [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback)
-    - [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback)


Large naming changes might be out of the scope here, but if it isn't I would favor following terminology:

Gozala · 2019-03-18T18:19:23Z

README.md

-  - [Local resolver methods](#local-resolver-methods)
-    - [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback)
-    - [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback)
+  - [`serialize(IpldNode)`](#serializeipldnode)


encode(node)

Gozala · 2019-03-18T18:19:36Z

README.md

-    - [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback)
-    - [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback)
+  - [`serialize(IpldNode)`](#serializeipldnode)
+  - [`deserialize(binaryBlob)`](#deserializebinaryblob)


decode(node)

Gozala · 2019-03-18T18:19:47Z

README.md

  - [Properties](#properties)
    - [`defaultHashAlg`](#defaulthashalg)
-    - [`multicodec`](#multicodec)
+    - [`format`](#format)


Gozala · 2019-03-18T18:27:58Z

README.md


-> serializes a dagNode of an IPLD format into its binary format
+### `serialize(IpldNode)`


Maybe serialize(IpldNode):Uint8Array|Promise<Uint8Array> ?

Gozala · 2019-03-18T18:30:09Z

README.md


-> deserializes a binary blob into the instance
+Returns a Promise containing a `Buffer` with the serialized version of the given IPLD Node.


I would really encourage to remove requirement to return Buffer in favor of Uint8Array. This requirement makes browser use case unnecessarily complicated, and in practice isn't really necessary. Other IPFS libraries can do the casting where necessary.

P.S. node Buffer is also a valid Uint8Array so node case will require no change.

I would love if there was some word about errors as well. Specifically how decoder can return error and how that error propagates.

Moving from Node.js to Browser based primitives is certainly on my todo list, but out of scope for these API changes. I don't want to do everything at once.

Gozala · 2019-03-18T18:39:07Z

README.md


-> resolves a path in block, returns the value and or a link and the partial missing path. This way the IPLD Resolver can fetch the link and continue to resolve.
+> Return the CID of the binary blob.


This is more of a question.

From what I can tell implementation of the cid is pretty much the same across the board. Is there reason to require implementation of cid in first place ? Could this be made optional, so only implementation that need a custom behavior can provide it ?

An IPLD Format has all the information it needs to calculate the CID. The multicodec code as well as the hash. So you can call cid() from layers that build on top of IPLD Formats without knowing those details.

Gozala · 2019-03-18T18:50:08Z

README.md

-Options include:
-  - version - the CID version to be used (defaults to 1)
-  - hashAlg - the hash algorithm to be used (default to the one set by the format)
+The result is a JavaScript object, which may also be a Proxy object in case the data shouldn’t be deserialized as a whole. Its fields are the public API that can be resolved through. It’s up to the format to add convenient methods for manipulating the data. The returned object may also be a Proxy object in case the data shouldn’t be deserialized as a whole.


I did wrote more about this above but I think I want add one more note here - Requirement for implementation to provide a proxy just to avoid full decoding makes a high bar, in fact I would argue that custom DataView class with getters is probably a far better alternative. However that still implies 5 allocations when accessing something like /deeply/A0/01/FC-nested/path.txt which may seem insignificant but in some cases (e.g. representing Virtual DOM changelists in dominion library) that makes a difference between fast and unusable performance.

It's not a requirement, "may also be a Proxy object", but I'm very negative on Proxies regardless. Aside from performance concerns, I think we've learned from many previous attempts in other languages that things like Proxies should be reserved for metaprogramming and other reflection use-cases rather than general magical data structures. Pitfalls abound.
Perhaps the way forward is to remove the reference to proxies completely and leave it up to each format and allow experimentation.

Alternatively go the container route which is much nicer--I'm not sure DataView is the right container, maybe a bit too low-level, something that feels more like a Map might be more accessible. In fact, a whole custom Map-like that you could .toObject() to fully instantiate as a non-lazy object representation might provide the best of both worlds?

let data = await format.deserialize(blob) data.get('foo') // could be lazy data.has('/deeply/A0/01/FC-nested/path.txt') // could be lazy data.toObject() // skip laziness and recursively instantiate // { foo: 'bar', deeply: { A0: { '01': { 'FC-nested': { 'path.txt': UInt8Array<00, 00, 00, 00> } } } } }

It would require specification, and even provision of a custom class, maybe just a simple extension of Map.

For deserialization where you know the result is cheap and you want it all you could just instantiate it in the same call:

let copy = Object.extend((await format.deserialize(blob)).toObject(), { lastUpdate: new Date() })

Map { get (key/path) has (key/path) toObject() put() // throws? is mutability helpful here? maybe until it starts to hit awkward edges }

Are there possible use-cases where getters might need to be async or is it safe to frontload that in decode()?

I think you may have misunderstood what I meant by DataView - just a container tgat wraps underlying buffer and provides getters to access / decode fields on demand get key() {. So pretty much what you described as Map except named getters on prototype instead of get(key)

Perhaps the way forward is to remove the reference to proxies completely and leave it up to each format and allow experimentation.

The whole point of IPLD is to provide common interface for traversing linked data no ? If each implementation is free to design one, the common interface is gone & I think that’s a step in a wrong direction - I much rather have common imperfect interface that can be used across arbitrary formats than have to learn each one

just a container tgat wraps underlying buffer and provides getters to access / decode fields on demand

But this then won't be lazy for nested objects, will it?

(2) usability for implementer (lower complexity)

For implementing eager case this will make no difference. For lazy case avoiding proxies and lazy getters would be significantly simpler than needing to have them

let a = await deserialize(blob).nested let b = await a.b let c = await b.c

When I wrote my initial response in this thread I was trying to imagine a use-case where the properties would need to be asynchronously accessed and I couldn't see an obvious one except for very large blobs that need to be stored on disk--but those wouldn't be passed in to a deserialize(blob) call in the first place. Maybe there's a case if there was some really heavy crypto applied to sections of a blob, but even then it's questionable. Async computational tasks are usually made pointless by the boundary crossing problems and is why async methods in Node's 'crypto' module have been mostly resisted, computationally intensive thread offloading can't be demonstrated except for very heavy work. The browser is a slightly different case because you're not maximising core use as with a server and responsiveness is important, maybe that's a worthy argument? I'm doubtful.

So again, it looks like premature optimisation in search of a use-case.

(2) usability for implementer (lower complexity)

For implementing eager case this will make no difference. For lazy case avoiding proxies and lazy getters would be significantly simpler than needing to have them

Agreed, and it'll help make quicker progress on this PR!

I care about JS implementation in browser context (on server go is likely a better choice anyway). In browser crypto APIs are async (for good reasons) ignoring that seems strange to me.

And honestly it’s not about optimizations, just about not making currently well supported use cases second class.

P.S. I am working on crypto heavy use case for browsers right now and this API channge will require me to switch to handrolled solution.

@Gozala ok this is good context, excuse my ignorance. In your use case, is it necessary that the crypto be handled at the format layer or is it reasonable to push it up to ipld itself? I was imagining ipld/ipld#64 being something in ipld and the blocks just providing the encrypted binary data. Perhaps an encrypted block coming from a format gets transposed into an unencrypted ipld block where the key is provided but this happens at the ipld layer, not in the format. Or are you seeing a different mechanism? Again, I'm fairly new to thinking about this stuff so additional insight is appreciated!

I've created #51 which hopefully accommodates all the points brought up here. It's a smaller change to the current API. Let's have a discussion over there.

README.md

vmx · 2019-03-25T15:20:24Z

I'm closing this one to prevent confusion. The current discussion on an improved API are at #51.

vmx requested review from mikeal and daviddias November 22, 2018 16:59

ghost assigned vmx Nov 22, 2018

ghost added the in progress label Nov 22, 2018

mikeal reviewed Nov 26, 2018

View reviewed changes

README.md Outdated Show resolved Hide resolved

vmx added 3 commits November 28, 2018 11:52

Remove toJSON()

190de47

`toJSON()` is out of scope for this spec and can be implemented outside of it.

Update TOC

888cd27

vmx mentioned this pull request Nov 28, 2018

Define how toJSON() should look like #49

Closed

mikeal reviewed Nov 28, 2018

View reviewed changes

README.md Outdated Show resolved Hide resolved

Multicodecs should be in the Multicodecs table

3e79d97

Instead of requiring them to be in the table, make it recommendation. So it's still possible to have your custom ones.

mikeal approved these changes Nov 29, 2018

View reviewed changes

daviddias reviewed Nov 29, 2018

View reviewed changes

vmx mentioned this pull request Dec 6, 2018

WIP: new ipld format api ipld/js-ipld-dag-pb#105

Closed

Improve cid() documentation a bit

55585d4

daviddias mentioned this pull request Dec 12, 2018

docs: Add notes for JS core dev meeting 10th December 2018 ipfs/team-mgmt#795

Merged

This was referenced Jan 14, 2019

Awesome Endeavour: Async Iterators ipfs/js-ipfs#1670

Closed

chore: change from callbacks to promises ipld/js-ipld-git#38

Closed

alanshaw approved these changes Jan 25, 2019

View reviewed changes

vmx mentioned this pull request Mar 6, 2019

Provide API to just encode / decode data without publishing a block ipld/js-ipld#194

Open

Gozala reviewed Mar 18, 2019

View reviewed changes

rvagg reviewed Mar 19, 2019

View reviewed changes

README.md Outdated Show resolved Hide resolved

vmx added 2 commits March 19, 2019 16:59

Replace "block" with "blob"

f04424d

Clarify details about the deserialized object

bf2184d

Gozala mentioned this pull request Mar 21, 2019

IPLD Format API cleanups #51

Merged

vmx closed this Mar 25, 2019

ghost removed the in progress label Mar 25, 2019

Gozala mentioned this pull request Apr 4, 2019

Ability to decode only relevant fields ipld/js-ipld-stack#3

Closed


		> serializes a dagNode of an IPLD format into its binary format
		### `serialize(IpldNode)`


		> deserializes a binary blob into the instance
		Returns a Promise containing a `Buffer` with the serialized version of the given IPLD Node.


		> resolves a path in block, returns the value and or a link and the partial missing path. This way the IPLD Resolver can fetch the link and continue to resolve.
		> Return the CID of the binary blob.

Major API changes #50

Major API changes #50

Conversation

vmx commented Nov 22, 2018

vmx commented Nov 28, 2018

mikeal commented Nov 28, 2018

vmx commented Nov 28, 2018

mikeal commented Nov 29, 2018

vmx commented Nov 29, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

daviddias Nov 29, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

achingbrain Dec 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Gozala Mar 18, 2019 • edited Loading

Choose a reason for hiding this comment

vmx commented Dec 6, 2018

achingbrain commented Dec 10, 2018

mikeal commented Dec 10, 2018

vmx commented Dec 11, 2018

mcollina commented Dec 11, 2018

daviddias commented Dec 12, 2018

vmx commented Dec 12, 2018

daviddias commented Dec 12, 2018

vmx commented Mar 18, 2019

Gozala Mar 18, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vmx commented Mar 25, 2019

daviddias Nov 29, 2018 •

edited

Loading

achingbrain Dec 11, 2018 •

edited

Loading

Gozala Mar 18, 2019 •

edited

Loading

Gozala Mar 18, 2019 •

edited

Loading