-
Notifications
You must be signed in to change notification settings - Fork 5
Conversation
The whole IPLD APIs get a review to make them more consistent and easier to use. The biggest change is that there's not `resolve` API anymore. From now on you access the properties of the JavaScript objects directly. Issues that were taken into account: - #31 - [x] Remove `isLink()` method from formats: `isLink()` is no longer needed as all links will be CID objects you can easily check for - #44 - [x] Proposal: Move resolver to use CID instances for links: Not applicable anymore as `resolve.resolve()` is removed - #46 - [x] properties: align spec with implementation: Covered by spec - #49 - [x] Define how `toJSON()` should look like: Binary and CIDs are defined with example - #34 - [x] Implementation of nested objects: Won’t fix as `tree()` is not part of the API anymore Closes #48.
`toJSON()` is out of scope for this spec and can be implemented outside of it.
Renaming properties back to their original names. defaultHashCode -> defaultHashAlg codec - format The intention was to make clear that the type is a multicodec. But this shouldn't be done by the name of the property, but by a type description. Generally we should always use the codes and not the string identifiers of the multicodecs.
@mikeal I've removed |
@vmx I like that change, I think it has some other implications I've noted inline. |
Instead of requiring them to be in the table, make it recommendation. So it's still possible to have your custom ones.
@mikeal I also changed the hash codec to "should". Once this change is approved, I'd like to merge it myself as I'd like to get a single clean commit and I need to edit the commit message a bit :) |
@vmx approved :) |
@daviddias As this is a major change I'd like to get your approval before I merge it too. |
- [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback) | ||
- [Local resolver methods](#local-resolver-methods) | ||
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did these methods go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see the new js-ipld
with the formats updated and resolving through things with all of these updates. It seems that we are preemptively removing things but I might be missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the commit message:
The biggest change is that there's not
resolve
API anymore. From
now on you access the properties of the JavaScript objects directly.
So instead of resolve.resolve(binaryBlob, '/deeply/nested/path)' you deserialize the node first (which resolve also needs to do) and then on the resolved node, let's cal it
ipldNodeyou do
ipldNode.deeply.nested.path`.
For tree()
-like functionality (if needed) you do what dag-json is doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So now it is the user that needs to convert an IPLD Path into JS notation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that dag-pb supported resolve as in cid/name-of-the-link-to-the-other-node
and cid/links/0/hash/
. It was basically packing a transformation inside the format. It will be an extra breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the same behaviour as the current implementation
Hmm, could we make it better?
You could implement the same semantics with JavaScript Proxies if needed.
What's the browser support for that like? I found babel-plugin-proxy but the README says it's got really bad performance and don't recommend using it in production environments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, could we make it better?
I would get rid of the named links traversal (I call it "IPFS traversal") or put it under a field (e.g. ipfs
) so it won't collide. Though I think being just as broken as we are currently is good enough for now, especially since this is again dag-pb specific and shouldn't be a problem with other formats.
What's the browser support for that like?
Pretty good: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Proxy#Browser_compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get rid of the named links traversal
This is not a crazy idea. It's only of limited use for IPFS because there isn't a one-to-one mapping between IPFS paths and IPLD paths for HAMT shards.
E.g.
resolve.resolve(binaryBlob, '/deeply/nested/path.txt')
Might end up needing to be
resolve.resolve(binaryBlob, '/deeply/A0/01/FC-nested/path.txt')
..and you don't know what the prefix for each segment is going to be until you descend into that level of the trie as it's dependent on the siblings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly kept the named links to show that such things are still possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally do think that move to JS based traversal has following disadvantages:
-
As @achingbrain pointed out that implies that that whole node is de-serialized in one step. Sure JS proxies could be utilized but they are very slow and are really difficult to get right as in ensure that they you don't end up creating memory leaks (we used them a lot for cross compartment boundaries in gecko & in my opinion for non trivial cases e.g
process.env
they're not worth it).I have being wanted to implement ipld resolver for flatbuffers who's primary advantage over proto-buffs is laziness - only relevant fields can be decoded without instantiating objects for intermediaries. It seems like with this move taking advantage of this would be impossible.
Proxies only work if you can decode everything synchronously, if that is not the case then you wind up with bunch of promises you have to await on to get to the nested data, which IMO is less convenient than it is to provide a path.
After implementing this spec for dag-pb I learned about enumerable properties. Hence I'd like to add this paragraph to the
What do others think about this? /cc @mikeal @daviddias |
Another thing to consider - the performance analysis done by @mcollina and the NearForm team has revealed we spend a lot of time waiting for artificially induced asynchronous behaviour to resolve (basically If the serialization/deserialization can be done synchronously we should consider making these synchronous methods instead of taking callbacks or returning promises. The alternative is to invoke callbacks synchronously or use |
FYI, I wrote a synchronous version of |
Are already resolved promises a big bottleneck? Promises seem like a good trade-off to make async stuff possible. |
That will cause the value to be deferred in the microtask queue. In practice, the impact will be similar to Note that |
@vmx following your request for the review of this proposal. This is such a massive change that I don't believe it is wise to merge this one without having sibling PRs to all of its dependents in IPFS land. Can you take the ownership as a builder of IPLD to test the assumptions made by this PR through creating PRs to all the IPFS modules that will require changes. This will give you a complete understanding of how this will impact its usage and also get us ready to merge everything in one swipe. With these PRs, the Lead Maintainers of which module will also be able to observe how this affects the rest of the code and give direct feedback accordingly. Create an |
@daviddias Of course there will still be changes once things get implemented. For me approval means "looks good for me, I don't see any major problems" and I think that's what I got from you now. |
@vmx you have my 👍 of "LGTM, but do try it first with all the other repos before merging and creating a massive breaking change" :) |
@Gozala I'd like your feedback on this one :) |
- [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback) | ||
- [Local resolver methods](#local-resolver-methods) | ||
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally do think that move to JS based traversal has following disadvantages:
-
As @achingbrain pointed out that implies that that whole node is de-serialized in one step. Sure JS proxies could be utilized but they are very slow and are really difficult to get right as in ensure that they you don't end up creating memory leaks (we used them a lot for cross compartment boundaries in gecko & in my opinion for non trivial cases e.g
process.env
they're not worth it).I have being wanted to implement ipld resolver for flatbuffers who's primary advantage over proto-buffs is laziness - only relevant fields can be decoded without instantiating objects for intermediaries. It seems like with this move taking advantage of this would be impossible.
Proxies only work if you can decode everything synchronously, if that is not the case then you wind up with bunch of promises you have to await on to get to the nested data, which IMO is less convenient than it is to provide a path.
- [`util.cid(binaryBlob[, options], callback)`](#utilcidbinaryblob-options-callback) | ||
- [Local resolver methods](#local-resolver-methods) | ||
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large naming changes might be out of the scope here, but if it isn't I would favor following terminology:
- [Local resolver methods](#local-resolver-methods) | ||
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) | ||
- [`serialize(IpldNode)`](#serializeipldnode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encode(node)
- [`resolver.resolve(binaryBlob, path, callback)`](#resolverresolvebinaryblob-path-callback) | ||
- [`resolver.tree(binaryBlob, callback)`](#resolvertreebinaryblob-callback) | ||
- [`serialize(IpldNode)`](#serializeipldnode) | ||
- [`deserialize(binaryBlob)`](#deserializebinaryblob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decode(node)
- [Properties](#properties) | ||
- [`defaultHashAlg`](#defaulthashalg) | ||
- [`multicodec`](#multicodec) | ||
- [`format`](#format) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
codec
|
||
> serializes a dagNode of an IPLD format into its binary format | ||
### `serialize(IpldNode)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe serialize(IpldNode):Uint8Array|Promise<Uint8Array>
?
|
||
> deserializes a binary blob into the instance | ||
Returns a Promise containing a `Buffer` with the serialized version of the given IPLD Node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would really encourage to remove requirement to return Buffer in favor of Uint8Array. This requirement makes browser use case unnecessarily complicated, and in practice isn't really necessary. Other IPFS libraries can do the casting where necessary.
P.S. node Buffer is also a valid Uint8Array so node case will require no change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love if there was some word about errors as well. Specifically how decoder can return error and how that error propagates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving from Node.js to Browser based primitives is certainly on my todo list, but out of scope for these API changes. I don't want to do everything at once.
README.md
Outdated
|
||
> resolves a path in block, returns the value and or a link and the partial missing path. This way the IPLD Resolver can fetch the link and continue to resolve. | ||
> Return the CID of the binary blob. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a question.
From what I can tell implementation of the cid
is pretty much the same across the board. Is there reason to require implementation of cid
in first place ? Could this be made optional, so only implementation that need a custom behavior can provide it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An IPLD Format has all the information it needs to calculate the CID. The multicodec code as well as the hash. So you can call cid()
from layers that build on top of IPLD Formats without knowing those details.
Options include: | ||
- version - the CID version to be used (defaults to 1) | ||
- hashAlg - the hash algorithm to be used (default to the one set by the format) | ||
The result is a JavaScript object, which may also be a Proxy object in case the data shouldn’t be deserialized as a whole. Its fields are the public API that can be resolved through. It’s up to the format to add convenient methods for manipulating the data. The returned object may also be a Proxy object in case the data shouldn’t be deserialized as a whole. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did wrote more about this above but I think I want add one more note here - Requirement for implementation to provide a proxy just to avoid full decoding makes a high bar, in fact I would argue that custom DataView
class with getters is probably a far better alternative. However that still implies 5 allocations when accessing something like /deeply/A0/01/FC-nested/path.txt
which may seem insignificant but in some cases (e.g. representing Virtual DOM changelists in dominion library) that makes a difference between fast and unusable performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a requirement, "may also be a Proxy object", but I'm very negative on Proxies regardless. Aside from performance concerns, I think we've learned from many previous attempts in other languages that things like Proxies should be reserved for metaprogramming and other reflection use-cases rather than general magical data structures. Pitfalls abound.
Perhaps the way forward is to remove the reference to proxies completely and leave it up to each format and allow experimentation.
Alternatively go the container route which is much nicer--I'm not sure DataView
is the right container, maybe a bit too low-level, something that feels more like a Map might be more accessible. In fact, a whole custom Map-like that you could .toObject()
to fully instantiate as a non-lazy object representation might provide the best of both worlds?
let data = await format.deserialize(blob)
data.get('foo') // could be lazy
data.has('/deeply/A0/01/FC-nested/path.txt') // could be lazy
data.toObject() // skip laziness and recursively instantiate
// { foo: 'bar', deeply: { A0: { '01': { 'FC-nested': { 'path.txt': UInt8Array<00, 00, 00, 00> } } } } }
It would require specification, and even provision of a custom class, maybe just a simple extension of Map
.
For deserialization where you know the result is cheap and you want it all you could just instantiate it in the same call:
let copy = Object.extend((await format.deserialize(blob)).toObject(), { lastUpdate: new Date() })
Map {
get (key/path)
has (key/path)
toObject()
put() // throws? is mutability helpful here? maybe until it starts to hit awkward edges
}
Are there possible use-cases where getters might need to be async or is it safe to frontload that in decode()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you may have misunderstood what I meant by DataView
- just a container tgat wraps underlying buffer and provides getters to access / decode fields on demand get key() {
. So pretty much what you described as Map except named getters on prototype instead of get(key)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the way forward is to remove the reference to proxies completely and leave it up to each format and allow experimentation.
The whole point of IPLD is to provide common interface for traversing linked data no ? If each implementation is free to design one, the common interface is gone & I think that’s a step in a wrong direction - I much rather have common imperfect interface that can be used across arbitrary formats than have to learn each one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a container tgat wraps underlying buffer and provides getters to access / decode fields on demand
But this then won't be lazy for nested objects, will it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(2) usability for implementer (lower complexity)
For implementing eager case this will make no difference. For lazy case avoiding proxies and lazy getters would be significantly simpler than needing to have them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let a = await deserialize(blob).nested
let b = await a.b
let c = await b.c
When I wrote my initial response in this thread I was trying to imagine a use-case where the properties would need to be asynchronously accessed and I couldn't see an obvious one except for very large blobs that need to be stored on disk--but those wouldn't be passed in to a deserialize(blob)
call in the first place. Maybe there's a case if there was some really heavy crypto applied to sections of a blob, but even then it's questionable. Async computational tasks are usually made pointless by the boundary crossing problems and is why async methods in Node's 'crypto'
module have been mostly resisted, computationally intensive thread offloading can't be demonstrated except for very heavy work. The browser is a slightly different case because you're not maximising core use as with a server and responsiveness is important, maybe that's a worthy argument? I'm doubtful.
So again, it looks like premature optimisation in search of a use-case.
(2) usability for implementer (lower complexity)
For implementing eager case this will make no difference. For lazy case avoiding proxies and lazy getters would be significantly simpler than needing to have them
Agreed, and it'll help make quicker progress on this PR!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I care about JS implementation in browser context (on server go is likely a better choice anyway). In browser crypto APIs are async (for good reasons) ignoring that seems strange to me.
And honestly it’s not about optimizations, just about not making currently well supported use cases second class.
P.S. I am working on crypto heavy use case for browsers right now and this API channge will require me to switch to handrolled solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Gozala ok this is good context, excuse my ignorance. In your use case, is it necessary that the crypto be handled at the format layer or is it reasonable to push it up to ipld itself? I was imagining ipld/ipld#64 being something in ipld and the blocks just providing the encrypted binary data. Perhaps an encrypted block coming from a format gets transposed into an unencrypted ipld block where the key is provided but this happens at the ipld layer, not in the format. Or are you seeing a different mechanism? Again, I'm fairly new to thinking about this stuff so additional insight is appreciated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created #51 which hopefully accommodates all the points brought up here. It's a smaller change to the current API. Let's have a discussion over there.
I'm closing this one to prevent confusion. The current discussion on an improved API are at #51. |
The whole IPLD APIs get a review to make them more consistent and
easier to use.
The biggest change is that there's not
resolve
API anymore. Fromnow on you access the properties of the JavaScript objects directly.
Issues that were taken into account:
isLink()
method from formats #31isLink()
method from formats:isLink()
is nolonger needed as all links will be CID objects you can easily
check for
applicable anymore as
resolve.resolve()
is removedtoJSON()
should look like #49toJSON()
should look like: Binary and CIDs aredefined with example
tree()
isnot part of the API anymore
Closes #48.