Proposal: Changes to IPLD interface to support chunked nodes. #41
A few more thoughts on "sharding vs chunking." There is clearly a case where, when someone has millions of links, or even just thousands of links in a node they routinely need to update, they will need some form of sharding. That case clearly needs to be handled in a layer above the IPLD dag implementation. But keep something in mind: a user has no way of knowing how large a new node will be until they try to create it.
There is a threshold where an implementation will want to say, or potentially ask the user to configure, "I'm not going to load a node this size into memory." But that threshold varies wildly between peers, and in most cases it is larger than the current 2MB limit we need to maintain for Bitswap. If we don't address this at this layer and instead lump all of these cases into the same higher-level sharding bucket, we make the lower-level primitives unusable for most cases. I'm working with the GitHub archive data, and there are just a few places where the edge nodes need to hold a few thousand links. Because of these few edge cases, among millions of edge nodes, I would have to implement sharding for all edge nodes in order to create predictable paths. We would end up pushing a lot of applications onto higher-order abstractions, like MFS, or potentially pushing them into other stacks outside of IPLD altogether.
Thanks for the proposal @mikeal. The 2MB limitation from Bitswap is kind of arbitrary and not really related to IPLD; it lives on a different level, the transport. From the IPLD API perspective there should always be a 1:1 mapping between a CID and a block, no matter how big that block is. So if you store a Git object that is bigger than 2MB, that needs to be possible without changing its hash. If you chunked it at the IPLD level, the hash would be different. It might even make sense to chunk the data at the storage level. Though before putting thought into that, I would rather spend time solving the transport-layer (Bitswap) problem of how to transfer large IPLD objects in a chunked manner.
That's fair, although I will note that the larger the block, the less efficiently any transport can retrieve it. Since there is only one hash for the entire block, the smallest unit you can validate is the whole block: if you try to grab different parts of the block from different peers, you can't validate anything until you have the whole block, and if a peer gave you invalid data you don't know which peer did it. That's why BitTorrent clients only get one part per peer, and if they decide to get a part from another peer they just request the whole part. Aside from the "splitting a node into parts" use case, is there a performance case for altering the interface?
Good points about the splitting. Storing objects of any size while keeping their hash needs to be supported, e.g. for Git. But I agree there should be some layer for automatic splitting. And you raise valid points about where this splitting should happen: within IPLD or in a layer above. I need to put more thought into this.
So, we thought long and hard about adding sharding and chunking to IPLD but decided against it, to try to keep the layer as simple as possible (it's already pretty complex). However, we would like a tiny layer on-top-of IPLD that implements automatic sharding.
I've discussed some solutions here: https://discuss.ipfs.io/t/git-on-ipfs-links-and-references/730/4

However, for new systems, my ideal solution would be a second layer that transforms a sharded DAG into an unsharded DAG.
I think I'm getting a little confused about what "in IPLD" vs "on-top-of IPLD" means in this context. To me, "a tiny layer on-top-of IPLD that implements automatic sharding" is basically what https://github.com/mikeal/alt-ipld-interface/blob/master/cbor.js#L187 does. It creates a valid CBOR node with a special property that points to a list of parts for a large CBOR node. The sharding is implemented transparently via the interface, but at a spec/protocol level it is a bunch of valid IPLD nodes (one root node plus the part nodes it points to).

You have to understand the convention for describing the larger node in order to know what it truly means, so other implementations of the codec need to support the same convention.

I interpreted this repo/project as "the way you describe an IPLD codec implementation for other libraries to consume." If the sharding is to be automated, at what layer would that automation occur? If we only implement it in IPFS, that means the paths, which are defined at the IPLD layer, won't be consistent between IPFS and other implementations of IPLD that use the codec implementations.
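As a rough sketch of what that convention might look like (the `_chunks` property name and the `cid()` stand-in are invented here for illustration; the linked POC defines its own layout):

```js
// Illustrative stand-in for a real CID instance.
const cid = (s) => ({ '/': s })

// A root node that is itself a valid CBOR node; its special property
// points at the parts whose concatenated bytes decode to the
// oversized node.
const chunkedRoot = {
  _chunks: [
    cid('zdpu...part1'), // first slice of the oversized payload
    cid('zdpu...part2')  // second slice; together they form one node
  ]
}
```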
Also, I'll just add that this is just a POC and there is probably a better way to describe a node that should be constructed out of parts of other CIDs, similar to what we're doing with tagged CIDs when describing links in dag-cbor.
"in IPLD" means that it's part of the IPLD spec. "on-top-of IPLD" means not in the IPLD spec. At the moment, the IPLD spec assumes that IPLD nodes are atomic†. Basically, it would be an entirely new system (kind of like unixfs) built on-top of IPLD. This would leave IPLD as a building block for describing arbitrary DAGs and allow users to build more complicated systems on-top of it. †Note: this may change slightly when/if we introduce a type system as we may want to put the type description in a separate object and we may need the type description to fully decode the object. However, the type system is just a vague idea for the moment. |
@Stebalien got it.
Where does that spec live and should we start a thread about this topic? At the moment, we're sort of stuck.
I do see a moral hazard in supporting something like this, though. Where does it end? Do we also start supporting a bunch of other random sharding semantics in all the codec implementations? On the other hand, I just don't see a place we can do it yet.
@Stebalien wrote:
For me, the discussion reads like there isn't really any practical solution yet.
I don't understand what you mean; can you please explain this a bit more? By "new systems", do you mean "something that we can't do, but should have done", or "now is the chance to do it that way"?
@mikeal
Fair enough... There is no written down spec. At the moment, it's a bunch of implementations and a bunch of discussion. Unfortunately, I can't find the relevant discussions.
The idea is that sharding would be up to the application. We'd come up with a standard set of tools/types that applications could use to shard up their large data structures.
Correct. Just a conclusion that it can be done at a layer above IPLD.
By "new systems" I mean "not git". That is, we can't go back and shard git objects but we can come up with a thin sharding layer and build new systems (unixfsv2, etc.) on top of these sharding primitives. |
Ah, this makes a lot more sense now. Thanks!
@mikeal This issue certainly doesn't belong in the interface-ipld-format spec, but rather in the js-ipld one, though I still think it's a layer on top. What do you think?
Let's close this out. I think we all agree that this interface layer should be atomic at the block level, and that what we're discussing here should be a layer above.
Here's the problem I've got: I need nodes that are larger than 2MB.
This is not a "sharding" problem as it has been typically categorized. I still have a max size of 10MB, I just need to support nodes that are larger than the bitswap max block size. A 5MB cbor binary is a relatively small object that can fit into memory in programs.
In order to do this today, you basically have to implement sharding semantics at every layer of a graph that might ever have just a few thousand links in it. That's not very reasonable. I have a data set with millions of keys at the top level, and I'm doing sharding at that layer; but closer to the leaves, nodes typically have fewer than 100 links, with only the occasional node holding a few thousand.
However, once you go down the rabbit hole of "how do I chunk a single dag node into a couple blocks" you reach some limitations in the current IPLD interface.
Rather than returning a single `cid` from the interface, it needs to be able to return multiple CIDs for all the blocks necessary to construct the node.

For efficiency, there are a few other things we should be doing as well.
Once you're doing all this, you realize that you can also offload more of the path resolution to the implementation, since it now has a way to request additional blocks.
Putting all this together, you end up with something very different than the current spec.
I went ahead and documented all of this and wrote an implementation of `dag-cbor` that will chunk nodes larger than 1MB but still reject nodes larger than 10MB (although that is configurable). Please don't get caught up in the `async/await` aspects of the proposal and implementation; I'm not trying to have a big callbacks-vs-promises debate, this was just the most expedient way to write the reference implementation. However, without async iterables, some of the interfaces would get pretty hairy to implement with something like streams :(

I'd like to treat this proposal as a discussion rather than a PR. I'm not strongly tied to any particular names or structure. I wrote a full implementation because my ideas needed to be fleshed out a bit, and it's much easier to show something working than to just describe a bunch of ideas.
Proposal for alternative interface for IPLD.
ipld(getBlock)
Must return an IPLD interface.

`getBlock` is an async function that accepts a CID and returns a promise for the binary blob of that CID.
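A minimal sketch of wiring this up, assuming an in-memory `Map` as the block store (the store and the error handling here are illustrative, not part of the proposal):

```js
// Minimal in-memory block store, keyed by the string form of a CID.
const store = new Map()

// The getBlock function handed to ipld(): takes a CID, returns a
// promise for the raw binary blob of that block.
const getBlock = async (cid) => {
  const data = store.get(cid.toString())
  if (!data) throw new Error(`block not found: ${cid}`)
  return data
}

const IPLD = ipld(getBlock)
```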
IPLD.serialize(native object)
Takes a native object that can be serialized.
Returns an iterable. All items in the iterable must be instances of `Block` or promises that resolve to instances of `Block`. When returning multiple blocks, the last block must be the root block.
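A sketch of consuming `serialize()` under these rules; the `Block` shape with `cid` and `data` fields is an assumption here, since the proposal doesn't pin it down:

```js
// Serialize a (possibly oversized) native object and persist every
// resulting block; items may be Block instances or promises for them.
async function putNode (IPLD, store, obj) {
  const blocks = []
  for (const item of IPLD.serialize(obj)) {
    blocks.push(await item)
  }
  for (const block of blocks) {
    store.set(block.cid.toString(), block.data) // assumed Block shape
  }
  return blocks[blocks.length - 1].cid // the last block is the root
}
```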
IPLD.deserialize(buffer)
Takes a binary blob to be deserialized.
Returns a promise to a native object.
IPLD.tree(buffer)
Takes a binary blob of a serialized node.

Returns an iterable. All items in the iterable must be either strings or promises that resolve to strings.
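For instance, collecting every path a node exposes might look like this sketch (`for await...of` conveniently awaits any promise items the iterable yields):

```js
// Gather all path strings yielded by tree().
async function listPaths (IPLD, buffer) {
  const paths = []
  for await (const path of IPLD.tree(buffer)) paths.push(path)
  return paths
}
```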
IPLD.resolve(buffer, path)
Takes a binary blob of a serialized node and a path to child links.
Returns a promise to an object with two properties: `value` and `remaining`.

`value` must be either a deserialized node or a CID instance. `remaining` must be a string of the remaining path.

Throws an Error() when the path cannot be resolved. The Error instance should have a `.code` attribute set to `404`.
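Combined with `getBlock`, this is enough to walk a path across node boundaries. A sketch of that loop, assuming `resolve()` yields a CID as `value` whenever the path leaves the current node:

```js
// Walk `path` starting from rootBuffer, fetching the next block
// whenever resolve() returns a CID with a remaining path.
async function resolvePath (IPLD, getBlock, rootBuffer, path) {
  let buffer = rootBuffer
  let remaining = path
  while (true) {
    const { value, remaining: rest } = await IPLD.resolve(buffer, remaining)
    if (!rest) return value // resolved within this node
    buffer = await getBlock(value) // value is the CID of the next node
    remaining = rest
  }
}
```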
IPLD.cids(buffer)
Takes a binary blob of a serialized node.

Returns an iterator. All items in the iterator must be instances of CID or promises that resolve to instances of CID.

Returns only the CIDs required to deserialize this node. Must not contain CIDs of named links.
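As a sketch of why this is useful: a caller (or the implementation itself, via the `getBlock` it was constructed with) could prefetch every block a chunked node needs before deserializing it. The helper below is illustrative, not part of the proposal:

```js
// Make sure every block required to decode this node is available,
// then deserialize it. cids() yields CIDs or promises for CIDs.
async function loadNode (IPLD, getBlock, rootBuffer) {
  for await (const cid of IPLD.cids(rootBuffer)) {
    await getBlock(cid) // fetch (and presumably cache) each part
  }
  return IPLD.deserialize(rootBuffer)
}
```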