This repository has been archived by the owner on Oct 23, 2022. It is now read-only.

BlockStore considerations #84

Closed
koivunej opened this issue Mar 10, 2020 · 11 comments

@koivunej
Collaborator

I wanted to reserve time in the grant plan to discuss, design, and implement a spec-compliant, filesystem-backed store that would also track the latest developments in go-ipfs and js-ipfs (sharding, storing by multihash). When writing the grant, I was also considering that alternative designs for the Store traits, or even a non-async implementation, might need to be explored, but I no longer see any need for that.

The latest version of the go-ipfs filesystem repo is likely at ipfs/go-ds-flatfs, but it is no longer spec-compliant, as things seem to have evolved without the spec being kept up to date. The latest version of the js-ipfs repo implementation is at ipfs/js-ipfs-repo, and the filesystem datastore is at ipfs/js-datastore-fs.

Of the many topics mentioned in the first paragraph, "store by multihash" and its linked issue ipfs/kubo#6815 (similar to ipfs/js-ipfs#2415, from 2019) are probably the most pressing to discuss. Since the related go-ipfs PRs are long-running (started in 2018) and I no longer see this on the go-ipfs 0.5 roadmap (it's not in ipfs/kubo#6776 or any of the existing milestones), I think it might have been postponed until after 0.5? Could @Stebalien, or someone else who has been keeping an eye on this, comment on the plans? As the Go and JS implementations are still in progress on this front, it might be better to aim for more traditional blockstore compatibility.

If storing by multihash is not a pressing concern, should we aim for fs-blockstore compatibility with js and go? This is tested in ipfs/interop/.../repo.js. It would imply supporting at least the /repo/flatfs/shard/v1/next-to-last/2 sharding (the default in go-ipfs 0.4.22), but in practice the tests might require a larger subset of the "repo": at least $IPFS_PATH/datastore_spec, and possibly even the LevelDB storage supported by both js and go. However, the interop tests only check that a block stored by one implementation can be read by the other.
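As a rough illustration of what next-to-last/2 sharding means on disk, here is a minimal sketch that computes the shard directory for a block key. It assumes the behaviour of go-ds-flatfs's `NextToLast` shard function (left-pad the key with `suffix_len + 1` underscores, then take the `suffix_len` characters just before the last one) and a `.data` file extension; `next_to_last` and `block_path` are hypothetical names, and the keys below are illustrative, not real encoded CIDs.

```rust
use std::path::{Path, PathBuf};

/// Shard directory name for the flatfs `next-to-last/<suffix_len>` scheme
/// (assumed to mirror go-ds-flatfs). The key is left-padded with
/// `suffix_len + 1` underscores so that very short keys still yield a name.
fn next_to_last(key: &str, suffix_len: usize) -> String {
    let padded: Vec<char> = "_"
        .repeat(suffix_len + 1)
        .chars()
        .chain(key.chars())
        .collect();
    // Skip the final character and take the `suffix_len` characters before it.
    let offset = padded.len() - suffix_len - 1;
    padded[offset..offset + suffix_len].iter().collect()
}

/// Hypothetical on-disk location of a block under a flatfs root when using
/// `/repo/flatfs/shard/v1/next-to-last/2` (the `.data` suffix is assumed).
fn block_path(root: &Path, key: &str) -> PathBuf {
    root.join(next_to_last(key, 2)).join(format!("{}.data", key))
}
```

With this sketch, a key like `CIQABCDEFG` lands in the `EF` directory, i.e. `blocks/EF/CIQABCDEFG.data` on Unix. Using the characters just before the last one, rather than a prefix, spreads keys that share a common multibase/CID prefix across directories.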

@rklaehn
Member

rklaehn commented Mar 10, 2020

I am obviously out of the loop with regard to the grant process. But a file-based store is horribly inefficient, so for me personally it is not a high priority.

@dvc94ch
Contributor

dvc94ch commented Mar 10, 2020

@rklaehn can you explain what you mean exactly? For some workloads some filesystems will be slower than some dbs, but it's not immediately clear what you mean in particular.
I put my thoughts on how it should work into ipld-daemon and rust-ipld, but I haven't assessed its performance compared to other approaches.

@Stebalien

As the go-ipfs PRs related to this are long-running (started in 2018) and I no longer see this mentioned on the go-ipfs 0.5 roadmap, I think this might have been postponed until after 0.5, as it's not present on ipfs/kubo#6776 or any of the existing milestones?

We have punted this until after go-ipfs 0.5.0. It's "ready to go", but it'll require a migration. If we included it in 0.5.0 and people needed to downgrade for some reason, they'd have to run the reverse migration.

@Stebalien

I would aim to store blocks by multihash instead of by CID. The important part of the spec is really the interface/network side of things.
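To make "by multihash instead of by CID" concrete, here is a minimal in-memory sketch, assuming a simplified, hypothetical `Cid` struct rather than any real crate's type. Keying on the multihash bytes means a CIDv0 and a CIDv1 for the same content resolve to one stored block. The codes 0x70 (dag-pb), 0x55 (raw), and the 0x12 0x20 sha2-256 multihash prefix are real multicodec values, but the digest bytes used in the example are fake and truncated.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical CID: only the parts relevant to keying.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct Cid {
    version: u8,
    codec: u64,
    multihash: Vec<u8>,
}

/// In-memory block store keyed by multihash bytes instead of the full CID,
/// so CIDs that differ only in version or codec share one stored block.
#[derive(Default)]
struct MultihashStore {
    blocks: HashMap<Vec<u8>, Vec<u8>>,
}

impl MultihashStore {
    fn put(&mut self, cid: &Cid, data: Vec<u8>) {
        // Version and codec are dropped here; only the multihash is the key.
        self.blocks.insert(cid.multihash.clone(), data);
    }

    fn get(&self, cid: &Cid) -> Option<&[u8]> {
        self.blocks.get(&cid.multihash).map(Vec::as_slice)
    }

    fn len(&self) -> usize {
        self.blocks.len()
    }
}
```

The trade-off is that the store no longer remembers which codec a block was announced under, so whoever holds the CID (for example an HTTP API layer) has to keep and interpret that information.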

Really, please do innovate here. The datastore is a really nice abstraction but it has some limitations.

  • No streaming. I'd make the Rust datastore streaming by default.
  • It's a terrible database. There's no way to run simple queries without either (a) being really inefficient or (b) manually indexing. I'd consider not using the datastore where you'd be better off with a database.
    • Note: the upside of the datastore abstraction is that it's really easy to swap out the storage backend.
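A minimal sketch of what "streaming by default" could look like at the read side, assuming a hypothetical `StreamingGet` trait: instead of returning a whole `Vec<u8>`, the store hands back a reader that the caller pulls from in chunks. `std::io::Read` stands in for an async stream here purely to keep the example dependency-free; an actual rust-ipfs store would presumably be async.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical streaming read interface: the store returns a reader
/// instead of materialising the whole value up front.
trait StreamingGet {
    fn open(&self, key: &str) -> Option<Box<dyn Read + '_>>;
}

/// Toy in-memory backend; a filesystem backend could return a `File` instead.
struct MemStore {
    entries: Vec<(String, Vec<u8>)>,
}

impl StreamingGet for MemStore {
    fn open(&self, key: &str) -> Option<Box<dyn Read + '_>> {
        self.entries
            .iter()
            .find(|(k, _)| k == key)
            .map(|(_, v)| Box::new(Cursor::new(v.as_slice())) as Box<dyn Read + '_>)
    }
}

/// Consume a value in fixed-size chunks; returns (chunk count, total bytes).
fn read_in_chunks(mut reader: impl Read, chunk_size: usize) -> io::Result<(usize, usize)> {
    let mut buf = vec![0u8; chunk_size];
    let (mut chunks, mut total) = (0, 0);
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of stream
        }
        chunks += 1;
        total += n;
    }
    Ok((chunks, total))
}
```

The point of the design is that layers above (e.g. unixfs readers) only ever hold one chunk in memory, regardless of how large the stored value is.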

@koivunej
Collaborator Author

Thanks @Stebalien for the insights I was hoping to get, even though my original issue text is ... wide :)

I would aim to store blocks by multihash instead of by CID.

I understand this means we should start storing by multihash right away (within the grant) so that we won't need any workarounds, at least in the lower levels. Keeping the workarounds at the HTTP API level sounds great, and our current setup (a separate crate for interop purposes, with the HTTP API built on the root rust-ipfs crate) supports this nicely.

The important part of the spec is really the interface/network side of things.

Do you mean the $IPFS_PATH/api file, which is the only interface/network item I can see in the spec, or are you referring to something else, perhaps a spec other than the REPO_FS.md one I linked in the description?

No streaming. I'd make the rust datastore streaming by default.

Most of my optimization ideas for the layers above (ipld, unixfs) depend heavily on streaming, so I'm happy you brought it up. But I have no code ready, nor do I think I'll have time to experiment during the grant. Getting the functionality working (and, at least for me personally, continuing to learn) without too many performance considerations will support later experimentation, once most of the tests pass and we can easily "keep up" with the other implementations. Performance PRs are always easier when you can benchmark against something.

Note: the upside of the datastore abstraction is that it's really easy to swap out the storage backend.

I don't yet have insight into the cases where efficient queries or other database-like functionality would be needed. At the same time, I feel that going too general at this level of abstraction will work against later optimization, so perhaps we should tread carefully here and aim to keep the number of different implementations really low initially.

@aphelionz
Contributor

I just created this secondary issue to nail down which parts of the IPFS_HOME folder we should be compatible with: #88

@aphelionz
Contributor

@Stebalien @koivunej @rklaehn @dvc94ch your input would be valuable in #88

@Stebalien

Relevant discussion: ipfs/specs#242

@aphelionz
Contributor

@Stebalien and cc @vmx

Given the ongoing discussion in that ipfs/specs thread, I think it would be prudent for the grant work to go ahead using the infrastructure we have in place with things like the Blockstore trait, in-memory store, and the existing fs-store. While these systems might lack polish now, they are appropriately idiomatic to Rust and can be improved upon easily as such.
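To illustrate why a trait-based design makes the storage backend easy to swap, here is a minimal sketch with an in-memory and a filesystem backend behind one trait. `BlockStore`, `MemBlockStore`, `FsBlockStore`, and `store_and_load` are hypothetical names; the actual rust-ipfs traits are async and differ in their signatures. The filesystem backend writes one unsharded file per key and assumes keys are already filesystem-safe.

```rust
use std::collections::HashMap;
use std::fs;
use std::io::{self, ErrorKind};
use std::path::PathBuf;

/// Hypothetical synchronous block store trait.
trait BlockStore {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()>;
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>>;
}

#[derive(Default)]
struct MemBlockStore(HashMap<String, Vec<u8>>);

impl BlockStore for MemBlockStore {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()> {
        self.0.insert(key.to_owned(), data.to_vec());
        Ok(())
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        Ok(self.0.get(key).cloned())
    }
}

/// One file per block directly under `root` (no sharding, for brevity).
struct FsBlockStore {
    root: PathBuf,
}

impl BlockStore for FsBlockStore {
    fn put(&mut self, key: &str, data: &[u8]) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.root.join(key), data)
    }
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.root.join(key)) {
            Ok(data) => Ok(Some(data)),
            // A missing file means the block simply isn't stored.
            Err(e) if e.kind() == ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

/// Code written against the trait works unchanged with either backend.
fn store_and_load<S: BlockStore>(store: &mut S, key: &str, data: &[u8]) -> io::Result<Option<Vec<u8>>> {
    store.put(key, data)?;
    store.get(key)
}
```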

If I read the above discussion correctly, does all the following ring true to folks?

  • Passing the interface / interop tests (the network side of things) is top priority
  • We'll manage the complexity of block-storage-by-multihash, probably in the http crate
  • Given time @koivunej will experiment with streaming

@Stebalien

Completely agree. The timeline for questions like this can be weeks to months (to years in this case 😭).

@aphelionz
Copy link
Contributor

Closing this issue for now, we can rekindle this discussion when relevant.

5 participants