WIP: Add support for multiple blockstores #3257

Closed
wants to merge 7 commits into from


kevina
Contributor

@kevina kevina commented Sep 25, 2016

Closes #3119.

Required for #2634.

@Kubuxu Kubuxu added the status/in-progress In progress label Sep 25, 2016
@kevina kevina mentioned this pull request Sep 25, 2016
@kevina kevina changed the title WIP: Add basic support for multiple blockstores WIP: Add support for multiple blockstores Sep 25, 2016
Member

@whyrusleeping whyrusleeping left a comment

What is the reasoning for doing this at the blockstore level? I thought you were doing this with a multi-datastore type thing?

}

func NewBlockstoreWPrefix(d ds.Batching, prefix string) *blockstore {
if prefix == "" {
Member

I don't like this; let's force the user to give valid input. In the function above this we can just pass the default value.

Contributor Author

Okay.

if err == nil && exists {
return nil // already stored.
}
// Note: The Has Check is now done by the MultiBlockstore
Member

I agree it's probably okay to remove this and do it intentionally in one place (instead of at every layer). But this gets a bit weird: for example, if someone just wants to use the blockstore on its own, they don't get this nice optimization.

Member

Are you sure about that? MultiBlockstore just calls the Has method of the Blockstore, but that doesn't mean the Blockstore (backed by a Datastore) will call Has on the datastore.

Member

@Kubuxu Kubuxu Oct 3, 2016

UPDATE: ahh, you call an explicit Has.

Contributor Author

So can I take it that this is resolved?

@@ -104,11 +104,11 @@ func TestHasIsBloomCached(t *testing.T) {
block := blocks.NewBlock([]byte("newBlock"))

cachedbs.PutMany([]blocks.Block{block})
- if cacheFails != 2 {
- t.Fatalf("expected two datastore hits: %d", cacheFails)
+ if cacheFails != 1 {
Member

why does this change?

Contributor Author

I verified this was correct; the change was due to an implementation detail. Unfortunately, I can't remember the exact reason. If it is important enough, I will spend an hour or two looking into it.

Member

@whyrusleeping whyrusleeping Oct 3, 2016

It always weirds me out when I see tests changing. Maybe @Kubuxu is more familiar with this and can check it?

Member

@Kubuxu Kubuxu Oct 3, 2016

It was because PutMany on the default blockstore was accessing the datastore twice: once to check Has for the key and then to call Put.

This means that now we don't check the datastore with Has; we just Put the value.

https://github.com/ipfs/go-ipfs/pull/3257/files#r80438802
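To make the two-versus-one count concrete, here is a minimal sketch, assuming a hypothetical in-memory `countingStore` that counts every access (it is not the real go-datastore API and string keys stand in for block keys). The old path issues a Has and then a Put for a new block; the new path, with the Has check moved up into the MultiBlockstore, issues only the Put.

```go
package main

// countingStore is a hypothetical in-memory datastore that counts every
// access, standing in for the datastore whose hits the test counts.
type countingStore struct {
	data map[string][]byte
	hits int
}

// Has records one access and reports whether the key is stored.
func (s *countingStore) Has(k string) bool {
	s.hits++
	_, ok := s.data[k]
	return ok
}

// Put records one access and stores the value.
func (s *countingStore) Put(k string, v []byte) {
	s.hits++
	s.data[k] = v
}

// putManyOld mirrors the old behaviour: a Has check followed by a Put,
// i.e. two datastore accesses for each new block.
func putManyOld(s *countingStore, blocks map[string][]byte) {
	for k, v := range blocks {
		if s.Has(k) {
			continue // already stored
		}
		s.Put(k, v)
	}
}

// putManyNew mirrors the behaviour after this change: the Has check is
// done one layer up, so this layer just writes.
func putManyNew(s *countingStore, blocks map[string][]byte) {
	for k, v := range blocks {
		s.Put(k, v)
	}
}
```

With a single new block, the old path yields two hits and the new path one, matching the test's change from 2 to 1.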

@@ -557,6 +565,27 @@ func (r *FSRepo) Datastore() repo.Datastore {
return d
}

func (r *FSRepo) DirectMount(prefix string) ds.Datastore {
Member

Maybe call this DatastoreAt or GetMount. DirectMount just leaves me confused about what this is.

@kevina
Contributor Author

kevina commented Oct 3, 2016

@whyrusleeping

What is the reasoning for doing this at the blockstore level? I thought you were doing this with a multi-datastore type thing?

To better interact with caching. In particular the bloom filter.

func (bs *multiblockstore) Locate(key key.Key) []LocateInfo {
res := make([]LocateInfo, 0, len(bs.mounts))
for _, m := range bs.mounts {
_, err := m.Blocks.Get(key)
Member

@Kubuxu Kubuxu Oct 3, 2016

Wouldn't it be better to use Has here?

Contributor Author

Not necessarily. The problem is that with the filestore, simply calling Has() does not guarantee the block is available, as the backing file might have changed or might no longer be available.

Member

@Kubuxu Kubuxu Oct 4, 2016

This should be up to the filestore to figure out, IMO.
Why should we load X MiB into memory from a slow hard drive if other blockstores can answer without loading the whole content, or even from their Has caches?

Contributor Author

All right, I agree. I will fix this and the Filestore's Has() method.

Contributor Author

The other thing is that, for the filestore, Get() returns more information than Has(). In particular, it tells you whether the key doesn't exist, or whether the key exists but there was a problem reconstructing the block.

I am holding off on this change for a bit while I think about it.
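A minimal sketch of the Has-based Locate suggested above, assuming simplified, hypothetical types: a `Blockstore` with only `Has`, string keys instead of CIDs, and a `LocateInfo` with `Prefix`, `Found`, and `Err` fields (the real `LocateInfo` fields are not shown in this PR). It never reads block data into memory, which is Kubuxu's point; the trade-off kevina raises is that for the filestore `Found` may be true even though the backing file has since changed.

```go
package main

// Blockstore is a pared-down, hypothetical stand-in for the real
// blockstore interface; keys are plain strings instead of CIDs.
type Blockstore interface {
	Has(key string) (bool, error)
}

// Mount pairs a mount prefix with the blockstore mounted there.
type Mount struct {
	Prefix string
	Blocks Blockstore
}

// LocateInfo is a hypothetical result type: which mount was checked
// and whether the block was reported present there.
type LocateInfo struct {
	Prefix string
	Found  bool
	Err    error
}

// memStore is a trivial in-memory blockstore used for illustration.
type memStore map[string][]byte

func (m memStore) Has(key string) (bool, error) {
	_, ok := m[key]
	return ok, nil
}

// Locate asks every mount whether it has key, using Has so that no
// block contents are loaded.
func Locate(mounts []Mount, key string) []LocateInfo {
	res := make([]LocateInfo, 0, len(mounts))
	for _, m := range mounts {
		ok, err := m.Blocks.Has(key)
		res = append(res, LocateInfo{Prefix: m.Prefix, Found: ok, Err: err})
	}
	return res
}
```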

}

func (bs *multiblockstore) Put(blk blocks.Block) error {
// Has is cheaper than Put, so see if we already have it
Member

@Kubuxu Kubuxu Oct 3, 2016

This comment is misleading. You don't check Has before Put because it is cheaper.
You check it because, if you don't, you can end up with data duplicated in two blockstores.

Contributor Author

You're right, I will fix it to be clearer.
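The duplication argument can be sketched as follows, assuming a hypothetical multi-blockstore over plain in-memory maps where the first mount is the writable one. Because the Has check spans every mount, a block that already lives in a read-only mount (such as the filestore) is not copied into the first mount.

```go
package main

// memStore is a hypothetical in-memory blockstore keyed by string.
type memStore map[string][]byte

// multiBlockstore is a sketch: mounts[0] is the writable "cache" mount,
// the remaining mounts are treated as read-only.
type multiBlockstore struct {
	mounts []memStore
}

// Has reports whether any mount holds the block.
func (bs *multiBlockstore) Has(key string) bool {
	for _, m := range bs.mounts {
		if _, ok := m[key]; ok {
			return true
		}
	}
	return false
}

// Put checks every mount first; skipping the write when the block exists
// anywhere is what keeps it from being duplicated across blockstores.
func (bs *multiBlockstore) Put(key string, data []byte) {
	if bs.Has(key) {
		return
	}
	bs.mounts[0][key] = data
}
```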

out <- key
}
}
}()
Member

If @whyrusleeping agrees with me, I would open one goroutine per blockstore and have each pipe its AllKeysChan into a single external channel. This way, if the first blockstore is slow and the second is fast, the first won't slow down the whole process.

Member

👍
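A sketch of the fan-in suggested above, assuming each blockstore exposes its keys as a `<-chan string` (the real AllKeysChan works with keys/CIDs and takes a context, both simplified away here). One goroutine per source forwards into a shared channel, and a `sync.WaitGroup` closes the output once every source is drained.

```go
package main

import "sync"

// allKeysChan merges the key channels of several blockstores. Each source
// gets its own goroutine, so a slow source does not hold up keys from a
// fast one. The output channel is closed after all sources are exhausted.
func allKeysChan(sources []<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	wg.Add(len(sources))
	for _, src := range sources {
		go func(src <-chan string) {
			defer wg.Done()
			for k := range src {
				out <- k
			}
		}(src)
	}
	// Close the merged channel only once every forwarder has finished.
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```

Keys from different blockstores arrive interleaved, so callers should not rely on any ordering.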

if err == nil {
return blk, nil
}
if firstErr == nil || firstErr == ErrNotFound {
Member

I would replace it with firstErr == nil && err != ErrNotFound

Contributor Author

If I do that, then firstErr could end up being nil even if the block was not found. The idea is to return ErrNotFound only if that is the case for all the blockstores. If not, then return the more serious error.

@kevina kevina force-pushed the kevina/multiblockstore branch 2 times, most recently from 1be4ba9 to 145180a Compare October 16, 2016 16:36
@kevina kevina added this to the Filestore implementation milestone Oct 19, 2016
Member

@whyrusleeping whyrusleeping left a comment

This just feels very complicated to me. I'm not sure how best to make it less so. It might help if I look at exactly what the filestore needs here.

func (bs *multiblockstore) Locate(c *cid.Cid) []LocateInfo {
res := make([]LocateInfo, 0, len(bs.mounts))
for _, m := range bs.mounts {
_, err := m.Blocks.Get(c)
Member

This should use Has.

Contributor Author

I am fine with that. I'm not 100% happy with it, as the filestore's Has doesn't really guarantee the block is available (and changing it so it does would make the Has() call really expensive), but it's not a major problem.

Contributor Author

I partly take the Has() comment back. The filestore Has() needs fixing or the Put below needs fixing.

return res
}

func (bs *multiblockstore) Put(blk blocks.Block) error {
Member

So if I add a file with the filestore, and then add another file normally that shares a block with it, one block from my 'normal' add will be referencing a file on disk?

That feels odd to me.

Contributor Author

Yeah, but rather difficult to avoid.

out <- key
}
}
}()
Member

👍

func RmBlocks(mbs bs.MultiBlockstore, pins pin.Pinner, out chan<- interface{}, cids []*cid.Cid, opts RmBlocksOpts) error {
prefix := opts.Prefix
if prefix == "" {
prefix = mbs.Mounts()[0]
Member

Does this only remove from one blockstore?

Contributor Author

Yes.

if err != nil {
return err
}

mounts := []bstore.Mount{{fsrepo.CacheMount, cbs}}

if n.Repo.DirectMount(fsrepo.FilestoreMount) != nil {
Member

Filestore mount? Does this do anything yet?

Contributor Author

Not in this PR. Looks like I missed that one when factoring out the code.

@@ -78,6 +80,7 @@ You can now refer to the added file in a gateway, like so:
cmds.BoolOption(hiddenOptionName, "H", "Include files that are hidden. Only takes effect on recursive add.").Default(false),
cmds.StringOption(chunkerOptionName, "s", "Chunking algorithm to use."),
cmds.BoolOption(pinOptionName, "Pin this object when adding.").Default(true),
cmds.BoolOption(allowDupName, "Add even if blocks are in non-cache blockstore.").Default(false),
Member

This really complicates things... thoughts @Kubuxu ?

Contributor Author

I brought this up on IRC.
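A sketch of what the `allowDupName` option ("Add even if blocks are in non-cache blockstore.") changes, assuming a hypothetical `putWithOpts` over in-memory maps with the first map as the primary (cache) blockstore. By default, a block already present in any mount is skipped; with allowDup set, only the primary blockstore is checked, so the block is written there even if the filestore already holds it.

```go
package main

// putWithOpts writes a block into mounts[0], the primary (cache) blockstore.
// Without allowDup, a block already present in any mount is skipped; with
// allowDup, only the primary is checked, so blocks held elsewhere (e.g. in
// the filestore) are duplicated into it. It reports whether a write happened.
func putWithOpts(mounts []map[string][]byte, key string, data []byte, allowDup bool) bool {
	primary := mounts[0]
	if _, ok := primary[key]; ok {
		return false // already in the primary blockstore
	}
	if !allowDup {
		for _, m := range mounts[1:] {
			if _, ok := m[key]; ok {
				return false // present in a non-cache blockstore
			}
		}
	}
	primary[key] = data
	return true
}
```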

DAG merkledag.DAGService // the merkle dag service, get/add objects.
Resolver *path.Resolver // the path resolution system
Peerstore pstore.Peerstore // storage for other Peer instances
Blockstore bstore.MultiBlockstore // the block store (lower level)
Member

Should be an interface.

Contributor Author

What should be an interface?

Member

The blockstore should just be of type Blockstore. That way it's easy to swap out with any other blockstore implementation.

Contributor Author

@whyrusleeping I did not design the MultiBlockstore to be swapped out. (As I see it, it was designed to allow the use of other Blockstores, so why would you want to swap it out? But I can understand your viewpoint.) If that is something you want, it is likely possible. I imagine I can have something done in a few days.
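One way to read whyrusleeping's suggestion, sketched with hypothetical, pared-down interfaces (string keys, only `Has` and `Put`, and an assumed `Mounts()` method): the node field is typed as the plain `Blockstore` interface, and code that needs mount information recovers it with a type assertion to `MultiBlockstore`. This is an illustration of the idea, not the design the PR settled on.

```go
package main

// Blockstore is a pared-down, hypothetical version of the blockstore interface.
type Blockstore interface {
	Has(key string) (bool, error)
	Put(key string, data []byte) error
}

// MultiBlockstore adds mount awareness on top of Blockstore (method name assumed).
type MultiBlockstore interface {
	Blockstore
	Mounts() []string
}

// memStore is a single in-memory blockstore.
type memStore map[string][]byte

func (m memStore) Has(key string) (bool, error) {
	_, ok := m[key]
	return ok, nil
}

func (m memStore) Put(key string, data []byte) error {
	m[key] = data
	return nil
}

// multiStore embeds a memStore and reports mount prefixes; it exists only
// to show a type that satisfies MultiBlockstore.
type multiStore struct {
	memStore
	prefixes []string
}

func (m multiStore) Mounts() []string { return m.prefixes }

// node holds its blockstore behind the plain interface, so any
// implementation can be swapped in.
type node struct {
	Blockstore Blockstore
}

// mountsOf returns the mount list when the node's blockstore supports
// mounts, and nil otherwise.
func mountsOf(n node) []string {
	if mbs, ok := n.Blockstore.(MultiBlockstore); ok {
		return mbs.Mounts()
	}
	return nil
}
```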

func openDefaultDatastore(r *FSRepo) (repo.Datastore, error) {
const (
RootMount = "/"
CacheMount = "/blocks" // needs to be the same as blockstore.DefaultPrefix
Member

Why is this called the cache mount?

Contributor Author

Please see #3119.

kevina added a commit to ipfs-filestore/go-ipfs that referenced this pull request Nov 3, 2016
Factored out of ipfs#3257 (Add support for multiple blockstores).

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
Each datastore is mounted under a different mount point and a
multi-blockstore is used to check each mount point for the block.

The first mount checked by the multi-blockstore is considered the
"cache"; all others are considered read-only.  This implies that the
garbage collector only removes blocks from the first mount.

This change also factors out the pinlock from the blockstore into its
own structure.  Only the multi-blockstore now implements the
GCBlockstore interface.  In the future this could be separated out
from the blockstore completely.

For now caching is only done on the first mount; in the future this
could be reworked.  The bloom filter is the most problematic, as the
read-only mounts are not necessarily immutable and can be changed by
methods outside of the blockstore.

Right now there is only one mount, but that will soon change once
support for the filestore is added.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
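The rule that only the first ("cache") mount is garbage collected can be sketched as a sweep over that mount alone, assuming hypothetical in-memory maps for the mounts and a precomputed set of pinned keys. Computing the live set from pins and holding the pin lock, which a real GC needs, are omitted here.

```go
package main

// gcFirstMount sweeps only mounts[0], the writable "cache" mount, deleting
// every key not in the pinned set. Later mounts are read-only and are never
// touched. It returns the removed keys.
func gcFirstMount(mounts []map[string][]byte, pinned map[string]bool) []string {
	var removed []string
	cache := mounts[0]
	for k := range cache {
		if !pinned[k] {
			delete(cache, k) // deleting during range is safe in Go
			removed = append(removed, k)
		}
	}
	return removed
}
```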
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
This option adds files to the primary blockstore even if the blocks
are already in another blockstore such as the filestore.

License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
License: MIT
Signed-off-by: Kevin Atkinson <[email protected]>
@whyrusleeping
Member

This is no longer needed

@whyrusleeping whyrusleeping deleted the kevina/multiblockstore branch March 6, 2017 19:46
@whyrusleeping whyrusleeping removed the status/ready Ready to be worked label Mar 6, 2017
@kevina kevina restored the kevina/multiblockstore branch March 6, 2017 19:50
@kevina
Contributor Author

kevina commented Mar 6, 2017

@whyrusleeping I beg to differ. Something like this might still be needed. For example if you want multiple ipfs-packs in a single node.

@whyrusleeping
Member

Alright, I'll keep the branch around (it's also archived in the branch archive repo). But I'm keeping the PR closed until we have an actionable use case for it.

@Kubuxu Kubuxu deleted the kevina/multiblockstore branch August 10, 2017 17:10

3 participants