
Implement basic filestore 'no-copy' functionality #3629

Merged · 10 commits into master · Mar 6, 2017

Conversation

whyrusleeping (Member)

if has {
return nil
}

switch b := b.(type) {
Contributor

@whyrusleeping I don't know what you have in mind for the filestore. But I very specifically designed my filestore so that writes go through. This way, if a file is moved it can simply be re-added, and any invalid blocks are automatically fixed.

Member Author

That becomes problematic. What happens if I add dataset A, which contains some subset C, then later add dataset B that also contains subset C, and then change my mind and remove dataset B? By your description, the original dataset references would now be lost, with no indication to the user that this has happened.

I think this scenario is worse UX than the reverse (deleting A causing bad refs in B). In either case, repairs are going to have to be manual, and some things will get sticky.

cc @jbenet for his thoughts on this scenario.

@kevina (Contributor) commented Jan 25, 2017

@whyrusleeping

From an inline comment above:

I don't know what you have in mind for the filestore. But I very specifically designed my filestore so that writes go through. This way, if a file is moved it can simply be re-added, and any invalid blocks are automatically fixed.

That becomes problematic. What happens if I add dataset A, which contains some subset C, then later add dataset B that also contains subset C, and then change my mind and remove dataset B? By your description, the original dataset references would now be lost, with no indication to the user that this has happened.

Yes this was originally a problem. In my implementation I fixed this. See ipfs-filestore#12 and ipfs-filestore#23. However, that solution also requires writes to go through.

I think this scenario is worse UX than the reverse (deleting A causing bad refs in B). In either case, repairs are going to have to be manual, and some things will get sticky.

Not if my solution in ipfs-filestore#12 is implemented.

cc @jbenet for his thoughts on this scenario.

@kevina (Contributor) commented Jan 25, 2017

Yes this was originally a problem. In my implementation I fixed this. See ipfs-filestore#12 and ipfs-filestore#23. However, that solution also requires writes to go through.

Another option is to only replace the block if it is invalid. I considered that, but decided that it was better to support allowing multiple files to point to the same block.
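The trade-off being described can be sketched as an index that keeps every known file reference per block, so that removing one dataset's entry leaves a block still backed by another file. A hypothetical sketch of the multi-reference idea from ipfs-filestore#12, where `FileRef` and `refIndex` are illustrative names, not the PR's actual types:

```go
package main

import "fmt"

// FileRef records where a block's bytes live on disk.
// Illustrative only; the real filestore's record format differs.
type FileRef struct {
	Path   string
	Offset uint64
	Size   uint64
}

// refIndex maps a block key to every file location known to hold it.
type refIndex map[string][]FileRef

func (idx refIndex) add(key string, ref FileRef) {
	idx[key] = append(idx[key], ref)
}

// removeFile drops all references backed by a given file, leaving
// blocks intact if another file still backs them.
func (idx refIndex) removeFile(path string) {
	for key, refs := range idx {
		kept := refs[:0]
		for _, r := range refs {
			if r.Path != path {
				kept = append(kept, r)
			}
		}
		idx[key] = kept
	}
}

func main() {
	idx := refIndex{}
	// Subset C is present in both dataset A and dataset B.
	idx.add("QmC", FileRef{Path: "/data/A/part", Offset: 0, Size: 262144})
	idx.add("QmC", FileRef{Path: "/data/B/part", Offset: 0, Size: 262144})
	idx.removeFile("/data/B/part") // remove dataset B
	fmt.Println(len(idx["QmC"]))   // 1: block still reachable via A
}
```

With a single-reference design, removing dataset B would instead orphan the shared block, which is the failure mode discussed above.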

@kevina (Contributor) left a comment

One major thing I noticed, plus a few other minor things that I comment on below.

out, err := f.readDataObj(&dobj)
if err != nil {
return nil, err
}
Contributor

I strongly recommend you verify the block.

Member Author

Ah, yeah. NewBlockWithCid only does that check if u.Debug is true. Good catch.

@kevina (Contributor) commented Jan 30, 2017

I see that "--nocopy" is allowed when the daemon is online without any sort of additional checks. If the blocks are ever not verified this could become a security problem. Even without the security problems this could lead to strange results if files are added on another machine with an identical path.

@kevina (Contributor) commented Jan 30, 2017

For this to be usable outside of ipfs-pack I would strongly recommend that references be allowed "outside the root handler" by simply requiring all paths to be absolute.

var FilestorePrefix = ds.NewKey("filestore")

type FileManager struct {
ds ds.Batching
Contributor

Note: in my early implementations I also wrote the filestore on top of the datastore. However, I changed this to use LevelDB directly, as I found little value in the abstraction and extra layers. What actually prompted it was ipfs-filestore#10: in particular, querying the datastore was very, very slow. I then later found it very useful to use LevelDB's snapshot feature to provide safe verification of the filestore while the daemon was running. With my recent improvements to the datastore query, the performance may no longer be a problem, but I still see little value in the extra indirection.

@kevina (Contributor) commented Feb 2, 2017

@whyrusleeping Concerning replacing blocks: how about we make it an option for now? There needs to be a way to fix broken blocks caused by files moving. One option would be to remove the block and then re-add it, but pinning makes that a more complicated operation.

As far as allowing multiple blocks for a hash (ipfs-filestore#12 and ipfs-filestore#23), I agree that we should aim for that but to start with it might be a bit much to review.

return nil, err
}

out := make(chan *cid.Cid)
Contributor

I would suggest you use a buffered channel. Use dsq.KeysOnlyBufSize.
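The suggestion amounts to giving the output channel a buffer so the goroutine walking the query results is not blocked on every single send by a slow consumer. A generic sketch, where `keysOnlyBufSize` and its value of 128 are assumptions standing in for `dsq.KeysOnlyBufSize`:

```go
package main

import "fmt"

// keysOnlyBufSize stands in for dsq.KeysOnlyBufSize from go-datastore;
// the exact value used here (128) is an assumption for illustration.
const keysOnlyBufSize = 128

// streamKeys sends n fake keys on a buffered channel, so the producing
// goroutine can run up to keysOnlyBufSize sends ahead of the consumer
// instead of handing keys off one at a time.
func streamKeys(n int) <-chan string {
	out := make(chan string, keysOnlyBufSize)
	go func() {
		defer close(out)
		for i := 0; i < n; i++ {
			out <- fmt.Sprintf("key-%d", i)
		}
	}()
	return out
}

func main() {
	count := 0
	for range streamKeys(5) {
		count++
	}
	fmt.Println(count) // 5
}
```

An unbuffered channel here would force a goroutine handoff per key, which matters when the query yields many thousands of block keys.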

@jefft0 (Contributor) commented Feb 20, 2017

Is this branch ready for me to test on my 30 TB of webcam videos or should I wait?

@whyrusleeping (Member Author)

@jefft0 It's almost ready; it should be merged in a couple of days.

@whyrusleeping (Member Author)

Rebased, waiting on a 👍 from @Kubuxu and CI

@kevina (Contributor) commented Mar 2, 2017

@jefft0 I am really not sure how usable this code is going to be for you, as it does not allow for files "outside the root handler". For example, if your directory layout is:

/home/jeff0/.ipfs
/aux/media

and you want to share files in /aux/media, you won't be able to, and will get an error.

@kevina (Contributor) commented Mar 2, 2017

@whyrusleeping I would like this comment to at least be addressed:

I see that "--nocopy" is allowed when the daemon is online without any sort of additional checks. If the blocks are ever not verified this could become a security problem. Even without the security problems this could lead to strange results if files are added on another machine with an identical path.

@whyrusleeping (Member Author)

@kevina @jefft0 In this case, simply set your IPFS_PATH to be in (given the example) /aux/ipfs or similar. We're pushing for git-style security on file references: they need to be within the scope of the ipfs repository.

@kevina Re the comment, blocks are checked now (thanks for pointing that out), and this feature should not be used with client and daemon on separate machines. We can probably implement a check of some sort for this... any ideas on how to do that off the top of your head?
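The "git style" scope rule described above can be sketched as a path-containment check against the repo root. A hypothetical illustration (`withinRoot` is not the actual go-ipfs function), showing why a file under /aux succeeds once IPFS_PATH points at /aux/ipfs but an unrelated path is rejected:

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether path sits under root after cleaning,
// rejecting anything that escapes via "..". Hypothetical sketch of
// the scope rule; the real check in go-ipfs may differ.
func withinRoot(root, path string) bool {
	rel, err := filepath.Rel(filepath.Clean(root), filepath.Clean(path))
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

func main() {
	fmt.Println(withinRoot("/aux/ipfs", "/aux/ipfs/media/cam1.mp4")) // true
	fmt.Println(withinRoot("/aux/ipfs", "/home/jeff0/video.mp4"))    // false
}
```

Cleaning before comparing is what stops "../" traversal tricks, the same reason git refuses to track files outside its work tree.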

whyrusleeping and others added 6 commits March 6, 2017 00:37 (all License: MIT; Signed-off-by: Jeromy <[email protected]> and Kevin Atkinson <[email protected]>)
@whyrusleeping whyrusleeping merged commit ea57c69 into master Mar 6, 2017
@whyrusleeping whyrusleeping removed the status/in-progress In progress label Mar 6, 2017
@jefft0 (Contributor) commented Mar 6, 2017

I'm confused. I have several webcam videos totalling 60 GB. With a fresh installation of go-ipfs (master branch), I did ipfs init then ipfs add --nocopy --raw-leaves for each file. But now ~/.ipfs is 60 GB. Is --nocopy working yet in master?

@Kubuxu (Member) commented Mar 6, 2017

@whyrusleeping anything on this?

@whyrusleeping (Member Author)

Ah, there's a check I forgot to add: erroring out if --nocopy is passed without enabling the filestore.

@jefft0 take a look at "How to enable" here: #3397 (comment)

@jefft0 (Contributor) commented Mar 6, 2017

@whyrusleeping Much better, thanks! Now ~/.ipfs is only 68 MB. I'll try adding a lot more files. Where do you want me to report my results?

@whyrusleeping (Member Author)

@jefft0 You can report them here, or open a new issue to discuss. Either way :)

@jefft0 (Contributor) commented Mar 6, 2017

@whyrusleeping I want to keep my main repo in ~/.ipfs, so instead of changing IPFS_PATH I put a symbolic link to my external drive in ~/.ipfs and do ipfs add --nocopy with that path. This works. It would also let me use multiple external drives at different mount points. Do you see any downside to this approach?

@whyrusleeping (Member Author)

@jefft0 hrm... unsure about the implications of this. I'll have to think about the security model of using/allowing symlinks like that.

My first thought, though, is that it should be fine.

@whyrusleeping (Member Author)

@jefft0 You can also put the symbolic link in just ~/; it doesn't have to be IN the ipfs dir, just in its parent (like the .git folder for git repos).

@whoizit commented Mar 6, 2017

Every time, I get:

$ ps aux | grep ipfs
atommixz  3109 36.2  2.5 879840 207656 ?       Ssl  23:20   1:24 /usr/bin/ipfs daemon --manage-fdlimit=true

$ LANG=C type ipfs_add
ipfs_add is aliased to `ipfs add --nocopy --raw-leaves --recursive --progress'

$ ipfs_add download/torrents
added QmcGUHBF2XT8dTblablablazYsNQch6ECGx torrents/file.ext
 3.24 GB / 61.22 GB [==>---------------------------------------------------]   5.30% 15m33s23:23:30.659 ERROR commands/h: open /home/username/.ipfs/blocks/HE/put-789802368: too many open files client.go:247
Error: open /home/username/.ipfs/blocks/HE/put-789802368: too many open files

@whyrusleeping (Member Author)

@atommixz Wanna open a new issue for this?

@whyrusleeping whyrusleeping deleted the feat/filestore0 branch March 6, 2017 20:51
@whoizit commented Mar 6, 2017

This helped me:

$ systemctl --user edit --full ipfs 
[Service]
LimitNOFILE=999999

This doesn't make sense to me, given that I run ipfs daemon --manage-fdlimit=true.

@whoizit commented Mar 6, 2017

I'm trying to download many images from the filestore. Every time it stops at the same place. Latest commit, on two PCs.

$ ipfs get Qmblablabla
Saving file(s) to Qmblablabla
 117.00 MB / 2.36 GB [==>------------------------------------------------------]   4.84% 4s
Error: expected protobuf dag node

...
23:49:27.028 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID PRB9SN>) failed: dial attempt failed: <peer.ID Rj75eS> --> <peer.ID PRB9SN> dial attempt failed: i/o timeout wantmanager.go:233
23:50:03.062 ERROR    bitswap: couldnt open sender again after SendMsg(<peer.ID aoqJoJ>) failed: dial attempt failed: <peer.ID Rj75eS> --> <peer.ID aoqJoJ> dial attempt failed: i/o timeout wantmanager.go:233
23:51:15.626 ERROR commands/h: err: expected protobuf dag node handler.go:288
23:52:34.252 ERROR commands/h: err: expected protobuf dag node handler.go:288
23:52:59.859 ERROR commands/h: err: expected protobuf dag node handler.go:288

@Kubuxu (Member) commented Mar 7, 2017

Can you report it in a separate issue? It is much easier to track that way.

@whoizit commented Mar 7, 2017

whyrusleeping said "You can report them here, or open a new issue to discuss. Either way :)", so I did, since it's about the filestore.

@Kubuxu (Member) commented Mar 7, 2017

I think he meant results as in: did it work, performance, small problems.

@whyrusleeping (Member Author)

Yeah, keeping perf and general 'how it went' reports here is fine, but actual problems need their own issues so we can address them.

@jefft0 (Contributor) commented Mar 7, 2017

I had a problem with too many open files and opened issue #3763. I don't know if it's related to the issue @Kubuxu has.
