
Disk space usage of old Files API nodes #3254

Open
matthewrobertbell opened this issue Sep 24, 2016 · 14 comments
Labels
status/deferred Conscious decision to pause or backlog

Comments

@matthewrobertbell

matthewrobertbell commented Sep 24, 2016

Version information: Official 0.4.3 64 bit Linux binary

Type: Files API

Priority: P3

Description:

I've been adding a large number of small files using the files API, and a cronjob runs "ipfs name publish $(ipfs files stat / | head -n 1)" to publish the root to IPNS.
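
For reference, a minimal sketch of such a cron entry (the schedule and binary path are illustrative, not from this report):

# hypothetical crontab line: republish the Files API root to IPNS every 10 minutes
*/10 * * * * /usr/local/bin/ipfs name publish $(/usr/local/bin/ipfs files stat / | head -n 1)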

The disk is now full. IPFS is using a 250GB Digital Ocean block storage volume:

root@ipfs:~# df -h | grep ipfs
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/sda        246G  246G      0  100%  /root/.ipfs

I tried to run "ipfs files stat /", but it failed because no disk space was available (is this a bug?), so I instead did the following to get the root object's stats:

root@ipfs:~# ipfs resolve /ipns/QmekbrSJGBAJy6Yzbj5e2pJh61nxWsTwpx88FraUVHwq8x
/ipfs/QmTgJ1ZWcGDhyyAnvMn3ggrQntrR6eVrhMmuVxjcT7Ct3D

root@ipfs:~# ipfs object stat /ipfs/QmTgJ1ZWcGDhyyAnvMn3ggrQntrR6eVrhMmuVxjcT7Ct3D
NumLinks: 1
BlockSize: 55
LinksSize: 53
DataSize: 2
CumulativeSize: 124228002182

The useful data is ~124GB, so almost twice as much storage is being used as there is data added. Is this because old root objects are hanging around?

Over 99% of disk usage is in .ipfs/blocks

@Kubuxu
Member

Kubuxu commented Sep 24, 2016

Possibly. You might also have old files, or no-longer-reachable files, lying around.

You might want to make sure that all of your important files are pinned or reachable from the Files API root, and then run "ipfs repo gc" to remove old roots, unneeded files, and so on.
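
For example, a minimal check-then-collect sequence might look like this (a sketch; adjust to your own setup):

# confirm the Files API root is intact and note its hash
ipfs files stat /
# list what is pinned recursively
ipfs pin ls --type=recursive
# then drop everything that is neither pinned nor reachable
ipfs repo gc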

@matthewrobertbell
Author

The script only adds files; it doesn't delete them, so there shouldn't be any old files.

That being the case, is a GC safe, since all files are reachable from the root?

Thanks

@Kubuxu
Member

Kubuxu commented Sep 24, 2016

It should be safe on 0.4.3.

@matthewrobertbell
Author

root@ipfs:~# ipfs repo gc
Error: write /root/.ipfs/blocks/CIQJC/put-441273290: no space left on device

All data on the device is from IPFS; is there a way to get around this?

@Kubuxu
Member

Kubuxu commented Sep 24, 2016

@lgierth, @whyrusleeping: I think you've recovered from situations like this in the past, so you might be of more help here.

@ghost

ghost commented Sep 24, 2016

You'll have to free up a few kilobytes somehow; go-ipfs is currently not able to start if there's no space left. You could also move some subdirectory of the repo to a ramdisk or similar.
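
A rough sketch of the ramdisk workaround (mount point and size are illustrative, and note that tmpfs contents are lost on reboot):

# mount a small tmpfs and move part of the repo onto it
mkdir /mnt/ipfs-tmp
mount -t tmpfs -o size=512m tmpfs /mnt/ipfs-tmp
mv /root/.ipfs/datastore /mnt/ipfs-tmp/
ln -s /mnt/ipfs-tmp/datastore /root/.ipfs/datastore
# run "ipfs repo gc", then move the directory back and remove the symlink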

@matthewrobertbell
Author

Moving the datastore folder worked, and the GC ran successfully. Is it suggested to run GC regularly via cron for my use case? The problem I see is that it requires an exclusive lock, so the daemon can't run at the same time.

Ideally, old roots would be cleaned up automatically by the files API; is this planned?

Thanks

@djdv
Contributor

djdv commented Sep 24, 2016

You can have the daemon itself trigger a GC automatically by starting it with ipfs daemon --enable-gc and then editing your ipfs config. You can set things like the GC interval, max storage, etc.
https://github.com/ipfs/go-ipfs/blob/master/docs/config.md#datastore
I believe there will be more ways to trigger a gc and constrain resources automatically in the future: #1482
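
For concreteness, the relevant Datastore settings can be set from the CLI roughly like this (the values here are illustrative, not recommendations):

# cap the repo size, set the GC watermark (percent of StorageMax), and the GC interval
ipfs config Datastore.StorageMax 200GB
ipfs config --json Datastore.StorageGCWatermark 90
ipfs config Datastore.GCPeriod 1h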

Edit:

it requires an exclusive lock so the daemon can't run.

You should be able to initiate a GC whether the daemon is running or not, so long as the repo isn't already locked. I tend to run mine manually while the daemon is up. You can also prune specific hashes with ipfs block rm <hash> if you want to remove old roots specifically.
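
A quick sketch of pruning a stale root by hash (the hash below is a placeholder, not a real block):

# the current root, which you want to keep
ipfs files stat / | head -n 1
# remove a previously published root block (placeholder hash)
ipfs block rm QmOldRootPlaceholderHash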

@whyrusleeping whyrusleeping added the status/deferred Conscious decision to pause or backlog label Sep 28, 2016
@rht
Contributor

rht commented Jan 31, 2017

Possible duplicate of #3621. Also, the tracking of how the repo size grows has been incorporated into the benchmark.

Edit: s/Duplicate/Possible duplicate/
Edit: one more disease strand is confirmed

@Kubuxu
Member

Kubuxu commented Jan 31, 2017

@rht it isn't a duplicate of #3621. #3621 is specifically about pin sharding creating a lot of intermediate nodes; this issue is about the files API.

@Kubuxu Kubuxu changed the title Files API leading to disk space wastage? Disk space usage of old Files API nodes Jan 31, 2017
@rht
Contributor

rht commented Jan 31, 2017

I know the files are being added with the files API, but they are additionally pinned, and the extra storage is likely due to the same cause as #3621 regardless of how the files are added (the near-doubling of storage might be a coincidence). This can be tested quickly, though.

@Kubuxu
Member

Kubuxu commented Jan 31, 2017

They don't have to be pinned, and the files API doesn't use the pinset for pinning.

@rht
Contributor

rht commented Feb 1, 2017

Confirmed there is an additional storage explosion coming from ipfs files cp (after being pinned):

Tested on #3640 (even after deterministic pin sharding).

@mguentner

The trick is to use --flush false and, once you are done (after the 10k files cp operations), run ipfs files flush on the root path.
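
A minimal sketch of that pattern (the file names, count, and /files destination are illustrative):

# create a destination directory in MFS
ipfs files mkdir -p /files
# copy many files without flushing the root after each operation
for i in $(seq 1 10000); do
  hash=$(ipfs add -q "file-$i")
  ipfs files cp --flush=false "/ipfs/$hash" "/files/file-$i"
done
# write the root out once, at the end
ipfs files flush /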
