Got doubled data with dvc add #1442

Closed
TezRomacH opened this issue Dec 16, 2018 · 3 comments
Labels
question I have a question?

Comments

@TezRomacH

Information about the environment

dvc --version
0.22.0

Installed with macOS package to macOS Mojave.

Bug report

I followed this tutorial and got an unexpected result.
According to the tutorial, the dvc add command should not copy cached data. But as you can see below, the total folder size has doubled.

aiml/tutorial_dvc/classify  master  
▶ du -sh data 
 41M	data

aiml/tutorial_dvc/classify  master  
▶ du -sh .dvc/cache 
 41M	.dvc/cache

aiml/tutorial_dvc/classify  master 
▶ du -sh .
 82M	.

The inodes of the workspace file and its cache entry are different. Maybe that is the cause. Let me know if you need any extra information about my environment.

aiml/tutorial_dvc/classify  master
▶ ls -i data/Posts.xml.zip
4690717 data/Posts.xml.zip

aiml/tutorial_dvc/classify  master
▶ ls -i .dvc/cache/ec/
4688793 1d2935f811b77cc49b031b999cbf17
@TezRomacH
Author

I've checked issue #942, but the fix in 0.14.0 doesn't seem to be working.

@efiop
Contributor

efiop commented Dec 16, 2018

Hi @TezRomacH !

Your system supports reflinks, so dvc used them to create a link from the cache to your workspace. No data duplication has occurred. Unlike a hardlink, a reflink to a file has a different inode, so it is a bit harder to see it working for yourself. Also, the du utility still counts them as two separate full-size files, even though there is no duplication at the filesystem level. We should definitely make this clearer in the documentation. Created iterative/dvc.org#139 .
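
To illustrate why comparing inodes cannot settle the question, here is a minimal Python sketch (not part of dvc; the helper name `inode_report` and the file names are made up for illustration). It creates a file, a hardlink to it, and a plain copy, then compares inode numbers. Only the hardlink shares the original's inode. A reflink, like the copy, gets its own inode, so a different inode number alone does not tell you whether the data blocks are shared. The standard library has no portable call for creating a reflink, so the sketch uses a plain copy as the stand-in for "a separate inode".

```python
import os
import shutil
import tempfile


def inode_report(workdir):
    """Create a file, a hardlink to it, and a plain copy; return their inode numbers."""
    original = os.path.join(workdir, "original.bin")
    hardlink = os.path.join(workdir, "hardlink.bin")
    copy = os.path.join(workdir, "copy.bin")

    with open(original, "wb") as f:
        f.write(b"x" * 4096)

    os.link(original, hardlink)      # second directory entry for the same inode
    shutil.copyfile(original, copy)  # separate file with its own inode

    return {
        "original": os.stat(original).st_ino,
        "hardlink": os.stat(hardlink).st_ino,
        "copy": os.stat(copy).st_ino,
    }


with tempfile.TemporaryDirectory() as tmp:
    report = inode_report(tmp)
    print(report)
    print("hardlink shares inode:", report["original"] == report["hardlink"])
    print("copy shares inode:", report["original"] == report["copy"])
```

The same reasoning applies to the `ls -i` output in the report above: two different inode numbers are what you would see both for a reflink and for a full copy.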

Also, to be sure that no duplication occurs, you could check the free space on your drive with the df utility: it will show that your free space did not drop by the file's size a second time after you dvc add-ed it.
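
The free-space check can also be scripted. The following is a rough sketch, assuming you record free space before and after running dvc add yourself; `free_bytes`, `looks_duplicated`, and the `threshold` parameter are hypothetical helpers, not dvc functionality. Free space on a live system fluctuates because of other processes, so treat the result as a hint rather than proof.

```python
import shutil


def free_bytes(path="."):
    """Free bytes on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def looks_duplicated(free_before, free_after, file_size, threshold=0.5):
    """Return True if free space dropped by at least `threshold` * file_size.

    `threshold` is an arbitrary fraction chosen for this sketch to absorb
    noise from unrelated disk activity.
    """
    consumed = free_before - free_after
    return consumed >= file_size * threshold


# Example with made-up numbers: a 41 MB file and almost no change in free space.
print(looks_duplicated(100_000_000, 99_990_000, 41_000_000))
```

In practice you would call `free_bytes()` once before and once after dvc add, and pass both values together with the size of the added file to `looks_duplicated`.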

Thanks for the feedback!

@TezRomacH
Author

Oh, thanks, now it's clearer!

2 participants