pull: Failed to pull data from the cloud. <filename> is git-ignored. #5396

Closed
benjamintanweihao opened this issue Feb 3, 2021 · 13 comments
Labels
awaiting response: we are waiting for your reply, please respond! :)
discussion: requires active participation to reach a conclusion

Comments

@benjamintanweihao

Bug Report

Description

I tested this between DVC versions 1.11 and 2.0. My setup is the following:

I have a folder called env/resources where I store TensorFlow binaries, which are tracked by DVC. env/resources is also git-ignored.

When I run dvc pull in earlier versions of DVC (i.e., 1.x), everything works as expected. However, when I do the same in version 2, I get: pull: Failed to pull data from the cloud. <filename> is git-ignored.

Reproduce

Example:

  1. dvc init
  2. Copy dataset.zip to the directory
  3. dvc add dataset.zip
  4. dvc push
  5. Add the directory to .gitignore.
  6. git clone the repository somewhere else.
  7. dvc pull

Expected

dvc should have let me pull in the file even though the directory is git-ignored.

Environment information

Output of dvc version:

DVC version: 2.0.0a0+bb4604 (snap)
---------------------------------
Platform: Python 3.6.9 on Linux-5.4.0-62-generic-x86_64-with-Ubuntu-18.04-bionic
Supports: All remotes
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
@skshetry
Member

skshetry commented Feb 3, 2021

@benjamintanweihao, could you please share the reason why you add the directory to .gitignore?

DVC already adds the tracked directory to the .gitignores.

@benjamintanweihao
Author

benjamintanweihao commented Feb 3, 2021 via email

@skshetry
Member

skshetry commented Feb 5, 2021

Hi @benjamintanweihao, is it possible to exclude those .gitignores?

@skshetry added the "awaiting response" and "discussion" labels on Feb 5, 2021
@macio232

Up

@skshetry
Member

@macio232, could you please provide more details? How is it affecting you? And is it possible to unignore them? Thanks. 🙂

@macio232

Actually, all the details are in the first post :) Files and directories that are added to .gitignore (for whatever reason) and tracked by DVC cannot be pulled, which does not make much sense to me and differs from the previous behavior.

I was testing on the pre-release version mentioned in the first post, and I don't know whether the bug (or feature) is present in the latest version (2.0.5 at the time of writing). I cannot verify this because I no longer have access to the project.

@efiop
Contributor

efiop commented Mar 13, 2021

@macio232 That behaviour was introduced in 2.0 intentionally. You need to exclude corresponding *.dvc (or dvc.yaml) files from .gitignore, otherwise dvc won't be able to discover them.

@efiop efiop closed this as completed Mar 13, 2021
@efiop
Contributor

efiop commented Mar 13, 2021

Just to clarify: we started ignoring such dvcfiles in 2.0 to improve performance (#5265). @macio232 Would it be possible to unignore those dvcfiles in your scenario?
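One way to check whether a repository is affected is to list the DVC metafiles that Git currently ignores. The following is a minimal sketch, not an official DVC diagnostic: the directory layout and file names are hypothetical stand-ins for the setup in the first post, and it only covers untracked files (Git's `--cached` option to `ls-files` would be the counterpart for files already committed).

```shell
#!/bin/sh
set -eu

# Scratch repository so nothing outside this directory is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical layout mirroring the report: the whole directory is git-ignored,
# including the .dvc metafile that sits next to the data.
mkdir -p env/resources
: > env/resources/tf.bin
: > env/resources/tf.bin.dvc
printf '%s\n' 'env/resources/' > .gitignore

# List untracked files that Git ignores, then keep only DVC metafiles
# (*.dvc and dvc.yaml). "|| true" keeps the script alive when grep finds nothing.
hidden_dvcfiles=$(git ls-files --others --ignored --exclude-standard \
    | grep -E '(\.dvc|(^|/)dvc\.yaml)$' || true)

printf 'DVC metafiles hidden by .gitignore:\n%s\n' "$hidden_dvcfiles"
```

Any path printed here is a metafile that matches an ignore rule and is therefore a candidate for the exclusion rule discussed in this thread.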

@benjamintanweihao
Author

benjamintanweihao commented Mar 13, 2021 via email

@macio232

@efiop exactly as @benjamintanweihao wrote: the usual strategy is to add the whole resources/data directory to .gitignore for data confidentiality reasons. Sometimes you also have local data resources that you do not want to track with either Git or DVC.

Moreover, I have the data directory added to .gitignore in all of my projects. I want to upgrade to 2.0, but this requires taking care of each individual file that should be gitignored.

@efiop
Contributor

efiop commented Mar 14, 2021

@macio232 @benjamintanweihao Would it be possible to add an exclusion rule only for *.dvc and dvc.yaml?

@macio232

> @macio232 @benjamintanweihao Would it be possible to add an exclusion rule only for *.dvc and dvc.yaml?

Yes! I think this is a solution, and I feel stupid I didn't think of it myself :) Thank you @efiop !

@OSobky

OSobky commented Oct 30, 2023

> @macio232 @benjamintanweihao Would it be possible to add an exclusion rule only for *.dvc and dvc.yaml?

Hello @efiop, what was the exclusion rule? I am not sure why it disappeared.
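The thread never spells the rule out, so here is a minimal sketch of what such an exclusion could look like. The `data/` directory and file names are hypothetical. One detail matters: Git cannot re-include a file whose parent directory is itself excluded, so the sketch ignores the directory's contents (`/data/*`) rather than the directory (`data/`) before negating the `*.dvc` pattern. A `dvc.yaml` in the same directory would presumably need its own `!/data/dvc.yaml` line.

```shell
#!/bin/sh
set -eu

# Scratch repository so nothing outside this directory is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hypothetical layout: the data file and its DVC metafile live side by side.
mkdir -p data
: > data/dataset.zip
: > data/dataset.zip.dvc

# Ignore everything inside data/, then re-include the DVC metafiles.
# Writing "data/" here instead would exclude the directory itself, and the
# negated pattern below would then have no effect.
cat > .gitignore <<'EOF'
/data/*
!/data/*.dvc
EOF

# Untracked files Git still sees, and untracked files Git ignores.
visible=$(git ls-files --others --exclude-standard)
ignored=$(git ls-files --others --ignored --exclude-standard)

printf 'visible to Git:\n%s\n\nignored by Git:\n%s\n' "$visible" "$ignored"
```

With these rules the data file stays out of Git while the `.dvc` metafile remains visible, which is the condition efiop describes for DVC 2.0 to discover it. Nested subdirectories would need additional patterns, since `/data/*` also excludes subdirectories of `data/`.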

5 participants