stage collection: ignore git-ignored directories? #5244
Great suggestion! We should think about scenarios that this might break, but it looks like it won't break any scenarios that use …
This proposal only concerns the user's workspace. Even if the user has git-ignored dvcfiles (maybe to prevent them from being updated), we'll still read what's checked into Git. So it does not apply to those commands at all. One thing that comes to mind is …
As all of the DVC-tracked data would be added to …
@karajan1001, we do need to look into every directory, except the git-ignored ones, which include the dvc-tracked data as well. Our assumption is that the user's repo is not usually dense (if we ignore dvc-tracked data and git-ignored files like …). Users could just use …
Hi! Thanks for the RFC on Slack. A couple of general questions to make sure I understand this:
Can you clarify whether you are proposing to get rid of `.dvcignore` here?
Again, are we suggesting relying on `.gitignore` (deprecating `.dvcignore`)? There could be files you want DVC to ignore but Git to track.
Can someone explain what they are referring to when talking about …? Thanks
We had discussions before on getting rid of … So, no, I am not proposing to get rid of …
Yes, but what's the use case for that? That's what we need to understand.
This is an internal abstraction. We had discussions months ago about using both … What I am proposing here is to just use this at a high level, when searching for stages, which provides almost the same benefit without too many complications.
OK, I was confused by the …
Yes, good Q. It's a bit confusing because, unlike Git, DVC already ignores everything by default (the Git analogy comes back to haunt us again...), but:
We could try to emphasize that usage of `.dvcignore` in the docs, or even add it to the Best Practices section (when that happens... see iterative/dvc.org/issues/72). No point in doing that if this gets implemented, though.
A dvc-tracked directory must not contain a `dvc.yaml`/`.dvc` file. Or maybe I did not get your question?
It could be implemented when someone asks for it, with a clear use case. It could be …
People might not like to maintain …
BTW, I tried …
I have 4 virtual environments inside …
Wasn't a question, but never mind: "It is not possible to re-include a file if a parent directory of that file is excluded." (from the gitignore documentation)
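The gitignore rule quoted above is what makes pruning safe: once a directory is excluded, nothing beneath it can be re-included, so a walker can skip it without inspecting its contents. The Python sketch below illustrates the rule with a deliberately simplified matcher that supports only basename globs, a trailing `/` for directory-only patterns, and `!` negation. The function names are hypothetical, and real gitignore matching handles many more cases (anchored patterns, `**`, escapes).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _pattern_matches(pattern: str, path: PurePosixPath, is_dir: bool) -> bool:
    # Simplified (hypothetical): a trailing "/" restricts the pattern to
    # directories, and the glob is matched against the last path component only.
    if pattern.endswith("/"):
        if not is_dir:
            return False
        pattern = pattern[:-1]
    return fnmatchcase(path.name, pattern)


def _last_match_ignores(path: PurePosixPath, patterns: list[str], is_dir: bool) -> bool:
    # The last matching pattern wins; a leading "!" negates (re-includes).
    ignored = False
    for raw in patterns:
        negate = raw.startswith("!")
        pattern = raw[1:] if negate else raw
        if _pattern_matches(pattern, path, is_dir):
            ignored = not negate
    return ignored


def is_ignored(path: str, patterns: list[str], is_dir: bool = False) -> bool:
    p = PurePosixPath(path)
    # Check ancestors outermost-first: if any parent directory is excluded,
    # the path stays ignored no matter which negation patterns follow.
    for parent in reversed(list(p.parents)[:-1]):
        if _last_match_ignores(parent, patterns, is_dir=True):
            return True
    return _last_match_ignores(p, patterns, is_dir)


# "!dvc.yaml" re-includes dvc.yaml at the top level...
assert not is_ignored("dvc.yaml", ["*.yaml", "!dvc.yaml"])
# ...but cannot re-include it under the excluded "venv/" directory.
assert is_ignored("venv/dvc.yaml", ["venv/", "!dvc.yaml"])
```

Because the ancestor check short-circuits, deciding that `venv/` is ignored is enough to conclude that every path under it is ignored too.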
`dvc status` in our repo is terribly slow (~6s), and I think it is similarly slow for our users' reasonably sized projects, as we might be traversing very dense directories (e.g. `node_modules`/`venv`, and the user's dvc-tracked data).
We have discussed getting rid of `.dvcignore` before, but I see that as a different thing: it is useful, for example, for ignoring `.DS_Store`/`logs`, etc. in dvc-tracked artefacts.
We could opt for a simple solution and just ignore git-ignored dvcfiles and directories, never traversing them. Since we also add the user's data to `.gitignore`, we can get significant speedups just by doing this. Also, design-wise, we won't need to mix `gitignore`s and `dvcignore`s in the `tree`s.
[Figure: `status` performance in DVC's own repo, Before vs. After]
Also, note that we can change this behaviour in the 2.0 release, as this could be considered a breaking change.
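A minimal sketch of the proposed stage collection in Python, assuming a hypothetical `is_ignored(path, is_dir)` predicate that reports whether Git ignores a workspace-relative path. `collect_dvcfiles` is an illustrative name, not DVC's actual API. Ignored directories are pruned from `os.walk` in place, so their contents (e.g. a `venv` or dvc-tracked data listed in `.gitignore`) are never listed at all.

```python
import os
from typing import Callable, Iterator

# Hypothetical predicate: (workspace-relative path, is_dir) -> ignored by Git?
IgnorePredicate = Callable[[str, bool], bool]


def collect_dvcfiles(root: str, is_ignored: IgnorePredicate) -> Iterator[str]:
    """Yield workspace-relative paths of dvc.yaml and *.dvc files under root,
    never descending into directories that is_ignored reports as ignored."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        if rel_dir == os.curdir:
            rel_dir = ""
        # Prune in place: a top-down os.walk will not enter removed entries.
        # ".git" is Git's own metadata directory, never part of the workspace.
        dirnames[:] = [
            d
            for d in dirnames
            if d != ".git" and not is_ignored(os.path.join(rel_dir, d), True)
        ]
        for name in filenames:
            if name != "dvc.yaml" and not name.endswith(".dvc"):
                continue
            rel_path = os.path.join(rel_dir, name)
            # Git-ignored dvcfiles are skipped as well.
            if not is_ignored(rel_path, False):
                yield rel_path
```

For example, with a predicate that treats `venv` and `data` directories as ignored, a workspace containing `dvc.yaml`, `src/train.dvc` and `venv/lib/dvc.yaml` yields only the first two, and nothing under `venv/` or `data/` is ever listed. In a real implementation the predicate might be backed by Git's own ignore rules (for example `git check-ignore` or a gitignore-parsing library); that choice is an assumption here.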