Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stage collection: do not collect .gitignored dvcfiles #5265

Merged
merged 5 commits into from
Jan 26, 2021

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Jan 13, 2021

Closes #5244

A few fixes has been made along the way, including:

  • better error message if the dvc.yaml and .dvc files are dvc-ignored.
    $ echo "dvc.yaml" >> .dvcignore
    $ dvc repro
    ERROR: 'dvc.yaml' is dvc-ignored. 
  • better error message if the dvc.yaml and .dvc files are git-ignored.
    $ echo "dvc.yaml" >> .gitignore
    $ dvc repro
    ERROR: 'dvc.yaml' is git-ignored.

An open question is, what should we do about lockfiles if they are git-ignored or dvc-ignored?
Right now, we assume the lockfile as empty if they are dvc-ignored and we don't care about git-ignores
for the lockfiles. I feel that this behaviour is good, but if we want them to be consistent, please do tell me.

Another thing to keep in mind is, since the implementation depends on isinstance(repo.tree, LocalTree), if we start using LocalTree for get/list/import, it'll break, though I don't think it should as we better apply it to only the user's workspace (IIRC dvcx uses this mode).
Though, this might be handled by a state (eg: repo.is_external_repo or something like that).

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@skshetry skshetry self-assigned this Jan 13, 2021
@skshetry skshetry marked this pull request as ready for review January 19, 2021 14:40

def is_ignored(path):
# apply only for the local tree
return is_local_tree and scm.is_ignored(path)
Copy link
Member Author

@skshetry skshetry Jan 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check might not work everywhere, especially in for_write=True scenarios in dvcx. But as we are adjusting our APIs in ExternalRepo, Repo.clone() should set some kind of state to workaround that. And, we'll add additional check for repo.is_external here and that should be enough.

@pared
Copy link
Contributor

pared commented Jan 21, 2021

An open question is, what should we do about lockfiles if they are git-ignored or dvc-ignored?

I think users should be able to ignore lockfile. Dvc is able (to some extent) to work without it, so removing this possibility would limit dvc's functionality. And I guess one can come with some use cases where this is actually a feature. For example any educational material that's purpose is to show how running the pipeline looks like.

@pmrowla
Copy link
Contributor

pmrowla commented Jan 22, 2021

An open question is, what should we do about lockfiles if they are git-ignored or dvc-ignored?

I think users should be able to ignore lockfile. Dvc is able (to some extent) to work without it, so removing this possibility would limit dvc's functionality. And I guess one can come with some use cases where this is actually a feature. For example any educational material that's purpose is to show how running the pipeline looks like.

One thing to keep in mind is that experiments runs in temp directories (and on remote machines in the future) have to git commit dvc.lock in order for us to retrieve the experiment result. We can (and already do) force-add gitignored lock files internally for these types of runs, and on the user's local exp apply the dvc.lock file would no longer be staged/committed like normal. But this does seem like the kind of thing that may lead to unexpected behavior in the future and it may be safer for us to forbid it?

@pared
Copy link
Contributor

pared commented Jan 22, 2021

@pmrowla Fair point, I didn't consider experiments in my thought process. In that case it might be better to just require it.

@skshetry skshetry changed the title stage collection: do not collect .gitignored dvcfiles [WIP] stage collection: do not collect .gitignored dvcfiles Jan 22, 2021
@skshetry skshetry changed the title [WIP] stage collection: do not collect .gitignored dvcfiles stage collection: do not collect .gitignored dvcfiles Jan 26, 2021
@efiop efiop merged commit b1d6918 into iterative:master Jan 26, 2021
@skshetry skshetry deleted the gitignore_stages_collection branch January 26, 2021 12:52
@skshetry skshetry mentioned this pull request Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stage collection: ignore git-ignored directories?
4 participants