Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce check-ignore for dvc ignore command. #3736

Closed
pared opened this issue May 4, 2020 · 5 comments · Fixed by #4282
Closed

Introduce check-ignore for dvc ignore command. #3736

pared opened this issue May 4, 2020 · 5 comments · Fixed by #4282
Labels
feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint

Comments

@pared
Copy link
Contributor

pared commented May 4, 2020

There should be functionality providing easy way to check whether some file is dvc-ignored or not. Related: #3538

@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label May 4, 2020
@pared pared added the feature request Requesting a new feature label May 7, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label May 7, 2020
@pared pared added the p2-medium Medium priority, should be done, but less important label May 7, 2020
@efiop efiop added p3-nice-to-have It should be done this or next sprint and removed p2-medium Medium priority, should be done, but less important labels May 8, 2020
@karajan1001
Copy link
Contributor

karajan1001 commented Jul 24, 2020

@pared
Hi, I'd like to look into it this weekend. And now are asking for some advice.

According to Git.
It has -q, -v, --stdin, -z, -n, --no-index arguments.

Among them, I consider -v(show details, the last pattern matches) --no-index (without this ignore-check would ignore those files already added in cache) most required.

Then, --stdin is very helpful at debugging, -q is a default parameter. -z is used when we are about to output to files or use it in scripts. At last -n is an optional I think but is easy to implement.

So, at first, I'd like to implement -v, -q, -n, and --no-index first, and --stdin at the next step.
Besides these, Maybe -a, --all which shows all of the patterns matches the files instead of only the last one are useful?

@pared
Copy link
Contributor Author

pared commented Jul 24, 2020

@karajan1001
Ill try to provide as much info as I am able:

  1. -q - I think that if we are to use the logger to display the information, the only important thing to implement is proper return code on command. quiet behavior is handled by default parser

  2. --no-index - I am not sure whether this one is vital. In my understanding, in git, it can be used to verify whether some element would be ignored if it was not already added to the index. So there is a kind of superiority of index over the workspace. In DVC however, if you put something into the .dvcignore, and run dvc status, DVC will pick it up and report your dir as changed, even if it is not. But from the point of view of DVC it did change because you "deleted" particular file. So I think in this point DVC differs from git in its operating principles and might not require --no-index option. Though if you have some other perspective, please share, I might be missing something here.

So, for now, if you want to implement a few features, I think -v, -q, -n will already be super-useful. Just a note though, please don't consider it as a must. It will be awesome even if you were able to do just dvc check-ignore returning 1/0.

In next steps --stdin would be great and nicely matching other command-line helpful options, like --show-vega for plots command. -a also sounds awesome.

Please ping me if you want me to extend some parts.

@karajan1001
Copy link
Contributor

karajan1001 commented Jul 25, 2020

@pared
Thank you,

  1. For -q. I agreed, the only thing need to take care is the return number, non-zero return number may cause some report process. In Git

0:One or more of the provided paths is ignored.
1:None of the provided paths are ignored.

  1. For --no-index, I had tried on my computer, seems that it is vital for Git but not for DVC. The reason lies in the differences in treating ignored files between Git and DVC. In Git .gitignore changed didn't influence files existing in the cache, Ignore check skips them. But for DVC, ignored check on files whether added to cache or not.

In conclusion without --no-index Git can't check files already in the cache, while for DVC actually we are always in a --no-index mode.

So, two steps:

  • basic functions -v, -q, -n
  • advanced functions -a, --stdin

karajan1001 added a commit to karajan1001/dvc that referenced this issue Jul 25, 2020
@karajan1001
Copy link
Contributor

@pared
verbose

Seems that --verbose conflict with debug model, use -d, --details instead.

@pared
Copy link
Contributor Author

pared commented Jul 25, 2020

@karajan1001
Some disrepancies are unavoidable, feel free to invent your own names. Until we create docs for the command, we can assume its experimental feature which can be adjusted. We can also come to some agreements during review.

karajan1001 added a commit to karajan1001/dvc that referenced this issue Aug 2, 2020
Fix iterative#3736

1. Introduce new arguments `--all` for `dvc check-ignore`
2. Add tests for `dvc check-ignore --all`
3. Introduce an interactive mode for `dvc check-ignore` with arguments `--stdin` on
efiop added a commit that referenced this issue Aug 3, 2020
* Update some tests first

fix #3736

* first edition

* solve failure in Windows

* For Windows ci

* Some help ducuments issue.

* Update dvc/ignore.py

abspath

Co-authored-by: Ruslan Kuprieiev <[email protected]>

* Refactor with OutOfWorkSpaceError

* Solve a bug

* Update dvc/command/check_ignore.py

Co-authored-by: Jorge Orpinel <[email protected]>

* Update dvc/command/check_ignore.py

Co-authored-by: Jorge Orpinel <[email protected]>

* Update dvc/command/check_ignore.py

* Revert "Refactor with OutOfWorkSpaceError"

This reverts commit 27eec49.

* Two change request

1. Argument `targets`'s description.
2. Error handling of `_get_normalize_path`

* Update dvc/main.py

Co-authored-by: Ruslan Kuprieiev <[email protected]>

* `check_ignore` now only accept one path a time

1. Add a new test for the out side repo cases
2. check ignore now only check one file not file lists

Co-authored-by: Ruslan Kuprieiev <[email protected]>
Co-authored-by: karajan1001 <[email protected]>
Co-authored-by: Jorge Orpinel <[email protected]>
karajan1001 added a commit to karajan1001/dvc that referenced this issue Aug 3, 2020
Fix iterative#3736

1. Introduce new arguments `--all` for `dvc check-ignore`
2. Add tests for `dvc check-ignore --all`
3. Introduce an interactive mode for `dvc check-ignore` with arguments `--stdin` on
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Requesting a new feature p3-nice-to-have It should be done this or next sprint
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants