Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Special dir warning #3436 #4229

Merged
merged 12 commits into from
Aug 12, 2020
32 changes: 32 additions & 0 deletions dvc/ignore.py
Original file line number Diff line number Diff line change
Expand Up @@ -148,6 +148,37 @@ def __eq__(self, other):
return self.basenames == other.basenames


class DvcWarningDirs:
special_dirs = {
".petest_cache",
".mypy_cache",
"node_modules",
".venv",
"venv",
"env",
"__pycache__",
"dist",
"eggs",
"wheels",
"htmlcov",
"docs",
".ipynb_checkpoints",
".vscode",
".github",
}
Copy link
Contributor

@efiop efiop Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, makes me wonder if we should really just ignore them by default (like we do with .git and .hg), and if some user really needs these, he could !something in dvcignore to include it.

Or, we might consider generating a default .dvcignore (kinda like github generates .gitignore when you create a project) to make it extra explicit. Though this is simple on init, but managing this stuff in existing projects is not too simple 🙁

I'm just worried about these warnings getting too annoying. Plus, unless I'm missing something, ignore execution order is not deterministic (that's actually is a bigger problem for us in general), so you never know if this filter will be executed before or after .dvcignores in which user has ignored a special dir explicitly. 🙁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing that this makes me think about is having some kind of safe guard, that would, for example, record the amount of time it took to walk through the directory and warn if it didn't find anything but it took, say, more than 5 sec.

Copy link
Contributor Author

@aerubanov aerubanov Jul 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This issue was opened after this conversation #3257. It was decided that it is better to warn the user if a special directory is found than to ignore it by default. But annoyance from warnings can be a problem. Maybe we should make it possible to disable this warning? From the other hand, the default .dvcignore also good idea because the ignore will be explicite to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes idea of default .dvcignore compelling, is the fact that is not only explicit, but also editable by user.

If we decide to go with warnings, they definitely need turn off mechanism, just like we did with slow_link_guard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I soppouse that the default .dvcignore, is better option. It`s more explicite and easy to configure. Also the list of special dirs always will be incomplite, we can not add all posiible variants here. And default .dvcignore provide additional example for users how to deal with such cases.

@efiop, @pared, if you are agree with me, then I will change the code. Should I close this PR and open a new one later, or commit right here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I love the idea of an explicit .dvcignore

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether it should be default. Maybe we should add some option to init, or dedicate special command for that? It seems to me that .dvcignore was supposed to let user decide what is part of his project. If we always create default .dvcignore on init, we make it our choice. @efiop @aerubanov, what do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that default .dvcignore should consist comment with link to the dvcignore documentation. And we should add option for init to filling .dvcignore with some content. Not all list of special folder, rather based on type of project specified by user. Something like github gitignore template https://github.com/github/gitignore, but with fewer options ofcorse.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aerubanov I think you are right on this one.


def __call__(self, root, dirs, files):
for d in dirs:
if d in self.special_dirs:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could do something like set(dirs).intersection(self.special_dirs) and print single warning for all problematic dirs.

print(
f"Warning: {d} found in dvc, traversing may be slow."
f" You can add it in .dvcignore. See more "
f"https://dvc.org/doc/user-guide/dvcignore"
)

return dirs, files


class DvcIgnoreRepo(DvcIgnore):
def __call__(self, root, dirs, files):
def is_dvc_repo(directory):
Expand Down Expand Up @@ -181,6 +212,7 @@ def __init__(self, tree, root_dir):
self.ignores = {
DvcIgnoreDirs([".git", ".hg", ".dvc"]),
DvcIgnoreRepo(),
DvcWarningDirs(),
}
ignore_pattern_trie = DvcIgnorePatternsTrie()
for root, dirs, _ in self.tree.walk(self.root_dir):
Expand Down
24 changes: 21 additions & 3 deletions tests/func/test_ignore.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
DvcIgnorePatterns,
DvcIgnorePatternsTrie,
DvcIgnoreRepo,
DvcWarningDirs,
)
from dvc.path_info import PathInfo
from dvc.repo import Repo
Expand Down Expand Up @@ -103,7 +104,7 @@ def test_ignore_collecting_dvcignores(tmp_dir, dvc, dname):
ignore_file = tmp_dir / dname / DvcIgnore.DVCIGNORE_FILE
ignore_file.write_text("foo")

assert len(dvc.tree.dvcignore.ignores) == 3
assert len(dvc.tree.dvcignore.ignores) == 4
assert DvcIgnoreDirs([".git", ".hg", ".dvc"]) in dvc.tree.dvcignore.ignores
ignore_pattern_trie = None
for ignore in dvc.tree.dvcignore.ignores:
Expand Down Expand Up @@ -158,7 +159,7 @@ def test_match_nested(tmp_dir, dvc):
assert result == {".dvcignore", "foo"}


def test_ignore_external(tmp_dir, scm, dvc, tmp_path_factory):
def test_ignore_external(tmp_dir, dvc, tmp_path_factory):
tmp_dir.gen(".dvcignore", "*.backup\ntmp")
ext_dir = TmpDir(os.fspath(tmp_path_factory.mktemp("external_dir")))
ext_dir.gen({"y.backup": "y", "tmp": "ext tmp"})
Expand Down Expand Up @@ -265,7 +266,7 @@ def test_ignore_directory(tmp_dir, dvc):
}


def test_multi_ignore_file(tmp_dir, dvc, monkeypatch):
def test_multi_ignore_file(tmp_dir, dvc):
tmp_dir.gen({"dir": {"subdir": {"should_ignore": "1", "not_ignore": "1"}}})
tmp_dir.gen(DvcIgnore.DVCIGNORE_FILE, "dir/subdir/*_ignore")
tmp_dir.gen({"dir": {DvcIgnore.DVCIGNORE_FILE: "!subdir/not_ignore"}})
Expand Down Expand Up @@ -389,3 +390,20 @@ def test_ignore_in_added_dir(tmp_dir, dvc):
dvc.checkout()

assert not ignored_path.exists()


def test_special_folder_warning(tmp_dir, dvc, capsys):
for d in DvcWarningDirs.special_dirs:
tmp_dir.gen({"dir": {"tmp": "content"}, d: {"file": "test"}})

for _ in dvc.tree.walk_files("."):
pass

captured = capsys.readouterr()
output = captured.out
message = (
f"Warning: {d} found in dvc, traversing may be slow."
f" You can add it in .dvcignore. See more "
f"https://dvc.org/doc/user-guide/dvcignore"
)
assert message in output