data status: show untracked files in normal mode #8061

Closed
dberenbaum opened this issue Jul 27, 2022 · 6 comments
Labels
A: status (Related to the dvc diff/list/status)
p2-medium (Medium priority, should be done, but less important)

Comments

@dberenbaum
Collaborator

From iterative/dvc.org#3812 (comment):

Is it a performance optimization?

It no longer is, or will no longer be, a performance concern. We didn't have --untracked-files=normal support in dulwich, so we had to use --untracked-files=all, which is noisy and slow if you have too many untracked files.

But thanks to @dtrifiro's great work in pygit2, the git.status() is about 30x faster. Plus, we now have --untracked-files=normal support in pygit2, which will be much faster even for a very large repository. So there's no performance issue now if we decide to revisit. See iterative/scmrepo#118.

(I have been testing a repo with 150,000 untracked files from a dataset, and --untracked-files=normal takes ~20ms now)
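To illustrate why normal mode is less noisy than all, here is a minimal sketch of the collapsing behaviour: an untracked file is reported individually only if its directory also holds tracked files, otherwise the topmost fully untracked directory is reported once. The helper `collapse_untracked` is hypothetical and is not how pygit2 or scmrepo implement this; it also ignores `.gitignore` handling.

```python
from pathlib import PurePosixPath


def collapse_untracked(untracked, tracked):
    """Approximate a `normal`-style listing from an `all`-style listing.

    Hypothetical helper for illustration only; paths are repo-relative
    POSIX strings.
    """
    # Any directory that contains a tracked file (at any depth) must stay
    # expanded, so collect every ancestor directory of every tracked path.
    tracked_dirs = set()
    for path in tracked:
        tracked_dirs.update(str(p) for p in PurePosixPath(path).parents)

    result = set()
    for path in untracked:
        parts = PurePosixPath(path).parts
        entry = path
        # Walk down from the repo root and stop at the first directory
        # that holds no tracked files: report that directory instead.
        for depth in range(1, len(parts)):
            prefix = str(PurePosixPath(*parts[:depth]))
            if prefix not in tracked_dirs:
                entry = prefix + "/"
                break
        result.add(entry)
    return sorted(result)


listing = collapse_untracked(
    untracked=["data/raw/1.csv", "data/raw/2.csv", "src/new.py", "notes.txt"],
    tracked=["src/main.py", "data/.gitignore"],
)
print(listing)
```

With a dataset directory holding many untracked files, the collapsed listing contains a single entry for that directory, which is why the output stays short regardless of how many files it contains.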

@dberenbaum added the A: status and p2-medium labels Jul 27, 2022
@skshetry

This comment was marked as resolved.

@skshetry
Member

@dberenbaum, we have the support for this in upstream now.

@skshetry
Member

Also, there have been questions about --untracked-files being inconsistent with the rest of the flags (#7943 (comment)), so if we do make it normal by default, we may want to rename this to just --untracked.

@skshetry
Member

Maybe we can even get rid of the flag, enable --untracked-files=normal by default and change to --untracked-files=all when --granular is used.
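As a rough sketch of that proposal (the function name `untracked_mode` and its `granular` parameter are assumptions for illustration, not existing DVC code), the mode selection could look like:

```python
def untracked_mode(granular: bool = False) -> str:
    """Pick the untracked-files mode under the proposed defaults.

    Hypothetical helper: `normal` by default, `all` only when the
    caller asked for granular output.
    """
    return "all" if granular else "normal"


print(untracked_mode())               # default invocation
print(untracked_mode(granular=True))  # with --granular
```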

@dberenbaum
Collaborator Author

Going back to the original rationale in #7943 (comment):

  * Users don't expect to track the entire repo with DVC.
  * If we suggest doing `dvc data status` and `git status` as a pair, they become redundant for untracked files.

This still applies, and @mattseddon mentioned that a VS Code user already commented that it was confusing to see untracked files show up under both Git and DVC, so my preference is not to show them by default. Do you think it makes sense to always show untracked files?

change to --untracked-files=all when --granular is used.

It's a good idea, but I worry it would still get too busy if there is something like a virtualenv dir in the repo, especially since there's no target path support in dvc data status.

@dberenbaum
Collaborator Author

Not planned for now

@dberenbaum reopened this May 20, 2023
@dberenbaum closed this as not planned May 20, 2023