introduce new data:status command #7943
Very minor suggestion: Can we make the "uncommitted changes" red instead of green? And make the "committed changes" green? |
You mean the header or the complete section? |
I don't think |
Need to think about how to handle
|
I'm finding the |
If I have a tracked directory and am missing the cache, it will show as added even if the
$ mkdir data
$ dvc add data
$ git add .
$ git commit -m "add data"
$ rm -rf .dvc/cache
$ dvc data status
Not in cache:
(use "dvc pull <file>..." to update your local storage)
data
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
added: data/ |
Can we ignore pipeline outputs where For example, if I run an experiment in https://github.com/iterative/example-get-started, the output looks like:
$ dvc data status
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/prepared/
modified: data/features/
modified: model.pkl
modified: scores.json
modified: prc.json
modified: roc.json
(there are other changes not tracked by dvc, use "git status" to see)
$ git status
On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git restore <file>..." to discard changes in working directory)
modified: dvc.lock
modified: params.yaml
modified: prc.json
modified: roc.json
modified: scores.json
no changes added to commit (use "git add" and/or "git commit -a")
Ideally,
It should be ignored, I tried fixing it in 231805a. I don't see it in example-get-started locally. Have you fetched the latest commit from this branch?
Looks amazing @skshetry! If we can decide on the default behavior for untracked files and address #7943 (comment) and #7943 (comment), I think it's ready to merge.
For untracked files, I still think we should exclude by default. Even if we can support normal mode (and we can't for now), the other reasons still apply. I agree it's useful when debugging, but in real usage I don't think it is. Most people treat DVC-tracked outputs as special for large data and wouldn't expect DVC to show the state of the whole repo, especially since they still need to use git status alongside it.
I think this is #7661. I have also noticed that. Will require a fix upstream. |
Yeah, unfortunately it looks like it's still an issue.
|
Sorry, this one is different. Even though it's a pipeline output, it's data being tracked by DVC, right? |
Right, I just noticed that was about not-in-cache. 🤔 I see what you mean, but I think we should ignore For example:
$ dvc data status
DVC committed changes:
(git commit the corresponding dvc files to update the repo)
modified: data/prepared/
modified: data/features/
modified: model.pkl
modified: scores.json
modified: prc.json
modified: roc.json
(there are other changes not tracked by dvc, use "git status" to see)
$ git restore dvc.lock
$ dvc checkout
M data/prepared/
M model.pkl
M data/features/
$ dvc data status
DVC uncommitted changes:
(use "dvc commit <file>..." to track changes)
modified: scores.json
modified: prc.json
modified: roc.json
(there are other changes not tracked by dvc, use "git status" to see) |
Fixed in 29c2fa1. |
With b3dd1e020, |
Just a quick thought on
Shouldn't be hard to merge the JSON output of both calls (with and without |
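A rough sketch of the merge suggested above, assuming each call returns a JSON object that maps a status category to a list of paths. The function name `merge_status` and the sample payloads are hypothetical; the real schema of the command's JSON output is not shown in this thread, so treat this as an illustration only.

```python
def merge_status(first: dict, second: dict) -> dict:
    """Union two status payloads, assuming {category: [paths]} mappings."""
    merged: dict = {}
    for payload in (first, second):
        for category, paths in payload.items():
            bucket = merged.setdefault(category, [])
            for path in paths:
                # Keep first-seen order and skip paths already recorded.
                if path not in bucket:
                    bucket.append(path)
    return merged


# Hypothetical payloads standing in for the two invocations.
first_call = {"not_in_cache": ["data"], "untracked": []}
second_call = {"not_in_cache": ["data"], "untracked": ["notes.txt"]}
print(merge_status(first_call, second_call))
```

A nested payload (for example, categories that map to further dictionaries) would need a recursive variant of the same idea.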
8c21f82 to 7bd6a18
@efiop, this should be ready for review/merge. :) |
Started on moving to the new command. Found an issue with untracked files. In the following output
LMK if this was deliberate. Thanks. |
Great catch. That's an oversight on my part. Will fix it. Thanks. |
@mattseddon, I have tried fixing the sub dir untracked files issue. Let me know if that fixes it for you.
LGTM |
Great work on this @skshetry! |
Some string suggestions after checking iterative/dvc.org#3812.
"unchanged": "DVC unchanged files", | ||
} | ||
HINTS = { | ||
"not_in_cache": 'use "dvc pull <file>..." ' |
I thought we used back ticks in these texts? E.g.
"not_in_cache": 'use "dvc pull <file>..." ' | |
"not_in_cache": 'Use `dvc pull <file>` ' |
} | ||
HINTS = { | ||
"not_in_cache": 'use "dvc pull <file>..." ' | ||
"to update your local storage", |
- storage -> cache?
- Missing sentence period `.`
We have made changes to this in successive PRs. Now it says:
(use "dvc fetch <file>..." to download files)
It's a hint, so we don't need a period.
"committed": "git commit the corresponding dvc files " | ||
"to update the repo", |
"committed": "git commit the corresponding dvc files " | |
"to update the repo", | |
"committed": "`git commit` the corresponding DVC metafiles " | |
"to update the repo.", |
I prefer quotes, it's easier on eyes. :)
I guess it's already a convention to code-inline these kinds of things?
We do use backticks on help messages, but I am not sure about the reasoning. The only other place within the CLI (except help) is when we print commands in `git add` hints, where we just indent those.
My only rec. here is to aim for consistency if possible. Probably not major (up to you) but it's a product quality question.
"to update your local storage", | ||
"committed": "git commit the corresponding dvc files " | ||
"to update the repo", | ||
"uncommitted": 'use "dvc commit <file>..." to track changes', |
"uncommitted": 'use "dvc commit <file>..." to track changes', | |
"uncommitted": 'Use `dvc commit <file>` to track changes.', |
"untracked": 'use "git add <file> ..." or ' | ||
'dvc add <file>..." to commit to git or to dvc', |
"untracked": 'use "git add <file> ..." or ' | |
'dvc add <file>..." to commit to git or to dvc', | |
"untracked": 'Use `git add <file>` or ' | |
'`dvc add <file>` to track with Git or DVC.', |
"git_dirty": "there are {}changes not tracked by dvc, " | ||
'use "git status" to see', |
"git_dirty": "there are {}changes not tracked by dvc, " | |
'use "git status" to see', | |
"git_dirty": "There are {}changes not tracked by DVC. " | |
'Use `git status` to see them.', |
It's a hint, no need to upper case it.
I don't understand why hints don't need capitalization or punctuation. Are they part of a more complete sentence?
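For context on the question above, here is a sketch of how a header and its hint could be rendered in the `git status` style seen in the output earlier in this thread: the hint sits in parentheses directly under the header, which is why it reads as a lowercase fragment rather than a full sentence. `render_section` is a hypothetical helper, not the code in this PR, and the indentation widths are an assumption; the strings come from the diff.

```python
def render_section(header: str, hint: str, entries: dict) -> str:
    """Format one status section: header, parenthesized hint, then entries."""
    lines = [f"{header}:", f"  ({hint})"]
    for state, paths in entries.items():
        for path in paths:
            lines.append(f"        {state}: {path}")
    return "\n".join(lines)


print(
    render_section(
        "DVC uncommitted changes",
        'use "dvc commit <file>..." to track changes',
        {"modified": ["scores.json", "prc.json"]},
    )
)
```

With this layout, capitalizing the hint or adding a period changes only the text inside the parentheses, so either convention can be applied consistently across all hints in one place.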
if not result: | ||
no_changes = "No changes" | ||
if git_info.get("is_empty", False): | ||
no_changes += " in an empty git repo" |
no_changes += " in an empty git repo" | |
no_changes += " in an empty Git repo" |
DATA_STATUS_HELP = ( | ||
"Show changes between the last git commit, " | ||
"the dvcfiles and the workspace." |
Oh and migrating discussion from iterative/dvc.org#3812 (review) cc @skshetry @shcheklein:
DATA_STATUS_HELP = ( | |
"Show changes between the last git commit, " | |
"the dvcfiles and the workspace." | |
DATA_STATUS_HELP = ( | |
"Show changes between the last Git commit, " | |
"DVC metafiles, and the workspace." |
metafile is only docs-concept
this term has not been introduced in DVC (maybe for good reasons).
I think the only reason is that we don't have a system to ensure consistency in terminology between docs and help output but we probably should. It would improve the UX.
dvcfiles is also a weird term
we should... not introduce internal terms into help / docs.
➕
And it sounds like `.dvc` files specifically. What about `dvc.lock`?
This could be further simplified though, to just:
Show changes in DVC-tracked data between the last Git commit and the workspace.
Metafile is a confusing term. Tbh I don't even know what it is; I can guess that it's a file holding metadata about something. Also, it's a logical concept, not a physical one like `.dvc` and `dvc.yaml`, so I find it to be an unnecessary indirection. Unlike in the docs, we can afford the repetition here, since there isn't much of it.
Here, I think we can just avoid mentioning the files at all.
Show changes in the data tracked by DVC in the workspace.
Show changes in DVC-tracked data between the last Git commit and the workspace.
yep, we can avoid metafiles, dvcfiles.
My 2c: I find both terms suboptimal, but would probably prefer metafiles since, yes, they are technically metafiles: they contain a spec for data, i.e. metadata about data. At least it sounds reasonable to me. Plus we already use it in docs. Maybe we can avoid using this low-level terminology in help messages at all, btw. That would be the best option for end users.
Here, I think we can just avoid mentioning the files at all.
we can avoid using this low-level terminology in help messages
➕➕
if old_obj is None: | ||
return {"added": [root], **d} |
I thought of one more thing: This should be called `new`, I think, because you can `dvc add` an updated file and it will be listed as `modified`.
hmm, it's interesting Git does "new file" ... VS Code does "added"
I am not sure about this one tbh.
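To make the naming question concrete, here is a minimal sketch of the classification being discussed, assuming the old and new states are plain `{path: hash}` mappings. `classify` and the sample hashes are hypothetical and much simpler than the object comparison in the diff: a path with no previous entry lands in `added` (the bucket proposed to be renamed `new`), while a path whose content changed lands in `modified`.

```python
def classify(old: dict, new: dict) -> dict:
    """Compare {path: hash} snapshots and bucket paths by change type."""
    result = {"added": [], "modified": [], "deleted": []}
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            # No previous entry: shown as "added" ("new file" in Git's wording).
            result["added"].append(path)
        elif path not in new:
            result["deleted"].append(path)
        elif old[path] != new[path]:
            # Previously tracked, content changed: re-adding it reports "modified".
            result["modified"].append(path)
    return result


print(classify({"model.pkl": "abc"}, {"model.pkl": "def", "scores.json": "123"}))
```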
data_status_parser.add_argument( | ||
"--unchanged", | ||
action="store_true", | ||
default=False, | ||
help="Show unmodified DVC-tracked files.", | ||
) | ||
data_status_parser.add_argument( | ||
"--untracked-files", |
One more...
Should the 2nd option be just `--untracked`? Unclear why `-files` is only in that name (plus it can be directories too).
yep, let's introduce this alternative now, and deprecate the previous option, drop it later
Git uses `--untracked-files`, so that's a major reason why it is this way. Also, `--untracked-files` is recursive at the moment, so `files` is more correct. There are some open questions regarding its behaviour (#8061), so I'd defer this until then.
OK (up to you) but remember that Git's UI is not very good. We should not aim to copy it just because it's Git. Consistency is more important IMO. And "untracked" can refer to plural (files) so both are correct, I think.
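If the alias route suggested earlier in this thread were taken, one way to accept both spellings while warning on the old one is a small custom `argparse` action. This is only a sketch under stated assumptions: the `DeprecatedAlias` class, the parser name, and the warning text are hypothetical, and the option is treated here as a plain boolean flag for simplicity, which may not match the option's real signature in this PR.

```python
import argparse
import warnings


class DeprecatedAlias(argparse.Action):
    """store_true-like action that warns when the old spelling is used."""

    def __init__(self, option_strings, dest, **kwargs):
        super().__init__(option_strings, dest, nargs=0, default=False, **kwargs)

    def __call__(self, parser, namespace, values, option_string=None):
        if option_string == "--untracked-files":
            warnings.warn(
                "--untracked-files is deprecated, use --untracked",
                DeprecationWarning,
                stacklevel=2,
            )
        setattr(namespace, self.dest, True)


# Hypothetical parser; both spellings write to the same destination.
parser = argparse.ArgumentParser(prog="status-sketch")
parser.add_argument(
    "--untracked",
    "--untracked-files",
    dest="untracked",
    action=DeprecatedAlias,
    help="Show untracked files.",
)

args = parser.parse_args(["--untracked"])
print(args.untracked)
```

Keeping both option strings on one argument means the help output lists them together, and the old name can later be dropped by deleting a single string.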
See Consolidate Repo Status.