ref: data status
updates
#3924
Changes from 4 commits
6b1a4ab
d0aef73
d5b1373
9105ff4
23c7eb8
d925744
725c7e4
9ae9bd7
70f1cb0
f3fa7c6
1cf235b
ce29566
aca9d8a
87c8f51
7e15f6e
9c11972
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
# data status | ||
|
||
Show changes in the data tracked by DVC in the workspace. | ||
Show changes to the files and directories tracked by DVC. | ||
|
||
## Synopsis | ||
|
||
|
@@ -43,32 +43,37 @@ DVC uncommitted changes: | |
(there are other changes not tracked by dvc, use "git status" to see) | ||
``` | ||
|
||
As shown above, the `dvc data status` displays changes in multiple categories: | ||
|
||
- _Not in cache_ indicates that the hash for files are recorded in `dvc.lock` | ||
and `.dvc` files but the corresponding cache files are missing. | ||
- _DVC committed changes_ indicates that there are changes that are | ||
`dvc-commit`-ed that differs with the last Git commit. There might be more | ||
detailed state on how each of those files changed: _added_, _modified_, | ||
_deleted_ and _unknown_. | ||
Comment on lines -60 to -62

One question though: what's an

For example, let's say you clone a repo with a tracked directory but haven't done

So it only applies to

@skshetry Do you know if it can ever apply to non-granular data?

yeah, it only applies to granular. |
||
- _DVC uncommitted changes_ indicates that there are changes in the working | ||
directory that are not `dvc commit`-ed yet. Same as _DVC committed changes_, | ||
there might be more detailed state on how each of those files changed. | ||
- _Untracked files_ shows the files that are not being tracked by DVC and Git. | ||
This is disabled by default, unless [`--untracked-files`](#--untracked-files) | ||
is specified. | ||
- _DVC unchanged files_ shows the files that are not changed. This is not shown | ||
by default, unless [`--unchanged`](#--unchanged) is specified. | ||
|
||
By default, `dvc data status` does not show individual changes inside the | ||
tracked directories, which can be enabled with [`--granular`](#--granular) | ||
option. | ||
`dvc data status` displays changes in multiple categories: | ||
|
||
- `Not in cache` indicates that there are file records (hashes) in `.dvc` or | ||
|
||
`dvc.lock` files, but the corresponding <abbr>cache</abbr> files are missing. | ||
This may happen after cloning a DVC repository but before using `dvc pull` (or | ||
`dvc fetch`) to download the data; or after using `dvc gc`. | ||
|
||
- `Committed changes` are new, modified, or deleted tracked files or directories | ||
|
||
that have been [committed to DVC]. These may be ready for committing to Git. | ||
|
||
- `Uncommitted changes` are new, modified, or deleted tracked files or | ||
|
||
directories that have not been [committed to DVC] yet. You can `dvc add` or | ||
`dvc commit` these. | ||
|
||
- `Untracked files` have not been added to DVC (nor Git). Only shown if the | ||
`--untracked-files` flag is used. | ||
|
||
- `Unchanged files` have no modifications. Only shown if the `--unchanged` flag | ||
is used. | ||
|
||
Individual changes to files inside [directories tracked as a whole] are not | ||
|
||
shown by default but this can be enabled with the `--granular` flag. | ||
|
||
[committed to dvc]: /doc/command-reference/commit | ||
[directories tracked as a whole]: | ||
/doc/command-reference/add#adding-entire-directories | ||
|
||
## Options | ||
|
||
- `--granular` - show granular, file-level information of the changes for | ||
DVC-tracked directories. By default, `dvc data status` does not show | ||
individual changes for files inside the tracked directories. | ||
- `--granular` - show granular file-level changes inside DVC-tracked | ||
directories. Not included by default | ||
|
||
- `--untracked-files` - show files that are not being tracked by DVC and Git. | ||
|
||
|
@@ -83,31 +88,6 @@ option. | |
|
||
- `-v`, `--verbose` - displays detailed tracing information. | ||
|
||
## Examples | ||
why removing it?

I suggest removing it because it's already used in the description. You asked me to make a PR with the most important changes from my feedback and IMO this is one of them: #3812 (comment).

#3812 (comment) - the problem is it was discussed already. Question - do you have an actually strong opinion about this? :) (I'm personally fine and Dave was fine it seems)

I didn't realize it had been discussed (I didn't read all the resolved comments). I still think it's unnecessary since the

It's not a very strong opinion but it does go against good practices for cmd ref examples, I think: they should add special value to the doc, not just cover obvious cases. And to avoid redundancy in general. But anyway, I rolled it back in 725c7e4 since it was discussed by them. |
||
|
||
```dvc | ||
$ dvc data status | ||
Not in cache: | ||
(use "dvc fetch <file>..." to download files) | ||
data/data.xml | ||
|
||
|
||
DVC committed changes: | ||
(git commit the corresponding dvc files to update the repo) | ||
modified: data/features/ | ||
|
||
|
||
DVC uncommitted changes: | ||
(use "dvc commit <file>..." to track changes) | ||
(use "dvc checkout <file>..." to discard changes) | ||
deleted: model.pkl | ||
(there are other changes not tracked by dvc, use "git status" to see) | ||
``` | ||
|
||
This shows that the `data/data.xml` is missing from the cache, `data/features/` | ||
a directory, has changes that are being tracked by DVC but is not Git committed | ||
yet, and a file `model.pkl` has been deleted from the workspace. The | ||
`data/features/` directory is modified, but there is no further details to what | ||
changed inside. The `--granular` option can provide more information on that. | ||
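As a rough illustration of how the categories in that output relate to the entries under them, here is a minimal Python sketch that groups the sample text by category heading. It is not part of DVC: `parse_status` and `SAMPLE` are hypothetical names, the sample is copied from the example above, and the parsing rules (a heading ends with a colon, hint lines start with a parenthesis) are assumptions drawn from that one sample. The text format is meant for people to read and may change.

```python
# Hypothetical helper, not part of DVC: groups the human-readable
# `dvc data status` text shown above by category heading.
SAMPLE = """\
Not in cache:
  (use "dvc fetch <file>..." to download files)
        data/data.xml

DVC committed changes:
  (git commit the corresponding dvc files to update the repo)
        modified: data/features/

DVC uncommitted changes:
  (use "dvc commit <file>..." to track changes)
  (use "dvc checkout <file>..." to discard changes)
        deleted: model.pkl
(there are other changes not tracked by dvc, use "git status" to see)
"""


def parse_status(text):
    """Return {category: [(state, path), ...]} for the sample format.

    Assumptions (taken from the sample only): a category heading ends
    with ":", hint lines start with "(", and an entry is either
    "<state>: <path>" or a bare path (state is then None).
    """
    result = {}
    category = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("("):
            continue  # skip blank lines and parenthesized hints
        if line.endswith(":"):
            category = line[:-1]  # start a new category
            continue
        if category is None:
            continue  # ignore anything before the first heading
        state, sep, path = line.partition(": ")
        entry = (state, path) if sep else (None, line)
        result.setdefault(category, []).append(entry)
    return result


parsed = parse_status(SAMPLE)
for name, entries in parsed.items():
    print(name, entries)
```

With the sample above, `data/data.xml` lands under "Not in cache" with no state, while `data/features/` and `model.pkl` carry the `modified` and `deleted` states of their respective categories.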
|
||
## Example: Granular output | ||
|
||
Following on from the above example, using `--granular` will show file-level | ||
|
The command name is `data`. I am not sure what we get by replacing it.

We are specific about what we mean by data. Docs should explain 🙂