Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref: data status updates #3924

Merged
merged 16 commits into from
Oct 18, 2022
Merged

ref: data status updates #3924

merged 16 commits into from
Oct 18, 2022

Conversation

jorgeorpinel
Copy link
Contributor

Follow up to #3812 (comment)

@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 6, 2022 20:16 Inactive
@jorgeorpinel jorgeorpinel mentioned this pull request Sep 6, 2022
@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference labels Sep 6, 2022
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 6, 2022 20:20 Inactive
Comment on lines -51 to -62
`dvc-commit`-ed that differs with the last Git commit. There might be more
detailed state on how each of those files changed: _added_, _modified_,
_deleted_ and _unknown_.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question though: what's an unknown change? Does it apply to untracked files too? I didn't know where to mention that in the updated text.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, let's say you clone a repo with a tracked directory but haven't done dvc fetch/pull. You have an md5 in the .dvc file but the directory is empty. If you modify the directory (for example, add a file to it), DVC can tell that the directory doesn't match the md5 but has no info about the granular changes to the files in that directory.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVC can tell that the directory doesn't match the md5 but has no info about the granular changes to the files in that directory

So it only applies to --granular right? (Assuming that generalizing that example is the one scenario for this.) I added a note in that option instead of trying to cover it up here (it still counts as "new, modified, or deleted"). aca9d8a

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Do you know if it can ever apply to non-granular data?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it only applies to granular.

@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 6, 2022 20:23 Inactive
@shcheklein shcheklein had a problem deploying to dvc-org-ref-data-status-bnsygg September 6, 2022 20:23 Failure
@jorgeorpinel jorgeorpinel marked this pull request as ready for review September 6, 2022 20:23
@jorgeorpinel
Copy link
Contributor Author

jorgeorpinel commented Sep 6, 2022

There may be other things for you to include here based on pending comments under #3812 (comment) @dberenbaum

Thanks!

@github-actions
Copy link
Contributor

github-actions bot commented Sep 6, 2022

Link Check Report

1/3 links failed.

@@ -83,31 +88,6 @@ option.

- `-v`, `--verbose` - displays detailed tracing information.

## Examples
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why removing it?

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing it because it's already used in the description.

You asked me to make a PR with the most important changes from my feedback and IMO this is one of them: #3812 (comment).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#3812 (comment) - the problem is it was discussed already. Question - do you have an actually strong opinion about this? :) (I'm personally fine and Dave was fine it seems)

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't realize it had been discussed (I didn't read all the resolved comments). I still think it's unnecessary since the --granular example can still refer to the one in the description.

It's not a very strong opinion but it does go against good practices for cmd ref examples, I think: they should add special value to the doc, not just cover obvious cases. And to avoid redundancy in general.

But anyway, I rolled it back in 725c7e4 since it was discussed by them.

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel we go into another round of discussions here I feel, again there are copy edits + substnatial product changes together. It makes review complicated.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

content/docs/command-reference/data/status.md Outdated Show resolved Hide resolved
content/docs/command-reference/data/status.md Outdated Show resolved Hide resolved
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 6, 2022 22:01 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 7, 2022 05:07 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 7, 2022 05:44 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 7, 2022 06:04 Inactive
This may happen after cloning a DVC repository but before using `dvc pull` (or
`dvc fetch`) to download the data; or after using `dvc gc`.

- _DVC committed changes_ are new, modified, or deleted tracked files or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call it added to keep it in sync with the output.

Suggested change
- _DVC committed changes_ are new, modified, or deleted tracked files or
- _DVC committed changes_ are added, modified, or deleted tracked files or

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same on DVC uncommitted changes.

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not using back ticks because this doesn't intend to reflect the exact output. I think "added" is misleading and the output itself seems to be up for debate. Please see/ comment in iterative/dvc#7943 (review).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorgeorpinel It's a good point, but @skshetry also has a reasonable rationale for using "added," so what do you think about continuing discussion in another issue and leaving "added" to get this merged?

@@ -1,6 +1,6 @@
# data status

Show changes in the data tracked by DVC in the workspace.
Show changes to the files and directories tracked by DVC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The command name is data. I am not sure what we get by replacing it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are specific about what we mean by data. Docs should explain 🙂

@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 13, 2022 01:30 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 13, 2022 01:37 Inactive
@shcheklein shcheklein temporarily deployed to dvc-org-ref-data-status-bnsygg September 13, 2022 01:45 Inactive
@jorgeorpinel jorgeorpinel requested review from dberenbaum and removed request for dberenbaum September 13, 2022 01:46
Comment on lines +76 to +78
directories. Note that some granular changes may be reported as `unknown` as
DVC tracks
[directory-level hash values](doc/user-guide/project-structure/internal-files#directories).
Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment tbh.
From aca9d8a#r83797370

Do you mean you're not familiar with the underlying situation this refers to @shcheklein? Please see #3924 (comment) if so. It's kind of confusing (maybe I didn't even get it right, waiting for another review from the core team).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative is to not even document this specifically, if it's evident what unknown means in the CLI output (self-documenting).

Suggested change
directories. Note that some granular changes may be reported as `unknown` as
DVC tracks
[directory-level hash values](doc/user-guide/project-structure/internal-files#directories).
directories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might understand this if read the comment (i haven't yet), but I would expect to be able to understand it from the text, right?

do you mean that some files (not granular changes) can be reported as "unknown" if (when? as DVC tracks is not informative)?

Copy link
Contributor Author

@jorgeorpinel jorgeorpinel Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's not that easy to explain briefly. How about this?

Suggested change
directories. Note that some granular changes may be reported as `unknown` as
DVC tracks
[directory-level hash values](doc/user-guide/project-structure/internal-files#directories).
directories. Note that granular changes may have "unknown" state if tracked
[directory contents](doc/user-guide/project-structure/internal-files#directories)
are not present in the local cache (see also `dvc fetch`).

Probably still doesn't explain why but at least it describes the scenario more clearly, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status for files inside DVC-tracked directories may be reported as "unknown" if information about directories is missing from the cache yet (see ...)

@dberenbaum

This comment was marked as resolved.

improve top admons in data status and dvc data status
@jorgeorpinel

This comment was marked as resolved.

@jorgeorpinel jorgeorpinel added the p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately. label Oct 18, 2022
Copy link
Contributor Author

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK well I guess it's all approved now. The only change after that is improved cross-tip admon/links:

Comment on lines +6 to 10
<admon type="tip">

For [pipelines](/doc/user-guide/pipelines) status, use `dvc status`.
For the status of <abbr>data pipelines</abbr>, see `dvc status`.

</admon>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This

Comment on lines +7 to +9
<admon type="tip">

For an equivalent to `git status`, use `dvc data status`.
For the status of tracked data, see `dvc data status` (similar to `git status`).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this

@jorgeorpinel jorgeorpinel merged commit f7a4e67 into main Oct 18, 2022
@jorgeorpinel jorgeorpinel deleted the ref/data-status branch October 18, 2022 07:27
@github-actions
Copy link
Contributor

f7a4e67

Link Check Report

There were no links to check!

CML watermark

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference p2-nice-to-have Less of a priority at the moment. We don't usually deal with this immediately.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants