Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd-ref: document new gc behavior #1023

Merged
merged 9 commits into from
Mar 19, 2020
Merged

cmd-ref: document new gc behavior #1023

merged 9 commits into from
Mar 19, 2020

Conversation

skshetry
Copy link
Member

You may disregard these recommendations if you used the Edit on GitHub button from dvc.org to improve a doc in place.

❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.

🐛 Please make sure to mention Fix #issue (if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.

Please chose to allow us to edit your branch when creating the PR.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Fixes #1013

@skshetry skshetry added the A: docs Area: user documentation (gatsby-theme-iterative) label Feb 25, 2020
@skshetry skshetry self-assigned this Feb 25, 2020
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-j6yay1 February 25, 2020 15:16 Inactive
Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! the major thing is to align option description with other similar options

@jorgeorpinel

This comment has been minimized.

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great in general, some precisions below. Not sure whether we're taking this one over (yet), please lmk @shcheklein, thanks

public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
public/static/docs/command-reference/gc.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted from #1023 (comment)

One last improvement is to also update the description of -a and -T options. Something like:

- `-a`, `--all-branches` - keep cached objects referenced in all Git branches
  as well as in the workspace (implies `-w`). Useful for preserving data for ...

...

- `-T`, `--all-tags` - the same as `-a` above, but applies to Git tags as well
  as in the workspace (implies `-w`). Useful if tags are used to ...

Thanks

@shcheklein
Copy link
Member

@skshetry @jorgeorpinel what is the status of it? @skshetry is still assigned - are you going to move it forward?

@skshetry
Copy link
Member Author

skshetry commented Mar 7, 2020

@shcheklein Umm, yes. I'm trying to add --all-commits on this PR as well. I'll also have to address @jorgeorpinel reviews.

@shcheklein
Copy link
Member

@skshetry sure! np. Just wanted to be sure that I don't block it like it happened last time.

@shcheklein
Copy link
Member

@jorgeorpinel @skshetry ping! any progress on this?

@skshetry
Copy link
Member Author

@shcheklein, sorry, I couldn't get to the docs. I have adjusted it now and addressed most of the suggestions. 🙂

Please take a look.

@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 18, 2020 05:28 Inactive
@shcheklein
Copy link
Member

the local cache, which is typically located on the same machine as the project
in question. This usually helps to free up disk space.
This command deletes (garbage collects) data files or directories that exist in
DVC cache but are no longer needed. With `--cloud` it also removes data in
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but are no longer needed.

It'd be better to express this in any other way. I don't have a better suggestion, though than the following:

Suggested change
DVC cache but are no longer needed. With `--cloud` it also removes data in
DVC cache but are no longer in use. With `--cloud` it also removes data in

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I think needed was there before. I don't have a very strong opinion ... may be even needed better because it is more precise here - I just keep stuff that I need. need is more general then use. Still, need to review other place with in use/need.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to always use need vs in-use. I think need and needed are actually more correct here.

Copy link
Contributor

@jorgeorpinel jorgeorpinel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more precisions but looks good to me otherwise:

content/docs/command-reference/gc.md Outdated Show resolved Hide resolved
Comment on lines +26 to +29
`--all-commits`, or any combination of them must be provided. Each of them
corresponds to the current workspace _and_ a set of commits to analyze what
files, directories and what versions are still needed and should be kept (by
analyzing DVC-files in those commits).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`--all-commits`, or any combination of them must be provided. Each of them
corresponds to the current workspace _and_ a set of commits to analyze what
files, directories and what versions are still needed and should be kept (by
analyzing DVC-files in those commits).
`--all-commits`, or any combination of them must be provided. Each of them
corresponds to keeping the data for the current workspace and for a
different set of commits (determined by reading the DVC-files in them).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: what's the point of this option? To me it sounds like "don't gc anything" except maybe deleted parts of the repo like removed branches, rewritten commits, etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DVC cache contains data that is not referenced in any way in Git history. E.g. run do a lot of dvc run, or dvc repro after modifying certain files/settings. If --no-commit is not specified DVC is saving data to cache not matter you do git commit to save DVC-files or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably mention this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentioned a possible use case in the recent version - @jorgeorpinel pls take a look

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still this suggestion wasn't applied but I'll address in regular updates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a new paragraph that explains the use case, have you seen it? Or may be I didn't get you specific suggestion ... 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it, thanks. May reword a little in regular updates but it's great!

One of the use cases for this option is to safely delete all temporary data DVC cached when dvc run and/or dvc repro were run without committing changes to DVC-files (thus potentially caching data that is not referenced from workspace or Git commits).

content/docs/command-reference/gc.md Outdated Show resolved Hide resolved
content/docs/command-reference/gc.md Outdated Show resolved Hide resolved
content/docs/command-reference/gc.md Outdated Show resolved Hide resolved
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 18, 2020 22:51 Inactive
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 18, 2020 22:52 Inactive
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 18, 2020 22:53 Inactive
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 18, 2020 22:53 Inactive
@shcheklein shcheklein temporarily deployed to dvc-landing-gc-flag-cha-stdoiq March 19, 2020 02:48 Inactive
@shcheklein shcheklein assigned shcheklein and unassigned skshetry Mar 19, 2020
@shcheklein
Copy link
Member

@jorgeorpinel @skshetry PTAL again.

Copy link
Member Author

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@shcheklein
Copy link
Member

@jorgeorpinel merging this, feel free to do changes on top of it!

@shcheklein shcheklein merged commit 0381090 into master Mar 19, 2020
@skshetry skshetry deleted the gc-flag-changes branch March 19, 2020 03:55
jorgeorpinel added a commit that referenced this pull request Mar 19, 2020
@jorgeorpinel
Copy link
Contributor

Great 🎉 Just added a couple more changes in 0f10925 (part of #1072)

jorgeorpinel added a commit that referenced this pull request Mar 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: docs Area: user documentation (gatsby-theme-iterative)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cmd-ref: document gc changes
3 participants