-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd-ref: document new gc behavior #1023
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! the major thing is to align option description with other similar options
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great in general, some precisions below. Not sure whether we're taking this one over (yet), please lmk @shcheklein, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted from #1023 (comment)
One last improvement is to also update the description of -a and -T options. Something like:
- `-a`, `--all-branches` - keep cached objects referenced in all Git branches
as well as in the workspace (implies `-w`). Useful for preserving data for ...
...
- `-T`, `--all-tags` - the same as `-a` above, but applies to Git tags as well
as in the workspace (implies `-w`). Useful if tags are used to ...
Thanks
@skshetry @jorgeorpinel what is the status of it? @skshetry is still assigned - are you going to move it forward? |
@shcheklein Umm, yes. I'm trying to add |
@skshetry sure! np. Just wanted to be sure that I don't block it like it happened last time. |
@jorgeorpinel @skshetry ping! any progress on this? |
5611c60
to
21572eb
Compare
@shcheklein, sorry, I couldn't get to the docs. I have adjusted it now and addressed most of the suggestions. 🙂 Please take a look. |
@skshetry @jorgeorpinel PTAL - https://dvc-landing-gc-flag-cha-stdoiq.herokuapp.com/doc/command-reference/gc - I've applied some edits to the doc |
the local cache, which is typically located on the same machine as the project | ||
in question. This usually helps to free up disk space. | ||
This command deletes (garbage collects) data files or directories that exist in | ||
DVC cache but are no longer needed. With `--cloud` it also removes data in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but are no longer needed.
It'd be better to express this in any other way. I don't have a better suggestion, though than the following:
DVC cache but are no longer needed. With `--cloud` it also removes data in | |
DVC cache but are no longer in use. With `--cloud` it also removes data in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, I think needed was there before. I don't have a very strong opinion ... may be even needed
better because it is more precise here - I just keep stuff that I need. need is more general then use. Still, need to review other place with in use/need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to always use need
vs in-use
. I think need
and needed
are actually more correct here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few more precisions but looks good to me otherwise:
`--all-commits`, or any combination of them must be provided. Each of them | ||
corresponds to the current workspace _and_ a set of commits to analyze what | ||
files, directories and what versions are still needed and should be kept (by | ||
analyzing DVC-files in those commits). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`--all-commits`, or any combination of them must be provided. Each of them | |
corresponds to the current workspace _and_ a set of commits to analyze what | |
files, directories and what versions are still needed and should be kept (by | |
analyzing DVC-files in those commits). | |
`--all-commits`, or any combination of them must be provided. Each of them | |
corresponds to keeping the data for the current workspace and for a | |
different set of commits (determined by reading the DVC-files in them). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: what's the point of this option? To me it sounds like "don't gc anything" except maybe deleted parts of the repo like removed branches, rewritten commits, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DVC cache contains data that is not referenced in any way in Git history. E.g. run do a lot of dvc run
, or dvc repro
after modifying certain files/settings. If --no-commit
is not specified DVC is saving data to cache not matter you do git commit
to save DVC-files or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably mention this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mentioned a possible use case in the recent version - @jorgeorpinel pls take a look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still this suggestion wasn't applied but I'll address in regular updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put a new paragraph that explains the use case, have you seen it? Or may be I didn't get you specific suggestion ... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see it, thanks. May reword a little in regular updates but it's great!
One of the use cases for this option is to safely delete all temporary data DVC cached when dvc run and/or dvc repro were run without committing changes to DVC-files (thus potentially caching data that is not referenced from workspace or Git commits).
Co-Authored-By: Jorge Orpinel <[email protected]>
Co-Authored-By: Jorge Orpinel <[email protected]>
Co-Authored-By: Jorge Orpinel <[email protected]>
Co-Authored-By: Jorge Orpinel <[email protected]>
@jorgeorpinel @skshetry PTAL again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
@jorgeorpinel merging this, feel free to do changes on top of it! |
❗ Please read the guidelines in the Contributing to the Documentation list if you make any substantial changes to the documentation or JS engine.
🐛 Please make sure to mention
Fix #issue
(if applicable) in the description of the PR. This causes GitHub to close it automatically when the PR is merged.Please chose to allow us to edit your branch when creating the PR.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Fixes #1013