-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dvc: update gc to remove unpacked dir #3054
Conversation
dvc/remote/local.py
Outdated
@@ -152,6 +152,9 @@ def remove(self, path_info): | |||
|
|||
if self.exists(path_info): | |||
remove(path_info.fspath) | |||
# Remove unpacked if dir checksum is true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this comment is necessary, the code is pretty self-explanatory :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the comment.
Ok, @sharidas, as you probably noticed, the tests do not pass. The reason for that is that you are trying to check
|
@pared Thanks for the support. I did implemented the way suggested here #3054 (comment). Unfortunately the tests are still failing. Need to look into it. |
dvc/remote/local.py
Outdated
@@ -151,6 +151,9 @@ def remove(self, path_info): | |||
raise NotImplementedError | |||
|
|||
if self.exists(path_info): | |||
checksum = self.get_checksum(path_info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry Not sure. I was looking into one of the test, basically looking into the stack trace and found https://github.com/iterative/dvc/blob/master/tests/func/test_checkout.py#L320. And this test does not parse the gc code if I noticed the flow correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The updated .remove()
has .get_checksum()
which gets checksum from .dvc/state
AFAIK, but the test does not load the state and hence the exception (i assume as the state.cursor
is None
). You can load the state via with self.dvc.state:
though. But, I'd try moving the logic after this line (though, a check if it's local needs to be added), but @iterative/dvc developers might have a different suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry is right and that is also what we meant in the solution description in #2946 (comment) . This PR is trying to make a change to a generic method that doesn't really need to know anything about checksums and is often used in other places where it is not expected to have such a behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@efiop Thanks for the feedback. I was trying to make the solution close to the remote local. Now I have updated the pr again by making change in base.py. Initially I made the change in the base.py as suggested in the issue. Later I thought it would be better to make the solution close to the local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sharidas Makes sense, I guess my comment about "local" was confusing, sorry about that 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Please see a comment above. Also, please don't forget to include issue number in the PR or commit desc. #2946 🙂
@efiop I have added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Please see a few minor comments.
dvc/remote/base.py
Outdated
@@ -697,6 +697,9 @@ def gc(self, named_cache): | |||
if checksum in used: | |||
continue | |||
path_info = self.checksum_to_path_info(checksum) | |||
if self.is_dir_checksum(checksum): | |||
if callable(getattr(self, "_get_unpacked_dir_path_info")): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably don't need callable
🙂 Also, this getattr
will keep working normally even if we remove _get_unpacked_dir_path_info
at some point. So your implementation is correct, but maybe we could clean it up a bit. How about we mimic the other unpacked_dir
logic that we have in the code and define _remove_unpacked_dir
method in base(as a noop) and in local, so that we could simply:
if self.is_dir_checksum(checksum):
self._remove_unpacked_dir(checksum)
here, and won't have to do getattr
tricks?
Also, please add a test for this to tests/func/test_gc.py
. Something similar to the reproduction script in the issue would be just right 🙂
Updated the test as per the suggestion from #3054 (comment) |
@@ -204,3 +205,15 @@ def test_gc_no_dir_cache(tmp_dir, dvc, repo_template): | |||
|
|||
def _count_files(path): | |||
return sum(len(files) for _, _, files in os.walk(path)) | |||
|
|||
def test_gc_no_unpacked_dir(tmp_dir, dvc, repo_template): | |||
dir_stages = dvc.add("dir") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use stage, = tmp_dir.dvc_gen({"dir": {"file": "file_content"}})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you choose to go this way there will be no need for tepo_template :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pared repo_template
is fine too though.
@sharidas Looks like you didn't install pre-commit hooks described in our contribution guide, hence why the styling checks are failing. |
In local remote its found that gc does not remove the unpacked dir. This change set helps to remove it. Signed-off-by: Sujith H <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
In local remote its found that gc does not
remove the unpacked dir. This change set
helps to remove it.
Signed-off-by: Sujith H [email protected]
❗ Have you followed the guidelines in the Contributing to DVC list?
📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
dvc gc
does not remove files under dir.unpacked #2946Thank you for the contribution - we'll try to review it as soon as possible. 🙏