-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remote: skip non-cache looking files when estimating remote size #3557
Conversation
@@ -697,17 +697,18 @@ def checksum_to_path_info(self, checksum): | |||
def list_cache_paths(self, prefix=None, progress_callback=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty much a walk_files
method, but it is not supported in every remote yet, so we are using a legacy implementation for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using walk_files
would've been much more intuitive, esp since walk
presumes that it is an iterator, unlike list_cache_paths
which could be anything.
Oops, I think I forgot to change the places that expect cache paths but get hashes. Looking into it. |
`list_cache_paths()` doesn't ignore paths that don't look like hashes, which makes dvc raise an exception when it tries to convert such paths to hashes. We should use `all()` instead. Discord context: https://discordapp.com/channels/485586884165107732/485596304961962003/694454282719592458
This will allow us to see it from verbose logs coming from users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix the bug for now.
In the changes for #3543 it will be cleaned up a bit more to keep the separation between all()
returning all checksums in the remote and list_cache_paths()
allowing filtering by prefix
@@ -936,13 +935,13 @@ def update(n=1): | |||
pbar.update(n * total_prefixes) | |||
|
|||
if max_remote_size: | |||
paths = self._cache_paths_with_max( | |||
checksums = self._all_with_limit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe simply islice()
?
dvc push
and other commands will fail if they find a file that doesn't look like a cache file in the remote. The issue is that we've started usinglist_cache_paths
for remote size estimation, which doesn't skip non-cache looking files. We should useall()
instead.Discord context: https://discordapp.com/channels/485586884165107732/485596304961962003/694454999492591657
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible. 🙏