Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: use progress bar for remote cache query status during dvc gc #3559

Merged
merged 3 commits into from
Apr 1, 2020

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Mar 31, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • πŸ“– If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

  • dvc gc -c now uses the same behavior as push/pull when fetching checksums from a remote cache
  • Progress bar is displayed using the estimated cache size as the total, and checksums are fetched in parallel by prefix
  • This also reverts the estimation progress bar total changes as discussed in remote: short-circuit remote size estimation for large remotesΒ #3537 - we now just display the running estimated size without using the max limit as total

Will close #3543

@pmrowla pmrowla self-assigned this Mar 31, 2020
@pmrowla
Copy link
Contributor Author

pmrowla commented Mar 31, 2020

asciicast

dvc gc is still very quiet until it gets to the actual cache query step - it still looks like it's hanging for the first 2:50 in the example

Comment on lines +138 to +146
def test_cache_checksums():
remote = RemoteBASE(None, {})
remote.path_info = PathInfo("foo")

with mock.patch.object(
remote, "list_cache_paths", return_value=["12/3456", "bar"]
):
checksums = list(remote.cache_checksums())
assert checksums == ["123456"]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is for #3558

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla Thank you so much! πŸ™

Comment on lines 710 to 712
logger.debug(
"'%s' doesn't look like a cache file, skipping", path
)
Copy link
Contributor Author

@pmrowla pmrowla Mar 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is makes push/pull/gc very verbose now to the point that I'm not sure it's worth logging?

...
2020-03-31 18:49:52,911 DEBUG: '/Users/pmrowla/git/remote-1m-files/.dvc/cache/c5/d84d5879b6266fe8624802e99ff211.dir.unpacked/000360c8.txt' doesn't look like a cache file, skipping
2020-03-31 18:49:52,935 DEBUG: '/Users/pmrowla/git/remote-1m-files/.dvc/cache/c5/d84d5879b6266fe8624802e99ff211.dir.unpacked/00001f77.txt' doesn't look like a cache file, skipping
2020-03-31 18:49:53,122 DEBUG: '/Users/pmrowla/git/remote-1m-files/.dvc/cache/c5/d84d5879b6266fe8624802e99ff211.dir.unpacked/000569a4.txt' doesn't look like a cache file, skipping
...

(it catches every unpacked file in local .dvc/cache)

Prior to #3557 this exception handler was just a pass in RemoteBASE.all()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmrowla It is verbose in -v verbose mode, but at least we know that it does skip it. Maybe it is time to support -vv and hide this, "assuming" and state logs under it. Will need to look into it, added a comment to #2329 .

@pmrowla pmrowla marked this pull request as ready for review March 31, 2020 10:09
@efiop efiop merged commit ee0066f into iterative:master Apr 1, 2020
@pmrowla pmrowla deleted the 3543 branch April 2, 2020 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remote: use progress bar for remote cache query status during dvc gc
3 participants