remote: continue remote/cache/tree refactoring #4041

Merged
merged 16 commits into from
Jun 15, 2020

Contributor

@pmrowla pmrowla commented Jun 15, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

Related to #3882.

  • Replaces remote.Remote and remote.Cache helper functions with actual Remote and CloudCache classes
  • Remote and CloudCache constructors take a RemoteTree parameter
  • remote.get_cloud_tree helper method can be used to return the proper tree for a given URL
  • output/dependency/cache updated to use the new functionality
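
The structure described in the list above can be sketched roughly as follows. This is a minimal illustration, not DVC's actual code: the tree classes, the `_TREES` mapping, and the URL-only signature of `get_cloud_tree` are simplifying assumptions made for the example.

```python
from urllib.parse import urlparse


class RemoteTree:
    """Hypothetical stand-in for a tree: filesystem-like access to one URL."""

    scheme = "base"

    def __init__(self, url):
        self.url = url


class LocalTree(RemoteTree):
    scheme = "local"


class S3Tree(RemoteTree):
    scheme = "s3"


# Assumed mapping from URL scheme to tree class, for illustration only.
_TREES = {"": LocalTree, "file": LocalTree, "s3": S3Tree}


def get_cloud_tree(url):
    """Return the tree matching the URL's scheme (simplified signature)."""
    scheme = urlparse(url).scheme
    try:
        return _TREES[scheme](url)
    except KeyError:
        raise ValueError(f"unsupported scheme: {scheme!r}") from None


class Remote:
    """Takes a tree instead of being constructed by a helper function."""

    def __init__(self, tree):
        self.tree = tree


class CloudCache:
    """Takes a tree as well, so cache and remote can share tree logic."""

    def __init__(self, tree):
        self.tree = tree


tree = get_cloud_tree("s3://bucket/path")
cache = CloudCache(tree)
remote = Remote(tree)
```

The point of the design is that scheme-specific behaviour lives in the tree, while `Remote` and `CloudCache` stay generic wrappers around whichever tree they are given.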

@pmrowla pmrowla self-assigned this Jun 15, 2020
@pmrowla pmrowla force-pushed the separate-remote-cache-2 branch from b27d57c to aef4701 Compare June 15, 2020 11:08

class CloudCache:
Contributor

Why is it CloudCache and not just Cache? And why is LocalCache the only exception?

Contributor

@efiop efiop Jun 15, 2020

Ah, right, because of the push/pull. Ok, makes sense.

Contributor Author

I went with CloudCache to avoid the collision with dvc.cache.Cache, but maybe this should be Cache, and the other one should be RepoCache or something along those lines.

Contributor

@pmrowla Makes sense. I'm fine with CloudCache for now.

Contributor Author

We can also get rid of LocalCache in the future by making push/pull/status more generic for syncing data between any Cache and any Remote, rather than having it be specific to LocalCache, but I don't think there's any pressing need for that right now.

Comment on lines +82 to +88
def get_remote(repo, **kwargs):
    tree = get_cloud_tree(repo, **kwargs)
    if tree.scheme == "local":
        return LocalRemote(tree)
    if tree.scheme == "ssh":
        return SSHRemote(tree)
    return Remote(tree)
Contributor Author

For initializing caches, we now just do

tree = get_cloud_tree(<remote config>)
cache = CloudCache(tree)

But for remotes, we still have this helper method. It would be ideal if we could just always use Remote(tree) but LocalRemote and SSHRemote still have their own overridden batch exists methods. It may be worth investigating whether or not the custom ssh/local methods are still needed with the other performance improvements that we have now.

Contributor

Or maybe those custom methods could be moved to the trees. IIRC ssh uses some custom sftp channel handling, which might be abstracted away too. But we could surely do that later. Maybe consider creating an issue for it.
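
A rough sketch of what moving the overridden batch-exists logic into the trees might look like. All names here (`Tree`, `SSHTree`, `batch_exists`) are hypothetical stand-ins rather than DVC's real classes or methods, and the "fast path" is simulated with an in-memory set instead of any real SSH or SFTP handling.

```python
class Tree:
    """Hypothetical base tree; batch_exists falls back to per-path checks."""

    scheme = "base"

    def __init__(self, paths=()):
        self._paths = set(paths)

    def exists(self, path):
        return path in self._paths

    def batch_exists(self, paths):
        # Generic path: one exists() call per entry.
        return [self.exists(p) for p in paths]


class SSHTree(Tree):
    scheme = "ssh"

    def batch_exists(self, paths):
        # Stand-in for a scheme-specific fast path (for example, reusing
        # one connection for all checks); here it is one set intersection.
        found = self._paths.intersection(paths)
        return [p in found for p in paths]


class Remote:
    """With the override on the tree, no per-scheme Remote subclass is needed."""

    def __init__(self, tree):
        self.tree = tree

    def batch_exists(self, paths):
        return self.tree.batch_exists(paths)


remote = Remote(SSHTree(paths={"ab/cdef", "12/3456"}))
result = remote.batch_exists(["ab/cdef", "ff/0000"])
```

Under this arrangement a helper like `get_remote` would no longer need to branch on `tree.scheme`, since `Remote(tree)` would behave correctly for every tree.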

Contributor Author

One other question I had was whether or not the indexing-related code should also just go into the trees at some point instead of being DVC Remote specific.

Contributor

@pmrowla Oh, that's a very good question! To me, it seems purely remote-related, as it solely relies on remote structure. But how do you see it?

Contributor Author

Since it's remote specific it makes sense to keep it in the Remote class for now, but I wasn't sure if indexing any kind of content (as opposed to just DVC Remotes) in a remote/cloud tree is something we might want to support in the future.

Contributor

I'm not sure there is a good way to index arbitrary data in that way. It is really tailored for our cache structure, so I really doubt that the full index logic will ever make its way into the Tree. But I'm sure that parts of it will, as many functions are actually quite generic (like walk_files replacing _list_cache).
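
To illustrate the split being discussed, here is a minimal sketch in which a generic `walk_files` lives on the tree while the cache-layout knowledge stays in the remote. The class names and `list_checksums` are hypothetical, and the layout `<root>/<2 characters>/<rest>` is an assumption made for the example.

```python
import os
import tempfile


class LocalTree:
    """Hypothetical tree with a generic walk_files() helper."""

    def __init__(self, root):
        self.root = root

    def walk_files(self):
        # Generic: yields every file under the root, knows nothing about DVC.
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                yield os.path.join(dirpath, name)


class Remote:
    def __init__(self, tree):
        self.tree = tree

    def list_checksums(self):
        # Remote-specific: assumes the layout <root>/<2 chars>/<rest> and
        # skips anything that does not fit it.
        for path in self.tree.walk_files():
            rel = os.path.relpath(path, self.tree.root)
            prefix, _, rest = rel.partition(os.sep)
            if len(prefix) == 2 and rest and os.sep not in rest:
                yield prefix + rest


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "ab"))
    for rel in (os.path.join("ab", "cdef"), "notes.txt"):
        with open(os.path.join(root, rel), "w") as fobj:
            fobj.write("data")
    checksums = sorted(Remote(LocalTree(root)).list_checksums())
```

The tree method works for arbitrary content, whereas the checksum listing only makes sense for data stored in the assumed cache layout, which is why it stays on the remote in this sketch.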

@pmrowla pmrowla marked this pull request as ready for review June 15, 2020 13:08
@pmrowla pmrowla changed the title [WIP] remote: continue remote/cache/tree refactoring remote: continue remote/cache/tree refactoring Jun 15, 2020
@efiop efiop merged commit a0c229c into iterative:master Jun 15, 2020
@pmrowla pmrowla deleted the separate-remote-cache-2 branch June 15, 2020 13:26
2 participants