[WIP] repo: use unified RepoTree for erepos #3716
@@ -139,7 +139,5 @@ def walk(self, top, topdown=True):

        yield root, dirs, files

    def walk_files(self, top):
This function has been moved into dvc.scm.tree.BaseTree.
dvc/remote/base.py
Outdated
if is_working_tree(self.repo.tree):
    self.state.save(path_info, checksum)
self.state.save(new_info, checksum, tree=self.cache.tree)
Now that repo.tree may be a GitTree, RemoteLocal (self.cache here) has its own WorkingTree instance. When working with GitTrees, repo.tree must be used to look for paths in the git index, and RemoteLocal.tree must be used to look for local cache paths. state.save also has to know which tree to use: it uses repo.tree by default, but when repo.tree is a GitTree, state.save must be passed the local cache tree instead.
Here, path_info is a git-index blob, not an actual filesystem path with an inode, so we can't do anything with it in state. But new_info is an actual fs path to the newly created local cache file, so we do want to save it in state.
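To illustrate the rule described in this comment, here is a minimal sketch, not the PR's actual code. WorkingTree, GitTree, RecordingState and save_link are hypothetical stand-ins invented for the example; only the names is_working_tree, state.save and the tree= keyword come from the diff excerpt above, and their real signatures are assumed.

```python
class WorkingTree:
    """Hypothetical stand-in for a tree backed by the real filesystem."""


class GitTree:
    """Hypothetical stand-in for a tree backed by git objects (no inodes)."""


class RecordingState:
    """Hypothetical state object that only records what it was asked to save."""

    def __init__(self):
        self.saved = []

    def save(self, path_info, checksum, tree=None):
        self.saved.append((path_info, checksum, tree))


def is_working_tree(tree):
    # Assumption: a simple isinstance check is enough for this sketch.
    return isinstance(tree, WorkingTree)


def save_link(state, repo_tree, cache_tree, path_info, new_info, checksum):
    # The source path is only recorded when it is a real filesystem path.
    if is_working_tree(repo_tree):
        state.save(path_info, checksum)
    # The new cache file always lives on disk, so it is always recorded,
    # using the local cache tree rather than the repo tree.
    state.save(new_info, checksum, tree=cache_tree)
```

With a GitTree as the repo tree, only the cache path ends up in state; with a WorkingTree, both paths do.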
There's still a bug that needs to be fixed, where git logger messages are not being suppressed and cause the caplog-based tests in plot to fail, but other than that this can be reviewed now. (But do not merge as long as it's still labeled WIP.)
From discussion w/@efiop: need to clean up state changes with regard to local cache (state should just be a no-op when the parent repo tree is not a working tree).
dvc/external_repo.py
Outdated
def use_cache(self, cache):
    """Use specified cache instead of erepo tmpdir cache."""
    self._local_cache = cache
    if hasattr(self, "cache"):
        save_cache = self.cache.local
        self.cache.local = cache
    # make cache aware of our repo tree
    with cache.erepo_tree(self.tree):
        yield
    if hasattr(self, "cache"):
        self.cache.local = save_cache
    self._local_cache = None
This context manager lets us use an existing local cache instead of the default tmpdir erepo cache. Before, we would just edit erepo.cache.local.cache_dir to make downloads go where we wanted them to, but we were not using other cache settings like the proper link types.
Combined with the Remote.erepo_tree context manager, we can now use the correct local cache instance, and also make sure that local cache instance is aware of the erepo git tree.
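The swap-and-restore pattern described here can be sketched in isolation as below. This is an illustrative sketch only: Caches and ExternalRepoSketch are hypothetical classes, and the try/finally is an assumption added for the example (it restores the original cache even if the body raises); the excerpt above does not show whether the real method does this.

```python
from contextlib import contextmanager


class Caches:
    """Hypothetical holder for a repo's cache objects."""

    def __init__(self, local):
        self.local = local


class ExternalRepoSketch:
    """Hypothetical stand-in for an external repo with a tmpdir cache."""

    def __init__(self, tmp_cache):
        self.cache = Caches(tmp_cache)

    @contextmanager
    def use_cache(self, cache):
        # Remember the default cache and swap in the caller's cache.
        saved = self.cache.local
        self.cache.local = cache
        try:
            yield
        finally:
            # Put the default cache back, even if the body raised.
            self.cache.local = saved
```

Usage would look like `with erepo.use_cache(repo_cache): ...`, with every download inside the block going through the caller's cache object and its settings.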
dvc/external_repo.py
Outdated
            filter_info = None
        return path_info, filter_info

    def get_external(self, path, to_info, recursive=False, **kwargs):
A future PR will be needed to add import/get -R support for #3087.
dvc/external_repo.py
Outdated
        self._fetch_external(fetch_infos, **kwargs)

    def _pull_cached(self, out, path_info, dest):
        with self.state:
            tmp = PathInfo(tmp_fname(dest))
            src = tmp / path_info.relative_to(out.path_info)
            self.repo_tree.copytree(path_info, to_info)
get_external is now essentially just a fetch and then checkout (checkout is done for dvc outs in RepoTree.copytree).
* get_external is now fetch then checkout to specific destination
* recursively added DVC outs
* Partial fix for iterative#3087
py35 rebase errors
scm: rebase errors
rebase errors
Closing this; it will be reworked and split into smaller PRs.
I have followed the Contributing to DVC checklist.
If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.
I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)
Thank you for the contribution - we'll try to review it as soon as possible.
PR implements the following:
* dvc.repo.tree.RepoTree finished - RepoTree can be used to view a DvcTree and separate WorkingTree or GitTree as a single unified tree instance
  * walk()/walk_files() will walk and merge both trees
  * copytree(src, dest) will copy the contents of the tree (starting from src) into the dest pathname
* remote.save_tree(), remote.save_obj() can be used to save trees or file objects directly to local cache
* git checkout unless for_write=True
* dvc ls: Now uses RepoTree instead of DvcTree + separate git file handling
* dvc get: Now uses ExternalRepo.get_external() for handling all dvc + git files
* dvc fetch: Now uses ExternalRepo.fetch_external() for handling all dvc + git files
* RecursiveImportError now raised when trying to import/get a git directory which contains dvcfiles (import/get directory with git files and dvc outputs #3087). Support for import -R will be implemented in a future PR.
TODO:
Fixes #3611
Partial fix for #3087
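For readers unfamiliar with the idea, the "walk and merge both trees" behaviour of walk()/walk_files() listed in the description can be sketched roughly as follows. This is a toy illustration and not RepoTree's implementation: merged_walk is a hypothetical helper, and each tree is modelled as a plain dict mapping a directory path to a (dirs, files) pair.

```python
def merged_walk(dvc_tree, scm_tree, top):
    """Yield (root, dirs, files) with entries from both trees combined.

    Both trees are plain dicts mapping a directory path to a
    (dirs, files) pair; this is a modelling assumption for the sketch.
    """
    dvc_dirs, dvc_files = dvc_tree.get(top, ([], []))
    scm_dirs, scm_files = scm_tree.get(top, ([], []))

    # An entry present in either tree shows up once in the merged view.
    dirs = sorted(set(dvc_dirs) | set(scm_dirs))
    files = sorted(set(dvc_files) | set(scm_files))
    yield top, dirs, files

    # Recurse into every directory known to either tree.
    for name in dirs:
        yield from merged_walk(dvc_tree, scm_tree, top + "/" + name)
```

In this model a directory that holds git-tracked files and DVC outputs side by side is reported once, with both kinds of entries in its file list.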