repofs: use underlying fs.download to download files #6401
@@ -253,3 +255,9 @@ def info(self, path_info):
ret[obj.hash_info.name] = obj.hash_info.value

return ret

def _download(self, from_info, to_file, **kwargs):
For the record: this is a bit ugly, but it will turn into a cleaner and more proper `fs.get_file` when we migrate dvcfs to fsspec.
def open(  # type: ignore
    self, path: PathInfo, mode="r", encoding=None, remote=None, **kwargs
):  # pylint: disable=arguments-differ
def _get_fs_path(self, path: PathInfo, remote=None):
This didn't handle chained imports before, and it doesn't handle them now either. We'll need to generalize the resolving logic we use in dvc/dependency/repo.py and use it here. I'll probably look into it in a follow-up.
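As a rough illustration of what resolving a chained import involves (an import whose source was itself imported from yet another repo), here is a minimal sketch. It assumes a hypothetical in-memory model where each repo maps a path either to directly tracked data (`"local"`) or to an `Import` record; `Import`, `Repos`, and `resolve_chain` are illustrative names only, not DVC's API and not the logic in dvc/dependency/repo.py. It follows import links until it reaches the repo that tracks the data itself and raises on cycles.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


@dataclass(frozen=True)
class Import:
    """Hypothetical marker: a path that was imported from `repo` at `src_path`."""

    repo: str
    src_path: str


# Hypothetical model: repo name -> {path: "local" for directly tracked data, or an Import}
Repos = Dict[str, Dict[str, Union[str, Import]]]


def resolve_chain(repos: Repos, repo: str, path: str) -> Tuple[str, str]:
    """Follow import links until reaching the repo that tracks `path` directly."""
    seen: Set[Tuple[str, str]] = set()
    while (repo, path) not in seen:
        seen.add((repo, path))
        entry = repos[repo][path]
        if not isinstance(entry, Import):
            # Data is tracked here, so this is the repo (and path) to download from.
            return repo, path
        # One more hop: continue resolving in the repo it was imported from.
        repo, path = entry.repo, entry.src_path
    # We revisited a (repo, path) pair, so the imports form a loop.
    raise ValueError(f"import cycle detected at {repo}:{path}")


if __name__ == "__main__":
    repos: Repos = {
        "app": {"data.csv": Import("mid", "raw.csv")},
        "mid": {"raw.csv": Import("origin", "dataset.csv")},
        "origin": {"dataset.csv": "local"},
    }
    print(resolve_chain(repos, "app", "data.csv"))  # ('origin', 'dataset.csv')
```

Resolving to the final (repo, path) pair first means the actual transfer can then be delegated to whichever filesystem backs that last repo, instead of hopping through every intermediate one.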
def _download(self, from_info, to_file, **kwargs):
    fs, path = self._get_fs_path(from_info)
    fs._download(  # pylint: disable=protected-access
This is again temporary; it will be replaced by just `get_file` in the future.
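The pattern in `_download` (resolve a path to the filesystem that actually stores it, then let that filesystem perform the whole transfer) can be sketched as follows. This is a minimal sketch, assuming hypothetical `LocalFS` and `MountedFS` classes with a simple prefix-based mount table; it is not DVC's actual filesystem classes. The local backend's `download` uses `shutil.copyfile`, standing in for a backend-specific, already-tuned transfer routine.

```python
import os
import shutil
from typing import Dict, Tuple


class LocalFS:
    """Hypothetical backend rooted at a local directory."""

    def __init__(self, root: str) -> None:
        self.root = root

    def download(self, path: str, to_file: str) -> None:
        # Backend-specific transfer; shutil.copyfile may use OS fast-copy syscalls.
        shutil.copyfile(os.path.join(self.root, path), to_file)


class MountedFS:
    """Hypothetical wrapper that routes paths to the backend mounted under their prefix."""

    def __init__(self, mounts: Dict[str, LocalFS]) -> None:
        self.mounts = mounts

    def _get_fs_path(self, path: str) -> Tuple[LocalFS, str]:
        # Longest matching prefix wins, so nested mounts resolve to the innermost backend.
        for prefix in sorted(self.mounts, key=len, reverse=True):
            if path.startswith(prefix + "/"):
                return self.mounts[prefix], path[len(prefix) + 1:]
        raise FileNotFoundError(path)

    def download(self, path: str, to_file: str) -> None:
        fs, fs_path = self._get_fs_path(path)
        # Hand the whole transfer to the underlying backend instead of
        # reading through a generic open()/read() loop at this layer.
        fs.download(fs_path, to_file)
```

The wrapper stays thin: it only knows how to map paths, so any performance work done in a backend's own download method is picked up automatically.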
LGTM
This makes `dvc get` work just as fast as `import`/`pull`. The reason is that `open()` is harder to make fast (e.g. we need to know what block size to use, etc., for which we don't yet have very good internal infrastructure), and it is much easier to just use the native `fs.download` method, which has already been tuned for maximum performance.

Related to #5546
Fixes #6019
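To make the block-size point concrete, here is a minimal sketch contrasting the two approaches on local files, which stand in for a remote backend (an assumption for illustration only). `copy_via_open` and `copy_native` are hypothetical helpers, not DVC functions: the first streams through `open()` in caller-chosen chunks, so its throughput hinges on a `blocksize` someone has to pick; the second delegates to `shutil.copyfile`, which on Python 3.8+ may use platform fast-copy calls (e.g. `os.sendfile` on Linux).

```python
import shutil


def copy_via_open(src: str, dst: str, blocksize: int = 64 * 1024) -> int:
    """Generic path: stream through open() in caller-chosen chunks.

    Throughput depends on `blocksize`; for a remote backend the right value
    varies with latency and server-side chunking, which is the tuning problem
    described above. Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(blocksize)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
    return copied


def copy_native(src: str, dst: str) -> None:
    """Delegated path: let the backend's own routine choose how to transfer.

    For local files, shutil.copyfile may use OS fast-copy syscalls (Python 3.8+).
    """
    shutil.copyfile(src, dst)
```

Both produce the same bytes; the difference is who owns the tuning decision. Delegating pushes it to code that already knows the backend.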
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏