-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Google Drive support further enhancements #2865
Comments
@Maxris would be helpful if you include links to the discussions in the initial PR. |
@shcheklein sure 👍 |
also add dvc/dvc/remote/gdrive/__init__.py Line 120 in d42a04e
see #2875 (comment) (source at https://github.com/gsuitedevs/PyDrive/blob/68cea204cdcd10bab9321580164c8f4961385a0f/pydrive/files.py#L202) which means this is non-trivial to add |
@casperdcl thanks, have added this to the list |
@Maxris maybe start from tests, then pick whatever you like. |
One more GDrive related issue that can be added to the list #3015 |
(Added "Support file streaming in dvc.api.open() function (more details in #3408)" checkbox.) |
@shcheklein just to put |
@Maxris k, so meaning that we need to put an assert in the |
yes, exactly |
I'm addressing this in iterative/dvc.org#1096 (review) ✅ Also, how about
? Maybe also for service account params. |
not sure about this, I don't see that much difference with other remotes and other params they have |
Regarding adding CC @shcheklein the stack:
Could monkey-patch doing: class Custom(http.client.HTTPResponse):
def _readall_chunked(...):
with Tqdm(...) as pbar:
...
httplib2.HTTPConnectionWithTimeout.response_class = Custom
gdrive_file.GetContentFile()
httplib2.HTTPConnectionWithTimeout.response_class = http.client.HTTPResponse but it's not pretty. Opened upstream httplib2/httplib2#165 |
* wip * gdrive: add progress Part of #2865 See #2865 (comment) * gdrive: move towards next pydrive2 release - depends on iterative/PyDrive2#30 * update to latest pydrive>=1.4.11 * avoid unneeded API call * progress: gdrive: ensure proper bar_format
Fixes iterative#3408 Related iterative#2865 Fixes iterative#3897
Fixes iterative#3408 Related iterative#2865 Fixes iterative#3897
* gdrive: add open Fixes #3408 Related #2865 Fixes #3897 * dependency: add gdrive * test: api: open: gdrive * rollback pydrive dependency * Revert "dependency: add gdrive" This reverts commit fd33326. * tests: remotes: GDrive: fix get_url * tests: api: fully test GDrive * minor typo fix * attempt gdrive tests fix * fix gdrive credentials config * replace GDrivePathNotFound => FileMissingError * fix test_open_external[GDrive] * add ensure_dir_scm * minor exception fix
Dear @shcheklein and @efiop, I'm starting working today on the "add |
Dear @julsal, I am glad you're implementing it. I presume your current work is here: https://github.com/iterative/dvc/tree/gdrive-import When do u expect to finish it? |
@plopd We are migrating to fsspec right now iterative/PyDrive2#113 and will ressurect |
@efiop Great. Until then, I think we should remove the |
* import-url: remove gdrive iterative/dvc#2865 (comment) * get-url: remove gdrive * Restyled by prettier (#2598) Co-authored-by: Restyled.io <[email protected]> Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com> Co-authored-by: Restyled.io <[email protected]>
With iterative/PyDrive2#113 closed, would it be possible to resume GDrive support for |
cc: @efiop @isidentical |
I think we should close this issue, and open separate ones for upcoming feature requests. This seems like a really giant but outdated epic.
As far as I can tell, everything should be there for import-url, so let's create another issue to finalize the support / add it docs etc. (if I am not missing something) |
Closing this meta-ticket since it's very outdated. Separate issues for any missing features in the fssspec based gdrivefs should be opened as needed. |
This ticket is to keep track of required further improvements for Google Drive
remote
implementation after #2551 will be merged intomaster
:Processing of auth exceptions and printing more meaningful error message on failure. GDrive remote support #2551 (comment)
Implement
gc
command support forgdrive
(def remove(self, path_info)
overloading )Validate resolved remote file object to have expected title (
def resolve_remote_item_from_path(self, path_parts, create):
)Simplify a few things using new and shiny
@wrap_prop
andfilter_errors
param of@retry
from funcy 1.14 :)Make name -> id deterministic by choosing minimum id GDrive remote support #2551 (comment)
Reimplement caching to have 2 entry points .path_info_to_ids() and .id_to_path_info(). The idea is to remove cache dictionaries completely, but it will require introduction of helper method which accepts parent_id, title and returns the actual id ( this method introduces 1 -> 1 relation between input and resulting remote id ). GDrive remote support #2551 (comment)
Simplify
resolve_remote_item_from_path
GDrive remote support #2551 (comment)Protect
create_remote_dir
with lock. GDrive remote support #2551 (comment)Organize methods, params in ordered and unified way. GDrive remote support #2551 (comment)
Enhance retrieving of HTTP error from GDrive API exceptions maxhora@783d8b6#r36184990
Create Iterative Google Drive account and Project to share client id and secret with final DVC users. Highest possible API usage limits quotas should be requested. Probably, it will be needed to have separate Google Project for CI. GDrive remote support #2551 (comment)
Stop creating a path to the DVC remote root if it does not exist. Since Google Drive allows multiple folders with the same name (at least on in My Drive and one in Shared With Me) in case a path like
gdrive://root/storage
is used to access, collaborators see astorage
empty folder after the firstdvc pull
attempt`. Instead we should just throw path does not exist.Hide
/Users/ivan/Projects/test-gdrive/.env/lib/python3.7/site-packages/oauth2client/_helpers.py:255: UserWarning: Cannot access /Users/ivan/Projects/test-gdrive/.dvc/tmp/gdrive-user-credentials.json: No such file or directory warnings.warn(_MISSING_FILE_MESSAGE.format(filename))
on the first auth.Reconsider
self.no_traverse = False
. Now even to pull a single file we run the full traversal. We can at least start listing only prefixes we need (e.g.remote/0c/*
if we need to check if file0c1234...ef
exists), we can use a parallelexists
if we need to check less than 256 files, etc.Put notes in docs that path like
gdrive://root
is not accessible by other people - that ID must be used to actually share data with other team members.We should store one credentials file per remote
Do not allow empty root DVC remote path (except shared?). I think it can prevent a lot of strange issues. More on this here remote add: should gdrive://root be an error? #3586Add
_download
progress Google Drive support further enhancements #2865 (comment) -> gdrive: download: stream & add progress #3722Support file streaming in
dvc.api.open()
function (more details in api: support streaming from Google Drive remotes #3408) and and update docsAdd
import-url
supportCheck that
close()
is handles in thedvc.api.open()
context manager.Support show URL in
dvc get
for Gdrive. We can generate a HTTPS link that can be used in a browser if user is authed. The same link as you would get if you download a file from UI.Fix credentials management for external repos (when we do
dvc get
, etc). We should have a way to cache them.Support external dependencies and outputs
Check that
dvc get-url
functions properly. Need to come up with some credentials management.Review and add more tests if needed. Starting from API tests.
Add an explanation comment about how path vs ids work, all non 1-1 stuff. Someone running into GDrive for first time will appreciate GDrive remote support #2551 (review). Explain how we deal with it, how caching is involved too.
Notify user on retries, keep the message on the screen if they are happening at a certain rate
Consider raising an exception if there are multiple remote root directories with the same name
Move
_gdrive_*
helpers into a separate moduleapi
to simplify testing and reading the codeWhen it's more or less stable make it trusted by default
Check if there is a way to generate URLs for GDrive publicly available files to download them w/o auth. Examples: https://github.com/NVlabs/stylegan/blob/master/pretrained_example.py and https://github.com/NVlabs/stylegan/blob/master/dnnlib/util.py .
dvc get
ideally should work w/o asking to Auth for public objects.Improve performance, pass
fields
, consider using batch for exists call - https://developers.google.com/drive/api/v2/performance , see example here docs: How to get specific fields when listing files. PyDrive2#42The text was updated successfully, but these errors were encountered: