Google Remote API Error (max retries exceeded) #3265

Closed
sid-marain opened this issue Jan 31, 2020 · 7 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@sid-marain

sid-marain commented Jan 31, 2020

ERROR: unexpected error - HTTPSConnectionPool(host='www.googleapis.com', port=443): Max retries exceeded with url:<REDACTED>?fields=name (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1051)')))

This occurs during dvc push.

Excerpt from dvc push -v:

DEBUG: fetched: [('1580439350437543424', '589', 'dc87bd29786e1cc8b2b3738ca8130c78', '1580440243730723328')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: cache '.dvc/cache/dc/87bd29786e1cc8b2b3738ca8130c78' expected 'dc87bd29786e1cc8b2b3738ca8130c78' actual 'dc87bd29786e1cc8b2b3738ca8130c78'
DEBUG: Path .dvc/cache/05/31f82fec9701d95ef53ccf929dbaed inode 5738793
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1580439350425543424', '28717', '0531f82fec9701d95ef53ccf929dbaed', '1580440243730771968')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: cache '.dvc/cache/05/31f82fec9701d95ef53ccf929dbaed' expected '0531f82fec9701d95ef53ccf929dbaed' actual '0531f82fec9701d95ef53ccf929dbaed'
DEBUG: Path .dvc/cache/e9/c97e5d0a774d08c0a2d5b0e2e510b7 inode 5738794
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1580439350437543424', '25515', 'e9c97e5d0a774d08c0a2d5b0e2e510b7', '1580440243730871296')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: cache '.dvc/cache/e9/c97e5d0a774d08c0a2d5b0e2e510b7' expected 'e9c97e5d0a774d08c0a2d5b0e2e510b7' actual 'e9c97e5d0a774d08c0a2d5b0e2e510b7'

Settings
DVC 0.82.6
Python 3.6
Ubuntu 18.04

triage-new-issues bot added the triage label Jan 31, 2020
@shcheklein
Member

@sid-marain what remote type do you use: Google Drive or GCP?

Can you push at least one file, or does it fail partway through?

@sid-marain
Author

sid-marain commented Jan 31, 2020

It's GCP.

I can push at least two files, I believe, but the last one fails. I am running dvc in a distributed manner from Docker in a batch process. That is, I have dozens to hundreds of containers running a script in parallel. These finish asynchronously and then push results to the dvc remote. The failure is fairly infrequent: on a recent job with ~140 tasks, I saw it occur 3 times. Nothing about those tasks seemed indicative of failure; the failures looked random.

@shcheklein
Member

@sid-marain thanks! could you also please share the last part of the stack trace?

It's not clear yet what is happening. To be honest, it looks more like an environment error to me. We would most likely need to find a way (together?) to reproduce this in order to understand what's going on.

@ghost

ghost commented Jan 31, 2020

@sid-marain, it looks like GCP rate-limits you if you are making many requests in quick succession: https://cloud.google.com/storage/docs/request-rate#auto-scaling

@sid-marain
Author

@shcheklein : That's all I'm getting from the stack trace. It's difficult to reproduce this error because, as mentioned above, it only seems to happen once every so often.

@MrOutis: That might be it. We are potentially making quite a few requests. We could retry the dvc call with exponential backoff.

@ghost

ghost commented Jan 31, 2020

> @MrOutis : that might be it. We are potentially doing quite a few requests. Could retry the dvc call w/ exponential backoff.

@sid-marain, as far as I remember, dvc push is smart enough to detect already-uploaded files, so a retry mechanism could definitely help.
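
For anyone landing here later, a minimal sketch of the retry idea discussed above. It is not part of DVC. The helper name `run_with_backoff` and its parameters are made up for illustration, and it assumes `dvc push` exits with a non-zero status when the push fails. Because already-uploaded files are skipped, re-running the whole command should be cheap.

```python
import subprocess
import time


def run_with_backoff(cmd, max_attempts=5, base_delay=2.0, max_delay=60.0,
                     run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with status 0, doubling the wait after each failure.

    Hypothetical helper, not a DVC API. `run` and `sleep` are injectable so the
    loop can be exercised without actually running a command or waiting.
    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        if run(cmd) == 0:
            return attempt
        if attempt < max_attempts:
            # Waits of base_delay, 2x, 4x, ... seconds, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay)
    raise RuntimeError("command failed after %d attempts: %r" % (max_attempts, cmd))


# Intended use inside each container (assumes dvc is on PATH):
#   run_with_backoff(["dvc", "push"])
```

The defaults (5 attempts, 2 s base, 60 s cap) are arbitrary starting points, not tuned values.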

@sid-marain
Author

@MrOutis: Yeah. Based on the timing of this error in the logs, the failing requests were submitted and failed within a 20-second window. We don't really have a way to coordinate the calls to dvc push across the batch, so I think the retry mechanism will have to do. Thanks!
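
Since the failures cluster in a short window and the containers cannot coordinate, one option is to randomize each container's wait so that they do not all retry at the same moment. The sketch below is an assumption layered on the thread, not something it tested. `jittered_delay` is a hypothetical name, and the base and cap values are placeholders.

```python
import random


def jittered_delay(attempt, base=2.0, cap=60.0, rng=random):
    """Return a random wait, in seconds, between 0 and min(cap, base * 2**attempt).

    Hypothetical helper: `attempt` counts from 0, and `rng` can be a seeded
    random.Random instance to make the delays reproducible.
    """
    upper = min(cap, base * 2 ** attempt)
    return rng.uniform(0.0, upper)
```

Each container would sleep for `jittered_delay(attempt)` seconds before re-running dvc push, so containers that failed together tend to spread out instead of hitting the remote in lockstep.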

efiop added the awaiting response label Feb 1, 2020
triage-new-issues bot removed the triage label Feb 1, 2020
3 participants