
http.py _retry_request improve exponential backoff #466

Closed
wants to merge 1 commit into from

Conversation

xmedeko (Contributor) commented Jan 17, 2018

When I use num_retries=5, I may see the following in the logs:

Sleeping 0.04 seconds before retry 1 of 5 for request ...
Sleeping 2.59 seconds before retry 2 of 5 for request ...
Sleeping 0.74 seconds before retry 3 of 5 for request ...
Sleeping 0.36 seconds before retry 4 of 5 for request ...
Sleeping 0.62 seconds before retry 5 of 5 for request ...

That is approximately 5 seconds in total, but I would expect something between 16 and 32 seconds. See also the Google backoff rule: https://cloud.google.com/storage/docs/exponential-backoff

My change does not follow that doc exactly, but it is at least closer to it than the current implementation is.
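
For illustration, here is a small sketch (not library code; the formulas are assumptions inferred from the log lines above and from the linked doc) comparing the schedule the logs suggest with the one the Google doc describes:

import random

def observed_style_total(num_retries):
    # Each retry sleeps a uniformly random fraction of its 2**n cap,
    # so the total varies widely between runs and can be close to zero.
    return sum(random.random() * 2 ** n for n in range(1, num_retries + 1))

def doc_style_total(num_retries):
    # Deterministic 2**n seconds plus sub-second jitter, as in
    # https://cloud.google.com/storage/docs/exponential-backoff
    return sum(2 ** n + random.random() for n in range(num_retries))

print(observed_style_total(5))  # varies widely; an unlucky run can total only a few seconds
print(doc_style_total(5))       # always at least 1 + 2 + 4 + 8 + 16 = 31 seconds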

googlebot added the cla: yes label (This human has signed the Contributor License Agreement.) on Jan 17, 2018
theacodes (Contributor) commented Jan 17, 2018
Hi @xmedeko, thank you for sending this and for researching it. However, because this library is in maintenance mode, I am apprehensive about merging this patch. The existing algorithm does not appear to be incorrect, so I am going to close this for now.

theacodes closed this Jan 17, 2018
xmedeko (Contributor, Author) commented Jan 17, 2018

@jonparrott what about making the sleep time an external function so a user may override it? Something like:

def default_sleeptime(rand, retry_num):
    return rand() * 2 ** retry_num

_sleep_time = default_sleeptime
# ... then inside _retry_request:
    sleep_time = _sleep_time(rand, retry_num)

The problem with the current implementation is that it is really very random. E.g. for num_retries=5 I can get a total time anywhere between 0 and 32 seconds. When I raise it to num_retries=10, I can still get a total time close to zero, just with lower probability.
I use Cloud Tasks, which regularly return HTTP 500 out-of-service responses. Cloud Tasks is not in google-cloud-python yet, and it will not be there in the near future.

I know I can monkey patch _retry_request, but I want to avoid that if possible.
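
A hypothetical sketch of how that hook could be used (note that _sleep_time and default_sleeptime do not exist in googleapiclient today; the names come from the suggestion above):

import googleapiclient.http

def capped_sleeptime(rand, retry_num):
    # Deterministic exponential core with sub-second jitter, capped at 60 s.
    return min(2 ** retry_num + rand(), 60)

# With the proposed hook a user could swap in their own schedule
# without monkey patching _retry_request itself.
googleapiclient.http._sleep_time = capped_sleeptime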

theacodes (Contributor) commented:
I'm unfortunately going to have to turn that down as well.

For what it's worth, the backoff should be random because of jitter. It should not necessarily get progressively longer; it should get potentially longer with each retry.

xmedeko deleted the http_retry branch January 18, 2018 07:22
xmedeko (Contributor, Author) commented Jan 29, 2018

For those with the same problem: I've made an HttpRequest subclass that tries to fix the backoff time issue.
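
For reference, a minimal sketch of what such a subclass could look like (a sketch only, not the actual subclass; it assumes only googleapiclient's public HttpRequest and HttpError APIs, and the BackoffHttpRequest name is made up):

import random
import time

from googleapiclient.errors import HttpError
from googleapiclient.http import HttpRequest

class BackoffHttpRequest(HttpRequest):
    MAX_RETRIES = 5

    def execute(self, http=None, num_retries=0):
        # Ask the parent class for a single attempt each time, so the
        # backoff schedule stays under this subclass's control.
        for retry_num in range(self.MAX_RETRIES + 1):
            try:
                return super(BackoffHttpRequest, self).execute(http=http)
            except HttpError as error:
                if retry_num == self.MAX_RETRIES or error.resp.status < 500:
                    raise
                # 2**n seconds plus sub-second jitter, per the Google
                # Cloud Storage backoff guidance.
                time.sleep(2 ** retry_num + random.random())

The subclass can then be passed to googleapiclient.discovery.build() via its requestBuilder parameter.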
