Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems using Configurable Retry Policy #21

Closed
asqui opened this issue Oct 1, 2019 · 5 comments
Closed

Problems using Configurable Retry Policy #21

asqui opened this issue Oct 1, 2019 · 5 comments
Assignees
Labels
bug docs enhancement should update the documentation to better explain this

Comments

@asqui
Copy link

asqui commented Oct 1, 2019

Looking at the documentation on Configurable Retry Policy Support:

  • retry_count is documented as "Number of times a request is tried upon encountering a specific response code, such as 429." however I don't think the implementation respects this -- I believe it will make a total of retry_count attempts across any error type.
  • retry_delay and retry_max_delay documentation does not make it clear that these parameters control the full jitter algorithm. Also, they probably don't need to be integers.
  • In particular, retry_max_delay is documented as "The upper limit for the amount of time, in seconds, for which the retries will be attempted for." which is not respected, but would be useful to have. (i.e. I think you should add tenacity.stop_after_delay(configuration.retry_max_delay) to the Retrying(stop=...) argument, so that there is a way to limit the total amount of time spent retrying.)

Separately, it doesn't look like metrics are getting published during retries because opsgenie_sdk.rest.RESTClientObject.request() does not publish HTTP Metrics in the scenario where pool_manager.request() raises an exception. This is making it harder for me to debug what is going on.

@asqui
Copy link
Author

asqui commented Oct 1, 2019

Additional observations:

  • The Configuration object has an undocumented retries argument which (I think) is unused, but comes up in intellisense auto-complete lists.
  • The back_off parameter is not prefixed with retry_ like all the rest of the parameters for retry logic. retry_back_off might be more intuitive (and easier to find in intellisense auto-complete lists)

@asqui
Copy link
Author

asqui commented Oct 1, 2019

  • It's not possible to disable retry logic by setting retry_enabled = False because this parameter is quietly being ignored due to a bug. The and below needs to be an & for this to work as intended.

retry=(tenacity.retry_if_result(self.is_retry_enabled) and
((tenacity.retry_if_exception_type(RetryableException)) |
(tenacity.retry_if_exception_type(HTTPError)))))

@mhamzak008 mhamzak008 self-assigned this Nov 5, 2019
@mhamzak008 mhamzak008 added the bug label Nov 6, 2019
asqui added a commit to BATS/opsgenie-python-sdk that referenced this issue Nov 6, 2019
This is needed because the retry logic in SDK doesn't work properly :-(
(see opsgenie#21)
asqui added a commit to BATS/opsgenie-python-sdk that referenced this issue Nov 6, 2019
This is needed because the retry logic in SDK doesn't work properly :-(
(see opsgenie#21)
@mhamzak008 mhamzak008 added the docs enhancement should update the documentation to better explain this label Nov 6, 2019
@mhamzak008
Copy link
Contributor

mhamzak008 commented Nov 6, 2019

Hi, thanks a bunch for pointing out all of this and sorry for the late response.

I have changed the explanations of the retry_count and retry_delay in the documentation to better align with what they actually do.

The bug which was preventing the disablement of retrying will be fixed, by implementing a custom callback function for deciding whether the retry should stop or not, when #29 is released.
It looks like that Tenacity doesn't support and or & operator in the retry keyword argument because when we changed to & from and, the retrying was stopped but wasn't working with it enabled.

I will remove the unused retries parameter in the Configuration object in 17f99e2

I agree that the back_off parameter of the Configuration object should be renamed to retry_back_off, but doing so might introduce some breaking changes for some customers. So, I will be postponing it to the next major release perhaps.

And I am looking into why the metrics are not being published when pool_manager.request() raises an exception at the moment

@mhamzak008
Copy link
Contributor

mhamzak008 commented Nov 6, 2019

Coming back to metrics' publishing, http metrics are published only after an http request is completed. However, to make debugging easier, I have added a change to publish sdk metrics whenever an exception is thrown while attempting to call the api in b382348

@zfr
Copy link
Contributor

zfr commented Jan 24, 2020

I'm closing the ticket now. Thanks for the collaborative work guys. If we need an another improvement later, let's create a new issue then.

@zfr zfr closed this as completed Jan 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug docs enhancement should update the documentation to better explain this
Projects
None yet
Development

No branches or pull requests

3 participants