Fix #887 Enable automatic retry by a handy way #1084

seratch · 2021-08-05T09:51:12Z

Summary

This pull request fixes #887 by adding the new RetryHandler feature to all the API clients (except the legacy ones under the slack package).

With the default settings, the API clients retry once, and only for connectivity issues such as the "Connection reset by peer" error. For the intervals between retries, the built-in retry handlers use exponential backoff with jitter.
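
As a rough illustration of what exponential backoff with jitter means for retry intervals, here is a minimal standalone sketch. The function name `backoff_interval`, its parameters, and the exact formula are assumptions made for illustration; the SDK's built-in interval calculators may compute their values differently.

```python
import random


def backoff_interval(attempt: int, backoff_factor: float = 0.5, max_jitter: float = 1.0) -> float:
    """Return a wait time in seconds for the given zero-based retry attempt.

    Hypothetical helper: the base interval doubles with each attempt
    (exponential backoff), and a random amount in [0, max_jitter) is added
    so that many clients do not retry at the same instant (jitter).
    """
    base = backoff_factor * (2 ** attempt)
    return base + random.random() * max_jitter


for attempt in range(3):
    print(f"attempt {attempt}: wait {backoff_interval(attempt):.2f}s")
```

The jitter term is what keeps a group of clients that failed at the same moment from all retrying at the same moment again.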

To customize the behavior, you can pass your own retry_handlers argument to API client constructors:

from slack_sdk.http_retry.handler import RetryHandler
from slack_sdk.http_retry.builtin_handlers import RateLimitErrorRetryHandler

class MyRetryHandler(RetryHandler):
    def _can_retry(self, *, state, request, response, error) -> bool:
        return response is not None and response.status_code >= 500

my_retry_handler = MyRetryHandler(max_retry_count=2)
ratelimit_retry_handler = RateLimitErrorRetryHandler(max_retry_count=1)

import os
from slack_sdk.web import WebClient

client = WebClient(
    token=os.environ["SLACK_BOT_TOKEN"],
    retry_handlers=[my_retry_handler, ratelimit_retry_handler],
)

If an API client with retry handlers encounters an error, it runs each handler's def can_retry(args) -> bool method. If any of them returns True, the client runs that handler's def prepare_for_next_attempt(args) -> None method to wait for the right timing, and then sends the same API request again. This repeats until the client hits the handler's max_retry_count.
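
The flow described above can be sketched without the SDK. Everything in this block (`SketchRetryState`, `SketchRetryHandler`, `call_with_retries`, `flaky_send`) is a hypothetical stand-in written for illustration, not the actual implementation in this pull request; a real handler would also sleep before the next attempt.

```python
from typing import Callable, List, Optional


class SketchRetryState:
    """Hypothetical per-request state: how many retries have been made so far."""

    def __init__(self) -> None:
        self.current_attempt = 0
        self.next_attempt_requested = False


class SketchRetryHandler:
    """Hypothetical handler, loosely modeled on the description above."""

    def __init__(self, max_retry_count: int = 1) -> None:
        self.max_retry_count = max_retry_count

    def can_retry(self, state: SketchRetryState, error: Optional[Exception]) -> bool:
        # Stop once the retry budget is used up; otherwise retry on connection errors.
        if state.current_attempt >= self.max_retry_count:
            return False
        return isinstance(error, ConnectionError)

    def prepare_for_next_attempt(self, state: SketchRetryState) -> None:
        # A real handler would wait here (backoff and jitter); the sketch only counts.
        state.next_attempt_requested = True
        state.current_attempt += 1


def call_with_retries(send: Callable[[], str], handlers: List[SketchRetryHandler]) -> str:
    state = SketchRetryState()
    while True:
        state.next_attempt_requested = False
        try:
            return send()
        except Exception as error:
            for handler in handlers:
                if handler.can_retry(state, error):
                    handler.prepare_for_next_attempt(state)
                    break
            if not state.next_attempt_requested:
                raise  # no handler asked for another attempt


calls = {"count": 0}


def flaky_send() -> str:
    # Fails twice with a connection error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError("Connection reset by peer")
    return "ok"


result = call_with_retries(flaky_send, [SketchRetryHandler(max_retry_count=2)])
print(result, calls["count"])
```

With max_retry_count=1 the same `flaky_send` would raise on its second failure, because no handler would request a further attempt.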

In this pull request, I've updated the following API clients:

slack_sdk.web.WebClient
slack_sdk.webhook.WebhookClient
slack_sdk.audit_logs.AuditLogsClient
slack_sdk.scim.SCIMClient
slack_sdk.web.async_client.AsyncWebClient (aiohttp/asyncio compatible)
slack_sdk.webhook.async_client.AsyncWebhookClient (aiohttp/asyncio compatible)
slack_sdk.audit_logs.async_client.AsyncAuditLogsClient (aiohttp/asyncio compatible)
slack_sdk.scim.async_client.AsyncSCIMClient (aiohttp/asyncio compatible)

You can reuse retry handlers across the above API clients:

from slack_sdk.scim import SCIMClient

client = SCIMClient(
    token=os.environ["SLACK_ADMIN_TOKEN"],
    retry_handlers=[my_retry_handler],
)

from slack_sdk.audit_logs.async_client import AsyncAuditLogsClient
from slack_sdk.http_retry.builtin_async_handlers import AsyncConnectionErrorRetryHandler
from slack_sdk.http_retry.builtin_interval_calculators import BackoffRetryIntervalCalculator
from slack_sdk.http_retry.jitter import RandomJitter

client = AsyncAuditLogsClient(
    token=os.environ["SLACK_ADMIN_TOKEN"],
    retry_handlers=[AsyncConnectionErrorRetryHandler(
        max_retry_count=2,
        interval_calculator=BackoffRetryIntervalCalculator(
            backoff_factor=0.2,
            jitter=RandomJitter(),
        )
    )],
)

TODOs

Implement the features
Add new unit tests for the changes
Run all the integration tests to verify that there is no regression
Update the document to cover how to customize retry handlers (in a different PR; we'll merge it after releasing v3.9)

Category (place an `x` in each of the `[ ]`)

Requirements (place an `x` in each `[ ]`)

I've read and understood the Contributing Guidelines and have done my best effort to follow them.
I've read and agree to the Code of Conduct.
I've run python3 -m venv .venv && source .venv/bin/activate && ./scripts/run_validation.sh after making the changes.

seratch · 2021-08-05T09:51:53Z

slack_sdk/audit_logs/v1/async_client.py

+                retry_response: Optional[RetryHttpResponse] = None
+                response_body = ""
+
+                if self.logger.level <= logging.DEBUG:


Moved this debug logging here so that it prints every time the client does a retry.

seratch · 2021-08-05T09:53:23Z

slack_sdk/audit_logs/v1/async_client.py

@@ -49,6 +55,7 @@ def __init__(
        user_agent_prefix: Optional[str] = None,
        user_agent_suffix: Optional[str] = None,
        logger: Optional[logging.Logger] = None,
+        retry_handlers: List[RetryHandler] = async_default_handlers,


The default one consists of only the (Async)ConnectionErrorRetryHandler instance with its defaults.

seratch · 2021-08-05T09:54:37Z

slack_sdk/http_retry/handler.py

+default_interval_calculator = BackoffRetryIntervalCalculator()
+
+
+class RetryHandler:


This class is the main interface in this pull request.

seratch · 2021-08-05T09:56:16Z

tests/slack_sdk/my_retry_handler.py

+from slack_sdk.http_retry.handler import RetryHandler, default_interval_calculator
+
+
+class MyRetryHandler(RetryHandler):


Custom retry handler for testing

seratch · 2021-08-05T09:59:19Z

tests/web/mock_web_api_server.py

@@ -111,7 +111,7 @@ def _handle(self):
                    return
                if pattern == "rate_limited":
                    self.send_response(429)
-                    self.send_header("Retry-After", 30)
+                    self.send_header("Retry-After", 1)


Changed the value for faster test execution (we really don't want to wait for 30 seconds).

codecov · 2021-08-05T11:30:36Z

Codecov Report

Merging #1084 (2bf87c4) into main (8a0c802) will increase coverage by 0.46%.
The diff coverage is 90.58%.

@@            Coverage Diff             @@
##             main    #1084      +/-   ##
==========================================
+ Coverage   85.62%   86.09%   +0.46%     
==========================================
  Files          99      110      +11     
  Lines        9324     9847     +523     
==========================================
+ Hits         7984     8478     +494     
- Misses       1340     1369      +29

Impacted Files	Coverage Δ
slack_sdk/http_retry/interval_calculator.py	`66.66% <66.66%> (ø)`
slack_sdk/web/async_internal_utils.py	`81.81% <80.95%> (+2.78%)`	⬆️
slack_sdk/audit_logs/v1/async_client.py	`89.16% <85.18%> (+0.41%)`	⬆️
slack_sdk/http_retry/jitter.py	`85.71% <85.71%> (ø)`
slack_sdk/web/base_client.py	`89.55% <87.32%> (+0.45%)`	⬆️
slack_sdk/audit_logs/v1/client.py	`91.20% <88.57%> (+1.66%)`	⬆️
slack_sdk/webhook/async_client.py	`92.23% <90.00%> (-1.42%)`	⬇️
slack_sdk/scim/v1/async_client.py	`94.20% <90.19%> (-1.72%)`	⬇️
slack_sdk/scim/v1/client.py	`93.75% <90.27%> (+3.27%)`	⬆️
slack_sdk/http_retry/builtin_handlers.py	`92.10% <92.10%> (ø)`
... and 25 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

seratch · 2021-08-05T12:51:12Z

Applied the following changes:

Renamed RetryHandler#can_retry_custom(...) to RetryHandler#_can_retry(...)
Renamed RetryHandler#prepare_for_next_retry(...) to RetryHandler#prepare_for_next_attempt(...)

filmaj · 2021-08-06T13:02:54Z

I plan on reviewing today - it is a big PR so I did not end up having time yesterday.

seratch · 2021-08-06T13:07:28Z

@filmaj Thanks! No rush at all. I know this includes so many changes. I am thinking that this pull request can add more unit tests covering rate limited error patterns for safety.

seratch

Added more comments for reviewers

seratch · 2021-08-06T13:09:55Z

slack_sdk/audit_logs/v1/client.py

@@ -190,7 +199,7 @@ def api_call(
        return self._perform_http_request(
            http_verb=http_verb,
            url=url,
-            body_params=body_params,
+            body=body_params,


As _perform_http_request is an internal method, we can safely rename this argument.

seratch · 2021-08-06T13:11:06Z

slack_sdk/audit_logs/v1/async_client.py

+            counter_for_safety = 0
+            while counter_for_safety < 100:


We may want to remove this counter for simplicity. while True should be safe enough here, as retry_state.next_attempt_requested is usually False.

seratch · 2021-08-06T13:11:54Z

slack_sdk/http_retry/builtin_async_handlers.py

+        error_types: List[Exception] = [
+            ServerConnectionError,
+            ServerDisconnectedError,
+            # ClientOSError: [Errno 104] Connection reset by peer
+            ClientOSError,
+        ],


These are aiohttp specific exceptions

seratch · 2021-08-06T13:12:39Z

slack_sdk/http_retry/builtin_handlers.py

+        return False
+
+
+class ServerErrorRetryHandler(RetryHandler):


I've added this one as a reference implementation but it's unused. We may want to remove this for now.

I would suggest removing it. I am not sure it is a good practice to blindly retry if a request yields an HTTP 500 response; I think it could lead to undesirable network saturation in certain cases like a legitimate outage on Slack's side.

Yes, this is fair enough 👍

I would suggest removing it. I am not sure it is a good practice to blindly retry if a request yields an HTTP 500 response; I think it could lead to undesirable network saturation in certain cases like a legitimate outage on Slack's side.

Sorry to necro, but I think it's worth reconsidering this decision. While it may be undesirable from Slack's perspective to exponentially backoff when their API is returning 5xxs, I think this is what the SDK consumers will want. And I think it makes sense to do what the consumer wants here, because if we don't, the consumer is just going to implement their own exponential backoff logic that includes 5xxs. This is my plan anyway.

Thanks for your work here!

This feels like a good time to mention the Circuit Breaker pattern.
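
For readers unfamiliar with the Circuit Breaker pattern mentioned above, here is a minimal sketch. The `CircuitBreaker` class, its method names, and its thresholds are hypothetical and not part of the SDK; it only illustrates the idea of stopping requests for a while after repeated failures such as 5xx responses.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Hypothetical minimal circuit breaker (not part of the SDK).

    After `failure_threshold` consecutive failures the circuit opens and
    requests are rejected until `reset_timeout` seconds have passed.
    """

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failure_count = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True
        # Once the timeout has elapsed, let a trial request through.
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failure_count = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failure_count += 1
        if self.failure_count >= self.failure_threshold:
            self.opened_at = time.monotonic()


breaker = CircuitBreaker(failure_threshold=2, reset_timeout=60.0)
for status in (500, 503):  # pretend the API returned these status codes
    if breaker.allow_request() and status >= 500:
        breaker.record_failure()
print(breaker.allow_request())
```

A custom retry handler could consult such an object before asking for another attempt, which would address the network saturation concern while still retrying transient 5xx errors.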

seratch · 2021-08-06T13:13:55Z

slack_sdk/http_retry/builtin_handlers.py

+            duration += random.random()
+        else:
+            duration = (
+                int(response.headers.get(retry_after_header_name)[0]) + random.random()


The random.random() call adds a random jitter, but it might not be necessary here, since this is not a backoff.
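
The wait computation in the diff above can be sketched in isolation as follows. The function name `retry_after_seconds`, the `default` fallback, and the `add_jitter` switch are assumptions for illustration; the headers are assumed to be a mapping from lower-cased names to lists of values, mirroring the `[0]` indexing in the diff. Only the delay-in-seconds form of Retry-After is handled here.

```python
import random
from typing import Dict, List


def retry_after_seconds(
    headers: Dict[str, List[str]],
    default: float = 1.0,
    add_jitter: bool = False,
) -> float:
    """Return how long to wait before retrying a rate-limited request.

    Hypothetical helper: `default` is used when the Retry-After header is
    missing or is not an integer number of seconds.
    """
    values = headers.get("retry-after") or []
    try:
        duration = float(int(values[0]))
    except (IndexError, ValueError):
        duration = default
    if add_jitter:
        duration += random.random()  # up to one extra second
    return duration


print(retry_after_seconds({"retry-after": ["30"]}))
print(retry_after_seconds({}))
```

Making the jitter optional reflects the point above: the server already states how long to wait, so the random addition is a design choice rather than a requirement.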

seratch · 2021-08-06T13:14:45Z

slack_sdk/http_retry/builtin_interval_calculators.py

+from .interval_calculator import RetryIntervalCalculator
+
+
+class FixedValueRetryIntervalCalculator(RetryIntervalCalculator):


Just as a reference. Unused with the default settings. I should add some tests for this.

If it's unused, can probably not worry about the tests. Unless you want to keep the coverage scores high 😆

Haha, yeah, I always like better coverage!

filmaj

Wow, lots of work and you did a great job! Thanks for involving me in the review.

I left a few comments, mostly for my own learning and education.

slack_sdk/http_retry/handler.py

filmaj · 2021-08-06T17:49:56Z

slack_sdk/http_retry/builtin_handlers.py

+        return False
+
+
+class ServerErrorRetryHandler(RetryHandler):


I would suggest removing it. I am not sure it is a good practice to blindly retry if a request yields an HTTP 500 response; I think it could lead to undesirable network saturation in certain cases like a legitimate outage on Slack's side.

filmaj · 2021-08-06T17:55:11Z

slack_sdk/http_retry/builtin_handlers.py

+            duration = (
+                int(response.headers.get(retry_after_header_name)[0]) + random.random()
+            )
+        time.sleep(duration)


Theoretically, using the synchronous client, if the API responds with a relatively large value in the Retry-After header (e.g. the docs for this header show an example value of 30) - would this freeze the entire process?

would this freeze the entire process?

For the calling thread, yes. For the app as a whole, it depends on how the app is implemented. In the case of Bolt for Python, all the code except ack() is executed in a background thread, so it does not result in the 3-second timeout.

By default, we don't enable rate limited error retries. Developers should turn them on only with a clear understanding of the potentially long pause.

filmaj · 2021-08-06T18:06:18Z

slack_sdk/http_retry/builtin_async_handlers.py

+from slack_sdk.http_retry.handler import RetryHandler, default_interval_calculator
+
+
+class AsyncConnectionErrorRetryHandler(RetryHandler):


Since this async implementation relies on the same base class that is shared with the sync implementation, and the base RetryHandler class's prepare_for_next_attempt uses Python's built-in sleep method, could this lead to a situation where we block the process even when using an async handler?

I am not very familiar with aiohttp, but it seems like it is based on the asyncio library which has its own async-friendly sleep implementation (or, at least, this aiohttp document page implies that such an async sleep exists - search for asyncio on this page for the relevant section).

I am posing this question from a place of ignorance and a desire to learn so it is likely I am completely off. But asking dumb questions is helpful for me to learn 🤪

@filmaj Ah, this is a great point! Yes, we should use asyncio.sleep here instead, and I was aware of it, but somehow I forgot to override the method. We can have a separate base RetryHandler class that uses asyncio's sleep method, with all of its methods being async. I will update this part shortly.
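
To make the difference concrete, here is a small standalone sketch using only the standard library. The coroutine names `wait_before_retry` and `main` are hypothetical; the point is only that awaiting asyncio.sleep suspends one coroutine while the event loop keeps running the others, whereas a blocking sleep at the same place would stall the whole loop.

```python
import asyncio
import time


async def wait_before_retry(seconds: float) -> None:
    # asyncio.sleep suspends only this coroutine; the event loop keeps running
    # other tasks. A blocking time.sleep here would stall the whole loop instead.
    await asyncio.sleep(seconds)


async def main() -> float:
    started = time.monotonic()
    # Two "retry waits" of 0.2 seconds each run concurrently.
    await asyncio.gather(wait_before_retry(0.2), wait_before_retry(0.2))
    return time.monotonic() - started


elapsed = asyncio.run(main())
print(f"elapsed: {elapsed:.2f}s")
```

The two waits overlap, so the total is roughly the length of one wait rather than the sum of both.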

filmaj · 2021-08-06T18:13:21Z

tests/slack_sdk/web/mock_web_api_server.py

+            header = self.headers["Authorization"]
+            if header is not None and "xoxp-" in header:
+                pattern = str(header).split("xoxp-", 1)[1]
+                if "remote_disconnected" in pattern:


Very nice pattern, I like this a lot!

seratch

@filmaj Thanks for your review! I will update some parts before merging this.

seratch · 2021-08-06T22:02:11Z

slack_sdk/http_retry/builtin_async_handlers.py

+from slack_sdk.http_retry.handler import RetryHandler, default_interval_calculator
+
+
+class AsyncConnectionErrorRetryHandler(RetryHandler):


@filmaj Ah, this is a great point! Yes, we should use asyncio.sleep here instead, and I was aware of it, but somehow I forgot to override the method. We can have a separate base RetryHandler class that uses asyncio's sleep method, with all of its methods being async. I will update this part shortly.

seratch · 2021-08-06T22:02:57Z

slack_sdk/http_retry/builtin_handlers.py

+        return False
+
+
+class ServerErrorRetryHandler(RetryHandler):


Yes, this is fair enough 👍

seratch · 2021-08-06T22:06:36Z

slack_sdk/http_retry/builtin_handlers.py

+            duration = (
+                int(response.headers.get(retry_after_header_name)[0]) + random.random()
+            )
+        time.sleep(duration)


would this freeze the entire process?

For the calling thread, yes. For the app as a whole, it depends on how the app is implemented. In the case of Bolt for Python, all the code except ack() is executed in a background thread, so it does not result in the 3-second timeout.

By default, we don't enable rate limited error retries. Developers should turn them on only with a clear understanding of the potentially long pause.

seratch · 2021-08-06T22:07:12Z

slack_sdk/http_retry/builtin_interval_calculators.py

+from .interval_calculator import RetryIntervalCalculator
+
+
+class FixedValueRetryIntervalCalculator(RetryIntervalCalculator):


Haha, yeah, I always like better coverage!

slack_sdk/http_retry/handler.py

seratch · 2021-08-07T02:34:10Z

Fixed all the issues in the latest revision. Let me merge this PR now. I will release an RC version to get feedback from the community.

seratch added enhancement M-T: A feature request for new functionality web-client Version: 3x scim-client audit-logs-client labels Aug 5, 2021

seratch added this to the 3.9.0 milestone Aug 5, 2021

seratch commented Aug 5, 2021

seratch requested review from filmaj, misscoded, mwbrooks, srajiang and stevengill August 5, 2021 10:01

seratch force-pushed the issue-887-retry-handlers branch from 31d7027 to f9385f7 Compare August 5, 2021 11:10

seratch force-pushed the issue-887-retry-handlers branch from 5e418a6 to 84b12e1 Compare August 5, 2021 12:56

seratch added 6 commits August 6, 2021 12:40

Fix slackapi#887 Enable automatic retry by a handy way

6207896

Fix a type hint error

d7eb348

Update the tests to be compatible with 3.x versions

b23c21f

Rename can_retry_custom to _can_retry

0842173

Rename RetryHandler#prepare_for_next_retry() to prepare_for_next_atte…

ac96026

…mpt()

Add urllib.error.URLError to ConnectionErrorRetryHandler

85eab23

seratch force-pushed the issue-887-retry-handlers branch from 84b12e1 to 85eab23 Compare August 6, 2021 04:22

seratch commented Aug 6, 2021

filmaj approved these changes Aug 6, 2021

seratch commented Aug 6, 2021

seratch added 2 commits August 7, 2021 11:22

Add async retry handler; removed server error handler; more tests for…

82182e2

… ratelimit errors

Remove nonexistent method access

2bf87c4

seratch merged commit c6efe45 into slackapi:main Aug 7, 2021

seratch deleted the issue-887-retry-handlers branch August 7, 2021 02:34

seratch added a commit to seratch/python-slack-sdk that referenced this pull request Aug 13, 2021

Improve slackapi#1084 by removing globally mutable lists

fa931f6

seratch added a commit to seratch/python-slack-sdk that referenced this pull request Aug 13, 2021

Improve slackapi#1084 by removing globally mutable lists

271265b

seratch mentioned this pull request Aug 13, 2021

Improve #1084 by removing globally mutable lists #1092

Merged

16 tasks

seratch added a commit that referenced this pull request Aug 13, 2021

Improve #1084 by removing globally mutable lists (#1092)

2b30100

seratch added a commit to seratch/python-slack-sdk that referenced this pull request Aug 13, 2021

Improve slackapi#1092 slackapi#1084 by removing mutable method defaul…

99be36c

…t arguments

seratch mentioned this pull request Aug 13, 2021

Improve #1092 #1084 by removing mutable method default arguments #1093

Merged

16 tasks

seratch added a commit that referenced this pull request Aug 13, 2021

Improve #1092 #1084 by removing mutable method default arguments (#1093)

c07dc7b

seratch added a commit to seratch/python-slack-sdk that referenced this pull request Aug 17, 2021

Improve slackapi#1084 to run rate limited error retry handler correctly

58b34ef

seratch mentioned this pull request Aug 17, 2021

Improve #1084 to run rate limited error retry handler correctly #1094

Merged

16 tasks

seratch added a commit that referenced this pull request Aug 17, 2021

Improve #1084 to run rate limited error retry handler correctly (#1094)

9cab809

cj81499 mentioned this pull request Jan 26, 2023

Handling Rate Limiting With v2 Commands #436

Closed

2 tasks

tgiardina commented Jun 5, 2024
