fix: respect service's suggested retryAfter when throttled #39
Merged
Conversation
ddneilson commented on Sep 26, 2023
ddneilson force-pushed the ddneilson/13268 branch 2 times, most recently from e3abf72 to e4a8739 on September 27, 2023 at 18:56
jericht approved these changes on Sep 27, 2023
jusiskin reviewed on Sep 27, 2023
The changes look good, but my primary concern is how this change would affect the load on the service. We did a prior analysis in #20 of the backoff algorithm being used, and this change might alter that outcome.
ddneilson force-pushed the ddneilson/13268 branch from e4a8739 to eff5b41 on October 18, 2023 at 17:53
When calling a Deadline Cloud service API and getting a throttle/retry response, the exception object may contain a "retryAfterSeconds" field alongside the error. When that field is present, the calling client should treat it as a request to retry no sooner than the given number of seconds; it is a load-shedding mechanism for the service. We should respect the service's request.

Solution: Added logic to all of the deadline-cloud API wrappers to extract the value of the "retryAfterSeconds" field if it's present, and pass it to our backoff-delay calculator. We use the value as a lower limit on the returned delay.

I also made the scheduler use the API wrapper for update_worker; it still had its own implementation that didn't properly handle exceptions. This necessitated adding the ability to interrupt update_worker's throttled retries to preserve the functionality at that call site.

Signed-off-by: Daniel Neilson <[email protected]>
ddneilson force-pushed the ddneilson/13268 branch from eff5b41 to 92087e5 on October 18, 2023 at 18:00
jericht approved these changes on Oct 18, 2023
jusiskin approved these changes on Oct 18, 2023
gmchale79 pushed a commit that referenced this pull request on Oct 31, 2023
gmchale79 pushed a commit that referenced this pull request on Nov 2, 2023
gahyusuh pushed a commit that referenced this pull request on Nov 6, 2023
gmchale79 pushed a commit that referenced this pull request on Feb 12, 2024
gmchale79 pushed a commit that referenced this pull request on Mar 11, 2024
What was the problem/requirement? (What/Why)
When calling a Deadline Cloud service API and getting a throttle/retry response, the exception object may contain a "retryAfterSeconds" field alongside the error. When that field is present, the calling client should treat it as a request to retry no sooner than the given number of seconds; it is a load-shedding mechanism for the service. We should respect the service's request.
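For illustration, here is a minimal sketch of where such a field could be read from a boto3 `ClientError`. The error-code set and the assumption that the field surfaces at the top level of `exc.response` are mine, not taken from the agent's code:

```python
from typing import Optional

from botocore.exceptions import ClientError

# Error codes commonly used for throttling; illustrative, not exhaustive.
THROTTLE_ERROR_CODES = {"ThrottlingException", "TooManyRequestsException"}


def get_retry_after_seconds(exc: ClientError) -> Optional[float]:
    """Return the service-suggested delay if the throttled response included one."""
    if exc.response.get("Error", {}).get("Code") not in THROTTLE_ERROR_CODES:
        return None
    retry_after = exc.response.get("retryAfterSeconds")
    return float(retry_after) if retry_after is not None else None
```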
What was the solution? (How)
Added logic to all of the deadline-cloud API wrappers to extract the value of the "retryAfterSeconds" field if it's present, and pass it to our backoff-delay calculator. We use the value as a lower limit on the returned delay.
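A minimal sketch of that lower-limit behaviour; the function name, constants, and full-jitter strategy here are illustrative assumptions rather than the agent's actual backoff implementation:

```python
import random
from typing import Optional

BASE_DELAY_S = 0.5   # illustrative constants, not the agent's actual tuning
MAX_DELAY_S = 60.0


def backoff_delay(attempt: int, retry_after: Optional[float] = None) -> float:
    """Delay before retry number `attempt` (0-based), floored at `retry_after`."""
    delay = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    delay = random.uniform(0.0, delay)  # full jitter
    if retry_after is not None:
        # Respect the service's load-shedding request: never retry sooner.
        delay = max(delay, retry_after)
    return delay
```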
I also made the scheduler use the API wrapper for update_worker; it still had its own implementation that didn't properly handle exceptions. This necessitated adding the ability to interrupt update_worker's throttled retries to preserve the functionality at that call site.
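As a rough sketch of what an interruptible throttled-retry loop can look like, assuming a `threading.Event` is used as the interrupt signal (the exception type and helper names are placeholders, not the agent's real API wrapper):

```python
import random
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class ThrottleError(Exception):
    """Placeholder exception carrying an optional service-suggested delay."""

    def __init__(self, retry_after_seconds: Optional[float] = None) -> None:
        super().__init__("throttled")
        self.retry_after_seconds = retry_after_seconds


def call_with_retries(
    api_call: Callable[[], T],
    shutdown: threading.Event,
    max_attempts: int = 10,
) -> T:
    for attempt in range(max_attempts):
        try:
            return api_call()
        except ThrottleError as exc:
            # Exponential backoff with full jitter, floored at the service's
            # suggested delay (same idea as the sketch above).
            delay = min(60.0, 0.5 * (2 ** attempt))
            delay = max(random.uniform(0.0, delay), exc.retry_after_seconds or 0.0)
            # Event.wait() returns True as soon as the event is set, so a
            # shutdown can interrupt the sleep rather than block the scheduler.
            if shutdown.wait(timeout=delay):
                raise InterruptedError("retry loop interrupted by shutdown") from exc
    raise TimeoutError("exhausted retries while throttled")
```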
What is the impact of this change?
The worker agent will be a kinder and more cooperative client of the Deadline Cloud service.
How was this change tested?
This part of the code is well unit tested, so I just ensured that the unit tests cover the new functionality.
Was this change documented?
N/A
Is this a breaking change?
No