upstream: Implement retry concurrency budgets #9069
Conversation
Looks pretty good, just a couple comments
```cpp
// If a retry budget was configured, we cannot exceed the configured percentage of total
// outstanding requests/connections.
const uint64_t current_active = cluster_.resourceManager(priority_).connections().count() +
```
This seems a bit iffy to me: today this works because H/1.1 and H/2 each make use of either max_connections or max_requests, but there are requested features that might introduce max_connections for H/2 as well (#7403). Envoy also doesn't reclaim H/1.1 connections until it needs to, so I think using max_connections to get a sense of the current concurrency isn't perfect.
I think ideally we'd count the number of requests for HTTP/1.1 as well and just do max_requests + max_pending, but I don't think we can add that enforcement safely at this point (since users might have bumped max_connections but not max_requests). Maybe just note this limitation in the docs for now?
Noted in the docs. Let me know if the latest patch communicates this well enough.
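For context, here is a minimal sketch of the concurrency estimate under discussion; the helper and its signature are hypothetical, since the quoted snippet above is truncated.

```cpp
#include <cstdint>

// Hypothetical sketch: estimate current concurrency by summing active
// connections (a stand-in for in-flight HTTP/1.1 requests, since each one
// holds its own connection) and active requests (the meaningful measure
// for HTTP/2). As noted above, idle H/1.1 connections are not reclaimed
// eagerly, so the connection count can overstate true concurrency.
uint64_t estimateCurrentActive(uint64_t active_connections, uint64_t active_requests) {
  return active_connections + active_requests;
}
```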
Apologies, but can you merge master and format one more time? It should fix all the CI/random format issues.
/wait
Sorry, needs another master merge.
/wait
I’ll just wait until we’re all back in the office to avoid further conflicts.
LGTM modulo merge issue. Great work!
/wait
Thanks!
```diff
@@ -314,7 +315,8 @@ TEST_P(ClusterMemoryTestRunner, MemoryLargeClusterSizeWithRealSymbolTable) {
 // 2019/11/01 8859 35221 36000 build: switch to libc++ by default
 // 2019/11/15 9040 35029 35500 build: update protobuf to 3.10.1
 // 2019/11/15 9040 35061 35500 upstream: track whether cluster is local
-// 2019/12/10 8779 35053 35000 use var-length coding for name lengths
+// 2019/12/20 8779 35053 35000 use var-length coding for name lengths
```
This is still a merge issue. To avoid going back and forth on this again, can you fix it in your next change?
Since envoyproxy#9069, macOS builds have been failing because they use slightly more memory for stats than the new limit. Bump the limit to the next even multiple of 1000.
Risk Level: low, test change only
Docs Changes: n/a
Release Notes: n/a
Signed-off-by: Stephan Zuercher <[email protected]>
This patch implements RetryBudget, a new configuration option in the retry policy that limits the allowed number of outstanding retries. The budget is specified as a percentage of the current outstanding requests/connections. Each retry budget must also specify a minimum retry concurrency, so that retries remain possible when overall concurrency is low. This configuration is optional and does not change existing behavior unless configured. If configured, retry budgets override any max_retries circuit breaker configuration.

For example, a budget of 20% with a minimum retry concurrency of 3 allows 5 active retries while there are 25 active requests. If there are only 2 active requests, 3 active retries are still allowed because of the minimum retry concurrency.
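A minimal C++ sketch of the budget arithmetic described above (the function name and signature are illustrative, not the actual Envoy API):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative sketch: the number of concurrent retries allowed is the
// configured percentage of current outstanding requests/connections,
// floored at the configured minimum retry concurrency.
uint64_t maxConcurrentRetries(uint64_t current_active, double budget_percent,
                              uint64_t min_retry_concurrency) {
  const auto budgeted = static_cast<uint64_t>(current_active * budget_percent / 100.0);
  return std::max(budgeted, min_retry_concurrency);
}

// Matching the worked example above, with a 20% budget and a minimum of 3:
//   maxConcurrentRetries(25, 20.0, 3) == 5  (20% of 25)
//   maxConcurrentRetries(2, 20.0, 3) == 3   (floored at the minimum)
```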
This approach to limiting retries and mitigating retry storms is useful because it lets one reason about the worst-case increase in traffic due to retries. Retry circuit breakers, by contrast, are fixed values and can limit retries even when the mesh could handle the increased volume. This is especially beneficial for high-RPS services using adaptive concurrency.
Fixes: #8727
Risk Level: Low
Testing: Unit tests
Docs Changes: Stats and proto.