-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: support retry on in BackendTrafficPolicy #2168
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2168 +/- ##
=======================================
Coverage 64.71% 64.71%
=======================================
Files 115 115
Lines 17431 17431
=======================================
Hits 11281 11281
Misses 5433 5433
Partials 717 717 ☔ View full report in Codecov by Sentry. |
@kflynn and @AliceProxy, can you help review this one |
@arkodg @tmsnan Sorry for the delay here! and many thanks for digging into this, I appreciate that. 🙂 That said, I'd like to formalize something I've said on many other occasions: Flynn's First Law of Envoy: Having end users interact directly with Envoy configuration concepts is almost always a huge mistake. 🙂 In Linkerd, the basic retry configuration is one line: My strong preference here is that the base case should be a single line of configuration: to get sane retries with good defaults, the user should need only to add It's certainly possible that I missed something here, and if so I apologize! 🙂 But in Envoy Gateway's name, the "Gateway" word is the really important bit, right? 🙂 |
|
@tmsnan, so sorry for the delay (again)! In practice, we haven't really found that everything needs support, and many things just don't get used all that often. Here's an alternative that deliberately tries to minimize the amount of indentation and configuration needed for the things I think are very common: Suggestion---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: example-policy
spec:
...
isRetryable: true
retryStrategy:
maxBudget: 20 # percentage, default 20%
maxRetries: 0 # count, if nonzero, maxBudget is ignored
maxParallel: 3 # Optional
minConcurrent: 3 # Optional
retryOn:
trigger: trigger-enum (see below)
httpStatus: []int # valid when trigger is "retriable-status-code"
perRetry:
timeout: duration # optional
idleTimeout: duration # optional
backoff: # optional
baseInterval: duration # required
maxInterval: duration # optional
# we can add rate limited based backoff config here if we want to
trigger-enum: one of
5xx # HTTP events
gateway-error
disconnect-reset
connect-fail
retriable-4xx
refused-stream
retriable-status-codes
cancelled # GRPC events
deadline-exceeded
internal
resource-exhausted
unavailable The idea here is that the proxy should know whether it's speaking HTTP or gRPC to a given backend. There's no point in configuring HTTP retries for a gRPC backend, and minimal config shouldn't require rewriting the retry config if you change types. (Obviously if you change protocols when you have configured a trigger, you'd need to change that.) ExamplesThese are not meant to be exhaustive; they're meant to show specific cases that might come up. Minimal config---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: example-policy
spec:
isRetryable: true Just give me retries and don't make me think about it. This will enable budgeted retries with a 20% retry budget. (Why budgeted and not counted? because the user experience is better for really simple configurations.) Minimal counted retries---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: example-policy
spec:
isRetryable: true
retryStrategy:
maxRetries: 1 Retry, but only once. (It might be worth considering whether the presence of Retry only on 499 or 505 (I don't recommend this)---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: BackendTrafficPolicy
metadata:
name: example-policy
spec:
isRetryable: true
retryStrategy:
retryOn:
trigger: retriable-status-codes
httpStatus: [ 499, 505 ] Budgeted retries, but only retry on 499 or 505 status codes. Retry, controlling timing, on gRPC
|
@kflynn, thanks, the API looks much simpler, and there's really no need to distinguish between HTTP and gRPC retry trigger conditions. It's a minor point that we might need to configure multiple conditions in the
I will update the API later. |
01c7a9e
to
24e2f66
Compare
🚀 Deployed on https://gateway-pr-2168-preview--eg-preview.netlify.app |
|
||
// IsRetryAble indicates whether to enable budget retries with a 20% retry budget. | ||
// +optional | ||
IsRetryAble bool `json:"isRetryAble,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer we didn't include that and instead relied only on Retry
// RetryStrategy provides more advanced retry usage, which allows users to customize the retry method (number of retries、retry budget and concurrency max retries), | ||
// retry fallback strategy, and retry triggering conditions | ||
// +optional | ||
RetryStrategy *RetryStrategy `json:"retryStrategy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename to retry
? all fields in here are singular
api/v1alpha1/retrystragegy_types.go
Outdated
// NumRetries is the number of retries to be attempted. Defaults to 0. If nonzero, maxBudget is ignored. | ||
// | ||
// +optional | ||
MaxRetries *int `json:"maxRetries,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this numRetries
or maxRetries
?
api/v1alpha1/retrystragegy_types.go
Outdated
|
||
// RetryOn specifies the retry trigger condition. | ||
// | ||
// +optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add some defaults ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like istio?
retryOn:
trigger: retriable-status-codes,connection-failure, refused-stream, unavailable,cancelled
httpStatus: [ 503 ]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah as well as numRetries
Signed-off-by: nan zhao <[email protected]>
Signed-off-by: zhaonan <[email protected]>
Signed-off-by: nan zhao <[email protected]>
3c9b667
to
1a24ca4
Compare
hey @tmsnan, sorry for the delay in review
it results in
|
Signed-off-by: nan zhao <[email protected]>
Signed-off-by: nan zhao <[email protected]>
done |
} | ||
|
||
// TriggerEnum specifies the conditions that trigger retries. | ||
type TriggerEnum string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs a validation annotation
// | ||
// +optional | ||
// +kubebuilder:validation:Format=duration | ||
IdleTimeout *metav1.Duration `json:"idleTimeout,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also rm IdleTimeout
in the first iteration ?
hey @tmsnan, this API is pretty close to the finish line, can you address the review comments ? tia |
// | ||
// +optional | ||
// +kubebuilder:default=2 | ||
NumRetries *int `json:"numRetries,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe int32 is more explicit. Also, we can validate that the value is >= 0.
// HttpStatusCodes specifies the http status codes to be retried. | ||
// | ||
// +optional | ||
HTTPStatusCodes []int `json:"httpStatusCodes,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe validate status in accepted range 1xx - 5xx
Sorry for the late reply, there have been some family matters recently, and I will handle them soon. |
Closing as this work was merged in #2657 |
api: support retry on in BackendTrafficPolicy
Relates to #1821 & #1907