api: support retry on in BackendTrafficPolicy #2168

Closed
wants to merge 5 commits into from
5 changes: 5 additions & 0 deletions api/v1alpha1/backendtrafficpolicy_types.go
@@ -82,6 +82,11 @@ type BackendTrafficPolicySpec struct {
	// +optional
	CircuitBreaker *CircuitBreaker `json:"circuitBreaker,omitempty"`

	// Retry provides more advanced usage, allowing users to customize the number of retries, retry backoff strategy, and retry triggering conditions.
	// If not set, retry will be disabled.
	// +optional
	Retry *Retry `json:"retry,omitempty"`

	// Timeout settings for the backend connections.
	//
	// +optional
114 changes: 114 additions & 0 deletions api/v1alpha1/retry_types.go
@@ -0,0 +1,114 @@
// Copyright Envoy Gateway Authors
// SPDX-License-Identifier: Apache-2.0
// The full text of the Apache license is available in the LICENSE file at
// the root of the repo.

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Retry defines the retry strategy to be applied.
type Retry struct {
	// NumRetries is the number of retries to be attempted. Defaults to 2.
	//
	// +optional
	// +kubebuilder:default=2
	NumRetries *int `json:"numRetries,omitempty"`
Review comment (Contributor):
maybe int32 is more explicit. Also, we can validate that the value is >= 0.


	// RetryOn specifies the retry trigger condition.
	//
	// If not specified, the default is to retry on connect-failure, refused-stream, unavailable, cancelled, and retriable-status-codes (503).
	// +optional
	RetryOn *RetryOn `json:"retryOn,omitempty"`

	// PerRetry is the retry policy to be applied per retry attempt.
	//
	// +optional
	PerRetry *PerRetryPolicy `json:"perRetry,omitempty"`
}
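
The defaulting described in the comments above (a nil Retry disables retries, an unset NumRetries means 2) and the reviewer's suggestion to use int32 and reject negative values could be sketched as follows. This is only an illustration under assumptions: the trimmed `Retry` mirror type, the `resolveNumRetries` helper, and the `defaultNumRetries` constant are hypothetical names that do not appear in the PR. In the CRD itself the same bound would more likely be expressed with the controller-gen marker `+kubebuilder:validation:Minimum=0`.

```go
package main

import (
	"errors"
	"fmt"
)

// Retry is a trimmed, hypothetical mirror of the Retry type in this PR.
// It uses *int32 instead of *int, as the reviewer suggests.
type Retry struct {
	NumRetries *int32
}

// defaultNumRetries mirrors the "+kubebuilder:default=2" marker (assumed name).
const defaultNumRetries int32 = 2

// resolveNumRetries reports how many retries to attempt and whether retries
// are enabled at all. A nil policy means retries are disabled.
func resolveNumRetries(r *Retry) (int32, bool, error) {
	if r == nil {
		return 0, false, nil
	}
	if r.NumRetries == nil {
		return defaultNumRetries, true, nil
	}
	if *r.NumRetries < 0 {
		return 0, false, errors.New("numRetries must be >= 0")
	}
	return *r.NumRetries, true, nil
}

func main() {
	three := int32(3)
	for _, r := range []*Retry{nil, {}, {NumRetries: &three}} {
		n, enabled, err := resolveNumRetries(r)
		fmt.Println(n, enabled, err)
	}
}
```

Returning the enabled flag separately keeps "retries disabled" distinct from "zero retries requested", which the API as written allows.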

// RetryOn defines the conditions that trigger a retry.
type RetryOn struct {
	// Triggers specifies the retry trigger conditions (HTTP/gRPC).
	//
	// +optional
	Triggers []TriggerEnum `json:"triggers,omitempty"`

	// HTTPStatusCodes specifies the HTTP status codes to be retried.
	//
	// +optional
	HTTPStatusCodes []int `json:"httpStatusCodes,omitempty"`
Review comment (Contributor):
maybe validate status in accepted range 1xx - 5xx

}
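
The range check the reviewer asks for on HTTPStatusCodes might look like the sketch below. The helper name `validateHTTPStatusCodes` is hypothetical and not part of the PR, and the accepted range of 100 to 599 is an assumption taken from the "1xx - 5xx" wording in the review comment.

```go
package main

import "fmt"

// validateHTTPStatusCodes rejects any code outside 100-599 (1xx through 5xx).
// Hypothetical helper; the bounds follow the review comment, not the PR code.
func validateHTTPStatusCodes(codes []int) error {
	for _, c := range codes {
		if c < 100 || c > 599 {
			return fmt.Errorf("httpStatusCodes: %d is outside the range 100-599", c)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateHTTPStatusCodes([]int{429, 503}))
	fmt.Println(validateHTTPStatusCodes([]int{600}))
}
```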

// TriggerEnum specifies the conditions that trigger retries.
type TriggerEnum string
Review comment (Contributor):
needs a validation annotation


const (
	// HTTP events.
	// For additional details, see https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#x-envoy-retry-on

	// The upstream server responds with any 5xx response code, or does not respond at all (disconnect/reset/read timeout).
	// Includes connect-failure and refused-stream.
	Error5XX TriggerEnum = "5xx"
	// The response is a gateway error (502, 503, or 504).
	GatewayError TriggerEnum = "gateway-error"
	// The upstream server does not respond at all (disconnect/reset/read timeout).
	DisconnectReset TriggerEnum = "disconnect-reset"
	// Connection failure to the upstream server (connect timeout, etc.). (Included in *5xx*)
	ConnectFailure TriggerEnum = "connect-failure"
	// The upstream server responds with a retriable 4xx response code.
	// Currently, the only response code in this category is 409.
	Retriable4XX TriggerEnum = "retriable-4xx"
	// The upstream server resets the stream with a REFUSED_STREAM error code.
	RefusedStream TriggerEnum = "refused-stream"
	// The upstream server responds with any response code matching one of the codes listed in HTTPStatusCodes.
	RetriableStatusCodes TriggerEnum = "retriable-status-codes"

	// gRPC events, currently only supported for gRPC status codes in response headers.
	// For additional details, see https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#x-envoy-retry-grpc-on

	// The gRPC status code in the response headers is “cancelled”.
	Cancelled TriggerEnum = "cancelled"
	// The gRPC status code in the response headers is “deadline-exceeded”.
	DeadlineExceeded TriggerEnum = "deadline-exceeded"
	// The gRPC status code in the response headers is “internal”.
	Internal TriggerEnum = "internal"
	// The gRPC status code in the response headers is “resource-exhausted”.
	ResourceExhausted TriggerEnum = "resource-exhausted"
	// The gRPC status code in the response headers is “unavailable”.
	Unavailable TriggerEnum = "unavailable"
)
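
The review comment on TriggerEnum asks for a validation annotation. With controller-gen this is usually done with a `+kubebuilder:validation:Enum=...` marker on the type. As a rough runtime equivalent, the sketch below checks a list of triggers against the string values declared in the const block above. The `knownTriggers` set and the `validateTriggers` helper are hypothetical names introduced only for this illustration.

```go
package main

import "fmt"

// TriggerEnum mirrors the string type declared in this PR.
type TriggerEnum string

// knownTriggers lists the values from the const block in this PR
// (hypothetical lookup table, not part of the PR).
var knownTriggers = map[TriggerEnum]bool{
	// HTTP triggers.
	"5xx": true, "gateway-error": true, "disconnect-reset": true,
	"connect-failure": true, "retriable-4xx": true, "refused-stream": true,
	"retriable-status-codes": true,
	// gRPC triggers.
	"cancelled": true, "deadline-exceeded": true, "internal": true,
	"resource-exhausted": true, "unavailable": true,
}

// validateTriggers returns an error for the first trigger that is not one of
// the declared values.
func validateTriggers(triggers []TriggerEnum) error {
	for _, t := range triggers {
		if !knownTriggers[t] {
			return fmt.Errorf("unknown retry trigger %q", t)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateTriggers([]TriggerEnum{"5xx", "unavailable"}))
	fmt.Println(validateTriggers([]TriggerEnum{"timeout"}))
}
```

An enum marker on the CRD would reject bad values at admission time, so a runtime check like this one would only be a second line of defense.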

// PerRetryPolicy defines the policy applied to each retry attempt.
type PerRetryPolicy struct {
	// Timeout is the timeout per retry attempt.
	//
	// +optional
	// +kubebuilder:validation:Format=duration
	Timeout *metav1.Duration `json:"timeout,omitempty"`
	// IdleTimeout is the upstream idle timeout per retry attempt. This parameter is optional; if absent, there is no per-try idle timeout.
	//
	// +optional
	// +kubebuilder:validation:Format=duration
	IdleTimeout *metav1.Duration `json:"idleTimeout,omitempty"`
Review comment (Contributor):

can we also remove IdleTimeout in the first iteration?

	// BackOff is the backoff policy to be applied per retry attempt. The gateway uses a fully jittered exponential
	// backoff algorithm for retries. For additional details,
	// see https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#config-http-filters-router-x-envoy-max-retries
	//
	// +optional
	BackOff *BackOffPolicy `json:"backOff,omitempty"`
}

// BackOffPolicy defines the intervals used by the retry backoff algorithm.
type BackOffPolicy struct {
	// BaseInterval is the base interval between retries.
	//
	// +kubebuilder:validation:Format=duration
	BaseInterval *metav1.Duration `json:"baseInterval,omitempty"`
	// MaxInterval is the maximum interval between retries. This parameter is optional, but must be greater than or equal to BaseInterval if set.
	// The default is 10 times BaseInterval.
	//
	// +optional
	// +kubebuilder:validation:Format=duration
	MaxInterval *metav1.Duration `json:"maxInterval,omitempty"`
	// A rate-limit-based backoff config could be added here later if needed.
}
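
To make the BaseInterval and MaxInterval semantics concrete, here is a minimal sketch of a generic fully jittered exponential backoff. It is not Envoy's implementation and may differ from it in the exact growth formula. The assumptions are that the ceiling doubles with each attempt starting from the base interval, that it is capped at the maximum interval, and that a zero maximum falls back to 10 times the base as the field comment states. The function names `backoffCeiling` and `jitteredDelay` are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns the upper bound of the delay before retry number
// attempt (1-based). The bound starts at base, doubles per attempt, and is
// capped at max. A non-positive max means "10 times base" (assumed default).
func backoffCeiling(base, max time.Duration, attempt int) time.Duration {
	if max <= 0 {
		max = 10 * base
	}
	ceiling := base
	// Stop doubling once the cap is reached; this also avoids overflow.
	for i := 1; i < attempt && ceiling < max; i++ {
		ceiling *= 2
	}
	if ceiling > max {
		ceiling = max
	}
	return ceiling
}

// jitteredDelay picks a uniformly random delay in [0, ceiling), which is what
// "fully jittered" means in this sketch.
func jitteredDelay(base, max time.Duration, attempt int, rng *rand.Rand) time.Duration {
	ceiling := backoffCeiling(base, max, attempt)
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rng.Int63n(int64(ceiling)))
}

func main() {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	base := 25 * time.Millisecond
	for attempt := 1; attempt <= 5; attempt++ {
		fmt.Println(attempt, backoffCeiling(base, 0, attempt), jitteredDelay(base, 0, attempt, rng))
	}
}
```

Drawing the whole delay at random, rather than adding a small jitter to a fixed delay, spreads retries from many clients more evenly after a shared failure.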
115 changes: 115 additions & 0 deletions api/v1alpha1/zz_generated.deepcopy.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
@@ -656,6 +656,69 @@ spec:
                required:
                - type
                type: object
              retry:
                description: Retry provides more advanced usage, allowing users to
                  customize the number of retries, retry backoff strategy, and retry
                  triggering conditions. If not set, retry will be disabled.
                properties:
                  numRetries:
                    default: 2
                    description: NumRetries is the number of retries to be attempted.
                      Defaults to 2.
                    type: integer
                  perRetry:
                    description: PerRetry is the retry policy to be applied per retry
                      attempt.
                    properties:
                      backOff:
                        description: BackOff is the backoff policy to be applied per
                          retry attempt. The gateway uses a fully jittered exponential
                          backoff algorithm for retries. For additional details,
                          see https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/router_filter#config-http-filters-router-x-envoy-max-retries
                        properties:
                          baseInterval:
                            description: BaseInterval is the base interval between
                              retries.
                            format: duration
                            type: string
                          maxInterval:
                            description: MaxInterval is the maximum interval between
                              retries. This parameter is optional, but must be greater
                              than or equal to BaseInterval if set. The default
                              is 10 times BaseInterval.
                            format: duration
                            type: string
                        type: object
                      idleTimeout:
                        description: IdleTimeout is the upstream idle timeout per
                          retry attempt. This parameter is optional; if absent, there
                          is no per-try idle timeout.
                        format: duration
                        type: string
                      timeout:
                        description: Timeout is the timeout per retry attempt.
                        format: duration
                        type: string
                    type: object
                  retryOn:
                    description: "RetryOn specifies the retry trigger condition. \n
                      If not specified, the default is to retry on connect-failure, refused-stream, unavailable, cancelled, and retriable-status-codes (503)."
                    properties:
                      httpStatusCodes:
                        description: HTTPStatusCodes specifies the HTTP status codes
                          to be retried.
                        items:
                          type: integer
                        type: array
                      triggers:
                        description: Triggers specifies the retry trigger conditions (HTTP/gRPC).
                        items:
                          description: TriggerEnum specifies the conditions that trigger
                            retries.
                          type: string
                        type: array
                    type: object
                type: object
              targetRef:
                description: targetRef is the name of the resource this policy is
                  being attached to. This Policy and the TargetRef MUST be in the