Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(translator): Implement BTP Timeout API #2454

Merged
merged 1 commit into from
Jan 29, 2024

Conversation

guydc
Copy link
Contributor

@guydc guydc commented Jan 16, 2024

What this PR does / why we need it:
Implement the BackendTrafficPolicy Timeout API

Which issue(s) this PR fixes:
Fixes #2401

@guydc guydc requested a review from a team as a code owner January 16, 2024 23:53
Copy link

codecov bot commented Jan 16, 2024

Codecov Report

Attention: 84 lines in your changes are missing coverage. Please review.

Comparison is base (8d19d77) 64.70% compared to head (ed5ca57) 64.61%.
Report is 1 commits behind head on main.

Files Patch % Lines
internal/ir/zz_generated.deepcopy.go 0.00% 61 Missing and 1 partial ⚠️
internal/gatewayapi/backendtrafficpolicy.go 86.41% 7 Missing and 4 partials ⚠️
internal/gatewayapi/route.go 57.69% 9 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2454      +/-   ##
==========================================
- Coverage   64.70%   64.61%   -0.09%     
==========================================
  Files         116      116              
  Lines       17613    17797     +184     
==========================================
+ Hits        11396    11500     +104     
- Misses       5486     5560      +74     
- Partials      731      737       +6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


}

func SetBackendTrafficPolicyTranslationErrorCondition(policy *egv1a1.BackendTrafficPolicy, field, errMsg string) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use lower case here to make the func private - setBackend....

@@ -391,6 +391,8 @@ type HTTPRoute struct {
ExtensionRefs []*UnstructuredRef `json:"extensionRefs,omitempty" yaml:"extensionRefs,omitempty"`
// Circuit Breaker Settings
CircuitBreaker *CircuitBreaker `json:"circuitBreaker,omitempty" yaml:"circuitBreaker,omitempty"`
// Connection timeout Settings
BackendTimeout *Timeout `json:"backendTimeout,omitempty" yaml:"backendTimeout,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefer if this was renamed to Timeout and we moved the existing Timeout field into it and renamed it to RequestTimeout under HTTP

if hasCommonHTTPOptions {
protocolOptions.CommonHttpProtocolOptions = &corev3.HttpProtocolOptions{
IdleTimeout: durationpb.New(args.timeout.HTTP.ConnectionIdleTimeout.Duration),
MaxConnectionDuration: durationpb.New(args.timeout.HTTP.MaxConnectionDuration.Duration),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two duration should be checked (whether is nil ptr) before assign values toIdleTimeout and MaxConnectionDuration, nor it will raise a nil pointer panic.

http:
- name: "first-listener"
  #...
  routes:
  - name: "first-route"
    backendTimeout:
      tcp:
        connectTimeout: "31s"
      http:
        connectionIdleTimeout: "32s"
        #maxConnectionDuration: "33s" # paniced
    #...

Copy link
Contributor

@shawnh2 shawnh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks

@zirain zirain requested a review from arkodg January 23, 2024 03:08
Copy link
Member

@zhaohuabing zhaohuabing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation LGTM, thanks!

A minor comment on the name of Timeout structure in the API: we may want to change it to BackendTimeout because the TCP ConnectTimeout can't be reused for the future client-side Timeouts.

// BackendTimeout defines configuration for timeouts related to the backend.
type BackendTimeout struct {
	// Timeout settings for TCP.
	//
	// +optional
	TCP *TCPTimeout `json:"tcp,omitempty"`

	// Timeout settings for HTTP.
	//
	// +optional
	HTTP *HTTPTimeout `json:"http,omitempty"`
}

@guydc
Copy link
Contributor Author

guydc commented Jan 23, 2024

A minor comment on the name of Timeout structure in the API: we may want to change it to BackendTimeout because the TCP ConnectTimeout can't be reused for the future client-side Timeouts.

Can we translate this to transport socket timeout for downstream connections? If so, it can be reused...

@@ -227,12 +227,20 @@ func (t *Translator) processHTTPRouteRules(httpRoute *HTTPRouteContext, parentRe

func processTimeout(irRoute *ir.HTTPRoute, rule gwapiv1.HTTPRouteRule) {
if rule.Timeouts != nil {
var rto *ir.Timeout

if irRoute.Timeout != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: policy is computed/translated post route translation, so we can assume its empty here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to this to be safe in case there's a re-arrangement of the computation order. If you feel that we can make this assumption, i'll update the implementation accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, if thats the case, can you also add a comment here, so the next dev can understand why this done this way similar to the comment you have in policy around request timeout

d, err := time.ParseDuration(string(*pto.TCP.ConnectTimeout))
if err != nil {
setBackendTrafficPolicyTranslationErrorCondition(policy, "TCP Timeout", fmt.Sprintf("invalid ConnectTimeout value %s", *pto.TCP.ConnectTimeout))
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should log error in status and continue processing instead of returning early here

d, err := time.ParseDuration(string(*pto.HTTP.ConnectionIdleTimeout))
if err != nil {
setBackendTrafficPolicyTranslationErrorCondition(policy, "HTTP Timeout", fmt.Sprintf("invalid ConnectionIdleTimeout value %s", *pto.HTTP.ConnectionIdleTimeout))
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

d, err := time.ParseDuration(string(*pto.HTTP.MaxConnectionDuration))
if err != nil {
setBackendTrafficPolicyTranslationErrorCondition(policy, "HTTP Timeout", fmt.Sprintf("invalid MaxConnectionDuration value %s", *pto.HTTP.MaxConnectionDuration))
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@@ -14,4 +14,3 @@
name: first-route
route:
cluster: first-route-dest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happened here, why did 5s get deleted ?

@@ -10,7 +10,8 @@ http:
routes:
- name: "first-route"
hostname: "*"
timeout: 5s
timeout:
requestTimeout: 5s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test is broken, probably needs below config

timeout:
  http:
    requestTimeout: 5s

@zhaohuabing
Copy link
Member

A minor comment on the name of Timeout structure in the API: we may want to change it to BackendTimeout because the TCP ConnectTimeout can't be reused for the future client-side Timeouts.

Can we translate this to transport socket timeout for downstream connections? If so, it can be reused...

Not sure about it, they look similar but not exactly the same. Are there any other possible different settings between client and backend timeouts?

@arkodg @zirain WDYT?

@arkodg
Copy link
Contributor

arkodg commented Jan 26, 2024

A minor comment on the name of Timeout structure in the API: we may want to change it to BackendTimeout because the TCP ConnectTimeout can't be reused for the future client-side Timeouts.

Can we translate this to transport socket timeout for downstream connections? If so, it can be reused...

Not sure about it, they look similar but not exactly the same. Are there any other possible different settings between client and backend timeouts?

@arkodg @zirain WDYT?

the go struct name can be changed anytime w/o breaking YAML API compatibility so no strong comments for now from my side

@guydc
Copy link
Contributor Author

guydc commented Jan 26, 2024

Regarding E2E test for this feature: echo server does support a /delay endpoint. However, since these settings are per-connection and not per-request, timed-out connections will not terminate active delayed requests. In other words, it's difficult to demonstrate a connection timeout on a specific request.

  • idle timeout: "The idle timeout is defined as the period in which there are no active requests."
  • max connection duration: "When max_connection_duration is reached and if there are no active streams, the connection will be closed"

One way to verify timeout settings with an e2e test is to trigger some upstream connections, wait for timeouts to occur and then check the relevant envoy cluster metrics.

I propose to handle the e2e test in a dedicated issue.

@arkodg
Copy link
Contributor

arkodg commented Jan 26, 2024

sure @guydc lets raise seperate GH issues for e2e and docs and tackle those in different PRs

arkodg
arkodg previously approved these changes Jan 26, 2024
Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks !

Signed-off-by: Guy Daich <[email protected]>
@guydc
Copy link
Contributor Author

guydc commented Jan 29, 2024

/retest

Copy link
Contributor

@arkodg arkodg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks !

@arkodg arkodg merged commit f1a0a42 into envoyproxy:main Jan 29, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support upstream HTTP connection timeouts
4 participants