
Add timeouts to destination connection details #89

Closed
hbagdi opened this issue Feb 13, 2020 · 18 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.

Comments

@hbagdi
Contributor

hbagdi commented Feb 13, 2020

What would you like to be added:

The destination service configuration should include connect, read, and write timeouts for the proxy to use, in addition to the port (#48) and service name. This should be per-service configuration, since services of different types can require different timeouts based on their workloads.
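
For illustration, such per-service configuration might look roughly like this (a sketch only; the field names are hypothetical, not a concrete API proposal):

```yaml
# Hypothetical sketch; field names are illustrative, not a proposed API.
forwardTo:
- serviceName: payments
  port: 8080          # per-service port, as discussed in #48
  timeouts:
    connect: 5s       # max time to establish the TCP connection
    read: 30s         # max time to wait while reading a response
    write: 30s        # max time allowed for writing the request upstream
```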

Why is this needed:

All proxy and load balancer implementations have timeouts that can be defined for establishing a TCP connection, as well as read and write timeouts, sometimes referred to as idle timeouts.

/kind feature

@hbagdi hbagdi added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 13, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2020
@hbagdi
Contributor Author

hbagdi commented May 14, 2020

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2020
@hbagdi
Contributor Author

hbagdi commented May 14, 2020

xref #184

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 12, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 11, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@robscott
Member

/reopen
/remove-lifecycle rotten

@k8s-ci-robot
Contributor

@robscott: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Oct 11, 2020
@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Oct 11, 2020
@hbagdi
Contributor Author

hbagdi commented Dec 7, 2020

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Dec 7, 2020
@robscott robscott added this to the v0.3.0 milestone Mar 24, 2021
@robscott robscott added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 14, 2021
@robscott
Member

robscott commented Apr 21, 2021

I've spent some time digging into just how portable this could be, and unfortunately there is quite a bit of variation in how different projects handle it. One of the most broadly supported concepts is that of a request or read timeout; here are some examples of how it is defined:

  • GCP: This is the amount of time that the load balancer waits for a backend to return a full response to a request.
  • Envoy: The amount of time that Envoy will wait for the entire request to be received. The timer is activated when the request is initiated, and is disarmed when the last byte of the request is sent upstream (i.e. all decoding filters have processed the request), OR when the response is initiated.
  • Traefik: The maximum duration for reading the entire request, including the body.
  • HAProxy: The maximum accepted time to receive a complete HTTP request without affecting the client timeout… By default, this timeout only applies to the header part of the request, and not to any data. As soon as the empty line is received, this timeout is not used anymore. When combined with “option http-buffer-request”, this timeout also applies to the body of the request.
  • Skipper: Skipper responds with 504 Gateway Timeout status if obtaining a connection, sending the request, and reading the backend response headers and body takes longer than the configured timeout

At first glance, there appears to be some level of potential portability here, but even these relatively similar definitions diverge in what they actually measure.

When we look a bit more broadly, implementations like nginx and caddyserver have very different timeout configurations that simply would not be portable with any of the definitions above.

I think @hbagdi summarized this well in a related Slack thread today:

  • Most implementations define timeouts in very different ways.
  • Timeouts also change much less often, sometimes never.
  • Users want a way to configure timeouts. If they are not portable, that's fine, but they would like to keep timeouts separate from the portable configuration. In other words: I can reuse most of my xRoute resources and only have to tweak a couple of resources (BackendPolicy, ServicePolicy, or similar) when migrating from one implementation to another (or using two at the same time).

What we are trying to do here is create a higher-order abstraction in the API to achieve portability. But as we have just seen, full portability is simply not achievable here, so that approach won't work.

And I don't think we even need it.

What Gateway API aims to provide is a consistent "experience". That doesn't require being able to take the same set of configuration and move it from one implementation to another; that is a pipe dream (IMHO).

What would a consistent experience look like? A way to configure implementation-specific timeouts in a consistent manner, using the same location/resource in the API.

I think that should be enough because that gives everyone a good trade-off between maintainability, portability and consistency.
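
As a purely illustrative sketch, that consistent location could be an implementation-specific policy resource attached to a backend. Every name below is hypothetical:

```yaml
# Hypothetical sketch; the kind, group, and fields are all illustrative.
apiVersion: timeouts.example.io/v1alpha1
kind: BackendTimeoutPolicy
metadata:
  name: payments-timeouts
spec:
  targetRef:            # the consistent "location": a reference to the backend
    kind: Service
    name: payments
  # Implementation-specific fields; not expected to be portable.
  requestTimeout: 30s
  idleTimeout: 60s
```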

I'm interested in some additional perspective here - did we miss any paths to portability here? If not, maybe we should close out this issue?

/cc @bowei @danehans @hbagdi @jpeach @youngnick

@youngnick
Contributor

I think the way @hbagdi laid it out is both clear and correct. There's just too much variability here to really be able to make these settings portable.

But the other part of what he said is probably addressable under this issue, namely having a single spot in the API where you put timeouts, and a standard shape of thing to put there, probably a CRD per underlying implementation or something? That would let an implementation easily check whether it supports a particular timeout set (or can translate the timeouts), and either way it keeps the config confined to one spot.

@bowei
Contributor

bowei commented Apr 21, 2021

One question: how many different types of timeouts are supported by each of the variants? For example, does it suffice to say:

ReadTimeout
WriteTimeout

but leave the specifics (exactly which byte qualifies) up to the implementation?

Do we need to be 100% prescriptive, or is a high-level definition still useful, with the user pointed to implementation-specific documentation when they have to get into the weeds?
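
A minimal sketch of that shape (field names are only illustrative; the exact semantics would be left to each implementation):

```yaml
# Illustrative only: generic fields whose precise byte-level semantics
# (when the timer starts and stops) are left to the implementation.
timeouts:
  readTimeout: 30s    # roughly: max time to receive data from the peer
  writeTimeout: 30s   # roughly: max time to send data to the peer
```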

@robscott robscott removed this from the v0.3.0 milestone Apr 23, 2021
@youngnick
Contributor

I think @robscott's review showed that even the definitions of what a particular timeout is vary widely between implementations. So I don't know if even that level of generic timeout is possible (because different implementations may have several timeouts that effectively add up to that one timeout, or something similar).

@jpeach
Contributor

jpeach commented Apr 29, 2021

Timeouts in general are incompatible across implementations, but there are some general use cases that might help us:

  • is this a streaming endpoint?
  • is this a request/response endpoint?
  • is this a large content upload/download endpoint?

Orthogonal to these issues is how to handle idle sessions.

So maybe an approach for the portable API is to represent timeouts as profiles that an implementation can use to do the right thing (or expand into something specific)?
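
A sketch of what that profiles idea might look like (the profile names are hypothetical; each implementation would expand a profile into its own concrete timeout settings):

```yaml
# Hypothetical sketch; profile names are illustrative.
forwardTo:
- serviceName: video-feed
  port: 8080
  timeoutProfile: streaming   # long-lived connection, generous idle timeout
# Other profiles might include: request-response, bulk-transfer.
```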

@hbagdi
Contributor Author

hbagdi commented Aug 23, 2021

#713 covers this in some sense.
Safe to close?
/cc @robscott @jpeach @youngnick

@robscott
Member

Yeah I think that's probably the most fitting solution here.

/close

@k8s-ci-robot
Contributor

@robscott: Closing this issue.

In response to this:

Yeah I think that's probably the most fitting solution here.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

jaison-tiu pushed a commit to jaison-tiu/gateway-api that referenced this issue Apr 14, 2022