router: set correct timeout for egress->ingress envoys #8051

nezdolik · 2019-08-27T12:24:07Z

Signed-off-by: Kateryna Nezdolii [email protected]

Description: In an egress->ingress Envoy setup, the ingress Envoy currently does not respect the x-envoy-expected-rq-timeout-ms header set by the egress Envoy, and overrides the header with its own timeout value. This change makes the ingress Envoy respect the x-envoy-expected-rq-timeout-ms header value if it is present in the request.
Risk Level: Low
Testing: Added unit and integration tests to make sure the header is neither sanitised nor ignored.
Docs Changes: Updated API v2 docs
Release Notes: Updated version history
Fixes #7358
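A minimal, self-contained sketch of the precedence the description implies, assuming a plain `std::map` stands in for Envoy's header map and that the route timeout is the final fallback (as snowp notes later in the review). `Headers`, `parseMs` and `selectGlobalTimeout` are hypothetical names; this is not Envoy's `FilterUtility::finalTimeout`.

```cpp
#include <cassert>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical stand-in for Envoy's request header map.
using Headers = std::map<std::string, std::string>;

// Parses a header value as whole milliseconds; nullopt on malformed input.
std::optional<std::chrono::milliseconds> parseMs(const std::string& value) {
  std::uint64_t ms = 0;
  const char* first = value.data();
  const char* last = first + value.size();
  const auto [ptr, ec] = std::from_chars(first, last, ms);
  if (ec != std::errc() || ptr != last) {
    return std::nullopt;
  }
  return std::chrono::milliseconds(ms);
}

// Hypothetical precedence for the ingress Envoy's global timeout.
std::chrono::milliseconds selectGlobalTimeout(const Headers& headers,
                                              std::chrono::milliseconds route_timeout,
                                              bool respect_expected_rq_timeout) {
  // 1. Deadline propagated by the egress Envoy, only when the flag is on.
  if (respect_expected_rq_timeout) {
    if (auto it = headers.find("x-envoy-expected-rq-timeout-ms"); it != headers.end()) {
      if (auto ms = parseMs(it->second)) {
        return *ms;
      }
    }
  }
  // 2. Explicit per-request override.
  if (auto it = headers.find("x-envoy-upstream-rq-timeout-ms"); it != headers.end()) {
    if (auto ms = parseMs(it->second)) {
      return *ms;
    }
  }
  // 3. Timeout configured on the route.
  return route_timeout;
}
```

In this sketch a malformed expected-timeout value simply falls through to the next source; that is a design choice of the example, not necessarily what the PR does.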

snowp

Nice, this is a good start

snowp · 2019-08-27T15:04:24Z

source/common/router/router.cc

+    expected_timeout = timeout.global_timeout_.count();
+  }
+  // todo(nezdolik) Check if order is correct, add tests.
+  // Check if there is timeout set by egress envoy. If present, use that instead.


we probably need this to be guarded by a config flag as this could potentially break existing deployments

@snowp, you mean by runtime guarding or by simple bool config property?

In the past I think we've been doing xds config for these kinds of changes, but I guess this case could be considered a bug fix. @mattklein123 do you have any thoughts on what kind of feature guarding we'll need for this?

My quick thought is this should probably be a full on config option in the router?

snowp · 2019-08-27T15:11:50Z

test/common/router/router_test.cc

+                                    {"x-envoy-expected-rq-timeout-ms", "10"}};
+    FilterUtility::TimeoutData timeout =
+        FilterUtility::finalTimeout(route, headers, true, false, false);
+    EXPECT_EQ(std::chrono::milliseconds(10), timeout.global_timeout_);


Since the route timeout and the expected-rq-timeout-ms are the same in this test case, this line doesn't really verify that we're honoring the expected-rq-timeout-ms. You probably want to make these values different so that you can verify that we're capping the global timeout by the incoming deadline.
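To illustrate the point, a small sketch comparing a hypothetical correct implementation with a hypothetical buggy one that ignores the header. Neither function is Envoy code; `globalTimeoutFor` and `ignoresHeader` are illustrative names, and `std::stoll` is used for brevity (it throws on malformed input).

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using namespace std::chrono_literals;
using Headers = std::map<std::string, std::string>;

// Hypothetical function under test: honours the incoming deadline when present,
// otherwise keeps the route timeout.
std::chrono::milliseconds globalTimeoutFor(std::chrono::milliseconds route_timeout,
                                           const Headers& headers) {
  auto it = headers.find("x-envoy-expected-rq-timeout-ms");
  if (it == headers.end()) {
    return route_timeout;
  }
  return std::chrono::milliseconds(std::stoll(it->second));
}

// Hypothetical buggy variant that ignores the header entirely.
std::chrono::milliseconds ignoresHeader(std::chrono::milliseconds route_timeout,
                                        const Headers&) {
  return route_timeout;
}
```

With a route timeout of 10ms and a header of "10", both variants return 10ms, so an assertion on that value passes for the buggy one too. Raising the route timeout to 100ms while keeping the header at "10" makes only the correct variant return 10ms.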

snowp · 2019-08-27T15:13:32Z

source/common/router/router.cc

-    if (absl::SimpleAtoi(header_timeout_entry->value().getStringView(), &header_timeout)) {
-      timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
+  Http::HeaderEntry* header_expected_timeout_entry =
+      request_headers.EnvoyExpectedRequestTimeoutMs();


it might be worthwhile to add an integration test to verify that we actually have this header at this point due to all the possible header sanitization that happens at various points during request processing

Good point, thanks

nezdolik · 2019-09-02T15:39:08Z

added config guard, may need to fix tests later.

stale · 2019-09-11T08:23:11Z

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

nezdolik · 2019-09-17T23:32:03Z

still working on this (was on vacation).

Signed-off-by: Kateryna Nezdolii <[email protected]>

repokitteh-read-only · 2019-09-23T19:48:33Z

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8051 was synchronize by nezdolik.

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik · 2019-09-24T06:37:35Z

/retest

repokitteh-read-only · 2019-09-24T06:37:40Z

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #8051 (comment) was created by @nezdolik.

snowp

Looking pretty good overall, just a few comments

snowp · 2019-09-24T13:52:11Z

api/envoy/config/filter/http/router/v2/router.proto

@@ -63,4 +63,10 @@ message Router {
      "x-envoy-retry-on"
    ]
  }];
+
+  // If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present


nit: "will first check if the `x-envoy"

Also use *x-envoy-expected-timeout-ms* formatting for headers.

snowp · 2019-09-24T13:55:06Z

source/common/router/router.cc

+    Http::HeaderEntry* header_expected_timeout_entry =
+        request_headers.EnvoyExpectedRequestTimeoutMs();
+    if (header_expected_timeout_entry) {
+      // This will prevent from overriding `x-envoy-expected-rq-timeout-ms` header.


do we need this? won't this be set based on the global_timeout_ later on which should match the value we're extracting from the expected-rq-timeout-ms header?

@snowp this was protection against this code path:

if (insert_envoy_expected_request_timeout_ms && expected_timeout > 0) {
  request_headers.insertEnvoyExpectedRequestTimeoutMs().value(expected_timeout);
}

We do indeed derive timeout and put it into a separate data structure (with global timeout), so there is not a big gain from setting insert_envoy_expected_request_timeout_ms to false.

Changed my mind: there is a gain. By not overriding the header, we use the same value in timeout.global_timeout_ (derived from x-envoy-expected-rq-timeout-ms) and observe the same value in the x-envoy-expected-rq-timeout-ms header.

Hmm, I think we want the expected header to reflect the timeout used by the router, which is affected by more than just those two headers. If you look further down in that function you'll see that we'll use the per try timeout instead of the global timeout if it's set. It seems like we want the expected timeout header to always reflect the timeout enforced by the router for the outgoing request, and just use the incoming expected timeout header to infer the global timeout. Does that make sense?

@snowp it does, thanks for clarification

will adjust the test as well
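A minimal sketch of the split snowp describes, under simplified types: the incoming *x-envoy-expected-rq-timeout-ms* value would only seed `global`, while the value sent upstream is recomputed from the timeout the router actually enforces. `Timeouts` and `outgoingExpectedTimeoutMs` are hypothetical names, not Envoy's `TimeoutData`. The rule that a per-try timeout at or above the global timeout is ignored is an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using std::chrono::milliseconds;

// Hypothetical mirror of the router's timeout pair.
struct Timeouts {
  milliseconds global{0};   // 0 means "no global timeout"
  milliseconds per_try{0};  // 0 means "no per-try timeout"
};

// Value the router would send upstream in x-envoy-expected-rq-timeout-ms:
// the timeout enforced on the next upstream attempt, or 0 if there is none.
std::uint64_t outgoingExpectedTimeoutMs(const Timeouts& t) {
  milliseconds enforced = t.global;
  // Assumed rule: a per-try timeout wins only if it is shorter than the global one.
  if (t.per_try.count() > 0 && (t.global.count() == 0 || t.per_try < t.global)) {
    enforced = t.per_try;
  }
  return static_cast<std::uint64_t>(enforced.count());
}
```

For example, a global timeout of 500ms seeded from the incoming header, combined with a 100ms per-try timeout, would be advertised upstream as 100.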

snowp · 2019-09-24T13:58:45Z

api/envoy/config/filter/http/router/v2/router.proto

+  // If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
+  // and use it's value as timeout to upstream cluster. If header is not present or
+  // `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
+  // `x-envoy-upstream-rq-timeout-ms` header.


I don't think this is completely correct: there might not be an rq-timeout header, in which case the route-specified timeout is used. It might be better to avoid describing the behavior without this flag, so that this comment doesn't have to be updated whenever the timeout decision logic changes.

snowp · 2019-09-24T14:02:51Z

test/integration/http_timeout_integration_test.cc

+  ASSERT_TRUE(upstream_request_->waitForEndStream(*dispatcher_));
+
+  // Trigger global timeout, populated from `x-envoy-expected-rq-timeout-ms` header.
+  timeSystem().sleep(std::chrono::milliseconds(501));


since the upstream-rq-timeout header is greater than the expected-rq-timeout, you can't tell based on this test that the timeout is due to upstream-rq-timeout. I suggest making upstream-rq-timeout 300ms instead so that we know we're timing out before the expected-rq-timeout timer would hit

@snowp made x-envoy-upstream-rq-timeout-ms smaller than x-envoy-expected-rq-timeout-ms

htuch · 2019-09-25T14:27:04Z

api/envoy/config/filter/http/router/v2/router.proto

@@ -63,4 +63,10 @@ message Router {
      "x-envoy-retry-on"
    ]
  }];
+
+  // If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present


Also use *x-envoy-expected-timeout-ms* formatting for headers.

htuch · 2019-09-25T14:27:27Z

api/envoy/config/filter/http/router/v2/router.proto

+
+  // If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
+  // and use it's value as timeout to upstream cluster. If header is not present or
+  // `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from


Can you :ref: internal link to the relevant field in the API docs?

htuch · 2019-09-25T14:27:38Z

api/envoy/config/filter/http/router/v2/router.proto

+
+  // If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
+  // and use it's value as timeout to upstream cluster. If header is not present or
+  // `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from


Nit: s/envoy/Envoy/g

source/common/router/router.cc

Signed-off-by: Kateryna Nezdolii <[email protected]>

snowp

Well done, this LGTM

mattklein123

Thanks, great work. Just a few small comments. Great feature!

/wait

api/envoy/config/filter/http/router/v2/router.proto

mattklein123 · 2019-10-01T18:29:34Z

source/common/router/router.cc

+    if (header_expected_timeout_entry) {
+      if (absl::SimpleAtoi(header_expected_timeout_entry->value().getStringView(),
+                           &header_timeout)) {
+        timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);


nit: can you lift this part out into a helper which takes the header, etc.? It's repeated 3 times in this function and it would help readability.
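One possible shape for such a helper, as a sketch under simplified types: an absent header is modelled as `std::nullopt` instead of a null `Http::HeaderEntry*`, and `std::from_chars` replaces `absl::SimpleAtoi` so the example has no Abseil dependency (note that `absl::SimpleAtoi` also tolerates surrounding ASCII whitespace, which this version rejects). `TimeoutData` and `trySetGlobalTimeout` here are hypothetical, not the names used in the merged PR.

```cpp
#include <cassert>
#include <charconv>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Simplified, hypothetical stand-in for FilterUtility::TimeoutData.
struct TimeoutData {
  std::chrono::milliseconds global_timeout_{0};
};

// Hypothetical helper: parses the header value as milliseconds and, on success,
// stores it as the global timeout. Returns false (timeout untouched) if the
// header is absent or malformed.
bool trySetGlobalTimeout(std::optional<std::string_view> header_value, TimeoutData& timeout) {
  if (!header_value) {
    return false;
  }
  std::uint64_t ms = 0;
  const char* first = header_value->data();
  const char* last = first + header_value->size();
  const auto [ptr, ec] = std::from_chars(first, last, ms);
  if (ec != std::errc() || ptr != last) {
    return false;
  }
  timeout.global_timeout_ = std::chrono::milliseconds(ms);
  return true;
}
```

Each of the three call sites would then reduce to a single call on the relevant header, with the boolean result available when the caller needs to act on success (for example, removing the header afterwards).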

Signed-off-by: Kateryna Nezdolii <[email protected]>

mattklein123

Awesome, thanks!

Signed-off-by: Kateryna Nezdolii <[email protected]>

Sooryaa-A · 2022-01-10T12:19:21Z

Hi all,
I'm trying to send x-envoy-expected-rq-timeout-ms from my application to the egress Envoy, expecting Envoy to use the timeout from this header as the upstream request timeout. However, the upstream request only times out after the route timeout value (30s).

Attaching my config below.

static_resources:
  listeners:
  - address:
      socket_address:
        protocol: TCP
        address: 10.10.180.90
        port_value: 9001
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: upstream_listener
          route_config:
            name: upstream_listener
            virtual_hosts:
            - name: upstream_listener
              domains:
              - "*"
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: server
                  timeout: 30s
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
              respect_expected_rq_timeout: true
              suppress_envoy_headers: true
          use_remote_address: true

nezdolik · 2022-01-12T15:59:40Z

@Sooryaa-A please create a dedicated issue in envoy repo

nezdolik force-pushed the fix-timeout branch from 3e05e1b to ae8ab5d August 27, 2019 12:52

snowp self-assigned this Aug 27, 2019

snowp suggested changes Aug 27, 2019

stale bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 11, 2019

stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Sep 17, 2019

snowp added no stalebot Disables stalebot from closing an issue waiting labels Sep 20, 2019

Kateryna Nezdolii added 4 commits September 23, 2019 19:17

router: set correct timeout for egress->ingress envoys

5c5a706

Signed-off-by: Kateryna Nezdolii <[email protected]>

apply review comments

2b3080d

Signed-off-by: Kateryna Nezdolii <[email protected]>

fix format

b36858a

Signed-off-by: Kateryna Nezdolii <[email protected]>

Add more tests

99cd2cd

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik requested review from alyssawilk, htuch, lizan and zuercher as code owners September 23, 2019 19:48

repokitteh-read-only bot removed the waiting label Sep 23, 2019

clean up

be4d596

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik force-pushed the fix-timeout branch from a3a1551 to be4d596 September 23, 2019 19:51

Kateryna Nezdolii added 2 commits September 23, 2019 21:56

clean up

de1fdf2

Signed-off-by: Kateryna Nezdolii <[email protected]>

fix spelling

b0ea85a

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik changed the title ~~WIP router: set correct timeout for egress->ingress envoys~~ router: set correct timeout for egress->ingress envoys Sep 24, 2019

snowp suggested changes Sep 24, 2019

htuch reviewed Sep 25, 2019

apply review comments

cbb9594

Signed-off-by: Kateryna Nezdolii <[email protected]>

update docs, release notes and fix format

90695da

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik requested review from snowp and htuch September 27, 2019 11:19

Kateryna Nezdolii added 2 commits September 27, 2019 14:05

sync v2 and v3alpha

8432d62

Signed-off-by: Kateryna Nezdolii <[email protected]>

fix test

967acfd

Signed-off-by: Kateryna Nezdolii <[email protected]>

mattklein123 removed the no stalebot Disables stalebot from closing an issue label Sep 27, 2019

snowp previously approved these changes Sep 30, 2019

snowp added the api-review-required API review required by @envoyproxy/api-shepherds label Sep 30, 2019

snowp assigned mattklein123 Sep 30, 2019

mattklein123 requested changes Oct 1, 2019

repokitteh-read-only bot added the waiting label Oct 1, 2019

Kateryna Nezdolii added 2 commits October 7, 2019 11:26

apply review comments

1450fc9

Signed-off-by: Kateryna Nezdolii <[email protected]>

fix format

cc6f485

Signed-off-by: Kateryna Nezdolii <[email protected]>

nezdolik dismissed snowp’s stale review via cc6f485 October 7, 2019 11:10

repokitteh-read-only bot removed the waiting label Oct 7, 2019

Merge remote-tracking branch 'origin/master' into fix-timeout

8992d02

Signed-off-by: Kateryna Nezdolii <[email protected]>

mattklein123 approved these changes Oct 8, 2019

mattklein123 merged commit 3f7b132 into envoyproxy:master Oct 8, 2019

nandu-vinodan pushed a commit to nandu-vinodan/envoy that referenced this pull request Oct 17, 2019

router: set correct timeout for egress->ingress envoys (envoyproxy#8051)

27587cb

Signed-off-by: Kateryna Nezdolii <[email protected]>

blake mentioned this pull request Feb 18, 2020

Consul connect provide a way to configure envoy route timeout hashicorp/consul#6382

Open

nezdolik deleted the fix-timeout branch January 12, 2022 16:01
