Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

router: set correct timeout for egress->ingress envoys #8051

Merged
merged 14 commits into from
Oct 8, 2019

Conversation

nezdolik
Copy link
Member

@nezdolik nezdolik commented Aug 27, 2019

Signed-off-by: Kateryna Nezdolii [email protected]

Description: In case of egress->ingress envoy setup, ingress envoy currently does not respect x-envoy-expected-rq-timeout-ms header, set by egress envoy and overrides the header with it's own timeout value. This change makes ingress envoy to respect x-envoy-expected-rq-timeout-ms header value, if it's present in request.
Risk Level: Low
Testing: Added unit and integration test to make sure header is not sanitised and not ignored.
Docs Changes: Updated API v2 docs
Release Notes: Updated version history
Fixes #7358

Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this is a good start

expected_timeout = timeout.global_timeout_.count();
}
// todo(nezdolik) Check if order is correct, add tests.
// Check if there is timeout set by egress envoy. If present, use that instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably need this to be guarded by a config flag as this could potentially break existing deployments

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp, you mean by runtime guarding or by simple bool config property?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past I think we've been doing xds config for these kind of changes, but I guess this case could be considered a bug fix. @mattklein123 do you have any thoughts around what kind of feature guarding we'll need for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My quick thought is this should probably be a full on config option in the router?

{"x-envoy-expected-rq-timeout-ms", "10"}};
FilterUtility::TimeoutData timeout =
FilterUtility::finalTimeout(route, headers, true, false, false);
EXPECT_EQ(std::chrono::milliseconds(10), timeout.global_timeout_);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the route timeout and the expected-rq-timeout-ms is the same in this test case, this line doesn't really verify that we're honoring the expected-rq-timeout-ms. You probably want to make these values different so that you can verify that we're capping the global timeout by the incoming deadline

if (absl::SimpleAtoi(header_timeout_entry->value().getStringView(), &header_timeout)) {
timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
Http::HeaderEntry* header_expected_timeout_entry =
request_headers.EnvoyExpectedRequestTimeoutMs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be worthwhile to add an integration test to verify that we actually have this header at this point due to all the possible header sanitization that happens at various points during request processing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, thanks

@nezdolik
Copy link
Member Author

nezdolik commented Sep 2, 2019

added config guard, may need to fix tests later.

@stale
Copy link

stale bot commented Sep 11, 2019

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Sep 11, 2019
@nezdolik
Copy link
Member Author

still working on this (was on vacation).

@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Sep 17, 2019
@snowp snowp added no stalebot Disables stalebot from closing an issue waiting labels Sep 20, 2019
Kateryna Nezdolii added 4 commits September 23, 2019 19:17
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #8051 was synchronize by nezdolik.

see: more, trace.

Signed-off-by: Kateryna Nezdolii <[email protected]>
Kateryna Nezdolii added 2 commits September 23, 2019 21:56
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@nezdolik
Copy link
Member Author

/retest

@repokitteh-read-only
Copy link

🔨 rebuilding ci/circleci: coverage (failed build)

🐱

Caused by: a #8051 (comment) was created by @nezdolik.

see: more, trace.

@nezdolik nezdolik changed the title WIP router: set correct timeout for egress->ingress envoys router: set correct timeout for egress->ingress envoys Sep 24, 2019
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking pretty good overall, just a few comments

@@ -63,4 +63,10 @@ message Router {
"x-envoy-retry-on"
]
}];

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "will first check if the `x-envoy"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use *x-envoy-expected-timeout-ms* formatting for headers.

Http::HeaderEntry* header_expected_timeout_entry =
request_headers.EnvoyExpectedRequestTimeoutMs();
if (header_expected_timeout_entry) {
// This will prevent from overriding `x-envoy-expected-rq-timeout-ms` header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? won't this be set based on the global_timeout_ later on which should match the value we're extracting from the expected-rq-timeout-ms header?

Copy link
Member Author

@nezdolik nezdolik Sep 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp this was protection against this code path:

  if (insert_envoy_expected_request_timeout_ms && expected_timeout > 0) {
    request_headers.insertEnvoyExpectedRequestTimeoutMs().value(expected_timeout);
  }

We do indeed derive timeout and put it into a separate data structure (with global timeout), so there is not a big gain from setting insert_envoy_expected_request_timeout_ms to false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed my mind, there is a gain, so that we use same value in timeout.global_timeout (derived from x-envoy-expected-timeout-ms) and observe same value in header x-envoy-expected-timeout-ms by not overriding it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we want the expected header to reflect the timeout used by the router, which is affected by more than just those two headers. If you look further down in that function you'll see that we'll use the per try timeout instead of the global timeout if it's set. It seems like we want the expected timeout header to always reflect the timeout enforced by the router for the outgoing request, and just use the incoming expected timeout header to infer the global timeout. Does that make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp it does, thanks for clarification

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will adjust the test as well

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
// `x-envoy-upstream-rq-timeout-ms` header.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think this is completely correct, as there might not be a rq-timeout header in which case the route specified timeout is used. Might be better to avoid listing what the behavior is without this flag to help ensure that this comment doesn't have to be updated whenever the timeout decision logic changes

ASSERT_TRUE(upstream_request_->waitForEndStream(*dispatcher_));

// Trigger global timeout, populated from `x-envoy-expected-rq-timeout-ms` header.
timeSystem().sleep(std::chrono::milliseconds(501));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since the upstream-rq-timeout header is greater than the expected-rq-timeout, you can't tell based on this test that the timeout is due to upstream-rq-timeout. I suggest making upstream-rq-timeout 300ms instead so that we know we're timing out before the expected-rq-timeout timer would hit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snowp made x-envoy-upstream-rq-timeout-ms smaller than x-envoy-expected-rq-timeout-ms

@@ -63,4 +63,10 @@ message Router {
"x-envoy-retry-on"
]
}];

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also use *x-envoy-expected-timeout-ms* formatting for headers.


// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you :ref: internal link to the relevant field in the API docs?


// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
// and use it's value as timeout to upstream cluster. If header is not present or
// `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: s/envoy/Envoy/g

source/common/router/router.cc Show resolved Hide resolved
Signed-off-by: Kateryna Nezdolii <[email protected]>
Kateryna Nezdolii added 2 commits September 27, 2019 14:05
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
@mattklein123 mattklein123 removed the no stalebot Disables stalebot from closing an issue label Sep 27, 2019
snowp
snowp previously approved these changes Sep 30, 2019
Copy link
Contributor

@snowp snowp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done, this LGTM

@snowp snowp added the api-review-required API review required by @envoyproxy/api-shepherds label Sep 30, 2019
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, great work. Just a few small comments. Great feature!

/wait

Comment on lines 164 to 167
if (header_expected_timeout_entry) {
if (absl::SimpleAtoi(header_expected_timeout_entry->value().getStringView(),
&header_timeout)) {
timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can you lift this part out into a helper which takes the header, etc.? It's repeated 3 times in this function and it would help readability.

Kateryna Nezdolii added 2 commits October 7, 2019 11:26
Signed-off-by: Kateryna Nezdolii <[email protected]>
Signed-off-by: Kateryna Nezdolii <[email protected]>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

@mattklein123 mattklein123 merged commit 3f7b132 into envoyproxy:master Oct 8, 2019
nandu-vinodan pushed a commit to nandu-vinodan/envoy that referenced this pull request Oct 17, 2019
@Sooryaa-A
Copy link

hi All,
Im trying to send x-envoy-expected-rq-timeout-ms from my application to egress envoy and expecting envoy to use the timeout set in this header to be used for upstream connection request timeout.But the upstream connection request is timing out after the route timeout -30s value only.

Attaching my config below.

static_resources:
listeners:
- address:
socket_address:
protocol: TCP
address: 10.10.180.90
port_value: 9001
filter_chains:
- filters:
name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: upstream_listener
route_config:
name: upstream_listener
virtual_hosts:
- name: upstream_listener
domains:
- "*"
routes:
- match:
prefix: "/"
route:
cluster: server
timeout: 30s
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
respect_expected_rq_timeout: true
suppress_envoy_headers:true
use_remote_address: true

@nezdolik
Copy link
Member Author

@Sooryaa-A please create a dedicated issue in envoy repo

@nezdolik nezdolik deleted the fix-timeout branch January 12, 2022 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-review-required API review required by @envoyproxy/api-shepherds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propagate downstream timeout to upstream through multiple Envoys
5 participants