router: set correct timeout for egress->ingress envoys #8051

Merged: 14 commits, Oct 8, 2019
6 changes: 6 additions & 0 deletions api/envoy/config/filter/http/router/v2/router.proto
@@ -63,4 +63,10 @@ message Router {
"x-envoy-retry-on"
]
}];

// If set to true, envoy first will check if `x-envoy-expected-timeout-ms` header is present
Review comment (Contributor): nit: "will first check if the `x-envoy"

Review comment (Member): Also use *x-envoy-expected-timeout-ms* formatting for headers.

+  // and use it's value as timeout to upstream cluster. If header is not present or
+  // `respect_expected_rq_timeout` is set to false, envoy will derive timeout value from
Review comment (Member): Can you :ref: internal link to the relevant field in the API docs?

Review comment (Member): Nit: s/envoy/Envoy/g

+  // `x-envoy-upstream-rq-timeout-ms` header.
Review comment (Contributor): I don't think this is completely correct, as there might not be a rq-timeout header, in which case the route-specified timeout is used. It might be better to avoid listing the behavior without this flag, to help ensure this comment doesn't have to be updated whenever the timeout decision logic changes.

+  bool respect_expected_rq_timeout = 6;
 }
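For readers skimming the API change, the following is a simplified, self-contained C++ sketch of the precedence documented above. It is an illustration only, under assumed inputs: pre-parsed std::optional header values and a hypothetical pickUpstreamTimeout helper instead of Envoy's Http::HeaderMap plumbing. The real logic is the FilterUtility::finalTimeout change in router.cc below.

  #include <chrono>
  #include <cstdint>
  #include <optional>

  // Simplified stand-in for the documented behavior; Envoy's real logic lives in
  // FilterUtility::finalTimeout and operates on Http::HeaderMap, not std::optional.
  std::chrono::milliseconds
  pickUpstreamTimeout(std::chrono::milliseconds route_timeout,
                      std::optional<uint64_t> expected_rq_timeout_ms,  // x-envoy-expected-rq-timeout-ms
                      std::optional<uint64_t> upstream_rq_timeout_ms,  // x-envoy-upstream-rq-timeout-ms
                      bool respect_expected_rq_timeout) {
    if (respect_expected_rq_timeout && expected_rq_timeout_ms.has_value()) {
      // Trust the timeout the egress Envoy already computed and propagated.
      return std::chrono::milliseconds(*expected_rq_timeout_ms);
    }
    if (upstream_rq_timeout_ms.has_value()) {
      // Otherwise fall back to the caller-supplied request timeout header.
      return std::chrono::milliseconds(*upstream_rq_timeout_ms);
    }
    // With neither header present, the route-configured timeout stands.
    return route_timeout;
  }

In an egress->ingress pair, the egress Envoy computes and stamps the expected timeout header, and an ingress Envoy with respect_expected_rq_timeout enabled reuses that value instead of re-deriving it.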
2 changes: 1 addition & 1 deletion source/common/http/async_client_impl.cc
@@ -37,7 +37,7 @@ AsyncClientImpl::AsyncClientImpl(Upstream::ClusterInfoConstSharedPtr cluster,
                                  Router::ShadowWriterPtr&& shadow_writer,
                                  Http::Context& http_context)
     : cluster_(cluster), config_("http.async-client.", local_info, stats_store, cm, runtime, random,
-                                 std::move(shadow_writer), true, false, false, {},
+                                 std::move(shadow_writer), true, false, false, false, {},
                                  dispatcher.timeSource(), http_context),
       dispatcher_(dispatcher) {}

42 changes: 35 additions & 7 deletions source/common/router/router.cc
@@ -121,7 +121,8 @@ bool FilterUtility::shouldShadow(const ShadowPolicy& policy, Runtime::Loader& ru
 FilterUtility::TimeoutData
 FilterUtility::finalTimeout(const RouteEntry& route, Http::HeaderMap& request_headers,
                             bool insert_envoy_expected_request_timeout_ms, bool grpc_request,
-                            bool per_try_timeout_hedging_enabled) {
+                            bool per_try_timeout_hedging_enabled,
+                            bool respect_expected_rq_timeout) {
   // See if there is a user supplied timeout in a request header. If there is we take that.
   // Otherwise if the request is gRPC and a maximum gRPC timeout is configured we use the timeout
   // in the gRPC headers (or infinity when gRPC headers have no timeout), but cap that timeout to
@@ -151,13 +152,39 @@ FilterUtility::finalTimeout(const RouteEntry& route, Http::HeaderMap& request_he
   }
   timeout.per_try_timeout_ = route.retryPolicy().perTryTimeout();
 
-  Http::HeaderEntry* header_timeout_entry = request_headers.EnvoyUpstreamRequestTimeoutMs();
   uint64_t header_timeout;
-  if (header_timeout_entry) {
-    if (absl::SimpleAtoi(header_timeout_entry->value().getStringView(), &header_timeout)) {
-      timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
-    }
-    request_headers.removeEnvoyUpstreamRequestTimeoutMs();
-  }
+
+  if (respect_expected_rq_timeout) {
+    // Check if there is timeout set by egress envoy.
+    // If present, use that value as route timeout and don't override
+    // `x-envoy-expected-rq-timeout-ms` header. At this point `x-envoy-upstream-rq-timeout-ms`
+    // header should have been sanitized by egress envoy.
+    Http::HeaderEntry* header_expected_timeout_entry =
+        request_headers.EnvoyExpectedRequestTimeoutMs();
+    if (header_expected_timeout_entry) {
+      // This will prevent from overriding `x-envoy-expected-rq-timeout-ms` header.
Review comment (Contributor): Do we need this? Won't this be set based on the global_timeout_ later on, which should match the value we're extracting from the expected-rq-timeout-ms header?

Reply (Member Author, @nezdolik, Sep 24, 2019): @snowp this was protection against this code path:

    if (insert_envoy_expected_request_timeout_ms && expected_timeout > 0) {
      request_headers.insertEnvoyExpectedRequestTimeoutMs().value(expected_timeout);
    }

We do indeed derive the timeout and put it into a separate data structure (with the global timeout), so there is not a big gain from setting insert_envoy_expected_request_timeout_ms to false.

Reply (Member Author): Changed my mind, there is a gain: we use the same value in timeout.global_timeout_ (derived from x-envoy-expected-rq-timeout-ms) and observe the same value in the x-envoy-expected-rq-timeout-ms header by not overriding it.

Reply (Contributor): Hmm, I think we want the expected header to reflect the timeout used by the router, which is affected by more than just those two headers. If you look further down in that function you'll see that we'll use the per try timeout instead of the global timeout if it's set. It seems like we want the expected timeout header to always reflect the timeout enforced by the router for the outgoing request, and just use the incoming expected timeout header to infer the global timeout. Does that make sense?

Reply (Member Author): @snowp it does, thanks for the clarification.

Reply (Member Author): Will adjust the test as well.

+      insert_envoy_expected_request_timeout_ms = false;
+      if (absl::SimpleAtoi(header_expected_timeout_entry->value().getStringView(),
+                           &header_timeout)) {
+        timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
+      }
+    } else {
+      Http::HeaderEntry* header_timeout_entry = request_headers.EnvoyUpstreamRequestTimeoutMs();
+      if (header_timeout_entry) {
+        if (absl::SimpleAtoi(header_timeout_entry->value().getStringView(), &header_timeout)) {
+          timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
+        }
+        request_headers.removeEnvoyUpstreamRequestTimeoutMs();
+      }
+    }
+  } else {
+    Http::HeaderEntry* header_timeout_entry = request_headers.EnvoyUpstreamRequestTimeoutMs();
+    if (header_timeout_entry) {
+      if (absl::SimpleAtoi(header_timeout_entry->value().getStringView(), &header_timeout)) {
+        timeout.global_timeout_ = std::chrono::milliseconds(header_timeout);
+      }
+      request_headers.removeEnvoyUpstreamRequestTimeoutMs();
+    }
+  }
 
   // See if there is a per try/retry timeout. If it's >= global we just ignore it.
@@ -484,7 +511,8 @@ Http::FilterHeadersStatus Filter::decodeHeaders(Http::HeaderMap& headers, bool e
   hedging_params_ = FilterUtility::finalHedgingParams(*route_entry_, headers);
 
   timeout_ = FilterUtility::finalTimeout(*route_entry_, headers, !config_.suppress_envoy_headers_,
-                                         grpc_request_, hedging_params_.hedge_on_per_try_timeout_);
+                                         grpc_request_, hedging_params_.hedge_on_per_try_timeout_,
+                                         config_.respect_expected_rq_timeout_);
 
   // If this header is set with any value, use an alternate response code on timeout
   if (headers.EnvoyUpstreamRequestTimeoutAltResponse()) {
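Tying off the review thread above: the agreed semantics are that the incoming x-envoy-expected-rq-timeout-ms header only seeds the global timeout, while the value advertised upstream should always reflect the timeout the router actually enforces, i.e. the per-try timeout when one is set. The snippet below is a hypothetical illustration of that rule, not Envoy's implementation; TimeoutData here is a stand-in struct and expectedTimeoutToAdvertise is an invented helper name.

  #include <chrono>
  #include <cstdint>

  // Stand-in for Router::FilterUtility::TimeoutData (illustration only).
  struct TimeoutData {
    std::chrono::milliseconds global_timeout{0};
    std::chrono::milliseconds per_try_timeout{0}; // 0 means "not set"
  };

  // Hypothetical helper: the value an ingress Envoy would write into
  // x-envoy-expected-rq-timeout-ms on the outgoing request. Per the review
  // discussion, this tracks the enforced timeout, so a configured per-try
  // timeout takes precedence over the global timeout.
  uint64_t expectedTimeoutToAdvertise(const TimeoutData& timeout) {
    const auto enforced =
        timeout.per_try_timeout.count() > 0 ? timeout.per_try_timeout : timeout.global_timeout;
    return static_cast<uint64_t>(enforced.count());
  }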
12 changes: 8 additions & 4 deletions source/common/router/router.h
@@ -140,7 +140,8 @@ class FilterUtility {
    */
   static TimeoutData finalTimeout(const RouteEntry& route, Http::HeaderMap& request_headers,
                                   bool insert_envoy_expected_request_timeout_ms, bool grpc_request,
-                                  bool per_try_timeout_hedging_enabled);
+                                  bool per_try_timeout_hedging_enabled,
+                                  bool respect_expected_rq_timeout);
 
   /**
    * Determine the final hedging settings after applying randomized behavior.
@@ -161,12 +162,14 @@ class FilterConfig {
                Stats::Scope& scope, Upstream::ClusterManager& cm, Runtime::Loader& runtime,
                Runtime::RandomGenerator& random, ShadowWriterPtr&& shadow_writer,
                bool emit_dynamic_stats, bool start_child_span, bool suppress_envoy_headers,
+               bool respect_expected_rq_timeout,
                const Protobuf::RepeatedPtrField<std::string>& strict_check_headers,
                TimeSource& time_source, Http::Context& http_context)
       : scope_(scope), local_info_(local_info), cm_(cm), runtime_(runtime),
         random_(random), stats_{ALL_ROUTER_STATS(POOL_COUNTER_PREFIX(scope, stat_prefix))},
         emit_dynamic_stats_(emit_dynamic_stats), start_child_span_(start_child_span),
-        suppress_envoy_headers_(suppress_envoy_headers), http_context_(http_context),
+        suppress_envoy_headers_(suppress_envoy_headers),
+        respect_expected_rq_timeout_(respect_expected_rq_timeout), http_context_(http_context),
         stat_name_pool_(scope_.symbolTable()), retry_(stat_name_pool_.add("retry")),
         zone_name_(stat_name_pool_.add(local_info_.zoneName())),
         empty_stat_name_(stat_name_pool_.add("")), shadow_writer_(std::move(shadow_writer)),
@@ -186,8 +189,8 @@ class FilterConfig {
                  context.runtime(), context.random(), std::move(shadow_writer),
                  PROTOBUF_GET_WRAPPED_OR_DEFAULT(config, dynamic_stats, true),
                  config.start_child_span(), config.suppress_envoy_headers(),
-                 config.strict_check_headers(), context.api().timeSource(),
-                 context.httpContext()) {
+                 config.respect_expected_rq_timeout(), config.strict_check_headers(),
+                 context.api().timeSource(), context.httpContext()) {
     for (const auto& upstream_log : config.upstream_log()) {
       upstream_logs_.push_back(AccessLog::AccessLogFactory::fromProto(upstream_log, context));
     }
@@ -207,6 +210,7 @@ class FilterConfig {
   const bool emit_dynamic_stats_;
   const bool start_child_span_;
   const bool suppress_envoy_headers_;
+  const bool respect_expected_rq_timeout_;
   // TODO(xyu-stripe): Make this a bitset to keep cluster memory footprint down.
   HeaderVectorPtr strict_check_headers_;
   std::list<AccessLog::InstanceSharedPtr> upstream_logs_;