-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xDS: gRPC connection failure shouldn't make Envoy continue startup #8152
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,10 +38,7 @@ HttpSubscriptionImpl::HttpSubscriptionImpl( | |
void HttpSubscriptionImpl::start(const std::set<std::string>& resource_names) { | ||
if (init_fetch_timeout_.count() > 0) { | ||
init_fetch_timeout_timer_ = dispatcher_.createTimer([this]() -> void { | ||
ENVOY_LOG(warn, "REST config: initial fetch timed out for", path_); | ||
stats_.init_fetch_timeout_.inc(); | ||
callbacks_.onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::FetchTimedout, | ||
nullptr); | ||
handleFailure(Config::ConfigUpdateFailureReason::FetchTimedout, nullptr); | ||
}); | ||
init_fetch_timeout_timer_->enableTimer(init_fetch_timeout_); | ||
} | ||
|
@@ -77,8 +74,7 @@ void HttpSubscriptionImpl::parseResponse(const Http::Message& response) { | |
try { | ||
MessageUtil::loadFromJson(response.bodyAsString(), message, validation_visitor_); | ||
} catch (const EnvoyException& e) { | ||
ENVOY_LOG(warn, "REST config JSON conversion error: {}", e.what()); | ||
handleFailure(nullptr); | ||
handleFailure(Config::ConfigUpdateFailureReason::UpdateRejected, &e); | ||
return; | ||
} | ||
try { | ||
|
@@ -87,23 +83,45 @@ void HttpSubscriptionImpl::parseResponse(const Http::Message& response) { | |
stats_.version_.set(HashUtil::xxHash64(request_.version_info())); | ||
stats_.update_success_.inc(); | ||
} catch (const EnvoyException& e) { | ||
ENVOY_LOG(warn, "REST config update rejected: {}", e.what()); | ||
stats_.update_rejected_.inc(); | ||
callbacks_.onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::UpdateRejected, &e); | ||
handleFailure(Config::ConfigUpdateFailureReason::UpdateRejected, &e); | ||
} | ||
} | ||
|
||
void HttpSubscriptionImpl::onFetchComplete() {} | ||
|
||
void HttpSubscriptionImpl::onFetchFailure(const EnvoyException* e) { | ||
disableInitFetchTimeoutTimer(); | ||
ENVOY_LOG(warn, "REST config update failed: {}", e != nullptr ? e->what() : "fetch failure"); | ||
handleFailure(e); | ||
void HttpSubscriptionImpl::onFetchFailure(Config::ConfigUpdateFailureReason reason, | ||
const EnvoyException* e) { | ||
handleFailure(reason, e); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought onFetchFailure is always a ConnectionFailure? If so, we do not have to change onFetchFailure signature and just call handleFailure with ConnectionFailure reason here? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no, please see old code in |
||
} | ||
|
||
void HttpSubscriptionImpl::handleFailure(const EnvoyException* e) { | ||
stats_.update_failure_.inc(); | ||
callbacks_.onConfigUpdateFailed(Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure, e); | ||
void HttpSubscriptionImpl::handleFailure(Config::ConfigUpdateFailureReason reason, | ||
const EnvoyException* e) { | ||
|
||
switch (reason) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code here is exactly same as the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMHO this is not necessary to use a common function to handle the errors for HTTP and gRPC subscription. It's possible to have different error handling for them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure why would it be different once the message is received, but up to you. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I guess it's not a good idea to define a new interface for error handling or use template to generalize the type. Maybe they can share a common base class which define the error handling, but for this PR I want to limit the change scope :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be nice to dedupe here, but agree we can push this to a later PR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please hold off on anything like this until #7293 is merged. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (The reason being that GrpcMuxSubscriptionImpl is about to be entirely replaced) |
||
case Config::ConfigUpdateFailureReason::ConnectionFailure: | ||
ENVOY_LOG(warn, "REST update for {} failed", path_); | ||
stats_.update_failure_.inc(); | ||
break; | ||
case Config::ConfigUpdateFailureReason::FetchTimedout: | ||
ENVOY_LOG(warn, "REST config: initial fetch timeout for {}", path_); | ||
stats_.init_fetch_timeout_.inc(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: reset timer if not nullptr, or simply disableInitFetchTimeoutTimer() ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
disableInitFetchTimeoutTimer(); | ||
break; | ||
case Config::ConfigUpdateFailureReason::UpdateRejected: | ||
ASSERT(e != nullptr); | ||
ENVOY_LOG(warn, "REST config for {} rejected: {}", path_, e->what()); | ||
stats_.update_rejected_.inc(); | ||
disableInitFetchTimeoutTimer(); | ||
break; | ||
} | ||
|
||
if (reason == Envoy::Config::ConfigUpdateFailureReason::ConnectionFailure) { | ||
// New requests will be sent again. | ||
// If init_fetch_timeout is non-zero, server will continue startup after it timeout | ||
return; | ||
} | ||
|
||
callbacks_.onConfigUpdateFailed(reason, e); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But should we fix that case now? Otherwise of http subscriptions even if the connection failed, it will call onConfigUpdateFailed method which is incorrect/inconsistent? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suggest use an another PR to fix "//test/integration:hotrestart_test". I'm not fully understand that test case right now, no sure what's the best way to modify that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PR #8162 created for fixing "//test/integration:hotrestart_test", that's a small config change, just remove dynamic_resources, after that accepted, I will update code here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. PR #8162 merged. This is also updated. |
||
} | ||
|
||
void HttpSubscriptionImpl::disableInitFetchTimeoutTimer() { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: call this method if reason != ConnectionFailure may be more readable than return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting const value at left side of
==
is considered a better style. It doesn't cause any difficulty for readability usually, because reader always need to see both side of==
to understand the condition.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. My comment was not clear may be. I am not talking about const on left of ==,
I am suggesting the following instead if return - Not a big deal though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO positive logic is preferred, the comment in
if
block describe is helpful for understanding what's going on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@l8huang can you actually switch this to
reason == the const value
? I agree that what you have there is "better style" for the reason you give, but it's actually quite jarring as most of the Envoy code base doesn't do this, and the "conform to local practices" style argument wins out IMHO.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will do