admin: Support /drain_listeners?graceful #11639

Merged
merged 12 commits into from
Jun 24, 2020
42 changes: 29 additions & 13 deletions docs/root/intro/arch_overview/operations/draining.rst
@@ -3,16 +3,42 @@
Draining
========

Draining is the process by which Envoy attempts to gracefully shed connections in response to
various events. Draining occurs at the following times:
In a few different scenarios, Envoy will attempt to gracefully shed connections. For instance,
during server shutdown, existing requests can be discouraged and listeners set to stop accepting,
to reduce the number of open connections when the server shuts down. Draining behaviour is defined
by the server options in addition to individual listener configs.

Draining occurs at the following times:

* The server is being :ref:`hot restarted <arch_overview_hot_restart>`.
* The server begins the graceful drain sequence via the :ref:`drain_listeners?graceful
<operations_admin_interface_drain>` admin endpoint.
* The server has been manually health check failed via the :ref:`healthcheck/fail
<operations_admin_interface_healthcheck_fail>` admin endpoint. See the :ref:`health check filter
<arch_overview_health_checking_filter>` architecture overview for more information.
* The server is being :ref:`hot restarted <arch_overview_hot_restart>`.
* Individual listeners are being modified or removed via :ref:`LDS
<arch_overview_dynamic_config_lds>`.

By default, the Envoy server will close listeners immediately on server shutdown. To drain listeners
for some duration of time prior to server shutdown, use :ref:`drain_listeners <operations_admin_interface_drain>`
before shutting down the server. The listeners will be directly stopped without any graceful draining behaviour,
and cease accepting new connections immediately.

To add a graceful drain period prior to listeners being closed, use the query parameter
:ref:`drain_listeners?graceful <operations_admin_interface_drain>`. By default, Envoy
will discourage requests for some period of time (as determined by :option:`--drain-time-s`).
The behaviour of request discouraging is determined by the drain manager.

Note that although draining is a per-listener concept, it must be supported at the network filter
level. Currently the only filters that support graceful draining are
:ref:`Redis <config_network_filters_redis_proxy>`,
:ref:`Mongo <config_network_filters_mongo_proxy>`,
and :ref:`HTTP connection manager <config_http_conn_man>`.

By default, the :ref:`HTTP connection manager <config_http_conn_man>` filter will
add "Connection: close" to HTTP1 responses, send HTTP2 GOAWAY, and terminate connections
on request completion (after the delayed close period).

Each :ref:`configured listener <arch_overview_listeners>` has a :ref:`drain_type
<envoy_v3_api_enum_config.listener.v3.Listener.DrainType>` setting which controls when draining takes place. The currently
supported values are:
@@ -27,13 +53,3 @@ modify_only
It may be desirable to set *modify_only* on egress listeners so they only drain during
modifications while relying on ingress listener draining to perform full server draining when
attempting to do a controlled shutdown.

Note that although draining is a per-listener concept, it must be supported at the network filter
level. Currently the only filters that support graceful draining are
:ref:`HTTP connection manager <config_http_conn_man>`,
:ref:`Redis <config_network_filters_redis_proxy>`, and
:ref:`Mongo <config_network_filters_mongo_proxy>`.

Listeners can also be stopped via :ref:`drain_listeners <operations_admin_interface_drain>`. In this case,
they are directly stopped (without going through the actual draining process) on worker threads,
so that they will not accept any new requests.
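
The documentation above says that request discouraging during the graceful drain period is decided by the drain manager, based on :option:`--drain-time-s` and the drain strategy. The following is a minimal sketch of what such a decision could look like, not Envoy's actual implementation. The names `ToyDrainStrategy` and `toyDrainClose` are hypothetical, the random value is passed in by the caller so the function stays deterministic, and the health-check-failed case is left out.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical names for illustration only; these are not Envoy types.
enum class ToyDrainStrategy { Immediate, Gradual };

// Decide whether a connection should be discouraged (for example, closed
// once the current request completes) at a given point in the drain window.
//   draining     - whether the drain sequence has started
//   elapsed      - time since the drain sequence started (non-negative)
//   drain_time   - length of the graceful drain window (cf. --drain-time-s)
//   random_value - caller-supplied random number
inline bool toyDrainClose(ToyDrainStrategy strategy, bool draining,
                          std::chrono::seconds elapsed,
                          std::chrono::seconds drain_time,
                          std::uint64_t random_value) {
  assert(drain_time.count() > 0);
  assert(elapsed.count() >= 0);

  // Nothing is discouraged before the drain sequence starts.
  if (!draining) {
    return false;
  }
  // Immediate: every connection is discouraged as soon as draining starts.
  if (strategy == ToyDrainStrategy::Immediate) {
    return true;
  }
  // Past the end of the window, everything is discouraged.
  if (elapsed >= drain_time) {
    return true;
  }
  // Gradual: discourage with probability roughly elapsed / drain_time.
  const std::uint64_t window = static_cast<std::uint64_t>(drain_time.count());
  const std::uint64_t threshold = static_cast<std::uint64_t>(elapsed.count());
  return (random_value % window) < threshold;
}
```

Under these assumptions, an immediate strategy with a very long drain time discourages existing connections right away while the listeners themselves stay open until the drain window ends, which is the combination the `AdminGracefulDrain` integration test in this change exercises.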
6 changes: 6 additions & 0 deletions docs/root/operations/admin.rst
@@ -258,6 +258,12 @@ modify different aspects of the server:
:ref:`Listener <envoy_v3_api_msg_config.listener.v3.Listener>` is used to determine whether a listener
is inbound or outbound.

.. http:post:: /drain_listeners?graceful

When draining listeners, enter a graceful drain period prior to closing listeners.
The behaviour and duration of this period are configurable via server options or CLI
(:option:`--drain-time-s` and :option:`--drain-strategy`).

.. attention::

This operation directly stops the matched listeners on workers. Once listeners in a given
5 changes: 5 additions & 0 deletions include/envoy/server/drain_manager.h
@@ -21,6 +21,11 @@ class DrainManager : public Network::DrainDecision {
*/
virtual void startDrainSequence(std::function<void()> drain_complete_cb) PURE;

/**
* @return whether the drain sequence has started.
*/
virtual bool draining() const PURE;

Review comment (Member): Add doc comments for the interface method.

Reply (Contributor Author): Done, thanks for the quick turnaround.

/**
* Invoked in the newly launched primary process to begin the parent shutdown sequence. At the end
* of the sequence the previous primary process will be terminated.
16 changes: 15 additions & 1 deletion source/server/admin/listeners_handler.cc
@@ -16,10 +16,24 @@ ListenersHandler::ListenersHandler(Server::Instance& server) : HandlerContextBas
Http::Code ListenersHandler::handlerDrainListeners(absl::string_view url, Http::ResponseHeaderMap&,
Buffer::Instance& response, AdminStream&) {
const Http::Utility::QueryParams params = Http::Utility::parseQueryString(url);

ListenerManager::StopListenersType stop_listeners_type =
params.find("inboundonly") != params.end() ? ListenerManager::StopListenersType::InboundOnly
: ListenerManager::StopListenersType::All;

const bool graceful = params.find("graceful") != params.end();
if (graceful) {
// Ignore calls to /drain_listeners?graceful if the drain sequence has
// already started.
if (!server_.drainManager().draining()) {
server_.drainManager().startDrainSequence([this, stop_listeners_type]() {
server_.listenerManager().stopListeners(stop_listeners_type);
});
}
} else {
server_.listenerManager().stopListeners(stop_listeners_type);
}

response.add("OK\n");
return Http::Code::OK;
}
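
The control flow of the handler above can be sketched in isolation. This is a simplified model under stated assumptions, not the Envoy code: `ToyDrainManager`, `ToyListenerManager`, `hasQueryFlag`, `handleDrainListeners` and `fireDrainTimer` are hypothetical stand-ins, and the drain timer is replaced by an explicit method call. It shows the two properties the change relies on: a graceful request defers `stopListeners` to the drain-complete callback, and a repeated graceful request is ignored once `draining()` returns true.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// All names below are hypothetical stand-ins, not Envoy classes.
enum class ToyStopType { All, InboundOnly };

struct ToyListenerManager {
  int stop_calls = 0;
  ToyStopType last_type = ToyStopType::All;
  void stopListeners(ToyStopType type) {
    ++stop_calls;
    last_type = type;
  }
};

class ToyDrainManager {
public:
  bool draining() const { return draining_; }

  // Marks the drain as started and keeps the callback until the window ends.
  void startDrainSequence(std::function<void()> drain_complete_cb) {
    assert(!draining_); // Callers are expected to check draining() first.
    draining_ = true;
    drain_complete_cb_ = std::move(drain_complete_cb);
  }

  // Stand-in for the drain timer firing at the end of the drain window.
  void fireDrainTimer() {
    if (drain_complete_cb_) {
      std::function<void()> cb = std::move(drain_complete_cb_);
      drain_complete_cb_ = nullptr;
      cb();
    }
  }

private:
  bool draining_ = false;
  std::function<void()> drain_complete_cb_;
};

// Simplified lookup: true if `flag` appears as a query parameter key.
inline bool hasQueryFlag(const std::string& url, const std::string& flag) {
  const std::size_t query_start = url.find('?');
  if (query_start == std::string::npos) {
    return false;
  }
  std::size_t pos = query_start + 1;
  while (pos <= url.size()) {
    std::size_t end = url.find('&', pos);
    if (end == std::string::npos) {
      end = url.size();
    }
    const std::string param = url.substr(pos, end - pos);
    if (param.substr(0, param.find('=')) == flag) {
      return true;
    }
    pos = end + 1;
  }
  return false;
}

inline void handleDrainListeners(const std::string& url, ToyDrainManager& drain_manager,
                                 ToyListenerManager& listener_manager) {
  const ToyStopType type =
      hasQueryFlag(url, "inboundonly") ? ToyStopType::InboundOnly : ToyStopType::All;
  if (hasQueryFlag(url, "graceful")) {
    // Ignore repeated graceful requests once the drain sequence has started.
    if (!drain_manager.draining()) {
      drain_manager.startDrainSequence(
          [&listener_manager, type]() { listener_manager.stopListeners(type); });
    }
  } else {
    // Without the graceful flag, listeners are stopped right away.
    listener_manager.stopListeners(type);
  }
}
```

In this model, a later plain `/drain_listeners` request still stops the listeners immediately even while a graceful drain is pending, which matches what both integration tests in this change do to finish the shutdown.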
1 change: 1 addition & 0 deletions source/server/drain_manager_impl.h
@@ -28,6 +28,7 @@ class DrainManagerImpl : Logger::Loggable<Logger::Id::main>, public DrainManager

// Server::DrainManager
void startDrainSequence(std::function<void()> drain_complete_cb) override;
bool draining() const override { return draining_; }
void startParentShutdownSequence() override;

private:
100 changes: 100 additions & 0 deletions test/integration/drain_close_integration_test.cc
@@ -75,6 +75,106 @@ TEST_P(DrainCloseIntegrationTest, DrainCloseImmediate) {

TEST_P(DrainCloseIntegrationTest, AdminDrain) { testAdminDrain(downstreamProtocol()); }

TEST_P(DrainCloseIntegrationTest, AdminGracefulDrain) {
drain_strategy_ = Server::DrainStrategy::Immediate;
drain_time_ = std::chrono::seconds(999);
initialize();
fake_upstreams_[0]->set_allow_unexpected_disconnects(true);
uint32_t http_port = lookupPort("http");
codec_client_ = makeHttpConnection(http_port);

auto response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
waitForNextUpstreamRequest(0);
upstream_request_->encodeHeaders(default_response_headers_, true);
response->waitForEndStream();
ASSERT_TRUE(response->complete());
EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));
// The request is completed but the connection remains open.
EXPECT_TRUE(codec_client_->connected());

// Invoke /drain_listeners with graceful drain
BufferingStreamDecoderPtr admin_response = IntegrationUtil::makeSingleRequest(
lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");

// With a 999s graceful drain period, the listener should still be open.
EXPECT_EQ(test_server_->counter("listener_manager.listener_stopped")->value(), 0);

response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
waitForNextUpstreamRequest(0);
upstream_request_->encodeHeaders(default_response_headers_, true);
response->waitForEndStream();
ASSERT_TRUE(response->complete());
EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));

// Connections will terminate on request complete
ASSERT_TRUE(codec_client_->waitForDisconnect());
if (downstream_protocol_ == Http::CodecClient::Type::HTTP2) {
EXPECT_TRUE(codec_client_->sawGoAway());
} else {
EXPECT_EQ("close", response->headers().getConnectionValue());
}

// New connections can still be made.
auto second_codec_client_ = makeRawHttpConnection(makeClientConnection(http_port));
EXPECT_TRUE(second_codec_client_->connected());

// Invoke /drain_listeners and shut down listeners.
second_codec_client_->rawConnection().close(Network::ConnectionCloseType::NoFlush);
admin_response = IntegrationUtil::makeSingleRequest(
lookupPort("admin"), "POST", "/drain_listeners", "", downstreamProtocol(), version_);
EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");

test_server_->waitForCounterEq("listener_manager.listener_stopped", 1);
EXPECT_NO_THROW(Network::TcpListenSocket(
Network::Utility::getAddressWithPort(*Network::Test::getCanonicalLoopbackAddress(version_),
http_port),
nullptr, true));
}
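
The protocol-specific expectation at the end of the test above (GOAWAY for HTTP2, a `Connection: close` response header for HTTP1) can be summarized with a small standalone sketch. The types `ToyProtocol` and `ToyDrainSignal` and the function `toyDrainSignalFor` are hypothetical and only model which signal is expected; they are not part of Envoy or its test framework.

```cpp
#include <cassert>
#include <string>

// Hypothetical types for illustration; not part of Envoy.
enum class ToyProtocol { Http1, Http2 };

struct ToyDrainSignal {
  bool send_goaway = false;      // HTTP2: a GOAWAY frame is sent.
  std::string connection_header; // HTTP1: value of the Connection response
                                 // header; empty means it is not added.
};

// Returns how a draining connection is expected to tell the client that it
// will be closed once the current request completes.
inline ToyDrainSignal toyDrainSignalFor(ToyProtocol protocol, bool drain_close) {
  ToyDrainSignal signal;
  if (drain_close) {
    if (protocol == ToyProtocol::Http2) {
      signal.send_goaway = true;
    } else {
      signal.connection_header = "close";
    }
  }
  // At most one of the two mechanisms is ever used for a connection.
  assert(!(signal.send_goaway && !signal.connection_header.empty()));
  return signal;
}
```

When `drain_close` is false, neither signal is produced, which corresponds to the first request in the test completing while the connection stays open.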

TEST_P(DrainCloseIntegrationTest, RepeatedAdminGracefulDrain) {
// Use the default gradual probabilistic DrainStrategy so drainClose()
// behaviour isn't conflated with whether the drain sequence has started.
drain_time_ = std::chrono::seconds(999);
initialize();
fake_upstreams_[0]->set_allow_unexpected_disconnects(true);
uint32_t http_port = lookupPort("http");
codec_client_ = makeHttpConnection(http_port);

auto response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
waitForNextUpstreamRequest(0);
upstream_request_->encodeHeaders(default_response_headers_, true);
response->waitForEndStream();

// Invoke /drain_listeners with graceful drain
BufferingStreamDecoderPtr admin_response = IntegrationUtil::makeSingleRequest(
lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");
EXPECT_EQ(test_server_->counter("listener_manager.listener_stopped")->value(), 0);

admin_response = IntegrationUtil::makeSingleRequest(
lookupPort("admin"), "POST", "/drain_listeners?graceful", "", downstreamProtocol(), version_);
EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");

response = codec_client_->makeHeaderOnlyRequest(default_request_headers_);
waitForNextUpstreamRequest(0);
upstream_request_->encodeHeaders(default_response_headers_, true);
response->waitForEndStream();
ASSERT_TRUE(response->complete());
EXPECT_THAT(response->headers(), Http::HttpStatusIs("200"));

admin_response = IntegrationUtil::makeSingleRequest(
lookupPort("admin"), "POST", "/drain_listeners", "", downstreamProtocol(), version_);
EXPECT_EQ(admin_response->headers().Status()->value().getStringView(), "200");

test_server_->waitForCounterEq("listener_manager.listener_stopped", 1);
EXPECT_NO_THROW(Network::TcpListenSocket(
Network::Utility::getAddressWithPort(*Network::Test::getCanonicalLoopbackAddress(version_),
http_port),
nullptr, true));
}

INSTANTIATE_TEST_SUITE_P(Protocols, DrainCloseIntegrationTest,
testing::ValuesIn(HttpProtocolIntegrationTest::getProtocolTestParams(
{Http::CodecClient::Type::HTTP1, Http::CodecClient::Type::HTTP2},
1 change: 1 addition & 0 deletions test/mocks/server/mocks.h
@@ -192,6 +192,7 @@ class MockDrainManager : public DrainManager {

// Server::DrainManager
MOCK_METHOD(bool, drainClose, (), (const));
MOCK_METHOD(bool, draining, (), (const));
MOCK_METHOD(void, startDrainSequence, (std::function<void()> completion));
MOCK_METHOD(void, startParentShutdownSequence, ());

3 changes: 3 additions & 0 deletions test/server/drain_manager_impl_test.cc
@@ -126,7 +126,10 @@ TEST_P(DrainManagerImplTest, DrainDeadlineProbability) {
EXPECT_TRUE(drain_manager.drainClose());
EXPECT_CALL(server_, healthCheckFailed()).WillRepeatedly(Return(false));
EXPECT_FALSE(drain_manager.drainClose());
EXPECT_FALSE(drain_manager.draining());

drain_manager.startDrainSequence([] {});
EXPECT_TRUE(drain_manager.draining());

if (drain_gradually) {
// random() should be called when elapsed time < drain timeout