per-worker listener and watchdog stats #8263

Merged
merged 11 commits into from
Sep 20, 2019
5 changes: 3 additions & 2 deletions api/envoy/config/bootstrap/v2/bootstrap.proto
@@ -202,13 +202,14 @@ message ClusterManager {

// Envoy process watchdog configuration. When configured, this monitors for
// nonresponsive threads and kills the process after the configured thresholds.
// See the :ref:`watchdog documentation <operations_performance_watchdog>` for more information.
message Watchdog {
// The duration after which Envoy counts a nonresponsive thread in the
// *server.watchdog_miss* statistic. If not specified the default is 200ms.
// *watchdog_miss* statistic. If not specified the default is 200ms.
google.protobuf.Duration miss_timeout = 1;

// The duration after which Envoy counts a nonresponsive thread in the
// *server.watchdog_mega_miss* statistic. If not specified the default is
// *watchdog_mega_miss* statistic. If not specified the default is
// 1000ms.
google.protobuf.Duration megamiss_timeout = 2;

20 changes: 20 additions & 0 deletions docs/root/configuration/listeners/stats.rst
@@ -32,6 +32,26 @@ Every listener has a statistics tree rooted at *listener.<address>.* with the fo
ssl.sigalgs.<sigalg>, Counter, Total successful TLS connections that used signature algorithm <sigalg>
ssl.versions.<version>, Counter, Total successful TLS connections that used protocol version <version>

.. _config_listener_stats_per_handler:

Per-handler Listener Stats
--------------------------

Every listener additionally has a statistics tree rooted at *listener.<address>.<handler>.* which
contains *per-handler* statistics. As described in the
:ref:`threading model <arch_overview_threading>` documentation, Envoy has a threading model which
includes the *main thread* as well as a number of *worker threads* which are controlled by the
:option:`--concurrency` option. Along these lines, *<handler>* is equal to *main_thread*,
*worker_0*, *worker_1*, etc. These statistics can be used to look for per-handler/worker imbalance
on either accepted or active connections.

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

downstream_cx_total, Counter, Total connections on this handler.
downstream_cx_active, Gauge, Total active connections on this handler.
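
As a rough illustration of how these per-handler stats might be used to spot worker imbalance, the sketch below scans a snapshot of stat names and values and reports the spread of *downstream_cx_active* across the *worker_N* handlers of one listener. The `HandlerSpread` struct, the `workerSpread` function, and the example listener prefix are hypothetical and not part of Envoy; only the stat name layout *listener.<address>.<handler>.downstream_cx_active* comes from the text above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <map>
#include <string>

// Hypothetical result type: spread of active connections across workers.
struct HandlerSpread {
  std::size_t workers = 0;  // number of worker_N handlers seen
  uint64_t min_active = 0;  // smallest downstream_cx_active among workers
  uint64_t max_active = 0;  // largest downstream_cx_active among workers
};

// Hypothetical helper (not an Envoy API). `stats` maps full stat names to
// values; `listener_prefix` is assumed to look like "listener.<address>.".
HandlerSpread workerSpread(const std::map<std::string, uint64_t>& stats,
                           const std::string& listener_prefix) {
  assert(!listener_prefix.empty());
  const std::string head = listener_prefix + "worker_";
  const std::string suffix = ".downstream_cx_active";

  HandlerSpread out;
  uint64_t lo = std::numeric_limits<uint64_t>::max();
  uint64_t hi = 0;
  for (const auto& entry : stats) {
    const std::string& name = entry.first;
    // Keep only "<prefix>worker_<n>.downstream_cx_active" entries.
    if (name.size() <= head.size() + suffix.size()) continue;
    if (name.compare(0, head.size(), head) != 0) continue;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) continue;
    lo = std::min(lo, entry.second);
    hi = std::max(hi, entry.second);
    ++out.workers;
  }
  if (out.workers > 0) {
    out.min_active = lo;
    out.max_active = hi;
  }
  return out;
}
```

A large gap between `max_active` and `min_active` would suggest that connections are not evenly distributed across workers; the *main_thread* handler is deliberately skipped in this sketch.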

Listener manager
----------------

3 changes: 3 additions & 0 deletions docs/root/intro/version_history.rst
@@ -60,6 +60,9 @@ Version history
* router check tool: add flag for only printing results of failed tests.
* router check tool: add support for outputting missing tests in the detailed coverage report.
* server: added a post initialization lifecycle event, in addition to the existing startup and shutdown events.
* server: added :ref:`per-handler listener stats <config_listener_stats_per_handler>` and
:ref:`per-worker watchdog stats <operations_performance_watchdog>` to help diagnose event
loop imbalance and general performance issues.
* thrift_proxy: fix crashing bug on invalid transport/protocol framing
* tls: added verification of IP address SAN fields in certificates against configured SANs in the
* tracing: added support to the Zipkin reporter for sending list of spans as Zipkin JSON v2 and protobuf message over HTTP.
25 changes: 23 additions & 2 deletions docs/root/operations/performance.rst
@@ -34,8 +34,8 @@ to true.
the wire individually because the statsd protocol doesn't have any way to represent a histogram
summary. Be aware that this can be a very large volume of data.

Statistics
----------
Event loop statistics
---------------------

The event dispatcher for the main thread has a statistics tree rooted at *server.dispatcher.*, and
the event dispatcher for each worker thread has a statistics tree rooted at
@@ -49,3 +49,24 @@ the event dispatcher for each worker thread has a statistics tree rooted at
poll_delay_us, Histogram, Polling delays in microseconds

Note that any auxiliary threads are not included here.

.. _operations_performance_watchdog:

Watchdog
--------

In addition to event loop statistics, Envoy also includes a configurable
:ref:`watchdog <envoy_api_field_config.bootstrap.v2.Bootstrap.watchdog>` system that can increment
statistics when Envoy is not responsive and optionally kill the server. The statistics are useful
for understanding at a high level whether Envoy's event loop is not responsive either because it is
doing too much work, blocking, or not being scheduled by the OS.

The watchdog emits statistics in both the *server.* and *server.<thread_name>.* trees.
*<thread_name>* is equal to *main_thread*, *worker_0*, *worker_1*, etc.

.. csv-table::
:header: Name, Type, Description
:widths: 1, 1, 2

watchdog_miss, Counter, Number of standard misses
watchdog_mega_miss, Counter, Number of mega misses
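
To make the relationship between the two timeouts and the two stat trees concrete, here is a minimal sketch of which counters a watchdog check could increment for a given thread. It assumes the documented defaults (200ms for *miss_timeout*, 1000ms for *megamiss_timeout*), assumes the two thresholds are evaluated independently with a strict "greater than" comparison, and uses hypothetical names (`WatchdogTimeouts`, `statsToIncrement`) that are not Envoy APIs.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// Hypothetical holder for the two thresholds, initialized to the documented
// defaults of miss_timeout and megamiss_timeout.
struct WatchdogTimeouts {
  std::chrono::milliseconds miss{200};
  std::chrono::milliseconds megamiss{1000};
};

// Hypothetical helper: returns the names of the counters to increment for a
// thread that has not checked in for `since_last_touch`. Each event is
// reported in both the "server." and "server.<thread_name>." trees.
std::vector<std::string> statsToIncrement(const std::string& thread_name,
                                          std::chrono::milliseconds since_last_touch,
                                          const WatchdogTimeouts& timeouts) {
  assert(timeouts.miss <= timeouts.megamiss);
  std::vector<std::string> names;
  const std::string per_thread = "server." + thread_name + ".";
  // Assumption: the thresholds are checked independently, so a long stall
  // exceeds both and is reported as a miss and as a mega miss.
  if (since_last_touch > timeouts.miss) {
    names.push_back("server.watchdog_miss");
    names.push_back(per_thread + "watchdog_miss");
  }
  if (since_last_touch > timeouts.megamiss) {
    names.push_back("server.watchdog_mega_miss");
    names.push_back(per_thread + "watchdog_mega_miss");
  }
  return names;
}
```

Under these assumptions, a 300ms stall on *worker_0* would touch *server.watchdog_miss* and *server.worker_0.watchdog_miss*, which is what lets the per-thread tree point at the specific thread that stalled.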
17 changes: 9 additions & 8 deletions include/envoy/network/connection_handler.h
@@ -9,8 +9,6 @@
#include "envoy/network/listener.h"
#include "envoy/ssl/context.h"

#include "spdlog/spdlog.h"

namespace Envoy {
namespace Network {

@@ -84,6 +82,11 @@ class ConnectionHandler {
*/
virtual void enableListeners() PURE;

/**
* @return the stat prefix used for per-handler stats.
*/
virtual const std::string& statPrefix() PURE;

/**
* Used by ConnectionHandler to manage listeners.
*/
@@ -95,10 +98,12 @@ class ConnectionHandler {
* @return the tag value as configured.
*/
virtual uint64_t listenerTag() PURE;

/**
* @return the actual Listener object.
*/
virtual Listener* listener() PURE;

/**
* Destroy the actual Listener it wraps.
*/
@@ -111,8 +116,7 @@ class ConnectionHandler {
using ConnectionHandlerPtr = std::unique_ptr<ConnectionHandler>;

/**
* A registered factory interface to create different kinds of
* ActiveUdpListener.
* A registered factory interface to create different kinds of ActiveUdpListener.
*/
class ActiveUdpListenerFactory {
public:
@@ -123,16 +127,13 @@ class ActiveUdpListenerFactory {
* according to given config.
* @param parent is the owner of the created ActiveListener objects.
* @param dispatcher is used to create actual UDP listener.
* @param logger might not need to be passed in.
* TODO(danzh): investigate if possible to use statically defined logger in ActiveUdpListener
* implementation instead.
* @param config provides information needed to create ActiveUdpListener and
* UdpListener objects.
* @return the ActiveUdpListener created.
*/
virtual ConnectionHandler::ActiveListenerPtr
createActiveUdpListener(ConnectionHandler& parent, Event::Dispatcher& disptacher,
spdlog::logger& logger, Network::ListenerConfig& config) const PURE;
Network::ListenerConfig& config) const PURE;

/**
* @return true if the UDP passing through listener doesn't form stateful connections.
4 changes: 3 additions & 1 deletion include/envoy/server/guarddog.h
@@ -27,8 +27,10 @@ class GuardDog {
* stopWatching() method to remove it from the list of watched objects.
*
* @param thread_id a Thread::ThreadId containing the system thread id
* @param thread_name supplies the name of the thread which is used for per-thread miss stats.
*/
virtual WatchDogSharedPtr createWatchDog(Thread::ThreadId thread_id) PURE;
virtual WatchDogSharedPtr createWatchDog(Thread::ThreadId thread_id,
const std::string& thread_name) PURE;

/**
* Tell the GuardDog to forget about this WatchDog.
5 changes: 4 additions & 1 deletion include/envoy/server/worker.h
@@ -91,9 +91,12 @@ class WorkerFactory {
virtual ~WorkerFactory() = default;

/**
* @param overload_manager supplies the server's overload manager.
* @param worker_name supplies the name of the worker, used for per-worker stats.
* @return WorkerPtr a new worker.
*/
virtual WorkerPtr createWorker(OverloadManager& overload_manager) PURE;
virtual WorkerPtr createWorker(OverloadManager& overload_manager,
const std::string& worker_name) PURE;
};

} // namespace Server
1 change: 1 addition & 0 deletions source/common/common/logger.h
@@ -29,6 +29,7 @@ namespace Logger {
FUNCTION(client) \
FUNCTION(config) \
FUNCTION(connection) \
FUNCTION(conn_handler) \
FUNCTION(dubbo) \
FUNCTION(file) \
FUNCTION(filter) \
17 changes: 0 additions & 17 deletions source/common/event/BUILD
@@ -123,20 +123,3 @@ envoy_cc_library(
"//source/common/common:scope_tracker",
],
)

envoy_cc_library(
name = "dispatched_thread_lib",
srcs = ["dispatched_thread.cc"],
hdrs = ["dispatched_thread.h"],
external_deps = [
"event",
],
deps = [
":dispatcher_lib",
"//include/envoy/api:api_interface",
"//include/envoy/event:dispatcher_interface",
"//source/common/common:minimal_logger_lib",
"//source/common/common:thread_lib",
"//source/server:guarddog_lib",
],
)
41 changes: 0 additions & 41 deletions source/common/event/dispatched_thread.cc

This file was deleted.

67 changes: 0 additions & 67 deletions source/common/event/dispatched_thread.h

This file was deleted.

17 changes: 9 additions & 8 deletions source/extensions/quic_listeners/quiche/active_quic_listener.cc
@@ -11,29 +11,30 @@ namespace Envoy {
namespace Quic {

ActiveQuicListener::ActiveQuicListener(Event::Dispatcher& dispatcher,
Network::ConnectionHandler& parent, spdlog::logger& logger,
Network::ConnectionHandler& parent,
Network::ListenerConfig& listener_config,
const quic::QuicConfig& quic_config)
: ActiveQuicListener(dispatcher, parent,
dispatcher.createUdpListener(listener_config.socket(), *this), logger,
dispatcher.createUdpListener(listener_config.socket(), *this),
listener_config, quic_config) {}

ActiveQuicListener::ActiveQuicListener(Event::Dispatcher& dispatcher,
Network::ConnectionHandler& parent,
Network::UdpListenerPtr&& listener, spdlog::logger& logger,
Network::UdpListenerPtr&& listener,
Network::ListenerConfig& listener_config,
const quic::QuicConfig& quic_config)
: ActiveQuicListener(dispatcher, parent, std::make_unique<EnvoyQuicPacketWriter>(*listener),
std::move(listener), logger, listener_config, quic_config) {}
std::move(listener), listener_config, quic_config) {}

ActiveQuicListener::ActiveQuicListener(Event::Dispatcher& dispatcher,
Network::ConnectionHandler& parent,
std::unique_ptr<quic::QuicPacketWriter> writer,
Network::UdpListenerPtr&& listener, spdlog::logger& logger,
Network::UdpListenerPtr&& listener,
Network::ListenerConfig& listener_config,
const quic::QuicConfig& quic_config)
: Server::ConnectionHandlerImpl::ActiveListenerImplBase(std::move(listener), listener_config),
logger_(logger), dispatcher_(dispatcher), version_manager_(quic::CurrentSupportedVersions()) {
: Server::ConnectionHandlerImpl::ActiveListenerImplBase(parent, std::move(listener),
listener_config),
dispatcher_(dispatcher), version_manager_(quic::CurrentSupportedVersions()) {
quic::QuicRandom* const random = quic::QuicRandom::GetInstance();
random->RandBytes(random_seed_, sizeof(random_seed_));
crypto_config_ = std::make_unique<quic::QuicCryptoServerConfig>(
@@ -51,7 +52,7 @@ ActiveQuicListener::ActiveQuicListener(Event::Dispatcher& dispatcher,
}

void ActiveQuicListener::onListenerShutdown() {
ENVOY_LOG_TO_LOGGER(logger_, info, "Quic listener {} shutdown.", config_.name());
ENVOY_LOG(info, "Quic listener {} shutdown.", config_.name());
quic_dispatcher_->Shutdown();
}
