add x-envoy-immediate-health-check-fail header support #1570

Merged
merged 3 commits into from
Aug 30, 2017
1 change: 1 addition & 0 deletions docs/configuration/cluster_manager/cluster_stats.rst
@@ -78,6 +78,7 @@ If health check is configured, the cluster has an additional statistics tree roo
attempt, Counter, Number of health checks
success, Counter, Number of successful health checks
failure, Counter, Number of immediately failed health checks (e.g. HTTP 503) as well as network failures
passive_failure, Counter, Number of health check failures due to passive events (e.g. x-envoy-immediate-health-check-fail)
network_failure, Counter, Number of health check failures due to network error
verify_cluster, Counter, Number of health checks that attempted cluster name verification
healthy, Gauge, Number of healthy members
7 changes: 7 additions & 0 deletions docs/configuration/http_filters/health_check_filter.rst
@@ -17,6 +17,13 @@ Health check filter :ref:`architecture overview <arch_overview_health_checking_f
}
}

Note that the filter will automatically fail health checks and set the
:ref:`x-envoy-immediate-health-check-fail
<config_http_filters_router_x-envoy-immediate-health-check-fail>` header if the
:ref:`/healthcheck/fail <operations_admin_interface_healthcheck_fail>` admin endpoint has been
called. (The :ref:`/healthcheck/ok <operations_admin_interface_healthcheck_ok>` admin endpoint
reverses this behavior.)

pass_through_mode
*(required, boolean)* Specifies whether the filter operates in pass through mode or not.

13 changes: 13 additions & 0 deletions docs/configuration/http_filters/router_filter.rst
@@ -209,6 +209,19 @@ If the route utilizes :ref:`prefix_rewrite <config_http_conn_man_route_table_rou
Envoy will put the original path header in this header. This can be useful for logging and
debugging.

.. _config_http_filters_router_x-envoy-immediate-health-check-fail:

x-envoy-immediate-health-check-fail
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If the upstream host returns this header (set to any value), Envoy will immediately assume the
upstream host has failed :ref:`active health checking <arch_overview_health_checking>` (if the
cluster has been :ref:`configured <config_cluster_manager_cluster_hc>` for active health checking).
This can be used to fast fail an upstream host via standard data plane processing without waiting
for the next health check interval. The host can become healthy again via standard active health
checks. See the :ref:`health checking overview <arch_overview_health_checking>` for more
information.

.. _config_http_filters_router_stats:

Statistics
28 changes: 26 additions & 2 deletions docs/intro/arch_overview/health_checking.rst
@@ -21,7 +21,10 @@ unhealthy, successes required before marking a host healthy, etc.):
server can respond with anything other than PONG to cause an immediate active health check
failure.

Note that Envoy also supports passive health checking via :ref:`outlier detection
Passive health checking
-----------------------

Envoy also supports passive health checking via :ref:`outlier detection
<arch_overview_outlier_detection>`.

.. _arch_overview_health_checking_filter:
@@ -47,7 +50,28 @@ operation:
eventually consistent view of the health state of each upstream host without overwhelming the
local service with a large number of health check requests.

Health check filter :ref:`configuration <config_http_filters_health_check>`.
Further reading:

* Health check filter :ref:`configuration <config_http_filters_health_check>`.
* :ref:`/healthcheck/fail <operations_admin_interface_healthcheck_fail>` admin endpoint.
* :ref:`/healthcheck/ok <operations_admin_interface_healthcheck_ok>` admin endpoint.

Active health checking fast failure
-----------------------------------

When using active health checking along with passive health checking (:ref:`outlier detection
<arch_overview_outlier_detection>`), it is common to use a long health checking interval to avoid a
large amount of active health checking traffic. In this case, it is still useful to be able to
quickly drain an upstream host when using the :ref:`/healthcheck/fail
<operations_admin_interface_healthcheck_fail>` admin endpoint. To support this, the :ref:`router
filter <config_http_filters_router>` will respond to the :ref:`x-envoy-immediate-health-check-fail
<config_http_filters_router_x-envoy-immediate-health-check-fail>` header. If this header is set by
an upstream host, Envoy will immediately mark the host as having failed active health checking. Note
that this only occurs if the host's cluster has active health checking :ref:`configured
<config_cluster_manager_cluster_hc>`. The :ref:`health checking filter
<config_http_filters_health_check>` will automatically set this header if Envoy has been marked as
failed via the :ref:`/healthcheck/fail <operations_admin_interface_healthcheck_fail>` admin
endpoint.
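
The producer side of this flow can be sketched in a few lines. This is only an illustrative model under stated assumptions, not Envoy's implementation: `DrainState`, `decorateResponse`, and the `std::map`-based `ResponseHeaders` are hypothetical names, and the sketch assumes the header is attached to every response that passes through the health checking filter once `/healthcheck/fail` has been called. The value `"true"` follows the `EnvoyImmediateHealthCheckFailValues.True` constant added in this change.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical header container; Envoy's real Http::HeaderMap is different.
typedef std::map<std::string, std::string> ResponseHeaders;

// Hypothetical model of a server's drain state as toggled by the admin endpoints.
class DrainState {
public:
  void healthcheckFail() { failed_ = true; } // models GET /healthcheck/fail
  void healthcheckOk() { failed_ = false; }  // models GET /healthcheck/ok

  // Models the health checking filter: while the server is marked as failed,
  // tag the response so that the calling Envoy can fast fail this host.
  void decorateResponse(ResponseHeaders& headers) const {
    if (failed_) {
      headers["x-envoy-immediate-health-check-fail"] = "true";
    }
  }

private:
  bool failed_{false};
};

// Usage: returns true if the header appears only while the server is draining.
inline bool exampleHeaderFollowsDrainState() {
  DrainState state;
  ResponseHeaders before, draining, after;
  state.decorateResponse(before);
  state.healthcheckFail();
  state.decorateResponse(draining);
  state.healthcheckOk();
  state.decorateResponse(after);
  return before.empty() && draining.size() == 1 && after.empty();
}
```

Because the receiving side only checks for the presence of the header, the value chosen here does not affect the outcome.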

.. _arch_overview_health_checking_identity:

4 changes: 4 additions & 0 deletions docs/operations/admin.rst
@@ -66,13 +66,17 @@ modify different aspects of the server.

Enable or disable the CPU profiler. Requires compiling with gperftools.

.. _operations_admin_interface_healthcheck_fail:

.. http:get:: /healthcheck/fail

Fail inbound health checks. This requires the use of the HTTP :ref:`health check filter
<config_http_filters_health_check>`. This is useful for draining a server prior to shutting it
down or doing a full restart. Invoking this command will universally fail health check requests
regardless of how the filter is configured (pass through, etc.).

.. _operations_admin_interface_healthcheck_ok:

.. http:get:: /healthcheck/ok

Negate the effect of :http:get:`/healthcheck/fail`. This requires the use of the HTTP
1 change: 1 addition & 0 deletions include/envoy/http/header_map.h
@@ -212,6 +212,7 @@ class HeaderEntry {
HEADER_FUNC(EnvoyExpectedRequestTimeoutMs) \
HEADER_FUNC(EnvoyExternalAddress) \
HEADER_FUNC(EnvoyForceTrace) \
HEADER_FUNC(EnvoyImmediateHealthCheckFail) \
HEADER_FUNC(EnvoyInternalRequest) \
HEADER_FUNC(EnvoyMaxRetries) \
HEADER_FUNC(EnvoyOriginalPath) \
7 changes: 7 additions & 0 deletions include/envoy/upstream/BUILD
@@ -34,10 +34,16 @@ envoy_cc_library(
deps = [":upstream_interface"],
)

envoy_cc_library(
name = "health_check_host_monitor_interface",
hdrs = ["health_check_host_monitor.h"],
)

envoy_cc_library(
name = "host_description_interface",
hdrs = ["host_description.h"],
deps = [
":health_check_host_monitor_interface",
":outlier_detection_interface",
"//include/envoy/network:address_interface",
"//include/envoy/stats:stats_macros",
@@ -78,6 +84,7 @@ envoy_cc_library(
name = "upstream_interface",
hdrs = ["upstream.h"],
deps = [
":health_check_host_monitor_interface",
":load_balancer_type_interface",
":resource_manager_interface",
"//include/envoy/common:callback",
27 changes: 27 additions & 0 deletions include/envoy/upstream/health_check_host_monitor.h
@@ -0,0 +1,27 @@
#pragma once

#include <memory>

namespace Envoy {
namespace Upstream {

/**
* A monitor for "passive" health check events that might happen on every thread. For example, if a
* special HTTP header is received, the data plane may decide to fast fail a host to avoid waiting
* for the full HC interval to elapse before determining that the host has failed active HC.
*/
class HealthCheckHostMonitor {
public:
virtual ~HealthCheckHostMonitor() {}

/**
* Mark the host as unhealthy. Note that this may not be immediate as events may need to be
* propagated between multiple threads.
*/
virtual void setUnhealthy() PURE;
};

typedef std::unique_ptr<HealthCheckHostMonitor> HealthCheckHostMonitorPtr;

} // namespace Upstream
} // namespace Envoy
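
To illustrate how this interface might be implemented, here is a minimal, self-contained sketch. It is not the implementation from this change: `FlagHealthCheckHostMonitor` and `exampleMarksUnhealthy` are hypothetical names, the interface is redeclared locally with `= 0` (which is assumed to be what Envoy's `PURE` macro expands to), and an atomic flag stands in for whatever cross-thread propagation the real health checker performs.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Local stand-in for the interface above, with `= 0` in place of PURE.
class HealthCheckHostMonitor {
public:
  virtual ~HealthCheckHostMonitor() {}
  virtual void setUnhealthy() = 0;
};

typedef std::unique_ptr<HealthCheckHostMonitor> HealthCheckHostMonitorPtr;

// Hypothetical implementation: records the event in an atomic flag so that
// any worker thread can call setUnhealthy() without additional locking.
class FlagHealthCheckHostMonitor : public HealthCheckHostMonitor {
public:
  explicit FlagHealthCheckHostMonitor(std::atomic<bool>& flag) : unhealthy_(flag) {}
  void setUnhealthy() override { unhealthy_.store(true); }

private:
  std::atomic<bool>& unhealthy_;
};

// Usage: callers such as the router only ever see the interface type.
inline bool exampleMarksUnhealthy() {
  std::atomic<bool> flag{false};
  HealthCheckHostMonitorPtr monitor(new FlagHealthCheckHostMonitor(flag));
  assert(!flag.load()); // nothing has been reported yet
  monitor->setUnhealthy();
  return flag.load();
}
```

A real monitor would likely hand the event to the main thread instead of flipping a flag in place, which is why the interface comment warns that the effect may not be immediate.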
2 changes: 1 addition & 1 deletion include/envoy/upstream/health_checker.h
@@ -36,7 +36,7 @@ class HealthChecker {
virtual void start() PURE;
};

typedef std::unique_ptr<HealthChecker> HealthCheckerPtr;
typedef std::shared_ptr<HealthChecker> HealthCheckerSharedPtr;

} // namespace Upstream
} // namespace Envoy
10 changes: 8 additions & 2 deletions include/envoy/upstream/host_description.h
@@ -5,6 +5,7 @@

#include "envoy/network/address.h"
#include "envoy/stats/stats_macros.h"
#include "envoy/upstream/health_check_host_monitor.h"
#include "envoy/upstream/outlier_detection.h"

namespace Envoy {
@@ -50,9 +51,14 @@ class HostDescription {
virtual const ClusterInfo& cluster() const PURE;

/**
* @return the host's outlier detection sink.
* @return the host's outlier detection monitor.
*/
virtual Outlier::DetectorHostSink& outlierDetector() const PURE;
virtual Outlier::DetectorHostMonitor& outlierDetector() const PURE;

/**
* @return the host's health checker monitor.
*/
virtual HealthCheckHostMonitor& healthChecker() const PURE;

/**
* @return the hostname associated with the host if any.
8 changes: 4 additions & 4 deletions include/envoy/upstream/outlier_detection.h
@@ -21,11 +21,11 @@ typedef std::shared_ptr<const HostDescription> HostDescriptionConstSharedPtr;
namespace Outlier {

/**
* Sink for per host data. Proxy filters should send pertinent data when available.
* Monitor for per host data. Proxy filters should send pertinent data when available.
*/
class DetectorHostSink {
class DetectorHostMonitor {
public:
virtual ~DetectorHostSink() {}
virtual ~DetectorHostMonitor() {}

/**
* @return the number of times this host has been ejected.
@@ -63,7 +63,7 @@ class DetectorHostSink {
virtual double successRate() const PURE;
};

typedef std::unique_ptr<DetectorHostSink> DetectorHostSinkPtr;
typedef std::unique_ptr<DetectorHostMonitor> DetectorHostMonitorPtr;
Review comment (Member): Yeah, I think this nomenclature is easier to parse for me.


/**
* Interface for an outlier detection engine. Uses per host data to determine which hosts in a
15 changes: 12 additions & 3 deletions include/envoy/upstream/upstream.h
@@ -13,6 +13,7 @@
#include "envoy/http/codec.h"
#include "envoy/network/connection.h"
#include "envoy/ssl/context.h"
#include "envoy/upstream/health_check_host_monitor.h"
#include "envoy/upstream/load_balancer_type.h"
#include "envoy/upstream/outlier_detection.h"
#include "envoy/upstream/resource_manager.h"
@@ -80,11 +81,19 @@ class Host : virtual public HostDescription {
virtual bool healthy() const PURE;

/**
* Set the host's outlier detector. Outlier detectors are assumed to be thread safe, however
* a new outlier detector must be installed before the host is used across threads. Thus,
* Set the host's health checker monitor. Monitors are assumed to be thread safe, however
* a new monitor must be installed before the host is used across threads. Thus,
* this routine should only be called on the main thread before the host is used across threads.
Review comment (Member): Maybe add an ASSERT verifying we're on the main thread in the implementation?

Reply (Member Author): There is no great way to do this because of how we use hosts in tests, etc. I think I'm going to skip this for now. I will make a note to look at it in a follow up.

*/
virtual void setOutlierDetector(Outlier::DetectorHostSinkPtr&& outlier_detector) PURE;
virtual void setHealthChecker(HealthCheckHostMonitorPtr&& health_checker) PURE;

/**
* Set the host's outlier detector monitor. Outlier detector monitors are assumed to be thread
* safe, however a new outlier detector monitor must be installed before the host is used across
* threads. Thus, this routine should only be called on the main thread before the host is used
* across threads.
*/
virtual void setOutlierDetector(Outlier::DetectorHostMonitorPtr&& outlier_detector) PURE;

/**
* @return the current load balancing weight of the host, in the range 1-100.
5 changes: 5 additions & 0 deletions source/common/http/headers.h
@@ -26,6 +26,7 @@ class HeaderValues {
const LowerCaseString EnvoyDownstreamServiceNode{"x-envoy-downstream-service-node"};
const LowerCaseString EnvoyExternalAddress{"x-envoy-external-address"};
const LowerCaseString EnvoyForceTrace{"x-envoy-force-trace"};
const LowerCaseString EnvoyImmediateHealthCheckFail{"x-envoy-immediate-health-check-fail"};
const LowerCaseString EnvoyInternalRequest{"x-envoy-internal"};
const LowerCaseString EnvoyMaxRetries{"x-envoy-max-retries"};
const LowerCaseString EnvoyOriginalPath{"x-envoy-original-path"};
@@ -89,6 +90,10 @@ class HeaderValues {
const std::string Json{"application/json"};
} ContentTypeValues;

struct {
const std::string True{"true"};
} EnvoyImmediateHealthCheckFailValues;

struct {
const std::string True{"true"};
} EnvoyInternalRequestValues;
4 changes: 4 additions & 0 deletions source/common/router/router.cc
@@ -455,6 +455,10 @@ void Filter::onUpstreamHeaders(Http::HeaderMapPtr&& headers, bool end_stream) {
upstream_request_->upstream_host_->outlierDetector().putHttpResponseCode(
Http::Utility::getResponseStatus(*headers));

if (headers->EnvoyImmediateHealthCheckFail() != nullptr) {
upstream_request_->upstream_host_->healthChecker().setUnhealthy();
}

if (retry_state_ &&
retry_state_->shouldRetry(headers.get(), Optional<Http::StreamResetReason>(),
[this]() -> void { doRetry(); }) &&
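
The check added in this hunk can be modeled in isolation as follows. This is a simplified sketch and not the router's code: `ResponseHeaders` (a `std::map` with lowercase keys), `CountingMonitor`, and the free function `onUpstreamHeaders` are hypothetical stand-ins introduced for illustration. The point it demonstrates is that only the presence of the header matters, not its value.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-in for Upstream::HealthCheckHostMonitor.
class HealthCheckHostMonitor {
public:
  virtual ~HealthCheckHostMonitor() {}
  virtual void setUnhealthy() = 0;
};

// Hypothetical monitor that counts how often it was asked to fail the host.
class CountingMonitor : public HealthCheckHostMonitor {
public:
  void setUnhealthy() override { ++calls_; }
  int calls_{0};
};

// Hypothetical header container; keys are assumed to be lowercase already.
typedef std::map<std::string, std::string> ResponseHeaders;

// Mirrors the router check: if the upstream response carries the header,
// with any value, report the host to its health check monitor.
inline void onUpstreamHeaders(const ResponseHeaders& headers, HealthCheckHostMonitor& monitor) {
  if (headers.find("x-envoy-immediate-health-check-fail") != headers.end()) {
    monitor.setUnhealthy();
  }
}

// Usage: one normal response followed by one response from a draining host.
inline int exampleCallsAfterTwoResponses() {
  CountingMonitor monitor;
  ResponseHeaders ok;
  ok[":status"] = "200";
  onUpstreamHeaders(ok, monitor);
  assert(monitor.calls_ == 0); // no header, so the host is left alone

  ResponseHeaders draining = ok;
  draining["x-envoy-immediate-health-check-fail"] = "any value";
  onUpstreamHeaders(draining, monitor);
  return monitor.calls_;
}
```

Note that the response itself is still processed as usual in this model; the header only affects the health state recorded for the host.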