Change default policy to HealthyOrPanic #2171

Merged on Jun 28, 2023 (1 commit)
8 changes: 4 additions & 4 deletions docs/docfx/articles/dests-health-checks.md
@@ -252,7 +252,7 @@ endpoints.MapReverseProxy(proxyPipeline =>
- `ReactivationPeriod` - period after which an unhealthy destination's passive health state is reset to `Unknown` and it starts receiving traffic again. Default value is `null` which means the period will be set by a `IPassiveHealthCheckPolicy`

### Built-in policies
-There is currently one built-in passive health check policy - [`TransportFailureRateHealthPolicy`](xref:Yarp.ReverseProxy.Health.TransportFailureRateHealthPolicyOptions). It calculates the proxied requests failure rate for each destination and marks it as unhealthy if the specified limit is exceeded. Rate is calculated as a percentage of failured requests to the total number of request proxied to a destination in the given period of time. Failed and total counters are tracked in a sliding time window which means that only the recent readings fitting in the window are taken into account.
+There is currently one built-in passive health check policy - [`TransportFailureRateHealthPolicy`](xref:Yarp.ReverseProxy.Health.TransportFailureRateHealthPolicyOptions). It calculates the proxied requests failure rate for each destination and marks it as unhealthy if the specified limit is exceeded. Rate is calculated as a percentage of failed requests to the total number of request proxied to a destination in the given period of time. Failed and total counters are tracked in a sliding time window which means that only the recent readings fitting in the window are taken into account.
There are two sets of policy parameters defined globally and on per cluster level.
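
The failure-rate calculation described above can be illustrated with a small, self-contained sketch. It is not YARP's `TransportFailureRateHealthPolicy`: the `SlidingFailureRate` class and its members are hypothetical, it assumes timestamps are recorded in non-decreasing order, and it is not thread-safe. It keeps only readings that still fit in the window and compares failed/total against a limit such as the `RateLimit` value in `(0,1)`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of a sliding-window failure rate; not YARP's implementation.
// Assumes timestamps are recorded in non-decreasing order; not thread-safe.
public sealed class SlidingFailureRate
{
    private readonly TimeSpan _window;
    private readonly Queue<(DateTime Time, bool Failed)> _samples = new Queue<(DateTime Time, bool Failed)>();
    private int _failedCount;

    public SlidingFailureRate(TimeSpan window) => _window = window;

    // Records the outcome of one proxied request observed at 'time'.
    public void Record(DateTime time, bool failed)
    {
        _samples.Enqueue((time, failed));
        if (failed)
        {
            _failedCount++;
        }
        Evict(time);
    }

    // Failed requests divided by total requests within the window; 0 when the window is empty.
    public double Rate(DateTime now)
    {
        Evict(now);
        return _samples.Count == 0 ? 0.0 : (double)_failedCount / _samples.Count;
    }

    // True when the current rate exceeds the limit (e.g. a RateLimit value in (0,1)).
    public bool IsUnhealthy(DateTime now, double rateLimit) => Rate(now) > rateLimit;

    // Drops readings that are no longer inside the window (now - window, now].
    private void Evict(DateTime now)
    {
        var cutoff = now - _window;
        while (_samples.Count > 0 && _samples.Peek().Time <= cutoff)
        {
            if (_samples.Dequeue().Failed)
            {
                _failedCount--;
            }
        }
    }
}
```

For example, with a 60-second window, two failures out of four requests give a rate of 0.5, which exceeds a limit of 0.3; once the oldest failure slides out of the window, the rate drops to 1/3.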

Global parameters are set via the options mechanism using `TransportFailureRateHealthPolicyOptions` type with the following properties:
@@ -272,7 +272,7 @@ services.Configure<TransportFailureRateHealthPolicyOptions>(o =>
```

Cluster-specific parameters are set in the cluster's metadata as follows:
-`TransportFailureRateHealthPolicy.RateLimit` - failure rate limit for a destination to be marked as unhealhty. The value is in range `(0,1)`. Default value is provided by the global `DefaultFailureRateLimit` parameter.
+`TransportFailureRateHealthPolicy.RateLimit` - failure rate limit for a destination to be marked as unhealthy. The value is in range `(0,1)`. Default value is provided by the global `DefaultFailureRateLimit` parameter.

### Design
The main component is [PassiveHealthCheckMiddleware](xref:Yarp.ReverseProxy.Health.PassiveHealthCheckMiddleware) sitting in the request pipeline and analyzing responses returned by destinations. For each response from a destination belonging to a cluster with enabled passive health checks, `PassiveHealthCheckMiddleware` invokes an [IPassiveHealthCheckPolicy](xref:Yarp.ReverseProxy.Health.IPassiveHealthCheckPolicy) specified for the cluster. The policy analyzes the given response, evaluates a new destination's passive health state and calls [IDestinationHealthUpdater](xref:Yarp.ReverseProxy.Health.IDestinationHealthUpdater) to actually update [DestinationHealthState.Passive](xref:Yarp.ReverseProxy.Model.DestinationHealthState.Passive) value. The update happens asynchronously in the background and doesn't block the request pipeline. When a destination gets marked as unhealthy, it stops receiving new requests until it gets reactivated after a configured period. Reactivation means the destination's `DestinationHealthState.Passive` state is reset from `Unhealthy` to `Unknown` and the cluster's list of healthy destinations is rebuilt to include it. A reactivation is scheduled by `IDestinationHealthUpdater` right after setting the destination's `DestinationHealthState.Passive` to `Unhealthy`.
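
A minimal sketch of the reactivation flow, assuming a single destination and a caller-supplied callback that stands in for rebuilding the cluster's available destinations. `PassiveHealth` and `ReactivationSketch` are hypothetical names rather than YARP types, and the real `IDestinationHealthUpdater` may schedule reactivation differently.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the passive health values; not YARP's DestinationHealth.
public enum PassiveHealth { Unknown, Healthy, Unhealthy }

// Hypothetical sketch of "mark unhealthy, then reactivate after a period"; not IDestinationHealthUpdater.
public sealed class ReactivationSketch
{
    private readonly Action _rebuildAvailableDestinations;
    private volatile PassiveHealth _state = PassiveHealth.Unknown;

    public ReactivationSketch(Action rebuildAvailableDestinations) =>
        _rebuildAvailableDestinations = rebuildAvailableDestinations;

    public PassiveHealth State => _state;

    // Marks the destination Unhealthy, rebuilds the available list, and schedules the reset.
    // The returned task completes after the reset back to Unknown.
    public Task MarkUnhealthyAsync(TimeSpan reactivationPeriod)
    {
        _state = PassiveHealth.Unhealthy;
        _rebuildAvailableDestinations();
        return ReactivateAfterAsync(reactivationPeriod);
    }

    // Resets the state to Unknown after the delay so the destination receives traffic again.
    private async Task ReactivateAfterAsync(TimeSpan delay)
    {
        await Task.Delay(delay);
        _state = PassiveHealth.Unknown;
        _rebuildAvailableDestinations();
    }
}
```

In this sketch the request path can leave the returned task unawaited, in line with the description above that the update does not block the pipeline; awaiting it is mainly useful in tests.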
@@ -329,10 +329,10 @@ public class FirstUnsuccessfulResponseHealthPolicy : IPassiveHealthCheckPolicy
## Available destination collection
Destinations health state is used to determine which of them are eligible for receiving proxied requests. Each cluster maintains its own list of available destinations on `AvailableDestinations` property of the [ClusterDestinationState](xref:Yarp.ReverseProxy.Model.ClusterDestinationsState) type. That list gets rebuilt when any destination's health state changes. The [IClusterDestinationsUpdater](xref:Yarp.ReverseProxy.Health.IClusterDestinationsUpdater) controls that process and calls an [IAvailableDestinationsPolicy](xref:Yarp.ReverseProxy.Health.IAvailableDestinationsPolicy) configured on the cluster to actually choose the available destinations from the all cluster's destinations. There are the following built-in policies provided and custom ones can be implemented if necessary.

-- `HealthyAndUnknown` - inspects each `DestinationState` and adds it on the available destination list if all of the following statements are TRUE. If no destinations are available then requests will get a 503 error. This is the default policy.
+- `HealthyAndUnknown` - inspects each `DestinationState` and adds it on the available destination list if all of the following statements are TRUE. If no destinations are available then requests will get a 503 error.
- Active health checks are disabled on the cluster OR `DestinationHealthState.Active != DestinationHealth.Unhealthy`
- Passive health checks are disabled on the cluster OR `DestinationHealthState.Passive != DestinationHealth.Unhealthy`
-- `HealthyOrPanic` - calls `HealthyAndUnknown` policy at first to get the available destinations. If none of them are returned from this call, it marks all cluster's destinations as available.
+- `HealthyOrPanic` - calls `HealthyAndUnknown` policy at first to get the available destinations. If none of them are returned from this call, it marks all cluster's destinations as available. This is the default policy.

**NOTE**: An available destination policy configured on a cluster will be always called regardless of if any health check is enabled on the given cluster. The health state of a disabled health check is set to `Unknown`.
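
The two built-in policies and the new default can be summarized in a short sketch. This is a simplified model, not YARP's `IAvailableDestinationsPolicy` implementation: `Health`, `Destination`, and `AvailableDestinationsSketch` are hypothetical, and a disabled health check is modeled as `Unknown` (per the note above), so it never excludes a destination. `Resolve` mirrors the fallback changed in this PR, where an empty policy name now selects `HealthyOrPanic`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for DestinationHealth.
public enum Health { Unknown, Healthy, Unhealthy }

// Hypothetical destination with its active and passive health states.
// A disabled health check is modeled as Unknown.
public sealed record Destination(string Id, Health Active, Health Passive);

public static class AvailableDestinationsSketch
{
    private static readonly Dictionary<string, Func<IReadOnlyList<Destination>, IReadOnlyList<Destination>>> Policies =
        new Dictionary<string, Func<IReadOnlyList<Destination>, IReadOnlyList<Destination>>>
        {
            ["HealthyAndUnknown"] = HealthyAndUnknown,
            ["HealthyOrPanic"] = HealthyOrPanic,
        };

    // Keeps a destination unless either check reports it Unhealthy.
    public static IReadOnlyList<Destination> HealthyAndUnknown(IReadOnlyList<Destination> all) =>
        all.Where(d => d.Active != Health.Unhealthy && d.Passive != Health.Unhealthy).ToList();

    // Falls back to every destination when HealthyAndUnknown returns none.
    public static IReadOnlyList<Destination> HealthyOrPanic(IReadOnlyList<Destination> all)
    {
        var available = HealthyAndUnknown(all);
        return available.Count > 0 ? available : all;
    }

    // Mirrors the changed fallback: an empty policy name selects HealthyOrPanic.
    public static Func<IReadOnlyList<Destination>, IReadOnlyList<Destination>> Resolve(string policyName)
    {
        var name = string.IsNullOrEmpty(policyName) ? "HealthyOrPanic" : policyName;
        return Policies.TryGetValue(name, out var policy)
            ? policy
            : throw new ArgumentException($"Unknown available destinations policy '{name}'.", nameof(policyName));
    }
}
```

When every destination is marked unhealthy, `HealthyAndUnknown` leaves nothing to route to (hence the 503), while `HealthyOrPanic` keeps sending traffic to all destinations; the usual rationale for such a "panic" fallback is that stale or wrong health data should not take a whole cluster offline. This is presumably also why the updated test pins `HealthyAndUnknown` explicitly, so that it can keep asserting `ServiceUnavailable` after ten failures.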

2 changes: 1 addition & 1 deletion src/ReverseProxy/Configuration/ConfigValidator.cs
@@ -482,7 +482,7 @@ private void ValidateHealthChecks(IList<Exception> errors, ClusterConfig cluster
if (string.IsNullOrEmpty(availableDestinationsPolicy))
{
// The default.
- availableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyAndUnknown;
+ availableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyOrPanic;
}

if (!_availableDestinationsPolicies.ContainsKey(availableDestinationsPolicy))

4 changes: 2 additions & 2 deletions src/ReverseProxy/Health/ClusterDestinationsUpdater.cs
@@ -73,7 +73,7 @@ private void UpdateInternal(ClusterState cluster, IReadOnlyList<DestinationState
var config = cluster.Model.Config;
var destinationPolicy = _destinationPolicies.GetRequiredServiceById(
config.HealthCheck?.AvailableDestinationsPolicy,
- HealthCheckConstants.AvailableDestinations.HealthyAndUnknown);
+ HealthCheckConstants.AvailableDestinations.HealthyOrPanic);

var availableDestinations = destinationPolicy.GetAvailalableDestinations(config, allDestinations);

@@ -85,7 +85,7 @@ private void UpdateInternal(ClusterState cluster, IReadOnlyList<DestinationState
// The first thread (T1) finished a rebuild and left the lock while still holding the semaphore. The second thread (T2)
// waiting on the lock gets awaken, proceeds under the lock and begins the next rebuild. If at this exact moment
// the third thread (T3) enters this method and tries to acquire the semaphore, it will be debounced because
- // the semaphore's count is still 0. However, T2 could have already made some progress and didnt' observe updates made
+ // the semaphore's count is still 0. However, T2 could have already made some progress and didn't observe updates made
// by T3.
// By releasing the semaphore under the lock, we make sure that in the above situation T3 will proceed till the lock and
// its updates will be observed anyways.
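
The debounce described in this comment can be sketched in isolation. The class below is hypothetical, not YARP's `ClusterDestinationsUpdater`; it assumes a semaphore with two slots (one running rebuild plus one queued), which is consistent with the comment's statement that the count is 0 while T1 and T2 both hold it. A caller that finds no free slot returns immediately and relies on the queued rebuild to pick up its change.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the debounce pattern from the comment above; not YARP's code.
// Assumption: two slots, i.e. at most one running rebuild plus one queued caller.
public sealed class DebouncedRebuilder
{
    private readonly SemaphoreSlim _slots = new SemaphoreSlim(2, 2);
    private readonly object _gate = new object();
    private readonly Action _rebuild;

    public DebouncedRebuilder(Action rebuild) => _rebuild = rebuild;

    // Returns false when the call was debounced because a rebuild is already queued.
    public bool TryRebuild()
    {
        // Non-blocking attempt: if both slots are taken, skip this rebuild.
        if (!_slots.Wait(0))
        {
            return false;
        }

        lock (_gate)
        {
            try
            {
                _rebuild();
            }
            finally
            {
                // Release while still holding the lock, so the slot is free again
                // before the next queued caller can start its rebuild.
                _slots.Release();
            }
        }
        return true;
    }
}
```

Because the slot is released before the lock is left, at most one slot holder has started rebuilding at any time; whenever a caller is debounced, the other holder has not started yet and will observe that caller's update.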

@@ -12,6 +12,7 @@
using Yarp.ReverseProxy.Common;
using Yarp.ReverseProxy.Configuration;
using Yarp.ReverseProxy.Forwarder;
+using Yarp.ReverseProxy.Health;

namespace Yarp.ReverseProxy;

@@ -65,6 +66,7 @@ public async Task PassiveHealthChecksEnabled_MultipleDestinationFailures_ProxyRe
{
HealthCheck = new HealthCheckConfig
{
+ AvailableDestinationsPolicy = HealthCheckConstants.AvailableDestinations.HealthyAndUnknown,
Passive = new PassiveHealthCheckConfig
{
Enabled = true
@@ -85,6 +87,7 @@ await test.Invoke(async uri =>
using var response = await client.SendAsync(new HttpRequestMessage(HttpMethod.Get, uri));

Assert.Equal(i < 10 ? HttpStatusCode.BadGateway : HttpStatusCode.ServiceUnavailable, response.StatusCode);
+ await Task.Yield();
}
});
