Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IProxyStateLookup and ClusterState.DestinationsState.AvailableDestinations - behavior change after upgrade from 1.1.1 -> 2.1.0 #2602

Closed
4865783a5d opened this issue Sep 10, 2024 · 2 comments
Assignees
Labels
Type: Bug Something isn't working

Comments

@4865783a5d
Copy link

4865783a5d commented Sep 10, 2024

Describe the bug

We are using IProxyStateLookup in a Asp.Net Core Healthcheck to check if a specific cluster is healthy using the cluster id as query param in a Azure environment with Azure Front Door and YARP as regional proxy. After upgrading from .NET 6 and YARP 1.1.1 to .NET 8 and YARP 2.1.0, AvailableDestinations contains an unhealthy destination, where in the previous version, that would not be the case. We checked breaking changes by going through the release notes and couldn't find anything.

To Reproduce

The configuration uses a simple json-server

"Configuration": {
  "Routes": {
    "json-server": {
      "ClusterId": "json-server",
      "Match": {
        "Hosts": [ "localhost" ]
      }
    }
  },
  "Clusters": {
    "json-server": {
      "HealthCheck": {
        "Active": {
          "Enabled": "true",
          "Interval": "00:00:05",
          "Timeout": "00:00:03",
          "Policy": "ConsecutiveFailures",
          "Path": "/health"
        }
      },
      "Destinations": {
        "local": {
          "Address": "http://localhost:3000"
        }
      },
      "Metadata": {
        "ConsecutiveFailuresHealthPolicy.Threshold": "2"
      }
    }
  }
}

When calling the exposed healthcheck with http://localhost:7014/health?clusterId=json-server we get the following results (The cluster is not running at that time):

With YARP 2.1.0 and .NET8
image

With YARP 1.1.0 and .NET6
image

Further technical details

Relevant health check code in IHealthcheck service

if (_httpContextAccessor.HttpContext == null)
    throw new InvalidOperationException("HttpContext is null.");

var httpRequestFeature = _httpContextAccessor.HttpContext.GetHttpRequestFeature();
var clusters = _proxyStateLookup.GetClusters();

if (!clusters.Any())
    return Task.FromResult(new HealthCheckResult(HealthStatus.Unhealthy, HealthProbeErrorDescriber.NoConfig));

if (TryGetClusterId(httpRequestFeature.QueryString, out var clusterId))
{
    if (!_proxyStateLookup.TryGetCluster(clusterId, out var clusterState))
    {
        return Task.FromResult(new HealthCheckResult(HealthStatus.Unhealthy,
            HealthProbeErrorDescriber.FormatClusterNotFound(clusterId)));
    }

    return Task.FromResult(clusterState.DestinationsState.AvailableDestinations.Count > 0
        ? HealthCheckResult.Healthy()
        : new(HealthStatus.Unhealthy, HealthProbeErrorDescriber.FormatUnhealthyCluster(clusterId)));
}
  • Yarp.ReverseProxy 2.1.0, .NET 8
  • Windows
@4865783a5d 4865783a5d added the Type: Bug Something isn't working label Sep 10, 2024
@4865783a5d
Copy link
Author

4865783a5d commented Sep 10, 2024

After reviewing the YARP source and re-checking release notes, I found that #2171 is the reason we're seeing this behavior.

Relevant code in HealthyOrPanicDestinationsPolicy:

 public override IReadOnlyList<DestinationState> GetAvailalableDestinations(ClusterConfig config, IReadOnlyList<DestinationState> allDestinations)
 {
     var availableDestinations = base.GetAvailalableDestinations(config, allDestinations);
     return availableDestinations.Count > 0 ? availableDestinations : allDestinations;
 }

While understandable, it has quite an impact on the operating side of things. We typically want to be informed if a cluster is unhealthy for the sake of alerting / monitoring. I personally would have classified that as breaking change :) - as in, it broke our monitoring.

We fixed the issue with a IProxyConfigFilter to set the 1.1.1 default policy on all cluster configs.

@MihaZupan
Copy link
Member

Sorry about that, you can see that the PR #2171 was marked as a breaking change, but we forgot to call out breaking changes in the 2.1.0 preview 1 release notes.

I've updated the release notes for that release now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants