Consolidated health check service #1723

samsp-msft · 2022-05-16T20:24:13Z

What should we add or change to make your life better?

YARP can do active health checks against backend servers to make sure that they are able to respond successfully to requests. With a number of YARP proxy instances and a large number of backends, each YARP instance needs to ping each backend for health checks. As the number of servers grows, the health check traffic grows with the product of the proxy count and the backend count.
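To make that growth concrete, here is a rough sketch of the probe arithmetic, assuming every checker probes every destination once per interval. The function name and the example counts are illustrative assumptions, not figures from YARP.

```python
def probes_per_interval(checkers: int, destinations: int) -> int:
    """Active health probes sent in one check interval when every
    checker probes every destination exactly once."""
    return checkers * destinations


# Hypothetical deployment: 20 proxies and 500 destinations.
per_proxy = probes_per_interval(checkers=20, destinations=500)    # every proxy probes
consolidated = probes_per_interval(checkers=2, destinations=500)  # two dedicated checkers
print(per_proxy, consolidated)  # prints: 10000 1000
```

With consolidation the probe count depends on the small, fixed number of checkers rather than on how many proxies are deployed.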

For scale-out scenarios, YARP should have the ability to run the health check as a separate service. This service should be runnable on a limited number of servers, which will perform the health checks and then provide the data to other YARP instances.

Concept

YARP includes a consolidated health check service which can be configured to run on a server or two. This service will talk to the configuration server #1710 to understand the cluster and destination definitions. It will perform the active health checks against the servers based on the URL definitions in config.

The configuration service will act as a broker, enabling instances to discover each other:

- The health check service will register with the configuration service, telling it of its presence.
- YARP proxy servers will get the address of the health check service from the configuration service. They will use its active health check data to determine which destinations are unhealthy.
- YARP proxies will continue to use passive health checks against backends. If/when they determine that a backend is likely unhealthy, they will notify the consolidated health check service.
  - The health check service will make its own determination of whether the destination is healthy, so that one bad YARP instance can't DoS the entire system.
- If there are multiple health check servers, then they will inform each other of health changes. Similar to notifications from destination servers, each one will make its own determination of the actual health status.
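The "make its own determination" rule above could be sketched roughly as follows. This is a minimal illustration, not YARP code: the class, its method names, the injected `probe` callable, and the consecutive-failure threshold are all assumptions made for the example.

```python
from typing import Callable, Dict


class ConsolidatedHealthChecker:
    """Hypothetical sketch: reports from proxies or peer checkers are
    treated as hints that trigger a probe, never as the verdict."""

    def __init__(self, probe: Callable[[str], bool], failure_threshold: int = 2):
        self._probe = probe                  # returns True if the destination answered OK
        self._threshold = failure_threshold  # consecutive failed probes before marking unhealthy
        self._failures: Dict[str, int] = {}
        self._healthy: Dict[str, bool] = {}

    def check(self, destination: str) -> bool:
        """Run one active probe and update the recorded state."""
        if self._probe(destination):
            self._failures[destination] = 0
            self._healthy[destination] = True
        else:
            count = self._failures.get(destination, 0) + 1
            self._failures[destination] = count
            if count >= self._threshold:
                self._healthy[destination] = False
        return self.is_healthy(destination)

    def report_suspect(self, destination: str) -> bool:
        """Called when a proxy or peer suspects a destination.
        The report itself changes nothing; it only triggers a probe."""
        return self.check(destination)

    def is_healthy(self, destination: str) -> bool:
        # Destinations are assumed healthy until probes say otherwise.
        return self._healthy.get(destination, True)


# Usage with a fake probe: only b2 is actually down.
down = {"https://b2.example"}
checker = ConsolidatedHealthChecker(probe=lambda d: d not in down)
checker.report_suspect("https://b1.example")  # probe succeeds, b1 stays healthy
checker.report_suspect("https://b2.example")  # first failed probe, below threshold
checker.report_suspect("https://b2.example")  # second failed probe, b2 marked unhealthy
```

Because state only changes on the checker's own probe results, a misbehaving proxy that floods it with reports can add probe traffic but cannot by itself take a healthy destination out of rotation.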

Proposal

Consolidated health checks will depend on having a configuration server. This feature's value is mostly in scale-out scenarios where there are multiple YARP instances. The configuration server will provide the orchestration that lets YARP instances know about the health check server, and lets the health check server know about the configuration of the clusters and destinations.

The configuration server will include health check configuration data as part of the configuration it exposes via REST endpoints, along with notifications in either direction about a specific destination's health.

rwkarg · 2022-05-16T23:33:21Z

It may be interesting to allow different implementations of the Config Service or Health Check service.

For example, we're planning on using Orleans to run health checks in the cluster (but only one instance of each health check) without needing to manage extra services. We haven't yet implemented, but could implement further segmentation by availability zone or other failure zone by adding that to the Orleans grain id for the health check.
Ex. all us-west2-a LBs hit the <route>/us-west2-a/HealthCheck grain and similar for us-west2-b -> <route>/us-west2-b/HealthCheck. If the segmentation (AZ in this example) is flexible then any segmentation can be provided for a particular LB.
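As a rough illustration of the segmentation described above, the key could be built from the route plus whatever segment applies to a given LB. This is plain string handling, not the Orleans API; the function name, the `orders` route, and the LB names are hypothetical.

```python
def health_check_grain_key(route: str, segment: str) -> str:
    """Build a key of the form <route>/<segment>/HealthCheck.
    The segment is free-form: an availability zone, a region, or any
    other failure zone chosen for a particular load balancer."""
    return f"{route}/{segment}/HealthCheck"


# Hypothetical load balancers tagged with their availability zone.
lb_zones = {"lb-1": "us-west2-a", "lb-2": "us-west2-a", "lb-3": "us-west2-b"}
keys = {lb: health_check_grain_key("orders", zone) for lb, zone in lb_zones.items()}
# lb-1 and lb-2 share one key, so they would share one health check instance.
```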

An Orleans based implementation may not be provided out of the box, but we'd like to be able to utilize a similar implementation behind an abstraction if possible.

One of the reasons we have integrated with Orleans is that it is also useful for things like the Config Server (have a single point of computation for merged routes/clusters from multiple k8s clusters) and we also use it for rate limiting across a collection of load balancers. If you're running proxies at scale, you have to solve the distributed system problems somehow; Orleans is how we're solving it without needing to implement the whole distributed foundation from scratch.

samsp-msft · 2022-05-17T19:54:18Z

Consider #267, where the destinations have scheduled downtime: the health check system could push that data out to proxies.

samsp-msft · 2023-01-30T20:58:27Z

Feedback from a 1P team: for HTTP/2, the health checks ensure that there are warm connections to all destinations.

Tratcher · 2023-01-30T23:54:41Z

That's not HTTP/2 specific; it also helps for HTTP/1.x.

samsp-msft added the "Type: Idea" label (This issue is a high-level idea for discussion.) May 16, 2022

karelz added this to the YARP 2.0.0 milestone May 17, 2022

samsp-msft mentioned this issue May 17, 2022

Health checking should allow for health data to come from a 3rd service #304

Closed

adityamandaleeka mentioned this issue May 19, 2022

API for reporting health from external sources #267

Closed

samsp-msft added this to YARP 2.x Jun 9, 2022

samsp-msft moved this to 📋 Backlog in YARP 2.x Jun 9, 2022

rwkarg mentioned this issue Sep 7, 2022

[Proposal] Provide ASP.NET service providers written in Orleans dotnet/orleans#7774

Open

adityamandaleeka modified the milestones: YARP 2.0.0, YARP 2.x Jan 9, 2023

adityamandaleeka modified the milestones: YARP 2.1, YARP 2.x Nov 7, 2023
