Health checking should allow for health data to come from a 3rd service #304

Closed
samsp-msft opened this issue Jul 6, 2020 · 7 comments
Comments

@samsp-msft
Contributor

If you have a large deployment, with multiple instances of YARP fronting the same services across a number of destinations, then having each instance of YARP collect health data from all destinations can become a problem. In a mesh scenario the cost grows as N^2, because each service tries to determine the health of every other.

We should account for this from the start and support having a 3rd service supply health data. So rather than having to poll each destination, you can ask a different authority for health info, and it will supply it for all destinations.
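To make the scaling argument concrete, here is a rough back-of-the-envelope sketch. It assumes one probe per (checker, destination) pair per interval, and that with a central authority each proxy issues a single bulk query; the function names are hypothetical and only illustrate the arithmetic.

```python
def direct_probes(proxies: int, destinations: int) -> int:
    """Every proxy probes every destination itself."""
    return proxies * destinations


def mesh_probes(services: int) -> int:
    """Every service probes every other service: N * (N - 1), i.e. O(N^2)."""
    return services * (services - 1)


def central_probes(proxies: int, destinations: int) -> int:
    """Assumption: one authority probes each destination once, and each
    proxy makes one bulk query to that authority per interval."""
    return destinations + proxies


if __name__ == "__main__":
    print(direct_probes(10, 50))   # requests per interval without an authority
    print(central_probes(10, 50))  # requests per interval with an authority
    print(mesh_probes(20))         # requests per interval in a 20-service mesh
```

Under these assumptions the direct approach grows with the product of proxies and destinations, while the central approach grows with their sum.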

@samsp-msft samsp-msft added the Type: Idea This issue is a high-level idea for discussion. label Jul 6, 2020
@samsp-msft samsp-msft added this to the 1.0.0 milestone Jul 9, 2020
@samsp-msft
Contributor Author

Note: The design here needs to support this scenario. We don't need to write the code to integrate with those 3rd services, just provide the extensibility point.

@3GDXC

3GDXC commented Sep 17, 2020

@samsp-msft IMHO this could be done in a similar way to service discovery: the services register with the reverse proxy and send a heartbeat broadcast (udp/tcp or gRPC/REST). If a heartbeat isn't received within a given period, the reverse-proxy service informs the others that the destination is offline, either by updating a shared state store or by broadcasting a message to the nodes in the cluster, again via udp/tcp or gRPC/REST.
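A minimal sketch of the heartbeat idea, assuming an in-memory registry and leaving out the transport and the step that notifies other nodes. `HeartbeatRegistry`, its methods, and the timeout value are hypothetical names chosen for illustration, not part of YARP.

```python
import time
from typing import Callable, Dict, List


class HeartbeatRegistry:
    """Tracks the last heartbeat per destination (hypothetical helper)."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_seen: Dict[str, float] = {}

    def heartbeat(self, destination: str) -> None:
        """Record that a destination has just reported in."""
        self._last_seen[destination] = self._clock()

    def offline(self) -> List[str]:
        """Destinations whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(d for d, seen in self._last_seen.items()
                      if now - seen > self._timeout)
```

The list returned by `offline()` is what a proxy would then write to the shared state store or broadcast to its peers.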

@Tratcher
Member

One proposal is that this could be implemented as an IActiveHealthCheckMonitor that queried the central health store rather than individual destinations.
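As a language-agnostic sketch of that proposal: a monitor that makes one request to a central store per check cycle instead of probing each destination. This is not the YARP interface; `CentralStoreMonitor`, `fetch_all`, and the choice to treat destinations missing from the store as unhealthy are all assumptions made for illustration.

```python
from typing import Callable, Dict, Iterable


class CentralStoreMonitor:
    """Pulls health for all destinations from one authority (hypothetical)."""

    def __init__(self, fetch_all: Callable[[], Dict[str, bool]]) -> None:
        # fetch_all stands in for a call to the central health store.
        self._fetch_all = fetch_all
        self.health: Dict[str, bool] = {}

    def check(self, destinations: Iterable[str]) -> Dict[str, bool]:
        snapshot = self._fetch_all()  # one request, regardless of destination count
        # Assumption: a destination unknown to the store is treated as unhealthy.
        self.health = {d: snapshot.get(d, False) for d in destinations}
        return self.health
```

Whether an unknown destination should default to healthy or unhealthy is a policy decision that a real implementation would likely make configurable.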

@samsp-msft
Contributor Author

This is a scenario where every deployment is probably different and will use different mechanisms. The goal should be to provide the extensibility required so that the developer can easily integrate with whatever mechanisms have been chosen for that deployment.
Is it sufficient to define an interface like IActiveHealthCheckMonitor that can be implemented by the proxy consumer, or do we need a REST/gRPC-type endpoint for collecting health data?

@Tratcher
Member

Tratcher commented Nov 17, 2020

Fair question. We'll have to look at some health reporting systems to see if they're pull or push, what protocols they support, etc.

Does anybody have suggestions for centralized health systems we should look at interoping with?

@karelz karelz modified the milestones: YARP 1.0.0, Backlog Mar 29, 2021
@rwkarg
Contributor

rwkarg commented Mar 1, 2022

Our usage isn't yet at the scale to require this, but our plan is to have the YARP proxies also be part of an Orleans cluster. A Grain would then represent an Active Health Check against a (destination, source_availability_zone) pair. For example, dest_1/west-1a would be one Grain, and all proxy instances in west-1a would ask that Grain for dest_1's health; similarly for dest_1/west-1b, etc. Grain Observers could also be used to push health status transitions instead of having each proxy actively poll for status.

As the proxies scale up/down, the health checks would be distributed across the available instances and there would be only N Active checks for N services (per AZ).
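The sharing pattern described above can be sketched without Orleans as a single in-process table keyed by (destination, zone). This is only a stand-in for Grains: there is no clustering, placement, or observer push here, and `SharedHealthChecks` and `probe` are hypothetical names.

```python
from typing import Callable, Dict, Iterable, Tuple

Key = Tuple[str, str]  # (destination, source availability zone)


class SharedHealthChecks:
    """One active check per (destination, zone), shared by all proxies."""

    def __init__(self, probe: Callable[[str, str], bool]) -> None:
        self._probe = probe  # stand-in for the actual health probe
        self._status: Dict[Key, bool] = {}

    def run_checks(self, keys: Iterable[Key]) -> None:
        """Probe each (destination, zone) pair exactly once per cycle."""
        for destination, zone in keys:
            self._status[(destination, zone)] = self._probe(destination, zone)

    def status(self, destination: str, zone: str) -> bool:
        """Read the cached result; no probe is issued. Unknown pairs read as unhealthy."""
        return self._status.get((destination, zone), False)
```

Any number of proxies in a zone can call `status` without adding probes, which is the property the Grain-per-pair design is aiming for.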

This should be possible to implement very similarly to ActiveHealthCheckMonitor by swapping out the EntityActionScheduler and calling in to Grain instances instead.

It would require some extra setup if opting in to this (storage/config for Orleans clustering) but it does remove the need to manage a completely separate health checking system.

@samsp-msft
Contributor Author

Duplicate of #1723

@samsp-msft samsp-msft marked this as a duplicate of #1723 May 17, 2022