[API] Add Subnet-Specific Health Checks #1264

Closed
patrick-ogrady opened this issue Mar 29, 2023 · 3 comments · Fixed by #1304 or #1358
Labels: incident response, monitoring (This primarily focuses on logs, metrics, and/or tracing)

Comments

@patrick-ogrady
Contributor

Although one Subnet on an AvalancheGo node may be unhealthy, operators may still wish to interact with other Subnets running on it. AvalancheGo's existing health check, however, returns unhealthy if any Subnet is unhealthy. During this incident, that behavior led to an outage in Subnet APIs even though most Subnets were able to serve queries: API providers stopped a node from serving queries whenever this "global" check failed, as it was the only mechanism they had to gauge the health of the underlying node.

We should add a new health check or add an argument to the existing check (https://docs.avax.network/apis/avalanchego/apis/health#healthhealth) that checks the health of only a specific Subnet. This will allow API providers to serve queries to any subset of healthy Subnets on a node.

I don't think we should remove the "global" health check in this change (which still is useful for getting a "full sense" of a node's status).
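
To make the difference between the "global" check and a Subnet-specific check concrete, here is a minimal sketch in Go. It is not AvalancheGo's implementation: the `checkResult` type, the idea of tagging each check with a Subnet ID, and the function names are all hypothetical and only illustrate the proposed behavior.

```go
package main

import "fmt"

// checkResult is a hypothetical record of one health check's outcome.
// The tags field (assumed here) carries the Subnet ID the check belongs to.
type checkResult struct {
	name    string
	tags    []string
	healthy bool
}

// hasTag reports whether want appears in tags.
func hasTag(tags []string, want string) bool {
	for _, t := range tags {
		if t == want {
			return true
		}
	}
	return false
}

// globalHealthy models the existing behavior: one failing check
// makes the whole node report unhealthy.
func globalHealthy(results []checkResult) bool {
	for _, r := range results {
		if !r.healthy {
			return false
		}
	}
	return true
}

// subnetHealthy models the proposed behavior: only checks tagged with
// subnetID are considered. In this sketch, a Subnet with no matching
// checks is reported healthy.
func subnetHealthy(results []checkResult, subnetID string) bool {
	for _, r := range results {
		if hasTag(r.tags, subnetID) && !r.healthy {
			return false
		}
	}
	return true
}

func main() {
	results := []checkResult{
		{name: "subnet-a-chain", tags: []string{"subnet-a"}, healthy: true},
		{name: "subnet-b-chain", tags: []string{"subnet-b"}, healthy: false},
	}
	fmt.Println("global:", globalHealthy(results))               // global: false
	fmt.Println("subnet-a:", subnetHealthy(results, "subnet-a")) // subnet-a: true
	fmt.Println("subnet-b:", subnetHealthy(results, "subnet-b")) // subnet-b: false
}
```

With a filter like this, an API provider could keep routing traffic for `subnet-a` while the node as a whole still reports unhealthy because of `subnet-b`.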

@patrick-ogrady patrick-ogrady added this to the v1.10.1 milestone Mar 29, 2023
@ceyonur ceyonur self-assigned this Mar 29, 2023
@StephenButtolph
Contributor

We'll need to make sure to add this support for the GET calls as well. Load balancers typically just look for a 200 response, so JSON-RPC doesn't work well for them (which is why we added the special GET handling).

@StephenButtolph StephenButtolph added the monitoring This primarily focuses on logs, metrics, and/or tracing label Mar 29, 2023
@ceyonur ceyonur linked a pull request Apr 6, 2023 that will close this issue
@ceyonur
Contributor

ceyonur commented Apr 12, 2023

Need to add docs for that, then I will close.

@ceyonur
Contributor

ceyonur commented May 18, 2023

Added to docs: https://docs.avax.network/apis/avalanchego/apis/health#filtering

There is still a pending PR that would filter min connected health checks with subnetIDs: #1358

@ceyonur ceyonur closed this as completed Jun 12, 2023