Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

Reduce reliance on getHealth #31989

Closed
wants to merge 1 commit into from
Closed

Conversation

brooksprumo
Copy link
Contributor

The RPC getHealth method is quite arbitrary, and doesn't do a good job reflecting the health of a node. Its use should be discouraged.

  • If a node has no known validators, it'll always report healthy
  • If a node's accounts hash interval is larger than health_check_slot_distance, it'll mostly report unhealthy
  • If the known validators accounts hash interval is larger than health_check_slot_distance, it'll mostly report unhealthy
  • If a node is keeping up with transaction processing but the background AccountsHashVerifier is dropped some accounts hash verifier request, the node may report unhealthy

@brooksprumo
Copy link
Contributor Author

The specific implementation of this PR is tongue-in-cheek, but it does feel almost as arbitrary as the current version.

Eventually the accounts hash calculation interval in AccountsHashVerifier will be larger than 100 slots, and having another way to check health will be good.

@mvines
Copy link
Contributor

mvines commented Jun 6, 2023

It is very useful to be able to infer some sense of health without an external RPC reference point (solana catchup), I'd be sad to see this ripped out entirely without a replacement of some form.

For voting nodes I've long considered measuring the time it takes a node to observe its votes landing in a block as a sign of "health". Non-voting nodes could emit a "ping" transaction and measure the time to observe it, at a SOL cost. But this approach doesn't give much resolution when the node is far behind, as it won't be able to land any transaction, so I don't think that's ultimately the best alternative solution

@brooksprumo
Copy link
Contributor Author

It is very useful to be able to infer some sense of health without an external RPC reference point (solana catchup), I'd be sad to see this ripped out entirely without a replacement of some form.

For sure. This was my way of leveraging Cunningham's Law 😸.

I have no intent to actually get this version merged. Probably can continue real brainstorming over in #16957.

@github-actions github-actions bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Jun 21, 2023
@github-actions github-actions bot closed this Jun 28, 2023
@brooksprumo brooksprumo deleted the 💪 branch October 23, 2023 16:40
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
stale [bot only] Added to stale content; results in auto-close after a week.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants