
Cluster health should distinguish RED from "no master" #34897

Closed
DaveCTurner opened this issue Oct 26, 2018 · 8 comments
Assignees
Labels
:Core/Infra/Core Core issues without another label >enhancement v8.0.0-alpha1

Comments

@DaveCTurner
Contributor

DaveCTurner commented Oct 26, 2018

A RED cluster health means that there is either no master or else at least one primary is unassigned. However, in 7.0 there seems to be no simple way for clients to distinguish these two cases, and it would be useful to do so. For example:

  • a per-node health check should verify the node-level property of cluster membership, but may not be interested in the cluster-wide property of whether all the primaries are assigned.

  • an orchestrator may wish to wait for a freshly-started node to join a cluster before performing some follow-up actions, again without caring about whether all the primaries are assigned. Today this often happens by waiting for the HTTP port to be open, but this is unreliable: if no master is discovered within discovery.initial_state_timeout (default 30s) then we open the HTTP port anyway.

In the 6.x series a node responds to GET / with 503 Service Unavailable if it believes there to be no master (i.e. either NO_MASTER_BLOCK_* or STATE_NOT_RECOVERED_BLOCK is present), which allows these cases to be distinguished. However #29045 changes this in 7.0 so that we will always respond 200 OK to GET / (as is right and proper) so another mechanism is needed.

I think we should expose the presence or absence of these blocks in the output of GET _cluster/health and add the ability to wait for their absence, e.g.:

GET _cluster/health?local&wait_for_discovered_master
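
As a sketch, a response to that request might then look something like this (the discovered_master field and wait_for_discovered_master parameter are only proposals here, not an existing API, and the field names are illustrative):

{
  "cluster_name": "my-cluster",
  "status": "red",
  "timed_out": false,
  "discovered_master": true,
  ...
}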
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@DaveCTurner
Contributor Author

We discussed this and broadly agreed to do it. Note that we are not intending to change the meaning of RED health - it already means no master OR unassigned primaries, and we would like it to keep this meaning.

We contemplated identifying the discovered master in the cluster health response. I think I'd rather just have a boolean "discovered_master": true, on the grounds that you can find out which node is the discovered master via other APIs (GET _nodes/_master for instance) and the identity of the master isn't important for the cluster's health.
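
For instance (a sketch; the exact response shape varies by version):

GET _nodes/_master?filter_path=nodes.*.name
GET _cat/master

Either request identifies the elected master node, so the health response itself would only need the boolean.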

@DaveCTurner DaveCTurner added help wanted adoptme and removed discuss labels Nov 2, 2018
@DaveCTurner DaveCTurner self-assigned this Nov 4, 2018
@DaveCTurner DaveCTurner removed the help wanted adoptme label Nov 4, 2018
@jasontedor jasontedor added v8.0.0 and removed v7.0.0 labels Feb 6, 2019
@getsaurabh02

I am currently working on it.

getsaurabh02 added a commit to getsaurabh02/elasticsearch that referenced this issue May 8, 2019
…o-master (elastic#34897)

Added boolean "discovered_master": true in the GET _cluster/health response to expose the presence of master.
@DaveCTurner
Contributor Author

We discussed this again recently and determined that GET _cluster/health?timeout=0s returns 503 Service Unavailable when there is no master, and 200 OK otherwise, so it looks like we don't need any further action here.
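
For example, an orchestrator or health check could key off the status code alone (a sketch assuming Elasticsearch listening on localhost:9200 with security disabled; adjust the URL and credentials for your setup):

status=$(curl -s -o /dev/null -w '%{http_code}' 'http://localhost:9200/_cluster/health?timeout=0s')
# 200 -> a master has been discovered; 503 -> no master, per the behaviour described above
[ "$status" = "200" ]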

@getsaurabh02

Hi @DaveCTurner, thanks for responding on this. I tried running this and verified from the code as well that GET _cluster/health?timeout=0s returns a valid 200 response when used along with local=true even when no master is present; it does not return 503 Service Unavailable as it does otherwise.

This makes the API behavior inconsistent depending on whether local=true is present when the node has not discovered a master. Also, always issuing the health API request without the local flag is not an efficient choice for large clusters when it is triggered frequently.

Introducing "discovered_master": true, as proposed earlier, would allow the no-master case to be distinguished consistently, both with and without the local flag. Thoughts?
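
To illustrate the two behaviours described above on a node that has not discovered a master (a sketch based on this thread):

GET _cluster/health?timeout=0s
# -> 503 Service Unavailable

GET _cluster/health?local=true&timeout=0s
# -> 200 OK, health computed from the node's local cluster state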

@DaveCTurner
Contributor Author

This makes the API behavior inconsistent

Asking a node to compute the cluster health locally is always possible, so 200 seems like the right response there. 503 here doesn't mean simply "unhealthy", it means "so unhealthy that I can't even answer your question".

Also, using the health API request always without the local field will not be an efficient choice for large clusters, when triggered frequently.

This deserves some further investigation. Health requests should be pretty cheap, and there are benefits to getting an answer from the master. Are you really seeing the master struggling with the load here? Can you provide more details?

fatmcgav pushed a commit to fatmcgav/helm-charts that referenced this issue Nov 20, 2019
nodes are available.

The behaviour of the `/` endpoint changed[0] between 6.x and 7.x, whereby
previously it would return a HTTP `503` response when the cluster was
blocked, it now returns a HTTP `200` response even if there are no masters
available.

This change updates the behaviour of the `readinessProbe` command during
normal running to verify that the local node is responding and that there
are master nodes available. [1]

The desired behaviour here is that if the data nodes are unable to talk to
their master nodes for whatever reason, then the data nodes will become
`Unready` and therefore be removed from the Service load-balancer until
the master nodes are available again.

Refs:
[0] elastic/elasticsearch#29045
[1] elastic/elasticsearch#34897 (comment)
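
A minimal sketch of such a probe command (an illustration assuming curl is available in the container and security is disabled, not the chart's exact script):

curl --fail --silent --output /dev/null 'http://127.0.0.1:9200/_cluster/health?timeout=0s'

curl --fail exits non-zero on an HTTP error status, so the probe fails while the node has no master (503) and passes once a master is available again (200).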
@getsaurabh02

Hi @DaveCTurner, replying back and requesting that this thread be reopened for some more thoughts. We have seen instances where the master node gets overwhelmed by too many (periodic) health requests delegated to it from data nodes (without local=true), typically in a large cluster, resulting in steep JVM spikes and master duress. This is more apparent when the master node is already under stress due to an increased number of pending tasks, or is busy managing cluster state, such as during a full cluster restart.

Even for a non-dedicated-master setup this can be aggravated, even with a small cluster size (fewer than 10 nodes). It typically happens when the cluster has a front-end service, a load balancer, or a monitoring script that checks node health periodically in order to send or restrict traffic to a node based on its ability to process requests. When local=true is used, however, it is not possible to uncover the case where a node is not connected to the active master: _cluster/health?timeout=0s returns a valid response with local=true even when no master is present.

There could be multiple reasons that lead a node into this situation, such as leader-check failures due to network partitioning. Please let me know if it still makes sense to have a dedicated "discovered_master": true for better clarity. This would leverage the local cluster state, without having to delegate each request to the master just to discover its presence.
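
If such a field were added, a monitoring script could stay entirely local, e.g. (hypothetical, since discovered_master is only a proposal at this point):

GET _cluster/health?local=true&filter_path=discovered_master
# hypothetical response on a node that has lost contact with the master:
{ "discovered_master": false }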

@DaveCTurner
Contributor Author

Repeating my previous message: health requests should be pretty cheap. Can you analyse this in more detail so we can understand better how and why the master is struggling?

galina-tochilkin pushed a commit to mtp-devops/3d-party-helm that referenced this issue Dec 20, 2022