Remove node from cluster when node locks are broken #58373
Labels
:Distributed Coordination/Cluster Coordination
Cluster formation and cluster state publication, including cluster membership and fault detection.
>enhancement
Team:Distributed (Obsolete)
Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Comments
DaveCTurner added the >enhancement, :Distributed Coordination/Cluster Coordination, and team-discuss labels on Jun 19, 2020
Pinging @elastic/es-distributed (:Distributed/Cluster Coordination)
elasticmachine added the Team:Distributed (Obsolete) label on Jun 19, 2020
We discussed this today and agreed to proceed.
Thanks @DaveCTurner. Should we work on the existing PR, or start a new one?
I'd prefer a new one once #52680 is merged.
This was referenced Aug 20, 2020
DaveCTurner pushed a commit that referenced this issue on Sep 22, 2020:
In #52680 we introduced a mechanism that will allow nodes to remove themselves from the cluster if they locally determine themselves to be unhealthy. The only check today is that their data paths are all empirically writeable. This commit extends this check to consider a failure of `NodeEnvironment#assertEnvIsLocked()` to be an indication of unhealthiness. Closes #58373
In #52680 we are introducing a mechanism that will allow nodes to remove themselves from the cluster if they locally determine themselves to be unhealthy. The only check today is that their data paths are all empirically writeable. We could also check `NodeEnvironment#assertEnvIsLocked()` here; indeed we already call this method during the health check but do not consider a failure to be fatal (see #52680 (comment)).

A broken node lock today blocks things like allocating new shards to the node, but I think it does not block indexing or searching on existing shards since these are protected by shard-level locks instead. On the other hand there's something very wrong with your environment if the node lock is broken and it seems reasonable to treat it pretty seriously.
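
To make the proposal concrete, here is a minimal, self-contained sketch of a local health check that treats a broken node lock as fatal, alongside the existing writeability probe. It is not Elasticsearch's actual implementation: `LocalHealthSketch`, `Status`, `NodeLockCheck`, `check` and `isWriteable` are hypothetical names, and `NodeLockCheck` only stands in for a call such as `NodeEnvironment#assertEnvIsLocked()`, whose real signature and exception type are not assumed here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch only; none of these names are Elasticsearch classes.
final class LocalHealthSketch {

    enum Status { HEALTHY, UNHEALTHY }

    /** Stand-in for a node-lock validity check; the real method may differ. */
    interface NodeLockCheck {
        void ensureLocked() throws IOException;
    }

    /** Returns UNHEALTHY if the node lock is broken or any data path is not writeable. */
    static Status check(List<Path> dataPaths, NodeLockCheck lockCheck) {
        try {
            lockCheck.ensureLocked();
        } catch (IOException | RuntimeException e) {
            // Proposed behaviour: a broken node lock counts as unhealthy
            // instead of being tolerated by the health check.
            return Status.UNHEALTHY;
        }
        for (Path dataPath : dataPaths) {
            if (!isWriteable(dataPath)) {
                return Status.UNHEALTHY;
            }
        }
        return Status.HEALTHY;
    }

    /** Empirical writeability probe: create, write and delete a temporary file. */
    static boolean isWriteable(Path dir) {
        try {
            Path probe = Files.createTempFile(dir, ".health-probe", ".tmp");
            try {
                Files.write(probe, new byte[] { 1 });
            } finally {
                Files.deleteIfExists(probe);
            }
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        }
    }
}
```

In this sketch the lock check runs first, so a node whose lock is broken reports itself unhealthy even when every data path is still writeable, which is the stricter behaviour suggested above.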