-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize log cluster health performance. #87723
Optimize log cluster health performance. #87723
Conversation
@howardhuanghua please enable the option "Allow edits and access to secrets by maintainers" on your PR. For more information, see the documentation. |
Pinging @elastic/es-distributed (Team:Distributed) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardhuanghua this is a really nice idea + speedup.
See my comments inline: I'm a little hesitant about using RoutingNodes
here and suggested a smaller change for a first step. Let me know what you think.
server/src/main/java/org/elasticsearch/cluster/routing/IndexRoutingTable.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/health/ClusterStateHealth.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/health/ClusterStateHealth.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @howardhuanghua looks really nice now! Just one question/request left.
server/src/main/java/org/elasticsearch/cluster/health/ClusterStateHealth.java
Outdated
Show resolved
Hide resolved
Jenkins test this |
Fixed changelog area value issue. |
Jenkins test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests found a small bug in this, see comment inline.
ClusterHealthStatus computeStatus = ClusterHealthStatus.GREEN; | ||
for (String index : clusterState.metadata().getConcreteAllIndices()) { | ||
IndexRoutingTable indexRoutingTable = clusterState.routingTable().index(index); | ||
if (indexRoutingTable.allShardsActive()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're failing a test here if this turns out to be null
(which could be the case and is not a bug) currently. We have to short-circuit on indexRoutingTable == null
here as well, that should fix things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks you, I have fixed it.
Jenkins test this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks @howardhuanghua this is a nice speedup in our many-shards benchmark run.
Thank you @original-brownbear. |
From current version of elasticsearch "8.12" this log is gone, even if I restart elasticsearch, anyone knows reason for that? used to have a log of |
Cluster health would be checked in each allocation task, since construct cluster state health would iterate all the shards,
logClusterHealthStateChange
is costly if cluster has huge number of shards.With this commit, we could skip redundant shards iteration. Quickly check cluster status by:
GREEN
directly.YELLOW
directly.RED
directly.