-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose index health and status to the _stats API #81954
Expose index health and status to the _stats API #81954
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this Mary! I left some comments about the implementation
server/src/main/java/org/elasticsearch/action/admin/indices/stats/IndicesStatsResponse.java
Outdated
Show resolved
Hide resolved
) { | ||
super(totalShards, successfulShards, failedShards, shardFailures); | ||
this.shards = shards; | ||
this.clusterState = clusterState; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't serialized in the constructor and writeTo
methods, so it's going to be null quite frequently.
I think we have at least two alternative options:
The first is to continue to pass the cluster state into this method, but construct a map of indexName -> state & health, which is serialized in the IndicesStatsResponse
so that it's available on every node. Constructing this map would happen in the constructor, so that we don't hold on to a reference to the cluster state in the class itself.
The second would be to push the health and state into the ShardStats
object itself, so that we could collate the index health and open/closed state from the ShardStats the way we calculate other states in getIndices()
.
I think I lean slightly towards the second option, because it would allow us to expose this information (state and health) in the shard stats API as well, re-using it for this particular use case. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was having hard timing deciding how to approach it as well. That's why this draft contains the least invasive option to start with.
I agree these are the most obvious alternatives, I had the following concerns:
1) Keep a map with index state and health
Pros:
- Minimal penalty during serialization, it's just a map with the metadata
- This feels to me like the most logical place
Cons:
- Calculation overhead in the constructor since we would have to go through the shards to determine the relevant indices
2) Add it to the ShardStats
Pros:
- Nicer code wise and the new information will be available right where we need it, no need for extra maps etc.
Cons:
- This feels more "convenient" than "right" to me. What I mean with that is, an index has one or more shards, the
ShardStats
class contains information about a single shard (if I am not mistaken), adding index information feels like we are stretching the scope of theShardStats
to contain also some index information.
I would like to put out there one more option, again with some trade offs:
3) Initialize indices in the constructor and add it in the serialization
Pros:
- No intermediary maps, the information is available right where we need it
- This feels to me like the most logical place
Cons:
- Calculation overhead in the constructor since we would have to go through the shards to determine the relevant indices (if we need this information often then that's not a big deal because we go through the shards once.
- Serialization penalty, with this option we are increasing the serialized object significantly.
What do you think? Based on this I am kind of leaning towards the first or the third.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a try and implemented the extra maps because it felt the best option. Re-reading your initial comment, you say:
it would allow us to expose this information (state and health) in the shard stats API as well, re-using it for this particular use case
This is not covered in this solution. Do you think this benefit is strong enough to justify adding to the stats of a single shards information about the whole index?
server/src/main/java/org/elasticsearch/action/admin/indices/stats/IndicesStatsResponse.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/health/ClusterHealthStatus.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/cluster/health/ClusterHealthStatus.java
Outdated
Show resolved
Hide resolved
@dakrone Thank you for the comments, they were really helpful for me! |
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Mary! I think this is a better implementation. I left a couple comments and answers to your comment here (because Github won't let me attach a comment to your response for some reason):
2) Add it to the
ShardStats
Cons:
- This feels more "convenient" than "right" to me. What I mean with that is, an index has one or more shards, the
ShardStats
class contains information about a single shard (if I am not mistaken), adding index information feels like we are stretching the scope of theShardStats
to contain also some index information.
Well, a shard does itself have the concept of state (open or closed), though it's confusing to re-use IndexMetadata.State
for a shard, since it says index metadata and not shard metadata. We don't have an enum/state at the shard level because we don't really expose it outside of the concept of an index (yet).
As for health, we have the concept of assigned, initializing, or unassigned for a shard, which very roughly maps on to the index health (I guess it would be either red or green in all cases), so it's almost-but-not-quite pertinent to a shard in addition to an index.
I think it's fine to wait on this sort of fine-grained state for a single shard until we figure out exactly what we want from the "fine-grained health API" project.
3) Initialize indices in the constructor and add it in the serialization
Pros:
- No intermediary maps, the information is available right where we need it
- This feels to me like the most logical place
Cons:
- Calculation overhead in the constructor since we would have to go through the shards to determine the relevant indices (if we need this information often then that's not a big deal because we go through the shards once.
- Serialization penalty, with this option we are increasing the serialized object significantly.
I agree this is the most logical from a behavior point of view, however, I am also concerned about the overhead for serialization and construction. I took a look at the git-blame output for this and apparently it wasn't a performance optimization added later —it's been implemented that way (calculated on getIndices()
) the entire time.
What do you think? Based on this I am kind of leaning towards the first or the third.
I agree, I also would lean towards the first or the third. I think the indices stats API is hit fairly heavily from the monitoring side, so for that reason, I think we should stick with the first option for now. We can always change it later since it is an implementation detail and wouldn't break anything to change.
) { | ||
super(totalShards, successfulShards, failedShards, shardFailures); | ||
this.shards = shards; | ||
if (clusterState != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we expect the cluster state to be null? As far as I can tell it would only be in the tests.
I think we should require it to be non-null (with Objects.requireNonNull(clusterState)
and handle the case when clusterState.getMetadata().index(index)
returns null gracefully. Then in tests if needed you can pass ClusterState.EMPTY_STATE
if you don't need to construct it for anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
server/src/main/java/org/elasticsearch/action/admin/indices/stats/IndexStats.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/indices/stats/IndexStats.java
Outdated
Show resolved
Hide resolved
@elasticmachine update branch |
Thanks for thinking along with me, I am looking forward to see how the fine-grained health api is going to be formed! |
Pinging @elastic/es-data-management (Team:Data Management) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for iterating on this Mary!
The feature added in elastic#81954 lacks coverage in BwC situations. This commit adds a YAML test to address that.
The feature added in #81954 lacks coverage in BwC situations. This commit adds a YAML test to address that.
Expose the index health and status to the
_stats
API. We only enrich theIndicesStatsResponse
with the new fields when it is called by the API in order to not affect the performance during internal usage.Resolve #80413