[Feature Request] Enable data/ coordinator node in the cluster to serve cluster state and its entities (like index, alias etc.) read APIs #12272

rajiv-kv · 2024-02-09T09:13:49Z

Is your feature request related to a problem? Please describe

Cluster state typically grows to hundreds of MB in large clusters with thousands of shards. API requests such as cat/shards, cat/indices and node/_stats need a copy of the cluster state and fetch it from the cluster-manager. However, the cluster state is also cached on the node serving the API request and can be used if that node is in sync with the cluster-manager. This would avoid the serialization and transport overhead that the cluster-manager incurs when serving large cluster-state responses.

Describe the solution you'd like

We propose introducing a new lightweight transport request to the cluster-manager that returns the cluster name, UUID, term and version of the cluster state. The node serving an API request would use this transport endpoint to verify, in the context of a read request, whether the cluster state cached on the node is in sync with the cluster-manager.

- If the cluster-state is in-sync with cluster-manager, use the cluster-state on the node to serve the incoming API request.
- If the cluster-state is out-of-sync, make a subsequent transport call to fetch the latest cluster-state and then serve the incoming API request.
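
The two cases above can be sketched as a small model. This is only an illustration of the control flow, not the OpenSearch implementation: the names `TermVersion`, `ClusterManager`, `Node`, `get_term_version`, `get_cluster_state` and `state_for_read` are all hypothetical, and the transport layer is replaced by plain method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermVersion:
    """Lightweight identity of a cluster-state snapshot (illustrative names)."""
    cluster_name: str
    cluster_uuid: str
    term: int
    version: int


@dataclass
class ClusterState:
    term_version: TermVersion
    payload: dict  # stands in for indices, aliases, routing table, etc.


class ClusterManager:
    """Hypothetical stand-in for the elected cluster-manager node."""

    def __init__(self, state):
        self.state = state
        self.full_state_fetches = 0

    def get_term_version(self):
        # Cheap call: a handful of fields instead of the full state.
        return self.state.term_version

    def get_cluster_state(self):
        # Expensive call: in a real cluster this ships the whole state.
        self.full_state_fetches += 1
        return self.state


class Node:
    """Hypothetical data/coordinator node holding a cached cluster state."""

    def __init__(self, cached_state, manager):
        self.cached_state = cached_state
        self.manager = manager

    def state_for_read(self):
        if self.manager.get_term_version() == self.cached_state.term_version:
            return self.cached_state  # in sync: serve from the local copy
        return self.manager.get_cluster_state()  # out of sync: fetch from manager
```

In this sketch an out-of-sync node does not update its cached copy; it only uses the fetched state to answer the current request.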

Related component

Cluster Manager

Describe alternatives you've considered

No response

Additional context

API response time for /_cluster/state, local vs. remote

curl http://localhost:9200/_cluster/state?local=true -> 2652 ms
curl http://localhost:9200/_cluster/state?local=false -> 3858 ms

Size of cluster-state -> 153 MB
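
For anyone who wants to repeat the comparison, here is a minimal timing sketch using only the Python standard library. The helper names `time_get` and `compare_local_and_remote` are made up for this example, and the `opener` parameter exists only so the helper can be exercised without a running cluster; the `local=true` / `local=false` query parameter is the one used in the curl commands above.

```python
import time
import urllib.request


def time_get(url, opener=urllib.request.urlopen):
    """Return (elapsed_ms, body_size_bytes) for a single GET of `url`."""
    start = time.perf_counter()
    with opener(url) as response:
        body = response.read()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, len(body)


def compare_local_and_remote(base_url, opener=urllib.request.urlopen):
    """Time the cluster-state endpoint once with local=true and once with local=false."""
    results = {}
    for local in ("true", "false"):
        results[f"local={local}"] = time_get(f"{base_url}?local={local}", opener)
    return results
```

Calling `compare_local_and_remote("http://localhost:9200/_cluster/state")` against a node would return one `(milliseconds, bytes)` pair per variant. A single sample is noisy, so repeated runs give a fairer comparison.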


shwetathareja · 2024-02-09T09:51:02Z

In the first iteration, if the node is not in sync with the active cluster manager's term and version, let it fall back to the active leader to serve the request. Subsequently, we should evaluate how to refresh the state on the node.

shwetathareja · 2024-02-09T09:55:45Z

Besides the latency benefits, this significantly reduces the read API overhead on the leader cluster manager, and thereby its memory usage, CPU usage and transport overhead.

Bukhtawar · 2024-03-19T08:12:58Z

> If the cluster-state is out-of-sync, make a subsequent transport call to fetch the latest cluster-state and then serve the incoming API request

Wondering if this can be done as a part of the same call?
Essentially a single transport call that returns an empty response if the state is in-sync and an actual full cluster stats response from the leader if out-of-sync.
That way the full cluster stats response should be able to:

- Serve the request with a single transport call to the leader
- Be able to refresh the local node's state
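
A rough sketch of this single-call variant, for comparison with the two-call proposal. Everything here is hypothetical (`handle_state_if_stale`, `Node`, `state_for_read`), and the transport call is modeled as a direct function call: the caller sends its own term and version, and the leader answers with nothing when they match or with the full state when they do not.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    term: int
    version: int
    payload: dict  # stands in for the real cluster-state contents


def handle_state_if_stale(leader_state, caller_term, caller_version):
    """Leader side: None if the caller is in sync, else the full state."""
    if (caller_term, caller_version) == (leader_state.term, leader_state.version):
        return None
    return leader_state


class Node:
    """Hypothetical follower node; `leader_state` stands in for the transport channel."""

    def __init__(self, cached_state, leader_state):
        self.cached_state = cached_state
        self._leader_state = leader_state
        self.transport_calls = 0

    def state_for_read(self):
        self.transport_calls += 1  # exactly one round trip per read
        response = handle_state_if_stale(
            self._leader_state, self.cached_state.term, self.cached_state.version
        )
        if response is not None:
            self.cached_state = response  # refresh the local copy as a side effect
        return self.cached_state
```

The trade-off in this shape is that the response type is tied to the full cluster state, which is the point the reply below picks up.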

rajiv-kv · 2024-03-19T09:38:40Z

> > If the cluster-state is out-of-sync, make a subsequent transport call to fetch the latest cluster-state and then serve the incoming API request
>
> Wondering if this can be done as a part of the same call? Essentially a single transport call that returns empty response if the state is in-sync and an actual full cluster stats response from the leader if out-of-sync. That way the full cluster stats response should be able to
>
> - Serve the request with a single transport call to the leader
> - Be able to refresh the local node's state

Having it as a separate API that returns only the term and version lets follower nodes reuse it in multiple places to decide whether they need to fall back to the cluster-manager.
It will help make similar changes for the other transport read APIs of the ClusterManager (~25) and offload them from the cluster-manager. The read APIs do not always return the full cluster-state response.
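
To illustrate the reuse argument, here is a minimal sketch in which one term-version check guards several unrelated read handlers. The names (`serve_read`, `list_indices`, `list_aliases`, `fetch_term_version`, `forward_to_manager`) are assumptions made for this example and do not correspond to actual OpenSearch classes.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    term: int
    version: int
    indices: dict  # index name -> list of alias names


def serve_read(handler, local_state, fetch_term_version, forward_to_manager):
    """Run `handler` on the local state if it is in sync, else forward the request.

    `fetch_term_version` models the lightweight transport call and returns
    a (term, version) tuple; `forward_to_manager` models the existing
    fallback path that executes the read on the cluster-manager.
    """
    if fetch_term_version() == (local_state.term, local_state.version):
        return handler(local_state)
    return forward_to_manager(handler)


# Two different read handlers sharing the same check. Neither returns
# the full cluster state, only a small derived view of it.
def list_indices(state):
    return sorted(state.indices)


def list_aliases(state):
    return sorted(alias for aliases in state.indices.values() for alias in aliases)
```

Because the check returns only term and version, the same wrapper works regardless of what each handler returns.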

Refreshing the local state when it is not in sync is something that needs to be evaluated. The pull-based refresh should be able to work in the background while the node consumes the ClusterUpdates pushed from the cluster-manager. This will be a follow-up.
I think we can introduce a new API or modify the existing one later if there is a requirement to return the ClusterState (when not in sync) and update the local state.

Bukhtawar · 2024-03-19T13:19:30Z

Sounds good

rajiv-kv added the enhancement and untriaged labels Feb 9, 2024

github-actions bot added the Cluster Manager label Feb 9, 2024

shwetathareja changed the title ~~[Feature Request] Introduce TermVersion check for read requests of ClusterState~~ [Feature Request] Enable data/ coordinator node in the cluster to serve cluster state and its entities (like index, alias etc.) read APIs Feb 9, 2024

shwetathareja removed the untriaged label Feb 9, 2024

rajiv-kv mentioned this issue Feb 14, 2024

Light weight Transport action to verify local term before fetching cluster-state from remote #12252

Merged

8 tasks

peternied closed this as completed in #12252 Mar 20, 2024

This was referenced Mar 21, 2024

Light weight Transport action to verify local term before fetching cl… #12824

Merged

Light weight Transport action to verify local term before fetching cl… #12825

Merged

Update the version for local term fetch to 2.13 #12830

Merged

rwali-aws added this to Cluster Manager Project Board Apr 22, 2024

github-project-automation bot moved this to 🆕 New in Cluster Manager Project Board Apr 22, 2024

rwali-aws moved this from 🆕 New to ✅ Done in Cluster Manager Project Board Apr 22, 2024

rwali-aws assigned rajiv-kv Apr 22, 2024

rwali-aws added the v2.13.0 label Apr 22, 2024
