[META] Remote Routing Table - v2.16 #14685

Closed
himshikha opened this issue Jul 9, 2024 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Meta Meta issue, not directly linked to a PR ShardManagement:Routing v2.16.0 Issues and PRs related to version 2.16.0

Comments

@himshikha
Contributor

himshikha commented Jul 9, 2024

Please describe the end goal of this project

This Meta tracks issues to be targeted for v2.16.

Each shard movement results in a cluster state update, which must be communicated to all data nodes so that they can route requests effectively. This becomes a scaling problem for larger clusters with a large number of nodes: bigger states and a high volume and frequency of network transfers can swamp the inter-node network.

Proposed Solution: Reduce the memory and communication overhead of routing table updates by using a remote store as an intermediate store, leveraging remote store interactions for data transfers and sparing node-to-node network bandwidth.

We will move the routing table to the remote store. The cluster manager node will be responsible for updating the remote store whenever routing changes. Since the complete table will be in storage, we can optimize what is kept in memory on the nodes and fetch routing information from the remote store whenever required. Data nodes will only need to keep routings for replica shards whose primaries reside on the node.
To reduce communication overhead, cluster state publication will notify data nodes of a change with the updated cluster state term and version rather than the complete diff. Data nodes will then download the updated routing information from storage. This makes communication from the cluster manager faster, and each node can update its local memory individually.
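
The flow described above can be sketched roughly as follows. This is only an illustrative Python model, not OpenSearch code: `RemoteStore`, `ClusterManager`, `DataNode`, `StateMarker`, `move_shard`, and `on_publish` are all hypothetical names, the remote store is an in-memory dictionary, and each data node keeps the full table for simplicity rather than only the routings it needs.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True, order=True)
class StateMarker:
    """Hypothetical publication payload: only term and version, no diff."""
    term: int
    version: int


class RemoteStore:
    """In-memory stand-in for a remote blob store (an assumption)."""

    def __init__(self) -> None:
        self._blobs: Dict[StateMarker, Dict[str, str]] = {}

    def upload(self, marker: StateMarker, routing: Dict[str, str]) -> None:
        # Store a copy so later in-memory changes do not alter the blob.
        self._blobs[marker] = dict(routing)

    def download(self, marker: StateMarker) -> Dict[str, str]:
        return dict(self._blobs[marker])


class ClusterManager:
    """Owns the routing table and writes every change to the remote store."""

    def __init__(self, store: RemoteStore, term: int = 1) -> None:
        self.store = store
        self.term = term
        self.version = 0
        self.routing: Dict[str, str] = {}

    def move_shard(self, shard: str, node: str) -> StateMarker:
        # Update the table, bump the version, upload, and return the
        # small marker that would be published to data nodes.
        self.routing[shard] = node
        self.version += 1
        marker = StateMarker(self.term, self.version)
        self.store.upload(marker, self.routing)
        return marker


class DataNode:
    """Receives markers and pulls routing data from the remote store."""

    def __init__(self, store: RemoteStore) -> None:
        self.store = store
        self.applied: Optional[StateMarker] = None
        self.routing: Dict[str, str] = {}

    def on_publish(self, marker: StateMarker) -> bool:
        # Ignore markers that are not newer than what is already applied.
        if self.applied is not None and marker <= self.applied:
            return False
        self.routing = self.store.download(marker)
        self.applied = marker
        return True
```

In this sketch the publication message stays the same size regardless of how large the routing table grows, and each node decides on its own when to fetch the data for a newer term and version.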

Supporting References

Project Meta: #14164

Issues

Related component

Other

@himshikha himshikha added Meta Meta issue, not directly linked to a PR untriaged labels Jul 9, 2024
@himshikha himshikha added v2.16.0 Issues and PRs related to version 2.16.0 ShardManagement:Routing enhancement Enhancement or improvement to existing feature or request and removed untriaged labels Jul 9, 2024
@getsaurabh02
Member

@himshikha are we good to close this out for 2.16?

Projects
Status: 2.16 (First RC 07/23, Release 08/06)
Status: ✅ Done
Development

No branches or pull requests

2 participants