[META] Remote Routing Table - v2.15 #12995

Closed
7 tasks done
himshikha opened this issue Apr 1, 2024 · 1 comment
Labels
Meta (Meta issue, not directly linked to a PR), ShardManagement:Routing, v2.15.0 (Issues and PRs related to version 2.15.0)

Comments

@himshikha
Contributor

himshikha commented Apr 1, 2024

Please describe the end goal of this project

Project Meta: #14164

This meta issue tracks the issues targeted for v2.15.

Each shard movement results in a cluster state update, which must be communicated to all data nodes so that they can route requests correctly. This becomes a scaling problem for larger clusters with many nodes: larger cluster states, combined with the high volume and frequency of transfers, can saturate the inter-node network.

Proposed solution: Reduce the memory and communication overhead of routing table updates by using a remote store as an intermediate store, moving data transfers onto remote store interactions and sparing node-to-node network bandwidth.

We will move the routing table to the remote store. The cluster manager node will be responsible for updating the remote store whenever the routing changes. Since the complete table will be in storage, we can optimize what each node keeps in memory and fetch routing information from the remote store whenever it is required. Data nodes will only need to keep the routings for replica shards whose primaries reside on the node.
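A minimal sketch of the in-memory reduction described above, assuming the full routing table can be read as a flat list of shard copies. `ShardRouting` and `routings_to_keep` are hypothetical names used for illustration, not OpenSearch classes. The function keeps only the replica routings of shards whose primary is assigned to the local node (presumably what a primary needs in order to know where to replicate writes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRouting:
    """Hypothetical, simplified shard copy entry (not OpenSearch's ShardRouting)."""
    index: str
    shard_id: int
    node_id: str
    primary: bool


def routings_to_keep(full_table: list[ShardRouting], local_node: str) -> list[ShardRouting]:
    """Return the replica routings whose primary copy is assigned to local_node.

    Assumes the full table lives in the remote store; the data node keeps
    only this subset in memory.
    """
    # (index, shard_id) pairs whose primary copy is on this node.
    local_primaries = {
        (r.index, r.shard_id)
        for r in full_table
        if r.primary and r.node_id == local_node
    }
    # Keep the replicas of those shards, wherever the replicas are allocated.
    return [
        r for r in full_table
        if not r.primary and (r.index, r.shard_id) in local_primaries
    ]


# Usage example with a hypothetical four-copy table.
table = [
    ShardRouting("logs", 0, "node-a", True),
    ShardRouting("logs", 0, "node-b", False),
    ShardRouting("logs", 1, "node-b", True),
    ShardRouting("logs", 1, "node-c", False),
]
print(routings_to_keep(table, "node-a"))
# [ShardRouting(index='logs', shard_id=0, node_id='node-b', primary=False)]
```

Routings for primaries hosted elsewhere are simply not retained here; under this design they would be fetched from the remote store when needed.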
To reduce communication overhead, cluster state publication will notify data nodes of the change using the updated cluster state term and version rather than the complete diff. Data nodes will then download the updated routing information from the remote store. This makes publication from the cluster manager faster, and each node can update its local memory independently.
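The publication flow above can be sketched as follows, assuming an in-memory dict as a stand-in for the remote store and synchronous delivery. `InMemoryRemoteStore`, `DataNode`, `ClusterManager`, and their methods are hypothetical and do not correspond to OpenSearch APIs. The point is only the ordering: upload first, then publish the (term, version) pair, then let each data node pull the table from the store.

```python
from dataclasses import dataclass, field


class InMemoryRemoteStore:
    """Hypothetical stand-in for a remote blob store, keyed by (term, version)."""

    def __init__(self) -> None:
        self._blobs: dict[tuple[int, int], dict[str, str]] = {}

    def upload(self, term: int, version: int, routing_table: dict[str, str]) -> None:
        self._blobs[(term, version)] = dict(routing_table)

    def download(self, term: int, version: int) -> dict[str, str]:
        return dict(self._blobs[(term, version)])


@dataclass
class DataNode:
    store: InMemoryRemoteStore
    term: int = 0
    version: int = 0
    routing_table: dict[str, str] = field(default_factory=dict)
    downloads: int = 0

    def on_publish(self, term: int, version: int) -> None:
        """Handle a lightweight publication carrying only term and version."""
        if (term, version) <= (self.term, self.version):
            return  # already up to date or stale message: nothing to fetch
        # Pull the routing table from the remote store, not from the cluster manager.
        self.routing_table = self.store.download(term, version)
        self.term, self.version = term, version
        self.downloads += 1


class ClusterManager:
    def __init__(self, store: InMemoryRemoteStore, nodes: list[DataNode]) -> None:
        self.store = store
        self.nodes = nodes
        self.term = 1
        self.version = 0

    def update_routing(self, routing_table: dict[str, str]) -> None:
        self.version += 1
        # 1. Persist the new routing table to the remote store first.
        self.store.upload(self.term, self.version, routing_table)
        # 2. Publish only the (term, version) pair to the data nodes.
        for node in self.nodes:
            node.on_publish(self.term, self.version)


# Usage example: one routing update reaching two data nodes.
store = InMemoryRemoteStore()
nodes = [DataNode(store), DataNode(store)]
manager = ClusterManager(store, nodes)
manager.update_routing({"logs[0]": "node-a", "logs[1]": "node-b"})
print(nodes[1].routing_table, nodes[1].version)
# {'logs[0]': 'node-a', 'logs[1]': 'node-b'} 1
```

Comparing (term, version) tuples lexicographically is a simplification of how a node might decide whether a notification is newer than what it already holds; a real implementation would also need to handle failed downloads and retries.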

Issues

Related component

ShardManagement:Routing

@himshikha added the Meta (Meta issue, not directly linked to a PR) and untriaged labels on Apr 1, 2024
@peternied
Member

[Triage - attendees 1 2 3 4 5 6 7 8]
@himshikha Thanks for creating this issue; however, it isn't being accepted because it doesn't have enough detail to justify what is being accomplished. Please use the meta template with a number of clear deliverables, and feel free to open a new issue after addressing this.

Projects
Status: 2.15.0 (Release window opens on June 10th, 2024 and closes on June 25th, 2024)
Status: New
Status: ✅ Done
Development

No branches or pull requests

4 participants