[META] Remote Routing Table - v2.16 #14685
Labels: enhancement, Meta, ShardManagement:Routing, v2.16.0
Please describe the end goal of this project
This Meta tracks issues to be targeted for v2.16.
Each shard movement results in a cluster state update, which must be communicated to all data nodes so that they can route requests effectively. This becomes a scaling problem for larger clusters with a large number of nodes: bigger states and a high volume and frequency of network transfers can swamp the inter-node network.
Proposed Solution: Reduce the memory and communication overhead of routing table updates by using a remote store as an intermediate store, leveraging remote store interactions for data transfers and sparing node-to-node network bandwidth.
We will move the routing table to the remote store. The cluster manager node will be responsible for updating the remote store whenever routing changes. Since the complete table will be in storage, we can optimize what is kept in memory on the nodes and use the remote store to fetch routing information whenever required. Data nodes will only need to keep routings for replica shards whose primaries reside on the node.
To reduce communication overhead, cluster state publication will notify data nodes of a change with the updated cluster state term and version rather than the complete diff. Data nodes will then download the updated routing information from storage. This would make communication from the cluster manager faster, and each node can update its local memory individually.
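The split between a full table in storage and a trimmed in-memory view could look roughly like the sketch below. It is only an illustration, not the planned OpenSearch implementation: `ShardRouting`, `InMemoryRemoteStore`, `ClusterManager`, `DataNode`, and the `"routing-table"` key are all hypothetical names, the remote store is replaced by a dictionary, and "routings to keep" is read here as the entries whose primary lives on the node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardRouting:
    """Hypothetical, simplified routing entry for one shard."""
    index: str
    shard_id: int
    primary_node: str
    replica_nodes: tuple = ()


class InMemoryRemoteStore:
    """Stand-in for the remote store (assumption: a simple key/value blob API)."""

    def __init__(self):
        self._blobs = {}

    def write(self, key, value):
        self._blobs[key] = value

    def read(self, key):
        return self._blobs[key]


class ClusterManager:
    """Owns the full routing table and persists it on every routing change."""

    def __init__(self, store):
        self.store = store
        self.routing_table = {}

    def update_routing(self, routing):
        self.routing_table[(routing.index, routing.shard_id)] = routing
        # Persist a copy of the complete table so nodes can read it from storage.
        self.store.write("routing-table", dict(self.routing_table))


class DataNode:
    """Keeps a trimmed local view and falls back to storage for everything else."""

    def __init__(self, node_id, store):
        self.node_id = node_id
        self.store = store
        self.local_routing = {}

    def refresh(self):
        full_table = self.store.read("routing-table")
        # Keep only the entries whose primary shard is hosted on this node.
        self.local_routing = {
            key: routing
            for key, routing in full_table.items()
            if routing.primary_node == self.node_id
        }

    def lookup(self, index, shard_id):
        key = (index, shard_id)
        if key in self.local_routing:
            return self.local_routing[key]
        # Not held in memory: fetch the routing information from storage on demand.
        return self.store.read("routing-table")[key]
```

In this sketch a node that hosts one primary out of many holds a single entry in memory, and any other lookup costs a read from storage instead of a node-to-node transfer.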
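A minimal sketch of this notify-then-download flow follows, under stated assumptions: `PublishNotice`, `RemoteStore`, and `DataNode` are hypothetical names, the remote store is an in-memory dictionary keyed by `(term, version)`, and a notice is treated as newer when its `(term, version)` pair compares greater than the last applied pair. None of this is taken from the actual design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishNotice:
    """Hypothetical lightweight publication: identifiers only, no routing diff."""
    term: int
    version: int


class RemoteStore:
    """In-memory stand-in for the remote store, keyed by (term, version)."""

    def __init__(self):
        self._tables = {}

    def upload(self, term, version, routing_table):
        self._tables[(term, version)] = dict(routing_table)

    def download(self, term, version):
        return self._tables[(term, version)]


class DataNode:
    def __init__(self, store):
        self.store = store
        self.applied = (0, 0)    # last applied (term, version)
        self.routing_table = {}
        self.downloads = 0       # counts fetches from the remote store

    def on_publish(self, notice):
        """Apply a publication; return True if new routing data was downloaded."""
        incoming = (notice.term, notice.version)
        # Assumption: ordering is by term first, then version.
        if incoming <= self.applied:
            return False  # stale or duplicate notice, nothing to fetch
        # The node pulls the routing data itself instead of receiving a diff.
        self.routing_table = self.store.download(*incoming)
        self.downloads += 1
        self.applied = incoming
        return True
```

The publication message here is two integers regardless of how large the routing table is; the cost of transferring the table moves to each node's own read from storage.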
Supporting References
Project Meta: #14164
Issues
Related component
Other