Shrink indices without needing shards to be co-located #63519
Comments
Pinging @elastic/es-distributed (:Distributed/CRUD)
We discussed this as a team today and we can see some value in supporting shrinking an index from shards that are not colocated. We considered a "remote shrink" operation, implemented as a new recovery source, which would copy segments directly from remote nodes into the new shards. This copy process could use hard links if the source shard were on the same node (and filesystem) but would in general consume up to 2x the disk space; however, today's full process is to relocate, then shrink, then relocate again, which itself consumes a good deal of additional disk space and network bandwidth. It would also be a good deal simpler to orchestrate the shrinking process if it could be done in fewer steps. There is no need for the source shard copy to be a primary, so we could try to select source shard copies from the same AZ to further reduce cross-zone network costs.

There are some subtleties. For instance, in today's shrink we check up front that we will not exceed the maximum doc count in any of the shrunken shards, but this check would need to be deferred until later and would need to be robust against retrying the shrink from a different source shard copy.

That said, we are not intending to work on this project in the foreseeable future. We'll leave this issue open to invite other ideas or indications of support.
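For reference, the relocate-then-shrink-then-relocate workflow that the comment contrasts with might look roughly like the following sketch using the elasticsearch-py client; the index name, node name, and target shard count are illustrative assumptions, not taken from this issue.

```python
# A rough sketch of today's three-step process (relocate, shrink, relocate).
# Index/node names and shard counts are illustrative assumptions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Step 1: relocate a copy of every shard onto one node and block writes.
es.indices.put_settings(
    index="logs-000001",
    body={
        "index.routing.allocation.require._name": "warm-node-1",
        "index.blocks.write": True,
    },
)
es.cluster.health(index="logs-000001", wait_for_no_relocating_shards=True, timeout="30m")

# Step 2: shrink into a new index with fewer primary shards.
es.indices.shrink(
    index="logs-000001",
    target="logs-000001-shrunk",
    body={"settings": {"index.number_of_shards": 1}},
)

# Step 3: relocate again by removing the node pin so the shrunken index
# can be rebalanced across the cluster.
es.indices.put_settings(
    index="logs-000001-shrunk",
    body={"index.routing.allocation.require._name": None},
)
```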
I'd like to add my support. My issue is that without this there are no great solutions for avoiding node hot spots. The recommendation is to use total_shards_per_node, but that setting prevents the shrink operation from placing a copy of every shard on a single node. The current situation means several best practices are in direct conflict: using multiple shards per index for ingest performance, using shrink/force merge for search performance, and using total_shards_per_node to avoid hot spots.
Hey all, I'm not sure how this is not a bug or why it won't have any work in the foreseeable future. We created a Support Known Issue because this is coming up a lot; can we perhaps re-prioritize? Thanks! cc @leehinman
We have just noticed the same issue in a number of our deployments. We set total_shards_per_node on our indices, which blocks the shrink step. Is there a way to change arbitrary settings of an index when transitioning ILM steps? That would solve the problem, since you could unset total_shards_per_node before the shrink runs. We're likely going to need to write some automation that detects indices in the warm phase that still have the setting applied and removes it.
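A rough sketch of the kind of automation described here (not the commenter's actual tooling) could use the ILM explain API to find warm-phase indices that still carry the setting and unset it; the index pattern and endpoint below are assumptions.

```python
# Hypothetical automation sketch: find ILM warm-phase indices that still have
# index.routing.allocation.total_shards_per_node set and remove the setting so
# the shrink action can allocate all shards to one node.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

SETTING = "index.routing.allocation.total_shards_per_node"

# ILM explain reports the lifecycle phase of each managed index.
explain = es.ilm.explain_lifecycle(index="logs-*")

for index_name, info in explain["indices"].items():
    if info.get("phase") != "warm":
        continue
    settings = es.indices.get_settings(index=index_name, name=SETTING)
    # get_settings returns an empty settings block when the setting is absent.
    if settings.get(index_name, {}).get("settings"):
        es.indices.put_settings(index=index_name, body={SETTING: None})
        print(f"Removed {SETTING} from {index_name}")
```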
@percygrunwald for what it is worth, we do this detection and un-setting automatically in 8.0+: #76732 |
Currently the shrink index operation requires one copy of every shard to reside on the same node.
This may not be possible for some indices.
Example:
Index primary store size = 5 TB
Index has setting total_shards_per_node: 2 (to avoid hotspots and reduce CPU usage during queries)
Rollover is not feasible, as too many indices/shards would be created, causing cluster instability
Number of primary shards = 100 (required to keep up with ingestion rates)
Largest disk = 3.5 TB
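To make the conflict concrete, here is an illustrative sketch (index name and client are assumptions) of the settings combination from the example above: shrink needs all 100 shard copies on one node, which total_shards_per_node: 2 and the 3.5 TB disk limit both rule out.

```python
# Illustrative only: creating an index with the example's characteristics.
# With 100 primaries and total_shards_per_node: 2, no node is ever allowed to
# hold a copy of every shard, so the co-located shrink cannot run.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="metrics-000001",
    body={
        "settings": {
            "index.number_of_shards": 100,  # required to keep up with ingestion
            "index.number_of_replicas": 1,
            # Caps shard copies per node to avoid hot spots, but conflicts with
            # shrink's requirement that one node hold a copy of every shard.
            "index.routing.allocation.total_shards_per_node": 2,
        }
    },
)
```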
Suggestions for improvement:
Distribute shrink operation across multiple nodes
Do not require all shards to be on the same node
Do not move more shards than required.
Essentially, the shrink could be performed on 2, 3, 4, ... shards on a per-node basis, thereby distributing the workload across multiple nodes and reducing the load of the shrink operation. As an added benefit of performing the shrink across multiple nodes, fewer shards would need to be relocated.
Bonus improvements:
Ability to specify nodes as shrink-only, so that index routing could be ignored for the shrink operation. This node type would not store data at any time other than during the shrink operation, limiting the impact of shrink load on ingestion and queries.