ILM execution order on phase rollover #61014
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
@rsdrakh thanks for describing this issue. We talked about it today and came up with a proposal to allow the number of replicas to be configurable in the shrink action.
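A policy using that proposal might look like the sketch below. Note that the `number_of_replicas` parameter on `shrink` is hypothetical here; it is not part of the ILM API as of this discussion.

```
PUT _ilm/policy/my-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "shrink": {
            "number_of_shards": 1,
            "number_of_replicas": 0   // hypothetical parameter sketching the proposal
          },
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}
```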
@andreidan Thanks for considering this issue! So as I understood, on rollover there is an allocation action followed by a shrink action. I am not familiar with the code, but IMHO a separate "replicas" action would make most sense, executed before allocation/shrink in case of reduced replicas, and post allocation/shrink in case of increased replicas. |
@rsdrakh you make a great point about the optimal execution path depending on whether the number of replicas is being increased or decreased.

Yes, with our proposal, the shrink action itself would take care of reducing the number of replicas.

No, the shrink action would not be involved here. This is one use case that our proposal would not tackle.

It does, and I agree that a separate replicas action is worth considering.
See also #73499, which isn't the same as this issue, but does touch on some additional flexibility around the shrink action and juggling the number of replicas while executing that action.
I experienced non-optimal behaviour when ILM rolls an index over from the hot phase to the warm phase.
My general pattern is as follows.
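A minimal sketch of such a policy, reconstructed from the description below; the policy name, rollover trigger, and routing attribute are assumptions:

```
PUT _ilm/policy/logs-policy             // policy name assumed
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb" }   // rollover trigger assumed
        }
      },
      "warm": {
        "actions": {
          "allocate": {
            "number_of_replicas": 1,           // replicas are (re)created at this step
            "require": { "data": "warm" }      // routing attribute assumed
          },
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}
```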
When executing the ILM policy, Elasticsearch first creates the replicas for the existing primaries, then allocates one copy of each shard to the node that will do the shrink, then executes the shrink and, once again, creates a replica of the now-shrunk single primary shard.
This adds overhead, as creating a replica before the primary shards reach their configured end state (shrunk to a single shard and merged to a single segment) is redundant.
In an environment with 2 nodes and 2 primaries that is not a problem, because creating the replicas already places a copy of every shard on each node, which has the same effect as allocating all shards to a single node.
But as soon as there are more primaries or more nodes, there is (in my opinion) unnecessary shard movement, with each moved shard weighing in at the recommended 25-50 GB.
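To make the overhead concrete (illustrative numbers only): with 4 primaries of 40 GB each and 1 replica configured in the warm phase, the transition copies roughly 160 GB to create the replicas, moves up to ~120 GB more to co-locate one copy of every shard on the shrink node, and then copies another ~160 GB to replicate the shrunk index. The initial ~160 GB of replica creation is exactly the redundant work described above, since those replicas exist only until the shrink completes.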
Would it be possible to implement logic that steers shard allocation on rollover so that such redundant shard movements are avoided?
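One possible mitigation with the existing API (a sketch, not an official recommendation): the allocate action runs before shrink within the warm phase and accepts `number_of_replicas`, so replicas can be dropped before the shrink rather than created and thrown away:

```
PUT _ilm/policy/logs-policy
{
  "policy": {
    "phases": {
      "warm": {
        "actions": {
          "allocate": {
            "number_of_replicas": 0,         // drop replicas before the shrink step runs
            "require": { "data": "warm" }    // routing attribute assumed
          },
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      }
    }
  }
}
```

The shrunken index then inherits `number_of_replicas: 0`, so a replica has to be added back afterwards (for example with an explicit settings update on the shrunken index), which is part of the motivation for a first-class option in the shrink action as proposed above.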