Autoscaling during shrink #88292
Conversation
Fix autoscaling during shrink to disregard pinning to nodes. Instead, signal that we need a minimum node size to hold the entire shrink operation. This avoids scaling far higher than necessary when cluster balancing does not allow a shrink to proceed. It is considered a (separate) balancing issue when a shrink cannot complete with enough space in the tier. This changes autoscaling in general for node pinning filters (based on `_id`, `_name` or `name` filters). Clone and split also pin to the shards they clone or split; similarly, this is changed to ignore that pinning during autoscaling. Closes elastic#85480
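To illustrate the sizing signal described above (a hand-wavy sketch, not the decider's actual code in ReactiveStorageDeciderService): since the shrink target must be built on a single node, the meaningful requirement is a node-level minimum on the order of the source index's total size, rather than an inflated tier-level total derived from whichever node the shards happen to be pinned to. All names below are hypothetical.

```java
import java.util.List;

// Hypothetical sketch only; the real logic lives in ReactiveStorageDeciderService.
final class ShrinkSizingSketch {

    // The whole source index must fit on one node for the shrink to proceed,
    // so autoscaling should ask for a node at least this large (ignoring overhead).
    static long requiredNodeSizeForShrink(List<Long> sourceShardSizesInBytes) {
        return sourceShardSizesInBytes.stream().mapToLong(Long::longValue).sum();
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;
        // Three 10 GiB shards: ask for a ~30 GiB node rather than growing the whole tier
        // just because the currently pinned node lacks headroom.
        System.out.println(requiredNodeSizeForShrink(List.of(10 * gib, 10 * gib, 10 * gib)));
    }
}
```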
Pinging @elastic/es-distributed (Team:Distributed)
Hi @henningandersen, I've created a changelog YAML for you.
Looks mostly good, I left a few comments 👍
```java
// For resize shards only allow autoscaling if there is no other node where the shard could fit had it not been
// a resize shard. Notice that we already removed any initial_recovery filters.
diskOnly = nodesInTier(allocation.routingNodes()).map(node -> allocationDeciders.canAllocate(shard, node, allocation))
    .anyMatch(ReactiveStorageDeciderService::isResizeOnlyNoDecision) == false;
```
Can we add a test where there's enough space to hold the resize shard so we verify that we don't request more capacity in that case?
I find double negations a bit trappy 😅
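For reference, the same condition can be phrased without the double negation by switching from `anyMatch(...) == false` to `noneMatch(...)`. This is only a sketch reusing the identifiers from the snippet quoted above, not necessarily what was committed:

```java
// Equivalent to `anyMatch(...) == false`: diskOnly is true only when no node in the tier
// rejects the allocation solely because the shard is a resize (shrink/clone/split) target.
diskOnly = nodesInTier(allocation.routingNodes())
    .map(node -> allocationDeciders.canAllocate(shard, node, allocation))
    .noneMatch(ReactiveStorageDeciderService::isResizeOnlyNoDecision);
```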
I think this is covered by the assertion here:
Sorry, I missed that.
```diff
@@ -222,7 +298,8 @@ public static class AllocationState {
         Set<DiscoveryNode> nodes,
         Set<DiscoveryNodeRole> roles
     ) {
-        this.state = state;
+        this.state = removeNodeLockFilters(state);
```
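For context on what `removeNodeLockFilters` is getting rid of: the node-lock filters are allocation filters that pin an index to concrete nodes, i.e. `index.routing.allocation.{require,include,exclude}` on the `_id`, `_name` or `name` attributes. Below is a rough, purely illustrative sketch over a flat settings map; the real implementation works on the cluster state / index metadata, and `initial_recovery` filters are already stripped separately (see the code comment quoted earlier).

```java
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only; not the actual removeNodeLockFilters implementation.
final class NodeLockFilterSketch {

    // Attributes that pin shards to concrete nodes rather than to tiers or custom attributes.
    private static final Set<String> NODE_LOCK_ATTRIBUTES = Set.of("_id", "_name", "name");
    private static final Set<String> FILTER_TYPES = Set.of("require", "include", "exclude");

    static Map<String, String> removeNodeLockFilters(Map<String, String> indexSettings) {
        return indexSettings.entrySet()
            .stream()
            .filter(e -> isNodeLockFilter(e.getKey()) == false)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    private static boolean isNodeLockFilter(String settingKey) {
        for (String type : FILTER_TYPES) {
            String prefix = "index.routing.allocation." + type + ".";
            if (settingKey.startsWith(prefix) && NODE_LOCK_ATTRIBUTES.contains(settingKey.substring(prefix.length()))) {
                return true;
            }
        }
        return false;
    }
}
```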
I wonder if removing the node lock filter in all cases couldn't lead to a different allocation outcome? Maybe we should only remove the `initial_recovery` setting here? I'm not 100% sure though.
The idea I followed was that we should ensure that we have a node that is large enough to hold the node-locked data, hence the addition to ensure we deliver a proper node-level size.
With that done, we can assume that it is an allocation problem if it cannot fit. Hence it seems fair to remove the node locking here. I am aware that our allocation system is not yet sophisticated enough, but I'd rather not autoscale to a setup that is too large (since that may be multiple steps too large) in that case. For ILM-controlled shrink, it will eventually fail and retry. Manual intervention may be necessary until our allocation system can handle this.
That makes sense, thanks for clarifying.
Thanks @fcofdez for the review, and sorry about the left-over code from previous generations of this PR. This is ready for another round (commented on the two other outstanding comments).
@elasticmachine update branch
LGTM 👍
Thanks Francisco!
Fix autoscaling during shrink to disregard pinning to nodes for the total tier size.
Instead signal that we need a minimum node size to hold the entire shrink
operation. This avoids scaling far higher than necessary when cluster
balancing does not allow a shrink to proceed. It is considered a
(separate) balancing issue when a shrink cannot complete with enough
space in the tier.
This changes autoscaling in general for node pinning filters (based on
`_id`, `_name` or `name` filters).
Clone and split also pin to the shards they clone or split; similarly,
this is changed to ignore that pinning when calculating the total tier size.
Closes #85480