Limit retries of failed allocations per index #18467
Conversation
Today if a shard fails during the initialization phase due to misconfiguration, broken disks, missing analyzers, not installed plugins etc., Elasticsearch keeps on trying to initialize, or rather allocate, that shard. In the worst case this ends in an endless allocation loop. To prevent this loop and all its side effects, like spamming log files over and over again, this commit adds an allocation decider that stops allocating a shard that has failed to allocate more than N times in a row. The number of retries can be configured via `index.allocation.max_retry` and its default is set to `5`. Once the setting is updated, shards with fewer failures than the number set per index will be allowed to allocate again. Internally we maintain a counter on the UnassignedInfo that is reset to `0` once the shard has been started. Relates to elastic#18417
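For orientation, here is a rough sketch of what such a decider could look like, assembled from the diff fragments quoted later in this thread; the class name, setting constant, and decision messages are assumptions and may not match the merged code exactly.

```java
import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.UnassignedInfo;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.cluster.routing.allocation.decider.AllocationDecider;
import org.elasticsearch.cluster.routing.allocation.decider.Decision;
import org.elasticsearch.common.settings.Setting;
import org.elasticsearch.common.settings.Settings;

// Hedged sketch of the retry-limiting decider described above.
public class MaxRetryAllocationDecider extends AllocationDecider {

    public static final String NAME = "max_retry";

    // Per-index, dynamically updatable retry limit, defaulting to 5.
    public static final Setting<Integer> SETTING_ALLOCATION_MAX_RETRY =
        Setting.intSetting("index.allocation.max_retry", 5, 0,
            Setting.Property.Dynamic, Setting.Property.IndexScope);

    public MaxRetryAllocationDecider(Settings settings) {
        super(settings);
    }

    @Override
    public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
        UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
        if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
            IndexMetaData indexMetaData = allocation.metaData().getIndexSafe(shardRouting.index());
            int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexMetaData.getSettings());
            if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
                // Stop retrying: the shard has failed too many times in a row.
                return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
                    + unassignedInfo.getNumFailedAllocations() + "] times");
            }
        }
        // Either the shard never failed or it is still below the retry limit.
        return allocation.decision(Decision.YES, NAME,
            "shard has fewer failed allocations than the configured maximum");
    }
}
```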
@ywelsch can you take a look?
this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:
    "failedAllocations: " + 0 + " for reason " + reason;
This `0` is hardcoded, I think this was supposed to be `failedAllocations` instead.
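A minimal illustration of the suggested fix: the assertion message should interpolate the actual counter rather than the literal `0`.

```java
// Same assertion as above, but reporting the real value of failedAllocations.
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED
        || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```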
I know you didn't ask for my review, but I left some comments regardless :)
int maxRetry = SETTING_ALLOCATION_MAX_RETRY.get(indexSafe.getSettings());
if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
    return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
        + unassignedInfo.getNumFailedAllocations() + "] times");
Would it be nice to show the last failure here as well? This will help explain how we got here.
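A hedged sketch of that suggestion, assuming the last failure description is available via `UnassignedInfo#getMessage()`:

```java
// Sketch only: surface the last failure in the NO decision so the reason for the
// repeated allocation failures is visible in the explanation.
if (unassignedInfo.getNumFailedAllocations() >= maxRetry) {
    return allocation.decision(Decision.NO, NAME, "shard has already failed allocating ["
        + unassignedInfo.getNumFailedAllocations() + "] times, last failure: ["
        + unassignedInfo.getMessage() + "]");
}
```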
this.reason = reason;
this.unassignedTimeMillis = unassignedTimeMillis;
this.unassignedTimeNanos = unassignedTimeNanos;
this.lastComputedLeftDelayNanos = 0L;
this.message = message;
this.failure = failure;
this.failedAllocations = failedAllocations;
assert failedAllocations > 0 && reason == Reason.ALLOCATION_FAILED || failedAllocations == 0 && reason != Reason.ALLOCATION_FAILED:
just `assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED)` ?
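Spelled out, the proposed simplification would read something like the following (the message string is carried over from the original assertion):

```java
// Both sides must agree on whether an allocation failure happened.
assert (failedAllocations > 0) == (reason == Reason.ALLOCATION_FAILED) :
    "failedAllocations: " + failedAllocations + " for reason " + reason;
```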
* append unassigned info to allocation decision.
}

public UnassignedInfo readFrom(StreamInput in) throws IOException {
    return new UnassignedInfo(in);
}

/**
 * Returns the number of previously failed allocations of this shard.
 */
public int getNumFailedAllocations() {return failedAllocations;}
some newlines are ok here :-)
public Decision canAllocate(ShardRouting shardRouting, RoutingAllocation allocation) {
    UnassignedInfo unassignedInfo = shardRouting.unassignedInfo();
    if (unassignedInfo != null && unassignedInfo.getNumFailedAllocations() > 0) {
        IndexMetaData indexSafe = allocation.metaData().getIndexSafe(shardRouting.index());
just call this variable `indexMetaData`?
LGTM. Thx @s1monw
assertEquals(routingTable.index("idx").shards().size(), 1);
assertEquals(routingTable.index("idx").shard(0).shards().get(0).state(), INITIALIZING);
// now fail it 4 times - 5 retries is default
for (int i = 0; i < 4; i++) {
You could parameterize the test on the number of retries. Alternatively, I would suggest using the setting `SETTING_ALLOCATION_MAX_RETRY.get(settings)` explicitly here instead of the hardcoded 4, as sketched below.
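A hedged sketch of that change; `settings` is assumed to be the index settings used when creating the test index, and the loop body stands in for the failure/reroute steps of the original test:

```java
// Read the retry limit from the setting rather than hardcoding 4, so the test
// keeps tracking the configured/default value.
int maxRetries = SETTING_ALLOCATION_MAX_RETRY.get(settings);
// fail the shard maxRetries - 1 times; one more failure would hit the limit
for (int i = 0; i < maxRetries - 1; i++) {
    // ... same failure/reroute assertions as in the original test body ...
}
```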
Left minor comments but LGTM o.w.
@clintongormley @ywelsch @bleskes I pushed a new commit that adds a
@@ -82,13 +83,30 @@ public ClusterRerouteRequest explain(boolean explain) {
}

/**
 * Sets the retry failed flag (defaults to <tt>false</tt>). If true, the
 * request will retry allocating shards that are currently can't be allocated due to too many allocation failures.
`s/that are currently can't be allocated/that can't currently be allocated/`
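For context, a hedged example of how a caller might use the new flag through the Java client; the `setRetryFailed` method name is assumed from this PR's request/builder changes, and `client` is an existing `org.elasticsearch.client.Client` instance:

```java
// Ask the reroute command to retry shards that hit the allocation-failure limit.
client.admin().cluster()
    .prepareReroute()
    .setRetryFailed(true)
    .get();
```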
Found a small issue. LGTM after fixing this. Thanks @s1monw!
@@ -103,3 +103,15 @@ are available:
To ensure that these implications are well-understood,
this command requires the special field `accept_data_loss` to be
explicitly set to `true` for it to work.
add `[float]` before the header
LGTM2. I wonder how we can REST-test this. It's tricky and I'm not sure it's worth it to have a simple call with true to `retry_after`. I'm good with pushing as is - unless someone has a good idea.
@bleskes I thought about it and I think the major problem is waiting for the state. I think we should rather try to unit-test this, so I added a unit test for serialization (found a bug) and for the master side of things on the reroute command. I think we are ready, will push soon.
* master: (158 commits)
  Document the hack
  Refactor property placeholder use of env. vars
  Force java9 log4j hack in testing
  Fix log4j buggy java version detection
  Make java9 work again
  Don't mkdir directly in deb init script
  Fix env. var placeholder test so it's reproducible
  Remove ScriptMode class in favor of boolean true/false
  [rest api spec] fix doc urls
  Netty request/response tracer should wait for send
  Filter client/server VM options from jvm.options
  [rest api spec] fix url for reindex api docs
  Remove use of a Fields class in snapshot responses that contains x-content keys, in favor of declaring/using the keys directly.
  Limit retries of failed allocations per index (#18467)
  Proxy box method to use valueOf.
  Use the build-in valueOf method instead of the custom one.
  Fixed tests and added a comment to the box method.
  Fix boxing.
  Do not decode path when sending error
  Fix race condition in snapshot initialization
  ...
With #17187, we verified IndexService creation during initial state recovery on the master, and if the recovery failed the index was imported as closed, not allocating any shards. This was mainly done to prevent endless allocation loops and full log files on data nodes when the index metadata contained broken settings / analyzers. Zen2 loads the cluster state eagerly, and this check currently runs on all nodes (not only the elected master), which can significantly slow down startup on data nodes. Furthermore, with replicated closed indices (#33888) on the horizon, importing the index as closed will no longer prevent shards from being allocated. Fortunately, the original issue of endless allocation loops is no longer a problem due to #18467, where we limit the retries of failed allocations. The solution here is therefore to just undo #17187, as it's no longer necessary and is covered by #18467, which will solve the issue for Zen2 and replicated closed indices as well.