Retry ILM steps when transient or recoverable errors are encountered #48183
Comments
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM)
Removed team-discuss as we decided to make the Rollover action retryable by executing the rollover in a single cluster state update, as opposed to the three chained updates used now (create the rolled-over index, update the aliases, attach the rollover information to the source index).
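To illustrate why a single update makes the rollover retryable, here is a minimal sketch in plain Java. It is not Elasticsearch code: `SketchState` and `RolloverSketch` are hypothetical stand-ins, and the real cluster state is far richer. The point is only that all three changes are applied to a copy and published together, so a failure leaves the original state untouched and the step can simply be run again.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for a cluster state (not the Elasticsearch class).
final class SketchState {
    final Set<String> indices;              // names of existing indices
    final Map<String, String> writeIndex;   // alias -> current write index
    final Map<String, String> rolledOverTo; // source index -> index it rolled over to

    SketchState(Set<String> indices, Map<String, String> writeIndex, Map<String, String> rolledOverTo) {
        // Defensive immutable copies: a published state is never modified in place.
        this.indices = Set.copyOf(indices);
        this.writeIndex = Map.copyOf(writeIndex);
        this.rolledOverTo = Map.copyOf(rolledOverTo);
    }
}

// Hypothetical helper showing the "one update instead of three" idea.
final class RolloverSketch {
    // Validates first, then applies all three changes to copies and returns one new state.
    // If validation throws, the caller still holds the unchanged state and can retry.
    static SketchState rollover(SketchState current, String alias, String newIndex) {
        String source = current.writeIndex.get(alias);
        if (source == null) {
            throw new IllegalArgumentException("alias has no write index: " + alias);
        }
        if (current.indices.contains(newIndex)) {
            throw new IllegalStateException("index already exists: " + newIndex);
        }
        Set<String> indices = new HashSet<>(current.indices);
        Map<String, String> writeIndex = new HashMap<>(current.writeIndex);
        Map<String, String> rolledOverTo = new HashMap<>(current.rolledOverTo);
        indices.add(newIndex);              // 1. create the rolled-over index
        writeIndex.put(alias, newIndex);    // 2. point the alias at the new index
        rolledOverTo.put(source, newIndex); // 3. attach rollover info to the source index
        return new SketchState(indices, writeIndex, rolledOverTo);
    }
}
```

With three chained updates, a failure after the first one can leave a new index without an alias; in this sketch there is no such intermediate state to recover from.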
I'll be interested to see the outcome of that. A while ago I experimented with condensing rollover into a single cluster state update for a different problem, before going with a different solution. As I recall, the only issue I ran into was that I ended up having to duplicate some of the logic from the transport actions for creating an index, etc. (at least the way I did it, which may not be the best way). Unfortunately I don't seem to have the branch around anymore.
This commit makes the "init" ILM step retryable. It also adds a test where an index is created with a non-parsable index name and then fails. Related to elastic#48183
Since #43246 makes force merging "best effort", are there any remaining plans to make errors in the force merging step retryable? They're still listed under the "ForceMergeAction" section in the body of this issue.
@bczifra the goal is to eventually make every step retryable, including the force merge step.
This is a meta-issue to track and discuss the ILM steps that should be retryable and under which circumstances. This relates to the efforts on making the rollover action retryable (#44135) and the more general strategy ILM will employ in order to make actions more resilient and self-healing (#42824).
Below are all the steps we use, grouped by action. As we'll likely not treat a step differently depending on which action it occurs in, each step is listed only once, under the first action (in alphabetical order) that uses it. The marker Terminal/Error steps are not listed.
Steps
AllocateAction
DeleteAction
ForceMergeAction
FreezeAction
RolloverAction
ShrinkAction
getNextStepKey() depending on the outcome of a defined predicate. It performs no changes to the cluster state.
UnfollowAction
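The fragment in the list above describes a step whose `getNextStepKey()` result depends on a predicate and which does not modify the cluster state. A rough sketch of that shape in plain Java follows; the class name, the string keys, and the generic state type are assumptions for illustration, not the actual ILM classes.

```java
import java.util.function.Predicate;

// Hypothetical sketch of a predicate-driven step; S stands in for the cluster state type.
final class BranchingStepSketch<S> {
    private final String nextKeyOnFalse;
    private final String nextKeyOnTrue;
    private final Predicate<S> predicate;
    private Boolean outcome; // null until evaluate() has run

    BranchingStepSketch(String nextKeyOnFalse, String nextKeyOnTrue, Predicate<S> predicate) {
        this.nextKeyOnFalse = nextKeyOnFalse;
        this.nextKeyOnTrue = nextKeyOnTrue;
        this.predicate = predicate;
    }

    // Only reads the state and records the predicate outcome; the state is not changed.
    void evaluate(S state) {
        outcome = predicate.test(state);
    }

    // The next step depends on the recorded outcome.
    String getNextStepKey() {
        if (outcome == null) {
            throw new IllegalStateException("step has not been evaluated yet");
        }
        return outcome ? nextKeyOnTrue : nextKeyOnFalse;
    }
}
```

Because such a step only reads state, re-running it after a failure has no side effects to undo, which makes it a simple candidate for retrying.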
Scope
Any action or step that can be retried after a failure.
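As a minimal sketch of what "retryable" could mean for a step, the plain-Java example below retries a step only when it declares itself retryable, and gives up after a fixed number of attempts. The `Step` interface, `runWithRetry`, and the attempt limit are hypothetical names chosen for illustration; they are not taken from the ILM code base, and the actual retry policy is what this issue is meant to decide.

```java
// Hypothetical sketch of retrying a step on failure; not the ILM implementation.
final class RetrySketch {
    interface Step {
        boolean isRetryable(); // whether a failure of this step may be retried
        void execute();        // throws a RuntimeException on failure
    }

    // Runs the step until it succeeds and returns the number of attempts used.
    // Rethrows the last failure if the step is not retryable or attempts run out.
    static int runWithRetry(Step step, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                step.execute();
                return attempt;
            } catch (RuntimeException e) {
                if (!step.isRetryable() || attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise fall through and try again.
            }
        }
    }
}
```

A real implementation would likely wait between attempts and distinguish transient errors from permanent ones; both are left out here to keep the sketch short.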
Duration
~ 2 months