-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the TransportRolloverAction execute in one cluster state update #50388
Make the TransportRolloverAction execute in one cluster state update #50388
Conversation
This commit makes the rollover more resilient, by having it execute only one cluster state update that creates the new (rollover index), rolls over the alias from the source to the target index and set the RolloverInfo on the source index. Before these 3 steps were represented as 3 chained cluster state updates, which would've seen the user manually intervene if, say, the alias rollover cluster state update (second in the chain) failed but the creation of the rollover index (first in the chain) update succeeded
Pinging @elastic/es-core-features (:Core/Features/ILM+SLM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @andreidan, I left two minor comments and one about moving part of this to a separate PR.
rolloverIndexName, rolloverRequest); | ||
CreateIndexClusterStateUpdateRequest createIndexRequest = prepareCreateIndexRequest(unresolvedName, | ||
rolloverIndexName, rolloverRequest); | ||
clusterService.submitStateUpdateTask("rollover_index", new ClusterStateUpdateTask() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the index name into the task string?
}, listener::onFailure)); | ||
@Override | ||
public void clusterStateProcessed(String source, ClusterState oldState, ClusterState newState) { | ||
activeShardsObserver.waitForActiveShards(new String[]{rolloverIndexName}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't expect it to happen (it would be a bad situation if it did), but it might be a good idea to wrap this in:
if (newState.equals(oldState) == false) {
...
}
@@ -30,6 +30,11 @@ public RolloverStep(StepKey key, StepKey nextStepKey, Client client) { | |||
super(key, nextStepKey, client); | |||
} | |||
|
|||
@Override | |||
public boolean isRetryable() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do this in a separate PR. We still need to solve the complexity around retrying AsyncAction
steps, which will require slightly different execution than the other step types, due to their exactly-once re-invocation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we satisfy the exactly-once re-invocation already. A failed AsyncActionStep
will be moved into the ErrorStep
on failure and on the next ILM periodic loop we'll move the index policy back into the failed step (the AsyncActionStep
) which will be executed on the next async steps execution cycle (on a master change event or the subsequent periodic ILM loop). Am I missing something here? I'm happy to separate the PRs either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsyncActionStep
s don't get executed by the periodic loop invocation, meaning they will never be executed except on a master change event (which should be exceedingly rare). I was saying that we need to figure out the best way to make sure we can execute AsyncAction
steps when they do an automatic retry.
To figure out this logic I was thinking a different PR would be better since it would likely mean extra testing for however we decide we want invoke AsyncAction
steps (it may be as simple as adding the invocation in the onProcessed
method of the ilm-retry-failed-step
cluster state update, but we'll have to test).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you're right Lee, I mixed the AsyncWaitStep
with the AsyncActionStep
(async and step was all my brain read)
@elasticmachine update branch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks Andrei
…lastic#50388) This commit makes the TransportRolloverAction more resilient, by having it execute only one cluster state update that creates the new (rollover index), rolls over the alias from the source to the target index and set the RolloverInfo on the source index. Before these 3 steps were represented as 3 chained cluster state updates, which would've seen the user manually intervene if, say, the alias rollover cluster state update (second in the chain) failed but the creation of the rollover index (first in the chain) update succeeded * Rename innerExecute to applyAliasActions (cherry picked from commit 1ba4339) Signed-off-by: Andrei Dan <[email protected]>
…50388) (#50442) This commit makes the TransportRolloverAction more resilient, by having it execute only one cluster state update that creates the new (rollover index), rolls over the alias from the source to the target index and set the RolloverInfo on the source index. Before these 3 steps were represented as 3 chained cluster state updates, which would've seen the user manually intervene if, say, the alias rollover cluster state update (second in the chain) failed but the creation of the rollover index (first in the chain) update succeeded * Rename innerExecute to applyAliasActions (cherry picked from commit 1ba4339) Signed-off-by: Andrei Dan <[email protected]>
Hi @andreidan, |
Hi @shwetathareja, thank you for your interest and I'm sorry to hear you're running into issues with the rollover action. |
…lastic#50388) This commit makes the TransportRolloverAction more resilient, by having it execute only one cluster state update that creates the new (rollover index), rolls over the alias from the source to the target index and set the RolloverInfo on the source index. Before these 3 steps were represented as 3 chained cluster state updates, which would've seen the user manually intervene if, say, the alias rollover cluster state update (second in the chain) failed but the creation of the rollover index (first in the chain) update succeeded * Rename innerExecute to applyAliasActions Co-authored-by: Elastic Machine <[email protected]>
This commit makes the TransportRolloverAction more resilient, by having it
execute only one cluster state update that creates the new (rollover index), rolls
over the alias from the source to the target index and set the RolloverInfo on the
source index. Before, these 3 steps were represented as 3 chained cluster state
updates, which would've seen the user manually intervene if, say, the alias
rollover cluster state update (second in the chain) failed but the creation of
the rollover index (first in the chain) update succeeded