-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TransportResyncReplicationAction should not honour blocks #35795
Conversation
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -102,6 +103,18 @@ protected void sendReplicaRequest( | |||
} | |||
} | |||
|
|||
@Override | |||
protected ClusterBlockLevel globalBlockLevel() { | |||
// resync should never be blocked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add here "because it's an internal action"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
Thanks @ywelsch |
After #35332 has been merged, we noticed some test failures like #35597 in which one or more replica shards failed to be promoted as primaries because the primary replica re-synchronization never succeed. After some digging it appeared that the execution of the resync action was blocked because of the presence of a global cluster block in the cluster state (in this case, the "no master" block), making the resync action to fail when executed on the primary. Until #35332 such failures never happened because the TransportResyncReplicationAction is skipping the reroute phase, the only place where blocks were checked. Now with #35332 blocks are checked during reroute and also during the execution of the transport replication action on the primary. After some internal discussion, we decided that the TransportResyncReplicationAction should never be blocked. This action is part of the replica to primary promotion and makes sure that replicas are in sync and should not be blocked when the cluster state has no master or when the index is read only. This commit changes the TransportResyncReplicationAction to make obvious that it does not honor blocks. It also adds a simple test that fails if the resync action is blocked during the primary action execution. Closes #35597
After #35332 has been merged, we noticed some test failures like #35597 in which one or more replica shards failed to be promoted as primaries because the primary replica re-synchronization never succeed.
After some digging it appeared that the execution of the resync action was blocked because of the presence of a global cluster block in the cluster state (in this case, the "no master" block), making the resync action to fail when executed on the primary.
Until #35332 such failures never happened because the
TransportResyncReplicationAction
is skipping the reroute phase, the only place where blocks were checked. Now with #35332 blocks are checked during reroute and also during the execution of the transport replication action on the primary. After some internal discussion, we decided that theTransportResyncReplicationAction
should never be blocked. This action is part of the replica to primary promotion and makes sure that replicas are in sync and should not be blocked when the cluster state has no master or when the index is read only.This pull request changes the
TransportResyncReplicationAction
to make obvious that it does not honor blocks. It also adds a simple test that fails if the resync action is blocked during the primary action execution.Closes #35597