Enforce higher priority for RepositoriesService ClusterStateApplier #58808
Conversation
This avoids shard allocation failures when the repository instance comes in the same ClusterState update as the shard allocation.
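For illustration, a minimal sketch of the idea behind the change, assuming the ClusterService applier-registration methods (addHighPriorityApplier vs. addStateApplier); the actual wiring in RepositoriesService may differ:

```java
import org.elasticsearch.cluster.service.ClusterService;
import org.elasticsearch.repositories.RepositoriesService;

class ApplierWiringSketch {
    // Sketch only: register RepositoriesService ahead of the normal-priority appliers so the
    // repository instance is already available when shard allocation from the same
    // ClusterState update is applied. Assumes RepositoriesService implements ClusterStateApplier.
    static void register(ClusterService clusterService, RepositoriesService repositoriesService) {
        clusterService.addHighPriorityApplier(repositoriesService); // instead of addStateApplier(...)
    }
}
```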
I've left some minor comments, otherwise looking good.
import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.is;

@ESIntegTestCase.ClusterScope(scope = TEST, numDataNodes = 0, autoManageMasterNodes = false)
Why autoManageMasterNodes = false?
That was the only scenario where I was able to reproduce the failure consistently. I was missing the gateway.recover_after_data_nodes piece.
.../test/java/org/elasticsearch/xpack/searchablesnapshots/ClusterStateApplierOrderingTests.java (review comments resolved; one marked outdated)
internalCluster().fullRestart();

List<UnassignedInfo.Reason> unassignedReasons = new ArrayList<>();
internalCluster().clusterService().addListener(event -> {
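The listener body is cut off in the hunk above; a hedged guess at what it might collect, using routing-table accessors that are assumptions on my part, could look like this:

```java
// Assumed sketch of the listener body: record why shards in the applied state are unassigned.
internalCluster().clusterService().addListener(event -> {
    for (ShardRouting shardRouting : event.state().getRoutingNodes().unassigned()) {
        unassignedReasons.add(shardRouting.unassignedInfo().getReason());
    }
});
```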
Isn't this racing against state recovery and the actual allocation taking place? We might be adding the listener too late. Perhaps we should delay state recovery (for example by setting gateway.recover_after_data_nodes to 3 on restart, and starting up a third data node after the listener is registered (on the master)).
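A hedged sketch of that suggestion, assuming the usual ESIntegTestCase / InternalTestCluster test-framework calls (restart callback, gateway setting, master-side listener registration); the real test may be structured differently:

```java
// Inside an ESIntegTestCase-derived test method.
// Keep state recovery on hold across the restart until a third data node joins.
internalCluster().fullRestart(new InternalTestCluster.RestartCallback() {
    @Override
    public Settings onNodeStopped(String nodeName) {
        return Settings.builder()
            .put(GatewayService.RECOVER_AFTER_DATA_NODES_SETTING.getKey(), 3)
            .build();
    }
});

// Register the listener on the master while allocation is still blocked.
internalCluster().clusterService(internalCluster().getMasterName()).addListener(event -> {
    // ... collect unassigned reasons ...
});

// Only now let recovery (and allocation) proceed by starting the third data node.
internalCluster().startDataOnlyNode();
```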
I wasn't aware of that setting; I'll use that approach.
LGTM
@ywelsch should we backport this to 7.9?
yes
This avoids shard allocation failures when the repository instance comes in the same ClusterState update as the shard allocation. Backport of elastic#58808