
Enforce higher priority for RepositoriesService ClusterStateApplier #58808

Merged

Conversation

fcofdez
Contributor

@fcofdez fcofdez commented Jul 1, 2020

This avoids shard allocation failures when the repository instance
arrives in the same ClusterState update as the shard allocation.

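The fix relies on cluster state appliers being notified in priority order: by registering the RepositoriesService as a high-priority applier, repositories are registered before normal-priority appliers (such as the shard allocation logic) see the same state. The following is a toy illustration of that ordering idea only, not Elasticsearch's actual ClusterService classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy sketch (not Elasticsearch's real implementation): appliers are
// notified in priority order, so a high-priority applier sees the new
// state before any normal-priority applier does.
public class ApplierOrder {
    private final List<Consumer<String>> highPriorityAppliers = new ArrayList<>();
    private final List<Consumer<String>> normalPriorityAppliers = new ArrayList<>();

    public void addHighPriorityApplier(Consumer<String> applier) {
        highPriorityAppliers.add(applier);
    }

    public void addApplier(Consumer<String> applier) {
        normalPriorityAppliers.add(applier);
    }

    // Apply a new "cluster state": high-priority appliers run first, so by
    // the time allocation-like appliers run, repositories are registered.
    public void applyNewState(String state) {
        for (Consumer<String> applier : highPriorityAppliers) applier.accept(state);
        for (Consumer<String> applier : normalPriorityAppliers) applier.accept(state);
    }

    public static List<String> demo() {
        ApplierOrder service = new ApplierOrder();
        List<String> order = new ArrayList<>();
        // Registration order is deliberately "wrong"; priority fixes it.
        service.addApplier(s -> order.add("allocate shards using repositories"));
        service.addHighPriorityApplier(s -> order.add("register repositories"));
        service.applyNewState("state-with-repo-and-shards");
        return order;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Before this PR, the repository registration could effectively run after allocation for the same state update, which is the failure mode the change avoids.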
@fcofdez fcofdez added the >enhancement, :Distributed Coordination/Cluster Coordination, and Team:Distributed labels Jul 1, 2020
@fcofdez fcofdez marked this pull request as ready for review July 1, 2020 11:24
@fcofdez fcofdez requested a review from ywelsch July 1, 2020 11:24
Contributor

@ywelsch ywelsch left a comment


I've left some minor comments; otherwise this is looking good.

import static org.hamcrest.Matchers.greaterThan;
import static org.hamcrest.Matchers.is;

@ESIntegTestCase.ClusterScope(scope = TEST, numDataNodes = 0, autoManageMasterNodes = false)
Contributor


Why autoManageMasterNodes = false?

Contributor Author


That was the only scenario where I was able to reproduce the failure consistently. I was missing the gateway.recover_after_data_nodes piece

internalCluster().fullRestart();

List<UnassignedInfo.Reason> unassignedReasons = new ArrayList<>();
internalCluster().clusterService().addListener(event -> {
Contributor


Isn't this racing against state recovery and the actual allocation taking place? We might be adding the listener too late. Perhaps we should delay state recovery, for example by setting gateway.recover_after_data_nodes to 3 on restart and starting up a third data node after the listener is registered (on the master).
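The reviewer's suggestion amounts to a node setting that holds back cluster state recovery; a sketch of what the relevant node settings might look like (the setting name is from the comment above, the exact test wiring around it is an assumption):

```yaml
# Node settings sketch: hold off cluster state recovery (and therefore
# shard allocation) until three data nodes have joined, giving the test
# time to register its ClusterStateListener on the master first.
gateway.recover_after_data_nodes: 3
```

Starting the third data node only after the listener is registered then guarantees the listener observes every allocation decision.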

Contributor Author


I wasn't aware of that setting, I'll use that approach.

@fcofdez fcofdez requested a review from ywelsch July 2, 2020 14:44
Contributor

@ywelsch ywelsch left a comment


LGTM

@fcofdez fcofdez added the v8.0.0 label Jul 3, 2020
@fcofdez fcofdez merged commit 6cd9770 into elastic:master Jul 3, 2020
@fcofdez
Contributor Author

fcofdez commented Jul 3, 2020

@ywelsch should we backport this to 7.9?

@ywelsch ywelsch added the v7.9.0 label Jul 3, 2020
@ywelsch
Contributor

ywelsch commented Jul 3, 2020

yes

fcofdez added a commit to fcofdez/elasticsearch that referenced this pull request Jul 5, 2020
This avoids shard allocation failures when the repository instance
arrives in the same ClusterState update as the shard allocation.

Backport of elastic#58808
fcofdez added a commit that referenced this pull request Jul 7, 2020
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)

This avoids shard allocation failures when the repository instance
arrives in the same ClusterState update as the shard allocation.

Backport of #58808