
Fix testCorruptionOnNetworkLayer #88644

@DaveCTurner commented Jul 20, 2022

`POST _cluster/reroute?retry_failed` doesn't reset the failure counter
on shards that are `INITIALIZING`. Also, waiting until there are no
`INITIALIZING` shards isn't enough to ensure that all possible retries
have finished, because a shard may instead be waiting on an ongoing async fetch.
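To make the gap concrete, here is a minimal, self-contained Java model. `HealthCheckRace`, `Shard`, `noInitializingShards` and `retriesMayBeOutstanding` are hypothetical names for illustration only, not Elasticsearch classes, and the retry limit of 5 is an assumed value. The model shows a state in which no shard is `INITIALIZING`, so a health-style wait would return, while an unassigned copy is still below the retry limit and may be allocated and fail again once its async fetch completes.

```java
import java.util.List;

class HealthCheckRace {
    // Hypothetical, simplified shard model; not Elasticsearch's ShardRouting/UnassignedInfo.
    enum State { UNASSIGNED, INITIALIZING, STARTED }

    record Shard(State state, int failedAllocations) {}

    // Approximates what waiting for "no INITIALIZING shards" checks.
    static boolean noInitializingShards(List<Shard> shards) {
        return shards.stream().noneMatch(s -> s.state() == State.INITIALIZING);
    }

    // A shard can still be retried if it is initializing, or if it is unassigned
    // (e.g. while the allocator waits on an async fetch) and below the retry limit.
    static boolean retriesMayBeOutstanding(List<Shard> shards, int maxRetries) {
        return shards.stream().anyMatch(s -> s.state() == State.INITIALIZING
                || (s.state() == State.UNASSIGNED && s.failedAllocations() < maxRetries));
    }

    public static void main(String[] args) {
        // One copy has failed once and is unassigned while its store data is fetched.
        List<Shard> shards = List.of(new Shard(State.STARTED, 0), new Shard(State.UNASSIGNED, 1));
        System.out.println(noInitializingShards(shards));       // true: a health-style wait stops here
        System.out.println(retriesMayBeOutstanding(shards, 5)); // true: another retry is still possible
    }
}
```

In this model the two predicates disagree exactly in the window the PR describes: the unassigned copy is invisible to a "no `INITIALIZING` shards" check but its failure counter has not yet reached the limit.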

This commit fixes the test by using a `ClusterStateObserver` to observe the
retry counter directly, instead of relying on the cluster health action.
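The idea of waiting on the retry counter itself, rather than on a coarse health condition, can be sketched with a minimal observer that blocks until a predicate over the latest published state holds. This is a plain-Java analogue under stated assumptions, not Elasticsearch's `ClusterStateObserver` API: `RetryCounterWait`, `StateObserver`, `publish` and `waitFor` are hypothetical names, the state is reduced to a single shard's failure count, and the condition waited for (the count reaching an assumed limit of 5) stands in for whatever predicate the real test uses.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;

class RetryCounterWait {

    // Hypothetical stand-in for a state publisher plus observer; it is NOT
    // Elasticsearch's ClusterStateObserver, just the same wait-on-a-predicate idea.
    static class StateObserver<S> {
        private S current;

        StateObserver(S initial) {
            this.current = initial;
        }

        // Publish a new state and wake every waiter so it re-checks its predicate.
        synchronized void publish(S next) {
            current = next;
            notifyAll();
        }

        // Block until the latest state satisfies the predicate, or fail on timeout.
        // The current state is tested first, so a change published before the
        // caller started waiting is not missed.
        synchronized S waitFor(Predicate<S> predicate, Duration timeout)
                throws InterruptedException, TimeoutException {
            long deadline = System.nanoTime() + timeout.toNanos();
            while (!predicate.test(current)) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    throw new TimeoutException("condition not met within " + timeout);
                }
                TimeUnit.NANOSECONDS.timedWait(this, remaining);
            }
            return current;
        }
    }

    public static void main(String[] args) throws Exception {
        final int maxRetries = 5; // assumed retry limit for this sketch
        // The observed state is just one shard's failed-allocation count.
        StateObserver<Integer> failedAllocations = new StateObserver<>(0);

        // Simulated allocator: each retry fails and bumps the counter.
        Thread allocator = new Thread(() -> {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                failedAllocations.publish(attempt);
            }
        });
        allocator.start();

        // Wait on the counter itself rather than on "no INITIALIZING shards".
        int observed = failedAllocations.waitFor(n -> n >= maxRetries, Duration.ofSeconds(10));
        allocator.join();
        System.out.println("failed allocations observed: " + observed); // failed allocations observed: 5
    }
}
```

Re-checking the predicate in a loop after every wake-up also tolerates spurious wake-ups and intermediate states that do not yet satisfy the condition, which is the property a health-based wait lacked here.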

Relates #88314
Closes #88615

@DaveCTurner added the >test, :Distributed Coordination/Allocation, v8.4.0 and v8.3.4 labels Jul 20, 2022
@DaveCTurner requested a review from idegtiarenko July 20, 2022 12:20
@elasticsearchmachine commented

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine added the Team:Distributed (Obsolete) label Jul 20, 2022
@DaveCTurner merged commit 6bbe32f into elastic:master Jul 20, 2022
@DaveCTurner deleted the 2022-07-20-fix-testCorruptionOnNetworkLayer branch July 20, 2022 15:08
Labels: :Distributed Coordination/Allocation, Team:Distributed (Obsolete), >test, v8.4.0
Successfully merging this pull request may close these issues:
[CI] CorruptedFileIT testCorruptionOnNetworkLayer failing