Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CorruptedFileIT so that it passes with new allocation strategy #88314

Merged

Conversation

idegtiarenko
Copy link
Contributor

New allocation strategy is not going to retry failed shards. Update the
test to not rely on that behavior

Rel: #86429

New allocation strategy is not going to retry failed shards. Update the
test to not rely on that behaviour
@idegtiarenko idegtiarenko added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.4.0 labels Jul 6, 2022
@idegtiarenko idegtiarenko requested a review from DaveCTurner July 6, 2022 14:20
@idegtiarenko
Copy link
Contributor Author

Just confirmed this also passes on the feature branch

@idegtiarenko idegtiarenko marked this pull request as ready for review July 6, 2022 14:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@idegtiarenko
Copy link
Contributor Author

@elasticmachine please run elasticsearch-ci/part-2

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I left a couple of comments.

.put("index.routing.allocation.include._name", primariesNode.getName() + "," + unluckyNode.getName())
)
.get();
ensureYellow("test");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm this would pass even if the index reaches green health. I think we want to wait for no initialising shards and then genuinely assert that none of the shards are on the unlucky node.

Copy link
Contributor Author

@idegtiarenko idegtiarenko Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this call is waiting for no initializing shards internally. Added more detailed assertion below.
UPD: actually it is not, fixing...

@idegtiarenko idegtiarenko requested a review from DaveCTurner July 7, 2022 15:15
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@idegtiarenko idegtiarenko merged commit f99ee51 into elastic:master Jul 11, 2022
@idegtiarenko idegtiarenko deleted the fix_corruption_on_network_layer branch July 11, 2022 06:07
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Jul 12, 2022
* upstream/master:
  Pass IndexMetadata to AllocationDecider.can_remain (elastic#88453)
  [TSDB] Cache rollup bucket timestamp to reduce rounding cost (elastic#88420)
  Correct some typos/mistakes in comments/docs (elastic#88446)
  Make ClusterInfo use immutable maps in all cases (elastic#88447)
  Reduce map lookups (elastic#88418)
  Don't index geo_shape field in AbstractBuilderTestCase (elastic#88437)
  Remove usages of TestGeoShapeFieldMapperPlugin from enrich module (elastic#88440)
  Fix test memory leak (elastic#88362)
  Improve error when sorting on incompatible types (elastic#88399)
  Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (elastic#88414)
  Mute ReactiveStorageIT::testScaleWhileShrinking (elastic#88431)
  Clarify snapshot docs on archive indices (elastic#88417)
  [Stack Monitoring] Switch cgroup memory fields to keyword (elastic#88260)
  Fix RealmIdentifier XContent parser (elastic#88410)
  Make LoggedExec gradle task configuration cache compatible (elastic#87621)
  Update CorruptedFileIT so that it passes with new allocation strategy (elastic#88314)
  Update RareClusterStateIT to work with the new shards allocator (elastic#87922)
  Ensure CreateApiKey always creates a new document (elastic#88413)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Jul 20, 2022
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter
on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards
isn't quite enough to ensure that we've finished all possible retries
because there could instead be an ongoing async fetch.

This commit fixes this using a `ClusterStateObserver` to observe the
retry counter instead of using the cluster health action.

Relates elastic#88314
Closes elastic#88615
DaveCTurner added a commit that referenced this pull request Jul 20, 2022
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter
on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards
isn't quite enough to ensure that we've finished all possible retries
because there could instead be an ongoing async fetch.

This commit fixes this using a `ClusterStateObserver` to observe the
retry counter instead of using the cluster health action.

Relates #88314
Closes #88615
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. >test Issues or PRs that are addressing/adding tests v8.4.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants