Update CorruptedFileIT so that it passes with new allocation strategy #88314
Conversation
The new allocation strategy is not going to retry failed shards. Update the test so that it does not rely on that behaviour.
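With the new allocator, a shard whose allocation keeps failing stays unassigned until a retry is requested explicitly, so a test can no longer wait for an automatic retry. A minimal sketch of the explicit retry, assuming the standard `ESIntegTestCase` client helpers (this is the Java-client equivalent of `POST _cluster/reroute?retry_failed`):

```java
// Sketch only: setRetryFailed(true) is the Java-client flag behind
// POST _cluster/reroute?retry_failed. It resets the allocation failure
// counter on failed (unassigned) shards so the allocator tries them again.
client().admin().cluster()
    .prepareReroute()
    .setRetryFailed(true)
    .get();
```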
Just confirmed this also passes on the feature branch.

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine please run elasticsearch-ci/part-2
Looks good, I left a couple of comments.
.put("index.routing.allocation.include._name", primariesNode.getName() + "," + unluckyNode.getName()) | ||
) | ||
.get(); | ||
ensureYellow("test"); |
Hmm, this would pass even if the index reaches green health. I think we want to wait for no initialising shards and then genuinely assert that none of the shards are on the unlucky node.
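A hedged sketch of that suggestion, using the cluster health and cluster state APIs from the integration-test framework; `unluckyNode` and the "test" index come from the snippet above, the rest is an assumed shape, not the actual change in this PR:

```java
// assumes: import static org.hamcrest.Matchers.equalTo;
//          import static org.hamcrest.Matchers.not;

// Wait until no shards are initialising any more (yellow alone is not enough).
ClusterHealthResponse health = client().admin().cluster()
    .prepareHealth("test")
    .setWaitForNoInitializingShards(true)
    .get();
assertFalse(health.isTimedOut());

// Then genuinely assert that no shard copy was assigned to the unlucky node.
ClusterState state = client().admin().cluster().prepareState().get().getState();
String unluckyNodeId = state.nodes().resolveNode(unluckyNode.getName()).getId();
for (ShardRouting shard : state.routingTable().allShards("test")) {
    assertThat(shard.currentNodeId(), not(equalTo(unluckyNodeId)));
}
```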
I believe this call is waiting for no initializing shards internally. Added a more detailed assertion below.

UPD: actually it is not, fixing...
server/src/internalClusterTest/java/org/elasticsearch/index/store/CorruptedFileIT.java
LGTM
* upstream/master:
  Pass IndexMetadata to AllocationDecider.can_remain (elastic#88453)
  [TSDB] Cache rollup bucket timestamp to reduce rounding cost (elastic#88420)
  Correct some typos/mistakes in comments/docs (elastic#88446)
  Make ClusterInfo use immutable maps in all cases (elastic#88447)
  Reduce map lookups (elastic#88418)
  Don't index geo_shape field in AbstractBuilderTestCase (elastic#88437)
  Remove usages of TestGeoShapeFieldMapperPlugin from enrich module (elastic#88440)
  Fix test memory leak (elastic#88362)
  Improve error when sorting on incompatible types (elastic#88399)
  Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (elastic#88414)
  Mute ReactiveStorageIT::testScaleWhileShrinking (elastic#88431)
  Clarify snapshot docs on archive indices (elastic#88417)
  [Stack Monitoring] Switch cgroup memory fields to keyword (elastic#88260)
  Fix RealmIdentifier XContent parser (elastic#88410)
  Make LoggedExec gradle task configuration cache compatible (elastic#87621)
  Update CorruptedFileIT so that it passes with new allocation strategy (elastic#88314)
  Update RareClusterStateIT to work with the new shards allocator (elastic#87922)
  Ensure CreateApiKey always creates a new document (elastic#88413)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards isn't quite enough to ensure that we've finished all possible retries because there could instead be an ongoing async fetch. This commit fixes this using a `ClusterStateObserver` to observe the retry counter instead of using the cluster health action. Relates #88314 Closes #88615
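A rough sketch of that `ClusterStateObserver` approach. It assumes the retry limit comes from `MaxRetryAllocationDecider.SETTING_ALLOCATION_MAX_RETRY`, that failed attempts are visible via `UnassignedInfo#getNumFailedAllocations`, and that `clusterService`, `threadPool`, `maxRetries`, and `future` (a `PlainActionFuture<Void>`) are already in scope; it is not the literal fix from the commit:

```java
// Observe cluster state changes until every unassigned shard has used up
// its allocation retries, instead of polling the cluster health action.
ClusterStateObserver observer = new ClusterStateObserver(
    clusterService, logger, threadPool.getThreadContext());

observer.waitForNextChange(new ClusterStateObserver.Listener() {
    @Override
    public void onNewClusterState(ClusterState state) {
        future.onResponse(null); // all retries exhausted
    }

    @Override
    public void onClusterServiceClose() {
        future.onFailure(new IllegalStateException("cluster service closed"));
    }

    @Override
    public void onTimeout(TimeValue timeout) {
        future.onFailure(new ElasticsearchTimeoutException("timed out waiting for retries"));
    }
}, state -> {
    // Predicate: true once no unassigned shard has retries left.
    for (ShardRouting shard : state.getRoutingNodes().unassigned()) {
        if (shard.unassignedInfo().getNumFailedAllocations() < maxRetries) {
            return false; // this shard may still be retried
        }
    }
    return true;
});
```

Watching the retry counter in the cluster state directly avoids the race described above: an async fetch can still be in flight even when no shard is `INITIALIZING`.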
New allocation strategy is not going to retry failed shards. Update the test to not rely on that behavior.

Rel: #86429