Update CorruptedFileIT so that it passes with new allocation strategy #88314
Conversation
The new allocation strategy is not going to retry failed shards. Update the test so that it does not rely on that behaviour.
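With the new allocator, a shard whose allocation keeps failing stays unassigned until a retry is requested explicitly, so a test can no longer wait for an automatic retry. A minimal sketch of the explicit retry, assuming the standard `ESIntegTestCase` client helpers (this is the Java-client equivalent of `POST _cluster/reroute?retry_failed`):

```java
// Sketch only: setRetryFailed(true) is the Java-client flag behind
// POST _cluster/reroute?retry_failed. It resets the allocation failure
// counter on failed (unassigned) shards so the allocator tries them again.
client().admin().cluster()
    .prepareReroute()
    .setRetryFailed(true)
    .get();
```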
Just confirmed this also passes on the feature branch.

Pinging @elastic/es-distributed (Team:Distributed)

@elasticmachine please run elasticsearch-ci/part-2
Looks good, I left a couple of comments.
.put("index.routing.allocation.include._name", primariesNode.getName() + "," + unluckyNode.getName()) | ||
) | ||
.get(); | ||
ensureYellow("test"); |
Hmm, this would pass even if the index reaches green health. I think we want to wait for no initialising shards and then genuinely assert that none of the shards are on the unlucky node.
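A hedged sketch of that suggestion, using the cluster health and cluster state APIs from the integration-test framework; `unluckyNode` and the "test" index come from the snippet above, the rest is an assumed shape, not the actual change in this PR:

```java
// assumes: import static org.hamcrest.Matchers.equalTo;
//          import static org.hamcrest.Matchers.not;

// Wait until no shards are initialising any more (yellow alone is not enough).
ClusterHealthResponse health = client().admin().cluster()
    .prepareHealth("test")
    .setWaitForNoInitializingShards(true)
    .get();
assertFalse(health.isTimedOut());

// Then genuinely assert that no shard copy was assigned to the unlucky node.
ClusterState state = client().admin().cluster().prepareState().get().getState();
String unluckyNodeId = state.nodes().resolveNode(unluckyNode.getName()).getId();
for (ShardRouting shard : state.routingTable().allShards("test")) {
    assertThat(shard.currentNodeId(), not(equalTo(unluckyNodeId)));
}
```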
I believe this call is waiting for no initializing shards internally. Added a more detailed assertion below.

UPD: actually it is not, fixing...
server/src/internalClusterTest/java/org/elasticsearch/index/store/CorruptedFileIT.java
LGTM
* upstream/master:
  Pass IndexMetadata to AllocationDecider.can_remain (elastic#88453)
  [TSDB] Cache rollup bucket timestamp to reduce rounding cost (elastic#88420)
  Correct some typos/mistakes in comments/docs (elastic#88446)
  Make ClusterInfo use immutable maps in all cases (elastic#88447)
  Reduce map lookups (elastic#88418)
  Don't index geo_shape field in AbstractBuilderTestCase (elastic#88437)
  Remove usages of TestGeoShapeFieldMapperPlugin from enrich module (elastic#88440)
  Fix test memory leak (elastic#88362)
  Improve error when sorting on incompatible types (elastic#88399)
  Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (elastic#88414)
  Mute ReactiveStorageIT::testScaleWhileShrinking (elastic#88431)
  Clarify snapshot docs on archive indices (elastic#88417)
  [Stack Monitoring] Switch cgroup memory fields to keyword (elastic#88260)
  Fix RealmIdentifier XContent parser (elastic#88410)
  Make LoggedExec gradle task configuration cache compatible (elastic#87621)
  Update CorruptedFileIT so that it passes with new allocation strategy (elastic#88314)
  Update RareClusterStateIT to work with the new shards allocator (elastic#87922)
  Ensure CreateApiKey always creates a new document (elastic#88413)

# Conflicts:
#	x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/v2/RollupShardIndexer.java
`POST _cluster/reroute?retry_failed` doesn't reset the failure counter on any `INITIALIZING` shards, and waiting for no `INITIALIZING` shards isn't quite enough to ensure that we've finished all possible retries because there could instead be an ongoing async fetch. This commit fixes this using a `ClusterStateObserver` to observe the retry counter instead of using the cluster health action. Relates #88314 Closes #88615
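A rough sketch of that `ClusterStateObserver` approach. It assumes the retry limit comes from `MaxRetryAllocationDecider.SETTING_ALLOCATION_MAX_RETRY`, that failed attempts are visible via `UnassignedInfo#getNumFailedAllocations`, and that `clusterService`, `threadPool`, `maxRetries`, and `future` (a `PlainActionFuture<Void>`) are already in scope; it is not the literal fix from the commit:

```java
// Observe cluster state changes until every unassigned shard has used up
// its allocation retries, instead of polling the cluster health action.
ClusterStateObserver observer = new ClusterStateObserver(
    clusterService, logger, threadPool.getThreadContext());

observer.waitForNextChange(new ClusterStateObserver.Listener() {
    @Override
    public void onNewClusterState(ClusterState state) {
        future.onResponse(null); // all retries exhausted
    }

    @Override
    public void onClusterServiceClose() {
        future.onFailure(new IllegalStateException("cluster service closed"));
    }

    @Override
    public void onTimeout(TimeValue timeout) {
        future.onFailure(new ElasticsearchTimeoutException("timed out waiting for retries"));
    }
}, state -> {
    // Predicate: true once no unassigned shard has retries left.
    for (ShardRouting shard : state.getRoutingNodes().unassigned()) {
        if (shard.unassignedInfo().getNumFailedAllocations() < maxRetries) {
            return false; // this shard may still be retried
        }
    }
    return true;
});
```

Watching the retry counter in the cluster state directly avoids the race described above: an async fetch can still be in flight even when no shard is `INITIALIZING`.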
New allocation strategy is not going to retry failed shards. Update the test to not rely on that behavior.

Rel: #86429