[CI] SnapshotBasedRecoveryIT testSnapshotBasedRecovery failing #76595
Comments
Pinging @elastic/es-distributed (Team:Distributed)
Selectively muting parts of the rolling upgrade test for recover from snapshot. Relates elastic#76595
The failures are genuine. My initial analysis points to the primary ending up on the upgraded node. This is surprising (but may have a valid reason once we dig deeper). I muted this selectively in #76601; with that, we should still be validating that rolling upgrade works with recovery from snapshot (though less frequently).
Pasting David's comment from #76601 here: I suspect the problem is caused by a rebalance moving the primary onto the newly-upgraded node, but I haven't seen a failure in captivity to confirm that yet. If so I think we could do something a bit stronger here, e.g. apply an allocation filter to exclude the solitary upgraded node, then explicitly cancel any shards it holds to promote a replica on the old nodes, and then remove replicas.
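The three steps suggested above could be sketched roughly as the following sequence of REST calls. This is only an illustration, not the change that was actually made to the test: the class `AllocationSketch`, its `plan` method, and the index and node names are all hypothetical, and the sketch only builds the request bodies rather than sending them. It assumes the upgraded node can be excluded by name via the `index.routing.allocation.exclude._name` index setting and that the primary is cancelled with the `cancel` command of `POST /_cluster/reroute` using `allow_primary`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: describes the REST calls for the suggested workaround
// without sending them to a cluster.
final class AllocationSketch {

    /** One REST call, described by HTTP method, path and JSON body. */
    static final class Call {
        final String method;
        final String path;
        final String body;

        Call(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return method + " " + path + " " + body;
        }
    }

    /**
     * Builds the calls in the order suggested in the comment. The index and
     * node names are assumed to need no JSON escaping.
     */
    static List<Call> plan(String index, int shard, String upgradedNode) {
        List<Call> calls = new ArrayList<>();
        // 1. Allocation filter: keep shards of this index off the upgraded node.
        calls.add(new Call("PUT", "/" + index + "/_settings",
            "{\"index.routing.allocation.exclude._name\":\"" + upgradedNode + "\"}"));
        // 2. Cancel the copy held by the upgraded node; allow_primary permits
        //    cancelling a primary, so a replica on an old node gets promoted.
        calls.add(new Call("POST", "/_cluster/reroute",
            "{\"commands\":[{\"cancel\":{\"index\":\"" + index + "\",\"shard\":" + shard
                + ",\"node\":\"" + upgradedNode + "\",\"allow_primary\":true}}]}"));
        // 3. Remove the replicas.
        calls.add(new Call("PUT", "/" + index + "/_settings",
            "{\"index.number_of_replicas\":0}"));
        return calls;
    }

    public static void main(String[] args) {
        for (Call call : plan("test-index", 0, "upgraded-node")) {
            System.out.println(call);
        }
    }
}
```

In a real test the cancel step would only be needed if the upgraded node actually holds a copy of the shard, and each call would have to be followed by a wait for the cluster to settle; both are left out of this sketch.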
Move the shard to a replica in an older version when the primary is located in the upgraded node during the first rolling restart round. Closes elastic#76595
Normally I'd skip timeout errors because they are often just due to bad luck. But this test is new and has already failed 38 times in the past few days.
Build scan:
https://gradle-enterprise.elastic.co/s/523u2dz3olx3u
Repro line:
./gradlew ':qa:rolling-upgrade:v7.14.1#oneThirdUpgradedTest' -Dtests.class="org.elasticsearch.upgrades.SnapshotBasedRecoveryIT" -Dtests.method="testSnapshotBasedRecovery" -Dtests.seed=15D06BC7AFD0B42F -Dtests.bwc=true -Dtests.locale=ca-ES -Dtests.timezone=Africa/Banjul -Druntime.java=8
Reproduces locally?:
No
Applicable branches:
7.x
Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.timeZoneId=Australia/Melbourne&tests.container=org.elasticsearch.upgrades.SnapshotBasedRecoveryIT&tests.sortField=FAILED&tests.test=testSnapshotBasedRecovery&tests.unstableOnly=true
Failure excerpt: