-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wipe Snapshots Before Indices in RestTests #39662
Wipe Snapshots Before Indices in RestTests #39662
Conversation
* If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I am concerned that this might miss bugs. I don't yet understand the failure. Here are some (edited) logs:
Suite: org.elasticsearch.backwards.MixedClusterClientYamlTestSuiteIT
1> [2562-03-04T13:32:52,610][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_alias/10_basic/Get all aliases via /_alias] before test
1> [2562-03-04T13:32:52,616][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] initializing REST clients against [http://[::1]:40899, http://[::1]:41395, http://[::1]:41282, http://[::1]:44755]
1> [2562-03-04T13:32:52,786][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] initializing yaml client, minimum es version: [5.6.1] master version: [5.6.1] hosts: [http://[::1]:40899, http://[::1]:41395, http://[::1]:41282, http://[::1]:44755]
1> [2562-03-04T13:32:52,844][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_alias/10_basic/Get all aliases via /_alias] after test
1> [2562-03-04T13:32:52,875][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/110_max_metric/Missing field with missing param] before test
1> [2562-03-04T13:32:53,385][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/110_max_metric/Missing field with missing param] after test
1> [2562-03-04T13:32:53,400][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/130_sum_metric/Aggregating wrong datatype test] before test
1> [2562-03-04T13:32:54,108][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/130_sum_metric/Aggregating wrong datatype test] after test
1> [2562-03-04T13:32:54,133][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/180_percentiles_tdigest_metric/Metadata test] before test
1> [2562-03-04T13:32:54,164][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/180_percentiles_tdigest_metric/Metadata test] after test
1> [2562-03-04T13:32:54,183][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=delete/20_internal_version/Internal version] before test
1> [2562-03-04T13:32:54,213][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=delete/20_internal_version/Internal version] after test
1> [2562-03-04T13:32:54,239][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.open/10_basic/Open index with wait_for_active_shards set to all] before test
1> [2562-03-04T13:32:54,270][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.open/10_basic/Open index with wait_for_active_shards set to all] after test
1> [2562-03-04T13:32:54,281][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/30_missing_index/Raise 404 when index doesn't exist] before test
1> [2562-03-04T13:32:54,322][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/30_missing_index/Raise 404 when index doesn't exist] after test
1> [2562-03-04T13:32:54,332][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.rollover/10_basic/Rollover no condition matched] before test
1> [2562-03-04T13:32:54,693][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.rollover/10_basic/Rollover no condition matched] after test
1> [2562-03-04T13:32:54,707][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.stats/13_fields/Fields - star] before test
1> [2562-03-04T13:32:55,274][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.stats/13_fields/Fields - star] after test
1> [2562-03-04T13:32:55,288][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cluster.reroute/11_explain/Explain API with empty command list] before test
1> [2562-03-04T13:32:55,518][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cluster.reroute/11_explain/Explain API with empty command list] after test
1> [2562-03-04T13:32:55,532][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/110_field_collapsing/no hits and inner_hits] before test
1> [2562-03-04T13:32:56,276][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/110_field_collapsing/no hits and inner_hits] after test
1> [2562-03-04T13:32:56,293][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/70_legacy_multi_type/Get /index*/_mapping/{type}] before test
1> [2562-03-04T13:32:57,054][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/70_legacy_multi_type/Get /index*/_mapping/{type}] after test
1> [2562-03-04T13:32:57,065][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=snapshot.get/10_basic/Get snapshot info contains include_global_state] before test
1> [2562-03-04T13:32:57,091][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=snapshot.get/10_basic/Get snapshot info contains include_global_state] after test
1> [2562-03-04T13:32:57,108][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/160_exists_query/Test exists query on mapped float field with no doc values] before test
1> [2562-03-04T13:32:59,071][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/160_exists_query/Test exists query on mapped float field with no doc values] after test
1> [2562-03-04T13:32:59,087][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.stats/20_translog/Translog retention] before test
1> [2562-03-04T13:32:59,119][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.stats/20_translog/Translog retention] after test
1> [2562-03-04T13:32:59,133][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=termvectors/10_basic/Basic tests for termvector get] before test
1> [2562-03-04T13:32:59,559][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=termvectors/10_basic/Basic tests for termvector get] after test
1> [2562-03-04T13:32:59,572][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/10_histogram/Format test] before test
1> [2562-03-04T13:32:59,997][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/10_histogram/Format test] after test
1> [2562-03-04T13:33:00,009][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cat.shards/10_basic/Help] before test
1> [2562-03-04T13:33:00,037][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=cat.shards/10_basic/Help] after test
1> [2562-03-04T13:33:00,050][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/11_basic_with_types/Get /{index}/_mapping/_all] before test
1> [2562-03-04T13:33:00,797][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/11_basic_with_types/Get /{index}/_mapping/_all] after test
1> [2562-03-04T13:33:00,806][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/50_wildcard_expansion/Get test-* with wildcard_expansion=open,closed] before test
1> [2562-03-04T13:33:01,670][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_mapping/50_wildcard_expansion/Get test-* with wildcard_expansion=open,closed] after test
1> [2562-03-04T13:33:01,681][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/160_exists_query/Test exists query on mapped long field with no doc values] before test
1> [2562-03-04T13:33:03,172][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search/160_exists_query/Test exists query on mapped long field with no doc values] after test
1> [2562-03-04T13:33:03,182][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=get/70_source_filtering/Source filtering] before test
1> [2562-03-04T13:33:03,704][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=get/70_source_filtering/Source filtering] after test
1> [2562-03-04T13:33:03,714][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_template/10_basic/Get template with flat settings and master timeout] before test
1> [2562-03-04T13:33:03,736][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_template/10_basic/Get template with flat settings and master timeout] after test
1> [2562-03-04T13:33:03,746][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_field_mapping/50_field_wildcards/Get field mapping with * for fields] before test
1> [2562-03-04T13:33:03,767][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.get_field_mapping/50_field_wildcards/Get field mapping with * for fields] after test
1> [2562-03-04T13:33:03,776][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/190_percentiles_hdr_metric/Missing field with missing param] before test
1> [2562-03-04T13:33:04,115][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=search.aggregation/190_percentiles_hdr_metric/Missing field with missing param] after test
1> [2562-03-04T13:33:04,125][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.put_settings/all_path_options/put settings in * index] before test
1> [2562-03-04T13:33:04,893][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.put_settings/all_path_options/put settings in * index] after test
1> [2562-03-04T13:33:04,901][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.put_mapping/all_path_options/put mapping in * index] before test
1> [2562-03-04T13:33:04,929][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=indices.put_mapping/all_path_options/put mapping in * index] after test
1> [2562-03-04T13:33:04,946][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] [p0=snapshot.restore/10_basic/Create a snapshot and then restore it] before test
1> [2562-03-04T13:33:35,158][INFO ][o.e.b.MixedClusterClientYamlTestSuiteIT] [test] Stash dump on test failure [{
1> "stash" : {
1> "body" : {
1> "error" : {
1> "root_cause" : [
1> {
1> "type" : "concurrent_snapshot_execution_exception",
1> "reason" : "[test_repo_restore_1:test_snapshot]a snapshot is already running",
1> "stack_trace" : "ConcurrentSnapshotExecutionException[[test_repo_restore_1:test_snapshot]a snapshot is already running]
1> at org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:266)
1> at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:45)
1> at org.elasticsearch.cluster.service.ClusterService.executeTasks(ClusterService.java:634)
1> at org.elasticsearch.cluster.service.ClusterService.calculateTaskOutputs(ClusterService.java:612)
1> at org.elasticsearch.cluster.service.ClusterService.runTasks(ClusterService.java:571)
1> at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.run(ClusterService.java:263)
1> at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)
1> at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)
1> at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569)
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:247)
1> at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:210)
1> at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
1> at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
As far as I can see there's only one test that does anything with snapshots before the one that fails, snapshot.get/10_basic/Get snapshot info contains include_global_state
, it shouldn't be finishing while there are still snapshots running. So I am wondering what was the snapshot running here? I note that this was a 6.7/5.6.1 mixed cluster with a 5.6.1 master so maybe this is a bug that has since been fixed?
@DaveCTurner sorry I wasn't so clear in the PR description. There's def. some other issue in that test (on that too), but imo the current behavior is just annoying: We leak in one test => all tests fail and one has to go through MBs of logs to track down what triggered this because the teardown method isn't stable. I guess we could register the fact that we had in-progress snapshots in the teardown method and then fail after the teardown actually happened if we wanted to make a leaked snapshot a failure condition? |
@DaveCTurner I think one possible explanation for the first failure seen in that specific log would be a master failure while the snapshot request was made (that situation is known to behave weirdly and can fail in exactly this way due to master request retries). But then again, there could be a number of other issues like the one fixed in #27214 here I think since it's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm with you on the current behaviour being annoying. Could we make it fail this one time if it finds a snapshot in progress, but then clean them up anyway, so at least the build will fail if a test leaks a snapshot?
@DaveCTurner thanks, added code to first wipe the snapshots and then fail in the end if we found any that weren't in a completed state :) |
The real issue here is fixed in #39737, but I'm still a fan of this change just to make things a little more ergonomic :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@DaveCTurner thanks! |
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* Wipe Snapshots Before Indices in RestTests * If we have a snapshot ongoing from the previous test and enter this method, then deleting the indices fails, which in turn fails the whole wipe * Fixed by first deleting/aborting snapshots
* 6.7: Fix CCR HLRC docs Introduce forget follower API (elastic#39718) 6.6.2 release notes. Update release notes for 6.7.0 Add documentation for min_hash filter (elastic#39671) Unmute testIndividualActionsTimeout Unmute testFollowIndexAndCloseNode Use unwrapped cause to determine if node is closing (elastic#39723) Don’t ack if unable to remove failing replica (elastic#39584) Wipe Snapshots Before Indices in RestTests (elastic#39662) (elastic#39765) Bug fix for AnnotatedTextHighlighter (elastic#39525) Fix Snapshot BwC with Version 5.6.x (elastic#39737) Fix occasional SearchServiceTests failure (elastic#39697) Correct date in daterange-aggregation.asciidoc (elastic#39727) Add a note to purge the ingest-geoip plugin (elastic#39553)
* Mistake was made in elastic#39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in #39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in elastic#39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in elastic#39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in elastic#39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in #39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in #39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
* Mistake was made in #39662 * The response deserialized here is `org.elasticsearch.action.admin.cluster.snapshots.get.GetSnapshotsResponse` which uses `org.elasticsearch.snapshots.SnapshotInfo` which uses `org.elasticsearch.snapshots.SnapshotState` and not the shard state
Example failure: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.7+bwc-tests/109/console
@markharwood fyi :)