[CI] TransportMonitoringMigrateAlertsActionTests.testLocalAlertsRemoval #66586

Closed
andreidan opened this issue Dec 18, 2020 · 10 comments
Labels: :Data Management/Monitoring, Team:Data Management, >test-failure

Comments

@andreidan
Contributor

Encountered on a PR build https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+pull-request-2/15205/ (multiple tests failing: testLocalExporterWithAlertingDisabled, testLocalAlertsRemoval and testDisabledLocalExporterAlertsRemoval).

Build scan:
https://gradle-enterprise.elastic.co/s/or4i6aw2kxx6u

Repro line:

./gradlew ':x-pack:plugin:monitoring:test' --tests "org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.testLocalExporterWithAlertingDisabled" -Dtests.seed=74901503919A9EEE -Dtests.security.manager=true -Dtests.locale=el-GR -Dtests.timezone=Pacific/Pitcairn -Druntime.java=8

Reproduces locally?: No

Applicable branches: 7.11

Failure history:
https://gradle-enterprise.elastic.co/scans/tests?search.relativeStartTime=P7D&search.tags=CI&search.timeZoneId=America/Los_Angeles&tests.container=org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests&tests.sortField=FAILED&tests.test=testLocalAlertsRemoval&tests.unstableOnly=true

Failure excerpt:

Expected: is <true>
     but: was <false>
	at __randomizedtesting.SeedInfo.seed([74901503919A9EEE:45FE8D342B81AB19]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.junit.Assert.assertThat(Assert.java:923)
	at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.assertIndicesExists(MonitoringIntegTestCase.java:209)
	at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.lambda$awaitIndexExists$3(MonitoringIntegTestCase.java:204)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1016)
	at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.awaitIndexExists(MonitoringIntegTestCase.java:204)
	at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.waitForWatcherIndices(TransportMonitoringMigrateAlertsActionTests.java:592)
	at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.ensureInitialLocalResources(TransportMonitoringMigrateAlertsActionTests.java:473)
	at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.testLocalAlertsRemoval(TransportMonitoringMigrateAlertsActionTests.java:139)
@andreidan andreidan added >test-failure Triaged test failures from CI :Data Management/Monitoring labels Dec 18, 2020
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Dec 18, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (Team:Core/Features)

@gwbrown
Contributor

gwbrown commented Dec 18, 2020

These have been failing more recently on master, 7.x, and 7.11 so I'm going to mute these pending a fix.

Some possibly-relevant logs from the failures:

[2020-12-18T08:15:12,687][ERROR][o.e.x.c.t.IndexTemplateRegistry] [node_s1] error adding index template [.watch-history-14] from [/watch-history.json] for [watcher]	
java.lang.IllegalArgumentException: composable template [.watch-history-14] template after composition is invalid	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:504) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$4.execute(MetadataIndexTemplateService.java:399) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:59) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:697) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:319) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:214) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:680) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]	
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]	
	at java.lang.Thread.run(Thread.java:834) [?:?]	
Caused by: org.elasticsearch.common.xcontent.XContentParseException: [index_template] unknown field [data_stream]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.lambda$validateCompositeTemplate$21(MetadataIndexTemplateService.java:1133) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.indices.IndicesService.withTempIndexService(IndicesService.java:638) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateCompositeTemplate(MetadataIndexTemplateService.java:1113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:500) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	... 14 more	
[2020-12-18T08:15:12,690][ERROR][o.e.x.c.t.IndexTemplateRegistry] [node_s1] error adding index template [.slm-history] from [/slm-history.json] for [index_lifecycle]	
java.lang.IllegalArgumentException: composable template [.slm-history] template after composition is invalid	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:504) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService$4.execute(MetadataIndexTemplateService.java:399) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:59) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:697) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:319) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:214) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:680) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215) [elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]	
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]	
	at java.lang.Thread.run(Thread.java:834) [?:?]	
Caused by: org.elasticsearch.common.xcontent.XContentParseException: [index_template] unknown field [data_stream]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.lambda$validateCompositeTemplate$21(MetadataIndexTemplateService.java:1133) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.indices.IndicesService.withTempIndexService(IndicesService.java:638) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.validateCompositeTemplate(MetadataIndexTemplateService.java:1113) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	at org.elasticsearch.cluster.metadata.MetadataIndexTemplateService.addIndexTemplateV2(MetadataIndexTemplateService.java:500) ~[elasticsearch-8.0.0-SNAPSHOT.jar:8.0.0-SNAPSHOT]	
	... 14 more

gwbrown added a commit to gwbrown/elasticsearch that referenced this issue Dec 18, 2020
gwbrown added a commit that referenced this issue Dec 18, 2020
gwbrown added a commit to gwbrown/elasticsearch that referenced this issue Dec 18, 2020
gwbrown added a commit to gwbrown/elasticsearch that referenced this issue Dec 18, 2020
gwbrown added a commit that referenced this issue Dec 18, 2020
gwbrown added a commit that referenced this issue Dec 18, 2020
@martijnvg martijnvg self-assigned this Jan 5, 2021
@martijnvg
Member

The stack trace above is caused by the monitoring test not loading the data stream plugin, so the watcher history index template can't be added because it uses a data stream (the data stream field type is missing). I also noticed that other plugins that monitoring requires in order to run are not loaded (enrich for EnrichStatsCollector and ccr for StatsCollector). I will fix this, which should reduce the noise.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 5, 2021
…ons for ccr and enrich.

The data stream plugin and dummy transport actions that are added to LocalStateMonitoring
will allow for monitoring java integration tests to function properly without printing error
messages that make debugging harder. For example the data stream plugin was added so that
index templates with data streams can be added without failing constantly in the background and
enrich stats dummy transport action so that the EnrichStatsCollector doesn't fail.

Also unmutes tests that were muted via elastic#66586, to have another opportunity to look at logs without all the noise,
perhaps all these errors contributed to the test failures.
martijnvg added a commit that referenced this issue Jan 6, 2021
Adds data-streams plugin to LocalStateMonitoring and dummy stats actions for ccr and enrich.

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 6, 2021
@martijnvg
Member

I'm muting testDisabledLocalExporterAlertsRemoval() and testLocalExporterWithAlertingDisabled(), since these started to fail after merging in #66997.

@jbaiera
Member

jbaiera commented Jan 12, 2021

There are a couple of fresh failures for this test among others that were unmuted.

https://gradle-enterprise.elastic.co/s/cj6fo6lynknmm/tests/:x-pack:plugin:monitoring:test/org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests/testLocalAlertsRemoval#1

I was looking into this before the end of last year (turns out there was a duplicate issue #66391). From everything I was able to gather, something is delaying the setup of the .watches index that is required for the local tests. Here's the excerpt from the test logs:

  | 1> [2021-01-11T10:55:23,681][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] before test |  
  | 1> [2021-01-11T10:55:23,682][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] [TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval]: setting up test |  
  | 1> [2021-01-11T10:55:23,709][WARN ][o.e.c.m.MetadataIndexTemplateService] [node_s0] legacy template [random_index_template] has index patterns [*] matching patterns from existing composable templates [.slm-history,.triggered_watches,.watch-history-14,.watches] with patterns (.slm-history => [.slm-history-5*],.triggered_watches => [.triggered_watches*],.watch-history-14 => [.watcher-history-14*],.watches => [.watches*]); this template [random_index_template] may be ignored in favor of a composable template at index creation time |  
  | 1> [2021-01-11T10:55:23,711][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_s0] adding template [random_index_template] for index patterns [*] |  
  | 1> [2021-01-11T10:55:23,766][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] [TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval]: all set up test |  
  | 1> [2021-01-11T10:55:23,778][INFO ][o.e.t.h.MockWebServer    ] [testLocalAlertsRemoval] bound HTTP mock server to [127.0.0.1:38303] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s4] updating [xpack.monitoring.exporters._local.type] from [] to [local] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s3] updating [xpack.monitoring.exporters._local.type] from [] to [local] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s4] updating [xpack.monitoring.collection.enabled] from [false] to [true] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s3] updating [xpack.monitoring.collection.enabled] from [false] to [true] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s1] updating [xpack.monitoring.exporters._local.type] from [] to [local] |  
  | 1> [2021-01-11T10:55:23,802][INFO ][o.e.c.s.ClusterSettings  ] [node_s1] updating [xpack.monitoring.collection.enabled] from [false] to [true] |  
  | 1> [2021-01-11T10:55:23,807][INFO ][o.e.c.s.ClusterSettings  ] [node_s2] updating [xpack.monitoring.exporters._local.type] from [] to [local] |  
  | 1> [2021-01-11T10:55:23,807][INFO ][o.e.c.s.ClusterSettings  ] [node_s2] updating [xpack.monitoring.collection.enabled] from [false] to [true] |  
  | 1> [2021-01-11T10:55:23,814][INFO ][o.e.c.s.ClusterSettings  ] [node_s0] updating [xpack.monitoring.exporters._local.type] from [] to [local] |  
  | 1> [2021-01-11T10:55:23,815][INFO ][o.e.c.s.ClusterSettings  ] [node_s0] updating [xpack.monitoring.collection.enabled] from [false] to [true] |  
  | 1> [2021-01-11T10:55:24,810][DEPRECATION][o.e.d.c.m.MetadataCreateIndexService] [node_s0] data_stream.dataset="deprecation.elasticsearch" data_stream.namespace="default" data_stream.type="logs" ecs.version="1.6" key="index_template_multiple_match" message="index [.monitoring-es-7-2021.01.11] matches multiple legacy templates [.monitoring-es, random_index_template], composable templates will only match a single template" |  
  | 1> [2021-01-11T10:55:24,812][DEPRECATION][o.e.d.c.m.MetadataCreateIndexService] [node_s0] data_stream.dataset="deprecation.elasticsearch" data_stream.namespace="default" data_stream.type="logs" ecs.version="1.6" key="index_name_starts_with_dot" message="index name [.monitoring-es-7-2021.01.11] starts with a dot '.', in the next major version, index names starting with a dot are reserved for hidden indices and system indices" |  
  | 1> [2021-01-11T10:55:24,854][INFO ][o.e.c.m.MetadataCreateIndexService] [node_s0] [.monitoring-es-7-2021.01.11] creating index, cause [auto(bulk api)], templates [random_index_template, .monitoring-es], shards [10]/[1] |  
  | 1> [2021-01-11T10:55:28,188][INFO ][o.e.c.r.a.AllocationService] [node_s0] current.health="GREEN" message="Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.monitoring-es-7-2021.01.11][8]]])." previous.health="YELLOW" reason="shards started [[.monitoring-es-7-2021.01.11][8]]" |  
  | 1> [2021-01-11T10:55:56,694][INFO ][o.e.c.s.ClusterSettings  ] [node_s4] updating [xpack.monitoring.exporters._local.type] from [local] to [] |  
  | 1> [2021-01-11T10:55:56,694][INFO ][o.e.c.s.ClusterSettings  ] [node_s2] updating [xpack.monitoring.exporters._local.type] from [local] to [] |  
  | 1> [2021-01-11T10:55:56,695][INFO ][o.e.c.s.ClusterSettings  ] [node_s4] updating [xpack.monitoring.collection.enabled] from [true] to [false] |  
  | 1> [2021-01-11T10:55:56,696][INFO ][o.e.c.s.ClusterSettings  ] [node_s2] updating [xpack.monitoring.collection.enabled] from [true] to [false] |  
  | 1> [2021-01-11T10:55:56,694][INFO ][o.e.c.s.ClusterSettings  ] [node_s3] updating [xpack.monitoring.exporters._local.type] from [local] to [] |  
  | 1> [2021-01-11T10:55:56,699][INFO ][o.e.c.s.ClusterSettings  ] [node_s3] updating [xpack.monitoring.collection.enabled] from [true] to [false] |  
  | 1> [2021-01-11T10:55:56,709][INFO ][o.e.c.s.ClusterSettings  ] [node_s1] updating [xpack.monitoring.exporters._local.type] from [local] to [] |  
  | 1> [2021-01-11T10:55:56,709][INFO ][o.e.c.s.ClusterSettings  ] [node_s1] updating [xpack.monitoring.collection.enabled] from [true] to [false] |  
  | 1> [2021-01-11T10:55:56,717][INFO ][o.e.c.s.ClusterSettings  ] [node_s0] updating [xpack.monitoring.exporters._local.type] from [local] to [] |  
  | 1> [2021-01-11T10:55:56,718][INFO ][o.e.c.s.ClusterSettings  ] [node_s0] updating [xpack.monitoring.collection.enabled] from [true] to [false] |  
  | 1> [2021-01-11T10:55:56,742][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] [TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval]: cleaning up after test |  
  | 1> [2021-01-11T10:55:56,921][INFO ][o.e.c.m.MetadataDeleteIndexService] [node_s0] [.monitoring-es-7-2021.01.11/g5ePvZ4-RKGXECT1vQqlQw] deleting index |  
  | 1> [2021-01-11T10:55:57,263][INFO ][o.e.c.m.MetadataIndexTemplateService] [node_s0] removing template [random_index_template] |  
  | 1> [2021-01-11T10:55:57,324][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] [TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval]: cleaned up after test |  
  | 1> [2021-01-11T10:55:57,324][INFO ][o.e.x.m.a.TransportMonitoringMigrateAlertsActionTests] [testLocalAlertsRemoval] after test |  
  | 2> REPRODUCE WITH: ./gradlew ':x-pack:plugin:monitoring:test' --tests "org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.testLocalAlertsRemoval" -Dtests.seed=38904F1E769EB12E -Dtests.security.manager=true -Dtests.locale=no-NO -Dtests.timezone=Africa/Algiers -Druntime.java=11 |  
  | 2> java.lang.AssertionError: |  
  | Expected: is <true> |  
  | but: was <false> |  
  | at __randomizedtesting.SeedInfo.seed([38904F1E769EB12E:9FED729CC8584D9]:0) |  
  | at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18) |  
  | at org.junit.Assert.assertThat(Assert.java:956) |  
  | at org.junit.Assert.assertThat(Assert.java:923) |  
  | at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.assertIndicesExists(MonitoringIntegTestCase.java:186) |  
  | at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.lambda$awaitIndexExists$3(MonitoringIntegTestCase.java:180) |  
  | at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:955) |  
  | at org.elasticsearch.xpack.monitoring.test.MonitoringIntegTestCase.awaitIndexExists(MonitoringIntegTestCase.java:180) |  
  | at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.waitForWatcherIndices(TransportMonitoringMigrateAlertsActionTests.java:578) |  
  | at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.ensureInitialLocalResources(TransportMonitoringMigrateAlertsActionTests.java:459) |  
  | at org.elasticsearch.xpack.monitoring.action.TransportMonitoringMigrateAlertsActionTests.testLocalAlertsRemoval(TransportMonitoringMigrateAlertsActionTests.java:123)

@dnhatn
Member

dnhatn commented Jan 19, 2021

@mark-vieira
Contributor

I've muted both testLocalAlertsRemoval and testRepeatedLocalAlertsRemoval in master as they have been failing again this week.

@martijnvg
Member

Apologies, I missed the updates to this issue. I've looked into the latest failure and I can confirm what @jbaiera has observed. In the case of the testRepeatedLocalAlertsRemoval() test, the test fails after waiting 20s for the .watches index. This index should be created by the LocalExporter class in the monitoring plugin; however, for some reason the logic in the LocalExporter#setupIfElectedMaster() method doesn't get executed. The fact that all indices get deleted between tests probably doesn't help here either. I'm going to unmute the testLocalAlertsRemoval and testRepeatedLocalAlertsRemoval tests again and enable test logging for them. I will keep an eye on CI for these test failures. If I'm slow to respond, then please mute again.
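
For anyone reading the stack traces without the test framework at hand: the failing frames go through an assertBusy-style wait, which keeps re-running an assertion until it passes or a timeout expires. The following is a minimal, simplified sketch of that pattern. It is not the actual ESTestCase.assertBusy implementation; the class name BusyWaitSketch and the backoff values are assumptions made for illustration.

```java
import java.util.concurrent.TimeUnit;

// Simplified stand-in for a busy-wait assertion helper (hypothetical, for illustration only).
final class BusyWaitSketch {
    private BusyWaitSketch() {}

    /**
     * Re-runs the assertion until it passes or the timeout elapses.
     * On timeout the last AssertionError is rethrown, which is what
     * surfaces as "Expected: is <true> but: was <false>" in a test report.
     */
    static void assertBusy(Runnable assertion, long maxWait, TimeUnit unit) {
        final long deadline = System.nanoTime() + unit.toNanos(maxWait);
        long sleepMillis = 1; // assumed starting backoff
        while (true) {
            try {
                assertion.run();
                return; // the condition finally holds
            } catch (AssertionError e) {
                if (System.nanoTime() - deadline >= 0) {
                    throw e; // out of time: report the most recent failure
                }
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting", ie);
            }
            sleepMillis = Math.min(sleepMillis * 2, 100); // assumed backoff cap
        }
    }
}
```

With a helper like this, a condition that never becomes true (here, the .watches index never being created) only shows up as a failure once the whole timeout has been spent, which matches the long gap visible in the test logs above.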

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Jan 26, 2021
martijnvg added a commit that referenced this issue Feb 5, 2021
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 5, 2021
martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 9, 2021
unmute TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval and
TransportMonitoringMigrateAlertsActionTests#testRepeatedLocalAlertsRemoval tests

Somehow during these tests the monitor watches are not installed. Both
tests use the local exporter and this exporter only installs the watches
under specific conditions via the elected master node. I suspect the
conditions are never met. The http exporter is more relaxed when attempting
to install monitor watches, and the tests using the http exporter do not
seem prone to failures caused by monitor watches not having been
installed.

Relates to elastic#66586
martijnvg added a commit that referenced this issue Feb 10, 2021
@martijnvg
Member

martijnvg commented Feb 17, 2021

After adding more logging via #68752, this test failure shows that the setup of watcher is never attempted, yet the local exporter setup is marked as completed. Because of this, watches are never installed during the testRepeatedLocalAlertsRemoval() test. This test expects the .watches index to exist, and therefore fails.

The problem is LocalExporter#setupClusterAlertsTasks(...) skips installing the watches because:

watches shouldn't be setup, because state=[INITIALIZED] and clusterStateChange=[true]

However, the .watches index is missing. I think the setupClusterAlertsTasks() method should also consider installing the watches if the .watches index is missing, regardless of state.

The setupIfElectedMaster() method that is invoking the setupClusterAlertsTasks() method should not advertise the setup as complete, if watcher isn't ready and another setup attempt needs to be done.

Update: It turns out that the watches are not installed because setupClusterAlertsTasks(...) is initially invoked from the cluster state update thread, and that code path avoids installing the watches. Normally, when the monitoring service on the elected master starts exporting monitor documents, the watches should then be installed; however, no monitor documents are exported on the elected master node (monitor docs are exported on other nodes). The result is that the watches are never installed and this test fails.

I will open a pr.
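
To make the sequence described in the update easier to follow, here is a toy model of it. This is a sketch under stated assumptions, not the LocalExporter code: the class WatchInstallModel and its methods onClusterStateChange() and onOpenBulk() are hypothetical names, and the only behaviour modelled is what the comment above states (the cluster-state-update path skips installing watches, the export path installs them).

```java
// Toy model of the behaviour described above; all names are hypothetical.
final class WatchInstallModel {
    private boolean watchesInstalled = false;

    /** Setup triggered by a cluster state update: per the thread, this path skips the watches. */
    void onClusterStateChange() {
        setupClusterAlerts(true);
    }

    /** Setup triggered when the exporter opens a bulk to export monitor documents. */
    void onOpenBulk() {
        setupClusterAlerts(false);
    }

    private void setupClusterAlerts(boolean clusterStateChange) {
        if (clusterStateChange) {
            return; // assumed: never install watches from the cluster state update thread
        }
        watchesInstalled = true; // stands in for installing the cluster alert watches
    }

    boolean watchesInstalled() {
        return watchesInstalled;
    }
}
```

In this model, an elected master that only ever sees cluster state updates keeps watchesInstalled() at false indefinitely, while a single onOpenBulk() call flips it to true. That is the motivation for having the tests send a monitoring bulk request to the elected master before waiting for the watcher index.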

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 17, 2021
Change tests to use monitor bulk api on elected master node before verifying watcher index exists.
Sometimes the monitor service on the elected master doesn't yet export monitor documents resulting in tests using the `ensureInitialLocalResources(...)` method to fail.
Cluster alert watches are only installed when the local exporter tries to resolve a local bulk.

Relates to elastic#66586
martijnvg added a commit that referenced this issue Feb 18, 2021
…#69139)

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 22, 2021
…8752)

martijnvg added a commit to martijnvg/elasticsearch that referenced this issue Feb 22, 2021
…elastic#69139)

martijnvg added a commit that referenced this issue Feb 22, 2021
…69326)

Re-enabled TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval and TransportMonitoringMigrateAlertsActionTests#testRepeatedLocalAlertsRemoval on 7.x branch.

Includes changes from #69139 and #68752
Relates to #66586

Included commits:
* Add more trace logging when installing monitor watches and (#68752)

unmute TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval and
TransportMonitoringMigrateAlertsActionTests#testRepeatedLocalAlertsRemoval tests

Somehow during these tests the monitor watches are not installed. Both
tests use the local exporter and this exporter only installs the watches
under specific conditions via the elected master node. I suspect the
conditions are never met. The http exporter is more relaxed when attempting
to install monitor watches and the tests using the http exporter seem
not to be prone by the fact that tests fail because monitor watches have
not been installed.

Relates to #66586

* Manually trigger local exporter to open a bulk in some monitor tests. (#69139)

Change tests to use monitor bulk api on elected master node before verifying watcher index exists.
Sometimes the monitor service on the elected master doesn't yet export monitor documents resulting in tests using the `ensureInitialLocalResources(...)` method to fail.
Cluster alerts watcher are only installed when local exporter tries to resolve local bulk.

Relates to #66586
@martijnvg
Member

The tests haven't failed in almost 5 days. Also re-enabled these tests in 7.x branch.
