
[CI] BasicWatcherTests.testModifyWatches fails in CI #35503

Closed
jaymode opened this issue Nov 13, 2018 · 12 comments
Labels: :Data Management/Watcher, >test-failure (Triaged test failures from CI)
@jaymode (Member) commented Nov 13, 2018

BasicWatcherTests.testModifyWatches failed in CI. I looked at the test, and one possible cause could be a concurrent reload of the watch service, which removes all watches from the trigger engine because the trigger service gets paused. Looking at the logs, we see a reload triggered right after the watch is added, due to new local watcher shard allocation ids. Maybe we should wait for green cluster health or use ensureClusterStateConsistency in this test?
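
A minimal sketch of what that could look like, assuming the test can call ensureGreen() (inherited from ESIntegTestCase) at the top of its body — the exact placement is an assumption for illustration, not a reviewed fix:

public void testModifyWatches() throws Exception {
    // Hedged sketch: block until all shards of the watcher indices are started
    // before indexing the watch, so a later shard-routing change is less likely
    // to trigger a reload mid-test. The index names are the ones visible in the
    // logs below.
    ensureGreen(".watches", ".triggered_watches");

    // ... existing test body: index the watch, then timeWarp().trigger("_name") ...
}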

Failure link: https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+release-tests/92/consoleFull

Reproduce line:

./gradlew :x-pack:plugin:watcher:test -Dtests.seed=CE0D14FE35FDBFFD -Dtests.class=org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests -Dtests.method="testModifyWatches" -Dtests.security.manager=true -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dtests.locale=pt-PT -Dtests.timezone=Asia/Bangkok -Dcompiler.java=11 -Druntime.java=8
  1> [2018-11-13T19:26:33,676][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] [BasicWatcherTests#testModifyWatches]: all set up test
  1> [2018-11-13T19:26:33,677][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] [BasicWatcherTests#testModifyWatches]: freezing time on nodes
  1> [2018-11-13T19:26:33,680][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.watches] creating index, cause [api], templates [.watches, random_index_template], shards [2]/[0], mappings [doc]
  1> [2018-11-13T19:26:33,680][INFO ][o.e.c.r.a.AllocationService] [node_s0] updating number_of_replicas to [1] for indices [.watches]
  1> [2018-11-13T19:26:33,721][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.triggered_watches] creating index, cause [api], templates [.triggered_watches, random_index_template], shards [1]/[1], mappings [doc]
  1> [2018-11-13T19:26:33,766][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.watcher-history-9-2018.11.13] creating index, cause [api], templates [.watch-history-9, random_index_template], shards [1]/[0], mappings [doc]
  1> [2018-11-13T19:26:33,766][INFO ][o.e.c.r.a.AllocationService] [node_s0] updating number_of_replicas to [1] for indices [.watcher-history-9-2018.11.13]
  1> [2018-11-13T19:26:33,797][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] creating watch history index [.watcher-history-9-2018.11.13]
  1> [2018-11-13T19:26:33,819][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.watcher-history-9-2018.11.13][0]] ...]).
  1> [2018-11-13T19:26:33,824][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] indices [.watcher-history-9-2018.11.13, .watches, .triggered_watches] are green
  1> [2018-11-13T19:26:33,824][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STOPPED], Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED]]
  1> [2018-11-13T19:26:33,830][DEBUG][o.e.x.w.WatcherService   ] [node_s2] watch service has been reloaded, reason [starting]
  1> [2018-11-13T19:26:33,830][DEBUG][o.e.x.w.WatcherService   ] [node_s1] watch service has been reloaded, reason [starting]
  1> [2018-11-13T19:26:33,831][DEBUG][o.e.x.w.WatcherService   ] [node_s0] watch service has been reloaded, reason [starting]
  1> [2018-11-13T19:26:33,831][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STARTED], Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTING]]
  1> [2018-11-13T19:26:33,834][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STARTED], Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED]]
  1> [2018-11-13T19:26:33,836][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [idx] creating index, cause [api], templates [random_index_template], shards [7]/[1], mappings []
  1> [2018-11-13T19:26:33,904][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s2] adding watch [_name] to trigger service
  1> [2018-11-13T19:26:33,904][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s2] adding watch [_name]
  1> [2018-11-13T19:26:33,917][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s0] watch [_name] should not be triggered
  1> [2018-11-13T19:26:33,926][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [testModifyWatches] firing watch [_name] at [2018-11-13T12:26:38.677Z]
  1> [2018-11-13T19:26:33,927][DEBUG][o.e.x.w.e.ExecutionService] [testModifyWatches] saving watch records [1]
  1> [2018-11-13T19:26:33,931][DEBUG][o.e.x.w.e.ExecutionService] [testModifyWatches] executing watch [_name]
  1> [2018-11-13T19:26:33,933][DEBUG][o.e.x.w.i.s.ExecutableSearchInput] [testModifyWatches] [_name_887065ee-0c7f-4ae6-9947-269be07f5479-2018-11-13T12:26:38.677Z] found [0] hits
  1> [2018-11-13T19:26:33,936][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s2] adding watch [_name] to trigger service
  1> [2018-11-13T19:26:33,936][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s2] adding watch [_name]
  1> [2018-11-13T19:26:33,939][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s0] watch [_name] should not be triggered
  1> [2018-11-13T19:26:33,948][INFO ][o.e.c.m.MetaDataMappingService] [node_s0] [.watcher-history-9-2018.11.13/CmR9DRQkQD2WEcvGyaQC7A] update_mapping [doc]
  1> [2018-11-13T19:26:33,966][DEBUG][o.e.x.w.e.ExecutionService] [testModifyWatches] finished [_name]/[_name_887065ee-0c7f-4ae6-9947-269be07f5479-2018-11-13T12:26:38.677Z]
  1> [2018-11-13T19:26:34,008][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s2] adding watch [_name] to trigger service
  1> [2018-11-13T19:26:34,008][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s2] adding watch [_name]
  1> [2018-11-13T19:26:34,026][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[idx][6]] ...]).
  1> [2018-11-13T19:26:34,027][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s0] watch [_name] should not be triggered
  1> [2018-11-13T19:26:34,029][INFO ][o.e.x.w.WatcherService   ] [node_s1] reloading watcher, reason [new local watcher shard allocation ids], cancelled [0] queued tasks
  1> [2018-11-13T19:26:34,030][INFO ][o.e.x.w.WatcherService   ] [node_s2] reloading watcher, reason [new local watcher shard allocation ids], cancelled [0] queued tasks
  1> [2018-11-13T19:26:34,038][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] [#testModifyWatches]: clearing watcher state
  1> [2018-11-13T19:26:34,039][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] waiting to stop watcher, current states [Tuple [v1=node_s1, v2=STARTED], Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED]]
  1> [2018-11-13T19:26:34,042][INFO ][o.e.x.w.WatcherService   ] [node_s1] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2018-11-13T19:26:34,042][INFO ][o.e.x.w.WatcherService   ] [node_s2] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2018-11-13T19:26:34,045][INFO ][o.e.x.w.WatcherService   ] [node_s0] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2018-11-13T19:26:34,048][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] waiting to stop watcher, current states [Tuple [v1=node_s1, v2=STOPPED], Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED]]
  1> [2018-11-13T19:26:34,048][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] [BasicWatcherTests#testModifyWatches]: cleaning up after test
  1> [2018-11-13T19:26:34,048][INFO ][o.e.t.InternalTestCluster] [testModifyWatches] Clearing active scheme time frozen, expected healing time 0s
  1> [2018-11-13T19:26:34,055][DEBUG][o.e.x.w.WatcherService   ] [node_s1] Loaded [0] watches for execution
  1> [2018-11-13T19:26:34,055][DEBUG][o.e.x.w.WatcherService   ] [node_s2] Loaded [1] watches for execution
  1> [2018-11-13T19:26:34,056][DEBUG][o.e.x.w.WatcherService   ] [node_s1] watch service has been reloaded, reason [new local watcher shard allocation ids]
  1> [2018-11-13T19:26:34,056][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s2] adding watch [_name]
  1> [2018-11-13T19:26:34,056][DEBUG][o.e.x.w.WatcherService   ] [node_s2] watch service has been reloaded, reason [new local watcher shard allocation ids]
  1> [2018-11-13T19:26:34,065][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [idx/dvrs-NoDRECbdygx_jZlBA] deleting index
  1> [2018-11-13T19:26:34,065][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.watches/4_uvYvRAQSabLevglsnYQg] deleting index
  1> [2018-11-13T19:26:34,066][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.triggered_watches/HipCpVXNSWuosdZwXrIFBA] deleting index
  1> [2018-11-13T19:26:34,066][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.watcher-history-9-2018.11.13/CmR9DRQkQD2WEcvGyaQC7A] deleting index
  1> [2018-11-13T19:26:34,123][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s0] removing template [random_index_template]
  1> [2018-11-13T19:26:34,128][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] [BasicWatcherTests#testModifyWatches]: cleaned up after test
  1> [2018-11-13T19:26:34,129][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testModifyWatches] after test
FAILURE 0.50s J2 | BasicWatcherTests.testModifyWatches <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: could not find watch [_name] to trigger
   > Expected: is <true>
   >      but: was <false>
   > 	at __randomizedtesting.SeedInfo.seed([CE0D14FE35FDBFFD:13C578BEB78E4397]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at org.elasticsearch.xpack.watcher.test.AbstractWatcherIntegrationTestCase$TimeWarp.trigger(AbstractWatcherIntegrationTestCase.java:569)
   > 	at org.elasticsearch.xpack.watcher.test.AbstractWatcherIntegrationTestCase$TimeWarp.trigger(AbstractWatcherIntegrationTestCase.java:559)
   > 	at org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests.testModifyWatches(BasicWatcherTests.java:195)
   > 	at java.lang.Thread.run(Thread.java:748)
jaymode added the >test-failure (Triaged test failures from CI) and :Data Management/Watcher labels on Nov 13, 2018
@elasticmachine (Collaborator)

Pinging @elastic/es-core-infra

@gwbrown (Contributor) commented Feb 19, 2019

After lots of trying (tens of thousands of runs of this test under various conditions, including on a heavily loaded system, on both 6.x and master), I have not been able to reproduce this failure once. This test isn't muted and has not failed on CI since this issue was opened three months ago. While it has failed a few times on CI within the past year, it failed with this particular error only once; the other failures predate some significant Watcher refactorings and bug fixes, so I'm not sure they are relevant.

I'm very hesitant to change this test to address a failure I can't reproduce, since I'd have no way of verifying whether any change actually improved the test's stability. So I'm wondering what we should do with this issue: unless someone has insight into how we might reproduce this more reliably, I don't think we can do much for the moment.

@markharwood (Contributor)

Build-stats link

@jtibshirani (Contributor)

I dug into these failures while investigating #37882, which seems to have the same root cause as both this issue and #39306.

From looking at logs and stepping through code, I saw the same cause that @jaymode identified above. There is a change in shard routing that causes the watch service to be reloaded, which then clears the list of registered watches from the mock trigger engine. The assertion errors we're seeing can be consistently reproduced if we simulate a change in shard routing and repeat a test case ~10 times:

--- a/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/WatcherLifeCycleService.java
+++ b/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/WatcherLifeCycleService.java
@@ -137,7 +137,7 @@ public class WatcherLifeCycleService implements ClusterStateListener {
             .sorted(Comparator.comparing(ShardRouting::hashCode))
             .collect(Collectors.toList());
 
-        if (previousShardRoutings.get().equals(localAffectedShardRoutings) == false) {
+        if (Math.random() < 0.1 || previousShardRoutings.get().equals(localAffectedShardRoutings) == false) {
             if (watcherService.validate(event.state())) {
                 previousShardRoutings.set(localAffectedShardRoutings);
                 if (state.get() == WatcherState.STARTED) {

I don't think that waiting for green cluster health will improve the situation, since we already wait for the watcher indices to be green in AbstractWatcherIntegrationTestCase#_setup. It looks like the reload is caused by a shard routing change that happens after the indices are created and watcher has already started:

  1> [2018-11-13T19:26:33,830][DEBUG][o.e.x.w.WatcherService   ] [node_s2] watch service has been reloaded, reason [starting]
  ...
  1> [2018-11-13T19:26:34,029][INFO ][o.e.x.w.WatcherService   ] [node_s1] reloading watcher, reason [new local watcher shard allocation ids], cancelled [0] queued tasks

From looking at the mock trigger engine, I actually don't think it's correct to be clearing the list of watches when watcher is reloaded. I opened a PR that proposes to remove this behavior, which could also help with these test failures: #39724. With that change, I can no longer reproduce the failures when simulating a change in shard routing.
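
For context, a rough sketch of the idea behind that PR — illustrative only; the superclass and method names here are assumptions based on the class names in the logs, not the literal diff from #39724:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ScheduleTriggerEngineMock extends ScheduleTriggerEngine {

    private final Map<String, Watch> watches = new ConcurrentHashMap<>();

    @Override
    public void add(Watch watch) {
        watches.put(watch.id(), watch);
    }

    @Override
    public void pauseExecution() {
        // Deliberately do NOT clear the registered watches here. Pausing happens
        // on every reload (e.g. for [new local watcher shard allocation ids]),
        // and wiping the map meant a test trigger fired across a reload could no
        // longer find the watch it had just added.
    }
}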

@jtibshirani (Contributor)

I'm closing this now that #39724 has merged. Feel free to reopen the issue if similar failures pop up again.

@benwtrent (Member) commented Mar 27, 2019

Previous issues around these tests (though I'm not convinced they share the same cause): #35503, #39306

Previous investigations brought about the changes in PR #39724.

Previous guesses at the cause: reloading causing the registered watches to be cleared (fixed via that PR), and the internalCluster not shutting down properly.

Failing builds:

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+periodic/951/consoleFull
Reproduction:

./gradlew :x-pack:plugin:watcher:unitTest -Dtests.seed=CF3EAFDCCBFC8BED -Dtests.class=org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests -Dtests.method="testConditionSearchWithIndexedTemplate" -Dtests.security.manager=true -Dtests.locale=ca -Dtests.timezone=America/Atka -Dcompiler.java=12 -Druntime.java=8

Relevant trace:

1> [2019-03-26T22:30:43,355][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] before test
  1> [2019-03-26T22:30:43,355][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [BasicWatcherTests#testIndexWatch]: setting up test
  1> [2019-03-26T22:30:43,360][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s2] adding template [random_index_template] for index patterns [*]
  1> [2019-03-26T22:30:43,370][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [BasicWatcherTests#testIndexWatch]: all set up test
  1> [2019-03-26T22:30:43,371][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [BasicWatcherTests#testIndexWatch]: freezing time on nodes
  1> [2019-03-26T22:30:43,373][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to stop watcher, current states [Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED], Tuple [v1=node_s1, v2=STOPPED]]
  1> [2019-03-26T22:30:43,389][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s2] [.watchesyvkwv] creating index, cause [Index to test aliases with .watches index], templates [.watches, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-26T22:30:43,389][INFO ][o.e.c.r.a.AllocationService] [node_s2] updating number_of_replicas to [1] for indices [.watchesyvkwv]
  1> [2019-03-26T22:30:43,465][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s2] [vuebvrpaq] creating index, cause [api], templates [random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-26T22:30:43,626][INFO ][o.e.c.r.a.AllocationService] [node_s2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[vuebvrpaq][0]] ...]).
  1> [2019-03-26T22:30:43,636][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] indices [vuebvrpaq] are green
  1> [2019-03-26T22:30:43,638][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s2] [.watchesyvkwv/YxVv6AmvQH6ojNeIB4BeCw] deleting index
  1> [2019-03-26T22:30:43,676][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] set alias for .watches index to [vuebvrpaq]
  1> [2019-03-26T22:30:43,680][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s2] [.triggered_watches] creating index, cause [api], templates [.triggered_watches, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-26T22:30:43,681][INFO ][o.e.c.r.a.AllocationService] [node_s2] updating number_of_replicas to [1] for indices [.triggered_watches]
  1> [2019-03-26T22:30:43,785][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s2] [.watcher-history-9-2019.03.27] creating index, cause [api], templates [.watch-history-9, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-26T22:30:43,786][INFO ][o.e.c.r.a.AllocationService] [node_s2] updating number_of_replicas to [1] for indices [.watcher-history-9-2019.03.27]
  1> [2019-03-26T22:30:43,874][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] creating watch history index [.watcher-history-9-2019.03.27]
  1> [2019-03-26T22:30:43,967][INFO ][o.e.c.r.a.AllocationService] [node_s2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.watcher-history-9-2019.03.27][0]] ...]).
  1> [2019-03-26T22:30:43,978][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] indices [.watcher-history-9-2019.03.27, vuebvrpaq, .triggered_watches] are green
  1> [2019-03-26T22:30:43,979][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to start watcher, current states [Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED], Tuple [v1=node_s1, v2=STOPPED]]
  1> [2019-03-26T22:30:43,995][DEBUG][o.e.x.w.WatcherService   ] [node_s0] watch service has been reloaded, reason [starting]
  1> [2019-03-26T22:30:44,005][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to start watcher, current states [Tuple [v1=node_s2, v2=STARTING], Tuple [v1=node_s0, v2=STARTED], Tuple [v1=node_s1, v2=STARTED]]
  1> [2019-03-26T22:30:44,009][DEBUG][o.e.x.w.WatcherService   ] [node_s2] watch service has been reloaded, reason [starting]
  1> [2019-03-26T22:30:44,009][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to start watcher, current states [Tuple [v1=node_s2, v2=STARTING], Tuple [v1=node_s0, v2=STARTED], Tuple [v1=node_s1, v2=STARTED]]
  1> [2019-03-26T22:30:44,015][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to start watcher, current states [Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED], Tuple [v1=node_s1, v2=STARTED]]
  1> [2019-03-26T22:30:44,018][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s2] [idx] creating index, cause [api], templates [random_index_template], shards [2]/[0], mappings []
  1> [2019-03-26T22:30:44,082][INFO ][o.e.c.r.a.AllocationService] [node_s2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[idx][1]] ...]).
  1> [2019-03-26T22:30:44,096][INFO ][o.e.c.m.MetaDataMappingService] [node_s2] [idx/HYDL5JI3RrWT65ANheMyHw] create_mapping [type]
  1> [2019-03-26T22:30:44,135][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s2] adding watch [_name] to trigger service
  1> [2019-03-26T22:30:44,135][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s2] adding watch [_name]
  1> [2019-03-26T22:30:44,149][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s0] watch [_name] should not be triggered
  1> [2019-03-26T22:30:44,163][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [testIndexWatch] firing watch [_name] at [2019-03-27T07:30:43.372Z]
  1> [2019-03-26T22:30:44,166][DEBUG][o.e.x.w.e.ExecutionService] [testIndexWatch] watcher execution service paused, not processing [1] events
  1> [2019-03-26T22:30:54,339][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] Found [0] records for watch [_name]
  1> [2019-03-26T22:30:54,339][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [#testIndexWatch]: clearing watcher state
  1> [2019-03-26T22:30:54,341][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to stop watcher, current states [Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED], Tuple [v1=node_s1, v2=STARTED]]
  1> [2019-03-26T22:30:54,353][INFO ][o.e.x.w.WatcherService   ] [node_s0] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-26T22:30:54,353][INFO ][o.e.x.w.WatcherService   ] [node_s1] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-26T22:30:54,357][INFO ][o.e.x.w.WatcherService   ] [node_s2] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-26T22:30:54,362][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] waiting to stop watcher, current states [Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED], Tuple [v1=node_s1, v2=STOPPED]]
  1> [2019-03-26T22:30:54,362][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [BasicWatcherTests#testIndexWatch]: cleaning up after test
  1> [2019-03-26T22:30:54,362][INFO ][o.e.t.InternalTestCluster] [testIndexWatch] Clearing active scheme time frozen, expected healing time 0s
  1> [2019-03-26T22:30:54,403][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s2] [idx/HYDL5JI3RrWT65ANheMyHw] deleting index
  1> [2019-03-26T22:30:54,403][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s2] [vuebvrpaq/sE_znfh3QHedE0HaGoXQXQ] deleting index
  1> [2019-03-26T22:30:54,403][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s2] [.watcher-history-9-2019.03.27/sIe7e7WWSaa4Xi6RwCtI7Q] deleting index
  1> [2019-03-26T22:30:54,403][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s2] [.triggered_watches/gu6jqmx3QguzdYDSU3dTgg] deleting index
  1> [2019-03-26T22:30:54,462][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s2] removing template [random_index_template]
  1> [2019-03-26T22:30:54,474][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] [BasicWatcherTests#testIndexWatch]: cleaned up after test
  1> [2019-03-26T22:30:54,474][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testIndexWatch] after test
FAILURE 11.2s J3 | BasicWatcherTests.testIndexWatch <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: could not find executed watch record for watch _name
   > Expected: a value equal to or greater than <1L>
   >      but: <0L> was less than <1L>

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2761/console

Reproduction:

./gradlew :x-pack:plugin:watcher:unitTest -Dtests.seed=27A4E6E50AAC7BBF -Dtests.class=org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests -Dtests.method="testConditionSearchWithIndexedTemplate" -Dtests.security.manager=true -Dtests.locale=is-IS -Dtests.timezone=America/Santarem -Dcompiler.java=12 -Druntime.java=8

Trace:

1> [2019-03-27T10:01:24,459][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] before test
  1> [2019-03-27T10:01:24,459][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [BasicWatcherTests#testConditionSearchWithIndexedTemplate]: setting up test
  1> [2019-03-27T10:01:24,466][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_sm2] adding template [random_index_template] for index patterns [*]
  1> [2019-03-27T10:01:24,475][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [BasicWatcherTests#testConditionSearchWithIndexedTemplate]: all set up test
  1> [2019-03-27T10:01:24,477][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [BasicWatcherTests#testConditionSearchWithIndexedTemplate]: freezing time on nodes
  1> [2019-03-27T10:01:24,478][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] waiting to stop watcher, current states [Tuple [v1=node_sd3, v2=STOPPED], Tuple [v1=node_sm2, v2=STOPPED], Tuple [v1=node_sm1, v2=STOPPED], Tuple [v1=node_sm0, v2=STOPPED], Tuple [v1=node_sd5, v2=STOPPED], Tuple [v1=node_sd4, v2=STOPPED]]
  1> [2019-03-27T10:01:24,482][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [.watchesxyrni] creating index, cause [Index to test aliases with .watches index], templates [.watches, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-27T10:01:24,483][INFO ][o.e.c.r.a.AllocationService] [node_sm2] updating number_of_replicas to [1] for indices [.watchesxyrni]
  1> [2019-03-27T10:01:24,536][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [frabdmxca] creating index, cause [api], templates [random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-27T10:01:24,608][INFO ][o.e.c.r.a.AllocationService] [node_sm2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[frabdmxca][0]] ...]).
  1> [2019-03-27T10:01:24,616][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] indices [frabdmxca] are green
  1> [2019-03-27T10:01:24,619][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [.watchesxyrni/VLE2R2FqSgy5i1duOszs7g] deleting index
  1> [2019-03-27T10:01:24,652][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] set alias for .watches index to [frabdmxca]
  1> [2019-03-27T10:01:24,655][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [.triggered_watches-alias-index] creating index, cause [Index to test aliases with .triggered-watches index], templates [.triggered_watches, random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-27T10:01:24,696][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [murwfypk] creating index, cause [api], templates [random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-27T10:01:24,760][INFO ][o.e.c.r.a.AllocationService] [node_sm2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[murwfypk][0]] ...]).
  1> [2019-03-27T10:01:24,770][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] indices [murwfypk] are green
  1> [2019-03-27T10:01:24,773][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [.triggered_watches-alias-index/8rnRreMKTyqgOEQnCiQaSA] deleting index
  1> [2019-03-27T10:01:24,797][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] set alias for .triggered-watches index to [murwfypk]
  1> [2019-03-27T10:01:24,802][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [.watcher-history-9-2019.03.27] creating index, cause [api], templates [.watch-history-9, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-27T10:01:24,802][INFO ][o.e.c.r.a.AllocationService] [node_sm2] updating number_of_replicas to [1] for indices [.watcher-history-9-2019.03.27]
  1> [2019-03-27T10:01:24,845][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] creating watch history index [.watcher-history-9-2019.03.27]
  2> NOTE: leaving temporary files on disk at: /var/lib/jenkins/workspace/elastic+elasticsearch+7.x+intake/x-pack/plugin/watcher/build/testrun/unitTest/J7/temp/org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests_27A4E6E50AAC7BBF-001
  1> [2019-03-27T10:01:24,876][INFO ][o.e.c.r.a.AllocationService] [node_sm2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.watcher-history-9-2019.03.27][0]] ...]).
  1> [2019-03-27T10:01:24,884][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] indices [.watcher-history-9-2019.03.27, frabdmxca, murwfypk] are green
  1> [2019-03-27T10:01:24,885][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] waiting to start watcher, current states [Tuple [v1=node_sd3, v2=STOPPED], Tuple [v1=node_sm2, v2=STOPPED], Tuple [v1=node_sm1, v2=STOPPED], Tuple [v1=node_sm0, v2=STOPPED], Tuple [v1=node_sd5, v2=STOPPED], Tuple [v1=node_sd4, v2=STOPPED]]
  1> [2019-03-27T10:01:24,891][DEBUG][o.e.x.w.WatcherService   ] [node_sm0] watch service has been reloaded, reason [starting]
  1> [2019-03-27T10:01:24,892][DEBUG][o.e.x.w.WatcherService   ] [node_sm1] watch service has been reloaded, reason [starting]
  1> [2019-03-27T10:01:24,895][DEBUG][o.e.x.w.WatcherService   ] [node_sd5] watch service has been reloaded, reason [starting]
  1> [2019-03-27T10:01:24,895][DEBUG][o.e.x.w.WatcherService   ] [node_sd4] watch service has been reloaded, reason [starting]
  1> [2019-03-27T10:01:24,896][DEBUG][o.e.x.w.WatcherService   ] [node_sm2] watch service has been reloaded, reason [starting]
  1> [2019-03-27T10:01:24,897][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] waiting to start watcher, current states [Tuple [v1=node_sd3, v2=STARTED], Tuple [v1=node_sm2, v2=STARTED], Tuple [v1=node_sm1, v2=STARTED], Tuple [v1=node_sm0, v2=STARTED], Tuple [v1=node_sd5, v2=STARTED], Tuple [v1=node_sd4, v2=STARTED]]
  1> [2019-03-27T10:01:24,914][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] using custom data_path for index: [xZlFgQZDuX]
  1> [2019-03-27T10:01:24,917][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_sm2] [events] creating index, cause [api], templates [random_index_template], shards [3]/[0], mappings [event]
  1> [2019-03-27T10:01:24,952][INFO ][o.e.c.r.a.AllocationService] [node_sm2] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[events][1], [events][2]] ...]).
  1> [2019-03-27T10:01:24,961][WARN ][o.e.d.x.w.s.s.WatcherSearchTemplateRequest] [testConditionSearchWithIndexedTemplate] [types removal] Specifying types in a watcher search request is deprecated.
  1> [2019-03-27T10:01:24,963][DEBUG][o.e.x.w.WatcherIndexingListener] [node_sd4] watch [_name] should not be triggered
  2> NOTE: All tests run in this JVM: [WatchAckTests, YearlyScheduleTests, JiraAccountTests, TextTemplateTests, BasicWatcherTests]
  1> [2019-03-27T10:01:24,974][DEBUG][o.e.x.w.WatcherIndexingListener] [node_sd5] adding watch [_name] to trigger service
  1> [2019-03-27T10:01:24,974][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_sd5] adding watch [_name]
  1> [2019-03-27T10:01:24,984][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] created watch [_name] at [2019-03-27T13:01:24.984Z]
  1> [2019-03-27T10:01:25,002][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [testConditionSearchWithIndexedTemplate] firing watch [_name] at [2019-03-27T13:01:25.914Z]
  1> [2019-03-27T10:01:25,004][DEBUG][o.e.x.w.e.ExecutionService] [testConditionSearchWithIndexedTemplate] watcher execution service paused, not processing [1] events
  1> [2019-03-27T10:01:35,109][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] Found [0] records for watch [_name]
  1> [2019-03-27T10:01:35,109][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [#testConditionSearchWithIndexedTemplate]: clearing watcher state
  1> [2019-03-27T10:01:35,110][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] waiting to stop watcher, current states [Tuple [v1=node_sd3, v2=STARTED], Tuple [v1=node_sm2, v2=STARTED], Tuple [v1=node_sm1, v2=STARTED], Tuple [v1=node_sm0, v2=STARTED], Tuple [v1=node_sd5, v2=STARTED], Tuple [v1=node_sd4, v2=STARTED]]
  1> [2019-03-27T10:01:35,116][INFO ][o.e.x.w.WatcherService   ] [node_sm1] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,117][INFO ][o.e.x.w.WatcherService   ] [node_sm0] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,119][INFO ][o.e.x.w.WatcherService   ] [node_sd4] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,119][INFO ][o.e.x.w.WatcherService   ] [node_sd3] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,119][INFO ][o.e.x.w.WatcherService   ] [node_sd5] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,121][INFO ][o.e.x.w.WatcherService   ] [node_sm2] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T10:01:35,124][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] waiting to stop watcher, current states [Tuple [v1=node_sd3, v2=STOPPED], Tuple [v1=node_sm2, v2=STOPPED], Tuple [v1=node_sm1, v2=STOPPED], Tuple [v1=node_sm0, v2=STOPPED], Tuple [v1=node_sd5, v2=STOPPED], Tuple [v1=node_sd4, v2=STOPPED]]
  1> [2019-03-27T10:01:35,124][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [BasicWatcherTests#testConditionSearchWithIndexedTemplate]: cleaning up after test
  1> [2019-03-27T10:01:35,124][INFO ][o.e.t.InternalTestCluster] [testConditionSearchWithIndexedTemplate] Clearing active scheme time frozen, expected healing time 0s
  1> [2019-03-27T10:01:35,148][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [murwfypk/MDpXU-3TRmuH2S8doaI00g] deleting index
  1> [2019-03-27T10:01:35,148][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [frabdmxca/ks9SYzn5RGafbPcbzp2ccg] deleting index
  1> [2019-03-27T10:01:35,148][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [.watcher-history-9-2019.03.27/N21v22jHSy6h1-sJGbReyg] deleting index
  1> [2019-03-27T10:01:35,148][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_sm2] [events/P5a_l-A2R1eNHAXV92bjjA] deleting index
  1> [2019-03-27T10:01:35,181][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_sm2] removing template [random_index_template]
  1> [2019-03-27T10:01:35,192][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] [BasicWatcherTests#testConditionSearchWithIndexedTemplate]: cleaned up after test
  1> [2019-03-27T10:01:35,192][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithIndexedTemplate] after test
FAILURE 10.8s J7 | BasicWatcherTests.testConditionSearchWithIndexedTemplate <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: a value equal to or greater than <1L>

Looking at the two traces, it seems that the watcher reload occurs BEFORE the watch is created, so I am not sure it is the same suspected cause as before (reloading causing the watches to be removed). Notably, in both traces the mock engine fires while the logs report "watcher execution service paused, not processing [1] events", so the trigger event is dropped rather than executed.
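
If the paused execution service really is swallowing the trigger event, one hedged way to harden these tests would be to wait until every node reports Watcher as STARTED before firing. A sketch only: assertBusy() comes from ESTestCase and timeWarp().trigger(...) from AbstractWatcherIntegrationTestCase (see the stack trace above), while watcherStatesOnAllNodes() is a hypothetical helper standing in for the per-node state polling the base class already does:

// Sketch, not a verified fix: poll until no node's execution service should
// still be paused, then fire the time-warped trigger.
assertBusy(() -> {
    for (WatcherState state : watcherStatesOnAllNodes()) {
        assertThat(state, equalTo(WatcherState.STARTED));
    }
});
timeWarp().trigger("_name");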

Similar failure:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+7.x+intake/746/console

./gradlew :x-pack:plugin:watcher:unitTest -Dtests.seed=27A4E6E50AAC7BBF -Dtests.class=org.elasticsearch.xpack.watcher.test.integration.BasicWatcherTests -Dtests.method="testConditionSearchWithIndexedTemplate" -Dtests.security.manager=true -Dtests.locale=is-IS -Dtests.timezone=America/Santarem -Dcompiler.java=12 -Druntime.java=8
  1> [2019-03-27T05:12:08,662][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] before test
  1> [2019-03-27T05:12:08,662][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [BasicWatcherTests#testConditionSearchWithSource]: setting up test
  1> [2019-03-27T05:12:08,669][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s0] adding template [random_index_template] for index patterns [*]
  1> [2019-03-27T05:12:08,678][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [BasicWatcherTests#testConditionSearchWithSource]: all set up test
  1> [2019-03-27T05:12:08,679][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [BasicWatcherTests#testConditionSearchWithSource]: freezing time on nodes
  1> [2019-03-27T05:12:08,680][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to stop watcher, current states [Tuple [v1=node_s1, v2=STOPPED], Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED]]
  1> [2019-03-27T05:12:08,686][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.watchesauonl] creating index, cause [Index to test aliases with .watches index], templates [.watches, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-27T05:12:08,687][INFO ][o.e.c.r.a.AllocationService] [node_s0] updating number_of_replicas to [1] for indices [.watchesauonl]
  1> [2019-03-27T05:12:08,776][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [vrurzbbrej] creating index, cause [api], templates [random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-27T05:12:08,874][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[vrurzbbrej][0]] ...]).
  1> [2019-03-27T05:12:08,888][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] indices [vrurzbbrej] are green
  1> [2019-03-27T05:12:08,890][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.watchesauonl/QcgNNCikRqeU9toD6E6MvQ] deleting index
  1> [2019-03-27T05:12:08,922][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] set alias for .watches index to [vrurzbbrej]
  1> [2019-03-27T05:12:08,927][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.triggered_watches] creating index, cause [api], templates [.triggered_watches, random_index_template], shards [1]/[1], mappings [_doc]
  1> [2019-03-27T05:12:09,006][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [.watcher-history-9-2019.03.27] creating index, cause [api], templates [.watch-history-9, random_index_template], shards [1]/[0], mappings [_doc]
  1> [2019-03-27T05:12:09,007][INFO ][o.e.c.r.a.AllocationService] [node_s0] updating number_of_replicas to [1] for indices [.watcher-history-9-2019.03.27]
  1> [2019-03-27T05:12:09,095][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] creating watch history index [.watcher-history-9-2019.03.27]
  1> [2019-03-27T05:12:09,198][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[.watcher-history-9-2019.03.27][0]] ...]).
  1> [2019-03-27T05:12:09,212][DEBUG][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] indices [.watcher-history-9-2019.03.27, vrurzbbrej, .triggered_watches] are green
  1> [2019-03-27T05:12:09,213][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STOPPED], Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED]]
  1> [2019-03-27T05:12:09,228][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STARTING], Tuple [v1=node_s2, v2=STARTING], Tuple [v1=node_s0, v2=STARTED]]
  1> [2019-03-27T05:12:09,229][DEBUG][o.e.x.w.WatcherService   ] [node_s1] watch service has been reloaded, reason [starting]
  1> [2019-03-27T05:12:09,230][DEBUG][o.e.x.w.WatcherService   ] [node_s2] watch service has been reloaded, reason [starting]
  1> [2019-03-27T05:12:09,233][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to start watcher, current states [Tuple [v1=node_s1, v2=STARTED], Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED]]
  1> [2019-03-27T05:12:09,237][INFO ][o.e.c.m.MetaDataCreateIndexService] [node_s0] [events] creating index, cause [api], templates [random_index_template], shards [7]/[1], mappings [event]
  1> [2019-03-27T05:12:09,499][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s2] watch [_name] should not be triggered
  1> [2019-03-27T05:12:09,539][DEBUG][o.e.x.w.WatcherIndexingListener] [node_s1] adding watch [_name] to trigger service
  1> [2019-03-27T05:12:09,539][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [node_s1] adding watch [_name]
  1> [2019-03-27T05:12:09,592][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] created watch [_name] at [2019-03-27T09:12:09.592Z]
  1> [2019-03-27T05:12:09,622][INFO ][o.e.c.r.a.AllocationService] [node_s0] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[events][1]] ...]).
  1> [2019-03-27T05:12:09,651][DEBUG][o.e.x.w.t.ScheduleTriggerEngineMock] [testConditionSearchWithSource] firing watch [_name] at [2019-03-27T09:12:10.233Z]
  1> [2019-03-27T05:12:09,654][DEBUG][o.e.x.w.e.ExecutionService] [testConditionSearchWithSource] watcher execution service paused, not processing [1] events
  1> [2019-03-27T05:12:19,785][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] Found [0] records for watch [_name]
  1> [2019-03-27T05:12:19,785][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [#testConditionSearchWithSource]: clearing watcher state
  1> [2019-03-27T05:12:19,786][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to stop watcher, current states [Tuple [v1=node_s1, v2=STARTED], Tuple [v1=node_s2, v2=STARTED], Tuple [v1=node_s0, v2=STARTED]]
  1> [2019-03-27T05:12:19,791][INFO ][o.e.x.w.WatcherService   ] [node_s2] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T05:12:19,791][INFO ][o.e.x.w.WatcherService   ] [node_s1] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T05:12:19,793][INFO ][o.e.x.w.WatcherService   ] [node_s0] stopping watch service, reason [watcher manually marked to shutdown by cluster state update]
  1> [2019-03-27T05:12:19,796][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] waiting to stop watcher, current states [Tuple [v1=node_s1, v2=STOPPED], Tuple [v1=node_s2, v2=STOPPED], Tuple [v1=node_s0, v2=STOPPED]]
  1> [2019-03-27T05:12:19,796][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [BasicWatcherTests#testConditionSearchWithSource]: cleaning up after test
  1> [2019-03-27T05:12:19,796][INFO ][o.e.t.InternalTestCluster] [testConditionSearchWithSource] Clearing active scheme time frozen, expected healing time 0s
  1> [2019-03-27T05:12:19,821][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [events/UURKiggsQcm349t1qHMpNQ] deleting index
  1> [2019-03-27T05:12:19,821][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.triggered_watches/8dqLO65eQASBHboBUPB3yQ] deleting index
  1> [2019-03-27T05:12:19,821][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [.watcher-history-9-2019.03.27/YxxObJ3OThmY9cGQSSKF3w] deleting index
  1> [2019-03-27T05:12:19,821][INFO ][o.e.c.m.MetaDataDeleteIndexService] [node_s0] [vrurzbbrej/7zLSzJHzQryFj0_XE14I5g] deleting index
  1> [2019-03-27T05:12:19,885][INFO ][o.e.c.m.MetaDataIndexTemplateService] [node_s0] removing template [random_index_template]
  1> [2019-03-27T05:12:19,892][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] [BasicWatcherTests#testConditionSearchWithSource]: cleaned up after test
  1> [2019-03-27T05:12:19,892][INFO ][o.e.x.w.t.i.BasicWatcherTests] [testConditionSearchWithSource] after test
FAILURE 11.3s J7 | BasicWatcherTests.testConditionSearchWithSource <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 

@benwtrent (Member) commented Mar 27, 2019

Another failure around the same thing:
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+master+intake/2765/console

FAILURE 13.1s J6 | WebhookIntegrationTests.testWebhookWithBasicAuth <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: could not find executed watch record for watch _id
   > Expected: a value equal to or greater than <1L>
   >      but: <0L> was less than <1L>
   > 	at __randomizedtesting.SeedInfo.seed([552D2F2E1DEA46BD:BE10AAB71FC19159]:0)
   > 	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   > 	at 

Muting these BasicWatcherTests and WebhookIntegrationTests, as many of them appear to be failing across master and 7.x.

benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Mar 27, 2019
benwtrent added a commit to benwtrent/elasticsearch that referenced this issue Mar 27, 2019
benwtrent added a commit that referenced this issue Mar 27, 2019
* Muting watcher tests for issue #35503

* blocking the two troublesome suites
benwtrent added a commit that referenced this issue Mar 27, 2019
* Muting watcher tests for issue #35503

* blocking the two troublesome suites
jpountz pushed a commit that referenced this issue Mar 27, 2019
* Muting watcher tests for issue #35503

* blocking the two troublesome suites
@droberts195 (Contributor)

Another test suffering from the same "could not find executed watch record for watch _id" error is WebhookHttpsIntegrationTests.testHttps.

I muted this test too:

master: dbbad81
7.x: 54b0ff8
7.0: 6b7a607
6.7: 3e5fe29

It didn't reproduce locally for me with:

./gradlew :x-pack:plugin:watcher:unitTest \
  -Dtests.seed=3AFF7F9BF0A9BF85 \
  -Dtests.class=org.elasticsearch.xpack.watcher.actions.webhook.WebhookHttpsIntegrationTests \
  -Dtests.method="testHttps" \
  -Dtests.security.manager=true \
  -Dtests.locale=lv-LV \
  -Dtests.timezone=Europe/Monaco \
  -Dcompiler.java=12 \
  -Druntime.java=8

@gwbrown (Contributor) commented Apr 9, 2019

This should be fixed in master by #40658; if anyone runs across this failure, let me know. I'll wait for the backport before closing this.

@gwbrown (Contributor) commented Apr 12, 2019

#40658 has been merged to all currently maintained branches after a week of running in master with no failures in this test, so I'm closing this issue.

gwbrown closed this as completed on Apr 12, 2019
@danielmitterdorfer (Member)

ActionErrorIntegrationTests.testErrorInAction, which has been tracked in #39306, has failed again, but #39306 (comment) suggests reopening this issue instead if additional test failures occur.

./gradlew :x-pack:plugin:watcher:test --tests "org.elasticsearch.xpack.watcher.actions.ActionErrorIntegrationTests.testErrorInAction" \
  -Dtests.seed=930476443B322E24 \
  -Dtests.security.manager=true \
  -Dtests.locale=pt-MZ \
  -Dtests.timezone=America/Tegucigalpa \
  -Dcompiler.java=12 \
  -Druntime.java=12

Note: later in the logs we see output along the lines of "non-failed nodes do not form a quorum". As far as I understand it, this is related to nodes being shut down as part of the suite's teardown process (i.e. after the test failure has occurred), so I think it is unrelated.

gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this issue May 27, 2019
@jakelandis (Contributor)

Closing this issue, as I cannot find any actionable items here. None of the tests referenced here are currently disabled, and (outside a very small handful of PRs) I could not find any evidence of failures in the past 90 days. That small handful of PRs appears to have larger problems and includes many other test failures.

If any of the referenced tests fail, please log a new issue.
