Watcher: Reload properly on remote shard change #33167

spinscale · 2018-08-27T13:03:04Z

When a node that carries a watcher shard dies, or a shard is relocated to
another node, watcher needs to trigger a reload not only on the node
where the shard relocation happened, but also on the other nodes that hold
copies of this shard, as different watches may need to be loaded.

This commit takes changes on remote nodes into account by storing not
just the local shard allocation ids in the WatcherLifeCycleService,
but a list of ShardRoutings based on the local active shards.
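
To illustrate why tracking only local allocation ids misses a remote change, here is a minimal sketch. It does not use the real Elasticsearch classes: `ShardCopy` is a hypothetical, simplified stand-in for `ShardRouting`, and both helper methods are illustrative assumptions rather than the code in this commit. It assumes Java 16+ for records.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class WatcherReloadSketch {

    // Hypothetical, simplified stand-in for a shard routing entry.
    public record ShardCopy(int shardId, String nodeId, String allocationId, boolean active) {}

    // Previous idea: remember only the allocation ids of active local copies.
    public static Set<String> localAllocationIds(List<ShardCopy> table, String localNodeId) {
        return table.stream()
                .filter(c -> c.active() && c.nodeId().equals(localNodeId))
                .map(ShardCopy::allocationId)
                .collect(Collectors.toSet());
    }

    // Sketch of the new idea: remember every active copy (local or remote)
    // of each shard that has an active copy on the local node.
    public static Set<ShardCopy> copiesOfLocalShards(List<ShardCopy> table, String localNodeId) {
        Set<Integer> localShardIds = table.stream()
                .filter(c -> c.active() && c.nodeId().equals(localNodeId))
                .map(ShardCopy::shardId)
                .collect(Collectors.toSet());
        return table.stream()
                .filter(c -> c.active() && localShardIds.contains(c.shardId()))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        // Shard 0 has a copy on the local node A and a copy on remote node B.
        List<ShardCopy> before = List.of(
                new ShardCopy(0, "nodeA", "a1", true),
                new ShardCopy(0, "nodeB", "b1", true));
        // Node B dies; the copy is re-created on node C with a new allocation id.
        List<ShardCopy> after = List.of(
                new ShardCopy(0, "nodeA", "a1", true),
                new ShardCopy(0, "nodeC", "c1", true));

        boolean idsChanged = !localAllocationIds(before, "nodeA")
                .equals(localAllocationIds(after, "nodeA"));
        boolean copiesChanged = !copiesOfLocalShards(before, "nodeA")
                .equals(copiesOfLocalShards(after, "nodeA"));

        System.out.println("reload detected via local allocation ids: " + idsChanged);
        System.out.println("reload detected via shard copies: " + copiesChanged);
    }
}
```

In this scenario the local allocation id set is identical before and after, so node A would not reload, while comparing the full set of copies does detect the change.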

This also fixes some tests that relied on a wrong assumption. Using
TestShardRouting.newShardRouting in our tests for cluster state
creation always created new allocation ids, which implicitly led to
a reload.
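
The test pitfall can be sketched as follows. This is not the real `TestShardRouting` helper: `ShardCopy`, `newCopyWithRandomId`, and `newCopyWithFixedId` are hypothetical names that only mimic the behaviour described above, namely a helper that invents a fresh allocation id on every call. It assumes Java 16+ for records.

```java
import java.util.UUID;

public class AllocationIdPitfallSketch {

    // Hypothetical, simplified stand-in for a shard routing entry.
    public record ShardCopy(int shardId, String nodeId, String allocationId) {}

    // Mimics a test helper that generates a new allocation id on each call.
    public static ShardCopy newCopyWithRandomId(int shardId, String nodeId) {
        return new ShardCopy(shardId, nodeId, UUID.randomUUID().toString());
    }

    // Reuses a caller-supplied allocation id, so an unchanged shard compares equal.
    public static ShardCopy newCopyWithFixedId(int shardId, String nodeId, String allocationId) {
        return new ShardCopy(shardId, nodeId, allocationId);
    }

    public static void main(String[] args) {
        // Two cluster states that are meant to describe the same, unchanged shard.
        ShardCopy first = newCopyWithRandomId(0, "nodeA");
        ShardCopy second = newCopyWithRandomId(0, "nodeA");
        System.out.println("random ids look unchanged: " + first.equals(second));

        ShardCopy stableFirst = newCopyWithFixedId(0, "nodeA", "a1");
        ShardCopy stableSecond = newCopyWithFixedId(0, "nodeA", "a1");
        System.out.println("fixed ids look unchanged: " + stableFirst.equals(stableSecond));
    }
}
```

With random ids every newly built cluster state looks like a shard change, so a test can pass because of an accidental reload rather than because the reload logic is correct.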

elasticmachine · 2018-08-27T13:03:06Z

Pinging @elastic/es-core-infra

hub-cap

Can no longer repro based on this PR. Great work alex. This was most certainly a hard bug to nail down.


spinscale added >bug blocker v7.0.0 :Data Management/Watcher v6.5.0 v6.4.1 labels Aug 27, 2018

spinscale requested a review from hub-cap August 27, 2018 13:03

hub-cap approved these changes Aug 29, 2018

spinscale merged commit 13880bd into elastic:master Aug 29, 2018

spinscale added the backport pending label Aug 29, 2018

spinscale removed the backport pending label Aug 29, 2018

spinscale mentioned this pull request Feb 6, 2019

Duplicate Watcher events firing #38482

Closed

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
