[ML] Fix calendar and filter updates from non-master nodes #31804

dimitris-athanasiou · 2018-07-04T17:40:32Z

Job updates or changes to calendars or filters may
result in an update to the job process if it is
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier,
which only runs on the master node. All actions
performing such updates must therefore run on the master node.

However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
updates to the UpdateJobProcessNotifier even though
it might not have been running (when the action was
run on a non-master node). When that happens, the update
never reaches the process.

This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).

This ensures that even if the updates arrive out of
order, the latest update will read the latest
state of those resources and the process will get back
in sync.
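
As a rough illustration of why re-fetching makes the ordering harmless, here is a minimal sketch. It is not the code in this PR: `FilterStore`, `JobProcess`, `applyPayloadUpdate` and `applyRefetchUpdate` are hypothetical stand-ins, and the filter store is just an in-memory map assumed to hold the latest saved state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: none of these types are the real Elasticsearch classes. */
final class RefetchSketch {

    /** Hypothetical stand-in for wherever filters are persisted. */
    static final class FilterStore {
        private final Map<String, List<String>> filters = new ConcurrentHashMap<>();

        void put(String filterId, List<String> items) {
            filters.put(filterId, Collections.unmodifiableList(new ArrayList<>(items)));
        }

        List<String> get(String filterId) {
            return filters.get(filterId);
        }
    }

    /** Hypothetical stand-in for a running job process. */
    static final class JobProcess {
        private List<String> activeFilterItems = Collections.emptyList();

        void applyFilter(List<String> items) {
            activeFilterItems = items;
        }

        List<String> activeFilterItems() {
            return activeFilterItems;
        }
    }

    /** Resources passed in the request: the process ends up with whatever arrived last. */
    static void applyPayloadUpdate(JobProcess process, List<String> itemsFromRequest) {
        process.applyFilter(itemsFromRequest);
    }

    /** Resources re-fetched by the handler: the request only names the filter. */
    static void applyRefetchUpdate(JobProcess process, FilterStore store, String filterId) {
        process.applyFilter(store.get(filterId));
    }

    public static void main(String[] args) {
        FilterStore store = new FilterStore();
        List<String> v1 = Arrays.asList("a.com");
        List<String> v2 = Arrays.asList("a.com", "b.com");
        store.put("safe_domains", v1);
        store.put("safe_domains", v2); // v2 is the latest saved state

        // The two updates are delivered out of order: v2 first, then v1.
        JobProcess withPayload = new JobProcess();
        applyPayloadUpdate(withPayload, v2);
        applyPayloadUpdate(withPayload, v1); // process is left with stale v1

        JobProcess withRefetch = new JobProcess();
        applyRefetchUpdate(withRefetch, store, "safe_domains");
        applyRefetchUpdate(withRefetch, store, "safe_domains"); // both reads see v2

        System.out.println("payload:  " + withPayload.activeFilterItems());
        System.out.println("re-fetch: " + withRefetch.activeFilterItems());
    }
}
```

With the payload style the process is left holding the older filter contents, whereas with the re-fetch style every update reads the current state, so the last one to run always brings the process back in sync.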

This leaves us with 2 types of updates:

1. Updates to the job config should happen on the master
   node. This is because we cannot refetch the entire job
   and update it. We need to know the parts that have been changed.
2. Updates to resources the job uses. Those can be handled
   on non-master nodes but they should be re-fetched by the
   update process action.
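
To make the distinction concrete, the two kinds of update could be modelled roughly as below. This is only a sketch under assumed names: `ProcessUpdate`, `JobConfigUpdate`, `ResourceUpdate` and their methods are hypothetical and do not mirror the actual request classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: hypothetical types, not the real Elasticsearch request classes. */
final class UpdateKindsSketch {

    interface ProcessUpdate {
        String jobId();
        boolean requiresMasterNode();
        boolean refetchBeforeApply();
    }

    /** Type 1: carries the changed parts of the job config, so ordering matters. */
    static final class JobConfigUpdate implements ProcessUpdate {
        private final String jobId;
        private final Map<String, Object> changedFields; // only the parts that changed

        JobConfigUpdate(String jobId, Map<String, Object> changedFields) {
            this.jobId = jobId;
            this.changedFields = Collections.unmodifiableMap(new HashMap<>(changedFields));
        }

        Map<String, Object> changedFields() { return changedFields; }
        @Override public String jobId() { return jobId; }
        @Override public boolean requiresMasterNode() { return true; }
        @Override public boolean refetchBeforeApply() { return false; }
    }

    /** Type 2: only names the calendar or filter; the handler reads its latest state. */
    static final class ResourceUpdate implements ProcessUpdate {
        private final String jobId;
        private final String resourceId;

        ResourceUpdate(String jobId, String resourceId) {
            this.jobId = jobId;
            this.resourceId = resourceId;
        }

        String resourceId() { return resourceId; }
        @Override public String jobId() { return jobId; }
        @Override public boolean requiresMasterNode() { return false; }
        @Override public boolean refetchBeforeApply() { return true; }
    }
}
```

The point of the split is that a job config update is a delta and cannot be reconstructed later, while a resource update can always be rebuilt from the stored calendar or filter.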

Closes #31803

elasticmachine · 2018-07-04T17:40:34Z

Pinging @elastic/ml-core

droberts195

LGTM


* 6.x: Test: Do not remove xpack templates when cleaning (#31642) SQL: Allow long literals (#31777) SQL: Fix incorrect message for aliases (#31792) Detach Transport from TransportService (#31727) 6.3.1 release notes (#31829) Add unreleased version 6.3.2 [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817) [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800) [ML] Fix calendar and filter updates from non-master nodes (#31804) Fix license header generation on Windows (#31790) mark XPackRestIT.test {p0=monitoring/bulk/10_basic/Bulk indexing of monitoring data} as AwaitsFix Add JDK11 support without enabling in CI (#31644) Watcher: Fix check for currently executed watches (#31137) [DOCS] Fixes 6.3.0 release notes (#31771) Watcher: Ensure correct method is used to read secure settings (#31753) [ML] Rate limit established model memory updates (#31768) SQL: Update CLI logo

* master: REST high-level client: add get index API (#31703) SQL: Allow long literals (#31777) SQL: Fix incorrect message for aliases (#31792) Test: Do not remove xpack templates when cleaning (#31642) Reduce more raw types warnings (#31780) Add unreleased version 6.3.2 Scripting: Remove support for deprecated StoredScript contexts (#31394) [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817) [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800) [ML] Fix calendar and filter updates from non-master nodes (#31804) Fix license header generation on Windows (#31790) mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix mark SearchAsyncActionTests.testFanOutAndCollect as AwaitsFix Correct exclusion of test on JDK 11 Fix doclint jdk 11 Add JDK11 support and enable in CI (#31644) Watcher: Fix check for currently executed watches (#31137) Watcher: Ensure correct method is used to read secure settings (#31753) SQL: Update CLI logo

dimitris-athanasiou added >bug review v7.0.0 :ml Machine learning v6.4.0 v6.5.0 v6.3.2 labels Jul 4, 2018

droberts195 removed the v6.5.0 label Jul 5, 2018

droberts195 approved these changes Jul 5, 2018


dimitris-athanasiou added 2 commits July 5, 2018 10:50

Fix long line

facc7a8

dimitris-athanasiou force-pushed the fix-calendar-and-filter-updates-on-process branch from 699b4f3 to facc7a8 Compare July 5, 2018 09:53

dimitris-athanasiou changed the title ~~[ML] Fix calendar and filter updates on process~~ [ML] Fix calendar and filter updates from non-master nodes Jul 5, 2018

dimitris-athanasiou merged commit 9c11bf1 into elastic:master Jul 5, 2018

dimitris-athanasiou deleted the fix-calendar-and-filter-updates-on-process branch July 5, 2018 12:14

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
