Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fix calendar and filter updates from non-master nodes #31804

Conversation

dimitris-athanasiou
Copy link
Contributor

Job updates or changes to calendars or filters may
result into updating the job process if it has been
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier
which is only running on the master node. All actions
performing such updates must run on the master node.

However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
the updates to the UpdateJobProcessNotifier even though
it might have not been running (given the action was
run on a non-master node). When that happens, the update
never reaches the process.

This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).

This ensures that even if the order of the updates
gets messed up, the latest update will read the latest
state of those resource and the process will get back
in sync.

This leaves us with 2 types of updates:

  1. updates to the job config should happen on the master
    node. This is because we cannot refetch the entire job
    and update it. We need to know the parts that have been changed.

  2. updates to resources the job uses. Those can be handled
    on non-master nodes but they should be re-fetched by the
    update process action.

Closes #31803

@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@droberts195 droberts195 removed the v6.5.0 label Jul 5, 2018
Copy link
Contributor

@droberts195 droberts195 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Job updates or changes to calendars or filters may
result into updating the job process if it has been
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier
which is only running on the master node. All actions
performing such updates must run on the master node.

However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
the updates to the UpdateJobProcessNotifier even though
it might have not been running (given the action was
run on a non-master node). When that happens, the update
never reaches the process.

This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).

This ensures that even if the order of the updates
gets messed up, the latest update will read the latest
state of those resource and the process will get back
in sync.

This leaves us with 2 types of updates:

  1. updates to the job config should happen on the master
  node. This is because we cannot refetch the entire job
  and update it. We need to know the parts that have been changed.

  2. updates to resources the job uses. Those can be handled
  on non-master nodes but they should be re-fetched by the
  update process action.

Closes elastic#31803
@dimitris-athanasiou dimitris-athanasiou force-pushed the fix-calendar-and-filter-updates-on-process branch from 699b4f3 to facc7a8 Compare July 5, 2018 09:53
@dimitris-athanasiou dimitris-athanasiou changed the title [ML] Fix calendar and filter updates on process [ML] Fix calendar and filter updates from non-master nodes Jul 5, 2018
@dimitris-athanasiou dimitris-athanasiou merged commit 9c11bf1 into elastic:master Jul 5, 2018
@dimitris-athanasiou dimitris-athanasiou deleted the fix-calendar-and-filter-updates-on-process branch July 5, 2018 12:14
dimitris-athanasiou added a commit that referenced this pull request Jul 5, 2018
Job updates or changes to calendars or filters may
result into updating the job process if it has been
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier
which is only running on the master node. All actions
performing such updates must run on the master node.

However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
the updates to the UpdateJobProcessNotifier even though
it might have not been running (given the action was
run on a non-master node). When that happens, the update
never reaches the process.

This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).

This ensures that even if the order of the updates
gets messed up, the latest update will read the latest
state of those resource and the process will get back
in sync.

This leaves us with 2 types of updates:

  1. updates to the job config should happen on the master
  node. This is because we cannot refetch the entire job
  and update it. We need to know the parts that have been changed.

  2. updates to resources the job uses. Those can be handled
  on non-master nodes but they should be re-fetched by the
  update process action.

Closes #31803
dimitris-athanasiou added a commit that referenced this pull request Jul 5, 2018
Job updates or changes to calendars or filters may
result into updating the job process if it has been
running. To preserve the order of updates, process
updates are queued through the UpdateJobProcessNotifier
which is only running on the master node. All actions
performing such updates must run on the master node.

However, the CRUD actions for calendars and filters
are not master node actions. They have been submitting
the updates to the UpdateJobProcessNotifier even though
it might have not been running (given the action was
run on a non-master node). When that happens, the update
never reaches the process.

This commit fixes this problem by ensuring the notifier
runs on all nodes and by ensuring the process update action
gets the resources again before updating the process
(instead of having those resources passed in the request).

This ensures that even if the order of the updates
gets messed up, the latest update will read the latest
state of those resource and the process will get back
in sync.

This leaves us with 2 types of updates:

  1. updates to the job config should happen on the master
  node. This is because we cannot refetch the entire job
  and update it. We need to know the parts that have been changed.

  2. updates to resources the job uses. Those can be handled
  on non-master nodes but they should be re-fetched by the
  update process action.

Closes #31803
dnhatn added a commit that referenced this pull request Jul 5, 2018
* 6.x:
  Test: Do not remove xpack templates when cleaning (#31642)
  SQL: Allow long literals (#31777)
  SQL: Fix incorrect message for aliases (#31792)
  Detach Transport from TransportService (#31727)
  6.3.1 release notes (#31829)
  Add unreleased version 6.3.2
  [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817)
  [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800)
  [ML] Fix calendar and filter updates from non-master nodes (#31804)
  Fix license header generation on Windows (#31790)
  mark XPackRestIT.test {p0=monitoring/bulk/10_basic/Bulk indexing of monitoring data} as AwaitsFix
  Add JDK11 support without enabling in CI (#31644)
  Watcher: Fix check for currently executed watches (#31137)
  [DOCS] Fixes 6.3.0 release notes (#31771)
  Watcher: Ensure correct method is used to read secure settings (#31753)
  [ML] Rate limit established model memory updates (#31768)
  SQL: Update CLI logo
dnhatn added a commit that referenced this pull request Jul 5, 2018
* master:
  REST high-level client: add get index API (#31703)
  SQL: Allow long literals (#31777)
  SQL: Fix incorrect message for aliases (#31792)
  Test: Do not remove xpack templates when cleaning (#31642)
  Reduce more raw types warnings (#31780)
  Add unreleased version 6.3.2
  Scripting: Remove support for deprecated StoredScript contexts (#31394)
  [ML][TEST] Use java 11 valid time format in DataDescriptionTests (#31817)
  [ML] Don't treat stale FAILED jobs as OPENING in job allocation (#31800)
  [ML] Fix calendar and filter updates from non-master nodes (#31804)
  Fix license header generation on Windows (#31790)
  mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix
  mark SearchAsyncActionTests.testFanOutAndCollect as AwaitsFix
  Correct exclusion of test on JDK 11
  Fix doclint jdk 11
  Add JDK11 support and enable in CI (#31644)
  Watcher: Fix check for currently executed watches (#31137)
  Watcher: Ensure correct method is used to read secure settings (#31753)
  SQL: Update CLI logo
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ML] Changes to calendars or filters do not update autodetect if handled on non-master node
4 participants