[ML] Integrate data frame analytics with ML upgrade mode #54326

Closed
droberts195 opened this issue Mar 27, 2020 · 3 comments
Labels
:ml Machine learning

@droberts195
Contributor

In 6.7 we added an "upgrade mode" to ML. When ML upgrade mode is enabled, ML persistent tasks and endpoints are prevented from modifying ML internal indices. The intention is that ML upgrade mode be enabled while ML internal indices are undergoing reindexing or other maintenance operations. It can also optionally be enabled to reduce churn of ML persistent tasks during rolling upgrades.
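The following minimal Java sketch illustrates the kind of guard described above. It uses a single cluster-wide flag that code consults before writing to an ML internal index. It is not the Elasticsearch implementation. `UpgradeModeGuard`, `guardedWrite` and `UpgradeModeException` are hypothetical names, and the `AtomicBoolean` stands in for the upgrade mode setting that would really be held in cluster state.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: one cluster-wide flag that every writer to an ML
// internal index consults first. All names are illustrative only.
final class UpgradeModeGuard {

    /** Thrown when a write is attempted while ML upgrade mode is enabled. */
    static final class UpgradeModeException extends RuntimeException {
        UpgradeModeException(String message) {
            super(message);
        }
    }

    // Stand-in for the upgrade mode setting stored in cluster state.
    private final AtomicBoolean upgradeMode = new AtomicBoolean(false);

    void setUpgradeMode(boolean enabled) {
        upgradeMode.set(enabled);
    }

    boolean isUpgradeMode() {
        return upgradeMode.get();
    }

    /** Runs the write only when upgrade mode is disabled; otherwise refuses it. */
    void guardedWrite(String index, Runnable write) {
        if (isUpgradeMode()) {
            throw new UpgradeModeException(
                "cannot write to [" + index + "] while ML upgrade mode is enabled");
        }
        write.run();
    }
}
```

Refusing the write with an exception, rather than silently dropping it, lets the caller decide whether to retry after upgrade mode is disabled.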

The work to add this functionality to anomaly detection was done in #37837, #37942 and #38040.

As data frame analytics and inference become more mature and widely used, we need to ensure ML upgrade mode has sensible effects on these features.

Additionally, there may be areas where we have accidentally introduced loopholes into the way ML upgrade mode works with anomaly detection.

The following work items are required:

  1. Have we introduced more anomaly detection functionality into 7.x that could write to an ML index while ML upgrade mode is enabled? (Possible candidates are auto-generated notifications or annotations, and retried results writes, but this needs some research.) If so, add guards so that we don't write to ML internal indices that might be undergoing maintenance.
  2. Make sure that data frame analytics persistent tasks are unassigned when ML upgrade mode is set, and that they are given an assignment explanation that stops them from being considered failed, so that they get reassigned when ML upgrade mode is disabled again.
  3. Consider what we should do about inference when ML upgrade mode is enabled. The key requirement is that we don't read or write any ML internal indices while upgrade mode is enabled. Existing inference ingest processors with models loaded locally should be fine to continue. Guards need to be put around accesses to the .ml-inference index. (Inference and upgrade mode could get far more complicated when we start to use native processes on other nodes to do inference.)
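Item 2 could look roughly like the following Java sketch. It assumes an assignment decision that returns a node ID plus a human-readable explanation. `AnalyticsTaskAssigner`, `Assignment`, `AWAITING_UPGRADE` and `countsAsFailure` are hypothetical names chosen for illustration. The point is that upgrade mode produces a dedicated, recognizable unassigned state rather than one that looks like a failure.

```java
import java.util.List;

// Hypothetical sketch of item 2: while ML upgrade mode is enabled, the
// assignment decision returns a dedicated "awaiting upgrade" assignment, so
// the task waits unassigned and is picked up again once upgrade mode is off.
final class AnalyticsTaskAssigner {

    /** Where a task should run; a null nodeId means "unassigned". */
    static final class Assignment {
        final String nodeId;
        final String explanation;

        Assignment(String nodeId, String explanation) {
            this.nodeId = nodeId;
            this.explanation = explanation;
        }

        boolean isAssigned() {
            return nodeId != null;
        }
    }

    // One shared instance, so callers can recognize this state by identity.
    static final Assignment AWAITING_UPGRADE = new Assignment(
        null, "persistent task cannot be assigned while upgrade mode is enabled");

    private AnalyticsTaskAssigner() {}

    static Assignment assign(boolean upgradeMode, List<String> eligibleNodes) {
        if (upgradeMode) {
            return AWAITING_UPGRADE;
        }
        if (eligibleNodes.isEmpty()) {
            return new Assignment(null, "no ML node is currently able to run this task");
        }
        // Placeholder node selection: a real allocator would weigh memory and load.
        return new Assignment(eligibleNodes.get(0), "");
    }

    /** In this sketch, any unassigned state other than AWAITING_UPGRADE counts as a failure. */
    static boolean countsAsFailure(Assignment assignment) {
        return !assignment.isAssigned() && assignment != AWAITING_UPGRADE;
    }
}
```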
droberts195 added the :ml Machine learning label Mar 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

@droberts195
Contributor Author

While discussing this we thought of two more things:

  1. The nightly maintenance task should not do anything when ML upgrade mode is enabled, as its work can touch internal indices. It should just reschedule itself to run the next day.
  2. Maybe our CRUD actions can check for ML upgrade mode in the same way as they check for cluster blocks, returning a cluster block exception when ML upgrade mode is enabled. (But first check whether these exceptions render nicely in the ML UI.)
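The two follow-up ideas can be sketched together in Java. Everything here is hypothetical. `NightlyMaintenance` receives the upgrade mode check, the cleanup work and the rescheduling step as injected callbacks. `UpgradeModeBlockedException` merely stands in for a cluster block exception and is not the real Elasticsearch class.

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the two follow-up ideas: the nightly maintenance job
// skips its work (but still reschedules itself) while ML upgrade mode is
// enabled, and CRUD actions refuse requests up front, like a cluster block check.
final class UpgradeModeChecks {

    /** Stand-in for a cluster block exception; not the real Elasticsearch class. */
    static final class UpgradeModeBlockedException extends RuntimeException {
        UpgradeModeBlockedException(String action) {
            super("cannot perform [" + action + "] while ML upgrade mode is enabled");
        }
    }

    static final class NightlyMaintenance {
        private final BooleanSupplier upgradeMode;
        private final Runnable cleanup;         // work that touches ML internal indices
        private final Runnable scheduleNextRun; // hypothetical hook that schedules tomorrow's run

        NightlyMaintenance(BooleanSupplier upgradeMode, Runnable cleanup, Runnable scheduleNextRun) {
            this.upgradeMode = upgradeMode;
            this.cleanup = cleanup;
            this.scheduleNextRun = scheduleNextRun;
        }

        void run() {
            try {
                // In upgrade mode, do nothing that could touch an internal index.
                if (!upgradeMode.getAsBoolean()) {
                    cleanup.run();
                }
            } finally {
                // Always reschedule, so the job resumes the next day.
                scheduleNextRun.run();
            }
        }
    }

    private UpgradeModeChecks() {}

    /** Called at the start of a CRUD action, before any internal index is read or written. */
    static void checkNotInUpgradeMode(boolean upgradeMode, String action) {
        if (upgradeMode) {
            throw new UpgradeModeBlockedException(action);
        }
    }
}
```

Rescheduling in a `finally` block means the job keeps its daily cadence even if the cleanup step throws.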

@droberts195
Contributor Author

Fixed by #54523
