[ML] Integrate data frame analytics with ML upgrade mode #54326

droberts195 · 2020-03-27T11:35:34Z

In 6.7 we added an "upgrade mode" to ML. When ML upgrade mode is enabled, ML persistent tasks and endpoints are prevented from modifying ML internal indices. The intention is that ML upgrade mode be enabled while ML internal indices are undergoing reindexing or other maintenance operations. It can also optionally be enabled to reduce churn of ML persistent tasks during rolling upgrades.

The work to add this functionality to anomaly detection was done in #37837, #37942 and #38040.

As data frame analytics and inference become more mature and widely used, we need to ensure ML upgrade mode has sensible effects on these features.

Additionally, we may have accidentally introduced loopholes into the way ML upgrade mode works with anomaly detection.

The following work items are required:

Have we introduced more anomaly detection functionality into 7.x that could result in a write to an ML index when ML upgrade mode is enabled? (Maybe auto-generated notifications or annotations? Maybe retrying results writes? But this needs some research.) If so, add guards so that we don't write to ML internal indices that might be undergoing maintenance.
Make sure that data frame analytics persistent tasks are unassigned when ML upgrade mode is set, with an assignment reason that ensures they are not considered to have failed and that they get reassigned when ML upgrade mode is disabled again.
Consider what we should do about inference when ML upgrade mode is enabled. The key requirement is that we don't read or write any ML internal indices while upgrade mode is enabled. Existing inference ingest processors with models loaded locally should be fine to continue. Guards need to be put around accesses to the .ml-inference index. (Inference and upgrade mode could get far more complicated when we start to use native processes on other nodes to do inference.)
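
To make the second item concrete, here is a minimal sketch of the assignment decision. It is not the real Elasticsearch code: `Assignment`, `DfaTaskAssigner`, `AWAITING_UPGRADE` and the explanation strings are simplified, hypothetical stand-ins. The idea is that a task unassigned because of upgrade mode carries a distinct reason, so it can be told apart from a genuine failure and reassigned later.

```java
// Hypothetical, simplified stand-in for a persistent task assignment.
final class Assignment {
    private final String nodeId;      // null means "currently unassigned"
    private final String explanation; // human-readable assignment reason

    Assignment(String nodeId, String explanation) {
        this.nodeId = nodeId;
        this.explanation = explanation;
    }

    boolean isAssigned() { return nodeId != null; }
    String nodeId() { return nodeId; }
    String explanation() { return explanation; }
}

// Hypothetical assigner for data frame analytics tasks.
final class DfaTaskAssigner {
    // Distinct marker: unassigned on purpose, not because something failed.
    static final Assignment AWAITING_UPGRADE = new Assignment(
        null, "persistent task cannot be assigned while upgrade mode is enabled");

    static Assignment selectNode(boolean upgradeModeEnabled, String candidateNodeId) {
        if (upgradeModeEnabled) {
            // Never hand out a node while internal indices may be under maintenance.
            return AWAITING_UPGRADE;
        }
        if (candidateNodeId == null) {
            return new Assignment(null, "no suitable ML node available");
        }
        return new Assignment(candidateNodeId, "");
    }

    // Callers can use this to avoid treating the task as failed.
    static boolean isAwaitingUpgrade(Assignment assignment) {
        return assignment == AWAITING_UPGRADE;
    }
}
```

Once upgrade mode is disabled, calling `selectNode(false, nodeId)` again yields a normal assignment, which is the "get reassigned" behaviour described above.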

elasticmachine · 2020-03-27T11:35:36Z

Pinging @elastic/ml-core (:ml)

droberts195 · 2020-03-27T12:26:37Z

While discussing this we thought of two more things:

The nightly maintenance task should not do anything when ML upgrade mode is enabled, as that can touch internal indices. It should just reschedule itself to run the next day.
Maybe our CRUD actions can check for ML upgrade mode in the same way as they check for cluster blocks, returning a cluster block exception when ML upgrade mode is enabled. (But first check whether these exceptions render nicely in the ML UI.)
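
Both ideas can be sketched with a single shared guard. This is only an illustration under assumptions: `UpgradeModeGuard`, `UpgradeModeException` and `DailyMaintenance` are hypothetical names, and the real actions might throw a cluster block exception or something else entirely.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical exception type; the real code may use a different one.
final class UpgradeModeException extends RuntimeException {
    UpgradeModeException(String action) {
        super("Cannot perform " + action + " while upgrade mode is enabled");
    }
}

// Hypothetical holder for the upgrade mode flag.
final class UpgradeModeGuard {
    private final AtomicBoolean enabled = new AtomicBoolean(false);

    void setEnabled(boolean value) { enabled.set(value); }
    boolean isEnabled() { return enabled.get(); }

    // CRUD actions would call this before touching any ML internal index.
    void ensureNotInUpgradeMode(String action) {
        if (enabled.get()) {
            throw new UpgradeModeException(action);
        }
    }
}

// Hypothetical nightly maintenance task.
final class DailyMaintenance {
    private final UpgradeModeGuard guard;
    private int runs = 0;

    DailyMaintenance(UpgradeModeGuard guard) { this.guard = guard; }

    // Returns the delay until the next run; skips the work in upgrade mode.
    Duration tick() {
        if (!guard.isEnabled()) {
            runs++; // stands in for the real cleanup of internal indices
        }
        return Duration.ofDays(1); // always reschedule for the next day
    }

    int runs() { return runs; }
}
```

The maintenance task reschedules itself either way, so it resumes by itself on the first run after upgrade mode is disabled.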

droberts195 · 2020-07-03T17:54:12Z

Fixed by #54523

droberts195 added the :ml Machine learning label Mar 27, 2020

droberts195 assigned przemekwitek Mar 27, 2020

This was referenced Apr 1, 2020

Skip daily maintenance activity if upgrade mode is enabled #54565

Merged

Unassign DFA tasks in SetUpgradeModeAction #54523

Merged

Do not execute ML CRUD actions when upgrade mode is enabled #54437

Merged

droberts195 closed this as completed Jul 3, 2020

ChrisHegarty unassigned przemekwitek Oct 29, 2024
