[ML] Recheck trained model allocations when persistent tasks complete #85321

droberts195 · 2022-03-24T11:35:56Z

#85310 improved the conditions for allocating trained models to nodes, but one situation is still not covered.

A trained model may fail to allocate because all the available ML native memory is being used by jobs. When those jobs are stopped, the freed memory might allow the trained model to be allocated, so we should recheck allocation at that point.

Therefore, the trained model allocation cluster state listener should also check for changes to persistent tasks, in particular ML persistent tasks completing.
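A minimal sketch of the kind of check proposed here, written as plain Java rather than against the real Elasticsearch classes. Everything in it is an assumption for illustration: the class `AllocationRecheck`, the `ml-` task ID prefix, and the representation of persistent tasks as sets of ID strings are all hypothetical, and the actual listener may look quite different.

```java
import java.util.Set;

/**
 * Hypothetical, simplified stand-in for the listener logic.
 * This does not use or mirror the Elasticsearch API.
 */
final class AllocationRecheck {

    // Assumed convention for this sketch only: ML persistent task IDs start with "ml-".
    static final String ML_TASK_PREFIX = "ml-";

    /**
     * Returns true when at least one ML task present in the previous
     * cluster state is absent from the current one, meaning it completed
     * or was removed and its native memory may now be free.
     */
    static boolean mlTaskCompleted(Set<String> previousTaskIds, Set<String> currentTaskIds) {
        for (String id : previousTaskIds) {
            if (id.startsWith(ML_TASK_PREFIX) && !currentTaskIds.contains(id)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Combines the pre-existing trigger (modelled here as a single
     * nodesChanged flag) with the extra persistent task check.
     */
    static boolean shouldRecheckAllocations(boolean nodesChanged,
                                            Set<String> previousTaskIds,
                                            Set<String> currentTaskIds) {
        return nodesChanged || mlTaskCompleted(previousTaskIds, currentTaskIds);
    }
}
```

In this sketch, a non-ML task finishing does not trigger a recheck, since it would not free ML native memory; only the disappearance of an ML task (or a node change) does.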

elasticmachine · 2022-03-24T11:35:59Z

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou · 2022-08-04T07:31:32Z

This has been addressed by #88323

droberts195 added the :ml label Mar 24, 2022

elasticmachine added the Team:ML label Mar 24, 2022

droberts195 mentioned this issue Mar 24, 2022

[ML] Reallocate model deployments on node shutdown events. #85310

Merged

dimitris-athanasiou closed this as completed Aug 4, 2022
