#85310 has made an improvement to the conditions for allocating trained models to nodes, but there is still a situation that is not covered.
A trained model may be unallocatable because all the available ML native memory is in use by jobs. When those jobs are stopped, memory is freed up that might allow the trained model to be allocated, so we should recheck allocation.
Therefore, there should be an extra check in the trained model allocation cluster state listener for persistent tasks being changed and, in particular, for ML persistent tasks completing.
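The proposed check could look something like the following. This is a minimal, self-contained sketch of the detection logic only, using plain maps rather than the actual Elasticsearch cluster state and persistent task types; the class and method names are hypothetical, not the real listener API.

```java
import java.util.Map;

// Hypothetical sketch: given the persistent tasks from the previous and
// current cluster states (task id -> task name), decide whether an ML
// persistent task has completed, i.e. disappeared from the task set.
// If so, ML native memory may have been freed, and trained model
// allocation should be rechecked.
class MlTaskCompletionCheck {

    // Assumption: ML persistent task names share the "xpack/ml/" prefix.
    static boolean shouldRecheckAllocation(Map<String, String> previousTasks,
                                           Map<String, String> currentTasks) {
        for (Map.Entry<String, String> task : previousTasks.entrySet()) {
            boolean wasMlTask = task.getValue().startsWith("xpack/ml/");
            boolean nowGone = !currentTasks.containsKey(task.getKey());
            if (wasMlTask && nowGone) {
                return true; // an ML task completed; memory was freed
            }
        }
        return false;
    }
}
```

In the real listener this comparison would be driven from a cluster state change notification, and only when the persistent tasks metadata has actually changed, to avoid doing the work on every cluster state update.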