
[ML] Start and stop model deployments #70713

Conversation

dimitris-athanasiou (Contributor)

No description provided.

dimitris-athanasiou added the WIP and :ml (Machine learning) labels on Mar 23, 2021
elasticmachine added the Team:ML (Meta label for the ML team) label on Mar 23, 2021
elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou changed the base branch from master to feature/pytorch-inference on Mar 23, 2021, 12:37
davidkyle (Member) left a comment:

LGTM

    }

    public void setTimeout(TimeValue timeout) {
        this.timeout = timeout;
Member:

Suggested change:
-        this.timeout = timeout;
+        this.timeout = ExceptionsHelper.requireNonNull(timeout, TIMEOUT);
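For reference, a minimal sketch of how the setter reads once the suggested validation is applied. This is not the PR's exact code: TIMEOUT is assumed to name the timeout parameter, and ExceptionsHelper refers to the X-Pack ML helper class.

    // Sketch only; assumes org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper
    // and that TIMEOUT names the timeout parameter.
    public void setTimeout(TimeValue timeout) {
        // Fail fast with a clear "[timeout] must not be null" style error
        // instead of a later NullPointerException.
        this.timeout = ExceptionsHelper.requireNonNull(timeout, TIMEOUT);
    }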


    public static class TaskParams implements PersistentTaskParams {

        public static final Version VERSION_INTRODUCED = Version.V_7_13_0;
Member:

Ambitious

Contributor Author:

Will change to 8 and we'll see :-)
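As an illustration of what the version constant is for (a sketch, not taken from this PR; the method name is hypothetical): nodes below VERSION_INTRODUCED cannot deserialize or run the deployment task, so assignment checks typically gate on it.

    // Hypothetical sketch of how VERSION_INTRODUCED might be consulted during node selection.
    static boolean nodeSupportsDeployments(DiscoveryNode node) {
        // Only nodes at or above the version that introduced TaskParams can run the task.
        return node.getVersion().onOrAfter(TaskParams.VERSION_INTRODUCED);
    }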

                listener.onResponse(new StopTrainedModelDeploymentAction.Response(true));
                return;
            }
            if (models.size() > 1) {
Member:

In the future we may have more than one model config using the deployment. It might be that we don't do the GetTrainedModelsAction here and just look for persistent tasks that match the model ID.

Contributor Author:

Agreed. That's how we typically implement stop actions where we handle stopping multiple tasks at once. Just thought this was simpler for now.
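A rough sketch of the persistent-task-based lookup discussed above (not the PR's implementation; the task name string and the getModelId accessor are assumptions):

    // Hypothetical: find deployment tasks for a model by scanning persistent tasks in
    // cluster state rather than calling GetTrainedModelsAction.
    static Collection<PersistentTasksCustomMetadata.PersistentTask<?>> deploymentTasksFor(
            String modelId, ClusterState clusterState) {
        PersistentTasksCustomMetadata tasks =
            clusterState.metadata().custom(PersistentTasksCustomMetadata.TYPE);
        if (tasks == null) {
            return Collections.emptyList();
        }
        return tasks.findTasks(
            "trained_model_deployment",   // assumed task name
            task -> {
                Object params = task.getParams();
                return params instanceof TaskParams
                    && modelId.equals(((TaskParams) params).getModelId());
            });
    }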

    }

    private void doStartDeployment(TrainedModelDeploymentTask task) {
        logger.info("[{}] Starting model deployment", task.getModelId());
Member:

Suggested change:
-        logger.info("[{}] Starting model deployment", task.getModelId());
+        logger.debug("[{}] Starting model deployment", task.getModelId());

dimitris-athanasiou merged commit 8ba697b into elastic:feature/pytorch-inference on Mar 29, 2021
davidkyle added a commit that referenced this pull request on Jun 3, 2021:

The feature branch contains changes to configure PyTorch models with a
TrainedModelConfig and defines a format to store the binary models.
The _start and _stop deployment actions control the model lifecycle,
and the model can be directly evaluated with the _infer endpoint.
Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.

The feature branch consists of these PRs: #73523, #72218, #71679,
#71323, #71035, #71177, #70713
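For orientation, a hypothetical end-to-end walk through the lifecycle described above, using the low-level REST client. The endpoint paths, request body, and model ID are assumptions based on this description and are not verified against the feature branch.

    // Hypothetical sketch of the start -> infer -> stop deployment lifecycle.
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.RestClient;

    public class DeploymentLifecycleSketch {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
                String modelId = "my_pytorch_model";   // assumed model ID

                // Start the deployment (allocates the model on ML nodes).
                client.performRequest(new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_start"));

                // Evaluate the model directly, e.g. for NER or fill-mask input.
                Request infer = new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_infer");
                infer.setJsonEntity("{\"docs\":[{\"text_field\":\"Elasticsearch is developed in Amsterdam\"}]}");
                client.performRequest(infer);

                // Stop the deployment and free its resources.
                client.performRequest(new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_stop"));
            }
        }
    }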