
[ML] Start and stop model deployments #70713

Conversation

dimitris-athanasiou (Contributor)

No description provided.

dimitris-athanasiou added the WIP and :ml (Machine learning) labels on Mar 23, 2021
elasticmachine added the Team:ML (Meta label for the ML team) label on Mar 23, 2021
elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou changed the base branch from master to feature/pytorch-inference on Mar 23, 2021, 12:37
davidkyle (Member) left a comment:

LGTM

    }

    public void setTimeout(TimeValue timeout) {
        this.timeout = timeout;
Member:

Suggested change:
-        this.timeout = timeout;
+        this.timeout = ExceptionsHelper.requireNonNull(timeout, TIMEOUT);
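For reference, a minimal sketch of how the setter reads once the suggested validation is applied. This is not the PR's exact code: TIMEOUT is assumed to name the timeout parameter, and ExceptionsHelper refers to the X-Pack ML helper class.

    // Sketch only; assumes org.elasticsearch.xpack.core.ml.utils.ExceptionsHelper
    // and that TIMEOUT names the timeout parameter.
    public void setTimeout(TimeValue timeout) {
        // Fail fast with a clear "[timeout] must not be null" style error
        // instead of a later NullPointerException.
        this.timeout = ExceptionsHelper.requireNonNull(timeout, TIMEOUT);
    }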


    public static class TaskParams implements PersistentTaskParams {

        public static final Version VERSION_INTRODUCED = Version.V_7_13_0;
Member:

Ambitious

Contributor Author:

Will change to 8 and we'll see :-)
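As an illustration of what the version constant is for (a sketch, not taken from this PR; the method name is hypothetical): nodes below VERSION_INTRODUCED cannot deserialize or run the deployment task, so assignment checks typically gate on it.

    // Hypothetical sketch of how VERSION_INTRODUCED might be consulted during node selection.
    static boolean nodeSupportsDeployments(DiscoveryNode node) {
        // Only nodes at or above the version that introduced TaskParams can run the task.
        return node.getVersion().onOrAfter(TaskParams.VERSION_INTRODUCED);
    }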

                listener.onResponse(new StopTrainedModelDeploymentAction.Response(true));
                return;
            }
            if (models.size() > 1) {
Member:

In the future we may have more than one model config using the deployment. It might be that we don't do the GetTrainedModelsAction here and just look for persistent tasks that match the model ID.

Contributor Author:

Agreed. That's how we typically implement stop actions where we handle stopping multiple tasks at once. Just thought this was simpler for now.
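A rough sketch of the persistent-task-based lookup discussed above (not the PR's implementation; the task name string and the getModelId accessor are assumptions):

    // Hypothetical: find deployment tasks for a model by scanning persistent tasks in
    // cluster state rather than calling GetTrainedModelsAction.
    static Collection<PersistentTasksCustomMetadata.PersistentTask<?>> deploymentTasksFor(
            String modelId, ClusterState clusterState) {
        PersistentTasksCustomMetadata tasks =
            clusterState.metadata().custom(PersistentTasksCustomMetadata.TYPE);
        if (tasks == null) {
            return Collections.emptyList();
        }
        return tasks.findTasks(
            "trained_model_deployment",   // assumed task name
            task -> {
                Object params = task.getParams();
                return params instanceof TaskParams
                    && modelId.equals(((TaskParams) params).getModelId());
            });
    }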

    }

    private void doStartDeployment(TrainedModelDeploymentTask task) {
        logger.info("[{}] Starting model deployment", task.getModelId());
Member:

Suggested change:
-        logger.info("[{}] Starting model deployment", task.getModelId());
+        logger.debug("[{}] Starting model deployment", task.getModelId());

dimitris-athanasiou merged commit 8ba697b into elastic:feature/pytorch-inference on Mar 29, 2021
davidkyle added a commit that referenced this pull request on Jun 3, 2021:

The feature branch contains changes to configure PyTorch models with a
TrainedModelConfig and defines a format to store the binary models.
The _start and _stop deployment actions control the model lifecycle,
and the model can be directly evaluated with the _infer endpoint.
Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.

The feature branch consists of these PRs: #73523, #72218, #71679,
#71323, #71035, #71177, #70713
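For orientation, a hypothetical end-to-end walk through the lifecycle described above, using the low-level REST client. The endpoint paths, request body, and model ID are assumptions based on this description and are not verified against the feature branch.

    // Hypothetical sketch of the start -> infer -> stop deployment lifecycle.
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.RestClient;

    public class DeploymentLifecycleSketch {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
                String modelId = "my_pytorch_model";   // assumed model ID

                // Start the deployment (allocates the model on ML nodes).
                client.performRequest(new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_start"));

                // Evaluate the model directly, e.g. for NER or fill-mask input.
                Request infer = new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_infer");
                infer.setJsonEntity("{\"docs\":[{\"text_field\":\"Elasticsearch is developed in Amsterdam\"}]}");
                client.performRequest(infer);

                // Stop the deployment and free its resources.
                client.performRequest(new Request("POST",
                    "/_ml/trained_models/" + modelId + "/deployment/_stop"));
            }
        }
    }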