[ML] Infer against model deployment #71177

Conversation

dimitris-athanasiou (Contributor)

This adds a temporary API for doing inference against
a trained model deployment.

@dimitris-athanasiou added the :ml (Machine learning) label Apr 1, 2021
@elasticmachine added the Team:ML (Meta label for the ML team) label Apr 1, 2021
@elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

@davidkyle (Member) left a comment
Couple of suggestions but LGTM

super(in);
result = new PyTorchResult(in);
}

Member
Does response need a writeTo method?

        @Override
        public void writeTo(StreamOutput out) throws IOException {
            super.writeTo(out);
            result.writeTo(out);
        }

Contributor Author
Well spotted!

double[][] primitiveDoubles = new double[listOfListOfDoubles.size()][];
for (int i = 0; i < listOfListOfDoubles.size(); i++) {
List<Double> row = listOfListOfDoubles.get(i);
double[] primitiveRow = new double[row.size()];
Member
Suggested change:
- double[] primitiveRow = new double[row.size()];
+ primitiveDoubles[i] = row.toArray(new double[]{});

rather than copying the elements one by one. Or, if the elements must be copied, perhaps use System.arraycopy.

Contributor Author
What about using row.stream().mapToDouble(d -> d).toArray() instead?
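For reference, a minimal sketch of the stream-based conversion being discussed; the surrounding class and method are hypothetical, only the variable names mirror the diff:

import java.util.List;

class DoubleMatrixConversion {

    // Convert a List<List<Double>> into a primitive double[][],
    // letting mapToDouble handle the unboxing of each row.
    static double[][] toPrimitive(List<List<Double>> listOfListOfDoubles) {
        double[][] primitiveDoubles = new double[listOfListOfDoubles.size()][];
        for (int i = 0; i < listOfListOfDoubles.size(); i++) {
            List<Double> row = listOfListOfDoubles.get(i);
            primitiveDoubles[i] = row.stream().mapToDouble(d -> d).toArray();
        }
        return primitiveDoubles;
    }
}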

ProcessContext processContext = processContextByAllocation.get(task.getAllocationId());
try {
String requestId = processContext.process.get().writeInferenceRequest(inputs);
waitForResult(processContext, requestId, listener);
Member
A future improvement would be to not block the thread here

Contributor Author
I think this might be worth fixing now. It's something that could be forgotten with terrible consequences. I'll work on this following the pattern we used in AutodetectCommunicator.

Contributor Author
Actually, the pattern in AutodetectCommunicator ensures we only perform one operation at a time against the native process. That might not be a necessary restriction for PyTorch; we could investigate its multithreading capabilities so that a single process can run inference on multiple requests in parallel. So, for now, I will just make sure we're not blocking the thread.
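As an illustration of the non-blocking idea, here is a minimal sketch, assuming a pending-results map keyed by request id; the class and method names are hypothetical and not the ones in this PR. The request thread registers a future and returns immediately, and the thread reading results from the native process completes it:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

class PendingResults {

    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Called on the request thread after the inference request has been written:
    // returns immediately rather than blocking until the result arrives.
    CompletableFuture<String> register(String requestId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        pending.put(requestId, future);
        return future;
    }

    // Called by the thread that reads results from the native process.
    void onResult(String requestId, String result) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future != null) {
            future.complete(result); // notifies the caller's listener without a blocked thread
        }
    }

    // Called from a scheduled timeout task if no result arrives in time.
    void onTimeout(String requestId) {
        CompletableFuture<String> future = pending.remove(requestId);
        if (future != null) {
            future.completeExceptionally(new TimeoutException("timed out waiting for request [" + requestId + "]"));
        }
    }
}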

private static final String NAME = "pytorch_inference";

private static AtomicLong ms_RequestId = new AtomicLong(1);
Member
What does the ms_ prefix mean?

Contributor Author
(m)ember (s)tatic. Not sure if we're supposed to be using this convention. In all fairness, I saw we did it this way in AutodetectControlMsgWriter.ms_FlushNumber but perhaps that's too ancient to follow :-)

Contributor
It’s the old Prelert convention that matched the C++ standards. We took these out of the Java code in 2016 but must have missed that one.

Contributor Author
Cool, I'll change it then, and raise a tiny PR to fix the flush one too.

logger.debug(() -> new ParameterizedMessage("[{}] Parsed result with id [{}]", deploymentId, result.getRequestId()));
PendingResult pendingResult = pendingResults.get(result.getRequestId());
if (pendingResult == null) {
logger.debug(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));
Member
Suggested change:
- logger.debug(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));
+ logger.warn(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));

This is interesting enough to be logged at warn.

Contributor Author
This should only occur if the infer request timed out, which would indicate a throughput problem. I'm not sure we'd want to fill the log with these in that case, as we could be getting an entry per document we're trying to run inference on.

@dimitris-athanasiou force-pushed the infer-against-model-deployment branch from f9c32a9 to b3cc616 on April 2, 2021
@dimitris-athanasiou merged commit a261f0d into elastic:feature/pytorch-inference on April 7, 2021
@dimitris-athanasiou deleted the infer-against-model-deployment branch on April 7, 2021
davidkyle added a commit that referenced this pull request Jun 3, 2021
The feature branch contains changes to configure PyTorch models with a 
TrainedModelConfig and defines a format to store the binary models. 
The _start and _stop deployment actions control the model lifecycle 
and the model can be directly evaluated with the _infer endpoint. 
Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask.

The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713