[ML] Infer against model deployment #71177
Conversation
Pinging @elastic/ml-core (Team:ML)
Couple of suggestions but LGTM
    super(in);
    result = new PyTorchResult(in);
}
Does the response need a writeTo method?
@Override
public void writeTo(StreamOutput out) throws IOException {
    super.writeTo(out);
    result.writeTo(out);
}
Well spotted!
double[][] primitiveDoubles = new double[listOfListOfDoubles.size()][];
for (int i = 0; i < listOfListOfDoubles.size(); i++) {
    List<Double> row = listOfListOfDoubles.get(i);
    double[] primitiveRow = new double[row.size()];
Suggested change: replace

    double[] primitiveRow = new double[row.size()];

with

    primitiveDoubles[i] = row.toArray(new double[]{});

rather than copying the elements one by one. Or, if the elements must be copied, perhaps use System.arraycopy with row.toArray() instead.
What about using the row.stream().mapToDouble(d -> d).toArray() way?
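For readers following along, here is a minimal, self-contained sketch of the conversion being discussed, using the stream-based unboxing suggested above. The class and variable names are illustrative, not the ones in the PR.

    import java.util.List;

    public class MatrixConversion {

        // Convert a List<List<Double>> into a primitive double[][].
        static double[][] toPrimitive(List<List<Double>> listOfListOfDoubles) {
            double[][] primitiveDoubles = new double[listOfListOfDoubles.size()][];
            for (int i = 0; i < listOfListOfDoubles.size(); i++) {
                // Unbox each row in one call rather than copying element by element.
                primitiveDoubles[i] = listOfListOfDoubles.get(i).stream().mapToDouble(d -> d).toArray();
            }
            return primitiveDoubles;
        }

        public static void main(String[] args) {
            double[][] m = toPrimitive(List.of(List.of(1.0, 2.0), List.of(3.0)));
            System.out.println(m[0][1]); // prints 2.0
        }
    }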
ProcessContext processContext = processContextByAllocation.get(task.getAllocationId());
try {
    String requestId = processContext.process.get().writeInferenceRequest(inputs);
    waitForResult(processContext, requestId, listener);
A future improvement would be to not block the thread here
I think this might be worth fixing now. It's something that could be forgotten with terrible consequences. I'll work on this following the pattern we used in AutodetectCommunicator.
Actually, the pattern in AutodetectCommunicator ensures we only perform one operation at a time against the native process. This might not be a restriction for PyTorch. We could investigate multithreading capabilities in order to do inference on multiple requests in parallel in the same process. So, for now, I will just make sure we're not blocking the thread.
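As a rough sketch of the non-blocking pattern being discussed: the listener is stored under the request id before the request is written to the native process, and the thread that parses the process output completes it when the matching result (or a failure/timeout) arrives. InferenceResultRouter and its method names are made up for illustration; they are not the classes in this PR.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import org.elasticsearch.action.ActionListener;

    // Illustrative only: routes results back to waiting callers without blocking a thread per request.
    final class InferenceResultRouter<T> {

        private final ConcurrentMap<String, ActionListener<T>> pendingRequests = new ConcurrentHashMap<>();

        // Called on the thread submitting the inference request, before writing it to the native process.
        void register(String requestId, ActionListener<T> listener) {
            pendingRequests.put(requestId, listener);
        }

        // Called on the thread that parses the native process output.
        void onResult(String requestId, T result) {
            ActionListener<T> listener = pendingRequests.remove(requestId);
            if (listener != null) {
                listener.onResponse(result);
            }
        }

        // Called if the request times out or the process dies, so callers are not left hanging.
        void onFailure(String requestId, Exception e) {
            ActionListener<T> listener = pendingRequests.remove(requestId);
            if (listener != null) {
                listener.onFailure(e);
            }
        }
    }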
private static final String NAME = "pytorch_inference";

private static AtomicLong ms_RequestId = new AtomicLong(1);
What does the ms_ mean?
(m)ember (s)tatic. Not sure if we're supposed to be using this convention. In all fairness, I saw we did it this way in AutodetectControlMsgWriter.ms_FlushNumber, but perhaps that's too ancient to follow :-)
It’s the old Prelert convention that matched the C++ standards. We took these out of the Java code in 2016 but must have missed that one.
Cool, I'll change it then, and raise a tiny PR to fix the flush one too.
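For illustration, dropping the old prefix is a one-line change; something like the following (the new name is just a suggestion, not the one used in the PR):

    // Before (old Prelert-style member-static prefix):
    private static AtomicLong ms_RequestId = new AtomicLong(1);

    // After (hypothetical name; could also be made final):
    private static final AtomicLong requestIdCounter = new AtomicLong(1);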
logger.debug(() -> new ParameterizedMessage("[{}] Parsed result with id [{}]", deploymentId, result.getRequestId()));
PendingResult pendingResult = pendingResults.get(result.getRequestId());
if (pendingResult == null) {
    logger.debug(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));
Suggested change: replace

    logger.debug(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));

with

    logger.warn(() -> new ParameterizedMessage("[{}] no pending result for [{}]", deploymentId, result.getRequestId()));

This is interesting enough to be warn.
This should only occur if the infer request timed out. That would indicate a throughput problem. I'm not sure we'd want to fill the log with those in that case as we could be getting an entry per document we're trying to apply inference to.
The feature branch contains changes to configure PyTorch models with a TrainedModelConfig and defines a format to store the binary models. The _start and _stop deployment actions control the model lifecycle, and the model can be directly evaluated with the _infer endpoint. Two types of NLP tasks are supported: Named Entity Recognition and Fill Mask. The feature branch consists of these PRs: #73523, #72218, #71679, #71323, #71035, #71177, #70713.
This adds a temporary API for doing inference against a trained model deployment.
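As a rough usage sketch, a call against the temporary inference API from Java could look like the following, using the low-level REST client. The URL path and request body shown here are assumptions for illustration only; check the PR and the feature-branch docs for the exact endpoint and fields.

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Request;
    import org.elasticsearch.client.Response;
    import org.elasticsearch.client.RestClient;

    public class InferExample {
        public static void main(String[] args) throws Exception {
            try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
                // Hypothetical path and body for a deployed model named "my_model".
                Request request = new Request("POST", "/_ml/trained_models/my_model/deployment/_infer");
                request.setJsonEntity("{\"input\": \"Elasticsearch is based in Amsterdam\"}");
                Response response = client.performRequest(request);
                System.out.println(response.getStatusLine());
            }
        }
    }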