
[ML] allow for larger models in the inference step for data frame analytics #76116

Conversation

benwtrent (Member)

When a user creates a data frame analytics model, it is possible that the inference step fails because the model is too large to fit in the JVM.

Example error messages:

```
[foo] failed running inference on model [foo-1628085713000]; cause was [Data too large, data for [foo-1628085713000] would be [...], which is larger than the limit of [...]]
```
```
[foo] failed running inference on model [foo-1628085713000]; cause was [Cannot parse model definition as the content is larger than the maximum stream size of [...] bytes. Max stream size is 10% of the JVM heap or 1GB whichever is smallest]
```

This commit partially addresses these errors by allowing the circuit breaker to handle OOM prevention. Since the model was recently created by an internal process, this is acceptable.

relates to #76093
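
For readers unfamiliar with the circuit-breaker approach the description refers to, here is a minimal standalone sketch of the idea: reserve the model's estimated size with a shared memory accountant and fail fast when the node is under memory pressure, rather than rejecting any definition above a fixed stream-size cap. All names in this sketch (MemoryAccountant, ModelLoader, addEstimateAndMaybeBreak) are hypothetical placeholders, not the Elasticsearch classes changed in this PR.

```java
// Illustrative sketch only: hypothetical classes, not the code changed in this PR.
import java.util.concurrent.atomic.AtomicLong;

class CircuitBreakingException extends RuntimeException {
    CircuitBreakingException(String message) { super(message); }
}

// Tracks estimated memory use and breaks before the JVM would run out,
// instead of rejecting any model definition above a fixed stream size.
class MemoryAccountant {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryAccountant(long limitBytes) { this.limitBytes = limitBytes; }

    void addEstimateAndMaybeBreak(long bytes, String label) {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes);   // roll back the reservation before failing
            throw new CircuitBreakingException("[" + label + "] would be [" + newUsed
                + "] bytes, which is larger than the limit of [" + limitBytes + "]");
        }
    }

    void release(long bytes) { usedBytes.addAndGet(-bytes); }
}

class ModelLoader {
    private final MemoryAccountant breaker;

    ModelLoader(MemoryAccountant breaker) { this.breaker = breaker; }

    // Parse without a hard cap on the definition's stream size; the accountant
    // decides whether the heap can actually hold it.
    void loadDefinition(byte[] definitionBytes, String modelId) {
        breaker.addEstimateAndMaybeBreak(definitionBytes.length, modelId);
        try {
            // ... decompress and parse the model definition here ...
        } finally {
            breaker.release(definitionBytes.length);
        }
    }
}
```

The trade-off is that a trusted, freshly built model may take a large slice of heap, with the breaker, rather than an up-front size check, standing between it and an actual OOM.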

elasticmachine added the Team:ML (Meta label for the ML team) label on Aug 4, 2021
elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou (Contributor) left a comment


LGTM

```diff
-TrainedModelConfig config = loadModelFromResource(modelId, false).build().ensureParsedDefinition(xContentRegistry);
+TrainedModelConfig config = loadModelFromResource(modelId, false)
+    .build()
+    .ensureParsedDefinitionUnsafe(xContentRegistry);
```
davidkyle (Member)

This should respect the value of the unsafe parameter the same as line 432

benwtrent (Member, Author)

@davidkyle I don't think it should. Models in the resourceFiles are provided in the jar distribution. So, I don't think we should ever check their stream length on parsing.

davidkyle (Member)

> So, I don't think we should ever check their stream length on parsing.

Because we know how big they are? Why not enforce that with a simple check?

benwtrent (Member, Author)

It is superfluous to me, as the only way to adjust this resource is to modify the resource files directly on disk, and since we control these resource models, we already know and trust their sizes.
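
To make the thread easier to follow, here is a rough, self-contained sketch of the safe vs. unsafe distinction under discussion; DefinitionParser is a hypothetical stand-in, and the real methods are TrainedModelConfig#ensureParsedDefinition and #ensureParsedDefinitionUnsafe, whose code is not reproduced here. The safe path enforces the stream-size limit quoted in the error message above, while the unsafe path skips that check for definitions that are already trusted, such as the resource models shipped in the jar distribution.

```java
// Rough standalone illustration of the safe vs. unsafe parsing paths;
// not the actual TrainedModelConfig implementation.
class DefinitionParser {

    // The limit quoted in the error message: 10% of the JVM heap or 1GB,
    // whichever is smallest.
    static long maxStreamSizeBytes() {
        long tenPercentOfHeap = Runtime.getRuntime().maxMemory() / 10;
        long oneGb = 1024L * 1024L * 1024L;
        return Math.min(tenPercentOfHeap, oneGb);
    }

    // Safe path: refuse to parse a definition larger than the stream-size limit.
    static void ensureParsedDefinition(byte[] definitionBytes) {
        if (definitionBytes.length > maxStreamSizeBytes()) {
            throw new IllegalArgumentException(
                "Cannot parse model definition as the content is larger than the maximum stream size of ["
                    + maxStreamSizeBytes() + "] bytes");
        }
        parse(definitionBytes);
    }

    // Unsafe path: skip the stream-size check for definitions we already trust,
    // e.g. models shipped as resource files inside the jar distribution.
    static void ensureParsedDefinitionUnsafe(byte[] definitionBytes) {
        parse(definitionBytes);
    }

    private static void parse(byte[] definitionBytes) {
        // ... actual parsing elided ...
    }
}
```

Whether the unsafe path should also depend on the caller's unsafe flag, as on line 432, is exactly the question raised in this thread.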

davidkyle (Member) left a comment


LGTM

benwtrent added the auto-backport (Automatically create backport pull requests when merged) label on Aug 9, 2021
benwtrent merged commit f66cc4d into elastic:master on Aug 9, 2021
benwtrent deleted the feature/ml-better-support-java-inference-in-dfa branch on August 9, 2021 at 16:17
elasticsearchmachine (Collaborator)

💔 Backport failed

| Branch | Result |
| --- | --- |
| 7.x | Commit could not be cherrypicked due to conflicts |

To backport manually run:
backport --pr 76116

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Aug 9, 2021
[ML] allow for larger models in the inference step for data frame analytics (elastic#76116)

elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2021
[ML] allow for larger models in the inference step for data frame analytics (#76116) (#76256)

Labels
auto-backport (Automatically create backport pull requests when merged), >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), v7.15.0, v8.0.0-alpha2
6 participants