
[ML] allow for larger models in the inference step for data frame analytics #76116

Conversation

benwtrent (Member)

When a user creates a data frame analytics model, it is possible that the inference step fails because the model is too large to fit in the JVM.

Example error messages:

```
[foo] failed running inference on model [foo-1628085713000]; cause was [Data too large, data for [foo-1628085713000] would be [...], which is larger than the limit of [...]]
```
```
[foo] failed running inference on model [foo-1628085713000]; cause was [Cannot parse model definition as the content is larger than the maximum stream size of [...] bytes. Max stream size is 10% of the JVM heap or 1GB whichever is smallest]
```

This commit partially addresses these errors by allowing the circuit breaker to handle OOM prevention. Since the model was recently created by an internal process, this is acceptable.

relates to #76093
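
For readers unfamiliar with the circuit-breaker approach the description refers to, here is a minimal standalone sketch of the idea: reserve the model's estimated size with a shared memory accountant and fail fast when the node is under memory pressure, rather than rejecting any definition above a fixed stream-size cap. All names in this sketch (MemoryAccountant, ModelLoader, addEstimateAndMaybeBreak) are hypothetical placeholders, not the Elasticsearch classes changed in this PR.

```java
// Illustrative sketch only: hypothetical classes, not the code changed in this PR.
import java.util.concurrent.atomic.AtomicLong;

class CircuitBreakingException extends RuntimeException {
    CircuitBreakingException(String message) { super(message); }
}

// Tracks estimated memory use and breaks before the JVM would run out,
// instead of rejecting any model definition above a fixed stream size.
class MemoryAccountant {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryAccountant(long limitBytes) { this.limitBytes = limitBytes; }

    void addEstimateAndMaybeBreak(long bytes, String label) {
        long newUsed = usedBytes.addAndGet(bytes);
        if (newUsed > limitBytes) {
            usedBytes.addAndGet(-bytes);   // roll back the reservation before failing
            throw new CircuitBreakingException("[" + label + "] would be [" + newUsed
                + "] bytes, which is larger than the limit of [" + limitBytes + "]");
        }
    }

    void release(long bytes) { usedBytes.addAndGet(-bytes); }
}

class ModelLoader {
    private final MemoryAccountant breaker;

    ModelLoader(MemoryAccountant breaker) { this.breaker = breaker; }

    // Parse without a hard cap on the definition's stream size; the accountant
    // decides whether the heap can actually hold it.
    void loadDefinition(byte[] definitionBytes, String modelId) {
        breaker.addEstimateAndMaybeBreak(definitionBytes.length, modelId);
        try {
            // ... decompress and parse the model definition here ...
        } finally {
            breaker.release(definitionBytes.length);
        }
    }
}
```

The trade-off is that a trusted, freshly built model may take a large slice of heap, with the breaker, rather than an up-front size check, standing between it and an actual OOM.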

elasticmachine added the Team:ML (Meta label for the ML team) label on Aug 4, 2021
elasticmachine (Collaborator)

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou (Contributor) left a comment


LGTM

```diff
-TrainedModelConfig config = loadModelFromResource(modelId, false).build().ensureParsedDefinition(xContentRegistry);
+TrainedModelConfig config = loadModelFromResource(modelId, false)
+    .build()
+    .ensureParsedDefinitionUnsafe(xContentRegistry);
```
davidkyle (Member)

This should respect the value of the unsafe parameter the same as line 432

benwtrent (Member, Author)

@davidkyle I don't think it should. Models in the resourceFiles are provided in the jar distribution. So, I don't think we should ever check their stream length on parsing.

davidkyle (Member)

> So, I don't think we should ever check their stream length on parsing.

Because we know how big they are? Why not enforce that with a simple check?

benwtrent (Member, Author)

It is superfluous to me, as the only way to adjust this resource is to modify the resource files directly on disk, and since we control these resource models, we already know and trust their sizes.
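
To make the thread easier to follow, here is a rough, self-contained sketch of the safe vs. unsafe distinction under discussion; DefinitionParser is a hypothetical stand-in, and the real methods are TrainedModelConfig#ensureParsedDefinition and #ensureParsedDefinitionUnsafe, whose code is not reproduced here. The safe path enforces the stream-size limit quoted in the error message above, while the unsafe path skips that check for definitions that are already trusted, such as the resource models shipped in the jar distribution.

```java
// Rough standalone illustration of the safe vs. unsafe parsing paths;
// not the actual TrainedModelConfig implementation.
class DefinitionParser {

    // The limit quoted in the error message: 10% of the JVM heap or 1GB,
    // whichever is smallest.
    static long maxStreamSizeBytes() {
        long tenPercentOfHeap = Runtime.getRuntime().maxMemory() / 10;
        long oneGb = 1024L * 1024L * 1024L;
        return Math.min(tenPercentOfHeap, oneGb);
    }

    // Safe path: refuse to parse a definition larger than the stream-size limit.
    static void ensureParsedDefinition(byte[] definitionBytes) {
        if (definitionBytes.length > maxStreamSizeBytes()) {
            throw new IllegalArgumentException(
                "Cannot parse model definition as the content is larger than the maximum stream size of ["
                    + maxStreamSizeBytes() + "] bytes");
        }
        parse(definitionBytes);
    }

    // Unsafe path: skip the stream-size check for definitions we already trust,
    // e.g. models shipped as resource files inside the jar distribution.
    static void ensureParsedDefinitionUnsafe(byte[] definitionBytes) {
        parse(definitionBytes);
    }

    private static void parse(byte[] definitionBytes) {
        // ... actual parsing elided ...
    }
}
```

Whether the unsafe path should also depend on the caller's unsafe flag, as on line 432, is exactly the question raised in this thread.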

davidkyle (Member) left a comment


LGTM

benwtrent added the auto-backport (Automatically create backport pull requests when merged) label on Aug 9, 2021
benwtrent merged commit f66cc4d into elastic:master on Aug 9, 2021
benwtrent deleted the feature/ml-better-support-java-inference-in-dfa branch on August 9, 2021 at 16:17
elasticsearchmachine (Collaborator)

💔 Backport failed

| Branch | Result |
| --- | --- |
| 7.x | Commit could not be cherrypicked due to conflicts |

To backport manually run:
backport --pr 76116

benwtrent added a commit to benwtrent/elasticsearch that referenced this pull request Aug 9, 2021
[ML] allow for larger models in the inference step for data frame analytics (elastic#76116)

elasticsearchmachine pushed a commit that referenced this pull request Aug 9, 2021
[ML] allow for larger models in the inference step for data frame analytics (#76116) (#76256)

Labels
auto-backport (Automatically create backport pull requests when merged), >bug, :ml (Machine learning), Team:ML (Meta label for the ML team), v7.15.0, v8.0.0-alpha2
6 participants