[CI][ML] Rolling upgrade failure in '30_ml_jobs_crud/Test model memory limit is updated' #36961

Closed
davidkyle opened this issue Dec 22, 2018 · 5 comments
Assignees
Labels
:ml Machine learning >test-failure Triaged test failures from CI

Comments

@davidkyle
Member

Several failures in the rolling upgrade tests, all with the same error:

java.lang.AssertionError: Failure at [upgraded_cluster/30_ml_jobs_crud:66]: jobs.0.analysis_limits.model_memory_limit didn't match expected value:
jobs.0.analysis_limits.model_memory_limit: expected [130mb] but was [100mb]

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/180/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/179/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+bwc-tests/8/console

@davidkyle davidkyle added >test-failure Triaged test failures from CI :ml Machine learning labels Dec 22, 2018
@elasticmachine
Collaborator

Pinging @elastic/ml-core

@davidkyle
Member Author

Muted on 6.6 (7827b94) and 6.x (6012e98).

@droberts195
Contributor

These always seem to happen when upgrading from 6.1. The 30% discrepancy probably means we’ve accidentally changed which versions the 30% uplift in model memory limit applies to during the job-in-index work.

@droberts195 droberts195 self-assigned this Jan 3, 2019
@droberts195
Contributor

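The problem here is that we look at the job version to work out whether the model memory limit needs to be uplifted, but we also set the job version to current when migrating the job from cluster state to an index. Once a job has been migrated, its version no longer shows that it was created in a version that needs the uplift, so the uplift is skipped.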

I think the solution is to add the model memory limit uplift logic into MlConfigMigrator.updateJobForMigration. This will guarantee that every job that is stored in the config index has had the uplift applied if appropriate. In the master branch the logic can be completely removed from TransportOpenJobAction.

droberts195 added a commit to droberts195/elasticsearch that referenced this issue Jan 3, 2019
When a 6.1-6.3 job is opened in a later version
we increase the model memory limit by 30% if it's
below 0.5GB. The migration of jobs from cluster
state to the config index changes the job version,
so we need to also do this uplift as part of that
config migration.

Relates elastic#36961
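
The uplift rule in the commit message above can be sketched as follows. This is a minimal illustration, not the Elasticsearch code: `ModelMemoryUpliftSketch`, `Job`, `needsUplift`, `upliftedLimitMb` and `migrate` are hypothetical names, "0.5GB" is assumed to mean 512 MB, "6.1-6.3" is assumed to include 6.1 and exclude 6.3, and rounding up is an assumption.

```java
/**
 * Sketch of the model memory limit uplift described in the commit message.
 * All names are hypothetical; this is not the Elasticsearch implementation.
 */
final class ModelMemoryUpliftSketch {

    // Assumption: "0.5GB" is taken to mean 512 MB.
    static final long UPLIFT_THRESHOLD_MB = 512;

    /** Hypothetical stand-in for a job config: limit in MB plus the job's version. */
    static final class Job {
        final long modelMemoryLimitMb;
        final int major;
        final int minor;

        Job(long modelMemoryLimitMb, int major, int minor) {
            this.modelMemoryLimitMb = modelMemoryLimitMb;
            this.major = major;
            this.minor = minor;
        }
    }

    private ModelMemoryUpliftSketch() {}

    // Assumption: "6.1-6.3" means 6.1 inclusive up to, but excluding, 6.3.
    static boolean needsUplift(Job job) {
        boolean inRange = job.major == 6 && job.minor >= 1 && job.minor < 3;
        return inRange && job.modelMemoryLimitMb < UPLIFT_THRESHOLD_MB;
    }

    // Increase by 30% with integer arithmetic; rounding up is an assumption.
    static long upliftedLimitMb(Job job) {
        if (!needsUplift(job)) {
            return job.modelMemoryLimitMb;
        }
        return (job.modelMemoryLimitMb * 13 + 9) / 10;
    }

    // Migration: apply the uplift while the original version is still known,
    // and only then stamp the job with the current version.
    static Job migrate(Job job, int currentMajor, int currentMinor) {
        return new Job(upliftedLimitMb(job), currentMajor, currentMinor);
    }
}
```

Under these assumptions, a 100 MB job with version 6.1 that is migrated on a 6.6 node ends up at 130 MB, which is the value the failing test expected. Calling `upliftedLimitMb` again on the migrated job leaves it at 130 MB, because its version is no longer in the uplift range.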
@droberts195
Contributor

Fixes are #37125 for 6.6/6.x and #37126 for master.

droberts195 added a commit that referenced this issue Jan 4, 2019
When a 6.1-6.3 job is opened in a later version
we increase the model memory limit by 30% if it's
below 0.5GB. The migration of jobs from cluster
state to the config index changes the job version,
so we need to also do this uplift as part of that
config migration.

Relates #36961