[CI][ML] Rolling upgrade failure in '30_ml_jobs_crud/Test model memory limit is updated' #36961

davidkyle · 2018-12-22T23:58:58Z

Several failures in the rolling upgrade tests, all with the same error:

java.lang.AssertionError: Failure at [upgraded_cluster/30_ml_jobs_crud:66]: jobs.0.analysis_limits.model_memory_limit didn't match expected value:
jobs.0.analysis_limits.model_memory_limit: expected [130mb] but was [100mb]

https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/180/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.x+bwc-tests/179/console
https://elasticsearch-ci.elastic.co/job/elastic+elasticsearch+6.6+bwc-tests/8/console

elasticmachine · 2018-12-22T23:58:59Z

Pinging @elastic/ml-core

davidkyle · 2018-12-23T00:34:29Z

Muted on 6.6 (7827b94) and 6.x (6012e98)

droberts195 · 2018-12-23T11:16:53Z

These always seem to happen when upgrading from 6.1. The 30% discrepancy probably means we’ve accidentally changed which versions the 30% uplift in model memory limit applies to during the job-in-index work.

droberts195 · 2019-01-03T16:31:08Z

The problem here is that we look at the job version to work out whether the model memory limit needs to be uplifted. But we also set the job version to current when migrating from cluster state to an index.

I think the solution is to add the model memory limit uplift logic into MlConfigMigrator.updateJobForMigration. This will guarantee that every job that is stored in the config index has had the uplift applied if appropriate. In the master branch the logic can be completely removed from TransportOpenJobAction.

When a 6.1-6.3 job is opened in a later version we increase the model memory limit by 30% if it's below 0.5GB. The migration of jobs from cluster state to the config index changes the job version, so we need to also do this uplift as part of that config migration. Relates elastic#36961
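
For anyone following along, here is a minimal sketch of the uplift rule described above. It is not the actual Elasticsearch code: the class and method names are hypothetical, versions are reduced to plain major/minor ints, limits are in whole megabytes, and whether 6.3 itself falls inside the "6.1-6.3" range is an assumption (treated as excluded here). It does reproduce the numbers in the test failure: a 100mb limit on a 6.1 job should become 130mb.

```java
// Hypothetical sketch of the model memory limit uplift; not the real implementation.
class ModelMemoryUpliftSketch {

    // From the comment above: only limits below 0.5GB are uplifted.
    static final long THRESHOLD_MB = 512;

    // True if a job created at major.minor with this limit qualifies for the uplift.
    static boolean needsUplift(int major, int minor, long limitMb) {
        // Assumption: the affected range starts at 6.1 and excludes 6.3.
        boolean affectedVersion = major == 6 && minor >= 1 && minor < 3;
        return affectedVersion && limitMb < THRESHOLD_MB;
    }

    // Returns the limit increased by 30% if the job qualifies, else the limit unchanged.
    static long upliftedLimitMb(int major, int minor, long limitMb) {
        // Integer arithmetic (x * 13 / 10) avoids floating-point rounding in this sketch.
        return needsUplift(major, minor, limitMb) ? limitMb * 13 / 10 : limitMb;
    }

    public static void main(String[] args) {
        System.out.println(upliftedLimitMb(6, 1, 100)); // prints 130
    }
}
```

The point of the fix is that this decision has to be made from the job's original version, before the migration overwrites it with the current version. Once the version is rewritten, a check like `needsUplift` never matches and the limit stays at 100mb.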

droberts195 · 2019-01-03T18:17:22Z

Fixes are #37125 for 6.6/6.x and #37126 for master.

davidkyle added the >test-failure (Triaged test failures from CI) and :ml (Machine learning) labels Dec 22, 2018

davidkyle mentioned this issue Dec 23, 2018

[ML] Metadata Migration Meta issue #32905

Closed

43 tasks

droberts195 self-assigned this Jan 3, 2019

droberts195 mentioned this issue Jan 3, 2019

[ML] Uplift model memory limit on job migration #37125

Merged

droberts195 mentioned this issue Jan 3, 2019

[ML] Uplift model memory limit on job migration #37126

Merged

droberts195 closed this as completed Jan 4, 2019
