[ML] Use feature reset API in ML REST test cleanup #71552

droberts195 · 2021-04-12T09:50:11Z

Now that we have a feature reset API, we should use
this for cleaning up in between tests instead of running
lots of bespoke cleanup code.
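As a rough illustration of what "use the feature reset API for cleanup" means from a client's point of view, here is a minimal sketch that sends `POST /_features/_reset` with the JDK's own HTTP client. It is not the code in this PR. The real REST tests go through the Elasticsearch test framework's client, and the class and method names here (`FeatureResetCleanup`, `buildResetRequest`, `resetAllFeatures`) are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical cleanup helper; not part of the Elasticsearch test framework.
final class FeatureResetCleanup {

    private FeatureResetCleanup() {}

    // Builds POST /_features/_reset against a cluster base URI such as http://localhost:9200.
    static HttpRequest buildResetRequest(URI clusterBaseUri) {
        return HttpRequest.newBuilder(clusterBaseUri.resolve("/_features/_reset"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    // One call is intended to replace the bespoke per-feature cleanup: each feature
    // that supports reset (ML, transforms, ...) is expected to wipe its own state.
    static void resetAllFeatures(HttpClient client, URI clusterBaseUri) throws IOException, InterruptedException {
        HttpResponse<String> response =
            client.send(buildResetRequest(clusterBaseUri), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException(
                "feature reset failed with HTTP " + response.statusCode() + ": " + response.body());
        }
    }
}
```

In a test base class, `resetAllFeatures` would run in an `@After` hook; failing loudly on a non-200 response keeps a broken reset from silently leaking state into the next test.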

During testing of this change we found we need to
delete custom cluster state as part of the reset process,
so this PR also implements that.

Additionally we no longer assign persistent tasks
during feature reset.
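To make "no longer assign persistent tasks during feature reset" concrete, the sketch below models an assignment decision that leaves tasks unassigned while a reset is in progress. It is a simplified, hypothetical model (`ResetAwareAssigner`, `Assignment`, and the first-node placement policy are all invented for illustration). The real ML and transform executors work with Elasticsearch's persistent task assignment types, which this does not reproduce.

```java
import java.util.List;

// Hypothetical model of a reset-aware persistent task assignment decision.
final class ResetAwareAssigner {

    private ResetAwareAssigner() {}

    // Minimal assignment result: executorNode is null when the task stays unassigned.
    static final class Assignment {
        final String executorNode;
        final String explanation;

        Assignment(String executorNode, String explanation) {
            this.executorNode = executorNode;
            this.explanation = explanation;
        }

        boolean isAssigned() {
            return executorNode != null;
        }
    }

    static Assignment assign(boolean resetInProgress, List<String> eligibleNodes) {
        if (resetInProgress) {
            // Assumed rationale: the reset is removing these tasks, so starting one
            // now would only race with the cleanup.
            return new Assignment(null, "feature reset in progress");
        }
        if (eligibleNodes.isEmpty()) {
            return new Assignment(null, "no eligible nodes");
        }
        // Placeholder policy: pick the first eligible node.
        return new Assignment(eligibleNodes.get(0), "assigned");
    }
}
```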


droberts195 · 2021-04-12T09:52:07Z

As of the first commit on this PR, this doesn't actually work. Several tests fail with:

cannot assign model_alias [my-regression] to model_id [a-unused-regression-model1] as model_alias already refers to [a-unused-regression-model1]. Set parameter [reassign] to [true] if model_alias should be reassigned.

This reveals that the reset is not removing ML trained model aliases.
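Why a leftover alias breaks the next test can be shown with a tiny in-memory model. This is a simplified, hypothetical stand-in (`ModelAliasRegistry` is not an Elasticsearch class): it assumes any existing alias blocks a put unless `reassign` is true, which is consistent with the error above, though the real validation may check more than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the trained model alias section of cluster state.
final class ModelAliasRegistry {

    private final Map<String, String> aliasToModelId = new HashMap<>();

    void putAlias(String alias, String modelId, boolean reassign) {
        String existing = aliasToModelId.get(alias);
        if (existing != null && !reassign) {
            // Simplified check: any existing alias blocks the put unless reassign is true,
            // so an alias left over from an earlier test makes the next test's put fail.
            throw new IllegalArgumentException("cannot assign model_alias [" + alias + "] to model_id [" + modelId
                + "] as model_alias already refers to [" + existing
                + "]. Set parameter [reassign] to [true] if model_alias should be reassigned.");
        }
        aliasToModelId.put(alias, modelId);
    }

    String resolve(String alias) {
        return aliasToModelId.get(alias);
    }

    // What the ML feature reset has to do between tests: forget every alias.
    void reset() {
        aliasToModelId.clear();
    }
}
```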

/cc @pheyos

droberts195 · 2021-04-12T10:46:52Z

We should be able to use this API to clean up between Java integration tests as well as REST integration tests. However, I am only converting the REST tests in this PR, because I suspect there will be unforeseen side effects that cause CI noise. It's better to move gradually, to minimise the number of side effects that need fixing at any one time.

benwtrent · 2021-04-12T10:46:52Z

x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/MachineLearning.java

+                        final ClusterState.Builder builder = ClusterState.builder(currentState);
+                        builder.metadata(Metadata.builder(currentState.getMetadata())
+                            .putCustom(ModelAliasMetadata.NAME, ModelAliasMetadata.EMPTY).build());
+                        return builder.build();


I wonder, could we simply delete the entry for ModelAliasMetadata.NAME? I am thinking of the scenario where a user has never created a model alias (so the metadata has not even been added to the cluster state yet). In that case this reset call would be the first thing to create the metadata entry.
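The difference between the two options can be sketched with cluster metadata customs modelled as an immutable map from custom name to value. This is a hypothetical model (`CustomsReset`, `putEmpty` and `remove` are invented names, not Elasticsearch APIs). Returning the same map instance when there is nothing to delete loosely mirrors how a cluster state update that returns the unchanged state publishes nothing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: cluster metadata customs as an immutable map from custom name to value.
final class CustomsReset {

    private CustomsReset() {}

    // Approach in the first revision: overwrite with an empty value. On a cluster that
    // never used model aliases this *adds* an entry that was not there before.
    static Map<String, Object> putEmpty(Map<String, Object> customs, String name, Object emptyValue) {
        Map<String, Object> copy = new HashMap<>(customs);
        copy.put(name, emptyValue);
        return Collections.unmodifiableMap(copy);
    }

    // Suggested approach: delete the entry, so an unused feature leaves no trace.
    static Map<String, Object> remove(Map<String, Object> customs, String name) {
        if (!customs.containsKey(name)) {
            // Nothing to delete: hand back the same instance to signal "no change".
            return customs;
        }
        Map<String, Object> copy = new HashMap<>(customs);
        copy.remove(name);
        return Collections.unmodifiableMap(copy);
    }
}
```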



elasticmachine · 2021-04-12T16:33:18Z

Pinging @elastic/ml-core (Team:ML)

droberts195 · 2021-04-12T16:38:29Z

Just wanted to call out that there's been a major change of approach in the later commits.

The feature reset API is NOT a master node action. Therefore any cluster state updates that are done as part of it need to be separate master node actions. This meant that the approach in the second commit didn't work reliably in a multi-node cluster. I could have introduced another internal master node action to reset cluster state. However, it's wasteful to do lots of cluster state updates during a reset, because we want each successfully reset feature to end up with no custom cluster state at all, and it's unclear where to draw the line on granularity.

Therefore, the latest approach is to adapt the "set reset mode" actions so that, when unsetting reset mode after a successful reset, the relevant section of cluster state is removed instead of just the flag being reset.

The question is, as a result of this change, should we take the opportunity to rename that action from "set reset mode" to something that conveys that it's also responsible for final wiping of the custom cluster state? We can do this before 7.13 feature freeze, but after that we'll have to stick with the action name.
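The adapted "set reset mode" behaviour can be sketched as a pure function over one feature's custom metadata section. This is a hypothetical model: `ResetModeUpdate`, the `"reset_mode"` key, and representing a section as a map are all assumptions. The point it illustrates is that the final, successful unset removes the whole section in the same single master node update, rather than clearing its parts in separate updates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the adapted "set reset mode" update for one feature's custom metadata.
final class ResetModeUpdate {

    private ResetModeUpdate() {}

    // Returns the new custom section. A null input means the section does not exist yet;
    // a null result means the section should be removed from cluster state.
    static Map<String, Object> apply(Map<String, Object> current, boolean enabled, boolean resetSuccessful) {
        if (enabled && resetSuccessful) {
            throw new IllegalArgumentException("cannot enable reset mode and report a successful reset at the same time");
        }
        if (resetSuccessful) {
            // Final step of a successful reset: drop the whole section in this one
            // update instead of clearing its parts one update at a time.
            return null;
        }
        Map<String, Object> updated = current == null ? new HashMap<>() : new HashMap<>(current);
        updated.put("reset_mode", enabled);
        return updated;
    }
}
```

If the reset fails, the section survives with the flag cleared, so whatever state remains is still visible for debugging.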

The transform portion of the feature reset API calls other transform endpoints, and will do so even if transforms have never been used. We don't want it to warn about the lack of transform nodes in this case. (This problem shows up in ML test clusters that use the feature reset API for cleanup but don't contain any transform nodes.)
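One way to express "don't warn during a reset" is a small predicate. It is only a sketch under assumptions: `TransformNodeWarnings`, the `calledByFeatureReset` flag, and the warning wording are hypothetical, and the actual change may detect the reset case differently.

```java
import java.util.Optional;

// Hypothetical rule for when a transform API response should warn about missing transform nodes.
final class TransformNodeWarnings {

    private TransformNodeWarnings() {}

    static Optional<String> noTransformNodesWarning(int transformNodeCount, boolean calledByFeatureReset) {
        if (transformNodeCount > 0) {
            return Optional.empty();
        }
        if (calledByFeatureReset) {
            // The reset calls transform endpoints even on clusters that have never used
            // transforms, so a warning here would be noise (e.g. in ML-only test clusters).
            return Optional.empty();
        }
        // Hypothetical wording; not the actual Elasticsearch warning text.
        return Optional.of("no transform nodes are available in this cluster");
    }
}
```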

benwtrent · 2021-04-12T17:38:37Z

...plugin/core/src/main/java/org/elasticsearch/xpack/core/action/SetResetModeActionRequest.java


    private static final ParseField ENABLED = new ParseField("enabled");
+    private static final ParseField RESET_SUCCESSFUL = new ParseField("reset_successful");


I think this should be "delete metadata" or something. "Reset successful" doesn't make sense in this context, at least not to me.

Especially since you could have `reset_successful` set to `true` and `enabled` set to `true` at the same time.
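One way to address both the naming and the contradictory combination is to make the request's meaningful states an enum, so "enabled and delete" cannot be constructed except via the flag parser, which rejects it. This is a hypothetical shape (`ResetModeRequest`, `Mode`, and the `delete_metadata` field name follow the suggestion above and are not a confirmed API).

```java
// Hypothetical request shape following the "delete metadata" naming suggestion.
final class ResetModeRequest {

    // The three meaningful states; "enable and delete" is deliberately absent.
    enum Mode { ENABLE, DISABLE, DISABLE_AND_DELETE_METADATA }

    final Mode mode;

    private ResetModeRequest(Mode mode) {
        this.mode = mode;
    }

    static ResetModeRequest enabled() {
        return new ResetModeRequest(Mode.ENABLE);
    }

    static ResetModeRequest disabled(boolean deleteMetadata) {
        return new ResetModeRequest(deleteMetadata ? Mode.DISABLE_AND_DELETE_METADATA : Mode.DISABLE);
    }

    // Parsing two wire-level flags (hypothetical names "enabled" and "delete_metadata")
    // is the only place the invalid combination can appear, so it is rejected here.
    static ResetModeRequest fromFlags(boolean enabledFlag, boolean deleteMetadataFlag) {
        if (enabledFlag && deleteMetadataFlag) {
            throw new IllegalArgumentException("[delete_metadata] may only be true when [enabled] is false");
        }
        return enabledFlag ? enabled() : disabled(deleteMetadataFlag);
    }
}
```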

droberts195 · 2021-04-12T18:38:48Z

Current test failures are all caused by:

  1> [2021-04-13T01:17:54,433][INFO ][o.e.x.t.r.XPackRestIT    ] [test] [p0=ml/set_upgrade_mode/Setting upgrade mode to disabled from enabled] before test
  1> [2021-04-13T01:18:57,287][WARN ][o.e.c.RestClient         ] [test] request [DELETE http://127.0.0.1:41000/*,-.ds-ilm-history-*?expand_wildcards=open%2Cclosed%2Chidden] returned 1 warnings: [299 Elasticsearch-8.0.0-SNAPSHOT-85b239713eb39d02cf1dc13585e4db4e53a72475 "this request accesses system indices: [.ml-config], but in a future major version, direct access to system indices will be prevented by default"]
  1> [2021-04-13T01:18:57,476][INFO ][o.e.x.t.r.XPackRestIT    ] [test] There are still tasks running after this test that might break subsequent tests [cluster:admin/features/reset, cluster:admin/xpack/ml/job/close, xpack/ml/job[c]].
  1> [2021-04-13T01:18:57,477][INFO ][o.e.x.t.r.XPackRestIT    ] [test] [p0=ml/set_upgrade_mode/Setting upgrade mode to disabled from enabled] after test


droberts195 · 2021-04-13T12:34:44Z

Jenkins run elasticsearch-ci/2

droberts195 · 2021-04-13T14:04:47Z

@elasticmachine update branch

Now that we have a feature reset API, we should use this for cleaning up in between tests instead of running lots of bespoke cleanup code. During testing of this change we found we need to delete custom cluster state as part of the reset process, so this PR also implements that. Additionally we no longer assign persistent tasks during feature reset. Backport of elastic#71552


[ML] Use feature reset API in ML REST test cleanup

efc1673

Now that we have a feature reset API, we should use this for cleaning up in between tests instead of running lots of bespoke cleanup code.

droberts195 added >test Issues or PRs that are addressing/adding tests :ml Machine learning v8.0.0 v7.13.0 labels Apr 12, 2021

Put empty trained model aliases in ML feature reset

99f0af4

benwtrent approved these changes Apr 12, 2021


droberts195 added 7 commits April 12, 2021 11:55

Remove model aliases cluster state instead of emptying

5992a3e

Add transform plugin to native multi-node tests

6fd99cf

Needed because the feature reset API causes creation of transform custom cluster state.

Fix internal cluster test compilation

731de41

Don't assign persistent tasks during reset

5166b8d

Also don't assign transforms in reset mode

2d8a9cf

Merge branch 'master' into use_feature_reset_in_ml_cleanup

17892bb

droberts195 marked this pull request as ready for review April 12, 2021 16:33

elasticmachine added the Team:ML Meta label for the ML team label Apr 12, 2021

benwtrent self-requested a review April 12, 2021 17:26

benwtrent reviewed Apr 12, 2021


Address code review comments

e7fdb47

droberts195 added 2 commits April 13, 2021 10:57

Allow some actions that are disallowed in upgrade mode during resets

451679b

Transform test adjustment

c8df12f

Since we reset custom cluster state by completely removing it now at the end of a successful reset.

Merge branch 'master' into use_feature_reset_in_ml_cleanup

e0f615d

droberts195 merged commit c436458 into elastic:master Apr 13, 2021

droberts195 deleted the use_feature_reset_in_ml_cleanup branch April 13, 2021 15:05

This was referenced Apr 13, 2021

[CI] Failure of {p0=ml/set_upgrade_mode/Setting upgrade mode to disabled from enabled} #71646

Closed

[ML] More debugging to confirm the cause of #71646 #71647

Closed

droberts195 mentioned this pull request Apr 15, 2021

[ML] Use feature reset API in ML REST test cleanup #71746

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
