[ML] Use feature reset API in ML REST test cleanup #71552
Now that we have a feature reset API, we should use this for cleaning up in between tests instead of running lots of bespoke cleanup code.
As of the first commit on this PR this doesn't actually work. Several tests fail, which reveals that the reset is not removing ML trained model aliases. /cc @pheyos
We should be able to use this API to clean up between Java integration tests as well as REST integration tests. However, I am only doing the REST tests in this PR, because I suspect there will be unforeseen side effects that cause CI noise. It's better to move gradually and minimise the number of side effects that need fixing at any one time.
final ClusterState.Builder builder = ClusterState.builder(currentState);
builder.metadata(Metadata.builder(currentState.getMetadata())
    .putCustom(ModelAliasMetadata.NAME, ModelAliasMetadata.EMPTY).build());
return builder.build();
I wonder, could we simply delete the entry for ModelAliasMetadata.NAME? I am thinking of the scenario where a user hasn't ever created a model alias (and consequently, the metadata hasn't even been added to the cluster state yet). This reset call will actually be the first thing to create the metadata entry.
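To illustrate the reviewer's concern, here is a minimal sketch that models custom cluster state as a plain map. The class, the section key, and the `EMPTY` marker are hypothetical stand-ins and not the real `Metadata` or `ModelAliasMetadata` types. It only shows that putting an empty value creates the entry on a cluster that never had one, whereas removing the entry leaves such a cluster untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for cluster-state custom metadata:
// a map from section name to that section's payload.
final class CustomMetadataSketch {

    // Illustrative key and empty marker; assumptions, not the real ModelAliasMetadata values.
    static final String MODEL_ALIAS_SECTION = "model_alias_section";
    static final Object EMPTY = new Object();

    // Mirrors the putCustom(NAME, EMPTY) approach: the section exists
    // afterwards, even if it was never present before the reset.
    static Map<String, Object> resetByPuttingEmpty(Map<String, Object> customs) {
        Map<String, Object> updated = new HashMap<>(customs);
        updated.put(MODEL_ALIAS_SECTION, EMPTY);
        return updated;
    }

    // Mirrors the reviewer's suggestion: drop the entry, which is a no-op
    // when the section was never created.
    static Map<String, Object> resetByRemoving(Map<String, Object> customs) {
        Map<String, Object> updated = new HashMap<>(customs);
        updated.remove(MODEL_ALIAS_SECTION);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Object> neverUsed = new HashMap<>();
        // The section appears even though no alias was ever created.
        System.out.println(resetByPuttingEmpty(neverUsed).containsKey(MODEL_ALIAS_SECTION));
        // The section stays absent.
        System.out.println(resetByRemoving(neverUsed).containsKey(MODEL_ALIAS_SECTION));
    }
}
```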
Needed because the feature reset API causes creation of transform custom cluster state.
This represents quite a major change of approach. The feature reset API is NOT a master node action. Therefore any cluster state updates that are done as part of it need to be separate master node actions. However, it's wasteful to do lots of cluster state updates during a reset because we want to end up with each successfully reset feature's cluster state being non-existent. Therefore, this change reuses the master node action that unsets reset mode after a successful reset to wipe the relevant section(s) of custom cluster state.
Pinging @elastic/ml-core (Team:ML)
Just wanted to call out that there's been a major change of approach in the later commits. The feature reset API is NOT a master node action. Therefore any cluster state updates that are done as part of it need to be separate master node actions. The latest approach is to adapt the "set reset mode" actions so that when unsetting reset mode due to a successful reset, instead of resetting the flag, the relevant section of cluster state is removed. The question is, as a result of this change, should we take the opportunity to rename that action from "set reset mode" to something that conveys that it's also responsible for the final wiping of the custom cluster state? We can do this before the 7.13 feature freeze, but after that we'll have to stick with the action name.
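As a rough sketch of the approach described above (not the code in this PR), the "unset reset mode" update can be modelled as a single state transition that either clears the flag or removes the whole section. The class name, the section key, and the map-based model of cluster state are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: custom cluster state as a map from section name
// to that section's reset-mode flag.
final class ResetModeSketch {

    // Illustrative section key; an assumption, not a real metadata name.
    static final String ML_SECTION = "ml_section";

    // One state update handles both outcomes, so a successful reset needs
    // no extra cluster state update just to wipe the section.
    static Map<String, Boolean> apply(Map<String, Boolean> customs,
                                      boolean enabled,
                                      boolean resetSuccessful) {
        Map<String, Boolean> updated = new HashMap<>(customs);
        if (enabled == false && resetSuccessful) {
            // Successful reset: remove the section instead of clearing the flag.
            updated.remove(ML_SECTION);
        } else {
            // Entering reset mode, or leaving it after a failed reset:
            // only record the flag.
            updated.put(ML_SECTION, enabled);
        }
        return updated;
    }
}
```

In this model, `apply(state, false, true)` leaves no entry for the section, while `apply(state, false, false)` keeps the section and only sets its flag to `false`.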
The transform portion of the feature reset API calls other transform endpoints, and will do so even if transforms have never been used. We don't want it to warn about a lack of transform nodes in this case. (This problem shows up in ML test clusters that use the feature reset API for cleanup but don't contain any transform nodes.)
private static final ParseField ENABLED = new ParseField("enabled");
private static final ParseField RESET_SUCCESSFUL = new ParseField("reset_successful");
I think this should be "delete metadata" or something. "Reset successful" doesn't make sense in this context, at least not to me. Especially since you could have reset_successful be true and have enabled set to true.
Current test failures are all caused by:
This happens because we now reset custom cluster state by completely removing it at the end of a successful reset.
Jenkins run elasticsearch-ci/2
@elasticmachine update branch
Now that we have a feature reset API, we should use this for cleaning up in between tests instead of running lots of bespoke cleanup code. During testing of this change we found we need to delete custom cluster state as part of the reset process, so this PR also implements that. Additionally we no longer assign persistent tasks during feature reset. Backport of elastic#71552