Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transform] Make force-stopping the transform always remove persistent task from cluster state #106989

Merged
merged 2 commits into from
Apr 10, 2024

Conversation

przemekwitek
Copy link
Contributor

@przemekwitek przemekwitek commented Apr 2, 2024

Currently, if, for any reason, persistent transform task removal cannot commit to cluster state, the leftover persistent task entry will stay forever in the cluster state.
This PR fixes the _stop/force API so that persistent transform task removal is possible in such a case.
Rather than relying on taskOperation() method (which might not be called if the task is in this problematic state), this PR makes the force-stop action remove the persistent task as soon as possible in the doExecute method (before the delegation to taskOperation() method).

Fixes #106811

@przemekwitek przemekwitek force-pushed the fix_transform_force_stop branch 6 times, most recently from 770d234 to 7af527f Compare April 8, 2024 10:04
@przemekwitek przemekwitek force-pushed the fix_transform_force_stop branch from 7af527f to db7397f Compare April 9, 2024 15:29
@przemekwitek przemekwitek marked this pull request as ready for review April 9, 2024 15:34
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Apr 9, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @przemekwitek, I've created a changelog YAML for you.

Copy link
Member

@prwhelan prwhelan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent, great job debugging this

Comment on lines -251 to -253
transformTask.shutdown();
// Here the indexer is aborted so that its thread finishes work ASAP.
transformTask.onCancelled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer needed because persistentTasksService.sendRemoveRequest will invoke them, is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the sequence of events is as follows:

  • persistentTasksService.sendRemoveRequest removes the locally running task and notifies PersistentTasksClusterService to remove the task from cluster state.
  • when the cluster state changes, PersistentTaskNodeService is notified via clusterChanged method
  • PersistentTaskNodeService cancels the task
  • TransformTask.onCancelled method is called and this method calls shutdown

@przemekwitek przemekwitek merged commit e21f2e3 into elastic:main Apr 10, 2024
14 checks passed
@przemekwitek przemekwitek deleted the fix_transform_force_stop branch April 10, 2024 08:18
walterra added a commit to elastic/kibana that referenced this pull request Apr 11, 2024
## Summary

Fixes #180503
Fixes #180499
Fixes #180495
Fixes #180496
Fixes #180497
Fixes #180504

The tests themselves were passing, but the cleanup in the `after` blocks
turned out problematic after a [recent update in ES related to stopping
transforms](elastic/elasticsearch#106989) using
`force`.

Originally we called `stop` on all test transforms and then called the
helper function `.cleanTransformIndices();`. For some time now this
helper was updated to first both force stop and delete transforms before
deleting related indices. So we ended up calling stop twice in some
tests! The second stop using force could then return an error if the
transform was already stopped.

This PR fixes the tests by removing the now unnecessary stop commands.

### Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [x] [Flaky Test
Runner](https://ci-stats.kibana.dev/trigger_flaky_test_runner/1) was
used on any tests changed
- [x] This was checked for breaking API changes and was [labeled
appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :ml/Transform Transform Team:ML Meta label for the ML team v8.14.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Transform] There is no way of removing leftover persistent transform tasks from cluster state
3 participants