
[Transform] can't delete, stop, or start transform after rolling upgrade / node role change #69260

Closed
hendrikmuhs opened this issue Feb 19, 2021 · 2 comments · Fixed by #69419
Labels: >bug, :ml/Transform, Team:ML


hendrikmuhs commented Feb 19, 2021

Upstream issue: elastic/kibana#91570

Affected versions: 7.7.0 - 7.11.2

After a change to the cluster (e.g. a rolling upgrade or fine-tuning of settings), a formerly running transform reports that it is stopped, e.g.:

{
  "count" : 1,
  "transforms" : [
    {
      "id" : "t2",
      "state" : "stopped",
      ...

However, trying to delete it fails with an error claiming that it is running:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "status_exception",
        "reason" : "Cannot delete transform [t2] as the task is running. Stop the task first"
      }
    ],
    "type" : "status_exception",
    "reason" : "Cannot delete transform [t2] as the task is running. Stop the task first"
  },
  "status" : 409
}

Deleting it with force times out, and trying to start it fails with the same claim that the task is running.

As a result, the transform can neither be deleted nor used.
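The contradiction between the two responses above can be detected from the response bodies. A minimal sketch, assuming the first body comes from GET _transform/<id>/_stats and the second from a plain DELETE _transform/<id>; the function name looks_stuck is hypothetical:

```python
import json


def looks_stuck(stats_body: str, delete_body: str, transform_id: str) -> bool:
    """Return True if stats report the transform as stopped while the delete
    request was rejected with 409 because the task is still running."""
    stats = json.loads(stats_body)
    # State of the requested transform in the stats response (None if absent).
    state = next(
        (t.get("state") for t in stats.get("transforms", []) if t.get("id") == transform_id),
        None,
    )
    delete_resp = json.loads(delete_body)
    error = delete_resp.get("error", {})
    # Delete refused with a conflict whose reason says the task is running.
    delete_refused = (
        delete_resp.get("status") == 409
        and error.get("type") == "status_exception"
        and "task is running" in error.get("reason", "")
    )
    return state == "stopped" and delete_refused
```

Only the fields shown in the issue (id, state, error.type, error.reason, status) are inspected, so the check does not depend on the rest of the stats payload.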

Mitigation:

A transform requires a transform node to run on. To verify whether your cluster has a node that can run transforms, check the output of GET _cat/nodes:

v.x.y.z 2 99 2 0.68 0.66 1.43 dm  * elasticsearch01
...

This cluster cannot run transforms, because only the data (d) and master (m) node roles are present. You must have at least 1 node whose role column contains a t, e.g.:

...
v.x.y.z 2 99 2 0.68 0.66 1.43 dt  * elasticsearch03
...

This shows a node with the data and transform roles. Note that you need only 1 node with a t, i.e. a transform node.
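The same check can be scripted. A minimal sketch, assuming the default _cat/nodes column layout seen above (ip, heap.percent, ram.percent, cpu, load_1m, load_5m, load_15m, node.role, master, name); the helper name transform_nodes is hypothetical:

```python
from typing import List


def transform_nodes(cat_nodes_output: str) -> List[str]:
    """Return the names of nodes whose node.role column contains 't'.

    Assumes the default _cat/nodes columns:
    ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
    """
    names = []
    for line in cat_nodes_output.splitlines():
        fields = line.split()
        # Skip blank lines, "..." placeholders and rows without all columns.
        if len(fields) < 10:
            continue
        role = fields[7]
        name = " ".join(fields[9:])
        # Each role is one letter in this column; 't' stands for transform.
        if "t" in role:
            names.append(name)
    return names
```

An empty result means no node can run transforms. Requesting only the needed columns, e.g. GET _cat/nodes?h=name,node.role, would make the parsing less dependent on the default layout.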

Solution:

Add a transform node to your cluster; see https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

At least 1 node should specify the transform role:

node.roles: [ ..., transform, ... ]

Note: if you do not set node.roles at all, the node gets all roles by default; see the docs for details.
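Before a rolling change, the planned node.roles of every node can be checked for the transform role. A minimal sketch, assuming each node's planned setting is given as a list of role names, or None when node.roles is not set; the function names are hypothetical:

```python
from typing import Optional, Sequence


def node_has_transform(node_roles: Optional[Sequence[str]]) -> bool:
    # node.roles not set: the node gets every role, transform included.
    if node_roles is None:
        return True
    # node.roles set: only the listed roles apply (opt-in).
    return "transform" in node_roles


def cluster_can_run_transforms(planned: Sequence[Optional[Sequence[str]]]) -> bool:
    """True if at least one node's planned node.roles yields the transform role."""
    return any(node_has_transform(roles) for roles in planned)
```

An explicit empty list (node.roles: []) is treated as having no transform role, which differs from leaving node.roles unset.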

Fix:

The fix for this problem has 2 aspects:

  • operational:
    • it must be possible to delete a (running) transform
    • stats should report the correct state
  • user experience:
    • APIs must better handle the case of a missing transform node
      • stats should warn about missing transform nodes / show the number of transform nodes
      • preview should warn about missing transform nodes
      • the UI should show the number of transform nodes (and visually indicate if there are none)


hendrikmuhs commented Feb 19, 2021

Retrospective

7.2 - 7.6

Originally, transforms ran on any node in a cluster. Although the transform task is lightweight (it only coordinates search and index requests), users wished to disable transforms on certain nodes: #52200

A solution was implemented in #52712 and released in version 7.7.

7.7 - 7.9

This implementation allowed you to opt certain nodes out of acting as transform nodes. By default, starting with 7.7, transforms only executed on data nodes (which also kept transforms off low-capacity nodes, such as coordinating nodes, and off dedicated, specialized nodes, such as ml nodes).

To opt out, set node.transform: false in elasticsearch.yml.

7.9 - 7.last

The node.* settings are still supported but have been deprecated. The new way of specifying node roles is an array, see #54998.

Instead of, e.g., node.transform: true/false, transform must be listed in node.roles: []. If you don't specify node.roles, a node has every role (dev mode).

However, this change effectively switches the logic from an opt-out model to an opt-in model! It is now easy to lose the ability to execute transforms by switching to the new syntax.

Replacing entries like node.data: true with node.roles: [ data ] implicitly removes the transform role.
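The pitfall can be expressed as a comparison between the two configuration styles. This is a simplified model based on the description above, not Elasticsearch's actual settings code: under the legacy syntax, node.transform is assumed to default to the value of node.data (which itself defaults to true), and under node.roles, an unset value means every role. Function names are hypothetical:

```python
from typing import Mapping, Optional, Sequence


def legacy_has_transform(settings: Mapping[str, bool]) -> bool:
    # Simplified legacy (opt-out) model: node.transform follows node.data
    # unless set explicitly; node.data itself defaults to true.
    return settings.get("node.transform", settings.get("node.data", True))


def roles_has_transform(roles: Optional[Sequence[str]]) -> bool:
    # node.roles (opt-in) model: unset means every role, otherwise only listed ones.
    return roles is None or "transform" in roles


def migration_drops_transform(
    legacy: Mapping[str, bool], roles: Optional[Sequence[str]]
) -> bool:
    """True if a node that could run transforms loses that ability after migrating."""
    return legacy_has_transform(legacy) and not roles_has_transform(roles)
```

Translating node.data: true into node.roles: [ data ] is flagged, while adding transform to the list, or leaving node.roles unset, is not.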

8.0 -

It is planned to remove the deprecated node.* syntax, see #66409

Summary

By switching node roles from an opt-out (implicit) to an opt-in (explicit) model, it can happen that a user accidentally removes the transform role.

However, as the old syntax is still supported, at least in 7.x, this can only happen if the user changes the configuration.

hendrikmuhs self-assigned this Feb 22, 2021
hendrikmuhs pushed a commit that referenced this issue Feb 24, 2021
allow stop transform to stop a transform task if it's waiting for assignment (e.g. if the cluster lacks a transform node)

fixes #69260
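The effect described in the commit message can be illustrated with a toy model. This is a hypothetical sketch, not the actual Elasticsearch implementation: a transform's task either has an assigned node or is waiting for assignment, delete is refused while any task exists, and stop only removes an unassigned task when the assumed flag allow_stop_unassigned is true:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ToyTask:
    transform_id: str
    assigned_node: Optional[str]  # None while waiting for assignment


class ToyTaskRegistry:
    """Toy model only; names and behavior are illustrative assumptions."""

    def __init__(self, allow_stop_unassigned: bool) -> None:
        self.tasks: Dict[str, ToyTask] = {}
        self.allow_stop_unassigned = allow_stop_unassigned

    def stop(self, transform_id: str) -> None:
        task = self.tasks.get(transform_id)
        if task is None:
            return
        # An assigned task can always be stopped; an unassigned one only
        # if the toy "fix" flag is enabled.
        if task.assigned_node is not None or self.allow_stop_unassigned:
            del self.tasks[transform_id]

    def delete(self, transform_id: str) -> None:
        # Delete is refused while a task for the transform still exists.
        if transform_id in self.tasks:
            raise RuntimeError(
                f"Cannot delete transform [{transform_id}] as the task is running. "
                "Stop the task first"
            )
```

With allow_stop_unassigned=False, a task that never got a transform node can be neither stopped nor deleted, which mirrors the symptom reported in this issue.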
hendrikmuhs pushed three more commits with the same message that referenced this issue Feb 24, 2021 (#69526, #69527, #69528).