[Transform] Empty dest index is created when a pipeline with a date_index_name is configured #74547

Open
fdartayre opened this issue Jun 24, 2021 · 3 comments
Labels
>enhancement :ml/Transform Transform Team:ML Meta label for the ML team

Comments

@fdartayre
Contributor

fdartayre commented Jun 24, 2021

Description of desired versus actual behavior:
The dest index configured in the transform is recreated each time the transform is restarted, even though it remains empty because of the date_index_name pipeline.

Desired behavior: do not create the index.

Steps to reproduce:

  1. Add Kibana sample data "Sample web logs"
  2. Create the ingest pipeline:
PUT _ingest/pipeline/transform_monthly_index
{
  "description": "Monthly date-time index naming",
  "processors": [
    {
      "date_index_name": {
        "field": "timestamp",
        "index_name_prefix": "{{{ _index }}}-",
        "index_name_format": "yyyy-MM",
        "date_rounding": "M"
      }
    }
  ]
}
  3. Create the transform and start it:
PUT _transform/data_logs_transform
{
  "source": {
    "index": "kibana_sample_data_logs"
  },
  "dest": {
    "index": "transform-demo-index",
    "pipeline": "transform_monthly_index"
  },
  "pivot": {
    "group_by": {
      "clientip": {
        "terms": {
          "field": "clientip"
        }
      },
      "timestamp": {
        "date_histogram": {
          "field": "timestamp",
          "calendar_interval": "1h"
        }
      }
    },
    "aggregations": {
      "url_dc": {
        "cardinality": {
          "field": "url.keyword"
        }
      }
    }
  }
}

POST _transform/data_logs_transform/_start
  4. Check the created indices:
GET _cat/indices/transform-demo-index*?v
health status index                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   transform-demo-index-2021-08 rRbkJe0pQI2YOq8robil0w   1   1       1119            0    126.6kb        126.6kb
yellow open   transform-demo-index-2021-06 MMN8_P4CQTuYMB3EJjHyLw   1   1       5715            0    453.1kb        453.1kb
yellow open   transform-demo-index-2021-07 GA2SNOsDTU2MPl5wMf-n9Q   1   1       7010            0      539kb          539kb
green  open   transform-demo-index         7ru7bTZDSBi2ssBDvHhoag   1   0          0            0       208b           208b

Even if "transform-demo-index" is deleted, it will be recreated when the transform is restarted.
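
The empty index follows from how the pipeline rewrites _index: every document is redirected to a month-suffixed index, so nothing lands in transform-demo-index itself. A minimal Python sketch of that routing, assuming ISO-8601 timestamps in UTC. `monthly_index_name` is a hypothetical helper that only mimics the resolved index name for this pipeline configuration; it is not Elasticsearch's implementation, and the sample timestamps are made up:

```python
from collections import Counter
from datetime import datetime, timezone


def monthly_index_name(dest_index: str, timestamp: str) -> str:
    """Mimic date_index_name with prefix '{{{ _index }}}-', format yyyy-MM, rounding M."""
    # Normalize a trailing 'Z' so datetime.fromisoformat() accepts the value.
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"{dest_index}-{ts.strftime('%Y-%m')}"


# Hypothetical sample timestamps, one per month seen in the _cat/indices output.
docs = ["2021-06-24T10:00:00Z", "2021-07-01T00:00:00Z", "2021-08-15T23:00:00Z"]
routed = Counter(monthly_index_name("transform-demo-index", t) for t in docs)

# No document keeps the original destination name, so that index stays empty.
print(routed["transform-demo-index"])  # 0
print(sorted(routed))
```

Because the prefix always appends a suffix to the configured destination name, the unsuffixed index can never receive a document through this pipeline.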

@fdartayre fdartayre added >enhancement :ml/Transform Transform Team:ML Meta label for the ML team needs:triage Requires assignment of a team area label labels Jun 24, 2021
@elasticmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@DJRickyB DJRickyB removed the needs:triage Requires assignment of a team area label label Jun 24, 2021
@hendrikmuhs

hendrikmuhs commented Jul 6, 2021

This is by design. When it starts, transform creates the destination index if it does not exist. This is required for schema deduction.

Manipulating _index in an ingest pipeline fed by a transform is possible, but not supported. If you do this, you are on your own. Note that other transform features, e.g. retention_policy, won't work correctly.

By design, bulk APIs require named requests; the name of the index is mandatory:

(Optional, string) Name of the data stream, index, or index alias to perform the action on. This parameter is required if a <target> is not specified in the request path. 

The index has to be either part of the request path or specified in the action metadata.
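
To illustrate the two places a bulk request can carry the index name, here is a small Python sketch that only builds the request path and the NDJSON body; no request is sent. `build_bulk` is a hypothetical helper and the document content is made up:

```python
import json
from typing import Optional


def build_bulk(docs: list, index_in_path: Optional[str] = None,
               index_in_action: Optional[str] = None) -> tuple:
    """Return (path, ndjson_body) for a bulk request; hypothetical helper."""
    if index_in_path is None and index_in_action is None:
        # Mirrors the rule quoted above: the target must appear somewhere.
        raise ValueError("index must be given in the path or in each action")
    path = f"/{index_in_path}/_bulk" if index_in_path else "/_bulk"
    lines = []
    for doc in docs:
        action = {"index": {}}
        if index_in_action:
            action["index"]["_index"] = index_in_action
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk body is newline-delimited JSON and must end with a newline.
    return path, "\n".join(lines) + "\n"


path, body = build_bulk([{"clientip": "10.0.0.1"}],
                        index_in_action="transform-demo-index")
print(path)
print(body, end="")
```

Calling `build_bulk` with neither argument raises an error, which is the situation a dest without an index would create today.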

Consequently, the question should be: can ingest APIs be enhanced to run without _index being specified? I could not find an issue for this; #63798 sounds like a similar idea.

If so, transform could be enhanced to require either index, pipeline, or both, and something like this sounds possible:

  "dest": {
    "pipeline": "transform_monthly_index"
  },

But this is not valid today: first, this functionality needs to be provided at a lower level; only then can transform use it.

Taking a step back and looking at the user story:

We are looking into destination index rollover. Today, ILM and transform are not integrated with each other, which is why you can't use ILM together with transform without seeing side effects like duplicates.

For that reason, we are looking into ways to provide index rollover as part of transform, or to make ILM transform-aware (hard, probably not possible).

In summary, consider this enhancement request as closed by design. However, we plan to provide index rollover capabilities in transform in the future. I am fine with keeping this issue open as a reminder and replacing it once I have more information.

@rseldner
Contributor

rseldner commented Feb 3, 2024

Since this date_index_name processor might be applied to a Transform for using ILM (though not recommended)...

...If you use ILM to have time-based indices, please consider using the Date index name instead. The processor works without duplicated documents if your transform contains a group_by based on date_histogram.
Source: https://www.elastic.co/guide/en/elasticsearch/reference/8.12/transform-limitations.html#transform-ilm-destination

...Here's a workaround to avoid the empty index...

The result is that the index is immediately advanced to the delete phase, even if it is recreated:

  "indices": {
    "ilm-with-transform-test-2020.01.01": {
      "index": "ilm-with-transform-test-2020.01.01",
      "managed": true,
      "policy": "ilm-with-transform-test-ilm-policy",
      "index_creation_date_millis": 1706915286359,
      "time_since_index_creation": "34.47s",
      "lifecycle_date_millis": 1577836800000,
+      "age": "1493.96d",
+      "phase": "delete",
      "phase_time_millis": 1706915288070,
      "action": "complete",
      "action_time_millis": 1706915287870,
      "step": "complete",
      "step_time_millis": 1706915288070,
      "phase_execution": {
        "policy": "ilm-with-transform-test-ilm-policy",
        "phase_definition": {
          "min_age": "60d",
          "actions": {}
        },
        "version": 5,
        "modified_date_in_millis": 1706915261510
      }
    }
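
The age in this output can be checked by hand. The Python sketch below assumes that ILM measures age from lifecycle_date_millis, and it approximates the time of the explain call as the creation time plus time_since_index_creation; both are assumptions made for illustration:

```python
from datetime import datetime, timezone

# Values copied from the explain output above.
index_creation_date_millis = 1706915286359
time_since_index_creation_s = 34.47
lifecycle_date_millis = 1577836800000
min_age_days = 60  # min_age of the phase definition shown above

# Approximate "now" at the time of the explain call (assumption).
now_millis = index_creation_date_millis + time_since_index_creation_s * 1000
age_days = (now_millis - lifecycle_date_millis) / 86_400_000

print(datetime.fromtimestamp(lifecycle_date_millis / 1000, tz=timezone.utc).date())  # 2020-01-01
print(f"{age_days:.2f}d")  # 1493.96d
print(age_days > min_age_days)  # True
```

The index was created seconds earlier, but its lifecycle date is 2020-01-01, so its age already exceeds the 60d min_age and it lands in the delete phase right away.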
