[Ingest Pipelines][discuss] Support custom pipelines #129

Closed

simitt opened this issue Feb 5, 2021 · 16 comments

Comments

simitt commented Feb 5, 2021

Feature Request

Allow users to customize Ingest Pipelines per data stream. This might be done in multiple iterations, e.g.
(1) Allow disabling the loading of the default pipelines, and document how custom pipelines can be set up manually. Removing pipelines could break data streams, so the package-spec could be extended to define which pipelines are required and which are optional; optional pipelines could then be disabled by users.
(2) Support customizing default pipelines. Building on top of (1), the spec could be extended to allow adding custom pipelines.

Why it is important

APM Server today allows users to define pipelines via a JSON file, and to configure whether ingest pipelines should be set up and overwritten. When migrating to managed APM Server, users should at least be able to set up their pipelines manually and ensure these are not overwritten.

Current State

@ruflin started a docs PR to document how users can set up their own pipelines.

rw-access pushed a commit to rw-access/package-spec that referenced this issue Mar 23, 2021:

* Removing outdated information
* Moving context to correct place
* Fixing links to anchors

axw commented Mar 25, 2021

One challenge around customising pipelines is that they are renamed by Fleet, based on the data stream and package version. When you upgrade the package, new pipelines will be installed and customisations to existing ones will be lost. Also, because pipelines may change between package versions, I'm not sure we can ever sensibly carry customisations across.

I'm inclined to find a solution that does not involve either manually modifying pipelines that are installed by Fleet, or changing which of those pipelines are run. Instead, can we limit the feature to enabling users to define a pipeline which runs either before or after the default pipeline (or both)? This could be implemented in multiple ways, e.g. post-default by using a final pipeline, or by having Fleet inject pipeline processor(s) into the package-defined pipeline, referencing the user-specified pipeline(s).
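
For the post-default variant, a minimal sketch using Elasticsearch's existing index.final_pipeline setting (all names here are hypothetical; this is not something Fleet does today):

PUT _ingest/pipeline/logs-foo-user-final
{
  "description": "User-owned pipeline, run after the default pipeline",
  "processors": [
    { "set": { "field": "labels.env", "value": "production" } }
  ]
}

PUT _index_template/logs-foo
{
  "index_patterns": ["logs-foo-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.final_pipeline": "logs-foo-user-final"
    }
  }
}

One caveat: index.final_pipeline is a single slot per index, so Fleet and the user would need to agree on who owns it.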

jalvz commented Mar 25, 2021

Users already have other ways to create ingest pipelines, right? (the ingest API, Kibana) What if we don't give them another way to do it and instead rely on existing mechanisms? In other words: how is the suggested way better or simpler?

axw commented Mar 25, 2021

The problem is not in modifying a pipeline, it's in carrying it across package upgrades. Users do not have an existing mechanism for that. They can create their own custom pipeline using the usual mechanisms.

Maybe we could just allow users to modify the pipeline installed by Fleet, and have Fleet perform a three-way merge with the new pipeline on upgrade. I'm not convinced that'll be simpler, but it's certainly an option.

ruflin commented Mar 25, 2021

I would prefer to stay away from a 3-way merge if possible. Instead, I think it would be nicer if Elasticsearch supported adding a list of ingest pipelines: elastic/elasticsearch#61185. That way we only need to merge the list of pipelines and not the pipelines themselves. The alternative is using full-fledged document routing, where Elasticsearch would support adding certain pipelines for certain data (streams): elastic/elasticsearch#63798

@felixbarny

I’ve been thinking about this in a different context and have created a proposal for a potential solution: elastic/elasticsearch#61185 (comment).

@axw @simitt could you have a look and see whether the proposal would work for the use case mentioned in this issue?

axw commented Apr 11, 2022

@felixbarny thanks for chiming in. I think that would work. One issue would be about ownership: who owns logs-foo once a user has customised it? If it was installed by a package, should uninstalling the package remove the customised pipeline? How do we know it has been customised?

I had in mind something similar, but reversed:

  • allow the pipeline processor to refer to a non-existent pipeline
  • have package-provided pipelines reference initially non-existent before/after pipelines with stable identifiers, using the pipeline processor

e.g. something like the following (sketched in console form after this list):

  • install package v1.0.0, it creates pipeline logs-foo-1.0.0. This pipeline will call pipeline processors logs-foo-before and logs-foo-after (before and after the builtin processors, respectively)
  • create logs-foo-before; it will now be called by logs-foo-1.0.0
  • upgrade package to v1.1.0, creates pipeline logs-foo-1.1.0 and removes logs-foo-1.0.0; it still calls logs-foo-before
  • uninstall package, removes pipeline logs-foo-1.1.0, but leaves logs-foo-before alone
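
In console form, the first two steps might look like this. All names are hypothetical, the set processor is just a stand-in for the package's builtin processors, and step 1 relies on the proposed ability to reference a pipeline that doesn't exist yet:

PUT _ingest/pipeline/logs-foo-1.0.0
{
  "description": "Installed by the package; calls optional user hook pipelines",
  "processors": [
    { "pipeline": { "name": "logs-foo-before" } },
    { "set": { "field": "event.module", "value": "foo" } },
    { "pipeline": { "name": "logs-foo-after" } }
  ]
}

PUT _ingest/pipeline/logs-foo-before
{
  "description": "User-owned; from now on called by logs-foo-1.0.0",
  "processors": [
    { "lowercase": { "field": "host.name", "ignore_missing": true } }
  ]
}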

@felixbarny

If it was installed by a package, should uninstalling the package remove the customised pipeline?

Good question. One approach would be to never remove the whole pipeline but to only remove the pipeline processor that's managed by the integration, which can be identified by a processor tag. One issue is that re-installing the integration could mess up the order. A potential solution would be to not remove the pipeline processor but just disable it with if: false.
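
A sketch of that idea (names hypothetical): the integration's entry carries a tag so it can be located on upgrade or uninstall, and "removal" flips its condition to if: false instead of deleting it, preserving the order of the user's own processors:

PUT _ingest/pipeline/logs-foo
{
  "processors": [
    {
      "pipeline": {
        "name": "logs-foo-integration",
        "tag": "managed-by-integration-foo",
        "if": "false"
      }
    },
    { "set": { "field": "labels.added_by_user", "value": "true" } }
  ]
}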

have package-provided pipelines reference initially non-existent before/after pipelines with stable identifiers, using the pipeline processor

I've been thinking about that, too. It would even be possible today by configuring the pipeline processors with ignore_failure=true. However, this still creates an exception in Elasticsearch, which adds overhead. We could try to avoid creating an exception when ignore_failure=true, but from looking at the code, I couldn't find an elegant way to do that. Not saying it's not possible, though.
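
Concretely, the workaround available today would be (names hypothetical):

PUT _ingest/pipeline/logs-foo-1.0.0
{
  "processors": [
    {
      "pipeline": {
        "name": "logs-foo-before",
        "ignore_failure": true
      }
    }
  ]
}

If logs-foo-before doesn't exist, documents still pass through, but Elasticsearch constructs and discards an exception for every document.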

felixbarny commented Apr 11, 2022

have package-provided pipelines reference initially non-existent before/after pipelines with stable identifiers, using the pipeline processor

Or we just create empty before/after pipelines where users can add their processors. (inspired by elastic/elasticsearch#61185 (comment))
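
That is, the install step itself would create the hook pipelines as empty but valid pipelines (name hypothetical):

PUT _ingest/pipeline/logs-foo-before
{
  "description": "Add custom processors here; intentionally installed empty",
  "processors": []
}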

axw commented Apr 11, 2022

Good idea. That sounds reasonable to me, since it's exactly what we do for component templates.
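
For comparison, a rough sketch of that component-template pattern (heavily simplified; real Fleet-managed index templates compose more pieces): an empty @custom component template is created at install time and referenced from the managed index template, so user edits to it survive package upgrades:

PUT _component_template/logs-foo@custom
{
  "template": {
    "settings": {}
  }
}

PUT _index_template/logs-foo
{
  "index_patterns": ["logs-foo-*"],
  "data_stream": {},
  "composed_of": ["logs-foo@custom"]
}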

ruflin commented Apr 11, 2022

Is there any cost attached to empty pipelines?

@joshdover

I've been thinking about that, too. It would even be possible today by configuring the pipeline processors with ignore_failure=true. However, this will still create an exception in Elasticsearch which adds overhead. We could try to avoid creating an Exception if ignore_failure=true but from looking at the code, I couldn't find an elegant solution to do that. Not saying it's not possible, though.

I've attempted to do this using an on_failure block with a fail processor, which gets very close to accomplishing this with existing functionality.

My idea was to use an on_failure block with a single fail processor whose condition skips the failure when it is a non-existent-pipeline error. However, I can't figure out how to access the _ingest.on_failure_* fields from within the condition, even though they are accessible from within the message option. I tried accessing them via _ingest. and ctx['_ingest'] as well, without any luck. I'm chatting with ES Data Management on Slack to determine how to work around this, but posting here in case you all have any ideas. Here's the pipeline I'm trying to create:

PUT _ingest/pipeline/test-pipeline
{
  "processors": [
    {
      "pipeline": {
        "name": "test-custom",
        "tag": "fleet-custom-caller",
        "on_failure": [
          {
            "fail": {
              "if": "_ingest.on_failure_processor_type != 'pipeline' || _ingest.on_failure_processor_tag != 'fleet-custom-caller' || !_ingest.on_failure_message.contains('non-existent pipeline [test-custom]')",
              "message": "test-custom pipeline failed with {{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    }
  ]
}

@felixbarny

I think it would be cleaner to create a new parameter, such as ignore_missing, and avoid an exception being created in the first place.
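
A sketch of what that could look like on the pipeline processor (the parameter name used here, ignore_missing_pipeline, is the spelling Elasticsearch ultimately adopted):

PUT _ingest/pipeline/logs-foo-1.0.0
{
  "processors": [
    {
      "pipeline": {
        "name": "logs-foo-before",
        "ignore_missing_pipeline": true
      }
    }
  ]
}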

@joshdover

Opened an issue to discuss this: elastic/elasticsearch#87323

@gbanasiak

Has elastic/kibana#133740 made this issue obsolete?

axw commented Oct 24, 2022

@gbanasiak thanks, yes, this is resolved by elastic/kibana#133740.
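
With that change, Fleet-managed ingest pipelines call an optional @custom pipeline per data stream via an ignore-missing pipeline processor, and users opt in simply by creating it, along these lines (dataset name hypothetical):

PUT _ingest/pipeline/logs-foo.access@custom
{
  "processors": [
    { "set": { "field": "labels.team", "value": "web" } }
  ]
}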

axw closed this as completed Oct 24, 2022