[Ingest Pipelines][discuss] Support custom pipelines #129
Comments
One challenge around customising pipelines is that they are renamed by Fleet, based on the data stream and package version. When you upgrade the package, new pipelines will be installed and customisations to existing ones will be lost. Also, because pipelines may change between package versions, I'm not sure we can ever sensibly carry customisations across. I'm inclined to find a solution that does not involve either manually modifying pipelines that are installed by Fleet, or changing which of those pipelines are run. Instead, can we limit the feature to enabling users to define a pipeline which runs either before or after the default pipeline (or both)? This could be implemented in multiple ways, e.g. post-default by using a final pipeline, or by having Fleet inject pipeline processor(s) into the package-defined pipeline, referencing the user-specified pipeline(s). |
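The second option above (Fleet injecting pipeline processors into the package-defined pipeline) could look roughly like the sketch below. It is only an illustration: the function name, the `@custom-before` / `@custom-after` pipeline names, and the tag values are hypothetical and not something Fleet defines; only the `pipeline` processor with its `name` option and the common `tag` option are existing Elasticsearch features.

```python
import copy


def wrap_with_custom_pipelines(package_pipeline, before_name, after_name):
    """Return a copy of the package pipeline that calls a user-owned
    pipeline before and after the package-defined processors.

    The package pipeline itself is never edited by the user, so an
    upgrade can replace it and simply re-apply this wrapping.
    """
    wrapped = copy.deepcopy(package_pipeline)
    # Tags are hypothetical; they mark which processors were injected.
    before = {"pipeline": {"name": before_name, "tag": "custom-before"}}
    after = {"pipeline": {"name": after_name, "tag": "custom-after"}}
    wrapped["processors"] = [before] + wrapped.get("processors", []) + [after]
    return wrapped


# Illustrative package pipeline; the real contents depend on the package.
package_pipeline = {
    "description": "Pipeline shipped by the package (illustrative)",
    "processors": [
        {"set": {"field": "event.ingested", "value": "{{_ingest.timestamp}}"}}
    ],
}

wrapped = wrap_with_custom_pipelines(
    package_pipeline,
    "traces-apm@custom-before",  # hypothetical user-owned pipeline
    "traces-apm@custom-after",   # hypothetical user-owned pipeline
)
```

Because the user's processors live in separately named pipelines, a package upgrade only has to regenerate the wrapper; nothing needs to be merged.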
Users already have other ways to create ingest pipelines, right? (ingest API, Kibana) What if we don't give them another way to do it and rely on the existing mechanisms instead? Or, in other words: how is the suggested way better or simpler? |
The problem is not in modifying a pipeline, it's in carrying it across package upgrades. Users do not have an existing mechanism for that. They can create their own custom pipeline using the usual mechanisms. Maybe we could just allow users to modify the pipeline installed by Fleet, and have Fleet perform a three-way merge with the new pipeline on upgrade. I'm not convinced that'll be simpler, but it's certainly an option. |
I would prefer to stay away from a 3-way merge if possible. Instead, I think it would be nicer if Elasticsearch supported adding a list of ingest pipelines: elastic/elasticsearch#61185. That way we only need to merge the list of pipelines and not the pipelines themselves. The alternative is to use full-fledged document routing, where Elasticsearch would support adding certain pipelines for certain data streams: elastic/elasticsearch#63798 |
I’ve been thinking about this in a different context and have created a proposal for a potential solution: elastic/elasticsearch#61185 (comment). @axw @simitt could you have a look and see whether the proposal would work for the use case mentioned in this issue? |
@felixbarny thanks for chiming in. I think that would work. One issue would be about ownership: who owns I had in mind something similar, but reversed:
e.g. something like
|
Good question. One approach would be to never remove the whole pipeline but to only remove the pipeline processor that's managed by the integration which can be identified with a processor tag. One issue is that re-installing the integration could mess up the order. A potential solution would be to not remove the pipeline processor, just disabling it with
I've been thinking about that, too. It would even be possible today by configuring the pipeline processors with |
Or we just create empty before/after pipelines where users can add their processors. (inspired by elastic/elasticsearch#61185 (comment)) |
Good idea. That sounds reasonable to me, since it's exactly what we do for component templates. |
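A minimal sketch of the "empty before/after pipelines" idea follows, assuming the installer only creates a pipeline when it does not exist yet, so user edits survive re-installs and upgrades. The `store` dict stands in for the Elasticsearch ingest pipeline store, and the function name and `@custom-before` / `@custom-after` naming are hypothetical.

```python
def ensure_custom_pipelines(store, data_stream):
    """Create empty before/after pipelines for a data stream if missing.

    `store` is an in-memory stand-in (pipeline id -> pipeline body) for
    the ingest pipeline store. Existing pipelines are left untouched so
    that processors added by the user are never overwritten.
    """
    created = []
    for suffix in ("before", "after"):
        pipeline_id = f"{data_stream}@custom-{suffix}"  # hypothetical naming
        if pipeline_id not in store:
            store[pipeline_id] = {
                "description": f"User-owned {suffix} pipeline for {data_stream}",
                "processors": [],  # empty until the user adds processors
            }
            created.append(pipeline_id)
    return created


# The user already customised the "before" pipeline; only "after" is new.
store = {
    "traces-apm@custom-before": {
        "description": "edited by the user",
        "processors": [{"lowercase": {"field": "service.name"}}],
    }
}
created = ensure_custom_pipelines(store, "traces-apm")
```

This mirrors the ownership split mentioned for component templates: the installer owns the package pipeline, while the user owns the contents of the custom pipelines.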
Is there any cost attached to empty pipelines? |
Based on my read of the Elasticsearch code, my expectation is that the overhead should be negligible. The way it works is that Elasticsearch is looking up the pipeline from a Executing an empty pipeline should be about as much overhead as iterating over an empty list. |
I've attempted to do this using an My idea was to use an
|
I think it would be cleaner to create a new parameter, such as |
Opened an issue to discuss this: elastic/elasticsearch#87323 |
Has elastic/kibana#133740 made this issue obsolete? |
@gbanasiak thanks, yes, this is resolved by elastic/kibana#133740. |
Feature Request
Allow users to customize ingest pipelines per data stream. This might be done in multiple iterations, e.g.
(1) Allow users to disable loading of the default pipelines, and document how custom pipelines can be set up manually. Removing pipelines could break data streams, so the package-spec could be extended to define which pipelines are required and which are optional. Optional pipelines could then be disabled by users.
(2) Support customizing default pipelines. Building on top of (1), the spec could be extended to allow adding custom pipelines.
Why is it important
APM Server today allows users to define pipelines via a JSON file, and to configure whether ingest pipelines should be set up and overwritten. When migrating to managed APM Server, users should at least be able to set up their pipelines manually and ensure these are not overwritten.
Current State
@ruflin started a docs PR to document how users can set up their own pipelines.