[Fleet] Support local routing rules during integration installation #155910

kpollich · 2023-04-26T17:26:00Z

When integrations are installed, Fleet should honor any local routing rules defined by the integration. This means that when an integration defines a "catch-all" dataset that routes data to that integration's own data streams, the resulting ingest pipeline for the "catch-all" dataset contains reroute processors as appropriate.

For example, when an nginx data stream is defined as follows:

# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to 
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

routing_rules:
  # "Local" routing rules are included under this current dataset, not a special case
  nginx:
    # Route error logs to `nginx.error` when they're sourced from an error logfile
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

    # Route access logs to `nginx.access` when they're sourced from an access logfile
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

A resulting logs-nginx-1.2.3 ingest pipeline should be generated that includes the following processors:

{
    "processors": [
        {
            "pipeline": {
                "name": "logs-nginx@custom"
            }
        },
        {
            "reroute": {
                "tag": "logs-nginx",
                "dataset": "nginx.error",
                "namespace": [
                    "{{labels.data_stream.namespace}}",
                    "default"
                ],
                "if": "ctx?.file?.path?.contains('/var/log/nginx/error')"
            }
        },
        {
            "reroute": {
                "tag": "logs-nginx",
                "dataset": "nginx.access",
                "namespace": [
                    "{{labels.data_stream.namespace}}",
                    "default"
                ],
                "if": "ctx?.file?.path?.contains('/var/log/nginx/access')"
            }
        }
    ]
}

Note: This issue does not cover "injected" routing rules, which are defined on non-local datasets. Those will be implemented in a follow-up tracked in #157422.

Acceptance Criteria

For each routing_rule defined in a data stream manifest, a corresponding reroute processor is generated in the resulting ingest pipeline, routing to the dataset value given on that rule.
The @custom pipeline processor is called BEFORE the reroute processors we generate, so that users can provide customizations and routing overrides if needed.
Each reroute processor includes a tag value set to the name of the data stream from which the routing rule was sourced.
Each reroute processor includes the namespace and if values provided by the rule.
Only "local" rules are supported; a rule defined for a non-local dataset is ignored.
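
As a rough illustration of these criteria, here is a minimal TypeScript sketch of how the processors could be derived from a manifest's routing_rules. The type names and the `buildRoutingProcessors` function are hypothetical and are not Fleet's actual implementation; the tag format (`<type>-<dataset>`) is an assumption inferred from the `logs-nginx` example above.

```typescript
// Hypothetical shapes mirroring the manifest and pipeline examples in this
// issue. These are NOT Fleet's internal types.
interface RoutingRule {
  dataset: string;
  if: string;
  namespace: string[];
}

// Rules keyed by the dataset they are attached to, as in `routing_rules:`.
type RoutingRules = Record<string, RoutingRule[]>;

type Processor =
  | { pipeline: { name: string } }
  | { reroute: { tag: string; dataset: string; namespace: string[]; if: string } };

// Builds the processors appended to the data stream's ingest pipeline:
// the @custom pipeline call first, then one reroute processor per local rule.
function buildRoutingProcessors(
  type: string,
  localDataset: string,
  rules: RoutingRules
): Processor[] {
  // Assumption: the tag and the @custom pipeline name share the
  // `<type>-<dataset>` prefix, as in the `logs-nginx` example.
  const name = `${type}-${localDataset}`;

  // @custom runs BEFORE any generated reroute processor.
  const processors: Processor[] = [{ pipeline: { name: `${name}@custom` } }];

  // Only "local" rules (keyed by the current dataset) are used;
  // rules keyed by any other dataset are ignored.
  for (const rule of rules[localDataset] ?? []) {
    processors.push({
      reroute: {
        tag: name,
        dataset: rule.dataset,
        namespace: rule.namespace,
        if: rule.if,
      },
    });
  }
  return processors;
}
```

Called with type `logs`, dataset `nginx`, and the two rules from the manifest above, this sketch yields the `logs-nginx@custom` pipeline processor followed by the `nginx.error` and `nginx.access` reroute processors, in manifest order.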

felixbarny · 2023-05-12T14:52:24Z

A resulting logs-nginx-1.2.3 ingest pipeline should be generated that includes the following processors

We need to discuss the names and the order of the different ingest pipelines. Data streams managed by Fleet have a logs-<integration>-<version> pipeline that's structured like this:

Integration-specific processing
invoke <package>@custom

It seems like the suggestion is to add routing rules directly to the default pipeline of the integration. While this is possible, there are a few things to watch out for when it comes to ordering and we should at least discuss whether to create dedicated routing pipelines.

Option 1: put package-defined routing rules in the default pipeline
Basically, this is what's currently proposed. However, we'll want to make sure the user can add routing rules with a higher precedence than the package-provided rules. Therefore, the <package>@custom pipeline needs to be invoked before the package-provided routing rules.

Integration-provided processors...
invoke <package>@custom
Integration-provided routing rules...

Option 2: dedicated pipeline for package-provided routing rules

We could put all package-provided routing rules in a dedicated pipeline. This may make it easier to show all package-provided routing rules in a routing rules UI.

Integration-provided processors...
invoke <package>@custom
invoke <package>@routing

Option 2a: dedicated pipeline for custom routing rules

Similar to the above, a custom pipeline that's dedicated to just routing rules might make it easier for a routing rules UI to display and add user-provided routing rules. The downside is that this UI would then make a lot of assumptions about the pipeline names, and there may be routing rules in other pipelines that it would either not show or show only as read-only.

Integration-provided processors...
invoke <package>@custom
invoke <package>@custom_routing
invoke <package>@routing
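
To compare the three layouts side by side, here is a small TypeScript sketch that lists the ordered steps of the default pipeline under each option. The `defaultPipelineSteps` function and the step labels are purely illustrative; only the `@custom`, `@routing`, and `@custom_routing` suffixes come from the options above.

```typescript
type LayoutOption = "1" | "2" | "2a";

// Returns the ordered steps of the integration's default pipeline
// under each of the options discussed above (illustrative labels only).
function defaultPipelineSteps(pkg: string, option: LayoutOption): string[] {
  // Common prefix in every option: integration processing, then @custom.
  const steps = ["integration-provided processors", `invoke ${pkg}@custom`];

  switch (option) {
    case "1":
      // Package-provided routing rules live inline in the default pipeline.
      steps.push("integration-provided routing rules (inline)");
      break;
    case "2":
      // Package-provided routing rules live in a dedicated pipeline.
      steps.push(`invoke ${pkg}@routing`);
      break;
    case "2a":
      // User-provided routing rules get their own pipeline, which runs
      // before the package-provided routing pipeline.
      steps.push(`invoke ${pkg}@custom_routing`, `invoke ${pkg}@routing`);
      break;
  }
  return steps;
}
```

In every option the `@custom` call precedes anything package-provided that routes documents, which is the ordering constraint the options have in common.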

kpollich · 2023-05-18T13:05:39Z

@felixbarny - Thanks for providing that summary. I think leaving room for user customizations is definitely an important consideration here. However, this issue only concerns the "package-sourced" rules implementation. I think it's good to future-proof our plans here to make sure we're not going to collide with any follow-up implementations around user-provided routing rules.

To that end, I think Option 1 that you've listed is still the most compelling to me. Introducing new pipelines and naming schemes means we need to carefully document and tutorialize these customization practices, whereas by simply appending routing rules after the existing pipeline: <package>@custom call this implementation becomes purely additive.

One thing I want to avoid is a user opening their integration's pipeline definition and being overwhelmed by how many specially named processors, pipelines, etc exist and not being able to quickly add customizations. The simplicity of a single @custom pipeline is very compelling to me here, and reduces the barrier to entry around adding customizations to ingestion.

The organization and neatness around grouping various processors into separate pipelines seems compelling from a technical standpoint, but from an end-user perspective I worry it'd be noisy and needlessly complex. I'm in favor of keeping things simple here, especially for our MVP implementation.

Unless there are any objections, I'll update the description here to capture the ordering of the @custom pipeline processor and package-sourced routing rules as an explicit requirement.

gsantoro · 2023-06-12T10:25:20Z

Hey @kpollich, if I understand correctly, this comment

# Ensures agents have permissions to write data to `logs-nginx.*-*`

should be changed to

# Ensures agents have permissions to write data to `logs-*-*`

since the dataset is dynamic.

Can you please confirm? I am adding the same comment in my connected PR. Thanks.

jlind23 · 2023-06-23T11:52:58Z

@nchaulet once you start working on this, it would be great to kick off an initial call with the observability folks in order to get a broad understanding of this.

kpollich mentioned this issue Apr 26, 2023

[Fleet] Support for document-based routing via ingest pipelines #151898

Closed

botelastic bot added the needs-team label Apr 26, 2023

kpollich added the Team:Fleet label and removed the needs-team label Apr 26, 2023

kpollich changed the title ~~[Fleet] Support routing rules during integration installation~~ [Fleet] Support localrouting rules during integration installation May 11, 2023

kpollich changed the title ~~[Fleet] Support localrouting rules during integration installation~~ [Fleet] Support local routing rules during integration installation May 11, 2023

This was referenced May 11, 2023

[Fleet] Add support for routing rules in integrations elastic/package-spec#514

Closed

[Fleet] Support injected routing rules during integration installation #157422

Open

jlind23 assigned criamico May 22, 2023

gsantoro mentioned this issue May 26, 2023

Allow routing for integrations that are not input packages elastic/integrations#6340

Merged

4 tasks

jlind23 assigned nchaulet and unassigned criamico Jun 14, 2023

jlind23 mentioned this issue Jun 23, 2023

[Fleet] Document how routing rules can be used elastic/ingest-docs#276

Open

nchaulet mentioned this issue Jul 10, 2023

[Fleet] Support local routing rules #161573

Merged

3 tasks

nchaulet closed this as completed in #161573 Jul 12, 2023

felixbarny mentioned this issue Jul 13, 2023

Allow users to specify dataset and namespace for rerouting data collected by kubernetes.container_logs datastream elastic/integrations#6845

Closed

felixbarny mentioned this issue Jul 24, 2023

[Kubernetes] Reroute container logs based on pod annotations elastic/integrations#7118

Merged

14 tasks
