[Fleet] Support local routing rules during integration installation #155910

Closed
5 tasks done
Tracked by #151898
kpollich opened this issue Apr 26, 2023 · 4 comments · Fixed by #161573
Labels
Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@kpollich
Member

kpollich commented Apr 26, 2023

Ref elastic/package-spec#514

When integrations are installed, Fleet should honor any local routing rules defined by the integration. This means that when an integration defines a "catch-all" dataset that routes data to that integration's own data streams, the resulting ingest pipeline for the "catch-all" dataset should contain the corresponding reroute processors.

For example, when an nginx data stream is defined as follows:

# nginx/data_stream/nginx/manifest.yml
title: Nginx logs
type: logs

# This is a catch-all "sink" data stream that routes documents to 
# other datasets based on conditions or variables
dataset: nginx

# Ensures agents have permissions to write data to `logs-nginx.*-*`
elasticsearch.dynamic_dataset: true
elasticsearch.dynamic_namespace: true

routing_rules:
  # "Local" routing rules are included under this current dataset, not a special case
  nginx:
    # Route error logs to `nginx.error` when they're sourced from an error logfile
    - dataset: nginx.error
      if: "ctx?.file?.path?.contains('/var/log/nginx/error')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

    # Route access logs to `nginx.access` when they're sourced from an access logfile
    - dataset: nginx.access
      if: "ctx?.file?.path?.contains('/var/log/nginx/access')"
      namespace:
        - "{{labels.data_stream.namespace}}"
        - default

A resulting logs-nginx-1.2.3 ingest pipeline should be generated that includes the following processors:

{
    "processors": [
        {
            "pipeline": {
                "name": "logs-nginx@custom"
            }
        },
        {
            "reroute": {
                "tag": "logs-nginx",
                "dataset": "nginx.error",
                "namespace": [
                    "{{labels.data_stream.namespace}}",
                    "default"
                ],
                "if": "ctx?.file?.path?.contains('/var/log/nginx/error')"
            }
        },
        {
            "reroute": {
                "tag": "logs-nginx",
                "dataset": "nginx.access",
                "namespace": [
                    "{{labels.data_stream.namespace}}",
                    "default"
                ],
                "if": "ctx?.file?.path?.contains('/var/log/nginx/access')"
            }
        }
    ]
}
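
To make the namespace list in these reroute processors concrete, here is a minimal Python sketch of how such a list might resolve to a target data stream. It assumes that entries are tried in order, that a {{field.reference}} entry is skipped when the field is missing or not a string, and that the document stores labels as nested objects. The helper names (lookup, resolve, target_data_stream) are hypothetical and are not part of Fleet or Elasticsearch.

```python
import re

# Matches a whole-string field reference such as "{{labels.data_stream.namespace}}".
FIELD_REFERENCE = re.compile(r"^\{\{(.+)\}\}$")


def lookup(doc, dotted_path):
    """Walk a dotted field path through nested dicts; return None if absent."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def resolve(candidates, doc):
    """Return the first candidate that yields a non-empty string, else None."""
    for candidate in candidates:
        match = FIELD_REFERENCE.match(candidate)
        value = lookup(doc, match.group(1)) if match else candidate
        if isinstance(value, str) and value:
            return value
    return None


def target_data_stream(doc, dataset, namespaces, stream_type="logs"):
    """Build the <type>-<dataset>-<namespace> name a reroute would point at."""
    return f"{stream_type}-{dataset}-{resolve(namespaces, doc)}"


namespaces = ["{{labels.data_stream.namespace}}", "default"]
with_label = {"labels": {"data_stream": {"namespace": "production"}}}
without_label = {"file": {"path": "/var/log/nginx/error.log"}}

print(target_data_stream(with_label, "nginx.error", namespaces))
# logs-nginx.error-production
print(target_data_stream(without_label, "nginx.error", namespaces))
# logs-nginx.error-default
```

The static "default" entry at the end of the list acts as the fallback when the document carries no namespace label.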

Note: This issue does not include the support of "injected" routing rules that are defined on non-local datasets. That will be implemented in a follow-up captured by #157422.

Acceptance Criteria

  • For each routing_rule defined in a data stream manifest, a corresponding reroute processor is generated on the resulting ingest pipeline that routes to the dataset value provided on the rule
  • Ensure the @custom pipeline processor is called BEFORE the reroute processors we generate - this ensures user customizations and routing overrides can be provided if needed
  • Each processor includes a tag value set to the name of the data stream from which the routing rule was sourced
  • Each processor includes the namespace and if values provided by the rule
  • Only "local" rules are supported; if a rule is defined for a non-local dataset, it is ignored
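
The criteria above can be sketched roughly as follows. This is an illustration in Python, not the Fleet implementation: build_processors is a hypothetical helper, the manifest is assumed to be already parsed into a dict, and the rule keyed under "apache" is a made-up non-local rule included only to show that it is ignored.

```python
import json


def build_processors(stream_type, dataset, routing_rules):
    """Return the @custom call followed by one reroute per local rule."""
    tag = f"{stream_type}-{dataset}"
    # The @custom pipeline goes first so user overrides run before package rules.
    processors = [{"pipeline": {"name": f"{tag}@custom"}}]
    # Only rules keyed by this data stream's own dataset are "local".
    for rule in routing_rules.get(dataset, []):
        processors.append(
            {
                "reroute": {
                    "tag": tag,
                    "dataset": rule["dataset"],
                    "namespace": rule["namespace"],
                    "if": rule["if"],
                }
            }
        )
    return processors


namespaces = ["{{labels.data_stream.namespace}}", "default"]
routing_rules = {
    "nginx": [
        {
            "dataset": "nginx.error",
            "if": "ctx?.file?.path?.contains('/var/log/nginx/error')",
            "namespace": namespaces,
        },
        {
            "dataset": "nginx.access",
            "if": "ctx?.file?.path?.contains('/var/log/nginx/access')",
            "namespace": namespaces,
        },
    ],
    # Hypothetical non-local rule: skipped by this sketch.
    "apache": [
        {"dataset": "nginx.access", "if": "ctx?.example == true", "namespace": namespaces}
    ],
}

processors = build_processors("logs", "nginx", routing_rules)
print(json.dumps({"processors": processors}, indent=4))
```

For the nginx manifest shown earlier, this yields the @custom pipeline call followed by the two reroute processors, in the order the rules were declared.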
@botelastic botelastic bot added the needs-team Issues missing a team label label Apr 26, 2023
@kpollich kpollich added Team:Fleet Team label for Observability Data Collection Fleet team and removed needs-team Issues missing a team label labels Apr 26, 2023
@kpollich kpollich changed the title [Fleet] Support routing rules during integration installation [Fleet] Support localrouting rules during integration installation May 11, 2023
@kpollich kpollich changed the title [Fleet] Support localrouting rules during integration installation [Fleet] Support local routing rules during integration installation May 11, 2023
@felixbarny
Member

A resulting logs-nginx-1.2.3 ingest pipeline should be generated that includes the following processors

We need to discuss the names and the order of the different ingest pipelines. Data streams managed by Fleet have a logs-<integration>-<version> pipeline that's structured like this:

  1. Integration-specific processing
  2. invoke <package>@custom

It seems like the suggestion is to add routing rules directly to the default pipeline of the integration. While this is possible, there are a few things to watch out for when it comes to ordering and we should at least discuss whether to create dedicated routing pipelines.

Option 1: put package-defined routing rules in the default pipeline
Basically this is what's currently proposed. However, we'll want to make sure the user can add routing rules with a higher precedence than the package-provided rules. Therefore, the <package>@custom pipeline needs to be invoked before the package-provided routing rules.

  1. Integration-provided processors...
  2. invoke <package>@custom
  3. Integration-provided routing rules...
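
A small Python simulation of why this ordering gives user rules precedence. It assumes that the first reroute whose condition matches ends processing, including the remainder of a calling pipeline, which is my reading of the reroute processor's documented behavior. The run_pipeline helper, the Python callables standing in for Painless conditions, and the nginx.error_audit dataset are all hypothetical.

```python
def run_pipeline(processors, doc, pipelines):
    """Run processors in order and return the first reroute target, or None."""
    for processor in processors:
        if "pipeline" in processor:
            # A missing nested pipeline is treated as empty in this sketch.
            nested = pipelines.get(processor["pipeline"], [])
            target = run_pipeline(nested, doc, pipelines)
            if target is not None:
                # A reroute inside the nested pipeline also skips the caller.
                return target
        elif "reroute" in processor:
            rule = processor["reroute"]
            if rule["when"](doc):
                return rule["dataset"]
    return None


package_rules = [
    {"reroute": {"dataset": "nginx.error",
                 "when": lambda d: "/var/log/nginx/error" in d["path"]}},
    {"reroute": {"dataset": "nginx.access",
                 "when": lambda d: "/var/log/nginx/access" in d["path"]}},
]
# Option 1 ordering: @custom first, then the package-provided routing rules.
default_pipeline = [{"pipeline": "logs-nginx@custom"}] + package_rules

doc = {"path": "/var/log/nginx/error.log"}

# Without a user-defined @custom pipeline, the package rule applies.
print(run_pipeline(default_pipeline, doc, {}))
# nginx.error

# A user rule in @custom matches first and overrides the package rule.
user_pipelines = {
    "logs-nginx@custom": [
        {"reroute": {"dataset": "nginx.error_audit",
                     "when": lambda d: d["path"].endswith("error.log")}},
    ]
}
print(run_pipeline(default_pipeline, doc, user_pipelines))
# nginx.error_audit
```

If the @custom call came after the package rules instead, the package's nginx.error rule would match first and the user rule would never run.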

Option 2: dedicated pipeline for package-provided routing rules

We could put all package-provided routing rules in a dedicated pipeline. This may make it easier to show all package-provided routing rules in a routing rules UI.

  1. Integration-provided processors...
  2. invoke <package>@custom
  3. invoke <package>@routing

Option 2a: dedicated pipeline for custom routing rules

Similar to the above, a custom pipeline that's dedicated to just routing rules might make it easier for a routing rules UI to display and add user-provided routing rules. The downside is that this UI would then rely on many assumptions about pipeline names, and there may be routing rules in other pipelines that it would either not show or show only as read-only.

  1. Integration-provided processors...
  2. invoke <package>@custom
  3. invoke <package>@custom_routing
  4. invoke <package>@routing

@kpollich
Member Author

@felixbarny - Thanks for providing that summary. I think leaving room for user customizations is definitely an important consideration here. However, this issue only concerns the "package-sourced" rules implementation. I think it's good to future-proof our plans here to make sure we're not going to collide with any follow-up implementations around user-provided routing rules.

To that end, I think Option 1 that you've listed is still the most compelling to me. Introducing new pipelines and naming schemes means we need to carefully document and write tutorials for these customization practices, whereas by simply appending routing rules after the existing pipeline: <package>@custom call, this implementation becomes purely additive.

One thing I want to avoid is a user opening their integration's pipeline definition, being overwhelmed by how many specially named processors, pipelines, etc. exist, and not being able to quickly add customizations. The simplicity of a single @custom pipeline is very compelling to me here, and reduces the barrier to entry around adding customizations to ingestion.

The organization and neatness around grouping various processors into separate pipelines seems compelling from a technical standpoint, but from an end-user perspective I worry it'd be noisy and needlessly complex. I'm in favor of keeping things simple here, especially for our MVP implementation.

Unless there are any objections, I'll update the description here to capture the ordering of the @custom pipeline processor and package-sourced routing rules as an explicit requirement.

@gsantoro
Contributor

gsantoro commented Jun 12, 2023

Hey @kpollich, if I understand correctly, this comment

# Ensures agents have permissions to write data to `logs-nginx.*-*`

should be changed to

# Ensures agents have permissions to write data to `logs-*-*`

since the dataset is dynamic.

Can you please confirm that? I am adding the same comment in my connected PR. Thanks.
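
For reference, a quick Python check of what each of the two patterns would cover. This only approximates index-pattern wildcards with the standard library's fnmatch, and it does not settle which permission Fleet actually grants. The data stream names, including the apache one, are illustrative.

```python
from fnmatch import fnmatchcase

data_streams = [
    "logs-nginx.error-default",
    "logs-nginx.access-default",
    "logs-nginx-default",
    "logs-apache.access-default",  # hypothetical target outside the package
]

for pattern in ("logs-nginx.*-*", "logs-*-*"):
    # Keep only the names that the wildcard pattern matches.
    matched = [name for name in data_streams if fnmatchcase(name, pattern)]
    print(pattern, "->", matched)
# logs-nginx.*-* -> ['logs-nginx.error-default', 'logs-nginx.access-default']
# logs-*-* -> ['logs-nginx.error-default', 'logs-nginx.access-default', 'logs-nginx-default', 'logs-apache.access-default']
```

The narrower pattern covers only datasets prefixed with nginx., while the broader one covers any dataset and namespace.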

@jlind23
Contributor

jlind23 commented Jun 23, 2023

@nchaulet once you start working on this, it would be great to kick off an initial call with the observability folks in order to get a broad understanding of this.

6 participants