Support for extracting attributes/labels from log body #14938

Closed
lsampras opened this issue Oct 13, 2022 · 11 comments
Labels
enhancement (New feature or request), exporter/loki (Loki Exporter), processor/attributes (Attributes processor), processor/transform (Transform processor)

Comments

@lsampras

Is your feature request related to a problem? Please describe.

I want to set Loki labels for my logs based on a key-value pair in my JSON-formatted logs.
The collector already provides an option to set Loki labels from attributes, but I'm not able to extract a value from the log body into an attribute.

For example, assuming the logs look something like:

{
    "message":"Adding oranges to basket",
    "service":"my-service-1",
    "log_type":"orange_log"
    ...
},
{
    "message":"Peeling apples for Milkshake",
    "service":"my-service-1",
    "log_type":"apple_log"
    ...
}

I want to extract log_type and set it as a Loki label.

Promtail supports this behaviour via its pipeline_stages configuration:

  pipeline_stages:
  - json:
      expressions:
        log_type:
  - labels:
      log_type:

Describe the solution you'd like

I see that we already have the attributes processor, which supports setting Loki labels and adding context-based or static attributes. We also have the transform processor, which could be extended to read/parse log bodies.

I think it would be appropriate to extend one of these processors with this behaviour.

I'm not sure, though, whether this requirement is Loki-specific (since most logging solutions index the body and don't need an explicit label) and should instead be implemented in the Loki exporter.
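For reference, the label-from-attribute mechanism I mean is the loki.attribute.labels hint read by the Loki exporter, which can be set via the attributes processor. A minimal sketch (a fuller config appears later in this thread):

processors:
  attributes:
    actions:
      - action: insert
        key: loki.attribute.labels
        value: log_type

What is missing is a way to populate attributes["log_type"] from the JSON body in the first place.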

Describe alternatives you've considered

For now, the alternatives seem to be either not adding these labels (which would lead to inefficient queries in Loki) or using another tool (maybe Kafka) to handle this processing.

Additional context

No response

@lsampras added the enhancement (New feature or request) and needs triage (New item requiring triage) labels on Oct 13, 2022
@evan-bradley added the processor/attributes (Attributes processor), processor/transform (Transform processor), exporter/loki (Loki Exporter), and pkg/stanza labels and removed the needs triage (New item requiring triage) label on Oct 14, 2022
@github-actions
Contributor

Pinging code owners: @gramidt @gouthamve @jpkrohling @kovrus @mar4uk. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners: @boostchicken. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

Pinging code owners: @djaglowski. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@lsampras
Author

lsampras commented Oct 19, 2022

I've added a minimal version here (#15282) which modifies the transformprocessor for this behaviour (and satisfies my use case).
It is still far from complete and will need more direction/consideration before we choose a solution and cover all cases from a feature standpoint.

@djaglowski
Member

@lsampras, please take a look at #9410.

We have some fairly robust capabilities to parse and manipulate logs, but only in receivers that use pkg/stanza. The issue above is an attempt to spec out the rough interface that transformprocessor would need to implement in order to support equivalent functionality. The main thing to note about this implementation is that we should separate "parsing" from further manipulation.

In the case of plucking a value from a JSON log, I think we should have a function that parses the JSON and assigns the resulting object to a specified field. Separately, we should have a function that moves values from one field to another. This kind of design allows us to compose functions.
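A rough sketch of what that composition could look like in transformprocessor configuration (purely illustrative; the ParseJSON, set, and delete_key functions and the log_statements config keys are assumptions here, not a settled design):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # parse the JSON body and assign the resulting object to a field
          - set(attributes["parsed"], ParseJSON(body))
          # move the value we want from that field to its final attribute
          - set(attributes["log_type"], attributes["parsed"]["log_type"])
          # clean up the intermediate field
          - delete_key(attributes, "parsed")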

@jpkrohling
Member

cc @mar4uk and @kovrus: I think this belongs to a processor instead of to the Loki exporter, but I wonder what your opinions are.

@kovrus
Member

kovrus commented Oct 20, 2022

cc @mar4uk and @kovrus: I think this belongs to a processor instead of to the Loki exporter, but I wonder what your opinions are.

Agree, it should not be part of the Loki exporter's scope, but of the processor components.

@mar4uk
Contributor

mar4uk commented Nov 10, 2022

After merging the promtail receiver, pipeline_stages will be available the same way they are available in promtail itself.
I just checked it and it works. Here is the config:

receivers:
  promtail:
    config:
      scrape_configs:
        - job_name: loki_push
          loki_push_api:
            server:
              http_listen_port: 3101
              grpc_listen_port: 3600
            labels:
              pushserver: push1
            use_incoming_timestamp: true
          pipeline_stages:
            - json:
                expressions:
                  http_code: http_code
            - labels:
                http_code:

      target_config:
        sync_period: 10s

processors:
  attributes:
    actions:
      - action: insert
        key: loki.attribute.labels
        value: http_code

exporters:
  loki:
    endpoint: http://localhost:3100/loki/api/v1/push

service:
  pipelines:
    logs:
      receivers: [ promtail ]
      processors: [ attributes ]
      exporters: [ loki ]

So when using the promtail receiver -> loki exporter, labels will work out of the box.
If another receiver is used, then I agree that extraction belongs to a processor. But which one? transformprocessor? logstransformprocessor?

@djaglowski
Member

If another receiver is used then I agree that extraction belongs to a processor. But which one? transformprocessor? logstransformprocessor?

It is intended that logstransformprocessor will be deprecated as soon as its functionality is available in transformprocessor, so I would recommend that any new functionality be added to transformprocessor.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions bot added the Stale label on Jan 11, 2023
codeboten pushed a commit that referenced this issue Jan 18, 2023
We've had discussion recently around how to handle functions and converters (previously factory functions) that share the same functionality (see #16571)

This PR suggests a different approach that should reduce the need to chain converters/functions.

Previously we chained converters together because OTTL had no place to store data except the telemetry payload itself. While attributes could be used, that results in the need for cleanup and potentially unwanted transformations depending on condition resolution. This PR introduces the concept of tmp in the logs context (and future contexts if we like the solution) that statements can use as a "staging" location for complex transformations.

Previously, to handle situations like the one in #14938, we would have to chain functions together. Pretending we had a KeepKeys converter, the statement would look like

merge_maps(attributes, KeepKeys(ParseJSON(body), ["log_type"]), "upsert").

These types of statements are tricky to write and can be difficult to comprehend, especially for new users. Each Converter we add on increases the burden. Adding a single extra function, like Flatten, really makes a difference: merge_maps(attributes, Flatten(KeepKeys(ParseJSON(body), ["log_type"]), "."), "upsert")
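For comparison, with such a staging field the same transformation could be split into simpler statements (a sketch, assuming the tmp field described above and the function names used elsewhere in this message):

- merge_maps(tmp, ParseJSON(body), "upsert")
- keep_keys(tmp, ["log_type"])
- merge_maps(attributes, tmp, "upsert")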

Co-authored-by: Evan Bradley <[email protected]>
@TylerHelmuth
Member

TylerHelmuth commented Jan 19, 2023

@lsampras this capability should be available in the next release of the collector via the ParseJSON, merge_maps, and keep_keys/delete_key functions and the cache path accessor. Since we have the bare-minimum functionality, I am going to close this issue. If you think that's not the case, please ping me and we can reopen. If you have related enhancements you'd like to see in OTTL/transformprocessor for logs, please open a new issue.
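For anyone landing here later, a minimal sketch of the original use case with those pieces (illustrative only; exact transformprocessor config keys and function signatures may vary between collector versions):

processors:
  transform:
    log_statements:
      - context: log
        statements:
          # parse the JSON body into the cache staging map
          - merge_maps(cache, ParseJSON(body), "upsert")
          # copy the value we want into a log attribute
          - set(attributes["log_type"], cache["log_type"])
  attributes:
    actions:
      # hint telling the loki exporter to promote the attribute to a label
      - action: insert
        key: loki.attribute.labels
        value: log_type

The transform processor would run before the attributes processor in the logs pipeline, mirroring the attributes + loki exporter setup shown earlier in this thread.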
