
out_opentelemetry: make add_label have an effect on logs #7029

Closed · wants to merge 4 commits

Conversation

@Frefreak (Author)

This patch makes the add_label option in the opentelemetry output plugin take effect on logs data; previously only metrics data got the labels added. This is useful because logs can then be easily filtered by their labels (provided different label configurations are set for different workloads).

The labels (key-value pairs) are added to the OpenTelemetry logs data as Resource attributes (not LogRecord attributes), so multiple LogRecords can share the same key-value pairs.


Testing
Before we can approve your change, please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Configuration

service:
  flush: 3
  log_level: debug
  http_server: on

pipeline:
  inputs:
    - name: tail
      path: /tmp/asdf.txt

  outputs:
    - name: opentelemetry
      match: '*'
      host: localhost
      port: 4318
      # log_response_payload: true
      add_label: app fluent-bit
      add_label: color blue
    - name: stdout
      match: '*'

Debug/Valgrind log output

==673768== Memcheck, a memory error detector
==673768== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==673768== Using Valgrind-3.19.0 and LibVEX; rerun with -h for copyright info
==673768== Command: bin/fluent-bit -c /home/adv_zxy/test-fluent-bit/fluent-bit-conf.yaml
==673768==
Fluent Bit v2.1.0
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/03/17 16:25:31] [ info] Configuration:
[2023/03/17 16:25:31] [ info]  flush time     | 3.000000 seconds
[2023/03/17 16:25:31] [ info]  grace          | 5 seconds
[2023/03/17 16:25:31] [ info]  daemon         | 0
[2023/03/17 16:25:31] [ info] ___________
[2023/03/17 16:25:31] [ info]  inputs:
[2023/03/17 16:25:31] [ info]      tail
[2023/03/17 16:25:31] [ info] ___________
[2023/03/17 16:25:31] [ info]  filters:
[2023/03/17 16:25:31] [ info] ___________
[2023/03/17 16:25:31] [ info]  outputs:
[2023/03/17 16:25:31] [ info]      opentelemetry.0
[2023/03/17 16:25:31] [ info]      stdout.1
[2023/03/17 16:25:31] [ info] ___________
[2023/03/17 16:25:31] [ info]  collectors:
[2023/03/17 16:25:32] [ info] [fluent bit] version=2.1.0, commit=2467d5bb01, pid=673768
[2023/03/17 16:25:32] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
[2023/03/17 16:25:32] [ info] [output:stdout:stdout.1] worker #0 started
[2023/03/17 16:25:32] [ info] [storage] ver=1.4.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2023/03/17 16:25:32] [ info] [cmetrics] version=0.5.8
[2023/03/17 16:25:32] [ info] [ctraces ] version=0.3.0
[2023/03/17 16:25:32] [ info] [input:tail:tail.0] initializing
[2023/03/17 16:25:32] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2023/03/17 16:25:32] [debug] [tail:tail.0] created event channels: read=37 write=38
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] flb_tail_fs_inotify_init() initializing inotify tail input
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] inotify watch fd=43
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] scanning path /tmp/asdf.txt
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] inode=22557 with offset=831 appended as /tmp/asdf.txt
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] scan_glob add(): /tmp/asdf.txt, inode 22557
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] 1 new files found on path '/tmp/asdf.txt'
[2023/03/17 16:25:32] [debug] [opentelemetry:opentelemetry.0] created event channels: read=45 write=46
[2023/03/17 16:25:32] [debug] [stdout:stdout.1] created event channels: read=47 write=48
[2023/03/17 16:25:32] [debug] [router] match rule tail.0:opentelemetry.0
[2023/03/17 16:25:32] [debug] [router] match rule tail.0:stdout.1
[2023/03/17 16:25:32] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2023/03/17 16:25:32] [ info] [sp] stream processor started
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] inode=22557 file=/tmp/asdf.txt promote to TAIL_EVENT
[2023/03/17 16:25:32] [ info] [input:tail:tail.0] inotify_fs_add(): inode=22557 watch_fd=1 name=/tmp/asdf.txt
[2023/03/17 16:25:32] [debug] [input:tail:tail.0] [static files] processed 0b, done
[2023/03/17 16:25:37] [debug] [input:tail:tail.0] inode=22557, /tmp/asdf.txt, events: IN_MODIFY
[2023/03/17 16:25:37] [debug] [input chunk] update output instances with new chunk size diff=59
[0] tail.0: [1679041537.238717626, {"log"=>"{"key 1": "123k6789", "key 2": "abcdefg"}"}]
[2023/03/17 16:25:38] [debug] [task] created task=0x561adb0 id=0 OK
[2023/03/17 16:25:38] [debug] [output:stdout:stdout.1] task_id=0 assigned to thread #0
[2023/03/17 16:25:38] [debug] [out flush] cb_destroy coro_id=0
[2023/03/17 16:25:38] [debug] [upstream] KA connection #79 to localhost:4318 is connected
[2023/03/17 16:25:38] [debug] [http_client] not using http_proxy for header
[2023/03/17 16:25:38] [ info] [output:opentelemetry:opentelemetry.0] localhost:4318, HTTP status=200
[2023/03/17 16:25:38] [debug] [upstream] KA connection #79 to localhost:4318 is now available
[2023/03/17 16:25:38] [debug] [out flush] cb_destroy coro_id=0
[2023/03/17 16:25:38] [debug] [task] destroy task=0x561adb0 (task_id=0)
^C[2023/03/17 16:25:39] [engine] caught signal (SIGINT)
[2023/03/17 16:25:39] [ warn] [engine] service will shutdown in max 5 seconds
[2023/03/17 16:25:39] [ info] [input] pausing tail.0
[2023/03/17 16:25:40] [ info] [engine] service has stopped (0 pending tasks)
[2023/03/17 16:25:40] [ info] [input] pausing tail.0
[2023/03/17 16:25:40] [debug] [input:tail:tail.0] inode=22557 removing file name /tmp/asdf.txt
[2023/03/17 16:25:40] [ info] [input:tail:tail.0] inotify_fs_remove(): inode=22557 watch_fd=1
[2023/03/17 16:25:40] [ info] [output:stdout:stdout.1] thread worker #0 stopping...
[2023/03/17 16:25:40] [ info] [output:stdout:stdout.1] thread worker #0 stopped
==673768==
==673768== HEAP SUMMARY:
==673768==     in use at exit: 0 bytes in 0 blocks
==673768==   total heap usage: 12,895 allocs, 12,895 frees, 2,020,137 bytes allocated
==673768==
==673768== All heap blocks were freed -- no leaks are possible
==673768==
==673768== For lists of detected and suppressed errors, rerun with: -s
==673768== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

OpenTelemetry screenshot

Below is the OpenTelemetry Collector's logging exporter output; the labels can be seen in the "Resource" section.
[screenshot omitted]


Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

@Frefreak (Author)

Is there anything I'm missing? The two workflows seem to be stuck on "Waiting for status to be reported".

@patrick-stephens (Contributor)

No, it's awaiting approval until we allow it, since you're a new contributor; otherwise it opens the door to people submitting malicious PRs just to execute workloads or try to expose secrets.

@patrick-stephens (Contributor)

Docs are required to be updated as well; they currently make it clear that labels only apply to metrics:

This allows you to add custom labels to all metrics exposed through the OpenTelemetry exporter. You may have multiple of these fields

Please provide a docs PR as well and link it to this PR.
https://github.com/fluent/fluent-bit-docs/blob/master/pipeline/outputs/opentelemetry.md

@Frefreak (Author)

Thanks for the clarification.

Docs added: fluent/fluent-bit-docs#1060

@Frefreak (Author) commented Apr 3, 2023

Gentle bump for a review.

@amitava82

Can we please get this reviewed? I'm looking for a way to add a Resource attribute so that I can add service.name.

@leonardo-albertovich (Collaborator) left a comment
Overall I'd prefer this to be implemented in a holistic manner with the log event attribute facility, which would also entail adding support for it to processor_logs, but considering the invested effort and timing this could be good enough to make the feature available.

@@ -723,6 +731,37 @@ static int flush_to_otel(struct opentelemetry_context *ctx,
    resource_log.n_scope_logs = 1;
    resource_logs[0] = &resource_log;

    kv_size = mk_list_size(&(ctx->kv_labels));
    attributes_list = flb_calloc(kv_size, sizeof(Opentelemetry__Proto__Common__V1__KeyValue *));
Collaborator: The code that depends on kv_size holding a value higher than zero should be conditional so it doesn't run unless necessary.
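A minimal sketch of the kind of guard being asked for, assuming the variables already used in flush_to_otel() and whatever error value the function's existing failure paths return:

    kv_size = mk_list_size(&(ctx->kv_labels));
    if (kv_size > 0) {
        /* only build the label attributes when add_label entries exist */
        attributes_list = flb_calloc(kv_size, sizeof(Opentelemetry__Proto__Common__V1__KeyValue *));
        if (attributes_list == NULL) {
            flb_errno();
            return -1; /* assumed: reuse the function's existing error value */
        }
        /* ... fill the attributes and attach them to resource_log.resource ... */
    }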

    }
    attributes = flb_calloc(kv_size, sizeof(Opentelemetry__Proto__Common__V1__KeyValue));
    if (attributes == NULL) {
        flb_errno();
Collaborator: attributes_list is leaked in this code path.
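A sketch of one way to close the leak: release the pointer array allocated just above before taking the early return (error value assumed, as above):

    attributes = flb_calloc(kv_size, sizeof(Opentelemetry__Proto__Common__V1__KeyValue));
    if (attributes == NULL) {
        flb_errno();
        flb_free(attributes_list); /* free the previously allocated pointer array */
        return -1; /* assumed error value */
    }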

        kv = mk_list_entry(kv_head, struct flb_kv, _head);
        opentelemetry__proto__common__v1__key_value__init(&attributes[kv_index]);
        attributes[kv_index].key = kv->key;
        attributes[kv_index].value = otlp_any_value_initialize(MSGPACK_OBJECT_STR, 0);
Collaborator: We need to verify the result of otlp_any_value_initialize here and have a proper cleanup sequence.
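A rough sketch of the requested check; how each already-initialized value must be torn down depends on what otlp_any_value_initialize allocates internally, so the per-value flb_free below is only a placeholder for the proper destructor:

        attributes[kv_index].value = otlp_any_value_initialize(MSGPACK_OBJECT_STR, 0);
        if (attributes[kv_index].value == NULL) {
            /* unwind entries built in earlier iterations */
            while (kv_index > 0) {
                kv_index--;
                flb_free(attributes[kv_index].value); /* placeholder for the real value destructor */
            }
            flb_free(attributes);
            flb_free(attributes_list);
            return -1; /* assumed error value */
        }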

Collaborator: Maybe this loop could be merged with the one above.

        kv_index++;
    }

    resource_log.resource = flb_calloc(1, sizeof(Opentelemetry__Proto__Resource__V1__Resource));
Collaborator: We need to check the result of flb_calloc here and have the appropriate cleanup sequence as well.
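A similar sketch for the Resource allocation, reusing the same cleanup idea (names and error value assumed as above):

    resource_log.resource = flb_calloc(1, sizeof(Opentelemetry__Proto__Resource__V1__Resource));
    if (resource_log.resource == NULL) {
        flb_errno();
        /* release attributes_list, attributes and any values initialized above */
        flb_free(attributes);
        flb_free(attributes_list);
        return -1; /* assumed error value */
    }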

@Frefreak (Author) commented Jun 26, 2023

Thanks for the comments, I will make the modifications later (probably tomorrow).

Signed-off-by: Xiangyu Zhu <[email protected]>
@Frefreak (Author)

Hi all, could we please get this reviewed again?

@github-actions (Contributor)

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Oct 18, 2023
@SHaaD94 commented Jan 10, 2024

Hi guys.
I really need this change as well. Is anyone working on it right now? Any way I could help?

@Frefreak (Author)

As far as I understand, this needs to get reviewed in order to proceed (I have made the modifications in response to the comments). If there's anything else that needs to be changed, I'm willing to modify accordingly.

@github-actions github-actions bot removed the Stale label Jan 11, 2024
@github-actions (Contributor)

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the Stale label Apr 12, 2024
@omri-shilton

Any news on this?

@github-actions github-actions bot removed the Stale label Jun 5, 2024
@edsiper (Member) commented Jun 25, 2024

There are a couple of recent OTel changes in Fluent Bit, specifically around Logs handling in the input and output. Those changes will be available in the v3.1 release; some references:

@edsiper edsiper closed this Jun 25, 2024
@azat commented Nov 20, 2024

Hi @edsiper, add_label still looks useful, since processors have to be specified on a per-input basis (though maybe I'm missing something).

@azat commented Nov 20, 2024

Never mind, processors can be specified on an output as well.

Maybe it will be useful for someone:

outputs:
- name: opentelemetry
  match: "*"
  port: 4318
  processors:
    logs:
    - name: opentelemetry_envelope
    - name: content_modifier
      context: otel_resource_attributes
      action: insert
      key: service.name
      value: foobar
    - name: content_modifier
      context: otel_resource_attributes
      action: insert
      key: service.label
      value: "${SERVICE_LABEL}"
