
Fluent-Bit Crashes when using opentelemetry logs output (version 2.0.6) #6512

Closed
abeach-nr opened this issue Dec 6, 2022 · 15 comments · Fixed by #6583
Bug Report

Fluent Bit crashes when using the opentelemetry output plugin, but only with logs. If the dummy input is omitted, it does not crash.

fluent-bit-fluent-bit-1     | Fluent Bit v2.0.6
fluent-bit-fluent-bit-1     | * Copyright (C) 2015-2022 The Fluent Bit Authors
fluent-bit-fluent-bit-1     | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
fluent-bit-fluent-bit-1     | * https://fluentbit.io
fluent-bit-fluent-bit-1     | 
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] Configuration:
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  flush time     | 1.000000 seconds
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  grace          | 5 seconds
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  daemon         | 0
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] ___________
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  inputs:
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]      node_exporter_metrics
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]      dummy
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] ___________
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  filters:
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] ___________
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  outputs:
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]      opentelemetry.0
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] ___________
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info]  collectors:
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [fluent bit] version=2.0.6, commit=211f841fe2, pid=1
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [engine] coroutine stack size: 24576 bytes (24.0K)
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [storage] created root path /fluent-bit/run/buffer
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [storage] ver=1.3.0, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [storage] backlog input plugin: storage_backlog.2
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [cmetrics] version=0.5.7
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [ctraces ] version=0.2.5
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] initializing
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] storage_strategy='memory' (memory only)
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [node_exporter_metrics:node_exporter_metrics.0] created event channels: read=21 write=22
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.procfs = /proc
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.sysfs  = /sys
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:dummy:dummy.1] initializing
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [dummy:dummy.1] created event channels: read=23 write=24
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ warn] [input:dummy:dummy.1] data is incomplete. Use default string.
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:storage_backlog:storage_backlog.2] initializing
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:storage_backlog:storage_backlog.2] storage_strategy='memory' (memory only)
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [storage_backlog:storage_backlog.2] created event channels: read=25 write=26
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [ info] [input:storage_backlog:storage_backlog.2] queue memory limit: 95.4M
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [opentelemetry:opentelemetry.0] created event channels: read=27 write=28
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [router] match rule node_exporter_metrics.0:opentelemetry.0
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [router] match rule dummy.1:opentelemetry.0
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:58] [debug] [router] match rule storage_backlog.2:opentelemetry.0
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:59] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:59] [ info] [sp] stream processor started
fluent-bit-fluent-bit-1     | [2022/12/06 22:05:59] [debug] [input chunk] update output instances with new chunk size diff=26
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [task] created task=0x7fad3003e420 id=0 OK
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram0
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram1
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram2
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram3
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram4
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram5
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram6
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram7
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram8
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram9
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram10
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram11
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram12
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram13
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram14
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: ram15
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop0
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop1
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop2
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop3
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop4
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop5
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop6
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop7
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: vda1
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input chunk] update output instances with new chunk size diff=48502
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [debug] [input chunk] update output instances with new chunk size diff=26
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [engine] caught signal (SIGSEGV)
fluent-bit-fluent-bit-1     | [2022/12/06 22:06:00] [engine] caught signal (SIGSEGV)
fluent-bit-fluent-bit-1     | #0  0x7fad3710c5bd      in  ???() at ???:0
fluent-bit-fluent-bit-1     | #1  0x7fad3710eed0      in  ???() at ???:0
fluent-bit-fluent-bit-1     | #2  0x561d83f05f7b      in  flb_calloc() at include/fluent-bit/flb_mem.h:88
fluent-bit-fluent-bit-1     | #3  0x561d83f0ab9e      in  flush_to_otel() at plugins/out_opentelemetry/opentelemetry.c:668
fluent-bit-fluent-bit-1     | #4  0x561d83f0aed9      in  process_logs() at plugins/out_opentelemetry/opentelemetry.c:762
fluent-bit-fluent-bit-1     | #5  0x561d83f0bb4c      in  cb_opentelemetry_flush() at plugins/out_opentelemetry/opentelemetry.c:992
fluent-bit-fluent-bit-1     | #6  0x561d83b2b4ca      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:524
fluent-bit-fluent-bit-1     | #7  0x561d84564b06      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
fluent-bit-fluent-bit-1     | #8  0xffffffffffffffff  in  ???() at ???:0
fluent-bit-fluent-bit-1 exited with code 139

Steps to reproduce

Run docker-compose up with the following files:

docker-compose.yml

version: "3.7"

services:
  fluent-bit:
    image: fluent/fluent-bit:2.0.6
    volumes:
      - ./fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf
    depends_on:
      - opentelemetry

  opentelemetry:
    image: otel/opentelemetry-collector-contrib
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "1889:1888"   # pprof extension
      - "8893:8888"   # Prometheus metrics exposed by the collector
      - "8894:8889"   # Prometheus exporter metrics
      - "13134:13133" # health_check extension
      - "4322:4317"   # OTLP gRPC receiver
      - "4323:4318"   # OTLP http receiver
      - "55680:55679" # zpages extension

fluent-bit.conf

[SERVICE]
    Parsers_File    /fluent-bit/etc/parsers.conf
    HTTP_Server     On
    HTTP_Listen     0.0.0.0
    HTTP_Port       2020
    Health_Check    On
    storage.path    /fluent-bit/run/buffer
    storage.metrics on
    Log_Level       debug


[INPUT]
    Name                 node_exporter_metrics
    Tag                  node_metrics
    Scrape_interval      2


[INPUT]
    Name dummy
    Tag dummy.log
    Rate 1
    Dummy {"key": ["val1", 1, 2.3], "key2": "val2"}


[OUTPUT]
    Name                 opentelemetry
    Match                *
    Host                 opentelemetry
    Port                 4318
    Metrics_uri          /v1/metrics
    Logs_uri             /v1/logs
    Traces_uri           /v1/traces
    Log_response_payload True
    Tls                  Off
    Tls.verify           Off

otel-collector-config.yaml

receivers:
  otlp:
    protocols:
      grpc:
        include_metadata: true
      http:
        include_metadata: true

processors:
  batch:
  resource:
    attributes:
      - key: host.name
        value: "fluent bit"
        action: upsert

exporters:
  logging:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200

service:
  telemetry:
    logs:
      level: "debug"
      encoding: json
    metrics:
      level: detailed

  pipelines:
    metrics/multi:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [logging]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [logging]
@agup006 (Member) commented Dec 6, 2022

@Syn3rman @leonardo-albertovich would you be able to take a look?

@Syn3rman (Contributor) commented Dec 7, 2022

Didn't run into any issues with Docker on a Mac 😓

otel-docker-logs-fluent-bit-1     | Fluent Bit v2.0.6
otel-docker-logs-fluent-bit-1     | * Copyright (C) 2015-2022 The Fluent Bit Authors
otel-docker-logs-fluent-bit-1     | * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
otel-docker-logs-fluent-bit-1     | * https://fluentbit.io
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] Configuration:
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  flush time     | 1.000000 seconds
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  grace          | 5 seconds
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  daemon         | 0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] ___________
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  inputs:
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]      node_exporter_metrics
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]      dummy
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] ___________
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  filters:
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] ___________
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  outputs:
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]      opentelemetry.0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] ___________
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info]  collectors:
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [fluent bit] version=2.0.6, commit=211f841fe2, pid=1
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [engine] coroutine stack size: 196608 bytes (192.0K)
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [storage] ver=1.3.0, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [storage] backlog input plugin: storage_backlog.2
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [cmetrics] version=0.5.7
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [ctraces ] version=0.2.5
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] initializing
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] storage_strategy='memory' (memory only)
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [node_exporter_metrics:node_exporter_metrics.0] created event channels: read=21 write=22
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.procfs = /proc
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:node_exporter_metrics:node_exporter_metrics.0] path.sysfs  = /sys
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:dummy:dummy.1] initializing
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:dummy:dummy.1] storage_strategy='memory' (memory only)
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [dummy:dummy.1] created event channels: read=23 write=24
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:storage_backlog:storage_backlog.2] initializing
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:storage_backlog:storage_backlog.2] storage_strategy='memory' (memory only)
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [storage_backlog:storage_backlog.2] created event channels: read=25 write=26
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [input:storage_backlog:storage_backlog.2] queue memory limit: 95.4M
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [opentelemetry:opentelemetry.0] created event channels: read=27 write=28
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [router] match rule node_exporter_metrics.0:opentelemetry.0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [router] match rule dummy.1:opentelemetry.0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [router] match rule storage_backlog.2:opentelemetry.0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [ info] [sp] stream processor started
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:09] [debug] [input chunk] update output instances with new chunk size diff=42
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [task] created task=0xffff9803db80 id=0 OK
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop1
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop2
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop3
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop4
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop5
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop6
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: loop7
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input:node_exporter_metrics:node_exporter_metrics.0] skip device: vda1
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input chunk] update output instances with new chunk size diff=46863
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [input chunk] update output instances with new chunk size diff=42
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [http_client] not using http_proxy for header
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [ info] [output:opentelemetry:opentelemetry.0] opentelemetry:4318, HTTP status=200
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [out flush] cb_destroy coro_id=0
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:10] [debug] [task] destroy task=0xffff9803db80 (task_id=0)
otel-docker-logs-opentelemetry-1  | {"level":"info","ts":1670388850.0979016,"msg":"LogsExporter","kind":"exporter","data_type":"logs","name":"logging","#logs":1}
otel-docker-logs-opentelemetry-1  | {"level":"info","ts":1670388850.0983992,"msg":"ResourceLog #0\nResource SchemaURL: \nResource attributes:\n     -> host.name: Str(fluent bit)\nScopeLogs #0\nScopeLogs SchemaURL: \nInstrumentationScope  \nLogRecord #0\nObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC\nTimestamp: 2022-12-07 04:54:09.046844133 +0000 UTC\nSeverityText: \nSeverityNumber: Unspecified(0)\nBody: Map({\"key\":[\"val1\",1,2.3],\"key2\":\"val2\"})\nTrace ID: \nSpan ID: \nFlags: 0\n","kind":"exporter","data_type":"logs","name":"logging"}
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [task] created task=0xffff980658e0 id=0 OK
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [task] created task=0xffff98065960 id=1 OK
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [input chunk] update output instances with new chunk size diff=42
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [output:opentelemetry:opentelemetry.0] cmetrics msgpack size: 46863
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [output:opentelemetry:opentelemetry.0] cmetric_id=0 decoded 0-46863 payload_size=24909
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [output:opentelemetry:opentelemetry.0] final payload size: 24909
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [http_client] not using http_proxy for header
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [http_client] not using http_proxy for header
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [ info] [output:opentelemetry:opentelemetry.0] opentelemetry:4318, HTTP status=200
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [ info] [output:opentelemetry:opentelemetry.0] opentelemetry:4318, HTTP status=200
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     |
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [output:opentelemetry:opentelemetry.0] http_post result FLB_OK
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [out flush] cb_destroy coro_id=2
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [out flush] cb_destroy coro_id=1
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [task] destroy task=0xffff98065960 (task_id=1)
otel-docker-logs-fluent-bit-1     | [2022/12/07 04:54:11] [debug] [task] destroy task=0xffff980658e0 (task_id=0)
otel-docker-logs-opentelemetry-1  | {"level":"info","ts":1670388851.1091185,"msg":"LogsExporter","kind":"exporter","data_type":"logs","name":"logging","#logs":1}
otel-docker-logs-opentelemetry-1  | {"level":"info","ts":1670388851.1092324,"msg":"MetricsExporter","kind":"exporter","data_type":"metrics","name":"logging","#metrics":102}
otel-docker-logs-opentelemetry-1  | {"level":"info","ts":1670388851.109476,"msg":"ResourceLog #0\nResource SchemaURL: \nResource attributes:\n     -> host.name: Str(fluent bit)\nScopeLogs #0\nScopeLogs SchemaURL: \nInstrumentationScope  \nLogRecord #0\nObservedTimestamp: 1970-01-01 00:00:00 +0000 UTC\nTimestamp: 2022-12-07 04:54:10.053953175 +0000 UTC\nSeverityText: \nSeverityNumber: Unspecified(0)\nBody: Map({\"key\":[\"val1\",1,2.3],\"key2\":\"val2\"})\nTrace ID: \nSpan ID: \nFlags: 0\n","kind":"exporter","data_type":"logs","name":"logging"}

@abeach-nr (Author) commented Dec 7, 2022

@Syn3rman this also fails on AWS Linux outside of Docker.

@abeach-nr (Author)

There is another issue reporting this:

#6457

@Syn3rman (Contributor) commented Dec 8, 2022

Checking it on an AWS instance.

@Syn3rman (Contributor)

I was unable to reproduce the same error. It did crash, but with a different error, and I think the root cause of both is the same: the event chunk was being corrupted because of this part of the code, since it reserves memory for FLB_LOG_RECORD_BATCH_SIZE (1000) log records even when we likely do not have that many records.

I reduced it to a smaller number (128) and it worked fine. Could you try that and let me know if it works? I'll be opening a PR since it is something we need to address anyway, so you could also wait for the next release.
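
To make the pattern under discussion concrete, a minimal C sketch follows (hypothetical type and function names, not the actual out_opentelemetry code): the backtrace above points at flb_calloc inside flush_to_otel, and the change being discussed is to size the batch by the records actually present, capped by a limit, instead of always reserving FLB_LOG_RECORD_BATCH_SIZE entries.

/* Illustrative sketch only -- not the real plugin code. */
#include <stdlib.h>

#define FLB_LOG_RECORD_BATCH_SIZE 1000   /* hard-coded limit discussed above */

struct log_record {                      /* hypothetical record type */
    const char *body;
    unsigned long long timestamp_ns;
};

/* Pattern being questioned: always reserve room for 1000 records per flush,
 * regardless of how many records the chunk actually contains. */
static struct log_record *alloc_fixed_batch(void)
{
    return calloc(FLB_LOG_RECORD_BATCH_SIZE, sizeof(struct log_record));
}

/* Pattern suggested above: reserve only what the chunk holds, optionally
 * capped by a (configurable) batch size. */
static struct log_record *alloc_sized_batch(size_t record_count,
                                            size_t batch_size_cap)
{
    size_t count = record_count < batch_size_cap ? record_count : batch_size_cap;

    return calloc(count, sizeof(struct log_record));
}

It is also worth noting that the crashing Linux run above reports a 24 KB coroutine stack while the Mac run that did not crash reports 192 KB, which may help explain why the crash is environment-dependent.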

@abeach-nr (Author)

Is FLB_LOG_RECORD_BATCH_SIZE configurable, or is it just hard-coded to 1000?

#define FLB_LOG_RECORD_BATCH_SIZE 1000

@Syn3rman (Contributor)

It is hard-coded, but I will open a PR to make it a config parameter.
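
As a rough illustration of what moving it to a config parameter could look like, here is a sketch of a config_map entry for a batch_size option; the struct and field names are assumptions, not the merged patch (a later comment in this thread passes batch_size 64 with 2.0.8, which matches an option of this shape).

/* Sketch only: exposing the limit as a plugin option (names are illustrative). */
#include <stddef.h>
#include <fluent-bit/flb_output_plugin.h>

struct opentelemetry_context_sketch {
    int batch_size;                  /* hypothetical field holding the limit */
    /* ... other plugin state ... */
};

static struct flb_config_map config_map[] = {
    {
     FLB_CONFIG_MAP_INT, "batch_size", "1000",
     0, FLB_TRUE, offsetof(struct opentelemetry_context_sketch, batch_size),
     "Maximum number of log records to pack into a single request"
    },
    /* EOF */
    {0}
};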

@gingerwizard

Using the Helm chart @Syn3rman with 2.0.8 and the following config:

  inputs: |
    [INPUT]
        Name tail
        Path /var/log/containers/*.log
        multiline.parser docker, cri
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

    [INPUT]
        Name systemd
        Tag host.*
        Systemd_Filter _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail On

  ## https://docs.fluentbit.io/manual/pipeline/filters
  filters: |
    [FILTER]
        Name kubernetes
        Match kube.*
        Merge_Log On
        Keep_Log Off
        K8S-Logging.Parser On
        K8S-Logging.Exclude On

  ## https://docs.fluentbit.io/manual/pipeline/outputs
  outputs: |
    [OUTPUT]
        Name opentelemetry
        Match *
        Host pmm-opentelemetry-fluent-collector-gateway
        Port 4318
        Logs_uri /v1/logs
        Log_response_payload True
        Tls Off
        Tls.verify Off
        batch_size 64

This causes crashes:

lski-experiments_prometheus-adapter-ae1d64826f919bedbcbd8309e1e1a21c767ddd5e55559a2a3b4f78aa165f1e54.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=77783751 watch_fd=38 name=/var/log/containers/c-canary-os-78-server-0_ns-canary-os-78_c-canary-os-78-server-1d7f16e01400e600bee00c1df25a632c84f5b44f9cc8d5259421ad386e29aa89.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=201373101 watch_fd=39 name=/var/log/containers/c-salmon-sq-90-server-0_ns-salmon-sq-90_c-salmon-sq-90-server-09da943606a9168ce3faf78de03abcdb128b714e68f278112ec2e21ab80bd5ab.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=166880940 watch_fd=40 name=/var/log/containers/falcosidekick-ui-84f99ccfbc-6wp8c_falco_falcosidekick-ui-552e6fb1b050a39dacce80b6e16bb206b785e99758b5a23d86de5f0268b8784f.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=127783699 watch_fd=41 name=/var/log/containers/c-ivory-es-35-server-0_ns-ivory-es-35_c-ivory-es-35-server-60bbc57b187b29b7fad824c4cd3c1a5169913a455e303e3c3c2827fe0a2ce50c.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=70322923 watch_fd=42 name=/var/log/containers/falco-9q8nj_falco_falco-6f19e9dd900ccff3b942558ed9be537ab0cf67af07234de488aed0181d40bfab.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=83920747 watch_fd=43 name=/var/log/containers/falcosidekick-59c48d4dd6-cgtv2_falco_falcosidekick-76e4435098eb210eadd1a039386873d772822397372f665dd0f7a57d4bf307b7.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=61875940 watch_fd=44 name=/var/log/containers/coredns-57f98d8bd7-27wfm_kube-system_coredns-0e57b3cf6b6ce993a6dc684845094b1ac15d66f871eccd52f6694fcc3512f343.log
[2022/12/27 15:48:18] [ info] [input:tail:tail.0] inotify_fs_add(): inode=200933123 watch_fd=45 name=/var/log/containers/fluent-bit-rgrqm_open-telemetry_fluent-bit-6f11f3cd32c6233c76ccac8d28c8783fa5d1e8231cdd22d3621b56b913ac1af2.log
[2022/12/27 15:48:18] [engine] caught signal (SIGSEGV)
#0  0x562f598f3c94      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:713
#1  0x562f598f36b0      in  required_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:455
#2  0x562f598f3951      in  unlabeled_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:593
#3  0x562f598f3dc6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:739
#4  0x562f598f3bfd      in  repeated_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:675
#5  0x562f598f3de6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:744
#6  0x562f598f36b0      in  required_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:455
#7  0x562f598f376f      in  oneof_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:492
#8  0x562f598f3d77      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:727
#9  0x562f598f36b0      in  required_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:455
#10 0x562f598f3951      in  unlabeled_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:593
#11 0x562f598f3dc6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:739
#12 0x562f598f3bfd      in  repeated_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:675
#13 0x562f598f3de6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:744
#14 0x562f598f36b0      in  required_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:455
#15 0x562f598f376f      in  oneof_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:492
#16 0x562f598f3d77      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:727
#17 0x562f598f36b0      in  required_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:455
#18 0x562f598f3951      in  unlabeled_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:593
#19 0x562f598f3dc6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:739
#20 0x562f598f3bfd      in  repeated_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:675
#21 0x562f598f3de6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:744
#22 0x562f598f3bfd      in  repeated_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:675
#23 0x562f598f3de6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:744
#24 0x562f598f3bfd      in  repeated_field_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:675
#25 0x562f598f3de6      in  protobuf_c_message_get_packed_size() at lib/fluent-otel-proto/proto_c/protobuf-c/protobuf-c.c:744
#26 0x562f598fb9b6      in  opentelemetry__proto__collector__logs__v1__export_logs_service_request__get_packed_size() at lib/fluent-otel-proto/proto_c/opentelemetry/proto/collector/logs/v1/logs_service.pb-c.c:20
#27 0x562f59737c88      in  flush_to_otel() at plugins/out_opentelemetry/opentelemetry.c:671
#28 0x562f597380b9      in  process_logs() at plugins/out_opentelemetry/opentelemetry.c:794
#29 0x562f59738df1      in  cb_opentelemetry_flush() at plugins/out_opentelemetry/opentelemetry.c:1029
#30 0x562f5934f4a0      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:527
#31 0x562f59d92266      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#32 0xffffffffffffffff  in  ???() at ???:0

@gingerwizard

This assumes an HTTP OTLP receiver on the otel-collector, i.e.:

  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

@agup006 (Member) commented Dec 28, 2022

@Syn3rman would you be able to take a look at @gingerwizard's issue? We might need to create a new one.

@gingerwizard

If I configure the OpenTelemetry input, i.e.:

  inputs: |
    [INPUT]
        Name              opentelemetry
        Listen            0.0.0.0
        Port              4318
        Buffer_Chunk_Size 10M
        Buffer_Max_Size   500M

and configure an OTel agent to use the following exporter:

  exporters:
    otlphttp:
      endpoint: http://fluent-bit-collector:4318
      tls:
        insecure: true
      sending_queue:
        num_consumers: 4
        queue_size: 100
      retry_on_failure:
        enabled: true

Fluent Bit crashes with a similar error:

[2022/12/30 11:39:09] [engine] caught signal (SIGSEGV)
[2022/12/30 11:39:09] [error] [otel] Failed to unpack input logs
[2022/12/30 11:39:09] [error] [otel] Failed to unpack input logs
#0  0x55c3a19e9d9f      in  flb_sds_len() at include/fluent-bit/flb_sds.h:50
#1  0x55c3a19edb57      in  cb_stdout_flush() at plugins/out_stdout/stdout.c:196
#2  0x55c3a166d267      in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:527
#3  0x55c3a20a9266      in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117

@Syn3rman (Contributor)

Taking a look

@juanmolle

I have a crash with a similar stack trace. It is only happening on M1, so it is probably related to emulation.

@leonardo-albertovich (Collaborator)

Would you be able to create a minimal reproduction case for us to investigate?
