
Fluent Bit crashes for tail & opentelemetry with logs_uri #6457

Closed

wtobis opened this issue Nov 25, 2022 · 8 comments


wtobis commented Nov 25, 2022

Bug Report

Describe the bug
Fluent Bit crashes with the following error

[2022/11/10 20:02:37] [ info] [input:tail:tail.0] inotify_fs_add(): inode=2533274793273371 watch_fd=1 name=/var/log/mylogs.log
[2022/11/10 20:02:38] [engine] caught signal (SIGSEGV)
#0  0x55b173e87101      in  mk_list_size() at lib/monkey/include/monkey/mk_core/mk_list.h:165
#1  0x55b173e8b368      in  handle_output_event() at src/flb_engine.c:289
#2  0x55b173e8cb85      in  flb_engine_start() at src/flb_engine.c:971
#3  0x55b173e34ab4      in  flb_lib_worker() at src/flb_lib.c:629
#4  0x7fa2ca09dea6      in  ???() at ???:0
#5  0x7fa2c996da2e      in  ???() at ???:0
#6  0xffffffffffffffff  in  ???() at ???:0

when using the tail input and opentelemetry output (with logs_uri) plugins.

Important: with other input plugins (e.g. dummy) it works.
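
For context, the faulting frame, mk_list_size(), is Monkey's circular doubly linked list walk. A minimal sketch of what it does (paraphrased from lib/monkey/include/monkey/mk_core/mk_list.h; the exact upstream code may differ slightly):

/* Circular doubly linked list node, as used throughout Monkey/Fluent Bit. */
struct mk_list {
    struct mk_list *prev;
    struct mk_list *next;
};

/* Walks next pointers until the iterator comes back around to the head.
 * If the list head passed in by handle_output_event() points at freed or
 * uninitialized memory, dereferencing it->next faults, which matches the
 * SIGSEGV in frame #0 above. */
static inline int mk_list_size(struct mk_list *list)
{
    int size = 0;
    struct mk_list *it;

    for (it = list->next; it != list; it = it->next) {
        size++;
    }
    return size;
}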

To Reproduce

  • Start Fluent Bit with the provided configuration (see the example command below)
  • Move a test.log file with some log lines into the /var/log/ directory
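
For reference, one way to run the reproduction with the official image (a sketch: the host paths are illustrative; the in-container config path is the image's default):

docker run --rm \
    -v "$(pwd)/fluent-bit.conf:/fluent-bit/etc/fluent-bit.conf" \
    -v "$(pwd)/logs:/var/log" \
    fluent/fluent-bit:2.0.4

Copying test.log into the mounted logs directory should then trigger the crash within the 10-second Refresh_Interval.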

Screenshots
The complete log output from startup to the failure:

[2022/11/10 20:02:24] [ info] [fluent bit] version=2.0.4, commit=abb65a1f31, pid=1
[2022/11/10 20:02:24] [ info] [storage] ver=1.3.0, type=memory, sync=normal, checksum=off, max_chunks_up=128
[2022/11/10 20:02:24] [ info] [cmetrics] version=0.5.6
[2022/11/10 20:02:24] [ info] [ctraces ] version=0.2.5
[2022/11/10 20:02:24] [ info] [input:tail:tail.0] initializing
[2022/11/10 20:02:24] [ info] [input:tail:tail.0] storage_strategy='memory' (memory only)
[2022/11/10 20:02:24] [ info] [sp] stream processor started
[2022/11/10 20:02:24] [ info] [output:stdout:stdout.0] worker #0 started
[2022/11/10 20:02:37] [ info] [input:tail:tail.0] inotify_fs_add(): inode=2533274793273371 watch_fd=1 name=/var/log/mylogs.log
[2022/11/10 20:02:38] [engine] caught signal (SIGSEGV)
#0  0x55b173e87101      in  mk_list_size() at lib/monkey/include/monkey/mk_core/mk_list.h:165
#1  0x55b173e8b368      in  handle_output_event() at src/flb_engine.c:289
#2  0x55b173e8cb85      in  flb_engine_start() at src/flb_engine.c:971
#3  0x55b173e34ab4      in  flb_lib_worker() at src/flb_lib.c:629
#4  0x7fa2ca09dea6      in  ???() at ???:0
#5  0x7fa2c996da2e      in  ???() at ???:0
#6  0xffffffffffffffff  in  ???() at ???:0

Your Environment

  • Version used: Docker image: fluent/fluent-bit:2.0.4
  • Configuration: fluent-bit.conf:
[INPUT]
	Name tail
	Path /var/log/*.log
	Refresh_Interval 10 

[FILTER]
	Name modify
	Match *
	Add testattribute attrvalue

[OUTPUT]
	Name stdout
	Match *

[OUTPUT]
	Name  opentelemetry
	Match *
	Host  hostname
	Port  443
	logs_uri /v1/logs
	add_label testlabel labelvalue
  • Operating System and version: Windows 10 + Docker Desktop
  • Filters and plugins: tail, modify, opentelemetry and stdout (just for testing)

Additional context
It might be related to this issue, since this configuration did not fail before 2.0.4.


BertelBB commented Dec 1, 2022

Also encountering this in v2.0.5, although I am not using the opentelemetry output plugin; I'm only using the elasticsearch output plugin.

Syn3rman (Contributor) commented Dec 1, 2022

@wtobis I wasn't able to reproduce this in Docker on a Mac. I'm wondering if this might be related to the tail plugin on Windows.


BertelBB commented Dec 1, 2022

> @wtobis I wasn't able to reproduce this in Docker on a Mac. I'm wondering if this might be related to the tail plugin on Windows.

I don't think it is isolated to Windows, since I am running on Linux.

Syn3rman (Contributor) commented Dec 1, 2022

Could you please send your config too?


BertelBB commented Dec 1, 2022

> Could you please send your config too?

Environment: AKS with Linux nodes
Fluent Bit version: 2.0.5

[SERVICE]
    flush 1
    daemon off
    log_level warning
    parsers_file custom_parsers.conf
    parsers_file parsers.conf
    http_server on
    http_listen 0.0.0.0
    http_port 2020
    storage.path /fluent-bit/etc/data

[INPUT]
    name tail
    alias kube
    path /var/log/containers/*.log
    path_key log_file_path
    db /fluent-bit/etc/db/kube.db
    parser cri-custom
    tag kube.*
    buffer_chunk_size 32k
    buffer_max_size 256k
    mem_buf_limit 5m
    read_from_head true
    refresh_interval 10
    skip_empty_lines on
    skip_long_lines off
    storage.type filesystem

[FILTER]
    name kubernetes
    match *
    kube_url https://kubernetes.default.svc.cluster.local:443
    kube_ca_file /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    kube_token_file /var/run/secrets/kubernetes.io/serviceaccount/token
    kube_tag_prefix kube.var.log.containers.
    buffer_size 256k
    merge_log on
    merge_log_trim on
    keep_log off
    k8s-logging.parser on
    k8s-logging.exclude on
    annotations off
    labels on

[OUTPUT]
    name es
    alias elasticsearch
    match *
    host ${FLUENT_ES_HOST}
    port ${FLUENT_ES_PORT}
    http_user ${FLUENT_ES_USER}
    http_passwd ${FLUENT_ES_PW}
    buffer_size 1m
    index kubernetes
    generate_id on
    logstash_format on
    logstash_prefix ${FLUENT_ES_LOGSTASH_PREFIX}
    replace_dots on
    retry_limit 5
    tls on
    tls.verify off
    trace_error off

agup006 (Member) commented Dec 7, 2022

@BertelBB I think this is missing the opentelemetry output config. Also, while I don't expect it to be fixed, could you also sanity-check 2.0.6?

Syn3rman (Contributor) commented

Similar to what I reported in the linked issue (#6512), reducing the batch size of log records to flush seems to fix this issue. A short-term fix would be to expose it as a config parameter defaulting to a smaller number (64 or 128), but we might have to look into the retry logic for this.
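
A rough illustration of that proposed short-term fix, flushing records in bounded batches; every name below (otel_flush_ctx, send_payload, the batch_size field) is hypothetical for this sketch, not Fluent Bit's actual internals:

#include <stddef.h>

/* Hypothetical plugin context; batch_size would be read from a config
 * key and default to a small value (e.g. 128), as proposed above. */
struct otel_flush_ctx {
    size_t batch_size;
};

struct log_record {
    const char *body;               /* placeholder payload */
};

/* Stub standing in for serializing a batch and POSTing it to logs_uri. */
static int send_payload(struct otel_flush_ctx *ctx,
                        struct log_record *recs, size_t n)
{
    (void) ctx; (void) recs; (void) n;
    return 0;
}

/* Flushes in chunks of at most batch_size records instead of one
 * oversized request, so a failure only retries one bounded chunk. */
static int flush_in_batches(struct otel_flush_ctx *ctx,
                            struct log_record *recs, size_t count)
{
    size_t step = ctx->batch_size ? ctx->batch_size : 128;

    for (size_t off = 0; off < count; off += step) {
        size_t n = (count - off < step) ? count - off : step;
        if (send_payload(ctx, &recs[off], n) != 0) {
            return -1;
        }
    }
    return 0;
}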

edsiper (Member) commented Jan 27, 2023

Fixed by #6559 and #6583.

edsiper closed this as completed Jan 27, 2023