
fluent bit crash after a while #7036

Closed
juanmolle opened this issue Mar 17, 2023 · 4 comments
Labels
Stale · status: waiting-for-triage · waiting-for-user (Waiting for more information, tests or requested changes)

Comments

@juanmolle

Bug Report

Describe the bug
After running for some time, the process crashes. This happens with the amd64 build running on an M1 Mac.

To Reproduce
After some time running with 2.0.9:
/usr/local/bin/fluent-bit -c //fluent/main.conf -w /tmp

This issue is only observed in an emulated (qemu) environment on the M1.

#0  0x40006862f0        in  mpack_load_u8() at lib/mpack-amalgamation-1.1/src/mpack/mpack.h:2689
#1  0x400068cd53        in  mpack_parse_tag() at lib/mpack-amalgamation-1.1/src/mpack/mpack.c:3297
#2  0x400068d895        in  mpack_read_tag() at lib/mpack-amalgamation-1.1/src/mpack/mpack.c:3654
#3  0x400068d956        in  mpack_discard() at lib/mpack-amalgamation-1.1/src/mpack/mpack.c:3703
#4  0x400017f7f1        in  flb_mp_count_remaining() at src/flb_mp.c:54
#5  0x400017f795        in  flb_mp_count() at src/flb_mp.c:39
#6  0x40000b06d3        in  flb_pack_msgpack_to_json_format() at src/flb_pack.c:927
#7  0x40003ccbcc        in  compose_payload() at plugins/out_http/http.c:390
#8  0x40003cd688        in  cb_http_flush() at plugins/out_http/http.c:584
#9  0x40000d63ec        in  output_pre_cb_flush() at include/fluent-bit/flb_output.h:528
#10 0x400094acc6        in  co_init() at lib/monkey/deps/flb_libco/amd64.c:117
#11 0xffffffffffffffff  in  ???() at ???:0
qemu: uncaught target signal 6 (Aborted) - core dumped
  • Steps to reproduce the problem:
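For context, the trace dies inside mpack's `mpack_load_u8()` while `flb_mp_count()` walks the serialized buffer. The following is a minimal Python sketch (not Fluent Bit or mpack code; the tiny msgpack subset handled here is an illustrative assumption) of counting top-level msgpack entries, which shows how a truncated or corrupted buffer drives such a walker past the end of the data:

```python
# Illustrative sketch only: count top-level msgpack objects the way
# flb_mp_count() conceptually does, handling just a small subset of tags
# (positive fixint, fixstr, fixarray). A truncated buffer triggers an
# out-of-range read, which in C is the overrun seen in the backtrace.

def count_entries(buf: bytes) -> int:
    pos = 0
    count = 0

    def skip_one():
        nonlocal pos
        tag = buf[pos]           # IndexError here models mpack_load_u8 overrun
        pos += 1
        if tag <= 0x7F:          # positive fixint: no payload
            return
        if 0xA0 <= tag <= 0xBF:  # fixstr: low 5 bits are the byte length
            length = tag & 0x1F
            if pos + length > len(buf):
                raise IndexError("truncated fixstr payload")
            pos += length
            return
        if 0x90 <= tag <= 0x9F:  # fixarray: low 4 bits are the element count
            for _ in range(tag & 0x0F):
                skip_one()
            return
        raise ValueError(f"unhandled msgpack tag 0x{tag:02x}")

    while pos < len(buf):
        skip_one()
        count += 1
    return count

# Two fixstrs "hi" and "yo" -> 2 top-level entries
good = bytes([0xA2]) + b"hi" + bytes([0xA2]) + b"yo"
print(count_entries(good))   # -> 2

# The same buffer with the last byte cut off: the counter overruns
try:
    count_entries(good[:-1])
except IndexError as e:
    print("overrun:", e)
```

This is consistent with a corrupted chunk (or a qemu emulation fault) handing the mpack reader a buffer whose declared payload lengths extend past the actual data.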

Your Environment

  • Version used: 1.8.0 to 2.0.9
  • Configuration:
  • Environment name and version running on a linux ubuntu x86_64 docker image using docker desktop on a Mac M1
  • Server type and version:
  • Operating System and version: Mac M1 Ventura 13.2.1
  • Filters and plugins:
@leonardo-albertovich
Collaborator

Could you please share your configuration file and any instructions needed to reproduce the issue?

@RicardoAAD RicardoAAD added the waiting-for-user (Waiting for more information, tests or requested changes) label Mar 20, 2023
@github-actions
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days. Maintainers can add the exempt-stale label.

@github-actions github-actions bot added the Stale label Jun 19, 2023
@github-actions
Contributor

This issue was closed because it has been stalled for 5 days with no activity.

@github-actions github-actions bot closed this as not planned (Won't fix, can't repro, duplicate, stale) Jun 24, 2023
@Shobu12

Shobu12 commented Dec 4, 2023

I am getting the same SIGSEGV error on Fluent Bit 1.9. Is there any suggestion to fix this issue?
2023-12-04T16:37:23.723130265Z [2023/12/04 16:37:23] [ warn] [engine] failed to flush chunk '1-1701705786.22939653.flb', retry in 8 seconds: task_id=191, input=storage_backlog.5 > output=forward.0 (out_id=0)
2023-12-04T16:37:23.723146345Z [2023/12/04 16:37:23] [ warn] [engine] failed to flush chunk '1-1701706569.124160349.flb', retry in 9 seconds: task_id=193, input=storage_backlog.5 > output=forward.0 (out_id=0)
2023-12-04T16:37:24.122181101Z [2023/12/04 16:37:24] [ warn] [engine] failed to flush chunk '1-1701707364.25723752.flb', retry in 9 seconds: task_id=195, input=storage_backlog.5 > output=forward.0 (out_id=0)
2023-12-04T16:37:24.321961669Z [2023/12/04 16:37:24] [ info] [task] re-schedule retry=0x7f5fe103a6a0 133 in the next 11 seconds
2023-12-04T16:37:25.321823901Z [2023/12/04 16:37:25] [ info] [task] re-schedule retry=0x7f5fe103a290 92 in the next 7 seconds
2023-12-04T16:37:25.321823901Z [2023/12/04 16:37:25] [ info] [task] re-schedule retry=0x7f5fe103a2b8 94 in the next 7 seconds
2023-12-04T16:37:25.321849600Z [2023/12/04 16:37:25] [ info] [task] re-schedule retry=0x7f5fe103a358 102 in the next 8 seconds
2023-12-04T16:37:25.321856023Z [2023/12/04 16:37:25] [ info] [task] re-schedule retry=0x7f5fe103a7b8 143 in the next 8 seconds
2023-12-04T16:37:25.321872134Z [2023/12/04 16:37:25] [ info] [task] re-schedule retry=0x7f5fe103a7e0 130 in the next 9 seconds
2023-12-04T16:37:26.321823294Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a448 123 in the next 11 seconds
2023-12-04T16:37:26.321823294Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a650 124 in the next 8 seconds
2023-12-04T16:37:26.321823294Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a6f0 135 in the next 11 seconds
2023-12-04T16:37:26.321848212Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a740 128 in the next 9 seconds
2023-12-04T16:37:26.321858391Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a970 159 in the next 9 seconds
2023-12-04T16:37:26.321883699Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a150 99 in the next 6 seconds
2023-12-04T16:37:26.321899940Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a178 101 in the next 7 seconds
2023-12-04T16:37:26.321910360Z [2023/12/04 16:37:26] [ info] [task] re-schedule retry=0x7f5fe103a1a0 103 in the next 9 seconds
2023-12-04T16:37:27.321831675Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a538 114 in the next 11 seconds
2023-12-04T16:37:27.321831675Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a5b0 120 in the next 6 seconds
2023-12-04T16:37:27.321858236Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a5d8 127 in the next 9 seconds
2023-12-04T16:37:27.321858236Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a600 122 in the next 10 seconds
2023-12-04T16:37:27.321874827Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a880 151 in the next 11 seconds
2023-12-04T16:37:27.321887772Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a9e8 163 in the next 11 seconds
2023-12-04T16:37:27.321903180Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a330 100 in the next 11 seconds
2023-12-04T16:37:27.321917248Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a420 121 in the next 11 seconds
2023-12-04T16:37:27.321934771Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a498 106 in the next 7 seconds
2023-12-04T16:37:27.321952334Z [2023/12/04 16:37:27] [ info] [task] re-schedule retry=0x7f5fe103a510 112 in the next 6 seconds
2023-12-04T16:37:28.321833302Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a560 116 in the next 6 seconds
2023-12-04T16:37:28.321833302Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a588 118 in the next 8 seconds
2023-12-04T16:37:28.321833302Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a6c8 126 in the next 9 seconds
2023-12-04T16:37:28.321858811Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a718 137 in the next 9 seconds
2023-12-04T16:37:28.321882205Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe1039d68 53 in the next 8 seconds
2023-12-04T16:37:28.321890150Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe1039ea8 71 in the next 7 seconds
2023-12-04T16:37:28.321908856Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a858 149 in the next 11 seconds
2023-12-04T16:37:28.321927872Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a920 155 in the next 6 seconds
2023-12-04T16:37:28.321946737Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103a380 104 in the next 7 seconds
2023-12-04T16:37:28.321961676Z [2023/12/04 16:37:28] [ info] [task] re-schedule retry=0x7f5fe103aba0 179 in the next 11 seconds
2023-12-04T16:37:28.324578511Z [2023/12/04 16:37:28] [engine] caught signal (SIGSEGV)
2023-12-04T16:37:28.724282978Z #0 0x536f5e in ???() at ???:0
2023-12-04T16:37:28.724301284Z #1 0x759da5 in co_init() at lib/monkey/deps/flb_libco/amd64.c:117
2023-12-04T16:37:28.724308287Z #2 0x7f5fd181b8ff in ???() at ???:0
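The 6–11 second "retry in N seconds" values in the log above are the kind of delays a jittered, capped backoff produces. The following is a generic sketch of that pattern (this is not Fluent Bit's actual scheduler code; the base and cap values are assumptions for illustration):

```python
import random

def retry_delay(attempt: int, base: int = 2, cap: int = 30) -> int:
    """Capped exponential backoff with jitter: the retry delay grows with
    each failed flush attempt but is drawn randomly from [1, ceiling],
    where the ceiling doubles per attempt up to a hard cap."""
    ceiling = min(cap, base ** attempt)
    return random.randint(1, max(1, ceiling))

# Each failed flush re-schedules the task with a fresh randomized delay
for attempt in range(1, 5):
    print(f"attempt {attempt}: retry in {retry_delay(attempt)} seconds")
```

The jitter is what spreads simultaneous retries (here, many backlogged chunks all failing against `forward.0`) so they do not hammer the output at the same instant.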

The Fluent Bit configuration I have:
apiVersion: v1
data:
fluent-bit.conf: |
[SERVICE]
  Parsers_File /etc/fluentbit/config/parsers.conf
  Flush 1
  storage.path /var/fluentbit/state/flb-storage/
  storage.sync normal
  storage.checksum off
  storage.backlog.mem_limit 5MB
  storage.max_chunks_up 128
  #storage.total_limit_size 10MB
  storage.delete_irrecoverable_chunks On
  #Grace 60

[INPUT]
  Name              tail
  Path              /demo/demo-application.log
  Parser            json
  Db                /tmp/logs.db
  # Tag - analytics.<pod>.<optional_container>.<stream>
  Tag               demo-application.log
  Buffer_Chunk_Size 256K
  Buffer_Max_Size   1MB
  #Mem_Buf_Limit     20MB
  storage.type      filesystem
  storage.pause_on_chunks_overlimit On
  Skip_Long_Lines   On
  Rotate_Wait       20
  Refresh_Interval  10
  Skip_Empty_Lines  On
  Read_from_Head    True
  #Ignore_Older      1m
[FILTER]
  Name record_modifier
  Match *
  Record shard ${SHARD_KEY}
  Record cluster ${CLUSTER_KEY}
  Record kubernetes.host ${MY_NODE_NAME}
  Record kubernetes.pod_id ${MY_POD_UID}
  Record kubernetes.pod_name ${MY_POD_NAME}
  Record kubernetes.container_image ${MY_IMAGE}
  Record kubernetes.namespace_name ${MY_POD_NAMESPACE}
  Record kubernetes.container_name demo-container
[OUTPUT]
  Name forward
  Host ${MY_NODE_NAME}
  Port 24224
  Time_as_Integer On
  Match *
  #Retry_Limit 1
  net.keepalive on
  net.keepalive_idle_timeout 10
  net.keepalive_max_recycle 2000
  net.connect_timeout_log_error true
  net.connect_timeout 60

parsers.conf: |
[PARSER]
  Name json
  Format json
  Time_Key time
  Time_Format %s.%L
  Time_Strict Off
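The `Time_Format %s.%L` in the parser above means the `time` key is read as epoch seconds with a fractional part. A small stdlib-only Python illustration of what such a value decodes to (the sample number is modeled on the epoch portion of the chunk names in the log above, not taken from an actual record):

```python
from datetime import datetime, timezone

def parse_epoch_with_fraction(value: str) -> datetime:
    """Decode an epoch-seconds timestamp with a fractional part,
    e.g. '1701705786.229', into an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)

ts = parse_epoch_with_fraction("1701705786.229")
print(ts.isoformat())   # -> 2023-12-04T16:03:06.229000+00:00
```

With `Time_Strict Off`, records whose `time` value does not match this format are still accepted; the parser falls back instead of dropping them.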
