Fluent-bit pod generate core dump every time the pod is restarted after migrating to 1.4.2 #2127
Comments
Is that Go plugin open source and available somewhere so we can see the code? |
That Go plugin is not open source, but it worked when I used fluent-bit 1.1.0. You can use the example from the fluent-bit-go repo on GitHub to reproduce the issue. |
I tried on 1.1 and 1.4 and I see no issues on SIGTERM:
I am happy to help if you provide a simplified plugin (and its source) to reproduce the issue. |
Indeed, I just downloaded one of the examples and tested it with fluent-bit 1.4.2. There is no such error, so the problem is narrowed down to my plugin, which is probably not working well. I will share my findings in this thread tomorrow if I find the problem. Otherwise I will create a simplified version of my plugin and ask for your help, @edsiper. Thank you for your support. |
I was wondering if my problem is related to this change: #1421, since the core dump stack trace actually comes from flb_proxy_cb_exit (). |
Actually, this is the log I have for the run that dumped core:
[2020/04/23 17:21:52] [ info] Configuration:
And this is the one without a core dump:
[2020/04/23 17:22:59] [ info] [storage] version=1.0.3, initializing...
The main difference is at exit time: with fluent-bit 1.4.2 there is an additional SIGSEGV, which I think is the cause of the core dump. The run with 1.1.0 does not have that SIGSEGV. Both were tested with exactly the same out_plugin. |
@huzhou The changes from #1421 haven't affected any of the AWS go plugins, which I maintain. Looking at the stacktrace, I think this is a bug in your Go code. Before 1.4, the go plugin exit callback was not actually called. It would never be executed. So if your Go code had a bug, it would not show up because it was never run. Starting with 1.4, it will be run, so that is probably why you are seeing an issue now. |
@PettitWesley Thanks for the info. Can you please share a bit more about how the exit callback works? In my code, I just simply return a |
Hi @PettitWesley and @edsiper |
@huzhou Awesome! Can we close the issue? |
Yes |
Actually, it is a bug in OUR code. If the Go plugin does not export the function, the symbol will not be resolved on our side, leading to have [...] I've pushed a fix into a test branch I am working on now, which will be merged shortly. The fix is here: 379b244 |
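To illustrate the failure mode described above, here is a minimal, self-contained Go sketch. It is not the Fluent Bit code or the actual fix in 379b244; `proxyCallbacks` and `callExit` are hypothetical names used only for this example. A nil function field stands in for a symbol that the plugin did not export, and the guard skips the call instead of jumping through an unresolved pointer.

```go
package main

import "fmt"

// proxyCallbacks is a hypothetical stand-in for the symbols a host program
// looks up in a plugin. A nil field models a symbol that was not exported.
type proxyCallbacks struct {
	exit func() int
}

// callExit invokes the exit callback only if it was resolved.
// It returns the callback's return code and whether the callback ran.
func callExit(cb proxyCallbacks) (int, bool) {
	if cb.exit == nil {
		// Without this check, calling cb.exit() would panic in Go,
		// much like calling a NULL function pointer crashes in C.
		return 0, false
	}
	return cb.exit(), true
}

func main() {
	withExit := proxyCallbacks{exit: func() int { return 0 }}
	withoutExit := proxyCallbacks{}

	rc, called := callExit(withExit)
	fmt.Println("plugin with exit callback: called =", called, "rc =", rc)

	rc, called = callExit(withoutExit)
	fmt.Println("plugin without exit callback: called =", called, "rc =", rc)
}
```

Treating the exit callback as optional keeps older plugins that never defined one working, which matches the behavior users saw before 1.4.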
Do you think this fix can potentially be applied to all the register, initialize and flush steps of the plugin? |
Yeah, I found another place where there is no check. Ideally, the fix must do the checks and make sure at least one flush() callback type is defined; otherwise abort and let the user know about it. Anyone? |
Sounds good. Actually, if there is no output plugin registered, you can probably abort already. If the plugin is registered, then it must be initialized and flushable. |
The initialization check is there, but there is none for flush. |
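The load-time validation discussed in the last few comments could look roughly like the following Go sketch. This is only an illustration under stated assumptions, not the Fluent Bit implementation: `pluginSymbols`, its field names, and `validate` are hypothetical, and nil fields again model symbols that could not be resolved. The two flush fields reflect the "at least one flush() callback type" wording above.

```go
package main

import (
	"errors"
	"fmt"
)

// pluginSymbols is a hypothetical view of the callbacks resolved from a plugin.
// A nil field means the plugin did not export that symbol.
type pluginSymbols struct {
	register   func() int
	initialize func() int
	flush      func(data []byte, tag string) int
	flushCtx   func(ctx interface{}, data []byte, tag string) int
	exit       func() int // optional: a missing exit callback is not an error
}

// validate rejects a plugin before it is ever used, so that a missing
// callback produces a clear error at startup instead of a crash later.
func validate(s pluginSymbols) error {
	if s.register == nil {
		return errors.New("plugin does not export a register callback")
	}
	if s.initialize == nil {
		return errors.New("plugin does not export an init callback")
	}
	if s.flush == nil && s.flushCtx == nil {
		return errors.New("plugin does not export any flush callback")
	}
	return nil
}

func main() {
	noFlush := pluginSymbols{
		register:   func() int { return 0 },
		initialize: func() int { return 0 },
	}
	if err := validate(noFlush); err != nil {
		fmt.Println("refusing to load plugin:", err)
	}
}
```

Checking everything once at load time means the per-call sites only need to guard the callbacks that are genuinely optional.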
+1 |
Bug Report
Describe the bug
A core dump is generated every time the fluent-bit pod is restarted or shut down, after we migrated to fluent-bit 1.4.2.
To Reproduce
kubectl delete pod <fluent-bit-pod>
Expected behavior
The pod restarts or shuts down properly without generating a core dump.
Screenshots
Your Environment
[SERVICE]
    Flush            1
    Daemon           off
    Log_Level        debug
[INPUT]
    Name             tail
    Path             /etc/fluent-bit/fdf/fluent-bit_fdf*
    Refresh_Interval 2
[OUTPUT]
    Name             fdf_prom_plugin
    Match            *
where fdf_prom_plugin is written using fluent-bit-go
Additional context
I have the stack trace of the core dump. It looks like the signal handling has a bug. Here is the stack trace:
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-94.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /fluent-bit/bin/fluent-bit...done.
[New LWP 1]
[New LWP 10]
[New LWP 12]
[New LWP 7]
[New LWP 8]
[New LWP 9]
[New LWP 11]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/fluent-bit/bin/fluent-bit -c /etc/fluent-bit/config.flb -e /etc/plugin/out_plu'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f6c75ef2a17 in abort () from /lib64/libc.so.6
warning: Missing auto-load scripts referenced in section .debug_gdb_scripts
of file /etc/plugin/out_plugin.so
Use `info auto-load python [REGEXP]' to list them.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7.x86_64 libgcc-4.8.5-11.el7.x86_64
(gdb) bt
#0 0x00007f6c75ef2a17 in abort () from /lib64/libc.so.6
#1 0x000000000042b374 in flb_signal_handler ()
#2 0x00007f6c752bd81d in runtime.sigfwd () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:286
#3 0x00007fffe7de3cd8 in ?? ()
#4 0x00007f6c752a47fb in runtime.sigfwdgo (sig=11, info=0x7fffe7de3f30, ctx=0x7fffe7de3e00, ~r3=false) at /usr/lib/golang/src/runtime/signal_unix.go:630
#5 0x00007f6c752a3b1b in runtime.sigtrampgo (sig=11, info=0x7fffe7de3f30, ctx=0x7fffe7de3e00) at /usr/lib/golang/src/runtime/signal_unix.go:272
#6 0x00007f6c752bd873 in runtime.sigtramp () at /usr/lib/golang/src/runtime/sys_linux_amd64.s:306
#7 <signal handler called>
#8 0x0000000000000000 in ?? ()
#9 0x00000000004578af in flb_proxy_cb_exit ()
#10 0x000000000043cee4 in flb_output_exit ()
#11 0x00000000004476e0 in flb_engine_shutdown ()
#12 0x0000000000447532 in flb_engine_start ()
#13 0x000000000042c4ee in main ()