in_systemd: prevent infinite loop #899

Closed
kabakaev opened this issue Nov 10, 2018 · 0 comments · Fixed by #1285

Bug Report

Fluent Bit v0.14.4 was caught spinning at 100% CPU inside sd_journal_enumerate_data(), called from in_systemd. The GDB backtrace is given below.

The systemd-journald.service was restarted on that node prior to the observed 100% CPU loop, which might have triggered the issue.
Here we see that negative return values are possible:
https://github.com/systemd/systemd/blob/9e8b1ec08e8eb0b4611b7caf6adb8828feb32312/src/journal/sd-journal.c#L2312

assert_return(!journal_pid_changed(j), -ECHILD);

But negative return values of sd_journal_enumerate_data() are not handled in the current FLB code. Any non-zero value, including a negative error code, keeps the loop running:

while (sd_journal_enumerate_data(ctx->j, &data, &length)) {
    entries++;
}
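
One possible way to guard the loop is to continue only while the return value is strictly positive. The sketch below is not the actual patch: `struct fake_journal`, `fake_enumerate_data()` and `count_fields()` are hypothetical names used so the example compiles without libsystemd. The stand-in only assumes the return convention of sd_journal_enumerate_data(): positive when a field was read, 0 when no fields remain, negative errno-style value on error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle (not a libsystemd type). */
struct fake_journal {
    int fields;     /* number of fields still to be returned */
    int fail_with;  /* value returned once fields run out: 0 or a negative errno */
};

/* Hypothetical stand-in that mimics the assumed return convention of
 * sd_journal_enumerate_data(): > 0 field read, 0 no more fields, < 0 error. */
static int fake_enumerate_data(struct fake_journal *j)
{
    assert(j != NULL);
    if (j->fields > 0) {
        j->fields--;
        return 1;
    }
    /* A persistent error keeps returning the same negative value. */
    return j->fail_with;
}

/* Counts fields and stops on end-of-entry (0) as well as on error (< 0).
 * Returns the number of fields, or the negative error code. */
static int count_fields(struct fake_journal *j)
{
    int ret;
    int entries = 0;

    while ((ret = fake_enumerate_data(j)) > 0) {
        entries++;
    }
    return ret < 0 ? ret : entries;
}
```

With the original condition, a handle that keeps returning -ECHILD would never leave the loop. With the `> 0` check the loop ends and the caller can log or propagate the error.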

I have not been able to reproduce the issue so far. It seems to have been a combination of the journald restart with some other factors, such as removal of a systemd log database file due to journald vacuum.

Backtrace of the frozen process:

gdb /fluent-bit/bin/fluent-bit 15
Attaching to program: /fluent-bit/bin/fluent-bit, process 15
[New LWP 17]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f93c268b883 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
(gdb) i thr
  Id   Target Id         Frame
* 1    Thread 0x7f93c194af00 (LWP 15) "fluent-bit" 0x00007f93c268b883 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
  2    Thread 0x7f93c13ff700 (LWP 17) "fluent-bit" 0x00007f93c21c7207 in epoll_wait (epfd=3, events=0x7f93c144e300, maxevents=16, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
(gdb) thread  1
[Switching to thread 1 (Thread 0x7f93c194af00 (LWP 15))]
#0  0x00007f93c268b883 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
(gdb) bt
#0  0x00007f93c268b883 in ?? () from /lib/x86_64-linux-gnu/libsystemd.so.0
#1  0x00007f93c26c66f7 in sd_journal_enumerate_data () from /lib/x86_64-linux-gnu/libsystemd.so.0
#2  0x0000561f6fb1005f in in_systemd_collect (i_ins=0x7f93c1466480, config=0x7f93c141c1c0, in_context=0x7f93c143b000) at /tmp/src/plugins/in_systemd/systemd.c:147
#3  0x0000561f6fb10427 in in_systemd_collect_archive (i_ins=0x7f93c1466480, config=0x7f93c141c1c0, in_context=0x7f93c143b000) at /tmp/src/plugins/in_systemd/systemd.c:259
#4  0x0000561f6fad7f37 in flb_input_collector_fd (fd=15, config=0x7f93c141c1c0) at /tmp/src/src/flb_input.c:995
#5  0x0000561f6fae0421 in flb_engine_handle_event (config=0x7f93c141c1c0, mask=1, fd=15) at /tmp/src/src/flb_engine.c:296
#6  flb_engine_start (config=0x7f93c141c1c0) at /tmp/src/src/flb_engine.c:515
#7  0x0000561f6fa7c332 in main (argc=4, argv=0x7ffc435bd108) at /tmp/src/src/fluent-bit.c:824

Unfortunately, GDB messed up the process upon detach, so I was not able to step through it to find out the actual return value.

kabakaev added a commit to c445/fluent-bit that referenced this issue Apr 17, 2019
It fixes fluent-bit issue fluent#899.

Signed-off-by: Alexander Kabakaev <[email protected]>
kabakaev added a commit to c445/fluent-bit that referenced this issue May 1, 2019
It fixes fluent-bit issue fluent#899.

Signed-off-by: Alexander Kabakaev <[email protected]>
kabakaev added a commit to c445/fluent-bit that referenced this issue May 7, 2019
It fixes fluent-bit issue fluent#899.

Signed-off-by: Alexander Kabakaev <[email protected]>
edsiper pushed a commit that referenced this issue May 7, 2019
It fixes fluent-bit issue #899.

Signed-off-by: Alexander Kabakaev <[email protected]>